Sunday, June 16, 2019

R v. Python

A couple of days ago, I was having a conversation with someone that touched on the curriculum for a masters program in analytics. One thing that struck me was the requirement of one semester each of R and Python programming. On the one hand, I can see a couple of reasons for requiring both: some jobs will require the worker to deal with both kinds of code; and even if you're headed to a job where one of them suffices, you don't know which one it will be while you're in the program. On the other hand, I tend to think that most people can get by nicely with just one or the other (in my case R), if not neither. Also, once you've learned an interpreted language (plus the general concepts of programming), I suspect you can learn another one on your own if you need it. (This will not, however, get you hired if proficiency in the one you don't know is an up-front requirement.)

I don't really intend to argue the merits of requiring one v. both here, nor the issue of whether a one-semester deep understanding of two languages is as good as a two-semester deep understanding of one language. Purely by coincidence, that conversation was followed by my discovering R vs. Python for Data Science, a point-by-point comparison of the two by Norm Matloff (a computer science professor at UC Davis). If you are interested in data science and trying to decide which one to learn (or which to learn first), I think Matloff's comparison provides some very useful information. With the exception of "Language unity", I'm inclined to agree with his ratings.

Matloff calls language unity a "horrible loss" for R, emphasizing a dichotomy between conventional/traditional R and the so-called "Tidyverse" (which is actually a set of packages). At the same time, he calls the transition from Python 2.7 to 3.x "nothing too elaborate". Personally, I use Tidyverse packages when it suits me and not when it doesn't. There's a bit of work involved in learning each new Tidyverse package, but that's no different from learning any other package. He mentions tibbles and pipes. Since a tibble is a subclass of a data frame, I can (and often do) ignore the differences and just treat them as data frames. As far as pipes go, they're not exactly a Tidyverse creation. The Tidyverse packages load the magrittr package to get pipes, but I think that package predates the Tidyverse, and I use it with "ordinary" R code. Matloff also says that "... the Tidyverse should be considered advanced R, not for beginners." If I were teaching an intro to R course, I think I would introduce the Tidyverse stuff early, because it imposes some consistency on the outputs of R functions that is sorely lacking in base R. (If you've done much programming in R, you've probably had more than a few "why the hell did it do that??" moments, such as getting a vector output when you expected a scalar or vice versa.) Meanwhile, I've seen issues with software that bundled Python scripts (and maybe libraries), for instance because users who came to Python recently and have only ever installed 3.x discover that the bundled scripts require 2.x (or vice versa).

Anyway, that one section aside (where I think Matloff and I can agree to disagree), the comparison is quite cogent, and makes for good reading.

Friday, June 14, 2019

A Java Container for Parameters

A few days ago, I posted about a Swing class (and supporting stuff) that I developed to facilitate my own computational research, and which I have now made open-source in a Bitbucket repository. I finally got around to cleaning up another Java utility class I wrote, and which I use regularly in experiments. I call it ParameterBlock. It is designed to be a container for various parameters that I need to set during experiments.

It might be easiest if I start with a couple of screen shots. The first one shows a Swing application I was using in a recent project. Specifically, it shows the "Settings" menu, which has multiple entries corresponding to different computational stages (the overall solver, an initial heuristic, two phases of post-heuristic number crunching), along with options to save and load settings.

Parameter settings menu




Computational research can involve a ton of choices for various parameters. CPLEX alone has what seems to be an uncountable number of them. In the Dark Ages, I hard-coded parameters, which meant searching the source code (and recompiling) every time I wanted to change one. Later I graduated to putting them on the command line, but that gets old quickly if there are more than just a handful. When I started writing simple Swing platforms for my work (like the one shown above), I added menu options to call up dialogs that would let me see the current settings and change them. Over time, this led me to my current solution.

I put each collection of parameters in a separate subclass of the (abstract) ParameterBlock class. So clicking on "Solver" would access one subclass, clicking on "Heuristic" would access a different subclass, and so on. A parameter block can contain parameters of any types. The next shot shows a dialog for the solver parameters in my application. Two of the parameters are boolean (and have check boxes), two are Java enumerations (and have radio button groups), and three are numeric (and have text fields). String parameters are also fine (handled by text fields).

Defining a parameter block is easy (in my opinion). It pretty much boils down to deciding how many parameters you are going to have, assigning a symbolic name to each (so that in other parts of the code you can refer to "DOMINANCE" and not need to remember if it is parameter 1, parameter 2 or whatever), giving each one a label (for dialogs), a type (such as boolean.class or double.class), a default value and a tool tip. The ParameterBlock class contains a static method for generating a dialog like the one below, and one or two other useful methods.

Solver parameters
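Just to make that concrete, here is a purely illustrative sketch of what such a subclass might look like. The define() method and its signature are hypothetical (the actual API is documented in the repository's README); the point is the shape: one symbolic index, label, type, default and tool tip per parameter.

public class SolverParameters extends ParameterBlock {
  // Symbolic names, so other code never has to remember whether
  // "dominance" is parameter 0, 1 or whatever.
  public static final int DOMINANCE = 0;
  public static final int TIME_LIMIT = 1;

  public SolverParameters() {
    //     index       label             type           default  tool tip
    define(DOMINANCE,  "Use dominance?", boolean.class, true,
           "Apply dominance tests during the search");
    define(TIME_LIMIT, "Time limit (s)", double.class,  300.0,
           "Run time limit in seconds");
  }
}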

You can find more details in the README file at the Bitbucket repository I set up for this. The repository contains a small example for demonstration purposes, but to use it you just need to copy ParameterBlock.java into your application. As usual, I'm releasing it under a Creative Commons license. Hopefully someone besides me will find it useful.

Tuesday, June 11, 2019

A Swing Platform for Computational Experiments

Most of my research involves coding algorithms and running computational experiments with them. It also involves lots of trial-and-error, both with the algorithms themselves and with assorted parameters that govern their functioning. Back in the Dark Ages, I did all this with programs that ran at a command prompt (or, in Linux terms, in a terminal) and wrote any output to the terminal. Eventually I got hip and started writing simple GUI applications for those experiments. A GUI application lets me load different problems without having to stop and restart the program, lets me change parameter settings visually (without having to screw around with lots of command line options ... is the switch for using antisymmetry constraints -A or -a?), and lets me save output to text files when it's helpful.

Since I code (mostly) in Java, the natural way to do this is with a Swing application. (When I use R, I typically use an R notebook.) Since there are certain features I always want, I found myself copying code from old projects and then cutting out the project-specific stuff and adding new stuff, which is a bit inefficient. So I finally got around to creating a minimal template version of the application, which I'm calling "XFrame" (short for "Experimental JFrame" or "JFrame for Experiments" or something).

I just uploaded it to Bitbucket, where it is open-source under a very nonrestrictive Creative Commons license. Feel free to download it if you want to try it. There's an issue tracker where you can report any bugs or sorely missing features (but keep in mind I'm taking a fairly minimalist approach here).

Using it is pretty simple. The main package (xframe) contains two classes and an interface. You can just plop them into your application somewhere. One class (CustomOutputStream) you will probably not want to change. The actual GUI is the XFrame class. You will want to add menu items (the only one it comes with is File > Exit) and other logic. Feel free to change the class name as well if you wish. Finally, your program needs to implement the interface defined in XFrameController so that XFrame knows how to talk to the program.

The layout contains a window title at the top and a status message area at the bottom, both of which can be fetched and changed by code. The central area is a scrollable text area where the program can write output. It has buttons to save the content and to clear it, and the program will not exit with unsaved text unless you explicitly bless the operation.

There is a function that lets the program open a nonmodal, scrollable dialog (which can be left open while the main window is in use, and whose content can be saved to a file). Another function allows the program to pop up modal dialogs (typically warnings or error messages). Yet another function provides a place where you can insert logic to tell the GUI when to enable/disable menu choices (and maybe other things). Finally, there is a built-in extension of SwingWorker that lets you launch a computational operation in a separate thread (where it will not slow down or freeze the GUI).
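The threading piece follows the standard SwingWorker pattern (this is generic Java, not XFrame's actual class; runExperiment() and displayResult() are hypothetical stand-ins):

import javax.swing.SwingWorker;

// The computation runs off the event dispatch thread; done() runs
// back on the EDT, where it is safe to update the GUI.
SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
  @Override
  protected String doInBackground() throws Exception {
    return runExperiment();  // hypothetical long-running computation
  }

  @Override
  protected void done() {
    try {
      displayResult(get());  // hypothetical GUI update
    } catch (Exception e) {
      e.printStackTrace();   // real code should report this properly
    }
  }
};
worker.execute();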

I included a small "Hello world!" application to show it works. I'll end with a couple of screen shots, one of the main window and the other of the nonmodal dialog (both from the demo). If it looks like something you might want to use, please head over to Bitbucket and grab the source.

Main window
Nonmodal dialog


Monday, June 10, 2019

Indicator Constraints v. Big M

Way, way back I did a couple of posts related to how to model "logical indicators" (true/false values that control enforcement of constraints). The topic ties in to the general issue of "big M" model formulations. Somewhere around version 10, CPLEX introduced what they call indicator constraints, which are essentially implications of the form "if constraint 1 holds, then constraint 2 holds". As an example, if $a$ is a vector parameter, $b$ a scalar parameter, $x$ a binary variable and $y$ a vector of real variables, you might use an indicator constraint to express $x=1 \implies a'y\le b$.

That same constraint, in a "big M" formulation, would look like $a'y \le b + M(1-x)$. Using an indicator constraint saves you having to make a careful choice of the value of $M$ and avoids certain numerical complications. So should you always use indicator constraints? It's not that simple. Although "big M" formulations are notorious for having weak relaxations, indicator constraints can also result in weak (possibly weaker) relaxations.
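To make the contrast concrete, here is a sketch using the Java API. Assume $a$ (a double array), $b$ (a double), $x$ (a binary variable) and $y$ (an array of numeric variables) already exist; the value of $M$ below is an arbitrary stand-in, not a recommendation.

// Indicator version: x = 1  ==>  a'y <= b. No M required.
cplex.add(cplex.ifThen(cplex.eq(x, 1),
                       cplex.le(cplex.scalProd(a, y), b)));

// Big-M version: a'y <= b + M(1 - x), rewritten as a'y + M*x <= b + M.
double M = 1.0e4;  // requires a careful, problem-specific choice
cplex.addLe(cplex.sum(cplex.scalProd(a, y), cplex.prod(M, x)), b + M);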

For more details, I'll refer you to a question on the new Operations Research Stack Exchange site, and in particular to an answer containing information provided by Dr. Ed Klotz of IBM. My take is that this remains one of those "try it and see" kinds of questions.

Saturday, June 1, 2019

Naming CPLEX Objects

A CPLEX user recently asked the following question on a user forum: "Is there a way to print the constraints as interpreted by CPLEX immediately after adding these constraints using addEq, addLe etc." The context for a question like this is often an attempt to debug either a model or the code creating the model. Other users in the past have indicated difficulty in parsing models after exporting them to a text (LP format) file, because they cannot associate the variable names and constraint names assigned by CPLEX to the variables and constraints in their mathematical models. For example, here is part of a simple set covering model created in Java using CPLEX 12.9 and exported to an LP file:

Minimize
 obj: x1 + x2 + x3 + x4 + x5
Subject To
 c1: x1 + x4 >= 1
 c2: x1 + x3 + x4 >= 1
 c3: x2 + x5 >= 1

Assume that the decisions relate to possible locations for depots. The text is perfectly legible, but is "x4" the decision to put a depot in "San Jose" or the decision to put a depot in "Detroit"? Is constraint "c1" the requirement to have a depot near central Ohio or the requirement to have a depot near the metro New York area?

Assigning meaningful names to variables and constraints is easy. In the Java and C++ APIs (and, I assume, the others), the functions that create variables and constraints have optional string arguments for names. Let's say that I create my variables and my constraints inside loops, both indexed by a variable i. I just need to change
mip.boolVar()
and
mip.addGe(sum, 1)
(where sum is an expression calculated inside the loop) to
mip.boolVar(vnames[i]);
and
mip.addGe(sum, 1, cnames[i]);
where vnames[] and cnames[] are string arrays containing the names I want to use for variables and constraints, respectively. The previous model fragment now looks like the following:
Minimize
 obj: Pittsburgh + San_Jose + Newark + Detroit + Fresno
Subject To
 Central_Ohio: Pittsburgh + Detroit >= 1
 Metro_NY:     Pittsburgh + Newark + Detroit >= 1
 SF_Oakland:   San_Jose + Fresno >= 1
This version is much more useful when either explaining the model (to someone else) or looking for problems with it. (Just don't look for any geographic logic in the example.)
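Assembled in context, the pattern looks roughly like the following sketch, where n, m, coverSets (the indices of the depots covering each region), vnames and cnames are hypothetical application data.

IloNumVar[] x = new IloNumVar[n];
for (int i = 0; i < n; i++) {
  x[i] = mip.boolVar(vnames[i]);   // named binary variable
}
for (int i = 0; i < m; i++) {
  IloLinearNumExpr sum = mip.linearNumExpr();
  for (int j : coverSets[i]) {
    sum.addTerm(1.0, x[j]);
  }
  mip.addGe(sum, 1, cnames[i]);    // named cover constraint
}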

Note the use of underscores, rather than spaces, in the names. CPLEX has some rules about what is syntactically a legitimate name in an LP file, and if you violate those rules, CPLEX will futz with your names and add index numbers to them. So "San Jose" might become "San_Jose#2", and "SF/Oakland" would turn into something at least as silly.

That's part of the battle. The question I cited asks how to print out constraints as they arise. The key there is that the various constraint constructors (IloCplex.ge(), IloCplex.addGe(), IloCplex.le(), IloCplex.addLe(), IloCplex.eq(), IloCplex.addEq(), ...) return a pointer to the constraint they construct. If you pass that pointer to a print statement, you print the constraint. Extending my example, I will tweak the constraint construction a bit, to
IloRange newConstraint = mip.addGe(sum, 1, cnames[i]);
System.out.println(newConstraint);
which creates the cover constraint, adds it to the model and then prints it. That results in output lines like this:
IloRange Central_Ohio : 1.0 <= (1.0*Detroit + 1.0*Pittsburgh) <= infinity

One last observation: If you want to print the entire model out, you do not need to save it to an LP file. Just pass the model (IloCplex object) to a print statement. If I execute
System.out.println(mip);
after the model is complete, I get this:
IloModel  {
IloMinimize  : (1.0*San_Jose + 1.0*Detroit + 1.0*Pittsburgh + 1.0*Newark + 1.0*Fresno)
IloRange Central_Ohio : 1.0 <= (1.0*San_Jose + 1.0*Pittsburgh) <= infinity
IloRange Metro_NY : 1.0 <= (1.0*Pittsburgh + 1.0*Fresno) <= infinity
IloRange SF_Oakland : 1.0 <= (1.0*Detroit + 1.0*Fresno) <= infinity

}
It is not entirely complete (you don't see the declarations of the variables as binary), but it is arguably enough for most model debugging.

Wednesday, May 29, 2019

How to Crash CPLEX

A question elsewhere on the blog reminded me that some users of the CPLEX programming APIs are not conscious of a "technicality" that, when violated, might cause CPLEX to crash (or at least throw an exception).

The bottom line can be stated easily enough: modifying a CPLEX model while solving it is a Bozo no-no. In the Java API, that means making a change to an instance of IloCplex while that instance is solving. In the C++ API, where the model (IloModel) and solver (IloCplex) are separate, I suspect it means making a change to either one ... but I'm not sure.

But wait, you say, callbacks do just that! No, not exactly. Callbacks add constraints (either user cuts or lazy constraints) to cut pools maintained by the solver, but they are not added to the model itself. Callbacks have their own add-whatever methods, for this specific purpose. Let's say that (in Java) I declare

IloCplex cplex = new IloCplex();

and then build a model, attach a callback or two and call cplex.solve(). While the solver is working, I can add user cuts or lazy constraints using the IloCplex.UserCutCallback.add() or IloCplex.LazyConstraintCallback.add() methods (or their local alternatives). What I cannot do, even in a callback, is use cplex.add() to add a user cut or lazy constraint (or anything else). If I do, kaboom!
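Here is a skeletal illustration (Java, legacy API) of the distinction. The violation test is a hypothetical placeholder; the point is the contrast between the two add calls in main().

import ilog.concert.*;
import ilog.cplex.*;

class MyLazyCallback extends IloCplex.LazyConstraintCallback {
  private final IloCplex cplex;   // used only to BUILD expressions
  private final IloNumVar[] x;

  MyLazyCallback(IloCplex cplex, IloNumVar[] x) {
    this.cplex = cplex;
    this.x = x;
  }

  @Override
  protected void main() throws IloException {
    double[] vals = getValues(x);  // query the candidate solution
    if (cutIsViolated(vals)) {     // hypothetical violation test
      // Build (but do not add) the constraint.
      IloRange cut = cplex.le(cplex.sum(x), x.length - 1);
      add(cut);          // correct: goes to the solver's pool
      // cplex.add(cut); // WRONG: modifies the model mid-solve ... kaboom!
    }
  }

  private boolean cutIsViolated(double[] v) {
    return false;  // placeholder for real logic
  }
}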

What if, during a solve, I discover some new constraints that I would like to add to the model? One option is to abort the solver, add them, and start over. Another is to store them in program memory, wait for the solver to terminate (optimal solution, time limit, whatever), and then add them to the model. I just cannot add them while the solver is running (unless I want to crash the program).

Note that, while one model is solving, I am free to modify some other model that is not solving. For instance, suppose I decompose a problem into a master problem and several subproblems (one per time period, say). Assuming that subproblems are solved sequentially, not in parallel, if I stumble on a constraint relevant to subproblem #2 while I am solving subproblem #1, I am free to add it to subproblem #2 ... just not (yet) to subproblem #1.

Monday, May 13, 2019

Randomness: Friend or Foe?

I spent a chunk of the weekend debugging some code (which involved solving an optimization problem). There was an R script to set up input files and a Java program to process them. The Java program included both a random heuristic to get things going and an integer program solved by CPLEX.

Randomness in algorithms is both a blessing and a curse. Without it, genetic algorithms would reduce to identical clones of one individual engaging in a staring contest, simulation models would produce absurdly tight confidence intervals (thanks to the output having zero variance) and so on. With it, though, testing and debugging software gets tricky, since you cannot rely on the same inputs, parameter settings, hardware etc. to produce the same output, even when the program is functioning correctly. If I rerun a problem and get different results, or different performance (e.g., same answer but considerably faster or slower), is that an indication of a bug or just luck of the draw?

Somewhere, back in the Dark Ages, I was told that the answer to all this was to set the seed of the pseudorandom number stream explicitly. This was after the introduction of pseudorandom number generators, of course. Before that, random numbers were generated based on things like fluctuations in the voltage of the power supplied to the mainframe, or readings from a thermocouple stuck ... well, never mind. Today hardware random number generators are used mainly in cryptography or gambling, where you explicitly do not want reproducibility. With pseudorandom numbers, using the same seed will produce the same sequences of "random" values, which should hypothetically make reproducibility possible.
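In Java terms, pinning down the seeds amounts to something like the following sketch (cplex being an IloCplex instance; RandomSeed is CPLEX's seed parameter, anticipating the next paragraph).

import java.util.Random;

long seed = 20190513L;          // arbitrary but fixed
Random rng = new Random(seed);  // heuristic's pseudorandom stream
cplex.setParam(IloCplex.Param.RandomSeed, (int) seed);  // CPLEX's stream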

I think I originally came across that in the context of simulation, but it also applies in optimization, and not just in obviously random heuristics and metaheuristics. CPLEX has a built-in pseudorandom number generator which is used for some things related to branching decisions when solving integer programs (and possibly other places too). So explicitly setting a seed for both the random number generator used by my heuristic and CPLEX should make my code reproducible, right?

Wrong. There are two more wildcards here. One is that my Java code uses various types of collections (HashMaps, HashSets, ...) and also uses Streams. Both of those can behave in nondeterministic ways if you are not very careful. (Whether a Stream is deterministic may boil down to whether the source is. I'm not sure about that.) The other, and I'm pretty sure this bites me in the butt more often than any other aspect, is the use of parallel threads. My code runs multiple copies of the heuristic in parallel (using different seeds), and CPLEX uses multiple threads. The problem with parallel threads is that timing is nondeterministic, even if the sequence of operations in each thread is. The operating system will interrupt threads willy-nilly to use those cores for other tasks, such as updating the contents of the display, doing garbage collection in Java, pinging the mail server or asking Skynet whether it's time to terminate the user. If there is any communication between the processes running in parallel threads in your code, the sequencing of the messages will be inherently random. Also, if you give a process a time limit and base it on "wall clock" time (which I'm prone to doing), how far the process gets before terminating will depend on how often and for how long it was interrupted.

Limiting everything to a single thread tends not to be practical, at least for the problems I find myself tackling. So I'm (somewhat) reconciled to the notion that "reproducibility" in what I do has to be interpreted somewhat loosely.

Tuesday, April 16, 2019

CPLEX Callbacks: ThreadLocal v. Clone

A while back, I wrote a post about the new (at the time) "generic" callbacks in CPLEX, including a brief discussion of adventures with multiple threads. A key element was that, with generic callbacks, IBM was making the user more responsible for thread safety. In that previous post, I explored a few options for doing so, including use of Java's ThreadLocal class. ThreadLocal is my current first choice when writing a generic callback.

Recently, I saw a reply by IBM's Daniel Junglas on a CPLEX user forum that contained the following information.
For each thread CPLEX creates a clone of your callback object. This is done by invoking the callback's clone() method. The Java default implementation of clone() returns a shallow copy of the class.
That set me to wondering about the differences between ThreadLocal and cloning, so I did a bit of experimentation. I'll summarize what I think I learned below.

Why is any of this necessary?


It's not if you only allow CPLEX to use a single thread. When using callbacks with multiple threads, it is possible that two or more threads will access the callback simultaneously. Simultaneously reading/fetching the value of something is harmless, but trying to modify a value simultaneously will either trigger an exception or cause the JVM to crash. (That's not hypothetical, by the way. I have a test program that routinely crashes the JVM.) "Information callbacks" are harmless, but "control callbacks" (where you attempt to alter what CPLEX is doing, by adding cuts or lazy constraints or by taking control of branching) usually involve modifying something. In particular, with Benders decomposition your callback needs to solve a subproblem after first adjusting it based on the proposed master problem solution. Two threads trying to adjust the subproblem at the same time spells disaster.

Is the use of ThreadLocal an option for both legacy and generic callbacks?


Yes. The only tricky part is in initialization of the storage. Let's say that I'm doing Benders decomposition, and I have a subproblem that is an instance of IloCplex. I am going to need to create a separate version of the subproblem for each thread. So my callback class will contain a class field declared something like
private final ThreadLocal<IloCplex> subproblem = new ThreadLocal<>();
and it will need to fill in a value of the subproblem for each thread.

With a generic callback, the ThreadUp context provided by CPLEX can be used to do this. Assuming that context is the argument to the callback function you write, you can use code like the following to initialize the subproblem for each thread.
if (context.getId() == IloCplex.Callback.Context.Id.ThreadUp) {
  IloCplex s = ...; // code to generate a new subproblem
  subproblem.set(s);
}
Once the subproblem is initialized, to use it when a candidate solution is being proposed, you need to extract it from the ThreadLocal field. Here is an example of how that would look.
if (context.getId() == IloCplex.Callback.Context.Id.Candidate) {
  IloCplex s = subproblem.get();
  // Do Benders stuff with s.
}
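Putting the two fragments together (and adding ThreadDown cleanup, so each per-thread copy releases its native memory), the callback's invoke() method might look like this sketch, where makeSubproblem() is a hypothetical factory method.

@Override
public void invoke(IloCplex.Callback.Context context) throws IloException {
  long id = context.getId();
  if (id == IloCplex.Callback.Context.Id.ThreadUp) {
    subproblem.set(makeSubproblem());  // hypothetical factory
  } else if (id == IloCplex.Callback.Context.Id.ThreadDown) {
    subproblem.get().end();            // release native memory
    subproblem.remove();
  } else if (id == IloCplex.Callback.Context.Id.Candidate) {
    IloCplex s = subproblem.get();
    // Do Benders stuff with s.
  }
}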

Legacy callbacks do not have a mechanism like ThreadUp for detecting the creation of a new thread. You can still initialize the subproblem "lazily". In the legacy callback, before using the subproblem check to see if it exists. If not, create it. Here's some sample code.
IloCplex s = subproblem.get();  // get the subproblem
if (s == null) {
  // First use: need to generate a fresh subproblem.
  s = ...; // code to generate a new subproblem
  subproblem.set(s);
}
// Do Benders stuff with s.

Is cloning an option for both legacy and generic callbacks?


No. I don't think cloning can be used with generic callbacks. In Java, objects can be cloned. CPLEX declares legacy callbacks as Java objects, but it declares generic callbacks by means of an interface. Cloning an interface is not really a "thing" in Java.

When you use a generic callback, you create a class that implements the Callback interface. It's certainly possible to make that class cloneable, but according to my experiments CPLEX will not call the clone() method on that class, even if it exists. So the solver will be working with a single instance of that class, and if the class is not thread-safe, kaboom.

What does cloning entail?


There are quite a few web sites that explain cloning in Java, and if you intend to use it I recommend you do a search. I'll just cover the bare bones here.
  1. Declare your class with the modifier "implements Cloneable".
  2. Override the protected method Object.clone.
  3. Call the parent method (super.clone()) as the very first executable line of the clone() method.
  4. Handle the CloneNotSupportedException that the parent method might hypothetically throw, either by catching it and doing something, or by rethrowing it.
  5. After calling the parent method, fix anything that needs fixing.
I left that last step rather vague. With a Benders lazy constraint callback, you will need to replace the original subproblem with a freshly generated version (so that no two threads are chewing on the same subproblem instance). If life were fair (it's not), you would just do a deep clone of the original problem. The catch is that the IloCplex class is not cloneable. So the best (only?) alternative I can see is to build a new copy of the subproblem, the same way you built the original copy.

If you have other class fields that the callback might modify (such as a vector that stores the best feasible solution encountered), you may need to do a deep clone of them (or replace them with new versions). A detailed analysis of the differences between shallow and deep cloning is beyond the scope of this post (or my competence). As Daniel points out in his answer, super.clone() only makes a shallow copy. You'll need to take additional steps to make sure that fields containing arrays and other objects (other than primitives) are handled properly if the callback might modify them.

Here is some skeleton code.

public class MyCallback extends IloCplex.LazyConstraintCallback
                        implements Cloneable {
  private IloCplex subproblem;

  // ...

  /**
   * Clone this callback.
   * @return the clone
   * @throws CloneNotSupportedException if the parent clone method bombs
   */
  @Override
  public MyCallback clone() throws CloneNotSupportedException {
    // Call Object.clone first.
    MyCallback cb = (MyCallback) super.clone();
    // Replace the clone's subproblem with a fresh copy.
    cb.subproblem = ...;  // generate a fresh copy of the subproblem
    // Make deep copies (or new values) for other fields as needed.
    return cb;
  }

}

Tuesday, March 12, 2019

Pseudocode in LyX Revisited

This post is strictly for users of LyX.

In a previous post I offered up a LyX module for typesetting pseudocode. Unfortunately, development of that module bumped up against some fundamental incompatibilities between the algorithmicx package and the way LyX layouts work. The repository for it remains open, but I decided to shift my efforts to a different approach.

LyX has built-in support for "program listings", insets that use either the listings (default choice) or minted LaTeX package. The listings package supports a lengthy list of programming languages, but not pseudocode. This makes sense because pseudocode has no fixed syntax; everyone writes it their own way. So I cobbled together a LyX module that adds my take on a pseudocode language to the listings package, allowing users to type pseudocode into a LyX program listing inset.

The module is available (under a GPL3 license) from a GitHub repository. The repository includes a user guide (both a LyX file and a PDF version) explaining both installation and the nuances of using the module. (Yes, there are some finicky bits.) The user guide also spells out what words are treated as keywords and formatted accordingly. I also built in a (somewhat clumsy) way to insert vertical lines to help the user recognize the span of a block of code.

The listings package provides (and LyX supports) a number of nice formatting features, including options for color-coding keywords, numbering lines and such. In addition, you can modify the list of recognized keywords by editing the module file in a text editor. If you are familiar with the listings package, you can customize the pseudocode language definition even further (for instance, by changing the delimiters for comments), again by hacking the module file. If you use the module, feel free to suggest changes to the language definition via the issue tracker for the repository.

Here is a small sample of the output, to give you an idea of what I am talking about. (I had to convert the PDF output to an image, which explains the slight fuzziness.)

While we're on the subject of LyX, I also have a layout file for the standalone LaTeX class, which is useful for generating PDF output of just one table, image or whatever without the wasted space of page margins. That layout file has its own GitHub repository, if you are interested.

Sunday, February 24, 2019

Guessing Pareto Solutions: A Test

In yesterday's post, I described a simple multiple-criterion decision problem (binary decisions, no constraints), and suggested a possible way to identify a portion of the Pareto frontier using what amounts to guesswork: randomly generate weights; use them to combine the multiple criteria into a single objective function; optimize that (trivial); repeat ad nauseam.

I ran an experiment, just to see whether the idea was a complete dud or not. Before getting into the results, I should manage expectations a bit. Specifically, while it's theoretically possible in some cases that you might find all the Pareto efficient solutions (and in fact guaranteed if there is only one), in general you should not bank on it. Here's a trivial case demonstrating that it may be impossible to find them all using my proposed technique. Suppose we have three yes-no decisions ($N=3$, using the notation from yesterday's post) and two criteria ($M=2$), one to be maximized and one to be minimized. Suppose further that, after changing the sign of the second criterion (so that we are maximizing both) and scaling, all three decisions produce identical criterion values of $(1,-1)$. With $N=3$, there are $2^3=8$ candidate solutions. One (do nothing) produces the result $(0,0)$. Three (do just one thing) produce $(1,-1)$, three (do any two things) produce $(2,-2)$ and one (do all three things) produces $(3,-3)$. All of those are Pareto efficient! However, if we form a weighted combination using weights $(\alpha_1, \alpha_2) \gg 0$ for the criteria, then every decision has the same net impact $\beta = \alpha_1 - \alpha_2$. If $\beta > 0$, the only solution that optimizes the composite function is $x=(1,1,1)$ with value $3\beta$. If $\beta<0$, the only solution that optimizes the composite function is $x=(0,0,0)$ with value $0$. The other six solutions, while Pareto efficient, cannot be found my way.

With that said, I ran a single test using $N=10$ and $M=5$, with randomly generated criterion values. One test like this is not conclusive for all sorts of reasons (including but not limited to the fact that I made the five criteria statistically independent of each other). You can see (and try to reproduce) my work in an R notebook hosted on my web site. The code is embedded in the notebook, and you can extract it easily. Fair warning: it's pretty slow. In particular, I enumerated all the Pareto efficient solutions among the $2^{10} = 1,024$ possible solutions, which took about five minutes on my PC. I then tried one million random weight combinations, which took about four and a half minutes.

Correction: After I rejiggered the code, generating a million random trials took only 5.8 seconds. The bulk of that 4.5 minutes was apparently spent keeping track of how often each solution appeared among the one million results ... and even that dropped to a second or so after I tightened up the code.

To summarize the results, there were 623 Pareto efficient solutions out of the population of 1,024. The random guessing strategy only found 126 of them. It found the heck out of some of them: the maximum number of times the same solution was identified was 203,690! (I guess that one stuck out like a sore thumb.) Finding 126 out of 623 may not sound too good, but bear in mind the idea is to present a decision maker with a reasonable selection of Pareto efficient solutions. I'm not sure how many decision makers would want to see even 126 choices.

A key question is whether the solutions found by the random heuristic are representative of the Pareto frontier. Presented below are scatter plots of four pairs of criteria, showing all the Pareto efficient solutions color-coded according to whether or not they were identified by the heuristic. (You can see higher resolution versions by clicking on the link above to the notebook, which will open in your browser.) In all cases, the upper right corner would be ideal. Are the identified points a representative sample of all the Pareto efficient points? I'll let you judge.





I'll offer one final thought. The weight vectors are drawn uniformly over a unit hypercube of dimension $M$, and the frequency with which each identified Pareto solution occurs within the sample should be proportional to the volume of the region within that hypercube containing weights favoring that solution. So high frequency solutions have those high frequencies because wider ranges of weights favor them. Perhaps that makes them solutions that are more likely to appeal to decision makers than their low frequency (or undiscovered) counterparts?

Saturday, February 23, 2019

Guessing Pareto Solutions

One of the challenges of multiple-criteria decision-making (MCDM) is that, in the absence of a definitive weighting or prioritization of criteria, you cannot talk meaningfully about a "best" solution. (Optimization geeks such as myself tend to find that a major turn-off.) Instead, it is common to focus on Pareto efficient solutions. We can say that one solution A dominates another solution B if A does at least as well as B on all criteria and better than B on at least one criterion. A solution is Pareto efficient if no solution dominates it.

Recently I was chatting with a woman about a multiple-criteria model she had. The variables were all binary (yes-no decisions) and were otherwise unconstrained. Normally my hackles would go up at an unconstrained decision problem, but in this case all constraints were "soft". For example, rather than specify a fixed budget, she would make cost one of the criteria (less being better) and let the decision maker screen out Pareto efficient solutions with unrealistic budgets.

I won't name the researcher or go into more details, since her work is currently under review (as of this writing), but I will look at a similar problem that is only slightly simpler. Namely, I will add the requirement that the criteria are additive, meaning that the value of any criterion for any "yes" does not depend on which other decisions wound up "yes" rather than "no". So, in mathematical terms, we have $N$ binary decisions ($x\in\{0,1\}^N$) and $M$ criteria. Let $c_{ij}$ denote the value of criterion $j$ when $x_i=1$. In what follows, I will want to be consistent in maximizing (rather than minimizing) criterion values, so I'll assume that (a) a "no" decision has value 0 for every criterion and (b) criteria are normalized so that $c_{ij}\in [0,1]$ for all $i$ and for all $j$ such that more is better, while $c_{ij}\in [-1,0]$ for all $i$ and for all $j$ such that (before scaling) less is better. For example, if cost is a criterion (and if you do not work for the Pentagon), the scaled values of cost will run from -1 for the most expensive choice encountered to 0 for any choice that is free.

The solution space contains $2^N$ candidate solutions, which means that enumerating them is out of the question for nontrivial values of $N$ (and, trust me, hers are nontrivial ... very, very nontrivial). In any case, the size of the Pareto frontier when $N$ is nontrivial will quickly outpace any decision maker's capacity (or patience). So it seems reasonable to sample the Pareto frontier and present the decision maker with a reasonably (but not excessively) large set of Pareto efficient solutions that hopefully is representative of the overall Pareto frontier.

There are methods for doing this, and the researcher pointed me toward one: the NSGA-II genetic algorithm [1]. Like any genetic algorithm, NSGA-II runs until you stop it via some limit (run time, number of generations, some measure of stagnation, ...). As I understand it, the final population of solutions are not guaranteed to be Pareto efficient, but the population converges in some sense to the Pareto frontier. That would likely be sufficient in practice, and the researcher had good luck with it initially, but ran into scaling problems as $N$ grew.

Looking at the problem reminded me of a sometimes troubling dichotomy faced by academics (in at least some disciplines, including operations research). On the one hand, real-world decision makers with real-world problems are often quite happy to get solutions that are obtained quickly and cheaply and are easily explained or understood, even if the technology is "primitive". Academics, on the other hand, usually need to publish their research. In the domain of the problem, low-tech algorithms may be perfectly fine (and easier to understand than more esoteric algorithms), but for a professor in IE or OR looking to publish in a journal in his or her field, "simple" usually means "too unsophisticated to publish". Thus we tend at times to "over-engineer" solutions.

Anyway, taking off my academic hat for a moment, I got to wondering whether a no-brainer random search approach would work for a problem such as this one. My premise was the following. Suppose I randomly select weights $\alpha_1,\dots,\alpha_M \in (0,1]$ and use them to form a single weighted criterion function $f_\alpha:\{0,1\}^N \rightarrow \Re$ as follows:$$f_\alpha (x)=\sum_{i=1}^N w_i(\alpha) x_i$$where$$w_i(\alpha)=\sum_{j=1}^M \alpha_j c_{ij}.$$Note that the weights $\alpha_j$ need to be strictly positive, not just nonnegative. Maximizing $f_\alpha$ over the solution space is trivial: set $x_i=1$ if $w_i(\alpha) > 0$ and $x_i=0$ if $w_i(\alpha) < 0$. (You can set $x_i$ to anything you like if $w_i(\alpha) = 0$, but that should have probability 0 of occurring given the random generation of $\alpha$.) Constructing the coefficients of $f_\alpha$ is $O(MN)$ and optimizing $f_\alpha$ is $O(N)$, so each solution $x$ produced this way takes $O(MN)$ time. In practice, $M$ is likely not to get too big, and very likely does not grow as $N$ does, so functionally this is really $O(N)$ in complexity.
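Here is a minimal Java sketch of one replication, assuming the criterion values have already been sign-adjusted and scaled as above and stored in a matrix c[i][j] (decision i, criterion j).

import java.util.Random;

// One replication: draw weights alpha, then trivially maximize f_alpha.
static boolean[] sampleSolution(double[][] c, Random rng) {
  int n = c.length, m = c[0].length;
  double[] alpha = new double[m];
  for (int j = 0; j < m; j++) {
    alpha[j] = 1.0 - rng.nextDouble();  // uniform on (0, 1]
  }
  boolean[] x = new boolean[n];
  for (int i = 0; i < n; i++) {
    double w = 0.0;  // composite weight w_i(alpha)
    for (int j = 0; j < m; j++) {
      w += alpha[j] * c[i][j];
    }
    x[i] = w > 0;  // optimal choice: yes iff w_i(alpha) > 0
  }
  return x;
}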

It's trivial to show that, so long as none of the composite weights $w_i(\alpha)$ is 0, each $x$ found by optimizing some $f_\alpha$ must be Pareto efficient. So we can generate a sample of Pareto efficient solutions fairly efficiently ... maybe. I did deal one card off the bottom of the deck. Solutions will tend to repeat, so we will need to do some bookkeeping to ensure that, in the final results, duplicates are weeded out. That means comparing each new solution to its predecessors. Each comparison is $O(N)$, and if we assume that we only retain a "reasonable" number of alternatives (a number that does not grow with $N$) in order to avoid causing the decision maker's brain to explode, then time spent weeding duplicates should be $O(N)$. Overall, the load is $O(NR)$, where $R$ is the number of replications performed.
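The bookkeeping can be a hash set keyed on the solution vector, as in the following sketch (reusing the hypothetical sampleSolution() from above; BitSet gives cheap hashing and equality tests for 0-1 vectors). With hashing, the expected cost of each duplicate check stays $O(N)$, consistent with the estimate above.

import java.util.BitSet;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

Set<BitSet> found = new HashSet<>();  // distinct solutions so far
Random rng = new Random(20190223L);
for (int r = 0; r < R; r++) {         // R = number of replications
  boolean[] x = sampleSolution(c, rng);
  BitSet key = new BitSet(x.length);
  for (int i = 0; i < x.length; i++) {
    key.set(i, x[i]);
  }
  found.add(key);  // set semantics weed out repeats
}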

That left me with a few questions, the main one being this: for plausible sample sizes, how well does the set of solutions obtained represent the Pareto frontier? Is it "biased" in the sense that solutions cluster on one part of the frontier while missing other parts?

I'll show some experimental results in my next post, and let you decide.


[1] K. Deb, A. Pratap, S. Agarwal & T. Meyarivan (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6, 182-197.

Thursday, February 14, 2019

Binary / Integer Conversion in R

I've been poking around in R with an unconstrained optimization problem involving binary decision variables. (Trust me, it's not as trivial as it sounds.) I wanted to explore the entire solution space. Assuming $n$ binary decisions, this means looking at the set $\{0,1\}^n$, which contains $2^n$ 0-1 vectors of dimension $n$.

For reasons I won't get into, I wanted to associate each vector with an ID number from 0 to $2^n-1$, which sounds pretty straightforward ... until you try to do it. I thought I'd post my code here, with no warranty ("express or implied", as the legal eagles would say) that it's computationally efficient.

Before I explain the code, let me just point out that using an integer from 0 to $2^n-1$ implies that $n$ is small enough that $2^n-1$ fits in a machine integer. So if you decide to try the code, don't go nuts with it.

Here's the code:

# Number of binary decisions (n = 5 matches the examples below).
n <- 5
# Create a vector of powers of 2 (for use in conversions from binary
# vectors to integers).
powers.of.two <- 2^(0:(n - 1))
# Convert an ID (integer) to a binary vector of appropriate length. Note
# that the vector is reversed so that the lowest order bit (corresponding
# to the first decision) comes first.
fromID <- function(id) { as.integer(head(intToBits(id), n)) }
# Convert a binary vector of appropriate length to an ID value (integer).
toID <- function(vec) { as.integer(vec %*% powers.of.two) }

To facilitate converting binary vectors to integers, I store all the powers of 2 once. The fromID() function takes an integer ID number and converts it to a binary vector of length $n$. It uses the intToBits() function from the R base package, which does the heavy lifting but whose output needs a little massaging. The toID() function takes a binary vector and converts it to an integer ID number. So, with $n=5$, the output of fromID(22) is
[1] 0 1 1 0 1
while the output of toID(c(0, 1, 1, 0, 1)) is
[1] 22
(as you would expect).

There is one other thing I should point out: because I am using the ID numbers to index binary vectors starting from (0, 0, ..., 0) and working up to (1, 1, ..., 1), I designed the functions to put the least significant bit first. If you want the most significant bit first and the least significant bit last, you need to make two changes to the code: change the definition of powers.of.two to 2^((n - 1):0); and, in the definition of fromID, change head(intToBits(id), n) to rev(head(intToBits(id), n)).

Friday, January 25, 2019

Threads and Memory

Yesterday I had a rather rude reminder (actually, two) of something I've known for a while. I was running a Java program that uses CPLEX to solve an integer programming model. The symptoms were as follows: shortly after the IP solver run started, I ran out of RAM, the operating system started paging memory to the hard drive, and the resulting hard drive thrashing made the system extremely sluggish (to put it charitably). Long after I regained enough control to kill the program, there was a significant amount of disk activity (and concomitant noise) as the system gradually came back to its senses.

How did this happen? My system has four cores, which means CPLEX defaults to running four parallel threads when solving models. What's not always obvious is that each thread gets a separate copy of the model. (I'm not sure if it is the entire model after presolving or just most of it, but it's definitely a large chunk.) In my particular case, the model begins with an unpredictably large number of constraints, and when that number got big, the model got big -- not big enough to be a problem if I used a single thread, but too big to get away with four copies of it ... or, as it turned out, three copies. (The second thrashing event was when I tried to run it with three threads.)

Parallel threading is great, but there are two caveats associated with it. First, performance is a sublinear function of the number of threads, meaning that doubling the number of threads will not cut run time in half, tripling the number of threads will not cut run time to a third the single-thread time, and so on. Second, if you are dealing with a largish model, you might want to try running a little while with a single thread to see how much memory it eats, and then decide how many copies your system can handle comfortably. That's an upper bound on how many threads you should use.
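In the Java API, that experiment is a one-line parameter setting; a sketch:

// Gauge memory usage with a single thread first, then raise the cap.
cplex.setParam(IloCplex.Param.Threads, 1);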