Sunday, June 16, 2019

R v. Python

A couple of days ago, I was having a conversation with someone that touched on the curriculum for a masters program in analytics. One thing that struck me was the requirement of one semester each of R and Python programming. On the one hand, I can see a couple of reasons for requiring both: some jobs will require the worker to deal with both kinds of code; and even if you're headed to a job where one of them suffices, you don't know which one it will be while you're in the program. On the other hand, I tend to think that most people can get by nicely with just one or the other (in my case R), if not neither. Also, once you've learned an interpreted language (plus the general concepts of programming), I suspect you can learn another one on your own if you need it. (This will not, however, get you hired if proficiency in the one you don't know is an up-front requirement.)

I don't really intend to argue the merits of requiring one v. both here, nor the issue of whether a one-semester deep understanding of two languages is as good as a two-semester deep understanding of one language. Purely by coincidence, that conversation was followed by my discovering R vs. Python for Data Science, a point-by-point comparison of the two by Norm Matloff (a computer science professor at UC Davis). If you are interested in data science and trying to decide which one to learn (or which to learn first), I think Matloff's comparison provides some very useful information. With the exception of "Language unity", I'm inclined to agree with his ratings.

Matloff calls language unity a "horrible loss" for R, emphasizing a dichotomy between conventional/traditional R and the so-called "Tidyverse" (which is actually a set of packages). At the same time, he calls the transition from Python 2.7 to 3.x "nothing too elaborate". Personally, I use Tidyverse packages when it suits me and not when it doesn't. There's a bit of work involved in learning each new Tidyverse package, but that's no different from learning any other package. He mentions tibbles and pipes. Since a tibble is a subclass of a data frame, I can (and often do) ignore the differences and just treat them as data frames. As far as pipes go, they're not exactly a Tidyverse creation. The Tidyverse packages load the magrittr package to get pipes, but I think that package predates the Tidyverse, and I use it with "ordinary" R code. Matloff also says that "... the Tidyverse should be considered advanced R, not for beginners." If I were teaching an intro to R course, I think I would introduce the Tidyverse stuff early, because it imposes some consistency on the outputs of R functions that is sorely lacking in base R. (If you've done much programming in R, you've probably had more than a few "why the hell did it do that??" moments, such as getting a vector output when you expected a scalar or vice versa.) Meanwhile, I've seen issues with software that bundled Python scripts (and maybe libraries), for instance because users who came to Python recently and have only ever installed 3.x discover that the bundled scripts require 2.x (or vice versa).

Anyway, that one section aside (where I think Matloff and I can agree to disagree), the comparison is quite cogent, and makes for good reading.

Friday, June 14, 2019

A Java Container for Parameters

A few days ago, I posted about a Swing class (and supporting stuff) that I developed to facilitate my own computational research, and which I have now made open-source in a Bitbucket repository. I finally got around to cleaning up another Java utility class I wrote, and which I use regularly in experiments. I call it ParameterBlock. It is designed to be a container for various parameters that I need to set during experiments.

It might be easiest if I start with a couple of screen shots. The first one shows a Swing application I was using in a recent project. Specifically, it shows the "Settings" menu, which has multiple entries corresponding to different computational stages (the overall solver, an initial heuristic, two phases of post-heuristic number crunching), along with options to save and load settings.

Parameter settings menu




Computational research can involve a ton of choices for various parameters. CPLEX alone has what seems to be an uncountable number of them. In the Dark Ages, I hard-coded parameters, which meant searching the source code (and recompiling) every time I wanted to change one. Later I graduated to putting them on the command line, but that gets old quickly if there are more than just a handful. When I started writing simple Swing platforms for my work (like the one shown above), I added menu options to call up dialogs that would let me see the current settings and change them. Over time, this led me to my current solution.

I put each collection of parameters in a separate subclass of the (abstract) ParameterBlock class. So clicking on "Solver" would access one subclass, clicking on "Heuristic" would access a different subclass, and so on. A parameter block can contain parameters of any types. The next shot shows a dialog for the solver parameters in my application. Two of the parameters are boolean (and have check boxes), two are Java enumerations (and have radio button groups), and three are numeric (and have text fields). String parameters are also fine (handled by text fields).

Defining a parameter block is easy (in my opinion). It pretty much boils down to deciding how many parameters you are going to have, assigning a symbolic name to each (so that in other parts of the code you can refer to "DOMINANCE" and not need to remember if it is parameter 1, parameter 2 or whatever), giving each one a label (for dialogs), a type (such as boolean.class or double.class), a default value and a tool tip. The ParameterBlock class contains a static method for generating a dialog like the one below, and one or two other useful methods.

Solver parameters
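Just to make that concrete, here is a purely illustrative sketch of what such a subclass might look like. The define() method and its signature are hypothetical (the actual API is documented in the repository's README); the point is the shape: one symbolic index, label, type, default and tool tip per parameter.

public class SolverParameters extends ParameterBlock {
  // Symbolic names, so other code never has to remember whether
  // "dominance" is parameter 0, 1 or whatever.
  public static final int DOMINANCE = 0;
  public static final int TIME_LIMIT = 1;

  public SolverParameters() {
    //     index       label             type           default  tool tip
    define(DOMINANCE,  "Use dominance?", boolean.class, true,
           "Apply dominance tests during the search");
    define(TIME_LIMIT, "Time limit (s)", double.class,  300.0,
           "Run time limit in seconds");
  }
}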

You can find more details in the README file at the Bitbucket repository I set up for this. The repository contains a small example for demonstration purposes, but to use it you just need to copy ParameterBlock.java into your application. As usual, I'm releasing it under a Creative Commons license. Hopefully someone besides me will find it useful.

Tuesday, June 11, 2019

A Swing Platform for Computational Experiments

Most of my research involves coding algorithms and running computational experiments with them. It also involves lots of trial-and-error, both with the algorithms themselves and with assorted parameters that govern their functioning. Back in the Dark Ages, I did all this with programs that ran at a command prompt (or, in Linux terms, in a terminal) and wrote any output to the terminal. Eventually I got hip and started writing simple GUI applications for those experiments. A GUI application lets me load different problems without having to stop and restart the program, lets me change parameter settings visually (without having to screw around with lots of command line options ... is the switch for using antisymmetry constraints -A or -a?), and lets me save output to text files when it's helpful.

Since I code (mostly) in Java, the natural way to do this is with a Swing application. (When I use R, I typically use an R notebook.) Since there are certain features I always want, I found myself copying code from old projects and then cutting out the project-specific stuff and adding new stuff, which is a bit inefficient. So I finally got around to creating a minimal template version of the application, which I'm calling "XFrame" (short for "Experimental JFrame" or "JFrame for Experiments" or something).

I just uploaded it to Bitbucket, where it is open-source under a very nonrestrictive Creative Commons license. Feel free to download it if you want to try it. There's an issue tracker where you can report any bugs or sorely missing features (but keep in mind I'm taking a fairly minimalist approach here).

Using it is pretty simple. The main package (xframe) contains two classes and an interface. You can just plop them into your application somewhere. One class (CustomOutputStream) you will probably not want to change. The actual GUI is the XFrame class. You will want to add menu items (the only one it comes with is File > Exit) and other logic. Feel free to change the class name as well if you wish. Finally, your program needs to implement the interface defined in XFrameController so that XFrame knows how to talk to the program.

The layout contains a window title at the top and a status message area at the bottom, both of which can be fetched and changed by code. The central area is a scrollable text area where the program can write output. It has buttons to save the content and to clear it, and the program will not exit with unsaved text unless you explicitly bless the operation.

There is a function that lets the program open a nonmodal, scrollable dialog (which can be left open while the main window is in use, and whose content can be saved to a file). Another function allows the program to pop up modal dialogs (typically warnings or error messages). Yet another function provides a place where you can insert logic to tell the GUI when to enable/disable menu choices (and maybe other things). Finally, there is a built-in extension of SwingWorker that lets you launch a computational operation in a separate thread (where it will not slow down or freeze the GUI).
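The threading piece follows the standard SwingWorker pattern (this is generic Java, not XFrame's actual class; runExperiment() and displayResult() are hypothetical stand-ins):

import javax.swing.SwingWorker;

// The computation runs off the event dispatch thread; done() runs
// back on the EDT, where it is safe to update the GUI.
SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
  @Override
  protected String doInBackground() throws Exception {
    return runExperiment();  // hypothetical long-running computation
  }

  @Override
  protected void done() {
    try {
      displayResult(get());  // hypothetical GUI update
    } catch (Exception e) {
      e.printStackTrace();   // real code should report this properly
    }
  }
};
worker.execute();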

I included a small "Hello world!" application to show it works. I'll end with a couple of screen shots, one of the main window and the other of the nonmodal dialog (both from the demo). If it looks like something you might want to use, please head over to Bitbucket and grab the source.

Main window
Nonmodal dialog


Monday, June 10, 2019

Indicator Constraints v. Big M

Way, way back I did a couple of posts related to how to model "logical indicators" (true/false values that control enforcement of constraints). The topic ties in to the general issue of "big M" model formulations. Somewhere around version 10, CPLEX introduced what they call indicator constraints, which are essentially implications of the form "if constraint 1 holds, then constraint 2 holds". As an example, if $a$ is a vector parameter, $b$ a scalar parameter, $x$ a binary variable and $y$ a vector of real variables, you might use an indicator constraint to express $x=1 \implies a'y\le b$.

That same constraint, in a "big M" formulation, would look like $a'y \le b + M(1-x)$. Using an indicator constraint saves you having to make a careful choice of the value of $M$ and avoids certain numerical complications. So should you always use indicator constraints? It's not that simple. Although "big M" formulations are notorious for having weak relaxations, indicator constraints can also result in weak (possibly weaker) relaxations.
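To make the contrast concrete, here is a sketch using the Java API. Assume $a$ (a double array), $b$ (a double), $x$ (a binary variable) and $y$ (an array of numeric variables) already exist; the value of $M$ below is an arbitrary stand-in, not a recommendation.

// Indicator version: x = 1  ==>  a'y <= b. No M required.
cplex.add(cplex.ifThen(cplex.eq(x, 1),
                       cplex.le(cplex.scalProd(a, y), b)));

// Big-M version: a'y <= b + M(1 - x), rewritten as a'y + M*x <= b + M.
double M = 1.0e4;  // requires a careful, problem-specific choice
cplex.addLe(cplex.sum(cplex.scalProd(a, y), cplex.prod(M, x)), b + M);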

For more details, I'll refer you to a question on the new Operations Research Stack Exchange site, and in particular to an answer containing information provided by Dr. Ed Klotz of IBM. My take is that this remains one of those "try it and see" kinds of questions.

Saturday, June 1, 2019

Naming CPLEX Objects

A CPLEX user recently asked the following question on a user forum: "Is there a way to print the constraints as interpreted by CPLEX immediately after adding these constraints using addEq, addLe etc." The context for a question like this is often an attempt to debug either a model or the code creating the model. Other users in the past have indicated difficulty in parsing models after exporting them to a text (LP format) file, because they cannot associate the variable names and constraint names assigned by CPLEX to the variables and constraints in their mathematical models. For example, here is part of a simple set covering model created in Java using CPLEX 12.9 and exported to an LP file:

Minimize
 obj: x1 + x2 + x3 + x4 + x5
Subject To
 c1: x1 + x4 >= 1
 c2: x1 + x3 + x4 >= 1
 c3: x2 + x5 >= 1

Assume that the decisions relate to possible locations for depots. The text is perfectly legible, but is "x4" the decision to put a depot in "San Jose" or the decision to put a depot in "Detroit"? Is constraint "c1" the requirement to have a depot near central Ohio or the requirement to have a depot near the metro New York area?

Assigning meaningful names to variables and constraints is easy. In the Java and C++ APIs (and, I assume, the others), the functions that create variables and constraints have optional string arguments for names. Let's say that I create my variables and my constraints inside loops, both indexed by a variable i. I just need to change
mip.boolVar()
and
mip.addGe(sum, 1)
(where sum is an expression calculated inside the loop) to
mip.boolVar(vnames[i]);
and
mip.addGe(sum, 1, cnames[i]);
where vnames[] and cnames[] are string arrays containing the names I want to use for variables and constraints, respectively. The previous model fragment now looks like the following:
Minimize
 obj: Pittsburgh + San_Jose + Newark + Detroit + Fresno
Subject To
 Central_Ohio: Pittsburgh + Detroit >= 1
 Metro_NY:     Pittsburgh + Newark + Detroit >= 1
 SF_Oakland:   San_Jose + Fresno >= 1
This version is much more useful when either explaining the model (to someone else) or looking for problems with it. (Just don't look for any geographic logic in the example.)
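Assembled in context, the pattern looks roughly like the following sketch, where n, m, coverSets (the indices of the depots covering each region), vnames and cnames are hypothetical application data.

IloNumVar[] x = new IloNumVar[n];
for (int i = 0; i < n; i++) {
  x[i] = mip.boolVar(vnames[i]);   // named binary variable
}
for (int i = 0; i < m; i++) {
  IloLinearNumExpr sum = mip.linearNumExpr();
  for (int j : coverSets[i]) {
    sum.addTerm(1.0, x[j]);
  }
  mip.addGe(sum, 1, cnames[i]);    // named cover constraint
}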

Note the use of underscores, rather than spaces, in the names. CPLEX has some rules about what is syntactically a legitimate name in an LP file, and if you violate those rules, CPLEX will futz with your names and add index numbers to them. So "San Jose" might become "San_Jose#2", and "SF/Oakland" would turn into something at least as silly.

That's part of the battle. The question I cited asks how to print out constraints as they arise. The key there is that the various constraint constructors (IloCplex.ge(), IloCplex.addGe(), IloCplex.le(), IloCplex.addLe(), IloCplex.eq(), IloCplex.addEq(), ...) return a pointer to the constraint they construct. If you pass that pointer to a print statement, you print the constraint. Extending my example, I will tweak the constraint construction a bit, to
IloRange newConstraint = mip.addGe(sum, 1, cnames[i]);
System.out.println(newConstraint);
which creates the cover constraint, adds it to the model and then prints it. That results in output lines like this:
IloRange Central_Ohio : 1.0 <= (1.0*Detroit + 1.0*Pittsburgh) <= infinity

One last observation: If you want to print the entire model out, you do not need to save it to an LP file. Just pass the model (IloCplex object) to a print statement. If I execute
System.out.println(mip);
after the model is complete, I get this:
IloModel  {
IloMinimize  : (1.0*San_Jose + 1.0*Detroit + 1.0*Pittsburgh + 1.0*Newark + 1.0*Fresno)
IloRange Central_Ohio : 1.0 <= (1.0*San_Jose + 1.0*Pittsburgh) <= infinity
IloRange Metro_NY : 1.0 <= (1.0*Pittsburgh + 1.0*Fresno) <= infinity
IloRange SF_Oakland : 1.0 <= (1.0*Detroit + 1.0*Fresno) <= infinity

}
It is not entirely complete (you don't see the declarations of the variables as binary), but it is arguably enough for most model debugging.

Wednesday, May 29, 2019

How to Crash CPLEX

A question elsewhere on the blog reminded me that some users of the CPLEX programming APIs are not conscious of a "technicality" that, when violated, might cause CPLEX to crash (or at least throw an exception).

The bottom line can be stated easily enough: modifying a CPLEX model while solving it is a Bozo no-no. In the Java API, that means making a change to an instance of IloCplex while that instance is solving. In the C++ API, where the model (IloModel) and solver (IloCplex) are separate, I suspect it means making a change to either one ... but I'm not sure.

But wait, you say, callbacks do just that! No, not exactly. Callbacks add constraints (either user cuts or lazy constraints) to cut pools maintained by the solver, but they are not added to the model itself. Callbacks have their own add-whatever methods, for this specific purpose. Let's say that (in Java) I declare

IloCplex cplex = new IloCplex();

and then build a model, attach a callback or two and call cplex.solve(). While the solver is working, I can add user cuts or lazy constraints using the IloCplex.UserCutCallback.add() or IloCplex.LazyConstraintCallback.add() methods (or their local alternatives). What I cannot do, even in a callback, is use cplex.add() to add a user cut or lazy constraint (or anything else). If I do, kaboom!
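Here is a skeletal illustration (Java, legacy API) of the distinction. The violation test is a hypothetical placeholder; the point is the contrast between the two add calls in main().

import ilog.concert.*;
import ilog.cplex.*;

class MyLazyCallback extends IloCplex.LazyConstraintCallback {
  private final IloCplex cplex;   // used only to BUILD expressions
  private final IloNumVar[] x;

  MyLazyCallback(IloCplex cplex, IloNumVar[] x) {
    this.cplex = cplex;
    this.x = x;
  }

  @Override
  protected void main() throws IloException {
    double[] vals = getValues(x);  // query the candidate solution
    if (cutIsViolated(vals)) {     // hypothetical violation test
      // Build (but do not add) the constraint.
      IloRange cut = cplex.le(cplex.sum(x), x.length - 1);
      add(cut);          // correct: goes to the solver's pool
      // cplex.add(cut); // WRONG: modifies the model mid-solve ... kaboom!
    }
  }

  private boolean cutIsViolated(double[] v) {
    return false;  // placeholder for real logic
  }
}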

What if, during a solve, I discover some new constraints that I would like to add to the model? One option is to abort the solver, add them, and start over. Another is to store them in program memory, wait for the solver to terminate (optimal solution, time limit, whatever), and then add them to the model. I just cannot add them while the solver is running (unless I want to crash the program).

Note that, while one model is solving, I am free to modify some other model that is not solving. For instance, suppose I decompose a problem into a master problem and several subproblems (one per time period, say). Assuming that subproblems are solved sequentially, not in parallel, if I stumble on a constraint relevant to subproblem #2 while I am solving subproblem #1, I am free to add it to subproblem #2 ... just not (yet) to subproblem #1.

Monday, May 13, 2019

Randomness: Friend or Foe?

I spent a chunk of the weekend debugging some code (which involved solving an optimization problem). There was an R script to set up input files and a Java program to process them. The Java program included both a random heuristic to get things going and an integer program solved by CPLEX.

Randomness in algorithms is both a blessing and a curse. Without it, genetic algorithms would reduce to identical clones of one individual engaging in a staring contest, simulation models would produce absurdly tight confidence intervals (thanks to the output having zero variance) and so on. With it, though, testing and debugging software gets tricky, since you cannot rely on the same inputs, parameter settings, hardware etc. to produce the same output, even when the program is functioning correctly. If I rerun a problem and get different results, or different performance (e.g., same answer but considerably faster or slower), is that an indication of a bug or just luck of the draw?

Somewhere, back in the Dark Ages, I was told that the answer to all this was to set the seed of the pseudorandom number stream explicitly. This was after the introduction of pseudorandom number generators, of course. Before that, random numbers were generated based on things like fluctuations in the voltage of the power supplied to the mainframe, or readings from a thermocouple stuck ... well, never mind. Today hardware random number generators are used mainly in cryptography or gambling, where you explicitly do not want reproducibility. With pseudorandom numbers, using the same seed will produce the same sequences of "random" values, which should hypothetically make reproducibility possible.
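In Java terms, pinning down the seeds amounts to something like the following sketch (cplex being an IloCplex instance; RandomSeed is CPLEX's seed parameter, anticipating the next paragraph).

import java.util.Random;

long seed = 20190513L;          // arbitrary but fixed
Random rng = new Random(seed);  // heuristic's pseudorandom stream
cplex.setParam(IloCplex.Param.RandomSeed, (int) seed);  // CPLEX's stream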

I think I originally came across that in the context of simulation, but it also applies in optimization, and not just in obviously random heuristics and metaheuristics. CPLEX has a built-in pseudorandom number generator which is used for some things related to branching decisions when solving integer programs (and possibly other places too). So explicitly setting a seed for both the random number generator used by my heuristic and CPLEX should make my code reproducible, right?

Wrong. There are two more wildcards here. One is that my Java code uses various types of collections (HashMaps, HashSets, ...) and also uses Streams. Both of those can behave in nondeterministic ways if you are not very careful. (Whether a Stream is deterministic may boil down to whether the source is. I'm not sure about that.) The other, and I'm pretty sure this bites me in the butt more often than any other aspect, is the use of parallel threads. My code runs multiple copies of the heuristic in parallel (using different seeds), and CPLEX uses multiple threads. The problem with parallel threads is that timing is nondeterministic, even if the sequence of operations in each thread is. The operating system will interrupt threads willy-nilly to use those cores for other tasks, such as updating the contents of the display, doing garbage collection in Java, pinging the mail server or asking Skynet whether it's time to terminate the user. If there is any communication between the processes running in parallel threads in your code, the sequencing of the messages will be inherently random. Also, if you give a process a time limit and base it on "wall clock" time (which I'm prone to doing), how far the process gets before terminating will depend on how often and for how long it was interrupted.

Limiting everything to a single thread tends not to be practical, at least for the problems I find myself tackling. So I'm (somewhat) reconciled to the notion that "reproducibility" in what I do has to be interpreted somewhat loosely.

Tuesday, April 16, 2019

CPLEX Callbacks: ThreadLocal v. Clone

A while back, I wrote a post about the new (at the time) "generic" callbacks in CPLEX, including a brief discussion of adventures with multiple threads. A key element was that, with generic callbacks, IBM was making the user more responsible for thread safety. In that previous post, I explored a few options for doing so, including use of Java's ThreadLocal class. ThreadLocal is my current first choice when writing a generic callback.

Recently, I saw a reply by IBM's Daniel Junglas on a CPLEX user forum that contained the following information.
For each thread CPLEX creates a clone of your callback object. This is done by invoking the callback's clone() method. The Java default implementation of clone() returns a shallow copy of the class.
That set me to wondering about the differences between ThreadLocal and cloning, so I did a bit of experimentation. I'll summarize what I think I learned below.

Why is any of this necessary?


It's not if you only allow CPLEX to use a single thread. When using callbacks with multiple threads, it is possible that two or more threads will access the callback simultaneously. Simultaneously reading/fetching the value of something is harmless, but trying to modify a value simultaneously will either trigger an exception or cause the JVM to crash. (That's not hypothetical, by the way. I have a test program that routinely crashes the JVM.) "Information callbacks" are harmless, but "control callbacks" (where you attempt to alter what CPLEX is doing, by adding cuts or lazy constraints or by taking control of branching) usually involve modifying something. In particular, with Benders decomposition your callback needs to solve a subproblem after first adjusting it based on the proposed master problem solution. Two threads trying to adjust the subproblem at the same time spells disaster.

Is the use of ThreadLocal an option for both legacy and generic callbacks?


Yes. The only tricky part is in initialization of the storage. Let's say that I'm doing Benders decomposition, and I have a subproblem that is an instance of IloCplex. I am going to need to create a separate version of the subproblem for each thread. So my callback class will contain a class field declared something like
private final ThreadLocal<IloCplex> subproblem = new ThreadLocal<>();
and it will need to fill in a value of the subproblem for each thread.

With a generic callback, the ThreadUp context provided by CPLEX can be used to do this. Assuming that context is the argument to the callback function you write, you can use code like the following to initialize the subproblem for each thread.
if (context.getId() == IloCplex.Callback.Context.Id.ThreadUp) {
  IloCplex s = ...; // code to generate a new subproblem
  subproblem.set(s);
}
Once the subproblem is initialized, to use it when a candidate solution is being proposed, you need to extract it from the ThreadLocal field. Here is an example of how that would look.
if (context.getId() == IloCplex.Callback.Context.Id.Candidate) {
  IloCplex s = subproblem.get();
  // Do Benders stuff with s.
}
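Putting the two fragments together (and adding ThreadDown cleanup, so each per-thread copy releases its native memory), the callback's invoke() method might look like this sketch, where makeSubproblem() is a hypothetical factory method.

@Override
public void invoke(IloCplex.Callback.Context context) throws IloException {
  long id = context.getId();
  if (id == IloCplex.Callback.Context.Id.ThreadUp) {
    subproblem.set(makeSubproblem());  // hypothetical factory
  } else if (id == IloCplex.Callback.Context.Id.ThreadDown) {
    subproblem.get().end();            // release native memory
    subproblem.remove();
  } else if (id == IloCplex.Callback.Context.Id.Candidate) {
    IloCplex s = subproblem.get();
    // Do Benders stuff with s.
  }
}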

Legacy callbacks do not have a mechanism like ThreadUp for detecting the creation of a new thread. You can still initialize the subproblem "lazily". In the legacy callback, before using the subproblem check to see if it exists. If not, create it. Here's some sample code.
IloCplex s = subproblem.get();  // get the subproblem
if (s == null) {
  // First use: need to generate a fresh subproblem.
  s = ...; // code to generate a new subproblem
  subproblem.set(s);
}
// Do Benders stuff with s.

Is cloning an option for both legacy and generic callbacks?


No. I don't think cloning can be used with generic callbacks. In Java, objects can be cloned. CPLEX declares legacy callbacks as Java objects, but it declares generic callbacks by means of an interface. Cloning an interface is not really a "thing" in Java.

When you use a generic callback, you create a class that implements the Callback interface. It's certainly possible to make that class cloneable, but according to my experiments CPLEX will not call the clone() method on that class, even if it exists. So the solver will be working with a single instance of that class, and if the class is not thread-safe, kaboom.

What does cloning entail?


There are quite a few web sites that explain cloning in Java, and if you intend to use it I recommend you do a search. I'll just cover the bare bones here.
  1. Declare your class with the modifier "implements Cloneable".
  2. Override the protected method Object.clone.
  3. Call the parent method (super.clone()) as the very first executable line of the clone() method.
  4. Handle the CloneNotSupportedException that the parent method might hypothetically throw, either by catching it and doing something, or by rethrowing it.
  5. After calling the parent method, fix anything that needs fixing.
I left that last step rather vague. With a Benders lazy constraint callback, you will need to replace the original subproblem with a freshly generated version (so that no two threads are chewing on the same subproblem instance). If life were fair (it's not), you would just do a deep clone of the original problem. The catch is that the IloCplex class is not cloneable. So the best (only?) alternative I can see is to build a new copy of the subproblem, the same way you built the original copy.

If you have other class fields that the callback might modify (such as a vector that stores the best feasible solution encountered), you may need to do a deep clone of them (or replace them with new versions). A detailed analysis of the differences between shallow and deep cloning is beyond the scope of this post (or my competence). As Daniel points out in his answer, super.clone() only makes a shallow copy. You'll need to take additional steps to make sure that fields containing arrays and other objects (other than primitives) are handled properly if the callback might modify them.

Here is some skeleton code.

public class MyCallback extends IloCplex.LazyConstraintCallback
                        implements Cloneable {
  private IloCplex subproblem;

  // ...

  /**
   * Clone this callback.
   * @return the clone
   * @throws CloneNotSupportedException if the parent clone method bombs
   */
  @Override
  public MyCallback clone() throws CloneNotSupportedException {
    // Call Object.clone first.
    MyCallback cb = (MyCallback) super.clone();
    // Replace the clone's subproblem with a fresh copy.
    cb.subproblem = ...;  // generate a fresh copy of the subproblem
    // Make deep copies (or new values) for other fields as needed.
    return cb;
  }

}

Tuesday, March 12, 2019

Pseudocode in LyX Revisited

This post is strictly for users of LyX.

In a previous post I offered up a LyX module for typesetting pseudocode. Unfortunately, development of that module bumped up against some fundamental incompatibilities between the algorithmicx package and the way LyX layouts work. The repository for it remains open, but I decided to shift my efforts to a different approach.

LyX has built-in support for "program listings", insets that use either the listings (default choice) or minted LaTeX package. The listings package supports a lengthy list of programming languages, but not pseudocode. This makes sense because pseudocode has no fixed syntax; everyone writes it their own way. So I cobbled together a LyX module that adds my take on a pseudocode language to the listings package, allowing users to type pseudocode into a LyX program listing inset.

The module is available (under a GPL3 license) from a GitHub repository. The repository includes a user guide (both a LyX file and a PDF version) explaining both installation and the nuances of using the module. (Yes, there are some finicky bits.) The user guide also spells out what words are treated as keywords and formatted accordingly. I also built in a (somewhat clumsy) way to insert vertical lines to help the user recognize the span of a block of code.

The listings package provides (and LyX supports) a number of nice formatting features, including options for color-coding keywords, numbering lines and such. In addition, you can modify the list of recognized keywords by editing the module file in a text editor. If you are familiar with the listings package, you can customize the pseudocode language definition even further (for instance, by changing the delimiters for comments), again by hacking the module file. If you use the module, feel free to suggest changes to the language definition via the issue tracker for the repository.

Here is a small sample of the output, to give you an idea of what I am talking about. (I had to convert the PDF output to an image, which explains the slight fuzziness.)

While we're on the subject of LyX, I also have a layout file for the standalone LaTeX class, which is useful for generating PDF output of just one table, image or whatever without the wasted space of page margins. That layout file has its own GitHub repository, if you are interested.

Sunday, February 24, 2019

Guessing Pareto Solutions: A Test

In yesterday's post, I described a simple multiple-criterion decision problem (binary decisions, no constraints), and suggested a possible way to identify a portion of the Pareto frontier using what amounts to guesswork: randomly generate weights; use them to combine the multiple criteria into a single objective function; optimize that (trivial); repeat ad nauseam.

I ran an experiment, just to see whether the idea was a complete dud or not. Before getting into the results, I should manage expectations a bit. Specifically, while it's theoretically possible in some cases that you might find all the Pareto efficient solutions (and in fact guaranteed if there is only one), in general you should not bank on it. Here's a trivial case demonstrating that it may be impossible to find them all using my proposed technique. Suppose we have three yes-no decisions ($N=3$, using the notation from yesterday's post) and two criteria ($M=2$), one to be maximized and one to be minimized. Suppose further that, after changing the sign of the second criterion (so that we are maximizing both) and scaling, all three decisions produce identical criterion values of $(1,-1)$. With $N=3$, there are $2^3=8$ candidate solutions. One (do nothing) produces the result $(0,0)$. Three (do just one thing) produce $(1,-1)$, three (do any two things) produce $(2,-2)$ and one (do all three things) produces $(3,-3)$. All of those are Pareto efficient! However, if we form a weighted combination using weights $(\alpha_1, \alpha_2) \gg 0$ for the criteria, then every decision has the same net impact $\beta = \alpha_1 - \alpha_2$. If $\beta > 0$, the only solution that optimizes the composite function is $x=(1,1,1)$ with value $3\beta$. If $\beta<0$, the only solution that optimizes the composite function is $x=(0,0,0)$ with value $0$. The other six solutions, while Pareto efficient, cannot be found my way.

With that said, I ran a single test using $N=10$ and $M=5$, with randomly generated criterion values. One test like this is not conclusive for all sorts of reasons (including but not limited to the fact that I made the five criteria statistically independent of each other). You can see (and try to reproduce) my work in an R notebook hosted on my web site. The code is embedded in the notebook, and you can extract it easily. Fair warning: it's pretty slow. In particular, I enumerated all the Pareto efficient solutions among the $2^{10} = 1,024$ possible solutions, which took about five minutes on my PC. I then tried one million random weight combinations, which took about four and a half minutes.

Correction: After I rejiggered the code, generating a million random trials took only 5.8 seconds. The bulk of that 4.5 minutes was apparently spent keeping track of how often each solution appeared among the one million results ... and even that dropped to a second or so after I tightened up the code.

To summarize the results, there were 623 Pareto efficient solutions out of the population of 1,024. The random guessing strategy only found 126 of them. It found the heck out of some of them: the maximum number of times the same solution was identified was 203,690! (I guess that one stuck out like a sore thumb.) Finding 126 out of 623 may not sound too good, but bear in mind the idea is to present a decision maker with a reasonable selection of Pareto efficient solutions. I'm not sure how many decision makers would want to see even 126 choices.

A key question is whether the solutions found by the random heuristic are representative of the Pareto frontier. Presented below are scatter plots of four pairs of criteria, showing all the Pareto efficient solutions color-coded according to whether or not they were identified by the heuristic. (You can see higher resolution versions by clicking on the link above to the notebook, which will open in your browser.) In all cases, the upper right corner would be ideal. Are the identified points a representative sample of all the Pareto efficient points? I'll let you judge.





I'll offer one final thought. The weight vectors are drawn uniformly over a unit hypercube of dimension $M$, and the frequency with which each identified Pareto solution occurs within the sample should be proportional to the volume of the region within that hypercube containing weights favoring that solution. So high frequency solutions have those high frequencies because wider ranges of weights favor them. Perhaps that makes them solutions that are more likely to appeal to decision makers than their low frequency (or undiscovered) counterparts?

Saturday, February 23, 2019

Guessing Pareto Solutions

One of the challenges of multiple-criteria decision-making (MCDM) is that, in the absence of a definitive weighting or prioritization of criteria, you cannot talk meaningfully about a "best" solution. (Optimization geeks such as myself tend to find that a major turn-off.) Instead, it is common to focus on Pareto efficient solutions. We can say that one solution A dominates another solution B if A does at least as well as B on all criteria and better than B on at least one criterion. A solution is Pareto efficient if no solution dominates it.

Recently I was chatting with a woman about a multiple-criteria model she had. The variables were all binary (yes-no decisions) and were otherwise unconstrained. Normally my hackles would go up at an unconstrained decision problem, but in this case all constraints were "soft". For example, rather than specify a fixed budget, she would make cost one of the criteria (less being better) and let the decision maker screen out Pareto efficient solutions with unrealistic budgets.

I won't name the researcher or go into more details, since her work is currently under review (as of this writing), but I will look at a similar problem that is only slightly simpler. Namely, I will add the requirement that the criteria are additive, meaning that the value of any criterion for any "yes" does not depend on which other decisions wound up "yes" rather than "no". So, in mathematical terms, we have $N$ binary decisions ($x\in\{0,1\}^N$) and $M$ criteria. Let $c_{ij}$ denote the value of criterion $j$ when $x_i=1$. In what follows, I will want to be consistent in maximizing (rather than minimizing) criterion values, so I'll assume that (a) a "no" decision has value 0 for every criterion and (b) criteria are normalized so that $c_{ij}\in [0,1]$ for all $i$ and for all $j$ such that more is better, while $c_{ij}\in [-1,0]$ for all $i$ and for all $j$ such that (before scaling) less is better. For example, if cost is a criterion (and if you do not work for the Pentagon), the scaled values of cost will run from -1 for the most expensive choice encountered to 0 for any choice that is free.

The solution space contains $2^N$ candidate solutions, which means that enumerating them is out of the question for nontrivial values of $N$ (and, trust me, hers are nontrivial ... very, very nontrivial). In any case, the size of the Pareto frontier when $N$ is nontrivial will quickly outpace any decision maker's capacity (or patience). So it seems reasonable to sample the Pareto frontier and present the decision maker with a reasonably (but not excessively) large set of Pareto efficient solutions that hopefully is representative of the overall Pareto frontier.

There are methods for doing this, and the researcher pointed me toward one: the NSGA-II genetic algorithm [1]. Like any genetic algorithm, NSGA-II runs until you stop it via some limit (run time, number of generations, some measure of stagnation, ...). As I understand it, the final population of solutions are not guaranteed to be Pareto efficient, but the population converges in some sense to the Pareto frontier. That would likely be sufficient in practice, and the researcher had good luck with it initially, but ran into scaling problems as $N$ grew.

Looking at the problem reminded me of a sometimes troubling dichotomy faced by academics (in at least some disciplines, including operations research). On the one hand, real-world decision makers with real-world problems are often quite happy to get solutions that are obtained quickly and cheaply and are easily explained or understood, even if the technology is "primitive". Academics, on the other hand, usually need to publish their research. In the domain of the problem, low-tech algorithms may be perfectly fine (and easier to understand than more esoteric algorithms), but for a professor in IE or OR looking to publish in a journal in his or her field, "simple" usually means "too unsophisticated to publish". Thus we tend at times to "over-engineer" solutions.

Anyway, taking off my academic hat for a moment, I got to wondering whether a no-brainer random search approach would work for a problem such as this one. My premise was the following. Suppose I randomly select weights $\alpha_1,\dots,\alpha_M \in (0,1]$ and use them to form a single weighted criterion function $f_\alpha:\{0,1\}^N \rightarrow \Re$ as follows:$$f_\alpha (x)=\sum_{i=1}^N w_i(\alpha) x_i$$where$$w_i(\alpha)=\sum_{j=1}^M \alpha_j c_{ij}.$$Note that the weights $\alpha_j$ need to be strictly positive, not just nonnegative. Maximizing $f_\alpha$ over the solution space is trivial: set $x_i=1$ if $w_i(\alpha) > 0$ and $x_i=0$ if $w_i(\alpha) < 0$. (You can set $x_i$ to anything you like if $w_i(\alpha) = 0$, but that should have probability 0 of occurring given the random generation of $\alpha$.) Constructing the coefficients of $f_\alpha$ is $O(MN)$ and optimizing $f_\alpha$ is $O(N)$, so each solution $x$ produced this way takes $O(MN)$ time. In practice, $M$ is likely not to get too big, and very likely does not grow as $N$ does, so functionally this is really $O(N)$ in complexity.
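Here is a minimal Java sketch of one replication, assuming the criterion values have already been sign-adjusted and scaled as above and stored in a matrix c[i][j] (decision i, criterion j).

import java.util.Random;

// One replication: draw weights alpha, then trivially maximize f_alpha.
static boolean[] sampleSolution(double[][] c, Random rng) {
  int n = c.length, m = c[0].length;
  double[] alpha = new double[m];
  for (int j = 0; j < m; j++) {
    alpha[j] = 1.0 - rng.nextDouble();  // uniform on (0, 1]
  }
  boolean[] x = new boolean[n];
  for (int i = 0; i < n; i++) {
    double w = 0.0;  // composite weight w_i(alpha)
    for (int j = 0; j < m; j++) {
      w += alpha[j] * c[i][j];
    }
    x[i] = w > 0;  // optimal choice: yes iff w_i(alpha) > 0
  }
  return x;
}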

It's trivial to show that, so long as none of the composite weights $w_i(\alpha)$ is 0, each $x$ found by optimizing some $f_\alpha$ must be Pareto efficient. So we can generate a sample of Pareto efficient solutions fairly efficiently ... maybe. I did deal one card off the bottom of the deck. Solutions will tend to repeat, so we will need to do some bookkeeping to ensure that, in the final results, duplicates are weeded out. That means comparing each new solution to its predecessors. Each comparison is $O(N)$, and if we assume that we only retain a "reasonable" number of alternatives (a number that does not grow with $N$) in order to avoid causing the decision maker's brain to explode, then time spent weeding duplicates should be $O(N)$. Overall, the load is $O(NR)$, where $R$ is the number of replications performed.
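The bookkeeping can be a hash set keyed on the solution vector, as in the following sketch (reusing the hypothetical sampleSolution() from above; BitSet gives cheap hashing and equality tests for 0-1 vectors). With hashing, the expected cost of each duplicate check stays $O(N)$, consistent with the estimate above.

import java.util.BitSet;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

Set<BitSet> found = new HashSet<>();  // distinct solutions so far
Random rng = new Random(20190223L);
for (int r = 0; r < R; r++) {         // R = number of replications
  boolean[] x = sampleSolution(c, rng);
  BitSet key = new BitSet(x.length);
  for (int i = 0; i < x.length; i++) {
    key.set(i, x[i]);
  }
  found.add(key);  // set semantics weed out repeats
}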

That left me with a few questions, the main one being this: for plausible sample sizes, how well does the set of solutions obtained represent the Pareto frontier? Is it "biased" in the sense that solutions cluster on one part of the frontier while missing other parts?

I'll show some experimental results in my next post, and let you decide.


[1] K. Deb, A. Pratap, S. Agarwal & T. Meyarivan (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6, 182-197.

Thursday, February 14, 2019

Binary / Integer Conversion in R

I've been poking around in R with an unconstrained optimization problem involving binary decision variables. (Trust me, it's not as trivial as it sounds.) I wanted to explore the entire solution space. Assuming $n$ binary decisions, this means looking at the set $\{0,1\}^n$, which contains $2^n$ 0-1 vectors of dimension $n$.

For reasons I won't get into, I wanted to associate each vector with an ID number from 0 to $2^n-1$, which sounds pretty straightforward ... until you try to do it. I thought I'd post my code here, with no warranty ("express or implied", as the legal eagles would say) that it's computationally efficient.

Before I explain the code, let me just point out that using an integer from 0 to $2^n-1$ implies that $n$ is small enough that $2^n-1$ fits in a machine integer. So if you decide to try the code, don't go nuts with it.

Here's the code:

# Number of binary decisions (n = 5 matches the examples below).
n <- 5
# Create a vector of powers of 2 (for use in conversions from binary
# vectors to integers).
powers.of.two <- 2^(0:(n - 1))
# Convert an ID (integer) to a binary vector of appropriate length. Note
# that the vector is reversed so that the lowest order bit (corresponding
# to the first decision) comes first.
fromID <- function(id) { as.integer(head(intToBits(id), n)) }
# Convert a binary vector of appropriate length to an ID value (integer).
toID <- function(vec) { as.integer(vec %*% powers.of.two) }

To facilitate converting binary vectors to integers, I store all the powers of 2 once. The fromID() function takes an integer ID number and converts it to a binary vector of length $n$. It uses the intToBits() function from the R base package, which does the heavy lifting but whose output needs a little massaging. The toID() function takes a binary vector and converts it to an integer ID number. So, with $n=5$, the output of fromID(22) is
[1] 0 1 1 0 1
while the output of toID(c(0, 1, 1, 0, 1)) is
[1] 22
(as you would expect).

There is one other thing I should point out: because I am using the ID numbers to index binary vectors starting from (0, 0, ..., 0) and working up to (1, 1, ..., 1), I designed the functions to put the least significant bit first. If you want the most significant bit first and the least significant bit last, you need to make two changes to the code: change the definition of powers.of.two to 2^((n - 1):0); and, in the definition of fromID, change head(intToBits(id), n) to rev(head(intToBits(id), n)).

Friday, January 25, 2019

Threads and Memory

Yesterday I had a rather rude reminder (actually, two) of something I've known for a while. I was running a Java program that uses CPLEX to solve an integer programming model. The symptoms were as follows: shortly after the IP solver run started, I ran out of RAM, the operating system started paging memory to the hard drive, and the resulting hard drive thrashing made the system extremely sluggish (to put it charitably). Long after I regained enough control to kill the program, there was a significant amount of disk activity (and concomitant noise) as the system gradually came back to its senses.

How did this happen? My system has four cores, which means CPLEX defaults to running four parallel threads when solving models. What's not always obvious is that each thread gets a separate copy of the model. (I'm not sure if it is the entire model after presolving or just most of it, but it's definitely a large chunk.) In my particular case, the model begins with an unpredictably large number of constraints, and when that number got big, the model got big -- not big enough to be a problem if I used a single thread, but too big to get away with four copies of it ... or, as it turned out, three copies. (The second thrashing event was when I tried to run it with three threads.)

Parallel threading is great, but there are two caveats associated with it. First, performance is a sublinear function of the number of threads, meaning that doubling the number of threads will not cut run time in half, tripling the number of threads will not cut run time to a third the single-thread time, and so on. Second, if you are dealing with a largish model, you might want to try running a little while with a single thread to see how much memory it eats, and then decide how many copies your system can handle comfortably. That's an upper bound on how many threads you should use.
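In the Java API, that experiment is a one-line parameter setting; a sketch:

// Gauge memory usage with a single thread first, then raise the cap.
cplex.setParam(IloCplex.Param.Threads, 1);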