tag:blogger.com,1999:blog-87813834610619295712019-06-16T16:44:22.003-04:00OR in an OB WorldA mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.comBlogger392125tag:blogger.com,1999:blog-8781383461061929571.post-66288881104416711242019-06-16T16:44:00.000-04:002019-06-16T16:44:21.857-04:00R v. PythonA couple of days ago, I was having a conversation with someone that touched on the curriculum for a master's program in analytics. One thing that struck me was the requirement of one semester each of R and Python programming. On the one hand, I can see a couple of reasons for requiring both: some jobs will require the worker to deal with both kinds of code; and even if you're headed to a job where one of them suffices, you don't know which one it will be while you're in the program. On the other hand, I tend to think that most people can get by nicely with just one or the other (in my case R), if not neither. Also, once you've learned an interpreted language (plus the general concepts of programming), I tend to think you can learn another one on your own if you need it. (This will not, however, get you hired if proficiency in the one you don't know is an up-front requirement.)<br /><br />I don't really intend to argue the merits of requiring one v. both here, nor the issue of whether a one-semester deep understanding of two languages is as good as a two-semester deep understanding of one language. Purely by coincidence, that conversation was followed by my discovering <a href="https://github.com/matloff/R-vs.-Python-for-Data-Science" target="_blank">R vs. Python for Data Science</a>, a point-by-point comparison of the two by Norm Matloff (a computer science professor at UC Davis). 
If you are interested in data science and trying to decide which one to learn (or which to learn first), I think Matloff's comparison provides some very useful information. With the exception of "Language unity", I'm inclined to agree with his ratings.<br /><br />Matloff calls language unity a "horrible loss" for R, emphasizing a dichotomy between conventional/traditional R and the so-called "Tidyverse" (which is actually a set of packages). At the same time, he calls the transition from Python 2.7 to 3.x "nothing too elaborate". Personally, I use Tidyverse packages when it suits me and not when it doesn't. There's a bit of work involved in learning each new Tidyverse package, but that's no different from learning any other package. He mentions tibbles and pipes. Since a tibble is a subclass of a data frame, I can (and often do) ignore the differences and just treat them as data frames. As far as pipes go, they're not exactly a Tidyverse creation. The Tidyverse packages load the magrittr package to get pipes, but I think that package predates the Tidyverse, and I use it with "ordinary" R code. Matloff also says that "... the Tidyverse should be considered advanced R, not for beginners." If I were teaching an intro to R course, I think I would introduce the Tidyverse stuff early, because it imposes some consistency on the outputs of R functions that is sorely lacking in base R. (If you've done much programming in R, you've probably had more than a few "why the hell did it do that??" moments, such as getting a vector output when you expected a scalar or vice versa.) Meanwhile, I've seen issues with software that bundled Python scripts (and maybe libraries), for instance because users who came to Python recently and have only ever installed 3.x discover that the bundled scripts require 2.x (or vice versa).<br /><br />Anyway, that one section aside (where I think Matloff and I can agree to disagree), the comparison is quite cogent, and makes for good reading.Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-2325295554644630442019-06-14T17:20:00.000-04:002019-06-14T17:20:28.733-04:00A Java Container for ParametersA few days ago, I posted about a <a href="https://orinanobworld.blogspot.com/2019/06/a-swing-platform-for-computational.html" target="_blank">Swing class</a> (and supporting stuff) that I developed to facilitate my own computational research, and which I have now made open-source in a <a href="https://bitbucket.org/prubin/xframe/src/master/" target="_blank">Bitbucket repository</a>. I finally got around to cleaning up another Java utility class I wrote, and which I use regularly in experiments. I call it <code>ParameterBlock</code>. It is designed to be a container for various parameters that I need to set during experiments.<br /><br />It might be easiest if I start with a couple of screen shots. The first one shows a Swing application I was using in a recent project. 
Specifically, it shows the "Settings" menu, which has multiple entries corresponding to different computational stages (the overall solver, an initial heuristic, two phases of post-heuristic number crunching), along with options to save and load settings.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-_Vtkim68MOg/XQQKFFgy5wI/AAAAAAAAC7c/7tBEldIx3T0HFfcreut3K_HpHDvCkCiJwCLcBGAs/s1600/param_menu.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="302" data-original-width="600" height="200" src="https://1.bp.blogspot.com/-_Vtkim68MOg/XQQKFFgy5wI/AAAAAAAAC7c/7tBEldIx3T0HFfcreut3K_HpHDvCkCiJwCLcBGAs/s400/param_menu.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Parameter settings menu</td></tr></tbody></table><br />Computational research can involve a ton of choices for various parameters. CPLEX alone has what seems to be an uncountable number of them. In the Dark Ages, I hard-coded parameters, which meant searching the source code (and recompiling) every time I wanted to change one. Later I graduated to putting them on the command line, but that gets old quickly if there are more than just a handful. When I started writing simple Swing platforms for my work (like the one shown above), I added menu options to call up dialogs that would let me see the current settings and change them. 
Over time, this led me to my current solution.<br /><br />I put each collection of parameters in a separate subclass of the (abstract) <code>ParameterBlock</code> class. So clicking on "Solver" would access one subclass, clicking on "Heuristic" would access a different subclass, and so on. A parameter block can contain parameters of any type. The next screen shot shows a dialog for the solver parameters in my application. Two of the parameters are boolean (and have check boxes), two are Java enumerations (and have radio button groups), and three are numeric (and have text fields). String parameters are also fine (handled by text fields).<br /><br />Defining a parameter block is easy (in my opinion). It pretty much boils down to deciding how many parameters you are going to have, assigning a symbolic name to each (so that in other parts of the code you can refer to "DOMINANCE" and not need to remember if it is parameter 1, parameter 2, or whatever), giving each one a label (for dialogs), a type (such as <code>boolean.class</code> or <code>double.class</code>), a default value and a tool tip. 
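<br /><br />To give the flavor of this, here is a stripped-down, hypothetical stand-in for a parameter block (the names, fields and methods below are illustrative only, not the actual API in the repository):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for one entry in a parameter block.
class Param {
    final String name;      // symbolic name, e.g. "DOMINANCE"
    final String label;     // label shown in the settings dialog
    final Class<?> type;    // e.g. boolean.class or double.class
    final String tooltip;   // tool tip text for the dialog
    Object value;           // current value (starts at the default)

    Param(String name, String label, Class<?> type,
          Object defaultValue, String tooltip) {
        this.name = name;
        this.label = label;
        this.type = type;
        this.value = defaultValue;
        this.tooltip = tooltip;
    }
}

// A "block" is then little more than a name-indexed map of parameters.
class DemoBlock {
    private final Map<String, Param> params = new LinkedHashMap<>();

    void define(Param p) { params.put(p.name, p); }

    Object get(String name) { return params.get(name).value; }

    void set(String name, Object v) { params.get(name).value = v; }
}

public class ParamSketch {
    public static void main(String[] args) {
        DemoBlock block = new DemoBlock();
        block.define(new Param("DOMINANCE", "Use dominance rules",
                boolean.class, true, "Enable dominance-based pruning"));
        block.define(new Param("TIME_LIMIT", "Time limit (sec.)",
                double.class, 60.0, "Heuristic time limit"));
        System.out.println(block.get("DOMINANCE"));  // prints true
        block.set("TIME_LIMIT", 120.0);
        System.out.println(block.get("TIME_LIMIT")); // prints 120.0
    }
}
```

The payoff is that code elsewhere asks for <code>block.get("DOMINANCE")</code> rather than remembering a positional index.<br /><br />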
The <code>ParameterBlock</code> class contains a static method for generating a dialog like the one below, and one or two other useful methods.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-94XnKwc06mo/XQQL_zMsn_I/AAAAAAAAC7s/bZw8O1KiDi8KHcERG_B5H5nyx3UuYyMVACLcBGAs/s1600/solver_dialog.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="430" data-original-width="656" height="261" src="https://1.bp.blogspot.com/-94XnKwc06mo/XQQL_zMsn_I/AAAAAAAAC7s/bZw8O1KiDi8KHcERG_B5H5nyx3UuYyMVACLcBGAs/s400/solver_dialog.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Solver parameters</td></tr></tbody></table><div class="separator" style="clear: both; text-align: center;"><br /></div>You can find more details in the README file at the <a href="https://bitbucket.org/prubin/parameterblocks/src/master/" target="_blank">Bitbucket repository</a> I set up for this. The repository contains a small example for demonstration purposes, but to use it you just need to copy <code>ParameterBlock.java</code> into your application. As usual, I'm releasing it under a Creative Commons license. Hopefully someone besides me will find it useful.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-80568536289871107982019-06-11T15:05:00.000-04:002019-06-11T15:05:45.690-04:00A Swing Platform for Computational ExperimentsMost of my research involves coding algorithms and running computational experiments with them. It also involves lots of trial-and-error, both with the algorithms themselves and with assorted parameters that govern their functioning. 
Back in the Dark Ages, I did all this with programs that ran at a command prompt (or, in Linux terms, in a terminal) and wrote any output to the terminal. Eventually I got hip and started writing simple GUI applications for those experiments. A GUI application lets me load different problems without having to stop and restart the program, lets me change parameter settings visually (without having to screw around with lots of command line options ... is the switch for using antisymmetry constraints -A or -a?), and lets me save output to text files when it's helpful.<br /><br />Since I code (mostly) in Java, the natural way to do this is with a Swing application. (When I use R, I typically use an R notebook.) Since there are certain features I always want, I found myself copying code from old projects and then cutting out the project-specific stuff and adding new stuff, which is a bit inefficient. So I finally got around to creating a minimal template version of the application, which I'm calling "XFrame" (short for "Experimental JFrame" or "JFrame for Experiments" or something).<br /><br />I just uploaded it to <a href="https://bitbucket.org/prubin/xframe/src/master/" target="_blank">Bitbucket</a>, where it is open-source under a very nonrestrictive Creative Commons license. Feel free to download it if you want to try it. There's an issue tracker where you can report any bugs or sorely missing features (but keep in mind I'm taking a fairly minimalist approach here).<br /><br />Using it is pretty simple. The main package (<code>xframe</code>) contains two classes and an interface. You can just plop them into your application somewhere. One class (<code>CustomOutputStream</code>) you will probably not want to change. The actual GUI is the <code>XFrame</code> class. You will want to add menu items (the only one it comes with is <code>File > Exit</code>) and other logic. Feel free to change the class name as well if you wish. 
Finally, your program needs to implement the interface defined in <code>XFrameController</code> so that <code>XFrame</code> knows how to talk to the program.<br /><br />The layout contains a window title at the top and a status message area at the bottom, both of which can be fetched and changed by code. The central area is a scrollable text area where the program can write output. It has buttons to save the content and to clear it, and the program will not exit with unsaved text unless you explicitly bless the operation.<br /><br />There is a function that lets the program open a nonmodal, scrollable dialog (which can be left open while the main window is in use, and whose content can be saved to a file). Another function allows the program to pop up modal dialogs (typically warnings or error messages). Yet another function provides a place where you can insert logic to tell the GUI when to enable/disable menu choices (and maybe other things). Finally, there is a built-in extension of <code>SwingWorker</code> that lets you launch a computational operation in a separate thread (where it will not slow down or freeze the GUI).<br /><br />I included a small "Hello world!" application to show that it works. I'll end with a couple of screen shots, one of the main window and the other of the nonmodal dialog (both from the demo). 
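<br /><br />Before the screen shots, one aside on that last point: the <code>SwingWorker</code> mechanism is standard Java, and its essence fits in a few lines. This is a minimal, self-contained sketch (not the actual XFrame code):

```java
import javax.swing.SwingWorker;

// Minimal illustration of running a long computation off the event
// dispatch thread; XFrame wires this same idea to its menu items.
public class WorkerSketch {
    public static void main(String[] args) throws Exception {
        SwingWorker<Long, Void> worker = new SwingWorker<>() {
            @Override
            protected Long doInBackground() {
                // Stand-in for the "computational operation".
                long sum = 0;
                for (int i = 1; i <= 1_000_000; i++) sum += i;
                return sum;
            }
            @Override
            protected void done() {
                // Runs on the EDT; in a GUI this is where you would
                // update components with the result.
            }
        };
        worker.execute();                 // runs in a background thread
        System.out.println(worker.get()); // blocks until the result is ready
    }
}
```

In a real GUI you would not call <code>get()</code> on the EDT (that would freeze it); you would let <code>done()</code> pick up the result instead.<br /><br />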
If it looks like something you might want to use, please head over to Bitbucket and grab the source.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-Hr89QlrxdHk/XP_6_1Il8SI/AAAAAAAAC7I/ujrQHQbmeXsL2PcXZc3-LS7luv7D4DTQwCLcBGAs/s1600/main_window.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="300" data-original-width="700" height="170" src="https://1.bp.blogspot.com/-Hr89QlrxdHk/XP_6_1Il8SI/AAAAAAAAC7I/ujrQHQbmeXsL2PcXZc3-LS7luv7D4DTQwCLcBGAs/s400/main_window.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Main window</td></tr></tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-Fx3vNLdhEeA/XP_6_6aA8mI/AAAAAAAAC7M/qq8Rwn0QX4gp41OpdYQeqt97TB9O1_YcACLcBGAs/s1600/dialog_window.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="270" data-original-width="450" height="240" src="https://1.bp.blogspot.com/-Fx3vNLdhEeA/XP_6_6aA8mI/AAAAAAAAC7M/qq8Rwn0QX4gp41OpdYQeqt97TB9O1_YcACLcBGAs/s400/dialog_window.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Nonmodal dialog</td></tr></tbody></table><br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-2498865291926015102019-06-10T10:02:00.000-04:002019-06-10T10:02:56.819-04:00Indicator Constraints v. 
Big MWay, way back I did a couple of posts related to how to model "logical indicators" (true/false values that control enforcement of constraints):<br /><ul><li><h4 class="post-title entry-title"><a href="https://orinanobworld.blogspot.com/2010/07/logical-indicators-in-mathematical.html">Logical Indicators in Mathematical Program</a> </h4></li><li><h4 class="post-title entry-title"><a href="https://orinanobworld.blogspot.com/2012/05/indicator-implies-relation.html">Indicator Implies Relation</a></h4></li></ul>The topic ties in to the general issue of "big M" model formulations. Somewhere around version 10, CPLEX introduced what they call <i>indicator constraints</i>, which are essentially implications of the form "if constraint 1 holds, then constraint 2 holds". As an example, if $a$ is a vector parameter, $b$ is a scalar parameter, $x$ is a binary variable and $y$ is a vector of real variables, you might use an indicator constraint to express $x=1 \implies a'y\le b$.<br /><br />That same constraint, in a "big M" formulation, would look like $a'y \le b + M(1-x)$. Using an indicator constraint saves you having to make a careful choice of the value of $M$ and avoids certain numerical complications. So should you always use indicator constraints? It's not that simple. 
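<br /><br />A small hypothetical example (mine, not from the discussion linked below) shows why neither formulation dominates. Suppose $0 \le y \le 100$ and we want to enforce $x = 1 \implies y \le 0$. The big-M version is $y \le 100(1-x)$, and its LP relaxation extracts partial information from a fractional $x$: at $x = 0.99$ it already forces $y \le 1$. The indicator version enforces nothing until $x$ is actually integer, so at $x = 0.99$ the relaxation still permits $y = 100$. On the other hand, if the smallest safe value of $M$ is enormous, the big-M relaxation is nearly vacuous and invites numerical trouble besides.<br /><br />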
Although "big M" formulations are notorious for having weak relaxations, indicator constraints can also result in weak (possibly weaker) relaxations.<br /><br />For more details, I'll refer you to a <a href="https://or.stackexchange.com/questions/231/when-to-use-indicator-constraints-versus-big-m-approaches-in-solving-mixed-int" target="_blank">question</a> on the new <a href="https://or.stackexchange.com/" target="_blank">Operations Research Stack Exchange</a> site, and in particular to <a href="https://or.stackexchange.com/questions/231/when-to-use-indicator-constraints-versus-big-m-approaches-in-solving-mixed-int/348#348" target="_blank">an answer</a> containing information provided by Dr. Ed Klotz of IBM. My take is that this remains one of those "try it and see" kinds of questions.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-20292720185525340202019-06-01T16:27:00.001-04:002019-06-01T16:27:52.270-04:00Naming CPLEX ObjectsA CPLEX user recently asked the following <a href="https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=ff135aca-927b-4609-9cf9-f5645c2dcf15&permalinkReplyUuid=387b0206-89ea-46c0-ad05-21a575275e96" target="_blank">question</a> on a user forum: "Is there a way to print the constraints as interpreted by CPLEX immediately after adding these constraints using addEq, addLe etc." The context for a question like this is often an attempt to debug either a model or the code creating the model. Other users in the past have indicated difficulty in parsing models after exporting them to a text (LP format) file, because they cannot associate the variable names and constraint names assigned by CPLEX to the variables and constraints in their mathematical models. 
For example, here is part of a simple set covering model created in Java using CPLEX 12.9 and exported to an LP file:<br /><br /><pre>Minimize<br /> obj: x1 + x2 + x3 + x4 + x5<br />Subject To<br /> c1: x1 + x4 >= 1<br /> c2: x1 + x3 + x4 >= 1<br /> c3: x2 + x5 >= 1<br /><br /></pre>Assume that the decisions relate to possible locations for depots. The text is perfectly legible, but is "x4" the decision to put a depot in "San Jose" or the decision to put a depot in "Detroit"? Is constraint "c1" the requirement to have a depot near central Ohio or the requirement to have a depot near the metro New York area?<br /><br />Assigning meaningful names to variables and constraints is easy. In the Java and C++ APIs (and, I assume, the others), the functions that create variables and constraints have optional string arguments for names. Let's say that I create my variables and my constraints inside loops, both indexed by a variable <span style="font-family: "courier new" , "courier" , monospace;">i</span>. I just need to change<br /><pre>mip.boolVar()</pre>and<br /><pre>mip.addGe(sum, 1)</pre>(where <span style="font-family: "courier new" , "courier" , monospace;">sum</span> is an expression calculated inside the loop) to<br /><pre>mip.boolVar(vnames[i]);</pre>and<br /><pre>mip.addGe(sum, 1, cnames[i]);</pre>where <span style="font-family: "courier new" , "courier" , monospace;">vnames[]</span> and <span style="font-family: "courier new" , "courier" , monospace;">cnames[]</span> are string arrays containing the names I want to use for variables and constraints, respectively. 
The previous model fragment now looks like the following:<br /><pre>Minimize<br /> obj: Pittsburgh + San_Jose + Newark + Detroit + Fresno<br />Subject To<br /> Central_Ohio: Pittsburgh + Detroit >= 1<br /> Metro_NY: Pittsburgh + Newark + Detroit >= 1<br /> SF_Oakland: San_Jose + Fresno >= 1</pre>This version is much more useful when either explaining the model (to someone else) or looking for problems with it. (Just don't look for any geographic logic in the example.)<br /><br />Note the use of underscores, rather than spaces, in the names. CPLEX has some rules about what is syntactically a legitimate name in an LP file, and if you violate those rules, CPLEX will futz with your names and add index numbers to them. So "San Jose" might become "San_Jose#2", and "SF/Oakland" would turn into something at least as silly.<br /><br />That's part of the battle. The question I cited asks how to print out constraints as they arise. The key there is that the various constraint constructors (<span style="font-family: "courier new" , "courier" , monospace;">IloCplex.ge()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.addGe()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.le()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.addLe()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.eq()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.addEq()</span>, ...) <i>return a pointer</i> to the constraint they construct. If you pass that pointer to a print statement, you print the constraint. Extending my example, I will tweak the constraint construction a bit, to<br /><pre>IloRange newConstraint = mip.addGe(sum, 1, cnames[i]);<br />System.out.println(newConstraint);</pre>which creates the cover constraint, adds it to the model and then prints it. 
That results in output lines like this:<br /><pre>IloRange Central_Ohio : 1.0 <= (1.0*Detroit + 1.0*Pittsburgh) <= infinity</pre><br />One last observation: If you want to print the entire model out, you do not need to save it to an LP file. Just pass the model (<span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span> object) to a print statement. If I execute<br /><pre>System.out.println(mip);</pre>after the model is complete, I get this:<br /><pre>IloModel {<br />IloMinimize : (1.0*San_Jose + 1.0*Detroit + 1.0*Pittsburgh + 1.0*Newark + 1.0*Fresno)<br />IloRange Central_Ohio : 1.0 <= (1.0*Detroit + 1.0*Pittsburgh) <= infinity<br />IloRange Metro_NY : 1.0 <= (1.0*Detroit + 1.0*Pittsburgh + 1.0*Newark) <= infinity<br />IloRange SF_Oakland : 1.0 <= (1.0*San_Jose + 1.0*Fresno) <= infinity<br /><br />}</pre>It is not entirely complete (you don't see the declarations of the variables as binary), but it is arguably enough for most model debugging.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-33218158419774072352019-05-29T14:58:00.001-04:002019-05-29T14:59:10.828-04:00How to Crash CPLEXA question elsewhere on the blog reminded me that some users of the CPLEX programming APIs are not conscious of a "technicality" that, when violated, might cause CPLEX to crash (or at least throw an exception).<br /><br />The bottom line can be stated easily enough: modifying a CPLEX model while solving it is a <a href="https://www.urbandictionary.com/define.php?term=Bozo%20no-no" target="_blank">Bozo no-no</a>. In the Java API, that means making a change to an instance of <span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span> while that instance is solving. 
In the C++ API, where the model (<span style="font-family: "courier new" , "courier" , monospace;">IloModel</span>) and solver (<span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span>) are separate, I suspect it means making a change to either one ... but I'm not sure.<br /><br />But wait, you say, callbacks do just that! No, not exactly. Callbacks add constraints (either user cuts or lazy constraints) to cut pools maintained by the solver, but they are <i>not</i> added to the model itself. Callbacks have their own add-whatever methods, for this specific purpose. Let's say that (in Java) I declare<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">IloCplex cplex = new IloCplex();</span><br /><br />and then build a model, attach a callback or two and call <span style="font-family: "courier new" , "courier" , monospace;">cplex.solve()</span>. While the solver is working, I can add user cuts or lazy constraints using the <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.UserCutCallback.add()</span> or <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.LazyConstraintCallback.add()</span> methods (or their local alternatives). What I <i>cannot</i> do, even in a callback, is use <span style="font-family: "courier new" , "courier" , monospace;">cplex.add()</span> to add a user cut or lazy constraint (or anything else). If I do, kaboom!<br /><br />What if, during a solve, I discover some new constraints that I would like to add to the model? One option is to abort the solver, add them, and start over. Another is to store them in program memory, wait for the solver to terminate (optimal solution, time limit, whatever), and then add them to the model. I just cannot add them while the solver is running (unless I want to crash the program).<br /><br />Note that, while one model is solving, I am free to modify some other model that is not solving. 
For instance, suppose I decompose a problem into a master problem and several subproblems (one per time period, say). Assuming that subproblems are solved sequentially, not in parallel, if I stumble on a constraint relevant to subproblem #2 while I am solving subproblem #1, I am free to add it to subproblem #2 ... just not (yet) to subproblem #1.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-13758631604944956092019-05-13T18:07:00.001-04:002019-05-13T18:07:17.908-04:00Randomness: Friend or Foe?I spent a chunk of the weekend debugging some code (which involved solving an optimization problem). There was an R script to set up input files and a Java program to process them. The Java program included both a random heuristic to get things going and an integer program solved by CPLEX.<br /><br />Randomness in algorithms is both a blessing and a curse. Without it, genetic algorithms would reduce to identical clones of one individual engaging in a staring contest, simulation models would produce absurdly tight confidence intervals (thanks to the output having zero variance) and so on. With it, though, testing and debugging software gets tricky, since you cannot rely on the same inputs, parameter settings, hardware etc. to produce the same output, even when the program is functioning correctly. If I rerun a problem and get different results, or different performance (e.g., same answer but considerably faster or slower), is that an indication of a bug or just luck of the draw?<br /><br />Somewhere, back in the Dark Ages, I was told that the answer to all this was to set the seed of the pseudorandom number stream explicitly. This was after the introduction of pseudorandom number generators, of course. Before that, random numbers were generated based on things like fluctuations in the voltage of the power supplied to the mainframe, or readings from a thermocouple stuck ... 
well, never mind. Today <a href="https://en.wikipedia.org/wiki/Hardware_random_number_generator" target="_blank">hardware random number generators</a> are used mainly in cryptography or gambling, where you explicitly do <i>not</i> want reproducibility. With pseudorandom numbers, using the same seed will produce the same sequences of "random" values, which should hypothetically make reproducibility possible.<br /><br />I think I originally came across that in the context of simulation, but it also applies in optimization, and not just in obviously random heuristics and metaheuristics. CPLEX has a built-in pseudorandom number generator which is used for some things related to branching decisions when solving integer programs (and possibly other places too). So explicitly setting a seed for both the random number generator used by my heuristic and CPLEX should make my code reproducible, right?<br /><br />Wrong. There are two more wildcards here. One is that my Java code uses various types of collections (HashMaps, HashSets, ...) and also uses Streams. Both of those can behave in nondeterministic ways if you are not very careful. (Whether a Stream is deterministic may boil down to whether the source is. I'm not sure about that.) The other, and I'm pretty sure this bites me in the butt more often than any other aspect, is the use of parallel threads. My code runs multiple copies of the heuristic in parallel (using different seeds), and CPLEX uses multiple threads. The problem with parallel threads is that timing is nondeterministic, even if the sequence of operations in each thread is. The operating system will interrupt threads willy-nilly to use those cores for other tasks (such as updating the contents of the display, doing garbage collection in Java, pinging the mail server or asking <a href="https://en.wikipedia.org/wiki/Skynet_(Terminator)" target="_blank">Skynet</a> whether it's time to terminate the user). 
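<br /><br />The seed business and one of the collection wildcards are easy to demonstrate in a few lines of plain Java (a toy illustration, not from the project in question):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class SeedDemo {
    public static void main(String[] args) {
        // Same seed => same pseudorandom sequence, run after run.
        Random r1 = new Random(12345);
        Random r2 = new Random(12345);
        boolean same = true;
        for (int i = 0; i < 5; i++) {
            same &= (r1.nextInt(1000) == r2.nextInt(1000));
        }
        System.out.println("seeded streams match: " + same); // prints true

        // A plain HashSet makes no promises about iteration order, so any
        // logic that depends on that order can vary; LinkedHashSet (which
        // preserves insertion order) is one way to pin it down.
        Set<String> linked = new LinkedHashSet<>(List.of("b", "a", "c"));
        System.out.println(linked); // prints [b, a, c]
    }
}
```

<br /><br />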
If there is any communication between the processes running in parallel threads in your code, the sequencing of the messages will be inherently random. Also, if you give a process a time limit and base it on "wall clock" time (which I'm prone to doing), how far the process gets before terminating will depend on how often and for how long it was interrupted.<br /><br />Limiting everything to a single thread tends not to be practical, at least for the problems I find myself tackling. So I'm (somewhat) reconciled to the notion that "reproducibility" in what I do has to be interpreted somewhat loosely.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com4tag:blogger.com,1999:blog-8781383461061929571.post-23542494533694608822019-04-16T15:53:00.000-04:002019-04-16T15:53:52.837-04:00CPLEX Callbacks: ThreadLocal v. CloneA while back, I <a href="https://orinanobworld.blogspot.com/2017/11/benders-decomposition-with-generic.html" target="_blank">wrote a post</a> about the new (at the time) "generic" callbacks in CPLEX, including a brief discussion of adventures with multiple threads. A key element was that, with generic callbacks, IBM was making the user more responsible for thread safety. In that previous post, I explored a few options for doing so, including use of Java's <a href="https://docs.oracle.com/javase/10/docs/api/java/lang/ThreadLocal.html" target="_blank">ThreadLocal</a> class. <span style="font-family: "courier new" , "courier" , monospace;">ThreadLocal</span> is my current first choice when writing a generic callback.<br /><br />Recently, I saw a <a href="https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=e72f622a-b899-4e96-a789-254587a39c7d&permalinkReplyUuid=dc62fc3c-c30a-41e4-bfe1-3e037cc7700e" target="_blank">reply by IBM's Daniel Junglas</a> on a CPLEX user forum that contained the following information.<br /><blockquote class="tr_bq">For each thread CPLEX creates a clone of your callback object. 
This is done by invoking the callback's clone() method. The Java default implementation of clone() returns a shallow copy of the class.</blockquote>That set me to wondering about the differences between <span style="font-family: "courier new" , "courier" , monospace;">ThreadLocal</span> and cloning, so I did a bit of experimentation. I'll summarize what I think I learned below.<br /><br /><h3>Why is any of this necessary? </h3><br />It's not if you only allow CPLEX to use a single thread. When using callbacks with multiple threads, it is possible that two or more threads will access the callback simultaneously. Simultaneously reading/fetching the value of something is harmless, but trying to modify a value simultaneously will either trigger an exception or cause the JVM to crash. (That's not hypothetical, by the way. I have a test program that routinely crashes the JVM.) "Information callbacks" are harmless, but "control callbacks" (where you attempt to alter what CPLEX is doing, by adding cuts or lazy constraints or by taking control of branching), usually involve modifying something. In particular, with Benders decomposition your callback needs to solve a subproblem <i>after first adjusting it</i> based on the proposed master problem solution. Two threads trying to adjust the subproblem at the same time spells disaster.<br /><br /><h3>Is the use of ThreadLocal an option for both legacy and generic callbacks?</h3><br />Yes. The only tricky part is in initialization of the storage. Let's say that I'm doing Benders decomposition, and I have a subproblem that is an instance of <span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span>. I am going to need to create a separate version of the subproblem for each thread. 
So my callback class will contain a class field declared something like<br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: black; font-weight: bold;">private</span> ThreadLocal<span style="color: black;"><</span>IloCplex<span style="color: black;">></span> subproblem<span style="color: black;">;</span></pre>and it will need to fill in a value of the subproblem for each thread.<br /><br />With a generic callback, the <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> context provided by CPLEX can be used to do this. Assuming that <span style="font-family: "Courier New", Courier, monospace;">context</span> is the argument to the callback function you write, you can use code like the following to initialize the subproblem for each thread.<br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: black; font-weight: bold;">if</span> <span style="color: black;">(</span>context<span style="color: black;">.</span><span style="color: #010181;">getId</span><span style="color: black;">()</span> <span style="color: black;">==</span> IloCplex<span style="color: black;">.</span>Callback<span style="color: black;">.</span>Context<span style="color: black;">.</span>Id<span style="color: black;">.</span>ThreadUp<span style="color: black;">) {</span><br /> IloCplex s <span style="color: black;">= ...;</span> <span style="color: #838183; font-style: italic;">// code to generate a new subproblem</span><br /> subproblem<span style="color: black;">.</span><span style="color: #010181;">set</span><span style="color: black;">(</span>s<span style="color: black;">);</span><br /><span style="color: black;">}</span></pre>Once the subproblem is initialized, to use it when a candidate solution is being proposed, you need to extract it from the <span style="font-family: "courier new" , "courier" , monospace;">ThreadLocal</span> field. Here is an example of how that would look. 
<br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: black; font-weight: bold;">if</span> <span style="color: black;">(</span>context <span style="color: black;">==</span> IloCplex<span style="color: black;">.</span>Callback<span style="color: black;">.</span>Context<span style="color: black;">.</span>Id<span style="color: black;">.</span>Candidate<span style="color: black;">) {</span><br /> IloCplex s <span style="color: black;">=</span> subproblem<span style="color: black;">.</span><span style="color: #010181;">get</span><span style="color: black;">(</span>s<span style="color: black;">)</span>;<br /> <span style="color: #838183; font-style: italic;">// Do Benders stuff with s.</span><br /><span style="color: black;">}</span></pre><br />Legacy callbacks do not have a mechanism like <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> for detecting the creation of a new thread. You can still initialize the subproblem "lazily". In the legacy callback, before using the subproblem check to see if it exists. If not, create it. 
Here's some sample code.<br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;">IloCplex s<span style="color: black;"> = </span><span style="color: black;">subproblem<span style="color: black;">.</span><span style="color: #010181;">get</span><span style="color: black;">()</span>;</span> <span style="color: #838183; font-style: italic;">// get the subproblem</span><br /><span style="color: black; font-weight: bold;">if</span> <span style="color: black;">(s</span><span style="color: black;"> ==</span> null<span style="color: black;">) {</span><br /> <span style="color: #838183; font-style: italic;">// First use: need to generate a fresh subproblem.</span><br /> s <span style="color: black;">= ...;</span> <span style="color: #838183; font-style: italic;">// code to generate a new subproblem</span><br /> subproblem<span style="color: black;">.</span><span style="color: #010181;">set</span><span style="color: black;">(</span>s<span style="color: black;">);</span><br /><span style="color: black;">}</span><br /><span style="color: #838183; font-style: italic;">// Do Benders stuff with s.</span></pre><h3>Is cloning an option for both legacy and generic callbacks?</h3><br />No. I don't think cloning can be used with generic callbacks. In Java, objects can be cloned. CPLEX declares legacy callbacks as Java objects, but it declares generic callbacks by means of an interface. Cloning an interface is not really a "thing" in Java.<br /><br />When you use a generic callback, you create a class that implements the <span style="font-family: "courier new" , "courier" , monospace;">Callback</span> interface. It's certainly possible to make that class cloneable, but according to my experiments CPLEX will not call the <span style="font-family: "courier new" , "courier" , monospace;">clone()</span> method on that class, even if it exists. 
So the solver will be working with a single instance of that class, and if the class is not thread-safe, kaboom.<br /><br /><h3>What does cloning entail?</h3><br />There are quite a few web sites that explain cloning in Java, and if you intend to use it I recommend you do a search. I'll just cover the bare bones here.<br /><ol><li>Declare your class with the modifier "<span style="font-family: "Courier New", Courier, monospace;">implements Cloneable</span>".</li><li>Override the protected method <span style="font-family: "Courier New", Courier, monospace;">Object.clone</span>.</li><li>Call the parent method (<span style="font-family: "Courier New", Courier, monospace;">super.clone()</span>) as the very first executable line of the <span style="font-family: "Courier New", Courier, monospace;">clone()</span> method.</li><li>Handle the <span style="font-family: "Courier New", Courier, monospace;">CloneNotSupportedException</span> that the parent method might hypothetically throw, either by catching it and doing something, or by rethrowing it.</li><li>After calling the parent method, fix anything that needs fixing.</li></ol>I left that last step rather vague. With a Benders lazy constraint callback, you will need to replace the original subproblem with a freshly generated version (so that no two threads are chewing on the same subproblem instance). If life were fair (it's not), you would just do a deep clone of the original problem. The catch is that the <span style="font-family: "Courier New", Courier, monospace;">IloCplex</span> class is <i>not</i> cloneable. So the best (only?) alternative I can see is to build a new copy of the subproblem, the same way you built the original copy.<br /><br />If you have other class fields that the callback might modify (such as a vector that stores the best feasible solution encountered), you <i>may</i> need to do a deep clone of them (or replace them with new versions). 
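To make the shallow-versus-deep distinction concrete, here is a toy example of my own (it has nothing to do with the CPLEX API): after <span style="font-family: "Courier New", Courier, monospace;">super.clone()</span>, the copy shares the original's array field, so the clone must be handed a fresh array before the two objects can be mutated independently.

```java
// Toy illustration (not CPLEX code): Object.clone() copies field
// *references*, so a cloned object would share any array/object fields
// with the original unless we replace them explicitly in clone().
public class ShallowDemo implements Cloneable {
    private double[] incumbent = new double[3];  // mutable state

    public void set(int i, double v) { incumbent[i] = v; }
    public double get(int i) { return incumbent[i]; }

    @Override
    public ShallowDemo clone() {
        try {
            ShallowDemo copy = (ShallowDemo) super.clone();  // shallow copy
            // "Fix anything that needs fixing": give the clone its own array.
            copy.incumbent = incumbent.clone();  // arrays support clone()
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        ShallowDemo a = new ShallowDemo();
        ShallowDemo b = a.clone();
        a.set(0, 42.0);               // mutate the original...
        System.out.println(b.get(0)); // ...the clone is unaffected: prints 0.0
    }
}
```

Without the <span style="font-family: "Courier New", Courier, monospace;">copy.incumbent = incumbent.clone()</span> line, the final print would show 42.0, because both objects would still point at the same array.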
A detailed analysis of the differences between shallow and deep cloning is beyond the scope of this post (or my competence). As Daniel points out in his answer, <span style="font-family: "Courier New", Courier, monospace;">super.clone()</span> only makes a shallow copy. You'll need to take additional steps to make sure that fields containing arrays and other objects (other than primitives) are handled properly if the callback might modify them.<br /><br />Here is some skeleton code.<br /><br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: black; font-weight: bold;">public class</span> MyCallback <span style="color: black; font-weight: bold;">extends</span> IloCplex<span style="color: black;">.</span>LazyConstraintCallback<br /> <span style="color: black; font-weight: bold;">implements</span> Cloneable <span style="color: black;">{</span><br /> <span style="color: black; font-weight: bold;">private</span> IloCplex subproblem<span style="color: black;">;</span><br /><br /> <span style="color: #838183; font-style: italic;">// ...</span><br /><br /> <span style="color: #838183; font-style: italic;">/**</span><br /><span style="color: #838183; font-style: italic;"> * Clone this callback.</span><br /><span style="color: #838183; font-style: italic;"> * @return the clone</span><br /><span style="color: #838183; font-style: italic;"> * @throws CloneNotSupportedException if the parent clone method bombs</span><br /><span style="color: #838183; font-style: italic;"> */</span><br /> <span style="color: black; font-weight: bold;">@Override</span><br /> <span style="color: black; font-weight: bold;">public</span> MyCallback <span style="color: #010181;">clone</span><span style="color: black;">()</span> <span style="color: black; font-weight: bold;">throws</span> CloneNotSupportedException <span style="color: black;">{</span><br /> <span style="color: #838183; font-style: italic;">// Call 
Object.clone first.</span><br /> MyCallback cb <span style="color: black;">= (</span>MyCallback<span style="color: black;">)</span> <span style="color: black; font-weight: bold;">super</span><span style="color: black;">.</span><span style="color: #010181;">clone</span><span style="color: black;">();</span><br /> <span style="color: #838183; font-style: italic;">// Replace the clone's subproblem with a fresh copy.</span><br /> cb<span style="color: black;">.</span>subproblem <span style="color: black;">= ...;</span> <span style="color: #838183; font-style: italic;">// generate a fresh copy of the subproblem</span><br /> <span style="color: #838183; font-style: italic;">// Make deep copies (or new values) for other fields as needed.</span><br /> <span style="color: black; font-weight: bold;">return</span> cb<span style="color: black;">;</span><br /> <span style="color: black;">}</span><br /><br /><span style="color: black;">}</span></pre>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-21272939670347851502019-03-12T19:46:00.000-04:002019-03-12T19:46:29.443-04:00Pseudocode in LyX RevisitedThis post is strictly for users of <a href="https://www.lyx.org/" target="_blank">LyX</a>.<br /><br />In a <a href="https://orinanobworld.blogspot.com/2018/10/pseudocode-in-lyx.html" target="_blank">previous post</a> I offered up a LyX module for typesetting pseudocode. Unfortunately, development of that module bumped up against some fundamental incompatibilities between the <a href="https://ctan.org/pkg/algorithmicx" target="_blank"><span style="font-family: "Courier New", Courier, monospace;">algorithmicx</span> package</a> and the way LyX layouts work. 
The repository for it remains open, but I decided to shift my efforts to a different approach.<br /><br />LyX has built-in support for "program listings", insets that use either the <a href="https://ctan.org/pkg/listings?lang=en" target="_blank">listings</a> (default choice) or <a href="https://ctan.org/pkg/minted" target="_blank">minted</a> LaTeX package. The listings package supports a lengthy list of programming languages, but not pseudocode. This makes sense because pseudocode has no fixed syntax; everyone writes it their own way. So I cobbled together a LyX module that adds my take on a pseudocode language to the listings package, allowing users to type pseudocode into a LyX program listing inset.<br /><br />The module is available (under a GPL3 license) from a <a href="https://github.com/prubin73/pseudolst" target="_blank">GitHub repository</a>. The repository includes a user guide (both a LyX file and a PDF version) explaining both installation and the nuances of using the module. (Yes, there are some finicky bits.) The user guide also spells out what words are treated as keywords and formatted accordingly. I also built in a (somewhat clumsy) way to insert vertical lines to help the user recognize the span of a block of code.<br /><br />The listings package provides (and LyX supports) a number of nice formatting features, including options for color-coding keywords, numbering lines and such. In addition, you can modify the list of recognized keywords by editing the module file in a text editor. If you are familiar with the listings package, you can customize the pseudocode language definition even further (for instance, by changing the delimiters for comments), again by hacking the module file. If you use the module, feel free to suggest changes to the language definition via the issue tracker for the repository.<br /><br />Here is a small sample of the output, to give you an idea of what I am talking about. 
(I had to convert the PDF output to an image, which explains the slight fuzziness.)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-mAdBM5_QX98/XIhDfyal4PI/AAAAAAAAC28/vkobreGxLSURXc7TRFpGoSJ3e_7ZtQHEwCLcBGAs/s1600/euclid.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="206" data-original-width="338" src="https://2.bp.blogspot.com/-mAdBM5_QX98/XIhDfyal4PI/AAAAAAAAC28/vkobreGxLSURXc7TRFpGoSJ3e_7ZtQHEwCLcBGAs/s1600/euclid.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div>While we're on the subject of LyX, I also have a layout file for the <a href="https://ctan.org/pkg/standalone?lang=en" target="_blank">standalone</a> LaTeX class, which is useful for generating PDF output of just one table, image or whatever without the wasted space of page margins. That layout file has its own <a href="https://github.com/prubin73/standalone" target="_blank">GitHub repository</a>, if you are interested.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-79665856350403854332019-02-24T18:29:00.002-05:002019-02-25T17:30:28.222-05:00Guessing Pareto Solutions: A TestIn <a href="https://orinanobworld.blogspot.com/2019/02/guessing-pareto-solutions.html" target="_blank">yesterday's post</a>, I described a simple multiple-criterion decision problem (binary decisions, no constraints), and suggested a possible way to identify a portion of the Pareto frontier using what amounts to guesswork: randomly generate weights; use them to combine the multiple criteria into a single objective function; optimize that (trivial); repeat <i>ad nauseam</i>.<br /><br />I ran an experiment, just to see whether the idea was a complete dud or not. Before getting into the results, I should manage expectations a bit. 
Specifically, while it's theoretically possible in some cases that you might find all the Pareto efficient solutions (and it is in fact guaranteed if there is only one), in general you should not bank on it. Here's a trivial case that demonstrates that it may be impossible to find them all using my proposed technique. Suppose we have three yes-no decisions ($N=3$, using the notation from yesterday's post) and two criteria ($M=2$), one to be maximized and one to be minimized. Suppose further that, after changing the sign of the second criterion (so that we are maximizing both) and scaling, all three decisions produce identical criterion values of $(1,-1)$. With $N=3$, there are $2^3=8$ candidate solutions. One (do nothing) produces the result $(0,0)$. Three (do just one thing) produce $(1,-1)$, three (do any two things) produce $(2,-2)$, and one (do all three things) produces $(3,-3)$. All of those are Pareto efficient! However, if we form a weighted combination using weights $(\alpha_1, \alpha_2) \gg 0$ for the criteria, then every decision has the same net impact $\beta = \alpha_1 - \alpha_2$. If $\beta > 0$, the only solution that optimizes the composite function is $x=(1,1,1)$ with value $3\beta$. If $\beta<0$, the only solution that optimizes the composite function is $x=(0,0,0)$ with value $0$. The other six solutions, while Pareto efficient, cannot be found my way.<br /><br />With that said, I ran a single test using $N=10$ and $M=5$, with randomly generated criterion values. One test like this is not conclusive for all sorts of reasons (including but not limited to the fact that I made the five criteria statistically independent of each other). You can see (and try to reproduce) my work in an <a href="https://msu.edu/~rubin/code/pareto.nb.html" target="_blank">R notebook</a> hosted on my web site. The code is embedded in the notebook, and you can extract it easily. Fair warning: it's pretty slow. 
In particular, I enumerated all the Pareto efficient solutions among the $2^{10} = 1,024$ possible solutions, which took about five minutes on my PC. I then tried one million random weight combinations, which took about four and a half minutes.<br /><br /><b>Correction</b>: After I rejiggered the code, generating a million random trials took only 5.8 seconds. The bulk of that 4.5 minutes was apparently spent keeping track of how often each solution appeared among the one million results ... and even that dropped to a second or so after I tightened up the code.<br /><br />To summarize the results, there were 623 Pareto efficient solutions out of the population of 1,024. The random guessing strategy only found 126 of them. It found the heck out of some of them: the maximum number of times the same solution was identified was 203,690! (I guess that one stuck out like a sore thumb.) Finding 126 out of 623 may not sound too good, but bear in mind the idea is to present a decision maker with a reasonable selection of Pareto efficient solutions. I'm not sure how many decision makers would want to see even 126 choices.<br /><br />A key question is whether the solutions found by the random heuristic are representative of the Pareto frontier. Presented below are scatter plots of four pairs of criteria, showing all the Pareto efficient solutions color-coded according to whether or not they were identified by the heuristic. (You can see higher resolution versions by clicking on the link above to the notebook, which will open in your browser.) In all cases, the upper right corner would be ideal. Are the identified points a representative sample of all the Pareto efficient points? 
I'll let you judge.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-dIISelf6iJc/XHMn1b68J9I/AAAAAAAAC18/tPuG31q-gi4ybS337QwqF5b_bCs9xGkVACLcBGAs/s1600/plot1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="430" data-original-width="495" height="277" src="https://4.bp.blogspot.com/-dIISelf6iJc/XHMn1b68J9I/AAAAAAAAC18/tPuG31q-gi4ybS337QwqF5b_bCs9xGkVACLcBGAs/s320/plot1.png" width="320" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-H5Y1X6LJZus/XHMn1bTNmeI/AAAAAAAAC2E/o4SZCKkhIUcvrIg9wuAXKp2dbaurt9SpgCLcBGAs/s1600/plot3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="430" data-original-width="495" height="277" src="https://4.bp.blogspot.com/-H5Y1X6LJZus/XHMn1bTNmeI/AAAAAAAAC2E/o4SZCKkhIUcvrIg9wuAXKp2dbaurt9SpgCLcBGAs/s320/plot3.png" width="320" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-N-v3S8zAxq0/XHMn1QoUFtI/AAAAAAAAC14/IiMDPgbi0JoTrCjZ9at9Ay9-DXPfpU3zQCLcBGAs/s1600/plot2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="430" data-original-width="495" height="277" src="https://3.bp.blogspot.com/-N-v3S8zAxq0/XHMn1QoUFtI/AAAAAAAAC14/IiMDPgbi0JoTrCjZ9at9Ay9-DXPfpU3zQCLcBGAs/s320/plot2.png" width="320" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-GeejIDo_vXc/XHMn180muWI/AAAAAAAAC2A/Bc8T28J87-kpHeJnIp4-KHkT2wp3NVBAwCLcBGAs/s1600/plot4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="430" data-original-width="495" height="277" src="https://3.bp.blogspot.com/-GeejIDo_vXc/XHMn180muWI/AAAAAAAAC2A/Bc8T28J87-kpHeJnIp4-KHkT2wp3NVBAwCLcBGAs/s320/plot4.png" width="320" /></a></div><br 
/>I'll offer one final thought. The weight vectors are drawn uniformly over a unit hypercube of dimension $M$, and the frequency with which each identified Pareto solution occurs within the sample should be proportional to the volume of the region within that hypercube containing weights favoring that solution. So high frequency solutions have those high frequencies because wider ranges of weights favor them. Perhaps that makes them solutions that are more likely to appeal to decision makers than their low frequency (or undiscovered) counterparts?Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-30837209852963578912019-02-23T14:53:00.003-05:002019-02-23T14:53:30.958-05:00Guessing Pareto SolutionsOne of the challenges of <a href="https://en.wikipedia.org/wiki/Multiple-criteria_decision_analysis" target="_blank">multiple-criteria decision-making (MCDM)</a> is that, in the absence of a definitive weighting or prioritization of criteria, you cannot talk meaningfully about a "best" solution. (Optimization geeks such as myself tend to find that a major turn-off.) Instead, it is common to focus on <a href="https://en.wikipedia.org/wiki/Pareto_efficiency" target="_blank">Pareto efficient</a> solutions. We can say that one solution A dominates another solution B if A does at least as well as B on all criteria and better than B on at least one criterion. A solution is Pareto efficient if no solution dominates it.<br /><br />Recently I was chatting with a woman about a multiple-criteria model she had. The variables were all binary (yes-no decisions) and were otherwise unconstrained. Normally my hackles would go up at an unconstrained decision problem, but in this case all constraints were "soft". 
For example, rather than specify a fixed budget, she would make cost one of the criteria (less being better) and let the decision maker screen out Pareto efficient solutions with unrealistic budgets.<br /><br />I won't name the researcher or go into more details, since her work is currently under review (as of this writing), but I will look at a similar problem that is only slightly simpler. Namely, I will add the requirement that the criteria are additive, meaning that the value of any criterion for any "yes" does not depend on which other decisions wound up "yes" rather than "no". So, in mathematical terms, we have $N$ binary decisions ($x\in\{0,1\}^N$) and $M$ criteria. Let $c_{ij}$ denote the value of criterion $j$ when $x_i=1$. In what follows, I will want to be consistent in maximizing (rather than minimizing) criterion values, so I'll assume that (a) a "no" decision has value 0 for every criterion and (b) criteria are normalized so that $c_{ij}\in [0,1]$ for all $i$ and for all $j$ such that more is better, while $c_{ij}\in [-1,0]$ for all $i$ and for all $j$ such that (before scaling) less is better. For example, if cost is a criterion (and if you do not work for the Pentagon), the scaled values of cost will run from -1 for the most expensive choice encountered to 0 for any choice that is free.<br /><br />The solution space contains $2^N$ candidate solutions, which means that enumerating them is out of the question for nontrivial values of $N$ (and, trust me, hers are nontrivial ... very, very nontrivial). In any case, the size of the Pareto frontier when $N$ is nontrivial will quickly outpace any decision maker's capacity (or patience). 
So it seems reasonable to sample the Pareto frontier and present the decision maker with a reasonably (but not excessively) large set of Pareto efficient solutions that hopefully is representative of the overall Pareto frontier.<br /><br />There are methods for doing this, and the researcher pointed me toward one: the NSGA-II genetic algorithm [1]. Like any genetic algorithm, NSGA-II runs until you stop it via some limit (run time, number of generations, some measure of stagnation, ...). As I understand it, the solutions in the final population are not guaranteed to be Pareto efficient, but the population converges in some sense to the Pareto frontier. That would likely be sufficient in practice, and the researcher had good luck with it initially, but ran into scaling problems as $N$ grew.<br /><br />Looking at the problem reminded me of a sometimes troubling dichotomy faced by academics (in at least some disciplines, including operations research). On the one hand, real-world decision makers with real-world problems are often quite happy to get solutions that are obtained quickly and cheaply and are easily explained or understood, even if the technology is "primitive". Academics, on the other hand, usually need to publish their research. In the domain of the problem, low-tech algorithms may be perfectly fine (and easier to understand than more esoteric algorithms), but for a professor in IE or OR looking to publish in a journal in his or her field, "simple" usually means "too unsophisticated to publish". Thus we tend at times to "over-engineer" solutions.<br /><br />Anyway, taking off my academic hat for a moment, I got to wondering whether a no-brainer random search approach would work for a problem such as this one. My premise was the following. 
Suppose I randomly select weights $\alpha_1,\dots,\alpha_M \in (0,1]$ and use them to form a single weighted criterion function $f_\alpha:\{0,1\}^N \rightarrow \Re$ as follows:$$f_\alpha (x)=\sum_{i=1}^N w_i(\alpha) x_i$$where$$w_i(\alpha)=\sum_{j=1}^M \alpha_j c_{ij}.$$Note that the weights $\alpha_i$ need to be <i>strictly positive</i>, not just nonnegative. Maximizing $f_\alpha$ over the solution space is trivial: set $x_i=1$ if $w_i(\alpha) > 0$ and $x_i=0$ if $w_i(\alpha) < 0$. (You can set $x_i$ to anything you like if $w_i(\alpha) = 0$, but that should have probability 0 of occurring given the random generation of $\alpha$.) Constructing the coefficients of $f_\alpha$ is $O(MN)$ and optimizing $f_\alpha$ is $O(N)$, so each solution $x$ produced this way takes $O(MN)$ time. In practice, $M$ is likely not to get too big, and very likely does not grow as $N$ does, so functionally this is really $O(N)$ in complexity.<br /><br />It's trivial to show that, so long as none of the composite weights $w_i(\alpha)$ is 0, each $x$ found by optimizing some $f_\alpha$ must be Pareto efficient. So we can generate a sample of Pareto efficient solutions fairly efficiently ... maybe. I did deal one card off the bottom of the deck. Solutions will tend to repeat, so we will need to do some bookkeeping to ensure that, in the final results, duplicates are weeded out. That means comparing each new solution to its predecessors. Each comparison is $O(N)$, and if we assume that we only retain a "reasonable" number of alternatives (a number that does not grow with $N$) in order to avoid causing the decision maker's brain to explode, then time spent weeding duplicates should be $O(N)$. Overall, the load is $O(NR)$, where $R$ is the number of replications performed.<br /><br />That left me with a few questions, the main one being this: for plausible sample sizes, how well does the set of solutions obtained represent the Pareto frontier? 
Is it "biased" in the sense that solutions cluster on one part of the frontier while missing other parts?<br /><br />I'll show some experimental results in my next post, and let you decide.<br /><br /><br />[1] K. Deb, A. Pratap, S. Agarwal & T. Meyarivan (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. <b>IEEE Transactions on Evolutionary Computation</b>, <i>6</i>, 182-197.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-6957851638559641162019-02-14T17:03:00.000-05:002019-02-14T17:03:22.669-05:00Binary / Integer Conversion in RI've been poking around in R with an unconstrained optimization problem involving binary decision variables. (Trust me, it's not as trivial as it sounds.) I wanted to explore the entire solution space. Assuming $n$ binary decisions, this means looking at the set $\{0,1\}^n$, which contains $2^n$ 0-1 vectors of dimension $n$.<br /><br />For reasons I won't get into, I wanted to associate each vector with an ID number from 0 to $2^n-1$, which sounds pretty straightforward ... until you try to do it. I thought I'd post my code here, with no warranty ("express or implied", as the legal eagles would say) that it's computationally efficient.<br /><br />Before I explain the code, let me just point out that using an integer from 0 to $2^n-1$ implies that $n$ is small enough that $2^n-1$ fits in a machine integer. 
So if you decide to try the code, don't go nuts with it.<br /><br />Here's the code:<br /><br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: #838183; font-style: italic;"># Create a vector of powers of 2 (for use in conversions from binary vectors to integers).</span><br />powers<span style="color: black;">.</span>of<span style="color: black;">.</span>two <span style="color: black;"><-</span> <span style="color: #b07e00;">2</span>^<span style="color: black;">(</span><span style="color: #b07e00;">0</span><span style="color: black;">:(</span>n <span style="color: black;">-</span> <span style="color: #b07e00;">1</span><span style="color: black;">))</span><br /><span style="color: #838183; font-style: italic;"># Convert an ID (integer) to a binary vector of appropriate length. Note that the vector is reversed so that the lowest order bit (corresponding to the first decision) comes first.</span><br />fromID <span style="color: black;"><-</span> <span style="color: black; font-weight: bold;">function</span><span style="color: black;">(</span>id<span style="color: black;">) {</span> as<span style="color: black;">.</span><span style="color: #010181;">integer</span><span style="color: black;">(</span><span style="color: #010181;">head</span><span style="color: black;">(</span><span style="color: #010181;">intToBits</span><span style="color: black;">(</span>id<span style="color: black;">),</span> n<span style="color: black;">)) }</span><br /><span style="color: #838183; font-style: italic;"># Convert a binary vector of appropriate length to an ID value (integer).</span><br />toID <span style="color: black;"><-</span> <span style="color: black; font-weight: bold;">function</span><span style="color: black;">(</span>vec<span style="color: black;">) {</span> as<span style="color: black;">.</span><span style="color: #010181;">integer</span><span style="color: black;">(</span>vec <span 
style="color: black;">%*%</span> powers<span style="color: black;">.</span>of<span style="color: black;">.</span>two<span style="color: black;">) }</span> </pre><br />To facilitate converting binary vectors to integers, I store all the powers of 2 once. The <span style="font-family: "courier new" , "courier" , monospace;">fromID()</span> function takes an integer ID number and converts it to a binary vector of length $n$. It uses the <span style="font-family: "courier new" , "courier" , monospace;">intToBits()</span> function from the R base package, which does the heavy lifting but whose output needs a little massaging. The <span style="font-family: "courier new" , "courier" , monospace;">toID()</span> function takes a binary vector and converts it to an integer ID number. So, with $n=5$, the output of <span style="font-family: "courier new" , "courier" , monospace;">fromID(22)</span> is<br /><blockquote class="tr_bq"><pre>[1] 0 1 1 0 1</pre></blockquote>while the output of <span style="font-family: "courier new" , "courier" , monospace;">toID(c(0, 1, 1, 0, 1))</span> is<br /><blockquote class="tr_bq"><pre>[1] 22</pre></blockquote>(as you would expect).<br /><br />There is one other thing I should point out: because I am using the ID numbers to index binary vectors starting from (0, 0, ..., 0) and working up to (1, 1, ..., 1), I designed the functions to put the least significant bit first. 
If you want the most significant bit first and the least significant bit last, you need to make two changes to the code: change the definition of <span style="font-family: "courier new" , "courier" , monospace;">powers.of.two</span> to <span style="font-family: "courier new" , "courier" , monospace;">2^((n - 1):0)</span>; and, in the definition of <span style="font-family: "Courier New", Courier, monospace;">fromID</span>, change <span style="font-family: "Courier New", Courier, monospace;">head(intToBits(id), n)</span> to <span style="font-family: "Courier New", Courier, monospace;">rev(head(intToBits(id), n))</span>.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-47331996830204639362019-01-25T17:31:00.000-05:002019-01-25T17:31:55.936-05:00Threads and MemoryYesterday I had a rather rude reminder (actually, two) of something I've known for a while. I was running a Java program that uses CPLEX to solve an integer programming model. The symptoms were as follows: shortly after the IP solver run started, I ran out of RAM, the operating system started <a href="https://en.wikipedia.org/wiki/Paging" target="_blank">paging memory</a> to the hard drive, and the resulting hard drive <a href="https://en.wikipedia.org/wiki/Thrashing_(computer_science)" target="_blank">thrashing</a> made the system extremely sluggish (to put it charitably). Long after I regained enough control to kill the program, there was a significant amount of disk activity (and concomitant noise) as the system gradually came back to its senses.<br /><br />How did this happen? My system has four cores, which means CPLEX defaults to running four parallel threads when solving models. What's not always obvious is that each thread gets a separate copy of the model. (I'm not sure if it is the entire model after presolving or just most of it, but it's definitely a large chunk.) 
In my particular case, the model begins with an unpredictably large number of constraints, and when that number got big, the model got big -- not big enough to be a problem if I used a single thread, but too big to get away with four copies of it ... or, as it turned out, three copies. (The second thrashing event was when I tried to run it with three threads.)<br /><br />Parallel threading is great, but there are two caveats associated with it. First, performance is a sublinear function of the number of threads, meaning that doubling the number of threads will not cut run time in half, tripling the number of threads will not cut run time to a third the single-thread time, and so on. Second, if you are dealing with a largish model, you might want to try running a little while with a single thread to see how much memory it eats, and then decide how many copies your system can handle comfortably. That's an upper bound on how many threads you should use.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-4217039381559065502018-12-16T16:25:00.000-05:002018-12-16T16:25:59.062-05:00Of Typewriters and Permutations (V)Okay, this is the last post on the subject. I promise! If you're coming into this movie on the last reel, you may need to skim the last few posts to see what it's about. I'm trying not to repeat myself too much.<br /><br />To summarize where we are at: Hardmath123 posted a solution (generated by a heuristic) that might or might not be optimal; a constraint programming model gets that solution very quickly and fails to improve on it to the extent of my experience (I let it run four hours); and the MIP1 model was the least worst of the various optimization models I tried, but it had rather loose bounds.<br /><br />Rob Pratt, in a comment to the second post, suggested a new constraint that does improve the bound performance of MIP1. 
Recall that the variables in MIP1 are $x_{i,j}$ (binary, does symbol $i$ go in slot $j$), $p_i$ (the position where symbol $i$ gets parked), and $d_{ij}$ (the distance between symbols $i$ and $j$ in the layout). Rob's constraint is$$\sum_{i} \sum_{j>i} d_{ij} = \frac{n^3 - n}{6}$$where $n$ is the size of the symbol set (26 in our case). I'll skip the derivation here; it's based on the observation that the sum of the distances between all pairs of positions is constant, regardless of who fills those positions.<br /><br />That got me thinking, and I came up with a set of constraints that are similarly motivated. Let $\delta_j$ be the sum of the distances from position $j$ to all other positions. Since we are using zero-based indexing, there are $j$ positions to the left of position $j$ with distances $1,2,\dots,j$ (looking right to left) and $n -1-j$ positions to the right of $j$, with distances $1,2,\dots, n-1-j$ (looking left to right). Using the well-known formula for summing a sequence of consecutive integers,$$\delta_j = \frac{j(j+1)}{2}+\frac{(n-1-j)(n-j)}{2}.$$It follows that if symbol $i$ lands in position $j$, $\sum_k d_{ik} = \delta_j$. Considering all possible landing spots for symbol $i$ leads to the following constraints:$$\sum_k d_{ik} = \sum_j \delta_j x_{ij}\quad \forall i.$$A little experimentation showed that the combination of those constraints plus Rob's constraint improved MIP1 more than either did alone. That's the good news. The bad news is that I still haven't been able to prove optimality for Hardmath123's solution. I ran MIP1 with the extra constraints (using branching priorities, which help), using the Hardmath123 solution as the start and changing the CPLEX emphasis setting to 3. That last bit tells CPLEX to emphasize pushing the bound, which makes sense when you think you have the optimal solution and you're trying to get to proof of optimality.
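As an aside, both identities (Rob's constant total and the $\delta_j$ formula) are easy to confirm numerically. A quick Python check, mine and not part of any of the models:

```python
# Check two distance identities for a one-dimensional keyboard with
# positions 0..n-1:
#   (1) the sum over pairs j < k of (k - j) equals (n^3 - n) / 6;
#   (2) delta_j = sum over k != j of |k - j|
#               = j(j+1)/2 + (n-1-j)(n-j)/2.

def total_pairwise_distance(n):
    return sum(k - j for j in range(n) for k in range(j + 1, n))

def delta(j, n):
    return sum(abs(k - j) for k in range(n) if k != j)

for n in range(2, 30):
    assert total_pairwise_distance(n) == (n ** 3 - n) // 6
    for j in range(n):
        assert delta(j, n) == j * (j + 1) // 2 + (n - 1 - j) * (n - j) // 2

print("identities hold for n = 2 through 29")
```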
After 338 minutes (5.6 hours), the incumbent had not improved (so it's looking good as a possible optimum), but the gap was still around 14.5%, and the search tree was north of 1.25 GB and growing. To get a sense of what I might call the "futility index", it took eleven minutes to shrink the gap from 14.58% to 14.48%.<br /><br />Here are a few final takeaways from all this.<br /><ul><li>There are often multiple ways to write a MIP model for a discrete optimization problem. They'll often perform differently, and (at least in my experience) it's hard to anticipate which will do best.</li><li>Constraint programming has potential for generating good incumbent solutions quickly for some discrete optimization problems, particularly those whose structure fits with the available selection of global constraints in the solver. Permutation problems will meet this criterion. CP may not be particularly helpful for proving optimality, though, since the bounding is weak.</li><li>Seemingly redundant constraints can help ... but there's no guarantee. (Rob suggested an additional constraint in a comment to part III. I tried it, and it slowed things down.)</li><li>Some days you get the bear, and some days the bear gets you.</li></ul>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-45854084522460546202018-12-15T17:29:00.000-05:002018-12-16T15:53:07.235-05:00 Of Typewriters and Permutations (IV)I'm continuing the recent theme of solving a quadratic assignment problem that lays out the 26 letters of the English alphabet on a one-dimensional "keyboard" for an 18th century typewriter. I thought this would be the last post, but something new turned up, so there will likely be one more.<br /><br />In the <a href="https://hardmath123.github.io/crown-typewriter.html" target="_blank">blog post that started all this</a>, Hardmath123 found a solution (via a heuristic) with cost 5,499,341. 
Using frequency data from Nate Brixius, I get a slightly higher cost (5,510,008), which is the value I will use for it (because, hey, it's my blog). Hardmath123 suspected but could not prove that this layout is optimal. I still can't verify optimality (but maybe in the next post I will ... or maybe not).<br /><br />In my previous epistles on this subject, I tried out three MIP models and a quadratic (integer) program. In five minute runs using a beta copy of the next version of CPLEX, the best I was able to do was a solution with objective value 5,686,878.<br /><br />Out of curiosity, I tried a <a href="https://ibmdecisionoptimization.github.io/docplex-doc/cp.html" target="_blank">constraint programming</a> (CP) model (using CP Optimizer, the constraint solver in the CPLEX Studio suite). Constraint programming is best suited to models with integer variables (which we have here), although it can handle floating-point variables to some extent. It is well suited to logic constraints, and in particular is popular in scheduling, but it's not my go-to tool in part because it tends to have very weak bounding.<br /><br />Defining the constraint programming model for the keyboard problem (denoted "CP" in <a href="https://gitlab.msu.edu/orobworld/typewriter" target="_blank">my code</a>) is fairly straightforward, but rather different from any of the MIP/QP models. We use general integer variables rather than binary variables to determine the position assignments. So $p_i\in \{0,\dots,25\}$ is the position of symbol $i$ in the layout, $s_j \in \{0,\dots,25\}$ is the symbol at position $j$, and $d_{ik}\in \{0,\dots,25\}$ is the distance between symbols $i$ and $k$. (Recall that we are using zero-based indexing.) As before, the objective is$$\min \sum_i \sum_j f_{ij}d_{ij},$$where $f_{ij}$ is the frequency with which symbol $j$ follows symbol $i$.<br /><br />Where it gets interesting is in specifying the constraints. 
Although CP is designed for logic constraints ("or", "and", "if-then"), its real power comes from what they call "global constraints", specialized constraints that tie multiple variables together. Arguably the most common global constraint is "all-different", meaning that in a group of variables no two of them can take the same value. It is the key to defining permutations, and in our model we add all-different constraints for both $p$ (no two symbols have the same position) and $s$ (no two slots have the same symbol). One or the other of those is technically redundant, but redundant constraints in a CP model can help the solver, so it's probably worth including both.<br /><br />Another global constraint provided by CP Optimizer is "inverse". CP models allow variables to be used to index other variables, something we cannot do directly in math programming models. The inverse constraint, applied to two vector variables $u$ and $v$, says that $u_{v_i}=i$ and $v_{u_i}=i$. In our model, we can apply the inverse constraint to $p$ and $s$. In effect, it says that the symbol in the position occupied by symbol $i$ is $i$, and the position of the symbol in position $j$ is $j$.<br /><br />Finally, we can define the distance variables using the position variables:$$d_{ij} = |p_i - p_j|\quad \forall i,j.$$ How does the CP model perform? As I expected, the bounding is pretty bad. In a five minute run, the lower bound only makes it to 447,795 (91.87% gap). The good news is that the CP model finds Hardmath123's solution with objective value 5,510,008 ... and finds it in about 22 seconds on my PC! This is using the default search strategy; I did not assign branching priorities.<br /><br />The take-away here, from my perspective, is that in some problems a CP model can be a great way to generate an optimal or near-optimal incumbent, which you can use as a starting solution in an optimization model if you need to prove optimality.Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-25500882585211795782018-12-14T20:23:00.000-05:002018-12-14T20:23:42.831-05:00Of Typewriters and Permutations (III)This continues the discussion (okay, monologue) from the two previous posts about the problem of laying out a one-dimensional typewriter keyboard. This is not the last post in the series, but I can at least guarantee that the series is converging.<br /><br />In the previous post, I gave a MIP model (named MIP1) that used binary variables $x_{ij}$ to signal whether or not symbol $i$ was placed at position $j$ on the keyboard. It also used auxiliary variables $p_i$ for the position of symbol $i$ and $d_{ij}$ for the distance between symbols $i$ and $j$.<br /><br />You might wonder whether those auxiliary variables are worth having. I did, after the somewhat disappointing (to me) lack of progress in the lower bound when I ran MIP1. So I omitted the position variables in a new model, denoted MIP2 in the code. (Reminder: My <a href="https://gitlab.msu.edu/orobworld/typewriter" target="_blank">Java code</a> is available and free to play with.) I kept the distance variables because they make it easier to define the objective function.<br /><br />MIP2 contains the $x$ and $d$ variables from MIP1, along with the constraints that rows and columns of the $x$ matrix sum to 1. It also retains the objective function from MIP1. To tie $d_{ij}$ to $x_{i\cdot}$ and $x_{j\cdot}$, MIP2 contains the following constraints:$$d_{ij} \ge |m-k|\left(x_{ik} + x_{jm} -1\right)\quad \forall i\neq j, \forall k\neq m.$$These constraints can be entered as "normal" constraints or, using an option in the command line, as lazy constraints (meaning they'll be held off to the side and activated whenever a node LP solution violates one of them). Since the distances are symmetric, we actually need only about half of these. 
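It is worth convincing yourself that these constraints are actually tight: at any feasible 0-1 assignment, the largest right-hand side over all pairs $(k,m)$ is exactly $|p_i - p_j|$. A small Python check (mine, not part of the Java code) for a random permutation:

```python
import random

n = 8  # a small keyboard size for the demo (assumed)

# Random layout: position[i] is the slot occupied by symbol i.
position = list(range(n))
random.shuffle(position)

# Binary assignment matrix: x[i][k] = 1 iff symbol i occupies slot k.
x = [[1 if position[i] == k else 0 for k in range(n)] for i in range(n)]

for i in range(n):
    for j in range(n):
        if i == j:
            continue
        # Best lower bound implied by the MIP2 constraints
        # d_ij >= |m - k| * (x_ik + x_jm - 1), over all k != m.
        bound = max(abs(m - k) * (x[i][k] + x[j][m] - 1)
                    for k in range(n) for m in range(n) if k != m)
        assert bound == abs(position[i] - position[j])

print("constraints are tight for every pair of symbols")
```

The only pair $(k,m)$ with a positive right-hand side is the one where both symbols sit at their actual slots, which is why the maximum lands exactly on the true distance.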
Rather than risking an indexing error in my code, I added constraints of the form$$d_{ij}=d_{ji}\quad \forall i, \forall j<i$$and let the CPLEX presolver take care of eliminating the redundant constraints for me.<br /><br />Was MIP2 more efficient than MIP1? Nope, it was worse ... much worse. In a five minute run, the lower bound got stuck at zero, leaving the gap at 100%. (With the cut Rob Pratt suggested in a comment to the previous post, the lower bound got off zero but froze at 192,550, so that the gap was "only" 97.13% by the end of five minutes.)<br /><br />Having no luck with MIP2, I went back to MIP1. In an effort to improve its performance, I made one tweak: rather than putting the branching priorities on the assignment variables ($x$), I declared the position variables ($p$) to be integer (in MIP1 they were continuous) and put the branching priorities on them. Thus higher priority was given to pinning down the positions of higher usage characters. This model is designated "MIP3" in the code. It did a fair bit worse than MIP1.<br /><br />As mentioned in the post by Nate Brixius (and in my first post), the underlying problem looks like a quadratic assignment problem (QAP). So I tried a model with a quadratic objective function (identified as "QP" in the code). It contains only the $x$ variables and row/column sum constraints from MIP1. The objective function is$$\min \sum_i \sum_{j\neq i} \sum_k \sum_{m\neq k} f_{ij} |k - m| x_{ik} x_{jm}.$$This model had a better incumbent than MIP2 but a worse incumbent than either MIP1 or MIP3, and the lower bound was stuck at zero. 
(Since there are no distance variables, Rob Pratt's cut is not applicable here.)<br /><br />Here is a summary of results after five minutes, using emphasis 2 (optimality), branching priorities, and Rob's cut:<br /><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;border-color:#bbb;margin:0px auto;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#bbb;color:#594F4F;background-color:#E0FFEB;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#bbb;color:#493F3F;background-color:#9DE0AD;} .tg .tg-baqh{text-align:center;vertical-align:top} .tg .tg-lqy6{text-align:right;vertical-align:top} </style> <br /><table class="tg"> <tbody><tr> <th class="tg-baqh">Model</th> <th class="tg-baqh">Incumbent</th> <th class="tg-baqh">Bound</th> <th class="tg-baqh">Gap</th> </tr><tr> <td class="tg-baqh">MIP1</td> <td class="tg-lqy6">5,686,878</td> <td class="tg-lqy6">2,970,708</td> <td class="tg-lqy6">47.76%</td> </tr><tr> <td class="tg-baqh">MIP2</td> <td class="tg-lqy6">6,708,297</td> <td class="tg-lqy6">192,550</td> <td class="tg-lqy6">97.13%</td> </tr><tr> <td class="tg-baqh">MIP3</td> <td class="tg-lqy6">5,893,907</td> <td class="tg-lqy6">1,503,828</td> <td class="tg-lqy6">74.49%</td> </tr><tr> <td class="tg-baqh">QP</td> <td class="tg-lqy6">6,056,668</td> <td class="tg-lqy6">0</td> <td class="tg-lqy6">100.00%</td> </tr></tbody></table><br />I'm not quite done. There's one more model to try, coming up in what will hopefully be my last post on this.<br /><br />Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-58242922015330674552018-12-13T16:29:00.001-05:002018-12-15T16:19:54.778-05:00Of Typewriters and Permutations (II)This continues my <a href="https://orinanobworld.blogspot.com/2018/12/of-typewriters-and-permutations-i.html" target="_blank">previous post</a> about the problem of optimally laying out a one-dimensional typewriter keyboard, where "optimally" is taken to mean minimizing the expected amount of lateral movement to type a few selected books. As I noted there, Nate Brixius correctly <a href="https://nathanbrixius.wordpress.com/2018/11/26/optimizing-19th-century-typewriters/" target="_blank">characterized the problem</a> as a quadratic assignment problem (QAP). I'll in fact try out a quadratic model subsequently, but my inclination is always to try to linearize anything that can't outrun or outfight me. So I'll start by discussing a couple of mixed integer linear program formulations.<br /><br />The starting point for all the math programming formulations is a matrix of binary variables $x_{ij}\in \{0,1\}$, where $x_{ij}=1$ if and only if symbol $i$ is placed in slot $j$ in the keyboard. (Consistent with Nate's post, I'll be using zero-based indexing, so symbol index $i=0$ will correspond to the letter "A" and position index $j=0$ will correspond to the left edge of the keyboard.) Since each letter needs to be placed exactly once, we need the constraints $$\sum_{j=0}^{25}x_{ij} = 1\quad \forall i.$$Similarly, each slot can only contain one character, so we need the constraints $$\sum_{i=0}^{25} x_{ij} = 1\quad \forall j.$$Each row and column of $x$ can also be declared to be a type 1 specially ordered set (SOS1), but in CPLEX that tends to be useful only if you can assign "meaningful" weights to the variables in each set. 
I'll return to that later.<br /><br />Recall from the previous post that we have a $26\times 26$ matrix $f$, where $f_{ij}$ is the frequency with which symbol $j$ immediately follows symbol $i$. We can also get an overall frequency with which each symbol is used by summing the corresponding row and column of $f$. I'll denote the use frequency for symbol $i$ by $g_i$, where $g_i=\sum_{j=0}^{25}(f_{ij} + f_{ji})$. I'll use that for a couple of things, one of which is to eliminate a bit of symmetry. As I noted in that previous post, if we reverse any layout (treat it as listing symbols from right to left rather than from left to right), we get the same objective value as that of the original layout. We can break that symmetry by selecting one symbol and arbitrarily requiring it to be in the left half of the keyboard. Although it probably does not make much difference to the solver which symbol we use, I'll select the most frequently used symbol. So let $g_{i^*}=\max_i g_i$ (breaking ties arbitrarily). We will add the constraints $x_{i^*j}=0$ for $j=13,\dots,25$.<br /><br />Back to the SOS1 stuff. When you declare an SOS1 constraint in CPLEX, CPLEX wants weights. It uses the weights to do some branching on the sets. Branching at a node typically means selecting an integer variable and splitting its domain to create two children (generally by rounding the value of the variable in the node solution up or down). With an SOS1 constraint, CPLEX can partition the set of variables involved into two subsets. In either child node, one subset of variables is fixed at zero and the other subset remains unfixed. The weights are used to help CPLEX decide how to split the set of variables. Here, we can try declaring each column of $x$ to be an SOS1 using the cumulative frequencies. So we tell CPLEX for each $j$ that $(x_{0,j}, x_{1,j},\dots,x_{25,j})$ is an SOS1 with corresponding weights $(g_0, g_1,\dots, g_{25})$. 
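The bookkeeping for $g$ and the symmetry-breaking symbol $i^*$ is straightforward; here is a Python sketch (my own, with a made-up $4\times 4$ frequency matrix standing in for the real $26\times 26$ data):

```python
# Toy frequency matrix: f[i][j] = how often symbol j follows symbol i.
# (Made-up numbers; the real data lives in cro26_data.txt.)
f = [[0, 5, 1, 0],
     [2, 0, 0, 3],
     [4, 1, 0, 0],
     [0, 0, 2, 0]]
n = len(f)

# Overall usage frequency of each symbol: row sum plus column sum of f.
g = [sum(f[i][j] + f[j][i] for j in range(n)) for i in range(n)]

# Most frequently used symbol, to be pinned to the left half of the
# keyboard for symmetry breaking (ties broken arbitrarily by max()).
i_star = max(range(n), key=lambda i: g[i])

print(g)       # [12, 11, 8, 5]
print(i_star)  # 0
```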
In the code I posted, using SOS1 constraints is optional.<br /><br />Another option in the code is to assign branching priorities to the variables. This encourages CPLEX to branch on variables with higher priorities before branching on variables with lower priorities. If you were laying out the keyboard heuristically, you would probably want to put high usage symbols ("Hello, 'e'!") toward the center of the keyboard, where they would be easy to reach, and lower usage symbols ("q"?) toward the edges. So I assigned to each variable $x_{ij}$ the priority $g_i \times \min(j, 25-j)$. Again, this is an option in the code.<br /><br />If you're still awake at this point, you'll realize that I have not yet specified an objective function, which is where the linearization is needed. In my first MIP model ("MIP1" in the code), I did this by introducing a bunch of auxiliary variables. First, for each $i$ let $p_i\in [0,25]$ denote the position (slot) that symbol $i$ occupies. We define $p$ with the constraints $$p_i =\sum_{j=0}^{25} j \times x_{ij} \quad \forall i.$$Note that the $p_i$ do <i>not</i> need to be declared integer-valued. Armed with them, I can define another set of continuous variables $d_{ij}\in [0,25]$ for all $i$ and $j$, where $d_{ij}$ will be the distance between symbols $i$ and $j$ in the layout. Since $d_{ij}=|p_i - p_j|$ and we cannot use absolute values explicitly, we do the standard linearization of the absolute value function, adding the constraints $$d_{ij}\ge p_i - p_j \quad \forall i,j$$and$$d_{ij}\ge p_j - p_i \quad \forall i,j.$$(Yes, I know that when $i=j$ this gives me two copies of $d_{ii} \ge 0$. I'll let the presolver take care of that.) Now we have a very simple, clean expression of the objective function:$$\min \sum_{i=0}^{25}\sum_{j=0}^{25} f_{ij} d_{ij}.$$<br />How does this model do? I ran it for five minutes on a decent desktop PC (using four threads). 
I included both the branching priorities and the SOS1 constraints, but the CPLEX presolver eliminated all the SOS1 constraints as "redundant". It did that even if I skipped the branching priorities, which irks me a bit. Someday maybe I'll figure out why it's ignoring those carefully defined SOS1 weights. At any rate, I did the five minute run with MIP emphasis 2, which emphasizes proving optimality. After five minutes, the incumbent solution had objective value 5,706,873. That's a bit worse than the solution Hardmath123 got in the original post. (Speaking of which, Hardmath123 quoted an objective value of 5,499,341 and posted a layout. I get a value of 5,510,008 for that solution. It may be that Nate's frequency data, which I'm using, differs slightly from the frequency data Hardmath123 used.)<br /><br />Unfortunately, after five minutes the gap was still 56.55%, and closing very slowly. (The last two minutes of that five minute run only closed the gap from about 57.5% to 56.5%.) I'm pretty sure the actual optimal value will be a lot closer to 5.5 million than to the last lower bound in the five minute run (2,479,745). So we're contending with a somewhat weak bound.<br /><br /><b>Update</b>: A longer run, using MIP emphasis 3 (which focuses on improving the lower bound), still had a gap of 33% after four hours.<br /><br />The incumbent was found after a bit less than four minutes (which will be relevant as I explore other models, in future posts). Still more to come on this ...Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-60599186722396700942018-12-12T15:29:00.001-05:002018-12-13T13:58:44.002-05:00Of Typewriters and Permutations (I)This is going to be the first of a few posts on the same problem. 
My efforts are based on a blog post by Nate Brixius (<a href="https://twitter.com/734421910805721088" target="_blank">@natebrix</a>) titled "<a href="https://nathanbrixius.wordpress.com/2018/11/26/optimizing-19th-century-typewriters/" target="_blank">Optimizing 19th Century Typewriters</a>", which in turn is based on a post titled "<a href="http://hardmath123.github.io/crown-typewriter.html" target="_blank">Tuning a Typewriter</a>" by "Hardmath123". As the image in Hardmath123's post shows, an old (very old!) Crown typewriter used a "keyboard" that had all the symbols in a single row. You shifted the row left or right to get to the symbol you wanted next, then pressed a key to enter it. There was a second key, the equivalent of "shift" on today's keyboards, that let you strike an alternate symbol perched above the primary symbol at each position in the lone row.<br /><br />If we ignore numerals and punctuation marks (which both Hardmath123 and Nate did), the problem of finding an optimal layout for the keyboard amounts to finding the best permutation of the 26 letters A through Z. The criterion we are all adopting for "best" is requiring the least movement, in some average sense, to get to the next letter. This requires some estimate of how often each possible transition occurs. Hardmath123 and Nate estimated those transition frequencies from three books in the Project Gutenberg library. Since Nate was kind enough to share his frequency data with me, I'm indirectly using those same books.<br /><br />To be consistent with Nate's post, I'll use zero-based indexing, so "A" will have index 0 and "Z" will have index 25. Similarly, the leftmost slot on the keyboard will have index 0 and the rightmost will have index 25. 
The metric we have all adopted is to minimize $\sum_{i=0}^{25} \sum_{j=0}^{25} f_{ij} |p_i - p_j|$, where $f_{ij}$ is the frequency with which symbol $j$ immediately follows symbol $i$ and $p_i$ is the position in which symbol $i$ is located (so that $|p_i - p_j|$ is the distance shifted moving from symbol $i$ to symbol $j$). Note that $i = j$ is perfectly fine, since a letter can follow itself.<br /><br />A brute force solution would require $26!/2 \approx 2\times 10^{26}$ layouts to be evaluated. The division by 2 is because the problem is bilaterally symmetric: given any layout, the reverse of that layout will have the same objective value. (This relies implicitly on the fact that we do not differentiate between moving $n$ spaces to the left and moving $n$ spaces to the right.) Hardmath123 applied a heuristic that I would characterize as a version of 2-opt (with restarts) and found a solution with cost 5,499,341. Nate ran a heuristic by a third party and matched Hardmath123's result "within seconds". He also modeled the problem as a quadratic assignment problem (more on that later) and ran a solver he'd written himself, back in the day ... but did not let it run to completion, and did not say how good the incumbent was when he terminated the run.<br /><br />I tried several models, in an attempt (as yet unsuccessful) to get a provably optimal solution. All the models used a beta copy of version 12.9 of IBM's CPLEX Studio, but perform similarly with the current production version (12.8). In subsequent posts, I'll document the various models I tried and give some results. Meanwhile, you are welcome to play with my Java code, the <a href="https://gitlab.msu.edu/orobworld/typewriter" target="_blank">repository</a> for which is publicly available. In addition to my Java code and the usual miscellany (README file, license file, ...), you'll find a text file (cro26_data.txt) with the frequencies being used, which Nate kindly is letting me share. 
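The metric itself is cheap to evaluate for any candidate layout. Here is a short Python sketch (mine; it uses a random stand-in for the real frequency matrix) that computes the cost of a layout and confirms the reversal symmetry just described:

```python
import random

n = 26
random.seed(123)

# Stand-in frequency matrix (random integers); the real counts come
# from the Project Gutenberg data in cro26_data.txt.
f = [[random.randint(0, 100) for _ in range(n)] for _ in range(n)]

def cost(layout):
    """Total shift distance, where layout[j] is the symbol in slot j."""
    p = [0] * n  # p[i] = slot holding symbol i
    for slot, sym in enumerate(layout):
        p[sym] = slot
    return sum(f[i][j] * abs(p[i] - p[j])
               for i in range(n) for j in range(n))

layout = list(range(n))
random.shuffle(layout)

# A layout and its mirror image have the same objective value.
assert cost(layout) == cost(layout[::-1])
print(cost(layout))
```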
To run the code, you will need a recent version of the CPLEX Studio suite, and you will need to put all three major pieces (CPLEX, CPOptimizer and OPL) on your library and class paths. You'll also need the open source <a href="https://commons.apache.org/proper/commons-cli/index.html" target="_blank">Apache Commons CLI</a> library, which I use to process command line options. Once you have the code installed and linked up, you can run the program with the command line option -h or --help (that's a double dash before "help") to get a list of what can/should be specified as options.<br /><br />More to come ...Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-89864350871464115192018-11-09T16:44:00.000-05:002018-11-09T16:44:49.520-05:00Stepwise Regression Code RevisitedI've added a few more tweaks to the stepwise regression code I published back in 2011. (If you wish, see <a href="https://orinanobworld.blogspot.com/2011/02/stepwise-regression-in-r.html" target="_blank">here</a> for the original post and <a href="https://orinanobworld.blogspot.com/2017/08/updated-stepwise-regression-function.html" target="_blank">here</a> for a subsequent update.) The code does stepwise regression using F tests (or, equivalently, p-values of coefficients), which is a bit old-fashioned but apparently how it is still taught in some places. The latest update supplies default values for the alpha-to-enter and alpha-to-leave values. The default values (0 and 1 respectively) are consistent with forward and backward stepwise. For forward stepwise, you would start with a bare-bones initial model, set your alpha-to-enter, and omit alpha-to-leave. For backward stepwise, you would start with a large initial model, set alpha-to-leave and omit alpha-to-enter. Both are demonstrated in the notebook.<br /><br />The update also allows you to use the R shortcut of typing "." 
in a formula (meaning "all variables except the dependent variable"). The "." shortcut only works if you specify the data source as an argument to the function. You cannot use "." while omitting the data argument and relying on having the data source attached. Again, there are demonstrations in the notebook.<br /><br />The code is free to use under a Creative Commons license. It comes in the form of an <a href="https://www.msu.edu/~rubin/code/stepwise_demo.nb.html" target="_blank">R notebook</a>, which both defines the <span style="font-family: "Courier New", Courier, monospace;">stepwise()</span> function and does some demonstrations. From that web page, you should be able to download the notebook file using the select control labeled "Code" in the upper right corner. You can also get the files from my <a href="https://gitlab.msu.edu/orobworld/stepwise/" target="_blank">Git repository</a>. The Git repository also has an issue tracker, although I think you will need to create an account in order to add an issue.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-30089721208475379432018-10-29T19:36:00.001-04:002018-10-29T19:36:39.545-04:00Pseudocode in LyXFair warning: This post is for <a href="https://www.lyx.org/" target="_blank">LyX</a> users only. When I'm writing a paper or presentation in LaTeX (using LyX, of course) and want to include a program chunk or algorithm in pseudocode, I favor the <a href="https://ctan.org/pkg/algorithmicx" target="_blank"><span style="font-family: "Courier New", Courier, monospace;">algorithmicx</span> package</a> (and specifically the <span style="font-family: "Courier New", Courier, monospace;">algpseudocode</span> style). There being no intrinsic support for the package in LyX, I have to enter the various commands as raw LaTeX (or, in LyX parlance, "ERT" -- "Evil Red Text", so named for historical reasons). 
Unfortunately, I do this seldom enough that I do not remember said commands, so each sojourn into pseudocode involves finding and opening the <a href="http://mirrors.ctan.org/macros/latex/contrib/algorithmicx/algorithmicx.pdf" target="_blank">documentation file</a> for <span style="font-family: "Courier New", Courier, monospace;">algorithmicx</span>.<br /><br />I finally decided to fix that by writing a pseudocode module for LyX. The beta version (which seems pretty stable to me) is available under a <a href="https://opensource.org/licenses/GPL-2.0" target="_blank">GPLv2 license</a> from my <a href="https://github.com/prubin73/algpseudocode" target="_blank">Github repository</a>. Besides a "gitignore" file (which you should, as the name suggests, ignore), there is the module file and a combination user guide/demonstration LyX document. Installation instructions are included in the latter. You will, of course, need to install the <span style="font-family: "Courier New", Courier, monospace;">algorithmicx</span> package before using the module.<br /><br />Should you decide to try it, there's an issue tracker in the repository where you can report bugs or feature requests. I hope someone else can get some use out of this, but even if not, the development time will pay itself back the next time I need to write some pseudocode.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-48308788411532528272018-10-28T14:34:00.000-04:002018-10-28T14:34:10.977-04:00B.S.-ing PreciselyIn a recent blog post titled "<a href="https://www.johndcook.com/blog/2018/10/26/excessive-precision/" target="_blank">Excessive Precision</a>", John D. Cook points out the foolishness of articulating results to an arbitrarily high degree of precision when the inputs are themselves not that precise. To quote him:<br /><blockquote class="tr_bq"><strong>Excessive precision is not the mark of the expert</strong>. 
Nor is it the mark of the layman. It’s the mark of the intern.</blockquote>He also mentions that it is typically difficult to assess the precision of the output even if we know the precision of the input. This is in part a matter of possible nonlinearity (and quite possibly opaqueness, as in "black box") in the mechanism that transforms inputs into outputs. It can also be caused by the inherent imprecision of floating point arithmetic (rounding error, truncation error, spiteful quantum effects, ...).<br /><br />In operations research, there are myriad other sources of imprecision. Your conceptual model of the system is an approximation of the actual system. You may have linearized things that are not linear, or used "convenient" nonlinear representations (polynomials, exponential functions) for things that are nonlinear in a goofy way. If you are like me, you will have ignored randomness entirely, because the work is hard enough even when you pretend determinism. (Also, I confuse easily.) If you did bake in some explicit consideration of randomness, you likely approximated that. (How many things in life are really normally distributed?) There's also the matter of the passage of time. Did you make all the coefficients, distributions etc. time-varying? Didn't think so (and don't really blame you). Throw in estimation error for the model parameters and you have a witches' brew of imprecision. (It's almost Halloween, so I had to work in at least one spooky metaphor.)<br /><br />This brings to mind two running questions that I encounter with discrete optimization models. I doubt either has a definitive answer. The first question is whether there is any benefit to running the solver all the way to "proven optimality" (gap nearly zero) if everything is so approximate. My personal preference is to do it when it doesn't take too long (why not?) 
but not bother if the gap is closing at a painfully slow rate.<br /><br />The second question is whether to bother using a MIP model and solver at all, or just run some metaheuristic (preferably one with a cool biologically-inspired name, such as "banana slug foraging optimization"). After all, if you are just getting an answer to an approximation of the actual problem, why not get an approximate answer to the approximation? My personal inclination is to solve the model exactly when possible, in the hope that the model is at least "close" to reality and thus an optimal solution to the model will be "close" to an optimal solution to the real problem. At the same time, I recognize that there is a lot of hoping going on there, so if getting an exact solution is painful, I'll switch to a heuristic or metaheuristic and hope that the answer I get proves decent in practice.<br /><br />Either way, I'm not going to claim the "optimal" value of some variable is 2.31785 with any degree of certainty. It's about 2.3 (if we're lucky). On the bright side, a lot of the problems I deal with have binary variables. There's not a lot of concern there about artificial precision; the only concern is artificial confidence in the solution.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-42318577500420907152018-09-21T16:32:00.000-04:002018-09-21T16:32:30.535-04:00Coordinating Variable SignsSomeone asked me today (or yesterday, depending on whose time zone you go by) how to force a group of variables in an optimization model to take the same sign (all nonpositive or all nonnegative). Assuming that all the variables are bounded, you just need one new binary variable and a few constraints.<br /><br />Assume that the variables in question are $x_1,\dots,x_n$ and that they are all bounded, say $L_i \le x_i \le U_i$ for $i=1,\dots,n$. 
If we are going to allow variables to be either positive or negative, then clearly we need $L_i < 0 < U_i$. We introduce a new binary variable $y$ and, for each $i$, the constraints$$L_i (1-y) \le x_i \le U_i y.$$If $y$ takes the value 0, every original variable must be between its lower bound and 0 (so nonpositive). If $y$ takes the value 1, every original variable must be between 0 and its upper bound (so nonnegative).<br /><br />Note that trying to enforce strictly positive or strictly negative rather than nonnegative or nonpositive is problematic, since optimization models abhor strict inequalities. The only workaround I know is to change "strictly positive" to "greater than or equal to $\epsilon$" for some strictly positive $\epsilon$, which creates holes in the domains of the variables (making values between 0 and $\epsilon$ infeasible). Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com7tag:blogger.com,1999:blog-8781383461061929571.post-18595398495529745342018-09-12T18:03:00.000-04:002018-09-12T18:03:16.281-04:00Choosing "Big M" ValuesI seem to bring up "big M" models a lot, so apologies if I end up repeating myself in places here. Not long ago, someone passed along highlights of a "big M" type model to me and asked if he could somehow reformulate to get rid of $M$. I did not see any good way to do it, but I did offer up suggestions about choosing values for $M$, and I thought that might make a decent blog post.<br /><br />Just for clarity, what I'm referring to is an integer or mixed-integer program (I'll stick to linear objective and constraints) in which binary variables are used to, in loose terms, turn constraints on or off. So a representative constraint might look like the following:$$\sum_i a_i x_i \le b + M(1-y)$$with the $a_i$ coefficients, the $x_i$ variables (discrete, continuous or a mix), $y$ a binary variable and $M$ a "large" coefficient. Think of $M$ as a stunt double for infinity. 
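<br /><br />Before dissecting the logic, here is a tiny self-contained check of the on/off behavior. All the numbers are made up purely for illustration (they do not come from any real model):

```python
# Toy illustration of the big-M constraint  a.x <= b + M*(1 - y).
a = [3.0, 2.0]          # constraint coefficients (made up)
b = 12.0                # right-hand side (made up)
M = 100.0               # "big M" -- assumed large enough for the box below
x_bounds = (0.0, 10.0)  # each x_i lies in [0, 10]

def satisfied(x, y):
    """Check the big-M form of the constraint for a given x and binary y."""
    lhs = sum(ai * xi for ai, xi in zip(a, x))
    return lhs <= b + M * (1 - y)

# y = 1: the constraint is "on"; x = (5, 5) gives a.x = 25 > 12, so it fails.
print(satisfied([5.0, 5.0], 1))    # False
# y = 0: the constraint is "off"; the same x passes, since 25 <= b + M = 112.
print(satisfied([5.0, 5.0], 0))    # True
# Worst case over the box: a.x = 50, so any M >= 50 - 12 = 38 is "correct"
# for this toy instance.
print(satisfied([10.0, 10.0], 0))  # True
```

The last case is the worst one over the box $[0,10]^2$, which is why any $M \ge 38$ would be "correct" (in the sense defined below) for this toy instance.<br /><br />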
The notion is that if $y=1$, the constraint is "turned on" and reduces to$$\sum_i a_i x_i \le b.$$If $y=0$, the constraint is "turned off" and reduces to$$\sum_i a_i x_i \le b + M \approx \infty,$$which should be satisfied by any choices for $x$ the solver might make. There are other variations of "big M" constraints, but I'll stick with this one for illustrative purposes.<br /><br /><h3>The perils of $M$</h3><br />Choosing a value for $M$ can be a tricky business. Suppose first that we choose $M$ small enough that, when $y=0$ and the constraint should be "off",$$\sum_i a_i x_i^* \gt b + M$$for some solution $x^*$ that <i>should</i> be optimal but now appears infeasible to the solver. The result is an incorrect solution. So let's refer to a value for $M$ as <b>correct</b> if it is large enough that no potentially optimal solution violates the constraint when $y=0$. ("Correct" is my term for this. I don't know if there is an "official" term.) Correctness is essential: choose an incorrect (too small) value for $M$ and you risk getting an incorrect solution.<br /><br />Since it is frequently not obvious how large $M$ needs to be in order to be guaranteed correct, people tend to err on the side of caution by choosing really massive values for $M$. That brings with it a different set of problems (ones often ignored in introductory text books). First, branch-and-bound (or branch-and-cut) solvers tend to rely on the continuous relaxation of subproblems (dropping integrality constraints but keeping the functional constraints). Large values of $M$ make for weak relaxations (producing very loose bounds).<br /><br />Suppose, for instance, that we are solving a model that both builds roads and routes traffic along those roads. $y$ represents the decision to build a particular road ($y=1$) or not ($y=0$). 
If we build the road, a cost $c$ is incurred (represented by the term $cy$ in the objective function) and we can send unlimited traffic along it; if not, there is no cost but no traffic is permitted. In a constraint of this form (with the roles of $y$ and $1-y$ swapped, so that the right-hand side is $b + My$), the left side is traffic on the road and $b=0$, so traffic there can either be up to $M$ (if built) or 0 (if not built). Now suppose, for example, that the user chooses $M=10^9$ and the solver, in a continuous relaxation of some subproblem, sets $y=10^{-6}$. The solution to the relaxed problem pays only one millionth of $c$ for the privilege of allowing 1,000 units of traffic on this route, which basically allows it to "cheat" (and perhaps grossly underestimates the cost of any actual solution).<br /><br />A related problem is limited arithmetic precision. Since double-precision floating-point arithmetic (the way the computer does arithmetic with real numbers) has limited precision, and is thus susceptible to rounding errors, solvers have to establish standards for "close enough to feasible" and "close enough to integer". Continuing the previous example, it is entirely possible that the solver might look at $y=10^{-6}$, decide that is within integrality tolerance, and treat it as $y=0$, possibly leading to what it thinks is an "optimal" solution with 1,000 units of traffic running over a road that does not exist. Oops.<br /><br />Finally, note that $M$ is part of the coefficient matrix. Mixing very large coefficients (like $M$) with much smaller coefficients can create numerical instability, leading the solver to spend more time computing linear program pivots and possibly leading to totally erroneous solutions. That's too big a topic to get into here, but I'm pretty sure I've mentioned it elsewhere.<br /><br />Despite all this, "big M" models are still very common in practice. There is a nice paper by Codato and Fischetti [1] that shows how a form of Benders decomposition can be used to get rid of the $M$ coefficients. 
I've used it successfully (for instance, in [2]), but Benders decomposition is an advanced technique (i.e., not for the faint of heart), and is not always supported by high-level modeling languages.<br /><br />So, if we are stuck with "big M" models, what can we do to choose values of $M$ that are both correct and <b>tight</b> (again, my term), meaning small enough that they avoid numerical problems and hopefully produce relaxations with bounds strong enough to do some good?<br /><br /><h3>Not all $M$ are created equal</h3><br />Most "big M" models have more than one constraint (and frequently many) containing a large coefficient $M$. One of my pet peeves is that authors of textbooks and journal articles will frequently just use $M$, with no subscripts, everywhere such a coefficient is needed. This, in turn, breeds a tendency for modelers to choose one large value for $M$ and use it everywhere.<br /><br />Back when manuscripts were produced on typewriters, it was a bit of a pain in the ass to put in subscripts, so I can see how the trend would have started. (Quick tangent: Nate Brixius recently did a blog post with a picture of a typewriter, for those too young to be familiar with them. I'll <a href="https://nathanbrixius.wordpress.com/2018/09/12/the-origin-of-cc-and-bcc/" target="_blank">link it here</a> for the benefit of younger readers ... and also note that the one pictured is much more modern than the one I learned on, which was not electric.) Today, when people routinely use LaTeX to write manuscripts, there's not much excuse for omitting one measly subscript here and there. Anyway, my point is that it is usually better to use a different, customized value of $M$ for each constraint that needs one.<br /><br /><h3>Customized how?</h3><br />In some cases, model context will provide you an obvious choice for $M$. For example, suppose you are selecting warehouse locations and planning shipments from warehouses to customers. 
A typical "big M" constraint will look like the following (slightly different from our previous constraint but clearly related:$$\sum_j x_{ij} \le M_i y_i.$$Here variable $x_{ij}$ indicates the amount shipped from warehouse $i$ to customer $j$, binary variable $y_i$ is 1 if we use that warehouse and 0 if we do not, and the intent of the constraint is to say that if we do not use (and pay for) a warehouse, we cannot ship anything from it. One obvious choice for $M_i$ is the capacity of the warehouse. A better choice might be the smaller of that capacity or the maximum volume of customers demands that might plausibly be assigned to that warehouse. The latter might be based, for example, on knowing that the company would not ship anything to customers outside a certain distance from the warehouse.<br /><br />In other cases, there may be some less obvious way to extract suitable (correct and hopefully tight) values for the $M_i$. A while back, I was working on mixed-integer programming models for two group discriminant analysis, and in one paper ([3]) I needed to pick values for $M$. Without going into gory details, I was able to normalize the coefficients of the (linear) discriminant function under construction and then compute valid coefficients $M_i$ by looking at the euclidean distances between pairs of observations. I don't claim that the $M_i$ I came up with were the tightest possible, but they were correct and they produced faster solution of the models than what I got with some arbitrarily large values I initially tried.<br /><br />Finally, you can actually solve subproblems to get correct and (hopefully) tight $M$ values. Whether this is feasible depends on how many you need and how large and scary your model is. Going back to my first example, the trick would work something like this. Relax all integrality constraints. Drop the constraint in question. 
If you have any other "big M" constraints for which you have not yet computed tight values of $M$, pick something safely large for those coefficients, trying to avoid numerical problems but not worrying about tightness. Now switch the objective to maximizing $\sum_i a_i x_i - b$. The optimal objective value is your value for $M$. It is clearly correct: any feasible solution to the MIP model is a feasible solution to the LP relaxation, and so the value of $\sum_i a_i x_i - b$ cannot exceed your choice of $M$ (regardless of whether $y$ is 0 or 1). Repeat for each additional constraint containing a "big M" coefficient (switching from minimizing to maximizing if the nature of the constraint warrants it).<br /><br />The process may be time-consuming, but at least you are solving LPs rather than MIPs. It is also a bit trickier than I made it sound, at least when multiple $M_i$ are involved. You have to guess values for any $M_i$ for which you have not yet solved, and guessing big values for them may result in a looser relaxation than necessary, which in turn may result in an inflated value for the $M_i$ you are currently choosing. You should definitely get correct choices for the $M$ coefficients; it's just the tightness that is in question. Even so, the values you get for the $M_i$ just might be tight enough to save you more time in the MIP solution than it costs to solve the LPs.<br /><br /><h3>References</h3><br />[1] Codato, G. and Fischetti, M. <i>Combinatorial Benders' Cuts for Mixed-Integer Linear Programming</i>. Operations Research, <b>2006</b>, Vol. 54(4), pp. 756-766.<br /><br />[2] Bai, L. and Rubin, P. A. <i>Combinatorial Benders Cuts for the Minimum Tollbooth Problem</i>. Operations Research, <b>2009</b>, Vol. 57(6), pp. 1510-1522.<br /><br />[3] Rubin, P. A. <i>Heuristic Solution Procedures for a Mixed-Integer Programming Discriminant Model</i>. Managerial and Decision Economics, <b>1990</b>, Vol. 11(4), pp. 255-266. Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com3tag:blogger.com,1999:blog-8781383461061929571.post-5385644160804986132018-08-25T15:32:00.000-04:002018-08-25T15:32:15.553-04:00Adding Items to a SequenceA <a href="https://www.or-exchange.org/questions/14874/mip-formulation-of-tsp-recreation-of-lns" target="_blank">question</a> posed on OR-Exchange in 2017 asked the following: Given a tour of nodes, how does one best add two new nodes while respecting the ordering of the original tour? Specifically, the author began with a tour 0 - 1 - 2 - 4 - 6 - 0 (where node 0 is a depot) and wanted to add new stops 3 and 5 in such a way that, in the revised tour, stop 1 still came before stop 2, stop 2 before stop 4, etc.<br /><br />This problem can arise not just in vehicle routing but in many sorts of sequencing problems (such as scheduling jobs for production). Of course, preserving the original ordering to the extent possible is not always a concern, but it might be if, for instance, the existing stops are customers who have been promised somewhat general time windows for delivery. In any event, we'll just take the question as a given.<br /><br />The answer I posted on OR-X made the somewhat charitable (and, in hindsight, unwarranted) assumption that the two new stops would be inserted by breaking two previous arcs, rather than consecutively (for instance, ... - 2 - <span style="color: blue;">3 - 5</span> - 4 - ...). So I'll post an answer without that assumption here. In fact, I'll post three variants, one specific to the case of adding exactly two stops and the other two more general.<br /><br />First, let me articulate some common elements. I'll denote the set of original nodes by $N_1$, the set of nodes to be added by $N_2$, and their union by $N=N_1 \cup N_2$. All three approaches will involve setting up integer programming models that will look for the most part like familiar routing models. 
So we will have binary variables $x_{ij}$ that will take the value 1 if $j$ immediately follows $i$ in the new tour. We will have constraints ensuring that every node is entered and exited exactly once:$$\sum_{j\in N} x_{ij} = 1\quad \forall i\in N\\ \sum_{i \in N} x_{ij} = 1 \quad\forall j\in N.$$The objective function will be some linear combination of the variables (sum of distances covered, sum of travel times, ...), which I will not worry about here, since it is no different from any sequencing model.<br /><br />The first new wrinkle is that we do <i>not</i> define a variable for every pair of nodes. We create $x_{ij}$ only for the following combinations of subscripts: \begin{align*} i & \in N_{2},j\in N_{2},i\neq j\\ i & \in N_{1},j\in N_{2}\\ i & \in N_{2},j\in N_{1}\\ i & \in N_{1},j\in N_{1},(i,j)\in T \end{align*} where $T$ is the original tour. Thus, for example, we would have $x_{24}$ but not $x_{42}$, nor $x_{26}$. The rationale is straightforward: if we add an arc between two original nodes that were not successors on the original tour, we will force an order reversal. For instance, suppose we replace the arc 2 - 4 with, say, 2 - 6. Node 4 now must appear either before node 2 or after node 6, and either way the order has not been preserved.<br /><br /><h3>Version 1</h3><br />The first variant makes explicit use of the fact that we have only two new nodes. We add one subtour elimination constraint, to prevent the new nodes from forming a subtour: $x_{35}+x_{53}\le 1.$ Now consider how many different ways we could insert the two new nodes. First, we could break two links in the original tour, inserting 3 in the void where the first link was and 5 in the void where the second link was. Since the original tour had five links there are $\binom{5}{2}=10$ distinct ways to do this. Similarly, we could break two links but insert 5 first and 3 later. There are again ten ways to do it. 
Finally, we could break one link and insert either 3 - 5 or 5 - 3 into the void. With five choices of the link to break and two possible orders, we get another ten results, for a grand total of 30 possibly new tours.<br /><br />With that in mind, consider what happens if node 3 is inserted after original node $i$, breaking the link between $i$ and its original successor $j$. (In our model, this corresponds to $x_{i3}=1$.) If this is a single node insertion, then we should have $j$ follow node 3 ($x_{3j}=1$). If it is a double insertion ($i$ - 3 - 5 - $j$), we should have $x_{35}=x_{5j}=1$. We can capture that logic with a pair of constraints for each original arc: <br />\[ \left.\begin{aligned}x_{i3}-x_{3j} & \le x_{35}\\ x_{i3}-x_{3j} & \le x_{5j} \end{aligned} \right\} \forall(i,j)\in T. \] We could do the same using node 5 in place of node 3, but it is unnecessary. If node 3 is correctly inserted by itself, say between $i$ and $j$, and node 5 is inserted after original node $h$, then the original successor $k$ of $h$ needs a new predecessor. That predecessor cannot be $h$, nor can it be any other original node (given our reduced set of variables), nor can it be node 3 (which now precedes $j$). The only available predecessor is 5, giving us $h$ - 5 - $k$ as expected.<br /><br />You might wonder how this accommodates a 5 - 3 insertion, say after node $i$. The original successor $j$ of $i$ needs a new predecessor, and 3 is the only eligible choice, so we're good.<br /><br />I tested this with a small Java program, and it did in fact find all 30 valid revised tours (and no invalid ones).<br /><br /><h3>Version 2</h3><br />Version 2, which can be applied to scenarios with any number of new nodes, involves building a standard sequencing model with subtour elimination constraints. The only novel element is the reduced set of variables (as described above). 
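<br /><br />(As an aside, the count of 30 valid revised tours claimed under Version 1 is easy to confirm with a few lines of brute force; here is a quick stand-in for the Java check mentioned above:)

```python
# Brute-force check that there are exactly 30 ways to insert stops 3 and 5
# into the tour 0 - 1 - 2 - 4 - 6 - 0 while preserving the original order.
base = [0, 1, 2, 4, 6]  # tour as a list; the return arc 6 -> 0 is implicit

tours = set()
# Insert 3 into any of the 5 gaps (after each node, where "after 6" means
# on the closing arc back to the depot), then 5 into any of the resulting
# 6 gaps of the lengthened tour.
for i in range(1, len(base) + 1):
    with3 = base[:i] + [3] + base[i:]
    for j in range(1, len(with3) + 1):
        tours.add(tuple(with3[:j] + [5] + with3[j:]))

print(len(tours))  # 30

# Sanity check: every tour preserves the relative order of stops 1, 2, 4, 6.
def order_ok(tour):
    pos = {node: k for k, node in enumerate(tour)}
    return pos[1] < pos[2] < pos[4] < pos[6]

print(all(order_ok(t) for t in tours))  # True
```

Each of the 5 gaps in the original tour can receive node 3, after which 6 gaps are available for node 5, giving $5 \times 6 = 30$ distinct tours, matching the count above.<br /><br />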
A blog is no place to explain sequencing models in their full glory, so I'll just assume that you, the poor suffering reader, already know how.<br /><br /><h3>Version 3</h3><br />In version 3, we again build a sequencing model with the reduced set of variables, but this time we use the Miller-Tucker-Zemlin method of eliminating subtours rather than adding a gaggle of subtour elimination constraints. The MTZ approach generally results in smaller models (since the number of subtours, and hence the potential number of subtour constraints, grows combinatorially with the number of nodes), but also generally produces weaker relaxations.<br /><br />The <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem" target="_blank">Wikipedia page for the TSP</a> shows the MTZ constraints, although for some reason without labeling them as such. Assume the nodes are indexed $0$ through $n$, with node $0$ being the depot, so that there are $n$ nodes other than the depot. The MTZ approach adds continuous variables $u_i, \,i\in \{1,\dots,n\}$ with bounds $0\le u_i \le n-1$. It also adds the following constraints for all eligible arcs $(i,j)$ with $i\neq 0$ and $j \neq 0$:$$u_i - u_j + n x_{ij} \le n-1.$$You can think of the $u_i$ variables as counters. The MTZ constraints say that if we go from any node $i$ (other than the depot) to any node $j$ (also other than the depot), the count at node $j$ has to be at least one larger than the count at node $i$. These constraints preclude any subtours, since a subtour (one starting and ending any place other than the depot) would result in the count at the first node of the subtour being larger than itself.<br /><br />As I mentioned, the MTZ formulation has a somewhat weaker LP relaxation than a formulation with explicit subtour elimination constraints, so it is not favored by everyone. In our particular circumstance, however, it has an additional virtue: it gives us a relatively painless way to enforce the order preservation requirement. 
All we need do is insert constraints of the form$$u_j \ge u_i + 1\quad\forall (i,j)\in T,\, i\neq 0\neq j.$$This forces the counts at the original (non-depot) nodes to increase monotonically with the original tour order, without directly impacting the counts at the new nodes.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-64507788324394708062018-07-31T17:02:00.000-04:002018-07-31T17:02:42.542-04:00NP ConfusionI just finished reading a somewhat provocative article on the <a href="https://www.cio.com/" target="_blank">CIO</a> website, titled "<a href="https://www.cio.com/article/3293010/hiring-and-staffing/10-reasons-to-ignore-computer-science-degrees.html" target="_blank">10 reasons to ignore computer science degrees</a>" (when hiring programmers). While I'm not in the business of hiring coders (although I was recently hired as a "student programmer" on a grant -- the Universe has a sense of humor), I find myself suspecting that the author is right about a few points, overstating a few and making a few that are valid for some university CS programs but not for all (or possibly most). At any rate, that's not why I mention it here. What particularly caught my eye was the following paragraph:<br /><blockquote class="tr_bq">It’s rare for a CS major to graduate without getting a healthy dose of <a href="https://en.wikipedia.org/wiki/NP-completeness" rel="nofollow noopener" target="_blank">NP-completeness</a> and <a href="https://en.wikipedia.org/wiki/Turing_machine" rel="nofollow noopener" target="_blank">Turing machines</a>, two beautiful areas of theory that would be enjoyable if they didn’t end up creating bad instincts. One biologist asked me to solve a problem in DNA sequence matching and I came back to him with the claim that it was NP-complete, a class of problems that can take a very long time to solve. He didn’t care. He needed to solve it anyway. 
And it turns out that most NP-complete problems are pretty easy to solve most of the time. There are just a few pathological instances that gum up our algorithms. But theoreticians are obsessed with the thin set that confound the simple algorithms, despite being rarely observed in everyday life.</blockquote>Any of my three regular readers will know that I periodically obsess/carp about NP-fixation, so I'm sympathetic to the tenor of this. At the same time, I have a somewhat mixed reaction to it.<br /><ul><li>"... NP-complete, a class of problems that can take a very long time to solve." This is certainly factually correct, and the author thankfully said "can" rather than "will". One thing that concerns me in general, though, is that not everyone grasps that problems in class P, for which polynomial time algorithms are known, can also take a very long time to solve. One reason, of course, is that "polynomial time" means run time is a polynomial function of problem size, and big instances will take longer. Another is that $p(n)=n^{1000}$ is a polynomial ... just not one you want to see as a (possibly attainable) bound for solution time. There's a third factor, though, that I think many people miss: the size of the coefficients (including a constant term, if any) in the polynomial bound for run time. I was recently reading a description of the default sorting algorithm in a common programming language. It might have been the one used in the Java collections package, but don't quote me on that. At any rate, they actually use two different sorting algorithms, one for small collections (I think the size cutoff might have been around 47) and the other for larger collections. The second algorithm has better computational complexity, but each step requires a bit more work and/or the setup is slower, so for small collections the nominally more complex algorithm is actually faster.</li><li>"He didn’t care. He needed to solve it anyway." I love this. 
It's certainly true that users can ask coders (and modelers) for the impossible, and then get snippy when they can't have it, but I do think that mathematicians (and, apparently, computer scientists) can get a bit too locked into theory. <major digression> As a grad student in math, I took a course or two in ordinary differential equations (ODEs), where I got a taste for the differences between mathematicians and engineers. Hand a mathematician an ODE, and he first tries to prove that it has a solution, then tries to characterize conditions under which the solution is unique, then worries about stability of the solution under changes in initial conditions or small perturbations in the coefficients, etc., ad nauseam. An engineer, faced with the same equation, tries to solve it. If she finds the solution, then obviously one exists. Depending on the nature of the underlying problem, she may or may not care about the existence of multiple solutions, and probably is not too concerned about stability given changes in the parameters (and maybe not concerned about changes in the initial conditions, if she is facing one specific set of initial conditions). If she can't solve the ODE, it won't do her much good to know whether a solution exists or not.</major digression> At any rate, when it comes to optimization problems, I'm a big believer in trying a few things before surrendering (and trying a few optimization approaches before saying "oh, it's NP-hard, we'll have to use my favorite metaheuristic").</li><li>"And it turns out that most NP-complete problems are pretty easy to solve most of the time. There are just a few pathological instances that gum up our algorithms." I find this part a bit misleading. Yes, some NP-complete problems can seem easier to solve than others, but the fundamental issue with NP-completeness or NP-hardness is problem dimension. 
Small instances of problem X are typically easier to solve than larger instances of problem X (although occasionally the Universe will mess with you on this, just to keep you on your toes). Small instances of problem X are likely easier to solve than large instances of problem Y, even if Y seems the "easier" problem class. Secondarily, the state of algorithm development plays a role. Some NP-complete problem classes have received more study than others, and so we have better tools for them. Bill Cook has a <a href="https://itunes.apple.com/us/app/concorde-tsp/id498366515?mt=8" target="_blank">TSP application for the iPhone</a> that can solve what I (a child of the first mainframe era) would consider to be insanely big instances of the traveling salesman problem in minutes. So, bottom line, I don't think a "few pathological instances" are responsible for "gum[ming] up our algorithms". Some people have problem instances of a dimension that is easily, or at least fairly easily, handled. Others may have instances (with genuine real-world application) that are too big for our current hardware and software to handle. That's also true of problems in class P. It's just that nobody ever throws their hands up in the air and quits without trying because a problem belongs to class P.</li></ul>In the end, though, the article got me to wondering two things: how often are problems left unsolved (or solved heuristically, with acceptance of a suboptimal final solution), due to fear of NP-completeness; and (assuming that's an actual concern), would we be better off if we never taught students (other than those in doctoral programs destined to be professors) about P v. NP, so that the applications programmers and OR analysts would tackle the problems unafraid?Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com4