tag:blogger.com,1999:blog-87813834610619295712021-07-26T04:03:54.357-04:00OR in an OB WorldA mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.comBlogger438125tag:blogger.com,1999:blog-8781383461061929571.post-18805433831784190182021-07-13T17:30:00.000-04:002021-07-13T17:30:07.444-04:00Big-M versus Goals (Part II)<p>This is a continuation of my <a href="https://orinanobworld.blogspot.com/2021/07/big-m-versus-goals.html" target="_blank">previous post</a>, in which I discussed the common modeling issue of using a binary variable to enforce or relax a linear constraint. The standard approach to this uses what is commonly known as a "big M" constraint, but there are alternatives. The two that I mentioned in the previous post were a version of Benders decomposition frequently termed "combinatorial Benders cuts" (CBC) and an approach specific to CPLEX using disjunctive "goals". The previous post illustrated this using a mixed integer linear program (MILP) for the two-group discriminant problem.</p><p>I've used both "big M" and CBC before, but I had never used goals, and I was curious about the relative performance of the three methods. So I coded the discriminant problem from the previous post in Java (with a slightly different objective function, which is irrelevant to the discussion at hand) and ran the various models using four data sets that I have left over from a paper I wrote 30-ish years ago. The data sets relate to breast cancer (444+239), diabetes (500+268), liver disease (145+200) and (just to prove I don't have a morbid fascination with disease) radar returns (126+225). The numbers in parentheses are the sizes of the two samples.<br /></p><p>I ran the problems using CPLEX 20.1 with a 10 minute time limit and default values for all parameters. For the CBC model, I used generic callbacks with a local copy of the LP subproblem for each thread (which will relate to memory use). The table below shows, for each combination of data set and model, the run time (in seconds), the final objective value (lower is better), the final bound (bigger is better) and the approximate peak memory use (in gigabytes, omitted for cases where the memory use was "trivial").<br /></p><p><br /></p><style type="text/css">.tg {border-collapse:collapse;border-color:#9ABAD9;border-spacing:0;} .tg td{background-color:#EBF5FF;border-color:#9ABAD9;border-style:solid;border-width:1px;color:#444; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{background-color:#409cff;border-color:#9ABAD9;border-style:solid;border-width:1px;color:#fff; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-baqh{text-align:center;vertical-align:top} .tg .tg-wn3b{background-color:#D2E4FC;text-align:center;vertical-align:middle} .tg .tg-j0tj{background-color:#D2E4FC;text-align:center;vertical-align:top} .tg .tg-nrix{text-align:center;vertical-align:middle} </style><div align="center"><table class="tg"><thead> <tr> <th class="tg-baqh">Data</th> <th class="tg-baqh">Model</th> <th class="tg-baqh">Time (s.)</th> <th class="tg-baqh">Objective</th> <th class="tg-baqh">Bound</th> <th class="tg-baqh">Memory (GB)</th> </tr></thead><tbody> <tr> <td class="tg-wn3b" rowspan="3">Cancer</td> <td class="tg-j0tj">Big M</td> <td class="tg-j0tj">2.2</td> <td class="tg-j0tj">0.0124</td> <td class="tg-j0tj">0.0124</td> <td class="tg-j0tj">---</td> </tr> <tr> <td class="tg-baqh">CBC</td> <td class="tg-baqh">6.5</td> <td class="tg-baqh">0.0124</td> <td class="tg-baqh">0.0124</td> <td class="tg-baqh">---</td> </tr> <tr> <td class="tg-j0tj">Goals<br /></td> <td class="tg-j0tj">600</td> <td class="tg-j0tj">0.0124</td> <td class="tg-j0tj">0.0056</td> <td class="tg-j0tj">1.0</td> </tr> <tr> <td class="tg-nrix" rowspan="3">Diabetes</td> <td class="tg-baqh">Big M</td> <td class="tg-baqh">600</td> <td class="tg-baqh">0.2057</td> <td class="tg-baqh">0.0614</td> <td class="tg-baqh">0.2</td> </tr> <tr> <td class="tg-j0tj">CBC</td> <td class="tg-j0tj">600</td> <td class="tg-j0tj">0.3966</td> <td class="tg-j0tj">0.0598</td> <td class="tg-j0tj">2.9</td> </tr> <tr> <td class="tg-baqh">Goals<br /></td> <td class="tg-baqh">600</td> <td class="tg-baqh">0.2239</td> <td class="tg-baqh">0.0038</td> <td class="tg-baqh">4.1</td> </tr> <tr> <td class="tg-wn3b" rowspan="3">Liver</td> <td class="tg-j0tj">Big M</td> <td class="tg-j0tj">600</td> <td class="tg-j0tj">0.2521</td> <td class="tg-j0tj">0.1243</td> <td class="tg-j0tj">0.2</td> </tr> <tr> <td class="tg-baqh">CBC</td> <td class="tg-baqh">600</td> <td class="tg-baqh">0.3244</td> <td class="tg-baqh">0.1013</td> <td class="tg-baqh">7.6</td> </tr> <tr> <td class="tg-j0tj">Goals<br /></td> <td class="tg-j0tj">600</td> <td class="tg-j0tj">0.2639</td> <td class="tg-j0tj">0.0169</td> <td class="tg-j0tj">2.8</td> </tr> <tr> <td class="tg-nrix" rowspan="3">Radar</td> <td class="tg-baqh">Big M</td> <td class="tg-baqh">91</td> <td class="tg-baqh">0.0190</td> <td class="tg-baqh">0.0190</td> <td class="tg-baqh">---</td> </tr> <tr> <td class="tg-j0tj">CBC</td> <td class="tg-j0tj">139</td> <td class="tg-j0tj">0.0186</td> <td class="tg-j0tj">0.0186</td> <td class="tg-j0tj">0.2</td> </tr> <tr> <td class="tg-baqh">Goals<br /></td> <td class="tg-baqh">600</td> <td class="tg-baqh">0.0190</td> <td class="tg-baqh">0.0066</td> <td class="tg-baqh">0.2</td> </tr></tbody></table></div><p> </p><p>Before forging ahead, I'll note one small anomaly in the results for the radar data set. Both the "big M" and CBC methods got "provably optimal" solutions (lower bound = upper bound), but the CBC solution was slightly better (misclassifying six observations, versus seven for "big M"). I think this is just an issue with tolerance settings (constraint satisfaction tolerance, integrality tolerance or a combination of the two).<br /></p><p>I've said on more than one occasion that the only valid generalization about integer programs is that there are no other valid generalizations about integer programs. That won't stop me from looking for possible patterns in the table, but it is important to keep in mind not only the small number of problems tried (four) and the specific nature (two-group discriminant analysis), but also the fact that I have fairly tight, data-based values of $M_i$ for the "big M" models. In two data sets, "big M" and CBC both got proven optimal solutions (with CBC being slower but not particularly slow). Both times, the goal approach also found an optimal solution (or optimal-ish in the radar case), but was nowhere getting the bound close to the optimal value. In the other two cases, goals did better on the primal side (found a better incumbent solution) but CBC did better on the dual side (found a tighter bound). Since CBC is busy hacking off infeasible candidate solutions and the goal approach is trying to satisfy goals, I suspect this pattern might generalize a bit.</p><p>On the two more difficult data sets (diabetes and liver disease), both CBC and goals used much more memory than "big M" did. In both cases, portions of the node file were being compressed and possibly shipped off to disk by CBC and goals, whereas "big M" never got to the point of consuming very much memory. I'll confess that I was expecting goals to be much more of a memory hog than CBC, since the goal "stack" gets replicated at every node, but CBC used much more memory than goals on the liver disease data set. Although I did not keep count, my guess is that indicates CBC was cranking out a huge number of Benders cuts. I think the memory use can be mitigated (somewhat) by making the cuts purgeable, but I did not bother to test that.</p><p>My last take-away: on the easy problems "big M" was fastest, and on the challenging problems "big M" had both the best incumbent value and the tightest bound. Again, this particular set of problems benefits from fairly tight $M_i$ values. So, henceforth, I will likely stick to "big M" whenever I think I have good values for $M$.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-91045655275327791102021-07-08T16:46:00.001-04:002021-07-08T16:46:14.166-04:00Big-M versus Goals<p>Much has been written (including by yours truly) about the trials and tribulations of "big M" integer programming models. A common use of "big M" is to let binary variables turn constraints on or off. So, for instance, $$a^\prime x \le b + Mz\quad (1)$$with $x$ a vector of continuous variables and $z$ a binary variable is intended to enforce $a^\prime x \le b$ when $z=0$ and not enforce it when $z=1$.<br /></p><p>Large values of $M$ can contribute to weak relaxations (leading to slow progress on the bound), "leakage" (where a value of $z$ can be close enough to 0 that the solver considers it 0 to within rounding error while making $Mz$ big enough to relax the constraint), and various numerical problems. Small values of $M$ may make the constraint binding when it is supposed to be relaxed. Still, for most people the "big M" approach is the only way they know to model certain problem features.</p><p>One alternative, at least in some cases, is to use "combinatorial Benders decomposition" [1], in which the constraints in question are either enforced or ignored in a subproblem depending on the values of $z$ in the master problem. The good news is that this eliminates any worries about choosing $M$, since no coefficients $M$ are needed. The bad news is that (a) Benders decomposition is a bit advanced for many users, (b) it may require custom programming (as opposed to making the model in a high-level language and letting the modeling environment pass it to the solver), and (c) Benders decomposition is an "outer approximation", so the best bound may be a bit leisurely in converging to the optimum.</p><p>There is another alternative available, at least with CPLEX. Recent versions of CPLEX support "goals". The user's manual does a masterful job of not actually defining what a goal is -- according to the CPLEX 20.1 manual, goals are things that "allow you to take control of the branch & cut search procedure used by IBM ILOG CPLEX to solve MIP problems". Basically, a goal is an alternative form of a constraint (I think), which rather than explicitly appearing in the constraint matrix is put in a stack of goals, passed to nodes when they are created, and somehow used to influence the creation of child nodes (I think).</p><p>The tie-in to today's topic is that one type of goal provided by CPLEX is an "or" goal, which is what it sounds like: a disjunction of two or more constraints or goals. So an alternative to writing constraint (1) with the dreaded $M$ would be to use an "or" goal $$a^\prime x \le b \mathrm{\quad OR \quad} z=1.\quad (2)$$</p><p>I was curious about how well this would work, so I tried to do a comparison between "big-M" and goal-based models for a two-group discriminant problem. The gist of the model is as follows. We have as data a sample of vectors $x_i\in \mathbb{R}^n$ from two groups. Let $G_0$ and $G_1$ denote the indices belong to the first and second groups respectively. We want to find coefficients $w\in \mathbb{R}^n$, $w_0 \in \mathbb{R}$ for a linear function $f(x) = w^\prime x + w_0$ such that $f(x) \lt 0$ predicts membership of $x$ in the first group and $f(x) \gt 0$ predicts membership in the second group.</p><p>The specific model I started with (from some research I did in my much younger days) includes one more variable $d\ge \delta$ (where $\delta$ is some small positive constant) and binary variables $z_i$ signaling whether an observation is correctly ($z_i =0$) or incorrectly ($z_i=1$) classified. Variable $d$ captures the minimum absolute score of a correctly classified observation, which in essence represents the amount of separation between (correct) scores for the two groups. If $d$ is too small, you may end up classifying observations positive or negative based on what amounts to rounding error, hence the lower bound on $d$.<br /></p><p>The "big M" version is as follows: $$\begin{align*} \min\quad\sum_{i}z_{i}-\epsilon d\\ \mathrm{s.t.}\quad w^{\prime}x_{i}+w_{0}+d & \le\phantom{-}M_{i}z_{i}\quad i\in G_{0}\\ w^{\prime}x_{i}+w_{0}-d & \ge-M_{i}z_{i}\quad i\in G_{1}\\ -1\le w & \le1\\ w_{0} & \quad \mathrm{free}\\ d & \ge\delta\\ z_{i} & \in\left\{ 0,1\right\} \quad\forall i. \end{align*}$$The model minimizes the number of misclassifications with a secondary criterion of maximizing separation. The coefficient $\epsilon$ is chosen to keep the objective contribution small enough that the solver is not tempted to make unnecessary misclassifications just to boost the value of $d$. Putting bounds on $w$ prevents huge coefficients for the classifier (which again could result in decisions being made based on rounding error). The model has been shown to work correctly.</p><p>The goal version of the model keeps the bounds on the variables and objective function but replaces all the "big-M" constraints with disjunctions of the form $w^\prime x_i +w_0 +d \le 0$ or $z_i=1$ for $i\in G_0$ and similarly for $i\in G_1$. In other words, "classify this observation correctly or pay the price for misclassifying it". I coded both models in Java and ran a test case, expecting both to produce an optimal classifier but unsure which would be faster. There was an unpleasant surprise waiting for me: CPLEX declared the goal-based model unbounded! It was right. You can satisfy all the disjunctions by declaring all the observations misclassified ($z_i = 1$ for all $i$). That lets you choose an arbitrarily large value for $d$, large enough that $\epsilon d$ is arbitrarily bigger than the sum of the $z$ variables, making the objective arbitrarily negative.</p><p>This is not a problem with the "big M" model, because no matter how large you make $M_i$, you still have a finite bound on the left side of each constraint. The fix was to come up with a defensible upper bound for $d$ and add it to the goal model, making the goal model bounded. With that fix in place, both models arrived at optimal solutions, in what was comparable time for the one test case I have run so far.</p><p>So the takeaway here is that if you want to use disjunctions to avoid "big M", you may need to take extra care to ensure that your model is bounded.<br /></p><p><style type="text/css">p, li { white-space: pre-wrap; }</style></p><p>[1] Codato, G. and Fischetti, M. Combinatorial Benders' Cuts for Mixed-Integer Linear Programming. <i>Operations Research</i> 54(4), <b>2006</b>, 756-766.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-77452379499435215562021-07-05T16:52:00.000-04:002021-07-05T16:52:09.105-04:00Vertex Numbering via GA<p>A <a href="https://or.stackexchange.com/questions/6519/numbering-the-vertices-of-an-n-layer-graph-so-that-edges-have-similar-numbered" target="_blank">question</a> on OR Stack Exchange asks how to number vertices in a layered graph so that the endpoints of edges have similar numbers. Note that "numbering" the vertices here means numbering <i>within layers</i> (so that, for instance, every layer has a vertex numbered 1). We will assume that every node has a unique ID before we launch into the numbering process. The author of the question chose the sum of squared differences of endpoint labels for all edges as the objective function to minimize. The following image shows an example of a layered graph with a (probably suboptimal) numbering scheme. The numbers on the edges are their contributions to the objective function.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-cIc5scuoc2k/YONYpSn2KwI/AAAAAAAADnM/ba5kQECYCQ0jqt6XASifeSIWMgb7TEtYgCLcBGAsYHQ/s536/vertexnumbering.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="example graph with three layers" border="0" data-original-height="472" data-original-width="536" height="564" src="https://1.bp.blogspot.com/-cIc5scuoc2k/YONYpSn2KwI/AAAAAAAADnM/ba5kQECYCQ0jqt6XASifeSIWMgb7TEtYgCLcBGAsYHQ/w640-h564/vertexnumbering.png" width="640" /></a></div><p></p><p>The prolific Rob Pratt noted that the problem can be modeled as an assignment problem with a quadratic objective function, using binary variables. That model produces an exact solution, given sufficient time and memory.</p><p>Note that numbering the nodes within a layer is equivalent to picking one of the possible permutations of the node IDs. The author of the question indicated receptiveness to a metaheuristic, so I decided to try coding a random key genetic algorithm (RKGA) for the problem. I've mentioned RKGAs before (for instance, <a href="https://orinanobworld.blogspot.com/2020/09/a-greedy-heuristic-wins.html" target="_blank">here</a> and <a href="https://orinanobworld.blogspot.com/2021/04/a-ga-model-for-joint-clustering-problem.html" target="_blank">here</a>). As I understand it, they were originally designed for sequencing / scheduling problems, where things need to be permuted optimally, so an RKGA seemed like a natural choice. I coded both the integer programming (IP) model and the RKGA in Java, using <a href="https://www.ibm.com/products/ilog-cplex-optimization-studio" target="_blank">CPLEX</a> 20.1 as the IP solver and <a href="https://watchmaker.uncommons.org/" target="_blank">Watchmaker Framework</a> 0.7.1 for the GA. The Watchmaker Framework has not been under active development for quite a few years, but it works well.</p><p>To use an RKGA, you need to come up with a coding for a "chromosome" (candidate solution) and a mechanism for decoding the chromosome into a solution to the original problem (in this case separate vertex permutations for each graph layer) such that the decoded chromosome is always feasible. I chose as my chromosome representation a double-precision vector with elements between 0 and 1, having one element per vertex in the original graph. Double-precision is probably overkill, but I'm in the habit of using double-precision rather than single-precision, so it was the path of least resistance. To decode the chromosome, I first had to chop it up into smaller vectors (one per layer) and then extract the sort index of each smaller vector. So, using the image above as an example, a chromosome would be a double vector with 2 + 5 + 3 = 10 components. If the last three components were (0.623, 0.021, 0.444) the sort indices would be (3, 1, 2), yielding the vertex numbering for the last layer in the image. To convert a double vector into a sort index vector, I used a Java library (<a href="https://gitlab.msu.edu/orobworld/valueorderedmap" target="_blank">ValueOrderedMap</a>, described in <a href="https://orinanobworld.blogspot.com/2019/10/a-value-ordered-java-map.html" target="_blank">this post</a>) that I wrote some time back.</p><p>A GA can "stagnate", meaning cease to improve on the best known solution. One obvious reason for stagnation is that it cannot improve on an optimal solution, but stagnation can occur for other reasons. On small test problems, the GA tended to stagnate rather quickly, so I set a stagnation limit and put the GA in a loop that would restart it up to nine times or until a specified time limit (five minutes ... I wasn't very ambitious). On larger test problems, it was not as quick to stagnate, but it eventually did.</p><p>I used the GA's best solution as a starting solution for the IP model. On smaller test problems, CPLEX occasionally closed the gap to zero within five minutes but usually did not. On larger problems, CPLEX struggled to get the best bound above zero (100% gap) within five minutes. So I suspect the IP model is prone to loose bounds. An example of a "small" test problem is one with 29 vertices spread across five layers and 35 edges. (I worked mainly with sparse examples, to keep the size of the IP model down.)<br /></p><p>Interestingly, in <i>none</i> of the trials did CPLEX improve on the initial solution provided by the GA. So it might be that the GA was consistently finding an optimum, although I cannot prove that. (On the few smaller instances where CPLEX reached provable optimality, the optimal solution was indeed the GA solution.) Note that, while the GA solution appeared optimal, that does not mean that each GA run produced an optimum. On the 29 vertex example mentioned above, CPLEX found a provable optimum with objective value 111. The GA ran 10 times (using about 92 seconds total), and of the 10 runs two stagnated at 111, five at 112, and three at 114. So even if the GA is prone to finding optimal solutions, it may require multiple starts to do so.</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-32588781743794504062021-07-01T15:19:00.002-04:002021-07-01T15:19:28.314-04:00Smallest Pairwise Difference in R<p> In a recent <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2021/06/arranging-points-on-line.html" target="_blank">blog post</a>, OR blogger Erwin Kalvelagen included some R code for a genetic algorithm, including an objective function that takes as input a vector $x\in\Re^n$ and returns the smallest absolute pairwise difference in its elements. The actual objective calls for the smallest difference in <i>consecutive</i> values, but Erwin correctly notes that the smallest absolute difference in any pair will automatically occur in consecutive values, so the "all pairs" approach yields the same objective value (and is considerably friendlier in the MIP and MINLP models he proposes).</p><p>I did a little experimenting and confirmed his opinion that the GA is unlikely to be competitive with the MIP model. That led me to a tangent involving the way the objective function for the GA is coded. Erwin used the obvious approach: two nested for loops to find all pairs of values <i>once</i>, avoiding comparing $x_i$ with $x_j$ and then (redundantly) $x_j$ with $x_i$, and avoiding comparing $x_i$ with itself. This approach does $O(n^2)$ work. An alternative approach is to first sort the vector $x$, which takes $O(n \log n)$ work, and then compute consecutive differences and their minimum value ($O(n)$). I put together a little R code to compare the two, and unsurprisingly the method with sorting is faster (and <i>much</i> faster when $n$ gets big).</p><p>There is another wrinkle to this. I've seen a number of articles and comments online asserting that explicit looping in R (as with Erwin's nested for loops) is inefficient, and should be avoided at all costs in favor of using vectorized functions (where the looping is presumably coded in C or C++ and baked into the compiled function). I've also seen contrary opinions saying that the concern about looping is overstated. There is also, I think, a middle ground: even if explicit loops are inefficient, that's likely only a concern if you are looping over very large objects, or looping many times, or both. Erwin's test case has $n=50$, which I suspect is not large enough to be worth worrying about the efficiency or inefficiency of looping (even though the GA will evaluate the objective function repeatedly).</p><p>So to test this, I cobbled together an R notebook (which you can find <a href="https://rubin.msu.domains/blog/pairwise_differences.nb.html" target="_blank">here</a>) that tests all three versions of the objective function on vectors of various dimensions. As I thought, the sorted method dominates. Even at $n=50$ it's competitive with looping (although looping appears to be slightly faster), but at $n=2,000,000$ the sorting method takes about one third of a second on my PC (using R 4.1), whereas I estimate the nested loop method would take about <i>10 days</i> (!).</p><p>The third method uses the (vectorized) outer product operator in R. It computes all $n^2$ absolute differences, whereas the nested loops calculate the $n(n-1)/2$ unique (nontrivial) differences, so in theory it does about twice the work the nested for loops do. Despite that, it is faster than the nested loops (though not nearly as fast as sorting). For $n$ around 5,000 to 10,000, the outer product approach seems to be about 10x faster than the nested loops.</p><p>So I take away two conclusions from this. The first is confirmation of a bit of wisdom that I believe I heard from Princeton computer scientist Robert Sedgewick in a MOOC based on his book (with Kevin Wayne) "<a href="https://algs4.cs.princeton.edu/home/" target="_blank">Algorithms</a>". Sorting is computationally cheap, so if you think it might help with a calculation, try it. The second is confirmation of the assertion that, even in the latest version (4.1) of R, explicit looping is probably slower, and quite possibly by a noticeable amount, than using vectorized methods ... when there are vectorized methods available.</p><p>If you are curious about the code for the alternative methods, it's embedded in the notebook.</p><p> <br /></p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-28746111860134491802021-05-10T19:17:00.003-04:002021-05-11T14:23:52.134-04:00Symmetric Difference of Sets in Java<p>The <i>symmetric difference</i> of two sets $A$ and $B$ is defined as $$C=(A \cup B)\backslash (A \cap B).$$It is the set of objects that belong to either $A$ or $B$ but not both. I've been working on some Java code for a research project in which I will need to compute the sizes of the symmetric differences of a large number of pairs of sets. The contents of the sets are nonnegative integers (Java type <span style="font-family: courier;">Integer</span>), which makes life a bit simpler because integers are nicely ordered. (In Java-speak, <span style="font-family: courier;">Integer</span> implements the <span style="font-family: courier;">Comparable<Integer></span> interface.) Since I will be doing a large number of differences, and since the sets involved will be moderately large (by my standards), I wanted to find the fastest way to compute a difference. So I wrote a Java program to generate random pairs of integer sets and compute the size of their symmetric difference four different ways.</p><p>Since the introduction of streams in Java (I believe in version 8), what I think is the most obvious/intuitive way to do it is to convert each set to a stream, filter out members of the other set, and count up the survivors. On the other hand, I am a happy user of the <a href="https://commons.apache.org/proper/commons-collections/" target="_blank">Apache Commons Collections</a> library, whose <span style="font-family: courier;">CollectionsUtils</span> class provides not one but two ways to do this. One is to use the <span style="font-family: courier;">subtract()</span> method twice, switching the order of the arguments. This does the same thing that my first (stream-based) method does. The other is to use the <span style="font-family: courier;">disjunction()</span> method, which calculates the symmetric difference in one invocation.</p><p>Equally obvious to me from a mathematical perspective, but not from a Java perspective, is to take the bitwise exclusive OR of the <a href="https://en.wikipedia.org/wiki/Indicator_vector" target="_blank">characteristic vectors</a> of the two sets. Java contains a <span style="font-family: courier;">BitSet</span> class, with <span style="font-family: courier;">set()</span> and <span style="font-family: courier;">flip()</span> methods for individual bits, which makes this easy to do.</p><p>My program runs the experiments on instances of both <span style="font-family: courier;">HashSet<Integer></span> and <span style="font-family: courier;">TreeSet<Integer></span>. Hash sets are generally faster, but tree sets have more features (and impose an internal ordering on their contents). My somewhat naive intuition was that having the contents accessed in ascending order would make differencing tree sets faster than differencing hash sets.</p><p>There are lots of moving parts here (how large the integers get, how big the sets involved in the comparisons are, ...), so I hesitate to read too much into the results. That said, here are some timing results using sets of cardinality between 100 and 200 with integers from 0 to 100,000. Times are in microseconds and are averages of 10,000 replications. (You can afford big samples when things go this quickly.)</p><p><br /></p><div align="center"><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-baqh{text-align:center;vertical-align:top} </style><table class="tg"><thead> <tr> <th class="tg-baqh">Method</th> <th class="tg-baqh">Streams</th> <th class="tg-baqh">subtract()<br /></th> <th class="tg-baqh">disjunction()</th> <th class="tg-baqh">BitSet</th> </tr></thead><tbody> <tr> <td class="tg-baqh">HashSet</td> <td class="tg-baqh">10.07</td> <td class="tg-baqh">36.40</td> <td class="tg-baqh">62.15</td> <td class="tg-baqh">9.68</td> </tr> <tr> <td class="tg-baqh">TreeSet</td> <td class="tg-baqh">23.79</td> <td class="tg-baqh">35.37</td> <td class="tg-baqh">60.94</td> <td class="tg-baqh">7.21</td> </tr></tbody> </table></div><p> </p><p>I was surprised (OK, shocked) to find that the <span style="font-family: courier;">disjunction()</span> method, which directly computes the symmetric difference, was almost twice as slow as two invocations of the <span style="font-family: courier;">subtract()</span> method. Less surprising was that turning both sets into streams and filtering each other out was faster than calling <span style="font-family: courier;">subtract()</span> twice. (Note that I am computing the size of the symmetric difference, not the symmetric difference itself. Uniting the two filtered streams or the two sets resulting from calls to <span style="font-family: courier;">subtract()</span> into one combined set might move their times closer to those of <span style="font-family: courier;">disjunction()</span>).</p><p>Another surprise to me was that the approach using streams was consistently slower with tree sets than it was with hash sets, despite the tree sets being inherently ordered. The characteristic vector (<span style="font-family: courier;">BitSet</span>) approach seemed to have a slight preference for tree sets, but I'm not entirely sure that was consistent.<br /></p><p>In this particular run, the characteristic vector approach (using <span style="font-family: courier;">BitSet</span>) beat all comers. In other runs with similar parameters, streams sometimes beat <span style="font-family: courier;">BitSet</span> and sometimes did not when the sets were instances of <span style="font-family: courier;">HashSet</span>. With <span style="font-family: courier;">TreeSet</span>, using <span style="font-family: courier;">BitSet</span> appeared to be consistently faster. Let's contrast that with a second experiment, in which the integers are still between 0 and 100,000 but the sets are smaller (cardinality between 10 and 20).</p><p><br /></p><div align="center"><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-baqh{text-align:center;vertical-align:top} </style><table class="tg"><thead> <tr> <th class="tg-baqh">Method</th> <th class="tg-baqh">Streams</th> <th class="tg-baqh">subtract()<br /></th> <th class="tg-baqh">disjunction()</th> <th class="tg-baqh">BitSet</th> </tr></thead><tbody> <tr> <td class="tg-baqh">HashSet</td> <td class="tg-baqh">3.18</td> <td class="tg-baqh">5.96</td> <td class="tg-baqh">9.75<br /></td> <td class="tg-baqh">6.24<br /></td> </tr> <tr> <td class="tg-baqh">TreeSet</td> <td class="tg-baqh">3.11<br /></td> <td class="tg-baqh">4.89<br /></td> <td class="tg-baqh">8.10<br /></td> <td class="tg-baqh">4.07<br /></td> </tr></tbody> </table></div><p> </p><p>Note that with the smaller sets the method using streams beats the characteristic vector (<span style="font-family: courier;">BitSet</span>) method, and even the Commons Collections <span style="font-family: courier;">subtract()</span> method may be a bit better than using the characteristic vector.</p><p>For my project, I am using instances of <span style="font-family: courier;">TreeSet</span>, and I'm fairly certain the sets will (mostly) have cardinality in the low hundreds, so I will probably go with the <span style="font-family: courier;">BitSet</span> approach. If anyone would like to run their own tests, my code is available in a <a href="https://gitlab.msu.edu/orobworld/setsymdiff" target="_blank">GitLab repository</a>.</p><p><b>Update</b>: It belatedly occurred to me that in my research project, where I only need to get the size of the symmetric difference and not the symmetric difference itself, it might make sense to exploit the fact that the size of the symmetric difference between $A$ and $B$ is$$\vert A\vert + \vert B \vert - 2 \cdot \vert A \cap B\vert.$$So I can compute the size of the intersection, do a little arithmetic, and have the size of the symmetric difference.</p><p>I updated my code to include two versions of this approach. One version computes the intersection by streaming one set and filtering it based on inclusion in the other set. The other version uses the <span style="font-family: courier;">CollectionUtils.intersection()</span> method. Once again, the <span style="font-family: courier;">CollectionUtils</span> method was not competitive. Comparing the stream intersection method to the original stream approach and the characteristic vector approach using the original specifications for sets (cardinality 100 to 200), it seems the streamed intersection method is about twice as fast as the original stream method on both hash sets and tree sets (which makes sense, since it does with one stream what the original method did with two). It is also about twice as fast as the characteristic vector method on hash sets, but roughly half as fast on tree sets, as seen in the table below. (Times for the first two methods differ from previous tables because exact timings are unfortunately not reproducible.)<br /></p><p><br /></p><div align="center"><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-baqh{text-align:center;vertical-align:top} </style><table class="tg"><thead> <tr> <th class="tg-baqh">Method</th> <th class="tg-baqh">Streams</th> <th class="tg-baqh">BitSet</th> <th class="tg-baqh">Intersection<br /></th> </tr></thead><tbody> <tr> <td class="tg-baqh">HashSet</td> <td class="tg-baqh">11.33</td> <td class="tg-baqh">10.33</td> <td class="tg-baqh">5.59<br /></td> </tr> <tr> <td class="tg-baqh">TreeSet</td> <td class="tg-baqh">23.32</td> <td class="tg-baqh">7.47<br /></td> <td class="tg-baqh">12.23<br /></td> </tr></tbody> </table></div><p> </p><p>So the single stream intersection approach looks to be the fastest for computing the size (but, again, not contents) of the symmetric difference if I'm using hash sets, while the characteristic vector (<span style="font-family: courier;">BitSet</span>) approach is fastest if I'm using tree sets.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-53304958082060630952021-04-28T18:09:00.001-04:002021-04-28T18:09:17.613-04:00A BDD with JGraphT<p>Decision diagrams, and in particular binary decision diagrams (BDDs) [1], were originally introduced in computer science to evaluate logic propositions or boolean functions. Lately, they've taken on multiple roles in discrete optimization [2]. I've been reading an excellent book [3] on them, with ideas about using BDDs in a current research project. As is my wont, I'll be coding the research in Java, so I wanted to do a little demo project to figure out how to build and process a BDD in Java.</p><p>Not wanting to reinvent any wheels, I looked for an open-source Java graph library with which to build the diagrams, and settled on JGraphT [4]. Not only does JGraphT have the necessary building blocks, it has much better online documentation than many libraries. Also, there is a very helpful article [5] about it on the Baeldung web site (which is itself an extremely useful site for all things Java).</p><p>A BDD is a directed, acyclic, layered multigraph with edge weights. If you're not familiar with the term "multigraph", it means that there can be two or more distinct edges between the same pair of nodes, in the same direction. In a BDD, each node represents a state of the system, with up to two outbound arcs, corresponding to true (1) or false (0) values for a particular decision variable. The decision variable is the same for all nodes in a particular layer. An arc is omitted if it represents a decision which, given the state, would make the solution infeasible. To keep the size of the BDD in check (somewhat), you do not want multiple nodes in a layer with the same state. The multigraph aspect arises because, in some circumstances, the next state may be the same regardless of the decision at the current node (so both arcs go to the same child node). Among the attractions of the JGraphT library are its support for nodes based on arbitrary classes (which in a BDD means the state at the node) and for multigraphs.</p><p>To learn how to build BDDs with JGraphT, I decided to solve a maximal independent set problem (MISP) [6] with integer node weights. This means choosing the subset of nodes with greatest total weight such that no two chosen nodes are adjacent. JGraphT contains routines to generate some well-known (to graph theorists -- less well known to me) graphs, and for convenience I chose the Chvátal graph [7], which has 12 nodes and 24 edges. Here is an illustration of the Chvátal graph, with the (randomly generated) node weights in parentheses.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-w0HsZHkKJK0/YInVM3EUt1I/AAAAAAAADgQ/ZTQ_gl2UzsYGeSIFUNiPedwRx0RVO795gCLcBGAsYHQ/s635/chvatal.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="Chvatal graph" border="0" data-original-height="635" data-original-width="576" height="640" src="https://1.bp.blogspot.com/-w0HsZHkKJK0/YInVM3EUt1I/AAAAAAAADgQ/ZTQ_gl2UzsYGeSIFUNiPedwRx0RVO795gCLcBGAsYHQ/w580-h640/chvatal.png" width="580" /></a></div>My Java program uses routines in the JGraphT library to turn the graph into a DOT file [8], which it saves. I then use GraphViz [9] outside the Java program to convert the DOT file into the format I need for wherever the plot is going.<p></p><p>Using the same DOT export trick, I managed to generate a plot of the BDD, in which nodes display the set of vertices still available for addition to the independent set, arcs are solid if a vertex is being added to the independent and dotted if not, and solid arcs are annotated with the number of the vertex being added.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://rubin.msu.domains/blog/bdd.svg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img alt="BDD graph" border="0" data-original-height="1451" data-original-width="1939" height="299" src="https://1.bp.blogspot.com/-JuBxEfr93WE/YInXGojPIPI/AAAAAAAADgY/IZdhFygLKFoD6WxUERub4T202ULPQif_wCLcBGAsYHQ/w400-h299/bdd.png" width="400" /></a></div>Unfortunately, Blogger does not accept SVG images and the BDD is a bit too big for a legible PNG graph. If you want to see a better image, click it and an SVG version should open in a new window or tab.<br /><p></p><p>This post is already a bit long, so I won't go into details about the various coding issues I ran into or how I worked around them. I will point out one minor mathematical issue. Since the MISP is a maximization problem, the goal is to find the longest (in terms of weight, not number of edges) path from root node to terminal node in the BDD. JGraphT has a package containing shortest path algorithms, but no longest path algorithms. Fortunately, the number of layers in the graph is fixed (one layer per decision variable, plus one to hold the terminal node), which means the number $L$ of links in a longest path is fixed. So we simply find the maximum weight $W$ of any node in the graph, change the weight $w_e$ of each edge $e$ to $LW - w_e$, and find the shortest path using the modified weights. That path is guaranteed to be the longest path with respect to the original weights.</p><p>Last thing: As usual, my code is available for you to play with from my <a href="https://gitlab.msu.edu/orobworld/maximalindependentset" target="_blank">GitLab repository</a>.<br /></p><h3 style="text-align: left;">References</h3><div style="text-align: left;">[1] Wikipedia entry: <a href="https://en.wikipedia.org/wiki/Binary_decision_diagram" target="_blank">Binary decision diagram</a></div><div style="text-align: left;">[2] <a href="https://www.andrew.cmu.edu/user/vanhoeve/mdd/" target="_blank">Decision Diagrams for Optimization</a> (web page by <a href="http://www.andrew.cmu.edu/user/vanhoeve/" target="_top">Willem-Jan van Hoeve</a>)</div><div style="text-align: left;">[3] <span face="sans-serif">Bergman, D.; Cire, A. A.; van Hoeve, W.-J. & Hooker, J. <a href="https://www.springer.com/us/book/9783319428475" target="_blank">Decision Diagrams for Optimization</a>. <i>Springer International Publishing AG, </i><b>2016</b></span>.</div><div style="text-align: left;"><span face="sans-serif">[4] <a href="https://jgrapht.org/" target="_blank">JGraphT</a> library</span></div><div style="text-align: left;"><span face="sans-serif">[5] <a href="https://www.baeldung.com/jgrapht" target="_blank">Introduction to JGraphT</a> (Baeldung)</span></div><div style="text-align: left;"><span face="sans-serif">[6] Wikipedia entry: <a href="https://en.wikipedia.org/wiki/Maximal_independent_set" target="_blank">Maximal independent set</a></span></div><div style="text-align: left;">[7] Wikipedia entry: <a href="https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph" target="_blank">Chvátal graph</a></div><div style="text-align: left;">[8] Wikipedia entry: <a href="https://en.wikipedia.org/wiki/DOT_(graph_description_language)" target="_blank">DOT (graph description language)</a></div>[9] <a href="https://graphviz.org/" target="_blank">Graphviz - Graph Visualization Software</a><div style="text-align: left;"><br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-35563666996288037912021-04-21T18:27:00.000-04:002021-04-21T18:27:13.360-04:00Lagrangean Relaxation: The Sequel<p>In a <a href="https://orinanobworld.blogspot.com/2021/02/lagrangean-relaxation-for-assignment.html" target="_blank">previous post</a>, I looked at a way to solve a multiple assignment problem (where multiple users can be assigned to each server and each user can be assigned to multiple servers) using Lagrangean relaxation (LR). I won't repeat the details of the problem, or why LR was of interest, here. The post included some computational experiments in R, using CPLEX to get the optimal solution (for confirmatory purposes) and then trying out various nonlinear optimization algorithms on the Lagrangean function.</p><p>I've been looking for an open-source, derivative-free nonlinear optimizer (capable of taking box constraints) in Java, and I came across a couple in the <a href="https://commons.apache.org/proper/commons-math/" target="_blank">Apache Commons Mathematics Library</a>. Wanting to test one of them out, I repeated the experiment with the assignment problem in Java, again using CPLEX to get the optimal solution, and using the <a href="http://www.optimization-online.org/DB_HTML/2010/05/2616.html" target="_blank">BOBYQA algorithm</a> for minimizing the Lagrangean. As is my habit, I've made my Java code available via a <a href="https://gitlab.msu.edu/orobworld/binary-assignment" target="_blank">GitLab repository</a> for anyone who might want to see it. The Apache Commons library is a bit funky when it comes to using the optimization classes, so I had to do a little trial and error (and considerable staring at the Javadocs), along with a web search for examples. Hopefully my code is simple enough to be easy to digest.</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-38282029725711804842021-04-12T19:18:00.000-04:002021-04-12T19:18:26.036-04:00A Math Puzzle as a Network<p>There is a standard type of math puzzle that has been around at least since I was a child. The details vary, but the concept is consistent. You are typically given a few initially empty containers of various (integer) capacities, an essentially infinite reservoir of something that goes in the containers, and a goal (integer) for how much of that something you want to end up with. You have to figure out how to reach the goal without having any measuring instruments, meaning that your operations are limited to emptying a container into the reservoir, filling a container from the reservoir, or moving content from one container to another until you empty the source or fill the destination, whichever happens first. (All this is done under the assumption of no spillage, meaning the originator of the puzzle did not have me in mind.) I think I've seen a variant that involves cutting things, where your ability to measure where to cut is limited to stacking pieces you already have as a guide to the piece you want to cut.</p><p>A <a href="https://math.stackexchange.com/questions/4098586/solving-the-water-jug-problem-using-dynamic-programming/" target="_blank">question</a> popped up on Mathematics Stack Exchange about how to solve one of these puzzles using dynamic programming (DP) with backward recursion. The problem at hand involves two jugs, of capacities seven and three liters respectively, and a lake, with the desired end state being possession of exactly five liters of water. The obvious (at least to me) state space for DP would be the volume of water in each jug, resulting in 32 possible states ($\lbrace 0,\dots,7\rbrace \times \lbrace 0,\dots,3 \rbrace$). Assuming the objective function is to reach the state $(5,0)$ with a minimal number of operations, the problem can be cast just as easily as a shortest path problem on a digraph, in which each node is a possible state of the system, each arc has weight 1, and arcs fall into one of the categories mentioned in the previous paragraph.</p><p>I was looking for an excuse to try out the <span style="font-family: courier;">igraph</span> package for <span style="font-family: courier;">R</span>, and this was it. In my R notebook, a node label "5|2" would indicate the state where the larger jug contains five liters and the smaller jug contains two. Arcs are labeled with one of the following: "EL" (empty the larger jug); "FL" (fill the larger jug); "ES" (empty the smaller jug); "FS" (fill the smaller jug); "PLS" (pour the larger jug into the smaller jug); or "PSL" (pour the smaller jug into the larger jug).<br /></p><p>Assuming I did not screw up the digraph setup, a total of nine operations are required to get the job done. If you are interested, you can see my code (and the solution) in <a href="https://rubin.msu.domains/blog/water_jug.nb.html" target="_blank">this R notebook</a>.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-53659740768799476852021-04-08T17:19:00.001-04:002021-07-05T14:50:15.565-04:00A GA Model for a Joint Clustering Problem<p>A problem in grouping users and servers was posted on <a href="https://math.stackexchange.com/questions/4088441/what-will-be-an-efficient-joint-clustering-solution-to-this-problem" target="_blank">Mathematics Stack Exchange</a> and <a href="https://or.stackexchange.com/questions/6023/how-to-mathematically-formulate-the-optimization-problem" target="_blank">OR Stack Exchange</a>. (Someone remind me to rant about cross-posting in a future blog post. Just don't cross-post the reminder.) The gist of the problem is as follows. We have $S$ servers of some sort, and $U$ users. For each combination of user $u$ and server $s$, we have a parameter $h_{u,s}$ which pertains to the quality / strength / something of service user $u$ would get from server $s$. We are told to group users and servers into a predefined number $G$ of groups or clusters. Every user in cluster $g$ will be served by every server in cluster $g$, but servers in other clusters will interfere with the service to user $u$. (A possible application might be cellular phone service, where signals from towers to which you are not connected might interfere with your signal. Just guessing.)</p><p>There is one more parameter, the maximum number ($M$) of servers that can be assigned to a group. It is explicitly stated that there is no limit to the number of users that can be assigned to a group. I'm going to go a step further and assume that every group must contain at least one server but that there is no lower limit to the number of users assigned to a group. (If a group has no users, presumably the servers in that group get to relax and play video games or whatever.)</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The objective is to maximize $\sum_{u=1}^U q_u$, the total quality of service, where $q_u$ is the quality of service for user $u$. What makes the problem a bit interesting is that $q_u$ is a nonlinear function of the allocation decisions. Specifically, if we let $\mathcal{S}_1, \dots, \mathcal{S}_G$ be the partition of the set $\mathcal{S} = \lbrace 1,\dots, S\rbrace$ of all servers, and if user $u$ is assigned to group $g$, then $$q_{u}=\frac{\sum_{s\in\mathcal{S}_{g}}h_{us}}{\sum_{s\notin\mathcal{S}_{g}}h_{us}}.$$Note that the service quality for user $u$ depends only on which <i>servers</i> are/are not in the same group with it; the assignments of other <i>users</i> do not influence the value of $q_u$.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">An answer to the OR SE post explains how to model this as a mixed-integer linear program, including how to linearize the objective. That is the approach I would recommend. The original poster, however, specifically asked for a heuristic approach. I got curious and wrote a genetic algorithm for it, in R, using the GA library. Since this is a constrained problem, I used a random key GA formulation. I won't go into excessive detail here, but the gist is as follows. We focus on assigning servers to groups. Once we have a server assignment, we simply compute the $q_u$ value for each user and each possible group, and assign the user to the group that gives the highest $q_u$ value.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">To assign servers to groups, we start with an "alphabet" consisting of the indices $1,\dots,S$ for the servers and $G$ dividers (which I will denote here as "|"). In the R code, I use NA for the dividers. A "chromosome" is an index vector that permutes the alphabet. Without loss of generality, we can assume that the first server goes in the first group, and the last divider must come after the last group, and thus we permute only the intervening elements of the alphabet. For instance, if $S=5$ and $G=3$, the alphabet is $1,2,3,4,5,|,|,|$ and a chromosome $(2, 7, 4, 6, 5, 3)$ would translate to the list $1, 2, |, 4, |, 5, 3, |$. (Each element of the chromosome is the index of an element in the alphabet.) I would interpret that as group 1 containing servers 1 and 2, group 2 containing server 4, and group 3 containing servers 3 and 5.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">There is no guarantee that a random chromosome produces a server grouping that contains at least 1 and at most $M$ servers in every group, so we post-process it by going group by group and adjusting the dividers by the minimum amount necessary to make the current group legal. Once we have the servers grouped, we assign users by brute force and compute an overall fitness of the solution.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">I am deliberately leaving out some (OK, many) gory details here. My R code and a test problem are contained in an <a href="https://rubin.msu.domains/blog/joint_clustering.nb.html" target="_blank">R notebook</a> that you are welcome to peruse. When I rerun the GA on the same test problem, I get different results, which is not surprising since the GA is a heuristic and is most definitely not guaranteed to produce an optimal solution. Whether the solution is "good enough", and whether it scales to the size problem the original poster has in mind, are open questions.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p><style type="text/css">p, li { white-space: pre-wrap; }</style></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-26189519357561231412021-02-18T15:52:00.001-05:002021-02-18T15:52:32.915-05:00Restarting the MATE Panel<p>I know that <a href="https://en.wikipedia.org/wiki/Linux_Mint#Cinnamon" target="_blank">Cinnamon</a> is the trendy choice for desktop environment with Linux Mint, but ever since an <a href="https://orinanobworld.blogspot.com/2016/12/mint-upgrade-woes.html" target="_blank">unfortunate misadventure</a> with video drivers I have been using the somewhat more stable and somewhat faster <a href="https://mate-desktop.org/" target="_blank">MATE</a> (pronounced <em>Ma-Tay</em>, per their home page) environment. Overall I am very happy with it. There are, however, occasional hiccups with the panel, the bar along one edge of the display (bottom in my case) containing launchers, tabs for open applications and general what-not. Occasionally, for reasons beyond me ken, some of the icons will be screwed up, duplicated, or duplicated and screwed-up. This morning, for instance, I found not one but three iterations of the audio output icon (looks like a loud speaker, used to set audio preferences). The first icon had the normal appearance, meaning audio was enabled, while the second and third were indicating that audio was muted. (Despite the 2-1 vote against, audio was in fact enabled.)</p><p>Glitches like that do not render the system unusable, but they are annoying. So I dug around a bit and discovered that the system command <span style="font-family: courier;">mate-panel</span>. Run from a shell script or a launcher, the command <span style="font-family: courier;">mate-panel --replace</span> seems to do the trick of restarting the panel (and hopefully fixing the glitch that made you restart it). If you run the command from a terminal, be sure to push it to the background using an ampersand (<span style="font-family: courier;">mate-panel --replace &</span>). Otherwise, it will tie up the terminal session, and when you break out of it (probably via Ctrl-C) the panel will rather unhelpfully evaporate.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-42927227553394294812021-02-15T19:27:00.000-05:002021-02-15T19:27:33.333-05:00Lagrangean Relaxation for an Assignment Problem<p> A <a href="https://or.stackexchange.com/questions/5698/how-can-i-find-the-optimal-assignments-for-this-milp-problem-heuristically" target="_blank">question</a> on OR Stack Exchange asked about solving an assignment problem "heuristically but to optimality". The problem formulation (in which I stick as closely as possible to the notation in the original post, but substitute symbols for two numeric parameters) is as follows:</p><p>\begin{align*} \max_{d_{u,c}} & \sum_{u=1}^{U}\sum_{c=1}^{C}\omega_{u,c}d_{u,c}\\ \text{s.t. } & \sum_{c=1}^{C}d_{u,c}\le C_{\max}\ \forall u\in\left\{ 1,\dots,U\right\} \\ & \sum_{c=1}^{C}d_{u,c}\ge1\ \forall u\in\left\{ 1,\dots,U\right\} \\ & \sum_{u=1}^{U}d_{u,c}\le U_{\max}\ \forall c\in\left\{ 1,\dots,C\right\} \\ & d_{u,c}\in\left\{ 0,1\right\} \ \forall u,c. \end{align*} Here $d_{u,c}$ is a binary variable, representing assignment of "user" $u$ to "service provider" $c$, and everything else is a parameter. Each user must be assigned at least one provider and at most $C_\max$ providers, and each provider can be assigned at most $U_\max$ users. The objective maximizes the aggregate utility of the assignments.</p><p>One of the answers to the question asserts that the constraint matrix has the "integrality property", meaning that any basic feasible solution of the LP relaxation will have integer variable values. The recommended solution approach is therefore to solve the LP relaxation, and I agree with that recommendation. (I have not seen a proof that the matrix has the integrality property, but in my experiments the LP solution always was integer-valued.) That said, the author did ask about "heuristic" approaches, which got me wondering if there was a way to solve to optimality without solving an LP (and thus requiring access to an LP solver).</p><p> I decided to try Lagrangean relaxation, and it seems to work. In theory, it should work: if the constraint matrix has the integrality property, and the LP relaxation automatically produces an optimal integer-valued solution, then there is no <a href="https://en.wikipedia.org/wiki/Duality_gap" target="_blank">duality gap</a>, so the solution to the Lagrangean problem should be optimal for the original problem. The uncertainty lies more in numerical issues stemming from the solving of the Lagrangean problem.</p><p>In what follows, I am going to reverse the middle constraint of the original problem (multiplying both sides by -1) so that all constraints are $\le$ and thus all dual multipliers are nonnegative. If we let $\lambda\ge 0$, $\mu\ge 0$ and $\nu\ge 0$ be the duals for the three sets of constraints, the Lagrangean relaxation is formulated as follows:</p><p>$$\min_{\lambda,\mu,\nu\ge0}LR(\lambda,\mu,\nu)=\\\max_{d\in\left\{ 0,1\right\} ^{U\times C}}\left(\sum_{u}\sum_{c}\omega_{u,c}d_{u,c}-\sum_{u}\lambda_{u}\left[\sum_{c}d_{u,c}-C_{\max}\right]\\+\sum_{u}\mu_{u}\left[\sum_{c}d_{u,c}-1\right]-\sum_{c}\nu_{c}\left[\sum_{u}d_{u,c}-U_{\max}\right]\right).$$</p><p>We can simplify that a bit: <br /></p><p>$$\min_{\lambda,\mu,\nu\ge0}LR(\lambda,\mu,\nu)=\\\max_{d\in\left\{ 0,1\right\} ^{U\times C}}\left(\sum_{u}\sum_{c}\left[\omega_{u,c}-\lambda_{u}+\mu_{u}-\nu_{c}\right]d_{u,c}\\+C_{\max}\sum\lambda_{u}-\sum_{u}\mu_{u}+U_{\max}\sum_{c}\nu_{c}\right).$$</p><p>The inner maximization problem is solvable by inspection. Let $\rho_{u,c}= \omega_{u,c}-\lambda_{u}+\mu_{u}-\nu_{c}$. If $\rho_{u,c} > 0$, $d_{u,c}=1$. If $\rho_{u,c} < 0$, $d_{u,c}=0$. If $\rho_{u,c} = 0$, it does not matter (as far as the inner problem goes) what value we give $d_{u,c}$. So we can rewrite the outer (minimization) problem as follows:</p><p>$$\min_{\lambda, \mu, \nu \ge 0}LR(\lambda,\mu,\nu)=\\\sum_{u}\sum_{c}\left(\rho_{u,c}\right)^{+}+C_{\max}\sum\lambda_{u}-\sum_{u}\mu_{u}+U_{\max}\sum_{c}\nu_{c}.$$</p><p>$LR(\lambda,\mu,\nu)$ is a piecewise-linear function of its arguments, with directional gradients, but is not continuously differentiable. (Things get a bit tricky when you are on a boundary between linear segments, which corresponds to having $\rho_{u,c}=0$ for one or more combinations of $u$ and $c$.)</p><p>I coded a sample instance in R and tested both solving the LP relaxation (using CPLEX) and solving the Lagrangean problem, both using a derivative-based method (a version of the <a href="https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm" target="_blank">BFGS algorithm</a>) and using a couple of derivative-free algorithms (versions of the <a href="https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method" target="_blank">Nelder-Mead</a> and Hooke-Jeeves [1] algorithms). Importantly, all three algorithms are modified to allow box constraints, so that we can enforce the sign restriction on the multipliers.</p><p>You can <a href="http://rubin.msu.domains/blog/assignment.nb.html" target="_blank">download my code</a> in the form of an R notebook, containing text, output and the code itself (which can be extracted). In addition to CPLEX, it uses a gaggle of R libraries: <span style="font-family: courier;">magrittr</span> (for convenience); <span style="font-family: courier;">ompr</span>, <span style="font-family: courier;">ompr.roi</span>, <span style="font-family: courier;">ROI</span> and <span style="font-family: courier;">ROI.plugin.cplex</span> for building the LP model and interfacing with CPLEX; and <span style="font-family: courier;">dfoptim</span> for the Nelder-Mead and Hooke-Jeeves algorithms. (The BFGS algorithm comes via the <span style="font-family: courier;">optim()</span> method, part of the built-in <span style="font-family: courier;">stats</span> library.) If you want to play with the code but do not have CPLEX or some of the libraries, you can just delete the lines that load the missing libraries along with the code that uses them.</p><p>Based on limited experimentation, I would say that Nelder-Mead did not work well enough to consider, and BFGS did well in some cases but produced somewhat suboptimal results in others. It may be that tweaking some control setting would have helped with the cases where BFGS ran into trouble. Hooke-Jeeves, again in limited testing, consistently matched the LP solution. So if I needed to come up with some hand-coded way to solve the problem without using libraries (and did not want to write my own simplex code), I would seriously consider using Hooke-Jeeves (which I believe is pretty easy to code) on the Lagrangean problem.<br /></p><p><br /></p><p>[1] Hooke, Robert and Jeeves, T. (1961) "Direct search'' solution of numerical and statistical problems. <i>Journal of the ACM</i>, Vol. 8, No. 2, 212-229. <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com4tag:blogger.com,1999:blog-8781383461061929571.post-28157765276832517432021-01-31T14:54:00.003-05:002021-07-08T19:45:26.778-04:00Solving a Multidimensional NLP via Line Search<p>Someone <a href="https://or.stackexchange.com/questions/5628/piecewise-linear-and-global-optimization" target="_blank">posted</a> a nonconvex nonlinear optimization model on OR Stack Exchange and asked for advice about possible reformulations, piecewise linear approximations, using global optimizers, and other things. The model is as follows:\begin{alignat}{1} \max\,\, & q_{1}+q_{2}\\ \mathrm{s.t.} & \sum_{i=1}^{n}p_{i}x_{i}=\sum_{t=0}^{T}\frac{F_{t}}{(1+q_{1})^{t}} &(1)\\ & \sum_{i=1}^{n}p_{i}x_{i}=\sum_{t=0}^{T}\sum_{i=1}^{n}\frac{b_{t,i}x_{i}}{(1+q_{2})^{t}} &(2)\\ & \sum_{i=1}^{n}p_{i}x_{i}=\beta\sum_{t=0}^{T}\frac{F_{t}}{(1+q_{1}+q_{2})^{t}} &(3)\\ & q_{1,}q_{2}\ge0\\ & x\in\left[0,1\right]^{n} \end{alignat} All symbols other than $q_1$, $q_2$ and $x$ are model parameters (or indexes). The author originally had $x$ as binary variables, apparently believing that would facilitate linearization of products, but also expressed interest in the case where $x$ is continuous. I'm going to propose a possible "low-tech" solution procedure for the continuous case. The binary case is probably a bit too tough for me. The author supplied sample data for all parameters except $\beta$, with dimensions $n=3$ and $T=4$ but expressed a desire to solve the model for $n=10,000$ and $T=1,200$ (gulp).<br /></p><p>Note that the left-hand sides (LHSes) of the three constraints are identical. Let $h()$ be the function on the right-hand side (RHS) of constraint (1), so that the RHS of (1) is $h(q_1)$. $h()$ is a monotonically decreasing function. The RHS of (3) is $\beta h(q_1 + q_2)$. Since the left sides are equal, we have $$\beta h(q_1 + q_2) = h(q_1) \quad (4)$$which tells us several things. First, $q_2 \ge 0 \implies h(q_1+q_2) \le h(q_1)$, so if $\beta<1$ it is impossible to satisfy (4). Second, if $\beta =1$, (4) implies that $q_2 = 0$, which simplifies the problem a bit. Lastly, let's assume $\beta > 1$. For fixed $q_1$ the LHS of (4) is monotonically decreasing in $q_2$, with the LHS greater than the RHS when $q_2 = 0$ and with $$\lim_{q_2\rightarrow \infty} \beta h(q_1+q_2) = \beta F_0.$$ If $\beta F_0 > h(q_1)$, there is no $q_2$ that can balance equation (4), and so the value of $q_1$ is infeasible in the model. If $\beta F_0 < h(q_1)$, then there is exactly one value of $q_2$ for which (4) holds, and we can find it via line search.</p><p>Next, suppose that we have a candidate value for $q_1$ and have found the unique corresponding value of $q_2$ by solving (4). We just need to find a vector $x\in [0,1]^n$ that satisfies (1) and (2). Equation (3) will automatically hold if (1) does, given (4). We can find $x$ by solving a linear program that minimizes 0 subject to (1), (2) and the bounds for $x$.</p><p>Thus, we have basically turned the problem into a line search on $q_1$. Let's set an arbitrary upper limit of $Q$ for $q_1$ and $q_2$, so that our initial "interval of uncertainty" for $q_1$ is $[0, Q]$. It's entirely possible that neither 0 nor $Q$ is a feasible value for $q_1$, so our first task is to search upward from $0$ until we find a feasible value (call it $Q_\ell$) for $q_1$, then downward from $Q$ until we find a feasible value (call it $Q_h$) for $q_1$. After that, we cross our fingers and hope that all $q_1 \in [Q_\ell,Q_h]$ are feasible. I think this is true, although I do not have a proof. (I'm much less confident that it is true if we require $x$ to be binary.) Since $q_2$ is a function of $q_1$ and the objective function does not contain $x$, we can search $[Q_\ell,Q_h]$ for a local optimum (for instance, by <a href="https://en.wikipedia.org/wiki/Golden-section_search" target="_blank">golden section search</a>) and hope that the objective function is unimodal as a function of $q_1$, so that the local optimum is a global optimum. (Again, I do not have proof, although I would not be surprised if it were true.)</p><p>I put this to the test with an R script, using the author's example data. The linear programs were solved using CPLEX, with the model expressed via the OMPR package for R and using ROI as the intermediary between OMPR and CPLEX. I first concocted an arbitrary feasible solution and used it to compute $\beta$, so that I would be sure that the problem was feasible with my choice of $\beta$. Using $\beta = 1.01866$ and 100 (arbitrarily chosen) as the initial upper bounds for $q_1$ and $q_2$, my code produced an "optimal" solution of $q_1= 5.450373$, $q_2 = 0.4664311$, $x = (1, 0.1334608, 0)$ with objective value $5.916804$. There is a bit of rounding error involved: the common LHS of (1)-(3) evaluated to 126.6189, while the three RHSes were 126.6186, 126.6188, and 126.6186. (In my graduate student days, our characterization of this would be "good enough for government work".) Excluding loading libraries, the entire process took under three seconds on my desktop computer.</p><p>You can access <a href="http://rubin.msu.domains/blog/nlp.nb.html" target="_blank">my R code</a> from my web site. It is in the form of an R notebook (with the code embedded), so even if you are not fluent in R, you can at least read the "prose" portions and see some of the nagging details involved.<br /></p><p> <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com6tag:blogger.com,1999:blog-8781383461061929571.post-34335087041842261802021-01-28T17:17:00.000-05:002021-01-28T17:17:14.131-05:00A Monotonic Assignment Problem<p> A <a href="https://stackoverflow.com/questions/65884107/minimize-sum-of-product-of-two-uneven-consecutive-arrays" target="_blank">question</a> posted on Stack Overflow can be translated to an assignment problem with a few "quirks". First, the number of sources ($m$) is less than the number of sinks ($n$), so while every source is assigned to exactly one sink, not every sink is assigned to a source. Second, there are vectors $a\in\mathbb{R}^m$ and $b\in\mathbb{R}^n$ containing weights for each source and sink, and the cost of assigning source $i$ to sink $j$ is $a_i \times b_j$. Finally, there is a monotonicity constraint. If source $i$ is assigned to sink $j$, then source $i+1$ can only be assigned to one of the sinks $j+1,\dots,n$.</p><p>Fellow blogger Erwin Kalvelagen <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2021/01/an-assignment-problem-with-wrinkle.html" target="_blank">posted a MIP model</a> for the problem and explored some approaches to solving it. A key takeaway is that for a randomly generated problem instance with $m=100$ and $n=1,000$, CPLEX needed about half an hour to get a provably optimal solution. After seeing Erwin's post, I did some coding to cook up a network (shortest path) solution in Java. Several people proposed similar and in some cases essentially the same model in comments on Erwin's post. Today, while I was stuck on a Zoom committee call and fighting with various drawing programs to get a legible diagram, Erwin produced a <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2021/01/assignment-problem-with-wrinkle.html" target="_blank">follow-up post</a> showing the network solution (including the diagram I was struggling to produce ... so I'll refer readers to Erwin's post and forget about drawing it here).<br /></p><p>The network is a layered digraph (nodes organized in layers, directed arcs from nodes in one layer to nodes in the next layer). It includes two dummy nodes (a start node in layer 0 and a finish node in layer $m+1$). All nodes in layer $i\in \lbrace 1,\dots,m \rbrace$ represent possible sink assignments for source $i$. The cost of an arc entering a node representing sink $j$ in layer $i$ is $a_i \times b_j$, regardless of the source of the arc. All nodes in layer $m$ connect to the finish node via an arc with cost 0. The objective value of any valid assignment is the sum of the arc costs in the path from start to finish corresponding to that assignment, and the optimal solution corresponds to the shortest path from start to finish.</p><p>The monotonicity restriction is enforced simply by omitting arcs from any node in layer $i$ to a lower-index node in layer $i+1$. It also allows us to eliminate some nodes. In the first layer after the start node (where we assign source 1), the available sinks are $1,\dots,n-m+1$. The reason sinks $n-m+2,\dots,n$ are unavailable is that assigning source 1 to one of them and enforcing monotonicity would cause us to run out of sinks before we had made an assignment for every source. Similarly, nodes in layer $i>1$ begin with sink $i$ (because the first $i-1$ sinks must have been assigned or skipped in earlier layers) and end with sink $n-m+i$ (to allow enough sinks to cover the remaining $m-i$ nodes).</p><p>For the dimensions $m=100$, $n=1000$, the network has 90,102 nodes and 40,230,551 arcs. That may sound like a lot, but my Java code solves it in <i>under four seconds</i>, including the time spent setting up the network. I used the excellent (open-source) <a href="https://algs4.cs.princeton.edu/code/" target="_blank">algs4 library</a>, and specifically the <span style="font-family: courier;">AcyclicSP</span> class for solving the shortest path problem. Erwin reports even faster solution time for his network model (0.9 seconds, coded in Python), albeit on different hardware. At any rate, he needed about half an hour to solve the MIP model, so the main takeaway is that for this problem the network model is <i>much</i> faster.</p><p>If anyone is interested, my Java code is available for download from my <a href="https://gitlab.msu.edu/orobworld/orderedassignment" target="_blank">Git repository</a>. The main branch contains just the network model, and the only dependency is the algs4 library. There is also a branch named "CPLEX" which contains the MIP model, in case you either want to compare speeds or just confirm that the network model is getting correct results. If you grab that branch, you will also need to have CPLEX installed.<br /></p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com4tag:blogger.com,1999:blog-8781383461061929571.post-63090173192753637952021-01-22T18:24:00.000-05:002021-01-22T18:24:15.234-05:00Rainbow Parentheses in RStudio<p>I use the open-source edition of the <a href="https://rstudio.com/products/rstudio/" target="_blank">RStudio IDE</a> for any R coding I do, and I'm a big fan of it. The latest version (1.4.1103) introduced a new feature that was previously only available in alpha and beta versions: rainbow parentheses. I'd never heard the term before, but the meaning turns out to be remarkably simple. When turned on, if you enter an expression with nested parentheses, brackets or braces, RStudio automatically color codes the parentheses (brackets, braces) to make it easier to see matching pairs. This is in addition to the existing feature that highlights the matching delimiter when you put the cursor after a delimiter.</p><p>I was geeked to try this out, but when I first installed the latest version and turned it on, I did not see any changes. Eventually I figured out that it was color coding the delimiters, but the differences were too subtle for me to see. (This was with the default Textmate theme for the IDE.) So I hacked a new theme which makes the colors easier to see. I'll go through the steps here.</p><p>First, let me point to some documentation. In a <a href="https://blog.rstudio.com/2020/11/04/rstudio-1-4-preview-rainbow-parentheses/" target="_blank">blog post</a>, the folks at RStudio explain how to turn rainbow parentheses on, either globally or just for specific files, and near the end tell which CSS classes need to be tweaked to customize the colors (<code>.ace_paren_color_0</code> to <code>.ace_paren_color_6</code>). A <a href="https://rstudio.github.io/rstudio-extensions/rstudio-theme-creation.html" target="_blank">separate document</a> discusses how to create custom themes.</p><p>Theme selection in RStudio is done via <span style="font-family: courier;">Tools > Global Options... > Appearance > Editor theme</span>. Since I use the default (Textmate) theme, my first step was to track down that file and make a copy with a new name. On my Linux Mint system, the file is <span style="font-family: courier;">/usr/lib/rstudio/resources/themes/textmate.rstheme</span>. On Windows (and Macs?) the built-in themes will be lurking somewhere else. The customization document linked above alluded to a <span style="font-family: courier;">~/.R/rstudio/themes</span> directory on Linux and Macs, but that directory did not exist for me. (I created it, and parked my hacked theme file there.) Put the copied file someplace under a new name. I'm not sure whether the file name is significant to RStudio, but better safe than sorry.<br /></p><p>Open the copy you made of your preferred theme file in a text editor. The first two lines are comments that define the theme name (as it will appear in the list of editor themes in RStudio) and whether it is a dark theme or not. Change the name to something that won't conflict with existing themes. In my case, the first line was</p><div align="center"><pre>/* rs-theme-name: Textmate (default) */</pre></div>which I changed to<div align="center"><pre>/* rs-theme-name: Paul */</pre></div>(not very clever, but it got the job done).<p style="text-align: left;">Now add code at the bottom to define colors for the seven "ace_paren_color" styles. Here's what I used:</p><div class="scroll"><pre>.ace_paren_color_0 {<br /> color: #000000 <br /> /* black */<br />}<br /><br />.ace_paren_color_1 {<br /> color: #ff00ff<br /> /* magenta */<br />}<br /><br />.ace_paren_color_2 {<br /> color: #ffff00<br /> /* yellow */<br />}<br /><br />.ace_paren_color_3 {<br /> color: #0080ff<br /> /* light blue */<br />}<br /><br />.ace_paren_color_4 {<br /> color: #FF0000<br /> /* red */<br />}<br /><br />.ace_paren_color_5 {<br /> color: #004f39<br /> /* Spartan green */<br />}<br /><br />.ace_paren_color_6 {<br /> color: #0000ff<br /> /* dark blue */<br />}<br /></pre></div><p>Once you have a candidate style to test, go to the editor themes settings and use the <span style="font-family: courier;">Add...</span> button to add it. I had to fight through a lot of complaints from RStudio, and I needed to restart RStudio to get the new theme to appear. In the same dialog, I selected it, and then put it to the test by typing some nonsense with lots of nested parentheses in a file.</p><p>There are two things to watch out for if you try to remove a theme (using the <span style="font-family: courier;">Remove</span> button in that dialog). First, you cannot remove the currently selected theme, so you will need to select a different theme and click <span style="font-family: courier;">Apply</span>, then go back and select the theme to remove. Second, if you remove a theme, RStudio will <i>delete the theme file</i>. So be sure you have a backup copy if you think you might want to use it again (or edit it).</p><p>One good thing: once you have added your theme, you can edit your theme without having to remove it and then add it back. After saving any changes, you just have to switch to some other theme and then switch back to your theme to see the impact of the changes in your documents.<br /></p> Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-44571637976134719332020-12-12T14:38:00.003-05:002020-12-12T14:38:21.039-05:00Incorrect Program Listings in MythTV<p>I've been using MythTV as a personal video recorder (PVR) since 2012, and for the most part I have been quite satisfied ... but there have definitely been adventures. You can type Mythbuntu or MythTV in the blog search box to see a litany of posts about things that did not work as expected (or at all) (or threatened to implode the universe) and fixes I found. Today's post is a two part adventure regarding program listings.</p><p>I have an account with <a href="https://www.schedulesdirect.org/" target="_blank">Schedules Direct</a>, which lets me download XML files containing program listings for my cable provider. I access it from two different machines. On my main PC, I have <a href="http://www.artificialworlds.net/freeguide/Main/HomePage" target="_blank">FreeGuide</a>, an open-source TV guide program, installed. Once a week I download the latest listings and use FreeGuide to decide what I want to record during the coming week. On the PC that acts as my PVR, I have MythTV set up to download the same files from Schedules Direct when I tell it to refill the program database, which I again do once a week when programming that week's recordings.</p><p>The first part of today's adventure has been going on for a long time. When programming the week's recording schedule, I would occasionally run into discrepancies between the two machines. That is, in certain time slots FreeGuide would report one program and the listings on the PVR would report a different program, <i>even though both were working from identical downloaded data files</i>. When this happened, the listings in FreeGuide were invariably correct and the listings on the PVR incorrect. The obvious workaround was to use the FreeGuide listings, but that meant that when I wanted to record a program that FreeGuide said was there and the PVR said was not, I had to set up a manual recording rule, which is doable but somewhat inconvenient.</p><p>Eventually I figured out what was going on (probably with the help of considerable googling). I download 14 days of listings at a time from Schedules Direct, on both machines. Since I do this weekly, there is a one week overlap between the previously downloaded data and the newly downloaded data. FreeGuide replaces the old listings with the fresh download, but apparently MythTV only downloads the second week of data to its database and ignores the first week of the new download. The discrepancies occurred when the contents of a time slot in the overlap week changed between downloads. FreeGuide went with the newer data, but MythTV went with the older (incorrect) data.</p><p>I confirmed this by setting up a two line script on the PVR to let me download schedule data manually and overwrite the old data. The script is:</p><div class="code"><span style="font-family: courier;">#!/bin/sh<br />mythfilldatabase --dd-grab-all</span></div><p>Note the option <span style="font-family: courier;">-dd-grab-all</span>, which signals that all 14 days are to be downloaded and added to the database, updating any existing data. Running this from a terminal eliminated the inconsistencies between machines.</p><p>This brings me to the second part of today's adventure. I normally update the listings on the PVR machine by choosing the menu option to grab EPG (electronic program guide) data from the MythWelcome user interface. That was set up, back when I first installed MythTV, to run <span style="font-family: courier;">mythfilldatabase</span> without any optional settings. I wanted to update that setting to add the <span style="font-family: courier;">-dd-grab-all</span> option. The problem was, I could not find where to make the change. I did some googling (of course), and every post I found led to the same solution posted in the <a href="https://www.mythtv.org/wiki/Setup_General#Program_Schedule_Downloading_Options" target="_blank">MythTV wiki</a>: run <span style="font-family: courier;">mythtv-setup</span>; go to the "General" section, and within that to the "Program Schedule Downloading Options" section; then use the second of the six settings there ("Guide Data Program") to set up the program or script to download the guide data. That sounds simple enough, but when I run <span style="font-family: courier;">mythtv-setup</span> and go to that section <i>only the first entry (the toggle for automatically updating listings) is present</i>. The other five are nowhere to be found. I'm pretty sure they were there when I first installed MythTV, but they do not show up when I run the setup program on a machine that has already been configured. Possibly I need to run setup as a different user (the MythTV account "owner"?).</p><p>Anyway, I found a simple solution. The PVR machine runs <a href="https://www.mythtv.org/wiki/MythWeb" target="_blank">MythWeb</a>, a web interface to MythTV. I use MythWeb to program recordings from my main PC. It also has the ability to access settings (by clicking the button whose icon shows a key and a wrench). In the settings editor, I picked the button labeled "MythTV" and did some serious scrolling. Fortunately the settings are in alphabetic order. The one labeled "MythFillDatabasePath" has the path to the <span style="font-family: courier;">mythfilldatabase</span> program. I added the <span style="font-family: courier;">--dd-grab-all</span> option there, clicked the "Save" button at the bottom of the page, and that (hopefully) fixes the problem. Time will tell.<br /></p><p><br /></p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-10681666613943265042020-12-03T18:52:00.000-05:002020-12-03T18:52:27.572-05:00New CPLEX MIP Emphasis<p>Brace yourself (or flee now) -- this is a rather long post.</p><h2 style="text-align: left;">Introduction <br /></h2><p>IBM has <a href="https://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_ca/1/877/ENUSZP20-0571/index.html&request_locale=en" target="_blank">announced</a> the next version of CPLEX Studio (20.1), with planned availability around December 11, 2020. As to why the version number is jumping from 12.10 to 20.1, I have no idea ... but this is 2020, and I have no explanation for pretty much everything that has happened this year.</p><p>Among the changes in version 20.1, they have <a href="https://community.ibm.com/community/user/datascience/blogs/xavier-nodet1/2020/11/23/better-solutions-earlier-with-cplex-201">added a new value</a> to the MIP emphasis parameter. Prior to 20.1, there were five possible values for the emphasis parameter:</p><ul style="text-align: left;"><li>0 = Balance optimality and feasibility(default)</li><li>1 = Emphasize feasibility</li><li>2 = Emphasize proven optimality</li><li>3 = Emphasize improving the best bound</li><li>4 = Emphasize finding "hidden" feasible solutions.</li></ul><p>They have added the following new value:</p><ul style="text-align: left;"><li>5 = Emphasize heuristics (what Xavier Nodet calls "heuristics on steroids").</li></ul><p>The motivation for this is fairly clear: commercial (i.e., paying) customers with difficulty MIP models are frequently less concerned about provable optimality than with getting the best solution they can find within some limited amount of run time. Exactly how the new setting differs from setting 4 (and, for that matter, how setting 4 differs from setting 1) is unclear to me, which is OK. (I'm worried that if I really understood all the nuances, my brain would explode.)</p><p>I've been part of the beta test program for 20.1, and I've tried the new setting on a few MIP models. Going in, I expected it to slow down throughput (the number of nodes digested per minute), since running lots of heuristics means spending more time at a typical node. The question is whether the extra time per node pays for itself by reducing sufficiently the number of nodes required to find a solution of a specified quality.</p><p>My first attempt was on a difficult problem that arose in some recently published research, and on that problem the setting was definitely not helpful. In fairness, though, there may be a good reason for that. The solution approach involves a variant of Benders decomposition, so the extra time spent on heuristics will frequently produce a "good" solution only to see it shot down by the subproblem (producing a feasibility cut violated by the solution). So the remainder of my tests were on MIP models that did not involve decomposition.</p><h2 style="text-align: left;"> </h2><h2 style="text-align: left;">Test case 1: Partition</h2><p>The first test case is a MIP model that glues together sets to minimize the range of set sizes in a partition of a master set. It was originally <a href="https://orinanobworld.blogspot.com/2020/08/a-partitioning-problem.html" target="_blank">posted here</a> in August, 2020. The test problem is actually quite easy to solve, with an optimal value of 1 (meaning the cardinalities of the sets formed differ by at most 1).<br /><br />I ran the problem with a 90 second time limit (irrelevant in most cases), using each of the emphasis settings 0, 1, 4 and 5. The following plot (a log-log plot to enhance readability) shows the progress under each setting.</p><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-SldVCc7SCsg/X8lsLYbaVgI/AAAAAAAADYw/VNrRffdUf50_BNmk8Qv9w-rus_qBDYHYwCLcBGAsYHQ/s535/emph5a.png" style="margin-left: 1em; margin-right: 1em;"><img alt="Progress on partitioning problem" border="0" data-original-height="476" data-original-width="535" src="https://1.bp.blogspot.com/-SldVCc7SCsg/X8lsLYbaVgI/AAAAAAAADYw/VNrRffdUf50_BNmk8Qv9w-rus_qBDYHYwCLcBGAsYHQ/s16000/emph5a.png" title="Progress on Partitioning Problem" /></a></div><p><br />MIPEmphasis 1 ("Feasibility") makes the earliest progress but does not reach the optimal value of 1 within 90 seconds. (At that point, the incumbent value is 5.) Although just shy of one second some of the other levels do a little better than default, overall the default setting reaches the optimal solution fastest and the new setting is worse than the "Hidden Feasibility" setting. We can check the time at which each run (other than with emphasis 1) finds the optimal solution to confirm this.</p><table cellspacing="0" class="table table-condensed"><thead><tr><th align="left" style="max-width: 162px; min-width: 162px; text-align: left;"><div class="pagedtable-header-name">MIPEmphasis</div></th><th align="right" style="max-width: 45px; min-width: 45px; text-align: right;"><div class="pagedtable-header-name">Time</div></th></tr></thead><tbody><tr class="odd"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Default</td><td align="right" style="max-width: 45px; min-width: 45px; text-align: right;">5.50</td></tr><tr class="even"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Hidden Feasibility</td><td align="right" style="max-width: 45px; min-width: 45px; text-align: right;">21.19</td></tr><tr class="odd"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Heuristics</td><td align="right" style="max-width: 45px; min-width: 45px; text-align: right;">40.85</td><td align="right" style="max-width: 45px; min-width: 45px; text-align: right;"> </td></tr></tbody></table><h2 style="text-align: left;"> </h2><h2 style="text-align: left;">Test case 2: Typewriter</h2><p>The second test case is a MIP model for laying out the keyboard of a hypothetical 19th century typewriter. The problem was featured in a series of posts, and the model used here appeared in the <a href="https://orinanobworld.blogspot.com/2018/12/of-typewriters-and-permutations-v.html" target="_blank">last</a> of those posts. As I noted in that post, I was unable to find a provably optimal solution in large part due to a slow moving best bound, so for this demonstration I set a 60 second run limit. The problem seeks to minimize a distance measure. Once again, I'll use a log-log plot to show progress.<br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-OGFF45-sa60/X8lsLS31z_I/AAAAAAAADYs/t2pBt-dDjNoMr8LqbAll_1BsU8qDZgMOwCLcBGAsYHQ/s535/emph5b.png" style="margin-left: 1em; margin-right: 1em;"><img alt="Progress on the typewriter example" border="0" data-original-height="476" data-original-width="535" src="https://1.bp.blogspot.com/-OGFF45-sa60/X8lsLS31z_I/AAAAAAAADYs/t2pBt-dDjNoMr8LqbAll_1BsU8qDZgMOwCLcBGAsYHQ/s16000/emph5b.png" title="Progress on Typewriter Problem" /></a></div><p><br />All the emphasis settings produce a rapid reduction in the objective function early on. After about a second or so, emphasis 1 (feasibility) seems to do a bit better than the others. Settings 4 and 5 seem to lag a bit. Looking at the final objective values (at the 60 second cutoff), however, it seems that setting 4 (hidden feasibility) did best, and setting 5 (heuristics) slightly outperformed the other settings.</p><table cellspacing="0" class="table table-condensed"><thead><tr><th align="left" style="max-width: 162px; min-width: 162px; text-align: left;"><div class="pagedtable-header-name">MIPEmphasis</div></th><th align="right" style="max-width: 63px; min-width: 63px; text-align: right;"><div class="pagedtable-header-name">Best</div></th></tr></thead><tbody><tr class="odd"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Default</td><td align="right" style="max-width: 63px; min-width: 63px; text-align: right;">5650882</td></tr><tr class="even"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Feasibilty</td><td align="right" style="max-width: 63px; min-width: 63px; text-align: right;">5660625</td></tr><tr class="odd"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Heuristics</td><td align="right" style="max-width: 63px; min-width: 63px; text-align: right;">5640159</td></tr><tr class="even"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Hidden Feasibility</td><td align="right" style="max-width: 63px; min-width: 63px; text-align: right;">5517363</td></tr></tbody></table><p>We can also look at node throughput. As a general rule, we would expect that increased use of heuristics would slow down node throughput. One possible exception would be settings that encouraged "diving" (local depth-first search), which might speed up processing of nodes during a dive.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-sbICuIsyXHA/X8lsLX18QKI/AAAAAAAADY0/BMY9O-2goTIG9Q8I7Fbie9oy2phBFaS3gCLcBGAsYHQ/s535/emph5c.png" style="margin-left: 1em; margin-right: 1em;"><img alt="Typewriter problem node throughput" border="0" data-original-height="476" data-original-width="535" src="https://1.bp.blogspot.com/-sbICuIsyXHA/X8lsLX18QKI/AAAAAAAADY0/BMY9O-2goTIG9Q8I7Fbie9oy2phBFaS3gCLcBGAsYHQ/s16000/emph5c.png" title="Typewriter Problem Node Througput" /></a></div><p><br />The "heuristics" and "hidden feasibility" settings do in fact process fewer nodes in 60 seconds than does the default setting. The "feasibility" setting process about twice as many nodes as does the default setting, which may mean it does a fair bit of diving.</p><h2 style="text-align: left;"> </h2><h2 style="text-align: left;">Test case 3: Group Selection</h2><p>The last example is a <a href="https://orinanobworld.blogspot.com/2020/08/a-group-selection-problem.html" target="_blank">group selection problem</a> from yet another earlier post. I tested two slightly different MIP models with five minute time limits. The first variant uses continuous variables for some inherently boolean quantities, while the second variant makes those variables explicitly integer-valued. The second variant seems to be a bit harder to solve, even though they are mathematically equivalent.<br /><br />The problem is a maximization problem, and none of the runs come remotely near proof of optimality. As noted in the earlier post, nonlinear approaches yielded an objective value of 889.3463, which is apparently optimal.</p><p>Looking at progress in the incumbent value, we see that all methods make substantial progress at the root node but shortly after the root node appear to bog down. In the first model, there is not much difference among the emphasis settings.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-WZpHCD6S6yo/X8lsLhXZ60I/AAAAAAAADY4/JebWbVFSQfITDVA2hxOOgGleeMaNzQOKACLcBGAsYHQ/s535/emph5d.png" style="margin-left: 1em; margin-right: 1em;"><img alt="Progress on first group selection model" border="0" data-original-height="476" data-original-width="535" src="https://1.bp.blogspot.com/-WZpHCD6S6yo/X8lsLhXZ60I/AAAAAAAADY4/JebWbVFSQfITDVA2hxOOgGleeMaNzQOKACLcBGAsYHQ/s16000/emph5d.png" title="Progress on first Group Selection Model" /></a></div><p><br />In the second model, the feasibility setting is a bit faster than the other to reach its maximum, and the heuristics setting is slower.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-AqEmDFDGBKg/X8lsMcuRcxI/AAAAAAAADZA/HbBzmMufBDQWIcoTyUw9KzSslKa7tb6JgCLcBGAsYHQ/s535/emph5f.png" style="margin-left: 1em; margin-right: 1em;"><img alt="Progress on second group selection model" border="0" data-original-height="476" data-original-width="535" src="https://1.bp.blogspot.com/-AqEmDFDGBKg/X8lsMcuRcxI/AAAAAAAADZA/HbBzmMufBDQWIcoTyUw9KzSslKa7tb6JgCLcBGAsYHQ/s16000/emph5f.png" title="Progress on second Group Selection Model" /> </a></div><p></p><p>In both cases, though, the new "heuristics" setting produces the best objective value after 300 seconds.</p><pre class="r"><code class="hljs"></code></pre> <div class="pagedtable-wrapper" data-pagedtable="true"> <div class="pagedtable pagedtable-not-empty" style="opacity: 1;"><table style="height: auto; position: absolute; visibility: hidden; white-space: nowrap; width: auto;"><tbody><tr><td><br /></td></tr></tbody></table><table cellspacing="0" class="table table-condensed"><thead><tr><th align="left" style="max-width: 162px; min-width: 162px; text-align: left;"><div class="pagedtable-header-name">MIPEmphasis</div></th><th align="right" style="max-width: 72px; min-width: 72px; text-align: right;"><div class="pagedtable-header-name">Best</div></th></tr></thead><tbody><tr class="odd"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Default</td><td align="right" style="max-width: 72px; min-width: 72px; text-align: right;">885.7781</td></tr><tr class="even"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Feasibilty</td><td align="right" style="max-width: 72px; min-width: 72px; text-align: right;">885.7781</td></tr><tr class="odd"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Heuristics</td><td align="right" style="max-width: 72px; min-width: 72px; text-align: right;">889.3451</td></tr><tr class="even"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Hidden Feasibility</td><td align="right" style="max-width: 72px; min-width: 72px; text-align: right;">889.3130</td></tr></tbody></table></div></div><br /><table cellspacing="0" class="table table-condensed"><thead><tr><th align="left" style="max-width: 162px; min-width: 162px; text-align: left;"><div class="pagedtable-header-name">MIPEmphasis</div></th><th align="right" style="max-width: 72px; min-width: 72px; text-align: right;"><div class="pagedtable-header-name">Best</div></th></tr></thead><tbody><tr class="odd"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Default</td><td align="right" style="max-width: 72px; min-width: 72px; text-align: right;">884.6917</td></tr><tr class="even"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Feasibilty</td><td align="right" style="max-width: 72px; min-width: 72px; text-align: right;">884.6917</td></tr><tr class="odd"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Heuristics</td><td align="right" style="max-width: 72px; min-width: 72px; text-align: right;">889.3392</td></tr><tr class="even"><td align="left" style="max-width: 162px; min-width: 162px; text-align: left;">Hidden Feasibility</td><td align="right" style="max-width: 72px; min-width: 72px; text-align: right;">889.3130</td></tr></tbody></table><p>As for node throughput, the next two plots show that node throughput is clearly greater in the first variant (where inherently boolean variables are treated a continuous with domain [0, 1]), and the "feasibility" setting is again fastest in both variants, while the new "heuristics" setting is slowest.<br /></p><p> </p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-EDMAyggxZoo/X8lsL7XGf5I/AAAAAAAADY8/xoR4YqlWF8QK5YklLtNL4eW1wbTo43V0wCLcBGAsYHQ/s535/emph5e.png" style="margin-left: 1em; margin-right: 1em;"><img alt="Group Selection Model 1 Node Througput" border="0" data-original-height="476" data-original-width="535" src="https://1.bp.blogspot.com/-EDMAyggxZoo/X8lsL7XGf5I/AAAAAAAADY8/xoR4YqlWF8QK5YklLtNL4eW1wbTo43V0wCLcBGAsYHQ/s16000/emph5e.png" title="Group Selection Model 1 Node Througput" /></a></div><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-f4KFpZBIWX8/X8lsMgitLAI/AAAAAAAADZE/A4ZN_cG-ClwXf4RSvBjw0XbYO5TDsOATACLcBGAsYHQ/s535/emph5g.png" style="margin-left: 1em; margin-right: 1em;"><img alt="Group Selection Model 2 Node Througput" border="0" data-original-height="476" data-original-width="535" src="https://1.bp.blogspot.com/-f4KFpZBIWX8/X8lsMgitLAI/AAAAAAAADZE/A4ZN_cG-ClwXf4RSvBjw0XbYO5TDsOATACLcBGAsYHQ/s16000/emph5g.png" title="Group Selection Model 2 Node Througput" /></a></div><h2 style="text-align: left;"> </h2><h2 style="text-align: left;">Conclusion</h2><p>Testing on a small set of examples does not tell us much. On the group selection models, where progress is hard to come by after a short time, the new setting produced the best results, but was not much better than the old "hidden feasibility" setting. The same was true on the typewriter problem. So I am still waiting to encounter a problem instance where the new setting will provide a substantial improvement.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-89134421917071195012020-10-16T19:02:00.000-04:002020-10-16T19:02:12.221-04:00Multilogit Fit via LP<p> A <a href="https://or.stackexchange.com/questions/5040/linearizing-a-program-with-multinomial-logit-in-the-objective" target="_blank">recent question</a> on OR Stack Exchange has to do with getting an $L_1$ regression fit to some data. (I'm changing notation from the original post very slightly to avoid mixing sub- and super-scripts.) The author starts with $K$ observations $y_1, \dots, y_K$ of the dependent variable and seeks to find $x_{i,k} \ge 0$ ($i=1,\dots,N$, $k=1,\dots,K$) so as to minimize the $L_1$ error $$\sum_{k=1}^K \left|y_k - \sum_{i=1}^N \frac{e^{x_{i,k}}}{\sum_{j=1}^K e^{x_{i,j}}}\right|.$$ The author was looking for a way to linearize the objective function.</p><p>The solution I proposed there begins with a change of variables: $$z_{i,k}=\frac{e^{x_{i,k}}}{\sum_{j=1}^K e^{x_{i,j}}}.$$ The $z$ variables are nonnegative and must obey the constraint $$\sum_{k=1}^{K}z_{i, k}=1\quad\forall i=1,\dots,N.$$ With this change of variables, the objective becomes $$\sum_{k=1}^K \left|y_k - \sum_{i=1}^N z_{i,k} \right|.$$ Add nonnegative variables $w_k$ ($k=1,\dots, K$) and the constraints $$-w_k \le y_k - \sum_{i=1}^N z_{i,k} \le w_k \quad \forall k=1,\dots,K,$$ and the objective simplifies to minimizing $\sum_{k=1}^K w_k$, leaving us with an easy linear program to solve.</p><p>That leaves us with the problem of getting from the LP solution $z$ back to the original variables $x$. It turns out the transformation from $x$ to $z$ is invariant with respect to the addition of constant offsets. More precisely, for any constants $\lambda_i$ ($i=1,\dots,N$), if we set $$\hat{x}_{i,k}=x_{i,k} + \lambda_i \quad \forall i,k$$ and perform the $x\rightarrow z$ transformation on $\hat{x}$, we get $$\hat{z}_{i,k}=\frac{e^{\lambda_{i}}e^{x_{i,k}}}{\sum_{j=1}^{K}e^{\lambda_{i}}e^{x_{i,j}}}=z_{i,k}\quad\forall i,k.$$ This allows us to convert from $z$ back to $x$ as follows. For each $i$, set $j_0=\textrm{argmin}_j z_{i,j}$ and note that $$\log\left(\frac{z_{i,k}}{z_{i,j_0}}\right) = x_{i,k} - x_{i, j_0}.$$ Given the invariance to constant offsets, we can set $x_{i, j_0} = 0$ and use the log equation to find $x_{i,k}$ for $k \neq j_0$.</p><p>Well, almost. I dealt one card off the bottom of the deck. There is nothing stopping the LP solution $z$ from containing zeros, which will automatically be the smallest elements since $z \ge 0$. That means the log equation involves dividing by zero, which has been known to cause black holes to erupt in awkward places. We can fix that with a slight fudge: in the LP model, change $z \ge 0$ to $z \ge \epsilon$ for some small positive $\epsilon$ and hope that the result is not far from optimal.</p><p>I tested this with an R notebook. In it, I generated values for $y$ uniformly over $[0, 1]$, fit $x$ using the approach described above, and also fit it using a genetic algorithm for comparison purposes. In my experiment (with dimensions $K=100$, $N=10$), the GA was able to match the LP solution if I gave it enough time. Interestingly, the GA solution was dense (all $x_{i,j} > 0$) while the LP solution was quite sparse (34 of 1,000 values of $x_{i,j}$ were nonzero). As shown in the notebook (which you can download <a href="http://rubin.msu.domains/blog/multilogit.nb.html" target="_blank">here</a>), the LP solution could be made dense by adding positive amounts $\lambda_i$ as described above, while maintaining the same objective value. I tried to make the GA solution sparse by subtracting $\lambda_i = \min_k x_{i,k}$ from the $i$-th row of $x$. It preserved nonnegativity of $x$ and maintained the same objective value, but reduce density only from 1 to 0.99.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-22008479333438793402020-09-30T19:13:00.001-04:002021-07-05T14:53:28.808-04:00A Greedy Heuristic Wins<p> A <a href="https://or.stackexchange.com/questions/4777/combinatorial-optimisation-allocation-problem/" target="_blank">problem</a> posted on OR Stack Exchange starts as follows: "I need to find two distinct values to allocate, and how to allocate them in a network of stores." There are $n$ stores (where, according to the poster, $n$ can be close to 1,000). The two values (let's call them $x_1$ and $x_2$) must be integer, with $x_1 \in \lbrace 1, \dots, k_1 \rbrace$ and $x_2 \in \lbrace k_1, \dots, k_2 \rbrace$ for given parameters $k_1 < k_2$. Additionally, there is an additional set of parameters $s_{i3}$ and a balance constraint saying $$0.95 g(k_1 e) \le g(x_1, x_2) \le 1.05 g(k_1 e)$$ where $$g(y) = \sum_{i=1} \frac{s_{i3}}{y_i}$$ for any allocation $y$ and $e = (1,\dots, 1).$</p><p>The cost function (to be minimized) has the form $$f(x_1, x_2) = a\sum_{i=1}^n \left[ s_{i1}\cdot \left( \frac{s_{i2}}{y_i} \right)^b \right]$$with $a$, $s_{i1}$, $s_{i2}$ and $b$ all parameters and $y_i \in \lbrace x_1, x_2 \rbrace$ is the allocation to store $i$. There are two things to note about $f$. First, the leading coefficient $a (> 0)$ can be ignored when looking for an optimum. Second, given choices $x_1$ and $x_2>x_1$, the cheaper choice at all stores will be $x_1$ if $b < 0$ and $x_2$ if $b > 0$.</p><p>It's possible that a nonlinear solver might handle this, but I jumped straight to metaheuristics and, in particular, my go-to choice among metaheuristics -- a genetic algorithm. Originally, genetic algorithms were intended for unconstrained problems, and were tricky to use with constrained problems. (You could bake a penalty for constraint violations into the fitness function, or just reject offspring that violated any constraints, but neither of those approaches was entirely satisfactory.) Then came a breakthrough, the random key genetic algorithm [1]. A random key GA uses a numeric vector $v$ (perhaps integer, perhaps byte, perhaps double precision) as the "chromosome". The user is required to supply a function that translates any such chromosome into a <i>feasible</i> solution to the original problem.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">I did some experiments in R, using the <a href="https://cran.r-project.org/web/packages/GA/index.html" target="_blank">GA</a> package to implement a random key genetic algorithm. The package requires all "genes" (think "variables") to be the same type, so I used a double-precision vector of dimension $n_2$ for chromosomes. The last two genes have domains $(1, k_1 + 1)$ and $(k_1, k_2 + 1)$; the rest have domain $(0, 1)$. Decoding a chromosome $v$ proceeds as follows. First, $x_1 = \left\lfloor v_{n+1}\right\rfloor $ and $x_2 = \left\lfloor v_{n+2}\right\rfloor $, where $\left\lfloor z \right\rfloor$ denotes the "floor" (greatest lower integer) of $z$. The remaining values $v_1, \dots, v_{n}$ are sorted into ascending order, and their sort order is applied to the stores. So, for instance, if $v_7$ is the smallest of those genes and $v_{36}$ is the largest, then store $7$ will be first in the sorted list of stores and store $36$ will be last. (The significance of this sorting will come out in a second.)</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Armed with this, my decoder initially assigns every store the cheaper choice between $x_1$ and $x_2$ and computes the value of $g()$. If $g()$ does not fall within the given limits, the decoder runs through the stores in their sorted order, switching the allocation to the more expensive choice and updating $g()$, until $g()$ meets the balance constraint. As soon as it does, we have the decoded solution. This cheats a little on the supposed guarantee of feasibility in a decoded solution, since there is a (small?) (nearly zero?) chance that the decoding process will fail with $g()$ jumping from below the lower bound to above the upper bound (or vice versa) after some swap. If it does, my code discards the solution. This did not seem to happen in my testing.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The GA seemed to work okay, but it occurred to me that I might be over-engineering the solution a bit. (This would not be the first time I did that.) So I also tried a simple greedy heuristic. Since $k_1$ and $k_2$ seem likely to be relatively small in the original poster's problem (whereas $n$ is not), my greedy heuristic loops through all valid combinations of $x_1$ and $x_2$. For each combination, it sets $v_1$ equal to the cheaper choice and $v_2$ equal to the more expensive choice, assigns the cheaper quantity $v_1$ to every store and computes $g()$. It also computes, for each store, the ratio \[ \frac{|\frac{s_{i3}}{v_{2}}-\frac{s_{i3}}{v_{1}}|}{s_{i1}\left(\left(\frac{s_{i2}}{v_{2}}\right)^{b}-\left(\frac{s_{i1}}{v_{1}}\right)^{b}\right)} \]in which the numerator is the absolute change in balance at store $i$ when switching from the cheaper allocation $v_1$ to the more expensive allocation $v_2$, and the denominator is the corresponding change in cost. The heuristic uses these ratios to select stores in descending "bang for the buck" order, switching each store to the more expensive allocation until the balance constraint is met.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Both the GA decoder and the greedy heuristic share the approach of initially allocating every store the cheaper choice and then switching stores to the more expensive choice until balance is attained. My R notebook generates a random problem instance with $n=1,000$ and then solves it twice, first with the GA and then with the greedy heuristic. The greedy heuristic stops when all combinations of $x_1$ and $x_2$ have been tried. Stopping criteria for the GA are more arbitrary. I limited it to at most 1,000 generations (with a population of size 100) or 20 consecutive generations with no improvement, whichever came first.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The results on a typical instance were as follows. The GA ran for 49 seconds and got a solution with cost 1065.945. The greedy heuristic needed only 0.176 seconds to get a solution with cost 1051.735. This pattern (greedy heuristic getting a better solution in much less time) repeated across a range of random number seeds and input parameters, including switching between positive and negative values of $b$.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">If you are interested, you can browse <a href="https://rubin.msu.domains/blog/stores.nb.html" target="_blank">my R notebook</a> (which includes both code and results).<br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">[1] Bean, J. C. (1994). Genetic Algorithms and Random Keys for Sequencing and Optimization. <i>ORSA Journal on Computing</i>, <b>6</b>, 154-160.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-71469884566776286292020-09-03T16:22:00.003-04:002020-09-27T17:49:01.846-04:00Installing Rcplex and cplexAPI<div><p>I've previously mentioned solving MIP models in R, using CPLEX. In one post [1], I used the <a href="https://cran.r-project.org/web/packages/ompr/index.html" target="_blank">OMPR</a> package, which provides a domain specific language for model construction. OMPR uses the <a href="https://cran.r-project.org/web/packages/ROI/index.html" target="_blank">ROI</a> package, and in particular the <a href="https://cran.r-project.org/web/packages/ROI.plugin.cplex/index.html">ROI.plugin.cplex</a> package, to communicate with CPLEX. That, in turn, uses the <a href="https://cran.r-project.org/web/packages/Rcplex/index.html" target="_blank">Rcplex</a> package. In another post [2], I used Rcplex directly. Meanwhile, there is still another package, <a href="https://cran.r-project.org/web/packages/cplexAPI/index.html" target="_blank">cplexAPI</a>, that provides a low-level API to CPLEX.</p><p>Both Rcplex and cplexAPI will install against CPLEX Studio 12.8 and earlier, but neither one installs with CPLEX Studio 12.9 or 12.10. Fortunately, IBM's Daniel Junglas was able to hack solutions for both of them. I'll spell out the steps I used to get Rcplex working with CPLEX 12.10. You can find the solutions for both in the responses to <a href="https://community.ibm.com/community/user/datascience/communities/community-home/digestviewer/viewthread?MessageKey=0969e4fc-3ba7-42ed-91aa-5fd73f571481&CommunityKey=ab7de0fd-6f43-47a9-8261-33578a231bb7&tab=digestviewer&reply-inline=0969e4fc-3ba7-42ed-91aa-5fd73f571481&SuccessMsg=Thank%20you%20for%20submitting%20your%20message.#bm3ddfcd6c-a949-41c9-95a5-4ca0b6e65f64" target="_blank">this question</a> on the IBM Decision Optimization community site. Version information for what follows is: Linux Mint 19.3; CPLEX Studio 12.10; R 3.6.3; and Rcplex 0.3-3. Hopefully essentially the same hack works with Windows.<br /></p><ol style="text-align: left;"> <li>Download Rcplex_0.3-3.tar.gz, put it someplace harmless (the Downloads folder in my case, but /tmp would be fine) and expand it, producing a folder named Rcplex.</li> <li>Go to the Rcplex folder and open the 'configure' file in a text editor (one you would use for plain text files).</li> <li>Line 1548 should read as follows:<br /><div align="center"><pre>CPLEX_LIBS="-L${CPLEXLIBDIR} `${AWK} 'BEGIN {FS = " = "} /^CLNFLAGS/ {print $2}' ${CPLEX_MAKEFILE}`"</pre></div>.<br /> Replace it with <br /><div align="center"><pre>CPLEX_LIBS="-L${CPLEXLIBDIR} `${AWK} 'BEGIN {FS = " = "} /^CLNFLAGS/ {print $2}' ${CPLEX_MAKEFILE} | sed -e 's,\$(CPLEXLIB),cplex,'`"</pre></div>.<br /> Save the modified file.<br /></li> <li>Open a terminal in the parent directory of the Rcplex folder and run the following command:<br /> <div align="center"><pre>R CMD INSTALL --configure-args="--with-cplex-dir=.../CPLEX_Studio1210/cplex/" ./Rcplex</pre></div>.<br /> Adjust the file path (particularly the ...) so that it points to the 'cplex' directory in your CPLEX Studio installation (the one that has subdirectories named "bin", "examples", "include" etc.).</li> <li>Assuming there were no error messages during installation, you should be good to go.<br /></li></ol></div><br /><div><p>[1] <a href="https://orinanobworld.blogspot.com/2016/11/mip-models-in-r-with-ompr.html" target="_blank">https://orinanobworld.blogspot.com/2016/11/mip-models-in-r-with-ompr.html</a></p><p>[2] <a href="https://orinanobworld.blogspot.com/2020/08/a-group-selection-problem.html" target="_blank">https://orinanobworld.blogspot.com/2020/08/a-group-selection-problem.html</a></p><p><b>Update</b>: Version 1.4.0 of cplexAPI, released on 2020-09-21, installs correctly against CPLEX 12.10 (and presumably 12.9), at least on my system (Linux Mint).<br /></p></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-75094762787711981952020-08-29T14:42:00.001-04:002020-08-29T14:42:51.178-04:00A Group Selection Problem<p>Someone posted an interesting <a href="https://stackoverflow.com/questions/63584707/r-binary-integer-optimization-with-groups" target="_blank">question</a> about nonlinear integer programming with grouped binary variables on Stack Overflow, and it drew multiple responses. The problem is simple to state. You have 52 binary variables $x_i$ partitioned into 13 groups of four each, with a requirement that exactly one variable in each group take the value 1. So the constraints are quite simple:</p><p>\begin{align*} x_{1}+\dots+x_{4} & =1\\ x_{5}+\dots+x_{8} & =1\\ \vdots\\ x_{49}+\dots+x_{52} & =1. \end{align*}</p><p>The objective function is a cubic function of the form</p><p>\[ \left(\alpha\sum_{i}a_{i}x_{i}\right)\times\left(\beta\sum_{j}b_{j}x_{j}+\beta_{0}\right)\times\left(\gamma\sum_{k}c_{k}x_{k}+\gamma_{0}\right) \] where $\alpha = 1166/2000$, $\beta = 1/2100$, $\beta_0 = 0.05$, $\gamma = 1/1500$ and $\gamma_0 = 1.5$. (In the original post, there is a minus sign in front of the function and the author minimizes; for various reasons I am omitting the minus sign and maximizing here.) Not only is the objective nonlinear, it is nonconvex if minimizing (nonconcave if maximizing). The author of the question was working in R.<br /></p><p>Fellow blogger Erwin Kalvelagen <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2020/08/minlp-vs-miqcp.html" target="_blank">solved the problem</a> with a variety of nonlinear optimizers, obtaining a solution with objective value -889.346. Alex Fleischer of IBM posted an answer with the same objective value, using a constraint programming model written in OPL and solved with CP Optimizer.</p><p>My initial thought was to linearize the objective function by introducing continuous variables $y_{ij} = x_i \cdot x_j$ and $z_{ijk} = x_i \cdot x_j \cdot x_k$ with domain [0,1]. Many of those variables can be eliminated, due in part to symmetry ($y_{ij} = y_{ji}$, $z_{ijk} = z_{ikj}=\dots=z_{kji}$ and in part due to the observation that $y_{ii}=z_{iii}=x_i$. Also useful is that for $i<j<k$ $z_{ijk}=x_i \cdot y_{jk}$. I have an <a href="http://rubin.msu.domains/blog/groupselect.nb.html" target="_blank">R notebook</a> that you can download, in which I build the model using standard linearizations for the product of two binarys, then try to solve it with CPLEX using the Rcplex package (and the Matrix package, which allows a sparse representation of the constraint matrix). The results were, shall we say, unspectacular. With a five minute time limit (much longer than what Erwin or Alex needed), CPLEX found an incumbent with value 886.8748 (not bad but not optimal) and a rather dismal optimality gap of 146.5% (due mainly to a loose and slow moving bound).</p><p>Out of curiosity, I took a second shot using a genetic algorithm and the GA package for R. I was geeked to see that the GA package includes both an island model (using parallel processing) and a permutation variant (which lets me use permutations of the indices 1 to 52 as chromosomes with no extra work on my part). The permutation approach allows me to treat a chromosome as a prioritization of the 52 binary variables, which I decode into a solution $x$ by scanning the $x_i$ in priority order and setting each to 1 if and only none of the other variables in its group of four has been set to 1. <a href="http://rubin.msu.domains/blog/groupselect2.nb.html" target="_blank">That R notebook</a> is also available for download.</p><p>As a metaheuristic, the GA does not offer a proof of optimality, and in fact may or may not find the optimal solution. With my inspired choice of random number seed (123), I matched Erwin's and Alex's solution (889.3463). The settings I used resulted in a run time of about 36 seconds on my PC, more than half of which was spent after the best solution had been found. It's still slower than what Erwin and Alex achieved, but it is a "pure R" solution, meaning it requires nothing besides open-source R packages.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-9330047517428646232020-08-23T14:53:00.000-04:002020-08-23T14:53:06.070-04:00Multiobjective Optimization in CPLEX<p>In my <a href="https://orinanobworld.blogspot.com/2020/08/multiobjective-optimization.html" target="_blank">previous post</a>, I discussed multiobjective optimization and ended with a simple example. I'll use this post to discuss some of the new (as of version 12.9) features in CPLEX related to multiobjective optimization, and then apply them to the example from the previous post. My Java code can be downloaded from my <a href="https://gitlab.msu.edu/orobworld/multiobj" target="_blank">GitLab repository</a>.</p><p>Currently (meaning as of CPLEX version 12.10), CPLEX supports multiple objectives in linear and integer programs. It allows mixtures of "blended" objective functions (weighted combinations of original criteria) and "lexicographic" hierarchical objectives. Basically, you set one or more hierarchy (priority) levels, and in each one you can have a single criterion or a weighted combination of criteria. So the "classical" preemptive priority approach would involve multiple priority levels with a single criterion in each, while the "classical" weighted combination approach would involve one priority level with a blended objective in it. Overall, you are either maximizing or minimizing, but you can use negative weights for criteria that should go the opposite direction of the rest. In the example here, which is a minimization problem, the third priority level gives maximum provider utilization a weight of +1 (because we want to minimize it) and minimum provider utilization a weight of -1 (because we want to maximize it).</p><p>There are some limitations to the use of multiple objectives. The ones I think are of most general interest are the following:</p><ul style="text-align: left;"><li>objectives and constraints <i>must be linear</i> (no quadratic terms); and</li><li>all generic callbacks and legacy <i>information</i> callbacks can be used, but other legacy callbacks, and in particular legacy <i>control</i> callbacks (branch callbacks, cut callbacks etc.) cannot be used. So if you need to use callbacks with a multiobjective problem, now would be a good time to learn the new generic callback system.</li></ul><p>Every criterion you specify has a priority level and, within that priority level, a weight. A feature that I appreciate, and which I will use in the code, is that you can also specify an absolute and/or a relative tolerance for each criterion. The tolerances tell CPLEX how much it can sacrifice in that criterion to improve lower priority criteria. The default tolerance is zero, meaning higher priority criteria must be optimized before lower priority criteria are even considered. A nonzero tolerance basically tells CPLEX that is allowed to sacrifice some amount (either an absolute amount or a percentage of the optimal value) in that criterion in order to improve lower priority criteria.</p><p>Defining the variables and building the constraints of a multiobjective model is no different from a typical single criterion model. Getting the solution after solving the model is also unchanged. The differences come mainly in how you specify the objectives and how you invoke the solver.</p><p>To build the objective function, you need to use one of the several overloads of <span style="font-family: courier;">IloCplex.staticLex()</span>. The all take as first argument a one dimensional array of expressions <span style="font-family: courier;">IloNumExpr[]</span>, and they all return an instance of the new interface <span style="font-family: courier;">IloCplexMultiCriterionExpr</span>. In addition to an array of objective expressions, one of the overloads lets you also specify arrays of weights, priorities and tolerances (absolute and relative). That's the version used in my sample code. </p><p>This brings me to a minor inconvenience relative to a conventional single objective problem. Ordinarily, I would use <span style="font-family: courier;">IloCplexModeler.addMinimize(expr)</span> or <span style="font-family: courier;">IloCplexModeler.addMaximize(expr)</span> to add an objective to a model, where <span style="font-family: courier;">expr</span> is an instance of <span style="font-family: courier;">IloNumExpr</span>. I naively thought to do the same here, using the output of <span style="font-family: courier;">staticLex()</span> as the expression, but that is not (currently) supported. There's no overload of <span style="font-family: courier;">addMinimize()</span> or <span style="font-family: courier;">addMaximize()</span> that accepts a multicriterion expression. So it's a three step process: use <span style="font-family: courier;">cplex.staticLex(...)</span> to create the objective and save it to a temporary variable (where <span style="font-family: courier;">cplex</span> is your <span style="font-family: courier;">IloCplex</span> instance); pass that variable to either <span style="font-family: courier;">cplex.minimize(...)</span> or <span style="font-family: courier;">cplex.maximize(...)</span> and save the resulting instance of <span style="font-family: courier;">IloObjective</span> in a temporary variable; and then invoke <span style="font-family: courier;">cplex.add(...)</span> on that variable.</p><p>When you are ready to solve the model, you invoke the <span style="font-family: courier;">solve()</span> method on it. You can continue to use the version of <span style="font-family: courier;">solve()</span> that takes no arguments (which is what my code does), or you can use a new version that takes as argument an array of type <span style="font-family: courier;">IloCplex.ParameterSet[]</span>. This allows you to specify different parameter settings for different priority levels.</p><p>Other methods you might be interested in are <span style="font-family: courier;">IloCplex.getMultiObjNSolves()</span> (which gets the number of subproblems solved) and <span style="font-family: courier;">IloCplex.getMultiObjInfo()</span> (which lets you look up a variety of things that I really have not explored yet).</p><p>The output from my code (log file), which is in the repository, is too lengthy to show here, but if you want you can use <a href="https://gitlab.msu.edu/orobworld/multiobj/-/blob/master/log.txt" target="_blank">this link</a> to open it in a new tab. Here's a synopsis. I first optimized each of the three objective functions separately. (Recall that maximum and minimum provider utilization are blended into one objective.) This gives what is sometimes referred to as the "Utopia point" or "<a href="https://en.wikipedia.org/wiki/Multi-objective_optimization" target="_blank">ideal point</a>". This is column "U" in the table below. Next, I solved the prioritized multiobjective problem. The results are in column "O" of the table. Finally, to demonstrate the ability to be flexible with priorities, I resolved the multiobjective problem using a relative tolerance of 0.1 (10%) for the top priority objective (average distance traveled) and 0.05 (5%) for the second priority objective (maximum distance traveled). Those results are in column "F".</p><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;margin:0px auto;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} </style><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;margin:0px auto;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} .tg .tg-dvpl{border-color:inherit;text-align:right;vertical-align:top} </style><table class="tg"><thead> <tr> <th class="tg-0pky"><br /></th> <th class="tg-c3ow">U</th> <th class="tg-c3ow">O</th> <th class="tg-c3ow">F</th> </tr></thead><tbody> <tr> <td class="tg-0pky">Avg. distance</td> <td class="tg-dvpl">14.489</td> <td class="tg-dvpl">14.489</td> <td class="tg-dvpl">15.888</td> </tr> <tr> <td class="tg-0pky">Max distance</td> <td class="tg-dvpl">58.605</td> <td class="tg-dvpl">58.605</td> <td class="tg-dvpl">60.000</td> </tr> <tr> <td class="tg-0pky">Utilization spread</td> <td class="tg-dvpl">0.030</td> <td class="tg-dvpl">0.267</td> <td class="tg-dvpl">0.030</td> </tr> <tr> <td class="tg-0pky">Max utilization</td> <td class="tg-dvpl">0.710</td> <td class="tg-dvpl">0.880</td> <td class="tg-dvpl">0.710</td> </tr> <tr> <td class="tg-0pky">Min utilization</td> <td class="tg-dvpl">0.680</td> <td class="tg-dvpl">0.613</td> <td class="tg-dvpl">0.680</td> </tr></tbody></table><p>There are a few things to note.</p><ol style="text-align: left;"><li>The solution to the multiobjective model ("O") achieved the ideal values for the first two objectives. One would expect to match the ideal value on the highest priority objective; matching on the second objective was luck. The third objective (utilization spread) was, not surprisingly, somewhat worse than the ideal value.</li><li>Absolute and relative tolerances appear to work the same way that absolute and relative gap tolerances do: if a solution is within <i>either</i> absolute or relative tolerance of the best possible value on a higher priority objective, it can be accepted. In the third run, I set relative tolerances but let the absolute tolerances stay at default values.</li><li>The relative tolerances I set in the last run would normally allow CPLEX to accept a solution with an average travel distance as large as $(1 + 0.1)*14.489 = 15.938$ and a maximum travel distance as large as $(1 + 0.05)*58.605 = 61.535$. There is a constraint limiting travel distance to at most 60, though, which supersedes the tolerance setting.</li><li>The "flexible" solution (column "F") exceeds the ideal average distance by about 9.7%, hits the cap of 60 on maximum travel distance, and actually achieves the ideal utilization spread. However, without knowing the ideal point you would not realize that last part. I put a fairly short time limit (30 seconds) on the run, and it ended with about a 21% gap due to a very slow-moving best bound.</li></ol><p>I'll close with one last observation. At the bottom of the log, after solving the "flexible" variant, you will see the following lines.</p><p><span style="font-family: courier;">Solver status = Unknown.<br />Objective 0: Status = 101, value = <b>14.489</b>, bound = 14.489.<br />Objective 1: Status = 101, value = <b>58.605</b>, bound = 58.605.<br />Objective 2: Status = 107, value = 0.030, bound = 0.023.<br />Final value of average distance traveled = <i>15.888</i>.<br />Final value of longest distance traveled = <i>60.000</i>.<br />Final value of maximum provider utilization = 0.710.<br />Final value of minimum provider utilization = 0.680.</span></p><p>The first four lines are printed by CPLEX, the last four by my code. Note the mismatch in objective values of the first two criteria (bold for CPLEX, italic for my results). CPLEX prints the best value it achieved for each objective <i>before moving on to lower priority objectives</i>. When you are using the default tolerances of zero (meaning priorities are absolute), the printed values will match what you get in the final solution. When you specify non-zero tolerances, though, CPLEX may "give back" some of the quality of the higher priority results to improve lower priority results, so you will need to recover the objective values yourself.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-59476605336400123862020-08-20T14:19:00.000-04:002020-08-20T17:45:47.320-04:00Multiobjective Optimization<p><b>Multiobjective optimization</b> (making "optimal" decisions involving multiple, frequently conflicting, criteria) is a big subject. I will only nibble at the fringes of it here. In the next post, I'll describe recent additions to CPLEX that facilitate solving some multiobjective problems.</p><p>Among the various approaches to multiobjective problems, two are probably the most common, weighting and prioritization. The first approach is to merge the various criteria into a single one, usually (almost always?) by taking a weighted sum of the criteria. The CPLEX documentation refers to this as a <b>blended</b> objective. For this to make sense, the units of the various criteria really should be commensurable (e.g., all monetary values), but I'm pretty sure having criteria that are not commensurable doesn't stop people from trying. The weights serve two roles. First, they bring the units into some semblance of parity (so if $f()$ is in dollars and $g()$ in millions of dollars, $g()$ gets a weight roughly on millionth the size of the weight of $f()$). Second, they convey relative importance of various criteria.<br /></p><p>The second approach is to prioritize the criteria. The solver initially optimizes the highest priority criterion, without regard to any others. Once an optimal value of the highest priority criterion is known, maintaining that value becomes a constraint, and the solver moves to the second highest priority criterion, and so on. The CPLEX documentation refers to this as a lexicographic objective, meaning that the objective function is vector-valued rather than scalar-valued, and optimization means achieving the lexicographically largest or smallest objective vector possible. A variant of this allows a little "slippage" in the value of each criterion, so that for example the solver can accept a solution that is 1% below optimal on the first criterion in return for optimizing the second criterion. A key limitation here is the solver will trade any amount of degradation in a lower priority criterion, no matter how much, for any improvement in a higher priority criterion, no matter how small.</p><p>Although they are not relevant to the recent CPLEX additions, I will mention two other approaches. One is a variant of the priority method, known as <a href="https://en.wikipedia.org/wiki/Goal_programming" target="_blank">goal programming</a> (GP). This was originally developed as an extension of linear programming, but the same general approach can be extended to problems with integer variables. The user sets target levels for each criterion, and then prioritizes them. If a goal is underachieved, work on meeting lower priority goals cannot sacrifice any amount of the higher priority criterion. On the other hand, if a goal is overachieved, any portion of the overachievement can be sacrificed in the quest to reach a lower priority goal. An interesting attribute of goal programming is that the same criterion can be used with more than one goal. Suppose that you are building a GP model allocating a budget to various conservation projects. Your highest priority goal might be to allocate at least 50% of the budget to projects in underserved communities (USCs, to save me typing, with apologies to the universities of South Carolina and Southern Califonia). Your second highest priority goal might be to allocate at least 30% of the budget to projects with matching funds from outside sources. Your third highest priority goal might be to allocate at least 75% of the budget to USCs. The other approach is to investigate the <a href="https://en.wikipedia.org/wiki/Pareto_efficiency" target="_blank">Pareto frontier</a>, the set of all solutions for which no other solution does as well in all criteria and better in at least one. In essence, you want to present the decision-maker with the entire Pareto frontier and say "here, pick one". In practice, computing the Pareto frontier can be very computationally expensive, and trying to make sense of it might cause the decision maker to melt down.</p><p>To close this post, I'll pose a small sample problem and formulate the model for it. Suppose that we have $N$ patients in a health care system and $M$ providers, and that each patient needs to be assigned to a single provider. Provider $j$ has a limit $c_j$ on the number of patients they can handle. (To keep the example simple, and at the expense of some realism, we treat all patients as identical with regard to their capacity consumption.) We are given a matrix $D\in \mathbb{R}^{N\times M}$ of distances from patients to providers, as well as a cap $D_{max}$ on the distance that a patient can be required to travel. There are four criteria to be considered:</p><ul style="text-align: left;"><li>the average distance patients will travel (minimize, highest priority);</li><li>the maximum distance any patient must travel (minimize, second highest priority);</li><li>the maximum utilization of any provider as a fraction of their capacity (minimize, tied for third highest priority); and</li><li>the minimum utilization of any provider as a fraction of their capacity (maximize, tied for third highest priority).</li></ul><p>So we have a mix of three things to minimize and one to maximize, with the last two criteria combining to somewhat level the workload across providers. </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin: 0px; text-indent: 0px;">Let $x_{ij}$ be 1 if patient $i$ is assigned to provider $j$ and 0 if not, let $w$ be the longest distance traveled by any patient, let $y_j$ be the fraction of provider $j$'s capacity that is utilized, and let $z_{lo}$ and $z_{hi}$ be the minimum and maximum capacity utilization rates, respectively (where 0 means the provider is unused and 1 means the provider is operating at capacity). The objective expression is $f\in\mathbb{R}^3$, whose lexicographic minimum we seek, where</p>\[ f=\left[\begin{array}{c} \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}d_{ij}x_{ij}\\ w\\ z_{hi}-z_{lo} \end{array}\right]. \]<p style="-qt-block-indent: 0; margin: 0px; text-indent: 0px;">The first and second components of $f$ are the average and maximum client travel distances. The third component is a weighted mix of maximum and minimum provider utilization, where the weights (+1, -1) are equal in magnitude to reflect the equal importance I am assigning to them and the negative coefficient for minimum utilization allows it to be maximized in what is otherwise a minimization problem.</p><p style="-qt-block-indent: 0; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin: 0px; text-indent: 0px;">The constraints of the model are easy to state:</p><p>\begin{align*} \sum_{j=1}^{M}x_{ij} & =1\quad\forall i\in\left\{ 1,\dots,N\right\} & (1)\\ d_{ij}x_{ij} & \le w\quad\forall i\in\left\{ 1,\dots,N\right\} ,\forall j\in\left\{ 1,\dots,M\right\} & (2)\\ \frac{1}{c_{j}}\sum_{i=1}^{N}x_{ij} & =y_{j}\quad\forall j\in\left\{ 1,\dots,M\right\} & (3)\\ y_{j} & \le z_{hi}\quad\forall j\in\left\{ 1,\dots,M\right\} & (4)\\ y_{j} & \ge z_{lo}\quad\forall j\in\left\{ 1,\dots,M\right\} & (5)\\ x & \in\left\{ 0,1\right\} ^{N\times M} & (6)\\ x_{ij} & =0\quad\forall i,j\ni d_{ij}>D_{max} & (7)\\ y & \in\left[0,1\right]^{M} & (8)\\ z_{hi},z_{lo} & \in\left[0,1\right] & (9)\\ w & \in\left[0,D_{max}\right] & (10) \end{align*} </p><ul style="text-align: left;"><li>Constraint (1) ensures that each patient is assigned to exactly one provider.</li><li>Constraint (2) defines $w$, the maximum distance traveled.</li><li>Constraint (3) defines the fraction $y_j$ of capacity used at each provider $j$.</li><li>Constraints (4) and (5) define $z_{lo}$ and $z_{hi}$.</li><li>Constraints (6), (8), (9) and (10) define variable domains. The upper bound of 1 for $y_j$ in (8) ensures that no provider is assigned more patients than their capacity allows.</li><li>Constraint (7) enforces the travel distance limit $D_{max}$ by preventing any assignments that would violate the limit (effectively removing those assignment variables from the model).</li></ul><p>In the next post, I will show how to solve the model using CPLEX (with, as usual, the Java API).</p><p> <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-8635473320738028192020-08-18T14:13:00.000-04:002020-08-18T14:13:26.969-04:00A Partitioning Problem<p> A <a href="https://math.stackexchange.com/questions/3792115/partition-n-items-into-k-equally-sized-partitions-while-retaining-groups-in/" target="_blank">recent question</a> on Mathematics Stack Exchange dealt with reducing the number of sets in a partition of a set of items. I'll repeat it here but with slightly different terminology from the original question. You start with $N$ items partitioned into $M$ disjoint sets. Your goal is to generate a smaller partition of $K < M$ sets (which I will henceforth call "collections" to distinguish them from the original sets). It is required that all items from any original set end up in the same collection (i.e., you cannot split the original sets). The criterion for success is that "the new [collection] sizes should be as close to even as possible".</p><p>This is easily done with an integer programming model. The author of the question thought about minimizing the variance in the collection sizes, which would work, but I'm fond of keeping things linear, so I will minimize the range of collection sizes. I'll denote the cardinality of original set $i$ by $n_i$. Let $x_{ij}$ be a binary variable which is 1 if set $i\in \lbrace 1,\dots, M\rbrace$ is assigned to collection $j\in \lbrace 1,\dots,K\rbrace$ and 0 if not. Let $y$ and $z$ denote the sizes of the smallest and largest collections. Finally, for $j\in \lbrace 1,\dots,K\rbrace$ let $s_j$ be the size (cardinality) of collection $j$. A MILP model for the problem is the following:</p><p>\begin{align} \min\,z-y\\ \textrm{s.t. }\sum_{j=1}^{K}x_{ij} & =1\;\; \forall i\in\left\{ 1,\dots M\right\} \\ \sum_{i=1}^{M}n_{i}x_{ij} & =s_{j}\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ s_{j} & \le z\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ s_{j} & \ge y\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ y,z,s_{\cdot} & \ge0\\ x_{\cdot\cdot} & \in\left\{ 0,1\right\} \end{align} </p><p>The author of the question also indicated an interest in "fast greedy approximate solutions" (and did not specify problem dimensions). The first greedy heuristic that came to my mind was a simple one. Start with $K$ empty collections and sort the original sets into descending size order. Now assign each set, in turn, to the collection that currently has the smallest size (breaking times whimsically). Why work from largest to smallest set? There will be times when you will want to offset a large set in one collection with two or more smaller sets in another collection, and that will be easier to do if you start big and keep the smaller sets in reserve as long as is possible. <a href="https://math.stackexchange.com/users/683666/robpratt" target="_blank">Rob Pratt</a>, owner of a rather massive reputation score on MSE, correctly noted that this is equivalent to the "longest processing time" heuristic for assigning jobs to machines so as to minimize makespan.</p><p>I put together an <a href="https://rubin.msu.domains/blog/partition.nb.html" target="_blank">R notebook</a> to test this "greedy" heuristic against the optimization model (solved with CPLEX). The notebook uses Dirk Schumacher's <a href="https://cran.r-project.org/web/packages/ompr/index.html" target="_blank">OMPR</a> package for building the MILP model. It in turn uses the <a href="https://cran.r-project.org/web/packages/ROI/index.html" target="_blank">ROI</a> package (which requires the <a href="https://cran.r-project.org/web/packages/Rcplex/index.html" target="_blank">Rcplex</a> package) in order to communicate with CPLEX. On a test run using nice, round values of $N$, $M$ and $K$ that all ended in zeros (and in particular where $K$ divided evenly into both $M$ and $N$), the greedy heuristic nearly found the optimal solution. When I switched to less round numbers ($N=5723$, $M=137$, $K=10$), though, the heuristic did not fare as well. It was fast (well under one second on my PC) but it produced a solution where collection sizes ranged from 552 to 582 (a spread of 30), while CPLEX (in about 21 seconds) found an optimal solution where all collections had size either 572 or 573 (spread of 1). So I tacked on a second heuristic to refine the solution of the first heuristic. The second heuristic attempts pairwise swaps of the smallest set from the smallest collection with a larger set from a larger collection (trying collections in descending size order). Swaps are constrained not to leave the second collection (the one donating the larger set) smaller than the first collection started out. The intuition is to shrink the range by making the smallest collection bigger while shrinking the largest collection if possible and, if not, at least some collection that is larger than the smallest one. The new heuristic also ran in well under one second and shrank the range of collection sizes from 30 to 3 -- still not optimal, but likely good enough for the application the original questioner had in mind.</p><p>You are free to use the R code (which can be extracted from the notebook linked above) under the Creative Commons license that governs the blog.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-2267353170963263182020-08-15T15:11:00.003-04:002020-08-15T15:11:54.956-04:00Firefox and the New Blogger Interface<p>Blogger has a (relatively) new interface, to which I switched a while back. The one major annoyance I found was that clicking the "Preview" button while editing a post did not actually generate a preview. I got a notification (lower left) that the preview was being prepared, and then ... nothing. To get a preview, I had to save my work, exit the edit screen (going back to the Blogger control panel), and do the preview there.</p><p>It wasn't just me, either. Checking the Blogger help community, I found a ton of posts about this, on pretty much all operating systems and browsers, with some dated this month. A tip about fixing the problem on Safari worked for me. The key (somewhat obvious in hindsight) is that Blogger needs permission to open a pop-up. This was not <i>entirely</i> obvious to me, since I don't consider opening a tab the same as opening a pop-up, but so be it. In Firefox, with any Blogger screen displayed, click the padlock icon in the URL bar, and under "Permissions" allow the site to open pop-ups.</p><p>Other users said they had the same problem with Chrome, which is interesting in that preview works fine for me on Chrome, and I don't recall giving explicit permission there. At any rate, I seem to be back in business.</p><p>And yes, I previewed this entry before posting it.</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-39034709043167372022020-07-20T14:49:00.001-04:002020-07-20T14:49:25.722-04:00Longest Increasing Subsequence<div>In a <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2020/07/longest-increasing-subsequence.html" target="_blank">recent blog post</a> (whose title I have shamelessly appropriated), Erwin Kalvelagen discusses a mixed-integer nonlinear programming formulation (along with possible linearizations) for a <a href="https://leetcode.com/problems/longest-increasing-subsequence/" target="_blank">simple problem</a> from a coding challenge: "Given an unsorted array of integers, find the length of longest increasing subsequence." The challenge stipulates at worst $O(n^2)$ complexity, where $n$ is the length of the original sequence. Erwin suggests the intent of the original question was to use dynamic programming, which makes sense and meets the complexity requirement.</div><div><br /></div><div>I've been meaning for a while to start fiddling around with binary decision diagrams (BDDs), and this seemed like a good test problem. Decision diagrams originated in computer science, where the application was evaluation of possibly complicated logical expressions, but recently they have made their way into the discrete optimization arena. If you are looking to familiarize yourself with decision diagrams, I can recommend a book by Bergman et al. [1].</div><div><br /></div><div>Solving this problem with a binary decision diagram is equivalent to solving it with dynamic programming. Let $[x_1, \dots, x_n]$ be the original sequence. Consistent with Erwin, I'll assume that the $x_i$ are nonnegative and that the subsequence extracted must be strictly increasing.</div><div><br /></div><div>We create a layered digraph in which each node represents the value of the largest (and hence most recent) element in a partial subsequence, and has at most two children. Within a layer, no two nodes have the same state, but nodes in different layers can have the same state. We have $n+2$ layers, where in layer $j\in\lbrace 1,\dots,n \rbrace$ you are deciding whether or not to include $x_j$ in your subsequence. One child, if it exists, represents the state after adding $x_j$ to the subsequence. This child exists only if $x_j$ is greater than the state of the parent node (because the subsequence must be strictly increasing). The other child, which always exists, represents the state when $x_j$ is omitted (which will be the same as the state of the parent node). Layer 1 contains a root node (with state 0), layer $n+1$ contains nodes corresponding to completed subsequences, and layer $n+2$ contains a terminal node (whose state will be the largest element of the chosen subsequence). Actually, you could skip layer $n+1$ and follow layer $n$ with the terminal layer; in my code, I included the extra layer mainly for demonstration purposes (and debugging).<br /></div><div><br /></div><div>In the previous paragraph, I dealt a card off the bottom of the deck. The state of a node in layer $j$ is the largest element of a partial subsequence based on including or excluding $x_1,\dots,x_{j-1}$. The sneaky part is that more than one subsequence may be represented at that node (since more than one subsequence of $[x_1,\dots,x_{j-1}]$ my contain the same largest element). In addition to the state of a node, we also keep track at each node of the longest path from the root node to that node and the predecessor node along the longest path, where length is defined as the number of yes decisions from the root to that node. So although multiple subsequences may lead to the same node, we only care about one (the longest path, breaking ties arbitrarily). Note that by keeping track of the longest path from root to each node as we build the diagram, we actually solve the underlying problem during the construction of the BDD.<br /></div><div><br /></div><div>The diagram for the original example ($n=8$) is too big to fit here, so I'll illustrate this using a smaller initial vector: $x=[9, 2, 5, 3]$. The BDD is shown below (as a PDF file, so that you can zoom in or out while maintaining legibility).</div><iframe height="480" src="https://drive.google.com/file/d/1bwsRLSojqGj4Iy27_ARM2HYPwey0mnYI/preview" width="640"></iframe><div><br /></div><div>The first four layers correspond to decisions on whether to use a sequence entry or not. (The corresponding entries are shown in the right margin.) Nodes "r" and "t" are root and terminus, respectively. The remaining nodes are numbered from 1 to 14. Solid arrows represent decisions to use a value, so for instance the solid arrow from node 4 to node 8 means that 5 ($x_3$) has been added to the subsequence. Dashed arrows represent decisions not to use a value, so the dashed arrow from node 4 to node 7 means that 5 ($x_3$) is not being added to the subsequence. Dotted arrows (from the fifth layer to the sixth) do not represent decisions, they just connect the "leaf" nodes to the terminus.</div><div><br /></div><div>The green(ish) number to the lower left of a node is the state of the node, which is the largest element included so far in the subsequence. The subsequence at node 4 is just $[2]$ and the state is 2. At node 7, since we skipped the next element, the subsequence and state remain the same. At node 8, the subsequence is now $[2, 5]$ and the state changes to 5.</div><div><br /></div><div>The red numbers $d_i:p_i$ to the lower right of a node $i$ are the distance (number of solid arcs) from the root to node $i$ along the longest path ($d_i$) and the predecessor of node $i$ on the longest path ($p_i$). Two paths converge at $i=13$: a path $r \dashrightarrow 2 \rightarrow 4 \dashrightarrow 7 \rightarrow 13$ of length 2 and a path $r \dashrightarrow 2 \dashrightarrow 5 \dashrightarrow 9 \rightarrow 13$ of length 1. So the longest path to node 13 has length 2 and predecessor node 7. Backtracking from the terminus (distance 2, predecessor either 12 or 13), we get optimal paths $r \dashrightarrow 2 \rightarrow 4 \rightarrow 8 \dashrightarrow 12 \dashrightarrow t$ (subsequence $[2, 5]$) and $r \dashrightarrow 2 \rightarrow 4 \dashrightarrow 7 \rightarrow 13 \dashrightarrow t$ (subsequence $[2, 3]$), the latter shown in blue.<br /></div><div><br /></div><div>In addition to the original example from the coding challenge ($n=8$), Erwin included an example with $n=100$ and longest increasing subsequence length 15. (There are multiple optimal solutions to both the original example and the larger one.) Gurobi solved the larger example to proven optimality in one second (probably less, since the output likely rounded up the time). My highly non-optimized Java code solved the $n=100$ example in 6 ms. on my PC (not including the time to print the results).</div><div><br /></div><div>BDDs can get large in practice, with layers growing combinatorially. In this case, however, that is not a problem. Since the state of a node is the largest value of a subsequence, there can be at most $n$ different states. Given the stipulation that no two nodes in a layer have the same state, that means at most $n$ states in a layer. For Erwin's example with $n=100$, the largest layer in fact contained 66 nodes.</div><div><br /></div><div>As I said earlier, using the BDD here is equivalent to using dynamic programming. With $n+2$ layers, at most $n$ nodes in a layer, and two operations on each node (figuring out the state and path length of the "yes" child and the "no" child), the solution process is clearly $O(n^2)$.<br /></div><div><br /></div><div>[1] D. Bergman, A. A. Cire, W.-J. van Hoeve and J. Hooker. <i>Decision Diagrams for Optimization</i> (B. O’Sullivan and M. Wooldridge, eds.). Springer International Publishing AG, 2016.<br /><br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0