tag:blogger.com,1999:blog-87813834610619295712020-10-28T17:59:54.140-04:00OR in an OB WorldA mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.comBlogger422125tag:blogger.com,1999:blog-8781383461061929571.post-89134421917071195012020-10-16T19:02:00.000-04:002020-10-16T19:02:12.221-04:00Multilogit Fit via LP<p> A <a href="https://or.stackexchange.com/questions/5040/linearizing-a-program-with-multinomial-logit-in-the-objective" target="_blank">recent question</a> on OR Stack Exchange has to do with getting an $L_1$ regression fit to some data. (I'm changing notation from the original post very slightly to avoid mixing sub- and super-scripts.) The author starts with $K$ observations $y_1, \dots, y_K$ of the dependent variable and seeks to find $x_{i,k} \ge 0$ ($i=1,\dots,N$, $k=1,\dots,K$) so as to minimize the $L_1$ error $$\sum_{k=1}^K \left|y_k - \sum_{i=1}^N \frac{e^{x_{i,k}}}{\sum_{j=1}^K e^{x_{i,j}}}\right|.$$ The author was looking for a way to linearize the objective function.</p><p>The solution I proposed there begins with a change of variables: $$z_{i,k}=\frac{e^{x_{i,k}}}{\sum_{j=1}^K e^{x_{i,j}}}.$$ The $z$ variables are nonnegative and must obey the constraint $$\sum_{k=1}^{K}z_{i, k}=1\quad\forall i=1,\dots,N.$$ With this change of variables, the objective becomes $$\sum_{k=1}^K \left|y_k - \sum_{i=1}^N z_{i,k} \right|.$$ Add nonnegative variables $w_k$ ($k=1,\dots, K$) and the constraints $$-w_k \le y_k - \sum_{i=1}^N z_{i,k} \le w_k \quad \forall k=1,\dots,K,$$ and the objective simplifies to minimizing $\sum_{k=1}^K w_k$, leaving us with an easy linear program to solve.</p><p>That leaves us with the problem of getting from the LP solution $z$ back to the original variables $x$. It turns out the transformation from $x$ to $z$ is invariant with respect to the addition of constant offsets. More precisely, for any constants $\lambda_i$ ($i=1,\dots,N$), if we set $$\hat{x}_{i,k}=x_{i,k} + \lambda_i \quad \forall i,k$$ and perform the $x\rightarrow z$ transformation on $\hat{x}$, we get $$\hat{z}_{i,k}=\frac{e^{\lambda_{i}}e^{x_{i,k}}}{\sum_{j=1}^{K}e^{\lambda_{i}}e^{x_{i,j}}}=z_{i,k}\quad\forall i,k.$$ This allows us to convert from $z$ back to $x$ as follows. For each $i$, set $j_0=\textrm{argmin}_j z_{i,j}$ and note that $$\log\left(\frac{z_{i,k}}{z_{i,j_0}}\right) = x_{i,k} - x_{i, j_0}.$$ Given the invariance to constant offsets, we can set $x_{i, j_0} = 0$ and use the log equation to find $x_{i,k}$ for $k \neq j_0$.</p><p>Well, almost. I dealt one card off the bottom of the deck. There is nothing stopping the LP solution $z$ from containing zeros, which will automatically be the smallest elements since $z \ge 0$. That means the log equation involves dividing by zero, which has been known to cause black holes to erupt in awkward places. We can fix that with a slight fudge: in the LP model, change $z \ge 0$ to $z \ge \epsilon$ for some small positive $\epsilon$ and hope that the result is not far from optimal.</p><p>I tested this with an R notebook. In it, I generated values for $y$ uniformly over $[0, 1]$, fit $x$ using the approach described above, and also fit it using a genetic algorithm for comparison purposes. In my experiment (with dimensions $K=100$, $N=10$), the GA was able to match the LP solution if I gave it enough time. Interestingly, the GA solution was dense (all $x_{i,j} > 0$) while the LP solution was quite sparse (34 of 1,000 values of $x_{i,j}$ were nonzero). As shown in the notebook (which you can download <a href="http://rubin.msu.domains/blog/multilogit.nb.html" target="_blank">here</a>), the LP solution could be made dense by adding positive amounts $\lambda_i$ as described above, while maintaining the same objective value. I tried to make the GA solution sparse by subtracting $\lambda_i = \min_k x_{i,k}$ from the $i$-th row of $x$. It preserved nonnegativity of $x$ and maintained the same objective value, but reduce density only from 1 to 0.99.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-22008479333438793402020-09-30T19:13:00.000-04:002020-09-30T19:13:05.057-04:00A Greedy Heuristic Wins<p> A <a href="https://or.stackexchange.com/questions/4777/combinatorial-optimisation-allocation-problem/" target="_blank">problem</a> posted on OR Stack Exchange starts as follows: "I need to find two distinct values to allocate, and how to allocate them in a network of stores." There are $n$ stores (where, according to the poster, $n$ can be close to 1,000). The two values (let's call them $x_1$ and $x_2$) must be integer, with $x_1 \in \lbrace 1, \dots, k_1 \rbrace$ and $x_2 \in \lbrace k_1, \dots, k_2 \rbrace$ for given parameters $k_1 < k_2$. Additionally, there is an additional set of parameters $s_{i3}$ and a balance constraint saying $$0.95 g(k_1 e) \le g(x_1, x_2) \le 1.05 g(k_1 e)$$ where $$g(y) = \sum_{i=1} \frac{s_{i3}}{y_i}$$ for any allocation $y$ and $e = (1,\dots, 1).$</p><p>The cost function (to be minimized) has the form $$f(x_1, x_2) = a\sum_{i=1}^n \left[ s_{i1}\cdot \left( \frac{s_{i2}}{y_i} \right)^b \right]$$with $a$, $s_{i1}$, $s_{i2}$ and $b$ all parameters and $y_i \in \lbrace x_1, x_2 \rbrace$ is the allocation to store $i$. There are two things to note about $f$. First, the leading coefficient $a (> 0)$ can be ignored when looking for an optimum. Second, given choices $x_1$ and $x_2>x_1$, the cheaper choice at all stores will be $x_1$ if $b < 0$ and $x_2$ if $b > 0$.</p><p>It's possible that a nonlinear solver might handle this, but I jumped straight to metaheuristics and, in particular, my go-to choice among metaheuristics -- a genetic algorithm. Originally, genetic algorithms were intended for unconstrained problems, and were tricky to use with constrained problems. (You could bake a penalty for constraint violations into the fitness function, or just reject offspring that violated any constraints, but neither of those approaches was entirely satisfactory.) Then came a breakthrough, the random key genetic algorithm [1]. A random key GA uses a numeric vector $v$ (perhaps integer, perhaps byte, perhaps double precision) as the "chromosome". The user is required to supply a function that translates any such chromosome into a <i>feasible</i> solution to the original problem.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">I did some experiments in R, using the <a href="https://cran.r-project.org/web/packages/GA/index.html" target="_blank">GA</a> package to implement a random key genetic algorithm. The package requires all "genes" (think "variables") to be the same type, so I used a double-precision vector of dimension $n_2$ for chromosomes. The last two genes have domains $(1, k_1 + 1)$ and $(k_1, k_2 + 1)$; the rest have domain $(0, 1)$. Decoding a chromosome $v$ proceeds as follows. First, $x_1 = \left\lfloor v_{n+1}\right\rfloor $ and $x_2 = \left\lfloor v_{n+2}\right\rfloor $, where $\left\lfloor z \right\rfloor$ denotes the "floor" (greatest lower integer) of $z$. The remaining values $v_1, \dots, v_{n}$ are sorted into ascending order, and their sort order is applied to the stores. So, for instance, if $v_7$ is the smallest of those genes and $v_{36}$ is the largest, then store $7$ will be first in the sorted list of stores and store $36$ will be last. (The significance of this sorting will come out in a second.)</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Armed with this, my decoder initially assigns every store the cheaper choice between $x_1$ and $x_2$ and computes the value of $g()$. If $g()$ does not fall within the given limits, the decoder runs through the stores in their sorted order, switching the allocation to the more expensive choice and updating $g()$, until $g()$ meets the balance constraint. As soon as it does, we have the decoded solution. This cheats a little on the supposed guarantee of feasibility in a decoded solution, since there is a (small?) (nearly zero?) chance that the decoding process will fail with $g()$ jumping from below the lower bound to above the upper bound (or vice versa) after some swap. If it does, my code discards the solution. This did not seem to happen in my testing.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The GA seemed to work okay, but it occurred to me that I might be over-engineering the solution a bit. (This would not be the first time I did that.) So I also tried a simple greedy heuristic. Since $k_1$ and $k_2$ seem likely to be relatively small in the original poster's problem (whereas $n$ is not), my greedy heuristic loops through all valid combinations of $x_1$ and $x_2$. For each combination, it sets $v_1$ equal to the cheaper choice and $v_2$ equal to the more expensive choice, assigns the cheaper quantity $v_1$ to every store and computes $g()$. It also computes, for each store, the ratio \[ \frac{|\frac{s_{i3}}{v_{2}}-\frac{s_{i3}}{v_{1}}|}{s_{i1}\left(\left(\frac{s_{i2}}{v_{2}}\right)^{b}-\left(\frac{s_{i1}}{v_{1}}\right)^{b}\right)} \]in which the numerator is the absolute change in balance at store $i$ when switching from the cheaper allocation $v_1$ to the more expensive allocation $v_2$, and the denominator is the corresponding change in cost. The heuristic uses these ratios to select stores in descending "bang for the buck" order, switching each store to the more expensive allocation until the balance constraint is met.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Both the GA decoder and the greedy heuristic share the approach of initially allocating every store the cheaper choice and then switching stores to the more expensive choice until balance is attained. My R notebook generates a random problem instance with $n=1,000$ and then solves it twice, first with the GA and then with the greedy heuristic. The greedy heuristic stops when all combinations of $x_1$ and $x_2$ have been tried. Stopping criteria for the GA are more arbitrary. I limited it to at most 1,000 generations (with a population of size 100) or 20 consecutive generations with no improvement, whichever came first.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The results on a typical instance were as follows. The GA ran for 49 seconds and got a solution with cost 1065.945. The greedy heuristic needed only 0.176 seconds to get a solution with cost 1051.735. This pattern (greedy heuristic getting a better solution in much less time) repeated across a range of random number seeds and input parameters, including switching between positive and negative values of $b$.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">If you are interested, you can browse <a href="https://rubin.msu.domains/blog/stores.nb.html" target="_blank">my R notebook</a> (which includes both code and results).<br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">[1] Bean, J. C. (1994). Genetic Algorithms and Random Keys for Sequencing and Optimization. <i>ORSA Journal on Computing</i>, <b>6</b>, 154-160.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-71469884566776286292020-09-03T16:22:00.003-04:002020-09-27T17:49:01.846-04:00Installing Rcplex and cplexAPI<div><p>I've previously mentioned solving MIP models in R, using CPLEX. In one post [1], I used the <a href="https://cran.r-project.org/web/packages/ompr/index.html" target="_blank">OMPR</a> package, which provides a domain specific language for model construction. OMPR uses the <a href="https://cran.r-project.org/web/packages/ROI/index.html" target="_blank">ROI</a> package, and in particular the <a href="https://cran.r-project.org/web/packages/ROI.plugin.cplex/index.html">ROI.plugin.cplex</a> package, to communicate with CPLEX. That, in turn, uses the <a href="https://cran.r-project.org/web/packages/Rcplex/index.html" target="_blank">Rcplex</a> package. In another post [2], I used Rcplex directly. Meanwhile, there is still another package, <a href="https://cran.r-project.org/web/packages/cplexAPI/index.html" target="_blank">cplexAPI</a>, that provides a low-level API to CPLEX.</p><p>Both Rcplex and cplexAPI will install against CPLEX Studio 12.8 and earlier, but neither one installs with CPLEX Studio 12.9 or 12.10. Fortunately, IBM's Daniel Junglas was able to hack solutions for both of them. I'll spell out the steps I used to get Rcplex working with CPLEX 12.10. You can find the solutions for both in the responses to <a href="https://community.ibm.com/community/user/datascience/communities/community-home/digestviewer/viewthread?MessageKey=0969e4fc-3ba7-42ed-91aa-5fd73f571481&CommunityKey=ab7de0fd-6f43-47a9-8261-33578a231bb7&tab=digestviewer&reply-inline=0969e4fc-3ba7-42ed-91aa-5fd73f571481&SuccessMsg=Thank%20you%20for%20submitting%20your%20message.#bm3ddfcd6c-a949-41c9-95a5-4ca0b6e65f64" target="_blank">this question</a> on the IBM Decision Optimization community site. Version information for what follows is: Linux Mint 19.3; CPLEX Studio 12.10; R 3.6.3; and Rcplex 0.3-3. Hopefully essentially the same hack works with Windows.<br /></p><ol style="text-align: left;"> <li>Download Rcplex_0.3-3.tar.gz, put it someplace harmless (the Downloads folder in my case, but /tmp would be fine) and expand it, producing a folder named Rcplex.</li> <li>Go to the Rcplex folder and open the 'configure' file in a text editor (one you would use for plain text files).</li> <li>Line 1548 should read as follows:<br /><div align="center"><pre>CPLEX_LIBS="-L${CPLEXLIBDIR} `${AWK} 'BEGIN {FS = " = "} /^CLNFLAGS/ {print $2}' ${CPLEX_MAKEFILE}`"</pre></div>.<br /> Replace it with <br /><div align="center"><pre>CPLEX_LIBS="-L${CPLEXLIBDIR} `${AWK} 'BEGIN {FS = " = "} /^CLNFLAGS/ {print $2}' ${CPLEX_MAKEFILE} | sed -e 's,\$(CPLEXLIB),cplex,'`"</pre></div>.<br /> Save the modified file.<br /></li> <li>Open a terminal in the parent directory of the Rcplex folder and run the following command:<br /> <div align="center"><pre>R CMD INSTALL --configure-args="--with-cplex-dir=.../CPLEX_Studio1210/cplex/" ./Rcplex</pre></div>.<br /> Adjust the file path (particularly the ...) so that it points to the 'cplex' directory in your CPLEX Studio installation (the one that has subdirectories named "bin", "examples", "include" etc.).</li> <li>Assuming there were no error messages during installation, you should be good to go.<br /></li></ol></div><br /><div><p>[1] <a href="https://orinanobworld.blogspot.com/2016/11/mip-models-in-r-with-ompr.html" target="_blank">https://orinanobworld.blogspot.com/2016/11/mip-models-in-r-with-ompr.html</a></p><p>[2] <a href="https://orinanobworld.blogspot.com/2020/08/a-group-selection-problem.html" target="_blank">https://orinanobworld.blogspot.com/2020/08/a-group-selection-problem.html</a></p><p><b>Update</b>: Version 1.4.0 of cplexAPI, released on 2020-09-21, installs correctly against CPLEX 12.10 (and presumably 12.9), at least on my system (Linux Mint).<br /></p></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-75094762787711981952020-08-29T14:42:00.001-04:002020-08-29T14:42:51.178-04:00A Group Selection Problem<p>Someone posted an interesting <a href="https://stackoverflow.com/questions/63584707/r-binary-integer-optimization-with-groups" target="_blank">question</a> about nonlinear integer programming with grouped binary variables on Stack Overflow, and it drew multiple responses. The problem is simple to state. You have 52 binary variables $x_i$ partitioned into 13 groups of four each, with a requirement that exactly one variable in each group take the value 1. So the constraints are quite simple:</p><p>\begin{align*} x_{1}+\dots+x_{4} & =1\\ x_{5}+\dots+x_{8} & =1\\ \vdots\\ x_{49}+\dots+x_{52} & =1. \end{align*}</p><p>The objective function is a cubic function of the form</p><p>\[ \left(\alpha\sum_{i}a_{i}x_{i}\right)\times\left(\beta\sum_{j}b_{j}x_{j}+\beta_{0}\right)\times\left(\gamma\sum_{k}c_{k}x_{k}+\gamma_{0}\right) \] where $\alpha = 1166/2000$, $\beta = 1/2100$, $\beta_0 = 0.05$, $\gamma = 1/1500$ and $\gamma_0 = 1.5$. (In the original post, there is a minus sign in front of the function and the author minimizes; for various reasons I am omitting the minus sign and maximizing here.) Not only is the objective nonlinear, it is nonconvex if minimizing (nonconcave if maximizing). The author of the question was working in R.<br /></p><p>Fellow blogger Erwin Kalvelagen <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2020/08/minlp-vs-miqcp.html" target="_blank">solved the problem</a> with a variety of nonlinear optimizers, obtaining a solution with objective value -889.346. Alex Fleischer of IBM posted an answer with the same objective value, using a constraint programming model written in OPL and solved with CP Optimizer.</p><p>My initial thought was to linearize the objective function by introducing continuous variables $y_{ij} = x_i \cdot x_j$ and $z_{ijk} = x_i \cdot x_j \cdot x_k$ with domain [0,1]. Many of those variables can be eliminated, due in part to symmetry ($y_{ij} = y_{ji}$, $z_{ijk} = z_{ikj}=\dots=z_{kji}$ and in part due to the observation that $y_{ii}=z_{iii}=x_i$. Also useful is that for $i<j<k$ $z_{ijk}=x_i \cdot y_{jk}$. I have an <a href="http://rubin.msu.domains/blog/groupselect.nb.html" target="_blank">R notebook</a> that you can download, in which I build the model using standard linearizations for the product of two binarys, then try to solve it with CPLEX using the Rcplex package (and the Matrix package, which allows a sparse representation of the constraint matrix). The results were, shall we say, unspectacular. With a five minute time limit (much longer than what Erwin or Alex needed), CPLEX found an incumbent with value 886.8748 (not bad but not optimal) and a rather dismal optimality gap of 146.5% (due mainly to a loose and slow moving bound).</p><p>Out of curiosity, I took a second shot using a genetic algorithm and the GA package for R. I was geeked to see that the GA package includes both an island model (using parallel processing) and a permutation variant (which lets me use permutations of the indices 1 to 52 as chromosomes with no extra work on my part). The permutation approach allows me to treat a chromosome as a prioritization of the 52 binary variables, which I decode into a solution $x$ by scanning the $x_i$ in priority order and setting each to 1 if and only none of the other variables in its group of four has been set to 1. <a href="http://rubin.msu.domains/blog/groupselect2.nb.html" target="_blank">That R notebook</a> is also available for download.</p><p>As a metaheuristic, the GA does not offer a proof of optimality, and in fact may or may not find the optimal solution. With my inspired choice of random number seed (123), I matched Erwin's and Alex's solution (889.3463). The settings I used resulted in a run time of about 36 seconds on my PC, more than half of which was spent after the best solution had been found. It's still slower than what Erwin and Alex achieved, but it is a "pure R" solution, meaning it requires nothing besides open-source R packages.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-9330047517428646232020-08-23T14:53:00.000-04:002020-08-23T14:53:06.070-04:00Multiobjective Optimization in CPLEX<p>In my <a href="https://orinanobworld.blogspot.com/2020/08/multiobjective-optimization.html" target="_blank">previous post</a>, I discussed multiobjective optimization and ended with a simple example. I'll use this post to discuss some of the new (as of version 12.9) features in CPLEX related to multiobjective optimization, and then apply them to the example from the previous post. My Java code can be downloaded from my <a href="https://gitlab.msu.edu/orobworld/multiobj" target="_blank">GitLab repository</a>.</p><p>Currently (meaning as of CPLEX version 12.10), CPLEX supports multiple objectives in linear and integer programs. It allows mixtures of "blended" objective functions (weighted combinations of original criteria) and "lexicographic" hierarchical objectives. Basically, you set one or more hierarchy (priority) levels, and in each one you can have a single criterion or a weighted combination of criteria. So the "classical" preemptive priority approach would involve multiple priority levels with a single criterion in each, while the "classical" weighted combination approach would involve one priority level with a blended objective in it. Overall, you are either maximizing or minimizing, but you can use negative weights for criteria that should go the opposite direction of the rest. In the example here, which is a minimization problem, the third priority level gives maximum provider utilization a weight of +1 (because we want to minimize it) and minimum provider utilization a weight of -1 (because we want to maximize it).</p><p>There are some limitations to the use of multiple objectives. The ones I think are of most general interest are the following:</p><ul style="text-align: left;"><li>objectives and constraints <i>must be linear</i> (no quadratic terms); and</li><li>all generic callbacks and legacy <i>information</i> callbacks can be used, but other legacy callbacks, and in particular legacy <i>control</i> callbacks (branch callbacks, cut callbacks etc.) cannot be used. So if you need to use callbacks with a multiobjective problem, now would be a good time to learn the new generic callback system.</li></ul><p>Every criterion you specify has a priority level and, within that priority level, a weight. A feature that I appreciate, and which I will use in the code, is that you can also specify an absolute and/or a relative tolerance for each criterion. The tolerances tell CPLEX how much it can sacrifice in that criterion to improve lower priority criteria. The default tolerance is zero, meaning higher priority criteria must be optimized before lower priority criteria are even considered. A nonzero tolerance basically tells CPLEX that is allowed to sacrifice some amount (either an absolute amount or a percentage of the optimal value) in that criterion in order to improve lower priority criteria.</p><p>Defining the variables and building the constraints of a multiobjective model is no different from a typical single criterion model. Getting the solution after solving the model is also unchanged. The differences come mainly in how you specify the objectives and how you invoke the solver.</p><p>To build the objective function, you need to use one of the several overloads of <span style="font-family: courier;">IloCplex.staticLex()</span>. The all take as first argument a one dimensional array of expressions <span style="font-family: courier;">IloNumExpr[]</span>, and they all return an instance of the new interface <span style="font-family: courier;">IloCplexMultiCriterionExpr</span>. In addition to an array of objective expressions, one of the overloads lets you also specify arrays of weights, priorities and tolerances (absolute and relative). That's the version used in my sample code. </p><p>This brings me to a minor inconvenience relative to a conventional single objective problem. Ordinarily, I would use <span style="font-family: courier;">IloCplexModeler.addMinimize(expr)</span> or <span style="font-family: courier;">IloCplexModeler.addMaximize(expr)</span> to add an objective to a model, where <span style="font-family: courier;">expr</span> is an instance of <span style="font-family: courier;">IloNumExpr</span>. I naively thought to do the same here, using the output of <span style="font-family: courier;">staticLex()</span> as the expression, but that is not (currently) supported. There's no overload of <span style="font-family: courier;">addMinimize()</span> or <span style="font-family: courier;">addMaximize()</span> that accepts a multicriterion expression. So it's a three step process: use <span style="font-family: courier;">cplex.staticLex(...)</span> to create the objective and save it to a temporary variable (where <span style="font-family: courier;">cplex</span> is your <span style="font-family: courier;">IloCplex</span> instance); pass that variable to either <span style="font-family: courier;">cplex.minimize(...)</span> or <span style="font-family: courier;">cplex.maximize(...)</span> and save the resulting instance of <span style="font-family: courier;">IloObjective</span> in a temporary variable; and then invoke <span style="font-family: courier;">cplex.add(...)</span> on that variable.</p><p>When you are ready to solve the model, you invoke the <span style="font-family: courier;">solve()</span> method on it. You can continue to use the version of <span style="font-family: courier;">solve()</span> that takes no arguments (which is what my code does), or you can use a new version that takes as argument an array of type <span style="font-family: courier;">IloCplex.ParameterSet[]</span>. This allows you to specify different parameter settings for different priority levels.</p><p>Other methods you might be interested in are <span style="font-family: courier;">IloCplex.getMultiObjNSolves()</span> (which gets the number of subproblems solved) and <span style="font-family: courier;">IloCplex.getMultiObjInfo()</span> (which lets you look up a variety of things that I really have not explored yet).</p><p>The output from my code (log file), which is in the repository, is too lengthy to show here, but if you want you can use <a href="https://gitlab.msu.edu/orobworld/multiobj/-/blob/master/log.txt" target="_blank">this link</a> to open it in a new tab. Here's a synopsis. I first optimized each of the three objective functions separately. (Recall that maximum and minimum provider utilization are blended into one objective.) This gives what is sometimes referred to as the "Utopia point" or "<a href="https://en.wikipedia.org/wiki/Multi-objective_optimization" target="_blank">ideal point</a>". This is column "U" in the table below. Next, I solved the prioritized multiobjective problem. The results are in column "O" of the table. Finally, to demonstrate the ability to be flexible with priorities, I resolved the multiobjective problem using a relative tolerance of 0.1 (10%) for the top priority objective (average distance traveled) and 0.05 (5%) for the second priority objective (maximum distance traveled). Those results are in column "F".</p><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;margin:0px auto;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} </style><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;margin:0px auto;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} .tg .tg-dvpl{border-color:inherit;text-align:right;vertical-align:top} </style><table class="tg"><thead> <tr> <th class="tg-0pky"><br /></th> <th class="tg-c3ow">U</th> <th class="tg-c3ow">O</th> <th class="tg-c3ow">F</th> </tr></thead><tbody> <tr> <td class="tg-0pky">Avg. distance</td> <td class="tg-dvpl">14.489</td> <td class="tg-dvpl">14.489</td> <td class="tg-dvpl">15.888</td> </tr> <tr> <td class="tg-0pky">Max distance</td> <td class="tg-dvpl">58.605</td> <td class="tg-dvpl">58.605</td> <td class="tg-dvpl">60.000</td> </tr> <tr> <td class="tg-0pky">Utilization spread</td> <td class="tg-dvpl">0.030</td> <td class="tg-dvpl">0.267</td> <td class="tg-dvpl">0.030</td> </tr> <tr> <td class="tg-0pky">Max utilization</td> <td class="tg-dvpl">0.710</td> <td class="tg-dvpl">0.880</td> <td class="tg-dvpl">0.710</td> </tr> <tr> <td class="tg-0pky">Min utilization</td> <td class="tg-dvpl">0.680</td> <td class="tg-dvpl">0.613</td> <td class="tg-dvpl">0.680</td> </tr></tbody></table><p>There are a few things to note.</p><ol style="text-align: left;"><li>The solution to the multiobjective model ("O") achieved the ideal values for the first two objectives. One would expect to match the ideal value on the highest priority objective; matching on the second objective was luck. The third objective (utilization spread) was, not surprisingly, somewhat worse than the ideal value.</li><li>Absolute and relative tolerances appear to work the same way that absolute and relative gap tolerances do: if a solution is within <i>either</i> absolute or relative tolerance of the best possible value on a higher priority objective, it can be accepted. In the third run, I set relative tolerances but let the absolute tolerances stay at default values.</li><li>The relative tolerances I set in the last run would normally allow CPLEX to accept a solution with an average travel distance as large as $(1 + 0.1)*14.489 = 15.938$ and a maximum travel distance as large as $(1 + 0.05)*58.605 = 61.535$. There is a constraint limiting travel distance to at most 60, though, which supersedes the tolerance setting.</li><li>The "flexible" solution (column "F") exceeds the ideal average distance by about 9.7%, hits the cap of 60 on maximum travel distance, and actually achieves the ideal utilization spread. However, without knowing the ideal point you would not realize that last part. I put a fairly short time limit (30 seconds) on the run, and it ended with about a 21% gap due to a very slow-moving best bound.</li></ol><p>I'll close with one last observation. At the bottom of the log, after solving the "flexible" variant, you will see the following lines.</p><p><span style="font-family: courier;">Solver status = Unknown.<br />Objective 0: Status = 101, value = <b>14.489</b>, bound = 14.489.<br />Objective 1: Status = 101, value = <b>58.605</b>, bound = 58.605.<br />Objective 2: Status = 107, value = 0.030, bound = 0.023.<br />Final value of average distance traveled = <i>15.888</i>.<br />Final value of longest distance traveled = <i>60.000</i>.<br />Final value of maximum provider utilization = 0.710.<br />Final value of minimum provider utilization = 0.680.</span></p><p>The first four lines are printed by CPLEX, the last four by my code. Note the mismatch in objective values of the first two criteria (bold for CPLEX, italic for my results). CPLEX prints the best value it achieved for each objective <i>before moving on to lower priority objectives</i>. When you are using the default tolerances of zero (meaning priorities are absolute), the printed values will match what you get in the final solution. When you specify non-zero tolerances, though, CPLEX may "give back" some of the quality of the higher priority results to improve lower priority results, so you will need to recover the objective values yourself.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-59476605336400123862020-08-20T14:19:00.000-04:002020-08-20T17:45:47.320-04:00Multiobjective Optimization<p><b>Multiobjective optimization</b> (making "optimal" decisions involving multiple, frequently conflicting, criteria) is a big subject. I will only nibble at the fringes of it here. In the next post, I'll describe recent additions to CPLEX that facilitate solving some multiobjective problems.</p><p>Among the various approaches to multiobjective problems, two are probably the most common, weighting and prioritization. The first approach is to merge the various criteria into a single one, usually (almost always?) by taking a weighted sum of the criteria. The CPLEX documentation refers to this as a <b>blended</b> objective. For this to make sense, the units of the various criteria really should be commensurable (e.g., all monetary values), but I'm pretty sure having criteria that are not commensurable doesn't stop people from trying. The weights serve two roles. First, they bring the units into some semblance of parity (so if $f()$ is in dollars and $g()$ in millions of dollars, $g()$ gets a weight roughly on millionth the size of the weight of $f()$). Second, they convey relative importance of various criteria.<br /></p><p>The second approach is to prioritize the criteria. The solver initially optimizes the highest priority criterion, without regard to any others. Once an optimal value of the highest priority criterion is known, maintaining that value becomes a constraint, and the solver moves to the second highest priority criterion, and so on. The CPLEX documentation refers to this as a lexicographic objective, meaning that the objective function is vector-valued rather than scalar-valued, and optimization means achieving the lexicographically largest or smallest objective vector possible. A variant of this allows a little "slippage" in the value of each criterion, so that for example the solver can accept a solution that is 1% below optimal on the first criterion in return for optimizing the second criterion. A key limitation here is the solver will trade any amount of degradation in a lower priority criterion, no matter how much, for any improvement in a higher priority criterion, no matter how small.</p><p>Although they are not relevant to the recent CPLEX additions, I will mention two other approaches. One is a variant of the priority method, known as <a href="https://en.wikipedia.org/wiki/Goal_programming" target="_blank">goal programming</a> (GP). This was originally developed as an extension of linear programming, but the same general approach can be extended to problems with integer variables. The user sets target levels for each criterion, and then prioritizes them. If a goal is underachieved, work on meeting lower priority goals cannot sacrifice any amount of the higher priority criterion. On the other hand, if a goal is overachieved, any portion of the overachievement can be sacrificed in the quest to reach a lower priority goal. An interesting attribute of goal programming is that the same criterion can be used with more than one goal. Suppose that you are building a GP model allocating a budget to various conservation projects. Your highest priority goal might be to allocate at least 50% of the budget to projects in underserved communities (USCs, to save me typing, with apologies to the universities of South Carolina and Southern Califonia). Your second highest priority goal might be to allocate at least 30% of the budget to projects with matching funds from outside sources. Your third highest priority goal might be to allocate at least 75% of the budget to USCs. The other approach is to investigate the <a href="https://en.wikipedia.org/wiki/Pareto_efficiency" target="_blank">Pareto frontier</a>, the set of all solutions for which no other solution does as well in all criteria and better in at least one. In essence, you want to present the decision-maker with the entire Pareto frontier and say "here, pick one". In practice, computing the Pareto frontier can be very computationally expensive, and trying to make sense of it might cause the decision maker to melt down.</p><p>To close this post, I'll pose a small sample problem and formulate the model for it. Suppose that we have $N$ patients in a health care system and $M$ providers, and that each patient needs to be assigned to a single provider. Provider $j$ has a limit $c_j$ on the number of patients they can handle. (To keep the example simple, and at the expense of some realism, we treat all patients as identical with regard to their capacity consumption.) We are given a matrix $D\in \mathbb{R}^{N\times M}$ of distances from patients to providers, as well as a cap $D_{max}$ on the distance that a patient can be required to travel. There are four criteria to be considered:</p><ul style="text-align: left;"><li>the average distance patients will travel (minimize, highest priority);</li><li>the maximum distance any patient must travel (minimize, second highest priority);</li><li>the maximum utilization of any provider as a fraction of their capacity (minimize, tied for third highest priority); and</li><li>the minimum utilization of any provider as a fraction of their capacity (maximize, tied for third highest priority).</li></ul><p>So we have a mix of three things to minimize and one to maximize, with the last two criteria combining to somewhat level the workload across providers. </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin: 0px; text-indent: 0px;">Let $x_{ij}$ be 1 if patient $i$ is assigned to provider $j$ and 0 if not, let $w$ be the longest distance traveled by any patient, let $y_j$ be the fraction of provider $j$'s capacity that is utilized, and let $z_{lo}$ and $z_{hi}$ be the minimum and maximum capacity utilization rates, respectively (where 0 means the provider is unused and 1 means the provider is operating at capacity). The objective expression is $f\in\mathbb{R}^3$, whose lexicographic minimum we seek, where</p>\[ f=\left[\begin{array}{c} \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}d_{ij}x_{ij}\\ w\\ z_{hi}-z_{lo} \end{array}\right]. \]<p style="-qt-block-indent: 0; margin: 0px; text-indent: 0px;">The first and second components of $f$ are the average and maximum client travel distances. The third component is a weighted mix of maximum and minimum provider utilization, where the weights (+1, -1) are equal in magnitude to reflect the equal importance I am assigning to them and the negative coefficient for minimum utilization allows it to be maximized in what is otherwise a minimization problem.</p><p style="-qt-block-indent: 0; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin: 0px; text-indent: 0px;">The constraints of the model are easy to state:</p><p>\begin{align*} \sum_{j=1}^{M}x_{ij} & =1\quad\forall i\in\left\{ 1,\dots,N\right\} & (1)\\ d_{ij}x_{ij} & \le w\quad\forall i\in\left\{ 1,\dots,N\right\} ,\forall j\in\left\{ 1,\dots,M\right\} & (2)\\ \frac{1}{c_{j}}\sum_{i=1}^{N}x_{ij} & =y_{j}\quad\forall j\in\left\{ 1,\dots,M\right\} & (3)\\ y_{j} & \le z_{hi}\quad\forall j\in\left\{ 1,\dots,M\right\} & (4)\\ y_{j} & \ge z_{lo}\quad\forall j\in\left\{ 1,\dots,M\right\} & (5)\\ x & \in\left\{ 0,1\right\} ^{N\times M} & (6)\\ x_{ij} & =0\quad\forall i,j\ni d_{ij}>D_{max} & (7)\\ y & \in\left[0,1\right]^{M} & (8)\\ z_{hi},z_{lo} & \in\left[0,1\right] & (9)\\ w & \in\left[0,D_{max}\right] & (10) \end{align*} </p><ul style="text-align: left;"><li>Constraint (1) ensures that each patient is assigned to exactly one provider.</li><li>Constraint (2) defines $w$, the maximum distance traveled.</li><li>Constraint (3) defines the fraction $y_j$ of capacity used at each provider $j$.</li><li>Constraints (4) and (5) define $z_{lo}$ and $z_{hi}$.</li><li>Constraints (6), (8), (9) and (10) define variable domains. The upper bound of 1 for $y_j$ in (8) ensures that no provider is assigned more patients than their capacity allows.</li><li>Constraint (7) enforces the travel distance limit $D_{max}$ by preventing any assignments that would violate the limit (effectively removing those assignment variables from the model).</li></ul><p>In the next post, I will show how to solve the model using CPLEX (with, as usual, the Java API).</p><p> <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-8635473320738028192020-08-18T14:13:00.000-04:002020-08-18T14:13:26.969-04:00A Partitioning Problem<p> A <a href="https://math.stackexchange.com/questions/3792115/partition-n-items-into-k-equally-sized-partitions-while-retaining-groups-in/" target="_blank">recent question</a> on Mathematics Stack Exchange dealt with reducing the number of sets in a partition of a set of items. I'll repeat it here but with slightly different terminology from the original question. You start with $N$ items partitioned into $M$ disjoint sets. Your goal is to generate a smaller partition of $K < M$ sets (which I will henceforth call "collections" to distinguish them from the original sets). It is required that all items from any original set end up in the same collection (i.e., you cannot split the original sets). The criterion for success is that "the new [collection] sizes should be as close to even as possible".</p><p>This is easily done with an integer programming model. The author of the question thought about minimizing the variance in the collection sizes, which would work, but I'm fond of keeping things linear, so I will minimize the range of collection sizes. I'll denote the cardinality of original set $i$ by $n_i$. Let $x_{ij}$ be a binary variable which is 1 if set $i\in \lbrace 1,\dots, M\rbrace$ is assigned to collection $j\in \lbrace 1,\dots,K\rbrace$ and 0 if not. Let $y$ and $z$ denote the sizes of the smallest and largest collections. Finally, for $j\in \lbrace 1,\dots,K\rbrace$ let $s_j$ be the size (cardinality) of collection $j$. A MILP model for the problem is the following:</p><p>\begin{align} \min\,z-y\\ \textrm{s.t. }\sum_{j=1}^{K}x_{ij} & =1\;\; \forall i\in\left\{ 1,\dots M\right\} \\ \sum_{i=1}^{M}n_{i}x_{ij} & =s_{j}\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ s_{j} & \le z\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ s_{j} & \ge y\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ y,z,s_{\cdot} & \ge0\\ x_{\cdot\cdot} & \in\left\{ 0,1\right\} \end{align} </p><p>The author of the question also indicated an interest in "fast greedy approximate solutions" (and did not specify problem dimensions). The first greedy heuristic that came to my mind was a simple one. Start with $K$ empty collections and sort the original sets into descending size order. Now assign each set, in turn, to the collection that currently has the smallest size (breaking times whimsically). Why work from largest to smallest set? There will be times when you will want to offset a large set in one collection with two or more smaller sets in another collection, and that will be easier to do if you start big and keep the smaller sets in reserve as long as is possible. <a href="https://math.stackexchange.com/users/683666/robpratt" target="_blank">Rob Pratt</a>, owner of a rather massive reputation score on MSE, correctly noted that this is equivalent to the "longest processing time" heuristic for assigning jobs to machines so as to minimize makespan.</p><p>I put together an <a href="https://rubin.msu.domains/blog/partition.nb.html" target="_blank">R notebook</a> to test this "greedy" heuristic against the optimization model (solved with CPLEX). The notebook uses Dirk Schumacher's <a href="https://cran.r-project.org/web/packages/ompr/index.html" target="_blank">OMPR</a> package for building the MILP model. It in turn uses the <a href="https://cran.r-project.org/web/packages/ROI/index.html" target="_blank">ROI</a> package (which requires the <a href="https://cran.r-project.org/web/packages/Rcplex/index.html" target="_blank">Rcplex</a> package) in order to communicate with CPLEX. On a test run using nice, round values of $N$, $M$ and $K$ that all ended in zeros (and in particular where $K$ divided evenly into both $M$ and $N$), the greedy heuristic nearly found the optimal solution. When I switched to less round numbers ($N=5723$, $M=137$, $K=10$), though, the heuristic did not fare as well. It was fast (well under one second on my PC) but it produced a solution where collection sizes ranged from 552 to 582 (a spread of 30), while CPLEX (in about 21 seconds) found an optimal solution where all collections had size either 572 or 573 (spread of 1). So I tacked on a second heuristic to refine the solution of the first heuristic. The second heuristic attempts pairwise swaps of the smallest set from the smallest collection with a larger set from a larger collection (trying collections in descending size order). Swaps are constrained not to leave the second collection (the one donating the larger set) smaller than the first collection started out. The intuition is to shrink the range by making the smallest collection bigger while shrinking the largest collection if possible and, if not, at least some collection that is larger than the smallest one. The new heuristic also ran in well under one second and shrank the range of collection sizes from 30 to 3 -- still not optimal, but likely good enough for the application the original questioner had in mind.</p><p>You are free to use the R code (which can be extracted from the notebook linked above) under the Creative Commons license that governs the blog.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-2267353170963263182020-08-15T15:11:00.003-04:002020-08-15T15:11:54.956-04:00Firefox and the New Blogger Interface<p>Blogger has a (relatively) new interface, to which I switched a while back. The one major annoyance I found was that clicking the "Preview" button while editing a post did not actually generate a preview. I got a notification (lower left) that the preview was being prepared, and then ... nothing. To get a preview, I had to save my work, exit the edit screen (going back to the Blogger control panel), and do the preview there.</p><p>It wasn't just me, either. Checking the Blogger help community, I found a ton of posts about this, on pretty much all operating systems and browsers, with some dated this month. A tip about fixing the problem on Safari worked for me. The key (somewhat obvious in hindsight) is that Blogger needs permission to open a pop-up. This was not <i>entirely</i> obvious to me, since I don't consider opening a tab the same as opening a pop-up, but so be it. In Firefox, with any Blogger screen displayed, click the padlock icon in the URL bar, and under "Permissions" allow the site to open pop-ups.</p><p>Other users said they had the same problem with Chrome, which is interesting in that preview works fine for me on Chrome, and I don't recall giving explicit permission there. At any rate, I seem to be back in business.</p><p>And yes, I previewed this entry before posting it.</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-39034709043167372022020-07-20T14:49:00.001-04:002020-07-20T14:49:25.722-04:00Longest Increasing Subsequence<div>In a <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2020/07/longest-increasing-subsequence.html" target="_blank">recent blog post</a> (whose title I have shamelessly appropriated), Erwin Kalvelagen discusses a mixed-integer nonlinear programming formulation (along with possible linearizations) for a <a href="https://leetcode.com/problems/longest-increasing-subsequence/" target="_blank">simple problem</a> from a coding challenge: "Given an unsorted array of integers, find the length of longest increasing subsequence." The challenge stipulates at worst $O(n^2)$ complexity, where $n$ is the length of the original sequence. Erwin suggests the intent of the original question was to use dynamic programming, which makes sense and meets the complexity requirement.</div><div><br /></div><div>I've been meaning for a while to start fiddling around with binary decision diagrams (BDDs), and this seemed like a good test problem. Decision diagrams originated in computer science, where the application was evaluation of possibly complicated logical expressions, but recently they have made their way into the discrete optimization arena. If you are looking to familiarize yourself with decision diagrams, I can recommend a book by Bergman et al. [1].</div><div><br /></div><div>Solving this problem with a binary decision diagram is equivalent to solving it with dynamic programming. Let $[x_1, \dots, x_n]$ be the original sequence. Consistent with Erwin, I'll assume that the $x_i$ are nonnegative and that the subsequence extracted must be strictly increasing.</div><div><br /></div><div>We create a layered digraph in which each node represents the value of the largest (and hence most recent) element in a partial subsequence, and has at most two children. Within a layer, no two nodes have the same state, but nodes in different layers can have the same state. We have $n+2$ layers, where in layer $j\in\lbrace 1,\dots,n \rbrace$ you are deciding whether or not to include $x_j$ in your subsequence. One child, if it exists, represents the state after adding $x_j$ to the subsequence. This child exists only if $x_j$ is greater than the state of the parent node (because the subsequence must be strictly increasing). The other child, which always exists, represents the state when $x_j$ is omitted (which will be the same as the state of the parent node). Layer 1 contains a root node (with state 0), layer $n+1$ contains nodes corresponding to completed subsequences, and layer $n+2$ contains a terminal node (whose state will be the largest element of the chosen subsequence). Actually, you could skip layer $n+1$ and follow layer $n$ with the terminal layer; in my code, I included the extra layer mainly for demonstration purposes (and debugging).<br /></div><div><br /></div><div>In the previous paragraph, I dealt a card off the bottom of the deck. The state of a node in layer $j$ is the largest element of a partial subsequence based on including or excluding $x_1,\dots,x_{j-1}$. The sneaky part is that more than one subsequence may be represented at that node (since more than one subsequence of $[x_1,\dots,x_{j-1}]$ my contain the same largest element). In addition to the state of a node, we also keep track at each node of the longest path from the root node to that node and the predecessor node along the longest path, where length is defined as the number of yes decisions from the root to that node. So although multiple subsequences may lead to the same node, we only care about one (the longest path, breaking ties arbitrarily). Note that by keeping track of the longest path from root to each node as we build the diagram, we actually solve the underlying problem during the construction of the BDD.<br /></div><div><br /></div><div>The diagram for the original example ($n=8$) is too big to fit here, so I'll illustrate this using a smaller initial vector: $x=[9, 2, 5, 3]$. The BDD is shown below (as a PDF file, so that you can zoom in or out while maintaining legibility).</div><iframe height="480" src="https://drive.google.com/file/d/1bwsRLSojqGj4Iy27_ARM2HYPwey0mnYI/preview" width="640"></iframe><div><br /></div><div>The first four layers correspond to decisions on whether to use a sequence entry or not. (The corresponding entries are shown in the right margin.) Nodes "r" and "t" are root and terminus, respectively. The remaining nodes are numbered from 1 to 14. Solid arrows represent decisions to use a value, so for instance the solid arrow from node 4 to node 8 means that 5 ($x_3$) has been added to the subsequence. Dashed arrows represent decisions not to use a value, so the dashed arrow from node 4 to node 7 means that 5 ($x_3$) is not being added to the subsequence. Dotted arrows (from the fifth layer to the sixth) do not represent decisions, they just connect the "leaf" nodes to the terminus.</div><div><br /></div><div>The green(ish) number to the lower left of a node is the state of the node, which is the largest element included so far in the subsequence. The subsequence at node 4 is just $[2]$ and the state is 2. At node 7, since we skipped the next element, the subsequence and state remain the same. At node 8, the subsequence is now $[2, 5]$ and the state changes to 5.</div><div><br /></div><div>The red numbers $d_i:p_i$ to the lower right of a node $i$ are the distance (number of solid arcs) from the root to node $i$ along the longest path ($d_i$) and the predecessor of node $i$ on the longest path ($p_i$). Two paths converge at $i=13$: a path $r \dashrightarrow 2 \rightarrow 4 \dashrightarrow 7 \rightarrow 13$ of length 2 and a path $r \dashrightarrow 2 \dashrightarrow 5 \dashrightarrow 9 \rightarrow 13$ of length 1. So the longest path to node 13 has length 2 and predecessor node 7. Backtracking from the terminus (distance 2, predecessor either 12 or 13), we get optimal paths $r \dashrightarrow 2 \rightarrow 4 \rightarrow 8 \dashrightarrow 12 \dashrightarrow t$ (subsequence $[2, 5]$) and $r \dashrightarrow 2 \rightarrow 4 \dashrightarrow 7 \rightarrow 13 \dashrightarrow t$ (subsequence $[2, 3]$), the latter shown in blue.<br /></div><div><br /></div><div>In addition to the original example from the coding challenge ($n=8$), Erwin included an example with $n=100$ and longest increasing subsequence length 15. (There are multiple optimal solutions to both the original example and the larger one.) Gurobi solved the larger example to proven optimality in one second (probably less, since the output likely rounded up the time). My highly non-optimized Java code solved the $n=100$ example in 6 ms. on my PC (not including the time to print the results).</div><div><br /></div><div>BDDs can get large in practice, with layers growing combinatorially. In this case, however, that is not a problem. Since the state of a node is the largest value of a subsequence, there can be at most $n$ different states. Given the stipulation that no two nodes in a layer have the same state, that means at most $n$ states in a layer. For Erwin's example with $n=100$, the largest layer in fact contained 66 nodes.</div><div><br /></div><div>As I said earlier, using the BDD here is equivalent to using dynamic programming. With $n+2$ layers, at most $n$ nodes in a layer, and two operations on each node (figuring out the state and path length of the "yes" child and the "no" child), the solution process is clearly $O(n^2)$.<br /></div><div><br /></div><div>[1] D. Bergman, A. A. Cire, W.-J. van Hoeve and J. Hooker. <i>Decision Diagrams for Optimization</i> (B. O’Sullivan and M. Wooldridge, eds.). Springer International Publishing AG, 2016.<br /><br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-27582986625077667652020-07-12T18:39:00.001-04:002020-07-12T18:42:08.001-04:00Mint 20 Upgrade Hiccup<div>Okay, "hiccup" might be an understatement. Something beginning with "cluster" might be more appropriate.</div><div><br /></div><div>I tried to upgrade my MythTV backend box from Linux Mint 19.3 to Mint 20, using the <a href="https://linuxmint-user-guide.readthedocs.io/en/latest/upgrade-to-mint-20.html" target="_blank">Mint upgrade tool</a>. Even on a fairly fast machine with a fast Internet connection and not much installed on it (MythTV plus the applications that come with Mint), this takes hours. A seemingly endless series of commands scroll in a terminal, and I don't dare walk away for too long, less the process stop waiting for some input from me (it periodically needs my password) or due to a glitch.</div><div><br /></div><div>Speaking of glitches, I noticed that the scrolling stopped and the process seemed to freeze just after a couple of lines about installing symlinks for <a href="https://www.mysql.com/" target="_blank">MySQL</a> and <a href="https://mariadb.org/" target="_blank">MariaDB</a>, two database programs. MariaDB, which I've never had installed before, is apparently a fork of MySQL. MythTV uses MySQL as its database manager. Before upgrading, I had killed the MythTV back end, but I noticed that the MySQL server was still running. On a hunch, I opened a separate terminal and shut down the MySQL server. Sure enough, the upgrade process resumed, with a message about a cancelled job or something, which I think referred to MariaDB. Whether this contributed to the unfolding disaster I do not know.<br /></div><div><br /></div><div>After a reboot, the good news was that everything that should start did start, and the frontend was able to see and play the recorded TV shows. The bad news was that (a) the backend got very busy doing a lot of (alleged) transcoding and scanning for commercials that should not have been necessary (having already been done on all recorded shows) and (b) I could not shut down, because the backend thought it was in a "shutdown/wakeup period", meaning (I think) that it thought it needed to start recording soon -- even though the next scheduled recording was not for a couple of days, and the front end was showing the correct date and time for the next recording. So I think the switch from MySQL to MariaDB somehow screwed up something in the database.<br /></div><div><br /></div><div>From there, things got worse. I had backed up the database, so I tried to restore the backup (using a MythTV script for just that purpose). The script failed because the database already contained data. Following suggestions online, I dropped the relevant table from the database and tried to run an SQL script (mc.sql) to restore a blank version of the table. No joy -- I needed the root MySQL password, and no password I tried would work. There is allegedly a way to reset the root password in a MySQL database, but that didn't work either, and in fact trying to shut the server down using "sudo service mysql stop" did not work (!). The only way to get rid of the service was to use "sudo pkill mysqld".</div><div><br /></div><div>Fortunately, timeshift was able to restore the system to its pre-upgrade state (with a little help from a backup of the MythTV database and recordings folder). For reasons I do not understand (which describes pretty much everything discussed here), restoring the database backup did not cause MythTV to remember this week's schedule of recordings, but as soon as I reentered one (using MythWeb) it remembered the rest. And I thought my memory was acting a bit quirky ...<br /></div><div><br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-65445618454372455332020-06-08T18:30:00.001-04:002020-06-08T18:30:35.362-04:00Zoom on Linux<div>Thanks to the pandemic, I'm been spending a <i>lot</i> of time on Zoom lately, and I'm grateful to have it. The Zoom Linux client seems to be almost as good as the other clients. The only thing that I know is missing is virtual backgrounds, which I do not particularly miss.</div><div><br /></div><div>That said, I did run into one minor bug (I think). It has to do with what I <i>think</i> is called the "panel". (I've found it strangely hard to confirm this, despite a good bit of searching.) What I'm referring to is a widget that sits off to one side (and can be moved by me) when Zoom is running full screen and a presenter is holding the "spotlight" (owning the bulk of the window). The panel has four buttons at the top that let me choose its configuration. Three of them will show just my video (small, medium or large). The fourth one will show a stack of four videos, each a participant (excluding the presenter), with mine first and the other three selected by some rule I cannot fathom. (Empirical evidence suggests it is not selecting the three best looking participants.) Showing my camera image isn't exactly critical, but it's somewhat reassuring (meaning I know my camera is still working, and I'm in its field of view).</div><div><br /></div><div>I'm running Zoom on both a desktop and a laptop, the latter exclusively for online taekwondo classes. On my desktop, the panel behaves as one would expect. On my laptop, however, the panel window assigned to my camera was intermittently blanking out. Randomly moving the cursor around would bring the image back (temporarily). This happened regardless of what panel configuration or size I chose.</div><div><br /></div><div>On a hunch, I disabled the screen lock option on the laptop (which would normally blank the screen or show a lock screen if the laptop sat idle for too long. To be the clear, even with no keyboard/mouse input from me, the laptop was <i>not</i> showing the lock screen or sleeping -- the main presenter was never interrupted. It was just my camera feed that seemed to be napping. That said, disabling the lock screen seems to have helped somewhat. If the panel is showing only my camera, it still blanks after some amount of "idle" time; but if the panel is set to show a stack of four cameras (including mine), mine does not seem to blank out any more.</div><div><br /></div><div>It's still a mystery to me why mine blanks when it's the only one in the panel, although it's clear there's a connection to my not providing any keyboard or mouse input for a while. The blanking never happens on my desktop. They're both running Linux Mint (the laptop having a somewhat newer version), and they're both running the latest version of the Zoom client. The laptop has a built-in camera whereas the desktop has a USB webcam. The desktop, unsurprisingly, has a faster processor, and probably better graphics. My typical desktop Zoom usage does not involve extended periods of inactivity on my part (if I'm not doing something useful as part of the call, I'm surreptitiously checking email or playing Minesweeper), so the lack of blanking on the laptop may just be lack of opportunity. It might be a matter of the desktop having better hardware. It might just be some minor computer deity figuring it's more entertaining to annoy me during a workout than during a meeting. Anyway, turning off the screensaver got rid of at least part of the problem. If anyone knows the real reason and/or the right fix, please leave a comment.<br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-72735368928430832302020-06-01T15:34:00.004-04:002020-06-05T12:15:59.697-04:00An Idea for an Agent-Based Simulation<div>I don't do agent-based simulations (or any other kind of simulations these days), so this is a suggested research topic for someone who does.</div><div><br /></div><div>A number of supermarkets and other large stores have instituted one-way lanes, presumably thinking this will improve physical distancing of customers. I just returned from my local Kroger supermarket, where the narrower aisles have been marked one-way, alternating directions, for a few weeks now. The wider aisles remain bidirectional (or multidirectional, the way some people roll). Despite having been fairly clearly marked for weeks, I would say that close to half of all shoppers (possibly more than half) are either unaware of the direction limits or disregard them. Kroger offers a service where you order online, their employees grab and pack the food (using rather large, multilevel rolling carts), and then bring it out to your waiting car. Kroger refers to this as "Pickup" (formerly "Clicklist"). Interestingly, somewhere between 70% and 90% of the employees doing "Pickup" shopping that I encountered today were going the wrong direction on the directional aisles.</div><div><br /></div><div>My perhaps naive thought is that unidirectional aisles are somewhere between useless and counterproductive, even if people obey the rules. That's based on two observations:</div><div><ol style="text-align: left;"><li>the number of people per hour needing stuff from aisle 13 is unaffected by any directional restrictions on the aisle; and</li><li>obeying the rules means running up extra miles on the cart, as the shopper zips down aisle 12 (which contains nothing he wants) in order to get to the other end, so that he can cruise aisle 13 in the designated direction.</li></ol><div>Of course, OR types could mitigate item 2 by solving TSPs on the (partially directional) supermarket network, charitably (and in my case incorrectly) assuming that they knew which aisle held each item on their shopping list (and, for that matter, charitably assuming that they had a shopping list). I doubt any of us do have supermarket TSPs lying around, and that's beyond the skill set of most other people. So we can assume that shoppers arrive with a list, (mostly) pick up all items from the same aisle in one pass through it, and generally visit aisles in a vaguely ordered way (with occasional doubling back).</div><div><br /></div><div>If I'm right, item 1 means that time spent stationary near other shoppers is not influenced by the one-way rules, and item 2 means that time spent passing shoppers increases (because shoppers have to log extra wasted miles just getting to the correct ends of aisles). So if any of you simulators out there would care to <strike>prove my point </strike>investigate this, knock yourselves out, and please let me know what you find.<br /></div></div><div><br /></div><div>Addendum: I heard an interview with Dr. Samuel Stanley, the current president of Michigan State University, regarding plans for reopening in Fall 2020. During the interview, he mentioned something about creating one-way pedestrian flows on campus. (Good luck with that -- herding undergrads makes herding cats look trivial.) The logic he expressed was that it would reduce face-to-face encounters among pedestrians. Dr. Stanley's academic field is infectious diseases, so presumably he knows whereof he speaks. On the other hand, my impression from various articles and interviews is that droplets emitted by COVID-infected people can linger in the air for a while. So there is a trade-off with one-way routing: an infected person passes fewer people face-to-face, but presumably spreads the virus over a greater area due to longer routes. Has anyone actually studied the trade-off?<br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-10919766709041830632020-05-31T16:31:00.000-04:002020-05-31T16:31:37.759-04:00A Simple Constrained Optimization<div>A question posted to OR Stack Exchange, "<a href="https://or.stackexchange.com/questions/4248/linear-optimization-problem-with-user-defined-cost-function" target="_blank">Linear optimization problem with user-defined cost function</a>", caught my eye. The question has gone through multiple edits, and the title is a bit misleading, in that the objective function is in at least some cases nonlinear. The constraints are both linear and very simple. The user is looking for weights to assign to $n$ vectors, and the weights $x_i$ satisfy $$\sum_{i=1}^n x_i = 1\\x \ge 0.$$ Emma, the original poster, put a <a href="https://github.com/emmaparker96/WorkingExampleForStackExchange/blob/master/Example_SLSQP.py" target="_blank">working example</a> (in Python) on GitHub. The simplified version of her cost function includes division of one linear expression by another, with an adjustment to deal with division by zero errors (converting the resulting NaN to 0).</div><div><br /></div><div>The feasible region of the problem is a simplex, which triggered a memory of the <a href="https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method" target="_blank">Nelder-Mead algorithm</a> (which was known as the "Nelder-Mead simplex algorithm" when I learned it, despite confusion with Dantzig's simplex algorithm for linear programs). The Nelder-Mead algorithm, published in 1965, attempts to optimize a nonlinear function (with no guarantee of convergence to the optimum in general), using only function evaluations (no derivatives). It is based on an earlier algorithm (by Spendley, Hext and Himsworth, in 1962), and I'm pretty sure there have been tweaks to Nelder-Mead over the subsequent years.<br /></div><div><br /></div><div> The Nelder-Mead algorithm is designed for unconstrained problems. That said, my somewhat fuzzy recollection was that Nelder-Mead starts with a simplex (hopefully containing an optimal solution) and progressively shrinks the uncertainty region, each time getting a simplex that is a subset of the previous simplex. So if we start with the unit simplex $\lbrace (1,0,0,\dots,0,0), (0,1,0,\dots,0,0),\dots,(0,0,0,\dots,0,1)\rbrace$, which is the full feasible region, every subsequent simplex should be comprised of feasible points. It turns out I was not quite right. Depending on the parameter values you use, there is one step (expansion) that can leave the current simplex and thus possibly violate the sign restrictions. That's easily fixed, though, by checking the step size and shrinking it if necessary.</div><div><br /></div><div>There are several R packages containing a Nelder-Mead function, but most of them look like they are designed for univariate optimization, and the one I could find that was multivariate and allowed specification of the initial simplex would not work for me. So I coded my own, based on the Wikipedia page, which was easy enough. I used what that page describes as typical values for the four step size parameters. It hit my convergent limit (too small a change in the simplex) after 29 iterations, producing a solution that appears to be not quite optimal but close.</div><div><br /></div><div>Just for comparison purposes, I thought I would try a genetic algorithm (GA). GAs are generally not designed for constrained problems, although there are exceptions. (Search "random key genetic algorithm" to find one.) That's easy to finesse in our case. Getting a GA to produce only nonnegative values is easy: you just have to require the code that generates new solutions (used to seed the initial population, and possibly for immigration) and the code that mutates existing solutions to use only nonnegative numbers. That might actually be the default in a lot of GA libraries. "Crossover" (their term for solutions having children) takes care of itself. So we just need to enforce the lone equation constraint, which we can do by redefining the objective function. We allow the GA to produce solutions without regard to the sum of their components, and instead optimize the function $$\hat{f}(x)=f\left(\frac{x}{\sum_{i=1}^n x_i}\right)$$where $f()$ is the original objective function.</div><div><br /></div><div>R has multiple GA packages. I used the `genalg` package in my experiments. Running was 100 generations with a population of size 200 took several seconds (so longer than what Nelder-Mead took), but it produced a somewhat better solution. Since the GA is a random algorithm, running it repeatedly will produce different results, some worse, possibly some better. You could also try restarting Nelder-Mead when the polytope gets too small, starting from a new polytope centered around the current optimum, which might possibly improve on the solution obtained.</div><div><br /></div><div>This was all mostly just to satisfy my curiosity. My R code for both the Nelder-Mead and GA approaches is in an <a href="http://rubin.msu.domains/blog/neldermead.nb.html" target="_blank">R notebook</a> you are free to download.<br /></div><div><br /></div><div><br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-31625472076078172722020-05-23T16:38:00.000-04:002020-05-23T16:38:09.934-04:00Of ICUs and SimulationsI'm a fan of the INFORMS "<a href="https://pubsonline.informs.org/magazine/orms-today/podcasts">Resoundingly Human</a>" podcasts, particularly since they changed the format to shorter (~15 minute) installments. I just listened to a longer entry (40+ minutes) about the use of OR (and specifically simulation models) to help with hospital planning during the pandemic. (Grrrr. I'd hoped to keep the word "pandemic" out of my blog. Oh well.) The title is "<a href="https://pubsonline.informs.org/do/10.1287/orms.2020.03.11p/full/">The dangers of overcrowding: Helping ICUs preserve essential bed space</a>", and the guest is Frances Sneddon, CTO of <a href="https://www.simul8.com/">Simul8 Corporation</a>. I thought the content was interesting, and Frances was very enthusiastic presenting it, so if you have any interest in simulation and/or how OR can help during the (here it comes again) pandemic, I do recommend giving it a listen.<br /><br />One thing that definitely got my attention was Frances's emphasis on building simulation models in a rapid / interactive / iterative / agile way. ("Rapid" was very much her word, and she used "agile" toward the end of the podcast. "Interactive" and "iterative" are my words for the process she described.) Basically (again with my paraphrasing), she said that the best outcomes occur when simulations are born from discussions among users and modelers where the users ask questions, followed by fairly rapid building and running of a new/revised model, followed by discussions with the users and more of the same. Frances at one point drew an analogy to detective work, where running one simulation lets you ferret out clues that lead to questions that lead to the next model.<br /><br />To some extent, I think the same likely holds true of other applications of OR in the real world, including optimization. Having one conversation with the end users, wandering off into a cave to build a model, and then presenting it as a <i>fait accompli</i> is probably not a good way to build trust in the model results and may well leave the user with a model that fundamentally does not get the job done. As a small example, I once worked on a model for assigning school-age children to recreational league athletic teams. The version of the model satisfied all <i>stated</i> constraints, but the user told me it would not work. Some parents have multiple children enrolled in the league, and it is unworkable to expect them to ferry their kids to different teams playing or practicing in different places. So siblings must go on the same team. (There were other constraints that emerged after the initial specification, but I won't bore you with the details.)<br /><br />So on the one hand, I'm predisposed to agree at least somewhat with what Frances said. Here comes the cognitive dissonance (as my erstwhile management colleagues would say). Once upon a time I actually taught simulation modeling. (I won't say exactly when, but in the podcast Frances mentions having been in the OR field for 20 years, and how saying that makes her feel old. The last time I taught simulation was before she entered the field.) Two significant issues, at least back then, were <a href="https://en.wikipedia.org/wiki/Verification_and_validation_of_computer_simulation_models">verifying and validating simulation models</a>. I suspect that verification (essentially checking the correctness of the code, given the model) is a lot easier now, particularly if you are using GUI-based model design tools, where the coding looks a lot like drawing a flow chart from a palette of blocks. The model likely was also presented as a flow chart, so comparing code to model should be straightforward (put the two flow charts side by side). Validation, the process of confirming that the <i>model</i> is correct, may or may not be easier than in the past. To some extent you can achieve "face validity" by talking through the assumptions of the model with the users during those interactive sessions, helped by a flow chart.<br /><br />Back in my day, we also talked about historical validation (running the model with historical inputs and seeing if the results reasonably tracked with historical outputs). When you are trying to answer "what if" questions (what if we reconfigure the ICU this way, or change admissions this way, or ...?), you likely don't have historical data for the alternate configurations, but you can at least validate that the model adequately captures the "base case", whatever that is. Also, "what if" questions are likely to lead you down paths for which you lack hard data for parameter estimates. What if we build a tent hospital in Central Park (which has never been done before)? What do we use for the rate at which patients experience allergy attacks (from plant life in the park that simply does not exist inside the hospital)? My guess is that your only recourse is to run the simulation for multiple values of the mystery parameter, which leads us to a geometric explosion of scenarios as we pile on uncertain parameters. So my question is this: in an interactive loop (meet with users - hack model - collect runs / synthesize output - repeat), can we take reasonable care to preserve validity without exhausting the parties involved, or overloading them with possibilities to the point that there is no actual take-away?<br /><br />Informed opinions are welcome in the comments section. (It's an election year here, so I'm already maxed out on uninformed opinions.)<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-45039053276966663872020-04-24T17:20:00.000-04:002020-04-24T17:23:17.990-04:00Generating Random DigraphsIn a recent post, OR consultant and blogger Erwin Kalvelagen discussed <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2020/04/random-sparse-arcs-in-gams.html" target="_blank">generating a random sparse network</a> in GAMS. More specifically, he starts with a fixed set of nodes and a desired number of arcs, and randomly generates approximately that number of arcs. His best results, in terms of execution time, came from exporting the dimensions to R, running a script there, writing out the arcs and importing them back into GAMS.<br /><br />There are three possible issues with the script. Erwin acknowledged the first, which applies to all his approaches: after removing duplicates, he wound up with fewer than the targeted number of arcs. In many applications this would not be a problems, since you would be looking for "about 1% density" rather than "exactly 1% density". Still, there might be times when you need a specific number of arcs, period. You could supplement Erwin's method with a loop that would test the number of arcs and, if short, would generate more arcs, remove duplicates, add the survivors to the network and repeat.<br /><br />The second possible issue is the occurrence of self-loops (arcs with head and tail the same, such as (5, 5)). Again, this may or may not be a problem in practice, depending on the application. I rarely encounter network applications where self-loops are expected, or even tolerated. Again, you could modify Erwin's code easily to remove self-loops, and it would not increase execution time much.<br /><br />The third possible issue is that some nodes may be "orphans" (no arcs in or out), and others may be accessible only one way (either inward degree 0 or outward degree 0). Once again, the application will dictate whether this is a matter of concern.<br /><br />I decided to test a somewhat different approach to generating the network (using R). It has the advantage of guaranteeing the targeted number of arcs, with no self-loops. (It does not address the issue of orphan nodes.) It has the disadvantage of being slower than Erwin's algorithm (but by what I would call a tolerable amount). My approach is based on assigning an integer index to every possible arc. Assume there are $N$ nodes, indexed $0, \dots, N-1$. (Erwin uses 1-based indexing, but it is trivial to adjust from 0-based to 1-based after the network has been generated.) There are $N^2$ arcs, including self-loops, indexed from $0,\dots,N^2-1$. The arc with index $k$ is given by $$f(k) = (k \div n, k \mod n),$$where $\div$ denotes the integer quotient (so that $7 \div 3 = 2$). As self-loop is an arc whose index $k$ satisfies $k\div n = k \mod n$; those are precisely the arcs with indices $k=m(n + 1)$ for $m=0,\dots,n-1$. So my version of the algorithm is to start with the index set $\lbrace 0,\dots,n^2-1\rbrace$, remove the indices $0, n+1, 2n+2,\dots, n^2-1$, take a random subset of the survivors and apply $f()$ to them.<br /><br />I have an <a href="https://rubin.msu.domains/blog/randomGraph.nb.html" target="_blank">R Notebook</a> that compares the two algorithms, using the same dimensions Erwin had: $n=5000$ nodes, with a target density of 1% (250,000 arcs). Timing is somewhat random, even though I set a fixed random number seed for each algorithm. The notebook includes both code and output. As expected, my code gets the targeted number of arcs, with no self-loops. Erwin's code, as expected, comes up a bit short on the number of arcs, and contains a few (easily removed) self-loops. Somewhat interestingly, in my test runs <i>every node had both inward and outward degree at least 1</i> for both algorithms. I think that is a combination of a fairly large arc count and a bit of luck (the required amount of luck decreasing as the network density increases). If orphans, or nodes with either no outbound or no inbound arcs, turn out to be problems, there is a fairly easy fix for both methods. First, randomly generate either one or two arcs incident on each node (depending on whether you need both inward and outward arcs everywhere). Then generate the necessary number of additional arcs by adjusting the sample size. As before, you might come up a few arcs short with Erwin's algorithm (unless you include a loop to add arcs until the target is reached). In my algorithm, you can easily calculate the indices of the initial set of arcs (the index of arc $(i,j)$ is $n\times i + j$) and then just remove those indices at the same time that you remove the indices of the self-loops, before generating the remaining arcs.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-25269678224676034052020-04-21T13:56:00.000-04:002020-04-21T13:56:10.565-04:00A CP Model for Toasting BreadA <a href="https://math.stackexchange.com/questions/3632331/3-toast-problem-from-thinking-mathematically-solution-check/" target="_blank">question</a> on Mathematics Stack Exchange deals with a problem (apparently from the book "<a href="https://www.google.com/books/edition/_/nCNVQgAACAAJ?hl=en" target="_blank">Thinking Mathematically</a>") about toasting bread on a grill. The grill can hold two slices at a time and can only toast one side of each at a time. You have three slices to toast, and the issue is to figure out how to do it in the minimum possible amount of time (what operations management people refer to as the <i>makespan</i>).<br /><br />The questioner had a solution that I was able to prove is optimal, using a constraint programming (CP) model that I coded using the Java API to IBM's CPOptimizer (part of the CPLEX Studio product). I won't swear my model is elegant or efficient, since I'm pretty new to CPO, but I think it is correct. If anyone getting started with CPO and the Java API wants to see the source code, it is available <a href="https://gitlab.msu.edu/orobworld/toast" target="_blank">in my repository</a>. I'll describe a few key aspects below.<br /><br />I assumed in my model that there is a single cook. The fundamental components of the model are CPO "interval variables" (instances of <span style="font-family: "Courier New", Courier, monospace;">IloIntervalVar</span>) for each task (inserting a slice, toasting one side, removing a slice, flipping a slice) along with a dummy task for being done and a placeholder task I called "reversing". Interval variables represent time spans during which tasks are done.<br /><br />In the problem, there are two ways to get from toasting the front of a slice to toasting the back: you can leave it on the grill and flip it; or you can remove it and (later) replace it with the other side up. Since I didn't know <i>a priori</i> which slices will be handled which way, I created interval variables for removing each slice after the first side, replacing each slice with the second side up, and flipping each slice. Those variables are declared optional, meaning each interval may or may not show up in the solution. For each slice, there is an interval variable for the "reversing" task that is <i>not</i> optional. Each slice has to be reversed, one way or the other. The tasks for replacing a slice (after removing it) and for flipping the slice are listed as alternatives for the reversing task for that slice, which means exactly one of flipping or replacing must be in the solution. Separate constraints ensure that a slice is reinserted if and only if it was removed after the first side toasted. Those constraints use the <span style="font-family: "Courier New", Courier, monospace;">IloCP.presenceOf</span> function to test whether the remove and reinsert intervals are present, and set the results equal (so both present or neither present).<br /><br />The sequencing of operations (insert, toast first side, reverse, toast second side, remove) is enforced through a bunch of constraints that use <span style="font-family: "Courier New", Courier, monospace;">IloCP.endBeforeStart</span> (which says the first argument has to end before the second argument starts). The dummy "done" task is sequenced to start only after each slice has been removed for the final time. I'm pretty sure the objective value (the time everything is done) could be handled other ways, but I tend to think of completion as a zero-length task.<br /><br />The cook can only do one thing at a time. This is handled using the <span style="font-family: "Courier New", Courier, monospace;">IloCP.noOverlap</span> function. It is passed a list of every interval that requires the cook's attention (everything but the actual toasting tasks), and prevents any of those intervals from overlapping with any other.<br /><br />Finally, I need to prevent more than two slices from occupying the grill at any one time. The <span style="font-family: "Courier New", Courier, monospace;">noOverlap</span> function is no help here. Instead, I use an instance of <span style="font-family: "Courier New", Courier, monospace;">IloCumFunctionExpr</span>, which represents a function over time that changes as intervals begin and end. In this case, the function measures occupancy. This is handled by treating the usage as a combination of step functions (<span style="font-family: "Courier New", Courier, monospace;">IloCP.stepAtStart</span> and <span style="font-family: "Courier New", Courier, monospace;">IloCP.stepAtEnd</span>). Usage steps up by one at the <i>start</i> of the task of inserting a slice and steps down by one at the <i>end</i> of the task of removing a slice. (Toasting and flipping have no effect on occupancy.) The Javadoc for the relevant functions a bit, um, sparse, essentially saying nothing about the step height argument. Thus I discovered the hard way (through error messages) that adding a step with height -1 when a slice was removed was not acceptable. Instead, I had to subtract a step of height +1.<br /><br />Although it was not really necessary on a problem this small, I removed some symmetry resulting from the arbitrary ordering of the slices by setting precedence constraints saying that slice 1 is started before slice 2, which in turn is started before slice 3.<br /><br />It is possible to model the problem as an integer program, and in fact I initially leaned that direction. The IP model, however, would be bulkier and less expressive (which would make it more prone to logic errors), and quite possibly would be slower to solve. CPOptimizer is designed specifically with scheduling problems in mind, so it is the better tool for this particular job.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-49802524686898858432020-04-17T13:07:00.000-04:002020-04-17T13:07:22.105-04:00Objective Constraints (Again)Long ago, I did a couple of posts [1, 2] about constraints designed to bound objective functions. We are referring here to constraints that explicitly bound the value of the objective function in an integer or mixed-integer linear program. The typical application is when the user is minimizing $f(x)$ subject to $x\in X$ and specifies a bound $f(x) \le d$. (If maximizing the inequality becomes $f(x)\ge d$.) The reason for doing so is to help the solver prune inferior nodes (nodes where $f(x) > d$ when minimizing) faster.<br /><br />One way to accomplish the goal is to set a feasible starting solution $x^0 \in X$ for which $f(x)\le d$. This of course requires you to know such a solution. Also, setting a starting solution, even a good one, will likely steer the solver in a different direction than what it would have taken without the starting solution (meaning it will build a different tree), and this can wind up either faster or slower than not having the start, depending on where you sit on Santa's naughty/nice list and assorted random factors. (Asserting the bound by any of the other methods listed below can also have unintended consequences. Pretty much anything you do with a MIP can have unintended consequences.)<br /><br />Assuming you have a bound in mind but not a starting solution, you have a few options. The main takeaways from those two posts were basically the following.<br /><ol><li>If your solver has the capability, your best bet is probably to specify the bound via a parameter. (CPLEX has the "upper cutoff" parameter for min problems and the "lower cutoff" parameter for max problems to do just this.)</li><li>Failing that, you can introduce a variable $z$ to represent your objective function, add a defining constraint $z = f(x)$, minimize $z$ and then specify $d$ as an upper bound for $z$. This may slow the solver some (for reasons explained in the prior posts) but is likely not as bad as the last option.</li><li>The last option, which is the most obvious (and thus one users gravitate to), is to add the constraint $f(x) \le d$ to the model. This can slow the solver down noticeably.</li></ol>The short version of why the last option is undesirable is that if the last constraint is not binding (which will happen if $d$ is not the optimal value and the solver has found an optimal or near optimal solution), it is just excess baggage. If it is binding, it can cause dual degeneracy.<br /><br />Someone recently asked about this, and I waved my hand and invoked "dual degeneracy", but I'm not sure how clear I was. So I thought I would augment the two previous posts with a small example. <br /><br />Suppose that we are solving a MIP model, and at some node we are staring at the following LP relaxation:$$\begin{alignat*}{5} \min & {}-{}5x_{1} & {}+{}40x_{2} & {}-{}5x_{3} & {}+{}5x_{4}\\ \textrm{s.t.} & \phantom{\{\}-}x_{1} & {}-{}\phantom{4}6x_{2} & & {}-{}3x_{4} & {}+{}s_{1} & & & =-3\\ & \phantom{\{\}-}x_{1} & {}-{}\phantom{4}2x_{2} & {}+\phantom{5}{}x_{3} & {}+{}\phantom{4}x_{4} & & {}+{}s_{2} & & =\phantom{-}0\\ & {}-{}5x_{1} & {}+{}40x_{2} & {}-{}5x_{3} & {}+{}5x_{4} & & & {}+{}s_{3} & =-6 &\quad (*)\end{alignat*}$$where the variables are nonnegative, the $s$ variables are slacks, and the constraint (*) is our way of imposing an upper bound of -6 on the objective function. In matrix terms the problem is\begin{align} \min\quad & \bar{c}'\bar{x}\\ \textrm{s.t.}\quad & \bar{A}\bar{x}=\bar{b}\\ & \bar{x}\ge0 \end{align} with $\bar{x}=(x_1,\dots,x_4,s_1,\dots,s_3)'$, $\bar{c}=(-5,40,-5,5,0,0,0)'$, $\bar{b}=(-3,0,-6)'$ and $$\bar{A}=\left[\begin{array}{rrrrrrr} 1 & -6 & 0 & -3 & 1 & 0 & 0\\ 1 & -2 & 1 & 1 & 0 & 1 & 0\\ -5 & 40 & -5 & 5 & 0 & 0 & 1 \end{array}\right].$$The initial basis would be the slack variables, giving us an infeasible solution $x=0$, $s=(-3,0,-6)$ with reduced costs $r = \bar{c}$. The negative values of $s_1$ and $s_3$ cause the infeasibility.<br /><br />MIP solvers commonly use the dual simplex method to eliminate infeasibility in a node LP. Dual simplex would pivot in the row $i$ with the most negative right-hand side value $\bar{b}_i$, and in the column $j$ for which the ratio $r_j/\bar{a}_{ij}$ is minimal among those where $\bar{a}_{ij}\lt 0$. Here $i=3$ and $j$ is either 1 or 3 (the ratio in both column 1 and column 3 being $-5/-5=1$). Suppose that the solver chooses column 1, making the new basis (in row order) $(s_1, s_2, x_1).$ After the pivot, the reduced cost vector becomes $\hat{r}=c(0,0,0,0,0,0,-1)$, the new right-hand side vector is $\hat{b}=(-4.2, -1.2, 1.2)'$, and the new constraint matrix is $$\hat{A} = \left[\begin{array}{rrrrrrr} 0 & 2 & -1 & -2 & 1 & 0 & 0.2\\ 0 & 6 & 0 & 2 & 0 & 1 & 0.2\\ 1 & -8 & 1 & -1 & 0 & 0 & -0.2 \end{array}\right].$$The solution is still infeasible, and dual simplex will look to pivot in row 1 (where $\hat{b}$ is most negative. There are two possible pivot columns, columns 3 and 4, but the ratio used to distinguish them is zero in both cases because the reduced cost vector is all zeros (except for $s_3$, the slack in the objective constraint).<br /><br />The same thing happens if we pivot in column 3 rather than column 1, and in fact it is possible to show that the reduced cost vector will be all zeros other than the slack in the constraint limit as long as the slack in the constraint limit is nonbasic. Since that slack variable will typically be nonbasic so long as the constraint is binding, and the constraint is useful only when binding, we can expect to see a lot of LPs where this occurs. The tie is survivable (we've already seen one tie for pivot column), but picture this occurring where there are many dual pivots required, with perhaps many eligible columns (negative coefficients) for each pivot, and they all have ratio 0. The solver will be flying somewhat blind when it picks pivot columns, which could plausibly slow things down.<br /><br /><h3>References</h3><br />[1] "<a href="https://orinanobworld.blogspot.com/2013/06/objective-functions-make-poor.html" target="_blank">Objective Functions Make Poor Constraints</a>"<br />[2] "<a href="https://orinanobworld.blogspot.com/2013/06/object-constraints-sequel.html" target="_blank">Objective Constraints: The Sequel</a>"Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-58142780273163209862020-04-04T18:03:00.000-04:002020-04-04T18:03:11.501-04:00Tangents v. Secants Part IIThis is a continuation of a recent post ("<a href="https://orinanobworld.blogspot.com/2020/03/approximating-nonlinear-functions.html" target="_blank">Approximating Nonlinear Functions: Tangents v. Secants</a>") on how to work a nonlinear function into a mixed-integer linear programming model. As before, I'm sticking to functions of one variable. To recap the take-aways from that post, there are basically four ways I know to approximate a nonlinear function:<br /><ol><li>use a piecewise-linear function based on secants;</li><li>use a piecewise-linear function based on tangents;</li><li>use a surrogate variable that is bounded (below if the function is convex, above if the function is concave) by a collection of linear functions derived from tangents; or </li><li>use the third technique plus a callback that adds additional linear tangent functions whenever a candidate solution underestimates (convex case) or overestimates (concave case) the true value of the function.</li></ol>The first two methods apply generally, meaning that the function need not be convex or concave, and the constraint it appears in need not be a particular type ($\ge$, $\le$, $=$). The third and fourth methods only work when the function is convex or concave. For convex functions, tangents will underestimate the true value and secants will overestimate it; for concave functions, the opposite is true. For specific cases (convex function in a $\le$ constraint, concave function in a $\ge$ constraint), one of secants or tangents will produce feasible but potentially suboptimal solutions while the other will produce superoptimal but potentially infeasible solutions.<br /><br />Before going on, I want to make two observations that I should have made in the first post. First, for the specific cases I just mentioned, if you solve the problem twice, once using secants and once using tangents, the true optimal objective value will fall between the values of the two solutions, so you will have an idea of how close to optimal the possibly suboptimal solution is. (I'll illustrate this below.) For nonlinear functions in equality constraints, or for functions that are neither convex nor concave, this would not work. Second, if the argument to the nonlinear function is integer-valued, it makes sense to construct a piecewise-linear function (first two options) with integer-valued break points or to construct tangents (third option) at integer-valued points. That way, you are guaranteed correct function values at least at some points in the domain of the function. This is easy to do with secants but considerably more work with tangents.<br /><br />I have one more observation to make before getting to an example. In the fourth method I listed, with a convex function in a $\le$ constraint or a concave function in a $\ge$ constraint, if the solver finishes the search with an "optimal" solution, the solution will really be optimal. These are the cases where we would normally risk superoptimality, but the callback will prevent that from happening.<br /><br />At this point, I'm going to present an example of one possible scenario. The problem is to select repeating order quantities for a collection of products. In the model to follow, capital letters will be parameters and lower case letters will be indices or variables. We start with $N$ products. For each product $i$, we know the annual demand ($D_i$), the unit price ($P_i$), the unit annual holding cost ($H_i$), the cost to place an order for the product ($S_i$, regardless of how much is being ordered), and the unit volume ($V_i$, the amount of storage space one unit occupies). In addition, we know the total storage capacity $C$ of the warehouse where the products will be stored. We will somewhat laughably assume that everything is deterministic.<br /><br />Let $q_i$ denote the quantity of product $i$ ordered each time an order is placed, and $f_i$ the frequency (number of orders per year) with which product $i$ is replenished. The total annual cost, to be minimized, is $$\sum_{i=1}^N \left[P_iD_i + H_i\frac{q_i}{2} + S_i f_i \right],\tag{1}$$where the first term is the total cost of purchasing products (which is constant), the second term is the total cost of storing them (based on the average inventory level, which is half the order size), and the last term is the total cost for placing orders.<br /><br />The nonlinear function in this problem is the one relating order quantity to order frequency:$$f_i = \frac{D_i}{q_i}\quad \forall i=1,\dots,N.\tag{2}$$For a single product, it would be easy to substitute out $f_i$ from the objective, leaving a function of just $q_i$, and then differentiate. The first order optimality condition leads to the well known <a href="https://en.wikipedia.org/wiki/Economic_order_quantity" target="_blank">economic order quantity</a> (EOQ) formula$$q_i^* = \sqrt{\frac{2D_iS_i}{H_i}}.$$The catch here is that ordering the EOQ for every item might exceed our storage space. So we resort to a mixed-integer program, minimizing (1) subject to $$\frac{1}{2}\sum_{i=1}^n V_i q_i \le C \tag{3}$$with $$q_i\in \lbrace 1,\dots,D_i\rbrace\quad \forall i\in\lbrace 1,\dots,N\rbrace$$ and $$f_i \in\left[ 1,D_i\right]\quad \forall i\in\lbrace 1,\dots,N\rbrace ,$$plus some constraint(s) to connect $f_i$ to $q_i$. It's worth pausing here to note that at most one of $f_i$ and $q_i$ needs to be discrete. Here I am assuming that order quantities must be integers (we are ordering something like appliances, where a third of a washing machine is not a meaningful concept) but that order frequencies need not be integers (2.5 orders per year just means two orders one year, three the next, and so on).<br /><br />What is left is to pick one of the methods for approximating the relationship between quantity and frequency, equation (2). For methods 1 and 2, we can add to (1) and (3) the constraint $$f_i = \ell_i(q_i)\quad\forall i\in\lbrace 1,\dots,N\rbrace,\tag{4}$$where $\ell_i()$ is a piecewise linear function derived from either tangents or secants to the reciprocal function $g(x)=1/x$. For method 3, we can instead compute $M_i$ tangent functions $\ell_i()$ for each $i$ and add the constraint $$f_i \ge \ell_j(q_i) \quad\forall i\in\lbrace 1,\dots,i\rbrace,\,\forall j \in\lbrace 1,\dots,M_i\rbrace.\tag{4'}$$Note that this may underestimate $f_i$ (and thus the cost of a solution) but will not affect feasibility (since the space constraint (3) does not contain $f_i$). Finally, for the fourth method, we can minimize (1) subject to (3) and (4') and also use a callback to add more constraints like (4') on the fly.<br /><br />I tried all four methods, using CPLEX 12.10, on a small test problem ($N=10$ products). I used a constant holding cost rate ($H_i = 0.2$) for all products, and set the space limit to $C=2136.41$. (Yes, it looks goofy, but I used random numbers to generate the problem.) The product level data was as follows:<br /><br /><table align="center" class="tableizer-table"><thead><tr align="center" class="tableizer-firstrow"><th>Product</th><th></th><th style="padding: 0 0 0 30px;">Unit Cost</th><th style="padding: 0 0 0 30px;">Order Cost</th><th style="padding: 0 0 0 30px;">Demand</th><th style="padding: 0 0 0 30px;">Size</th></tr></thead><tbody><tr align="right"><td>0</td><td></td><td>4.04</td><td>2.54</td><td>1706</td><td>1.57</td></tr><tr align="right"><td>1</td><td></td><td>2.37</td><td>2.55</td><td>1203</td><td>4.68</td></tr><tr align="right"><td>2</td><td></td><td>3.82</td><td>2.01</td><td>1206</td><td>1.56</td></tr><tr align="right"><td>3</td><td></td><td>2.92</td><td>2.52</td><td>1373</td><td>2.00</td></tr><tr align="right"><td>4</td><td></td><td>1.37</td><td>3.38</td><td>1029</td><td>4.79</td></tr><tr align="right"><td>5</td><td></td><td>2.56</td><td>2.52</td><td>1700</td><td>2.91</td></tr><tr align="right"><td>6</td><td></td><td>4.55</td><td>3.12</td><td>1314</td><td>4.28</td></tr><tr align="right"><td>7</td><td></td><td>1.07</td><td>3.18</td><td>1223</td><td>4.06</td></tr><tr align="right"><td>8</td><td></td><td>3.64</td><td>3.74</td><td>1916</td><td>3.32</td></tr><tr align="right"><td>9</td><td></td><td>1.97</td><td>2.13</td><td>630</td><td>2.52</td></tr></tbody></table><br />For method 1, I used 10 evenly spaced breakpoints (so nine chords). For the other methods, I computed tangents at 10 evenly spaced points. Of course, more breakpoints or tangents would make for a more accurate approximation. The objective value for each of the four methods is as follows:<br /><br /><table align="center" class="tableizer-table"><thead><tr align="center" class="tableizer-firstrow"><th>Method</th><th style="padding: 0 0 0 30px;">Nominal</th><th style="padding: 0 0 0 30px;">Actual</th></tr></thead><tbody><tr align="right"><td>1</td><td>672.48</td><td>670.05</td></tr><tr align="right"><td>2</td><td>490.27</td><td>28946.48</td></tr><tr align="right"><td>3</td><td>490.27</td><td>28946.48</td></tr><tr align="right"><td>4</td><td>633.40</td><td>633.53</td></tr></tbody></table><br />Here "nominal" is the objective value reported by CPLEX and "actual" is the actual cost, using the order quantities from the CPLEX solution but recalculating the order frequencies according to (2). Method 1, based on secants, overestimates frequency (and thus cost) slightly. Methods 2 and 3 massively underestimate some frequencies, and thus the overall cost. The reason is apparent from the next table, which shows the nominal and actual order frequencies for each product:<br /><br /><table align="center" class="tableizer-table"><thead><tr align="center" class="tableizer-firstrow"><th>Product</th><th style="padding: 0 0 0 30px;">Demand</th><th style="padding: 0 0 0 30px;">Quantity</th><th style="padding: 0 0 0 30px;">Actual Freq</th><th style="padding: 0 0 0 30px;">Nominal Freq</th></tr></thead><tbody><tr align="right"><td>0</td><td>1706</td><td>1</td><td>1706.00</td><td>19.89</td></tr><tr align="right"><td>1</td><td>1203</td><td>1</td><td>1203.00</td><td>19.80</td></tr><tr align="right"><td>2</td><td>1206</td><td>1</td><td>1206.00</td><td>19.85</td></tr><tr align="right"><td>3</td><td>1373</td><td>1</td><td>1373.00</td><td>19.83</td></tr><tr align="right"><td>4</td><td>1029</td><td>137</td><td>7.51</td><td>6.69</td></tr><tr align="right"><td>5</td><td>1700</td><td>1</td><td>1700.00</td><td>19.82</td></tr><tr align="right"><td>6</td><td>1314</td><td>1</td><td>1314.00</td><td>19.83</td></tr><tr align="right"><td>7</td><td>1223</td><td>164</td><td>7.46</td><td>6.64</td></tr><tr align="right"><td>8</td><td>1916</td><td>1</td><td>1916.00</td><td>19.91</td></tr><tr align="right"><td>9</td><td>630</td><td>85</td><td>7.41</td><td>6.61</td></tr></tbody></table><br />For products with nontrivial order quantities, frequencies are underestimated a bit, but for products with order quantity 1 frequencies are massively underestimated (which is what attracts the solver to using such a small order quantity). Basically, the piecewise-linear approximation of the reciprocal relation (2) stinks at the low end of the quantity range, because the curve is steep and none of the tangents are close enough to that end of the quantity domain. This could be improved by forcing the piecewise-linear functions in method 2 to have a breakpoint at $q_i=1$ or by including a tangent function $\ell_i()$ calculated at $q_i=1$ in method 3. Still, to get a reasonable approximation you might need to add a bunch of tangents at that end of the curve.<br /><br />Method 4 produces a solution that is actually optimal (to within convergence tolerances). The callback ignored discrepancies between nominal and actual order frequency below 0.01. Cost is very slightly underestimated, which I think could be fixed by setting that 0.01 tolerance even smaller. That might cause the program to generate a huge number of tangents in the callback, slowing it down considerably. As it was, the callback added 392 tangents to the initial set of 10 tangents during the solution run.<br /><br />I mentioned earlier that using both tangents and secants (in separate runs) would bracket the true optimal value in cases of convex (concave) functions in less than (greater than) constraints. Here the true optimal cost (around 633.5) is indeed below the nominal cost of the secant approach (672.58) and above the nominal cost of the tangent approach (490.27). Note that we have to use the nominal costs, which in the case of the tangent approach is at once horribly inaccurate for the solution produced but still a valid lower bound on the actual optimal cost.<br /><br />If you would like to look at or play with my code (add breakpoints or tangents, add products, make the frequency rather than the order quantity discrete), you can find it at <a href="https://gitlab.msu.edu/orobworld/secants">https://gitlab.msu.edu/orobworld/secants</a>.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-75593512890719994322020-03-27T21:39:00.000-04:002020-03-27T21:39:07.662-04:00A Minimum Variance Partition ProblemSomeone posed a <a href="https://math.stackexchange.com/questions/3594343/combinatorial-optimization-find-partitions/3597944" target="_blank">question</a> on Mathematics Stack Exchange that I answered there. This post is to explain the model I used (and generalize a bit).<br /><br />In general terms, you start with a collection $\Omega=[\omega_1, \dots, \omega_n]$ of positive integers. I choose to think of the integers as weights. The mission is to partition $\Omega$ into subcollections $S_1, \dots, S_m$ such that (a) you use as few subcollections as possible (i.e., minimize $m$), (b) the variance of their weight totals is minimal, and (c) no subcollection in the partition has total weight greater than some specified constant $L$. (If, for example, $S_1 = [\omega_4, \omega_9, \omega_{10}]$ then the weight total for $S_1$ is $\omega_4 + \omega_9 + \omega_{10}$.) Why am I saying "collections" rather than sets? This would be a set partitioning problem were it not for the fact that $\Omega$ may contain duplicate entries and so is technically a multiset.<br /><br />The term "variance" is slightly ambiguous (population variance or sample variance), but happily that will not matter here. Let $$s_j =\sum_{i : \omega_i \in S_j} \omega_i$$be the weight of subcollection $S_j$. Since we are talking about a partition here (every item in exactly one subcollection), $$\sum_{j=1}^m s_j = \sum_{i=1}^n \omega_i = n \bar{\omega}$$where $\bar{\omega}$ is the (constant) mean of all the item weights, and so the mean of the subcollection weights is $$\bar{s} = \frac{n\bar{\omega}}{m},$$which for a given number $m$ of subcollections is constant. The variance of the subcollection weights is $$\frac{1}{m}\sum_{j=1}^m (s_j - \bar{s})^2=\frac{1}{m}\sum_{j=1}^m s_j^2-\bar{s}^2.$$To minimize that, we can just minimize $\sum_j s_j^2$ in the optimization model, and then do the arithmetic to get the variance afterward.<br /><br />As originally articulated, the problem has two objectives: minimize the variance of the subcollection weights and minimize the number of subcollections used. Normally, a multiobjective problem involves setting priorities or assigning weights to objectives or something along those lines. For the original question, where there are only $n=12$ elements in $\Omega$ (and thus at most $m=12$ sets in the partition), a plausible approach is to find a minimum variance solution for each possible value of $m$ and present the user with the efficient frontier of the solution space, letting the user determine the trade-off between partition size and variance that most appeals to them. For the particular example in the SE question, I was a bit surprised to find that the variance increased monotonically with the number of sets in the partition, as seen in the following table.<br /><center><!--l. 23--><br /><div class="noindent"></div><div class="tabular"><table cellpadding="0" cellspacing="0" class="tabular" id="TBL-1"><colgroup id="TBL-1-1g"><col id="TBL-1-1"></col><col id="TBL-1-2"></col><col id="TBL-1-3"></col></colgroup><tbody><tr id="TBL-1-1-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-1-1" style="text-align: center; white-space: nowrap;"># of Subcollections</td><td class="td11" id="TBL-1-1-2" style="text-align: center; white-space: nowrap;"> </td><td class="td11" id="TBL-1-1-3" style="text-align: center; white-space: nowrap;">Minimum Variance</td></tr><tr class="hline"><td><hr /></td><td><hr /></td><td><hr /></td></tr><tr id="TBL-1-2-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-2-1" style="text-align: center; white-space: nowrap;">1 </td><td class="td11" id="TBL-1-2-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-2-3" style="text-align: center; white-space: nowrap;">infeasible </td></tr><tr id="TBL-1-3-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-3-1" style="text-align: center; white-space: nowrap;">2 </td><td class="td11" id="TBL-1-3-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-3-3" style="text-align: center; white-space: nowrap;">infeasible</td></tr><tr id="TBL-1-4-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-4-1" style="text-align: center; white-space: nowrap;">3 </td><td class="td11" id="TBL-1-4-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-4-3" style="text-align: center; white-space: nowrap;">infeasible </td></tr><tr id="TBL-1-5-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-5-1" style="text-align: center; white-space: nowrap;">4 </td><td class="td11" id="TBL-1-5-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-5-3" style="text-align: center; white-space: nowrap;">infeasible</td></tr><tr id="TBL-1-6-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-6-1" style="text-align: center; white-space: nowrap;">5 </td><td class="td11" id="TBL-1-6-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-6-3" style="text-align: center; white-space: nowrap;">672 </td></tr><tr id="TBL-1-7-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-7-1" style="text-align: center; white-space: nowrap;">6 </td><td class="td11" id="TBL-1-7-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-7-3" style="text-align: center; white-space: nowrap;">1048.22</td></tr><tr id="TBL-1-8-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-8-1" style="text-align: center; white-space: nowrap;">7 </td><td class="td11" id="TBL-1-8-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-8-3" style="text-align: center; white-space: nowrap;">1717.39 </td></tr><tr id="TBL-1-9-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-9-1" style="text-align: center; white-space: nowrap;">8 </td><td class="td11" id="TBL-1-9-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-9-3" style="text-align: center; white-space: nowrap;">3824.75 </td></tr><tr id="TBL-1-10-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-10-1" style="text-align: center; white-space: nowrap;">9 </td><td class="td11" id="TBL-1-10-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-10-3" style="text-align: center; white-space: nowrap;">7473.73 </td></tr><tr id="TBL-1-11-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-11-1" style="text-align: center; white-space: nowrap;">10 </td><td class="td11" id="TBL-1-11-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-11-3" style="text-align: center; white-space: nowrap;">9025.80 </td></tr><tr id="TBL-1-12-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-12-1" style="text-align: center; white-space: nowrap;">11 </td><td class="td11" id="TBL-1-12-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-12-3" style="text-align: center; white-space: nowrap;">9404.81 </td></tr><tr id="TBL-1-13-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-13-1" style="text-align: center; white-space: nowrap;">12 </td><td class="td11" id="TBL-1-13-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-13-3" style="text-align: center; white-space: nowrap;">11667.06</td></tr><tr id="TBL-1-14-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-14-1" style="text-align: center; white-space: nowrap;"></td></tr></tbody></table></div></center><br />So the five subcollection partition is minimal both in number of subcollections and weight variance. I would be surprised if that generalized.<br /><br />The QP formulation uses two sets of variables. For $i=1,\dots,n$ and $j=1,\dots,m$, $x_{ij}\in \lbrace 0, 1\rbrace$ determines whether $\omega_i$ belongs to $S_j$ ($x_{ij}=1$) or not ($x_{ij}=0$). For $j=1,\dots,m$, $s_j \ge 0$ is the total weight of subcollection $j$. The objective function is simply $$\min_{x,s} \sum_{j=1}^m s_j^2$$which happily is convex. The first constraint enforces the requirement that the subcollections form a partition (every element being used exactly once): $$\sum_{j=1}^m x_{ij} = 1 \quad\forall i\in \lbrace 1,\dots,n\rbrace.$$For a partition to be valid, we do not want any empty sets, which leads to the following constraint:$$\sum_{i=1}^n x_{ij} \ge 1\quad\forall j\in\lbrace 1,\dots,m\rbrace.$$ The next constraint defines the weight variables: $$s_j = \sum_{i=1}^n \omega_i x_{ij}\quad\forall j\in\lbrace 1,\dots,m\rbrace.$$What about the limit $L$ on the weight of any subcollection? We can enforce that by limiting the domain of the $s$ variables to $s_j\in [0,L]$ for all $j$.<br /><br />Finally, there is symmetry in the model. One source is duplicate items in $\Omega$ ($\omega_i = \omega_{i'}$ for some $i\neq i'$). That happens in the data of the SE question, but it's probably not worth dealing with in an example that small. Another source of symmetry is that, given any partition, you get another partition with identical variance by permuting the indices of the subcollections. That one is easily fixed by requiring the subcollections to be indexed in ascending (or at least nondecreasing) weight order:$$s_j \le s_{j+1}\quad\forall j\in\lbrace 1,\dots,m-1\rbrace.$$Is it worth bothering with? With the anti-symmetry constraint included, total run time for all 12 models was between one and two seconds on my PC. Without the anti-symmetry constraint, the same solutions for the same 12 models were found in a bit over 100 minutes.<br /><br />If you would like to see my Java code (which requires CPLEX), you can find it at <a href="https://gitlab.msu.edu/orobworld/mvpartition">https://gitlab.msu.edu/orobworld/mvpartition</a>. <br /><div style="clear: both;"></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-16998495410879108602020-03-24T13:29:00.000-04:002020-03-24T13:40:46.818-04:00Approximating Nonlinear Functions: Tangents v. Secants<div class="separator" style="clear: both; text-align: center;"><br /></div>In a (much) earlier post [1], I discussed using SOS2 constraints to approximate a nonlinear function in a mixed-integer linear program (MILP). In this post, and probably the next one or two, I'm going to expand on that theme a bit. (By "a bit" I mean "endlessly".)<br /><br /><h3>Summary</h3><br />To save you some reading, in case you end up not interested, I'll summarize the take-aways here.<br /><ol><li>We can approximate nonlinear functions in a MILP model using piecewise-linear functions based on either tangents or secants. Piecewise-linear functions implicitly add either binary variables or SOS2 constraints (or maybe both).</li><li>For convex functions only appearing in $\le$ constraints or concave functions only appearing in $\ge$ constraints, we can say that using tangents may produce superoptimal solutions (look optimal but are actually infeasible) but will not produce suboptimal solutions, while using secants may produce suboptimal solutions but will not produce infeasible solutions. In any other case, super- and sub-optimality are both risks.</li><li>For the same two combinations in the previous paragraph, we have a third option: using a set of linear inequalities based on tangents, rather than a piecewise-linear function. Besides being somewhat easier to set up, this has a potential advantage: using a callback, we can refine the approximation on the fly (by adding additional constraints based on new tangents as they are needed).</li></ol><h3>Gory details</h3><br />Suppose that we are working on a MILP, and we need to incorporate a nonlinear function of some of the variables. To keep things simple, I'll illustrate with functions $f:\mathbb{R}\rightarrow\mathbb{R}$ of a single variable, but the concepts presented will extend (not necessarily easily) to functions of multiple variables. In what follows, I will refer to the set of points $(x,f(x))$ as the <i>surface</i> of $f$. A <i>tangent</i> to the surface at $x$ is a line (more generally, hyperplane) that touches the surface at $x$ and has the same slope as the surface at that one point. A <i>secant</i> to the surface is a line (more generally, hyperplane) that intersects the surface at multiple points. For a function of one dimension, a <i>chord</i> is a segment of a secant between two consecutive intersection points. ("Consecutive" means that there are no other intersections of secant and surface between the endpoints of the chord.) I'm not sure if the term "chord" is used in higher dimensions, although the concept extends to higher dimensions.<br /><br />Figure 1 shows a convex function together with a few tangents (in blue).<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><br /><br /><a href="https://1.bp.blogspot.com/-svjzglyWeeo/XnlBefThK0I/AAAAAAAADJo/ilR4XEWBNzMI0xCwDEjjJrcOz5ol8nZGQCLcBGAsYHQ/s1600/cvx_tan.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="330" data-original-width="534" height="245" src="https://1.bp.blogspot.com/-svjzglyWeeo/XnlBefThK0I/AAAAAAAADJo/ilR4XEWBNzMI0xCwDEjjJrcOz5ol8nZGQCLcBGAsYHQ/s400/cvx_tan.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 1: Tangents to convex function<br /><style type="text/css">/* Layout-provided Styles */ div.standard { margin-bottom: 2ex; }</style></td></tr></tbody></table><div class="composeBoxWrapper K3JSBVB-R-h"></div>Figure 2 shows the same function with a few secants (in green). One of the secants intersects the surface at $(-5,15)$ and $(0,0)$. The dashed line segment between those points is a chord.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-1ydPrdHjx70/XnlBeJwlBxI/AAAAAAAADJ8/b0np8lLLMhQHytme3guYTrc8JC0oth1pACEwYBhgL/s1600/cvx_sec.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="330" data-original-width="534" height="246" src="https://1.bp.blogspot.com/-1ydPrdHjx70/XnlBeJwlBxI/AAAAAAAADJ8/b0np8lLLMhQHytme3guYTrc8JC0oth1pACEwYBhgL/s400/cvx_sec.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2: Secants to convex function</td></tr></tbody></table>Figures 3 and 4 do the same for a concave function.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-mJSfnecoWWc/XnlBdRIp5jI/AAAAAAAADJ4/8MgXCIxjqrAW9L5XQEzpWZbSuvwd1_OMwCEwYBhgL/s1600/concave_tan.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="330" data-original-width="534" height="246" src="https://1.bp.blogspot.com/-mJSfnecoWWc/XnlBdRIp5jI/AAAAAAAADJ4/8MgXCIxjqrAW9L5XQEzpWZbSuvwd1_OMwCEwYBhgL/s400/concave_tan.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 3: Tangents to concave function</td><td class="tr-caption" style="text-align: center;"><br /></td><td class="tr-caption" style="text-align: center;"><br /></td></tr></tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-hGFKn5oNcxE/XnlBdQfyEjI/AAAAAAAADKI/kT-Eccpm1pMR3AoGP2Suu7CHEQoheD_CgCEwYBhgL/s1600/concave_sec.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="330" data-original-width="534" height="246" src="https://1.bp.blogspot.com/-hGFKn5oNcxE/XnlBdQfyEjI/AAAAAAAADKI/kT-Eccpm1pMR3AoGP2Suu7CHEQoheD_CgCEwYBhgL/s400/concave_sec.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 4: Secants to concave function</td></tr></tbody></table><br />A useful property of convex and concave functions is that tangents of convex functions are always less than or equal to the functions and tangents of concave functions are always greater than or equal to the functions. Similar claims cannot be made for secants. A chord of a convex function will always be greater than or equal to the function, but other points on its secant can be less than the function. For instance, in Figure 2 the chord from $x=-5$ to $x=0$ lies above the function surface, but the secant to which it belongs falls below the function surface for $x<-5$ or $x>0$. It is worth noting that tangents to nonconvex functions do not stay on one side of the surface. Figure 5 shows a nonconvex function along with one tangent (in blue) and one secant (in green). Both the tangent and the secant are above the function surface some places and below it other places.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-b3Oqq_zeS0U/XnlBfa24SsI/AAAAAAAADKA/GinoU18xBZkyij_UTR-22D4OInr6TzCQgCEwYBhgL/s1600/nonconvex.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="482" data-original-width="594" height="323" src="https://1.bp.blogspot.com/-b3Oqq_zeS0U/XnlBfa24SsI/AAAAAAAADKA/GinoU18xBZkyij_UTR-22D4OInr6TzCQgCEwYBhgL/s400/nonconvex.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 5: Nonconvex function</td></tr></tbody></table><br />In what follows, when I talk about convexity of the feasible region, I will be referring to the feasible region of the continuous relaxation (all integrality restrictions relaxed). Convexity of that region is critical to algorithms using LP bounds. There are two cases of initial interest, because they preserve that convexity. If the function $f()$ is convex and appears only in constraints of the form $f(x)+\dots\le b$ (where $\dots$ is empty or contains only convex terms), or is concave and appears only in constraints of the form $f(x)+\dots\ge b$ (where $\dots$ is empty or contains only concave terms), and if the other constraints do nothing to violate convexity, then the feasible region remains convex when $f()$ is introduced. Under any other scenario, you lose convexity of the feasible region when you introduce $f()$. So for now we will only consider convex $f()$ in $\le$ constraints or concave $f()$ in $\ge$ constraints. If $f()$ is convex or concave but appears in an inequality constraint with the wrong direction, or appears in an equation constraint, or if $f()$ is neither convex nor concave, we will deal with it later (or never, whichever comes second).<br /><br />For simplicity, suppose that $f()$ is convex. We introduce a new variable $z$ that is a surrogate for $f(x)$, changing each constraint of the form $f(x)+\dots\le b$ to $z+\dots\le b$. Now we have to tie $z$ to $f(x)$, but without using the explicit constraint $z=f(x)$ (due to that whole nonlinearity thing). We can approximate the relationship either with tangents or with secants. With secants, we pick an arbitrary collection of values of $x$, calculate the corresponding values of $f(x)$, define a piecewise-linear function $\hat{f}(x)$ based on those values, and set $z=\hat{f}(x)$. In Figure 2, $\hat{f}(x)$ consists of the four chords shown (with endpoints at $x\in\left\{ -10,-5,0,5,10\right\} $).<br /><br />With tangents, we have a choice. Both options involve computing tangents at an arbitrary collection of values of $x$. In one version, we find where those tangents intersect and use the intersection points to define a piecewise-linear function. In Figure 1, the five tangents shown intersect at $\left\{ (-7.5,35),(-2.5,-5),(2.5,5),(7.5,65)\right\} $. We can add endpoints for where the first and last tangents exit the interval of interest (say $(-10,80)$ and $(10,120)$ for Figure 1). In the other version, we add the constraints $z\ge\ell_{j}(x)$ for $j=1,\dots,t$, where $t$ is the number of tangents computed and $\ell_{1}(x),\dots,\ell_{t}(x)$ are the linear functions corresponding to the tangents. In Figure 1, $t=5$ and for any $x$ we are constraining $z$ to be greater than or equal to the ordinates of all five dotted lines at that $x$. If $f()$ is concave and appears in the constraint $f(x)\ge b$, the only change is that the added constraints are $z\le\ell_{j}(x)$.<br /><br />At this point, there are several things we can note.<br /><ol><li>Calculating chords only requires the evaluation of $f(x)$ for various values of $x$ (plus some simple algebra). Calculating tangents requires computing the first derivative of $f(x)$ when $f()$ is smooth. (More generally, it involves calculating the subgradient of $f()$ at $x$.) It then requires calculating where the tangents intersect, if we are going to make a piecewise-linear function from the tangents. So if $f()$ is painful to work with, or if sorting out a lot of intersections does not appeal to you, you might prefer chords.</li><li>Chords involve embedding a piecewise-linear function in a model. As discussed in [1], this can be done using type two special ordered sets (SOS2), which turns an LP into a MIP and turns a MIP into a somewhat more complicated MIP. In another post [2], I discussed a handy feature in the CPLEX APIs that allows for easy entry of a piecewise-linear function. (In the Java API, the method is <span style="font-family: "courier new" , "courier" , monospace;">IloCplexModeler.piecewiseLinear()</span>, which in turn implements a method from the <span style="font-family: "courier new" , "courier" , monospace;">IloMPModeler</span> interface.) The API method still introduces either binary variables or SOS2 constraints "under the hood", but at least the user is spared the tedious business of computing them. With tangents, the first option described does the same thing, but the second option just involves adding a bunch of linear constraints. (In CPLEX, if you calculate a lot of tangents, you might consider adding the constraints as "lazy constraints".)</li><li>The way the approximations affect the solution differs between tangents and secants. <ol><li>In Figures 1 and 3, it is easy to see that if we replace $f(x)$ with a finite set of tangents, every feasible point remains feasible, but some infeasible points appear feasible. For instance, take the constraint $f(x)\le38$ (which becomes $z\le38$) with the function in Figure 1, redrawn below in Figure 6. The horizontal dotted line (in red) shows the limit (38) imposed by the constraint. The vertical dotted line (in cyan) shows what happens when $x=-7.5$. The black point, $(-7.5,f(-7.5))$ is above the red dotted line, indicating that $x=-7.5$ is infeasible. The blue point, $(-7.5,35)$, is however feasible in the relaxed problem, since it is on or above all the selected tangents and below the constraint limit. So the solver might accept a solution containing $x=-7.5$.</li><li>In Figures 2 and 4, it is likewise easy to see that if we replace $f(x)$ with a piecewise-linear function consisting of chords, solutions that are feasible in the relaxed problem will also be feasible in the original problem, but "optimal" solutions to the relaxed problem may actually be suboptimal. The secant approximation to our convex function in Figure 2 is redrawn below in Figure 7. Staying with the constraint $z\le38$, consider what happens at $x=-7$ (the dotted vertical line). The actual function value $f(-7)=35$ is below the constraint limit, so $x=-7$ satisfies the constraint. (This is the black point on the graph.) Barring exclusion due to other constraints, it would be feasible and might be part of an optimal solution. The green point $(-7,41)$, on the piecewise-linear function defined by the secants, is not feasible, however, and so the solver will reject any solution containing $x=-7$, including possibly the true optimum.</li></ol></li></ol><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-htEGTesdKno/XnlBe7gCBbI/AAAAAAAADKE/gQhO0BGCt2MPbqhpyWGCZtJ5eaq8IaKXACEwYBhgL/s1600/infeasible.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="482" data-original-width="594" height="323" src="https://1.bp.blogspot.com/-htEGTesdKno/XnlBe7gCBbI/AAAAAAAADKE/gQhO0BGCt2MPbqhpyWGCZtJ5eaq8IaKXACEwYBhgL/s400/infeasible.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 6: Infeasible point</td></tr></tbody></table><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-gyHqTxo7cnY/XnlBfiyORcI/AAAAAAAADKM/opvI9qL0uk4XR1xZNjoPPYYOGDsdK3mRQCEwYBhgL/s1600/suboptimal.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="482" data-original-width="594" height="323" src="https://1.bp.blogspot.com/-gyHqTxo7cnY/XnlBfiyORcI/AAAAAAAADKM/opvI9qL0uk4XR1xZNjoPPYYOGDsdK3mRQCEwYBhgL/s400/suboptimal.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 7: Feasible point excluded</td></tr></tbody></table><br />Since tangents may produce infeasible "solutions" and secants may produce suboptimal solutions, we may need to solve the problem, get a sense of where the optimal solution might be located, refine the approximation near there and repeat. Note that if we use tangents for convex or concave functions (which produces an "outer approximation" to the original problem) and the final solution is really feasible (using the actual $f()$), there is no need to refine further. Unfortunately, with secants, I see no obvious way to know that our approximation is "good enough". With functions that are neither convex nor concave, or with convex/concave functions in equations or inequalities going the "wrong" direction, tangents no longer provide an outer approximation (Figure 5), and we may need to refine and repeat.<br /><br />This brings me to my last point, which pertains to the cases where tangents do provide an outer approximation and where we add all the tangents as constraints rather than turning them into a piecewise-linear function. With solvers, like CPLEX, that allow users to post new constraints during the branch-and-bound process, we can refine on the fly. Referring back to Figure 6, suppose that I am solving the problem using CPLEX, with a lazy constraint callback attached, and CPLEX identifies a solution containing $x=-7.5$. Inside the callback, I can compute $f(-7.5)$, observe that it is greater than the constraint cutoff of $38$ (and also greater than $z=35$), calculate a new tangent at the black point, and add it as a lazy constraint. This will make the proposed new incumbent infeasible. Figure 8 shows this, with the orange dashed line being the tangent at $x=-7.5$. After adding it to our set of tangents (and adding the constraint $z\ge\ell(x)$, where $\ell(x)$ is the linear function given by the new tangent), the blue point no longer gives a possible value for $z$. At $x=-7.5$, the maximum value of any tangent will be the value of $\ell(-7.5)$, and that will equal $f(-7.5)$ since we calculated the tangent there. (It's hard to see in the figure, but trust me, the orange line goes through the black point.) So the solver will update $z$ to the correct value and, in this case, reject the solution as violating the constraint $z\le38$.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-PoNf6RW78_0/XnlBdZe_NhI/AAAAAAAADKI/4HUIrrontW0gD65JppRwcjcTVWKhzUgWwCEwYBhgL/s1600/addcut.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="482" data-original-width="594" height="323" src="https://1.bp.blogspot.com/-PoNf6RW78_0/XnlBdZe_NhI/AAAAAAAADKI/4HUIrrontW0gD65JppRwcjcTVWKhzUgWwCEwYBhgL/s400/addcut.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 8: Adding a tangent</td></tr></tbody></table><br /><h3>References</h3>[1] "<a href="https://orinanobworld.blogspot.com/2014/11/linearize-that.html" target="_blank">Linearize That!</a>"<br />[2] "<a href="https://orinanobworld.blogspot.com/2012/07/piecewise-linear-functions-in-cplex.html" target="_blank">Piecewise Linear Functions in CPLEX</a>"Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-37127926527169686442020-02-17T19:53:00.003-05:002020-02-18T14:26:05.926-05:00Reversing DifferencesFellow blogger <a href="http://hakank.org/" target="_blank">Håkan Kjellerstrand</a> posted an <a href="https://or.stackexchange.com/questions/3349/restoring-a-list-from-differences" target="_blank">interesting question</a> on OR Stack Exchange recently. Starting from a list of integers, it is trivial to compute the list of all pairwise absolute differences, but what about going in the other direction? Given the pairwise (absolute) differences, with duplicates removed, can you recover the source list (or a source list)?<br /><br />We can view the source "list" as a vector $x\in\mathbb{Z}^n$ for some dimension $n$ (equal to the length of the list). With duplicates removed, we can view the differences as a set $D\subset \mathbb{Z}_+$. So the question has to do with recovering $x$ from $D$. Our first observation kills any chance of recovering the original list with certainty:<br /><blockquote class="tr_bq">If $x$ produces difference set $D$, then for any $t\in\mathbb{R}$ the vector $x+t\cdot (1,\dots,1)'$ produces the same set $D$ of differences.</blockquote>Translating all components of $x$ by a constant amount has no effect on the differences. So there will be an infinite number of solutions for a given difference set $D$. A reasonable approach (proposed by Håkan in his question) is to look for the shortest possible list, i.e., minimize $n$.<br /><br />Next, observe that $0\in D$ if and only if two components of $x$ are identical. If $0\notin D$, we can assume that all components of $x$ are distinct. If $0\in D$, we can solve the problem for $D\backslash\{0\}$ and then duplicate any one component of the resulting vector $x$ to get a minimum dimension solution to the original problem.<br /><br />Combining the assumption that $0\notin D$ and the observation about adding a constant having no effect, we can assume that the minimum element of $x$ is 1. That in turn implies that the maximum element of $x$ is $1+m$ where $m=\max(D)$.<br /><br />From there, Håkan went on to solve a test problem using constraint programming (CP). Although I'm inclined to suspect that CP will be more efficient in general than an integer programming (IP) model, I went ahead and solved his test problem via an IP model (coded in Java and solved using CPLEX 12.10). CPLEX's solution pool feature found the same four solutions to Håkan's example that he did, in under 100 ms. How well the IP method scales is an open question, but it certainly works for modest size problems.<br /><br />The IP model uses binary variables $z_1, \dots, z_{m+1}$ to decide which of the integers $1,\dots,m+1$ are included in the solution $x$. It also uses variables $w_{ij}\in [0,1]$ for all $i,j\in \{1,\dots,m+1\}$ such that $i \lt j$. The intent is that $w_{ij}=1$ if both $i$ and $j$ are included in the solution, and $w_{ij} = 0$ otherwise. We could declare the $w_{ij}$ to be binary, but we do not need to; constraints will force them to be $0$ or $1$.<br /><br />The full IP model is as follows:<br /><br /><div class="scroll">\[ \begin{array}{lrlrc} \min & \sum_{i=1}^{m+1}z_{i} & & & (1)\\ \textrm{s.t.} & w_{i,j} & \le z_{i} & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (2)\\ & w_{i,j} & \le z_{j} & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (3)\\ & w_{i,j} & \ge z_{i}+z_{j}-1 & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (4)\\ & w_{i,j} & =0 & \forall i,j\in\left\{ 1,\dots,m+1\right\} \textrm{ s.t. }(j-i)\notin D & (5)\\ & \sum_{i,j\in\left\{ 1,\dots,m+1\right\} |j-i=d}w_{i,j} & \ge 1 & \forall d\in D & (6)\\ & z_{1} & = 1 & & (7) \end{array} \]</div><br />The objective (1) minimizes the number of integers used. Constraints (2) through (4) enforce the rule that $w_{ij}=1$ if and only if both $z_i$ and $z_j$ are $1$ (i.e., if and only if both $i$ and $j$ are included in the solution). Constraint (5) precludes the inclusion of any pair $i < j$ whose difference $j - i$ is not in $D$, while constraint (6) says that for each difference $d \in D$ we must include at least one pair $i < j$ for that produces that difference ($j - i = d$). Finally, since we assumed that our solution starts with minimum value $1$, constraint (7) ensures that $1$ is in the solution. (This constraint is redundant, but appears to help the solver a little, although I can't be sure given the short run times.)<br /><br />My <a href="https://gitlab.msu.edu/orobworld/differences" target="_blank">Java code</a> is available from my repository (bring your own CPLEX). Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com5tag:blogger.com,1999:blog-8781383461061929571.post-34466774975110152362020-02-11T15:37:00.000-05:002020-02-11T15:37:06.374-05:00Collections of CPLEX VariablesRecently, someone asked for help online regarding an optimization model they were building using the CPLEX Java API. The underlying problem had some sort of network structure with $N$ nodes, and a dynamic aspect (something going on in each of $T$ periods, relating to arc flows I think). Forget about solving the problem: the program was running out of memory and dying while building the model.<br /><br />A major issue was that they allocated two $N\times N\times T$ arrays of variables, and $N$ and $T$ were big enough that $2N^2T$ was, to use a technical term, ginormous. Fortunately, the network was fairly sparse, and possibly not every time period was relevant for every arc. So by creating only the IloNumVar instances they needed (meaning only for arcs that actual exist in time periods that were actually relevant), they were able to get the model to build.<br /><br />That's the motivation for today's post. We have a tendency to write mathematical models using vectors or matrices of variables. So, for instance, $x_i \, (i=1,\dots,n)$ might be an inventory level at each of $n$ locations, or $y_{i,j} \, (i=1,\dots,m; j=1,\dots,n)$ might be the inventory of item $i$ at location $j$. It's a natural way of expressing things mathematically. Not coincidentally, I think, CPLEX APIs provide structures for storing vectors or matrices of variables and for passing them into or out of functions. That makes it easy to fall into the trap of thinking that variables <i>must be</i> organized into vectors or matrices.<br /><br />Last year I did a post ("<a href="https://orinanobworld.blogspot.com/2019/07/using-java-collections-with-cplex.html">Using Java Collections with CPLEX</a>") about using what Java calls "collections" to manage CPLEX variables. This is not unique to Java. I know that C++ has similar memory structures, and I think they exist in other languages you might use with CPLEX. The solution to the memory issue I mentioned at the start was to create a Java container class for each combination of an arc that actually exists and time epoch for which it would have a variable, and then associate instances of that class with CPLEX variables. So if we call the new class <span style="font-family: "Courier New", Courier, monospace;">AT</span> (my shorthand for "arc-time"), I suggested the model owner use a <span style="font-family: "Courier New", Courier, monospace;">Map<AT, IloNumVar></span> to associate each arc-time combination with the variable representing it and a <span style="font-family: "Courier New", Courier, monospace;">Map<IloNumVar, AT></span> to hold the reverse association. The particular type of map is mostly a matter of taste. (I generally use <span style="font-family: "Courier New", Courier, monospace;">HashMap</span>.) During model building, they would create only the <span style="font-family: "Courier New", Courier, monospace;">AT</span> instances they actually need, then create a variable for each and pair them up in the first map. When getting a solution from CPLEX, they would get a value for each variable and then use the second map to figure out for which arc and time that value applied. (As a side note, if you use maps and then need the variables in vector form, you can apply the <span style="font-family: "Courier New", Courier, monospace;">values()</span> method to the first map (or the <span style="font-family: "Courier New", Courier, monospace;">getKeySet()</span> method to the second one), and then apply the <span style="font-family: "Courier New", Courier, monospace;">toArray()</span> method to that collection.)<br /><br />Now you can certainly get a valid model using just arrays of variables, which was all that was available to me back in the Dark Ages when I used FORTRAN, but I think there are some benefits to using collections. Using arrays requires you to develop an indexing scheme for your variables. The indexing scheme tells you that the flow from node 3 to node 7 at time 4 will be occupy slot 17 in the master variable vector. Here are my reasons for avoiding that. <br /><ul><li>Done correctly, the indexing scheme is, in my opinion, a pain in the butt to manage. Finding the index for a particular variable while writing the code is time-consuming and has been known to kill brain cells.</li><li>It is easy to make mistakes while programming (calculate an index incorrectly).</li><li>Indexing invites the error of declaring an array or vector with one entry for each combination of component indices (that $N\times N\times T$ matrix above), without regard to whether you need all those slots. Doing so wastes time and space, and the space, as we saw, may be precious.</li><li>Creating slots that you do not need can lead to execution errors. Suppose that I allocating a vector <span style="font-family: "Courier New", Courier, monospace;">IloNumVar x = new IloNumVar[20]</span> and use 18 slots, omitting slots 0 and 13. If I solve the model and then call <span style="font-family: "Courier New", Courier, monospace;">getValues(x)</span><span style="font-family: "Courier New", Courier, monospace;"></span>, CPLEX will throw an exception, because I am asking for values of two variables (<span style="font-family: "Courier New", Courier, monospace;">x[0]</span> and <span style="font-family: "Courier New", Courier, monospace;">x[13]</span>) that do not exist. Even if I create variables for those two slots, the exception will occur, because those two variables will not belong to the model being solved. (There is a way to force CPLEX to include those variables in the model without using them, but it's one more pain in the butt to deal with.) I've lost count of how many times I've seen messages on the CPLEX help forums about exceptions that boiled down to "unextracted variables".</li></ul>So my advice is to embrace collections when building models where variables do not have an obvious index scheme (with no skips).Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-73003561398342538642020-01-30T14:16:00.000-05:002020-01-30T14:20:13.728-05:00Generic Callback Changes in CPLEX 12.10CPLEX 12.10 is out, and there have been a few changes to the new(ish) generic callbacks. Rather than go into them in detail (and likely screw something up), I'll just point you to the <a href="https://ibm.ent.box.com/s/t6dq7kvljjua6xd3qj2811h5bcufdw9n" target="_blank">slides for a presentation</a> by Daniel Junglas of IBM at the 2019 INFORMS Annual Meeting.<br /><br />I've written about half a dozen posts about generic callbacks since IBM introduced them (which you can find by typing "generic callback" in the search widget on the blog). A couple of things have been added recently, and I thought I would mention them. The generic callback approach uses a single callback function that can be called from a variety of contexts, including when CPLEX solves a node relaxation ("RELAXATION" context), when if finds a candidate solution ("CANDIDATE" context) and, now, when it is ready to split a node into children ("BRANCHING" context).<br /><br />The branching context is one of the new features. It brings back most of the functionality of the branch callback in the legacy callback system. Unfortunately, it does not seem to have the ability to attach user information to the child nodes, which was a feature that was occasionally useful in the legacy system. You can get more or less equivalent functionality by creating a data store (array, map, whatever) in your global memory and storing the node information keyed by the unique index number of each child node. The catch is that you are now responsible for memory management (freeing up space when a node is pruned and the associated information is no longer needed), and for dealing with thread synchronization issues.<br /><br />Another new feature is that you can now inject a heuristic solution (if you have one) from all three of the contexts I mentioned above. CPLEX gives you a variety of options for how it will handle the injected solution: "NoCheck" (CPLEX will trust you that it is feasible); "CheckFeasible" (CPLEX will check feasibility and ignore the solution if it is not feasible); "Propagate" (Daniel's explanation: CPLEX will "propagate fixed variables and accept if feasible"); and "Solve" (CPLEX will solve a MIP problem with fixed variables and accept the result if feasible). I assume the latter two mean that you provide a partial solution, fixing some variables but not others. (Unfortunately I was unable to make it to Daniel's talk, so I'm speculating here.)<br /><br />I'm not sure if those are the only new features, but they are the ones that are most relevant to me. I invite you to read through Daniel's slides to get a more complete picture, including both the reasons for switching from legacy callbacks to generic callbacks and some of the technical issues in using them.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-56340525588829501192020-01-07T14:37:00.000-05:002020-01-07T14:37:57.835-05:00Greedy Methods Can Be ExactWe generally sort optimization algorithms (as opposed to models) into two or three categories, based on how certain we are that solutions will be either optimal or at least "good". An <a href="https://or.stackexchange.com/a/936/67" target="_blank">answer by Michael Feldmeier</a> to a question I posted on OR Stack Exchange neatly summarizes the categories:<br /><ul><li><i>exact</i> methods eventually cough up provably optimal solutions;</li><li><i>approximate</i> methods eventually cough up solutions with some (meaningful) guarantee regarding how far from optimal they might be; and</li><li><i>heuristics</i> provide no worst-case guarantees (but generally are either easy to implement, fast to execute or both).</li></ul>I should explain my use of "meaningful" (which is not part of Michael's answer). A common way to estimate the "gap" between a solution and the optimum is to take $|z - \tilde{z}|/|z|$, where $z$ is the objective value of the solution produced by the algorithm and $\tilde{z}$ is some bound (lower bound in a minimization, upper bound in a maximization) of the optimal solution. Now suppose that we are minimizing a function known to be nonnegative. If we set $\tilde{s}=0$, we know that any method, no matter how stupid, will have a gap no worse than 100%. To me, that is not a meaningful guarantee. So I'll leave the definition of "meaningful" to the reader.<br /><br />What brings all this to mind is a <a href="https://math.stackexchange.com/q/3478711/458896" target="_blank">question</a> posted on Mathematics Stack Exchange. The author of the question was trying to solve a nonlinear integer program. He approached it by applying a "<a href="https://en.wikipedia.org/wiki/Greedy_algorithm" target="_blank">greedy algorithm</a>". Greedy algorithms are generally assumed to be heuristics, since it seldom is possible to provide useful guarantees on performance. In his case, though, the greedy algorithm is provably optimal, mainly due to the objective function being concave and separable. I'll state the problem and show a proof of optimality below (changing the original notation a bit). Brace yourself: the proof is a bit long-winded.<br /><br /><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">You start with $N$ workers to be assigned to $M$ work stations. The output of workstation $m$, as a function of the number of workers $x$ assigned to it, is given by $$f_{m}(x)=a_{m}x+b_{m}-\frac{c_{m}}{x},$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">where $a_{m},b_{m},c_{m}$ are all positive constants. Since $f(0)=-\infty$, we can assume that each work station gets at least one worker (and, consequently, that $N>M$). Since $f_{m}'(x)=a_{m}+c_{m}/x^{2}>0$, each $f_{m}()$ is monotonically increasing. Thus, we can safely assume that all $N$ workers will be assigned somewhere. $f_{m}''(x)=-2c_{m}/x^{3}<0$, so $f_{m}()$ is strictly concave (which we will need later). We also note, for future reference, that the impact of adding one worker to a current staff of $x$ at station $m$ is $$\Delta f_{m}(x)=a_{m}+\frac{c_{m}}{x(x+1)}>0.$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Similarly, the impact of removing one worker at station $m$ is $$\delta f_{m}(x)=-a_{m}-\frac{c_{m}}{x(x-1)}<0.$$We see that $\delta f_{m}(x)$ is an increasing function of $x$ (i.e., it gets less negative as $x$ gets bigger). We also note that $\Delta f_{m}(x)=-\delta f_{m}(x+1)$.</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The IP model is easy to state. Let $x_{m}$ be the number of workers assigned to work station $m$. The model is</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\max\sum_{m=1}^{M}f_{m}(x_{m})$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">subject to</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\sum_{m=1}^{M}x_{m}\le N$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">with </div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$x\in\mathbb{Z}_{+}^{M}.$$</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The greedy algorithm starts with a single worker at each station ($x=(1,\dots,1)$) and, at each step, adds one worker to the workstation where that worker produces the greatest increase in objective value (breaking ties arbitrarily). It stops when all $N$ workers are assigned. To prove that it actually finds an optimal solution, I'll use proof by contradiction.</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Let $x^{(0)},x^{(1)},\dots,x^{(N-M)}$ be the sequence of solutions constructed by the greedy algorithm, with $x^{(0)}=(1,\dots,1)$, and let $x^{(k)}$ be the last solution in the sequence for which an optimal solution $x^{*}$ exists such that $x^{(k)}\le x^{*}$. The significance of the inequality is that if $x\le x^{*}$, it is possible to extend the partial solution $x$ to the optimal solution $x^{*}$ by adding unassigned workers to work stations. We know that $k$ is well defined because $x^{(0)}\le x^{*}$ for any optimal $x^{*}$. Since we are assuming that the greedy algorithm does not find an optimum, it must be that $k<N-M$.</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Now identify the work station $j$ to which the greedy algorithm added a worker at step $k$, meaning that $x_{j}^{(k+1)}=x_{j}^{(k)}+1$ and $x_{i}^{(k+1)}=x_{i}^{(k)}$ for $i\neq j$. Since, by assumption, $x^{(k)}\le x^{*}$ but $x^{(k+1)}\not\le x^{*}$, it must be that $x_{j}^{(k)}=x_{j}^{*}$.</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Next, since $x^{(k)}\le x^{*}$ and $x^{(k)}\neq x^{*}$ (else $x^{(k)}$ would be optimal), there is some work station $h\neq j$ such that $x_{h}^{(k)}<x_{h}^{*}$. Let $\tilde{x}$ be the solution obtained from $x^{(k)}$ by adding a worker to station $h$: $\tilde{x}_{h}=x_{h}^{(k)}+1$ and $\tilde{x}_{i}=x_{i}^{(k)}$ for $i\neq h$. Observe that $\tilde{x}\le x^{*}$. The greedy algorithm chose work station $j$ over work station $h$ at $x^{(k)}$, so it must be that</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{(k)})\ge\Delta f_{h}(x_{h}^{(k)}). \quad (1)$$</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Finally, let $\hat{x}$ be the result of starting from optimal solution $x^{*}$ and shifting one worker from station $h$ to station $j$. Since </div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$x_{j}^{(k+1)}=x_{j}^{(k)}+1=x_{j}^{*}+1=\hat{x}_{j},$$ $$x_{h}^{(k+1)}=x_{h}^{(k)}<x_{h}^{*}\implies x_{h}^{(k+1)}\le\hat{x}_{h}$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">and </div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$x_{i}^{(k+1)}=x_{i}^{(k)}\le x_{i}^{*}=\hat{x}_{i}\,\forall i\notin\{h,j\},$$</div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">we have $x^{(k+1)}\le\hat{x}$. Under the assumption that $x^{(k)}$ was the last solution in the greedy sequence that could be extended to an optimal solution, it must be that $\hat{x}$ is not optimal. Thus the net change to the objective function at $x^{*}$ when shifting one worker from station $h$ to station $j$ must be negative, i.e.,</div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{*})+\delta f_{h}(x_{h}^{*})<0.\quad (2)$$</div><br /><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">We showed previously that, under our assumptions, $x_{j}^{(k)}=x_{j}^{*}$, from which it follows that </div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{*})=\Delta f_{j}(x_{j}^{(k)}). \quad (3)$$</div>We also showed that $\delta f_{h}()$ is an increasing function. Since $\tilde{x}_{h}\le x_{h}^{*}$, <br /><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\delta f_{h}(x_{h}^{*})\ge\delta f_{h}(\tilde{x}_{h})=-\Delta f_{h}(x_{h}^{(k)}). \quad (4)$$</div>Combining (4) with (2), we have<br /><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{*})-\Delta f_{h}(x_{h}^{(k)})<0,$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">i.e., </div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{*})<\Delta f_{h}(x_{h}^{(k)}). \quad (5)$$</div><br /><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Combining (3) with (5) yields</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{(k)})<\Delta f_{h}(x_{h}^{(k)})$$</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">which contradicts (1).</div><style type="text/css">p, li { white-space: pre-wrap; } </style><br /><br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-44077388130364045552019-12-26T17:48:00.000-05:002019-12-31T13:05:33.164-05:001-D Cutting Stock with OverlapAn interesting variation of the <a href="https://en.wikipedia.org/wiki/Cutting_stock_problem" target="_blank">1-D cutting stock problem</a> was posed on <a href="https://or.stackexchange.com/questions/3179/minimize-number-of-pieces-required-to-cover-distances-with-overlap/" target="_blank">OR Stack Exchange</a>. Rather than having a specified demand for various size cut pieces, the requirement is to cut pieces to cover a specified number of units of fixed lengths. Moreover, you can use multiple pieces to cover one target unit, but if you do, you have to allow a specified overlap between adjacent pieces.<br /><br />There are two common ways to build an integer programming model for a "typical" 1-D cutting stock problem. One is to use decision variables for how many pieces of each defined length are cut from each piece of stock (along with a variable or variables for how many pieces of stock are used). That approach contains a high degree of symmetry, since it uses an index for each piece of stock that is cut and the solution is invariant to permutations in that index. The other is to define cutting patterns, and use decision variables for how many pieces of stock are cut according to each pattern (along with variables for how many cut pieces result). There's no symmetry issue, but the number of possible patterns blows up pretty quickly. For problems where the number of patterns is too large to comfortably enumerate, Gilmore and Gomory [1] proposed a <a href="https://en.wikipedia.org/wiki/Column_generation" target="_blank">column generation</a> approach (which has since been improved on by others). They start with a small number of patterns and relax the integrality conditions to get a linear program. Each time they solve the LP, they use the dual values in a subproblem to see if they can generate a new pattern that would improve the LP solution. If so, they add the pattern, solve the modified LP, and repeat. When no new pattern emerges, they restore the integrality restrictions, solve a MILP once, and get a good (though not provably optimal) solution. More recently, <a href="https://en.wikipedia.org/wiki/Branch_and_price" target="_blank">branch and price</a> (or branch-price-and-cut as it is sometimes known) has emerged as a method for finding a provably optimal solution using patterns.<br /><br />For the problem at hand, we actually have two places where we must choose between piece counts and symmetry or pattern counts: how we will cut the stock, and how we will piece together covers for the output parts. I'll reproduce here the answer I gave on OR Stack Exchange, which mixes the two methods (patterns for cutting, piece counts for output).<br /><br />Let $L$ be the length of a piece of stock and $D$ the length of a part that needs to be covered. (I'll keep this simple with a single size for stock and a single size item to be covered, but you can easily expand to multiple lengths of either by tacking on subscripts.) $N$ will denote the number of items needing to be covered. $M$ will be the required overlap between adjacent cover pieces.<br /><br />Suppose that we have a set $\mathcal{P}$ of patterns for cutting the stock, and a set $\mathcal{K}$ of possible piece sizes (where $k\in\mathcal{K}\implies k\le \min\{L, D\}$). Our first decision variable will be the number $x_p\in \mathbb{Z}_+$ of units of stock cut according to pattern $p\in\mathcal{P}$. We want to minimize the use of stock. Assuming there is no scrap value for unused cut pieces or trim waste, the objective is just $$\min \sum_{p\in\mathcal{P}} x_p.$$Now let $\pi_{k,p}$ be the number of pieces of length $k\in\mathcal{K}$ we get from one unit of stock cut according to pattern $p\in\mathcal{P}$, and define variable $y_k \ge 0$ to be the total number of pieces of length $k\in\mathcal{K}$ produced by all cutting operations. ($y_k$ will take integer values, but we do not need to restrict it to be integer.) The constraints tying the $y$ variables to production are $$y_k = \sum_{p\in\mathcal{P}} \pi_{k,p}x_p \, \forall k\in\mathcal{K}.$$Finally, let $n=1,\dots,N$ enumerate the items needing to be covered, and let variable $z_{k,n}\in \mathbb{Z}_+$ denote the number of pieces of length $k\in\mathcal{K}$ used to cover item $n\in\{1,\dots,N\}$. To be sure that we do not use pieces we do not have, we add constraints $$\sum_{n=1}^N z_{k,n} \le y_k \,\forall k\in\mathcal{K}.$$Note that this is a "single period" model (we fulfill demand once and that's that). If the cutting and covering operations were to be repeated over time, we would need to add inventory constraints to account for excess cut pieces being carried over to a subsequent period. Finally, we need to be sure that the solution actually covers all the output units, and meets the overlap conditions. We combine both those restrictions into a single set of constraints:$$\sum_{k\in\mathcal{K}} k\cdot z_{k,n} \ge D + M\left(\sum_{k\in\mathcal{K}}z_{k,n} - 1\right) \,\forall n=1,\dots,N.$$By using an inequality there rather than an equation, I snuck in an assumption that excess length can be trimmed at no cost.<br /><br />This formulation uses patterns on the cutting side but not on the covering side. That means the covering side is subject to symmetry concerns. Namely, given any feasible solution the model, if we permute the subscript $n$ we get another feasible solution, nominally different but with an identical cost. For instance, suppose that $D=11$ and $M=1$, and that we have a solution that covers unit 1 with two pieces of length 6 and unit 2 with two pieces of length 5 and one piece of length 3. Change that so that unit 2 gets two pieces of length 6 and unit 1 gets two pieces of length 5 and one of length 3, and you have a "different" solution that really is not different in any meaningful way. There are various ways to reduce or eliminate the symmetry via added constraints. One simple way to reduce (but not eliminate) the symmetry is to require that the units receiving the most pieces have the lowest indices. The added constraints are $$\sum_{k\in\mathcal{K}}z_{k,n} \ge \sum_{k\in\mathcal{K}}z_{k,n+1} \,\forall n=1,\dots,N-1.$$(This still leaves symmetry among covering patterns that use the same number of pieces, such as 5-5-3 versus 5-4-4.)<br /><br />[1] Gilmore, P. C. & Gomory, R. E. A Linear Programming Approach to the Cutting-Stock Problem. <i>Operations Research</i>, <b>1961</b>, 9, 849-859<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0