tag:blogger.com,1999:blog-87813834610619295712020-03-29T04:15:31.930-04:00OR in an OB WorldA mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.comBlogger404125tag:blogger.com,1999:blog-8781383461061929571.post-75593512890719994322020-03-27T21:39:00.000-04:002020-03-27T21:39:07.662-04:00A Minimum Variance Partition ProblemSomeone posed a <a href="https://math.stackexchange.com/questions/3594343/combinatorial-optimization-find-partitions/3597944" target="_blank">question</a> on Mathematics Stack Exchange that I answered there. This post is to explain the model I used (and generalize a bit).<br /><br />In general terms, you start with a collection $\Omega=[\omega_1, \dots, \omega_n]$ of positive integers. I choose to think of the integers as weights. The mission is to partition $\Omega$ into subcollections $S_1, \dots, S_m$ such that (a) you use as few subcollections as possible (i.e., minimize $m$), (b) the variance of their weight totals is minimal, and (c) no subcollection in the partition has total weight greater than some specified constant $L$. (If, for example, $S_1 = [\omega_4, \omega_9, \omega_{10}]$ then the weight total for $S_1$ is $\omega_4 + \omega_9 + \omega_{10}$.) Why am I saying "collections" rather than sets? This would be a set partitioning problem were it not for the fact that $\Omega$ may contain duplicate entries and so is technically a multiset.<br /><br />The term "variance" is slightly ambiguous (population variance or sample variance), but happily that will not matter here. Let $$s_j =\sum_{i : \omega_i \in S_j} \omega_i$$be the weight of subcollection $S_j$. Since we are talking about a partition here (every item in exactly one subcollection), $$\sum_{j=1}^m s_j = \sum_{i=1}^n \omega_i = n \bar{\omega}$$where $\bar{\omega}$ is the (constant) mean of all the item weights, and so the mean of the subcollection weights is $$\bar{s} = \frac{n\bar{\omega}}{m},$$which for a given number $m$ of subcollections is constant. The variance of the subcollection weights is $$\frac{1}{m}\sum_{j=1}^m (s_j - \bar{s})^2=\frac{1}{m}\sum_{j=1}^m s_j^2-\bar{s}^2.$$To minimize that, we can just minimize $\sum_j s_j^2$ in the optimization model, and then do the arithmetic to get the variance afterward.<br /><br />As originally articulated, the problem has two objectives: minimize the variance of the subcollection weights and minimize the number of subcollections used. Normally, a multiobjective problem involves setting priorities or assigning weights to objectives or something along those lines. For the original question, where there are only $n=12$ elements in $\Omega$ (and thus at most $m=12$ sets in the partition), a plausible approach is to find a minimum variance solution for each possible value of $m$ and present the user with the efficient frontier of the solution space, letting the user determine the trade-off between partition size and variance that most appeals to them. For the particular example in the SE question, I was a bit surprised to find that the variance increased monotonically with the number of sets in the partition, as seen in the following table.<br /><center><!--l. 23--><br /><div class="noindent"></div><div class="tabular"><table cellpadding="0" cellspacing="0" class="tabular" id="TBL-1"><colgroup id="TBL-1-1g"><col id="TBL-1-1"></col><col id="TBL-1-2"></col><col id="TBL-1-3"></col></colgroup><tbody><tr id="TBL-1-1-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-1-1" style="text-align: center; white-space: nowrap;"># of Subcollections</td><td class="td11" id="TBL-1-1-2" style="text-align: center; white-space: nowrap;"> </td><td class="td11" id="TBL-1-1-3" style="text-align: center; white-space: nowrap;">Minimum Variance</td></tr><tr class="hline"><td><hr /></td><td><hr /></td><td><hr /></td></tr><tr id="TBL-1-2-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-2-1" style="text-align: center; white-space: nowrap;">1 </td><td class="td11" id="TBL-1-2-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-2-3" style="text-align: center; white-space: nowrap;">infeasible </td></tr><tr id="TBL-1-3-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-3-1" style="text-align: center; white-space: nowrap;">2 </td><td class="td11" id="TBL-1-3-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-3-3" style="text-align: center; white-space: nowrap;">infeasible</td></tr><tr id="TBL-1-4-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-4-1" style="text-align: center; white-space: nowrap;">3 </td><td class="td11" id="TBL-1-4-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-4-3" style="text-align: center; white-space: nowrap;">infeasible </td></tr><tr id="TBL-1-5-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-5-1" style="text-align: center; white-space: nowrap;">4 </td><td class="td11" id="TBL-1-5-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-5-3" style="text-align: center; white-space: nowrap;">infeasible</td></tr><tr id="TBL-1-6-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-6-1" style="text-align: center; white-space: nowrap;">5 </td><td class="td11" id="TBL-1-6-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-6-3" style="text-align: center; white-space: nowrap;">672 </td></tr><tr id="TBL-1-7-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-7-1" style="text-align: center; white-space: nowrap;">6 </td><td class="td11" id="TBL-1-7-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-7-3" style="text-align: center; white-space: nowrap;">1048.22</td></tr><tr id="TBL-1-8-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-8-1" style="text-align: center; white-space: nowrap;">7 </td><td class="td11" id="TBL-1-8-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-8-3" style="text-align: center; white-space: nowrap;">1717.39 </td></tr><tr id="TBL-1-9-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-9-1" style="text-align: center; white-space: nowrap;">8 </td><td class="td11" id="TBL-1-9-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-9-3" style="text-align: center; white-space: nowrap;">3824.75 </td></tr><tr id="TBL-1-10-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-10-1" style="text-align: center; white-space: nowrap;">9 </td><td class="td11" id="TBL-1-10-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-10-3" style="text-align: center; white-space: nowrap;">7473.73 </td></tr><tr id="TBL-1-11-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-11-1" style="text-align: center; white-space: nowrap;">10 </td><td class="td11" id="TBL-1-11-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-11-3" style="text-align: center; white-space: nowrap;">9025.80 </td></tr><tr id="TBL-1-12-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-12-1" style="text-align: center; white-space: nowrap;">11 </td><td class="td11" id="TBL-1-12-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-12-3" style="text-align: center; white-space: nowrap;">9404.81 </td></tr><tr id="TBL-1-13-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-13-1" style="text-align: center; white-space: nowrap;">12 </td><td class="td11" id="TBL-1-13-2" style="text-align: center; white-space: nowrap;"></td><td class="td11" id="TBL-1-13-3" style="text-align: center; white-space: nowrap;">11667.06</td></tr><tr id="TBL-1-14-" style="vertical-align: baseline;"><td class="td11" id="TBL-1-14-1" style="text-align: center; white-space: nowrap;"></td></tr></tbody></table></div></center><br />So the five subcollection partition is minimal both in number of subcollections and weight variance. I would be surprised if that generalized.<br /><br />The QP formulation uses two sets of variables. For $i=1,\dots,n$ and $j=1,\dots,m$, $x_{ij}\in \lbrace 0, 1\rbrace$ determines whether $\omega_i$ belongs to $S_j$ ($x_{ij}=1$) or not ($x_{ij}=0$). For $j=1,\dots,m$, $s_j \ge 0$ is the total weight of subcollection $j$. The objective function is simply $$\min_{x,s} \sum_{j=1}^m s_j^2$$which happily is convex. The first constraint enforces the requirement that the subcollections form a partition (every element being used exactly once): $$\sum_{j=1}^m x_{ij} = 1 \quad\forall i\in \lbrace 1,\dots,n\rbrace.$$For a partition to be valid, we do not want any empty sets, which leads to the following constraint:$$\sum_{i=1}^n x_{ij} \ge 1\quad\forall j\in\lbrace 1,\dots,m\rbrace.$$ The next constraint defines the weight variables: $$s_j = \sum_{i=1}^n \omega_i x_{ij}\quad\forall j\in\lbrace 1,\dots,m\rbrace.$$What about the limit $L$ on the weight of any subcollection? We can enforce that by limiting the domain of the $s$ variables to $s_j\in [0,L]$ for all $j$.<br /><br />Finally, there is symmetry in the model. One source is duplicate items in $\Omega$ ($\omega_i = \omega_{i'}$ for some $i\neq i'$). That happens in the data of the SE question, but it's probably not worth dealing with in an example that small. Another source of symmetry is that, given any partition, you get another partition with identical variance by permuting the indices of the subcollections. That one is easily fixed by requiring the subcollections to be indexed in ascending (or at least nondecreasing) weight order:$$s_j \le s_{j+1}\quad\forall j\in\lbrace 1,\dots,m-1\rbrace.$$Is it worth bothering with? With the anti-symmetry constraint included, total run time for all 12 models was between one and two seconds on my PC. Without the anti-symmetry constraint, the same solutions for the same 12 models were found in a bit over 100 minutes.<br /><br />If you would like to see my Java code (which requires CPLEX), you can find it at <a href="https://gitlab.msu.edu/orobworld/mvpartition">https://gitlab.msu.edu/orobworld/mvpartition</a>. <br /><div style="clear: both;"></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-16998495410879108602020-03-24T13:29:00.000-04:002020-03-24T13:40:46.818-04:00Approximating Nonlinear Functions: Tangents v. Secants<div class="separator" style="clear: both; text-align: center;"><br /></div>In a (much) earlier post [1], I discussed using SOS2 constraints to approximate a nonlinear function in a mixed-integer linear program (MILP). In this post, and probably the next one or two, I'm going to expand on that theme a bit. (By "a bit" I mean "endlessly".)<br /><br /><h3>Summary</h3><br />To save you some reading, in case you end up not interested, I'll summarize the take-aways here.<br /><ol><li>We can approximate nonlinear functions in a MILP model using piecewise-linear functions based on either tangents or secants. Piecewise-linear functions implicitly add either binary variables or SOS2 constraints (or maybe both).</li><li>For convex functions only appearing in $\le$ constraints or concave functions only appearing in $\ge$ constraints, we can say that using tangents may produce superoptimal solutions (look optimal but are actually infeasible) but will not produce suboptimal solutions, while using secants may produce suboptimal solutions but will not produce infeasible solutions. In any other case, super- and sub-optimality are both risks.</li><li>For the same two combinations in the previous paragraph, we have a third option: using a set of linear inequalities based on tangents, rather than a piecewise-linear function. Besides being somewhat easier to set up, this has a potential advantage: using a callback, we can refine the approximation on the fly (by adding additional constraints based on new tangents as they are needed).</li></ol><h3>Gory details</h3><br />Suppose that we are working on a MILP, and we need to incorporate a nonlinear function of some of the variables. To keep things simple, I'll illustrate with functions $f:\mathbb{R}\rightarrow\mathbb{R}$ of a single variable, but the concepts presented will extend (not necessarily easily) to functions of multiple variables. In what follows, I will refer to the set of points $(x,f(x))$ as the <i>surface</i> of $f$. A <i>tangent</i> to the surface at $x$ is a line (more generally, hyperplane) that touches the surface at $x$ and has the same slope as the surface at that one point. A <i>secant</i> to the surface is a line (more generally, hyperplane) that intersects the surface at multiple points. For a function of one dimension, a <i>chord</i> is a segment of a secant between two consecutive intersection points. ("Consecutive" means that there are no other intersections of secant and surface between the endpoints of the chord.) I'm not sure if the term "chord" is used in higher dimensions, although the concept extends to higher dimensions.<br /><br />Figure 1 shows a convex function together with a few tangents (in blue).<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><br /><br /><a href="https://1.bp.blogspot.com/-svjzglyWeeo/XnlBefThK0I/AAAAAAAADJo/ilR4XEWBNzMI0xCwDEjjJrcOz5ol8nZGQCLcBGAsYHQ/s1600/cvx_tan.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="330" data-original-width="534" height="245" src="https://1.bp.blogspot.com/-svjzglyWeeo/XnlBefThK0I/AAAAAAAADJo/ilR4XEWBNzMI0xCwDEjjJrcOz5ol8nZGQCLcBGAsYHQ/s400/cvx_tan.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 1: Tangents to convex function<br /><style type="text/css">/* Layout-provided Styles */ div.standard { margin-bottom: 2ex; }</style></td></tr></tbody></table><div class="composeBoxWrapper K3JSBVB-R-h"></div>Figure 2 shows the same function with a few secants (in green). One of the secants intersects the surface at $(-5,15)$ and $(0,0)$. The dashed line segment between those points is a chord.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-1ydPrdHjx70/XnlBeJwlBxI/AAAAAAAADJ8/b0np8lLLMhQHytme3guYTrc8JC0oth1pACEwYBhgL/s1600/cvx_sec.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="330" data-original-width="534" height="246" src="https://1.bp.blogspot.com/-1ydPrdHjx70/XnlBeJwlBxI/AAAAAAAADJ8/b0np8lLLMhQHytme3guYTrc8JC0oth1pACEwYBhgL/s400/cvx_sec.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2: Secants to convex function</td></tr></tbody></table>Figures 3 and 4 do the same for a concave function.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-mJSfnecoWWc/XnlBdRIp5jI/AAAAAAAADJ4/8MgXCIxjqrAW9L5XQEzpWZbSuvwd1_OMwCEwYBhgL/s1600/concave_tan.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="330" data-original-width="534" height="246" src="https://1.bp.blogspot.com/-mJSfnecoWWc/XnlBdRIp5jI/AAAAAAAADJ4/8MgXCIxjqrAW9L5XQEzpWZbSuvwd1_OMwCEwYBhgL/s400/concave_tan.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 3: Tangents to concave function</td><td class="tr-caption" style="text-align: center;"><br /></td><td class="tr-caption" style="text-align: center;"><br /></td></tr></tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-hGFKn5oNcxE/XnlBdQfyEjI/AAAAAAAADKI/kT-Eccpm1pMR3AoGP2Suu7CHEQoheD_CgCEwYBhgL/s1600/concave_sec.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="330" data-original-width="534" height="246" src="https://1.bp.blogspot.com/-hGFKn5oNcxE/XnlBdQfyEjI/AAAAAAAADKI/kT-Eccpm1pMR3AoGP2Suu7CHEQoheD_CgCEwYBhgL/s400/concave_sec.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 4: Secants to concave function</td></tr></tbody></table><br />A useful property of convex and concave functions is that tangents of convex functions are always less than or equal to the functions and tangents of concave functions are always greater than or equal to the functions. Similar claims cannot be made for secants. A chord of a convex function will always be greater than or equal to the function, but other points on its secant can be less than the function. For instance, in Figure 2 the chord from $x=-5$ to $x=0$ lies above the function surface, but the secant to which it belongs falls below the function surface for $x<-5$ or $x>0$. It is worth noting that tangents to nonconvex functions do not stay on one side of the surface. Figure 5 shows a nonconvex function along with one tangent (in blue) and one secant (in green). Both the tangent and the secant are above the function surface some places and below it other places.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-b3Oqq_zeS0U/XnlBfa24SsI/AAAAAAAADKA/GinoU18xBZkyij_UTR-22D4OInr6TzCQgCEwYBhgL/s1600/nonconvex.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="482" data-original-width="594" height="323" src="https://1.bp.blogspot.com/-b3Oqq_zeS0U/XnlBfa24SsI/AAAAAAAADKA/GinoU18xBZkyij_UTR-22D4OInr6TzCQgCEwYBhgL/s400/nonconvex.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 5: Nonconvex function</td></tr></tbody></table><br />In what follows, when I talk about convexity of the feasible region, I will be referring to the feasible region of the continuous relaxation (all integrality restrictions relaxed). Convexity of that region is critical to algorithms using LP bounds. There are two cases of initial interest, because they preserve that convexity. If the function $f()$ is convex and appears only in constraints of the form $f(x)+\dots\le b$ (where $\dots$ is empty or contains only convex terms), or is concave and appears only in constraints of the form $f(x)+\dots\ge b$ (where $\dots$ is empty or contains only concave terms), and if the other constraints do nothing to violate convexity, then the feasible region remains convex when $f()$ is introduced. Under any other scenario, you lose convexity of the feasible region when you introduce $f()$. So for now we will only consider convex $f()$ in $\le$ constraints or concave $f()$ in $\ge$ constraints. If $f()$ is convex or concave but appears in an inequality constraint with the wrong direction, or appears in an equation constraint, or if $f()$ is neither convex nor concave, we will deal with it later (or never, whichever comes second).<br /><br />For simplicity, suppose that $f()$ is convex. We introduce a new variable $z$ that is a surrogate for $f(x)$, changing each constraint of the form $f(x)+\dots\le b$ to $z+\dots\le b$. Now we have to tie $z$ to $f(x)$, but without using the explicit constraint $z=f(x)$ (due to that whole nonlinearity thing). We can approximate the relationship either with tangents or with secants. With secants, we pick an arbitrary collection of values of $x$, calculate the corresponding values of $f(x)$, define a piecewise-linear function $\hat{f}(x)$ based on those values, and set $z=\hat{f}(x)$. In Figure 2, $\hat{f}(x)$ consists of the four chords shown (with endpoints at $x\in\left\{ -10,-5,0,5,10\right\} $).<br /><br />With tangents, we have a choice. Both options involve computing tangents at an arbitrary collection of values of $x$. In one version, we find where those tangents intersect and use the intersection points to define a piecewise-linear function. In Figure 1, the five tangents shown intersect at $\left\{ (-7.5,35),(-2.5,-5),(2.5,5),(7.5,65)\right\} $. We can add endpoints for where the first and last tangents exit the interval of interest (say $(-10,80)$ and $(10,120)$ for Figure 1). In the other version, we add the constraints $z\ge\ell_{j}(x)$ for $j=1,\dots,t$, where $t$ is the number of tangents computed and $\ell_{1}(x),\dots,\ell_{t}(x)$ are the linear functions corresponding to the tangents. In Figure 1, $t=5$ and for any $x$ we are constraining $z$ to be greater than or equal to the ordinates of all five dotted lines at that $x$. If $f()$ is concave and appears in the constraint $f(x)\ge b$, the only change is that the added constraints are $z\le\ell_{j}(x)$.<br /><br />At this point, there are several things we can note.<br /><ol><li>Calculating chords only requires the evaluation of $f(x)$ for various values of $x$ (plus some simple algebra). Calculating tangents requires computing the first derivative of $f(x)$ when $f()$ is smooth. (More generally, it involves calculating the subgradient of $f()$ at $x$.) It then requires calculating where the tangents intersect, if we are going to make a piecewise-linear function from the tangents. So if $f()$ is painful to work with, or if sorting out a lot of intersections does not appeal to you, you might prefer chords.</li><li>Chords involve embedding a piecewise-linear function in a model. As discussed in [1], this can be done using type two special ordered sets (SOS2), which turns an LP into a MIP and turns a MIP into a somewhat more complicated MIP. In another post [2], I discussed a handy feature in the CPLEX APIs that allows for easy entry of a piecewise-linear function. (In the Java API, the method is <span style="font-family: "courier new" , "courier" , monospace;">IloCplexModeler.piecewiseLinear()</span>, which in turn implements a method from the <span style="font-family: "courier new" , "courier" , monospace;">IloMPModeler</span> interface.) The API method still introduces either binary variables or SOS2 constraints "under the hood", but at least the user is spared the tedious business of computing them. With tangents, the first option described does the same thing, but the second option just involves adding a bunch of linear constraints. (In CPLEX, if you calculate a lot of tangents, you might consider adding the constraints as "lazy constraints".)</li><li>The way the approximations affect the solution differs between tangents and secants. <ol><li>In Figures 1 and 3, it is easy to see that if we replace $f(x)$ with a finite set of tangents, every feasible point remains feasible, but some infeasible points appear feasible. For instance, take the constraint $f(x)\le38$ (which becomes $z\le38$) with the function in Figure 1, redrawn below in Figure 6. The horizontal dotted line (in red) shows the limit (38) imposed by the constraint. The vertical dotted line (in cyan) shows what happens when $x=-7.5$. The black point, $(-7.5,f(-7.5))$ is above the red dotted line, indicating that $x=-7.5$ is infeasible. The blue point, $(-7.5,35)$, is however feasible in the relaxed problem, since it is on or above all the selected tangents and below the constraint limit. So the solver might accept a solution containing $x=-7.5$.</li><li>In Figures 2 and 4, it is likewise easy to see that if we replace $f(x)$ with a piecewise-linear function consisting of chords, solutions that are feasible in the relaxed problem will also be feasible in the original problem, but "optimal" solutions to the relaxed problem may actually be suboptimal. The secant approximation to our convex function in Figure 2 is redrawn below in Figure 7. Staying with the constraint $z\le38$, consider what happens at $x=-7$ (the dotted vertical line). The actual function value $f(-7)=35$ is below the constraint limit, so $x=-7$ satisfies the constraint. (This is the black point on the graph.) Barring exclusion due to other constraints, it would be feasible and might be part of an optimal solution. The green point $(-7,41)$, on the piecewise-linear function defined by the secants, is not feasible, however, and so the solver will reject any solution containing $x=-7$, including possibly the true optimum.</li></ol></li></ol><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-htEGTesdKno/XnlBe7gCBbI/AAAAAAAADKE/gQhO0BGCt2MPbqhpyWGCZtJ5eaq8IaKXACEwYBhgL/s1600/infeasible.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="482" data-original-width="594" height="323" src="https://1.bp.blogspot.com/-htEGTesdKno/XnlBe7gCBbI/AAAAAAAADKE/gQhO0BGCt2MPbqhpyWGCZtJ5eaq8IaKXACEwYBhgL/s400/infeasible.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 6: Infeasible point</td></tr></tbody></table><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-gyHqTxo7cnY/XnlBfiyORcI/AAAAAAAADKM/opvI9qL0uk4XR1xZNjoPPYYOGDsdK3mRQCEwYBhgL/s1600/suboptimal.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="482" data-original-width="594" height="323" src="https://1.bp.blogspot.com/-gyHqTxo7cnY/XnlBfiyORcI/AAAAAAAADKM/opvI9qL0uk4XR1xZNjoPPYYOGDsdK3mRQCEwYBhgL/s400/suboptimal.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 7: Feasible point excluded</td></tr></tbody></table><br />Since tangents may produce infeasible "solutions" and secants may produce suboptimal solutions, we may need to solve the problem, get a sense of where the optimal solution might be located, refine the approximation near there and repeat. Note that if we use tangents for convex or concave functions (which produces an "outer approximation" to the original problem) and the final solution is really feasible (using the actual $f()$), there is no need to refine further. Unfortunately, with secants, I see no obvious way to know that our approximation is "good enough". With functions that are neither convex nor concave, or with convex/concave functions in equations or inequalities going the "wrong" direction, tangents no longer provide an outer approximation (Figure 5), and we may need to refine and repeat.<br /><br />This brings me to my last point, which pertains to the cases where tangents do provide an outer approximation and where we add all the tangents as constraints rather than turning them into a piecewise-linear function. With solvers, like CPLEX, that allow users to post new constraints during the branch-and-bound process, we can refine on the fly. Referring back to Figure 6, suppose that I am solving the problem using CPLEX, with a lazy constraint callback attached, and CPLEX identifies a solution containing $x=-7.5$. Inside the callback, I can compute $f(-7.5)$, observe that it is greater than the constraint cutoff of $38$ (and also greater than $z=35$), calculate a new tangent at the black point, and add it as a lazy constraint. This will make the proposed new incumbent infeasible. Figure 8 shows this, with the orange dashed line being the tangent at $x=-7.5$. After adding it to our set of tangents (and adding the constraint $z\ge\ell(x)$, where $\ell(x)$ is the linear function given by the new tangent), the blue point no longer gives a possible value for $z$. At $x=-7.5$, the maximum value of any tangent will be the value of $\ell(-7.5)$, and that will equal $f(-7.5)$ since we calculated the tangent there. (It's hard to see in the figure, but trust me, the orange line goes through the black point.) So the solver will update $z$ to the correct value and, in this case, reject the solution as violating the constraint $z\le38$.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-PoNf6RW78_0/XnlBdZe_NhI/AAAAAAAADKI/4HUIrrontW0gD65JppRwcjcTVWKhzUgWwCEwYBhgL/s1600/addcut.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="482" data-original-width="594" height="323" src="https://1.bp.blogspot.com/-PoNf6RW78_0/XnlBdZe_NhI/AAAAAAAADKI/4HUIrrontW0gD65JppRwcjcTVWKhzUgWwCEwYBhgL/s400/addcut.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 8: Adding a tangent</td></tr></tbody></table><br /><h3>References</h3>[1] "<a href="https://orinanobworld.blogspot.com/2014/11/linearize-that.html" target="_blank">Linearize That!</a>"<br />[2] "<a href="https://orinanobworld.blogspot.com/2012/07/piecewise-linear-functions-in-cplex.html" target="_blank">Piecewise Linear Functions in CPLEX</a>"Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-37127926527169686442020-02-17T19:53:00.003-05:002020-02-18T14:26:05.926-05:00Reversing DifferencesFellow blogger <a href="http://hakank.org/" target="_blank">Håkan Kjellerstrand</a> posted an <a href="https://or.stackexchange.com/questions/3349/restoring-a-list-from-differences" target="_blank">interesting question</a> on OR Stack Exchange recently. Starting from a list of integers, it is trivial to compute the list of all pairwise absolute differences, but what about going in the other direction? Given the pairwise (absolute) differences, with duplicates removed, can you recover the source list (or a source list)?<br /><br />We can view the source "list" as a vector $x\in\mathbb{Z}^n$ for some dimension $n$ (equal to the length of the list). With duplicates removed, we can view the differences as a set $D\subset \mathbb{Z}_+$. So the question has to do with recovering $x$ from $D$. Our first observation kills any chance of recovering the original list with certainty:<br /><blockquote class="tr_bq">If $x$ produces difference set $D$, then for any $t\in\mathbb{R}$ the vector $x+t\cdot (1,\dots,1)'$ produces the same set $D$ of differences.</blockquote>Translating all components of $x$ by a constant amount has no effect on the differences. So there will be an infinite number of solutions for a given difference set $D$. A reasonable approach (proposed by Håkan in his question) is to look for the shortest possible list, i.e., minimize $n$.<br /><br />Next, observe that $0\in D$ if and only if two components of $x$ are identical. If $0\notin D$, we can assume that all components of $x$ are distinct. If $0\in D$, we can solve the problem for $D\backslash\{0\}$ and then duplicate any one component of the resulting vector $x$ to get a minimum dimension solution to the original problem.<br /><br />Combining the assumption that $0\notin D$ and the observation about adding a constant having no effect, we can assume that the minimum element of $x$ is 1. That in turn implies that the maximum element of $x$ is $1+m$ where $m=\max(D)$.<br /><br />From there, Håkan went on to solve a test problem using constraint programming (CP). Although I'm inclined to suspect that CP will be more efficient in general than an integer programming (IP) model, I went ahead and solved his test problem via an IP model (coded in Java and solved using CPLEX 12.10). CPLEX's solution pool feature found the same four solutions to Håkan's example that he did, in under 100 ms. How well the IP method scales is an open question, but it certainly works for modest size problems.<br /><br />The IP model uses binary variables $z_1, \dots, z_{m+1}$ to decide which of the integers $1,\dots,m+1$ are included in the solution $x$. It also uses variables $w_{ij}\in [0,1]$ for all $i,j\in \{1,\dots,m+1\}$ such that $i \lt j$. The intent is that $w_{ij}=1$ if both $i$ and $j$ are included in the solution, and $w_{ij} = 0$ otherwise. We could declare the $w_{ij}$ to be binary, but we do not need to; constraints will force them to be $0$ or $1$.<br /><br />The full IP model is as follows:<br /><br /><div class="scroll">\[ \begin{array}{lrlrc} \min & \sum_{i=1}^{m+1}z_{i} & & & (1)\\ \textrm{s.t.} & w_{i,j} & \le z_{i} & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (2)\\ & w_{i,j} & \le z_{j} & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (3)\\ & w_{i,j} & \ge z_{i}+z_{j}-1 & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (4)\\ & w_{i,j} & =0 & \forall i,j\in\left\{ 1,\dots,m+1\right\} \textrm{ s.t. }(j-i)\notin D & (5)\\ & \sum_{i,j\in\left\{ 1,\dots,m+1\right\} |j-i=d}w_{i,j} & \ge 1 & \forall d\in D & (6)\\ & z_{1} & = 1 & & (7) \end{array} \]</div><br />The objective (1) minimizes the number of integers used. Constraints (2) through (4) enforce the rule that $w_{ij}=1$ if and only if both $z_i$ and $z_j$ are $1$ (i.e., if and only if both $i$ and $j$ are included in the solution). Constraint (5) precludes the inclusion of any pair $i < j$ whose difference $j - i$ is not in $D$, while constraint (6) says that for each difference $d \in D$ we must include at least one pair $i < j$ for that produces that difference ($j - i = d$). Finally, since we assumed that our solution starts with minimum value $1$, constraint (7) ensures that $1$ is in the solution. (This constraint is redundant, but appears to help the solver a little, although I can't be sure given the short run times.)<br /><br />My <a href="https://gitlab.msu.edu/orobworld/differences" target="_blank">Java code</a> is available from my repository (bring your own CPLEX). Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com5tag:blogger.com,1999:blog-8781383461061929571.post-34466774975110152362020-02-11T15:37:00.000-05:002020-02-11T15:37:06.374-05:00Collections of CPLEX VariablesRecently, someone asked for help online regarding an optimization model they were building using the CPLEX Java API. The underlying problem had some sort of network structure with $N$ nodes, and a dynamic aspect (something going on in each of $T$ periods, relating to arc flows I think). Forget about solving the problem: the program was running out of memory and dying while building the model.<br /><br />A major issue was that they allocated two $N\times N\times T$ arrays of variables, and $N$ and $T$ were big enough that $2N^2T$ was, to use a technical term, ginormous. Fortunately, the network was fairly sparse, and possibly not every time period was relevant for every arc. So by creating only the IloNumVar instances they needed (meaning only for arcs that actual exist in time periods that were actually relevant), they were able to get the model to build.<br /><br />That's the motivation for today's post. We have a tendency to write mathematical models using vectors or matrices of variables. So, for instance, $x_i \, (i=1,\dots,n)$ might be an inventory level at each of $n$ locations, or $y_{i,j} \, (i=1,\dots,m; j=1,\dots,n)$ might be the inventory of item $i$ at location $j$. It's a natural way of expressing things mathematically. Not coincidentally, I think, CPLEX APIs provide structures for storing vectors or matrices of variables and for passing them into or out of functions. That makes it easy to fall into the trap of thinking that variables <i>must be</i> organized into vectors or matrices.<br /><br />Last year I did a post ("<a href="https://orinanobworld.blogspot.com/2019/07/using-java-collections-with-cplex.html">Using Java Collections with CPLEX</a>") about using what Java calls "collections" to manage CPLEX variables. This is not unique to Java. I know that C++ has similar memory structures, and I think they exist in other languages you might use with CPLEX. The solution to the memory issue I mentioned at the start was to create a Java container class for each combination of an arc that actually exists and time epoch for which it would have a variable, and then associate instances of that class with CPLEX variables. So if we call the new class <span style="font-family: "Courier New", Courier, monospace;">AT</span> (my shorthand for "arc-time"), I suggested the model owner use a <span style="font-family: "Courier New", Courier, monospace;">Map<AT, IloNumVar></span> to associate each arc-time combination with the variable representing it and a <span style="font-family: "Courier New", Courier, monospace;">Map<IloNumVar, AT></span> to hold the reverse association. The particular type of map is mostly a matter of taste. (I generally use <span style="font-family: "Courier New", Courier, monospace;">HashMap</span>.) During model building, they would create only the <span style="font-family: "Courier New", Courier, monospace;">AT</span> instances they actually need, then create a variable for each and pair them up in the first map. When getting a solution from CPLEX, they would get a value for each variable and then use the second map to figure out for which arc and time that value applied. (As a side note, if you use maps and then need the variables in vector form, you can apply the <span style="font-family: "Courier New", Courier, monospace;">values()</span> method to the first map (or the <span style="font-family: "Courier New", Courier, monospace;">getKeySet()</span> method to the second one), and then apply the <span style="font-family: "Courier New", Courier, monospace;">toArray()</span> method to that collection.)<br /><br />Now you can certainly get a valid model using just arrays of variables, which was all that was available to me back in the Dark Ages when I used FORTRAN, but I think there are some benefits to using collections. Using arrays requires you to develop an indexing scheme for your variables. The indexing scheme tells you that the flow from node 3 to node 7 at time 4 will be occupy slot 17 in the master variable vector. Here are my reasons for avoiding that. <br /><ul><li>Done correctly, the indexing scheme is, in my opinion, a pain in the butt to manage. Finding the index for a particular variable while writing the code is time-consuming and has been known to kill brain cells.</li><li>It is easy to make mistakes while programming (calculate an index incorrectly).</li><li>Indexing invites the error of declaring an array or vector with one entry for each combination of component indices (that $N\times N\times T$ matrix above), without regard to whether you need all those slots. Doing so wastes time and space, and the space, as we saw, may be precious.</li><li>Creating slots that you do not need can lead to execution errors. Suppose that I allocating a vector <span style="font-family: "Courier New", Courier, monospace;">IloNumVar x = new IloNumVar[20]</span> and use 18 slots, omitting slots 0 and 13. If I solve the model and then call <span style="font-family: "Courier New", Courier, monospace;">getValues(x)</span><span style="font-family: "Courier New", Courier, monospace;"></span>, CPLEX will throw an exception, because I am asking for values of two variables (<span style="font-family: "Courier New", Courier, monospace;">x[0]</span> and <span style="font-family: "Courier New", Courier, monospace;">x[13]</span>) that do not exist. Even if I create variables for those two slots, the exception will occur, because those two variables will not belong to the model being solved. (There is a way to force CPLEX to include those variables in the model without using them, but it's one more pain in the butt to deal with.) I've lost count of how many times I've seen messages on the CPLEX help forums about exceptions that boiled down to "unextracted variables".</li></ul>So my advice is to embrace collections when building models where variables do not have an obvious index scheme (with no skips).Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-73003561398342538642020-01-30T14:16:00.000-05:002020-01-30T14:20:13.728-05:00Generic Callback Changes in CPLEX 12.10CPLEX 12.10 is out, and there have been a few changes to the new(ish) generic callbacks. Rather than go into them in detail (and likely screw something up), I'll just point you to the <a href="https://ibm.ent.box.com/s/t6dq7kvljjua6xd3qj2811h5bcufdw9n" target="_blank">slides for a presentation</a> by Daniel Junglas of IBM at the 2019 INFORMS Annual Meeting.<br /><br />I've written about half a dozen posts about generic callbacks since IBM introduced them (which you can find by typing "generic callback" in the search widget on the blog). A couple of things have been added recently, and I thought I would mention them. The generic callback approach uses a single callback function that can be called from a variety of contexts, including when CPLEX solves a node relaxation ("RELAXATION" context), when if finds a candidate solution ("CANDIDATE" context) and, now, when it is ready to split a node into children ("BRANCHING" context).<br /><br />The branching context is one of the new features. It brings back most of the functionality of the branch callback in the legacy callback system. Unfortunately, it does not seem to have the ability to attach user information to the child nodes, which was a feature that was occasionally useful in the legacy system. You can get more or less equivalent functionality by creating a data store (array, map, whatever) in your global memory and storing the node information keyed by the unique index number of each child node. The catch is that you are now responsible for memory management (freeing up space when a node is pruned and the associated information is no longer needed), and for dealing with thread synchronization issues.<br /><br />Another new feature is that you can now inject a heuristic solution (if you have one) from all three of the contexts I mentioned above. CPLEX gives you a variety of options for how it will handle the injected solution: "NoCheck" (CPLEX will trust you that it is feasible); "CheckFeasible" (CPLEX will check feasibility and ignore the solution if it is not feasible); "Propagate" (Daniel's explanation: CPLEX will "propagate fixed variables and accept if feasible"); and "Solve" (CPLEX will solve a MIP problem with fixed variables and accept the result if feasible). I assume the latter two mean that you provide a partial solution, fixing some variables but not others. (Unfortunately I was unable to make it to Daniel's talk, so I'm speculating here.)<br /><br />I'm not sure if those are the only new features, but they are the ones that are most relevant to me. I invite you to read through Daniel's slides to get a more complete picture, including both the reasons for switching from legacy callbacks to generic callbacks and some of the technical issues in using them.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-56340525588829501192020-01-07T14:37:00.000-05:002020-01-07T14:37:57.835-05:00Greedy Methods Can Be ExactWe generally sort optimization algorithms (as opposed to models) into two or three categories, based on how certain we are that solutions will be either optimal or at least "good". An <a href="https://or.stackexchange.com/a/936/67" target="_blank">answer by Michael Feldmeier</a> to a question I posted on OR Stack Exchange neatly summarizes the categories:<br /><ul><li><i>exact</i> methods eventually cough up provably optimal solutions;</li><li><i>approximate</i> methods eventually cough up solutions with some (meaningful) guarantee regarding how far from optimal they might be; and</li><li><i>heuristics</i> provide no worst-case guarantees (but generally are either easy to implement, fast to execute or both).</li></ul>I should explain my use of "meaningful" (which is not part of Michael's answer). A common way to estimate the "gap" between a solution and the optimum is to take $|z - \tilde{z}|/|z|$, where $z$ is the objective value of the solution produced by the algorithm and $\tilde{z}$ is some bound (lower bound in a minimization, upper bound in a maximization) of the optimal solution. Now suppose that we are minimizing a function known to be nonnegative. If we set $\tilde{s}=0$, we know that any method, no matter how stupid, will have a gap no worse than 100%. To me, that is not a meaningful guarantee. So I'll leave the definition of "meaningful" to the reader.<br /><br />What brings all this to mind is a <a href="https://math.stackexchange.com/q/3478711/458896" target="_blank">question</a> posted on Mathematics Stack Exchange. The author of the question was trying to solve a nonlinear integer program. He approached it by applying a "<a href="https://en.wikipedia.org/wiki/Greedy_algorithm" target="_blank">greedy algorithm</a>". Greedy algorithms are generally assumed to be heuristics, since it seldom is possible to provide useful guarantees on performance. In his case, though, the greedy algorithm is provably optimal, mainly due to the objective function being concave and separable. I'll state the problem and show a proof of optimality below (changing the original notation a bit). Brace yourself: the proof is a bit long-winded.<br /><br /><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">You start with $N$ workers to be assigned to $M$ work stations. The output of workstation $m$, as a function of the number of workers $x$ assigned to it, is given by $$f_{m}(x)=a_{m}x+b_{m}-\frac{c_{m}}{x},$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">where $a_{m},b_{m},c_{m}$ are all positive constants. Since $f(0)=-\infty$, we can assume that each work station gets at least one worker (and, consequently, that $N>M$). Since $f_{m}'(x)=a_{m}+c_{m}/x^{2}>0$, each $f_{m}()$ is monotonically increasing. Thus, we can safely assume that all $N$ workers will be assigned somewhere. $f_{m}''(x)=-2c_{m}/x^{3}<0$, so $f_{m}()$ is strictly concave (which we will need later). We also note, for future reference, that the impact of adding one worker to a current staff of $x$ at station $m$ is $$\Delta f_{m}(x)=a_{m}+\frac{c_{m}}{x(x+1)}>0.$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Similarly, the impact of removing one worker at station $m$ is $$\delta f_{m}(x)=-a_{m}-\frac{c_{m}}{x(x-1)}<0.$$We see that $\delta f_{m}(x)$ is an increasing function of $x$ (i.e., it gets less negative as $x$ gets bigger). We also note that $\Delta f_{m}(x)=-\delta f_{m}(x+1)$.</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The IP model is easy to state. Let $x_{m}$ be the number of workers assigned to work station $m$. The model is</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\max\sum_{m=1}^{M}f_{m}(x_{m})$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">subject to</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\sum_{m=1}^{M}x_{m}\le N$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">with </div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$x\in\mathbb{Z}_{+}^{M}.$$</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The greedy algorithm starts with a single worker at each station ($x=(1,\dots,1)$) and, at each step, adds one worker to the workstation where that worker produces the greatest increase in objective value (breaking ties arbitrarily). It stops when all $N$ workers are assigned. To prove that it actually finds an optimal solution, I'll use proof by contradiction.</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Let $x^{(0)},x^{(1)},\dots,x^{(N-M)}$ be the sequence of solutions constructed by the greedy algorithm, with $x^{(0)}=(1,\dots,1)$, and let $x^{(k)}$ be the last solution in the sequence for which an optimal solution $x^{*}$ exists such that $x^{(k)}\le x^{*}$. The significance of the inequality is that if $x\le x^{*}$, it is possible to extend the partial solution $x$ to the optimal solution $x^{*}$ by adding unassigned workers to work stations. We know that $k$ is well defined because $x^{(0)}\le x^{*}$ for any optimal $x^{*}$. Since we are assuming that the greedy algorithm does not find an optimum, it must be that $k<N-M$.</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Now identify the work station $j$ to which the greedy algorithm added a worker at step $k$, meaning that $x_{j}^{(k+1)}=x_{j}^{(k)}+1$ and $x_{i}^{(k+1)}=x_{i}^{(k)}$ for $i\neq j$. Since, by assumption, $x^{(k)}\le x^{*}$ but $x^{(k+1)}\not\le x^{*}$, it must be that $x_{j}^{(k)}=x_{j}^{*}$.</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Next, since $x^{(k)}\le x^{*}$ and $x^{(k)}\neq x^{*}$ (else $x^{(k)}$ would be optimal), there is some work station $h\neq j$ such that $x_{h}^{(k)}<x_{h}^{*}$. Let $\tilde{x}$ be the solution obtained from $x^{(k)}$ by adding a worker to station $h$: $\tilde{x}_{h}=x_{h}^{(k)}+1$ and $\tilde{x}_{i}=x_{i}^{(k)}$ for $i\neq h$. Observe that $\tilde{x}\le x^{*}$. The greedy algorithm chose work station $j$ over work station $h$ at $x^{(k)}$, so it must be that</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{(k)})\ge\Delta f_{h}(x_{h}^{(k)}). \quad (1)$$</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><br /></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Finally, let $\hat{x}$ be the result of starting from optimal solution $x^{*}$ and shifting one worker from station $h$ to station $j$. Since </div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$x_{j}^{(k+1)}=x_{j}^{(k)}+1=x_{j}^{*}+1=\hat{x}_{j},$$ $$x_{h}^{(k+1)}=x_{h}^{(k)}<x_{h}^{*}\implies x_{h}^{(k+1)}\le\hat{x}_{h}$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">and </div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$x_{i}^{(k+1)}=x_{i}^{(k)}\le x_{i}^{*}=\hat{x}_{i}\,\forall i\notin\{h,j\},$$</div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">we have $x^{(k+1)}\le\hat{x}$. Under the assumption that $x^{(k)}$ was the last solution in the greedy sequence that could be extended to an optimal solution, it must be that $\hat{x}$ is not optimal. Thus the net change to the objective function at $x^{*}$ when shifting one worker from station $h$ to station $j$ must be negative, i.e.,</div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{*})+\delta f_{h}(x_{h}^{*})<0.\quad (2)$$</div><br /><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">We showed previously that, under our assumptions, $x_{j}^{(k)}=x_{j}^{*}$, from which it follows that </div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{*})=\Delta f_{j}(x_{j}^{(k)}). \quad (3)$$</div>We also showed that $\delta f_{h}()$ is an increasing function. Since $\tilde{x}_{h}\le x_{h}^{*}$, <br /><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\delta f_{h}(x_{h}^{*})\ge\delta f_{h}(\tilde{x}_{h})=-\Delta f_{h}(x_{h}^{(k)}). \quad (4)$$</div>Combining (4) with (2), we have<br /><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{*})-\Delta f_{h}(x_{h}^{(k)})<0,$$</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">i.e., </div><div style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{*})<\Delta f_{h}(x_{h}^{(k)}). \quad (5)$$</div><br /><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Combining (3) with (5) yields</div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">$$\Delta f_{j}(x_{j}^{(k)})<\Delta f_{h}(x_{h}^{(k)})$$</div><div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"></div><div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">which contradicts (1).</div><style type="text/css">p, li { white-space: pre-wrap; } </style><br /><br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-44077388130364045552019-12-26T17:48:00.000-05:002019-12-31T13:05:33.164-05:001-D Cutting Stock with OverlapAn interesting variation of the <a href="https://en.wikipedia.org/wiki/Cutting_stock_problem" target="_blank">1-D cutting stock problem</a> was posed on <a href="https://or.stackexchange.com/questions/3179/minimize-number-of-pieces-required-to-cover-distances-with-overlap/" target="_blank">OR Stack Exchange</a>. Rather than having a specified demand for various size cut pieces, the requirement is to cut pieces to cover a specified number of units of fixed lengths. Moreover, you can use multiple pieces to cover one target unit, but if you do, you have to allow a specified overlap between adjacent pieces.<br /><br />There are two common ways to build an integer programming model for a "typical" 1-D cutting stock problem. One is to use decision variables for how many pieces of each defined length are cut from each piece of stock (along with a variable or variables for how many pieces of stock are used). That approach contains a high degree of symmetry, since it uses an index for each piece of stock that is cut and the solution is invariant to permutations in that index. The other is to define cutting patterns, and use decision variables for how many pieces of stock are cut according to each pattern (along with variables for how many cut pieces result). There's no symmetry issue, but the number of possible patterns blows up pretty quickly. For problems where the number of patterns is too large to comfortably enumerate, Gilmore and Gomory [1] proposed a <a href="https://en.wikipedia.org/wiki/Column_generation" target="_blank">column generation</a> approach (which has since been improved on by others). They start with a small number of patterns and relax the integrality conditions to get a linear program. Each time they solve the LP, they use the dual values in a subproblem to see if they can generate a new pattern that would improve the LP solution. If so, they add the pattern, solve the modified LP, and repeat. When no new pattern emerges, they restore the integrality restrictions, solve a MILP once, and get a good (though not provably optimal) solution. More recently, <a href="https://en.wikipedia.org/wiki/Branch_and_price" target="_blank">branch and price</a> (or branch-price-and-cut as it is sometimes known) has emerged as a method for finding a provably optimal solution using patterns.<br /><br />For the problem at hand, we actually have two places where we must choose between piece counts and symmetry or pattern counts: how we will cut the stock, and how we will piece together covers for the output parts. I'll reproduce here the answer I gave on OR Stack Exchange, which mixes the two methods (patterns for cutting, piece counts for output).<br /><br />Let $L$ be the length of a piece of stock and $D$ the length of a part that needs to be covered. (I'll keep this simple with a single size for stock and a single size item to be covered, but you can easily expand to multiple lengths of either by tacking on subscripts.) $N$ will denote the number of items needing to be covered. $M$ will be the required overlap between adjacent cover pieces.<br /><br />Suppose that we have a set $\mathcal{P}$ of patterns for cutting the stock, and a set $\mathcal{K}$ of possible piece sizes (where $k\in\mathcal{K}\implies k\le \min\{L, D\}$). Our first decision variable will be the number $x_p\in \mathbb{Z}_+$ of units of stock cut according to pattern $p\in\mathcal{P}$. We want to minimize the use of stock. Assuming there is no scrap value for unused cut pieces or trim waste, the objective is just $$\min \sum_{p\in\mathcal{P}} x_p.$$Now let $\pi_{k,p}$ be the number of pieces of length $k\in\mathcal{K}$ we get from one unit of stock cut according to pattern $p\in\mathcal{P}$, and define variable $y_k \ge 0$ to be the total number of pieces of length $k\in\mathcal{K}$ produced by all cutting operations. ($y_k$ will take integer values, but we do not need to restrict it to be integer.) The constraints tying the $y$ variables to production are $$y_k = \sum_{p\in\mathcal{P}} \pi_{k,p}x_p \, \forall k\in\mathcal{K}.$$Finally, let $n=1,\dots,N$ enumerate the items needing to be covered, and let variable $z_{k,n}\in \mathbb{Z}_+$ denote the number of pieces of length $k\in\mathcal{K}$ used to cover item $n\in\{1,\dots,N\}$. To be sure that we do not use pieces we do not have, we add constraints $$\sum_{n=1}^N z_{k,n} \le y_k \,\forall k\in\mathcal{K}.$$Note that this is a "single period" model (we fulfill demand once and that's that). If the cutting and covering operations were to be repeated over time, we would need to add inventory constraints to account for excess cut pieces being carried over to a subsequent period. Finally, we need to be sure that the solution actually covers all the output units, and meets the overlap conditions. We combine both those restrictions into a single set of constraints:$$\sum_{k\in\mathcal{K}} k\cdot z_{k,n} \ge D + M\left(\sum_{k\in\mathcal{K}}z_{k,n} - 1\right) \,\forall n=1,\dots,N.$$By using an inequality there rather than an equation, I snuck in an assumption that excess length can be trimmed at no cost.<br /><br />This formulation uses patterns on the cutting side but not on the covering side. That means the covering side is subject to symmetry concerns. Namely, given any feasible solution the model, if we permute the subscript $n$ we get another feasible solution, nominally different but with an identical cost. For instance, suppose that $D=11$ and $M=1$, and that we have a solution that covers unit 1 with two pieces of length 6 and unit 2 with two pieces of length 5 and one piece of length 3. Change that so that unit 2 gets two pieces of length 6 and unit 1 gets two pieces of length 5 and one of length 3, and you have a "different" solution that really is not different in any meaningful way. There are various ways to reduce or eliminate the symmetry via added constraints. One simple way to reduce (but not eliminate) the symmetry is to require that the units receiving the most pieces have the lowest indices. The added constraints are $$\sum_{k\in\mathcal{K}}z_{k,n} \ge \sum_{k\in\mathcal{K}}z_{k,n+1} \,\forall n=1,\dots,N-1.$$(This still leaves symmetry among covering patterns that use the same number of pieces, such as 5-5-3 versus 5-4-4.)<br /><br />[1] Gilmore, P. C. & Gomory, R. E. A Linear Programming Approach to the Cutting-Stock Problem. <i>Operations Research</i>, <b>1961</b>, 9, 849-859<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-36099395984290564562019-10-17T19:35:00.000-04:002019-10-17T19:35:12.393-04:00A Value-ordered Java MapFor a project I'm working on (in Java), I wanted to store pairs of items in a key-value map. The catch is that I need the items sorted according to the <i>values</i>, not the keys. Java provides an efficient, presorted mapping in the <a href="https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/TreeMap.html" target="_blank">TreeMap</a> class, but it sorts on the keys, not the values. Revering the roles of keys and values is not an option for me, as I might have several keys mapped to the same value.<br /><br />A Google search wasn't particularly helpful, other than to confirm that I was screwed. The most practical suggestions I saw revolved around the theme of using a conventional mapping (HashMap, TreeMap, whatever) and then, as needed, dumping the entries into a list or stream and sorting that using a comparator based on the values. For my application, though, this would mean modifying the map contents, exporting it and sorting it thousands of times, which would put a big drag on the application. So I really wanted something moderately efficient that would maintain the map contents in sorted order. Eventually, I gave up looking for a solution and wrote one myself. I gave my class the not entirely compact name <span style="font-family: "Courier New", Courier, monospace;">ValueOrderedMap</span>. It's generic, meaning you can specify any types for keys and values (as long as both types are comparable).<br /><br />I've implemented most of the commonly used methods associated with map classes (but not all known map methods), and I've tried to make it thread-safe (but as yet have not tested that). Anyone who wants to take it for a spin is welcome to. It's released under a Creative Commons license. There are only two classes, and only one (<span style="font-family: "Courier New", Courier, monospace;">ValueOrderedMap</span>) is essential; the other is just a small demo program. You can find the source code in my campus <a href="https://gitlab.msu.edu/orobworld/valueorderedmap" target="_blank">GitLab repository</a>. There's an issue tracker (requires registration) where you can report bugs or ask for new features. If you just want a binary jar file (along with the Javadoc, README and license files), I've parked a <a href="https://gitlab.msu.edu/orobworld/valueorderedmap" target="_blank">zip archive</a> on my faculty website.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-1785638123171792762019-10-08T17:41:00.001-04:002019-10-09T14:24:31.365-04:00Numerical Inaccuracy (Linear Algebra Edition)"There are three kinds of lies: lies, damned lies, and statistics." The <a href="https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics" target="_blank">origin of the statement</a> is lost in time, but the sentiment lives on. I'm tempted to give it a 21st century update. "There are three kinds of lies: lies, damned lies, and floating point arithmetic."<br /><br />I am currently working on a research project in which the following problem arises. We start with an $m \times n$ matrix $A$ with $m < n$, and a set of indices $I\subseteq \{1,\dots,n\}$. I need to confirm that $x_i = 0$ for all $i\in I$ and for all $x\in \mathcal{N}$, where $\mathcal{N} = \{x\in \mathbb{R}^n : Ax = 0\}$. There are various names for $\mathcal{N}$; I'll go with the <i>null space</i> of $A$.<br /><br />How do you test whether a particular component of every vector in an uncountably infinite set is zero? Since $\mathcal{N}$ is a vector space, we can compute a basis for it and just check the basis vectors. If $x_i =0$ for every basis vector $x$ of $\mathcal{N}$, it will be true for every vector in $\mathcal{N}$. <br /><br />This is going on in a Java program, and I have tried multiple linear algebra packages. Initially I had the idea to use a QR decomposition of $A$ to get a basis for the null space, but that requires (at least as far as I know) an upper triangular $R$ component, and neither of the two most promising packages I tried delivered one. (I'm pretty sure the Javadocs for one or both said $R$ would be upper triangular ... but it wasn't.) So I switched to singular value decomposition (SVD) using the <a href="http://ejml.org/wiki/index.php?title=Main_Page" target="_blank">EJML</a> library. SVD is generally slower than QR decomposition, but tends to be more accurate.<br /><br />I got an answer, but a check performed by the Java program warned me that some of the $x_i$ values that were supposed to be zero were in the vicinity of $10^{-6}$, which is a bit dicey to shrug off as rounding error. (The basis vectors are all normalized, by the way.) So I decided to port $A$ and some other stuff over to R and do some testing there ... which is when the adventure began.<br /><br />The dimensions of this $A$ matrix were $m=401$ and $n=411$. EJML gave me a basis of 17 vectors, indicating that $A$ had nullity 17 and thus rank 394. (The nullity of matrix $A$ is the dimension of $\mathcal{N}$, and is in turn the difference between $n$ and the rank of $A$.) I tested this by doing QR and SVD decompositions of $A$ (using the qr() and svd() functions from the R base package) and the Rank() function from the R <a href="https://cran.r-project.org/web/packages/pracma/index.html" target="_blank">pracma package</a>. The QR and SVD results indicated that $A$ had rank 393 and nullity 18. Only pracma::Rank() agreed with EJML ... and that came with the warning "Rank calculation may be problematic". (As if the rest of this stuff isn't.)<br /><br />So EJML might have shorted me one dimension in the null space ... or maybe not. The smallest eigenvalues of $A$ from the SVD were as follows: 9.16e-05, 5.37e-06, <b>1.09e-10</b>, <i>1.13e-15, 7.56e-16, 6.20e-16, 4.87e-16, 3.79e-16, 3.37e-16, 3.16e-16</i>. So the last seven (italicized) were pretty clearly zero and the preceding one (bold) is maybe zero, maybe not. The SVD does not produce the ten zero eigenvalues resulting from the difference between $m$ and $n$, so basically we're looking at a nullity of 18 if you treat the bold one as zero and 17 if you don't.<br /><br />The 17 basis vectors EJML gave me were all at least close to zero in all components $i\in I$. R had almost all the designated components $10^{-7}$ or smaller. The pracma::nullspace() function gave me 18 basis vectors (despite pracma::Rank() coming up with a nullity of 17), and some of the corresponding $x_i$ were pretty big (0.29, for example) -- way too big to be zero. So my condition is (probably) satisfied if I believe EJML and (definitely) not satisfied if I believe the R computations.<br /><br />Hoping for some clarity, I calculated (in R) $Ax$ for each basis vector $x$ from either EJML or pracma::nullspace(). The products with the EJML basis vectors were all zero vectors (meaning no components larger than about $10^{-15}$ in magnitude). For the pracma basis vectors, the products had components around $10^{-9}$ in many cases. So much for clarity. Do we treat $10^{-9}$ as approximately zero and believe pracma, or do we call it nonzero and believe EJML?<br /><br />I'm not new to the limitations of double-precision arithmetic. My first course in grad school was numerical analysis, and I've had a fair share of adventures with MIP models in CPLEX where the numerics were, shall we say, dicey. Still, it's rather frustrating to have to try multiple packages and then have to judge (meaning guess) which one is lying to me the least.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com8tag:blogger.com,1999:blog-8781383461061929571.post-60378848298951459332019-09-01T17:20:00.002-04:002019-09-01T17:20:51.506-04:00Further Adventures with MythTVI've documented various travails getting MythTV (specifically, via the Mythbuntu Linux distribution) to behave as expected and play well with my hardware. (If you want to see them, you can click the MythTV link in the tag cloud on the lower right, or use the search box.) Unfortunately, I've been stuck on an old version, unable to upgrade safely, for a while. The problem was likely the BIOS. Assorted kernel updates made waking up to record shows dicey at best, so I froze the machine on a now rather dated kernel ... when meant sticking to a rather dated version of Mythbuntu as well.<br /><br />I finally got around to buying a new PC (HP Pavilion) to be my PVR. In the meantime, development of Mythbuntu has apparently ceased, so I decided to install Linux Mint (the same operating system I have on my desktop and laptop) and then install MythTV on top of that. What I thought would be an afternoon's work stretched out over four days or so, due to a combination of BIOS adventures (starting with disabling "SecureBoot" and convincing the BIOS to let me boot from the DVD drive ... which I somehow had to do twice) and my natural ineptitude. I mistakenly thought that having a working MythTV installation on the old machine, from which I could copy various settings and scripts, would make installation smooth sailing. Going into the process, my expectation was that moving the backend database from the old machine to the new one would be the sticking point. It turned out that was the one step that went (fairly) smoothly.<br /><br />In this post, I'll document a couple of issues and how I sorted them out.<br /><br /><h2>First issue: No sound</h2>Watching TV is considerably less fun when you have no sound. Both the old machine and the new one were/are connected to my TV via an HDMI cable, and after a bit of fuss sound worked on the old machine. I have no idea why the new machine put up more of a fight, but I suspect it has something to do with using Mythbuntu versus Mint and which sound components each installs by default.<br /><br />At any event, I found a couple of blog posts that, between them, got me over the hump:<br /><br /><ul><li><a href="https://www.linuxuprising.com/2018/06/fix-no-sound-dummy-output-issue-in.html" target="_blank">Fix No Sound (Dummy Output) Issue In Ubuntu With SND HDA Intel</a> </li></ul><ul><li><a href="https://forum.mythtv.org/viewtopic.php?t=582#p2735" target="_blank">No sound on Mythbuntu/optiplex frontend SOLVED</a></li></ul>I have an old post of my own relating to the same problem (<a href="https://orinanobworld.blogspot.com/2014/12/hdmi-audio-in-mythbuntu.html">HDMI Audio in Mythbuntu</a>), but the symptoms on the Mythbuntu box were a bit different, so I'm not sure how helpful it would be on Mint (or any other distribution).<br /><br /><h2>Second issue: Being bugged for a password</h2><br />I swear that this never happened on Mythbuntu, so it surprised me when it happened on Mint. On both machines, I use the <a href="https://www.mythtv.org/wiki/Mythwelcome" target="_blank">MythWelcome</a> program as the gateway to the front end, and the Mythshutdown application to set wake-up times for scheduled recordings and shut the machine down (either automatically when idle for a specified time, or manually by clicking the shut down button in MythWelcome). On the new machine, shutting down from the MythWelcome menu resulted in a prompt for the root password. Back end use of mythwelcome did not produce a visible password request, but also did not work as expected. I suspect it might have been trying to ask for a root password in a non-graphical context (nowhere to display the password prompt). On the old machine, it just worked with no intervention on my part.<br /><br />It turns out I was missing a file. The MythTV installation process on both machines created a "mythtv" user group and added my account to it. What was missing was an addition to the <a href="https://help.ubuntu.com/community/Sudoers" target="_blank">sudoers</a> file -- or, more precisely, a file parked in the /etc/sudoers.d directory, which is automatically appended to the sudoers file at startup. The file needs to be installed as root. Apparently Mythbuntu did that automatically, whereas the MythTV installer did not (and did not prompt me to do so).<br /><br />The file's name is just "mythtv" (which I don't think is significant -- any file name should work). It is a plain text file, containing the following single line:<br /><br />%mythtv ALL = NOPASSWD: /sbin/shutdown, /bin/sh, /usr/bin/setwakeup.sh, /usr/bin/mythshutdown, /sbin/poweroff, /sbin/reboot<br /><br />The first bit ("%mythtv" -- don't forget the percent sign) says that what follows applies to anyone in the "mythtv" user group. The rest identifies six programs that get to run as root without requiring a password to be entered. Note that it includes the "setwakeup" script (which sets the wake-up time for a specified interval ahead of the next scheduled recording) and various commands relating to shutting down or rebooting the system. I'm a bit surprised to see /bin/sh in there, which apparently lets the mythtv group run any shell script that requires elevated privileges. That said, I'm not going to screw with it. If it ain't broke, don't fix it!<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-21028854404501593452019-07-20T12:09:00.000-04:002019-07-20T12:09:00.779-04:00Using Java Collections with CPLEXDisclaimer: What follows is specific to Java, but with some name changes will also apply to C++. If you are using one of the other programming APIs for CPLEX, something analogous may exist, but I would have no idea about it.<br /><br />I've seen a few questions by CPLEX users on forums recently that suggest the questioners may be moving from using a modeling language for optimization to using a general purpose language. Modeling languages tend to be more expressive (at least in my opinion) when it comes to writing optimization models. That's hardly shocking given that they are built for that purpose (and general programming languages are not). Recent questions have been along the lines of the following: I was using a structure (a tuple, or something similar) supported by the modeling language, and I don't know how to translate that to the general language; or I was creating arrays of variables / constraints / whatever indexed by something human-readable (probably names of things) and I don't know how to switch to integer-indexed arrays and keep it readable.<br /><br />In my mind, the answer to those issues in Java is to make good use of classes and the Java Collection interface (and its various implementations). I'll give some examples in the context of a network flow problem.<br /><br />In a modeling language, I might use a tuple to represent an arc. The tuple would consist of some identifier (string? index?) for the tail node of the arc, some identifier for the head node of the arc, and maybe an arc weight or some other arc property. In Java, I would create a class named Node to represent a single node, with fields including the node's name or other unique identifier and any costs, weights or other parameters associated with the node. Then I would create a class named Arc with fields of type Node holding the tail and head nodes, along with fields for any parameters associated with the arc (such as cost), and maybe a string field with a human-friendly label for the arc.<br /><br />Now suppose I have an instance of <code>IloCplex</code> named <code>cplex</code>, and that I need a variable for each arc representing the flow on the arc. The conventional (?) approach would be to create a one dimensional array of <code>IloNumVar</code> instances, one per arc, and park the variables there. In fact, CPLEX has convenience methods for creating vectors of variables. The catch is that it may be hard to remember which index corresponds to which arc. One partial answer is to attach a label to each variable. The various methods in the CPLEX API for creating variables (and constraints) all have versions with a final string argument assigning a name to that variable or constraint. This works fine when you print out the model, but it's not terribly helpful while you're doing the coding.<br /><br />I pretty much always use that feature to assign labels to variables and constraints, but I also employ collections in various ways. In my hypothetical network flow example, I would do something like the following.<br /><br /><div class="scroll"><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;">HashSet<span style="color: black;"><</span>Node<span style="color: black;">></span> nodes <span style="color: black;">=</span> <span style="color: black; font-weight: bold;">new</span> HashSet<span style="color: black;"><>();</span><br /><span style="color: #838183; font-style: italic;">// Fill nodes with all node instances.</span><br />HashSet<span style="color: black;"><</span>Arc<span style="color: black;">></span> arcs <span style="color: black;">=</span> <span style="color: black; font-weight: bold;">new</span> HashSet<span style="color: black;"><>();</span><br /><span style="color: #838183; font-style: italic;">// Fill arcs with all arc instances.</span><br />HashMap<span style="color: black;"><</span>Arc<span style="color: black;">,</span> IloNumVar<span style="color: black;">></span> flowVars <span style="color: black;">=</span> <span style="color: black; font-weight: bold;">new</span> HashMap<span style="color: black;"><>();</span><br />HashMap<span style="color: black;"><</span>IloNumVar<span style="color: black;">,</span> Arc<span style="color: black;">></span> flowArcs <span style="color: black;">=</span> <span style="color: black; font-weight: bold;">new</span> HashMap<span style="color: black;"><>();</span><br /><span style="color: black; font-weight: bold;">for</span> <span style="color: black;">(</span>Arc a <span style="color: black;">:</span> arcs<span style="color: black;">) {</span><br /> IloNumVar x <span style="color: black;">=</span> cplex<span style="color: black;">.</span><span style="color: #010181;">numVar</span><span style="color: black;">(</span><span style="color: #b07e00;">0</span><span style="color: black;">,</span> Double<span style="color: black;">.</span>MAX_VALUE<span style="color: black;">,</span> "Flow_" + a<span style="color: black;">.</span><span style="color: #010181;">getID</span><span style="color: black;">());</span><br /> flowVars<span style="color: black;">.</span><span style="color: #010181;">put</span><span style="color: black;">(</span>a<span style="color: black;">,</span> x<span style="color: black;">);</span><br /> flowArcs<span style="color: black;">.</span><span style="color: #010181;">put</span><span style="color: black;">(</span>x<span style="color: black;">,</span> a<span style="color: black;">);</span><br /><span style="color: black;">}</span><br /></pre></div><br />I put the nodes and arcs in separate collections (I used sets here, but you could equally well use lists), and then I create mappings between model constructs (in this case, arcs) and CPLEX constructs (in this case, variables). Collections can be iterated over, so there is no need to mess around with going back and forth between model constructs and integer indices. For each arc, I create a variable (and give it a name that includes the string identifier of the arc -- I'm assuming here that I gave the <code>Arc</code> class a <code>getID</code> method). I then map the arc to the variable and the variable to the arc. Why two maps? The arc to variable map lets me grab the correct variable while I'm building the model. For instance, a flow balance constraint would involve the flow variables for all arcs leading into and out of a node. I would identify all those arcs, park them in a collection (list or set), iterate over that collection of arcs, and for use the <code>flowVars</code> map to get the correct variables.<br /><br />The reverse mapping I set up comes into play when the solver has a solution I want to inspect. I can fetch the value of each variable and map it to the corresponding variable, along the following lines.<br /><br /><div class="scroll"><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;">HashMap<span style="color: black;"><</span>IloNumVar<span style="color: black;">,</span> Double<span style="color: black;">></span> solution <span style="color: black;">=</span> <span style="color: black; font-weight: bold;">new</span> HashMap<span style="color: black;"><>();</span><br /><span style="color: black; font-weight: bold;">for</span> <span style="color: black;">(</span>IloNumVar x <span style="color: black;">:</span> flowArcs<span style="color: black;">.</span><span style="color: #010181;">keySet</span><span style="color: black;">()) {</span><br /> solution<span style="color: black;">.</span><span style="color: #010181;">put</span><span style="color: black;">(</span>x<span style="color: black;">,</span> cplex<span style="color: black;">.</span><span style="color: #010181;">getValue</span><span style="color: black;">(</span>x<span style="color: black;">));</span><br /><span style="color: black;">}</span><br /></pre></div><br />When I need to associate variable values with arcs, I can iterate over <code>solution.entrySet()</code>. For each entry <code>e</code>, <code>flowArcs.get(e.getKey())</code> gives me the arc and <code>e.getValue()</code> gives me the flow on the arc.<br /><br />Similar things can be done with constraints (<code>IloRange</code> instances).<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-11577519856708986272019-06-22T20:01:00.000-04:002019-06-22T20:02:43.195-04:00Evaluating Expressions in CPLEXThere's a feature in the Java API for CPLEX (and in the C++ and C APIs; I'm not sure about the others) that I don't see mentioned very often, possibly because use cases may not arise all that frequently. It became relevant in a recent email exchange, though, so I thought I'd highlight it. As usual, I describe it in the context of Java and let users of other APIs figure out the corresponding syntax.<br /><br />Anyone who uses CPLEX knows how to use <code>IloCplex.getValue()</code> or <code>IloCplex.getValues()</code> to retrieve the values of the model variables in the final solution. What might fly under the radar is that there is an overload that takes as an argument an expression involving the variables (an instance of <code>IloNumExpr</code>), evaluates that expression, and returns the value. This is more a convenience than an essential feature: armed with the variable values, you could presumably compute the expression value yourself. The convenience aspects are not trivial though. Using <code>getValue()</code> to evaluate an expression saves you having to retrieve coefficients from someplace and likely run through a loop each time you evaluate the expression. (You do have to create the expression and store a pointer to it, but you do that once.) Also, if the only reason you need the variable values is to evaluate the expression, it saves you having to use <code>getValue()</code> to get the variable values.<br /><br />Where might you use this? It's possible to use it to compute slacks for cuts you generate somewhere. (Use it to evaluate the cut expression, then subtract the relevant bound on the cut inequality and you have the slack.) It's also possible to use it to track secondary criteria or objectives that you did not build into the model. I cobbled together <a href="https://gitlab.msu.edu/orobworld/getvalue" target="_blank">some code</a> to demonstrate the latter use, which is now available from my campus GitLab repository. The underlying problem comes from an old post by Erwin Kalvelagen ("<a href="https://yetanothermathprogrammingconsultant.blogspot.com/2017/09/minimizing-standard-deviation.html" target="_blank">Minimizing Standard Deviation</a>"). It involves loading pallets and minimizing the standard deviation of the pallet loads. Let $p$ be the vector of pallet loads and $\mu$ the mean load per pallet. My code creates expressions for both $\parallel p-(\mu,\dots,\mu)\parallel_{1}$ and $\parallel p-(\mu,\dots,\mu)\parallel_{2}^{2}$, minimizes the $L_1$ norm (which is computationally easier than minimizing the $L_2$ norm), and evaluates for each newly found solution (in an incumbent callback) and for the final solution.<br /><br />The use in a callback brings up an important point. There are <code>getValue()</code> methods in several classes, and you need to be a bit careful about which one you are using. <code>IloCplex.getValue()</code> evaluates the expression using the final solution, and will generate error 1217 (no solution exists) if you try to use before the solver is done. That in particular means you cannot use it in a callback. Fortunately, the relevant callbacks have their own versions. <code>IloCplex.LazyConstraintCallback.getValue()</code> and <code>IloCplex.IncumbentCallback.getValue()</code> evaluate expressions using the new candidate and accepted integer-feasible solution, respectively. Other callbacks (solve, branch, user cut) evaluate using the solution to the LP relaxation. To use any of them, call <code>this.getValue(...)</code> or just <code>getValue(...)</code>. Do <i>not</i> call <code>mip.getValue(...)</code> inside a callback, where <code>mip</code> is the problem you are solving: that will invoke <code>IloCplex.getValue()</code> and trigger the aforementioned error.<br /><br />Note that there are also <code>getSlack()</code> and <code>getSlacks()</code> methods for computing the slack in a constraint. The former takes a single instance of <code>IloRange</code> as argument and the latter takes a vector of <code>IloRange</code> instances. Like <code>getValue()</code>, there are separate versions in <code>IloCplex</code> and in the callback classes. I assume they behave the same way that <code>getValue</code> does, but I have not actually checked that.<br /><br />Finally, if you have switched to generic callbacks, there is good news. The <code>IloCplex.Callback.Context</code> class (the class used by the argument to your callback function) has <code>getCandidateValue()</code>, <code>getIncumbentValue()</code> and <code>getRelaxationValue()</code> for evaluating expressions based on a proposed new integer-feasible solution, the current best integer-feasible solution and the current node LP solution, respectively. (<code>IloCplex.getValue()</code> remains as it was.) Not only can you do the same evaluations, but the method names are utterly unambiguous. Gotta love that!Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-66288881104416711242019-06-16T16:44:00.000-04:002019-06-16T16:44:21.857-04:00R v. PythonA couple of days ago, I was having a conversation with someone that touched on the curriculum for a masters program in analytics. One thing that struck me was requirement of one semester each of R and Python programming. On the one hand, I can see a couple of reasons for requiring both: some jobs will require the worker to deal with both kinds of code; and even if you're headed to a job where one of them suffices, you don't know which one it will be while you're in the program. On the other hand, I tend to think that most people can get by nicely with just one or the other (in my case R), if not neither. Also, once you've learned an interpreted language (plus the general concepts of programming), I tend to think you can learn another one on your own if you need it. (This will not, however, get you hired if proficiency in the one you don't know is an up-front requirement.)<br /><br />I don't really intend to argue the merits of requiring one v. both here, nor the issue of whether a one-semester deep understanding of two languages is as good as a two-semester deep understanding of one language. Purely by coincidence, that conversation was followed by my discovering <a href="https://github.com/matloff/R-vs.-Python-for-Data-Science" target="_blank">R vs. Python for Data Science</a>, a point-by-point comparison of the two by Norm Matloff (a computer science professor at UC Davis). If you are interested in data science and trying to decide which one to learn (or which to learn first), I think Matloff's comparison provides some very useful information. With the exception of "Language unity", I'm inclined to agree with his ratings.<br /><br />Matloff calls language unity a "horrible loss" for R, emphasizing a dichotomy between conventional/traditional R and the so-called "Tidyverse" (which is actually a set of packages). At the same time, he calls the transition form Python 2.7 to 3.x "nothing too elaborate". Personally, I use Tidyverse packages when it suits me and not when it doesn't. There's a bit of work involved in learning each new Tidyverse package, but that's not different from learning any other package. He mentions tibbles and pipes. Since a tibble is a subclass of a data frame, I can (and often do) ignore the differences and just treat them as data frames. As far as pipes go, they're not exactly a Tidyverse creation. The Tidyverse packages load the magrittr package to get pipes, but I think that package predates the Tidyverse, and I use it with "ordinary" R code. Matloff also says that "... the Tidyverse should be considered advanced R, not for beginners." If I were teaching an intro to R course, I think I would introduce the Tidyverse stuff early, because it imposes some consistency on the outputs of R functions that is sorely lacking in base R. (If you've done much programming in R, you've probably had more than a few "why the hell did it do that??" moments, such as getting a vector output when you expected a scalar or vice versa.) Meanwhile, I've seen issues with software that bundled Python scripts (and maybe libraries), for instance because users who came to Python recently and have only ever installed 3.x discover that the bundled scripts require 2.x (or vice versa).<br /><br />Anyway, that one section aside (where I think Matloff and I can agree to disagree), the comparison is quite cogent, and makes for good reading.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-2325295554644630442019-06-14T17:20:00.000-04:002019-08-15T10:34:08.013-04:00A Java Container for ParametersA few days ago, I posted about a <a href="https://orinanobworld.blogspot.com/2019/06/a-swing-platform-for-computational.html" target="_blank">Swing class</a> (and supporting stuff) that I developed to facilitate my own computations research, and which I have now made open-source in a <a href="https://bitbucket.org/prubin/xframe/src/master/" target="_blank">Bitbucket repository</a>. I finally got around to cleaning up another Java utility class I wrote, and which I use regularly in experiments. I call it <code>ParameterBlock</code>. It is designed to be a container for various parameters that I need to set during experiments.<br /><br />It might be easiest if I start with a couple of screen shots. The first one shows a Swing application I was using in a recent project. Specifically, it shows the "Settings" menu, which has multiple entries corresponding to different computational stages (the overall solver, an initial heuristic, two phases of post-heuristic number crunching), along with options to save and load settings.<br /><br /><div class="separator" style="clear: both; text-align: center;"></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-6qyjFRgp9Jk/XQQQKDGHOGI/AAAAAAAAC8E/RpvpM2sVIXkbEMCVLWKRT__ewCYBf2_wwCPcBGAYYCw/s1600/param_menu.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="302" data-original-width="600" height="161" src="https://1.bp.blogspot.com/-6qyjFRgp9Jk/XQQQKDGHOGI/AAAAAAAAC8E/RpvpM2sVIXkbEMCVLWKRT__ewCYBf2_wwCPcBGAYYCw/s320/param_menu.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Parameter settings menu</td></tr></tbody></table><br />Computational research can involve a ton of choices for various parameters. CPLEX alone has what seems to be an uncountable number of them. In the Dark Ages, I hard-coded parameters, which meant searching the source code (and recompiling) every time I wanted to change one. Later I graduated to putting them on the command line, but that gets old quickly if there are more than just a handful. When I started writing simple Swing platforms for my work (like the one shown above), I added menu options to call up dialogs that would let me see the current settings and change them. Over time, this led me to my current solution.<br /><br />I put each collection of parameters in a separate subclass of the (abstract) <code>ParameterBlock</code> class. So clicking on "Solver" would access on subclass, clicking on "Heuristic" would access a different subclass, and so on. A parameter block can contain parameters of any types. The next shot shows a dialog for the solver parameters in my application. Two of the parameters are boolean (and have check boxes), two are Java enumerations (and have radio button groups), and three are numeric (and have text fields). String parameters are also fine (handled by text boxes).<br /><br />Defining a parameter block is easy (in my opinion). It pretty much boils down to deciding how many parameters you are going to have, assigning a symbolic name to each (so that in other parts of the code you can refer to "DOMINANCE" and not need to remember if it parameter 1, parameter 2 or whatever), giving each one a label (for dialogs), a type (such as <code>boolean.class</code> or <code>double.class</code>), a default value and a tool tip. The <code>ParameterBlock</code> class contains a static method for generating a dialog like the one below, and one or two other useful methods.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-94XnKwc06mo/XQQL_zMsn_I/AAAAAAAAC7s/bZw8O1KiDi8KHcERG_B5H5nyx3UuYyMVACLcBGAs/s1600/solver_dialog.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="430" data-original-width="656" height="261" src="https://1.bp.blogspot.com/-94XnKwc06mo/XQQL_zMsn_I/AAAAAAAAC7s/bZw8O1KiDi8KHcERG_B5H5nyx3UuYyMVACLcBGAs/s400/solver_dialog.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Solver parameters</td></tr></tbody></table><div class="separator" style="clear: both; text-align: center;"><br /></div>You can more details in the README file at the <a href="https://bitbucket.org/prubin/parameterblocks/src/master/" target="_blank">Bitbucket repository</a> I set up for this. The repository contains a small example for demonstration purposes, but to use it you just need to copy <code>ParameterBlock.java</code> into your application. As usual, I'm releasing it under a Creative Commons license. Hopefully someone besides me will find it useful.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-80568536289871107982019-06-11T15:05:00.000-04:002019-06-11T15:05:45.690-04:00A Swing Platform for Computational ExperimentsMost of my research involves coding algorithms and running computational experiments with them. It also involves lots of trial-and-error, both with the algorithms themselves and with assorted parameters that govern their functioning. Back in the Dark Ages, I did all this with programs that ran at a command prompt (or, in Linux terms, in a terminal) and wrote any output to the terminal. Eventually I got hip and started writing simple GUI applications for those experiments. A GUI application lets me load different problem without having to stop and restart the program, lets me change parameter settings visually (without having to screw around with lots of command line options ... is the switch for using antisymmetry constraints -A or -a?), and save output to text files when it's helpful.<br /><br />Since I code (mostly) in Java, the natural way to do this is with a Swing application. (When I use R, I typically use an R notebook.) Since there are certain features I always want, I found myself copying code from old projects and then cutting out the project-specific stuff and adding new stuff, which is a bit inefficient. So I finally got around to creating a minimal template version of the application, which I'm calling "XFrame" (short for "Experimental JFrame" or "JFrame for Experiments" or something).<br /><br />I just uploaded it to <a href="https://bitbucket.org/prubin/xframe/src/master/" target="_blank">Bitbucket</a>, where it is open-source under a very nonrestrictive Creative Commons license. Feel free to download it if you want to try it. There's an issue tracker where you can report any bugs or sorely missing features (but keep in mind I'm taking a fairly minimalist approach here).<br /><br />Using it is pretty simple. The main package (<code>xframe</code>) contains two classes and an interface. You can just plop them into your application somewhere. One class (<code>CustomOutputStream</code>) you will probably not want to change. The actual GUI is the <code>XFrame</code> class. You will want to add menu items (the only one it comes with is <code>File > Exit</code>) and other logic. Feel free to change the class name as well if you wish. Finally, your program needs to implement the interface defined in XFrameController so that XFrame knows how to talk to the program.<br /><br />The layout contains a window title at the top and a status message area at the bottom, both of which can be fetched and changed by code. The central area is a scrollable text area where the program can write output. It has buttons to save the content and to clear it, and the program will not exit with unsaved text unless you explicitly bless the operation.<br /><br />There is a function that lets the program open nonmodal, scrollable dialog (which can be left open while the main window is in use, and whose content can be saved to a file). Another function allows the program to pop up modal dialogs (typically warnings or error messages). Yet another function provides a place where you can insert logic to tell the GUI when to enable/disable menu choices (and maybe other things). Finally, there is a built-in extension of <code>SwingWorker</code> that lets you launch a computational operation in a separate thread (where it will not slow down or freeze the GUI).<br /><br />I included a small "Hello world!" application to show it works. I'll end with a couple of screen shots, one of the main window and the other of the nonmodal dialog (both from the demo). If it looks like something you might want to use, please head over to Bitbucket and grab the source.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-Hr89QlrxdHk/XP_6_1Il8SI/AAAAAAAAC7I/ujrQHQbmeXsL2PcXZc3-LS7luv7D4DTQwCLcBGAs/s1600/main_window.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="300" data-original-width="700" height="170" src="https://1.bp.blogspot.com/-Hr89QlrxdHk/XP_6_1Il8SI/AAAAAAAAC7I/ujrQHQbmeXsL2PcXZc3-LS7luv7D4DTQwCLcBGAs/s400/main_window.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Main window</td></tr></tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-Fx3vNLdhEeA/XP_6_6aA8mI/AAAAAAAAC7M/qq8Rwn0QX4gp41OpdYQeqt97TB9O1_YcACLcBGAs/s1600/dialog_window.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="270" data-original-width="450" height="240" src="https://1.bp.blogspot.com/-Fx3vNLdhEeA/XP_6_6aA8mI/AAAAAAAAC7M/qq8Rwn0QX4gp41OpdYQeqt97TB9O1_YcACLcBGAs/s400/dialog_window.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Nonmodal dialog</td></tr></tbody></table><br /><span id="goog_1661107282"></span><span id="goog_1661107283"></span><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-2498865291926015102019-06-10T10:02:00.000-04:002019-06-10T10:02:56.819-04:00Indicator Constraints v. Big MWay, way back I did a couple of posts related to how to model "logical indicators" (true/false values that control enforcement of constraints):<br /><ul><li><h4 class="post-title entry-title"><a href="https://orinanobworld.blogspot.com/2010/07/logical-indicators-in-mathematical.html">Logical Indicators in Mathematical Program</a> </h4></li><li><h4 class="post-title entry-title"><a href="https://orinanobworld.blogspot.com/2012/05/indicator-implies-relation.html">Indicator Implies Relation</a></h4><h4 class="post-title entry-title"></h4></li></ul>The topic ties in to the general issue of "big M" model formulations. Somewhere around version 10, CPLEX introduced what they call <i>indicator constraints</i>, which are essentially implications of the form "if constraint 1 holds, then constraint 2 holds". As an example, if $a$ and $b$ are vector respectively scalar parameters, $x$ is a binary variable and $y$ is a vector of real variables, you might use an indicator constraint to express $x=1 \implies a'y\le b$.<br /><br />That same constraint, in a "big M" formulation, would look like $a'y \le b + M(1-x)$. Using an indicator constraint saves you having to make a careful choice of the value of $M$ and avoids certain numerical complications. So should you always use indicator constraints? It's not that simple. Although "big M" formulations are notorious for having weak relaxations, indicator constraints can also result in weak (possibly weaker) relaxations.<br /><br />For more details, I'll refer you to a <a href="https://or.stackexchange.com/questions/231/when-to-use-indicator-constraints-versus-big-m-approaches-in-solving-mixed-int" target="_blank">question</a> on the new <a href="https://or.stackexchange.com/" target="_blank">Operations Research Stack Exchange</a> site, and in particular to <a href="https://or.stackexchange.com/questions/231/when-to-use-indicator-constraints-versus-big-m-approaches-in-solving-mixed-int/348#348" target="_blank">an answer</a> containing information provided by Dr. Ed Klotz of IBM. My take is that this remains one of those "try it and see" kinds of questions.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-20292720185525340202019-06-01T16:27:00.001-04:002019-06-01T16:27:52.270-04:00Naming CPLEX ObjectsA CPLEX user recently asked the following <a href="https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=ff135aca-927b-4609-9cf9-f5645c2dcf15&permalinkReplyUuid=387b0206-89ea-46c0-ad05-21a575275e96" target="_blank">question</a> on a user forum: "Is there a way to print the constraints as interpreted by CPLEX immediately after adding these constraints using addEq, addLe etc." The context for a question like this is often an attempt to debug either a model or the code creating the model. Other users in the past have indicated difficulty in parsing models after exporting them to a text (LP format) file, because they cannot associate the variable names and constraint names assigned by CPLEX to the variables and constraints in their mathematical models. For example, here is part of a simple set covering model created in Java using CPLEX 12.9 and exported to an LP file:<br /><br /><pre>Minimize<br /> obj: x1 + x2 + x3 + x4 + x5<br />Subject To<br /> c1: x1 + x4 >= 1<br /> c2: x1 + x3 + x4 >= 1<br /> c3: x2 + x5 >= 1<br /><br /></pre>Assume that the decisions relate to possible locations for depots. The text is perfectly legible, but is "x4" the decision to put a depot in "San Jose" or the decision to put a depot in "Detroit"? Is constraint "c1" the requirement to have a depot near central Ohio or the requirement to have a depot near the metro New York area?<br /><br />Assigning meaningful names to variables and constraints is easy. In the Java and C++ APIs (and, I assume, the others), the functions that create variables and constraints have optional string arguments for names. Let's say that I create my variables and my constraints inside loops, both indexed by a variable <span style="font-family: "courier new" , "courier" , monospace;">i</span>. I just need to change<br /><pre>mip.boolVar()</pre>and<br /><pre>mip.addGe(sum, 1)</pre>(where <span style="font-family: "courier new" , "courier" , monospace;">sum</span> is an expression calculated inside the loop) to<br /><pre>mip.boolVar(vnames[i]);</pre>and<br /><pre>mip.addGe(sum, 1, cnames[i]);</pre>where <span style="font-family: "courier new" , "courier" , monospace;">vnames[]</span> and <span style="font-family: "courier new" , "courier" , monospace;">cnames[]</span> are string arrays containing the names I want to use for variables and constraints, respectively. the previous model fragment now looks like the following:<br /><pre>Minimize<br /> obj: Pittsburgh + San_Jose + Newark + Detroit + Fresno<br />Subject To<br /> Central_Ohio: Pittsburgh + Detroit >= 1<br /> Metro_NY: Pittsburgh + Newark + Detroit >= 1<br /> SF_Oakland: San_Jose + Fresno >= 1</pre>This version is much more useful when either explaining the model (to someone else) or looking for problems with it. (Just don't look for any geographic logic in the example.)<br /><br />Note the use of underscores, rather than spaces, in the names. CPLEX has some rules about what is syntactically a legitimate name in an LP file, and if you violate those rules, CPLEX with futz with your names and add index numbers to them. So "San Jose" might become "San_Jose#2", and "SF/Oakland" would turn into something at least as silly.<br /><br />That's part of the battle. The question I cited asks how to print out constraints as they arise. The key there is that the various constraint constructors (<span style="font-family: "courier new" , "courier" , monospace;">IloCplex.ge()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.addGe()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.le()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.addLe()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.eq()</span>, <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.addEq()</span>, ...) <i>return a pointer</i> to the constraint they construct. If you pass that pointer to a print statement, you print the constraint. Extending my example, I will tweak the constraint construction a bit, to<br /><pre>IloRange newConstraint = mip.addGe(sum, 1, cnames[i]);<br />System.out.println(newConstraint);</pre>which creates the cover constraint, adds it to the model and then prints it. That results in output lines like this:<br /><pre>IloRange Central_Ohio : 1.0 <= (1.0*Detroit + 1.0*Pittsburgh) <= infinity</pre><br />One last observation: If you want to print the entire model out, you do not need to save it to an LP file. Just pass the model (<span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span> object) to a print statement. If I execute<br /><pre>System.out.println(mip);</pre>after the model is complete, I get this:<br /><pre>IloModel {<br />IloMinimize : (1.0*San_Jose + 1.0*Detroit + 1.0*Pittsburgh + 1.0*Newark + 1.0*Fresno)<br />IloRange Central_Ohio : 1.0 <= (1.0*San_Jose + 1.0*Pittsburgh) <= infinity<br />IloRange Metro_NY : 1.0 <= (1.0*Pittsburgh + 1.0*Fresno) <= infinity<br />IloRange SF_Oakland : 1.0 <= (1.0*Detroit + 1.0*Fresno) <= infinity<br /><br />}</pre>It is not entirely complete (you don't see the declarations of the variables as binary), but it is arguably enough for most model debugging.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-33218158419774072352019-05-29T14:58:00.001-04:002019-05-29T14:59:10.828-04:00How to Crash CPLEXA question elsewhere on the blog reminded me that some users of the CPLEX programming APIs are not conscious of a "technicality" that, when violated, might cause CPLEX to crash (or at least throw an exception).<br /><br />The bottom line can be stated easily enough: modifying a CPLEX model while solving it is a <a href="https://www.urbandictionary.com/define.php?term=Bozo%20no-no" target="_blank">Bozo no-no</a>. In the Java API, that means making a change to instance of <span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span> while that instance is solving. In the C++ API, where the model (<span style="font-family: "courier new" , "courier" , monospace;">IloModel</span>) and solver (<span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span>) are separate, I suspect it means making a change to either on ... but I'm not sure.<br /><br />But wait, you say, callbacks do just that! No, not exactly. Callbacks add constraints (either user cuts or lazy constraints) to cut pools maintained by the solver, but they are <i>not</i> added to the model itself. Callbacks have their own add-whatever methods, for this specific purpose. Let's say that (in Java) I declare<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">IloCplex cplex = new IloCplex();</span><br /><br />and then build a model, attach a callback or two and call <span style="font-family: "courier new" , "courier" , monospace;">cplex.solve()</span>. While the solver is working, I can add user cuts or lazy constraints using the <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.UserCutCallback.add()</span> or <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.LazyConstraintCallback.add()</span> methods (or their local alternatives). What I <i>cannot</i> do, even in a callback, is use <span style="font-family: "courier new" , "courier" , monospace;">cplex.add()</span> to add a user cut or lazy constraint (or anything else). If I do, kaboom!<br /><br />What if, during a solve, I discover some new constraints that I would like to add to the model? One option is to abort the solver, add them, and start over. Another is to store them in program memory, wait for the solver to terminate (optimal solution, time limit, whatever), and then add them to the model. I just cannot add them while the solver is running (unless I want to crash the program).<br /><br />Note that, while one model is solving, I am free to modify some other model that is not solving. For instance, suppose I decompose a problem into a master problem and several subproblems (one per time period, say). Assuming that subproblems are solved sequentially, not in parallel, if I stumble on a constraint relevant to subproblem #2 while I am solving subproblem #1, I am free to add it to subproblem #2 ... just not (yet) to subproblem #1.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-13758631604944956092019-05-13T18:07:00.001-04:002019-05-13T18:07:17.908-04:00Randomness: Friend or Foe?I spent a chunk of the weekend debugging some code (which involved solving an optimization problem). There was an R script to setup input files and a Java program to process them. The Java program included both a random heuristic to get things going and an integer program solved by CPLEX.<br /><br />Randomness in algorithms is both a blessing and a curse. Without it, genetic algorithms would reduce to identical clones of one individual engaging in a staring contest, simulation models would produce absurdly tight confidence intervals (thanks to the output have zero variance) and so on. With it, though, testing and debugging software gets tricky, since you cannot rely on the same inputs, parameter settings, hardware etc. to produce the same output, even when the program is function correctly. If I rerun a problem and get different results, or different performance (e.g., same answer but considerably faster or slower), is that an indication of a bug or just luck of the draw?<br /><br />Somewhere, back in the Dark Ages, I was told that the answer to all this was to set the seed of the pseudorandom number stream explicitly. This was after the introduction of pseudorandom number generators, of course. Before that, random numbers were generated based on things like fluctuations in the voltage of the power supplied to the mainframe, or readings from a thermocouple stuck ... well, never mind. Today <a href="https://en.wikipedia.org/wiki/Hardware_random_number_generator" target="_blank">hardware random number generators</a> are used mainly in cryptography or gambling, where you explicitly do <i>not</i> want reproducibility. With pseudorandom numbers, using the same seed will produce the same sequences of "random" values, which should hypothetically make reproducibility possible.<br /><br />I think I originally came across that in the context of simulation, but it also applies in optimization, and not just in obviously random heuristics and metaheuristics. CPLEX has a built-in pseudorandom number generator which is used for some things related to branching decisions when solving integer programs (and possibly other places too). So explicitly setting a seed for both the random number generator used by my heuristic and CPLEX should make my code reproducible, right?<br /><br />Wrong. There are two more wildcards here. One is that my Java code uses various types of collections (HashMaps, HashSets, ...) and also uses Streams. Both of those can behave in nondeterministic ways if you are not very careful. (Whether a Stream is deterministic may boil down to whether the source is. I'm not sure about that.) The other, and I'm pretty sure this bites me in the butt more often than any other aspect, is the use of parallel threads. My code runs multiple copies of the heuristic in parallel (using different seeds), and CPLEX uses multiple threads. The problem with parallel threads is that timing is nondeterministic, even if the sequence of operations in each thread is. The operating system will interrupt threads willy-nilly to use those cores for other tasks, such as updating the contents of the display, doing garbage collection in Java, pinging the mail server or asking <a href="https://en.wikipedia.org/wiki/Skynet_(Terminator)" target="_blank">Skynet</a> whether it's time to terminate the user). If there is any communication between the processes running in parallel threads in your code, the sequencing of the messages will be inherently random. Also, if you give a process a time limit and base it on "wall clock" time (which I'm prone to doing), how far the process gets before terminating will depend on how often and for how long it was interrupted.<br /><br />Limiting everything to a single thread tends not to be practical, at least for the problems I find myself tackling. So I'm (somewhat) reconciled to the notion that "reproducibility" in what I do has to be interpreted somewhat loosely.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com4tag:blogger.com,1999:blog-8781383461061929571.post-23542494533694608822019-04-16T15:53:00.000-04:002019-04-16T15:53:52.837-04:00CPLEX Callbacks: ThreadLocal v. CloneA while back, I <a href="https://orinanobworld.blogspot.com/2017/11/benders-decomposition-with-generic.html" target="_blank">wrote a post</a> about the new (at the time) "generic" callbacks in CPLEX, including a brief discussion of adventures with multiple threads. A key element was that, with generic callbacks, IBM was making the user more responsible for thread safety. In that previous post, I explored a few options for doing so, including use of Java's <a href="https://docs.oracle.com/javase/10/docs/api/java/lang/ThreadLocal.html" target="_blank">ThreadLocal</a> class. <span style="font-family: "courier new" , "courier" , monospace;">ThreadLocal</span> is my current first choice when writing a generic callback.<br /><br />Recently, I saw a <a href="https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=e72f622a-b899-4e96-a789-254587a39c7d&permalinkReplyUuid=dc62fc3c-c30a-41e4-bfe1-3e037cc7700e" target="_blank">reply by IBM's Daniel Junglas</a> on a CPLEX user forum that contained the following information.<br /><blockquote class="tr_bq">For each thread CPLEX creates a clone of your callback object. This is done by invoking the callback's clone() method. The Java default implementation of clone() returns a shallow copy of the class.</blockquote>That set me to wondering about the differences between <span style="font-family: "courier new" , "courier" , monospace;">ThreadLocal</span> and cloning, so I did a bit of experimentation. I'll summarize what I think I learned below.<br /><br /><h3>Why is any of this necessary? </h3><br />It's not if you only allow CPLEX to use a single thread. When using callbacks with multiple threads, it is possible that two or more threads will access the callback simultaneously. Simultaneously reading/fetching the value of something is harmless, but trying to modify a value simultaneously will either trigger an exception or cause the JVM to crash. (That's not hypothetical, by the way. I have a test program that routinely crashes the JVM.) "Information callbacks" are harmless, but "control callbacks" (where you attempt to alter what CPLEX is doing, by adding cuts or lazy constraints or by taking control of branching), usually involve modifying something. In particular, with Benders decomposition your callback needs to solve a subproblem <i>after first adjusting it</i> based on the proposed master problem solution. Two threads trying to adjust the subproblem at the same time spells disaster.<br /><br /><h3>Is the use of ThreadLocal an option for both legacy and generic callbacks?</h3><br />Yes. The only tricky part is in initialization of the storage. Let's say that I'm doing Benders decomposition, and I have a subproblem that is an instance of <span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span>. I am going to need to create a separate version of the subproblem for each thread. So my callback class will contain a class field declared something like<br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: black; font-weight: bold;">private</span> ThreadLocal<span style="color: black;"><</span>IloCplex<span style="color: black;">></span> subproblem<span style="color: black;">;</span></pre>and it will need to fill in a value of the subproblem for each thread.<br /><br />With a generic callback, the <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> context provided by CPLEX can be used to do this. Assuming that <span style="font-family: "Courier New", Courier, monospace;">context</span> is the argument to the callback function you write, you can use code like the following to initialize the subproblem for each thread.<br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: black; font-weight: bold;">if</span> <span style="color: black;">(</span>context <span style="color: black;">==</span> IloCplex<span style="color: black;">.</span>Callback<span style="color: black;">.</span>Context<span style="color: black;">.</span>Id<span style="color: black;">.</span>ThreadUp<span style="color: black;">) {</span><br /> IloCplex s <span style="color: black;">= ...;</span> <span style="color: #838183; font-style: italic;">// code to generate a new subproblem</span><br /> subproblem<span style="color: black;">.</span><span style="color: #010181;">set</span><span style="color: black;">(</span>s<span style="color: black;">);</span><br /><span style="color: black;">}</span></pre>Once the subproblem is initialized, to use it when a candidate solution is being proposed, you need to extract it from the <span style="font-family: "courier new" , "courier" , monospace;">ThreadLocal</span> field. Here is an example of how that would look. <br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: black; font-weight: bold;">if</span> <span style="color: black;">(</span>context <span style="color: black;">==</span> IloCplex<span style="color: black;">.</span>Callback<span style="color: black;">.</span>Context<span style="color: black;">.</span>Id<span style="color: black;">.</span>Candidate<span style="color: black;">) {</span><br /> IloCplex s <span style="color: black;">=</span> subproblem<span style="color: black;">.</span><span style="color: #010181;">get</span><span style="color: black;">(</span>s<span style="color: black;">)</span>;<br /> <span style="color: #838183; font-style: italic;">// Do Benders stuff with s.</span><br /><span style="color: black;">}</span></pre><br />Legacy callbacks do not have a mechanism like <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> for detecting the creation of a new thread. You can still initialize the subproblem "lazily". In the legacy callback, before using the subproblem check to see if it exists. If not, create it. Here's some sample code.<br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;">IloCplex s<span style="color: black;"> = </span><span style="color: black;">subproblem<span style="color: black;">.</span><span style="color: #010181;">get</span><span style="color: black;">()</span>;</span> <span style="color: #838183; font-style: italic;">// get the subproblem</span><br /><span style="color: black; font-weight: bold;">if</span> <span style="color: black;">(s</span><span style="color: black;"> ==</span> null<span style="color: black;">) {</span><br /> <span style="color: #838183; font-style: italic;">// First use: need to generate a fresh subproblem.</span><br /> s <span style="color: black;">= ...;</span> <span style="color: #838183; font-style: italic;">// code to generate a new subproblem</span><br /> subproblem<span style="color: black;">.</span><span style="color: #010181;">set</span><span style="color: black;">(</span>s<span style="color: black;">);</span><br /><span style="color: black;">}</span><br /><span style="color: #838183; font-style: italic;">// Do Benders stuff with s.</span></pre><h3>Is cloning an option for both legacy and generic callbacks?</h3><br />No. I don't think cloning can be used with generic callbacks. In Java, objects can be cloned. CPLEX declares legacy callbacks as Java objects, but it declares generic callbacks by means of an interface. Cloning an interface is not really a "thing" in Java.<br /><br />When you use a generic callback, you create a class that implements the <span style="font-family: "courier new" , "courier" , monospace;">Callback</span> interface. It's certainly possible to make that class cloneable, but according to my experiments CPLEX will not call the <span style="font-family: "courier new" , "courier" , monospace;">clone()</span> method on that class, even if it exists. So the solver will be working with a single instance of that class, and if the class is not thread-safe, kaboom.<br /><br /><h3>What does cloning entail?</h3><br />There are quite a few web sites that explain cloning in Java, and if you intend to use it I recommend you do a search. I'll just cover the bare bones here.<br /><ol><li>Declare your class with the modifier "<span style="font-family: "Courier New", Courier, monospace;">implements Cloneable</span>".</li><li>Override the protected method <span style="font-family: "Courier New", Courier, monospace;">Object.clone</span>.</li><li>Call the parent method (<span style="font-family: "Courier New", Courier, monospace;">super.clone()</span>) as the very first executable line of the <span style="font-family: "Courier New", Courier, monospace;">clone()</span> method.</li><li>Handle the <span style="font-family: "Courier New", Courier, monospace;">CloneNotSupportedException</span> that the parent method might hypothetically throw, either by catching it and doing something, or by rethrowing it.</li><li>After calling the parent method, fix anything that needs fixing.</li></ol>I left that last step rather vague. With a Benders lazy constraint callback, you will need to replace the original subproblem with a freshly generated version (so that no two threads are chewing on the same subproblem instance). If life were fair (it's not), you would just do a deep clone of the original problem. The catch is that the <span style="font-family: "Courier New", Courier, monospace;">IloCplex</span> class is <i>not</i> cloneable. So the best (only?) alternative I can see is to build a new copy of the subproblem, the same way you built the original copy.<br /><br />If you have other class fields that the callback might modify (such as a vector that stores the best feasible solution encountered), you <i>may</i> need to do a deep clone of them (or replace them with new versions). A detailed analysis of the differences between shallow and deep cloning is beyond the scope of this post (or my competence). As Daniel points out in his answer, <span style="font-family: "Courier New", Courier, monospace;">super.clone()</span> only makes a shallow copy. You'll need to take additional steps to make sure that fields containing arrays and other objects (other than primitives) are handled properly if the callback might modify them.<br /><br />Here is some skeleton code.<br /><br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: black; font-weight: bold;">public class</span> MyCallback <span style="color: black; font-weight: bold;">extends</span> IloCplex<span style="color: black;">.</span>LazyConstraintCallback<br /> <span style="color: black; font-weight: bold;">implements</span> Cloneable <span style="color: black;">{</span><br /> <span style="color: black; font-weight: bold;">private</span> IloCplex subproblem<span style="color: black;">;</span><br /><br /> <span style="color: #838183; font-style: italic;">// ...</span><br /><br /> <span style="color: #838183; font-style: italic;">/**</span><br /><span style="color: #838183; font-style: italic;"> * Clone this callback.</span><br /><span style="color: #838183; font-style: italic;"> * @return the clone</span><br /><span style="color: #838183; font-style: italic;"> * @throws CloneNotSupportedException if the parent clone method bombs</span><br /><span style="color: #838183; font-style: italic;"> */</span><br /> <span style="color: black; font-weight: bold;">@Override</span><br /> <span style="color: black; font-weight: bold;">public</span> MyCallback <span style="color: #010181;">clone</span><span style="color: black;">()</span> <span style="color: black; font-weight: bold;">throws</span> CloneNotSupportedException <span style="color: black;">{</span><br /> <span style="color: #838183; font-style: italic;">// Call Object.clone first.</span><br /> MyCallback cb <span style="color: black;">= (</span>CloneCallback<span style="color: black;">)</span> <span style="color: black; font-weight: bold;">super</span><span style="color: black;">.</span><span style="color: #010181;">clone</span><span style="color: black;">();</span><br /> <span style="color: #838183; font-style: italic;">// Replace the subproblem with a fresh copy.</span><br /> subproblem <span style="color: black;">= ...;</span> <span style="color: #838183; font-style: italic;">// generate a fresh copy of the subproblem</span><br /> <span style="color: #838183; font-style: italic;">// Make deep copies (or new values) for other fields as needed.</span><br /> <span style="color: black; font-weight: bold;">return</span> cb<span style="color: black;">;</span><br /> <span style="color: black;">}</span><br /><br /><span style="color: black;">}</span></pre>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-21272939670347851502019-03-12T19:46:00.000-04:002019-03-12T19:46:29.443-04:00Pseudocode in LyX RevisitedThis post is strictly for users of <a href="https://www.lyx.org/" target="_blank">LyX</a>.<br /><br />In a <a href="https://orinanobworld.blogspot.com/2018/10/pseudocode-in-lyx.html" target="_blank">previous post</a> I offered up a LyX module for typesetting pseudocode. Unfortunately, development of that module bumped up against some fundamental incompatibilities between the <a href="https://ctan.org/pkg/algorithmicx" target="_blank"><span style="font-family: "Courier New", Courier, monospace;">algorithmicx</span> package</a> and the way LyX layouts work. The repository for it remains open, but I decided to shift my efforts to a different approach.<br /><br />LyX has built-in support for "program listings", insets that use either the <a href="https://ctan.org/pkg/listings?lang=en" target="_blank">listings</a> (default choice) or <a href="https://ctan.org/pkg/minted" target="_blank">minted</a> LaTeX package. The listings package supports a lengthy list of programming languages, but not pseudocode. This makes sense because pseudocode has no fixed syntax; everyone writes it their own way. So I cobbled together a LyX module that adds my take on a pseudocode language to the listings package, allowing users to type pseudocode into a LyX program listing inset.<br /><br />The module is available (under a GPL3 license) from a <a href="https://github.com/prubin73/pseudolst" target="_blank">GitHub repository</a>. The repository includes a user guide (both a LyX file and a PDF version) explaining both installation and the nuances of using the module. (Yes, there are some finicky bits.) The user guide also spells out what words are treated as keywords and formatted accordingly. I also built in a (somewhat clumsy) way to insert vertical lines to help the user recognize the span of a block of code.<br /><br />The listings package provides (and LyX supports) a number of nice formatting features, including options for color-coding keywords, numbering lines and such. In addition, you can modify the list of recognized keywords by editing the module file in a text editor. If you are familiar with the listings package, you can customize the pseudocode language definition even further (for instance, by changing the delimiters for comments), again by hacking the module file. If you use the module, feel free to suggest changes to the language definition via the issue tracker for the repository.<br /><br />Here is a small sample of the output, to give you an idea of what I am talking about. (I had to convert the PDF output to an image, which explains the slight fuzziness.)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-mAdBM5_QX98/XIhDfyal4PI/AAAAAAAAC28/vkobreGxLSURXc7TRFpGoSJ3e_7ZtQHEwCLcBGAs/s1600/euclid.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="206" data-original-width="338" src="https://2.bp.blogspot.com/-mAdBM5_QX98/XIhDfyal4PI/AAAAAAAAC28/vkobreGxLSURXc7TRFpGoSJ3e_7ZtQHEwCLcBGAs/s1600/euclid.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div>While we're on the subject of LyX, I also have a layout file for the <a href="https://ctan.org/pkg/standalone?lang=en" target="_blank">standalone</a> LaTeX class, which is useful for generating PDF output of just one table, image or whatever without the wasted space of page margins. That layout file has its own <a href="https://github.com/prubin73/standalone" target="_blank">GitHub repository</a>, if you are interested.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-79665856350403854332019-02-24T18:29:00.002-05:002019-02-25T17:30:28.222-05:00Guessing Pareto Solutions: A TestIn <a href="https://orinanobworld.blogspot.com/2019/02/guessing-pareto-solutions.html" target="_blank">yesterday's post</a>, I described a simple multiple-criterion decision problem (binary decisions, no constraints), and suggested a possible way to identify a portion of the Pareto frontier using what amounts to guesswork: randomly generate weights; use them to combine the multiple criteria into a single objective function; optimize that (trivial); repeat <i>ad nauseam</i>.<br /><br />I ran an experiment, just to see whether the idea was a complete dud or not. Before getting into the results, I should manage expectations a bit. Specifically, while it's theoretically possible in some cases that you might find all the Pareto efficient solutions (and is in fact guaranteed if there is only one), in general you should not bank on it. Here's a trivial case that demonstrates that it may be impossible to find them all using my proposed technique. Suppose we have three yes-no decisions ($N=3$, using the notation from yesterday's post) and two criteria ($M=2$), one to be maximized and one to be minimized. Supposed further that, after changing the sign of the second criterion (so that we are maximizing both) and scaling, that all three decisions produce identical criterion values of $(1,-1)$. With $N=3$, there are $2^3=8$ candidate solutions. One (do nothing) produces the result $(0,0)$. Three (do just one thing) produce $(1,-1)$, three (do any two things) produce $(2,-2)$ and one (do all three things) produces $(3,-3)$. All of those are Pareto efficient! However, if we form a weighted combination using weights $(\alpha_1, \alpha_2) \gg 0$ for the criteria, then every decision has the same net impact $\beta = \alpha_1 - \alpha_2$. If $\beta > 0$, the only solution that optimizes the composite function is $x=(1,1,1)$ with value $3\beta$. If $\beta<0$, the only solution that optimizes the composite function is $x=(0,0,0)$ with value $0$. The other six solutions, while Pareto efficient, cannot be found my way.<br /><br />With that said, I ran a single test using $N=10$ and $M=5$, with randomly generated criterion values. One test like this is not conclusive for all sorts of reasons (including but not limited to the fact that I made the five criteria statistically independent of each other). You can see (and try to reproduce) my work in an <a href="https://msu.edu/~rubin/code/pareto.nb.html" target="_blank">R notebook</a> hosted on my web site. The code is embedded in the notebook, and you can extract it easily. Fair warning: it's pretty slow. In particular, I enumerated all the Pareto efficient solutions among the $2^{10} = 1,024$ possible solutions, which took about five minutes on my PC. I then tried one million random weight combinations, which took about four and a half minutes.<br /><br /><b>Correction</b>: After I rejiggered the code, generating a million random trials took only 5.8 seconds. The bulk of that 4.5 minutes was apparently spent keeping track of how often each solution appeared among the one million results ... and even that dropped to a second or so after I tightened up the code.<br /><br />To summarize the results, there were 623 Pareto efficient solutions out of the population of 1,024. The random guessing strategy only found 126 of them. It found the heck out of some of them: the maximum number of times the same solution was identified was 203,690! (I guess that one stuck out like a sore thumb.) Finding 126 out of 623 may not sound too good, but bear in mind the idea is to present a decision maker with a reasonable selection of Pareto efficient solutions. I'm not sure how many decision makers would want to see even 126 choices.<br /><br />A key question is whether the solutions found by the random heuristic are representative of the Pareto frontier. Presented below are scatter plots of four pairs of criteria, showing all the Pareto efficient solutions color-coded according to whether or not they were identified by the heuristic. (You can see higher resolution versions by clicking on the link above to the notebook, which will open in your browser.) In all cases, the upper right corner would be ideal. Are the identified points a representative sample of all the Pareto efficient points? I'll let you judge.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-dIISelf6iJc/XHMn1b68J9I/AAAAAAAAC18/tPuG31q-gi4ybS337QwqF5b_bCs9xGkVACLcBGAs/s1600/plot1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="430" data-original-width="495" height="277" src="https://4.bp.blogspot.com/-dIISelf6iJc/XHMn1b68J9I/AAAAAAAAC18/tPuG31q-gi4ybS337QwqF5b_bCs9xGkVACLcBGAs/s320/plot1.png" width="320" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-H5Y1X6LJZus/XHMn1bTNmeI/AAAAAAAAC2E/o4SZCKkhIUcvrIg9wuAXKp2dbaurt9SpgCLcBGAs/s1600/plot3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="430" data-original-width="495" height="277" src="https://4.bp.blogspot.com/-H5Y1X6LJZus/XHMn1bTNmeI/AAAAAAAAC2E/o4SZCKkhIUcvrIg9wuAXKp2dbaurt9SpgCLcBGAs/s320/plot3.png" width="320" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-N-v3S8zAxq0/XHMn1QoUFtI/AAAAAAAAC14/IiMDPgbi0JoTrCjZ9at9Ay9-DXPfpU3zQCLcBGAs/s1600/plot2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="430" data-original-width="495" height="277" src="https://3.bp.blogspot.com/-N-v3S8zAxq0/XHMn1QoUFtI/AAAAAAAAC14/IiMDPgbi0JoTrCjZ9at9Ay9-DXPfpU3zQCLcBGAs/s320/plot2.png" width="320" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-GeejIDo_vXc/XHMn180muWI/AAAAAAAAC2A/Bc8T28J87-kpHeJnIp4-KHkT2wp3NVBAwCLcBGAs/s1600/plot4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="430" data-original-width="495" height="277" src="https://3.bp.blogspot.com/-GeejIDo_vXc/XHMn180muWI/AAAAAAAAC2A/Bc8T28J87-kpHeJnIp4-KHkT2wp3NVBAwCLcBGAs/s320/plot4.png" width="320" /></a></div><br />I'll offer one final thought. The weight vectors are drawn uniformly over a unit hypercube of dimension $M$, and the frequency with which each identified Pareto solution occurs within the sample should be proportional to the volume of the region within that hypercube containing weights favoring that solution. So high frequency solutions have those high frequencies because wider ranges of weights favor them. Perhaps that makes them solutions that are more likely to appeal to decision makers than their low frequency (or undiscovered) counterparts?Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-30837209852963578912019-02-23T14:53:00.003-05:002019-02-23T14:53:30.958-05:00Guessing Pareto SolutionsOne of the challenges of <a href="https://en.wikipedia.org/wiki/Multiple-criteria_decision_analysis" target="_blank">multiple-criteria decision-making (MCDM)</a> is that, in the absence of a definitive weighting or prioritization of criteria, you cannot talk meaningfully about a "best" solution. (Optimization geeks such as myself tend to find that a major turn-off.) Instead, it is common to focus on <a href="https://en.wikipedia.org/wiki/Pareto_efficiency" target="_blank">Pareto efficient</a> solutions. We can say that one solution A dominates another solution B if A does at least as well as B on all criteria and better than B on at least one criterion. A solution is Pareto efficient if no solution dominates it.<br /><br />Recently I was chatting with a woman about a multiple-criteria model she had. The variables were all binary (yes-no decisions) and were otherwise unconstrained. Normally my hackles would go up at an unconstrained decision problem, but in this case all constraints were "soft". For example, rather than specify a fixed budget, she would make cost one of the criteria (less being better) and let the decision maker screen out Pareto efficient solutions with unrealistic budgets.<br /><br />I won't name the researcher or go into more details, since her work is currently under review (as of this writing), but I will look at a similar problem that is only slightly simpler. Namely, I will add the requirement that the criteria are additive, meaning that the value of any criterion for any "yes" does not depend on which other decisions wound up "yes" rather than "no". So, in mathematical terms, we have $N$ binary decisions ($x\in\{0,1\}^N$) and $M$ criteria. Let $c_{ij}$ denote the value of criterion $j$ when $x_i=1$. In what follows, I will want to be consistent in maximizing (rather than minimizing) criterion values, so I'll assume that (a) a "no" decision has value 0 for every criterion and (b) criteria are normalized so that $c_{ij}\in [0,1]$ for all $i$ and for all $j$ such that more is better, while $c_{ij}\in [-1,0]$ for all $i$ and for all $j$ such that (before scaling) less is better. For example, if cost is a criterion (and if you do not work for the Pentagon), the scaled values of cost will run from -1 for the most expensive choice encountered to 0 for any choice that is free.<br /><br />The solution space contains $2^N$ candidate solutions, which means that enumerating them is out of the question for nontrivial values of $N$ (and, trust me, hers are nontrivial ... very, very nontrivial). In any case, the size of the Pareto frontier when $N$ is nontrivial will quickly outpace any decision maker's capacity (or patience). So it seems reasonable to sample the Pareto frontier and present the solution maker with a reasonably (but not excessively) large set of Pareto efficient solutions that hopefully is representative of the overall Pareto frontier.<br /><br />There are methods for doing this, and the researcher pointed me toward one: the NSGA-II genetic algorithm [1]. Like any genetic algorithm, NSGA-II runs until you stop it via some limit (run time, number of generations, some measure of stagnation, ...). As I understand it, the final population of solutions are not guaranteed to be Pareto efficient, but the population converges in some sense to the Pareto frontier. That would likely be sufficient in practice, and the researcher had good luck with it initially, but ran into scaling problems as $N$ grew.<br /><br />Looking at the problem reminded me of a sometimes troubling dichotomy faced by academics (in at least some disciplines, including operations research). On the one hand, real-world decision makers with real-world problems are often quite happy to get solutions that are obtained quickly and cheaply and are easily explained or understood, even if the technology is "primitive". Academics, on the other hand, usually need to publish their research. In the domain of the problem, low-tech algorithms may be perfectly fine (and easier to understand than more esoteric algorithms), but for a professor in IE or OR looking to publish in a journal in his or her field, "simple" usually means "too unsophisticated to publish". Thus we tend at times to "over-engineer" solutions.<br /><br />Anyway, taking off my academic hat for a moment, I got to wondering whether a no-brainer random search approach would work for a problem such as this one. My premise was the following. Suppose I randomly select weights $\alpha_1,\dots,\alpha_M \in (0,1]$ and use them to form a single weighted criterion function $f_\alpha:\{0,1\}^N \rightarrow \Re$ as follows:$$f_\alpha (x)=\sum_{i=1}^N w_i(\alpha) x_i$$where$$w_i(\alpha)=\sum_{j=1}^M \alpha_j c_{ij}.$$Note that the weights $\alpha_i$ need to be <i>strictly positive</i>, not just nonnegative. Maximizing $f_\alpha$ over the solution space is trivial: set $x_i=1$ if $w_i(\alpha) > 0$ and $x_i=0$ if $w_i(\alpha) < 0$. (You can set $x_i$ to anything you like if $w_i(\alpha) = 0$, but that should have probability 0 of occurring given the random generation of $\alpha$.) Constructing the coefficients of $f_\alpha$ is $O(MN)$ and optimizing $f_\alpha$ is $O(N)$, so each solution $x$ produced this way takes $O(MN)$ time. In practice, $M$ is likely not to get too big, and very likely does not grow as $N$ does, so functionally this is really $O(N)$ in complexity.<br /><br />It's trivial to show that, so long as none of the composite weights $w_i(\alpha)$ is 0, each $x$ found by optimizing some $f_\alpha$ must be Pareto efficient. So we can generate a sample of Pareto efficient solutions fairly efficiently ... maybe. I did deal one card off the bottom of the deck. Solutions will tend to repeat, so we will need to do some bookkeeping to ensure that, in the final results, duplicates are weeded out. That means comparing each new solution to its predecessors. Each comparison is $O(N)$, and if we assume that we only retain a "reasonable" number of alternatives (a number that does not grow with $N$) in order to avoid causing the decision maker's brain to explode, then time spent weeding duplicates should be $O(N)$. Overall, the load is $O(NR)$, where $R$ is the number of replications performed.<br /><br />That left me with a few questions, the main one being this: for plausible sample sizes, how well does the set of solutions obtained represent the Pareto frontier? Is it "biased" in the sense that solutions cluster on one part of the frontier while missing other parts?<br /><br />I'll show some experimental results in my next post, and let you decide.<br /><br /><br />[1] K. Deb, A. Pratap, S. Agarwal & T. Meyarivan (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. <b>IEEE Transactions on Evolutionary Computation</b>, <i>6</i>, 182-197.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-6957851638559641162019-02-14T17:03:00.000-05:002019-02-14T17:03:22.669-05:00Binary / Integer Conversion in RI've been poking around in R with an unconstrained optimization problem involving binary decision variables. (Trust me, it's not as trivial as it sounds.) I wanted to explore the entire solution space. Assuming $n$ binary decisions, this means looking at the set $\{0,1\}^n$, which contains $2^n$ 0-1 vectors of dimension $n$.<br /><br />For reasons I won't get into, I wanted to associate each vector with an ID number from 0 to $2^n-1$, which sounds pretty straightforward ... until you try to do it. I thought I'd post my code here, with no warranty ("express or implied", as the legal eagles would say) that it's computationally efficient.<br /><br />Before I explain the code, let me just point out that using an integer from 0 to $2^n-1$ implies that $n$ is small enough that $2^n-1$ fits in a machine integer. So if you decide to try the code, don't go nuts with it.<br /><br />Here's the code:<br /><br /><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: #838183; font-style: italic;"># Create a vector of powers of 2 (for use in conversions from binary vectors to integers).</span><br />powers<span style="color: black;">.</span>of<span style="color: black;">.</span>two <span style="color: black;"><-</span> <span style="color: #b07e00;">2</span>^<span style="color: black;">(</span><span style="color: #b07e00;">0</span><span style="color: black;">:(</span>n <span style="color: black;">-</span> <span style="color: #b07e00;">1</span><span style="color: black;">))</span><br /><span style="color: #838183; font-style: italic;"># Convert an ID (integer) to a binary vector of appropriate length. Note that the vector is reversed so that the lowest order bit (corresponding to the first decision) comes first.</span><br />fromID <span style="color: black;"><-</span> <span style="color: black; font-weight: bold;">function</span><span style="color: black;">(</span>id<span style="color: black;">) {</span> as<span style="color: black;">.</span><span style="color: #010181;">integer</span><span style="color: black;">(</span><span style="color: #010181;">head</span><span style="color: black;">(</span><span style="color: #010181;">intToBits</span><span style="color: black;">(</span>id<span style="color: black;">),</span> n<span style="color: black;">)) }</span><br /><span style="color: #838183; font-style: italic;"># Convert a binary vector of appropriate length to an ID value (integer).</span><br />toID <span style="color: black;"><-</span> <span style="color: black; font-weight: bold;">function</span><span style="color: black;">(</span>vec<span style="color: black;">) {</span> as<span style="color: black;">.</span><span style="color: #010181;">integer</span><span style="color: black;">(</span>vec <span style="color: black;">%*%</span> powers<span style="color: black;">.</span>of<span style="color: black;">.</span>two<span style="color: black;">) }</span> </pre><br />To facilitate converting binary vectors to integers, I store all the powers of 2 once. The <span style="font-family: "courier new" , "courier" , monospace;">fromID()</span> function takes an integer ID number and converts it to a binary vector of length $n$. It uses the <span style="font-family: "courier new" , "courier" , monospace;">intToBits()</span> function from the R base package, which does the heavy lifting but whose output needs a little massaging. The <span style="font-family: "courier new" , "courier" , monospace;">toID()</span> function takes a binary vector and converts it to an integer ID number. So, with $n=5$, the output of <span style="font-family: "courier new" , "courier" , monospace;">fromID(22)</span> is<br /><blockquote class="tr_bq"><pre>[1] 0 1 1 0 1</pre></blockquote>while the output of <span style="font-family: "courier new" , "courier" , monospace;">toID(c(0, 1, 1, 0, 1))</span> is<br /><blockquote class="tr_bq"><pre>[1] 22</pre></blockquote>(as you would expect).<br /><br />There is one other thing I should point out: because I am using the ID numbers to index binary vectors starting from (0, 0, ..., 0) and working up to (1, 1, ..., 1), I designed the functions to put the least significant bit first. If you want the most significant bit first and the least significant bit last, you need to make two changes to the code: change the definition of <span style="font-family: "courier new" , "courier" , monospace;">powers.of.two</span> to <span style="font-family: "courier new" , "courier" , monospace;">2^((n - 1):0)</span>; and, in the definition of <span style="font-family: "Courier New", Courier, monospace;">fromID</span>, change <span style="font-family: "Courier New", Courier, monospace;">head(intToBits(id), n)</span> to <span style="font-family: "Courier New", Courier, monospace;">rev(head(intToBits(id), n))</span>.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-47331996830204639362019-01-25T17:31:00.000-05:002019-01-25T17:31:55.936-05:00Threads and MemoryYesterday I had a rather rude reminder (actually, two) of something I've known for a while. I was running a Java program that uses CPLEX to solve an integer programming model. The symptoms were as follows: shortly after the IP solver run started, I ran out of RAM, the operating system started <a href="https://en.wikipedia.org/wiki/Paging" target="_blank">paging memory</a> to the hard drive, and the resulting hard drive <a href="https://en.wikipedia.org/wiki/Thrashing_(computer_science)" target="_blank">thrashing</a> made the system extremely sluggish (to put it charitably). Long after I regained enough control to kill the program, there was a significant amount of disk activity (and concomitant noise) as the system gradually came back to its senses.<br /><br />How did this happen? My system has four cores, which means CPLEX defaults to running four parallel threads when solving models. What's not always obvious is that each thread gets a separate copy of the model. (I'm not sure if it is the entire model after presolving or just most of it, but it's definitely a large chunk.) In my particular case, the model begins with an unpredictably large number of constraints, and when that number got big, the model got big -- not big enough to be a problem if I used a single thread, but too big to get away with four copies of it ... or, as it turned out, three copies. (The second thrashing event was when I tried to run it with three threads.)<br /><br />Parallel threading is great, but there are two caveats associated with it. First, performance is a sublinear function of the number of threads, meaning that doubling the number of threads will not cut run time in half, tripling the number of threads will not cut run time to a third the single-thread time, and so on. Second, if you are dealing with a largish model, you might want to try running a little while with a single thread to see how much memory it eats, and then decide how many copies your system can handle comfortably. That's an upper bound on how many threads you should use.<br /><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2