OR in an OB World
A mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere.
Paul A. Rubin

Partitioning an Interval (2022-06-17)

<p>I recently saw a question on a help forum that fits the following pattern. The context is an optimization model (mixed integer program). There is a continuous variable $x$ and a constant value $a$ (frequently but not always 0), and we want to distinguish among three cases: $x&lt;a;$ $x=a;$ or $x&gt;a.$ Each case has separate consequences within the model, which I will call $C_1,$ $C_2$ and $C_3$ respectively. Variations on this question arise frequently. For instance, make $x$ the difference of two variables, set $a=0,$ and you are distinguishing whether the difference is negative, zero or positive.</p><p>To make things work properly, we need $x$ to be bounded, say $L\le x \le U.$ The goal is to break the feasible range of $x$ into three intervals. To do so, we will introduce a trio of binary variables, $y_1, y_2, y_3,$ together with the constraint $$y_1 + y_2 + y_3 = 1.$$ We will come up with constraints such that fixing $y_1 = 1$ puts you in the left portion of the domain, fixing $y_2 = 1$ puts you in the middle portion, and fixing $y_3 = 1$ puts you in the right portion. The $y$ variables are then used elsewhere in the model to enforce whichever condition $C$ applies.</p><p>I'm using three variables for clarity. You can get by with two binary variables (changing the equality constraint above to $\le$), where both variables equaling zero defines one of the domain intervals. 
Figure 1 shows what the user wants and how the $y$ variables relate to it.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2iGQSQlqq5RFQIwGlV5MQUn01XnRyxM_wu1NoY56dimbdzy6Mydl393vtc_HMNcL9kZWCjscuBGB3fsV6_dYr7ENv8JfAcZPKODPaHH3z2YtU6x5yuCFy_wxqotXUeBQkdhdmSYjsvJhDZEC8c5hnbg7OVoo2mBarCWoqRJtpM94V5cqe5aXpQjwE/s600/interval1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="intervals desired by user" border="0" data-original-height="116" data-original-width="600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2iGQSQlqq5RFQIwGlV5MQUn01XnRyxM_wu1NoY56dimbdzy6Mydl393vtc_HMNcL9kZWCjscuBGB3fsV6_dYr7ENv8JfAcZPKODPaHH3z2YtU6x5yuCFy_wxqotXUeBQkdhdmSYjsvJhDZEC8c5hnbg7OVoo2mBarCWoqRJtpM94V5cqe5aXpQjwE/s16000/interval1.png" title="Figure 1" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 1<br /></td></tr></tbody></table><p></p><p>The intervals for $x$ are $[L, a)$ when $y_1 = 1$, the singleton $[a, a]$ when $y_2 = 1,$ and $(a, U]$ when $y_3 = 1.$ Unfortunately, the user cannot have what the user wants because the feasible region of a MIP model must be closed, and the first and third intervals in Figure 1 are not closed. 
Closing them results in Figure 2.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguc4EcMxyV5Ds17Ig1DvM-OQmPbf6TrmWip58C79ferrsI_sUKobUzO_bu7t2kGwk6tF-DjDQhXoQluT6-DcUTmAOcOK56cUjIIkjMocNLseo18NXgMUTxSvn-_esRiuyZDknyCsV5O2iMUP3swDyqgcZXIZb0eAZ3sNbeDbx_ym6Un1EiJnEusIih/s600/interval2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="closed intervals that intersect" border="0" data-original-height="104" data-original-width="600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguc4EcMxyV5Ds17Ig1DvM-OQmPbf6TrmWip58C79ferrsI_sUKobUzO_bu7t2kGwk6tF-DjDQhXoQluT6-DcUTmAOcOK56cUjIIkjMocNLseo18NXgMUTxSvn-_esRiuyZDknyCsV5O2iMUP3swDyqgcZXIZb0eAZ3sNbeDbx_ym6Un1EiJnEusIih/s16000/interval2.png" title="Figure 2" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2<br /></td></tr></tbody></table><p>The intervals for $x$ are now $[L, a],$ $[a, a]$ and $[a, U].$ The good news is that this decomposition is easy to accomplish. The bad news is that the user likely will not accept it. If $x=a$, any one of the $y$ variables could take the value 1, so any of the three conditions could be triggered.<br /></p><p>This brings us to the first (and more common) of two possible compromises. We can change the strict inequality $x &gt; a$ to the weak inequality $x \ge a + \epsilon$ where $\epsilon$ is some small positive number, and do something analogous with the left interval. 
The result is Figure 3.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9A6i6Lt9658tFJ02rsi6rMgtYRZLKGRMA4yTFuUTGntKitoKEWYzVz3zIgq52AdprWNwLE9PbO2KfxX7twYaYkC9Zi5Vbv97z8cv47CKKmn1hhXenBmk-YKsUu-JE3Ri9uc7IXRvjNic9j2u5S2w76w5z7aMC7jYkZmdUXldvcgjV7Kc97KLKNNdJ/s598/interval3.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="intervals with gaps on either side" border="0" data-original-height="109" data-original-width="598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9A6i6Lt9658tFJ02rsi6rMgtYRZLKGRMA4yTFuUTGntKitoKEWYzVz3zIgq52AdprWNwLE9PbO2KfxX7twYaYkC9Zi5Vbv97z8cv47CKKmn1hhXenBmk-YKsUu-JE3Ri9uc7IXRvjNic9j2u5S2w76w5z7aMC7jYkZmdUXldvcgjV7Kc97KLKNNdJ/s16000/interval3.png" title="Figure 3" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 3<br /></td></tr></tbody></table><p>Now $y_1=1 \implies x \le a-\epsilon,$ $y_2 = 1 \implies x=a,$ and $y_3=1 \implies x \ge a + \epsilon.$ The good news is that all three intervals are closed. The bad news is that values of $x$ between $a-\epsilon$ and $a$ or between $a$ and $a + \epsilon$ are suddenly infeasible, whereas they were feasible in the original problem. Also note that any tiny deviation from $a$ will throw $x$ into no man's land. 
That leads to one more possibility, involving yet another positive constant $\delta &lt; \epsilon$ and shown in Figure 4.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC7EiUDoVBdapsOTRfFKEyEeMaOkVU1z5LsViSm7dhQh9Xj9unKP9XLUFVsxaWXFcDzp1Gefd-pvKMFfBvD2rUcBM7AT-MOE9LmtqH6D27etWApi4DyttIH_wldG1RqyN3qGF0CodUefJp3EXu7UkKMLkZDsBA7jzSAURfEiUtXQgwZU47ujfhvbl_/s600/interval4.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="domain broken into three intervals" border="0" data-original-height="112" data-original-width="600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC7EiUDoVBdapsOTRfFKEyEeMaOkVU1z5LsViSm7dhQh9Xj9unKP9XLUFVsxaWXFcDzp1Gefd-pvKMFfBvD2rUcBM7AT-MOE9LmtqH6D27etWApi4DyttIH_wldG1RqyN3qGF0CodUefJp3EXu7UkKMLkZDsBA7jzSAURfEiUtXQgwZU47ujfhvbl_/s16000/interval4.png" title="Figure 4" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 4<br /></td></tr></tbody></table>In this version, the domain of $x$ is broken into the closed intervals $[L, a-\epsilon],$ $[a-\epsilon +\delta, a+\epsilon - \delta]$ and $[a+\epsilon, U].$ There are still values of $x$ that were feasible in the original model but no longer feasible, but now you get to count values of $x$ "close" to $a$ as signalling the second of your three conditions.<br /><p>The algebra to put either Figure 3 or Figure 4 into force is actually pretty straightforward. You just need to write a pair of inequalities $\dots \le x \le \dots$ where the left side expression is the sum of each $y$ variable times the lower bound that would apply when that variable is 1 and the right expression is the sum of each $y$ variable times the upper bound that would apply. 
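As a quick sanity check of this recipe, here is a stand-alone sketch (my own illustrative values of $L$, $U$, $a$ and $\epsilon$, applied to the Figure 3 intervals; none of this comes from an actual model) that prints the bounds implied by fixing each $y_k = 1$:

```java
// Sketch of the linking-inequality recipe: with one-hot y, the pair of
// inequalities sum(lower[k]*y[k]) <= x <= sum(upper[k]*y[k]) collapses to
// the bounds of the selected interval. Values below are illustrative.
public class IntervalPartition {
    // lower[k], upper[k]: bounds on x that should apply when y_k = 1
    public static double[] impliedBounds(double[] lower, double[] upper, int k) {
        double lb = 0.0, ub = 0.0;
        for (int j = 0; j < lower.length; j++) {
            double y = (j == k) ? 1.0 : 0.0; // one-hot y vector
            lb += lower[j] * y;
            ub += upper[j] * y;
        }
        return new double[] { lb, ub };
    }

    public static void main(String[] args) {
        double L = -5, U = 5, a = 0, eps = 0.01;
        double[] lower = { L, a, a + eps };  // Figure 3 intervals:
        double[] upper = { a - eps, a, U };  // [L,a-eps], [a,a], [a+eps,U]
        for (int k = 0; k < 3; k++) {
            double[] b = impliedBounds(lower, upper, k);
            System.out.println("y_" + (k + 1) + " = 1  ->  " + b[0] + " <= x <= " + b[1]);
        }
    }
}
```

The same recipe covers Figure 4 by substituting its interval endpoints.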
So for Figure 3, where the intervals are $[L, a-\epsilon],$ $[a,a]$ and $[a+\epsilon, U]$, we would have $$Ly_1 + a y_2 + (a+\epsilon)y_3 \le x \le (a-\epsilon)y_1 + a y_2 + Uy_3.$$ For Figure 4, the constraints would be $$Ly_1 + (a-\epsilon+\delta)y_2 + (a+\epsilon)y_3 \le x \le (a-\epsilon)y_1 + (a+\epsilon-\delta) y_2 + Uy_3.$$</p><p>There is one "tactical" point to consider. When this is first explained to someone who was unaware that strict inequalities are verboten, they are often tempted to set $\epsilon = 10^{-G}$ where $G$ is the gross domestic product of their native country. The problem is that the solver is using finite precision computer arithmetic, and every solver has a tolerance value $\tau$ (small but not zero) such that coming within $\tau$ of satisfying a constraint is "close enough". If you pick $\epsilon$ (or $\delta$) too small, it has the same consequence as setting it equal to 0 in terms of how valid the solutions are. You can end up with a value of $a$ for $x$ (to within rounding error) triggering consequence $C_1$ or $C_3$ rather than $C_2$, or a value of $x$ strictly greater than $a$ triggering $C_2$ (or possibly even $C_1$), and so on. So $\epsilon$ (and, if needed, $\delta$) must be greater than the rounding tolerance used by the solver.<br /></p>Paul A. Rubin

Allocating Resource Sets (2022-05-24)

<p>The following problem was <a href="https://or.stackexchange.com/questions/8435/how-to-model-this-user-packing-problem" target="_blank">posted</a> on Operations Research Stack Exchange. A system contains $K$ users and $N$ resources. Each unit of resource may be allocated to at most one user. Each user $k$ requires a specified number $d_k$ of resource units, <i>which must be consecutive</i>. 
Not all resource sequences of the proper length will be acceptable to a user. Citing an example from the original post, you might have $d_1=3,$ with user 1 accepting only one of the following sets: $\{1, 2, 3\},$ $\{7, 8, 9\},$ $\{11, 12, 13\},$ or $\{17, 18, 19\}.$ The objective is to assign resources to as many users as possible.</p><p>Formulating this as an integer linear program is straightforward, and I coded the IP model in Java (using CPLEX as the solver) to provide a benchmark. The author of the question, however, was looking for heuristic solutions. I suggested a few possibilities, and decided to code two of them just to see how well they did. The setup for all of them is identical. You start with the following elements:</p><ul style="text-align: left;"><li>the set of unsatisfied users (initially, all users);</li><li>an enumeration of all resource sets compatible with any user (with each such set represented by an integer ID number);</li><li>a mapping of each user ID to the set of IDs of all compatible resource sets;</li><li>a mapping of each resource set ID to the set of IDs of all users who would accept that set; and</li><li>a mapping of each resource set ID to the set of IDs of all other resource sets that conflict with that set.</li></ul><p>Two resource sets conflict if they intersect, in which case allocating one of them precludes allocating the other since each unit of resource can be used at most once.</p><p>All the heuristics I suggested work by finding a valid allocation (an unused resource set that does not conflict with any resource set already used and a user who has not yet been served and will accept that resource set), making that allocation, and then updating the mappings described above. Updating the mappings means removing the user who just received resources, removing the resource set just assigned, and removing any other unassigned resource sets that conflict with the set just assigned. 
Those changes can have "ripple effects": removing a user may result in a surviving resource set having no compatible users left (in which case the resource set can be removed), and removing a resource set (whether used or due to a conflict) may leave a user with no surviving resource sets that are compatible, in which case the user can be removed. So the update step involves some looping that must be coded carefully.</p><p>The various heuristics I suggested are distinguished by two fundamental choices. First, in each iteration do you pick an available resource set and then look for a user who will accept it, or do you pick a user who has not yet been served and then look for a compatible resource set that is still available? Second, regardless of order, what criteria do you use for selecting users and resource sets?</p><p>There are a lot of ways to make those choices, and I homed in on two. My first approach starts by selecting the available resource set with the fewest remaining conflicts and then choosing the compatible user having the fewest surviving options for resource sets. My second approach starts by selecting the remaining user with the fewest surviving options for resource sets and then choosing the compatible resource set with the fewest remaining conflicts. In all cases, ties are broken randomly. My logic is that picking the resource set with the fewest conflicts will leave the most surviving resource sets and (hopefully) allow more users to be served, and picking the user with the fewest remaining options will (hopefully) reduce the number of users who are frozen out because all compatible resource choices have already been allocated.</p><p>I'll take a moment to note here that the two approaches I just stated are intended to be one-pass heuristics (find a feasible allocation and stop). 
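As a condensed sketch of the first (resource-set-first) approach, with deterministic tie-breaking by ID standing in for the random tie-breaking described above, and with made-up data structures rather than the code in my repository:

```java
import java.util.*;

/** Greedy sketch: repeatedly pick the surviving resource set with the fewest
 *  conflicts, then the compatible user with the fewest surviving options.
 *  IDs and structures are illustrative; every set ID must appear as a key
 *  in both input maps. */
public class ResourceSetHeuristic {
    /** setToUsers: resource set ID -> IDs of users accepting that set;
     *  setConflicts: resource set ID -> IDs of conflicting (intersecting) sets.
     *  Returns user ID -> assigned resource set ID. */
    public static Map<Integer, Integer> allocate(Map<Integer, Set<Integer>> setToUsers,
                                                 Map<Integer, Set<Integer>> setConflicts) {
        // defensive copies, since we prune as we go
        Map<Integer, Set<Integer>> accept = new HashMap<>();
        setToUsers.forEach((s, u) -> accept.put(s, new HashSet<>(u)));
        Map<Integer, Set<Integer>> conflict = new HashMap<>();
        setConflicts.forEach((s, c) -> conflict.put(s, new HashSet<>(c)));
        Map<Integer, Integer> assignment = new HashMap<>();
        while (!accept.isEmpty()) {
            // resource set with fewest remaining conflicts (ties: smallest ID)
            int best = accept.keySet().stream()
                    .min(Comparator.comparingInt((Integer s) -> conflict.get(s).size())
                                   .thenComparing(s -> s)).get();
            // compatible user with fewest surviving resource-set options
            int user = accept.get(best).stream()
                    .min(Comparator.comparingInt((Integer u) ->
                            (int) accept.values().stream().filter(v -> v.contains(u)).count())
                         .thenComparing(u -> u)).get();
            assignment.put(user, best);
            // remove the chosen set, everything conflicting with it, and the served user
            Set<Integer> gone = new HashSet<>(conflict.get(best));
            gone.add(best);
            gone.forEach(accept::remove);
            accept.values().forEach(v -> v.remove(user));
            accept.values().removeIf(Set::isEmpty); // ripple: sets with no takers left
            conflict.values().forEach(v -> v.removeAll(gone));
        }
        return assignment;
    }
}
```

On a tiny instance (two sets sharing a unit plus one conflict-free set), the loop grabs the conflict-free set first and then one of the conflicting pair, which is exactly the intended behavior.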
You could rerun them with different random seeds, but that would only change the results (not necessarily for the better) if one or more ties had occurred during the first pass. Another possibility, which I did not bother to code, would be to select either resource set or user randomly at each stage, then select the other half of the assignment randomly from the compatible choices. I suspect that any single run of the random approach would likely do worse than what I described above, but you could keep rerunning the purely random heuristic for a specific number of iterations (or with a time limit) and track the best solution ever found. That might improve over the versions I tested.</p><p>Using the dimensions specified (in a comment, replying to me) by the author of the question, I did a small number of tests. In those tests, the resource-set-first heuristic consistently beat the user-first heuristic, and both exhibited what I would consider to be decent results. On three test runs with different seeds for the random problem generator, I got results of 14/13/11, 15/14/11 and 14/13/10, where the first number is the optimal number of customers served (from the IP model), the second is the number served by the resource-first heuristic, and the third is the number served by the customer-first heuristic. Run times for the heuristics on my humble PC (including setup times) were minuscule (under 20 milliseconds).</p><p>My Java code can be found <a href="https://gitlab.msu.edu/orobworld/resourcesets" target="_blank">here</a>.<br /></p>Paul A. 
Rubin

Wolf, Goat and Cabbage Part II (2022-03-13)

<p>This continues my <a href="https://orinanobworld.blogspot.com/2022/03/wolf-goat-and-cabbage.html" target="_blank">previous post</a> about the "Wolf, Goat and Cabbage" logic puzzle. Using a MIP model to solve it strikes me as somewhat analogous to hammering in a nail with the butt end of a screwdriver handle: it works, but maybe it's not the ideal tool. To me, a constraint solver would make more sense, so I coded up a model using IBM's CP Optimizer (henceforth "CPO", part of the CPLEX Optimization Suite). Unsurprisingly, it worked (after some debugging) and produced the same solution.</p><p>MIP solvers largely share the same "vocabulary" for building models (real / integer / binary variables, linear and maybe quadratic expressions, equality and inequality constraints). From my limited exposure to constraint solvers, there is significant variation in what a given solver does and does not understand. Some of that variation involves variable types. A very basic solver may understand logical (true/false) and integer variables, and maybe real-valued variables (although I'm not sure handling real variables is anywhere near universal). CPO (and presumably some other solvers) understands "interval variables", which as the name suggests represent intervals of discrete values (usually time intervals, though possibly location or some other aspect). Similarly, different solvers will understand different types of global constraints. 
I suspect that every CP solver worth mentioning understands "all different" (no two variables in a bunch of integer variables can take the same value), but some solvers will implement "no overlap" constraints (the time interval during which I am eating and the time interval during which I am showering cannot overlap) or precedence constraints (this job has to end before this other job can start). Those kinds of constraints make certain scheduling problems easier to solve with CP than with a MIP model.</p><p>Anyway, I'm not entirely new to CPO, though far from proficient, and I tripped over a few "features" while coding the puzzle. I wanted to use boolean (true/false) variables for certain things, such as whether an item had made it to the far side of the river (true) or was stuck on the near side (false). CPO lets you declare a boolean variable but then treats it as an integer variable, meaning that you need to think in terms of 0 and 1 rather than false and true. So you can't say "if $x$ then ..."; instead, you need to say "if $x = 1$ then ..." (and trust me, the syntax is clunkier than what I'm typing here). When you go to get the value of your boolean variable $x$ after solving the model, CPO returns a double precision value. CPLEX users will be used to this, because in a MIP model even integer variables are treated as double-precision during matrix computations. CP solvers, though, like to do integer arithmetic (as far as I know), so it's a bit unclear why my boolean variable has to be converted from double precision to integer (or boolean). Even odder is that, at least in the Java API, there is a method that returns the value of an integer variable as an integer <i>if you pass the name of the variable as the argument</i>, but if you pass the actual variable you are going to get a double. 
(Did a federal government panel design this?)</p><p>In any event, the logic of the CPO model is moderately straightforward, with constraints like "you can't carry something in the boat if it isn't on the bank the boat departs from" and "if the wolf and goat end up in the same place at the end of a period, the farmer better end up there too". Some things are a bit less clunky with CPO than with CPLEX. For instance, to figure out what (if anything) is in the boat at a given time, the MIP model requires binary variables subscripted by time and item index (1 if that item is in the boat on that trip, 0 if not). The CPO model just needs an integer variable for each time period whose value is either the index of the thing in the boat or a dummy value if the boat is empty. Furthermore, the nature of the variable automatically takes care of a capacity constraint. Since there is only one variable for what's in the boat, at most one thing (whatever that variable indicates) can be in the boat.</p><p>Some (most?) constraint solvers, including CPO, provide a way to use a variable as an index to another variable. In my code, the integer variable indicating what's in the boat at time $t$ is used to look up the location variable (near or far bank) for that item at time $t$ from a vector of location variables for all items at time $t$.</p><p>Anyway, the <a href="https://gitlab.msu.edu/orobworld/wolfgoatcabbage" target="_blank">code</a> in my repository has been updated to include the CPO model, and it's heavily commented in case you wanted to compare it to the MIP model.<br /></p>Paul A. 
Rubin

Wolf, Goat and Cabbage (2022-03-11)

<p>On Operations Research Stack Exchange, someone <a href="https://or.stackexchange.com/questions/7818/wolf-goat-and-cabbage-riddle-as-an-optimization-problem" target="_blank">asked</a> about a possible connection between the <a href="https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem" target="_blank">"wolf, goat and cabbage"</a> logic puzzle and Monge's <a href="https://en.wikipedia.org/wiki/Transportation_theory_(mathematics)#Monge_and_Kantorovich_formulations" target="_blank">optimal transport problem</a>. In the logic puzzle, a farmer has to get a wolf, a goat and a cabbage across a river using a boat that can only accommodate one of them (plus the farmer) at a time. If you leave the wolf and goat alone together at any point, the wolf eats the goat. If you leave the goat and cabbage alone together at any point, the goat eats the cabbage. Fortunately, the wolf has no appetite for cabbage and the cabbage does not seem to want to eat anything, else the problem would be infeasible.</p><p>Neither Monge's transport problem nor the more commonly taught (in my opinion) <a href="https://en.wikipedia.org/wiki/Transportation_theory_(mathematics)#Hitchcock_problem" target="_blank">Hitchcock transportation problem</a> directly applies, although you can (almost) treat the puzzle as a multiperiod commodity flow with an "inventory" of each item (wolf, goat, cabbage) on each side of the river. The "almost" part is that you need some of the variables to be integer, for two reasons. 
One is that the LP relaxation of the logic constraints (e.g., inventory of wolf + inventory of goat $\le 1$ on this side of the river at this time) will result in fractional values (we'll leave half a wolf and half a goat here and carry the other halves across the river, which would greatly diminish the values of both wolf and goat). The other is that the objective is to minimize the number of trips made. It would be tempting to just assign a cost of 1 to each movement of an object in either direction, but the catch is that you will occasionally cross with nothing in the boat (besides the farmer). Those "deadheading" trips count toward the objective, but it's tricky to assign a cost to a zero flow.</p><p>To fill in some idle time, I coded up a MIP model. Mind you, I am not advocating MIP as a way to solve problems like this; I just wanted to confirm my thinking (in particular, that an LP commodity flow model would have fractional solutions). Assume that the farmer arrives at the left bank at time 0 with all three items, and that each trip (in either direction) counts as one time unit, with the first trip occurring at time 1. We need to set an upper bound $T$ on the number of trips. Since the state of the system is described by the location of four things (counting the farmer), each having two possible locations (left bank, right bank), $T=2^4=16$ works. 
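As an aside (and entirely separate from the MIP), the state space is small enough that a plain breadth-first search settles the trip count directly. The stand-alone sketch below is mine, not part of the original post:

```java
import java.util.*;

/** Brute-force check of the puzzle: breadth-first search over the 2^4 states.
 *  State bits: farmer, wolf, goat, cabbage (0 = left bank, 1 = right bank). */
public class RiverBFS {
    static boolean safe(int s) {
        int farmer = s & 1, wolf = (s >> 1) & 1, goat = (s >> 2) & 1, cabbage = (s >> 3) & 1;
        if (wolf == goat && farmer != goat) return false;    // wolf eats goat
        if (goat == cabbage && farmer != goat) return false; // goat eats cabbage
        return true;
    }

    /** Minimum number of crossings from all-left (state 0) to all-right (state 15),
     *  or -1 if unreachable. */
    public static int minTrips() {
        int[] dist = new int[16];
        Arrays.fill(dist, -1);
        dist[0] = 0;
        Deque<Integer> queue = new ArrayDeque<>(List.of(0));
        while (!queue.isEmpty()) {
            int s = queue.poll();
            int farmer = s & 1;
            for (int item = 0; item < 4; item++) { // item 0 = farmer crosses alone
                if (item > 0 && ((s >> item) & 1) != farmer) continue; // must share a bank
                int next = s ^ 1 ^ (item > 0 ? (1 << item) : 0);       // cross the river
                if (safe(next) && dist[next] < 0) {
                    dist[next] = dist[s] + 1;
                    queue.add(next);
                }
            }
        }
        return dist[15];
    }
}
```

It finds the familiar seven-crossing solution, comfortably under the bound $T=16$.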
The set of items will be denoted $I=\lbrace W, G, C\rbrace.$ My formulation uses the following variables.</p><ul style="text-align: left;"><li>$L_{i,t}\in [0,1]$ and $R_{i,t}\in [0,1]$ are the inventories of item $i\in I$ at time $t\in \lbrace 0,\dots,T\rbrace$ on the left and right banks respectively, after any trip occurring in that period.</li><li>$x_{i,t}\in \lbrace 0,1 \rbrace$ and $y_{i,t}\in \lbrace 0,1 \rbrace$ are the amount of item $i$ crossing the river at time $t$ from left to right or from right to left, respectively.</li><li>$z_t\in \lbrace 0,1 \rbrace$ is 1 if transport is ongoing and 0 if we are done (the farmer and all three items are on the right bank).</li></ul><p>It would be perfectly fine (but unnecessary) to also make the inventory variables integer-valued; alternatively, we could make the inventory variables integer-valued and drop the integrality restrictions on the cartage variables ($x$ and $y$).</p><p>Some of the variables can be fixed at the outset.</p><ul style="text-align: left;"><li>We start with all inventory on the left bank, so $L_{i,0}=1$ and $R_{i,0}=0$ for all $i\in I.$&nbsp;</li><li>There is no trip at time 0, so $z_0=0$ and $x_{i,0}=y_{i,0}=0$ for all $i\in I$.</li><li>Trips alternate left-to-right (odd time periods) and right-to-left (even time periods), so $x_{i,t}=0$ for all $i\in I$ and for all even $t$ and $y_{i,t}=0$ for all $i\in I$ and for all odd $t$.<br /></li></ul><p>The objective function is to minimize the number of trips required. $$\min \sum_{t=1}^T z_t.$$</p><p>The constraints are rather straightforward.</p><ul style="text-align: left;"><li>The inventory on either bank in any period is the inventory on that bank from the preceding period, plus any arriving inventory and minus any departing inventory. 
So for $t\ge 1$ $$L_{i,t} = L_{i,t-1} - x_{i,t} + y_{i,t}\quad \forall i\in I$$ and $$R_{i,t} = R_{i,t-1} + x_{i,t} - y_{i,t}\quad \forall i\in I.$$</li><li>In an odd-numbered period (where the farmer ends up on the right bank), neither wolf and goat nor goat and cabbage can be on the left bank. So for odd $t$ $$L_{W,t} + L_{G,t} \le 1$$ and $$L_{G,t} + L_{C,t}\le 1.$$</li><li>In an even-numbered period (where the farmer ends up on the left bank), neither wolf and goat nor goat and cabbage can be on the right bank <i>unless</i> the problem is completed ($z_t = 0$), in which case the farmer remains on the right bank. So for even $t$ $$R_{W,t}+R_{G,t} + z_t \le 2$$ and $$R_{G,t} + R_{C,t} + z_t \le 2.$$</li><li>Transport continues until the left bank is empty. $$3z_t \ge \sum_{i\in I} L_{i,t - 1}\quad \forall t\ge 1.$$</li></ul><p>It does indeed produce a correct solution, using seven trips (see the Wikipedia page for the solution) ... and with integrality conditions dropped it does indeed produce a nonsense solution with fractions of items being transported.</p><p><a href="https://gitlab.msu.edu/orobworld/wolfgoatcabbage" target="_blank">Java code</a> for this model (using CPLEX) is in my Git repository. <br /></p>Paul A. Rubin

Finding Almost All Paths (2022-03-03)

<p>A question posted on Stack Overflow (and subsequently deleted) led to a <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2022/01/an-all-paths-network-problem.html" target="_blank">blog post</a> by Erwin Kalvelagen on how to find all paths between two nodes in a directed graph (possibly with self-loops, i.e. arcs from a node to itself) subject to two constraints: no arc can be used more than once in a path; and there is an upper limit $M$ on the number of arcs used. 
Note that a path might visit a <i>node</i> more than once. It just cannot repeat an arc. The original question seems to have referred to an undirected graph, but Erwin's post works with directed graphs and so will I.<br /><br />Erwin explored some mixed integer linear programming (MIP) models in his post, and a <a href="https://or.stackexchange.com/questions/7966/how-to-compute-all-paths-between-two-given-nodes-in-a-network/" target="_blank">subsequent post</a> on OR Stack Exchange led to more proposals of MIP models (including one from me). I also suggested that a "brute force" approach might be faster than any of the MIP models. In what follows, I will spell out both my MIP model and the brute force approach I used. Java code for both (which requires CPLEX and the Apache Commons Collections library) is in my <a href="https://gitlab.msu.edu/orobworld/allpaths" target="_blank">code repository</a>.</p><p>In what follows $A$ is the set of arcs in the graph, $s$ is the origin node for all paths, $t$ is the destination node for all paths, and $M$ is the maximum number of arcs to include in a path.</p> <h3 style="text-align: left;">MIP model</h3><div style="text-align: left;">&nbsp;</div><div style="text-align: left;">The MIP model uses the following variables.&nbsp;</div> <div style="text-align: left;"><ul style="text-align: left;"><li>$u_{a}\in\left\{ 0,1\right\}$ is 1 if and only if arc $a$ is used on the path.</li><li>$f_{a}\in\left\{ 0,1\right\}$ is 1 if and only if arc $a$ is the first arc on the path.</li><li>$\ell_{a}\in\left\{ 0,1\right\}$ is 1 if and only if arc $a$ is the last arc on the path.</li><li>$y_{ab}\in\left\{ 0,1\right\}$ is 1 if and only if arc $b$ immediately follows arc $a$ on the path.</li><li>$z_{a}\in\left[0,M\right]$ will be the number of arcs preceding arc $a$ on the path (0 if $a$ is not on the path).</li></ul> <p>Some of the variables can be eliminated (fixed at 0) at the outset.</p> <ul style="text-align: left;"><li>$f_{a}=0$ if node $s$ is 
not the tail of arc $a.$</li><li>$\ell_{a}=0$ if node $t$ is not the head of arc $a.$</li><li>$y_{ab}=0$ if the head of arc $a$ is not the tail of arc $b.$</li></ul> <p>Since we are just interested in finding feasible solutions, the objective is to minimize 0.</p><p>Use of the $z$ variables mimics the <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem#Miller%E2%80%93Tucker%E2%80%93Zemlin_formulation" target="_blank">Miller-Tucker-Zemlin</a> approach to avoiding subtours in the traveling salesman problem. A common knock on the MTZ formulation for the TSP is that it has a somewhat loose continuous relaxation. Since we have a trivial objective (all feasible solutions are optimal), that is not a concern here. </p></div><p>The constraints are as follows.</p> <ul style="text-align: left;"><li>There must be one first arc and one last arc.<br />$$\sum_{a\in A}f_{a}=1.$$$$\sum_{a\in A}\ell_{a}=1.$$</li><li>At most $M$ arcs can be used.$$\sum_{a\in A}u_{a}\le M.$$</li><li>An arc is used if and only if it is either the first arc or follows another arc on the path.$$f_{a}+\sum_{b\in A}y_{ba}=u_{a}\quad\forall a\in A.$$</li><li>The last arc must be a used arc.$$\ell_{a}\le u_{a}\quad\forall a\in A. (1)$$</li><li>The sequence value of an unused arc is 0.$$z_{a}\le Mu_{a}\quad\forall a\in A.$$</li><li>No arc can follow the last arc.$$\ell_{a}+\sum_{b\in A}y_{ab}\le1\quad\forall a\in A. (2)$$</li><li>If an arc is used, either it is the last arc or another arc follows it.$$\ell_{a}+\sum_{b\in A}y_{ab}=u_{a}\quad\forall a\in A. (3)$$</li><li>If an arc $b$ follows arc $a$, the sequence number of arc $b$ must be one higher than the sequence number of arc $a.$ $$z_{a}-z_{b}+\left(M+1\right)y_{ab}\le M\quad\forall a,b\in A,a\neq b.$$</li></ul> <p>Over on OR Stack Exchange, Rob Pratt correctly pointed out that constraint (3) implies constraints (1) and (2). I've left them in the Java code anyway. 
The CPLEX presolver removes them automatically.<br /> </p><h4 style="text-align: left;">Finding all solutions&nbsp;</h4> <p>To find all solutions, one possible approach is to solve whatever MIP model you choose, then add a "no-good" constraint that eliminates the solution just found (and only that one) and solve again, until eventually the aggregation of "no-good" constraints makes the model infeasible. What I did instead was to use the "populate" command in CPLEX, which accumulates a pool of solutions. Besides a time limit, I used two non-default parameter settings: I cranked up the solution pool capacity (the maximum number of solutions to find) to something high enough to let it find all solutions; and I set the solution pool intensity to its highest value (4), which tells CPLEX to aggressively seek out all feasible solutions.</p> <h3 style="text-align: left;">Brute force approach</h3><div style="text-align: left;">&nbsp;</div><div style="text-align: left;">The brute force approach is remarkably straightforward. We will use a queue of (incomplete) paths that I will call the "to-do list". Start by creating a length one path for each arc with tail $s$ and add them to the to-do list. We now proceed in a loop until the to-do list is empty. At each iteration, we pull a path $P$ off the to-do list, identify the arcs whose tails match the head of the last arc in $P$, remove any arcs already on $P$, and for each surviving arc $a$ create a new path $P'$ by extending $P$ with $a$. If the head of arc $a$ is $t$, $P'$ is a new $s-t$ path, which we record. Either way, if $P'$ has length less than $M$, we add it to the to-do list, and eventually try to extend it further.</div><div style="text-align: left;">&nbsp;</div> <div style="text-align: left;"><h3 style="text-align: left;">Do they work?</h3></div> <div style="text-align: left;">&nbsp;</div><div style="text-align: left;">Would I be posting this if they didn't. 
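The loop just described is compact enough to sketch stand-alone (the digraph below is a made-up toy example, not Erwin's, and no CPLEX is involved):

```java
import java.util.*;

/** Sketch of the brute-force enumeration described above. Arcs are
 *  (tail, head) pairs; a path may revisit nodes but never reuses an arc.
 *  The post's actual code (with the MIP model) is in the linked repository. */
public class AllPaths {
    public static List<List<int[]>> findPaths(int[][] arcs, int s, int t, int maxArcs) {
        List<List<int[]>> found = new ArrayList<>();
        Deque<List<int[]>> toDo = new ArrayDeque<>(); // the "to-do list"
        for (int[] a : arcs)
            if (a[0] == s) toDo.add(new ArrayList<>(List.of(a))); // length-one paths
        while (!toDo.isEmpty()) {
            List<int[]> path = toDo.poll();
            int[] last = path.get(path.size() - 1);
            if (last[1] == t) found.add(path);      // record a completed s-t path
            if (path.size() >= maxArcs) continue;   // cannot extend further
            for (int[] a : arcs) {
                // extend with arcs leaving the current head, skipping used arcs
                if (a[0] != last[1] || path.contains(a)) continue;
                List<int[]> longer = new ArrayList<>(path);
                longer.add(a);
                toDo.add(longer);
            }
        }
        return found;
    }
}
```

Arcs are compared by identity (the same array objects are reused), which is what enforces "no repeated arcs" while still allowing repeated nodes.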
:-) I tested both methods on the digraph from Erwin's post, which has 10 nodes and 22 arcs (with two self-loops). The source and sink are nodes 1 and 10 respectively. With $M=3$ there are four paths (which both methods find), and with $M=4$ there are nine paths (which both methods find). In both cases, brute force took about 1 ms. on my PC (using non-optimized Java code with a single thread). CPLEX times were rather stochastic, but I think it fair to say that building the models took around 55+ ms. and solving them typically took 20 ms. or more.</div> <div style="text-align: left;">When I set a maximum length ($M$) of 10, things got interesting. The brute force approach found 268 paths (in about 6 ms), while CPLEX found only 33. Neither time limit nor pool size was a factor, so I assume that this is just a limitation of the aggressive setting for pool intensity. That means that to find all possible solutions using CPLEX, the solve / add constraint / solve again approach would be necessary.<br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-88890066030937387162022-02-22T14:57:00.003-05:002022-03-09T14:53:30.902-05:00A "Decoupled" Quadratic Program<p>Someone posted a <a href="https://or.stackexchange.com/questions/7806/how-to-handle-a-non-separable-bilinear-objective-function-in-the-special-case-of" target="_blank">question</a> on OR Stack Exchange about a problem with a "nonseparable bilinear objective function" with "decoupled constraints". The problem has the following form:</p><p>\begin{alignat*}{1} \min &amp; \sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}x_{i}y_{j}+b^{\prime}x+c^{\prime}y\\ \textrm{s.t.
} &amp; x\in X\subset\mathbb{R}^{m}\\ &amp; y\in Y\subset\mathbb{R}^{n} \end{alignat*}</p><p>where $X$ and $Y$ are polytopes (i.e., defined by linear constraints, and bounded) in the positive orthants of their respective spaces (i.e., $x\ge 0$ and $y\ge 0$) and $a$, $b$ and $c$ are all nonnegative. So besides everything being nonnegative, the key things here are that the objective contains bilinear terms (always involving the product of an $x$ and a $y$) and each constraint involves either only $x$ or only $y$. The author was looking for a way to exploit the separability of the constraints in solving the problem.</p><p>It occurred to me that a heuristic approach might be to solve a sequence of linear programs, alternating between $X$ and $Y$. Let $h(x,y)$ be the objective function of the original problem, and let $$f(x;y) = \sum_i\sum_j a_{ij}x_i y_j + b^\prime x$$and $$g(y;x)=\sum_i\sum_j a_{ij}x_i y_j + c^\prime y$$ be respectively the portions of the objective dependent on $x$ with $y$ fixed or dependent on $y$ with $x$ fixed. Each is linear in one set of variables (the other set being fixed). 
So we could proceed as follows:</p><ol style="text-align: left;"><li>Minimize $f(x;0)$ over $X$ (a linear program) to get a starting value $x^0\in X.$</li><li>Minimize $g(y;x^0)$ over $Y$ to get a starting value $y^0\in Y.$ We now have an incumbent solution with objective value $h(x^0,y^0)=b^\prime x^0 + g(y^0;x^0).$ Set $t=0$.</li><li>Repeat the following until the first time that the incumbent does not improve.</li><ol><li>Minimize $f(x;y^t)$ over $X$ to get $x^{t+1}.$ If $f(x^{t+1};y^t) + c^\prime y^t$ is less than the incumbent value, make $(x^{t+1}, y^t)$ the new incumbent, else punt.</li><li>Minimize $g(y;x^{t+1})$ over $Y$ to get $y^{t+1}.$ If $g(y^{t+1};x^{t+1}) + b^\prime x^{t+1}$ is less than the incumbent value, make $(x^{t+1},y^{t+1})$ the new incumbent and increment $t$, else punt.</li></ol></ol><p>This will generate a sequence of solutions, each a corner point of the feasible region $X\times Y,$ with monotonically decreasing objective values.</p><p>Before continuing, it is worth noting that we are guaranteed that at least one corner of $X\times Y$ will be an optimum. This is of course always true when the objective is linear, but not always true with a quadratic objective. In our case, suppose that $(x^*, y^*)$ is an arbitrary optimum for the original problem. Then $x^*$ must minimize $f(x;y^*)$ over $X$, so either $x^*$ is a corner of $X$ or (if there are multiple optima) $x^*$ can be replaced by a corner of $X$ with the same value of $f(x;y^*)$ (and thus the same value of $h(x,y^*)$, since the difference $c^\prime y^*$ does not depend on $x$). Apply the same logic on the other variable to show that $y^*$ is either a corner of $Y$ or can be replaced by one.</p><p>I'm still calling this a heuristic, because there is one more piece to the puzzle. Could the procedure stop at a corner of $X\times Y$ that is a local but not global optimum? I'm not sure.
Offhand, I do not see a way to prove that it will not, but I also cannot easily conjure up an example where it would.<br /></p><p>To test this (and to confirm that I was not hallucinating, which has been known to happen), I wrote some Java code to generate random test problems and try the procedure. The code uses CPLEX to solve the LPs. In all test cases, it terminated with a solution (at least locally optimal) in a pretty small number of iterations.&nbsp;</p><p>As a benchmark, I tried solving the full QP models with CPLEX, setting its "optimality target" parameter to "OPTIMALGLOBAL", which tells CPLEX to search for a global optimum to a nonconvex problem. This causes CPLEX to turn the problem into a mixed-integer quadratic program, which, shockingly, takes longer to solve than an LP. In a limited number of test runs, I observed a surprisingly consistent behavior. At the root node, CPLEX immediately found a feasible solution and then immediately found a better solution that matched what my procedure produced. After that (and within a usually stingy time limit I set), CPLEX made progress on the bound but never improved on the incumbent. In two test runs, CPLEX actually reached proven optimality with that second incumbent, meaning my procedure had found a global optimum.</p><p>So perhaps my "heuristic" can actually be shown to be an exact algorithm ... or perhaps not. At least in my test runs, CPLEX working on the QP found the best solution about as fast as, or maybe slightly faster than, my procedure did.
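For readers who want the skeleton of the alternating procedure without digging through the repository, here is a compact sketch. To keep it self-contained, the LP solves are replaced by brute-force minimization of a linear function over an explicit list of polytope vertices (a stand-in for the CPLEX calls in my actual code); the class and method names here are mine, not from the repository.

```java
/**
 * Sketch of the alternating-LP heuristic for min x'Ay + b'x + c'y over X x Y.
 * The "LP solver" below just scans an explicit vertex list, which suffices
 * because a linear function attains its minimum at a vertex of a polytope.
 */
public class AlternatingLP {

    // minimize obj'v over an explicit vertex list; returns the best vertex
    static double[] argmin(double[][] vertices, double[] obj) {
        double best = Double.POSITIVE_INFINITY;
        double[] bestV = null;
        for (double[] v : vertices) {
            double val = 0.0;
            for (int i = 0; i < v.length; i++) val += obj[i] * v[i];
            if (val < best) { best = val; bestV = v; }
        }
        return bestV;
    }

    // full objective h(x,y) = sum_ij a_ij x_i y_j + b'x + c'y
    static double h(double[][] a, double[] b, double[] c, double[] x, double[] y) {
        double val = 0.0;
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < y.length; j++) val += a[i][j] * x[i] * y[j];
        for (int i = 0; i < x.length; i++) val += b[i] * x[i];
        for (int j = 0; j < y.length; j++) val += c[j] * y[j];
        return val;
    }

    // linear coefficients of x with y fixed: A y + b
    static double[] xCoeffs(double[][] a, double[] b, double[] y) {
        double[] obj = b.clone();
        for (int i = 0; i < obj.length; i++)
            for (int j = 0; j < y.length; j++) obj[i] += a[i][j] * y[j];
        return obj;
    }

    // linear coefficients of y with x fixed: A'x + c
    static double[] yCoeffs(double[][] a, double[] c, double[] x) {
        double[] obj = c.clone();
        for (int j = 0; j < obj.length; j++)
            for (int i = 0; i < x.length; i++) obj[j] += a[i][j] * x[i];
        return obj;
    }

    // the alternating loop; returns the final incumbent objective value
    static double solve(double[][] a, double[] b, double[] c,
                        double[][] xVerts, double[][] yVerts) {
        double[] x = argmin(xVerts, b);                 // step 1: minimize f(x;0)
        double[] y = argmin(yVerts, yCoeffs(a, c, x));  // step 2: minimize g(y;x^0)
        double incumbent = h(a, b, c, x, y);
        while (true) {                                  // step 3: alternate until no improvement
            double[] xNew = argmin(xVerts, xCoeffs(a, b, y));
            if (h(a, b, c, xNew, y) >= incumbent - 1e-9) break;   // punt
            x = xNew; incumbent = h(a, b, c, x, y);
            double[] yNew = argmin(yVerts, yCoeffs(a, c, x));
            if (h(a, b, c, x, yNew) >= incumbent - 1e-9) break;   // punt
            y = yNew; incumbent = h(a, b, c, x, y);
        }
        return incumbent;
    }
}
```

In the real implementation the two `argmin` calls are LP solves over $X$ and $Y$ with CPLEX; everything else carries over unchanged.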
On the other hand, my procedure only requires an LP solver, so it will work with code that does not accept QPs.</p><p>My Java code (which requires CPLEX) is available <a href="https://gitlab.msu.edu/orobworld/decoupledqp" target="_blank">here</a>.</p><p><b>Addendum</b>: User "Dusto" on the Discord Operations Research server <a href="https://discord.com/channels/888822916186243142/888897947650097162/95093621228844652" target="_blank">posted a simple counterexample</a> to global convergence. The example has no constraints other than bounds on the variables (from 0 to 10 in all cases). The $b$ and $c$ vectors are strictly positive, so the linear programs in my heuristic will start at the origin and get stuck there. The $A$ matrix is crafted so that a negative overall objective value is attainable.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-20170399470518793232022-02-10T11:37:00.000-05:002022-02-10T11:37:40.528-05:00Tournament Scheduling: CPLEX v. CP Optimizer (Part 3)<p>This is the third and final chapter in my opus about scheduling "pods" (players) in a tournament. Reading the <a href="https://orinanobworld.blogspot.com/2022/02/tournament-scheduling-cplex-v-cp.html" target="_blank">first post</a> will be essential to understanding what is going on here. Reading the <a href="https://orinanobworld.blogspot.com/2022/02/tournament-scheduling-cplex-v-cp_9.html" target="_blank">second post</a> (in which I formulate an integer programming model for the problem) is definitely optional.</p><p>I will stick with the notation introduced in the prior post. As before, there are $N$ pods (players) being organized into teams of two to play in $G$ games (two teams per game, one in white, one in black). 
Each game has two slots corresponding to the two teams in the game.<br /></p><ul style="text-align: left;"><li>$t$ indexes a team;</li><li>$p$ and $q$ index pods;</li><li>$g$ indexes a game;</li><li>$s$ indexes a slot (and $s'$ is the other slot of the same game);</li><li>$c$ is a jersey color (0 for white, 1 for black).</li></ul><p>For a slot $s$, I will use $G_s$ and $C_s$ to denote respectively the game in which the slot exists and the jersey color (0 or 1) of the team in the slot. For a team $t$, $P_{t,0}$ and $P_{t,1}$ will be the two pods on the team. (The order they are listed is immaterial.)<br /></p><p>The model variables are almost (but not exactly) what they were in the IP model.</p><ul style="text-align: left;"><li>$x_{t}\in \lbrace 0, \dots, 2G-1\rbrace$ is the index of the slot in which team $t$ plays (where slots 0 and $G$ are the white and black teams in the first game and slots $G-1$ and $2G-1$ are the white and black teams in the final game).<br /></li><li>$y_{pqg}\in \lbrace 0,1\rbrace$ is 1 if pods $p$ and $q\neq p$ are playing in game $g$ on opposing teams.<br /></li><li>$z_{pg}\in \lbrace 0,1\rbrace$ is the color jersey worn by pod $p$ at the conclusion of game $g$ (regardless of whether or not $p$ played in game $g$).</li><li>$w_{pg}\in \lbrace 0,1\rbrace$ is 1 if pod $p$ changes jersey color going into game $g.$</li><li>$v_{pg} \in \lbrace 0,1\rbrace$ is 1 if pod $p$ is playing in game $g.$</li></ul><p>The changes from the IP model are that $x$ is general integer rather than binary (with dimension 1 rather than 2), the third index of $y$ is the game rather than the slot, and $v$ is a new variable. As in the IP model, the objective is to minimize $\sum_p \sum_g w_{pg}.$</p><p>The constraints are as follows. 
Note that the requirement that every team play exactly once is captured by the assignment of a single slot value to each team (via $x$).<br /></p><ul style="text-align: left;"><li>Two different teams cannot occupy the same slot: $$\textrm{allDiff}(x).$$</li><li>Occupying a slot implies playing in the game: $$x_t = s \implies (v_{P_{t,0},G_s} = 1 \wedge v_{P_{t,1},G_s} = 1) \quad \forall t,s.$$</li><li>Occupying a slot determines the color of the jersey worn in the game: $$x_t = s \implies (z_{P_{t,0},G_s} = C_s \wedge z_{P_{t,1},G_s} = C_s) \quad \forall t,s.$$</li><li>Jersey color for a pod remains constant from game to game unless a change is recorded: $$z_{p,g} \neq z_{p,g-1} \iff w_{p,g} = 1 \quad \forall p, \forall g &gt; 0.$$</li><li>Playing in consecutive games precludes changing jerseys: $$(v_{p,g - 1} = 1 \wedge v_{p,g} = 1) \implies w_{p,g} = 0 \quad \forall p, \forall g &gt; 0.$$</li><li>Two pods are opponents if and only if they are playing in the same game with different colors: $$y_{pqg} = 1 \iff (v_{pg} = 1 \wedge v_{qg} = 1 \wedge z_{pg} \neq z_{qg}) \quad \forall p, \forall q\neq p, \forall g.$$</li><li>Every pair of pods faces each other exactly twice: $$\sum_g y_{pqg} = 2 \quad \forall p, \forall q\neq p.$$</li></ul><p>Here "allDiff()" is the CPLEX notation for the "all different" constraint, a global constraint common to most constraint solvers. Given a collection of variables with the same range, the all different constraint says that no two variables in the collection can take the same value.</p><p>To put it mildly, I would not be shocked if there is a more efficient (easier to solve) way to write the problem as a CP model.<br /></p><p>&nbsp;</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-41327156095317692632022-02-09T14:43:00.000-05:002022-02-09T14:43:30.244-05:00Tournament Scheduling: CPLEX v.
CP Optimizer (Part 2)<p>In my <a href="https://orinanobworld.blogspot.com/2022/02/tournament-scheduling-cplex-v-cp.html" target="_blank">previous post</a>, I described a tournament scheduling problem and discussed my results using both an integer programming model (CPLEX) and a constraint programming model (CP Optimizer) to try to solve it. Here I will present the IP model. (This post will be gibberish unless you have read at least the start of the previous post.)<br /></p><p>There are multiple ways to write an integer programming (IP) model for the tournament problem. My initial instinct was to think in terms of what games a pod plays in, which pods it plays with/against and so on, but I fairly quickly changed to thinking in terms of teams rather than pods. With $N=9$ pods and two pods to a team, there are $N(N-1)/2=36$ possible teams, which we can enumerate at the outset. There will be 18 games, each containing two teams, and every team will play exactly once. Each game contains two "slots", one for the team in white jerseys and the other for the team in black jerseys.<br /></p><p>In what follows, I will (I hope) stick to the following notation:</p><ul style="text-align: left;"><li>$t$ indexes a team;</li><li>$p$ and $q$ index pods;</li><li>$g$ indexes a game;</li><li>$s$ indexes a slot (and $s'$ is the other slot of the same game);</li><li>$c$ is a jersey color (0 for white, 1 for black).</li></ul><p>Since I am using Java, all indexing is zero-based. Slots are numbered so that game 0 has slots 0 (white) and $G$ (black), game 1 has slots 1 (white) and $G+1$ (black), etc., where $G=N(N-1)/4$ is the number of games. I will denote by $T_p$ the set of teams $t$ containing pod $p$. 
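The indexing conventions are easy to get wrong, so here is a small sketch of the bookkeeping implied by the text: enumerating the $N(N-1)/2$ teams, mapping a slot to its game and jersey color, and finding the set $T_p$ of teams containing a pod. (The class and method names are mine, not from my actual code.)

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Index bookkeeping for the tournament IP model (zero-based, as in the post):
 * slot g is the white slot of game g, slot g+G is the black slot.
 */
public class TourneyIndex {
    final int nPods, nGames;
    final int[][] teams; // teams[t] = {p, q}, the two pods on team t

    TourneyIndex(int nPods) {
        this.nPods = nPods;
        this.nGames = nPods * (nPods - 1) / 4; // G = N(N-1)/4 games
        List<int[]> list = new ArrayList<>();
        for (int p = 0; p < nPods; p++)        // enumerate all N(N-1)/2 pairs
            for (int q = p + 1; q < nPods; q++)
                list.add(new int[] {p, q});
        teams = list.toArray(new int[0][]);
    }

    int gameOfSlot(int s)  { return s % nGames; } // slots g and g+G belong to game g
    int colorOfSlot(int s) { return s / nGames; } // 0 = white, 1 = black

    // T_p: indices of the teams containing pod p
    List<Integer> teamsOfPod(int p) {
        List<Integer> tp = new ArrayList<>();
        for (int t = 0; t < teams.length; t++)
            if (teams[t][0] == p || teams[t][1] == p) tp.add(t);
        return tp;
    }
}
```

With $N=9$ this produces 36 teams and 18 games, and each pod belongs to exactly $N-1=8$ teams.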
<br /></p><p>The model variables are as follows.</p><ul style="text-align: left;"><li>$x_{ts}\in \lbrace 0,1\rbrace$ is 1 if team $t$ plays in slot $s,$ 0 if not.</li><li>$y_{pqs}\in \lbrace 0,1\rbrace$ is 1 if pod $p$ is playing in slot $s$ and is opposed by pod $q.$</li><li>$z_{pg}\in \lbrace 0,1\rbrace$ is the color jersey worn by pod $p$ at the conclusion of game $g$ (regardless of whether or not $p$ played in game $g$).</li><li>$w_{pg}\in \lbrace 0,1\rbrace$ is 1 if pod $p$ changes jersey color going into game $g,$ 0 if not.</li></ul><p>The objective is to minimize $\sum_p \sum_g w_{pg}.$ The constraints are the following.</p><ul style="text-align: left;"><li>Every team plays exactly once. $$\sum_s x_{ts} = 1 \quad\forall t.$$</li><li>Every schedule slot is filled exactly once. $$\sum_t x_{ts} = 1 \quad\forall s.$$</li><li>Pods $p$ and $q$ oppose each other with $p$ in slot $s$ if and only if $p$ is playing in $s$ and $q$ is playing in $s'$. $$y_{pqs} \le \sum_{t\in T_p} x_{ts} \quad\forall p, q\neq p, s.$$ $$y_{pqs} \le \sum_{t\in T_q} x_{ts'} \quad\forall p, q\neq p, s.$$ $$y_{pqs} \ge \sum_{t\in T_p} x_{ts} + \sum_{t\in T_q} x_{ts'} - 1 \quad\forall p, q\neq p, s.$$</li><li>Every pair of pods opposes each other exactly twice. $$\sum_s y_{pqs} = 2 \quad \forall p\neq q.$$</li><li>No pod plays consecutive games in different jerseys.$$\sum_{t\in T_p}\left( x_{t,s-1} + x_{t,s'} \right) \le 1 \quad \forall p, \forall s\notin \lbrace 0, G\rbrace.$$</li><li>A pod playing in a game has its jersey color determined by its team's slot. (This has the desirable side effect of preventing two teams containing the same pod from playing against each other.) 
$$\sum_{t\in T_p} x_{t, g+G} \le z_{pg} \le 1 - \sum_{t\in T_p} x_{t,g} \quad \forall p,g.$$(Note that for any game index $g$, slot $g$ is the white slot in game $g$ and slot $g+G$ is the black slot.)</li><li>A pod's color for any game (playing or not) is the same as its color for the previous game, unless a change occurs. $$z_{p, g-1} - w_{pg} \le z_{pg} \le z_{p, g-1} + w_{pg} \quad\forall p, \forall g \ge 1.$$</li></ul><p>In the next post, I will try to articulate my CP model for the problem. <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-3330612360992933362022-02-08T16:46:00.000-05:002022-02-08T16:46:04.916-05:00Tournament Scheduling: CPLEX v. CP Optimizer (Part 1)<p>&nbsp;A <a href="https://math.stackexchange.com/questions/4370828/tournament-schedule-assignment-via-convex-optimization" target="_blank">recent question</a> on Mathematics Stack Exchange asked about scheduling $N$ "pods" (players) in a tournament under the following conditions.</p><ul style="text-align: left;"><li>There will be $N(N-1)/4$ games played sequentially (no games in parallel).</li><li>Games pit teams of two pods each against each other. One team wears white and one team wears black.<br /></li><li>Each pod partners with every other pod once and plays against every other pod twice.</li><li>No pod can play in consecutive games wearing different jersey colors.</li></ul><p>The objective is to find a schedule that minimizes the total number of times pods have to change their jersey color.</p><p>The question specifies $N=9$. Note that, since there are $N(N-1)/4$ games, a necessary condition for the problem to be feasible is that either $N$ or $N-1$ must be divisible by $4$. That condition is not, however, sufficient. An obvious counterexample is when $N=4$ and there are three games to be scheduled. The first game uses all four pods, as does the second game.
Since no pod can be forced to wear different jerseys in consecutive games, all four pods would go into the second game wearing the same colors as in the first game ... which means playing on the same teams, violating the rule about pods being teamed together exactly once.</p><p>I coded both an integer programming model (using CPLEX) and a constraint programming model (using CP Optimizer) in Java and tried to solve the tournament problem with $N=9$. Since everything is inherently integer and the constraints are more logical than arithmetic (the only real algebra is summing up the number of jersey changes), I had two initial expectations: that CPLEX would provide tighter lower bounds than CPO (because MIP models tend to do better with bounds than CP models); and that CPO would find feasible (and possibly optimal) schedules faster than CPLEX would, since the problem is inherently logic-based (and CP models often do better than MIP models on scheduling problems). I was half right. CPLEX did provide tighter lower bounds than CPO, at least within the time limits I was willing to use, although I don't think either came remotely close to a "tight" lower bound. CPO, however, struggled massively to find feasible schedules while CPLEX did not have too much trouble doing so.</p><p>Before going any further, I need to issue three caveats. First, the longest I ran the IP model was 30 minutes and the longest I ran the CP model was an hour. Second, while I am pretty familiar with formulating IP models, I am much less familiar with running CP models, so I may have missed opportunities to make the CP model more efficient.
Third, while CPLEX gives the user a fair amount of indirect control over the search process (by setting branching priorities, MIP emphasis, how frequently to try certain kinds of cuts and so on), CP Optimizer may offer the user even more flexibility in how to tailor searches, but I am not yet familiar enough with it to do anything beyond trying to indicate which variables should be the first candidates for branching (and I'm not sure I got that right).</p><p>I'll end this installment with a synopsis of some runs.</p><ul style="text-align: left;"><li>In a 30 minute run using the new setting 5 for the MIP emphasis parameter (stressing use of heuristics to improve the incumbent, at the cost of not paying too much attention to the lower bound), CPLEX found a feasible schedule with 14 jersey changes and a lower bound of 2.97 (a 79% gap). It evaluated somewhere around 18,500 nodes.<br /></li><li>In a 30 minute run, CP Optimizer branched over 957 <i>million</i> times but never found a feasible schedule and ended with a best bound of 0.</li><li>Finally, I tried running CPLEX for three minutes (in which it found an incumbent with 19 jersey changes) and then used that incumbent as a starting solution for a one hour run of CP Optimizer. CPO accepted the incumbent solution, did a bit over 1.55 <i>billion</i> branch operations, and failed to improve on it. The best bound was again 0.&nbsp;</li></ul><p>If you want to look at my code (which will make slightly more sense after my next couple of posts, where I will describe the models), it is available from my university GitLab <a href="https://gitlab.msu.edu/orobworld/podtourney" target="_blank">repository</a>.<br /></p>Paul A.
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-36879494131181028262022-02-06T16:14:00.001-05:002022-02-06T16:14:51.446-05:00Taxi Dispatch<p>A <a href="https://or.stackexchange.com/questions/7771/average-time-between-two-dispatches-in-a-taxi-fleet-probably-a-batch-processing" target="_blank">question</a> on OR Stack Exchange pertains to a taxi stand. The assumptions are as follows.</p><ul style="text-align: left;"><li>There are $t$ taxis operating between points A and B, with all passenger traffic going from A to B. For concreteness, I'll refer to a "taxi stand" with a "line" of taxis, similar to what you might see at an airport or popular hotel.</li><li>The round-trip travel time for a taxi (taking passengers from A to B and deadheading back from B to A) is deterministic with value $T$. (Note: Deterministic travel time is the author's assumption, not mine.)</li><li>Each taxi has a capacity for $p$ passengers and is dispatched only when full.</li><li>Passengers arrive according to a Poisson process with parameter (rate) $\lambda$.</li></ul><p>The author was looking for the mean time between dispatches.</p><p>It has been a very long time since I taught queuing theory, and I have forgotten most of what I knew. My best guess for the mean time between dispatches, under the assumption that the system reached steady state, was $p/\lambda$, which would be the long-run average time required for $p$ passengers to arrive. To achieve steady-state, you need the average arrival rate ($\lambda$) to be less than the maximum full-blast service rate ($t p/T$). If arrivals occur faster than that, in the long term you will be dispatching a taxi every $T/t$ time units while either the queue explodes or there is significant balking/reneging.</p><p>To test whether my answer was plausible (under the assumption of a steady state), I decided to write a little discrete event simulation (DES).
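A stripped-down version of such a simulation, exploiting the fact that every $p$-th arrival completes a batch, can fit in a few dozen lines. The sketch below is in Java rather than the R I actually used, and the names are mine; it tracks taxi availability in a priority queue and averages the gaps between dispatch times.

```java
import java.util.PriorityQueue;
import java.util.Random;

/**
 * Minimal discrete-event simulation of the taxi stand (a sketch, not the
 * author's R notebook). Passengers arrive in a Poisson stream; every p-th
 * arrival completes a batch, which departs as soon as a taxi is free.
 */
public class TaxiSim {
    static double meanDispatchGap(int taxis, int cap, double roundTrip,
                                  double lambda, int nArrivals, long seed) {
        Random rng = new Random(seed);
        PriorityQueue<Double> freeAt = new PriorityQueue<>(); // taxi availability times
        for (int i = 0; i < taxis; i++) freeAt.add(0.0);      // all taxis in line at time 0
        double clock = 0.0, lastDispatch = 0.0, gapSum = 0.0;
        int waiting = 0, dispatches = 0;
        for (int k = 0; k < nArrivals; k++) {
            // exponential interarrival time with rate lambda
            clock += -Math.log(1.0 - rng.nextDouble()) / lambda;
            if (++waiting == cap) {                    // batch complete
                double depart = Math.max(clock, freeAt.poll());
                freeAt.add(depart + roundTrip);        // taxi returns after the round trip
                if (dispatches++ > 0) gapSum += depart - lastDispatch;
                lastDispatch = depart;
                waiting = 0;
            }
        }
        return gapSum / (dispatches - 1);
    }
}
```

When $\lambda < tp/T$ (the stability condition above), the simulated mean gap should settle near $p/\lambda$.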
There are special purpose DES programs, but it's been way too long since I used one (or had a license), so I decided to try writing one in R. A little googling turned up at least a couple of DES packages for R, but I did not want to commit much time to learning their syntax for a one-off experiment, plus I was not sure if any of them handled batch processing. So I wrote my own from scratch.</p><p>My code is not exactly a paragon of efficiency. Given the simplicity of the problem and the fact that I would only be running it once or twice, I went for minimal programmer time as opposed to minimal run time. That said, it can run 10,000 simulated passenger arrivals in a few seconds. The program uses one data frame as what is known in DES (or at least was back in my day) as an "event calendar", holding in chronological order three types of events: passengers joining the queue; taxis being dispatched; and taxis returning.</p><p>The simulation starts with all taxis in line and no passengers present. I did not bother with a separate "warm-up" period to achieve steady-state, which would be required if I were striving for more accuracy. For a typical trial run ($t=5$, $p=4$, $T=20$, $\lambda=0.7$), my guess at the mean time between dispatches worked out to 5.714 and the mean time computed from the run was 5.702. So I'm inclined to trust my answer (and to quit while I'm ahead).</p><p>If anyone is curious (and will forgive the lack of efficiency), you can take a look at my <a href="https://rubin.msu.domains/blog/taxis.nb.html" target="_blank">R notebook</a> (which includes the code).<br /></p>Paul A.
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-37722332733637521752022-01-26T17:51:00.000-05:002022-01-26T17:51:41.899-05:00Balancing Machine Groups<p>This post continues my previous post about <a href="https://orinanobworld.blogspot.com/2022/01/quantifying-approximate-equality.html" target="_blank">approximate equality</a> of some value across multiple entities. The motivation was a <a href="https://or.stackexchange.com/questions/7611/dividing-machines-into-groups-of-equal-sizes-so-that-each-group-has-approximatel" target="_blank">question</a> posed on OR Stack Exchange asking how to partition $M$ machines into $G$ groups so that each group would have approximately the same total productivity. The machines have specified productivity values $P_1,\dots,P_M \ge 0$, and the productivity of a group is just the sum of the productivity values of the group members. The question assumes that $G$ divides $M$ evenly, so that each group has the number $S = M / G$ of members.<br /></p><p>The previous post covered the issue of what metric to use in the objective function. Here I will focus on what algorithm/heuristic to use to solve it.</p>The author of the question specified $M=21$, $G=7$, $S=3$ for problem dimensions, with productivity values $P_m \in [0,10].$ He also posed a heuristic that on the surface makes sense: assign machines sequentially (I assume in decreasing productivity order) to the group with lowest current productivity (and with space to accept another machine). A genetic algorithm with chromosomes representing permutations of $1,\dots,M$ can be applied to any of the objective functions. (Since GAs maximize fitness and the problem here is to minimize the dispersion measure, we would just maximize the difference between the dispersion measure and some fixed upper bound on it.)<p>Finally, we can find exact solutions using a mixed integer linear program and any objective except MSD. 
(The MSD would be a mixed integer quadratic program.) The MIP models would contain variables $x_{mg}\in \lbrace 0,1 \rbrace$ for each machine $m$ and group $g$, indicating membership of that machine in that group. They would also contain auxiliary variables $y_g \ge 0$ to indicate the productivity value for each group $g$. The common part of the models would involve constraints $$\sum_{g=1}^G x_{mg} = 1\quad\forall m$$ (every machine is assigned to exactly one group), $$\sum_{m=1}^M x_{mg} = S \quad\forall g$$ (every group contains $S$ machines) and $$y_g = \sum_{m=1}^M P_m x_{mg} \quad\forall g$$ (defining the group productivity variables in terms of the assignments). To minimize the range, we would add variables $z_0, z_1 \ge 0$ plus constraints $$z_0 \le y_g \quad\forall g$$ and $$z_1 \ge y_g \quad \forall g$$ and then minimize $z_1 - z_0$. To minimize the MAD, we would instead add variables $z_g \ge 0$ for $g=1,\dots,G$ with the constraints $$z_g \ge y_g - S\cdot \overline{P} \quad\forall g$$ and $$z_g \ge S\cdot \overline{P} - y_g \quad\forall g$$ (where $\overline{P} = (\sum_m P_m)/M$ is the mean productivity level of the machines, so that $S\cdot \overline{P}$ is the mean productivity per group regardless of how the groups are formed). The objective would then be to minimize $\sum_g z_g$.<br /></p><p>I decided to code some of the alternatives in R (using CPLEX to solve the MIPs) and see how they did. The author's heuristic has the advantage of being very fast. The MIP models, not surprisingly, were much slower. 
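For reference, the author's heuristic (as I understand it) takes only a few lines. Here is a sketch in Java (my own implementation was in R; the class and method names here are hypothetical).

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * The author's greedy heuristic: visit machines in decreasing productivity
 * order and put each one in the group that currently has the lowest total
 * productivity among the groups with space left.
 */
public class GreedyGroups {
    // returns the group index assigned to each machine (by original index);
    // assumes nGroups divides prod.length evenly
    static int[] assign(double[] prod, int nGroups) {
        int size = prod.length / nGroups;
        Integer[] order = new Integer[prod.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // decreasing productivity order
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -prod[i]));
        double[] total = new double[nGroups];
        int[] count = new int[nGroups];
        int[] group = new int[prod.length];
        for (int m : order) {
            int best = -1;
            for (int g = 0; g < nGroups; g++) // lowest-total group with space
                if (count[g] < size && (best < 0 || total[g] < total[best])) best = g;
            group[m] = best;
            total[best] += prod[m];
            count[best]++;
        }
        return group;
    }
}
```

As the results below suggest, speed is the heuristic's main virtue; the group totals it produces can be far from balanced.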
I tried adding some antisymmetry constraints (since the objective value is unaffected if you permute the indices of the groups), but they did not seem to help run time, so I dropped them.</p><p>Here are the range and MAD values achieved by each method on a particular test run (reasonably representative of other runs).</p><table align="center" cellpadding="4" style="margin-left: auto; margin-right: auto;"><tbody><tr><th>Method</th><th>Criterion</th><th>Range % Gap</th><th>MAD % Gap</th></tr><tr><td>Author</td><td>None</td><td style="text-align: right;">381.02613</td><td style="text-align: right;">564.734106</td></tr><tr><td>GA</td><td>Range</td><td style="text-align: right;">52.04068</td><td style="text-align: right;">49.728175</td></tr><tr><td>GA</td><td>MAD</td><td style="text-align: right;">56.95300</td><td style="text-align: right;">66.916019</td></tr><tr><td>MIP</td><td>Range</td><td style="text-align: right;">0.00000</td><td style="text-align: right;">2.215414</td></tr><tr><td>MIP</td><td>MAD</td><td style="text-align: right;">17.02795</td><td style="text-align: right;">0.000000</td></tr></tbody></table><p>The gap measures are percentage above optimal. The author's method seems to be suboptimal by a wide margin, and the GAs are not doing particularly well on this example either.</p><p>We can visualize the group productivity levels for each solution in a bubble plot, where the size of a point is the productivity.
The actual group productivity levels are close enough to each other that the plot is not helpful, so rather than plotting productivity I will plot the absolute percentage deviation from the overall mean productivity of all groups.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgwyfBML9lYL7-iVFKelRUWdyUZAeUdeRBYyEq7MGhHbUi7jvZX2WxmM1mJp-JXUFp_zXCbNa7ZHp_5jsmNmW8bV98GW9vAeypzRUNOX8KoekNNeFzDP_CoFOaAav-L8fHKm9J8-nwDHWbuKUrUO0EOuzSOShLEyFR_s-UMDpcQSfyc6A8-j0qy34hr=s579" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="357" data-original-width="579" src="https://blogger.googleusercontent.com/img/a/AVvXsEgwyfBML9lYL7-iVFKelRUWdyUZAeUdeRBYyEq7MGhHbUi7jvZX2WxmM1mJp-JXUFp_zXCbNa7ZHp_5jsmNmW8bV98GW9vAeypzRUNOX8KoekNNeFzDP_CoFOaAav-L8fHKm9J8-nwDHWbuKUrUO0EOuzSOShLEyFR_s-UMDpcQSfyc6A8-j0qy34hr=s16000" /></a></div><p>For the author's heuristic, group 3 has a productivity level close to the target mean and the other groups are pretty far off. (Remember that this is absolute deviation, so while we know group 1 for the author's solution is way off, we do not know from the plot whether it is above or below mean.) The MIP model that minimizes range looks the best to me, even though the MIP model that minimizes MAD is better with respect to overall absolute deviation.</p><p>There are three caveats to bear in mind. First, this is one test run, with randomly generated data. Altering the data affects all the results, although in multiple runs the author's heuristic always had gaps of 200% or more and the GAs never found optimal solutions. Second, performance of the GAs is affected by multiple parameters, in particular the population size and number of generations run. 
Third, we could adapt the author's heuristic by using the $G$ machines with highest productivity as "anchors" for the $G$ groups and then assigning the remaining machines in random order rather than in decreasing productivity order. That would allow us to run the heuristic multiple times with different sort orders for the machines, subject to time and/or iteration limits, and pick the best solution found.</p><p>My <a href="https://rubin.msu.domains/blog/machineGroups.nb.html" target="_blank">R notebook</a> is available for download. Fair warning: it requires a bunch of packages, and requires a CPLEX installation if you plan to run the MIP models.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-23818616653014675172022-01-25T14:34:00.002-05:002022-01-25T14:34:27.463-05:00Quantifying Approximate Equality<p>A <a href="https://or.stackexchange.com/questions/7611/dividing-machines-into-groups-of-equal-sizes-so-that-each-group-has-approximatel" target="_blank">question</a> posed on OR Stack Exchange asked how to partition $M$ machines with specified productivity values $P_1,\dots,P_M \ge 0$ into $G$ groups of $S$ machines each (where $G\cdot S = M$) so that each group would have approximately the same total productivity. This raises an interesting question about how one quantifies "approximate equality".</p><p>Some of the suggestions for metrics among the responses to the question included minimizing the maximum productivity of any group, maximizing the minimum productivity of any group, and minimizing the range (the difference between the maximum and minimum). I suspect that mathematical convenience at least partly motivated them, since they are all easy to incorporate in an optimization model. Of those three, minimizing the range would get my vote. At the same time, minimizing the range does not minimize the "lumpiness" of the distribution of values between the extremes. 
So I would also consider minimizing either the mean absolute deviation (MAD) or mean squared deviation (MSD) for the group productivity values, recognizing that either of those might result in a greater overall spread of group values.</p><p>At this point I will drop any references to machines and productivity and just look at the general problem of allocating a fixed amount of something to a fixed number of somethings. Let's suppose that I have 49 units of whatever and 7 entities to whom to distribute it. The mean number of units given to any entity will be 49 / 7 = 7, regardless of how I do the allocating. Consider the following plot showing three different possibilities.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEh9b9mzq73InaIM9We2lEQaOeAtXU2i0LEUyj-tMryWFXcDaWMH4ic56EoTVuf4In3bouGerqeLrS5GXmAz1CIeb8P2dK_DqeuqR32JNoX-XWgXlrlMayxUjKVQL4zb1ZQvRGFyYy3rXAGB9OC5KJ5BkL4E-jCQCeCpgIMf-Yehm2HMUBCwEogA3KR0=s586" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="Bubble plot of three allocations" border="0" data-original-height="362" data-original-width="586" src="https://blogger.googleusercontent.com/img/a/AVvXsEh9b9mzq73InaIM9We2lEQaOeAtXU2i0LEUyj-tMryWFXcDaWMH4ic56EoTVuf4In3bouGerqeLrS5GXmAz1CIeb8P2dK_DqeuqR32JNoX-XWgXlrlMayxUjKVQL4zb1ZQvRGFyYy3rXAGB9OC5KJ5BkL4E-jCQCeCpgIMf-Yehm2HMUBCwEogA3KR0=s16000" /></a></div><p></p><p>The size of each bubble is the number of entities receiving an allocation of a given size. True equality would manifest as one large bubble at value 7.</p><p>Methods A and B have the same minimum and maximum (and hence the same range), but method B is clearly more concentrated (smaller MAD and MSD). I suspect that the author of the original question would prefer B to A. 
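</p><p>The allocations in the plot are not listed numerically here, so for concreteness the snippet below uses hypothetical vectors consistent with the description of A and B (both allocate 49 units to 7 entities, share the same extremes, and B is more concentrated). The arithmetic is trivial, but it makes the metric comparison explicit. (My code for this series of posts is in R; this is just an illustrative sketch.)</p>

```python
# Hypothetical allocations consistent with the description of A and B:
# both sum to 49 (mean 7), share min 1 and max 13, and B is more concentrated.
A = [1, 3, 5, 7, 9, 11, 13]
B = [1, 7, 7, 7, 7, 7, 13]

def value_range(v):
    """Range: max minus min."""
    return max(v) - min(v)

def mad(v):
    """Mean absolute deviation from the mean."""
    m = sum(v) / len(v)
    return sum(abs(x - m) for x in v) / len(v)

def msd(v):
    """Mean squared deviation from the mean."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

for name, v in (("A", A), ("B", B)):
    print(name, value_range(v), round(mad(v), 2), round(msd(v), 2))
# A and B tie on range (12), but B wins on MAD (12/7 vs. 24/7)
# and on MSD (72/7 vs. 112/7).
```

<p>Swapping in other candidate vectors is an easy way to see how range, MAD and MSD can disagree about which allocation is "closer to equality".</p><p>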
On the other hand, if we are allocating rewards to people, someone might possibly find A more egalitarian, in that the recipient of the smallest amount (1) is less disadvantaged relative to the recipient(s) of the second smallest amount (3 in A, 7 in B), and similarly the recipient of the largest amount (13) is less advantaged relative to the recipient(s) of the second largest amount (11 in A, 7 in B).</p><p>Now compare methods B and C. C has a larger minimum, smaller maximum and smaller range, but B has a smaller MAD. (They have the same MSD.) Which is closer to equality? I think that is a subjective question. In my next post, I will compare a few possible solutions to the original question.<br /></p><p>My point here is that minimizing range and minimizing MAD are both valid choices, potentially leading to substantially different solutions.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-16152738604561869042021-12-28T14:11:00.000-05:002021-12-28T14:11:10.129-05:00Revisiting a Joint Clustering Problem<p>In a <a href="https://orinanobworld.blogspot.com/2021/04/a-ga-model-for-joint-clustering-problem.html" target="_blank">previous post</a>, I described a genetic algorithm for a problem posted on OR Stack Exchange. The problem involved grouping users and servers so as to maximize the total quality of service to all users. Every user in a cluster is served by every server in the cluster, but servers in other clusters interfere to some degree with the user's service.</p><p>The person who posted that problem recently posted <a href="https://or.stackexchange.com/questions/7471/how-to-perform-clustering-of-two-different-sets-of-entities" target="_blank">another variation</a> on OR SE. The new version of the problem is nearly, but not quite, identical to the previous version. 
"Servers" are now termed "transmitters", but the problem remains clustering transmitters and users, with limits on how many transmitters (but not how many users) belong to each cluster, and with the same objective function. The new version may be slightly easier thanks to one new restriction.</p><blockquote><p>The user must belong to a cluster that has the transmitter providing the highest weight for the given user.</p></blockquote><p>Thus the assignment of a transmitter to a cluster immediately implies the assignment to that cluster of all users for whom the specified transmitter has highest weight. (I will assume that ties for highest weight either do not occur or are broken whimsically.)</p><p>I put together two different mixed-integer programming (MIP) models for the problem. Both are rather complicated, in part because the objective function needs to be linearized. They are too big to squeeze into a blog post, so I put them in a <a href="https://rubin.msu.domains/blog/transmitter_cluster_models.pdf" target="_blank">PDF file</a>. The models differ in the use of binary variables to assign transmitters. In the first model, a binary variable $x_{t,t'}$ is 1 if transmitters $t$ and $t'$ belong to the same cluster. In the second, a binary variable $x_{t,c}$ is 1 if transmitter $t$ belongs to cluster $c$. I coded both models in Java and ran them (using CPLEX as the MIP solver) on a few test problems with dimensions similar to what the original post contained (20 transmitters, 60 users). The first model was consistently smaller in size than the second and produced tighter bounds within whatever time limit I gave CPLEX.</p><p>The OR SE question also mentioned heuristics, so I coded up a couple of reasonably obvious ones. The first is a greedy construction heuristic. It initially puts each transmitter in a separate cluster, then repeatedly merges the pair of clusters that produces the best improvement in the overall objective function.
If no further improvement is possible and there are still more clusters than the number allowed, it merges whichever pair does the least damage to the objective. In all cases, mergers occur only if the merged cluster is within the allowed size limit.</p><p>The second heuristic is a pairwise swap heuristic. Given the (presumably feasible) solution from the first heuristic, it considers all pairs of transmitters belonging to different clusters and, for each pair, evaluates the objective impact of swapping the two transmitters and performs the swap if it is beneficial. Note that swapping preserves the sizes of both clusters, so there is no question whether it is feasible. The heuristic keeps looping through pairs of transmitters until it is unable to find a useful swap.</p><p>In my experiments, I used the best solution from the second heuristic to "hot start" the first MIP model. (I ran into some difficulties getting the second MIP model to accept it as a starting solution, and frankly was not motivated enough to bother investigating the issue.) The results from one computational test (20 transmitters, 60 users, maximum cluster size 5) are fairly representative of what I saw in other tests. The construction heuristic got a solution with objective value 26.98 in 0.1 seconds on my desktop PC. The improvement heuristic ran for about 0.25 seconds and improved the objective value to 29.05. Running the first (better) MIP model using CPLEX 20.1 with a 10 minute time limit, starting from that solution, CPLEX improved the objective value to 29.62 with a final upper bound around 31.18. So the heuristic solution, which was obtained very quickly, was within 7% of optimal based on the bound and within 2% of the last solution CPLEX got within the 10 minute time limit.</p><p>Two caveats apply here. First, I did a minimal amount of testing. Second, and possibly more important, I randomly generated the transmitter-user weights using a uniform distribution.
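</p><p>For readers who want the mechanics, here is an illustrative sketch of the two heuristics (my implementation is in Java; the objective is left as a user-supplied callback, since the actual quality-of-service function is defined in the linked PDF):</p>

```python
# Illustrative Python sketch of the merge-then-swap idea described above.
# The objective is a callback taking a list of clusters (lists of
# transmitters) and returning a value to maximize.

def greedy_merge(transmitters, objective, max_clusters, max_size):
    """Start with singleton clusters, then repeatedly merge the best pair."""
    clusters = [[t] for t in transmitters]
    while True:
        best_pair, best_val = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if len(clusters[a]) + len(clusters[b]) > max_size:
                    continue  # merged cluster would exceed the size limit
                trial = [c for k, c in enumerate(clusters) if k not in (a, b)]
                trial.append(clusters[a] + clusters[b])
                val = objective(trial)
                if best_val is None or val > best_val:
                    best_pair, best_val = (a, b), val
        if best_pair is None:
            return clusters  # no feasible merge remains
        # merge if it improves, or (least-damage merge) if we still have
        # more clusters than allowed
        if best_val > objective(clusters) or len(clusters) > max_clusters:
            a, b = best_pair
            merged = clusters[a] + clusters[b]
            clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
            clusters.append(merged)
        else:
            return clusters

def swap_improve(clusters, objective):
    """Accept pairwise swaps between clusters whenever they improve."""
    clusters = [list(c) for c in clusters]
    while True:
        base, move = objective(clusters), None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                for i in range(len(clusters[a])):
                    for j in range(len(clusters[b])):
                        trial = [list(c) for c in clusters]
                        trial[a][i], trial[b][j] = trial[b][j], trial[a][i]
                        if objective(trial) > base:
                            move = trial
                        if move:
                            break
                    if move:
                        break
                if move:
                    break
            if move:
                break
        if move is None:
            return clusters
        clusters = move

# Toy check (not the real objective): balance cluster sums of 1..4
balance = lambda cl: -(max(sum(c) for c in cl) - min(sum(c) for c in cl))
groups = swap_improve(greedy_merge([1, 2, 3, 4], balance, 2, 2), balance)
print(sorted(sum(c) for c in groups))  # [5, 5]
```

<p>In the toy check, the merge phase ends at clusters {1, 2} and {3, 4}, and a single swap then balances the sums at 5 apiece.</p><p>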
Having weights that follow a different distribution might affect both heuristic and model performance.</p><p>The <a href="https://gitlab.msu.edu/orobworld/transmitter-clusters" target="_blank">Java code</a> for both heuristics and both MIP models can be downloaded from my repository. It requires a recent version of Java (14 or later) since it uses the Record class, and it of course requires CPLEX (unless you want to delete that part and just play with the heuristics), but has no other dependencies.<br /></p><br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-23987429080145714642021-12-21T14:31:00.001-05:002021-12-21T14:31:06.477-05:00Installing Rcplex in RStudio<p>This is an update to a previous post ("<a href="https://orinanobworld.blogspot.com/2020/09/installing-rcplex-and-cplexapi.html" target="_blank">Installing Rcplex and cplexAPI</a>"). <a href="https://cran.r-project.org/web/packages/Rcplex/" target="_blank">Rcplex</a> is an interface from R to the CPLEX optimization solver. A recent release (it's now at version 0.3-5) has made installation and updating much easier (no hacking of installer code required), provided you configure your environment correctly.</p><p>I use the RStudio IDE for almost everything related to R, including updating packages. A recent list of packages waiting to be updated included Rcplex 0.3-4, but when I told RStudio to update them all it balked at Rcplex. The workaround is actually quite simple. What follows is how I got it to work in Linux Mint, but the same steps should work with any Linux distribution and I assume can be modified to work on Windows or Mac OS.</p><ol style="text-align: left;"><li>Find where the CPLEX binary lives. In my case, the path was "/home/paul/&lt;stuff&gt;/CPLEX_Studio201/cplex/bin/x86-64_linux/cplex".</li><li>In a terminal, create an environment variable CPLEX_BIN containing that path (unless you already have one).
This can be done using the <span style="font-family: courier;">export</span> command, as in "<span style="font-family: courier;">export CPLEX_BIN=/home/paul/&lt;stuff&gt;/CPLEX_Studio201/cplex/bin/x86-64_linux/cplex</span>". Note that what I am exporting includes the name of the executable (that last "<span style="font-family: courier;">cplex</span>"), not just the path to it. (I goofed on my first try by just specifying the path.)</li><li>Now invoke RStudio from the terminal and do the package update (or installation if you are getting it for the first time) as normal.</li></ol><p>If you are not an RStudio user, you can install or update the package manually from a terminal by issuing the same export command followed by "<span style="font-family: courier;">R CMD INSTALL &lt;path to download&gt;/Rcplex_0.3-5.tar.gz</span>" (updating the version number as necessary).</p><p>This was much less painful than previous installations. Thanks to the developers for maintaining the package and updating the installation scripts!<br /></p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-18424591874189276122021-12-09T15:07:00.002-05:002021-12-09T15:07:45.888-05:00Normalization Constraints<p>There are occasions where one might consider adding a constraint to an optimization problem specifying that $\|x\| \sim \nu,$ where $x$ is the solution vector, $\nu$ is a positive constant (usually 1, but sometimes something else), the norm is whichever norm (1, 2, $\infty$) suits the modeler's fancy, and the relation $\sim$ is whichever of $\le,$ $=$ or $\ge$ fits the modeler's needs. 
I can think of two somewhat common reasons for a constraint like this.</p><p>The first is a situation where $x$ is free to scale up or down and the modeler is trying to avoid having the solver choose ridiculously large or ridiculously small values for $x.$ Commonly, this is to prevent the solver from exploiting rounding error. As an example, consider the two group discriminant problem with a linear discriminant function, in which your observations have dimension $k$ and you have two samples, one of observations that should receive a positive discriminant score and one of observations that should receive a negative discriminant score. Let $P$ be an $m_P \times k$ matrix containing the first sample and $N$ an $m_N \times k$ matrix containing the second sample. Your goal is to find a coefficient vector $x\in \mathbb{R}^k$ and scalar $x_0\in \mathbb{R}$ that maximize the number of indices $i$ for which $\sum_j P_{i,j} x_j + x_0 \ge \delta$ plus the number of indices $i$ for which $\sum_j N_{i,j} x_j + x_0 \le -\delta,$ where $\delta &gt; 0$ is a cutoff for how small a discriminant score can be and still be counted as nonzero. There is an obvious formulation as a mixed integer program using binary indicator variables for correct v. incorrect scores.</p><p>From a "pure mathematics" perspective, scaling the coefficients of the discriminant function (including $x_0$) up or down by a constant positive factor does not change whether an observation gets a positive, negative or zero score, but unfortunately the computer's finite precision arithmetic drags us out of the realm of "pure mathematics". Absent some sort of normalization, the solver might scale the coefficients way up to exploit some lucky rounding error (where an observation in the positive sample whose score should be zero or even negative gets a tiny positive score courtesy of rounding error, which becomes big enough to clear the $\delta$ threshold after the solution is scaled up). 
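</p><p>A toy illustration (with invented numbers): scaling all coefficients by $t&gt;0$ multiplies every score by $t$ without changing any true signs, so rounding noise that should be treated as zero can be inflated past the cutoff $\delta.$</p>

```python
# Hypothetical numbers, purely for illustration.
delta = 1e-4     # threshold for counting a score as positive

# Suppose an observation's "true" score is exactly zero, but finite
# precision arithmetic leaves a tiny positive residual:
score = 3e-13    # rounding noise, well below delta

print(score >= delta)        # False: not counted as positive

# Scaling the whole coefficient vector (and x0) by t multiplies every
# score by t without changing any true signs...
t = 1e10
print(t * score >= delta)    # True: the noise now clears the threshold
```

<p>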
A normalization constraint is one way to avoid this, but I suspect a better way is just to impose bounds on the variables, which avoids any nonlinearities introduced by the norm function. Note, though, that when the domain of the variables contains both positive and negative values (true in the discriminant example, where coefficients can have either sign), bounds will not stop the solver from scaling the solution downward. If the solver wants to make the solution artificially small (for instance, if there is a penalty for scores that are incorrect by more than a particular amount), bounds will not prevent it; you will need a norm constraint.</p><p>The other situation I have encountered is one where zero is a feasible solution that is perhaps attractive from the perspective of the objective function but needs to be eliminated. An <a href="https://or.stackexchange.com/questions/7412/maximizing-the-number-of-nonnegative-coordinates-of-wx/7415" target="_blank">example of this</a> recently popped up on OR Stack Exchange. Given a matrix $W,$ the author wants to find a vector $x\in\mathbb{R}^n$ that maximizes the number of nonnegative components of $Wx.$ The author turns this into a MIP model using binary indicator variables for whether a component is nonnegative or not. The catch here is that $x=0\implies Wx=0,$ so the zero solution trivially maximizes the objective (all components nonnegative). The author rules this out by adding the normalization constraint $\| x \|_1=n.$ The one-norm will introduce binary variables, either directly or indirectly (meaning they might be added internally by the solver).</p><p>For this situation, there is an alternative way to rule out zero. Suppose that we draw an observation $r\in\mathbb{R}^n$ from a continuous distribution over a set of full dimension. 
For simplicity, I like the uniform distribution, so I might take $r\sim U([-1, 1]^n).$ I would normalize the vector (changing $r$ to $r/\|r\|_2$) just to avoid the possibility that it is too close to a zero vector (or, if using a larger domain, too big). In lieu of the norm constraint, I would add the linear constraint $r^\prime x = 1.$ Clearly this eliminates $x=0,$ but does it eliminate other values of $x?$ Most likely not. If $r^\prime x \neq 0$ and if $x$ is free to scale, then the solver can scale an otherwise optimal $x$ to satisfy $r^\prime x = 1.$ Because $r$ is drawn from a continuous distribution over a full-dimensional domain, $\Pr(r^\prime x = 0)=0$ for any nonzero $x.$ So the only real concern would be if the optimal $x$ had a (nonzero) inner product with $r$ small enough that attempting to scale it to meet the normalization constraint would violate some bounds on the components of $x.$ That is a risk I'm usually willing to take.</p><p>Here is the punch line (and the motivation for this post). I suggested using the random product in lieu of the norm constraint in an answer I posted on OR SE. The author of the question commented that he had tried it, and it "<span class="comment-copy">improves performance by at least an order of magnitude". Presumably this is because it eliminates the (explicit or implicit) binary variables in the computation of the 1-norm, along with any auxiliary variables and constraints used to implement the norm restriction.</span></p><p><span class="comment-copy">I tested a genetic algorithm versus the author's MIP model (modified to replace the norm constraint with a random normalization) using CPLEX. If anyone is curious how the GA does, you can play with my <a href="https://rubin.msu.domains/blog/maxnonnegative.nb.html" target="_blank">R code</a> (which, as usual, requires one library for the GA and a gaggle of libraries to use CPLEX).<br /></span></p>Paul A.
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-56219759607526886172021-11-15T15:45:00.002-05:002021-11-15T15:45:48.615-05:00A Heuristic Surprise<p>The <a href="https://en.wikipedia.org/wiki/Maximum_cut" target="_blank">maximum cut problem</a> is one of those combinatorics problems that are deceptively easy to articulate and yet NP-annoying to solve. Given an undirected graph $G=(V,E)$, find a partition of $V$ into disjoint subsets $A$ and $B$ such that the set of edges in $E$ with one endpoint in $A$ and the other in $B$ (the "cut set") is as large as is possible. There is also a weighted version, where you maximize the total weights of the cut set rather than its cardinality.</p><p>Someone <a href="https://or.stackexchange.com/questions/7269/budgeted-max-cut-heuristic/7273" target="_blank">asked on OR Stack Exchange</a> about a variant of the problem, in which there is a specified size (which the author called a "budget") for one of the sets, i.e., a constraint $|A|=b$ for some given $b$. This generated various suggestions, including some heuristics. I had a couple of thoughts about possible heuristics, and in playing with some computational examples found myself rather surprised by the results.</p><h2 style="text-align: left;">Mixed Integer Program</h2><div style="text-align: left;">It is of course quite straightforward to write an integer or mixed-integer linear program for the problem. I'll assume that $V=\lbrace 1,\dots,N\rbrace$ and that $2\le b \le N-1$. Let variable $x_i$ be 1 if vertex $i\in V$ is in $A$ and 0 if it is in $B$, and let variable $y_{i,j}$ be 1 if edge $(i,j)\in E$ is in the cut set ($i\in A$ and $j\in B$ or vice versa) and 0 if not ($i$ and $j$ both in $A$ or both in $B$).</div><div style="text-align: left;">&nbsp;</div><div style="text-align: left;">Please allow me a slight digression here. The $y$ variables can be declared either continuous or binary. 
The combination of objective and constraints will force the values to 0 or 1 even if they are declared as continuous. Force of habit, dating back to when I was using very early generation software on not particularly powerful (by current standards) computers, has me declaring every variable as continuous unless it has to be integer. On the other hand, it has been suggested to me that declaring such variables integer-valued may help solvers tighten bounds or fix variables. That makes sense to me, but in this particular model I don't see anything that would suggest making $y$ integer would help the solver.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Digression over, we can write the MIP model as</div><div style="text-align: left;"> \begin{alignat*}{1} \max\sum_{(i,j)\in E}y_{ij}\\ \textrm{s.t. }\sum_{i\in V}x_{i} &amp; =b\\ y_{ij} &amp; \le x_{i}+x_{j}\quad\forall(i,j)\in E\\ y_{ij} &amp; \le2-x_{i}-x_{j}\quad\forall(i,j)\in E\\ x_{i} &amp; \in\left\{ 0,1\right\} \quad\forall i\in V\\ y_{ij} &amp; \in\left[0,1\right]\quad\forall(i,j)\in E. \end{alignat*}</div><div style="text-align: left;">&nbsp;</div><div style="text-align: left;">The first constraint enforces the "budget", while the second and third constraints ensure that $y_{ij}=0$ if $i$ and $j$ are on the same side of the partition. The objective function will force $y_{ij}=1$ if $i$ and $j$ are on different sides. 
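</div><div style="text-align: left;">Before discussing solution approaches, it is handy to have a ground-truth check for small instances. The sketch below (illustrative Python; my code for this post is in R) evaluates a cut and brute-forces the budgeted problem by enumeration:</div>

```python
# Illustrative sketch: evaluate a cut and brute-force a tiny budgeted
# max cut instance by enumerating all choices of A with |A| = b.
from itertools import combinations

def cut_size(edges, A):
    """Number of edges with exactly one endpoint in the set A."""
    A = set(A)
    return sum((i in A) != (j in A) for i, j in edges)

def brute_force(n, edges, b):
    """Optimal budgeted max cut by enumerating all C(n, b) choices of A."""
    best = max(combinations(range(n), b), key=lambda A: cut_size(edges, A))
    return set(best), cut_size(edges, best)

# 5-cycle with budget 2: at best four of the five edges can be cut.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(brute_force(5, edges, 2)[1])  # 4
```

<div style="text-align: left;">Enumeration only scales to modest values of $N$ and $b,$ but it is useful for validating a model or heuristic on small test cases.</div><div style="text-align: left;">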
The MIP model is easy to solve for small instances and grows more difficult as the number of vertices and the number of edges grow, and as $b$ gets closer to $N/2$ (since the number of possible partitions satisfying the budget increases).</div><div style="text-align: left;"></div><div style="text-align: left;"><h2 style="text-align: left;">Genetic Algorithm</h2></div><div style="text-align: left;"></div><div style="text-align: left;">Since the max cut problem is NP-hard, I suspect the budgeted version also is, and in any case it is likely that solving the MIP to optimality can take too long (or too much memory) in many cases. That led me to thinking about heuristics. One easy heuristic is to solve the MIP model but not to optimality, stopping at some arbitrary time or memory/node limit. That requires a MIP solver. Another possibility (among many) is a genetic algorithm.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">I was fooling around with the problem in R, which has a very handy genetic algorithm library (named, rather suggestively, <span style="font-family: courier;">GA</span>). Among other things, the <span style="font-family: courier;">GA</span> library allows you to use a permutation as a "chromosome" (solution encoding). So we can define a candidate solution to be a permutation of $1,\dots,N$, and its "fitness" is then the size of the cut set defined by setting $A$ equal to the first $b$ elements of the permutation.</div><div style="text-align: left;"></div><div style="text-align: left;"><h2 style="text-align: left;">Pairwise Swap</h2></div><div style="text-align: left;"></div><div style="text-align: left;">Another possibility that I suggested on OR SE was a simple pairwise swapping heuristic. Start by generating a random choice of $A$ (and, by implication, $B$) and calculate the size of the cut set.
Now consider all pairs of vertices $i\in A$ and $j\in B$ and, for each pair, see what happens if you swap them (moving $i$ to $B$ and $j$ to $A$). If the size of the cut set increases, accept the swap; otherwise do not. This is a primitive example of a "neighborhood" search, and it can be used in fancier heuristics, including <a href="https://en.wikipedia.org/wiki/Simulated_annealing" target="_blank">simulated annealing</a> (where you occasionally accept a swap that actually makes the objective smaller, in order to get you out of your current neighborhood). To keep things simple, I suggested just doing pairwise swaps when the objective improved, with random restarts. That means assigning a time limit to the heuristic, and when you run out of acceptable swaps, start fresh with a new random solution until time is exhausted.</div><div style="text-align: left;"><h2 style="text-align: left;">And the surprise is ...</h2></div><div style="text-align: left;">I coded the MIP model (solved with CPLEX), the GA and the swapping heuristic in R and ran them on a few test cases, keeping dimensions small enough that CPLEX could solve the MIP reasonably quickly. Since the swapping heuristic relies on a time limit to stop it, I set a limit of two minutes. For the GA, I set a limit of 10,000 generations (which it never reached, as I expected) and a stagnation limit of 200 generations (meaning that, after 200 consecutive generations with no improvement in the best objective value, it would terminate). Early experiments suggested that the GA could stagnate pretty quickly, so I used an "island" GA (in which several smaller populations on "islands" evolve independently, with periodic migrations from one island to another to freshen the "genetic pools" on the islands).
In the table below, I show the optimality gap (percent worse than the optimum, as computed by CPLEX) for the GA and the swap heuristic on six examples. Since the GA stagnates well before the two minute limit I gave to the swap heuristic, I also ran the GA with restarts (column "GA-R"), which restarted the GA with a new population each time it stagnated until the two minute limit was reached. The restarts improved the GA performance somewhat, but as you can see the swapping heuristic beat it on every one of the six test cases (and found the optimum in four out of six).</div><div style="text-align: left;"><br /></div><div style="text-align: left;"><br /></div> <style type="text/css">.tg {border:none;border-collapse:collapse;border-color:#bbb;border-spacing:0;} .tg td{background-color:#E0FFEB;border-color:#bbb;border-style:solid;border-width:0px;color:#594F4F; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{background-color:#9DE0AD;border-color:#bbb;border-style:solid;border-width:0px;color:#493F3F; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-lboi{border-color:inherit;text-align:left;vertical-align:middle} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} .tg .tg-yz93{border-color:inherit;text-align:right;vertical-align:middle} </style><div align="center"><table class="tg"><thead> <tr> <th class="tg-c3ow" colspan="3">Graph</th> <th class="tg-c3ow" colspan="3">Gap</th> </tr></thead> <tbody><tr> <td class="tg-lboi">Nodes</td> <td class="tg-lboi">Edges</td> <td class="tg-lboi">Budget (b)</td> <td class="tg-lboi">GA</td> <td class="tg-lboi">GA-R</td> <td class="tg-lboi">Swap</td> </tr></tbody><tbody> <tr> <td class="tg-yz93">50</td> <td class="tg-yz93">429</td> <td class="tg-yz93">10</td> <td class="tg-yz93">1.7%</td> <td class="tg-yz93">1.7%</td> <td class="tg-yz93">0.0%</td> </tr> <tr> <td class="tg-yz93">50</td> 
<td class="tg-yz93">490</td> <td class="tg-yz93">20</td> <td class="tg-yz93">3.0%</td> <td class="tg-yz93">0.7%</td> <td class="tg-yz93">0.0%</td> </tr> <tr> <td class="tg-yz93">75</td> <td class="tg-yz93">416</td> <td class="tg-yz93">20</td> <td class="tg-yz93">6.4%</td> <td class="tg-yz93">4.3%</td> <td class="tg-yz93">0.0%</td> </tr> <tr> <td class="tg-yz93">75</td> <td class="tg-yz93">416</td> <td class="tg-yz93">30</td> <td class="tg-yz93">6.8%</td> <td class="tg-yz93">4.7%</td> <td class="tg-yz93">0.4%</td> </tr> <tr> <td class="tg-yz93">100</td> <td class="tg-yz93">495</td> <td class="tg-yz93">30</td> <td class="tg-yz93">8.4%</td> <td class="tg-yz93">7.8%</td> <td class="tg-yz93">1.6%</td> </tr> <tr> <td class="tg-yz93">100</td> <td class="tg-yz93">990</td> <td class="tg-yz93">25</td> <td class="tg-yz93">9.1%</td> <td class="tg-yz93">4.4%</td> <td class="tg-yz93">0.0%</td> </tr></tbody> </table></div> <p>I don't expect the swapping heuristic to find the optimum this consistently on larger problems. Whether it would outperform the GA (with restarts) on tougher problems is an open question. Nonetheless, I take this as a reminder that sometimes simple heuristics can do a pretty good job.</p><p>My <a href="https://rubin.msu.domains/blog/maxcut.nb.html" target="_blank">R notebook</a> is available if you want to play with it. Fair warning: the notebook assumes you have CPLEX installed and loads seven R libraries (four for use with CPLEX, one for the GA, one to time execution of the heuristics and a separate one to set a time limit on the heuristics). If some of that does not interest you, you can of course edit out the related code and skip loading the corresponding libraries.<br /></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-40357108256450543982021-11-03T18:34:00.001-04:002021-11-03T18:34:52.145-04:00Smallest Polygon Containing A 2D Point Set<p>A <a href="https://or.stackexchange.com/questions/7158/how-to-find-the-point-on-the-exterior-of-a-given-set-of-points" target="_blank">question</a> on OR Stack Exchange asked about an optimization model (specifically a MILP) for finding the "smallest hull" containing a finite set of points in the plane. The phrase "smallest hull" is rather fuzzy. In a follow up comment, the author of the question clarified that "smallest" means smallest area. That leaves the meaning of "hull". Note that any <a href="https://en.wikipedia.org/wiki/Hamiltonian_path" target="_blank">Hamiltonian path</a> through the points would contain all the points (with all of them on its "boundary") and would have area zero. Based on a sample illustration added by the author, I will assume that "hull" here means polygon (but not necessarily a convex one).</p><p>I'm pretty sure that a MILP model could be devised, but I suspect it would be rather large and might solve slowly. So I looked at the possibility of a heuristic solution, and came up with something that seems to do pretty well but definitely does not produce an optimal solution in all cases. I got it working in R. You can see the full notebook (which includes the code) <a href="https://rubin.msu.domains/blog/hull.nb.html" target="_blank">here</a>. (Warning: It's almost 6MB, due to the presence of some graphs.) The notebook uses three R libraries: <span style="font-family: courier;">ggplot2</span> and <span style="font-family: courier;">plotly</span>, for plotting; and <span style="font-family: courier;">interp</span>, for generating a tesselation (more about that below). 
The <span style="font-family: courier;">plotly</span> library is used just once, to generate an interactive plot with "tooltips" identifying each point, which is mainly useful for debugging. So if you don't feel like installing it, you can just delete any references to it and still run the important parts of the code.</p><p>Suppose we start with the following 2D point set.</p><p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-9Iq-kyTMl4w/YYMH4x5kNFI/AAAAAAAADso/_a7ZS1nUeWgfw8IHmaxqWPRZWdPMXNj1wCLcBGAsYHQ/s528/convex_hull.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="Point set plot showing convex hull" border="0" data-original-height="477" data-original-width="528" height="578" src="https://1.bp.blogspot.com/-9Iq-kyTMl4w/YYMH4x5kNFI/AAAAAAAADso/_a7ZS1nUeWgfw8IHmaxqWPRZWdPMXNj1wCLcBGAsYHQ/w640-h578/convex_hull.png" title="Point set (with convex hull)" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Point set (with convex hull)<br /></td></tr></tbody></table>The heuristic first finds the convex hull of the points (shown above) and a <a href="https://en.wikipedia.org/wiki/Delaunay_triangulation" target="_blank">Delaunay triangulation</a> of them, using the <span style="font-family: courier;">tri.mesh</span> function from the <span style="font-family: courier;">interp</span> library. The triangulation is a collection of nonoverlapping triangles, with all vertices taken from the point set, such that no point is in the interior of any triangle. 
The next plot shows what the triangulation looks like.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-8CRJW7EjdEI/YYMI8g06AuI/AAAAAAAADsw/SnX6t_-q6rgEpP9PraG8xfWUEjJ1F4LYgCLcBGAsYHQ/s555/triangulation.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="Delaunay triangulation of the points" border="0" data-original-height="343" data-original-width="555" height="396" src="https://1.bp.blogspot.com/-8CRJW7EjdEI/YYMI8g06AuI/AAAAAAAADsw/SnX6t_-q6rgEpP9PraG8xfWUEjJ1F4LYgCLcBGAsYHQ/w640-h396/triangulation.png" title="Delaunay triangulation" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Delaunay triangulation<br /></td></tr></tbody></table><p>The heuristic now takes a lap around the boundary of the convex hull. For each line segment $[a,b]$, it locates the (unique) triangle containing that edge. Let's say the third vertex of that triangle is $c$. If $c$ is not currently on the boundary, the heuristic adds it to the boundary, replacing the segment $[a,b]$ with the segments $[a,c]$ and $[c,b]$. It then deletes the $a-b-c$ triangle from the polygon, shaving off some area, and moves on to the next boundary segment.</p><p>This loop is repeated until a full circuit around the boundary fails to make any changes, at which point the heuristic terminates. 
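</p><p>In condensed form, the shaving loop looks like this (illustrative Python rather than the notebook's R; the convex hull ordering and triangulation are hard-coded for a toy instance, whereas the notebook obtains them from <span style="font-family: courier;">interp</span>):</p>

```python
def shave(boundary, triangles):
    """Shaving loop: for each boundary edge (a, b), find the remaining
    triangle {a, b, c} inside the polygon; if its third vertex c is not
    on the boundary, splice c in between a and b and drop the triangle."""
    boundary = list(boundary)
    tris = [frozenset(t) for t in triangles]
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(boundary):
            a, b = boundary[i], boundary[(i + 1) % len(boundary)]
            inside = [t for t in tris if a in t and b in t]
            if inside:
                (c,) = inside[0] - {a, b}
                if c not in boundary:
                    boundary.insert(i + 1, c)
                    tris.remove(inside[0])
                    changed = True
            i += 1
    return boundary

def area(points, cycle):
    """Polygon area by the shoelace formula."""
    s = 0.0
    for k in range(len(cycle)):
        x1, y1 = points[cycle[k]]
        x2, y2 = points[cycle[(k + 1) % len(cycle)]]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Toy instance: unit square corners plus a center point. The hull is the
# square; the triangulation fans out from the center.
pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
hull = [0, 1, 2, 3]
tris = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
poly = shave(hull, tris)
print(poly, area(pts, poly))  # [0, 4, 1, 2, 3] 0.75
```

<p>In the toy run, one corner triangle of the square gets shaved off, cutting the area from 1 to 0.75; once the center point is on the boundary, no further shaving is possible.</p><p>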
Here is what the final polygon looks like.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-DoBZ3UKVV7U/YYMKqhnwz7I/AAAAAAAADs4/RrtDGuqR7GgNxfKmq5UCWKb5gdiPIIPMgCLcBGAsYHQ/s555/final_polygon.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="Plot of the final polygon, shaded to show the interior" border="0" data-original-height="343" data-original-width="555" height="396" src="https://1.bp.blogspot.com/-DoBZ3UKVV7U/YYMKqhnwz7I/AAAAAAAADs4/RrtDGuqR7GgNxfKmq5UCWKb5gdiPIIPMgCLcBGAsYHQ/w640-h396/final_polygon.png" title="Final polygon" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Final polygon<br /></td></tr></tbody></table><p>What appear to be line segments sticking out in a few places are actually very skinny triangles. Clearly the heuristic did not select this polygon for its aesthetics.<br /></p><p>The points were randomly generated in the unit square, which of course has area 1. For this example, the convex hull has area 0.865, while the final polygon has area 0.511, a reduction of almost 41%. At the same time, it is clear from the final plot that the solution is not optimal. There is one point left in the interior of the polygon, around $(0.493, 0.378)$. The heuristic cannot do anything about it because none of the triangles to which it belongs have an edge on the boundary. We could certainly add line segments from it to two boundary points, at $(0.483, 0.280)$ and $(0.450, 0.175)$, that currently share an edge. 
That would form a new triangle containing no other points, and we could drop the existing edge, add the two new edges to the boundary, and shed a little more area.<br /></p><p>So the heuristic does not produce "nice looking" polygons, nor does it find an optimal solution, but my intuition after running a few examples is that it probably does reasonably well.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com5tag:blogger.com,1999:blog-8781383461061929571.post-44361664291640022982021-10-09T16:16:00.001-04:002021-10-09T16:21:32.518-04:00Worker Ordering<p>&nbsp;A question ("<a href="https://math.stackexchange.com/questions/4271163/ordering-of-resources-for-maximizing-success" target="_blank">Ordering of resources for maximizing success</a>") on Mathematics Stack Exchange asks how to model the problem of sequencing workers under some unusual assumptions. You have 32 workers and 16 tasks. Each worker $i$ has a probability $w_i$ for success on any task. The probabilities vary by worker but not by task. Successes and failures are also assumed to be independent, meaning that if worker $i$ succeeds on one task, their probability of succeeding on the next task is still $w_i$, and if they fail on a task, the probability worker $i+1$ succeeds is still $w_{i+1}$. Once a worker fails, that worker is no longer available for any subsequent tasks. The objective is to order the workers so as to maximize the probability that all tasks are completed.</p><p>Answers posted on Math Stack Exchange asserted that sorting the workers into either ascending or descending probability order would be best. The question actually asked about how to model the decision problem, and I could not think of a good way to turn it into a tractable optimization model. 
So I decided to run a few experiments (in R) computing various permutations, with the thought that I would eventually land on a heuristic approach.</p><p>To my surprise, in every experiment I ran <i>all permutations resulted in the same success probability</i>. In other words, the answer to "what is the best order?" is "any order you like". I wrote out the explicit formula for success with three workers and two tasks, thinking I might see why the order did not matter, but derived no insights from it. So I was pretty much flummoxed.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Eventually, a retroactively obvious answer dawned on me. Suppose we put the workers in an arbitrary order, pad the list of tasks with dummy tasks (so there are essentially infinitely many tasks, the first 16 of which are real), and turn the workers loose sequentially until the last worker fails. Let $n_i$ be the number of successes for worker $i$. (Remember, $i$ stops after their first failure.) Since each attempt is an independent observation of a <a href="https://en.wikipedia.org/wiki/Bernoulli_distribution" target="_blank">Bernoulli distribution</a> with parameter $w_i$, $n_i$ has a <a href="https://en.wikipedia.org/wiki/Geometric_distribution" target="_blank">geometric distribution</a> with parameter $1-w_i$. As it turns out, the specific distribution does not matter to us. What matters is that, given our infinite supply of dummy tasks, the variables $n_1, \dots, n_{32}$ are independent of each other. Since the real tasks are at the front of our task queue, the probability of overall success is just $$\mathrm{Pr}\left(\sum_{i=1}^{32}n_{i}\ge16\right),$$which depends only on the sum, not the order of summation. 
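</p><p>One can also see this numerically, without the dummy-task argument, by conditioning on how many tasks the current worker completes before failing. The Python sketch below (illustrative only; my actual computations were in R, and the three-worker, two-task example is made up) confirms that reordering the workers leaves the probability unchanged.</p>

```python
from functools import lru_cache

def success_prob(w, tasks):
    """Probability that workers deployed in the order given by w complete
    all tasks, where each worker retires at their first failure."""
    w = tuple(w)

    @lru_cache(maxsize=None)
    def p(i, t):
        if t == 0:
            return 1.0  # all tasks done
        if i == len(w):
            return 0.0  # out of workers with tasks remaining
        # worker i finishes the remaining t tasks outright ...
        prob = w[i] ** t
        # ... or completes k < t of them and then fails
        for k in range(t):
            prob += w[i] ** k * (1 - w[i]) * p(i + 1, t - k)
        return prob

    return p(0, tasks)

print(round(success_prob([0.5, 0.6, 0.9], 2), 10))  # 0.94
print(round(success_prob([0.9, 0.6, 0.5], 2), 10))  # 0.94 (same)
```

<p>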
So it becomes obvious that any ordering of the workers (and thus of the $n_i$ variables) yields the same likelihood of success.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">If you want to see my R computations, they are in a <a href="https://rubin.msu.domains/blog/workers.nb.html" target="_blank">notebook</a> you can download.<br /></p><p><style type="text/css">p, li { white-space: pre-wrap; }</style></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-15187748106346267662021-09-13T17:44:00.000-04:002021-09-13T17:44:43.209-04:00Helping Babies<p>Before I get to what's on my mind, let me start by giving a shout out to the <i><a href="https://pubsonline.informs.org/magazine/orms-today/podcasts" target="_blank">Resoundingly Human</a> </i>podcast from INFORMS. In each episode, host Ashley Kilgore interviews one or more OR academics or professionals about interesting recent work. Frequently, though not always, the subject matter pertains to a recent or forthcoming publication, but the discussions never get very technical, so they are suitable for a general audience. Episodes typically run in the vicinity of 15 to 20 minutes, which I find enough for a full exposition but not long enough to be tedious. They are generally quite well done.</p><p>What motivated me to post was the most recent episode, "<a href="https://pubsonline.informs.org/do/10.1287/orms.2021.05.07p" target="_blank">Helping valuable donor milk reach infants in need</a>", which I listened to while walking on a nature trail, feeding mosquitoes. 
The guest was Professor Lisa Maillart of the University of Pittsburgh, discussing a model she and colleagues developed to help donor milk banks convert donations into products for end use. If you are not familiar with milk banks (I actually was, not sure why), think "blood bank" but with donated breast milk replacing donated blood and babies in need replacing surgical patients in need. For greater detail, I recommend listening to the episode (linked above, approximately 20 minutes duration).</p><p>The application is the sort of thing that tends to give one warm fuzzy feelings. (Who doesn't like babies?) Moreover, during the podcast Prof. Maillart made a point that I think bears some reflection. I have not seen their model (the paper is not out yet), but from the sounds of it I think we can assume it is a production planning/blending model that likely resembles other models with which many of us are familiar. Early results from implementing the model seem to have produced substantial gains for the milk bank with which the authors collaborated. Prof. Maillart noted that, given the variety of constraints involved and the number of decisions to be made, scheduling production manually (based on experience, intuition or maybe just guesswork or subconscious heuristics) is challenging. In other words, for the non-OR person doing the planning, it is a "hard" problem, in part due to the number of things to be juggled. 
To an OR person, it may not seem that hard at all.</p><p>For me, at least, "hard" means difficult to formulate because it involves some complicated constructs or some probabilistic/squishy elements, or difficult to formulate because something is nonlinear (and not easily approximated), or possibly difficult to solve because feasible solutions are in some way tough to find or the minimum possible scale (after decomposing or clustering or whatever it takes to get the dimensions down) still overwhelms the hardware and software.&nbsp; (At this point I'll stop and confess that my perspective inevitably is that of an optimizer, as opposed to a simulator or a stochastic process &lt;insert your own noun ending in "er" here -- I'm at a loss&gt;.) For a typical person, going from five constraints of a particular sort (for instance, capacity limits on individual workers) to ten can be "hard". For an OR person, it just means the index range of a single constraint changes.</p><p>After listening to the episode, I am left wondering (not for the first time) how often people in the "real world" stare at "hard" problems that would seem relatively straightforward to an OR person ... if only we knew about them, which usually requires that the problem owner know about us. INFORMS is working to publicize both itself and the work of the OR community, but I think we still fly below almost everybody's radar.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-18805433831784190182021-07-13T17:30:00.000-04:002021-07-13T17:30:07.444-04:00Big-M versus Goals (Part II)<p>This is a continuation of my <a href="https://orinanobworld.blogspot.com/2021/07/big-m-versus-goals.html" target="_blank">previous post</a>, in which I discussed the common modeling issue of using a binary variable to enforce or relax a linear constraint. 
The standard approach to this uses what is commonly known as a "big M" constraint, but there are alternatives. The two that I mentioned in the previous post were a version of Benders decomposition frequently termed "combinatorial Benders cuts" (CBC) and an approach specific to CPLEX using disjunctive "goals". The previous post illustrated this using a mixed integer linear program (MILP) for the two-group discriminant problem.</p><p>I've used both "big M" and CBC before, but I had never used goals, and I was curious about the relative performance of the three methods. So I coded the discriminant problem from the previous post in Java (with a slightly different objective function, which is irrelevant to the discussion at hand) and ran the various models using four data sets that I have left over from a paper I wrote 30-ish years ago. The data sets relate to breast cancer (444+239), diabetes (500+268), liver disease (145+200) and (just to prove I don't have a morbid fascination with disease) radar returns (126+225). The numbers in parentheses are the sizes of the two samples.<br /></p><p>I ran the problems using CPLEX 20.1 with a 10 minute time limit and default values for all parameters. For the CBC model, I used generic callbacks with a local copy of the LP subproblem for each thread (which will relate to memory use). 
The table below shows, for each combination of data set and model, the run time (in seconds), the final objective value (lower is better), the final bound (bigger is better) and the approximate peak memory use (in gigabytes, omitted for cases where the memory use was "trivial").<br /></p><p><br /></p><style type="text/css">.tg {border-collapse:collapse;border-color:#9ABAD9;border-spacing:0;} .tg td{background-color:#EBF5FF;border-color:#9ABAD9;border-style:solid;border-width:1px;color:#444; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{background-color:#409cff;border-color:#9ABAD9;border-style:solid;border-width:1px;color:#fff; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-baqh{text-align:center;vertical-align:top} .tg .tg-wn3b{background-color:#D2E4FC;text-align:center;vertical-align:middle} .tg .tg-j0tj{background-color:#D2E4FC;text-align:center;vertical-align:top} .tg .tg-nrix{text-align:center;vertical-align:middle} </style><div align="center"><table class="tg"><thead> <tr> <th class="tg-baqh">Data</th> <th class="tg-baqh">Model</th> <th class="tg-baqh">Time (s.)</th> <th class="tg-baqh">Objective</th> <th class="tg-baqh">Bound</th> <th class="tg-baqh">Memory (GB)</th> </tr></thead><tbody> <tr> <td class="tg-wn3b" rowspan="3">Cancer</td> <td class="tg-j0tj">Big M</td> <td class="tg-j0tj">2.2</td> <td class="tg-j0tj">0.0124</td> <td class="tg-j0tj">0.0124</td> <td class="tg-j0tj">---</td> </tr> <tr> <td class="tg-baqh">CBC</td> <td class="tg-baqh">6.5</td> <td class="tg-baqh">0.0124</td> <td class="tg-baqh">0.0124</td> <td class="tg-baqh">---</td> </tr> <tr> <td class="tg-j0tj">Goals<br /></td> <td class="tg-j0tj">600</td> <td class="tg-j0tj">0.0124</td> <td class="tg-j0tj">0.0056</td> <td class="tg-j0tj">1.0</td> </tr> <tr> <td class="tg-nrix" rowspan="3">Diabetes</td> <td class="tg-baqh">Big M</td> <td 
class="tg-baqh">600</td> <td class="tg-baqh">0.2057</td> <td class="tg-baqh">0.0614</td> <td class="tg-baqh">0.2</td> </tr> <tr> <td class="tg-j0tj">CBC</td> <td class="tg-j0tj">600</td> <td class="tg-j0tj">0.3966</td> <td class="tg-j0tj">0.0598</td> <td class="tg-j0tj">2.9</td> </tr> <tr> <td class="tg-baqh">Goals<br /></td> <td class="tg-baqh">600</td> <td class="tg-baqh">0.2239</td> <td class="tg-baqh">0.0038</td> <td class="tg-baqh">4.1</td> </tr> <tr> <td class="tg-wn3b" rowspan="3">Liver</td> <td class="tg-j0tj">Big M</td> <td class="tg-j0tj">600</td> <td class="tg-j0tj">0.2521</td> <td class="tg-j0tj">0.1243</td> <td class="tg-j0tj">0.2</td> </tr> <tr> <td class="tg-baqh">CBC</td> <td class="tg-baqh">600</td> <td class="tg-baqh">0.3244</td> <td class="tg-baqh">0.1013</td> <td class="tg-baqh">7.6</td> </tr> <tr> <td class="tg-j0tj">Goals<br /></td> <td class="tg-j0tj">600</td> <td class="tg-j0tj">0.2639</td> <td class="tg-j0tj">0.0169</td> <td class="tg-j0tj">2.8</td> </tr> <tr> <td class="tg-nrix" rowspan="3">Radar</td> <td class="tg-baqh">Big M</td> <td class="tg-baqh">91</td> <td class="tg-baqh">0.0190</td> <td class="tg-baqh">0.0190</td> <td class="tg-baqh">---</td> </tr> <tr> <td class="tg-j0tj">CBC</td> <td class="tg-j0tj">139</td> <td class="tg-j0tj">0.0186</td> <td class="tg-j0tj">0.0186</td> <td class="tg-j0tj">0.2</td> </tr> <tr> <td class="tg-baqh">Goals<br /></td> <td class="tg-baqh">600</td> <td class="tg-baqh">0.0190</td> <td class="tg-baqh">0.0066</td> <td class="tg-baqh">0.2</td> </tr></tbody></table></div><p>&nbsp;</p><p>Before forging ahead, I'll note one small anomaly in the results for the radar data set. Both the "big M" and CBC methods got "provably optimal" solutions (lower bound = upper bound), but the CBC solution was slightly better (misclassifying six observations, versus seven for "big M"). 
I think this is just an issue with tolerance settings (constraint satisfaction tolerance, integrality tolerance or a combination of the two).<br /></p><p>I've said on more than one occasion that the only valid generalization about integer programs is that there are no other valid generalizations about integer programs. That won't stop me from looking for possible patterns in the table, but it is important to keep in mind not only the small number of problems tried (four) and the specific nature (two-group discriminant analysis), but also the fact that I have fairly tight, data-based values of $M_i$ for the "big M" models. In two data sets, "big M" and CBC both got proven optimal solutions (with CBC being slower but not particularly slow). Both times, the goal approach also found an optimal solution (or optimal-ish in the radar case), but was nowhere near getting the bound close to the optimal value. In the other two cases, goals did better on the primal side (found a better incumbent solution) but CBC did better on the dual side (found a tighter bound). Since CBC is busy hacking off infeasible candidate solutions and the goal approach is trying to satisfy goals, I suspect this pattern might generalize a bit.</p><p>On the two more difficult data sets (diabetes and liver disease), both CBC and goals used much more memory than "big M" did. In both cases, portions of the node file were being compressed and possibly shipped off to disk by CBC and goals, whereas "big M" never got to the point of consuming very much memory. I'll confess that I was expecting goals to be much more of a memory hog than CBC, since the goal "stack" gets replicated at every node, but CBC used much more memory than goals on the liver disease data set. Although I did not keep count, my guess is that this indicates CBC was cranking out a huge number of Benders cuts. 
I think the memory use can be mitigated (somewhat) by making the cuts purgeable, but I did not bother to test that.</p><p>My last take-away: on the easy problems "big M" was fastest, and on the challenging problems "big M" had both the best incumbent value and the tightest bound. Again, this particular set of problems benefits from fairly tight $M_i$ values. So, henceforth, I will likely stick to "big M" whenever I think I have good values for $M$.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-91045655275327791102021-07-08T16:46:00.001-04:002021-07-08T16:46:14.166-04:00Big-M versus Goals<p>Much has been written (including by yours truly) about the trials and tribulations of "big M" integer programming models. A common use of "big M" is to let binary variables turn constraints on or off. So, for instance, $$a^\prime x \le b + Mz\quad (1)$$with $x$ a vector of continuous variables and $z$ a binary variable is intended to enforce $a^\prime x \le b$ when $z=0$ and not enforce it when $z=1$.<br /></p><p>Large values of $M$ can contribute to weak relaxations (leading to slow progress on the bound), "leakage" (where a value of $z$ can be close enough to 0 that the solver considers it 0 to within rounding error while making $Mz$ big enough to relax the constraint), and various numerical problems. Small values of $M$ may make the constraint binding when it is supposed to be relaxed. Still, for most people the "big M" approach is the only way they know to model certain problem features.</p><p>One alternative, at least in some cases, is to use "combinatorial Benders decomposition" (Codato and Fischetti, 2006), in which the constraints in question are either enforced or ignored in a subproblem depending on the values of $z$ in the master problem. The good news is that this eliminates any worries about choosing $M$, since no coefficients $M$ are needed. 
The bad news is that (a) Benders decomposition is a bit advanced for many users, (b) it may require custom programming (as opposed to making the model in a high-level language and letting the modeling environment pass it to the solver), and (c) Benders decomposition is an "outer approximation", so the best bound may be a bit leisurely in converging to the optimum.</p><p>There is another alternative available, at least with CPLEX. Recent versions of CPLEX support "goals". The user's manual does a masterful job of not actually defining what a goal is -- according to the CPLEX 20.1 manual, goals are things that "allow you to take control of the branch &amp; cut search procedure used by IBM ILOG CPLEX to solve MIP problems". Basically, a goal is an alternative form of a constraint (I think), which rather than explicitly appearing in the constraint matrix is put in a stack of goals, passed to nodes when they are created, and somehow used to influence the creation of child nodes (I think).</p><p>The tie-in to today's topic is that one type of goal provided by CPLEX is an "or" goal, which is what it sounds like: a disjunction of two or more constraints or goals. So an alternative to writing constraint (1) with the dreaded $M$ would be to use an "or" goal $$a^\prime x \le b \mathrm{\quad OR \quad} z=1.\quad (2)$$</p><p>I was curious about how well this would work, so I tried to do a comparison between "big-M" and goal-based models for a two-group discriminant problem. The gist of the model is as follows. We have as data a sample of vectors $x_i\in \mathbb{R}^n$ from two groups. Let $G_0$ and $G_1$ denote the indices of observations belonging to the first and second groups respectively. 
We want to find coefficients $w\in \mathbb{R}^n$, $w_0 \in \mathbb{R}$ for a linear function $f(x) = w^\prime x + w_0$ such that $f(x) \lt 0$ predicts membership of $x$ in the first group and $f(x) \gt 0$ predicts membership in the second group.</p><p>The specific model I started with (from some research I did in my much younger days) includes one more variable $d\ge \delta$ (where $\delta$ is some small positive constant) and binary variables $z_i$ signaling whether an observation is correctly ($z_i =0$) or incorrectly ($z_i=1$) classified. Variable $d$ captures the minimum absolute score of a correctly classified observation, which in essence represents the amount of separation between (correct) scores for the two groups. If $d$ is too small, you may end up classifying observations positive or negative based on what amounts to rounding error, hence the lower bound on $d$.<br /></p><p>The "big M" version is as follows: \begin{align*} \min\quad\sum_{i}z_{i}-\epsilon d\\ \mathrm{s.t.}\quad w^{\prime}x_{i}+w_{0}+d &amp; \le\phantom{-}M_{i}z_{i}\quad i\in G_{0}\\ w^{\prime}x_{i}+w_{0}-d &amp; \ge-M_{i}z_{i}\quad i\in G_{1}\\ -1\le w &amp; \le1\\ w_{0} &amp; \quad \mathrm{free}\\ d &amp; \ge\delta\\ z_{i} &amp; \in\left\{ 0,1\right\} \quad\forall i. \end{align*}The model minimizes the number of misclassifications with a secondary criterion of maximizing separation. The coefficient $\epsilon$ is chosen to keep the objective contribution small enough that the solver is not tempted to make unnecessary misclassifications just to boost the value of $d$. Putting bounds on $w$ prevents huge coefficients for the classifier (which again could result in decisions being made based on rounding error). 
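</p><p>One natural way to get data-based values for $M_i$ (a sketch under assumed bounds, not necessarily what I did in the Java code): since $-1 \le w \le 1,$ and if we also impose bounds $|w_0| \le W$ and $d \le D$ (those two bounds are assumptions for this sketch), then the left-hand side $w^\prime x_i + w_0 + d$ can never exceed $\sum_j |x_{ij}| + W + D,$ which is therefore a valid $M_i.$ In Python:</p>

```python
def big_m(x_i, w0_bound, d_bound):
    """A valid M_i for  w'x_i + w0 + d <= M_i * z_i  given |w_j| <= 1,
    |w0| <= w0_bound and d <= d_bound (the latter two bounds are
    assumptions for this sketch, not part of the model as stated)."""
    return sum(abs(v) for v in x_i) + w0_bound + d_bound

# hypothetical two-feature observation
print(big_m([2.0, -3.0], w0_bound=5.0, d_bound=1.0))  # 11.0
```

<p>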
The model has been shown to work correctly.</p><p>The goal version of the model keeps the bounds on the variables and objective function but replaces all the "big-M" constraints with disjunctions of the form $w^\prime x_i +w_0 +d \le 0$ or $z_i=1$ for $i\in G_0$ and similarly for $i\in G_1$. In other words, "classify this observation correctly or pay the price for misclassifying it". I coded both models in Java and ran a test case, expecting both to produce an optimal classifier but unsure which would be faster. There was an unpleasant surprise waiting for me: CPLEX declared the goal-based model unbounded! It was right. You can satisfy all the disjunctions by declaring all the observations misclassified ($z_i = 1$ for all $i$). That lets you choose an arbitrarily large value for $d$, large enough that $\epsilon d$ is arbitrarily bigger than the sum of the $z$ variables, making the objective arbitrarily negative.</p><p>This is not a problem with the "big M" model, because no matter how large you make $M_i$, you still have a finite bound on the left side of each constraint. The fix was to come up with a defensible upper bound for $d$ and add it to the goal model, making the goal model bounded. With that fix in place, both models arrived at optimal solutions, in what was comparable time for the one test case I have run so far.</p><p>So the takeaway here is that if you want to use disjunctions to avoid "big M", you may need to take extra care to ensure that your model is bounded.<br /></p><p><style type="text/css">p, li { white-space: pre-wrap; }</style></p><p> Codato, G. and Fischetti, M. Combinatorial Benders' Cuts for Mixed-Integer Linear Programming. <i>Operations Research</i> 54(4), <b>2006</b>, 756-766.<br /></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-77452379499435215562021-07-05T16:52:00.000-04:002021-07-05T16:52:09.105-04:00Vertex Numbering via GA<p>A <a href="https://or.stackexchange.com/questions/6519/numbering-the-vertices-of-an-n-layer-graph-so-that-edges-have-similar-numbered" target="_blank">question</a> on OR Stack Exchange asks how to number vertices in a layered graph so that the endpoints of edges have similar numbers. Note that "numbering" the vertices here means numbering <i>within layers</i> (so that, for instance, every layer has a vertex numbered 1). We will assume that every node has a unique ID before we launch into the numbering process. The author of the question chose the sum of squared differences of endpoint labels for all edges as the objective function to minimize. The following image shows an example of a layered graph with a (probably suboptimal) numbering scheme. The numbers on the edges are their contributions to the objective function.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-cIc5scuoc2k/YONYpSn2KwI/AAAAAAAADnM/ba5kQECYCQ0jqt6XASifeSIWMgb7TEtYgCLcBGAsYHQ/s536/vertexnumbering.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="example graph with three layers" border="0" data-original-height="472" data-original-width="536" height="564" src="https://1.bp.blogspot.com/-cIc5scuoc2k/YONYpSn2KwI/AAAAAAAADnM/ba5kQECYCQ0jqt6XASifeSIWMgb7TEtYgCLcBGAsYHQ/w640-h564/vertexnumbering.png" width="640" /></a></div><p></p><p>The prolific Rob Pratt noted that the problem can be modeled as an assignment problem with a quadratic objective function, using binary variables. That model produces an exact solution, given sufficient time and memory.</p><p>Note that numbering the nodes within a layer is equivalent to picking one of the possible permutations of the node IDs. 
The author of the question indicated receptiveness to a metaheuristic, so I decided to try coding a random key genetic algorithm (RKGA) for the problem. I've mentioned RKGAs before (for instance, <a href="https://orinanobworld.blogspot.com/2020/09/a-greedy-heuristic-wins.html" target="_blank">here</a> and <a href="https://orinanobworld.blogspot.com/2021/04/a-ga-model-for-joint-clustering-problem.html" target="_blank">here</a>). As I understand it, they were originally designed for sequencing / scheduling problems, where things need to be permuted optimally, so an RKGA seemed like a natural choice. I coded both the integer programming (IP) model and the RKGA in Java, using <a href="https://www.ibm.com/products/ilog-cplex-optimization-studio" target="_blank">CPLEX</a> 20.1 as the IP solver and <a href="https://watchmaker.uncommons.org/" target="_blank">Watchmaker Framework</a> 0.7.1 for the GA. The Watchmaker Framework has not been under active development for quite a few years, but it works well.</p><p>To use an RKGA, you need to come up with a coding for a "chromosome" (candidate solution) and a mechanism for decoding the chromosome into a solution to the original problem (in this case separate vertex permutations for each graph layer) such that the decoded chromosome is always feasible. I chose as my chromosome representation a double-precision vector with elements between 0 and 1, having one element per vertex in the original graph. Double-precision is probably overkill, but I'm in the habit of using double-precision rather than single-precision, so it was the path of least resistance. To decode the chromosome, I first had to chop it up into smaller vectors (one per layer) and then extract the sort index of each smaller vector. So, using the image above as an example, a chromosome would be a double vector with 2 + 5 + 3 = 10 components. 
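</p><p>My decoder is written in Java, but the idea is compact enough to sketch in Python (illustrative only; the gene values below are invented, except that the last layer reuses the example discussed next). Each gene is replaced by its rank within its layer's slice of the chromosome:</p>

```python
def decode(chromosome, layer_sizes):
    """Decode a random-key chromosome into one vertex numbering per layer:
    each gene becomes its 1-based rank within its layer's slice."""
    layers, start = [], 0
    for size in layer_sizes:
        genes = chromosome[start:start + size]
        ranks = [0] * size
        # rank r+1 goes to the position holding the (r+1)-th smallest gene
        for r, i in enumerate(sorted(range(size), key=genes.__getitem__)):
            ranks[i] = r + 1
        layers.append(ranks)
        start += size
    return layers

chrom = [0.70, 0.15,                    # layer 1 (2 vertices)
         0.52, 0.11, 0.93, 0.48, 0.27,  # layer 2 (5 vertices)
         0.623, 0.021, 0.444]           # layer 3 (3 vertices)
print(decode(chrom, [2, 5, 3]))  # [[2, 1], [4, 1, 5, 3, 2], [3, 1, 2]]
```

<p>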
If the last three components were (0.623, 0.021, 0.444) the sort indices would be (3, 1, 2), yielding the vertex numbering for the last layer in the image. To convert a double vector into a sort index vector, I used a Java library (<a href="https://gitlab.msu.edu/orobworld/valueorderedmap" target="_blank">ValueOrderedMap</a>, described in <a href="https://orinanobworld.blogspot.com/2019/10/a-value-ordered-java-map.html" target="_blank">this post</a>) that I wrote some time back.</p><p>A GA can "stagnate", meaning cease to improve on the best known solution. One obvious reason for stagnation is that it cannot improve on an optimal solution, but stagnation can occur for other reasons. On small test problems, the GA tended to stagnate rather quickly, so I set a stagnation limit and put the GA in a loop that would restart it up to nine times or until a specified time limit (five minutes ... I wasn't very ambitious). On larger test problems, it was not as quick to stagnate, but it eventually did.</p><p>I used the GA's best solution as a starting solution for the IP model. On smaller test problems, CPLEX occasionally closed the gap to zero within five minutes but usually did not. On larger problems, CPLEX struggled to get the best bound above zero (100% gap) within five minutes. So I suspect the IP model is prone to loose bounds. An example of a "small" test problem is one with 29 vertices spread across five layers and 35 edges. (I worked mainly with sparse examples, to keep the size of the IP model down.)<br /></p><p>Interestingly, in <i>none</i> of the trials did CPLEX improve on the initial solution provided by the GA. So it might be that the GA was consistently finding an optimum, although I cannot prove that. (On the few smaller instances where CPLEX reached provable optimality, the optimal solution was indeed the GA solution.) Note that, while the GA solution appeared optimal, that does not mean that each GA run produced an optimum. 
On the 29 vertex example mentioned above, CPLEX found a provable optimum with objective value 111. The GA ran 10 times (using about 92 seconds total), and of the 10 runs two stagnated at 111, five at 112, and three at 114. So even if the GA is prone to finding optimal solutions, it may require multiple starts to do so.</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-32588781743794504062021-07-01T15:19:00.002-04:002021-07-01T15:19:28.314-04:00Smallest Pairwise Difference in R<p>&nbsp;In a recent <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2021/06/arranging-points-on-line.html" target="_blank">blog post</a>, OR blogger Erwin Kalvelagen included some R code for a genetic algorithm, including an objective function that takes as input a vector $x\in\Re^n$ and returns the smallest absolute pairwise difference in its elements. The actual objective calls for the smallest difference in <i>consecutive</i> values, but Erwin correctly notes that the smallest absolute difference in any pair will automatically occur in consecutive values, so the "all pairs" approach yields the same objective value (and is considerably friendlier in the MIP and MINLP models he proposes).</p><p>I did a little experimenting and confirmed his opinion that the GA is unlikely to be competitive with the MIP model. That led me to a tangent involving the way the objective function for the GA is coded. Erwin used the obvious approach: two nested for loops to find all pairs of values <i>once</i>, avoiding comparing $x_i$ with $x_j$ and then (redundantly) $x_j$ with $x_i$, and avoiding comparing $x_i$ with itself. This approach does $O(n^2)$ work. An alternative approach is to first sort the vector $x$, which takes $O(n \log n)$ work, and then compute consecutive differences and their minimum value ($O(n)$). 
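</p><p>Both approaches take only a few lines. Since the R code is in Erwin's post and my notebook, here is an illustrative Python rendering of the two methods (integer data chosen so the printed values are exact):</p>

```python
def min_diff_loops(x):
    """Smallest absolute pairwise difference via O(n^2) nested loops."""
    best = float("inf")
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            best = min(best, abs(x[i] - x[j]))
    return best

def min_diff_sorted(x):
    """Same value in O(n log n): after sorting, the minimum pairwise
    difference always occurs between consecutive elements."""
    s = sorted(x)
    return min(b - a for a, b in zip(s, s[1:]))

x = [5, 1, 9, 3]
print(min_diff_loops(x), min_diff_sorted(x))  # 2 2
```

<p>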
I put together a little R code to compare the two, and unsurprisingly the method with sorting is faster (and <i>much</i> faster when $n$ gets big).</p><p>There is another wrinkle to this. I've seen a number of articles and comments online asserting that explicit looping in R (as with Erwin's nested for loops) is inefficient, and should be avoided at all costs in favor of using vectorized functions (where the looping is presumably coded in C or C++ and baked into the compiled function). I've also seen contrary opinions saying that the concern about looping is overstated. There is also, I think, a middle ground: even if explicit loops are inefficient, that's likely only a concern if you are looping over very large objects, or looping many times, or both. Erwin's test case has $n=50$, which I suspect is not large enough to be worth worrying about the efficiency or inefficiency of looping (even though the GA will evaluate the objective function repeatedly).</p><p>So to test this, I cobbled together an R notebook (which you can find <a href="https://rubin.msu.domains/blog/pairwise_differences.nb.html" target="_blank">here</a>) that tests all three versions of the objective function on vectors of various dimensions. As I thought, the sorted method dominates. Even at $n=50$ it's competitive with looping (although looping appears to be slightly faster), but at $n=2,000,000$ the sorting method takes about one third of a second on my PC (using R 4.1), whereas I estimate the nested loop method would take about <i>10 days</i> (!).</p><p>The third method uses the (vectorized) outer product operator in R. It computes all $n^2$ absolute differences, whereas the nested loops calculate the $n(n-1)/2$ unique (nontrivial) differences, so in theory it does about twice the work the nested for loops do. Despite that, it is faster than the nested loops (though not nearly as fast as sorting). 
For $n$ around 5,000 to 10,000, the outer product approach seems to be about 10x faster than the nested loops.</p><p>So I take away two conclusions from this. The first is confirmation of a bit of wisdom that I believe I heard from Princeton computer scientist Robert Sedgewick in a MOOC based on his book (with Kevin Wayne) "<a href="https://algs4.cs.princeton.edu/home/" target="_blank">Algorithms</a>". Sorting is computationally cheap, so if you think it might help with a calculation, try it. The second is confirmation of the assertion that, even in the latest version (4.1) of R, explicit looping is probably slower, and quite possibly by a noticeable amount, than using vectorized methods ... when there are vectorized methods available.</p><p>If you are curious about the code for the alternative methods, it's embedded in the notebook.</p><p>&nbsp;<br /></p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-28746111860134491802021-05-10T19:17:00.003-04:002021-05-11T14:23:52.134-04:00Symmetric Difference of Sets in Java<p>The <i>symmetric difference</i> of two sets $A$ and $B$ is defined as $$C=(A \cup B)\backslash (A \cap B).$$It is the set of objects that belong to either $A$ or $B$ but not both. I've been working on some Java code for a research project in which I will need to compute the sizes of the symmetric differences of a large number of pairs of sets. The contents of the sets are nonnegative integers (Java type <span style="font-family: courier;">Integer</span>), which makes life a bit simpler because integers are nicely ordered. (In Java-speak, <span style="font-family: courier;">Integer</span> implements the <span style="font-family: courier;">Comparable&lt;Integer&gt;</span> interface.) 
Since I will be doing a large number of differences, and since the sets involved will be moderately large (by my standards), I wanted to find the fastest way to compute a difference. So I wrote a Java program to generate random pairs of integer sets and compute the size of their symmetric difference four different ways.</p><p>Since the introduction of streams in Java (I believe in version 8), what I think is the most obvious/intuitive way to do it is to convert each set to a stream, filter out members of the other set, and count up the survivors. On the other hand, I am a happy user of the <a href="https://commons.apache.org/proper/commons-collections/" target="_blank">Apache Commons Collections</a> library, whose <span style="font-family: courier;">CollectionUtils</span> class provides not one but two ways to do this. One is to use the <span style="font-family: courier;">subtract()</span> method twice, switching the order of the arguments. This does the same thing that my first (stream-based) method does. The other is to use the <span style="font-family: courier;">disjunction()</span> method, which calculates the symmetric difference in one invocation.</p><p>Equally obvious to me from a mathematical perspective, but not from a Java perspective, is to take the bitwise exclusive OR of the <a href="https://en.wikipedia.org/wiki/Indicator_vector" target="_blank">characteristic vectors</a> of the two sets. Java contains a <span style="font-family: courier;">BitSet</span> class, with <span style="font-family: courier;">set()</span> and <span style="font-family: courier;">flip()</span> methods for individual bits, which makes this easy to do.</p><p>My program runs the experiments on instances of both <span style="font-family: courier;">HashSet&lt;Integer&gt;</span> and <span style="font-family: courier;">TreeSet&lt;Integer&gt;</span>. Hash sets are generally faster, but tree sets have more features (and impose an internal ordering on their contents). 
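For concreteness, here is a stripped-down sketch of the stream-based count and the characteristic vector (BitSet) count; the class and method names are mine, for illustration, and this is not the exact benchmark code:

```java
import java.util.BitSet;
import java.util.Set;
import java.util.TreeSet;

public class SymDiffSize {

    // Size of the symmetric difference via streams: count the elements of
    // each set that are absent from the other.
    static long sizeByStreams(Set<Integer> a, Set<Integer> b) {
        return a.stream().filter(i -> !b.contains(i)).count()
             + b.stream().filter(i -> !a.contains(i)).count();
    }

    // Size via characteristic vectors: set the bit for each member of A,
    // flip the bit for each member of B. The surviving 1 bits are exactly
    // the symmetric difference (set XOR).
    static int sizeByBitSet(Set<Integer> a, Set<Integer> b) {
        BitSet bits = new BitSet();
        a.forEach(bits::set);
        b.forEach(bits::flip);
        return bits.cardinality();
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<>(Set.of(1, 2, 3, 5));
        Set<Integer> b = new TreeSet<>(Set.of(3, 4, 5));
        System.out.println(sizeByStreams(a, b));  // {1, 2, 4} -> 3
        System.out.println(sizeByBitSet(a, b));   // 3
    }
}
```
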
My somewhat naive intuition was that having the contents accessed in ascending order would make differencing tree sets faster than differencing hash sets.</p><p>There are lots of moving parts here (how large the integers get, how big the sets involved in the comparisons are, ...), so I hesitate to read too much into the results. That said, here are some timing results using sets of cardinality between 100 and 200 with integers from 0 to 100,000. Times are in microseconds and are averages of 10,000 replications. (You can afford big samples when things go this quickly.)</p><p><br /></p><div align="center"><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-baqh{text-align:center;vertical-align:top} </style><table class="tg"><thead> <tr> <th class="tg-baqh">Method</th> <th class="tg-baqh">Streams</th> <th class="tg-baqh">subtract()<br /></th> <th class="tg-baqh">disjunction()</th> <th class="tg-baqh">BitSet</th> </tr></thead><tbody> <tr> <td class="tg-baqh">HashSet</td> <td class="tg-baqh">10.07</td> <td class="tg-baqh">36.40</td> <td class="tg-baqh">62.15</td> <td class="tg-baqh">9.68</td> </tr> <tr> <td class="tg-baqh">TreeSet</td> <td class="tg-baqh">23.79</td> <td class="tg-baqh">35.37</td> <td class="tg-baqh">60.94</td> <td class="tg-baqh">7.21</td> </tr></tbody> </table></div><p>&nbsp;</p><p>I was surprised (OK, shocked) to find that the <span style="font-family: courier;">disjunction()</span> method, which directly computes the symmetric difference, was almost twice as slow as two invocations of the <span style="font-family: courier;">subtract()</span> method. 
Less surprising was that turning both sets into streams and filtering each other out was faster than calling <span style="font-family: courier;">subtract()</span> twice. (Note that I am computing the size of the symmetric difference, not the symmetric difference itself. Uniting the two filtered streams or the two sets resulting from calls to <span style="font-family: courier;">subtract()</span> into one combined set might move their times closer to those of <span style="font-family: courier;">disjunction()</span>).</p><p>Another surprise to me was that the approach using streams was consistently slower with tree sets than it was with hash sets, despite the tree sets being inherently ordered. The characteristic vector (<span style="font-family: courier;">BitSet</span>) approach seemed to have a slight preference for tree sets, but I'm not entirely sure that was consistent.<br /></p><p>In this particular run, the characteristic vector approach (using <span style="font-family: courier;">BitSet</span>) beat all comers. In other runs with similar parameters, streams sometimes beat <span style="font-family: courier;">BitSet</span> and sometimes did not when the sets were instances of <span style="font-family: courier;">HashSet</span>. With <span style="font-family: courier;">TreeSet</span>, using <span style="font-family: courier;">BitSet</span> appeared to be consistently faster. 
Let's contrast that with a second experiment, in which the integers are still between 0 and 100,000 but the sets are smaller (cardinality between 10 and 20).</p><p><br /></p><div align="center"><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-baqh{text-align:center;vertical-align:top} </style><table class="tg"><thead> <tr> <th class="tg-baqh">Method</th> <th class="tg-baqh">Streams</th> <th class="tg-baqh">subtract()<br /></th> <th class="tg-baqh">disjunction()</th> <th class="tg-baqh">BitSet</th> </tr></thead><tbody> <tr> <td class="tg-baqh">HashSet</td> <td class="tg-baqh">3.18</td> <td class="tg-baqh">5.96</td> <td class="tg-baqh">9.75<br /></td> <td class="tg-baqh">6.24<br /></td> </tr> <tr> <td class="tg-baqh">TreeSet</td> <td class="tg-baqh">3.11<br /></td> <td class="tg-baqh">4.89<br /></td> <td class="tg-baqh">8.10<br /></td> <td class="tg-baqh">4.07<br /></td> </tr></tbody> </table></div><p>&nbsp;</p><p>Note that with the smaller sets the method using streams beats the characteristic vector (<span style="font-family: courier;">BitSet</span>) method, and even the Commons Collections <span style="font-family: courier;">subtract()</span> method may be a bit better than using the characteristic vector.</p><p>For my project, I am using instances of <span style="font-family: courier;">TreeSet</span>, and I'm fairly certain the sets will (mostly) have cardinality in the low hundreds, so I will probably go with the <span style="font-family: courier;">BitSet</span> approach. 
If anyone would like to run their own tests, my code is available in a <a href="https://gitlab.msu.edu/orobworld/setsymdiff" target="_blank">GitLab repository</a>.</p><p><b>Update</b>: It belatedly occurred to me that in my research project, where I only need to get the size of the symmetric difference and not the symmetric difference itself, it might make sense to exploit the fact that the size of the symmetric difference between $A$ and $B$ is$$\vert A\vert + \vert B \vert - 2 \cdot \vert A \cap B\vert.$$So I can compute the size of the intersection, do a little arithmetic, and have the size of the symmetric difference.</p><p>I updated my code to include two versions of this approach. One version computes the intersection by streaming one set and filtering it based on inclusion in the other set. The other version uses the <span style="font-family: courier;">CollectionUtils.intersection()</span> method. Once again, the <span style="font-family: courier;">CollectionUtils</span> method was not competitive. Comparing the stream intersection method to the original stream approach and the characteristic vector approach using the original specifications for sets (cardinality 100 to 200), it seems the streamed intersection method is about twice as fast as the original stream method on both hash sets and tree sets (which makes sense, since it does with one stream what the original method did with two). It is also about twice as fast as the characteristic vector method on hash sets, but roughly half as fast on tree sets, as seen in the table below. 
(Times for the first two methods differ from previous tables because exact timings are unfortunately not reproducible.)<br /></p><p><br /></p><div align="center"><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-baqh{text-align:center;vertical-align:top} </style><table class="tg"><thead> <tr> <th class="tg-baqh">Method</th> <th class="tg-baqh">Streams</th> <th class="tg-baqh">BitSet</th> <th class="tg-baqh">Intersection<br /></th> </tr></thead><tbody> <tr> <td class="tg-baqh">HashSet</td> <td class="tg-baqh">11.33</td> <td class="tg-baqh">10.33</td> <td class="tg-baqh">5.59<br /></td> </tr> <tr> <td class="tg-baqh">TreeSet</td> <td class="tg-baqh">23.32</td> <td class="tg-baqh">7.47<br /></td> <td class="tg-baqh">12.23<br /></td> </tr></tbody> </table></div><p>&nbsp;</p><p>So the single stream intersection approach looks to be the fastest for computing the size (but, again, not contents) of the symmetric difference if I'm using hash sets, while the characteristic vector (<span style="font-family: courier;">BitSet</span>) approach is fastest if I'm using tree sets.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-53304958082060630952021-04-28T18:09:00.001-04:002021-04-28T18:09:17.613-04:00A BDD with JGraphT<p>Decision diagrams, and in particular binary decision diagrams (BDDs), were originally introduced in computer science to evaluate logic propositions or boolean functions. Lately, they've taken on multiple roles in discrete optimization. 
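Evaluating a Boolean function with a BDD amounts to walking from the root to a terminal, branching on one variable per layer. Here is a toy hand-rolled sketch (my own, not taken from any book or library) for $f(x_1, x_2, x_3) = x_1 \wedge (x_2 \vee x_3)$:

```java
public class TinyBdd {
    // Negative ids mark the two terminal nodes.
    static final int FALSE = -1, TRUE = -2;

    // A decision node: which variable it tests, and the node (or terminal)
    // reached on a false (lo) or true (hi) decision.
    record Node(int varIndex, int lo, int hi) {}

    // Diagram for f(x1, x2, x3) = x1 && (x2 || x3): one layer per variable.
    static final Node[] NODES = {
        new Node(0, FALSE, 1),    // root: test x1; false is immediately fatal
        new Node(1, 2, TRUE),     // test x2; true settles the disjunction
        new Node(2, FALSE, TRUE)  // test x3
    };

    static boolean eval(boolean[] x) {
        int cur = 0;  // start at the root
        while (cur >= 0) {
            Node n = NODES[cur];
            cur = x[n.varIndex()] ? n.hi() : n.lo();  // follow the 1- or 0-arc
        }
        return cur == TRUE;
    }

    public static void main(String[] args) {
        System.out.println(eval(new boolean[]{true, false, true}));  // true
        System.out.println(eval(new boolean[]{false, true, true}));  // false
    }
}
```
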
I've been reading an excellent book on them, with ideas about using BDDs in a current research project. As is my wont, I'll be coding the research in Java, so I wanted to do a little demo project to figure out how to build and process a BDD in Java.</p><p>Not wanting to reinvent any wheels, I looked for an open-source Java graph library with which to build the diagrams, and settled on JGraphT. Not only does JGraphT have the necessary building blocks, it has much better online documentation than many libraries. Also, there is a very helpful article about it on the Baeldung web site (which is itself an extremely useful site for all things Java).</p><p>A BDD is a directed, acyclic, layered multigraph with edge weights. If you're not familiar with the term "multigraph", it means that there can be two or more distinct edges between the same pair of nodes, in the same direction. In a BDD, each node represents a state of the system, with up to two outbound arcs, corresponding to true (1) or false (0) values for a particular decision variable. The decision variable is the same for all nodes in a particular layer. An arc is omitted if it represents a decision which, given the state, would make the solution infeasible. To keep the size of the BDD in check (somewhat), you do not want multiple nodes in a layer with the same state. The multigraph aspect arises because, in some circumstances, the next state may be the same regardless of the decision at the current node (so both arcs go to the same child node). Among the attractions of the JGraphT library are its support for nodes based on arbitrary classes (which in a BDD means the state at the node) and for multigraphs.</p><p>To learn how to build BDDs with JGraphT, I decided to solve a maximal independent set problem (MISP) with integer node weights. This means choosing the subset of nodes with greatest total weight such that no two chosen nodes are adjacent. 
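Before bringing in JGraphT, the layered construction with state merging can be sketched with plain maps. In the toy below (the names and the tiny path-graph instance are mine, purely illustrative), layer $k$ decides vertex $k$, a node's state is the bitmask of vertices still eligible for the independent set, duplicate states within a layer are merged by the map key, and the map value tracks the best weight reaching each state:

```java
import java.util.HashMap;
import java.util.Map;

public class MispBddSketch {

    // Weighted MISP via a layered, BDD-style dynamic program.
    static int maxWeight(int[] weight, int[] neighbors) {
        int n = weight.length;
        Map<Integer, Integer> layer = new HashMap<>();
        layer.put((1 << n) - 1, 0);  // root: every vertex still eligible
        for (int k = 0; k < n; k++) {
            Map<Integer, Integer> next = new HashMap<>();
            for (Map.Entry<Integer, Integer> e : layer.entrySet()) {
                int state = e.getKey(), val = e.getValue();
                // 0-arc: skip vertex k.
                next.merge(state & ~(1 << k), val, Math::max);
                // 1-arc: take vertex k, if still eligible; k's neighbors
                // then become ineligible.
                if ((state & (1 << k)) != 0) {
                    next.merge(state & ~(1 << k) & ~neighbors[k],
                               val + weight[k], Math::max);
                }
            }
            layer = next;  // merged layer: one node per distinct state
        }
        return layer.values().stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        // Path graph 0-1-2 (adjacency as bitmasks) with weights 4, 5, 3.
        int[] w = {4, 5, 3};
        int[] nbr = {0b010, 0b101, 0b010};
        System.out.println(maxWeight(w, nbr));  // take vertices 0 and 2: 7
    }
}
```

JGraphT replaces the hand-rolled maps with real graph objects (so the diagram can be exported and inspected), but the state-merging logic is the same idea.
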
JGraphT contains routines to generate some well-known (to graph theorists -- less well known to me) graphs, and for convenience I chose the Chvátal graph, which has 12 nodes and 24 edges. Here is an illustration of the Chvátal graph, with the (randomly generated) node weights in parentheses.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-w0HsZHkKJK0/YInVM3EUt1I/AAAAAAAADgQ/ZTQ_gl2UzsYGeSIFUNiPedwRx0RVO795gCLcBGAsYHQ/s635/chvatal.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="Chvatal graph" border="0" data-original-height="635" data-original-width="576" height="640" src="https://1.bp.blogspot.com/-w0HsZHkKJK0/YInVM3EUt1I/AAAAAAAADgQ/ZTQ_gl2UzsYGeSIFUNiPedwRx0RVO795gCLcBGAsYHQ/w580-h640/chvatal.png" width="580" /></a></div>My Java program uses routines in the JGraphT library to turn the graph into a DOT file, which it saves. I then use GraphViz outside the Java program to convert the DOT file into the format I need for wherever the plot is going.<p></p><p>Using the same DOT export trick, I managed to generate a plot of the BDD, in which nodes display the set of vertices still available for addition to the independent set, arcs are solid if a vertex is being added to the independent set and dotted if not, and solid arcs are annotated with the number of the vertex being added.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://rubin.msu.domains/blog/bdd.svg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img alt="BDD graph" border="0" data-original-height="1451" data-original-width="1939" height="299" src="https://1.bp.blogspot.com/-JuBxEfr93WE/YInXGojPIPI/AAAAAAAADgY/IZdhFygLKFoD6WxUERub4T202ULPQif_wCLcBGAsYHQ/w400-h299/bdd.png" width="400" /></a></div>Unfortunately, Blogger does not accept SVG images and the BDD is a bit too big for a legible PNG graph. 
If you want to see a better image, click it and an SVG version should open in a new window or tab.<br /><p></p><p>This post is already a bit long, so I won't go into details about the various coding issues I ran into or how I worked around them. I will point out one minor mathematical issue. Since the MISP is a maximization problem, the goal is to find the longest (in terms of weight, not number of edges) path from root node to terminal node in the BDD. JGraphT has a package containing shortest path algorithms, but no longest path algorithms. Fortunately, the number of layers in the graph is fixed (one layer per decision variable, plus one to hold the terminal node), which means the number $L$ of links in a longest path is fixed. So we simply find the maximum weight $W$ of any node in the graph, change the weight $w_e$ of each edge $e$ to $LW - w_e$, and find the shortest path using the modified weights. That path is guaranteed to be the longest path with respect to the original weights.</p><p>Last thing: As usual, my code is available for you to play with from my <a href="https://gitlab.msu.edu/orobworld/maximalindependentset" target="_blank">GitLab repository</a>.<br /></p><h3 style="text-align: left;">References</h3><div style="text-align: left;"> Wikipedia entry: <a href="https://en.wikipedia.org/wiki/Binary_decision_diagram" target="_blank">Binary decision diagram</a></div><div style="text-align: left;"> <a href="https://www.andrew.cmu.edu/user/vanhoeve/mdd/" target="_blank">Decision Diagrams for Optimization</a> (web page by <a href="http://www.andrew.cmu.edu/user/vanhoeve/" target="_top">Willem-Jan van Hoeve</a>)</div><div style="text-align: left;"> <span face="sans-serif">Bergman, D.; Cire, A. A.; van Hoeve, W.-J. &amp; Hooker, J. <a href="https://www.springer.com/us/book/9783319428475" target="_blank">Decision Diagrams for Optimization</a>. 
<i>Springer International Publishing AG, </i><b>2016</b></span>.</div><div style="text-align: left;"><span face="sans-serif"> <a href="https://jgrapht.org/" target="_blank">JGraphT</a> library</span></div><div style="text-align: left;"><span face="sans-serif"> <a href="https://www.baeldung.com/jgrapht" target="_blank">Introduction to JGraphT</a> (Baeldung)</span></div><div style="text-align: left;"><span face="sans-serif"> Wikipedia entry: <a href="https://en.wikipedia.org/wiki/Maximal_independent_set" target="_blank">Maximal independent set</a></span></div><div style="text-align: left;"> Wikipedia entry: <a href="https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph" target="_blank">Chvátal graph</a></div><div style="text-align: left;"> Wikipedia entry: <a href="https://en.wikipedia.org/wiki/DOT_(graph_description_language)" target="_blank">DOT (graph description language)</a></div> <a href="https://graphviz.org/" target="_blank">Graphviz - Graph Visualization Software</a><div style="text-align: left;"><br /></div>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0