OR in an OB World
A mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere.
Paul A. Rubin

Annoying Message with R GA Package (2023-03-20)

<p>This is a small technical note (mainly a reminder to myself) about using the <span style="font-family: courier;">GA</span> package for genetic algorithms in R, specifically within an R Markdown document (such as an R notebook). The <span style="font-family: courier;">GA::ga()</span> function includes an optional argument to turn on parallel evaluation of the fitness function. The argument value can be true or false (the default), which are self-explanatory, or a number of cores, or a character string ("snow" or "multicore") specifying how parallelization is to be done. The default is "multicore" except on Windows, where "snow" is apparently the only option. When I choose to turn on parallel objective evaluation, I take the easy route and just set it to <span style="font-family: courier;">TRUE</span>.</p><p>When run in a terminal, I just get a sequence of log entries, one per generation, and then it terminates. That's also what I see when the command is run in an R Markdown document ... until the document is rendered to HTML. 
In the HTML output, interspersed with each line of log output is a block of four identical copies of the following.<br /></p><pre style="-webkit-text-stroke-width: 0px; background-color: white; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; display: block; font-family: monospace; font-size: 13px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.42857; margin: 0px 0px 10px; orphans: 2; overflow-wrap: break-word; overflow: auto; padding: 9.5px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; widows: 2; word-break: break-all; word-spacing: 0px;"><code class="hljs" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: monospace; font-size: inherit; padding: 0px; white-space: pre-wrap;">loaded GA and set parent environment</code></pre><p>The fact that there are four copies is presumably because my PC has four cores. Turning off the monitor option gets rid of the progress printouts but does not eliminate the repetitive messages. Switching the parallel mode to "snow" does get rid of them, but I'm not sure what the broader implications of that are.</p><p>After significant spelunking, I traced the messages to the <span style="font-family: courier;">doParallel</span> R package, which is apparently being used to implement parallel processing under the "multicore" option. In any event, the way to get rid of them is to add the option "message = FALSE" to the R chunk in which <span style="font-family: courier;">ga()</span> is being executed. Barring other chunk options, this would change <span style="font-family: courier;">{r}</span> to <span style="font-family: courier;">{r, message = FALSE}</span> for that one chunk. (You can also set it as a global option in the document.) 
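For concreteness, the offending chunk in my document looked roughly like the following (the fitness function here is a stand-in, and the ga() arguments are illustrative rather than copied from my actual notebook):

```{r, message = FALSE}
library(GA)
result <- ga(
  type = "real-valued",
  fitness = function(x) -sum(x^2),  # stand-in objective, not my real one
  lower = rep(-5, 3),
  upper = rep(5, 3),
  parallel = TRUE                   # the setting that triggers the messages
)
```

With "message = FALSE" in place, the per-generation monitor output still appears but the "loaded GA and set parent environment" lines no longer reach the rendered HTML.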
Happily, the monitor output (stats for each generation) still prints out. It just gets rid of those <expletive deleted> parallel processing messages.<br /></p>

Paul A. Rubin

Another GA for Clustering (2023-03-07)

<p>Someone asked on OR Stack Exchange about <a href="https://or.stackexchange.com/questions/10059/how-to-perform-clustering-of-a-large-number-of-nodes/" target="_blank">clustering nodes</a>, a subject that comes up from time to time. The premise is that you have $N$ points (nodes), and for each pair $i \neq j$ there is (or may be) an edge with weight $w_{i,j} \in [0, 1]$ (where $w_{i,j} = 0$ if there is no edge between $i$ and $j$). The objective is to create exactly $G$ groups (clusters), with no group having cardinality greater than $M,$ such that the sum of within-group edge weights is maximized. It is simple to write an integer programming model for the problem, and the question includes one, but the author was looking for something that did not require an IP solver. The question specifies, as an example, dimensions of $N=500$ points, $G=8$ groups, and a maximum group size of $M=70.$</p><p>Since I have suggested genetic algorithms for previous clustering problems (see <a href="https://orinanobworld.blogspot.com/2021/04/a-ga-model-for-joint-clustering-problem.html" target="_blank">here</a> and <a href="https://orinanobworld.blogspot.com/2021/12/revisiting-joint-clustering-problem.html" target="_blank">here</a>), it probably will not shock you that I once again looked for a GA approach, using a "random key" algorithm (in which the chromosome gets decoded into a feasible solution). In this case, the chromosome consists of $N+G$ values between 0 and 1. The first $G$ values are used to select the sizes $n_1,\dots, n_G$ of the groups. 
The last $N$ values are converted into a permutation of the point indices $1,\dots, N.$ The first $n_1$ entries in the permutation are the indices of the points in group 1, the next $n_2$ entries give the points in group 2, and so on.</p><p>The tricky part here is converting the first portion of the chromosome into the group sizes. We know that the maximum group size is $M,$ and it is easy to deduce that the minimum group size is $m = N - (G-1)M.$ So the minimum and maximum fractions of the population to assign to any group are $a=m/N$ and $b=M/N,$ where $$m = N - (G-1)M \implies a = 1 - (G-1)b.$$ Now suppose that I have values $\pi_1, \dots,\pi_G \in (a,b)$ and assign group sizes $n_i = \pi_i \cdot N,$ which clearly meet the minimum and maximum size limits. (I'm cheating here, but I'll fix it in a minute.) We want $\sum_{i=1}^G n_i = N,$ which means we need $\sum_i \pi_i = 1.$</p><p>To get the $\pi_i,$ we take a rather roundabout route. Assume that the first $G$ entries in the chromosome are $x_1,\dots,x_G \in (0,1).$ Set $$y_i = 1 - \frac{x_i}{\sum_{j=1}^G x_j} \in (0,1)$$ and observe that $\sum_{i=1}^G y_i = G-1.$ Finally, set $\pi_i = a + (b-a) y_i \in (a, b).$ We have $$\sum_i \pi_i = G\cdot a + (b-a) \sum_i y_i\\ = G\cdot a + (b-a)(G-1) = (G-1)b + a = 1.$$</p><p>Now to confess to my cheating. I said the $i$-th group size would be $n_i=\pi_i \cdot N,$ ignoring the minor problem that the left side is an integer and the right side almost surely is not. So of course we will round $\pi_i \cdot N$ to get $n_i.$ After rounding, though, we can no longer be sure that $\sum_i n_i = N.$ So we iteratively fix it. If $\sum_i n_i < N,$ we add 1 to the smallest $n_i$ and recheck. If $\sum_i n_i > N,$ we subtract 1 from the largest $n_i$ and recheck. Eventually, we end up with a feasible set of group sizes.</p><p>All this is encoded in an <a href="https://rubin.msu.domains/blog/ga_clustering.nb.html" target="_blank">R notebook</a> I wrote, including a sample problem. 
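The decoding step can be sketched in a few lines of Python (my own illustration; the implementation I actually wrote is in the linked R notebook, in R, and points here are indexed from 0):

```python
import random

def decode(chromosome, N, G, M):
    """Decode N + G random keys in (0,1) into G groups of point indices."""
    x, keys = chromosome[:G], chromosome[G:]
    b = M / N
    a = 1 - (G - 1) * b                  # minimum group fraction m/N
    s = sum(x)
    y = [1 - xi / s for xi in x]         # sums to G - 1
    pi = [a + (b - a) * yi for yi in y]  # sums to 1, each in (a, b)
    n = [round(p * N) for p in pi]
    while sum(n) < N:                    # repair rounding drift
        n[n.index(min(n))] += 1
    while sum(n) > N:
        n[n.index(max(n))] -= 1
    perm = sorted(range(N), key=lambda i: keys[i])  # keys -> permutation
    groups, start = [], 0
    for size in n:
        groups.append(perm[start:start + size])
        start += size
    return groups
```

The repair loop is safe: before repair each size is within the bounds, the smallest size is below the average (so adding 1 cannot exceed $M$), and the largest is above it (so subtracting 1 cannot fall below $m$).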
Whether the GA gets a "good" (near optimal) partition or not I cannot say, since I did not write the equivalent MIP model. I can say, though, that it gets a sequence of progressively improving partitions.<br /></p>

Paul A. Rubin

A Random Tree Generator (2023-01-30)

<p>For a problem I was noodling with in R, I needed to generate random trees (meaning connected, acyclic, layered, undirected graphs, not classification trees or anything statistical). There are a bunch of libraries for R that have various capabilities for constructing and/or massaging graphs and networks (see the <a href="https://cran.r-project.org/web/views/GraphicalModels.html" target="_blank">CRAN Task View</a> for graphical models). After spending some time poking through the docs for some of the libraries listed in the task view, and unsuccessfully trying to install one, I decided it would be faster just to roll my own code.</p><p>Along the way I succumbed to the software developer's maxim: if it works, it needs more features. Still, I wound up with a function that is not horribly complicated and seems to work. Given how many nodes and how many layers you want in the tree, it outputs a matrix of edges forming the desired tree. There are optional arguments for how you want the nodes labeled (the default is 1, 2, ... but you can supply a vector of labels) and in what order the labels should be assigned (top to bottom/left to right raster scan of the graph or randomly), as well as how you want the output matrix organized (randomly, by label order, or by layer order).</p><p>The code is in an R notebook that demonstrates the use of the function. 
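The notebook's function is in R and has more options than this, but the core idea (spread the nodes over layers, then attach each non-root node to a random parent in the previous layer) can be sketched in Python:

```python
import random

def random_layered_tree(n_nodes, n_layers, seed=None):
    """Return the edge list of a random tree with the given number of layers."""
    assert 2 <= n_layers <= n_nodes
    rng = random.Random(seed)
    # the root is the sole occupant of layer 0; every other layer is
    # seeded with one node so that no layer ends up empty
    layers = [[0]] + [[k] for k in range(1, n_layers)]
    for v in range(n_layers, n_nodes):
        layers[rng.randrange(1, n_layers)].append(v)
    # attach each non-root node to a random node in the layer above,
    # which yields exactly n_nodes - 1 edges and a connected graph
    return [(rng.choice(layers[k - 1]), v)
            for k in range(1, n_layers)
            for v in layers[k]]
```

Because every node other than the root gets exactly one edge to an earlier layer, the result is automatically connected and acyclic.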
It imports the <span style="font-family: courier;">DiagrammeR</span> library in order to plot the resulting trees, but the function does not require <span style="font-family: courier;">DiagrammeR</span> and in fact has no dependencies. You can view the notebook <a href="https://rubin.msu.domains/blog/treeGenerator.nb.html" target="_blank">here</a>, and if you want you can download the code from it (see the select box in the upper right of the notebook). The code is licensed under the <a href="https://creativecommons.org/licenses/by/3.0/deed.en_US">Creative Commons Attribution 3.0 Unported license</a>.</p>

Paul A. Rubin

Weird Caja Bug (2023-01-27)

<p>In the past day or so I've tripped over a rather annoying bug in Caja, the file manager for the MATE desktop (which I use with Linux Mint, and which is also used by many Ubuntu users). Theoretically, the F2 key is a shortcut for Edit > Rename..., which lets you rename selected files. Usually, it works fine, but today (and at least once in the recent past) the F2 key had no effect.</p><p>At first I thought there was a problem with my keyboard, but no, the F2 key was being recognized. A Google search led me to a <a href="https://ubuntu-mate.community/t/potential-important-bug-in-caja/16686/12" target="_blank">bug report</a> from 2018 (!) where a diligent soul with screen name "terzag" discovered that the problem occurred when the Caja window displayed a vertical scrollbar and disappeared when the window was resized large enough to lose the scrollbar. Sure enough, that fix worked for me today ... and then things got weirder.</p><p>I did a little experimenting, and the bug reappeared when the scrollbar reappeared and disappeared whenever the scrollbar disappeared. 
Then I tried F2 with multiple file icons selected (which opens a dialog to bulk rename them), and it worked, despite there being a visible scrollbar. Since then, no matter how many times I close and reopen Caja, the bug does not manifest. So it is temporarily fixed, I guess, with emphasis on "temporarily".</p><p>I also encountered a possibly related bug, again reported years ago. If I hit either F2 or Edit > Rename... without first selecting one or more files, it closes (crashes?) all open Caja windows. I'm pretty sure that's a bug and not a "feature".</p><p>Hopefully this all gets fixed relatively soon ... although looking at bug reports from almost five years ago does not make me feel sanguine about the prospects.<br /></p>

Paul A. Rubin

Finding All Simple Cycles (III) (2023-01-27)

<p>This is the third and (hopefully) final post about finding all simple cycles in an undirected graph. In the <a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-i.html" target="_blank">first post</a>, I discussed the problem briefly and described a simple, efficient iterative algorithm. In the <a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-ii.html" target="_blank">second post</a>, I posed (but definitely did not recommend) a mixed integer linear program (MILP) to find all simple cycles. Here I will discuss an alternative MILP approach, specific to CPLEX. To make sense of it, you will definitely need to have read the second post. 
Java code demonstrating all three methods is available from my <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">repository</a>.</p><p>Recent versions of CPLEX have a feature called the solution pool, about which I have <a href="https://orinanobworld.blogspot.com/2013/01/finding-all-mip-optima-cplex-solution.html" target="_blank">previously written</a>. One use is to accumulate solutions (including suboptimal ones if desired) encountered along the way to the optimal solution of a MILP model. Another, using the <span style="font-family: courier;">IloCplex.populate()</span> method in the Java API, is to just generate a gaggle of feasible solutions. So the approach I propose here is to build a MILP model for finding a single simple cycle, set the solution pool population limit parameter (the maximum number of solutions to retain) to something really large and the solution pool intensity parameter (how hard CPLEX works to churn out solutions) to its maximum value, and then use the <span style="font-family: courier;">populate()</span> method to find simple cycles. This is implemented in the <span style="font-family: courier;">SingleCycleMIP.java</span> class in my code. </p><p>Actually, I'm not sure setting the intensity parameter is necessary. On the small test graph, results were similar no matter what intensity setting I used. CPLEX stopped with 160 solutions, many of which were duplicates. After removing duplicates, there were 13 distinct simple cycles, which matches what the other methods found. The only difference was run time, which was a fraction of a second using either the default intensity or the maximum intensity (4) and about two seconds using the minimum intensity (1). Either way, it was much faster than the full MILP model, at least on this small test graph.</p><p>The MILP model used here is a subset of the model in the previous post. The model has no objective value (or equivalently minimizes the constant zero function). 
We drop the $u$ variables (which were used to ensure that cycles were distinct) and the $x$ variables (which signaled whether a slot in the solution was filled with a cycle or not). For all other variables, we drop the $c$ subscript (which slot is being filled), since we are only building one cycle. The "one anchor per cycle" constraint becomes $\sum_i w_i = 1,$ and other constraints involving $u$ and/or $x$ are dropped.</p><p>One issue with this model is that the same cycle can be discovered multiple times, either because its orientation is reversed (so the "clockwise" and "counterclockwise" versions are considered distinct) or possibly because minor differences (rounding errors) in the flow variables make two copies of the same solution look distinct. Regardless of the reason, the fact that CPLEX terminated with 160 solutions when only 13 cycles exist tells you duplication is happening. Fortunately, we do not have to monkey with the model (or set overly finicky tolerance values); we can just filter the pool after the run and weed out the duplicates (which my code does). The alternative would be to use a callback (either an incumbent callback if you are using legacy callbacks, or the more current generic callback in "candidate" context) to weed out duplicates as they arise. Since the logic to be used in the callback is the same as the logic used in post-processing, I cannot see any virtue to using a callback, and it definitely would complicate the code.</p><p>With that I am hopefully done with this topic. :-)</p>Paul A. 
Rubin

Finding All Simple Cycles (II) (2023-01-26)

<p><a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-i.html" target="_blank">Yesterday's post</a> described my introduction (by way of a <a href="https://or.stackexchange.com/questions/9786/number-of-hamiltonian-sub-cycles-on-the-graph" target="_blank">question</a> on OR Stack Exchange) to the problem of how to find all simple cycles in a graph. In that post I described a simple, efficient iterative algorithm for finding them.</p><p>The OR SE question specifically asked about a mixed integer linear program (MILP) to find all simple cycles. While I would never take that approach myself, I got curious as to whether it was possible to construct such a model. Down the rabbit hole I went. The answer turns out to be yes. In this post, I'll describe the model (again, without advocating that it be used).</p><p>Assume that we have an undirected graph with vertices numbered $1,\dots,N$ and edge set $E.$ In what follows, $i$ and $j$ will always be vertex numbers and $c$ and $c'$ will index cycles. Although the graph is undirected, when we construct cycles we will give them an orientation (e.g., $3 \rightarrow 2 \rightarrow 5 \rightarrow 3$), which will be useful in ensuring that the result is an actual cycle (you can get from the starting node back to itself) and not the composition of multiple smaller cycles (e.g., not treating $3 \rightarrow 2 \rightarrow 4 \rightarrow 3$ and $5 \rightarrow 6 \rightarrow 9 \rightarrow 8 \rightarrow 5$ as if they form a single cycle).<br /></p><p>The model depends on guessing a non-binding upper bound $C$ on the number of simple cycles in the graph. (If the solution to the model does contain $C$ cycles, your guess may have been too small. 
To be certain that all cycles were found, you would need to bump up $C$ and solve again, until the optimal solution found fewer than $C$ cycles.)</p><p>The model contains the following gazillion binary variables:</p><ul style="text-align: left;"><li>$x_{c}$ is an indicator for whether cycle $c\in\left\{ 1,\dots,C\right\} $<br />is constructed;</li><li>$y_{i,c}$ is an indicator for whether vertex $i$ belongs to cycle $c$;</li><li>$z_{i,j,c}$ is an indicator for whether edge $[i,j]\in E$ is part of<br />cycle $c$ with the tour of $c$ proceeding from $i$ to $j$;</li><li>$w_{i,c}$ is an indicator for whether vertex $i$ is the anchor (starting<br />and ending point) for cycle $c$; and<br /></li><li>$u_{i,j,c,c'}$ is an indicator that edge $[i,j]\in E$ (in either<br />orientation) belongs to exactly one of cycles $c$ and $c'\neq c.$</li></ul><p>We also have one set of variables that can be defined as either continuous or integer (continuous probably makes the model solve faster):</p><ul style="text-align: left;"><li>$f_{i,j,c}\ge 0$ is the ``flow'' on edge $[i,j]$ in the orientation $i\rightarrow j$ in cycle $c.$</li></ul><p>The flow variables are an adaptation of the variables used in the <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem#Miller%E2%80%93Tucker%E2%80%93Zemlin_formulation" target="_blank">Miller-Tucker-Zemlin formulation</a> for the traveling salesman problem (TSP). We will use them to ensure that cycles found are in fact cycles.</p><p>The objective, at least, is straightforward. Since we want to find all possible simple cycles, we maximize the number of cycles constructed: $\max\sum_{c=1}^C x_{c}.$ Now come the constraints. 
(Brace yourself for some serious vertical scrolling.)<br /></p><ul style="text-align: left;"><li>Edges can only belong to cycles containing both endpoints, and only<br />in one orientation: \begin{align*}<br />z_{i,j,c}+z_{j,i,c} & \le y_{i,c}\quad\forall[i,j]\in E,\forall c\\<br />z_{i,j,c}+z_{j,i,c} & \le y_{j,c}\quad\forall[i,j]\in E,\forall c<br />\end{align*}</li><li>Flow only occurs on edges that are used, and only in the orientation<br />used: \begin{align*}<br />f_{i,j,c} & \le Nz_{i,j,c}\quad\forall[i,j]\in E,\forall c\\<br />f_{j,i,c} & \le Nz_{j,i,c}\quad\forall[i,j]\in E,\forall c<br />\end{align*}</li><li>Constructed cycles have one anchor and unused cycles have none: \[<br />\sum_{i=1}^N w_{i,c}=x_{c}\quad\forall c<br />\]</li><li>A cycle's anchor must belong to the cycle: \[<br />w_{i,c}\le y_{i,c}\quad\forall i,\forall c<br />\]</li><li>In every cycle, at every node, there are either two incident edges<br />(if the node belongs to the cycle) or none (if the node is not part<br />of the cycle): \[<br />\sum_{j:[i,j]\in E}(z_{i,j,c}+z_{j,i,c})=2y_{i,c}\quad\forall i,\forall c<br />\]</li><li>In any used cycle, at any node in the cycle other than the anchor, flow out is at least one unit less than flow in (mimicking the MTZ formulation of the TSP): \[<br />\sum_{j:[i,j]\in E}f_{i,j,c}\le\sum_{j:[i,j]\in E}f_{j,i,c}-y_{i,c}+Nw_{i,c}\quad\forall i,\forall c<br />\]</li><li>To ensure a complete cycle, one unit of flow must return to the anchor node: \[<br />\sum_{j:[i,j]\in E}f_{j,i,c}\ge w_{i,c}\quad\forall i,\forall c<br />\]</li><li>The $u$ variables are defined in terms of the $z$ variables: \[<br />u_{i,j,c,c'}\le z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\quad\forall[i,j]\in E,\forall c<c'<br />\] \[<br />u_{i,j,c,c'}\le2-\left(z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\right)\quad\forall[i,j]\in E,\forall c<c'<br />\]</li><li>Any pair of cycles constructed must differ in at least one edge (which<br />also prevents a cycle from repeating with the 
opposite orientation): \[<br />\sum_{[i,j]\in E}u_{i,j,c,c'}\ge x_{c}+x_{c'}-1\quad\forall c<c'<br />\]</li></ul><p>It is also possible to add constraints to mitigate some of the symmetry in the model. Two come to mind:</p><ul style="text-align: left;"><li>Unused cycles must have higher indices than used cycles: \[<br />x_{c}\le x_{c-1}\quad\forall c\in\left\{ 2,\dots,C\right\} <br />\]</li><li>The anchor in a cycle must have the smallest index of any node<br />in the cycle: \[<br />w_{i,c}+y_{j,c}\le1\quad\forall i>j,\forall c<br />\]</li></ul><p>The first one is of low enough dimension that I automatically included it in my Java code. I made the second one optional in the code.</p><p>When I ran it against the 9 node, 12 edge graph in the OR SE question, CPLEX had a solution with 13 cycles (the full number) after about 25 seconds and 5,000 nodes -- way, way more time than the 13 <i>milliseconds</i> the iterative method needed. I set $C=20$ as the upper bound on the number of cycles, which of course is also the initial upper bound on the objective. After 60 seconds (the time limit I set), CPLEX's upper bound was still 20. So probable optimality will not be easy to come by.</p><p>I've got one more MILP model up my sleeve, coming in the next post. Meanwhile, as a reminder, my Java code is available <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">here</a>.<br /></p>

Paul A. Rubin

Finding All Simple Cycles (I) (2023-01-25)

<p>This is the first of a sequence of posts related to the question of finding all simple cycles in an undirected graph. Motivation for the posts comes from a <a href="https://or.stackexchange.com/questions/9786/number-of-hamiltonian-sub-cycles-on-the-graph" target="_blank">question</a> on OR Stack Exchange. 
A simple cycle is a cycle with no repeated nodes (other than the starting/ending node).</p><p>Assume we have an undirected graph with nodes indexed from 1 to $N$ and with edge set $E.$ I will use the term "anchor" to refer to the node at which a tour of a simple cycle starts (and ends), and require (without loss of generality) that the anchor of every cycle be the lowest index node in the cycle. Since a cycle is the same whether you traverse it "clockwise" or "counterclockwise", I will also require without loss of generality that nodes in a cycle be listed so that the node immediately after the anchor has smaller index than the node immediately before the anchor. This is just to reduce production of duplicate cycles.</p><p>I came up with three approaches to finding all simple cycles. The first, which I will describe below, is a simple iterative scheme that is the fastest. The other two, described in subsequent posts, use mixed integer linear programs (MILPs) -- not because they are efficient but because the OR SE question specifically asked about MILP approaches.</p><p>The iterative scheme is pretty simple. It uses a queue of paths (incomplete cycles) to store work in progress. I will use "terminus" to indicate the last node in a path. When the terminus equals the anchor, we have a cycle. Due to my definition of "anchor", we can say that no node on a path can have a lower index than the anchor of the path.</p><p>The iterative scheme loops over possible anchors from 1 to $N.$ For each anchor, we initialize the queue with all paths consisting of a single edge incident at the anchor and not incident at any node less than the anchor. While the queue is not empty, we pop a path and create one or more new paths by extending that path with each qualifying edge. 
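In Python (a sketch of the scheme just described; my posted implementation is in Java), with nodes indexed from 0 rather than 1, the loop might look like this:

```python
from collections import deque

def all_simple_cycles(n, edges):
    """Find all simple cycles in an undirected graph on nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    cycles = []
    for anchor in range(n):
        # seed the queue with single-edge paths leaving the anchor
        queue = deque([anchor, v] for v in adj[anchor] if v > anchor)
        while queue:
            path = queue.popleft()
            for w in adj[path[-1]]:
                if w == anchor:
                    # a cycle, provided it has at least three nodes and the
                    # node after the anchor is smaller than the node before
                    # it (so each cycle appears in only one orientation)
                    if len(path) > 2 and path[1] < path[-1]:
                        cycles.append(path + [anchor])
                elif w > anchor and w not in path:
                    queue.append(path + [w])
    return cycles
```

The anchor and orientation rules guarantee each cycle is emitted exactly once, so no deduplication pass is needed.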
A qualifying edge is one that meets the following conditions:</p><ul style="text-align: left;"><li>it is incident at the terminus of the current path;</li><li>it is not incident at any node already on the path (other than anchor);</li><li>it is not incident at any node less than the anchor; and</li><li>if it is incident at the anchor, it is not the same as the first edge of the path (i.e., we do not allow a "cycle" of the form $a\rightarrow b \rightarrow a$).<br /></li></ul><p>If the new edge leads to the anchor, we have a cycle. If not, we update the terminus and add the extended path to the queue, for further processing later. If a path cannot be extended because it is not a cycle and there are no qualifying edges left, we discard it.</p><p>Coding the algorithm is pretty straightforward, and it should work on any undirected graph (even those that are not connected). On the example from the OR SE post (9 nodes, 12 edges) it took a whopping 11 milliseconds to find all 13 simple cycles.</p><p>Java code for this and the two MILP models (the latter requiring a recent version of CPLEX) can be found in my <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">GitLab repository</a>. The iterative algorithm is in the <span style="font-family: courier;">CycleFinder</span> class. Stay tuned for the MILP models.<br /></p>Paul A. 
Rubin

Selecting Dispersed Points (2022-12-26)

<p>Fellow blogger Erwin Kalvelagen posted <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2022/12/maximum-number-of-points-with-minimum.html" target="_blank">a comparison</a> of two binary programming models, one quadratically constrained and one linearly constrained, for the problem of selecting a maximal number of points from a finite set subject to the requirement that no two selected points be closer than a specified distance. The models were an answer to a <a href="https://scicomp.stackexchange.com/questions/42321/selecting-most-points-from-a-set-of-points-with-distance-constraint" target="_blank">question</a> posted on Computational Science Stack Exchange. Not surprisingly, the linearized version tended to solve faster than the quadratic version.</p><p>In Erwin's linear model, the constraints take the form $$\underline{D}(x_i + x_j - 1) \le d_{i,j} \quad (1)$$ where $d_{i,j}$ is the distance between points $i$ and $j$, $\underline{D}$ is the minimum allowable distance between selected points, and $x_k$ is a binary variable indicating whether point $k$ is selected (1) or not (0). I coded both his models in Java, using CPLEX 22.1.1, along with another linear model where the constraints are expressed as $$x_i + x_j \le 1\quad (2)$$ for those pairs $(i,j)$ where $d_{i,j} < \underline{D}.$ In other words, we exploit the fact that we know the distances at the outset to precompute which pairs of points can / cannot coexist, and just rule out the pairs that cannot. 
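Precomputing the conflicting pairs is straightforward. Here is a Python sketch (my actual test code is in Java), assuming points in the plane and Euclidean distance:

```python
from itertools import combinations

def conflict_pairs(points, dmin):
    """Pairs (i, j) of points too close to both be selected."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) < dmin]
```

Each returned pair contributes one constraint of form (2); pairs far enough apart need no constraint at all.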
Since (1) is equivalent to $$x_i + x_j \le 1 + \frac{d_{i,j}}{\underline{D}},$$ constraint (2) is clearly at least a bit tighter than constraint (1).<br /></p><p>Erwin started with a problem size of 50 points for demonstration purposes, then doubled that to compare the timing of his two models. I ratcheted the problem size up to 1,000 points to compare his linear model to mine. (I did not test the quadratic model at that size.) As with all things MIP, the results were not entirely consistent. In limited testing, the model using (2) was usually faster than the model using (1), but occasionally (1) proved faster. The run time differences were not large enough to be exciting. For instance, in one test run version (1) needed 4.633 seconds versus 2.987 seconds for version (2).</p><p>Overall, I can't say the time differences lived up to my expectations, and the fact that at least occasionally (1) was faster than (2) (perhaps due to some quirk in presolving, or just to some random choices in branching) is consistent with my experience that MIP behaviors are, well, not consistent. <br /></p>

Paul A. Rubin

Scheduling a Round-Robin Tournament (2022-12-13)

<p>Once in a while I come across a scheduling problem, and of course my first reaction is to think "integer program". A while back, for example, I used an IP model to work out a schedule for a rotating duplicate bridge game (in which both teams and hosts rotated) at the request of a buddy. Most recently, I saw a question on OR Stack Exchange ("<a href="https://or.stackexchange.com/questions/9545/doubles-round-robin-sorting-algorithm" target="_blank">Doubles Round Robin Sorting Algorithm</a>") related to tournament scheduling. 
</p><p>The problem involves six players playing pickleball (so two players on each team, with two sitting out, for each game). Given six players, there are 15 possible teams (pairings). Twelve games are scheduled each week for four weeks, with two sets of six games per week. The author of the question correctly computed that there are 45 possible game configurations. Over the course of the four weeks, he wanted every possible game played at least once (which leaves the final three game slots to be filled arbitrarily). The following conditions must be met:</p><ul style="text-align: left;"><li>within each set of six games, every player plays four games and sits out two;</li><li>no player sits out two consecutive games within the same set;</li><li>no two players are partners more than once per set.</li></ul><p>The problem poses no criterion for selecting among feasible schedules; we just want to find any one schedule that works. </p><p>My IP formulation can be found in my answer to the OR SE question. I will just mention that it uses binary variables $x_{g,s}=1$ if game $g$ goes in slot $s$ and 0 if not. Games are enumerated up front and indexed from 1 to 45. Slots refer to positions in the schedule and are numbered from 1 to 48, with six slots per set and 12 slots per week. Since no criterion is specified, we just use the default (minimize 0).</p><p>I also tried a constraint programming model, using general integer variables $s_1,\dots,s_{48},$ where $s_j \in \{1,\dots,45\}$ is the index of the game played in slot $j$ of the schedule. My CP formulation uses the "all different" constraint (fairly ubiquitous among CP solvers) to ensure that the first 45 slots each contain a different game. It uses the CP Optimizer "distribute" constraint to ensure that no game is played more than twice and the CP Optimizer "count" constraint to ensure that each player plays exactly four times per set (and thus sits twice) and that no team plays more than once in a set. 
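The up-front enumeration of teams and game configurations takes only a couple of lines. A Python illustration (my models are coded in Java):

```python
from itertools import combinations

players = range(6)
teams = list(combinations(players, 2))        # 15 possible teams of two
# a game pairs two disjoint teams (the remaining two players sit out)
games = [(s, t) for s, t in combinations(teams, 2)
         if set(s).isdisjoint(t)]             # 45 possible games
```

The disjointness filter is what cuts the 105 unordered pairs of teams down to the 45 legal game configurations.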
I'm not sure how common those types of constraints are among CP solvers, as I don't have much experience with them.</p><p>As it turns out, both CPLEX (for the IP model) and CP Optimizer (for the CP model) produced feasible schedules in about a second or so on my desktop PC. My intuition was that a CP model would be better suited to this type of problem, because all variables are integer and a CP solver would not be doing much if any matrix arithmetic. It would take a much larger test case, though, to tell whether that intuition has any merit.</p><p>If anyone wants to play with the models, my Java code is available from my university GitLab <a href="https://gitlab.msu.edu/orobworld/roundrobin" target="_blank">repository</a>. Running it would require CPLEX and CP Optimizer (and not necessarily the most recent versions of either).<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-23095946496261555012022-11-06T15:52:00.002-05:002022-11-06T15:53:39.220-05:00A Bicriterion IP Model for a Game Strategy<p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">A <a href="https://math.stackexchange.com/questions/4569317/how-can-i-solve-this-optimization-problem-with-integer-variables" target="_blank">question</a> on Mathematics Stack Exchange asks how to solve an optimization problem related to a game apparently called "speedrunner". The problem boils down to selecting integer values for variables $n_a,$ $n_c$ and $n_{b,i}$ ($i=1,\dots,n_c$) so as to minimize elapsed time $$t_{\textrm{total}} =t_{a}n_{a}+\sum_{i=1}^{n_{c}}t_{b}n_{b,i}+t_{c}n_{c},$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">where $t_a$, $t_b$ and $t_c$ are parameters with specified values. 
The sole constraint specified is that winnings, defined by</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$x_{\textrm{total}} = x_{0}n_{a}\prod_{i=1}^{n_{c}}n_{b,i}$$(where $x_0$ is another parameter), must be at least $\$1$ billion. We can infer from the requirement of positive earnings that $n_a \ge 1,$ $n_c \ge 1,$ and $n_{b,i} \ge 1$ for $i=1,\dots,n_c.$ For modeling purposes, we will allow the index $i$ to exceed $n_c$ and require that $n_{b,i}=0$ for $i \gt n_c.$ An integer linear programming model would be a no-brainer were it not for two things: the product form of the expression for winnings is nonlinear; and it involves the product of a variable number of variables. </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The approach I chose was to express the original variables in terms of binary variables. To do that, we first need upper bounds on the original variables. We get those by guessing an upper bound $T$ on $t_{\textrm{total}}.$ (If we guess too low, we will either get a solution that exceeds that upper bound or the model will become infeasible, either of which will tip us off to try a larger guess.) Once we have that upper bound, we get an upper bound for each variable by setting the other two variables as small as possible. 
That leads to the following upper bounds:$$N_{a} =\left\lfloor \frac{T-t_{b}-t_{c}}{t_{a}}\right\rfloor$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$N_{b} =\left\lfloor \frac{T-t_{a}-t_{c}}{t_{b}}\right\rfloor$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">and </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$N_{c} =\left\lfloor \frac{T-t_{a}}{t_{c}+t_{b}}\right\rfloor .$$ </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The denominator in the third equation arises from the fact that adding 1 to $n_c$ not only directly costs $t_c$ units of time but also adds another $n_{b,i}$ variable with minimum value 1, costing $t_b$ units of time.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Armed with those upper bounds, we can introduce binary variables $y_j$ $(j=1,\dots,N_a),$ $w_j$ $(j=1,\dots, N_c)$ and $z_{i,j}$ $(i=1,\dots,N_c$ and $j=0,\dots,N_b)$ along with constraints making them type 1 special ordered sets (i.e., each set contains exactly one variable with value 1) and expanding the original variables in terms of them. 
Specifically:</p><ul style="text-align: left;"><li>$n_{a}=\sum_{j=1}^{N_{a}}j\cdot y_{j}$ with constraint $\sum_{j=1}^{N_{a}}y_{j}=1;$<style type="text/css">p, li { white-space: pre-wrap; }</style> </li><li><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$n_{c}=\sum_{j=1}^{N_{c}}j\cdot w_{j}$ with constraint $\sum_{j=1}^{N_{c}}w_{j}=1;$ and</p></li><li>$n_{b,i}=\sum_{j=0}^{N_{b}}j\cdot z_{i,j}$ for $i=1,\dots,N_{c}$ <p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">with the following constraints for all $i\in\left\{ 1,\dots,N_{c}\right\}:$</p><ul><li>$\sum_{j=0}^{N_{b}}z_{i,j}=1$ (the value of $n_{b,i}$ is uniquely defined); and<br /></li></ul><style type="text/css">p, li { white-space: pre-wrap; }</style></li><ul><li>$z_{i,0}+\sum_{k=i}^{N_{c}}w_{k}=1$ ($n_{b,i}=0$ if and only if $n_{c}<i$).</li></ul></ul><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Armed with all that, the original objective function can be expressed as minimizing$$t_{a}n_{a}+t_{b}\sum_{i=1}^{N_{c}}n_{b,i}+t_{c}n_{c},$$where the only tweak is that we now sum over a constant number $N_c$ of terms rather than a variable number $n_c,$ relying on the fact that $n_{b,i}=0$ for $i>n_c.$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">That brings us to the nonlinear formula for winnings. The original constraint is $ x_{\textrm{total}} \ge 10^9,$ which we can rewrite as $\log(x_{\textrm{total}}) \ge \log(10^9)$ using whatever flavor logarithm you like. 
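</p><p>As a quick numerical sanity check on the log transformation, the product form and the sum-of-logs form of the winnings agree. This little Python sketch is mine, and the value of $x_0$ in it is a made-up illustration, not the value from the question ($n_a$ and the $n_{b,i}$ are taken from the solution discussed later in the post):</p>

```python
import math

# Hypothetical parameter: x0 is NOT taken from the question.
x0 = 25000.0
n_a = 7
n_b = [18, 18, 18]  # n_c = 3 active slots

# Winnings computed directly as a product ...
winnings = x0 * n_a * math.prod(n_b)
# ... and via the linearized sum of logs.
log_winnings = math.log(x0) + math.log(n_a) + sum(math.log(j) for j in n_b)

# The constraint x_total >= 1e9 holds iff its log form holds.
print(winnings >= 1e9, log_winnings >= math.log(1e9))  # True True
```

<p>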
Expanding the logs of $n_a$ and $n_{b,i}$ in terms of the binary variables, we get$$\log(x_{\textrm{total}}) = \log(x_{0})+\sum_{j=1}^{N_{a}}\log(j)\cdot y_{j} +\sum_{i=1}^{N_{c}}\sum_{j=1}^{N_{b}}\log(j)\cdot z_{i,j}.$$So the constraint that log winnings equal or exceed the log of $10^9$ is linear in the binary variables.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The solution in the Math Stack Exchange post $(n_a = 7,$ $n_c = 3$ and $n_{b,1}=n_{b,2}=n_{b,3}=18)$ turns out to be optimal given the parameter values in the question, and the IP model here will correctly achieve the minimum time (approximately 52.3 minutes) ... but it will not necessarily reproduce the best winnings within that time (approximately $\$1.02$ billion). That is because there are multiple feasible solutions that hit the correct time value, with $n_a = 7$ and with $n_c = 3$ but with the 54 cumulative $n_b$ units allocated differently (for instance, $\{19, 19, 16\}$ rather than $\{18, 18, 18\}$). To get the best possible solution, we can use a bicriterion model with lexicographically ordered objectives, first minimizing time and then maximizing winnings (by minimizing the negative of the log of winnings).</p><p></p><p>I tested that model using the Java API to CPLEX 22.1. My code is available from my <a href="https://gitlab.msu.edu/orobworld/speedrunner" target="_blank">university repository</a>.<br /></p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><style type="text/css">p, li { white-space: pre-wrap; }</style></p><p><style type="text/css">p, li { white-space: pre-wrap; }</style></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-14095270023479677062022-11-01T11:59:00.000-04:002022-11-01T11:59:41.193-04:00Holt Double Exponential Smoothing<p>Given a time series $x_t,\, t=1,2,\dots,$ presumed to contain linear trend, <a href="https://en.wikipedia.org/wiki/Exponential_smoothing#Double_exponential_smoothing_(Holt_linear)" target="_blank">Holt's double exponential smoothing method</a> computes estimates $s_t$ of the level (mean) at time $t$ and $b_t$ of the (presumed constant) slope using the following formulas: $$s_t = \alpha x_t + (1-\alpha)(s_{t-1} + b_{t-1})$$ and $$b_t = \beta(s_t - s_{t-1}) + (1-\beta)b_{t-1}$$ where $\alpha,\beta \in (0,1)$ are smoothing weights. A recent <a href="https://or.stackexchange.com/questions/9217/forecasting-with-holts-linear-trend-method/" target="_blank">question on OR Stack Exchange</a> asked why the second formula is based on the level estimate and not the observed value. In other words, the proposed alternative to the trend update was $$b_t = \beta(x_t - x_{t-1}) + (1-\beta)b_{t-1}.$$</p><p>The intuition for doing it Holt's way is fairly simple. If exponential smoothing is working as intended (meaning <i>smoothing</i> things), then the difference in level estimates $s_t - s_{t-1}$ should be less variable than the difference in observed values $x_t - x_{t-1}.$ A formal proof probably involves induction arguments, requiring more functioning brain cells than I had available at the time, so I was a bit loose mathematically in my answer on OR SE. Just to confirm the intuition, I did some Monte Carlo simulations in R. The notebook containing the experimental setup, including code, is available <a href="https://rubin.msu.domains/blog/Holt.nb.html" target="_blank">here</a>. 
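</p><p>For readers without R handy, here is a minimal pure-Python rendering of the two trend updates (my sketch, not the notebook code). On noiseless linear data both updates recover the slope exactly, so any difference between them shows up only when noise is added:</p>

```python
def holt(x, alpha, beta, variant=False):
    """Double exponential smoothing; returns final (level, trend) estimates.

    variant=False uses Holt's trend update, based on smoothed levels
    s[t] - s[t-1]; variant=True uses observed differences x[t] - x[t-1].
    """
    s, b = x[1], x[1] - x[0]  # common initialization
    for t in range(2, len(x)):
        s_prev = s
        s = alpha * x[t] + (1 - alpha) * (s + b)
        diff = (x[t] - x[t - 1]) if variant else (s - s_prev)
        b = beta * diff + (1 - beta) * b
    return s, b

# Noiseless series with slope 0.5: both updates track the trend exactly.
xs = [3.0 + 0.5 * t for t in range(50)]
print(holt(xs, 0.3, 0.1))        # level ~ last observation, trend ~ 0.5
print(holt(xs, 0.3, 0.1, True))  # same here, since there is no noise
```

<p>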
It requires the <span style="font-family: courier;">dplyr</span> and <span style="font-family: courier;">ggplot2</span> library packages.<br /></p><p>The following plots show confidence intervals over time for the errors in the level and trend estimates using both Holt's formula and what I called the "variant" method. They are from a single experiment (100 independent time series with identical slope and intercept, smoothed both ways), but other trials with different random number seeds and changes to the variability of the noise produced similar results.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUgHyaC2TcpZkTaIz56M66ivTu36aHkdHCisEYUsLmc_7rQdmZ2xYiVmj0QPE5YK0h_8NoYjlEN4C6aELVQThit3XqDUi6THWxVZ3w4KV53k-h6CL19t7anItawyIpmSDYPgij665mZBcYH_KQNoERrxFeEYFbR3xi7Ikp-0NXwqhKK2Z1FlRh8QxE/s579/Holt_level.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="plot of confidence intervals for error in level estimates" border="0" data-original-height="357" data-original-width="579" height="394" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUgHyaC2TcpZkTaIz56M66ivTu36aHkdHCisEYUsLmc_7rQdmZ2xYiVmj0QPE5YK0h_8NoYjlEN4C6aELVQThit3XqDUi6THWxVZ3w4KV53k-h6CL19t7anItawyIpmSDYPgij665mZBcYH_KQNoERrxFeEYFbR3xi7Ikp-0NXwqhKK2Z1FlRh8QxE/w640-h394/Holt_level.png" width="640" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDQmQIPfbMZ25b11FqmSHjgqyN1h0RfwKWn5rAKNHzEB1lvhuKpYEe17y4MiUIwcwTGwS6-wxTeHN0a6VA-Q8Qxiwuo1q3QT5ML1ABHMm7YJhdtCOLjl63-Y8XQKbq8Io1a8bruxkkgjv8Yi_-jiWy091hKV5TNu0UeHIag-ZNoM3UjyktlIdE_IVH/s579/Holt_trend.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="plot of confidence intervals for error in trend estimates" border="0" data-original-height="357" data-original-width="579" height="394" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDQmQIPfbMZ25b11FqmSHjgqyN1h0RfwKWn5rAKNHzEB1lvhuKpYEe17y4MiUIwcwTGwS6-wxTeHN0a6VA-Q8Qxiwuo1q3QT5ML1ABHMm7YJhdtCOLjl63-Y8XQKbq8Io1a8bruxkkgjv8Yi_-jiWy091hKV5TNu0UeHIag-ZNoM3UjyktlIdE_IVH/w640-h394/Holt_trend.png" width="640" /></a></div><br /><p>In both cases, the estimates start out a bit wobbly (and the Holt estimates may actually be a bit noisier), but over time both stabilize. There does not seem to be much difference between the two approaches in how noisy the level estimates are, at least in this run. The Holt estimates may have slightly narrower confidence intervals, but that is not clear, and the difference if any seems pretty small. The Holt trend estimates, however, are considerably less noisy than those of the variant method, supporting the intuitive argument.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-37764870158890072652022-09-13T17:55:00.001-04:002022-09-13T17:55:59.819-04:00With CPLEX, You're Not In Until You're In<style type="text/css"><!-- /** * GeSHi (C) 2004 - 2007 Nigel McNie, 2007 - 2008 Benny Baumann * (http://qbnz.com/highlighter/ and http://geshi.org/) */ .java {font-family:monospace;color: #006; border: 1px solid #d0d0d0; background-color: #f0f0f0;} .java a:link {color: #000060;} .java a:hover {background-color: #f0f000;} .java .imp {font-weight: bold; color: red;} .java .kw1 {color: #000000; font-weight: bold;} .java .kw2 {color: #000066; font-weight: bold;} .java .kw3 {color: #003399;} .java .kw4 {color: #000066; font-weight: bold;} .java .co1 {color: #666666; font-style: italic;} .java .co2 {color: #006699;} .java .co3 {color: #008000; font-style: italic; font-weight: bold;} .java .coMULTI {color: #666666; font-style: italic;} .java .es0 {color: #000099; font-weight: bold;} .java .br0 {color: #009900;} .java .sy0 {color: #339933;} .java .st0 {color: #0000ff;} .java .nu0 
{color: #cc66cc;} .java .me1 {color: #006633;} .java .me2 {color: #006633;} .java span.xtra { display:block; } --></style><p>(For a suitable soundtrack for this post, try <a href="https://www.youtube.com/watch?v=OOWO--z1S8A" target="_blank">this</a> three-minute video.) <br /></p><p>A <a href="https://community.ibm.com/community/user/datascience/discussion/variables-annotations-in-cplex-for-automatic-benders-decomposition-java-api" target="_blank">question</a> popped up on the CPLEX support forum that reminded me of a slightly obscure and occasionally important aspect of the CPLEX programming APIs. What follows holds true for the Java API, and I believe it applies to the C++ and maybe C APIs. It may also hold for some of the other APIs.</p><p>The questioner, working in Java, was trying to use the "annotated Benders" approach to a MIP model. The programming issue actually is not specific to Benders decomposition, annotated or otherwise. The problem occurred in a loop where the user was defining integer variables and trying to associate them with the master problem. 
Here's the code:</p><div class="java"><span class="kw1">for</span><span class="br0">(</span><span class="kw4">int</span> k<span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> k <span class="sy0"><</span> x.<span class="me1">length</span><span class="sy0">;</span> k<span class="sy0">++</span><span class="br0">)</span> <span class="br0">{</span> <br /> x<span class="br0">[</span>k<span class="br0">]</span> <span class="sy0">=</span> cplex.<span class="me1">intVar</span><span class="br0">(</span><span class="nu0">0</span>,cluster.<span class="me1">getNbVeh</span><span class="br0">(</span><span class="br0">)</span>,<span class="st0">"x_"</span><span class="sy0">+</span>k<span class="br0">)</span><span class="sy0">;</span> <br /> cplex.<span class="me1">setAnnotation</span><span class="br0">(</span>benders, x<span class="br0">[</span>k<span class="br0">]</span>,<br /> IloCplex.<span class="me1">CPX_BENDERS_MASTERVALUE</span><span class="br0">)</span><span class="sy0">;</span><br /><span class="br0">}</span></div><div class="java"><span class="br0"> </span></div><p>The error message from CPLEX was "object is unknown to IloCplex", which was understandably confusing and yet, from personal experience, predictable.</p><p>Here is the problem: at the point that variable <span style="font-family: courier;">x[k]</span> is fed to the <span style="font-family: courier;">setAnnotation()</span> function, <span style="font-family: courier;">x[k]</span> is initialized (in the programming sense that it has content) but is <i>not yet part of the model instance</i> (since it has not been used in a constraint or objective function). In fact, even if it had been added to a constraint or objective expression, it would still not be part of the model until the constraint or objective was itself added to the model. 
This is despite the fact that the model instance (here <span style="font-family: courier;">cplex</span>) was used to create the variable.</p><p>One possible fix would be to use <span style="font-family: courier;">x[k]</span> in something that was added to the model <i>before</i> trying to set its annotation. A simpler fix is just to insert the line <span style="font-family: courier;">cplex.add(x[k]);</span> in between where <span style="font-family: courier;">x[k]</span> is defined and where the annotation is set. This nominally adds <span style="font-family: courier;">x[k]</span> to the model (even though not in any specific role), which is enough to make it a "known object" to CPLEX.</p><p>I learned (the hard way) about this long before CPLEX added support for Benders decomposition. When you look up solution values after solving a model, CPLEX provides convenient functions to look up either the value of a single variable or the value of an array of variables. In many applications, it is tempting to use the latter. At the same time, in some applications you may want to define an array of variables but not use all of them. For instance, suppose I have a network where some nodes have demands and some do not, and I want to define a binary variable <span style="font-family: courier;">x[k]</span> just at nodes <span style="font-family: courier;">k</span> that do have demand. If I define <span style="font-family: courier;">x</span> to have dimension equal to the number of nodes and then only initialize <span style="font-family: courier;">x[k]</span> at the nodes with demands, passing <span style="font-family: courier;">x</span> to a method that retrieves values will bomb because some of the entries of <span style="font-family: courier;">x</span> are null. So I cleverly initialize <span style="font-family: courier;">x[k]</span> for all <span style="font-family: courier;">k</span> ... 
and the attempt to retrieve values still bombs, because <span style="font-family: courier;">x[k]</span> was never included in the model (in any constraint or objective function) when node <span style="font-family: courier;">k</span> has no demand. Again, the solution is to call <span style="font-family: courier;">add(x[k])</span> on the model object before solving, which means the model knows about <span style="font-family: courier;">x[k]</span> even if it never appears in any part of the actual model.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-3375395025770255632022-07-19T19:33:00.001-04:002022-07-19T19:33:08.984-04:00"Block Party" Puzzle<p>A question posed on the OR Discord channel by a doctoral student led me to discover the existence of the <a href="https://www.janestreet.com/puzzles/" target="_blank">Jane Street puzzle page</a>. The student was asking about building a MILP model for the June 2022 puzzle, called "<a href="https://www.janestreet.com/puzzles/block-party-4-index/" target="_blank">Block Party 4</a>". The puzzle involves inserting numbers into a grid, with some cells already filled in. It bears a superficial resemblance to sudoku, but with a few key differences. Where a sudoku is divided into nine square regions of nine cells each, the block party puzzle grid is divided into connected regions of varying sizes and shapes. Within a region of $k$ cells, the numbers 1 through $k$ must be filled in. Finally, rather than requiring that rows and columns contain no repeated numbers, the rules require that, for each possible value $K$, if $K$ is inserted into a cell then the nearest instance of $K$ must be at distance exactly $K$ in the $L_1$ norm. So to use a 1, there must be a 1 in an adjacent cell. 
To use a 2, there must be a 2 in a cell two moves away but no 2 in any adjacent cell.</p><p>Since this is a problem with logic constraints and integer decisions, my instinct was to think that constraint programming would be faster than integer programming. To test this, I coded both an IP model and a CP model in Java, using CPLEX and CP Optimizer as the respective solvers. I assumed that the grid would be square, since both the June puzzle (10 x 10) and a smaller example provided (5 x 5) were. Both models can easily be adjusted for non-square grids.</p><p>Assume an $N\times N$ grid partitioned into regions, and let $M$ be the size of the largest region (and thus the largest value that can be used in the puzzle). Number the cells 1 through $N^2$ in any order. (I used a left-to-right raster scan.) For the IP model, I use binary variables $x_{i,j}$ $(i=1,\dots,N^2$, $j=1,\dots,M)$ to indicate whether value $j$ is inserted into cell $i$. For cells with known values, I fix $x_{i,j}$ to either 0 or 1 as appropriate while building the model. Also, if cell $i$ lies in a region of size $K$, then I can fix $x_{i,j}=0$ for $j>K.$</p><p>Since we just want a feasible solution, I let the IP objective function default to minimizing zero. The most obvious constraint is $$\sum_{j=1}^M x_{i,j} = 1 \quad \forall i,$$which forces a single value to be selected for each cell. Similarly, if $B$ is a block with size $K,$ then $$\sum_{i\in B}x_{i,j}=1 \quad \forall j=1,\dots,K$$forces every value from 1 to $K$ to be used exactly once in the block. Finally, for each block $B,$ each cell $i\in B$ and each legal value $j\in \lbrace 1, \dots, \vert B\vert\rbrace$ for that cell, we add these constraints: $$x_{i,j} \le \sum_{k\in N_j(i)} x_{k,j} $$ and $$x_{i,j} + x_{k,j} \le 1\quad \forall k\in N^-_j(i),$$ where $N_j(i)$ is the set of all cells at distance exactly $j$ from cell $i$ and $N^-_j(i)$ is the set of all cells at distance less than $j$ from cell $i$ (excluding cell $i$ itself). 
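</p><p>The neighborhood sets $N_j(i)$ and $N^-_j(i)$ are just $L_1$ balls on the grid. Here is a quick Python illustration of how they might be generated (my sketch; the actual code behind this post is Java):</p>

```python
def l1_neighborhoods(n, cell, j):
    """Return (N_j, N_j_minus) for an n x n grid: the cells at L1
    distance exactly j from `cell`, and the cells at distance 1..j-1.
    Cells are numbered 0 .. n*n - 1 in raster (row-major) order."""
    r0, c0 = divmod(cell, n)
    exact, closer = [], []
    for k in range(n * n):
        r, c = divmod(k, n)
        d = abs(r - r0) + abs(c - c0)
        if d == j:
            exact.append(k)
        elif 0 < d < j:
            closer.append(k)
    return exact, closer

# On a 5 x 5 grid, the distance-1 neighbors of the center cell (12):
print(l1_neighborhoods(5, 12, 1)[0])  # [7, 11, 13, 17]
```

<p>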
These constraints enforce the rule that, for value $j$ to be used in cell $i,$ it must also be used in at least one cell at distance $j$ from $i$ and in no closer cell.</p><p>The CP model is a bit more straightforward to articulate. Again, there is no objective function, since we are just solving for a feasible solution. For each cell $i$, there is a single integer variable $x_i$ with domain $1,\dots,\vert B \vert$ where $B$ is the block containing cell $i.$ If we know that cell $i$ is fixed to value $k,$ we just declare $x_i$ to have domain $\lbrace k \rbrace.$ For each block, we use an "all different" constraint to enforce the requirement that the cells in the block take distinct values. For each cell $i$ and legal value $j$ for it, the implication constraint $$(x_i = j) \implies \bigvee_{k\in N_j(i)} (x_k = j)$$ where $\bigvee$ denotes disjunction ("or"), forces at least one cell at distance $j$ from $i$ to take value $j$ if cell $i$ does, while the constraints $$(x_i = j) \implies (x_k \neq j) \quad \forall k\in N^-_j(i)$$ prohibit any closer cell from using that value. (These constraints could be condensed into a conjunction on the right hand side. For reasons I have since forgotten, I did not bother to do so.)</p><p>Both models solved the 10x10 puzzle easily. My expectation was that the CP model would be faster, for several reasons. First, it has 100 general integer variables, whereas the IP model started out with 1,100 binary variables (which the presolver whittled down to 119 binary variables, compared to 90 variables for the CP model after presolving). Second, the "all different" CP constraint seems to be a more efficient way than a steaming pile of inequality constraints to enforce the rule that no two cells in the same block take the same value. Third, CP Optimizer would be doing integer arithmetic while CPLEX would be doing double precision arithmetic, and on a per-operation basis integer arithmetic should be faster. 
Lastly, my experience in the past has been that the one edge IP models tend to have over CP models is tighter bounds, but that has no effect in a feasibility problem (when you are not optimizing anything).</p><p>As it turns out, I was in for a surprise. Actually, make that two surprises. First, the IP model after presolving had 211 constraints, whereas the CP model after presolving had 7,399 constraints. Note that, in the implication constraints, the left side and each equality on the right side count as a constraint. I'm not sure how comparable constraints are between the two models, but I was not expecting the CP model to have so many more. Second, while both models solved in negligible time, the IP model was faster. CPLEX solved the IP model at the root node (no branching) in about 0.01 seconds on my fairly average desktop PC. CP Optimizer needed 2,678 branches and about 0.12 seconds to solve the CP model, of which 0.05 seconds was spent in the "engine" (i.e., solving) and the rest was spent in "extraction" (turning the model into something suitable for the engine).</p><p>My Java code (which requires both CPLEX and CP Optimizer but nothing else) can be found in my <a href="https://gitlab.msu.edu/orobworld/blockparty4" target="_blank">GitLab repository</a>.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-23367493808631263932022-07-15T14:22:00.000-04:002022-07-15T14:22:38.301-04:00Left Matrix Inverses in R<p>The following <a href="https://or.stackexchange.com/questions/8693/finding-all-left-inverses-of-a-matrix" target="_blank">question</a> popped up on OR Stack Exchange: given an $m\times n$ matrix $A$ with $m > n,$ how can one find <i>all</i> left inverses of $A$ in R? 
The author mentions that dimensions in their case would be around 200x10.<br /></p><p>A left inverse is a matrix $B\in \mathbb{R}^{n\times m}$ such that $B A = I.$ In order for a left inverse to exist, $A$ must have full column rank. If it does not, $Ax = 0$ for some nonzero $x\in \mathbb{R}^n,$ in which case $$x = Ix = (BA)x = B(Ax) = B\cdot 0 = 0,$$ a contradiction.</p><p>Henceforth we assume that $A$ has full column rank, in which case the <a href="https://en.wikipedia.org/wiki/Kernel_(linear_algebra)#Left_null_space" target="_blank">left null space</a> $\mathcal{N} \subseteq \mathbb{R}^m$ will have dimension $m-n.$ Now suppose that $C\in \mathbb{R}^{n\times m}$ has the property that every row of $C$ belongs to $\mathcal{N}.$ Then $CA = 0$ and so $(B+C)A = BA+0 = I,$ making $B+C$ another left inverse of $A.$ That means $A$ has an uncountably infinite number of left inverses. Conversely, if $DA=I$ then the rows of $D-B$ belong to $\mathcal{N}.$ So the set of left inverses of $A$ can be fully characterized by any individual left inverse and a basis for the left null space of $A.$</p><p>Getting this information in R is remarkably easy. 
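</p><p>(The characterization above is easy to check numerically in any language. Here is a tiny pure-Python example of mine, using an arbitrary $3\times 2$ matrix with full column rank:)</p>

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

# An arbitrary 3x2 matrix with full column rank.
A = [[1, 0], [0, 1], [1, 1]]

# One left inverse: the Moore-Penrose pseudoinverse (A'A)^{-1} A'.
# Here A'A = [[2, 1], [1, 2]], whose inverse is (1/3)[[2, -1], [-1, 2]].
B = [[2/3, -1/3, 1/3], [-1/3, 2/3, 1/3]]
print(matmul(B, A))  # the 2x2 identity (up to rounding)

# The left null space of A is spanned by c = (1, 1, -1), since c'A = 0.
c = [1, 1, -1]
# Adding any multiples of c to the rows of B yields another left inverse.
B2 = [[b + t * ci for b, ci in zip(row, c)] for row, t in zip(B, [1.0, -2.0])]
print(matmul(B2, A))  # still the identity (up to rounding)
```

<p>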
There are multiple ways to compute a left inverse, but if you have the <span style="font-family: courier;">pracma</span> library loaded, then the function <span style="font-family: courier;">pracma::pinv()</span> can be used to compute the <a href="https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse" target="_blank">Moore-Penrose pseudoinverse</a>, which is a left inverse of $A.$ To get a basis for the left null space of $A,$ we apply the function <span style="background-color: white;"><span style="font-family: courier;">pracma::nullspace()</span></span> to $A^T,$ which computes a basis for the right null space of the transpose of $A,$ and then transpose the results.</p><p>I have a small <a href="https://rubin.msu.domains/blog/left_inverses.nb.html" target="_blank">R notebook</a> that demonstrates this on a random 200x10 matrix.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-81045312963307406372022-07-14T19:52:00.001-04:002022-07-14T19:52:29.473-04:00Models for a Social Network Problem<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">An <a href="https://or.stackexchange.com/questions/8674/is-this-ilp-formulation-for-group-closeness-centrality-a-column-generation-appro" target="_blank">interesting question</a> recently popped up on Operations Research Stack Exchange. The setting is a graph $G=(V,E)$ in which path length is defined to be the number of edges on the path (i.e., all edges have weight 1). The problem is to select a subset $S\subset V$ of vertices with a specified cardinality $k$ so as to minimize the sum over all nodes of the distance from each node to the closest selected node. 
I will assume the graph is connected, since otherwise the objective might be undefined.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br />The author of the original question indicated in a comment that the context for the problem is something involving social networks, and in another comment indicated that the diameters of the graphs are relatively constant regardless of graph size. The diameter of $G$, which I will denote by $\delta(G),$ is the maximum distance between any pair of vertices in $V.$ I will denote by $D$ the set $\left\{ 0,1,\dots,\delta(G)\right\} .$</p><p style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The author posed the following integer programming model, in which $x_{v,d}\in\left\{ 0,1\right\} $ is 1 if vertex $v$ has distance $d$ to the nearest selected vertex and 0 otherwise. 
Note that $x_{v,0}=1$ if and only if $v\in S.$ \begin{align*} \min_{x} & \sum_{v\in V}\sum_{d\in D}d\cdot x_{v,d}\\ \text{s.t.} & \sum_{v\in V}x_{v,0}=k & (1)\\ & \sum_{d\in D}x_{v,d}=1\quad\forall v\in V & (2)\\ & x_{v,d}\le\sum_{u\in N_{d}(v)}x_{u,0}\quad\forall v\in V,\forall d\in D\backslash\left\{ 0\right\} & (3) \end{align*}where $N_{d}(v)\subset V$ is the set of all nodes at (shortest) distance $d$ from $v.$ Constraint (1) ensures that $S$ has the correct cardinality, constraint (2) ensures that the distance of any node from $S$ is uniquely defined, and constraint (3) ensures that a node is at distance $d$ from $S$ only if at least one node at distance $d$ belongs to $S.$ I will refer to this as the "distance model".</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"></p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The author was asking about a possible incremental approach, but I got curious about alternative models. Frequent forum contributor Rob Pratt suggested changing constraint (3) to $$x_{v,d}\le\sum_{u\in N_{1}(v)}x_{u,d-1}\quad\forall v\in V,\forall d\in D\backslash\left\{ 0\right\} \quad(3'),$$</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">which says that for a node $v$ to be at distance $d,$ one of its neighbors must be at distance $d-1.$ I will call that "distance model 2". 
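</p><p>Whichever formulation is used, the objective value of a candidate set $S$ can be verified independently by a multi-source breadth-first search. A small Python sketch of mine, on a toy graph (not tied to any of the models):</p>

```python
from collections import deque

def total_distance(adj, S):
    """Sum over all vertices of the hop distance to the nearest vertex
    in S, computed by multi-source BFS. adj maps vertex -> neighbor list;
    the graph is assumed connected, as in the post."""
    dist = {v: 0 for v in S}
    queue = deque(S)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())

# Toy graph: a path 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(total_distance(adj, {2}))     # 2+1+0+1+2 = 6
print(total_distance(adj, {1, 3}))  # 1+0+1+0+1 = 3
```

<p>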
Meanwhile, I thought it might help to leave $x_{v,0}$ binary but make $x_{v,d}\in\left[0,1\right]$ continuous for $d>0$ (which might or might not be equivalent to using branching priorities to ensure that the $x_{v,0}$ variables were branched on before any of the other variables). I will call that "distance model 3".</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Someone else suggested what I will call the "assignment model", which uses one set of binary variables $x_{v}$ to determine which vertices are selected to be in $S$ and a second set of binary variables $y_{v,u}$ to indicate whether vertex $u$ is the closest vertex in $S$ to vertex $v.$ That model is as follows:\begin{align*} \min_{x,y} & \sum_{u,v\in V}d_{v,u}y_{v,u}\\ \text{s.t.} & \sum_{v\in V}x_{v}=k & (4)\\ & \sum_{u\in V}y_{v,u}=1\quad\forall v\in V & (5)\\ & y_{v,u}\le x_{u}\quad\forall v,u\in V & (6) \end{align*}where $d_{v,u}$ is the shortest-path distance from $v$ to $u,$ (4) enforces the size requirement for $S$, (5) stipulates that every vertex is assigned to a single selected vertex (possibly itself) and (6) makes sure that the assigned selected vertex is actually
selected.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"></p><p>Lastly, I came up with yet another model, which I will call the "flow model" since it is based on treating selected vertices as sinks and vertices outside $S$ as sources in a flow model. It uses binary variables $x_{v}$ to indicate whether a vertex is selected and continuous variables $y_{u,v}\in\left[0,\vert V\vert-k\right]$ for flow volumes. Since the edges are bidirectional, for each edge $(u,v)\in E$ there will be two flow variables, $y_{u,v}$ and $y_{v,u}$ (only one of which will be nonzero in the optimal solution). The idea is that each vertex that is not selected passes along any flow arriving at it plus one unit of new flow. Selected vertices soak up whatever flow comes in. We minimize the sum of the aggregate flows across all edges, which effectively charges each unit of flow 1 for each edge it crosses. The optimal solution will therefore send each unit of flow to the selected vertex (sink) closest to its source, making the cost of each unit of flow equal to the distance from source to nearest selected vertex. That model is as follows.\begin{align*} \min_{x,y} & \sum_{(u,v)\in E}\left(y_{u,v}+y_{v,u}\right)\\ \text{s.t.} & \sum_{v\in V}x_{v}=k & (7)\\ & \sum_{u\in N_{1}(v)}\left(y_{v,u}-y_{u,v}\right)\ge1-\left(\vert V\vert-k\right)\cdot x_{v}\quad\forall v\in V & (8). \end{align*}The by now familiar constraint (7) sets the size of the selected set. 
Constraint (8) says that the flow out of any vertex $v$ must be one greater than the flow in \emph{unless} the vertex is selected (in which case nothing needs to flow out of it).</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">To assess how the various models perform, I coded them in Java using CPLEX 22.1 as the solver and ran a few experiments on randomly generated graphs. I did not do nearly enough experiments for any definitive conclusions, but I will quote the results of one that I think is somewhat informative. The test graph has $\vert V\vert=2,000,$ $\vert E\vert=23,360$ and diameter $\delta(G)=5.$ Each model was run for 10 minutes (not including the time spent constructing the model). The next table summarizes the results.</p><p><style type="text/css">p, li { white-space: pre-wrap; }</style><style type="text/css">p, li { white-space: pre-wrap; }</style></p><style type="text/css">.tg {border:none;border-collapse:collapse;border-color:#bbb;border-spacing:0;} .tg td{background-color:#E0FFEB;border-color:#bbb;border-style:solid;border-width:0px;color:#594F4F; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{background-color:#9DE0AD;border-color:#bbb;border-style:solid;border-width:0px;color:#493F3F; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} .tg .tg-dvpl{border-color:inherit;text-align:right;vertical-align:top} </style><style type="text/css">.tg {border:none;border-collapse:collapse;border-color:#bbb;border-spacing:0;} .tg td{background-color:#E0FFEB;border-color:#bbb;border-style:solid;border-width:0px;color:#594F4F; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 
5px;word-break:normal;} .tg th{background-color:#9DE0AD;border-color:#bbb;border-style:solid;border-width:0px;color:#493F3F; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} .tg .tg-dvpl{border-color:inherit;text-align:right;vertical-align:top} </style><div align="center"><table class="tg"><thead> <tr> <th class="tg-c3ow"><br /></th> <th class="tg-c3ow">Distance</th> <th class="tg-0pky">Distance2</th> <th class="tg-0pky">Distance3</th> <th class="tg-c3ow">Assignment</th> <th class="tg-0pky">Flow</th> </tr></thead><tbody> <tr> <td class="tg-c3ow">Binary variables</td> <td class="tg-dvpl">9,976</td> <td class="tg-dvpl">12,000</td> <td class="tg-dvpl">2,000</td> <td class="tg-dvpl">4,002,000?</td> <td class="tg-dvpl">2,000</td> </tr> <tr> <td class="tg-c3ow">Total columns</td> <td class="tg-dvpl">9,976</td> <td class="tg-dvpl">12,000</td> <td class="tg-dvpl">9,976</td> <td class="tg-dvpl">4,002,000?</td> <td class="tg-dvpl">48,720</td> </tr> <tr> <td class="tg-c3ow">Total rows</td> <td class="tg-dvpl">9,977</td> <td class="tg-dvpl">12,001</td> <td class="tg-dvpl">9,977</td> <td class="tg-dvpl">2,001?</td> <td class="tg-dvpl">2,001</td> </tr> <tr> <td class="tg-c3ow">Nonzero coefficients</td> <td class="tg-dvpl">4,017,952</td> <td class="tg-dvpl">257,600</td> <td class="tg-dvpl">4,017,952</td> <td class="tg-dvpl">12,002,000?</td> <td class="tg-dvpl">97,440</td> </tr> <tr> <td class="tg-c3ow">Objective value</td> <td class="tg-dvpl">3,202</td> <td class="tg-dvpl">3,192</td> <td class="tg-dvpl">3,181</td> <td class="tg-dvpl">none</td> <td class="tg-dvpl">3,506</td> </tr> <tr> <td class="tg-c3ow">Lower bound</td> <td class="tg-dvpl">3,139.7</td> <td class="tg-dvpl">3,139.2</td> <td class="tg-dvpl">3,143.3</td> <td class="tg-dvpl">none</td> <td 
class="tg-dvpl">1,980</td> </tr></tbody> </table></div> <p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">None of the models reach proven optimality, and in fact the assignment model hit the time limit early in the presolve phase, before CPLEX printed any statistics about dimensions. All the output I got was that presolve "has eliminated 0 rows and 0 columns..." The dimensions I listed are the dimensions of the model as formulated on paper, before any presolve reductions.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">If you are wondering why the "Distance2" model (which is the original distance model with Rob's modification to constraint (3)) has larger row and column dimensions than the original, it turns out that both start with the same initial dimensions but the CPLEX presolver removes a bunch of rows and columns in the original model that it cannot remove from Rob's version.
Still, Rob's version winds up with an order of magnitude fewer nonzero coefficients than the original version (or my tweak to it, "Distance3").</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Based on this and a few other preliminary runs, there are a handful of takeaways that I am somewhat comfortable stating.</p><ul style="text-align: left;"><li> As graph size grows, the assignment model fairly quickly becomes too large to use. Hypothetically it could reach a proven optimum faster than the others given a liberal time limit, but the betting line is against that.</li><li>Of the remaining models, my flow model has the most columns but the fewest rows and the fewest nonzeros. Unfortunately, it also has the worst performance. On a couple of other tests it did better, at least early on, with the incumbent solution value, but it seems to have the weakest bounds. In the interest of salvaging a little bit of my bruised pride, I will note that fewest nonzeros means that, as the graph grows, it might at some point be the only one of these models to fit in memory.</li><li>The tweak I made to the original model ("Distance3") did best in both incumbent value and bound ... on this example ... within the 10-minute run time limit. There is no guarantee this holds up in other cases, and it ties with the original model for largest constraint matrix (excluding the assignment model).</li><li>Rob's tweak has more rows and columns than the original model (or my tweak), but not hugely more, and it has the fewest nonzeros among the distance models.
So it is definitely a contender for the production model, at least pending more testing.</li></ul>Source code (in Java) that generates a random graph and tests the model is available from <a href="https://gitlab.msu.edu/orobworld/social-networks" target="_blank">my repository</a>.<br />Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-75416765584541254452022-06-17T15:34:00.000-04:002022-06-17T20:10:06.101-04:00Partitioning an Interval<p>I recently saw a question on a help forum that fits the following pattern. The context is an optimization model (mixed integer program). There is a continuous variable $x$ and a constant value $a$ (frequently but not always 0), and we want to distinguish among three cases (with different consequences in the model): $x<a;$ $x=a;$ or $x>a.$ Each case has separate consequences within the model, which I will call $C_1,$ $C_2$ and $C_3$ respectively. Variations on this question arise frequently. For instance, make $x$ the difference of two variables, set $a=0$ and you are distinguishing whether the difference is negative, zero or positive.</p><p>To make things work properly, we need $x$ to be bounded, say $L\le x \le U.$ The goal is to break the feasible range of $x$ into three intervals.
To do so, we will introduce a trio of binary variables, $y_1, y_2, y_3$ together with the constraint $$y_1 + y_2 + y_3 = 1.$$ We will come up with constraints such that fixing $y_1 = 1$ puts you in the left portion of the domain, fixing $y_2 = 1$ puts you in the middle portion, and fixing $y_3 = 1$ puts you in the right portion. The $y$ variables are then used elsewhere in the model to enforce whichever condition $C$ applies.</p><p>I'm using three variables for clarity. You can get by with two binary variables (changing the equality constraint above to $\le$), where both variables equaling zero defines one of the domain intervals. Figure 1 shows what the user wants and how the $y$ variables relate to it.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2iGQSQlqq5RFQIwGlV5MQUn01XnRyxM_wu1NoY56dimbdzy6Mydl393vtc_HMNcL9kZWCjscuBGB3fsV6_dYr7ENv8JfAcZPKODPaHH3z2YtU6x5yuCFy_wxqotXUeBQkdhdmSYjsvJhDZEC8c5hnbg7OVoo2mBarCWoqRJtpM94V5cqe5aXpQjwE/s600/interval1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="intervals desired by user" border="0" data-original-height="116" data-original-width="600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2iGQSQlqq5RFQIwGlV5MQUn01XnRyxM_wu1NoY56dimbdzy6Mydl393vtc_HMNcL9kZWCjscuBGB3fsV6_dYr7ENv8JfAcZPKODPaHH3z2YtU6x5yuCFy_wxqotXUeBQkdhdmSYjsvJhDZEC8c5hnbg7OVoo2mBarCWoqRJtpM94V5cqe5aXpQjwE/s16000/interval1.png" title="Figure 1" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 1<br /></td></tr></tbody></table><p></p><p>The intervals for $x$ are $[L, a)$ when $y_1 = 1$, the singleton $[a, a]$ when $y_2 = 1,$ and $(a, U]$ when $y_3 = 1.$ Unfortunately, the user cannot have what the user wants because the feasible region of a MIP model must be closed, and the first and third 
intervals in Figure 1 are not closed. Closing them results in Figure 2.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguc4EcMxyV5Ds17Ig1DvM-OQmPbf6TrmWip58C79ferrsI_sUKobUzO_bu7t2kGwk6tF-DjDQhXoQluT6-DcUTmAOcOK56cUjIIkjMocNLseo18NXgMUTxSvn-_esRiuyZDknyCsV5O2iMUP3swDyqgcZXIZb0eAZ3sNbeDbx_ym6Un1EiJnEusIih/s600/interval2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="closed intervals that intersect" border="0" data-original-height="104" data-original-width="600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguc4EcMxyV5Ds17Ig1DvM-OQmPbf6TrmWip58C79ferrsI_sUKobUzO_bu7t2kGwk6tF-DjDQhXoQluT6-DcUTmAOcOK56cUjIIkjMocNLseo18NXgMUTxSvn-_esRiuyZDknyCsV5O2iMUP3swDyqgcZXIZb0eAZ3sNbeDbx_ym6Un1EiJnEusIih/s16000/interval2.png" title="Figure 2" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2<br /></td></tr></tbody></table><p>The intervals for $x$ are now $[L, a],$ $[a, a]$ and $[a, U].$ The good news is that this decomposition is easy to accomplish. The bad news is that the user likely will not accept it. If $x=a$, any one of the $y$ variables could take the value 1, so any of the three conditions could be triggered.<br /></p><p>This brings us to the first (and more common) of two possible compromises. We can change the strict inequality $x > a$ to the weak inequality $x \ge a + \epsilon$ where $\epsilon$ is some small positive number, and do something analogous with the left interval. 
The result is Figure 3.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9A6i6Lt9658tFJ02rsi6rMgtYRZLKGRMA4yTFuUTGntKitoKEWYzVz3zIgq52AdprWNwLE9PbO2KfxX7twYaYkC9Zi5Vbv97z8cv47CKKmn1hhXenBmk-YKsUu-JE3Ri9uc7IXRvjNic9j2u5S2w76w5z7aMC7jYkZmdUXldvcgjV7Kc97KLKNNdJ/s598/interval3.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="intervals with gaps on either side" border="0" data-original-height="109" data-original-width="598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9A6i6Lt9658tFJ02rsi6rMgtYRZLKGRMA4yTFuUTGntKitoKEWYzVz3zIgq52AdprWNwLE9PbO2KfxX7twYaYkC9Zi5Vbv97z8cv47CKKmn1hhXenBmk-YKsUu-JE3Ri9uc7IXRvjNic9j2u5S2w76w5z7aMC7jYkZmdUXldvcgjV7Kc97KLKNNdJ/s16000/interval3.png" title="Figure 3" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 3<br /></td></tr></tbody></table><p>Now $y_1=1 \implies x \le a-\epsilon,$ $y_2 = 1 \implies x=a,$ and $y_3=1 \implies x \ge a + \epsilon.$ The good news is that all three intervals are closed. The bad news is that values of $x$ between $a-\epsilon$ and $a$ or between $a$ and $a + \epsilon$ are suddenly infeasible, whereas they were feasible in the original problem. Also note that any tiny deviation from $a$ will throw $x$ into no man's land. 
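To see concretely what the Figure 3 arrangement does, here is a small Java sketch (my own illustration, with arbitrary made-up values for $L,$ $U,$ $a$ and $\epsilon$) that evaluates the bound pair $Ly_1 + a y_2 + (a+\epsilon)y_3 \le x \le (a-\epsilon)y_1 + a y_2 + Uy_3$ for each choice of indicator:

```java
/** Shows which interval the Figure 3 bound pair confines x to
 *  for each choice of the indicator variables (y1, y2, y3). */
public class IntervalPartition {
    // Arbitrary illustrative values, not from the post.
    static final double L = -10, U = 10, A = 0, EPS = 0.5;

    /** Lower bound on x implied by (y1, y2, y3): L*y1 + a*y2 + (a+eps)*y3. */
    static double lo(int y1, int y2, int y3) { return L * y1 + A * y2 + (A + EPS) * y3; }

    /** Upper bound on x implied by (y1, y2, y3): (a-eps)*y1 + a*y2 + U*y3. */
    static double hi(int y1, int y2, int y3) { return (A - EPS) * y1 + A * y2 + U * y3; }

    public static void main(String[] args) {
        System.out.printf("y1=1: %.1f <= x <= %.1f%n", lo(1, 0, 0), hi(1, 0, 0)); // [L, a-eps]
        System.out.printf("y2=1: %.1f <= x <= %.1f%n", lo(0, 1, 0), hi(0, 1, 0)); // [a, a]
        System.out.printf("y3=1: %.1f <= x <= %.1f%n", lo(0, 0, 1), hi(0, 0, 1)); // [a+eps, U]
    }
}
```

Each setting of the $y$ variables collapses the bound pair to exactly one of the three closed intervals.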
That leads to one more possibility, involving yet another positive constant $\delta < \epsilon$ and shown in Figure 4.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC7EiUDoVBdapsOTRfFKEyEeMaOkVU1z5LsViSm7dhQh9Xj9unKP9XLUFVsxaWXFcDzp1Gefd-pvKMFfBvD2rUcBM7AT-MOE9LmtqH6D27etWApi4DyttIH_wldG1RqyN3qGF0CodUefJp3EXu7UkKMLkZDsBA7jzSAURfEiUtXQgwZU47ujfhvbl_/s600/interval4.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="domain broken into three intervals" border="0" data-original-height="112" data-original-width="600" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC7EiUDoVBdapsOTRfFKEyEeMaOkVU1z5LsViSm7dhQh9Xj9unKP9XLUFVsxaWXFcDzp1Gefd-pvKMFfBvD2rUcBM7AT-MOE9LmtqH6D27etWApi4DyttIH_wldG1RqyN3qGF0CodUefJp3EXu7UkKMLkZDsBA7jzSAURfEiUtXQgwZU47ujfhvbl_/s16000/interval4.png" title="Figure 4" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 4<br /></td></tr></tbody></table>In this version, the domain of $x$ is broken into the closed intervals $[L, a-\epsilon],$ $[a-\epsilon +\delta, a+\epsilon - \delta]$ and $[a+\epsilon, U].$ There are still values of $x$ that were feasible in the original model but no longer feasible, but now you get to count values of $x$ "close" to $a$ as signalling the second of your three conditions.<br /><p>The algebra to put either Figure 3 or Figure 4 into force is actually pretty straightforward. You just need to write a pair of inequalities $\dots \le x \le \dots$ where the left side expression is the sum of each $y$ variable times the lower bound that would apply when that variable is 1 and the right expression is the sum of each $y$ variable times the upper bound that would apply. 
So for Figure 3, where the intervals are $[L, a-\epsilon],$ $[a,a]$ and $[a+\epsilon, U]$, we would have $$Ly_1 + a y_2 + (a+\epsilon)y_3 \le x \le (a-\epsilon)y_1 + a y_2 + Uy_3.$$ For Figure 4, the constraints would be $$Ly_1 + (a-\epsilon+\delta)y_2 + (a+\epsilon)y_3 \le x \le (a-\epsilon)y_1 + (a+\epsilon-\delta) y_2 + Uy_3.$$</p><p>There is one "tactical" point to consider. When this is first explained to someone who was unaware that strict inequalities are verboten, they are often tempted to set $\epsilon = 10^{-G}$ where $G$ is the gross domestic product of their native country. The problem is that the solver is using finite precision computer arithmetic, and every solver has a tolerance value $\tau$ (small but not zero) such that coming within $\tau$ of satisfying a constraint is "close enough". If you pick $\epsilon$ (or $\delta$) too small, it has the same consequence as setting it equal to 0 in terms of how valid the solutions are. You can end up with a value of $a$ for $x$ (to within rounding error) triggering consequence $C_1$ or $C_3$ rather than $C_2$, or a value of $x$ strictly greater than $a$ triggering $C_2$ (or possibly even $C_1$), and so on. So $\epsilon$ (and, if needed, $\delta$) must be greater than the rounding tolerance used by the solver.<br /></p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-75454274231027003692022-05-24T15:16:00.000-04:002022-05-24T15:16:52.250-04:00Allocating Resource Sets<p>The following problem was <a href="https://or.stackexchange.com/questions/8435/how-to-model-this-user-packing-problem" target="_blank">posted</a> on Operations Research Stack Exchange. A system contains $K$ users and $N$ resources. Each unit of resource may be allocated to at most one user. Each user $k$ requires a specified number $d_k$ of resource units, <i>which must be consecutive</i>. 
Not all resource sequences of the proper length will be acceptable to a user. Citing an example from the original post, you might have $d_1=3,$ with user 1 accepting only one of the following sets: $\{1, 2, 3\},$ $\{7, 8, 9\},$ $\{11, 12, 13\},$ or $\{17, 18, 19\}.$ The objective is to assign resources to as many users as possible.</p><p>Formulating this as an integer linear program is straightforward, and I coded the IP model in Java (using CPLEX as the solver) to provide a benchmark. The author of the question, however, was looking for heuristic solutions. I suggested a few possibilities, and decided to code two of them just to see how well they did. The setup for all of them is identical. You start with the following elements:</p><ul style="text-align: left;"><li>the set of unsatisfied users (initially, all users);</li><li>an enumeration of all resource sets compatible with any user (with each such set represented by an integer ID number);</li><li>a mapping of each user ID to the set of IDs of all compatible resource sets;</li><li>a mapping of each resource set ID to the set of IDs of all users who would accept that set; and</li><li>a mapping of each resource set ID to the set of IDs of all other resource sets that conflict with that set.</li></ul><p>Two resource sets conflict if they intersect, in which case allocating one of them precludes allocating the other since each unit of resource can be used at most once.</p><p>All the heuristics I suggested work by finding a valid allocation (an unused resource set that does not conflict with any resource set already used and a user who has not yet been served and will accept that resource set), making that allocation, and then updating the mappings described above. Updating the mappings means removing the user who just received resources, removing the resource set just assigned, and removing any other unassigned resource sets that conflict with the set just assigned.
Those changes can have "ripple effects": removing a user may result in a surviving resource set having no compatible users left (in which case the resource set can be removed), and removing a resource set (whether used or due to a conflict) may leave a user with no surviving resource sets that are compatible, in which case the user can be removed. So the update step involves some looping that must be coded carefully.</p><p>The various heuristics I suggested are distinguished by two fundamental choices. First, in each iteration do you pick an available resource set and then look for a user who will accept it, or do you pick a user who has not yet been served and then look for a compatible resource set that is still available? Second, regardless of order, what criteria do you use for selecting users and resource sets?</p><p>There are a lot of ways to make those choices, and I home in on two. My first approach starts by selecting the available resource set with the fewest remaining conflicts and then chooses the compatible user having the fewest surviving options for resource sets. My second approach starts by selecting the remaining user with the fewest surviving options for resource sets and then choosing the compatible resource set with the fewest remaining conflicts. In all cases, ties are broken randomly. My logic is that picking the resource set with the fewest conflicts will leave the most surviving resource sets and (hopefully) allow more users to be served, and picking the user with the fewest remaining options will (hopefully) reduce the number of users who are frozen out because all compatible resource choices have already been allocated.</p><p>I'll take a moment to note here that the two approaches I just stated are intended to be one-pass heuristics (find a feasible allocation and stop). 
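The resource-set-first pass, including the ripple-effect pruning, can be sketched as follows. This is my own illustration with a made-up toy instance, not the code from the linked repository, and for brevity it breaks ties arbitrarily rather than randomly:

```java
import java.util.*;

/** One pass of the "resource set first" heuristic: repeatedly pick the
 *  surviving resource set with the fewest remaining conflicts, give it to a
 *  compatible user with the fewest surviving options, then prune conflicting
 *  sets and propagate the ripple effects (stranded sets and users). */
public class ResourceSetFirst {

    /** setToUsers: resource set ID -> IDs of users who accept that set.
     *  conflicts:  resource set ID -> IDs of intersecting (conflicting) sets.
     *  Returns the number of users served. */
    static int serve(Map<Integer, Set<Integer>> setToUsers,
                     Map<Integer, Set<Integer>> conflicts) {
        Map<Integer, Set<Integer>> sets = new HashMap<>();   // surviving set -> users
        setToUsers.forEach((s, us) -> sets.put(s, new HashSet<>(us)));
        Map<Integer, Set<Integer>> users = new HashMap<>();  // surviving user -> sets
        sets.forEach((s, us) -> us.forEach(
            u -> users.computeIfAbsent(u, k -> new HashSet<>()).add(s)));
        int served = 0;
        while (!sets.isEmpty()) {
            // resource set with fewest surviving conflicts
            int s = Collections.min(sets.keySet(), Comparator.comparingLong(
                t -> conflicts.get(t).stream().filter(sets::containsKey).count()));
            // compatible user with fewest surviving options
            int u = Collections.min(sets.get(s),
                Comparator.comparingInt(v -> users.get(v).size()));
            served++;
            Set<Integer> deadSets = new HashSet<>(conflicts.get(s));
            deadSets.add(s);
            Set<Integer> deadUsers = new HashSet<>(List.of(u));
            while (!deadSets.isEmpty() || !deadUsers.isEmpty()) {
                deadSets.forEach(sets::remove);
                deadUsers.forEach(users::remove);
                Set<Integer> ds = deadSets, du = deadUsers;
                deadSets = new HashSet<>();
                deadUsers = new HashSet<>();
                for (var e : users.entrySet()) {      // users losing their last option
                    e.getValue().removeAll(ds);
                    if (e.getValue().isEmpty()) deadUsers.add(e.getKey());
                }
                for (var e : sets.entrySet()) {       // sets losing their last taker
                    e.getValue().removeAll(du);
                    if (e.getValue().isEmpty()) deadSets.add(e.getKey());
                }
            }
        }
        return served;
    }

    public static void main(String[] args) {
        // Made-up toy instance: sets 0 and 2 each conflict with set 1.
        Map<Integer, Set<Integer>> setToUsers = Map.of(
            0, Set.of(0), 1, Set.of(0, 1), 2, Set.of(1));
        Map<Integer, Set<Integer>> conflicts = Map.of(
            0, Set.of(1), 1, Set.of(0, 2), 2, Set.of(1));
        System.out.println("users served: " + serve(setToUsers, conflicts));
    }
}
```

Swapping the order of the two `Collections.min` calls (pick the user first, then the set) gives the user-first variant.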
You could rerun them with different random seeds, but that would only change the results (not necessarily for the better) if one or more ties had occurred during the first pass. Another possibility, which I did not bother to code, would be to select either the resource set or the user randomly at each stage, then select the other half of the assignment randomly from the compatible choices. I suspect that any single run of the random approach would likely do worse than what I described above, but you could keep rerunning the purely random heuristic for a specific number of iterations (or with a time limit) and track the best solution ever found. That might improve over the versions I tested.</p><p>Using the dimensions specified (in a comment, replying to me) by the author of the question, I did a small number of tests. In those tests, the resource-set-first heuristic consistently beat the user-first heuristic, and both exhibited what I would consider to be decent results. On three test runs with different seeds for the random problem generator, I got results of 14/13/11, 15/14/11 and 14/13/10, where the first number is the optimal number of users served (from the IP model), the second is the number served by the resource-set-first heuristic, and the third is the number served by the user-first heuristic. Run times for the heuristics on my humble PC (including setup times) were minuscule (under 20 milliseconds).</p><p>My Java code can be found <a href="https://gitlab.msu.edu/orobworld/resourcesets" target="_blank">here</a>.<br /></p>Paul A.
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com17tag:blogger.com,1999:blog-8781383461061929571.post-24777106475825509082022-03-13T19:06:00.002-04:002022-03-13T19:06:50.477-04:00Wolf, Goat and Cabbage Part II<p>This continues my <a href="https://orinanobworld.blogspot.com/2022/03/wolf-goat-and-cabbage.html" target="_blank">previous post</a> about the "Wolf, Goat and Cabbage" logic puzzle. Using a MIP model to solve it strikes me as somewhat analogous to hammering in a nail with the butt end of a screwdriver handle: it works, but maybe it's not the ideal tool. To me, a constraint solver would make more sense, so I coded up a model using IBM's CP Optimizer (henceforth "CPO", part of the CPLEX Optimization Suite). Unsurprisingly, it worked (after some debugging) and produced the same solution.</p><p>MIP solvers largely share the same "vocabulary" for building models (real / integer / binary variables, linear and maybe quadratic expressions, equality and inequality constraints). From my limited exposure to constraint solvers, there is significant variation in both what things the solver does and does not understand. Some of that is variable types. A very basic solver may understand logical (true/false) and integer variables, and maybe real-valued variables (although I'm not sure handling real variables is anywhere near universal). CPO (and presumably some other solvers) understand "interval variables", which as the name suggests represent intervals of discrete values (usually time intervals, though possibly location or some other aspect). Similarly, different solvers will understand different types of global constraints. 
I suspect that every CP solver worth mentioning understands "all different" (no two variables in a bunch of integer variables can take the same value), but some solvers will implement "no overlap" constraints (the time interval during which I am eating and the time interval during which I am showering cannot overlap) or precedence constraints (this job has to end before this other job can start). Those kinds of constraints make certain scheduling problems easier to solve with CP than with a MIP model.</p><p>Anyway, I'm not entirely new to CPO, though far from proficient, and I tripped over a few "features" while coding the puzzle. I wanted to use boolean (true/false) variables for certain things, such as whether an item had made it to the far side of the river (true) or was stuck on the near side (false). CPO lets you declare a boolean variable but then treats it as an integer variable, meaning that you need to think in terms of 0 and 1 rather than false and true. So you can't say "if $x$ then ..."; instead, you need to say "if $x = 1$ then ..." (and trust me, the syntax is clunkier than what I'm typing here). When you go to get the value of your boolean variable $x$ after solving the model, CPO returns a double precision value. CPLEX users will be used to this, because in a MIP model even integer variables are treated as double-precision during matrix computations. CP solvers, though, like to do integer arithmetic (as far as I know), so it's a bit unclear why my boolean variable has to be converted from double precision to integer (or boolean). Even odder is that, at least in the Java API, there is a method that returns the value of an integer variable as an integer <i>if you pass the name of the variable as the argument</i>, but if you pass the actual variable you are going to get a double. 
(Did a federal government panel design this?)</p><p>In any event, the logic of the CPO model is moderately straightforward, with constraints like "you can't carry something in the boat if it isn't on the bank the boat departs from" and "if the wolf and goat end up in the same place at the end of a period, the farmer had better end up there too". Some things are a bit less clunky with CPO than with CPLEX. For instance, to figure out what (if anything) is in the boat at a given time, the MIP model requires binary variables subscripted by time and item index (1 if that item is in the boat on that trip, 0 if not). The CPO model just needs an integer variable for each time period whose value is either the index of the thing in the boat or a dummy value if the boat is empty. Furthermore, the nature of the variable automatically takes care of a capacity constraint. Since there is only one variable for what's in the boat, at most one thing (whatever that variable indicates) can be in the boat.</p><p>Some (most?) constraint solvers, including CPO, provide a way to use a variable as an index to another variable. In my code, the integer variable indicating what's in the boat at time $t$ is used to look up the location variable (near or far bank) for that item at time $t$ from a vector of location variables for all items at time $t$.</p><p>Anyway, the <a href="https://gitlab.msu.edu/orobworld/wolfgoatcabbage" target="_blank">code</a> in my repository has been updated to include the CPO model, and it's heavily commented in case you want to compare it to the MIP model.<br /></p>Paul A.
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-81278097693142300442022-03-11T15:26:00.002-05:002022-03-11T15:26:52.707-05:00Wolf, Goat and Cabbage<p>On Operations Research Stack Exchange, someone <a href="https://or.stackexchange.com/questions/7818/wolf-goat-and-cabbage-riddle-as-an-optimization-problem" target="_blank">asked</a> about a possible connection between the <a href="https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem" target="_blank">"wolf, goat and cabbage"</a> logic puzzle and Monge's <a href="https://en.wikipedia.org/wiki/Transportation_theory_(mathematics)#Monge_and_Kantorovich_formulations" target="_blank">optimal transport problem</a>. In the logic puzzle, a farmer has to get a wolf, a goat and a cabbage across a river using a boat that can only accommodate one of them (plus the farmer) at a time. If you leave the wolf and goat alone together at any point, the wolf eats the goat. If you leave the goat and cabbage alone together at any point, the goat eats the cabbage. Fortunately, the wolf has no appetite for cabbage and the cabbage does not seem to want to eat anything, else the problem would be infeasible.</p><p>Neither Monge's transport problem nor the more commonly taught (in my opinion) <a href="https://en.wikipedia.org/wiki/Transportation_theory_(mathematics)#Hitchcock_problem" target="_blank">Hitchcock transportation problem</a> directly apply, although you can (almost) treat the puzzle as a multiperiod commodity flow with an "inventory" of each item (wolf, goat, cabbage) on each side of the river. The "almost" part is that you need some of the variables to be integer, for two reasons. 
One is that the LP relaxation of the logic constraints (e.g., inventory of wolf + inventory of goat $\le 1$ on this side of the river at this time) will result in fractional values (we'll leave half a wolf and half a goat here and carry the other halves across the river, which would greatly diminish the value of both wolf and goat). The other is that the objective is to minimize the number of trips made. It would be tempting to just assign a cost of 1 to each movement of an object in either direction, but the catch is that you will occasionally cross with nothing in the boat (besides the farmer). Those "deadheading" trips count toward the objective, but it's tricky to assign a cost to a zero flow.</p><p>To fill in some idle time, I coded up a MIP model. Mind you, I am not advocating MIP as a way to solve problems like this; I just wanted to confirm my thinking (in particular, that an LP commodity flow model would have fractional solutions). Assume that the farmer arrives at the left bank at time 0 with all three items, and that each trip (in either direction) counts as one time unit, with the first trip occurring at time 1. We need to set an upper bound $T$ on the number of trips. Since the state of the system is described by the location of four things (counting the farmer), each having two possible locations (left bank, right bank), $T=2^4 =16$ works. 
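(As a sanity check, independent of the MIP: the state space is small enough to search directly. The following plain-Java sketch — my own illustration, not part of the model or the repository code — runs a breadth-first search over the $2^4=16$ location states and confirms the classic minimum of seven trips.)

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class WolfGoatCabbage {
    // State bits: 0 = farmer, 1 = wolf, 2 = goat, 3 = cabbage;
    // bit value 0 = left bank, 1 = right bank.
    static boolean safe(int s) {
        int farmer = s & 1, wolf = (s >> 1) & 1, goat = (s >> 2) & 1, cabbage = (s >> 3) & 1;
        if (wolf == goat && farmer != goat) return false;    // wolf eats goat
        if (goat == cabbage && farmer != goat) return false; // goat eats cabbage
        return true;
    }

    // Breadth-first search from "all on left" (0) to "all on right" (0b1111);
    // returns the minimum number of crossings.
    public static int minTrips() {
        int[] dist = new int[16];
        Arrays.fill(dist, -1);
        dist[0] = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        while (!queue.isEmpty()) {
            int s = queue.poll();
            int farmer = s & 1;
            // Farmer crosses alone (item = -1) or with one item on his bank.
            for (int item = -1; item < 3; item++) {
                int next = s ^ 1; // farmer switches banks
                if (item >= 0) {
                    int bit = item + 1;
                    if (((s >> bit) & 1) != farmer) continue; // item not with farmer
                    next ^= (1 << bit);                       // item crosses too
                }
                if (safe(next) && dist[next] < 0) {
                    dist[next] = dist[s] + 1;
                    queue.add(next);
                }
            }
        }
        return dist[0b1111];
    }

    public static void main(String[] args) {
        System.out.println(minTrips()); // prints 7
    }
}
```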
The set of items will be denoted $I=\lbrace W, G, C\rbrace.$ My formulation uses the following variables.</p><ul style="text-align: left;"><li>$L_{i,t}\in [0,1]$ and $R_{i,t}\in [0,1]$ are the inventories of item $i\in I$ at time $t\in \lbrace 0,\dots,T\rbrace$ on the left and right banks respectively, after any trip occurring in that period.</li><li>$x_{i,t}\in \lbrace 0,1 \rbrace$ and $y_{i,t}\in \lbrace 0,1 \rbrace$ are the amount of item $i$ crossing the river at time $t$ from left to right or right to left respectively.</li><li>$z_t\in \lbrace 0,1 \rbrace$ is 1 if transport is ongoing and 0 if we are done (the farmer and all three items are on the right bank).</li></ul><p>It would be perfectly fine (but unnecessary) to make the inventory variables integer-valued, and if we did, we could drop the integrality restrictions on the cartage variables ($x$ and $y$).</p><p>Some of the variables can be fixed at the outset.</p><ul style="text-align: left;"><li>We start with all inventory on the left bank, so $L_{i,0}=1$ and $R_{i,0}=0$ for all $i\in I.$ </li><li>There is no trip at time 0, so $z_0=0$ and $x_{i,0}=y_{i,0}=0$ for all $i\in I$.</li><li>Trips alternate left-to-right (odd time periods) and right-to-left (even time periods), so $x_{i,t}=0$ for all $i\in I$ and for all even $t$ and $y_{i,t}=0$ for all $i\in I$ and for all odd $t$.<br /></li></ul><p>The objective function is to minimize the number of trips required. $$\min \sum_{t=1}^T z_t.$$</p><p>The constraints are rather straightforward.</p><ul style="text-align: left;"><li>The inventory on either bank in any period is the inventory on that bank from the preceding period, plus any arriving inventory and minus any departing inventory. 
So for $t\ge 1$ $$L_{i,t} = L_{i, t-1} - x_{i,t} + y_{i,t}\quad \forall i\in I$$ and $$R_{i,t} = R_{i,t-1} + x_{i,t} - y_{i,t}\quad \forall i\in I.$$</li><li>In an odd numbered period (where the farmer ends up on the right bank), neither wolf and goat nor goat and cabbage can be on the left bank. So for odd $t$ $$L_{W,t} + L_{G,t} \le 1$$ and $$L_{G,t} + L_{C,t}\le 1.$$</li><li>In an even numbered period (where the farmer ends up on the left bank), neither wolf and goat nor goat and cabbage can be on the right bank <i>unless</i> the problem is completed ($z_t = 0$), in which case the farmer remains on the right bank. So for even $t$ $$R_{W,t}+R_{G,t} + z_t \le 2$$ and $$R_{G,t} + R_{C,t} + z_t \le 2.$$</li><li>Transport continues until the left bank is empty. $$3z_t \ge \sum_{i\in I} L_{i,t - 1}\quad \forall t\ge 1.$$</li></ul><p>It does indeed produce a correct solution, using seven trips (see the Wikipedia page for the solution) ... and with integrality conditions dropped it does indeed produce a nonsense solution with fractions of items being transported.</p><p><a href="https://gitlab.msu.edu/orobworld/wolfgoatcabbage" target="_blank">Java code</a> for this model (using CPLEX) is in my Git repository. <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-8089323189289733842022-03-03T15:32:00.060-05:002022-03-09T19:46:45.105-05:00Finding Almost All Paths<p>A question posted on Stack Overflow (and subsequently deleted) led to a <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2022/01/an-all-paths-network-problem.html" target="_blank">blog post</a> by Erwin Kalvelagen on how to find all paths between two nodes in a directed graph (possibly with self-loops, i.e. arcs from a node to itself) subject to two constraints: no arc can be used more than once in a path; and there is an upper limit $M$ on the number of arcs used. 
Note that a path might visit a <i>node</i> more than once. It just cannot repeat an arc. The original question seems to have referred to an undirected graph, but Erwin's post works with directed graphs and so will I.<br /><br />Erwin explored some mixed integer linear programming (MIP) models in his post, and a <a href="https://or.stackexchange.com/questions/7966/how-to-compute-all-paths-between-two-given-nodes-in-a-network/" target="_blank">subsequent post</a> on OR Stack Exchange led to more proposals of MIP models (including one from me). I also suggested that a "brute force" approach might be faster than any of the MIP models. In what follows, I will spell out both my MIP model and the brute force approach I used. Java code for both (which requires CPLEX and the Apache Commons Collections library) is in my <a href="https://gitlab.msu.edu/orobworld/allpaths" target="_blank">code repository</a>.</p><p>In what follows $A$ is the set of arcs in the graph, $s$ is the origin node for all paths, $t$ is the destination node for all paths, and $M$ is the maximum number of arcs to include in a path.</p> <h3 style="text-align: left;">MIP model</h3><div style="text-align: left;"> </div><div style="text-align: left;">The MIP model uses the following variables. 
</div> <div style="text-align: left;"><ul style="text-align: left;"><li>$u_{a}\in\left\{ 0,1\right\} $ is 1 if and only if arc $a$ is used on the path.</li><li>$f_{a}\in\left\{ 0,1\right\} $ is 1 if and only if arc $a$ is the first arc on the path.</li><li>$\ell_{a}\in\left\{ 0,1\right\} $ is 1 if and only if arc $a$ is the last arc on the path.</li><li>$y_{ab}\in\left\{ 0,1\right\} $ is 1 if and only if arc $b$ immediately follows arc $a$ on the path.</li><li>$z_{a}\in\left[0,M\right]$ will be the number of arcs preceding arc $a$ on the path (0 if $a$ is not on the path).</li></ul> <p>Some of the variables can be eliminated (fixed at 0) at the outset.</p> <ul style="text-align: left;"><li>$f_{a}=0$ if node $s$ is not the tail of arc $a.$</li><li>$\ell_{a}=0$ if node $t$ is not the head of arc $a.$</li><li>$y_{ab}=0$ if the head of arc $a$ is not the tail of arc $b.$</li></ul> <p>Since we are just interested in finding feasible solutions, the objective is to minimize 0.</p><p>Use of the $z$ variables mimics the <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem#Miller%E2%80%93Tucker%E2%80%93Zemlin_formulation" target="_blank">Miller-Tucker-Zemlin</a> approach to avoiding subtours in the traveling salesman problem. A common knock on the MTZ formulation for the TSP is that it has a somewhat loose continuous relaxation. Since we have a trivial objective (all feasible solutions are optimal), that is not a concern here. </p></div><p>The constraints are as follows.</p> <ul style="text-align: left;"><li>There must be one first arc and one last arc.<br />$$\sum_{a\in A}f_{a}=1.$$$$\sum_{a\in A}\ell_{a}=1.$$</li><li>At most $M$ arcs can be used.$$\sum_{a\in A}u_{a}\le M.$$</li><li>An arc is used if and only if it is either the first arc or follows another arc on the path.$$f_{a}+\sum_{b\in A}y_{ba}=u_{a}\quad\forall a\in A.$$</li><li>The last arc must be a used arc.$$\ell_{a}\le u_{a}\quad\forall a\in A. 
\quad (1)$$</li><li>The sequence value of an unused arc is 0.$$z_{a}\le Mu_{a}\quad\forall a\in A.$$</li><li>No arc can follow the last arc.$$\ell_{a}+\sum_{b\in A}y_{ab}\le1\quad\forall a\in A. \quad (2)$$</li><li>If an arc is used, either it is the last arc or another arc follows it.$$\ell_{a}+\sum_{b\in A}y_{ab}=u_{a}\quad\forall a\in A. \quad (3)$$</li><li>If an arc $b$ follows arc $a$, the sequence number of arc $b$ must be one higher than the sequence number of arc $a.$ $$z_{a}-z_{b}+\left(M+1\right)y_{ab}\le M\quad\forall a,b\in A,a\neq b.$$</li></ul> <p>Over on OR Stack Exchange, Rob Pratt correctly pointed out that constraint (3) implies constraints (1) and (2). I've left them in the Java code anyway. The CPLEX presolver removes them automatically.<br /> </p><h4 style="text-align: left;">Finding all solutions </h4> <p>To find all solutions, one possible approach is to solve whatever MIP model you choose, then add a "no-good" constraint that eliminates the solution just found (and only that one) and solve again, until eventually the aggregation of "no-good" constraints makes the model infeasible. What I did instead was to use the "populate" command in CPLEX, which accumulates a pool of solutions. Besides a time limit, I used two non-default parameter settings: I cranked up the solution pool capacity (the maximum number of solutions to find) to something high enough to let it find all solutions; and I set the solution pool intensity to its highest value (4), which tells CPLEX to aggressively seek out all feasible solutions.</p> <h3 style="text-align: left;">Brute force approach</h3><div style="text-align: left;"> </div><div style="text-align: left;">The brute force approach is remarkably straightforward. We will use a queue of (incomplete) paths that I will call the "to-do list". Start by creating a length-one path for each arc with tail $s$ and add them to the to-do list. We now proceed in a loop until the to-do list is empty. 
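Stripped of CPLEX and library dependencies, the whole procedure (initialization plus the extension loop described next) fits in a few lines. Here is my own illustrative Java sketch, not the repository code; paths are stored as lists of arc indices:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class AllPaths {
    // Enumerate all s-t paths using no arc more than once and at most maxArcs arcs.
    // Arcs are {tail, head} pairs; self-loops are allowed; nodes may repeat.
    public static List<List<Integer>> allPaths(int[][] arcs, int s, int t, int maxArcs) {
        List<List<Integer>> found = new ArrayList<>();
        Deque<List<Integer>> todo = new ArrayDeque<>(); // the "to-do list"
        // Seed the to-do list with a length-one path for each arc with tail s.
        for (int a = 0; a < arcs.length; a++) {
            if (arcs[a][0] == s) {
                List<Integer> p = List.of(a);
                if (arcs[a][1] == t) found.add(p); // an s-t arc is itself a path
                if (p.size() < maxArcs) todo.add(p);
            }
        }
        while (!todo.isEmpty()) {
            List<Integer> p = todo.poll();
            int head = arcs[p.get(p.size() - 1)][1]; // head of the last arc on p
            for (int a = 0; a < arcs.length; a++) {
                // Extending arcs must start at head and not already be on p.
                if (arcs[a][0] != head || p.contains(a)) continue;
                List<Integer> q = new ArrayList<>(p);
                q.add(a);
                if (arcs[a][1] == t) found.add(q); // record a new s-t path
                if (q.size() < maxArcs) todo.add(q); // either way, try extending later
            }
        }
        return found;
    }
}
```

For example, on the made-up graph with arcs (1,2), (2,3), (1,3), (3,4) and $s=1$, $t=4$, $M=3$, this returns the two paths 1-3-4 and 1-2-3-4.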
At each iteration, we pull a path $P$ off the to-do list, identify the arcs whose tails match the head of the last arc in $P$, remove any arcs already on $P$, and for each surviving arc $a$ create a new path $P'$ by extending $P$ with $a$. If the head of arc $a$ is $t$, $P'$ is a new $s-t$ path, which we record. Either way, if $P'$ has length less than $M$, we add it to the to-do list, and eventually try to extend it further.</div><div style="text-align: left;"> </div> <div style="text-align: left;"><h3 style="text-align: left;">Do they work?</h3></div> <div style="text-align: left;"> </div><div style="text-align: left;">Would I be posting this if they didn't? :-) I tested both methods on the digraph from Erwin's post, which has 10 nodes and 22 arcs (with two self-loops). The source and sink are nodes 1 and 10 respectively. With $M=3$ there are four paths (which both methods find), and with $M=4$ there are nine paths (which both methods find). In both cases, brute force took about 1 ms. on my PC (using non-optimized Java code with a single thread). CPLEX times were rather stochastic, but I think it fair to say that building the models took around 55+ ms. and solving them typically took 20 ms. or more.</div> <div style="text-align: left;">When I set a maximum length ($M$) of 10, things got interesting. The brute force approach found 268 paths (in about 6 ms), while CPLEX found only 33. Neither time limit nor pool size was a factor, so I assume that this is just a limitation of the aggressive setting for pool intensity. That means that to find all possible solutions using CPLEX, the solve / add constraint / solve again approach would be necessary.<br /></div>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-88890066030937387162022-02-22T14:57:00.003-05:002022-03-09T14:53:30.902-05:00A "Decoupled" Quadratic Program<p>Someone posted a <a href="https://or.stackexchange.com/questions/7806/how-to-handle-a-non-separable-bilinear-objective-function-in-the-special-case-of" target="_blank">question</a> on OR Stack Exchange about a problem with a "nonseparable bilinear objective function" with "decoupled constraints". The problem has the following form:</p><p>\begin{alignat*}{1} \min & \sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}x_{i}y_{j}+b^{\prime}x+c^{\prime}y\\ \textrm{s.t. } & x\in X\subset\mathbb{R}^{m}\\ & y\in Y\subset\mathbb{R}^{n} \end{alignat*}</p><p>where $X$ and $Y$ are polytopes (i.e., defined by linear constraints, and bounded) in the positive orthants of their respective spaces (i.e., $x\ge 0$ and $y\ge 0$) and $a$, $b$ and $c$ are all nonnegative. So besides everything being nonnegative, the key things here are that the objective contains bilinear terms (always involving the product of an $x$ and a $y$) and each constraint involves either only $x$ or only $y$. The author was looking for a way to exploit the separability of the constraints in solving the problem.</p><p>It occurred to me that a heuristic approach might be to solve a sequence of linear programs, alternating between $X$ and $Y$. Let $h(x,y)$ be the objective function of the original problem, and let $$f(x;y) = \sum_i\sum_j a_{ij}x_i y_j + b^\prime x$$and $$g(y;x)=\sum_i\sum_j a_{ij}x_i y_j + c^\prime y$$ be respectively the portions of the objective dependent on $x$ with $y$ fixed or dependent on $y$ with $x$ fixed. Each is linear in one set of variables (the other set being fixed). 
So we could proceed as follows:</p><ol style="text-align: left;"><li>Minimize $f(x;0)$ over $X$ (a linear program) to get a starting value $x^0\in X.$</li><li>Minimize $g(y;x^0)$ over $Y$ to get a starting value $y^0\in Y.$ We now have an incumbent solution with objective value $h(x^0,y^0)=b^\prime x^0 + g(y^0;x^0).$ Set $t=0$.</li><li>Repeat the following until the first time that the incumbent does not improve.</li><ol><li>Minimize $f(x;y^t)$ over $X$ to get $x^{t+1}.$ If $f(x^{t+1};y^t) + c^\prime y^t$ is less than the incumbent value, make $(x^{t+1}, y^t)$ the new incumbent, else punt.</li><li>Minimize $g(y;x^{t+1})$ over $Y$ to get $y^{t+1}.$ If $g(y^{t+1};x^{t+1}) + b^\prime x^{t+1}$ is less than the incumbent value, make $(x^{t+1},y^{t+1})$ the new incumbent and increment $t$, else punt.</li></ol></ol><p>This will generate a sequence of solutions, each a corner point of the feasible region $X\times Y,$ with monotonically decreasing objective values.</p><p>Before continuing, it is worth noting that we are guaranteed that at least one corner of $X\times Y$ will be an optimum. This is of course always true when the objective is linear, but not always true with a quadratic objective. In our case, suppose that $(x^*, y^*)$ is an arbitrary optimum for the original problem. Then $x^*$ must minimize $f(x;y^*)$ over $X$, so either $x^*$ is a corner of $X$ or (if there are multiple optima) $x^*$ can be replaced by a corner of $X$ with the same value of $f(x;y^*)$ (and thus the same value of $h(x;y^*)$, since the term $c^\prime y^*$ does not depend on $x$). Apply the same logic on the other variable to show that $y^*$ is either a corner of $Y$ or can be replaced by one.</p><p>I'm still calling this a heuristic, because there is one more piece to the puzzle. Could the procedure stop at a corner of $X\times Y$ that is a local but not global optimum? I'm not sure. 
Offhand, I do not see a way to prove that it will not, but I also cannot easily conjure up an example where it would.<br /></p><p>To test this (and to confirm that I was not hallucinating, which has been known to happen), I wrote some Java code to generate random test problems and try the procedure. The code uses CPLEX to solve the LPs. In all test cases, it terminated with a solution (at least locally optimal) in a pretty small number of iterations. </p><p>As a benchmark, I tried solving the full QP models with CPLEX, setting its "optimality target" parameter to "OPTIMALGLOBAL", which tells CPLEX to search for a global optimum to a nonconvex problem. This causes CPLEX to turn the problem into a mixed-integer quadratic program, which, shockingly, takes longer to solve than an LP. In a limited number of test runs, I observed a surprisingly consistent behavior. At the root node, CPLEX immediately found a feasible solution and then immediately found a better solution that matched what my procedure produced. After that (and within a usually stingy time limit I set), CPLEX made progress on the bound but never improved on the incumbent. In two test runs, CPLEX actually reached proven optimality with that second incumbent, meaning my procedure had found a global optimum.</p><p>So perhaps my "heuristic" can actually be shown to be an exact algorithm ... or perhaps not. At least in my test runs, CPLEX working on the QP found the best solution about as fast as, or maybe slightly faster than, my procedure did. 
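For a concrete toy illustration of the alternating scheme (my own sketch, separate from the random-test code above), take $X$ and $Y$ to be unit simplices, so each inner LP is solved by inspection: a linear function over a simplex is minimized at the vertex with the smallest coefficient. On the small instance below, the procedure stalls at objective value 2.3 while the global optimum is 1.8, so it can indeed stop at a local optimum (in line with the counterexample in the addendum below).

```java
public class AlternatingHeuristic {
    // Toy instance of min x'Ay + b'x + c'y with X and Y both unit simplices.
    // All data nonnegative, matching the assumptions in the post; values are made up.
    static final double[][] A = {{3, 1}, {2, 4}};
    static final double[] b = {0.5, 0.2};
    static final double[] c = {0.1, 0.3};

    static int argmin(double[] v) {
        int k = 0;
        for (int i = 1; i < v.length; i++) if (v[i] < v[k]) k = i;
        return k;
    }

    // Objective value h(e_i, e_j) at a pair of simplex vertices.
    static double h(int i, int j) { return A[i][j] + b[i] + c[j]; }

    // Returns {heuristic objective value, global optimum}.
    public static double[] run() {
        // Step 1: minimize f(x;0) = b'x over X (a "simplex LP" by inspection).
        int x = argmin(b);
        // Step 2: minimize g(y;x) over Y; the coefficient of y_j is A[x][j] + c[j].
        double[] gcoef = new double[c.length];
        for (int j = 0; j < c.length; j++) gcoef[j] = A[x][j] + c[j];
        int y = argmin(gcoef);
        double incumbent = h(x, y);
        while (true) {
            // Minimize f(x;y): the coefficient of x_i is A[i][y] + b[i].
            double[] fcoef = new double[b.length];
            for (int i = 0; i < b.length; i++) fcoef[i] = A[i][y] + b[i];
            int xNew = argmin(fcoef);
            if (h(xNew, y) >= incumbent) break; // no improvement: punt
            x = xNew;
            incumbent = h(x, y);
            // Minimize g(y;x) again.
            for (int j = 0; j < c.length; j++) gcoef[j] = A[x][j] + c[j];
            int yNew = argmin(gcoef);
            if (h(x, yNew) >= incumbent) break;
            y = yNew;
            incumbent = h(x, y);
        }
        // Global optimum by brute force over the four vertex pairs.
        double global = Double.POSITIVE_INFINITY;
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) global = Math.min(global, h(i, j));
        return new double[]{incumbent, global};
    }

    public static void main(String[] args) {
        double[] r = run();
        System.out.printf("heuristic %.1f, global %.1f%n", r[0], r[1]); // 2.3 vs 1.8
    }
}
```

Tracing it: the heuristic starts at $x=e_1$ (smallest $b$ coefficient), picks $y=e_0$, and then no single-sided move improves on 2.3, even though the pair $(e_0, e_1)$ achieves 1.8.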
On the other hand, my procedure only requires an LP solver, so it will work with code that does not accept QPs.</p><p>My Java code (which requires CPLEX) is available <a href="https://gitlab.msu.edu/orobworld/decoupledqp" target="_blank">here</a>.</p><p><b>Addendum</b>: User "Dusto" on the Discord Operations Research server <a href="https://discord.com/channels/888822916186243142/888897947650097162/95093621228844652" target="_blank">posted a simple counterexample</a> to global convergence. The example has no constraints other than bounds on the variables (from 0 to 10 in all cases). The $b$ and $c$ vectors are strictly positive, so the linear programs in my heuristic will start at the origin and get stuck there. The $A$ matrix is crafted so that a negative overall objective value is attainable.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-20170399470518793232022-02-10T11:37:00.000-05:002022-02-10T11:37:40.528-05:00Tournament Scheduling: CPLEX v. CP Optimizer (Part 3)<p>This is the third and final chapter in my opus about scheduling "pods" (players) in a tournament. Reading the <a href="https://orinanobworld.blogspot.com/2022/02/tournament-scheduling-cplex-v-cp.html" target="_blank">first post</a> will be essential to understanding what is going on here. Reading the <a href="https://orinanobworld.blogspot.com/2022/02/tournament-scheduling-cplex-v-cp_9.html" target="_blank">second post</a> (in which I formulate an integer programming model for the problem) is definitely optional.</p><p>I will stick with the notation introduced in the prior post. As before, there are $N$ pods (players) being organized into teams of two to play in $G$ games (two teams per game, one in white, one in black). 
Each game has two slots corresponding to the two teams in the game.<br /></p><ul style="text-align: left;"><li>$t$ indexes a team;</li><li>$p$ and $q$ index pods;</li><li>$g$ indexes a game;</li><li>$s$ indexes a slot (and $s'$ is the other slot of the same game);</li><li>$c$ is a jersey color (0 for white, 1 for black).</li></ul><p>For a slot $s$, I will use $G_s$ and $C_s$ to denote respectively the game in which the slot exists and the jersey color (0 or 1) of the team in the slot. For a team $t$, $P_{t,0}$ and $P_{t,1}$ will be the two pods on the team. (The order they are listed is immaterial.)<br /></p><p>The model variables are almost (but not exactly) what they were in the IP model.</p><ul style="text-align: left;"><li>$x_{t}\in \lbrace 0, \dots, 2G-1\rbrace$ is the index of the slot in which team $t$ plays (where slots 0 and $G$ are the white and black teams in the first game and slots $G-1$ and $2G-1$ are the white and black teams in the final game).<br /></li><li>$y_{pqg}\in \lbrace 0,1\rbrace$ is 1 if pods $p$ and $q\neq p$ are playing in game $g$ on opposing teams.<br /></li><li>$z_{pg}\in \lbrace 0,1\rbrace$ is the color jersey worn by pod $p$ at the conclusion of game $g$ (regardless of whether or not $p$ played in game $g$).</li><li>$w_{pg}\in \lbrace 0,1\rbrace$ is 1 if pod $p$ changes jersey color going into game $g.$</li><li>$v_{pg} \in \lbrace 0,1\rbrace$ is 1 if pod $p$ is playing in game $g.$</li></ul><p>The changes from the IP model are that $x$ is general integer rather than binary (with dimension 1 rather than 2), the third index of $y$ is the game rather than the slot, and $v$ is a new variable. As in the IP model, the objective is to minimize $\sum_p \sum_g w_{pg}.$</p><p>The constraints are as follows. 
Note that the requirement that every team play exactly once is captured by the assignment of a single slot value to each team (via $x$).<br /></p><ul style="text-align: left;"><li>Two different teams cannot occupy the same slot: $$\textrm{allDiff}(x).$$</li><li>Occupying a slot implies playing in the game: $$x_t = s \implies (v_{P_{t,0},G_s} = 1 \wedge v_{P_{t,1},G_s} = 1) \quad \forall t,s.$$</li><li>Occupying a slot determines the color of the jersey worn in the game: $$x_t = s \implies (z_{P_{t,0},G_s} = C_s \wedge z_{P_{t,1},G_s} = C_s) \quad \forall t,s.$$</li><li>Jersey color for a pod remains constant from game to game unless a change is recorded: $$ z_{p,g} \neq z_{p,g-1} \iff w_{p,g} = 1 \quad \forall p, \forall g > 0.$$</li><li>Playing in consecutive games precludes changing jerseys: $$(v_{p,g - 1} = 1 \wedge v_{p,g} = 1) \implies w_{p,g} = 0 \quad \forall p, \forall g > 0.$$</li><li>Two pods are opponents if and only if they are playing in the same game with different colors: $$y_{pqg} = 1 \iff (v_{pg} = 1 \wedge v_{qg} = 1 \wedge z_{pg} \neq z_{qg}) \quad \forall p, \forall q\neq p, \forall g.$$</li><li>Every pair of pods faces each other exactly twice: $$\sum_g y_{pqg} = 2 \quad \forall p, \forall q\neq p.$$</li></ul><p>Here "allDiff()" is the CP Optimizer notation for the "all different" constraint, a global constraint common to most constraint solvers. Given a collection of variables with the same range, the all different constraint says that no two variables in the collection can take the same value.</p><p>To put it mildly, I would not be shocked if there is a more efficient (easier to solve) way to write the problem as a CP model.<br /></p><p> </p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-41327156095317692632022-02-09T14:43:00.000-05:002022-02-09T14:43:30.244-05:00Tournament Scheduling: CPLEX v. 
CP Optimizer (Part 2)<p>In my <a href="https://orinanobworld.blogspot.com/2022/02/tournament-scheduling-cplex-v-cp.html" target="_blank">previous post</a>, I described a tournament scheduling problem and discussed my results using both an integer programming model (CPLEX) and a constraint programming model (CP Optimizer) to try to solve it. Here I will present the IP model. (This post will be gibberish unless you have read at least the start of the previous post.)<br /></p><p>There are multiple ways to write an integer programming (IP) model for the tournament problem. My initial instinct was to think in terms of what games a pod plays in, which pods it plays with/against and so on, but I fairly quickly changed to thinking in terms of teams rather than pods. With $N=9$ pods and two pods to a team, there are $N(N-1)/2=36$ possible teams, which we can enumerate at the outset. There will be 18 games, each containing two teams, and every team will play exactly once. Each game contains two "slots", one for the team in white jerseys and the other for the team in black jerseys.<br /></p><p>In what follows, I will (I hope) stick to the following notation:</p><ul style="text-align: left;"><li>$t$ indexes a team;</li><li>$p$ and $q$ index pods;</li><li>$g$ indexes a game;</li><li>$s$ indexes a slot (and $s'$ is the other slot of the same game);</li><li>$c$ is a jersey color (0 for white, 1 for black).</li></ul><p>Since I am using Java, all indexing is zero-based. Slots are numbered so that game 0 has slots 0 (white) and $G$ (black), game 1 has slots 1 (white) and $G+1$ (black), etc., where $G=N(N-1)/4$ is the number of games. I will denote by $T_p$ the set of teams $t$ containing pod $p$. 
<br /></p><p>The model variables are as follows.</p><ul style="text-align: left;"><li>$x_{ts}\in \lbrace 0,1\rbrace$ is 1 if team $t$ plays in slot $s,$ 0 if not.</li><li>$y_{pqs}\in \lbrace 0,1\rbrace$ is 1 if pod $p$ is playing in slot $s$ and is opposed by pod $q.$</li><li>$z_{pg}\in \lbrace 0,1\rbrace$ is the color jersey worn by pod $p$ at the conclusion of game $g$ (regardless of whether or not $p$ played in game $g$).</li><li>$w_{pg}\in \lbrace 0,1\rbrace$ is 1 if pod $p$ changes jersey color going into game $g,$ 0 if not.</li></ul><p>The objective is to minimize $\sum_p \sum_g w_{pg}.$ The constraints are the following.</p><ul style="text-align: left;"><li>Every team plays exactly once. $$\sum_s x_{ts} = 1 \quad\forall t.$$</li><li>Every schedule slot is filled exactly once. $$\sum_t x_{ts} = 1 \quad\forall s.$$</li><li>Pods $p$ and $q$ oppose each other with $p$ in slot $s$ if and only if $p$ is playing in $s$ and $q$ is playing in $s'$. $$y_{pqs} \le \sum_{t\in T_p} x_{ts} \quad\forall p, q\neq p, s.$$ $$y_{pqs} \le \sum_{t\in T_q} x_{ts'} \quad\forall p, q\neq p, s.$$ $$y_{pqs} \ge \sum_{t\in T_p} x_{ts} + \sum_{t\in T_q} x_{ts'} - 1 \quad\forall p, q\neq p, s.$$</li><li>Every pair of pods opposes each other exactly twice. $$\sum_s y_{pqs} = 2 \quad \forall p\neq q.$$</li><li>No pod plays consecutive games in different jerseys.$$\sum_{t\in T_p}\left( x_{t,s-1} + x_{t,s'} \right) \le 1 \quad \forall p, \forall s\notin \lbrace 0, G\rbrace.$$</li><li>A pod playing in a game has its jersey color determined by its team's slot. (This has the desirable side effect of preventing two teams containing the same pod from playing against each other.) 
$$\sum_{t\in T_p} x_{t, g+G} \le z_{pg} \le 1 - \sum_{t\in T_p} x_{t,g} \quad \forall p,g.$$(Note that for any game index $g$, slot $g$ is the white slot in game $g$ and slot $g+G$ is the black slot.)</li><li>A pod's color for any game (playing or not) is the same as its color for the previous game, unless a change occurs. $$z_{p, g-1} - w_{pg} \le z_{pg} \le z_{p, g-1} + w_{pg} \quad\forall p, \forall g \ge 1.$$</li></ul><p>In the next post, I will try to articulate my CP model for the problem. <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-3330612360992933362022-02-08T16:46:00.000-05:002022-02-08T16:46:04.916-05:00Tournament Scheduling: CPLEX v. CP Optimizer (Part 1)<p> A <a href="https://math.stackexchange.com/questions/4370828/tournament-schedule-assignment-via-convex-optimization" target="_blank">recent question</a> on Mathematics Stack Exchange asked about scheduling $N$ "pods" (players) in a tournament under the following conditions.</p><ul style="text-align: left;"><li>There will be $N(N-1)/4$ games played sequentially (no games in parallel).</li><li>Games pit teams of two pods each against each other. One team wears white and one team wears black.<br /></li><li>Each pod partners with every other pod once and plays against every other pod twice.</li><li>No pod can play in consecutive games wearing different jersey colors.</li></ul><p>The objective is to find a schedule that minimizes the total number of times pods have to change their jersey color.</p><p>The question specifies $N=9$. Note that, since there are $N(N-1)/4$ games, a necessary condition for the problem to be feasible is that either $N$ or $N-1$ must be divisible by $4$. That condition is not, however, sufficient. An obvious counterexample is when $N=4$ and there are three games to be scheduled. The first game uses all four pods, as does the second game. 
Since no pod can be forced to wear different jerseys in consecutive games, all four pods would go into the second game wearing the same colors as in the first game ... which means being on the same teams, violating the rule about pods being teamed together exactly once.</p><p>I coded both an integer programming model (using CPLEX) and a constraint programming model (using CP Optimizer) in Java and tried to solve the tournament problem with $N=9$. Since everything is inherently integer and the constraints are more logical than arithmetic (the only real algebra is summing up the number of jersey changes), I had two initial expectations: that CPLEX would provide tighter lower bounds than CPO (because MIP models tend to do better with bounds than CP models); and that CPO would find feasible (and possibly optimal) schedules faster than CPLEX would, since the problem is inherently logic-based (and CP models often do better than MIP models on scheduling problems). I was half right. CPLEX did provide tighter lower bounds than CPO, at least within the time limits I was willing to use, although I don't think either came remotely close to a "tight" lower bound. CPO, however, struggled massively to find feasible schedules while CPLEX did not have too much trouble doing so.</p><p>Before going any further, I need to issue three caveats. First, the longest I ran the IP model was 30 minutes and the longest I ran the CP model was an hour. Second, while I am pretty familiar with formulating IP models, I am much less familiar with running CP models, so I may have missed opportunities to make the CP model more efficient. 
Third, while CPLEX gives the user a fair amount of indirect control over the search process (by setting branching priorities, MIP emphasis, how frequently to try certain kinds of cuts and so on), CP Optimizer may offer the user even more flexibility in how to tailor searches, but I am not yet familiar enough with it to do anything beyond trying to indicate which variables should be the first candidates for branching (and I'm not sure I got that right).</p><p>I'll end this installment with a synopsis of some runs.</p><ul style="text-align: left;"><li>In a 30 minute run using the new setting 5 for the MIP emphasis parameter (stressing use of heuristics to improve the incumbent, at the cost of not paying too much attention to the lower bound), CPLEX found a feasible schedule with 14 jersey changes and a lower bound of 2.97 (a 79% gap). It evaluated somewhere around 18,500 nodes.<br /></li><li>In a 30 minute run, CP Optimizer branched over 957 <i>million</i> times but never found a feasible schedule and ended with a best bound of 0.</li><li>Finally, I tried running CPLEX for three minutes (in which it found an incumbent with 19 jersey changes) and then used that incumbent as a starting solution for a one hour run of CP Optimizer. CPO accepted the incumbent solution, did a bit over 1.55 <i>billion</i> branch operations, and failed to improve on it. The best bound was again 0. </li></ul><p>If you want to look at my code (which will make slightly more sense after my next couple of posts, where I will describe the models), it is available from my university GitLab <a href="https://gitlab.msu.edu/orobworld/podtourney" target="_blank">repository</a>.<br /></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-36879494131181028262022-02-06T16:14:00.001-05:002022-02-06T16:14:51.446-05:00Taxi Dispatch<p>A <a href="https://or.stackexchange.com/questions/7771/average-time-between-two-dispatches-in-a-taxi-fleet-probably-a-batch-processing" target="_blank">question</a> on OR Stack Exchange pertains to a taxi stand. The assumptions are as follows.</p><ul style="text-align: left;"><li>There are $t$ taxis operating between points A and B, with all passenger traffic going from A to B. For concreteness, I'll refer to a "taxi stand" with a "line" of taxis, similar to what you might see at an airport or popular hotel.</li><li>The round-trip travel time for a taxi (taking passengers from A to B and deadheading back from B to A) is deterministic with value $T$. (Note: Deterministic travel time is the author's assumption, not mine.)</li><li>Each taxi has a capacity of $p$ passengers and is dispatched only when full.</li><li>Passengers arrive according to a Poisson process with parameter (rate) $\lambda$.</li></ul><p>The author was looking for the mean time between dispatches.</p><p>It has been a very long time since I taught queuing theory, and I have forgotten most of what I knew. My best guess for the mean time between dispatches, under the assumption that the system reached steady state, was $p/\lambda$, which would be the long-run average time required for $p$ passengers to arrive. To achieve steady-state, you need the average arrival rate ($\lambda$) to be less than the maximum full-blast service rate ($t p/T$). If arrivals occur faster than that, in the long term you will be dispatching a taxi every $T/t$ time units while either the queue explodes or there is significant balking/reneging.</p><p>To test whether my answer was plausible (under the assumption of a steady state), I decided to write a little discrete event simulation (DES). 
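(The dispatch logic itself is compact. Here is a Java rendering of the same simulation — my own sketch, not the R notebook code linked below; it dispatches a batch of $p$ passengers as soon as the batch is complete and a taxi is free, and reports the mean time between dispatches, which should come out near $p/\lambda \approx 5.714$ for the trial-run parameters.)

```java
import java.util.PriorityQueue;
import java.util.Random;

public class TaxiSim {
    // Simulate a taxi stand: t taxis, batches of p passengers, deterministic
    // round trip T, Poisson arrivals at rate lambda. Returns the mean time
    // between dispatches over the given number of simulated arrivals.
    public static double meanTimeBetweenDispatches(int t, int p, double T,
                                                   double lambda, int arrivals, long seed) {
        Random rng = new Random(seed);
        PriorityQueue<Double> taxiFree = new PriorityQueue<>(); // times taxis become available
        for (int i = 0; i < t; i++) taxiFree.add(0.0);          // all taxis in line at time 0
        double clock = 0.0, prevDispatch = 0.0, sumGaps = 0.0;
        int waiting = 0, dispatches = 0;
        for (int i = 0; i < arrivals; i++) {
            clock += -Math.log(1 - rng.nextDouble()) / lambda;  // exponential interarrival
            waiting++;
            if (waiting >= p) {                                  // a full batch has formed
                double depart = Math.max(clock, taxiFree.poll()); // wait for a free taxi if needed
                taxiFree.add(depart + T);                        // taxi returns after round trip
                waiting -= p;
                if (dispatches > 0) sumGaps += depart - prevDispatch;
                prevDispatch = depart;
                dispatches++;
            }
        }
        return sumGaps / (dispatches - 1);
    }

    public static void main(String[] args) {
        // Trial-run parameters from the post: t=5, p=4, T=20, lambda=0.7.
        System.out.println(meanTimeBetweenDispatches(5, 4, 20.0, 0.7, 100_000, 42L));
    }
}
```

Like the R version, it skips a warm-up period, and passengers arriving while a delayed batch waits for a taxi are simply credited to later batches, so it is an approximation rather than a careful DES.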
There are special-purpose DES programs, but it's been way too long since I used one (or had a license), so I decided to try writing one in R. A little googling turned up at least a couple of DES packages for R, but I did not want to commit much time to learning their syntax for a one-off experiment, plus I was not sure if any of them handled batch processing. So I wrote my own from scratch.</p><p>My code is not exactly a paragon of efficiency. Given the simplicity of the problem and the fact that I would only be running it once or twice, I went for minimal programmer time as opposed to minimal run time. That said, it can run 10,000 simulated passenger arrivals in a few seconds. The program uses one data frame as what is known in DES (or at least was back in my day) as an "event calendar", holding in chronological order three types of events: passengers joining the queue; taxis being dispatched; and taxis returning.</p><p>The simulation starts with all taxis in line and no passengers present. I did not bother with a separate "warm-up" period to achieve steady-state, which would be required if I were striving for more accuracy. For a typical trial run ($t=5$, $p=4$, $T=20$, $\lambda=0.7$), my guess at the mean time between dispatches worked out to 5.714 and the mean time computed from the run was 5.702. So I'm inclined to trust my answer (and to quit while I'm ahead).</p><p>If anyone is curious (and will forgive the lack of efficiency), you can take a look at my <a href="https://rubin.msu.domains/blog/taxis.nb.html" target="_blank">R notebook</a> (which includes the code).<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0