tag:blogger.com,1999:blog-87813834610619295712023-09-22T20:46:50.959-04:00OR in an OB WorldA mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere.Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.comBlogger482125tag:blogger.com,1999:blog-8781383461061929571.post-57342727226178949832023-09-16T14:48:00.000-04:002023-09-16T14:48:31.757-04:00Unnecessary LyX Backup Files<p>If you edit a file from an older version of LyX and then save it, LyX will create an extra backup file with a lengthy name containing something about "version". As with other LyX backups, the file extension is ".lyx~". I believe this is a defense against the possibility that changes to the file format might muck up the file, and I have absolutely no problem with it. Let's call this the version backup.<br /></p><p>That said, for a long time now, every time I created a new LyX file, saved it, then edited it and saved the edit, I would get both the standard backup file and (after the first edit only) the version backup. Subsequent saves would only produce the usual backup. This was a minor annoyance for me, since I was perpetually tracking down and deleting the version backups to reduce drive clutter (and cut down the time it took to back up my hard drive, which I do regularly).</p><p>It turns out there was a simple explanation (and fix) for this, which I found in a thread on a support forum. Like many users, at some point I created a new file, customized a few things, and then in Document > Settings..., clicked "Save as Document Defaults" to make those settings the defaults for all new documents. Doing so creates a file named "defaults.lyx" in the templates directory (which, on my Linux Mint system, is ~/.lyx/templates).
That file used the file version in effect when I set it as defaults, meaning every new file I've created since then has started with that version and then been upgraded to whatever the current version is ... leading to all those version backups.</p><p>The fix was simple. I just created a new document and set it as the default template, which updated the template to the current LyX format (544 as of this writing). Now I just have to remember to repeat this whenever a new LyX version comes out with a new file format. (At my age, "I just have to remember" should not be taken lightly.)</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-24199057223154572202023-09-12T17:21:00.001-04:002023-09-16T14:33:07.698-04:00Randomly Generating a Connected Graph<p>Someone posted a <a href="https://or.stackexchange.com/questions/10942/how-to-generate-random-connected-planar-graph" target="_blank">question</a> on Operations Research Stack Exchange about generating random test instances of a connected graph. (They specified a <a href="https://en.wikipedia.org/wiki/Planar_graph" target="_blank">planar graph</a>, but I am going to ignore that restriction, which I suspect was unintended and which I am pretty sure would make the problem much harder.) The premise is fairly straightforward. You start with a fixed set of nodes, partitioned into three subsets (supply, transshipment and demand nodes). The goal is to select a random set of edges of fixed cardinality such that (a) the resulting graph is connected and (b) no edge connects two supply nodes or a supply node and a demand node. Setting the number of edges too low will obviously make the problem infeasible. The author specified that the graph could be either directed or undirected. My solutions are coded for undirected graphs but can be adapted to directed graphs.</p><p>I proposed two solutions, both of which I coded and tested in Java. 
One uses a mixed integer programming (MIP) model, which my code solves using CPLEX. Let $E$ be the set of all valid edges, $N$ the number of nodes and $K$ the desired number of edges. The MIP model uses binary variables $x_{i,j}$ for all $(i,j)\in E.$ The requirement to include $K$ edges is handled by the constraint $$\sum_{(i,j)\in E} x_{i,j} = K$$ and randomness is achieved by minimizing $$\sum_{(i,j)\in E} c_{i,j} x_{i,j}$$ where the cost coefficients $c_{i,j}$ are generated randomly. </p><p>That leaves the matter of forcing the graph to be connected. To do that, we introduce forward and backward flow variables $f_{i,j} \ge 0$ and $r_{i,j} \ge 0$ for each edge $(i,j),$ where $f_{i,j}$ is interpreted as flow from $i$ to $j$ and $r_{i,j}$ as flow from $j$ to $i.$ The concept is to single out one node (call it $s$) as a source for $N-1$ units of flow of some mystery commodity and assign every other node a demand of one unit of the commodity. For $i\neq s,$ the flow conservation constraint is $$\sum_{(j,i)\in E} (f_{j,i} - r_{j,i}) + \sum_{(i,j)\in E} (r_{i,j} - f_{i,j}) = 1,$$ which says that flow in minus flow out equals 1. To ensure that flow occurs only on selected edges, we add the constraints $$f_{i,j} \le (N-1) x_{i,j}$$ and $$r_{i,j} \le (N-1) x_{i,j}$$ for all $(i,j)\in E.$</p><p>The edge removal heuristic is a bit simpler. We start by selecting all eligible edges and shuffling the list randomly. While the number of edges in the graph exceeds $K,$ we pop the next edge from the list, remove it, and test whether the graph remains connected. If yes, the edge is permanently gone. If no, we replace the edge in the graph (but not on the list) and move to the next edge in the list. To confirm that removing edge $(i,j)$ leaves a connected graph, we start from node $i$ and find all nodes that can be reached from $i$ in one step (i.e., remain adjacent). 
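<p>A minimal Java sketch of this reachability test (my own illustration, not the code in my repository): starting from $i,$ repeatedly expand the set of reached nodes until $j$ turns up or no new nodes remain.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

/**
 * Sketch (not the repository code) of the connectivity test in the edge
 * removal heuristic: after deleting edge (i, j) from a graph that was
 * connected, the graph stays connected iff j is still reachable from i.
 */
public class EdgeRemoval {

    /** adjacency.get(v) holds the neighbors of v; edge (i, j) is assumed
     *  to have been removed from the adjacency lists already. */
    public static boolean stillConnected(List<Set<Integer>> adjacency, int i, int j) {
        boolean[] seen = new boolean[adjacency.size()];
        Deque<Integer> queue = new ArrayDeque<>();
        seen[i] = true;
        queue.add(i);
        while (!queue.isEmpty()) {
            int v = queue.poll();
            if (v == j) {
                return true;            // reached j: still connected
            }
            for (int w : adjacency.get(v)) {
                if (!seen[w]) {
                    seen[w] = true;
                    queue.add(w);       // w joins the next wave of nodes
                }
            }
        }
        return false;                   // ran out of nodes: disconnected
    }
}
```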
Then we find all nodes adjacent to those nodes, and continue until either we encounter node $j$ (in which case the graph is still connected) or run out of nodes to test (in which case the graph is now disconnected).</p><p>The edge removal heuristic is simpler to explain, does not require a MIP solver and likely is faster. There are two potential disadvantages compared to the MIP approach. One is that the MIP approach is guaranteed to finish after a single solve, whereas there is a possibility that the edge removal heuristic fails to produce a connected graph due to an "unlucky" choice of which edges to remove, requiring one or more restarts with different random number seeds. The other is that if $K$ is set too low (making a connected graph impossible), the MIP model will detect that the problem is infeasible. With the edge removal heuristic, you would not be able to distinguish infeasibility from bad luck (although if the heuristic failed multiple times with different random seeds, infeasibility would start to look likely). In very, very limited testing of my code, the edge removal heuristic was definitely faster than the MIP and achieved a valid result in the first pass each time.</p><p>Java code for both methods is in my <a href="https://gitlab.msu.edu/orobworld/randomconnectedgraph" target="_blank">code repository</a>.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-72192781010666705332023-08-12T15:27:00.003-04:002023-08-12T15:27:32.756-04:00The "Set" Card Game<p>Someone asked a <a href="https://or.stackexchange.com/questions/10820/set-game-how-to-generate-all-sets-with-a-mip" target="_blank">question</a> (actually a pair of questions) about the <a href="https://en.wikipedia.org/wiki/Set_(card_game)" target="_blank">"Set" card game</a> on OR Stack Exchange. This was the first I had heard of the game. For purposes of this post, the game play is irrelevant.
The key is that it uses a deck of 81 cards, each of which has four "features" with three possible values for each feature. Every combination of feature values appears on exactly one card. Three cards form a "set" if, for each of the four features, either all three cards have the same value or all three cards have different values. As noted in the Wikipedia page, there are 1,080 possible "sets". (I'm putting "set" in quotes to avoid confusion with, you know, sets.) The OR SE question asked if an integer program (IP) could be used to generate all 1,080 sets.</p><p>The answer is yes, with the caveat that while you can do it, an IP model is not the most efficient way to do it. I won't repeat my IP model here, since (a) it's a bit of a space hog and (b) you can read it in my answer on ORSE. The model is pretty small (97 binary variables) and finds one "set" quickly. There are two ways to get it to find all possible "sets". Both involve adding a "no good" constraint for each solution found, i.e., a constraint that eliminates that solution but no others. The more tedious approach solves the progressively growing model 1,081 times (the last time producing a declaration of infeasibility, meaning all "sets" have been found). The less tedious approach requires a solver that supports a callback function letting you reject candidate solutions by adding a constraint cutting them off (the same "no good" constraint used in the first approach).</p><p>Since there is no objective and the constraints are mostly logical in nature, constraint programming (CP) strikes me as a more intuitive approach than IP. The specifics of a CP model will depend on what types of constraints your CP solver supports, but I think the model I tried is likely to work with a lot of solvers. Since I coded the models in Java, I'll use 0-based indexing here. 
My variables (all integer-valued) are: $x_s\in \lbrace 0,\dots, 80\rbrace,$ the index of the card in slot $s\in \lbrace 0,1,2\rbrace$ of the "set"; and $y_{f,s}\in \lbrace 0,1,2\rbrace,$ the value of feature $f$ in the card in slot $s.$ The constraints are as follows:</p><ul style="text-align: left;"><li>allDifferent($x$) (no two slots can contain the same card);</li><li>$x_0 \lt x_1 \lt x_2$ (to avoid counting the same "set" more than once, we require that the cards be listed in ascending index order);</li><li>$y_{f,s} = a_{x_s,f},$ where $a_{c,f}$ is the value of feature $f$ in card $c$ (the value of a feature in a slot comes from the card in the slot); and</li><li>allDifferent$(\lbrace y_{f,0}, y_{f,1}, y_{f,2}\rbrace) \vee (y_{f,0}=y_{f,1} \wedge y_{f,1}=y_{f,2})\,\forall f$ (for every feature, either all three slots have different values or all three slots have the same value).</li></ul><p>This exploits two constraints that CP solvers typically have but IP solvers do not. One is the "all different" constraint, which forces a set of variables to take values that do not repeat. The other is the ability to use a decision variable as an index to either a constant vector or another variable vector. That comes into play in the third constraint, where the variable $x_s$ (the index of the card in slot $s$) is used as a subscript of $a$ to get the feature value of that card.</p><p>Even CP is probably overkill for this problem, though. We can compute the 1,080 possible solutions by brute force. I used a recursive function for this.</p><p>Here is how each method stacks up in terms of computation time. 
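<p>Before the timings, here is a compact brute-force sketch (my own illustration, iterative rather than the recursive version in my repository). It identifies card $c\in\lbrace 0,\dots,80\rbrace$ with its base-3 digits, one digit per feature, and uses the fact that three values in $\lbrace 0,1,2\rbrace$ are all equal or all different exactly when they sum to 0 mod 3.</p>

```java
/**
 * Brute-force enumeration of "sets" (a sketch, not the repository code).
 * Card c in 0..80 is identified with its base-3 digits, one digit per
 * feature, so every feature combination appears on exactly one card.
 */
public class SetGame {

    /** Value (0, 1 or 2) of feature f (0..3) on card c (0..80). */
    static int feature(int c, int f) {
        for (int i = 0; i < f; i++) {
            c /= 3;
        }
        return c % 3;
    }

    /** True iff cards a, b, c form a "set": for values in {0,1,2},
     *  "all equal or all different" is the same as summing to 0 mod 3. */
    static boolean isSet(int a, int b, int c) {
        for (int f = 0; f < 4; f++) {
            if ((feature(a, f) + feature(b, f) + feature(c, f)) % 3 != 0) {
                return false;
            }
        }
        return true;
    }

    /** Count "sets" by checking every ascending triple of cards, so each
     *  "set" is counted exactly once. */
    static int countSets() {
        int count = 0;
        for (int a = 0; a < 81; a++) {
            for (int b = a + 1; b < 81; b++) {
                for (int c = b + 1; c < 81; c++) {
                    if (isSet(a, b, c)) {
                        count++;
                    }
                }
            }
        }
        return count;  // 1,080 for the standard deck
    }
}
```

(Any pair of cards determines a unique third card completing a "set", so the count is $\binom{81}{2}/3 = 1{,}080.$)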
As I said, I coded it in Java, using CPLEX 22.1.1 for the IP models and CP Optimizer 22.1.1 for the CP model.</p><ul style="text-align: left;"><li>brute force took between 0.1 and 0.2 seconds;</li><li>the CP model also took between 0.1 and 0.55 seconds;</li><li>solving the IP model once using a callback took between 1.4 and 3 seconds; and</li><li>solving the IP model 1,081 times (without a callback) took around 56 seconds, give or take.</li></ul><p>Solution times for all methods are a bit random (they change when I rerun the program), so treat them as approximate. In the interest of full disclosure, I throttled CPLEX to a single thread for the callback approach. Using multiple threads would have required both synchronizing the code (a pain) and also checking for repeated solutions (since hypothetically two parallel threads could independently locate the same "set"). The moral, to the extent there is one, is that sometimes brute force beats more sophisticated approaches.</p><p>You can find my Java code <a href="https://gitlab.msu.edu/orobworld/set-game" target="_blank">here</a> if interested.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-30053475196960195992023-07-30T11:51:00.000-04:002023-07-30T11:51:46.447-04:00Visiting All Nodes<p>A <a href="https://or.stackexchange.com/questions/10762/how-can-i-find-the-shortest-path-visiting-all-nodes-in-a-connected-graph-as-milp" target="_blank">question</a> on Operations Research Stack Exchange asks about finding a shortest route through a connected graph that visits every node. 
This is distinct from the well-known <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem" target="_blank">traveling salesman problem</a> (TSP), in that the TSP requires each node to be visited exactly once whereas the OR SE question explicitly allows nodes to be visited more than once.</p><p>The question reminded me of a conversation I had years ago with a doctoral student in our logistics program. He was working on routing deliveries (via truck) from a depot to various customers, and was trying to formulate it as a TSP using a graph based on a road map (buildings or intersections as nodes, roads as edges). I did my best at the time to convince him that the TSP was unnecessarily restrictive and might even make the problem infeasible. The second example in the OR SE question is a nice example of how that could happen. Generally speaking, in an actual routing problem there is nothing stopping you from passing by/through a previously visited location if that's the best route. There are a few exceptions to this, some in military applications (you were laying mines and/or blowing up bridges as you passed them) and some in normal life (you were speeding and might not want to return some place where a traffic cop would remember you), but for the most part I believe it holds.</p><p>My advice to the doctoral student was similar to the answer Rob Pratt posted on OR SE. Rethink the graph, keeping just the depot and customers as nodes and letting the edges in the new graph represent the shortest routes between pairs of nodes. So edge $(i,j)$ and edge $(k,\ell)$ connect different pairs of nodes but might represent physical routes that cross (you pass through the same intersection on both, albeit possibly in different directions) or overlap (both take US 127 north from mile marker 102 to mile marker 137). Building the edges in the new graph involves solving a shortest path problem between each pair of nodes (using the original graph). 
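<p>The all-pairs step can be sketched as follows (a standard Floyd-Warshall pass; the matrix layout and next-hop bookkeeping are my own assumptions, not from the original post). Retaining the <code>next</code> matrix is what lets you expand the physical route for exactly those edges the TSP solution ends up using.</p>

```java
/**
 * Floyd-Warshall sketch for the all-pairs shortest path step (my own
 * illustration).  dist[i][j] holds the edge weight on input, with
 * Double.POSITIVE_INFINITY for missing edges and 0 on the diagonal; it is
 * updated in place to shortest path lengths.  The returned next[i][j] is
 * the first hop on a shortest route from i to j (-1 if unreachable), so
 * physical routes can be reconstructed hop by hop afterward.
 */
public class AllPairs {

    public static int[][] floydWarshall(double[][] dist) {
        int n = dist.length;
        int[][] next = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                next[i][j] = dist[i][j] < Double.POSITIVE_INFINITY ? j : -1;
            }
        }
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (dist[i][k] + dist[k][j] < dist[i][j]) {
                        dist[i][j] = dist[i][k] + dist[k][j];
                        next[i][j] = next[i][k];  // route i->j now starts toward k
                    }
                }
            }
        }
        return next;
    }
}
```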
You can apply <a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm" target="_blank">Dijkstra's algorithm</a> to each pair of nodes, but I'm pretty sure applying the <a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm" target="_blank">Floyd-Warshall algorithm</a> would be faster. Pro tip: if you use Floyd-Warshall, don't bother to extract the route for each pair of nodes. Just get the distances, and keep the F-W solution handy. Once you have solved a TSP on the modified graph and know which edges will be used, you can extract the physical routes corresponding to just the used edges from the F-W solution.</p><p>I posted an explicit MIP model (which I will not repeat here, since it's a space hog) as an answer to the OR SE question. I suspect that Rob's approach (TSP on a modified graph) might be faster, particularly because it allows you to use a dedicated TSP solver. Although TSPs have a reputation for being difficult (being the poster child for NP-hard problems), bespoke solvers like <a href="https://www.math.uwaterloo.ca/tsp/concorde.html" target="_blank">Concorde</a> can chew through rather large TSPs pretty quickly. Other than the upfront computational cost of finding the shortest path between every pair of nodes, the one concern I might have with the TSP approach is that it converts what might have been a sparse graph to a complete graph. So, like all things MIP, which approach is better is an empirical question.<br /></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-74007901323322356242023-06-18T10:00:00.002-04:002023-06-18T10:00:00.149-04:00An Unbounded Bounded Feasible Region (II)<p>In my <a href="https://orinanobworld.blogspot.com/2023/06/an-unbounded-bounded-feasible-region-i.html" target="_blank">previous post</a>, I cited a <a href="https://or.stackexchange.com/questions/10579/randomly-constructing-a-bounded-ellipsoid" target="_blank">question</a> on Operations Research Stack Exchange about an allegedly unbounded feasible region defined by a randomly generated quadratic constraint. In that post, I presented what I believe is a valid proof that, at least in theory, the feasible region should be bounded with probability 1.</p><p>Unfortunately, in our digital world, theory and computation sometimes diverge. To test whether I was correct, I coded several mixed integer quadratic programming (MIQCP) models to assess the boundedness of $X$, and applied them to a small sample of test problems (using Java and CPLEX 22.1.1). 
I set the dimension of the test problems to $n=5$ (for no particular reason).<br /></p><p>All three models contained the variable $x$ and the constraint $x^{\prime}Qx+q^{\prime}x \le -q_{0}.$ Two of the models attempted to answer the question directly by maximizing either the $L_1$ or $L_\infty$ norm of $x$ over $X$.</p><p>For the $L_\infty$ model, I added continuous variable $y$ and binary variables $z_i$ and $w_i$ together with the constraints $$\sum_i z_i + \sum_i w_i = 1,$$ $$y \ge x_i \quad \forall i,$$ $$y \ge -x_i \quad \forall i,$$ $$z_i = 1 \implies y = x_i$$ and $$w_i = 1 \implies y = -x_i.$$ The combined effect of these is to force $y = \vert x_i \vert$ for some $i$ where $$\vert x_i \vert = \max_j \vert x_j \vert = \parallel x \parallel_\infty.$$ The objective was to maximize $y.$<br /></p><p>For the $L_1$ model, I added continuous variables $y_i \ge 0$ and constraints $y_i = \vert x_i \vert.$ (Internally, CPLEX adds binary variables and does big-M magic to linearize the use of absolute values.) The objective was to maximize $\sum_i y_i = \parallel x \parallel_1.$</p><p>My third approach was iterative. The model was almost the same as the $L_\infty$ model, except that rather than maximizing $y$ I set the objective to minimize 0 (meaning the first feasible solution wins) and set the lower bound of $y$ to some value $M.$ If the model found a feasible solution (an $x\in X$ such that $\parallel x \parallel_\infty \ge M$), I doubled $M$ and tried again, until either $M$ exceeded some upper limit or the solver said the problem was infeasible (meaning $X$ is bounded in $L_\infty$ norm by the last value of $M$).</p><p>You can find <a href="https://gitlab.msu.edu/orobworld/ellipsoid" target="_blank">my Java code</a> (self-contained other than needing CPLEX) in my repository. Here are the results from a handful of test runs.</p><p></p>
<div style="text-align: center;"><style type="text/css">
.tg {border:none;border-collapse:collapse;border-color:#bbb;border-spacing:0;}
.tg td{background-color:#E0FFEB;border-color:#bbb;border-style:solid;border-width:0px;color:#594F4F;
font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{background-color:#9DE0AD;border-color:#bbb;border-style:solid;border-width:0px;color:#493F3F;
font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
</style></div>
<table align="center" class="tg">
<thead align="center">
<tr>
<th class="tg-c3ow">Random Seed</th>
<th class="tg-c3ow">Max L_infinity<br /></th>
<th class="tg-c3ow">Max L_1<br /></th>
<th class="tg-c3ow">Iterative<br /></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td class="tg-c3ow">123<br /></td>
<td class="tg-c3ow">unbounded<br /></td>
<td class="tg-c3ow">unbounded<br /></td>
<td class="tg-c3ow">bounded (100)<br /></td>
</tr>
<tr align="center">
<td class="tg-c3ow">456</td>
<td class="tg-c3ow">unbounded<br /></td>
<td class="tg-c3ow">unbounded<br /></td>
<td class="tg-c3ow">bounded (100)</td>
</tr>
<tr align="center">
<td class="tg-c3ow">789</td>
<td class="tg-c3ow">unbounded</td>
<td class="tg-c3ow">unbounded</td>
<td class="tg-c3ow">bounded (1600)</td>
</tr>
<tr align="center">
<td class="tg-c3ow">12345</td>
<td class="tg-c3ow">unbounded</td>
<td class="tg-c3ow">bounded (22.1)</td>
<td class="tg-c3ow">bounded (100)</td>
</tr>
<tr align="center">
<td class="tg-c3ow">61623</td>
<td class="tg-c3ow">bounded (12.4)</td>
<td class="tg-c3ow">unbounded</td>
<td class="tg-c3ow">bounded (100)</td>
</tr>
<tr>
<td class="tg-c3ow" style="text-align: center;">20230616</td>
<td class="tg-c3ow" style="text-align: center;">bounded (120.9)</td>
<td class="tg-c3ow" style="text-align: center;">unbounded</td>
<td class="tg-c3ow" style="text-align: center;">bounded (200)</td>
</tr>
</tbody>
</table><p>The numbers in parentheses are upper bounds on the $L_1$ norm in the third column and the $L_\infty$ norm in the second and fourth columns. The bound found by the iterative method is never tight, so a bound of 100 means $\max_{x \in X} \parallel x \parallel_\infty \le 100.$ As you can see, the iterative method always found the ellipsoids to be bounded, consistent with the mathematical argument in the previous post. The other two models frequently found the problem to be "unbounded", though they did not always agree on that. This is a bit confusing (OK, very confusing). In particular, the "Max L_infinity" and "Iterative" models differ only in whether you are maximizing $y$ or looking for any solution with $y\ge M,$ so saying (when the seed is 123) that the supremum of $y$ is $\infty$ but $y$ cannot exceed 100 is grounds for another beer (or three).</p><p>Something is apparently going on under the hood in CPLEX that is beyond me. Meanwhile, I'm sticking to my belief that $X$ is always bounded.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-90771706836036592872023-06-17T15:27:00.000-04:002023-06-17T15:27:56.335-04:00An Unbounded Bounded Feasible Region (I)<p>If you find the title of this post confusing, join the club! A question on Operations Research Stack Exchange, <a href="https://or.stackexchange.com/questions/10579/randomly-constructing-a-bounded-ellipsoid" target="_blank">"Randomly constructing a bounded ellipsoid"</a>, sent me down a rabbit hole, eventually leading to this opus. I'm going to make a couple of notational tweaks to the original question, but the gist is as follows.
We have an elliptical feasible region $X = \lbrace x \in \mathbb{R}^n : f(x) \le 0 \rbrace$ where $f(x) = x^\prime Q x + q^\prime x + q_0.$ (One of my tweaks is to absorb the author's factor $1/2$ into $Q.$) $Q\in \mathbb{R}^{n \times n},$ $q\in \mathbb{R}^n$ and $q_0\in \mathbb{R}$ are generated by sampling random numbers from the standard normal distribution. In the case of $Q,$ we sample an $n\times n$ matrix $H$ and then set $Q = \frac{1}{2} H^\prime H.$ (My introducing the symbol $H$ for the sampled matrix is the other notational tweak.) Note that $Q$ is automatically symmetric and positive semidefinite, and is positive definite with probability 1. (For it not to be positive definite, $H$ would have to have less than full rank, which has zero probability of occurring.) I should point out here that saying something has probability 1 or 0 assumes that the random number generator works as advertised.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The author of the question said that in their experience $X$ was unbounded "most of the time." That struck me as impossible, and after a bunch of scribbling on marker boards I finally came down to what I think is a correct argument that $X$ must be bounded. Let $\left\{ x_{1},\dots,x_{n}\right\} $ be an orthonormal basis of eigenvectors of $Q,$ with $Qx_{i}=\lambda_{i}x_{i}$ and $$x_i^\prime x_j =\begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases}$$</p><span style="white-space: pre-wrap;">(I'll leave the proof that such a basis exists to the reader as an exercise.)</span><p style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Now suppose that $X$ is unbounded, meaning that for an arbitrarily large $M$ we can find $x\in X$ such that $\parallel x\parallel>M.$ Write $x$ in terms of the basis: $x=\sum_{i}a_{i}x_{i}.$ Observe that $$M^{2}=\parallel x\parallel^{2}=x^{\prime}x=\sum_i \sum_j a_i a_j x_i^\prime x_j = \sum_{i}a_{i}^{2}\left(x_{i}^{\prime}x_{i}\right)=\sum_{i}a_{i}^{2}.$$Expanding $f(x),$ we have </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">\begin{align*} f(x) & =\left(\sum_{i}a_{i}x_{i}\right)^{\prime}Q\left(\sum_{i}a_{i}x_{i}\right)+q^{\prime}\left(\sum_{i}a_{i}x_{i}\right)+q_{0} \\ & =\sum_{i,j}a_{i}a_{j}\left(x_{i}^{\prime}Qx_{j}\right)+\sum_i a_{i}\left(q^{\prime}x_{i}\right)+q_{0} \\ & =\sum_{i,j}a_{i}a_{j}\lambda_{j}\left(x_{i}^{\prime}x_{j}\right)+\sum_{i}a_{i}\left(q^{\prime}x_{i}\right)+q_{0} \\ & =\sum_{i}a_{i}^{2}\lambda_{i}+\sum_{i}a_{i}\left(q^{\prime}x_{i}\right)+q_{0}. 
\end{align*}</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Since $x\in X,$ $\sum_{i}a_{i}^{2}\lambda_{i}\le-\sum_{i}a_{i}\left(q^{\prime}x_{i}\right)-q_{0}.$ According to the Cauchy-Schwarz inequality, $\vert q^{\prime}x_{i}\vert\le\parallel q\parallel\parallel x_{i}\parallel=\parallel q\parallel,$ so we have $$\sum_{i}a_{i}^{2}\lambda_{i}\le-\sum_{i}a_{i}\left(q^{\prime}x_{i}\right)-q_{0}\le\sum_{i}\vert a_{i}\vert\parallel q\parallel+\vert q_{0}\vert.$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">On the other hand, if $\Lambda=\min_{i}\lambda_{i}>0,$ then $$\sum_{i}a_{i}^{2}\lambda_{i}\ge\Lambda\sum_{i}a_{i}^{2}=\Lambda M^{2}.$$ Combining these, $$\Lambda M^{2}\le\parallel q\parallel\sum_{i}\vert a_{i}\vert+\vert q_{0}\vert.\quad (1)$$</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Now let $A=\max_{i}\vert a_{i}\vert$ and assume without loss of generality that $\vert a_{1}\vert=A.$ Since $M^{2}=\sum_{i}a_{i}^{2},$ $A^{2}=M^{2}-\sum_{i>1}a_{i}^{2}\le M^{2}$ and so $0<A\le M.$ Meanwhile, $M^{2}=\sum_{i}a_{i}^{2}\le nA^{2},$ which implies $A\ge\frac{M}{\sqrt{n}}.$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Dividing both sides of (1) by $A,$ we have $$\Lambda M\le\Lambda\frac{M^{2}}{A}\le\parallel q\parallel\sum_{i}\frac{\vert a_{i}\vert}{A}+\frac{\vert q_{0}\vert}{A}\le\parallel q\parallel n+\frac{\vert q_{0}\vert}{A}.\quad (2)$$</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The left side of (2) grows without bound as $M$ increases, while the right side stays bounded (since $A\ge M/\sqrt{n}$ increases with $M$ and both $\parallel q\parallel n$ and $q_{0}$ are constant). So for $M$ sufficiently large the inequality must fail, which is the desired contradiction.</p>
<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">So, barring an error in the above, we have a mathematical proof that $X$ must be bounded. In the next post I will explore the computational side of things.</p>
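<p>As a cheap numerical sanity check of the argument (my own sketch, with an assumed dimension $n=5$ and an arbitrary seed; not part of the experiments in the next post): along any fixed direction $d,$ $f(td)=t^{2}d^{\prime}Qd+tq^{\prime}d+q_{0},$ and since $d^{\prime}Qd>0$ with probability 1, the quadratic term eventually dominates, so points far enough from the origin violate $f(x)\le 0.$</p>

```java
import java.util.Random;

/**
 * Numerical sanity check (a sketch, not part of the original experiments):
 * with Q = H'H/2 for a Gaussian H, f(t*d) grows like t^2 along any fixed
 * direction d, so sufficiently distant points leave X.  The dimension
 * (n = 5) and the seed are arbitrary assumptions.
 */
public class BoundedCheck {

    /** f(x) = x'Qx + q'x + q0. */
    public static double f(double[][] Q, double[] q, double q0, double[] x) {
        double value = q0;
        for (int i = 0; i < x.length; i++) {
            value += q[i] * x[i];
            for (int j = 0; j < x.length; j++) {
                value += x[i] * Q[i][j] * x[j];
            }
        }
        return value;
    }

    /** Evaluate f at t*d for randomly generated Q, q, q0 and a random
     *  direction d (fixed seed, so every call uses the same data). */
    public static double sample(double t) {
        int n = 5;
        Random rng = new Random(12345);
        double q0 = rng.nextGaussian();
        double[][] H = new double[n][n];
        double[] q = new double[n];
        double[] d = new double[n];
        for (int i = 0; i < n; i++) {
            q[i] = rng.nextGaussian();
            d[i] = rng.nextGaussian();
            for (int j = 0; j < n; j++) {
                H[i][j] = rng.nextGaussian();
            }
        }
        double[][] Q = new double[n][n];  // Q = H'H/2, PSD by construction
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    Q[i][j] += H[k][i] * H[k][j];
                }
                Q[i][j] /= 2.0;
            }
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = t * d[i];
        }
        return f(Q, q, q0, x);
    }
}
```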
Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-65639951029435806652023-05-08T18:49:00.000-04:002023-05-08T18:49:56.048-04:00Finding a Connected Subgraph<p>A recent <a href="https://math.stackexchange.com/questions/4693636/milp-constraints-for-connectivity-in-a-subgraph" target="_blank">question</a> on Mathematics Stack Exchange asked about using a mixed integer linear program (MILP) to find a connected subgraph of a specified size (vertex count) within a given graph. Specifically, it asked how to formulate constraints to enforce the requirement that the subgraph be connected.</p><p>As I have <a href="https://orinanobworld.blogspot.com/2013/07/extracting-connected-graph.html" target="_blank">previously mentioned</a>, one way to force a graph to be connected is to add flows between vertices, treating one vertex as a source and the other vertices as sinks. In the context of the current question, that works as follows.
Let $V$ and $E$ be the sets of vertices and edges respectively in our (undirected) graph and $M<\vert V \vert$ the number of vertices required in the connected subgraph. Note that we do not assume the parent graph is connected, and in fact the sample graph in the Math SE post is not. Since flows are inherently directional, we replace each edge $e=(v,w)\in E$ with two arcs, $(v,w)$ and $(w,v).$ Let $A$ be the set of all arcs.<br /></p><p>We designate (arbitrarily) one vertex in the subgraph to be the "root" (source) vertex, with a supply of $M-1$ thingies. (I was going to say "honest politicians", but then the problem becomes infeasible, hence "thingies".) Each of the remaining vertices in the subgraph consumes one thingy. Thingies flow along arcs, but only if both endpoints are selected. Since $M-1$ thingies enter the network at the root vertex and each of the remaining $M-1$ vertices consumes one thingy, the only way for flow to be balanced is if the selected vertices form a connected graph.</p><p>For each vertex $v\in V,$ let $x_v \in \lbrace 0, 1 \rbrace$ be 1 if and only if vertex $v$ is selected and let $y_v\in \lbrace 0, 1 \rbrace$ be 1 if and only if vertex $v$ is chosen to be the root vertex. For each arc $a\in A,$ let $f_a \in [0, M-1]$ be the flow across arc $a.$ The objective function is immaterial, since we just want a feasible solution, so we will minimize 0. 
The constraints are as follows:</p><ul style="text-align: left;"><li>The correct number of vertices must be chosen: $$\sum_{v\in V} x_v = M.$$</li><li>One vertex must be designated the root: $$\sum_{v \in V} y_v = 1.$$</li><li>The root vertex must be one of the chosen vertices: $$y_v \le x_v \quad \forall v\in V.$$</li><li>The flow on any arc must be 0 unless both head and tail are among the chosen vertices: $$f_{(v,w)}\le (M-1)x_v \quad \forall (v,w)\in A$$ and $$f_{(v,w)}\le (M-1)x_w \quad \forall (v,w)\in A.$$</li><li>The net flow out of any vertex (flow out - flow in) must be $M - 1$ for the source vertex, -1 for every other selected vertex and 0 otherwise: $$\sum_{(v,w)\in A} f_{(v,w)} - \sum_{(w,v)\in A} f_{(w,v)} = M\cdot y_v - x_v \quad \forall v\in V.$$</li></ul><p>This works in a MILP model, but a MILP model is probably not the best way to solve the underlying problem. Consider the following algorithm, which is based on constructing a layered subgraph. Assume the vertices are in some order. Make the first vertex the root and create a graph containing just it (which we will call layer 0). Add up to $M-1$ vertices connected to the root by an edge (layer 1). If the graph now contains $M$ vertices, declare victory and stop. Otherwise, for each vertex in layer 1, find all vertices connected to it by an edge that are not in layers 0 or 1, and add as many of them as are needed to reach $M$ vertices (or all of them if you are still short). Repeat until done or until all vertices in layer 1 have been processed. The added vertices form layer 2 because they are two edges from the root. Repeat with each layer until either you have $M$ vertices (victory) or there are no vertices left to process.</p><p>If all connected vertices have been found and the count is less than $M,$ discard the current root and start over with the next vertex as root. 
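For concreteness, here is a minimal Java sketch of the layered search just described. This is hypothetical helper code (not the version in my repository); it grows the subgraph from each candidate root via a breadth-first, layer-by-layer expansion and returns the first vertex set of the target size it finds.

```java
import java.util.*;

public class LayeredSearch {
    // adj maps each vertex to its neighbor list; returns a set of m vertices
    // inducing a connected subgraph, or null if no root yields one
    public static Set<Integer> findConnectedSubgraph(Map<Integer, List<Integer>> adj, int m) {
        List<Integer> vertices = new ArrayList<>(adj.keySet());
        for (int r = 0; r < vertices.size(); r++) {
            Set<Integer> chosen = new LinkedHashSet<>();
            Deque<Integer> frontier = new ArrayDeque<>();
            chosen.add(vertices.get(r));
            frontier.add(vertices.get(r));
            // expand outward one layer at a time until we have m vertices or run dry
            while (chosen.size() < m && !frontier.isEmpty()) {
                int v = frontier.poll();
                for (int w : adj.getOrDefault(v, List.of())) {
                    if (chosen.size() >= m) break;
                    if (chosen.add(w)) frontier.add(w); // newly reached: next layer
                }
            }
            if (chosen.size() >= m) return chosen;
        }
        return null; // every component explored has fewer than m vertices
    }
}
```

Every vertex added is adjacent to one already chosen, so the returned set is connected by construction. The refinement of skipping vertices already explored under earlier roots is omitted here for brevity.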
Note that when processing subsequent roots, you can ignore any vertices already tried as root and just look at vertices later in the vertex ordering.</p><p>I ginned up some <a href="https://gitlab.msu.edu/orobworld/connected-subgraph" target="_blank">Java code</a> (open source, as always) to test both the MILP model and the layered graph approach, using CPLEX as the MILP solver. Both got the examples from the Math SE post correct in a trivial amount of time. On a randomly generated graph with 5,000 nodes and 624,713 edges (about 5% of the edges in a complete graph that size), with a target subgraph size of 1,000 nodes, the MILP model took about 8 seconds to build and 12 seconds to solve. The layered graph search needed 3 milliseconds. I tried a larger graph, but my computer ran out of heap space trying to build the model (mainly due to the expanding number of flow variables and related constraints).</p><p>So there are two things to take away. One is that flows and flow balance constraints can be used to enforce connection among vertices, even when there is no actual thing flowing. The other is that, much as I like MILP models, sometimes simpler approaches are better.</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-12979510750466884702023-04-28T16:03:00.000-04:002023-04-28T16:03:37.126-04:00A Matrix Puzzle<p>A question on Mathematics Stack Exchange, <a href="https://math.stackexchange.com/questions/4686170/placing-number-blocks-so-that-the-resulting-matrix-is-symmetric" target="_blank">"Placing number blocks so that the resulting matrix is symmetric"</a>, revolves around a square matrix whose entries are partitioned into blocks (submatrices). The submatrices need to be rearranged so that the resulting matrix is symmetric around its main diagonal. The author of the question asked if it were possible to do this using linear programming. 
I think the answer is no, but we can certainly do it with either an integer linear program or a constraint program. The author posts two examples ($4\times 4$ and $5\times 5$). I cobbled together Java code for a MIP model (using CPLEX) and a CP model (using CP Optimizer) and had no trouble solving both problems.</p><p>There are a couple of generalizations worth noting. First, while the example matrices contain integers, there is nothing preventing the two models from being used with other content types (real numbers, complex numbers, text). Some variable definitions would need modification, but conceptually both methods would work. Second, the author of the post used only one-dimensional blocks (row vectors or column vectors), but my code allows for arbitrary rectangular blocks. The code assumes that the "parent matrix" is square, but that would be easy enough to relax, so the same models (with minor adjustments) would work with arbitrary rectangular matrices.</p><p>I think that the puzzle makes a good vehicle for comparing MIP and CP applications to logic problems. In optimization problems, I suspect that MIP models often do a better job of computing objective bounds than do CP models. That is a non-issue here, since the problem is just to find a feasible solution. For logic problems and similar things (particularly scheduling), I think CP models tend to be more expressive, meaning certain types of constraints or relationships can be expressed more naturally with CP than with MIP (where the relationships turn into rather arcane and complicated adventures with binary variables). That applies here, where the CP model exploits the ability to use integer variables as subscripts of other variables.</p><p>As described in the PDF file, though, that subscripting ability has its limits. CP Optimizer will let you index a one-dimensional vector of variables using an integer variable, but not a two-dimensional array. 
In other words, <span style="font-family: courier;">x[y]</span> is fine but <span style="font-family: courier;">x[y, z]</span> is not (where <span style="font-family: courier;">x</span>, <span style="font-family: courier;">y</span> and <span style="font-family: courier;">z</span> are all variables). The workaround I used is to flatten a 2-D matrix into a 1-D matrix. So if $N\times N$ matrix $x$ is flattened into $N^2\times 1$ vector $\hat{x},$ $x_{y,z}$ becomes $\hat{x}_{N(y - 1) + z}.$</p><p>The models are a bit too verbose to squeeze into a blog post, so I wrote them up in a separate <a href="https://rubin.msu.domains/blog/matrix-puzzle-formulations.pdf" target="_blank">PDF file</a>. My Java code (which requires recent versions of CPLEX and CP Optimizer) can be had from my <a href="https://gitlab.msu.edu/orobworld/matrix-puzzle" target="_blank">code repository</a> under a Creative Commons license.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-61834652469186857042023-04-18T16:24:00.000-04:002023-04-18T16:24:32.357-04:00Node Coloring<p>Someone posted a question on OR Stack Exchange about <a href="https://or.stackexchange.com/questions/10321/coloring-of-nodes-of-a-sensor-network" target="_blank">coloring nodes</a> in a sensor network. The underlying problem has something to do with placing sensors, but in mathematical terms the poster wanted to color the nodes of a weighted undirected graph with a fixed number of colors, while minimizing the sum of the edge weights of edges connecting nodes of the same color.</p><p>There is an obvious MIP model for this problem, which was posted as an answer to an earlier version of the question. 
If $N$ and $C$ are the index sets for nodes and colors respectively, $E\subseteq N\times N$ is the set of edges, $w_{ij}$ is the objective weight of edge $(i,j)\in E$ and $x_{n,c}$ is a binary variable indicating whether node $n$ is assigned color $c$, the problem can be written as a binary quadratic program:</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$\min\ \sum_{(n,m)\in E}w_{nm}\sum_{c\in C}x_{nc}x_{mc}\\<br />\textrm{s.t. }\sum_{c\in C}x_{nc} =1\quad\forall n\in N\\<br />\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C.$$</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> <br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The model can be linearized by introducing continuous variables $y_{nm}\ge 0$ to represent the objective contribution of each edge $(n,m)\in E$:</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$\min\ \sum_{(n,m)\in E}w_{nm} y_{nm}\\<br />\textrm{s.t. }\sum_{c\in C}x_{nc}=1 \quad \forall n\in N\\<br />\phantom{\textrm{s.t. }}y_{nm}\ge x_{nc} + x_{mc} - 1 \quad \forall (n,m)\in E, \forall c\in C\\<br />\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C\\<br />\phantom{\textrm{s.t. 
}}y_{nm}\ge 0 \quad \forall (n,m)\in E.$$<br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">There is some symmetry in the model: given any feasible solution, another solution of identical cost is obtained by permuting the colors. It is possible to add constraints to remove that symmetry.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The person posting the question was looking for alternatives to a MIP model. An easy choice for me would be a genetic algorithm. I wanted to test this in R, using the <span style="font-family: courier;">GA</span> library for the genetic algorithm. The <span style="font-family: courier;">GA</span> library allows three types of chromosomes: binary or real vectors, or permutations of an index vector. If I were coding my own GA, the chromosome would be a vector in $\lbrace 1,\dots,C\rbrace^N.$ Within the available chromosome types in the library, the obvious choice was to use a vector $x\in [0,C]^N$ and then set the color of node $i$ to $\lceil x_i \rceil\in \lbrace 1,\dots,C\rbrace.$ I suspect, though, that the symmetry issue described above might have negative repercussions for a GA. 
If two "good" solutions were effectively the same (other than a permutation of the color indices), their offspring (which would be a mix of the two color numbering methods) might be a mess.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">A couple of responders to the OR SE question suggested heuristics, as did I. My choice was to start with a random generation heuristic (assign colors randomly), apply an improvement heuristic (go through the nodes in random order and, for each node, reassign it to the cheapest color given the rest of the solution), and repeat (changing the order in which nodes are examined on each pass) until one pass produced no improvement. Since the heuristic is fairly fast, it can be done multiple times (new random restarts) with the best solution retained. Heuristics like this are very easy to code.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">I coded all three approaches in R, using CPLEX (with the <span style="font-family: courier;">ompr</span> and <span style="font-family: courier;">ROI</span> libraries) for the MIP model, the <span style="font-family: courier;">GA</span> library for the genetic algorithm, and nothing special for the improvement heuristic. I also threw in the <span style="font-family: courier;">tictoc</span> library to do some timings. 
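As a rough illustration of the restart-plus-improvement heuristic (sketched here in Java rather than R, with illustrative names; this is not the notebook code), each restart assigns random colors, then repeatedly sweeps the nodes in random order, moving each node to its cheapest color until a full pass changes nothing:

```java
import java.util.*;

public class ColoringHeuristic {
    // weights[i][j] = edge weight between nodes i and j (0 if no edge);
    // returns colors in {0,...,numColors-1} heuristically minimizing
    // the total weight of same-color edges
    public static int[] solve(double[][] weights, int numColors, int restarts, long seed) {
        Random rng = new Random(seed);
        int n = weights.length;
        int[] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int r = 0; r < restarts; r++) {
            int[] color = new int[n];
            for (int i = 0; i < n; i++) color[i] = rng.nextInt(numColors); // random start
            boolean improved = true;
            while (improved) {          // improvement passes
                improved = false;
                for (int i : shuffled(n, rng)) {
                    int bestC = color[i];
                    double bestD = localCost(weights, color, i, color[i]);
                    for (int c = 0; c < numColors; c++) {
                        double d = localCost(weights, color, i, c);
                        if (d < bestD) { bestD = d; bestC = c; improved = true; }
                    }
                    color[i] = bestC;   // reassign node i to its cheapest color
                }
            }
            double cost = totalCost(weights, color);
            if (cost < bestCost) { bestCost = cost; best = color; } // keep best restart
        }
        return best;
    }

    // cost contribution of node i if it were given color c
    private static double localCost(double[][] w, int[] color, int i, int c) {
        double s = 0;
        for (int j = 0; j < w.length; j++)
            if (j != i && color[j] == c) s += w[i][j];
        return s;
    }

    public static double totalCost(double[][] w, int[] color) {
        double s = 0;
        for (int i = 0; i < w.length; i++)
            for (int j = i + 1; j < w.length; j++)
                if (color[i] == color[j]) s += w[i][j];
        return s;
    }

    private static List<Integer> shuffled(int n, Random rng) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, rng);
        return idx;
    }
}
```

Because a node is only moved when its local cost strictly decreases, each move strictly lowers the total cost, so the inner loop must terminate.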
The test problem was a randomly constructed $10\times 20$ complete grid graph (200 nodes; 19,900 edges), to be colored using five colors.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">To break symmetry in the MIP model, I forced the colors to be ordered so that the lowest node index receiving any color was smaller than the lowest node index receiving the next color. I let CPLEX run for five minutes using default settings. It made what I would call reasonable progress on the incumbent objective value, but the formulation is apparently very weak (at least on the test example). The lower bound barely moved, and the gap was 99.97% after five minutes.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">I mostly used default parameters for the GA, although I let it run for 2,000 generations with a population of size 100. It found a much better solution (in 38 seconds) than CPLEX found in five minutes. To be fair, though, CPLEX was balancing improving the incumbent with working on the bound. Had I changed the CPLEX parameters to emphasize improving the incumbent, it likely would have done better (though likely not as well as the GA).</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The improvement heuristic surprised me a bit. 
600 random starts needed only 37 seconds (and I'm not the poster boy for optimized R code), and the best solution it found had an objective value about 5% better than what the GA produced. I did a small number of runs with different random seeds (but the same problem size), and the improvement heuristic consistently edged out the GA. I suppose the moral here is to try simple heuristics before going "high tech".</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">You are welcome to play with <a href="https://rubin.msu.domains/blog/graph_coloring.nb.html" target="_blank">my R notebook</a>, which contains both code and results.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-74977546469258899172023-04-13T18:07:00.001-04:002023-04-13T18:07:35.849-04:00The rJava Curse Strikes Again<p>Apparently I have not needed the <span style="font-family: courier;">rJava</span> R package in a while, because when I wanted to install an R package today that has <span style="font-family: courier;">rJava</span> as a dependency, it was not there. So I tried to install it (this is on Linux Mint), and of course it failed to install. I have a long history of installation battles with <span style="font-family: courier;">rJava</span> (see <a href="https://orinanobworld.blogspot.com/2011/03/configuring-r-for-java.html" target="_blank">here</a>, <a href="https://orinanobworld.blogspot.com/2015/06/the-rjava-nightmare.html" target="_blank">here</a> and <a href="https://orinanobworld.blogspot.com/2016/12/rjava-gift-that-keeps-on-giving.html" target="_blank">here</a> in chronological order ... or better still don't traumatize yourself by reading them). 
Why should this time be different?<br /></p><p>All my previous battles involved older versions of Java with apparently different directory locations or structures, and none of the previous fixes worked. After considerable aggravation, I found a <a href="https://datawookie.dev/blog/2018/02/installing-rjava-on-ubuntu/" target="_blank">very helpful post</a> by "datawookie" that nearly got the job done. I did in fact get an error message about "jni" and used the trick in datawookie's post of setting JAVA_HOME to the correct path (in my case to Open JDK 17) as an argument to <span style="font-family: courier;">javareconf</span>. When I then attempted to install <span style="font-family: courier;">rJava</span>, I got a different error ("could not find -lbz2"), which prompted me to install <span style="font-family: courier;">libbz2.dev</span> ("<span style="font-family: courier;">sudo apt-get install libbz2.dev</span>"), after which R was finally able to install <span style="font-family: courier;">rJava</span> (woo-hoo!).</p><p>I'd say this is getting ridiculous, but we passed that milestone years ago.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-82821209989770134572023-03-20T14:54:00.003-04:002023-03-20T14:54:48.504-04:00Annoying Message with R GA Package<p>This is a small technical note (mainly a reminder to myself) about using the <span style="font-family: courier;">GA</span> package for genetic algorithms in R, specifically within an R Markdown document (such as an R notebook). The <span style="font-family: courier;">GA::ga()</span> function includes an optional argument to turn on parallel evaluation of the fitness function. The argument value can be true or false (default), which are self-explanatory, or a number of cores, or a character string ("snow" or "multicore") for how parallelization is to be done. 
The default is "multicore" except on Windows, where "snow" is apparently the only option. When I choose to turn on parallel objective evaluation, I take the easy route and just set it to <span style="font-family: courier;">TRUE</span>.</p><p>When run in a terminal, I just get a sequence of log entries, one per generation, and then it terminates. That's also what I see when the command is run in an R Markdown document ... until the document is rendered to HTML. In the HTML output, interspersed with each line of log output is a block of four identical copies of the following.<br /></p><pre style="-webkit-text-stroke-width: 0px; background-color: white; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; display: block; font-family: monospace; font-size: 13px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.42857; margin: 0px 0px 10px; orphans: 2; overflow-wrap: break-word; overflow: auto; padding: 9.5px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; widows: 2; word-break: break-all; word-spacing: 0px;"><code class="hljs" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: monospace; font-size: inherit; padding: 0px; white-space: pre-wrap;">loaded GA and set parent environment</code></pre><p>
The fact that there are four copies is presumably because my PC has four cores. Turning off the monitor option gets rid of the progress printouts but does not eliminate the repetitive messages. Switching the parallel mode to "snow" does get rid of them, but I'm not sure what the broader implications of that are.</p><p>After significant spelunking, I traced the messages to the <span style="font-family: courier;">doParallel</span> R package, which is apparently being used to implement parallel processing under the "multicore" option. In any event, the way to get rid of them is to add the option "message = FALSE" to the R chunk in which <span style="font-family: courier;">ga()</span> is being executed. Barring other chunk options, this would change <span style="font-family: courier;">{r}</span> to <span style="font-family: courier;">{r, message = FALSE}</span> for that one chunk. (You can also set it as a global option in the document.) Happily, the monitor output (stats for each generation) still prints out. It just gets rid of those <expletive deleted> parallel processing messages.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-30048876379220296132023-03-07T16:48:00.000-05:002023-03-07T16:48:01.375-05:00Another GA for Clustering<p>Someone asked on OR Stack Exchange about <a href="https://or.stackexchange.com/questions/10059/how-to-perform-clustering-of-a-large-number-of-nodes/" target="_blank">clustering nodes</a>, a subject that comes up from time to time. The premise is that you have $N$ points (nodes), and for each pair $i \neq j$ there is (or may be) an edge with weight $w_{i,j} \in [0, 1]$ (where $w_{i,j} = 0$ if there is no edge between $i$ and $j$). The objective is to create exactly $G$ groups (clusters), with no group having cardinality greater than $M,$ such that the sum of within-group edge weights is maximized. 
It is simple to write an integer programming model for the problem, and the question includes one, but the author was looking for something that did not require an IP solver. The question specifies, as an example, dimensions of $N=500$ points, $G=8$ groups, and a maximum group size of $M=70.$</p><p>Since I have suggested genetic algorithms for previous clustering problems (see <a href="https://orinanobworld.blogspot.com/2021/04/a-ga-model-for-joint-clustering-problem.html" target="_blank">here</a> and <a href="https://orinanobworld.blogspot.com/2021/12/revisiting-joint-clustering-problem.html" target="_blank">here</a>), it probably will not shock you that I once again looked for a GA approach, using a "random key" algorithm (where the chromosome gets decoded into a feasible solution). In this case, the chromosome consists of $N+G$ values between 0 and 1. The first $G$ values are used to select the sizes $n_1,\dots, n_G$ of the groups. The last $N$ values are converted into a permutation of the point indices $1,\dots, N.$ The first $n_1$ entries in the permutation are the indices of the points in group 1, the next $n_2$ entries give the points in group 2, and so on.</p><p>The tricky part here is converting the first portion of the chromosome into the group sizes. We know that the maximum group size is $M,$ and it is easy to deduce that the minimum group size is $m = N - (G-1)M.$ So the minimum and maximum fractions of the population to assign to any group are $a=m/N$ and $b=M/N,$ where $$m = N - (G-1)M \implies a = 1 - (G-1)b.$$ Now suppose that I have values $\pi_1, \dots,\pi_G \in (a,b)$ and assign group sizes $n_i = \pi_i \cdot N,$ which clearly meet the minimum and maximum size limits. (I'm cheating here, but I'll fix it in a minute.) We want $\sum_{i=1}^G n_i = N,$ which means we need $\sum_i \pi_i = 1.$</p><p>To get the $\pi_i,$ we take a rather roundabout route. 
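Jumping ahead a bit, the whole size-decoding step (the $y_i$ and $\pi_i$ are constructed in the derivation below, and the rounding repair is described after it) can be sketched in Java as follows. The names are illustrative, and this is not the code from my notebook.

```java
import java.util.*;

public class GroupSizes {
    // x holds the first G chromosome entries (each in (0,1)); returns group
    // sizes between m = N - (G-1)*M and M that sum to N
    public static int[] decode(double[] x, int N, int M) {
        int G = x.length;
        double b = (double) M / N;        // maximum fraction per group
        double a = 1.0 - (G - 1) * b;     // minimum fraction per group (= m/N)
        double sum = Arrays.stream(x).sum();
        int[] n = new int[G];
        for (int i = 0; i < G; i++) {
            double y = 1.0 - x[i] / sum;  // the y_i sum to G - 1
            double pi = a + (b - a) * y;  // the pi_i lie in (a,b) and sum to 1
            n[i] = (int) Math.round(pi * N);
        }
        // repair rounding error so the sizes sum to N
        int total = Arrays.stream(n).sum();
        while (total < N) { n[argMin(n)]++; total++; }
        while (total > N) { n[argMax(n)]--; total--; }
        return n;
    }

    private static int argMin(int[] n) {
        int k = 0;
        for (int i = 1; i < n.length; i++) if (n[i] < n[k]) k = i;
        return k;
    }

    private static int argMax(int[] n) {
        int k = 0;
        for (int i = 1; i < n.length; i++) if (n[i] > n[k]) k = i;
        return k;
    }
}
```

Since the repair only increments the smallest size (when the total is short) or decrements the largest (when it is over), it cannot push any size past the bounds in a feasible instance.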
Assume that the first $G$ entries in the chromosome are $x_1,\dots,x_G \in (0,1).$ Set $$y_i = 1 - \frac{x_i}{\sum_{j=1}^G x_j} \in (0,1)$$ and observe that $\sum_{i=1}^G y_i = G-1.$ Finally, set $\pi_i = a + (b-a) y_i \in (a, b).$ We have $$\sum_i \pi_i = G\cdot a + (b-a) \sum_i y_i\\ = G\cdot a + (b-a)(G-1) = (G-1)b + a = 1.$$</p><p>Now to confess to my cheating. I said the $i$-th group size would be $n_i=\pi_i \cdot N,$ ignoring the minor problem that the left side is an integer and the right side almost surely is not. So of course we will round $\pi_i \cdot N$ to get $n_i.$ After rounding, though, we can no longer be sure that $\sum_i n_i = N.$ So we iteratively fix it. If $\sum_i n_i < N,$ we add 1 to the smallest $n_i$ and recheck. If $\sum_i n_i > N,$ we subtract 1 from the largest $n_i$ and recheck. Eventually, we end up with a feasible set of group sizes.</p><p>All this is encoded in an <a href="https://rubin.msu.domains/blog/ga_clustering.nb.html" target="_blank">R notebook</a> I wrote, including a sample problem. Whether the GA gets a "good" (near optimal) partition or not I cannot say, since I did not write the equivalent MIP model. I can say, though, that it gets a sequence of progressively improving partitions.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-19029288518592495832023-01-30T19:03:00.000-05:002023-01-30T19:03:23.918-05:00A Random Tree Generator<p>For a problem I was noodling with in R, I needed to generate random trees (meaning connected, acyclic, layered, undirected graphs, not classification trees or anything statistical). There are a bunch of libraries for R that have various capabilities for constructing and/or massaging graphs and networks (see the <a href="https://cran.r-project.org/web/views/GraphicalModels.html" target="_blank">CRAN Task View</a> for graphical models). 
After spending some time poking through the docs for some of the libraries listed in the task view, and unsuccessfully trying to install one, I decided it would be faster just to roll my own code.</p><p>Along the way I succumbed to the software developer's maxim: if it works, it needs more features. Still, I wound up with a function that is not horribly complicated and seems to work. Given how many nodes and how many layers you want in the tree, it outputs a matrix of edges forming the desired tree. There are optional arguments for how you want the nodes labeled (the default is 1, 2, ... but you can supply a vector of labels) and in what order the labels should be assigned (top to bottom/left to right raster scan of the graph or randomly), as well as how you want the output matrix organized (randomly, by label order, or by layer order).</p><p>The code is in an R notebook that demonstrates the use of the function. It imports the <span style="font-family: courier;">DiagrammeR</span> library in order to plot the resulting trees, but the function does not require <span style="font-family: courier;">DiagrammeR</span> and in fact has no dependencies. You can view the notebook <a href="https://rubin.msu.domains/blog/treeGenerator.nb.html" target="_blank">here</a>, and if you want you can download the code from it (see the select box in the upper right of the notebook). The code is licensed under the <a href="https://creativecommons.org/licenses/by/3.0/deed.en_US">Creative
Commons Attribution 3.0 Unported license</a>.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-87638461500945047522023-01-27T14:16:00.000-05:002023-01-27T14:16:04.843-05:00Weird Caja Bug<p>In the past day or so I've tripped over a rather annoying bug in Caja, the file manager for the MATE desktop (which I use with Linux Mint, and which is also used by many Ubuntu users). Theoretically, the F2 key is a shortcut for Edit > Rename..., which lets you rename selected files. Usually, it works fine, but today (and at least once in the recent past) the F2 key had no effect.</p><p>At first I thought there was a problem with my keyboard, but no, the F2 key was being recognized. A Google search led me to a <a href="https://ubuntu-mate.community/t/potential-important-bug-in-caja/16686/12" target="_blank">bug report</a> from 2018 (!) where a diligent soul with screen name "terzag" discovered that the problem occurred when the Caja window displayed a vertical scrollbar and disappeared when the window was resized large enough to lose the scrollbar. Sure enough, that fix worked for me today ... and then things got weirder.</p><p>I did a little experimenting, and the bug reappeared when the scrollbar reappeared and disappeared whenever the scrollbar disappeared. Then I tried F2 with multiple file icons selected (which opens a dialog to bulk rename them), and it worked, despite there being a visible scrollbar. Since then, no matter how many times I close and reopen Caja, the bug does not manifest. So it is temporarily fixed, I guess, with emphasis on "temporarily".</p><p>I also encountered a possibly related bug, again reported years ago. If I hit either F2 or Edit > Rename... without first selecting one or more files, it closes (crashes?) all open Caja windows. I'm pretty sure that's a bug and not a "feature".</p><p>Hopefully this all gets fixed relatively soon ... 
although looking at bug reports from almost five years ago does not make me feel sanguine about the prospects.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-14998184317830038222023-01-27T13:43:00.000-05:002023-01-27T13:43:07.325-05:00Finding All Simple Cycles (III)<p>This is the third and (hopefully) final post about finding all simple cycles in an undirected graph. In the <a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-i.html" target="_blank">first post</a>, I discussed the problem briefly and described a simple, efficient iterative algorithm. In the <a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-ii.html" target="_blank">second post</a>, I posed (but definitely did not recommend) a mixed integer linear program (MILP) to find all simple cycles. Here I will discuss an alternative MILP approach, specific to CPLEX. To make sense of it, you will definitely need to have read the second post. Java code demonstrating all three methods is available from my <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">repository</a>.</p><p>Recent versions of CPLEX have a feature called the solution pool, about which I have <a href="https://orinanobworld.blogspot.com/2013/01/finding-all-mip-optima-cplex-solution.html" target="_blank">previously written</a>. One use is to accumulate solutions (including suboptimal ones if desired) encountered along the way to the optimal solution of a MILP model. Another, using the <span style="font-family: courier;">IloCplex.populate()</span> method in the Java API, is to just generate a gaggle of feasible solutions. 
So the approach I propose here is to build a MILP model for finding a single simple cycle, set the solution pool population limit parameter (the maximum number of solutions to retain) to something really large and the solution pool intensity parameter (how hard CPLEX works to churn out solutions) to its maximum value, and then use the <span style="font-family: courier;">populate()</span> method to find simple cycles. This is implemented in the <span style="font-family: courier;">SingleCycleMIP.java</span> class in my code. </p><p>Actually, I'm not sure setting the intensity parameter is necessary. On the small test graph, results were similar no matter what intensity setting I used. CPLEX stopped with 160 solutions, many of which were duplicates. After removing duplicates, there were 13 distinct simple cycles, which matches what the other methods found. The only difference was run time, which was a fraction of a second using either the default intensity or the maximum intensity (4) and about two seconds using the minimum intensity (1). Either way, it was much faster than the full MILP model, at least on this small test graph.</p><p>The MILP model used here is a subset of the model in the previous post. The model has no objective value (or equivalently minimizes the constant zero function). We drop the $u$ variables (which were used to ensure that cycles were distinct) and the $x$ variables (which signaled whether a slot in the solution was filled with a cycle or not). For all other variables, we drop the $c$ subscript (which slot is being filled), since we are only building one cycle. 
The "one anchor per cycle" constraint becomes $\sum_i w_i = 1,$ and other constraints involving $u$ and/or $x$ are dropped.</p><p>One issue with this model is that the same cycle can be discovered multiple times, either because its orientation is reversed (so the "clockwise" and "counterclockwise" versions are considered distinct) or possibly because minor differences (rounding error) in the flow variables make two copies of the same solution look distinct. Regardless of the reason, the fact that CPLEX terminated with 160 solutions when only 13 cycles exist tells you duplication is happening. Fortunately, we do not have to monkey with the model (or set overly finicky tolerance values); we can just filter the pool after the run and weed out the duplicates (which my code does). The alternative would be to use a callback (either an incumbent callback if you are using legacy callbacks, or the more current generic callback in "candidate" context) to weed out duplicates as they arise. Since the logic to be used in the callback is the same as the logic used in post-processing, I cannot see any virtue to using a callback, and it definitely would complicate the code.</p><p>With that I am hopefully done with this topic. :-)</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-45323582944106023602023-01-26T15:03:00.002-05:002023-01-26T15:03:58.291-05:00Finding All Simple Cycles (II)<p><a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-i.html" target="_blank">Yesterday's post</a> described my introduction (by way of a <a href="https://or.stackexchange.com/questions/9786/number-of-hamiltonian-sub-cycles-on-the-graph" target="_blank">question</a> on OR Stack Exchange) to the problem of how to find all simple cycles in a graph.
In that post I described a simple, efficient iterative algorithm for finding them.</p><p>The OR SE question specifically asked about a mixed integer linear program (MILP) to find all simple cycles. While I would never take that approach myself, I got curious as to whether it was possible to construct such a model. Down the rabbit hole I went. The answer turns out to be yes. In this post, I'll describe the model (again, without advocating that it be used).</p><p>Assume that we have an undirected graph with vertices numbered $1,\dots,N$ and edge set $E.$ In what follows, $i$ and $j$ will always be vertex numbers and $c$ and $c'$ will index cycles. Although the graph is undirected, when we construct cycles we will give them an orientation (e.g., $3 \rightarrow 2 \rightarrow 5 \rightarrow 3$), which will be useful in ensuring that the result is an actual cycle (you can get from the starting node back to itself) and not the composition of multiple smaller cycles (e.g., not treating $3 \rightarrow 2 \rightarrow 4 \rightarrow 3$ and $5 \rightarrow 6 \rightarrow 9 \rightarrow 8 \rightarrow 5$ as if they form a single cycle).<br /></p><p>The model depends on guessing a non-binding upper bound $C$ on the number of simple cycles in the graph. (If the solution to the model does contain $C$ cycles, your guess may have been too small. 
To be certain that all cycles were found, you would need to bump up $C$ and solve again, until the optimal solution found fewer than $C$ cycles.)</p><p>The model contains the following gazillion binary variables:</p><ul style="text-align: left;"><li>$x_{c}$ is an indicator for whether cycle $c\in\left\{ 1,\dots,C\right\} $<br />is constructed;</li><li>$y_{i,c}$ is an indicator for whether vertex $i$ belongs to cycle $c$;</li><li>$z_{i,j,c}$ is an indicator for whether edge $[i,j]\in E$ is part of<br />cycle $c$ with the tour of $c$ proceeding from $i$ to $j$;</li><li>$w_{i,c}$ is an indicator for whether vertex $i$ is the anchor (starting<br />and ending point) for cycle $c$; and<br /></li><li>$u_{i,j,c,c'}$ is an indicator that edge $[i,j]\in E$ (in either<br />orientation) belongs to exactly one of cycles $c$ and $c'\neq c.$</li></ul><p>We also have one set of variables that can be defined as either continuous or integer (continuous probably makes the model solve faster):</p><ul style="text-align: left;"><li>$f_{i,j,c}\ge 0$ is the ``flow'' on edge $[i,j]$ in the orientation $i\rightarrow j$ in cycle $c.$</li></ul><p>The flow variables are an adaptation of the variables used in the <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem#Miller%E2%80%93Tucker%E2%80%93Zemlin_formulation[21]" target="_blank">Miller-Tucker-Zemlin formulation</a> for the traveling salesman problem (TSP). We will use them to ensure that cycles found are in fact cycles.</p><p>The objective, at least, is straightforward. Since we want to find all possible simple cycles, we maximize the number of cycles constructed: $\max\sum_{c=1}^C x_{c}.$ Now come the constraints. 
(Brace yourself for some serious vertical scrolling.)<br /></p><ul style="text-align: left;"><li>Edges can only belong to cycles containing both endpoints, and only<br />in one orientation: \begin{align*}<br />z_{i,j,c}+z_{j,i,c} & \le y_{i,c}\quad\forall[i,j]\in E,\forall c\\<br />z_{i,j,c}+z_{j,i,c} & \le y_{j,c}\quad\forall[i,j]\in E,\forall c<br />\end{align*}</li><li>Flow only occurs on edges that are used, and only in the orientation<br />used: \begin{align*}<br />f_{i,j,c} & \le Nz_{i,j,c}\quad\forall[i,j]\in E,\forall c\\<br />f_{j,i,c} & \le Nz_{j,i,c}\quad\forall[i,j]\in E,\forall c<br />\end{align*}</li><li>Constructed cycles have one anchor and unused cycles have none: \[<br />\sum_{i=1}^N w_{i,c}=x_{c}\quad\forall c<br />\]</li><li>A cycle's anchor must belong to the cycle: \[<br />w_{i,c}\le y_{i,c}\quad\forall i,\forall c<br />\]</li><li>In every cycle, at every node, there are either two incident edges<br />(if the node belongs to the cycle) or none (if the node is not part<br />of the cycle): \[<br />\sum_{j:[i,j]\in E}(z_{i,j,c}+z_{j,i,c})=2y_{i,c}\quad\forall i,\forall c<br />\]</li><li>In any used cycle, at any node in the cycle other than the anchor, flow out is at least one unit less than flow in (mimicking the MTZ formulation of the TSP): \[<br />\sum_{j:[i,j]\in E}f_{i,j,c}\le\sum_{j:[i,j]\in E}f_{j,i,c}-y_{i,c}+Nw_{i,c}\quad\forall i,\forall c<br />\]</li><li>To ensure a complete cycle, one unit of flow must return to the anchor node: \[<br />\sum_{j:[i,j]\in E}f_{j,i,c}\ge w_{i,c}\quad\forall i,\forall c<br />\]</li><li>The $u$ variables are defined in terms of the $z$ variables: \[<br />u_{i,j,c,c'}\le z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\quad\forall[i,j]\in E,\forall c<c'<br />\] \[<br />u_{i,j,c,c'}\le2-\left(z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\right)\quad\forall[i,j]\in E,\forall c<c'<br />\]</li><li>Any pair of cycles constructed must differ in at least one edge (which<br />also prevents a cycle from repeating with the
opposite orientation): \[<br />\sum_{[i,j]\in E}u_{i,j,c,c'}\ge x_{c}+x_{c'}-1\quad\forall c<c'<br />\]</li></ul><p>It is also possible to add constraints to mitigate some of the symmetry in the model. Two come to mind:</p><ul style="text-align: left;"><li>Unused cycles must have higher indices than used cycles: \[<br />x_{c}\le x_{c-1}\quad\forall c\in\left\{ 2,\dots,C\right\} <br />\]</li><li>The anchor in a cycle must have the smallest index of any node<br />in the cycle: \[<br />w_{i,c}+y_{j,c}\le1\quad\forall i>j,\forall c<br />\]</li></ul><p>The first one is of low enough dimension that I automatically included it in my Java code. I made the second one optional in the code.</p><p>When I ran it against the 9 node, 12 edge graph in the OR SE question, CPLEX had a solution with 13 cycles (the full number) after about 25 seconds and 5,000 nodes -- way, way more time than the 13 <i>milliseconds</i> the iterative method needed. I set $C=20$ as the upper bound on the number of cycles, which of course is also the initial upper bound on the objective. After 60 seconds (the time limit I set), CPLEX's upper bound was still 20. So provable optimality will not be easy to come by.</p><p>I've got one more MILP model up my sleeve, coming in the next post. Meanwhile, as a reminder, my Java code is available <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">here</a>.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-44266291329494032342023-01-25T18:26:00.000-05:002023-01-25T18:26:42.240-05:00Finding All Simple Cycles (I)<p>This is the first of a sequence of posts related to the question of finding all simple cycles in an undirected graph. Motivation for the posts comes from a <a href="https://or.stackexchange.com/questions/9786/number-of-hamiltonian-sub-cycles-on-the-graph" target="_blank">question</a> on OR Stack Exchange.
A simple cycle is a cycle with no repeated nodes (other than the starting/ending node).</p><p>Assume we have an undirected graph with nodes indexed from 1 to $N$ and with edge set $E.$ I will use the term "anchor" to refer to the node at which a tour of a simple cycle starts (and ends), and require (without loss of generality) that the anchor of every cycle be the lowest index node in the cycle. Since a cycle is the same whether you traverse it "clockwise" or "counterclockwise", I will also require without loss of generality that nodes in a cycle be listed so that the node immediately after the anchor has smaller index than the node immediately before the anchor. This is just to reduce production of duplicate cycles.</p><p>I came up with three approaches to finding all simple cycles. The first, which I will describe below, is a simple iterative scheme that is the fastest. The other two, described in subsequent posts, use mixed integer linear programs (MILPs) -- not because they are efficient but because the OR SE question specifically asked about MILP approaches.</p><p>The iterative scheme is pretty simple. It uses a queue of paths (incomplete cycles) to store work in progress. I will use "terminus" to indicate the last node in a path. When the terminus equals the anchor, we have a cycle. Due to my definition of "anchor", no node on a path can have a lower index than the anchor of the path.</p><p>The iterative scheme loops over possible anchors from 1 to $N.$ For each anchor, we initialize the queue with all paths consisting of a single edge incident at the anchor and not incident at any node less than the anchor. While the queue is not empty, we pop a path and create one or more new paths by extending that path with each qualifying edge.
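Anticipating the qualifying-edge conditions spelled out below, the whole search fits in a few dozen lines. The following is my own minimal Java sketch, not the code from the linked repository; it uses 0-based node indices, and it enforces the anchor and orientation conventions described above so that each cycle is reported exactly once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SimpleCycles {
  /** Finds all simple cycles in an undirected graph on nodes 0..n-1. */
  public static List<List<Integer>> findCycles(int n, int[][] edges) {
    // Build adjacency lists.
    List<List<Integer>> adj = new ArrayList<>();
    for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
    for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
    List<List<Integer>> cycles = new ArrayList<>();
    for (int anchor = 0; anchor < n; anchor++) {
      // Seed the queue with single-edge paths leaving the anchor,
      // avoiding any node with a smaller index than the anchor.
      Deque<List<Integer>> queue = new ArrayDeque<>();
      for (int v : adj.get(anchor)) {
        if (v > anchor) {
          List<Integer> p = new ArrayList<>();
          p.add(anchor); p.add(v);
          queue.add(p);
        }
      }
      while (!queue.isEmpty()) {
        List<Integer> path = queue.poll();
        int terminus = path.get(path.size() - 1);
        for (int v : adj.get(terminus)) {
          if (v == anchor) {
            // Closing edge found. Reject two-edge "cycles" (a-b-a) and
            // keep one orientation only: the node after the anchor must
            // be smaller than the node before it.
            if (path.size() >= 3 && path.get(1) < path.get(path.size() - 1)) {
              cycles.add(new ArrayList<>(path));
            }
          } else if (v > anchor && !path.contains(v)) {
            // Qualifying edge: extend the path and re-queue it.
            List<Integer> p = new ArrayList<>(path);
            p.add(v);
            queue.add(p);
          }
        }
      }
    }
    return cycles;
  }
}
```

On $K_4$ (the complete graph on four nodes) this finds the expected seven simple cycles: four triangles and three 4-cycles.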
A qualifying edge is one that meets the following conditions:</p><ul style="text-align: left;"><li>it is incident at the terminus of the current path;</li><li>it is not incident at any node already on the path (other than anchor);</li><li>it is not incident at any node less than the anchor; and</li><li>if it is incident at the anchor, it is not the same as the first edge of the path (i.e., we do not allow a "cycle" of the form $a\rightarrow b \rightarrow a$).<br /></li></ul><p>If the new edge leads to the anchor, we have a cycle. If not, we update the terminus and add the extended path to the queue, for further processing later. If a path cannot be extended because it is not a cycle and there are no qualifying edges left, we discard it.</p><p>Coding the algorithm is pretty straightforward, and it should work on any undirected graph (even those that are not connected). On the example from the OR SE post (9 nodes, 12 edges) it took a whopping 11 milliseconds to find all 13 simple cycles.</p><p>Java code for this and the two MILP models (the latter requiring a recent version of CPLEX) can be found in my <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">GitLab repository</a>. The iterative algorithm is in the <span style="font-family: courier;">CycleFinder</span> class. Stay tuned for the MILP models.<br /></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-77178391836408762632022-12-26T14:27:00.001-05:002022-12-26T14:27:41.558-05:00Selecting Dispersed Points<p>Fellow blogger Erwin Kalvelagen posted <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2022/12/maximum-number-of-points-with-minimum.html" target="_blank">a comparison</a> of two binary programming models, one quadratically constrained and one linearly constrained, for the problem of selecting a maximal number of points from a finite set subject to the requirement that no two selected points be closer than a specified distance. The models were an answer to a <a href="https://scicomp.stackexchange.com/questions/42321/selecting-most-points-from-a-set-of-points-with-distance-constraint" target="_blank">question</a> posted on Computational Science Stack Exchange. Not surprisingly, the linearized version tended to solve faster than the quadratic version.</p><p>In Erwin's linear model, the constraints take the form $$\underline{D}(x_i + x_j - 1) \le d_{i,j} \quad (1)$$ where $d_{i,j}$ is the distance between points $i$ and $j$, $\underline{D}$ is the minimum allowable distance between selected points, and $x_k$ is a binary variable indicating whether point $k$ is selected (1) or not (0). I coded both his models in Java, using CPLEX 22.1.1, along with another linear model where the constraints are expressed as $$x_i + x_j \le 1\quad (2)$$ for those pairs $(i,j)$ where $d_{i,j} \lt \underline{D}.$ In other words, we exploit the fact that we know the distances at the outset to precompute which pairs of points can / cannot coexist, and just rule out the pairs that cannot.
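Precomputing the conflicting pairs is a single pass over the distance matrix. The Java sketch below is illustrative only (the names are mine, not from my actual test code); each returned pair $(i,j)$ would generate one constraint of form (2).

```java
import java.util.ArrayList;
import java.util.List;

public class ConflictPairs {
  /**
   * Returns all pairs (i, j), i < j, whose Euclidean distance is below
   * minDist. Each such pair yields one constraint x_i + x_j <= 1;
   * pairs at distance >= minDist need no constraint at all.
   */
  public static List<int[]> conflicts(double[][] points, double minDist) {
    List<int[]> pairs = new ArrayList<>();
    for (int i = 0; i < points.length; i++) {
      for (int j = i + 1; j < points.length; j++) {
        double dx = points[i][0] - points[j][0];
        double dy = points[i][1] - points[j][1];
        if (Math.sqrt(dx * dx + dy * dy) < minDist) {
          pairs.add(new int[]{i, j});
        }
      }
    }
    return pairs;
  }
}
```

A side benefit of the precomputation is that the model only contains constraints for conflicting pairs, whereas form (1) generates one constraint for every pair regardless of distance.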
Since (1) is equivalent to $$x_i + x_j \le 1 + \frac{d_{i,j}}{\underline{D}},$$ constraint (2) is clearly at least a bit tighter than constraint (1).<br /></p><p>Erwin started with a problem size of 50 points for demonstration purposes, then doubled that to compare timing of this two models. I ratcheted the problem size up to 1,000 points to compare his linear model to mine. (I did not test the quadratic model at that size.) As with all things MIP, the results were not entirely consistent. In limited testing, the model using (2) was usually faster than the model using (1), but occasionally (1) proved faster. The run time differences were not large enough to be exciting. For instance, in one test run version (1) needed 4.633 seconds versus 2.987 seconds for version (2).</p><p>Overall, I can't say the time differences lived up to my expectations, and the fact that at least occasionally (1) was faster than (2) (perhaps due to some quirk in presolving, or just to some random choices in branching) is consistent with my experience that MIP behaviors are, well, not consistent. <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com4tag:blogger.com,1999:blog-8781383461061929571.post-2656940070063535962022-12-13T17:38:00.000-05:002022-12-13T17:38:10.113-05:00Scheduling a Round-Robin Tournament<p>Once in a while I come across a scheduling problem, and of course my first reaction is to think "integer program". A while back, for example, I used an IP model to work out a schedule for a rotating duplicate bridge game (in which both teams and hosts rotated) at the request of a buddy. Most recently, I saw a question on OR Stack Exchange ("<a href="https://or.stackexchange.com/questions/9545/doubles-round-robin-sorting-algorithm" target="_blank">Doubles Round Robin Sorting Algorithm</a>") related to tournament scheduling. 
</p><p></p><p>The problem involves six players playing pickleball (so two players on each team, with two sitting out, for each game). Given six players, there are 15 possible teams (pairings). Twelve games are scheduled each week for four weeks, with two sets of six games per week. The author of the question correctly computed that there are 45 possible game configurations. Over the course of the four weeks, he wanted every possible game played at least once (which leaves the final three game slots to be filled arbitrarily). The following conditions must be met:</p><ul style="text-align: left;"><li>within each set of six games, every player plays four games and sits out two;</li><li>no player sits out two consecutive games within the same set;</li><li>no two players are partners more than once per set.</li></ul><p>The problem poses no criterion for selecting among feasible schedules; we just want to find any one schedule that works. </p><p>My IP formulation can be found in my answer to the OR SE question. I will just mention that it uses binary variables $x_{g,s}=1$ if game $g$ goes in slot $s$ and 0 if not. Games are enumerated up front and indexed from 1 to 15. Slots refer to positions in the schedule and are numbered from 1 to 48, with six slots per set and 12 slots per week. Since no criterion is specified, we just use the default (minimize 0).</p><p>I also tried a constraint programming model, using general integer variables $s_1,\dots,s_{48},$ where $s_j \in {1,\dots,15}$ is the index of the game played in slot $j$ of the schedule. My CP formulation uses the "all different" constraint (fairly ubiquitous among CP solvers) to ensure that the first 45 slots each contain a different game. It uses the CP Optimizer "distribute" constraint to ensure that no game is played more than twice and the CP Optimizer "count" constraint to ensure that each player plays exactly four times per set (and thus sits twice) and that no team plays more than once in a set. 
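As a quick sanity check on the counts mentioned earlier (15 teams, 45 game configurations), the enumeration is easy to reproduce in code. The following Java sketch is purely illustrative and unrelated to my repository code: with six players, each two-player team has $\binom{4}{2}=6$ disjoint opponents, giving $15 \times 6 / 2 = 45$ games.

```java
import java.util.ArrayList;
import java.util.List;

public class PickleballGames {
  /** All two-player teams drawn from players 0..n-1. */
  public static List<int[]> teams(int n) {
    List<int[]> t = new ArrayList<>();
    for (int i = 0; i < n; i++)
      for (int j = i + 1; j < n; j++)
        t.add(new int[]{i, j});
    return t;
  }

  /** Counts games: unordered pairs of teams sharing no player. */
  public static int countGames(int n) {
    List<int[]> t = teams(n);
    int games = 0;
    for (int a = 0; a < t.size(); a++)
      for (int b = a + 1; b < t.size(); b++) {
        int[] p = t.get(a), q = t.get(b);
        boolean disjoint = p[0] != q[0] && p[0] != q[1]
            && p[1] != q[0] && p[1] != q[1];
        if (disjoint) games++;
      }
    return games;
  }
}
```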
I'm not sure how common those types of constraints are among other CP solvers, because I don't have much experience with them.</p><p>As it turns out, both CPLEX (for the IP model) and CP Optimizer (for the CP model) produced feasible schedules in about a second or so on my desktop PC. My intuition was that a CP model would be better suited to this type of problem, because all variables are integer and a CP solver would not be doing much if any matrix arithmetic. It would take a much larger test case to tell whether that intuition has any merit.</p><p>If anyone wants to play with the models, my Java code is available from my university GitLab <a href="https://gitlab.msu.edu/orobworld/roundrobin" target="_blank">repository</a>. Running it would require CPLEX and CP Optimizer (and not necessarily the most recent versions of either).<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-23095946496261555012022-11-06T15:52:00.002-05:002022-11-06T15:53:39.220-05:00A Bicriterion IP Model for a Game Strategy<p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">A <a href="https://math.stackexchange.com/questions/4569317/how-can-i-solve-this-optimization-problem-with-integer-variables" target="_blank">question</a> on Mathematics Stack Exchange asks how to solve an optimization problem related to a game apparently called "speedrunner". The problem boils down to selecting integer values for variables $n_a,$ $n_c$ and $n_{b,i}$ ($i=1,\dots,n_c$) so as to minimize elapsed time $$t_{\textrm{total}} =t_{a}n_{a}+\sum_{i=1}^{n_{c}}t_{b}n_{b,i}+t_{c}n_{c},$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">where $t_a$, $t_b$ and $t_c$ are parameters with specified values.
The sole constraint specified is that winnings, defined by</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$x_{\textrm{total}} = x_{0}n_{a}\prod_{i=1}^{n_{c}}n_{b,i}$$(where $x_0$ is another parameter), must be at least $\$1$ billion. We can infer from the requirement of positive earnings that $n_a \ge 1,$ $n_c \ge 1,$ and $n_{b,i} \ge 1$ for $i=1,\dots,n_c.$ For modeling purposes, we will allow the index $i$ to exceed $n_c$ and require that $n_{b,i}=0$ for $i \gt n_c.$ An integer linear programming model would be a no-brainer were it not for two things: the product form of the expression for winnings is nonlinear; and it involves the product of a variable number of variables. </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The approach I chose was to express the original variables in terms of binary variables. To do that, we first need upper bounds on the original variables. We get those by guessing an upper bound $T$ on $t_{\textrm{total}}.$ (If we guess too low, we will either get a solution that exceeds that upper bound or the model will become infeasible, either of which will tip us off to try a larger guess.) Once we have that upper bound, we get an upper bound for each variable by setting the other two variables as small as possible. That leads to the following upper bounds:$$N_{a} =\left\lfloor \frac{T-t_{b}-t_{c}}{t_{a}}\right\rfloor$$</p>
<p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$N_{b} =\left\lfloor \frac{T-t_{a}-t_{c}}{t_{b}}\right\rfloor$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">and </p>
<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$N_{c} =\left\lfloor \frac{T-t_{a}}{t_{c}+t_{b}}\right\rfloor .$$ </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The denominator in the third equation arises from the fact that adding 1 to $n_c$ not only directly costs $t_c$ units of time but also adds another $n_{b,i}$ variable with minimum value 1, costing $t_b$ units of time.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Armed with those upper bounds, we can introduce binary variables $y_j$ $(j=1,\dots,N_a),$ $w_j$ $(j=1,\dots, N_c)$ and $z_{i,j}$ $(i=1,\dots,N_c$ and $j=0,\dots,N_b)$ along with constraints making them type 1 special ordered sets (i.e., each set contains exactly one variable with value 1) and expanding the original variables in terms of them. Specifically:</p><ul style="text-align: left;"><li>$n_{a}=\sum_{j=1}^{N_{a}}j\cdot y_{j}$ with constraint $\sum_{j=1}^{N_{a}}y_{j}=1;$<style type="text/css">p, li { white-space: pre-wrap; }</style> </li><li><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$n_{c}=\sum_{j=1}^{N_{c}}j\cdot w_{j}$ with constraint $\sum_{j=1}^{N_{c}}w_{j}=1;$ and</p></li><li>$n_{b,i}=\sum_{j=0}^{N_{b}}j\cdot z_{i,j}$ for $i=1,\dots,N_{c}$
<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">with the following constraints for all $i\in\left\{ 1,\dots,N_{c}\right\}:$</p><ul><li>$\sum_{j=0}^{N_{b}}z_{i,j}=1$ (the value of $n_{b,i}$ is uniquely defined); and<br /></li></ul><style type="text/css">p, li { white-space: pre-wrap; }</style></li><ul><li>$z_{i,0}+\sum_{k=i}^{N_{c}}w_{k}=1$ ($n_{b,i}=0$ if and only if $n_{c}<i$).</li></ul></ul><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Armed with all that, the original objective function can be expressed as minimizing$$t_{a}n_{a}+t_{b}\sum_{i=1}^{N_{c}}n_{b,i}+t_{c}n_{c},$$where the only tweak is that we now sum over a constant number $N_c$ of terms rather than a variable number $n_c,$ relying on the fact that $n_{b,i}=0$ for $i>n_c.$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">That brings us to the nonlinear formula for winnings. The original constraint is $ x_{\textrm{total}} \ge 10^9,$ which we can rewrite as $\log(x_{\textrm{total}}) \ge \log(10^9)$ using whatever flavor logarithm you like. 
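The point of the log transform is that the product of variables becomes a sum of logs, which the binary expansion then makes linear. A tiny Java sketch of the check (the parameter values used in the test are made up for illustration; the actual values come from the Math SE question):

```java
public class LogWinnings {
  /** log of x0 * na * prod(nb[i]), computed as a sum of logs. */
  public static double logWinnings(double x0, int na, int[] nb) {
    double s = Math.log(x0) + Math.log(na);
    for (int n : nb) s += Math.log(n);
    return s;
  }

  /** True if winnings reach the target, tested entirely in log space. */
  public static boolean meetsTarget(double x0, int na, int[] nb,
      double target) {
    return logWinnings(x0, na, nb) >= Math.log(target);
  }
}
```

Any base works, as the post says; natural logs are used here only because `Math.log` is handy.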
Expanding the logs of $n_a$ and $n_{b,i}$ in terms of the binary variables, we get$$\log(x_{\textrm{total}}) = \log(x_{0})+\sum_{j=1}^{N_{a}}\log(j)\cdot y_{j} +\sum_{i=1}^{N_{c}}\sum_{j=1}^{N_{b}}\log(j)\cdot z_{i,j}.$$So the constraint that log winnings equal or exceed the log of $10^9$ is linear in the binary variables.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The solution in the Math Stack Exchange post $(n_a = 7,$ $n_c = 3$ and $n_{b,1}=n_{b,2}=n_{b,3}=18)$ turns out to be optimal given the parameter values in the question, and the IP model here will correctly achieve the minimum time (approximately 52.3 minutes) ... but it will not necessarily reproduce the best winnings within that time (approximately $\$1.02$ billion). That is because there are multiple feasible solutions that hit the correct time value, with $n_a = 7$ and with $n_c = 3$ but with the 54 cumulative $n_b$ units allocated differently (for instance, $\{19, 19, 16\}$ rather than $\{18, 18, 18\}$). To get the best possible solution, we can use a bicriterion model with lexicographically ordered objectives, first minimizing time and then maximizing winnings (by minimizing the negative of the log of winnings).</p><p></p><p>I tested that model using the Java API to CPLEX 22.1. My code is available from my <a href="https://gitlab.msu.edu/orobworld/speedrunner" target="_blank">university repository</a>.<br /></p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"><style type="text/css">p, li { white-space: pre-wrap; }</style></p><p><style type="text/css">p, li { white-space: pre-wrap; }</style></p>Paul A.
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-14095270023479677062022-11-01T11:59:00.000-04:002022-11-01T11:59:41.193-04:00Holt Double Exponential Smoothing<p>Given a time series $x_t,\, t=1,2,\dots,$ presumed to contain linear trend, <a href="https://en.wikipedia.org/wiki/Exponential_smoothing#Double_exponential_smoothing_(Holt_linear)" target="_blank">Holt's double exponential smoothing method</a> computes estimates $s_t$ of the level (mean) at time $t$ and $b_t$ of the (presumed constant) slope using the following formulas: $$s_t = \alpha x_t + (1-\alpha)(s_{t-1} + b_{t-1})$$ and $$b_t = \beta(s_t - s_{t-1}) + (1-\beta)b_{t-1}$$ where $\alpha,\beta \in (0,1)$ are smoothing weights. A recent <a href="https://or.stackexchange.com/questions/9217/forecasting-with-holts-linear-trend-method/" target="_blank">question on OR Stack Exchange</a> asked why the second formula is based on the level estimate and not the observed value. In other words, the proposed alternative to the trend update was $$b_t = \beta(x_t - x_{t-1}) + (1-\beta)b_{t-1}.$$</p><p>The intuition for doing it Holt's way is fairly simple. If exponential smoothing is working as intended (meaning <i>smoothing</i> things), then the difference in level estimates $s_t - s_{t-1}$ should be less variable than the difference in observed values $x_t - x_{t-1}.$ A formal proof probably involves induction arguments, requiring more functioning brain cells than I had available at the time, so I was a bit loose mathematically in my answer on OR SE. Just to confirm the intuition, I did some Monte Carlo simulations in R. The notebook containing the experimental setup, including code, is available <a href="https://rubin.msu.domains/blog/Holt.nb.html" target="_blank">here</a>. 
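The two update formulas are only a few lines of code in any language. Here is a minimal Java sketch (my experiments were in R; the initialization choices below are my own) applying Holt's updates, which on noiseless linear data reproduces the true slope exactly.

```java
public class HoltSmoothing {
  /**
   * Applies Holt's double exponential smoothing to a series (length >= 2)
   * and returns the final {level, trend} estimates.
   */
  public static double[] smooth(double[] x, double alpha, double beta) {
    double s = x[0];          // initial level: first observation
    double b = x[1] - x[0];   // initial trend: first difference
    for (int t = 1; t < x.length; t++) {
      double sPrev = s;
      // level update: blend the new observation with the projection
      s = alpha * x[t] + (1 - alpha) * (s + b);
      // trend update: blend the change in level with the prior trend
      b = beta * (s - sPrev) + (1 - beta) * b;
    }
    return new double[]{s, b};
  }
}
```

For the series $x_t = 3 + 2t$ the trend estimate stays pinned at 2 and the level tracks the series, regardless of the smoothing weights; the interesting behavior only appears once noise is added, which is what the simulation below examines.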
It requires the <span style="font-family: courier;">dplyr</span> and <span style="font-family: courier;">ggplot2</span> library packages.<br /></p><p>The following plots show confidence intervals over time for the errors in the level and trend estimates using both Holt's formula and what I called the "variant" method. They are from a single experiment (100 independent time series with identical slope and intercept, smoothed both ways), but other trials with different random number seeds and changes to the variability of the noise produced similar results.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUgHyaC2TcpZkTaIz56M66ivTu36aHkdHCisEYUsLmc_7rQdmZ2xYiVmj0QPE5YK0h_8NoYjlEN4C6aELVQThit3XqDUi6THWxVZ3w4KV53k-h6CL19t7anItawyIpmSDYPgij665mZBcYH_KQNoERrxFeEYFbR3xi7Ikp-0NXwqhKK2Z1FlRh8QxE/s579/Holt_level.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="plot of confidence intervals for error in level estimates" border="0" data-original-height="357" data-original-width="579" height="394" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUgHyaC2TcpZkTaIz56M66ivTu36aHkdHCisEYUsLmc_7rQdmZ2xYiVmj0QPE5YK0h_8NoYjlEN4C6aELVQThit3XqDUi6THWxVZ3w4KV53k-h6CL19t7anItawyIpmSDYPgij665mZBcYH_KQNoERrxFeEYFbR3xi7Ikp-0NXwqhKK2Z1FlRh8QxE/w640-h394/Holt_level.png" width="640" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDQmQIPfbMZ25b11FqmSHjgqyN1h0RfwKWn5rAKNHzEB1lvhuKpYEe17y4MiUIwcwTGwS6-wxTeHN0a6VA-Q8Qxiwuo1q3QT5ML1ABHMm7YJhdtCOLjl63-Y8XQKbq8Io1a8bruxkkgjv8Yi_-jiWy091hKV5TNu0UeHIag-ZNoM3UjyktlIdE_IVH/s579/Holt_trend.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="plot of confidence intervals for error in trend estimates" border="0" data-original-height="357" data-original-width="579" height="394" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDQmQIPfbMZ25b11FqmSHjgqyN1h0RfwKWn5rAKNHzEB1lvhuKpYEe17y4MiUIwcwTGwS6-wxTeHN0a6VA-Q8Qxiwuo1q3QT5ML1ABHMm7YJhdtCOLjl63-Y8XQKbq8Io1a8bruxkkgjv8Yi_-jiWy091hKV5TNu0UeHIag-ZNoM3UjyktlIdE_IVH/w640-h394/Holt_trend.png" width="640" /></a></div><br /><p>In both cases, the estimates start out a bit wobbly (and the Holt estimates may actually be a bit noisier), but over time both stabilize. There does not seem to be much difference between the two approaches in how noisy the level estimates are, at least in this run. The Holt estimates may have slightly narrower confidence intervals, but that is not clear, and the difference if any seems pretty small. The Holt trend estimates, however, are considerably less noisy than those of the variant method, supporting the intuitive argument.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-37764870158890072652022-09-13T17:55:00.001-04:002022-09-13T17:55:59.819-04:00With CPLEX, You're Not In Until You're In<style type="text/css"><!--
/**
* GeSHi (C) 2004 - 2007 Nigel McNie, 2007 - 2008 Benny Baumann
* (http://qbnz.com/highlighter/ and http://geshi.org/)
*/
.java {font-family:monospace;color: #006; border: 1px solid #d0d0d0; background-color: #f0f0f0;}
.java a:link {color: #000060;}
.java a:hover {background-color: #f0f000;}
.java .imp {font-weight: bold; color: red;}
.java .kw1 {color: #000000; font-weight: bold;}
.java .kw2 {color: #000066; font-weight: bold;}
.java .kw3 {color: #003399;}
.java .kw4 {color: #000066; font-weight: bold;}
.java .co1 {color: #666666; font-style: italic;}
.java .co2 {color: #006699;}
.java .co3 {color: #008000; font-style: italic; font-weight: bold;}
.java .coMULTI {color: #666666; font-style: italic;}
.java .es0 {color: #000099; font-weight: bold;}
.java .br0 {color: #009900;}
.java .sy0 {color: #339933;}
.java .st0 {color: #0000ff;}
.java .nu0 {color: #cc66cc;}
.java .me1 {color: #006633;}
.java .me2 {color: #006633;}
.java span.xtra { display:block; }
--></style>
<p>(For a suitable soundtrack for this post, try <a href="https://www.youtube.com/watch?v=OOWO--z1S8A" target="_blank">this</a> three-minute video.) <br /></p><p>A <a href="https://community.ibm.com/community/user/datascience/discussion/variables-annotations-in-cplex-for-automatic-benders-decomposition-java-api" target="_blank">question</a> popped up on the CPLEX support forum that reminded me of a slightly obscure and occasionally important aspect of the CPLEX programming APIs. What follows holds true for the Java API, and I believe it applies to the C++ and maybe C APIs. It may also hold for some of the other APIs.</p><p>The questioner, working in Java, was trying to use the "annotated Benders" approach to a MIP model. The programming issue actually is not specific to Benders decomposition, annotated or otherwise. The problem occurred in a loop where the user was defining integer variables and trying to associate them with the master problem. Here's the code:</p>
<div class="java"><span class="kw1">for</span><span class="br0">(</span><span class="kw4">int</span> k<span class="sy0">=</span><span class="nu0">0</span><span class="sy0">;</span> k <span class="sy0">&lt;</span> x.<span class="me1">length</span><span class="sy0">;</span> k<span class="sy0">++</span><span class="br0">)</span> <span class="br0">{</span> <br />
x <span class="br0">[</span>k<span class="br0">]</span> <span class="sy0">=</span> cplex.<span class="me1">intVar</span><span class="br0">(</span><span class="nu0">0</span>,cluster.<span class="me1">getNbVeh</span><span class="br0">(</span><span class="br0">)</span>,<span class="st0">"x_"</span><span class="sy0">+</span>k<span class="br0">)</span><span class="sy0">;</span> <br />
cplex.<span class="me1">setAnnotation</span><span class="br0">(</span>benders, x<span class="br0">[</span>k<span class="br0">]</span>,<br /> IloCplex.<span class="me1">CPX_BENDERS_MASTERVALUE</span><span class="br0">)</span><span class="sy0">;</span><br />
<span class="br0">}</span></div><div class="java"><span class="br0"> </span></div><p>The error message from CPLEX was "object is unknown to IloCplex", which was understandably confusing and yet, from personal experience, predictable.</p><p>Here is the problem: at the point that variable <span style="font-family: courier;">x[k]</span> is fed to the <span style="font-family: courier;">setAnnotation()</span> function, <span style="font-family: courier;">x[k]</span> is initialized (in the programming sense that it has content) but is <i>not yet part of the model instance</i> (since it has not been used in a constraint or objective function). In fact, even if it had been added to a constraint or objective expression, it would still not be part of the model until the constraint or objective was itself added to the model. This is despite the fact that the model instance (here <span style="font-family: courier;">cplex</span>) was used to create the variable.</p><p>One possible fix would be to use <span style="font-family: courier;">x[k]</span> in something that was added to the model <i>before</i> trying to set its annotation. A simpler fix is just to insert the line <span style="font-family: courier;">cplex.add(x[k]);</span> in between where <span style="font-family: courier;">x[k]</span> is defined and where the annotation is set. This nominally adds <span style="font-family: courier;">x[k]</span> to the model (even though not in any specific role), which is enough to make it a "known object" to CPLEX.</p><p>I learned (the hard way) about this long before CPLEX added support for Benders decomposition. When you look up solution values after solving a model, CPLEX provides convenient functions to look up either the value of a single variable or the value of an array of variables. In many applications, it is tempting to use the latter. At the same time, in some applications you may want to define an array of variables but not use all of them. 
For instance, suppose I have a network where some nodes have demands and some do not, and I want to define a binary variable <span style="font-family: courier;">x[k]</span> just at nodes <span style="font-family: courier;">k</span> that do have demand. If I define <span style="font-family: courier;">x</span> to have dimension equal to the number of nodes and then only initialize <span style="font-family: courier;">x[k]</span> at the nodes with demands, passing <span style="font-family: courier;">x</span> to a method that retrieves values will bomb because some of the entries of <span style="font-family: courier;">x</span> are null. So I cleverly initialize <span style="font-family: courier;">x[k]</span> for all <span style="font-family: courier;">k</span> ... and the attempt to retrieve values still bombs, because <span style="font-family: courier;">x[k]</span> was never included in the model (in any constraint or objective function) when node <span style="font-family: courier;">k</span> has no demand. Again, the solution is to call <span style="font-family: courier;">add(x[k])</span> on the model object before solving, which means the model knows about <span style="font-family: courier;">x[k]</span> even if it never appears in any part of the actual model.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-3375395025770255632022-07-19T19:33:00.001-04:002022-07-19T19:33:08.984-04:00"Block Party" Puzzle<p>A question posed on the OR Discord channel by a doctoral student led me to discover the existence of the <a href="https://www.janestreet.com/puzzles/" target="_blank">Jane Street puzzle page</a>. The student was asking about building a MILP model for the June 2022 puzzle, called "<a href="https://www.janestreet.com/puzzles/block-party-4-index/" target="_blank">Block Party 4</a>". The puzzle involves inserting numbers into a grid, with some cells already filled in. 
It bears a superficial resemblance to sudoku, but with a few key differences. Where a sudoku is divided into nine square regions of nine cells each, the block party puzzle grid is divided into connected regions of varying sizes and shapes. Within a region of $k$ cells, the numbers 1 through $k$ must be filled in. Finally, rather than requiring that rows and columns contain no repeated numbers, the rules require that, for each possible value $K$, if $K$ is inserted into a cell then the nearest instance of $K$ must be at distance exactly $K$ in the $L_1$ norm. So to use a 1, there must be a 1 in an adjacent cell. To use a 2, there must be a 2 in a cell two moves away but no 2 in any adjacent cell.</p><p>Since this is a problem with logic constraints and integer decisions, my instinct was to think that constraint programming would be faster than integer programming. To test this, I coded both an IP model and a CP model in Java, using CPLEX and CP Optimizer as the respective solvers. I assumed that the grid would be square, since both the June puzzle (10 x 10) and a smaller example provided (5 x 5) were. Both models can easily be adjusted for non-square grids.</p><p>Assume an $N\times N$ grid partitioned into regions, and let $M$ be the size of the largest region (and thus the largest value that can be used in the puzzle). Number the cells 1 through $N^2$ in any order. (I used a left-to-right raster scan.) For the IP model, I use binary variables $x_{i,j}$ $(i=1,\dots,N^2$, $j=1,\dots,M)$ to indicate whether value $j$ is inserted into cell $i$. For cells with known values, I fix $x_{i,j}$ to either 0 or 1 as appropriate while building the model. Also, if cell $i$ lies in a region of size $K$, then I can fix $x_{i,j}=0$ for $j>K.$</p><p>Since we just want a feasible solution, I let the IP objective function default to minimizing zero. The most obvious constraint is $$\sum_{j=1}^M x_{i,j} = 1 \quad \forall i,$$which forces a single value to be selected for each cell. 
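(As an aside, the sets of cells at a given exact $L_1$ distance from a cell, which the remaining constraints need, are easy to enumerate. A small Java sketch, assuming the raster-scan numbering just described; the method name and signature are mine, not from the actual code:)

```java
import java.util.ArrayList;
import java.util.List;

public class GridNeighbors {
    /**
     * Cells at L1 (Manhattan) distance exactly j from cell i in an n x n
     * grid, where cells are numbered 1..n*n in a left-to-right raster scan.
     * Brute-force scan of the grid; fine for the sizes in this puzzle.
     */
    public static List<Integer> atDistance(int i, int j, int n) {
        int row0 = (i - 1) / n; // row of cell i in the raster numbering
        int col0 = (i - 1) % n; // column of cell i
        List<Integer> result = new ArrayList<>();
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                if (Math.abs(row - row0) + Math.abs(col - col0) == j) {
                    result.add(row * n + col + 1);
                }
            }
        }
        return result;
    }
}
```

The set $N^-_j(i)$ of strictly closer cells can be built the same way by unioning the results for distances $1,\dots,j-1.$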
Similarly, if $B$ is a block with size $K,$ then $$\sum_{i\in B}x_{i,j}=1 \quad \forall j=1,\dots,K$$ forces every value from 1 to $K$ to be used exactly once in the block. Finally, for each block $B,$ each cell $i\in B$ and each legal value $j\in \lbrace 1, \dots, \vert B\vert\rbrace$ for that cell, we add these constraints: $$x_{i,j} \le \sum_{k\in N_j(i)} x_{k,j} $$ and $$x_{i,j} + x_{k,j} \le 1\quad \forall k\in N^-_j(i),$$ where $N_j(i)$ is the set of all cells at distance exactly $j$ from cell $i$ and $N^-_j(i)$ is the set of all cells at distance less than $j$ from cell $i$ (excluding cell $i$ itself). These enforce the rule that, for value $j$ to be used in cell $i,$ it must also be used in at least one cell at distance $j$ from $i$ and in no closer cell.</p><p>The CP model is a bit more straightforward to articulate. Again, there is no objective function, since we are just solving for a feasible solution. For each cell $i$, there is a single integer variable $x_i$ with domain $1,\dots,\vert B \vert$ where $B$ is the block containing cell $i.$ If we know that cell $i$ is fixed to value $k,$ we just declare $x_i$ to have domain $\lbrace k \rbrace.$ For each block, we use an "all different" constraint to enforce the requirement that the cells in the block take distinct values. For each cell $i$ and legal value $j$ for it, the implication constraint $$(x_i = j) \implies \bigvee_{k\in N_j(i)} (x_k = j)$$ where $\bigvee$ denotes disjunction ("or"), forces at least one cell at distance $j$ from $i$ to take value $j$ if cell $i$ does, while the constraints $$(x_i = j) \implies (x_k \neq j) \quad \forall k\in N^-_j(i)$$ prohibit any closer cell from using that value. (These constraints could be condensed into a conjunction on the right hand side. For reasons I have since forgotten, I did not bother to do so.)</p><p>Both models solved the 10x10 puzzle easily. My expectation was that the CP model would be faster, for several reasons. 
First, it has 100 general integer variables, whereas the IP model started out with 1,100 binary variables (which the presolver whittled down to 119 binary variables, compared to 90 variables for the CP model after presolving). Second, the "all different" CP constraint seems to be a more efficient way than a steaming pile of inequality constraints to enforce the rule that no two cells in the same block take the same value. Third, CP Optimizer would be doing integer arithmetic while CPLEX would be doing double precision arithmetic, and on a per-operation basis integer arithmetic should be faster. Lastly, my experience in the past has been that the one edge IP models tend to have over CP models is tighter bounds, but that has no effect in a feasibility problem (when you are not optimizing anything).</p><p>As it turns out, I was in for a surprise. Actually, make that two surprises. First, the IP model after presolving had 211 constraints, whereas the CP model after presolving had 7,399 constraints. Note that, in the implication constraints, the left side and each equality on the right side count as a constraint. I'm not sure how comparable constraints are between the two models, but I was not expecting the CP model to have so many more. Second, while both models solved in negligible time, the IP model was faster. CPLEX solved the IP model at the root node (no branching) in about 0.01 seconds on my fairly average desktop PC. CP Optimizer needed 2,678 branches and about 0.12 seconds to solve the CP model, of which 0.05 seconds was spent in the "engine" (i.e., solving) and the rest was spent in "extraction" (turning the model into something suitable for the engine).</p><p>My Java code (which requires both CPLEX and CP Optimizer but nothing else) can be found in my <a href="https://gitlab.msu.edu/orobworld/blockparty4" target="_blank">GitLab repository</a>.<br /></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-23367493808631263932022-07-15T14:22:00.000-04:002022-07-15T14:22:38.301-04:00Left Matrix Inverses in R<p>The following <a href="https://or.stackexchange.com/questions/8693/finding-all-left-inverses-of-a-matrix" target="_blank">question</a> popped up on OR Stack Exchange: given an $m\times n$ matrix $A$ with $m > n,$ how can one find <i>all</i> left inverses of $A$ in R? The author mentions that dimensions in their case would be around 200x10.<br /></p><p>A left inverse is a matrix $B\in \mathbb{R}^{n\times m}$ such that $B A = I.$ In order for a left inverse to exist, $A$ must have full column rank. If it does not, $Ax = 0$ for some nonzero $x\in \mathbb{R}^n,$ in which case $$x = Ix = (BA)x = B(Ax) = B\cdot 0 = 0,$$ a contradiction.</p><p>Henceforth we assume that $A$ has full column rank, in which case the <a href="https://en.wikipedia.org/wiki/Kernel_(linear_algebra)#Left_null_space" target="_blank">left null space</a> $\mathcal{N} \subseteq \mathbb{R}^m$ will have dimension $m-n.$ Now suppose that $C\in \mathbb{R}^{n\times m}$ has the property that every row of $C$ belongs to $\mathcal{N}.$ Then $CA = 0$ and so $(B+C)A = BA+0 = I,$ making $B+C$ another left inverse of $A.$ That means $A$ has an uncountably infinite number of left inverses. Conversely, if $DA=I$ then the rows of $D-B$ belong to $\mathcal{N}.$ So the set of left inverses of $A$ can be fully characterized by any individual left inverse and a basis for the left null space of $A.$</p><p>Getting this information in R is remarkably easy. 
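Outside of R, one particular left inverse can be computed directly from the normal equations: when $A$ has full column rank, $(A^{T}A)^{-1}A^{T}$ is a left inverse of $A$ (and in fact equals the Moore-Penrose pseudoinverse in that case). Here is a toy Java sketch using dense Gauss-Jordan elimination; for serious work, of course, use a proper linear algebra library.

```java
public class LeftInverse {
    /**
     * Computes the left inverse (A^T A)^{-1} A^T of a full-column-rank
     * m x n matrix A (m >= n) by Gauss-Jordan elimination with partial
     * pivoting on the augmented system [A^T A | A^T]. Illustrative only.
     */
    public static double[][] leftInverse(double[][] a) {
        int m = a.length, n = a[0].length;
        // build [G | A^T] where G = A^T A is n x n and A^T is n x m
        double[][] aug = new double[n][n + m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++)
                for (int k = 0; k < m; k++) aug[i][j] += a[k][i] * a[k][j];
            for (int j = 0; j < m; j++) aug[i][n + j] = a[j][i];
        }
        // reduce the left block to the identity; the right block becomes G^{-1} A^T
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(aug[r][col]) > Math.abs(aug[piv][col])) piv = r;
            double[] tmp = aug[col]; aug[col] = aug[piv]; aug[piv] = tmp;
            double d = aug[col][col]; // nonzero since A has full column rank
            for (int j = col; j < n + m; j++) aug[col][j] /= d;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = aug[r][col];
                for (int j = col; j < n + m; j++) aug[r][j] -= f * aug[col][j];
            }
        }
        double[][] b = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++) b[i][j] = aug[i][n + j];
        return b;
    }
}
```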
There are multiple ways to compute a left inverse, but if you have the <span style="font-family: courier;">pracma</span> library loaded, then the function <span style="font-family: courier;">pracma::pinv()</span> can be used to compute the <a href="https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse" target="_blank">Moore-Penrose pseudoinverse</a>, which is a left inverse of $A.$ To get a basis for the left null space of $A,$ we apply the function <span style="background-color: white;"><span style="font-family: courier;">pracma::nullspace()</span></span> to $A^T,$ which computes a basis for the right null space of the transpose of $A,$ and then transpose the results.</p><p>I have a small <a href="https://rubin.msu.domains/blog/left_inverses.nb.html" target="_blank">R notebook</a> that demonstrates this on a random 200x10 matrix.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-81045312963307406372022-07-14T19:52:00.001-04:002022-07-14T19:52:29.473-04:00Models for a Social Network Problem<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">An <a href="https://or.stackexchange.com/questions/8674/is-this-ilp-formulation-for-group-closeness-centrality-a-column-generation-appro" target="_blank">interesting question</a> recently popped up on Operations Research Stack Exchange. The setting is a graph $G=(V,E)$ in which path length is defined to be the number of edges on the path (i.e., all edges have weight 1). The problem is to select a subset $S\subset V$ of vertices with a specified cardinality $k$ so as to minimize the sum over all nodes of the distance from each node to the closest selected node. I will assume the graph is connected, since otherwise the objective might be undefined.</p>
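For later reference, the objective value for any candidate set $S$ can be evaluated directly with a multi-source breadth-first search, seeding the queue with all selected vertices at distance zero. A hedged Java sketch (the adjacency-list representation and names are mine, not from the models discussed below):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.Set;

public class Objective {
    /**
     * Sum over all vertices of the distance (edge count) to the nearest
     * selected vertex, via multi-source BFS. adj.get(v) lists the neighbors
     * of vertex v (0-based); the graph is assumed connected.
     */
    public static int evaluate(List<List<Integer>> adj, Set<Integer> selected) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, -1); // -1 marks unvisited vertices
        Queue<Integer> queue = new ArrayDeque<>();
        for (int s : selected) { dist[s] = 0; queue.add(s); }
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int u : adj.get(v)) {
                if (dist[u] < 0) { dist[u] = dist[v] + 1; queue.add(u); }
            }
        }
        int total = 0;
        for (int d : dist) total += d; // connectivity guarantees d >= 0
        return total;
    }
}
```

This is also a convenient sanity check on any incumbent solution a solver reports for the models below.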
<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br />The author of the original question indicated in a comment that the context for the problem is something involving social networks, and in another comment indicated that the diameters of the graphs are relatively constant regardless of graph size. The diameter of $G$, which I will denote by $\delta(G),$ is the maximum distance between any pair of vertices in $V.$ I will denote by $D$ the set $\left\{ 0,1,\dots,\delta(G)\right\} .$</p>
<p style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">
</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The author posed the following integer programming model, in which $x_{v,d}\in\left\{ 0,1\right\} $ is 1 if vertex $v$ has distance $d$ to the nearest selected vertex and 0 otherwise. Note that $x_{v,0}=1$ if and only if $v\in S.$ \begin{align*}
\min_{x} & \sum_{v\in V}\sum_{d\in D}d\cdot x_{v,d}\\
\text{s.t.} & \sum_{v\in V}x_{v,0}=k & (1)\\
& \sum_{d\in D}x_{v,d}=1\quad\forall v\in V & (2)\\
& x_{v,d}\le\sum_{u\in N_{d}(v)}x_{u,0}\quad\forall v\in V,\forall d\in D\backslash\left\{ 0\right\} & (3)
\end{align*}where $N_{d}(v)\subset V$ is the set of all nodes at (shortest) distance $d$ from $v.$ Constraint (1) ensures that $S$ has the correct cardinality, constraint (2) ensures that the distance of any node from $S$ is uniquely defined, and constraint (3) ensures that a node is at distance $d$ from $S$ only if at least one node at distance $d$ belongs to $S.$ I will refer to this as the "distance model".</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">
</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The author was asking about a possible incremental approach, but I got curious about alternative models. Frequent forum contributor Rob Pratt suggested changing constraint (3) to $$x_{v,d}\le\sum_{u\in N_{1}(v)}x_{u,d-1}\quad\forall v\in V,\forall d\in D\backslash\left\{ 0\right\} \quad(3'),$$ which says that for a node $v$ to be at distance $d,$ one of its neighbors must be at distance $d-1.$ I will call that "distance model 2". Meanwhile, I thought it might help to leave $x_{v,0}$ binary but make $x_{v,d}\in\left[0,1\right]$ continuous for $d>0$ (which might or might not be equivalent to using branching priorities to ensure that the $x_{v,0}$ variables were branched on before any of the other variables). I will call that "distance model 3".</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><style type="text/css">p, li { white-space: pre-wrap; }</style></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Someone else suggested what I will call the "assignment model", which uses one set of binary variables $x_{v}$ to determine which vertices are selected to be in $S$ and a second set of binary variables $y_{v,u}$ to indicate whether vertex $u$ is the closest vertex in $S$ to vertex $v.$ That model is as follows:\begin{align*}
\min_{x,y} & \sum_{u,v\in V}d_{v,u}y_{v,u}\\
\text{s.t.} & \sum_{v\in V}x_{v}=k & (4)\\
& \sum_{u\in V}y_{v,u}=1\quad\forall v\in V & (5)\\
& y_{v,u}\le x_{u}\quad\forall v,u\in V & (6)
\end{align*}where (4) enforces the size requirement for $S$, (5) stipulates that every vertex is assigned to a single selected vertex (possibly itself) and (6) makes sure that the assigned selected vertex is actually selected.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">
</p><p>Lastly, I came up with yet another model, which I will call the "flow model" since it is based on treating selected vertices as sinks and vertices outside $S$ as sources in a flow model. It uses binary variables $x_{v}$ to indicate whether a vertex is selected and continuous variables $y_{u,v}\in\left[0,\vert V\vert-k\right]$ for flow volumes. Since the edges are bidirectional, for each edge $(u,v)\in E$ there will be two flow variables, $y_{u,v}$ and $y_{v,u}$ (only one of which will be nonzero in the optimal solution). The idea is that each vertex that is not selected passes along any flow arriving at it plus one unit of new flow. Selected vertices soak up whatever flow comes in. We minimize the sum of the aggregate flows across all edges, which effectively charges each unit of flow 1 for each edge it crosses. The optimal solution will therefore send each unit of flow to the selected vertex (sink) closest to its source, making the cost of each unit of flow equal to the distance from source to nearest selected vertex. That model is as follows.\begin{align*}
\min_{x,y} & \sum_{(u,v)\in E}\left(y_{u,v}+y_{v,u}\right)\\
\text{s.t.} & \sum_{v\in V}x_{v}=k & (7)\\
& \sum_{u\in N_{1}(v)}\left(y_{v,u}-y_{u,v}\right)\ge1-\left(\vert V\vert-k\right)\cdot x_{v}\quad\forall v\in V & (8).
\end{align*}The by now familiar constraint (7) sets the size of the selected set. Constraint (8) says that the flow out of any vertex $v$ must be one greater than the flow in <i>unless</i> the vertex is selected (in which case nothing needs to flow out of it).</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">To assess how the various models perform, I coded them in Java using CPLEX 22.1 as the solver and ran a few experiments on randomly generated graphs. I did not do nearly enough experiments for any definitive conclusions, but I will quote the results of one that I think is somewhat informative. The test graph has $\vert V\vert=2,000,$ $\vert E\vert=23,360$ and diameter $\delta(G)=5.$ Each model was run for 10 minutes (not including the time spent constructing the model). The next table summarizes the results.</p><p><style type="text/css">p, li { white-space: pre-wrap; }</style></p>
<style type="text/css">
.tg {border:none;border-collapse:collapse;border-color:#bbb;border-spacing:0;}
.tg td{background-color:#E0FFEB;border-color:#bbb;border-style:solid;border-width:0px;color:#594F4F;
font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{background-color:#9DE0AD;border-color:#bbb;border-style:solid;border-width:0px;color:#493F3F;
font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-dvpl{border-color:inherit;text-align:right;vertical-align:top}
</style>
<div align="center">
<table class="tg">
<thead>
<tr>
<th class="tg-c3ow"><br /></th>
<th class="tg-c3ow">Distance</th>
<th class="tg-0pky">Distance2</th>
<th class="tg-0pky">Distance3</th>
<th class="tg-c3ow">Assignment</th>
<th class="tg-0pky">Flow</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-c3ow">Binary variables</td>
<td class="tg-dvpl">9,976</td>
<td class="tg-dvpl">12,000</td>
<td class="tg-dvpl">2,000</td>
<td class="tg-dvpl">4,002,000?</td>
<td class="tg-dvpl">2,000</td>
</tr>
<tr>
<td class="tg-c3ow">Total columns</td>
<td class="tg-dvpl">9,976</td>
<td class="tg-dvpl">12,000</td>
<td class="tg-dvpl">9,976</td>
<td class="tg-dvpl">4,002,000?</td>
<td class="tg-dvpl">48,720</td>
</tr>
<tr>
<td class="tg-c3ow">Total rows</td>
<td class="tg-dvpl">9,977</td>
<td class="tg-dvpl">12,001</td>
<td class="tg-dvpl">9,977</td>
<td class="tg-dvpl">2,001?</td>
<td class="tg-dvpl">2,001</td>
</tr>
<tr>
<td class="tg-c3ow">Nonzero coefficients</td>
<td class="tg-dvpl">4,017,952</td>
<td class="tg-dvpl">257,600</td>
<td class="tg-dvpl">4,017,952</td>
<td class="tg-dvpl">12,002,000?</td>
<td class="tg-dvpl">97,440</td>
</tr>
<tr>
<td class="tg-c3ow">Objective value</td>
<td class="tg-dvpl">3,202</td>
<td class="tg-dvpl">3,192</td>
<td class="tg-dvpl">3,181</td>
<td class="tg-dvpl">none</td>
<td class="tg-dvpl">3,506</td>
</tr>
<tr>
<td class="tg-c3ow">Lower bound</td>
<td class="tg-dvpl">3,139.7</td>
<td class="tg-dvpl">3,139.2</td>
<td class="tg-dvpl">3,143.3</td>
<td class="tg-dvpl">none</td>
<td class="tg-dvpl">1,980</td>
</tr>
</tbody>
</table></div>
<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">None of the models reach proven optimality, and in fact the assignment model hit the time limit early in the presolve phase, before CPLEX printed any statistics about dimensions. All the output I got was that presolve "has eliminated 0 rows and 0 columns..." The dimensions I listed are the dimensions for the paper model, before any presolve reductions.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">If you are wondering why the "Distance2" model (which is the original distance model with Rob's modification to constraint (3)) has larger row and column dimensions than the original, it turns out that both start with the same initial dimensions but the CPLEX presolver removes a bunch of rows and columns in the original model that it cannot remove from Rob's version. Still, Rob's version winds up with an order of magnitude fewer nonzero coefficients than the original version (or my tweak to it, "Distance3").</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Based on this and a few other preliminary runs, there are a handful of takeaways that I am somewhat comfortable in saying.</p><ul style="text-align: left;"><li> As graph size grows, the assignment model fairly quickly becomes too large to use. 
Hypothetically it could reach proven optimum faster than the others given a liberal time limit, but the betting line is against that.</li><li>Of the remaining models, my flow model has the most columns but the fewest rows and the fewest nonzeros. Unfortunately, it also has the worst performance. On a couple of other tests it did better, at least early on, with the incumbent solution value, but it seems to have the weakest bounds. In the interest of salvaging a little bit of my bruised pride, I will note that fewest nonzeros means that, as the graph grows, it might at some point be the only one of these models to fit in memory.</li><li>The tweak I made to the original model ("Distance3") did best in both incumbent value and bound ... on this example ... within the 10 minute run time limit. There is no guarantee this holds up in other cases, and it ties with the original model for largest constraint matrix (excluding the assignment model).</li><li>Rob's tweak has more rows and columns than the original model (or my tweak), but not hugely more, and it has the fewest nonzeros among the distance models. So it is definitely a contender for the production model, at least pending more testing.</li></ul>Source code (in Java) that generates a random graph and tests the model is available from <a href="https://gitlab.msu.edu/orobworld/social-networks" target="_blank">my repository</a>.<br /><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><style type="text/css">p, li { white-space: pre-wrap; }</style></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><style type="text/css">p, li { white-space: pre-wrap; }</style></p><style type="text/css">p, li { white-space: pre-wrap; }</style>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0