OR in an OB World: A mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere. By Paul A. Rubin.

<h3>Another R Quirk (2024-02-09)</h3><p>For the most part I like programming in R, but it is considerably quirkier than any other language I have used. I'm pretty sure that is what led to the development of what is now known as the "<a href="https://www.tidyverse.org/" target="_blank">Tidyverse</a>". The Tidyverse in turn introduces other quirks, as I've pointed out in a <a href="https://orinanobworld.blogspot.com/2023/10/the-trouble-with-tibbles.html" target="_blank">previous post</a>.</p><p>One of the quirks in base R caused me a fair amount of grief recently. The context was an interactive program (written in <a href="https://www.rstudio.com/products/shiny/" target="_blank">Shiny</a>, although that is beside the point here). At one point in the program the user would be staring at a table (the display of a data frame) and would select rows and columns for further analysis. The program would reduce the data frame to those rows and columns, and pass the reduced data frame to functions that would do things to it.</p><p>The program worked well until I innocently selected a bunch of rows and one column for analysis. That crashed the program with a rather cryptic (to me) error message saying that some function I was unaware of was not designed to work with a vector.</p><p>I eventually tracked down the line where the code died. The function I was unaware of apparently was baked into a library function I was using. As for the vector part, that was the result of what I would characterize as a "quirk" (though perhaps "booby trap" might be more accurate).
I'll demonstrate using the <span style="font-family: courier;">mtcars</span> data frame that automatically loads with R.</p><p>Consider the following code chunk.</p><p><span style="font-family: courier;">rows <- 1:3<br />cols <- c("mpg", "cyl")<br />temp <- mtcars[rows, cols]<br />str(temp)<br /></span></p><p>This extracts a subset of three rows and two columns from <span style="font-family: courier;">mtcars</span> and presents it as a data frame.</p><p><span style="font-family: courier;">'data.frame': 3 obs. of 2 variables:<br /> <span>$</span> mpg: num 21 21 22.8<br /> <span>$</span> cyl: num 6 6 4</span></p><p>So far, so good. Now suppose we choose only one column and rerun the code.</p><p><span style="font-family: courier;">rows <- 1:3<br />cols <- c("mpg")<br />temp <- mtcars[rows, cols]<br />str(temp)</span><br /></p><p>Here is the result.</p><p><span style="font-family: courier;">num [1:3] 21 21 22.8</span></p><p>Our data frame just became a vector. That was what caused the crash in my program.</p><p>Since I was using the <span style="font-family: courier;">dplyr</span> library elsewhere, there was an easy fix once I knew what the culprit was.</p><p><span style="font-family: courier;">rows <- 1:3<br />cols <- c("mpg")<br />temp <- mtcars[rows, ] |> select(all_of(cols))<br />str(temp)</span><br /></p><p>The result, as expected, is a data frame.</p><p> <span style="font-family: courier;">'data.frame': 3 obs. of 1 variable:<br /> $ mpg: num 21 21 22.8</span></p>There will be situations where you grab one column of a data frame and want it to be a vector, and situations (such as mine) where you want it to be a data frame, so the designers of the language have to choose which way to go. I just wish they had opted to retain structure (in this case the data frame) until it is explicitly dropped, rather than drop it without warning. (Base R does let you ask for that behavior explicitly: indexing with <span style="font-family: courier;">mtcars[rows, cols, drop = FALSE]</span> preserves the data frame structure. You just have to remember to use it.)<br />
<h3>Displaying a Nested Data Frame (2023-11-06)</h3><p>Almost anything (within reason) you want to do with data can be done in an R application ... if you can just find the correct libraries and functions. <br /></p><p>A program I'm writing in R, using the <a href="https://www.rstudio.com/products/shiny/" target="_blank">Shiny</a> web framework, needs to both display and export a data frame. Ordinarily, this would be straightforward: I would use <span style="font-family: courier;">datatable()</span>, <span style="font-family: courier;">renderDT()</span> and <span style="font-family: courier;">dataTableOutput()</span> from the <span style="font-family: courier;">DT</span> library for display, and something like <span style="font-family: courier;">write.csv</span> from the <span style="font-family: courier;">utils</span> library to export the data frame in a spreadsheet format. Unfortunately, none of that works this time. The problem is that the data frame comes from an HTTP POST request, with the data arriving as a <a href="https://www.json.org/json-en.html" target="_blank">JSON</a> object. The JSON object contains nested structures, which results in a data frame where some cells contain lists or data frames. Even after applying the <span style="font-family: courier;">flatten()</span> function from the <span style="font-family: courier;">jsonlite</span> library, the nesting persisted.</p><p>The data frame would render in Shiny as a table, but in cells containing nested structures the rendered table would just say something like "list()" or maybe "1 entry". The <span style="font-family: courier;">write.csv</span> function refused to write the table to a file.
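To make the failure concrete, here is a small reproduction with an invented list column (the real payload had more elaborate nesting, but the effect is the same):

```r
# Toy stand-in for the flattened POST data: "tags" is the kind of
# list column that nested JSON produces (column names invented here)
df <- data.frame(id = 1:2)
df$tags <- list(c("a", "b"), "c")

str(df)                     # tags shows up as a list column
# write.csv(df, "out.csv")  # fails with "unimplemented type 'list'"
```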
I noticed, though, that if I viewed the data frame in <a href="https://posit.co/products/open-source/rstudio/" target="_blank">RStudio</a> (using the <span style="font-family: courier;">View()</span> command), the nested structures would still say something like "list()" but clicking them would open them in a new window and hovering over them with the mouse would give an indication of the content. A quick check of the help entry for <span style="font-family: courier;">View()</span> showed that it uses the <span style="font-family: courier;">format.data.frame()</span> function from the base library. So I tried applying that function to my data frame before displaying or exporting it, and both problems were fixed. The displayed table showed the contents of each cell, and the flattened table could be exported to a CSV file. The structure of the nested data frames and lists was not preserved, but the user could at least see what was in those cells, which was good enough for my application.<br /></p>

<h3>Publishing a Shiny Application (2023-10-30)</h3><p>I've twice encountered problems while trying to deploy <a href="https://www.rstudio.com/products/shiny/" target="_blank">Shiny</a> applications on the <a href="https://www.shinyapps.io/">shinyapps.io</a> hosting platform, and since the fix was the same in both cases, I think a pattern is emerging. In both cases, the issue has to do with use of the <span style="font-family: courier;">includeMarkdown</span> function to display help text written in Markdown and stored in a file within the application. The <span style="font-family: courier;">includeMarkdown</span> function is provided by the <span style="font-family: courier;">htmltools</span> library, which is apparently loaded automatically by the shiny library.
So my code explicitly specifies <span style="font-family: courier;">library(shiny)</span> but does not load <span style="font-family: courier;">htmltools</span>.</p><p>In both problem cases, the code ran fine on my PC but not on the server. Why? A little "fine print" in the documentation of the <span style="font-family: courier;">includeMarkdown</span> function mentions that it requires the <span style="font-family: courier;">markdown</span> package. I have that installed on my PC, and apparently it gets loaded automatically. The server deployment system, though, apparently does not realize the need for it. So on the server my code loads the <span style="font-family: courier;">shiny</span> library explicitly, and that causes the server to include the <span style="font-family: courier;">htmltools</span> library but not the <span style="font-family: courier;">markdown</span> library.</p><p>The solution is trivial: just add <span style="font-family: courier;">library(markdown)</span> to the source code. The hard part is remembering to do it, given that it is unnecessary on my PC.<br /></p>

<h3>The Trouble with Tibbles (2023-10-28)</h3><p>Apologies to Star Trek fans for the title pun, but I couldn't resist. (If you're not a Trekkie, see <a href="https://en.wikipedia.org/wiki/The_Trouble_with_Tribbles" target="_blank">here</a> for clarification.)</p><p>I've been working on an interactive R application (using <a href="https://www.rstudio.com/products/shiny/" target="_blank">Shiny</a>, although that's probably irrelevant here). There are places where the code needs to loop through a data frame, looking for consecutive rows where either three text fields match or two match and one does not. Matching rows are copied into new data frames.
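The loop in question looked roughly like the following sketch (the field names and the exact grouping rule here are invented for illustration; the real code differs):

```r
# Toy data standing in for the real table (field names invented)
temp <- data.frame(f1 = c("a", "A", "b", "b"),
                   f2 = c("x", "X", "y", "y"),
                   f3 = c("p", "P", "q", "Q"))
same <- function(a, b) tolower(a) == tolower(b)  # case-insensitive comparison

groups <- list()  # completed groups of matching rows
start <- 1        # first row of the group currently being built
for (n in 2:nrow(temp)) {
  df <- temp[n, ]       # the row extraction the profiler flagged
  prev <- temp[n - 1, ]
  hits <- same(df$f1, prev$f1) + same(df$f2, prev$f2) + same(df$f3, prev$f3)
  if (hits < 2) {       # fewer than two fields match: close out the group
    groups[[length(groups) + 1]] <- temp[start:(n - 1), ]
    start <- n
  }
}
groups[[length(groups) + 1]] <- temp[start:nrow(temp), ]
length(groups)  # the toy data splits into 2 groups
```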
The data I'm testing the code on has a bit over 9,000 rows, and this process can take upwards of nine seconds -- not an eternity, but a bit annoying when you are sitting there waiting for it to hatch.</p><p>I decided to use the profiler in RStudio to see where time was being eaten up. Almost all of the nine seconds was blamed on two steps. The biggest time suck was doing case-insensitive string comparisons on the three fields, which did not come as a big surprise. I went into the profiling process thinking the other big time suck would be adding a data frame to a growing list of data frames, but that was actually quite fast. To my surprise, the number two consumer of time was "df <- temp[n, ]", which grabs row n from data frame "temp" and turns it into a temporary data frame named "df". How could such a simple operation take so long?</p><p>I had a hunch that turned out to be correct. Somewhere earlier in the code, my main data frame (from which "temp" was extracted) became a <a href="https://tibble.tidyverse.org/" target="_blank">tibble</a>. Tibbles are modernized, souped-up versions of data frames, with extra features but also occasional extra overhead/annoyances. One might call them the backbone of the <a href="https://www.tidyverse.org/" target="_blank">Tidyverse</a>. The Tidyverse has its adherents and its detractors, and I don't want to get into the middle of that. I'm mostly happy to work with the Tidyverse, but in this case using tibbles became a bit of a problem.</p><p>So I tweaked the line of code that creates "temp" to "temp <- ... %>% as.data.frame()", where ... was the existing code. Lo and behold, the time spent on "df <- temp[n, ]" dropped to a little over half a second. Somewhat surprisingly, the time spent on the string comparisons dropped even more.
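The row-extraction penalty is easy to reproduce in isolation. Here is a sketch (exact timings will vary with hardware and package versions, but the tibble loop is typically several times slower):

```r
library(tibble)

# Synthetic table roughly the size of the real one
df <- data.frame(f1 = sample(letters, 9000, replace = TRUE),
                 f2 = sample(letters, 9000, replace = TRUE),
                 f3 = sample(letters, 9000, replace = TRUE))
tb <- as_tibble(df)

# Extract every row one at a time, as the matching loop does
system.time(for (n in seq_len(nrow(df))) x <- df[n, ])  # plain data frame
system.time(for (n in seq_len(nrow(tb))) x <- tb[n, ])  # tibble: noticeably slower
```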
So the overall processing time fell from around 9 seconds to under 1.3 seconds, speeding up the code by a factor of almost 7.</p><p>I'll need to keep this in mind if I bump into any other places where my code seems unusually slow.</p>

<h3>Unnecessary LyX Backup Files (2023-09-16)</h3><p>If you edit a file from an older version of LyX and then save it, LyX will create an extra backup file with a lengthy name containing something about "version". As with other LyX backups, the file extension is ".lyx~". I believe this is a defense against the possibility that changes to the file format might muck up the file, and I have absolutely no problem with it. Let's call this the version backup.<br /></p><p>That said, for a long time now it has been the case that every time I created a new LyX file, saved it, then edited it and saved the edit, I would get both the standard backup file and (after the first edit only) the version backup. Subsequent saves would only produce the usual backup. This was a minor annoyance for me, since I was perpetually tracking down and deleting the version backups to reduce drive clutter (and cut down the time it took to back up my hard drive, which I do regularly).</p><p>Turns out there was a simple explanation (and fix) for this, which I found in a thread on a support forum. Like many users, at some point I created a new file, customized a few things, and then in Document > Settings..., clicked "Save as Document Defaults" to make those settings the defaults for all new documents. Doing so creates a file named "defaults.lyx" in the templates directory (which, on my Linux Mint system, is ~/.lyx/templates).
That file used the file format in effect when I saved it as the defaults, meaning every new file I've created since then has started with that older format and then been upgraded to whatever the current format is ... leading to all those version backups.</p><p>The fix was simple. I just created a new document and set it as the default template, which updated the template to the current LyX format (544 as of this writing). Now I just have to remember to repeat this whenever a new LyX version comes out with a new file format. (At my age, "I just have to remember" should not be taken lightly.)</p>

<h3>Randomly Generating a Connected Graph (2023-09-12)</h3><p>Someone posted a <a href="https://or.stackexchange.com/questions/10942/how-to-generate-random-connected-planar-graph" target="_blank">question</a> on Operations Research Stack Exchange about generating random test instances of a connected graph. (They specified a <a href="https://en.wikipedia.org/wiki/Planar_graph" target="_blank">planar graph</a>, but I am going to ignore that restriction, which I suspect was unintended and which I am pretty sure would make the problem much harder.) The premise is fairly straightforward. You start with a fixed set of nodes, partitioned into three subsets (supply, transshipment and demand nodes). The goal is to select a random set of edges of fixed cardinality such that (a) the resulting graph is connected and (b) no edge connects two supply nodes or a supply node and a demand node. Setting the number of edges too low will obviously make the problem infeasible. The author specified that the graph could be either directed or undirected. My solutions are coded for undirected graphs but can be adapted to directed graphs.</p><p>I proposed two solutions, both of which I coded and tested in Java.
One uses a mixed integer programming (MIP) model, which my code solves using CPLEX. Let $E$ be the set of all valid edges, $N$ the number of nodes and $K$ the desired number of edges. The MIP model uses binary variables $x_{i,j}$ for all $(i,j)\in E.$ The requirement to include $K$ edges is handled by the constraint $$\sum_{(i,j)\in E} x_{i,j} = K$$ and randomness is achieved by minimizing $$\sum_{(i,j)\in E} c_{i,j} x_{i,j}$$ where the cost coefficients $c_{i,j}$ are generated randomly. </p><p>That leaves the matter of forcing the graph to be connected. To do that, we introduce forward and backward flow variables $f_{i,j} \ge 0$ and $r_{i,j} \ge 0$ for each edge $(i,j),$ where $f_{i,j}$ is interpreted as flow from $i$ to $j$ and $r_{i,j}$ as flow from $j$ to $i.$ The concept is to single out one node (call it $s$) as a source for $N-1$ units of flow of some mystery commodity and assign every other node a demand of one unit of the commodity. For $i\neq s,$ the flow conservation constraint is $$\sum_{(j,i)\in E} (f_{j,i} - r_{j,i}) + \sum_{(i,j)\in E} (r_{i,j} - f_{i,j}) = 1,$$ which says that flow in minus flow out equals 1. To ensure that flow occurs only on selected edges, we add the constraints $$f_{i,j} \le (N-1) x_{i,j}$$ and $$r_{i,j} \le (N-1) x_{i,j}$$ for all $(i,j)\in E.$</p><p>The edge removal heuristic is a bit simpler. We start by selecting all eligible edges and shuffling the list randomly. While the number of edges in the graph exceeds $K,$ we pop the next edge from the list, remove it, and test whether the graph remains connected. If yes, the edge is permanently gone. If no, we replace the edge in the graph (but not on the list) and move to the next edge in the list. To confirm that removing edge $(i,j)$ leaves a connected graph, we start from node $i$ and find all nodes that can be reached from $i$ in one step (i.e., remain adjacent). 
Then we find all nodes adjacent to those nodes, and continue until either we encounter node $j$ (in which case the graph is still connected) or run out of nodes to test (in which case the graph is now disconnected).</p><p>The edge removal heuristic is simpler to explain, does not require a MIP solver and likely is faster. There are two potential disadvantages compared to the MIP approach. One is that the MIP approach is done after solving the model once, whereas the edge removal heuristic may fail to produce a connected graph due to an "unlucky" choice of which edges to remove, requiring one or more restarts with different random number seeds. The other is that if $K$ is set too low (making a connected graph impossible), the MIP model will detect that the problem is infeasible. With the edge removal heuristic, you would not be able to distinguish infeasibility from bad luck (although if the heuristic failed multiple times with different random seeds, infeasibility would start to look likely). In very, very limited testing of my code, the edge removal heuristic was definitely faster than the MIP and achieved a valid result in the first pass each time.</p><p>Java code for both methods is in my <a href="https://gitlab.msu.edu/orobworld/randomconnectedgraph" target="_blank">code repository</a>.<br /></p>

<h3>The "Set" Card Game (2023-08-12)</h3><p>Someone asked a <a href="https://or.stackexchange.com/questions/10820/set-game-how-to-generate-all-sets-with-a-mip" target="_blank">question</a> (actually a pair of questions) about the <a href="https://en.wikipedia.org/wiki/Set_(card_game)" target="_blank">"Set" card game</a> on OR Stack Exchange. This was the first I had heard of the game. For purposes of this post, the game play is irrelevant.
The key is that it uses a deck of 81 cards, each of which has four "features" with three possible values for each feature. Every combination of feature values appears on exactly one card. Three cards form a "set" if, for each of the four features, either all three cards have the same value or all three cards have different values. As noted in the Wikipedia page, there are 1,080 possible "sets". (I'm putting "set" in quotes to avoid confusion with, you know, sets.) The OR SE question asked if an integer program (IP) could be used to generate all 1,080 sets.</p><p>The answer is yes, with the caveat that an IP model is not the most efficient way to do it. I won't repeat my IP model here, since (a) it's a bit of a space hog and (b) you can read it in my answer on OR SE. The model is pretty small (97 binary variables) and finds one "set" quickly. There are two ways to get it to find all possible "sets". Both involve adding a "no good" constraint for each solution found, i.e., a constraint that eliminates that solution but no others. The more tedious approach solves the progressively growing model 1,081 times (the last time producing a declaration of infeasibility, meaning all "sets" have been found). The less tedious approach requires a solver that supports a callback function letting you reject candidate solutions by adding a constraint cutting them off (the same "no good" constraint used in the first approach).</p><p>Since there is no objective and the constraints are mostly logical in nature, constraint programming (CP) strikes me as a more intuitive approach than IP. The specifics of a CP model will depend on what types of constraints your CP solver supports, but I think the model I tried is likely to work with a lot of solvers. Since I coded the models in Java, I'll use 0-based indexing here.
My variables (all integer-valued) are: $x_s\in \lbrace 0,\dots, 80\rbrace,$ the index of the card in slot $s\in \lbrace 0,1,2\rbrace$ of the "set"; and $y_{f,s}\in \lbrace 0,1,2\rbrace,$ the value of feature $f$ in the card in slot $s.$ The constraints are as follows:</p><ul style="text-align: left;"><li>allDifferent($x$) (no two slots can contain the same card);</li><li>$x_0 \lt x_1 \lt x_2$ (to avoid counting the same "set" more than once, we require that the cards be listed in ascending index order);</li><li>$y_{f,s} = a_{x_s,f},$ where $a_{c,f}$ is the value of feature $f$ in card $c$ (the value of a feature in a slot comes from the card in the slot); and</li><li>allDifferent$(\lbrace y_{f,0}, y_{f,1}, y_{f,2}\rbrace) \vee (y_{f,0}=y_{f,1} \wedge y_{f,1}=y_{f,2})\,\forall f$ (for every feature, either all three slots have different values or all three slots have the same value).</li></ul><p>This exploits two features that CP solvers typically have but IP solvers do not. One is the "all different" constraint, which forces a set of variables to take values that do not repeat. The other is the ability to use a decision variable as an index into either a constant vector or another vector of variables. That comes into play in the third constraint, where the variable $x_s$ (the index of the card in slot $s$) is used as a subscript of $a$ to get the feature value of that card.</p><p>Even CP is probably overkill for this problem, though. We can compute the 1,080 possible solutions by brute force. I used a recursive function for this.</p><p>Here is how each method stacks up in terms of computation time.
As I said, I coded it in Java, using CPLEX 22.1.1 for the IP models and CP Optimizer 22.1.1 for the CP model.</p><ul style="text-align: left;"><li>brute force took between 0.1 and 0.2 seconds;</li><li>the CP model took between 0.1 and 0.55 seconds;</li><li>solving the IP model once using a callback took between 1.4 and 3 seconds; and</li><li>solving the IP model 1,081 times (without a callback) took around 56 seconds, give or take.</li></ul><p>Solution times for all methods are a bit random (they change when I rerun the program), so treat them as approximate. In the interest of full disclosure, I throttled CPLEX to a single thread for the callback approach. Using multiple threads would have required both synchronizing the code (a pain) and checking for repeated solutions (since hypothetically two parallel threads could independently locate the same "set"). The moral, to the extent there is one, is that sometimes brute force beats more sophisticated approaches.</p><p>You can find my Java code <a href="https://gitlab.msu.edu/orobworld/set-game" target="_blank">here</a> if interested.<br /></p>

<h3>Visiting All Nodes (2023-07-30)</h3><p>A <a href="https://or.stackexchange.com/questions/10762/how-can-i-find-the-shortest-path-visiting-all-nodes-in-a-connected-graph-as-milp" target="_blank">question</a> on Operations Research Stack Exchange asks about finding a shortest route through a connected graph that visits every node.
This is distinct from the well-known <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem" target="_blank">traveling salesman problem</a> (TSP), in that the TSP requires each node to be visited exactly once whereas the OR SE question explicitly allows nodes to be visited more than once.</p><p>The question reminded me of a conversation I had years ago with a doctoral student in our logistics program. He was working on routing deliveries (via truck) from a depot to various customers, and was trying to formulate it as a TSP using a graph based on a road map (buildings or intersections as nodes, roads as edges). I did my best at the time to convince him that the TSP was unnecessarily restrictive and might even make the problem infeasible. The second example in the OR SE question is a nice example of how that could happen. Generally speaking, in an actual routing problem there is nothing stopping you from passing by/through a previously visited location if that's the best route. There are a few exceptions to this, some in military applications (you're laying mines and/or blowing up bridges as you pass them) and some in normal life (you're speeding and might not want to return to some place where a traffic cop would remember you), but for the most part I believe it holds.</p><p>My advice to the doctoral student was similar to the answer Rob Pratt posted on OR SE. Rethink the graph, keeping just the depot and customers as nodes and letting the edges in the new graph represent the shortest routes between pairs of nodes. So edge $(i,j)$ and edge $(k,\ell)$ connect different pairs of nodes but might represent physical routes that cross (you pass through the same intersection on both, albeit possibly in different directions) or overlap (both take US 127 north from mile marker 102 to mile marker 137). Building the edges in the new graph involves solving a shortest path problem between each pair of nodes (using the original graph).
You can apply <a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm" target="_blank">Dijkstra's algorithm</a> to each pair of nodes, but I'm pretty sure applying the <a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm" target="_blank">Floyd-Warshall algorithm</a> would be faster. Pro tip: if you use Floyd-Warshall, don't bother to extract the route for each pair of nodes. Just get the distances, and keep the F-W solution handy. Once you have solved a TSP on the modified graph and know which edges will be used, you can extract the physical routes corresponding to just the used edges from the F-W solution.</p><p>I posted an explicit MIP model (which I will not repeat here, since it's a space hog) as an answer to the OR SE question. I suspect that Rob's approach (TSP on a modified graph) might be faster, particularly because it allows you to use a dedicated TSP solver. Although TSPs have a reputation for being difficult (being the poster child for NP-hard problems), bespoke solvers like <a href="https://www.math.uwaterloo.ca/tsp/concorde.html" target="_blank">Concorde</a> can chew through rather large TSPs pretty quickly. Other than the upfront computational cost of finding the shortest path between every pair of nodes, the one concern I might have with the TSP approach is that it converts what might have been a sparse graph to a complete graph. So, like all things MIP, which approach is better is an empirical question.<br /></p>
<h3>An Unbounded Bounded Feasible Region (II) (2023-06-18)</h3><p>In my <a href="https://orinanobworld.blogspot.com/2023/06/an-unbounded-bounded-feasible-region-i.html" target="_blank">previous post</a>, I cited a <a href="https://or.stackexchange.com/questions/10579/randomly-constructing-a-bounded-ellipsoid" target="_blank">question</a> on Operations Research Stack Exchange about an allegedly unbounded feasible region defined by a randomly generated quadratic constraint. In that post, I presented what I believe is a valid proof that, at least in theory, the feasible region should be bounded with probability 1.</p><p>Unfortunately, in our digital world, theory and computation sometimes diverge. To test whether I was correct, I coded several mixed integer quadratically constrained programming (MIQCP) models to assess the boundedness of $X$, and applied them to a small sample of test problems (using Java and CPLEX 22.1.1).
I set the dimension of the test problems to $n=5$ (for no particular reason).<br /></p><p>All three models contained the variable $x$ and the constraint $x^{\prime}Qx+q^{\prime}x \le -q_{0}.$ Two of the models attempted to answer the question directly by maximizing either the $L_1$ or $L_\infty$ norm of $x$ over $X$.</p><p>For the $L_\infty$ model, I added a continuous variable $y$ and binary variables $z_i$ and $w_i$ together with the constraints $$\sum_i z_i + \sum_i w_i = 1,$$ $$y \ge x_i,$$ $$y \ge -x_i,$$ $$ z_i = 1 \implies y = x_i$$ and $$w_i = 1 \implies y = -x_i.$$ The combined effect of these is to force $y = \vert x_i \vert$ for some $i$ where $$\vert x_i \vert = \max_j \vert x_j \vert = \parallel x \parallel_\infty.$$ The objective was to maximize $y.$<br /></p><p>For the $L_1$ model, I added continuous variables $y_i \ge 0$ and constraints $y_i = \vert x_i \vert.$ (Internally, CPLEX adds binary variables and does big-M magic to linearize the use of absolute values.) The objective was to maximize $\sum_i y_i = \parallel x \parallel_1.$</p><p>My third approach was iterative. The model was almost the same as the $L_\infty$ model, except that rather than maximizing $y$ I set the objective to minimize 0 (meaning the first feasible solution wins) and set the lower bound of $y$ to some value $M.$ If the model found a feasible solution (an $x\in X$ such that $\parallel x \parallel_\infty \ge M$), I doubled $M$ and tried again, until either $M$ exceeded some upper limit or the solver said the problem was infeasible (meaning $X$ is bounded in $L_\infty$ norm by the last value of $M$).</p><p>You can find <a href="https://gitlab.msu.edu/orobworld/ellipsoid" target="_blank">my Java code</a> (self-contained other than needing CPLEX) in my repository. Here are the results from a handful of test runs.</p><p></p>
<div style="text-align: center;"><style type="text/css">
.tg {border:none;border-collapse:collapse;border-color:#bbb;border-spacing:0;}
.tg td{background-color:#E0FFEB;border-color:#bbb;border-style:solid;border-width:0px;color:#594F4F;
font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{background-color:#9DE0AD;border-color:#bbb;border-style:solid;border-width:0px;color:#493F3F;
font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
</style></div>
<table align="center" class="tg">
<thead align="center">
<tr>
<th class="tg-c3ow">Random Seed</th>
<th class="tg-c3ow">Max L_infinity<br /></th>
<th class="tg-c3ow">Max L_1<br /></th>
<th class="tg-c3ow">Iterative<br /></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td class="tg-c3ow">123<br /></td>
<td class="tg-c3ow">unbounded<br /></td>
<td class="tg-c3ow">unbounded<br /></td>
<td class="tg-c3ow">bounded (100)<br /></td>
</tr>
<tr align="center">
<td class="tg-c3ow">456</td>
<td class="tg-c3ow">unbounded<br /></td>
<td class="tg-c3ow">unbounded<br /></td>
<td class="tg-c3ow">bounded (100)</td>
</tr>
<tr align="center">
<td class="tg-c3ow">789</td>
<td class="tg-c3ow">unbounded</td>
<td class="tg-c3ow">unbounded</td>
<td class="tg-c3ow">bounded (1600)</td>
</tr>
<tr align="center">
<td class="tg-c3ow">12345</td>
<td class="tg-c3ow">unbounded</td>
<td class="tg-c3ow">bounded (22.1)</td>
<td class="tg-c3ow">bounded (100)</td>
</tr>
<tr align="center">
<td class="tg-c3ow">61623</td>
<td class="tg-c3ow">bounded (12.4)</td>
<td class="tg-c3ow">unbounded</td>
<td class="tg-c3ow">bounded (100)</td>
</tr>
<tr>
<td class="tg-c3ow" style="text-align: center;">20230616</td>
<td class="tg-c3ow" style="text-align: center;">bounded (120.9)</td>
<td class="tg-c3ow" style="text-align: center;">unbounded</td>
<td class="tg-c3ow" style="text-align: center;">bounded (200)</td>
</tr>
</tbody>
</table><p>The numbers in parentheses are upper bounds on the $L_1$ norm in the third column and the $L_\infty$ norm in the second and fourth columns. The bound found by the iterative method is never tight, so a bound of 100 means $\max_{x \in X} \parallel x \parallel_\infty \le 100.$ As you can see, the iterative method always found the ellipsoids to be bounded, consistent with the mathematical argument in the previous post. The other two models frequently found the problem to be "unbounded", though they did not always agree on that. This is a bit confusing (OK, very confusing). In particular, the "Max L_infinity" and "Iterative" models differ only in whether you are maximizing $y$ or looking for any solution with $y\ge M,$ so saying (when the seed is 123) that the supremum of $y$ is $\infty$ but $y$ cannot exceed 100 is grounds for another beer (or three).</p><p>Something is apparently going on under the hood in CPLEX that is beyond me. Meanwhile, I'm sticking to my belief that $X$ is always bounded.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-90771706836036592872023-06-17T15:27:00.000-04:002023-06-17T15:27:56.335-04:00An Unbounded Bounded Feasible Region (I)<p>If you find the title of this post confusing, join the club! A question on Operations Research Stack Exchange, <a href="https://or.stackexchange.com/questions/10579/randomly-constructing-a-bounded-ellipsoid" target="_blank">"Randomly constructing a bounded ellipsoid"</a>, sent me down a rabbit hole, eventually leading to this opus. I'm going to make a couple of notational tweaks to the original question, but the gist is as follows. 
We have an elliptical feasible region $X = \lbrace x \in \mathbb{R}^n : f(x) \le 0 \rbrace$ where $f(x) = x^\prime Q x + q^\prime x + q_0.$ (One of my tweaks is to absorb the author's factor $1/2$ into $Q.$) $Q\in \mathbb{R}^{n \times n},$ $q\in \mathbb{R}^n$ and $q_0\in \mathbb{R}$ are generated by sampling random numbers from the standard normal distribution. In the case of $Q,$ we sample an $n\times n$ matrix $H$ and then set $Q = \frac{1}{2} H^\prime H.$ (My introducing the symbol $H$ for the sampled matrix is the other notational tweak.) Note that $Q$ is automatically symmetric and positive semidefinite, and is positive definite with probability 1. (For it not to be positive definite, $H$ would have to have less than full rank, which has zero probability of occurring.) I should point out here that saying something has probability 1 or 0 assumes that the random number generator works as advertised.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The author of the question said that in their experience $X$ was unbounded "most of the time." That struck me as impossible, and after a bunch of scribbling on marker boards I finally came down to what I think is a correct argument that $X$ must be bounded. Let $\left\{ x_{1},\dots,x_{n}\right\} $ be an orthonormal basis of eigenvectors of $Q,$ with $Qx_{i}=\lambda_{i}x_{i}$ and $$x_i^\prime x_j =\begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases}$$</p><span style="white-space: pre-wrap;">(I'll leave the proof that such a basis exists to the reader as an exercise.)</span><p style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Now suppose that $X$ is unbounded, meaning that we can find $x\in X$ with $\parallel x\parallel$ arbitrarily large. Pick such an $x$ and set $M=\parallel x\parallel.$ Write $x$ in terms of the basis: $x=\sum_{i}a_{i}x_{i}.$ Observe that $$M^{2}=\parallel x\parallel^{2}=x^{\prime}x=\sum_i \sum_j a_i a_j x_i^\prime x_j = \sum_{i}a_{i}^{2}\left(x_{i}^{\prime}x_{i}\right)=\sum_{i}a_{i}^{2}.$$Expanding $f(x),$ we have </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">\begin{align*} f(x) & =\left(\sum_{i}a_{i}x_{i}\right)^{\prime}Q\left(\sum_{i}a_{i}x_{i}\right)+q^{\prime}\left(\sum_{i}a_{i}x_{i}\right)+q_{0} \\ & =\sum_{i,j}a_{i}a_{j}\left(x_{i}^{\prime}Qx_{j}\right)+\sum_i a_{i}\left(q^{\prime}x_{i}\right)+q_{0} \\ & =\sum_{i,j}a_{i}a_{j}\lambda_{j}\left(x_{i}^{\prime}x_{j}\right)+\sum_{i}a_{i}\left(q^{\prime}x_{i}\right)+q_{0} \\ & =\sum_{i}a_{i}^{2}\lambda_{i}+\sum_{i}a_{i}\left(q^{\prime}x_{i}\right)+q_{0}. 
\end{align*}</p><p></p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Since $x\in X,$ $\sum_{i}a_{i}^{2}\lambda_{i}\le-\sum_{i}a_{i}\left(q^{\prime}x_{i}\right)-q_{0}.$ According to the Cauchy-Schwarz inequality, $\vert q^{\prime}x_{i}\vert\le\parallel q\parallel\parallel x_{i}\parallel=\parallel q\parallel,$ so we have $$\sum_{i}a_{i}^{2}\lambda_{i}\le-\sum_{i}a_{i}\left(q^{\prime}x_{i}\right)-q_{0}\le\sum_{i}\vert a_{i}\vert\parallel q\parallel+\vert q_{0}\vert.$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">On the other hand, if $\Lambda=\min_{i}\lambda_{i}>0,$ then $$\sum_{i}a_{i}^{2}\lambda_{i}\ge\Lambda\sum_{i}a_{i}^{2}=\Lambda M^{2}.$$ Combining these, $$\Lambda M^{2}\le\parallel q\parallel\sum_{i}\vert a_{i}\vert+\vert q_{0}\vert.\quad (1)$$</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">Now let $A=\max_{i}\vert a_{i}\vert$ and assume without loss of generality that $\vert a_{1}\vert=A.$ Since $M^{2}=\sum_{i}a_{i}^{2},$ $A^{2}=M^{2}-\sum_{i>1}a_{i}^{2}\le M^{2}$ and so $0<A\le M.$ Meanwhile, $M^{2}=\sum_{i}a_{i}^{2}\le nA^{2},$ which implies $A\ge\frac{M}{\sqrt{n}}.$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Dividing both sides of (1) by $A,$ we have $$\Lambda M\le\Lambda\frac{M^{2}}{A}\le\parallel q\parallel\sum_{i}\frac{\vert a_{i}\vert}{A}+\frac{\vert q_{0}\vert}{A}\le\parallel q\parallel n+\frac{\vert q_{0}\vert}{A}.\quad (2)$$</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The left side of (2) grows without limit as $M$ increases, while the right side remains bounded: $A\ge M/\sqrt{n}$ grows with $M,$ so the term $\vert q_{0}\vert/A$ shrinks while $\parallel q\parallel n$ is constant. For $M$ sufficiently large, this is a contradiction.</p>
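<p>As a quick numerical sanity check (mine, not part of the argument above), one can sample $H$ and confirm that $Q=\frac{1}{2}H^{\prime}H$ is positive definite, so that $\Lambda=\min_{i}\lambda_{i}>0$ as the argument requires:</p>

```python
# Sanity check (illustrative; not from the post): Q = 0.5 * H'H built from
# a sampled H should be positive definite, i.e. its minimum eigenvalue > 0.
import numpy as np

rng = np.random.default_rng(20230617)
n = 5
H = rng.standard_normal((n, n))   # random H, full rank with probability 1
Q = 0.5 * H.T @ H                 # symmetric positive (semi)definite
Lam = float(np.linalg.eigvalsh(Q).min())  # Lambda = smallest eigenvalue
assert Lam > 0.0                  # positive definite in practice
```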
<p>So, barring an error in the above, we have a mathematical proof that $X$ must be bounded. In the next post I will explore the computational side of things.<br /></p>
Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-65639951029435806652023-05-08T18:49:00.000-04:002023-05-08T18:49:56.048-04:00Finding a Connected Subgraph<p>A recent <a href="https://math.stackexchange.com/questions/4693636/milp-constraints-for-connectivity-in-a-subgraph" target="_blank">question</a> on Mathematics Stack Exchange asked about using a mixed integer linear program (MILP) to find a connected subgraph of a specified size (vertex count) within a given graph. Specifically, it asked how to formulate constraints to enforce the requirement that the subgraph be connected.</p><p>As I have <a href="https://orinanobworld.blogspot.com/2013/07/extracting-connected-graph.html" target="_blank">previously mentioned</a>, one way to force a graph to be connected is to add flows between vertices, treating one vertex as a source and the other vertices as sinks. In the context of the current question, that works as follows. 
Let $V$ and $E$ be the sets of vertices and edges respectively in our (undirected) graph and $M<\vert V \vert$ the number of vertices required in the connected subgraph. Note that we do not assume the parent graph is connected, and in fact the sample graph in the Math SE post is not. Since flows are inherently directional, we replace each edge $e=(v,w)\in E$ with two arcs, $(v,w)$ and $(w,v).$ Let $A$ be the set of all arcs.<br /></p><p>We designate (arbitrarily) one vertex in the subgraph to be the "root" (source) vertex, with a supply of $M-1$ thingies. (I was going to say "honest politicians", but then the problem becomes infeasible, hence "thingies".) Each of the remaining vertices in the subgraph consumes one thingy. Thingies flow along arcs, but only if both endpoints are selected. Since $M-1$ thingies enter the network at the root vertex and each of the remaining $M-1$ vertices consumes one thingy, the only way for flow to be balanced is if the selected vertices form a connected graph.</p><p>For each vertex $v\in V,$ let $x_v \in \lbrace 0, 1 \rbrace$ be 1 if and only if vertex $v$ is selected and let $y_v\in \lbrace 0, 1 \rbrace$ be 1 if and only if vertex $v$ is chosen to be the root vertex. For each arc $a\in A,$ let $f_a \in [0, M-1]$ be the flow across arc $a.$ The objective function is immaterial, since we just want a feasible solution, so we will minimize 0. 
The constraints are as follows:</p><ul style="text-align: left;"><li>The correct number of vertices must be chosen: $$\sum_{v\in V} x_v = M.$$</li><li>One vertex must be designated the root: $$\sum_{v \in V} y_v = 1.$$</li><li>The root vertex must be one of the chosen vertices: $$y_v \le x_v \quad \forall v\in V.$$</li><li>The flow on any arc must be 0 unless both head and tail are among the chosen vertices: $$f_{(v,w)}\le (M-1)x_v \quad \forall (v,w)\in A$$ and $$f_{(v,w)}\le (M-1)x_w \quad \forall (v,w)\in A.$$</li><li>The net flow out of any vertex (flow out - flow in) must be $M - 1$ for the source vertex, -1 for every other selected vertex and 0 otherwise: $$\sum_{(v,w)\in A} f_{(v,w)} - \sum_{(w,v)\in A} f_{(w,v)} = M\cdot y_v - x_v \quad \forall v\in V.$$</li></ul><p>This works in a MILP model, but a MILP model is probably not the best way to solve the underlying problem. Consider the following algorithm, which is based on constructing a layered subgraph. Assume the vertices are in some order. Make the first vertex the root and create a graph containing just it (which we will call layer 0). Add up to $M-1$ vertices connected to the root by an edge (layer 1). If the graph now contains $M$ vertices, declare victory and stop. Otherwise, for each vertex in layer 1, find all vertices connected to it by an edge and not already in layers 0 or 1, and add as many of them as are needed to reach $M$ vertices (or all of them if you are still short). Repeat until done or until all vertices in layer 1 have been processed. The added vertices form layer 2 because they are two edges from the root. Repeat with each layer until either you have $M$ vertices (victory) or there are no vertices left to process.</p><p>If all connected vertices have been found and the count is less than $M,$ discard the current root and start over with the next vertex as root. 
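</p><p>The layered construction just described is essentially a breadth-first search from the candidate root, truncated once $M$ vertices have been collected. Here is an illustrative Python sketch (not the post's Java; <span style="font-family: courier;">adj</span> maps each vertex to the set of its neighbors):</p>

```python
# Illustrative sketch of the layered (breadth-first) search described above.
# Returns a set of M connected vertices containing `root`, or None if the
# component containing `root` has fewer than M vertices.
def layered_subgraph(adj, root, M):
    chosen = {root}              # layer 0
    frontier = [root]
    while frontier and len(chosen) < M:
        next_layer = []
        for v in frontier:       # expand one layer at a time
            for w in adj[v]:
                if w not in chosen:
                    chosen.add(w)
                    next_layer.append(w)
                    if len(chosen) == M:
                        return chosen   # victory: M vertices collected
        frontier = next_layer
    return chosen if len(chosen) == M else None
```

<p>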
Note that when processing subsequent roots, you can ignore any vertices already tried as root and just look at vertices later in the vertex ordering.</p><p>I ginned up some <a href="https://gitlab.msu.edu/orobworld/connected-subgraph" target="_blank">Java code</a> (open source, as always) to test both the MILP model and the layered graph approach, using CPLEX as the MILP solver. Both got the examples from the Math SE post correct in a trivial amount of time. On a randomly generated graph with 5,000 nodes and 624,713 edges (about 5% of the edges in a complete graph that size), with a target subgraph size of 1,000 nodes, the MILP model took about 8 seconds to build and 12 seconds to solve. The layered graph search needed 3 milliseconds. I tried a larger graph, but my computer ran out of heap space trying to build the model (mainly due to the expanding number of flow variables and related constraints).</p><p>So there are two things to take away. One is that flows and flow balance constraints can be used to enforce connection among vertices, even when there is no actual thing flowing. The other is that, much as I like MILP models, sometimes simpler approaches are better.</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-12979510750466884702023-04-28T16:03:00.000-04:002023-04-28T16:03:37.126-04:00A Matrix Puzzle<p>A question on Mathematics Stack Exchange, <a href="https://math.stackexchange.com/questions/4686170/placing-number-blocks-so-that-the-resulting-matrix-is-symmetric" target="_blank">"Placing number blocks so that the resulting matrix is symmetric"</a>, revolves around a square matrix whose entries are partitioned into blocks (submatrices). The submatrices need to be rearranged so that the resulting matrix is symmetric around its main diagonal. The author of the question asked if it were possible to do this using linear programming. 
I think the answer is no, but we can certainly do it with either an integer linear program or a constraint program. The author posts two examples ($4\times 4$ and $5\times 5$). I cobbled together Java code for a MIP model (using CPLEX) and a CP model (using CP Optimizer) and had no trouble solving both problems.</p><p>There are a couple of generalizations worth noting. First, while the example matrices contain integers, there is nothing preventing the two models from being used with other content types (real numbers, complex numbers, text). Some variable definitions would need modification, but conceptually both methods would work. Second, the author of the post used only one-dimensional blocks (row vectors or column vectors), but my code allows for arbitrary rectangular blocks. The code assumes that the "parent matrix" is square, but that would be easy enough to relax, so the same models (with minor adjustments) would work with arbitrary rectangular matrices.</p><p>I think that the puzzle makes a good vehicle for comparing MIP and CP applications to logic problems. In optimization problems, I suspect that MIP models often do a better job of computing objective bounds than do CP models. That is a non-issue here, since the problem is just to find a feasible solution. For logic problems and similar things (particularly scheduling), I think CP models tend to be more expressive, meaning certain types of constraints or relationships can be expressed more naturally with CP than with MIP (where the relationships turn into rather arcane and complicated adventures with binary variables). That applies here, where the CP model exploits the ability to use integer variables as subscripts of other variables.</p><p>As described in the PDF file, though, that subscripting ability has its limits. CP Optimizer will let you index a one-dimensional vector of variables using an integer variable, but not a two-dimensional array. 
In other words, <span style="font-family: courier;">x[y]</span> is fine but <span style="font-family: courier;">x[y, z]</span> is not (where <span style="font-family: courier;">x</span>, <span style="font-family: courier;">y</span> and <span style="font-family: courier;">z</span> are all variables). The workaround I used is to flatten a 2-D matrix into a 1-D matrix. So if $N\times N$ matrix $x$ is flattened into $N^2\times 1$ vector $\hat{x},$ $x_{y,z}$ becomes $\hat{x}_{N(y - 1) + z}.$</p><p>The models are a bit too verbose to squeeze into a blog post, so I wrote them up in a separate <a href="https://rubin.msu.domains/blog/matrix-puzzle-formulations.pdf" target="_blank">PDF file</a>. My Java code (which requires recent versions of CPLEX and CP Optimizer) can be had from my <a href="https://gitlab.msu.edu/orobworld/matrix-puzzle" target="_blank">code repository</a> under a Creative Commons license.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-61834652469186857042023-04-18T16:24:00.000-04:002023-04-18T16:24:32.357-04:00Node Coloring<p>Someone posted a question on OR Stack Exchange about <a href="https://or.stackexchange.com/questions/10321/coloring-of-nodes-of-a-sensor-network" target="_blank">coloring nodes</a> in a sensor network. The underlying problem has something to do with placing sensors, but in mathematical terms the poster wanted to color the nodes of a weighted undirected graph with a fixed number of colors, while minimizing the sum of the edge weights of edges connecting nodes of the same color.</p><p>There is an obvious MIP model for this problem, which was posted as an answer to an earlier version of the question. 
If $N$ and $C$ are the index sets for nodes and colors respectively, $E\subseteq N\times N$ is the set of edges, $w_{ij}$ is the objective weight of edge $(i,j)\in E$ and $x_{n,c}$ is a binary variable indicating whether node $n$ is assigned color $c$, the problem can be written as a binary quadratic program:</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$\min\ \sum_{(n,m)\in E}w_{nm}\sum_{c\in C}x_{nc}x_{mc}\\<br />\textrm{s.t. }\sum_{c\in C}x_{nc} =1\quad\forall n\in N\\<br />\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C.$$</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> <br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The model can be linearized by introducing continuous variables $y_{nm}\ge 0$ to represent the objective contribution of each edge $(n,m)\in E$:</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$\min\ \sum_{(n,m)\in E}w_{nm} y_{nm}\\<br />\textrm{s.t. }\sum_{c\in C}x_{nc}=1 \quad \forall n\in N\\<br />\phantom{\textrm{s.t. }}y_{nm}\ge x_{nc} + x_{mc} - 1 \quad \forall (n,m)\in E, \forall c\in C\\<br />\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C\\<br />\phantom{\textrm{s.t. 
}}y_{nm}\ge 0 \quad \forall (n,m)\in E.$$<br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">There is some symmetry in the model: given any feasible solution, another solution of identical cost is obtained by permuting the colors. It is possible to add constraints to remove that symmetry.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The person posting the question was looking for alternatives to a MIP model. An easy choice for me would be a genetic algorithm. I wanted to test this in R, using the <span style="font-family: courier;">GA</span> library for the genetic algorithm. The <span style="font-family: courier;">GA</span> library allows three types of chromosomes: binary or real vectors, or permutations of an index vector. If I were coding my own GA, the chromosome would be a vector in $\lbrace 1,\dots,C\rbrace^N.$ Within the available chromosome types in the library, the obvious choice was to use a vector $x\in [0,C]^N$ and then set the color of node $i$ to $\lceil x_i \rceil\in \lbrace 1,\dots,C\rbrace.$ I suspect, though, that the symmetry issue described above might have negative repercussions for a GA. 
If two "good" solutions were effectively the same (other than a permutation of the color indices), their offspring (which would be a mix of the two color numbering methods) might be a mess.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">A couple of responders to the OR SE question suggested heuristics, as did I. My choice was to start with a random generation heuristic (assign colors randomly), apply an improvement heuristic (go through the nodes in random order and, for each node, reassign it to the cheapest color given the rest of the solution), and repeat (changing the order in which nodes are examined on each pass) until one pass produced no improvement. Since the heuristic is fairly fast, it can be done multiple times (new random restarts) with the best solution retained. Heuristics like this are very easy to code.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">I coded all three approaches in R, using CPLEX (with the <span style="font-family: courier;">ompr</span> and <span style="font-family: courier;">ROI</span> libraries) for the MIP model, the <span style="font-family: courier;">GA</span> library for the genetic algorithm, and nothing special for the improvement heuristic. I also threw in the <span style="font-family: courier;">tictoc</span> library to do some timings. 
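</p><p>For reference, one pass of the improvement heuristic takes only a few lines; this is a hypothetical Python rendering (the post's implementation is in R), with edge weights stored in a dictionary keyed by vertex pairs:</p>

```python
# Illustrative sketch of one improvement pass: visit nodes in the given
# order and move each to the color minimizing its same-color edge cost.
def improvement_pass(order, colors, weights, num_colors):
    """colors: dict node -> color (modified in place);
    weights: dict (i, j) -> edge weight. Returns True if any node moved."""
    improved = False
    for n in order:
        # cost[c] = total weight of edges joining n to nodes colored c
        cost = [0.0] * num_colors
        for (i, j), w in weights.items():
            if i == n:
                cost[colors[j]] += w
            elif j == n:
                cost[colors[i]] += w
        best = min(range(num_colors), key=lambda c: cost[c])
        if cost[best] < cost[colors[n]]:
            colors[n] = best     # reassign n to its cheapest color
            improved = True
    return improved
```

<p>Repeating such passes until none of them improves the solution, over many random starts, gives the heuristic described above.</p><p>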
The test problem was a randomly constructed $10\times 20$ complete grid graph (200 nodes; 19,900 edges), to be colored using five colors.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">To break symmetry in the MIP model, I forced the colors to be ordered so that the lowest node index receiving any color was smaller than the lowest node index receiving the next color. I let CPLEX run for five minutes using default settings. It made what I would call reasonable progress on the incumbent objective value, but the formulation is apparently very weak (at least on the test example). The lower bound barely moved, and the gap was 99.97% after five minutes.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">I mostly used default parameters for the GA, although I let it run for 2,000 generations with a population of size 100. It found a much better solution (in 38 seconds) than CPLEX found in five minutes. To be fair, though, CPLEX was balancing improving the incumbent with working on the bound. Had I changed the CPLEX parameters to emphasize improving the incumbent, it likely would have done better (though likely not as well as the GA).</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The improvement heuristic surprised me a bit. 
600 random starts needed only 37 seconds (and I'm not the poster boy for optimized R code), and the best solution it found had an objective value about 5% better than what the GA produced. I did a small number of runs with different random seeds (but the same problem size), and the improvement heuristic consistently edged out the GA. I suppose the moral here is to try simple heuristics before going "high tech".</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"><br /></p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">You are welcome to play with <a href="https://rubin.msu.domains/blog/graph_coloring.nb.html" target="_blank">my R notebook</a>, which contains both code and results.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-74977546469258899172023-04-13T18:07:00.001-04:002023-04-13T18:07:35.849-04:00The rJava Curse Strikes Again<p>Apparently I have not needed the <span style="font-family: courier;">rJava</span> R package in a while, because when I wanted to install an R package today that has <span style="font-family: courier;">rJava</span> as a dependency, it was not there. So I tried to install it (this is on Linux Mint), and of course it failed to install. I have a long history of installation battles with <span style="font-family: courier;">rJava</span> (see <a href="https://orinanobworld.blogspot.com/2011/03/configuring-r-for-java.html" target="_blank">here</a>, <a href="https://orinanobworld.blogspot.com/2015/06/the-rjava-nightmare.html" target="_blank">here</a> and <a href="https://orinanobworld.blogspot.com/2016/12/rjava-gift-that-keeps-on-giving.html" target="_blank">here</a> in chronological order ... or better still don't traumatize yourself by reading them). 
Why should this time be different?<br /></p><p>All my previous battles involved older versions of Java with apparently different directory locations or structures, and none of the previous fixes worked. After considerable aggravation, I found a <a href="https://datawookie.dev/blog/2018/02/installing-rjava-on-ubuntu/" target="_blank">very helpful post</a> by "datawookie" that nearly got the job done. I did in fact get an error message about "jni" and used the trick in datawookie's post of setting JAVA_HOME to the correct path (in my case to Open JDK 17) as an argument to <span style="font-family: courier;">javareconf</span>. When I then attempted to install <span style="font-family: courier;">rJava</span>, I got a different error ("could not find -lbz2"), which prompted me to install <span style="font-family: courier;">libbz2.dev</span> ("<span style="font-family: courier;">sudo apt-get install libbz2.dev</span>"), after which R was finally able to install <span style="font-family: courier;">rJava</span> (woo-hoo!).</p><p>I'd say this is getting ridiculous, but we passed that milestone years ago.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-82821209989770134572023-03-20T14:54:00.003-04:002023-03-20T14:54:48.504-04:00Annoying Message with R GA Package<p>This is a small technical note (mainly a reminder to myself) about using the <span style="font-family: courier;">GA</span> package for genetic algorithms in R, specifically within an R Markdown document (such as an R notebook). The <span style="font-family: courier;">GA::ga()</span> function includes an optional argument to turn on parallel evaluation of the fitness function. The argument value can be true or false (default), which are self-explanatory, or a number of cores, or a character string ("snow" or "multicore") for how parallelization is to be done. 
The default is "multicore" except on Windows, where "snow" is apparently the only option. When I choose to turn on parallel objective evaluation, I take the easy route and just set it to <span style="font-family: courier;">TRUE</span>.</p><p>When run in a terminal, I just get a sequence of log entries, one per generation, and then it terminates. That's also what I see when the command is run in an R Markdown document ... until the document is rendered to HTML. In the HTML output, interspersed with each line of log output is a block of four identical copies of the following.<br /></p><pre style="-webkit-text-stroke-width: 0px; background-color: white; border-radius: 4px; border: 1px solid rgb(204, 204, 204); box-sizing: border-box; color: #333333; display: block; font-family: monospace; font-size: 13px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; line-height: 1.42857; margin: 0px 0px 10px; orphans: 2; overflow-wrap: break-word; overflow: auto; padding: 9.5px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; widows: 2; word-break: break-all; word-spacing: 0px;"><code class="hljs" style="background-color: transparent; border-radius: 0px; box-sizing: border-box; color: inherit; font-family: monospace; font-size: inherit; padding: 0px; white-space: pre-wrap;">loaded GA and set parent environment</code></pre><p>
The fact that there are four copies is presumably because my PC has four cores. Turning off the monitor option gets rid of the progress printouts but does not eliminate the repetitive messages. Switching the parallel mode to "snow" does get rid of them, but I'm not sure what the broader implications of that are.</p><p>After significant spelunking, I traced the messages to the <span style="font-family: courier;">doParallel</span> R package, which is apparently being used to implement parallel processing under the "multicore" option. In any event, the way to get rid of them is to add the option "message = FALSE" to the R chunk in which <span style="font-family: courier;">ga()</span> is being executed. Barring other chunk options, this would change <span style="font-family: courier;">{r}</span> to <span style="font-family: courier;">{r, message = FALSE}</span> for that one chunk. (You can also set it as a global option in the document.) Happily, the monitor output (stats for each generation) still prints out. It just gets rid of those <expletive deleted> parallel processing messages.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-30048876379220296132023-03-07T16:48:00.000-05:002023-03-07T16:48:01.375-05:00Another GA for Clustering<p>Someone asked on OR Stack Exchange about <a href="https://or.stackexchange.com/questions/10059/how-to-perform-clustering-of-a-large-number-of-nodes/" target="_blank">clustering nodes</a>, a subject that comes up from time to time. The premise is that you have $N$ points (nodes), and for each pair $i \neq j$ there is (or may be) an edge with weight $w_{i,j} \in [0, 1]$ (where $w_{i,j} = 0$ if there is no edge between $i$ and $j$). The objective is to create exactly $G$ groups (clusters), with no group having cardinality greater than $M,$ such that the sum of within-group edge weights is maximized. 
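For concreteness, here is one standard way to write the problem as an IP (a sketch in my own notation; the model included in the question may differ). Binary variable $z_{i,g}$ indicates assignment of point $i$ to group $g$, and $y_{i,j,g}$ linearizes the product $z_{i,g} z_{j,g}$:

```latex
% Sketch only: notation mine, not necessarily the question's model.
\begin{align*}
\max\ & \sum_{g=1}^{G}\sum_{i<j} w_{i,j}\, y_{i,j,g} \\
\text{s.t.}\ & \sum_{g=1}^{G} z_{i,g} = 1 \quad \forall i && \text{(each point in one group)} \\
& \sum_{i=1}^{N} z_{i,g} \le M \quad \forall g && \text{(group size cap)} \\
& y_{i,j,g} \le z_{i,g},\ y_{i,j,g} \le z_{j,g} \quad \forall i<j,\ \forall g && \text{(linearization)} \\
& z_{i,g} \in \{0,1\},\ y_{i,j,g} \in [0,1].
\end{align*}
```

Because the objective maximizes and the weights are nonnegative, the $y$ variables can be left continuous; the solver pushes each one up to $\min(z_{i,g}, z_{j,g})$ on its own.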
It is simple to write an integer programming model for the problem, and the question includes one, but the author was looking for something that did not require an IP solver. The question specifies, as an example, dimensions of $N=500$ points, $G=8$ groups, and a maximum group size of $M=70.$</p><p>Since I have suggested genetic algorithms for previous clustering problems (see <a href="https://orinanobworld.blogspot.com/2021/04/a-ga-model-for-joint-clustering-problem.html" target="_blank">here</a> and <a href="https://orinanobworld.blogspot.com/2021/12/revisiting-joint-clustering-problem.html" target="_blank">here</a>), it probably will not shock you that I once again looked for a GA approach, using a "random key" algorithm (where the chromosome gets decoded into a feasible solution). In this case, the chromosome consists of $N+G$ values between 0 and 1. The first $G$ values are used to select the sizes $n_1,\dots, n_G$ of the groups. The last $N$ values are converted into a permutation of the point indices $1,\dots, N.$ The first $n_1$ entries in the permutation are the indices of the points in group 1, the next $n_2$ entries give the points in group 2, and so on.</p><p>The tricky part here is converting the first portion of the chromosome into the group sizes. We know that the maximum group size is $M,$ and it is easy to deduce that the minimum group size is $m = N - (G-1)M.$ So the minimum and maximum fractions of the population to assign to any group are $a=m/N$ and $b=M/N,$ where $$m = N - (G-1)M \implies a = 1 - (G-1)b.$$ Now suppose that I have values $\pi_1, \dots,\pi_G \in (a,b)$ and assign group sizes $n_i = \pi_i \cdot N,$ which clearly meet the minimum and maximum size limits. (I'm cheating here, but I'll fix it in a minute.) We want $\sum_{i=1}^G n_i = N,$ which means we need $\sum_i \pi_i = 1.$</p><p>To get the $\pi_i,$ we take a rather roundabout route. 
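Before going through the algebra, here is the whole size-decoding step, including the rounding repair discussed below, as a Java sketch. The class and method names are my own inventions; my actual implementation lives in an R notebook linked at the end of this post.

```java
import java.util.Arrays;

public class GroupSizeDecoder {
    // Decode the first G chromosome entries x[0..G-1] (each in (0,1)) into
    // group sizes n[0..G-1] with m <= n[i] <= M and sum(n) = N, following the
    // construction in this post. Assumes (G-1)*M < N <= G*M, so that the
    // minimum group size m = N - (G-1)*M is positive.
    public static int[] decode(double[] x, int N, int M) {
        int G = x.length;
        int m = N - (G - 1) * M;
        double a = (double) m / N;  // minimum fraction of points per group
        double b = (double) M / N;  // maximum fraction of points per group
        double sum = Arrays.stream(x).sum();
        int[] n = new int[G];
        for (int i = 0; i < G; i++) {
            double y = 1.0 - x[i] / sum;  // the y_i sum to G - 1
            double pi = a + (b - a) * y;  // the pi_i sum to 1
            n[i] = (int) Math.round(pi * N);
        }
        // Rounding can break sum(n) = N, so repair one unit at a time:
        // add 1 to the smallest entry, or subtract 1 from the largest.
        int total = Arrays.stream(n).sum();
        while (total != N) {
            int step = (total < N) ? 1 : -1;
            int idx = 0;
            for (int i = 1; i < G; i++) {
                if (step > 0 ? n[i] < n[idx] : n[i] > n[idx]) idx = i;
            }
            n[idx] += step;
            total += step;
        }
        return n;
    }

    public static void main(String[] args) {
        // The question's dimensions: N = 500 points, G = 8 groups, M = 70.
        double[] x = {0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.6};
        System.out.println(Arrays.toString(decode(x, 500, 70)));
    }
}
```
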
Assume that the first $G$ entries in the chromosome are $x_1,\dots,x_G \in (0,1).$ Set $$y_i = 1 - \frac{x_i}{\sum_{j=1}^G x_j} \in (0,1)$$ and observe that $\sum_{i=1}^G y_i = G-1.$ Finally, set $\pi_i = a + (b-a) y_i \in (a, b).$ We have $$\sum_i \pi_i = G\cdot a + (b-a) \sum_i y_i\\ = G\cdot a + (b-a)(G-1) = (G-1)b + a = 1.$$</p><p>Now to confess to my cheating. I said the $i$-th group size would be $n_i=\pi_i \cdot N,$ ignoring the minor problem that the left side is an integer and the right side almost surely is not. So of course we will round $\pi_i \cdot N$ to get $n_i.$ After rounding, though, we can no longer be sure that $\sum_i n_i = N.$ So we iteratively fix it. If $\sum_i n_i < N,$ we add 1 to the smallest $n_i$ and recheck. If $\sum_i n_i > N,$ we subtract 1 from the largest $n_i$ and recheck. Eventually, we end up with a feasible set of group sizes.</p><p>All this is encoded in an <a href="https://rubin.msu.domains/blog/ga_clustering.nb.html" target="_blank">R notebook</a> I wrote, including a sample problem. Whether the GA gets a "good" (near optimal) partition or not I cannot say, since I did not write the equivalent MIP model. I can say, though, that it gets a sequence of progressively improving partitions.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-19029288518592495832023-01-30T19:03:00.000-05:002023-01-30T19:03:23.918-05:00A Random Tree Generator<p>For a problem I was noodling with in R, I needed to generate random trees (meaning connected, acyclic, layered, undirected graphs, not classification trees or anything statistical). There are a bunch of libraries for R that have various capabilities for constructing and/or massaging graphs and networks (see the <a href="https://cran.r-project.org/web/views/GraphicalModels.html" target="_blank">CRAN Task View</a> for graphical models). 
After spending some time poking through the docs for some of the libraries listed in the task view, and unsuccessfully trying to install one, I decided it would be faster just to roll my own code.</p><p>Along the way I succumbed to the software developer's maxim: if it works, it needs more features. Still, I wound up with a function that is not horribly complicated and seems to work. Given how many nodes and how many layers you want in the tree, it outputs a matrix of edges forming the desired tree. There are optional arguments for how you want the nodes labeled (the default is 1, 2, ... but you can supply a vector of labels) and in what order the labels should be assigned (top to bottom/left to right raster scan of the graph or randomly), as well as how you want the output matrix organized (randomly, by label order, or by layer order).</p><p>The code is in an R notebook that demonstrates the use of the function. It imports the <span style="font-family: courier;">DiagrammeR</span> library in order to plot the resulting trees, but the function does not require <span style="font-family: courier;">DiagrammeR</span> and in fact has no dependencies. You can view the notebook <a href="https://rubin.msu.domains/blog/treeGenerator.nb.html" target="_blank">here</a>, and if you want you can download the code from it (see the select box in the upper right of the notebook). The code is licensed under the <a href="https://creativecommons.org/licenses/by/3.0/deed.en_US">Creative
Commons Attribution 3.0 Unported license</a>.</p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-87638461500945047522023-01-27T14:16:00.000-05:002023-01-27T14:16:04.843-05:00Weird Caja Bug<p>In the past day or so I've tripped over a rather annoying bug in Caja, the file manager for the MATE desktop (which I use with Linux Mint, and which is also used by many Ubuntu users). Theoretically, the F2 key is a shortcut for Edit > Rename..., which lets you rename selected files. Usually, it works fine, but today (and at least once in the recent past) the F2 key had no effect.</p><p>At first I thought there was a problem with my keyboard, but no, the F2 key was being recognized. A Google search led me to a <a href="https://ubuntu-mate.community/t/potential-important-bug-in-caja/16686/12" target="_blank">bug report</a> from 2018 (!) where a diligent soul with screen name "terzag" discovered that the problem occurred when the Caja window displayed a vertical scrollbar and disappeared when the window was resized large enough to lose the scrollbar. Sure enough, that fix worked for me today ... and then things got weirder.</p><p>I did a little experimenting, and the bug reappeared when the scrollbar reappeared and disappeared whenever the scrollbar disappeared. Then I tried F2 with multiple file icons selected (which opens a dialog to bulk rename them), and it worked, despite there being a visible scrollbar. Since then, no matter how many times I close and reopen Caja, the bug does not manifest. So it is temporarily fixed, I guess, with emphasis on "temporarily".</p><p>I also encountered a possibly related bug, again reported years ago. If I hit either F2 or Edit > Rename... without first selecting one or more files, it closes (crashes?) all open Caja windows. I'm pretty sure that's a bug and not a "feature".</p><p>Hopefully this all gets fixed relatively soon ... 
although looking at bug reports from almost five years ago does not make me feel sanguine about the prospects.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-14998184317830038222023-01-27T13:43:00.000-05:002023-01-27T13:43:07.325-05:00Finding All Simple Cycles (III)<p>This is the third and (hopefully) final post about finding all simple cycles in an undirected graph. In the <a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-i.html" target="_blank">first post</a>, I discussed the problem briefly and described a simple, efficient iterative algorithm. In the <a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-ii.html" target="_blank">second post</a>, I posed (but definitely did not recommend) a mixed integer linear program (MILP) to find all simple cycles. Here I will discuss an alternative MILP approach, specific to CPLEX. To make sense of it, you will definitely need to have read the second post. Java code demonstrating all three methods is available from my <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">repository</a>.</p><p>Recent versions of CPLEX have a feature called the solution pool, about which I have <a href="https://orinanobworld.blogspot.com/2013/01/finding-all-mip-optima-cplex-solution.html" target="_blank">previously written</a>. One use is to accumulate solutions (including suboptimal ones if desired) encountered along the way to the optimal solution of a MILP model. Another, using the <span style="font-family: courier;">IloCplex.populate()</span> method in the Java API, is to just generate a gaggle of feasible solutions. 
So the approach I propose here is to build a MILP model for finding a single simple cycle, set the solution pool population limit parameter (the maximum number of solutions to retain) to something really large and the solution pool intensity parameter (how hard CPLEX works to churn out solutions) to its maximum value, and then use the <span style="font-family: courier;">populate()</span> method to find simple cycles. This is implemented in the <span style="font-family: courier;">SingleCycleMIP.java</span> class in my code. </p><p>Actually, I'm not sure setting the intensity parameter is necessary. On the small test graph, results were similar no matter what intensity setting I used. CPLEX stopped with 160 solutions, many of which were duplicates. After removing duplicates, there were 13 distinct simple cycles, which matches what the other methods found. The only difference was run time, which was a fraction of a second using either the default intensity or the maximum intensity (4) and about two seconds using the minimum intensity (1). Either way, it was much faster than the full MILP model, at least on this small test graph.</p><p>The MILP model used here is a subset of the model in the previous post. The model has no objective function (or, equivalently, minimizes the constant zero function). We drop the $u$ variables (which were used to ensure that cycles were distinct) and the $x$ variables (which signaled whether a slot in the solution was filled with a cycle or not). For all other variables, we drop the $c$ subscript (which slot is being filled), since we are only building one cycle. 
The "one anchor per cycle" constraint becomes $\sum_i w_i = 1,$ and other constraints involving $u$ and/or $x$ are dropped.</p><p>One issue with this model is that the same cycle can be discovered multiple times, either because its orientation is reversed (so the "clockwise" and "counterclockwise" versions are considered distinct) or possibly because minor differences (rounding error) in the flow variables make two copies of the same solution look distinct. Regardless of the reason, the fact that CPLEX terminated with 160 solutions when only 13 cycles exist tells you duplication is happening. Fortunately, we do not have to monkey with the model (or set overly finicky tolerance values); we can just filter the pool after the run and weed out the duplicates (which my code does). The alternative would be to use a callback (either an incumbent callback if you are using legacy callbacks, or the more current generic callback in "candidate" context) to weed out duplicates as they arise. Since the logic to be used in the callback is the same as the logic used in post-processing, I cannot see any virtue to using a callback, and it definitely would complicate the code.</p><p>With that I am hopefully done with this topic. :-)</p><p><br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-45323582944106023602023-01-26T15:03:00.002-05:002023-01-26T15:03:58.291-05:00Finding All Simple Cycles (II)<p><a href="https://orinanobworld.blogspot.com/2023/01/finding-all-simple-cycles-i.html" target="_blank">Yesterday's post</a> described my introduction (by way of a <a href="https://or.stackexchange.com/questions/9786/number-of-hamiltonian-sub-cycles-on-the-graph" target="_blank">question</a> on OR Stack Exchange) to the problem of how to find all simple cycles in a graph. 
In that post I described a simple, efficient iterative algorithm for finding them.</p><p>The OR SE question specifically asked about a mixed integer linear program (MILP) to find all simple cycles. While I would never take that approach myself, I got curious as to whether it was possible to construct such a model. Down the rabbit hole I went. The answer turns out to be yes. In this post, I'll describe the model (again, without advocating that it be used).</p><p>Assume that we have an undirected graph with vertices numbered $1,\dots,N$ and edge set $E.$ In what follows, $i$ and $j$ will always be vertex numbers and $c$ and $c'$ will index cycles. Although the graph is undirected, when we construct cycles we will give them an orientation (e.g., $3 \rightarrow 2 \rightarrow 5 \rightarrow 3$), which will be useful in ensuring that the result is an actual cycle (you can get from the starting node back to itself) and not the composition of multiple smaller cycles (e.g., not treating $3 \rightarrow 2 \rightarrow 4 \rightarrow 3$ and $5 \rightarrow 6 \rightarrow 9 \rightarrow 8 \rightarrow 5$ as if they form a single cycle).<br /></p><p>The model depends on guessing a non-binding upper bound $C$ on the number of simple cycles in the graph. (If the solution to the model does contain $C$ cycles, your guess may have been too small. 
To be certain that all cycles were found, you would need to bump up $C$ and solve again, until the optimal solution contains fewer than $C$ cycles.)</p><p>The model contains the following gazillion binary variables:</p><ul style="text-align: left;"><li>$x_{c}$ is an indicator for whether cycle $c\in\left\{ 1,\dots,C\right\} $<br />is constructed;</li><li>$y_{i,c}$ is an indicator for whether vertex $i$ belongs to cycle $c$;</li><li>$z_{i,j,c}$ is an indicator for whether edge $[i,j]\in E$ is part of<br />cycle $c$ with the tour of $c$ proceeding from $i$ to $j$;</li><li>$w_{i,c}$ is an indicator for whether vertex $i$ is the anchor (starting<br />and ending point) for cycle $c$; and<br /></li><li>$u_{i,j,c,c'}$ is an indicator that edge $[i,j]\in E$ (in either<br />orientation) belongs to exactly one of cycles $c$ and $c'\neq c.$</li></ul><p>We also have one set of variables that can be defined as either continuous or integer (continuous probably makes the model solve faster):</p><ul style="text-align: left;"><li>$f_{i,j,c}\ge 0$ is the "flow" on edge $[i,j]$ in the orientation $i\rightarrow j$ in cycle $c.$</li></ul><p>The flow variables are an adaptation of the variables used in the <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem#Miller%E2%80%93Tucker%E2%80%93Zemlin_formulation" target="_blank">Miller-Tucker-Zemlin formulation</a> for the traveling salesman problem (TSP). We will use them to ensure that cycles found are in fact cycles.</p><p>The objective, at least, is straightforward. Since we want to find all possible simple cycles, we maximize the number of cycles constructed: $\max\sum_{c=1}^C x_{c}.$ Now come the constraints. 
(Brace yourself for some serious vertical scrolling.)<br /></p><ul style="text-align: left;"><li>Edges can only belong to cycles containing both endpoints, and only<br />in one orientation: \begin{align*}<br />z_{i,j,c}+z_{j,i,c} & \le y_{i,c}\quad\forall[i,j]\in E,\forall c\\<br />z_{i,j,c}+z_{j,i,c} & \le y_{j,c}\quad\forall[i,j]\in E,\forall c<br />\end{align*}</li><li>Flow only occurs on edges that are used, and only in the orientation<br />used: \begin{align*}<br />f_{i,j,c} & \le Nz_{i,j,c}\quad\forall[i,j]\in E,\forall c\\<br />f_{j,i,c} & \le Nz_{j,i,c}\quad\forall[i,j]\in E,\forall c<br />\end{align*}</li><li>Constructed cycles have one anchor and unused cycles have none: \[<br />\sum_{i=1}^N w_{i,c}=x_{c}\quad\forall c<br />\]</li><li>A cycle's anchor must belong to the cycle: \[<br />w_{i,c}\le y_{i,c}\quad\forall i,\forall c<br />\]</li><li>In every cycle, at every node, there are either two incident edges<br />(if the node belongs to the cycle) or none (if the node is not part<br />of the cycle): \[<br />\sum_{j:[i,j]\in E}(z_{i,j,c}+z_{j,i,c})=2y_{i,c}\quad\forall i,\forall c<br />\]</li><li>In any used cycle, at any node in the cycle other than the anchor, flow out is at least one unit less than flow in (mimicking the MTZ formulation of the TSP): \[<br />\sum_{j:[i,j]\in E}f_{i,j,c}\le\sum_{j:[i,j]\in E}f_{j,i,c}-y_{i,c}+Nw_{i,c}\quad\forall i,\forall c<br />\]</li><li>To ensure a complete cycle, one unit of flow must return to the anchor node: \[<br />\sum_{j:[i,j]\in E}f_{j,i,c}\ge w_{i,c}\quad\forall i,\forall c<br />\]</li><li>The $u$ variables are defined in terms of the $z$ variables: \[<br />u_{i,j,c,c'}\le z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\quad\forall[i,j]\in E,\forall c<c'<br />\] \[<br />u_{i,j,c,c'}\le2-\left(z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\right)\quad\forall[i,j]\in E,\forall c<c'<br />\]</li><li>Any pair of cycles constructed must differ in at least one edge (which<br />also prevents a cycle from repeating with the 
opposite orientation): \[<br />\sum_{[i,j]\in E}u_{i,j,c,c'}\ge x_{c}+x_{c'}-1\quad\forall c<c'<br />\]</li></ul><p>It is also possible to add constraints to mitigate some of the symmetry in the model. Two come to mind:</p><ul style="text-align: left;"><li>Unused cycles must have higher indices than used cycles: \[<br />x_{c}\le x_{c-1}\quad\forall c\in\left\{ 2,\dots,C\right\} <br />\]</li><li>The anchor in a cycle must have the smallest index of any node<br />in the cycle: \[<br />w_{i,c}+y_{j,c}\le1\quad\forall i>j,\forall c<br />\]</li></ul><p>The first one is of low enough dimension that I automatically included it in my Java code. I made the second one optional in the code.</p><p>When I ran it against the 9 node, 12 edge graph in the OR SE question, CPLEX had a solution with 13 cycles (the full number) after about 25 seconds and 5,000 nodes -- way, way more time than the 13 <i>milliseconds</i> the iterative method needed. I set $C=20$ as the upper bound on the number of cycles, which of course is also the initial upper bound on the objective. After 60 seconds (the time limit I set), CPLEX's upper bound was still 20. So provable optimality will not be easy to come by.</p><p>I've got one more MILP model up my sleeve, coming in the next post. Meanwhile, as a reminder, my Java code is available <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">here</a>.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-44266291329494032342023-01-25T18:26:00.000-05:002023-01-25T18:26:42.240-05:00Finding All Simple Cycles (I)<p>This is the first of a sequence of posts related to the question of finding all simple cycles in an undirected graph. Motivation for the posts comes from a <a href="https://or.stackexchange.com/questions/9786/number-of-hamiltonian-sub-cycles-on-the-graph" target="_blank">question</a> on OR Stack Exchange. 
A simple cycle is a cycle with no repeated nodes (other than the starting/ending node).</p><p>Assume we have an undirected graph with nodes indexed from 1 to $N$ and with edge set $E.$ I will use the term "anchor" to refer to the node at which a tour of a simple cycle starts (and ends), and require (without loss of generality) that the anchor of every cycle be the lowest index node in the cycle. Since a cycle is the same whether you traverse it "clockwise" or "counterclockwise", I will also require without loss of generality that nodes in a cycle be listed so that the node immediately after the anchor has smaller index than the node immediately before the anchor. This is just to reduce production of duplicate cycles.</p><p>I came up with three approaches to finding all simple cycles. The first, which I will describe below, is a simple iterative scheme that is the fastest. The other two, described in subsequent posts, use mixed integer linear programs (MILPs) -- not because they are efficient but because the OR SE question specifically asked about MILP approaches.</p><p>The iterative scheme is pretty simple. It uses a queue of paths (incomplete cycles) to store work in progress. I will use "terminus" to indicate the last node in a path. When the terminus equals the anchor, we have a cycle. Due to my definition of "anchor", we can say that no node on a path can have a lower index than the anchor of the path.</p><p>The iterative scheme loops over possible anchors from 1 to $N.$ For each anchor, we initialize the queue with all paths consisting of a single edge incident at the anchor and not incident at any node less than the anchor. While the queue is not empty, we pop a path and create one or more new paths by extending that path with each qualifying edge. 
A qualifying edge is one that meets the following conditions:</p><ul style="text-align: left;"><li>it is incident at the terminus of the current path;</li><li>it is not incident at any node already on the path (other than anchor);</li><li>it is not incident at any node less than the anchor; and</li><li>if it is incident at the anchor, it is not the same as the first edge of the path (i.e., we do not allow a "cycle" of the form $a\rightarrow b \rightarrow a$).<br /></li></ul><p>If the new edge leads to the anchor, we have a cycle. If not, we update the terminus and add the extended path to the queue, for further processing later. If a path cannot be extended because it is not a cycle and there are no qualifying edges left, we discard it.</p><p>Coding the algorithm is pretty straightforward, and it should work on any undirected graph (even those that are not connected). On the example from the OR SE post (9 nodes, 12 edges) it took a whopping 11 milliseconds to find all 13 simple cycles.</p><p>Java code for this and the two MILP models (the latter requiring a recent version of CPLEX) can be found in my <a href="https://gitlab.msu.edu/orobworld/simplecycles" target="_blank">GitLab repository</a>. The iterative algorithm is in the <span style="font-family: courier;">CycleFinder</span> class. Stay tuned for the MILP models.<br /></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-77178391836408762632022-12-26T14:27:00.001-05:002022-12-26T14:27:41.558-05:00Selecting Dispersed Points<p>Fellow blogger Erwin Kalvelagen posted <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2022/12/maximum-number-of-points-with-minimum.html" target="_blank">a comparison</a> of two binary programming models, one quadratically constrained and one linearly constrained, for the problem of selecting a maximal number of points from a finite set subject to the requirement that no two selected points be closer than a specified distance. The models were an answer to a <a href="https://scicomp.stackexchange.com/questions/42321/selecting-most-points-from-a-set-of-points-with-distance-constraint" target="_blank">question</a> posted on Computational Science Stack Exchange. Not surprisingly, the linearized version tended to solve faster than the quadratic version.</p><p>In Erwin's linear model, the constraints take the form $$\underline{D}(x_i + x_j - 1) \le d_{i,j} \quad (1)$$ where $d_{i,j}$ is the distance between points $i$ and $j$, $\underline{D}$ is the minimum allowable distance between selected points, and $x_k$ is a binary variable indicating whether point $k$ is selected (1) or not (0). I coded both his models in Java, using CPLEX 22.1.1, along with another linear model where the constraints are expressed as $$x_i + x_j \le 1\quad (2)$$ for those pairs $(i,j)$ where $d_{i,j} < \underline{D}.$ In other words, we exploit the fact that we know the distances at the outset to precompute which pairs of points can / cannot coexist, and just rule out the pairs that cannot. 
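The precomputation behind (2) is straightforward; here is a Java sketch (hypothetical class and method names, not the code from my tests) that lists the conflicting pairs, each of which contributes one constraint of form (2):

```java
import java.util.ArrayList;
import java.util.List;

public class ConflictPairs {
    // Given point coordinates pts[k] = {x, y} and minimum separation dMin,
    // list the index pairs (i, j) with i < j whose points are too close to
    // coexist. Each such pair yields one constraint x_i + x_j <= 1.
    public static List<int[]> conflicts(double[][] pts, double dMin) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < pts.length; i++) {
            for (int j = i + 1; j < pts.length; j++) {
                double dx = pts[i][0] - pts[j][0];
                double dy = pts[i][1] - pts[j][1];
                if (Math.sqrt(dx * dx + dy * dy) < dMin) {
                    pairs.add(new int[]{i, j});
                }
            }
        }
        return pairs;
    }
}
```

The work is a single $O(n^2)$ pass over the distance matrix, done once before the model is built, so it adds essentially nothing to the solve time.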
Since (1) is equivalent to $$x_i + x_j \le 1 + \frac{d_{i,j}}{\underline{D}},$$ constraint (2) is clearly at least a bit tighter than constraint (1).<br /></p><p>Erwin started with a problem size of 50 points for demonstration purposes, then doubled that to compare timing of his two models. I ratcheted the problem size up to 1,000 points to compare his linear model to mine. (I did not test the quadratic model at that size.) As with all things MIP, the results were not entirely consistent. In limited testing, the model using (2) was usually faster than the model using (1), but occasionally (1) proved faster. The run time differences were not large enough to be exciting. For instance, in one test run version (1) needed 4.633 seconds versus 2.987 seconds for version (2).</p><p>Overall, I can't say the time differences lived up to my expectations, and the fact that at least occasionally (1) was faster than (2) (perhaps due to some quirk in presolving, or just to some random choices in branching) is consistent with my experience that MIP behaviors are, well, not consistent. <br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com4tag:blogger.com,1999:blog-8781383461061929571.post-2656940070063535962022-12-13T17:38:00.000-05:002022-12-13T17:38:10.113-05:00Scheduling a Round-Robin Tournament<p>Once in a while I come across a scheduling problem, and of course my first reaction is to think "integer program". A while back, for example, I used an IP model to work out a schedule for a rotating duplicate bridge game (in which both teams and hosts rotated) at the request of a buddy. Most recently, I saw a question on OR Stack Exchange ("<a href="https://or.stackexchange.com/questions/9545/doubles-round-robin-sorting-algorithm" target="_blank">Doubles Round Robin Sorting Algorithm</a>") related to tournament scheduling. 
</p><p>The problem involves six players playing pickleball (so two players on each team, with two sitting out, for each game). Given six players, there are 15 possible teams (pairings). Twelve games are scheduled each week for four weeks, with two sets of six games per week. The author of the question correctly computed that there are 45 possible game configurations. Over the course of the four weeks, he wanted every possible game played at least once (which leaves the final three game slots to be filled arbitrarily). The following conditions must be met:</p><ul style="text-align: left;"><li>within each set of six games, every player plays four games and sits out two;</li><li>no player sits out two consecutive games within the same set;</li><li>no two players are partners more than once per set.</li></ul><p>The problem poses no criterion for selecting among feasible schedules; we just want to find any one schedule that works. </p><p>My IP formulation can be found in my answer to the OR SE question. I will just mention that it uses binary variables $x_{g,s}=1$ if game $g$ goes in slot $s$ and 0 if not. Games are enumerated up front and indexed from 1 to 45. Slots refer to positions in the schedule and are numbered from 1 to 48, with six slots per set and 12 slots per week. Since no criterion is specified, we just use the default (minimize 0).</p><p>I also tried a constraint programming model, using general integer variables $s_1,\dots,s_{48},$ where $s_j \in \{1,\dots,45\}$ is the index of the game played in slot $j$ of the schedule. My CP formulation uses the "all different" constraint (fairly ubiquitous among CP solvers) to ensure that the first 45 slots each contain a different game. It uses the CP Optimizer "distribute" constraint to ensure that no game is played more than twice and the CP Optimizer "count" constraint to ensure that each player plays exactly four times per set (and thus sits twice) and that no team plays more than once in a set. 
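As a quick sanity check on the counts above (15 teams, 45 game configurations), the configurations are easy to enumerate directly. This is a throwaway Java sketch, not part of my posted code:

```java
import java.util.ArrayList;
import java.util.List;

public class PickleballCount {
    // All teams: unordered pairs of distinct players 0..n-1.
    public static List<int[]> teams(int n) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                result.add(new int[]{i, j});
        return result;
    }

    // All game configurations: unordered pairs of teams sharing no player.
    public static int countGames(int n) {
        List<int[]> t = teams(n);
        int games = 0;
        for (int a = 0; a < t.size(); a++) {
            for (int b = a + 1; b < t.size(); b++) {
                int[] u = t.get(a), v = t.get(b);
                // the two teams must be disjoint
                if (u[0] != v[0] && u[0] != v[1]
                        && u[1] != v[0] && u[1] != v[1]) games++;
            }
        }
        return games;
    }
}
```

(Equivalently: choose the 4 active players in $\binom{6}{4}=15$ ways, then split them into two teams in 3 ways, for $15 \times 3 = 45$ games.)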
I'm not sure how common those types of constraints are among CP solvers, because I don't have much experience with them.</p><p>As it turns out, both CPLEX (for the IP model) and CP Optimizer (for the CP model) produced feasible schedules in about a second or so on my desktop PC. My intuition was that a CP model would be better suited to this type of problem, because all variables are integer and a CP solver would not be doing much, if any, matrix arithmetic. In the end, it would take a much larger test case to tell whether that intuition has any merit.</p><p>If anyone wants to play with the models, my Java code is available from my university GitLab <a href="https://gitlab.msu.edu/orobworld/roundrobin" target="_blank">repository</a>. Running it would require CPLEX and CP Optimizer (and not necessarily the most recent versions of either).<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-23095946496261555012022-11-06T15:52:00.002-05:002022-11-06T15:53:39.220-05:00A Bicriterion IP Model for a Game Strategy<p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">A <a href="https://math.stackexchange.com/questions/4569317/how-can-i-solve-this-optimization-problem-with-integer-variables" target="_blank">question</a> on Mathematics Stack Exchange asks how to solve an optimization problem related to a game apparently called "speedrunner". The problem boils down to selecting integer values for variables $n_a,$ $n_c$ and $n_{b,i}$ ($i=1,\dots,n_c$) so as to minimize elapsed time $$t_{\textrm{total}} =t_{a}n_{a}+\sum_{i=1}^{n_{c}}t_{b}n_{b,i}+t_{c}n_{c},$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">
The sole constraint specified is that winnings, defined by</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$x_{\textrm{total}} = x_{0}n_{a}\prod_{i=1}^{n_{c}}n_{b,i}$$(where $x_0$ is another parameter), must be at least $\$1$ billion. We can infer from the requirement of positive winnings that $n_a \ge 1,$ $n_c \ge 1,$ and $n_{b,i} \ge 1$ for $i=1,\dots,n_c.$ For modeling purposes, we will allow the index $i$ to exceed $n_c$ and require that $n_{b,i}=0$ for $i \gt n_c.$ An integer linear programming model would be a no-brainer were it not for two things: the product form of the expression for winnings is nonlinear, and it involves the product of a variable number of variables. </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The approach I chose was to express the original variables in terms of binary variables. To do that, we first need upper bounds on the original variables, which we get by guessing an upper bound $T$ on $t_{\textrm{total}}.$ (If we guess too low, either the solution will exceed that upper bound or the model will be infeasible; either outcome tips us off to try a larger guess.) Given $T,$ we get an upper bound for each variable by setting the other variables as small as possible. That leads to the following upper bounds:$$N_{a} =\left\lfloor \frac{T-t_{b}-t_{c}}{t_{a}}\right\rfloor$$</p>
<p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$N_{b} =\left\lfloor \frac{T-t_{a}-t_{c}}{t_{b}}\right\rfloor$$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">and </p>
<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">$$N_{c} =\left\lfloor \frac{T-t_{a}}{t_{c}+t_{b}}\right\rfloor .$$ </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">The denominator in the third equation arises from the fact that adding 1 to $n_c$ not only directly costs $t_c$ units of time but also adds another $n_{b,i}$ variable with minimum value 1, costing $t_b$ units of time.</p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">Armed with those upper bounds, we can introduce binary variables $y_j$ $(j=1,\dots,N_a),$ $w_j$ $(j=1,\dots, N_c)$ and $z_{i,j}$ $(i=1,\dots,N_c$ and $j=0,\dots,N_b)$ along with constraints making them type 1 special ordered sets (i.e., each set contains exactly one variable with value 1) and expanding the original variables in terms of them. Specifically:</p><ul style="text-align: left;"><li>$n_{a}=\sum_{j=1}^{N_{a}}j\cdot y_{j}$ with constraint $\sum_{j=1}^{N_{a}}y_{j}=1;$</li><li>$n_{c}=\sum_{j=1}^{N_{c}}j\cdot w_{j}$ with constraint $\sum_{j=1}^{N_{c}}w_{j}=1;$ and</li><li>$n_{b,i}=\sum_{j=0}^{N_{b}}j\cdot z_{i,j}$ for $i=1,\dots,N_{c}$
<p style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; margin: 0px; text-indent: 0px;">with the following constraints for all $i\in\left\{ 1,\dots,N_{c}\right\}:$</p><ul><li>$\sum_{j=0}^{N_{b}}z_{i,j}=1$ (the value of $n_{b,i}$ is uniquely defined); and</li><li>$z_{i,0}+\sum_{k=i}^{N_{c}}w_{k}=1$ ($n_{b,i}=0$ if and only if $n_{c}<i$).</li></ul></li></ul><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">With all of that in place, the original objective function can be expressed as minimizing$$t_{a}n_{a}+t_{b}\sum_{i=1}^{N_{c}}n_{b,i}+t_{c}n_{c},$$where the only tweak is that we now sum over a constant number $N_c$ of terms rather than a variable number $n_c,$ relying on the fact that $n_{b,i}=0$ for $i>n_c.$</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">That brings us to the nonlinear formula for winnings. The original constraint is $ x_{\textrm{total}} \ge 10^9,$ which we can rewrite as $\log(x_{\textrm{total}}) \ge \log(10^9)$ using whatever flavor of logarithm you like. 
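</p><p>The bound formulas and the log transform are easy to sanity-check numerically before building the full model. Here is a minimal Python sketch; the parameter values below are invented purely for illustration (the question's actual values are in the linked post):</p>

```python
import math

# Hypothetical parameter values, for illustration only.
t_a, t_b, t_c = 2.0, 1.0, 3.0  # time per unit of n_a, n_b_i, n_c
x0 = 100.0                     # winnings multiplier
T = 60.0                       # guessed upper bound on total time

# Upper bounds on the original variables (other variables at their minima).
N_a = math.floor((T - t_b - t_c) / t_a)
N_b = math.floor((T - t_a - t_c) / t_b)
N_c = math.floor((T - t_a) / (t_c + t_b))  # each unit of n_c drags along an n_b_i >= 1

# A candidate solution within those bounds.
n_a, n_c = 5, 3
n_b = [4, 4, 3]  # one value per i = 1, ..., n_c

t_total = t_a * n_a + t_b * sum(n_b) + t_c * n_c
x_total = x0 * n_a * math.prod(n_b)

# The winnings constraint is linear in logs:
# log(x_total) = log(x0) + log(n_a) + sum_i log(n_b[i]).
log_x = math.log(x0) + math.log(n_a) + sum(math.log(v) for v in n_b)
assert abs(log_x - math.log(x_total)) < 1e-9
assert t_total <= T
```

<p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">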
Expanding the logs of $n_a$ and $n_{b,i}$ in terms of the binary variables, we get$$\log(x_{\textrm{total}}) = \log(x_{0})+\sum_{j=1}^{N_{a}}\log(j)\cdot y_{j} +\sum_{i=1}^{N_{c}}\sum_{j=1}^{N_{b}}\log(j)\cdot z_{i,j}.$$So the constraint that log winnings equal or exceed the log of $10^9$ is linear in the binary variables.</p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;"> </p><p style="-qt-block-indent: 0; -qt-user-state: 1; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">The solution in the Math Stack Exchange post $(n_a = 7,$ $n_c = 3$ and $n_{b,1}=n_{b,2}=n_{b,3}=18)$ turns out to be optimal given the parameter values in the question, and the IP model here will correctly achieve the minimum time (approximately 52.3 minutes) ... but it will not necessarily reproduce the best winnings within that time (approximately $\$1.02$ billion). That is because there are multiple feasible solutions that hit the minimum time value, with $n_a = 7$ and $n_c = 3$ but with the 54 cumulative $n_b$ units allocated differently (for instance, $\{19, 19, 16\}$ rather than $\{18, 18, 18\}$). To get the best possible solution, we can use a bicriterion model with lexicographically ordered objectives, first minimizing time and then maximizing winnings (by minimizing the negative of the log of winnings).</p><p>I tested that model using the Java API to CPLEX 22.1. My code is available from my <a href="https://gitlab.msu.edu/orobworld/speedrunner" target="_blank">university repository</a>.<br /></p>Paul A. 
Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-14095270023479677062022-11-01T11:59:00.000-04:002022-11-01T11:59:41.193-04:00Holt Double Exponential Smoothing<p>Given a time series $x_t,\, t=1,2,\dots,$ presumed to contain linear trend, <a href="https://en.wikipedia.org/wiki/Exponential_smoothing#Double_exponential_smoothing_(Holt_linear)" target="_blank">Holt's double exponential smoothing method</a> computes estimates $s_t$ of the level (mean) at time $t$ and $b_t$ of the (presumed constant) slope using the following formulas: $$s_t = \alpha x_t + (1-\alpha)(s_{t-1} + b_{t-1})$$ and $$b_t = \beta(s_t - s_{t-1}) + (1-\beta)b_{t-1}$$ where $\alpha,\beta \in (0,1)$ are smoothing weights. A recent <a href="https://or.stackexchange.com/questions/9217/forecasting-with-holts-linear-trend-method/" target="_blank">question on OR Stack Exchange</a> asked why the second formula is based on the level estimate and not the observed value. In other words, the proposed alternative to the trend update was $$b_t = \beta(x_t - x_{t-1}) + (1-\beta)b_{t-1}.$$</p><p>The intuition for doing it Holt's way is fairly simple. If exponential smoothing is working as intended (meaning <i>smoothing</i> things), then the difference in level estimates $s_t - s_{t-1}$ should be less variable than the difference in observed values $x_t - x_{t-1}.$ A formal proof probably involves induction arguments, requiring more functioning brain cells than I had available at the time, so I was a bit loose mathematically in my answer on OR SE. Just to confirm the intuition, I did some Monte Carlo simulations in R. The notebook containing the experimental setup, including code, is available <a href="https://rubin.msu.domains/blog/Holt.nb.html" target="_blank">here</a>. 
It requires the <span style="font-family: courier;">dplyr</span> and <span style="font-family: courier;">ggplot2</span> library packages.<br /></p><p>The following plots show confidence intervals over time for the errors in the level and trend estimates using both Holt's formula and what I called the "variant" method. They are from a single experiment (100 independent time series with identical slope and intercept, smoothed both ways), but other trials with different random number seeds and changes to the variability of the noise produced similar results.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUgHyaC2TcpZkTaIz56M66ivTu36aHkdHCisEYUsLmc_7rQdmZ2xYiVmj0QPE5YK0h_8NoYjlEN4C6aELVQThit3XqDUi6THWxVZ3w4KV53k-h6CL19t7anItawyIpmSDYPgij665mZBcYH_KQNoERrxFeEYFbR3xi7Ikp-0NXwqhKK2Z1FlRh8QxE/s579/Holt_level.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="plot of confidence intervals for error in level estimates" border="0" data-original-height="357" data-original-width="579" height="394" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUgHyaC2TcpZkTaIz56M66ivTu36aHkdHCisEYUsLmc_7rQdmZ2xYiVmj0QPE5YK0h_8NoYjlEN4C6aELVQThit3XqDUi6THWxVZ3w4KV53k-h6CL19t7anItawyIpmSDYPgij665mZBcYH_KQNoERrxFeEYFbR3xi7Ikp-0NXwqhKK2Z1FlRh8QxE/w640-h394/Holt_level.png" width="640" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDQmQIPfbMZ25b11FqmSHjgqyN1h0RfwKWn5rAKNHzEB1lvhuKpYEe17y4MiUIwcwTGwS6-wxTeHN0a6VA-Q8Qxiwuo1q3QT5ML1ABHMm7YJhdtCOLjl63-Y8XQKbq8Io1a8bruxkkgjv8Yi_-jiWy091hKV5TNu0UeHIag-ZNoM3UjyktlIdE_IVH/s579/Holt_trend.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="plot of confidence intervals for error in trend estimates" border="0" data-original-height="357" data-original-width="579" height="394" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDQmQIPfbMZ25b11FqmSHjgqyN1h0RfwKWn5rAKNHzEB1lvhuKpYEe17y4MiUIwcwTGwS6-wxTeHN0a6VA-Q8Qxiwuo1q3QT5ML1ABHMm7YJhdtCOLjl63-Y8XQKbq8Io1a8bruxkkgjv8Yi_-jiWy091hKV5TNu0UeHIag-ZNoM3UjyktlIdE_IVH/w640-h394/Holt_trend.png" width="640" /></a></div><br /><p>In both cases the estimates start out a bit wobbly (the Holt estimates may actually be slightly noisier early on), but both stabilize over time. The level estimates are about equally noisy under the two approaches, at least in this run: the Holt confidence intervals may be marginally narrower, but any difference is small. The Holt trend estimates, however, are considerably less noisy than those of the variant method, supporting the intuitive argument.<br /></p>Paul A. Rubinhttp://www.blogger.com/profile/05801891157261357482noreply@blogger.com0
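<p>The two trend updates are short enough to experiment with directly. Below is a minimal Python sketch (the experiments above were done in R; this is just an illustrative reimplementation of the update formulas, not the notebook code). On noiseless linear data, both Holt's update and the variant should recover the true slope even from a bad initial trend estimate, which makes a handy correctness check before adding noise:</p>

```python
def holt(x, alpha, beta, variant=False):
    """Double exponential smoothing; returns the final level and trend estimates.

    variant=False uses Holt's update b_t = beta*(s_t - s_{t-1}) + (1-beta)*b_{t-1};
    variant=True uses b_t = beta*(x_t - x_{t-1}) + (1-beta)*b_{t-1}.
    """
    s, b = x[0], 0.0  # deliberately poor initial slope estimate
    for t in range(1, len(x)):
        s_prev = s
        s = alpha * x[t] + (1 - alpha) * (s + b)
        diff = (x[t] - x[t - 1]) if variant else (s - s_prev)
        b = beta * diff + (1 - beta) * b
    return s, b

# Noiseless linear series: both updates should converge to the true slope.
slope, intercept = 0.7, 5.0
x = [intercept + slope * t for t in range(300)]
for variant in (False, True):
    s, b = holt(x, alpha=0.5, beta=0.5, variant=variant)
    assert abs(b - slope) < 1e-6
    assert abs(s - x[-1]) < 1e-6
```

<p>The interesting comparison, of course, is how the two trend estimates behave once noise is added, which is what the Monte Carlo experiment in the linked notebook measures.</p>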