Showing posts with label integer programming. Show all posts
Showing posts with label integer programming. Show all posts

Tuesday, May 5, 2026

Identifying Extreme Points

I found a recent blog post by Erwin Kalvelagen titled "Convex hull models" rather interesting. Although the title mentions convex hulls, what Erwin actually discusses is finding the extreme points of the convex hull of a finite point set. (The full convex hull would also include bounding hyperplanes, which is a separate, arguably thornier, matter.) Erwin presents both a mixed integer nonlinear programming (MINLP) model and two mixed integer linear programming (MILP) models. In computational tests, they all solved at the root node, leading Erwin to make the following observation: "[a]s we consistently get integer models that need 0 nodes it is likely this model is really an LP." He tested the LP relaxation of one of his models, and it seemed to work.

My interest stems in part from the fact that after contemplating the problem a bit I assumed it could be solved as an LP. Assume we have a finite set $P=\lbrace p_1,\dots,p_m\rbrace\subset \mathbb{R}^n$ of points. Our mission is to determine which are the extreme points of $\mathrm{conv}(P),$ the convex hull of $P.$ A starting point is the recognition that the extreme points of $\mathrm{conv}(P)$ must belong to $P,$ and that an extreme point cannot be a convex combination of other points in $P.$

Now consider this system of linear equations and inequalities: 

$$p_i  = \sum^{m}_{k=1}\lambda_{k}p_k\quad (1)$$$$\sum^{m}_{k=1}\lambda_{k} = 1\quad (2)$$$$\lambda_{k} \ge0\ \ \forall k \quad (3).$$

This expresses point $p_i$ as a convex combination of the points in $P$ including itself. So a trivial solution is $\lambda_i = 1$ and $\lambda_j = 0$ for all $j\neq i.$ If a solution exists with $\lambda_i=0$ then $p_i$ is a convex combination of the other points and therefore not extreme. If there is no solution with $\lambda_i=0,$ then $p_i$ is not a convex combination of other points and therefore must be extreme. This logic, by the way, is also embedded in Erwin's models. 

Suppose that we find a solution with $0 < \lambda_i < 1.$ Combining the $p_i$ terms on the left side of (1), we have $$(1-\lambda_i) p_i = \sum_{k \neq i} \lambda_k p_k$$ and therefore $$p_i = \sum_{k \neq i} \frac{\lambda_k}{1-\lambda_i} p_k.$$ Since $$\sum_{k=1}^m \lambda_k = 1 \implies \sum_{k\neq i}  \frac{\lambda_k}{1-\lambda_i} = 1,$$ we know that if a solution to (1)-(3) exists with $\lambda_i < 1$ then a solution exists with $\lambda_i = 0.$

So we can solve the problem of minimizing $\lambda_i$ subject to the constraints above. If the minimum value is 0, $p_i$ is not an extreme point. If the minimum value is strictly greater than 0, then it must be 1, which means that $p_i$ cannot be written as a convex combination of the other points and therefore must be extreme.

Along the lines of what Erwin did, we can write a single LP that identifies all extreme points. We add a second index to $\lambda,$ so that $\lambda_{i,k}$ is the weight assigned to $p_k$ when writing $p_i$ as a convex combination of the points. We then stack the constraints for each $p_i$ into a single model and minimize $\sum_i \lambda_{i,i}.$ Positive values of the objective terms identify extreme points. That said, I find it perhaps more convenient to solve a sequence of smaller LPs, one for each point in $P.$

As a proof of concept, I wrote an R Markdown document that generates a random set of points in two dimensions and uses the single-point model to find extreme points. I stuck to two dimensions to allow for easy plotting of the results. It uses the lpSolve package, a wrapper for the open-source lp_solve LP/MIP solver. (It also needs the ggplot2 package for plotting purposes.) You can download the .Rmd file and try it for yourself (assuming that you have R and the two aforementioned packages installed). On my fairly humble PC, tow-dimensional test cases with 1,000 points solved in the blink of an eye. Of course, solution times increase as dimension and sample size increase. It is an empirical question (for which I have no answer) whether solving for all points at once in a single model is faster or slower than solving for each point individually. To belabor the obvious, doing all in a single model will require much more memory.

Thursday, September 11, 2025

Building a Partial Tour

A user on Mathematics Stack Exchange posted a question that, while superficially similar to the prize collecting traveling salesman problem (TSP), also has substantial differences to it. You are given a rectangular grid where each cell contains a reward. The objective is to build a tour of some (but not all) cells, with movement restricted to adjacent cells (those sharing an edge with the current cell), subject to the restriction that you cannot visit two cells with identical rewards. The objective is to maximize the value of the cells visited. It differs from the prize collecting TSP in several ways:

  • there is no designated start/end cell for the tour;
  • there are no edge costs (movement incurs no objective penalty);
  • there is no penalty for unvisited cells; and of course
  • duplicate rewards are prohibited.

A key part of the user's question was how to handle subtour elimination. The user was looking at the classic Miller-Tucker-Zemlin (MTZ) approach for TSPs, which involves an auxiliary variable at each cell recording in essence the position of the cell (first, second, ...) in the tour. The implementation of MTZ involves constraints at each cell except the depot (tour origin/destination). In the current problem, there is not designated depot, and you cannot just pick a cell to serve as the depot because you do not know which cells will be in the optimal tour.

The workaround I suggested in a reply to the post is to add an artificial cell (the "depot"), connected to every cell in the original grid, and use that to implement the MTZ constraints. The one mildly tricky part is that the cell immediately preceding the depot and the cell immediately following the depot in the tour need to be adjacent (or the same, if the tour only visits one cell). That is easily handled at the cost of some extra constraints.

Assume the grid contains $m$ rows and $n$ columns. Let us start by numbering the cells $1,\dots,mn$ in raster scan order from upper left to lower right, and call the depot cell 0. Let $r_i$ denote the reward for visiting cell $i$ and $N_i$ denote the set of cells (including the depot) adjacent to cell $i$. Finally, let $R$ denote the set of possible rewards and $C_r$ the "cluster" of cells having the same reward $r\in R.$ We build the model as follows.

Variables

  • $x_{ij}\in \{0,1\}$ will be 1 if the tour moves from cell $i$ to cell $j,$ 0 otherwise. 
  • $y_{i} \in \{0,1\}$ will be 1 if the tour visits cell $i > 0,$ 0 if not.
  • $z_{i} \ge 0$ is the auxiliary variable (position) for cell $i$ in the MTZ constraints (indexing where cell $i$ falls in the tour). 
If cell $i$ does not appear in the tour, the value of $z_{i}$ will be irrelevant (and the solver will likely set it to 0).
 

Objective

The objective is to maximize $$\sum_{i} r_i y_i.$$ 
 

Constraints

  • Movements may only occur between adjacent cells (with the depot cell adjacent to every other cell).  $$x_{ij} = 0 \quad \forall i, \forall j\notin N_i.$$
 
  • The depot must be exited and entered exactly once. $$\sum_{i > 0} x_{0i} = 1 = \sum_{i > 0} x_{i0}.$$
 
  • Each cell other than the depot is entered and exited once if visited and zero times if not visited. $$\sum_{j} x_{ji} = y_i = \sum_{j} x_{ij} \quad \forall i>0.$$
 
  • At most one cell with a given reward value can be visited. $$\sum_{i\in C_r} y_i \le 1 \quad \forall r\in R.$$
 
  • MTZ: If the tour moves from cell $i$ to cell $j,$ the position value for cell $j$ must be at least one greater than the position value for cell $i$ (excluding $j=0,$ i.e., return to depot). $$x_{ij} = 1 \implies z_j \ge z_i + 1 \quad \forall i, \forall j > 0.$$
 
  • If the tour goes $0 \rightarrow i \rightarrow \dots \rightarrow j \rightarrow 0,$ then cells $i$ and $j$ must be adjacent (or the same). $$x_{0i} + x_{j0} \le 1 \quad \forall i>0, j>0, i\neq j, j \notin N_i.$$
 
I tested the code on the example in the Math SE post, using Java and the FICO XpressMP MIP solver, and it produced what appears to be an optimal tour (confirmed by the original poster) in about two minutes on my somewhat archaic desktop. I posted the tour in a comment to my answer to the original question. The Java code is available (Creative Commons 4 open source license) from my Git repository.

Monday, April 28, 2025

Routing With Sequencing

The motivation for this post comes from a sequence of questions posted on OR Stack Exchange (including this one), having to do with a mixed integer programming model for routing an electronic vehicle (EV) serving various customers. One difference from the basic single vehicle routing models with which I'm familiar is that the EV has to visit a charging station periodically during the route. That is easy to accommodate. Where it gets funky is that the modeler needs to know within the model which customer was last on the route, because the EV is required to go to the nearest charging station after its last stop. I'll take that a step further and require the model to provide (via variables) the position of each node (first, second, third) in the route sequence. This might be useful if, for example, the model had to enforce a rule that customer X must be among the first three customers served. Identifying just the last customer node is easier, as I'll describe at the end.

Attempts by the author of the original question followed the usual pattern for vehicle routing. Assume that there is a single vehicle, each customer must be visited once, and there are no time windows complicating things. You have a digraph containing nodes for each customer and each charging station. You typically start by assigning a binary variable $x_{ij}$ to each arc $(i, j),$ taking value 1 if and only if the vehicle crosses that arc, and proceed from there.

To collect sequencing information, I would normally employ the Miller-Tucker-Zemlin formulation of subtour elimination constraints. The MTZ approach adds a nonnegative auxiliary variable $u_i$ for each node $i$ together with the constraints $$u_j \ge  u_i + x_{ij} - M(1 - x_{ij})$$ for each pair of distinct nodes $i\neq j.$ This says that if we cross arc $(i, j),$ the value of $u_j$ must be at least one higher than the value of $u_i,$ preventing any loops. If $n$ is the number of nodes and we are willing to start numbering with $u_s=0$ ($s$ being the starting node for the tour), we can choose $M=n-1.$ The MTZ constraints are intended to prevent subtours, but as a side effect the $u$ variables number the stops in the order they occur.

This would work for the EV problem if there were a rule that the EV cannot use the same charging station twice during a tour. If the vehicle can stop more than once at the same charging station (which I assume would normally be the case), we cannot use the MTZ constraints because a repeat visit to a charging station would create a subtour.  This also complicates (I think) the use of subtour elimination constraints to prevent disjoint subtours. Fortunately, there are at least two "reasonable" (in my opinion) workarounds, both using the MTZ constraints. Unfortunately both are clunky.

The first workaround is to create multiple clones of each charging station node. So if node $s$ represents a charging station, we introduce additional nodes $s', s'', s''' \dots$ that are all charging stations, all in the same location (meaning time / distance / charge consumption between node $i$ and any of the clones is the same). For any arcs $(i, s)$ and $(s, j)$ we add arcs $(i, s'), (s', j), (i, s''), (s'', j)$ etc. We do not require that every charging node be entered (unlike customer nodes, which must all be visited), but we do limit each charging node to at most one entry. That removes the threat of loops and let us use the MTZ approach. Besides making the digraph larger, this forces the modeler to guess how many clones of each charging node will be needed.

The other approach is change to a multigraph, with fewer nodes but more arcs. We include only customer nodes, plus dummy start and end nodes. Arcs from the start node to each customer (within EV range) are the same as before. The end node is a stand-in for the closest charging station to the last customer visited. The arc from any customer node to the end node has the time / distance / charge consumption required to reach the closest charging station. (The closest charging station to each customer node is computed before building the model.)

For each pair of customer nodes $i \neq j,$ the arc $(i, j)$ (if it exists) represents moving directly from $i$ to $j.$ For select charging nodes $s$ we add another arc from $i$ to $j$, which I will denote $<i, s, j>,$ that represents going from $i$ to $s,$ recharging, and then proceeding from $s$ to $j.$ Each of those arcs produces another MTZ constraint. As usual, every customer node should be entered/exited exactly once.

If the number of charging stations is small, we can create extra arcs  $<i, s, j>$ for every combination of two customers and a charging station, weeding out those that are infeasible (meaning an EV with a full charge could not get from $i$ to $s$ or from $s$ to $j$). To get a smaller model, we can throw out arcs that are dominated by other arcs. For $<i, s, j>$ to dominate $<i, s', j>,$ you would need power consumption from $i$ to $s$ to be no greater than power consumption from $i$ to $s'$ and power consumption from $s$ to $j$ to be no greater than power consumption from $s'$ to $j.$ (If other criteria, such as mileage or transit time, appear in the objective function then they would also factor into the determination of dominance.)

One advantage of this approach is that it would let us dispense with the $u$ variables and the MTZ constraints if the only reason for them was to enable constraints forcing the EV to end at the charging station closest to the last customer, since that is baked into the arcs leading to the dummy end node. If we want to use the cloned charging station approach, we can also use a dummy end node linked to each customer by an arc representing the link to that customer's nearest charging station to enforce the desired ending rule for the tour.

Monday, March 3, 2025

Boolean Grid II

As the title implies, this is a sequel to my previous post, to deal with a variety of odds and ends.

There's a saying that you cannot teach an old dog new tricks. That's untrue; it's just that dogs as old as I am are slow learners. When I first started tangling with integer programs, there was a rule that you never made the model dimensions (number of rows or columns) any larger than you absolutely had to, partly to conserve memory and partly so as not to slow down pivoting. Once solvers switched from Gaussian pivoting (the way I learned to do it by hand) to matrix factoring, keeping dimensions down took a back seat to reducing matrix density. 

Similarly, once upon a time I learned (the hard way) that symmetry in my model would slow down pruning of the search tree. In the previous post, I alluded to research on exploiting symmetry and said something about solvers having symmetry detection. Imre Polik mentioned in a comment that Xpress already detects the symmetry by default. CPLEX might need some encouragement to do so. Near the start of the Xpress output, it describes the model as symmetric and lists statistics on "orbits" (groups of variables whose values can be permuted). It is unclear to me whether Xpress exploits that information by using "orbital branching" (a relatively recent development) or in some other way. Early in the CPLEX output I see a message that it is "detecting symmetries", but the model dimensions do not change and there are no further mentions of symmetry.

Moving on, Imre suggested in his comment to the previous post that another possible antisymmetry constraint is to assert a dominance inequality between the number of  true values in the top row and the number in the left column. This is compatible with my original two constraints, and so I added a version of it (combining Imre's constraint with mine) to my code. Rob Pratt suggested yet another possibility, an inequality between the subdiagonal and superdiagonal. I'm not convinced that one plays nice with my original constraints, meaning that if you added Rob's constraint to my original two you might legislate the optimal solution out of existence, so I added it to my code by itself. Also, it only works when the grid is square. Finally, since both solvers have parameters to control how hard they work to detect symmetry, I added an option to my code to skip adding any constraints and just crank up the solver's response.

The results are summarized in the following graph, which shows the optimality gap (best bound at the left end, best incumbent at the right end) for each solver and modeling option. The dashed vertical line is the optimal solution (from Erwin's post).

plot of solver/model combination results

 

The "Bilateral" and "Trilateral" models are my original two constraints and my two constraints plus Imre's, respectively. The "Diagonal" model uses Rob's constraint. "None" is just the model by itself and "Solver" is the unmodified model plus a solver parameter setting to get it to work harder detecting symmetries. Note that the results for Xpress with "None" and Xpress with "Solver" are pretty much identical, confirming Imre's assertion that Xpress would detect the symmetry on its own. CPLEX saw a bit of improvement in both incumbent and bound going from "None" to "Solver", so apparently the nudge helped there. Only two runs found the optimal solution, both times CPLEX with some help from antisymmetry constraints. None of the runs got the best bound anywhere near tight.

Returning to my "old dog, new tricks" theme, the takeaway for me is that before I go nuts try to constrain away symmetry in a model, I need to investigate whether the solver can recognize it and, if so, whether it can eliminate or even exploit the symmetry.

Lastly, I belatedly realized that Erwin got his proven optimum quickly because the modified the model to use equality constraints in the interior of the grid and inequalities only in the two outermost rows/columns on each edge. I added that to my code as well, and yes, it gets a proven optimum incredibly fast. I was a bit leery about assuming that redundant coverage would only be required near the boundary, but per some comments by Rob on Erwin's post, 227 is indeed the (known) optimal value for a 32x32 grid.

Friday, February 28, 2025

A Boolean Grid

A recent blog post by Erwin Kalvelagen discusses a very straightforward integer programming problem. You have a rectangular grid of boolean variables, where a variables neighbors are the variable immediately above, below, to the left or to the right of it. The sole constraint is that, for any cell, at least one of that cell's variable or its neighbor variables must be true. The objective is to minimize the number of true cells in the grid. Erwin coded the model in GAMS and ran it for a 32x32 grid. He reported that he got an incumbent value of 227 in about 65 seconds but had trouble getting to optimality. (This might be a good time to point out that Erwin's computer is probably better than mine, since he is a consultant.)

I was curious whether a couple of redundant constraints would help. The problem suffers (if that is the correct term) from symmetry. Draw a grid and color in the cells of an optimal solution. Now flip the grid, switching either top with bottom, left with right, or both. The colored cells still form an optimal solution. What is the harm of symmetry? Think about the branch-and-bound (or, if you prefer, branch-and-cut) algorithm and specifically how it prunes nodes based on bound. When you find a new incumbent solution, you prune any node whose bound is no better than that solution. Typically the node bound will be at least slightly loose (meaning, in a minimization problem, that the bound will be strictly less than the objective value of the best feasible solution lurking in that node). In the context of the current problem, when a feasible solution is found, there will be at least three other feasible solutions with the same objective value, obtained by reversing the indexing of rows, columns or both. Each of them will likely be in a node of the search tree with an objective value at least slightly better than their true value, meaning that none of those nodes can be pruned right away, even if they do not contain an even better solution.

So symmetry can slow pruning and also slow improvement of the best bound. There has been research on how to exploit symmetry in IP models, but as far as I know that work has to be baked into a solver to be used. At least some solvers have some built-in capability to recognize and deal with symmetry, but I'm not sure how well that works. I usually keep an eye out for symmetry and, if I think it might be slowing improvement of the bound, see if I can constrain away some of it.

In this case, the symmetry I identified can be removed by adding just two constraints. One is that the number of true cells in the top row should not exceed the number in the bottom row (so that the vertical flip is ruled out unless the top and bottom rows are tied). Similarly, the other is that the number of true cells in the left column should not exceed the number in the right column (ruling out the horizontal flip). Since these constraints shrink the feasible region, it would not be surprising if they slow down identification of improved incumbents. By enlarging the model, they also slow down (very slightly?) the rate at which nodes are solved. The hope is that faster bound improvements compensate for that.

I ran the 32x32 case (coded in Java) using two solvers, FICO Xpress MP (version 44.01.01) and IBM CPLEX (version 22.1.2), both with and without the antisymmetry constraints. Each run was limited to 15 minutes (wall clock time) and used default settings for all parameters. Here are the results, in the format "best solution/best bound".

 


Xpress MP CPLEX
With antisymmetry 231 / 215.73 227 / 214.40
Without antisymmetry 228 / 215.95 229 / 214.38

 

There is enough randomness in IP solvers that I would not read much into a single iteration of each model. CPLEX actually did a bit better on the incumbent with the antisymmetry constraints included, which surprises me. Xpress had a very slightly worse bound with them included, which also surprises me. The bottom line seems to be that the antisymmetry constraints do not help much (which I find disappointing) and that, as Erwin noted, the problem is a bit stubborn.

As always, my code is available for download.

Wednesday, January 29, 2025

Minimizing Flow Support (II)

In yesterday's post I described a graph optimization problem (from a post on OR Stack Exchange) and how to generate random test instances. You are given a digraph with a single commodity flowing through it. There may be multiple supply and demand nodes (with supply and demand in balance), and arcs have neither costs nor capacity limits. The problem is to find a subgraph with all the original nodes and a minimal number of arcs (the "support" of the flow) such that there is a feasible flow pattern to satisfy all demands. In this post, I will discuss a couple of mixed integer programming (MIP) models for the problem.

I will use the following notation. The nodes are $n_1, \dots, n_N$, and the supply or demand at node $n_i$ is given by $s_i,$ where $s_i > 0$ at supply nodes and $s_i < 0$ at demand nodes. The total supply is given by $F.$ The set of arcs is $A,$ and at each node $n_i$ we denote by $\delta^+(n_i)$ and $\delta^-(n_i)$ respectively the set of arcs flowing into and out of $n_i.$ The common elements of my two MIP models are the following.

  • Variable $x_a \in \lbrace 0,1\rbrace$ is 1 if and only if arc $a\in A$ is selected for the subgraph.
  • Variable $y_a \in [0, F]$ is the flow volume over arc $a \in A.$
  • The objective is to minimize the total number of arcs selected:
       
    $$\textrm{minimize }\sum_{a\in A} x_a.$$
  • For each node $n_i$ we require that flows in and out combined with any supply or demand balance:
       
    $$\sum_{a \in \delta^-(n_i)} y_a - \sum_{a\in \delta^+(a)} y_a = s_i.$$

The models differ in the remaining requirement, that there be no flow on any arcs that were not selected (i.e., $y_a = 0$ if $x_a =0$). Those constraints are added as follows.

  • In the "big M" model, for each $a\in A$ we add the constraint $y_a \le Fx_a.$
  • In the "indicators" model, for each $a\in A$ we add an if-then constraint $x_a = 0 \implies y_a = 0.$

I tested both models using two different solvers, IBM CPLEX 22.1.1 and FICO Xpress 9.5 Optimizer (version 44.01.01). Before running a "production" problem I ran all four combinations of model and solver on a small instance using the solvers' respective tuning routines. In three cases, the default solver settings seemed best. In one case (CPLEX on the big M model), the tuner suggested a couple of nondefault parameter settings, but they fared poorly on the production problem. So I used default parameter settings on all the production runs.

The production problem had 25 supply nodes, 34 demand nodes and 41 transit nodes (nodes where supply was 0) and 526 arcs. Total supply was $F=1000.$ I gave each combination of model and solver a one hour time limit (on my slightly vintage PC). I was mainly curious about how the two models would compare, and secondarily on how the two solvers would compare. Of course, running one test instance for one hour per combination is far from probative, but my curiosity has its bounds. Here is what I found.


Solver Model Incumbent Lower bound Gap (%)
CPLEX big M 55 49.5 10
CPLEX indicators 54 48 11
Xpress big M 57 49 16
Xpress indicators 58 49 18

There are some differences among combinations in the results (which, again, might not bear up under multiple tests), but what I found a bit interesting was that the gap never made it below 10% in any combination, even though I consider the test problem to be not particularly large. (Also slightly interesting was that only roughly 10% of the arcs in the graph were needed.)

I will note one difference in the solvers. I'm not sure if it is a function of different default parameter settings or different memory management. Within the one hour run time limit, neither attempt with Xpress ran into memory issues. In contrast, one of the CPLEX runs exhausted system memory (and hung the system) before the hour was up. So I did the other CPLEX run with a limit of 9500 MB on the tree size (set via the parameter CPXPARAM_MIP_Limits_TreeMemory), and that run ended due to memory exhaustion before the hour was up. (Both CPLEX runs lasted only about 53 minutes.)

One of the main reasons I ran the tests was to see whether the model with indicators would give be tighter than the big M model. A bit of wisdom I received a long time ago was that big M was better than indicators if you have insights into the model that allow you to use a not terribly large value of $M.$ Here, the worst case would be if the entire flow volume $F$ passed through a single arc, which lets me use $F$ (1000 in the test problem) as my value of $M.$ The big M runs did produce smaller gaps than their indicator counterparts, but not by much, and possibly not by a "statistically significant" amount (if you can even mention "statistical significance" while working with a sample size of 1 🥲).

Still, for now I will stick to preferring big M constraints with not-so-big values of $M$ over indicator constraints.

As mentioned in the previous post, you can find my code here, including a README.md file that explains the code structure.

Tuesday, January 28, 2025

Minimizing Flow Support (I)

This is the first of hopefully two posts related to a question posted on Operations Research Stack Exchange. You are given a digraph through which a single commodity flows. Arcs have neither costs nor capacity limits. Each node $n_i$ is either a supply node (with supply $s_i>0$), a demand node (with demand $s_i<0$ treated as a "negative supply” in the optimization models), or what I will call a “transit node” ($s_i =0$) with neither supply nor demand. Key assumptions are that total supply equals total demand and that it is possible to find paths through the digraph satisfying all demands (and thus consuming all supplies).

The problem is to select a minimal number of arcs such that the reduced digraph (using all the original nodes but just the selected arcs) contains routes fulfilling all demands. In this post, I'll describe one way to generate random test instances of the problem. The following post will discuss modeling and solving the problem. I have Java code demonstrating both parts in my university GitLab repository.

My approach to generating a test instance starts with specification of the number of nodes in the graph and a lower bound for the number of arcs. (The lower bound might need to be exceeded in order to ensure that a feasible flow satisfying all demands exists.) My code also asks the user to specify an integer value $F$ for total supply (and total demand). I used integer flows just to make printing the supply/demand at each node and flow on each arc neat. You might prefer to use real valued flows and (since the problem is invariant with respect to the flow volume) just set the total supply/demand at 1.

In what follows, I will use the terms "upstream" and "downstream" to refer to nodes from which there are directed paths to a given node (upstream) or to which there are directed paths from a given node (downstream). To simplify explanation, I will treat demands as positive values. The construction process starts by creating the desired number of nodes and partitioning them into supply, demand and transit nodes. Since you need at least one supply node and at least one demand node, the first node is assigned as a supply node and the second as a demand node. (My code graph also assigns one node as a transit node, but if you do not care whether there are any transit nodes you can skip that.) The remaining nodes are randomly classified as supply, demand or transit. Since my code uses integer flow values, each supply (demand) node needs a supply (demand) of at least 1. If the partitioning process creates more than $F$ supply or demand nodes, the excess nodes are reclassified as transit nodes. If you are using real flow values, this can be skipped.

The next step is to allocate total supply (total demand) randomly across the supply (demand) nodes. Again, since I am using integer flows, my code first allocates one unit of supply or demand to each non-transit node, then allocates the remaining supply (demand) one unit at a time, randomly choosing with replacement a supply (demand) node to receive the next unit.

With the nodes created, it is time to move on to creating arcs. My code allocates to each node of any type two initially empty sets of nodes, those upstream and downstream, and two initially empty sets of arcs, those entering and those leaving the node. It also assigns a temporary variable containing the excess supply if the node is a supply node or the unmet demand if the node is a demand node. Two sets of nodes are created, one containing supply nodes with unused supply (initially, all the supply nodes) and the other containing demand nodes with unmet demands (initially, all the demand nodes).

A list of all possible arcs (excluding arcs from a node to itself) is created and randomly shuffled. Arcs are now added from that list until enough directed paths exist to ensure that all demands can be met. As each arc $(a, b)$ is added to the digraph, it is added to the set of arcs exiting the tail node $a$ and to the set of arcs entering the head node $b.$ The set of nodes upstream of $b$ is updated to include $a$ and all nodes upstream of $a,$ and the set of nodes downstream of $a$ is updated to include $b$ and all nodes downstream of $b.$ Finally, nodes upstream of $b$ with unused supply and nodes downstream of $a$ with unmet demand are paired up. The lesser of the unused supply and unmet demand is subtracted from both the supply of the upstream node and the demand of the downstream node, and whichever has zero supply/demand left is removed from the set of nodes with unused supply/unmet demand. It is possible (and certain when the very last drop of supply/demand is accounted for) that both nodes will be removed from their respective sets.

Once there are no nodes left with excess supply/demand, we can be certain the digraph contains at least one feasible solution. All that remains is to randomly add arcs until the user's specified minimum number of arcs is met.



Sunday, November 10, 2024

Solver Parameters Matter

Modern integer programming solvers come with a gaggle of parameters the user can adjust. There are so many possible parameter combinations that vendors are taking a variety of approaches to taming the beast. The first, of course, is for vendors to set default values that work pretty well most of the time. This is particularly important since many users probably stick to default settings unless they absolutely have to start messing with the parameters. (By "many" I mean "myself and likely others".) The second is to provide a "tuner" that the user can invoke. The tuner experiments with a subset of possible parameter settings to try to find a good combination. Third, I've seen some discussion and I think some papers or conference presentations on using machine learning to predict useful settings based on characteristics of the problem. I am not sure how far along that research is and whether vendors are yet implementing any of it.

In the past few days I got a rather vivid reminder of how much a single parameter tweak can affect things. Erwin Kalvelagen did a blog post on sorting a numeric vector using a MIP model (with an up front disclaimer that it "is not, per se, very useful"). He test a couple of variants of a model on vectors of dimension 50, 100 and 200. I implemented the version of his model with the redundant constraint (which he found to speed things up) in Java using the current versions of both CPLEX and Xpress as solvers. The vector to sort was generated randomly with components distributed uniformly over the interval (0, 1). I tried a few random number seeds, and while times varied a bit, the results were quite consistent. Not wanting to devote too much time to this, I set a time limit of two minutes per solver run.

Using default parameters, both solvers handled the dimension 50 case, but CPLEX was about eight times faster. For dimension 100, CPLEX pulled off the sort in two seconds but Xpress still did not have a solution at the two minute mark. For dimension 200, CPLEX needed around 80 seconds and Xpress, unsurprisingly, struck out.

So CPLEX is faster than Xpress, right? Well, hang on a bit. On the advice of FICO's Daniel Junglas, I adjusted one of their parameters ("PreProbing") to a non-default value. This is one of a number of parameters that will cause the solver to spend more time heuristically hunting for a feasible or improved solution. Using my grandmother's adage "what's sauce for the goose is sauce for the gander," I tweaked an analogous parameter in CPLEX ("MIP.Strategy.Probe"). Sure enough, both solvers got faster on the problem (and Xpress was able to solve all three sizes), but the changes were more profound than that. On dimension 50, Xpress was between three and four times faster than CPLEX. On dimension 100, Xpress was again around four times faster. On dimension 200, Xpress won by a factor of slightly less than three.

So is Xpress actually faster than CPLEX on this problem? Maybe, maybe not. I only tweaked one parameter among several that could be pertinent. To me, at least, there is nothing about the problem that screams "you need more of this" or "you need less of that", other than the fact that the objective function is a constant (we are just looking for a feasible solution), which suggests that any changes designed to tighten the bound faster are likely to be unhelpful. I confess that I also lack a deep understanding of what most parameters do internally, although I have a pretty good grasp on the role of the time limit parameter.

So the reminder for me is that before concluding that one solver is better than another on a problem, or that a problem is too difficult for a particular solver, I need to put a bit of effort into investigating whether any parameter tweaks have substantial impact on performance.

Update: A post on the Solver Max blog answers a question I had (but did not investigate). With feasibility problems, any feasible solution is "optimal", so it is common to leave the objective as optimizing a constant (usually zero). Erwin and I both did that. The question that occurred to me (fleetingly) was whether an objective function could be crafted that would help the solver find a feasible solution faster. In this case, per the Solver Max post, the answer appears to be "yes".

Sunday, September 8, 2024

A Bicriterion Movement Model

A question on Operations Research Stack Exchange asks about a bicriterion routing problem. The underlying scenario is as follows. A service area is partitioned into a rectangular grid, with a robot beginning (and eventually ending) in a specified cell. Each move of the robot is to an adjacent cell (up, down, left or right but not diagonal). The robot must eventually visit each cell at least once and return whence it came. One criterion for the solution is the number of movements (equivalently, the amount of time, since we assume constant speed) required for the robot to make its rounds. In addition, each cell is assigned a nonnegative priority (weight), and the second criterion is the sum over all cells of the time of the first visit weighted by the the priority of the cell. In other words, higher priority cells should be visited earlier. Both criteria are to be minimized.

The problem can be modeled as either an integer program or a constraint program. The movement portion is quite straightforward to model. Balancing the two objectives is where things get interesting. One can optimize either criterion after setting a bound on how bad the other can be, or one can use lexicographic ordering of the criteria (optimize the primary objective while finding the best possible value of the secondary objective given that the primary must remain optimal), or one can optimize a weighted combination of the two objectives (and then play with the weights to explore the Pareto frontier). Weighted combinations are a somewhat tricky business when the objective functions being merged are not directly comparable. For instance, merging two cost functions is pretty straightforward (a dollar of cost is a dollar of cost, at least until the accountants get involved). Merging distance traveled and "priority" (or "urgency", or "weighted delay") is much less straightforward. In real life (as opposed to answering questions on ORSE), I would want to sit with the problem owner and explore acceptable tradeoffs. How much longer could a "good" route be if it reduced weighted delays by 1 unit?

I chose to use an integer program (in Java, with CPLEX as the optimization engine), since CPLEX directly supports lexicographic combinations of objectives. You can find the source code in my GitLab repository. A write-up of the model is in a PDF file here, and output for the test case in the ORSE question is in a text file here. The output includes one run where I minimized delay while limiting the distance to a middle value between the minimum possible distance and the distance obtain from lexicographic ordering with delay having the higher priority. It turned out that compromising a little on distance did not help the delay value.

Tuesday, April 30, 2024

From IP to CP

Someone asked on OR Stack Exchange how to convert an integer programming model into a constraint programming model. I think you can reasonably say that it involves a "paradigm shift", for a couple of reasons.

The first paradigm shift has to do with how you frame the problem, mainly in terms of the decision variables. Math programmers are trained to turn discrete decisions with a logical flavor into binary variables. Discrete quantities, such as how many bins of a certain type to use or how many workers to assign to a task, are expressed as general integer variables, but most other things end up turning into a slew of binary variables. The problem being solved in the ORSE question illustrates this nicely.

The problem is as follows. You have $N$ participants in a tournament involving some kind of racing. Importantly, $N$ is guaranteed to be an even number. There is one track with two lanes, and races are spread over $N-1$ days. Every participant races head to head with every other participant exactly once, and nobody races twice in the same day. For whatever reason, the left lane is preferable to the right lane, and so there is a "fairness" constraint that nobody is assigned the left lane on more than $M$ consecutive days. For some reason, the author also imposed a second fairness constraint that nobody be assigned to the right lane on more than $M$ consecutive days. Dimensions for the author's problem were $N=20$ and $M=2.$

The model has to assign participant pairs (races) to days and also make lane assignments. To decide against whom I must race on a given day, someone building an IP model will use binary variables to select my opponent. Similarly, they will use binary variables to select my lane assignment each day. So the author of the question had in his IP model a variable array opp[Competitors][Competitors][Tracks][Days] taking value 1 "if competitor 'c1' races with 'c2' on track 't' on day 'd'".

CP models are more flexible in their use of variables, and in particular general integer variables. So to decide my opponent on a given day, I can just an integer variable array indexed by day where the value is the index number of my opponent on the given day. Similarly, I could (and would) use an integer variable indexed by day to indicate my lane assignment that day, although in this case that variable does turn out to be binary, since there are only two lanes.

The second paradigm shift has to do with constraints, and it ties to what solver you are using. IP models have a very limited constraint "vocabulary". They all understand linear equalities and inequalities, and some understand some combination of SOS1, SOS2, second order cone and implication constraints. That's pretty much it. CP solvers have a richer "vocabulary" of constraints, but with the caveat that not many of those constraints are universal. I would wager that every CP solver has the "all different" constraint, and they must have the usual arithmetic comparisons ($=,\neq,\lt,\le,\gt,\ge$). Beyond that, it pays to check in advance.

I wrote a CP model (in Java) using IBM's CP Optimizer (CPO) to solve the scheduling problem. Details of the model can be sussed out from the Java code, but I will mention a few pertinent details here.

  • I did use an integer variable array to determine, for each combination of participant and day, the participant's opponent that day, as well as an integer array giving the lane assignment (0 or 1) for each combination of participant and day.
  • To make sure that, on any day, the opponent of X's opponent is X I used CPO's  inverse constraint. The constraint inverse(f, g) says that f[g[x]] = x and g[f[x]] = x for any x in the domain of the inner function.
  • To ensure that nobody raced the same opponent twice, I used allDiff, which is CPO's version of the all different constraint.
  • We have to do something to force opponents in a race to be in different lanes. Let $x_{i,d}$ and $y_{i,d}$ denote respectively the opponent and lane assignment for participant $i$ on day $d.$ In mathematical terms, the constraint we want is $y_{x_{i,d},d} \neq y_{i,d}.$ Indexing a variable with another variable is impossible in an IP model. In CPO, I used the element constraint to do just that.

I added an objective function, namely to minimize the difference between the most and fewest times any participant gets assigned the preferred left lane. I also added one constraint to mitigate symmetry. Since any solution remains a solution (with the same objective value) under any permutation of the participant indices, I froze the first day's schedule as $1\  v.\  N$, $2\  v.\  N-1$, $3\  v.\  N-2$ etc.

On my decent but not screamingly fast PC, CPO found a feasible solution almost instantly and a solution with objective value 1 in under a second. In that solution, every participant gets the left lane either nine or ten times out of the 19 racing days. It's not hard to prove that 1 is the optimal value (you cannot have everybody get exactly the same number of left lane assignments), but don't tell CPO that -- it was still chugging along trying when it hit my five minute time limit.

My Java code is available from my repository under a Creative Commons 4.0 open source license.

Sunday, April 21, 2024

Where Quadratic, Positive Definite and Binary Meet

A comment by Rob Pratt (of SAS) on OR Stack Exchange pointed out two things that are glaringly obvious in hindsight but that somehow I keep forgetting. Both pertain to an expression of the form $x'Qx + c'x,$ either in an objective function or in a second order cone constraint, where $x$ is a vector of variables and $Q$ and $c$ are parameters.

The first observation does not depend on the nature of the $x$ variables. We can without loss of generality assume that $Q$ is symmetric. If it is not, replace $Q$ with the symmetric matrix $\hat{Q} = \frac{1}{2}\left(Q + Q'\right),$ which is symmetric. A wee bit of algebra should convince you that $x'\hat{Q}x = x'Qx.$

The second observation is specific to the case where the $x$ variables are binary (which was the case in the ORSE question which drew the comment from Rob). When minimizing an objective function of the form $x'Qx + c'x$ or when using it in a second order cone constraint of the form $x'Qx + c'x \le 0,$ you want the $Q$ matrix to be positive definite. When $x$ is binary, this can be imposed easily.

Suppose that $x$ is binary and $Q$ is symmetric but not positive definite. The following argument uses the euclidean 2-norm. Let $$\Lambda = \max_{\parallel y \parallel = 1} -y'Qy,$$ so that $y'Qy \ge -\Lambda$ for any unit vector $y.$ Under the assumption that $Q$ is not positive definite, $\Lambda \ge 0.$ Choose some $\lambda > \Lambda$ and set $\hat{Q} = Q + \lambda I,$ where $I$ is the identity matrix of appropriate dimension. For any nonzero vector $y,$

$$ \begin{align*} y'\hat{Q}y & =y'Qy+\lambda y'Iy\\ & =\parallel y\parallel^{2}\left(\frac{y'}{\parallel y\parallel}Q\frac{y}{\parallel y\parallel}+\lambda\right)\\ & \ge\parallel y\parallel^{2}\left(-\Lambda+\lambda\right)\\ & >0. \end{align*} $$

So $\hat{Q}$ is positive definite. Of course, $x'\hat{Q}x \neq x'Qx,$ but this is where the assumption that $x$ is binary sneaks in. For $x_i$ binary we have $x_i^2 = x_i.$ So

$$ \begin{align*} x'\hat{Q}x & =x'Qx+\lambda x'Ix\\ & =x'Qx+\lambda\sum_{i}x_{i}^{2}\\ & =x'Qx+\lambda e'x \end{align*} $$

where $e=(1,\dots,1).$ That means the original expression $x'Qx + c'x$ is equal to $x'\hat{Q}x+(c-\lambda e)'x,$ giving us an equivalent expression with a positive definite quadratic term.

Tuesday, September 12, 2023

Randomly Generating a Connected Graph

Someone posted a question on Operations Research Stack Exchange about generating random test instances of a connected graph. (They specified a planar graph, but I am going to ignore that restriction, which I suspect was unintended and which I am pretty sure would make the problem much harder.) The premise is fairly straightforward. You start with a fixed set of nodes, partitioned into three subsets (supply, transshipment and demand nodes). The goal is to select a random set of edges of fixed cardinality such that (a) the resulting graph is connected and (b) no edge connects two supply nodes or a supply node and a demand node. Setting the number of edges too low will obviously make the problem infeasible. The author specified that the graph could be either directed or undirected. My solutions are coded for undirected graphs but can be adapted to directed graphs.

I proposed two solutions, both of which I coded and tested in Java. One uses a mixed integer programming (MIP) model, which my code solves using CPLEX. Let $E$ be the set of all valid edges, $N$ the number of nodes and $K$ the desired number of edges. The MIP model uses binary variables $x_{i,j}$ for all $(i,j)\in E.$ The requirement to include $K$ edges is handled by the constraint $$\sum_{(i,j)\in E} x_{i,j} = K$$ and randomness is achieved by minimizing $$\sum_{(i,j)\in E} c_{i,j} x_{i,j}$$ where the cost coefficients $c_{i,j}$ are generated randomly. 

That leaves the matter of forcing the graph to be connected. To do that, we introduce forward and backward flow variables $f_{i,j} \ge 0$ and $r_{i,j} \ge 0$ for each edge $(i,j),$ where $f_{i,j}$ is interpreted as flow from $i$ to $j$ and $r_{i,j}$ as flow from $j$ to $i.$ The concept is to single out one node (call it $s$) as a source for $N-1$ units of flow of some mystery commodity and assign every other node a demand of one unit of the commodity. For $i\neq s,$ the flow conservation constraint is $$\sum_{(j,i)\in E} (f_{j,i} - r_{j,i}) + \sum_{(i,j)\in E} (r_{i,j} - f_{i,j}) = 1,$$ which says that flow in minus flow out equals 1. To ensure that flow occurs only on selected edges, we add the constraints $$f_{i,j} \le (N-1) x_{i,j}$$ and $$r_{i,j} \le (N-1) x_{i,j}$$ for all $(i,j)\in E.$

The edge removal heuristic is a bit simpler. We start by selecting all eligible edges and shuffling the list randomly. While the number of edges in the graph exceeds $K,$ we pop the next edge from the list, remove it, and test whether the graph remains connected. If yes, the edge is permanently gone. If no, we replace the edge in the graph (but not on the list) and move to the next edge in the list. To confirm that removing edge $(i,j)$ leaves a connected graph, we start from node $i$ and find all nodes that can be reached from $i$ in one step (i.e., remain adjacent). Then we find all nodes adjacent to those nodes, and continue until either we encounter node $j$ (in which case the graph is still connected) or run out of nodes to test (in which case the graph is now disconnected).

The edge removal heuristic is simpler to explain, does not require a MIP solver and likely is faster. There are two potential disadvantages compared to the MIP approach. One is that the MIP is done after solving once, whereas there is a possibility that the edge removal heuristic fails to produce a connected graph due to an "unlucky" choice of which edges to remove, requiring one or more restarts with different random number seeds. The other is that if $K$ is set too low (making a connected graph impossible), the MIP model will detect that the problem is infeasible. With the edge removal heuristic, you would not be able to distinguish infeasibility from bad luck (although if the heuristic failed multiple times with different random seeds, infeasibility would start to look likely). In very very limited testing of my code, the edge removal heuristic was definitely faster than the MIP and achieved a valid result in the first pass each time.

Java code for both methods is in my code repository.

Saturday, August 12, 2023

The "Set" Card Game

Someone asked a question (actually a pair of questions) about the "Set" card game on OR Stack Exchange. This was the first I had heard of the game. For purposes of this post, the game play is irrelevant. The key is that it uses a deck of 81 cards, each of which has four "features" with three possible values for each feature. Every combination of feature values appears on exactly one card. Three cards form a "set" if, for each of the four features, either all three cards have the same value or all three cards have different values. As noted in the Wikipedia page, there are 1,080 possible "sets". (I'm putting "set" in quotes to avoid confusion with, you know, sets.) The OR SE question asked if an integer program (IP) could be used to generate all 1,080 sets.

The answer is yes, with the caveat that while you can do it, an IP model is not the most efficient way to do it. I won't repeat my IP model here, since (a) it's a bit of a space hog and (b) you can read it in my answer on ORSE. The model is pretty small (97 binary variables) and finds one "set" quickly. There are two ways to get it to find all possible "sets". Both involve adding a "no good" constraint for each solution found, i.e., a constraint that eliminates that solution but no others. The more tedious approach solves the progressively growing model 1,081 times (the last time producing a declaration of infeasibility, meaning all "sets" have been found). The less tedious approach requires a solver that supports a callback function letting you reject candidate solutions by adding a constraint cutting them off (the same "no good" constraint used in the first approach).

Since there is no objective and the constraints are mostly logical in nature, constraint programming (CP) strikes me as a more intuitive approach than IP. The specifics of a CP model will depend on what types of constraints your CP solver supports, but I think the model I tried is likely to work with a lot of solvers. Since I coded the models in Java, I'll use 0-based indexing here. My variables (all integer-valued) are: $x_s\in \lbrace 0,\dots, 80\rbrace,$ the index of the card in slot $s\in \lbrace 0,1,2\rbrace$ of the "set"; and $y_{f,s}\in \lbrace 0,1,2\rbrace,$ the value of feature $f$ in the card in slot $s.$ The constraints are as follows:

  • allDifferent($x$) (no two slots can contain the same card);
  • $x_0 \lt x_1 \lt x_2$ (to avoid counting the same "set" more than once, we require that the cards be listed in ascending index order);
  • $y_{f,s} = a_{x_s,f},$ where $a_{c,f}$  is the value of feature $f$ in card $c$ (the value of a feature in a slot comes from the card in the slot); and
  • allDifferent$(\lbrace y_{f,0}, y_{f,1}, y_{f,2}\rbrace) \vee (y_{f,0}=y_{f,1} \wedge y_{f,1}=y_{f,2})\,\forall f$ (for every feature, either all three slots have different values or all three slots have the same value).

This exploits two constraints that CP solvers typically have but IP solvers do not. One is the "all different" constraint, which forces a set of variables to take values that do not repeat. The other is the ability to use a decision variable as an index to either a constant vector or another variable vector. That comes into play in the third constraint, where the variable $x_s$ (the index of the card in slot $s$) is used as a subscript of $a$ to get the feature value of that card.

Even CP is probably overkill for this problem, though. We can compute the 1,080 possible solutions by brute force. I used a recursive function for this.

Here is how each method stacks up in terms of computation time. As I said, I coded it in Java, using CPLEX 22.1.1 for the IP models and CP Optimizer 22.1.1 for the CP model.

  • brute force took between 0.1 and 0.2 seconds;
  • the CP model also took between 0.1 and 0.55 seconds;
  • solving the IP model once using a callback took between 1.4 and 3 seconds; and
  • solving the IP model 1,081 times (without a callback) took around 56 seconds, give or take.

Solution times for all methods are a bit random (they change when I rerun the program), so treat them as approximate. In the interest of full disclosure, I throttled CPLEX to a single thread for the callback approach. Using multiple threads would have required both synchronizing the code (a pain) and also checking for repeated solutions (since hypothetically two parallel threads could independently locate the same "set"). The moral, to the extent there is one, is that sometimes brute force beats more sophisticated approaches.

You can find my Java code here if interested.

Sunday, July 30, 2023

Visiting All Nodes

A question on Operations Research Stack Exchange asks about finding a shortest route through a connected graph that visits every node. This is distinct from the well-known traveling salesman problem (TSP), in that the TSP requires each node to be visited exactly once whereas the OR SE question explicitly allows nodes to be visited more than once.

The question reminded me of a conversation I had years ago with a doctoral student in our logistics program. He was working on routing deliveries (via truck) from a depot to various customers, and was trying to formulate it as a TSP using a graph based on a road map (buildings or intersections as nodes, roads as edges). I did my best at the time to convince him that the TSP was unnecessarily restrictive and might even make the problem infeasible. The second example in the OR SE question is a nice example of how that could happen. Generally speaking, in an actual routing problem there is nothing stopping you from passing by/through a previously visited location if that's the best route. There are a few exceptions to this, some in military applications (you were laying mines and/or blowing up bridges as you passed them) and some in normal life (you were speeding and might not want to return some place where a traffic cop would remember you), but for the most part I believe it holds.

My advice to the doctoral student was similar to the answer Rob Pratt posted on OR SE. Rethink the graph, keeping just the depot and customers as nodes and letting the edges in the new graph represent the shortest routes between pairs of nodes. So edge $(i,j)$ and edge $(k,\ell)$ connect different pairs of nodes but might represent physical routes that cross (you pass through the same intersection on both, albeit possibly in different directions) or overlap (both take US 127 north from mile marker 102 to mile marker 137). Building the edges in the new graph involves solving a shortest path problem between each pair of nodes (using the original graph). You can apply Dijkstra's algorithm to each pair of nodes, but I'm pretty sure applying the Floyd-Warshall algorithm would be faster. Pro tip: if you use Floyd-Warshall, don't bother to extract the route for each pair of nodes. Just get the distances, and keep the F-W solution handy. Once you have solved a TSP on the modified graph and know which edges will be used, you can extract the physical routes corresponding to just the used edges from the F-W solution.

I posted an explicit MIP model (which I will not repeat here, since it's a space hog) as an answer to the OR SE question. I suspect that Rob's approach (TSP on a modified graph) might be faster, particularly because it allows you to use a dedicated TSP solver. Although TSPs have a reputation for being difficult (being the poster child for NP-hard problems), bespoke solvers like Concorde can chew through rather large TSPs pretty quickly. Other than the upfront computational cost of finding the shortest path between every pair of nodes, the one concern I might have with the TSP approach is that it converts what might have been a sparse graph to a complete graph. So, like all things MIP, which approach is better is an empirical question.

Monday, May 8, 2023

Finding a Connected Subgraph

A recent question on Mathematics Stack Exchange asked about using a mixed integer linear program (MILP) to find a connected subgraph of a specified size (vertex count) within a given graph. Specifically, it asked how to formulate constraints to enforce the requirement that the subgraph be connected.

As I have previously mentioned, one way to force a graph to be connected is to add flows between vertices, treating one vertex as a source and the other vertices as sinks. In the context of the current question, that works as follows. Let $V$ and $E$ be the sets of vertices and edges respectively in our (undirected) graph and $M<\vert V \vert$ the number of vertices required in the connected subgraph. Note that we do not assume the parent graph is connected, and in fact the sample graph in the Math SE post is not. Since flows are inherently directional, we replace each edge $e=(v,w)\in E$ with two arcs, $(v,w)$ and $(w,v).$ Let $A$ be the set of all arcs.

We designate (arbitrarily) one vertex in the subgraph to be the "root" (source) vertex, with a supply of $M-1$ thingies. (I was going to say "honest politicians", but then the problem becomes infeasible, hence "thingies".) Each of the remaining vertices in the subgraph consumes one thingy. Thingies flow along arcs, but only if both endpoints are selected. Since $M-1$ thingies enter the network at the root vertex and each of the remaining $M-1$ vertices consumes one thingy, the only way for flow to be balanced is if the selected vertices form a connected graph.

For each vertex $v\in V,$ let $x_v \in \lbrace 0, 1 \rbrace$ be 1 if and only if vertex $v$ is selected and let $y_v\in \lbrace 0, 1 \rbrace$ be 1 if and only if vertex $v$ is chosen to be the root vertex. For each arc $a\in A,$ let $f_a \in [0, M-1]$ be the flow across arc $a.$ The objective function is immaterial, since we just want a feasible solution, so we will minimize 0. The constraints are as follows:

  • The correct number of vertices must be chosen: $$\sum_{v\in V} x_v = M.$$
  • One vertex must be designated the root: $$\sum_{v \in V} y_v = 1.$$
  • The root vertex must be one of the chosen vertices: $$y_v \le x_v \quad \forall v\in V.$$
  • The flow on any arc must be 0 unless both head and tail are among the chosen vertices: $$f_{(v,w)}\le (M-1)x_v \quad \forall (v,w)\in A$$ and $$f_{(v,w)}\le (M-1)x_w \quad \forall (v,w)\in A.$$
  • The net flow out of any vertex (flow out - flow in) must be $M - 1$ for the source vertex, -1 for every other selected vertex and 0 otherwise: $$\sum_{(v,w)\in A} f_{(v,w)} - \sum_{(w,v)\in A} f_{(w,v)} = M\cdot y_v - x_v \quad \forall v\in V.$$

This works in a MILP model, but a MILP model is probably not the best way to solver the underlying problem. Consider the following algorithm, which is based on constructing a layered subgraph. Assume the vertices are in some order. Make the first vertex the root and create a graph containing just it  (which we will call layer 0). Add up to $M-1$ vertices connected to the root by an edge (layer 1). If the graph now contains $M$ vertices, declare victory and stop. Otherwise, for each vertex in layer 1, find all vertices connected to it by an edge and not in layers 0 or 1 and as many as them as are needed to reach $M$ vertices (or all of them if you are still short). Repeat until done or until all vertices in layer 1 have been processed. The added vertices form layer 2 because they are two edges from the root. Repeat with each layer until either you have $M$ vertices (victory) or there are no vertices left to process.

If all connected vertices have been found and the count is less than $M,$ discard the current root and start over with the next vertex as root. Note that when processing subsequent roots, you can ignore any vertices already tried as root and just look at vertices later in the vertex ordering.

I ginned up some Java code (open source, as always) to test both the MILP model and the layered graph approach, using CPLEX as the MILP solver. Both got the examples from the Math SE post correct in a trivial amount of time. On a randomly generated graph with 5,000 nodes and 624,713 edges (about 5% of the edges in a complete graph that size), with a target subgraph size of 1,000 nodes, the MILP model took about 8 seconds to build and 12 seconds to solve. The layered graph search needed 3 milliseconds. I tried a larger graph, but my computer ran out of heap space trying to build the model (mainly due to the expanding number of flow variables and related constraints).

So there are two things to take away. One is that flows and flow balance constraints can be used to enforce connection among vertices, even when there is no actual thing flowing. The other is that, much as I like MILP models, sometimes simpler approaches are better.


Friday, April 28, 2023

A Matrix Puzzle

A question on Mathematics Stack Exchange, "Placing number blocks so that the resulting matrix is symmetric", revolves around a square matrix whose entries are partitioned into blocks (submatrices). The submatrices need to be rearranged so that the resulting matrix is symmetric around its main diagonal. The author of the question asked if it were possible to do this using linear programming. I think the answer is no, but we can certainly do it with either an integer linear program or a constraint program. The author posts two examples ($4\times 4$ and $5\times 5$). I cobbled together Java code for a MIP model (using CPLEX) and a CP model (using CP Optimizer) and had no trouble solving both problems.

There are a couple of generalizations worth noting. First, while the example matrices contain integers, there is nothing preventing the two models from being used with other content types (real numbers, complex numbers, text). Some variable definitions would need modification, but conceptually both methods would work. Second, the author of the post used only one-dimensional blocks (row vectors or column vectors), but my code allows for arbitrary rectangular blocks. The code assumes that the "parent matrix" is square, but that would be easy enough to relax, so the same models (with minor adjustments) would work with arbitrary rectangular matrices.

I think that the puzzle makes a good vehicle for comparing MIP and CP applications to logic problems. In optimization problems, I suspect that MIP models often do a better job of computing objective bounds than do CP models. That is a non-issue here, since the problem is just to find a feasible solution. For logic problems and similar things (particularly scheduling), I think CP models tend to be more expressive, meaning certain types of constraints or relationships can be expressed more naturally with CP than with MIP (where the relationships turn into rather arcane and complicated adventures with binary variables). That applies here, where the CP model exploits the ability to use integer variables as subscripts of other variables.

As described in the PDF file, though, that subscripting ability has its limits. CP Optimizer will let you index a one-dimensional vector of variables using an integer variable, but not a two-dimensional array. In other words, x[y] is fine but x[y, z] is not (where x, y and z are all variables). The workaround I used is to flatten a 2-D matrix into a 1-D matrix. So if $N\times N$ matrix $x$ is flattened into $N^2\times 1$ vector $\hat{x},$  $x_{y,z}$ becomes $\hat{x}_{N(y - 1) + z}.$

The models are a bit too verbose to squeeze into a blog post, so I wrote them up in a separate PDF file. My Java code (which requires recent versions of CPLEX and CP Optimizer) can be had from my code repository under a Creative Commons license.

Tuesday, April 18, 2023

Node Coloring

Someone posted a question on OR Stack Exchange about coloring nodes in a sensor network. The underlying problem has something to do with placing sensors, but in mathematical terms the poster wanted to color the nodes of a weighted undirected graph with a fixed number of colors, while minimizing the sum of the edge weights of edges connecting nodes of the same color.

There is an obvious MIP model for this problem, which was posted as an answer to an earlier version of the question. If $N$ and $C$ are the index sets for nodes and colors respectively, $E\subseteq N\times N$ is the set of edges, $w_{ij}$ is the objective weight of edge $(i,j)\in E$ and $x_{n,c}$ is a binary variable indicating whether node $n$ is assigned color $c$, the problem can be written as a binary quadratic program:

$$\min\ \sum_{(n,m)\in E}w_{nm}\sum_{c\in C}x_{nc}x_{mc}\\
\textrm{s.t. }\sum_{c\in C}x_{nc}  =1\quad\forall n\in N\\
\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C.$$

 

The model can be linearized by introducing continuous variables $y_{nm}\ge 0$ to represent the objective contribution of each edge $(n,m)\in E$:

 

$$\min\ \sum_{(n,m)\in E}w_{nm} y_{nm}\\
\textrm{s.t. }\sum_{c\in C}x_{nc}=1 \quad \forall n\in N\\
\phantom{\textrm{s.t. }}y_{nm}\ge x_{nc} + x_{mc} - 1 \quad \forall (n,m)\in E, \forall c\in C\\
\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C\\
\phantom{\textrm{s.t. }}y_{nm}\ge 0 \quad \forall (n,m)\in E.$$


There is some symmetry in the model: given any feasible solution, another solution of identical cost is obtained by permuting the colors. It is possible to add constraints to remove that symmetry.


The person posting the question was looking for alternatives to a MIP model. An easy choice for me would be a genetic algorithm. I wanted to test this in R, using the GA library for the genetic algorithm. The GA library allows three types of chromosomes: binary or real vectors, or permutations of an index vector. If I were coding my own GA, the chromosome would be a vector in $\lbrace 1,\dots,C\rbrace^N.$ Within the available chromosome types in the library, the obvious choice was to use a vector $x\in [0,C]^N$ and then set the color of node $i$ to $\lceil x_i \rceil\in \lbrace 1,\dots,C\rbrace.$ I suspect, though, that the symmetry issue described above might have negative repercussions for a GA. If two "good" solutions were effectively the same (other than a permutation of the color indices), their offspring (which would be a mix of the two color numbering methods) might be a mess.


A couple of responders to the OR SE question suggested heuristics, as did I. My choice was to start with a random generation heuristic (assign colors randomly), apply an improvement heuristic (go through the nodes in random order and, for each node, reassign it to the cheapest color given the rest of the solution), and repeat (changing the order in which nodes are examined on each pass) until one pass produced no improvement. Since the heuristic is fairly fast, it can be done multiple times (new random restarts) with the best solution retained. Heuristics like this are very easy to code.


I coded all three approaches in R, using CPLEX (with the ompr and ROI libraries) for the MIP model, the GA library for the genetic algorithm, and nothing special for the improvement heuristic. I also threw in the tictoc library to do some timings. The test problem was a randomly constructed $10\times 20$ complete grid graph (200 nodes; 19,900 edges), to be colored using five colors.


To break symmetry in the MIP model, I forced the colors to be ordered so that the lowest node index receiving any color was smaller than the lowest node index receiving the next color. I let CPLEX run for five minutes using default settings. It made what I would call reasonable progress on the incumbent objective value, but the formulation is apparently very weak (at least on the test example). The lower bound barely moved, and the gap was 99.97% after five minutes.


I mostly used default parameters for the GA, although I let it run for 2,000 generations with a population of size 100. It found a much better solution (in 38 seconds) than CPLEX found in five minutes. To be fair, though, CPLEX was balancing improving the incumbent with working on the bound. Had I changed the CPLEX parameters to emphasize improving the incumbent, it likely would have done better (though likely not as well as the GA).


The improvement heuristic surprised me a bit. 600 random starts needed only 37 seconds (and I'm not the poster boy for optimized R code), and the best solution it found had an objective value about 5% better than what the GA produced. I did a small number of runs with different random seeds (but the same problem size), and the improvement heuristic consistently edged out the GA. I suppose the moral here is to try simple heuristics before going "high tech".


You are welcome to play with my R notebook, which contains both code and results.

Friday, January 27, 2023

Finding All Simple Cycles (III)

This is the third and (hopefully) final post about finding all simple cycles in an undirected graph. In the first post, I discussed the problem briefly and described a simple, efficient iterative algorithm. In the second post, I posed (but definitely did not recommend) a mixed integer linear program (MILP) to find all simple cycles. Here I will discuss an alternative MILP approach, specific to CPLEX. To make sense of it, you will definitely need to have read the second post. Java code demonstrating all three methods is available from my repository.

Recent versions of CPLEX have a feature called the solution pool, about which I have previous written. One use is to accumulate solutions (including suboptimal ones if desired) encountered along the way to the optimal solution of a MILP model. Another, using the IloCplex.populate() method in the Java API, is to just generate a gaggle of feasible solutions. So the approach I propose here is to build a MILP model for finding a single simple cycle, set the solution pool population limit parameter (the maximum number of solutions to retain) to something really large and the solution pool intensity parameter (how hard CPLEX works to churn out solutions) to its maximum value, and then use the populate() method to find simple cycles. This is implemented in the SingleCycleMIP.java class in my code. 

Actually, I'm not sure setting the intensity parameter is necessary. On the small test graph, results were similar no matter what intensity setting I used. CPLEX stopped with 160 solutions, many of which were duplicates. After removing duplicates, there were 13 distinct simple cycles, which matches what the other methods found. The only difference was run time, which was a fraction of a second using either the default intensity or the maximum intensity (4) and about two seconds using the minimum intensity (1). Either way, it was much faster than the full MILP model, at least on this small test graph.

The MILP model used here is a subset of the model in the previous post. The model has no objective value (or equivalently minimizes the constant zero function). We drop the $u$ variables (which were used to ensure that cycles were distinct) and the $x$ variables (which signaled whether a slot in the solution was filled with a cycle or not). For all other variables, we drop the $c$ subscript (which slot is being filled), since we are only building one cycle. The "one anchor per cycle" constraint becomes $\sum_i w_i = 1,$ and other constraints involving $u$ and/or $x$ are dropped.

One issue with this model is that the same cycle can be discovered multiple times, either because its orientation is reversed (so the "clockwise" and "counterclockwise" versions are considered distinct) or possibly because minor difference (rounding error) in the flow variables make two copies of the same solution look distinct. Regardless of the reason, the fact CPLEX terminated with 160 solutions when only 13 cycles exist tells you duplication is happening. Fortunately, we do not have to monkey with the model (or set overly finicky values for tolerance values); we can just filter the pool after the run and weed out the duplicates (which my code does). The alternative would be to use a callback (either an incumbent callback if you are using legacy callbacks, or the more current generic callback in "candidate" context) to weed out duplicates as they arise. Since the logic to be used in the callback is the same as the logic used in post-processing, I cannot see any virtue to using a callback, and it definitely would complicate the code.

With that I am hopefully done with this topic. :-)


Thursday, January 26, 2023

Finding All Simple Cycles (II)

Yesterday's post described my introduction (by way of a question on OR Stack Exchange) to the problem of how to find all simple cycles in a graph. In that post I described a simple, efficient iterative algorithm for finding them.

The OR SE question specifically asked about a mixed integer linear program (MILP) to find all simple cycles. While I would never take that approach myself, I got curious as to whether it was possible to construct such a model. Down the rabbit hole I went. The answer turns out to be yes. In this post, I'll describe the model (again, without advocating that it be used).

Assume that we have an undirected graph with vertices numbered $1,\dots,N$ and edge set $E.$ In what follows, $i$ and $j$ will always be vertex numbers and $c$ and $c'$ will index cycles. Although the graph is undirected, when we construct cycles we will give them an orientation (e.g., $3 \rightarrow 2 \rightarrow 5 \rightarrow 3$), which will be useful in ensuring that the result is an actual cycle (you can get from the starting node back to itself) and not the composition of multiple smaller cycles (e.g., not treating $3 \rightarrow 2 \rightarrow 4 \rightarrow 3$ and $5 \rightarrow 6 \rightarrow 9 \rightarrow 8 \rightarrow 5$ as if they form a single cycle).

The model depends on guessing a non-binding upper bound $C$ on the number of simple cycles in the graph. (If the solution to the model does contain $C$ cycles, your guess may have been too small. To be certain that all cycles were found, you would need to bump up $C$ and solve again, until the optimal solution found fewer than $C$ cycles.)

The model contains the following gazillion binary variables:

  • $x_{c}$ is an indicator for whether cycle $c\in\left\{ 1,\dots,C\right\} $
    is constructed;
  • $y_{i,c}$ is an indicator for whether vertex $i$ belongs to cycle $c$;
  • $z_{i,j,c}$ is an indicator for whether edge $[i,j]\in E$ is part of
    cycle $c$ with the tour of $c$ proceeding from $i$ to $j$;
  • $w_{i,c}$ is an indicator for whether vertex $i$ is the anchor (starting
    and ending point) for cycle $c$; and
  • $u_{i,j,c,c'}$ is an indicator that edge $[i,j]\in E$ (in either
    orientation) belongs to exactly one of cycles $c$ and $c'\neq c.$

We also have one set of variables that can be defined as either continuous or integer (continuous probably makes the model solve faster):

  • $f_{i,j,c}\ge 0$ is the ``flow'' on edge $[i,j]$ in the orientation $i\rightarrow j$ in cycle $c.$

The flow variables are an adaptation of the variables used in the Miller-Tucker-Zemlin formulation for the traveling salesman problem (TSP). We will use them to ensure that cycles found are in fact cycles.

The objective, at least, is straightforward. Since we want to find all possible simple cycles, we maximize the number of cycles constructed: $\max\sum_{c=1}^C x_{c}.$ Now come the constraints. (Brace yourself for some serious vertical scrolling.)

  • Edges can only belong to cycles containing both endpoints, and only
    in one orientation: \begin{align*}
    z_{i,j,c}+z_{j,i,c} & \le y_{i,c}\quad\forall[i,j]\in E,\forall c\\
    z_{i,j,c}+z_{j,i,c} & \le y_{j,c}\quad\forall[i,j]\in E,\forall c
    \end{align*}
  • Flow only occurs on edges that are used, and only in the orientation
    used: \begin{align*}
    f_{i,j,c} & \le Nz_{i,j,c}\quad\forall[i,j]\in E,\forall c\\
    f_{j,i,c} & \le Nz_{j,i,c}\quad\forall[i,j]\in E,\forall c
    \end{align*}
  • Constructed cycles have one anchor and unused cycles have none: \[
    \sum_{i=1}^N w_{i,c}=x_{c}\quad\forall c
    \]
  • A cycle's anchor must belong to the cycle: \[
    w_{i,c}\le y_{i,c}\quad\forall i,\forall c
    \]
  • In every cycle, at every node, there are either two incident edges
    (if the node belongs to the cycle) or none (if the node is not part
    of the cycle: \[
    \sum_{j:[i,j]\in E}(z_{i,j,c}+z_{j,i,c})=2y_{i,c}\quad\forall i,\forall c
    \]
  • In any used cycle, at any node in the cycle other than the anchor, flow out is at least one unit less than flow in (mimicking the MTZ formulation of the TSP): \[
    \sum_{j:[i,j]\in E}f_{i,j,c}\le\sum_{j:[i,j]\in E}f_{j,i,c}-y_{i,c}+Nw_{i,c}\quad\forall i,\forall c
    \]
  • To ensure a complete cycle, one unit of flow must return to the anchor node: \[
    \sum_{j:[i,j]\in E}f_{j,i,c}\ge w_{i,c}\quad\forall i,\forall c
    \]
  • The $u$ variables are defined in terms of the $z$ variables: \[
    u_{i,j,c,c'}\le z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\quad\forall[i,j]\in E,\forall c<c'
    \] \[
    u_{i,j,c,c'}\le2-\left(z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\right)\quad\forall[i,j]\in E,\forall c<c'
    \]
  • Any pair of cycles constructed must differ in at least one edge (which
    also prevents a cycle from repeating with the opposite orientation): \[
    \sum_{[i,j]\in E}u_{i,j,c,c'}\ge x_{c}+x_{c'}-1\quad\forall c<c'
    \]

It is also possible to add constraints to mitigate some of the symmetry in the model. Two come to mind:

  • Unused cycles must have higher indices than used cycles: \[
    x_{c}\le x_{c-1}\quad\forall c\in\left\{ 2,\dots,C\right\}
    \]
  • The anchor in a cycle must have the smallest index of any node
    in the cycle: \[
    w_{i,c}+y_{j,c}\le1\quad\forall i>j,\forall c
    \]

The first one is of low enough dimension that I automatically included it in my Java code. I made the second one optional in the code.

When I ran it against the 9 node, 12 edge graph in the OR SE question, CPLEX had a solution with 13 cycles (the full number) after about 25 seconds and 5,000 nodes -- way, way more time than the 13 milliseconds the iterative method needed. I set $C=20$ as the upper bound on the number of cycles, which of course is also the initial upper bound on the objective. After 60 seconds (the time limit I set), CPLEX's upper bound was still 20. So probable optimality will not be easy to come by.

I've got one more MILP model up my sleeve, coming in the next post. Meanwhile, as a reminder, my Java code is available here.

Wednesday, January 25, 2023

Finding All Simple Cycles (I)

This is the first of a sequence of posts related to the question of finding all simple cycles in an undirected graph. Motivation for the posts comes from a question on OR Stack Exchange. A simple cycle is a cycle with no repeated nodes (other than the starting/ending node).

Assume we have an undirected graph with nodes indexed from 1 to $N$ and with edge set $E.$ I will use the term "anchor" to refer to the node at which a tour of a simple cycle starts (and ends), and require (without loss of generality) that the anchor of every cycle be the lowest index node in the cycle. Since a cycle is the same whether you traverse it "clockwise" or "counterclockwise", I will also require without loss of generality that nodes in a cycle be listed so that the node immediately after the anchor has smaller index than the node immediately before the anchor. This is just to reduce production of duplicate cycles.

I came up with three approaches to finding all simple cycles. The first, which I will describe below, is a simple iterative scheme that is the fastest. The other two, described in subsequent posts, use mixed integer linear programs (MILPs) -- not because they are efficient but because the OR SE question specifically asked about MILP approaches. I will describe those in subsequent posts.

The iterative scheme is pretty simple. It uses a queue of paths (incomplete cycles) to store work in progress. I will use "terminus" to indicate the last node in a path. When the terminus equals the anchor, we have a cycle. Due to my definition of "anchor", we can say that no node on a path can have a lower index than the anchor of the path.

The iterative scheme loops over possible anchors from 1 to $N.$ For each anchor, we initialize the queue with all paths consisting of a single edge incident at the anchor and not incident at any node less than the anchor. While the queue is not empty, we pop a path and create one or more new paths by extending that path with each qualifying edge. A qualifying edge is one that meets the following conditions:

  • it is incident at the terminus of the current path;
  • it is not incident at any node already on the path (other than anchor);
  • it is not incident at any node less than the anchor; and
  • if it is incident at the anchor, it is not the same as the first edge of the path (i.e., we do not allow a "cycle" of the form $a\rightarrow b \rightarrow a$).

If the new edge leads to the anchor, we have a cycle. If not, we update the terminus and add the extended path to the queue, for further processing later. If a path cannot be extended because it is not a cycle and there are no qualifying edges left, we discard it.

Coding the algorithm is pretty straightforward, and it should work on any undirected graph (even those that are not connected). On the example from the OR SE post (9 nodes, 12 edges) it took a whopping 11 milliseconds to find all 13 simple cycles.

Java code for this and the two MILP models (the latter requiring a recent version of CPLEX) can be found in my GitLab repository. The iterative algorithm is in the CycleFinder class. Stay tuned for the MILP models.