Yesterday's post described my introduction (by way of a question on OR Stack Exchange) to the problem of how to find all simple cycles in a graph. In that post I described a simple, efficient iterative algorithm for finding them.
The OR SE question specifically asked about a mixed integer linear program (MILP) to find all simple cycles. While I would never take that approach myself, I got curious as to whether it was possible to construct such a model. Down the rabbit hole I went. The answer turns out to be yes. In this post, I'll describe the model (again, without advocating that it be used).
Assume that we have an undirected graph with vertices numbered $1,\dots,N$ and edge set $E.$ In what follows, $i$ and $j$ will always be vertex numbers and $c$ and $c'$ will index cycles. Although the graph is undirected, when we construct cycles we will give them an orientation (e.g., $3 \rightarrow 2 \rightarrow 5 \rightarrow 3$), which will be useful in ensuring that the result is an actual cycle (you can get from the starting node back to itself) and not the composition of multiple smaller cycles (e.g., not treating $3 \rightarrow 2 \rightarrow 4 \rightarrow 3$ and $5 \rightarrow 6 \rightarrow 9 \rightarrow 8 \rightarrow 5$ as if they form a single cycle).
The model depends on guessing a non-binding upper bound $C$ on the number of simple cycles in the graph. (If the solution to the model does contain $C$ cycles, your guess may have been too small. To be certain that all cycles were found, you would need to bump up $C$ and solve again, until the optimal solution found fewer than $C$ cycles.)
The model contains the following gazillion binary variables:
- $x_{c}$ is an indicator for whether cycle $c\in\left\{ 1,\dots,C\right\} $
is constructed; - $y_{i,c}$ is an indicator for whether vertex $i$ belongs to cycle $c$;
- $z_{i,j,c}$ is an indicator for whether edge $[i,j]\in E$ is part of
cycle $c$ with the tour of $c$ proceeding from $i$ to $j$; - $w_{i,c}$ is an indicator for whether vertex $i$ is the anchor (starting
and ending point) for cycle $c$; and
- $u_{i,j,c,c'}$ is an indicator that edge $[i,j]\in E$ (in either
orientation) belongs to exactly one of cycles $c$ and $c'\neq c.$
We also have one set of variables that can be defined as either continuous or integer (continuous probably makes the model solve faster):
- $f_{i,j,c}\ge 0$ is the ``flow'' on edge $[i,j]$ in the orientation $i\rightarrow j$ in cycle $c.$
The flow variables are an adaptation of the variables used in the Miller-Tucker-Zemlin formulation for the traveling salesman problem (TSP). We will use them to ensure that cycles found are in fact cycles.
The objective, at least, is straightforward. Since we want to find all possible simple cycles, we maximize the number of cycles constructed: $\max\sum_{c=1}^C x_{c}.$ Now come the constraints. (Brace yourself for some serious vertical scrolling.)
- Edges can only belong to cycles containing both endpoints, and only
in one orientation: \begin{align*}
z_{i,j,c}+z_{j,i,c} & \le y_{i,c}\quad\forall[i,j]\in E,\forall c\\
z_{i,j,c}+z_{j,i,c} & \le y_{j,c}\quad\forall[i,j]\in E,\forall c
\end{align*} - Flow only occurs on edges that are used, and only in the orientation
used: \begin{align*}
f_{i,j,c} & \le Nz_{i,j,c}\quad\forall[i,j]\in E,\forall c\\
f_{j,i,c} & \le Nz_{j,i,c}\quad\forall[i,j]\in E,\forall c
\end{align*} - Constructed cycles have one anchor and unused cycles have none: \[
\sum_{i=1}^N w_{i,c}=x_{c}\quad\forall c
\] - A cycle's anchor must belong to the cycle: \[
w_{i,c}\le y_{i,c}\quad\forall i,\forall c
\] - In every cycle, at every node, there are either two incident edges
(if the node belongs to the cycle) or none (if the node is not part
of the cycle: \[
\sum_{j:[i,j]\in E}(z_{i,j,c}+z_{j,i,c})=2y_{i,c}\quad\forall i,\forall c
\] - In any used cycle, at any node in the cycle other than the anchor, flow out is at least one unit less than flow in (mimicking the MTZ formulation of the TSP): \[
\sum_{j:[i,j]\in E}f_{i,j,c}\le\sum_{j:[i,j]\in E}f_{j,i,c}-y_{i,c}+Nw_{i,c}\quad\forall i,\forall c
\] - To ensure a complete cycle, one unit of flow must return to the anchor node: \[
\sum_{j:[i,j]\in E}f_{j,i,c}\ge w_{i,c}\quad\forall i,\forall c
\] - The $u$ variables are defined in terms of the $z$ variables: \[
u_{i,j,c,c'}\le z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\quad\forall[i,j]\in E,\forall c<c'
\] \[
u_{i,j,c,c'}\le2-\left(z_{i,j,c}+z_{j,i,c}+z_{i,j,c'}+z_{j,i,c'}\right)\quad\forall[i,j]\in E,\forall c<c'
\] - Any pair of cycles constructed must differ in at least one edge (which
also prevents a cycle from repeating with the opposite orientation): \[
\sum_{[i,j]\in E}u_{i,j,c,c'}\ge x_{c}+x_{c'}-1\quad\forall c<c'
\]
It is also possible to add constraints to mitigate some of the symmetry in the model. Two come to mind:
- Unused cycles must have higher indices than used cycles: \[
x_{c}\le x_{c-1}\quad\forall c\in\left\{ 2,\dots,C\right\}
\] - The anchor in a cycle must have the smallest index of any node
in the cycle: \[
w_{i,c}+y_{j,c}\le1\quad\forall i>j,\forall c
\]
The first one is of low enough dimension that I automatically included it in my Java code. I made the second one optional in the code.
When I ran it against the 9 node, 12 edge graph in the OR SE question, CPLEX had a solution with 13 cycles (the full number) after about 25 seconds and 5,000 nodes -- way, way more time than the 13 milliseconds the iterative method needed. I set $C=20$ as the upper bound on the number of cycles, which of course is also the initial upper bound on the objective. After 60 seconds (the time limit I set), CPLEX's upper bound was still 20. So probable optimality will not be easy to come by.
I've got one more MILP model up my sleeve, coming in the next post. Meanwhile, as a reminder, my Java code is available here.