Thursday, March 3, 2022

Finding Almost All Paths

A question posted on Stack Overflow (and subsequently deleted) led to a blog post by Erwin Kalvelagen on how to find all paths between two nodes in a directed graph (possibly with self-loops, i.e. arcs from a node to itself) subject to two constraints: no arc can be used more than once in a path; and there is an upper limit $M$ on the number of arcs used. Note that a path might visit a *node* more than once. It just cannot repeat an arc. The original question seems to have referred to an undirected graph, but Erwin's post works with directed graphs and so will I.

Erwin explored some mixed integer linear programming (MIP) models in his post, and a subsequent post on OR Stack Exchange led to more proposals of MIP models (including one from me). I also suggested that a "brute force" approach might be faster than any of the MIP models. In what follows, I will spell out both my MIP model and the brute force approach I used. Java code for both (which requires CPLEX and the Apache Commons Collections library) are in my code repository.

In what follows $A$ is the set of arcs in the graph, $s$ is the origin node for all paths, $t$ is the destination node for all paths, and $M$ is the maximum number of arcs to include in a path.

MIP model

The MIP model uses the following variables. 
  • $u_{a}\in\left\{ 0,1\right\} $ is 1 if and only if arc $a$ is used on the path.
  • $f_{a}\in\left\{ 0,1\right\} $ is 1 if and only if arc $a$ is the first arc on the path.
  • $\ell_{a}\in\left\{ 0,1\right\} $ is 1 if and only if arc $a$ is the last arc on the path.
  • $y_{ab}\in\left\{ 0,1\right\} $ is 1 if and only if arc $b$ immediately follows arc $a$ on the path.
  • $z_{a}\in\left[0,M\right]$ will be the number of arcs preceding arc $a$ on the path (0 if $a$ is not on the path).

Some of the variables can be eliminated (fixed at 0) at the outset.

  • $f_{a}=0$ if node $s$ is not the tail of arc $a.$
  • $\ell_{a}=0$ if node $t$ is not the head of arc $a.$
  • $y_{ab}=0$ if the head of arc $a$ is not the tail of arc $b.$

Since we are just interested in finding feasible solutions, the objective is to minimize 0.

Use of the $z$ variables mimics the Miller-Tucker-Zemlin approach to avoiding subtours in the traveling salesman problem. A common knock on the MTZ formulation for the TSP is that it has a somewhat loose continuous relaxation. Since we have a trivial objective (all feasible solutions are optimal), that is not a concern here.

The constraints are as follows.

  • There must be one first arc and one last arc.
    $$\sum_{a\in A}f_{a}=1.$$$$\sum_{a\in A}\ell_{a}=1.$$
  • At most $M$ arcs can be used.$$\sum_{a\in A}u_{a}\le M.$$
  • An arc is used if and only if it is either the first arc or follows another arc on the path.$$f_{a}+\sum_{b\in A}y_{ba}=u_{a}\quad\forall a\in A.$$
  • The last arc must be a used arc.$$\ell_{a}\le u_{a}\quad\forall a\in A. (1)$$
  • The sequence value of an unused arc is 0.$$z_{a}\le Mu_{a}\quad\forall a\in A.$$
  • No arc can follow the last arc.$$\ell_{a}+\sum_{b\in A}y_{ab}\le1\quad\forall a\in A. (2)$$
  • If an arc is used, either it is the last arc or another arc follows it.$$\ell_{a}+\sum_{b\in A}y_{ab}=u_{a}\quad\forall a\in A. (3)$$
  • If an arc $b$ follows arc $a$, the sequence number of arc $b$ must be one higher than the sequence number of arc $a.$ $$z_{a}-z_{b}+\left(M+1\right)y_{ab}\le M\quad\forall a,b\in A,a\neq b.$$

Over on OR Stack Exchange, Rob Pratt correctly pointed out that constraint (3) implies constraints (1) and (2). I've left them in the Java code anyway. The CPLEX presolver removes them automatically.

Finding all solutions 

To find all solutions, one possible approach is to solve whatever MIP model you choose, then add a "no-good" constraint that eliminates the solution just found (and only that one) and solve again, until eventually the aggregation of "no-good" constraints makes the model infeasible. What I did instead was to use the "populate" command in CPLEX, which accumulates a pool of solutions. Besides a time limit, I used two non-default parameter settings: I cranked up the solution pool capacity (the maximum number of solutions to find) to something high enough to let it find all solutions; and I set the solution pool intensity to its highest value (4), which tells CPLEX to aggressively seek out all feasible solutions.

Brute force approach

The brute force approach is remarkably straightforward. We will use a queue of (incomplete) paths that I will call the "to-do list". Start by creating a length one path for each arc with tail $s$ and add them to the to-do list. We now proceed in a loop until the to-do list is empty. At each iteration, we pull a path $P$ off the to-do list, identify the arcs whose tails match the head of the last arc in $P$, remove any arcs already on $P$, and for each surviving arc $a$ create a new path $P'$ by extending $P$ with $a$. If the head of arc $a$ is $t$, $P'$ is a new $s-t$ path, which we record. Either way, if $P'$ has length less than $M$, we add it to the to-do list, and eventually try to extend it further.

Do they work?

Would I be posting this if they didn't. :-) I tested both methods on the digraph from Erwin's post, which has 10 nodes and 22 arcs (with two self-loops). The source and sink are nodes 1 and 10 respectively. With $M=3$ there are four paths (which both methods find), and with $M=4$ there are nine paths (which both methods find). In both cases, brute force took about 1 ms. on my PC (using non-optimized Java code with a single thread). CPLEX times were rather stochastic, but I think it fair to say that building the models took around 55+ ms. and solving them typically took 20 ms. or more.
When I set a maximum length ($M$) of 10, things got interesting. The brute force approach found 268 paths (in about 6 ms), while CPLEX found only 33. Neither time limit nor pool size were a factor, so I assume that this is just a limitation of the aggressive setting for pool intensity. That means that to find all possible solutions using CPLEX, the solve / add constraint / solve again approach would be necessary.

No comments:

Post a Comment

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.