## Thursday, July 25, 2013

### Extracting a Connected Graph

Some optimization problems can be characterized as follows:
• a graph or digraph, either explicitly stated or implicit, underlies the problem;
• part of the problem is to select a subgraph by selecting either nodes, edges or both; and
• the selected subgraph must be connected.
Below I will show one way to model this. What I propose is mathematically valid, but I have no idea whether it is the most efficient approach. One disclaimer: in the case of a directed graph, this method will guarantee weak connectivity but not necessarily strong connectivity.

Let $\mathcal{V}$ be the set of vertices and $\mathcal{E}\subseteq \mathcal{V} \times \mathcal{V}$ the set of edges of the underlying graph. Create binary variables $x_v \in \{0,1\}\,\,\forall v\in\mathcal{V}$ and $y_{(v,w)}\in \{0,1\}\,\,\forall (v,w)\in \mathcal{E}$. They signal respectively the selection of vertex $v$ and edge $(v,w)$ for inclusion in the constructed subgraph. For consistency, add either the constraints\begin{gather*} y_{(v,w)}\le x_{v}\\ y_{(v,w)}\le x_{w} \end{gather*} or the constraints$2y_{(v,w)}\le x_{v}+x_{w}$ for all $(v,w)\in\mathcal{E}$. The former yields a bit tighter model, while the latter yields a bit smaller model. They serve the same purpose: to avoid "dangling" edges (edges not connected to selected vertices at both ends). Also, if the original graph is undirected, we can exploit$y_{(v,w)}=y_{(w,v)}\,\,\forall (v,w)\in\mathcal{E},$ to effectively halve the number of $y$ variables in the model.

To enforce connectedness, we will run a flow through the selected subgraph (treating edges as bidirectional for flow purposes), siphoning off one unit of flow at each selected vertex. Let $f_{(v,w)}\ge 0\,\,\forall (v,w)\in\mathcal{E}$ denote the flow along edge $(v,w)$. If we intend to siphon off one unit of flow at each vertex, the maximum flow volume we need is $\left|\mathcal{V}\right|$, the cardinality of the original vertex set. So we can limit flow to selected edges with the constraints$f_{(v,w)}\le\left|\mathcal{V}\right|\left(y_{(v,w)}+y_{(w,v)}\right)\,\,\forall (v,w)\in\mathcal{E}.$Note that, in the case of a digraph, we allow flow in both directions $v\rightarrow w$ and $w\rightarrow v$ if either of the arcs $(v,w)$ and $(w,v)$ is selected.

Before addressing flow balance constraints, we need to deal with the one tricky part: where does the flow originate? We will postulate a source node with sufficient supply sitting outside the original graph. For $v\in\mathcal{V}$, let $g_v\in\left[0,\left|\mathcal{V}\right|\right]$ denote a flow volume from the source node directly to vertex $v$ ("directly" meaning not passing through any edges of the original graph). We can now express conservation of flow ("flow in equals flow out") as$g_v + \sum_{(w,v)\in\mathcal{E}} f_{(w,v)} = \sum_{(v,w)\in\mathcal{E}} f_{(v,w)} + x_v.$If vertex $v$ is not selected ($x_v=0$), then by our previous constraints $y_{(v,w)}=y_{(w,v)}=0\,\,\forall w$, forcing  $f_{(v,w)}=f_{(w,v)}=0\,\,\forall w$, and so this constraint reduces to $g_v=0$ (no external flow into unselected vertices). If vertex $v$ is selected ($x_v=1$), the constraint balances flow after extracting exactly one unit.

There is one thing left to do: restrict our source node to supplying a single (selected) vertex. To do this, we impose an arbitrary total ordering ($\preceq$) on $\mathcal{V}$. This allows us to define, for each $v\in\mathcal{V}$, the set $\mathcal{P}(v)=\{w\in\mathcal{V}\,:\, w\precneqq v\}$ of predecessors of vertex $v$. By adding the constraints$g_v\le\left|\mathcal{V}\right|\left(1-x_w\right)\,\,\forall w\precneqq v$we limit the source node to supplying only the first (in the precedence order) selected vertex. Since each of the other selected vertices draws supply from the first selected vertex (call that the "entry vertex") by conservative flows over selected edges between selected vertices, there must be a path in the selected subgraph from the entry vertex to every other selected vertex, meaning the subgraph is connected.

Addendum: The last step (restricting external flow to one vertex) adds $O\left(\left|\mathcal{V}\right|^2\right)$ constraints to the model. If we know a priori that at least one vertex from some (hopefully small) subset $\tilde{\mathcal{V}}\subset\mathcal{V}$ is guaranteed to be included in the subgraph, we can require that flow enter the subgraph through one of those vertices. Define the (arbitrary) ordering on $\mathcal{V}$ to that $\tilde{\mathcal{V}}$ comes first:$v\prec w \,\,\forall v\in\tilde{\mathcal{V}},w\in\mathcal{V}\backslash\tilde{\mathcal{V}}.$The last set of constraints then becomes\begin{gather*}g_v\le\left|\mathcal{V}\right|\left(1-x_w\right)\,\,\forall v,w\in\tilde{\mathcal{V}}\,\ni\,w\precneqq v\\g_v = 0\,\,\forall v\in\mathcal{V}\backslash\tilde{\mathcal{V}}\end{gather*}which adds $O\left(\left|\tilde{\mathcal{V}}\right|^2\right)$ constraints (and gets rid of some of the $g$ variables).

Addendum #2: I have a bit of a blind spot when it comes to SOS1 constraints -- I usually only think of them in terms of binary variables, whereas they can in fact involve continuous variables. An alternative to the final $O\left(\left|\mathcal{V}\right|^2\right)$ set of constraints is simply to declare the $g$ variables to be a type 1 specially ordered set. Depending on the solver, that may implicitly create additional binary variables (which I was trying to avoid when I crafted the last set of constraints -- see J-F Puget's comment below and my response), or it may just alter the branching scheme of the solver. To exploit an SOS1 constraint, a solver typically needs "weights" for the variables, to guide its branching decisions. I'm not sure what useful weights for the $g$ variables would be, but my first guess would be to weight $g_v$ by the number of successor nodes to $v$ in my precedence order.

Addendum #3: The restriction to a single entering flow can be done in  $O\left(\left|\mathcal{V}\right|\right)$ rather than $O\left(\left|\mathcal{V}\right|^2\right)$ constraints as follows:$\sum_{w\in\mathcal{V}\,:\,w\succneqq v}g_w\le\left|\mathcal{V}\right|\left(1-x_v\right)\,\,\forall v\in\mathcal{V}.$Selecting any node $v$ precludes inbound flows at all successor nodes, so the only possible entry point is the first selected node. I think this formulation has a weaker relaxation than the first one I gave above, but it certainly has fewer constraints.

1. I had been thinking about this problem and converged on a model that is quite similar to yours except for the last part (the ordering).
Here it is.
Let's introduce one extra binary variable z_v per vertex v. z_v is 1 iff v is the vertex by which the flow enters the graph. Then constraints are
sum_v z_v = 1
z_v <= x_v
g_v <= |V| z_v
The number of additional variables and constraints is linear in the size for the problem. I don't know if my model would yield to better performance though.

1. J-F: You end up with fewer constraints than I do but additional binary variables (z_v). My first reaction was the same as yours -- binary variables to determine where the flow enters -- but my last set of constraints was a conscious attempt to avoid adding binary variables to the problem. I'm not sure which of our models has the tighter relaxation, nor which imposes more work per node in the search tree. I suspect that relative performance may hinge on whether you can sufficiently exploit the z variables being an SOS1 (meaning whether you can assign meaningful weights to them for branching purposes) and perhaps whether you can find good branching priorities for the x and z variables. Meanwhile, with my model, I would be tempted to add the last set of constraints as lazy constraints and see if that helped speed things up. I also just had another thought (motivated by your comment), which I'll add above as a second addendum.

2. Your comment made me think as well. My model suffers from symmetry: for a given connected graph any node could be the supply node. We can enforce that the supply node is the first node of the graph as follows
z_v + sum_{w<v} x_v <= 1

1. Good point about symmetry ... and fiddling a bit with your new formulation led me to addendum #3 above (order 1 increase in constraints but no extra binary variables). This exchange is starting to remind me of a tatonement process (cobweb model?) in economics. The question is, is it convergent? :-)

2. You could also have used this one
g_v <= |V| (1 - sum_{w<v} x_w)

I have the feeling that it is as tight as your original constraints

I'm not helping converge, do I?

3. I believe this one makes the model infeasible: if you select vertices 1 and 2 (x_1 = x_2 = 1), then the constraint forces g_3 <= -|V|.

3. You are right, I was thinking of my own model. It is time to say we converged!

1. I can accept that. :-)

If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on OR-Exchange.