OR in an OB World: Extracting a Connected Graph

Thursday, July 25, 2013

Extracting a Connected Graph

Some optimization problems can be characterized as follows:

a graph or digraph, either explicitly stated or implicit, underlies the problem;
part of the problem is to select a subgraph by selecting either nodes, edges or both; and
the selected subgraph must be connected.

Below I will show one way to model this. What I propose is mathematically valid, but I have no idea whether it is the most efficient approach. One disclaimer: in the case of a directed graph, this method will guarantee weak connectivity but not necessarily strong connectivity.

Let $\mathcal{V}$ be the set of vertices and $\mathcal{E}\subseteq \mathcal{V} \times \mathcal{V}$ the set of edges of the underlying graph. Create binary variables $x_v \in \{0,1\}\,\,\forall v\in\mathcal{V}$ and $y_{(v,w)}\in \{0,1\}\,\,\forall (v,w)\in \mathcal{E}$. They signal respectively the selection of vertex $v$ and edge $(v,w)$ for inclusion in the constructed subgraph. For consistency, add either the constraints\begin{gather*} y_{(v,w)}\le x_{v}\\ y_{(v,w)}\le x_{w} \end{gather*} or the constraints\[ 2y_{(v,w)}\le x_{v}+x_{w} \] for all $(v,w)\in\mathcal{E}$. The former yields a bit tighter model, while the latter yields a bit smaller model. They serve the same purpose: to avoid "dangling" edges (edges not connected to selected vertices at both ends). Also, if the original graph is undirected, we can exploit\[y_{(v,w)}=y_{(w,v)}\,\,\forall (v,w)\in\mathcal{E},\] to effectively halve the number of $y$ variables in the model.

To enforce connectedness, we will run a flow through the selected subgraph (treating edges as bidirectional for flow purposes), siphoning off one unit of flow at each selected vertex. Let $f_{(v,w)}\ge 0\,\,\forall (v,w)\in\mathcal{E}$ denote the flow along edge $(v,w)$. If we intend to siphon off one unit of flow at each vertex, the maximum flow volume we need is $\left|\mathcal{V}\right|$, the cardinality of the original vertex set. So we can limit flow to selected edges with the constraints\[f_{(v,w)}\le\left|\mathcal{V}\right|\left(y_{(v,w)}+y_{(w,v)}\right)\,\,\forall (v,w)\in\mathcal{E}.\]Note that, in the case of a digraph, we allow flow in both directions $v\rightarrow w$ and $w\rightarrow v$ if either of the arcs $(v,w)$ and $(w,v)$ is selected.

Before addressing flow balance constraints, we need to deal with the one tricky part: where does the flow originate? We will postulate a source node with sufficient supply sitting outside the original graph. For $v\in\mathcal{V}$, let $g_v\in\left[0,\left|\mathcal{V}\right|\right]$ denote a flow volume from the source node directly to vertex $v$ ("directly" meaning not passing through any edges of the original graph). We can now express conservation of flow ("flow in equals flow out") as\[g_v + \sum_{(w,v)\in\mathcal{E}} f_{(w,v)} = \sum_{(v,w)\in\mathcal{E}} f_{(v,w)} + x_v.\]If vertex $v$ is not selected ($x_v=0$), then by our previous constraints $y_{(v,w)}=y_{(w,v)}=0\,\,\forall w$, forcing $f_{(v,w)}=f_{(w,v)}=0\,\,\forall w$, and so this constraint reduces to $g_v=0$ (no external flow into unselected vertices). If vertex $v$ is selected ($x_v=1$), the constraint balances flow after extracting exactly one unit.

There is one thing left to do: restrict our source node to supplying a single (selected) vertex. To do this, we impose an arbitrary total ordering ($\preceq$) on $\mathcal{V}$. This allows us to define, for each $v\in\mathcal{V}$, the set $\mathcal{P}(v)=\{w\in\mathcal{V}\,:\, w\precneqq v\}$ of predecessors of vertex $v$. By adding the constraints\[g_v\le\left|\mathcal{V}\right|\left(1-x_w\right)\,\,\forall w\precneqq v\]we limit the source node to supplying only the first (in the precedence order) selected vertex. Since each of the other selected vertices draws supply from the first selected vertex (call that the "entry vertex") by conservative flows over selected edges between selected vertices, there must be a path in the selected subgraph from the entry vertex to every other selected vertex, meaning the subgraph is connected.

Addendum: The last step (restricting external flow to one vertex) adds $O\left(\left|\mathcal{V}\right|^2\right)$ constraints to the model. If we know a priori that at least one vertex from some (hopefully small) subset $\tilde{\mathcal{V}}\subset\mathcal{V}$ is guaranteed to be included in the subgraph, we can require that flow enter the subgraph through one of those vertices. Define the (arbitrary) ordering on $\mathcal{V}$ to that $\tilde{\mathcal{V}}$ comes first:\[v\prec w \,\,\forall v\in\tilde{\mathcal{V}},w\in\mathcal{V}\backslash\tilde{\mathcal{V}}.\]The last set of constraints then becomes\begin{gather*}g_v\le\left|\mathcal{V}\right|\left(1-x_w\right)\,\,\forall v,w\in\tilde{\mathcal{V}}\,\ni\,w\precneqq v\\g_v = 0\,\,\forall v\in\mathcal{V}\backslash\tilde{\mathcal{V}}\end{gather*}which adds $O\left(\left|\tilde{\mathcal{V}}\right|^2\right)$ constraints (and gets rid of some of the $g$ variables).

Addendum #2: I have a bit of a blind spot when it comes to SOS1 constraints -- I usually only think of them in terms of binary variables, whereas they can in fact involve continuous variables. An alternative to the final $O\left(\left|\mathcal{V}\right|^2\right)$ set of constraints is simply to declare the $g$ variables to be a type 1 specially ordered set. Depending on the solver, that may implicitly create additional binary variables (which I was trying to avoid when I crafted the last set of constraints -- see J-F Puget's comment below and my response), or it may just alter the branching scheme of the solver. To exploit an SOS1 constraint, a solver typically needs "weights" for the variables, to guide its branching decisions. I'm not sure what useful weights for the $g$ variables would be, but my first guess would be to weight $g_v$ by the number of successor nodes to $v$ in my precedence order.

Addendum #3: The restriction to a single entering flow can be done in $O\left(\left|\mathcal{V}\right|\right)$ rather than $O\left(\left|\mathcal{V}\right|^2\right)$ constraints as follows:\[\sum_{w\in\mathcal{V}\,:\,w\succneqq v}g_w\le\left|\mathcal{V}\right|\left(1-x_v\right)\,\,\forall v\in\mathcal{V}.\]Selecting any node $v$ precludes inbound flows at all successor nodes, so the only possible entry point is the first selected node. I think this formulation has a weaker relaxation than the first one I gave above, but it certainly has fewer constraints.

8 comments:

JFPugetJuly 27, 2013 at 3:50 AM
I had been thinking about this problem and converged on a model that is quite similar to yours except for the last part (the ordering).
Here it is.
Let's introduce one extra binary variable z_v per vertex v. z_v is 1 iff v is the vertex by which the flow enters the graph. Then constraints are
sum_v z_v = 1
z_v <= x_v
g_v <= |V| z_v
The number of additional variables and constraints is linear in the size for the problem. I don't know if my model would yield to better performance though.
ReplyDelete
Replies
JFPugetJuly 28, 2013 at 10:19 AM
Your comment made me think as well. My model suffers from symmetry: for a given connected graph any node could be the supply node. We can enforce that the supply node is the first node of the graph as follows
z_v + sum_{w<v} x_v <= 1
ReplyDelete
Replies
JFPugetJuly 30, 2013 at 11:42 AM
You are right, I was thinking of my own model. It is time to say we converged!
ReplyDelete
Replies

Add comment

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.

OR in an OB World

Thursday, July 25, 2013

Extracting a Connected Graph

8 comments:

Previous Posts

Labels