Tuesday, April 18, 2023

Node Coloring

Someone posted a question on OR Stack Exchange about coloring nodes in a sensor network. The underlying problem has something to do with placing sensors, but in mathematical terms the poster wanted to color the nodes of a weighted undirected graph with a fixed number of colors, while minimizing the sum of the edge weights of edges connecting nodes of the same color.

There is an obvious MIP model for this problem, which was posted as an answer to an earlier version of the question. If $N$ and $C$ are the index sets for nodes and colors respectively, $E\subseteq N\times N$ is the set of edges, $w_{ij}$ is the objective weight of edge $(i,j)\in E$ and $x_{n,c}$ is a binary variable indicating whether node $n$ is assigned color $c$, the problem can be written as a binary quadratic program:

$$\min\ \sum_{(n,m)\in E}w_{nm}\sum_{c\in C}x_{nc}x_{mc}\\
\textrm{s.t. }\sum_{c\in C}x_{nc}  =1\quad\forall n\in N\\
\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C.$$

 

The model can be linearized by introducing continuous variables $y_{nm}\ge 0$ to represent the objective contribution of each edge $(n,m)\in E$:

 

$$\min\ \sum_{(n,m)\in E}w_{nm} y_{nm}\\
\textrm{s.t. }\sum_{c\in C}x_{nc}=1 \quad \forall n\in N\\
\phantom{\textrm{s.t. }}y_{nm}\ge x_{nc} + x_{mc} - 1 \quad \forall (n,m)\in E, \forall c\in C\\
\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C\\
\phantom{\textrm{s.t. }}y_{nm}\ge 0 \quad \forall (n,m)\in E.$$


There is some symmetry in the model: given any feasible solution, another solution of identical cost is obtained by permuting the colors. It is possible to add constraints to remove that symmetry.


The person posting the question was looking for alternatives to a MIP model. An easy choice for me would be a genetic algorithm. I wanted to test this in R, using the GA library for the genetic algorithm. The GA library allows three types of chromosomes: binary or real vectors, or permutations of an index vector. If I were coding my own GA, the chromosome would be a vector in $\lbrace 1,\dots,C\rbrace^N.$ Within the available chromosome types in the library, the obvious choice was to use a vector $x\in [0,C]^N$ and then set the color of node $i$ to $\lceil x_i \rceil\in \lbrace 1,\dots,C\rbrace.$ I suspect, though, that the symmetry issue described above might have negative repercussions for a GA. If two "good" solutions were effectively the same (other than a permutation of the color indices), their offspring (which would be a mix of the two color numbering methods) might be a mess.


A couple of responders to the OR SE question suggested heuristics, as did I. My choice was to start with a random generation heuristic (assign colors randomly), apply an improvement heuristic (go through the nodes in random order and, for each node, reassign it to the cheapest color given the rest of the solution), and repeat (changing the order in which nodes are examined on each pass) until one pass produced no improvement. Since the heuristic is fairly fast, it can be done multiple times (new random restarts) with the best solution retained. Heuristics like this are very easy to code.


I coded all three approaches in R, using CPLEX (with the ompr and ROI libraries) for the MIP model, the GA library for the genetic algorithm, and nothing special for the improvement heuristic. I also threw in the tictoc library to do some timings. The test problem was a randomly constructed $10\times 20$ complete grid graph (200 nodes; 19,900 edges), to be colored using five colors.


To break symmetry in the MIP model, I forced the colors to be ordered so that the lowest node index receiving any color was smaller than the lowest node index receiving the next color. I let CPLEX run for five minutes using default settings. It made what I would call reasonable progress on the incumbent objective value, but the formulation is apparently very weak (at least on the test example). The lower bound barely moved, and the gap was 99.97% after five minutes.


I mostly used default parameters for the GA, although I let it run for 2,000 generations with a population of size 100. It found a much better solution (in 38 seconds) than CPLEX found in five minutes. To be fair, though, CPLEX was balancing improving the incumbent with working on the bound. Had I changed the CPLEX parameters to emphasize improving the incumbent, it likely would have done better (though likely not as well as the GA).


The improvement heuristic surprised me a bit. 600 random starts needed only 37 seconds (and I'm not the poster boy for optimized R code), and the best solution it found had an objective value about 5% better than what the GA produced. I did a small number of runs with different random seeds (but the same problem size), and the improvement heuristic consistently edged out the GA. I suppose the moral here is to try simple heuristics before going "high tech".


You are welcome to play with my R notebook, which contains both code and results.

No comments:

Post a Comment

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.