Showing posts with label heuristics. Show all posts

Tuesday, September 12, 2023

Randomly Generating a Connected Graph

Someone posted a question on Operations Research Stack Exchange about generating random test instances of a connected graph. (They specified a planar graph, but I am going to ignore that restriction, which I suspect was unintended and which I am pretty sure would make the problem much harder.) The premise is fairly straightforward. You start with a fixed set of nodes, partitioned into three subsets (supply, transshipment and demand nodes). The goal is to select a random set of edges of fixed cardinality such that (a) the resulting graph is connected and (b) no edge connects two supply nodes or a supply node and a demand node. Setting the number of edges too low will obviously make the problem infeasible. The author specified that the graph could be either directed or undirected. My solutions are coded for undirected graphs but can be adapted to directed graphs.

I proposed two solutions, both of which I coded and tested in Java. One uses a mixed integer programming (MIP) model, which my code solves using CPLEX. Let $E$ be the set of all valid edges, $N$ the number of nodes and $K$ the desired number of edges. The MIP model uses binary variables $x_{i,j}$ for all $(i,j)\in E.$ The requirement to include $K$ edges is handled by the constraint $$\sum_{(i,j)\in E} x_{i,j} = K$$ and randomness is achieved by minimizing $$\sum_{(i,j)\in E} c_{i,j} x_{i,j}$$ where the cost coefficients $c_{i,j}$ are generated randomly. 

That leaves the matter of forcing the graph to be connected. To do that, we introduce forward and backward flow variables $f_{i,j} \ge 0$ and $r_{i,j} \ge 0$ for each edge $(i,j),$ where $f_{i,j}$ is interpreted as flow from $i$ to $j$ and $r_{i,j}$ as flow from $j$ to $i.$ The concept is to single out one node (call it $s$) as a source for $N-1$ units of flow of some mystery commodity and assign every other node a demand of one unit of the commodity. For $i\neq s,$ the flow conservation constraint is $$\sum_{(j,i)\in E} (f_{j,i} - r_{j,i}) + \sum_{(i,j)\in E} (r_{i,j} - f_{i,j}) = 1,$$ which says that flow in minus flow out equals 1. To ensure that flow occurs only on selected edges, we add the constraints $$f_{i,j} \le (N-1) x_{i,j}$$ and $$r_{i,j} \le (N-1) x_{i,j}$$ for all $(i,j)\in E.$

The edge removal heuristic is a bit simpler. We start by selecting all eligible edges and shuffling the list randomly. While the number of edges in the graph exceeds $K,$ we pop the next edge from the list, remove it, and test whether the graph remains connected. If yes, the edge is permanently gone. If no, we replace the edge in the graph (but not on the list) and move to the next edge in the list. To confirm that removing edge $(i,j)$ leaves a connected graph, we start from node $i$ and find all nodes that can be reached from $i$ in one step (i.e., remain adjacent). Then we find all nodes adjacent to those nodes, and continue until either we encounter node $j$ (in which case the graph is still connected) or run out of nodes to test (in which case the graph is now disconnected).
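The reachability test described above is just a breadth-first search starting at one endpoint of the removed edge. A minimal Java sketch (illustrative only; the class and method names are mine, not those in the repository code):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ConnectivityCheck {
    /**
     * Returns true if node j is still reachable from node i in the graph
     * given by the adjacency lists, i.e., the graph remains connected after
     * the (already removed) edge (i, j) is deleted.
     */
    public static boolean stillConnected(List<List<Integer>> adj, int i, int j) {
        boolean[] seen = new boolean[adj.size()];
        Queue<Integer> queue = new ArrayDeque<>();
        seen[i] = true;
        queue.add(i);
        while (!queue.isEmpty()) {
            int u = queue.remove();
            if (u == j) return true;          // reached the other endpoint
            for (int v : adj.get(u)) {
                if (!seen[v]) { seen[v] = true; queue.add(v); }
            }
        }
        return false;                          // ran out of nodes: disconnected
    }

    public static void main(String[] args) {
        // Path 0-1-2 with edge (0,2) already removed: still connected via node 1.
        List<List<Integer>> adj = new ArrayList<>();
        adj.add(List.of(1));
        adj.add(List.of(0, 2));
        adj.add(List.of(1));
        System.out.println(stillConnected(adj, 0, 2));
    }
}
```

Starting the search at $i$ and stopping as soon as $j$ is seen avoids exploring the whole graph when the removed edge was redundant.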

The edge removal heuristic is simpler to explain, does not require a MIP solver and is likely faster. There are two potential disadvantages compared to the MIP approach. One is that the MIP approach is guaranteed to produce a connected graph in a single solve, whereas the edge removal heuristic may fail to produce a connected graph due to an "unlucky" choice of which edges to remove, requiring one or more restarts with different random number seeds. The other is that if $K$ is set too low (making a connected graph impossible), the MIP model will detect that the problem is infeasible. With the edge removal heuristic, you would not be able to distinguish infeasibility from bad luck (although if the heuristic failed multiple times with different random seeds, infeasibility would start to look likely). In very limited testing of my code, the edge removal heuristic was definitely faster than the MIP and achieved a valid result in the first pass each time.

Java code for both methods is in my code repository.

Tuesday, April 18, 2023

Node Coloring

Someone posted a question on OR Stack Exchange about coloring nodes in a sensor network. The underlying problem has something to do with placing sensors, but in mathematical terms the poster wanted to color the nodes of a weighted undirected graph with a fixed number of colors, while minimizing the sum of the edge weights of edges connecting nodes of the same color.

There is an obvious MIP model for this problem, which was posted as an answer to an earlier version of the question. If $N$ and $C$ are the index sets for nodes and colors respectively, $E\subseteq N\times N$ is the set of edges, $w_{ij}$ is the objective weight of edge $(i,j)\in E$ and $x_{n,c}$ is a binary variable indicating whether node $n$ is assigned color $c$, the problem can be written as a binary quadratic program:

$$\min\ \sum_{(n,m)\in E}w_{nm}\sum_{c\in C}x_{nc}x_{mc}\\
\textrm{s.t. }\sum_{c\in C}x_{nc}  =1\quad\forall n\in N\\
\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C.$$

 

The model can be linearized by introducing continuous variables $y_{nm}\ge 0$ to represent the objective contribution of each edge $(n,m)\in E$:

 

$$\min\ \sum_{(n,m)\in E}w_{nm} y_{nm}\\
\textrm{s.t. }\sum_{c\in C}x_{nc}=1 \quad \forall n\in N\\
\phantom{\textrm{s.t. }}y_{nm}\ge x_{nc} + x_{mc} - 1 \quad \forall (n,m)\in E, \forall c\in C\\
\phantom{\textrm{s.t. }}x_{nc} \in\left\{ 0,1\right\} \quad\forall n\in N,c\in C\\
\phantom{\textrm{s.t. }}y_{nm}\ge 0 \quad \forall (n,m)\in E.$$


There is some symmetry in the model: given any feasible solution, another solution of identical cost is obtained by permuting the colors. It is possible to add constraints to remove that symmetry.


The person posting the question was looking for alternatives to a MIP model. An easy choice for me would be a genetic algorithm. I wanted to test this in R, using the GA library for the genetic algorithm. The GA library allows three types of chromosomes: binary or real vectors, or permutations of an index vector. If I were coding my own GA, the chromosome would be a vector in $\lbrace 1,\dots,C\rbrace^N.$ Within the available chromosome types in the library, the obvious choice was to use a real vector $x\in (0,C]^N$ and then set the color of node $i$ to $\lceil x_i \rceil\in \lbrace 1,\dots,C\rbrace.$ I suspect, though, that the symmetry issue described above might have negative repercussions for a GA. If two "good" solutions were effectively the same (other than a permutation of the color indices), their offspring (which would mix the two color numberings) might be a mess.


A couple of responders to the OR SE question suggested heuristics, as did I. My choice was to start with a random generation heuristic (assign colors randomly), apply an improvement heuristic (go through the nodes in random order and, for each node, reassign it to the cheapest color given the rest of the solution), and repeat (changing the order in which nodes are examined on each pass) until one pass produced no improvement. Since the heuristic is fairly fast, it can be done multiple times (new random restarts) with the best solution retained. Heuristics like this are very easy to code.
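A single pass of that improvement step takes only a few lines. My implementation is in R; the following Java sketch (my own naming, with edge weights in a dense matrix) conveys the idea:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ColorImprove {
    /**
     * One improvement pass: visit nodes in random order and move each node
     * to the color minimizing the total weight of same-color edges incident
     * to it. w[i][j] is the edge weight (0 if no edge); color[i] is in
     * 0..C-1. Returns true if any node changed color.
     */
    public static boolean pass(double[][] w, int[] color, int C, Random rng) {
        int n = color.length;
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, rng);       // new node order each pass
        boolean improved = false;
        for (int i : order) {
            double[] cost = new double[C];     // cost of each color for node i
            for (int j = 0; j < n; j++) {
                if (j != i) cost[color[j]] += w[i][j];
            }
            int best = color[i];
            for (int c = 0; c < C; c++) {
                if (cost[c] < cost[best]) best = c;
            }
            if (best != color[i]) { color[i] = best; improved = true; }
        }
        return improved;
    }
}
```

Repeating `pass` until it returns false, from several random starting colorings, is the entire heuristic.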


I coded all three approaches in R, using CPLEX (with the ompr and ROI libraries) for the MIP model, the GA library for the genetic algorithm, and nothing special for the improvement heuristic. I also threw in the tictoc library to do some timings. The test problem was a randomly constructed $10\times 20$ complete grid graph (200 nodes; 19,900 edges), to be colored using five colors.


To break symmetry in the MIP model, I forced the colors to be ordered so that the lowest node index receiving any color was smaller than the lowest node index receiving the next color. I let CPLEX run for five minutes using default settings. It made what I would call reasonable progress on the incumbent objective value, but the formulation is apparently very weak (at least on the test example). The lower bound barely moved, and the gap was 99.97% after five minutes.


I mostly used default parameters for the GA, although I let it run for 2,000 generations with a population of size 100. It found a much better solution (in 38 seconds) than CPLEX found in five minutes. To be fair, though, CPLEX was balancing improving the incumbent with working on the bound. Had I changed the CPLEX parameters to emphasize improving the incumbent, it likely would have done better (though likely not as well as the GA).


The improvement heuristic surprised me a bit. 600 random starts needed only 37 seconds (and I'm not the poster boy for optimized R code), and the best solution it found had an objective value about 5% better than what the GA produced. I did a small number of runs with different random seeds (but the same problem size), and the improvement heuristic consistently edged out the GA. I suppose the moral here is to try simple heuristics before going "high tech".


You are welcome to play with my R notebook, which contains both code and results.

Wednesday, January 26, 2022

Balancing Machine Groups

This post continues my previous post about approximate equality of some value across multiple entities. The motivation was a question posed on OR Stack Exchange asking how to partition $M$ machines into $G$ groups so that each group would have approximately the same total productivity. The machines have specified productivity values $P_1,\dots,P_M \ge 0$, and the productivity of a group is just the sum of the productivity values of the group members. The question assumes that $G$ divides $M$ evenly, so that each group has $S = M / G$ members.

The previous post covered the issue of what metric to use in the objective function. Here I will focus on what algorithm/heuristic to use to solve it.

The author of the question specified $M=21$, $G=7$, $S=3$ for problem dimensions, with productivity values $P_m \in [0,10].$ He also proposed a heuristic that on the surface makes sense: assign machines sequentially (I assume in decreasing productivity order) to the group with the lowest current productivity (and with space to accept another machine). A genetic algorithm with chromosomes representing permutations of $1,\dots,M$ can be applied to any of the objective functions. (Since GAs maximize fitness and the problem here is to minimize a dispersion measure, we would just maximize the difference between some fixed upper bound on the dispersion measure and the measure itself.)
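The author's sequential heuristic is easy to state in code. My experiments were in R; here is an illustrative Java sketch (names are mine):

```java
import java.util.Arrays;

public class GreedyBalance {
    /**
     * "Lowest load first" heuristic: machines are sorted by decreasing
     * productivity and each is assigned to the group with the smallest
     * current total that still has room. Returns group[m] for each machine,
     * with G groups of size S = M / G each.
     */
    public static int[] assign(double[] p, int G) {
        int M = p.length, S = M / G;
        Integer[] order = new Integer[M];
        for (int m = 0; m < M; m++) order[m] = m;
        Arrays.sort(order, (a, b) -> Double.compare(p[b], p[a])); // decreasing
        double[] load = new double[G];
        int[] count = new int[G];
        int[] group = new int[M];
        for (int m : order) {
            int best = -1;
            for (int g = 0; g < G; g++) {
                if (count[g] < S && (best < 0 || load[g] < load[best])) best = g;
            }
            group[m] = best;     // cheapest group with space left
            load[best] += p[m];
            count[best]++;
        }
        return group;
    }
}
```

One full assignment is $O(MG)$ plus a sort, which is why the heuristic is so fast.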

Finally, we can find exact solutions using a mixed integer linear program and any objective except MSD. (The MSD would be a mixed integer quadratic program.) The MIP models would contain variables $x_{mg}\in \lbrace 0,1 \rbrace$ for each machine $m$ and group $g$, indicating membership of that machine in that group. They would also contain auxiliary variables $y_g \ge 0$ to indicate the productivity value for each group $g$. The common part of the models would involve constraints $$\sum_{g=1}^G x_{mg} = 1\quad\forall m$$ (every machine is assigned to exactly one group), $$\sum_{m=1}^M x_{mg} = S \quad\forall g$$ (every group contains $S$ machines) and $$y_g = \sum_{m=1}^M P_m x_{mg} \quad\forall g$$ (defining the group productivity variables in terms of the assignments). To minimize the range, we would add variables $z_0, z_1 \ge 0$ plus constraints $$z_0 \le y_g \quad\forall g$$ and $$z_1 \ge y_g \quad \forall g$$ and then minimize $z_1 - z_0$. To minimize the MAD, we would instead add variables $z_g \ge 0$ for $g=1,\dots,G$ with the constraints $$ z_g \ge y_g - S\cdot \overline{P} \quad\forall g$$ and $$z_g \ge S\cdot \overline{P} - y_g \quad\forall g$$ (where $\overline{P} = (\sum_m P_m)/M$ is the mean productivity level of the machines, so that $S\cdot \overline{P}$ is the mean productivity per group regardless of how the groups are formed). The objective would then be to minimize $\sum_g z_g$.

I decided to code some of the alternatives in R (using CPLEX to solve the MIPs) and see how they did. The author's heuristic has the advantage of being very fast. The MIP models, not surprisingly, were much slower. I tried adding some antisymmetry constraints (since the objective value is unaffected if you permute the indices of the groups), but they did not seem to help run time, so I dropped them.

Here are the range and MAD values achieved by each method on a particular test run (reasonably representative of other runs).

 Method  Criterion  Range gap (%)  MAD gap (%)
 Author  None           381.02613   564.734106
 GA      Range           52.04068    49.728175
 GA      MAD             56.95300    66.916019
 MIP     Range            0.00000     2.215414
 MIP     MAD             17.02795     0.000000

The gap measures are percentage above optimal. The author's method seems to be suboptimal by a wide margin, and the GAs are not doing particularly well on this example either.

We can visualize the group productivity levels for each solution in a bubble plot, where the size of a point is the productivity. The actual group productivity levels are close enough to each other that the plot is not helpful, so rather than plotting productivity I will plot the absolute percentage deviation from the overall mean productivity of all groups.

For the author's heuristic, group 3 has a productivity level close to the target mean and the other groups are pretty far off. (Remember that this is absolute deviation, so while we know group 1 for the author's solution is way off, we do not know from the plot whether it is above or below mean.) The MIP model that minimizes range looks the best to me, even though the MIP model that minimizes MAD is better with respect to overall absolute deviation.

There are three caveats to bear in mind. First, this is one test run, with randomly generated data. Altering the data affects all the results, although in multiple runs the author's heuristic always had gaps of 200% or more and the GAs never found optimal solutions. Second, performance of the GAs is affected by multiple parameters, in particular the population size and number of generations run. Third, we could adapt the author's heuristic by using the $G$ machines with highest productivity as "anchors" for the $G$ groups and then assigning the remaining machines in random order rather than in decreasing productivity order. That would allow us to run the heuristic multiple times with different sort orders for the machines, subject to time and/or iteration limits, and pick the best solution found.

My R notebook is available for download. Fair warning: it requires a bunch of packages, and requires a CPLEX installation if you plan to run the MIP models.

Tuesday, December 28, 2021

Revisiting a Joint Clustering Problem

In a previous post, I described a genetic algorithm for a problem posted on OR Stack Exchange. The problem involved grouping users and servers so as to maximize the total quality of service to all users. Every user in a cluster is served by every server in the cluster, but servers in other clusters interfere to some degree with the user's service.

The person who posted that problem recently posted another variation on OR SE. The new version of the problem is nearly, but not quite, identical to the previous version. "Servers" are now termed "transmitters", but the problem remains clustering transmitters and users, with limits on how many transmitters (but not how many users) belong to each cluster, and with the same objective function. The new version may be slightly easier thanks to one new restriction.

The user must belong to a cluster that has the transmitter providing the highest weight for the given user.

Thus the assignment of a transmitter to a cluster immediately implies the assignment to that cluster of all users for whom the specified transmitter has highest weight. (I will assume that ties for highest weight either do not occur or are broken whimsically.)

I put together two different mixed-integer programming (MIP) models for the problem. Both are rather complicated, in part because the objective function needs to be linearized. They are too big to squeeze into a blog post, so I put them in a PDF file. The models differ in the use of binary variables to assign transmitters. In the first model, a binary variable $x_{t,t'}$ is 1 if transmitters $t$ and $t'$ belong to the same cluster. In the second, a binary variable $x_{t,c}$ is 1 if transmitter $t$ belongs to cluster $c$. I coded both models in Java and ran them (using CPLEX as the MIP solver) on a few test problems with dimensions similar to what the original post contained (20 transmitters, 60 users). The first model was consistently smaller in size than the second and produced tighter bounds within whatever time limit I gave CPLEX.

The OR SE question also mentioned heuristics, so I coded up a couple of reasonably obvious ones. The first is a greedy construction heuristic. It initially puts each transmitter in a separate cluster, then repeatedly merges the pair of clusters whose merger produces the best improvement in the overall objective function. If no further improvement is possible and there are still more clusters than the number allowed, it merges whichever pair does the least damage to the objective. In all cases, mergers occur only if the merged cluster is within the allowed size limit.

The second heuristic is a pairwise swap heuristic. Given the (presumably feasible) solution from the first heuristic, it considers all pairs of transmitters belonging to different clusters and, for each pair, evaluates the objective impact of swapping the two transmitters, performing the swap if it is beneficial. Note that swapping preserves the sizes of both clusters, so feasibility is never in question. The heuristic keeps looping through pairs of transmitters until it is unable to find a useful swap.

In my experiments, I used the best solution from the second heuristic to "hot start" the first MIP model. (I ran into some difficulties getting the second MIP model to accept it as a starting solution, and frankly was not motivated enough to bother investigating the issue.) The results from one computational test (20 transmitters, 60 users, maximum cluster size 5) are fairly representative of what I saw in other tests. The construction heuristic got a solution with objective value 26.98 in 0.1 seconds on my desktop PC. The improvement heuristic ran for about 0.25 seconds and improved the objective value to 29.05. Running the first (better) MIP model using CPLEX 20.1 with a 10 minute time limit, starting from that solution, CPLEX improved the objective value to 29.62 with a final upper bound around 31.18. So the heuristic solution, which was obtained very quickly, was within 7% of optimal based on the bound and within 2% of the last solution CPLEX got within the 10 minute time limit.

Two caveats apply here. First, I did a minimal amount of testing. Second, and possibly more important, I randomly generated the transmitter-user weights using a uniform distribution. Having weights that follow a different distribution might affect both heuristic and model performance.

The Java code for both heuristics and both MIP models can be downloaded from my repository. It requires a recent version of Java (14 or later) since it uses the Record class, and it of course requires CPLEX (unless you want to delete that part and just play with the heuristics), but has no other dependencies.


Monday, November 15, 2021

A Heuristic Surprise

The maximum cut problem is one of those combinatorics problems that are deceptively easy to articulate and yet NP-annoying to solve. Given an undirected graph $G=(V,E)$, find a partition of $V$ into disjoint subsets $A$ and $B$ such that the set of edges in $E$ with one endpoint in $A$ and the other in $B$ (the "cut set") is as large as possible. There is also a weighted version, where you maximize the total weight of the cut set rather than its cardinality.

Someone asked on OR Stack Exchange about a variant of the problem, in which there is a specified size (which the author called a "budget") for one of the sets, i.e., a constraint $|A|=b$ for some given $b$. This generated various suggestions, including some heuristics. I had a couple of thoughts about possible heuristics, and in playing with some computational examples found myself rather surprised by the results.

Mixed Integer Program

It is of course quite straightforward to write an integer or mixed-integer linear program for the problem. I'll assume that $V=\lbrace 1,\dots,N\rbrace$ and that $2\le b \le N-1$. Let variable $x_i$ be 1 if vertex $i\in V$ is in $A$ and 0 if it is in $B$, and let variable $y_{i,j}$ be 1 if edge $(i,j)\in E$ is in the cut set ($i\in A$ and $j\in B$ or vice versa) and 0 if not ($i$ and $j$ both in $A$ or both in $B$).
 
Please allow me a slight digression here. The $y$ variables can be declared either continuous or binary. The combination of objective and constraints will force the values to 0 or 1 even if they are declared as continuous. Force of habit, dating back to when I was using very early generation software on not particularly powerful (by current standards) computers, has me declaring every variable as continuous unless it has to be integer. On the other hand, it has been suggested to me that declaring such variables integer-valued may help solvers tighten bounds or fix variables. That makes sense to me, but in this particular model I don't see anything that would suggest making $y$ integer would help the solver.

Digression over, we can write the MIP model as
\begin{alignat*}{1} \max\sum_{(i,j)\in E}y_{ij}\\ \textrm{s.t. }\sum_{i\in V}x_{i} & =b\\ y_{ij} & \le x_{i}+x_{j}\quad\forall(i,j)\in E\\ y_{ij} & \le2-x_{i}-x_{j}\quad\forall(i,j)\in E\\ x_{i} & \in\left\{ 0,1\right\} \quad\forall i\in V\\ y_{ij} & \in\left[0,1\right]\quad\forall(i,j)\in E. \end{alignat*}
 
The first constraint enforces the "budget", while the second and third constraints ensure that $y_{ij}=0$ if $i$ and $j$ are on the same side of the partition. The objective function will force $y_{ij}=1$ if $i$ and $j$ are on different sides. The MIP model is easy to solve for small instances and grows more difficult as the number of vertices and the number of edges grows, and as $b$ gets closer to $N/2$ (since the number of possible partitions satisfying the budget increases).
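To make the correspondence between the $x$ and $y$ variables concrete, the objective value for a fixed partition is just a count over the edge list. A small Java sketch (hypothetical names):

```java
public class CutSize {
    /**
     * Size of the cut set for a 0/1 assignment x (x[i] = 1 if vertex i is
     * in A): counts edges with exactly one endpoint in A, which is what the
     * y variables end up recording in the MIP.
     */
    public static int cutSize(int[][] edges, int[] x) {
        int size = 0;
        for (int[] e : edges) {
            if (x[e[0]] != x[e[1]]) size++;   // endpoints on opposite sides
        }
        return size;
    }
}
```

This evaluation function is also what the heuristics below need to score candidate partitions.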

Genetic Algorithm

Since the max cut problem is NP-hard, I suspect the budgeted version also is, and in any case it is likely that solving the MIP to optimality can take too long (or too much memory) in many cases. That led me to thinking about heuristics. One easy heuristic is to solve the MIP model but not to optimality, stopping at some arbitrary time or memory/node limit. That requires a MIP solver. Another possibility (among many) is a genetic algorithm.

I was fooling around with the problem in R, which has a very handy genetic algorithm library (named, rather suggestively, GA). Among other things, the GA library allows you to use a permutation as a "chromosome" (solution encoding). So we can define a candidate solution to be a permutation of $1,\dots,N$, and its "fitness" is then the size of the cut set defined by setting $A$ equal to the first $b$ elements of the permutation.

Pairwise Swap

Another possibility that I suggested on OR SE was a simple pairwise swapping heuristic. Start by generating a random choice of $A$ (and, by implication, $B$) and calculate the size of the cut set. Now consider all pairs of vertices $i\in A$ and $j\in B$ and, for each pair, see what happens if you swap them (moving $i$ to $B$ and $j$ to $A$). If the size of the cut set increases, accept the swap; otherwise do not. This is a primitive example of a "neighborhood" search, and it can be used in fancier heuristics, including simulated annealing (where you occasionally accept a swap that actually makes the objective smaller, in order to get you out of your current neighborhood). To keep things simple, I suggested just doing pairwise swaps when the objective improved, with random restarts. That means assigning a time limit to the heuristic, and when you run out of acceptable swaps, start fresh with a new random solution until time is exhausted.
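One restart of the swapping heuristic might look like this in Java (my own sketch; the actual experiments were in R):

```java
public class SwapHeuristic {
    /** Number of edges with endpoints on opposite sides of the partition. */
    static int cutSize(int[][] edges, boolean[] inA) {
        int size = 0;
        for (int[] e : edges) if (inA[e[0]] != inA[e[1]]) size++;
        return size;
    }

    /**
     * First-improvement pairwise swaps: for each i in A and j in B, swap
     * them if the cut set grows, and repeat until no swap helps. Swapping
     * keeps |A| fixed, so the budget constraint is never violated.
     */
    public static boolean[] improve(int[][] edges, boolean[] inA) {
        int n = inA.length;
        boolean improved = true;
        while (improved) {
            improved = false;
            int current = cutSize(edges, inA);
            for (int i = 0; i < n; i++) {
                if (!inA[i]) continue;
                for (int j = 0; j < n; j++) {
                    if (inA[j]) continue;
                    inA[i] = false; inA[j] = true;   // tentative swap
                    int trial = cutSize(edges, inA);
                    if (trial > current) {           // keep the swap
                        current = trial;
                        improved = true;
                        break;                       // i left A; move on
                    }
                    inA[i] = true; inA[j] = false;   // revert
                }
            }
        }
        return inA;
    }
}
```

Wrapping this in a loop over random starting partitions, with a time limit, gives the full heuristic.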

And the surprise is ...

I coded the MIP model (solved with CPLEX), the GA and the swapping heuristic in R and ran them on a few random test cases, keeping dimensions small enough that CPLEX could solve the MIP reasonably quickly. Since the swapping heuristic relies on a time limit to stop it, I set a limit of two minutes. For the GA, I set a limit of 10,000 generations (which it never reached, as I expected) and a stagnation limit of 200 generations (meaning that, after 200 consecutive generations with no improvement in the best objective value, it would terminate). Early experiments suggested that the GA could stagnate pretty quickly, so I used an "island" GA (in which several smaller populations on "islands" evolve independently, with periodic migrations from one island to another to freshen the "genetic pools" on the islands).

My expectation was that the "sophisticated" GA would outperform the "primitive" swapping heuristic. Wrong! In the table below, I show the optimality gap (percent worse than the optimum, as computed by CPLEX) for the GA and the swap heuristic on six examples. Since the GA stagnates well before the two minute limit I gave to the swap heuristic, I also ran the GA with restarts (column "GA-R"), which restarted the GA with a new population each time it stagnated until the two minute limit was reached. The restarts improved the GA performance somewhat, but as you can see the swapping heuristic beat it on every one of the six test cases (and found the optimum in four out of six).


           Graph                   Gap
 Nodes  Edges  Budget (b)    GA   GA-R   Swap
    50    429          10  1.7%   1.7%   0.0%
    50    490          20  3.0%   0.7%   0.0%
    75    416          20  6.4%   4.3%   0.0%
    75    416          30  6.8%   4.7%   0.4%
   100    495          30  8.4%   7.8%   1.6%
   100    990          25  9.1%   4.4%   0.0%

I don't expect the swapping heuristic to find the optimum this consistently on larger problems. Whether it would outperform the GA (with restarts) on tougher problems is an open question. Nonetheless, I take this as a reminder that sometimes simple heuristics can do a pretty good job.

My R notebook is available if you want to play with it. Fair warning: the notebook assumes you have CPLEX installed and loads seven R libraries (four for use with CPLEX, one for the GA, one to time execution of the heuristics and a separate one to set a time limit on the heuristics). If some of that does not interest you, you can of course edit out the related code and skip loading the corresponding libraries.

Wednesday, November 3, 2021

Smallest Polygon Containing A 2D Point Set

A question on OR Stack Exchange asked about an optimization model (specifically a MILP) for finding the "smallest hull" containing a finite set of points in the plane. The phrase "smallest hull" is rather fuzzy. In a follow-up comment, the author of the question clarified that "smallest" means smallest area. That leaves the meaning of "hull". Note that any Hamiltonian path through the points would contain all the points (with all of them on its "boundary") and would have area zero. Based on a sample illustration added by the author, I will assume that "hull" here means polygon (but not necessarily a convex one).

I'm pretty sure that a MILP model could be devised, but I suspect it would be rather large and might solve slowly. So I looked at the possibility of a heuristic solution, and came up with something that seems to do pretty well but definitely does not produce an optimal solution in all cases. I got it working in R. You can see the full notebook (which includes the code) here. (Warning: It's almost 6MB, due to the presence of some graphs.) The notebook uses three R libraries: ggplot2 and plotly, for plotting; and interp, for generating a tessellation (more about that below). The plotly library is used just once, to generate an interactive plot with "tooltips" identifying each point, which is mainly useful for debugging. So if you don't feel like installing it, you can just delete any references to it and still run the important parts of the code.

Suppose we start with the following 2D point set.

[Figure: Point set (with convex hull)]
The heuristic first finds the convex hull of the points (shown above) and a Delaunay triangulation of them, using the tri.mesh function from the interp library. The triangulation is a collection of nonoverlapping triangles, with all vertices taken from the point set, such that no point is in the interior of any triangle. The next plot shows what the triangulation looks like.

[Figure: Delaunay triangulation]

The heuristic now takes a lap around the boundary of the convex hull. For each line segment $[a,b]$, it locates the (unique) triangle containing that edge. Let's say the third vertex of that triangle is $c$. If $c$ is not currently on the boundary, the heuristic adds it to the boundary, replacing the segment $[a,b]$ with the segments $[a,c]$ and $[c,b]$. It then deletes the $a-b-c$ triangle from the polygon, shaving off some area, and moves on to the next boundary segment.

This loop is repeated until a full circuit around the boundary fails to make any changes, at which point the heuristic terminates. Here is what the final polygon looks like.

[Figure: Final polygon]

What appear to be line segments sticking out in a few places are actually very skinny triangles. Clearly the heuristic did not select this polygon for its aesthetics.

The points were randomly generated in the unit square, which of course has area 1. For this example, the convex hull has area 0.865, while the final polygon has area 0.511, a reduction of almost 41%. At the same time, it is clear from the final plot that the solution is not optimal. There is one point left in the interior of the polygon, around $(0.493, 0.378)$. The heuristic cannot do anything about it because none of the triangles to which it belongs have an edge on the boundary. We could certainly add line segments from it to two boundary points, at $(0.483, 0.280)$ and $(0.450, 0.175)$, that currently share an edge. That would form a new triangle containing no other points, and we could drop the existing edge, adding the two new edges to the boundary, and shed a little more area.
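For the record, the area figures quoted here come from the standard shoelace formula for a simple polygon. A minimal sketch (Java for illustration; the notebook itself is in R):

```java
public class Shoelace {
    /**
     * Area of a simple polygon with vertices (x[k], y[k]) listed in order
     * around the boundary, computed via the shoelace formula.
     */
    public static double area(double[] x, double[] y) {
        int n = x.length;
        double twice = 0.0;
        for (int k = 0; k < n; k++) {
            int next = (k + 1) % n;              // wrap around at the end
            twice += x[k] * y[next] - x[next] * y[k];
        }
        return Math.abs(twice) / 2.0;            // orientation-independent
    }
}
```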

So the heuristic does not produce "nice looking" polygons, nor does it find an optimal solution, but my intuition after running a few examples is that it probably does reasonably well.

Thursday, April 8, 2021

A GA Model for a Joint Clustering Problem

A problem in grouping users and servers was posted on Mathematics Stack Exchange and OR Stack Exchange. (Someone remind me to rant about cross-posting in a future blog post. Just don't cross-post the reminder.) The gist of the problem is as follows. We have $S$ servers of some sort, and $U$ users. For each combination of user $u$ and server $s$, we have a parameter $h_{u,s}$ which pertains to the quality / strength / something of service user $u$ would get from server $s$. We are told to group users and servers into a predefined number $G$ of groups or clusters. Every user in cluster $g$ will be served by every server in cluster $g$, but servers in other clusters will interfere with the service to user $u$. (A possible application might be cellular phone service, where signals from towers to which you are not connected might interfere with your signal. Just guessing.)

There is one more parameter, the maximum number ($M$) of servers that can be assigned to a group. It is explicitly stated that there is no limit to the number of users that can be assigned to a group. I'm going to go a step further and assume that every group must contain at least one server but that there is no lower limit to the number of users assigned to a group. (If a group has no users, presumably the servers in that group get to relax and play video games or whatever.)

The objective is to maximize $\sum_{u=1}^U q_u$, the total quality of service, where $q_u$ is the quality of service for user $u$. What makes the problem a bit interesting is that $q_u$ is a nonlinear function of the allocation decisions. Specifically, if we let $\mathcal{S}_1, \dots, \mathcal{S}_G$ be the partition of the set $\mathcal{S} = \lbrace 1,\dots, S\rbrace$ of all servers, and if user $u$ is assigned to group $g$, then $$q_{u}=\frac{\sum_{s\in\mathcal{S}_{g}}h_{us}}{\sum_{s\notin\mathcal{S}_{g}}h_{us}}.$$Note that the service quality for user $u$ depends only on which servers are/are not in the same group with it; the assignments of other users do not influence the value of $q_u$.
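Given a candidate server partition, $q_u$ (and the best group for a user) is cheap to evaluate. A minimal Python sketch for illustration (the function name and data layout are mine; the actual code discussed in this post is in R):

```python
def service_quality(h_u, groups, g):
    """q_u for a user with signal strengths h_u[s], assigned to group g of
    the server partition `groups` (a list of lists of server indices)."""
    inside = sum(h_u[s] for s in groups[g])
    outside = sum(h_u[s] for k, grp in enumerate(groups) if k != g for s in grp)
    return inside / outside

# Assigning a user greedily: pick the group with the largest q_u.
h_u = [4.0, 2.0, 3.0]    # strengths from servers 0, 1, 2
groups = [[0, 1], [2]]   # servers 0 and 1 in group 0; server 2 in group 1
best = max(range(len(groups)), key=lambda g: service_quality(h_u, groups, g))
print(best, service_quality(h_u, groups, best))  # 0 2.0
```

Note that, as stated above, nothing about other users enters the computation.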

An answer to the OR SE post explains how to model this as a mixed-integer linear program, including how to linearize the objective. That is the approach I would recommend. The original poster, however, specifically asked for a heuristic approach. I got curious and wrote a genetic algorithm for it, in R, using the GA package. Since this is a constrained problem, I used a random key GA formulation. I won't go into excessive detail here, but the gist is as follows. We focus on assigning servers to groups. Once we have a server assignment, we simply compute the $q_u$ value for each user and each possible group, and assign each user to the group that gives it the highest $q_u$ value.

To assign servers to groups, we start with an "alphabet" consisting of the indices $1,\dots,S$ for the servers and $G$ dividers (which I will denote here as "|"). In the R code, I use NA for the dividers. A "chromosome" is an index vector that permutes the alphabet. Without loss of generality, we can assume that the first server goes in the first group and that the last element of the permuted list is a divider (closing group $G$), so we permute only the intervening elements of the alphabet. For instance, if $S=5$ and $G=3$, the alphabet is $1,2,3,4,5,|,|,|$ and a chromosome $(2, 7, 4, 6, 5, 3)$ would translate to the list $1, 2, |, 4, |, 5, 3, |$. (Each element of the chromosome is the index of an element in the alphabet.) I would interpret that as group 1 containing servers 1 and 2, group 2 containing server 4, and group 3 containing servers 3 and 5.
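The decoding step is easy to sketch. Here it is in Python rather than the R of the actual code (the helper name is mine), reproducing the example above:

```python
def decode(chrom, S, G):
    """Decode a chromosome into a list of server groups.  The alphabet is the
    servers 1..S followed by G dividers (None here, NA in the R code).  Server
    1 is fixed in front and one divider is fixed at the end; `chrom` holds
    1-based indices into the alphabet for the intervening positions."""
    alphabet = list(range(1, S + 1)) + [None] * G
    seq = [1] + [alphabet[i - 1] for i in chrom] + [None]
    groups, current = [], []
    for item in seq:
        if item is None:  # a divider closes the current group
            groups.append(current)
            current = []
        else:
            current.append(item)
    return groups

print(decode((2, 7, 4, 6, 5, 3), S=5, G=3))  # [[1, 2], [4], [5, 3]]
```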

There is no guarantee that a random chromosome produces a server grouping that contains at least 1 and at most $M$ servers in every group, so we post-process it by going group by group and adjusting the dividers by the minimum amount necessary to make the current group legal. Once we have the servers grouped, we assign users by brute force and compute an overall fitness of the solution.
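The repair step in my code adjusts the dividers themselves; the Python sketch below is a simplified stand-in for that rule (not the exact logic), recomputing group sizes so that each group gets between 1 and $M$ servers while leaving enough servers for the groups that follow. It assumes a feasible instance ($G \le S \le MG$).

```python
def repair(groups, M):
    """Rebuild group sizes so every group has between 1 and M servers,
    keeping the servers in their decoded order.  A simplified stand-in
    for the divider-adjustment rule described above."""
    flat = [s for g in groups for s in g]   # servers in decoded order
    sizes = [len(g) for g in groups]
    G, S = len(sizes), len(flat)
    fixed, remaining = [], S
    for k in range(G):
        groups_left = G - k - 1
        size = min(max(sizes[k], 1), M)           # clamp to [1, M]
        size = max(size, remaining - M * groups_left)  # don't strand servers
        size = min(size, remaining - groups_left)      # leave 1 per later group
        fixed.append(size)
        remaining -= size
    out, i = [], 0
    for size in fixed:
        out.append(flat[i:i + size])
        i += size
    return out

print(repair([[1, 2, 3, 4], [], [5]], M=3))  # [[1, 2, 3], [4], [5]]
```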

I am deliberately leaving out some (OK, many) gory details here. My R code and a test problem are contained in an R notebook that you are welcome to peruse. When I rerun the GA on the same test problem, I get different results, which is not surprising since the GA is a heuristic and is most definitely not guaranteed to produce an optimal solution. Whether the solution is "good enough", and whether it scales to the size problem the original poster has in mind, are open questions.

Wednesday, September 30, 2020

A Greedy Heuristic Wins

A problem posted on OR Stack Exchange starts as follows: "I need to find two distinct values to allocate, and how to allocate them in a network of stores." There are $n$ stores (where, according to the poster, $n$ can be close to 1,000). The two values (let's call them $x_1$ and $x_2$) must be integer, with $x_1 \in \lbrace 1, \dots, k_1 \rbrace$ and $x_2 \in \lbrace k_1, \dots, k_2 \rbrace$ for given parameters $k_1 < k_2$. There is also a set of parameters $s_{i3}$ and a balance constraint saying $$0.95 g(k_1 e) \le g(y) \le 1.05 g(k_1 e)$$ where $$g(y) = \sum_{i=1}^n \frac{s_{i3}}{y_i}$$ for any allocation $y$ (with $y_i \in \lbrace x_1, x_2 \rbrace$ the value allocated to store $i$) and $e = (1,\dots, 1).$

The cost function (to be minimized) has the form $$f(x_1, x_2) = a\sum_{i=1}^n \left[ s_{i1}\cdot \left( \frac{s_{i2}}{y_i} \right)^b \right]$$ with $a$, $s_{i1}$, $s_{i2}$ and $b$ all parameters and $y_i \in \lbrace x_1, x_2 \rbrace$ the allocation to store $i$. There are two things to note about $f$. First, the leading coefficient $a (> 0)$ can be ignored when looking for an optimum. Second, given choices $x_1$ and $x_2>x_1$, the cheaper choice at all stores will be $x_1$ if $b < 0$ and $x_2$ if $b > 0$.
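Both observations are easy to check numerically. A short Python sketch with made-up parameter values:

```python
def cost(y, s1, s2, a=1.0, b=-1.0):
    """f for an allocation vector y; the leading coefficient a scales every
    term equally, so it can be ignored when comparing allocations."""
    return a * sum(s1i * (s2i / yi) ** b for s1i, s2i, yi in zip(s1, s2, y))

s1, s2 = [1.0, 2.0], [3.0, 4.0]
# With b < 0, the smaller value is cheaper at every store...
print(cost([2, 2], s1, s2, b=-1.0) < cost([5, 5], s1, s2, b=-1.0))  # True
# ...and with b > 0, the larger value is cheaper.
print(cost([5, 5], s1, s2, b=2.0) < cost([2, 2], s1, s2, b=2.0))  # True
```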

It's possible that a nonlinear solver might handle this, but I jumped straight to metaheuristics and, in particular, my go-to choice among metaheuristics -- a genetic algorithm. Originally, genetic algorithms were intended for unconstrained problems, and were tricky to use with constrained problems. (You could bake a penalty for constraint violations into the fitness function, or just reject offspring that violated any constraints, but neither of those approaches was entirely satisfactory.) Then came a breakthrough, the random key genetic algorithm [1]. A random key GA uses a numeric vector $v$ (perhaps integer, perhaps byte, perhaps double precision) as the "chromosome". The user is required to supply a function that translates any such chromosome into a feasible solution to the original problem.

I did some experiments in R, using the GA package to implement a random key genetic algorithm. The package requires all "genes" (think "variables") to be the same type, so I used a double-precision vector of dimension $n+2$ for chromosomes. The last two genes have domains $(1, k_1 + 1)$ and $(k_1, k_2 + 1)$; the rest have domain $(0, 1)$. Decoding a chromosome $v$ proceeds as follows. First, $x_1 = \left\lfloor v_{n+1}\right\rfloor $ and $x_2 = \left\lfloor v_{n+2}\right\rfloor $, where $\left\lfloor z \right\rfloor$ denotes the "floor" of $z$ (the largest integer not exceeding $z$). The remaining values $v_1, \dots, v_{n}$ are sorted into ascending order, and their sort order is applied to the stores. So, for instance, if $v_7$ is the smallest of those genes and $v_{36}$ is the largest, then store $7$ will be first in the sorted list of stores and store $36$ will be last. (The significance of this sorting will come out in a second.)
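This part of the decoding might look like the following in Python (the real implementation is in the R notebook; the names here are mine):

```python
import math

def decode_keys(v, n, k1, k2):
    """Decode a random-key chromosome of length n+2: the last two genes give
    x1 and x2 via floors, and the first n genes give a store ordering."""
    x1 = math.floor(v[n])       # gene n+1, domain (1, k1+1)
    x2 = math.floor(v[n + 1])   # gene n+2, domain (k1, k2+1)
    order = sorted(range(n), key=lambda i: v[i])  # stores in ascending key order
    return x1, x2, order

x1, x2, order = decode_keys([0.9, 0.1, 0.5, 2.7, 6.3], n=3, k1=3, k2=8)
print(x1, x2, order)  # 2 6 [1, 2, 0]
```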

Armed with this, my decoder initially assigns every store the cheaper choice between $x_1$ and $x_2$ and computes the value of $g()$. If $g()$ does not fall within the given limits, the decoder runs through the stores in their sorted order, switching the allocation to the more expensive choice and updating $g()$, until $g()$ meets the balance constraint. As soon as it does, we have the decoded solution. This cheats a little on the supposed guarantee of feasibility in a decoded solution, since there is a (small?) (nearly zero?) chance that the decoding process will fail with $g()$ jumping from below the lower bound to above the upper bound (or vice versa) after some swap. If it does, my code discards the solution. This did not seem to happen in my testing.
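The switching loop, including the rare failure case, can be sketched as follows (again a Python illustration of the idea, not the actual R code):

```python
def allocate(order, x_cheap, x_dear, s3, g_lo, g_hi):
    """Decoder step: give every store the cheaper allocation, then switch
    stores to the more expensive one in chromosome order until the balance
    function g() lands in [g_lo, g_hi].  Returns None if a switch jumps
    past the window (the rare failure case noted above)."""
    n = len(s3)
    y = [x_cheap] * n
    g = sum(s3[i] / x_cheap for i in range(n))
    for i in order:
        if g_lo <= g <= g_hi:
            return y
        g += s3[i] / x_dear - s3[i] / x_cheap  # incremental update of g()
        y[i] = x_dear
    return y if g_lo <= g <= g_hi else None

print(allocate([0, 1, 2], 1, 2, [1.0, 2.0, 3.0], 2.5, 3.5))  # [2, 2, 2]
```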

The GA seemed to work okay, but it occurred to me that I might be over-engineering the solution a bit. (This would not be the first time I did that.) So I also tried a simple greedy heuristic. Since $k_1$ and $k_2$ seem likely to be relatively small in the original poster's problem (whereas $n$ is not), my greedy heuristic loops through all valid combinations of $x_1$ and $x_2$. For each combination, it sets $v_1$ equal to the cheaper choice and $v_2$ equal to the more expensive choice, assigns the cheaper quantity $v_1$ to every store and computes $g()$. It also computes, for each store, the ratio \[ \frac{|\frac{s_{i3}}{v_{2}}-\frac{s_{i3}}{v_{1}}|}{s_{i1}\left(\left(\frac{s_{i2}}{v_{2}}\right)^{b}-\left(\frac{s_{i2}}{v_{1}}\right)^{b}\right)} \] in which the numerator is the absolute change in balance at store $i$ when switching from the cheaper allocation $v_1$ to the more expensive allocation $v_2$, and the denominator is the corresponding change in cost. The heuristic uses these ratios to select stores in descending "bang for the buck" order, switching each store to the more expensive allocation until the balance constraint is met.
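For a fixed pair $(v_1, v_2)$, one greedy pass might look like this (a Python sketch with my own names; the actual code is in the R notebook):

```python
def greedy(v1, v2, s1, s2, s3, g_lo, g_hi, b):
    """One greedy pass for fixed (v1, v2): start every store at the cheaper
    v1, then switch stores to v2 in descending ratio of |balance change| to
    cost change until the balance constraint holds."""
    n = len(s3)

    def ratio(i):
        num = abs(s3[i] / v2 - s3[i] / v1)
        den = s1[i] * ((s2[i] / v2) ** b - (s2[i] / v1) ** b)
        return num / den

    y = [v1] * n
    g = sum(s3[i] / v1 for i in range(n))
    for i in sorted(range(n), key=ratio, reverse=True):
        if g_lo <= g <= g_hi:
            break
        g += s3[i] / v2 - s3[i] / v1
        y[i] = v2
    return y, g

y, g = greedy(1, 2, [1.0] * 3, [1.0] * 3, [1.0] * 3, 1.6, 2.1, b=-1.0)
print(y, g)  # [2, 2, 1] 2.0
```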


Both the GA decoder and the greedy heuristic share the approach of initially allocating every store the cheaper choice and then switching stores to the more expensive choice until balance is attained. My R notebook generates a random problem instance with $n=1,000$ and then solves it twice, first with the GA and then with the greedy heuristic. The greedy heuristic stops when all combinations of $x_1$ and $x_2$ have been tried. Stopping criteria for the GA are more arbitrary. I limited it to at most 1,000 generations (with a population of size 100) or 20 consecutive generations with no improvement, whichever came first.

The results on a typical instance were as follows. The GA ran for 49 seconds and got a solution with cost 1065.945. The greedy heuristic needed only 0.176 seconds to get a solution with cost 1051.735. This pattern (greedy heuristic getting a better solution in much less time) repeated across a range of random number seeds and input parameters, including switching between positive and negative values of $b$.


If you are interested, you can browse my R notebook (which includes both code and results).

[1] Bean, J. C. (1994). Genetic Algorithms and Random Keys for Sequencing and Optimization. ORSA Journal on Computing, 6, 154-160.

Tuesday, January 7, 2020

Greedy Methods Can Be Exact

We generally sort optimization algorithms (as opposed to models) into two or three categories, based on how certain we are that solutions will be either optimal or at least "good". An answer by Michael Feldmeier to a question I posted on OR Stack Exchange neatly summarizes the categories:
  • exact methods eventually cough up provably optimal solutions;
  • approximate methods eventually cough up solutions with some (meaningful) guarantee regarding how far from optimal they might be; and
  • heuristics provide no worst-case guarantees (but generally are either easy to implement, fast to execute or both).
I should explain my use of "meaningful" (which is not part of Michael's answer). A common way to estimate the "gap" between a solution and the optimum is to take $|z - \tilde{z}|/|z|$, where $z$ is the objective value of the solution produced by the algorithm and $\tilde{z}$ is some bound (lower bound in a minimization, upper bound in a maximization) on the optimal objective value. Now suppose that we are minimizing a function known to be nonnegative. If we set $\tilde{z}=0$, we know that any method, no matter how stupid, will have a gap no worse than 100%. To me, that is not a meaningful guarantee. So I'll leave the definition of "meaningful" to the reader.

What brings all this to mind is a question posted on Mathematics Stack Exchange. The author of the question was trying to solve a nonlinear integer program. He approached it by applying a "greedy algorithm". Greedy algorithms are generally assumed to be heuristics, since it seldom is possible to provide useful guarantees on performance. In his case, though, the greedy algorithm is provably optimal, mainly due to the objective function being concave and separable. I'll state the problem and show a proof of optimality below (changing the original notation a bit). Brace yourself: the proof is a bit long-winded.

You start with $N$ workers to be assigned to $M$ work stations. The output of workstation $m$, as a function of the number of workers $x$ assigned to it, is given by $$f_{m}(x)=a_{m}x+b_{m}-\frac{c_{m}}{x},$$
where $a_{m},b_{m},c_{m}$ are all positive constants. Since $f_{m}(0)=-\infty$, we can assume that each work station gets at least one worker (and, consequently, that $N>M$). Since $f_{m}'(x)=a_{m}+c_{m}/x^{2}>0$, each $f_{m}()$ is monotonically increasing. Thus, we can safely assume that all $N$ workers will be assigned somewhere. Also, $f_{m}''(x)=-2c_{m}/x^{3}<0$, so $f_{m}()$ is strictly concave (which we will need later). We also note, for future reference, that the impact of adding one worker to a current staff of $x$ at station $m$ is $$\Delta f_{m}(x)=a_{m}+\frac{c_{m}}{x(x+1)}>0.$$
Similarly, the impact of removing one worker at station $m$ is $$\delta f_{m}(x)=-a_{m}-\frac{c_{m}}{x(x-1)}<0.$$We see that $\delta f_{m}(x)$ is an increasing function of $x$ (i.e., it gets less negative as $x$ gets bigger). We also note that $\Delta f_{m}(x)=-\delta f_{m}(x+1)$.

The IP model is easy to state. Let $x_{m}$ be the number of workers assigned to work station $m$. The model is
$$\max\sum_{m=1}^{M}f_{m}(x_{m})$$
subject to
$$\sum_{m=1}^{M}x_{m}\le N$$
with
$$x\in\mathbb{Z}_{+}^{M}.$$

The greedy algorithm starts with a single worker at each station ($x=(1,\dots,1)$) and, at each step, adds one worker to the workstation where that worker produces the greatest increase in objective value (breaking ties arbitrarily). It stops when all $N$ workers are assigned. To prove that it actually finds an optimal solution, I'll use proof by contradiction.
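The greedy algorithm is easy to implement efficiently with a priority queue. A Python sketch (my own illustration; nothing in the original question included code):

```python
import heapq

def greedy_assign(N, a, c):
    """Greedy worker assignment: start with one worker per station, then
    repeatedly add a worker at the station with the largest marginal gain
    Delta f_m(x) = a_m + c_m/(x(x+1)).  (The b_m terms cancel out of the
    marginal gains, so they are not needed.)"""
    M = len(a)
    x = [1] * M
    # max-heap via negated gains; exactly one entry per station at all times
    heap = [(-(a[m] + c[m] / 2.0), m) for m in range(M)]
    heapq.heapify(heap)
    for _ in range(N - M):
        _, m = heapq.heappop(heap)
        x[m] += 1
        heapq.heappush(heap, (-(a[m] + c[m] / (x[m] * (x[m] + 1))), m))
    return x

print(greedy_assign(5, a=[1.0, 1.0], c=[1.0, 4.0]))  # [2, 3]
```

Because each $\Delta f_m$ is decreasing in $x$, re-pushing the updated gain after each assignment keeps the heap current, and the whole run costs $O((N-M)\log M)$.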

Let $x^{(0)},x^{(1)},\dots,x^{(N-M)}$ be the sequence of solutions constructed by the greedy algorithm, with $x^{(0)}=(1,\dots,1)$, and let $x^{(k)}$ be the last solution in the sequence for which an optimal solution $x^{*}$ exists such that $x^{(k)}\le x^{*}$. The significance of the inequality is that if $x\le x^{*}$, it is possible to extend the partial solution $x$ to the optimal solution $x^{*}$ by adding unassigned workers to work stations. We know that $k$ is well defined because $x^{(0)}\le x^{*}$ for any optimal $x^{*}$. Since we are assuming that the greedy algorithm does not find an optimum, it must be that $k<N-M$.

Now identify the work station $j$ to which the greedy algorithm added a worker at step $k$, meaning that $x_{j}^{(k+1)}=x_{j}^{(k)}+1$ and $x_{i}^{(k+1)}=x_{i}^{(k)}$ for $i\neq j$. Since, by assumption, $x^{(k)}\le x^{*}$ but $x^{(k+1)}\not\le x^{*}$, it must be that $x_{j}^{(k)}=x_{j}^{*}$.

Next, since $x^{(k)}\le x^{*}$ and $x^{(k)}\neq x^{*}$ (else $x^{(k)}$ would be optimal), there is some work station $h\neq j$ such that $x_{h}^{(k)}<x_{h}^{*}$. Let $\tilde{x}$ be the solution obtained from $x^{(k)}$ by adding a worker to station $h$: $\tilde{x}_{h}=x_{h}^{(k)}+1$ and $\tilde{x}_{i}=x_{i}^{(k)}$ for $i\neq h$. Observe that $\tilde{x}\le x^{*}$. The greedy algorithm chose work station $j$ over work station $h$ at $x^{(k)}$, so it must be that
$$\Delta f_{j}(x_{j}^{(k)})\ge\Delta f_{h}(x_{h}^{(k)}). \quad (1)$$

Finally, let $\hat{x}$ be the result of starting from optimal solution $x^{*}$ and shifting one worker from station $h$ to station $j$. Since
$$x_{j}^{(k+1)}=x_{j}^{(k)}+1=x_{j}^{*}+1=\hat{x}_{j},$$ $$x_{h}^{(k+1)}=x_{h}^{(k)}<x_{h}^{*}\implies x_{h}^{(k+1)}\le\hat{x}_{h}$$
and
$$x_{i}^{(k+1)}=x_{i}^{(k)}\le x_{i}^{*}=\hat{x}_{i}\,\forall i\notin\{h,j\},$$
we have $x^{(k+1)}\le\hat{x}$. Under the assumption that $x^{(k)}$ was the last solution in the greedy sequence that could be extended to an optimal solution, it must be that $\hat{x}$ is not optimal. Thus the net change to the objective function at $x^{*}$ when shifting one worker from station $h$ to station $j$ must be negative, i.e.,
$$\Delta f_{j}(x_{j}^{*})+\delta f_{h}(x_{h}^{*})<0.\quad (2)$$

We showed previously that, under our assumptions, $x_{j}^{(k)}=x_{j}^{*}$, from which it follows that
$$\Delta f_{j}(x_{j}^{*})=\Delta f_{j}(x_{j}^{(k)}). \quad (3)$$
We also showed that $\delta f_{h}()$ is an increasing function. Since $\tilde{x}_{h}\le x_{h}^{*}$,
$$\delta f_{h}(x_{h}^{*})\ge\delta f_{h}(\tilde{x}_{h})=-\Delta f_{h}(x_{h}^{(k)}). \quad (4)$$
Combining (4) with (2), we have
$$\Delta f_{j}(x_{j}^{*})-\Delta f_{h}(x_{h}^{(k)})<0,$$
i.e.,
$$\Delta f_{j}(x_{j}^{*})<\Delta f_{h}(x_{h}^{(k)}). \quad (5)$$

Combining (3) with (5) yields
$$\Delta f_{j}(x_{j}^{(k)})<\Delta f_{h}(x_{h}^{(k)})$$
which contradicts (1).