Tuesday, December 28, 2021

Revisiting a Joint Clustering Problem

In a previous post, I described a genetic algorithm for a problem posted on OR Stack Exchange. The problem involved grouping users and servers so as to maximize the total quality of service to all users. Every user in a cluster is served by every server in the cluster, but servers in other clusters interfere to some degree with the user's service.

The person who posted that problem recently posted another variation on OR SE. The new version of the problem is nearly, but not quite, identical to the previous version. "Servers" are now termed "transmitters", but the problem remains clustering transmitters and users, with limits on how many transmitters (but not how many users) belong to each cluster, and with the same objective function. The new version may be slightly easier thanks to one new restriction.

The user must belong to a cluster that has the transmitter providing the highest weight for the given user.

Thus the assignment of a transmitter to a cluster immediately implies the assignment to that cluster of all users for whom the specified transmitter has highest weight. (I will assume that ties for highest weight either do not occur or are broken whimsically.)

I put together two different mixed-integer programming (MIP) models for the problem. Both are rather complicated, in part because the objective function needs to be linearized. They are too big to squeeze into a blog post, so I put them in a PDF file. The models differ in the use of binary variables to assign transmitters. In the first model, a binary variable $x_{t,t'}$ is 1 if transmitters $t$ and $t'$ belong to the same cluster. In the second, a binary variable $x_{t,c}$ is 1 if transmitter $t$ belongs to cluster $c$. I coded both models in Java and ran them (using CPLEX as the MIP solver) on a few test problems with dimensions similar to what the original post contained (20 transmitters, 60 users). The first model was consistently smaller in size than the second and produced tighter bounds within whatever time limit I gave CPLEX.

The OR SE question also mentioned heuristics, so I coded up a couple of reasonably obvious ones. The first is a greedy construction heuristic. It initially puts each transmitter in a separate cluster, then repeatedly merges the pair of clusters that either produces the best improvement in the overall objective function. If no further improvement is possible and there are still more clusters than the number allowed, it merges whichever pair does the least damage to the objective. In all cases, mergers occur only if the merged cluster is within the allowed size limit.

The second heuristic is a pairwise swap heuristic. Given the (presumably feasible) solution from the first heuristic, it considers all pairs of transmitters belong to different clusters and, for each pair, evaluates the objective impact of swapping the two transmitters and performs the swap if it is beneficial. Note that swapping preserves the sizes of both clusters, so there is no question whether it is feasible. The heuristic keeps looping through pairs of transmitters until it is unable to find a useful swap.

In my experiments, I used the best solution from the second heuristic to "hot start" the first MIP model. (I ran into some difficulties getting the second MIP model to accept it as a starting solution, and frankly was not motivated enough to bother investigating the issue.) The results from one computational test (20 transmitters, 60 users, maximum cluster size 5) are fairly representative of what I saw in other tests. The construction heuristic got a solution with objective value 26.98 in 0.1 seconds on my desktop PC. The improvement heuristic ran for about 0.25 seconds and improved the objective value to 29.05. Running the first (better) MIP model using CPLEX 20.1 with a 10 minute time limit, starting from that solution, CPLEX improved the objective value to 29.62 with a final upper bound around 31.18. So the heuristic solution, which was obtained very quickly, was within 7% of optimal based on the bound and within 2% of the last solution CPLEX got within the 10 minute time limit.

Two caveats apply here. First, I did a minimal amount of testing. Second, and possibly more important, I randomly generated the transmitter-user weights using a uniform distribution. Having weights that follow a different distribution might affect both heuristic and model performance.

The Java code for both heuristics and both MIP models can be downloaded from my repository. It requires a recent version of Java (14 or later) since it uses the Record class, and it of course requires CPLEX (unless you want to delete that part and just play with the heuristics), but has no other dependencies.

Tuesday, December 21, 2021

Installing Rcplex in RStudio

This is an update to a previous post ("Installing Rcplex and cplexAPI"). Rcplex is a interface from R to the CPLEX optimization solver. A recent release (it's now at version 0.3-5) has made installation and updating much easier (no hacking of installer code required), provided you configure your environment correctly.

I use the RStudio IDE for almost everything related to R, including updating packages. A recent list of packages waiting to be updated included Rcplex 0.3-4, but when I told RStudio to update them all it balked at Rcplex. The workaround is actually quite simple. What follows is how I got to work in Linux Mint, but the same steps should work with any Linux distribution and I assume can be modified to work on Windows or Mac OS.

1. Find where the CPLEX binary lives. In my case, the path was "/home/paul/<stuff>/CPLEX_Studio201/cplex/bin/x86-64_linux/cplex".
2. In a terminal, create an environment variable CPLEX_BIN containing that path (unless you already have one). This can be done using the export command, as in "export CPLEX_BIN=/home/paul/<stuff>/CPLEX_Studio201/cplex/bin/x86-64_linux/cplex". Note that what I am exporting includes the name of the executable (that last "cplex"), not just the path to it. (I goofed on my first try by just specifying the path.)
3. Now invoke RStudio from the terminal and do the package update (or installation if you are getting it for the first time) as normal.

If you are not an RStudio user, you can install or update the package manually from a terminal by issuing the same export command followed by "R CMD INSTALL <path to download>/Rcplex_0.3-5.tar.gz" (updating the version number as necessary).

This was much less painful than previous installations. Thanks to the developers for maintaining the package and updating the installation scripts!

Thursday, December 9, 2021

Normalization Constraints

There are occasions where one might consider adding a constraint to an optimization problem specifying that $\|x\| \sim \nu,$ where $x$ is the solution vector, $\nu$ is a positive constant (usually 1, but sometimes something else), the norm is whichever norm (1, 2, $\infty$) suits the modeler's fancy, and the relation $\sim$ is whichever of $\le,$ $=$ or $\ge$ fits the modeler's needs. I can think of two somewhat common reasons for a constraint like this.

The first is a situation where $x$ is free to scale up or down and the modeler is trying to avoid having the solver choose ridiculously large or ridiculously small values for $x.$ Commonly, this is to prevent the solver from exploiting rounding error. As an example, consider the two group discriminant problem with a linear discriminant function, in which your observations have dimension $k$ and you have two samples, one of observations that should receive a positive discriminant score and one of observations that should receive a negative discriminant score. Let $P$ be an $m_P \times k$ matrix containing the first sample and $N$ an $m_N \times k$ matrix containing the second sample. Your goal is to find a coefficient vector $x\in \mathbb{R}^k$ and scalar $x_0\in \mathbb{R}$ that maximize the number of indices $i$ for which $\sum_j P_{i,j} x_j + x_0 \ge \delta$ plus the number of indices $i$ for which $\sum_j N_{i,j} x_j + x_0 \le -\delta,$ where $\delta > 0$ is a cutoff for how small a discriminant score can be and still be counted as nonzero. There is an obvious formulation as a mixed integer program using binary indicator variables for correct v. incorrect scores.

From a "pure mathematics" perspective, scaling the coefficients of the discriminant function (including $x_0$) up or down by a constant positive factor does not change whether an observation gets a positive, negative or zero score, but unfortunately the computer's finite precision arithmetic drags us out of the realm of "pure mathematics". Absent some sort of normalization, the solver might scale the coefficients way up to exploit some lucky rounding error (where an observation in the positive sample whose score should be zero or even negative gets a tiny positive score courtesy of rounding error, which becomes big enough to clear the $\delta$ threshold after the solution is scaled up). A normalization constraint is one way to avoid this, but I suspect a better way is just to impose bounds on the variables, which avoids any nonlinearities introduced by the norm function. Note, though, that when the domain of the variables contains both positive and negative values (true in the discriminant example, where coefficients can have either sign), bounds will not stop the solver from scaling the solution downward. If the solver wants to make the solution artificially small (for instance, if there is a penalty for scores that are incorrect by more than a particular amount), bounds will not prevent it; you will need a norm constraint.

The other situation I have encountered is one where zero is a feasible solution that is perhaps attractive from the perspective of the objective function but needs to be eliminated. An example of this recently popped up on OR Stack Exchange. Given a matrix $W,$ the author wants to find a vector $x\in\mathbb{R}^n$ that maximizes the number of nonnegative components of $Wx.$ The author turns this into a MIP model using binary indicator variables for whether a component is nonnegative or not. The catch here is that $x=0\implies Wx=0,$ so the zero solution trivially maximizes the objective (all components nonnegative). The author rules this out by adding the normalization constraint $\| x \|_1=n.$ The one-norm will introduce binary variables, either directly or indirectly (meaning they might be added internally by the solver).

For this situation, there is an alternative way to rule out zero. Suppose that we draw an observation $r\in\mathbb{R}^n$ from a continuous distribution over a set of full dimension. For simplicity, I like the uniform distribution, so I might take $r\sim U([-1, 1]^n).$ I would normalize the vector (changing $r$ to $r/\|r\|_2$) just to avoid the possibility that it is too close to a zero vector (or, if using a larger domain, too big). In lieu of the norm constraint, I would add the linear constraint $r^\prime x = 1.$ Clearly this eliminates $x=0,$ but does it eliminate other values of $x?$ Most likely not. If $r^\prime x \neq 0$ and if $x$ is free to scale, than the solver can scale an otherwise optimal $x$ to satisfy $r^\prime x = 1.$ Because $r$ is drawn from a continuous distribution over a full-dimensional domain, $\Pr(r^\prime x = 0)=0$ for any nonzero $x.$ So the only real concern would be if the optimal $x$ had a (nonzero) inner product with $r$ small enough that attempting to scale it to meet the normalization constraint would violate some bounds on the components of $x.$ That is a risk I'm usually willing to take.

Here is the punch line (and the motivation for this post). I suggested using the random product in lieu of the norm constraint in an answer I posted on OR SE. The author of the question commented that he had tried it, and it "improves performance by at least an order of magnitude". Presumably this is be eliminating the (explicit or implicit) binary variables in the computation of the 1-norm, along with an auxiliary variables and constraints used to implement the norm restriction.

I tested a genetic algorithm versus the author's MIP model (modified to replace the norm constraint with a random normalization) using CPLEX. If anyone is curious how the GA does, you can play with my R code (which, as usual, requires one library for the GA and a gaggle of libraries to use CPLEX).