Saturday, August 29, 2020

A Group Selection Problem

Someone posted an interesting question about nonlinear integer programming with grouped binary variables on Stack Overflow, and it drew multiple responses. The problem is simple to state. You have 52 binary variables $x_i$ partitioned into 13 groups of four each, with a requirement that exactly one variable in each group take the value 1. So the constraints are quite simple:

\begin{align*} x_{1}+\dots+x_{4} & =1\\ x_{5}+\dots+x_{8} & =1\\ \vdots\\ x_{49}+\dots+x_{52} & =1. \end{align*}

The objective function is a cubic function of the form

\[ \left(\alpha\sum_{i}a_{i}x_{i}\right)\times\left(\beta\sum_{j}b_{j}x_{j}+\beta_{0}\right)\times\left(\gamma\sum_{k}c_{k}x_{k}+\gamma_{0}\right) \] where $\alpha = 1166/2000$, $\beta = 1/2100$, $\beta_0 = 0.05$, $\gamma = 1/1500$ and $\gamma_0 = 1.5$. (In the original post, there is a minus sign in front of the function and the author minimizes; for various reasons I am omitting the minus sign and maximizing here.) Not only is the objective nonlinear, it is nonconvex if minimizing (nonconcave if maximizing). The author of the question was working in R.

Fellow blogger Erwin Kalvelagen solved the problem with a variety of nonlinear optimizers, obtaining a solution with objective value -889.346. Alex Fleischer of IBM posted an answer with the same objective value, using a constraint programming model written in OPL and solved with CP Optimizer.

My initial thought was to linearize the objective function by introducing continuous variables $y_{ij} = x_i \cdot x_j$ and $z_{ijk} = x_i \cdot x_j \cdot x_k$ with domain [0,1]. Many of those variables can be eliminated, due in part to symmetry ($y_{ij} = y_{ji}$, $z_{ijk} = z_{ikj}=\dots=z_{kji}$ and in part due to the observation that $y_{ii}=z_{iii}=x_i$. Also useful is that for $i<j<k$ $z_{ijk}=x_i \cdot y_{jk}$. I have an R notebook that you can download, in which I build the model using standard linearizations for the product of two binarys, then try to solve it with CPLEX using the Rcplex package (and the Matrix package, which allows a sparse representation of the constraint matrix). The results were, shall we say, unspectacular. With a five minute time limit (much longer than what Erwin or Alex needed), CPLEX found an incumbent with value 886.8748 (not bad but not optimal) and a rather dismal optimality gap of 146.5% (due mainly to a loose and slow moving bound).

Out of curiosity, I took a second shot using a genetic algorithm and the GA package for R. I was geeked to see that the GA package includes both an island model (using parallel processing) and a permutation variant (which lets me use permutations of the indices 1 to 52 as chromosomes with no extra work on my part). The permutation approach allows me to treat a chromosome as a prioritization of the 52 binary variables, which I decode into a solution $x$ by scanning the $x_i$ in priority order and setting each to 1 if and only none of the other variables in its group of four has been set to 1. That R notebook is also available for download.

As a metaheuristic, the GA does not offer a proof of optimality, and in fact may or may not find the optimal solution. With my inspired choice of random number seed (123), I matched Erwin's and Alex's solution (889.3463). The settings I used resulted in a run time of about 36 seconds on my PC, more than half of which was spent after the best solution had been found. It's still slower than what Erwin and Alex achieved, but it is a "pure R" solution, meaning it requires nothing besides open-source R packages.

Sunday, August 23, 2020

Multiobjective Optimization in CPLEX

In my previous post, I discussed multiobjective optimization and ended with a simple example. I'll use this post to discuss some of the new (as of version 12.9) features in CPLEX related to multiobjective optimization, and then apply them to the example from the previous post. My Java code can be downloaded from my GitLab repository.

Currently (meaning as of CPLEX version 12.10), CPLEX supports multiple objectives in linear and integer programs. It allows mixtures of "blended" objective functions (weighted combinations of original criteria) and "lexicographic" hierarchical objectives. Basically, you set one or more hierarchy (priority) levels, and in each one you can have a single criterion or a weighted combination of criteria. So the "classical" preemptive priority approach would involve multiple priority levels with a single criterion in each, while the "classical" weighted combination approach would involve one priority level with a blended objective in it. Overall, you are either maximizing or minimizing, but you can use negative weights for criteria that should go the opposite direction of the rest. In the example here, which is a minimization problem, the third priority level gives maximum provider utilization a weight of +1 (because we want to minimize it) and minimum provider utilization a weight of -1 (because we want to maximize it).

There are some limitations to the use of multiple objectives. The ones I think are of most general interest are the following:

  • objectives and constraints must be linear (no quadratic terms); and
  • all generic callbacks and legacy information callbacks can be used, but other legacy callbacks, and in particular legacy control callbacks (branch callbacks, cut callbacks etc.) cannot be used. So if you need to use callbacks with a multiobjective problem, now would be a good time to learn the new generic callback system.

Every criterion you specify has a priority level and, within that priority level, a weight. A feature that I appreciate, and which I will use in the code, is that you can also specify an absolute and/or a relative tolerance for each criterion. The tolerances tell CPLEX how much it can sacrifice in that criterion to improve lower priority criteria. The default tolerance is zero, meaning higher priority criteria must be optimized before lower priority criteria are even considered. A nonzero tolerance basically tells CPLEX that is allowed to sacrifice some amount (either an absolute amount or a percentage of the optimal value) in that criterion in order to improve lower priority criteria.

Defining the variables and building the constraints of a multiobjective model is no different from a typical single criterion model. Getting the solution after solving the model is also unchanged. The differences come mainly in how you specify the objectives and how you invoke the solver.

To build the objective function, you need to use one of the several overloads of IloCplex.staticLex(). They all take as first argument a one dimensional array of expressions IloNumExpr[], and they all return an instance of the new interface IloCplexMultiCriterionExpr. In addition to an array of objective expressions, one of the overloads lets you also specify arrays of weights, priorities and tolerances (absolute and relative). That's the version used in my sample code. 

This brings me to a minor inconvenience relative to a conventional single objective problem. Ordinarily, I would use IloCplexModeler.addMinimize(expr) or IloCplexModeler.addMaximize(expr) to add an objective to a model, where expr is an instance of IloNumExpr. I naively thought to do the same here, using the output of staticLex() as the expression, but that is not (currently) supported. There's no overload of addMinimize() or addMaximize() that accepts a multicriterion expression. So it's a three step process: use cplex.staticLex(...) to create the objective and save it to a temporary variable (where cplex is your IloCplex instance); pass that variable to either cplex.minimize(...) or cplex.maximize(...) and save the resulting instance of IloObjective in a temporary variable; and then invoke cplex.add(...) on that variable.

When you are ready to solve the model, you invoke the solve() method on it. You can continue to use the version of solve() that takes no arguments (which is what my code does), or you can use a new version that takes as argument an array of type IloCplex.ParameterSet[]. This allows you to specify different parameter settings for different priority levels.

Other methods you might be interested in are IloCplex.getMultiObjNSolves() (which gets the number of subproblems solved) and IloCplex.getMultiObjInfo() (which lets you look up a variety of things that I really have not explored yet).

The output from my code (log file), which is in the repository, is too lengthy to show here, but if you want you can use this link to open it in a new tab. Here's a synopsis. I first optimized each of the three objective functions separately. (Recall that maximum and minimum provider utilization are blended into one objective.) This gives what is sometimes referred to as the "Utopia point" or "ideal point". This is column "U" in the table below. Next, I solved the prioritized multiobjective problem. The results are in column "O" of the table. Finally, to demonstrate the ability to be flexible with priorities, I resolved the multiobjective problem using a relative tolerance of 0.1 (10%) for the top priority objective (average distance traveled) and 0.05 (5%) for the second priority objective (maximum distance traveled). Those results are in column "F".

Avg. distance 14.489 14.489 15.888
Max distance 58.605 58.605 60.000
Utilization spread 0.030 0.267 0.030
Max utilization 0.710 0.880 0.710
Min utilization 0.680 0.613 0.680

There are a few things to note.

  1. The solution to the multiobjective model ("O") achieved the ideal values for the first two objectives. One would expect to match the ideal value on the highest priority objective; matching on the second objective was luck. The third objective (utilization spread) was, not surprisingly, somewhat worse than the ideal value.
  2. Absolute and relative tolerances appear to work the same way that absolute and relative gap tolerances do: if a solution is within either absolute or relative tolerance of the best possible value on a higher priority objective, it can be accepted. In the third run, I set relative tolerances but let the absolute tolerances stay at default values.
  3. The relative tolerances I set in the last run would normally allow CPLEX to accept a solution with an average travel distance as large as $(1 + 0.1)*14.489 = 15.938$ and a maximum travel distance as large as $(1 + 0.05)*58.605 = 61.535$. There is a constraint limiting travel distance to at most 60, though, which supersedes the tolerance setting.
  4. The "flexible" solution (column "F") exceeds the ideal average distance by about 9.7%, hits the cap of 60 on maximum travel distance, and actually achieves the ideal utilization spread. However, without knowing the ideal point you would not realize that last part. I put a fairly short time limit (30 seconds) on the run, and it ended with about a 21% gap due to a very slow-moving best bound.

I'll close with one last observation. At the bottom of the log, after solving the "flexible" variant, you will see the following lines.

Solver status = Unknown.
Objective 0: Status = 101, value = 14.489, bound = 14.489.
Objective 1: Status = 101, value = 58.605, bound = 58.605.
Objective 2: Status = 107, value = 0.030, bound = 0.023.
Final value of average distance traveled = 15.888.
Final value of longest distance traveled = 60.000.
Final value of maximum provider utilization = 0.710.
Final value of minimum provider utilization = 0.680.

The first four lines are printed by CPLEX, the last four by my code. Note the mismatch in objective values of the first two criteria (bold for CPLEX, italic for my results). CPLEX prints the best value it achieved for each objective before moving on to lower priority objectives. When you are using the default tolerances of zero (meaning priorities are absolute), the printed values will match what you get in the final solution. When you specify non-zero tolerances, though, CPLEX may "give back" some of the quality of the higher priority results to improve lower priority results, so you will need to recover the objective values yourself.

Thursday, August 20, 2020

Multiobjective Optimization

Multiobjective optimization (making "optimal" decisions involving multiple, frequently conflicting, criteria) is a big subject. I will only nibble at the fringes of it here. In the next post, I'll describe recent additions to CPLEX that facilitate solving some multiobjective problems.

Among the various approaches to multiobjective problems, two are probably the most common, weighting and prioritization. The first approach is to merge the various criteria into a single one, usually (almost always?) by taking a weighted sum of the criteria. The CPLEX documentation refers to this as a blended objective. For this to make sense, the units of the various criteria really should be commensurable (e.g., all monetary values), but I'm pretty sure having criteria that are not commensurable doesn't stop people from trying. The weights serve two roles. First, they bring the units into some semblance of parity (so if $f()$ is in dollars and $g()$ in millions of dollars, $g()$ gets a weight roughly on millionth the size of the weight of $f()$). Second, they convey relative importance of various criteria.

The second approach is to prioritize the criteria. The solver initially optimizes the highest priority criterion, without regard to any others. Once an optimal value of the highest priority criterion is known, maintaining that value becomes a constraint, and the solver moves to the second highest priority criterion, and so on. The CPLEX documentation refers to this as a lexicographic objective, meaning that the objective function is vector-valued rather than scalar-valued, and optimization means achieving the lexicographically largest or smallest objective vector possible. A variant of this allows a little "slippage" in the value of each criterion, so that for example the solver can accept a solution that is 1% below optimal on the first criterion in return for optimizing the second criterion. A key limitation here is the solver will trade any amount of degradation in a lower priority criterion, no matter how much, for any improvement in a higher priority criterion, no matter how small.

Although they are not relevant to the recent CPLEX additions, I will mention two other approaches. One is a variant of the priority method, known as goal programming (GP). This was originally developed as an extension of linear programming, but the same general approach can be extended to problems with integer variables. The user sets target levels for each criterion, and then prioritizes them. If a goal is underachieved, work on meeting lower priority goals cannot sacrifice any amount of the higher priority criterion. On the other hand, if a goal is overachieved, any portion of the overachievement can be sacrificed in the quest to reach a lower priority goal. An interesting attribute of goal programming is that the same criterion can be used with more than one goal. Suppose that you are building a GP model allocating a budget to various conservation projects. Your highest priority goal might be to allocate at least 50% of the budget to projects in underserved communities (USCs, to save me typing, with apologies to the universities of South Carolina and Southern Califonia). Your second highest priority goal might be to allocate at least 30% of the budget to projects with matching funds from outside sources. Your third highest priority goal might be to allocate at least 75% of the budget to USCs. The other approach is to investigate the Pareto frontier, the set of all solutions for which no other solution does as well in all criteria and better in at least one. In essence, you want to present the decision-maker with the entire Pareto frontier and say "here, pick one". In practice, computing the Pareto frontier can be very computationally expensive, and trying to make sense of it might cause the decision maker to melt down.

To close this post, I'll pose a small sample problem and formulate the model for it. Suppose that we have $N$ patients in a health care system and $M$ providers, and that each patient needs to be assigned to a single provider. Provider $j$ has a limit $c_j$ on the number of patients they can handle. (To keep the example simple, and at the expense of some realism, we treat all patients as identical with regard to their capacity consumption.) We are given a matrix $D\in \mathbb{R}^{N\times M}$ of distances from patients to providers, as well as a cap $D_{max}$ on the distance that a patient can be required to travel. There are four criteria to be considered:

  • the average distance patients will travel (minimize, highest priority);
  • the maximum distance any patient must travel (minimize, second highest priority);
  • the maximum utilization of any provider as a fraction of their capacity (minimize, tied for third highest priority); and
  • the minimum utilization of any provider as a fraction of their capacity (maximize, tied for third highest priority).

So we have a mix of three things to minimize and one to maximize, with the last two criteria combining to somewhat level the workload across providers. 

Let $x_{ij}$ be 1 if patient $i$ is assigned to provider $j$ and 0 if not, let $w$ be the longest distance traveled by any patient, let $y_j$ be the fraction of provider $j$'s capacity that is utilized, and let $z_{lo}$ and $z_{hi}$ be the minimum and maximum capacity utilization rates, respectively (where 0 means the provider is unused and 1 means the provider is operating at capacity). The objective expression is $f\in\mathbb{R}^3$, whose lexicographic minimum we seek, where

\[ f=\left[\begin{array}{c} \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}d_{ij}x_{ij}\\ w\\ z_{hi}-z_{lo} \end{array}\right]. \]

The first and second components of $f$ are the average and maximum client travel distances. The third component is a weighted mix of maximum and minimum provider utilization, where the weights (+1, -1) are equal in magnitude to reflect the equal importance I am assigning to them and the negative coefficient for minimum utilization allows it to be maximized in what is otherwise a minimization problem.

The constraints of the model are easy to state:

\begin{align*} \sum_{j=1}^{M}x_{ij} & =1\quad\forall i\in\left\{ 1,\dots,N\right\} & (1)\\ d_{ij}x_{ij} & \le w\quad\forall i\in\left\{ 1,\dots,N\right\} ,\forall j\in\left\{ 1,\dots,M\right\} & (2)\\ \frac{1}{c_{j}}\sum_{i=1}^{N}x_{ij} & =y_{j}\quad\forall j\in\left\{ 1,\dots,M\right\} & (3)\\ y_{j} & \le z_{hi}\quad\forall j\in\left\{ 1,\dots,M\right\} & (4)\\ y_{j} & \ge z_{lo}\quad\forall j\in\left\{ 1,\dots,M\right\} & (5)\\ x & \in\left\{ 0,1\right\} ^{N\times M} & (6)\\ x_{ij} & =0\quad\forall i,j\ni d_{ij}>D_{max} & (7)\\ y & \in\left[0,1\right]^{M} & (8)\\ z_{hi},z_{lo} & \in\left[0,1\right] & (9)\\ w & \in\left[0,D_{max}\right] & (10) \end{align*} 

  • Constraint (1) ensures that each patient is assigned to exactly one provider.
  • Constraint (2) defines $w$, the maximum distance traveled.
  • Constraint (3) defines the fraction $y_j$ of capacity used at each provider $j$.
  • Constraints (4) and (5) define $z_{lo}$ and $z_{hi}$.
  • Constraints (6), (8), (9) and (10) define variable domains. The upper bound of 1 for $y_j$ in (8) ensures that no provider is assigned more patients than their capacity allows.
  • Constraint (7) enforces the travel distance limit $D_{max}$ by preventing any assignments that would violate the limit (effectively removing those assignment variables from the model).

In the next post, I will show how to solve the model using CPLEX (with, as usual, the Java API).


Tuesday, August 18, 2020

A Partitioning Problem

 A recent question on Mathematics Stack Exchange dealt with reducing the number of sets in a partition of a set of items. I'll repeat it here but with slightly different terminology from the original question. You start with $N$ items partitioned into $M$ disjoint sets. Your goal is to generate a smaller partition of $K < M$ sets (which I will henceforth call "collections" to distinguish them from the original sets). It is required that all items from any original set end up in the same collection (i.e., you cannot split the original sets). The criterion for success is that "the new [collection] sizes should be as close to even as possible".

This is easily done with an integer programming model. The author of the question thought about minimizing the variance in the collection sizes, which would work, but I'm fond of keeping things linear, so I will minimize the range of collection sizes. I'll denote the cardinality of original set $i$ by $n_i$. Let $x_{ij}$ be a binary variable which is 1 if set $i\in \lbrace 1,\dots, M\rbrace$ is assigned to collection $j\in \lbrace 1,\dots,K\rbrace$  and 0 if not. Let $y$ and $z$ denote the sizes of the smallest and largest collections. Finally, for $j\in \lbrace 1,\dots,K\rbrace$ let $s_j$ be the size (cardinality) of collection $j$. A MILP model for the problem is the following:

\begin{align} \min\,z-y\\ \textrm{s.t. }\sum_{j=1}^{K}x_{ij} & =1\;\; \forall i\in\left\{ 1,\dots M\right\} \\ \sum_{i=1}^{M}n_{i}x_{ij} & =s_{j}\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ s_{j} & \le z\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ s_{j} & \ge y\;\; \forall j\in\left\{ 1,\dots,K\right\} \\ y,z,s_{\cdot} & \ge0\\ x_{\cdot\cdot} & \in\left\{ 0,1\right\} \end{align} 

The author of the question also indicated an interest in "fast greedy approximate solutions" (and did not specify problem dimensions). The first greedy heuristic that came to my mind was a simple one. Start with $K$ empty collections and sort the original sets into descending size order. Now assign each set, in turn, to the collection that currently has the smallest size (breaking times whimsically). Why work from largest to smallest set? There will be times when you will want to offset a large set in one collection with two or more smaller sets in another collection, and that will be easier to do if you start big and keep the smaller sets in reserve as long as is possible. Rob Pratt, owner of a rather massive reputation score on MSE, correctly noted that this is equivalent to the "longest processing time" heuristic for assigning jobs to machines so as to minimize makespan.

I put together an R notebook to test this "greedy" heuristic against the optimization model (solved with CPLEX). The notebook uses Dirk Schumacher's OMPR package for building the MILP model. It in turn uses the ROI package (which requires the Rcplex package) in order to communicate with CPLEX. On a test run using nice, round values of $N$, $M$ and $K$ that all ended in zeros (and in particular where $K$ divided evenly into both $M$ and $N$), the greedy heuristic nearly found the optimal solution. When I switched to less round numbers ($N=5723$, $M=137$, $K=10$), though, the heuristic did not fare as well. It was fast (well under one second on my PC) but it produced a solution where collection sizes ranged from 552 to 582 (a spread of 30), while CPLEX (in about 21 seconds) found an optimal solution where all collections had size either 572 or 573 (spread of 1). So I tacked on a second heuristic to refine the solution of the first heuristic. The second heuristic attempts pairwise swaps of the smallest set from the smallest collection with a larger set from a larger collection (trying collections in descending size order). Swaps are constrained not to leave the second collection (the one donating the larger set) smaller than the first collection started out. The intuition is to shrink the range by making the smallest collection bigger while shrinking the largest collection if possible and, if not, at least some collection that is larger than the smallest one. The new heuristic also ran in well under one second and shrank the range of collection sizes from 30 to 3 -- still not optimal, but likely good enough for the application the original questioner had in mind.

You are free to use the R code (which can be extracted from the notebook linked above) under the Creative Commons license that governs the blog.

Saturday, August 15, 2020

Firefox and the New Blogger Interface

Blogger has a (relatively) new interface, to which I switched a while back. The one major annoyance I found was that clicking the "Preview" button while editing a post did not actually generate a preview. I got a notification (lower left) that the preview was being prepared, and then ... nothing. To get a preview, I had to save my work, exit the edit screen (going back to the Blogger control panel), and do the preview there.

It wasn't just me, either. Checking the Blogger help community, I found a ton of posts about this, on pretty much all operating systems and browsers, with some dated this month. A tip about fixing the problem on Safari worked for me. The key (somewhat obvious in hindsight) is that Blogger needs permission to open a pop-up. This was not entirely obvious to me, since I don't consider opening a tab the same as opening a pop-up, but so be it. In Firefox, with any Blogger screen displayed, click the padlock icon in the URL bar, and under "Permissions" allow the site to open pop-ups.

Other users said they had the same problem with Chrome, which is interesting in that preview works fine for me on Chrome, and I don't recall giving explicit permission there. At any rate, I seem to be back in business.

And yes, I previewed this entry before posting it.