Wednesday, September 30, 2020

A Greedy Heuristic Wins

 A problem posted on OR Stack Exchange starts as follows: "I need to find two distinct values to allocate, and how to allocate them in a network of stores." There are $n$ stores (where, according to the poster, $n$ can be close to 1,000). The two values (let's call them $x_1$ and $x_2$) must be integer, with $x_1 \in \lbrace 1, \dots, k_1 \rbrace$ and $x_2 \in \lbrace k_1, \dots, k_2 \rbrace$ for given parameters $k_1 < k_2$. Additionally, there is an additional set of parameters $s_{i3}$ and a balance constraint saying $$0.95 g(k_1 e) \le g(x_1, x_2) \le 1.05 g(k_1 e)$$ where $$g(y) = \sum_{i=1} \frac{s_{i3}}{y_i}$$ for any allocation $y$ and $e = (1,\dots, 1).$

The cost function (to be minimized) has the form $$f(x_1, x_2) = a\sum_{i=1}^n \left[ s_{i1}\cdot \left( \frac{s_{i2}}{y_i} \right)^b \right]$$with $a$, $s_{i1}$, $s_{i2}$ and $b$ all parameters and $y_i \in \lbrace x_1, x_2 \rbrace$ is the allocation to store $i$. There are two things to note about $f$. First, the leading coefficient $a (> 0)$ can be ignored when looking for an optimum. Second, given choices $x_1$ and $x_2>x_1$, the cheaper choice at all stores will be $x_1$ if $b < 0$ and $x_2$ if $b > 0$.

It's possible that a nonlinear solver might handle this, but I jumped straight to metaheuristics and, in particular, my go-to choice among metaheuristics -- a genetic algorithm. Originally, genetic algorithms were intended for unconstrained problems, and were tricky to use with constrained problems. (You could bake a penalty for constraint violations into the fitness function, or just reject offspring that violated any constraints, but neither of those approaches was entirely satisfactory.) Then came a breakthrough, the random key genetic algorithm [1]. A random key GA uses a numeric vector $v$ (perhaps integer, perhaps byte, perhaps double precision) as the "chromosome". The user is required to supply a function that translates any such chromosome into a feasible solution to the original problem.

I did some experiments in R, using the GA package to implement a random key genetic algorithm. The package requires all "genes" (think "variables") to be the same type, so I used a double-precision vector of dimension $n_2$ for chromosomes. The last two genes have domains $(1, k_1 + 1)$ and $(k_1, k_2 + 1)$; the rest have domain $(0, 1)$. Decoding a chromosome $v$ proceeds as follows. First, $x_1 = \left\lfloor v_{n+1}\right\rfloor $ and $x_2 = \left\lfloor v_{n+2}\right\rfloor $, where $\left\lfloor z \right\rfloor$ denotes the "floor" (greatest lower integer) of $z$. The remaining values $v_1, \dots, v_{n}$ are sorted into ascending order, and their sort order is applied to the stores. So, for instance, if $v_7$ is the smallest of those genes and $v_{36}$ is the largest, then store $7$ will be first in the sorted list of stores and store $36$ will be last. (The significance of this sorting will come out in a second.)


Armed with this, my decoder initially assigns every store the cheaper choice between $x_1$ and $x_2$ and computes the value of $g()$. If $g()$ does not fall within the given limits, the decoder runs through the stores in their sorted order, switching the allocation to the more expensive choice and updating $g()$, until $g()$ meets the balance constraint. As soon as it does, we have the decoded solution. This cheats a little on the supposed guarantee of feasibility in a decoded solution, since there is a (small?) (nearly zero?) chance that the decoding process will fail with $g()$ jumping from below the lower bound to above the upper bound (or vice versa) after some swap. If it does, my code discards the solution. This did not seem to happen in my testing.


The GA seemed to work okay, but it occurred to me that I might be over-engineering the solution a bit. (This would not be the first time I did that.) So I also tried a simple greedy heuristic. Since $k_1$ and $k_2$ seem likely to be relatively small in the original poster's problem (whereas $n$ is not), my greedy heuristic loops through all valid combinations of $x_1$ and $x_2$. For each combination, it sets $v_1$ equal to the cheaper choice and $v_2$ equal to the more expensive choice, assigns the cheaper quantity $v_1$ to every store and computes $g()$. It also computes, for each store, the ratio \[ \frac{|\frac{s_{i3}}{v_{2}}-\frac{s_{i3}}{v_{1}}|}{s_{i1}\left(\left(\frac{s_{i2}}{v_{2}}\right)^{b}-\left(\frac{s_{i1}}{v_{1}}\right)^{b}\right)} \]in which the numerator is the absolute change in balance at store $i$ when switching from the cheaper allocation $v_1$ to the more expensive allocation $v_2$, and the denominator is the corresponding change in cost. The heuristic uses these ratios to select stores in descending "bang for the buck" order, switching each store to the more expensive allocation until the balance constraint is met.

Both the GA decoder and the greedy heuristic share the approach of initially allocating every store the cheaper choice and then switching stores to the more expensive choice until balance is attained. My R notebook generates a random problem instance with $n=1,000$ and then solves it twice, first with the GA and then with the greedy heuristic. The greedy heuristic stops when all combinations of $x_1$ and $x_2$ have been tried. Stopping criteria for the GA are more arbitrary. I limited it to at most 1,000 generations (with a population of size 100) or 20 consecutive generations with no improvement, whichever came first.


The results on a typical instance were as follows. The GA ran for 49 seconds and got a solution with cost 1065.945. The greedy heuristic needed only 0.176 seconds to get a solution with cost 1051.735. This pattern (greedy heuristic getting a better solution in much less time) repeated across a range of random number seeds and input parameters, including switching between positive and negative values of $b$.

If you are interested, you can browse my R notebook (which includes both code and results).


[1] Bean, J. C. (1994). Genetic Algorithms and Random Keys for Sequencing and Optimization. ORSA Journal on Computing, 6, 154-160.

Thursday, September 3, 2020

Installing Rcplex and cplexAPI

I've previously mentioned solving MIP models in R, using CPLEX. In one post [1], I used the OMPR package, which provides a domain specific language for model construction. OMPR uses the ROI package, and in particular the ROI.plugin.cplex package, to communicate with CPLEX. That, in turn, uses the Rcplex package. In another post [2], I used Rcplex directly. Meanwhile, there is still another package, cplexAPI, that provides a low-level API to CPLEX.

Both Rcplex and cplexAPI will install against CPLEX Studio 12.8 and earlier, but neither one installs with CPLEX Studio 12.9 or 12.10. Fortunately, IBM's Daniel Junglas was able to hack solutions for both of them. I'll spell out the steps I used to get Rcplex working with CPLEX 12.10. You can find the solutions for both in the responses to this question on the IBM Decision Optimization community site. Version information for what follows is: Linux Mint 19.3; CPLEX Studio 12.10; R 3.6.3; and Rcplex 0.3-3. Hopefully essentially the same hack works with Windows.

  1. Download Rcplex_0.3-3.tar.gz, put it someplace harmless (the Downloads folder in my case, but /tmp would be fine) and expand it, producing a folder named Rcplex.
  2. Go to the Rcplex folder and open the 'configure' file in a text editor (one you would use for plain text files).
  3. Line 1548 should read as follows:
    CPLEX_LIBS="-L${CPLEXLIBDIR} `${AWK} 'BEGIN {FS = " = "} /^CLNFLAGS/ {print $2}' ${CPLEX_MAKEFILE}`"
    Replace it with
    CPLEX_LIBS="-L${CPLEXLIBDIR} `${AWK} 'BEGIN {FS = " = "} /^CLNFLAGS/ {print $2}' ${CPLEX_MAKEFILE} | sed -e 's,\$(CPLEXLIB),cplex,'`"
    Save the modified file.
  4. Open a terminal in the parent directory of the Rcplex folder and run the following command:
    R CMD INSTALL --configure-args="--with-cplex-dir=.../CPLEX_Studio1210/cplex/" ./Rcplex
    Adjust the file path (particularly the ...) so that it points to the 'cplex' directory in your CPLEX Studio installation (the one that has subdirectories named "bin", "examples", "include" etc.).
  5. Assuming there were no error messages during installation, you should be good to go.



Update: Version 1.4.0 of cplexAPI, released on 2020-09-21, installs correctly against CPLEX 12.10 (and presumably 12.9), at least on my system (Linux Mint).