OR in an OB World: nonlinear optimization

Showing posts with label nonlinear optimization. Show all posts

Sunday, January 31, 2021

Solving a Multidimensional NLP via Line Search

Someone posted a nonconvex nonlinear optimization model on OR Stack Exchange and asked for advice about possible reformulations, piecewise linear approximations, using global optimizers, and other things. The model is as follows:\begin{alignat}{1} \max\,\, & q_{1}+q_{2}\\ \mathrm{s.t.} & \sum_{i=1}^{n}p_{i}x_{i}=\sum_{t=0}^{T}\frac{F_{t}}{(1+q_{1})^{t}} &(1)\\ & \sum_{i=1}^{n}p_{i}x_{i}=\sum_{t=0}^{T}\sum_{i=1}^{n}\frac{b_{t,i}x_{i}}{(1+q_{2})^{t}} &(2)\\ & \sum_{i=1}^{n}p_{i}x_{i}=\beta\sum_{t=0}^{T}\frac{F_{t}}{(1+q_{1}+q_{2})^{t}} &(3)\\ & q_{1,}q_{2}\ge0\\ & x\in\left[0,1\right]^{n} \end{alignat} All symbols other than $q_1$, $q_2$ and $x$ are model parameters (or indexes). The author originally had $x$ as binary variables, apparently believing that would facilitate linearization of products, but also expressed interest in the case where $x$ is continuous. I'm going to propose a possible "low-tech" solution procedure for the continuous case. The binary case is probably a bit too tough for me. The author supplied sample data for all parameters except $\beta$, with dimensions $n=3$ and $T=4$ but expressed a desire to solve the model for $n=10,000$ and $T=1,200$ (gulp).

Note that the left-hand sides (LHSes) of the three constraints are identical. Let $h()$ be the function on the right-hand side (RHS) of constraint (1), so that the RHS of (1) is $h(q_1)$. $h()$ is a monotonically decreasing function. The RHS of (3) is $\beta h(q_1 + q_2)$. Since the left sides are equal, we have $$\beta h(q_1 + q_2) = h(q_1) \quad (4)$$which tells us several things. First, $q_2 \ge 0 \implies h(q_1+q_2) \le h(q_1)$, so if $\beta<1$ it is impossible to satisfy (4). Second, if $\beta =1$, (4) implies that $q_2 = 0$, which simplifies the problem a bit. Lastly, let's assume $\beta > 1$. For fixed $q_1$ the LHS of (4) is monotonically decreasing in $q_2$, with the LHS greater than the RHS when $q_2 = 0$ and with $$\lim_{q_2\rightarrow \infty} \beta h(q_1+q_2) = \beta F_0.$$ If $\beta F_0 > h(q_1)$, there is no $q_2$ that can balance equation (4), and so the value of $q_1$ is infeasible in the model. If $\beta F_0 < h(q_1)$, then there is exactly one value of $q_2$ for which (4) holds, and we can find it via line search.

Next, suppose that we have a candidate value for $q_1$ and have found the unique corresponding value of $q_2$ by solving (4). We just need to find a vector $x\in [0,1]^n$ that satisfies (1) and (2). Equation (3) will automatically hold if (1) does, given (4). We can find $x$ by solving a linear program that minimizes 0 subject to (1), (2) and the bounds for $x$.

Thus, we have basically turned the problem into a line search on $q_1$. Let's set an arbitrary upper limit of $Q$ for $q_1$ and $q_2$, so that our initial "interval of uncertainty" for $q_1$ is $[0, Q]$. It's entirely possible that neither 0 nor $Q$ is a feasible value for $q_1$, so our first task is to search upward from $0$ until we find a feasible value (call it $Q_\ell$) for $q_1$, then downward from $Q$ until we find a feasible value (call it $Q_h$) for $q_1$. After that, we cross our fingers and hope that all $q_1 \in [Q_\ell,Q_h]$ are feasible. I think this is true, although I do not have a proof. (I'm much less confident that it is true if we require $x$ to be binary.) Since $q_2$ is a function of $q_1$ and the objective function does not contain $x$, we can search $[Q_\ell,Q_h]$ for a local optimum (for instance, by golden section search) and hope that the objective function is unimodal as a function of $q_1$, so that the local optimum is a global optimum. (Again, I do not have proof, although I would not be surprised if it were true.)

I put this to the test with an R script, using the author's example data. The linear programs were solved using CPLEX, with the model expressed via the OMPR package for R and using ROI as the intermediary between OMPR and CPLEX. I first concocted an arbitrary feasible solution and used it to compute $\beta$, so that I would be sure that the problem was feasible with my choice of $\beta$. Using $\beta = 1.01866$ and 100 (arbitrarily chosen) as the initial upper bounds for $q_1$ and $q_2$, my code produced an "optimal" solution of $q_1= 5.450373$, $q_2 = 0.4664311$, $x = (1, 0.1334608, 0)$ with objective value $5.916804$. There is a bit of rounding error involved: the common LHS of (1)-(3) evaluated to 126.6189, while the three RHSes were 126.6186, 126.6188, and 126.6186. (In my graduate student days, our characterization of this would be "good enough for government work".) Excluding loading libraries, the entire process took under three seconds on my desktop computer.

You can access my R code from my web site. It is in the form of an R notebook (with the code embedded), so even if you are not fluent in R, you can at least read the "prose" portions and see some of the nagging details involved.

Wednesday, September 30, 2020

A Greedy Heuristic Wins

A problem posted on OR Stack Exchange starts as follows: "I need to find two distinct values to allocate, and how to allocate them in a network of stores." There are $n$ stores (where, according to the poster, $n$ can be close to 1,000). The two values (let's call them $x_1$ and $x_2$) must be integer, with $x_1 \in \lbrace 1, \dots, k_1 \rbrace$ and $x_2 \in \lbrace k_1, \dots, k_2 \rbrace$ for given parameters $k_1 < k_2$. Additionally, there is an additional set of parameters $s_{i3}$ and a balance constraint saying $$0.95 g(k_1 e) \le g(x_1, x_2) \le 1.05 g(k_1 e)$$ where $$g(y) = \sum_{i=1} \frac{s_{i3}}{y_i}$$ for any allocation $y$ and $e = (1,\dots, 1).$

The cost function (to be minimized) has the form $$f(x_1, x_2) = a\sum_{i=1}^n \left[ s_{i1}\cdot \left( \frac{s_{i2}}{y_i} \right)^b \right]$$with $a$, $s_{i1}$, $s_{i2}$ and $b$ all parameters and $y_i \in \lbrace x_1, x_2 \rbrace$ is the allocation to store $i$. There are two things to note about $f$. First, the leading coefficient $a (> 0)$ can be ignored when looking for an optimum. Second, given choices $x_1$ and $x_2>x_1$, the cheaper choice at all stores will be $x_1$ if $b < 0$ and $x_2$ if $b > 0$.

It's possible that a nonlinear solver might handle this, but I jumped straight to metaheuristics and, in particular, my go-to choice among metaheuristics -- a genetic algorithm. Originally, genetic algorithms were intended for unconstrained problems, and were tricky to use with constrained problems. (You could bake a penalty for constraint violations into the fitness function, or just reject offspring that violated any constraints, but neither of those approaches was entirely satisfactory.) Then came a breakthrough, the random key genetic algorithm [1]. A random key GA uses a numeric vector $v$ (perhaps integer, perhaps byte, perhaps double precision) as the "chromosome". The user is required to supply a function that translates any such chromosome into a feasible solution to the original problem.

I did some experiments in R, using the GA package to implement a random key genetic algorithm. The package requires all "genes" (think "variables") to be the same type, so I used a double-precision vector of dimension $n_2$ for chromosomes. The last two genes have domains $(1, k_1 + 1)$ and $(k_1, k_2 + 1)$; the rest have domain $(0, 1)$. Decoding a chromosome $v$ proceeds as follows. First, $x_1 = \left\lfloor v_{n+1}\right\rfloor $ and $x_2 = \left\lfloor v_{n+2}\right\rfloor $, where $\left\lfloor z \right\rfloor$ denotes the "floor" (greatest lower integer) of $z$. The remaining values $v_1, \dots, v_{n}$ are sorted into ascending order, and their sort order is applied to the stores. So, for instance, if $v_7$ is the smallest of those genes and $v_{36}$ is the largest, then store $7$ will be first in the sorted list of stores and store $36$ will be last. (The significance of this sorting will come out in a second.)

Armed with this, my decoder initially assigns every store the cheaper choice between $x_1$ and $x_2$ and computes the value of $g()$. If $g()$ does not fall within the given limits, the decoder runs through the stores in their sorted order, switching the allocation to the more expensive choice and updating $g()$, until $g()$ meets the balance constraint. As soon as it does, we have the decoded solution. This cheats a little on the supposed guarantee of feasibility in a decoded solution, since there is a (small?) (nearly zero?) chance that the decoding process will fail with $g()$ jumping from below the lower bound to above the upper bound (or vice versa) after some swap. If it does, my code discards the solution. This did not seem to happen in my testing.

The GA seemed to work okay, but it occurred to me that I might be over-engineering the solution a bit. (This would not be the first time I did that.) So I also tried a simple greedy heuristic. Since $k_1$ and $k_2$ seem likely to be relatively small in the original poster's problem (whereas $n$ is not), my greedy heuristic loops through all valid combinations of $x_1$ and $x_2$. For each combination, it sets $v_1$ equal to the cheaper choice and $v_2$ equal to the more expensive choice, assigns the cheaper quantity $v_1$ to every store and computes $g()$. It also computes, for each store, the ratio \[ \frac{|\frac{s_{i3}}{v_{2}}-\frac{s_{i3}}{v_{1}}|}{s_{i1}\left(\left(\frac{s_{i2}}{v_{2}}\right)^{b}-\left(\frac{s_{i1}}{v_{1}}\right)^{b}\right)} \]in which the numerator is the absolute change in balance at store $i$ when switching from the cheaper allocation $v_1$ to the more expensive allocation $v_2$, and the denominator is the corresponding change in cost. The heuristic uses these ratios to select stores in descending "bang for the buck" order, switching each store to the more expensive allocation until the balance constraint is met.

Both the GA decoder and the greedy heuristic share the approach of initially allocating every store the cheaper choice and then switching stores to the more expensive choice until balance is attained. My R notebook generates a random problem instance with $n=1,000$ and then solves it twice, first with the GA and then with the greedy heuristic. The greedy heuristic stops when all combinations of $x_1$ and $x_2$ have been tried. Stopping criteria for the GA are more arbitrary. I limited it to at most 1,000 generations (with a population of size 100) or 20 consecutive generations with no improvement, whichever came first.

The results on a typical instance were as follows. The GA ran for 49 seconds and got a solution with cost 1065.945. The greedy heuristic needed only 0.176 seconds to get a solution with cost 1051.735. This pattern (greedy heuristic getting a better solution in much less time) repeated across a range of random number seeds and input parameters, including switching between positive and negative values of $b$.

If you are interested, you can browse my R notebook (which includes both code and results).

[1] Bean, J. C. (1994). Genetic Algorithms and Random Keys for Sequencing and Optimization. ORSA Journal on Computing, 6, 154-160.

Saturday, August 29, 2020

A Group Selection Problem

Someone posted an interesting question about nonlinear integer programming with grouped binary variables on Stack Overflow, and it drew multiple responses. The problem is simple to state. You have 52 binary variables $x_i$ partitioned into 13 groups of four each, with a requirement that exactly one variable in each group take the value 1. So the constraints are quite simple:

\begin{align*} x_{1}+\dots+x_{4} & =1\\ x_{5}+\dots+x_{8} & =1\\ \vdots\\ x_{49}+\dots+x_{52} & =1. \end{align*}

The objective function is a cubic function of the form

\[ \left(\alpha\sum_{i}a_{i}x_{i}\right)\times\left(\beta\sum_{j}b_{j}x_{j}+\beta_{0}\right)\times\left(\gamma\sum_{k}c_{k}x_{k}+\gamma_{0}\right) \] where $\alpha = 1166/2000$, $\beta = 1/2100$, $\beta_0 = 0.05$, $\gamma = 1/1500$ and $\gamma_0 = 1.5$. (In the original post, there is a minus sign in front of the function and the author minimizes; for various reasons I am omitting the minus sign and maximizing here.) Not only is the objective nonlinear, it is nonconvex if minimizing (nonconcave if maximizing). The author of the question was working in R.

Fellow blogger Erwin Kalvelagen solved the problem with a variety of nonlinear optimizers, obtaining a solution with objective value -889.346. Alex Fleischer of IBM posted an answer with the same objective value, using a constraint programming model written in OPL and solved with CP Optimizer.

My initial thought was to linearize the objective function by introducing continuous variables $y_{ij} = x_i \cdot x_j$ and $z_{ijk} = x_i \cdot x_j \cdot x_k$ with domain [0,1]. Many of those variables can be eliminated, due in part to symmetry ($y_{ij} = y_{ji}$, $z_{ijk} = z_{ikj}=\dots=z_{kji}$ and in part due to the observation that $y_{ii}=z_{iii}=x_i$. Also useful is that for $i<j<k$ $z_{ijk}=x_i \cdot y_{jk}$. I have an R notebook that you can download, in which I build the model using standard linearizations for the product of two binarys, then try to solve it with CPLEX using the Rcplex package (and the Matrix package, which allows a sparse representation of the constraint matrix). The results were, shall we say, unspectacular. With a five minute time limit (much longer than what Erwin or Alex needed), CPLEX found an incumbent with value 886.8748 (not bad but not optimal) and a rather dismal optimality gap of 146.5% (due mainly to a loose and slow moving bound).

Out of curiosity, I took a second shot using a genetic algorithm and the GA package for R. I was geeked to see that the GA package includes both an island model (using parallel processing) and a permutation variant (which lets me use permutations of the indices 1 to 52 as chromosomes with no extra work on my part). The permutation approach allows me to treat a chromosome as a prioritization of the 52 binary variables, which I decode into a solution $x$ by scanning the $x_i$ in priority order and setting each to 1 if and only none of the other variables in its group of four has been set to 1. That R notebook is also available for download.

As a metaheuristic, the GA does not offer a proof of optimality, and in fact may or may not find the optimal solution. With my inspired choice of random number seed (123), I matched Erwin's and Alex's solution (889.3463). The settings I used resulted in a run time of about 36 seconds on my PC, more than half of which was spent after the best solution had been found. It's still slower than what Erwin and Alex achieved, but it is a "pure R" solution, meaning it requires nothing besides open-source R packages.

Sunday, May 31, 2020

A Simple Constrained Optimization

A question posted to OR Stack Exchange, "Linear optimization problem with user-defined cost function", caught my eye. The question has gone through multiple edits, and the title is a bit misleading, in that the objective function is in at least some cases nonlinear. The constraints are both linear and very simple. The user is looking for weights to assign to $n$ vectors, and the weights $x_i$ satisfy $$\sum_{i=1}^n x_i = 1\\x \ge 0.$$ Emma, the original poster, put a working example (in Python) on GitHub. The simplified version of her cost function includes division of one linear expression by another, with an adjustment to deal with division by zero errors (converting the resulting NaN to 0).

The feasible region of the problem is a simplex, which triggered a memory of the Nelder-Mead algorithm (which was known as the "Nelder-Mead simplex algorithm" when I learned it, despite confusion with Dantzig's simplex algorithm for linear programs). The Nelder-Mead algorithm, published in 1965, attempts to optimize a nonlinear function (with no guarantee of convergence to the optimum in general), using only function evaluations (no derivatives). It is based on an earlier algorithm (by Spendley, Hext and Himsworth, in 1962), and I'm pretty sure there have been tweaks to Nelder-Mead over the subsequent years.

The Nelder-Mead algorithm is designed for unconstrained problems. That said, my somewhat fuzzy recollection was that Nelder-Mead starts with a simplex (hopefully containing an optimal solution) and progressively shrinks the uncertainty region, each time getting a simplex that is a subset of the previous simplex. So if we start with the unit simplex $\lbrace (1,0,0,\dots,0,0), (0,1,0,\dots,0,0),\dots,(0,0,0,\dots,0,1)\rbrace$, which is the full feasible region, every subsequent simplex should be comprised of feasible points. It turns out I was not quite right. Depending on the parameter values you use, there is one step (expansion) that can leave the current simplex and thus possibly violate the sign restrictions. That's easily fixed, though, by checking the step size and shrinking it if necessary.

There are several R packages containing a Nelder-Mead function, but most of them look like they are designed for univariate optimization, and the one I could find that was multivariate and allowed specification of the initial simplex would not work for me. So I coded my own, based on the Wikipedia page, which was easy enough. I used what that page describes as typical values for the four step size parameters. It hit my convergent limit (too small a change in the simplex) after 29 iterations, producing a solution that appears to be not quite optimal but close.

Just for comparison purposes, I thought I would try a genetic algorithm (GA). GAs are generally not designed for constrained problems, although there are exceptions. (Search "random key genetic algorithm" to find one.) That's easy to finesse in our case. Getting a GA to produce only nonnegative values is easy: you just have to require the code that generates new solutions (used to seed the initial population, and possibly for immigration) and the code that mutates existing solutions to use only nonnegative numbers. That might actually be the default in a lot of GA libraries. "Crossover" (their term for solutions having children) takes care of itself. So we just need to enforce the lone equation constraint, which we can do by redefining the objective function. We allow the GA to produce solutions without regard to the sum of their components, and instead optimize the function $$\hat{f}(x)=f\left(\frac{x}{\sum_{i=1}^n x_i}\right)$$where $f()$ is the original objective function.

R has multiple GA packages. I used the `genalg` package in my experiments. Running was 100 generations with a population of size 200 took several seconds (so longer than what Nelder-Mead took), but it produced a somewhat better solution. Since the GA is a random algorithm, running it repeatedly will produce different results, some worse, possibly some better. You could also try restarting Nelder-Mead when the polytope gets too small, starting from a new polytope centered around the current optimum, which might possibly improve on the solution obtained.

This was all mostly just to satisfy my curiosity. My R code for both the Nelder-Mead and GA approaches is in an R notebook you are free to download.