Someone posted a nonconvex nonlinear optimization model on OR Stack Exchange and asked for advice about possible reformulations, piecewise linear approximations, using global optimizers, and other things. The model is as follows:\begin{alignat}{1}
\max\,\, & q_{1}+q_{2}\\
\mathrm{s.t.} & \sum_{i=1}^{n}p_{i}x_{i}=\sum_{t=0}^{T}\frac{F_{t}}{(1+q_{1})^{t}} &(1)\\
& \sum_{i=1}^{n}p_{i}x_{i}=\sum_{t=0}^{T}\sum_{i=1}^{n}\frac{b_{t,i}x_{i}}{(1+q_{2})^{t}} &(2)\\
& \sum_{i=1}^{n}p_{i}x_{i}=\beta\sum_{t=0}^{T}\frac{F_{t}}{(1+q_{1}+q_{2})^{t}} &(3)\\
& q_{1,}q_{2}\ge0\\
& x\in\left[0,1\right]^{n}
\end{alignat} All symbols other than $q_1$, $q_2$ and $x$ are model parameters (or indexes). The author originally had $x$ as binary variables, apparently believing that would facilitate linearization of products, but also expressed interest in the case where $x$ is continuous. I'm going to propose a possible "low-tech" solution procedure for the continuous case. The binary case is probably a bit too tough for me. The author supplied sample data for all parameters except $\beta$, with dimensions $n=3$ and $T=4$ but expressed a desire to solve the model for $n=10,000$ and $T=1,200$ (gulp).

Note that the left-hand sides (LHSes) of the three constraints are identical. Let $h()$ be the function on the right-hand side (RHS) of constraint (1), so that the RHS of (1) is $h(q_1)$. $h()$ is a monotonically decreasing function. The RHS of (3) is $\beta h(q_1 + q_2)$. Since the left sides are equal, we have $$\beta h(q_1 + q_2) = h(q_1) \quad (4)$$which tells us several things. First, $q_2 \ge 0 \implies h(q_1+q_2) \le h(q_1)$, so if $\beta<1$ it is impossible to satisfy (4). Second, if $\beta =1$, (4) implies that $q_2 = 0$, which simplifies the problem a bit. Lastly, let's assume $\beta > 1$. For fixed $q_1$ the LHS of (4) is monotonically decreasing in $q_2$, with the LHS greater than the RHS when $q_2 = 0$ and with $$\lim_{q_2\rightarrow \infty} \beta h(q_1+q_2) = \beta F_0.$$ If $\beta F_0 > h(q_1)$, there is no $q_2$ that can balance equation (4), and so the value of $q_1$ is infeasible in the model. If $\beta F_0 < h(q_1)$, then there is exactly one value of $q_2$ for which (4) holds, and we can find it via line search.

Next, suppose that we have a candidate value for $q_1$ and have found the unique corresponding value of $q_2$ by solving (4). We just need to find a vector $x\in [0,1]^n$ that satisfies (1) and (2). Equation (3) will automatically hold if (1) does, given (4). We can find $x$ by solving a linear program that minimizes 0 subject to (1), (2) and the bounds for $x$.

Thus, we have basically turned the problem into a line search on $q_1$. Let's set an arbitrary upper limit of $Q$ for $q_1$ and $q_2$, so that our initial "interval of uncertainty" for $q_1$ is $[0, Q]$. It's entirely possible that neither 0 nor $Q$ is a feasible value for $q_1$, so our first task is to search upward from $0$ until we find a feasible value (call it $Q_\ell$) for $q_1$, then downward from $Q$ until we find a feasible value (call it $Q_h$) for $q_1$. After that, we cross our fingers and hope that all $q_1 \in [Q_\ell,Q_h]$ are feasible. I think this is true, although I do not have a proof. (I'm much less confident that it is true if we require $x$ to be binary.) Since $q_2$ is a function of $q_1$ and the objective function does not contain $x$, we can search $[Q_\ell,Q_h]$ for a local optimum (for instance, by golden section search) and hope that the objective function is unimodal as a function of $q_1$, so that the local optimum is a global optimum. (Again, I do not have proof, although I would not be surprised if it were true.)

I put this to the test with an R script, using the author's example data. The linear programs were solved using CPLEX, with the model expressed via the OMPR package for R and using ROI as the intermediary between OMPR and CPLEX. I first concocted an arbitrary feasible solution and used it to compute $\beta$, so that I would be sure that the problem was feasible with my choice of $\beta$. Using $\beta = 1.01866$ and 100 (arbitrarily chosen) as the initial upper bounds for $q_1$ and $q_2$, my code produced an "optimal" solution of $q_1= 5.450373$, $q_2 = 0.4664311$, $x = (1, 0.1334608, 0)$ with objective value $5.916804$. There is a bit of rounding error involved: the common LHS of (1)-(3) evaluated to 126.6189, while the three RHSes were 126.6186, 126.6188, and 126.6186. (In my graduate student days, our characterization of this would be "good enough for government work".) Excluding loading libraries, the entire process took under three seconds on my desktop computer.

You can access my R code from my web site. It is in the form of an R notebook (with the code embedded), so even if you are not fluent in R, you can at least read the "prose" portions and see some of the nagging details involved.

Small typo in (2). Pretty sure that the index under the 3rd summation should be i, not n.

ReplyDeleteThanks for pointing that out. It's fixed now.

DeleteI happens to be the one who asked the question on OR StackExchange. I didn't realise that you have written a blog here. I think you are being modest when you say "low-tech" solution. To me it looks like a very elegant solution. Using your assumption for β, I get the same answer with Couenne solver. I see that you have given your R code as well but I don’t understand R. I can follow the code up to the point you solve the LP to find x, and then you find a lower bound for q1. If I may ask some questions on the method:

ReplyDelete(1) What bracketing interval you assume to find feasible values for q1 using line search when setting the (feasible) lower bound and upper bound?

(2) What is the objective function for the golden search? Is it given by LHS – RHS of (1) that you minimize?

(3) Are you iteratively solving the LP based on candidate solutions for q1 and q2 until you hit the minimum based on Golden Search?

Hi,

Delete(1) I start out by setting upper limits of 100 for $q_1$ and $q_2$, snatched out of thin air. In practice, the context of the model (what $q_1$ and $q_2$ stand for) would hopefully let you set reasonable upper bounds. To get an initial lower bound for $q_1$, I find the maximum value of the left side of (1). In the R code, I assume the $p_i$ are nonnegative, in which case the left side is maxed out when all $x_i=1$. I then solve (1) to get a value for $q_1$, which is the smallest possible choice for $q_1$. Neither the lower nor upper bound is automatically feasible, so I do a line search up from the lower bound and a separate line search down from the upper bound to get the lowest and highest values of $q_1$ than are feasible.

(2) The objective function for golden section is $q_1 + q_2$, the original objective function.

(3) At each step of the golden section search, we get a new candidate value for $q_1$. I call the findQ2() function to find the corresponding value of $q_2$. That function in turn calls findX(), which solves an LP to get $x$. So there is an LP solution at each step of the GS search.

I hope that clarifies things.

Thank you for the detailed explanation. Overall method is clear to me. I would like to clarify one point regarding the interval of uncertainty.

ReplyDeleteIn the R code, I note that you assume a step size of 0.1 and keep increasing the value of q1 by this amount until you find a feasible value. That code calls findQ2() function which in turn calls findX(). To find the upper bound, you use bisection method with a bracket of [q1min,100] to find a root for q1.

I do not understand why different methods are used for lower and upper bounds. Can the bisection method be used to find the lower bound as well for q1?

You can use bisection (which may be faster than a fixed step approach) to find the lower bound *if* you have a feasible value you can use as the upper limit of the search interval. You cannot use bisection is you are starting with an interval where both endpoints are infeasible (because, after checking the midpoint, you would have no idea which half interval to keep). So you need a feasible value of $q_1$ before you do anything else. I suppose you could randomly generate values of $q_1$ in the initial interval until you got a feasible one, then use bisection (twice) to get feasible lower and upper limits.

Delete