## Thursday, November 6, 2014

### Linearize That!

For whatever reason, I've seen a bunch of questions posted on various fora boiling down to "How do I linearize <insert grossly nonlinear function>?" Whether by coincidence or due to some virtual viral epidemic, I've seen three or four in the past week that involved logarithms. So, without further ado, here is the quick solution: Source: Norbert Schnitzler (reproduced under Creative Commons license)
Just put your function/constraint on the ground in front of it and back away carefully.

Which is to say, you cannot linearize an arbitrary nonlinear function. That's why they stuck the "non" in nonlinear.

I suspect the confusion stems in part from the fact that you can linearize certain otherwise nonlinear expressions involving binary (or, if you care to do binary expansions, general integer) variables. In fact, I've written about it myself (cf. Binary Variables and Quadratic TermsInteger Variables and Quadratic Terms). What is perhaps not apparent is that, under the hood, the linearization techniques amount to decomposing the feasible region into convex chunks on which the original nonlinear function is actually linear. For example, if we want to linearize the product $f(x,y)=xy$ where $x$ is a binary variable and $y$ is a continuous variable, the linearization described in the first of my two posts equates to splitting the feasible regions into two subregions, one where $x=0$ and $f(x,y) = 0$ (trivially linear) and the other where $x=1$ and $f(x,y)=y$ (also linear).

Now consider the nonlinear, concave and suspiciously logarithmic function $f(x)$ shown in red in the following plot: Suppose that you have the constraint $f(x)\le b$ in a mathematical program, where $b$ is either a constant or a linear function. Since $f$ is concave and on the left side of a $\le$ constraint, the feasible region becomes nonconvex, a major problem.

Perhaps there is a way to linearize this, other than the analog method I alluded to in the photo, but if so I do not know one. What you can do, however, is a linear approximation. In the plot, I broke the (finite) domain of $x$ into intervals $[t_0, t_1], [t_1, t_2], \dots, [t_{N-1},t_N]$. Suppose now that we introduce continuous variables $\alpha_0,\dots,\alpha_N$ satisfying three requirements:
• $\sum_{i=0}^N \alpha_i = 1$;
• at most two of the $\alpha$ variables can be nonzero; and
• if two are nonzero, they must be consecutive ($\alpha_j$ and $\alpha_{j+1}$ for some $j$).
We can now write $$x=\sum_{i=0}^N t_i \alpha_i$$(a linear constraint) and replace $f(x)$ with the linear expression $$\sum_{i=0}^N f(t_i) \alpha_i,$$noting that $f(t_i)$ is a constant. As an illustration, the midpoint of the interval $[t_1, t_2]$ in the plot would correspond to $\alpha_1 = \alpha_2 = 0.5$, all other $\alpha_j = 0$. The blue piecewise-linear curve shows the approximation $\sum_{i=0}^N f(t_i) \alpha_i,$.

The $\alpha$ variables form what is known as a Type 2 Special Ordered Set (SOS2). Although the $\alpha$ variables appear continuous, the SOS2 itself makes the problem a mixed integer program, effectively chopping the feasible region into subdomains (in this example, the intervals $[t_i, t_{i+1}]$) on which the approximation is linear.

1. Dear Prof Rubin,

I have a question regarding the modelling of a nonlinear problem. I have two ways of formulating a problem that appears to be separable. I'd like to ask which formulation is more efficient.

In the first formulation. the objective looks like minimizing (a+b*V1+c*V1^2 + a+b*V2+c*V2^2) and the constraint: (m / V1+n / V2 = p). The coefficients a,b,c,m,n,p are all positive.

If the variables are chosen to be U1 = 1/V1 and U2 = 1/V2 then we have the second formulation with the objective being (a+b*(1/U1)+c*(1/U1)^2 + a+b*(1/U2)+c*(1/U2)^2) and the constraint m*U1+n*U2 = p.

1. I'm going to assume here that V1 and V2 are nonnegative, in which case the left side of the constraint in the first formulation and the objective in the second formulation are both convex. I'm not sure about efficiency, but I would be inclined to go with the second formulation for the following reason. You are going to have to do a piecewise-linear approximation to the constraint function in the first model or the objective in the second. The solution in the second case will be feasible and (hopefully) approximately optimal. The solution in the first case is going to be optimal (for a slightly different problem) and approximately feasible for your problem. Unfortunately, "approximately feasible" likely means infeasible. With a sufficiently granular approximation, you'll likely get close enough to feasible for your purposes. (Remember that even solutions to LPs are only feasible to within rounding tolerances.) I'm just a little more comfortable with calling an objective value "close enough" than I am with causing a constraint violation "close enough".

2. Yes, the V1 and V2 are non negative, actually they are kind of speeds of cars so their range is pretty much known. I think I'll go with the second case first as you suggested.

I saw in some books that in case of separable non linear constraints, it is still possible to approximate them with piecewise-linear approximation. I just wonder if "approximately feasible" likely means infeasible as you mentioned, why the authors still present this approach (sometimes without even mentioning the risk of infeasibility).

3. Between formulations that simplify reality, estimates of parameter values, the accuracy limits of floating point math and nonzero convergence tolerances, results are always approximate anyway. If the solution to the linear approximation is outside feasibility tolerances, you can always add more break points near the "optimum" and rerun the solver.

2. This comment has been removed by the author.

3. Dear Prof. Paul,

Thank you very much for this helpful post. I have a small query. I have a constraint in the form: log(x1)+log(x2)+log(x3)<=B, and x1+x2+x3=P. Here x are continuous variables and bounded as x_lb <= x <= x_ub. and B is a constant. How do I generate the (N+1) breaking points: [t_0, t_1], [t_1, t_2], … , [t_(N-1), t_N]? Lets say x_lb = 0 and x_ub = 1 and N=10. And for each of the SOS2 variables a_0, a_1, …, a_N, can I assume that a_n is bounded as 0 <= a_n <= 1 ? Furthermore, if N is very large, then does the complexity becomes comparable to the original nonlinear constraint?

1. I may need to perform the piecewise linear approximaion for each of the terms separately, and then sum them up. My question is how to generate the breaking points for this particular constraint so that the second constraint can also be satisfied.

2. I think if you employ piece-wise linear approximation for these constraints, the solution is infeasible because of the constraints on x1, x2 and x3. I am not sure whether you can generate the breaking points t_0, t_1,....,t_N based on the upper and lower bounds of the variable. Prof. Paul may answer this well.

3. Let the domain of $x_j$ be $[a_j, b_j]$. Split each of those domains arbitrarily, say $x_j \in [w_{j0}, \dots, w_{jN}]$ where $w_{j0}=a_j$ and $w_{jN}=b_j$. The number of breakpoints ($N+1$ here) need not be the same for each $j$, and the breakpoints need not be evenly spaced. The weight variables are doubly-subscripted ($\alpha_{jk}$). They are nonnegative and sum to 1 separately for each $j$. There's nothing special about the second (linear) constraint that would cause any problems with this.

4. This comment has been removed by the author.

5. Dear Prof. Paul.

Thanks for your nice post. This question is same as Dimitrios Vlastaras's post. However, I am a bit confused about the feasibility while employing this approach in constraint optimization. Let say: log(1+2*x1)+log(1+2*x2)<=Cy. Here x1 and x2 are continuous variable and they are bounded. The RHS is an affine function with C being a constant and y: a continuous variable. Please note that x may be active in other constraints also. When we perform the optimization, x takes only one particular value, not all the values within its bound, so we are supposed to perform the linear approximation at that point not over its full domain. When we do the piece-wise linearisation, x1 and x2 will take some predefined values (from the breaking points), which may not satisfy the other constraints because of the SOS2 constraints and the breaking points generated over its domain. Please clarify me this issue.

1. You do a pw-linear approximation separately for $x_1$ and $x_2$. The breakpoints cover the full domain of each of those variables, and so $\sum_j \alpha_{ij}t_{ij}$ can take value in the domain of $x_i$.

6. Dear Prof. Paul,

Thank you very much for your helpful post. I need your support to linearize an objective function. The objective is: minimize x(2^y-1)/z, where x in a binary integer variable, y is a continuous variable and z is a known parameter. How can I linearize this?
The constraint has the for

1. You cannot linearize it. Introduce a new variable $w$ and the constraint $w=\frac{2^y-1}{z}$. Linearize the product $xw$ as explained in my earlier post about quadratic terms involving binary variables. Then follow the process I outlined above for approximating a log function, applying it to the expression $\frac{2^y-1}{z}$.

2. Dear Prof. Paul,

Thank you very much for your answer. However, do I need to find the upper bound and lower bound values for w. Lets say, 0<= y <= 2 and z=3. Then (2^0-1)/3 <= w <= (2^2-1)/3, Am I right?

3. Yes, you need to bound $w$ to linearize the product $xw$, and yes, what you propose for the bounds looks reasonable.

7. Dear Prof. Paul, putting w=(2^y-1)/z gives me the following error:

Disciplined convex programming error:
Invalid constraint: {real affine} == {convex}

I am using the CVX solver

1. I don't use CVX, so I can only speculate here. It seems to think that the right side of your constraint is convex (it's not, unless you have constrained y >= 1) and is complaining that you are trying to set a (strictly) convex function equal to an affine function (f(w) = w is linear, hence affine).

Setting a linear/affine function equal to a strictly convex function creates a nonconvex feasible region, which is probably why CVX is complaining. The fact that your "convex" function is actually nonconvex if y is allowed to be less than 1 just reinforces the nonconvexity of the feasible region.

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.