## Friday, July 6, 2012

### Modeling Absolute Values

There's a category of frequently asked questions that, at some point, boils down to how to model an absolute value in a linear (LP) or mixed-integer linear program (MILP). There is typically more than one answer, and I became curious how different methods perform. Having idle time on my hands, I decided to try a little experiment. Being lazy (and not having to worry about pacifying reviewers and editors -- hooray for blogs!), I decided to do just one experiment. Since this involves MILPs, let me start with a disclaimer.

There are no safe generalizations about MILPs other than this one.
Past performance is not a predictor of anything.
Stuff happens.

(Hopefully that was sufficient. I should probably park that disclaimer in the margin of the blog so that it appears on all posts.)

Let's say that we have a variable (or variable expression) $x$ whose absolute value we need. Any trick for modeling absolute values will require $x$ to be bounded, say $L\le x\le U$.  If $L\ge 0$ then $|x|=x$, and if $U\le 0$ then $|x|=-x$, so we will dispense with the boring cases and assume that $L<0<U$.

The most common way to model $|x|$ starts with writing $x$ as the difference of two nonnegative variables: \begin{gather*} x=x^{+}-x^{-}\\ 0\le x^{+}\le U\\ 0\le x^{-}\le|L|. \end{gather*}It is then tempting to write $|x|=x^+ + x^-$, and in point of fact that comes close. What we can say is that$|x|\le x^+ + x^-.$If the net effect of inflating the claimed value of $|x|$ is to hurt the objective value, then the solver will force either $x^+$ or $x^-$ to be zero and the other one to be $|x|$. So this is all you need to do in some cases.

The problem is that if there is a possibility that inflating the estimate may benefit the objective, such inflation may happen. For instance, if we are maximizing $|x|$ and the correct answer is $x=a>0$ (so $|x|=a$), what we will actually get is $x^+=a+\delta$, $x^-=\delta$ with $\delta=\min\{|L|, U-a\}$ and $x^+ + x^- = a+2\delta$ rather than $a$.

#### Binary method

This brings me to the first method I know for modeling absolute values (outside of the case mentioned above where you get lucky): split the variable or expression into positive and negative parts as above, and add a binary variable to force one or the other to zero. Mathematically, this looks like \begin{gather*} x=x^{+}-x^{-}\\ 0\le x^{+}\le Uy\\ 0\le x^{-}\le|L|(1-y)\\ y\in \{0,1\}.\end{gather*}Besides the fact that it is the best known approach, a potential advantage of the binary method is that it contributes to bounding at nodes in the search tree. The smaller $U$ and $|L|$ are, the greater the value in bounding. Disadvantages include (a) adding another binary variable to the problem (MILP solution times tend to be sensitive to the number of integer variables) and (b) possibly causing numerical stability issues (discussed here, so I won't repeat the details) if $U$ and/or $|L|$ is large, since they are now constraint coefficients.

#### SOS1 method

In the commentary following the Big-M post, Ed Klotz points out that an alternative to using a binary variable to zero out another variable (or not) is to use SOS1 constraints.  Applied here, we skip the binary variable $y$ and just declare $\{x^+,x^-\}$ to be an ("an"? "a"? I'm so confused!) SOS1. This of course assumes the solver knows what to do with an SOS1. The upsides of this are that it avoids an additional binary variable, and avoids any numerical stability problems. The downside is that the fact that one of $x^+$ and $x^-$ must be zero does not contribute directly to bounding. In effect, it becomes a branching rule. The solver knows that at some point it needs to branch and create two children with $x^+=0$ in one and $x^-=0$ in the other. So an SOS1 formulation may be "weaker" than a binary formulation (unless $U$ and $|L|$ are large).

#### SOS2 method

Thinking about the SOS1 approach made me consider using an SOS2. An SOS2 involves three or more variables in a specified order; the solver interprets the SOS2 constraint to mean that at most two of the variables can be nonnegative, and those two must be consecutive in the given ordering. SOS2 is designed to model piecewise-linear functions, of which the absolute value function is one.

In this approach, we do not use $x^+$ and $x^-$. Instead, we introduce three weight variables $w^+,w^-,w^0$ that form the weights of a convex combination of $U$, $L$ and $0$. So the constraints are \begin{gather*}0\le w^+,w^-,w^0\le 1\\w^++w^-+w^0=1\\x=Lw^-+0w^0+Uw^+=Lw^-+Uw^+\end{gather*}with $\{w^-,w^0,w^+\}$ (in that order) an SOS2. With this approach, $|x|=|L|w^-+Uw^+.$The SOS2 method shares one of the virtues of the SOS1 method (no added integer variables) but not the other ($L$ and $U$ creep into constraint coefficients). I'm uncertain what its effect on bounding would be, but probably nothing useful. Consider, for instance, $U=100$ and $L=-100$. Any legal value of $x$ can be written as a convex combination of those two limits (i.e., with $w^0=0$), which would produce an "absolute value" of $100w^-+100w^+=100$ regardless of the actual value of $x$. This would violate the SOS2 constraint, but that constraint will be relaxed in node LPs until the solver decides to branch on it. (After branching, node LPs will contain the correct absolute value.)

#### ABS method

As noted by Irv Lustig in the comments section, CPLEX contains an absolute value function. Irv believes that CPLEX converts it into an indicator constraint, which then (as I understand things) can either take the binary approach or be handled by a branching rule, at CPLEX's discretion. This would seem to be the most intuitive approach from a user's perspective, but (a) there's a small programming quirk that I'll mention below, (b) the user can't be entirely sure whether it translates into the equivalent of the binary method or the equivalent of the SOS1 method (which, as we'll see, seem to perform differently) and (c) if it's the equivalent of the binary method, the question then becomes whether the "big M" value that CPLEX deduces is tighter or looser than the one the user would employ when applying the binary approach directly.

#### Test problem

So I devised a little experiment. Let's start with a transportation problem involving $M$ sources and $N$ sinks:\begin{gather*}\textrm{minimize }c'x\\\textrm{s.t. }\sum_{i=1}^Mx_{ij}\ge d_j\quad\forall j\in \{1,\dots,N\}\\\sum_{j=1}^Nx_{ij}\le s_i\quad\forall i\in \{1,\dots,M\}\\0\le x_{ij}\le \min(s_i,d_j)\quad\forall i,j.\end{gather*}The $s_i$, $d_j$ and $c_{ij}$ are supplies, demands and unit flow costs respectively, while $x_{ij}$ is flow from source $i$ to sink $j$. Disdaining the convention of equation constraints, I prefer inequalities (essentially implying that disuse of excess capacity is free) and assume that total supply is at least rather than exactly total demand ($\sum_i s_i\ge \sum_j d_j$). This reduces my fear of rounding errors. Also note that the flow variables have natural lower ($0$) and upper ($\min(s_i,d_j)$) bounds.

Now suppose that we solve the model, get an optimal solution with objective value $C$, and decide that we want some variety or flexibility in our decisions. (The virtues of this in real life have been discussed on other blogs and forums; since I'm just cooking up an example, I won't try to justify myself here.) So we'll choose some fraction $\delta>0$ such that we are interested in exploring "near-optimal" solutions having cost $\le (1+\delta)C$, and look for some number $K$ of such solutions. To maximize the diversity of the solutions we get, we will maximize the minimum L1 distance between any two solutions. The L1 norm$\left\Vert x\right\Vert _{1}=\sum_{i}|x_{i}|$is one of two norms (along with$\left\Vert x\right\Vert _{0}=\max_{i}|x_{i}|,$the L0 norm) that linearizes, and so is a nice fit with an LP or MILP.

Adding a superscript $k$ to $x$ to indicate which candidate solution it is, our model becomes
\begin{gather*}
\textrm{maximize } z\\
\textrm{s.t. }\sum_{i=1}^M x_{ij}^k\ge d_j\quad\forall j\in \{1,\dots,N\},k\in\{1,\dots,K\}\\
z_{hk}=\sum_{i,j}|x_{ij}^k-x_{ij}^h|\quad\forall h,k\in\{1,\dots,K\},h<k.\end{gather*}Here $z_{hk}$ is the L1 distance between solutions $h$ and $k$, and $z$ is the minimum distance between any pair of solutions. (As an aside, this formulation is not guaranteed to produce the optimal solution; but you already know that one, since it was necessary to solve the original problem to find $C$.) The trick now is to linearize that last constraint.

Here's where the "small programming  quirk" I mentioned above comes into play. Although you can apply IloCplexModeler.abs() to each difference $|x_{ij}^k-x_{ij}^h|$, you cannot directly add those expressions. [Oops! That's not correct. See update #3 below.] That's because IloCplexModeler.abs() returns an object of type IloNumExpr (numeric expression) rather than type IloLinearNumExpr (linear numeric expression), and only linear expressions may be combined into a larger linear expression (which is what you need for the right side of the last constraint). You cannot cast IloNumExpr to IloLinearNumExpr, which is plausible since casting an expression that is actually nonlinear to a linear expression could cause the sun to go nova. The workaround is easy but a bit tedious: assign each term of the last constraint to a variable. (Maybe Irv, or someone more familiar with CPLEX than I, can come up with a better approach.) [No sooner said than done: update #3 below.] Mathematically, the last constraint becomes\begin{gather*}z^{hk}_{ij}=|x_{ij}^k-x_{ij}^h|\quad\forall h<k,\forall i,j\\ z_{hk}=\sum_{i,j}z^{hk}_{ij}\quad\forall h,k\in\{1,\dots,K\},h<k.\end{gather*}The extra variables may not be a bad thing, though, since we get to bound them. If we write $U_{ij}$ for the upper bound of $x^h_{ij}$ for every $h,i,j$, then we can bound the extra variables with $z^{hk}_{ij}=|x_{ij}^k-x_{ij}^h|\le \max(U^h_{ij},U^k_{ij}).$Bounding individual terms in the sum can't hurt and possibly could help.

#### Results

I ran all four methods against my test problem using CPLEX 12.4 (Java API) with all parameters except the time limit at default values. I cut the runs off after 15 minutes (on a quad-core PC). The source code is available on request, but I don't know that it is very informative.

Problem dimensions were $M=5$ (sources), $N=10$ (sinks), $K=5$ (solutions), $\delta=0.05$ (accept a 5% increase in cost over the optimum). Coefficients $s$, $d$ and $c$ were generated randomly. The results are tabulated below. The value in the "Mean" column is the mean distance between any pair of solutions. (As a sidebar, CPLEX was apparently so disgusted with the SOS1 results that it refused to print the gap, substituting "---" ... or maybe it just didn't fit.) Remember that the objective is minimum distance between solutions, and we are maximizing it.

MethodIncumbentBoundGapMean
Binary1,051.81,278.921.6%1,071.6
SOS1697.611,248.01,512%822.9
SOS2858.36,318.8636.2%912.8
ABS1,019.55,493.1438.8%1,037.6
ABS21,038.25,724.7451.4%1,062.2

My example has reasonably tight bounds ($L$ and $U$ above): $-\min(s_i,d_j)\le x^h_{ij}-x^k_{ij}\le \min(s_i,d_j).$I thought I might relax the bound a bit by changing $\min(s_i,d_j)$ to $\sum_i s_i$ (i.e., bounding every flow by the total supply in the system). The results after that modification were as follows:

MethodIncumbentBoundGapMean
Binary1,063.51,508.341.8%1,068.2
SOS1512.2133,089.625,883%615.8
SOS2482.073,752.515,200%556.1
ABS1,020.97,663.2650.6%1,060.6
ABS21,011.68,745.1764.5%1,019.5

Generally speaking, looser bounds on the variables hurt all four methods, although the incumbents for the binary and ABS methods were actually better with the loose bounds even though the average distance between solutions, best bound and gap were consistently worse.  (Have I mentioned that MILPs are spiteful little beasts?)

#### Conclusion (?)

Reread the disclaimer above before interpreting these results. Still, based on very limited testing (I did run the same problem with smaller dimensions, with consistent outcomes), it looks as if the improved bounding in the binary formulation outweighs the cost of the binary variables -- provided that $L$ and $U$ are reasonable.  Using CPLEX's abs() method did better than either of the SOS approaches but not as well as providing your own binary variables. What I did not test here is a case where the bounds are large enough to cause the solver numerical indigestion.  Should that happen, perhaps one of the SOS methods might be worth using (but which one?).

#### Update #1

Bo Jensen pointed me to the slides from a presentation he did on exploiting absolute values in LP solvers. (Bo's presentation is well worth viewing, although he is dealing with the case where absolute values are being minimized, and so no binary variables are needed.) It reminded me of another variation on the binary approach. Rather than splitting $x$, we add a new variable $z=|x|$ and reformulate as follows:\begin{gather*}x\le z\\-x\le z\\z\le x + 2|L|(1-y)\\z\le -x+2Uy\\0\le z\\y\in \{0,1\}.\end{gather*}It uses the same number of variables (since, in my original binary formulation, $x$ can be eliminated) and more constraints, and the constraint coefficients containing the bounds are doubled, so I suspect it is inferior to the formulation above.

#### Update #2

Per Irv Lustig's suggestion in the comment section, I've added a fourth option, using CPLEX's built in absolute value function (IloCplexModeler.abs() in the Java API).

#### Update #3

Roland Wunderling at IBM kindly (and gently) pointed out that you can add absolute values in the CPLEX APIs. In the Java API, you can put the running total in an instance of IloNumExpr (let's call it total). If cplex is the instance of IloCplex we're working with, then total = cplex.sum(total, cplex.abs(...)) can be used in a loop to assemble a sum of absolute values. Once the sum is complete, total can be used as one side of a constraint. This avoids the use of the intermediate variables for each absolute difference, but also removes the opportunity to assign each term a bound. I've added results for this variant to the table (labeled "ABS2"). In my very limited testing (have I mentioned the disclaimer above?), it had somewhat looser bounds/larger gap than the ABS approach, but neither method dominated the other when it came to incumbents.

Update #4

IBM's Jean-Francois Puget posted an alternative approach for a special case ($x\in\{0,1,2\}$, $z=\left|x-1\right|\in\{0,1\}$) that readers might find interesting.

Related post: Absolute Value Inequalities

1. It has always seemed to me that this is something that the solver should handle. Solvers that split free variables already have to take similar steps to control growth, right?

In CVX I actually model the absolute value function using a second-order cone! In other words, $|x|\le y$ becomes $\|x\|\le y$. Of course when it's time to transform the problem to a solver standard form I can convert these degenerate SOC inequalities to linear constraints. But if a solver had a special facility to handle absolute values, then I've preserved enough problem structure to take advantage of it.

1. I'm not sure whether I would put the onus on modeling languages or solvers, but I can see an argument for building it into solvers. I agree that one or the other should.

Splitting free variables may be a bit different. You can split $n$ free variables into $n+1$ nonnegative variables. (I don't know if any solvers do that -- solvers are pretty much black boxes to me -- but I know that using a single additional variable is an option.) With absolute values, I think each expression has to be handled independently of any other absolute values.

Your comment got me thinking about a parallel to indicator constraints, though. Like absolute values, you can convert indicators into either "big M" type constraints (danger of loose bounds, numerical instability) or branching logic (even weaker formulation, but no numerical issues). Apparently CPLEX supports both approaches and internally chooses between them. So I guess it makes sense that a solver could support absolute values in model inputs and then internally choose either the binary method or one of the SOS methods.

2. My talk is not completely in same scope as your blog post, but I think the observations and conclusions are interesting. They affects most modeling choices with abs, which at some points need to be solved with a simplex algorithm.

The interesting part is that even very simple improvements, which we can easily do by inspecting the solution, may not be straight forward for the simplex algorithm.

Since LP is solved as relaxation in MIP, then it's also relevant for MIP.

I am not sure abs should be supported by solvers, but one argument could be the solver then does not need to search for such patterns.

1. The graph on slide 21 would, I think, not be easy to produce when absolute values occur in constraints and larger absolute values benefit the objective (net, if not directly). There may still be room for algorithmic improvements, but I suspect they will be more complicated than what you showed (which was complicated enough for me).

Based on our email conversation, I think we're agreed that wherever the support for absolute values falls, ideally it should not be on the users.

3. A solver does not have to support the abs() function, actually. It just has to support its epigraph, which is the set $|x|\leq z$. That's just a degenerate second-order cone. In fact, Gurobi now supports second-order-cone constraints, and I have successfully passed it absolute values in this way.

Of course, nothing prevents Gurobi or other second-order solvers from reformulating this second-order-cone constraint into an equivalent set of linear constraints. Indeed, I would suspect that for performance reasons it _should_ do so. But it now has the freedom to implement it in whatever way is most efficient given the solver's internal design. Indeed, the answer could be model dependent: if the model has other binary or integer variables, for instance, it could choose an MILP formulation; otherwise, it could split the variables or use a formulation like Bo's.

1. If you want to handle $|x|\le y$, you just need solver support for the epigraph. But what if you want to handle $|x|+|y|=|z|$?

I'm looking at this from the (somewhat naive) user's perspective. Almost all users get that transcendental functions don't mix with LPs and MILPs. They're a bit less discerning about which quadratic functions are/are not supported, and where. My sense from various forums is that a lot of them think that because absolute value is common/low order/piecewise linear, they should be able to use it freely.

If you say "no absolute values", they may internalize that; but if you say "you can use absolute values, but only in constraints that look like $|x|\le y$", they may not internalize the limitation ... and then your support people start buying aspirin in the big bottles. As the poet Paul Simon said, "a man hears what he wants to hear, and disregards the rest." (http://www.lyricsfreak.com/s/simon+and+garfunkel/the+boxer_20124664.html)

2. Good point, Paul, I was thinking about just the convex case. Blast that tunnel vision!

But in fact I've thought about the extension of my convex transformation ideas to non-convex cases, and other people have, too. For absolute values there are three cases to consider: (1) $|x|\leq y$, (2) $|x|\geq y$, (3) $|x|=y$.

Other variations follow from these. For instance, $|x|+|y|=|z|$ is just an extension of case 3. For a nonlinear composition like $f(||x|) \leq y$, it depends on the monotonicity of $f$: it's case 1 if $f$ is nondecreasing, case 2 if $f$ is nonincreasing, and case 3 if $f$ is nonmontonic. If you don't know the monotonicity of $f$, stick with case 3.

Yeah it gets complicated, but as a modeler---or a modeling framework---you can always truncate that complexity wherever you wish by defaulting to case 3 for any situation you don't handle.

3. Michael,

I agree with your points here (particularly the last paragraph). I also suspect that there will be cases where the user can find a tighter formulation than whatever the modeling framework uses, just as (at least on my one test problem here) explicit introduction of binaries seems to beat CPLEX's built-in absolute value support. That said, I do think it would be best if, to the extent possible, a modeling language/API/system provided some support for absolute values, so that users who don't know how to model them (or don't have any problem-specific intuition better than whatever the modeling system's internal logic provides) can use absolute values without sweating over them.

It would also help if there were some way to convey to users that absolute values come at a cost. I suspect some users think they can use absolute values willy-nilly and still have LP-like performance.

4. Convergence! :P

4. CPLEX does support modeling absolute value in the Concert API's and in OPL. I did a little experiment and it seems that CPLEX turns it into the binary model, but with indicator constraints. Paul - it would be interesting if you repeated your experiment using the IloCplexModeler.abs method from the Java API.

1. Your wish is my command (as long as no large hats are involved). I added a fourth method above. It did better than either SOS method, not as well as explicit binaries.

If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on OR-Exchange.