There's a category of frequently asked questions that, at some point, boils down to how to model an absolute value in a linear (LP) or mixed-integer linear program (MILP). There is typically more than one answer, and I became curious how different methods perform. Having idle time on my hands, I decided to try a little experiment. Being lazy (and not having to worry about pacifying reviewers and editors -- hooray for blogs!), I decided to do just one experiment. Since this involves MILPs, let me start with a disclaimer.
There are no safe generalizations about MILPs other than this one: your mileage will vary, past performance is not a predictor of anything, and stuff happens.
(Hopefully that was sufficient. I should probably park that disclaimer in the margin of the blog so that it appears on all posts.)
Let's say that we have a variable (or variable expression) $x$ whose absolute value we need. Any trick for modeling absolute values will require $x$ to be bounded, say $L\le x\le U$. If $L\ge 0$ then $|x|=x$, and if $U\le 0$ then $|x|=-x$, so we will dispense with the boring cases and assume that $L<0<U$.
The most common way to model $|x|$ starts with writing $x$ as the difference of two nonnegative variables:
\begin{gather*}
x=x^{+}-x^{-}\\
0\le x^{+}\le U\\
0\le x^{-}\le|L|.
\end{gather*}It is then tempting to write $|x|=x^+ + x^-$, and in point of fact that comes close. What we can say is that\[|x|\le x^+ + x^-.\]If the net effect of inflating the claimed value of $|x|$ is to hurt the objective value, then the solver will force either $x^+$ or $x^-$ to be zero and the other one to be $|x|$. So this is all you need to do in some cases.
The problem is that if there is a possibility that inflating the estimate may benefit the objective, such inflation may happen. For instance, if we are maximizing $|x|$ and the correct answer is $x=a>0$ (so $|x|=a$), what we will actually get is $x^+=a+\delta$, $x^-=\delta$ with $\delta=\min\{|L|, U-a\}$ and $x^+ + x^- = a+2\delta$ rather than $a$.
Binary method
This brings me to the first method I know for modeling absolute values (outside of the case mentioned above where you get lucky): split the variable or expression into positive and negative parts as above, and add a binary variable to force one or the other to zero. Mathematically, this looks like \begin{gather*}
x=x^{+}-x^{-}\\
0\le x^{+}\le Uy\\
0\le x^{-}\le|L|(1-y)\\ y\in \{0,1\}.\end{gather*}Besides being the best-known approach, the binary method has the potential advantage that it contributes to bounding at nodes in the search tree. The smaller $U$ and $|L|$ are, the greater the value in bounding. Disadvantages include (a) adding another binary variable to the problem (MILP solution times tend to be sensitive to the number of integer variables) and (b) possibly causing numerical stability issues (discussed here, so I won't repeat the details) if $U$ and/or $|L|$ is large, since they are now constraint coefficients.
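To make the algebra concrete, here is a minimal sketch of the binary method in the CPLEX Java API. The bounds and variable names are mine (made up for illustration), not from any real model:

```java
import ilog.concert.IloException;
import ilog.concert.IloNumExpr;
import ilog.concert.IloNumVar;
import ilog.cplex.IloCplex;

public class AbsBinaryDemo {
    public static void main(String[] args) throws IloException {
        IloCplex cplex = new IloCplex();
        double L = -10, U = 25;  // hypothetical bounds with L < 0 < U
        IloNumVar x = cplex.numVar(L, U, "x");
        IloNumVar xPlus = cplex.numVar(0, U, "xPlus");
        IloNumVar xMinus = cplex.numVar(0, -L, "xMinus");
        IloNumVar y = cplex.boolVar("y");
        cplex.addEq(x, cplex.diff(xPlus, xMinus));               // x = x+ - x-
        cplex.addLe(xPlus, cplex.prod(U, y));                    // x+ <= U y
        cplex.addLe(xMinus, cplex.prod(-L, cplex.diff(1.0, y))); // x- <= |L| (1 - y)
        IloNumExpr absX = cplex.sum(xPlus, xMinus);              // equals |x|
        cplex.addMaximize(absX);                                 // e.g., maximize |x|
        cplex.solve();
        cplex.end();
    }
}
```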
SOS1 method
In the commentary following the Big-M post, Ed Klotz points out that an alternative to using a binary variable to zero out another variable (or not) is to use SOS1 constraints. Applied here, we skip the binary variable $y$ and just declare $\{x^+,x^-\}$ to be an ("an"? "a"? I'm so confused!) SOS1. This of course assumes the solver knows what to do with an SOS1. The upsides are that it avoids an additional binary variable and avoids any numerical stability problems. The downside is that the fact that one of $x^+$ and $x^-$ must be zero does not contribute directly to bounding. In effect, it becomes a branching rule. The solver knows that at some point it needs to branch and create two children with $x^+=0$ in one and $x^-=0$ in the other. So an SOS1 formulation may be "weaker" than a binary formulation (unless $U$ and $|L|$ are large).
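Here is a comparable sketch of the SOS1 variant in the Java API, written as a hypothetical helper (the method name and arguments are mine; it assumes an existing IloCplex instance):

```java
import ilog.concert.IloException;
import ilog.concert.IloNumExpr;
import ilog.concert.IloNumVar;
import ilog.cplex.IloCplex;

class AbsSOS1 {
    // Returns an expression equal to |x|, given bounds L < 0 < U on x.
    static IloNumExpr absViaSOS1(IloCplex cplex, IloNumVar x, double L, double U)
            throws IloException {
        IloNumVar xPlus = cplex.numVar(0, U, "xPlus");
        IloNumVar xMinus = cplex.numVar(0, -L, "xMinus");
        cplex.addEq(x, cplex.diff(xPlus, xMinus));  // x = x+ - x-
        // SOS1: at most one member may be nonzero; the weights merely
        // give the members distinct positions in the set.
        cplex.addSOS1(new IloNumVar[] {xPlus, xMinus}, new double[] {1.0, 2.0});
        return cplex.sum(xPlus, xMinus);            // equals |x|
    }
}
```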
SOS2 method
Thinking about the SOS1 approach made me consider using an SOS2. An SOS2 involves three or more variables in a specified order; the solver interprets the SOS2 constraint to mean that at most two of the variables can be nonzero, and those two must be consecutive in the given ordering. SOS2 is designed to model piecewise-linear functions, of which the absolute value function is one.
In this approach, we do not use $x^+$ and $x^-$. Instead, we introduce three weight variables $w^-,w^0,w^+$ that form the weights of a convex combination of $L$, $0$ and $U$. So the constraints are \begin{gather*}0\le w^+,w^-,w^0\le 1\\w^++w^-+w^0=1\\x=Lw^-+0w^0+Uw^+=Lw^-+Uw^+\end{gather*}with $\{w^-,w^0,w^+\}$ (in that order) an SOS2. With this approach, \[|x|=|L|w^-+Uw^+.\]The SOS2 method shares one of the virtues of the SOS1 method (no added integer variables) but not the other ($L$ and $U$ creep into constraint coefficients). I'm uncertain what its effect on bounding would be, but probably nothing useful. Consider, for instance, $U=100$ and $L=-100$. Any legal value of $x$ can be written as a convex combination of those two limits (i.e., with $w^0=0$), which would produce an "absolute value" of $100w^-+100w^+=100$ regardless of the actual value of $x$. This would violate the SOS2 constraint, but that constraint will be relaxed in node LPs until the solver decides to branch on it. (After branching, node LPs will contain the correct absolute value.)
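Again as a hedged sketch (helper name and arguments are mine), the SOS2 version in the Java API might look like this:

```java
import ilog.concert.IloException;
import ilog.concert.IloNumExpr;
import ilog.concert.IloNumVar;
import ilog.cplex.IloCplex;

class AbsSOS2 {
    // Returns an expression equal to |x|, given bounds L < 0 < U on x.
    static IloNumExpr absViaSOS2(IloCplex cplex, IloNumVar x, double L, double U)
            throws IloException {
        IloNumVar wMinus = cplex.numVar(0, 1, "wMinus");
        IloNumVar w0 = cplex.numVar(0, 1, "w0");
        IloNumVar wPlus = cplex.numVar(0, 1, "wPlus");
        // Convex combination of L, 0 and U: the weights sum to 1 ...
        cplex.addEq(cplex.sum(new IloNumVar[] {wMinus, w0, wPlus}), 1.0);
        // ... and x = L w- + 0 w0 + U w+.
        cplex.addEq(x, cplex.sum(cplex.prod(L, wMinus), cplex.prod(U, wPlus)));
        // SOS2 in the order w-, w0, w+: at most two nonzero, and consecutive.
        cplex.addSOS2(new IloNumVar[] {wMinus, w0, wPlus},
                      new double[] {1.0, 2.0, 3.0});
        return cplex.sum(cplex.prod(-L, wMinus), cplex.prod(U, wPlus)); // |L| w- + U w+
    }
}
```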
ABS method
As noted by Irv Lustig in the comments section, CPLEX contains an absolute value function. Irv believes that CPLEX converts it into an indicator constraint, which then (as I understand things) can either take the binary approach or be handled by a branching rule, at CPLEX's discretion. This would seem to be the most intuitive approach from a user's perspective, but (a) there's a small programming quirk that I'll mention below, (b) the user can't be entirely sure whether it translates into the equivalent of the binary method or the equivalent of the SOS1 method (which, as we'll see, seem to perform differently) and (c) if it's the equivalent of the binary method, the question then becomes whether the "big M" value that CPLEX deduces is tighter or looser than the one the user would employ when applying the binary approach directly.
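In code this is the simplest of the lot. A minimal sketch (the helper name and the cap are mine, invented for illustration):

```java
import ilog.concert.IloException;
import ilog.concert.IloNumExpr;
import ilog.cplex.IloCplex;

class AbsBuiltin {
    // Hypothetical helper: enforce |expr| <= cap using CPLEX's built-in abs().
    static void capAbs(IloCplex cplex, IloNumExpr expr, double cap) throws IloException {
        // cplex.abs(...) returns an IloNumExpr, *not* an IloLinearNumExpr --
        // which is the quirk discussed below when such terms need to be summed.
        cplex.addLe(cplex.abs(expr), cap);
    }
}
```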
Test problem
So I devised a little experiment. Let's start with a transportation problem involving $M$ sources and $N$ sinks:\begin{gather*}\textrm{minimize }c'x\\\textrm{s.t. }\sum_{i=1}^Mx_{ij}\ge d_j\quad\forall j\in \{1,\dots,N\}\\\sum_{j=1}^Nx_{ij}\le s_i\quad\forall i\in \{1,\dots,M\}\\0\le x_{ij}\le \min(s_i,d_j)\quad\forall i,j.\end{gather*}The $s_i$, $d_j$ and $c_{ij}$ are supplies, demands and unit flow costs respectively, while $x_{ij}$ is flow from source $i$ to sink $j$. Disdaining the convention of equality constraints, I prefer inequalities (essentially implying that leaving excess capacity unused is free) and assume that total supply is at least, rather than exactly, total demand ($\sum_i s_i\ge \sum_j d_j$). This reduces my fear of rounding errors. Also note that the flow variables have natural lower ($0$) and upper ($\min(s_i,d_j)$) bounds.
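For concreteness, here is a sketch of this base transportation LP in the CPLEX Java API. The data values are placeholders I made up, and the array names (s, d, c) are mine; only the model structure matters:

```java
import ilog.concert.IloException;
import ilog.concert.IloLinearNumExpr;
import ilog.concert.IloNumVar;
import ilog.cplex.IloCplex;

public class TransportLP {
    public static void main(String[] args) throws IloException {
        double[] s = {30, 40, 20, 50, 35};                     // supplies (made up)
        double[] d = {15, 20, 10, 25, 10, 20, 15, 10, 25, 20}; // demands (made up)
        double[][] c = new double[s.length][d.length];         // unit costs (fill in)
        IloCplex cplex = new IloCplex();
        IloNumVar[][] x = new IloNumVar[s.length][d.length];
        IloLinearNumExpr cost = cplex.linearNumExpr();
        for (int i = 0; i < s.length; i++) {
            for (int j = 0; j < d.length; j++) {
                // natural bounds: 0 <= x_ij <= min(s_i, d_j)
                x[i][j] = cplex.numVar(0, Math.min(s[i], d[j]), "x_" + i + "_" + j);
                cost.addTerm(c[i][j], x[i][j]);
            }
        }
        for (int j = 0; j < d.length; j++) {                   // meet each demand
            IloLinearNumExpr inflow = cplex.linearNumExpr();
            for (int i = 0; i < s.length; i++) inflow.addTerm(1.0, x[i][j]);
            cplex.addGe(inflow, d[j]);
        }
        for (int i = 0; i < s.length; i++) {                   // respect each supply
            IloLinearNumExpr outflow = cplex.linearNumExpr();
            for (int j = 0; j < d.length; j++) outflow.addTerm(1.0, x[i][j]);
            cplex.addLe(outflow, s[i]);
        }
        cplex.addMinimize(cost);
        cplex.solve();
        cplex.end();
    }
}
```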
Now suppose that we solve the model, get an optimal solution with objective value $C$, and decide that we want some variety or flexibility in our decisions. (The virtues of this in real life have been discussed on other blogs and forums; since I'm just cooking up an example, I won't try to justify myself here.) So we'll choose some fraction $\delta>0$ such that we are interested in exploring "near-optimal" solutions having cost $\le (1+\delta)C$, and look for some number $K$ of such solutions. To maximize the diversity of the solutions we get, we will maximize the minimum L1 distance between any two solutions. The L1 norm\[
\left\Vert x\right\Vert _{1}=\sum_{i}|x_{i}|
\]is one of two norms (along with\[
\left\Vert x\right\Vert _{\infty}=\max_{i}|x_{i}|,
\]the L∞ norm) that linearize, and so is a nice fit with an LP or MILP.
Adding a superscript $k$ to $x$ to indicate which candidate solution it is, our model becomes
\begin{gather*}
\textrm{maximize } z\\
\textrm{s.t. }\sum_{i=1}^M x_{ij}^k\ge d_j\quad\forall j\in
\{1,\dots,N\},k\in\{1,\dots,K\}\\
\sum_{j=1}^N x_{ij}^k\le s_i\quad\forall i\in
\{1,\dots,M\},k\in\{1,\dots,K\}\\
0\le x_{ij}^k\le \min(s_i,d_j)\quad\forall i,j,k\\
\sum_{ij}c_{ij}x_{ij}^k\le (1+\delta)C\quad\forall k\\
0\le z\le z_{hk}\quad\forall h,k\in\{1,\dots,K\},h<k\\
z_{hk}=\sum_{i,j}|x_{ij}^k-x_{ij}^h|\quad\forall h,k\in\{1,\dots,K\},h<k.\end{gather*}Here $z_{hk}$ is the L1 distance between solutions $h$ and $k$, and $z$ is the minimum distance between any pair of solutions. (As an aside, this formulation is not guaranteed to include the original optimal solution among the $K$ candidates; but you already know that one, since it was necessary to solve the original problem to find $C$.) The trick now is to linearize that last constraint.
Here's where the "small programming quirk" I mentioned above comes into play. Although you can apply IloCplexModeler.abs() to each difference $|x_{ij}^k-x_{ij}^h|$, you cannot directly add those expressions. [Oops! That's not correct. See update #3 below.] That's because IloCplexModeler.abs() returns an object of type IloNumExpr (numeric expression) rather than type IloLinearNumExpr (linear numeric expression), and only linear expressions may be combined into a larger linear expression (which is what you need for the right side of the last constraint). You cannot cast IloNumExpr to IloLinearNumExpr, which is plausible since casting an expression that is actually nonlinear to a linear expression could cause the sun to go nova. The workaround is easy but a bit tedious: assign each term of the last constraint to a variable. (Maybe Irv, or someone more familiar with CPLEX than I, can come up with a better approach.) [No sooner said than done: update #3 below.] Mathematically, the last constraint becomes\begin{gather*}z^{hk}_{ij}=|x_{ij}^k-x_{ij}^h|\quad\forall h<k,\forall i,j\\ z_{hk}=\sum_{i,j}z^{hk}_{ij}\quad\forall h,k\in\{1,\dots,K\},h<k.\end{gather*}The extra variables may not be a bad thing, though, since we get to bound them. If we write $U_{ij}$ for the common upper bound of $x^h_{ij}$ over all $h$, then we can bound the extra variables with \[z^{hk}_{ij}=|x_{ij}^k-x_{ij}^h|\le U_{ij}.\]Bounding individual terms in the sum can't hurt and possibly could help.
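A sketch of that workaround in the Java API, as a hypothetical helper (method and array names are mine): given the flow variables for solutions $h$ and $k$, it creates one bounded variable per term and returns the linear expression for $z_{hk}$.

```java
import ilog.concert.IloException;
import ilog.concert.IloLinearNumExpr;
import ilog.concert.IloNumVar;
import ilog.cplex.IloCplex;

class L1Distance {
    // Builds z_hk = sum_ij |x^k_ij - x^h_ij| with one bounded intermediate
    // variable per term; U[i][j] = min(s_i, d_j) is the common flow bound.
    static IloLinearNumExpr l1Distance(IloCplex cplex, IloNumVar[][] xh,
                                       IloNumVar[][] xk, double[][] U)
            throws IloException {
        IloLinearNumExpr dist = cplex.linearNumExpr();
        for (int i = 0; i < xh.length; i++) {
            for (int j = 0; j < xh[i].length; j++) {
                // z_ij = |x^k_ij - x^h_ij|, bounded above by U_ij
                IloNumVar zij = cplex.numVar(0, U[i][j]);
                cplex.addEq(zij, cplex.abs(cplex.diff(xk[i][j], xh[i][j])));
                dist.addTerm(1.0, zij);
            }
        }
        return dist;  // linear, so it can appear in further linear expressions
    }
}
```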
Results
I ran all four methods against my test problem using CPLEX 12.4 (Java API) with all parameters except the time limit at default values. I cut the runs off after 15 minutes (on a quad-core PC). The source code is available on request, but I don't know that it is very informative.
Problem dimensions were $M=5$ (sources), $N=10$ (sinks), $K=5$ (solutions), $\delta=0.05$ (accept a 5% increase in cost over the optimum). Coefficients $s$, $d$ and $c$ were generated randomly. The results are tabulated below. The value in the "Mean" column is the mean distance over all pairs of solutions. (As a sidebar, CPLEX was apparently so disgusted with the SOS1 results that it refused to print the gap, substituting "---" ... or maybe it just didn't fit.) Remember that the objective is the minimum distance between solutions, and we are maximizing it.
Method | Incumbent | Bound | Gap | Mean
--- | --- | --- | --- | ---
Binary | 1,051.8 | 1,278.9 | 21.6% | 1,071.6
SOS1 | 697.6 | 11,248.0 | 1,512% | 822.9
SOS2 | 858.3 | 6,318.8 | 636.2% | 912.8
ABS | 1,019.5 | 5,493.1 | 438.8% | 1,037.6
ABS2 | 1,038.2 | 5,724.7 | 451.4% | 1,062.2
My example has reasonably tight bounds ($L$ and $U$ above): \[-\min(s_i,d_j)\le x^h_{ij}-x^k_{ij}\le \min(s_i,d_j).\]I thought I might relax the bound a bit by changing $\min(s_i,d_j)$ to $\sum_i s_i$ (i.e., bounding every flow by the total supply in the system). The results after that modification were as follows:
Method | Incumbent | Bound | Gap | Mean
--- | --- | --- | --- | ---
Binary | 1,063.5 | 1,508.3 | 41.8% | 1,068.2
SOS1 | 512.2 | 133,089.6 | 25,883% | 615.8
SOS2 | 482.0 | 73,752.5 | 15,200% | 556.1
ABS | 1,020.9 | 7,663.2 | 650.6% | 1,060.6
ABS2 | 1,011.6 | 8,745.1 | 764.5% | 1,019.5
Generally speaking, looser bounds on the variables hurt all of the methods, although the incumbents for the binary and ABS methods were actually better with the loose bounds, even though the average distance between solutions, best bound and gap were consistently worse. (Have I mentioned that MILPs are spiteful little beasts?)
Conclusion (?)
Reread the disclaimer above before interpreting these results. Still, based on very limited testing (I did run the same problem with smaller dimensions, with consistent outcomes), it looks as if the improved bounding in the binary formulation outweighs the cost of the binary variables -- provided that $L$ and $U$ are reasonable. Using CPLEX's abs() method did better than either of the SOS approaches but not as well as providing your own binary variables. What I did not test here is a case where the bounds are large enough to cause the solver numerical indigestion. Should that happen, perhaps one of the SOS methods might be worth using (but which one?).
Update #1
Bo Jensen pointed me to the slides from a presentation he did on exploiting absolute values in LP solvers. (Bo's presentation is well worth viewing, although he is dealing with the case where absolute values are being minimized, and so no binary variables are needed.) It reminded me of another variation on the binary approach. Rather than splitting $x$, we add a new variable $z=|x|$ and reformulate as follows:\begin{gather*}x\le z\\-x\le z\\z\le x + 2|L|(1-y)\\z\le -x+2Uy\\0\le z\\y\in \{0,1\}.\end{gather*}(When $y=1$, the third constraint collapses to $z\le x$, which combines with the first two to force $z=x\ge 0$; when $y=0$, the fourth collapses to $z\le -x$, forcing $z=-x\ge 0$.) It uses the same number of variables (since, in my original binary formulation, $x$ can be eliminated) and more constraints, and the constraint coefficients containing the bounds are doubled, so I suspect it is inferior to the formulation above.
Update #2
Per Irv Lustig's suggestion in the comment section, I've added a fourth option, using CPLEX's built-in absolute value function (IloCplexModeler.abs() in the Java API).
Update #3
Roland Wunderling at IBM kindly (and gently) pointed out that you can add absolute values in the CPLEX APIs. In the Java API, you can put the running total in an instance of IloNumExpr (let's call it total). If cplex is the instance of IloCplex we're working with, then total = cplex.sum(total, cplex.abs(...)) can be used in a loop to assemble a sum of absolute values. Once the sum is complete, total can be used as one side of a constraint. This avoids the use of the intermediate variables for each absolute difference, but also removes the opportunity to assign each term a bound. I've added results for this variant to the table (labeled "ABS2"). In my very limited testing (have I mentioned the disclaimer above?), it had somewhat looser bounds/larger gap than the ABS approach, but neither method dominated the other when it came to incumbents.
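A sketch of that loop (again a hypothetical helper with my own names), mirroring the earlier helper but with no intermediate variables:

```java
import ilog.concert.IloException;
import ilog.concert.IloNumExpr;
import ilog.concert.IloNumVar;
import ilog.cplex.IloCplex;

class L1DistanceAbs2 {
    // Builds z_hk = sum_ij |x^k_ij - x^h_ij| directly as an IloNumExpr,
    // with no per-term variables (and hence no per-term bounds).
    static IloNumExpr l1Distance(IloCplex cplex, IloNumVar[][] xh, IloNumVar[][] xk)
            throws IloException {
        IloNumExpr total = cplex.constant(0.0);
        for (int i = 0; i < xh.length; i++) {
            for (int j = 0; j < xh[i].length; j++) {
                total = cplex.sum(total, cplex.abs(cplex.diff(xk[i][j], xh[i][j])));
            }
        }
        return total;  // usable as one side of a constraint
    }
}
```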
Update #4
IBM's Jean-Francois Puget posted an alternative approach for a special case ($x\in\{0,1,2\}$, $z=\left|x-1\right|\in\{0,1\}$) that readers might find interesting.
Related post: Absolute Value Inequalities