OR in an OB World: Modeling Absolute Values

Friday, July 6, 2012

Modeling Absolute Values

There's a category of frequently asked questions that, at some point, boils down to how to model an absolute value in a linear (LP) or mixed-integer linear program (MILP). There is typically more than one answer, and I became curious how different methods perform. Having idle time on my hands, I decided to try a little experiment. Being lazy (and not having to worry about pacifying reviewers and editors -- hooray for blogs!), I decided to do just one experiment. Since this involves MILPs, let me start with a disclaimer.

There are no safe generalizations about MILPs other than this one.
Your mileage will vary.
Past performance is not a predictor of anything.
Stuff happens.

(Hopefully that was sufficient. I should probably park that disclaimer in the margin of the blog so that it appears on all posts.)

Let's say that we have a variable (or variable expression) $x$ whose absolute value we need. Any trick for modeling absolute values will require $x$ to be bounded, say $L\le x\le U$. If $L\ge 0$ then $|x|=x$, and if $U\le 0$ then $|x|=-x$, so we will dispense with the boring cases and assume that $L<0<U$.

The most common way to model $|x|$ starts with writing $x$ as the difference of two nonnegative variables: \begin{gather*} x=x^{+}-x^{-}\\ 0\le x^{+}\le U\\ 0\le x^{-}\le|L|. \end{gather*}It is then tempting to write $|x|=x^+ + x^-$, and in point of fact that comes close. What we can say is that\[|x|\le x^+ + x^-.\]If the net effect of inflating the claimed value of $|x|$ is to hurt the objective value, then the solver will force either $x^+$ or $x^-$ to be zero and the other one to be $|x|$. So this is all you need to do in some cases.

The problem is that if there is a possibility that inflating the estimate may benefit the objective, such inflation may happen. For instance, if we are maximizing $|x|$ and the correct answer is $x=a>0$ (so $|x|=a$), what we will actually get is $x^+=a+\delta$, $x^-=\delta$ with $\delta=\min\{|L|, U-a\}$ and $x^+ + x^- = a+2\delta$ rather than $a$.

Binary method

This brings me to the first method I know for modeling absolute values (outside of the case mentioned above where you get lucky): split the variable or expression into positive and negative parts as above, and add a binary variable to force one or the other to zero. Mathematically, this looks like \begin{gather*} x=x^{+}-x^{-}\\ 0\le x^{+}\le Uy\\ 0\le x^{-}\le|L|(1-y)\\ y\in \{0,1\}.\end{gather*}Besides the fact that it is the best known approach, a potential advantage of the binary method is that it contributes to bounding at nodes in the search tree. The smaller $U$ and $|L|$ are, the greater the value in bounding. Disadvantages include (a) adding another binary variable to the problem (MILP solution times tend to be sensitive to the number of integer variables) and (b) possibly causing numerical stability issues (discussed here, so I won't repeat the details) if $U$ and/or $|L|$ is large, since they are now constraint coefficients.

SOS1 method

In the commentary following the Big-M post, Ed Klotz points out that an alternative to using a binary variable to zero out another variable (or not) is to use SOS1 constraints. Applied here, we skip the binary variable $y$ and just declare $\{x^+,x^-\}$ to be an ("an"? "a"? I'm so confused!) SOS1. This of course assumes the solver knows what to do with an SOS1. The upsides of this are that it avoids an additional binary variable, and avoids any numerical stability problems. The downside is that the fact that one of $x^+$ and $x^-$ must be zero does not contribute directly to bounding. In effect, it becomes a branching rule. The solver knows that at some point it needs to branch and create two children with $x^+=0$ in one and $x^-=0$ in the other. So an SOS1 formulation may be "weaker" than a binary formulation (unless $U$ and $|L|$ are large).

SOS2 method

Thinking about the SOS1 approach made me consider using an SOS2. An SOS2 involves three or more variables in a specified order; the solver interprets the SOS2 constraint to mean that at most two of the variables can be nonnegative, and those two must be consecutive in the given ordering. SOS2 is designed to model piecewise-linear functions, of which the absolute value function is one.

In this approach, we do not use $x^+$ and $x^-$. Instead, we introduce three weight variables $w^+,w^-,w^0$ that form the weights of a convex combination of $U$, $L$ and $0$. So the constraints are \begin{gather*}0\le w^+,w^-,w^0\le 1\\w^++w^-+w^0=1\\x=Lw^-+0w^0+Uw^+=Lw^-+Uw^+\end{gather*}with $\{w^-,w^0,w^+\}$ (in that order) an SOS2. With this approach, \[|x|=|L|w^-+Uw^+.\]The SOS2 method shares one of the virtues of the SOS1 method (no added integer variables) but not the other ($L$ and $U$ creep into constraint coefficients). I'm uncertain what its effect on bounding would be, but probably nothing useful. Consider, for instance, $U=100$ and $L=-100$. Any legal value of $x$ can be written as a convex combination of those two limits (i.e., with $w^0=0$), which would produce an "absolute value" of $100w^-+100w^+=100$ regardless of the actual value of $x$. This would violate the SOS2 constraint, but that constraint will be relaxed in node LPs until the solver decides to branch on it. (After branching, node LPs will contain the correct absolute value.)

ABS method

As noted by Irv Lustig in the comments section, CPLEX contains an absolute value function. Irv believes that CPLEX converts it into an indicator constraint, which then (as I understand things) can either take the binary approach or be handled by a branching rule, at CPLEX's discretion. This would seem to be the most intuitive approach from a user's perspective, but (a) there's a small programming quirk that I'll mention below, (b) the user can't be entirely sure whether it translates into the equivalent of the binary method or the equivalent of the SOS1 method (which, as we'll see, seem to perform differently) and (c) if it's the equivalent of the binary method, the question then becomes whether the "big M" value that CPLEX deduces is tighter or looser than the one the user would employ when applying the binary approach directly.

Test problem

So I devised a little experiment. Let's start with a transportation problem involving $M$ sources and $N$ sinks:\begin{gather*}\textrm{minimize }c'x\\\textrm{s.t. }\sum_{i=1}^Mx_{ij}\ge d_j\quad\forall j\in \{1,\dots,N\}\\\sum_{j=1}^Nx_{ij}\le s_i\quad\forall i\in \{1,\dots,M\}\\0\le x_{ij}\le \min(s_i,d_j)\quad\forall i,j.\end{gather*}The $s_i$, $d_j$ and $c_{ij}$ are supplies, demands and unit flow costs respectively, while $x_{ij}$ is flow from source $i$ to sink $j$. Disdaining the convention of equation constraints, I prefer inequalities (essentially implying that disuse of excess capacity is free) and assume that total supply is at least rather than exactly total demand ($\sum_i s_i\ge \sum_j d_j$). This reduces my fear of rounding errors. Also note that the flow variables have natural lower ($0$) and upper ($\min(s_i,d_j)$) bounds.

Now suppose that we solve the model, get an optimal solution with objective value $C$, and decide that we want some variety or flexibility in our decisions. (The virtues of this in real life have been discussed on other blogs and forums; since I'm just cooking up an example, I won't try to justify myself here.) So we'll choose some fraction $\delta>0$ such that we are interested in exploring "near-optimal" solutions having cost $\le (1+\delta)C$, and look for some number $K$ of such solutions. To maximize the diversity of the solutions we get, we will maximize the minimum L1 distance between any two solutions. The L1 norm\[ \left\Vert x\right\Vert _{1}=\sum_{i}|x_{i}| \]is one of two norms (along with\[ \left\Vert x\right\Vert _{0}=\max_{i}|x_{i}|, \]the L0 norm) that linearizes, and so is a nice fit with an LP or MILP.

Adding a superscript $k$ to $x$ to indicate which candidate solution it is, our model becomes
\begin{gather*}
\textrm{maximize } z\\
\textrm{s.t. }\sum_{i=1}^M x_{ij}^k\ge d_j\quad\forall j\in \{1,\dots,N\},k\in\{1,\dots,K\}\\
\sum_{j=1}^N x_{ij}^k\le s_i\quad\forall i\in \{1,\dots,M\},k\in\{1,\dots,K\}\\
0\le x_{ij}^k\le \min(s_i,d_j)\quad\forall i,j,k\\
\sum_{ij}c_{ij}x_{ij}^k\le (1+\delta)C\quad\forall k\\
0\le z\le z_{hk}\quad\forall h,k\in\{1,\dots,K\},h<k\\
z_{hk}=\sum_{i,j}|x_{ij}^k-x_{ij}^h|\quad\forall h,k\in\{1,\dots,K\},h<k.\end{gather*}Here $z_{hk}$ is the L1 distance between solutions $h$ and $k$, and $z$ is the minimum distance between any pair of solutions. (As an aside, this formulation is not guaranteed to produce the optimal solution; but you already know that one, since it was necessary to solve the original problem to find $C$.) The trick now is to linearize that last constraint.

Here's where the "small programming quirk" I mentioned above comes into play. Although you can apply IloCplexModeler.abs() to each difference $|x_{ij}^k-x_{ij}^h|$, you cannot directly add those expressions. [Oops! That's not correct. See update #3 below.] That's because IloCplexModeler.abs() returns an object of type IloNumExpr (numeric expression) rather than type IloLinearNumExpr (linear numeric expression), and only linear expressions may be combined into a larger linear expression (which is what you need for the right side of the last constraint). You cannot cast IloNumExpr to IloLinearNumExpr, which is plausible since casting an expression that is actually nonlinear to a linear expression could cause the sun to go nova. The workaround is easy but a bit tedious: assign each term of the last constraint to a variable. (Maybe Irv, or someone more familiar with CPLEX than I, can come up with a better approach.) [No sooner said than done: update #3 below.] Mathematically, the last constraint becomes\begin{gather*}z^{hk}_{ij}=|x_{ij}^k-x_{ij}^h|\quad\forall h<k,\forall i,j\\ z_{hk}=\sum_{i,j}z^{hk}_{ij}\quad\forall h,k\in\{1,\dots,K\},h<k.\end{gather*}The extra variables may not be a bad thing, though, since we get to bound them. If we write $U_{ij}$ for the upper bound of $x^h_{ij}$ for every $h,i,j$, then we can bound the extra variables with \[z^{hk}_{ij}=|x_{ij}^k-x_{ij}^h|\le \max(U^h_{ij},U^k_{ij}).\]Bounding individual terms in the sum can't hurt and possibly could help.

Results

I ran all four methods against my test problem using CPLEX 12.4 (Java API) with all parameters except the time limit at default values. I cut the runs off after 15 minutes (on a quad-core PC). The source code is available on request, but I don't know that it is very informative.

Problem dimensions were $M=5$ (sources), $N=10$ (sinks), $K=5$ (solutions), $\delta=0.05$ (accept a 5% increase in cost over the optimum). Coefficients $s$, $d$ and $c$ were generated randomly. The results are tabulated below. The value in the "Mean" column is the mean distance between any pair of solutions. (As a sidebar, CPLEX was apparently so disgusted with the SOS1 results that it refused to print the gap, substituting "---" ... or maybe it just didn't fit.) Remember that the objective is minimum distance between solutions, and we are maximizing it.

Method	Incumbent	Bound	Gap	Mean
Binary	1,051.8	1,278.9	21.6%	1,071.6
SOS1	697.6	11,248.0	1,512%	822.9
SOS2	858.3	6,318.8	636.2%	912.8
ABS	1,019.5	5,493.1	438.8%	1,037.6
ABS2	1,038.2	5,724.7	451.4%	1,062.2

My example has reasonably tight bounds ($L$ and $U$ above): \[-\min(s_i,d_j)\le x^h_{ij}-x^k_{ij}\le \min(s_i,d_j).\]I thought I might relax the bound a bit by changing $\min(s_i,d_j)$ to $\sum_i s_i$ (i.e., bounding every flow by the total supply in the system). The results after that modification were as follows:

Method	Incumbent	Bound	Gap	Mean
Binary	1,063.5	1,508.3	41.8%	1,068.2
SOS1	512.2	133,089.6	25,883%	615.8
SOS2	482.0	73,752.5	15,200%	556.1
ABS	1,020.9	7,663.2	650.6%	1,060.6
ABS2	1,011.6	8,745.1	764.5%	1,019.5

Generally speaking, looser bounds on the variables hurt all four methods, although the incumbents for the binary and ABS methods were actually better with the loose bounds even though the average distance between solutions, best bound and gap were consistently worse. (Have I mentioned that MILPs are spiteful little beasts?)

Conclusion (?)

Reread the disclaimer above before interpreting these results. Still, based on very limited testing (I did run the same problem with smaller dimensions, with consistent outcomes), it looks as if the improved bounding in the binary formulation outweighs the cost of the binary variables -- provided that $L$ and $U$ are reasonable. Using CPLEX's abs() method did better than either of the SOS approaches but not as well as providing your own binary variables. What I did not test here is a case where the bounds are large enough to cause the solver numerical indigestion. Should that happen, perhaps one of the SOS methods might be worth using (but which one?).

Update #1

Bo Jensen pointed me to the slides from a presentation he did on exploiting absolute values in LP solvers. (Bo's presentation is well worth viewing, although he is dealing with the case where absolute values are being minimized, and so no binary variables are needed.) It reminded me of another variation on the binary approach. Rather than splitting $x$, we add a new variable $z=|x|$ and reformulate as follows:\begin{gather*}x\le z\\-x\le z\\z\le x + 2|L|(1-y)\\z\le -x+2Uy\\0\le z\\y\in \{0,1\}.\end{gather*}It uses the same number of variables (since, in my original binary formulation, $x$ can be eliminated) and more constraints, and the constraint coefficients containing the bounds are doubled, so I suspect it is inferior to the formulation above.

Update #2

Per Irv Lustig's suggestion in the comment section, I've added a fourth option, using CPLEX's built in absolute value function (IloCplexModeler.abs() in the Java API).

Update #3

Roland Wunderling at IBM kindly (and gently) pointed out that you can add absolute values in the CPLEX APIs. In the Java API, you can put the running total in an instance of IloNumExpr (let's call it total). If cplex is the instance of IloCplex we're working with, then total = cplex.sum(total, cplex.abs(...)) can be used in a loop to assemble a sum of absolute values. Once the sum is complete, total can be used as one side of a constraint. This avoids the use of the intermediate variables for each absolute difference, but also removes the opportunity to assign each term a bound. I've added results for this variant to the table (labeled "ABS2"). In my very limited testing (have I mentioned the disclaimer above?), it had somewhat looser bounds/larger gap than the ABS approach, but neither method dominated the other when it came to incumbents.

Update #4

IBM's Jean-Francois Puget posted an alternative approach for a special case ($x\in\{0,1,2\}$, $z=\left|x-1\right|\in\{0,1\}$) that readers might find interesting.

Related post: Absolute Value Inequalities

11 comments:

UnknownJuly 7, 2012 at 10:26 AM
It has always seemed to me that this is something that the solver should handle. Solvers that split free variables already have to take similar steps to control growth, right?

In CVX I actually model the absolute value function using a second-order cone! In other words, $|x|\le y$ becomes $\|x\|\le y$. Of course when it's time to transform the problem to a solver standard form I can convert these degenerate SOC inequalities to linear constraints. But if a solver had a special facility to handle absolute values, then I've preserved enough problem structure to take advantage of it.
ReplyDelete
Replies
Bo JensenJuly 7, 2012 at 12:36 PM
My talk is not completely in same scope as your blog post, but I think the observations and conclusions are interesting. They affects most modeling choices with abs, which at some points need to be solved with a simplex algorithm.

The interesting part is that even very simple improvements, which we can easily do by inspecting the solution, may not be straight forward for the simplex algorithm.

Since LP is solved as relaxation in MIP, then it's also relevant for MIP.

I am not sure abs should be supported by solvers, but one argument could be the solver then does not need to search for such patterns.
ReplyDelete
Replies
UnknownJuly 7, 2012 at 3:47 PM
A solver does not have to support the abs() function, actually. It just has to support its epigraph, which is the set $|x|\leq z$. That's just a degenerate second-order cone. In fact, Gurobi now supports second-order-cone constraints, and I have successfully passed it absolute values in this way.

Of course, nothing prevents Gurobi or other second-order solvers from reformulating this second-order-cone constraint into an equivalent set of linear constraints. Indeed, I would suspect that for performance reasons it _should_ do so. But it now has the freedom to implement it in whatever way is most efficient given the solver's internal design. Indeed, the answer could be model dependent: if the model has other binary or integer variables, for instance, it could choose an MILP formulation; otherwise, it could split the variables or use a formulation like Bo's.
ReplyDelete
Replies
Irv LustigJuly 9, 2012 at 10:24 PM
CPLEX does support modeling absolute value in the Concert API's and in OPL. I did a little experiment and it seems that CPLEX turns it into the binary model, but with indicator constraints. Paul - it would be interesting if you repeated your experiment using the IloCplexModeler.abs method from the Java API.
ReplyDelete
Replies

Add comment

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.

OR in an OB World