OR in an OB World: Perils of "Big M"

Monday, July 11, 2011

Perils of "Big M"

A somewhat recent Twitter exchange with Bo Jensen, which I believe was triggered by a post on an optimization forum (details of which are fading rapidly from my memory), motivated this entry. It deals with optimization models (specifically linear or integer linear ones) that are constructed using what is sometimes referred to as the "big M" method.

Terminology

First, I need to clarify terminology. Any reference to "big M" means the incorporation into a model of coefficients (usually represented by $M$) that are in a sense surrogates for $\infty$. The phrase "big M method" is frequently used to cover two somewhat different modeling techniques.

One eliminates the first phase (finding a feasible solution) of the two-phase simplex method by merging the phase I and phase II objective functions into a single function, penalizing the artificial variables with very large coefficients ($M$). It's an example of using a penalty function, and requires a sufficiently large value of $M$ to ensure that violating a constraint is never optimal.
The other provides a way for binary variables to turn constraints on or off. Consider, for example, a lockbox problem in which the number $x_{ij}$ of checks from customer $i$ assigned to lockbox location $j$ cannot exceed some capacity limit $M$ if the location is used (binary variable $y_j=1$), but should be zero if the location is not used ($y_j=0$). This is formulated as $$x_{ij}-My_j\le 0.$$

Rounding error

My first course in graduate school was in numerical analysis, and it came as a bit of a shock to my system. We covered rounding, truncation and representation errors rather extensively -- very disconcerting stuff to a pure mathematician. One would like to think that computers do arithmetic accurately. One would like to believe that politicians are honest and competent. One is destined to be disappointed both times.

I won't go into a lengthy introduction to the aforementioned types of error (which I'll somewhat cavalierly lump together under the name "rounding error"). Let me instead give a small example. The code is in Python (which represents all floating point numbers in double precision), but it could be any language.

x = 1.0e10
y = 1.0e-9
z = x - y
w = z - x
print x, y, z, w
print (z == x)
print (w == -y), (w == 0)

The output of the first print line is:

1.00000000000000e10 1.00000000000000e-9 1.00000000000000e10
 0.000000000000000

Note that z looks suspiciously like x (it should be strictly smaller), and w (which should be -y) is zero. That could just be limited precision in the output. The next three prints, however, tell the tale. The first (z==x) and third (w==0) should be false, while the second (w==-y) should be true. What we actually get:

True
False True

Whoops!

This leads me to a few truisms from that long-ago numerical analysis course:

Addition or subtraction of numbers introduce rounding errors, and addition or subtraction of numbers with substantially different magnitudes introduce rounding headaches. (Multiplication and division can also introduce small errors, but addition and subtraction tend to be the killers. The biggest problem with multiplication or division is probably overflow or underflow.)
Comparisons are closet subtractions. It's generally unsafe to test $x=y$ when $x$ and $y$ are floating point numbers. Instead, test $|x-y|\lt \epsilon$ for a suitably small (but not too small) $\epsilon \gt 0$.
In particular, things that should be zero often come out not quite zero (commonly referred to as "decimal dust"). More than once I've seen someone ask why some solver gave an optimal solution to his problem that had a binary variable equal to 1.0e-9 (throw in a few more junk digits if you like). It should be either 0 or 1! Well, yes; but if the solver held out for exactly 0 or exactly 1, you'd get a lot of wrong answers due to decimal dust. So the solver accepts that "darned near 0" is 0 and "darned near 1" is 1.

$M$ in the objective or RHS

Having $M$ in the objective function (as in the first definition of "big M" above, eliminating phase I of the simplex method) or having it as the right hand side of a constraint introduces rounding error (per my first bullet above). Specifically, $M$ in the objective can cause rounding errors in reduced costs, which may confuse the solver about which columns are candidates for entry to the basis. $M$ in the right hand side can introduce rounding errors in the right hand sides of subsequent tableaus, which may cause problems selecting the variable to leave the basis. As far as I know, those issues are generally survivable, and I've not seen any crash landings caused by it. Perhaps someone who has will comment with an example of a bad outcome from $M$ on the perimeter of the model.

$M$ in the constraints

The lockbox formulation illustrates the situation where $M$ shows up as a coefficient in the constraint matrix. This is where serious misadventures can occur. If $M$ creeps into the basis matrix $\textbf{B}$ (or a column that is a candidate for entry into the basis matrix), the ensuing rounding errors can cause $\textbf{B}$ to look singular (or can cause a column that should be ineligible for entry to the basis, because it would make $\textbf{B}$ singular, look eligible). The rounding errors can also cause severe loss of precision in the computation of $\textbf{B}^{-1}$. In technical terms, $M$ (or $1/M$) appearing in $\textbf{B}$ can make $\textbf{B}$ ill-conditioned.

Suppose that, in a formulation with $M$ in constraints, we bump into the following candidate for a basis matrix$$\textbf{B}=\left[\begin{array}{ccc} 1 & 0 & \frac{1}{M} \\ M & 0 & 1 \\ 1 & M & 0\end{array}\right].$$Since $$\textbf{B}\ \left[\begin{array}{c}1 \\ \frac{-1}{M} \\ -M\end{array}\right]=\left[\begin{array}{c}0\\0\\0\end{array}\right]$$it is obvious that $\textbf{B}$ is singular. Right? Well, let's check the determinant of $\textbf{B}$. I did the calculations using Sage 4.7, but I have no reason to think other packages would do any better (although pathologies might well vary).

Here is the Sage code I used to compute the determinants:

B = matrix(3,3,[1.0,0.0,1.0/M,M,0.0,1.0,1.0,M,0.0]);
[det(B.substitute(M=10.^k)) for k in (28, 29, 30, 31)];

Here is a table of the results:$$\begin{array}{rccc}M & 10^{28} & 10^{29} \\ \det(\textbf{B}) & 0.000000000000000 & 0.000000000000000 \\ \\
M & 10^{30} & 10^{31} \\ \det(\textbf{B}) & -1.40737488355328e14 & 0.000000000000000\end{array}$$
So something inexplicable (but bad) happened when $M=10^{30}$, presumably due to loss of precision in some key computation.

You won't see exactly this glitch in a typical "big M" model, but if you play with enough "big M" formulations, sooner or later (probably sooner) you will trip over numerical instability attributable to mixing disgustingly large coefficients with considerably smaller ones.

$M$ in MIP models

As I mentioned above, one common use of "big M" formulations (illustrated by the lockbox example) is to allow a binary variable to turn a constraint on or off. Even if ill-conditioned basis matrices do not occur, large values of $M$ can cause branch-and-bound solvers to make slow progress solving the mixed-integer programming model (MIP). Consider our simple lockbox constraint: $x_{ij}-My_j\le 0$. There will be an associated penalty cost (say, $c_j y_j$) in the objective function for using location $j$. Now suppose that, at some point, the solver is considering a node where $x_{1j}=2$, $x_{2j}=5$ and $x_{ij}=0$ for other values of $i$. Logically, we know this requires $y_j=1$ and incurs cost $c_j$; but in the LP-relaxation of the node problem, $y_j=5/M$ is sufficient, incurring a cost of just $5c_j/M$. For large values of $M$, this substantially underestimates the true cost, leading to loose node bounds. Loose bounds at nodes in turn make it hard for the solver to prune nodes based on objective value, and so more nodes need to be examined, slowing the solution process.

$M$ and (dubious) pedagogy

I can't write a post this long without at least one rant. Let me stipulate up front that neither text book authors nor instructors can be expected to cover every exigency. That said, many instructors and authors introduce students to "big M" formulations in the abstract, because they are mathematically neat and easy to explain, and then move on to the next topic without regard to the grubby implementation details. I confess to having done this myself when I was young and foolish.

At least one thing, though, is simply notational laziness. That $M$ in my lockbox example? It should be $M_j$. There is no reason why a single value should be used for all instances of $M$ in a model, and very good reasons why the values should differ. I don't recall seeing $M$ subscripted very often, if ever. (Are text book authors charged for each subscript they use?) In certain examples, including lockbox problems, there is a natural and obvious value for $M$ (typically a capacity limit), and in those cases you see individualized values being used. More often, though, there is just this undifferentiated symbol $M$ floating throughout the model. I think that influences users to actually use a single, very large value everywhere. (I recently saw a model on a CPLEX support forum that was clearly a "big M" model with a single large value of $M$ in about half the constraints.)

Concluding thoughts...

... and if you're still with me, you're very ready for this post to conclude.

If you are going to use a "big M" formulation, look for the smallest values of $M$ that work in the context of the model. (I once wrote a paper in which I analytically derived a sufficiently large but not horribly inflated value for $M$. There are tricks for doing so.)
$M$ does not need to be one size fits all.
Particularly if $M$ will show up as the coefficient of a variable in a constraint, be prepared for numerical instability. Learn how to detect numerical problems and what levers your solver has for coping with instability.
Consider alternatives to a "big M" approach (two phase simplex, Benders decomposition, ...) if you have stability or accuracy problems you cannot resolve.

52 comments:

komarJuly 12, 2011 at 2:57 AM
Another idea is to invent a class which has two components: M's component and the common number component. Moreover, the mathematical operators concerning this new class need to be implemented. I guess this will have more work to do.
ReplyDelete
Replies
fbahrJuly 12, 2011 at 4:13 AM
So worth reading. Every single line of it.
ReplyDelete
Replies
Bo JensenJuly 12, 2011 at 5:10 AM
Excellent!

I will add in some situation the big M formulation is a question of "laziness" i.e the model should be treated in a two stage approach, one handling feasibility issue and the other optimality.
ReplyDelete
Replies
Paul A. RubinJuly 12, 2011 at 10:38 AM
@fbahr, @Bo: Thanks for the kind words. :-)

@Bo: Good point about the two stage approach. My models are always feasible (something about the nature of the problems I work on, I guess), so I lose track of this.

@komar: I had a similar thought years ago -- either treat it similar to complex numbers (i -> M), but with different arithmetic rules, or if necessary replace scalars with polynomials in M (and 1/M). I'm not sure if defining an algebra where M is an idempotent (M^2 = M) and 1/M is treated as zero would retain sufficient precision. Pivoting would be more work, and storage would increase. Ultimately, I decided that it wasn't worth pursuing; the time might be better invested in finding ways to choose reasonable values for M (or looking for alternatives to using M). I might have been short-sighted, though. Maybe this will be someone's dissertation someday.
ReplyDelete
Replies
Matthew SaltzmanJuly 12, 2011 at 11:07 AM
Very nice discussion, Paul.

A couple of points on big M in the simplex method:

- One reason you don't hear complaints is that almost nobody actually implements big M. Most simplex codes use some variant of the two-phase method.

- One can always reoptimize the big-M solution using the original objective. If M is well chosen, the big-M solution is feasible, so it can be used as a starting solution. And (with some luck) it should be at least close to optimal for the original objective. Now you essentially have a two-phase method that takes account of the original objective in phase I.

I have sometimes wondered if big M could be implemented correctly with IEEE Infinity, which IIRC satisfies your hypothetical algebra rules.
ReplyDelete
Replies
Paul A. RubinJuly 12, 2011 at 11:32 AM
@Matt: Thanks. You're right that simplex codes don't use the penalty "big M" method, but (per Bo's comment) I think sometimes users do it in a (misguided?) attempt to establish feasibility (or reason for infeasibility) and optimality in one swell foop, rather than either solve a feasibility problem first or rely on an IIS finder in the solver.

Re IEEE infinity, you would need to use a floating point library that didn't spit up when doing arithmetic on infinity. Also, I'm embarrassed to admit that I can't recall whether IEEE standard thinks 1/infinity is zero, NaN or what.
ReplyDelete
Replies
Matthew SaltzmanJuly 12, 2011 at 2:52 PM
Apparently, 1.0/Infinity == 0 (which makes sense to me). The current C standard is supposed to support IEEE 754, IIRC, as is Java.
ReplyDelete
Replies
Greg GlocknerJuly 12, 2011 at 7:24 PM
This is fantastic, and I'm definitely going to point people to this when the need arises. One thing I might have added is that an overly large big-M in MIP models often lets you cheat; for example, the 'integer' variable could take on a value of epsilon; it would be considered integer feasible with respect to the integrality tolerance yet the big-M might be large enough to allow some related variable to be nonzero (ex: to allow a value without incurring the associated fixed cost).
ReplyDelete
Replies
Paul A. RubinJuly 12, 2011 at 9:52 PM
@Greg: Great point. I was thinking of epsilon values for the binary variable in the relaxation solution, but of course you're right -- with M big enough, that epsilon looks integer-feasible (to within tolerance).
ReplyDelete
Replies
David WoodsJuly 13, 2011 at 4:34 AM
Great explanation Paul.
ReplyDelete
Replies
Paul A. RubinJuly 13, 2011 at 7:48 AM
Thanks, David.
ReplyDelete
Replies
Erling D. AndersenJuly 13, 2011 at 7:55 AM
I wish all our MOSEK customers would read your post.
ReplyDelete
Replies
Paul A. RubinJuly 13, 2011 at 8:32 AM
@Erling: Thanks. That's the problem right now, though; I suspect that I'm preaching mostly to the choir.
ReplyDelete
Replies
AnonymousJanuary 12, 2012 at 5:28 PM
By the way, SCIP also features "indicator constraints", with an additional binary variable for that "either/or" constraint you had. This will then decompose into a "variable bound constraint", using a big-M that is computed from the variable bounds in the problem automatically (I think).
ReplyDelete
Replies
Paul A. RubinJanuary 12, 2012 at 5:46 PM
CPLEX also has "indicator constraints", and I wouldn't be shocked if Gurobi and/or Xpress did as well. AFAIK, there's two ways to implement indicator constraints, and I have no idea which solvers use which methods. Method 1 is as you describe: turn them into big-M constraints, with the software taking a stab at a (hopefully tight) value of M. Method 2 is to handle them in the branching logic - wait (hopefully not too long) until you branch on the binary variable, then add the constraint to one child and not to the other. As noted above, the first approach can cause the constraint to have minimal impact on bounds until the binary variable gets fixed. The second approach seems to imply that the constraint has _no_ impact on bounds until the binary variable gets fixed. The second approach does, however, avoid numerical stability problems caused by M.
ReplyDelete
Replies
Ed KlotzJanuary 23, 2012 at 1:50 PM
Yes, as Paul mentioned, indicator constraints, while removing the numerical stability issues caused by M, potentially result in a weaker relaxation that makes the MIP model more difficult to solve. However,
CPLEX does a pretty good job of preprocessing variable bounds and figuring out the smallest possible value of M it can legitimately use.
It can then select among Method 1 and Method 2 from Paul's preceding post.

If you have to use big M in the formulation, one rule I have found helpful consists of setting the optimizer's integrality tolerance to a value smaller than 1/M. That limits the amount of trickle flow that can creep in as Greg Glockner mentioned above.

And finally, note that in addition to indicator constraints, you can remove big Ms from the formulation using type 1 SOSs. If you have a nonnegative continuous variable x and binary z with

x - Mz <= 0

you can reformulate it by introducing a new binary y and

y + z = 1
{x,y} an SOS1

If z = 0, then y = 1, and the SOS1 forces x to 0.
If z = 1, then y = 0, and the SOS1 doesn't impose any restrictions on x.
ReplyDelete
Replies
Ed KlotzJanuary 23, 2012 at 1:53 PM
Sorry, forgot to mention one thing about the SOS approach above: it has the same issue regarding the indicator constraint approach regarding the potential weakness of the relaxation.
ReplyDelete
Replies
AnonymousMay 2, 2012 at 6:33 AM
how to write this constraints let say x(i,j,k) is a binary variable and if x(i,j,k)=1 then A(i) >= B(j,k) ?
ReplyDelete
Replies
Paul A. RubinMay 3, 2012 at 9:26 AM
I discuss this in a new post: Indicator Implies Relation.
ReplyDelete
Replies
DewanJune 26, 2012 at 10:45 AM
Hi Paul,

Thank you for this post. As, you wrote in this post that, once you wrote a paper on analytical derivation to select value for M. Could you please refer the paper here? I would like to read the paper.

Regards,

Dewan
ReplyDelete
Replies
AnonymousJuly 5, 2012 at 3:07 AM
Hello Paul,

Firstly i would like to thank you for this post.
In my formulation, i use the big M to linearize Z=A*B knowning that B is boolean variable and (Z and A) are integer, to do this i have the following constraints:

A-M*(1-B)<=Z
Z<=A
z<=M*B

My question is about the imprecision of the objective function? Is it possible to get this imprecision? If you have some papers, please give them to me.

Regards
ReplyDelete
Replies
AnonymousJuly 17, 2012 at 2:46 AM
Hi Paul,

I wonder if is it possible to use the Big M to linearize "A/(B*C)", such as: A is a real variable, C is a parameter and B is a real variable that takes 1 or epsilon (very small value between <= to 1 and > to 0).

The objective is: if B=1 then A/C
if B=epsilon then 0

Thanks a lot for this post

Best regards
ReplyDelete
Replies
AnonymousJuly 18, 2012 at 10:20 AM
Ok thanks Paul,

But in some cases Z= M i don't get the correct value for z.

i put M1=M2=M3=M4=1000

Do you think the problem comes with the value chosen for M ?
ReplyDelete
Replies
Paul A. RubinJuly 18, 2012 at 11:18 AM
$z=M_i$ should not be possible. You made $y$ binary, right? If $y=0$, the third and fourth constraints resolve to $0\le z\le 0$, so $z$ must be 0. If $y=1$, the first and second constraints resolve to $A/C\le z\le A/C$, so $z$ must equal $A/C$. If $M_3$ and $M_4$ are chosen too small (less than $A/C$ can be), then either $y=0$, $z=0$ is forced or the model becomes infeasible. Similarly, if $M_1$ and $M_2$ are chosen too small to allow $z=0$, then $y=1$ is forced (or the model becomes infeasible).

I don't think the values of $M_i$ are large enough to cause serious numerical instability, so I suggest you double-check your model and make sure everything is coded correctly.
ReplyDelete
Replies
Sothea HongJuly 30, 2012 at 1:34 PM
Dear,
My name is sothea, it is to see your discusion about Big M method. I use this method to linearise my problem but i dont know whether i have instability issue. In my model, i use M=1000. As i understand from Paul's command, it is not enough value to cause a numerial problem, right? I have another question about computation time. between using big M methode and indicator variable, which one is better in term of computation time? and how to set SOS in Cplex? do you have any experience to solv 1 million of coeffient nonzero of MIQP in Cplex? in fact my problem i have a model using MIQP which has about 1 million of coeffient nonzero. I tried to solve this but i never reached the integer solutions. Do you have any sugestion to handle it? Thanks in advance,
Sothea
ReplyDelete
Replies
OanaAugust 29, 2013 at 3:02 AM
Thank you so much for this post! It was really helpful in understanding some of the shortcomings of the "big M" method. Is there any reference (article) that further develops this subject? Thank you!
ReplyDelete
Replies
NickNovember 24, 2014 at 5:16 PM
Hi, thaks for the article, is there any way to get rid of Big-M when forming Mccormick envelopes ?
eg
x1 >= x2 - BigM*(1-w)
x1 <= x2
x1 <= BigM*(w)

where x1,x2 are continuous, and w is a binary

ReplyDelete
Replies
UnknownNovember 28, 2014 at 11:26 PM
This comment has been removed by the author.
ReplyDelete
Replies
AnonymousNovember 28, 2014 at 11:39 PM
Hi Paul,

Thank you for the valuable comments. I have used big M for linearizing multiplication of a binary and a continuos varible. When I play around with M value, I observe changes in my objective function value. What might be the problem? Is it about precision?
ReplyDelete
Replies
UnknownNovember 19, 2015 at 12:22 PM
Thanks for a great post. Why question is why "M" and not some other letter? I am curious if you know any history behind this and its first use. Thanks!
ReplyDelete
Replies
optimiziation GAMSMay 10, 2017 at 10:39 PM
Hi Paul,

If I have a two continuous variables X*y=z and I want to linearize them IS there any easy way to do that similar to using big M as if I have a binary and a continuous variable ?

Appreciate your help in advance
ReplyDelete
Replies
MeritxellJune 26, 2017 at 8:40 AM
Hi Paul,

Really nice post. I need to linearize the maximum of continuous variables z_i and of course to know at which index it is achieved. So far I was using the classical way (i.e. with an auxiliary binary constraint stating the index of the maximum and a bigM constraint). However, after a discussion with some colleagues we were wondering if it is possible not to use any bigM constraint to this end. I have not found such an approach on the literature and I have not found the way myself. Have you ever seen something similar?

Thanks!
Meritxell
ReplyDelete
Replies
UnknownApril 13, 2018 at 8:54 AM
Hi Paul. Thanks for a great post.

I have a problem regarding optimization of cleaning scheduling plans for trains. The objective is to minimize time spend cleaning while trying to keep the kilometers of dirtiness below a threshold. Trains accumulate kilometers of dirtiness from stop to stop and this threshold gets exceeded sometimes due to lack of time on each step (model accounts for this using a slack variable u_s counting the number of exceeded kilometers of dirtiness). In my model, I want to make sure it always prioritizes to keep the train below the threshhold, hence punishing the model if it exceeds the threshhold, even if it results in longer cleaning times. The secondary object is to minimizing the time spend cleaning, resulting in the following objective function:
sum(TimeCleaning * x_s) + M*sum(u_s)
Where TimeCleaning is a constant in minutes, x_s denotes if cleaning takes place at stop s [binary] and u_s is a continues that counts the exceed kilometers of dirtiness at stop s.
Do you have any idea how I choose the value of M to make to the threshhold is exceeded at least as possible? I have considered changing u_s to be binary. If we consider a train over a full day the maximum amount of time that can be used for cleaning would be M = 24*60 = 1440 minutes.
However the important metric for this project is not if it exceeeds the threshhold, but rather how much it exceeds threshhold, and I can't really find any litterature on using Big M with continues variables.

Thanks!
ReplyDelete
Replies
UnknownNovember 24, 2021 at 2:18 AM
See my paper (2020): A three-phase Simplex Method for Infeasible and Unbounded Linear Programming Problems" J.Math. Comp Sci 10(2020) No 4, 906-921.
ReplyDelete
Replies
UnknownDecember 26, 2021 at 12:15 PM
See article:Available online at http://scik.org
J. Math. Comput. Sci. 10 (2020), No. 4, 906-921
https://doi.org/10.28919/jmcs/4387
ISSN: 1927-5307. A three-phase simplex method....where M=1 or M=100 000 is.
A THREE-PHASE SIMPLEX METHOD FOR INFEASIBLE AND UNBOUNDED
LINEAR PROGRAMMING PROBLEMS
EVALD UBI∗
Department of Economics, Tallinn University of Technology, Tallinn 12618, Estoni
ReplyDelete
Replies
Ed KlotzApril 26, 2022 at 7:44 PM
If you have the typical big M style constraint with binary z and
x - Mz <= 0, and your model lacks a known modest upper bound on x to use instead of M, you may be able to gain insight by solving the related model with a suitably chosen lower bound on x. For example, if the original constraint is
x - 100000000 z <= 0, try solving the model giving x a lower bound of 100000. If the model is infeasible, you immediately can reduce M to 100000. But you can also calculate an IIS that explains the infeasibility (several solvers offer this functionality), then use that to deduce an even tighter value for M. Remember, infeasibility is your friend. On the other hand, if the model remains feasible, you could try a larger value until you get an infeasible result.
ReplyDelete
Replies

Add comment

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.