A somewhat recent Twitter exchange with Bo Jensen, which I believe was triggered by a post on an optimization forum (details of which are fading rapidly from my memory), motivated this entry. It deals with optimization models (specifically linear or integer linear ones) that are constructed using what is sometimes referred to as the "big M" method.
Terminology
First, I need to clarify terminology. Any reference to "big M" means the incorporation into a model of coefficients (usually represented by $M$) that are in a sense surrogates for $\infty$. The phrase "big M method" is frequently used to cover two somewhat different modeling techniques.
- One eliminates the first phase (finding a feasible solution) of the two-phase simplex method by merging the phase I and phase II objective functions into a single function, penalizing the artificial variables with very large coefficients ($M$). It's an example of using a penalty function, and it requires a sufficiently large value of $M$ to ensure that violating a constraint is never optimal. (A small sketch of this follows the list.)
- The other provides a way for binary variables to turn constraints on or off. Consider, for example, a lockbox problem in which the number $x_{ij}$ of checks from customer $i$ assigned to lockbox location $j$ cannot exceed some capacity limit $M$ if the location is used (binary variable $y_j=1$), but should be zero if the location is not used ($y_j=0$). This is formulated as $$x_{ij}-My_j\le 0.$$
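As a concrete illustration of the first technique, here is a minimal sketch of the merged phase I/phase II problem (my notation, assuming a minimization problem with $\textbf{b}\ge \textbf{0}$):$$\begin{array}{rl} \min & \textbf{c}'\textbf{x}+M\,\textbf{1}'\textbf{a}\\ \textrm{s.t.} & \textbf{A}\textbf{x}+\textbf{a}=\textbf{b}\\ & \textbf{x}\ge \textbf{0},\ \textbf{a}\ge \textbf{0},\end{array}$$where $\textbf{a}$ is the vector of artificial variables. If the original problem is feasible and $M$ is large enough, any optimal solution drives $\textbf{a}$ to zero, recovering the phase I result while optimizing the phase II objective.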
Rounding error
My first course in graduate school was in numerical analysis, and it came as a bit of a shock to my system. We covered rounding, truncation and representation errors rather extensively -- very disconcerting stuff to a pure mathematician. One would like to think that computers do arithmetic accurately. One would like to believe that politicians are honest and competent. One is destined to be disappointed both times.
I won't go into a lengthy introduction to the aforementioned types of error (which I'll somewhat cavalierly lump together under the name "rounding error"). Let me instead give a small example. The code is in Python (which represents all floating point numbers in double precision), but it could be any language.
x = 1.0e10
y = 1.0e-9
z = x - y   # mathematically strictly less than x
w = z - x   # mathematically equal to -y
print(x, y, z, w)
print(z == x)
print(w == -y, w == 0)
The output of the first print line is:
10000000000.0 1e-09 10000000000.0 0.0
Note that z looks suspiciously like x (it should be strictly smaller), and w (which should be -y) is zero. That could just be limited precision in the output. The three comparisons printed next, however, tell the tale. The first (z == x) and third (w == 0) should be false, while the second (w == -y) should be true. What we actually get:
True
False True
Whoops!
This leads me to a few truisms from that long-ago numerical analysis course:
- Addition or subtraction of numbers introduces rounding errors, and addition or subtraction of numbers with substantially different magnitudes introduces rounding headaches. (Multiplication and division can also introduce small errors, but addition and subtraction tend to be the killers. The biggest problem with multiplication or division is probably overflow or underflow.)
- Comparisons are closet subtractions. It's generally unsafe to test $x=y$ when $x$ and $y$ are floating point numbers. Instead, test $|x-y|\lt \epsilon$ for a suitably small (but not too small) $\epsilon \gt 0$. (A short sketch of this follows the list.)
- In particular, things that should be zero often come out not quite zero (commonly referred to as "decimal dust"). More than once I've seen someone ask why some solver gave an optimal solution to his problem that had a binary variable equal to 1.0e-9 (throw in a few more junk digits if you like). It should be either 0 or 1! Well, yes; but if the solver held out for exactly 0 or exactly 1, you'd get a lot of wrong answers due to decimal dust. So the solver accepts that "darned near 0" is 0 and "darned near 1" is 1.
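Here is a minimal sketch of the tolerance comparison from the second bullet, applied to the earlier example (the tolerance value is my arbitrary choice, not a universal constant):
import math

def nearly_equal(a, b, tol=1.0e-6):
    # treat a and b as equal if they differ by less than tol
    return abs(a - b) < tol

x = 1.0e10
y = 1.0e-9
w = (x - y) - x             # mathematically -y, numerically 0.0
print(w == -y)              # False: the exact comparison fails
print(nearly_equal(w, -y))  # True: the tolerance comparison succeeds
print(math.isclose(w, -y, abs_tol=1.0e-6))  # True: the standard library equivalent
Note that math.isclose defaults to a purely relative tolerance, which is useless when one of the values is exactly zero, hence the explicit abs_tol argument.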
$M$ in the objective or RHS
Having $M$ in the objective function (as in the first definition of "big M" above, eliminating phase I of the simplex method) or having it as the right hand side of a constraint introduces rounding error (per my first bullet above). Specifically, $M$ in the objective can cause rounding errors in reduced costs, which may confuse the solver about which columns are candidates for entry to the basis. $M$ in the right hand side can introduce rounding errors in the right hand sides of subsequent tableaus, which may cause problems selecting the variable to leave the basis. As far as I know, those issues are generally survivable, and I've not seen any crash landings caused by it. Perhaps someone who has will comment with an example of a bad outcome from $M$ on the perimeter of the model.
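As a crude illustration of how $M$ in the objective can swamp genuine cost differences in a reduced cost calculation (the numbers here are invented, not output from any particular solver):
M = 1.0e16
# the difference (M + 3.5) - (M + 1.2) should be exactly 2.3
rc_big = (M + 3.5) - (M + 1.2)
print(rc_big)    # 2.0 -- the 0.3 has been lost to rounding
M = 1.0e6
rc_small = (M + 3.5) - (M + 1.2)
print(rc_small)  # roughly 2.3 (plus a little decimal dust)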
$M$ in the constraints
The lockbox formulation illustrates the situation where $M$ shows up as a coefficient in the constraint matrix. This is where serious misadventures can occur. If $M$ creeps into the basis matrix $\textbf{B}$ (or a column that is a candidate for entry into the basis matrix), the ensuing rounding errors can cause $\textbf{B}$ to look singular (or can cause a column that should be ineligible for entry to the basis, because it would make $\textbf{B}$ singular, look eligible). The rounding errors can also cause severe loss of precision in the computation of $\textbf{B}^{-1}$. In technical terms, $M$ (or $1/M$) appearing in $\textbf{B}$ can make $\textbf{B}$ ill-conditioned.
Suppose that, in a formulation with $M$ in constraints, we bump into the following candidate for a basis matrix$$\textbf{B}=\left[\begin{array}{ccc} 1 & 0 & \frac{1}{M} \\ M & 0 & 1 \\ 1 & M & 0\end{array}\right].$$Since $$\textbf{B}\ \left[\begin{array}{c}1 \\ \frac{-1}{M} \\ -M\end{array}\right]=\left[\begin{array}{c}0\\0\\0\end{array}\right]$$it is obvious that $\textbf{B}$ is singular. Right? Well, let's check the determinant of $\textbf{B}$. I did the calculations using Sage 4.7, but I have no reason to think other packages would do any better (although pathologies might well vary).
Here is the Sage code I used to compute the determinants:
M = var('M')   # declare M as a symbolic variable (Sage predefines only x)
B = matrix(3, 3, [1.0, 0.0, 1.0/M, M, 0.0, 1.0, 1.0, M, 0.0])
[det(B.substitute(M=10.^k)) for k in (28, 29, 30, 31)]   # determinants for M = 10^28, ..., 10^31
Here is a table of the results:$$\begin{array}{rcccc}M & 10^{28} & 10^{29} & 10^{30} & 10^{31} \\ \det(\textbf{B}) & 0.000000000000000 & 0.000000000000000 & -1.40737488355328\times 10^{14} & 0.000000000000000\end{array}$$
So something inexplicable (but bad) happened when $M=10^{30}$, presumably due to loss of precision in some key computation.
You won't see exactly this glitch in a typical "big M" model, but if you play with enough "big M" formulations, sooner or later (probably sooner) you will trip over numerical instability attributable to mixing disgustingly large coefficients with considerably smaller ones.
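If you prefer to see the ill-conditioning directly rather than through the determinant, here is a quick check of the condition number of the same matrix $\textbf{B}$ (using NumPy rather than Sage; this was not part of my original experiment):
import numpy as np

M = 1.0e8
# the same (mathematically singular) matrix B used above
B = np.array([[1.0, 0.0, 1.0 / M],
              [M,   0.0, 1.0],
              [1.0, M,   0.0]])
print(np.linalg.cond(B))  # an astronomically large condition number (possibly reported as inf)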
$M$ in MIP models
As I mentioned above, one common use of "big M" formulations (illustrated by the lockbox example) is to allow a binary variable to turn a constraint on or off. Even if ill-conditioned basis matrices do not occur, large values of $M$ can cause branch-and-bound solvers to make slow progress solving the mixed-integer programming model (MIP). Consider our simple lockbox constraint: $x_{ij}-My_j\le 0$. There will be an associated penalty cost (say, $c_j y_j$) in the objective function for using location $j$. Now suppose that, at some point, the solver is considering a node where $x_{1j}=2$, $x_{2j}=5$ and $x_{ij}=0$ for other values of $i$. Logically, we know this requires $y_j=1$ and incurs cost $c_j$; but in the LP-relaxation of the node problem, $y_j=5/M$ is sufficient, incurring a cost of just $5c_j/M$. For large values of $M$, this substantially underestimates the true cost, leading to loose node bounds. Loose bounds at nodes in turn make it hard for the solver to prune nodes based on objective value, and so more nodes need to be examined, slowing the solution process.
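To put some (invented) numbers on that, suppose the fixed cost of opening location $j$ is 1000 and five checks are routed there at the current node; the following sketch shows how the relaxation's cost estimate shrinks as $M$ grows:
c_j = 1000.0    # hypothetical fixed cost of opening location j
checks = 5.0    # checks routed to location j at this node
for M in (1.0e3, 1.0e6, 1.0e9):
    y_j = checks / M          # smallest y_j satisfying x - M*y <= 0 in the relaxation
    print(M, y_j, c_j * y_j)  # the relaxation "pays" only c_j * y_j instead of c_j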
$M$ and (dubious) pedagogy
I can't write a post this long without at least one rant. Let me stipulate up front that neither text book authors nor instructors can be expected to cover every exigency. That said, many instructors and authors introduce students to "big M" formulations in the abstract, because they are mathematically neat and easy to explain, and then move on to the next topic without regard to the grubby implementation details. I confess to having done this myself when I was young and foolish.
At least one thing, though, is simply notational laziness. That $M$ in my lockbox example? It should be $M_j$. There is no reason why a single value should be used for all instances of $M$ in a model, and very good reasons why the values should differ. I don't recall seeing $M$ subscripted very often, if ever. (Are text book authors charged for each subscript they use?) In certain examples, including lockbox problems, there is a natural and obvious value for $M$ (typically a capacity limit), and in those cases you see individualized values being used. More often, though, there is just this undifferentiated symbol $M$ floating throughout the model. I think that influences users to actually use a single, very large value everywhere. (I recently saw a model on a CPLEX support forum that was clearly a "big M" model with a single large value of $M$ in about half the constraints.)
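In the lockbox case, a subscripted (and tightened) version of the constraint might look like the following, where $C_j$ is the capacity of location $j$ and $d_i$ (a symbol I'm inventing here for illustration) is the total check volume of customer $i$:$$x_{ij}-M_{ij}y_j\le 0,\quad M_{ij}=\min\{C_j,d_i\}.$$Either bound alone would be valid; using the smaller of the two gives a tighter LP relaxation at no modeling cost.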
Concluding thoughts...
... and if you're still with me, you're very ready for this post to conclude.
- If you are going to use a "big M" formulation, look for the smallest values of $M$ that work in the context of the model. (I once wrote a paper in which I analytically derived a sufficiently large but not horribly inflated value for $M$. There are tricks for doing so; one is sketched after this list.)
- $M$ does not need to be one size fits all.
- Particularly if $M$ will show up as the coefficient of a variable in a constraint, be prepared for numerical instability. Learn how to detect numerical problems and what levers your solver has for coping with instability.
- Consider alternatives to a "big M" approach (two phase simplex, Benders decomposition, ...) if you have stability or accuracy problems you cannot resolve.
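Apropos of the first bullet, here is a minimal sketch of one such trick (the constraint form and the data are hypothetical, and this is not the derivation from the paper I mentioned): for a constraint $\sum_i a_i x_i \le b + M(1-y)$ that must bind only when $y=1$, a valid $M$ is the worst possible violation of $\sum_i a_i x_i \le b$ over the bounds on $x$.
def tight_big_m(a, lower, upper, b):
    # worst case of sum(a[i] * x[i]) subject to lower[i] <= x[i] <= upper[i]
    worst = sum(ai * (ui if ai > 0 else li)
                for ai, li, ui in zip(a, lower, upper))
    # if the constraint can never be violated, M = 0 suffices
    return max(worst - b, 0.0)

# hypothetical data: 3*x1 + 2*x2 <= 10 with 0 <= x1 <= 4 and 0 <= x2 <= 5
print(tight_big_m([3.0, 2.0], [0.0, 0.0], [4.0, 5.0], 10.0))  # 12.0
The tighter the variable bounds, the smaller (and safer) the resulting $M$.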