Saturday, August 7, 2010

Ill-conditioned Bases and Numerical Instability

I spent a chunk of this afternoon playing with a MIP model sent to me by someone who could not figure out why a solution CPLEX certified as "optimal" had nontrivial constraint violations.  That motivated today's blog entry, but first a reminiscence (it's my blog, and I grant myself permission).

Back when I started grad school, men were men and computers were mainframes (and had more or less completed the transition from vacuum tubes to transistors; integrated circuits were still pretty experimental).  The first course in my doctoral program was numerical analysis (highly recommended for future OR practitioners), taught by a visiting professor who had not gotten the memo about our "easy grades for doctoral students" policy.  (On the bright side, I spent the remaining 27 quarters of my doctoral program free from worry about maintaining a 4.0 GPA.)  He told an anecdote about a nameless professor who, in the course of some research, inverted a matrix using some homebrewed FORTRAN code. In the inevitable journal article, he printed the coefficients of the inverse matrix (it apparently had a fairly low dimension) to some ridiculous number of decimal places ... all of which were wrong ... as were the first four digits to the left of the decimal point.  The moral of the story was that "stiff" (ill-conditioned) matrices need to be handled carefully.

A bit later in my studies, I took a course in convex analysis that included some linear optimization.  I'd also had a course in linear programming as an undergraduate.  Neither the undergrad course (in which pivoting was done by hand) nor the graduate course (which emphasized theory) mentioned numerical precision at all.  So I'm not surprised that a number of LP/MIP users today are either unaware of or have at best a vague understanding of scaling, conditioning and the perils of stiff matrices.

I probably should offer up some sort of definition of "ill-conditioned" at this point. The Wikipedia has a useful definition of the condition number of a (square, nonsingular) matrix.  Essentially, it's the product of the matrix norm of the matrix with the norm of the inverse matrix. Large condition numbers invite bad rounding errors when dealing with the inverse, and of course the simplex method explicitly or implicitly inverts the basis matrix at each iteration. What constitutes a "large" condition number is in the eye of the beholder, but I think it's safe to say that if your matrix has a condition number bigger than the US federal deficit (in dollars), you should start paying attention to it.

The related concept "numerical instability" generally refers to a situation where small, seemingly benign rounding errors can lead to strange results. One source of small errors is when you export a model to a text file and then import it again. Integers are retrieved at full precision, but decimals are rounded both during export and during import, so the imported model is not identical to the exported model. The differences are small, and if the problem is numerical stable the effect is probably negligible; but if the problem is unstable, those small differences can have significant consequences.

A word about scaling before I move on to stiffness.  If your constraint matrix mixes big honking coefficients and itty-bitty coefficients (I refer here to magnitude, irrespective of sign), there's a pretty good chance that somewhere along the line you'll get some stiff basis matrices.  Even if you don't, there's a good chance rounding error will cause problems somewhere.  To a mathematician, a zillion minus epsilon is less than a zillion (epsilon being small and positive).  To a computer, maybe it's less, maybe it's equal.  So it is advisable to avoid scaling issues by converting the units of variables, where necessary, to bring coefficients into a tighter range of magnitudes.

There is, however, a misconception in some quarters that poor scaling is the only source of ill-conditioned basis matrices. Sadly, that's not true; they can crop up for other, harder to predict, reasons. If you have software that computes condition numbers, pick a nice small value for $\epsilon\gt 0$ and compute the condition number of
$\left[\begin{array}{cc} 1 & 1\\ 1-\epsilon & 1 \end{array}\right].$ Kind of depressing, isn't it?

So what happens if you encounter ill-conditioned basis matrices? It depends on a number of factors, not least of which is the care with which the authors of the software you are using dealt with the possibility. Some algorithms for inverting, factoring or otherwise screwing with matrices are more robust than others, but oftentimes the more robust algorithms are also slower. So it is quite possible (as is the case with CPLEX) that your LP/IP software is capable of compensating for ill-conditioned matrices, at least to some extent, but may not do so to its full capabilities by default because using the full capabilities would slow the software down, including on the majority of problems where that caution was unnecessary.

Optimization software typically will provide, on request, some measures of solution quality. At minimum, you would hope for the maximum violation of any integrality restriction (in a MIP), the maximum violation of any variable bounds, and the maximum violation of any functional constraint (which may be expressed as the maximum bounds violation of any slack variable, which is equivalent). You can, of course, compute these for yourself by substituting the solution values into the constraints. Rounding error being omnipresent, do not expect zero violations across the board; but if you have violations that exceed what you reasonably might expect from rounding, take a look at the condition numbers (sometimes called "kappa values") of the basis matrices. CPLEX will let you see the condition number of the final basis matrix when you solve an LP via one of the simplex variants. A "final" kappa is not available when you solve a MIP, but recent versions of CPLEX allow you, through a parameter setting, to tell it to record the distribution of basis kappa values for either a sample of the node LPs it solves or for all of them (the latter option obviously being slower than the former). It will then tell you what percentage of basis matrices struck it as "stable", "suspect", "unstable" or "ill-posed".

If you are getting suspicious results, there are a number of things you can try, starting with checking the scaling of your coefficients and proceeding to a variety of software settings. I'm familiar with the choices in CPLEX, but my guess is that other optimization software offers a similar palette. First, you can adjust parameter settings that declare the maximum allowable violation of any constraint (EpRHS in CPLEX) or integrality restriction (EpInt). Cranking those tolerances down may help, but it is not guaranteed. (Users sometimes set the tolerances at zero, which can have the unfortunate effect of making a feasible problem look infeasible to CPLEX.) Tighter tolerances may well slow the solution process. Another possibility is to find a setting that tells your solver to try to be more careful about rounding errors. CPLEX has a "Markowitz tolerance" (EpMrk) setting which, if increased, sometimes helps with ill-conditioned bases. The current version (CPLEX 12.2) also has a "numerical precision emphasis" setting (NumericalEmphasis) that tells CPLEX you are concerned about stability issues and want it to be extra careful.

1. Ed Klotz from CPLEX has a very nice talk which focus on several interesting issues, among them numerical stability and condition numbers. As I recall he suggest the following :

Better precision on input data.
Reduce large coefficients.
Turn on numerical emphasis on and scaling aggressive.
Set markowitz tol > 0.9.

But more interestingly he touches the subject of modeling with large big M penalties, which also can lead to lost precision and numerical issues. He suggest to solve such problems in a two stage approach i.e not to mix multiple objective into one.

2. I attended an iteration of that talk and got the slides, which I've just now glanced at again. I seem to recall Ed talking about big M, but I can't find it in the slides. I can find where he points out that large coefficients can cause trouble, though, and of course big M is a leading culprit there.

The model I alluded to in the first paragraph (which I received as a .lp file without much context) had certain specific large coefficients that recurred frequently, so I suspect it is a big M formulation, and I did in fact suggest to the owner that she look for a smaller (but still valid) value of M. I did not go so far as to suggest combinatorial Benders cuts, which I suppose could be considered a two stage approach (though I don't think it's exactly what Ed had in mind). Mike Trick posted slides from a talk he gave (at ALIO, I think) that covered combinatorial Benders.

3. Hello, I'm solving the Robust VRP problem, and I was telling you may problem via the IBM site.

I was reaing your post and many of the parameters mentioned are presents in the version 12.2 of CPLEX, but I'm using the 12.1 and MIPKappa is not a parameter in 12.1.

What is the meaning of EpMrk, EpInt and EpRhs parameters? and How can I estimate the appropiate values for that parameters?

4. Hi (again),

MIPKappa was new in CPLEX 12.2. There are a couple of things you can do in 12.1 to get a feel for whether you have numerical stability issues. One is to solve the MIP, fix the integer variables and solve the restricted LP, then ask for the kappa value of the final basis of the LP. The other is to relax the problem to an LP, solve it, and check the kappa value of that LP.

Regarding the other parameters, the computed value of an integer variable is considered integer-feasible if it is within EpInt of the nearest integer, and a constraint (this includes variable bounds) is considered satisfied if the discrepancy is less than EpRHS in magnitude. I usually stick with the defaults, but sometimes if you have persistent issues with infeasible solutions looking feasible you can fix them by tightening one or both parameters. The danger is that if you tighten them too much, you can make an optimal solution look infeasible (or, worse, make the entire problem look infeasible) due to unavoidable rounding errors.

EpMrk has to do with rules CPLEX uses for pivot selection. From the name, I suspect it's connected to a rule developed by Markowitz for pivoting (http://en.wikipedia.org/wiki/Minimum_degree_algorithm). I don't know enough to explain it precisely, but the essence is that if you increase the value you _may_ improve numerical precision (and I think will likely slow down the pivot process). I've occasionally had CPLEX toss messages into the output log saying that it was automatically increasing EpMrk (presumably in response to what it perceived to be numerical stability problems), and I've occasionally increased it myself (guessing a new value or, in one case, setting it to its maximum allowed value) because a problem was going out of its way to annoy me.

5. I'm pretty late to this thread, but I'll add a few comments. First regarding what constitutes a large condition number, CPLEX's MIP kappa feature has to make that decision when it categorizes your model as stable, suspicious, unstable or ill posed. The condition number of a basis matrix by its definition multiplies a perturbation to the input
of the linear system associated with the basis matrix. The product of the condition number and the perturbation provides an upper bound to the potential perturbation of the computed solution to that linear system. Using this definition, a useful measure for an optimizer like CPLEX involves assessing when the condition number is large enough to cause the optimizer to make decisions based on round off error. That happens when the round off error can exceed the algorithmic tolerances (e.g. the feasibility, optimality and integrality tolerances mentioned above). So, with the typical machine
precision of 1e-16, the typical feasibility and optimality tolerances of 1e-6, a condition number greater than 1e+10 can magnify perturbations from round off due to machine precision sufficiently to
exceed the solver's tolerances, causing it to make decisions based on round off error. That in turn can yield inconsistent results, or ill conditioning.

Second, regarding the 2x2 matrix that has very reasonable values but is ill conditioned, the numerical analysts have shown that an ill conditioned matrix is equivalent to an almost singular matrix. The 2x2
matrix is indeed almost singular for small values of epsilon. Optimizers effectively handle truly singular matrices, but are less effective with nearly singular matrices. As practitioners, we need to be careful to avoid turning a truly singular matrix into a nearly singular one by rounding data unnecessarily. In particular, try to avoid rounding fractions of integers out to 6-12 decimal places in your
data. Either use all 16 digits of precision, or rescale the model by multiplying by the smallest common denominator of the fractions to transform the data into integer values.

A more detailed discussion of these issues can be found at

http://www.sciencedirect.com/science/article/pii/S1876735413000020

1. Ed: Thanks for the clarifications (and the link).

If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on OR-Exchange.