Saturday, May 28, 2016

Rounding Error

I recently experienced the eight-bazillionth iteration of the following scenario. Someone posts a request for help on a forum. They are building some sort of MIP model, or possibly something even more advanced (Benders decomposition, column generation, ...), and things are not going well. In some cases they think they have made a coding error; more commonly they think the solver (CPLEX in my personal experience, since that's the only one whose forums I watch) has a bug in it. After substantial back and forth, the key problem emerges: their binary variable came out 1.00000E-9 or 0.99999999, which is "clearly" neither 0 nor 1.

At this point, my reaction is always the same (add your favorite expletives after every few words): "Don't they know about rounding error???"

In fairness, though, they may not, and perhaps the blame lies on the shoulders of faculty (possibly including me, in my former life). One of the courses I took during my first term of graduate school (in math) was numerical analysis. This is back when mainframes were just making the transition from vacuum tubes to semiconductors, and "user input" was a euphemism for "deck of punch cards". It's also a time when memory was at a premium and, thus, single precision ruled the realm.

(BEGIN TANGENT) I suspect that some readers who are (much) younger than I, and who program, will have noticed that pretty much everything of a floating point nature is automatically done in double precision these days, and may be wondering why single-precision float types still exist. They're a holdover from my youth, and I suspect they exist almost exclusively for backward compatibility. (END TANGENT)

I don't recall why I picked numerical analysis as an elective -- whimsy, I think -- but it ended up being one of the most useful courses I took in grad school. The first, and most important, lesson from it was that computers do math inexactly. This should be immediately obvious if you stop to think about it. I learned in childhood that 0.3333... = 1/3, where the ... means "repeated an infinite number of times". Stop the decimal expansion after a finite number of digits, and you get something slightly less than 1/3. Now pretend for the moment that your computer encoded numbers in decimal rather than binary, and that you used the expression 1/3 in a program. The compiler would convert 1/3 to a decimal representation, but not 0.3333... -- because that would require an infinite amount of memory to store, and the computer only has a finite amount. So the computer has to truncate the decimal expansion, and as a result what the computer stores for 1/3 is not actually 1/3. (Now go back to using binary. The same logic applies.)
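
If you want to see this in action, any R console will oblige (nothing below is specific to R; any language that uses IEEE 754 doubles behaves the same way):
> print(1/3, digits = 17)
[1] 0.33333333333333331
> 0.1 + 0.2 == 0.3
[1] FALSE
> print(0.1 + 0.2, digits = 17)
[1] 0.30000000000000004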

I'm not going to repeat an entire course in numerical analysis here (too much typing), but another lesson I learned was that, when programming, I should never do strict equality comparisons on anything that was not stored as an integer. Here's a small example that I just ran in R (not to pick on R -- it was just handy):
> sqrt(9) - 2.99999999999999999 == 0
[1] TRUE

To a mathematician, that relation is patently false, but the decimal literal 2.99999999999999999 has more significant digits than a double-precision value can store, so it is rounded to exactly 3 on input -- "close enough" to 3 indeed -- and the comparison comes out TRUE.
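
The safe habit is to compare against a tolerance instead of demanding exact equality. In R, for example (all.equal() applies a small relative tolerance by default):
> sqrt(2)^2 == 2
[1] FALSE
> abs(sqrt(2)^2 - 2) < 1e-9
[1] TRUE
> isTRUE(all.equal(sqrt(2)^2, 2))
[1] TRUE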

My prof shared one anecdote that has stuck with me for almost half a century now. He told of an engineer who coded his own routine for inverting a matrix and was not particularly careful about dealing with rounding errors (and "stiff" matrices, a topic for another time). The engineer then published a paper which included, at one point, the inverse of a particular matrix, computed by his program, with each entry in the matrix printed to some absurd number of decimal places (five, I think). All the digits to the right of the decimal points were wrong. So were the first couple of digits immediately left of the decimal points. You can ignore the limitations of computer arithmetic precision, but (cue sinister background music) those limitations will not ignore you.

Where I'm going with this is that I learned about the limitations of computer math somewhat accidentally, because I just happened to take numerical analysis. I suspect that some OR/IE programs require it, and I'm pretty sure others (most?) don't. In the interests of reducing frustration for a lot of people (of whom I'm the only one I care about), and perhaps avoiding incorrect results being published, I think we should pressure anyone teaching computational topics in OR (particularly, but not exclusively, integer programming) to provide some minimal coverage of rounding and truncation errors, the IEEE standards for single- and double-precision values, etc. If they cannot (or will not) teach it themselves, they can at least require that the student go learn it as a prerequisite.
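
If nothing else, students could be shown how to ask their own software what its limits are. In R, for example, the relevant constants are one query away:
> .Machine$double.eps      # smallest positive x such that 1 + x != 1
[1] 2.220446e-16
> .Machine$double.digits   # bits in the significand of a double
[1] 53
> 1 + .Machine$double.eps/2 == 1
[1] TRUE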

Please! I'm begging you!

11 comments:

  1. I am not convinced that CPLEX is doing the right thing here (nor are other solvers I've seen that do the same). If I've specified that a variable is binary, I think I ought to get an exact 0 or exact 1 out when the problem is declared "solved".

    I have trouble understanding the circumstances where the solver *can't* return a true 0 or 1. I guess the idea would be something like this: the iterate isn't feasible to within numerical tolerance when the variable is exactly 0 or 1; but it *is* feasible to within numerical tolerance for 1e-8 or 1-1e-8, so we declare that "solved".

    I'm sure a solver implementor could convince me this is absolutely necessary internally. But that doesn't mean it *should* be returning 1e-8 or 1-1e-8. Again, the variable is binary; those aren't the values I should be seeing in my output. From a user experience standpoint, I think solvers should round the variables to 0 or 1, and then include in the documentation that this may occur.

    Replies
    1. Michael: I've kicked this around in my head a bit, and I can see both sides of the question. I can't speak for the developers of CPLEX, Gurobi or Xpress, but if I were writing a solver, my concern would be that rounding is a one-way transformation -- you can't "unround" the value. I figure that anyone who can use a programming API to a solver (and hence presumably can program) should be able to find and use the programming language's rounding function. On the other hand, if I as the vendor do the rounding for the user, the user has no way to tell whether the actual double-precision value was really close to the nearest integer or just barely inside whatever tolerance was set. That information might be meaningful to a user trying to diagnose issues with their model.

      All that said, I don't think it would kill vendors to provide two versions of their "get the value of" functions, one that coughs up the actual value obtained and one that automatically rounds if the variable is integer. Most users would want the latter, so it should probably have the easier to remember name or be the default choice. Vendors of modeling languages would have to do something similar, I suppose.
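
      In the meantime, the do-it-yourself version is only a few lines. Here is a rough sketch in R (the function name and the 1e-5 default are mine; match the tolerance to whatever your solver actually uses):

      round_integral <- function(x, tol = 1e-5) {
        # Round any value that is within tol of an integer; warn about (and keep) the rest.
        r <- round(x)
        off <- abs(x - r)
        if (any(off > tol)) warning("some values are farther than ", tol, " from an integer")
        ifelse(off <= tol, r, x)
      }
      round_integral(c(0.99999999, 1e-9))   # returns c(1, 0)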

    2. I'd agree with Michael here. The user sets a variable to be integer for a reason: it represents a discrete decision. If the solution, rounded to the nearest integers, is no longer feasible, then it isn't really a solution, as it cannot be implemented. The issue is that for mixed integer problems the rounding requires re-solving an LP, and it may come back infeasible. But then the solution wasn't really a good one, so it should have been discarded.

      Yes, the user can always tighten the integrality tolerance (which is quite loose by default -- typically 1e-6 to 1e-5, larger than the feasibility tolerance), but that requires setting an option. Too much work.

    3. Imre: I agree with you for the most part, but I don't know enough about the inner workings of a contemporary solver to know whether implementing this would be problematic. The solver would have to round each candidate solution encountered and then solve a restricted LP to make sure the solution was valid. For an integral solution to the node LP, the extra problem solved would be a restriction of the node LP; but for a candidate found by a heuristic, the extra solve might require a restricted version of an LP not currently in memory (and quite possibly a larger one than the current node LP). Also, the solver works with presolved versions of the original model, and rounding would have to be done on the original variables. Not knowing the details of presolve operations (and ignorance here really is bliss!), I don't know whether the reverse-transform, round, re-transform sequence would introduce any errors likely to undo the good of rounding.

      I have to think, though, that the authors of solvers are not rounding candidate incumbents for a reason other than laziness on their part. If I'm wrong, then I will definitely join the movement to get unambiguously integer solutions.

    4. This comment has been removed by the author.

    5. "I have to think, though, that the authors of solvers are not rounding candidate incumbents for a reason other than laziness on their part."

      Oh, I'm absolutely willing to give them the benefit of the doubt on this. The problem, in my view, is the expertise gap between solver developers and solver users: what may seem intuitive or sensible to the developers is not at all intuitive for the users. So when these non-integral results are returned, it stokes distrust in the quality of the software.

      As I said, I don't see the wisdom here, but I wouldn't be surprised if they could convince me, or you, or Imre. But we're experts in this field. People whose expertise lies in application areas, and not in optimization itself, are the ones that need convincing.

  2. Single precision is still really important. Memory bandwidth has still not become infinite.

    Not only that, half-precision is becoming ever more important: https://devblogs.nvidia.com/parallelforall/new-features-cuda-7-5/

    Replies
    1. Thanks for the link! I would not have guessed that anyone would want half-precision, but in hindsight it makes sense. If you are running stochastic gradient descent on noisy data, milking a bunch of meaningless decimal places is rather pointless. I wonder if anyone will start producing custom FPUs with half-precision registers?

  3. Recent versions of CPLEX have supported setting the integrality tolerance to 0. However, for some of the reasons already discussed, CPLEX cannot guarantee an integer feasible solution will satisfy the 0 integrality tolerance. I will give more details on why below.

    First, regarding the notion of rounding a solution with slight violations of integrality, the rounded solution can become infeasible relative to the constraints and bounds. That really is not that surprising; if the rounded solution were feasible, branching probably would have found it anyway. This notion of rounding to remove integrality violations was very quickly discarded in early versions of CPLEX.
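
    A toy instance makes the danger concrete (the numbers below are illustrative, assuming an integrality tolerance of 1e-5 and a feasibility tolerance of 1e-6):

    # Constraint: 1000000 * x <= 999999 with x binary -- in effect, x <= 0.999999.
    x <- 0.999999
    1e6 * x <= 999999 + 1e-6    # TRUE: the fractional value is feasible within tolerance
    abs(round(x) - x) <= 1e-5   # TRUE: x also looks integer feasible within tolerance
    1e6 * round(x) - 999999     # 1: rounding x up to 1 violates the constraint by a full unit
    # (the only truly feasible integer value here is x = 0)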

    Regarding why an integrality tolerance of 0 may not be realistic under finite precision, let's return to Paul's example of 1/3. Suppose you have, in double precision

    0.3333333333 x = 1
    x integer

    Let's run branch and bound on this. Relaxing integrality, we get x = 1/0.3333333333, which is approximately 3 + 3e-10.

    With an integrality tolerance of 0, we have to branch on x. The up branch is immediately infeasible. For the down branch, we add the constraint x <= 3. Let's solve the LP relaxation. In the parent node we had x = 3 + 3e-10 as a basic variable. That basis remains feasible within 3e-10 for the child node with the x <= 3 branch. Given CPLEX's default feasibility tolerance of 1e-6, we accept this solution. So now we need to branch down on x = 3 + 3e-10 again, and we thus repeat this process an infinite number of times. In this simple example numerous remedies are available. But, when the integrality violation arises from a complex interaction of rows and columns of the basis matrix, this becomes a much greater challenge.
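
    (The arithmetic behind that loop is easy to check at a console:)

    x <- 1 / 0.3333333333   # the relaxation value, roughly 3 + 3e-10
    x - 3                   # about 3e-10: positive, so an integrality tolerance of 0 is violated
    x - 3 <= 1e-6           # TRUE: the violation of the x <= 3 branch is within the 1e-6 feasibility tolerance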

    This is why CPLEX supports an integrality tolerance of 0, but doesn't guarantee the final solution will adhere to it. CPLEX will try to branch on all integrality violations in that situation, but if it sees repeated branches as in this example, it will stop before resolving all of them.

    One way to address this issue would be if the optimizer could support a feasibility tolerance of 0. Fundamentally, if your MIP algorithm solves continuous relaxations, the level of precision of your integer feasible solutions is strongly connected to the level of precision you can compute for the feasible solutions of your continuous relaxations. But, so far I don't know of any finite precision solvers for continuous problems that support a feasibility tolerance of 0.
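
    The same limitation shows up even in plain dense linear algebra, never mind LP: solve a small random system and the residual is tiny, but almost never exactly zero (a quick illustration, nothing CPLEX-specific):

    A <- matrix(runif(9), nrow = 3)   # a random 3x3 system
    b <- c(1, 2, 3)
    x <- solve(A, b)
    max(abs(A %*% x - b))   # typically around 1e-16: small, but not a feasibility tolerance of 0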

    Maybe there is a way to address this in the branch and bound framework, but hopefully this post at least sheds light on some of the fundamental challenges involved in this issue. Or maybe computer memory and speed will continue to improve, and solvers that perform exact arithmetic will become standard.

    Replies
    1. Ed, thanks for adding this. Early on you said "First, regarding the notion of rounding a solution with slight violations of integrality, the rounded solution can become infeasible relative to the constraints and bounds." That makes complete sense, but if CPLEX doesn't round the solution, the user will, and the infeasibility threat remains ... compounded by the fact that the user may not test feasibility of the rounded solution, particularly in a large model (whereas CPLEX presumably would be programmed to do so).

      I wonder if a reasonable compromise might be for CPLEX to round as many variables as it can without violating feasibility tolerances, and then report a solution that might or might not have some lingering fractional values for integer variables. If the users see fractions in that solution, they know that a fully rounded solution will violate feasibility tolerance somewhere, and hopefully are alerted to investigate (or at least cross their fingers, offer a prayer, sacrifice a wolverine, ...). I know that trying all possible rounding combinations could be time consuming, so I personally would settle for a heuristic attempt.
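
      Just to make that concrete, here is a very crude sketch of the kind of heuristic I have in mind, for a model whose constraints are Ax <= b (one greedy pass, ignoring variable bounds and equality constraints; the function and argument names are mine):

      partial_round <- function(A, b, x, int_idx, feas_tol = 1e-6) {
        # Try rounding the integer variables one at a time, keeping only those
        # roundings that leave every constraint satisfied within feas_tol.
        for (j in int_idx) {
          trial <- x
          trial[j] <- round(x[j])
          if (all(A %*% trial <= b + feas_tol)) x <- trial
        }
        x   # any integer variable still fractional here signals a tolerance conflict
      }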

  4. Not to toot my own horn or anything, but I'm going to toot my own horn. I have written a paper on some of these finite precision computing issues for LP and MIP, which can be found here:

    http://pubsonline.informs.org/doi/abs/10.1287/educ.2014.0130

    The Cliff Notes version (i.e. slides and a recording) can be found here:

    https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W1a790e980a7d_49c5_963d_2965e5d01401/page/Virtual%20User%20Group%20Meetings

