Saturday, May 28, 2016

Rounding Error

I recently experienced the eight-bazillionth iteration of the following scenario. Someone posts a request for help on a forum. They are building some sort of MIP model, or possibly something even more advanced (Benders decomposition, column generation, ...), and things are not going well. In some cases they think they have made a coding error; more commonly they think the solver (CPLEX in my personal experience, since that's the only one whose forums I watch) has a bug in it. After substantial back and forth, the key problem emerges: their binary variable came out 1.00000E-9 or 0.99999999, which is "clearly" neither 0 nor 1.

At this point, my reaction is always the same (add your favorite expletives after every few words): "Don't they know about rounding error???"

In fairness, though, they may not, and perhaps the blame lies on the shoulders of faculty (possibly including me, in my former life). One of the courses I took during my first term of graduate school (in math) was numerical analysis. This is back when mainframes were just making the transition from vacuum tubes to semiconductors, and "user input" was a euphemism for "deck of punch cards". It's also a time when memory was at a premium and, thus, single precision ruled the realm.

(BEGIN TANGENT) I suspect that some readers who are (much) younger than I, and who program, will have noticed that pretty much everything of a floating point nature is automatically done in double precision these days, and may be wondering why single-precision float types still exist. They're a holdover from my youth, and I suspect they exist almost exclusively for backward compatibility. (END TANGENT)

I don't recall why I picked numerical analysis as an elective -- whimsy, I think -- but it ended up being one of the most useful courses I took in grad school. The first, and most important, lesson from it was that computers do math inexactly. This should be immediately obvious if you stop to think about it. I learned in childhood that 0.3333... = 1/3, where the ... means "repeated an infinite number of times". Stop the decimal expansion after a finite number of digits, and you get something slightly less than 1/3. Now pretend for the moment that your computer encoded numbers in decimal rather than binary, and that you used the expression 1/3 in a program. The compiler would convert 1/3 to a decimal representation, but not 0.3333... -- because that would require an infinite amount of memory to store, and the computer only has a finite amount. So the computer has to truncate the decimal expansion, and as a result what the computer stores for 1/3 is not actually 1/3. (Now go back to using binary. The same logic applies.)
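
If you want to see this in action, any R console will oblige (nothing below is specific to R; any language that uses IEEE 754 doubles behaves the same way):
> print(1/3, digits = 17)
[1] 0.33333333333333331
> 0.1 + 0.2 == 0.3
[1] FALSE
> print(0.1 + 0.2, digits = 17)
[1] 0.30000000000000004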

I'm not going to repeat an entire course in numerical analysis here (too much typing), but another lesson I learned was that, when programming, I should never do strict equality comparisons on anything that was not stored as an integer. Here's a small example that I just ran in R (not to pick on R -- it was just handy):
> sqrt(9) - 2.99999999999999999 == 0
[1] TRUE

To a mathematician, that relation is patently false, but the decimal literal 2.99999999999999999 has more significant digits than a double-precision value can store, so it is rounded to exactly 3 on input -- "close enough" to 3 indeed -- and the comparison comes out TRUE.
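
The safe habit is to compare against a tolerance instead of demanding exact equality. In R, for example (all.equal() applies a small relative tolerance by default):
> sqrt(2)^2 == 2
[1] FALSE
> abs(sqrt(2)^2 - 2) < 1e-9
[1] TRUE
> isTRUE(all.equal(sqrt(2)^2, 2))
[1] TRUE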

My prof shared one anecdote that has stuck with me for almost half a century now. He told of an engineer who coded his own routine for inverting a matrix and was not particularly careful about dealing with rounding errors (and "stiff" matrices, a topic for another time). The engineer then published a paper which included, at one point, the inverse of a particular matrix, computed by his program, with each entry in the matrix printed to some absurd number of decimal places (five, I think). All the digits to the right of the decimal points were wrong. So were the first couple of digits immediately left of the decimal points. You can ignore the limitations of computer arithmetic precision, but (cue sinister background music) those limitations will not ignore you.

Where I'm going with this is that I learned about the limitations of computer math somewhat accidentally, because I just happened to take numerical analysis. I suspect that some OR/IE programs require it, and I'm pretty sure others (most?) don't. In the interests of reducing frustration for a lot of people (of whom I'm the only one I care about), and perhaps avoiding incorrect results being published, I think we should pressure anyone teaching computational topics in OR (particularly, but not exclusively, integer programming) to provide some minimal coverage of rounding and truncation errors, the IEEE standards for single- and double-precision values, etc. If they cannot (or will not) teach it themselves, they can at least require that the student go learn it as a prerequisite.
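
If nothing else, students could be shown how to ask their own software what its limits are. In R, for example, the relevant constants are one query away:
> .Machine$double.eps      # smallest positive x such that 1 + x != 1
[1] 2.220446e-16
> .Machine$double.digits   # bits in the significand of a double
[1] 53
> 1 + .Machine$double.eps/2 == 1
[1] TRUE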

Please! I'm begging you!

11 comments:

  1. I am not convinced that CPLEX is doing the right thing here (nor are other solvers I've seen that do the same). If I've specified that a variable is binary, I think I ought to get an exact 0 or exact 1 out when the problem is declared "solved".

    I have trouble understanding the circumstances where the solver *can't* return a true 0 or 1. I guess the idea would be something like this: the iterate isn't feasible to within numerical tolerance when the variable is exactly 0 or 1; but it *is* feasible to within numerical tolerance for 1e-8 or 1-1e-8, so we declare that "solved".

    I'm sure a solver implementor could convince me this is absolutely necessary internally. But that doesn't mean it *should* be returning 1e-8 or 1-1e-8. Again, the variable is binary; those aren't the values I should be seeing in my output. From a user experience standpoint, I think solvers should round the variables to 0 or 1, and then include in the documentation that this may occur.

    Replies
    1. Michael: I've kicked this around in my head a bit, and I can see both sides of the question. I can't speak for the developers of CPLEX, Gurobi or Xpress, but if I were writing a solver, my concern would be that rounding is a one-way transformation -- you can't "unround" the value. I figure that anyone who can use a programming API to a solver (and hence presumably can program) should be able to find and use the programming language's rounding function. On the other hand, if I as the vendor do the rounding for the user, the user has no way to tell whether the actual double-precision value was really close to the nearest integer or just barely inside whatever tolerance was set. That information might be meaningful to a user trying to diagnose issues with their model.

      All that said, I don't think it would kill vendors to provide two versions of their "get the value of" functions, one that coughs up the actual value obtained and one that automatically rounds if the variable is integer. Most users would want the latter, so it should probably have the easier to remember name or be the default choice. Vendors of modeling languages would have to do something similar, I suppose.
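
      In the meantime, the do-it-yourself version is only a few lines. Here is a rough sketch in R (the function name and the 1e-5 default are mine; match the tolerance to whatever your solver actually uses):

      round_integral <- function(x, tol = 1e-5) {
        # Round any value that is within tol of an integer; warn about (and keep) the rest.
        r <- round(x)
        off <- abs(x - r)
        if (any(off > tol)) warning("some values are farther than ", tol, " from an integer")
        ifelse(off <= tol, r, x)
      }
      round_integral(c(0.99999999, 1e-9))   # returns c(1, 0)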

    2. I'd agree with Michael here. The user sets a variable to be integer for a reason: it represents a discrete decision. If the solution, rounded to the nearest integers, is no longer feasible, then it isn't really a solution, as it cannot be implemented. The issue is that for mixed integer problems the rounding requires re-solving an LP, and it may come back infeasible. But then the solution wasn't really a good one, so it should have been discarded.

      Yes, the user can always tighten the integrality tolerance (which is quite loose by default -- typically 1e-6 to 1e-5, larger than the feasibility tolerance), but that requires setting an option. Too much work.

    3. Imre: I agree with you for the most part, but I don't know enough about the inner workings of a contemporary solver to know whether implementing this would be problematic. The solver would have to round each candidate solution encountered and then solve a restricted LP to make sure the solution was valid. For an integral solution to the node LP, the extra problem solved would be a restriction of the node LP; but for a candidate found by a heuristic, the extra solve might require a restricted version of an LP not currently in memory (and quite possibly a larger one than the current node LP). Also, the solver works with presolved versions of the original model, and rounding would have to be done on the original variables. Not knowing the details of presolve operations (and ignorance here really is bliss!), I don't know whether the reverse-transform, round, re-transform sequence would introduce any errors likely to undo the good of rounding.

      I have to think, though, that the authors of solvers are not rounding candidate incumbents for a reason other than laziness on their part. If I'm wrong, then I will definitely join the movement to get unambiguously integer solutions.

    4. This comment has been removed by the author.

    5. "I have to think, though, that the authors of solvers are not rounding candidate incumbents for a reason other than laziness on their part."

      Oh, I'm absolutely willing to give them the benefit of the doubt on this. The problem, in my view, is the expertise gap between solver developers and solver users: what may seem intuitive or sensible to the developers is not at all intuitive for the users. So when these non-integral results are returned, it stokes distrust in the quality of the software.

      As I said, I don't see the wisdom here, but I wouldn't be surprised if they could convince me, or you, or Imre. But we're experts in this field. People whose expertise lies in application areas, and not in optimization itself, are the ones that need convincing.

  2. Single precision is still really important. Memory bandwidth has still not become infinite.

    Not only that, half-precision is becoming ever more important: https://devblogs.nvidia.com/parallelforall/new-features-cuda-7-5/

    Replies
    1. Thanks for the link! I would not have guessed that anyone would want half-precision, but in hindsight it makes sense. If you are running stochastic gradient descent on noisy data, milking a bunch of meaningless decimal places is rather pointless. I wonder if anyone will start producing custom FPUs with half-precision registers?

  3. Recent versions of CPLEX have supported setting the integrality tolerance to 0. However, for some of the reasons already discussed, CPLEX cannot guarantee an integer feasible solution will satisfy the 0 integrality tolerance. I will give more details on why below.

    First, regarding the notion of rounding a solution with slight violations of integrality, the rounded solution can become infeasible relative to the constraints and bounds. That really is not that surprising; if the rounded solution were feasible, branching probably would have found it anyway. This notion of rounding to remove integrality violations was very quickly discarded in early versions of CPLEX.
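
    A toy instance makes the danger concrete (the numbers below are illustrative, assuming an integrality tolerance of 1e-5 and a feasibility tolerance of 1e-6):

    # Constraint: 1000000 * x <= 999999 with x binary -- in effect, x <= 0.999999.
    x <- 0.999999
    1e6 * x <= 999999 + 1e-6    # TRUE: the fractional value is feasible within tolerance
    abs(round(x) - x) <= 1e-5   # TRUE: x also looks integer feasible within tolerance
    1e6 * round(x) - 999999     # 1: rounding x up to 1 violates the constraint by a full unit
    # (the only truly feasible integer value here is x = 0)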

    Regarding why an integrality tolerance of 0 may not be realistic under finite precision, let's return to Paul's example of 1/3. Suppose you have, in double precision

    0.3333333333 x = 1
    x integer

    Let's run branch and bound on this. Relaxing integrality, we get x = 1/0.3333333333, which is approximately 3 + 3e-10.

    With an integrality tolerance of 0, we have to branch on x. The up branch is immediately infeasible. For the down branch, we add the constraint x <= 3. Let's solve the LP relaxation. In the parent node we had x = 3 + 3e-10 as a basic variable. That basis remains feasible within 3e-10 for the child node with the x <= 3 branch. Given CPLEX's default feasibility tolerance of 1e-6, we accept this solution. So now we need to branch down on x = 3 + 3e-10 again, and we thus repeat this process an infinite number of times. In this simple example numerous remedies are available. But, when the integrality violation arises from a complex interaction of rows and columns of the basis matrix, this becomes a much greater challenge.
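
    (The arithmetic behind that loop is easy to check at a console:)

    x <- 1 / 0.3333333333   # the relaxation value, roughly 3 + 3e-10
    x - 3                   # about 3e-10: positive, so an integrality tolerance of 0 is violated
    x - 3 <= 1e-6           # TRUE: the violation of the x <= 3 branch is within the 1e-6 feasibility tolerance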

    This is why CPLEX supports an integrality tolerance of 0, but doesn't guarantee the final solution will adhere to it. CPLEX will try to branch on all integrality violations in that situation, but if it sees repeated branches as in this example, it will stop before resolving all of them.

    One way to address this issue would be if the optimizer could support a feasibility tolerance of 0. Fundamentally, if your MIP algorithm solves continuous relaxations, the level of precision of your integer feasible solutions is strongly connected to the level of precision you can compute for the feasible solutions of your continuous relaxations. But, so far I don't know of any finite precision solvers for continuous problems that support a feasibility tolerance of 0.
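
    The same limitation shows up even in plain dense linear algebra, never mind LP: solve a small random system and the residual is tiny, but almost never exactly zero (a quick illustration, nothing CPLEX-specific):

    A <- matrix(runif(9), nrow = 3)   # a random 3x3 system
    b <- c(1, 2, 3)
    x <- solve(A, b)
    max(abs(A %*% x - b))   # typically around 1e-16: small, but not a feasibility tolerance of 0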

    Maybe there is a way to address this in the branch and bound framework, but hopefully this post at least sheds light on some of the fundamental challenges involved in this issue. Or maybe computer memory and speed will continue to improve, and solvers that perform exact arithmetic will become standard.

    Replies
    1. Ed, thanks for adding this. Early on you said "First, regarding the notion of rounding a solution with slight violations of integrality, the rounded solution can become infeasible relative to the constraints and bounds." That makes complete sense, but if CPLEX doesn't round the solution, the user will, and the infeasibility threat remains ... compounded by the fact that the user may not test feasibility of the rounded solution, particularly in a large model (whereas CPLEX presumably would be programmed to do so).

      I wonder if a reasonable compromise might be for CPLEX to round as many variables as it can without violating feasibility tolerances, and then report a solution that might or might not have some lingering fractional values for integer variables. If the users see fractions in that solution, they know that a fully rounded solution will violate feasibility tolerance somewhere, and hopefully are alerted to investigate (or at least cross their fingers, offer a prayer, sacrifice a wolverine, ...). I know that trying all possible rounding combinations could be time consuming, so I personally would settle for a heuristic attempt.
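
      Just to make that concrete, here is a very crude sketch of the kind of heuristic I have in mind, for a model whose constraints are Ax <= b (one greedy pass, ignoring variable bounds and equality constraints; the function and argument names are mine):

      partial_round <- function(A, b, x, int_idx, feas_tol = 1e-6) {
        # Try rounding the integer variables one at a time, keeping only those
        # roundings that leave every constraint satisfied within feas_tol.
        for (j in int_idx) {
          trial <- x
          trial[j] <- round(x[j])
          if (all(A %*% trial <= b + feas_tol)) x <- trial
        }
        x   # any integer variable still fractional here signals a tolerance conflict
      }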

  4. Not to toot my own horn or anything, but I'm going to toot my own horn. I have written a paper on some of these finite precision computing issues for LP and MIP, which can be found here:

    http://pubsonline.informs.org/doi/abs/10.1287/educ.2014.0130

    The Cliff Notes version (i.e. slides and a recording) can be found here:

    https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W1a790e980a7d_49c5_963d_2965e5d01401/page/Virtual%20User%20Group%20Meetings

