Sunday, October 28, 2018

B.S.-ing Precisely

In a recent blog post titled "Excessive Precision", John D. Cook points out the foolishness of articulating results to an arbitrarily high degree of precision when the inputs are themselves not that precise. To quote him:
Excessive precision is not the mark of the expert. Nor is it the mark of the layman. It’s the mark of the intern.
He also mentions that it is typically difficult to assess the precision of the output even if we know the precision of the input. This is in part a matter of possible nonlinearity (and quite possibly opaqueness, as in "black box") in the mechanism that transforms inputs into outputs. It can also be caused by the inherent imprecision of floating point arithmetic (rounding error, truncation error, spiteful quantum effects, ...).

In operations research, there are myriad other sources of imprecision. Your conceptual model of the system is an approximation of the actual system. You may have linearized things that are not linear, or used "convenient" nonlinear representations (polynomials, exponential functions) for things that are nonlinear in a goofy way. If you are like me, you will have ignored randomness entirely, because the work is hard enough even when you pretend determinism. (Also, I confuse easily.) If you did bake in some explicit consideration of randomness, you likely approximated that. (How many things in life are really normally distributed?) There's also the matter of the passage of time. Did you make all the coefficients, distributions etc. time-varying? Didn't think so (and don't really blame you). Throw in estimation error for the model parameters and you have a witches' brew of imprecision. (It's almost Halloween, so I had to work in at least one spooky metaphor.)

This brings to mind two running questions that I encounter with discrete optimization models. I doubt either has a definitive answer. The first question is whether there is any benefit to running the solver all the way to "proven optimality" (gap nearly zero) if everything is so approximate. My personal preference is to do it when it doesn't take too long (why not?) but not bother if the gap is closing at a painfully slow rate.

The second question is whether to bother using a MIP model and solver at all, or just run some metaheuristic (preferably one with a cool biologically-inspired name, such as "banana slug foraging optimization"). After all, if you are just getting an answer to an approximation of the actual problem, why not get an approximate answer to the approximation? My personal inclination is to solve the model exactly when possible, in the hope that the model is at least "close" to reality and thus an optimal solution to the model will be "close" to an optimal solution to the real problem. At the same time, I recognize that there is a lot of hoping going on there, so if getting an exact solution is painful, I'll switch to a heuristic or metaheuristic and hope that the answer I get proves decent in practice.

Either way, I'm not going to claim the "optimal" value of some variable is 2.31785 with any degree of certainty. It's about 2.3 (if we're lucky). On the bright side, a lot of the problems I deal with have binary variables. There's not a lot of concern there about artificial precision; the only concern is artificial confidence in the solution.