Comments on "OR in an OB World: Internal model v. LP file v. SAV file"

Paul A. Rubin (2020-01-07 17:39):
Ed: Thanks for the links. I agree with your point about expressing constraints with integers rather than fractions (or their decimal equivalents) where practical.

Ed Klotz (2020-01-07 16:33):
I landed on this blog post today while trying to answer the same question. Paul's September 23 post is correct that different numerical bases (not the plural of "basis" in the simplex sense) lose precision on different numbers; e.g., 1/3 cannot be stored exactly in the base-2 arithmetic used on most of today's computers. And yes, losing just the last bit of the mantissa can be enough to break a tie in the algorithm's decisions, leading to alternate optimal bases (the simplex kind), which in turn leads to alternate branching selections when solving MIPs, and then to all sorts of performance variability. This is one reason I recommend replacing fractions that involve simple ratios of integers with integers during model formulation whenever you can. For example, instead of representing 1/3 x + 2/3 y = 1 with decimal representations of 1/3 that lose precision around the 16th decimal place in a base-2 representation, multiply the constraint through by 3 to get x + 2y = 3; now all your data consists of sums of powers of 2 and is represented exactly.
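Ed's rescaling point is easy to demonstrate in a few lines of Python; the standard-library `fractions` module recovers the exact rational value of a stored double, so it shows directly which coefficients are represented exactly:

```python
from fractions import Fraction

# The double closest to 1/3 is not exactly 1/3: Fraction(x) on a float
# yields the exact rational value of the stored bits.
stored = Fraction(1 / 3)
print(stored == Fraction(1, 3))  # False: base 2 cannot represent 1/3 exactly

# After multiplying 1/3 x + 2/3 y = 1 through by 3, every coefficient
# (1, 2, 3) is a small integer and is stored exactly in binary:
print(Fraction(1.0) == 1, Fraction(2.0) == 2, Fraction(3.0) == 3)  # True True True
```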
Given that I'm writing on a thread that is almost 10 years old, it may seem like nothing has changed, but that is not really the case. Progress has been made in understanding performance variability in mixed integer programming (e.g., see http://www.or.deis.unibo.it/andrea/Lodi-MIC2011-nopause.pdf). CPLEX now has a "runseeds" command that lets you assess the level of variability in your MIP model. Also, since 2010 I have written a detailed paper on the numerical aspects of LP and MIP solvers, available on the INFORMS web site here: https://pubsonline.informs.org/doi/10.1287/educ.2014.0130. And if you don't have the patience for a lengthy paper on numerical epsilons and other nitty-gritty details, you can try a shorter PowerPoint version here: http://yetanothermathprogrammingconsultant.blogspot.com/2015/03/mip-links.html

Paul A. Rubin (2010-09-23 17:11):
@Ed: It's been quite a while since I took my last numerical analysis class (back then the limit on arithmetic precision was the number of beads on the abacus), but as I recall there is generally a loss of precision (truncation error) when converting between numerical bases, since a fraction expressed in finitely many numerals in one base may require infinitely many numerals in another. Quite likely the model built through an API already contains some truncation (the builder likely uses decimals, while the computer representation is binary), and it is probably minuscule compared to the fuzziness in some of the parameter values. Nonetheless, a model exported in text format using decimal notation will probably deviate slightly from the internal binary representation of the model, due to truncation error.
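The truncation-error concern is easy to make concrete. The sketch below assumes a text export that keeps 12 significant decimal digits (that digit count is purely illustrative, not any particular solver's format); 17 significant digits, by contrast, are known to round-trip any IEEE-754 double:

```python
# A coefficient stored internally as a binary double:
c = 1 / 3

# A text export with, say, 12 significant decimal digits (an illustrative
# width, not a real LP-writer setting) does not round-trip exactly:
text = "%.12g" % c
print(float(text) == c)    # False: the last bits of the mantissa were lost

# 17 significant digits are sufficient to round-trip any IEEE-754 double:
text17 = "%.17g" % c
print(float(text17) == c)  # True
```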
In terms of the optimal solution, the difference is probably well below the radar -- but in terms of replicating the exact sequence of pivots in a degenerate LP, or possibly the exact sequence of nodes in a MIP search, can we be sure that flipping the last bit in the mantissa of some of the coefficients has no effect on the algorithm's decisions?

Bo Jensen (2010-09-23 15:16):
@Ed: I can certainly understand the reasons for choosing a text file with full precision, so that you get the exact same model data. As I said, I don't know what is included in CPLEX's SAV format, but I was expecting more than just model data, though I might be mistaken. Anyway, I think it's important to have some internal information stored in order to better reproduce customer runs.

Ed Rothberg (2010-09-23 12:02):
One point I wanted to raise: you are equating 'binary' and 'full precision' here. The two are linked in CPLEX (through the SAV format), but this isn't a technical requirement. We chose to use MPS format (a text format) as our full-precision format in Gurobi. In both the CPLEX SAV format and our MPS format, you get the exact same model, down to the last bit, when you write the model to a file and then read it back in.

Bo Jensen (2010-09-23 02:18):
The two runs should produce the same binary file, given that your code and the optimizer are deterministic.
Serializing hot-start data structures doesn't change that. The problem with serializing is maintainability: you have to make sure it is always up to date, which sounds easy but can get complicated between major releases.

Paul A. Rubin (2010-09-22 19:39):
Interesting point. I hadn't thought about including hot-start information in the SAV file. You're absolutely right, though, that failure to include an initial basis (when one is used) makes it tough to replicate behavior. At the same time, including the hot start means that if I export a SAV file today and again tomorrow, with no changes to my code in between (but having run the model twice on one day and once on the other), I might end up exporting different SAV files while thinking they would be the same.

One more reason I'm happy not to be working in tech support (and sympathetic to those who are). :-)

Bo Jensen (2010-09-22 19:20):
I don't think all the vendors have a binary format, but it's generally a good investment for support, since otherwise you might (and in many cases will) end up unable to reproduce the faulty run a customer had. I don't know how much is included in CPLEX's SAV format. If you think about it, it's more complicated than expected, because you also have to deal with hot starts, i.e., cleverly serializing the LU factorization and other internal structures the optimizer stores between runs. So my experience is that you choose a level of depth that captures "most" of the randomness you see from the optimizer taking different paths.
I have had several cases where not serializing the LU made it impossible for me to reproduce the customer's run, which is clearly a support drawback. -- Bo Jensen
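As a purely hypothetical illustration of the kind of serialization Bo describes: the function names and JSON layout below are invented for this sketch (no vendor uses this format; real solvers have their own basis files, e.g. CPLEX's .bas), and it captures only a starting basis, not the LU factors or other internal state, so it would remove only part of the run-to-run variability:

```python
import json
import os
import tempfile

# Hypothetical helpers: persist a starting basis alongside the model so a
# support engineer can replay a customer run from the same warm start.
def save_basis(path, col_status, row_status):
    # col_status / row_status: per-column and per-row simplex statuses,
    # e.g. "basic", "at_lower", "at_upper" (invented labels for the sketch)
    with open(path, "w") as f:
        json.dump({"cols": col_status, "rows": row_status}, f, sort_keys=True)

def load_basis(path):
    with open(path) as f:
        data = json.load(f)
    return data["cols"], data["rows"]

# The basis round-trips exactly -- but the LU factorization and other
# internal optimizer state are not captured, which is exactly the gap
# Bo points out above.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
save_basis(path, ["basic", "at_lower"], ["at_upper"])
print(load_basis(path))  # (['basic', 'at_lower'], ['at_upper'])
```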