Friday, January 25, 2013

Solution Pool: "All" Is Not All

I was struck by an idle thought (which left a wicked bruise) regarding yesterday's post about finding "all" MIP solutions using the CPLEX solution pool. Branch-and-bound and branch-and-cut algorithms generally prune a node for one of three reasons:
  • the LP relaxation is infeasible, in which case no amount of further torture will cause the node to confess to a solution;
  • the objective value of the LP relaxation is worse (or at least no better) than that of the current incumbent solution, in which case further division of the node will never yield a solution better than the incumbent; or
  • "the" optimal solution to the LP relaxation is integer-feasible, in which case it is unclear how to further partition the node (and whether to bother).
Note the quotation marks around "the" in the third bullet. When searching for all optimal solutions, or even all feasible solutions, the first rule (prune an infeasible node) still makes sense. The second rule needs to be tempered, as there may be solutions in other parts of the tree that are as good as the incumbent even if they are not better (and, in the case of finding all feasible solutions, you do not care that solutions in other branches of the tree have worse objective values). That brings us to the third rule.

A node problem whose LP relaxation has an integer-feasible optimum may in fact have multiple integer-feasible optima, and certainly may have multiple integer-feasible solutions (which is important if you are looking for all feasible solutions, not just all optima). The usual branching rule (pick a variable that should be integer but is not and separate into two nodes by rounding it up or down) is of no avail if the solution is integer-feasible. One could concoct an arbitrary rule for further partitioning, but it would be outside the scope of the normal branch-and-bound algorithm.

[Edit: Ed Klotz pointed out an important distinction to me. When the node relaxation solution is integer-feasible, you need to look at the sequence of branching decisions leading to the node. If those decisions fixed all the integer variables to specific values, then the node LP solution is the only integer-feasible solution at that node, and there's nothing left to do but prune it. If the branching decisions leave open the possibility that some of the integer variables might take on different values consistent with all the added cuts that got you there, then further processing is required, and CPLEX will indeed do this.]

So the idle thought was "what does the CPLEX populate function do in the third case?", and the answer seems to be "prune the node". [Edit: I think I was wrong here. See the update below.] Consider the following trivial test problem:\[ \begin{array}{lrcl} \textrm{maximize} & x_{1}\\ \textrm{s.t.} & \sum_{i=1}^{d}x_{i} & \ge & \frac{d}{3}\\ & x & \in & \mathbb{B}^{d} \end{array} \] where $\mathbb{B}=\{0,1\}$. There is no significance to the choice of 3 as the denominator of the right-side of the constraint, and in fact the constraint is there primarily to make sure that CPLEX extracts and solves for all variables. (Without it, CPLEX treats the problem as having a single variable $x_1$.) The problem clearly has multiple optimal solutions, formed by setting $x_1=1$ and $x_j=1$ for at least $d/3-1$ other values of $j$.

I ran the populate method against this problem, using the same parameter choices mentioned in yesterday's post (and in particular asking for up to 100 optimal solutions). For various (small) choices of $d$, CPLEX produced only a single solution in the pool ($x_j=1$ for all $j$). Turning off the presolver did not affect this, nor did monkeying with the root and node LP algorithms (nor any other parameter change I tried). I'm not sure why that particular solution was the winner. As to why only one solution was returned, my guess is that the first step branches on (or otherwise fixes) $x_1$, rules out the node where $x_1=0$ based on bound, finds an integer-feasible optimum in the node where $x_1=1$ and then fails to partition that node any further. (This is strictly conjecture, since I am not privy to the inner workings of populate.)

My conclusion is that if you really need all the optima for a MIP, you will need to do something clever (either with callbacks or looping through a sequence of problems) and not rely on the CPLEX solution pool.

Update: My conjecture about CPLEX branching once and then not partitioning the winning node was wrong. I noticed that neither the "phase I" nor "phase II" output contained a node log, which would suggest that CPLEX obtained the one solution it found by presolving, even though I had presolving turned off. So I modified the objective function to $x_1 - \epsilon \sum_{j\gt 1}x_j$, which means it is no longer harmless to force all $x_j$ to 1 for $j\gt 1$. With the new objective, CPLEX generated a node log and found all optimal solutions (as well as some suboptimal ones). Similarly, with just one of the variables penalized, say an objective function $x_1 - \epsilon x_2$, CPLEX branched and found all optima.

So failing to partition a node with an integer solution is not the problem. This may just be a bug.

Update #2: It is a bug. Here's an official response from the IBM/ILOG support team.

This is a bug that we will fix. Thomas Opfer's conjecture from Jan 28 is on target:

> Might it be that they iterate trivial examples because it is faster than starting the complete CPLEX-overhead?

Specifically, before CPLEX's main presolve gets started, CPLEX computes a trivial objective upper bound on this maximization problem by removing all the constraints and setting the variables to their appropriate bound based on the objective value. In this case that established an upper bound of 1. CPLEX also applies some very simple heuristics, which find a feasible solution with an objective of 1. So, the lower and upper bound for the MIP already match, which means an optimal solution has already been found. So, no tree, which the solution pool algorithm requires, is available, and CPLEX essentially stops with the one optimal solution in the pool. This is not correct behavior when the solution pool intensity has been set to 4 to tell CPLEX to enumerate all solutions. CPLEX properly handles a similar case when the full presolve solves the whole model and no tree is available to the solution pool, so we need to do likewise in this situation.

We will fix this in either the next fixpack or feature release, whichever comes first. Sorry for the inconvenience, although the good news is that the bug will only bite on very simple models that CPLEX can solve to optimality with some very minimal presolving effort. In the short term, you can work around this on such models by setting the heuristic frequency parameter to -1 to disable heuristics.

11 comments:

  1. That's very interesting and clearly contradicts the CPLEX user manual. I checked this and experienced the same problem. I think you should report a bug (at least for the wrong manual).

    Concerning this and your former blog post, I think one should point out that enumerating all solutions only makes sense for pure integer programs. For MIPs, this is much more difficult in general. (One possibility would be to project the linear part away and iterate all integer feasible solutions. For each of them, one then can take the remaining LP and do some vertex enumeration like LRS.)

    Best regards,
    Thomas

    ReplyDelete
    Replies
    1. Thomas: I agree with both paragraphs, although I think it might be a code bug rather than a documentation bug. (I just added an update to the text of the post.) Regarding the second paragraph, I tried to make the same point in the "grandparent" of this post (the post preceding it's immediate predecessor).

      Delete
  2. Maybe they find all non-isomorphic solutions? (Did you monkey with the 'symmetry' parameter).

    But I agree -- this is at least a bug in the documentation. There is an old IPCO paper by the CPLEX people on this topic, so they should know what they are taking about.

    Emilie Danna, Mary Fenelon, Zonghao Gu, Roland Wunderling: Generating Multiple Solutions for Mixed Integer Programming Problems. IPCO 2007: 280-294

    Cheers,
    -Jeff

    ReplyDelete
    Replies
    1. Jeff: I had not tried the Symmetry parameter, but turning it off did not change the result. I just added an update above that makes me think this is a bug. I wonder if some preprocessing occurs regardless of parameter settings? At any rate, I'll let them know and we'll see what happens.

      Delete
    2. There is some preprocessing concerning the clique tables. This cannot be turned off. Some time ago, I experienced (and Daniel confirmed) that this can fix variables.

      Did you have a look at the root LP?

      Best regards,
      Thomas

      Delete
    3. Thomas: No, I use the Java API, and the root LP is available only in the C API. :-(

      Cheers,
      Paul

      Delete
  3. Hmm, it is really strange. I tried to access some information using the C API. I disabled everything that came into my mind and I added several callbacks. Result: "Found incumbent of value 1.000000 after 0.00 sec. (0.00 ticks)". No callback was called... How does CPLEX find this solution? Voodoo? Enumeration (probing is off!)?

    On the other hand, I found out that the interactive optimizer finds several solutions if I do the following: r test.lp, set pre pre n, o, pop.

    Best regards,
    Thomas

    ReplyDelete
    Replies
    1. Your result with C is identical to mine with Java. Perhaps there is an ESP heuristic? Or perhaps the IBM ILOG folks have figured out how to program intuition?

      I matched your result with the interactive optimizer; but if I skip the optimize step and go straight from 'set pre pre n' to 'pop', I only get a single solution. Adding to my confusion (not that I needed any more), if I do the same in Java (cplex.solve() first, then cplex.populate()), it makes no difference; I get one solution either way.

      Cheers,
      Paul

      Delete
    2. It seems that the behaviour in C and Java is the same.

      Might it be that they iterate trivial examples because it is faster than starting the complete CPLEX-overhead?

      Best regards,
      Thomas

      Delete
  4. Paul,

    thanks for posting the support response. I checked this and setting CPX_PARAM_HEURFREQ to -1 in fact seems to solve the problem. I thought, I had tried this before. Maybe, I accidently set it to 0 or CPX_OFF.

    Best regards,
    Thomas

    ReplyDelete
    Replies
    1. Thomas: I've had that same experience. There are too many parameters to keep track of them all.

      Delete

If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on OR-Exchange.