Thursday, January 24, 2013

Finding "All" MIP Optima: The CPLEX Solution Pool

Following up on my recent post about finding all optimal solutions (or at least multiple optimal solutions) to MIP models, it turns out that recent versions of CPLEX make this rather easy to program (if not necessarily quick to do) through their solution pool feature. I'll illustrate the technique with a simple example.

The problem


The example problem is to find $N$ binary vectors of dimension $D$ so as to maximize the minimum Hamming distance between them. One source of this problem is coding theory (not to be confused with cryptography): we might want a "vocabulary" of $N$ "words" that are as unlikely as possible to be confused with each other if individual bits are independently molested, with low likelihood, during transmission. The mathematical model is \[ \begin{array}{lrclr} \textrm{maximize} & & d\\ \textrm{s.t.} & \Vert x^{(j)}-x^{(k)}\Vert & \ge & d & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\ & x^{(j)} & \in & \mathbb{B}^{D} & \forall j\in\{1,\dots,N\} \end{array} \]where$\Vert\cdot\Vert$ denotes the Hamming length (number of non-zeros) of a vector and $\mathbb{B}^D$ is the space of $D$-dimensional binary vectors.

The MILP model


To massage this into a mixed-integer linear program (MILP), we introduce the following auxiliary variables:
  • $y^{(j,k)}_i\in\mathbb{B}$ is 1 if and only if vectors $x^{(j)}$ and $x^{(k)}$ differ in component $i$;
  • $d^{(j,k)}$ is the Hamming distance between vectors $x^{(j)}$ and $x^{(k)}$; and
  • $d$ is the minimum Hamming distance between any pair of vectors (to be maximized).
The MILP model is:

\[ \begin{array}{lrclr} \textrm{maximize} & & d\\ \textrm{s.t.} & y_{i}^{(j,k)} & \le & x_{i}^{(j)}+x_{i}^{(k)} & \forall i\in\{1,\dots,D\};\forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\ & y_{i}^{(j,k)} & \le & 2-x_{i}^{(j)}-x_{i}^{(k)} & \forall i\in\{1,\dots,D\};\forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\ & d^{(j,k)} & = & \sum_{i=1}^{D}y_{i}^{(j,k)} & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\ & d & \le & d^{(j,k)} & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\ & x^{(j)} & \in & \mathbb{B}^{D} & \forall j\in\{1,\dots,N\}\\ & y^{(j,k)} & \in & \mathbb{B}^{D} & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\ & d^{(j,k)} & \in & [0,D] & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\ & d & \in & [0,D]. \end{array} \]

Symmetry


The model contains a high degree of symmetry, and as I've mentioned before symmetry will slow the search for multiple optima (and, at the same time, can be exploited to easily generate alternate optima from a given solution). I'll add some constraints to the MILP model to get rid of at least some of the symmetry.

First, consider any solution $x^{(1)},\dots,x^{(N)}$ to the model, and suppose I toggle bit $i$ in every vector; that is, define $\hat{x}^{(j)}$ for $j\in\{1,\dots,N\}$ by \[ \hat{x}_{h}^{(j)}=\begin{cases} x_{h}^{(j)} & h\neq i\\ 1-x_{i}^{(j)} & h=i \end{cases}. \] We obtain a new solution with $\Vert\hat{x}^{(j)}-\hat{x}^{(k)}\Vert=\Vert x^{(j)}-x^{(k)}\Vert\:\forall j\lt k$, so the new solution has the same objective value as the old one. Therefore, without loss of generality, I can require that $x^{(1)}=0$.

Second, if we permute the indices of the vectors (other than the first vector, which we have forced to be 0), we obtain another set of $N$ vectors with the same pairwise distances. To eliminate that, we will constrain the vectors to be lexicographically increasing; that is, if $x^{(j)}_h=x^{(j+1)}_h$ for $h\lt i$ and $x^{(j)}_i \neq x^{(j+1)}_i$, then $x^{(j)}_i=0$ and $x^{(j+1)}_i = 1$. Lexicographic ordering constraints can in general be a pain to write, but our auxiliary variables $y$ actually make it simple here:\[ x_{i}^{(j+1)}\ge x_{i}^{(j)}-\sum_{h\lt i}y_{h}^{(j,j+1)}. \]

Setting up the solution pool


Before invoking CPLEX's populate method to fill in the solution pool, we need to set some parameter values. Let's say that our ambition is to find $M$ optimal solutions if possible. (To find all optimal solutions, choose a really large value of $M$ and hope you live long enough to see the run terminate.) The first thing to do is to set the SolutionPoolCapacity parameter to $M$. Next, set the SolutionPoolIntensity parameter to 4, the most aggressive setting ("leave no stone unturned").

The PopulateLim parameter dictates the maximum number of integer-feasible solutions CPLEX will generate before it declares victory and retires from the battlefield. It may stop before reaching this limit because it has exhausted all possible solutions, or because no possible solutions remain that would satisfy the pool gap parameters (discussed below), or because the standard time limit or node limit is reached. The default value for this parameter is 20. You will want it at least $M$, but realistically you need to set it considerably larger than $M$, since suboptimal solutions found along the way count against the limit. Set it to something really, really large to be safe.

We want to screen out suboptimal solutions. Since the objective function is integer-valued, a suboptimal solution will be worse than the incumbent by at least 1. Allowing for some rounding error, we set the pool absolute gap parameter (SolnPoolAGap) to something between 0 and 1 (I'll use 0.5). Any solution with 0.5 of the incumbent will have the same minimum Hamming distance after cleaning up rounding error; any solution more than 0.5 worse than the incumbent has a minimum Hamming distance at least 1 greater and is thus infeasible. (I'm trusting that I won't get a rounding error worse than 0.5 in any objective value.) We will leave the relative pool gap parameter (SolnPoolGap) at its default value of $10^{75}$.

The gap parameter prevents CPLEX from adding solutions to the pool that are inferior to the best solution already in the pool. It does not prevent suboptimal solutions from being added to the pool before the optimum is found, nor does it force them to be dropped from the pool once better solutions are found. The documentation may be a bit unclear on the latter point; in the sections on the gap parameters, it says that inferior solutions will be discarded but does not explicitly limit that statement to new solutions. I've verified by experiment, though, that inferior solutions found prior to the first truly optimal incumbent may be retained.

The (partial) cure for this is to adjust the pool replacement strategy (SolnPoolReplace). This parameter tells CPLEX how to make room when the pool is full and a new integer-feasible solution is found. Note that, by virtue of our choice of SolnPoolAGap value, the newly found solution will be at least as good as the best of the solutions in the pool. The default value (CPX_SOLNPOOL_FIFO = 0) uses a first in, first out replacement strategy, without regard to objective value. What we want is CPX_SOLNPOOL_OBJ = 1, which kicks out the solution with the worst objective value to make room for the new and improved solution. (If you want to find lots of feasible but not necessarily optimal solutions, you might want CPX_SOLNPOOL_DIV = 2, which kicks out solutions so as to enhance the diversity of the pool.)

Are we done yet???


Now call the populate method (whether in an API or the interactive optimizer), sit back, and wait for the results. When CPLEX finishes, the pool is full of optimal solutions, right? Not quite. If there are fewer than $M$ optimal solutions to your model, chances are quite high that some suboptimal solutions will have made it into the pool. Since the pool never filled up, no solutions were kicked out, so they'll still be sitting there. You will need to check the objective values of each solution in the pool, find the best of those values, and use it to weed out the suboptimal solutions.

You are welcome to see my Java source code for this problem. I wrote it for Java 7, but I believe it will work with Java 6 if you edit one line.


37 comments:

  1. "Look what's in the CPLEX solution pool and see what can be done with it" is on my to-do list for 2013. Besides being interesting to read, your post will probably help me to save time in the near future! Thank you for taking the time to write this stuff.

    ReplyDelete
    Replies
    1. Retirement has given me time to answer idle thoughts ("I wonder how this works?"), and the blog has given me someplace to put the answers. Maybe I should have titled the blog "Random Itches Scratched". :-)

      Delete
    2. Hi Mr Rubin,
      I am using cplex class on a matlab interface.Currently I am working on a sports scheduling problem, with few thousands of variables. I want to get all the possible solution of the model. I know to do so cplex has a method called populate. I know how to use it and all the parameter settings. The one thing I am not sure how exactly Solution pool works. Because it gives me less number of solutions, if i repeat some constraints again. Though it shouldn't happen since the constraints will be redundant. Also for my problem I know that there should be millions of solutions as it only looks for feasibility as there is no objective function in the model but it only gives me 181, even if i specify the solution pool capacity as 200 or 300 etc. Changing the solution pool intensity doesn't help either when i change the intensity to maximum that is '4' the total number of solutions even get smaller to 20.
      I will be very thankful to you if you could shed some light on my problem. Thanks
      in advance.

      Thanks
      Niraj

      Delete
    3. Niraj,

      Are you calling the populate method, solving and then calling populate, or just solving? If you are not calling populate, that is likely the problem.

      Delete
    4. Paul,
      First of all thanks for your quick reply. I am calling the Populate method directly as I think it solves the problem in Phase 1 and then in Phase 2 it populates the solution pool.
      Do you think solving it first and then calling the populate method will make any change?

      Delete
    5. No, I don't think there is any need to solve first. Did you follow all the directions in the user guide for enumerating solutions (CPLEX Optimizers > User's Manual for CPLEX > Discrete Optimization > Solution pool: ... > Enumerating all solutions > How to enumerate all solutions), including setting the pool gap parameter to zero?

      Delete
  2. Hi Paul,
    Thanks again, Yes I have followed all the instructions in the manual. The thing which surprised me was in one case when I removed a set of constraints the total number of solution reported by cplex.populate() method reduced. Also in another case when I set the solution pool intensity parameter to 4 the number of solution reported in solution pool is reduced, which shouldn't be the case as per the definition in the manual.
    I was asking you that how does cplex.populate() work. Does it store all the integer incumbent solution prior to getting the optimal solution and report it in the solution pool, or does it also looks into the nodes which were fathomed cause of the bounds while finding the optimal solution and give other feasible solutions.

    ReplyDelete
    Replies
    1. Niraj,

      I have no inside information on how populate() works. You could ask on the IBM developerWorks CPLEX forum, but I suspect the answer will involve the phrase "trade secret" (although I believe it is based on a published paper). That said, if you search the forum, you'll find all sorts of questions and answers about populate(), including https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014394838&ps=25. You need to register with the forum (free) to post, but you should be able to browse without logging in.

      As far as having fewer optimal solutions when you remove constraints, that is neither expected nor surprising. Picture, in two dimensions, a triangle with one side horizontal and the opposite vertex up, and all three vertices integral. Add a horizontal cut to create a trapezoid with integral vertices. Maximize the ordinate (y), and it has two optimal solutions (the top two corners of the trapezoid). Now remove that extra cut (restoring the triangle), and you are down to one optimal solution (top vertex).

      Delete
    2. Paul,
      I am really thankful for all your responses. I agree with everything you said, I think I didn't made myself clear, yes we can get less number of solution for sure when we remove a constraint but the objective will improve for the problem. In my case I am looking for all feasible solution of a problem so there is no objective function. So in my case the total number of solution shouldn't decrease. That's what make me suspect that populate method may not be giving all the feasible solution.
      Thanks again for your time and ideas and your blog is really helpful.

      Niraj

      Delete
    3. Niraj,

      You're very welcome ... and you are correct. Removing a constraint may reduce the number of integer-feasible corner point solutions, but it cannot reduce the number of integer-feasible solutions.

      I'll throw in one last qualification. If a problem has numerical stability issues (usually reflected in large condition numbers for the basis matrices), rounding problems can make a feasible solution look infeasible (and vice-versa). Other than that, I can't think of any explanation.

      Delete
  3. Paul,
    Thanks once again. So what I understand from what you are saying is that may be cplex.populate() method returns only "integer-feasible corner point solutions". That may be the reason that total number of solution may decrease if we decrease the number of constraint in some case.
    Please rectify me if I am wrong.

    Niraj

    ReplyDelete
  4. No, populate() returns integer-feasible solutions, regardless of whether they are corner points or not. I was just remarking that removing a constraint changes the corner points, so it can alter the number of integer corners in either direction (up or down), but it cannot decrease the set of integer-feasible solutions (corner and interior combined).

    ReplyDelete
  5. Paul,
    Here is something interesting that I observed about the populate method, and thought of sharing with you. I was solving a small example (which I have written below) to check if the populate method works as it is supposed to. This problem has eight feasible solutions but when I used populate method, no matter what parameter I use, it returns only one solution. But for the same problem when I changed the sense of objective (x1+x2) as maximize it gave me all 8 feasible solutions. I tried another setting in which the sense of objective was maximize but changed the objective to (-x1-x2), it again returned me one solution. With the same objective (-x1-x2) I changed the sense of objective to minimize and it returned all the 8 feasible solutions. I would like to know your thoughts on this.
    Minimize
    obj: x1 + x2
    Subject To
    c1: 3 x1 + x2 <= 6
    c2: x1 + 2 x2 <= 6
    Bounds
    x1 >= 0
    x2 >= 0
    Generals
    x1 x2
    End

    ReplyDelete
    Replies
    1. Niraj,

      I ran your problem in the interactive optimizer (CPLEX 12.6). The only nondefault parameter setting I used was solution pool intensity 4. I called populate (without first solving the model), and it returned eight solutions. Solving first and then calling populate also returned eight solutions. Changing the objective to maximize the negative of the original objective again returned eight solutions. So I cannot reproduce your results. Which version of CPLEX did you use?

      Delete
  6. Hello, Dr. Rubin,

    Thanks for providing an example on populating solutions. I'm wondering if there is a way to check all feasible solutions that cplex encounters during its branch-and-cut procedure. To be specific, I don't want to disturb the default algorithms and paths used by CPLEX when it solves an MIP problem. I just want to check every feasible solution it encounters during its process so that I can keep some of them based on my own criteria. I checked the diversity filters, but it seems not helpful in my case because I would like the feasible solution kept in the pool to be all different.

    Thank you very much,
    Da

    ReplyDelete
    Replies
    1. If CPLEX identifies an integer-feasible solution that is suboptimal (fails to qualify as a new incumbent), it discards the solution, and I don't know a way to see that solution at all, other than to have CPLEX discard all potential incumbents (show them to you and let you reject them), which definitely alters the search path.

      For solutions that qualify as new incumbents, if you are using one of the programming APIs (C, C++, Java, maybe Python or C# -- not sure about MATLAB), you can install an incumbent callback. The callback lets you look at (and record) the candidate solution, then accept or reject it (you would presumably accept them all so as not to alter CPLEX's trajectory).

      Delete
  7. Hi Paul, i have a question in an other context in cplex. I want to express a constraint for that two BoolVarArray whose must be differents, these arrays are the solutions. so i search a constraint like allDiff in gecode. if you have an idea for how can i express it, thank you.

    ReplyDelete
    Replies
    1. Please see the Ground Rules for Comments, specifically the second one (about relevance to the post). Perhaps you can find a more suitable post to which to attach the comment.

      Delete
  8. Dear Dr. Rubin,

    Is it possible to generate all possible solutions for an MIP problem, by applying 'populate' to a subset of integer variables? I mean, for instance, I have 30 integer variables, and I only want to populate 15 of them, regardless of the values for the others?

    Thank you

    Mert

    ReplyDelete
    Replies
    1. Mert: The short answer is no. As far as I know, populate computes complete solutions and only complete solutions.

      To be honest, I'm not sure what you are trying to do. If you want to ignore half the integer variables, even if you could get CPLEX to do that, how would you know that the values it produced for the 15 that stayed integer would lead to an integer-feasible solution to the original problem?

      Now, if you know values for 15 variables (call them x1-x15) that you are sure will lead to at least one integer-feasible solution, and you want to find as many combinations as possible for the other integer variables (x16-x30) that produce feasible solutions when combined with your known values for x1-x15, you can fix x1-x15 at the known values and then call populate. Similarly, if you want to find values of x16-x30 that lead to "feasible" solutions where x1-x15 might not be integer, you can relax the integrality on x1-x15 and then call populate.

      Delete
    2. Thank you very much for your answer. I was trying to do the part you mentioned at the end. My idea was to find all possible combinations of x16-x30, without fixing the x1-x15 to a certain value (and still have feasible solutions). I did it by fixing the x1-x15, and it worked. But I guess it is impossible to do it without fixing them.

      Regards

      Mert

      Delete
    3. What about adding "cuts" via the diversity filter, repeatedly calling populate ?

      1. Populate without filter (probably to a fixed number limit)
      2. From this pool get the *different* "supports" for the interesting variable subset.
      3. Add a diversity filter over the variable subset with this supports
      4. Repeat

      Would this work ?

      Delete
    4. For clarification, I'm talking about enumerating configurations of a subset of the variables (regardless of the value of the rest).

      Delete
    5. If you repeat enough times, I think it should.

      Delete
  9. Dr Rubin, in CPLEX output, How to make sense of absolute gap, relative gap, optimality gap? if I get optimality gap >0.4% is it still a good solution?

    ReplyDelete
    Replies
    1. Sorry, this question is not really relevant to the post. Please see the Ground Rules for Comments (link below).

      Delete
  10. Hi, Dr. Rubin

    Thank you so much for the post. I am using CPLEX to solve a ILP with a sparse matrix containing 19000 variances and 13000 constraints. Each variance is restricted to {-1,0,1}. CPLEX can quickly get the optimal value. But I am interested in what are the variances that can be fixed under the optimal value, let's say, a variance is assigned to 1 in >80% solutions. Do you think is it possible to "populate()" them? Or is there any other way can probably do this?

    Best,
    Ian

    ReplyDelete
    Replies
    1. I take it you are assuming (or know somehow?) that there are multiple optimal solutions. The only way to know if $x_i=1$ in 80% or more of optimal solutions is to find all of them. Whether that's feasible depends on how many there are and how much time you have. Another possibility is to use the approach discussed here to generate a sample of solutions, then look for variables that are 1 in most of the solutions in the sample. That would let you choose a sample size that fits your "time budget".

      Delete
  11. Hi Dr. Rubin,

    Thanks for your help about the Solution Pool. I'm actually trying to solve a medium size MIP, but I need to obtain the time elapsed until the first feasible solution was found... also I need the time until the best feasible solution was found.

    I haven't found something in CPLEX documentation that can help me to do this, however I know this data is commonly used in research...

    Do you know what would be the best way to do this?

    Best,
    Javier

    ReplyDelete
    Replies
    1. If you are using a programmatic API, you could attach an incumbent callback and record the time at which the incumbent was found. The first and last recorded values would be what you want. Otherwise, I think you have to dump the log to a file and pick through it (or write a script to pick through it).

      Delete
    2. I think that answers my question!
      Thanks!

      Delete
  12. Hi
    I am using IBM ILOG CPLEX optimization studio studio. Thanks to your useful hints I am able to see my solution pool in the left-down part of the software window. Know in order to see each solution I should choose "pool solution #.." .
    1.I wanted to know is there a way to display or extract all solutions in the solution pool in the same time?
    2.My model contains continious and integer variables. Does solution pool shows different solutions respect to continious variable as well?

    ReplyDelete
    Replies
    1. 1. As far as I know, the only way to access the entire pool is to get the number of solutions in the pool and then retrieve each solution individually in a loop.

      2. If there are multiple solutions with the same values for the integer variables but different values for the continuous variables, only one of those solutions will appear in the pool.

      Delete
  13. Thank for this helpful post. Can I call populate in integer (binary) programming problems with CPLEX? I tried to do it but I get an error message like 'CPLEX Error 3003: Not a mixed-integer problem.'

    ReplyDelete
    Replies
    1. Yes, it should work for both pure integer and mixed-integer models. The error message suggests that CPLEX thinks your model is an LP (or QP). Are you sure it contains at least one integer variable that is not eliminated during presolve?

      Delete
    2. Thank you for your time. You are right I fixed the issue now. I am using populate to generate multiple optimal solutions (same objective values but different optimization vectors). Is this fine use of populate? Because as I read from some examples in Python files of CPLEX, they use it to obtain solutions of objective value within 10% of the optimal, for example.

      Delete
    3. Both uses (find multiple optima, find multiple "near optimal" solutions) are fine. Decision makers sometimes want a choice of solutions even when only one is optimal, which leads to something like the 10% gap criterion. Your use is also fairly common.

      Delete

If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on OR-Exchange.