The problem
The example problem is to find $N$ binary vectors of dimension $D$ so as to maximize the minimum Hamming distance between them. One source of this problem is coding theory (not to be confused with cryptography): we might want a "vocabulary" of $N$ "words" that are as unlikely as possible to be confused with each other if individual bits are independently molested, with low likelihood, during transmission. The mathematical model is \[ \begin{array}{lrclr} \textrm{maximize} & & d\\ \textrm{s.t.} & \Vert x^{(j)}-x^{(k)}\Vert & \ge & d & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\ & x^{(j)} & \in & \mathbb{B}^{D} & \forall j\in\{1,\dots,N\} \end{array} \]where$\Vert\cdot\Vert$ denotes the Hamming length (number of non-zeros) of a vector and $\mathbb{B}^D$ is the space of $D$-dimensional binary vectors.
The MILP model
To massage this into a mixed-integer linear program (MILP), we introduce the following auxiliary variables:
- $y^{(j,k)}_i\in\mathbb{B}$ is 1 if and only if vectors $x^{(j)}$ and $x^{(k)}$ differ in component $i$;
- $d^{(j,k)}$ is the Hamming distance between vectors $x^{(j)}$ and $x^{(k)}$; and
- $d$ is the minimum Hamming distance between any pair of vectors (to be maximized).
\[
\begin{array}{lrclr}
\textrm{maximize} & & d\\
\textrm{s.t.} & y_{i}^{(j,k)} & \le & x_{i}^{(j)}+x_{i}^{(k)} & \forall i\in\{1,\dots,D\};\forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\
& y_{i}^{(j,k)} & \le & 2-x_{i}^{(j)}-x_{i}^{(k)} & \forall i\in\{1,\dots,D\};\forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\
& d^{(j,k)} & = & \sum_{i=1}^{D}y_{i}^{(j,k)} & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\
& d & \le & d^{(j,k)} & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\
& x^{(j)} & \in & \mathbb{B}^{D} & \forall j\in\{1,\dots,N\}\\
& y^{(j,k)} & \in & \mathbb{B}^{D} & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\
& d^{(j,k)} & \in & [0,D] & \forall j,k\in\left\{ 1,\dots,N\right\} \ni j\lt k\\
& d & \in & [0,D].
\end{array}
\]
Symmetry
The model contains a high degree of symmetry, and as I've mentioned before symmetry will slow the search for multiple optima (and, at the same time, can be exploited to easily generate alternate optima from a given solution). I'll add some constraints to the MILP model to get rid of at least some of the symmetry.
First, consider any solution $x^{(1)},\dots,x^{(N)}$ to the model, and suppose I toggle bit $i$ in every vector; that is, define $\hat{x}^{(j)}$ for $j\in\{1,\dots,N\}$ by \[ \hat{x}_{h}^{(j)}=\begin{cases} x_{h}^{(j)} & h\neq i\\ 1-x_{i}^{(j)} & h=i \end{cases}. \] We obtain a new solution with $\Vert\hat{x}^{(j)}-\hat{x}^{(k)}\Vert=\Vert x^{(j)}-x^{(k)}\Vert\:\forall j\lt k$, so the new solution has the same objective value as the old one. Therefore, without loss of generality, I can require that $x^{(1)}=0$.
Second, if we permute the indices of the vectors (other than the first vector, which we have forced to be 0), we obtain another set of $N$ vectors with the same pairwise distances. To eliminate that, we will constrain the vectors to be lexicographically increasing; that is, if $x^{(j)}_h=x^{(j+1)}_h$ for $h\lt i$ and $x^{(j)}_i \neq x^{(j+1)}_i$, then $x^{(j)}_i=0$ and $x^{(j+1)}_i = 1$. Lexicographic ordering constraints can in general be a pain to write, but our auxiliary variables $y$ actually make it simple here:\[ x_{i}^{(j+1)}\ge x_{i}^{(j)}-\sum_{h\lt i}y_{h}^{(j,j+1)}. \]
Setting up the solution pool
Before invoking CPLEX's populate method to fill in the solution pool, we need to set some parameter values. Let's say that our ambition is to find $M$ optimal solutions if possible. (To find all optimal solutions, choose a really large value of $M$ and hope you live long enough to see the run terminate.) The first thing to do is to set the SolutionPoolCapacity parameter to $M$. Next, set the SolutionPoolIntensity parameter to 4, the most aggressive setting ("leave no stone unturned").
The PopulateLim parameter dictates the maximum number of integer-feasible solutions CPLEX will generate before it declares victory and retires from the battlefield. It may stop before reaching this limit because it has exhausted all possible solutions, or because no possible solutions remain that would satisfy the pool gap parameters (discussed below), or because the standard time limit or node limit is reached. The default value for this parameter is 20. You will want it at least $M$, but realistically you need to set it considerably larger than $M$, since suboptimal solutions found along the way count against the limit. Set it to something really, really large to be safe.
We want to screen out suboptimal solutions. Since the objective function is integer-valued, a suboptimal solution will be worse than the incumbent by at least 1. Allowing for some rounding error, we set the pool absolute gap parameter (SolnPoolAGap) to something between 0 and 1 (I'll use 0.5). Any solution with 0.5 of the incumbent will have the same minimum Hamming distance after cleaning up rounding error; any solution more than 0.5 worse than the incumbent has a minimum Hamming distance at least 1 greater and is thus infeasible. (I'm trusting that I won't get a rounding error worse than 0.5 in any objective value.) We will leave the relative pool gap parameter (SolnPoolGap) at its default value of $10^{75}$.
The gap parameter prevents CPLEX from adding solutions to the pool that are inferior to the best solution already in the pool. It does not prevent suboptimal solutions from being added to the pool before the optimum is found, nor does it force them to be dropped from the pool once better solutions are found. The documentation may be a bit unclear on the latter point; in the sections on the gap parameters, it says that inferior solutions will be discarded but does not explicitly limit that statement to new solutions. I've verified by experiment, though, that inferior solutions found prior to the first truly optimal incumbent may be retained.
The (partial) cure for this is to adjust the pool replacement strategy (SolnPoolReplace). This parameter tells CPLEX how to make room when the pool is full and a new integer-feasible solution is found. Note that, by virtue of our choice of SolnPoolAGap value, the newly found solution will be at least as good as the best of the solutions in the pool. The default value (CPX_SOLNPOOL_FIFO = 0) uses a first in, first out replacement strategy, without regard to objective value. What we want is CPX_SOLNPOOL_OBJ = 1, which kicks out the solution with the worst objective value to make room for the new and improved solution. (If you want to find lots of feasible but not necessarily optimal solutions, you might want CPX_SOLNPOOL_DIV = 2, which kicks out solutions so as to enhance the diversity of the pool.)
Are we done yet???
Now call the populate method (whether in an API or the interactive optimizer), sit back, and wait for the results. When CPLEX finishes, the pool is full of optimal solutions, right? Not quite. If there are fewer than $M$ optimal solutions to your model, chances are quite high that some suboptimal solutions will have made it into the pool. Since the pool never filled up, no solutions were kicked out, so they'll still be sitting there. You will need to check the objective values of each solution in the pool, find the best of those values, and use it to weed out the suboptimal solutions.
You are welcome to see my Java source code for this problem. I wrote it for Java 7, but I believe it will work with Java 6 if you edit one line.
"Look what's in the CPLEX solution pool and see what can be done with it" is on my to-do list for 2013. Besides being interesting to read, your post will probably help me to save time in the near future! Thank you for taking the time to write this stuff.
ReplyDeleteRetirement has given me time to answer idle thoughts ("I wonder how this works?"), and the blog has given me someplace to put the answers. Maybe I should have titled the blog "Random Itches Scratched". :-)
DeleteHi Mr Rubin,
DeleteI am using cplex class on a matlab interface.Currently I am working on a sports scheduling problem, with few thousands of variables. I want to get all the possible solution of the model. I know to do so cplex has a method called populate. I know how to use it and all the parameter settings. The one thing I am not sure how exactly Solution pool works. Because it gives me less number of solutions, if i repeat some constraints again. Though it shouldn't happen since the constraints will be redundant. Also for my problem I know that there should be millions of solutions as it only looks for feasibility as there is no objective function in the model but it only gives me 181, even if i specify the solution pool capacity as 200 or 300 etc. Changing the solution pool intensity doesn't help either when i change the intensity to maximum that is '4' the total number of solutions even get smaller to 20.
I will be very thankful to you if you could shed some light on my problem. Thanks
in advance.
Thanks
Niraj
Niraj,
DeleteAre you calling the populate method, solving and then calling populate, or just solving? If you are not calling populate, that is likely the problem.
Paul,
DeleteFirst of all thanks for your quick reply. I am calling the Populate method directly as I think it solves the problem in Phase 1 and then in Phase 2 it populates the solution pool.
Do you think solving it first and then calling the populate method will make any change?
No, I don't think there is any need to solve first. Did you follow all the directions in the user guide for enumerating solutions (CPLEX Optimizers > User's Manual for CPLEX > Discrete Optimization > Solution pool: ... > Enumerating all solutions > How to enumerate all solutions), including setting the pool gap parameter to zero?
DeleteHi Paul,
ReplyDeleteThanks again, Yes I have followed all the instructions in the manual. The thing which surprised me was in one case when I removed a set of constraints the total number of solution reported by cplex.populate() method reduced. Also in another case when I set the solution pool intensity parameter to 4 the number of solution reported in solution pool is reduced, which shouldn't be the case as per the definition in the manual.
I was asking you that how does cplex.populate() work. Does it store all the integer incumbent solution prior to getting the optimal solution and report it in the solution pool, or does it also looks into the nodes which were fathomed cause of the bounds while finding the optimal solution and give other feasible solutions.
Niraj,
DeleteI have no inside information on how populate() works. You could ask on the IBM developerWorks CPLEX forum, but I suspect the answer will involve the phrase "trade secret" (although I believe it is based on a published paper). That said, if you search the forum, you'll find all sorts of questions and answers about populate(), including https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014394838&ps=25. You need to register with the forum (free) to post, but you should be able to browse without logging in.
As far as having fewer optimal solutions when you remove constraints, that is neither expected nor surprising. Picture, in two dimensions, a triangle with one side horizontal and the opposite vertex up, and all three vertices integral. Add a horizontal cut to create a trapezoid with integral vertices. Maximize the ordinate (y), and it has two optimal solutions (the top two corners of the trapezoid). Now remove that extra cut (restoring the triangle), and you are down to one optimal solution (top vertex).
Paul,
DeleteI am really thankful for all your responses. I agree with everything you said, I think I didn't made myself clear, yes we can get less number of solution for sure when we remove a constraint but the objective will improve for the problem. In my case I am looking for all feasible solution of a problem so there is no objective function. So in my case the total number of solution shouldn't decrease. That's what make me suspect that populate method may not be giving all the feasible solution.
Thanks again for your time and ideas and your blog is really helpful.
Niraj
Niraj,
DeleteYou're very welcome ... and you are correct. Removing a constraint may reduce the number of integer-feasible corner point solutions, but it cannot reduce the number of integer-feasible solutions.
I'll throw in one last qualification. If a problem has numerical stability issues (usually reflected in large condition numbers for the basis matrices), rounding problems can make a feasible solution look infeasible (and vice-versa). Other than that, I can't think of any explanation.
Paul,
ReplyDeleteThanks once again. So what I understand from what you are saying is that may be cplex.populate() method returns only "integer-feasible corner point solutions". That may be the reason that total number of solution may decrease if we decrease the number of constraint in some case.
Please rectify me if I am wrong.
Niraj
No, populate() returns integer-feasible solutions, regardless of whether they are corner points or not. I was just remarking that removing a constraint changes the corner points, so it can alter the number of integer corners in either direction (up or down), but it cannot decrease the set of integer-feasible solutions (corner and interior combined).
ReplyDeletePaul,
ReplyDeleteHere is something interesting that I observed about the populate method, and thought of sharing with you. I was solving a small example (which I have written below) to check if the populate method works as it is supposed to. This problem has eight feasible solutions but when I used populate method, no matter what parameter I use, it returns only one solution. But for the same problem when I changed the sense of objective (x1+x2) as maximize it gave me all 8 feasible solutions. I tried another setting in which the sense of objective was maximize but changed the objective to (-x1-x2), it again returned me one solution. With the same objective (-x1-x2) I changed the sense of objective to minimize and it returned all the 8 feasible solutions. I would like to know your thoughts on this.
Minimize
obj: x1 + x2
Subject To
c1: 3 x1 + x2 <= 6
c2: x1 + 2 x2 <= 6
Bounds
x1 >= 0
x2 >= 0
Generals
x1 x2
End
Niraj,
DeleteI ran your problem in the interactive optimizer (CPLEX 12.6). The only nondefault parameter setting I used was solution pool intensity 4. I called populate (without first solving the model), and it returned eight solutions. Solving first and then calling populate also returned eight solutions. Changing the objective to maximize the negative of the original objective again returned eight solutions. So I cannot reproduce your results. Which version of CPLEX did you use?
Hello, Dr. Rubin,
ReplyDeleteThanks for providing an example on populating solutions. I'm wondering if there is a way to check all feasible solutions that cplex encounters during its branch-and-cut procedure. To be specific, I don't want to disturb the default algorithms and paths used by CPLEX when it solves an MIP problem. I just want to check every feasible solution it encounters during its process so that I can keep some of them based on my own criteria. I checked the diversity filters, but it seems not helpful in my case because I would like the feasible solution kept in the pool to be all different.
Thank you very much,
Da
If CPLEX identifies an integer-feasible solution that is suboptimal (fails to qualify as a new incumbent), it discards the solution, and I don't know a way to see that solution at all, other than to have CPLEX discard all potential incumbents (show them to you and let you reject them), which definitely alters the search path.
DeleteFor solutions that qualify as new incumbents, if you are using one of the programming APIs (C, C++, Java, maybe Python or C# -- not sure about MATLAB), you can install an incumbent callback. The callback lets you look at (and record) the candidate solution, then accept or reject it (you would presumably accept them all so as not to alter CPLEX's trajectory).
Hi Paul, i have a question in an other context in cplex. I want to express a constraint for that two BoolVarArray whose must be differents, these arrays are the solutions. so i search a constraint like allDiff in gecode. if you have an idea for how can i express it, thank you.
ReplyDeletePlease see the Ground Rules for Comments, specifically the second one (about relevance to the post). Perhaps you can find a more suitable post to which to attach the comment.
DeleteDear Dr. Rubin,
ReplyDeleteIs it possible to generate all possible solutions for an MIP problem, by applying 'populate' to a subset of integer variables? I mean, for instance, I have 30 integer variables, and I only want to populate 15 of them, regardless of the values for the others?
Thank you
Mert
Mert: The short answer is no. As far as I know, populate computes complete solutions and only complete solutions.
DeleteTo be honest, I'm not sure what you are trying to do. If you want to ignore half the integer variables, even if you could get CPLEX to do that, how would you know that the values it produced for the 15 that stayed integer would lead to an integer-feasible solution to the original problem?
Now, if you know values for 15 variables (call them x1-x15) that you are sure will lead to at least one integer-feasible solution, and you want to find as many combinations as possible for the other integer variables (x16-x30) that produce feasible solutions when combined with your known values for x1-x15, you can fix x1-x15 at the known values and then call populate. Similarly, if you want to find values of x16-x30 that lead to "feasible" solutions where x1-x15 might not be integer, you can relax the integrality on x1-x15 and then call populate.
Thank you very much for your answer. I was trying to do the part you mentioned at the end. My idea was to find all possible combinations of x16-x30, without fixing the x1-x15 to a certain value (and still have feasible solutions). I did it by fixing the x1-x15, and it worked. But I guess it is impossible to do it without fixing them.
DeleteRegards
Mert
What about adding "cuts" via the diversity filter, repeatedly calling populate ?
Delete1. Populate without filter (probably to a fixed number limit)
2. From this pool get the *different* "supports" for the interesting variable subset.
3. Add a diversity filter over the variable subset with this supports
4. Repeat
Would this work ?
For clarification, I'm talking about enumerating configurations of a subset of the variables (regardless of the value of the rest).
DeleteIf you repeat enough times, I think it should.
DeleteDr Rubin, in CPLEX output, How to make sense of absolute gap, relative gap, optimality gap? if I get optimality gap >0.4% is it still a good solution?
ReplyDeleteSorry, this question is not really relevant to the post. Please see the Ground Rules for Comments (link below).
DeleteHi, Dr. Rubin
ReplyDeleteThank you so much for the post. I am using CPLEX to solve a ILP with a sparse matrix containing 19000 variances and 13000 constraints. Each variance is restricted to {-1,0,1}. CPLEX can quickly get the optimal value. But I am interested in what are the variances that can be fixed under the optimal value, let's say, a variance is assigned to 1 in >80% solutions. Do you think is it possible to "populate()" them? Or is there any other way can probably do this?
Best,
Ian
I take it you are assuming (or know somehow?) that there are multiple optimal solutions. The only way to know if $x_i=1$ in 80% or more of optimal solutions is to find all of them. Whether that's feasible depends on how many there are and how much time you have. Another possibility is to use the approach discussed here to generate a sample of solutions, then look for variables that are 1 in most of the solutions in the sample. That would let you choose a sample size that fits your "time budget".
DeleteHi Dr. Rubin,
ReplyDeleteThanks for your help about the Solution Pool. I'm actually trying to solve a medium size MIP, but I need to obtain the time elapsed until the first feasible solution was found... also I need the time until the best feasible solution was found.
I haven't found something in CPLEX documentation that can help me to do this, however I know this data is commonly used in research...
Do you know what would be the best way to do this?
Best,
Javier
If you are using a programmatic API, you could attach an incumbent callback and record the time at which the incumbent was found. The first and last recorded values would be what you want. Otherwise, I think you have to dump the log to a file and pick through it (or write a script to pick through it).
DeleteI think that answers my question!
DeleteThanks!
Hi
ReplyDeleteI am using IBM ILOG CPLEX optimization studio studio. Thanks to your useful hints I am able to see my solution pool in the left-down part of the software window. Know in order to see each solution I should choose "pool solution #.." .
1.I wanted to know is there a way to display or extract all solutions in the solution pool in the same time?
2.My model contains continious and integer variables. Does solution pool shows different solutions respect to continious variable as well?
1. As far as I know, the only way to access the entire pool is to get the number of solutions in the pool and then retrieve each solution individually in a loop.
Delete2. If there are multiple solutions with the same values for the integer variables but different values for the continuous variables, only one of those solutions will appear in the pool.
Thank for this helpful post. Can I call populate in integer (binary) programming problems with CPLEX? I tried to do it but I get an error message like 'CPLEX Error 3003: Not a mixed-integer problem.'
ReplyDeleteYes, it should work for both pure integer and mixed-integer models. The error message suggests that CPLEX thinks your model is an LP (or QP). Are you sure it contains at least one integer variable that is not eliminated during presolve?
DeleteThank you for your time. You are right I fixed the issue now. I am using populate to generate multiple optimal solutions (same objective values but different optimization vectors). Is this fine use of populate? Because as I read from some examples in Python files of CPLEX, they use it to obtain solutions of objective value within 10% of the optimal, for example.
DeleteBoth uses (find multiple optima, find multiple "near optimal" solutions) are fine. Decision makers sometimes want a choice of solutions even when only one is optimal, which leads to something like the 10% gap criterion. Your use is also fairly common.
DeleteI am using the CPLEX solver for binary integer linear programming (cplexbilp) in MATLAB and would like to print out the identified alternative solutions in the solution pool. The code looks as follows:
ReplyDeleteoptions = cplexoptimset('cplex');
options.Display = 'on';
options.solnpoolagap = 0;
>> options.solnpoolintensity = 4;
>> options.populatelim = 100;
[x, fval, exitflag, output] = cplexbilp (f, Aineq, bineq, Aeq, beq, ...
[ ], options);
Each time only one solution is stored in x, but I know as CPLEX tells me that there are other solutions as well.
Do you know how to print out those alternative solutions?
I don't use MATLAB, so I can't answer completely. In other APIs, though, there is a version of the "get a solution" function that takes an additional argument, the index of the solution in the solution pool. There's also a function to find out how many solutions there are in the pool. So you call the latter function to get the pool count, then call the "get x" function in a loop and retrieve one solution at a time.
DeleteHi Dr. Rubin,
ReplyDeleteI am trying to change the parameter of solution capacity between successive calls of populate. But it seems only the capacity of the first populate is keeping (for example i populate 100 times, and the first capacity is set to 10. and then i set the capacity to 20 and call populate again. ) . Can i change the capacity parameter when calling multiple time of populate?Thank you
Yes, you can increase the capacity between calls, at least with the interactive optimizer. I haven't tried it with any of the programming APIs, but I would expect it to work there too. The catch is that if the previous call to populate exhausted the search tree, or at least exhausted all feasible solutions, you may not pick up any more solutions with the second call.
DeleteIf you are using replacement strategy 1 (keep the 10 or 20 best solutions), bear in mind that on the second call CPLEX will *not* go back to parts of the tree it previously visited to find solutions that it previously ignored (as not better than the worst of the 10 solutions in the first pool), even if they might now qualify as one of the 20 best.
Dear pro. Rubin:
ReplyDeleteThank you so much! Yes, i set the replacement strategy to 1. You answer really helped me. Thank you again!
Best regards,
Qingwei
You are very welcome. Glad it helped.
ReplyDeleteHi Dr. Rubin,
ReplyDeleteThank you very much for this post and your blog. First of all, I am quite newbie in OR. I have a partitioning problem where I need to find all optimal solutions. By following your post and the IBM manual (I set 'SolnPoolAGap' to 0.5), I was able to find many optimal solutions. Based on this, I would like to ask 2 questions:
1) Can we say for sure that we get all optimal solutions?
2) I ran into some memory problem for some instances, please note that the memory (RAM) I use is quite large: 256 gb. So, I am thinking an alternative way which consumes less memory when enumerating all optimal solutions. I have two ideas (but, I do not know if it is feasible):
*) Using cutting plane approach. Hence, we do not add all triangle constraints, it might help us gain memory usage. However, I am not sure how to implement this. If I first generate user cuts within X minutes (Branch&Cut), and then I solve the same instance with generated tight user cuts (Branch&Bound), this would be a good idea?
*) I can (efficiently) solve the problem for one optimal solution, and get the objective function value. Then, use this objective function value as constraint, and solve the same instance again but for all optimal solution.
I am open to all your suggestions. Thanks in advance.
Nejat
Nejat,
DeleteYou did not indicate whether you are using "solve" or "populate" to generate solutions. (The solution pool would be used with either one.) I believe that CPLEX works differently depending on which you use. To get all the solutions, I would use "populate". Even then, several parameters set limits that can cause CPLEX to stop before all optimal solutions are found. To be sure to find them all, you can keep calling populate until it comes back with an empty node log (nothing more to do). This assumes that you set the pool size limit high enough to hold all the optimal solutions.
I did not understand what you meant about "triangle constraints", and I really do not know whether using a user cut callback at the beginning but then turning it off after X minutes would be helpful. (By the way, the second phase would still be branch-and-cut, since CPLEX will generate cuts even if you do not.) Setting the objective value as a constraint is unlikely to help, and done the wrong way can slow the solver. One possibility to help with memory is to set parameters that let CPLEX swap parts of the node tree out to disk. Another, if you have many constraints, is to make some of the constraints lazy constraints. (This may slow progress on the bound, but may help conserve RAM.) Finally, note that using parallel threads will massively increase RAM consumption, so you can always try cutting back on the number of threads used (again, at the cost of speed).