## Wednesday, July 31, 2013

### Benders Decomposition with Integer Subproblems

I'm not sure why, but occasionally people post questions related to an attempt to apply Benders decomposition in a situation where the subproblem contains integer variables. A key question is how you generate cuts from the subproblem, since discrete problems do not enjoy the same duality theory that continuous problems do.

A typical application of Benders decomposition to integer programming starts with a problem of the form$\begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}_+\\ & y & \in & \mathbb{R}^{n}_+ \end{array}$This decomposes into a master problem$\begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & z\\ \textrm{subject to} & Ax & & & \ge & a\\ & h'x & & & \ge & h_0 & \forall (h,h_0)\in \mathcal{F}\\ & h'x & + & z & \ge & h_0 & \forall (h, h_0)\in \mathcal{O}\\ & x & \in & \mathbb{Z}^{m}_+ \\ & z & \ge & 0 \end{array}$and a subproblem$\begin{array}{lrclcc} \textrm{minimize} & c_{2}'y\\ \textrm{subject to} & By & \ge & b\\ & Ey & \ge & d - Dx\\ & y & \in & \mathbb{R}^{n}_+ \end{array}$where $\mathcal{F}$ and $\mathcal{O}$ are  sets of coefficient vectors for "feasibility" cuts (pushing $x$ in directions that make the solution $(x,y)$ feasible) and "optimality" cuts (pushing $z$ upward so as not to underestimate $c_2'y$) respectively. The subproblem is a linear program, and its dual solution supplies the coefficient vectors $(h,h_0)$ for both types of master cuts.

So what happens if $y$ is integer-valued ($y\in\mathbb{Z}^n_+$) rather than continuous ($y\in\mathbb{R}^n_+$)? I don't have a definitive answer, but there are a few things that can be tried. The following suggestions should also work equally well (or poorly) when $y$ is a mix of integer and continuous variables.

### Proceed as usual

The subproblem is now an integer program, but you can always relax it to a linear program and obtain the dual solution to the relaxation. If the current master solution $x = \hat{x}$, $z = \hat{z}$ makes the linear relaxation of the subproblem infeasible, you can be sure it also makes the actual subproblem infeasible, and thus you will get a legitimate feasibility cut. If the subproblem is feasible, the dual to the relaxation will produce a cut that forces $z$ to be at least as great as the objective value of the relaxation, which is a legitimate lower bound for the actual subproblem objective value.

The news here is not all good, though. It is possible that $\hat{x}$ makes the subproblem integer-infeasible but with a feasible relaxation, in which case you will not get the feasibility cut you need. If the subproblem is feasible (let's say with optimal solution $\hat{y}$) but $\hat{z}$ underestimates its objective value $c_2'\hat{y}$, you want an optimality cut that forces $z\ge c_2'\hat{y}$ when $x=\hat{x}$; but the cut you get forces $z\ge w$ where $w$ is a lower bound for $c_2'\hat{y}$, and so there is the possibility that $c_2'\hat{y} > \hat{z} \ge w$ and the optimality cut accomplishes nothing.

### "No good" constraints for infeasibility

Suppose that $x$ consists exclusively of binary variables. (General integer variables can always be converted to binary variables, although it's not clear that the conversion is in general desirable.) We can exclude a particular solution $x=\hat{x}$ with a "no good" constraint that forces at least one of the variables to change value:$\sum_{i : \hat{x}_i=0} x_i + \sum_{i : \hat{x}_i = 1} (1-x_i)\ge 1.$This gives us another option for feasibility cuts. Solve the subproblem as an IP (without relaxation); if the subproblem is infeasible, add a "no good" cut to the master problem. Note that "no good" cuts are generally not as deep as regular Benders feasibility cuts -- the latter may cut off multiple integer solutions to the master problem, whereas a "no good" cut only cuts off a single solution.

If a "no good" cut eliminates just one solution, is it worth the bother? After all, the node that produced $x=\hat{x}$ will be pruned once we realize the subproblem is infeasible. The answer depends on a combination of factors (and essentially reduces to "try it and see"). First, if $x=\hat{x}$ was produced by a heuristic, rather than as the integer-feasible solution to the node LP problem, then you likely cannot prune the current node (and, in fact, the node you would want to prune may be elsewhere in the tree). Adding the "no good" cut may prevent your ever visiting that node, and at minimum will result in the node being pruned as soon as you visit it, without having to solve the subproblem there. Second, if your master problem suffers from symmetry, the same solution $x=\hat{x}$ may lurk in more than one part of the search tree. The "no good" cut prevents your tripping over it multiple times.

It may be possible to strengthen the cut a bit. Suppose that $\hat{x}$ renders the subproblem infeasible (as an IP). There are various ways to identify a subset of the subproblem (excluding the objective function) that causes infeasibility. CPLEX can do this with its conflict refiner; other solvers may have similar functionality. Let $N$ be the set of indices of the $x$ variables and $N_0$ the set of indices of all $x$ variables that appear in the right hand side of at least one subproblem constraint identified as part of the conflict. If we are lucky, $N_0$ is a proper subset of $N$. We can form a "no good" cut for the master problem using just the variables $x_i, i\in N_0$, rather than all the $x$ variables, and obtain a somewhat deeper cut (one that potentially cuts off multiple master problem solutions). The caveat here is that running something like the CPLEX conflict refiner, after determining that the subproblem is infeasible, may eat up a fair bit of CPU time for questionable reward.

### "No good" constraints for optimality

It may be possible to exploit the technique I just described to create ersatz optimality constraints as well. Suppose that the current incumbent solution is $(\tilde{x}, \tilde{y})$, and that some node gives an integer-feasible solution $(\hat{x},\hat{z})$ for the master problem. It must be the case that$c_1'\hat{x}+\hat{z}<c_1'\tilde{x}+c_2'\tilde{y},$else the master problem node would be pruned based on bound. Now suppose we pass $\hat{x}$ to the IP subproblem and obtain an optimal solution $y=\hat{y}$. If $c_1'\hat{x}+c_2'\hat{y}<c_1'\tilde{x}+c_2'\tilde{y}$, we have a new incumbent solution. If not, then $x=\hat{x}$ cannot lead to an improved solution, and we can add a "no good" cut to eliminate it (again recognizing that this is a weak constraint).

### More?

That pretty much exhausts my quiver. If any readers have other ideas for generating Benders cuts from integer subproblems, I invite you to post them in comments.

1. I recall using something akin to the 'inference duals' in logic based benders decomp by jn hooker.

1. Shiva, thanks for the pointer!

2. For the case of binary only first-stage variables, you could use Laporte-Louveaux Cuts. All you need is a finite lower bound for the problem.
http://www.sciencedirect.com/science/article/pii/016763779390002X

1. Thanks very much for the link. I shy away from stochastic problems -- deterministic ones are plenty hard enough for me -- so this is an area I would not normally mine for information on cut generation.

3. The idea of "logical" or "combinatorial" benders is pretty popular right now (for a suitable definition of popular). John Hooker started it off here but others (including me). I have a talk on this. Generally this work exploits the structure of the subproblems. Codato and Fischetti explored the general integer program case. Lots more to do in this area!

1. Mike, thanks for the links! Catching up on Hooker's research is on my bucket list (which seems to be the poster child for exploding queues). I'm familiar with the Codato & Fischetti paper -- in fact, I suggested one of the applications they cite -- but IIRC their subproblem is an LP, with the binary master variables turning constraints on and off rather than relying on "big M" versions.

2. To be completely honest, I never understood the difference between Hooker's "logical" Benders decomposition from 1995 and Geoffrion's generalized Benders decomposition from 1972. They look pretty much the same to me, the only difference being Geoffrion uses a "polyhedral" view while Hooker's uses a more "modeling" or "constraint oriented" description.

http://www.anderson.ucla.edu/faculty/art.geoffrion/home/docs/GBD.pdf

The classical results in network design Benders decomposition are from the 90's like Stoher & Dahl or Bienstock & Gunluk. While the results on the max cut decomposition are from the 80's with Barahona & Mahjoub. They just don't use the word "decomposition" but rather "polyhedral approach".

4. The real feat of no-good cuts is being able to handle side constraints in Benders decompositions.

Once you got to a point where you cannot generate any other Bender cut in your sub-problem, there are only two possible cases
1. The Benders cuts as a whole are necessary and sufficient, in which case you are done : no more Benders cuts is equivalent to master feasibility
2. The Benders cuts as a whole are necessary but not sufficient : you don't know how to generate more cuts, but you are not done yet
[3. The Benders cuts are not necessary : you did it all wrong, you are not even solving the right problem...]

Finding Benders cuts that are necessary is "easy". You just need to start from the original problem and sum and replace variables / constraints till you get to the cut. If all the steps are correct, then your cut is a valid Benders cut.

Finding a group of Benders cuts that as a whole is a sufficient condition is a serious problem
- lots of maths : remember your classics like the network design problem with bi-partition cuts and then the generalized flow cuts computed in a euclidean graph derived form the shortest paths in the network, or the max-cut where you have to find odd-cycle constraints on a graph where some specific minors have been contracted...
- it doesn't handle side constraints : even if you did do all the maths, you add a constraint to the original problem and the sufficient condition is broken !

That's were no-good cuts enter the scene. They save you from the issue 2. and allow you to do a Benders decomposition with only necessary cuts while still retaining global optimality. The reason is that by themselves, no-good cuts are sufficient as a whole because they basically are equivalent to "generate and test" or ordered enumeration. Thereafter whatever my Benders cuts are + no-good is necessary and sufficient.

1. I'm glad you put "easy" in quotes -- I had a problem where the cuts were _conceptually_ easy (based on the dual solution to an LP), but the LP was annoyingly slow to solve (probably some numerical stability issue we never did diagnose).

You're right about no-good cuts providing the flexibility to handle situations where you don't know how to generate more cuts. The one down side is that they are so "shallow" (only zapping one solution). If your problem has no symmetry, so that each candidate solution lives in exactly one place in the solution tree, and if you are using solver callbacks to guide the branch-and-cut process, I think you can get the same effect without actually adding the no good cuts simply by pruning the nodes containing those solutions.

I suppose they're like nuclear weapons and lawyers -- nobody likes them until they need them. :-)

2. When we speak we say no-good "cuts" but if you were to implement them in a MIP, you would implement them with a branching rule : normally, when an LP is integer the (sub)problem is solved, no matter if there are still variables that have a non-singleton domain. To implement the no-good cut you ignore the LP integrality (= don't report you found a solution) and continue bi-secting variables with non-singleton domain. If all your variables already have singleton domains, then you just fail and backtrack the branch & bound.

The difference between bisecting and actually adding a cut is significant, about x100 times last time I tried. On the other hand, ordering properly your callbacks to achieve the desired effect is a headache. Last time I tried, Cplex was declaring the solution feasible before giving you the hand to branch, so you had to follow a complex sequence of steps, using multiple callbacks. I have always advocated a "reject_solution" function in MILP engines.

3. A couple of versions back, CPLEX callbacks were refactored. I used to need three (I think) for Benders decomposition; now I just need one (a lazy constraint callback). The catch is that one works if you are always adding cuts to reject solutions. To implement "no-good" via branching, you still need at least two (an incumbent callback to intercept the solution and a branch callback).

4. You might also need an incumbent callback to control yourself the upper bounds and that's where things become complicated.

If you are doing a "traditionnal" (= non incremental) Benders decomposition
1. you solve the master problem till optimality
2. you compute the "deepest" benders cuts with a MIP + multiple solutions
3. you solve from scratch a new master + benders cuts

Unsurprisingly authors report 95% of the time is spent computing the master MIPs given that you are running till optimality and then trashing them.

To speed up results you might want to do an incremental Benders decomposition, adding Benders cuts derived from any solution found in the master problem 1. instead of waiting till optimality proof; which basically means embedding 2. in an incumbent callback for 1.

However, before letting a solution of 1. be accepted, you have to
- check that it is a solution of the whole problem (so you need a checker and a compact model for your problem)
- possibly correct it's objective value

Otherwise the MIP will use this invalid solution to prune branches.

I devised this simplified problem to understand the issue "find all solutions at 1% of the optimum of a given MIP in a single MIP solve". For each solution of value V you can add an objective-cut of value 1.01 * V. But you have a mess with the upper bounds
- you either have to manage the solutions outside of the MIP or "reinject" the solution with the corrected objective
- you obtain a superset of the real solution set

That's why I also advocate MIPs to support incremental addition of constraints. Lets say you have the complete B&B tree that solved a MIP and you add a constraint. Only two things can happen to a pruned node
1. The node was pruned by infeasibility : adding a constraint doesn't change that, so nothing to be done
2. The node was pruned because of an upper bound : adding a constraint might remove the upper bound and make the pruning invalid. You have to filter all the solutions, check if there is still one that can prune the node and if not, solve it again.

If MIPs managed simultaneously the solution set and cuts in an incremental way, you could just "throw global cuts" without having to think about it.

5. Your second (incremental) approach, which some people call a "one tree" approach, is what I use. I did a couple of posts about it (http://orinanobworld.blogspot.com/2011/10/benders-decomposition-then-and-now.html, http://orinanobworld.blogspot.com/2012/07/benders-decomposition-in-cplex.html). The current version of CPLEX can do the one tree method with a single callback (a lazy constraint callback), as mentioned in the second post.

I've had similar thoughts about saving and reloading a tree (minus any nodes pruned for infeasibility), but I think there are some complicating factors that make it more difficult than it appears. One is that I believe the algorithm, during branching, may fix some integer variables using dual-based bounds, which rely on the incumbent value. Since the incumbent value may change when you add a constraint, those fixings may no longer be valid, which means the branching structure of the saved tree may not be valid. Repairing that could be more expensive than starting from scratch.

5. Right... but that wasn't really my point, I kind of got sidetracked. My point was in a real setting you need to reinject the solutions found via a heuristic callback.

Lets take your transportation example of Benders decomposition (in one of your posts) and imagine that in the first iteration, two heuristics find the following solutions
- y = [1, 1, ..., 1] and z = 0
- y = [1, 1, ..., 1] and z = 100

Then you compute in the sub-problem a min-cost flow and find the minimum cost flow is 10. At this point you have in your hands a full solution of the problem ([1, 1, ..., 1], 10)

If we follow the algorithm description in your post, in the first case the solution ([1, 1, ... 1], 100) will be kept while the other solution will generate an objective cut, and hopefully in a subsequent iteration, the engine might find ([1, 1, ..., 1], 10) thanks to the objective cut but without guarantee because it might have "jumped" to another more promising branch. Worse, if your sub-problem is an IP, there will be a duality gap between the primal and the dual, so we are not even sure the optimiality cut will "touch" the solution.

At the end, you literally had in your hands a solution for the full problem and didn't give it to the engine. Instead you generated an upper and a lower bound !

In a real world benders decomposition you will need a heuristic callback to "inject your corrected solutions". That will help the MIP engine improve its branching, pruning, and save a couple of iterations. Also, your user / customer will appreciate the early solutions as the "dual ascending" property of the traditional Benders is terribly annoying (the fact you don't have any solution for hours and suddenly you are given the optimal one).

1. I follow what you are saying, but I'm having trouble picturing your example scenario happening. In the master problem, the only constraints on z are optimality cuts (z >= something involving y); so how would a heuristic arrive at y = [1, 1, ..., 1] with z = 100 when z = 10 would do? In other words, how would a heuristic not select the smallest z satisfying all optimality cuts so far? (I'm assuming that z is a continuous variable. If, for some reason, z is integral, I still think a heuristic would reliably choose the smallest integer satisfying all optimality cuts.)

In a recent application I actually did use a heuristic callback to "polish" and re-inject solutions found by CPLEX. It wasn't a Benders decomposition, though; it was a "straight" MIP model. CPLEX was, for whatever reason, fixing binary variables tied to penalty terms at 1 before branching on the variables that would determine the necessity of the penalties, and as a result some proposed incumbents were paying penalties the solutions had not actually incurred. I could have fixed the problem by specifying branching priorities or by adding extra constraints; the heuristic callback just seemed like the path of least resistance. I don't see anything like that happening with z in a Benders decomposition, though.

6. Hum... lets flatten those master optimality and feasibility constraints

min z + sum c_j y_j
subject to
z + y1 + y4 + y5 >= 2 // optimality
z + y3 + y5 + y6 >= 2 // optimality
y1 + y5 >= 1 // feasibility
y4 + y6 >= 3 // feasibility
y_j in {0,1}, z in R+

That's a cover problem (actually a general mixed integer multi-knapsack, which is why there is so much literature about separating MIKPs). There is no reason any heuristic should always find the minimum value for Z. What if the heuristic started fixing Z and "propagated" to the other variables ?

But by far the worst part is the lower bound (when Z in the master is lower than the real Z).
You started with a master solution (Y, 0), found out that the real solution was (Y, Z) and added a cut. However, there is no guarantee in the IP case that the cut will actually even cut anything between (Y, 0) and (Y, Z). So in the next iterations you could find (Y, 1), (Y, 2), ... and have a "slow convergence" behavior, similar to no-good cuts.

That's actually the meaning of "Benders cuts that are necessary but not sufficient". In the IP case, there is a duality gap between the primal (network flow with integer capacity) and the dual used here (max-cut). Thereafter your dual cuts don't always "touch" your primal solutions, even the ones used to generate them. Therefore they may fail to make the master feasible, which was exactly case 2. in the initial discussion about no-good cuts.

That's why if you don't inject the solutions in an IP benders, you exhibit dual-ascending behavior, just like traditional Benders.

1. I agree with what you wrote about the possibility for a duality gap, and the need for injecting, when the subproblem is an IP. In the "traditional" As to your first example, since z is a real variable, I'm struggling to visualize a heuristic that would start by fixing it (to what?). Granted a heuristic can be pretty much anything, but what would be the logic of fixing a non-integer variable first (let alone to an inflated value)? Are you aware of a heuristic that does that? (I'm genuinely curious, not being argumentative -- if this really is a danger when the subproblem is an LP, I may need to rethink how I implement Benders.)

2. MIP engines find "implied integers". Won't apply in this case since the user is manually doing the decomposition and the engine doesn't know Z is a surrogate. However if Z happens to be integer (e.g. a sum of integer flows) it should be declared as integer.

I remember somewere on the web a discussion about "is Cplex right to find implied integers". I think it was between Erwin Kalvelagen and Jeff Linderoth but I can't find it anymore. The conclusion was that knowing its an integer creates more room for cuts and heuristics, which corresponds to my experience.

I don't think there is any risk for LP decomposition. However, the title of your inital post is "Benders Decomposition with Integer Subproblems".

3. Thanks for the clarification. I don't typically deal with integer subproblems (born lucky?), but I was tacitly, if possibly incorrectly, assuming that the z variable would be declared real even if the subproblem objective is known to be integer-valued. Having an integer-valued subproblem objective might allow optimality cuts to be tightened a bit, but I would expect that to be done by the cut generator, irrespective of how z is declared. Declaring it real avoids, among other things, the possibility that the solver elects to branch on z rather than on a more productive variable.

Thanks in particular for the mention of "implied integers". I was not aware of the previous discussion. I think I knew, somewhere in the back of my head, that CPLEX made some effort to detect integral objective functions to facilitate pruning and convergence decisions; I did not know it would also be used in branching.

7. Hi Paul,

We have a network flow problem with integrality property. Where in, the optimal objective of the linear relaxation (with fractional solutions) is same as the optimal objective of the integer solution. In such cases, do you think we could have integer sub-problems and hope to exploit the LP duality?

Regards,
Vivek

1. The "integrality property" is typically used to describe a situation where all vertices of the LP relaxation are integral. Is that what you mean, or do you mean that the relaxation can have optimal solutions that are fractional but are somehow guaranteed to have the correct objective value?

If you have the integrality property, why use Benders? Typically problems with the integrality property solve nicely as single LPs.

2. Hi Paul,

Thanks. Yes, I mean the second one. ( For more reference, you can see page 158 of this document - http://www.ensta-paristech.fr/~diam/ro/online/Monique.Guignard-top11201.pdf ). Hope I have interpreted this correctly.

We wanted to see if Benders works well on our problem which could only be decomposed into two integer problems. And we were limited because for Benders the sub needs to be a LP.

Regards,
Vivek.

3. First, there is no guarantee that Benders "works well" under any circumstances. All we can do is see if there is anything in your setup that would prevent cut generation or make it likely that the cuts would be looser than usual.

I looked at p. 158 of the referenced document, and I'm not sure I believe definition 5.1 for technical reasons: X contains not only integrality restrictions but also sign restrictions, so I think $\{x|Cx\le d\}$ should also contain the sign restrictions.

At any rate, if your Benders subproblem has the integrality property, then the solution to the LP relaxation will automatically be integer-feasible if it is unique. If it is not unique, at least one of the optima will be integer-feasible, and they will of course all have the same objective value. So it seems to me that optimality cuts generated by the subproblem will be unaffected by the integrality restriction.

Feasibility cuts may be a bit trickier, because the integrality property relies on a specific value of the RHS $d$. Think of a transportation problem, which has the integrality property. If all supplies and demands are integer, the LP gives an integral solution; but if you specify fractional supplies and/or demands, there is no integer optimum.

So let's say that you generate a potential incumbent in master problem that produces a RHS $\hat{d}$ in the subproblem, and suppose that you treat the subproblem as an LP (relaxing integrality). If the subproblem LP relaxation is infeasible, you get a valid feasibility cut the usual way. If the subproblem LP is feasible and $\hat{d}$ is integral in the right places, you get no feasibility cut, which is correct since the integrality property says that there is an integer-feasible optimum for the subproblem. If the subproblem LP is feasible but $\hat{d}$ has fractional values in places where you need integers, then I think (not positive) that you might have a situation where you get no feasibility cut (meaning the master solution is accepted as feasible) when it actually should be cut off. So the key is to be sure, courtesy of some specific property in the constraints, that a valid solution to the master problem yields a RHS $\hat{d}$ that is integer in all the necessary places.

4. Thanks Paul for the nice explanation. Benders does seem to work in my case, at least partly. I could attain convergence for smaller instances but takes quite some time for larger instances (this could be just with Benders and not specific to my problem). I need to look into the way I generate my optimality cuts. Feasibility cuts seem to work too. In my case, the RHS of my sub-problem may not be always integers. But there may be a way to remodel my sub-problem as Min Cost Flow model to ensure that the RHS would always be integers.

I was also interested in your post in CPLEX forum related to finding adjacent BFSes using CPXpivot(). Since my problem has the integrality property, I could attain the LP optimal basis and use series of CPXpivot()s to reach integer basis. Maybe this idea is silly, but I haven't tried it. Because to attain integer basis, many variables in LP basis must become zero.

Regards,
Vivek

5. Pivoting around the optimal face of your feasible region, looking for an integer-feasible corner, could get tricky: you would need to keep track of previously encountered bases and explore every edge out of each corner you encountered (plus possibly dealing with degenerate corners). I don't think it is for the faint of heart. :-)

There's a paper by Magnanti and Wong about accelerating Benders. If you use the blog's search box to look up "Magnanti", I'm pretty sure you'll find the full citation somewhere on the blog. Their approach does not mess with the feasibility cuts, but it tries to generate deeper optimality cuts. Perhaps that would buy you some speed-up.

6. Then it's definitely not for me :) Thanks for the reference!

Regards,
Vivek.

8. Hi Paul,

While "googloing" for a solution to my problem with benders decomposition i stumbled on your blog which relates to what am facing.
I have a 2 stage stochastic benders decomposition with big-M (disjunctive) constraints on my subproblem. Neither of my objectives (master or subproblem) have binary/integer constraints.
I implemented the benders optimality cuts the usual way but in some situations my lower bound overshoots my upper bound and the problem will never convergence.

Have you ever faced something similar?

On another note, congratulations for the great blog.

Best,
Roberto

1. Roberto: Thanks for the kind words. No, I have not seen what you described before (that I can recall ... if it was sufficiently traumatic, I might have blocked it from my memory). If your "big-M" constraints have really BIG values of M, though, you might consider the possibility that the subproblems become numerically unstable. If your software supports it (CPLEX does), you can check the condition numbers (kappa values) of the bases obtained in the node LPs (assuming the subproblems are IPs, and thus have nodes). Ill-conditioning suggests (but does not guarantee) instability, which could hypothetically possibly maybe lead to incorrect subproblem solutions.

2. Could it be because the MIP MP was not solved till optimality, and some promising sub-otimal incumbent MP solution was used to generate optimality cuts from SP(s)?

3. Sanchit: I don't think so, but without more information (starting with whether the MP was a max or a min problem) it's really hard to say what is going on. If the *subproblem* is not solved to optimality, though, perhaps that might (?) cause it.

9. Hi Paul,
Thank you very much for your great blog. It really helps a lot for my research work.
When I google Benders decomposition with binary variable in the subproblem, I found this useful post.
For my problem, the binary variables introduced in the subproblem are actually dummy variables which are not included in the objective function (They are not the complicating variables either). The physical meanings of the dummy variables are the power flow direction on the transmission lines.

I add penalty term to the subproblem so the subproblem is always feasible. What I did to extract duals from the subproblem is quite straightforward:

1. Solve the original MILP model. This will give you the optimal values for the dummy variable (flow direction).
2. Fix the value of the dummy variables to the values obtained from step 1, the problem becomes to a pure linear programming problem. Then I extract the corresponding duals and form the corresponding optimality cut.

I know that the duality gap in the MILP may prevent the implementation of the benders decomposition. But this method works in my problem. It will take a reasonable time for the solution to converge to the tolerance less than 0.5% for a large scale problem.

Could you give some comments on the approach I used? Are the Benders cuts obtained from this approach valid? I mean the benders cut from this approach would cause suboptimal or cut out some feasible region for the original problem?
Thank you very much for your time.

1. Hi Xiaohu,

I'm glad you find the blog helpful. I *think* I understand what you are doing in the subproblem. Assuming I do, I cannot say with certainty whether the cuts are valid or not, but I suspect that in some cases they might not be. (These are all optimality/point cuts, right?) The problem is that the fixed values of the subproblem binary variables (flow direction) wind up in the constant term of the optimality cut you generate. The optimality cut should be valid for any feasible solution to the master, but if you change the master solution, you may get a different set of flow directions in the subproblem, and so the constant term of the cuts *seems* likely not to be uniformly valid. If the constant term is not valid in general, it could make the cut too loose (unhelpful but harmless), but it could also make it too tight (making the true optimal solution to the master look more expensive than it should, and thus potentially causing the solver to reject it).

To summarize, I'm not sure, but I think you do run a risk of missing the correct optimum this way. Have you considered adding the flow direction variables to the master problem (making the subproblem an LP), but telling the solver to giving branching priority to the variables currently in the master?

One other possibility would be to relax the integrality constraint on the flow direction variables, solve the LP relaxation of the subproblem, and hope that it is a tight enough bound on the true subproblem objective value to provide an optimality cut that the current master solution does not satisfy.

2. There's one other option, which is basically an optimality cut version of the "no good" cut. Suppose that the master variables $x$ are all binary and the subproblem objective contribution (represented in the master by $z$) is nonnegative. At some point you have a candidate solution $(\hat{x}, \hat{z})$ for the master. You solve the (MIP) subproblem with $x=\hat{x}$ and get an objective value $w^*>\hat{z}$. Let $\mathcal{O}$ be the set of indices $i$ such that $\hat{x}_i=1$ and let $\mathcal{Z}$ be the set of indices such that $\hat{x}_i=0$. The constraint $$z\ge w^* \left(1 - \sum_{i\in\mathcal{Z}} x_i - \sum_{i\in\mathcal{O}} (1-x_i) \right)$$is a valid constraint that cuts off $(\hat{x}, \hat{z})$ but nothing else. It's not a "tight" cut, but at least it's valid.

3. Hi Paul,

I am not quite sure about the “constant term” you mentioned in your post. Did you mean the objective for the subproblem from the previous iteration? And I have tried to relax the integrality constraint in the subproblem and benders decomposition is actually converged. However, the solution is infeasible because the fractional value of the flow direction variable.

What I am doing is a power system planning problem. The general idea is to make the decision about which transmission line need to be installed with a power flow control device. I consider both the base case operation and contingency operation (Contingency means losing one of the transmission lines). So it is natural for me to decompose the problem into master problem (Base case operation) and a series of subproblem (Contingency case operation). The complicating variable between the master and subproblem is the location of the power flow control device and the generation of the generator (since the adjustment of generation in the contingency should be based on the generation in the base case operation).

The reason that I introduced the flow direction dummy variable is to transform the master problem from the original MINLP model into MILP model (at the expense of the additional dummy variable and big-M formulation). If the locations of the device are fixed, the subproblem can take either MILP (with dummy flow direction variable) form or just nonconvex NLP form (no flow direction variable). Therefore, I googled the question about having integer in the subproblem in Benders decomposition.

Benders cut applications are based on a prerequisite that the subproblem corresponding to the second stage is LP or convex NLP. However, I recently saw one paper in my fields citing the paper “Estimates of the duality gap for large-scale separable nonconvex optimization problem” to justify its implementation of benders decomposition with nonconvex NLP subproblem. The reason is that the objective function of the subproblem convexifies as the number of scenearios increases (The paper is dealing with stochastic wind power generation problem). I feel I may directly formulate the subproblem into NLP form since I considered a large number of contingency in the subproblem (These contingencies may correspond to the scenarios).

Could you give some suggestions on this topic?

Thank you very much for your time.

4. Sorry, I don't have any experience with either energy transmission models or nonconvex NLP subproblems, so I cannot offer any suggestions there.

Regarding my mention of the "constant term", consider the subproblem constraints $$Ey \ge d-Dx$$from the body of this post. In ordinary Benders (LP subproblem, $y$ continuous), you start with a proposed incumbent $\hat{x}$, solve the LP, get a dual vector (call it $h$) and then use $h^\prime (d-Dx)$ in forming a cut. The "constant term" of that cut is $h^\prime d$.

Now replace $Ey$ with $Ey + Fw$ where $y$ is continuous and $w$ is integer (your flow direction variables, if I understand you correctly). You solve the subproblem and get an optimal solution $(\tilde{y}, \tilde{w})$, then fix $w = \tilde{w}$ and solve the restricted LP to get a dual vector $h$. The constraints in the restricted LP are now $$Ey \ge d-D\hat{x}-F\tilde{w}$$and the Benders cut contains $$h^\prime (d-Dx-F\tilde{w}).$$The constant term for this cut is $h^\prime(d-F\tilde{w})$, and because it contains $\tilde{w}$ (which is optimal when $x=\hat{x}$ but not necessarily in general), I'm suspicious of the validity of the cut.

10. Thank you very much for your time! I guess I understand what you are trying to explain.

I think I get some simulation results to justify my approach (It will definitely be suboptimal but the results are very close to the global optimum in terms of the objective value).

What I did is to compare the results from my approach with the results obtained when relaxing all the flow direction variables in the subproblem to the continuous variables. I think the result from the (master problem+relax subproblem) provides the lower bound for the original problem (Actually the result is not a feasible result because the flow direction can take fractional value).

The result obtained by using my approach is very close to the lower bound. So I guess it is useful from engineering point of view.

Thank you again for your help!

11. "General integer variables can always be converted to binary variables, although it's not clear that the conversion is in general desirable."
Hi Paul, do you remember what you meant by this? I.e. how to convert a general integer variable to a binary variable? This seems non-trivial? E.g. expressing a benders cut over binary variables is easy, but doing the same for general integer variables seems much harder?

1. Joris,

I was thinking in terms of the original model formulation: replace general integer variable $x\in \{\ell,\dots,u\}$ ($\ell \ge 0$) with $\sum_{i=0}^n 2^i y_i$ where the $y$ variables are binary and $n=\left\lceil \log_{2}(u)\right\rceil$. I don't think there's any way to do this in a CPLEX callback if the original model contained $x$ but not $y_{\cdot}$.

Due to recent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Mathematics Stack Exchange.