
Thursday, January 18, 2018

More on "Core Points"

A few additions to yesterday's post occurred to me belatedly.

First, it may be a good idea to check whether your alleged core point $y^0$ is actually in the relative interior of the integer hull $\mathrm{conv}(Y)$. A sufficient condition is that, when you substitute $y^0$ into the constraints, all inequality constraints including variable bounds have positive slack. (Equality constraints obviously will not have slack.) In particular, do not forget that nonnegativity restrictions count as bounds. If you specify $0\le y_i \le u_i$, then you are looking for $\epsilon \le y^0_i \le u_i - \epsilon$ for some $\epsilon > 0$ (and $\epsilon$ greater than your tolerance for rounding error).
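
For concreteness, here is a quick sketch of that check in plain Java, assuming the inequality constraints are stored as rows of $Ay \le b$ with bounds $0 \le y \le u$ (the data layout and method name are mine):

static boolean hasPositiveSlack(double[][] A, double[] b, double[] u,
                                double[] y0, double eps) {
  // bounds: need eps <= y0_i <= u_i - eps
  for (int i = 0; i < y0.length; i++) {
    if (y0[i] < eps || y0[i] > u[i] - eps) return false;
  }
  // functional constraints: need A_j . y0 <= b_j - eps
  for (int j = 0; j < A.length; j++) {
    double lhs = 0.0;
    for (int i = 0; i < y0.length; i++) lhs += A[j][i] * y0[i];
    if (lhs > b[j] - eps) return false;
  }
  return true;  // sufficient, not necessary (see the example below)
}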

Second, while the presence of equality constraints will definitely make the feasible region less than full dimension, the same thing can occur even in a problem with only inequality constraints. Consider a problem with nonnegative general integer variables and the following constraints (as well as others, and other variables): \begin{align*} y_{1}+y_{2} +y_{3} & \le 2\\ y_{2}+y_{4} + y_{5} & \le 4\\ y_{1}+y_{3} + y_{4} + y_{5}& \ge 6 \end{align*} Although all the constraints are inequalities, adding the first two and comparing with the third forces $2y_2 \le 0$, so the feasible region will live in the hyperplane $y_2 = 0$, and thus be less than full dimension. This points to why I said the condition in the previous paragraph is sufficient rather than necessary and sufficient for $y^0$ to be in the relative interior of the integer hull. In this example, there is no way to get positive slack in all three of the constraints (or, in fact, in any one of them) without violating the nonnegativity restriction on $y_2$.

Yesterday, I listed a few things one could try in the hope of getting a core point $y^0$ in the relative interior of the integer hull. Here are a few others that occurred to me. (Warning: I'm going to use the term "slack variable" for both slacks and surpluses.)
  • Tighten all inequality constraints (including variable bounds) by some small positive amount and solve the LP relaxation of the tightened problem. (Feel free to change the objective function if you wish.) If you find a feasible solution $y^0$, it will be in the relative interior of the LP hull, and quite possibly in the relative interior of the integer hull. Good news: It's easy to do. Bad news: Even a modest tightening might make the problem infeasible (see example above).
  • Insert slack variables in all constraints, including variable bounds. That means $0 \le y_i \le u_i$ would become \begin{align*} y_{i}-s_{i} & \ge0\\ y_{i}+t_{i} & \le u_{i} \end{align*} where $s_i \ge 0$ and $t_i \ge 0$ are slack variables. Maximize the minimum value of the slacks over the LP relaxation of the modified problem. Good news: If the solution has positive objective value (meaning all slacks are positive), you have a relative interior point of at least the LP hull. Bad news: An unfortunate combination of inequalities, like the example above, may prevent you from getting all slacks positive.
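
Here is a sketch of that max-min slack LP in the CPLEX Java API, again assuming functional constraints of the form $Ay \le b$. Instead of separate slacks $s_i$ and $t_i$, it uses a single variable $w$ that is required to be a lower bound on every slack, which is equivalent for the max-min objective. All names are mine:

import ilog.concert.*;
import ilog.cplex.*;

public class MaxMinSlack {
  // returns a candidate core point, or null if no all-positive-slack point exists
  public static double[] corePoint(double[][] A, double[] b, double[] u)
      throws IloException {
    IloCplex cplex = new IloCplex();
    int n = u.length;
    IloNumVar[] y = cplex.numVarArray(n, 0.0, Double.MAX_VALUE);
    IloNumVar w = cplex.numVar(0.0, Double.MAX_VALUE, "minSlack");
    for (int i = 0; i < n; i++) {
      cplex.addGe(cplex.diff(y[i], w), 0.0);  // y_i - w >= 0 (lower bound slack)
      cplex.addLe(cplex.sum(y[i], w), u[i]);  // y_i + w <= u_i (upper bound slack)
    }
    for (int j = 0; j < A.length; j++) {
      // A_j . y + w <= b_j (row slack at least w)
      cplex.addLe(cplex.sum(cplex.scalProd(A[j], y), w), b[j]);
    }
    cplex.addMaximize(w);
    double[] point = null;
    if (cplex.solve() && cplex.getValue(w) > 1e-6) {
      point = cplex.getValues(y);  // all slacks strictly positive
    }
    cplex.end();
    return point;
  }
}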

Wednesday, January 17, 2018

Finding a "Core Point"

In a famous (or at least relatively famous) paper [1], Magnanti and Wong suggest a method to accelerate the progress of Benders decomposition for certain mixed-integer programs by sharpening "optimality" cuts. Their approach requires the determination of what they call a core point. I'll try to follow their notation as much as possible. Let $Y$ be the set of integer-valued vectors satisfying all the constraints that involve solely the integer variables. Basically, $Y$ would be the feasible region for the problem if the continuous variables did not exist. A core point is a point $y^0$ in the relative interior of the convex hull of $Y$.

I'll assume that any reader who made it this far knows what a convex hull is. That said, I should point out that the convex hull of $Y$ is not the feasible region of the LP relaxation of $Y$. The feasible region of the LP relaxation is frequently termed the "LP hull", whereas the convex hull of $Y$ is typically referred to as the "integer hull". The integer hull is a subset (typically a proper subset) of the LP hull. Knowing the integer hull makes a MIP model basically easy: solving a linear program over the integer hull yields the integer optimum. Since it's pretty well known that most MIP models are not easy, one can infer that in most cases the integer hull is not known (and not easy to figure out).

Mathematicians will know the term "relative interior", but it may not be familiar to OR/IE/MS people trying to solve MIP models. In Euclidean spaces, a point belongs to the interior of a set if there's a ball centered around that point and contained entirely in the set. In lay terms, you're at an interior point if you can move slightly in any direction without leaving the set. In our context, that set is the integer hull, a convex polyhedron. The catch is that if the set is not full dimensional, there is no interior (or, more properly, the interior is empty). Picture the unit cube in $\mathbb{R}^3$ as the integer hull. The point $y^0 = (0.5, 0.5, 0.5)$ is in the interior; you can move a bit in any direction and stay in the cube. Now add the constraint $y_3 = 0.5$, which flattens the integer hull to a square (still containing $y^0$). You can move a bit in the $y_1$ and $y_2$ directions and stay in the square, but move vertically (parallel to the $y_3$ axis) and you're no longer feasible. That brings us to the relative interior. A point is in the relative interior of a set if it can be surrounded by a ball that stays within the set, where the ball lives in the affine hull of the set (the lowest-dimensional affine space containing the set). Returning to our example, before adding the equality constraint I could surround $y^0$ with a sphere contained in the unit cube. After adding the equality constraint, making the feasible region a square, I can no longer surround $y^0$ with a feasible sphere, but I can surround it with a feasible circle in the $y_3 = 0.5$ plane. So $y^0$ is still in the relative interior of the integer hull after adding the equality constraint.

Back to Magnanti-Wong. To use their technique, you need to know a "core point" up front. In their paper, they mention that in some cases a core point will be obvious ... but in many cases it will not be. So I'll mention some techniques that probably yield a core point (but no general guarantee).
  • If you happen to know two or more integer-feasible points, average them. The average will be feasible, and unless all those points live on the same facet of the integer hull, you should get a relative interior point.
  • Pick an objective function and optimize it (maximize or minimize, it doesn't matter) over $Y$, ignoring the continuous variables and any constraints involving them from the original problem. Do this a few times: minimize and maximize the same objective, switch the objective, iterate; or just minimize (or maximize) a different objective each time. I might use randomly generated objective functions, but an easy alternative is to minimize $y_1$, then maximize $y_1$, then minimize $y_2$, etc. Average the solutions you get. The average is guaranteed to belong to the integer hull, and almost surely (at least with random objectives and multiple iterations) to the relative interior of the integer hull. Bad news: you just solved a bunch of integer programs, which might be a trifle time-consuming.
  • Use the previous method, but optimize over the LP hull (relaxing the integrality constraints on $y$). Again, average the solutions you get. Good news: Solving a handful of LPs is typically much less painful than solving a handful of IPs. Bad news: Since the LP hull is a superset of the integer hull, and since your LP solutions are all vertices of the LP hull, there's at least a chance that the average of them lives inside the LP hull but outside the integer hull. That said, I've used this method once or twice and not lost any sleep. If your core point happens to fall outside the integer hull, I don't think it will cause any problems; it just probably won't make your Benders decomposition solve any faster. (A sketch of this approach appears right after this list.)
  • Generate a bunch of random integer vectors of the correct dimension, and filter out those that do not satisfy the constraints defining $Y$. Average the survivors. Good news: Each of the surviving integer vectors will belong to the integer hull, so your averaged core point $y^0$ definitely will as well. Also, generating random integer vectors is pretty easy. Bad news: If the integer hull is less than full dimension, the probability of a random vector falling in it is zero, so the likelihood of any "survivors" is negligible. Mitigating news: In some cases you can randomly generate an integer vector and then tweak it to ensure it satisfies the constraints that are keeping the integer hull from being full dimensional. For instance, suppose that you have a constraint of the form $\sum_i y_i = B$ where $B$ is an integer constant. Generate a random integer vector $\tilde{y}$, replace $\tilde{y}_1$ with $B-\sum_{i>1}\tilde{y}_i$, and you have a vector satisfying that constraint. If you can pull off similar tweaks for other equation constraints (staying integer-valued and not violating bounds on the variables), maybe you can get lucky and find a few integer-feasible points. In fact, even if you can only find one survivor (before exhausting your patience), you might get lucky and have it be in the relative interior of the integer hull. (You won't know this, but you can try using it as your core point and hope for the best.)
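
Here is a sketch of the LP-averaging idea from the second bullet, assuming the constraints defining $Y$ (with integrality relaxed) have already been loaded into an IloCplex instance along with the variables $y$; the method name is mine:

import ilog.concert.*;
import ilog.cplex.*;

public class CorePointGuess {
  // minimize and maximize each variable over the LP hull, then average the optima
  public static double[] average(IloCplex lp, IloNumVar[] y) throws IloException {
    int n = y.length;
    double[] center = new double[n];
    int count = 0;
    IloObjective obj = lp.addMaximize(y[0]);  // placeholder objective, mutated below
    for (int i = 0; i < n; i++) {
      for (boolean max : new boolean[] {true, false}) {
        obj.setExpr(y[i]);
        obj.setSense(max ? IloObjectiveSense.Maximize : IloObjectiveSense.Minimize);
        if (lp.solve()) {
          double[] v = lp.getValues(y);
          for (int k = 0; k < n; k++) center[k] += v[k];
          count++;
        }
      }
    }
    lp.remove(obj);                 // restore the model
    if (count == 0) return null;    // LP infeasible: Y is empty
    for (int k = 0; k < n; k++) center[k] /= count;
    return center;  // hoped-for core point (may lie outside the integer hull)
  }
}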


[1] Magnanti, T. L. & Wong, R. T., Accelerating Benders Decomposition: Algorithmic Enhancement and Model Selection Criteria. Operations Research, 1981, 29, 464-484

Monday, November 13, 2017

Benders Decomposition with Generic Callbacks

Brace yourself. This post is a bit long-winded (and arguably geekier than usual, which is saying something). Also, it involves CPLEX 12.8, which will not ship until some time next month.

I have an updated version of an old example, solving a fixed charge transportation problem using Benders decomposition. The example (using Java, naturally) is scattered across several posts, going back a ways: the original 2012 post, a 2014 update, and a 2016 version using the then-new automatic decomposition support in CPLEX 12.7.
Now here I am breaking the even year pattern by doing yet another version in 2017. If you want to see the MIP model written out in mathematical notation, it's in the 2012 post. If you want to see the code for the most recent version, which contains both a "manual" decomposition (using callbacks) and a few versions of the new (in 12.7) automatic decomposition feature, there's a link to the repository in the 2016 post. At this point in time, the 2014 post is not worth reading.

Before I get into the updated example code, let me outline some key differences between the old ("legacy") callback system and the new ("generic") callback system.

When is a callback called by CPLEX?
  • Legacy: This is dictated by the type of callback. A branch callback is called when CPLEX is ready to split a node into children, a lazy constraint callback is called when CPLEX identifies what it thinks is a new integer-feasible solution, and so on.
  • Generic: When you attach your callback to the model (IloCplex instance), you specify via a bit mask in which of several "contexts" it should be called. As of this writing (things do have a habit of changing), the contexts are Candidate (a new integer-feasible candidate for incumbent solution has been found), GlobalProgress (progress has occurred at the "global" level -- presumably something you might see in the node log), LocalProgress (a thread has made what it thinks is progress, but has not yet passed it back to the controller), Relaxation (CPLEX has found a solution that might not be integer-feasible, such as the solution to a node LP), ThreadDown (CPLEX deactivated a thread) and ThreadUp (CPLEX activated a thread).
If you are fuzzy about the difference between local and global progress, one example is a thread that finds a new incumbent solution, one that is better than the incumbent that existed when the thread was launched, but not as good as the current incumbent (recently found by a different thread). That is local progress (in the first thread) but not global progress (the incumbent found by the first thread will be ignored when it gets around to reporting it to the controller).
Attaching callbacks to models:
  • Legacy: You attach separate callbacks of each desired type (info callback, branch callback, user cut callback, ...).
  • Generic: You attach a single callback that handles all supported operations. Attaching a second callback automatically detaches the first one. When attaching the callback, you specify via a bit mask from which context(s) the callback is to be called.
Callback classes:
  • Legacy: Your class extends one of the existing callbacks. For instance, if you want a lazy constraint callback, you extend the IloCplex.LazyConstraintCallback class and override the abstract main() method.
  • Generic: Your class implements the IloCplex.Callback.Function interface and overrides the abstract invoke() method.
Calling functions inside the callback:
  • Legacy: You call one of the methods for the class you extended. For example, in a branch callback (extending IloCplex.BranchCallback), you would call one of the overloads of makeBranch() to create a new branch.
  • Generic: When CPLEX calls the invoke() method, it passes in as an argument an instance of IloCplex.Callback.Context. This object has a number of methods you can invoke to find out the context, access a candidate solution (if one exists), get information about what is going on (including the index of the thread in which it was invoked, and the total number of threads in use), etc. Basically, this is your key to the castle.
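
To make the comparison concrete, here is a minimal skeleton of a generic callback and its attachment, based on my reading of the CPLEX 12.8 Java API (the class name is mine, and all of the actual logic is omitted):

import ilog.concert.IloException;
import ilog.cplex.IloCplex;

public class SkeletonCallback implements IloCplex.Callback.Function {
  @Override
  public void invoke(final IloCplex.Callback.Context context) throws IloException {
    if (context.inCandidate()) {
      // inspect the candidate solution; reject it with a cut if necessary
    } else if (context.inThreadUp()) {
      // per-thread setup
    } else if (context.inThreadDown()) {
      // per-thread teardown
    }
  }
}

// attach it, specifying the contexts of interest via a bit mask:
// cplex.use(new SkeletonCallback(),
//           IloCplex.Callback.Context.Id.Candidate
//             | IloCplex.Callback.Context.Id.ThreadUp
//             | IloCplex.Callback.Context.Id.ThreadDown);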

Benders decomposition


For Benders decomposition, we will focus on the Candidate context (CPLEX thinks it has an integer-feasible solution, which we need to test) and the ThreadUp and ThreadDown contexts, for reasons I'll eventually get to.

I wrote about thread safety in my previous post. Let me pin down why this is critical when using Benders decomposition. With Benders, we have a master problem and one or more subproblems. (I'll stick with one subproblem here, for concreteness.) The master problem model itself is never modified. Benders cuts are added to the working copy of the master problem, from the callback. Several threads could be generating Benders cuts more or less simultaneously, but it is the responsibility of CPLEX (not your code) to make sure that multiple threads adding cuts don't trip over each other.

On the other hand, the way we generate cuts is to use the proposed incumbent solution to modify the constraint limits in the subproblem (or the subproblem objective, if you're one of those people who prefers to work with the dual of the actual subproblem). It's our code (specifically, the callback) making those changes, and if changes are being made concurrently by multiple threads, bad things can happen. Just to verify this is not hypothetical, I converted my original Benders code to the new callback system and ran it without doing anything special regarding thread safety. It crashed the JVM in a matter of seconds. Lesson learned.

So what are our options? One is to use a single copy of the subproblem and impose a locking mechanism that prevents more than one thread from accessing the subproblem at a time. That turns out to be the easiest to program in Java (at least in my opinion), but it slows the solver down due to threads being blocked. What seems to be the general wisdom for using locks is that you should only use them on chunks of code that execute quickly. Modifying what might be a large LP model, solving it, and finding a dual ray or Farkas certificate if it is infeasible or the dual solution if it is feasible does not meet the "execute quickly" criterion.

Another option is to generate an independent copy of the subproblem each time a thread needs it. That seems wastefully slow to me. A third, more promising approach is to generate one copy of the subproblem per thread and store it, reusing it each time the callback is called in that thread. This again eats CPU cycles generating the subproblem multiple times, but once per thread is a lot less than once each time CPLEX calls the callback. A downside, however, is that this could consume a prohibitive amount of memory if the subproblem is quite large.

The revised Benders example


The latest version of my Benders example is available at https://gitlab.msu.edu/orobworld/BendersExample3. In addition to my code (and Java), you will need CPLEX 12.8 (which I am told is coming out in December) and the open-source Apache Commons CLI library (for processing command line options).

I've added some command line options to let you alter a number of things without having to alter or comment out code: which models to run; what random seed to use; how many trials to run (default 1); whether to show CPLEX log output for neither problem, just the master, or both master and subproblem; whether to show the formulas of the cuts generated (or just settle for messages telling you when a cut is added); and whether to print the full solution (which warehouses are used, what the shipment volumes are). Run the program with "-h" or "--help" on the command line to get the full list of options.

The new code has two versions of the "manual" (callback-based) Benders method. The "-m" command line option runs the default "manual" model, while the "-m2" option runs what I inelegantly named the "thread-local manual" model. The difference is in the approach taken to assure thread safety. I'll briefly describe that below, and of course you can pore over the code if you want more details.

Locking


The default manual model avoids collisions between threads by locking parts of the code so that one thread cannot execute certain portions when any other thread is executing certain portions. There are multiple ways to do this in Java, but a simple one (which I use here) is to synchronize methods. To do this, all you have to do is add the keyword synchronized to the method declarations. (In addition to synchronizing methods, you can synchronize smaller chunks of code, but I did not use that feature.)

My ManualBenders class has a private inner class named BendersCallback. The callback is entered only from the Candidate context. Its sole method is the invoke() method mandated by the IloCplex.Callback.Function interface described earlier. By declaring that method

public synchronized void invoke(final IloCplex.Callback.Context context)

I ensure that no thread can invoke the callback while another thread is in the callback. The good news is that it works (prevents threads from interfering with each other) and it's incredibly easy to do. The bad news is that it results in threads blocking other threads. The subproblem in the example is pretty small. If it were slow to solve (or large enough to make updating bounds meaningfully slower), performance would be even worse.

Thread-specific copies of the subproblem


The "thread-local" model avoids thread collisions by giving each thread a separate copy of the subproblem to play with. I created a class (cleverly named Subproblem) to hold instances of the subproblem. The callback (instance of class BendersCallback2) is now entered from three different contexts, ThreadUp, ThreadDown and Candidate. As before, in the Candidate context we do the usual things: use the candidate solution to the master problem to modify constraint limits in the subproblem; solve the subproblem; depending on whether it is feasible, and what its optimal objective value is, either accept the new incumbent or reject it by adding a cut. We use calls in the ThreadUp context to create and store a copy of the subproblem for the newly activated thread, and similarly we use calls in the ThreadDown context to delete the local subproblem copy and recover its memory.

So where do we store subproblem copies so that only the thread invoking the callback can see its designated copy? In their BendersATSP2.java example code, IBM creates a vector, of length equal to the maximum number of threads that will be used, to store subproblems. The context argument to the callback provides a method, context.getIntInfo(IloCplex.Callback.Context.Info.ThreadId), that returns the zero-based index of the thread invoking the callback, which is used to pluck the correct copy of the subproblem from the vector of copies.

I took a slightly different approach in my code, using a generic Java class named ThreadLocal that exists precisely to hold objects local to a given thread. My BendersCallback2 class contains a private field declared

private final ThreadLocal<Subproblem> subproblem

to hold the copy of the subproblem belonging to a given thread. When the callback is called from the ThreadUp context, subproblem.set() is used to store a newly constructed copy of the subproblem; when called from the Candidate context, subproblem.get() is used to access the copy of the subproblem belonging to the thread. (Since this is all rather new, both to CPLEX and to me, and since I have a suspicious nature, I stuck some extra code in the example to ensure that subproblem is empty when called from ThreadUp and not empty when called from Candidate or ThreadDown. There are comments indicating the superfluous code. Feel free to omit it from your own projects.)
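
Schematically, the dispatch inside BendersCallback2 looks like the following (Subproblem is my wrapper class, and its end() method, which would release the CPLEX objects it holds, is hypothetical; the ThreadDown branch is the leak fix discussed below):

private final ThreadLocal<Subproblem> subproblem = new ThreadLocal<>();

public void invoke(final IloCplex.Callback.Context context) throws IloException {
  if (context.inThreadUp()) {
    subproblem.set(new Subproblem());   // build this thread's private copy
  } else if (context.inCandidate()) {
    Subproblem sub = subproblem.get();  // fetch this thread's copy
    // ... update constraint limits, solve, accept the candidate or add a cut ...
  } else if (context.inThreadDown()) {
    subproblem.get().end();             // free the CPLEX objects (hypothetical method)
    subproblem.remove();                // drop the thread's entry
  }
}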

I thought that when a thread was deleted, the subproblem variable would be deleted. Since subproblem is (I think) the only pointer to the Subproblem instance for that thread, I expected the memory used by the Subproblem to be recovered during garbage collection. Apparently I was wrong: my initial attempt leaked memory (enough to shut down the JVM after solving a few problems). So I added an invocation of the callback from the ThreadDown context (thread deactivation), at which point the callback deletes the Subproblem instance. That seems to have cured the memory leak. So, in summary, the callback is called in three contexts:
  • ThreadUp -- create a copy of the subproblem for the thread;
  • Candidate -- test a candidate incumbent and, if necessary, generate a Benders cut; and
  • ThreadDown -- delete the copy of the subproblem belonging to that thread.
I'm comfortable with this approach, but you might prefer the subproblem vector used in the BendersATSP2.java example.

Performance Comparison


I ran five solution methods (Benders with locking, Benders with copies of the subproblem for each thread, and three versions of the automatic Benders decomposition added to CPLEX 12.7) on 10 instances of the fixed charge transportation problem. That's a small sample, based on a single type of problem, on a single machine (four cores, 4 GB of RAM), so I wouldn't read too much into the results. Nonetheless, I thought I would close by comparing solution times. The three automatic methods provided by CPLEX ("annotated", "automatic" and "workers", described in my 2016 post) had virtually indistinguishable times, so to make the plot more reasonable I am including only the "automatic" method (labeled "Auto"), along with my two approaches ("Local" for the method using copies of the subproblem for each thread and "Locked" for the method using synchronization).

The plot below shows, as a function of wall clock time, the fraction of trials that reached proven optimality by the specified time. The automatic approach and my manual approach with thread-local subproblems were competitive (with the automatic approach having a slight advantage), while the manual approach with synchronized class methods lagged rather badly. Although I cannot say with certainty, I am rather confident that is a result of threads engaged in solving a subproblem blocking other threads wanting to do the same.

performance plot (completion rate versus run time, by method)

Finally, just to drive home the realities of multiple threads, here is a sample of the output generated by the synchronized (locked) approach during one run.

>>> Adding optimality cut
>>> Accepting new incumbent with value 3598.1162756564636
>>> Accepting new incumbent with value 3598.1162756564636
>>> Accepting new incumbent with value 4415.750134379851
>>> Accepting new incumbent with value 4415.750134379851
>>> Accepting new incumbent with value 4411.334611014113
>>> Accepting new incumbent with value 4411.334611014113
>>> Accepting new incumbent with value 4700.008131082763
>>> Accepting new incumbent with value 4700.008131082763
>>> Adding optimality cut
>>> Adding optimality cut
>>> Adding optimality cut
Final solver status = Optimal
# of warehouses used = 14
Total cost = 3598.1162756564636
Solution time (seconds) = 143.090

The objective here is minimization. The messages beginning ">>>" are printed by the callback. Notice that after the optimal solution is found and accepted (in two different threads?), provably suboptimal solutions (solutions with inferior objective values) appear to be accepted several times. This also happens with the thread-local approach. It may well be happening in the various automated Benders approaches, but since they do not print progress messages there is no way to tell. This illustrates the difference between "local" and "global" progress. It happens because the various worker threads do not communicate with each other, and communicate only sporadically with the controller thread. So if thread 3 finds the optimal solution and has it accepted, thread 1 does not know this and continues cranking through its portion of the search tree, finding new "incumbents" that are improvements over the best solution thread 1 has seen but not necessarily improvements over the best global solution. The inferior solutions are accepted in the threads in which they were found, but the controller knows not to accept them as incumbents (because it already has a better incumbent). This behavior is not an error; it's just another indication why adding threads does not improve performance proportionally.

Saturday, January 21, 2017

Fischetti on Benders Decomposition

I just came across slides for a presentation that Matteo Fischetti (University of Padova) gave at the Lunteren Conference on the Mathematics of Operations Research a few days ago. Matteo is both an expert on and, dare I say, an advocate of Benders decomposition. I use Benders decomposition (or variants of it) rather extensively in my research, so it ends up being a frequent theme in my posts. Those posts tend to generate more comments than posts on other topics. Apparently Matteo and I are not the only BD users out there.

I don't know that I would recommend Matteo's presentation as the starting point for someone who has heard of BD but never used it, but I certainly recommend having a look at the slides for anyone who has any familiarity with BD. Matteo provides several interesting perspectives as well as a tip or two for potentially improving performance. I learned a few new things.

In a sad coincidence, Professor Jacques Benders, the originator of Benders decomposition, passed away at age 92 just eight days before Matteo's presentation.

Friday, December 2, 2016

Support for Benders Decomposition in CPLEX

As of version 12.7, CPLEX now has built-in support for Benders decomposition. For details on that (and other changes to CPLEX), I suggest you look at this post on J-F Puget's blog and Xavier Nodet's related slide show. [Update 12/7/16: There is additional information about the Benders support in a presentation by IBM's Andrea Tramontani at the 2016 INFORMS national meeting, "Recent advances in IBM ILOG CPLEX".]

I've previously posted Java code for a simple example of Benders decomposition (a fixed-cost warehousing/distribution problem). To get a feel for the new features related to Benders, I rewrote that example. The code for the revised example is in this Git repository, and is free for use under a Creative Commons 3.0 license. There is also an issue tracker, if you bump into any bugs.

Going through the code here would be pretty boring, but there are a few bits and pieces I think are worth explaining, and I'll show some far from definitive timing results.

Modeling Approaches


I have five modeling approaches in the code.
  • STANDARD: This is just a single MIP model, using no decomposition. It's present partly for benchmark purposes and partly because a few of the newer approaches build on it.
  • MANUAL: This is essentially a reprise of my earlier code. I write two separate models, a master problem (MIP) and a subproblem (LP), just as one would do prior to CPLEX 12.7.
  • ANNOTATED: This is the first of three methods exploiting the new features of CPLEX 12.7. I create a single model (by duplicating the STANDARD code) and then annotate it (using the annotation mechanism added in CPLEX 12.7) to tell CPLEX how to split it into a master problem and a single subproblem. The annotation method would also let me create multiple disjoint subproblems, but the example we are using only needs one subproblem.
  • WORKERS: This is the ANNOTATED method again, but with a parameter setting giving CPLEX the option to split the single subproblem into two or more subproblems if it sees a structure in the subproblem that suggests partitioning is possible.
  • AUTOMATIC: Here I create a single model (identical to the STANDARD method) and, via a parameter setting, let CPLEX decide how to split it into a master problem and one or more subproblems.

Benders Strategy Parameter


The key to all this is a new parameter, whose Java name is IloCplex.Param.Benders.Strategy. It is integer valued, but for some reason is not part of the IloCplex.IntParam class hierarchy (which threw me off at first when I tried to use it in my code). The values defined for it are:
  • -1 = OFF: Use standard branch and cut, doing no Benders decomposition (even if annotations are present). Note that this will not stop manually coded Benders from working (as in my MANUAL method); it just stops CPLEX from creating a decomposition.
  • 0 = AUTO: If no annotations are present, do standard branch and cut (akin to the OFF value). If the user supplies annotations, set up a Benders decomposition using those annotations, but partition the subproblems further if possible (akin to the WORKERS behavior below). If the user supplies incorrect annotations, throw an exception. This is the default value.
  • 1 = USER: Decompose the problem, adhering strictly to the user's annotations (with no additional decomposition of subproblems). If the user fails to annotate the model, or annotates it incorrectly, throw an exception.
  • 2 = WORKERS: This is similar to USER, but gives CPLEX permission to partition subproblems into smaller subproblems if it identifies a way to do so.
  • 3 = FULL: CPLEX ignores any user annotations and attempts to decompose the model. If either all the variables are integer (no LP subproblem) or none of them are (no MIP master problem), CPLEX throws an exception.
There are also two other Benders-related parameters, in Java IloCplex.Param.Benders.Tolerances.FeasibilityCut and IloCplex.Param.Benders.Tolerances.OptimalityCut, that set tolerances for feasibility and optimality cuts. (In my code I leave those at default values. If it ain't broke, don't fix it.)

Syntax


Coding the strategy parameter looks like the following.
IloCplex cplex = new IloCplex();  // create a model object
// ... build the model ...
int strategy = ...;  // pick the strategy value to use
cplex.setParam(IloCplex.Param.Benders.Strategy, strategy);

Annotations


Annotations can be specified in a separate file (if you are reading in a model) or added in code (which is what I do in my demo program). IBM apparently intends annotations to be used more generally than just for Benders decomposition. To use them for Benders, what you do is annotate each model variable with the index number of the problem into which it should be placed (where problem 0 is the master problem and problems 1, 2, ... are subproblems). Only variables are annotated, not constraints or objective functions. If you fail to annotate a variable, it is given a default annotation (discussed further below). If you assign a negative integer as an annotation, the universe will implode spectacularly, and CPLEX will throw an exception just before it does.

You start by creating a single MIP model, as if you were not going to use Benders decomposition. Assuming again that your model exists in a variable named cplex, you begin the decomposition process with a line like the following:
IloCplex.LongAnnotation benders =
  cplex.newLongAnnotation("cpxBendersPartition");
The name "benders" for the variable in which the annotation is stored is arbitrary, but the name cpxBendersPartition for the annotation must be used verbatim. This version of the annotation constructor sets the default value to 0, so that variables lacking annotations are assumed to belong to the master problem. An alternate version of newLongAnnotation uses a second argument to set the default value.

Next, you annotate each variable with the index of the problem into which it should be placed. Suppose that we have two variables defined as follows.
IloIntVar x = cplex.boolVar("Use1");
  // open warehouse 1?
IloNumVar y = cplex.numVar(0.0, Double.MAX_VALUE, "Ship12");
  // amount shipped from warehouse 1 to customer 2
Assuming that we want x to belong to the master problem (index 0) and y to belong to the first and only subproblem (index 1), we would add the following code:
cplex.setAnnotation(benders, x, 0);
cplex.setAnnotation(benders, y, 1);
where benders is the variable holding our annotation.

There are versions of setAnnotation that take vector arguments, so that you can annotate a vector of variables in a single call. If the model in question has a fairly small number of integer variables and a rather large number of continuous variables (which is pretty common), you might want to use the two-argument version of newLongAnnotation to set the default value at 1, and then annotate only the integer variables, letting the continuous variables take the default annotation (i.e., be assigned to subproblem 1).
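
For instance, the shortcut just described might look like the following sketch; I'm treating the exact signature of the array overload of setAnnotation as an assumption on my part.

// default every variable to subproblem 1, then pin the integer
// variables (and only them) to the master problem
static void annotateForBenders(IloCplex cplex, IloIntVar[] masterVars)
    throws IloException {
  IloCplex.LongAnnotation benders =
    cplex.newLongAnnotation("cpxBendersPartition", 1);  // default = subproblem 1
  long[] zeros = new long[masterVars.length];           // all zeros = master problem
  cplex.setAnnotation(benders, masterVars, zeros);
}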

That's all there is to creating a Benders decomposition from a standard MIP model. Note, in particular, that you do not need to create callbacks. CPLEX handles all that internally. You can still add things like lazy constraint callbacks if you have a reason to do so; but for a "typical" Benders decomposition (dare I use that phrase?), you just need to create a single MIP model, annotate it, and set the Benders strategy parameter. What could be easier than that? Well, actually, one thing. If you set the Benders strategy to FULL, you don't even have to annotate the model! You can let CPLEX figure out the decomposition on its own.

Performance


Test runs of a single model (especially one as simple as the fixed-charge warehouse problem) on a single computer (especially one with a measly four cores) written by a single programmer (who happens not to be terribly good at it) don't prove anything. I'll leave it to someone else to do serious benchmarking. That said, I was curious to see if there were any gross differences in performance, and I also wanted to see if all methods led to correct answers. (I'm not what you would call a trusting soul.) So I ran 25 replications of each method, using a different problem instance (and a different random seed for CPLEX) each time. In an attempt to level the playing field, I forced Java to collect garbage after each model was solved (and timing stopped), to minimize the impact of garbage collection on run times. All problem instances were the same size: 50 warehouses serving 4,000 customers. The basic model has these dimensions: 4,050 rows; 200,050 columns (of which 50 are binary variables, the rest continuous); and 400,050 nonzero constraint coefficients.

The first plot below shows the wall-clock time spent setting up (and, where relevant, decomposing) each model.
box plot of model setup times
None of the methods strikes me as being overly slow, ignoring a couple of outliers. Using annotations (the "Annotated" and "Workers" methods) does seem to impose a nontrivial burden on model construction.

The second plot shows solution times. For each problem instance, all five methods achieved identical objective values (and proved optimality), and all used the same number of warehouses. (I did not check whether the values of the flow variables were identical.) So any concerns I might have had about validity were assuaged.
box plot of solution times
I'm not sure how much to read into this, but "manual" decomposition (the old fashioned approach, using explicit callbacks) seems to have the highest variance in solution time. The three approaches using new features of CPLEX 12.7 ("Annotated", "Automatic" and "Workers") had very similar run times. I'm fairly certain that the "Workers" method, wherein CPLEX tries to further partition the single subproblem I specified, wound up sticking with a single subproblem (due to the structure of the model).

Which Is Better?


Assume that you have a problem that is amenable to "standard" Benders decomposition (as opposed to some of the funky variants, such as combinatorial Benders, Benders with MIP subproblems, Benders with subproblems that are not necessarily optimization problems, ...). The easiest approach from the user perspective is clearly the automatic option (Benders strategy = FULL), in which CPLEX literally does all the work. Runner up is the annotation approach, which is both much easier and much less prone to coding errors than the "manual" approach (defining separate problems and writing a lazy constraint callback).

On the performance side, things are a bit less clear. If you dig through Xavier Nodet's slides, you'll see that CPLEX uses somewhat sophisticated techniques, based on recent research, to generate Benders cuts. I suspect that, on the test problem, their cuts would be a bit stronger than the ones I generate in the "manual" approach, which may account for some of the difference (in their favor) in median run times seen in the second plot. Also, since they can access the subproblems and add cuts to the master directly, rather than having to go through callbacks, using the annotation feature may result in a bit less "drag".

In other cases, I suspect that if you have a particular insight into the model structure that lets you generate problem-specific Benders cuts that are tighter than the generic cuts, you may do better sticking to the manual approach. Fortunately, since you can turn on automatic Benders decomposition just by tweaking a single parameter, it should be easy to tell whether your cuts really do improve performance.

Wednesday, August 6, 2014

Updated Benders Example

Two years ago, I posted an example of how to implement Benders decomposition in CPLEX using the Java API. At the time, I believe the current version of CPLEX was 12.4; as of this writing, it is 12.6.0.1. Around version 12.5, IBM refactored the Java API for CPLEX and, in the process, made one or more non-backward-compatible changes that broke the sample code. This morning, I posted an updated version.

While I was at it, I decided to create a Git repository for the code on Bitbucket. You can access the repository at https://bitbucket.org/prubin/bendersexample. If you click the "Downloads" link in the navigation menu, you will be led to a page from which you can download the source code in a Zip archive. (There is no executable for this project, just the source.) There is also a link to an issue tracker in case you run into a bug (not that I ever write code with bugs in it) and wish to report it.

UPDATE: I moved the code to a new repository. The URL is https://gitlab.msu.edu/orobworld/BendersExample. Click "Files" in the left side navigation menu to get to the source code. There will be a link (upper right) on that page to download the source in a Zip archive.

The code and issue tracker should be accessible to anonymous users (no login required). If not, hopefully someone will let me know. As with the blog, the code is released under a rather permissive Creative Commons license.

Wednesday, July 31, 2013

Benders Decomposition with Integer Subproblems

I'm not sure why, but occasionally people post questions related to an attempt to apply Benders decomposition in a situation where the subproblem contains integer variables. A key question is how you generate cuts from the subproblem, since discrete problems do not enjoy the same duality theory that continuous problems do.

A typical application of Benders decomposition to integer programming starts with a problem of the form\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}_+\\ & y & \in & \mathbb{R}^{n}_+ \end{array} \]This decomposes into a master problem\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & z\\ \textrm{subject to} & Ax & & & \ge & a\\ & h'x & & & \ge & h_0 & \forall (h,h_0)\in \mathcal{F}\\ & h'x & + & z & \ge & h_0 & \forall (h, h_0)\in \mathcal{O}\\ & x & \in & \mathbb{Z}^{m}_+ \\ & z & \ge & 0 \end{array} \]and a subproblem\[ \begin{array}{lrclcc} \textrm{minimize} &  c_{2}'y\\ \textrm{subject to} & By & \ge & b\\ & Ey & \ge & d - Dx\\ & y & \in & \mathbb{R}^{n}_+ \end{array} \]where $\mathcal{F}$ and $\mathcal{O}$ are  sets of coefficient vectors for "feasibility" cuts (pushing $x$ in directions that make the solution $(x,y)$ feasible) and "optimality" cuts (pushing $z$ upward so as not to underestimate $c_2'y$) respectively. The subproblem is a linear program, and its dual solution supplies the coefficient vectors $(h,h_0)$ for both types of master cuts.

So what happens if $y$ is integer-valued ($y\in\mathbb{Z}^n_+$) rather than continuous ($y\in\mathbb{R}^n_+$)? I don't have a definitive answer, but there are a few things that can be tried. The following suggestions should also work equally well (or poorly) when $y$ is a mix of integer and continuous variables.

Proceed as usual


The subproblem is now an integer program, but you can always relax it to a linear program and obtain the dual solution to the relaxation. If the current master solution $x = \hat{x}$, $z = \hat{z}$ makes the linear relaxation of the subproblem infeasible, you can be sure it also makes the actual subproblem infeasible, and thus you will get a legitimate feasibility cut. If the subproblem is feasible, the dual to the relaxation will produce a cut that forces $z$ to be at least as great as the objective value of the relaxation, which is a legitimate lower bound for the actual subproblem objective value.

The news here is not all good, though. It is possible that $\hat{x}$ makes the subproblem integer-infeasible but with a feasible relaxation, in which case you will not get the feasibility cut you need. If the subproblem is feasible (let's say with optimal solution $\hat{y}$) but $\hat{z}$ underestimates its objective value $c_2'\hat{y}$, you want an optimality cut that forces $z\ge c_2'\hat{y}$ when $x=\hat{x}$; but the cut you get forces $z\ge w$ where $w$ is a lower bound for $c_2'\hat{y}$, and so there is the possibility that $c_2'\hat{y} > \hat{z} \ge w$ and the optimality cut accomplishes nothing.

"No good" constraints for infeasibility


Suppose that $x$ consists exclusively of binary variables. (General integer variables can always be converted to binary variables, although it's not clear that the conversion is in general desirable.) We can exclude a particular solution $x=\hat{x}$ with a "no good" constraint that forces at least one of the variables to change value:\[\sum_{i : \hat{x}_i=0} x_i + \sum_{i : \hat{x}_i = 1} (1-x_i)\ge 1.\]This gives us another option for feasibility cuts. Solve the subproblem as an IP (without relaxation); if the subproblem is infeasible, add a "no good" cut to the master problem. Note that "no good" cuts are generally not as deep as regular Benders feasibility cuts -- the latter may cut off multiple integer solutions to the master problem, whereas a "no good" cut only cuts off a single solution.
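
Here is a sketch of building that cut with the CPLEX Java API, where x holds the binary master variables and xhat their values in the solution being excluded (the method is mine):

static IloRange noGoodCut(IloCplex master, IloIntVar[] x, double[] xhat)
    throws IloException {
  IloLinearNumExpr lhs = master.linearNumExpr();
  double rhs = 1.0;
  for (int i = 0; i < x.length; i++) {
    if (xhat[i] > 0.5) {        // xhat_i = 1: contribute (1 - x_i)
      lhs.addTerm(-1.0, x[i]);
      rhs -= 1.0;               // move the constant 1 to the right-hand side
    } else {                    // xhat_i = 0: contribute x_i
      lhs.addTerm(1.0, x[i]);
    }
  }
  return master.ge(lhs, rhs);   // at least one variable must flip
}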

If a "no good" cut eliminates just one solution, is it worth the bother? After all, the node that produced $x=\hat{x}$ will be pruned once we realize the subproblem is infeasible. The answer depends on a combination of factors (and essentially reduces to "try it and see"). First, if $x=\hat{x}$ was produced by a heuristic, rather than as the integer-feasible solution to the node LP problem, then you likely cannot prune the current node (and, in fact, the node you would want to prune may be elsewhere in the tree). Adding the "no good" cut may prevent your ever visiting that node, and at minimum will result in the node being pruned as soon as you visit it, without having to solve the subproblem there. Second, if your master problem suffers from symmetry, the same solution $x=\hat{x}$ may lurk in more than one part of the search tree. The "no good" cut prevents your tripping over it multiple times.

It may be possible to strengthen the cut a bit. Suppose that $\hat{x}$ renders the subproblem infeasible (as an IP). There are various ways to identify a subset of the subproblem (excluding the objective function) that causes infeasibility. CPLEX can do this with its conflict refiner; other solvers may have similar functionality. Let $N$ be the set of indices of the $x$ variables and $N_0$ the set of indices of all $x$ variables that appear in the right hand side of at least one subproblem constraint identified as part of the conflict. If we are lucky, $N_0$ is a proper subset of $N$. We can form a "no good" cut for the master problem using just the variables $x_i, i\in N_0$, rather than all the $x$ variables, and obtain a somewhat deeper cut (one that potentially cuts off multiple master problem solutions). The caveat here is that running something like the CPLEX conflict refiner, after determining that the subproblem is infeasible, may eat up a fair bit of CPU time for questionable reward.

"No good" constraints for optimality


It may be possible to exploit the technique I just described to create ersatz optimality constraints as well. Suppose that the current incumbent solution is $(\tilde{x}, \tilde{y})$, and that some node gives an integer-feasible solution $(\hat{x},\hat{z})$ for the master problem. It must be the case that\[c_1'\hat{x}+\hat{z}<c_1'\tilde{x}+c_2'\tilde{y},\]else the master problem node would be pruned based on bound. Now suppose we pass $\hat{x}$ to the IP subproblem and obtain an optimal solution $y=\hat{y}$. If $c_1'\hat{x}+c_2'\hat{y}<c_1'\tilde{x}+c_2'\tilde{y}$, we have a new incumbent solution. If not, then $x=\hat{x}$ cannot lead to an improved solution, and we can add a "no good" cut to eliminate it (again recognizing that this is a weak constraint).

More?


That pretty much exhausts my quiver. If any readers have other ideas for generating Benders cuts from integer subproblems, I invite you to post them in comments.

Sunday, September 23, 2012

Separable Benders Decomposition

A reader (I have readers??) asked me a question about Benders decomposition of a mixed-integer program (MIP) when the linear program (LP) subproblem is decomposable: how do we glue together a mix of optimality and feasibility cuts from the subproblems? The short answer is that we don't.

Rather than repeat a lot of definitions, I'll just refer you to a previous post for some background (if you need it; chances are you don't, because if you're not familiar with Benders, you've already tuned out). I will repeat the formulation of a generic decomposable MIP, just to establish notation:\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}\\ & y & \in & \mathbb{R}^{n} \end{array} \]The MIP decomposes into a master MIP (containing \(x\) and a real variable \(z\) that serves as a surrogate for \(c_{2}'y\)) and a subproblem (LP) containing \(y\).

Now suppose that \(B\) and \(E\) are both block-diagonal, with the same block structure and with \(K\) blocks. We can decompose the subproblem into \(K\) smaller LPs. In the master problem, we replace \(z\) with \(z_1+\dots+z_K\), with \(z_k\) the surrogate for the objective contribution \(c_{2k}'y_k \) from the \(k\)-th subproblem.

When we obtain an incumbent solution \((\hat{x},\hat{z}_1,\dots,\hat{z}_K)\) in the master, we pass \(\hat{x}\) to each subproblem and solve. Some subproblems may be infeasible (generating feasibility cuts); others may have optimal solutions with \(c_{2k}'y_k\gt\hat{z}_k \) (in which case we generate an optimality cut); and some may have optimal solutions with \(c_{2k}'y_k=\hat{z}_k \), requiring no action. If any subproblem is infeasible, then \(\hat{x}\) is infeasible in the original problem regardless of what goes on in other subproblems, and the feasibility cut from the subproblem is a valid feasibility cut in the master. Similarly, if \(\hat{z}_k\) underestimates the objective value in subproblem \(k\), then the optimality cut generated there is valid in the master regardless of what happens in other subproblems. So each feasibility/optimality cut generated in a subproblem can be added to the master problem without modification. There is no need to combine cuts (and, in fact, combining cuts might weaken them).

I'll close with two comments. First, although Benders is usually taught as "solve master, solve subproblem, add one cut, repeat", adding multiple cuts during one iteration is certainly legitimate and possibly beneficial (assuming, as in this case, that you discover multiple valid cuts). The cuts need not all be a single type (optimality or feasibility); a mix is perfectly fine. Second, while it is fine to solve every subproblem at every iteration, you may also opt to solve the subproblems in some order (perhaps cyclical?), abort the cycle the first time a subproblem generates a cut, and add just that one cut. This speeds up each cycle but delays finding cuts. I really do not know which way is better, but I suspect that optimality cuts generated from a value of \(\hat{x}\) that is found to be infeasible in another subproblem may not be all that useful, and multiple feasibility cuts lopping off the same \(\hat{x}\) might be a bit redundant. So I would be inclined to abort the cycle the first time I got a feasibility cut, and add just that cut, but add all the optimality cuts I found if no feasibility cut was generated.
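
In pseudo-Java (Block and Cut are hypothetical wrappers for the block LPs and the cuts they generate; nothing here is CPLEX-specific), one pass of that strategy looks like this:

import java.util.ArrayList;
import java.util.List;

interface Cut { }                        // marker for a feasibility/optimality cut
interface Block {
  void update(double[] xhat);            // plug the master solution into block k
  boolean solve();                       // false = block is infeasible
  double objValue();                     // optimal value c_{2k}' y_k
  Cut feasibilityCut();
  Cut optimalityCut();
}

class Separator {
  static final double TOL = 1e-6;
  // one pass over the K blocks for a master solution (xhat, zhat)
  static List<Cut> separate(List<Block> blocks, double[] xhat, double[] zhat) {
    List<Cut> cuts = new ArrayList<>();
    for (int k = 0; k < blocks.size(); k++) {
      Block sub = blocks.get(k);
      sub.update(xhat);
      if (!sub.solve()) {                // abort the cycle: one feasibility cut suffices
        cuts.clear();
        cuts.add(sub.feasibilityCut());
        return cuts;
      }
      if (sub.objValue() > zhat[k] + TOL) {
        cuts.add(sub.optimalityCut());   // zhat_k underestimates block k
      }
    }
    return cuts;                         // empty: accept (xhat, zhat) as incumbent
  }
}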

Thursday, August 9, 2012

User Cuts versus Lazy Constraints

After an e-mail exchange with a contact at IBM, I recently gained a bit of clarity (I hope) about the distinction between user cuts and lazy constraints in CPLEX, which I will share here. The terminology and some of the details in this post are specific to CPLEX, but I suspect that a number of other solvers will have something analogous to user cuts and lazy constraints, albeit perhaps under different names. If the terminology is familiar to you, you can skip to the end of the post and find the one potentially useful nugget of information it contains.

Let me start with an illustration of the feasible region of a hypothetical two variable pure integer linear program.
integer program feasible region, showing LP and IP hulls
The polygon bounded in black (including portions of the two axes) is the feasible region of the linear program relaxation (also known as the linear hull or LP hull). This is what you get if you relax the integrality restrictions while enforcing all functional constraints and variable bounds. The blue dots are the integer lattice points inside the polygon, i.e., the feasible solutions that meet the integrality restrictions. The blue polygon, the integer hull or IP hull, is the smallest convex set containing all integer-feasible solutions. The IP hull is always a subset of the LP hull. Assuming a linear objective function, one of the vertices of the IP hull (all of which are lattice points) will be the optimal solution to the problem.

User Cuts


In CPLEX parlance, a cut is a constraint that is not part of the original model and does not eliminate any feasible integer solutions. Cuts are typically added for one of two reasons: to separate a node in the search tree into two nodes representing smaller subproblems; or to strengthen bounds by paring away some of the fat in the LP hull (the "fat" being the set difference between the LP hull and the IP hull). CPLEX automatically adds a variety of cuts, of which the most famous is arguably the Gomory cut. A user cut is simply a cut added by the user, rather than by CPLEX, frequently but not necessarily inside a callback. The next figure illustrates a user cut (red dashed line).
LP and IP hulls with a user cut added
The shaded triangle shows the portion of the original LP hull that is removed by the user cut. Note that no integer solutions were harmed in the construction of this diagram.

Lazy Constraints


In contrast to user cuts, lazy constraints are allowed to pare away integer-feasible solutions. The next figure shows an example of a lazy constraint (green line).
LP and IP hulls with a lazy constraint added
The shaded polygon is again the portion of the LP (and, in part, IP) hull removed by the constraint. Note that, this time, three integer points are being rendered infeasible. The implication of using lazy constraints is that the problem originally specified (first figure) is actually a relaxation of the true problem; adding the lazy constraints to that initial problem gives you the problem you really want to solve.

Why use lazy constraints? Three reasons come to mind. First, a full specification of the model may involve an extremely large number of constraints, many of which you anticipate will be redundant, or at least not binding near the optimal solution. If you knew in advance which of those constraints were redundant, you could simply omit them, but frequently you cannot identify redundant constraints at the outset. Many modern solvers allow you to put a portion of the constraints (what we are calling the "lazy" ones) in a pool, where they are initially not a part of the active model. As solutions are generated, the solver checks to see if any lazy constraints are violated and, if so, adds them to the active set. Lazy constraints that were previously added but have not been binding for a while may be returned to the pool. This is an attempt to reduce the computation time per iteration, and possibly the memory footprint of the problem.

A second and closely related reason to use lazy constraints is that there may in fact be so many constraints that you cannot afford the time or memory to generate them all at the outset. One approach to formulating the Traveling Salesman Problem as an integer program requires the use of subtour elimination constraints. The number of such constraints grows exponentially with the number of nodes in the network. While they are easy to specify in a computer program, for decent size networks you simply cannot afford to list them all. An alternative is to treat them as lazy constraints and generate them on the fly, through a callback.
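
To make the count concrete: in the classic Dantzig-Fulkerson-Johnson formulation (a standard result, not anything solver-specific), every nonempty proper subset $S$ of the cities gets its own constraint\[ \sum_{i\in S}\sum_{j\in S,\, j\neq i}x_{ij}\le\left|S\right|-1, \]which rules out a tour visiting only the cities in $S$; with $n$ cities, that is on the order of $2^n$ constraints.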

The third reason to use lazy constraints, specifically in a callback, is that you may not be able to identify them at the time the model is specified. The feasibility and optimality cuts generated during Benders decomposition fall into this category. You discover them by solving one or more subproblems at certain points in the search for the solution to the master problem.

Why Two Categories?


We've observed that the distinction between user cuts and lazy constraints is fundamentally whether or not they can cut off otherwise feasible solutions. The reason we care about that is somewhat arcane. CPLEX and other solvers do a variety of problem reductions, both at the outset (presolving) and on the fly during the search process, to tighten the feasible regions of node problems and expedite the solution process. Some of those reductions involve solutions to dual problems. User cuts do not affect the reductions, but if you hide some actual constraints in a pool of lazy constraints (or generate them after the reductions have been done), the dual solutions used in the reductions are incomplete (lacking dual variables corresponding to the constraints that are not yet included). This can lead to incorrect results.

If CPLEX sees the user adding lazy constraints (or, by virtue of the presence of a lazy constraint callback, threatening to add lazy constraints), it automatically turns off reductions that might be incorrect. I recently ran a program that uses CPLEX (12.4) with a lazy constraint callback. The output contained the following message:

Lazy constraint(s) or lazy constraint callback is present.
    Disabling dual reductions (CPX_PARAM_REDUCE) in presolve.
    Disabling non-linear reductions (CPX_PARAM_PRELINEAR) in presolve.

As the user, you can turn off these reductions manually, but including a lazy constraint callback (or adding constraints to the lazy constraint pool during model construction) will induce CPLEX to do it for you.
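For the record, a sketch of the manual route in the Java API, assuming an existing IloCplex object; the parameter names correspond to the CPX_PARAM_REDUCE and CPX_PARAM_PRELINEAR settings shown in the log message above.

    import ilog.concert.IloException;
    import ilog.cplex.IloCplex;

    class ReductionSettings {
      // Disable the presolve reductions that conflict with lazy constraints.
      static void disableDualReductions(IloCplex cplex) throws IloException {
        // CPX_PARAM_REDUCE: 0 = none, 1 = primal only, 2 = dual only, 3 = both (default)
        cplex.setParam(IloCplex.IntParam.Reduce, 1);
        // CPX_PARAM_PRELINEAR: 0 = off, 1 = on (default)
        cplex.setParam(IloCplex.IntParam.PreLinear, 0);
      }
    }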

So What Got Clarified?


A few versions ago, IBM refactored their implementation of cut callbacks. Of particular interest are the conditions under which cut callbacks are called.

  • User cuts may be checked and applied at any time during the optimization process. This includes the cut generation phase at each node. They are not guaranteed to be checked at the time a candidate solution (potential new incumbent) is found, however. Calling a constraint a user cut tells CPLEX that it cannot cut off a feasible solution, so there is no need to test it each time a feasible solution is found.
  • Lazy constraints, on the other hand, are guaranteed to be checked each time CPLEX finds a candidate solution, regardless of how the candidate is found (node LP solution, rounding, various heuristics, inferred from the position of the planets ...). This is important, because a lazy constraint might make the candidate solution infeasible in the full problem. (There is one small exception, at least through version 12.4: a candidate solution injected by the user through a heuristic callback is not checked against the lazy constraints. The user is somewhat charitably assumed to know what she or he is doing, so a user-injected solution is assumed to satisfy all lazy constraints.) What changed in the refactoring of the cut callback interface is that lazy constraints are checked only when a candidate solution is found. They are not routinely checked at every node.
I mainly use lazy constraints in Benders decomposition, and I generate Benders cuts only when a new incumbent is proposed, so this is fine with me. Discussions on various CPLEX and OR forums tell me that other people doing Benders sometimes want to check for possible Benders cuts at every node. There may be other scenarios unknown to me where users want to check lazy constraints at each node. Before the refactoring, this was possible, but the new rule that lazy constraint callbacks are called only when potential incumbents are identified seems to throw up an obstacle. This is compounded by the fact that the current CPLEX documentation threatens the user with exile to the Phantom Zone should she or he dare lie and call a lazy constraint a user cut.

So here, at last, is the clarification. It turns out that it is legal to add a lazy constraint in a user cut callback (gasp!) as long as the user includes a lazy constraint callback or otherwise ensures that the necessary reductions are turned off. Since user cut callbacks are not called when candidate incumbents are found, at least for Benders decomposition this would suggest adding the same cut generation logic in both a lazy constraint callback (mandatory) and a user cut callback (optional).

I'm told that the next (?) release of official documentation for CPLEX will include this clarification. To close, I'll note that if you intend to sneak lazy constraints into a user cut callback, the safest way to do this may be to include a lazy constraint callback, even if it does nothing (just returns with no action). That way, CPLEX will automatically turn off everything that needs to be turned off. As I said, you can use parameter settings to turn off reductions manually, but which reductions will need to be turned off could potentially change in a new version. I, for one, am too lazy (pun deliberate) to keep up with that stuff.
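In the Java API, such a do-nothing callback is about as short as callbacks get; a sketch (the class name is my invention):

    import ilog.concert.IloException;
    import ilog.cplex.IloCplex;

    class NoOpLazyCallback extends IloCplex.LazyConstraintCallback {
      // Deliberately empty: its mere presence tells CPLEX to disable the
      // problematic dual and nonlinear reductions automatically.
      @Override
      protected void main() throws IloException {
        // no action
      }
    }
    // Attach it with: cplex.use(new NoOpLazyCallback());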

Friday, July 27, 2012

Benders Decomposition in CPLEX

Author's note: This post is getting a bit long in the tooth. For users with CPLEX 12.7 or later, you might want to read this more recent post (after reading this one).

I finally got around to constructing a reasonably small example of Benders decomposition for solving a mixed-integer linear program (MIP) in CPLEX. What I am about to describe works in CPLEX 12.4 (possibly 12.2 or 12.3) and hopefully will work in the future (since I have no ambition to revise the post). The significance of the CPLEX version is twofold: IBM refactored the API for cuts, emphasizing the distinction between "user cuts" (cuts that tighten bounds without eliminating integer-feasible solutions) and "lazy constraints" (cuts that eliminate feasible solutions); and they decided that lazy constraint callbacks would be called whenever CPLEX found an incumbent, regardless of when and how it found the incumbent (node LP solution, application of heuristics, lucky guess ...), which eliminated the need for incumbent and branch callbacks. If you are still using an earlier version of CPLEX, I strongly recommend you upgrade. Among various reasons, you will need a bunch more code to pull off Benders correctly with an old version.

My test problem is simple: a transportation problem with fixed charges for the use of supply sources. (In the code, I refer to sources as "warehouses" and sinks as "customers"). The fixed-charge transportation problem is well known and rather easy to solve -- too easy to warrant Benders decomposition, in fact. I chose it because it is sufficiently straightforward that it should not interfere with understanding how the code works. (Subsequent to writing the first draft of this post, I discovered a nice document by Erwin Kalvelagen that demonstrates Benders decomposition in GAMS. He too used a fixed-charge transportation problem, but in his version there is a fixed charge for each arc used, rather than for each source.)

The mathematical model is as follows: \begin{gather*} \textrm{minimize }c'x+f'y\\ \textrm{s.t. }\sum_{i}x_{ij}\ge d_{j}\quad\forall j\\ \sum_{j}x_{ij}\le s_{i}y_{i}\quad\forall i\\ x\ge0\\ y_{i}\in\{0,1\}\quad\forall i. \end{gather*} Continuous variable $x_{ij}$ represents the flow from source $i$ to sink $j$; binary variable $y_{i}$ signals whether source $i$ is used (1) or not (0). Parameters $c\ge0$ and $f\ge0$ are unit flow costs and facility use charges respectively, while $s$ and $d$ are supplies (source capacities) and demands.

My Benders decomposition puts the facility decisions in a master problem (MIP) and the flow decisions in a subproblem (linear program). The subproblem is just \begin{gather*} \textrm{minimize }z=c'x\\ \textrm{s.t. }\sum_{i}x_{ij}\ge d_{j}\quad\forall j\\ \sum_{j}x_{ij}\le s_{i}y_{i}\quad\forall i\\ x\ge0 \end{gather*} in which the $y_{i}$ are treated as constants. The master problem is \begin{gather*} \textrm{minimize }f'y+\hat{z}\\ \textrm{s.t. }a^{(k)\prime}y-\hat{z}\le b^{(k)}\quad\forall k\in\mathcal{O}\\ a^{(k)\prime}y\le b^{(k)}\quad\forall k\in\mathcal{F}\\ y_{i}\in\{0,1\}\quad\forall i\\ \hat{z}\ge0 \end{gather*} where $\hat{z}$ is a surrogate for the flow cost component ($z$) of the original objective function (i.e., the value of the subproblem objective), $a^{(k)}$ and $b^{(k)}$ are coefficient vector and constant term for a cut generated during the solution of the problem, $\mathcal{O}$ is the index set of "optimality" cuts (cuts that attempt to correct the master underestimating the true flow costs for a given set of decisions $y$, i.e., $\hat{z}<z$) and $\mathcal{F}$ is the index set of "feasibility" cuts (cuts that eliminate use decisions $y$ that provide too little capacity to satisfy demand). Initially $\mathcal{O}=\emptyset=\mathcal{F}$, and the first master problem solution is both trivial ($y=0$, $\hat{z}=0$) and infeasible in the original model.
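For concreteness, here is a sketch of the initial master problem in the Java API, under obvious assumptions: f holds the fixed charges, and no cuts are added here because they arrive later, via the callback.

    import ilog.concert.IloException;
    import ilog.concert.IloNumVar;
    import ilog.cplex.IloCplex;

    class MasterSketch {
      // Initial master problem: min f'y + zhat, with no cuts yet.
      static IloCplex buildMaster(double[] f) throws IloException {
        IloCplex master = new IloCplex();
        IloNumVar[] use = master.boolVarArray(f.length);                    // y
        IloNumVar flowCost = master.numVar(0.0, Double.MAX_VALUE, "zhat");  // surrogate for z
        master.addMinimize(master.sum(master.scalProd(f, use), flowCost));
        // Optimality and feasibility cuts arrive later, via the callback.
        return master;
      }
    }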

To generate cuts, we will need to work indirectly with the dual of the subproblem: \begin{gather*} \textrm{maximize }z=d'\lambda+\sum_{i}(s_{i}\mu_{i})y_{i}\\ \textrm{s.t. }\lambda_{j}+\mu_{i}\le c_{ij}\quad\forall i,j\\ \lambda\ge0\\ \mu\le0 \end{gather*} where $\lambda$ is the vector of shadow prices of the demand constraints and $\mu$ the vector of shadow prices of the supply constraints. Given the assumption that all flow costs are nonnegative ($c\ge0$), the dual is obviously feasible ($\lambda=0=\mu$ works). If, for a given choice of $y$, the LP subproblem has an optimal solution, then so will the dual (with equal objective value $z$). Should the master problem generate a solution $(y,\hat{z})$ for which the subproblem is feasible with optimal value $z=\hat{z}$, we accept that solution as the new incumbent. If, instead, the subproblem is feasible but $\hat{z}<z$, then we add a new optimality cut $\hat{z}\ge d'\lambda+\sum_{i}(s_{i}\mu_{i})y_{i}$ where $(\lambda,\mu)$ is the dual solution to the subproblem (so that $a_{i}^{(k)}=s_{i}\mu_{i},b^{(k)}=-d'\lambda$).

Should the master problem generate a solution $(y,\hat{z})$ for which the subproblem is infeasible, then the dual will be unbounded, and there will be a dual solution $(\lambda,\mu)$ such that $t(\lambda,\mu)$ is feasible in the dual for all $t>0$ and $d'\lambda+\sum_{i}(s_{i}\mu_{i})y_{i}>0$. Note that the master problem variables ($y$) have no role in defining the dual problem's feasible region; in the dual, they occur only in the objective. So the dual ray we just obtained will remain a recession direction for the dual regardless of $y$, and to get a feasible subproblem we need to modify $y$ so that $d'\lambda+\sum_{i}(s_{i}\mu_{i})y_{i}\le0$ (i.e., so that the ray now points in an undesirable direction). This gives us our feasibility cut ($a_{i}^{(k)}=s_{i}\mu_{i},b^{(k)}=-d'\lambda$). Observe that optimality and feasibility cuts have the same constant term and coefficient vector; the only difference is the inclusion of $\hat{z}$ in optimality cuts and its exclusion in feasibility cuts.

On to the code. Let me start with my usual disclaimer: I am not a professional coder, and I make no claim that my code is particularly efficient, nor that I took the fastest/most straightforward route to the solution. As usual, the code is in Java. For the most part, conversion to other APIs that expose the callback functionality should be straightforward, but there are a few wrinkles to note.
  • If you code the comparisons $\hat{z}<z$ and $z=\hat{z}$ literally, the gods of computing will wait until your back is turned and then bite you in your derrière. There will be rounding errors in both $z$ and $\hat{z}$, so asking for exact equality is inviting disaster. Insert some rounding tolerance into your comparisons. In my code, look for a constant named FUZZ (it also appears in the sketch following this list).
  • When the LP subproblem is feasible, we can get the dual values directly with a function named getDuals. When the LP is infeasible, it is tempting to use the getRay function to get the dual ray, but to do that you would have to formulate the dual itself, solve it directly, and then invoke getRay on the dual. It is easier to ask for a Farkas certificate, which provides the ray without further ado.
  • I found it necessary to turn off presolving of the subproblem. If the presolver detects that the subproblem is infeasible, it reports that fact with no option to get a Farkas certificate. So I sacrifice whatever speed advantage the presolver confers in order to have access to the Farkas certificate when the subproblem is infeasible.
  • Update 11/30/16: I recently tripped over another thing that I should have realized but somehow overlooked. In order to access the Farkas certificate, the LP subproblem must be solved by one of the simplex solvers (primal, dual or network). I'd been leaving the choice up to CPLEX, and on one instance it used the barrier method, decided the LP was infeasible, and then threw an exception when the Farkas certificate was requested. There's also a glitch in the CPLEX documentation for the dualFarkas method. It turns out that dualFarkas only works if the LP was solved by the dual simplex algorithm. So the key is to set the RootAlg parameter for the LP to 2 (dual simplex).
  • Forming a cut (either optimality or feasibility) involves multiplying the right-hand side of a subproblem constraint, expressed as a linear function of the master variables, by the corresponding dual value. In this particular example, the right-hand side always has a single term, either a constant ($d_{j}$) in a demand constraint or a multiple of a single master variable ($s_{i}y_{i}$) in a supply constraint.
    • More generally, though, each right-hand side could contain multiple terms. I deliberately over-engineered my code a bit so that it will generalize to other problems. What I do in the code is form and store an instance of IloNumExpr for each right-hand side, at the time that I am creating the subproblem. Inside the lazy constraint callback, I multiply those expressions by the corresponding dual values and sum to get the necessary expression ($a^{(k)\prime}y-b^{(k)}$) for the new cut. Somewhat counterintuitively, and for rather arcane reasons, this cannot be done with IloLinearNumExpr, even though the right-hand sides are in fact linear.
    • Since any master variable can occur in multiple subproblem right-hand sides, the method I just described for assembling the cut expression will typically result in multiple terms involving the same variable ($\dots+\alpha_{3}y_{7}+\dots+\alpha_{12}y_{7}+\dots+\alpha_{23}y_{7}+\dots$). Not to worry: when you actually add the cut, CPLEX will combine like terms ($\dots+(\alpha_{3}+\alpha_{12}+\alpha_{23})y_{7}+\dots$) for you.
  • Interpreting the Farkas certificate is a bit tricky, since CPLEX returns the dual values in an arbitrary order (see my earlier post "Infeasible LPs and Farkas Certificates"). To deal with this, I store the subproblem constraints in two arrays (one for supply constraints, one for demand constraints), and then use a HashMap to map each constraint to the corresponding right-hand side expression.
    • The C++ Standard Template Library has some analogous container class you can use (don't ask me which; I'm blissfully C++-free and mean to stay that way). With Python, you would create a dict. With other APIs, you are on your own.
    • The reason I use two arrays, rather than storing all constraints in a single array, is that it facilitates updating right-hand sides in the subproblem each time the callback is invoked. I need to recompute the RHS of each supply constraint ($s_{i}y_{i}$, using the new value of $y_{i}$), but not the RHS of each demand constraint ($d_{j}$). Having the supply and demand constraints separate makes this a bit easier, but in a more general application (where most or all constraints require an update), you may want to use more than two arrays, or just a single array.
  • Once the master problem has been solved, you have the optimal set of warehouse use decisions ($y$). There's a fairly good chance you will also want to know the optimal flows ($x$). The subproblem may well have been solved more recently than when the callback accepted what proved to be the optimal $y$; after that node, CPLEX may have generated some incumbents that your callback tested and rejected. So the most recent subproblem solution may not correspond to the optimal $y$. One resolution is to solve the subproblem one last time, using the optimal $y$. The other (which I use in the code) is to store the subproblem solution any time an incumbent is accepted, overwriting any previously stored solution. When the master is solved, the stored subproblem solution will be the correct flows.
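To pull several of those wrinkles together, here is a compressed and hypothetical sketch of the heart of the lazy constraint callback; it is not the actual downloadable code, just an illustration of the flow. It assumes fields initialized elsewhere: master is the master model (on which the callback runs), sub is the LP subproblem, supply and use hold the capacities $s_i$ and the master variables $y_i$, supplyRanges and demandRanges store the subproblem constraints, rhsExpr maps each constraint to its right-hand side written in master variables, and flowCost is the surrogate $\hat{z}$.

    import java.util.HashMap;
    import ilog.concert.IloConstraint;
    import ilog.concert.IloException;
    import ilog.concert.IloNumExpr;
    import ilog.concert.IloNumVar;
    import ilog.concert.IloRange;
    import ilog.cplex.IloCplex;

    class BendersCallback extends IloCplex.LazyConstraintCallback {
      static final double FUZZ = 1.0e-7;          // rounding tolerance
      IloCplex master;                            // master model (builds cut expressions)
      IloCplex sub;                               // LP subproblem
      IloNumVar[] use;                            // master variables y
      IloNumVar flowCost;                         // surrogate zhat
      double[] supply;                            // capacities s
      IloRange[] supplyRanges, demandRanges;      // stored subproblem constraints
      HashMap<IloRange, IloNumExpr> rhsExpr;      // constraint -> RHS in master variables

      @Override
      protected void main() throws IloException {
        // Update each supply constraint's RHS to s_i * y_i for the candidate y.
        for (int i = 0; i < supplyRanges.length; i++) {
          supplyRanges[i].setUB(supply[i] * getValue(use[i]));
        }
        if (sub.solve()) {
          double z = sub.getObjValue();
          if (z <= getValue(flowCost) + FUZZ) {
            // (the real code stores the subproblem solution here, per the last bullet)
            return;                               // candidate is acceptable
          }
          // Optimality cut: zhat >= sum over constraints of dual value * RHS expression.
          double[] lambda = sub.getDuals(demandRanges);
          double[] mu = sub.getDuals(supplyRanges);
          IloNumExpr cut = master.constant(0.0);
          for (int j = 0; j < demandRanges.length; j++) {
            cut = master.sum(cut, master.prod(lambda[j], rhsExpr.get(demandRanges[j])));
          }
          for (int i = 0; i < supplyRanges.length; i++) {
            cut = master.sum(cut, master.prod(mu[i], rhsExpr.get(supplyRanges[i])));
          }
          add(master.le(master.diff(cut, flowCost), 0.0));
        } else {
          // Infeasible subproblem (a compressed sketch: the real code should check
          // the status): the Farkas certificate supplies the dual ray. The
          // constraints come back in arbitrary order, hence the HashMap.
          int m = supplyRanges.length + demandRanges.length;
          IloConstraint[] cons = new IloConstraint[m];
          double[] ray = new double[m];
          sub.dualFarkas(cons, ray);
          IloNumExpr cut = master.constant(0.0);
          for (int k = 0; k < m; k++) {
            cut = master.sum(cut, master.prod(ray[k], rhsExpr.get(cons[k])));
          }
          add(master.le(cut, 0.0));               // feasibility cut
        }
      }
    }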
That's it. To avoid inducing comas among readers, I will not go through the code line by line here. Instead, I will just provide this link for anyone who wants to download it. [UPDATE: A newer version of CPLEX broke the code. I have created a repository for the updated code; please see this more recent post for details.]

Sunday, October 9, 2011

Benders Decomposition Then and Now

A couple of years ago, a coauthor and I had a paper under review at a prestigious journal that shall remain nameless. In the paper we solved a mixed integer linear program (MILP) using Benders decomposition, with CPLEX as the solver, and we employed callbacks to check each potential solution to the master problem as it occurred. One of the reviewers asked me to justify the use of callbacks. My first inclination was a variation on “because it's standard practice”, but since I have not actually surveyed practitioners to confirm that it is standard (and because, as an academic, I'm naturally loquacious), I gave a longer explanation. What brought this to mind is a recent email exchange in which someone else asked me why I used callbacks with Benders decomposition. It occurred to me that, assuming I'm correct about it being the current norm, this may be a case of textbooks not having caught up to practice. So here is a quick (?) explanation.

The original paper by Jack Benders [1] dealt with both continuous and discrete optimization, linear and nonlinear. The application I have in mind here deals with problems that, without loss of generality, can be modeled as follows:
\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}\\ & y & \in & \mathbb{R}^{n} \end{array} \] 
Benders decomposition splits this into a master problem and a subproblem. The subproblem must be a linear program (LP), so all the discrete variables have to go into the master problem (a MILP). You can optionally keep some of the continuous variables in the master, but I will not do so here. You can also optionally create multiple subproblems (with disjoint subsets of $y$), but again I will not do so here. Finally, in the spirit of keeping things simple (and because unbounded models are usually a sign of a lazy modeler), I'll assume that the original problem is bounded (if feasible), which implies that both the master and subproblem will be bounded (if feasible).

The master problem is as follows:
\[ \begin{array}{lrclccc} \textrm{minimize} & c_{1}'x & + & z\\ \textrm{subject to} & Ax & & & \ge & a\\ & g'x & + & z & \ge & f & \forall (g,f)\in\mathcal{O}\\ & g'x & & & \ge & f & \forall (g,f)\in\mathcal{F}\\ & x & \in & \mathbb{Z}^{m}\\ & z & \in & \mathbb{R} \end{array}. \]Variable $z$  acts as a surrogate for the objective contribution $c_{2}'y$ of the continuous variables. Set $\mathcal{O}$ contains what are sometimes known as “optimality cuts”: cuts that correct underestimation of $c_{2}'y$ by $z$. $\mathcal{F}$ contains what are sometimes known as “feasibility cuts”: cuts that eliminate solutions $x$ for which the subproblem is infeasible. Initially $\mathcal{O}=\emptyset=\mathcal{F}$. The subproblem, for a given $x$ feasible in the master problem, is: \[ \begin{array}{lrcl} \textrm{minimize} & c_{2}'y\\ \textrm{subject to} & By & \ge & b\\ & Ey & \ge & d-Dx\\ & y & \in & \mathbb{R}^{n} \end{array} \]Optimality cuts are generated using the dual solution to a feasible subproblem; feasibility cuts are generated using an extreme ray of the dual of an infeasible subproblem. The precise details are unnecessary for this post.

Back in the '60s ('70s, '80s), solvers were essentially closed boxes: you inserted a problem, set some parameters, started them and went off to read the collected works of Jane Austen. Intervening in the algorithms was not an option. Thus the original flowchart for Benders decomposition was a simple loop: solve the master problem to optimality; use the master solution to build and solve the subproblem; if the subproblem yields a cut (optimality or feasibility), add it to the master problem and solve the master again from scratch; stop when the subproblem yields no new cut.
The modern approach, using callbacks, maintains a single search tree: each time the solver finds a candidate incumbent for the master problem, a callback solves the subproblem and either accepts the candidate or rejects it by adding a cut as a lazy constraint, after which the search simply resumes.
Obviously, the modern approach can be implemented only with a solver that supports callbacks, and it requires more advanced programming than the classical approach. Other than that, what are the advantages and disadvantages?


The main advantage of the callback approach is that it is likely to avoid considerable rework. In the original approach, each time you add a cut to the master problem, you have to solve it anew. Although the new cuts may change the structure of the solution tree (by changing the solver's branching decisions), you are probably going to spend time revisiting candidate solutions that you had already eliminated earlier. Moreover, you may actually encounter the optimal solution to the original problem and then discard it, because a superoptimal solution that is either infeasible in the original problem or has an artificially superior value of $z$ causes the true optimum to appear suboptimal. With the callback approach, you use a single search tree, never revisit a node, and never overlook a truly superior solution.

The potential disadvantage of the callback approach is that you potentially generate a cut each time a new incumbent is found, and some of those cuts quite possibly would have been avoided if you followed the classical approach (and added a cut only when an “optimal” solution was found). Modern solvers are good at carrying “lazy” constraints in a pool, rather than adding the cuts to the active constraint set, but there is still a chance that the modern approach will take longer than the classical approach. I believe (but have no data to support my belief) that the modern approach will typically prove the faster of the two.

[1] Benders, J. F., Partitioning Procedures for Solving Mixed-Variables Programming Problems, Numerische Mathematik 4 (1962), 238--252.