This is about formulating and solving mixed integer programs. It has
nothing to do with political office holders, although it's about as
long-winded as one of their speeches to a camera and an empty legislative
chamber. The topic is one of those things that slips easily under
the radar, can pop up to bite you in the butt, and is blindingly obvious
in hindsight.
It's well known that, in general, getting good incumbents early is
helpful if not essential in solving mixed integer linear programs
(MILPs). (MILPs being the perverse beasts they are, virtually nothing
is always true of them, so assume everything I say carries an implicit
"in general" qualifier.) Good incumbents let the solver prune
larger chunks of the search tree earlier.
It's also common practice, in some situations, to model an "if
and only if" constraint as an "if", relying on the objective
function to enforce the "only if", perhaps through a penalty
mechanism. Let's start with a general MILP, of the form\[
\begin{array}{lc}
\textrm{minimize} & f(x)\\
\textrm{subject to:} & x\in\mathcal{X}\end{array}\]
where $f()$ is a linear objective function, $x$ is a mixture of
discrete and continuous variables, and $\mathcal{X}$ is a feasible
region defined by linear equality or (weak) inequality constraints
and integrality conditions. Now let $g_{1}()$, $g_{2}()$ and $g_{3}()$
be linear functions; let $d^{+}\ge0$, $d^{-}\ge0$ and $y\ge0$ be
continuous nonnegative variables, and let $z\in\{0,1\}$ be a binary
variable; and let $\alpha^{+}$, $\alpha^{-}$, $\beta$, $\gamma$
and $M$ be strictly positive constants. To our general MILP we add
goals $g_{1}(x)=0$, $g_{2}(x)\le0$ and $g_{3}(x)\le0$, and assess
penalties proportional to deviations in the first two and a fixed
penalty for deviations in the third. A typical way to deal with these
converts the original MILP to the following:\[
\begin{array}{lcr}
\textrm{minimize} & f(x)+\alpha^{+}d^{+}+\alpha^{-}d^{-}\\ & +\beta y+\gamma z & (1)\\
\textrm{subject to:} & x\in\mathcal{X} & (2)\\
& g_{1}(x)=d^{+}-d^{-} & (3)\\
& g_{2}(x)\le y & (4)\\
& g_{3}(x)\le Mz & (5)\end{array}\]
with $M$ (the infamous "big M") chosen sufficiently large that
we can be sure the optimal solution $x^{*}$ satisfies $g_{3}(x^{*})\le M$.
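To make the bookkeeping concrete, here is a small Python sketch of formulation (1)-(5). The linear functions $g_1,g_2,g_3$, the penalty weights, the big M, and the candidate $x$ are all invented for illustration; nothing here comes from a real model.

```python
# Hypothetical illustration of formulation (1)-(5); the g_i, the
# penalty weights, and big M below are made up for the example.
ALPHA_PLUS, ALPHA_MINUS, BETA, GAMMA, M = 2.0, 3.0, 5.0, 10.0, 100.0

def g1(x): return x[0] - x[1]        # goal: g1(x) = 0
def g2(x): return x[0] + x[1] - 4    # goal: g2(x) <= 0
def g3(x): return 2 * x[0] - 3       # goal: g3(x) <= 0

def objective(f_x, d_plus, d_minus, y, z):
    # objective (1): original cost plus the penalty terms
    return f_x + ALPHA_PLUS * d_plus + ALPHA_MINUS * d_minus + BETA * y + GAMMA * z

def feasible(x, d_plus, d_minus, y, z, tol=1e-9):
    # constraints (3)-(5); membership in X is assumed checked elsewhere
    return (abs(g1(x) - (d_plus - d_minus)) <= tol   # (3)
            and g2(x) <= y + tol                     # (4)
            and g3(x) <= M * z + tol)                # (5)

x = (3.0, 2.0)                       # a candidate with f(x) = 7, say
# tight (non-padded) settings of the penalty variables:
d_plus, d_minus = max(g1(x), 0.0), max(-g1(x), 0.0)
y = max(g2(x), 0.0)
z = 1 if g3(x) > 0 else 0
print(feasible(x, d_plus, d_minus, y, z))     # True
print(objective(7.0, d_plus, d_minus, y, z))  # 24.0
```

The last three lines show what a "tight" solution looks like: each penalty variable is set to the smallest value that keeps (3)-(5) satisfied.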
Constraints like (3) are common in goal programming. Penalizing overtime
or tardiness in production and scheduling problems usually leads to
constraints of the form (4). Problems with fixed costs, such as the
bin packing problem or the lock box problem, often result in constraints
of the form (5). The first critical observation is that, while it
is not feasible to violate one of those constraints and "underpay"
in the objective, it is feasible to "overpay". Suppose that
$(\hat{x},\hat{d}^{+},\hat{d}^{-},\hat{y},\hat{z})$ is an integer-feasible
solution to the problem with objective value $\hat{f}$; then we can
add any positive amount $\epsilon$ to both $\hat{d}^{+}$ and $\hat{d}^{-}$,
or to $\hat{y}$, and obtain another feasible solution with objective
value $\hat{f}+(\alpha^{+}+\alpha^{-})\epsilon$ or $\hat{f}+\beta\epsilon$
respectively. If $\hat{z}=0$, then $(\hat{x},\hat{d}^{+},\hat{d}^{-},\hat{y},1)$
is also feasible with cost $\hat{f}+\gamma$. In formulating the model,
we are ordinarily not concerned with this since minimization of the
objective will ensure that \emph{the optimal solution} does not overpay.
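The overpayment mechanism is easy to see with a few invented numbers: padding both deviation variables by $\epsilon$ leaves constraint (3) satisfied, because the $\epsilon$'s cancel, while the objective rises by $(\alpha^{+}+\alpha^{-})\epsilon$.

```python
# Hypothetical numbers: padding d+ and d- by epsilon keeps constraint (3)
# satisfied but inflates the objective by (alpha+ + alpha-) * epsilon.
ALPHA_PLUS, ALPHA_MINUS, EPS = 2.0, 3.0, 0.5
g1_x = 1.0                       # value of g1 at some fixed x-hat
d_plus, d_minus = 1.0, 0.0       # tight: d+ - d- = g1(x)
padded_plus, padded_minus = d_plus + EPS, d_minus + EPS

# constraint (3) holds either way, since the epsilons cancel:
assert padded_plus - padded_minus == g1_x

tight_penalty = ALPHA_PLUS * d_plus + ALPHA_MINUS * d_minus
padded_penalty = ALPHA_PLUS * padded_plus + ALPHA_MINUS * padded_minus
print(padded_penalty - tight_penalty)   # 2.5 = (2.0 + 3.0) * 0.5
```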
Moreover, adding constraints that would ensure that $d^{+}$ and $d^{-}$
are not simultaneously positive, or that $g_{2}(x)\le0\Longrightarrow y=0$,
or that $g_{3}(x)\le0\Longrightarrow z=0$, may involve the addition of integer
variables (which raises computational expense) and may poke holes
in the feasible region. As an example of the latter, consider that
the contrapositive of $g_{3}(x)\le0\Longrightarrow z=0$ is $z=1\Longrightarrow g_{3}(x)>0$.
Since mathematical programs admit only weak inequalities (to keep
the feasible region closed), this winds up being $z=1\Longrightarrow g_{3}(x)\ge\epsilon$
for some small (but nonzero) positive $\epsilon$, and we have arbitrarily
excluded any solution that might have $0<g_{3}(x)<\epsilon$.
So common practice is to accept the fact that formulation (1)-(5)
allows overpayment in theory, and rely on the objective to prevent
overpayment in practice. What of intermediate (suboptimal) solutions,
though? In classical branch-and-bound, we have reason to be optimistic:
new incumbents are found exclusively as integer-feasible solutions
to the linear relaxation of the subproblem at some node (meaning that
some of the integer variables are fixed, and the remaining integer
variables and all continuous variables are determined by solving a
linear program). Since all the continuous variables come from that LP
solution, we do not have to worry about "padding"
in $d^{+}$, $d^{-}$ or $y$. (In fact, the simplex method itself
precludes $d^{+}$ and $d^{-}$ from simultaneously being positive:
since their columns are negatives of each other, they cannot both
be part of the basis, as that would make the basis matrix singular.)
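A two-constraint toy case (made up here) shows why the two columns cannot coexist in a basis: the column for $d^{-}$ is the negative of the column for $d^{+}$, so any square matrix containing both has determinant zero.

```python
# Toy 2x2 case: the columns for d+ and d- in constraint (3) are
# a and -a, so a "basis" containing both has determinant zero.
def det2(col1, col2):
    return col1[0] * col2[1] - col1[1] * col2[0]

a = (1.0, 2.0)                    # column for d+ in two constraints
neg_a = (-a[0], -a[1])            # column for d- (its negative)
print(det2(a, neg_a))             # 0.0 -> singular, so not both basic
```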
As to $z$, there is the possibility that $z$ is locked at 1 (due
to prior branching decisions) while $g_{3}(x)\le0$; so the incumbent
found at this node might possibly have a padded objective value. Contemporary
solvers, however, frequently include heuristics that generate potential
incumbents even when the solution to the node LP is not integer-feasible.
Those incumbents can again result in padding of the form $g_{3}(x)\le0$,
$z=1$; but, depending on the heuristic, they may also allow $g_{2}(x)\le0$
and $y>0$, or possibly even $d^{+}$ and $d^{-}$ simultaneously positive.
Should we care about this, given that the optimal solution is guaranteed
not to be padded? Yes, for two reasons. First, in practice we frequently
have to settle for good but not optimal solutions, either because
finding the optimal solution would take prohibitively long or because
\emph{proving} the optimal solution is actually optimal would take
prohibitively long. If we are going to settle for an intermediate
incumbent, it behooves us to check it for gratuitous overpayment and,
if necessary, polish it (in other words, accept the value of $x$
and compute $d^{+}$, $d^{-}$, $y$ and $z$ for ourselves). That's
not a big deal; we're really accepting the solution itself as-is and
just recomputing the objective value. The other reason for concern,
though, is not as easily dismissed. If the MILP solver generates a
new incumbent with an inflated objective value, we do not get the
full value of that incumbent in pruning nodes. The solver only prunes
nodes whose objective bound (typically the objective value of the
LP relaxation at that node) is worse than the inflated objective value
of the new incumbent; we lose the opportunity to prune nodes whose
LP bounds are better than the inflated value but worse than the actual
(polished) objective value of that solution. In fact, if the objective
value is sufficiently inflated, the solver may not consider the solution
a new incumbent at all, and we never see it, even though its true
objective value would make it useful.
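A stylized illustration of the pruning loss (the node bounds and incumbent values below are invented): in a minimization, a node can be pruned when its LP bound is no smaller than the incumbent value, so an inflated incumbent prunes fewer nodes.

```python
# Minimization: a node can be pruned when its LP bound is no better
# (i.e., no smaller) than the incumbent's objective value.
node_lp_bounds = [40.0, 47.0, 52.0, 58.0, 63.0]   # invented bounds
true_value, inflated_value = 50.0, 60.0           # polished vs padded

def prunable(bounds, incumbent):
    return [b for b in bounds if b >= incumbent]

print(len(prunable(node_lp_bounds, inflated_value)))  # 1 node pruned
print(len(prunable(node_lp_bounds, true_value)))      # 3 nodes pruned
```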
I think this problem tends to fly below the radar, in part because
we generally have no indication in the output that inflation is taking
place, and in part because we subconsciously think that the simplex
algorithm (or whatever is solving the node LPs) will protect us from
it. I recently tripped over this in a very real way, though. The only
reason I knew it was going on was that I was comparing outright solution
of a MILP model (using the Java API to CPLEX) to the performance of
a blended metaheuristic on the same problem. The code I used to solve
it with CPLEX was a branch of the metaheuristic code, and used the
metaheuristic's method of computing the objective value. Watching
my code's output with one eye and CPLEX's node log with the other,
I was unpleasantly surprised to see CPLEX announce an incumbent with
objective value $f^{*}$ and my code announce a considerably lower
objective value (in the worst cases lower by factors of two to five).
I eventually tracked the discrepancy to problems with constraints
of the form (5).
What can we do about it? We can, at least in some cases, add constraints
that preclude the padding; but those constraints inflate the size
of the problem and may slow the solver. We can use an incumbent callback
to detect the problem and a cut callback to install the necessary
additional constraints locally, so that they apply only to the subtree
descending from the node where we detected the inflation. Compared
to installing all the anti-inflation constraints up front, this makes
for smaller node problems, but it may require rediscovering some of
those constraints repeatedly. Also, because it relies on an incumbent
callback to detect that a proposed incumbent has an inflated objective
value, it does not help with cases where the inflation is enough to
prevent the solver from recognizing the solution as a new incumbent.
In my case, I was already using an incumbent callback to capture and
record incumbents as they were found. I added code to the incumbent
callback that checked for padding and, if it found any, queued the
incumbent for "polishing". Then I added a heuristic callback
that does nothing unless a solution is queued, in which case it polishes
the solution and injects it (same $x$ but $d^{+}$, $d^{-}$, $y$
and/or $z$ adjusted as necessary) as a new incumbent, which CPLEX
recognizes with the correct objective value. Like the second proposed
solution above, it will not help if the inflation makes a new solution
look worse than the current incumbent, but it is fairly streamlined
and does seem to help the solution process.
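A solver-agnostic sketch of the polishing step (the CPLEX callback plumbing is omitted, and the inputs are invented): keep $x$ as-is, recompute each penalty variable at its minimal feasible value, and report the corrected objective.

```python
# Given f(x) and g1(x), g2(x), g3(x) for an accepted incumbent's x,
# recompute the minimal feasible penalty variables and the polished
# objective. All numeric inputs here are hypothetical.
def polish(f_x, g1_x, g2_x, g3_x, a_plus, a_minus, beta, gamma):
    d_plus, d_minus = max(g1_x, 0.0), max(-g1_x, 0.0)  # tight for (3)
    y = max(g2_x, 0.0)                                 # tight for (4)
    z = 1 if g3_x > 0 else 0                           # tight for (5)
    obj = f_x + a_plus * d_plus + a_minus * d_minus + beta * y + gamma * z
    return d_plus, d_minus, y, z, obj

# A padded incumbent might report z = 1 (cost 7 + 10 = 17) even though
# g3(x) <= 0; polishing keeps the same x and drops the objective to 7.
print(polish(7.0, 0.0, -1.0, -2.0, 2.0, 3.0, 5.0, 10.0))
# -> (0.0, 0.0, 0.0, 0, 7.0)
```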