Thursday, July 8, 2021

Big-M versus Goals

Much has been written (including by yours truly) about the trials and tribulations of "big M" integer programming models. A common use of "big M" is to let binary variables turn constraints on or off. So, for instance, $$a^\prime x \le b + Mz\quad (1)$$with $x$ a vector of continuous variables and $z$ a binary variable is intended to enforce $a^\prime x \le b$ when $z=0$ and not enforce it when $z=1$.

Large values of $M$ can contribute to weak relaxations (leading to slow progress on the bound), "leakage" (where a value of $z$ can be close enough to 0 that the solver considers it 0 to within rounding error while making $Mz$ big enough to relax the constraint), and various numerical problems. Small values of $M$ may make the constraint binding when it is supposed to be relaxed. Still, for most people the "big M" approach is the only way they know to model certain problem features.

One alternative, at least in some cases, is to use "combinatorial Benders decomposition" [1], in which the constraints in question are either enforced or ignored in a subproblem depending on the values of $z$ in the master problem. The good news is that this eliminates any worries about choosing $M$, since no coefficients $M$ are needed. The bad news is that (a) Benders decomposition is a bit advanced for many users, (b) it may require custom programming (as opposed to making the model in a high-level language and letting the modeling environment pass it to the solver), and (c) Benders decomposition is an "outer approximation", so the best bound may be a bit leisurely in converging to the optimum.

There is another alternative available, at least with CPLEX. Recent versions of CPLEX support "goals". The user's manual does a masterful job of not actually defining what a goal is -- according to the CPLEX 20.1 manual, goals are things that "allow you to take control of the branch & cut search procedure used by IBM ILOG CPLEX to solve MIP problems". Basically, a goal is an alternative form of a constraint (I think), which rather than explicitly appearing in the constraint matrix is put in a stack of goals, passed to nodes when they are created, and somehow used to influence the creation of child nodes (I think).

The tie-in to today's topic is that one type of goal provided by CPLEX is an "or" goal, which is what it sounds like: a disjunction of two or more constraints or goals. So an alternative to writing constraint (1) with the dreaded $M$ would be to use an "or" goal $$a^\prime x \le b \mathrm{\quad OR \quad} z=1.\quad (2)$$

I was curious about how well this would work, so I tried to do a comparison between "big-M" and goal-based models for a two-group discriminant problem. The gist of the model is as follows. We have as data a sample of vectors $x_i\in \mathbb{R}^n$ from two groups. Let $G_0$ and $G_1$ denote the indices belong to the first and second groups respectively. We want to find coefficients $w\in \mathbb{R}^n$, $w_0 \in \mathbb{R}$ for a linear function $f(x) = w^\prime x + w_0$ such that $f(x) \lt 0$ predicts membership of $x$ in the first group and $f(x) \gt 0$ predicts membership in the second group.

The specific model I started with (from some research I did in my much younger days) includes one more variable $d\ge \delta$ (where $\delta$ is some small positive constant) and binary variables $z_i$ signaling whether an observation is correctly ($z_i =0$) or incorrectly ($z_i=1$) classified. Variable $d$ captures the minimum absolute score of a correctly classified observation, which in essence represents the amount of separation between (correct) scores for the two groups. If $d$ is too small, you may end up classifying observations positive or negative based on what amounts to rounding error, hence the lower bound on $d$.

The "big M" version is as follows: \begin{align*} \min\quad\sum_{i}z_{i}-\epsilon d\\ \mathrm{s.t.}\quad w^{\prime}x_{i}+w_{0}+d & \le\phantom{-}M_{i}z_{i}\quad i\in G_{0}\\ w^{\prime}x_{i}+w_{0}-d & \ge-M_{i}z_{i}\quad i\in G_{1}\\ -1\le w & \le1\\ w_{0} & \quad \mathrm{free}\\ d & \ge\delta\\ z_{i} & \in\left\{ 0,1\right\} \quad\forall i. \end{align*}The model minimizes the number of misclassifications with a secondary criterion of maximizing separation. The coefficient $\epsilon$ is chosen to keep the objective contribution small enough that the solver is not tempted to make unnecessary misclassifications just to boost the value of $d$. Putting bounds on $w$ prevents huge coefficients for the classifier (which again could result in decisions being made based on rounding error). The model has been shown to work correctly.

The goal version of the model keeps the bounds on the variables and objective function but replaces all the "big-M" constraints with disjunctions of the form $w^\prime x_i +w_0 +d \le 0$ or $z_i=1$ for $i\in G_0$ and similarly for $i\in G_1$. In other words, "classify this observation correctly or pay the price for misclassifying it". I coded both models in Java and ran a test case, expecting both to produce an optimal classifier but unsure which would be faster. There was an unpleasant surprise waiting for me: CPLEX declared the goal-based model unbounded! It was right. You can satisfy all the disjunctions by declaring all the observations misclassified ($z_i = 1$ for all $i$). That lets you choose an arbitrarily large value for $d$, large enough that $\epsilon d$ is arbitrarily bigger than the sum of the $z$ variables, making the objective arbitrarily negative.

This is not a problem with the "big M" model, because no matter how large you make $M_i$, you still have a finite bound on the left side of each constraint. The fix was to come up with a defensible upper bound for $d$ and add it to the goal model, making the goal model bounded. With that fix in place, both models arrived at optimal solutions, in what was comparable time for the one test case I have run so far.

So the takeaway here is that if you want to use disjunctions to avoid "big M", you may need to take extra care to ensure that your model is bounded.

[1] Codato, G. and Fischetti, M. Combinatorial Benders' Cuts for Mixed-Integer Linear Programming. Operations Research 54(4), 2006, 756-766.