Of course, to try out an optimization modeling package you need to have something to optimize. I went back to some ancient research I did, specifically a paper in 1990 on using MIP models to choose linear (or polynomial) discriminant functions for classifying observations into one of two groups. For the sleep deprived, the full citation is:
Rubin, P. A. Heuristic solution procedures for a mixed-integer programming discriminant model. Managerial and Decision Economics, Vol. 11, No. 4, October 1990.I used Anderson's iris data for the test case (since it's conveniently available in R, through the datasets package), and just for the halibut I also threw in a support vector machine (SVM) model using the kernlab package for R. Comparing the two is a bit misleading, since SVM models are inherently nonlinear, but I just felt like doing it. In any case, the purpose is to see how OMPR works with CPLEX, not to compare MIP discriminant models and SVMs.
The details are too lengthy for a blog post, so I posted them separately in an R notebook. If you're not familiar with R notebooks, you can find details here. The web page generated by my notebook contains the source code, and there's a control in the upper right of the web page that will let you download it as a notebook (R Markdown) file. You can also grab it from my GitLab repository for the project. As with other content of this blog, it's yours to play with under a Creative Commons license.
The MIP model is as follows. We start with samples of two classes ( "positive" and "negative") containing $p$ features. Let $X_1$ be the $N_1\times p$ matrix of data for the negative sample and let $X_2$ be the $N_2\times p$ matrix of data for the positive sample. The discriminant function we wish to train is $f(x)=w'x+c$; we will classify an observation $x$ as belonging to the positive or negative class according to the sign of $f(x)$. To allow for both limited measurement accuracy of the data and the inevitable adventures with rounding error, we arbitrarily choose some constant $\delta > 0$ and declare a positive observation correctly classified if $f(x)\ge \delta$ and a negative observation correctly classified if $f(x)\le -\delta$.
To avoid the pitfall of having the solver scale $w$ up to some ridiculous magnitude in order to force borderline observations to "look" correctly classified (i.e., to get around the use of $\delta$ as a minimum magnitude for non-zero classification scores), we bound $w$ via its sup norm: $-1 \le w_j \le 1 \, \forall j\in \{1,\dots,p\}$. The constant term $c$ is unbounded and unrestricted in sign.
To detect and count misclassifications (including ambiguous classifications, $-\delta < f(x) < \delta$, we introduce binary variables $y_i,\, i\in\{1,\dots,N_1\}$ for the negative sample and $z_i,\, i\in\{1,\dots,N_2\}$ for the positive sample. Each will take value 0 if the corresponding observation is correctly classified and 1 if not. In the OMPR demo, I just count total misclassifications ($\sum_i y_i + \sum_i z_i$); in general, misclassifications can be weighted to account for prior probabilities, oversampling of one group relative to the other, or relative importance of different types of errors (e.g., false positives looking for cancer tumors are bad, but false negatives can be fatal). There is also a variable $d$ that captures the minimum absolute value of any correct classification score (i.e., how far correct scores are from the boundary value of 0). Larger values are rewarded in the objective function (using a coefficient $\epsilon > 0$).
The model also contains "big M" type constraints to define the binary variables. Formulas for selecting the values of the parameters $M_1$, $M_2$ and $\epsilon$ are given in the paper. So, finally, we get to the MIP model:
\begin{alignat*}{2} & \textrm{minimize} & \mathbf{1}^{\prime}y+\mathbf{1}^{\prime}z-\epsilon d\\ & \textrm{subject to}\quad & X_{1}w+c \cdot \mathbf{1}+d \cdot \mathbf{1}-M_{1} \cdot y & \le 0\\ & & X_{2}w+c \cdot \mathbf{1}-d \cdot \mathbf{1}+M_{2} \cdot z & \ge 0\\ & & w & \le \mathbf{1}\\ & & w & \ge -\mathbf{1}\\ & & c \quad & \textrm{free}\\ & & d \quad & \ge \delta\\ & & y, \: z \quad & \textrm{binary} \end{alignat*}
I'm using OMPR to call Symphony and have a weird circumstance. I put the Symphony solver arguments to be verbose, so I know that the problem solves to optimality because the R console shows the optimal decision variable values. Yet when I use ompr::get_solution, the solution object says that the problem is infeasible. Any idea why this may be happening?
ReplyDeleteI've only used OMPR with CPLEX (and I've never used Symphony for anything), so I don't really know. It's common for solver libraries to provide a "getter" method to get a status code of some kind (optimal, feasible but maybe not optimal, infeasible, ...). If ompr::get_solution grabs a status code first, and if the status code for some reason says infeasible, that would explain the outcome. Are you sure that what you are seeing in the console is the optimal solution (and is in fact feasible in the original model)?
DeleteDefinitely feasible, I assume it is optimal since the verbosity on the console says so (and it looks very similar to the console output when I've used Rsymphony before). Some more digging and I found that this only occurs when I have additional arguments in solve_model. When my argument is just (solver='symphony'), no issue. But whenever I try to add additional arguments to pass to symphony (verbosity, time limit, gap limit) then the returned object says infeasible. I've logged a issue with Dirk on github.
ReplyDeleteIf you're willing to do some additional coding to experiment, you might try bypassing OMPR and working directly with ROI and the Symphony plug-in for it. It's possible this is a bug in ROI (which means Dirk can't fix it).
Delete