Friday, May 31, 2013

A Priori Objective Bound

I was asked a question about mixed integer programs (MIPs) that I'm pretty sure I've seen on forums before, so I'll summarize my answer here. The question was in the context of the CPLEX solver, but the answer applies more generally.

Consider a MIP model of the form \[ \begin{array}{lrclr} \mathrm{maximize} & s\\ \mathrm{s.t.} & Ax+s & \le & b\\ & s & \ge & 0\\ & x & \in & X \end{array} \]where $X$ is some domain and $x$ may be a mix of integer and continuous variables. The questioner knows a priori that $s\le\bar{s}$ for some constant $\bar{s}$. When he adds $\bar{s}$ as an upper bound for $s$, though, his solution time roughly triples. So (a) why does that happen and (b) how can knowledge of $\bar{s}$ be exploited?

Answering the second question first, the main virtue of knowing $\bar{s}$ is that we can stop the solver as soon as it finds a solution with $s = \bar{s}$. Of course, this is only helpful if the bound is tight, i.e., if a feasible solution does exist with $s = \bar{s}$. If so, I think the best way to accomplish the desired end is to attach a callback that monitors incumbents and, when it sees one with $s = \bar{s}$, tells the solver to terminate the search.

Solvers (including CPLEX) typically incorporate a relative convergence criterion, which stops the solver if an incumbent $(x, s)$ is found with\[\frac{\hat{s}-s}{\hat{s}}\le\delta,\]where $\hat{s}$ is the current best bound and $\delta$ is a parameter. Asserting the known a priori bound, this effectively becomes\[\frac{\min(\hat{s},\bar{s})-s}{\min(\hat{s},\bar{s})}\le\delta\]with the left side of the new inequality less than or equal to the left side of the previous inequality (left to the reader as an exercise). Thus the relative convergence termination criterion might be triggered sooner (a good thing), provided that the bound $\bar{s}$ could be communicated to the solver in a benign way.

There are at least two more effects, though, that are harder to predict. Suppose that, in the absence of $\bar{s}$, the solver at some juncture is staring at a number of live nodes with bounds $s_1,s_2,\dots$ greater than $\bar{s}$ and not all identical (which is likely to be the case). When it comes time to leave whatever branch of the tree it is currently exploring, the solver chooses the node with the best bound (the largest among $\{s_1, s_2,\dots\}$). Now suppose that the solver has been informed, in some way, of the upper bound $\bar{s}$. The bounds of all those live nodes are truncated to $\{\bar{s},\bar{s},\dots\}$, meaning that they are all tied for best. The solver will select one, not necessarily the same one it would have selected without $\bar{s}$, and begin exploring it. Thus knowledge of the bound changes the order in which the solution tree is explored, and we have no way of knowing whether this moves the solver from a less productive area of the tree to a more productive area or vice versa.

CPLEX (and I assume other solvers) also employ a test to decide when to backtrack (shift exploration to nodes higher in the tree without pruning the current node). Let $s_{lp}$ denote the bound provided by the linear relaxation of the current node, let $s_{inc}$ denote the value of the current incumbent solution, and as before let $\hat{s}$ be the best bound (the largest linear relaxation bound of any live node). Backtracking occurs when$$\hat{s}-s_{lp}\ge \alpha(\hat{s}-s_{inc})$$with $\alpha\in [0,1]$ a parameter. Smaller values of $\alpha$ promote backtracking ($\alpha=0$, if allowed, would be pure best-first search), while larger values of $\alpha$ inhibit backtracking ($\alpha=1$, if allowed, would be pure depth-first search). The backtracking inequality can be rewritten as$$(1-\alpha)\hat{s}\ge s_{lp}-\alpha s_{inc}.$$Knowing $\bar{s}$ would effectively replace $\hat{s}$ with $\min(\hat{s},\bar{s})$, making the inequality harder to satisfy and thus making backtracking less likely to occur (pushing the solver closer to depth-first search). Again, this changes the way the solution tree is explored, in a way whose impact is difficult to predict.

The questioner said that his solution time roughly tripled when he added $\bar{s}$ as an upper bound. As far as I can tell, that was just bad luck; the solution time might decrease in other cases (but also might increase more). Personally, the only use I would make of $\bar{s}$ would be in an incumbent callback to terminate the search as soon as $s=\bar{s}$ occurred -- and I would only do that if I thought that $\bar{s}$ was really the optimal value (not just a tight bound).

Thursday, May 23, 2013

SQLite Studio

A high proportion of my research projects end up with my writing code that solves a set of test problem and spews results that eventually need to be analyzed. Once upon a time this mean parsing log files, transcribing data to a spreadsheet or statistics program and proceeding from there. In recent years I've taken to adding a database interface to the program, so that it records results (both intermediate and final) to a database. Add records of problem characteristics (as seen by the program) and parameter values (as passed to the program) and the database becomes a one-stop-shop for any data I need when it comes time to write the associated paper.

SQLite has proven to be a great choice for the back end, because it does not require a server. It understands a reasonably complete dialect of SQL and is more than sufficiently efficient for my purposes. Drivers are readily available for it. (I program in Java, and I've been having very good luck with SQLite JDBC.) As with pretty much everything I use, it is open-source.

I typically create the database outside my program, and once my program has done its thing I need to access the results stored in the database. There are a variety of client programs for SQLite. The best one I've found so far is SQLite Studio (open-source, available for multiple platforms). It has a very complete feature set and is remarkably easy to use. I recently had to merge data from multiple tables of one database into the corresponding tables of another one. SQLite Studio allowed me to open both databases and then do the merge with simple SQL queries; it handled the mechanics of "attaching" one database to the other silently in the background.

So I thought I should give SQLite, SQLite JDBC and especially SQLite Studio a shout-out.

Wednesday, May 15, 2013

A NetBeans Bug

I ran into a momentarily scary bug in the NetBeans 7.2 IDE as I was working on a Java program. I'm posting details here in case anyone else is doing what I did -- desperately googling for a fix.

I had just successfully run my code from the IDE and committed the changes (using Git, although I doubt that's relevant). Somewhere between doing the commit and possibly starting to edit a file -- I forget exactly where -- the IDE gave me a low memory warning. So I exited and restarted the IDE, thinking that might free up some memory. When I went to run my code (unmodified from the previous successful run), the program would not start. Instead, I was informed in the terminal window that a ClassNotFoundException had occurred. The named class was definitely there, at least in source form. I tried a "clean and build" to recompile the code, but the exception repeated. Restarting the IDE (again) did not help. Rebooting the PC did not help.

So I went off to Google to hunt for the reason for the exception, and found a variety of bug reports (from 2011 onward) that might or might not be pertinent. Fortunately, one of them contained a suggestion to turn off the Compile on Save feature. You can access this, which is on by default (at least for me), by right-clicking the project in the Projects window and clicking Properties > Build > Compiling. I turned it off, and sure enough the program would again run from the IDE. So I tried a "clean and build", verified the program ran, switched Compile on Save back on, and the bug returned.

Compile on Save does what is sounds like: it compiles source files when you save them. This saves time when you run or debug the program, so I was not entirely thrilled at having to turn it off. Before submitting a bug report, I did a quick search of the NetBeans issue tracker, and found only one bug report (again, from 2011) that mentioned both Compile on Save and ClassNotFoundException. The person who filed that report resolved his problem by deleting the NetBeans cache (which NetBeans rebuilds when next run). So I tracked down the cache (on my Linux Mint PC, it's in the folder ~/.cache/netbeans/7.2), deleted it, restarted NetBeans, turned Compile on Save back on, and happily discovered that the program once again runs.

Tuesday, May 7, 2013

Consecutive Ones

[APOLOGY: Not too long ago I added to the blog a "feature" provided by Google that hypothetically merges comments made on the blog with comments made to the Google+ post in which I announce the blog entry. I should have read the fine print more carefully. This (a) disenfranchised anyone without a Google account from commenting, (b) hid the presence of comments on the blog until you clicked the "$n$ comments" link, and (c) set $n$ to be the number of comments posted directly to the blog, not the number of comments in the merged stream. The effect of (c) was that this post was showing "0 comments" when in fact there were four or five, all on the G+ post. So I turned off the feature ... which appears to have deleted the existing comments from the G+ post (unless Google is messing with me). Google messing with me is entirely possible: they logged me out in the middle of the edit below, and when I logged back in they had lost all the changes, even though I'd done several previews, which typically has the effect of saving changes. Anyone know if Microsoft recently bought Google??]
Stripped of some unnecessary detail, the following showed up on a support forum today: given a set $x_1,\dots,x_n$ of binary variables and a parameter $K$, how can one specify in a mathematical program that (a) $\sum_{i=1}^n x_i$ is either 0 or $K$ and (b) that, in the latter case, the $K$ variables with value 1 must be consecutive?

The first part is easy. Define a new integer variable $s\in\{0,K\}$ and add the constraint $$\sum_{i=1}^n x_i = s.$$Some modeling languages (AMPL comes to mind) allow you to specify directly the domain $\{0,K\}$; in other cases, replace $s$ with $K\hat{s}$ where $\hat{s}$ is a binary variable.

A couple of methods come to mind for enforcing requirement (b). I'll assume that $1<K<n$ to avoid trivial cases. One method is to add the constraints
$$\begin{eqnarray*} x_{1} & \le & x_{2}\\ 2x_{j} & \le & x_{j-1}+x_{j+1}\quad \forall j\in\left\{ 2,\dots,n-1\right\} \\ x_{n} & \le & x_{n-1}. \end{eqnarray*}$$[END NONSENSE] 
I suffered a rather severe brain cramp when I wrote the above. Several comments pointed out that this is erroneous. The corrected formulation (which I've tested, and thus post with a modicum of possibly misplaced confidence), is:$$(K-1)(x_{i-1}-x_i)\le \sum_{j=\max(1,i-K)}^{i-2}x_j\quad \forall i\in\{2,\dots,n\}$$(with an empty sum understood to be 0).

Another is to replace binary variable $x_i$ with continuous variable $\tilde{x}_i$ and introduce a new set of binary variables $y_1,\dots,y_{n-K+1}$ where $y_j=1$ signals that $\tilde{x}_j$ starts a run of $K$ consecutive ones. In this approach, we do not need $s$; requirement (a) becomes $$\sum_{i=1}^{n-K+1}y_j \le 1.$$Requirement (b) translates to$$ \tilde{x}_{j}  =  \sum_{i=\max\left\{ 1,j-K+1\right\} }^{j}y_{j}\quad \forall j.$$With the second approach, a presolver can easily eliminate the $\tilde{x}_j$ variables, although it may be preferable to retain them. (Retaining the $\tilde{x}_j$ increases both the number of variables and number of constraints; but in cases where the original $x_j$ appear in more than one expression, eliminating the $\tilde{x}_j$ by substitution increases the density of the constraint matrix.)