Questions about a recent post containing a couple of flow charts made me regret the fact that Blogger does not support SVG graphics. Although I made the charts using the excellent TiKZ LaTeX package (which produces vector output in the form of a PDF file), I had to convert the PDF output to PNG for Blogger, and the images are a tad fuzzy (and do not scale well when zoomed), which may have led to some confusion. So I was on a mission today to find a workaround.

The first thing I discovered that SVG is not nirvana. I generated the images as separate PDF files from the TiKZ source. (Hint to people doing this: use the LaTeX standalone document class and the output file will automatically be cropped to the smallest rectangle containing the image.) I then used ImageMagick to convert the PDF files to SVG files ... which blew them up from around 54 KB to about 8+ MB. Each. Oops! I tried a tool I found to convert the SVG files to "canvas" files, which got me another 5x or so increase in size. So back to PDF files. Blogger does not consider PDFs to be images, to the plan now is to use a PNG file as the image and link it to the corresponding PDF, allowing readers to click on the image and at least get a scalable version in a new window or tab.

Blogger does not allow you to upload arbitrary files, only images in formats it recognizes (which are actually stored in Picassa, another Google product/service). I thought about putting the PDFs on a personal server somewhere, but I worry about moving things (including them) to a new server at some future date and not remembering to update the links. Enter Google Documents. I just have to upload the PDFs to Google Docs, make them public, grab the URLs and set the links on the PNGs (which by default point to the PNG files in Picassa) to point to the PDFs in Google Docs instead.

I did this with the aforementioned post, and I think it's working. The one question mark for me is whether everyone can see them. I can, but Google knows I'm the owner. If either of my loyal readers runs into problems, or wants me to convert images from earlier posts, please drop a comment here.

## Saturday, October 29, 2011

## Friday, October 21, 2011

### Time Series, Causality and Renting a Congressman

This morning I was reading an article, in a national (US) news magazine, that discussed energy policy decisions. It mentioned executives of companies in various energy-related industries contributing to the coffers of various congressmen (from both parties), and it mentioned their votes on key policy issues. The implication was clear: the votes were being bought, or at least influenced, by interested parties.

I understand why writers do this: it spices up the articles (a hint of corruption) without making any explicit charges that cannot be backed up with hard evidence; and it appeals to voters looking for confirmation of their prior belief that congressmen are crooks. I tend to view our elected representatives as by and large venal to some extent myself. The last part of the title of this post is motivated by a quote from Simon Cameron (Abraham Lincoln's first Secretary of War): "An honest politician is one who, when he is bought, will stay bought." As you can tell from the title, I am not sanguine about finding honest politicians.

As an OR professional, though, I understand the flaw in the argument (if we can characterize what is really innuendo as an "argument"). It relates in part to confusing correlation with causation and in part with confusing the direction of causation. Let's say that our fearless reporter accurately reports that Representative A votes for subsidies for domestic production of buggy whips, and Ms. B, CEO of a large manufacturer of buggy whips, donates regularly and/or generously to Rep. A's election campaigns. With minimal effort choosing words, our reporter can make this sound as if A voted for subsidies

Another possibility is that A voted for subsidies first and then B donated, recognizing A (from his vote) as a friend of the buggy whip industry. There's causation, but in the opposite direction from what the reporter implied, and nothing venal. The nature of political contributions is that you usually contribute to candidates with whom you agree. (I say

Yet another possibility is that B contributed to A first, either anticipating a pro-buggy whip platform or for reasons unrelated to industrial policy, and A favors subsidizing domestic production for sincerely held (if possibly misguided) reasons. This is a case of trying to infer causation from what may be mere correlation.

How can we tell whether A's vote is in fact being purchased? Anyone who has done much with statistics recognizes how hard it is to establish causality, especially in the absence of a designed experiment with an appropriate control group. We could try tracking A's votes over time, matching them with B's contributions (or the contributions of B and other pro-subsidy individuals or groups), and look for a positive cross-correlation with lag 1 or higher (so that swings in donations lead swings in votes). A positive cross-correlation would still not be solid proof of vote-buying, though, and might also lack "power" as a test of vote-buying: if A fits Cameron's definition of an honest politician, and consistently votes pro-buggy whip because he is taking graft, there will be no variance in his votes and therefore no cross-correlation.

This leaves me in the unsatisfying position of not knowing whether or not A is a crook. There have been enough US congressmen convicted of malfeasance to convince me that the proportion of occupants of Capitol Hill who are crooks is not zero. Still, there is also a well known aphorism (of uncertain provenance) that might apply: "Never ascribe to malice that which is adequately explained by incompetence." We have ample evidence that incompetence is not in short supply on the Hill.

I understand why writers do this: it spices up the articles (a hint of corruption) without making any explicit charges that cannot be backed up with hard evidence; and it appeals to voters looking for confirmation of their prior belief that congressmen are crooks. I tend to view our elected representatives as by and large venal to some extent myself. The last part of the title of this post is motivated by a quote from Simon Cameron (Abraham Lincoln's first Secretary of War): "An honest politician is one who, when he is bought, will stay bought." As you can tell from the title, I am not sanguine about finding honest politicians.

As an OR professional, though, I understand the flaw in the argument (if we can characterize what is really innuendo as an "argument"). It relates in part to confusing correlation with causation and in part with confusing the direction of causation. Let's say that our fearless reporter accurately reports that Representative A votes for subsidies for domestic production of buggy whips, and Ms. B, CEO of a large manufacturer of buggy whips, donates regularly and/or generously to Rep. A's election campaigns. With minimal effort choosing words, our reporter can make this sound as if A voted for subsidies

*because*B (and perhaps others of a similar mind) paid him off. That is indeed one possibility.Another possibility is that A voted for subsidies first and then B donated, recognizing A (from his vote) as a friend of the buggy whip industry. There's causation, but in the opposite direction from what the reporter implied, and nothing venal. The nature of political contributions is that you usually contribute to candidates with whom you agree. (I say

*usually*because lobbyists sometimes contribute to both sides, hoping the winner will remember their contribution to her and not to her opponent.)Yet another possibility is that B contributed to A first, either anticipating a pro-buggy whip platform or for reasons unrelated to industrial policy, and A favors subsidizing domestic production for sincerely held (if possibly misguided) reasons. This is a case of trying to infer causation from what may be mere correlation.

How can we tell whether A's vote is in fact being purchased? Anyone who has done much with statistics recognizes how hard it is to establish causality, especially in the absence of a designed experiment with an appropriate control group. We could try tracking A's votes over time, matching them with B's contributions (or the contributions of B and other pro-subsidy individuals or groups), and look for a positive cross-correlation with lag 1 or higher (so that swings in donations lead swings in votes). A positive cross-correlation would still not be solid proof of vote-buying, though, and might also lack "power" as a test of vote-buying: if A fits Cameron's definition of an honest politician, and consistently votes pro-buggy whip because he is taking graft, there will be no variance in his votes and therefore no cross-correlation.

This leaves me in the unsatisfying position of not knowing whether or not A is a crook. There have been enough US congressmen convicted of malfeasance to convince me that the proportion of occupants of Capitol Hill who are crooks is not zero. Still, there is also a well known aphorism (of uncertain provenance) that might apply: "Never ascribe to malice that which is adequately explained by incompetence." We have ample evidence that incompetence is not in short supply on the Hill.

## Sunday, October 9, 2011

### Benders Decomposition Then and Now

A couple of years ago, a coauthor and I had a paper under review at a prestigious journal that shall remain nameless. In the paper we solved a mixed integer linear program (MILP) using Benders decomposition, with CPLEX link as the solver, and we employed callbacks to check each potential solution to the master problem as it occurred. One of the reviewers asked me to justify the use of callbacks. My first inclination was a variation on “because it's standard practice”, but since I have not actually surveyed practitioners to confirm that it is standard (and because, as an academic, I'm naturally loquacious), I gave a longer explanation. What brought this to mind is a recent email exchange in which someone else asked me why I used callbacks with Benders decomposition. It occurred to me that, assuming I'm correct about it being the current norm, this may be a case of textbooks not having caught up to practice. So here is a quick (?) explanation.

The original paper by Jack Benders [1] dealt with both continuous and discrete optimization, linear and nonlinear. The application I have in mind here deals with problems that, without loss of generality, can be modeled as follows:

\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}\\ & y & \in & \mathbb{R}^{n} \end{array} \]

Benders decomposition splits this into a master problem and a subproblem. The subproblem must be a linear program (LP), so all the discrete variables have to go into the master problem (a MILP). You can optionally keep some of the continuous variables in the master, but I will not do so here. You can also optionally create multiple subproblems (with disjoint subsets of $y$), but again I will not do so here. Finally, in the spirit of keeping things simple (and because unbounded models are usually a sign of a lazy modeler), I'll assume that the original problem is bounded (if feasible), which implies that both the master and subproblem will be bounded (if feasible).

The master problem is as follows:

\[ \begin{array}{lrclccc} \textrm{minimize} & c_{1}'x & + & z\\ \textrm{subject to} & Ax & & & \ge & a\\ & g'x & + & z & \ge & f & \forall (g,f)\in\mathcal{O}\\ & g'x & & & \ge & f & \forall (g,f)\in\mathcal{F}\\ & x & \in & \mathbb{Z}^{m}\\ & z & \in & \mathbb{R} \end{array}. \]Variable $z$ acts as a surrogate for the objective contribution $c_{2}'y$ of the continuous variables. Set $\mathcal{O}$ contains what are sometimes known as “optimality cuts”: cuts that correct underestimation of $c_{2}'y$ by $z$. $\mathcal{F}$ contains what are sometimes known as “feasibility cuts”: cuts that eliminate solutions $x$ for which the subproblem is infeasible. Initially $\mathcal{O}=\emptyset=\mathcal{F}$. The subproblem, for a given $x$ feasible in the master problem, is: \[ \begin{array}{lrcl} \textrm{minimize} & c_{2}'y\\ \textrm{subject to} & By & \ge & b\\ & Ey & \ge & d-Dx\\ & y & \in & \mathbb{R}^{n} \end{array} \]Optimality cuts are generated using the dual solution to a feasible subproblem; feasibility cuts are generated using an extreme ray of the dual of an infeasible subproblem. The precise details are unnecessary for this post.

Back in the '60s ('70s, '80s), solvers were essentially closed boxes: you inserted a problem, set some parameters, started them and went off to read the collected works of Jane Austen. Intervening in the algorithms was not an option. Thus the original flowchart for Benders decomposition looked like the following.

The modern approach, using callbacks, looks like this:

Obviously the modern approach can only be implemented if you are using a solver that supports callbacks, and requires more advanced programming than the classical approach. Other than that, what are the advantages and disadvantages?

The main advantage of the callback approach is that it is likely to avoid considerable rework. In the original approach, each time you add a cut to the master problem, you have to solve it anew. Although the new cuts may change the structure of the solution tree (by changing the solver's branching decisions), you are probably going to spend time revisiting candidate solutions that you had already eliminated earlier. Moreover, you may actually encounter the optimal solution to the original problem and then discard it, because a superoptimal solution that is either infeasible in the original problem or has an artificially superior value of $z$ causes the true optimum to appear suboptimal. With the callback approach, you use a single search tree, never revisit a node, and never overlook a truly superior solution.

The potential disadvantage of the callback approach is that you potentially generate a cut each time a new incumbent is found, and some of those cuts quite possibly would have been avoided if you followed the classical approach (and added a cut only when an “optimal” solution was found). Modern solvers are good at carrying “lazy” constraints in a pool, rather than adding the cuts to the active constraint set, but there is still a chance that the modern approach will take longer than the classical approach. I believe (but have no data to support my belief) that the modern approach will typically prove the faster of the two.

[1] Benders, J. F.,

The original paper by Jack Benders [1] dealt with both continuous and discrete optimization, linear and nonlinear. The application I have in mind here deals with problems that, without loss of generality, can be modeled as follows:

\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}\\ & y & \in & \mathbb{R}^{n} \end{array} \]

Benders decomposition splits this into a master problem and a subproblem. The subproblem must be a linear program (LP), so all the discrete variables have to go into the master problem (a MILP). You can optionally keep some of the continuous variables in the master, but I will not do so here. You can also optionally create multiple subproblems (with disjoint subsets of $y$), but again I will not do so here. Finally, in the spirit of keeping things simple (and because unbounded models are usually a sign of a lazy modeler), I'll assume that the original problem is bounded (if feasible), which implies that both the master and subproblem will be bounded (if feasible).

The master problem is as follows:

\[ \begin{array}{lrclccc} \textrm{minimize} & c_{1}'x & + & z\\ \textrm{subject to} & Ax & & & \ge & a\\ & g'x & + & z & \ge & f & \forall (g,f)\in\mathcal{O}\\ & g'x & & & \ge & f & \forall (g,f)\in\mathcal{F}\\ & x & \in & \mathbb{Z}^{m}\\ & z & \in & \mathbb{R} \end{array}. \]Variable $z$ acts as a surrogate for the objective contribution $c_{2}'y$ of the continuous variables. Set $\mathcal{O}$ contains what are sometimes known as “optimality cuts”: cuts that correct underestimation of $c_{2}'y$ by $z$. $\mathcal{F}$ contains what are sometimes known as “feasibility cuts”: cuts that eliminate solutions $x$ for which the subproblem is infeasible. Initially $\mathcal{O}=\emptyset=\mathcal{F}$. The subproblem, for a given $x$ feasible in the master problem, is: \[ \begin{array}{lrcl} \textrm{minimize} & c_{2}'y\\ \textrm{subject to} & By & \ge & b\\ & Ey & \ge & d-Dx\\ & y & \in & \mathbb{R}^{n} \end{array} \]Optimality cuts are generated using the dual solution to a feasible subproblem; feasibility cuts are generated using an extreme ray of the dual of an infeasible subproblem. The precise details are unnecessary for this post.

Back in the '60s ('70s, '80s), solvers were essentially closed boxes: you inserted a problem, set some parameters, started them and went off to read the collected works of Jane Austen. Intervening in the algorithms was not an option. Thus the original flowchart for Benders decomposition looked like the following.

Obviously the modern approach can only be implemented if you are using a solver that supports callbacks, and requires more advanced programming than the classical approach. Other than that, what are the advantages and disadvantages?

The main advantage of the callback approach is that it is likely to avoid considerable rework. In the original approach, each time you add a cut to the master problem, you have to solve it anew. Although the new cuts may change the structure of the solution tree (by changing the solver's branching decisions), you are probably going to spend time revisiting candidate solutions that you had already eliminated earlier. Moreover, you may actually encounter the optimal solution to the original problem and then discard it, because a superoptimal solution that is either infeasible in the original problem or has an artificially superior value of $z$ causes the true optimum to appear suboptimal. With the callback approach, you use a single search tree, never revisit a node, and never overlook a truly superior solution.

The potential disadvantage of the callback approach is that you potentially generate a cut each time a new incumbent is found, and some of those cuts quite possibly would have been avoided if you followed the classical approach (and added a cut only when an “optimal” solution was found). Modern solvers are good at carrying “lazy” constraints in a pool, rather than adding the cuts to the active constraint set, but there is still a chance that the modern approach will take longer than the classical approach. I believe (but have no data to support my belief) that the modern approach will typically prove the faster of the two.

[1] Benders, J. F.,

*Partitioning Procedures for Solving Mixed-Variables Programming Problems*,**Numerische Mathematik**4 (1962), 238--252.
Subscribe to:
Posts (Atom)