OR in an OB World<br />A mix of operations research items and software tricks that I'll probably forget if I don't write them down somewhere.<br />Paul Rubin<br /><br /><h2>Coordinating Variable Signs (September 21, 2018)</h2>Someone asked me today (or yesterday, depending on whose time zone you go by) how to force a group of variables in an optimization model to take the same sign (all nonpositive or all nonnegative). Assuming that all the variables are bounded, you just need one new binary variable and a few constraints.<br /><br />Assume that the variables in question are $x_1,\dots,x_n$ and that they are all bounded, say $L_i \le x_i \le U_i$ for $i=1,\dots,n$. If we are going to allow variables to be either positive or negative, then clearly we need $L_i < 0 < U_i$. We introduce a new binary variable $y$ and, for each $i$, the constraints$$L_i (1-y) \le x_i \le U_i y.$$If $y$ takes the value 0, every original variable must be between its lower bound and 0 (so nonpositive). If $y$ takes the value 1, every original variable must be between 0 and its upper bound (so nonnegative).<br /><br />Note that trying to enforce strictly positive or strictly negative rather than nonnegative or nonpositive is problematic, since optimization models abhor strict inequalities. The only workaround I know is to change "strictly positive" to "greater than or equal to $\epsilon$" for some strictly positive $\epsilon$, which creates holes in the domains of the variables (making values between 0 and $\epsilon$ infeasible).
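The logic is easy to sanity-check by brute force. Here is a minimal Java sketch (the bounds, the grid step and the class name are my own arbitrary choices, not from the original question) that enumerates a grid of candidate points and confirms that some choice of $y$ makes $(x, y)$ feasible exactly when the $x_i$ share a sign:

```java
public class SignCheck {
    // bounds for two illustrative variables (arbitrary choices for the demo)
    static final double[] L = {-2.0, -3.0};
    static final double[] U = {4.0, 1.0};

    // feasibility of (x, y) under L_i (1 - y) <= x_i <= U_i y
    static boolean feasible(double[] x, int y) {
        for (int i = 0; i < x.length; i++) {
            if (x[i] < L[i] * (1 - y) || x[i] > U[i] * y) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // enumerate a coarse grid of x values and both values of y
        for (double x0 = L[0]; x0 <= U[0]; x0 += 0.5) {
            for (double x1 = L[1]; x1 <= U[1]; x1 += 0.5) {
                boolean ok = feasible(new double[]{x0, x1}, 0)
                          || feasible(new double[]{x0, x1}, 1);
                boolean sameSign = (x0 >= 0 && x1 >= 0) || (x0 <= 0 && x1 <= 0);
                if (ok != sameSign) {
                    throw new AssertionError("mismatch at " + x0 + ", " + x1);
                }
            }
        }
        System.out.println("constraint logic verified on grid");
    }
}
```

The check exploits the fact that on a grid within the bounds, $y=1$ is feasible precisely when every $x_i \ge 0$ and $y=0$ precisely when every $x_i \le 0$.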
<h2>Choosing "Big M" Values (September 12, 2018)</h2>I seem to bring up "big M" models a lot, so apologies if I end up repeating myself in places here. Not long ago, someone passed along highlights of a "big M" type model to me and asked if he could somehow reformulate to get rid of $M$. I did not see any good way to do it, but I did offer up suggestions about choosing values for $M$, and I thought that might make a decent blog post.<br /><br />Just for clarity, what I'm referring to is an integer or mixed-integer program (I'll stick to linear objective and constraints) in which binary variables are used to, in loose terms, turn constraints on or off. So a representative constraint might look like the following:$$\sum_i a_i x_i \le b + M(1-y)$$with the $a_i$ coefficients, the $x_i$ variables (discrete, continuous or a mix), $y$ a binary variable and $M$ a "large" coefficient. Think of $M$ as a stunt double for infinity. The notion is that if $y=1$, the constraint is "turned on" and reduces to$$\sum_i a_i x_i \le b.$$If $y=0$, the constraint is "turned off" and reduces to$$\sum_i a_i x_i \le b + M \approx \infty,$$which should be satisfied by any choices for $x$ the solver might make. There are other variations of "big M" constraints, but I'll stick with this one for illustrative purposes.<br /><br /><h3>The perils of $M$</h3><br />Choosing a value for $M$ can be a tricky business. Suppose first that we choose $M$ small enough that, when $y=0$ and the constraint should be "off",$$\sum_i a_i x_i^* \gt b + M$$for some solution $x^*$ that <i>should</i> be optimal but now appears infeasible to the solver. The result is an incorrect solution. So let's refer to a value for $M$ as <b>correct</b> if it is large enough that no potentially optimal solution violates the constraint when $y=0$.
("Correct" is my term for this. I don't know if there is an "official" term.) Correctness is essential: choose an incorrect (too small) value for $M$ and you risk getting an incorrect solution.<br /><br />Since it is frequently not obvious how large $M$ needs to be in order to be guaranteed correct, people tend to err on the side of caution by choosing really massive values for $M$. That brings with it a different set of problems (ones often ignored in introductory textbooks). First, branch-and-bound (or branch-and-cut) solvers tend to rely on the continuous relaxation of subproblems (dropping integrality constraints but keeping the functional constraints). Large values of $M$ make for weak relaxations (producing very loose bounds).<br /><br />Suppose, for instance, that we are solving a model that both builds roads and routes traffic along those roads. $y$ represents the decision to build a particular road ($y=1$) or not ($y=0$). If we build the road, a cost $c$ is incurred (represented by the term $cy$ in the objective function) and we can send unlimited traffic along it; if not, there is no cost but no traffic is permitted. In our constraint, the left side is traffic on the road and $b=0$, so traffic there can either be up to $M$ (if built) or 0 (if not built). Now suppose, for example, that the user chooses $M=10^9$ and the solver, in a continuous relaxation of some subproblem, sets $y=10^{-6}$. The solution to the relaxed problem pays only one millionth of $c$ for the privilege of allowing 1,000 units of traffic on this route, which basically allows it to "cheat" (and perhaps grossly underestimates the cost of any actual solution).<br /><br />A related problem is limited arithmetic precision. Since double-precision floating-point arithmetic (the way the computer does arithmetic with real numbers) has limited precision, and is thus susceptible to rounding errors, solvers have to establish standards for "close enough to feasible" and "close enough to integer". 
Continuing the previous example, it is entirely possible that the solver might look at $y=10^{-6}$, decide that is within integrality tolerance, and treat it as $y=0$, possibly leading to what it thinks is an "optimal" solution with 1,000 units of traffic running over a road that does not exist. Oops.<br /><br />Finally, note that $M$ is part of the coefficient matrix. Mixing very large coefficients (like $M$) with much smaller coefficients can create numerical instability, leading the solver to spend more time computing linear program pivots and possibly leading to totally erroneous solutions. That's too big a topic to get into here, but I'm pretty sure I've mentioned it elsewhere.<br /><br />Despite all this, "big M" models are still very common in practice. There is a nice paper by Codato and Fischetti [1] that shows how a form of Benders decomposition can be used to get rid of the $M$ coefficients. I've used it successfully (for instance, in [2]), but Benders decomposition is an advanced technique (i.e., not for the faint of heart), and is not always supported by high-level modeling languages.<br /><br />So, if we are stuck with "big M" models, what can we do to choose values of $M$ that are both correct and <b>tight</b> (again, my term), meaning small enough that they avoid numerical problems and hopefully produce relaxations with bounds strong enough to do some good?<br /><br /><h3>Not all $M$ are created equal</h3><br />Most "big M" models have more than one constraint (and frequently many) containing a large coefficient $M$. One of my pet peeves is that authors of textbooks and journal articles will frequently just use $M$, with no subscripts, everywhere such a coefficient is needed. This, in turn, breeds a tendency for modelers to choose one large value for $M$ and use it everywhere.<br /><br />Back when manuscripts were produced on typewriters, it was a bit of a pain in the ass to put in subscripts, so I can see how the trend would have started. 
(Quick tangent: Nate Brixius recently did a blog post with a picture of a typewriter, for those too young to be familiar with them. I'll <a href="https://nathanbrixius.wordpress.com/2018/09/12/the-origin-of-cc-and-bcc/" target="_blank">link it here</a> for the benefit of younger readers ... and also note that the one pictured is much more modern than the one I learned on, which was not electric.) Today, when people routinely use LaTeX to write manuscripts, there's not much excuse for omitting one measly subscript here and there. Anyway, my point is that it is usually better to use a different, customized value of $M$ for each constraint that needs one.<br /><br /><h3>Customized how?</h3><br />In some cases, model context will provide you an obvious choice for $M$. For example, suppose you are selecting warehouse locations and planning shipments from warehouses to customers. A typical "big M" constraint will look like the following (slightly different from our previous constraint but clearly related):$$\sum_j x_{ij} \le M_i y_i.$$Here variable $x_{ij}$ indicates the amount shipped from warehouse $i$ to customer $j$, binary variable $y_i$ is 1 if we use that warehouse and 0 if we do not, and the intent of the constraint is to say that if we do not use (and pay for) a warehouse, we cannot ship anything from it. One obvious choice for $M_i$ is the capacity of the warehouse. A better choice might be the smaller of that capacity or the maximum volume of customer demands that might plausibly be assigned to that warehouse. The latter might be based, for example, on knowing that the company would not ship anything to customers outside a certain distance from the warehouse.<br /><br />In other cases, there may be some less obvious way to extract suitable (correct and hopefully tight) values for the $M_i$. A while back, I was working on mixed-integer programming models for two-group discriminant analysis, and in one paper ([3]) I needed to pick values for $M$. 
Without going into gory details, I was able to normalize the coefficients of the (linear) discriminant function under construction and then compute valid coefficients $M_i$ by looking at the Euclidean distances between pairs of observations. I don't claim that the $M_i$ I came up with were the tightest possible, but they were correct and they produced faster solutions of the models than what I got with some arbitrarily large values I initially tried.<br /><br />Finally, you can actually solve subproblems to get correct and (hopefully) tight $M$ values. Whether this is feasible depends on how many you need and how large and scary your model is. Going back to my first example, the trick would work something like this. Relax all integrality constraints. Drop the constraint in question. If you have any other "big M" constraints for which you have not yet computed tight values of $M$, pick something safely large for those coefficients, trying to avoid numerical problems but not worrying about tightness. Now switch the objective to maximizing $\sum_i a_i x_i - b$. The optimal objective value is your value for $M$. It is clearly correct: any feasible solution to the MIP model is a feasible solution to the LP relaxation, and so the value of $\sum_i a_i x_i - b$ cannot exceed your choice of $M$ (regardless of whether $y$ is 0 or 1). Repeat for each additional constraint containing a "big M" coefficient (switching from maximizing to minimizing if the nature of the constraint warrants it).<br /><br />The process may be time-consuming, but at least you are solving LPs rather than MIPs. It is also a bit trickier than I made it sound, at least when multiple $M_i$ are involved. You have to guess values for any $M_i$ for which you have not yet solved, and guessing big values for them may result in a looser relaxation than necessary, which in turn may result in an inflated value for the $M_i$ you are currently choosing. 
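If solving an LP per constraint is more work than the model merits, there is a cheaper fallback: bound each term of $\sum_i a_i x_i$ separately over the box $L_i \le x_i \le U_i$. The resulting $M$ is correct by construction, though usually looser than the LP-based value. A minimal Java sketch (the sample data are mine, purely illustrative):

```java
public class IntervalBigM {
    /**
     * A correct value of M for the constraint
     *     sum_i a[i] * x[i] <= b + M * (1 - y),
     * obtained by maximizing each term independently over L[i] <= x[i] <= U[i].
     * Always correct, but typically looser than the LP-based value.
     */
    static double intervalM(double[] a, double[] L, double[] U, double b) {
        double m = -b;
        for (int i = 0; i < a.length; i++) {
            // a positive coefficient is worst at the upper bound, a negative one at the lower
            m += (a[i] > 0) ? a[i] * U[i] : a[i] * L[i];
        }
        return Math.max(m, 0.0); // a negative M would serve no purpose
    }

    public static void main(String[] args) {
        double[] a = {3.0, -2.0, 1.0};
        double[] L = {0.0, -1.0, -5.0};
        double[] U = {10.0, 4.0, 5.0};
        // 3*10 + (-2)*(-1) + 1*5 - 6 = 31
        System.out.println(intervalM(a, L, U, 6.0)); // → 31.0
    }
}
```

This interval bound is also a sensible starting guess for the "not yet computed" coefficients in the LP procedure above.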
You should definitely get correct choices for the $M$ coefficients; it's just the tightness that is in question. Even so, the values you get for the $M_i$ just might be tight enough to save you more time in the MIP solution than it costs to solve the LPs.<br /><br /><h3>References</h3><br />[1] Codato, G. and Fischetti, M. <i>Combinatorial Benders' Cuts for Mixed-Integer Linear Programming</i>. Operations Research, <b>2006</b>, Vol. 54(4), pp. 756-766.<br /><br />[2] Bai, L. and Rubin, P. A. <i>Combinatorial Benders Cuts for the Minimum Tollbooth Problem</i>. Operations Research, <b>2009</b>, Vol. 57(6), pp. 1510-1522.<br /><br />[3] Rubin, P. A. <i>Heuristic Solution Procedures for a Mixed-Integer Programming Discriminant Model</i>. Managerial and Decision Economics, <b>1990</b>, Vol. 11(4), pp. 255-266.<br /><br /><h2>Adding Items to a Sequence (August 25, 2018)</h2>A <a href="https://www.or-exchange.org/questions/14874/mip-formulation-of-tsp-recreation-of-lns" target="_blank">question</a> posed on OR-Exchange in 2017 asked the following: Given a tour of nodes, how does one best add two new nodes while respecting the ordering of the original tour? Specifically, the author began with a tour 0 - 1 - 2 - 4 - 6 - 0 (where node 0 is a depot) and wanted to add new stops 3 and 5 in such a way that, in the revised tour, stop 1 still came before stop 2, stop 2 before stop 4, etc.<br /><br />This problem can arise not just in vehicle routing but in many sorts of sequencing problems (such as scheduling jobs for production). Of course, preserving the original ordering to the extent possible is not always a concern, but it might be if, for instance, the existing stops are customers who have been promised somewhat general time windows for delivery. 
In any event, we'll just take the question as a given.<br /><br />The answer I posted on OR-X made the somewhat charitable (and, in hindsight, unwarranted) assumption that the two new stops would be inserted by breaking two previous arcs, rather than consecutively (for instance, ... - 2 - <span style="color: blue;">3 - 5</span> - 4 - ...). So I'll post an answer without that assumption here. In fact, I'll post three variants, one specific to the case of adding exactly two stops and the other two more general.<br /><br />First, let me articulate some common elements. I'll denote the set of original nodes by $N_1$, the set of nodes to be added by $N_2$, and their union by $N=N_1 \cup N_2$. All three approaches will involve setting up integer programming models that will look for the most part like familiar routing models. So we will have binary variables $x_{ij}$ that will take the value 1 if $j$ immediately follows $i$ in the new tour. We will have constraints ensuring that every node is entered and exited exactly once:$$\sum_{j\in N} x_{ij} = 1\quad \forall i\in N\\ \sum_{i \in N} x_{ij} = 1 \quad\forall j\in N.$$The objective function will be some linear combination of the variables (sum of distances covered, sum of travel times, ...), which I will not worry about here, since it is no different from any sequencing model.<br /><br />The first new wrinkle is that we do <i>not</i> define a variable for every pair of nodes. We create $x_{ij}$ only for the following combinations of subscripts: \begin{align*} i & \in N_{2},j\in N_{2},i\neq j\\ i & \in N_{1},j\in N_{2}\\ i & \in N_{2},j\in N_{1}\\ i & \in N_{1},j\in N_{1},(i,j)\in T \end{align*} where $T$ is the original tour. Thus, for example, we would have $x_{24}$ but not $x_{42}$, nor $x_{26}$. The rationale is straightforward: if we add an arc between two original nodes that were not successors on the original tour, we will force an order reversal. For instance, suppose we replace the arc 2 - 4 with, say, 2 - 6. 
Node 4 now must appear either before node 2 or after node 6, and either way the order has not been preserved.<br /><br /><h3>Version 1</h3><br />The first variant makes explicit use of the fact that we have only two new nodes. We add one subtour elimination constraint, to prevent the new nodes from forming a subtour: $x_{35}+x_{53}\le 1.$ Now consider how many different ways we could insert the two new nodes. First, we could break two links in the original tour, inserting 3 in the void where the first link was and 5 in the void where the second link was. Since the original tour had five links there are $\binom{5}{2}=10$ distinct ways to do this. Similarly, we could break two links but insert 5 first and 3 later. There are again ten ways to do it. Finally, we could break one link and insert either 3 - 5 or 5 - 3 into the void. With five choices of the link to break and two possible orders, we get another ten results, for a grand total of 30 possible new tours.<br /><br />With that in mind, consider what happens if node 3 is inserted after original node $i$, breaking the link between $i$ and its original successor $j$. (In our model, this corresponds to $x_{i3}=1$.) If this is a single node insertion, then we should have $j$ follow node 3 ($x_{3j}=1$). If it is a double insertion ($i$ - 3 - 5 - $j$), we should have $x_{35}=x_{5j}=1$. We can capture that logic with a pair of constraints for each original arc: <br />\[ \left.\begin{aligned}x_{i3}-x_{3j} & \le x_{35}\\ x_{i3}-x_{3j} & \le x_{5j} \end{aligned} \right\} \forall(i,j)\in T. \] We could do the same using node 5 in place of node 3, but it is unnecessary. If node 3 is correctly inserted by itself, say between $i$ and $j$, and node 5 is inserted after original node $h$, then the original successor $k$ of $h$ needs a new predecessor. That predecessor cannot be $h$, nor can it be any other original node (given our reduced set of variables), nor can it be node 3 (which now precedes $j$). 
The only available predecessor is 5, giving us $h$ - 5 - $k$ as expected.<br /><br />You might wonder how this accommodates a 5 - 3 insertion, say after node $i$. The original successor $j$ of $i$ needs a new predecessor, and 3 is the only eligible choice, so we're good.<br /><br />I tested this with a small Java program, and it did in fact find all 30 valid revised tours (and no invalid ones).<br /><br /><h3>Version 2</h3><br />Version 2, which can be applied to scenarios with any number of new nodes, involves building a standard sequencing model with subtour elimination constraints. The only novel element is the reduced set of variables (as described above). A blog is no place to explain sequencing models in their full glory, so I'll just assume that you, the poor suffering reader, already know how.<br /><br /><h3>Version 3</h3><br />In version 3, we again build a sequencing model with the reduced set of variables, but this time we use the Miller-Tucker-Zemlin method of eliminating subtours rather than adding a gaggle of subtour elimination constraints. The MTZ approach generally results in smaller models (since the number of subtours, and hence the potential number of subtour constraints, grows combinatorially with the number of nodes), but also generally produces weaker relaxations.<br /><br />The <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem" target="_blank">Wikipedia page for the TSP</a> shows the MTZ constraints, although for some reason without labeling them as such. Assume a total of $n$ nodes (with consecutive indices), with node $0$ being the depot. The MTZ approach adds continuous variables $u_i, \,i\in \{1,\dots,n-1\}$ with bounds $0\le u_i \le n-1$. It also adds the following constraints for all eligible arcs $(i,j)$ with $i\neq 0$ and $j\neq 0$:$$u_i - u_j + n x_{ij} \le n-1.$$You can think of the $u_i$ variables as counters. The MTZ constraints say that if we go from any node $i$ (other than the depot) to any node $j$ (also other than the depot), the count at node $j$ has to be at least one larger than the count at node $i$. (Arcs into the depot are left unconstrained, which is what allows the tour to close.) These constraints preclude any subtours, since a subtour (a cycle that never visits the depot) would result in the count at the first node of the subtour being larger than itself.<br /><br />As I mentioned, the MTZ formulation has a somewhat weaker LP relaxation than a formulation with explicit subtour elimination constraints, so it is not favored by everyone. In our particular circumstance, however, it has an additional virtue: it gives us a relatively painless way to enforce the order preservation requirement. All we need do is insert constraints of the form$$u_j \ge u_i + 1\quad\forall (i,j)\in T.$$This forces the counts at the original nodes to increase monotonically with the original tour order, without directly impacting the counts at the new nodes.<br /><br /><h2>NP Confusion (July 31, 2018)</h2>I just finished reading a somewhat provocative article on the <a href="https://www.cio.com/" target="_blank">CIO</a> website, titled "<a href="https://www.cio.com/article/3293010/hiring-and-staffing/10-reasons-to-ignore-computer-science-degrees.html" target="_blank">10 reasons to ignore computer science degrees</a>" (when hiring programmers). While I'm not in the business of hiring coders (although I recently was hired as a "student programmer" on a grant -- the Universe has a sense of humor), I find myself suspecting that the author is right about a few points, overstating a few and making a few that are valid for some university CS programs but not for all (or possibly most). At any rate, that's not why I mention it here. 
What particularly caught my eye was the following paragraph:<br /><blockquote class="tr_bq">It’s rare for a CS major to graduate without getting a healthy dose of <a href="https://en.wikipedia.org/wiki/NP-completeness" rel="nofollow noopener" target="_blank">NP-completeness</a> and <a href="https://en.wikipedia.org/wiki/Turing_machine" rel="nofollow noopener" target="_blank">Turing machines</a>, two beautiful areas of theory that would be enjoyable if they didn’t end up creating bad instincts. One biologist asked me to solve a problem in DNA sequence matching and I came back to him with the claim that it was NP-complete, a class of problems that can take a very long time to solve. He didn’t care. He needed to solve it anyway. And it turns out that most NP-complete problems are pretty easy to solve most of the time. There are just a few pathological instances that gum up our algorithms. But theoreticians are obsessed with the thin set that confound the simple algorithms, despite being rarely observed in everyday life.</blockquote>Any of my three regular readers will know that I periodically obsess/carp about NP-fixation, so I'm sympathetic to the tenor of this. At the same time, I have a somewhat mixed reaction to it.<br /><ul><li>"... NP-complete, a class of problems that can take a very long time to solve." This is certainly factually correct, and the author thankfully said "can" rather than "will". One thing that concerns me in general, though, is that not everyone grasps that problems in class P, for which polynomial time algorithms are known, can also take a very long time to solve. One reason, of course, is that "polynomial time" means run time is a polynomial function of problem size, and big instances will take longer. Another is that $p(n)=n^{1000}$ is a polynomial ... just not one you want to see as a (possibly attainable) bound for solution time. 
There's a third factor, though, that I think many people miss: the size of the coefficients (including a constant term, if any) in the polynomial bound for run time. I was recently reading a description of the default sorting algorithm in a common programming language. It might have been the one used in the Java collections package, but don't quote me on that. At any rate, they actually use two different sorting algorithms, one for small collections (I think the size cutoff might have been around 47) and the other for larger collections. The second algorithm has better computational complexity, but each step requires a bit more work and/or the setup is slower, so for small collections the nominally more complex algorithm is actually faster.</li><li>"He didn’t care. He needed to solve it anyway." I love this. It's certainly true that users can ask coders (and modelers) for the impossible, and then get snippy when they can't have it, but I do think that mathematicians (and, apparently, computer scientists) can get a bit too locked into theory. <major digression> As a grad student in math, I took a course or two in ordinary differential equations (ODEs), where I got a taste for the differences between mathematicians and engineers. Hand a mathematician an ODE, and he first tries to prove that it has a solution, then tries to characterize conditions under which the solution is unique, then worries about stability of the solution under changes in initial conditions or small perturbations in the coefficients, etc., ad nauseam. An engineer, faced with the same equation, tries to solve it. If she finds the solution, then obviously one exists. Depending on the nature of the underlying problem, she may or may not care about the existence of multiple solutions, and probably is not too concerned about stability given changes in the parameters (and maybe not concerned about changes in the initial conditions, if she is facing one specific set of initial conditions). 
If she can't solve the ODE, it won't do her much good to know whether a solution exists or not.</major digression> At any rate, when it comes to optimization problems, I'm a big believer in trying a few things before surrendering (and trying a few optimization approaches before saying "oh, it's NP-hard, we'll have to use my favorite metaheuristic").</li><li>"And it turns out that most NP-complete problems are pretty easy to solve most of the time. There are just a few pathological instances that gum up our algorithms." I find this part a bit misleading. Yes, some NP-complete problems can seem easier to solve than others, but the fundamental issue with NP-completeness or NP-hardness is problem dimension. Small instances of problem X are typically easier to solve than larger instances of problem X (although occasionally the Universe will mess with you on this, just to keep you on your toes). Small instances of problem X are likely easier to solve than large instances of problem Y, even if Y seems the "easier" problem class. Secondarily, the state of algorithm development plays a role. Some NP-complete problem classes have received more study than others, and so we have better tools for them. Bill Cook has a <a href="https://itunes.apple.com/us/app/concorde-tsp/id498366515?mt=8" target="_blank">TSP application for the iPhone</a> that can solve what I (a child of the first mainframe era) would consider to be insanely big instances of the traveling salesman problem in minutes. So, bottom line, I don't think a "few pathological instances" are responsible for "gum[ming] up our algorithms". Some people have problem instances of a dimension that is easily, or at least fairly easily, handled. Others may have instances (with genuine real-world application) that are too big for our current hardware and software to handle. That's also true of problems in class P. 
It's just that nobody ever throws their hands up in the air and quits without trying because a problem belongs to class P.</li></ul>In the end, though, the article got me to wondering two things: how often are problems left unsolved (or solved heuristically, with acceptance of a suboptimal final solution), due to fear of NP-completeness; and (assuming that's an actual concern), would we be better off if we never taught students (other than those in doctoral programs destined to be professors) about P v. NP, so that the applications programmers and OR analysts would tackle the problems unafraid?<br /><br /><h2>Selecting Box Sizes (July 17, 2018)</h2>Someone posted an interesting <a href="https://math.stackexchange.com/questions/2843990/need-to-create-3-4-different-box-sizes-and-to-minimize-material-waste-for-a-se/2850260" target="_blank">question about box sizes</a> on Mathematics Stack Exchange. He (well, his girlfriend to be precise) has a set of historical documents that need to be preserved in boxes (apparently using a separate box for each document). He wants to find a solution that minimizes the total surface area of the boxes used, so as to minimize waste. The documents are square (I'll take his word for that) with dimensions given in millimeters.<br /><br />To start, we can make a few simplifying assumptions.<br /><ul><li>The height of a box is not given, so we'll assume it is zero, and only consider the top and bottom surfaces of the box. For a box (really, envelope) with side $s$, that makes the total area $2s^2$. If the boxes have uniform height $h$, the area changes to $2s^2 + 4hs$, but the model and algorithm I'll pose are unaffected.</li><li>We'll charitably assume that a document with side $s$ fits in a box with side $s$. 
In practice, of course, you'd like the box to be at least slightly bigger, so that the document goes in and out with reasonable effort. Again, I'll let the user tweak the size formula while asserting that the model and algorithm work well regardless.</li></ul>The problem also has three obvious properties.<br /><ul><li>Only document sizes need be considered as box sizes, i.e. for every selected size at least one document should fit "snugly".</li><li>The number of boxes you need at each selected size equals the number of documents too large to fit in a box of the next smaller selected size but capable of fitting in a box of this size.</li><li>You have to select the largest possible box size (since that is required to store the largest of the documents).</li></ul>What interests me about this problem is that it can be a useful example of <a href="https://en.wikipedia.org/wiki/Law_of_the_instrument" target="_blank">Maslow's Hammer</a>: if all you have is a hammer, every problem looks like a nail. As an operations researcher (and, more specifically, practitioner of discrete optimization) it is natural to hear the problem and think in terms of general integer variables (number of boxes of each size), binary variables (is each possible box size used or not), assignment variables (mapping document sizes to box sizes) and so on. OR consultant and fellow OR blogger <a href="https://plus.google.com/100547539949080099832" target="_blank">Erwin Kalvelagen</a> did a <a href="https://yetanothermathprogrammingconsultant.blogspot.com/2018/07/choosing-boxes-set-covering-model.html" target="_blank">blog post</a> on this problem, laying out several LP and IP formulations, including a network model. I do recommend your reading it and contrasting it to what follows.<br /><br />The first thought that crossed my mind was the possibility of solving the problem by brute force. The author of the original question supplied a data file with document dimensions. 
There are 1166 documents, with 384 distinct sizes. So the brute force approach would be to look at all $\binom{383}{2} = 73,153$ or $\binom{383}{3} = 9,290,431$ combinations of box sizes (in addition to the largest size), calculate the number of boxes of each size and their combined areas, and then choose the combination with the lowest total. On a decent PC, I'm pretty sure cranking through even 9 million plus combinations will take only a tolerable amount of time.<br /><br />A slightly more sophisticated approach is to view the problem through the lens of a layered network. There are either three or four layers, representing progressively larger selected box sizes, plus a "layer 0" containing a start node. In the three or four layers other than "layer 0", you put one node for each possible box size, with the following restrictions:<br /><ul><li>the last layer contains only a single node, representing the largest possible box, since you know you are going to have to choose that size;</li><li>the smallest node in each layer is omitted from the following layer (since layers go in increasing size order); and</li><li>the largest node in each layer is omitted from the preceding layer (for the same reason).</li></ul>Other than the last layer (and the zero-th one), the layers here will contain 381 nodes each if you allow four box sizes and 382 if you allow three box sizes. An arc connects the start node to every node in the first layer, and an arc connects every node (except the node in the last layer) to every node in the next higher layer where the head node represents a larger size box than the tail node. The cost of each arc is the surface area for a box whose size is given by the head node, multiplied by the number of documents too large to fit in a box given by the tail node but small enough to fit in a box given by the head node.<br /><br />I wanted to confirm that the problem is solvable without special purpose software, so I coded it in Java 8. 
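For reference, the brute-force variant described above is only a few lines. Here is a minimal Java sketch for the three-box case (two intermediate sizes plus the mandatory largest size); the toy data are mine, not the real 1166-document file:

```java
public class BoxBruteForce {
    // area of one zero-height "box" (top and bottom) with side s
    static double area(double s) { return 2 * s * s; }

    /**
     * Brute force over all choices of two intermediate box sizes (the largest
     * size is always selected).  sizes must be the sorted distinct document
     * sides; counts[k] is how many documents have side sizes[k].
     * Returns the minimum total area over all selections.
     */
    static double bestTwoIntermediate(double[] sizes, int[] counts) {
        int n = sizes.length;
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n - 1; j++) {
                double total = 0;
                for (int k = 0; k < n; k++) {
                    // each document gets the smallest selected size that fits it
                    double s = sizes[k] <= sizes[i] ? sizes[i]
                             : sizes[k] <= sizes[j] ? sizes[j]
                             : sizes[n - 1];
                    total += counts[k] * area(s);
                }
                best = Math.min(best, total);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // tiny synthetic instance, purely illustrative
        double[] sizes = {100, 150, 200, 300};
        int[] counts = {5, 3, 4, 2};
        System.out.println(bestTwoIntermediate(sizes, counts)); // → 1020000.0
    }
}
```

The four-box case just adds one more nested loop; even with 384 distinct sizes the triple loop over roughly nine million combinations stays well within reach.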
Although there are plenty of high quality open-source graph packages for Java, I wrote my own node, arc and network classes and my own coding of <a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm" target="_blank">Dijkstra's shortest path algorithm</a> just to prove a point about not needing external software. You are welcome to grab the source code (including the file of document sizes) from my <a href="https://gitlab.msu.edu/orobworld/BoxSizes" target="_blank">Git repository</a> if you like.<br /><br />I ran both the three and four size cases and confirmed that my solutions had the same total surface areas that Erwin got, other than a factor of two (I count both top and bottom; he apparently counts just one of them). How long does it take to solve the problem using Dijkstra's algorithm? Including the time reading the data, the four box version takes about half a second on my decent but not workstation caliber PC. The three box version takes about 0.3 seconds, but of course gives a worse solution (since it is more tightly constrained). This is single-threaded, by the way. Both problem setup and Dijkstra's method are amenable to parallel threading, but that would be overkill given the run times.<br /><br />So is it wrong to take a fancier modeling approach, along the lines of what Erwin did? Not at all. There are just trade-offs. The modeling approach produces more maintainable code (in the form of mathematical models, using some modeling language like GAMS or AMPL) that is also more easily modified if the use case changes. The brute force and basic network approaches I tried require no extra software (so no need to pay for it, no need to learn it, ...) and work pretty well for a "one-off" situation where maintainability is not critical.<br /><br />Mainly, though, I just wanted to make a point that we should not overlook simple (or even brute force) solutions to problems when the problem dimensions are small enough to make them practical ... 
especially with computers getting more and more powerful each year.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-11570617661298890252018-07-06T18:16:00.001-04:002018-07-06T18:23:16.478-04:00Mint 19 Upgrade: Adventures #1-3I use my laptop as the "canary in the coal mine" when it comes to doing operating system upgrades, since there's nothing awesomely important on it. So today I tried upgrading from Linux Mint 18.3 to 19.0. Note that I used the <a href="https://community.linuxmint.com/tutorial/view/2416" target="_blank">upgrade path</a>, rather than downloading the installer, burning it to a bootable disk, then installing from there. In hindsight, that might have been the faster approach. The upgrade took over an hour, and that's before any debugging.<br /><br /><h3>The case of the not-so-missing library file</h3><br />I hit the first of what will no doubt be several adventures when I reinstalled <a href="https://www.rstudio.com/products/rstudio-desktop/" target="_blank">RStudio desktop</a> and discovered it would not run. Despite the installer saying that all dependencies were satisfied, when I tried to run it from a command line I was told that a library file (libGL.so.1) could not be found.<br /><br />I'll skip over another hour or so of pointless flailing and cut to the chase scene. It turns out that libGL.so.1 actually was installed on my laptop, as part of the libgl1-mesa-glx package. It was hiding in plain sight in /usr/lib/x86_64-linux-gnu/mesa/. Somehow, that folder had not made it onto the system library path. (I have no idea why.) So I ran the command<br /><br /><div style="text-align: center;"><span style="font-family: "courier new" , "courier" , monospace;">sudo ldconfig /usr/lib/x86_64-linux-gnu/mesa</span></div><br />and that fixed the problem.<br /><br /><h3>Editor? We don't need no stinkin' editor</h3><br />Next up, I couldn't find a text editor! 
Note that LibreOffice was installed, and was the default program to open text (.txt) files. Huh?? Poking around, I found nano, but xed (the default text editor in Mint 18) and gedit (the previous default editor) were not installed (even though xed was present before the upgrade).<br /><br />Fixing this was at least (to quote a math prof I had in grad school) "tedious but brutally straightforward". In the software manager, I installed xed ... and xreader, also MIA. For whatever reason, the other X-Apps (xviewer, xplayer and pix) were already installed (as they all should have been).<br /><br /><h3>The mystery of the launcher that wouldn't launch</h3><br />Mint has a utility (mintsources) that lets you manage the sources (repositories, PPAs etc.) that you use. There is an entry for it in the main menu, but clicking that entry failed to launch the source manager. On the other hand, running the command ("pkexec mintsources") from a terminal worked just fine.<br /><br />I found the original desktop file at /usr/share/applications/mintsources.desktop (owned by root, with read and write permissions but not execute permission). After a bunch of messing around, I edited the menu entry through the menu editor (by right-clicking the menu entry and selecting "Edit properties"), changing "pkexec mintsources" to "gksudo mintsources". That created another version at ~/.local/share/applications/mintsources.desktop. After right-clicking the main menu button and clicking "Reload plugins", the modified entry worked. I have no idea why "gksudo mintsources" works from the menu entry but "pkexec mintsources" does not, even though the latter works fine from a terminal. I tried editing back to "pkexec", just in case the mere act of editing was what did the trick, but no joy there. So I edited back to "gksudo", which seems to be working ... for now ... until the gremlins return from their dinner break.<br /><br /><b>Update:</b> No sooner did I publish this than I found another instance of the same problem. 
The driver manager would not launch from the main menu. I edited "pkexec" to "gksudo" for that one, and again it worked. I guess "pkexec" is somehow incompatible with the Mint menu (at least on my laptop). <br /><br />I'll close for now with a link to "<a href="https://sites.google.com/site/easylinuxtipsproject/bugs" target="_blank">Solutions for 24 bugs in Linux Mint 19</a>".<br /><br /><br /><br /><br />Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-82474232585862714222018-07-05T11:23:00.002-04:002018-07-05T11:33:09.281-04:00Firefox Ate My BookmarksThis morning, I "upgraded" Firefox 60.0.2 to 61.0.0 on my desktop computer (Linux Mint). When I started the new version, it came to life with the correct default tabs and pages, no menu bar (my preference), and with the bookmark tool bar visible ... but completely empty. Toggling the menu option to display it was unproductive. I restored the most recent backup of the bookmarks, but the tool bar remained empty.<br /><br />So I killed Firefox, started it in safe mode (no change), then killed it again and restarted it normally. This time the bookmark tool bar was populated with the correct bookmarks and folders. (I don't know if passing through safe mode was necessary. Maybe it just needed another restart after the restoration operation.) Unfortunately, my problems were not yet over. Although I had the correct top-level stuff in the bookmark tool bar, the various folders only had about three items each, regardless of how many were supposed to be in each folder (and, trust me, it was typically more than three).<br /><br />When you go to restore bookmarks in Firefox, it will show you a list of backup files (I think it keeps the fifteen most recent) and how many items each contains. My recent backups were all listed with 18 to 21 items. 
Fortunately, I also have Firefox (not yet upgraded) on my laptop (running the same version of Linux Mint), with the same bookmarks. On the laptop, recent backups have <i>444</i> items. So either the upgrade messed up the backup files or Firefox 61.0.0 has trouble reading backups from version 60. Heck, maybe Firefox 60 screwed up making the automatic backups on my desktop (but, somehow, not on my laptop).<br /><br />The laptop proved my savior. I manually backed up the bookmarks on it to a file, parked that file on Dropbox just in case, copied it to the desktop and manually restored it. For the moment, at least, I have all my bookmarks back.<br /><br />In case you're reading this because you're in the same boat, here are the steps to do manual backups. Of course, this will only help if you have your bookmarks intact somewhere. If you're thinking of upgrading Firefox but haven't pulled the trigger yet, you might want to make a manual backup for insurance.<br /><br />Start with the "hamburger menu" (the button with three horizontal lines). From there, click the "Library" option, then "Bookmarks", then "Show All Bookmarks" (at the very bottom). That opens up a window titled "Library". Click the "Import and Backup" drop-down menu, then either "Backup" or "Restore" depending on your intent. Backup will give you a typical file saving dialog. Restore will give you a list of your recent backups and an option at the bottom to select a file. Use that option to navigate to a manual backup.<br /><br />Once again, software saves me from having a productive morning. :-(<br /><br />By the way, this bug has already been reported: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1472127">https://bugzilla.mozilla.org/show_bug.cgi?id=1472127</a>. 
<br /><br />Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-3797701287895251572018-07-03T19:30:00.000-04:002018-07-03T19:30:38.132-04:00Usefulness of Computer Science: An ExampleI thought I would follow up on my June 29 post, "<a href="https://orinanobworld.blogspot.com/2018/06/does-computer-science-help-with-or.html" target="_blank">Does Computer Science Help with OR?</a>", by giving a quick example of how exposure to fundamentals of computer science recently helped me.<br /><br />A current research project involves optimization models containing large numbers of what are basically set covering constraints, constraints of the form $$\sum_{i\in S} x_i \ge 1,$$ where the $x_i$ are binary variables and $S$ is some subset of the set of all possible indices. The constraints are generated on the fly (exactly how is irrelevant here).<br /><br />In some cases, the same constraint may be generated more than once, since portions of the code run in parallel threads. Duplicates need to be weeded out before the constraints are added to the main integer programming model. Also, redundant constraints may be generated. By that, I mean we may have two cover constraints, summing over sets $S_1$ and $S_2$, where $S_1 \subset S_2$. When that happens, the first constraint implies the second one, so the second (weaker) constraint is redundant and should be dropped.<br /><br />So there comes a "moment of reckoning" where all the constraints generated by all those parallel threads get tossed together, and duplicate or redundant ones need to be weeded out. That turns out to be a rather tedious, time-consuming operation, which brings me to how the constraints are represented. I'm coding in Java, which has various implementations of a <span style="font-family: "Courier New", Courier, monospace;">Set</span> interface to represent sets. 
The coding path of least resistance would be to toss the indices for each constraint into some class implementing that interface (I generally gravitate to <span style="font-family: "Courier New", Courier, monospace;">HashSet</span>). The <span style="font-family: "Courier New", Courier, monospace;">Set</span> interface defines an <span style="font-family: "Courier New", Courier, monospace;">equals()</span> method to test for equality and a <span style="font-family: "Courier New", Courier, monospace;">containsAll()</span> method to test whether another set is a subset of a given set. So this would be pretty straightforward to code.<br /><br />The catch lies in performance. Adding $n$ elements to a <span style="font-family: "Courier New", Courier, monospace;">HashSet</span> is $O(n)$ on average, since each insertion or membership test is expected constant time when hash codes are well distributed. Checking subset status or equality, however, can degrade to $O(n^2)$ in the worst case, where $n$ is the number of possible objects (indices): with badly colliding hash codes, Java may end up comparing an object from one set to many objects in the other set until it finds a match or runs out of things to which to compare. That means potentially $O(n)$ comparisons for each of $O(n)$ elements of the first set, getting us to $O(n^2)$. Even in the average case, the constant factors (a hash computation plus one or more <span style="font-family: "Courier New", Courier, monospace;">equals()</span> calls per element) are not trivial.<br /><br />A while back, I took the excellent (and free) online course "<a href="https://www.coursera.org/learn/algorithms-part1" target="_blank">Algorithms, Part 1</a>", offered by a couple of faculty from my alma mater Princeton University. I believe it was <a href="https://www.coursera.org/instructor/~250165" target="_blank">Robert Sedgewick</a> who said at one point (and I'm paraphrasing here) that sorting is cheap, so if you have any inkling it might help, do it. The binary variables in my model represent selection or non-selection of a particular type of object, and I assigned a complete ordering to them in my code. 
By "complete ordering" I mean that, given two objects $i$ and $j$, I can tell (in constant time) which one is "preferable". Again, the details do not matter, nor does the plausibility (or implausibility) of the order I made up. It just matters that things are ordered.<br /><br />So rather than just dump subscripts into <span style="font-family: "Courier New", Courier, monospace;">HashSet</span>s, I created a custom class that stores them in a <span style="font-family: "Courier New", Courier, monospace;">TreeSet</span>, a type of Java set that maintains sort order using the ordering I created. The custom class also provides some useful functions. One of those functions is <span style="font-family: "Courier New", Courier, monospace;">isSubsetOf()</span>, which does pretty much what it sounds like: <span style="font-family: "Courier New", Courier, monospace;">A.isSubsetOf(B)</span> returns true if set $A$ is a subset of set $B$ and false if not.<br /><br />In the <span style="font-family: "Courier New", Courier, monospace;">isSubsetOf()</span> method, I start with what are called iterators for the two sets $A$ and $B$. Each starts out pointing to the smallest member of its set, "smallest" defined according to the ordering I specified. If the smallest member of $B$ is bigger than the smallest member of $A$, then the first element of $A$ cannot belong to $B$, and we have our answer: $A\not\subseteq B$. If the smallest element of $B$ is smaller than the smallest element of $A$, I iterate through elements of $B$ until either I find a match to the smallest element of $A$ or run out of elements of $B$ (in which case, again, $A\not\subseteq B$). Suppose I do find a match. 
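Stripped of the application details (plain integers under their natural order stand in for my objects and my custom ordering, and the class name is made up), the whole scan boils down to this:

```java
import java.util.Iterator;
import java.util.TreeSet;

public class OrderedIndexSet {
    // Simplified stand-in for my custom class: a sorted set of indices.
    private final TreeSet<Integer> elements = new TreeSet<>();

    public OrderedIndexSet(int... indices) {
        for (int i : indices) {
            elements.add(i);
        }
    }

    /** Merge-style subset test: a single pass over each set, so O(|A| + |B|). */
    public boolean isSubsetOf(OrderedIndexSet other) {
        Iterator<Integer> itB = other.elements.iterator();
        for (int a : elements) {           // elements of A in increasing order
            boolean found = false;
            while (itB.hasNext()) {        // resume B where we left off
                int b = itB.next();
                if (b == a) { found = true; break; }
                if (b > a) return false;   // passed where a would have been
            }
            if (!found) return false;      // ran out of elements of B
        }
        return true;
    }
}
```

Resuming $B$'s iterator rather than restarting it for each element of $A$ is what keeps this to a single pass over each set.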
I bump the iterator for $A$ to find the second smallest element of $A$, then iterate through subsequent members of $B$ (picking up where I left off in $B$, which is important) until, again, I get a match or die trying. I keep doing this until I get an answer or run out of elements of $A$. At that point, I know that $A\subseteq B$.<br /><br />What's the payoff for all this extra work? Since I look at each element of $A$ and each element of $B$ at most once, my <span style="font-family: "Courier New", Courier, monospace;">isSubsetOf()</span> method requires $O(n)$ time, not $O(n^2)$ time. Using a <span style="font-family: "Courier New", Courier, monospace;">TreeSet</span> means the contents of each set have to be sorted at the time of creation, which is $O(n\log n)$, still better than $O(n^2)$. I actually did code it both ways (<span style="font-family: "Courier New", Courier, monospace;">HashSet</span> versus my custom class) and timed them on one or two moderately large instances. My way is in fact faster. Without having a bit of exposure to computer science (including the Princeton MOOC), though, it would never have occurred to me that I could speed up what was proving to be a bottleneck in my code.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com3tag:blogger.com,1999:blog-8781383461061929571.post-22193855628552096642018-06-29T17:06:00.000-04:002018-06-29T17:06:02.382-04:00Does Computer Science Help with OR?Fair warning: tl/dr.<br /><br />After reading a blog post yesterday by John D. Cook, "<a href="https://www.johndcook.com/blog/2018/06/28/cs-and-programming/" target="_blank">Does computer science help you program?</a>", I decided to throw in my two cents (convert to euros at your own risk) on a related topic: does computer science (which I will extend to include programming) help you as an OR/IE/management science/analytics professional? What follows is a mix of opinion and experience, and hence cannot be tested rigorously. 
(In other words, it's a blog post.)<br /><br /><h3>Programming</h3><br />The obvious starting point is programming. Do you need to know how to program to work in OR (and related fields -- to cut down on typos, I'm going to use "OR" as an umbrella acronym)? In many (most?) cases, yes! Some people with boring jobs may only have to shove data into a statistics program with a graphical user interface and push buttons, or write optimization models in a high level language (technically a form of coding, but I'll discount that) and summon the optimizer, but many will have to do tasks either too complicated for high level development environments, too quirky, or just not suited to one. Unless you magically find yourself endowed with limitless programming support, sooner or later (most likely sooner) you are likely to need to do some coding. (Note to faculty: even if you have hot and cold running students, don't assume that means adequate programming support. I taught simulation coding to doctoral students in a previous life. One of them wrote a program that gave "spaghetti code" a whole new meaning. It looked like a street map of downtown Rome after an earthquake.)<br /><br />I've done a ton of coding in my time, so perhaps I'm a bit biased. Still, I cannot recall the last OR project I did (research, application, teaching tool or whatever) that did not involve significant coding.<br /><br /><h4>Languages </h4><br />John D. Cook once did a blog post (I don't have the link at hand) about how many languages an analyst or applied mathematician needed to know. I forget the details, but the gist was "one for every type of programming task you do". 
So here's my toolkit:<br /><ul><li>one scripting language, for quick or one-off tasks (R for me; others may prefer Python or whatever);</li><li>one language suited for data manipulation/analysis/graphics (R again for me);</li><li>one language for "substantial" computational tasks (Java for me, C++ for a lot of people, Julia for some recent adopters, MATLAB or equivalent for some people, ...); and</li><li>one language for dealing with databases (SQL for me, although "SQL" is like saying "Indian" ... there are a lot of "dialects").</li></ul>In case you're wondering how the first two bullets differ (since I use R for both), here's a recent example of the first bullet. A coauthor-to-be and I received a data set from the author of an earlier paper. One file was a MATLAB script with data embedded, and another looked like the output of a MATLAB script. We needed to extract the parts relevant to our work from both, smush them together in a reasonable way, and reformulate the useful parts into the file format our program expects. That does not exactly call for an object-oriented program, and using a script allowed me to test individual bits and see if they did what I expected (which was not always the case).<br /><br /><h4>Parallel computation</h4><br />I went a long time knowing hardly anything about this, because I was using <a href="https://en.wikipedia.org/wiki/Abacus" target="_blank">earlier computing devices</a> where parallelism was out of the question. Today, though, it is pretty easy to find yourself tackling problems where parallel threads or parallel processes are hard to avoid. This includes, but is not limited to, writing programs with a graphical user interface, where doing heavy number crunching in the main thread will freeze the GUI and seriously frustrate the user. I just finished (knock on virtual wood) a Java program for a research project. 
During the late stages, while tweaking some code related to a heuristic that runs parallel threads and also uses CPLEX, I managed to (a) fill up main memory once (resulting in disk thrashing and essentially paralyzing not only the program interface but the operating system interface ... meaning I couldn't even kill the program) and (b) crash the Java virtual machine (three times, actually). So, trust me, understanding parallel processing can really be important.<br /><br /><br /><h3>"Theoretical" Computer Science</h3><br />This is what John Cook was after in his informal poll. Here, my answer is that some parts are very useful, others pretty much not useful at all, and maybe some stuff in between.<br /><br /><h4>Data structures </h4><br />As a graduate student (in math, on the quarter system), I took a three course sequence required for junior year computer science majors. One course concentrated on how to write operating systems. It's perhaps useful to have a basic knowledge of what an operating system does, but I'm pretty sure an OR person can get that from reading computer magazines or something, without taking a course in it. I've totally forgotten what another one of the courses covered, which suggests that maybe it was not crucial to my professional life.<br /><br />The third course focused on data structures, and while much of what I learned has probably become obsolete (does anyone still use circular buffers?), it's useful both to know the basics and to understand some of the concerns related to different data structures. 
More than once, while hacking Java code, I've had to give some thought to whether I would be doing a lot of "give me element 7" (which favors array-like structures) versus "give me the smallest element" (favoring tree-like structures) versus "the same thing is going to get added a zillion times, and you only need to keep one copy" (set interfaces, built over who knows what kind of structure).<br /><br /><h4>Complexity</h4><br />Another computer science topic you need to know is computational complexity, for a variety of reasons. The first is that, at least if you work in optimization, you will be bombarded by statements about how this problem or that is NP-ridiculous, or this algorithm or that is NP-annoying. (The latter is total b.s. -- NP-hardness is a property of the problem, not the algorithm. Nonetheless, I occasionally see statements like that.) It's important to know both what NP-hardness is (a sign that medium to large problem instances might get pretty annoying) and what it is not (a sign of the apocalypse, or an excuse for skipping the attempt to get an exact solution and immediately dragging out your favorite metaheuristic). You should also understand what a proof of NP-hardness entails, which is more than just saying "it's an integer program".<br /><br />Beyond NP silliness, though, you need to understand what $O(\dots)$ complexity means, and in particular the fact that an $O(n^2)$ algorithm is slower than an $O(n \log(n))$ algorithm for large $n$, but possibly faster for small $n$. This can help in choosing alternative ways to do computational tasks in your own code.<br /><br /><h4>Finite precision</h4><br />This may turn up in a numerical analysis class in a mathematics program, or in a computer science course, but either way you <i>really</i> need to understand what rounding and truncation error are, how accurate double-precision floating point arithmetic is, etc. 
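A throwaway Java illustration of the kind of surprises I mean (not tied to any particular solver):

```java
public class FloatingPoint {
    public static void main(String[] args) {
        // Most decimal fractions have no exact binary representation.
        System.out.println(0.1 + 0.2);        // 0.30000000000000004, not 0.3
        System.out.println(0.1 + 0.2 == 0.3); // false

        // A double carries roughly 15-16 significant decimal digits; adding
        // a small number to a huge one can lose the small one entirely.
        System.out.println(1.0e16 + 1.0 - 1.0e16); // 0.0 -- the 1.0 vanished

        // The sane test: compare within a tolerance, which is exactly what
        // solvers do with their integrality tolerance.
        double x = 0.999999999999;   // a "binary" variable, as reported
        double tol = 1.0e-6;
        System.out.println(Math.abs(x - Math.round(x)) <= tol); // true
    }
}
```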
First, this will help you avoid embarrassment by asking questions like "Why does the solver say my 0-1 variable is 0.999999999999, which is clearly an error?". Second, it will give you an appreciation for why doing things with "stiff" matrices can create remarkably goofy results from very straightforward looking problems. That, in turn, will help you understand why "big M" methods are both a boon and a curse in optimization.<br /><br />I may have more to say on this at a later date, but right now my computer is running out of electrons, so I'll quit here.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com1tag:blogger.com,1999:blog-8781383461061929571.post-52655816084171556152018-06-19T11:03:00.001-04:002018-06-19T11:03:20.610-04:00Callback Cuts That RepeatThe following post is specific to the <a href="https://www.ibm.com/analytics/data-science/prescriptive-analytics/cplex-optimizer" target="_blank">CPLEX</a> integer programming solver. I have no idea whether it applies to other solvers, or even which other solver have cut callbacks.<br /><br />Every so often, a user will discover that a callback routine they wrote has "rediscovered" a cut it previously generated. This can be a bit concerning at first, for a few reasons: it seems implausible, and therefore raises concerns of a bug someplace; it represents repeated work, and thus wasted effort; and it suggests at least the possibility of getting stuck in a loop, the programmatic equivalent of "<a href="https://www.imdb.com/title/tt0107048/" target="_blank">Groundhog Day</a>". (Sorry, couldn't resist.) As it happens, though, repeating the same cut can legitimately happen (though hopefully not too often).<br /><br />First, I should probably define what I mean by callbacks here (although I'm tempted to assume that if you don't know what I mean, you've already stopped reading). 
If you want to get your geek on, feel free to wade through the <a href="https://en.wikipedia.org/wiki/Callback_(computer_programming)" target="_blank">Wikipedia explanation</a> of a callback function. In the context of CPLEX, what I'm referring to is a user-written function that CPLEX will call at certain points in the search process. I will focus on functions called when CPLEX is generating cuts to tighten the bound at a given node of the search tree (a "user cut callback" in their terminology) or when CPLEX thinks it has identified an integer-feasible solution better than the current incumbent (a "lazy constraint callback"). That terminology pertains to versions of CPLEX prior to 12.8, when those would be two separate (now "legacy") callbacks. As of CPLEX version 12.8, there is a single ("generic") type of callback, but generic callbacks continue to be called from multiple "contexts", including those two (solving a node LP, checking a new candidate solution).<br /><br />The purpose here is to let the user either generate a bound-tightening constraint ("user cut") using particular knowledge of the problem structure, or to vet a candidate solution and, if unsuitable, issue a new constraint ("lazy constraint") that cuts off that solution, again based on problem information not part of the original IP model. The latter is particularly common with decomposition techniques such as <a href="https://en.wikipedia.org/wiki/Benders_decomposition" target="_blank">Benders decomposition</a>.<br /><br />So why would the same user cut or lazy constraint be generated more than once (other than due to a bug)? There are at least two, and possibly three, explanations.<br /><br /><h3>Local versus Global</h3><br />A user cut can be either local or global. The difference is whether the cut is valid in the original model ("global") or whether it is contingent on branching decisions that led to the current node ("local"). 
The user can specify in the callback whether a new cut is global or local. If the cut is specified as local, it will be enforced only at descendants of the current node.<br /><br />That said, it is possible that a local cut might be valid at more than one node of the search tree, in which case it might be rediscovered when those other nodes are visited. In the worst case, if the cut is actually globally valid but for some reason added with the local flag set, it may be rediscovered quite a few times.<br /><br /><h3>Parallel Threading</h3><br />On a system with multiple cores (or multiple processors), using parallel threads can speed CPLEX up. Parallel threads, though, are probably the most common cause for cuts and lazy constraints repeating.<br /><br />The issue is that threads operate somewhat independently and only synchronize periodically. So suppose that thread A triggers a callback that generates a globally valid user cut or lazy constraint, which is immediately added to the problem A is solving. Thread B, which is working on a somewhat different problem (from a different part of the search tree), is unaware of the new cut/constraint until it reaches a synchronization point (and finds out what A and any other sibling threads have been up to). Prior to that, B might stumble on the same cut/constraint. Since A and B are calling the same user callback function, if the user is tracking what has been generated inside that function, the function will register a repetition. This is normal (and harmless).<br /><br /><h3>Cut Tables</h3><br />This last explanation is one I am not entirely sure about. When cuts and lazy constraints are added, CPLEX stores them internally in some sort of table. I believe that it is possible in some circumstances for a callback function to be called before the table is (fully) checked, in which case the callback might generate a cut or constraint that is already in the table. 
Since this deals with the internal workings of CPLEX (the secret sauce), I don't know first-hand if this is true or not ... but if it is, it is again slightly wasteful of time but generally harmless.<br /><br /><h3>User Error </h3><br />Of course, none of this precludes the possibility of a bug in the user's code. If, for example, the user reacts to a candidate solution with a lazy constraint that is intended to cut off that solution but does not (due to incorrect formulation or construction), CPLEX will register the user constraint, notice that the solution is still apparently valid, and give the callback another crack at it. (At least that is how it worked with legacy callbacks, and I <i>think</i> that is how it works with generics.) Seeing the same solution, the user callback might generate the same (incorrect) lazy constraints, and off the code goes, chasing its own tail.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-76104979455761626232018-05-30T11:26:00.000-04:002018-05-30T11:26:57.077-04:00Getting Videos to Play in FirefoxI <i>think</i> I've solved a lingering problem I've had playing certain videos in <a href="https://www.mozilla.org/en-US/firefox/" target="_blank">Firefox</a>, and I'm posting the details here mostly so I don't forget them and partly in case anybody else is tripping over this.<br /><br />I will periodically land on a web page with a video at the top. Usually I don't want to play the video, but sometimes I do ... in which case I have mixed results getting the video to play in Firefox. Typically Chrome has no such problem with the page, which makes sense given that the issue in Firefox relates to a couple of browser extensions I use. 
What made diagnosing the problem tricky was that (a) results were inconsistent (some videos played, some didn't) and (b) the problem resulted from a combination of two extensions, not just one particular one.<br /><br />The symptom was that, when I clicked the play button on a video, the player box would turn into a black screen with a spinner that would spin indefinitely, until an error message popped up saying that the video player had hit a time limit waiting for the video to load. Again, this happened on some videos but not others. In particular, I never had a problem with a YouTube video. It was at least consistent in that a video that triggered the error would always trigger the error, regardless of page reloads etc.<br /><br />The two Firefox extensions involved are <a href="https://www.eff.org/https-everywhere" target="_blank">HTTPS Everywhere</a> (which tries to force page contents to load using the more secure HTTPS protocol rather than the ordinary HTTP protocol) and <a href="https://addons.mozilla.org/en-US/firefox/addon/disable-autoplay/" target="_blank">Disable HTML5 Autoplay</a> (which prevents videos from automatically starting to play). The first extension is a security enhancement. (For an opinion piece on why HTTPS is important, read "<a href="https://developers.google.com/web/fundamentals/security/encrypt-in-transit/why-https" target="_blank">Why HTTPS Matters</a>".) Unfortunately, many web sites either do not deploy HTTPS for some content or screw up their site configuration. Regarding the second extension, I find video autoplay to be rather annoying, since it frequently involves ads or other videos that I have no interest in, and forces me to play whack-a-mole to shut them the bleep up.<br /><br />I'm loath to give up either extension, and fortunately I don't have to. What works for me is a combination of two tweaks. 
On a site where I get problem videos (time.com is the main source, in my case), I click the toolbar button for the autoplay extension, leave "Disable Autoplay" selected, but deselect "Disable Preloading". That only needs to be done once per site. With a page open containing a problem video, I then disable HTTPS Everywhere (again, by clicking its toolbar button and deselecting the first option). That should automatically cause the page to reload, and the video will play properly. After I'm done watching, I just reenable HTTPS Everywhere. This part has to be repeated for each page containing a video that will not load via HTTPS, but it's a price I'm willing to pay to preserve security.<br /><br />Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-8366149693963839082018-05-15T15:03:00.001-04:002018-05-15T15:03:12.092-04:00Grouping Rows of a MatrixI spent a large chunk of yesterday afternoon doing something I thought would be simple (relatively speaking) in <a href="https://en.wikipedia.org/wiki/LaTeX" target="_blank">LaTeX</a>. I wanted to group rows of a matrix (actually, in my case, a vector) with right braces, and label the groups. 
An example of what I wanted is in the image below.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-t2tDchdNOnk/Wvspq6VIptI/AAAAAAAACoc/0uDGDRWYKOM3Ww0owIsHBxw4w-3-PDKXACLcBGAs/s1600/braces.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="221" data-original-width="160" src="https://4.bp.blogspot.com/-t2tDchdNOnk/Wvspq6VIptI/AAAAAAAACoc/0uDGDRWYKOM3Ww0owIsHBxw4w-3-PDKXACLcBGAs/s1600/braces.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div>This seems to me to be a fairly common thing to do, and LaTeX has been around over 35 years (TeX even longer), so by now it must be pretty easy. Right? Um, not so much. I wore out Google looking for packages that would do this. Curiously, it's easy to put braces <i>over</i> and <i>under</i> things:<br /><ul><li>$\overbrace{[x_1,\dots,x_n]}$ [<span style="font-family: "courier new" , "courier" , monospace;">\overbrace{[x_1,\dots,x_n]}</span>];</li><li>$\underbrace{[x_1,\dots,x_n]}$ [<span style="font-family: "courier new" , "courier" , monospace;">\underbrace{[x_1,\dots,x_n]}</span>].</li></ul>There are packages to let you surround matrices, arrays etc. with a variety of delimiters (not just parentheses or square brackets). Nowhere, though, could I find a command or package to do the above.<br /><br />Fortunately, something pointed me in the direction of the <a href="https://en.wikipedia.org/wiki/PGF/TikZ" target="_blank">PGF/TiKZ</a> package, which I've used in the past for doing drawings. It's an incredible tool in terms of both what it can do and the outstanding quality of its manual. 
Because it does so many things, I've never really gotten to know all its capabilities, and in particular its ability to do matrices in a picture environment.<br /><br />Here is the code to do my illustration. You need to load the TiKZ package and two of its libraries in your document preamble, as follows:<br /><br /><pre>\usepackage{tikz}<br />\usetikzlibrary{matrix, decorations.pathreplacing}</pre><br />The code for the drawing is:<br /><br /><div class="scroll"><pre>\begin{tikzpicture}<br /> \matrix (vec) [matrix of math nodes, left delimiter = {[}, right delimiter = {]}] {<br />f_1 \\<br />\vdots \\<br />f_{a} \\<br />f_{a + 1} \\<br />\vdots \\<br />f_{b} \\<br />f_{b + 1} \\<br />\vdots \\<br />f_{c} \\<br />};<br />\node (a) at (vec-1-1.north) [right=20pt]{};<br />\node (b) at (vec-3-1.south) [right=20pt]{};<br />\node (c) at (vec-4-1.north) [right=20pt]{};<br />\node (d) at (vec-6-1.south) [right=20pt]{};<br />\node (e) at (vec-7-1.north) [right=20pt]{};<br />\node (f) at (vec-9-1.south) [right=20pt]{};<br />\draw [decorate, decoration={brace, amplitude=10pt}] (a) -- (b) node[midway, right=10pt] {\footnotesize something};<br />\draw [decorate, decoration={brace, amplitude=10pt}] (c) -- (d) node[midway, right=10pt] {\footnotesize something else};<br />\draw [decorate, decoration={brace, amplitude=10pt}] (e) -- (f) node[midway, right=10pt] {\footnotesize something silly};<br />\end{tikzpicture}<br /></pre></div><br />The name of the matrix ("vec") is arbitrary. The amplitude for the brace (10pt) and the offsets (10pt and 20pt) are matters of taste.<br /><br />If you happen to know a faster way of doing this, please do share in a comment. 
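One footnote: if you only need a brace on a single group of rows (rather than several labeled groups beside one vector, as above), plain amsmath may suffice, with no TikZ required. A minimal sketch of that lighter-weight alternative:

```latex
% Single-group version using only amsmath (no TikZ):
% a right brace spanning a block of rows, with a text label.
\[
\left.
\begin{matrix}
f_1 \\ \vdots \\ f_a
\end{matrix}
\right\} \text{something}
\]
```

Stacking several such braces next to one bracketed vector is where this approach gets ugly, which is what pushed me to TikZ in the first place.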
Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-49090522745632818582018-04-27T14:35:00.002-04:002018-04-27T14:35:59.718-04:00Big M and Integrality ToleranceA change I made to an answer I posted on <a href="https://www.or-exchange.org/" target="_blank">OR-Exchange</a>, based on a comment from a well-informed user of OR-X, might be worth repeating here on the blog. It has to do with issues that can occur when using "big M" type integer programming models, a topic I've <a href="https://orinanobworld.blogspot.com/2011/07/perils-of-big-m.html" target="_blank">covered here</a> before.<br /><br />As I mentioned in that previous post, part of the problem in "big M" formulations stems from the inevitable <a href="https://en.wikipedia.org/wiki/Numerical_analysis#The_generation_and_propagation_of_errors" target="_blank">rounding error</a> in any non-integer computations done on a computer. A particular manifestation of rounding error (regardless of whether there are "big M" coefficients in the model or not) is that the double precision value assigned to integer variables in a solution will not necessarily be integers. With surprising frequency, I see users of MIP software demanding to know why the answer they got for their integer variable was 0.9999999999975 or 1.000000000032 rather than exactly 1. The answer has two parts: (a) rounding error is pretty much inevitable; and (b) the software designers accepted that reality and decreed that anything "close enough" to the nearest integer counts as being an integer for purposes of deciding if a solution is feasible. 
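To make the "close enough" test concrete, here is a minimal sketch (my own illustration, not CPLEX code) of the check a solver effectively performs, using CPLEX's default integrality tolerance of 1.0E-5:

```java
// Sketch of the integrality test a MIP solver applies to the
// double-precision value of an "integer" variable. Hypothetical
// helper class, not taken from any solver's API.
public class IntegralityCheck {
    static final double EP_INT = 1.0e-5; // CPLEX default integrality tolerance

    // True if x is within EP_INT of the nearest integer.
    static boolean isIntegral(double x) {
        return Math.abs(x - Math.round(x)) <= EP_INT;
    }

    public static void main(String[] args) {
        System.out.println(isIntegral(0.9999999999975)); // true: counts as 1
        System.out.println(isIntegral(1.000000000032));  // true: counts as 1
        System.out.println(isIntegral(0.9999));          // false: off by 1e-4
    }
}
```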
(Why the software prints all those decimal places rather than rounding for you is a separate question that I will not address.)<br /><br />So the solver generally has a parameter that gives an "integrality tolerance", just as it has a (typically separate) parameter for how close the expression in a constraint has to be to the allowed value(s) to be considered feasible (again, a nod to rounding error). In CPLEX, the name of the integrality tolerance parameter is some variant of "MIP.Tolerances.Integrality" (in earlier versions, the much more compact "EpInt"), and its default value (as of this writing) is 1.0E-5.<br /><br />So now I'll connect that to "big M". One of the more common uses of "big M" is to capture a logical constraint of the form "if condition is true then expression is limited". For instance, you might want to build into the model that if a warehouse is not open (the condition) then shipments from it must equal (or cannot exceed) zero. Algebraically, this frequently appears as $$f(x) \le My$$where $f(x)$ is some (hopefully linear) expression involving variable $x$ and $y$ is a binary variable (with 1 meaning the constraint is relaxed and 0 meaning it is enforced). In a world of exact arithmetic, and with $M$ chosen large enough, $y=1$ means the value of $f(x)$ is essentially unbounded above, while $y=0$ means $f(x)\le 0$.<br /><br />Here the issue of integrality tolerance sneaks in. Suppose that we choose some really large value of $M$, say $M=1E+10$, and that the solver decides to accept a solution where $y=1.0E-6$ (which is within CPLEX's default integrality tolerance). From the solver's perspective, $y=0$. Logically, that should mean $f(x)\le 0$, but given the rounding error in $y$ and the large value of $M$ what you actually get is $f(x)\le 10,000$. So, borrowing from my earlier example, I've got a closed warehouse shipping 10,000 units of whatever. 
Oops.<br /><br />A common reaction to this (and by "common" I mean I've seen it multiple times on help forums) is to say "I'll set the integrality tolerance to zero". Good luck with that. First, it's not guaranteed the software will let you. (CPLEX will.) Second, if it does let you do it, you might get a suboptimal solution, or be told your perfectly feasible problem is actually infeasible, because the software couldn't get the true optimal solution (or perhaps any solution) to have zero rounding error in all the integer variables.<br /><br />If you run into incorrect solutions in a "big M" model, some combination of tightening the integrality tolerance (but not all the way to zero) and ratcheting down the size of $M$ may fix things ... but, as with all aspects of MIP models, there are no guarantees.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com6tag:blogger.com,1999:blog-8781383461061929571.post-9061965388262978692018-03-11T16:30:00.003-04:002018-03-11T16:30:50.821-04:00Piecewise Linear Approximations in MIP ModelsIn the past, I've written about piecewise linear approximations of functions of a single variable. (There are too many posts to list here. Just type "piecewise linear" in the blog search box if you want to find them.) Handling piecewise linear approximations of multivariable functions is a bit more intimidating. I'll illustrate one approach here.<br /><br />To set the stage, assume an optimization problem involving variables $x\in\Re^{n}$, and possibly some other variables $y\in\Re^{m}$. Some or all of the $x$ and $y$ variables may be restricted to integer values. Also assume that the objective function and constraints are all linear, with the exception of one nonlinear function of the $x$ variables that may appear in one or more constraints and/or the objective function. I'll denote that function $f:\Re^{n}\rightarrow\Re$. 
With that, I'll write our hypothetical problem as<br />\begin{align*}<br />\text{min } & g(x,y,z)\\<br />\text{s.t. } & z=f(x)\\<br /> & (x,y,z)\in D<br />\end{align*}where $z$ is a new variable that captures the value of $f$, the domain $D$ incorporates all other functional constraints, bounds and integrality restrictions, and the objective function $g$ is linear. So if it were not for the constraint $z=f(x)$, this would be a mixed integer linear program.<br /><br />I'm also going to assume that we know <i>a priori</i> some hyperrectangle $R\subset\Re^{n}$ containing all feasible values of $x$, say $R=\prod_{i=1}^{n}[L_{i},U_{i}]$. Brace yourself, because the notation is about to get a bit messy. We will create a mesh of discrete points at which $f$ will be evaluated. First, we specify a sequence of what I will call "grid values" for each individual component of $x$. I'll denote the $j$-th grid value for $x_{i}$ by $a_{i}^{(j)}$, where $$L_{i}=a_{i}^{(1)}<a_{i}^{(2)}<\cdots<a_{i}^{(N_{i})}=U_{i}.$$The number of grid values need not be the same for each variable (hence the subscript on $N_{i}$), and the spacing need not be equal. It probably makes sense to sample $f()$ more frequently in regions where its curvature is greater, and less frequently in regions where it is fairly flat.<br /><br />I'll use the term "mesh points" for the points$$\prod_{i=1}^{n}\left\{ a_{i}^{(1)},\dots,a_{i}^{(N_{i})}\right\}$$formed by combinations of grid values, and $a^{(j)}$ will denote a generic mesh point $(a_{1}^{(j_{1})},\dots,a_{n}^{(j_{n})})$. The superscript $j$ here is a vector $(j_{1},\dots,j_{n})$ whose domain I will denote $J$. Now we can get down to the piecewise linear approximation. We will write $x$ as a convex combination of the mesh points, and approximate $f(x)$ with the corresponding convex combination of the function values at the mesh points. Just to be clear, in this formulation $z$ approximates, but no longer equals, $f(x)$. 
To do this, we will introduce new (continuous) variables $w^{(j)}\in[0,1]$ for each $j\in J$. There are $N_{1}\times\cdots\times N_{n}$ mesh points, and so an identical number of $w$ variables. The $w$ variables are weights assigned to the mesh points. This leads to the following additional constraints:<br />\begin{align*}<br />\sum_{j\in J} & w^{(j)}=1\\<br />\sum_{j\in J}a^{(j)}w^{(j)} & =x\\<br />\sum_{j\in J} & f(a^{(j)})w^{(j)}=z.<br />\end{align*}<br />There's still one more wrinkle with which to contend. Other than possibly at extreme points, there will be more than one convex combination of mesh points producing the same $x$ vector, and they will not all produce the same approximation $z$ of $f(x)$. Consider an example in which $n=2$, $R=[2,5]\times[1,3]$, and $f(x)=x^{\prime}x=\left\Vert x\right\Vert ^{2}$. I'll use integer-valued mesh points. Assume that the optimal solution requires that $x=(3.5,2.2)$, in which case $f(x)=17.09$. Figure 1 illustrates the situation, with the values of $f()$ at the mesh points shown in red. (Click any figure to get a better resolution version in a separate browser tab.)<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://drive.google.com/open?id=1XeZSlRW6f-rL3fKNldUCaLpxqTqtv82e" imageanchor="1" style="margin-left: auto; margin-right: auto;" target="_blank"><img border="0" data-original-height="887" data-original-width="861" height="400" src="https://4.bp.blogspot.com/-i6M7ROt9HXo/WqV9eacP58I/AAAAAAAAClo/Eu5XxDbB5gYJznW2r-QjY3mrN53GyJK_gCLcBGAs/s400/fig1.png" width="386" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 1: Rectangle $R$ and solution $x$</td></tr></tbody></table><br /><div class="separator" style="clear: both; text-align: center;"></div>Assume first that the solver prefers smaller values of $f(x)$. 
Figure 2 shows the weights $w$ that minimize $z$, the estimate of $f(x)$, for our given choice of $x$. The interpolated value of $z$ is $17.5$, which is moderately close to the correct value $17.09$. The three mesh points closest to $x$ receive weights 0.3, 0.5 and 0.2. Figure 3 shows an alternative solution with the same value of $z$, using a different combination of adjacent corners.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://drive.google.com/open?id=1AG1JGExyvYlttHGt--reizC-Tus9ehu7" imageanchor="1" style="margin-left: auto; margin-right: auto;" target="_blank"><img border="0" data-original-height="887" data-original-width="861" height="400" src="https://4.bp.blogspot.com/-Upy6iIzeFBI/WqV9eVLFiEI/AAAAAAAACmA/xJLUEeDHNgcEIdDRGrtaq83ZlKMNQbZYACEwYBhgL/s400/fig2.png" width="387" /> </a></td><td style="text-align: center;"><br /></td><td style="text-align: center;"><br /></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2: Weights that minimize $z$</td></tr></tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://drive.google.com/open?id=1CV1dnfEKWqiZnIei4UemaDVaq0VnWsef" imageanchor="1" style="margin-left: auto; margin-right: auto;" target="_blank"><img border="0" data-original-height="887" data-original-width="861" height="400" src="https://4.bp.blogspot.com/-70dHGwl6T_E/WqV9eehCunI/AAAAAAAACmA/aSlPV3Z0KP47V5lLRJ938iy93mKtK44YQCEwYBhgL/s400/fig3.png" width="387" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 3: Alternative weights that minimize $z$</td></tr></tbody></table><br />Now assume instead that the solver prefers larger values of $f(x)$. 
The solution that maximizes $z$ is shown in Figure 4. It uses three corners of $R$, none of which are adjacent to $x$, with weights 0.5, 0.4 and 0.1. Although this produces the correct value of $x$, the interpolated value of $z$ is $20.3$, which grossly overstates the correct value $17.09$.<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://drive.google.com/open?id=1keujAJkGEwVs7BSxLrzmeFVTysgqGAhC" imageanchor="1" style="margin-left: auto; margin-right: auto;" target="_blank"><img border="0" data-original-height="887" data-original-width="861" height="400" src="https://4.bp.blogspot.com/-K6E_xeID5ps/WqV9e1yUojI/AAAAAAAACl8/JtlRlVb8SO0iUJw1DROBvcs7uxgl68jVACEwYBhgL/s400/fig4.png" width="387" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 4: Weights that maximize $z$</td></tr></tbody></table>To keep the approximation as accurate as possible, we should force the solver to use mesh points adjacent to the actual solution $x$. We can do that by introducing a new binary variable $v_{i}^{(j)}\in\left\{ 0,1\right\} $ for each grid value $a_{i}^{(j)}$. Variable $v_{i}^{(j)}$ will signal whether $a_{i}^{(j)}$ is the $i$-th coordinate of the "lower left" (more generally, closest to the origin) corner of the mesh hyperrectangle containing $x$. Since we want to select a single hyperrectangle to contain $x$, we add constraints requiring exactly one choice for each coordinate of the lower left corner:<br />\[<br />\sum_{j=1}^{N_{i}}v_{i}^{(j)}=1\quad i=1,\dots,n.<br />\]Now we just need to add constraints forcing $w^{(j)}=0$ unless mesh point $j$ is one of the corners of the chosen hyperrectangle. Observe that, along any dimension $i$, the $i$-th component of any corner of the correct hyperrectangle will either be the chosen grid value or the next consecutive grid value. 
For instance, the rectangle containing $x$ in Figure 1 has lower left corner $(a_{1}^{(2)},a_{2}^{(2)})=(3,2)$. That means $v_{1}^{(2)}=1=v_{2}^{(2)}$. The corners of the rectangle have either $3=a_{1}^{(2)}$ or $4=a_{1}^{(3)}$ for their $x_{1}$ coordinate and either $2=a_{2}^{(2)}$ or $3=a_{2}^{(3)}$ for their $x_{2}$ coordinate.<br /><br />So for $w^{(j)}$ to be nonzero, we need either $v_{i}^{(j_{i})}=1$ or $v_{i}^{(j_{i}-1)}=1$. This leads us to the constraints \[<br />w^{(j)}\le v_{i}^{(j_{i})}+v_{i}^{(j_{i}-1)}<br />\]for all indices $j$ of mesh points and for all $i=1,\dots,n$, with the understanding that $v_{i}^{(0)}=0$ (since there is no grid point prior to the first one in any direction).<br /><br />With those extra constraints, the solutions to our little example when we want $z$ small are unchanged, since they already obey the additional constraints. When we want $z$ large, however, the solution in Figure 4 is now infeasible. All three positive weights violate the new constraints. For instance, $w^{(1,3)}=0.5$ (the weight applied to the mesh point formed by the first grid value of $x_{1}$ and the third grid value of $x_{2}$), but $v_{1}^{(1)}+v_{1}^{(0)}=0$. The solutions that maximize $z$ end up being the same ones that minimize $z$ (those shown in Figures 2 and 3).<br /><br />What remains is to take stock of how much the model has inflated. Let $$P=N_{1}\times\cdots\times N_{n}$$and$$S=N_{1}+\cdots+N_{n}.$$ We first added $P$ continuous variables ($w$) and $n+2$ constraints involving them. Then we added $S$ binary variables ($v$), $n$ constraints involving just them, and $nP$ constraints tying them to the $w$ variables. That's a grand total of $P$ continuous variables, $S$ binary variables and $2n+2+nP$ constraints. Note that $n\ll S\ll P<nP$. Also note that, in general, continuous variables are computationally cheaper than integer variables. 
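Incidentally, the interpolated values quoted earlier are easy to check numerically. The sketch below (my own illustration, with hypothetical helper names) computes an adjacent-corner estimate via bilinear interpolation over the cell containing $x$ — one of the convex combinations the adjacency constraints permit — reproducing the estimate $z=17.5$ for $x=(3.5,2.2)$ versus the true value $17.09$:

```java
// Bilinear interpolation of f(x) = x'x over the integer mesh cell
// containing x, using only the four adjacent corners -- the situation
// the v variables enforce. Illustration of the post's example, not
// solver code.
public class MeshInterp {
    static double f(double x1, double x2) { return x1 * x1 + x2 * x2; }

    // Interpolate f at (x1, x2) from the corners of the integer cell.
    static double interpolate(double x1, double x2) {
        int a = (int) Math.floor(x1), b = (int) Math.floor(x2);
        double u = x1 - a, v = x2 - b; // position within the cell, in [0,1)
        // Convex-combination weights on the four corners; they sum to 1.
        return (1 - u) * (1 - v) * f(a, b)
             + u * (1 - v) * f(a + 1, b)
             + (1 - u) * v * f(a, b + 1)
             + u * v * f(a + 1, b + 1);
    }

    public static void main(String[] args) {
        System.out.println(f(3.5, 2.2));           // ~17.09, the true value
        System.out.println(interpolate(3.5, 2.2)); // ~17.5, the adjacent-corner estimate
    }
}
```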
As for the gaggle of extra constraints, modern solvers have ways to deal with some constraints "lazily", which mitigates the load to some extent.<br /><br />An alternative approach would be to partition $R$ into nonoverlapping polygons (not necessarily hyperrectangles), assign weight variables $w$ to the corners of those polygons as above, and assign a binary variable to each polygon indicating whether it was selected. That would increase the number of binary variables from $S$ to $P$ (where $P$ would be the number of polygons) and decrease the number of added constraints from $2n+2+nP$ to $n+3+P.$ (The $n$ constraints requiring sums of binary variables to be 1 become a single constraint summing all the new binary variables. The $nP$ constraints tying the $w$ and $v$ variables together become $P$ constraints, one for each $w$ variable.) This approach is more flexible in terms of concentrating polygons where the curvature of $f$ is greatest, and it significantly reduces the number of constraints; but it significantly increases the number of binary variables, albeit with all of them tied into a single type 1 special ordered set. So it's hard for me to say which is better in general.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-60606689551724375282018-01-18T16:11:00.000-05:002018-01-18T16:11:29.069-05:00More on "Core Points"A few additions to <a href="https://orinanobworld.blogspot.com/2018/01/finding-core-point.html" target="_blank">yesterday's post</a> occurred to me belatedly.<br /><br />First, it may be a good idea to check whether your alleged core point $y^0$ is actually in the relative interior of the integer hull $\mathrm{conv}(Y)$. A sufficient condition is that, when you substitute $y^0$ into the constraints, all inequality constraints <i>including variable bounds</i> have positive slack. (Equality constraints obviously will not have slack.) 
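That check is mechanical enough to automate. Here is a rough sketch (a hypothetical helper of my own, not from any library), with the inequality constraints stored as rows of $Ay \le b$ and the variable bounds included as rows:

```java
// Check the sufficient condition for y0 to be a relative interior point:
// every inequality row of A y <= b (variable bounds included as rows)
// must have slack greater than a rounding-error tolerance.
public class CorePointCheck {
    static final double EPS = 1e-6; // tolerance for rounding error

    // Returns true if every row of A y0 <= b has slack of at least EPS.
    static boolean allSlackPositive(double[][] a, double[] b, double[] y0) {
        for (int row = 0; row < a.length; row++) {
            double lhs = 0.0;
            for (int j = 0; j < y0.length; j++) {
                lhs += a[row][j] * y0[j];
            }
            if (b[row] - lhs < EPS) {
                return false; // row is tight (or violated): condition fails
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy instance: 0 <= y_i <= 2 (written as -y_i <= 0 and y_i <= 2)
        // plus y_1 + y_2 <= 3.
        double[][] a = {{-1, 0}, {0, -1}, {1, 0}, {0, 1}, {1, 1}};
        double[] b = {0, 0, 2, 2, 3};
        System.out.println(allSlackPositive(a, b, new double[]{1.0, 1.0})); // true
        System.out.println(allSlackPositive(a, b, new double[]{2.0, 1.0})); // false: y_1 at its bound
    }
}
```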
In particular, do not forget that nonnegativity restrictions count as bounds. If you specify $0\le y_i \le u_i$, then you are looking for $\epsilon \le y^0_i \le u_i - \epsilon$ for some $\epsilon > 0$ (and $\epsilon$ greater than your tolerance for rounding error).<br /><br />Second, while the presence of equality constraints will definitely make the feasible region less than full dimension, it can occur even in a problem with only inequality constraints. Consider a problem with nonnegative general integer variables and the following constraints (as well as others, and other variables): \begin{align*} y_{1}+y_{2} +y_{3} & \le 2\\ y_{2}+y_{4} + y_{5} & \le 4\\ y_{1}+y_{3} + y_{4} + y_{5}& \ge 6 \end{align*} Although all the constraints are inequalities, the feasible region will live in the hyperplane $y_2 = 0$, and thus be less than full dimension. This points to why I said the condition in the previous paragraph is sufficient rather than necessary and sufficient for $y^0$ to be in the relative interior of the integer hull. In this example, there is no way to get positive slack in all three of the constraints (or, in fact, in any one of them) without violating the nonnegativity restriction on $y_2$.<br /><br />Yesterday, I listed a few things one could try in the hope of getting a core point $y^0$ in the relative interior of the integer hull. Here are a few others that occurred to me. (Warning: I'm going to use the term "slack variable" for both slacks and surpluses.)<br /><ul><li>Tighten all inequality constraints (including variable bounds) and solve the LP relaxation of the tightened problem. (Feel free to change the objective function if you wish.) If you find a feasible solution $y^0$, it will be in the relative interior of the LP hull, and quite possibly in the integer hull. Good news: It's easy to do. 
Bad news: Even a modest tightening might make the problem infeasible (see example above).</li><li>Insert slack variables in all constraints, including variable bounds. That means $0 \le y_i \le u_i$ would become \begin{align*} y_{i}-s_{i} & \ge0\\ y_{i}+t_{i} & \le u_{i} \end{align*} where $s_i \ge 0$ and $t_i \ge 0$ are slack variables. Maximize the minimum value of the slacks over the LP relaxation of the modified problem. Good news: If the solution has positive objective value (meaning all slacks are positive), you have a relative interior point of at least the LP hull. Bad news: An unfortunate combination of inequalities, like the example above, may prevent you from getting all slacks positive. </li></ul>Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-72700011616046427952018-01-17T18:15:00.000-05:002018-01-17T18:15:15.038-05:00Finding a "Core Point"In a famous (or at least relatively famous) paper [1], Magnanti and Wong suggest a method to accelerate the progress of <a href="https://en.wikipedia.org/wiki/Benders_decomposition" target="_blank">Benders decomposition</a> for certain mixed-integer programs by sharpening "optimality" cuts. Their approach requires the determination of what they call a <i>core point</i>. I'll try to follow their notation as much as possible. Let $Y$ be the set of integer-valued vectors satisfying all the constraints that involve solely the integer variables. Basically, $Y$ would be the feasible region for the problem if the continuous variables did not exist. A core point is a point $y^0$ in the <a href="https://en.wikipedia.org/wiki/Relative_interior" target="_blank">relative interior</a> of the <a href="https://en.wikipedia.org/wiki/Convex_hull" target="_blank">convex hull</a> of $Y$.<br /><br />I'll assume that any reader who made it this far knows what a convex hull is. 
That said, I should point out that the convex hull of $Y$ is <i>not</i> the feasible region of the LP relaxation of $Y$. The feasible region of the LP relaxation is frequently termed the "LP hull", whereas the convex hull of $Y$ is typically referred to as the "integer hull". The integer hull is a subset (typically a proper subset) of the LP hull. Knowing what the integer hull is basically makes a MIP model pretty easy: solving a linear program over the integer hull yields the integer optimum. Since it's pretty well known that most MIP models are not easy, one can infer that in most cases the integer hull is not known (and not easy to figure out).<br /><br />Mathematicians will know the term "relative interior", but it may not be familiar to OR/IE/MS people trying to solve MIP models. In Euclidean spaces, a point belongs to the interior of a set if there's a ball centered around that point and contained entirely in the set. In lay terms, you're at an interior point if you can move slightly in any direction without leaving the set. In our context, that set is the integer hull, a convex polyhedron. The catch is that if the set is not full dimensional, there is no interior (or, more properly, the interior is empty). Picture the unit cube in $\mathbb{R}^3$ as the integer hull. The point $y^0 = (0.5, 0.5, 0.5)$ is in the interior; you can move a bit in any direction and stay in the cube. Now add the constraint $y_3 = 0.5$, which flattens the integer hull to a square (still containing $y^0$). You can move a bit in the $y_1$ and $y_2$ directions and stay in the square, but any vertical movement (parallel to the $y_3$ axis) and you're no longer feasible. That brings us to the relative interior. A point is in the relative interior of a set if it can be surrounded by a ball in the largest subspace for which the set is full dimensional, with the ball staying in the set. 
Returning to our example, before adding the equality constraint I could surround $y^0$ with a sphere contained in the unit cube. After adding the equality constraint, making the feasible region a square, I can no longer surround $y^0$ with a feasible sphere, but I can surround it with a feasible circle in the $y_3 = 0.5$ plane. So $y^0$ is at least in the relative interior of the integer hull after adding the equation constraint.<br /><br />Back to Magnanti-Wong. To use their technique, you need to know a "core point" up front. In their paper, they mention that in some cases a core point will be obvious ... but in many cases it will not be. So I'll mention some techniques that <i>probably</i> yield a core point (but no general guarantee).<br /><ul><li>If you happen to know two or more integer-feasible points, average them. The average will be feasible, and unless all those points live on the same <a href="https://en.wikipedia.org/wiki/Facet_(geometry)" target="_blank">facet</a> of the integer hull, you should get a relative interior point.</li><li>Pick an objective function and optimize it (maximize or minimize, it doesn't matter) over $Y$, ignoring the continuous variables and any constraints involving them from the original problem. Do this a few times: minimize and maximize the same objective, switch the objective, iterate; or just minimize (or maximize) a different objective each time. I might use randomly generated objective functions, but an easy alternative is to minimize $y_1$, then maximize $y_1$, then minimize $y_2$, etc. Average the solutions you get. This is guaranteed to belong to the integer hull, and almost surely (at least with random objectives and multiple iterations) to the relative interior of the integer hull. Bad news: you just solved a bunch of integer programs, which might be a trifle time-consuming.</li><li>Use the previous method, but optimize over the LP hull (relaxing the integrality constraints on $y$). Again, average the solutions you get. 
Good news: Solving a handful of LPs is typically much less painful than solving a handful of IPs. Bad news: Since the LP hull is a superset of the integer hull, and since your LP solutions are all vertices of the LP hull, there's at least a chance that the average of them lives inside the LP hull but outside the integer hull. That said, I've used this method once or twice and not lost any sleep. If your core point happens to fall outside the integer hull, I don't think it will cause any problems; it just probably won't make your Benders decomposition solve any faster.</li><li>Generate a bunch of random integer vectors of the correct dimension, and filter out those that do not satisfy the constraints defining $Y$. Average the survivors. Good news: Each of the surviving integer vectors will belong to the integer hull, so your averaged core point $y^0$ definitely will as well. Also, generating random integer vectors is pretty easy. Bad news: If the integer hull is less than full dimension, the probability of a random vector falling in it is zero, so the likelihood of any "survivors" is negligible. Mitigating news: In some cases you can randomly generate an integer vector and then tweak it to ensure it satisfies the constraints that are keeping the integer hull from being full dimension. For instance, suppose that you have a constraint of the form $\sum_i y_i = B$ where $B$ is an integer constant. Generate a random integer vector $\tilde{y}$, replace $\tilde{y}_1$ with $B-\sum_{i>1}\tilde{y}_i$, and you have a vector satisfying that constraint. If you can pull off similar tweaks for other equation constraints (staying integer-valued and not violating bounds on the variables), maybe you can get lucky and find a few integer-feasible points. In fact, even if you can only find one survivor (before exhausting your patience), you might get lucky and have it be in the relative interior of the integer hull. 
(You won't know this, but you can try using it as your core point and hope for the best.)</li></ul><br /><br />[1] Magnanti, T. L. & Wong, R. T., Accelerating Benders Decomposition: Algorithmic Enhancement and Model Selection Criteria. <i>Operations Research</i>, <b>1981</b>, <i>29</i>, 464-484.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com7tag:blogger.com,1999:blog-8781383461061929571.post-77918086555135353452018-01-01T19:37:00.002-05:002018-01-01T19:37:50.636-05:00Ordering Index Vector with Java StreamsI bumped up against the following problem while doing some coding in Java 8 (and using <a href="https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html" target="_blank">streams</a> where possible). Given a vector of objects $x_1, \dots, x_N$ that come from some domain having an ordering $\preccurlyeq$, find the vector of indices $i_1, \dots, i_N$ that sorts the original values into ascending order, i.e., such that $x_{i_1} \preccurlyeq x_{i_2} \preccurlyeq \cdots \preccurlyeq x_{i_N}$. I'm not sure there's an "official" name for that vector of indices, but I've seen it referred to more than once as the "ordering index vector" (hence the name of this post).<br /><br />One of the nice features of the Java stream package is that it is really easy to sort streams, either using their natural ordering (where applicable) or using a specified ordering. As best I can tell, though, there is (at least currently) no built-in way to find out the original indices or positions of the sorted results. Fortunately, it didn't take too long to hack something that works. It may not be the most elegant way to get the ordering vector, but I'll share it anyway in case someone finds it useful.<br /><br />My example will sort a vector of doubles, since that was what I was trying to do when I was forced to come up with this code. With fairly obvious modifications, it should work for sorting vectors of other types. Here is the code. 
Please try not to laugh.<br /><br /><div class="scroll"><pre style="background-color: #e0eaee; color: black; font-family: "courier new" , monospace; font-size: 10pt;"><span style="color: #838183; font-style: italic;">// Create a vector of values whose order is desired.</span><br /><span style="color: #0057ae;">double</span><span style="color: black;">[]</span> vals <span style="color: black;">= ...</span><br /><span style="color: #838183; font-style: italic;">// Get the sort order for the values.</span><br /><span style="color: #0057ae;">int</span><span style="color: black;">[]</span> order <span style="color: black;">=</span><br /> IntStream<span style="color: black;">.</span><span style="color: #010181;">range</span><span style="color: black;">(</span><span style="color: #b07e00;">0</span><span style="color: black;">,</span> vals<span style="color: black;">.</span>length<span style="color: black;">)</span><br /> <span style="color: black;">.</span><span style="color: #010181;">boxed</span><span style="color: black;">()</span><br /> <span style="color: black;">.</span><span style="color: #010181;">sorted</span><span style="color: black;">((</span>i<span style="color: black;">,</span> j<span style="color: black;">) -></span> Double<span style="color: black;">.</span><span style="color: #010181;">compare</span><span style="color: black;">(</span>vals<span style="color: black;">[</span>i<span style="color: black;">],</span> vals<span style="color: black;">[</span>j<span style="color: black;">]))</span><br /> <span style="color: black;">.</span><span style="color: #010181;">mapToInt</span><span style="color: black;">(</span>x <span style="color: black;">-></span> x<span style="color: black;">)</span><br /> <span style="color: black;">.</span><span style="color: #010181;">toArray</span><span style="color: black;">();</span><br /><span style="color: #838183; font-style: italic;">// The sorted list is vals[order[0]], vals[order[1]], ...</span><br /></pre></div><br />The 
stream has to take a winding trip through the Ugly Forest to get this to work. We start out with an <span style="font-family: "Courier New", Courier, monospace;">IntStream</span>, because that is the easiest way to get a stream of indices. Unfortunately, sorting using a specified comparator is not supported by <span style="font-family: "Courier New", Courier, monospace;">IntStream</span>, so we have to "box" it to get a stream of integers (<span style="font-family: "Courier New", Courier, monospace;">Stream<Integer></span>). (Yes, fans, <span style="font-family: "Courier New", Courier, monospace;">IntStream</span> and a stream of integers are two separate things.) The stream of integers is sorted by comparing the values they index in the original vector of double precision reals. Then we use the identity function to map the stream of integers back to an <span style="font-family: "Courier New", Courier, monospace;">IntStream</span> (!) so that we can easily convert it to a vector of integers (meaning <span style="font-family: "Courier New", Courier, monospace;">int[]</span>, not <span style="font-family: "Courier New", Courier, monospace;">Integer[]</span>).<br /><br />When I said this might not be the "most elegant" approach, what I meant was that it looks clunky (at least to me). I look forward to being schooled in the comments section.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-68684148194195461372017-12-02T15:19:00.000-05:002017-12-02T15:19:32.968-05:00Creating A New MIME TypeI struggled a bit this afternoon creating a new <a href="https://en.wikipedia.org/wiki/Media_type" target="_blank">MIME type</a> and associating it with a particular application, so I'm going to archive the solution here for future reference.
This was on a Linux Mint system, but I found the key information in a <a href="https://help.gnome.org/admin/system-admin-guide/stable/mime-types-custom-user.html.en" target="_blank">GNOME documentation page</a>, so I suspect it works for Ubuntu and other systems using a GNOME-based desktop (and maybe even more generally?).<br /><br />The application in question is <a href="https://www.freefilesync.org/" target="_blank">FreeFileSync</a>, an open-source program (that I definitely recommend) for syncing directories. I use FFS to mirror certain directories in my home partition to an external drive. For each such directory, I have an FFS configuration, created in the FFS GUI and stored in a separate XML file (with extension <span style="font-family: "courier new" , "courier" , monospace;">.ffs_gui</span>).<br /><br />Unlike applications installed from <span style="font-family: "courier new" , "courier" , monospace;">.deb</span> files (where the installer handles the MIME-type stuff for you), FFS comes as an archive that you simply extract and park somewhere, so it does not create its own MIME-type or associate itself with an existing one. On my PC, Mint (with the MATE desktop environment) associated the extension <span style="font-family: "courier new" , "courier" , monospace;">.ffs_gui</span> with XML files in general, so making FFS the default program for <span style="font-family: "courier new" , "courier" , monospace;">.ffs_gui</span> files would have made it the default for all XML files, which is clearly undesirable. So I decided to create a MIME type for it. I also decided to store the necessary information in my home partition, rather than changing system files, so that a future system upgrade would not clobber the changes.<br /><br />Google searches produced a variety of (rather complicated) solutions, one or two of which I tried but failed to get to work. Fortunately, the steps on the GNOME page mostly did work.
First, I created <span style="font-family: "courier new" , "courier" , monospace;">~/.local/share/mime/packages/application-x-freefilesync.xml</span> as instructed, naming the new MIME type <span style="font-family: "courier new" , "courier" , monospace;">application/x-freefilesync</span> and setting the glob pattern to <span style="font-family: "courier new" , "courier" , monospace;">*.ffs_gui</span>. Second, I created a new desktop file (<span style="font-family: "courier new" , "courier" , monospace;">~/.local/share/applications/freefilesync.desktop</span>) using the new MIME type and pointing to the FFS executable. Finally, I ran<br /><br /><div style="text-align: center;"><span style="font-family: "Courier New", Courier, monospace;">update-mime-database ~/.local/share/mime</span></div><br />and<br /><br /><div style="text-align: center;"><span style="font-family: "Courier New", Courier, monospace;">update-desktop-database ~/.local/share/applications</span></div><br />(as myself, not as root).<br /><br />The GNOME page suggests using the <span style="font-family: "Courier New", Courier, monospace;">gio</span> program to verify the results, but I don't have that installed on my system. 
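For the record, the MIME definition file from the first step follows the freedesktop.org shared-mime-info schema. Mine looked roughly like the sketch below (the text of the <span style="font-family: "Courier New", Courier, monospace;">comment</span> element is just a human-readable description, so use whatever you like):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns="http://www.freedesktop.org/ns/shared-mime-info">
  <mime-type type="application/x-freefilesync">
    <!-- Human-readable description shown by file managers. -->
    <comment>FreeFileSync configuration</comment>
    <!-- Glob pattern tying the new type to the .ffs_gui extension. -->
    <glob pattern="*.ffs_gui"/>
  </mime-type>
</mime-info>
```

The desktop file is an ordinary desktop entry whose <span style="font-family: "Courier New", Courier, monospace;">MimeType=</span> line lists <span style="font-family: "Courier New", Courier, monospace;">application/x-freefilesync;</span> and whose <span style="font-family: "Courier New", Courier, monospace;">Exec=</span> line points at the FreeFileSync executable.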
What did work was to open a terminal in the directory where the FFS configuration files were stored, pick one (say, <span style="font-family: "Courier New", Courier, monospace;">foo.ffs_gui</span>) and run<br /><div style="text-align: center;"><br /></div><span style="font-family: "Courier New", Courier, monospace;">xdg-mime query filetype foo.ffs_gui</span><br /><br />to query its MIME-type, confirming that the result was <span style="font-family: "Courier New", Courier, monospace;">application/x-freefilesync</span>.<br /><br />After that, I right-clicked <span style="font-family: "Courier New", Courier, monospace;">foo.ffs_gui</span> in the file manager (Caja, on my system), chose <span style="font-family: "Courier New", Courier, monospace;">Properties > Open With</span>, added FFS and made it the default for all <span style="font-family: "Courier New", Courier, monospace;">.ffs_gui</span> files, and that was that.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-3212439075102612172017-11-13T14:37:00.000-05:002017-11-13T14:37:18.037-05:00Benders Decomposition with Generic CallbacksBrace yourself. This post is a bit long-winded (and arguably geekier than usual, which is saying something). Also, it involves CPLEX 12.8, which will not ship until some time next month.<br /><br />I have an updated version of an old example, solving a fixed charge transportation problem using Benders decomposition. 
The example (using Java, naturally) is scattered across several posts, going back a ways:<br /><ul><li>the <a href="https://orinanobworld.blogspot.com/2012/07/benders-decomposition-in-cplex.html" target="_blank">original example</a> was posted in 2012;</li><li>an <a href="https://orinanobworld.blogspot.com/2014/08/updated-benders-example.html" target="_blank">updated version</a> was posted in 2014; and</li><li>when IBM added automated Benders decomposition to CPLEX 12.7 in 2016, I <a href="https://orinanobworld.blogspot.com/2016/12/support-for-benders-decomposition-in.html" target="_blank">revised the example</a> yet again.</li></ul>Now here I am breaking the even year pattern by doing yet another version in 2017. If you want to see the MIP model written out in mathematical notation, it's in the 2012 post. If you want to see the code for the most recent version, which contains both a "manual" decomposition (using callbacks) and a few versions of the new (in 12.7) automatic decomposition feature, there's a link to the repository in the 2016 post. At this point in time, the 2014 post is not worth reading.<br /><br />Before I get into the updated example code, let me outline some key differences between the old ("legacy") callback system and the new ("generic") callback system.<br /><br /><b>When is a callback called by CPLEX?</b><br /><ul><li>Legacy: This is dictated by the type of callback. A branch callback is called when CPLEX is ready to split a node into children, a lazy constraint callback is called when CPLEX identifies what it thinks is a new integer-feasible solution, and so on.</li><li>Generic: When you attach your callback to the model (<span style="font-family: "courier new" , "courier" , monospace;">IloCplex</span> instance), you specify via a bit mask in which of several "contexts" it should be called. 
As of this writing (things do have a habit of changing), the contexts are <span style="font-family: "courier new" , "courier" , monospace;">Candidate</span> (a new integer-feasible candidate for incumbent solution has been found), <span style="font-family: "courier new" , "courier" , monospace;">GlobalProgress</span> (progress has occurred at the "global" level -- presumably something you might see in the node log), <span style="font-family: "courier new" , "courier" , monospace;">LocalProgress</span> (a thread has made what it thinks is progress, but has not yet passed it back to the controller), <span style="font-family: "courier new" , "courier" , monospace;">Relaxation</span> (CPLEX has found a solution that might not be integer-feasible, such as the solution to a node LP), <span style="font-family: "courier new" , "courier" , monospace;">ThreadDown</span> (CPLEX deactivated a thread) and <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> (CPLEX activated a thread).</li></ul>If you are fuzzy about the difference between local and global progress, one example is a thread that finds a new incumbent solution, one that is better than the incumbent that existed when the thread was launched, but not as good as the current incumbent (recently found by a different thread). That is local progress (in the first thread) but not global progress (the incumbent found by the first thread will be ignored when it gets around to reporting it to the controller). <br /><ul></ul><b>Attaching callbacks to models:</b><br /><ul><li>Legacy: You attach separate callbacks of each desired type (info callback, branch callback, user cut callback, ...).</li><li>Generic: You attach a single callback that handles all supported operations. Attaching a second callback automatically detaches the first one. 
When attaching the callback, you specify via a bitmap from which context(s) the callback is to be called.</li></ul><b>Callback classes:</b><br /><ul><li>Legacy: Your class extends one of the existing callbacks. For instance, if you want a lazy constraint callback, you extend the <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.LazyConstraintCallback</span> class and override the abstract <span style="font-family: "courier new" , "courier" , monospace;">main()</span> method.</li><li>Generic: Your class implements the <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.Callback.Function</span> interface and overrides the abstract <span style="font-family: "courier new" , "courier" , monospace;">invoke()</span> method.</li></ul><b>Calling functions inside the callback:</b><br /><ul><li>Legacy: You call one of the methods for the class you extended. For example, in a branch callback (extending <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.BranchCallback</span>), you would call one of the overloads of <span style="font-family: "courier new" , "courier" , monospace;">makeBranch()</span> to create a new branch.</li><li>Generic: When CPLEX calls the invoke() method, it passes in as an argument an instance of <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.Callback.Context</span>. This object has a number of methods you can invoke to find out the context, access a candidate solution (if one exists), get information about what is going on (including the index of the thread in which it was invoked, and the total number of threads in use), etc. 
Basically, this is your key to the castle.</li></ul><br /><ul></ul><h3>Benders decomposition</h3><br />For Benders decomposition, we will be focused on the <span style="font-family: "courier new" , "courier" , monospace;">Candidate</span> context (CPLEX thinks it has an integer-feasible solution, which we need to test) and the <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> and <span style="font-family: "courier new" , "courier" , monospace;">ThreadDown </span>contexts, for reasons I'll eventually get to.<br /><br />I wrote about thread safety in my previous post. Let me pin down why this is critical when using Benders decomposition. With Benders, we have a master problem and one or more subproblems. (I'll stick with one subproblem here, for concreteness.) The master problem model itself is never modified. Benders cuts are added to the working copy of the master problem, from the callback. Several threads could be generating Benders cuts more or less simultaneously, but it is the responsibility of CPLEX (not your code) to make sure that multiple threads adding cuts don't trip over each other.<br /><br />On the other hand, the way we generate cuts is to use the proposed incumbent solution to modify the constraint limits in the subproblem (or the subproblem objective, if you're one of those people who prefers to work with the dual of the actual subproblem). It's our code (specifically, the callback) making those changes, and if changes are being made concurrently by multiple threads, bad things can happen. Just to verify this is not hypothetical, I converted my original Benders code to the new callback system and ran it without doing anything special regarding thread safety. It crashed the JVM in a matter of seconds. Lesson learned.<br /><br />So what are our options? One is to use a single copy of the subproblem and impose a locking mechanism that prevents more than one thread from accessing the subproblem at a time. 
That turns out to be the easiest to program in Java (at least in my opinion), but it slows the solver down due to threads being blocked. What seems to be the general wisdom for using locks is that you should only use them on chunks of code that execute quickly. Modifying what might be a large LP model, solving it, and finding a dual ray or Farkas certificate if it is infeasible or the dual solution if it is feasible does not meet the "execute quickly" criterion.<br /><br />Another option is to generate an independent copy of the subproblem each time a thread needs it. That seems wastefully slow to me. A third, more promising approach is to generate one copy of the subproblem per thread and store it, reusing it each time the callback is called in that thread. This again eats CPU cycles generating the subproblem multiple times, but once per thread is a lot less than once each time CPLEX calls the callback. A downside, however, is that this could consume a prohibitive amount of memory if the subproblem is quite large.<br /><br /><h3>The revised Benders example</h3><br />The latest version of my Benders example is available at <a href="https://gitlab.msu.edu/orobworld/BendersExample3">https://gitlab.msu.edu/orobworld/BendersExample3</a>. 
In addition to my code (and Java), you will need CPLEX 12.8 (which I am told is coming out in December) and the open-source <a href="https://commons.apache.org/proper/commons-cli/" target="_blank">Apache Commons CLI library</a> (for processing command line options).<br /><br />I've added some command line options to let you alter a number of things without having to alter or comment out code: which models to run; what random seed to use; how many trials to run (default 1); whether to show CPLEX log output for neither problem, just the master, or both master and subproblem; whether to show the formulas of the cuts generated (or just settle for messages telling you when a cut is added); and whether to print the full solution (which warehouses are used, what the shipment volumes are). Run the program with "-h" or "--help" on the command line to get the full list of options.<br /><br />The new code has two versions of the "manual" (callback-based) Benders method. The "-m" command line option runs the default "manual" model, while the "-m2" option runs what I inelegantly named the "thread-local manual" model. The difference is in the approach taken to assure thread safety. I'll briefly describe that below, and of course you can pore over the code if you want more details.<br /><br /><h4>Locking</h4><br />The default manual model avoids collisions between threads by locking parts of the code so that one thread cannot execute certain portions when any other thread is executing certain portions. There are multiple ways to do this in Java, but a simple one (which I use here) is to <i>synchronize</i> methods. To do this, all you have to do is add the keyword <span style="font-family: "courier new" , "courier" , monospace;">synchronized</span> to the method declarations. 
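The effect of the keyword is easy to demonstrate outside of CPLEX. In the self-contained sketch below (the class names are mine and have nothing to do with the Benders code), two threads hammer on a shared counter; because the methods are synchronized, each thread must acquire the object's monitor before entering, so no increments are lost:

```java
public class SyncDemo {
    // Hypothetical shared resource: without 'synchronized', concurrent
    // increments could interleave and overwrite each other.
    static class Counter {
        private long count = 0;
        public synchronized void increment() { count++; }
        public synchronized long get() { return count; }
    }

    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) counter.increment();
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        // With synchronization, the total is exactly 2 x 100,000.
        System.out.println(counter.get());
    }
}
```

Delete the two <span style="font-family: "courier new" , "courier" , monospace;">synchronized</span> keywords and the printed total will usually fall short of 200000, because unsynchronized increments can interleave.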
(In addition to synchronizing methods, you can synchronize smaller chunks of code, but I did not use that feature.)<br /><br />My <span style="font-family: "courier new" , "courier" , monospace;">ManualBenders</span> class has a private inner class named <span style="font-family: "courier new" , "courier" , monospace;">BendersCallback</span>. The callback is entered only from the <span style="font-family: "courier new" , "courier" , monospace;">Candidate</span> context. Its sole method is the <span style="font-family: "courier new" , "courier" , monospace;">invoke()</span> method mandated by the <span style="font-family: "courier new" , "courier" , monospace;">IloCplex.Callback.Function</span> interface described earlier. By declaring that method<br /><br /><div style="text-align: center;"><span style="font-family: "courier new" , "courier" , monospace;">public synchronized void invoke(final IloCplex.Callback.Context context)</span></div><br />I ensure that no thread can invoke the callback while another thread is in the callback. The good news is that it works (prevents threads from interfering with each other) and it's incredibly easy to do. The bad news is that it results in threads blocking other threads. The subproblem in the example is pretty small. If it were slow to solve (or large enough to make updating bounds meaningfully slower), performance would be even worse.<br /><br /><h4>Thread-specific copies of the subproblem</h4><br />The "thread-local" model avoids thread collisions by giving each thread a separate copy of the subproblem to play with. I created a class (cleverly named <span style="font-family: "courier new" , "courier" , monospace;">Subproblem</span>) to hold instances of the subproblem. 
The callback (instance of class <span style="font-family: "courier new" , "courier" , monospace;">BendersCallback2</span>) is now entered from three different contexts, <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span>, <span style="font-family: "courier new" , "courier" , monospace;">ThreadDown</span> and <span style="font-family: "courier new" , "courier" , monospace;">Candidate</span>. As before, in the <span style="font-family: "courier new" , "courier" , monospace;">Candidate</span> context we do the usual things: use the candidate solution to the master problem to modify constraint limits in the subproblem; solve the subproblem; depending on whether it is feasible, and what its optimal objective value is, either accept the new incumbent or reject it by adding a cut. We use calls in the <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> context to create and store a copy of the subproblem for the newly activated thread, and similarly we use calls in the <span style="font-family: "Courier New",Courier,monospace;">ThreadDown</span> context to delete the local subproblem copy and recover its memory.<br /><br />So where do we store subproblem copies so that only the thread invoking the callback can see its designated copy? In their <span style="font-family: "courier new" , "courier" , monospace;">BendersATSP2.java</span> example code, IBM creates a vector, of length equal to the maximum number of threads that will be used, to store subproblems.
The <span style="font-family: "courier new" , "courier" , monospace;">context</span> argument to the callback provides a method,<span style="font-family: "courier new" , "courier" , monospace;"> context.getIntInfo(IloCplex.Callback.Context.Info.ThreadId)</span>, that returns the zero-based index number of the thread invoking the callback, which is used to pluck the correct copy of the subproblem from the vector of copies.<br /><br />I took a slightly different approach in my code, using a generic Java class named <span style="font-family: "courier new" , "courier" , monospace;">ThreadLocal</span> that exists precisely to hold objects local to a given thread. My <span style="font-family: "courier new" , "courier" , monospace;">BendersCallback2</span> class contains a private field declared<br /><br /><div style="text-align: center;"><span style="font-family: "courier new" , "courier" , monospace;">private final ThreadLocal<Subproblem> subproblem</span></div><br />to hold the copy of the subproblem belonging to a given thread. When the callback is called from the <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> context, <span style="font-family: "courier new" , "courier" , monospace;">subproblem.set()</span> is used to store a newly constructed copy of the subproblem; when called from the <span style="font-family: "courier new" , "courier" , monospace;">Candidate</span> context, <span style="font-family: "courier new" , "courier" , monospace;">subproblem.get()</span> is used to access the copy of the subproblem belonging to the thread.
(Since this is all rather new, both to CPLEX and to me, and since I have a suspicious nature, I stuck some extra code in the example to ensure that <span style="font-family: "Courier New",Courier,monospace;">subproblem</span> is empty when called from <span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> and not empty when called from <span style="font-family: "courier new" , "courier" , monospace;">Candidate</span> or <span style="font-family: "Courier New",Courier,monospace;">ThreadDown</span>. There are comments indicating the superfluous code. Feel free to omit it from your own projects.)<br /><br />I thought that when a thread was deleted, the <span style="font-family: "courier new" , "courier" , monospace;">subproblem</span> variable would be deleted. Since <span style="font-family: "Courier New",Courier,monospace;">subproblem</span> is (I think) the only pointer to the <span style="font-family: "courier new" , "courier" , monospace;">Subproblem</span> instance for that thread, I expected the memory used by the <span style="font-family: "courier new" , "courier" , monospace;">Subproblem</span> to be recovered during garbage collection. Apparently I was wrong: my initial attempt leaked memory (enough to shut down the JVM after solving a few problems). So I added an invocation of the callback from the <span style="font-family: "courier new" , "courier" , monospace;">ThreadDown</span> context (thread deactivation), at which point the callback deletes the <span style="font-family: "courier new" , "courier" , monospace;">Subproblem</span> instance. That seems to have cured the memory leak. 
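Stripped of all the CPLEX machinery, the pattern (and the need for explicit cleanup) looks like this in plain Java. Here <span style="font-family: "courier new" , "courier" , monospace;">Worker</span> is a hypothetical stand-in for the <span style="font-family: "courier new" , "courier" , monospace;">Subproblem</span> class, and the comments map each step to the corresponding callback context:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ThreadLocalDemo {
    // Hypothetical stand-in for a per-thread subproblem copy.
    static class Worker {
        final String owner;
        Worker(String owner) { this.owner = owner; }
    }

    // Each thread sees only the value it stored itself.
    private static final ThreadLocal<Worker> LOCAL = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Runnable task = () -> {
            String me = Thread.currentThread().getName();
            LOCAL.set(new Worker(me));                  // "ThreadUp": create this thread's copy
            log.add(me + " sees " + LOCAL.get().owner); // "Candidate": use this thread's copy
            LOCAL.remove();                             // "ThreadDown": release it explicitly
        };
        Thread t1 = new Thread(task, "t1");
        Thread t2 = new Thread(task, "t2");
        t1.start(); t2.start();
        t1.join(); t2.join();
        log.stream().sorted().forEach(System.out::println);
    }
}
```

Each thread's <span style="font-family: "courier new" , "courier" , monospace;">get()</span> returns only the object that thread stored, and the explicit <span style="font-family: "courier new" , "courier" , monospace;">remove()</span> plays the same role as my <span style="font-family: "courier new" , "courier" , monospace;">ThreadDown</span> cleanup.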
So, in summary, the callback is called in three contexts:<br /><ul><li><span style="font-family: "courier new" , "courier" , monospace;">ThreadUp</span> -- create a copy of the subproblem for the thread;</li><li><span style="font-family: "courier new" , "courier" , monospace;">Candidate</span> -- test a candidate incumbent and, if necessary, generate a Benders cut; and</li><li><span style="font-family: "courier new" , "courier" , monospace;">ThreadDown</span> -- delete the copy of the subproblem belonging to that thread.</li></ul>I'm comfortable with this approach, but you might prefer the subproblem vector used in the <span style="font-family: "courier new" , "courier" , monospace;">BendersATSP2.java</span> example. <br /><h3>Performance Comparison</h3><br />I ran five solution methods (Benders with locking, Benders with copies of the subproblem for each thread, and three versions of the automatic Benders decomposition added to CPLEX 12.7) on 10 instances of the fixed charge transportation problem. That's a small sample, based on a single type of problem, on a single machine (four cores, 4 GB of RAM), so I wouldn't read too much into the results. Nonetheless, I thought I would close by comparing solution times. The three automatic methods provided by CPLEX ("annotated", "automatic" and "workers", described in <a href="https://orinanobworld.blogspot.com/2016/12/support-for-benders-decomposition-in.html" target="_blank">my 2016 post</a>) had virtually indistinguishable times, so to make the plot more reasonable I am including only the "automatic" method (labeled "Auto"), along with my two approaches ("Local" for the method using copies of the subproblem for each thread and "Locked" for the method using synchronization).<br /><br />The plot below shows, as a function of wall clock time, the fraction of trials that reached proven optimality by the specified time. 
The automatic approach and my manual approach with thread-local subproblems were competitive (with the automatic approach having a slight advantage), while the manual approach with synchronized class methods lagged rather badly. Although I cannot say with certainty, I am rather confident that is a result of threads engaged in solving a subproblem blocking other threads wanting to do the same.<br /><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-rqzJZucPnYs/Wgd-zuNrW1I/AAAAAAAACi8/EtZ_8OFABfIxoaFq6-_a_kOnfrMRaZu0ACLcBGAs/s1600/runtimes.png" imageanchor="1"><img alt="performance plot (completion rate versus run time, by method)" border="0" data-original-height="429" data-original-width="483" height="568" src="https://3.bp.blogspot.com/-rqzJZucPnYs/Wgd-zuNrW1I/AAAAAAAACi8/EtZ_8OFABfIxoaFq6-_a_kOnfrMRaZu0ACLcBGAs/s640/runtimes.png" title="" width="640" /></a></div><br />Finally, just to drive home the realities of multiple threads, here is a sample of the output generated by the synchronized (locked) approach during one run.<br /><br /><pre>>>> Adding optimality cut<br />>>> Accepting new incumbent with value 3598.1162756564636<br />>>> Accepting new incumbent with value 3598.1162756564636<br />>>> Accepting new incumbent with value 4415.750134379851<br />>>> Accepting new incumbent with value 4415.750134379851<br />>>> Accepting new incumbent with value 4411.334611014113<br />>>> Accepting new incumbent with value 4411.334611014113<br />>>> Accepting new incumbent with value 4700.008131082763<br />>>> Accepting new incumbent with value 4700.008131082763<br />>>> Adding optimality cut<br />>>> Adding optimality cut<br />>>> Adding optimality cut<br />Final solver status = Optimal<br /># of warehouses used = 14<br />Total cost = 3598.1162756564636<br />Solution time (seconds) = 143.090</pre><br />The objective here is minimization. 
The messages beginning ">>>" are printed by the callback. Notice that after the optimal solution is found and accepted (in two different threads?), <i>provably suboptimal solutions</i> (solutions with inferior objective values) appear to be accepted several times. This also happens with the thread-local approach. It may well be happening in the various automated Benders approaches, but since they do not print progress messages there is no way to tell. This illustrates the difference between "local" and "global" progress. It happens because the various worker threads do not communicate with each other, and communicate only sporadically with the controller thread. So if thread 3 finds the optimal solution and has it accepted, thread 1 does not know this and continues cranking through its portion of the search tree, finding new "incumbents" that are improvements over the best solution thread 1 has seen but not necessarily improvements over the best global solution. The inferior solutions are accepted in the threads in which they were found, but the controller knows not to accept them as incumbents (because it already has a better incumbent). This behavior is not an error; it's just another indication why adding threads does not improve performance proportionally.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-16737771302254956262017-11-04T20:13:00.000-04:002017-11-08T08:25:46.533-05:00Thread SafetyAs I noted in <a href="https://orinanobworld.blogspot.com/2017/11/cplex-128-generic-callbacks.html" target="_blank">yesterday's post</a>, one of the major changes associated with the new "generic" callback structure in CPLEX is that users now bear the responsibility of making their callbacks thread-safe. As I also noted yesterday, this is pretty new stuff for me. 
So I'm going to try to share what I know about thread safety, but bear in mind that I don't know all that much (and don't know what I don't know). In a subsequent post, I'll share an updated example of Benders decomposition using the new callback structure.<br /><br /><h3>What is a thread?</h3><br />I'll refer you to the Wikipedia for a <a href="https://en.wikipedia.org/wiki/Thread_(computing)" target="_blank">detailed definition</a>. Basically, a thread is a chunk of code that is set up to be run asynchronously. Typically, the point of creating a thread is to allow the code to run in parallel with other parts of the parent program. The operating system swaps threads in and out of processor cores. Part of the reason for creating additional threads is to exploit the presence of multiple processor cores, but that's not the only reason. Consider a program designed to do some computationally intensive operation (such as solving an integer program) and assume the program has a graphical user interface (GUI). Chances are the GUI runs in a different thread from the computational portion of the program, even if it is running on a single-core processor. Otherwise, with everything in one thread, the GUI would become unresponsive for the entire time the program was crunching numbers. Among other things, that would make it impossible to use a GUI control or menu item to abort the computation if, say, you were jonesing to check your Twitter feed.<br /><br />Before continuing, I think it's worth noting three things here. The first is that, in an era of multi-core computer processors, multithreaded applications are increasingly common, and increasingly attractive. CPLEX defaults to using multiple threads when solving optimization problems, although you can control the number of threads used (including throttling it to a single thread) via a parameter setting. Second, performance improvement due to multithreading is sublinear.
If you go from one thread to two, the reduction in execution time is less than 50%. If you go from one thread to four, you will likely not see anywhere near a 75% reduction in processing time. This is partly due to added overhead required to set up and manage threads, and partly to the fact that threads can get in each other's way, jostling for CPU time and blocking progress of their siblings. Finally, making an application multithreaded may increase memory use, because you may need to make separate copies of some data for each thread. <br /><br /><h3>What is thread safety?</h3><br />Again, I'll refer you to a <a href="https://en.wikipedia.org/wiki/Thread_safety" target="_blank">Wikipedia definition</a>. The key concept, I think, is that you want to defend against a couple of dangers. The first is that threads might block each other. <a href="https://en.wikipedia.org/wiki/Deadlock" target="_blank">Deadlock</a> can occur when one thread is waiting for another thread to do something but simultaneously blocking the second thread from doing it (say, by hogging a resource). That's actually an oversimplification, in that more than two threads can be contributing to a deadlock. <a href="https://en.wikipedia.org/wiki/Starvation_(computer_science)" target="_blank">Starvation</a> occurs when some thread cannot access the resources it needs because other threads (several different threads, or one thread over and over) keep blocking it.<br /><br />The second danger, which helps explain how things like starvation or deadlock can come about, is the danger that one thread writes shared data while another thread is reading or using it. For instance, consider Benders decomposition. (This is not a hypothetical example: you'll see it in the next post, if you stick around.) Assume, as is usually the case, a MIP master problem and a single LP subproblem, and assume we are using a callback to test proposed integer feasible solutions coming from the master problem. 
When the solver thinks it has a new incumbent for the master problem, it calls the callback. The callback uses the proposed incumbent to adjust some constraint limits in the subproblem, solves the subproblem, and based on the result either accepts the incumbent or cuts it off with a Benders cut.<br /><br />Now suppose that two different threads, A and B, both get (different) proposed incumbents, with A beginning processing a little ahead of B. A modifies the LP subproblem and tries to solve it, but before it gets to the point of calling the solver, B (on a different core) starts modifying the subproblem. So when A calls the LP solver, the subproblem it is solving has a mix of some modifications A made and some modifications B made. At best, A ends up solving the wrong LP (and not knowing it). At worst, B is modifying the LP while A is trying to solve it. With CPLEX, at least, if this happens CPLEX throws an exception and the program likely grinds to a halt.<br /><br /><h3>How do we make code thread-safe?</h3><br />Good question. I don't know all the methods, but there are two fundamental techniques that I do know. The first is <a href="https://en.wikipedia.org/wiki/Lock_(computer_science)" target="_blank">locks</a>. Basically, locks are semaphores (flags, signals, whatever) that tell the system not to let any other thread touch some part of memory until the thread owning the lock is done. You write your code so that the part that will run concurrently locks shared objects, does what it needs with them, and then unlocks them. On the one hand, it's important to lock everything that needs locking. Missing one item is like trying to burglar-proof your home but then leaving the back door wide open. On the other hand, it's important to lock only what has to be locked, and only for as long as it needs to be locked. 
Hanging on to a lock for too long can block other threads, and hurt performance.<br /><br />The second technique is to avoid contention for data by giving each thread its own personal copy of the data. The mechanics of thread safety vary from language to language. In Java, each thread gets its own stack, containing local variables, and no thread can touch another thread's stack. So, for instance, if a thread starts a loop indexed by local variable <span style="font-family: "courier new" , "courier" , monospace;">i</span>, there is no danger that <span style="font-family: "courier new" , "courier" , monospace;">i</span> is touched by another thread. On the other hand, Java objects are parked in the heap, and are available to any thread that knows the addresses of the objects. So to avoid collisions between threads, you can either copy the original data to the stack for each thread and let the thread mess with its own copy, or (if the data is a Java object) create a clone of the original object for each thread. The clone will live in the heap, but only the thread for which it was created will know its address, so no other thread will screw with it.<br /><br />My modified Benders example (subject of a future post, after CPLEX 12.8 ships) will demonstrate both approaches.<br /><br />If you are a Java programmer, and if multithreading is new to you, I recommend you look at Oracle's tutorial on <a href="https://docs.oracle.com/javase/tutorial/essential/concurrency/" target="_blank">concurrency in Java</a>. It is written well, contains examples of things that can go wrong, and covers much if not all of what you need to know to handle thread safety while working with CPLEX generic callbacks. 
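To make the two techniques a bit more concrete, here is a small, self-contained Java sketch. It is purely illustrative (the class and method names are my own inventions, and there is no CPLEX in it): each worker thread accumulates into a local variable on its own stack, where no other thread can touch it, and only the brief merge into the shared total is performed while holding a lock.

```java
// Illustration only: the class and method names are invented for this demo.
// Each worker sums its own chunk into a LOCAL variable (its private copy,
// living on the thread's stack), then merges that result into the shared
// total inside a short synchronized block (the lock).
public class ThreadSafetyDemo {

    private static long total;                       // shared state on the heap
    private static final Object LOCK = new Object(); // guards 'total'

    /** Sum 1..n, splitting the range across the given number of threads. */
    static long parallelSum(long n, int threads) {
        total = 0;
        Thread[] pool = new Thread[threads];
        long chunk = n / threads;
        for (int t = 0; t < threads; t++) {
            final long lo = t * chunk + 1;
            final long hi = (t == threads - 1) ? n : (t + 1) * chunk;
            pool[t] = new Thread(() -> {
                long local = 0;                      // thread-private partial sum
                for (long i = lo; i <= hi; i++) {
                    local += i;                      // no lock needed here
                }
                synchronized (LOCK) {                // critical section kept as
                    total += local;                  // short as possible
                }
            });
            pool[t].start();
        }
        for (Thread worker : pool) {
            try {
                worker.join();                       // wait for every worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status
            }
        }
        return total;
    }
}
```

Omit the synchronized block and two threads can interleave their read-modify-write of the shared total, silently losing updates; put the lock inside the inner loop instead and the code stays correct but the threads spend most of their time waiting on each other, which is exactly the "hold locks only as long as necessary" trade-off described above.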
<br /><br />Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-22769509121523642017-11-03T14:00:00.001-04:002017-11-03T14:00:47.703-04:00CPLEX 12.8: Generic CallbacksIBM is getting ready to release CPLEX 12.8, and I had the opportunity to attend a presentation about it by Xavier Nodet at the 2017 INFORMS annual meeting. Here are links to two presentations by Xavier: <a href="https://www.slideshare.net/xnodet/cplex-optimization-studio-128-whats-new-81104775" target="_blank">CPLEX Optimization Studio 12.8 - What's New</a> and <a href="https://www.slideshare.net/xnodet/cplex-128-the-generic-callback" target="_blank">CPLEX 12.8 - the Generic Callback</a>. As with any new release, there are multiple tweaks, performance improvements and some new features. What I want to focus on in this post and the next post or two (depending on how wordy I get) is a major change to the way CPLEX implements callbacks.<br /><br />I'm going to assume any reader continuing beyond this point knows what a callback is (and, for that matter, knows what CPLEX is). The callback system used through CPLEX version 12.7.1 is now being referred to as the "legacy" callback system. A new system, beginning in version 12.8, is called the "generic" callback approach. I'll outline what I think are the key take-aways here, but if you are a callback user, I strongly suggest you check out Xavier's slide show linked above.<br /><br />The first thing to know is that, for at least the near future, CPLEX will continue to support legacy callbacks. That's important for two reasons: it may take time to adjust to the new generic callbacks (and to port code to the new system); and not everything you could do with legacy callbacks is available with generic callbacks (more on that below). The second thing to know is that you cannot mix legacy and generic callbacks in the same program. 
You have to use one or the other exclusively (or neither).<br /><br />The new generic callback approach involves trade-offs. The legacy system involves essentially two types of callbacks: "information" callbacks (asking CPLEX what's going on at various points in the solution process) and "control" callbacks (meddling with how CPLEX solves a problem). Changes between how CPLEX worked back when control callbacks were introduced and how it works now created some adventures for both IBM and users. For instance, certain control callbacks were incompatible with dynamic search, so the presence of such a callback (even if it contained no code and did nothing) would cause CPLEX to turn off dynamic search. There were some other side effects, but I'm neither qualified nor motivated to list them all. Suffice it to say that IBM decided a revamped callback system, designed from the ground up to work fully with the current version of CPLEX, was warranted.<br /><br />So here come the trade-offs. In exchange for the ability to use callbacks without having to turn off dynamic search, limit CPLEX to a single thread, or whatever, you lose some functionality. Generic callbacks are currently not implemented for continuous problems. (Again, you can still use legacy callbacks on those.) Generic callbacks do not support user-controlled branching or node selection, nor user solution of node problems. A side effect of losing the branch callbacks is that users apparently can no longer attach node data to nodes and query it later (since, in the legacy system, the node data was attached in a branch callback). If you need any of those features, you have to stick to legacy callbacks for now (while you lobby IBM and/or your congressman to get your favorite features added to the generic callback system). 
Apparently, data IBM has accumulated on which features are or are not widely used suggested that omitting branch and solve callbacks would inconvenience a small portion of users.<br /><br />Finally, here comes what I think is potentially the biggest issue: it is now the responsibility of the user to ensure that a generic callback is thread-safe. This is not always easy to do, and it requires a depth of programming knowledge that you don't typically get in a first or maybe even second course on programming. To put it another way, I've been writing code for well over 45 years (going back to FORTRAN on a mainframe), and my experiments with the beta version of 12.8 represent the first time I've ever had to handle thread safety. In my next post (or two, depending on how chatty my muse is), I'm going to look specifically at thread safety. Meanwhile, if you're not comfortable with it, my advice is to stick to legacy callbacks (but start learning about writing thread-safe code -- sooner or later you'll need to know how).<br /><br />Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0tag:blogger.com,1999:blog-8781383461061929571.post-88095806655456509132017-09-01T14:26:00.000-04:002017-09-01T14:26:15.815-04:00Minimizing a Median\( \def\xorder#1{x_{\left(#1\right)}} \def\xset{\mathbb{X}} \def\xvec{\mathbf{x}} \)A somewhat odd (to me) question was asked on a forum recently. Assume that you have continuous variables $x_{1},\dots,x_{N}$ that are subject to some constraints. For simplicity, I'll just write $\xvec=(x_{1},\dots,x_{N})\in\xset$. I'm going to assume that $\xset$ is compact, and so in particular the $x_{i}$ are bounded. The questioner wanted to know how to model the problem of minimizing the <i>median</i> of the values $x_{1},\dots,x_{N}$. I don't know why that was the goal, but the formulation is mildly interesting and fairly straightforward, with one wrinkle.<br /><br />The wrinkle has to do with whether $N$ is odd or even. 
Suppose that we sort the components of some solution $\xvec$, resulting in what is sometimes called the "order statistic": $\xorder 1\le\xorder 2\le\dots\le\xorder N$. For odd $N$, the median is $$\xorder{\frac{N+1}{2}}.$$For even $N$, it is usually defined as $$\left(\xorder{\frac{N}{2}}+\xorder{\frac{N}{2}+1}\right)/2.$$<br /><br />The odd case is easier, so we'll start there. Introduce $N$ new binary variables $z_{1},\dots,z_{N}$ and a new continuous variable $y$, which represents the median $x$ value. The objective will be to minimize $y$. In addition to the constraint $\xvec\in\xset$, we use "big-M" constraints to force $y$ to be at least as large as half the sample (rounding "half" up). Those constraints are: \begin{align*} y & \ge x_{i}-M_{i}z_{i},\ i=1,\dots,N\\ \sum_{i=1}^{N}z_{i} & =\frac{N-1}{2} \end{align*} with the $M_{i}$ sufficiently large positive constants. The last constraint forces $z_{i}=0$ for exactly $\frac{N+1}{2}$ of the indices $i$, which in turn forces $y\ge x_{i}$ for $\frac{N+1}{2}$ of the $x_{i}$. Since the objective minimizes $y$, the solver will set $z_{i}=0$ for the $\frac{N+1}{2}$ smallest of the $x_{i}$, and $y$ will be driven down to the largest of those. In other words, we are guaranteed that in the optimal solution $y=\xorder{\frac{N+1}{2}}$, i.e., it is the median of the optimal $\xvec$.<br /><br />If $N$ is even and if we are going to use the standard definition of median, we need twice as many added variables (or at least that's the formulation that comes to mind). In addition to the $z_{i}$, let $w_{1},\dots,w_{N}$ also be binary variables, and replace $y$ with a pair of continuous variables $y_{1}$, $y_{2}$. The objective becomes minimization of their average $(y_{1}+y_{2})/2$ subject to the constraints \begin{align*} y_{1} & \ge x_{i}-M_{i}z_{i}\ \forall i\\ y_{2} & \ge x_{i}-M_{i}w_{i}\ \forall i\\ \sum_{i=1}^{N}z_{i} & =\frac{N}{2}-1\\ \sum_{i=1}^{N}w_{i} & =\frac{N}{2} \end{align*} where the $M_{i}$ are as before. 
The constraints force $y_{1}$ to be at least as large as $\frac{N}{2}+1$ of the $x_{i}$ and $y_{2}$ to be at least as large as $\frac{N}{2}$ of them. The minimum objective value will occur when $y_{1}=\xorder{\frac{N}{2}+1}$ and $y_{2}=\xorder{\frac{N}{2}}$. Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com2tag:blogger.com,1999:blog-8781383461061929571.post-7492205859967341512017-08-21T14:38:00.002-04:002017-08-21T14:38:43.759-04:00Updated Stepwise Regression FunctionBack in 2011, when I was still teaching, I cobbled together some <a href="https://orinanobworld.blogspot.com/2011/02/stepwise-regression-in-r.html" target="_blank">R code</a> to demonstrate stepwise regression using F-tests for variable significance. It was a bit unrefined, not intended for production work, and a few recent comments on that post raised some issues with it. So I've worked up a new and (slightly) improved version of it.<br /><br />The new version is provided in an <a href="https://www.msu.edu/~rubin/code/stepwise_demo.nb.html" target="_blank">R notebook</a> that contains both the stepwise function itself and some demonstration code using it. It does not require any R libraries besides the "base" and "stats" packages. There is at least one butt-ugly hack in it that would keep me from being hired in any sort of programming job, but so far it has passed all the tests I've thrown at it. If you run into issues with it, feel free to use the comment section below to let me know. 
I'm no longer teaching, though, so be warned that maintenance on this is not my highest priority.<br /><br />The updated function has a few new features:<br /><ul><li>it returns the final model (as an <span style="font-family: "Courier New",Courier,monospace;">lm</span> object), which I didn't bother to do in the earlier version;</li><li>you can specify the initial and full models as either formulas (<span style="font-family: "Courier New",Courier,monospace;">y~x+z</span>) or strings (<span style="font-family: "Courier New",Courier,monospace;">"y~x+z"</span>), i.e., quotes are strictly optional; and</li><li>as with the <span style="font-family: "Courier New",Courier,monospace;">lm</span> function, it has an optional <span style="font-family: "Courier New",Courier,monospace;">data = ...</span> argument that allows you to specify a data frame. </li></ul>There are also a few bug fixes:<br /><ul><li>if you set the alpha-to-enter greater than the alpha-to-leave, which could throw the function into an infinite loop, the function will now crab at you and return <span style="font-family: "Courier New",Courier,monospace;">NA</span>; </li><li>if you try to fit a model with more parameters than you have observations, the function will now crab at you and return <span style="font-family: "Courier New",Courier,monospace;">NA</span>; and</li><li>the function no longer gets confused (I <i>think</i>) if you pick variable/column names that happen to clash with variable names used inside the function.</li></ul>As always, the code is provided with a Creative Commons license, as-is, no warranty express or implied, your mileage may vary.<br /><br />Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com5tag:blogger.com,1999:blog-8781383461061929571.post-8627487359872892332017-08-07T19:50:00.001-04:002017-08-07T19:50:30.389-04:00Rolling HorizonsI keep seeing questions posted by people looking for help as they struggle to optimize linear programs (or, 
worse, integer linear programs) with tens of millions of variables. In my conscious mind, I know that commercial optimizers such as CPLEX allow models that large (at least if you have enough memory) and can often solve them (at least if you have enough computing resources to throw at the problems). My <a href="https://www.youtube.com/watch?v=JAoFPIHBu6U" target="_blank">lizard brain</a>, however, was conditioned by the state of the art in the late '70s to send me running (typically while screaming) at the sight of a model with more than, oh, 50 or so variables. Wrapping my head around tens of millions of variables, let alone thinking of a strategy for getting an optimal solution "quickly", is quite a struggle.<br /><br />A former acquaintance from my student days once articulated his strategy for essay exams to me: if you don't know the answer, argue the premise of the question. In that vein, I'm inclined to ask why so many variables are necessary. One possibility is that the model captures decisions for a sequence of time periods. If decisions in one time period had no impact on subsequent periods, the problem would decompose naturally into a bunch of smaller problems; so if we are talking about a multiperiod model, it's safe to assume that the periods are connected.<br /><br />That brings me to a strategy I learned back in primitive times, the "rolling horizon" approach. Let me stipulate up front that this is a heuristic method. It does not provide a provably optimal solution for the entire time horizon. Still, given the up-tick in humongous models, I'm starting to wonder if rolling horizons are no longer taught (or are looked down on).<br /><br />The basic concept is simple. The devil is in the details. Let's say we want to solve a planning model over a horizon of $H$ time periods, and that one omnibus model for the entire horizon is proving intractable. 
The rolling horizon approach is as follows.<br /><ol><li>Pick a shorter horizon $K < H$ that is tractable, and a number of periods $F \le K$ to freeze.</li><li>Set "boundary conditions" (more on this below).</li><li>Solve a model for periods $1,\dots,K$, incorporating the boundary conditions.</li><li>Freeze the decisions for periods $1,\dots,F$.</li><li>Solve a model for periods $F+1,\dots, \min(F+K, H)$.</li><li>Repeat <i>ad nauseam</i>, until all $H$ periods have been covered.</li></ol>Note that if $F<K$, some decisions made in each solution but not frozen will be subject to revision in the next model.<br /><br />The initial conditions (starting inventory, locations of vehicles, pending orders, ...) for each model are dictated by the state of the system after the last frozen period. The boundary conditions are limits on how much of a mess you can leave at the end of the reduced planning horizon (period $K$ in the first model, $K+F$ in the second model, etc.). More precisely, they limit the terminal state of things that will be initial conditions in the next model.<br /><br />As a concrete example, consider a manufacturing scheduling model. You start with inventories of components and finished products, available capacity of various kinds, and unfilled orders, and you end with the same kinds of things. Without boundary conditions, your solution for the first model might end period $K$ with no finished goods inventory. Why make stuff if it doesn't count toward your demand within the time frame considered by the model? That might make the problem for periods $K+1$ onward infeasible, though: you have orders that must be filled, they exceed your immediate production capacity, and the cupboard was left bare by the previous solution.<br /><br />So you want to add constraints of the form "don't leave me with less than this much inventory on hand" or "don't leave me with more than this many unfilled orders". Picking values for those limits is a bit of an art form. 
Make the limits too draconian and you will get a very suboptimal solution (say, piling up way more inventory than you'll really need) or possibly even make the early subproblems infeasible. Make the limits too laissez faire, and you may force thoroughly suboptimal solutions to later subproblems (piling on lots of expensive overtime, blowing off soon-to-be-former customers) or even make the later problems infeasible. Still, it's usually possible to come up with pretty reasonable boundary conditions, perhaps with some trial and error, and it's a way to get a solution that, if not globally optimal, is at least good (and preferable to staring bleary-eyed at your screen in the 30th hour of a run, wondering if the beast will ever converge).<br /><br />The term "rolling horizon" was coined in reference to problems that are decomposed based on time, but the same concept may extend to problems that can be decomposed based on position. I used it not long ago to get a heuristic for a problem placing nodes in a physical network. In essence, we took the original geographic region and chopped it into pieces, picked a piece to start from, and then worked our way outward from that piece until all the pieces had been solved, each based on the (frozen) solution to the more inward pieces.Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com4tag:blogger.com,1999:blog-8781383461061929571.post-86056303017666696572017-07-14T22:25:00.001-04:002017-07-14T22:25:26.840-04:00Memory MinimizationAs I grow older, I'm starting to forget things (such as all the math I ever learned) ... but that's not the reason for the title of this post.<br /><br />A somewhat interesting <a href="https://math.stackexchange.com/questions/2354969/minimal-static-memory-allocation-for-a-schedule" target="_blank">question</a> popped up on <a href="https://math.stackexchange.com/" target="_blank">Mathematics StackExchange</a>. 
It combines a basic sequencing problem (ordering the processing of computational tasks) with a single resource constraint (memory) and a min-max criterion (minimize the largest amount of memory consumed anywhere in the sequence). What differentiates the problem from other resource-constrained scheduling problems I've seen is that the output of each task eats a specified amount of memory and must stay there until every successor task that needs it as input completes, after which the memory can be recycled. In particular, the output of the last task using the result of task $n$ will coexist momentarily with the output of $n$, before $n$'s output can be sent to the bit-recycler. Also, there is no time dimension here (we neither know nor care how much CPU time tasks need), just sequencing.<br /><br />The data for the problem is a precedence graph (directed acyclic graph, one node for each task) plus an integer weight $w_n$ for each task $n$, representing the memory consumption of the value computed by task $n$. There are several ways to attack a problem like this, including (mixed) integer programming (MIP), constraint programming (CP), and all sorts of metaheuristics. MIP and CP guarantee optimal solutions given enough resources (time, memory, computing power). Metaheuristics do not guarantee optimality, but also do not require commercial software to implement and may scale better than MIP and possibly CP.<br /><br />I thought I'd publish my MIP model for the problem, scaling be damned.<br /><br /><h4>Index sets and functions</h4><br />Let $N$ be the number of tasks (nodes).<br /><ul><li>I will use $V=\left\{ 1,\dots,N\right\}$ as the set of tasks and $S=\left\{ 1,\dots,N\right\}$ as the set of schedule slots. 
(Yes, I know they are the same set, but the dual notation may help distinguish things in context.)</li><li>$A\subset V\times V$ will represent the set of arcs, where $(u,v)\in A$ means that task $v$ takes the output of task $u$ as an input.</li><li>$\pi:V\rightarrow2^{V}$ and $\sigma:V\rightarrow2^{V}$ will map each task to its immediate predecessors (the inputs to that task) and immediate successors (tasks that need its value as inputs) respectively. </li></ul><br /><h4>Parameters</h4><br />The only parameter is $w:V\rightarrow\mathbb{N}$, which maps tasks to their memory footprints.<br /><br /><h4>Variables</h4><br />There are a few ways to express assignments in MIP models, the two most common of which are "does this critter go in this slot" and "does this critter immediately follow this other critter". I'll use the former.<br /><ul><li>$x_{ij}\in\left\{ 0,1\right\} \ \forall i\in V,\forall j\in S$ will designate assignments, taking the value 1 if task $i$ is scheduled into slot $j$ in the sequence and 0 otherwise.</li><li>$r_{ij}\in\left[0,1\right]\ \forall i\in V,\forall j\in S$ will take the value 1 if the output of task $i$ is resident in memory immediately prior to performing the computational task in slot $j$ and 0 otherwise.</li><li>$d_{ij}\in\left[0,1\right]\ \forall i\in V,\forall j\in S$ indicates whether all tasks requiring the output of task $i$ have been completed prior to the execution of the task in slot $j$ (1) or not (0).</li><li>$z>0$ is the maximum memory usage anywhere in the schedule.</li></ul>We can actually winnow a few of these variables. If task $i$ has any predecessors, then clearly $x_{i1}=0$, since there is no opportunity before the first slot to execute a predecessor task. Similarly, $d_{i1}=0\,\forall i\in V$, since nothing has been completed prior to the first slot.<br /><br />You may think at this point that you have spotted a typo in the second and third bullet points (brackets rather than braces). You haven't. 
Read on.<br /><br /><h4>Objective</h4><br />The easiest part of the model is the objective function: minimize $z$.<br /><br /><h4>Constraints</h4><br />The assignment constraints are standard. Every task is assigned to exactly one slot, and every slot is filled with exactly one task. $$\sum_{j\in S}x_{ij}=1\quad\forall i\in V$$ $$\sum_{i\in V}x_{ij}=1\quad\forall j\in S$$ Precedence constraints are also fairly straightforward. No task can be scheduled before each of its predecessors has completed.$$x_{ij}\le\sum_{k <j}x_{hk}\quad\forall i\in V,\forall j\in S,\forall h\in\pi(i)$$ Next, we define memory consumption. Immediately after executing the task in any slot $j$, the memory footprint will be the combined consumption of the output of that task and the output of any earlier task still in memory when slot $j$ executes.$$z\ge\sum_{i\in V}w(i)\left(r_{ij}+x_{ij}\right)\quad\forall j\in S$$ Defining whether the output of some task $i$ is no longer needed prior to executing the task in some slot $j>1$ is fairly straightforward.$$d_{ij}\le\sum_{k<j}x_{hk}\quad\forall i\in V,\forall 1<j\in S,\forall h\in\sigma(i)$$As noted above, there is no "before slot 1", so we start this set of constraints with $j=2$.<br /><br />Finally, the result of task $i$ remains in memory at the point where the task in slot $j$ executes if task $i$ executed before slot $j$ and the memory has not yet been freed.$$ r_{ij}\ge\sum_{k<j}x_{ik}-d_{ij}\quad\forall i\in V,\forall 1<j\in S$$Again, this is nonsensical for $j=1$, so we start with $j=2$.<br /><br /><h4>Comments</h4><br />I'll end with a few observations about the model.<br /><ul><li>It has $O(N^2)$ binary variables, so, as I noted above, it will not scale all that gracefully.</li><li>As mentioned earlier, the relaxation of $d_{ij}$ and $r_{ij}$ from binary to continuous in the interval $[0,1]$ was deliberate, not accidental. Why is this justified? 
$d_{ij}$ is bounded above by an integer quantity, and the solver "wants" it to be $1$: the sooner something can be removed from memory, the better for the solution. Similarly, $r_{ij}$ is bounded below by an integer expression, and the solver "wants" it to be $0$: the less cruft hanging around in memory, the better. So the solution will be binary even if the variables are not.</li><li>The solution might be "lazy" about clearing memory. From the model perspective, there is no rush to remove unneeded results as long as they are not contributing to the <i>maximum</i> memory footprint. The maximum memory load $z$ may occur at multiple points in the schedule. At least one would contain nothing that could be removed (else the solution would not be optimal). At points $j$ where memory use is below maximum, and possibly at points where memory use equals the maximum, there may be tasks $i$ for which $d_{ij}=0$ in the solution even though it could be set to $1$. This can be avoided by adding constraints that force $d_{ij}=1$ wherever possible, but those constraints would enlarge the model (quite possibly slowing the solver) without improving the final solution.</li></ul>Paul Rubinhttps://plus.google.com/111303285497934501993noreply@blogger.com0
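A postscript on the memory-minimization model above: when experimenting with formulations like this one, it helps to have an independent way to score a candidate schedule. The following Java routine is my own sketch (the names are invented for illustration; it is not from the original question): given the task weights, the successor lists and a proposed ordering, it computes the peak memory footprint directly, counting the output of a task as resident from its own slot through the slot of its last consumer, so that the two outputs coexist momentarily as described above.

```java
import java.util.List;

// Illustration only (invented names): evaluate the peak memory footprint of a
// given task ordering.  Immediately after executing the task in slot j, memory
// holds that task's output plus the output of every earlier task that still
// has an unexecuted successor -- a task's output is freed only after its last
// consumer finishes, so producer and last consumer coexist for a moment.
public class MemorySchedule {

    static int peakMemory(int[] w, List<List<Integer>> successors, int[] order) {
        int n = w.length;
        boolean[] done = new boolean[n];
        int peak = 0;
        for (int j = 0; j < n; j++) {
            int cur = order[j];
            int load = w[cur];                       // output of the current task
            for (int i = 0; i < n; i++) {
                if (i == cur || !done[i]) continue;  // only earlier tasks count
                for (int s : successors.get(i)) {
                    if (!done[s]) {                  // some consumer still pending,
                        load += w[i];                // so i's output is resident
                        break;
                    }
                }
            }
            done[cur] = true;
            peak = Math.max(peak, load);
        }
        return peak;
    }
}
```

For example, with weights $(10, 1, 10)$, task 1 consuming the output of task 0, and task 2 independent, running the independent task between producer and consumer keeps task 0's large output alive longer (peak 20) than running it last (peak 11), which is precisely the sequencing effect the MIP is trying to exploit.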