## Friday, April 24, 2020

### Generating Random Digraphs

In a recent post, OR consultant and blogger Erwin Kalvelagen discussed generating a random sparse network in GAMS. More specifically, he starts with a fixed set of nodes and a desired number of arcs, and randomly generates approximately that number of arcs. His best results, in terms of execution time, came from exporting the dimensions to R, running a script there, writing out the arcs and importing them back into GAMS.

There are three possible issues with the script. Erwin acknowledged the first, which applies to all his approaches: after removing duplicates, he wound up with fewer than the targeted number of arcs. In many applications this would not be a problems, since you would be looking for "about 1% density" rather than "exactly 1% density". Still, there might be times when you need a specific number of arcs, period. You could supplement Erwin's method with a loop that would test the number of arcs and, if short, would generate more arcs, remove duplicates, add the survivors to the network and repeat.

The second possible issue is the occurrence of self-loops (arcs with head and tail the same, such as (5, 5)). Again, this may or may not be a problem in practice, depending on the application. I rarely encounter network applications where self-loops are expected, or even tolerated. Again, you could modify Erwin's code easily to remove self-loops, and it would not increase execution time much.

The third possible issue is that some nodes may be "orphans" (no arcs in or out), and others may be accessible only one way (either inward degree 0 or outward degree 0). Once again, the application will dictate whether this is a matter of concern.

I decided to test a somewhat different approach to generating the network (using R). It has the advantage of guaranteeing the targeted number of arcs, with no self-loops. (It does not address the issue of orphan nodes.) It has the disadvantage of being slower than Erwin's algorithm (but by what I would call a tolerable amount). My approach is based on assigning an integer index to every possible arc. Assume there are $N$ nodes, indexed $0, \dots, N-1$. (Erwin uses 1-based indexing, but it is trivial to adjust from 0-based to 1-based after the network has been generated.) There are $N^2$ arcs, including self-loops, indexed from $0,\dots,N^2-1$. The arc with index $k$ is given by $$f(k) = (k \div n, k \mod n),$$where $\div$ denotes the integer quotient (so that $7 \div 3 = 2$). As self-loop is an arc whose index $k$ satisfies $k\div n = k \mod n$; those are precisely the arcs with indices $k=m(n + 1)$ for $m=0,\dots,n-1$. So my version of the algorithm is to start with the index set $\lbrace 0,\dots,n^2-1\rbrace$, remove the indices $0, n+1, 2n+2,\dots, n^2-1$, take a random subset of the survivors and apply $f()$ to them.

I have an R Notebook that compares the two algorithms, using the same dimensions Erwin had: $n=5000$ nodes, with a target density of 1% (250,000 arcs). Timing is somewhat random, even though I set a fixed random number seed for each algorithm. The notebook includes both code and output. As expected, my code gets the targeted number of arcs, with no self-loops. Erwin's code, as expected, comes up a bit short on the number of arcs, and contains a few (easily removed) self-loops. Somewhat interestingly, in my test runs every node had both inward and outward degree at least 1 for both algorithms. I think that is a combination of a fairly large arc count and a bit of luck (the required amount of luck decreasing as the network density increases). If orphans, or nodes with either no outbound or no inbound arcs, turn out to be problems, there is a fairly easy fix for both methods. First, randomly generate either one or two arcs incident on each node (depending on whether you need both inward and outward arcs everywhere). Then generate the necessary number of additional arcs by adjusting the sample size. As before, you might come up a few arcs short with Erwin's algorithm (unless you include a loop to add arcs until the target is reached). In my algorithm, you can easily calculate the indices of the initial set of arcs (the index of arc $(i,j)$ is $n\times i + j$) and then just remove those indices at the same time that you remove the indices of the self-loops, before generating the remaining arcs.

## Tuesday, April 21, 2020

### A CP Model for Toasting Bread

A question on Mathematics Stack Exchange deals with a problem (apparently from the book "Thinking Mathematically") about toasting bread on a grill. The grill can hold two slices at a time and can only toast one side of each at a time. You have three slices to toast, and the issue is to figure out how to do it in the minimum possible amount of time (what operations management people refer to as the makespan).

The questioner had a solution that I was able to prove is optimal, using a constraint programming (CP) model that I coded using the Java API to IBM's CPOptimizer (part of the CPLEX Studio product). I won't swear my model is elegant or efficient, since I'm pretty new to CPO, but I think it is correct. If anyone getting started with CPO and the Java API wants to see the source code, it is available in my repository. I'll describe a few key aspects below.

I assumed in my model that there is a single cook. The fundamental components of the model are CPO "interval variables" (instances of IloIntervalVar) for each task (inserting a slice, toasting one side, removing a slice, flipping a slice) along with a dummy task for being done and a placeholder task I called "reversing". Interval variables represent time spans during which tasks are done.

In the problem, there are two ways to get from toasting the front of a slice to toasting the back: you can leave it on the grill and flip it; or you can remove it and (later) replace it with the other side up. Since I didn't know a priori which slices will be handled which way, I created interval variables for removing each slice after the first side, replacing each slice with the second side up, and flipping each slice. Those variables are declared optional, meaning each interval may or may not show up in the solution. For each slice, there is an interval variable for the "reversing" task that is not optional. Each slice has to be reversed, one way or the other. The tasks for replacing a slice (after removing it) and for flipping the slice are listed as alternatives for the reversing task for that slice, which means exactly one of flipping or replacing must be in the solution. Separate constraints ensure that a slice is reinserted if and only if it was removed after the first side toasted. Those constraints use the IloCP.presenceOf function to test whether the remove and reinsert intervals are present, and set the results equal (so both present or neither present).

The sequencing of operations (insert, toast first side, reverse, toast second side, remove) is enforced through a bunch of constraints that use IloCP.endBeforeStart (which says the first argument has to end before the second argument starts). The dummy "done" task is sequenced to start only after each slice has been removed for the final time. I'm pretty sure the objective value (the time everything is done) could be handled other ways, but I tend to think of completion as a zero-length task.

The cook can only do one thing at a time. This is handled using the IloCP.noOverlap function. It is passed a list of every interval that requires the cook's attention (everything but the actual toasting tasks), and prevents any of those intervals from overlapping with any other.

Finally, I need to prevent more than two slices from occupying the grill at any one time. The noOverlap function is no help here. Instead, I use an instance of IloCumFunctionExpr, which represents a function over time that changes as intervals begin and end. In this case, the function measures occupancy.  This is handled by treating the usage as a combination of step functions (IloCP.stepAtStart and IloCP.stepAtEnd). Usage steps up by one at the start of the task of inserting a slice and steps down by one at the end of the task of removing a slice. (Toasting and flipping have no effect on occupancy.) The Javadoc for the relevant functions a bit, um, sparse, essentially saying nothing about the step height argument. Thus I discovered the hard way (through error messages) that adding a step with height -1 when a slice was removed was not acceptable. Instead, I had to subtract a step of height +1.

Although it was not really necessary on a problem this small, I removed some symmetry resulting from the arbitrary ordering of the slices by setting precedence constraints saying that slice 1 is started before slice 2, which in turn is started before slice 3.

It is possible to model the problem as an integer program, and in fact I initially leaned that direction. The IP model, however, would be bulkier and less expressive (which would make it more prone to logic errors), and quite possibly would be slower to solve. CPOptimizer is designed specifically with scheduling problems in mind, so it is the better tool for this particular job.

## Friday, April 17, 2020

### Objective Constraints (Again)

Long ago, I did a couple of posts [1, 2] about constraints designed to bound objective functions. We are referring here to constraints that explicitly bound the value of the objective function in an integer or mixed-integer linear program. The typical application is when the user is minimizing $f(x)$ subject to $x\in X$ and specifies a bound $f(x) \le d$. (If maximizing the inequality becomes $f(x)\ge d$.) The reason for doing so is to help the solver prune inferior nodes (nodes where $f(x) > d$ when minimizing) faster.

One way to accomplish the goal is to set a feasible starting solution $x^0 \in X$ for which $f(x)\le d$. This of course requires you to know such a solution. Also, setting a starting solution, even a good one, will likely steer the solver in a different direction than what it would have taken without the starting solution (meaning it will build a different tree), and this can wind up either faster or slower than not having the start, depending on where you sit on Santa's naughty/nice list and assorted random factors. (Asserting the bound by any of the other methods listed below can also have unintended consequences. Pretty much anything you do with a MIP can have unintended consequences.)

Assuming you have a bound in mind but not a starting solution, you have a few options. The main takeaways from those two posts were basically the following.
1. If your solver has the capability, your best bet is probably to specify the bound via a parameter. (CPLEX has the "upper cutoff" parameter for min problems and the "lower cutoff" parameter for max problems to do just this.)
2. Failing that, you can introduce a variable $z$ to represent your objective function, add a defining constraint $z = f(x)$, minimize $z$ and then specify $d$ as an upper bound for $z$. This may slow the solver some (for reasons explained in the prior posts) but is likely not as bad as the last option.
3. The last option, which is the most obvious (and thus one users gravitate to), is to add the constraint $f(x) \le d$ to the model. This can slow the solver down noticeably.
The short version of why the last option is undesirable is that if the last constraint is not binding  (which will happen if $d$ is not the optimal value and the solver has found an optimal or near optimal solution), it is just excess baggage. If it is binding, it can cause dual degeneracy.

Someone recently asked about this, and I waved my hand and invoked "dual degeneracy", but I'm not sure how clear I was. So I thought I would augment the two previous posts with a small example.

Suppose that we are solving a MIP model, and at some node we are staring at the following LP relaxation:\begin{alignat*}{5} \min & {}-{}5x_{1} & {}+{}40x_{2} & {}-{}5x_{3} & {}+{}5x_{4}\\ \textrm{s.t.} & \phantom{\{\}-}x_{1} & {}-{}\phantom{4}6x_{2} & & {}-{}3x_{4} & {}+{}s_{1} & & & =-3\\ & \phantom{\{\}-}x_{1} & {}-{}\phantom{4}2x_{2} & {}+\phantom{5}{}x_{3} & {}+{}\phantom{4}x_{4} & & {}+{}s_{2} & & =\phantom{-}0\\ & {}-{}5x_{1} & {}+{}40x_{2} & {}-{}5x_{3} & {}+{}5x_{4} & & & {}+{}s_{3} & =-6 &\quad (*)\end{alignat*}where the variables are nonnegative, the $s$ variables are slacks, and the constraint (*) is our way of imposing an upper bound of -6 on the objective function. In matrix terms the problem is\begin{align} \min\quad & \bar{c}'\bar{x}\\ \textrm{s.t.}\quad & \bar{A}\bar{x}=\bar{b}\\ & \bar{x}\ge0 \end{align} with $\bar{x}=(x_1,\dots,x_4,s_1,\dots,s_3)'$, $\bar{c}=(-5,40,-5,5,0,0,0)'$, $\bar{b}=(-3,0,-6)'$ and $$\bar{A}=\left[\begin{array}{rrrrrrr} 1 & -6 & 0 & -3 & 1 & 0 & 0\\ 1 & -2 & 1 & 1 & 0 & 1 & 0\\ -5 & 40 & -5 & 5 & 0 & 0 & 1 \end{array}\right].$$The initial basis would be the slack variables, giving us an infeasible solution $x=0$, $s=(-3,0,-6)$ with reduced costs $r = \bar{c}$. The negative values of $s_1$ and $s_3$ cause the infeasibility.

MIP solvers commonly use the dual simplex method to eliminate infeasibility in a node LP. Dual simplex would pivot in the row $i$ with the most negative right-hand side value $\bar{b}_i$, and in the column $j$ for which the ratio $r_j/\bar{a}_{ij}$ is minimal among those where $\bar{a}_{ij}\lt 0$. Here $i=3$ and $j$ is either 1 or 3 (the ratio in both column 1 and column 3 being $-5/-5=1$). Suppose that the solver chooses column 1, making the new basis (in row order) $(s_1, s_2, x_1).$ After the pivot, the reduced cost vector becomes $\hat{r}=c(0,0,0,0,0,0,-1)$, the new right-hand side vector is $\hat{b}=(-4.2, -1.2, 1.2)'$, and the new constraint matrix is $$\hat{A} = \left[\begin{array}{rrrrrrr} 0 & 2 & -1 & -2 & 1 & 0 & 0.2\\ 0 & 6 & 0 & 2 & 0 & 1 & 0.2\\ 1 & -8 & 1 & -1 & 0 & 0 & -0.2 \end{array}\right].$$The solution is still infeasible, and dual simplex will look to pivot in row 1 (where $\hat{b}$ is most negative. There are two possible pivot columns, columns 3 and 4, but the ratio used to distinguish them is zero in both cases because the reduced cost vector is all zeros (except for $s_3$, the slack in the objective constraint).

The same thing happens if we pivot in column 3 rather than column 1, and in fact it is possible to show that the reduced cost vector will be all zeros other than the slack in the constraint limit as long as the slack in the constraint limit is nonbasic. Since that slack variable will typically be nonbasic so long as the constraint is binding, and the constraint is useful only when binding, we can expect to see a lot of LPs where this occurs.  The tie is survivable (we've already seen one tie for pivot column), but picture this occurring where there are many dual pivots required, with perhaps many eligible columns (negative coefficients) for each pivot, and they all have ratio 0. The solver will be flying somewhat blind when it picks pivot columns, which could plausibly slow things down.

### References

[1] "Objective Functions Make Poor Constraints"
[2] "Objective Constraints: The Sequel"

## Saturday, April 4, 2020

### Tangents v. Secants Part II

This is a continuation of a recent post ("Approximating Nonlinear Functions: Tangents v. Secants") on how to work a nonlinear function into a mixed-integer linear programming model. As before, I'm sticking to functions of one variable. To recap the take-aways from that post, there are basically four ways I know to approximate a nonlinear function:
1. use a piecewise-linear function based on secants;
2. use a piecewise-linear function based on tangents;
3. use a surrogate variable that is bounded (below if the function is convex, above if the function is concave) by a collection of linear functions derived from tangents; or
4. use the third technique plus a callback that adds additional linear tangent functions whenever a candidate solution underestimates (convex case) or overestimates (concave case) the true value of the function.
The first two methods apply generally, meaning that the function need not be convex or concave, and the constraint it appears in need not be a particular type ($\ge$, $\le$, $=$). The third and fourth methods only work when the function is convex or concave. For convex functions, tangents will underestimate the true value and secants will overestimate it; for concave functions, the opposite is true. For specific cases (convex function in a $\le$ constraint, concave function in a $\ge$ constraint), one of secants or tangents will produce feasible but potentially suboptimal solutions while the other will produce superoptimal but potentially infeasible solutions.

Before going on, I want to make two observations that I should have made in the first post. First, for the specific cases I just mentioned, if you solve the problem twice, once using secants and once using tangents, the true optimal objective value will fall between the values of the two solutions, so you will have an idea of how close to optimal the possibly suboptimal solution is. (I'll illustrate this below.) For nonlinear functions in equality constraints, or for functions that are neither convex nor concave, this would not work. Second, if the argument to the nonlinear function is integer-valued, it makes sense to construct a piecewise-linear function (first two options) with integer-valued break points or to construct tangents (third option) at integer-valued points. That way, you are guaranteed correct function values at least at some points in the domain of the function. This is easy to do with secants but considerably more work with tangents.

I have one more observation to make before getting to an example. In the fourth method I listed, with a convex function in a $\le$ constraint or a concave function in a $\ge$ constraint, if the solver finishes the search with an "optimal" solution, the solution will really be optimal. These are the cases where we would normally risk superoptimality, but the callback will prevent that from happening.

At this point, I'm going to present an example of one possible scenario. The problem is to select repeating order quantities for a collection of products. In the model to follow, capital letters will be parameters and lower case letters will be indices or variables. We start with $N$ products. For each product $i$, we know the annual demand ($D_i$), the unit price ($P_i$), the unit annual holding cost ($H_i$), the cost to place an order for the product ($S_i$, regardless of how much is being ordered), and the unit volume ($V_i$, the amount of storage space one unit occupies). In addition, we know the total storage capacity $C$ of the warehouse where the products will be stored. We will somewhat laughably assume that everything is deterministic.

Let $q_i$ denote the quantity of product $i$ ordered each time an order is placed, and $f_i$ the frequency (number of orders per year) with which product $i$ is replenished. The total annual cost, to be minimized, is $$\sum_{i=1}^N \left[P_iD_i + H_i\frac{q_i}{2} + S_i f_i \right],\tag{1}$$where the first term is the total cost of purchasing products (which is constant), the second term is the total cost of storing them (based on the average inventory level, which is half the order size), and the last term is the total cost for placing orders.

The nonlinear function in this problem is the one relating order quantity to order frequency:$$f_i = \frac{D_i}{q_i}\quad \forall i=1,\dots,N.\tag{2}$$For a single product, it would be easy to substitute out $f_i$ from the objective, leaving a function of just $q_i$, and then differentiate. The first order optimality condition leads to the well known economic order quantity (EOQ) formula$$q_i^* = \sqrt{\frac{2D_iS_i}{H_i}}.$$The catch here is that ordering the EOQ for every item might exceed our storage space. So we resort to a mixed-integer program, minimizing (1) subject to $$\frac{1}{2}\sum_{i=1}^n V_i q_i \le C \tag{3}$$with $$q_i\in \lbrace 1,\dots,D_i\rbrace\quad \forall i\in\lbrace 1,\dots,N\rbrace$$ and $$f_i \in\left[ 1,D_i\right]\quad \forall i\in\lbrace 1,\dots,N\rbrace ,$$plus some constraint(s) to connect $f_i$ to $q_i$. It's worth pausing here to note that at most one of $f_i$ and $q_i$ needs to be discrete. Here I am assuming that order quantities must be integers (we are ordering something like appliances, where a third of a washing machine is not a meaningful concept) but that order frequencies need not be integers (2.5 orders per year just means two orders one year, three the next, and so on).

What is left is to pick one of the methods for approximating the relationship between quantity and frequency, equation (2). For methods 1 and 2,  we can add to (1) and (3) the constraint $$f_i = \ell_i(q_i)\quad\forall i\in\lbrace 1,\dots,N\rbrace,\tag{4}$$where $\ell_i()$ is a piecewise linear function derived from either tangents or secants to the reciprocal function $g(x)=1/x$. For method 3, we can instead compute $M_i$ tangent functions $\ell_i()$ for each $i$ and add the constraint $$f_i \ge \ell_j(q_i) \quad\forall i\in\lbrace 1,\dots,i\rbrace,\,\forall j \in\lbrace 1,\dots,M_i\rbrace.\tag{4'}$$Note that this may underestimate $f_i$ (and thus the cost of a solution) but will not affect feasibility (since the space constraint (3) does not contain $f_i$). Finally, for the fourth method, we can minimize (1) subject to (3) and (4') and also use a callback to add more constraints like (4') on the fly.

I tried all four methods, using CPLEX 12.10, on a small test problem ($N=10$ products). I used a constant holding cost rate ($H_i = 0.2$) for all products, and set the space limit to $C=2136.41$. (Yes, it looks goofy, but I used random numbers to generate the problem.) The product level data was as follows:

ProductUnit CostOrder CostDemandSize
04.042.5417061.57
12.372.5512034.68
23.822.0112061.56
32.922.5213732.00
41.373.3810294.79
52.562.5217002.91
64.553.1213144.28
71.073.1812234.06
83.643.7419163.32
91.972.136302.52

For method 1, I used 10 evenly spaced breakpoints (so nine chords). For the other methods, I computed tangents at 10 evenly spaced points. Of course, more breakpoints or tangents would make for a more accurate approximation. The objective value for each of the four methods is as follows:

MethodNominalActual
1672.48670.05
2490.2728946.48
3490.2728946.48
4633.40633.53

Here "nominal" is the objective value reported by CPLEX and "actual" is the actual cost, using the order quantities from the CPLEX solution but recalculating the order frequencies according to (2). Method 1, based on secants, overestimates frequency (and thus cost) slightly. Methods 2 and 3 massively underestimate some frequencies, and thus the overall cost. The reason is apparent from the next table, which shows the nominal and actual order frequencies for each product:

ProductDemandQuantityActual FreqNominal Freq
0170611706.0019.89
1120311203.0019.80
2120611206.0019.85
3137311373.0019.83
410291377.516.69
5170011700.0019.82
6131411314.0019.83
712231647.466.64
8191611916.0019.91
9630857.416.61

For products with nontrivial order quantities, frequencies are underestimated a bit, but for products with order quantity 1 frequencies are massively underestimated (which is what attracts the solver to using such a small order quantity). Basically, the piecewise-linear approximation of the reciprocal relation (2) stinks at the low end of the quantity range, because the curve is steep and none of the tangents are close enough to that end of the quantity domain. This could be improved by forcing the piecewise-linear functions in method 2 to have a breakpoint at $q_i=1$ or by including a tangent function $\ell_i()$ calculated at $q_i=1$ in method 3. Still, to get a reasonable approximation you might need to add a bunch of tangents at that end of the curve.

Method 4 produces a solution that is actually optimal (to within convergence tolerances). The callback ignored discrepancies between nominal and actual order frequency below 0.01. Cost is very slightly underestimated, which I think could be fixed by setting that 0.01 tolerance even smaller. That might cause the program to generate a huge number of tangents in the callback, slowing it down considerably. As it was, the callback added 392 tangents to the initial set of 10 tangents during the solution run.

I mentioned earlier that using both tangents and secants (in separate runs) would bracket the true optimal value in cases of convex (concave) functions in less than (greater than) constraints. Here the true optimal cost (around 633.5) is indeed below the nominal cost of the secant approach (672.58) and above the nominal cost of the tangent approach (490.27). Note that we have to use the nominal costs, which in the case of the tangent approach is at once horribly inaccurate for the solution produced but still a valid lower bound on the actual optimal cost.

If you would like to look at or play with my code (add breakpoints or tangents, add products, make the frequency rather than the order quantity discrete), you can find it at https://gitlab.msu.edu/orobworld/secants.