## Sunday, December 16, 2018

### Of Typewriters and Permutations (V)

Okay, this is the last post on the subject. I promise! If you're coming into this movie on  the last reel, you may need to skim the last few posts to see what it's about. I'm trying not to repeat myself too much.

To summarize where we are at: Hardmath123 posted a solution (generated by a heuristic) that might or might not be optimal; a constraint programming model gets that solution very quickly and fails to improve on it to the extent of my experience (I let it run four hours); and the MIP1 model was the least worst of the various optimization models I tried, but it had rather loose bounds.

Rob Pratt, in a comment to the second post, suggested a new constraint that does improve the bound performance of MIP1. Recall that the variables in MIP1 are $x_{i,j}$ (binary, does symbol $i$ go in slot $j$), $p_i$ (the position where symbol $i$ gets parked), and $d_{ij}$ (the distance between symbols $i$ and $j$ in the layout). Rob's constraint is$$\sum_{i} \sum_{j>i} d_{ij} = \frac{n^3 - n}{6}$$where $n$ is the size of the symbol set (26 in our case). I'll skip the derivation here; it's based on the observation that the sum of the distances between all pairs of positions is constant, regardless of who fills those positions.

That got me thinking, and I came up with a set of constraints that are similarly motivated. Let $\delta_j$ be the sum of the distances from position $j$ to all other positions. Since we are using zero-based indexing, there are $j$ positions to the left of position $j$ with distances $1,2,\dots,j$ (looking right to left) and $n -1-j$ positions to the right of $j$, with distances $1,2,\dots, n-1-j$ (looking left to right). Using the well known formula for summing a sequence of consecutive integers,$$\delta_j = \frac{j(j+1)}{2}+\frac{(n-1-j)(n-j)}{2}.$$It follows that if symbol $i$ lands in position $j$, $\sum_j d_{ij} = \delta_j$. Considering all possible landing spots for symbol $i$ leads to the following constraints:$$\sum_k d_{ik} = \sum_j \delta_j x_{ij}\quad \forall i.$$A little experimentation showed that the combination of those constraints plus Rob's constraint improved MIP1 more than either did alone. That's the good news. The bad news is that I still haven't been able to prove optimality for Hardmath123's solution. I ran MIP1 with the extra constraints (using branching priorities, which help), using the Hardmath123 solution as the start and changing the CPLEX emphasis setting to 3. That last bit tells CPLEX to emphasize pushing the bound, which makes sense when you think you have the optimal solution and you're trying to get to proof of optimality. After 338 minutes (5.6 hours), the incumbent had not improved (so it's looking good as a possible optimum), but the gap was still around 14.5%, and the search tree was north of 1.25 GB and growing. To get a sense of what I might call the "futility index", it took eleven minutes to shrink the gap from 14.58% to 14.48%.

Here are a few final takeaways from all this.
• There are often multiple ways to write a MIP model for a discrete optimization problem. They'll often perform differently, and (at least in my experience) it's hard to anticipate which will do best.
• Constraint programming has potential for generating good incumbent solutions quickly for some discrete optimization problems, particularly those whose structure fits with the available selection of global constraints in the solver. Permutation problems will meet this criterion. CP may not be particularly helpful for proving optimality, though, since the bounding is weak.
• Seemingly redundant constraints can help ... but there's no guarantee. (Rob suggested an additional constraint in a comment to part III. I tried it, and it slowed things down.)
• Some days you get the bear, and some days the bear gets you.

## Saturday, December 15, 2018

### Of Typewriters and Permutations (IV)

I'm continuing the recent theme of solving a quadratic assignment problem that lays out the 26 letters of the English alphabet on a one-dimensional "keyboard" for an 18th century typewriter. I thought this would be the last post, but something new turned up, so there will likely be one more.

In the blog post that started all this, Hardmath123 found a solution (via a heuristic) with cost 5,499,341. Using frequency data from Nate Brixius, I get a slightly higher cost (5,510,008), which is the value I will use for it (because, hey, it's my blog). Hardmath123 suspected but could not prove that this layout is optimal. I still can't verify optimality (but maybe in the next post I will ... or maybe not).

In my previous epistles on this subject, I tried out three MIP models and a quadratic (integer) program. In five minute runs using a beta copy of the next version of CPLEX, the best I was able to do was a solution with objective value 5,686,878.

Out of curiosity, I tried a constraint programming (CP) model (using CP Optimizer, the constraint solver in the CPLEX Studio suite). Constraint programming is best suited to models with integer variables (which we have here), although it can handle floating-point variables to some extent. It is well suited to logic constraints, and in particular is popular in scheduling, but it's not my go-to tool in part because it tends to have very weak bounding.

Defining the constraint programming model for the keyboard problem (denoted "CP" in my code) is fairly straightforward, but rather different from any of the MIP/QP models. We use general integer variables rather than binary variables to determine the position assignments. So $p_i\in \{0,\dots,25\}$ is the position of symbol $i$ in the layout, $s_j \in \{0,\dots,25\}$ is the symbol at position $j$, and $d_{ik}\in \{0,\dots,25\}$ is the distance between symbols $i$ and $k$. (Recall that we are using zero-based indexing.) As before, the objective is$$\min \sum_i \sum_j f_{ij}d_{ij},$$where $f_{ij}$ is the frequency with which symbol $j$ follows symbol $i$.

Where it gets interesting is in specifying the constraints. Although CP is designed for logic constraints ("or", "and", "if-then"), it's real power comes from what they call "global constraints", specialized constraints that tie multiple variables together. Arguably the most common global constraint is "all-different", meaning that in a group of variables no two of them can take the same value. It is the key to defining permutations, and in our model we add all-different constraints for both $p$ (no two symbols have the same position) and $s$ (no two slots have the same symbol). One or the other of those is technically redundant, but redundant constraints in a CP model can help the solver, so it's probably worth including both.

Another global constraint provided by CP Optimizer is "inverse". CP models allow variables to be used to index other variables, something we cannot do directly in math programming modes. The inverse constraint, applied to two vector variables $u$ and $v$, says that $u_{v_i}=i$ and $v_{u_i}=i$. In our model, we can apply the inverse constraint to $p$ and $s$. In effect, it says that the symbol in the position occupied by symbol $i$ is $i$, and the position of the symbol in position $j$ is $j$.

Finally, we can define the distance variables using the position variables:$$d_{ij} = |p_i - p_j|\quad \forall i,j.$$ How does the CP model perform? As I expected, the bounding is pretty bad. In a five minute run, the lower bound only makes it to 447,795 (91.87% gap). The good news is that the CP model finds Hardmath123's solution with objective value 5,510,008 ... and finds it in about 22 seconds on my PC! This is using the default search strategy; I did not assign branching priorities.

The take-away here, from my perspective, is that in some problems a CP model can be a great way to generate an optimal or near-optimal incumbent, which you can use as a starting solution in an optimization model if you need to prove optimality.

## Friday, December 14, 2018

### Of Typewriters and Permutations (III)

This continues the discussion (okay, monologue) from the two previous posts about the problem of laying out a one-dimensional typewriter keyboard. This is not the last post in the series, but I can at least guarantee that the series is converging.

In the previous post, I gave a MIP model (named MIP1) that used binary variables $x_{ij}$ to signal whether or not symbol $i$ was placed at position $j$ on the keyboard. It also used auxiliary variables $p_i$ for the position of symbol $i$ and $d_{ij}$ for the distance between symbols $i$ and $j$.

You might wonder whether those auxiliary variables are worth having. I did, after the somewhat disappointing (to me) lack of progress in the lower bound when I ran MIP1. So I omitted the position variables in a new model, denoted MIP2 in the code. (Reminder: My Java code is available and free to play with.) I kept the distance variables because they make it easier to define the objective function.

MIP2 contains the $x$ and $d$ variables from MIP1, along with the constraints that rows and columns of the $x$ matrix sum to 1. It also retains the objective function from MIP1. To tie $d_{ij}$ to $x_{i\cdot}$ and $x_{j\cdot}$, MIP2 contains the following constraints:$$d_{ij} \ge |m-k|\left(x_{ik} + x_{jm} -1\right)\quad \forall i\neq j, \forall k\neq m.$$These constraints can be entered as "normal" constraints or, using an option in the command line, as lazy constraints (meaning they'll be held off to side and activated whenever a node LP solution violates one of them). Since the distances are symmetric, we actually need only about half of these. Rather than risking an indexing error in my code, I added constraints of the form$$d_{ij}=d_{ji}\quad \forall i, \forall j<i$$and let the CPLEX presolver take care of eliminating the redundant constraints for me.

Was MIP2 more efficient than MIP1? Nope, it was worse ... much worse. In a five minute run, the lower bound got stuck at zero, leaving the gap at 100%. (With the cut Rob Pratt suggested in a comment to the previous post, the lower bound got off zero but froze at 192,550, so that the gap was "only" 97.13% by the end of five minutes.

Having no luck with MIP2, I went back to MIP1. In an effort to improve its performance, I made one tweak: rather than putting the branching priorities on the assignment variables ($x$), I declared the position variables ($p$) to be integer (in MIP1 they were continuous) and put the branching priorities on them. Thus higher priority was given to pinning down the positions of higher usage characters. This model is designated "MIP3" in the code. It did a fair bit worse than MIP1.

As mentioned in the post by Nate Brixius (and in my first post), the underlying problem looks like a quadratic assignment problem (QAP). So I tried a model with a quadratic objective function (identified as "QP" in the code). It contains only the $x$ variables and row/column sum constraints from MIP1. The objective function is$$\min \sum_i \sum_{j\neq i} \sum_k \sum_{m\neq k} f_{ij} |k - m| x_{ik} x_{jm}.$$This model had a better incumbent than MIP2 but a worse incumbent than either MIP1 or MIP3, and the lower bound was stuck at zero. (Since there are no distance variables, Rob Pratt's cut is not applicable here.)

Here is a summary of results after five minutes, using emphasis 2 (optimality), branching priorities, and Rob's cut:

Model Incumbent Bound Gap
MIP1 5,686,878 2,970,708 47.76%
MIP2 6,708,297 192,550 97.13%
MIP3 5,893,907 1,503,828 74.49%
QP 6,056,668 0 100.00%

I'm not quite done. There's one more model to try, coming up in what will hopefully be my last post on this.

## Thursday, December 13, 2018

### Of Typewriters and Permutations (II)

This continues my previous post about the problem of optimally laying out a one-dimensional typewriter keyboard, where "optimally" is taken to mean minimizing the expected amount of lateral movement to type a few selected books. As I noted there, Nate Brixius correctly characterized the problem as a quadratic assignment problem (QAP). I'll in fact try out a quadratic model subsequently, but my inclination is always to try to linearize anything that can't outrun or outfight me. So I'll start by discussing a couple of mixed integer linear program formulations.

The starting point for all the math programming formulations is a matrix of binary variables $x_{ij}\in \{0,1\}$, where $x_{ij}=1$ if and only if symbol $i$ is placed in slot $j$ in the keyboard. (Consistent with Nate's post, I'll be using zero-based indexing, so symbol index $i=0$ will correspond to the letter "A" and position index $j=0$ will correspond to the left edge of the keyboard.) Since each letter needs to be placed exactly once, we need the constraints $$\sum_{j=0}^{25}x_{ij} = 1\quad \forall i.$$Similarly, each slot can only contain one character, so we need the constraints $$\sum_{i=0}^{25} x_{ij} = 1\quad \forall j.$$Each row and column of $x$ can also be declared to be a type 1 specially ordered set (SOS1), but in CPLEX that tends to be useful only if you can assign "meaningful" weights to the variables in each set. I'll return to that later.

Recall from the previous post that we have a $26\times 26$ matrix $f$, where $f_{ij}$ is the frequency with which symbol $j$ immediately follows symbol $i$. We can also get an overall frequency with which each symbol is used by summing the corresponding row and column of $f$. I'll denote the use frequency for symbol $i$ by $g_i$, where $g_i=\sum_{j=0}^{25}(f_{ij} + f_{ji})$. I'll use that for a couple of things, one of which is to eliminate a bit of symmetry. As I noted in that previous post, if we reverse any layout (treat it as listing symbols from right to left rather than from left to right), we get the same objective value as that of the original layout. We can break that symmetry by selecting one symbol and arbitrarily requiring it to be in the left half of the keyboard. Although it probably does not make much difference to the solver which symbol we use, I'll selecting the most frequently used symbol. So let $g_{i^*}=\max_i g_i$ (breaking ties arbitrarily). We will add the constraints $x_{i^*j}=0$ for $j=13,\dots,25$.

Back to the SOS1 stuff. When you declare an SOS1 constraint in CPLEX, CPLEX wants weights. It uses the weights to do some branching on the sets. Branching at a node typically means selecting an integer variable and splitting its domain to create two children (generally by rounding the value of the variable in the node solution up or down). With an SOS1 constraint, CPLEX can partition the set of variables involved into two subsets. In either child node, one subset of variables is fixed at zero and the other subset remains unfixed. The weights are used to help CPLEX decide how to split the set of variables. Here, we can try declaring each column of $x$ to be an SOS1 using the cumulative frequencies. So we tell CPLEX for each $j$ that $(x_{0,j}, x_{1,j},\dots,x_{25,j})$ is an SOS1 with corresponding weights $(g_0, g_1,\dots, g_{25})$. In the code I posted, using SOS1 constraints is optional.

Another option in the code is to assign branching priorities to the variables. This encourages CPLEX to branch on variables with higher priorities before branching on variables with lower priorities. If you were laying out the keyboard heuristically, you would probably want to put high usage symbols ("Hello, 'e'!") toward the center of the keyboard, where they would be easy to reach, and lower usage symbols ("q"?) toward the edges. So I assigned to each variable $x_{ij}$ the priority $g_i \times \min(j, 25-j)$. Again, this is an option in the code.

If you're still awake at this point, you'll realize that I have not yet specified an objective function, which is where the linearization is needed. In my first MIP model ("MIP1" in the code), I did this by introducing a bunch of auxiliary variables. First, for each $i$ let $p_i\in [0,25]$ denote the position (slot) that symbol $i$ occupies. We define $p$ with the constraints $$p_i =\sum_{j=0}^{25} j \times x_{ij} \quad \forall i.$$Note that the $p_i$ do not need to be declared integer-valued. Armed with them, I can define another set of continuous variables $d_{ij}\in [0,25]$ for all $i$ and $j$, where $d_{ij}$ will be the distance between symbols $i$ and $j$ in the layout. Since $d_{ij}=|p_i - p_j|$ and we cannot use absolute values explicitly, we do the standard linearization of the absolute value function, adding the constraints $$d_{ij}\ge p_i - p_j \quad \forall i,j$$and$$d_{ij}\ge p_j - p_i \quad \forall i,j.$$(Yes, I know that when $i=j$ this gives me two copies of $d_{ii} \ge 0$. I'll let the presolver take care of that.) Now we have a very simple, clean expression of the objective function:$$\min \sum_{i=0}^{25}\sum_{j=0}^{25} f_{ij} d_{ij}.$$
How does this model do? I ran it for five minutes on a decent desktop PC (using four threads). I included both the branching priorities and the SOS1 constraints, but the CPLEX presolver eliminated all the SOS1 constraints as "redundant". It did that even if skipped the branching priorities, which irks me a bit. Someday maybe I'll figure out why it's ignoring those carefully defined SOS1 weights. At any rate,  I did the five minute run with MIP emphasis 2, which emphasizes proving optimality. After five minutes, the incumbent solution had objective value 5,706,873. That's a bit worse than the solution Hardmath123 got in the original post. (Speaking of which, Hardmath123 quoted an objective value of 5,499,341 and posted a layout. I get a value of 5,510,008 for that solution. It may be that Nate's frequency data, which I'm using, differs slightly from the frequency data Hardmath123 used.)

Unfortunately, after five minutes the gap was still 56.55%, and closing very slowly. (The last two minutes of that five minute run only closed the gap from about 57.5% to 56.5%.) I'm pretty sure the actual optimal value will be a lot closer to 5.5 million that to the last lower bound in the five minute run (2,479,745). So we're contending with a somewhat weak bound.

Update: A longer run, using MIP emphasis 3 (which focuses on improving the lower bound), still had a gap of 33% after four hours.

The incumbent was found after a bit less than four minutes (which will be relevant as I explore other models, in future posts). Still more to come on this ...

## Wednesday, December 12, 2018

### Of Typewriters and Permutations (I)

This is going to be the first of a few posts on the same problem. My efforts are based on a blog post by Nate Brixius (@natebrix) titled "Optimizing 19th Century Typewriters", which in turn is based on a post titled "Tuning a Typewriter" by "Hardmath123". As the image in Hardmath123's post shows, an old (very old!) Crown typewriter used a "keyboard" that had all the symbols in a single row. You shifted the row left or right to get to the symbol you wanted next, then pressed a key to enter it. There was a second key, the equivalent of "shift" on today's keyboards, that let you strike an alternate symbol perched above the primary symbol at each position in the lone row.

If we ignore numerals and punctuation marks (which both Hardmath123 and Nate did), the problem of finding an optimal layout for the keyboard amounts to finding the best permutation of the 26 letters A through Z. The criterion we are all adopting for "best" is requiring the least movement, in some average sense, to get to the next letter. This requires some estimate of how often each possible transition occurs. Hardmath123 and Nate estimated those transition frequencies from three books in the Project Gutenberg library. Since Nate was kind enough to share his frequency data with me, I'm indirectly using those same books.

To be consistent with Nate's post, I'll use zero-based indexing, so "A" will have index 0 and "Z" will have index 25. Similarly, the leftmost slot on the keyboard will have index 0 and the rightmost will have index 25. The metric we have all adopted is to minimize $\sum_{i=0}^{25} \sum_{j=0}^{j=25} f_{ij} |p_i - p_j|$, where $f_{ij}$ is the frequency with which symbol $j$ immediately follows symbol $i$ and $p_i$ is the position in which symbol $i$ is located (so that $|p_i - p_j|$ is the distance shifted moving from symbol $i$ to symbol $j$). Note that $i = j$ is perfectly fine, since a letter can follow itself.

A brute force solution would require $26!/2 \approx 2\times 10^{26}$ layouts to be evaluated. The division by 2 is because the problem is bilaterally symmetric: given any layout, the reverse of that layout will have the same objective value. (This relies implicitly on the fact that we do not differentiate between moving $n$ spaces to the left and moving $n$ spaces to the right.) Hardmath123 applied a heuristic that I would characterize as a version of 2-opt (with restarts) and found a solution with cost 5,499,341. Nate ran a heuristic by a third party and matched Hardmath123's result "within seconds". He also modeled the problem as a quadratic assignment problem (more on that later) and ran a solver he'd written himself, back in the day ... but did not let it run to completion, and did not say how good the incumbent was when he terminated the run.

I tried several models, in an attempt (as yet unsuccessful) to get a provably optimal solution. All the models used a beta copy of version 12.9 of IBM's CPLEX Studio, but perform similarly with the current production version (12.8). In subsequent posts, I'll document the various models I tried and give some results. Meanwhile, you are welcome to play with my Java code, the repository for which is publicly available. In addition to my Java code and the usual miscellany (README file, license file, ...), you'll find a text file (cro26_data.txt) with the frequencies being used, which Nate kindly is letting me share. To run the code, you will need a recent version of the CPLEX Studio suite, and you will need to put all three major pieces (CPLEX, CPOptimizer and OPL) on your library and class paths. You'll also need the open source Apache Commons CLI library, which I use to process command line options. Once you have the code installed and linked up, you can run the program with the command line option -h or --help (that's a double dash before "help") to get a list of what can/should be specified as options.

More to come ...