The starting point for all the math programming formulations is a matrix of binary variables $x_{ij}\in \{0,1\}$, where $x_{ij}=1$ if and only if symbol $i$ is placed in slot $j$ in the keyboard. (Consistent with Nate's post, I'll be using zero-based indexing, so symbol index $i=0$ will correspond to the letter "A" and position index $j=0$ will correspond to the left edge of the keyboard.) Since each letter needs to be placed exactly once, we need the constraints $$\sum_{j=0}^{25}x_{ij} = 1\quad \forall i.$$Similarly, each slot can only contain one character, so we need the constraints $$\sum_{i=0}^{25} x_{ij} = 1\quad \forall j.$$Each row and column of $x$ can also be declared to be a type 1 specially ordered set (SOS1), but in CPLEX that tends to be useful only if you can assign "meaningful" weights to the variables in each set. I'll return to that later.

Recall from the previous post that we have a $26\times 26$ matrix $f$, where $f_{ij}$ is the frequency with which symbol $j$ immediately follows symbol $i$. We can also get an overall frequency with which each symbol is used by summing the corresponding row and column of $f$. I'll denote the use frequency for symbol $i$ by $g_i$, where $g_i=\sum_{j=0}^{25}(f_{ij} + f_{ji})$. I'll use that for a couple of things, one of which is to eliminate a bit of symmetry. As I noted in that previous post, if we reverse any layout (treat it as listing symbols from right to left rather than from left to right), we get the same objective value as that of the original layout. We can break that symmetry by selecting one symbol and arbitrarily requiring it to be in the left half of the keyboard. Although it probably does not make much difference to the solver which symbol we use, I'll selecting the most frequently used symbol. So let $g_{i^*}=\max_i g_i$ (breaking ties arbitrarily). We will add the constraints $x_{i^*j}=0$ for $j=13,\dots,25$.

Back to the SOS1 stuff. When you declare an SOS1 constraint in CPLEX, CPLEX wants weights. It uses the weights to do some branching on the sets. Branching at a node typically means selecting an integer variable and splitting its domain to create two children (generally by rounding the value of the variable in the node solution up or down). With an SOS1 constraint, CPLEX can partition the set of variables involved into two subsets. In either child node, one subset of variables is fixed at zero and the other subset remains unfixed. The weights are used to help CPLEX decide how to split the set of variables. Here, we can try declaring each column of $x$ to be an SOS1 using the cumulative frequencies. So we tell CPLEX for each $j$ that $(x_{0,j}, x_{1,j},\dots,x_{25,j})$ is an SOS1 with corresponding weights $(g_0, g_1,\dots, g_{25})$. In the code I posted, using SOS1 constraints is optional.

Another option in the code is to assign branching priorities to the variables. This encourages CPLEX to branch on variables with higher priorities before branching on variables with lower priorities. If you were laying out the keyboard heuristically, you would probably want to put high usage symbols ("Hello, 'e'!") toward the center of the keyboard, where they would be easy to reach, and lower usage symbols ("q"?) toward the edges. So I assigned to each variable $x_{ij}$ the priority $g_i \times \min(j, 25-j)$. Again, this is an option in the code.

If you're still awake at this point, you'll realize that I have not yet specified an objective function, which is where the linearization is needed. In my first MIP model ("MIP1" in the code), I did this by introducing a bunch of auxiliary variables. First, for each $i$ let $p_i\in [0,25]$ denote the position (slot) that symbol $i$ occupies. We define $p$ with the constraints $$p_i =\sum_{j=0}^{25} j \times x_{ij} \quad \forall i.$$Note that the $p_i$ do

*not*need to be declared integer-valued. Armed with them, I can define another set of continuous variables $d_{ij}\in [0,25]$ for all $i$ and $j$, where $d_{ij}$ will be the distance between symbols $i$ and $j$ in the layout. Since $d_{ij}=|p_i - p_j|$ and we cannot use absolute values explicitly, we do the standard linearization of the absolute value function, adding the constraints $$d_{ij}\ge p_i - p_j \quad \forall i,j$$and$$d_{ij}\ge p_j - p_i \quad \forall i,j.$$(Yes, I know that when $i=j$ this gives me two copies of $d_{ii} \ge 0$. I'll let the presolver take care of that.) Now we have a very simple, clean expression of the objective function:$$\min \sum_{i=0}^{25}\sum_{j=0}^{25} f_{ij} d_{ij}.$$

How does this model do? I ran it for five minutes on a decent desktop PC (using four threads). I included both the branching priorities and the SOS1 constraints, but the CPLEX presolver eliminated all the SOS1 constraints as "redundant". It did that even if skipped the branching priorities, which irks me a bit. Someday maybe I'll figure out why it's ignoring those carefully defined SOS1 weights. At any rate, I did the five minute run with MIP emphasis 2, which emphasizes proving optimality. After five minutes, the incumbent solution had objective value 5,706,873. That's a bit worse than the solution Hardmath123 got in the original post. (Speaking of which, Hardmath123 quoted an objective value of 5,499,341 and posted a layout. I get a value of 5,510,008 for that solution. It may be that Nate's frequency data, which I'm using, differs slightly from the frequency data Hardmath123 used.)

Unfortunately, after five minutes the gap was still 56.55%, and closing very slowly. (The last two minutes of that five minute run only closed the gap from about 57.5% to 56.5%.) I'm pretty sure the actual optimal value will be a lot closer to 5.5 million that to the last lower bound in the five minute run (2,479,745). So we're contending with a somewhat weak bound.

**Update**: A longer run, using MIP emphasis 3 (which focuses on improving the lower bound), still had a gap of 33% after four hours.

The incumbent was found after a bit less than four minutes (which will be relevant as I explore other models, in future posts). Still more to come on this ...

Two observations:

ReplyDelete1. Because distance is symmetric, you can cut the problem size in half by using variables $d_{ij}$ with $i < j$ and then minimizing $\sum_{i < j} (f_{ij}+f_{ji}) d_{ij}$. Equivalently, enforce $d_{ij} = d_{ji}$ in your original formulation, and let the presolver do this for you.

2. Because $\sum_{k=1}^{n-1} k(n-k) = (n^3-n)/6$, where $n=26$, is the sum of distances between unordered pairs of positions, you can add a cut $\sum_{i < j} d_{ij} = 2925$.

You are correct about item 1, and in fact in the MIP2 model (discussed in the next post, which I'm actually working on right now) I did what you suggested second (enforce symmetry and let the presolver simplify things).

DeleteYour second cut is interesting. I would not have expected it to help much, but it does make the lower bound tighter. Compared to my initial run, the cut gives a higher lower bound out of the gate. For the first couple of minutes or so, it slows down progress on the primal bound, but between three and four minutes in I get a better primal bound with the constraint than without. By the five minute mark, CPLEX is again nibbling at the lower bound, but at least the gap is smaller. I'm adding the cut as another command line option (with your name on it).