We can view the source "list" as a vector $x\in\mathbb{Z}^n$ for some dimension $n$ (equal to the length of the list). With duplicates removed, we can view the differences as a set $D\subset \mathbb{Z}_+$. So the question has to do with recovering $x$ from $D$. Our first observation kills any chance of recovering the original list with certainty:
If $x$ produces difference set $D$, then for any $t\in\mathbb{R}$ the vector $x+t\cdot (1,\dots,1)'$ produces the same set $D$ of differences.Translating all components of $x$ by a constant amount has no effect on the differences. So there will be an infinite number of solutions for a given difference set $D$. A reasonable approach (proposed by Håkan in his question) is to look for the shortest possible list, i.e., minimize $n$.
Next, observe that $0\in D$ if and only if two components of $x$ are identical. If $0\notin D$, we can assume that all components of $x$ are distinct. If $0\in D$, we can solve the problem for $D\backslash\{0\}$ and then duplicate any one component of the resulting vector $x$ to get a minimum dimension solution to the original problem.
Combining the assumption that $0\notin D$ and the observation about adding a constant having no effect, we can assume that the minimum element of $x$ is 1. That in turn implies that the maximum element of $x$ is $1+m$ where $m=\max(D)$.
From there, Håkan went on to solve a test problem using constraint programming (CP). Although I'm inclined to suspect that CP will be more efficient in general than an integer programming (IP) model, I went ahead and solved his test problem via an IP model (coded in Java and solved using CPLEX 12.10). CPLEX's solution pool feature found the same four solutions to Håkan's example that he did, in under 100 ms. How well the IP method scales is an open question, but it certainly works for modest size problems.
The IP model uses binary variables $z_1, \dots, z_{m+1}$ to decide which of the integers $1,\dots,m+1$ are included in the solution $x$. It also uses variables $w_{ij}\in [0,1]$ for all $i,j\in \{1,\dots,m+1\}$ such that $i \lt j$. The intent is that $w_{ij}=1$ if both $i$ and $j$ are included in the solution, and $w_{ij} = 0$ otherwise. We could declare the $w_{ij}$ to be binary, but we do not need to; constraints will force them to be $0$ or $1$.
The full IP model is as follows:
\[
\begin{array}{lrlrc}
\min & \sum_{i=1}^{m+1}z_{i} & & & (1)\\
\textrm{s.t.} & w_{i,j} & \le z_{i} & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (2)\\
& w_{i,j} & \le z_{j} & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (3)\\
& w_{i,j} & \ge z_{i}+z_{j}-1 & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (4)\\
& w_{i,j} & =0 & \forall i,j\in\left\{ 1,\dots,m+1\right\} \textrm{ s.t. }(j-i)\notin D & (5)\\
& \sum_{i,j\in\left\{ 1,\dots,m+1\right\} |j-i=d}w_{i,j} & \ge 1 & \forall d\in D & (6)\\
& z_{1} & = 1 & & (7)
\end{array}
\]
The objective (1) minimizes the number of integers used. Constraints (2) through (4) enforce the rule that $w_{ij}=1$ if and only if both $z_i$ and $z_j$ are $1$ (i.e., if and only if both $i$ and $j$ are included in the solution). Constraint (5) precludes the inclusion of any pair $i < j$ whose difference $j - i$ is not in $D$, while constraint (6) says that for each difference $d \in D$ we must include at least one pair $i < j$ for that produces that difference ($j - i = d$). Finally, since we assumed that our solution starts with minimum value $1$, constraint (7) ensures that $1$ is in the solution. (This constraint is redundant, but appears to help the solver a little, although I can't be sure given the short run times.)
My Java code is available from my repository (bring your own CPLEX).
Thanks for the blog post and MIP model, Paul!
ReplyDeleteI converted it to a Picat model: hakank.org/picat/linear_combinations_mip.pi The CBC solver took a long time to get all solutions, the SAT solver too 1s, but the CP solver was quite fast: finished in about 0.1s.
The original CP model takes about 40ms to solve this problem so it's faster.
Also, I added the constraint that Z[M+1] #= 1 since we know that both 1 and max(Diffs) + 1 must be in the (normalized) solution.
In the MIP model, constraint (6) for the case $d=m$ implies $w_{1,m+1}=1$, which in turn implies via (2) and (3) that $z_1 =z_{m+1}=1$. So I'm not sure why constraint (7) helps (maybe it doesn't and it's just random luck), and $z_{m+1}=1$ similarly should not help much. Does it help with the CP model?
DeleteYes, the two constraints (z[1] = 1 and z[m+1] = 1) helps both the CP model (as well as the SAT model). All constraints that prunes the search tree - here fixing two values in z - are good for CP. :-) The MIP solver (CBC) is also significantly faster with these constraints than without, though much slower than CP and SAT.
ReplyDeleteFor the CP model the time with these constraints are about 0.10s and without about 0.16s.
One thing I noted - in the Picat model - was that the w matrix is under constrained, and ot generated a lot of different solutions of w for the same value of z. I tried to fix this in the (MIP) model but didn't succeed. Perhaps CPLEX don't care about this in the model?
(The Picat specific trick that fixed this was to ignore w when calling the solver.)
I don't understand how you would be getting multiple solutions for $w$ for the same values of $z$. Constraints (2) through (4) create an equivalence between $w_{ij}$ and $z_i \wedge z_j$.
DeleteYou are - of course - correct. I had misplaced the Eq 5 constraints. Now there is no underconstrained w. Great!
ReplyDeleteAnd it's quite faster now:
- CP: 0.07s
- SAT: 0.38s
- CBC: 10.4s
Compare with the original CP/SAT model where the CP solver took 0.4s for the same problem.
Tomorrow I will test the MIP model on harder problems and also create a MiniZinc model.