Monday, February 17, 2020

Reversing Differences

Fellow blogger Håkan Kjellerstrand posted an interesting question on OR Stack Exchange recently. Starting from a list of integers, it is trivial to compute the list of all pairwise absolute differences, but what about going in the other direction? Given the pairwise (absolute) differences, with duplicates removed, can you recover the source list (or a source list)?

We can view the source "list" as a vector $x\in\mathbb{Z}^n$ for some dimension $n$ (equal to the length of the list). With duplicates removed, we can view the differences as a set $D\subset \mathbb{Z}_+$. So the question has to do with recovering $x$ from $D$. Our first observation kills any chance of recovering the original list with certainty:
If $x$ produces difference set $D$, then for any $t\in\mathbb{R}$ the vector $x+t\cdot (1,\dots,1)'$ produces the same set $D$ of differences.
Translating all components of $x$ by a constant amount has no effect on the differences. So there will be an infinite number of solutions for a given difference set $D$. A reasonable approach (proposed by Håkan in his question) is to look for the shortest possible list, i.e., minimize $n$.

Next, observe that $0\in D$ if and only if two components of $x$ are identical. If $0\notin D$, we can assume that all components of $x$ are distinct. If $0\in D$, we can solve the problem for $D\backslash\{0\}$ and then duplicate any one component of the resulting vector $x$ to get a minimum dimension solution to the original problem.

Combining the assumption that $0\notin D$ and the observation about adding a constant having no effect, we can assume that the minimum element of $x$ is 1. That in turn implies that the maximum element of $x$ is $1+m$ where $m=\max(D)$.

From there, Håkan went on to solve a test problem using constraint programming (CP). Although I'm inclined to suspect that CP will be more efficient in general than an integer programming (IP) model, I went ahead and solved his test problem via an IP model (coded in Java and solved using CPLEX 12.10). CPLEX's solution pool feature found the same four solutions to Håkan's example that he did, in under 100 ms. How well the IP method scales is an open question, but it certainly works for modest size problems.

The IP model uses binary variables $z_1, \dots, z_{m+1}$ to decide which of the integers $1,\dots,m+1$ are included in the solution $x$. It also uses variables $w_{ij}\in [0,1]$ for all $i,j\in \{1,\dots,m+1\}$ such that $i \lt j$. The intent is that $w_{ij}=1$ if both $i$ and $j$ are included in the solution, and $w_{ij} = 0$ otherwise. We could declare the $w_{ij}$ to be binary, but we do not need to; constraints will force them to be $0$ or $1$.

The full IP model is as follows:

\[ \begin{array}{lrlrc} \min & \sum_{i=1}^{m+1}z_{i} & & & (1)\\ \textrm{s.t.} & w_{i,j} & \le z_{i} & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (2)\\ & w_{i,j} & \le z_{j} & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (3)\\ & w_{i,j} & \ge z_{i}+z_{j}-1 & \forall i,j\in\left\{ 1,\dots,m+1\right\} ,i\lt j & (4)\\ & w_{i,j} & =0 & \forall i,j\in\left\{ 1,\dots,m+1\right\} \textrm{ s.t. }(j-i)\notin D & (5)\\ & \sum_{i,j\in\left\{ 1,\dots,m+1\right\} |j-i=d}w_{i,j} & \ge 1 & \forall d\in D & (6)\\ & z_{1} & = 1 & & (7) \end{array} \]

The objective (1) minimizes the number of integers used. Constraints (2) through (4) enforce the rule that $w_{ij}=1$ if and only if both $z_i$ and $z_j$ are $1$ (i.e., if and only if both $i$ and $j$ are included in the solution).  Constraint (5) precludes the inclusion of any pair $i < j$ whose difference $j - i$ is not in $D$, while constraint (6) says that for each difference $d \in D$ we must include at least one pair $i < j$ for that produces that difference ($j - i = d$). Finally, since we assumed that our solution starts with minimum value $1$, constraint (7) ensures that $1$ is in the solution. (This constraint is redundant, but appears to help the solver a little, although I can't be sure given the short run times.)

My Java code is available from my repository (bring your own CPLEX).

5 comments:

  1. Thanks for the blog post and MIP model, Paul!

    I converted it to a Picat model: hakank.org/picat/linear_combinations_mip.pi The CBC solver took a long time to get all solutions, the SAT solver too 1s, but the CP solver was quite fast: finished in about 0.1s.

    The original CP model takes about 40ms to solve this problem so it's faster.

    Also, I added the constraint that Z[M+1] #= 1 since we know that both 1 and max(Diffs) + 1 must be in the (normalized) solution.

    ReplyDelete
    Replies
    1. In the MIP model, constraint (6) for the case $d=m$ implies $w_{1,m+1}=1$, which in turn implies via (2) and (3) that $z_1 =z_{m+1}=1$. So I'm not sure why constraint (7) helps (maybe it doesn't and it's just random luck), and $z_{m+1}=1$ similarly should not help much. Does it help with the CP model?

      Delete
  2. Yes, the two constraints (z[1] = 1 and z[m+1] = 1) helps both the CP model (as well as the SAT model). All constraints that prunes the search tree - here fixing two values in z - are good for CP. :-) The MIP solver (CBC) is also significantly faster with these constraints than without, though much slower than CP and SAT.

    For the CP model the time with these constraints are about 0.10s and without about 0.16s.

    One thing I noted - in the Picat model - was that the w matrix is under constrained, and ot generated a lot of different solutions of w for the same value of z. I tried to fix this in the (MIP) model but didn't succeed. Perhaps CPLEX don't care about this in the model?

    (The Picat specific trick that fixed this was to ignore w when calling the solver.)

    ReplyDelete
    Replies
    1. I don't understand how you would be getting multiple solutions for $w$ for the same values of $z$. Constraints (2) through (4) create an equivalence between $w_{ij}$ and $z_i \wedge z_j$.

      Delete
  3. You are - of course - correct. I had misplaced the Eq 5 constraints. Now there is no underconstrained w. Great!

    And it's quite faster now:
    - CP: 0.07s
    - SAT: 0.38s
    - CBC: 10.4s

    Compare with the original CP/SAT model where the CP solver took 0.4s for the same problem.

    Tomorrow I will test the MIP model on harder problems and also create a MiniZinc model.

    ReplyDelete

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.