Monday, July 20, 2020

Longest Increasing Subsequence

In a recent blog post (whose title I have shamelessly appropriated), Erwin Kalvelagen discusses a mixed-integer nonlinear programming formulation (along with possible linearizations) for a simple problem from a coding challenge: "Given an unsorted array of integers, find the length of longest increasing subsequence." The challenge stipulates at worst $O(n^2)$ complexity, where $n$ is the length of the original sequence. Erwin suggests the intent of the original question was to use dynamic programming, which makes sense and meets the complexity requirement.

I've been meaning for a while to start fiddling around with binary decision diagrams (BDDs), and this seemed like a good test problem. Decision diagrams originated in computer science, where the application was evaluation of possibly complicated logical expressions, but recently they have made their way into the discrete optimization arena. If you are looking to familiarize yourself with decision diagrams, I can recommend a book by Bergman et al. [1].

Solving this problem with a binary decision diagram is equivalent to solving it with dynamic programming. Let $[x_1, \dots, x_n]$ be the original sequence. Consistent with Erwin, I'll assume that the $x_i$ are nonnegative and that the subsequence extracted must be strictly increasing.

We create a layered digraph in which each node represents the value of the largest (and hence most recent) element in a partial subsequence, and has at most two children. Within a layer, no two nodes have the same state, but nodes in different layers can have the same state. We have $n+2$ layers, where in layer $j\in\lbrace 1,\dots,n \rbrace$ you are deciding whether or not to include $x_j$ in your subsequence. One child, if it exists, represents the state after adding $x_j$ to the subsequence. This child exists only if $x_j$ is greater than the state of the parent node (because the subsequence must be strictly increasing). The other child, which always exists, represents the state when $x_j$ is omitted (which will be the same as the state of the parent node). Layer 1 contains a root node (with state 0), layer $n+1$ contains nodes corresponding to completed subsequences, and layer $n+2$ contains a terminal node (whose state will be the largest element of the chosen subsequence). Actually, you could skip layer $n+1$ and follow layer $n$ with the terminal layer; in my code, I included the extra layer mainly for demonstration purposes (and debugging).

In the previous paragraph, I dealt a card off the bottom of the deck. The state of a node in layer $j$ is the largest element of a partial subsequence based on including or excluding $x_1,\dots,x_{j-1}$. The sneaky part is that more than one subsequence may be represented at that node (since more than one subsequence of $[x_1,\dots,x_{j-1}]$ my contain the same largest element). In addition to the state of a node, we also keep track at each node of the longest path from the root node to that node and the predecessor node along the longest path, where length is defined as the number of yes decisions from the root to that node. So although multiple subsequences may lead to the same node, we only care about one (the longest path, breaking ties arbitrarily). Note that by keeping track of the longest path from root to each node as we build the diagram, we actually solve the underlying problem during the construction of the BDD.

The diagram for the original example ($n=8$) is too big to fit here, so I'll illustrate this using a smaller initial vector: $x=[9, 2, 5, 3]$. The BDD is shown below (as a PDF file, so that you can zoom in or out while maintaining legibility).

The first four layers correspond to decisions on whether to use a sequence entry or not. (The corresponding entries are shown in the right margin.) Nodes "r" and "t" are root and terminus, respectively. The remaining nodes are numbered from 1 to 14. Solid arrows represent decisions to use a value, so for instance the solid arrow from node 4 to node 8 means that 5 ($x_3$) has been added to the subsequence. Dashed arrows represent decisions not to use a value, so the dashed arrow from node 4 to node 7 means that 5 ($x_3$) is not being added to the subsequence. Dotted arrows (from the fifth layer to the sixth) do not represent decisions, they just connect the "leaf" nodes to the terminus.

The green(ish) number to the lower left of a node is the state of the node, which is the largest element included so far in the subsequence. The subsequence at node 4 is just $[2]$ and the state is 2. At node 7, since we skipped the next element, the subsequence and state remain the same. At node 8, the subsequence is now $[2, 5]$ and the state changes to 5.

The red numbers $d_i:p_i$ to the lower right of a node $i$ are the distance (number of solid arcs) from the root to node $i$ along the longest path ($d_i$) and the predecessor of node $i$ on the longest path ($p_i$). Two paths converge at $i=13$: a path $r \dashrightarrow 2 \rightarrow 4 \dashrightarrow 7 \rightarrow 13$ of length 2 and a path $r \dashrightarrow 2 \dashrightarrow 5 \dashrightarrow 9 \rightarrow 13$ of length 1. So the longest path to node 13 has length 2 and predecessor node 7. Backtracking from the terminus (distance 2, predecessor either 12 or 13), we get optimal paths $r \dashrightarrow 2 \rightarrow 4 \rightarrow 8 \dashrightarrow 12 \dashrightarrow t$ (subsequence $[2, 5]$) and $r \dashrightarrow 2 \rightarrow 4 \dashrightarrow 7 \rightarrow 13 \dashrightarrow t$ (subsequence $[2, 3]$), the latter shown in blue.

In addition to the original example from the coding challenge ($n=8$), Erwin included an example with $n=100$ and longest increasing subsequence length 15. (There are multiple optimal solutions to both the original example and the larger one.) Gurobi solved the larger example to proven optimality in one second (probably less, since the output likely rounded up the time). My highly non-optimized Java code solved the $n=100$ example in 6 ms. on my PC (not including the time to print the results).

BDDs can get large in practice, with layers growing combinatorially. In this case, however, that is not a problem. Since the state of a node is the largest value of a subsequence, there can be at most $n$ different states. Given the stipulation that no two nodes in a layer have the same state, that means at most $n$ states in a layer. For Erwin's example with $n=100$, the largest layer in fact contained 66 nodes.

As I said earlier, using the BDD here is equivalent to using dynamic programming. With $n+2$ layers, at most $n$ nodes in a layer, and two operations on each node (figuring out the state and path length of the "yes" child and the "no" child), the solution process is clearly $O(n^2)$.

[1] D. Bergman, A. A. Cire, W.-J. van Hoeve and J. Hooker. Decision Diagrams for Optimization (B. O’Sullivan and M. Wooldridge, eds.).  Springer International Publishing AG, 2016.