Monday, January 1, 2018

Ordering Index Vector with Java Streams

I bumped up against the following problem while doing some coding in Java 8 (and using streams where possible). Given a vector of objects $x_1, \dots, x_N$ that come from some domain having an ordering $\preccurlyeq$, find the vector of indices $i_1, \dots, i_N$ that sorts the original values into ascending order, i.e., such that $x_{i_1} \preccurlyeq x_{i_2} \preccurlyeq \cdots \preccurlyeq x_{i_N}$ I'm not sure there's an "official" name for that vector of indices, but I've seen it referred to more than once as the "ordering index vector" (hence the name of this post).

One of the nice features of the Java stream package is that it is really easy to sort streams, either using their natural ordering (where applicable) or using a specified ordering. As best I can tell, though, there is (at least currently) no built-in way to find out the original indices or positions of the sorted results. Fortunately, it didn't take to long to hack something that works. It may not be the most elegant way to get the ordering vector, but I'll share it anyway in case someone finds it useful.

My example will sort a vector of doubles, since that was what I was trying to do when I was forced to come up with this code. With fairly obvious modifications, it should work for sorting vectors of other types. Here is the code. Please try not to laugh.

// Create a vector of values whose order is desired.
double[] vals = ...
// Get the sort order for the values.
int[] order =
IntStream.range(0, vals.length)
.boxed()
.sorted((i, j) -> Double.compare(vals[i], vals[j]))
.mapToInt(x -> x)
.toArray();
// The sorted list is vals[order[0]], vals[order[1]], ...


The stream has to take a winding trip through the Ugly Forest to get this to work. We start out with an IntStream, because that is the easiest way to get a stream of indices. Unfortunately, sorting using a specified comparator is not supported by IntStream, so we have to "box" it get a stream of integers (Stream<Integer>). (Yes, fans, IntStream and stream of integer are two separate things.) The stream of integers is sorted by comparing the values they index in the original vector of double precision reals. The we use the identity function to map the stream of integers back to an IntStream (!) so that we can easily convert it to a vector of integers (meaning int[], not Integer[]).

When I said this might not be the "most elegant" approach, what I meant was that it looks clunky (at least to me). I look forward to being schooled in the comments section.