Monday, May 10, 2021

Symmetric Difference of Sets in Java

The symmetric difference of two sets $A$ and $B$ is defined as $$C=(A \cup B)\backslash (A \cap B).$$It is the set of objects that belong to either $A$ or $B$ but not both. I've been working on some Java code for a research project in which I will need to compute the sizes of the symmetric differences of a large number of pairs of sets. The contents of the sets are nonnegative integers (Java type Integer), which makes life a bit simpler because integers are nicely ordered. (In Java-speak, Integer implements the Comparable<Integer> interface.) Since I will be doing a large number of differences, and since the sets involved will be moderately large (by my standards), I wanted to find the fastest way to compute a difference. So I wrote a Java program to generate random pairs of integer sets and compute the size of their symmetric difference four different ways.

Since the introduction of streams in Java 8, what I think is the most obvious/intuitive way to do it is to convert each set to a stream, filter out members of the other set, and count up the survivors. On the other hand, I am a happy user of the Apache Commons Collections library, whose CollectionUtils class provides not one but two ways to do this. One is to use the subtract() method twice, switching the order of the arguments. This does the same thing that my first (stream-based) method does. The other is to use the disjunction() method, which computes the symmetric difference in one invocation.
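
To make the candidates concrete, here is a minimal sketch of the stream-based approach and the two Commons Collections approaches. This is not my benchmark code, just an illustration; the class and method names I chose (SymDiffSize and friends) are arbitrary, and CollectionUtils is org.apache.commons.collections4.CollectionUtils.

import java.util.Set;
import org.apache.commons.collections4.CollectionUtils;

public class SymDiffSize {

    // Stream approach: count the members of each set absent from the other.
    static long viaStreams(Set<Integer> a, Set<Integer> b) {
        return a.stream().filter(x -> !b.contains(x)).count()
             + b.stream().filter(x -> !a.contains(x)).count();
    }

    // Two calls to subtract(), switching the order of the arguments.
    static int viaSubtract(Set<Integer> a, Set<Integer> b) {
        return CollectionUtils.subtract(a, b).size()
             + CollectionUtils.subtract(b, a).size();
    }

    // One call to disjunction(), which yields the symmetric difference directly.
    static int viaDisjunction(Set<Integer> a, Set<Integer> b) {
        return CollectionUtils.disjunction(a, b).size();
    }
}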

Equally obvious to me from a mathematical perspective, but not from a Java perspective, is to take the bitwise exclusive OR of the characteristic vectors of the two sets. Java contains a BitSet class, with set() and flip() methods for individual bits, which makes this easy to do.
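
A sketch of that idea (again illustrative rather than my actual code): build the characteristic vector of one set with set(), then flip() the bit for each member of the other set, which amounts to a bitwise XOR. This relies on the set members being nonnegative integers of modest magnitude.

import java.util.BitSet;
import java.util.Set;

public class BitSetSymDiff {

    // Size of the symmetric difference via characteristic vectors.
    static int viaBitSet(Set<Integer> a, Set<Integer> b) {
        BitSet bits = new BitSet();
        a.forEach(bits::set);   // characteristic vector of A
        b.forEach(bits::flip);  // flipping each member of B is an XOR with B's vector
        return bits.cardinality();
    }
}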

My program runs the experiments on instances of both HashSet<Integer> and TreeSet<Integer>. Hash sets are generally faster, but tree sets have more features (and impose an internal ordering on their contents). My somewhat naive intuition was that having the contents accessed in ascending order would make differencing tree sets faster than differencing hash sets.

There are lots of moving parts here (how large the integers get, how big the sets involved in the comparisons are, ...), so I hesitate to read too much into the results. That said, here are some timing results using sets of cardinality between 100 and 200 with integers from 0 to 100,000. Times are in microseconds and are averages of 10,000 replications. (You can afford big samples when things go this quickly.)


Method     Streams   subtract()   disjunction()   BitSet
HashSet     10.07      36.40         62.15          9.68
TreeSet     23.79      35.37         60.94          7.21

I was surprised (OK, shocked) to find that the disjunction() method, which directly computes the symmetric difference, was almost twice as slow as two invocations of the subtract() method. Less surprising was that turning both sets into streams and filtering each other out was faster than calling subtract() twice. (Note that I am computing the size of the symmetric difference, not the symmetric difference itself. Uniting the two filtered streams or the two sets resulting from calls to subtract() into one combined set might move their times closer to those of disjunction()).

Another surprise to me was that the approach using streams was consistently slower with tree sets than it was with hash sets, despite the tree sets being inherently ordered. The characteristic vector (BitSet) approach seemed to have a slight preference for tree sets, but I'm not entirely sure that was consistent.

In this particular run, the characteristic vector approach (using BitSet) beat all comers. In other runs with similar parameters, streams sometimes beat BitSet and sometimes did not when the sets were instances of HashSet. With TreeSet, using BitSet appeared to be consistently faster. Let's contrast that with a second experiment, in which the integers are still between 0 and 100,000 but the sets are smaller (cardinality between 10 and 20).


Method     Streams   subtract()   disjunction()   BitSet
HashSet      3.18       5.96          9.75          6.24
TreeSet      3.11       4.89          8.10          4.07

Note that with the smaller sets the stream method beats the characteristic vector (BitSet) method, and even the Commons Collections subtract() method edges out the characteristic vector on hash sets (though not on tree sets).

For my project, I am using instances of TreeSet, and I'm fairly certain the sets will (mostly) have cardinality in the low hundreds, so I will probably go with the BitSet approach. If anyone would like to run their own tests, my code is available in a GitLab repository.

Update: It belatedly occurred to me that in my research project, where I only need to get the size of the symmetric difference and not the symmetric difference itself, it might make sense to exploit the fact that the size of the symmetric difference between $A$ and $B$ is$$\vert A\vert + \vert B \vert - 2 \cdot \vert A \cap B\vert.$$So I can compute the size of the intersection, do a little arithmetic, and have the size of the symmetric difference.
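
In stream terms, that arithmetic boils down to something like the following sketch (illustrative only; the details in my benchmark code differ a bit).

import java.util.Set;

public class SymDiffViaIntersection {

    // |A xor B| = |A| + |B| - 2 * |A intersect B|, with the intersection size
    // obtained from a single filtered stream.
    static long symDiffSize(Set<Integer> a, Set<Integer> b) {
        long common = a.stream().filter(b::contains).count();
        return a.size() + b.size() - 2 * common;
    }
}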

I updated my code to include two versions of this approach. One version computes the intersection by streaming one set and filtering it based on inclusion in the other set. The other version uses the CollectionUtils.intersection() method. Once again, the CollectionUtils method was not competitive. Comparing the stream intersection method to the original stream approach and the characteristic vector approach using the original specifications for sets (cardinality 100 to 200), it seems the streamed intersection method is about twice as fast as the original stream method on both hash sets and tree sets (which makes sense, since it does with one stream what the original method did with two). It is also about twice as fast as the characteristic vector method on hash sets, but roughly half as fast on tree sets, as seen in the table below. (Times for the first two methods differ from previous tables because exact timings are unfortunately not reproducible.)


Method     Streams   BitSet   Intersection
HashSet     11.33    10.33        5.59
TreeSet     23.32     7.47       12.23

So the single stream intersection approach looks to be the fastest for computing the size (but, again, not contents) of the symmetric difference if I'm using hash sets, while the characteristic vector (BitSet) approach is fastest if I'm using tree sets.

Wednesday, April 28, 2021

A BDD with JGraphT

Decision diagrams, and in particular binary decision diagrams (BDDs) [1], were originally introduced in computer science to evaluate logic propositions or boolean functions. Lately, they've taken on multiple roles in discrete optimization [2]. I've been reading an excellent book [3] on them, with ideas about using BDDs in a current research project. As is my wont, I'll be coding the research in Java, so I wanted to do a little demo project to figure out how to build and process a BDD in Java.

Not wanting to reinvent any wheels, I looked for an open-source Java graph library with which to build the diagrams, and settled on JGraphT [4]. Not only does JGraphT have the necessary building blocks, it has much better online documentation than many libraries. Also, there is a very helpful article [5] about it on the Baeldung web site (which is itself an extremely useful site for all things Java).

A BDD is a directed, acyclic, layered multigraph with edge weights. If you're not familiar with the term "multigraph", it means that there can be two or more distinct edges between the same pair of nodes, in the same direction. In a BDD, each node represents a state of the system, with up to two outbound arcs, corresponding to true (1) or false (0) values for a particular decision variable. The decision variable is the same for all nodes in a particular layer. An arc is omitted if it represents a decision which, given the state, would make the solution infeasible. To keep the size of the BDD in check (somewhat), you do not want multiple nodes in a layer with the same state. The multigraph aspect arises because, in some circumstances, the next state may be the same regardless of the decision at the current node (so both arcs go to the same child node). Among the attractions of the JGraphT library are its support for nodes based on arbitrary classes (which in a BDD means the state at the node) and for multigraphs.
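
As a purely illustrative sketch of that support, here is how one might declare a weighted directed multigraph in JGraphT whose nodes are instances of a user-defined state class. The State class below is a hypothetical stand-in for whatever state representation the BDD uses, not the one in my code.

import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.DirectedWeightedMultigraph;

public class BddSketch {

    // Hypothetical node class representing the system state at a BDD node.
    // A real implementation would override equals() and hashCode() so that
    // equal states within a layer map to the same node.
    static final class State {
        final int layer;
        final long stateCode;
        State(int layer, long stateCode) {
            this.layer = layer;
            this.stateCode = stateCode;
        }
    }

    public static void main(String[] args) {
        // A directed multigraph with weighted edges: parallel arcs between the
        // same pair of nodes are allowed, which a BDD needs when the true and
        // false decisions at a node lead to the same child state.
        Graph<State, DefaultWeightedEdge> bdd =
            new DirectedWeightedMultigraph<>(DefaultWeightedEdge.class);

        State root = new State(0, 0L);
        State child = new State(1, 0L);
        bdd.addVertex(root);
        bdd.addVertex(child);

        DefaultWeightedEdge trueArc = bdd.addEdge(root, child);   // decision = 1
        DefaultWeightedEdge falseArc = bdd.addEdge(root, child);  // decision = 0
        bdd.setEdgeWeight(trueArc, 5.0);   // e.g., weight of the vertex added
        bdd.setEdgeWeight(falseArc, 0.0);

        System.out.println(bdd.edgeSet().size() + " arcs");       // prints "2 arcs"
    }
}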

To learn how to build BDDs with JGraphT, I decided to solve a maximal independent set problem (MISP) [6] with integer node weights. This means choosing the subset of nodes with greatest total weight such that no two chosen nodes are adjacent. JGraphT contains routines to generate some well-known (to graph theorists -- less well known to me) graphs, and for convenience I chose the Chvátal graph [7], which has 12 nodes and 24 edges. Here is an illustration of the Chvátal graph, with the (randomly generated) node weights in parentheses.

[Figure: the Chvátal graph, with node weights in parentheses]
My Java program uses routines in the JGraphT library to turn the graph into a DOT file [8], which it saves. I then use GraphViz [9] outside the Java program to convert the DOT file into the format I need for wherever the plot is going.

Using the same DOT export trick, I managed to generate a plot of the BDD, in which nodes display the set of vertices still available for addition to the independent set, arcs are solid if a vertex is being added to the independent set and dotted if not, and solid arcs are annotated with the number of the vertex being added.

[Figure: the BDD for the weighted MISP on the Chvátal graph]
Unfortunately, Blogger does not accept SVG images and the BDD is a bit too big for a legible PNG graph. If you want to see a better image, click it and an SVG version should open in a new window or tab.

This post is already a bit long, so I won't go into details about the various coding issues I ran into or how I worked around them. I will point out one minor mathematical issue. Since the MISP is a maximization problem, the goal is to find the longest (in terms of weight, not number of edges) path from root node to terminal node in the BDD. JGraphT has a package containing shortest path algorithms, but no longest path algorithms. Fortunately, the number of layers in the graph is fixed (one layer per decision variable, plus one to hold the terminal node), which means the number $L$ of links in a longest path is fixed. So we simply find the maximum weight $W$ of any node in the graph, change the weight $w_e$ of each edge $e$ to $LW - w_e$, and find the shortest path using the modified weights. That path is guaranteed to be the longest path with respect to the original weights.
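
To see why the transformation works, note that every root-to-terminal path $P$ uses exactly $L$ arcs, so under the modified weights its length is $$\sum_{e \in P} (LW - w_e) = L^2 W - \sum_{e \in P} w_e,$$ i.e., a constant minus the path's original weight. Minimizing the modified length therefore maximizes the original weight.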

Last thing: As usual, my code is available for you to play with from my GitLab repository.

References

[1] Wikipedia entry: Binary decision diagram
[3] Bergman, D.; Cire, A. A.; van Hoeve, W.-J. & Hooker, J. Decision Diagrams for Optimization. Springer International Publishing AG, 2016.
[4] JGraphT library
[6] Wikipedia entry: Maximal independent set
[7] Wikipedia entry: Chvátal graph
[9] Graphviz - Graph Visualization Software

Wednesday, April 21, 2021

Lagrangean Relaxation: The Sequel

In a previous post, I looked at a way to solve a multiple assignment problem (where multiple users can be assigned to each server and each user can be assigned to multiple servers) using Lagrangean relaxation (LR). I won't repeat the details of the problem, or why LR was of interest, here. The post included some computational experiments in R, using CPLEX to get the optimal solution (for confirmatory purposes) and then trying out various nonlinear optimization algorithms on the Lagrangean function.

I've been looking for an open-source, derivative-free nonlinear optimizer (capable of taking box constraints) in Java, and I came across a couple in the Apache Commons Mathematics Library. Wanting to test one of them out, I repeated the experiment with the assignment problem in Java, again using CPLEX to get the optimal solution, and using the BOBYQA algorithm for minimizing the Lagrangean. As is my habit, I've made my Java code available via a GitLab repository for anyone who might want to see it. The Apache Commons library is a bit funky when it comes to using the optimization classes, so I had to do a little trial and error (and considerable staring at the Javadocs), along with a web search for examples. Hopefully my code is simple enough to be easy to digest.
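
For anyone curious about the funkiness, the following is roughly what invoking BOBYQA with box constraints looks like in the Commons Math 3.x API, as best I recall it. It is a minimal sketch minimizing a toy quadratic, not my Lagrangean function; check the Javadocs for the restrictions on the number of interpolation points.

import org.apache.commons.math3.analysis.MultivariateFunction;
import org.apache.commons.math3.optim.InitialGuess;
import org.apache.commons.math3.optim.MaxEval;
import org.apache.commons.math3.optim.PointValuePair;
import org.apache.commons.math3.optim.SimpleBounds;
import org.apache.commons.math3.optim.nonlinear.scalar.GoalType;
import org.apache.commons.math3.optim.nonlinear.scalar.ObjectiveFunction;
import org.apache.commons.math3.optim.nonlinear.scalar.noderiv.BOBYQAOptimizer;

public class BobyqaDemo {
    public static void main(String[] args) {
        int n = 2;  // dimension (BOBYQA requires at least 2)
        // Toy objective: a shifted sphere function.
        MultivariateFunction f =
            x -> Math.pow(x[0] - 1.0, 2) + Math.pow(x[1] - 2.0, 2);

        // Number of interpolation points must lie in [n + 2, (n + 1)(n + 2)/2].
        BOBYQAOptimizer optimizer = new BOBYQAOptimizer(2 * n + 1);

        PointValuePair result = optimizer.optimize(
            new MaxEval(10_000),
            new ObjectiveFunction(f),
            GoalType.MINIMIZE,
            new InitialGuess(new double[] {0.0, 0.0}),
            new SimpleBounds(new double[] {0.0, 0.0},      // lower bounds
                             new double[] {10.0, 10.0}));  // upper bounds

        System.out.println(java.util.Arrays.toString(result.getPoint())
            + " -> " + result.getValue());
    }
}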


Monday, April 12, 2021

A Math Puzzle as a Network

There is a standard type of math puzzle that has been around at least since I was a child. The details vary, but the concept is consistent. You are typically given a few initially empty containers of various (integer) capacities, an essentially infinite reservoir of something that goes in the containers, and a goal (integer) for how much of that something you want to end up with. You have to figure out how to reach the goal without having any measuring instruments, meaning that your operations are limited to emptying a container into the reservoir, filling a container from the reservoir, or moving content from one container to another until you empty the source or fill the destination, whichever happens first. (All this is done under the assumption of no spillage, meaning the originator of the puzzle did not have me in mind.) I think I've seen a variant that involves cutting things, where your ability to measure where to cut is limited to stacking pieces you already have as a guide to the piece you want to cut.

A question popped up on Mathematics Stack Exchange about how to solve one of these puzzles using dynamic programming (DP) with backward recursion. The problem at hand involves two jugs, of capacities seven and three liters respectively, and a lake, with the desired end state being possession of exactly five liters of water. The obvious (at least to me) state space for DP would be the volume of water in each jug, resulting in 32 possible states ($\lbrace 0,\dots,7\rbrace \times \lbrace 0,\dots,3 \rbrace$). Assuming the objective function is to reach the state $(5,0)$ with a minimal number of operations, the problem can be cast just as easily as a shortest path problem on a digraph, in which each node is a possible state of the system, each arc has weight 1, and arcs fall into one of the categories mentioned in the previous paragraph.
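
Since every arc has weight 1, a plain breadth-first search over the 32 states suffices. Here is a minimal sketch of that search, in Java for illustration only (as described next, my actual implementation builds the digraph in R with igraph).

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class JugPuzzle {
    public static void main(String[] args) {
        final int big = 7, small = 3;
        // Breadth-first search over states (liters in big jug, liters in small jug).
        Map<String, String> parent = new HashMap<>();  // state -> predecessor state
        Queue<int[]> queue = new ArrayDeque<>();
        queue.add(new int[] {0, 0});
        parent.put("0|0", null);
        while (!queue.isEmpty()) {
            int[] s = queue.remove();
            int a = s[0], b = s[1];
            int pls = Math.min(a, small - b);  // amount moved by "pour large into small"
            int psl = Math.min(b, big - a);    // amount moved by "pour small into large"
            int[][] next = {
                {0, b},                // empty the larger jug
                {big, b},              // fill the larger jug
                {a, 0},                // empty the smaller jug
                {a, small},            // fill the smaller jug
                {a - pls, b + pls},    // pour larger into smaller
                {a + psl, b - psl}     // pour smaller into larger
            };
            for (int[] t : next) {
                String key = t[0] + "|" + t[1];
                if (!parent.containsKey(key)) {
                    parent.put(key, a + "|" + b);
                    queue.add(t);
                }
            }
        }
        // Walk back from the goal state (5 liters in the larger jug, smaller jug empty).
        List<String> path = new ArrayList<>();
        for (String s = "5|0"; s != null; s = parent.get(s)) {
            path.add(0, s);
        }
        System.out.println(path);
        System.out.println((path.size() - 1) + " operations");
    }
}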

I was looking for an excuse to try out the igraph package for R, and this was it. In my R notebook, a node label "5|2" would indicate the state where the larger jug contains five liters and the smaller jug contains two. Arcs are labeled with one of the following: "EL" (empty the larger jug); "FL" (fill the larger jug); "ES" (empty the smaller jug); "FS" (fill the smaller jug); "PLS" (pour the larger jug into the smaller jug); or "PSL" (pour the smaller jug into the larger jug).

Assuming I did not screw up the digraph setup, a total of nine operations are required to get the job done. If you are interested, you can see my code (and the solution) in this R notebook.

Thursday, April 8, 2021

A GA Model for a Joint Clustering Problem

A problem in grouping users and servers was posted on Mathematics Stack Exchange and OR Stack Exchange. (Someone remind me to rant about cross-posting in a future blog post. Just don't cross-post the reminder.) The gist of the problem is as follows. We have $S$ servers of some sort, and $U$ users. For each combination of user $u$ and server $s$, we have a parameter $h_{u,s}$ which pertains to the quality / strength / something of service user $u$ would get from server $s$. We are told to group users and servers into a predefined number $G$ of groups or clusters. Every user in cluster $g$ will be served by every server in cluster $g$, but servers in other clusters will interfere with the service to user $u$. (A possible application might be cellular phone service, where signals from towers to which you are not connected might interfere with your signal. Just guessing.)

There is one more parameter, the maximum number ($M$) of servers that can be assigned to a group. It is explicitly stated that there is no limit to the number of users that can be assigned to a group. I'm going to go a step further and assume that every group must contain at least one server but that there is no lower limit to the number of users assigned to a group. (If a group has no users, presumably the servers in that group get to relax and play video games or whatever.)

The objective is to maximize $\sum_{u=1}^U q_u$, the total quality of service, where $q_u$ is the quality of service for user $u$. What makes the problem a bit interesting is that $q_u$ is a nonlinear function of the allocation decisions. Specifically, if we let $\mathcal{S}_1, \dots, \mathcal{S}_G$ be the partition of the set $\mathcal{S} = \lbrace 1,\dots, S\rbrace$ of all servers, and if user $u$ is assigned to group $g$, then $$q_{u}=\frac{\sum_{s\in\mathcal{S}_{g}}h_{us}}{\sum_{s\notin\mathcal{S}_{g}}h_{us}}.$$Note that the service quality for user $u$ depends only on which servers are/are not in the same group with it; the assignments of other users do not influence the value of $q_u$.

An answer to the OR SE post explains how to model this as a mixed-integer linear program, including how to linearize the objective. That is the approach I would recommend. The original poster, however, specifically asked for a heuristic approach. I got curious and wrote a genetic algorithm for it, in R, using the GA library. Since this is a constrained problem, I used a random key GA formulation. I won't go into excessive detail here, but the gist is as follows. We focus on assigning servers to groups. Once we have a server assignment, we simply compute the $q_u$ value for each user and each possible group, and assign the user to the group that gives the highest $q_u$ value.

To assign servers to groups, we start with an "alphabet" consisting of the indices $1,\dots,S$ for the servers and $G$ dividers (which I will denote here as "|"). In the R code, I use NA for the dividers. A "chromosome" is an index vector that permutes the alphabet. Without loss of generality, we can assume that the first server goes in the first group, and the last divider must come after the last group, and thus we permute only the intervening elements of the alphabet. For instance, if $S=5$ and $G=3$, the alphabet is $1,2,3,4,5,|,|,|$ and a chromosome $(2, 7, 4, 6, 5, 3)$ would translate to the list $1, 2, |, 4, |, 5, 3, |$. (Each element of the chromosome is the index of an element in the alphabet.) I would interpret that as group 1 containing servers 1 and 2, group 2 containing server 4, and group 3 containing servers 3 and 5.

There is no guarantee that a random chromosome produces a server grouping that contains at least 1 and at most $M$ servers in every group, so we post-process it by going group by group and adjusting the dividers by the minimum amount necessary to make the current group legal. Once we have the servers grouped, we assign users by brute force and compute an overall fitness of the solution.

I am deliberately leaving out some (OK, many) gory details here. My R code and a test problem are contained in an R notebook that you are welcome to peruse. When I rerun the GA on the same test problem, I get different results, which is not surprising since the GA is a heuristic and is most definitely not guaranteed to produce an optimal solution. Whether the solution is "good enough", and whether it scales to the size problem the original poster has in mind, are open questions.

Thursday, February 18, 2021

Restarting the MATE Panel

I know that Cinnamon is the trendy choice of desktop environment with Linux Mint, but ever since an unfortunate misadventure with video drivers I have been using the somewhat more stable and somewhat faster MATE (pronounced Ma-Tay, per their home page) environment. Overall I am very happy with it. There are, however, occasional hiccups with the panel, the bar along one edge of the display (bottom in my case) containing launchers, tabs for open applications and general what-not. Occasionally, for reasons beyond my ken, some of the icons will be screwed up, duplicated, or duplicated and screwed up. This morning, for instance, I found not one but three copies of the audio output icon (it looks like a loudspeaker and is used to set audio preferences). The first icon had the normal appearance, meaning audio was enabled, while the second and third indicated that audio was muted. (Despite the 2-1 vote against, audio was in fact enabled.)

Glitches like that do not render the system unusable, but they are annoying. So I dug around a bit and discovered the system command mate-panel. Run from a shell script or a launcher, the command mate-panel --replace seems to do the trick of restarting the panel (and hopefully fixing the glitch that made you restart it). If you run the command from a terminal, be sure to push it to the background using an ampersand (mate-panel --replace &). Otherwise, it will tie up the terminal session, and when you break out of it (probably via Ctrl-C) the panel will rather unhelpfully evaporate.

Monday, February 15, 2021

Lagrangean Relaxation for an Assignment Problem

 A question on OR Stack Exchange asked about solving an assignment problem "heuristically but to optimality". The problem formulation (in which I stick as closely as possible to the notation in the original post, but substitute symbols for two numeric parameters) is as follows:

\begin{align*} \max_{d_{u,c}} & \sum_{u=1}^{U}\sum_{c=1}^{C}\omega_{u,c}d_{u,c}\\ \text{s.t. } & \sum_{c=1}^{C}d_{u,c}\le C_{\max}\ \forall u\in\left\{ 1,\dots,U\right\} \\ & \sum_{c=1}^{C}d_{u,c}\ge1\ \forall u\in\left\{ 1,\dots,U\right\} \\ & \sum_{u=1}^{U}d_{u,c}\le U_{\max}\ \forall c\in\left\{ 1,\dots,C\right\} \\ & d_{u,c}\in\left\{ 0,1\right\} \ \forall u,c. \end{align*} Here $d_{u,c}$ is a binary variable, representing assignment of "user" $u$ to "service provider" $c$, and everything else is a parameter. Each user must be assigned at least one provider and at most $C_\max$ providers, and each provider can be assigned at most $U_\max$ users. The objective maximizes the aggregate utility of the assignments.

One of the answers to the question asserts that the constraint matrix has the "integrality property", meaning that any basic feasible solution of the LP relaxation will have integer variable values. The recommended solution approach is therefore to solve the LP relaxation, and I agree with that recommendation. (I have not seen a proof that the matrix has the integrality property, but in my experiments the LP solution always was integer-valued.) That said, the author did ask about "heuristic" approaches, which got me wondering if there was a way to solve to optimality without solving an LP (and thus requiring access to an LP solver).

 I decided to try Lagrangean relaxation, and it seems to work. In theory, it should work: if the constraint matrix has the integrality property, and the LP relaxation automatically produces an optimal integer-valued solution, then there is no duality gap, so the solution to the Lagrangean problem should be optimal for the original problem. The uncertainty lies more in numerical issues stemming from the solving of the Lagrangean problem.

In what follows, I am going to reverse the middle constraint of the original problem (multiplying both sides by -1) so that all constraints are $\le$ and thus all dual multipliers are nonnegative. If we let $\lambda\ge 0$, $\mu\ge 0$ and $\nu\ge 0$ be the duals for the three sets of constraints, the Lagrangean relaxation is formulated as follows:

$$\min_{\lambda,\mu,\nu\ge0}LR(\lambda,\mu,\nu)=\\\max_{d\in\left\{ 0,1\right\} ^{U\times C}}\left(\sum_{u}\sum_{c}\omega_{u,c}d_{u,c}-\sum_{u}\lambda_{u}\left[\sum_{c}d_{u,c}-C_{\max}\right]\\+\sum_{u}\mu_{u}\left[\sum_{c}d_{u,c}-1\right]-\sum_{c}\nu_{c}\left[\sum_{u}d_{u,c}-U_{\max}\right]\right).$$

We can simplify that a bit:

$$\min_{\lambda,\mu,\nu\ge0}LR(\lambda,\mu,\nu)=\\\max_{d\in\left\{ 0,1\right\} ^{U\times C}}\left(\sum_{u}\sum_{c}\left[\omega_{u,c}-\lambda_{u}+\mu_{u}-\nu_{c}\right]d_{u,c}\\+C_{\max}\sum\lambda_{u}-\sum_{u}\mu_{u}+U_{\max}\sum_{c}\nu_{c}\right).$$

The inner maximization problem is solvable by inspection. Let $\rho_{u,c}= \omega_{u,c}-\lambda_{u}+\mu_{u}-\nu_{c}$. If $\rho_{u,c} > 0$, $d_{u,c}=1$. If $\rho_{u,c} < 0$, $d_{u,c}=0$. If $\rho_{u,c} = 0$, it does not matter (as far as the inner problem goes) what value we give $d_{u,c}$. So we can rewrite the outer (minimization) problem as follows:

$$\min_{\lambda, \mu, \nu \ge 0}LR(\lambda,\mu,\nu)=\\\sum_{u}\sum_{c}\left(\rho_{u,c}\right)^{+}+C_{\max}\sum\lambda_{u}-\sum_{u}\mu_{u}+U_{\max}\sum_{c}\nu_{c}.$$

$LR(\lambda,\mu,\nu)$ is a piecewise-linear function of its arguments, with directional gradients, but is not continuously differentiable. (Things get a bit tricky when you are on a boundary between linear segments, which corresponds to having $\rho_{u,c}=0$ for one or more combinations of $u$ and $c$.)
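
As a concrete rendering of that closed form (just a sketch, and in Java rather than the R I actually used below), evaluating $LR$ for a given set of multipliers is a couple of loops:

public class LagrangeanValue {

    // Value of the Lagrangean function LR(lambda, mu, nu) for the assignment model:
    // the sum of the positive parts of rho[u][c] = omega[u][c] - lambda[u] + mu[u] - nu[c],
    // plus Cmax * sum(lambda) - sum(mu) + Umax * sum(nu).
    static double lr(double[][] omega, double[] lambda, double[] mu, double[] nu,
                     double cMax, double uMax) {
        int nUsers = omega.length;
        int nProviders = omega[0].length;
        double value = 0.0;
        for (int u = 0; u < nUsers; u++) {
            for (int c = 0; c < nProviders; c++) {
                double rho = omega[u][c] - lambda[u] + mu[u] - nu[c];
                value += Math.max(rho, 0.0);  // d[u][c] = 1 only when rho > 0
            }
        }
        for (int u = 0; u < nUsers; u++) {
            value += cMax * lambda[u] - mu[u];
        }
        for (int c = 0; c < nProviders; c++) {
            value += uMax * nu[c];
        }
        return value;
    }
}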

I coded a sample instance in R and tested both solving the LP relaxation (using CPLEX) and solving the Lagrangean problem, the latter using a derivative-based method (a version of the BFGS algorithm) and a couple of derivative-free algorithms (versions of the Nelder-Mead and Hooke-Jeeves [1] algorithms). Importantly, all three algorithms are modified to allow box constraints, so that we can enforce the sign restriction on the multipliers.

You can download my code in the form of an R notebook, containing text, output and the code itself (which can be extracted). In addition to CPLEX, it uses a gaggle of R libraries: magrittr (for convenience); ompr, ompr.roi, ROI and ROI.plugin.cplex for building the LP model and interfacing with CPLEX; and dfoptim for the Nelder-Mead and Hooke-Jeeves algorithms. (The BFGS algorithm comes via the optim() method, part of the built-in stats library.) If you want to play with the code but do not have CPLEX or some of the libraries, you can just delete the lines that load the missing libraries along with the code that uses them.

Based on limited experimentation, I would say that Nelder-Mead did not work well enough to consider, and BFGS did well in some cases but produced somewhat suboptimal results in others. It may be that tweaking some control setting would have helped with the cases where BFGS ran into trouble. Hooke-Jeeves, again in limited testing, consistently matched the LP solution. So if I needed to come up with some hand-coded way to solve the problem without using libraries (and did not want to write my own simplex code), I would seriously consider using Hooke-Jeeves (which I believe is pretty easy to code) on the Lagrangean problem.


[1] Hooke, R. and Jeeves, T. (1961) "Direct search" solution of numerical and statistical problems. Journal of the ACM, Vol. 8, No. 2, 212-229.

Sunday, January 31, 2021

Solving a Multidimensional NLP via Line Search

Someone posted a nonconvex nonlinear optimization model on OR Stack Exchange and asked for advice about possible reformulations, piecewise linear approximations, using global optimizers, and other things. The model is as follows:\begin{alignat}{1} \max\,\, & q_{1}+q_{2}\\ \mathrm{s.t.} & \sum_{i=1}^{n}p_{i}x_{i}=\sum_{t=0}^{T}\frac{F_{t}}{(1+q_{1})^{t}} &(1)\\ & \sum_{i=1}^{n}p_{i}x_{i}=\sum_{t=0}^{T}\sum_{i=1}^{n}\frac{b_{t,i}x_{i}}{(1+q_{2})^{t}} &(2)\\ & \sum_{i=1}^{n}p_{i}x_{i}=\beta\sum_{t=0}^{T}\frac{F_{t}}{(1+q_{1}+q_{2})^{t}} &(3)\\ & q_{1,}q_{2}\ge0\\ & x\in\left[0,1\right]^{n} \end{alignat} All symbols other than $q_1$, $q_2$ and $x$ are model parameters (or indexes). The author originally had $x$ as binary variables, apparently believing that would facilitate linearization of products, but also expressed interest in the case where $x$ is continuous. I'm going to propose a possible "low-tech" solution procedure for the continuous case. The binary case is probably a bit too tough for me. The author supplied sample data for all parameters except $\beta$, with dimensions $n=3$ and $T=4$ but expressed a desire to solve the model for $n=10,000$ and $T=1,200$ (gulp).

Note that the left-hand sides (LHSes) of the three constraints are identical. Let $h()$ be the function on the right-hand side (RHS) of constraint (1), so that the RHS of (1) is $h(q_1)$. $h()$ is a monotonically decreasing function. The RHS of (3) is $\beta h(q_1 + q_2)$. Since the left sides are equal, we have $$\beta h(q_1 + q_2) = h(q_1) \quad (4)$$ which tells us several things. First, $q_2 \ge 0 \implies h(q_1+q_2) \le h(q_1)$, so if $\beta<1$ the left side of (4) falls strictly short of the right side and (4) cannot be satisfied. Second, if $\beta =1$, (4) implies that $q_2 = 0$, which simplifies the problem a bit. Lastly, let's assume $\beta > 1$. For fixed $q_1$ the LHS of (4) is monotonically decreasing in $q_2$, with the LHS greater than the RHS when $q_2 = 0$ and with $$\lim_{q_2\rightarrow \infty} \beta h(q_1+q_2) = \beta F_0.$$ If $\beta F_0 > h(q_1)$, there is no $q_2$ that can balance equation (4), and so the value of $q_1$ is infeasible in the model. If $\beta F_0 < h(q_1)$, then there is exactly one value of $q_2$ for which (4) holds, and we can find it via line search.
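
Here is a minimal sketch of that inner line search: bisection on equation (4) in $q_2$ for fixed $q_1$, with $h(q)=\sum_{t=0}^T F_t/(1+q)^t$. It is in Java purely for illustration (my code, described below, is in R), and it assumes $\beta>1$ and a finite upper bound on $q_2$ (the $Q$ introduced in the next paragraph).

public class InnerLineSearch {

    // h(q) = sum_t F_t / (1 + q)^t, the common right-hand-side function.
    static double h(double[] f, double q) {
        double total = 0.0;
        for (int t = 0; t < f.length; t++) {
            total += f[t] / Math.pow(1.0 + q, t);
        }
        return total;
    }

    // For fixed q1 and beta > 1, find q2 in [0, qMax] with beta * h(q1 + q2) = h(q1),
    // or return NaN if no such q2 exists. Bisection works because the left side is
    // decreasing in q2 and exceeds the right side at q2 = 0.
    static double solveQ2(double[] f, double beta, double q1, double qMax, double tol) {
        double target = h(f, q1);
        if (beta * h(f, q1 + qMax) > target) {
            return Double.NaN;  // the left side never drops to the target on [0, qMax]
        }
        double lo = 0.0, hi = qMax;
        while (hi - lo > tol) {
            double mid = 0.5 * (lo + hi);
            if (beta * h(f, q1 + mid) > target) {
                lo = mid;   // still above the target; move right
            } else {
                hi = mid;   // at or below the target; move left
            }
        }
        return 0.5 * (lo + hi);
    }
}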

Next, suppose that we have a candidate value for $q_1$ and have found the unique corresponding value of $q_2$ by solving (4). We just need to find a vector $x\in [0,1]^n$ that satisfies (1) and (2). Equation (3) will automatically hold if (1) does, given (4). We can find $x$ by solving a linear program that minimizes 0 subject to (1), (2) and the bounds for $x$.

Thus, we have basically turned the problem into a line search on $q_1$. Let's set an arbitrary upper limit of $Q$ for $q_1$ and $q_2$, so that our initial "interval of uncertainty" for $q_1$ is $[0, Q]$. It's entirely possible that neither 0 nor $Q$ is a feasible value for $q_1$, so our first task is to search upward from $0$ until we find a feasible value (call it $Q_\ell$) for $q_1$, then downward from $Q$ until we find a feasible value (call it $Q_h$) for $q_1$. After that, we cross our fingers and hope that all $q_1 \in [Q_\ell,Q_h]$ are feasible. I think this is true, although I do not have a proof. (I'm much less confident that it is true if we require $x$ to be binary.) Since $q_2$ is a function of $q_1$ and the objective function does not contain $x$, we can search $[Q_\ell,Q_h]$ for a local optimum (for instance, by golden section search) and hope that the objective function is unimodal as a function of $q_1$, so that the local optimum is a global optimum. (Again, I do not have proof, although I would not be surprised if it were true.)

I put this to the test with an R script, using the author's example data. The linear programs were solved using CPLEX, with the model expressed via the OMPR package for R and using ROI as the intermediary between OMPR and CPLEX. I first concocted an arbitrary feasible solution and used it to compute $\beta$, so that I would be sure that the problem was feasible with my choice of $\beta$. Using $\beta = 1.01866$ and 100 (arbitrarily chosen) as the initial upper bounds for $q_1$ and $q_2$, my code produced an "optimal" solution of $q_1= 5.450373$, $q_2 = 0.4664311$, $x = (1, 0.1334608, 0)$ with objective value $5.916804$. There is a bit of rounding error involved: the common LHS of (1)-(3) evaluated to 126.6189, while the three RHSes were 126.6186, 126.6188, and 126.6186. (In my graduate student days, our characterization of this would be "good enough for government work".) Excluding loading libraries, the entire process took under three seconds on my desktop computer.

You can access my R code from my web site. It is in the form of an R notebook (with the code embedded), so even if you are not fluent in R, you can at least read the "prose" portions and see some of the nagging details involved.

Thursday, January 28, 2021

A Monotonic Assignment Problem

 A question posted on Stack Overflow can be translated to an assignment problem with a few "quirks". First, the number of sources ($m$) is less than the number of sinks ($n$), so while every source is assigned to exactly one sink, not every sink is assigned to a source. Second, there are vectors $a\in\mathbb{R}^m$ and $b\in\mathbb{R}^n$ containing weights for each source and sink, and the cost of assigning source $i$ to sink $j$ is $a_i \times b_j$. Finally, there is a monotonicity constraint. If source $i$ is assigned to sink $j$, then source $i+1$ can only be assigned to one of the sinks $j+1,\dots,n$.

Fellow blogger Erwin Kalvelagen posted a MIP model for the problem and explored some approaches to solving it. A key takeaway is that for a randomly generated problem instance with $m=100$ and $n=1,000$, CPLEX needed about half an hour to get a provably optimal solution. After seeing Erwin's post, I did some coding to cook up a network (shortest path) solution in Java. Several people proposed similar, and in some cases essentially identical, models in comments on Erwin's post. Today, while I was stuck on a Zoom committee call and fighting with various drawing programs to get a legible diagram, Erwin produced a follow-up post showing the network solution (including the diagram I was struggling to produce ... so I'll refer readers to Erwin's post and forget about drawing it here).

The network is a layered digraph (nodes organized in layers, directed arcs from nodes in one layer to nodes in the next layer). It includes two dummy nodes (a start node in layer 0 and a finish node in layer $m+1$). All nodes in layer $i\in \lbrace 1,\dots,m \rbrace$ represent possible sink assignments for source $i$. The cost of an arc entering a node representing sink $j$ in layer $i$ is $a_i \times b_j$, regardless of the source of the arc. All nodes in layer $m$ connect to the finish node via an arc with cost 0. The objective value of any valid assignment is the sum of the arc costs in the path from start to finish corresponding to that assignment, and the optimal solution corresponds to the shortest path from start to finish.

The monotonicity restriction is enforced simply by omitting arcs from any node in layer $i$ to a lower-index node in layer $i+1$. It also allows us to eliminate some nodes. In the first layer after the start node (where we assign source 1), the available sinks are $1,\dots,n-m+1$. The reason sinks $n-m+2,\dots,n$ are unavailable is that assigning source 1 to one of them and enforcing monotonicity would cause us to run out of sinks before we had made an assignment for every source. Similarly, nodes in layer $i>1$ begin with sink $i$ (because the first $i-1$ sinks must have been assigned or skipped in earlier layers) and end with sink $n-m+i$ (to allow enough sinks to cover the remaining $m-i$ nodes).
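
A condensed sketch of that construction using algs4 follows (not my actual code, which also recovers the assignment from the shortest path; indices here are 0-based). With $m=100$ and $n=1000$ it produces the node and arc counts quoted below.

import edu.princeton.cs.algs4.AcyclicSP;
import edu.princeton.cs.algs4.DirectedEdge;
import edu.princeton.cs.algs4.EdgeWeightedDigraph;

public class MonotoneAssignment {

    // Minimum total cost of assigning each of m sources to a distinct sink with
    // strictly increasing sink indices, where assigning source i to sink j costs
    // a[i] * b[j].
    static double solve(double[] a, double[] b) {
        int m = a.length, n = b.length, w = n - m + 1;  // sinks available per layer
        int start = 0, finish = 1 + m * w;
        EdgeWeightedDigraph g = new EdgeWeightedDigraph(finish + 1);

        // Node id for "source i assigned sink j", with j in [i, i + w - 1]:
        // id = 1 + i * w + (j - i). Sinks outside that window cannot appear
        // in any feasible solution.

        // Start node to every allowed assignment for source 0.
        for (int j = 0; j < w; j++) {
            g.addEdge(new DirectedEdge(start, 1 + j, a[0] * b[j]));
        }
        // Layer i to layer i + 1: the sink index must strictly increase.
        for (int i = 0; i < m - 1; i++) {
            for (int j = i; j <= i + w - 1; j++) {
                for (int k = j + 1; k <= i + w; k++) {
                    int from = 1 + i * w + (j - i);
                    int to = 1 + (i + 1) * w + (k - (i + 1));
                    g.addEdge(new DirectedEdge(from, to, a[i + 1] * b[k]));
                }
            }
        }
        // Last layer to the finish node, at no cost.
        for (int j = m - 1; j <= m - 1 + w - 1; j++) {
            g.addEdge(new DirectedEdge(1 + (m - 1) * w + (j - (m - 1)), finish, 0.0));
        }
        return new AcyclicSP(g, start).distTo(finish);
    }
}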

For the dimensions $m=100$, $n=1000$, the network has 90,102 nodes and 40,230,551 arcs. That may sound like a lot, but my Java code solves it in under four seconds, including the time spent setting up the network. I used the excellent (open-source) algs4 library, and specifically the AcyclicSP class for solving the shortest path problem. Erwin reports even faster solution time for his network model (0.9 seconds, coded in Python), albeit on different hardware. At any rate, he needed about half an hour to solve the MIP model, so the main takeaway is that for this problem the network model is much faster.

If anyone is interested, my Java code is available for download from my Git repository. The main branch contains just the network model, and the only dependency is the algs4 library. There is also a branch named "CPLEX" which contains the MIP model, in case you either want to compare speeds or just confirm that the network model is getting correct results. If you grab that branch, you will also need to have CPLEX installed.


Friday, January 22, 2021

Rainbow Parentheses in RStudio

I use the open-source edition of the RStudio IDE for any R coding I do, and I'm a big fan of it. The latest version (1.4.1103) introduced a new feature that was previously only available in alpha and beta versions: rainbow parentheses. I'd never heard the term before, but the meaning turns out to be remarkably simple. When turned on, if you enter an expression with nested parentheses, brackets or braces, RStudio automatically color codes the parentheses (brackets, braces) to make it easier to see matching pairs. This is in addition to the existing feature that highlights the matching delimiter when you put the cursor after a delimiter.

I was geeked to try this out, but when I first installed the latest version and turned it on, I did not see any changes. Eventually I figured out that it was color coding the delimiters, but the differences were too subtle for me to see. (This was with the default Textmate theme for the IDE.) So I hacked a new theme which makes the colors easier to see. I'll go through the steps here.

First, let me point to some documentation. In a blog post, the folks at RStudio explain how to turn rainbow parentheses on, either globally or just for specific files, and near the end tell which CSS classes need to be tweaked to customize the colors (.ace_paren_color_0 to .ace_paren_color_6). A separate document discusses how to create custom themes.

Theme selection in RStudio is done via Tools > Global Options... > Appearance > Editor theme. Since I use the default (Textmate) theme, my first step was to track down that file and make a copy with a new name. On my Linux Mint system, the file is /usr/lib/rstudio/resources/themes/textmate.rstheme. On Windows (and Macs?) the built-in themes will be lurking somewhere else. The customization document linked above alluded to a ~/.R/rstudio/themes directory on Linux and Macs, but that directory did not exist for me. (I created it, and parked my hacked theme file there.) Put the copied file someplace under a new name. I'm not sure whether the file name is significant to RStudio, but better safe than sorry.

Open the copy you made of your preferred theme file in a text editor. The first two lines are comments that define the theme name (as it will appear in the list of editor themes in RStudio) and whether it is a dark theme or not. Change the name to something that won't conflict with existing themes. In my case, the first line was

/* rs-theme-name: Textmate (default) */
which I changed to
/* rs-theme-name: Paul */
(not very clever, but it got the job done).

Now add code at the bottom to define colors for the seven "ace_paren_color" styles. Here's what I used:

.ace_paren_color_0 {
  color: #000000 
  /* black */
}

.ace_paren_color_1 {
  color: #ff00ff
  /* magenta */
}

.ace_paren_color_2 {
  color: #ffff00
  /* yellow */
}

.ace_paren_color_3 {
  color: #0080ff
  /* light blue */
}

.ace_paren_color_4 {
  color: #FF0000
  /* red */
}

.ace_paren_color_5 {
  color: #004f39
  /* Spartan green */
}

.ace_paren_color_6 {
  color: #0000ff
  /* dark blue */
}

Once you have a candidate style to test, go to the editor themes settings and use the Add... button to add it. I had to fight through a lot of complaints from RStudio, and I needed to restart RStudio to get the new theme to appear. In the same dialog, I selected it, and then put it to the test by typing some nonsense with lots of nested parentheses in a file.

There are two things to watch out for if you try to remove a theme (using the Remove button in that dialog). First, you cannot remove the currently selected theme, so you will need to select a different theme and click Apply, then go back and select the theme to remove. Second, if you remove a theme, RStudio will delete the theme file. So be sure you have a backup copy if you think you might want to use it again (or edit it).

One good thing: once you have added your theme, you can edit it without having to remove it and then add it back. After saving any changes, you just have to switch to some other theme and then switch back to yours to see the impact of the changes in your documents.