Monday, December 26, 2022

Selecting Dispersed Points

Fellow blogger Erwin Kalvelagen posted a comparison of two binary programming models, one quadratically constrained and one linearly constrained, for the problem of selecting a maximal number of points from a finite set subject to the requirement that no two selected points be closer than a specified distance. The models were an answer to a question posted on Computational Science Stack Exchange. Not surprisingly, the linearized version tended to solve faster than the quadratic version.

In Erwin's linear model, the constraints take the form $$\underline{D}(x_i + x_j - 1) \le d_{i,j} \quad (1)$$ where $d_{i,j}$ is the distance between points $i$ and $j$, $\underline{D}$ is the minimum allowable distance between selected points, and $x_k$ is a binary variable indicating whether point $k$ is selected (1) or not (0). I coded both his models in Java, using CPLEX 22.1.1, along with another linear model where the constraints are expressed as $$x_i + x_j \le 1\quad (2)$$ for those pairs $(i,j)$ where $d_i + d_j \le \underline{D}.$ In other words, we exploit the fact that we know the distances at the outset to precompute which pairs of points can / cannot coexist, and just rule out the pairs that cannot. Since (1) is equivalent to $$x_i + x_j \le 1 + \frac{d_{i,j}}{\underline{D}},$$ constraint (2) is clearly at least a bit tighter than constraint (1).

Erwin started with a problem size of 50 points for demonstration purposes, then doubled that to compare timing of this two models. I ratcheted the problem size up to 1,000 points to compare his linear model to mine. (I did not test the quadratic model at that size.) As with all things MIP, the results were not entirely consistent. In limited testing, the model using (2) was usually faster than the model using (1), but occasionally (1) proved faster. The run time differences were not large enough to be exciting. For instance, in one test run version (1) needed 4.633 seconds versus 2.987 seconds for version (2).

Overall, I can't say the time differences lived up to my expectations, and the fact that at least occasionally (1) was faster than (2) (perhaps due to some quirk in presolving, or just to some random choices in branching) is consistent with my experience that MIP behaviors are, well, not consistent.

Tuesday, December 13, 2022

Scheduling a Round-Robin Tournament

Once in a while I come across a scheduling problem, and of course my first reaction is to think "integer program". A while back, for example, I used an IP model to work out a schedule for a rotating duplicate bridge game (in which both teams and hosts rotated) at the request of a buddy. Most recently, I saw a question on OR Stack Exchange ("Doubles Round Robin Sorting Algorithm") related to tournament scheduling. 

The problem involves six players playing pickleball (so two players on each team, with two sitting out, for each game). Given six players, there are 15 possible teams (pairings). Twelve games are scheduled each week for four weeks, with two sets of six games per week. The author of the question correctly computed that there are 45 possible game configurations. Over the course of the four weeks, he wanted every possible game played at least once (which leaves the final three game slots to be filled arbitrarily). The following conditions must be met:

  • within each set of six games, every player plays four games and sits out two;
  • no player sits out two consecutive games within the same set;
  • no two players are partners more than once per set.

The problem poses no criterion for selecting among feasible schedules; we just want to find any one schedule that works.

My IP formulation can be found in my answer to the OR SE question. I will just mention that it uses binary variables $x_{g,s}=1$ if game $g$ goes in slot $s$ and 0 if not. Games are enumerated up front and indexed from 1 to 15. Slots refer to positions in the schedule and are numbered from 1 to 48, with six slots per set and 12 slots per week. Since no criterion is specified, we just use the default (minimize 0).

I also tried a constraint programming model, using general integer variables $s_1,\dots,s_{48},$ where $s_j \in {1,\dots,15}$ is the index of the game played in slot $j$ of the schedule. My CP formulation uses the "all different" constraint (fairly ubiquitous among CP solvers) to ensure that the first 45 slots each contain a different game. It uses the CP Optimizer "distribute" constraint to ensure that no game is played more than twice and the CP Optimizer "count" constraint to ensure that each player plays exactly four times per set (and thus sits twice) and that no team plays more than once in a set. I'm not sure how common those types of constraints are among CP solvers because I don't have much experience with CP solvers.

As it turns out, both CPLEX (for the IP model) and CP Optimizer (for the CP model) produced feasible schedules in about a second or so on my desktop PC. My intuition was that a CP model would be better suited to this type of problem, because all variables are integer and a CP solver would not be doing much if any matrix arithmetic. As it turns out, it would take a much larger test case to tell whether that intuition has any merit.

If anyone wants to play with the models, my Java code is available from my university GitLab repository. Running it would require CPLEX and CP Optimizer (and not necessarily the most recent versions of either).