## Friday, December 24, 2010

### A Seasonal TSP (Traveling Santa Problem)

You can tell when academics with blogs are between semesters by the increased frequency of their postings.  Yesterday Laura McLay posted a pointer to a 12-city traveling Santa problem (PDF file) at Math-Drills.Com.  Faced with a Christmas Eve choice of (a) paying bills, (b) performing some overdue data analysis for a paper-to-be or (c) playing with a TSP ... well, let's just say that I didn't need the Analytic Hierarchy Process to sort that one out.

Slightly (well, very slightly) more seriously, I was curious about the difficulty of the problem.  The TSP is one of the quintessential NP-hard problems, with all sorts of special purpose algorithms devised for it.  Sometimes, though, I think we confuse NP-hard with hard.  NP-hard refers to a class of problems.  At the risk of starting a flame war with purists (which will at least drive up my site stats), all optimization problems get pretty darn hard when they're big enough; we tend to believe that NP-hard problems get pretty darn hard faster than NP-not-so-hard problems as their size grows.  What is sometimes overlooked, though, is that all this is a sort of asymptotic result; we can solve not horribly large problems pretty easily (I can do the three city TSP in my head, at least after a cup or two of coffee), and the dividing line between doable and not doable continually shifts with improvements in hardware and software.

I wasn't really motivated enough to look up the latest and greatest algorithms for TSPs, so I started with an old, and I'm sure now "obsolete", method due I believe to Miller, Tucker and Zemlin ca. 1960.  It's a mixed integer linear programming formulation using binary variables to determine which links are included in the Hamiltonian circuit (route).  There are additional nonnegative variables that assign what we might call "potentials" to each node.  We require each node to be entered and exited once, and we require that any time a link is used, the potential of the destination city be at least one greater than the potential of the origin city, unless the destination city is the starting point for the circuit (i.e., unless the link closes the loop).  The MTZ formulation is more compact than formulations built with subtour elimination constraints (of which exponentially many can occur), but I believe it tends to have weaker bounds (and I've seen some papers on tightening the bounds in the MTZ formulation, but can't recall any details).

Anyway, driving back and forth between home and the YMCA (one of two loci of physical self-abuse for me), I mapped out a general strategy:  write a MILP model using the MTZ approach and see how long it takes to solve (using the Java API to CPLEX 12.2); see if identifying the single entry/single exit constraints as SOS1 sets (with "cleverly chosen" weights) helps; maybe try adding cuts on the fly to eliminate dominated segments of the tour (don't go A-B-C-D if A-C-B-D is shorter); maybe try depth-first search; etc., etc.  As always, to paraphrase someone (Napoleon?), the first casualty in every battle is the plan.  I wrote the basic MILP model with MTZ constraints pretty quickly (copying the data out of the PDF and into the Java editor, suitably reformatted, actually took longer than writing the code itself) and turned CPLEX loose with all default settings.  CPLEX found an optimal solution in something around 0.3 seconds (on my dual core laptop).  So much for any experiments in parameter tweaking, SOS1, and so on.

I won't publish the optimal route here, in case anyone is still working on it, but I will say that the length of the optimal solution is 30,071 miles -- and this is only 12 of the cities Santa must visit.  I hope he remembered to feed the reindeer.