The bug occurred in the context of some ongoing research, involving optimization of vehicle assignments and routing. Computational results on artificial data sets occasionally produced "optimal" solutions that were demonstrably suboptimal. The source of the problem turned out to be instances where the data did not satisfy the triangle inequality, one of the assumptions upon which our model is based. Fixing the data cured the problem.
This in turn reminded me of a dissertation proposal defense I attended a couple of years ago. The candidate (who, in fairness, had not received any training in operations research) was proposing a set of research questions in logistics that only made sense in situations where the underlying transportation network failed to satisfy the triangle inequality, either when the metric was distance traveled or when it was travel time. This prompts some observations about the triangle inequality in the context of transport networks. To save myself some typing, I'll henceforth use the phrase "literal network" to mean a network in which nodes are locations on a map and arcs or edges are literally segments of a road network. Some but not all of what follows may also apply to rail, sea or air transport.
- When the metric is travel time, literal networks will frequently not satisfy the triangle inequality. The most direct route from my former employer's campus to my house involved a fairly linear stretch on streets with comparatively low speed limits and frequent traffic lights. By going a little out of my way, I could turn the bulk of that drive into a sprint up an interstate highway. So A (campus exit) to B (home) had a substantially longer driving time, particularly at rush hour, than A to C (highway entrance) to D (highway exit) to B.
- When the metric is distance, literal networks may still not satisfy the triangle inequality. The "direct" route from A to B may loop around some obstacle (hill, swamp, US Capitol ... but I repeat myself) or have a significant vertical component. If A to C and C to B are fairly flat and linear segments, their combined length may be less than the looping A to B.
- If your key metric is distance, an undirected graph may be sufficient, unless roads are one-way. If you are concerned with driving time, you probably need a directed network, even if there are no one-way roads. Anyone who has driven during rush hour in the opposite direction of the heavy traffic has experienced (and enjoyed) the asymmetry.
- Practically speaking, the network actually traveled by (knowledgeable) drivers will satisfy the triangle inequality in either time or distance, whichever is more important, regardless of whether the literal network does. If I can get from A to B faster by way of C than going directly, I will, and so my A->B "arc" is actually A->C->B.
Point 4 also relates to what was tripping up the doctoral student I mentioned. His research required a scenario where commercial delivery vehicles would be routed using traveling salesman problems (TSPs), with travel time as the metric, on a network where travel times did not satisfy the triangle inequality. When I pointed out the gist of the fourth point (that drivers will take an indirect route if faster, assuming that their mileage is not being tightly monitored), he was properly scandalized, because it is well known that TSP solutions are Hamiltonian cycles that do not revisit intermediate nodes. The indirect routes would in some cases violate that restriction.
My response was that one has to apply the TSP model to a network in which an arc A->B represents not a literal segment from A directly to B, passing through no other nodes, but a "virtual" arc originating at A, terminating at B and stopping at no intermediate nodes (but possibly passing through them, if that is more efficient). He was quite skeptical, until I pointed out that if your warehouse is at the end of a blind alley and you are not allowed to repeat any streets, you can never return. That convinced him. There are a few situations in which the no repetition rule needs to be enforced rigorously, typically when you are leaving burned bridges, land mines or irate traffic cops in your wake. In most other cases, I think you can be flexible.
Replacing the arcs in my joint research problem with "virtual arcs" amounted to computing shortest (time) paths between all pairs of nodes, and replacing the original travel time on each arc with the time of the shortest path. I used something equivalent to the Floyd-Warshall algorithm for that. In the other incident, once I had convinced the doctoral candidate that travel times typically satisfy the triangle inequality (and that the no repetition rule of the Hamiltonian cycle is to be taken a bit loosely), he contacted two vendors of commercial routing software. As he reported the conversations to me, neither had taken this into account in their software, which I find a bit surprising.
One final note: if your literal network satisfies the triangle inequality with respect to distances (road segments are fairly flat and linear) but not time (due to traffic signal, asymmetric traffic volumes, etc.), and you "virtualize" the network as I described to get times that do satisfy the triangle inequality, your distances may no longer satisfy it. You can't have everything.
Muses: Greek mythology seems to be a bit vague about the number of Muses -- nine seems to be a common estimate -- and most sources ignore my personal muse: Erratic, the unreliable twin of Erato.