Monday, November 4, 2013

The Triangle Inequality in Transportation Networks

I just noticed that I managed to go the entire month of October without a post. This was a combination of several factors: it's conference season for me (INFORMS in October, Decision Sciences Institute coming up soon); I was a guest blogger at the INFORMS conference (sample here); deciduous trees + autumn + Michigan = yard work; college and pro football games to watch; laziness; and a somewhat silent muse. Today's entry was prompted partly by a vague sense of guilt and partly by a bug I encountered.

The bug occurred in the context of some ongoing research, involving optimization of vehicle assignments and routing. Computational results on artificial data sets occasionally produced "optimal" solutions that were demonstrably suboptimal. The source of the problem turned out to be instances where the data did not satisfy the triangle inequality, one of the assumptions upon which our model is based. Fixing the data cured the problem.

This in turn reminded me of a dissertation proposal defense I attended a couple of years ago. The candidate (who, in fairness, had not received any training in operations research) was proposing a set of research questions in logistics that only made sense in situations where the underlying transportation network failed to satisfy the triangle inequality, either when the metric was distance traveled or when it was travel time. This prompts some observations about the triangle inequality in the context of transport networks. To save myself some typing, I'll henceforth use the phrase "literal network" to mean a network in which nodes are locations on a map and arcs or edges are literally segments of a road network. Some but not all of what follows may also apply to rail, sea or air transport.
  1. When the metric is travel time, literal networks will frequently not satisfy the triangle inequality. The most direct route from my former employer's campus to my house involved a fairly linear stretch on streets with comparatively low speed limits and frequent traffic lights. By going a little out of my way, I could turn the bulk of that drive into a sprint up an interstate highway. So A (campus exit) to B (home) had a substantially longer driving time, particularly at rush hour, than A to C (highway entrance) to D (highway exit) to B.
  2. When the metric is distance, literal networks may still not satisfy the triangle inequality. The "direct" route from A to B may loop around some obstacle (hill, swamp, US Capitol ... but I repeat myself) or have a significant vertical component. If A to C and C to B are fairly flat and linear segments, their combined length may be less than the looping A to B.
  3. If your key metric is distance, an undirected graph may be sufficient, unless roads are one-way. If you are concerned with driving time, you probably need a directed network, even if there are no one-way roads. Anyone who has driven during rush hour in the opposite direction of the heavy traffic has experienced (and enjoyed) the asymmetry.
  4. Practically speaking, the network actually traveled by (knowledgeable) drivers will satisfy the triangle inequality in either time or distance, whichever is more important, regardless of whether the literal network does. If I can get from A to B faster by way of C than going directly, I will, and so my A->B "arc" is actually A->C->B.
Points 2 and 4 underlie the bug in that research project I mentioned. The arcs were generated without the assumption that distances would satisfy the triangle inequality. The non-triangularity (to coin a term) is plausible given that a rural road network, where road segments are not particularly straight, was being simulated. Travel times were then made proportional to distances, which is reasonable from a literal perspective but neglects the fact that drivers will use an indirect route if it is faster.
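
For anyone wanting to screen their own data for this, here is a minimal Python sketch of a violation check. It assumes the travel times (or distances) sit in a square matrix indexed by node; the function name `triangle_violations` and the numbers are made up for illustration and are not from the research project.

```python
from itertools import permutations


def triangle_violations(t, tol=1e-9):
    """Return triples (i, k, j) where the detour i -> k -> j beats the arc i -> j."""
    n = len(t)
    return [
        (i, k, j)
        for i, k, j in permutations(range(n), 3)
        if t[i][k] + t[k][j] < t[i][j] - tol
    ]


# Hypothetical travel times in minutes among nodes 0, 1, 2 (asymmetric on purpose).
times = [
    [0, 30, 10],
    [25, 0, 12],
    [10, 12, 0],
]

violations = triangle_violations(times)
for i, k, j in violations:
    print(f"{i}->{j} takes {times[i][j]}, but {i}->{k}->{j} takes {times[i][k] + times[k][j]}")
```

The check is cubic in the number of nodes, which is fine for a one-time sanity test on the data before handing it to a model that assumes the inequality.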

Point 4 also relates to what was tripping up the doctoral student I mentioned. His research required a scenario where commercial delivery vehicles would be routed using traveling salesman problems (TSPs), with travel time as the metric, on a network where travel times did not satisfy the triangle inequality. When I pointed out the gist of the fourth point (that drivers will take an indirect route if faster, assuming that their mileage is not being tightly monitored), he was properly scandalized, because it is well known that TSP solutions are Hamiltonian cycles that do not revisit intermediate nodes. The indirect routes would in some cases violate that restriction.

My response was that one has to apply the TSP model to a network in which an arc A->B represents not a literal segment from A directly to B, passing through no other nodes, but a "virtual" arc originating at A, terminating at B and stopping at no intermediate nodes (but possibly passing through them, if that is more efficient). He was quite skeptical, until I pointed out that if your warehouse is at the end of a blind alley and you are not allowed to repeat any streets, you can never return. That convinced him. There are a few situations in which the no repetition rule needs to be enforced rigorously, typically when you are leaving burned bridges, land mines or irate traffic cops in your wake. In most other cases, I think you can be flexible.

Replacing the arcs in my joint research problem with "virtual arcs" amounted to computing shortest (time) paths between all pairs of nodes, and replacing the original travel time on each arc with the time of the shortest path. I used something equivalent to the Floyd-Warshall algorithm for that. In the other incident, once I had convinced the doctoral candidate that travel times typically satisfy the triangle inequality (and that the no repetition rule of the Hamiltonian cycle is to be taken a bit loosely), he contacted two vendors of commercial routing software. As he reported the conversations to me, neither had taken this into account in their software, which I find a bit surprising.
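
The replacement of literal arcs by "virtual arcs" can be sketched in a few lines of Python. This is a plain textbook Floyd-Warshall under assumptions of my choosing (a dense matrix, nonnegative times, `math.inf` marking a missing arc), not the code used in the research; the name `virtualize` and the data are illustrative.

```python
import math


def virtualize(t):
    """Return all-pairs shortest-path times (Floyd-Warshall) for arc-time matrix t."""
    n = len(t)
    d = [row[:] for row in t]  # copy so the literal times are left untouched
    for k in range(n):  # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Hypothetical literal travel times in minutes; there is no direct arc from node 1 to node 0.
literal = [
    [0, 30, 10],
    [math.inf, 0, 12],
    [10, 12, 0],
]

virtual = virtualize(literal)
print(virtual)
```

Every entry of `virtual` is no larger than the corresponding entry of `literal`, and the virtual times satisfy the triangle inequality by construction.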

One final note: if your literal network satisfies the triangle inequality with respect to distances (road segments are fairly flat and linear) but not time (due to traffic signals, asymmetric traffic volumes, etc.), and you "virtualize" the network as I described to get times that do satisfy the triangle inequality, your distances may no longer satisfy it. You can't have everything.
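
A small made-up instance shows the effect. In the Python sketch below, the four nodes, the distances (km) and the times (minutes) are all hypothetical; the literal distances satisfy the triangle inequality, the literal times do not, and the distance of each virtual arc is taken to be the length of its fastest path.

```python
def virtualize_with_distance(time, dist):
    """Floyd-Warshall on time, carrying along the distance of each fastest path."""
    n = len(time)
    vt = [row[:] for row in time]
    vd = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if vt[i][k] + vt[k][j] < vt[i][j]:
                    vt[i][j] = vt[i][k] + vt[k][j]
                    vd[i][j] = vd[i][k] + vd[k][j]  # length of the faster route
    return vt, vd


# Nodes: A=0, B=1, C=2, D=3. Both matrices are symmetric.
dist = [
    [0, 5, 4, 3],
    [5, 0, 4, 3],
    [4, 4, 0, 5],
    [3, 3, 5, 0],
]
time = [
    [0, 20, 5, 8],   # A to B directly is slow (say, lots of traffic lights)
    [20, 0, 5, 8],
    [5, 5, 0, 12],
    [8, 8, 12, 0],
]

vtime, vdist = virtualize_with_distance(time, dist)
print("virtual time A->B:", vtime[0][1])
print("virtual distance A->B:", vdist[0][1])
print("virtual distance A->D->B:", vdist[0][3] + vdist[3][1])
```

The fastest route from A to B runs through C and covers 8 km, while going through D would cover only 6 km, so the virtual distances violate the triangle inequality even though the literal ones did not.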

-----

Muses: Greek mythology seems to be a bit vague about the number of Muses -- nine seems to be a common estimate -- and most sources ignore my personal muse: Erratic, the unreliable twin of Erato.

4 comments:

  1. Hello, what if the asymmetric distance matrix is generated in Google Maps and the matrix does not satisfy the triangle inequality? I already check which nodes violate the triangle inequality, and I update the side that violates it as in the Floyd-Warshall algorithm (using the sum of the other two sides).

    But I'm not sure whether changing the distances will affect the algorithm, because sometimes the new distance value differs greatly from the old value. Is my method for fixing the triangle inequality in the distance matrix valid?

    Currently I'm developing a program in C#.NET for delivery to customers using the CW savings algorithm by Altinel and Oncan. I'm in a computer science program, so my OR knowledge is limited.

    Sorry if I violate the Ground Rules for Comments.

    Sulistyo Chandrianto,
    Thank You

    Replies
    1. If by "high value difference" you mean that the total distance of the optimal solution using the new distances is considerably less than that of the solution using the old distances, that is not necessarily a surprise. If the new solution is worse, that would mean something went wrong. Your modified distances are always no longer than the original distances.

      Floyd-Warshall would be my first choice, unless the network was large enough that I needed something faster to get computation time down.

      One thing to keep in mind is that travel distance and travel time are not always strongly correlated. The shortest distance route might take you on roads that are heavily congested or that have large numbers of traffic lights. I mention this because I know that the Google map application estimates both distance and time. I don't know whether the Google maps API lets you download times as well as distances.

    2. Thanks for the answer, sir.
      By "high value difference" I mean that there is a big gap between the old value and the modified value. If I have an old value of 5 km and a modified value of 2.9 km, the difference is 2.1 km.

      Sulistyo Chandrianto

