Recently, blogs and other online resources have begun to fill the gap in what I call "tactical knowledge", things that seldom make it into textbooks or college lectures. To that end, I'd like to highlight, and expand a bit upon, a recent blog post by Jean-François Puget titled "Analytic Challenges". He lists a number of challenges and then discusses one in detail, how to make analytics "socially and organizationally acceptable". What follows are a few observations of my own.
Choices
Puget discusses impediments to getting the front-line troops to accept and implement analytical solutions. This is a very important consideration. Another, slightly different one, is that managers like choices. They often do not care to be given a single solution, even if it is "optimal". (I put optimal in quotes because optimality is always with respect to a particular model, and no model is a perfect representation of the real world.) Among the various reasons for this, some managers realize that they are being paid the "big bucks" for making decisions, not rubber-stamping something an analyst (let alone a rather opaque computer model) said. If you find this notion quaint, ask yourself how you would feel surrendering control to an automated driving system while your car is zipping down a crowded highway during rush hour.
Long ago, before "open-source software" was a recognized phrase, a student in one of my courses became so enthralled by linear and integer programming that he coded his own solver ... on a Commodore 64. (Clearly he was not lacking in fortitude.) He actually used it, in his day job (for the state Department of Transportation), to help make decisions about project portfolios. In order to get his bosses to use his results, he had to generate multiple solutions for them to browse, even when the model had a unique optimum. So he would hand them one optimal solution and several diverse "not too far from optimal" solutions. I think they often picked the optimal solution, but the key is that they picked.
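For readers who want to experiment, here is a minimal sketch of the "one optimum plus near-optimal alternatives" idea, in Python with the open-source PuLP library. All the data (project values, costs, budget, tolerance) are invented for illustration; this is obviously not the student's Commodore 64 code.

```python
# Sketch: enumerate the optimum plus near-optimal portfolios with PuLP.
# All numbers below are hypothetical.
import pulp

values = {"A": 9, "B": 7, "C": 6, "D": 4}  # hypothetical project values
costs = {"A": 5, "B": 4, "C": 3, "D": 2}   # hypothetical project costs
budget = 8
tolerance = 0.20  # accept alternatives within 20% of the optimal value

x = {p: pulp.LpVariable(f"x_{p}", cat="Binary") for p in values}
prob = pulp.LpProblem("portfolio", pulp.LpMaximize)
prob += pulp.lpSum(values[p] * x[p] for p in values)
prob += pulp.lpSum(costs[p] * x[p] for p in values) <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
best = pulp.value(prob.objective)
# From here on, demand that any solution stay within the tolerance of optimal.
prob += pulp.lpSum(values[p] * x[p] for p in values) >= (1 - tolerance) * best

solutions = []
while prob.status == pulp.LpStatusOptimal and len(solutions) < 4:
    chosen = {p for p in values if x[p].value() > 0.5}
    solutions.append(chosen)
    # "No-good" cut: forbid this exact portfolio, then re-solve.
    prob += (pulp.lpSum(x[p] for p in chosen)
             - pulp.lpSum(x[p] for p in values if p not in chosen)
             <= len(chosen) - 1)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))

for i, sol in enumerate(solutions):
    print(f"option {i}: {sorted(sol)}")
```

The no-good cut only forbids repeating an exact portfolio; if the alternatives come out looking too similar to be an interesting menu, one could instead require each new solution to differ from every previous one in, say, at least two projects.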
Data Quality
When I taught statistics, I liked to share with my classes Stamp's Law:
The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the chowky dar (village watchman in India), who just puts down what he damn pleases.

Analytics relies on the existence and accessibility of relevant data, and quite often that data is compiled from unreliable sources, or recorded by workers whose attention to detail is less than stellar. I once had a simple application of Dijkstra's shortest path algorithm go horribly awry because the arc lengths (distances), extracted from a corporate database, contained a large number of zeros. (For instance, the distance from Cincinnati to Salt Lake City was zero.) The zeros were apparently the result of employees leaving the distance field blank when recording shipments (as opposed to, say, spontaneous appearance and disappearance of wormholes). Puget's list mentions "uncertain or incomplete data", to which we can add incorrect data. I've read estimates (for instance, here) that anywhere from 60 to 80 percent of the work in a data-based analytics project can be devoted to data cleaning ... a topic that I think has not yet gained sufficient traction in academe.
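For what it's worth, here is a sketch of the kind of screening that would have caught those zeros before Dijkstra's algorithm ever ran. It uses Python's networkx library, and the shipment records (cities, mileages) are made up for illustration.

```python
# Sketch: screen arc data for "blank distance stored as zero" before
# running a shortest path computation. Records are hypothetical.
import networkx as nx

shipments = [  # (origin, destination, miles)
    ("Cincinnati", "Columbus", 107.0),
    ("Cincinnati", "Salt Lake City", 0.0),  # blank field stored as zero
    ("Columbus", "Salt Lake City", 1650.0),
]

G = nx.DiGraph()
for origin, dest, miles in shipments:
    if origin != dest and miles <= 0:
        # Treat nonpositive distances between distinct cities as missing
        # data, not as wormholes: flag and drop (or impute) them.
        print(f"suspect record: {origin} -> {dest} = {miles}")
        continue
    G.add_edge(origin, dest, weight=miles)

print(nx.dijkstra_path(G, "Cincinnati", "Salt Lake City", weight="weight"))
```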
Implicit Constraints/Criteria
By "implicit" I mean unstated (and possibly unrecognized). Puget mentions this as a cause of resistance to implementation of model solutions, in the context of front-line troops finding faults in the solution based on aspects not captured in the model. Managers may also find these types of faults.
An example I used in my modeling classes was a standard production planning model (linear program), in which the analyst selects the amounts of various products to manufacture so as to maximize profit (the single criterion). In the textbook examples we used, it was frequently the case that some products were not produced at all in the optimal solution, because they were insufficiently profitable within the planning horizon. I would then ask the class what happens if you implement the solution and, down the road, those products become profitable (and perhaps quite attractive)? By discontinuing the products, have you lost some of the expertise/institutional knowledge necessary to produce them efficiently and with high quality? Have you sacrificed market share to competitors that may be difficult to recover? Did you just kill a product developed by the boss's nephew? My point was that perhaps there should be constraints requiring some (nonzero) minimum level of output of each product, so as to maintain a presence in the market. Otherwise, you in essence have a tactical or operational model (the boundary is a bit fuzzy to me) making a strategic decision.
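In model terms the fix is trivial; the hard part is choosing the floor. Here is a sketch in Python with PuLP, using invented profit and capacity numbers. The floor value is a strategic judgment, not a model output; set it to zero and you recover the textbook model.

```python
# Sketch: product-mix LP with minimum-output floors. Numbers are invented.
import pulp

profit = {"P1": 30.0, "P2": 12.0, "P3": 4.0}  # hypothetical unit profits
hours = {"P1": 2.0, "P2": 1.0, "P3": 0.5}     # hypothetical machine hours/unit
capacity = 100.0
floor = 5.0  # strategic minimum output per product (a judgment call)

# The lower bound keeps every product in production at some minimum level.
x = {p: pulp.LpVariable(f"make_{p}", lowBound=floor) for p in profit}
prob = pulp.LpProblem("product_mix", pulp.LpMaximize)
prob += pulp.lpSum(profit[p] * x[p] for p in profit)
prob += pulp.lpSum(hours[p] * x[p] for p in profit) <= capacity

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for p in profit:
    print(p, x[p].value())
```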
Another example of implicit criteria that I used in class also began with a production planning model (in this case, more of a scheduling application). You solve the model and find an optimal schedule that meets demand requirements at minimum cost. It also involves major changes from the current schedule. What if there were a slightly suboptimal schedule that required only minor deviations from the current schedule? To a mathematician, "slightly suboptimal" is still suboptimal. To a production foreman having to make those changes, the trade-off might seem well justified.
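One common way to formalize that trade-off is a two-phase model: first find the minimum cost, then, among all schedules costing at most a few percent more, pick the one closest to the current schedule. Here is a sketch in PuLP with invented data; the absolute deviations are linearized in the usual way.

```python
# Sketch: trade a small cost increase for schedule stability.
# All data below are hypothetical.
import pulp

cost = {"shiftA": 3.0, "shiftB": 2.0, "shiftC": 4.0}    # unit costs
current = {"shiftA": 10.0, "shiftB": 0.0, "shiftC": 5.0}  # today's schedule
demand = 15.0
epsilon = 0.05  # willing to pay up to 5% above the minimum cost

x = {s: pulp.LpVariable(f"hours_{s}", lowBound=0) for s in cost}

# Phase 1: the usual minimum-cost schedule.
phase1 = pulp.LpProblem("min_cost", pulp.LpMinimize)
phase1 += pulp.lpSum(cost[s] * x[s] for s in cost)
phase1 += pulp.lpSum(x[s] for s in cost) >= demand
phase1.solve(pulp.PULP_CBC_CMD(msg=False))
best_cost = pulp.value(phase1.objective)

# Phase 2: among schedules costing at most (1 + epsilon) * best_cost,
# minimize total deviation |x - current| (linearized via dev >= +/- gap).
dev = {s: pulp.LpVariable(f"dev_{s}", lowBound=0) for s in cost}
phase2 = pulp.LpProblem("min_change", pulp.LpMinimize)
phase2 += pulp.lpSum(dev[s] for s in cost)
phase2 += pulp.lpSum(x[s] for s in cost) >= demand
phase2 += pulp.lpSum(cost[s] * x[s] for s in cost) <= (1 + epsilon) * best_cost
for s in cost:
    phase2 += x[s] - current[s] <= dev[s]
    phase2 += current[s] - x[s] <= dev[s]
phase2.solve(pulp.PULP_CBC_CMD(msg=False))

for s in cost:
    print(s, x[s].value())
```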
Mission Creep
Particularly when a model is a first foray into analytics, the users commissioning the model may have a limited scope in mind, either because they are narrowly focused on solving a particular problem or because they think anything larger might be unmanageable. Once the decision makers see the fruits of a working model, they may get the urge to widen the scope and/or scale of the model. Should that happen, modeling choices made early on may need to be revisited, since the original model may not scale well. (Coders know this process as refactoring, not to be confused with refactoring a basis matrix.)
Mission creep can be the undoing of a project. I vaguely remember stories of the Pentagon tweaking the design of a new surface combatant, while the prototype was under construction, to the point where its superstructure raised the center of gravity so high that it was allegedly unstable in a rough sea. I also vaguely remember stories of a new carrier-based patrol aircraft (antisubmarine patrol bomber?) being redesigned on the fly (so to speak) until the prototype rolled off the assembly line too large to land on a carrier. Sadly, "vaguely remember" translates to being unable to recall the specific projects. If anyone recalls the details, please let me know via comment.
These are great insights! Thank you for making them available. I particularly enjoyed (and agree with) the explanation of the need to keep decision makers in charge. I hadn't actually thought of that!
There is another part I did think about, because I encountered the exact same situation: a distance matrix with too many zero entries. Running the Floyd-Warshall algorithm on it led to instant space travel, as all distances vanished to 0...
Thanks. The sad thing about those zeros in the distance matrix is that they never seem to occur when I'm the one doing the traveling. :-(
Distances can become funny when doing transportation between Canada (we use kilometers) and the USA. Employees will note the distance in the unit they naturally use, and it can easily go unnoticed. Using GIS-based software solves the issue until you start to work on wood transportation.
Does GIS software not help with wood transportation?