
Tuesday, September 3, 2013

STEM Crisis: Fact or Fiction?

After reading assorted articles about a looming crisis in the supply of qualified STEM (Science, Technology, Engineering, Mathematics) graduates, today LinkedIn pointed me to an article on the IEEE Spectrum site titled "The STEM Crisis Is a Myth". It seems to be cogent and fairly well researched, but I'm not sure I entirely buy the author's analysis. I will certainly stipulate this: determining whether there is a looming STEM crisis, its probable extent (if it does occur), and what to do about it are complex questions, involving fuzzy definitions, data that can be parsed in a variety of ways, and at times some rather heated opinions that can get in the way of careful analysis.

I don't have a position to defend in this debate, and I certainly don't have any answers. I do have some questions/concerns about some of the definitions, how some of the data is interpreted, and how some realities are perhaps ignored (or abstracted away) during the analysis ... and that's without getting into the questions of what constitutes a STEM degree or a "STEM job". In short, I'm happy to muddy the waters further. Here, in no particular order, are some thoughts:

Not all STEM graduates get "STEM jobs" ... nor need they


Do the debaters consider jobs in banking and finance to be "STEM jobs"? My guess is that the answer in most cases is no; and yet the banking and financial services industries employ significant numbers of mathematics and statistics graduates. Actuarial positions may be classified as "STEM jobs", but what about people who work designing portfolios or analyzing market trends? Wall Street is a leading employer of Ph.D. graduates in particle physics (see, for instance, this New York Times article), largely because the physicists (claim to) understand the Ito calculus, used to describe physical processes such as Brownian motion but also used in derivative pricing.

My personal opinion, formed well before the 2008 market crash, can be summed up as follows: Handing my retirement portfolio to people who think they can measure particles (tachyons) that exist only at speeds greater than the speed of light ... what could possibly go wrong with that? I cringed when I learned that my alma mater, Princeton University, has a department titled Operations Research and Financial Engineering -- and trust me, it's not the OR part at which I cringe. Personal prejudices aside, though, it seems reasonable to me that a portion of STEM graduates will legitimately be desired for (and hired into) jobs that are probably not considered "STEM jobs", siphoning a portion of our university output away from the jobs specifically targeting holders of STEM degrees.

People have changes of heart.


Years ago, I had a student in an MBA course (now a lifelong friend) who had a bachelor's degree in a scientific field (chemistry or something related). She had worked for the Michigan Department of Natural Resources as a lab chemist before deciding that it was not her passion, and that she would rather work in the business sector. Years later, I had another student with a science degree (chemistry? biochemistry? chemical engineering?) who had worked in the pharmaceutical industry before joining the MBA program. After graduating, he worked in management jobs that leveraged his scientific background but would not be classified as STEM jobs. Another MBA student had a degree in nuclear engineering and had served as a propulsion engineer aboard a US Navy submarine.

In fact, a fairly sizable portion of our MBA program consisted of people with undergraduate STEM degrees (and, often related work experience) who had decided to go over to the Dark Side and join the ranks of management. In comparing supply of and demand for STEM degrees, we need to allow for the fact that some STEM degree holders will naturally choose, for reasons other than a lack of job opportunities, to pursue non-STEM careers.

There is more to employability than having the right degree.


An administrator once made the mistake of inviting me to participate in an orientation session for new MBA students, designed to foster camaraderie and esprit de corps. During the session, I remember saying the following: "Look at the person on your left. Now look at the person on your right. Statistically, one in three people is an asshole ... so if it's not one of them, it's you." (I made the statistic up, based on my experience growing up in New York.) The point I was making then was that candidates would be required to work in teams, just as in industry, and it was naive to assume that it would always be easy to get along with all your teammates.

Sadly, though, it is also true that some people just cannot coexist at all with other workers. Some are chronically absent or late. Some need to be nagged, wheedled or hand-held just to get things done. Some are larcenous. I see no inherent reason why a STEM degree would preclude any of those possibilities (and in fact I've met walking, talking counterexamples to that conjecture). Those people will often "fail" a job interview, no matter how solid their technical credentials, or they will be involuntarily separated from an employer in a way that follows them through subsequent interviews. Thus we have to allow for some slice, hopefully small, of the STEM degree holders to be unsuitable for employment in STEM jobs (unless the recruiters are desperate).

Educational standards in the US ain't what they used to be.


During my graduate studies in mathematics at Michigan State University, I was a teaching assistant. Like most TAs, I began teaching recitation sections of a large service course, whose title was (approximately) "Introduction to College Algebra". The course was taught in a large lecture by a faculty member. The primary roles of the TAs handling the recitation sections were to answer questions and grade homework and exams.

A year or so after I arrived, we were joined by a new graduate student with a bachelor's degree in mathematics from a "prestigious" university. (I shall not name the university, so as to avoid embarrassing the fine folks in Ann Arbor.) He too had a teaching assistantship. Fall quarter of his first year, he was assigned to teach a couple of sections of the introductory algebra course. Winter quarter he was pulled out of the classroom, and his sole duty as TA was to work out the answers to the problems in the textbook ... because the Powers That Be had discovered that he could not answer simple algebra questions in class (and not, apparently, due to stage fright). Had he chosen to work in industry, rather than going straight to graduate school, and had the recruiters done their jobs well, he might have contributed to the statistics representing STEM degree holders not working in STEM jobs.

Employers frequently complain (particularly when lobbying for more H-1B visas) that they cannot find a sufficient number of STEM degree holders with the "right skill set". We can argue about who bears responsibility for a genuine lack of the right skills: universities with outmoded curricula; employers unwilling to pay for training; or degree holders unwilling to upgrade their skills on their own time. We can also speculate (and many people do -- see the comments on the IEEE Spectrum post) on how often the "right skill set" translates to "willing to work cheap". We also need to accept that in some cases the "right skill set" translates to "actually knows the subject matter for their awarded degree", and that this is not a given.

Thursday, July 4, 2013

What's Wrong with a Gerontocracy?

The digital version of the June 24, 2013 issue of Time magazine contains a viewpoint column by Grace Wyler titled "Washington is a Gerontocracy". The subtitle adequately conveys her central point:
A 20-something can be the CEO of a billion-dollar company but can't run for the Senate. That doesn't make sense.
Given some of her arguments, I thought an operations research/mathematics perspective might be in order (plus it's a holiday and I have nothing better to do with my time).

First, I should stipulate that I'm actually somewhat in agreement with her. Here, in no particular order, is a partial list of things that a 21 year old US citizen legally can do:
  • consume alcohol (most places, anyway);
  • own property;
  • sign contracts;
  • wed;
  • reproduce;
  • drive an automobile;
  • vote;
  • control a variety of lethal military ordnance;
  • serve (and perhaps die for) his/her country.
Here, per Ms. Wyler, is a list of things that same citizen cannot do:
  • serve in the House of Representatives (minimum age: 25);
  • serve in the Senate (minimum age: 30);
  • serve as President of the United States (minimum age: 35).
Now, one of the truisms of operations research (albeit one not often taught in classrooms) is that when confronted with constraints for a decision, the first question should be something along the lines of "Why do those constraints exist?" It is not uncommon that the actual reason, if you can drill down to it, is something like "Because <insert name of previous or current big-wig> thinks/thought it should be that way" or "Because that's how we've always done it" (since the last time anybody bothered to think about it). With that in mind, I am not ready to dismiss the validity of lower age limits in general and those specified in the Constitution in particular, but I do think it is worth reviewing them and asking whether they are still appropriate.

I also happen to agree that, to some extent,
Capitol Hill could probably take some cues from people who aren’t afraid to move fast and break a few things.
I further agree that
Congress is struggling to keep up, spinning its wheels in a bygone era when people thought the Internet was a “series of tubes.”
Some of their votes on electronic privacy, protection of intellectual property and other technology-related bills have suggested a less than stellar understanding of technology ... but when the current 20-somethings are 40-somethings, that will resolve itself, much as the current "gerontocracy" knows how to drive a car without smacking it with a buggy whip.

Those points made, I have to take issue with some of Ms. Wyler's arguments.
But if any of these wunderkinder want to direct their powerful young minds toward governing the country, they will have to wait a few years. ... [T]here is a serious downside to barring young people from seeking federal office: with the public debate determined and dominated by senior citizens, the country doesn’t get to hear from — and vote for — the interests of young adults[.]
Having earlier touted a few 20-something billionaires who made their money from Internet businesses, Ms. Wyler suddenly seems to have forgotten their online presence (and the online presences of the multitude of non-billionaire 20-somethings who patronize those businesses). Television is also awash in good-looking 20- and 30-somethings, whether on scripted shows or reality shows. If there is one thing young people do not currently lack, it is platforms to make their opinions heard. Furthermore, I'm fairly sure the Constitutional limits on the ages of office-holders do not apply to staff (else the occasional Congressman-page scandal would be reported in the AARP newsletter and not in Time). Our current leaders have ample opportunity to listen to the opinions of their (considerably) younger constituents, whether they exercise those opportunities or not.
The result is that Capitol Hill remains at least a generation behind the rest of the country. In the 113th Congress, elected in 2012, the average age in the House is 57, and the average age in the Senate is 62.
In support of Ms. Wyler's position, using 2012 demographic data from the US Census Bureau, 57 is roughly the 70th percentile of adults (those over age 20) and roughly the 78th percentile of the entire population. So Congress does in fact skew a bit old. On the other hand, if we equate a "generation" to approximately 25 years, the "average" member of Congress is a bit more than a generation older than the 20-somethings and roughly a generation younger than the eldest segment of the population. So if they are "a generation behind the rest of the country", the "rest of the country" must exclude anyone with an AARP membership ... and trust me, you do not want to meddle with us. (Old proverb: Old age and treachery will overcome youth and skill.)
In his book Too Young to Run?: A Proposal for an Age Amendment to the U.S. Constitution, [Pomona College politics professor John] Seery writes that the age restrictions imposed by the Constitution lower the incentive for young adults to participate in what is supposed to be a representative democracy.
This is certainly plausible, the key being that "lower" conveys a direction but not a magnitude. The young adults in question can run for many local, regional and state offices. Where I live, we had a recent college graduate serve on our city council at 24 and serve as mayor well before his thirtieth birthday. I'm not one to advocate for career politicians, but an apprenticeship before taking national office does not seem unduly burdensome. There is something comforting to the notion that the people steering the ship are not out on their maiden voyage.

On to my personal favorite:
New Jersey Senator Frank Lautenberg, who died June 3 at 89, was the 298th Senator to die in office. Of the 22 Senators who have died in office since 1970, 16 were over 70.
This is again presented in the context that the Congress is too old, so presumably the reader is expected to infer that (a) having 16 of the last 22 senators who died in office be over 70 is somehow bad and that (b) electing younger people to Congress would reduce this number.

Electing younger Senators and Representatives would indeed bring the percentage of those expiring in office who are 70+ down, at least marginally, because it would create more opportunities for someone relatively young to die in office. According to the Social Security Administration actuarial tables, though, the average life expectancy for a 25 year old male is 52 years (death at age 77) and the average life expectancy for a 25 year old female is almost 57 years (death around age 82). If our goal is to decrease the percentage of in-office deaths that occur beyond age 70, let's look at some options that would have a more significant impact:
  • elect younger Senators and Representatives with serious, preferably terminal, illnesses;
  • elect younger Senators and Representatives who enjoy participating in extreme (and hazardous) sports;
  • elect younger Senators and Representatives who drive without using seat belts or ride motorcycles without helmets (preferably at high speed in both cases);
  • elect younger Senators and Representatives who are on active duty in war zones;
  • elect younger Senators and Representatives who are active members of street gangs or drug cartels;
  • elect substantially more young Senators and Representatives with strict term limits (so that they cannot grow old in office).
I kind of like the last one; the other suggestions may not be very practical. More to the point, though, is that we need to ask why it is a bad thing that so many of the Senators who died in office were relatively old (or, from my current vantage point, "not particularly young"). If someone is going to die in office, I personally would just as soon have them lead a long and hopefully fulfilling life first. Now if we want to reduce the frequency of deaths in office, as opposed to the average age of those who go out with their boots on, I can think of a few measures (including cutting back on free food and alcohol provided by lobbyists), and electing younger members of Congress (of reasonable health and with reasonably conservative personal habits) would likely lead to a nontrivial reduction in frequency.
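
If you want to play with the arithmetic yourself, here is a minimal simulation sketch. The Gompertz mortality law and its parameters are my own assumptions (chosen to give a plausible median age at death in the mid-70s, not taken from the SSA tables), and it assumes, unrealistically, that every member serves until death:

```python
import math
import random

def age_at_death(entry_age, a=1e-4, b=0.085):
    """Sample an age at death from a Gompertz hazard a*exp(b*age),
    conditional on being alive at entry_age (inverse-transform sampling)."""
    u = random.random()
    return math.log(math.exp(b * entry_age) - (b / a) * math.log(u)) / b

random.seed(42)
for entry in (30, 45, 60):
    deaths = [age_at_death(entry) for _ in range(100_000)]
    over_70 = sum(d > 70 for d in deaths) / len(deaths)
    print(f"enter at {entry}: {100 * over_70:.0f}% of in-office deaths occur past 70")
```

Even the members elected at 30 mostly survive past 70 under these assumptions, which is the point: as long as Congress is populated by reasonably healthy people who stay until they die, the deaths that do occur in office will skew old no matter whom we elect.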

Friday, March 29, 2013

Justice Flunks Math ... Or Not

Catching up on some online reading, I just chanced upon a New York Times op-ed piece titled "Justice Flunks Math". It deals with the Amanda Knox case. The authors' argument for their thesis (captured well by the title) centers around the following:
One of the major pieces of evidence was a knife collected from Mr. Sollecito’s apartment, which according to a forensic scientist contained a tiny trace of DNA from the victim. Even though the identification of the DNA sample with Ms. Kercher seemed clear, there was too little genetic material to obtain a fully reliable result — at least back in 2007.
By the time Ms. Knox’s appeal was decided in 2011, however, techniques had advanced sufficiently to make a retest of the knife possible, and the prosecution asked the judge to have one done. But he refused. His reasoning? If the scientific community recognizes that a test on so small a sample cannot establish identity beyond a reasonable doubt, he explained, then neither could a second test on an even smaller sample.
Whatever concerns the judge might have had regarding the reliability of DNA tests, he demonstrated a clear mathematical fallacy: assuming that repeating the test could tell us nothing about the reliability of the original results. In fact, doing a test twice and obtaining the same result would tell us something about the likely accuracy of the first result. Getting the same result after a third test would give yet more credence to the original finding.
Imagine, for example, that you toss a coin and it lands on heads 8 or 9 times out of 10. You might suspect that the coin is biased. Now, suppose you then toss it another 10 times and again get 8 or 9 heads. Wouldn’t that add a lot to your conviction that something’s wrong with the coin? It should.
My answer to the final  (rhetorical?) question is yes: my conviction that the coin was biased would increase, because the second test is plausibly independent of the first test. Whether that same reasoning applied to a retest of DNA evidence would depend on whether the retest would be probabilistically independent or, if not, how strongly the two test results would covary.
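
To put numbers on the coin analogy, here is a quick sketch. The "biased" alternative of p = 0.85 is my own assumption (the op-ed specifies only 8 or 9 heads out of 10):

```python
from math import comb

def likelihood(heads, tosses, p):
    """Binomial probability of seeing `heads` heads in `tosses` tosses."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

fair, biased = 0.5, 0.85
# First batch: 9 heads in 10 tosses.
lr_one = likelihood(9, 10, biased) / likelihood(9, 10, fair)
# A second, independent batch with the same outcome multiplies the
# likelihoods, so the ratio squares.
lr_two = lr_one ** 2
print(f"evidence after one batch:   {lr_one:.0f} to 1")
print(f"evidence after two batches: {lr_two:.0f} to 1")
```

The second batch squares the likelihood ratio precisely because it is independent of the first; that independence is exactly what is in question for a DNA retest.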

Suppose, hypothetically, that we have a test that is sometimes accurate, sometimes inaccurate, but infallibly produces the same result (right or wrong) on a given sample. No number of retests will improve the accuracy of the test.

So the use of the coin flip analogy is somewhat facile. (I can understand the temptation to use it, though. The authors were writing for a general audience, not the more mathematically sophisticated -- not to mention orders of magnitude smaller -- audience for this blog.) Retests of the DNA evidence are likely to be neither independent nor identical, but somewhere in between. So a retest might add some information, but might well not alter our confidence in the original test enough to justify it. Bear in mind that retesting has both monetary and evidentiary expenses (it consumes portions of a finite, irreplaceable sample).
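
To make "somewhere in between" concrete, here is a deliberately crude sketch. The sensitivity, false positive rate, and the mixing parameter rho are all invented for illustration; rho = 0 is a fully independent retest, and rho = 1 is the hypothetical above of a test that merely repeats itself:

```python
def lr_two_matches(sens, fpr, rho):
    """Likelihood ratio (true match vs. no match) after two positive tests,
    where the retest simply repeats the first result with probability rho
    and is an independent test with probability 1 - rho."""
    p_if_match = sens * (rho + (1 - rho) * sens)  # P(both positive | match)
    p_if_none = fpr * (rho + (1 - rho) * fpr)     # P(both positive | no match)
    return p_if_match / p_if_none

sens, fpr = 0.95, 0.05  # assumed test characteristics, purely illustrative
for rho in (0.0, 0.5, 1.0):
    print(f"rho = {rho:.1f}: likelihood ratio = {lr_two_matches(sens, fpr, rho):.0f}")
```

At rho = 1 the second test adds no evidence at all; at rho = 0 it squares the likelihood ratio, just as in the coin example; in between, it adds something, but much less than independence would suggest.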

I'm inclined to believe that the second DNA test should have been done, not because a repeated test would necessarily raise confidence substantially, but because technology had "advanced" -- but only if there were expert testimony that the technological improvements justified consumption of more of the sample.

Tuesday, March 5, 2013

The Value of Knowing the Value of Your Degree

Fellow blogger Laura McLay wrote today about a push in Congress to require someone (apparently states) to report statistics on graduate earnings by college/university and major. (See this report at Inside Higher Ed for more details; tip of the hat to Laura for the link.) Laura raises some excellent points about the pitfalls of this, and I will try not to duplicate her analysis.

Proponents of this sort of disclosure throw the word "transparency" around a fair bit, and in general I'm in favor of transparency (possible exceptions being clothing and curtains). Those of us associated with analytics are unlikely to argue against the provision of data (and, hopefully, some statistical analysis of it). Anyone who has used or taught decision analysis knows that, under typical assumptions, the expected value of imperfect information is nonnegative. In other words, it can't hurt to know. Those "typical assumptions", while mathematically mild, are important, and include the following (a small worked example follows the list):
  1. we have some ability to assess the general accuracy (or inaccuracy) of the information; and
  2. we make rational use of it.
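For readers who have not seen the expected value of imperfect information computed before, here is a minimal sketch with entirely made-up numbers: a two-action decision (act, or do nothing for a payoff of 0) and a signal that reports the true state with some fixed accuracy:

```python
def evii(p_good=0.3, payoff_good=100.0, payoff_bad=-40.0, accuracy=0.8):
    """Expected value of imperfect information for a two-state decision:
    act (uncertain payoff) or do nothing (payoff 0)."""
    p_bad = 1.0 - p_good
    # Best expected payoff using the prior alone.
    ev_prior = max(0.0, p_good * payoff_good + p_bad * payoff_bad)
    # A signal that reports the true state with probability `accuracy`.
    ev_with_signal = 0.0
    for says_good in (True, False):
        if says_good:
            p_sig = p_good * accuracy + p_bad * (1 - accuracy)
            p_good_post = p_good * accuracy / p_sig
        else:
            p_sig = p_good * (1 - accuracy) + p_bad * accuracy
            p_good_post = p_good * (1 - accuracy) / p_sig
        ev_act = p_good_post * payoff_good + (1 - p_good_post) * payoff_bad
        ev_with_signal += p_sig * max(0.0, ev_act)
    return ev_with_signal - ev_prior  # nonnegative under these assumptions

print(f"EVII = {evii():.2f}")  # 16.40 with the defaults: it can't hurt to know
```

With these numbers the signal is worth 16.40 in expectation, and because the decision maker optimizes against the posterior (assumption 2 above), the difference can never be negative.
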
Tips about the stock market, for example, point to the importance of the first assumption. Stock tips are never fully accurate, and stock tips from your halfwit brother-in-law may be chronically inaccurate (which raises the value of the information as long as you realize that -- just do the opposite of what he suggests). Buy recommendations from a generally reliable broker who just happens to have a big fish client trying to dump a chunk of that particular security, though, are problematic because you cannot assess even their approximate accuracy.

The second issue -- can we make rational use of the information (and will we) -- is one reason doctors are not always supportive of genetic testing and sometimes even follow-up tests for marginal positive results. Will information that the patient's risk of some relatively unlikely or slowly progressing condition is elevated unduly depress the patient, cause the patient to embark on expensive, invasive and/or risky tests or procedures, or otherwise push the patient to do something that might not be entirely reasonable?

So my first reaction to the notion of making information available to potential college students about career prospects (placement rates, starting salaries, salaries five years out etc.) as they relate to the nature and source of the college degree is positive: more information is better. My second reaction is that it needs to be information, not just data, meaning that someone reliable (knows analytics) and trustworthy (not out to recruit students) needs to process the data and translate it into actionable knowledge. Moreover, it needs to be communicated to prospective students in a way that lets them understand both the implications and the limitations of the information. So we need statisticians or analytics professionals involved, and we need communications professionals involved.

I'll end with a few specific comments:
  • As Laura mentions, the analysis should at minimum provide ranges and not just averages. Those of us with analytics training are only too aware of the Flaw of Averages.
  • Salaries are one way to look at the value of a degree. Break-even analysis (the time required to earn enough to pay off the cost of the degree, including lost earnings for the time spent in college) is another, but it is trickier to compute; a back-of-the-envelope sketch follows this list.
  • Some nontrivial statistical modeling may be required to account for various factors other than school and major that might influence earning power. For example, some schools have an explicit pre-med major, while at some schools pre-med students major in chemistry, biology or biochemistry, and at some schools they major in something unrelated. When I was an undergraduate, the student living next door to me was a pre-med who majored in English. If that were true across the board (and I have no idea if it was, but at minimum it was not an anomaly), then our English majors probably out-earned English majors at schools with explicit pre-med majors.
  • There is more to a career than salary, and that needs to be conveyed to consumers of this information. Before the markets tanked in '07-'08, finance majors hired by Wall Street trading firms enjoyed rather high salaries (higher than what finance majors earned in corporate finance positions, and certainly higher than many other majors). They also "enjoyed" a high cost of living, ungodly work hours and high stress. My impression is that aggressive personalities tended to fare better than less aggressive ones. So that high salary figure for finance majors at schools that fed the Wall Street mill needed to be tempered by an understanding of those other factors.
  • Laura mentions the effect of time. Widespread dissemination of salary data might lead to gluts in the better-paying fields, driving down salaries in those fields. At the same time, demographic, economic or technological trends might augur for higher salaries down the road in fields that recently have not paid that well. (I'd mention gerontology, but someone might read something personal into it.)
  • There are risk factors involved in the decision to attend college, the choice of the college to attend, and the choice of major. Major A might pay more after graduation than major B, but if majoring in A makes it likely you will fail to graduate and B is safer (given your particular skill set and inclinations), maybe B is really the better deal.
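
As promised above, here is a sketch of what the simplest break-even calculation might look like. All the inputs are invented, and it ignores discounting, raises, taxes and financial aid:

```python
def breakeven_years(degree_cost, years_in_school, salary_with, salary_without):
    """Years of post-graduation work needed to recoup tuition plus forgone
    earnings, ignoring discounting, raises, taxes and financial aid."""
    investment = degree_cost + years_in_school * salary_without  # forgone pay
    premium = salary_with - salary_without
    return float("inf") if premium <= 0 else investment / premium

# Entirely hypothetical inputs, for illustration only.
print(f"break-even: {breakeven_years(100_000, 4, 65_000, 35_000):.1f} years")
```

With these particular (fabricated) inputs the degree pays for itself in eight years; shrink the salary premium by a third and the answer jumps to twelve. That sensitivity is one more reason ranges beat single-point averages.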

Sunday, December 16, 2012

How Not to Debate Gun Control

In the wake of Friday's shooting rampage at a Connecticut elementary school that left 26 dead (not counting the shooter), 20 of them between the ages of six and seven (casualty list here; keep a full box of tissues handy if you decide to read it), there are once again calls for a debate on greater gun control in the U.S., and protests by those against it. I have my own opinions on the issue, which I will keep to myself because they are just that: opinions. Both sides of the issue are repeating a pattern of "debate" that contains several fundamental flaws.

Emotional Decisions


First, both sides are confronting a very difficult issue while emotions are running high. It is hardly surprising that a considerable body of research has shown that emotions impact decision-making in a variety of ways. While there may be some benefit to a heightened emotional state in this case -- it pushes us to take up a contentious issue when we might otherwise be tempted to "kick the can down the road" -- there is also the danger that we let those emotions trump reason. In particular, listening to one's "gut" is considerably easier than dealing with a complex, multidimensional analysis.

Reliance on Anecdotes


There is a rational analysis of the issue on Randy Cassingham's blog, along with a considerable discussion in the comments section. It illustrates the flaws I'm discussing here, including in particular the reliance on anecdotes as opposed to statistics and decision models. Some parties in favor of tighter control over guns and ammunition will argue that, had those tighter controls been in effect, this particular incident would have/might have been averted, or at least produced a lower body count. Some parties opposed to tighter controls (or opposed to tighter controls merely in reaction to this incident) will argue that other crimes of a similar nature were conducted without the use of firearms, citing in particular the 1927 Bath Township school bombings. (It happens that I live approximately four miles from Bath Township.) Both sides are relying on historical anecdotes.

Mr. Cassingham mentions closures of mental hospitals, and some commenters echo the theme that we need to address mental illness, rather than gun control. It's not clear what prompted those comments, other than what I suspect is a common assumption that you have to be nuts to murder children, but it is possible that some people are recalling previous incidents in which they believe a shooter was mentally deranged (in the clinical sense) and either was denied treatment or should have been (but was not) involuntarily committed to treatment. For what it's worth, the shooter in the Virginia Tech massacre had been diagnosed with an anxiety disorder (which, to the best of my knowledge, is not a known precursor to violence) but had been receiving treatment. Eric Harris, one of the two Columbine shooters, also suffered some emotional issues but was receiving treatment. The gunman in the 2006 Amish school shooting seems to have been (in unscientific terms) a whack-job, but an undiagnosed one.

In any case, making decisions based on anecdotal evidence is unsound. Anyone in operations research (or statistics) knows that a sample of size 1 is not a useful sample. Reliance on anecdotes also makes us more susceptible to confirmation bias, since (a) we better remember the anecdotes that support our beliefs and (b) we may unconsciously warp those memories if they would otherwise not provide confirmation.

There's a parallel here to the climate change debate. Elements in favor of climate legislation will argue that a particular weather event (the North American drought in 2012, "Superstorm" Sandy) was the direct result of global warming, even when climatologists are scrupulous in pointing out that there is no direct causal link to a single event. Global warming naysayers will focus on specific events (recent drops in recorded temperatures, record floods from a century or more ago) as evidence that global warming is not occurring, is not a recent phenomenon, or is not exacerbated by man-made emissions.

Optimizing vs. Satisficing


Not that I really believe "satisficing" is a word, but I'll bite the bullet and use it here. Even when a problem has an optimal solution, it is sometimes the case that the time and effort to find it are not adequately rewarded when an alternative solution provides an adequate degree of satisfaction in a more timely or economical manner. Besides the anecdotal aspect, Mr. Cassingham's emphasis on the Bath bombings and the wave of school stabbings and bludgeonings in China (echoed by some of the commenters) implicitly uses the line of argument that if we cannot prevent every mass shooting by enhanced gun control (optimality), it is not worth pursuing. Many (including, I'm willing to bet, Mr. Cassingham) would consider a reduction in mass shootings, or even a reduction in the body counts, as a significant improvement (satisficing). Gun control advocates are not immune from this focus on optimality; they sometimes appear to adopt the position that any level of regulation that would fail to prevent a particular incident is insufficient.

Multi-Criteria Decision Analysis


Okay, you knew this post eventually had to tie back to operations research in some meaningful way (didn't you?). The issue of how to mitigate violence at schools seems to me to be a rather messy example of multi-criteria decision analysis. This is by far not my area of expertise, so I'll keep things rather general here. As I understand MCDA, the discussion we should be having should be framed approximately as follows:
  • What are our criteria for success? This includes a variety of objectives: keeping children safe; preserving individual rights; making schools conducive places for learning; maintaining a populace capable of defending itself in times of war (I personally do not subscribe to the theory that gun owners necessarily make more effective soldiers, but it should be considered); and likely other objectives I'm not seeing at the moment.
  • How do we quantify/measure attainment of those objectives? For instance, it's fine to say we want no more children to die at school (or perhaps no more deaths specifically from gun violence), but does that mean that a 99% reduction in such deaths has no value? What about a 1% reduction?
  • What are our options? We cannot have a meaningful argument about choices without knowing what those choices are.
  • What are the costs and consequences of each alternative? These will need to be considered in a probabilistic manner. To take a specific, if rather vague, example, suppose that one of the options bans the sale of high-capacity ammunition clips. That would not stop a shooter from wreaking havoc with one or more standard clips, nor could we be positive that no shooter ever would manage to gain access to a high-capacity clip. For that matter, we have no idea when another school shooting might take place (although, sadly, the "if" does not seem to be in doubt -- see the Wikipedia compilation of school-related attacks for some severely depressing historical evidence). Someone will need to attempt a quantitative assessment of the frequency and severity of violent incidents under each alternative being considered. "Minority Report" being a work of fiction, we cannot claim that a particular decision will prevent or mitigate a specific future attack; we can only talk about expected benefits (and costs).
  • If we consider multiple alternatives in combination, how do they interact? There is no reason to assume that, in our quest to make life safer for our children, we are limited to a single course of action. It would, however, be a mistake to evaluate each course of action independently and then assume that the results of a combination of them would be additive.
  • How will we reconcile trade-offs? As one can see in the Wikipedia article on MCDA, there are quite a few methods for solving a multi-criteria decision problem. They differ in large part in how they address the issue of trade-offs. For instance, charitably assuming we can even measure these things, how much freedom are you willing to give up, and/or how much more are you willing to pay in taxes, to eliminate one "expected" future fatality or injury? (A small sketch of the simplest method, and of why the weights matter, follows this list.)
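To illustrate that last point, here is the simplest possible MCDA method, a weighted sum, applied to some entirely invented alternatives and scores. The interesting part is not the answer but how easily the answer flips when the weights (the trade-offs) change:

```python
# All alternatives, scores (0 = worst, 1 = best) and weights below are
# invented for illustration; they are not proposals or measurements.
alternatives = {
    "status quo":      {"safety": 0.2, "rights": 0.9, "cost": 0.9},
    "ban large clips": {"safety": 0.5, "rights": 0.6, "cost": 0.8},
    "armed guards":    {"safety": 0.7, "rights": 0.8, "cost": 0.3},
}

def weighted_score(scores, weights):
    """The simplest MCDA method: collapse the criteria to one weighted sum."""
    return sum(weights[c] * s for c, s in scores.items())

for weights in ({"safety": 0.6, "rights": 0.2, "cost": 0.2},   # safety first
                {"safety": 0.2, "rights": 0.6, "cost": 0.2}):  # rights first
    best = max(alternatives, key=lambda a: weighted_score(alternatives[a], weights))
    print(f"{weights} -> choose: {best}")
```

Different weightings of safety against individual rights select different policies from the same scores, which is why the trade-off discussion has to come first.
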
This is all very complex stuff, and to do it justice requires a great deal of research (by unbiased parties, if we can find them), a careful and open discussion of what we might gain and what we might lose with each possible action, and an attempt to reach consensus on what trade-offs are or are not acceptable. If the process ever reaches a conclusion, it will also take one heck of a sales job to convince the public that the result really is a good (not perfect, but better than status quo) result.

Update


Shivaram Subramanian wrote a really interesting and well-researched "Collection of Notes on Gun Control", which I commend to anyone interested in the debate. (You won't often see Louis L'Amour, Isaac Asimov and King Asoka of India name-checked in the same blog post.) He cites a Washington Post column from July, in the aftermath of the Aurora (Colorado) shooting, titled "Six facts about guns, violence and gun control" which is also very much worth reading (or, sadly, rereading).

Friday, June 22, 2012

The Normal Density Is Not A Fractal

Early in May, my radio alarm awoke me to NPR news doing a story about a recently published paper [1]. The main point of the paper seemed to be that productivity of people in several fields (academic research, professional and high-level amateur athletics, entertainment, politics) does not seem to follow a Gaussian (normal) distribution, but rather follows a Pareto (power) distribution. Perhaps the primary differences are that a small number of individuals generate a disproportionate share of the output, and that a high proportion of individuals have "below average" output (median well below mean).

My first reaction was "No fooling!" (Actually, "fooling" is an editorial substitution. I can be a bit cranky when I'm torn from the arms of Morpheus.) Hot on that first reaction was this: "The normal density is not a fractal." Specifically, the right tail of a bell curve is not itself a bell curve ... and if I can recognize that while half asleep, it must be fairly obvious.  So where's the beef?

The first figure below shows density functions for Gaussian (red) and Pareto (blue) distributions. The second figure shows a plausible Gaussian distribution for athletic ability among the overall population. Professional athletes, or Division I collegiate athletes, presumably fall in the right tail (shaded). The right tail bears considerably more resemblance to the Pareto distribution in the first figure than it does to a Gaussian distribution.

[Figure: Gaussian (red) and Pareto (blue) density functions]
[Figure: bell curve for athletic ability, with the right tail shaded]


The research documented in the paper covered five studies (one for each of the fields I listed above) using 198 samples (a variety of performance measures for each field, or subsets of each field), involving a total of 633,263 individuals. As with any statistical study, you can question methods, sample definitions, interpretation of results etc. I've read the paper, and I find the evidence fairly compelling, particularly as it confirms my initial intuition (expressed in my waking reaction).

I'm not sold on Pareto as the correct choice among one-tailed distributions, and I'm not sold on one-tailed distributions in general.  Consider, for instance, a performance measure on a [0,1] scale, such as career batting average for baseball players.  The bounded domain precludes a long, thick tail on either side.  A friend of mine did in fact download and fit some batting average data.  I won't reproduce it here, since I'm not sure how his sample was defined (all players or just selected ones), but the histogram he showed me was a lot closer to Gaussian than to Pareto.

That said, it seems intuitive to me that if performance correlates strongly to ability, if ability has a roughly Gaussian distribution, and if there is a selection mechanism in play that selects the most able to compete, then performance among the competitors will not be Gaussian (a simulation sketch follows the list below).  This needs to be qualified in a variety of ways, including the following:
  • Not everyone with high ability may choose to compete.  Athletes may find it more lucrative (and less dangerous) to act in adventure films; people capable of great research may find it more lucrative to work in industry (where opportunities to publish are greatly diminished).
  • Not everyone with high ability may have the opportunity to compete.  Some star athletes are forced to retire prematurely, or to abandon hope of starting a professional career, due to health concerns.  Potential stars in any field may go undiscovered due to where they live or what schools they attend.  At the risk of creating a distraction by introducing a somewhat charged topic, women or members of an ethnic (here) or religious (elsewhere) minority may be excluded or discouraged from entering the competition, regardless of their ability.
  • Nepotism (particularly but not exclusively in the entertainment field) may introduce competitors who do not reside in the right tail of the ability distribution.  To a lesser extent (I think), diligence, hard work or "heart" may allow some people with above median but less than excellent athletic ability to compete in high level events.
  • Productivity is not always a monotonic function of talent.  A very talented wide receiver may not catch many balls if he is on a team with a relatively poor quarterback, a strong running game, and/or a stellar defense (allowing the team to practice a conservative offense).  Conversely, a modestly talented receiver on a team with no running game and a poor defense (so that they are perpetually playing catch-up) may catch a disproportionate number of passes.  Similar things can happen to an academic with great research potential working at a "teaching school", or in a department lacking resources and not providing colleagues or doctoral students with whom to collaborate.  (Conversely, some faculty are adept at riding the coattails of their more productive colleagues, showing up as coauthors of all sorts of papers.)
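Here is the simulation sketch promised above. Everything in it is an assumption of convenience: ability is standard normal, exactly the top 2% compete, and performance simply equals ability:

```python
import random
import statistics

random.seed(1)
# Assumptions of convenience: ability is standard normal, exactly the
# top 2% compete, and performance equals ability.
population = [random.gauss(0.0, 1.0) for _ in range(200_000)]
cutoff = sorted(population)[int(0.98 * len(population))]
competitors = [x for x in population if x >= cutoff]

print(f"competitors' mean   = {statistics.mean(competitors):.3f}")
print(f"competitors' median = {statistics.median(competitors):.3f}")
# mean > median: the selected tail is right-skewed, not bell-shaped;
# a few extreme performers pull the average above the median.
```

The competitors' mean noticeably exceeds their median: the selected tail is right-skewed, with a small number of extreme values pulling the average up, which is at least qualitatively the picture the paper paints.
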
Disclaimers aside, suppose that we accept the central premise of the paper, that in many cases productivity looks more like a power distribution than like a bell curve.  So what?  Here's the point (and the reason that my second reaction, the "not a fractal" comment, was perhaps a bit unjust in evaluating the significance of the paper).  Probably the most common things I see in academic studies involving any sort of performance measure are F-tests of the statistical significance of groups of terms and t-tests of the significance of individual terms in (usually linear) regression models.  Both those tests are predicated on normally distributed residuals.  They are somewhat "robust" with respect to the normality assumption, which is a hand-wavy way of saying "we're screwed if the buggers aren't normal, unless we have an infinite sample size, so we'll just call our sample size close enough to infinite and forge ahead". If the residuals are not sufficiently close to Gaussian, and the sample size is not large enough, F- and t-tests may induce falsely high levels of confidence.

Now it is not the case that the response variable (here, the performance measure) need be normally distributed in order for the residuals to be normally distributed.  Unless ability is adequately covered by the explanatory variables, though, the effects of ability will be seen in the residuals, and if the distribution of ability among the sample (those who likely were chosen at least in part on the basis of ability) bears any resemblance to a Pareto distribution, it seems fairly unlikely that the residuals will be normally distributed ... and fairly risky to assume that they are.  Some academic papers cite specific tests of the normality of residuals, but in my experience it is far from a universal practice.
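
Checking is cheap, though. Here is a sketch of what such a test might look like; the data-generating process is invented (Pareto noise standing in for unmeasured ability), and Shapiro-Wilk is just one of several normality tests one could use:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# Invented data-generating process: heavy-tailed (Pareto) noise standing
# in for unmeasured ability.
y = 2.0 + 1.5 * x + rng.pareto(2.5, size=n)

slope, intercept = np.polyfit(x, y, 1)        # ordinary least squares fit
residuals = y - (intercept + slope * x)

stat, pvalue = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {pvalue:.4g}")  # tiny p-value: residuals look non-normal
```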

The authors of the paper point out a second issue related to this.  Extremely high values of performance are more likely to occur with Pareto distributions than with Gaussian distributions.  Some (many?) authors, taking normality for granted, treat extreme values as outliers, assume the observations are defective, and "sanitize" the data by excluding them.

So consumers of academic papers studying performance may be buying a pig in a poke.

[1] O'Boyle Jr. and Aguinis, "The Best and the Rest: Revisiting the Norm of Normality of Individual Performances." Personnel Psychology 65 (2012), 79-119.

Tuesday, February 21, 2012

OR and Base Voters: Common Pitfalls

My adopted state of Michigan is currently afflicted with the Republican presidential primary. (Symptoms include repetitious attack ads on television, robocalls to one's house, and the general malaise associated with staring at any crop of candidates for political office.) Primaries tend to draw out "base" voters (those committed to one party or the other); we swing voters just stay at home, hiding under the covers until it is over.

Last night the local TV news included a sound bite from a generic Republican voter, an apparently intelligent and articulate woman (to the extent one can judge these attributes from a two sentence interview) who said she was still undecided because she wanted to vote for the "most conservative" candidate. The logic, or lack of logic, behind that statement caused me to take notice of the similarities between how some "base" voters think and common errors in operations research.

A single criterion is easy, but multiple criteria may be correct. There are quite a few pressing issues these days, ranging from foreign policy to budget deficits to global warming to unemployment to ... (I'll stop there; I'm starting to depress myself). Our base voter, henceforth Mme. X, has apparently condensed these criteria down to a single value, on a scale from hard core liberal (arbitrarily 0) to hard core conservative (arbitrarily 1). What is not apparent is how the multiple dimensions were collapsed to a single one. OR people know that multi-criteria optimization is hard, more from a conceptual standpoint than from a computational one. Using a single composite criterion (weighted sum of criteria, distance from a Utopia point in some arbitrary metric, ...) makes the computational part easier, but there are consequences (frequently hidden) to the choice of the single criterion. Goal programming has its own somewhat arbitrary choices (aspiration levels, priorities) which again can have surprising consequences. Picking the "most conservative" candidate simplifies the cognitive process but may lead to buyer's remorse. Similarly, arbitrarily collapsing multiple objectives into a single objective may simplify modeling, but may produce solutions that do not leave the client happy.

Averages can be deceptive. Point estimates also make modeling and decision making easier, but they can mask important things. (A colleague has a favorite, if politically incorrect, quotation: "Statistics are like bikinis. What they reveal is interesting, but what they conceal is critical.")

Suppose that Mme. X has narrowed her choices down to two candidates, and that they have both weighed in on five important issues (A through E). If candidate 1 is consistently to the right of candidate 2 on all issues, we have a dominated solution: Mme. X can eliminate candidate 2 and vote for candidate 1. On the other hand, consider the following scenario, where each candidate's position on each issue is rated on a scale from 0 (liberal) to 1 (conservative).

Candidate 1 is more conservative than candidate 2 in both mean (0.780 versus 0.756) and median (0.80 versus 0.75); yet candidate 2 is to the right of candidate 1 on two of five issues (A and B), and close to a wash on a third (C).  So if Mme. X truly wants a conservative candidate, it is not all that clear which she should prefer (one set of ratings consistent with those summary statistics appears in the sketch below). Likewise, OR models that consider only point estimates without taking dispersion into account can result in solutions that should do well "on average" but sometimes do quite poorly.
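
Here is one such set of issue-by-issue ratings, together with the dominance check and the summary statistics; the individual numbers are illustrative, chosen only to reproduce the means and medians quoted above, not taken from any actual candidates:

```python
import statistics

# Illustrative ratings only: chosen to reproduce the means and medians
# quoted above, not measurements of any actual candidates.
candidate1 = {"A": 0.70, "B": 0.80, "C": 0.76, "D": 0.84, "E": 0.80}
candidate2 = {"A": 0.90, "B": 0.85, "C": 0.75, "D": 0.70, "E": 0.58}

dominates = all(candidate1[k] >= candidate2[k] for k in candidate1)
print(f"candidate 1 dominates: {dominates}")               # False
print(f"means:   {statistics.mean(candidate1.values()):.3f} "
      f"vs {statistics.mean(candidate2.values()):.3f}")    # 0.780 vs 0.756
print(f"medians: {statistics.median(candidate1.values()):.2f} "
      f"vs {statistics.median(candidate2.values()):.2f}")  # 0.80 vs 0.75
```

Candidate 1 wins on both mean and median yet does not dominate; depending on which issues matter most to Mme. X, candidate 2 might be the "more conservative" choice where it counts.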

A solution that goes unimplemented is not a solution. Missing in Mme. X's search for the most conservative candidate is the quality referred to by pundits as "electability". Neither major political party claims a majority of registered voters in the U.S., so to win a general election, a candidate must capture a significant number of moderates and independents. The most ideologically pure candidate (for either party) may not be able to do so. This is a bit of a paradox in recent elections, where candidates find that they must appeal to "base" voters at one end of the political spectrum to get the nomination, then appeal to voters in the middle of the spectrum to win the election. Ideological "base" voters may not grasp this particular reality; they expect the "correctness" of their candidate's views (which are also their views) to triumph. [This may be at least partly explained by the false consensus fallacy.]

OR modelers sometimes have a similar blind spot. We can pursue perfection at the expense of good answers. We can opt for the approach that uses the most sophisticated or "elegant" mathematics or the most high-powered solution technique available. We may try for more scope or more scale in a project than what we can accomplish in a reasonable time frame (or what users can realistically cope with, in terms of data requirements and solution complexity). Professional journals often encourage this trend by requiring "novel" solution methods in order to publish a paper. The end result can be a really impressive solution that sits on a shelf because the client is unwilling or unable to implement it, or because it is too complex for the client to understand and trust.

Garbage in, garbage out. OR models rely on data, as inputs to the decision process or to calibrate parameters of the model. Feed bad data to an otherwise correct model and no good will come of it.  I have seen estimates that as much as 60% of the time in an OR project can be spent cleaning the data.

Meanwhile, Mme. X has to rely on a variety of unreliable sources to gauge how conservative each candidate may be. Candidates famously say things they may not entirely believe, or express intentions they may not carry out, either in an overt effort to curry favor with voters or because their views change between campaigning and governing. Historical data may be faked or misreported, and sometimes facts may not be what they seem. For instance, a generally pro-military candidate might vote against a military appropriation bill because there is a rider on it that would fund an inordinately wasteful project, or something unpalatable to the candidate and/or the candidate's constituents. Opponents will characterize this as an anti-military stance. Budget projections, and indeed any sort of projections, are subject to forecast errors, so a candidate's magical plan to fix deficits/unemployment/Mme. X's dripping kitchen faucet may turn out not to be so magical after all. Unfortunately for Mme. X, she probably has less ability to filter and correct bad data than an OR analyst typically does.

So, in conclusion, voters and OR analysts face similar challenges ... but OR analysts do not have to cope with a glut of robocalls.

Friday, October 21, 2011

Time Series, Causality and Renting a Congressman

This morning I was reading an article, in a national (US) news magazine, that discussed energy policy decisions. It mentioned executives of companies in various energy-related industries contributing to the coffers of various congressmen (from both parties), and it mentioned their votes on key policy issues. The implication was clear: the votes were being bought, or at least influenced, by interested parties.

I understand why writers do this: it spices up the articles (a hint of corruption) without making any explicit charges that cannot be backed up with hard evidence; and it appeals to voters looking for confirmation of their prior belief that congressmen are crooks. I tend to view our elected representatives as by and large venal to some extent myself. The last part of the title of this post is motivated by a quote from Simon Cameron (Abraham Lincoln's first Secretary of War): "An honest politician is one who, when he is bought, will stay bought." As you can tell from the title, I am not sanguine about finding honest politicians.

As an OR professional, though, I understand the flaw in the argument (if we can characterize what is really innuendo as an "argument"). It relates in part to confusing correlation with causation and in part with confusing the direction of causation. Let's say that our fearless reporter accurately reports that Representative A votes for subsidies for domestic production of buggy whips, and Ms. B, CEO of a large manufacturer of buggy whips, donates regularly and/or generously to Rep. A's election campaigns. With minimal effort choosing words, our reporter can make this sound as if A voted for subsidies because B (and perhaps others of a similar mind) paid him off. That is indeed one possibility.

Another possibility is that A voted for subsidies first and then B donated, recognizing A (from his vote) as a friend of the buggy whip industry. There's causation, but in the opposite direction from what the reporter implied, and nothing venal. The nature of political contributions is that you usually contribute to candidates with whom you agree. (I say usually because lobbyists sometimes contribute to both sides, hoping the winner will remember their contribution to her and not to her opponent.)

Yet another possibility is that B contributed to A first, either anticipating a pro-buggy whip platform or for reasons unrelated to industrial policy, and A favors subsidizing domestic production for sincerely held (if possibly misguided) reasons. This is a case of trying to infer causation from what may be mere correlation.

How can we tell whether A's vote is in fact being purchased? Anyone who has done much with statistics recognizes how hard it is to establish causality, especially in the absence of a designed experiment with an appropriate control group. We could try tracking A's votes over time, matching them with B's contributions (or the contributions of B and other pro-subsidy individuals or groups), and look for a positive cross-correlation with lag 1 or higher (so that swings in donations lead swings in votes). A positive cross-correlation would still not be solid proof of vote-buying, though, and might also lack "power" as a test of vote-buying:  if A fits Cameron's definition of an honest politician, and consistently votes pro-buggy whip because he is taking graft, there will be no variance in his votes and therefore no cross-correlation.
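
For concreteness, here is a sketch of the lag-1 cross-correlation computation on synthetic data (the series and the coefficients are invented; real campaign finance data would require far more care):

```python
import numpy as np

def lagged_xcorr(donations, votes, lag=1):
    """Correlation between donations at time t and votes at time t + lag."""
    return np.corrcoef(donations[:-lag], votes[lag:])[0, 1]

# Synthetic series, purely for illustration: votes partly follow the
# previous period's donations.
rng = np.random.default_rng(7)
donations = rng.normal(size=30)
votes = 0.6 * np.concatenate(([0.0], donations[:-1])) + rng.normal(scale=0.5, size=30)

print(f"lag-1 cross-correlation: {lagged_xcorr(donations, votes):.2f}")
# Cameron's "honest politician" breaks the test: constant votes have zero
# variance, so the correlation is undefined (numpy would return nan).
```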

This leaves me in the unsatisfying position of not knowing whether or not A is a crook. There have been enough US congressmen convicted of malfeasance to convince me that the proportion of occupants of Capitol Hill who are crooks is not zero.  Still, there is also a well known aphorism (of uncertain provenance) that might apply: "Never ascribe to malice that which is adequately explained by incompetence." We have ample evidence that incompetence is not in short supply on the Hill.

Saturday, July 16, 2011

Facts, Beliefs ... and Budgets

Melissa Moore (@mooremm), Executive Director of INFORMS, recently tweeted the following question: "What #or tools would you use if you were asked to help solve the US Federal #budget/#debt impasse?"  My initial reaction (after verifying that a flammenwerfer is not considered an OR tool) was that OR and analytics would be of no use in the budget debate (debacle?).  OR and analytics rely on facts and logic; it is unclear that either side of the debate is interested in facts or willing to be constrained by logic.

The question did set me to thinking about the difference between facts and beliefs.  I have a hard time sorting out when demagogues, whether politicians or media bloviators, are espousing positions they actually believe and when they are simply pandering for ratings/votes.  (My cynicism is hard won: I grew up in New York, went to school in New Jersey, and cast my first vote to reelect Richard M. Nixon.  It's been downhill from there.)  For the sake of argument, let's stipulate that both sides are acting on beliefs they truly hold.  When I was younger it seemed to me that, however venal either side's motives might be, both the left and the right were capable of negotiating based on some common understanding of governance and the political, social and economic realities of the country they governed.  It's hard to trade horses, though, when one side can't tell a horse from a zebra and the other can't tell a horse from a camel. Today, one party thinks that the answer to any question that does not contain the phrase "gay marriage" is "cut taxes".  The other side thinks that the answer to any question that does not contain the phrase "gay marriage" is "tax the rich".  That the proposed solution might not work is simply inconceivable (as is the possibility that the other side's solution might work).

The somewhat unnerving truth, however, is that everything we think we know as a fact (raw data aside) is ultimately a belief.  My training is in mathematics. Casual users of mathematics, and even forgetful mathematicians, tend to think that what has been "proved" (i.e., a theorem) is definitively true. In reality, theorems are merely statements that must follow logically from a set of axioms (beliefs). The system of logic we accept is itself a matter of belief, but in the interest of avoiding a painful flashback to an undergraduate formal logic course I'll drop that line of thought right now. As in mathematics, so too in the physical sciences: theory arises from a mix of assumptions and empirical evidence; when new evidence clashes with the theory, modifications are made; and when the modifications become untenable, some assumption is altered or deleted and the theory is rebuilt.  (Remember when the speed of light was a constant?)

So if mathematics and physical sciences are built on leaps of faith, we really cannot fault elected representatives (and economists) for doing the same.  What we perhaps can demand, though, is that these beliefs at least be acknowledged as beliefs (not "proven facts"), and that decision makers attempt to examine the likely impact of any of those beliefs turning out false. As a parallel (pun deliberate), consider Euclid's Elements, written ca. 300BC, in which Euclid developed many theorems of what we now refer to as "Euclidean" geometry based on five postulates.  The postulates appear self-evident, and mathematicians over the centuries tried unsuccessfully to derive one from the others (turning the derived one into a theorem). In the 19th century, Nikolai Lobachevsky famously replaced Euclid's fifth postulate with a negation of it, perhaps hoping to prove the fifth postulate from the others by contradiction. Rather than finding a contradiction, he invented hyperbolic geometry, which is not only consistent as a mathematical system but has actually found use (those bleeping physicists again).

So, back to the original question: can OR bring any useful tools to bear on the budget debate? With enough time and effort, and exploiting the systems perspective that underlies OR, perhaps we could diagram out the interplay of all the assumptions being made (consciously or unconsciously) by each side; and perhaps, using simulation models based on those assumptions and calibrated to historical data, we could explore the consequences of each side's preferred solution (or, for that matter, any compromise solution) should any specific assumption not hold up. It would be a massive undertaking, and I am not confident it would be productive in the end. Zealously held beliefs will not yield easily to "what if" analyses.

Wednesday, November 17, 2010

Hidden Assumptions, Unintended Consequences

The New York Times has an amusing "Budget Puzzle" that allows you to select various options for budget cuts and tax increases at the federal level, in an attempt to close the anticipated US federal deficits for 2015 and 2030. The visible math is basic addition and subtraction, but hidden below the projected results of each option are various unstated assumptions. Many of these assumptions will be economic in nature, and economists have been known to make some entertaining assumptions.  (Do you remember that bit in Econ 101 about the ideal production level for a firm being the quantity where marginal cost equals marginal revenue? Do you recall any mention of a capacity limit?)

Also missing from the NYT puzzle are projections of indirect costs for any of the options.  If we reduce our military presence in Afghanistan drastically, will we subsequently need to make an expensive redeployment there?  If we increase taxes on employer-paid health care policies, will employers reduce the availability of those policies and, if so, will that indirectly impose greater health care costs on the federal government?

It's obvious why the Times puzzle does not go into those considerations -- they involve considerable uncertainty, and they would probably make the puzzle too complicated to be entertaining. As OR people, we're used to making assumptions (frequently designed to simplify a real-world mess into a tractable problem); but we should also be acutely aware that the puzzle contains many unstated assumptions, and not attach too much credibility to the results.

Incidentally, I balanced the budget with 34% spending cuts and 66% tax increases, and I consider myself to be a fiscal conservative (but a realist, at least in the relative topology of mathematicians).