OR in an OB World: The Normal Density Is Not A Fractal

Friday, June 22, 2012

Early in May, my radio alarm awoke me to NPR news doing a story about a recently published paper [1]. The main point of the paper seemed to be that productivity of people in several fields (academic research, professional and high-level amateur athletics, entertainment, politics) does not seem to follow a Gaussian (normal) distribution, but rather follows a Pareto (power) distribution. Perhaps the primary differences are that a small number of individuals generate a disproportionate share of the output, and that a high proportion of individuals have "below average" output (median well below mean).

My first reaction was "No fooling!" (Actually, "fooling" is an editorial substitution. I can be a bit cranky when I'm torn from the arms of Morpheus.) Hot on the heels of that first reaction was this: "The normal density is not a fractal." Specifically, the right tail of a bell curve is not itself a bell curve ... and if I can recognize that while half asleep, it must be fairly obvious. So where's the beef?

The first figure below shows density functions for Gaussian (red) and Pareto (blue) distributions. The second figure shows a plausible Gaussian distribution for athletic ability among the overall population. Professional athletes, or Division I collegiate athletes, presumably fall in the right tail (shaded). The right tail bears considerably more resemblance to the Pareto distribution in the first figure than it does to a Gaussian distribution.
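As a minimal sketch of the comparison in the second figure, the snippet below draws "ability" scores from a standard normal distribution, keeps only those above a cutoff, and summarizes what survives. The standard normal scale, the cutoff of two standard deviations, the sample size and the helper names are all assumptions made for illustration; none of them comes from the paper.

```python
import random
import statistics


def right_tail_sample(n=200_000, cutoff=2.0, seed=42):
    """Draw n standard-normal 'ability' scores and keep those above cutoff.

    The cutoff of 2.0 is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    return [x for x in (rng.gauss(0.0, 1.0) for _ in range(n)) if x > cutoff]


def skewness(values):
    """Sample skewness: mean cubed deviation over the cubed standard deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


tail = right_tail_sample()
print("selected:", len(tail))
print("mean:    ", round(statistics.fmean(tail), 3))
print("median:  ", round(statistics.median(tail), 3))
print("skewness:", round(skewness(tail), 3))
```

If the selected group were itself bell-shaped, the mean and median would roughly coincide and the skewness would be near zero. Under these assumptions the mean should sit above the median and the skewness should be clearly positive, which is the "median well below mean" pattern described earlier.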

The research documented in the paper covered five studies (one for each of the fields I listed above) using 198 samples (a variety of performance measures for each field, or subsets of each field), involving a total of 633,263 individuals. As with any statistical study, you can question methods, sample definitions, interpretation of results, etc. I've read the paper, and I find the evidence fairly compelling, particularly as it confirms my initial intuition (expressed in my waking reaction).

I'm not sold on Pareto as the correct choice among one-tailed distributions, and I'm not sold on single tail distributions in general. Consider, for instance, a performance measure on a [0,1] scale, such as career batting average for baseball players. The bounded domain precludes a long, thick tail on either side. A friend of mine did in fact download and fit some batting average data. I won't reproduce it here, since I'm not sure how his sample was defined (all players or just selected ones), but the histogram he showed me was a lot closer to Gaussian than to Pareto.

That said, it seems intuitive to me that if performance correlates strongly with ability, if ability has a roughly Gaussian distribution, and if there is a selection mechanism in play that selects the most able to compete, then performance among the competitors will not be Gaussian. This needs to be qualified in a variety of ways, including the following:

- Not everyone with high ability may choose to compete. Athletes may find it more lucrative (and less dangerous) to act in adventure films; people capable of great research may find it more lucrative to work in industry (where opportunities to publish are greatly diminished).
- Not everyone with high ability may have the opportunity to compete. Some star athletes are forced to retire prematurely, or to abandon hope of starting a professional career, due to health concerns. Potential stars in any field may go undiscovered due to where they live or what schools they attend. At the risk of creating a distraction by introducing a somewhat charged topic, women or members of an ethnic (here) or religious (elsewhere) minority may be excluded or discouraged from entering the competition, regardless of their ability.
- Nepotism (particularly but not exclusively in the entertainment field) may introduce competitors who do not reside in the right tail of the ability distribution. To a lesser extent (I think), diligence, hard work or "heart" may allow some people with above median but less than excellent athletic ability to compete in high level events.
- Productivity is not always a monotonic function of talent. A very talented wide receiver may not catch many balls if he is on a team with a relatively poor quarterback, a strong running game, and/or a stellar defense (allowing the team to practice a conservative offense). Conversely, a modestly talented receiver on a team with no running game and a poor defense (so that they are perpetually playing catch-up) may catch a disproportionate number of passes. Similar things can happen to an academic with great research potential working at a "teaching school", or in a department lacking resources and not providing colleagues or doctoral students with whom to collaborate. (Conversely, some faculty are adept at riding the coattails of their more productive colleagues, showing up as coauthors of all sorts of papers.)

Disclaimers aside, suppose that we accept the central premise of the paper, that in many cases productivity looks more like a power distribution than like a bell curve. So what? Here's the point (and the reason that my second reaction, the "not a fractal" comment, was perhaps a bit unjust in evaluating the significance of the paper). Probably the most common things I see in academic studies involving any sort of performance measure are F-tests of the statistical significance of groups of terms and t-tests of the significance of individual terms in (usually linear) regression models. Both those tests are predicated on normally distributed residuals. They are somewhat "robust" with respect to the normality assumption, which is a hand-wavy way of saying "we're screwed if the buggers aren't normal, unless we have an infinite sample size, so we'll just call our sample size close enough to infinite and forge ahead". If the residuals are not sufficiently close to Gaussian, and the sample size is not large enough, F- and t-tests may induce falsely high levels of confidence.

Now it is not the case that the response variable (here, the performance measure) need be normally distributed in order for the residuals to be normally distributed. Unless ability is adequately covered by the explanatory variables, though, the effects of ability will be seen in the residuals, and if the distribution of ability among the sample (those who likely were chosen at least in part on the basis of ability) bears any resemblance to a Pareto distribution, it seems fairly unlikely that the residuals will be normally distributed ... and fairly risky to assume that they are. Some academic papers cite specific tests of the normality of residuals, but in my experience it is far from a universal practice.
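To illustrate how omitted ability can surface in the residuals, here is a minimal simulation. Everything in it is an assumption made for illustration: a single explanatory variable x, a true intercept of 1 and slope of 2, unobserved ability drawn from a Pareto distribution with shape 3, and a small Gaussian noise term. It fits ordinary least squares by hand and then measures the skewness of the residuals.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation over the cubed standard deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def simulate_residuals(n=5000, alpha=3.0, seed=7):
    """Regress y on x while leaving Pareto-distributed 'ability' out of the model.

    All parameter values are hypothetical; they only serve the illustration.
    """
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ability = [rng.paretovariate(alpha) for _ in range(n)]  # unobserved
    noise = [rng.gauss(0.0, 0.5) for _ in range(n)]
    y = [1.0 + 2.0 * xi + ai + ei for xi, ai, ei in zip(x, ability, noise)]

    # Ordinary least squares for one predictor: slope = Sxy / Sxx.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, residuals


slope, residuals = simulate_residuals()
print("estimated slope:  ", round(slope, 3))
print("residual skewness:", round(skewness(residuals), 3))
```

In this setup the slope estimate should still land near 2, because ability is independent of x, but the residuals absorb the omitted ability term and should be strongly right-skewed rather than symmetric. A formal normality test on the residuals would be the natural next step in a real study.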

The authors of the paper point out a second issue related to this. Extremely high values of performance are more likely to occur with Pareto distributions than with Gaussian distributions. Some (many?) authors, taking normality for granted, treat extreme values as outliers, assume the observations are defective, and "sanitize" the data by excluding them.
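As a rough illustration of how much more often extreme values turn up, the sketch below compares the probability of landing more than k standard deviations above the mean under a Gaussian and under a Pareto distribution. The Pareto shape (3) and scale (1) are arbitrary assumptions, chosen only so that the variance is finite; they are not estimates from the paper.

```python
import math


def gaussian_tail(k):
    """P(X > mean + k*sd) for any Gaussian distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def pareto_tail(k, alpha=3.0, scale=1.0):
    """P(X > mean + k*sd) for a Pareto distribution with the given shape and scale.

    Requires alpha > 2 so that the standard deviation is finite.
    """
    if alpha <= 2.0:
        raise ValueError("alpha must exceed 2 for a finite standard deviation")
    mean = alpha * scale / (alpha - 1.0)
    sd = scale * math.sqrt(alpha / ((alpha - 1.0) ** 2 * (alpha - 2.0)))
    threshold = mean + k * sd
    if threshold <= scale:
        return 1.0  # every Pareto value is at least `scale`
    return (scale / threshold) ** alpha


for k in (2.0, 3.0, 4.0):
    print(f"k={k:.0f}  Gaussian={gaussian_tail(k):.2e}  Pareto={pareto_tail(k):.2e}")
```

With these assumed parameters, an observation four standard deviations above the mean is over a hundred times more likely under the Pareto model than under the Gaussian one, so a rule that discards such points as defective would be throwing away legitimate data.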

So consumers of academic papers studying performance may be buying a pig in a poke.

[1] O'Boyle Jr. and Aguinis, "The Best and the Rest: Revisiting the Norm of Normality of Individual Performances." Personnel Psychology 65 (2012), 79-119.

10 comments:

Anonymous, June 25, 2012 at 5:48 AM
I have two comments:

1) In the figure, it looks as if there was a hard threshold, where people with a certain ability enter the sample of athletes. In your explanation, however, you state various reasons why this threshold may well be a quite individual one, in which case you may actually return to having a normal distribution, even for athletes. Then, the bell curve may not be a fractal, but, due to sample selection, it may behave just like one. Now this is ability of athletes, but the issue was performance.

2) Performance is not the same as ability and, as you already stated, you probably need to include personal chances as an explanatory variable of performance. Assuming personal chances to be normally distributed, if we multiply both random variables, we obtain the normal product distribution, which looks a lot like a Pareto distribution.

Paul A. Rubin, June 25, 2012 at 11:43 AM
Nils,

On your first point, I think the "leakage" in the low end threshold might be enough to produce an initial upward trend in the density of the performance measure, but I doubt it is enough to produce a density close to the normal (and, in particular, that it would be enough to offset the "heavy tail" effects reported by the authors of the paper). That's pure speculation on my part.

Your second point is well taken, although I'm not sure that the two distributions you would be multiplying would both be normal. If you are correct about arriving at something like a normal product distribution, it still confirms the authors' concern about assuming normality in regression residuals.

Thanks for the comment!

GLR, June 25, 2012 at 5:12 PM
That was a lot of “ifs” there, in paragraph 6! In addition to the practical issues you identify in your bullet points, there is also the fact that in many (most?) endeavors, ability is not unidimensional. So the selection mechanism you mention will most likely give you some individuals who are not particularly close to the right-hand tail on one dimension of ability, but who are selected because they excel in a different dimension – the speedy receiver vs. the “sure hands” receiver vs. the receiver who can take a hit and advance the ball. I suspect this will also introduce more of a left tail in the distribution of performance.

As regards the findings in the P Psych paper in particular, I think there’s an issue with the measures considered. You use the term “productivity” in your post as more-or-less a synonym for “performance.” The authors of the P Psych paper stick mainly to “performance,” but in their discussion of the importance of their findings, they use “productivity” extensively. Economists (sorry) define productivity as output per unit of input. To properly compare the productivity of different “producers,” you need comparable output measures and comparable input measures.

My concern with the measures examined in the P Psych paper is that most are cumulative “career” totals (total hits, total yards, total nominations, total elections won, etc.). The denominator – “career” – is going to represent vastly different numbers of “opportunities” across the individuals in the population. In some categories, even individuals whose careers are the same duration will have more minutes played, passes attempted, at-bats, etc. Moreover, for many of the measures they consider, “success” begets more opportunities (you get more playing time or whatever), which adds to the long right tail phenomenon.

The batting stats I sent you earlier are for the 2011 Major League Baseball season, and cover the 145 players who qualified for the batting title in their respective leagues (a minimum of 502 plate appearances). So in a sense, this is an even more elite group of players than the Majors as a whole.

For this group, Total Hits is clearly non-normal, with a long right tail, but definitely has a left tail, too. On Base Percentage and Batting Average, which are both “per opportunity” measures, pass K-S and Shapiro-Wilk tests (.05 level of significance) for normality. They aren’t perfect bell curves, but they’re nothing close to a Pareto distribution.

I’m not trying to suggest that productivity is necessarily normally distributed, but I do think that the findings in the P Psych paper are largely an artifact of the measures the authors chose to examine.

Steve G., May 3, 2024 at 8:01 AM
Apologies in advance for any naivety and tardiness (asynchronous communication is quirky). I wonder if both distributions may be in play in the batting average example. I imagine that the population of batters comes from the 20% group of a Pareto distribution of all the candidates (those of us in the 80% on the playground knew who the 20% were). The skill set of that 20% is perhaps a Gaussian distribution, albeit one that is actively being culled by the baseball farm system (tiered leagues), with lower performers being removed from the system and higher performers being promoted (lather, rinse, repeat). All the populations will have been manipulated to some extent, even baseball players vs. non-baseball players; the result depends on unarticulated assumptions.