Regression toward the mean says that, following an extreme random outcome, the next random outcome is likely to be less extreme. In no sense does the future event ‘compensate for’ or ‘even out’ the previous event, though this is assumed in the gambler’s fallacy. Regression toward the mean was first described by the Victorian polymath Francis Galton. He found that offspring of unusually tall parents tended to be shorter than their parents, and that offspring of unusually short parents tended to be taller than theirs. Galton argued that any process that did not regress toward the mean would quickly go out of control. In finance, the term ‘mean reversion’ has a different meaning: Jeremy Siegel at Wharton uses it to describe a financial time series in which ‘returns can be very unstable in the short run but very stable in the long run,’ as in seasonal businesses, for example.
The effect can be exploited for general inference and estimation: the hottest place in the country today is more likely to be cooler tomorrow than hotter still; the best-performing mutual fund over the last three years is more likely to see its relative performance decline than improve over the next three years; this year’s most successful Hollywood actor is more likely to see lower than higher gross for his or her next movie; the baseball player with the highest batting average at the All-Star break is more likely to post a lower average than a higher one over the second half of the season; and so on.
In 1886, Galton published a paper called ‘Regression towards mediocrity in hereditary stature,’ in which he observed that extreme characteristics (e.g., height) in parents are not passed on completely to their offspring. Rather, the characteristics of the offspring regress towards a mediocre point, a point which today we would call the mean. By measuring the heights of hundreds of people, he was able to quantify regression to the mean and estimate the size of the effect. Galton wrote that ‘the average regression of the offspring is a constant fraction of their respective mid-parental deviations’: that is, the difference between a child and its parents for some characteristic is proportional to the parents’ deviation from typical people in the population. If a child’s parents are each two inches taller than the averages for men and women, then, on average, the child will be shorter than its parents by some factor (today identified as one minus the regression coefficient) times two inches. For height, Galton estimated this coefficient to be about two thirds: an individual’s height will, on average, lie at a point whose deviation from the population mean is about two thirds of the parents’ deviation.
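As a rough worked sketch of Galton’s rule in Python (the two-thirds coefficient is Galton’s estimate quoted above; the 69-inch population mean is an illustrative assumption, and the sex adjustment Galton applied to mid-parent heights is ignored here):

```python
# Sketch of Galton's rule: the child's expected deviation from the
# population mean is the regression coefficient times the mid-parent
# deviation. The 2/3 coefficient is Galton's estimate quoted above;
# the 69-inch population mean is an illustrative assumption.
REGRESSION_COEF = 2 / 3      # Galton's estimate for height
POPULATION_MEAN = 69.0       # assumed mean height in inches

def predicted_child_height(midparent_deviation: float) -> float:
    """Expected child height given the mid-parent deviation (inches)."""
    return POPULATION_MEAN + REGRESSION_COEF * midparent_deviation

# Parents each two inches above their sex-specific averages,
# i.e. a mid-parent deviation of +2 inches:
print(predicted_child_height(2.0))  # 69 + (2/3) * 2 = 70.33...
# The child falls short of the parents' deviation by
# (1 - 2/3) * 2 = 0.67 inches, matching the factor described above.
```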
Galton had in effect invented linear regression analysis, a tool for showing the relationship between the inputs and the outputs of a system, and the starting point for much of modern statistical modelling. Since then, the term ‘regression’ has taken on different meanings, and it may be used by modern statisticians to describe phenomena of sampling bias which have little to do with Galton’s original observations in the field of genetics. In fact, Galton’s explanation for the regression phenomenon he observed is now known to be incorrect. He stated: ‘A child inherits partly from his parents, partly from his ancestors. Speaking generally, the further his genealogy goes back, the more numerous and varied will his ancestry become, until they cease to differ from any equally numerous sample taken at haphazard from the race at large.’ This is incorrect, since a child receives its genetic makeup exclusively from its parents: there is no generation-skipping in genetic material, and any genetic material from earlier ancestors must have passed through the parents. In addition, height is not entirely genetically determined; it is also subject to environmental influences during development, which make the offspring of exceptional parents even more likely to be closer to the average than their parents. In sharp contrast to this population-genetic phenomenon of regression to the mean, best thought of as a combination of a binomially distributed process of inheritance plus normally distributed environmental influences, the term ‘regression to the mean’ is now often used to describe a completely different phenomenon, in which an initial sampling bias disappears as new, repeated, or larger samples display sample means closer to the true underlying population mean.
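Galton’s fitting procedure can be mimicked in a few lines of modern code. The following is a minimal sketch, not his actual method: heights are simulated under the two-thirds coefficient from above (with hypothetical spreads), and an ordinary least-squares fit recovers a slope well below one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate mid-parent heights around an assumed 69-inch mean, then
# children whose expected deviation is 2/3 of the mid-parent deviation
# plus independent (environmental) noise. All spreads are illustrative.
midparent = 69 + 2.0 * rng.standard_normal(10_000)
child = 69 + (2 / 3) * (midparent - 69) + 1.5 * rng.standard_normal(10_000)

# Ordinary least-squares fit: the slope recovers the regression
# coefficient, i.e. a value near 2/3 rather than 1.
slope, intercept = np.polyfit(midparent, child, 1)
print(f"fitted slope ~ {slope:.2f}")  # ~0.67: children regress toward the mean
```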
Consider a simple example: a class of students takes a 100-item true/false test on a subject. Suppose that all students choose randomly on all questions. Then each student’s score would be a realization of one of a set of independent and identically distributed random variables, with an expected mean of 50. Naturally, some students will score substantially above 50 and some substantially below 50 just by chance. If one takes only the top-scoring 10% of the students and gives them a second test on which they again choose randomly on all items, the mean score would again be expected to be close to 50. Thus the mean of these students would ‘regress’ all the way back to the mean of all students who took the original test: no matter what a student scores on the original test, the best prediction of his score on the second test is 50. If, at the other extreme, there were no luck (good or bad) or random guessing involved in the answers supplied by students, then all students would be expected to score the same on the second test as on the original test, and there would be no regression toward the mean. Most realistic situations fall between these two extremes: for example, one might consider exam scores as a combination of skill and luck. In this case, the subset of students scoring above average would be composed of those who were skilled and did not have especially bad luck, together with those who were unskilled but extremely lucky. On a retest of this subset, the unskilled will be unlikely to repeat their lucky break, while the skilled will have a second chance to suffer bad luck. Hence, those who did well previously are unlikely to do quite as well on the second test, because the lucky component of their first score cannot be expected to repeat.
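The skill-plus-luck case can be simulated directly. In this sketch the parameters (a skill spread of 5 points and a luck spread of 5 points around a mean of 50) are arbitrary choices for illustration, not taken from any real exam:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

skill = rng.normal(50, 5, n)            # stable component, persists across tests
test1 = skill + rng.normal(0, 5, n)     # first sitting: skill plus luck
test2 = skill + rng.normal(0, 5, n)     # second sitting: same skill, fresh luck

top = test1 >= np.quantile(test1, 0.9)  # top-scoring 10% on the first test
print(f"top 10% on test 1:      {test1[top].mean():.1f}")  # well above 50
print(f"same students, test 2:  {test2[top].mean():.1f}")  # regresses partway back
# With pure guessing (zero skill variance) the second mean would be ~50;
# with no luck it would equal the first. Here it lands in between.
```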
Regression toward the mean is a significant consideration in the design of experiments. Take a hypothetical example of 1,000 individuals of a similar age who were examined and scored on their risk of experiencing a heart attack. Statistics could be used to measure the success of an intervention on the 50 who were rated at greatest risk; the intervention could be a change in diet, exercise, or a drug treatment. Even if the interventions are worthless, the test group would be expected to show an improvement on their next physical exam, simply because of regression toward the mean. The best way to combat this effect is to divide the group randomly into a treatment group that receives the treatment and a control group that does not; the treatment would then be judged effective only if the treatment group improves more than the control group. Alternatively, a group of disadvantaged children could be tested to identify those with the most college potential, and the top 1% supplied with special enrichment courses, tutoring, counseling, and computers. Even if the program is effective, the children’s average scores may well decline when the test is repeated a year later.
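A toy simulation of the heart-attack example makes the point concrete. All the numbers here are hypothetical, and the ‘treatment’ is deliberately worthless:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

true_risk = rng.normal(50, 10, n)         # stable underlying risk
exam1 = true_risk + rng.normal(0, 10, n)  # first exam: risk plus measurement noise

worst = np.argsort(exam1)[-50:]           # the 50 rated at greatest risk
rng.shuffle(worst)                        # randomize them into two arms
treat, control = worst[:25], worst[25:]

# A worthless intervention: the second exam is just risk plus fresh noise.
exam2 = true_risk + rng.normal(0, 10, n)

print(f"treatment arm change: {exam2[treat].mean() - exam1[treat].mean():+.1f}")
print(f"control arm change:   {exam2[control].mean() - exam1[control].mean():+.1f}")
# Both arms 'improve' through regression toward the mean alone; only the
# difference between the arms would reveal a real treatment effect.
```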
Many phenomena tend to be attributed to the wrong causes when regression to the mean is not taken into account. An extreme example of the regression fallacy is statistician Horace Secrist’s 1933 book ‘The Triumph of Mediocrity in Business,’ in which he collected mountains of data to ‘prove’ that the profit rates of competitive businesses tend toward the average over time. In fact, there is no such effect: the variability of profit rates is almost constant over time, and Secrist had only described the ordinary regression toward the mean. One exasperated reviewer, Harold Hotelling, likened the book to ‘proving the multiplication table by arranging elephants in rows and columns, and then doing the same for numerous other kinds of animals.’ The calculation and interpretation of ‘improvement scores’ on standardized educational tests can also produce a regression fallacy. In 1999, schools in Massachusetts were given improvement goals, and for each school the Department of Education tabulated the difference between the average score achieved by students in 1999 and in 2000. It was quickly noted that most of the worst-performing schools had met their goals, which the Department of Education took as confirmation of the soundness of its policies. However, it was also noted that many of the supposedly best schools in the Commonwealth, such as Brookline High School (with 18 National Merit Scholarship finalists), were declared to have failed.
Psychologist Daniel Kahneman, winner of the 2002 Nobel prize in economics, pointed out that regression to the mean might explain why rebukes can seem to improve performance, while praise seems to backfire. ‘I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, ‘On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don’t tell us that reinforcement works and punishment does not, because the opposite is the case.’ This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.’
To put Kahneman’s regression fallacy story in simple terms: when you make a severe mistake, your performance will usually return to your average level afterwards anyway. This return will look like an improvement, and will seem to ‘prove’ that it is better to criticize than to praise, especially to the person who criticized you at that ‘low’ moment. In the contrary situation, when you happen to perform far above your average, your performance will likewise return to the average level later on; that change will be perceived as a deterioration, and any praise that preceded it as the cause of the deterioration. Because the criticism or praise merely precedes the change, which is really just regression toward the mean, it is falsely credited with causing it: positively in one case, negatively in the other, and wrongly in both.
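A minimal simulation in the spirit of Kahneman’s coin-toss demonstration (performance is modeled, by assumption, as a fixed skill plus fresh noise on each trial, and the feedback has no causal effect at all):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

skill = rng.normal(0, 1, n)
trial1 = skill + rng.normal(0, 1, n)
trial2 = skill + rng.normal(0, 1, n)  # feedback after trial 1 changes nothing

praised = trial1 > 1.0    # did very well, got praise
rebuked = trial1 < -1.0   # did very badly, got a rebuke

print(f"praised group: {trial1[praised].mean():+.2f} -> {trial2[praised].mean():+.2f}")
print(f"rebuked group: {trial1[rebuked].mean():+.2f} -> {trial2[rebuked].mean():+.2f}")
# The praised 'get worse' and the rebuked 'improve', purely because
# extreme trials regress toward each performer's own mean.
```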
Statistical analysts have long recognized the effect of regression to the mean in sports; they even have a special name for it: the ‘sophomore slump.’ For example, Carmelo Anthony of the NBA’s Denver Nuggets had an outstanding rookie season in 2004. It was so outstanding, in fact, that he couldn’t possibly be expected to repeat it: in 2005, Anthony’s numbers dropped from his rookie season. The reasons for the sophomore slump abound, as sports are all about adjustment and counter-adjustment, but luck-based excellence as a rookie is as good a reason as any. Regression to the mean in sports performance may also be the reason for the ‘Sports Illustrated cover jinx’ and the ‘Madden Curse.’ John Hollinger has an alternate name for the phenomenon: the ‘fluke rule,’ while sports writer Bill James calls it the ‘Plexiglas Principle.’ However, because popular lore has focused on regression toward the mean as an account of declining performance from one season to the next, it has usually overlooked the fact that such regression can also account for improved performance. For example, if one looks at the batting averages of Major League Baseball players in one season, those whose batting average was above the league mean tend to regress downward toward the mean the following year, while those whose average was below the mean tend to progress upward toward it.
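The two-way batting-average claim can be checked with a toy model; the true-talent distribution and the 500 at-bats per season are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
players, at_bats = 500, 500

# Each player has a fixed true average; observed seasons are binomial draws.
true_avg = np.clip(rng.normal(0.260, 0.020, players), 0.150, 0.350)
season1 = rng.binomial(at_bats, true_avg) / at_bats
season2 = rng.binomial(at_bats, true_avg) / at_bats

league_mean = season1.mean()
above, below = season1 > league_mean, season1 <= league_mean

print(f"above mean: {season1[above].mean():.3f} -> {season2[above].mean():.3f}")  # drifts down
print(f"below mean: {season1[below].mean():.3f} -> {season2[below].mean():.3f}")  # drifts up
```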



