The Regression Effect / The Regression Fallacy
Gerard E. Dallal, Ph.D.
Suppose you were told that when any group of subjects with low values on some measurement is later remeasured, their mean value will increase without the use of any treatment or intervention. Would this worry you? It sure had better!
If this were true and an ineffective treatment were applied to such a group, the increase might be improperly interpreted as a treatment effect. This could result in the costly implementation of ineffective programs or faulty public policies that block the development of real solutions to the problem they were meant to address.
The behavior described in the first paragraph is real. It is called the regression effect. Unfortunately, the misinterpretation of the regression effect described in the second paragraph is real, too. It is called the regression fallacy.
The regression effect is shown graphically and numerically in the following series of plots and computer output.
(Full Data Set)

                    PRE      POST      DIFF
N of cases          400       400       400
Mean            118.420   118.407    -0.012
Std. Error        1.794     1.718     1.526
Standard Dev     35.879    34.364    30.514

Mean Difference = -0.012    SD Difference = 30.514
paired t = -0.008 (399 df)    P = 0.994
The first plot and piece of output show a sample of pre-test and post-test measurements taken before and after the administration of an ineffective treatment. The observations don't lie exactly on a straight line because the measurement is not entirely reproducible. In some cases, a person's pre-test measurement will be higher than the post-test measurement; in other cases the post-test measurement will be higher. Here, the pre- and post-test means are 118; the standard deviations are 35. The mean difference is 0 and the t test for equality of population means yields a P value of 0.994. There is no change in the mean or SD over time.
(Observations with PRE <= 120)

                    PRE      POST      DIFF
N of cases          201       201       201
Mean             90.029   100.503    10.474
Std. Error        1.580     1.844     1.930
Standard Dev     22.394    26.140    27.358

Mean Difference = 10.474    SD Difference = 27.358
paired t = 5.428 (200 df)    P < 0.001
The second plot and piece of output show what happens when post-test measurements are made only on those with pre-test measurements less than 120. In the plot, many more observations lie above the line PRE=POST than below it. The output shows that the pre-test mean is 90 while the post-test mean is 100, some 10 units higher (P < 0.001)!
(Observations with PRE <= 60)

                    PRE      POST      DIFF
N of cases           23        23        23
Mean             46.060    76.111    30.050
Std. Error        2.733     4.441     4.631
Standard Dev     13.107    21.301    22.209

Mean Difference = 30.050    SD Difference = 22.209
paired t = 6.489 (22 df)    P < 0.001
The third plot and final piece of output show what happens when post-test measurements are taken only for those with pre-test measurements less than 60. In the plot, most observations lie above the line PRE=POST. The output shows that the pre-test mean is 46 while the post-test mean is 76, some *30* units higher (P < 0.001)!
This is how an ineffective treatment behaves. The plots and output clearly demonstrate how an analyst could be misled into interpreting the regression effect as a treatment effect.
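This behavior is easy to reproduce by simulation. Below is a minimal sketch, not the article's actual data, under an assumed model in which each observed score is a stable true value plus independent measurement noise and the "treatment" between PRE and POST does nothing; the parameters are loosely chosen to resemble the output above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
true = rng.normal(118, 30, n)       # underlying stable values (assumed model)
pre = true + rng.normal(0, 18, n)   # pre-test = true value + noise
post = true + rng.normal(0, 18, n)  # post-test = same true value + new noise

# Full sample: PRE and POST means agree, as they must with no treatment.
print(pre.mean(), post.mean())

# Restrict to those with low pre-test scores: their POST mean rises
# toward the overall mean, with no treatment effect at all.
low = pre <= 120
print(pre[low].mean(), post[low].mean())
```

Rerunning with a different seed changes the numbers but never the pattern: the low-PRE subgroup's post-test mean is reliably higher than its pre-test mean.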
A Closer Look
The regression effect causes an individual's expected post-test measurement to fall somewhere between her pre-test measurement and the mean pre-test measurement. Those with very low pre-test measurements will see their average move up toward the overall mean, while those with high pre-test measurements will see it move down. This is how regression got its name--Sir Francis Galton noticed that the sons of tall fathers tended to be shorter than their fathers, while sons of short fathers tended to be taller. The sons "regressed to the mean".
This happens because there are two types of people with very low pre-test measurements: those who are truly low, and those with higher underlying values who appear low due to random variation. When post-test measurements are made, those who are truly low will tend to stay low, but those with higher underlying values will tend to migrate up toward the overall mean, dragging the group's mean post-test measurement with them. A similar argument applies to those with pre-test measurements greater than the overall mean.
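The two-kinds-of-low-scorers argument can be checked directly in a simulation (hypothetical model and numbers: observed score = true value + noise). Among subjects with very low PRE scores, the average *true* value is noticeably higher than the average PRE score, because random bad luck put many of them in the group.

```python
import numpy as np

rng = np.random.default_rng(1)
true = rng.normal(100, 10, 100_000)       # underlying values (assumed model)
pre = true + rng.normal(0, 10, 100_000)   # noisy pre-test measurement

low = pre <= 80
print(pre[low].mean())    # well below 80 on average
print(true[low].mean())   # clearly higher than the group's PRE mean
```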
Another Approach
Another way to get a feel for the regression effect is to consider a situation where the intervention has no effect and the pre-test and post-test measurements are completely uncorrelated. If there is no treatment effect and the measurements are uncorrelated, then the best estimate of a subject's post-test measurement is the overall mean of the pre-test measurements. Consider those subjects whose pre-test measurements are less than the overall mean (filled circles). The mean of these subjects' pre-test values must be less than the overall pre-test mean. Yet, their post-test mean will be equal to the overall pre-test mean!
(Full Data Set)

                    PRE      POST      DIFF
N of cases          100       100       100
Mean            100.000   100.000     0.000
Std. Error        1.000     1.000     1.414
Standard Dev     10.000    10.000    14.142

Mean Difference = 0.000    SD Difference = 14.142
paired t = 0.000 (99 df)    P = 1.000

(Observations with PRE <= 100)

                    PRE      POST      DIFF
N of cases           50        50        50
Mean             92.030    99.966     7.936
Std. Error        0.859     1.650     1.761
Standard Dev      6.072    11.666    12.453

Mean Difference = 7.936    SD Difference = 12.453
paired t = 4.506 (49 df)    P < 0.001
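The zero-correlation extreme can be sketched the same way (illustrative numbers, not the output above): when PRE and POST share nothing, the below-average half of the sample "gains" the entire gap between its pre-test mean and the overall mean.

```python
import numpy as np

rng = np.random.default_rng(2)
pre = rng.normal(100, 10, 100_000)
post = rng.normal(100, 10, 100_000)  # independent of PRE: correlation zero

low = pre <= 100
print(pre[low].mean())    # below 100 by construction
print(post[low].mean())   # about 100: the subgroup appears to improve
```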
A Third Approach
When there is no intervention or treatment effect, a plot of post-test measurements against pre-test measurements reflects only the reproducibility of the measurements. If the measurements are perfectly reproducible, the observations will lie on the line POST = PRE and the best prediction of a subject's post-test measurement will be the pre-test measurement. At the other extreme, if there is no reproducibility, the observations will lie in a circular cloud and the best prediction of a subject's post-test measurement will be the mean of all pre-test measurements. The prediction equation, then, is the line POST = mean(PRE).
In intermediate situations, where there is some reproducibility, the prediction equation given by the linear regression of post-test on pre-test lies between the line POST = PRE and the horizontal line POST = mean(PRE). This means an individual's post-test measurement is predicted to be somewhere between his pre-test measurement and the overall mean pre-test measurement. Thus, anyone with a pre-test measurement greater than the pre-test mean will be predicted to have a somewhat lower post-test measurement, while anyone with a pre-test measurement less than the pre-test mean will be predicted to have a somewhat higher post-test measurement.
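This shrinkage can be made concrete. The fitted slope of POST on PRE is r x SD(POST)/SD(PRE), which reduces to the test-retest correlation r when the two SDs are equal, so predictions are pulled toward the mean by the factor r. A sketch with illustrative numbers (not the article's data):

```python
import numpy as np

rng = np.random.default_rng(3)
true = rng.normal(100, 8, 50_000)        # assumed model: true value + noise
pre = true + rng.normal(0, 6, 50_000)
post = true + rng.normal(0, 6, 50_000)

r = np.corrcoef(pre, post)[0, 1]
slope, intercept = np.polyfit(pre, post, 1)  # least-squares line POST on PRE
print(r, slope)                  # slope is approximately r (equal SDs)
print(intercept + slope * 80)    # prediction for PRE = 80: between 80 and 100
```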
None of this speaks against regression analysis or in any way invalidates it. The best estimate of an individual's post-test measurement is the mean of the post-test measurements for those with the same pre-test score. When the pre- and post-test measurements are uncorrelated, the best estimate of an individual's post-test measurement is the mean of the pre-test measurements, regardless of an individual's pre-test measurement. The purpose of this discussion is to make you aware of the way data behave in the absence of any treatment effect so the regression effect will not be misinterpreted when it is encountered in practice.
Change and The Regression Effect
According to the regression effect, those who have extremely low pretest values show the greatest increase while those who have extremely high pretest values show the greatest decrease. Change is most positive for those with the lowest pretest values and most negative for those with the largest pretest values, that is, change is negatively correlated with pretest value.
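A quick simulation (assumed true-score-plus-noise model, illustrative numbers) confirms that with an ineffective treatment, change is negatively correlated with the pretest value:

```python
import numpy as np

rng = np.random.default_rng(4)
true = rng.normal(100, 10, 50_000)
pre = true + rng.normal(0, 7, 50_000)
post = true + rng.normal(0, 7, 50_000)  # no treatment effect

diff = post - pre
print(np.corrcoef(pre, diff)[0, 1])  # clearly negative
```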
The regression fallacy occurs when the regression effect is mistaken for a real treatment effect. The regression fallacy is often observed where there is no overall treatment effect, prompting investigators to conduct extensive subset analyses. A typical misstatement is, "While the education program produced no overall change in calcium intake, those with low initial intakes subsequently increased their intake while those with higher initial intakes subsequently decreased their intake. We recommend that the education program be continued because of its demonstrated benefit to those with low intakes. However, it should not be offered to those whose intake is adequate to begin with." Or, in Fleiss's words, "Intervention A failed to effect a strong or significant change on the average value of X from baseline to some specified time after the intervention was applied, but a statistically significant correlation was found between the baseline value of X and the change from baseline. Thus, while the effectiveness of A cannot be claimed for all individuals, it can be claimed for those who were the worst off at the start."
Another popular variant of the regression fallacy occurs when subjects are enrolled into a study on the basis of an extreme value of some measurement and a treatment is declared effective because subsequent measurements are not as extreme. Similarly, it is fallacious to take individuals with extreme values from one measuring instrument (a food frequency, say), reevaluate them using a different instrument (a diet record), and declare the instruments to be biased relative to each other because the second instrument's measurements are not as extreme as the first's. The regression effect guarantees that such results must be observed in the absence of any treatment effect or bias between the instruments. To quote Fleiss (p.194), "Studies that seek to establish the effectiveness of a new therapy or intervention by studying one group only, and by analyzing change either in the group as a whole or in a subgroup that was initially extreme, are inherently flawed."
While the regression effect is real and complicates the study of subjects who are initially extreme on the outcome variable, it does not make such studies impossible. Randomization and controls are enough to compensate for it. Consider subjects selected because of a single low initial measurement of some quantity (vitamin A status, say) who are enrolled in a controlled diet trial meant to raise it. Regression to the mean says even the controls will show an increase over the course of the study, but if the treatment is effective the increase will be greater in the treated group than in the controls.
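A sketch of such a controlled design (hypothetical model, sample sizes, and effect size) shows both arms regressing upward, with the treated arm rising further:

```python
import numpy as np

rng = np.random.default_rng(5)
true = rng.normal(100, 10, 20_000)
pre = true + rng.normal(0, 8, 20_000)
enrolled = pre <= 90                    # subjects selected for low status

arm = rng.random(enrolled.sum()) < 0.5  # randomize enrolled subjects 50:50
post = true[enrolled] + rng.normal(0, 8, enrolled.sum())
post[arm] += 5                          # assumed true treatment effect of +5

print(post[~arm].mean() - pre[enrolled][~arm].mean())  # controls rise anyway
print(post[arm].mean() - pre[enrolled][arm].mean())    # treated rise ~5 more
```

The control arm's increase is pure regression to the mean; the between-arm difference, not the within-arm change, estimates the treatment effect.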
Honors question: Suppose a treatment is expected to lower the post-test measurements of those with high pre-test measurements and raise the post-test measurements of those with low pre-test measurements. For example, a broad-based health care program might be expected to raise mean birthweight in villages where birthweight was too low and to lower mean birthweight in villages where birthweight was too high. How could this be distinguished from regression to the mean?
Answer: Look at the follow-up SD. When a treatment is ineffective, the marginal distributions of the two measurements are identical, so the initial and follow-up SDs are equal. If the health care program were effective, that is, making birthweights more homogeneous, the follow-up SD would be smaller than the initial SD.
Because the measurements are paired (made on the same subjects),
tests for equal population SDs based on independent samples cannot be
used. Here, a test of the equality of the initial and follow-up SDs is
equivalent to testing for a correlation of 0 between the sums and
differences of the measurement pairs.
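The equivalence rests on the identity cov(PRE + POST, PRE - POST) = var(PRE) - var(POST), which is zero exactly when the paired SDs are equal; the procedure is often called the Pitman-Morgan test. A sketch with simulated data (assumed model; ordinary t test for zero correlation):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
true = rng.normal(100, 10, n)
pre = true + rng.normal(0, 6, n)
post = true + rng.normal(0, 6, n)   # same marginal SD as PRE by construction

# Test corr(PRE + POST, PRE - POST) = 0, equivalent to SD(PRE) = SD(POST).
r = np.corrcoef(pre + post, pre - post)[0, 1]
t = r * np.sqrt((n - 2) / (1 - r**2))
print(r, t)   # expect r near 0 here: no evidence the SDs differ
```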