The Regression Effect / The Regression Fallacy
Gerard E. Dallal, Ph.D.

The Regression Effect

Suppose you were told that when any group of subjects with low values on some measurement is later remeasured, their mean value will increase without the use of any treatment or intervention. Would this worry you? It sure had better!

If this were true and an ineffective treatment were applied to such a group, the increase might be improperly interpreted as a treatment effect. This could lead to the costly implementation of ineffective programs or to faulty public policies that block the development of real solutions to the problem they were meant to address.

The behavior described in the first paragraph is real. It is called the regression effect. Unfortunately, the misinterpretation of the regression effect described in the second paragraph is real, too. It is called the regression fallacy.

The regression effect is shown graphically and numerically in the following series of plots and computer output.

                 (Full Data Set)

                    PRE        POST        DIFF
N of cases          400         400         400
Mean            118.420     118.407      -0.012
Std. Error        1.794       1.718       1.526
Standard Dev     35.879      34.364      30.514


      Mean Difference =       -0.012   
        SD Difference =       30.514   
             paired t =       -0.008 (399 df)    
                    P =        0.994
 


The first plot and piece of output show a sample of pre-test and post-test measurements taken before and after the administration of an ineffective treatment. The observations don't lie exactly on a straight line because the measurement is not entirely reproducible. In some cases, a person's pre-test measurement will be higher than the post-test measurement; in other cases the post-test measurement will be higher. Here, the pre- and post-test means are 118; the standard deviations are 35. The mean difference is 0 and the t test for equality of population means yields a P value of 0.994. There is no change in the mean or SD over time.
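The behavior in this first piece of output is easy to reproduce with a small simulation (a hypothetical model, not the original data): each subject has a stable underlying value, and the pre- and post-test measurements are that value plus independent measurement error. With the illustrative parameters below, the means and SDs come out close to the 118 and 35 shown above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
true = rng.normal(118, 25, n)       # stable underlying values (hypothetical)
pre = true + rng.normal(0, 25, n)   # pre-test = truth + measurement error
post = true + rng.normal(0, 25, n)  # post-test = truth + independent error

# paired t test of the mean difference, computed directly
diff = post - pre
t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
print(f"pre mean {pre.mean():.1f}, post mean {post.mean():.1f}")
print(f"paired t = {t:.2f} ({n - 1} df)")
```

Because the treatment is ineffective by construction, the paired t statistic hovers near zero over repeated runs.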



          (Observations with PRE <= 120)

                    PRE        POST        DIFF
N of cases          201         201         201
Mean             90.029     100.503      10.474
Std. Error        1.580       1.844       1.930
Standard Dev     22.394      26.140      27.358


      Mean Difference =       10.474   
        SD Difference =       27.358   
             paired t =        5.428 (200 df)    
                    P =       <0.001


The second plot and piece of output show what happens when post-test measurements are made only on those with pre-test measurements less than 120. In the plot, many more observations lie above the line PRE=POST than below it. The output shows that the pre-test mean is 90 while the post-test mean is 100, some 10 units higher (P < 0.001)!



 
          (Observations with PRE <= 60)

                    PRE        POST        DIFF
N of cases           23          23          23
Mean             46.060      76.111      30.050
Std. Error        2.733       4.441       4.631
Standard Dev     13.107      21.301      22.209


      Mean Difference =       30.050   
        SD Difference =       22.209   
             paired t =        6.489  (22 df)    
                    P =       <0.001

The third plot and final piece of output show what happens when post-test measurements are taken only for those with pre-test measurements less than 60. In the plot, most observations lie above the line PRE=POST. The output shows that the pre-test mean is 46 while the post-test mean is 76, some *30* units higher (P < 0.001)!

This is how an ineffective treatment behaves. The plots and output clearly demonstrate how an analyst could be misled into interpreting the regression effect as a treatment effect.
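Selecting on the pre-test and re-measuring reproduces these numbers qualitatively. The sketch below (a hypothetical true-score-plus-error model with illustrative parameters, not the original data) applies the two cutoffs to simulated no-treatment data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
true = rng.normal(118, 25, n)       # stable underlying values (hypothetical)
pre = true + rng.normal(0, 25, n)   # measurement = truth + error
post = true + rng.normal(0, 25, n)  # no treatment effect anywhere

for cutoff in (120, 60):
    low = pre <= cutoff
    print(f"PRE <= {cutoff:3d}: n = {low.sum():3d}, "
          f"pre mean = {pre[low].mean():6.1f}, "
          f"post mean = {post[low].mean():6.1f}")
```

The lower the cutoff, the larger the apparent "improvement," even though nothing was done to the subjects.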

A Closer Look

The regression effect causes an individual's expected post-test measurement to fall somewhere between her pre-test measurement and the mean pre-test measurement. Those with very low pre-test measurements will see their average move up toward the overall mean, while those with high pre-test measurements will see it move down. This is how regression got its name--Sir Francis Galton noticed that the sons of tall fathers tended to be shorter than their fathers, while the sons of short fathers tended to be taller. The sons "regressed to the mean".

This happens because there are two types of people with very low pre-test measurements: those who are truly low, and those whose underlying values are higher but who appear low because of random variation. When post-test measurements are made, those who are truly low will tend to stay low, but those with higher underlying values will tend to migrate up toward the overall mean, dragging the group's mean post-test measurement with them. A similar argument applies to those with pre-test measurements greater than the overall mean.
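The two-types argument can be checked directly in a simulation where the underlying values are known (hypothetical model, observed = truth + error, with illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true = rng.normal(118, 25, n)      # underlying values, known only to the simulator
pre = true + rng.normal(0, 25, n)  # observed pre-test = truth + error

low = pre <= 60                    # select on an extreme *observed* value
print(f"mean PRE  of low group: {pre[low].mean():.1f}")
print(f"mean TRUE of low group: {true[low].mean():.1f}")
```

The selected group's true values sit well above its observed pre-test values: the selection netted many subjects whose errors happened to be negative, and a re-measurement lets those errors re-draw.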

Another Approach

Another way to get a feel for the regression effect is to consider a situation where the intervention has no effect and the pre-test and post-test measurements are completely uncorrelated. If there is no treatment effect and the measurements are uncorrelated, then the best estimate of a subject's post-test measurement is the overall mean of the pre-test measurements, regardless of the subject's pre-test value. Consider those subjects whose pre-test measurements are less than the overall mean (filled circles). The mean of these subjects' pre-test values must be less than the overall pre-test mean. Yet their post-test mean will be equal to the overall pre-test mean!

                 (Full Data Set)
 
 
                    PRE        POST        DIFF
N of cases          100         100         100
Mean            100.000     100.000       0.000
Std. Error        1.000       1.000       1.414
Standard Dev     10.000      10.000      14.142
 
   
     Mean Difference =        0.000
       SD Difference =       14.142
            paired t =        0.000  (99 df)
                   P =        1.000
 
-----------------------------------------------------

          (Observations with PRE <= 100)
 
                    PRE        POST        DIFF
N of cases           50          50          50
Mean             92.030      99.966       7.936
Std. Error        0.859       1.650       1.761
Standard Dev      6.072      11.666      12.453
 
     Mean Difference =        7.936   
       SD Difference =       12.453                        
             paired t =        4.506  (49 df)
                    P =       <0.001
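The uncorrelated case is the easiest to simulate, because the post-test can simply be drawn independently of the pre-test (illustrative parameters chosen to match the output above: mean 100, SD 10):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000  # large n so the subset means are stable
pre = rng.normal(100, 10, n)
post = rng.normal(100, 10, n)  # independent of pre: correlation 0, no effect

below = pre <= 100             # the "filled circles": below-average pre-tests
print(f"subset pre mean:  {pre[below].mean():.1f}")   # well below 100
print(f"subset post mean: {post[below].mean():.1f}")  # essentially 100
```

With no information in the pre-test about the post-test, the subset's post-test mean lands at the overall mean, roughly 8 units above its own pre-test mean, as in the output above.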

A Third Approach

When there is no intervention or treatment effect, a plot of post-test measurements against pre-test measurements reflects only the reproducibility of the measurements. If the measurements are perfectly reproducible, the observations will lie on the line POST = PRE and the best prediction of a subject's post-test measurement will be the pre-test measurement. At the other extreme, if there is no reproducibility, the observations will lie in a circular cloud and the best prediction of a subject's post-test measurement will be the mean of all pre-test measurements. The prediction equation, then, is the line POST = mean(PRE).

In intermediate situations, where there is some reproducibility, the prediction equation given by the linear regression of post-test on pre-test lies between the line POST = PRE and the horizontal line POST = mean(PRE). This means an individual's post-test measurement is predicted to be somewhere between his pre-test measurement and the overall mean pre-test measurement. Thus, anyone with a pre-test measurement greater than the pre-test mean will be predicted to have a somewhat lower post-test measurement, while anyone with a pre-test measurement less than the pre-test mean will be predicted to have a somewhat higher post-test measurement.
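One way to see the intermediate case is to fit the regression of post-test on pre-test in the true-score-plus-error model (hypothetical parameters; with equal true-score and error variances the fitted slope should come out near 0.5):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
true = rng.normal(100, 10, n)
pre = true + rng.normal(0, 10, n)
post = true + rng.normal(0, 10, n)

slope, intercept = np.polyfit(pre, post, 1)  # least-squares line, POST on PRE
print(f"POST is predicted as {intercept:.1f} + {slope:.2f} * PRE")
```

The fitted slope lies strictly between 0 (no reproducibility) and 1 (perfect reproducibility), so every prediction is pulled part of the way toward the mean.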

None of this speaks against regression analysis or in any way invalidates it. The best estimate of an individual's post-test measurement is the mean of the post-test measurements for those with the same pre-test score. When the pre- and post-test measurements are uncorrelated, the best estimate of an individual's post-test measurement is the mean of the pre-test measurements, regardless of an individual's pre-test measurement. The purpose of this discussion is to make you aware of the way data behave in the absence of any treatment effect so the regression effect will not be misinterpreted when it is encountered in practice.

Change and The Regression Effect

According to the regression effect, those who have extremely low pretest values show the greatest increase while those who have extremely high pretest values show the greatest decrease. Change is most positive for those with the lowest pretest values and most negative for those with the largest pretest values, that is, change is negatively correlated with pretest value.
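This negative correlation appears even when the treatment does nothing at all, as a quick check shows (same hypothetical true-score-plus-error model, illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000
true = rng.normal(100, 10, n)
pre = true + rng.normal(0, 10, n)
post = true + rng.normal(0, 10, n)

change = post - pre
r = np.corrcoef(pre, change)[0, 1]
print(f"corr(pre, change) = {r:.2f}")  # negative, with no treatment effect
```

So a significant correlation between baseline and change is, by itself, no evidence of a treatment effect.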

The Regression Fallacy

The regression fallacy occurs when the regression effect is mistaken for a real treatment effect. The regression fallacy is often observed where there is no overall treatment effect, prompting investigators to conduct extensive subset analyses. A typical misstatement is, "While the education program produced no overall change in calcium intake, those with low initial intakes subsequently increased their intake while those with higher initial intakes subsequently decreased their intake. We recommend that the education program be continued because of its demonstrated benefit to those with low intakes. However, it should not be offered to those whose intake is adequate to begin with." Or, in Fleiss's words, "Intervention A failed to effect a strong or significant change on the average value of X from baseline to some specified time after the intervention was applied, but a statistically significant correlation was found between the baseline value of X and the change from baseline. Thus, while the effectiveness of A cannot be claimed for all individuals, it can be claimed for those who were the worst off at the start."

Another popular variant of the regression fallacy occurs when subjects are enrolled into a study on the basis of an extreme value of some measurement and a treatment is declared effective because subsequent measurements are not as extreme. Similarly, it is fallacious to take individuals with extreme values from one measuring instrument (a food frequency questionnaire, say), reevaluate them using a different instrument (a diet record), and declare the instruments to be biased relative to each other because the second instrument's measurements are not as extreme as the first's. The regression effect guarantees that such results must be observed in the absence of any treatment effect or bias between the instruments. To quote Fleiss (p.194), "Studies that seek to establish the effectiveness of a new therapy or intervention by studying one group only, and by analyzing change either in the group as a whole or in a subgroup that was initially extreme, are inherently flawed."

While the regression effect is real and complicates the study of subjects who are initially extreme on the outcome variable, it does not make such studies impossible. Randomization and controls are enough to compensate for it. Consider subjects selected on the basis of a single low initial measurement of some quantity (vitamin A status, say) who are enrolled in a controlled diet trial to raise it. Regression to the mean says even the controls will show an increase over the course of the study, but if the treatment is effective the increase will be greater in the treated group than in the controls.
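A simulated version of such a trial shows why the control group rescues the design. Everything here is hypothetical (the enrollment cutoff, effect size, and variances are illustrative); the point is only the comparison between arms:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
true = rng.normal(100, 10, n)
pre = true + rng.normal(0, 10, n)

# enroll only subjects with low pre-test values, then randomize to two arms
enrolled = np.where(pre <= 90)[0]
rng.shuffle(enrolled)
half = len(enrolled) // 2
control, treated = enrolled[:half], enrolled[half:]

effect = 5.0                        # hypothetical true treatment effect
post = true + rng.normal(0, 10, n)  # regression to the mean affects everyone
post[treated] += effect             # ...but only this arm gets the effect

print(f"control change: {post[control].mean() - pre[control].mean():+.1f}")
print(f"treated change: {post[treated].mean() - pre[treated].mean():+.1f}")
```

Both arms improve (that is the regression effect), but the treated arm improves by roughly the treatment effect more than the controls, and that between-arm difference is what the trial estimates.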

Honors question: Suppose a treatment is expected to lower the post-test measurements of those with high pre-test measurements and raise the post-test measurements of those with low pre-test measurements. For example, a broad-based health care program might be expected to raise mean birthweight in villages where birthweight was too low and lower it in villages where birthweight was too high. How would this be distinguished from regression to the mean?

Answer: If the program were effective, the follow-up SD would be smaller than the initial SD. When a treatment is ineffective, the marginal distributions of the two measurements are identical. If the health care program were making birthweights more homogeneous, the follow-up SD would be smaller than the initial SD.

Because the measurements are paired (made on the same subjects), tests for equal population SDs based on independent samples cannot be used. Here, a test of the equality of the initial and follow-up SDs is equivalent to testing for a correlation of 0 between the sums and differences of the measurement pairs.
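The equivalence rests on the identity cov(PRE+POST, PRE-POST) = var(PRE) - var(POST): the cross terms cancel, so the sum and difference are uncorrelated exactly when the two SDs are equal. A quick check on hypothetical no-effect data with equal marginal SDs by construction:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
true = rng.normal(100, 10, n)
pre = true + rng.normal(0, 10, n)
post = true + rng.normal(0, 10, n)  # same marginal SD as pre: no effect

s, d = pre + post, pre - post
r = np.corrcoef(s, d)[0, 1]
print(f"corr(sum, diff) = {r:.3f}")  # near zero when the SDs are equal
```

Testing this correlation against zero (sometimes called the Pitman-Morgan test) therefore tests the equality of the two SDs while respecting the pairing of the measurements.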


Copyright © 2000 Gerard E. Dallal