### The Analysis of Pre-test/Post-test Experiments

Gerard E. Dallal, Ph.D.

[This is an early draft. [figure] is a placeholder
for a figure to be generated when I get the chance.]

Consider a randomized, controlled experiment in which measurements are made before and after treatment.

One way to analyze the data is by comparing the treatments with respect to their post-test measurements. [figure]

Even though subjects are assigned to treatment at random, there may be some concern that any difference in the post-test measurements might be due to a failure in the randomization. Perhaps the groups differed in their pre-test measurements.* [figure]

One way around the problem is to compare the groups on the differences between post-test and pre-test, sometimes called change scores or gain scores. [figure] The test can be carried out in a number of equivalent ways:

• t-test of the differences;
• 2-group ANOVA of the differences;
• repeated measures analysis of variance (the group-by-time interaction).
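
The equivalence of the first two approaches can be checked numerically. The sketch below uses made-up scores (not data from any study) and only the Python standard library; the function names `pooled_t` and `one_way_f` are illustrative. With two groups, the one-way ANOVA F statistic computed on the change scores should equal the square of the pooled-variance t statistic.

```python
from statistics import mean, variance

# Hypothetical pre/post scores for two randomized groups (illustrative only).
pre_a = [10, 12, 9, 14, 11]
post_a = [13, 15, 10, 18, 14]
pre_b = [11, 13, 10, 12, 9]
post_b = [12, 13, 12, 13, 10]

# Change (gain) scores: post-test minus pre-test for each subject.
change_a = [q - p for p, q in zip(pre_a, post_a)]
change_b = [q - p for p, q in zip(pre_b, post_b)]

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

def one_way_f(x, y):
    """One-way ANOVA F statistic for two groups (1 numerator df)."""
    grand = mean(x + y)
    ss_between = len(x) * (mean(x) - grand) ** 2 + len(y) * (mean(y) - grand) ** 2
    ss_within = (sum((v - mean(x)) ** 2 for v in x)
                 + sum((v - mean(y)) ** 2 for v in y))
    df_within = len(x) + len(y) - 2
    return ss_between / (ss_within / df_within)

t = pooled_t(change_a, change_b)
f = one_way_f(change_a, change_b)
print(t, f, t ** 2)  # f and t**2 agree up to rounding error
```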

However, there is another approach that could be used--analysis of covariance, in which

• the post-test measurement is the response,
• treatment is the design factor, and
• the pre-test is a covariate.

[figure] It is possible for the analysis of covariance to produce a significant treatment effect while the t-test based on differences does not, and vice versa. The question, then, is which analysis to use.
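
The arithmetic shows why the two analyses can disagree. The change-score comparison subtracts the whole pre-test difference between the groups from the post-test difference, while ANCOVA subtracts that difference multiplied by the common within-group slope. The following is a minimal sketch, assuming treatment is coded 0/1. The function name `ancova_effect` is my own, and the data are constructed so that POST = 2 + 0.5 * PRE + 3 * TREAT exactly, with the treated group starting 4 points higher.

```python
from statistics import mean

def ancova_effect(pre, post, treat):
    """Least-squares fit of POST = intercept + effect*TREAT + slope*PRE.

    TREAT must be coded 0/1. Returns (slope, effect): the pooled
    within-group slope and the adjusted difference between groups.
    """
    sxx = sxy = 0.0
    centers = {}
    for g in (0, 1):
        x = [p for p, t in zip(pre, treat) if t == g]
        y = [q for q, t in zip(post, treat) if t == g]
        mx, my = mean(x), mean(y)
        centers[g] = (mx, my)
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    (mx0, my0), (mx1, my1) = centers[0], centers[1]
    return slope, (my1 - my0) - slope * (mx1 - mx0)

# Constructed (hypothetical) data with a baseline imbalance.
treat = [0, 0, 0, 0, 1, 1, 1, 1]
pre = [10, 12, 14, 16, 14, 16, 18, 20]
post = [2 + 0.5 * p + 3 * t for p, t in zip(pre, treat)]

slope, adjusted = ancova_effect(pre, post, treat)

# Change-score estimate: difference between the mean gains of the groups.
change = [q - p for p, q in zip(pre, post)]
change_diff = (mean(c for c, t in zip(change, treat) if t == 1)
               - mean(c for c, t in zip(change, treat) if t == 0))

print(slope, adjusted, change_diff)
```

With these constructed numbers the ANCOVA estimate recovers the built-in effect of 3, while the change-score estimate is 1, because it removes all 4 points of baseline imbalance rather than half of them. The two estimates coincide when the groups have equal pre-test means or when the slope is 1.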

The problem was first stated by Lord (1967: Psych. Bull., 68, 304-305) in terms of a dietician who measures students' weight at the start and end of the school year to determine sex differences in the effects of the diet provided in the university's dining halls. The data are brought to two statisticians. The first, analyzing the differences (weight changes), claims there is no difference in weight gain between men and women. The second, using analysis of covariance, finds a difference in weight gain. Lord's conclusion was far from optimistic:

[W]ith the data usually available for such studies, there is simply no logical or statistical procedure that can be counted on to make proper allowances for uncontrolled pre-existing differences between groups. The researcher wants to know how the groups would have compared if there had been no pre-existing uncontrolled differences. The usual research study of this type is attempting to answer a question that simply cannot be answered in any rigorous way on the basis of available data.

Lord was wrong. His confusion is evident in the phrase, "controlling for pre-existing conditions." The two procedures, t-test and ANCOVA, test different hypotheses! For Lord's problem,

• the t-test answers the question, "Is there a difference in the mean weight change for boys and girls?"
• ANCOVA answers the question, "Are boys and girls of the same initial weight expected to have the same final weight?" or, in Lord's words, "If one selects on the basis of initial weight a subgroup of boys and a subgroup of girls having identical frequency distribution of initial weight, the relative position of the regression lines shows that the subgroup of boys is going to gain substantially more during the year than the subgroup of girls."

Despite how proper and reasonable the ANCOVA question seems, it is NOT what the dietician really wanted to know. It is the wrong question because a boy and a girl of the same weight are a relatively light boy and a relatively heavy girl. Even if the school cafeteria had no effect on weight, regression to the mean would have those heavy girls end up weighing less on average and those light boys end up weighing more, even though the mean weight in each group would be unchanged.
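
A small simulation makes the regression-to-the-mean argument concrete. It is only a sketch under assumptions of my own: each student has a stable personal weight level, each measurement adds independent noise, and the diet does nothing. The group means (55 and 70), the standard deviations, and the function names are hypothetical choices for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

def simulate_group(n, true_mean, sd_true=8.0, sd_noise=6.0):
    """Pre and post weights fluctuate around a stable personal level."""
    pre, post = [], []
    for _ in range(n):
        level = random.gauss(true_mean, sd_true)
        pre.append(level + random.gauss(0, sd_noise))
        post.append(level + random.gauss(0, sd_noise))
    return pre, post

def pooled_slope_and_gap(groups):
    """ANCOVA for two groups given as [(pre0, post0), (pre1, post1)].

    Returns the pooled within-group slope and the adjusted gap
    (group 1 minus group 0) at equal initial weight.
    """
    sxx = sxy = 0.0
    centers = []
    for x, y in groups:
        mx, my = mean(x), mean(y)
        centers.append((mx, my))
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    (mx0, my0), (mx1, my1) = centers
    return slope, (my1 - my0) - slope * (mx1 - mx0)

girls = simulate_group(5000, 55.0)
boys = simulate_group(5000, 70.0)

# First statistician: compare mean weight change between the sexes.
gain_gap = (mean(q - p for p, q in zip(*boys))
            - mean(q - p for p, q in zip(*girls)))

# Second statistician: compare final weights at equal initial weight.
slope, ancova_gap = pooled_slope_and_gap([girls, boys])

print(gain_gap, slope, ancova_gap)
```

Under these assumptions the within-group slope is about 64 / (64 + 36) = 0.64, so the expected ANCOVA gap is about 15 * (1 - 0.64) = 5.4 even though neither group gains any weight on average and the gap in mean gains is near zero. Both statisticians compute correctly; they answer different questions.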

Campbell and Erlebacher have described a problem that arises in attempts to evaluate gains due to compensatory education in lower-class populations.

Because randomization is considered impractical, the investigators seek a control group among children who are not enrolled in the compensatory program. Unfortunately, such children tend to be from somewhat higher social-class populations and tend to have relatively greater educational resources. If a technique such as analysis of covariance, blocking, or matching (on initial ability) is used to create treatment and control groups, the posttest scores will regress toward their population means and spuriously cause the compensatory program to appear ineffective or even harmful. Such results may be dangerously misleading if they are permitted to influence education policy. [Bock, p. 496]

Now, consider a case where two teaching methods are being compared in a randomized trial. Since subjects are randomized to method, we should be asking the question, "Are subjects with the same initial value expected to have the same final value irrespective of method?" Even if there is an imbalance in the initial values, the final values should nevertheless follow the regression line of POST on PRE. A test for a treatment effect, then, would involve fitting separate regression lines with common slope and testing for different intercepts. But this is just the analysis of covariance.
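
The test for different intercepts can be sketched with the standard least-squares formulas for two parallel lines. The scores below are invented for illustration, treatment is assumed to be coded 0/1, and the function name `ancova_test` is hypothetical. The resulting statistic would be referred to a t distribution with n - 3 degrees of freedom.

```python
from statistics import mean

def ancova_test(pre, post, treat):
    """Fit POST = a + effect*TREAT + slope*PRE with TREAT coded 0/1.

    Returns (effect, standard error, t statistic, residual df).
    """
    sxx = sxy = syy = 0.0
    summary = {}
    for g in (0, 1):
        x = [p for p, t in zip(pre, treat) if t == g]
        y = [q for q, t in zip(post, treat) if t == g]
        mx, my = mean(x), mean(y)
        summary[g] = (len(x), mx, my)
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        syy += sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    (n0, mx0, my0), (n1, mx1, my1) = summary[0], summary[1]
    effect = (my1 - my0) - slope * (mx1 - mx0)
    df = len(pre) - 3                  # intercept, treatment, common slope
    mse = (syy - slope * sxy) / df     # residual mean square about the lines
    se = (mse * (1 / n0 + 1 / n1 + (mx1 - mx0) ** 2 / sxx)) ** 0.5
    return effect, se, effect / se, df

# Hypothetical scores for two teaching methods (0 and 1).
treat = [0] * 6 + [1] * 6
pre = [52, 60, 45, 70, 58, 63, 49, 55, 66, 72, 47, 61]
post = [58, 66, 50, 75, 61, 70, 60, 65, 77, 80, 57, 70]

effect, se, t_stat, df = ancova_test(pre, post, treat)
print(effect, se, t_stat, df)
```

The last term in the standard error grows with the squared difference in pre-test means, so a baseline imbalance costs some precision but does not bias the comparison.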

Summary
• Use t-tests of the change scores when experimental groups are defined by a variable that is relevant to the change in measurement.
• Use analysis of covariance for experiments in which subjects are assigned randomly to treatment groups, regardless of whether there is any bias with respect to the initial measurement.

NOTES
1. When subjects are randomly assigned to treatment, ANCOVA and t-tests based on differences will usually give the same result because significant imbalances in the pretest measurements are unlikely.

If the measurements are highly correlated so that the common regression slope is near 1, ANCOVA and t-tests will be nearly identical.

2. ANCOVA using the difference (POST - PRE) as the response and the pre-test as the covariate is equivalent to ANCOVA using the post-test as the response. Minimizing

    Σ [(POST - PRE) - (a * TREAT + b * PRE)]²

is equivalent to minimizing

    Σ [POST - (c * TREAT + d * PRE)]²

with a = c and d = 1 + b.

3. The analysis could be taken one step further to test whether the ANCOVA regression lines are parallel. If they are not, the treatment effect is not constant; it varies with the initial value, and this should be reported. There may be a range of covariate values within which the two groups have not been shown to be significantly different. The Johnson-Neyman technique can be used to identify that range.
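
The equivalence in note 2 can be verified numerically, and the same quantities give an informal first look at the parallelism in note 3. This is a sketch with invented scores and hypothetical function names; a formal test of parallelism would add a treatment-by-pre-test interaction to the model, which is not done here.

```python
from statistics import mean

def group_slope(x, y):
    """Simple regression slope of y on x within one group."""
    mx, my = mean(x), mean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

def slope_and_effect(pre, resp, treat):
    """Fit RESP = intercept + effect*TREAT + slope*PRE with TREAT coded 0/1."""
    sxx = sxy = 0.0
    centers = {}
    for g in (0, 1):
        x = [p for p, t in zip(pre, treat) if t == g]
        y = [r for r, t in zip(resp, treat) if t == g]
        mx, my = mean(x), mean(y)
        centers[g] = (mx, my)
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    (mx0, my0), (mx1, my1) = centers[0], centers[1]
    return slope, (my1 - my0) - slope * (mx1 - mx0)

# Hypothetical scores (illustrative only).
treat = [0] * 6 + [1] * 6
pre = [52, 60, 45, 70, 58, 63, 49, 55, 66, 72, 47, 61]
post = [58, 66, 50, 75, 61, 70, 60, 65, 77, 80, 57, 70]
change = [q - p for p, q in zip(pre, post)]

b, a = slope_and_effect(pre, change, treat)  # response: POST - PRE
d, c = slope_and_effect(pre, post, treat)    # response: POST
print(a, c)        # same treatment effect
print(d, 1 + b)    # slopes differ by exactly 1

# Note 3, informally: separate slopes of POST on PRE in each group.
slope_0 = group_slope(pre[:6], post[:6])
slope_1 = group_slope(pre[6:], post[6:])
print(slope_0, slope_1)
```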

-----------------

* This is actually a thorny problem. It is generally a bad idea to adjust for baseline values solely on the basis of a significance test, for two reasons:

• it distorts the significance level of the test of the outcome variable;
• if the randomization were to have failed, differences in the baseline that do not reach statistical significance might still be sufficient to affect the results.

However, there is a good reason, other than imbalance in the initial values, for taking the initial values into account. In most studies involving people, analyses that involve the initial values are typically more powerful because they eliminate much of the between-subject variability from the treatment comparison.
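
The gain in power can be quantified under a simplifying assumption that is mine, not part of the discussion above: pre-test and post-test have the same within-group variance sigma2 and correlation rho. The error variance per subject is then sigma2 for the post-test alone, 2 * sigma2 * (1 - rho) for change scores, and sigma2 * (1 - rho²) for ANCOVA, ignoring the one degree of freedom spent on estimating the slope. The function name below is hypothetical.

```python
def error_variances(rho, sigma2=1.0):
    """Per-subject error variance for three analyses.

    Assumes PRE and POST share the within-group variance sigma2 and
    have within-group correlation rho.
    """
    return {
        "post only": sigma2,
        "change score": 2 * sigma2 * (1 - rho),
        "ancova": sigma2 * (1 - rho ** 2),
    }

for rho in (0.2, 0.5, 0.8):
    v = error_variances(rho)
    print(rho, {name: round(value, 2) for name, value in v.items()})
```

Under this assumption ANCOVA never has a larger error variance than either alternative, while change scores improve on the post-test alone only when the correlation exceeds 0.5.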