Repeated Measures Analysis Of Variance
Part I: Before SAS's Mixed Procedure

Gerard E. Dallal, Ph.D.

Introduction

Repeated measures analysis of variance generalizes Student's t test for paired samples. It is used when two or more measurements of the same type are made on the same subject. At one time, some statisticians made a sharp distinction between measurements made using different but related techniques and serial measurements made the same way over time. The term repeated measures was reserved for nonserial measures. Lately the distinction has blurred. Any time multiple measurements are made on the same subject, they tend to be called repeated measures. It's unfortunate that the distinction has blurred because serial measures should be approached differently from other types of repeated measures. This will be discussed later. However, to keep the discussion simple, all we'll ask of repeated measures here is that multiple measurements of some kind be made on the same subject.

Analysis of variance is characterized by the use of factors, which are composed of levels. Repeated measures analysis of variance involves two types of factors--between subjects factors and within subjects factors.

The repeated measures make up the levels of the within subjects factor. For example, suppose each subject has his/her reaction time measured under three different conditions. The conditions make up the levels of the within subjects factor, which might be called CONDITION. Depending on the study, subjects may be divided into groups according to levels of other factors called between subjects factors. Each subject is observed at only a single level of a between-subjects factor. For example, if subjects were randomized to aerobic or stretching exercise, form of exercise would be a between-subjects factor. The levels of a within-subject factor change as we move within a subject, while levels of a between-subject factor change only as we move between subjects.

Technical Issues

Most statistical program packages report two separate analyses for repeated measures data. One is labeled Univariate Repeated Measures Analysis of Variance; the other is labeled Multivariate Repeated Measures Analysis of Variance (MANOVA).

The univariate approach is more widely known and used because it was developed long before the ready availability of computers. The calculations can be performed by hand if necessary. It is essentially a multi-factor analysis of variance in which one of the factors is the random factor "Subject". The advantage to using a program's repeated measures routines is that the special handling required for the random "Subjects" factor is taken care of automatically. However, the univariate analysis demands that every pair of measures have the same correlation coefficient across subjects. While this may be reasonable when repeated measures come from different procedures, it is not realistic with serial measurements, where consecutive measurements are usually more highly correlated than measurements made far apart. Two adjustments--Greenhouse-Geisser and Huynh-Feldt--have been proposed to correct observed significance levels for unequal correlation coefficients.
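To make the equal-correlation requirement concrete, here is a minimal sketch of the Greenhouse-Geisser epsilon computed from a covariance matrix of the repeated measures. The function name and the two covariance matrices are hypothetical, chosen only for illustration. In practice the sample covariance matrix would be used, and the adjustment multiplies both degrees of freedom of the univariate F test by epsilon. Epsilon equals 1 under compound symmetry and falls toward 1/(k-1) as the correlations become more unequal.

```python
def greenhouse_geisser_epsilon(cov):
    """Return the Greenhouse-Geisser epsilon for a k x k covariance matrix."""
    k = len(cov)
    grand = sum(sum(row) for row in cov) / (k * k)
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    # Double-center the matrix: subtract row and column means, add back the grand mean.
    centered = [
        [cov[i][j] - row_means[i] - col_means[j] + grand for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_sq = sum(centered[i][j] ** 2 for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * sum_sq)


# Hypothetical covariance matrices for three measures with unit variances.
# Every pair of measures has the same correlation (compound symmetry).
compound_symmetric = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
# Serial pattern: adjacent measures are more highly correlated than distant ones.
serial = [[1.0, 0.8, 0.64], [0.8, 1.0, 0.8], [0.64, 0.8, 1.0]]

eps_cs = greenhouse_geisser_epsilon(compound_symmetric)
eps_serial = greenhouse_geisser_epsilon(serial)
print(eps_cs, eps_serial)
```

The serial matrix gives an epsilon below 1, so the adjusted test uses fewer degrees of freedom than the unadjusted one.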

The multivariate approach is computationally complex, but this is no longer an issue now that computers can do the work. The multivariate analysis does not require the correlations to be equal, but it is less powerful (less able to detect real differences) in small samples when the underlying conditions for the univariate approach are met. While both analyses require the data to follow a multivariate normal distribution, they differ in the way they are sensitive to violations of the assumption.

The multivariate approach includes many statistics--Wilks' Lambda, Pillai's Trace, Hotelling-Lawley Trace, and Roy's Greatest Root. They are different ways of summarizing the data. In the vast majority of cases, the observed significance levels for these statistics will be the same, so multiple testing concerns will not apply.

In summary, there are two accepted ways to analyze repeated measures designs--the univariate approach and the multivariate approach. While they often agree, they need not. Looney and Stanley (The American Statistician, 43(1989), 220-225) suggest a Bonferroni approach: declare an effect significant at the 0.05 level if either test is significant at the 0.025 level. In my experience, this recommendation is too simplistic. When the tests disagree, it can be because of an outlier, because the requirements of one or both tests are not met by the data, or because one test has much less power than the other. Further study is needed to determine the cause of the disagreement. Wilkinson (Systat Statistics manual, 1990, page 301) states, "If they [univariate and multivariate analyses] lead to different conclusions, you are usually [in 1988, it read "almost always"] safer trusting the multivariate statistic because it does not require the compound symmetry assumption." I tend to agree with Wilkinson, but not for small samples. There, the reduced number of degrees of freedom for error in the multivariate approach may cause it to fail to identify effects that are significant in the univariate analysis.
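The Looney and Stanley rule quoted above is easy to state as code. This is only a sketch of the decision rule as described here. The function name and the P values are hypothetical, and the rule says nothing about why the two analyses might disagree.

```python
def looney_stanley_significant(p_univariate, p_multivariate, alpha=0.05):
    """Bonferroni rule: significant at alpha if either test is significant at alpha / 2."""
    return min(p_univariate, p_multivariate) <= alpha / 2


# Hypothetical observed significance levels from the two analyses.
print(looney_stanley_significant(0.02, 0.20))  # one test is below 0.025
print(looney_stanley_significant(0.04, 0.03))  # both below 0.05, neither below 0.025
```

The second call shows the price of the adjustment: both tests fall below 0.05, yet the effect is not declared significant because neither falls below 0.025.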

When there are significant differences between levels of the within subject factor or when there are interactions involving the within subjects factor, it is common to want to describe them in some detail. The major drawback to most of today's repeated measures analysis routines is that they do not provide the standard set of multiple comparison procedures. The only method most programs provide is paired t tests. Even that isn't easy because most programs, in a single run, will only compare a specified level to all others. When there are four levels, three separate analyses are required to generate all of the comparisons. The first might generate (1,2), (1,3), (1,4). The second might generate (2,1), (2,3), (2,4), which obtains (2,3) and (2,4) but duplicates (1,2). A third analysis is required to obtain (3,4). It is up to the user to apply a Bonferroni adjustment manually.
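The bookkeeping for the manual Bonferroni adjustment can be sketched in a few lines. The P values below are made up for illustration and stand in for the unadjusted results of the six paired t tests among four levels.

```python
from itertools import combinations

levels = [1, 2, 3, 4]
# Every distinct pair of levels, each listed once.
pairs = list(combinations(levels, 2))

alpha = 0.05
# Bonferroni: divide the overall level by the number of comparisons.
per_comparison_alpha = alpha / len(pairs)

# Hypothetical unadjusted P values from the paired t tests, one per pair.
raw_p = dict(zip(pairs, [0.004, 0.030, 0.200, 0.010, 0.006, 0.450]))

# Equivalent view: multiply each P value by the number of comparisons, capped at 1.
adjusted_p = {pair: min(1.0, p * len(pairs)) for pair, p in raw_p.items()}
significant = [pair for pair, p in raw_p.items() if p <= per_comparison_alpha]

print(pairs)        # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(significant)  # [(1, 2), (2, 4)]
```

With four levels there are six comparisons, so each paired t test is judged at 0.05/6 rather than 0.05.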

Another approach, which makes the standard set of multiple comparison procedures available, is to perform a multi-factor analysis of variance in which SUBJECT appears explicitly as a factor. Because SUBJECT is a random factor and the standard analysis of variance routines assume all factors are fixed, it is up to the user to see that test statistics are constructed properly by using whatever features the software provides. Also, the G-G and H-F corrections to observed significance levels are not provided, so it is up to the user to determine whether it is appropriate to assume the data possess compound symmetry. The data will have to be rearranged to perform the analysis.

In the standard data matrix, each row corresponds to a different subject and each column contains a different measurement. If three measurements are made on subjects randomized to one of two treatments (A/B), a subject's data might look like

                            ID  TREAT   M1   M2   M3
                           1001   A     99  102  115

This is the subject-by-variables format required by most repeated measures analysis of variance programs.

In the rearranged data file, there will be as many records for each subject as there are repeated measures. In this type of file, a subject's data might look like

                              ID  TREAT METHOD   X
                             1001   A      1    99
                             1001   A      2   102
                             1001   A      3   115

Most statistical program packages have their own routines for rearranging data. Some are easier to use than others. [I use the program SYSTAT for most of my work. However, I became so dissatisfied with its routines for rearranging data that I wrote my own program to rearrange my SYSTAT files for me. It will accept either version 7 (.SYS) or version 8 (.SYD) files as input. It produces version 7 files of rearranged data as output, so long variable names (longer than 8 characters) are not permitted. It can be downloaded by clicking on the link.]
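For readers without such a routine, the rearrangement itself is simple. The following is a minimal sketch in plain Python, not the program mentioned above. The function name wide_to_long and its parameters are hypothetical. It turns the subject-by-variables record shown earlier into one record per measurement.

```python
def wide_to_long(records, id_vars, measure_vars, var_name="METHOD", value_name="X"):
    """Turn one record per subject into one record per measurement."""
    long_records = []
    for record in records:
        for level, measure in enumerate(measure_vars, start=1):
            # Copy the identifying columns, then add the level number and its value.
            row = {key: record[key] for key in id_vars}
            row[var_name] = level
            row[value_name] = record[measure]
            long_records.append(row)
    return long_records


# The example subject from the text, in subject-by-variables format.
wide = [{"ID": 1001, "TREAT": "A", "M1": 99, "M2": 102, "M3": 115}]

long_rows = wide_to_long(wide, id_vars=["ID", "TREAT"], measure_vars=["M1", "M2", "M3"])
for row in long_rows:
    print(row)
```

Each subject contributes three rows, with METHOD numbered 1 to 3 in the order the measures are listed.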

The rearranged data can be analyzed by using a multi-factor, mixed model analysis of variance. It is a mixed model because the factor subject--here, ID--is a random factor, while TREAT and METHOD are fixed. It gets somewhat more complicated because ID is nested within TREAT. Nested factors is a special topic that deserves its own discussion. Because it's a mixed model, test statistics must be constructed carefully. Default tests that assume all factors are fixed will be inappropriate and often too liberal, that is, lead to statistical significance more often than is proper.

One way of looking at a program's repeated measures ANOVA module is that in exchange for the lack of multiple comparison procedures, it performs a univariate analysis properly, that is, it saves users from having to know how to define test statistics for themselves, and adds the G-G and H-F correction to the observed significance levels.

I oversimplify somewhat.

Practical Considerations

The types of analyses that can be obtained from a computer program depend on the way the data are arranged as well as the particular software package. Multiple arrangements of the data may be needed to analyze the data properly. Data should not have to be entered twice. Almost every full-featured package makes it possible to construct one type of file from the other but, at present, this is not a task for the novice.

The analysis of repeated measures data, like any other analysis, begins with a series of graphical displays to explore the data. After studying scatterplots, box plots, and dot plots, a line plot showing profiles of each treatment group is constructed by plotting mean response against time for each treatment group. Each treatment's data are connected by a distinctive line style or color so that the treatments can be distinguished. If the number of subjects is suitably small, parallel plots similar to the line plot can be constructed in which the data for individual subjects are plotted. There is typically one plot for each treatment group containing one line for each subject. In some program packages, these plots are more easily constructed when the data are arranged with one record per measurement by using a program's ability to construct separate plots for each subgroup, defined by subject or treatment.

SAS, SPSS, and SYSTAT all allow the use of nested factors, but only SYSTAT can specify them through menus. SPSS lets factors be specified as fixed or random and will generate the proper F ratios for repeated measures analyses. SAS has a similar feature but requires that interactions be declared fixed or random, too. SYSTAT has no such feature; each F ratio must be specified explicitly. Part of the reason for this inconsistency is that there is no general agreement about the proper analysis of mixed models, which makes vendors reluctant to implement a particular approach.

Comments

When there are only two repeated measures, the univariate and multivariate analyses are equivalent. In addition, the test for treatment-by-measure interaction will be equivalent to a single factor ANOVA of the difference between the two measurements. When there are only two treatment groups, this reduces further to Student's t test for independent samples. It is constructive to take a small data set and verify this by using a statistical program package.
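Part of this verification can be sketched without a statistical package. The example below uses made-up data for two treatment groups, each measured twice. It computes Student's t for independent samples on the within-subject differences and the F ratio from a single factor ANOVA of the same differences, then checks that F equals t squared. It does not run a full repeated measures analysis, so the link to the treatment-by-measure interaction test still has to be confirmed in a package.

```python
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t statistic for two independent samples with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def one_way_f(groups):
    """F ratio from a single factor analysis of variance."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up (first, second) measurements for two treatment groups.
group_a = [(99, 102), (95, 101), (104, 105), (98, 104)]
group_b = [(100, 99), (97, 98), (103, 101), (96, 99)]

# Reduce each subject to the difference between the two measurements.
diff_a = [second - first for first, second in group_a]
diff_b = [second - first for first, second in group_b]

t_stat = pooled_t(diff_a, diff_b)
f_stat = one_way_f([diff_a, diff_b])
print(t_stat ** 2, f_stat)  # the two values agree up to rounding error
```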

Missing data: The standard statistical program packages provide only two options for dealing with missing data--ignore the subjects with missing data (keeping all the measures) or ignore the measures for which there are missing values (keeping all the subjects). Left to their own devices, the packages will eliminate subjects rather than measures, which is usually the sensible thing to do because typically more data are lost eliminating measures. If the data are rearranged and a multi-factor analysis of variance approach is used, all of the available data can be analyzed.
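A small count shows what is at stake. The data below are invented, with None marking a missing value. The sketch compares how many observations survive under each of the two standard options and in the rearranged file with one record per observed measurement.

```python
# Hypothetical subject-by-variables data; None marks a missing measurement.
wide = [
    {"ID": 1001, "M1": 99, "M2": 102, "M3": 115},
    {"ID": 1002, "M1": 101, "M2": None, "M3": 110},
    {"ID": 1003, "M1": 97, "M2": 100, "M3": None},
    {"ID": 1004, "M1": 105, "M2": 108, "M3": 112},
]
measures = ["M1", "M2", "M3"]

# Option 1: ignore subjects with any missing measure (keeps all the measures).
complete_subjects = [r for r in wide if all(r[m] is not None for m in measures)]

# Option 2: ignore measures with any missing value (keeps all the subjects).
complete_measures = [m for m in measures if all(r[m] is not None for r in wide)]

# Rearranged file: one record per measurement that was actually observed.
long_rows = [
    {"ID": r["ID"], "METHOD": level, "X": r[m]}
    for r in wide
    for level, m in enumerate(measures, start=1)
    if r[m] is not None
]

values_dropping_subjects = len(complete_subjects) * len(measures)
values_dropping_measures = len(wide) * len(complete_measures)
values_rearranged = len(long_rows)
print(values_dropping_subjects, values_dropping_measures, values_rearranged)  # 6 4 10
```

Of the ten observed values, six survive when subjects are dropped and four when measures are dropped, while the rearranged file keeps all ten.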

What to do? If these are the only programs available, I would begin with the standard analysis. If the univariate and multivariate approaches gave the same result and/or if the Greenhouse-Geisser and Huynh-Feldt adjusted P values did not differ from the unadjusted univariate P values, I would rearrange the data so that multiple comparison procedures could be applied to the within subjects factors.

Copyright © 2001 Gerard E. Dallal