[On occasion, I am asked when this note will be completed. That's a hard question to answer, but the delay is not for lack of interest or enthusiasm. This is arguably the most important topic in linear models today. The techniques described in Part I were developed to be carried out by hand, before computers were invented. They place many constraints on the data, not all of which are met in practice. The class of models that PROC MIXED makes available is computationally intensive but much better reflects the structure of repeated measures data. However, this is not a simple topic that can be summarized succinctly in a few paragraphs, at least not by me at this time.
Until the time comes when I can do this note justice, and perhaps even afterward, there is no better discussion of repeated measures and longitudinal data than the book Applied Longitudinal Analysis by Garrett Fitzmaurice, Nan Laird, and James Ware, published by John Wiley & Sons, Inc., ISBN 0-471-21487-6. (The link points to Amazon.com for the convenience of the reader. I am not an Amazon affiliate and receive no remuneration of any kind if someone buys the book by clicking through. I've stripped from the URL everything that looked like it could identify this site as having provided the link.)]
SAS's MIXED procedure revolutionized the way repeated measures analyses are performed. It requires the data to be in the one-record-per-measurement (or many-records-per-subject) format. As with other programs that analyze data in that format, PROC MIXED handles missing data and applies multiple comparison procedures to both between- and within-subjects factors. Unlike other programs, PROC MIXED handles all of the technical details itself. In particular, it knows the proper way to construct test statistics that account for the fixed and random nature of the study factors. It also provides many important, unique features; for example, it offers many covariance structures for the repeated measures. However, PROC MIXED has a rich command language that often provides many ways of accomplishing a particular task. Care and attention to detail are necessary to ensure that a model is specified correctly.
Consider a simple repeated measures study in which 8 subjects (ID) are randomized to one of 2 treatments (TREAT) and then measured in each of 3 periods (PERIOD). Although it might be unrealistic, let's fit the model assuming compound symmetry. The command language can be written
    proc mixed;
      class id treat period;
      model y = treat period treat*period;
      repeated period / sub=id(treat) type=cs;
    run;

The key elements are that the repeated statement specifies the repeated measures factor, while the sub= option specifies the variable that identifies subjects.
The same results can be obtained by the command language
    proc mixed;
      class id treat period;
      model y = treat period treat*period;
      random id(treat) period*id(treat);
    run;

or

    proc mixed;
      class id treat period;
      model y = treat period treat*period;
      random id(treat);
    run;

or

    proc mixed;
      class id treat period;
      model y = treat period treat*period;
      random int / sub=id(treat);
    run;

In all three examples, the repeated statement is replaced by a random statement. In the first example, there is no sub= option and all random factors are declared explicitly in the random statement. In the second example, period*id(treat) is left off the random statement. This is possible because its inclusion exhausts the data; when it is omitted, it is not pooled with other sources of variation but instead becomes the residual error term. The third example uses the sub= option to specify a subject identifier; in this formulation, the random statement specifies a random intercept (int) for each subject.
With repeated measures analysis of variance, measurements made on the same subject are likely to be more similar than measurements made on different subjects. That is, repeated measures are correlated. For an analysis to be valid, the covariances among the repeated measures must be modeled properly.
The three most commonly used covariance structures are compound symmetry (CS), unstructured (UN), and first-order autoregressive (AR(1)).
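To make the distinction concrete, here is one way to write the three structures for the case of three repeated measures (this display is mine, not part of the original note). Compound symmetry imposes a common variance and a common correlation; AR(1) lets the correlation decay as measurements are farther apart; unstructured places no constraint beyond symmetry:

$$
\text{CS: } \sigma^2\begin{pmatrix}1&\rho&\rho\\ \rho&1&\rho\\ \rho&\rho&1\end{pmatrix},\qquad
\text{AR(1): } \sigma^2\begin{pmatrix}1&\rho&\rho^2\\ \rho&1&\rho\\ \rho^2&\rho&1\end{pmatrix},\qquad
\text{UN: } \begin{pmatrix}\sigma_1^2&\sigma_{12}&\sigma_{13}\\ \sigma_{12}&\sigma_2^2&\sigma_{23}\\ \sigma_{13}&\sigma_{23}&\sigma_3^2\end{pmatrix}.
$$

CS requires 2 parameters, AR(1) requires 2, and UN requires 6 (in general, t(t+1)/2 for t repeated measures).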
The biggest drawback of compound symmetry is that it is often unrealistic when the repeated measures are serial measurements, that is, the same response measured over time. Typically, measurements made close together (consecutive measurements, say) will be more highly correlated than measurements made farther apart (the first and last, say), whereas compound symmetry forces all of the correlations to be equal.
With so many options available, how does one choose among covariance structures? Ideally, the covariance structure should be known from previous work or subject-matter considerations. Otherwise, one runs the risk of "shopping" for the structure that leads to a preconceived result. However, there are many cases where the structure is unknown or where the analyst would like to check that s/he is not making a mistake (in the manner of checking for an interaction between a factor and a covariate in an analysis of covariance model).
While there are many who view this approach with a bit of skepticism, it is common for analysts to consider a few likely structures and choose among them according to some measure of fit. These measures tend to be composed of two parts--one that rewards the accuracy of the fit and another that penalizes for the number of parameters it takes to achieve it. The most popular of these is the Akaike Information Criterion (AIC).
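In its smaller-is-better form (the form PROC MIXED reports in its Fit Statistics table), the AIC is

$$
\mathrm{AIC} = -2\log L + 2p,
$$

where $\log L$ is the maximized (restricted) log-likelihood and $p$ is the number of parameters--under REML estimation, the number of covariance parameters. The first term measures lack of fit; the second is the penalty for model complexity.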
Thus, one might analyze the data using the CS, AR(1), and UN covariance structures and choose the one for which the AIC is a minimum.
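As a sketch of how this might be done in practice (the data set name study is my assumption; the variable names follow the earlier example, and the AIC for each fit appears in PROC MIXED's Fit Statistics output):

```sas
/* Fit the same mean model under three covariance structures
   and compare the AIC values from the Fit Statistics tables. */
proc mixed data=study;
  class id treat period;
  model y = treat period treat*period;
  repeated period / sub=id(treat) type=cs;    /* compound symmetry */
run;

proc mixed data=study;
  class id treat period;
  model y = treat period treat*period;
  repeated period / sub=id(treat) type=ar(1); /* first-order autoregressive */
run;

proc mixed data=study;
  class id treat period;
  model y = treat period treat*period;
  repeated period / sub=id(treat) type=un;    /* unstructured */
run;
```

The structure with the smallest AIC would be chosen. Note that because the default REML likelihood involves only the covariance parameters, such comparisons are valid only when the fixed-effects portion of the model is identical in each fit, as it is here.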