Part II: After SAS's Mixed Procedure

Gerard E. Dallal, Ph.D.

[On occasion, I am asked when this note will be completed. That's a hard question to answer, but the delay is not for lack of interest or enthusiasm. This is arguably the most important topic in linear models today. The techniques described in Part I were developed to be carried out by hand, before computers were invented. They place many constraints on the data, not all of which are met in practice. The class of models that PROC MIXED makes available are computationally intensive, but much better reflect the structure of repeated measures data. However, this is not a simple topic that can be summarized succinctly in a few paragraphs, at least not by me at this time.

Until the time comes when I can do this note justice, and perhaps even afterward, there is no better discussion of repeated measures and longitudinal data than in the book Applied Longitudinal Analysis by Garrett Fitzmaurice, Nan Laird, and James Ware, published by John Wiley & Sons, Inc., ISBN 0-471-21487-6. (The link points to Amazon.com for the convenience of the reader. I am not an Amazon affiliate. I receive no remuneration of any kind if someone buys the book by clicking through. I've stripped from the URL everything that looked like it could identify this site as having provided the link.)]

SAS's MIXED procedure revolutionized the way repeated measures analyses are performed. It requires the data to be in the one-record-per-measurement (or many-records-per-subject) format. As with other programs that analyze data in that format, PROC MIXED handles missing data and applies multiple comparison procedures to both between- and within-subjects factors. Unlike other programs, PROC MIXED handles all of the technical details itself. In particular, it knows the proper way to construct its test statistics to account for the fixed and random nature of the study factors. In addition, it provides many important, unique features. For example, it provides many covariance structures for the repeated measures. However, PROC MIXED has a rich command language which often provides many ways of accomplishing a particular task. Care and attention to detail are necessary so that a model is specified correctly.

Consider a simple repeated measures study in which 8 subjects (ID) are randomized to one of 2 treatments (TREAT) and then measured under 3 periods (PERIOD). Although it might be unrealistic, let's fit the model assuming compound symmetry. The command language can be written

proc mixed;
  class id treat period;
  model y = treat period treat*period;
  repeated period / sub=id(treat) type=cs;
run;

The key elements are that

- **all factors**, fixed and random, go into the **class** statement.
- Only **fixed factors** go into the **model** statement, however.
- The **repeated** statement specifies the repeated measures, while the **sub=** option is used to specify the variable that identifies subjects.

The same results can be obtained by the command language

proc mixed;
  class id treat period;
  model y = treat period treat*period;
  random id(treat) period*id(treat);
run;

or

proc mixed;
  class id treat period;
  model y = treat period treat*period;
  random id(treat);
run;

or

proc mixed;
  class id treat period;
  model y = treat period treat*period;
  random int / sub=id(treat);
run;

In all three examples, the `repeated` statement is replaced by a `random` statement. In the first example, there is no `sub=` option and all random factors are declared explicitly in the `random` statement. In the second example, period*id(treat) is left off the `random` statement. This is possible because its inclusion exhausts the data; when it is eliminated, it is not pooled with other sources of variation but becomes the residual error term. The third example uses the `sub=` option to specify a subject identifier. In this formulation, the `random` statement specifies a random intercept (`int`) for each subject.
With repeated measures analysis of variance, measurements made on the same subject are likely to be more similar than measurements made on different individuals. That is, repeated measures are correlated. For an analysis to be valid, the covariances among repeated measures must be modeled properly.
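A small simulation may make this concrete. The following Python sketch is not part of the original note; the variance values are arbitrary assumptions, chosen so that each subject's shared "subject effect" induces a true within-subject correlation of 2²/(2² + 1²) = 0.8.

```python
import random

# Toy simulation (illustrative values, not from the text): each subject
# contributes a shared effect to every measurement, which makes repeated
# measures on the same subject correlated.
random.seed(42)

n_subjects = 2000
period1, period2 = [], []
for _ in range(n_subjects):
    subject_effect = random.gauss(0, 2)                  # between-subject variation
    period1.append(subject_effect + random.gauss(0, 1))  # within-subject noise
    period2.append(subject_effect + random.gauss(0, 1))

def pearson(x, y):
    # Pearson correlation, computed by hand to stay dependency-free
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r = pearson(period1, period2)
# True correlation under these assumptions: 4 / (4 + 1) = 0.8
print(round(r, 2))
```

The sample correlation lands close to the theoretical 0.8, which is exactly the kind of within-subject correlation the covariance model must account for.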

The three most commonly used covariance structures are *compound
symmetry* (CS), *unstructured* (UN), and *autoregressive
(1)* (AR(1)).

**Compound symmetry:** This structure says that the correlations between all pairs of measures are the same. One reason for its popularity is that in many simple cases it gives the same results as the univariate analysis from pre-PROC MIXED repeated measures ANOVA programs, including SAS's own PROC GLM. Also, the assumption is not unreasonable when the repeated measures arise from different sets of conditions, such as the response to different treatments. Its biggest drawback is that it is often unrealistic when the repeated measures are serial measurements, that is, the same response measured over time. Typically, measurements that are made close together (consecutive measurements, say) will be more highly correlated than measurements made farther apart (the first and last).
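As a sketch (not from the original note), here is what a compound-symmetry correlation matrix looks like for three repeated measures; rho = 0.5 is an arbitrary illustrative value:

```python
# Compound symmetry (TYPE=CS): ones on the diagonal, the same
# correlation rho between every pair of measures, near or far.
rho = 0.5  # illustrative value, not from the text
cs = [[1.0 if i == j else rho for j in range(3)] for i in range(3)]

for row in cs:
    print(row)
# Each row: diagonal 1.0, every off-diagonal entry 0.5
```

Note that the period-1/period-3 correlation is forced to equal the period-1/period-2 correlation, which is the feature that makes CS questionable for serial data.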

**Autoregressive (1):** This structure resolves some of the objections to the use of compound symmetry with serial data when the measures are equally spaced over time. AR(1) says that the correlation between two responses that are *t* measurements apart is ρ^t. Since ρ is less than 1, the greater the power, the smaller the magnitude. Thus, the farther apart measurements are, the lower their correlation.

**Unstructured:** Sometimes, whatever the nature of the repeated measures, no standard covariance structure seems appropriate. The analyst is content to estimate every covariance individually and let the data themselves dictate what they should be. That is what the **unstructured** option does. Just as there is no such thing as a free lunch, there is no such thing as a free covariance matrix. The more data that are used to assess the correlation structure, the less data there are to estimate the parameters of the linear model. An analysis that uses an unstructured covariance matrix will be less powerful than an analysis that uses the proper structure. The problem, though, is knowing what the structure is.
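The decay of AR(1) correlations can be sketched the same way. Again, ρ = 0.5 and four measurement occasions are arbitrary illustrative choices, not values from the note:

```python
# AR(1) (TYPE=AR(1)): the correlation between measurements t apart
# is rho**t, so correlation falls off with separation in time.
rho = 0.5  # illustrative value, not from the text
n = 4      # four equally spaced measurements
ar1 = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

print(ar1[0])  # → [1.0, 0.5, 0.25, 0.125]
```

Adjacent measurements correlate at ρ¹ = 0.5, while the first and last correlate at only ρ³ = 0.125, capturing exactly the pattern that compound symmetry cannot.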

With so many options available, how does one choose among the covariance structures? Ideally, the covariance structure should be known from previous work or subject matter considerations. Otherwise, one runs the risk of "shopping" for the structure that leads to a preconceived result. However, there are many cases where the structure is unknown or where the analyst would like to check to be sure that s/he is not making a mistake (in the manner of checking for an interaction between a factor and covariate in an analysis of covariance model).

While there are many who view this approach with a bit of skepticism, it is common for analysts to consider a few likely structures and choose among them according to some measure of fit. These measures tend to be composed of two parts--one that rewards for the accuracy of the fit and another that penalizes for the number of parameters it takes to achieve it. The most popular of these is the Akaike Information Criterion (AIC).

- In the reward portion, the AIC looks at how well the estimated and observed structures agree, or rather the extent to which they differ. Small values are good.
- In the penalty portion, the AIC considers how many parameters it takes to achieve the fit.

Thus, one might analyze the data using the CS, AR(1), and UN covariance structures and choose the one for which the AIC is a minimum.
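The arithmetic of that comparison can be sketched as follows. The log-likelihoods and parameter counts below are made-up numbers, not real PROC MIXED output; the point is only the mechanics of AIC = -2·logL + 2·k under the smaller-is-better convention:

```python
# Hypothetical fits of the same model under three covariance structures.
# logL = (made-up) log-likelihood; k = number of covariance parameters.
fits = {
    "CS":    {"logL": -120.0, "k": 2},  # residual variance + common covariance
    "AR(1)": {"logL": -118.5, "k": 2},  # residual variance + rho
    "UN":    {"logL": -117.0, "k": 6},  # all 6 parameters of a 3x3 covariance
}

# AIC = -2*logL + 2*k; smaller is better.
aic = {name: -2 * f["logL"] + 2 * f["k"] for name, f in fits.items()}
best = min(aic, key=aic.get)

print(aic)   # → {'CS': 244.0, 'AR(1)': 241.0, 'UN': 246.0}
print(best)  # → AR(1)
```

In this made-up example, UN fits best in raw likelihood but its six parameters cost it the comparison: AR(1) achieves nearly the same fit with only two, so it has the minimum AIC.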