Why SAS's PROC MIXED Can Seem So Confusing
Gerard E. Dallal, Ph.D.

[Early draft subject to change.]

[The technical details are largely a restatement of the Technical Appendix of Littell RC, Henry PR, and Ammerman CB (1998), "Statistical Analysis of Repeated Measures Data Using SAS Procedures", Journal of Animal Science, 76, 1216-1231.]

Abstract

The random and repeated statements of SAS's PROC MIXED have different roles. The random statement identifies random effects. The repeated statement specifies the structure of the within-subject errors. They are not interchangeable. However, some models are overspecified when both statements are used, and these can be fitted by using a random or a repeated statement alone. Unfortunately, one such model is the commonly encountered repeated measures model with compound symmetry. This can lead to confusion over the proper use of the two types of statements.

The simple answer to why SAS's PROC MIXED can seem so confusing is that it's so powerful, but there's more to it than that. Early on, many guides to PROC MIXED present an example of fitting a compound symmetry model to a repeated measures study in which subjects (ID) are randomized to one of many treatments (TREAT) and then measured at multiple time points (PERIOD). The command language to analyze these data can be written

proc mixed;
  class id treat period;
  model y=treat period treat*period;
  repeated period/sub=id(treat) type=cs;
run;
or
proc mixed;
  class id treat period;
  model y=treat period treat*period;
  random id(treat);
run;

Because both sets of command language produce the same, correct analysis, this immediately raises confusion over the roles of the repeated and random statements. To sort this out, the underlying mathematics must be reviewed. Once the reason for the equivalence is understood, the purposes of the repeated and random statements will be clear.

PROC MIXED is used to fit models of the form

y = Xβ + ZU + e
where
  • y is a vector of responses
  • X is a known design matrix for the fixed effects
  • β is a vector of unknown fixed-effect parameters
  • Z is a known design matrix for the random effects
  • U is a vector of unknown random effects
  • e is a vector of (normally distributed) random errors.
The random statement identifies the random effects (the ZU term). The repeated statement specifies the covariance structure of the within-subject errors (the e term).
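To make the division of labor concrete, here is a minimal sketch of the covariance matrix implied by the model above, V = ZGZ' + R, where G = var(U) is governed by the random statement and R = var(e) by the repeated statement. The layout (2 subjects, 3 time points each) and the variance values are illustrative assumptions, not taken from any data set.

```python
# Marginal covariance of y in the mixed model y = X*beta + Z*U + e:
#   V = Z G Z' + R, with G = var(U) and R = var(e).
# Hypothetical layout: 2 subjects, 3 time points each (6 observations).

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

n_subjects, n_periods = 2, 3
n_obs = n_subjects * n_periods

# Z has one indicator column per subject (a random subject effect).
Z = [[1.0 if obs // n_periods == s else 0.0 for s in range(n_subjects)]
     for obs in range(n_obs)]

sigma_u2 = 4.0  # illustrative variance of the subject effects
sigma2 = 1.0    # illustrative variance of independent errors

G = identity(n_subjects, sigma_u2)  # what a random statement describes
R = identity(n_obs, sigma2)         # what a repeated statement describes

ZGZt = matmul(matmul(Z, G), transpose(Z))
V = [[ZGZt[i][j] + R[i][j] for j in range(n_obs)] for i in range(n_obs)]

for row in V:
    print(row)
```

With independent errors in R, the subject effect alone makes V block diagonal, with equal covariances among observations on the same subject and zeros between subjects.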

For the repeated measures example,

yijk = μ + αi + γk + (αγ)ik + uij + eijk
where
  • yijk is the response at time k for the j-th subject in the i-th group
  • μ, αi, γk, and (αγ)ik are fixed effects
  • uij is the random effect corresponding to the j-th subject in the i-th group
  • eijk is random error

The variance of yijk is

var(yijk) = var(uij + eijk)
The u-s are typically assumed to have a constant variance (denoted σu2). The errors eijk are typically assumed to be independent of the random effects uij. Therefore,
var(yijk) = σu2 + var(eijk)

The covariance between any two observations is

cov(yijk,ylmn) = cov(uij,ulm) + cov(uij,elmn) + cov(ulm,eijk) + cov(eijk,elmn)
Because the errors are independent of the random effects, the two middle terms are 0. Observations from different subjects are typically considered to be independent of each other. Therefore, the covariance between two observations will be 0 unless i=l and j=m, in which case
cov(yijk,yijn) = cov(uij,uij) + cov(eijk,eijn)
= σu2 + cov(eijk,eijn)

Under the assumption of compound symmetry, cov(eijk,eijn) is σe2+σ, for k=n, and σe2, otherwise. It therefore follows that

var(yijk) = σu2 + σe2 + σ
and
cov(yijk,yijn) = σu2 + σe2.
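As an illustration, assuming three time points per subject, the two results above imply the following covariance matrix for the observations on a single subject. It is written in LaTeX using the same symbols as the text.

```latex
\[
\operatorname{Var}
\begin{pmatrix} y_{ij1} \\ y_{ij2} \\ y_{ij3} \end{pmatrix}
=
\begin{pmatrix}
\sigma_u^2 + \sigma_e^2 + \sigma & \sigma_u^2 + \sigma_e^2 & \sigma_u^2 + \sigma_e^2 \\
\sigma_u^2 + \sigma_e^2 & \sigma_u^2 + \sigma_e^2 + \sigma & \sigma_u^2 + \sigma_e^2 \\
\sigma_u^2 + \sigma_e^2 & \sigma_u^2 + \sigma_e^2 & \sigma_u^2 + \sigma_e^2 + \sigma
\end{pmatrix}
\]
```

Every entry depends on σu2 and σe2 only through their sum.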

The model is redundant because σu2 and σe2 occur only in the sum σu2 + σe2, so the sum σu2 + σe2 can be estimated, but σu2 and σe2 cannot be estimated individually. The command language file with the random statement resolves the redundancy by introducing the u-s into the model and treating the within-subject errors as independent (in effect, setting σe2 to 0). The command language file with the repeated statement resolves the redundancy by removing the u-s from the model (in effect, setting σu2 to 0).
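The redundancy can be checked numerically. The sketch below builds the within-subject covariance matrix from the formulas above; the function name cs_cov and all parameter values are hypothetical, chosen only for illustration. It compares two full parameter sets with the same sum σu2 + σe2 against a random-only version (σe2 = 0) and a repeated-only version (σu2 = 0).

```python
def cs_cov(sigma_u2, sigma_e2, sigma, n_periods):
    """Within-subject covariance matrix from the text:
    diagonal = sigma_u2 + sigma_e2 + sigma, off-diagonal = sigma_u2 + sigma_e2."""
    shared = sigma_u2 + sigma_e2
    return [[shared + sigma if k == n else shared
             for n in range(n_periods)] for k in range(n_periods)]

# Two different splits of the same sum sigma_u2 + sigma_e2 = 3.0.
full_a = cs_cov(1.0, 2.0, 0.5, 3)
full_b = cs_cov(2.5, 0.5, 0.5, 3)

# Random statement alone: subject effects plus independent errors (sigma_e2 = 0).
random_only = cs_cov(3.0, 0.0, 0.5, 3)

# Repeated statement alone with compound symmetry: no subject effects (sigma_u2 = 0).
repeated_only = cs_cov(0.0, 3.0, 0.5, 3)

# All four parameter sets describe exactly the same covariance matrix.
print(full_a == full_b == random_only == repeated_only)
```

Because the four matrices are identical, no data set can distinguish among these parameter sets, which is why either statement alone suffices for this model.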

Littell et al. point out that a similar redundancy exists for the unstructured covariance matrix (TYPE=UN), but there is no redundancy for a first-order autoregressive covariance structure (TYPE=AR(1)). In the latter case, both random and repeated statements should be used. See their article for additional details.
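A rough way to see why the autoregressive case is not redundant: assume the within-subject errors have covariance σ²ρ^|k-n| (the usual AR(1) form) and add a subject effect with variance σu2. With at least three time points, all three parameters can be recovered from the covariance matrix by simple algebra. The function names and values below are hypothetical, and the recovery step only illustrates identifiability; it is not how PROC MIXED estimates anything.

```python
def ar1_plus_subject(sigma_u2, sigma2, rho, n_periods):
    """Within-subject covariance: subject variance plus AR(1) errors,
    entry (k, n) = sigma_u2 + sigma2 * rho**|k - n|."""
    return [[sigma_u2 + sigma2 * rho ** abs(k - n)
             for n in range(n_periods)] for k in range(n_periods)]

def recover(V):
    """Recover (sigma_u2, sigma2, rho) from the lag-0, lag-1, and lag-2 entries.
    Assumes at least 3 time points, sigma2 > 0, and rho != 1."""
    a0, a1, a2 = V[0][0], V[0][1], V[0][2]
    rho = (a1 - a2) / (a0 - a1)       # sigma2*rho*(1-rho) / (sigma2*(1-rho))
    sigma2 = (a0 - a1) / (1.0 - rho)  # a0 - a1 = sigma2 * (1 - rho)
    sigma_u2 = a0 - sigma2            # a0 = sigma_u2 + sigma2
    return sigma_u2, sigma2, rho

V = ar1_plus_subject(sigma_u2=2.0, sigma2=1.5, rho=0.6, n_periods=4)
print(recover(V))
```

In contrast to the compound symmetry case, distinct parameter values here give distinct covariance matrices, so the subject variance and the error structure can be estimated separately.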

