Adjusted Means: Adjusting For Categorical Variables
Gerard E. Dallal, Ph.D.

In a study of the cholesterol levels of omnivores (meat eaters) and vegans (no animal products), suppose the data are something like

Mean (n)    Omnivore    Vegan
Male        230 (10)    220 (90)
Female      210 (40)    200 (10)

The mean cholesterol level of all omnivores is 214 mg/dl (= [230*10+210*40]/50) while for vegans it is 218 (= [220*90+200*10]/100). Thus, the mean cholesterol level of omnivores is 4 mg/dl lower than that of vegans even though both male and female omnivores have mean levels 10 mg/dl higher than vegans'! The reason for the discrepancy is a confounding, or mixing up, of sex and diet. Males have mean levels 20 mg/dl higher than females regardless of diet. The vegans are predominantly male while the omnivores are predominantly female. The benefit of being a vegan is swamped by the deficit of being male while the deficit of being an omnivore is swamped by the benefit of being female.
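The arithmetic behind the crude (unadjusted) means can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the function name crude_mean are illustrative choices, not part of the original analysis.

```python
# Cell means (mg/dl) and sample sizes taken from the table above.
# The data layout and names are illustrative assumptions.
cells = {
    ("male", "omnivore"): (230, 10),
    ("female", "omnivore"): (210, 40),
    ("male", "vegan"): (220, 90),
    ("female", "vegan"): (200, 10),
}

def crude_mean(diet):
    """Mean over everyone in one diet group, weighting each cell by its n."""
    total = sum(mean * n for (sex, d), (mean, n) in cells.items() if d == diet)
    count = sum(n for (sex, d), (mean, n) in cells.items() if d == diet)
    return total / count

omnivore_crude = crude_mean("omnivore")
vegan_crude = crude_mean("vegan")
print(omnivore_crude, vegan_crude)
```

Because each cell is weighted by its sample size, the omnivore mean is pulled toward the female value and the vegan mean toward the male value, which is the confounding described above.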

Means that have been corrected for such imbalances are called adjusted means or, lately, least squares means. Adjusted means are predicted values from a multiple regression equation (hence, the name least squares means). The equation will contain categorical predictors (factors) and numerical predictors (covariates). Standard practice is to estimate adjusted means by plugging in the mean value of any covariate to estimate the mean response for all combinations of the factors and taking simple means of these estimates over factor levels. Those familiar with directly standardized rates will see that this is essentially the same operation.

If SEX is treated as a categorical variable, the adjusted mean cholesterol level for omnivores is calculated by taking the simple mean of the mean cholesterol levels for male omnivores and female omnivores (that is, 220 [= (230+210)/2]) and similarly for vegans (210 [= (220+200)/2]). The adjusted mean for omnivores is 10 mg/dl higher than the vegans', which is the same as the difference observed in men and women separately. The calculations reflect the notion that, despite the imbalance in sample sizes, the best estimates of the cholesterol levels of male and female omnivores and vegans are given by the four cell means. The adjusted means simply average them.
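As a rough illustration of the equal-weight calculation, the sketch below averages the two cell means within each diet group. The names used are hypothetical, and a statistical package would obtain these values as predictions from a fitted model rather than directly from the table.

```python
# Cell means (mg/dl) from the cholesterol table; sample sizes are
# deliberately ignored because each sex gets equal weight.
cell_means = {
    ("male", "omnivore"): 230,
    ("female", "omnivore"): 210,
    ("male", "vegan"): 220,
    ("female", "vegan"): 200,
}

def adjusted_mean_equal_weights(diet):
    """Simple (unweighted) mean of the male and female cell means."""
    values = [m for (sex, d), m in cell_means.items() if d == diet]
    return sum(values) / len(values)

omnivore_adj = adjusted_mean_equal_weights("omnivore")
vegan_adj = adjusted_mean_equal_weights("vegan")
print(omnivore_adj, vegan_adj, omnivore_adj - vegan_adj)
```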

If SEX is coded 0 for males and 1 for females, say, most statistical programs will evaluate the adjusted means at the mean value for SEX, which is 0.3333, the proportion of females. The adjusted means will be a weighted average of the cell means with the males being given weight 100/150 and the females given weight 50/150. For omnivores the adjusted mean is 223.3 [= 230 (100/150) + 210 (50/150)], while for vegans it is 213.3 [= 220 (100/150) + 200 (50/150)]. While these values differ from the earlier adjusted means, the difference between them is the same.
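The weighted version can be sketched the same way. Here the weights are the overall proportions of males and females in the sample (100/150 and 50/150), applied to both diet groups. The variable and function names are assumptions made for illustration only.

```python
# Cell means (mg/dl) and sample sizes from the cholesterol table.
cell_means = {
    ("male", "omnivore"): 230,
    ("female", "omnivore"): 210,
    ("male", "vegan"): 220,
    ("female", "vegan"): 200,
}
cell_counts = {
    ("male", "omnivore"): 10,
    ("female", "omnivore"): 40,
    ("male", "vegan"): 90,
    ("female", "vegan"): 10,
}

# Overall proportion of each sex across both diets (100/150 and 50/150).
total = sum(cell_counts.values())
weight = {
    sex: sum(n for (s, d), n in cell_counts.items() if s == sex) / total
    for sex in ("male", "female")
}

def adjusted_mean_proportional(diet):
    """Average the cell means using the overall sex proportions as weights."""
    return sum(weight[sex] * cell_means[(sex, diet)] for sex in weight)

omnivore_adj = adjusted_mean_proportional("omnivore")
vegan_adj = adjusted_mean_proportional("vegan")
print(round(omnivore_adj, 1), round(vegan_adj, 1))
```

The same weights are used for omnivores and vegans, which is why the 10 mg/dl difference is preserved even though the individual adjusted means change.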

Choosing whether or not to name a two-level indicator variable as a factor can be thought of as choosing a set of weights to be applied to the individual levels. If the variable is categorical, the weights are equal. If the variable is numerical, the weights are proportional to the number of observations in each level.

In the previous example, the difference between adjusted means is 10 mg/dl, regardless of whether SEX is treated as categorical or numerical, because the difference between omnivores and vegans is the same for men and women. In practice, the differences will never be identical and the differences in adjusted means will depend on the choice of weights.

The following data are from a study of vitamin D-25 levels in healthy New Englanders during the wintertime. The actual analysis was more complicated, but here we will look at the difference between men and women adjusted for vitamin D intake. Vitamin D is manufactured in the body as the result of skin exposure to the sun, so it was decided to include an indicator for travel below 35 degrees north latitude.

Mean (n)    No travel    Traveler
Male        22.8 (47)    33.2 (5)
Female      22.3 (73)    29.5 (10)

The mean levels were 23.8 mg/dl for males and 23.1 for females. Adjusted means calculated by treating TRAVEL as a categorical variable in a model that also included a SEX-by-TRAVEL interaction and vitamin D intake as a covariate are 27.9 for males and 26.0 for females. The difference is 1.9 mg/dl. When TRAVEL is treated as a numerical variable, the adjusted means are 23.9 for males and 23.1 for females with a difference of 0.8 mg/dl. The former is the estimate based on equal numbers of travelers and nontravelers. The latter is the estimate based on mostly nontravelers.
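The effect of the two weighting schemes can be approximated from the cell means alone. The sketch below ignores the vitamin D intake covariate, so it is not the model described above and will not reproduce the reported adjusted means exactly. It only shows how equal weights and sample-proportional weights pull the sex difference in different directions. All names are illustrative.

```python
# Cell means and sample sizes from the vitamin D table.
# The intake covariate is NOT modeled here, so these are approximations.
cell_means = {
    ("male", "no travel"): 22.8,
    ("male", "traveler"): 33.2,
    ("female", "no travel"): 22.3,
    ("female", "traveler"): 29.5,
}
cell_counts = {
    ("male", "no travel"): 47,
    ("male", "traveler"): 5,
    ("female", "no travel"): 73,
    ("female", "traveler"): 10,
}

def weighted_mean(sex, weights):
    """Average one sex's two cell means using the given TRAVEL weights."""
    return sum(w * cell_means[(sex, travel)] for travel, w in weights.items())

# TRAVEL as a factor: travelers and nontravelers count equally.
equal = {"no travel": 0.5, "traveler": 0.5}

# TRAVEL as a 0/1 numerical variable: weights follow the sample proportions.
total = sum(cell_counts.values())
proportional = {
    travel: sum(n for (s, t), n in cell_counts.items() if t == travel) / total
    for travel in ("no travel", "traveler")
}

equal_diff = weighted_mean("male", equal) - weighted_mean("female", equal)
prop_diff = weighted_mean("male", proportional) - weighted_mean("female", proportional)
print(round(equal_diff, 2), round(prop_diff, 2))
```

With equal weights the large male-female gap among the few travelers counts for half of the comparison. With proportional weights it counts for only 15 of 135 observations, so the difference shrinks toward the gap seen in nontravelers.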

While it is appropriate to compare adjusted means to each other, the individual adjusted means themselves are usually best ignored. They represent the estimated values for a specific set of circumstances that may not be realistic in practice.

In the previous example, the adjusted means calculated by treating TRAVEL as a categorical variable are 27.9 for males and 26.0 for females. These values are much larger than are typically seen in such a population and might be considered suspect. However, the reason they are so large is that they are the simple means of the values for travelers and nontravelers. There are very few travelers, but their vitamin D levels are roughly 30% to 45% greater than those of nontravelers!


Copyright © 2001 Gerard E. Dallal