What Does Multiple Linear Regression Look Like? (Part 2)

This note considers the case where one of the predictors is an indicator variable. It will be coded 0/1 here, but the results do not depend on the two codes used. Here, men and women are placed on a treadmill. When they can no longer continue, duration (DUR) and maximum oxygen usage (VO2MAX) are recorded. The purpose of this analysis is to predict VO2MAX from sex (M0F1 = 0 for males, 1 for females) and DUR. When the model

VO2MAX = β0 + β1 DUR + β2 M0F1 + ε
is fitted to the data, the result is
VO2MAX = 1.3138 + 0.0606 DUR - 3.4623 M0F1
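
A fit of this kind can be sketched in Python with NumPy. The data below are synthetic and purely illustrative (the treadmill measurements themselves are not reproduced in this note), and the generating coefficients, sample size, and noise level are assumptions chosen only to resemble the fitted equation. The sketch also checks the remark that the results do not depend on the two codes used: recoding sex as 1/2 instead of 0/1 shifts the intercept but leaves the DUR slope, the sex coefficient, and the fitted values unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only; NOT the treadmill data from the text.
n = 40
dur = rng.uniform(300, 900, size=n)      # hypothetical durations
m0f1 = np.arange(n) % 2                  # 0 = male, 1 = female, alternating
vo2max = 1.3 + 0.06 * dur - 3.5 * m0f1 + rng.normal(0, 2.0, size=n)


def fit(sex_codes):
    """Least-squares fit of VO2MAX on an intercept, DUR, and a coded sex column."""
    X = np.column_stack([np.ones(n), dur, sex_codes])
    coef, *_ = np.linalg.lstsq(X, vo2max, rcond=None)
    return coef, X @ coef


coef_01, fitted_01 = fit(m0f1)           # sex coded 0/1
coef_12, fitted_12 = fit(m0f1 + 1)       # same data, sex coded 1/2

print("0/1 coding:", coef_01)
print("1/2 coding:", coef_12)
print("same fitted values:", np.allclose(fitted_01, fitted_12))
```

Only the intercept differs between the two codings: under 1/2 coding it equals the 0/1 intercept minus the sex coefficient, so the plane itself is the same.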

When the data are plotted in three dimensions, it is seen that they lie along two slices--one slice for each of the two values of M0F1. The regression surface is once again a flat plane. This follows from our choice of a model.
The data in each slice can be plotted as VO2MAX against DUR and the two plots can be superimposed. The two lines are the pieces of the plane corresponding to M0F1=0 and M0F1=1. The lines are parallel because they are parallel strips from the same flat plane. This also follows directly from the model. The fitted equation can be written conditional on the two values of M0F1. When M0F1=0, the model is

VO2MAX = 1.3138 + 0.0606 DUR - 3.4623 * 0, or
VO2MAX = 1.3138 + 0.0606 DUR.
When M0F1=1, the model is
VO2MAX = 1.3138 + 0.0606 DUR - 3.4623 * 1, or
VO2MAX = -2.1485 + 0.0606 DUR.
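
The arithmetic behind the two conditional equations can be checked with a few lines of Python. The coefficients are the fitted values reported above; the function names and the durations in the loop are hypothetical and chosen only for illustration.

```python
# Fitted coefficients reported in the text.
INTERCEPT = 1.3138
SLOPE_DUR = 0.0606
COEF_M0F1 = -3.4623


def predict_vo2max(dur, m0f1):
    """Predicted VO2MAX from the fitted plane; m0f1 is 0 (male) or 1 (female)."""
    return INTERCEPT + SLOPE_DUR * dur + COEF_M0F1 * m0f1


def line_for(m0f1):
    """Return (intercept, slope) of the VO2MAX-versus-DUR line for one sex."""
    return INTERCEPT + COEF_M0F1 * m0f1, SLOPE_DUR


male_line = line_for(0)
female_line = line_for(1)
print("males:  ", male_line)
print("females:", female_line)

# Hypothetical durations: the female-minus-male gap is the same at every DUR,
# which is what it means for the two lines to be parallel.
for dur in (400, 600, 800):
    gap = predict_vo2max(dur, 1) - predict_vo2max(dur, 0)
    print(dur, round(gap, 4))
```

Both lines share the DUR slope of 0.0606, and the vertical distance between them is the M0F1 coefficient at every duration.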

A more complicated model can be fitted that does not force the lines to be parallel. This is discussed in the note on interactions. The separate lines are fitted in the picture to the left. The test for whether the lines are parallel has an observed significance level of 0.102. Thus, the two slopes (the regression coefficients for DUR) are within sampling variability of each other, and the lines are within sampling variability of what one would expect of parallel lines.
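
As a rough sketch of what the more complicated model does, the Python code below adds a DUR-by-M0F1 product term and compares the result with two separately fitted lines. The data are synthetic, with generating values assumed for illustration, so the numbers it prints have no connection to the 0.102 significance level quoted above, and no significance test is computed here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data for illustration only; NOT the treadmill data from the text.
n = 40
dur = rng.uniform(300, 900, size=n)      # hypothetical durations
m0f1 = np.arange(n) % 2                  # 0 = male, 1 = female, alternating
vo2max = (1.3 + 0.06 * dur - 3.5 * m0f1 + 0.005 * dur * m0f1
          + rng.normal(0, 2.0, size=n))

# Interaction model: intercept, DUR, M0F1, and the product DUR * M0F1.
X = np.column_stack([np.ones(n), dur, m0f1, dur * m0f1])
b0, b1, b2, b3 = np.linalg.lstsq(X, vo2max, rcond=None)[0]

# Separate straight-line fits, one per sex (polyfit returns slope first).
slope_m, int_m = np.polyfit(dur[m0f1 == 0], vo2max[m0f1 == 0], 1)
slope_f, int_f = np.polyfit(dur[m0f1 == 1], vo2max[m0f1 == 1], 1)

print("male line:  ", int_m, slope_m, "vs", b0, b1)
print("female line:", int_f, slope_f, "vs", b0 + b2, b1 + b3)
print("difference in slopes (b3):", b3)
```

The coefficient on the product term is the difference between the two slopes, so a test of whether the lines are parallel amounts to a test of whether that coefficient is zero.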

Copyright © 2001 Gerard E. Dallal