Consider once again the
regression of homocysteine on B12 and folate (all logged). It's common to
think of the data in terms of pairwise scatterplots. The regression
equation
is often mistakenly thought of as a kind of line. With two predictors,
however, it describes a surface, not a line.
Each observation
is a three-dimensional vector {(x_i, y_i, z_i), i = 1, ..., n}
[here, (LCLC_i, LB12_i, LHCY_i)]. When plotted in a three-dimensional
space, the data
look like the picture to the left.
It can be
difficult to appreciate a two-dimensional representation of three-
dimensional data. The picture is redrawn with spikes from each
observation to the plane defined by LCLC and LB12 to give a better sense
of where the data lie.
The final display
shows the regression surface. It is a plane. Predicted values are
obtained by starting at the point (LCLC, LB12) on the LB12-LCLC
plane and travelling parallel to the LHCY axis until the regression plane
is reached (in the manner of the spike, but to the plane instead of the
observation). Residuals are calculated as the signed distance from the
regression plane to the observation (observed minus predicted), again
measured parallel to the LHCY axis.
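
The geometry of predicted values and residuals can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the numbers are simulated stand-ins, not the homocysteine data, and the variable names lclc, lb12 and lhcy are borrowed from the text purely for readability.

```python
import numpy as np

# Simulated stand-in data: these values are invented for illustration
# and are NOT the homocysteine data discussed in the text.
rng = np.random.default_rng(0)
n = 50
lclc = rng.normal(1.0, 0.3, n)   # plays the role of LCLC
lb12 = rng.normal(6.0, 0.4, n)   # plays the role of LB12
lhcy = 3.0 - 0.2 * lclc - 0.1 * lb12 + rng.normal(0.0, 0.1, n)

# Design matrix: a column of ones for the intercept, then the two predictors.
X = np.column_stack([np.ones(n), lclc, lb12])

# Least-squares coefficients (b0, b1, b2) of the regression plane.
coef, *_ = np.linalg.lstsq(X, lhcy, rcond=None)

# Predicted value: the height of the plane above each (LCLC, LB12) point.
fitted = X @ coef

# Residual: signed distance from the plane to the observation,
# measured parallel to the LHCY axis (observed minus predicted).
residuals = lhcy - fitted
```

Because the fit includes an intercept, the residuals sum to zero (up to rounding), which is one quick check that the distances were measured from the fitted plane.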
The same thing happens with more than 2 predictors, but it's hard to
draw a two-dimensional representation of it. With p predictors,
the regression surface is a p-dimensional hyperplane in a
(p+1)-dimensional space.
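
Although the hyperplane cannot be drawn, the arithmetic is unchanged for any number of predictors. The sketch below is a minimal illustration, not a method from the text: fit_hyperplane is a hypothetical helper name, and the data are simulated without noise so that the fitted coefficients can be compared with the ones used to generate them.

```python
import numpy as np

def fit_hyperplane(predictors, response):
    """Least-squares hyperplane: returns the intercept followed by p slopes."""
    n_obs = predictors.shape[0]
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(n_obs), predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Simulated example with p = 4 predictors (values invented for illustration).
rng = np.random.default_rng(1)
p = 4
predictors = rng.normal(size=(30, p))
true_coef = np.array([1.0, 0.5, -2.0, 0.0, 3.0])   # intercept, then p slopes

# Noise-free response, so every point lies exactly on the hyperplane.
response = true_coef[0] + predictors @ true_coef[1:]

coef_p = fit_hyperplane(predictors, response)       # p + 1 coefficients
```

The p + 1 coefficients (one intercept and p slopes) are what pin down a p-dimensional hyperplane in the (p+1)-dimensional space of predictors and response.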