### Why Is a Regression Line Straight?

This could have been part of the "What does multiple linear regression
look like?" note. However, I didn't want it to be seen as a footnote to
the pretty pictures. This is the more important lesson.

A simple linear regression line is straight because **we fit a straight
line to the data**! We could fit something other than a straight line
if we want to. For example, instead of fitting

$$\text{BONE DENSITY} = b_0 + b_1\,\text{AGE}$$

we might fit the equation

$$\text{BONE DENSITY} = b_0 + b_1\,\text{AGE} + b_2\,\text{AGE}^2$$

if we felt the relation was quadratic. This is one reason for
looking at the data as part of the analysis.

When homocysteine was regressed on CLC-folate and vitamin B12, why was
the regression surface flat? The answer here, too, is because we fit a
flat surface!
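
To make the point concrete, here is a minimal sketch in Python of fitting both a straight line and a quadratic to the same data. The data are invented for illustration (they are not the bone density data), and `fit_polynomial` and `predict` are hypothetical helper names, not functions from any statistics package. The sketch uses ordinary least squares via the normal equations, which is adequate for a toy example this small.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations (X'X) b = X'y.

    Returns coefficients [b0, b1, ..., b_degree].
    """
    k = degree + 1
    # Augmented matrix [X'X | X'y] for the columns 1, x, x^2, ...
    m = [[sum(xi ** (r + c) for xi in x) for c in range(k)]
         + [sum((xi ** r) * yi for xi, yi in zip(x, y))]
         for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[r][k] / m[r][r] for r in range(k)]


def predict(coefs, x):
    """Evaluate b0 + b1*x + b2*x^2 + ... at x."""
    return sum(b * x ** j for j, b in enumerate(coefs))


# Invented data: the true relation is curved (y = 1 + 0.5 * x^2).
age = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
density = [1.0 + 0.5 * t ** 2 for t in age]

line = fit_polynomial(age, density, 1)   # we chose a straight line
curve = fit_polynomial(age, density, 2)  # we chose a quadratic

# Change in the fitted value per unit increase in age.
line_steps = [predict(line, t + 1) - predict(line, t) for t in age[:-1]]
curve_steps = [predict(curve, t + 1) - predict(curve, t) for t in age[:-1]]

print("straight-line fit, change per unit age:", line_steps)
print("quadratic fit, change per unit age:    ", curve_steps)
```

The straight-line fit reports the same change per unit of age everywhere, even though the data are curved, because a straight line is what was fit. The quadratic fit is free to bend only because the model was given an AGE^2 term.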

Let's take a closer look at the regression equation

$$\text{LHCY} = 1.570602 - 0.082103\,\text{LCLC} - 0.136784\,\text{LB12}$$

Suppose LCLC is 1.0. Then

$$\text{LHCY} = 1.570602 - 0.082103 \times 1 - 0.136784\,\text{LB12}$$

or

$$\text{LHCY} = 1.488499 - 0.136784\,\text{LB12}$$

There is a straight line relation between LHCY and LB12 for any fixed
value of LCLC. When LCLC changes, the Y intercept of the straight line
changes, but the slope remains the same. Since the slope remains the
same, the change in LHCY per unit change in LB12 is the same for all
values of LCLC. If you draw the regression lines
for various values of LCLC in the scatterplot of LHCY against LB12, you
get a series of parallel lines, that is, you get the regression plane
viewed by sighting down the LCLC axis.
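
The parallel lines can be tabulated directly from the fitted equation. The short Python sketch below uses the three coefficients given above; the LCLC values it loops over are arbitrary choices for illustration, not values taken from the data, and `line_for_fixed_lclc` is a hypothetical helper name.

```python
# Coefficients copied from the fitted equation in the text.
B0, B_LCLC, B_LB12 = 1.570602, -0.082103, -0.136784


def line_for_fixed_lclc(lclc):
    """Return (intercept, slope) of the LHCY-versus-LB12 line at a fixed LCLC."""
    return B0 + B_LCLC * lclc, B_LB12


# Arbitrary illustrative LCLC values (an assumption, not from the data).
lines = {lclc: line_for_fixed_lclc(lclc) for lclc in (0.5, 1.0, 1.5, 2.0)}

for lclc, (intercept, slope) in lines.items():
    print(f"LCLC = {lclc:.1f}: LHCY = {intercept:.6f} + ({slope:.6f}) * LB12")
```

Only the intercept depends on LCLC; the slope is the LB12 coefficient in every row, which is exactly what makes the lines parallel.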

The same argument applies to the regression surface for fixed LB12.
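
Written out as a sketch, with $c$ standing for whatever fixed value of LB12 is chosen (the symbol $c$ is introduced here only for illustration):

$$\text{LHCY} = (1.570602 - 0.136784\,c) - 0.082103\,\text{LCLC}$$

For example, taking $c = 1.0$ gives

$$\text{LHCY} = 1.433818 - 0.082103\,\text{LCLC}$$

Again the intercept depends on the fixed value, while the slope, here the LCLC coefficient, does not.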

The first important lesson to be learned is that the shape of the
regression surfaces and the properties of the regression equation follow
from the model **we choose to fit to the data**. The second is that
**we are responsible for the models we fit**. We are obliged to
understand the interpretation and consequences of the models we fit. If
we don't believe a particular type of model will adequately describe a
dataset, we shouldn't be fitting that model! The responsibility is not
with the statistical software. It is with the analyst.

Copyright © 2001 Gerard E. Dallal