Why Is a Regression Line Straight?

This could have been part of the "What does multiple linear regression look like?" note. However, I didn't want it to be seen as a footnote to the pretty pictures. This is the more important lesson.

A simple linear regression line is straight because we fit a straight line to the data! We could fit something other than a straight line if we want to. For example, instead of fitting

BONE DENSITY = b0 + b1 AGE
we might fit the equation
BONE DENSITY = b0 + b1 AGE + b2 AGE^2
if we felt the relation was quadratic. This is one reason for looking at the data as part of the analysis.
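
A minimal sketch of the two fits in Python may make the point concrete. The data are made up for illustration (they are not real bone density measurements), the variable names are hypothetical, and NumPy's polyfit stands in for whatever regression routine is actually used.

```python
import numpy as np

# Hypothetical, made-up data: exactly quadratic in age by construction.
age = np.linspace(20.0, 80.0, 13)
bone_density = 1.2 + 0.01 * age - 0.0002 * age**2

# Straight-line model: BONE DENSITY = b0 + b1 AGE
# (np.polyfit returns coefficients from the highest power down)
b1_lin, b0_lin = np.polyfit(age, bone_density, deg=1)

# Quadratic model: BONE DENSITY = b0 + b1 AGE + b2 AGE^2
b2_quad, b1_quad, b0_quad = np.polyfit(age, bone_density, deg=2)


def sum_sq_resid(fitted):
    """Sum of squared differences between the data and the fitted values."""
    return float(np.sum((bone_density - fitted) ** 2))


ss_line = sum_sq_resid(b0_lin + b1_lin * age)
ss_quad = sum_sq_resid(b0_quad + b1_quad * age + b2_quad * age**2)

print("straight line residual SS:", ss_line)
print("quadratic residual SS:    ", ss_quad)
```

The straight-line fit is straight even though these data curve, because a straight line is all we asked for. The quadratic fit can follow the curvature only because we put the AGE^2 term in the model.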

When homocysteine was regressed on CLC-folate and vitamin B12, why was the regression surface flat? The answer here, too, is that we fit a flat surface!

Let's take a closer look at the regression equation

LHCY = 1.570602 - 0.082103 LCLC - 0.136784 LB12
Suppose LCLC is 1.0. Then
LHCY = 1.570602 - 0.082103 * 1 - 0.136784 LB12
or
LHCY = 1.488499 - 0.136784 LB12
There is a straight line relation between LHCY and LB12 for any fixed value of LCLC. When LCLC changes, the Y intercept of the straight line changes, but the slope remains the same. Since the slope remains the same, the change in LHCY per unit change in LB12 is the same for all values of LCLC.

If you draw the regression lines for various values of LCLC in the scatterplot of LHCY against LB12, you get a series of parallel lines, that is, you get the regression plane viewed by sighting down the LCLC axis.
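
The parallel lines can be checked with a few lines of Python. This is only a sketch: the coefficients are copied from the equation above, while the function name and the particular LCLC values are chosen here for illustration.

```python
# Coefficients copied from the fitted equation in the text:
# LHCY = 1.570602 - 0.082103 LCLC - 0.136784 LB12
B0, B_LCLC, B_LB12 = 1.570602, -0.082103, -0.136784


def line_for_fixed_lclc(lclc):
    """Return (intercept, slope) of LHCY as a function of LB12 at a fixed LCLC."""
    return B0 + B_LCLC * lclc, B_LB12


# Illustrative LCLC values; only the intercept depends on them.
for lclc in (0.0, 1.0, 2.0):
    intercept, slope = line_for_fixed_lclc(lclc)
    print(f"LCLC = {lclc}: LHCY = {intercept:.6f} + ({slope:.6f}) LB12")
```

With LCLC = 1.0 the intercept is 1.488499, as in the worked example. The slope on LB12 is -0.136784 in every case, which is why the lines are parallel.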

The same argument applies to the regression surface for fixed LB12.

The first important lesson to be learned is that the shape of the regression surfaces and the properties of the regression equation follow from the model we choose to fit to the data. The second is that we are responsible for the models we fit. We are obliged to understand the interpretation and consequences of the models we fit. If we don't believe a particular type of model will adequately describe a dataset, we shouldn't be fitting that model! The responsibility is not with the statistical software. It is with the analyst.


Copyright © 2001 Gerard E. Dallal