
What do the Coefficients in a Multiple Linear Regression Mean?

The regression coefficient for the i-th predictor is the expected difference in response per unit difference in the i-th predictor, all other things being equal. That is, if the i-th predictor is changed 1 unit while all of the other predictors are held constant, the response is expected to change b_i units. As always, it is important that cross-sectional data not be interpreted as though they were longitudinal.
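As a small illustration of "all other things being equal," the sketch below plugs the coefficients reported for model (4) in the table further down (density on WEIGHT and BMI) into the fitted equation. The predictor values (60, 61, 24, 30) are hypothetical and chosen only for illustration, and the function name is an assumption of this sketch.

```python
# Coefficients reported for model (4): density regressed on WEIGHT and BMI.
INTERCEPT = 0.77065
B_WEIGHT = 0.00682
B_BMI = -0.00579


def predicted_density(weight, bmi):
    """Fitted bone density from model (4) for the given predictor values."""
    return INTERCEPT + B_WEIGHT * weight + B_BMI * bmi


# Hypothetical predictor values, chosen only for illustration.
base = predicted_density(weight=60.0, bmi=24.0)
one_more = predicted_density(weight=61.0, bmi=24.0)
diff_at_bmi_24 = one_more - base

# The same one-unit comparison at a different BMI gives the same difference.
diff_at_bmi_30 = predicted_density(61.0, 30.0) - predicted_density(60.0, 30.0)

print(f"difference at BMI 24: {diff_at_bmi_24:.5f}")
print(f"difference at BMI 30: {diff_at_bmi_30:.5f}")
```

Both differences equal the WEIGHT coefficient, 0.00682, whatever value BMI is held at. The comparison is between two subjects who differ by one unit of weight and share the same BMI, not between one subject measured at two times.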

The regression coefficient and its statistical significance can change according to the other variables in the model. Among postmenopausal women, it has been noted that bone density is related to weight. In this cross-sectional data set, density is regressed on weight, body mass index, and percent ideal body weight*. These are the regression coefficients for the 7 possible regression models predicting bone density from the weight measures.

              (1)       (2)       (3)       (4)      (5)      (6)      (7)
Intercept   0.77555   0.77264   0.77542   0.77065  0.74361  0.77411  0.75635
WEIGHT      0.00642    .        0.00723   0.00682  0.00499   .        .     
BMI        -0.00610  -0.04410    .       -0.00579   .       0.01175   .     
PCTIDEAL    0.00026   0.01241  -0.00155    .        .        .       0.00277
         

Not only do the magnitudes of the coefficients change from model to model, but for some variables the sign changes, too**.

For each regression coefficient, there is a t statistic. The corresponding P value tells us whether the variable has statistically significant predictive capability in the presence of the other predictors. A common mistake is to assume that when many variables have nonsignificant P values they are all unnecessary and can be removed from the regression equation. This is not necessarily true. When one variable is removed from the equation, the others may become statistically significant. Continuing the bone density example, the P values for the predictors in each model are

            (1)     (2)     (3)     (4)     (5)     (6)     (7)
WEIGHT    0.1733   .      0.0011  <.0001  <.0001   .       .    
BMI       0.8466  0.0031   .      0.1960   .      <.0001   .    
PCTIDEAL  0.9779  0.0002  0.2619   .       .       .      <.0001

All three predictors are related, so it is not surprising that model (1) shows that all of them are nonsignificant in the presence of the others. Given WEIGHT and BMI, we don't need PCTIDEAL, and so on. Any one of them is superfluous. However, as models (5), (6), and (7) demonstrate, all of them are highly statistically significant when used alone.

The P value from the ANOVA table tells us whether there is predictive capability in the model as a whole. All four combinations in the following table are possible.

                                 Overall F
                           Significant     NS
Individual t  Significant       -           -
              NS                -           -
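One of these combinations, a significant overall F with nonsignificant individual t statistics, can be reproduced with a small sketch. The data below are made up for illustration (they are not the bone density data): x2 tracks x1 closely, and y depends on both plus a little noise. The closed-form least squares formulas for one and two predictors are used, and the critical values are the usual two-sided 5% values from standard t and F tables.

```python
import math

# Made-up data: x2 is nearly a copy of x1, and y depends on both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.2, 1.8, 2.8, 4.2, 5.2, 5.8, 6.8, 8.2]
y = [2.7, 3.3, 6.3, 7.7, 9.7, 12.3, 13.3, 16.7]
n = len(y)


def centered(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


c1, c2, cy = centered(x1), centered(x2), centered(y)
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y, syy = dot(c1, cy), dot(c2, cy), dot(cy, cy)


def t_alone(sxx, sxy):
    """t statistic for the slope when y is regressed on a single predictor."""
    slope = sxy / sxx
    rss = syy - slope * sxy
    return slope / math.sqrt(rss / (n - 2) / sxx)


# Both predictors in the model together (closed form for two predictors).
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
rss = syy - b1 * s1y - b2 * s2y
mse = rss / (n - 3)                      # residual mean square, 5 df
t1 = b1 / math.sqrt(mse * s22 / det)     # t for x1 given x2
t2 = b2 / math.sqrt(mse * s11 / det)     # t for x2 given x1
f_overall = ((syy - rss) / 2) / mse      # overall F on (2, 5) df

T_CRIT_5DF = 2.571   # two-sided 5% critical value, t with 5 df
T_CRIT_6DF = 2.447   # two-sided 5% critical value, t with 6 df
F_CRIT_2_5 = 5.79    # 5% critical value, F with (2, 5) df

print(f"together: t(x1) = {t1:.2f}, t(x2) = {t2:.2f}, F = {f_overall:.1f}")
print(f"alone:    t(x1) = {t_alone(s11, s1y):.2f}, t(x2) = {t_alone(s22, s2y):.2f}")
```

With both predictors in the model, each t statistic is about 0.9, well short of 2.571, while the overall F is about 210, far beyond 5.79. Used alone, each predictor has a t statistic of about 21. Neither predictor adds much once the other is present, yet the pair clearly has predictive capability.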

-------------------

*In general, great care must be used when using a predictor such as body mass index or percent ideal body weight that is a ratio of other variables. This will be discussed in detail later.

**This touches on another point, too important to be left buried here: It is not always easy to guess/know what the sign of a regression coefficient will be when a predictor is correlated with other variables in the model.
Consider model (2), for example. Both predictors are statistically significant. On their own, each has a positive coefficient: bone density rises as BMI rises and as PCTIDEAL rises [models (6) & (7)]. Yet, when they appear in a model together, bone density goes down as BMI increases with PCTIDEAL held constant! It is sometimes said that BMI is "correcting" for PCTIDEAL, which sounds good, but really isn't much help in determining at the outset what will happen.
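The sign change can be reproduced with a toy data set. The numbers below are made up for illustration (they are not the bone density data) and are constructed so that y equals 2*x2 - x1 exactly, with x1 and x2 strongly correlated. The sketch also checks a standard least squares identity: the slope for x1 alone equals its coefficient in the two-predictor model plus the x2 coefficient times the slope of x2 regressed on x1.

```python
# Made-up data: y = 2*x2 - x1 exactly, and x2 moves closely with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.5, 1.0, 3.5, 4.5, 4.0, 6.5]
y = [2.0, 0.0, 4.0, 5.0, 3.0, 7.0]


def centered(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


c1, c2, cy = centered(x1), centered(x2), centered(y)
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y = dot(c1, cy), dot(c2, cy)

# Each predictor on its own: both slopes are positive.
slope_x1_alone = s1y / s11
slope_x2_alone = s2y / s22

# Both predictors together: the coefficient of x1 turns negative.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# Identity linking the two fits: the slope for x1 alone absorbs part of
# the effect of x2, in proportion to how x2 changes with x1.
implied_slope_x1 = b1 + b2 * (s12 / s11)

print(f"alone:    x1 slope = {slope_x1_alone:.3f}, x2 slope = {slope_x2_alone:.3f}")
print(f"together: b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Alone, x1 has slope 1 and x2 has slope of about 1.15. Together, the coefficients are -1 and 2. The positive slope for x1 alone arises because a unit difference in x1 usually comes with a unit difference in x2, which contributes +2 and outweighs the -1.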


Copyright © 2000 Gerard E. Dallal