
### What do the Coefficients in a Multiple Linear Regression Mean?

The regression coefficient for the i-th predictor, b_i, is the expected difference in response per unit difference in the i-th predictor, all other things being equal. That is, if the i-th predictor is changed 1 unit while all of the other predictors are held constant, the response is expected to change b_i units. As always, it is important that cross-sectional data not be interpreted as though they were longitudinal: the coefficient describes differences between subjects, not the change that would follow from altering a predictor within one subject.
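As a small illustration of the "per unit difference" reading, the sketch below borrows the model (4) coefficients from the bone density table further down in this note and compares the predictions for two subjects who differ by one unit of WEIGHT but have the same BMI. The subject values (60, 61, 24) and the function name `predicted_density` are made up for illustration, and the note does not state the measurement units.

```python
# Coefficients of model (4) in the bone density table of this note.
INTERCEPT = 0.77065
B_WEIGHT = 0.00682
B_BMI = -0.00579

def predicted_density(weight, bmi):
    """Predicted bone density from the fitted equation of model (4)."""
    return INTERCEPT + B_WEIGHT * weight + B_BMI * bmi

# Two hypothetical subjects: same BMI, weights one unit apart.
lighter = predicted_density(60.0, 24.0)
heavier = predicted_density(61.0, 24.0)

# The difference in predictions equals the WEIGHT coefficient,
# whatever BMI value the two subjects share.
difference = heavier - lighter
print(round(difference, 5))
```

The same difference results for any shared BMI value, which is what "all other things being equal" means for a model with no interaction terms.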

The regression coefficient and its statistical significance can change according to the other variables in the model. Among postmenopausal women, it has been noted that bone density is related to weight. In this cross-sectional data set, density is regressed on weight, body mass index, and percent ideal body weight*. These are the regression coefficients for the 7 possible regression models predicting bone density from the weight measures.

```
              (1)       (2)       (3)       (4)      (5)      (6)      (7)
Intercept   0.77555   0.77264   0.77542   0.77065  0.74361  0.77411  0.75635
WEIGHT      0.00642    .        0.00723   0.00682  0.00499   .        .
BMI        -0.00610  -0.04410    .       -0.00579   .       0.01175   .
PCTIDEAL    0.00026   0.01241  -0.00155    .        .        .       0.00277
```

Not only do the magnitudes of the coefficients change from model to model, but for some variables the sign changes, too**.

For each regression coefficient, there is a t statistic. The corresponding P value tells us whether the variable has statistically significant predictive capability in the presence of the other predictors. A common mistake is to assume that when many variables have nonsignificant P values they are all unnecessary and can be removed from the regression equation. This is not necessarily true. When one variable is removed from the equation, the others may become statistically significant. Continuing the bone density example, the P values for the predictors in each model are

```
            (1)     (2)     (3)     (4)     (5)     (6)     (7)
WEIGHT    0.1733   .      0.0011  <.0001  <.0001   .       .
BMI       0.8466  0.0031   .      0.1960   .      <.0001   .
PCTIDEAL  0.9779  0.0002  0.2619   .       .       .      <.0001
```

All three predictors are related, so it is not surprising that model (1) shows that all of them are nonsignificant in the presence of the others. Given WEIGHT and BMI, we don't need PCTIDEAL, and so on. Any one of them is superfluous. However, as models (5), (6), and (7) demonstrate, all of them are highly statistically significant when used alone.
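The same pattern can be reproduced with made-up data. The sketch below is not the bone density data set: it simulates two nearly identical predictors, `x1` and `x2` (hypothetical stand-ins for related weight measures), and fits ordinary least squares using only the Python standard library. The helper names `invert` and `ols`, the sample size, and the noise levels are all assumptions chosen for illustration.

```python
import math
import random

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]

def ols(columns, y):
    """Least squares with an intercept.

    Returns (coefficients, standard errors, overall F statistic);
    index 0 of each list refers to the intercept.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    inv = invert(xtx)
    b = [sum(inv[j][k] * xty[k] for k in range(p)) for j in range(p)]
    sse = sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2
              for i in range(n))
    s2 = sse / (n - p)                      # residual mean square
    se = [math.sqrt(s2 * inv[j][j]) for j in range(p)]
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    f = ((sst - sse) / (p - 1)) / s2        # overall F ratio
    return b, se, f

random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + 0.05 * random.gauss(0, 1) for a in x1]   # almost a copy of x1
y = [0.5 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

b_both, se_both, f_both = ols([x1, x2], y)
b_x1, se_x1, _ = ols([x1], y)
b_x2, se_x2, _ = ols([x2], y)

print("overall F, both predictors:", round(f_both, 1))
print("t together:", round(b_both[1] / se_both[1], 2),
      round(b_both[2] / se_both[2], 2))
print("t alone:   ", round(b_x1[1] / se_x1[1], 2),
      round(b_x2[1] / se_x2[1], 2))
```

With predictors this strongly correlated, the standard errors in the joint model are many times larger than in the single-predictor models, so the individual t statistics will usually be small even though the overall F ratio and the one-at-a-time t statistics are large. The exact numbers depend on the seed.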

The P value from the ANOVA table tells us whether there is predictive capability in the model as a whole. All four combinations in the following table are possible.

| | Overall F significant | Overall F NS |
|---|---|---|
| Individual t significant | - | - |
| Individual t NS | - | - |

• Cases where the t statistic for every predictor and the F statistic for the overall model are statistically significant are those where every predictor has something to contribute.
• Cases where nothing reaches statistical significance are those where none of the predictors are of any value.
• This note has shown that it is possible to have the overall F ratio statistically significant and all of the t statistics nonsignificant.
• It is also possible to have the overall F ratio nonsignificant and some of the t statistics significant. There are two ways this can happen.
    • First, there may be no predictive capability in the model. However, if there are many predictors, statistical theory guarantees that on average 5% of them will appear to have statistically significant predictive capability when tested individually at the 0.05 level.
    • Second, the investigator may have chosen the predictors poorly. If one useful predictor is added to many that are unrelated to the outcome, its contribution may not be large enough for the overall model to appear to have statistically significant predictive capability. A contribution that might have reached statistical significance when viewed individually might not make it out of the noise when viewed as part of the whole.
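The "5% on average" figure in the first case can be checked by simulation. The sketch below is a simplification: instead of fitting one large multiple regression, it tests each pure-noise predictor in its own simple regression against a pure-noise response, and counts how often the t statistic passes the two-sided 0.05 cutoff. The sample size, the number of predictors, and the name `slope_t` are arbitrary choices for illustration.

```python
import math
import random

def slope_t(x, y):
    """t statistic for the slope in a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))

random.seed(1)
n = 100
n_predictors = 2000
CRITICAL = 1.984   # approximate two-sided 5% point of t with 98 df

# The response is unrelated to every predictor by construction.
y = [random.gauss(0, 1) for _ in range(n)]

hits = 0
for _ in range(n_predictors):
    x = [random.gauss(0, 1) for _ in range(n)]
    if abs(slope_t(x, y)) > CRITICAL:
        hits += 1

rate = hits / n_predictors
print("fraction of noise predictors declared significant:", rate)
```

The printed fraction should land near 0.05, even though none of the predictors has any real relationship to the response.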

-------------------

*In general, great care must be used when using a predictor such as body mass index or percent ideal body weight that is a ratio of other variables. This will be discussed in detail later.

**This touches on another point, too important to be left buried here: It is not always easy to guess or know what the sign of a regression coefficient will be when a predictor is correlated with other variables in the model.
Consider model (2), for example. Both predictors are statistically significant. On their own, bone density goes up and down as they go up and down [models (6) & (7)]. Yet, when they appear in a model together, bone density goes down as BMI increases with PCTIDEAL held constant! It is sometimes said that BMI is "correcting" for PCTIDEAL, which sounds good, but really isn't much help in determining at the outset what will happen.