[This example involves a cross-sectional study of HDL cholesterol
(**HCHOL**, the so-called good cholesterol) and body mass index
(**BMI**), a measure of obesity. Since both BMI and HDL cholesterol
will be related to total cholesterol (**CHOL**), it would make good
sense to adjust for total cholesterol.]

In the multiple regression models we have been considering so far, the
effects of the predictors have been **additive**. When HDL cholesterol
is regressed on total cholesterol and BMI, the fitted model is

Dependent Variable: HCHOL

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEPT | 1 | 64.853 | 8.377 | 7.742 | 0.000 |
| BMI | 1 | -1.441 | 0.321 | -4.488 | 0.000 |
| CHOL | 1 | 0.068 | 0.027 | 2.498 | 0.014 |

This model says that the expected difference in HCHOL is 0.068 per unit difference in CHOL when BMI is held fixed. This is true whatever the value of BMI. The expected difference in HCHOL is -1.441 per unit difference in BMI when CHOL is held fixed. This is true whatever the value of CHOL.

The effects of CHOL and BMI are additive because the expected difference in HDL cholesterol corresponding to differences in both CHOL and BMI is obtained by adding the differences expected from CHOL and BMI, each determined without regard to the other's value. That is, the expected difference between two subjects' HDL is

0.068 (difference in CHOL) - 1.441 (difference in BMI)

The model that was fitted to the data (HCHOL = b_{0} +
b_{1} CHOL + b_{2} BMI) *forces* the effects to be
additive, that is, the effect of CHOL is the same for all values of BMI
and vice-versa because the model won't let it be anything else. While
this condition might seem restrictive, experience shows that it is a
satisfactory description of many data sets. I'd guess it depends on your
area of application. The good news is that we don't have to accept this
blindly. There are ways to check on the adequacy of the model!
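As a small illustration of what additivity means numerically, the sketch below plugs the coefficients from the additive fit into a helper function. The function name and the pair of subjects are hypothetical, chosen only for illustration.

```python
# Coefficients from the additive fit of HCHOL on BMI and CHOL reported above.
B_BMI = -1.441
B_CHOL = 0.068

def expected_hdl_difference(d_bmi, d_chol):
    """Expected HCHOL difference between two subjects under the additive model."""
    return B_BMI * d_bmi + B_CHOL * d_chol

# Hypothetical pair of subjects: 2 BMI units and 10 CHOL units apart.
diff = expected_hdl_difference(d_bmi=2.0, d_chol=10.0)

# Additivity: the joint difference is the sum of the two one-at-a-time differences.
parts = expected_hdl_difference(2.0, 0.0) + expected_hdl_difference(0.0, 10.0)
print(round(diff, 3), round(parts, 3))
```

The two printed numbers agree because nothing in this model lets the contribution of BMI depend on CHOL, or the reverse.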

Even if additivity is appropriate for many situations, there are times when it does not apply. Sometimes, the purpose of a study is to formally test whether additivity holds. Perhaps the way HDL cholesterol varies with BMI depends on total cholesterol. One way to investigate this is by including an interaction term in the model. Let BMICHOL=BMI*CHOL, the product of BMI and CHOL. The model incorporating the interaction is

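Dependent Variable: HCHOL

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEPT | 1 | -24.990 | 38.234 | -0.654 | 0.515 |
| BMI | 1 | 2.459 | 1.651 | 1.489 | 0.139 |
| CHOL | 1 | 0.498 | 0.181 | 2.753 | 0.007 |
| BMI*CHOL | 1 | -0.019 | 0.008 | -2.406 | 0.018 |

The general form of the model is

Y = b_{0} + b_{1} X + b_{2} Z + b_{3} XZ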

It can be rewritten two ways to show how the change in response with one variable depends on the other.

(1) Y = b_{0} + b_{1} X + (b_{2} + b_{3}
X) Z

(2) Y = b_{0} + b_{2} Z + (b_{1} + b_{3}
Z) X

Expression (1) shows the difference in Y per unit difference in Z when
X is held fixed is (b_{2}+b_{3}X). This varies with the
value of X. Expression (2) shows the difference in Y per unit difference
in X when Z is held fixed is (b_{1}+b_{3}Z). This varies
with the value of Z. The coefficient b_{3} measures the amount by
which the change in response with one predictor is affected by the other
predictor. If b_{3} is not statistically significant, then the
data have not demonstrated the change in response with one predictor
depends on the value of the other predictor. In the HCHOL, CHOL, BMI
example, the model

HCHOL = -24.990 + 2.459 BMI + 0.498 CHOL - 0.019 BMI*CHOL

can be rewritten as

HCHOL = -24.990 + 0.498 CHOL + (2.459
- 0.019 CHOL) BMI or

HCHOL = -24.990 + 2.459 BMI + (0.498 - 0.019
BMI) CHOL

*Comment*: Great care must be exercised when interpreting the
coefficients of individual variables in the presence of interactions.
The coefficient of BMI is 2.459. If there were no interaction term, this
would be interpreted as saying that among those with a given total
cholesterol level, those with greater BMIs are expected to have
**greater** HDL levels! However, once the interaction is taken into
account, the coefficient for BMI is, in fact, (2.459-0.019 CHOL), which
is **negative** provided total cholesterol is greater than 129, which
is true of all but 3 subjects. (How many people do you know with total
cholesterol levels less than 129? There are too many people whose LDL--the
so-called "bad cholesterol"--alone is greater than 129!)
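The dependence of one predictor's slope on the other predictor can be tabulated directly from the fitted coefficients. The following is a minimal sketch using the rounded estimates from the interaction fit; the helper names and the CHOL values in the loop are assumptions made for illustration.

```python
# Rounded coefficients from the interaction fit reported above.
B_BMI, B_CHOL, B_INT = 2.459, 0.498, -0.019

def bmi_slope(chol):
    """Difference in HCHOL per unit difference in BMI at a fixed CHOL."""
    return B_BMI + B_INT * chol

def chol_slope(bmi):
    """Difference in HCHOL per unit difference in CHOL at a fixed BMI."""
    return B_CHOL + B_INT * bmi

# Total cholesterol at which the BMI slope changes sign.
crossover = -B_BMI / B_INT

for chol in (100, 150, 200, 250):  # hypothetical total cholesterol levels
    print(chol, round(bmi_slope(chol), 3))
print("BMI slope is zero at CHOL =", round(crossover, 1))
```

With these rounded coefficients the sign change falls a little above 129, in line with the comment above.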

*Comment*: The inclusion of interactions when the study was not
specifically designed to assess them can make it difficult to estimate
the other effects in the model. **Therefore, if**

- **a study was not specifically designed to assess interactions,**
- **there is no a priori reason to expect an interaction,**
- **interactions are being assessed "for insurance" because modern statistical software makes it easy, and**
- **no interaction is found,**

**it is usually wise to refit the model without the interaction term.**

Interactions have a special interpretation when one of the predictors is a categorical variable with two categories. Consider an example in which the response Y is predicted from a continuous predictor X and an indicator of sex (M0F1, =0 for males and 1 for females). The model

Y = b_{0} + b_{1} X + b_{2} M0F1

specifies two simple linear regression equations. For men, M0F1=0 and

Y = b_{0} + b_{1} X

while for women, M0F1=1 and

Y = (b_{0} + b_{2}) + b_{1} X

The change in Y per unit change in X--b_{1}--is the same for
men and women. The model forces the regression lines to be parallel. The
difference between men and women is the same for all values of X and is
equal to b_{2}, the difference in Y-intercepts.

Including a sex-by-X interaction term in the model allows the regression lines for men and women to have different slopes.

Y = b_{0} + b_{1} X + b_{2} M0F1 + b_{3}
X * M0F1

For men, the model reduces to Y = b_{0} + b_{1}
X

while for women, it is Y = (b_{0} + b_{2}) + (b_{1}
+ b_{3}) X

Thus, b_{3} is the difference in slopes. The slopes for men
and women will have been shown to differ if and only if b_{3} is
statistically significant.

**The individual regression equations for men and women obtained from
the multiple regression equation with a sex-by-X interaction are
identical to the equations that are obtained by fitting a simple linear
regression of Y on X for men and women separately.** The advantage of
the multiple regression approach is that it simplifies the task of
testing whether the regression coefficients for X differ between men and
women.
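The equivalence between the interaction model and the two separate regressions can be checked on simulated data. This is a sketch only: the data are synthetic, the true coefficients (1.0, 2.0, 0.5, 1.5) are arbitrary assumptions, and NumPy's least-squares routines stand in for whatever regression software is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
m0f1 = rng.integers(0, 2, n)  # 0 = male, 1 = female (synthetic)
y = 1.0 + 2.0 * x + 0.5 * m0f1 + 1.5 * x * m0f1 + rng.normal(0, 1, n)

# Joint model: Y = b0 + b1 X + b2 M0F1 + b3 X*M0F1
design = np.column_stack([np.ones(n), x, m0f1, x * m0f1])
b0, b1, b2, b3 = np.linalg.lstsq(design, y, rcond=None)[0]

# Separate simple linear regressions; polyfit with degree 1 returns (slope, intercept).
slope_m, int_m = np.polyfit(x[m0f1 == 0], y[m0f1 == 0], 1)
slope_f, int_f = np.polyfit(x[m0f1 == 1], y[m0f1 == 1], 1)

print("men:  ", (b0, b1), "vs", (int_m, slope_m))
print("women:", (b0 + b2, b1 + b3), "vs", (int_f, slope_f))
```

Up to floating-point error, each pair of intercepts and slopes should agree, which is the identity stated in bold above.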

*Comment:* A common mistake is to compare two groups by fitting
separate regression models and declaring them different if the regression
coefficient is statistically significant in one group and not the
other. It may be that the two regression coefficients are similar, with P
values close to and on either side of 0.05. In order to show that men and
women respond differently to a change in the continuous predictor, the
multiple regression approach must be used and the difference in
regression coefficients, as measured by the sex-by-X interaction, must be
tested formally.
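A formal test of the difference in slopes amounts to a t statistic for b_{3}. The sketch below computes it with ordinary least squares formulas in NumPy; the function name and the simulated data are hypothetical, and regression software would normally report the same quantities directly.

```python
import numpy as np

def interaction_t_stat(x, group, y):
    """Return (b3, standard error, t statistic, residual df) for the X-by-group term."""
    n = len(y)
    design = np.column_stack([np.ones(n), x, group, x * group])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    df = n - design.shape[1]
    sigma2 = resid @ resid / df                      # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)  # covariance of the estimates
    se_b3 = np.sqrt(cov[3, 3])
    return coef[3], se_b3, coef[3] / se_b3, df

# Synthetic data in which the slopes truly differ by 1.5 (an arbitrary choice).
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
m0f1 = rng.integers(0, 2, n)
y = 1.0 + 2.0 * x + 0.5 * m0f1 + 1.5 * x * m0f1 + rng.normal(0, 1, n)

b3_hat, se_b3, t_b3, df = interaction_t_stat(x, m0f1, y)
print(f"b3 = {b3_hat:.3f}, SE = {se_b3:.3f}, t = {t_b3:.2f} on {df} df")
```

The t statistic is referred to a t distribution with the residual degrees of freedom. It is this single test, not a comparison of P values from two separate fits, that addresses whether the slopes differ.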

*Centering* refers to the practice of subtracting a constant from
predictors before fitting a regression model. Often the constant is a
mean, but it can be any value.

There are two reasons to center. One is technical. The numerical routines that fit the model are often more accurate when variables are centered. Some computer programs automatically center variables and transform the model back to the original variables, all without the user's knowledge.

The second reason is practical. The coefficients from a centered model are often easier to interpret. Consider the model that predicts HDL cholesterol from BMI and total cholesterol and a centered version fitted by subtracting 22.5 from each BMI and 215 from each total cholesterol.

In the original model

- -24.990 is the expected HDL cholesterol level for someone with total cholesterol and BMI of 0,
- 0.498 is the difference in HDL cholesterol corresponding to a unit difference in total cholesterol for someone with a BMI of 0, and
- 2.459 is the difference in HDL cholesterol corresponding to a unit difference in BMI for someone with a total cholesterol of 0.

In the centered model

- 47.555 is the expected HDL cholesterol level for someone with a total cholesterol of 215 and a BMI of 22.5,
- 0.080 is the difference in HDL cholesterol corresponding to a unit difference in total cholesterol for someone with a BMI of 22.5, and
- -1.537 is the difference in HDL cholesterol corresponding to a unit difference in BMI for someone with a total cholesterol of 215.

When there is an interaction in the model,

- the coefficients for the individual *uncentered* variables are the differences in response corresponding to a unit change in the predictor when the other predictors are 0, while
- the coefficients for the individual *centered* variables are the differences in response corresponding to a unit change in the predictor when the other predictors are at their centered values.
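The relationship between the two sets of coefficients follows from substituting X = x0 + (X - x0) and Z = z0 + (Z - z0) into the general interaction model. The sketch below applies that algebra, taking X as CHOL and Z as BMI; the function names and the example subject are hypothetical. Because the printed coefficients are rounded to three decimals, the values it produces only approximate the centered coefficients quoted above.

```python
def center_coefficients(b0, b1, b2, b3, x0, z0):
    """Coefficients of the same model rewritten in terms of (X - x0) and (Z - z0)."""
    c0 = b0 + b1 * x0 + b2 * z0 + b3 * x0 * z0  # expected response at (x0, z0)
    c1 = b1 + b3 * z0                           # slope for X when Z = z0
    c2 = b2 + b3 * x0                           # slope for Z when X = x0
    return c0, c1, c2, b3                       # interaction coefficient is unchanged

def predict(coefs, x, z):
    k0, k1, k2, k3 = coefs
    return k0 + k1 * x + k2 * z + k3 * x * z

# Rounded estimates from the interaction fit, ordered (b0, CHOL, BMI, BMI*CHOL).
original = (-24.990, 0.498, 2.459, -0.019)
CHOL0, BMI0 = 215.0, 22.5
centered = center_coefficients(*original, x0=CHOL0, z0=BMI0)

# Both parameterizations give the same prediction for a hypothetical subject.
chol, bmi = 240.0, 27.0
gap = predict(original, chol, bmi) - predict(centered, chol - CHOL0, bmi - BMI0)
print(centered, abs(gap) < 1e-9)
```

Centering changes the intercept and the coefficients of the individual variables, but not the interaction coefficient or the fitted values.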