
### [Early draft subject to change.]

[Some of these notes must involve more mathematical notation than others. This is one of them. However, the mathematics is nothing more than simple algebra.]

This note was prompted by a student's question about interactions. She noticed that many of the tolerances became low when she had two predictors in a multiple regression equation along with their interaction. She wondered what these low tolerances had to do with the collinearity that low tolerances usually signal.

There are two reasons why tolerances can be small. The first is true collinearity, that is, an exact linear relation among the predictors. The second is highly correlated predictors, which raise concerns about computational accuracy and about whether individual coefficients and estimates of their contributions are numerically stable. In some cases where there is high correlation without an exact linear relation among the variables, the collinearity is avoidable and can be removed by centering, which transforms variables by subtracting a variable's mean (or other typical value) from all of its observations.

Before we go further, let's look at some data. We'll consider the regression of Y on X and X², where there can be no exact linear relation between the predictors.

| Y  | X | X² | Z (=X-3) | Z² (=(X-3)²) |
|----|---|----|----------|--------------|
| 18 | 5 | 25 | 2        | 4            |
| 15 | 4 | 16 | 1        | 1            |
| 12 | 3 | 9  | 0        | 0            |
| 3  | 2 | 4  | -1       | 1            |
| 9  | 1 | 1  | -2       | 4            |

In this example, Z is the centered version of X, that is, Z = X - 3, where 3 is the mean of the Xs. We'll be referring to six different regressions:

1. Y = b0 + b1 X
2. Y = c0 + c1 Z
3. Y = b0 + b2 X²
4. Y = c0 + c2 Z²
5. Y = b0 + b1 X + b2 X²
6. Y = c0 + c1 Z + c2 Z²
They should be thought of as three pairs of equations. A particular coefficient need not have the same value in every equation in which it appears. That is, there is not just one b0, but different b0s for equations (1), (3), and (5). If this proves confusing, I'll rewrite the note.

So that computer output will not clutter this note, I've placed it in a separate web page.

Things to notice:

• The correlation between X & X² is 0.98, while the correlation between Z & Z² is 0.00.
• Equations (1) & (2): The regression of Y on X is virtually identical to the regression of Y on Z. R² is the same for both equations (0.676). The coefficients for X and Z are the same (3.00), as are their P values (0.088). This must happen any time Z = X - k, where k is any constant (here, k is 3), because
Y = c0 + c1 Z
Y = c0 + c1 (X - k)
Y = (c0 - c1 k) + c1 X
Thus, regressing Y on Z is equivalent to regressing Y on X with
b1 = c1
b0 = c0 - c1 k
This is why the slopes of the two equations are the same, but their intercepts differ.

Another way to see why the equations must be so similar is to recognize that because Z is X shifted by a constant, the correlation between Y and Z will be equal to the correlation between Y and X. Further, the SDs of X and Z will be equal, and the SD of Y is common to both equations. Thus, the three quantities that determine the regression coefficient (the SD of the response, the SD of the predictor, and the correlation between response and predictor) are the same for both equations!
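This is easy to confirm numerically. Below is a minimal pure-Python sketch using the five observations from the table; `fit_line` is a helper written for this note, not a library function.

```python
# Verify that regressing Y on Z = X - k gives the same slope as
# regressing Y on X, with the intercept shifted by c1*k.

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

Y = [18, 15, 12, 3, 9]
X = [5, 4, 3, 2, 1]
k = 3                      # the mean of the Xs
Z = [x - k for x in X]     # centered predictor

b0, b1 = fit_line(X, Y)    # equation (1)
c0, c1 = fit_line(Z, Y)    # equation (2)

print(b1, c1)              # both slopes are 3.0
print(b0, c0 - c1 * k)     # b0 equals c0 - c1*k
```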

• Equations (3) & (4): On the other hand, the regression of Y on X² is different from the regression of Y on Z². There is no reason why they must be the same. Z² is not X² shifted by a constant; Z² is not even a linear function of X². Regressing Y on Z² is not equivalent to regressing Y on X²:
Y = c0 + c2 Z²
Y = c0 + c2 (X - k)²
Y = (c0 + c2 k²) - 2 c2 k X + c2 X²
which includes a term involving X.
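The difference shows up immediately in the table's data: the two one-predictor regressions have very different R² values. A quick check, where `r_squared` is a helper written for this note (R² from a single predictor is just the squared correlation):

```python
# Compare R-squared for the regression of Y on X^2 versus Y on Z^2.

def r_squared(xs, ys):
    """Squared correlation between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

Y = [18, 15, 12, 3, 9]
X = [5, 4, 3, 2, 1]
X2 = [x * x for x in X]
Z2 = [(x - 3) ** 2 for x in X]

print(round(r_squared(X2, Y), 3))  # roughly 0.74
print(round(r_squared(Z2, Y), 3))  # roughly 0.08 -- a very different fit
```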
• Equations (5) & (6): In many ways, the multiple regression of Y on X and X² is similar to the regression of Y on Z and Z². R² is the same for both equations (0.753). The coefficients for X² and Z² are the same (0.857), as are their P values (0.512).

The agreement is close because the two regressions are equivalent.

Y = c0 + c1 Z + c2 Z²
Y = c0 + c1 (X - k) + c2 (X - k)²
Y = (c0 - c1 k + c2 k²) + (c1 - 2 c2 k) X + c2 X²
Thus, the regression of Y on Z and Z² is equivalent to the regression of Y on X and X² with
b2 = c2
b1 = c1 - 2 c2 k
b0 = c0 - c1 k + c2 k²
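This mapping can be verified with the table's data. Because Z and Z² happen to be uncorrelated here, equation (6) can be fit with two simple regressions; mapping the c's to the b's then reproduces the fitted values of equation (5) exactly. A sketch, with `slope` a helper written for this note:

```python
# Fit equation (6) exploiting the fact that Z and Z^2 are uncorrelated
# in this data set (each multiple-regression slope equals its simple
# slope), then map the c's to the b's and confirm both forms agree.

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

Y = [18, 15, 12, 3, 9]
X = [5, 4, 3, 2, 1]
k = 3
Z = [x - k for x in X]
Z2 = [z * z for z in Z]

c1 = slope(Z, Y)                                      # 3.0
c2 = slope(Z2, Y)                                     # 0.857...
c0 = sum(Y) / 5 - c1 * sum(Z) / 5 - c2 * sum(Z2) / 5  # intercept

b2 = c2
b1 = c1 - 2 * c2 * k
b0 = c0 - c1 * k + c2 * k * k

# The two parameterizations give identical fitted values.
for x, z, z2 in zip(X, Z, Z2):
    assert abs((c0 + c1 * z + c2 * z2) - (b0 + b1 * x + b2 * x * x)) < 1e-9
```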

One question remains: In the regressions of Y on Z & Z² and Y on X & X², why are the P values for the coefficients of Z² and X² the same while the P values for Z and X differ? The answer is supplied by the description of the P value as an indicator of the extent to which a variable adds predictive capability to the other variables in the model. We've already noted that the regression of Y on Z has the same predictive capability (R²) as the regression of Y on X, and the regression of Y on Z and Z² has the same predictive capability as the regression of Y on X and X². Therefore, adding Z² to Z has the same effect as adding X² to X. We start from the same place (X and Z) and end at the same place (Z,Z² and X,X²), so the way we get there (Z² and X²) must be the same.

We've also noted that the regression of Y on Z² does not have the same predictive capability (R²) as the regression of Y on X². Since we start from different places (Z² and X²) and end at the same place (Z,Z² and X,X²), the way we get there (X and Z) must be different.

Interactions behave in a similar fashion. Consider predicting Y from X, Z, and their interaction XZ and predicting Y from (X-kx), (Z-kz) and their interaction (X-kx) (Z-kz), where kx and kz are constants that center X and Z.

• Regressing Y on (X-kx) & (Z-kz) is equivalent to regressing Y on X & Z because
Y = c0 + c1 (X - kx) + c2 (Z - kz)
Y = (c0 - c1 kx - c2 kz) + c1 X + c2 Z
• Regressing Y on (X-kx) & (X-kx) (Z-kz) is different from regressing Y on X & XZ because
Y = c0 + c1 (X - kx) + c3 (X - kx) (Z - kz)
Y = (c0 - c1 kx + c3 kx kz) + (c1 - c3 kz) X - c3 kx Z + c3 X Z
which includes a term involving Z alone.
• Regressing Y on (X-kx), (Z-kz), and (X-kx) (Z-kz) is equivalent to regressing Y on X, Z, and XZ because
Y = c0 + c1 (X - kx) + c2 (Z - kz) + c3 (X - kx) (Z - kz)
Y = (c0 - c1 kx - c2 kz + c3 kx kz) + (c1 - c3 kz) X + (c2 - c3 kx) Z + c3 XZ
Adding XZ to X and Z will have the same effect as adding (X-kx)(Z-kz) to (X-kx) and (Z-kz) because the models start from the same place and end at the same place, so the P values for XZ and (X-kx)(Z-kz) will be the same. However, adding Z to X and XZ is different from adding (Z-kz) to (X-kx) and (X-kx)(Z-kz) because the models start from different places and end up at the same place. (In similar fashion, adding X to Z and XZ is different from adding (X-kx) to (Z-kz) and (X-kx)(Z-kz).)
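The expansion of the full interaction model is easy to confirm numerically. In the sketch below, the coefficient values and centering constants are arbitrary numbers chosen only to exercise the identity:

```python
# Check that the centered interaction model and its expansion in terms
# of X, Z, and XZ agree at arbitrary points (all numbers made up).
c0, c1, c2, c3 = 2.0, -1.5, 0.75, 0.3
kx, kz = 4.0, 7.0

for x, z in [(1.0, 2.0), (3.5, -2.0), (0.0, 0.0), (10.0, 4.0)]:
    centered = c0 + c1 * (x - kx) + c2 * (z - kz) + c3 * (x - kx) * (z - kz)
    expanded = ((c0 - c1 * kx - c2 * kz + c3 * kx * kz)
                + (c1 - c3 * kz) * x + (c2 - c3 * kx) * z + c3 * x * z)
    assert abs(centered - expanded) < 1e-9
```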

If there is an exact linear relation among the variables, centering will not remove it. If

c1 X1 + c2 X2 + ... + cp Xp = m

and X1 is replaced by Z = X1 - k, then

c1 Z + c1 k + c2 X2 + ... + cp Xp = m

or

c1 Z + c2 X2 + ... + cp Xp = m - c1 k

Because m - c1 k is a(nother) constant, there is still a linear relation among the variables.
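A tiny made-up illustration: four observations constructed so that X1 + 2 X2 = 10 holds exactly. Centering X1 merely shifts the constant on the right-hand side, so the variables remain collinear:

```python
# Rows constructed so that 1*X1 + 2*X2 = 10 holds exactly (made-up data).
rows = [(2, 4), (4, 3), (6, 2), (8, 1)]
k = 5  # constant used to center X1

for x1, x2 in rows:
    z = x1 - k
    # The linear relation survives, with new constant m - c1*k = 10 - 5 = 5.
    assert z + 2 * x2 == 5
```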

Comment: While I might worry about centering when fitting polynomial regressions (if I didn't use software specially designed for the purpose), I tend not to worry about it when fitting interactions. There has been more than a quarter century of research into the problems of numerical accuracy in fitting multiple regression equations. Most statistical software, including all of the software I use personally, makes use of this work and is fairly robust. In addition, I rarely fit anything more complicated than a first-order interaction, which won't grow any faster than a square. If the software shows a significant or important interaction, I tend to believe it regardless of any collinearity measure, because the effect of collinearity is to mask things. I would look more closely if collinearity measures were suspicious and an expected effect were nonsignificant.

Comment: Centering can make regression coefficients easier to understand. Consider an equation that predicts Systolic Blood Pressure (SBP) from AGE and Physical Activity Level (PAL, which ranges from 1.4 to 2.0 in a typical healthy adult population). An AGE by PAL interaction is included in the model because it is felt that age will have more of an effect at low Physical Activity Levels than at high levels. The resulting equation is

SBP = 78.6 + 2.6 AGE + 14 PAL - 1.0 AGE*PAL

Those unaccustomed to dealing with interactions will be surprised to see that the coefficient of PAL is positive. This seems to suggest that exercise raises blood pressure! However, when the data are centered by subtracting the mean age, 34, from AGE and the mean PAL, 1.6, from PAL, the equation becomes

SBP = 135 + 1.0 (AGE-34) - 20 (PAL-1.6) - 1.0 (AGE-34)*(PAL-1.6)

The two equations are the same; that is, they give the same predicted values, and simple algebra can be used to transform one into the other. Now, however, the coefficient of PAL is negative and the coefficient of AGE is smaller. It's the interaction that's causing all the changes. If there were no interaction, the coefficient of PAL would be the change in SBP with each unit change in PAL. With the interaction in the model, this interpretation is correct only when the interaction term is 0. But that can happen only when age is 0, which is not true for anyone in this population.
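That the two equations give the same predictions is easy to check; here is a minimal sketch comparing them over a grid of plausible AGE and PAL values:

```python
# The uncentered and centered SBP equations from the text.
def sbp_original(age, pal):
    return 78.6 + 2.6 * age + 14 * pal - 1.0 * age * pal

def sbp_centered(age, pal):
    return (135 + 1.0 * (age - 34) - 20 * (pal - 1.6)
            - 1.0 * (age - 34) * (pal - 1.6))

# Identical predictions everywhere on a grid of plausible values.
for age in (20, 34, 50):
    for pal in (1.4, 1.6, 2.0):
        assert abs(sbp_original(age, pal) - sbp_centered(age, pal)) < 1e-9

print(round(sbp_original(34, 1.6), 6))  # 135.0 at the means, as the centered form shows
```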

To see this another way, rewrite the original equation as

SBP = 78.6 + (14 - 1.0 AGE) PAL + 2.6 AGE
• The change of SBP per unit change in PAL is (14 - 1.0 AGE). This is 14 when age is 0 but is -20 when age is equal to the more typical value of 34.
• As age increases, the effect of PAL grows stronger; that is, its coefficient becomes more negative.
• For the ages in the sample (20-50), the coefficient of PAL ranges from -6 to -36. Since PAL takes on values between 1.4 and 2.0, the full range of PAL (1.4 to 2.0) accounts for a difference in SBP of about 4 mm at the low end of age (20) and about 22 mm at the high end (50).

When the data are centered, the coefficient for AGE is the change in SBP per unit change in age when PAL is equal to its mean value, 1.6. The coefficient for PAL is the change in SBP per unit change in PAL when age is equal to its mean value, 34. In general, when data are centered, the coefficients for each individual variable are the changes in response per unit change in predictor when all other predictors are equal to their sample means. This is usually more informative to the reader than the change in response per unit change in predictor when all other predictors are equal to 0.
