[Some of these notes must involve more mathematical notation than others. This is one of them. However, the mathematics is nothing more than simple algebra.]
This note was prompted by a student's question about interactions. She noticed that many of the tolerances became low when she had two predictors in a multiple regression equation along with their interaction. She wondered what these low tolerances had to do with the collinearity that low tolerances usually signal.
There are two reasons why tolerances can be small. The first is true collinearity, that is, a linear relation among predictors. The second is highly correlated predictors, which raise concerns about computational accuracy and about whether individual coefficients and estimates of their contributions are numerically stable. In some cases where there is high correlation without a linear relation among the variables, the collinearity is avoidable and can be removed by centering, which transforms a variable by subtracting its mean (or another typical value) from all of the observations.
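To make the idea concrete, here is a small sketch with made-up numbers (the data and the `pearson` helper are mine, not the note's): for a symmetric set of X values, X and X^{2} are highly correlated, while the centered version and its square are not.

```python
# Sketch with made-up data: correlation between a predictor and its square,
# before and after centering (subtracting the mean).

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

x = [1, 2, 3, 4, 5]
z = [xi - 3 for xi in x]          # centered: the mean of x is 3

r_raw = pearson(x, [xi ** 2 for xi in x])        # X vs X^2
r_centered = pearson(z, [zi ** 2 for zi in z])   # Z vs Z^2

print(round(r_raw, 4))       # close to 1: nearly collinear
print(round(r_centered, 4))  # 0.0 here, because the centered values are symmetric
```

The second correlation is exactly zero only because these made-up X values are symmetric about their mean; with skewed data, centering reduces the correlation without eliminating it.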
Before we go further, let's look at some data. We'll consider the regression of Y on X and X^{2}, a case in which there can be no linear relation among the predictors.



[Data table: columns for Y, X, X^{2}, Z (= X - 3), and Z^{2} (= (X - 3)^{2}); the values are not reproduced in this copy.]
In this example, Z is the centered version of X, that is, Z = X - 3, where 3 is the mean of the Xs. We'll be referring to six different regressions: Y on X, Y on Z, Y on X^{2}, Y on Z^{2}, Y on X and X^{2}, and Y on Z and Z^{2}.
So that computer output will not clutter this note, I've placed it in a separate web page.
Things to notice:
Another way to see why the equations must be so similar is to recognize that because Z is X shifted by a constant, the correlation between Y and Z will be equal to the correlation between Y and X. Further, the SDs of X and Z will be equal, and the SD of Y is common to both equations. Thus, the three quantities that determine the regression coefficient (the SD of the response, the SD of the predictor, and the correlation between response and predictor) are the same for both equations!
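This shift-invariance is easy to check numerically. The sketch below uses made-up X and Y values (not the note's data): the slope of Y on Z = X - 3 matches the slope of Y on X, and only the intercept moves.

```python
# Made-up illustration: simple regression of Y on X and on Z = X - mean(X).
# The slope is unchanged by the shift; only the intercept changes.

def simple_ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]   # hypothetical responses
z = [xi - 3 for xi in x]        # centered predictor

bx, ax = simple_ols(x, y)
bz, az = simple_ols(z, y)

print(round(bx, 6), round(bz, 6))  # slopes agree
print(round(ax, 6), round(az, 6))  # intercepts differ by slope * 3
```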
The agreement is close because the two regressions are equivalent.
One question remains: In the regressions of Y on Z & Z^{2} and Y on X & X^{2}, why are the P values for the coefficients of Z^{2} and X^{2} the same while the P values for Z and X differ? The answer is supplied by the description of the P value as an indicator of the extent to which the variable adds predictive capability to the other variables in the model. We've already noted that the regression of Y on Z has the same predictive capability (R^{2}) as the regression of Y on X and the regression of Y on Z and Z^{2} has the same predictive capability as the regression of Y on X and X^{2}. Therefore, adding Z^{2} to Z has the same effect as adding X^{2} to X. We start from the same place (X and Z) and end at the same place (Z,Z^{2} and X,X^{2}), so the way we get there (Z^{2} and X^{2}) must be the same.
We've also noted that the regression of Y on Z^{2} does not have the same predictive capability (R^{2}) as the regression of Y on X^{2}. Since we start from different places (Z^{2} and X^{2}) and end at the same place (Z,Z^{2} and X,X^{2}), the way we get there (X and Z) must be different.
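Both observations can be verified with a small least-squares routine. Everything below is illustrative, with invented data; the helper solves the normal equations directly rather than calling a statistics package.

```python
# Invented data; verify that R^2 for Y on (X, X^2) equals R^2 for Y on (Z, Z^2),
# while R^2 for Y on X^2 alone differs from R^2 for Y on Z^2 alone.

def ols_r2(columns, y):
    """R^2 from least squares with an intercept; predictors given as columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(X[0])
    # Normal equations A b = v, with A = X'X and v = X'y.
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    v = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    # Gaussian elimination with partial pivoting.
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        v[j], v[piv] = v[piv], v[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            v[r] -= f * v[j]
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (v[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5, 6]
z = [xi - 3.5 for xi in x]             # centered at the mean, 3.5
y = [1.2, 3.1, 4.0, 4.2, 3.6, 2.0]     # hypothetical, curved response

r2_full_x = ols_r2([x, [xi ** 2 for xi in x]], y)
r2_full_z = ols_r2([z, [zi ** 2 for zi in z]], y)
r2_sq_x = ols_r2([[xi ** 2 for xi in x]], y)
r2_sq_z = ols_r2([[zi ** 2 for zi in z]], y)

print(abs(r2_full_x - r2_full_z) < 1e-8)  # True: same model, reparameterized
print(abs(r2_sq_x - r2_sq_z) < 1e-8)      # False: X^2 alone and Z^2 alone differ
```

The full models agree because {1, X, X^{2}} and {1, Z, Z^{2}} span the same set of fitted values; the squares-only models do not.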
Interactions behave in a similar fashion. Consider predicting Y from X, Z, and their interaction XZ, and predicting Y from (X - k_{x}), (Z - k_{z}), and their interaction (X - k_{x})(Z - k_{z}), where k_{x} and k_{z} are constants that center X and Z.
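A sketch of why the two fits must agree, with invented numbers: the centered product expands into a linear combination of 1, X, Z, and XZ, so the two models are reparameterizations of each other, even though centering can sharply reduce the correlation between a predictor and the interaction column.

```python
# The centered interaction is a linear recombination of 1, X, Z, and XZ:
#   (X - kx)(Z - kz) = XZ - kz*X - kx*Z + kx*kz
# so the centered model fits exactly as well as the raw one. Centering can
# nevertheless shrink the correlation that tolerance measures react to.
# All numbers below are made up.

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

kx, kz = 3.0, 1.6                      # arbitrary centering constants
for x_val, z_val in [(1.0, 2.0), (4.5, 0.5), (-2.0, 3.3)]:
    lhs = (x_val - kx) * (z_val - kz)
    rhs = x_val * z_val - kz * x_val - kx * z_val + kx * kz
    assert abs(lhs - rhs) < 1e-12      # the identity holds at every point

x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 3, 2, 4, 3]
xz = [xi * zi for xi, zi in zip(x, z)]
xc = [xi - 3.5 for xi in x]            # centered at the sample means
zc = [zi - 2.5 for zi in z]
xczc = [a * b for a, b in zip(xc, zc)]

print(round(pearson(x, xz), 2))        # high: raw X and XZ are nearly collinear
print(round(abs(pearson(xc, xczc)), 2))  # near zero after centering
```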
If there is a linear relation among the variables, centering will not remove it.
Comment: While I might worry about centering when fitting polynomial regressions (if I didn't use software specially designed for the purpose), I tend not to worry about it when fitting interactions. There has been more than a quarter century of research into the problems of numerical accuracy when fitting multiple regression equations. Most statistical software, including all of the software I use personally, makes use of this work and is fairly robust. In addition, I rarely fit anything more complicated than a first-order interaction, which won't grow any faster than a square. If the software shows a significant or important interaction, I tend to believe it regardless of any collinearity measure, because the effect of collinearity is to mask things. I would look more closely if collinearity measures were suspicious and an expected effect were non-significant.
Comment: Centering can make regression coefficients easier to understand. Consider an equation that predicts Systolic Blood Pressure (SBP) from AGE and Physical Activity Level (PAL, which ranges from 1.4 to 2.0 in a typical healthy adult population). An AGE by PAL interaction is included in the model because it is felt that age will have more of an effect at low Physical Activity Levels than at high levels. The resulting equation has the form SBP = b_{0} + b_{1} AGE + b_{2} PAL + b_{3} AGE PAL.
Those unaccustomed to dealing with interactions will be surprised to see that the coefficient of PAL is positive. This seems to suggest that exercise raises blood pressure! However, when the data are centered by subtracting the mean age, 34, from AGE and the mean PAL, 1.6, from PAL, the equation becomes one of the form SBP = b_{0}' + b_{1}' (AGE - 34) + b_{2}' (PAL - 1.6) + b_{3} (AGE - 34)(PAL - 1.6), in which the coefficient of the interaction is unchanged.
The two equations are the same, that is, they give the same predicted values, and simple algebra can be used to transform one into the other. Now, however, the coefficient of PAL is negative and the coefficient of age is less substantial. It's the interaction that's causing all the changes. If there were no interaction, the coefficient of PAL would be the change in SBP with each unit change in PAL. With the interaction in the model, this interpretation is correct only when the interaction term is 0. But that can happen only when age is 0, which is not true for anyone in this population.
To see this another way, rewrite the original equation as SBP = b_{0} + b_{1} AGE + (b_{2} + b_{3} AGE) PAL, which shows that the change in SBP per unit change in PAL is b_{2} + b_{3} AGE, not b_{2} alone.
When the data are centered, the coefficient for AGE is the change in SBP per unit change in age when PAL is equal to its mean value, 1.6. The coefficient for PAL is the change in SBP per unit change in PAL when age is equal to its mean value, 34. In general, when data are centered, the coefficients for each individual variable are the changes in response per unit change in predictor when all other predictors are equal to their sample means. This is usually more informative to the reader than the change in response per unit change in predictor when all other predictors are equal to 0.
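The expansion behind all of this can be checked with a few lines of algebra. The coefficients below are hypothetical stand-ins (the note's fitted SBP equation is not reproduced here); only the centering values, 34 and 1.6, come from the example.

```python
# Hypothetical coefficients (not the note's fitted values) for
# SBP = b0 + b1*AGE + b2*PAL + b3*AGE*PAL, centered at AGE = 34, PAL = 1.6.
b0, b1, b2, b3 = 90.0, 1.2, 10.0, -0.5   # b2 > 0, b3 < 0, as in the note's story

# Expanding with A = AGE - 34 and P = PAL - 1.6 gives the centered coefficients:
c0 = b0 + 34 * b1 + 1.6 * b2 + 34 * 1.6 * b3   # intercept: prediction at the means
c1 = b1 + 1.6 * b3                             # AGE effect at the mean PAL
c2 = b2 + 34 * b3                              # PAL effect at the mean AGE
c3 = b3                                        # interaction coefficient unchanged

def original(age, pal):
    return b0 + b1 * age + b2 * pal + b3 * age * pal

def centered(age, pal):
    a, p = age - 34, pal - 1.6
    return c0 + c1 * a + c2 * p + c3 * a * p

# The two parameterizations give identical predictions everywhere.
for age, pal in [(25, 1.4), (34, 1.6), (60, 2.0)]:
    assert abs(original(age, pal) - centered(age, pal)) < 1e-9

print(c2)   # -7.0: at the mean age, more activity predicts lower SBP
```

With these stand-in numbers, the raw PAL coefficient is +10 while the centered one is -7, reproducing the sign flip the note describes.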