**Partial Correlation Coefficients**
**Gerard E. Dallal, Ph.D.**

Scatterplots, correlation coefficients, and simple linear regression coefficients are inter-related. The scatterplot displays the data. The correlation coefficient measures linear association between the variables. The regression coefficient describes the linear association through a number that gives the expected change in the response per unit change in the predictor.

The coefficients of a multiple regression equation give the change in response per unit change in a predictor when all other predictors are held fixed. This raises the question of whether there are analogues to the correlation coefficient and the scatterplot to summarize the relation and display the data after adjusting for the effects of other variables.

This note answers these questions and illustrates the answers using the
crop yield example of Hooker reported by Kendall and Stuart in their
*Advanced Theory of Statistics*, Vol. 2, 3rd ed. (example 27.1).
Neither Hooker nor Kendall & Stuart provide the raw data, so I
have generated a set of random data with means, standard deviations, and
correlations identical to those given in K&S. These statistics are
sufficient for all of the methods that will be discussed here, so
the random data will be adequate. (*Sufficient* is a technical term
meaning that nothing else about the data has any effect on the analysis.
Any data set with the same values of the sufficient statistics will
produce these results.)

The variables are yields of "seeds' hay" in cwt per acre, spring rainfall in inches and the accumulated temperature above 42 F in the spring for an English area over 20 years. The plots suggest yield and rainfall are positively correlated, while yield and temperature are negatively correlated! This is borne out by the correlation matrix itself.

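Pearson Correlation Coefficients, N = 20. Each entry is r, with Prob > |r| under H0: Rho=0 in parentheses.

| | YIELD | RAIN | TEMP |
|---|---|---|---|
| YIELD | 1.00000 | 0.80031 (<.0001) | -0.39988 (0.0807) |
| RAIN | 0.80031 (<.0001) | 1.00000 | -0.55966 (0.0103) |
| TEMP | -0.39988 (0.0807) | -0.55966 (0.0103) | 1.00000 |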
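The note does not say how the random data were generated. One possible approach, sketched here with NumPy, is to whiten a random sample so that its sample covariance is exactly the identity and then impose the target correlations with a Cholesky factor. The function name `data_with_exact_corr` and the seed are hypothetical. The means and standard deviations from K&S are not reproduced in this note, so the sketch matches only the correlations.

```python
import numpy as np

# Target sample correlations copied from the matrix above
# (order: YIELD, RAIN, TEMP).
R_TARGET = np.array([
    [1.00000, 0.80031, -0.39988],
    [0.80031, 1.00000, -0.55966],
    [-0.39988, -0.55966, 1.00000],
])

def data_with_exact_corr(target, n, seed=0):
    """Return an n x k array whose sample correlation matrix equals `target`."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, target.shape[0]))
    z -= z.mean(axis=0)  # exact zero sample means
    # Whiten: make the sample covariance exactly the identity matrix.
    chol_sample = np.linalg.cholesky(np.cov(z, rowvar=False))
    white = z @ np.linalg.inv(chol_sample).T
    # Colour: impose the target correlation structure.
    return white @ np.linalg.cholesky(target).T

data = data_with_exact_corr(R_TARGET, n=20)
print(np.round(np.corrcoef(data, rowvar=False), 5))
```

Each column comes out with sample mean 0 and sample standard deviation 1. Rescaling a column as `mean + sd * column` would match any desired mean and standard deviation without changing the correlations.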

Just as the simple correlation coefficient between Y and X describes their joint behavior, the partial correlation describes the behavior of Y and X_{1} when X_{2}..X_{p} are held fixed.

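A partial correlation coefficient can be written in terms of simple
correlation coefficients. For the partial correlation between X and Y
holding Z fixed,

$$r_{XY|Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{(1 - r_{XZ}^{2})(1 - r_{YZ}^{2})}}$$

Thus, r_{XY|Z} = r_{XY} when X and Y are both uncorrelated with Z.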

A partial correlation between two variables can differ substantially from their simple correlation. Sign reversals are possible, too. For example, the partial correlation between YIELD and TEMPERATURE holding RAINFALL fixed is 0.09664. While it does not reach statistical significance (P = 0.694), the sample value is positive nonetheless.
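The sign reversal can be checked from the correlation matrix alone. The sketch below uses the standard formula for a first-order partial correlation in terms of the three simple correlations. The function name `partial_corr` is hypothetical. The t statistic assumes the usual test of a zero partial correlation, with n - 2 - k degrees of freedom when k variables are held fixed.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y holding Z fixed."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    return numerator / denominator

# Simple correlations from the matrix reported in this note.
r_yield_temp = -0.39988
r_yield_rain = 0.80031
r_temp_rain = -0.55966

# X = YIELD, Y = TEMP, Z = RAIN.
r_partial = partial_corr(r_yield_temp, r_yield_rain, r_temp_rain)

# t statistic for H0: partial correlation = 0.
n, k = 20, 1          # 20 years, one variable (RAIN) held fixed
df = n - 2 - k
t_stat = r_partial * math.sqrt(df / (1 - r_partial**2))

print(f"partial r = {r_partial:.5f}, t = {t_stat:.3f} on {df} df")
```

The computed partial correlation agrees with the 0.09664 quoted above, and a t statistic of about 0.40 on 17 degrees of freedom is consistent with the reported P = 0.694.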

The partial correlation between X & Y holding a set of variables fixed
will have the same sign as the multiple regression coefficient of X when
Y is regressed on X and the set of variables being held fixed. Also,

Just as the simple correlation coefficient describes the data in an ordinary scatterplot, the partial correlation coefficient describes the data in the partial regression residual plot.

Let Y and X_{1} be the variables of primary interest and let
X_{2}..X_{p} be the variables held fixed.

- First, calculate the residuals after regressing Y on X_{2}..X_{p}. These are the parts of the Ys that cannot be predicted by X_{2}..X_{p}.
- Then, calculate the residuals after regressing X_{1} on X_{2}..X_{p}. These are the parts of the X_{1}s that cannot be predicted by X_{2}..X_{p}.
- The partial correlation coefficient between Y and X_{1} adjusted for X_{2}..X_{p} is the correlation between these two sets of residuals.
- The regression coefficient when the Y residuals are regressed on the X_{1} residuals is equal to the regression coefficient of X_{1} in the multiple regression equation when Y is regressed on the entire set of predictors.
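The steps above can be sketched with NumPy. The data here are simulated purely for illustration (they are not the crop yield data), and the helper `residuals` and all variable names are hypothetical. The sketch holds a single variable fixed, but the helper accepts any number of columns.

```python
import numpy as np

def residuals(target, covariates):
    """Residuals from an OLS regression of `target` on `covariates` plus an intercept."""
    design = np.column_stack([np.ones(len(target)), covariates])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# Simulated data: x2 is the variable held fixed.
rng = np.random.default_rng(1)
n = 200
x2 = rng.standard_normal(n)
x1 = 0.6 * x2 + rng.standard_normal(n)
y = 1.0 + 0.5 * x1 - 0.8 * x2 + rng.standard_normal(n)

# Steps 1 and 2: remove what x2 can predict from y and from x1.
e_y = residuals(y, x2)
e_x1 = residuals(x1, x2)

# Step 3: the partial correlation is the correlation of the residuals.
partial_r = np.corrcoef(e_y, e_x1)[0, 1]

# Step 4: slope of the y residuals on the x1 residuals
# (no intercept is needed because both sets of residuals have mean zero).
slope_resid = (e_y @ e_x1) / (e_x1 @ e_x1)

# Coefficient of x1 in the full multiple regression of y on x1 and x2.
full_design = np.column_stack([np.ones(n), x1, x2])
full_coef, *_ = np.linalg.lstsq(full_design, y, rcond=None)
slope_full = full_coef[1]

print(partial_r, slope_resid, slope_full)
```

The two slopes agree up to rounding error, and the partial correlation has the same sign as the multiple regression coefficient of X_{1}.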

For example, the partial
correlation of YIELD and TEMP adjusted for RAIN is the correlation
between the residuals from regressing YIELD on RAIN and the residuals
from regressing TEMP on RAIN. In this partial regression residual plot,
the correlation is 0.09664. The regression coefficient of TEMP when
the YIELD residuals are regressed on the TEMP residuals is
0.003636. The multiple regression equation for the original data set is

Because the data are residuals, they
are centered about zero, so the plotted values do not resemble the
original values. This may be an advantage: it keeps them
from being misinterpreted as Y or X_{1} values "adjusted for
X_{2}..X_{p}".

While the regression of Y on X_{2}..X_{p} seems
reasonable, it is not uncommon to hear questions about adjusting
X_{1}. That is, some propose comparing the residuals of Y on
X_{2}..X_{p} with X_{1} directly.

This approach has been suggested many times over the years. Lately, it has been used in the field of nutrition by Willett and Stampfer (AJE, 124(1986):17-22) to produce "calorie-adjusted nutrient intakes", which are the residuals obtained by regressing nutrient intakes on total energy intake. These adjusted intakes are used as predictors in other regression equations. However, total energy intake does not appear in the equations and the response is not adjusted for total energy intake. Willett and Stampfer recognize this, but propose using calorie-adjusted intakes nonetheless. They suggest "calorie-adjusted values in multivariate models will overcome the problem of high-collinearity frequently observed between nutritional factors", but this is just an artifact of adjusting only some of the factors. The correlation between an adjusted factor and an unadjusted factor is never larger in magnitude than the correlation between two adjusted factors, and it is generally smaller.
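The size of this artifact can be worked out from simple correlations. The sketch below is an illustration using the crop yield correlations from this note, not nutrition data. It compares the correlation obtained when only one variable is adjusted (often called a semipartial or part correlation) with the partial correlation in which both are adjusted. The function names are hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of X and Y after adjusting BOTH for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def semipartial_corr(r_xy, r_xz, r_yz):
    """Correlation of X adjusted for Z with Y left UNADJUSTED."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz**2)

# X = TEMP, Y = YIELD, Z = RAIN (values from the correlation matrix in this note).
r_xy, r_xz, r_yz = -0.39988, -0.55966, 0.80031

both_adjusted = partial_corr(r_xy, r_xz, r_yz)
one_adjusted = semipartial_corr(r_xy, r_xz, r_yz)

# The two differ by the factor sqrt(1 - r_yz**2), which is at most 1.
shrink = math.sqrt(1 - r_yz**2)

print(f"both adjusted: {both_adjusted:.5f}")
print(f"one adjusted:  {one_adjusted:.5f}")
print(f"ratio:         {shrink:.5f}")
```

Because the one-adjusted correlation is the both-adjusted correlation multiplied by sqrt(1 - r_{YZ}^2), the apparent reduction in collinearity reflects how strongly the unadjusted variable is related to the adjusting variable, not any weakening of the relationship between the two factors.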

This method was first proposed before the ready availability of
computers as a way to approximate multiple regression with two
independent variables (regress Y on X_{1}, regress the residuals on X_{2}) and
was given the name two-stage regression. Today, however, it is
a mistake to use the approximation when the correct answer is easily
obtained. If the goal is to report on two variables after adjusting for
the effects of another set of variables, then both variables must be
adjusted.
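To see how far the approximation can stray, the following sketch compares the two-stage coefficient with the multiple regression coefficient on simulated data. The data, the helper `ols`, and the variable names are all hypothetical and chosen only for illustration.

```python
import numpy as np

def ols(target, *predictors):
    """OLS coefficients (intercept first) of `target` on the given predictors."""
    design = np.column_stack([np.ones(len(target)), *predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Simulated data with correlated predictors.
rng = np.random.default_rng(2)
n = 500
x1 = rng.standard_normal(n)
x2 = 0.7 * x1 + rng.standard_normal(n)
y = 2.0 + 1.0 * x1 + 1.5 * x2 + rng.standard_normal(n)

# Two-stage regression: regress y on x1, then regress those residuals
# on the UNADJUSTED x2.
stage1 = ols(y, x1)
resid_y = y - (stage1[0] + stage1[1] * x1)
b_two_stage = ols(resid_y, x2)[1]

# Multiple regression of y on x1 and x2 together.
b_multiple = ols(y, x1, x2)[2]

# Sample correlation between the two predictors.
r12 = np.corrcoef(x1, x2)[0, 1]

print(f"two-stage coefficient of x2: {b_two_stage:.4f}")
print(f"multiple regression coeff.:  {b_multiple:.4f}")
print(f"multiple * (1 - r12^2):      {b_multiple * (1 - r12**2):.4f}")
```

In this setup the two-stage coefficient equals the multiple regression coefficient multiplied by (1 - r12^2), so the approximation is close only when the two predictors are nearly uncorrelated.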