**Which Predictors Are More Important?**

Gerard E. Dallal, Ph.D.

When a multiple regression is fitted, it is not uncommon for someone to ask which predictors are more important. This is a reasonable question. There have been some attempts to come up with a purely statistical answer, but they are unsatisfactory. The question can be answered only in the context of a specific research question by using subject matter knowledge.

To focus the discussion, consider the regression equation for predicting HDL cholesterol presented earlier.

The REG Procedure

Dependent Variable: LHCHOL

Parameter Estimates

| Variable | Parameter Estimate | Standard Error | T | Pr > \|t\| | Standardized Estimate |
|---|---:|---:|---:|---:|---:|
| Intercept | 1.16448 | 0.28804 | 4.04 | <.0001 | 0 |
| AGE | -0.00092 | 0.00125 | -0.74 | 0.4602 | -0.05735 |
| BMI | -0.01205 | 0.00295 | -4.08 | <.0001 | -0.35719 |
| BLC | 0.05055 | 0.02215 | 2.28 | 0.0239 | 0.17063 |
| PRSSY | -0.00041 | 0.00044 | -0.95 | 0.3436 | -0.09384 |
| DIAST | 0.00255 | 0.00103 | 2.47 | 0.0147 | 0.23779 |
| GLUM | -0.00046 | 0.00018 | -2.50 | 0.0135 | -0.18691 |
| SKINF | 0.00147 | 0.00183 | 0.81 | 0.4221 | 0.07108 |
| LCHOL | 0.31109 | 0.10936 | 2.84 | 0.0051 | 0.20611 |

The predictors are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The regression coefficients range from 0.0004 to 0.3111 in magnitude.

One possibility is to measure the importance of a variable by the magnitude of its regression coefficient. This approach fails because the regression coefficients depend on the underlying scale of measurements. For example, the coefficient for AGE measures the expected difference in response for each year of difference in age. If age were recorded in months instead of years, the regression coefficient would be divided by 12, but surely the change in units does not change a variable's importance.
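The scale dependence can be checked numerically. The sketch below uses synthetic data (not the HDL data) and a hypothetical helper, `slope`, to fit the same response on age in years and on age in months. Under these assumptions the second slope should be one twelfth of the first, even though nothing about the relationship has changed.

```python
import numpy as np

# Synthetic data for illustration only; these are not the HDL data.
rng = np.random.default_rng(0)
n = 200
age_years = rng.uniform(20, 80, size=n)
y = 1.2 - 0.001 * age_years + rng.normal(0, 0.05, size=n)


def slope(x, y):
    """Least-squares slope of y on x, fitted with an intercept."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


b_years = slope(age_years, y)        # change in y per year of age
b_months = slope(age_years * 12, y)  # same ages, expressed in months

print("per year: ", b_years)
print("per month:", b_months)
print("per year / 12:", b_years / 12)
```

The last two printed values agree up to floating-point error, so ranking predictors by raw coefficient size would rank the same variable differently depending on the units chosen.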

Another possibility is to measure the importance of a variable by its observed significance level (P value). However, the distinction between statistical significance and practical importance applies here, too. Even if the predictors are measured on the same scale, a small coefficient that can be estimated precisely will have a small P value, while a large coefficient that is not estimated precisely will have a large P value.
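The P value is driven by the ratio of an estimate to its standard error, not by the size of the estimate alone. As a rough illustration, the sketch below recomputes that ratio from the rounded estimates and standard errors in the output above and compares the two orderings. Because the inputs are rounded, the recomputed ratios can differ slightly from the printed T column; the variable names `fit`, `by_coef`, and `by_t` are just illustrative.

```python
# (estimate, standard error) pairs copied from the regression output above.
fit = {
    "AGE":   (-0.00092, 0.00125),
    "BMI":   (-0.01205, 0.00295),
    "BLC":   (0.05055, 0.02215),
    "PRSSY": (-0.00041, 0.00044),
    "DIAST": (0.00255, 0.00103),
    "GLUM":  (-0.00046, 0.00018),
    "SKINF": (0.00147, 0.00183),
    "LCHOL": (0.31109, 0.10936),
}

# t ratio = estimate / standard error; a larger |t| means a smaller P value.
t_ratio = {name: est / se for name, (est, se) in fit.items()}

# Rank the predictors two ways: by coefficient size and by |t|.
by_coef = sorted(fit, key=lambda v: abs(fit[v][0]), reverse=True)
by_t = sorted(t_ratio, key=lambda v: abs(t_ratio[v]), reverse=True)

print("by |coefficient|:", by_coef)
print("by |t|:          ", by_t)
```

The two orderings disagree. GLUM has the second-smallest coefficient yet the third-largest t ratio, because its standard error is also very small, while BLC has the second-largest coefficient but only the fifth-largest t ratio.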

In an attempt to solve the problem of units of measurement, many
regression programs provide **standardized regression coefficients**.
Before fitting the multiple regression equation, all variables--response
and predictors--are standardized by subtracting the mean and dividing by
the standard deviation. The standardized regression coefficients, then,
represent the change in response, in standard deviations, for a change of
one standard deviation in a predictor. Some programs, like SPSS, report them
automatically, labeling them "Beta" while the ordinary coefficients are
labeled "B". Others, like SAS, provide them as an option and label them
"Standardized Estimate".
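A minimal sketch of the computation, using synthetic data and hypothetical helper names (`ols`, `zscore`), may make the definition concrete. It obtains the standardized coefficients two ways: by standardizing every variable before fitting, and by rescaling each ordinary coefficient by the ratio of the predictor's standard deviation to the response's standard deviation. The two routes should agree.

```python
import numpy as np

# Synthetic data for illustration only; these are not the HDL data.
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(50, 10, size=n)
x2 = rng.normal(120, 15, size=n)
y = 1.0 - 0.01 * x1 + 0.002 * x2 + rng.normal(0, 0.1, size=n)
X = np.column_stack([x1, x2])


def ols(X, y):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def zscore(a):
    """Subtract the mean and divide by the sample SD, column by column."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


b = ols(X, y)[1:]  # ordinary coefficients (slopes only)

# Route 1: standardize the response and the predictors, then fit.
std_fit = ols(zscore(X), zscore(y))
beta_from_z = std_fit[1:]

# Route 2: rescale each ordinary coefficient by sd(x_j) / sd(y).
beta_from_b = b * X.std(axis=0, ddof=1) / y.std(ddof=1)

print("ordinary:              ", b)
print("standardized (route 1):", beta_from_z)
print("standardized (route 2):", beta_from_b)
print("standardized intercept:", std_fit[0])
```

The intercept of the standardized fit is zero up to rounding error, which is why the output above lists 0 as the standardized estimate for the intercept.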

Advocates of standardized regression coefficients point out that the
coefficients are the same regardless of a predictor's underlying scale of
units. They also suggest that this removes the problem of comparing
*years* with *mm Hg* since each regression coefficient
represents the change in response per standard unit (one SD) change in a
predictor. However, this is illusory. There is no reason why a change of
one SD in one predictor should be equivalent to a change of one SD in
another predictor. Some variables are easy to change--the amount of time
watching television, for example. Others are more difficult--weight or
cholesterol level. Others are impossible--height or age.

The answer to which variable is most important depends on the specific context and why the question is being asked. The investigator and the analyst should consider specific changes in each predictor and the effect they would have on the response. Some predictors cannot be changed, regardless of their coefficients. This is not an issue if the question asks what most determines the response, but it is critical if the point of the exercise is to develop a public policy to effect a change in the response. When predictors can be modified, investigators will have to decide what changes are feasible and what changes are comparable. Cost will also enter into the discussion. For example, suppose a change in response can be obtained by either a large change in one predictor or a small change in another predictor. Depending on circumstances, it might prove more cost-effective to attempt the large change than the small change.
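One way to make "specific changes" concrete is to multiply each coefficient by the change under consideration, holding the other predictors fixed. The sketch below does this for two coefficients taken from the output above. The sizes of the changes are hypothetical, chosen only to show the arithmetic; in practice they would come from subject matter knowledge about what is feasible.

```python
# Coefficients copied from the regression output above (response: LHCHOL).
coef = {"BMI": -0.01205, "GLUM": -0.00046}

# Hypothetical changes, in each predictor's own measurement units.
# These values are assumptions for illustration, not recommendations.
change = {"BMI": -2.0, "GLUM": -10.0}

# Predicted change in LHCHOL = coefficient * change in the predictor,
# with all other predictors held fixed.
effect = {name: coef[name] * change[name] for name in coef}

for name, value in effect.items():
    print(f"{name}: change of {change[name]:+g} -> LHCHOL {value:+.4f}")
```

Under these assumed changes, the BMI reduction corresponds to a predicted difference of about 0.0241 in LHCHOL and the GLUM reduction to about 0.0046. Whether either change is achievable, and at what cost, is the part that the regression output cannot answer.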