Introduction to Multiple Linear Regression
Gerard E. Dallal, Ph.D.
If you are familiar with simple linear regression, then you know the
very basics of multiple linear regression. Once again, the goal is to
obtain the least squares equation (that is, the equation for which the
sum of squared residuals is a minimum) to predict some response. With
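simple linear regression there was one predictor. The fitted equation
was of the form

    Y = b0 + b1 X

With multiple linear regression there are several predictors, X1 through Xp, and the fitted equation has the form

    Y = b0 + b1 X1 + b2 X2 + ... + bp Xp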
The output from a multiple linear regression analysis will look familiar. Here is an example of cross-sectional data where the log of HDL cholesterol (the so-called good cholesterol) in women is predicted from their age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol.
    The REG Procedure
    Model: MODEL1
    Dependent Variable: LHCHOL

                        Analysis of Variance

                               Sum of        Mean
    Source             DF     Squares      Square   F Value   Pr > F
    Model               8     0.54377     0.06797      6.16   <.0001
    Error             147     1.62276     0.01104
    Corrected Total   155     2.16652

    Root MSE          0.10507    R-Square   0.2510
    Dependent Mean    1.71090    Adj R-Sq   0.2102
    Coeff Var         6.14105

                        Parameter Estimates

                       Parameter      Standard
    Variable   DF       Estimate         Error   t Value   Pr > |t|
    Intercept   1        1.16448       0.28804      4.04     <.0001
    AGE         1    -0.00092863       0.00125     -0.74     0.4602
    BMI         1       -0.01205       0.00295     -4.08     <.0001
    BLC         1        0.05055       0.02215      2.28     0.0239
    PRSSY       1    -0.00041910    0.00044109     -0.95     0.3436
    DIAST       1        0.00255       0.00103      2.47     0.0147
    GLUM        1    -0.00046737    0.00018697     -2.50     0.0135
    SKINF       1        0.00147       0.00183      0.81     0.4221
    LCHOL       1        0.31109       0.10936      2.84     0.0051
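The summary statistics in this output follow from the Analysis of Variance table by simple arithmetic. The Python sketch below recomputes them from the printed sums of squares and degrees of freedom. The variable names are illustrative, not part of the SAS output, and because the inputs are the rounded values shown above, the results should agree with the printed figures only up to rounding.

```python
import math

# Values copied from the Analysis of Variance table in the output.
ss_model, df_model = 0.54377, 8
ss_error, df_error = 1.62276, 147
ss_total, df_total = 2.16652, 155
dependent_mean = 1.71090

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F ratio for the overall test that all slope coefficients are zero.
f_value = ms_model / ms_error

# Root MSE is the square root of the error mean square.
root_mse = math.sqrt(ms_error)

# R-Square: the fraction of the total sum of squares explained by the model.
r_square = ss_model / ss_total

# Adjusted R-Square penalizes for the number of predictors.
adj_r_square = 1 - (1 - r_square) * df_total / df_error

# Coefficient of variation: Root MSE as a percentage of the dependent mean.
coeff_var = 100 * root_mse / dependent_mean

print(f"F Value   {f_value:.2f}")
print(f"Root MSE  {root_mse:.5f}")
print(f"R-Square  {r_square:.4f}")
print(f"Adj R-Sq  {adj_r_square:.4f}")
print(f"Coeff Var {coeff_var:.3f}")
```

Comparing the printed values with the corresponding entries in the output is a quick check that the table has been read correctly.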
To predict someone's logged HDL cholesterol, multiply the value of each predictor by its coefficient, add up the products, and add the intercept. Some coefficients are statistically significant; some are not. What we make of this or do about it depends on the particular research question.
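As a minimal sketch of that calculation, the Python snippet below copies the intercept and coefficients from the Parameter Estimates output and applies them to one set of predictor values. The function name `predict_log_hdl` and the predictor values are hypothetical, chosen only for illustration. They are not taken from the data, and the units of each variable are an assumption since the output does not state them.

```python
# Intercept and coefficients copied from the Parameter Estimates output.
intercept = 1.16448
coefficients = {
    "AGE": -0.00092863,
    "BMI": -0.01205,
    "BLC": 0.05055,
    "PRSSY": -0.00041910,
    "DIAST": 0.00255,
    "GLUM": -0.00046737,
    "SKINF": 0.00147,
    "LCHOL": 0.31109,
}


def predict_log_hdl(predictors):
    """Return the intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )


# Hypothetical predictor values (NOT from the source data; units assumed).
example = {
    "AGE": 50,
    "BMI": 25,
    "BLC": 1.0,
    "PRSSY": 120,
    "DIAST": 80,
    "GLUM": 90,
    "SKINF": 20,
    "LCHOL": 2.3,
}

predicted = predict_log_hdl(example)
print(f"Predicted log HDL cholesterol: {predicted:.4f}")
```

Setting every predictor to zero returns the intercept alone, which shows why the intercept has to be included in the sum.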
It is reasonable to think that statistical methods appearing in a wide variety of textbooks have the imprimatur of the statistical community and are meant to be used. However, multiple regression includes many methods that were investigated for the elegance of their mathematics. Some of these methods (such as stepwise regression and principal component regression) should not be used to analyze data. We will discuss these methods in future notes.
The analyst should be mindful from the start that multiple regression techniques should never be studied in isolation from data. What we do and how we do it can only be addressed in the context of a specific research question.