Introduction to Multiple Linear Regression
Gerard E. Dallal, Ph.D.

If you are familiar with simple linear regression, then you know the very basics of multiple linear regression. Once again, the goal is to obtain the least squares equation (that is, the equation for which the sum of squared residuals is a minimum) to predict some response. With simple linear regression there was one predictor. The fitted equation was of the form $\hat{Y} = b_0 + b_1 X$.

With multiple linear regression, there are multiple predictors. The fitted equation is of the form $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$,

where p is the number of predictors.
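
As a rough illustration of what "least squares" means with several predictors, the sketch below fits an intercept and three coefficients to made-up data by solving the normal equations. The data, the "true" coefficients, and the helper names (`fit_least_squares`, `predict`, `sse`) are all assumptions introduced for this example; in practice a statistics package does this work.

```python
import random


def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. Adequate for a small illustration only.
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the intercept b0
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(xi[i] * xi[j] for xi in X) for j in range(k)]
        + [sum(xi[i] * yi for xi, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(b, row):
    """Fitted value b0 + b1*x1 + ... + bp*xp for one observation."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


def sse(b, rows, y):
    """Sum of squared residuals for coefficients b."""
    return sum((yi - predict(b, r)) ** 2 for r, yi in zip(rows, y))


# Synthetic data, invented for illustration: 200 observations, p = 3 predictors.
random.seed(0)
true_b = [1.5, 0.5, -2.0, 0.25]  # hypothetical b0, b1, b2, b3
rows = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [predict(true_b, r) + random.gauss(0, 0.1) for r in rows]

b = fit_least_squares(rows, y)
print("estimated coefficients:", [round(v, 3) for v in b])
print("sum of squared residuals:", round(sse(b, rows, y), 4))
```

Any other choice of coefficients gives a larger sum of squared residuals on the same data, which is the sense in which the fitted equation is "least squares".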

The output from a multiple linear regression analysis will look familiar. Here is an example of cross-sectional data where the log of HDL cholesterol (the so-called good cholesterol) in women is predicted from their age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol.

```
                              The REG Procedure
Model: MODEL1
Dependent Variable: LHCHOL

Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     8        0.54377        0.06797       6.16    <.0001
Error                   147        1.62276        0.01104
Corrected Total         155        2.16652

Root MSE              0.10507    R-Square     0.2510
Dependent Mean        1.71090    Adj R-Sq     0.2102
Coeff Var             6.14105

Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.16448        0.28804       4.04      <.0001
AGE           1    -0.00092863        0.00125      -0.74      0.4602
BMI           1       -0.01205        0.00295      -4.08      <.0001
BLC           1        0.05055        0.02215       2.28      0.0239
PRSSY         1    -0.00041910     0.00044109      -0.95      0.3436
DIAST         1        0.00255        0.00103       2.47      0.0147
GLUM          1    -0.00046737     0.00018697      -2.50      0.0135
SKINF         1        0.00147        0.00183       0.81      0.4221
LCHOL         1        0.31109        0.10936       2.84      0.0051
```
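
The summary statistics in this output follow from the sums of squares and degrees of freedom by the usual formulas. The sketch below recomputes them from the printed values as a check on how the pieces fit together; the variable names are chosen for this example, and the results should agree with the output only up to rounding of the printed inputs.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_model, df_model = 0.54377, 8
ss_error, df_error = 1.62276, 147
ss_total, df_total = 2.16652, 155
dependent_mean = 1.71090

ms_model = ss_model / df_model             # Mean Square for Model
ms_error = ss_error / df_error             # Mean Square for Error
f_value = ms_model / ms_error              # F Value
root_mse = math.sqrt(ms_error)             # Root MSE
r_square = ss_model / ss_total             # R-Square
adj_r_square = 1 - ms_error / (ss_total / df_total)  # Adj R-Sq
coeff_var = 100 * root_mse / dependent_mean          # Coeff Var

# Each t Value is the estimate divided by its standard error (BMI row shown).
t_bmi = -0.01205 / 0.00295

for name, value in [
    ("Mean Square (Model)", ms_model),
    ("Mean Square (Error)", ms_error),
    ("F Value", f_value),
    ("Root MSE", root_mse),
    ("R-Square", r_square),
    ("Adj R-Sq", adj_r_square),
    ("Coeff Var", coeff_var),
    ("t Value (BMI)", t_bmi),
]:
    print(f"{name:<22}{value:10.5f}")
```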

To predict someone's logged HDL cholesterol, multiply the value of each predictor by its coefficient, add up the products, and add the intercept. Some coefficients are statistically significant; some are not. What we make of this or do about it depends on the particular research question.
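
The prediction arithmetic can be sketched as follows, using the parameter estimates from the output above. The predictor values for the example person are invented solely to show the calculation; the source does not state the units of the variables, so these numbers should not be read as realistic measurements.

```python
# Parameter estimates copied from the REG output above.
intercept = 1.16448
coefficients = {
    "AGE": -0.00092863,
    "BMI": -0.01205,
    "BLC": 0.05055,
    "PRSSY": -0.00041910,
    "DIAST": 0.00255,
    "GLUM": -0.00046737,
    "SKINF": 0.00147,
    "LCHOL": 0.31109,
}

# Hypothetical predictor values (assumptions for illustration only).
person = {
    "AGE": 50,
    "BMI": 25,
    "BLC": 1.0,
    "PRSSY": 120,
    "DIAST": 80,
    "GLUM": 90,
    "SKINF": 20,
    "LCHOL": 2.3,
}


def predict_lhchol(values):
    """Intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


predicted = predict_lhchol(person)
print(f"predicted log HDL cholesterol: {predicted:.4f}")
```

The result is on the log scale of the response, the same scale as the Dependent Mean reported in the output.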

Warning

It is reasonable to think that statistical methods appearing in a wide variety of textbooks have the imprimatur of the statistical community and are meant to be used. However, multiple regression includes many methods that were investigated for the elegance of their mathematics. Some of these methods (such as stepwise regression and principal component regression) should not be used to analyze data. We will discuss these methods in future notes.

The analyst should be mindful from the start that multiple regression techniques should never be studied in isolation from data. What we do and how we do it can only be addressed in the context of a specific research question.