Introduction to Multiple Linear Regression
Gerard E. Dallal, Ph.D.

If you are familiar with simple linear regression, then you already know the basics of multiple linear regression. Once again, the goal is to obtain the least squares equation (that is, the equation for which the sum of squared residuals is a minimum) to predict some response. With simple linear regression there was one predictor. The fitted equation was of the form

    \hat{Y} = b_0 + b_1 X

With multiple linear regression, there are multiple predictors. The fitted equation is of the form

    \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p

where p is the number of predictors.
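
As a rough illustration of what "least squares" means with more than one predictor, the sketch below fits an intercept and one coefficient per predictor by solving the normal equations in plain Python. The data are synthetic, and the function names (fit_least_squares, sum_squared_residuals) are hypothetical helpers written for this example; in practice a statistical package does this work.

```python
import random


def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bp], the least squares intercept and coefficients."""
    # Design matrix: a leading 1 for the intercept, then the p predictors.
    A = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(A[0])
    # Augmented normal equations [A'A | A'y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (M[i][k] - tail) / M[i][i]
    return b


def sum_squared_residuals(b, rows, y):
    """Sum of squared differences between observed and fitted responses."""
    total = 0.0
    for row, yi in zip(rows, y):
        fitted = b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))
        total += (yi - fitted) ** 2
    return total


# Synthetic data: y = 2 + 0.5*x1 - 1.5*x2 + noise (made-up numbers).
rng = random.Random(0)
rows = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
y = [2 + 0.5 * x1 - 1.5 * x2 + rng.gauss(0, 0.1) for x1, x2 in rows]

b = fit_least_squares(rows, y)
print("estimated coefficients:", [round(v, 3) for v in b])
print("sum of squared residuals:", round(sum_squared_residuals(b, rows, y), 4))
```

The fitted coefficients give a sum of squared residuals no larger than any other choice of coefficients would, including the values used to generate the data.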

The output from a multiple linear regression analysis will look familiar. Here is an example of cross-sectional data where the log of HDL cholesterol (the so-called good cholesterol) in women is predicted from their age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol.

                              The REG Procedure
                                Model: MODEL1
                         Dependent Variable: LHCHOL 

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     8        0.54377        0.06797       6.16    <.0001
Error                   147        1.62276        0.01104                     
Corrected Total         155        2.16652                                    

             Root MSE              0.10507    R-Square     0.2510
             Dependent Mean        1.71090    Adj R-Sq     0.2102
             Coeff Var             6.14105                       

                             Parameter Estimates

                          Parameter       Standard
     Variable     DF       Estimate          Error    t Value    Pr > |t|

     Intercept     1        1.16448        0.28804       4.04      <.0001
     AGE           1    -0.00092863        0.00125      -0.74      0.4602
     BMI           1       -0.01205        0.00295      -4.08      <.0001
     BLC           1        0.05055        0.02215       2.28      0.0239
     PRSSY         1    -0.00041910     0.00044109      -0.95      0.3436
     DIAST         1        0.00255        0.00103       2.47      0.0147
     GLUM          1    -0.00046737     0.00018697      -2.50      0.0135
     SKINF         1        0.00147        0.00183       0.81      0.4221
     LCHOL         1        0.31109        0.10936       2.84      0.0051
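
The summary statistics in this output are tied together by simple arithmetic. The sketch below recomputes them from the sums of squares and degrees of freedom printed above, using the usual textbook definitions; because the inputs are already rounded, the results agree with the printed values only to within rounding. The variable names are chosen for this example.

```python
import math

# Values copied from the Analysis of Variance table.
df_model, ss_model = 8, 0.54377
df_error, ss_error = 147, 1.62276
df_total, ss_total = 155, 2.16652
dependent_mean = 1.71090

# Mean square = sum of squares / degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F value = model mean square / error mean square.
f_value = ms_model / ms_error

# R-Square = share of the total sum of squares explained by the model.
r_square = ss_model / ss_total

# Adjusted R-Square penalizes for the number of predictors.
adj_r_square = 1 - (1 - r_square) * df_total / df_error

# Root MSE = square root of the error mean square.
root_mse = math.sqrt(ms_error)

# Coeff Var = Root MSE as a percentage of the dependent mean.
coeff_var = 100 * root_mse / dependent_mean


def t_value(estimate, std_error):
    """t Value in the Parameter Estimates table: estimate / standard error."""
    return estimate / std_error


print(f"Mean squares: {ms_model:.5f} (model), {ms_error:.5f} (error)")
print(f"F Value:      {f_value:.2f}")
print(f"R-Square:     {r_square:.4f}")
print(f"Adj R-Sq:     {adj_r_square:.4f}")
print(f"Root MSE:     {root_mse:.5f}")
print(f"Coeff Var:    {coeff_var:.3f}")
print(f"t for BMI:    {t_value(-0.01205, 0.00295):.2f}")
print(f"t for LCHOL:  {t_value(0.31109, 0.10936):.2f}")
```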

To predict someone's logged HDL cholesterol, take the values of the predictors, multiply each by its coefficient, add up the products, and add the intercept. Some coefficients are statistically significant; some are not. What we make of this or do about it depends on the particular research question.
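
A small sketch of that prediction, using the parameter estimates printed in the output. The predictor values for the person are hypothetical, made up purely for illustration, and they assume each variable is measured in the same units as the data used to fit the model (the units are not stated here).

```python
# Parameter estimates copied from the regression output.
intercept = 1.16448
coefficients = {
    "AGE": -0.00092863,
    "BMI": -0.01205,
    "BLC": 0.05055,
    "PRSSY": -0.00041910,
    "DIAST": 0.00255,
    "GLUM": -0.00046737,
    "SKINF": 0.00147,
    "LCHOL": 0.31109,
}

# Hypothetical predictor values for one person (not taken from the data).
person = {
    "AGE": 50,
    "BMI": 25,
    "BLC": 1.0,
    "PRSSY": 120,
    "DIAST": 80,
    "GLUM": 90,
    "SKINF": 20,
    "LCHOL": 2.3,
}

# Predicted value = intercept + sum of (coefficient * predictor value).
predicted_lhchol = intercept + sum(
    coefficients[name] * person[name] for name in coefficients
)

print(f"Predicted LHCHOL: {predicted_lhchol:.4f}")
```

The result is on the same logged scale as the dependent variable, LHCHOL, so it would have to be back-transformed to give HDL cholesterol in its original units.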


It is reasonable to think that statistical methods appearing in a wide variety of textbooks have the imprimatur of the statistical community and are meant to be used. However, multiple regression includes many methods that were investigated for the elegance of their mathematics. Some of these methods (such as stepwise regression and principal component regression) should not be used to analyze data. We will discuss these methods in future notes.

The analyst should be mindful from the start that multiple regression techniques should never be studied in isolation from data. What we do and how we do it can only be addressed in the context of a specific research question.

Copyright © 2001 Gerard E. Dallal