### Simplifying a Multiple Regression Equation: The Real Problem!

**Gerard E. Dallal, Ph.D.**

### [Early draft subject to change.]

When my students and colleagues ask me whether a particular
statistical method is appropriate, I invariably tell them to state their
research question and the answer will be clear. Applying the same
approach to regression models reveals the real barrier to using automatic
model fitting procedures to answer the question, "Which variables are
important?"

Let's back up. It is well known that when testing whether the mean
change produced by a treatment is different for two groups, it is not
appropriate to evaluate the mean change for each group separately. That
is, it is not appropriate to say the groups are different if the mean
change in one group is statistically significant while the other is not.
It may be that the mean changes are nearly identical, with the P value
for one group being slightly less than 0.05 and the other
slightly more than 0.05. To determine whether the mean changes for
the two groups differ, the changes have to be compared directly, perhaps
by using Student's t test for independent samples applied to changes for
the two groups.
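The point can be illustrated with a small sketch. The data below are made up for illustration (two hypothetical groups of 10 subjects each), and the function names are not from any particular package. Each group's mean change is tested against zero on its own, and then the two sets of changes are compared directly with the pooled-variance form of Student's t test. The critical values are the usual two-sided 5% points of the t distribution for 9 and 18 degrees of freedom.

```python
import math
import statistics

# Hypothetical changes (post minus pre) for two groups of 10 subjects each.
changes_a = [-2.0, -1.0, 0.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0]
changes_b = [-2.3, -1.3, -0.3, 0.7, 1.7, 1.7, 2.7, 3.7, 4.7, 5.7]


def one_sample_t(x):
    """t statistic for testing whether the mean of x is zero."""
    return statistics.mean(x) / (statistics.stdev(x) / math.sqrt(len(x)))


def two_sample_t(x, y):
    """Student's t statistic (pooled variance) for a difference in means."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Two-sided 5% critical values of Student's t.
T_CRIT_DF9 = 2.262   # each group alone: n - 1 = 9 degrees of freedom
T_CRIT_DF18 = 2.101  # direct comparison: 10 + 10 - 2 = 18 degrees of freedom

t_a = one_sample_t(changes_a)
t_b = one_sample_t(changes_b)
t_diff = two_sample_t(changes_a, changes_b)

print(f"group A alone: t = {t_a:.3f}, significant: {abs(t_a) > T_CRIT_DF9}")
print(f"group B alone: t = {t_b:.3f}, significant: {abs(t_b) > T_CRIT_DF9}")
print(f"A versus B:    t = {t_diff:.3f}, significant: {abs(t_diff) > T_CRIT_DF18}")
```

In this constructed example one group's change clears the 5% threshold and the other's does not, yet the direct comparison gives no evidence that the two mean changes differ.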

There's a similar problem with simplifying multiple regression models.
The automatic techniques find **a** model that fits the data. However,
the question isn't just a matter of what model fits the data, but what
model is demonstrably better than all other models in terms of fit or
relevance. In order to do this, automatic procedures would have to
compare models to each other directly, but they don't! At least the
stepwise procedures don't.

The "all possible models" approach may suffer from trying to summarize
a model in a single number and it certainly overestimates the utility of
the models it identifies as best. However, unlike the stepwise
procedures, the "all possible models" approach gives the analyst a feel
for competing models. Unlike the automatic stepwise procedures, which
generate a single sequence of models, the "all possible models" approach
forces the analyst to come to grips with the fact that there may be many
models that look quite different from each other but fit the data almost
equally well. However, because the technique overstates the value of the
models it identifies as best, it is still necessary for those models to
be evaluated in another dataset.
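A minimal sketch of the "all possible models" idea follows. Everything in it is an assumption made for illustration: the data are simulated, the predictor names `x1`, `x2`, `x3` are hypothetical, `x2` is deliberately built as a noisy copy of `x1`, and R-squared stands in for whatever single-number summary an analyst might prefer. The least-squares fit is done with a small hand-written solver of the normal equations so the example needs only the standard library.

```python
import itertools
import random


def ols_r2(columns, y):
    """R-squared of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - rss / tss


# Simulated data: x2 is nearly a copy of x1; y depends on x1 and x3.
rng = random.Random(0)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [v + rng.gauss(0, 0.2) for v in x1]
x3 = [rng.gauss(0, 1) for _ in range(n)]
y = [a + c + rng.gauss(0, 1) for a, c in zip(x1, x3)]
predictors = {"x1": x1, "x2": x2, "x3": x3}

# Fit every non-empty subset of predictors and record its R-squared.
r2 = {}
for size in range(1, len(predictors) + 1):
    for names in itertools.combinations(predictors, size):
        r2[names] = ols_r2([predictors[nm] for nm in names], y)

# List the models from best to worst fit.
for names, value in sorted(r2.items(), key=lambda kv: -kv[1]):
    print(f"{' + '.join(names):<14} R^2 = {value:.3f}")
```

Listing the models side by side shows that the fit using `x2` and `x3` is nearly as good as the one using `x1` and `x3`, even though the two would tell different stories about which variables are "important". The R-squared values are computed on the same data used to fit the models, so they are optimistic for the reason given above.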


Copyright © 2003
Gerard E. Dallal