Sometimes the same model is fitted to two different populations. For
example, a researcher might wish to investigate whether weight predicts
blood pressure in smokers and nonsmokers and, if so, whether the
regression model fits one group better than the other. The problem with
questions like this is that the answer depends on what we mean by
*better*.

It is common to hear investigators speak of the model with the larger
**coefficient of determination**, R^{2}, as though it fits
better because it accounts for more of the variability in the response.
However, it is possible for the model with the smaller R^{2} to
have the smaller standard error of the estimate and make more precise
predictions. Here is a small dataset to illustrate this behavior.

X (black) | Y (black) | X (red) | Y (red) |
---|---|---|---|
158.2 | 157.8 | 140.4 | 153.2 |
214.9 | 146.6 | 211.9 | 157.4 |
153.2 | 147.5 | 152.4 | 149.6 |
196.0 | 153.1 | 124.7 | 154.9 |
88.5 | 143.7 | 103.9 | 145.2 |
55.5 | 132.3 | 128.5 | 141.7 |
86.4 | 144.3 | 187.1 | 159.7 |
223.6 | 169.1 | 168.5 | 145.3 |
256.9 | 160.9 | 138.3 | 151.7 |
252.4 | 157.1 | 137.9 | 141.7 |
20.9 | 141.6 | 203.3 | 153.3 |
92.9 | 145.4 | 102.5 | 145.8 |

The two data sets need not have the same regression line, but they have been constructed with the same regression line in this example to remove any suspicion that these results might have something to do with the slopes of the regression lines. They don't!

Fitted line for both data sets: Y = 134.9 + 0.100 X

Group | R^{2} | s_{e} |
---|---|---|
Red | 0.36 | 5.04 |
Black | 0.64 | 6.28 |
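The summary statistics can be recomputed from the raw data. The sketch below is a minimal least-squares fit in plain Python. It takes the first X, Y column pair of the data table as the black data set and the second pair as the red one, which is an assumption about the table layout that can be checked by comparing the printed values with the summary table. The function and variable names are illustrative.

```python
import math

# Assumption: first X, Y pair in the data table = black, second pair = red.
black_x = [158.2, 214.9, 153.2, 196.0, 88.5, 55.5,
           86.4, 223.6, 256.9, 252.4, 20.9, 92.9]
black_y = [157.8, 146.6, 147.5, 153.1, 143.7, 132.3,
           144.3, 169.1, 160.9, 157.1, 141.6, 145.4]
red_x = [140.4, 211.9, 152.4, 124.7, 103.9, 128.5,
         187.1, 168.5, 138.3, 137.9, 203.3, 102.5]
red_y = [153.2, 157.4, 149.6, 154.9, 145.2, 141.7,
         159.7, 145.3, 151.7, 141.7, 153.3, 145.8]


def fit_line(x, y):
    """Simple linear regression of y on x.

    Returns (intercept, slope, r_squared, s_e), where s_e is the
    standard error of the estimate, sqrt(Residual SS / (n - 2)).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    residual_ss = sum((yi - intercept - slope * xi) ** 2
                      for xi, yi in zip(x, y))
    r_squared = 1 - residual_ss / total_ss
    s_e = math.sqrt(residual_ss / (n - 2))
    return intercept, slope, r_squared, s_e


results = {
    "red": fit_line(red_x, red_y),
    "black": fit_line(black_x, black_y),
}
for group, (a, b, r2, s_e) in results.items():
    print(f"{group:5s}  Y = {a:.1f} + {b:.3f} X   "
          f"R^2 = {r2:.2f}   s_e = {s_e:.2f}")
```

Under that column assignment, the printed values should agree with the summary table to the precision shown there.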

The **black** data set, with open circles and outer prediction
bands, has the **larger R^{2}**. The **red** data set, whose prediction
bands lie inside the black ones, has the **smaller s_{e}**.

Does the model fit one group better than the other? I try to avoid
questions demanding one-word answers when the answer depends on the
choice of summary measure. However, if pressed, I would argue that the
answer is red. R^{2} is just a
disguised correlation coefficient (the square of the correlation between
the observed and predicted values). I have yet to encounter a real
research question for which the answer is "correlation coefficient". If I
were to use "better" in connection with linear regression it would almost
certainly have something to do with prediction. The standard error of the
estimate (s_{e}) estimates the precision of the predictions. The
precision of the predictions typically determines whether the regression
equation will be useful. While the regression equation may account for
more variability in the black group, the predictions are more precise in
the red group.

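R^{2} can be written as

R^{2} = 1 - Residual SS / Total SS,

while s_{e} = \sqrt{Residual SS / (n - 2)} depends on Residual SS alone.
The black group has the larger Residual SS, and hence the larger
s_{e}, but its Total SS is larger still, so the ratio
Residual SS/Total SS is smaller and R^{2} is larger.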
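A quick arithmetic check makes the two sums of squares concrete. Rearranging R^{2} = 1 - Residual SS / Total SS and s_{e}^{2} = Residual SS / (n - 2) gives Residual SS = s_{e}^{2} (n - 2) and Total SS = Residual SS / (1 - R^{2}). The sketch below applies these to the rounded values in the summary table with n = 12 points per group, so the results are only approximate. The variable names are illustrative.

```python
n = 12  # observations per group, from the data table

# (R^2, s_e) as rounded in the summary table
summary = {"red": (0.36, 5.04), "black": (0.64, 6.28)}

sums_of_squares = {}
for group, (r_squared, s_e) in summary.items():
    residual_ss = s_e ** 2 * (n - 2)            # from s_e^2 = Residual SS / (n - 2)
    total_ss = residual_ss / (1 - r_squared)    # from R^2 = 1 - Residual SS / Total SS
    sums_of_squares[group] = (residual_ss, total_ss)
    print(f"{group:5s}  Residual SS ~ {residual_ss:.0f}   Total SS ~ {total_ss:.0f}")
```

The black group comes out with the larger Residual SS and with a Total SS more than twice that of the red group, which is how it can have both the larger s_{e} and the larger R^{2}.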

These results apply when the same model is fitted to two different
sets of observations. If two models were fitted to the same set of
responses--for example, if weight and amount of exercise were used
separately to predict blood pressure in the same set of individuals--then
the model for which R^{2} is larger would necessarily be the
model for which s_{e} is smaller. That's because Total SS
would be the same for both, so the model for which
Residual SS/Total SS is smaller must also be the one for which
Residual SS is smaller.
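The same-response case can be checked by simulation. The sketch below uses synthetic, seeded data (not real measurements) with hypothetical variables named `weight`, `exercise`, and `blood_pressure`, and arbitrary coefficients chosen only for illustration. Each single-predictor model is fitted to the same responses, and the code counts how often the model with the larger R^{2} is also the one with the smaller s_{e}. Both models have one predictor, so both use the same n - 2 divisor.

```python
import math
import random


def r2_and_se(x, y):
    """Return (R^2, s_e) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    residual_ss = sum((yi - intercept - slope * xi) ** 2
                      for xi, yi in zip(x, y))
    return 1 - residual_ss / total_ss, math.sqrt(residual_ss / (n - 2))


rng = random.Random(0)   # fixed seed so the run is repeatable
n, trials = 30, 500
agreements = 0
for _ in range(trials):
    # Synthetic predictors and response; coefficients are arbitrary.
    weight = [rng.gauss(0, 1) for _ in range(n)]
    exercise = [rng.gauss(0, 1) for _ in range(n)]
    blood_pressure = [0.5 * w - 0.3 * e + rng.gauss(0, 1)
                      for w, e in zip(weight, exercise)]

    r2_w, se_w = r2_and_se(weight, blood_pressure)
    r2_e, se_e = r2_and_se(exercise, blood_pressure)

    # Same responses, so Total SS is shared: larger R^2 <=> smaller s_e.
    if (r2_w > r2_e) == (se_w < se_e):
        agreements += 1

print(f"orderings agree in {agreements} of {trials} trials")
```

Because Total SS is identical for the two fits in every trial, the two orderings should agree every time.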