**An Underappreciated Consequence of
Sample Size Calculations As They Are Usually
Performed**

[Someday, this will get integrated into the main sample size note. At the moment, I don't know how to do it without the message getting lost, so I've created this separate note with a provocative title so that it will be noticed.]

My concern with sample size calculations is related to the distinction
between significance tests and confidence intervals. Sample size
calculations as described in these notes and most textbooks are designed
to answer the very simple question, *"How many observations are needed
so that an effect can be detected?"* While this question is often of
great interest, the *magnitude* of the effect is often equally
important. While it is possible to sidestep the issue by asserting
that the magnitude of the effect can be assessed after it's determined that
there's something to assess, the question must be addressed at some
point.

As with significance tests, knowing whether there is an effect tells you something, but leaves a lot unsaid. There are statistically significant effects of great practical importance and statistically significant effects of no practical importance. The problem with sample size calculations as they are usually performed is that there is a substantial chance that one end of the confidence interval will include values of no practical importance. Thus, while an experiment has a large chance of demonstrating the effect if it is what the investigators expect, there is a good chance that the corresponding confidence interval might leave open the possibility that the effect is quite small.

For example, consider a
comparison of two population means where the expected mean difference and
*known* within group standard deviation are both equal to 1. The
standard deviation is treated as known for this example to keep the
mathematics manageable. A sample of 16
subjects per group gives an 81% chance that the hypothesis of no
difference will be rejected by Student's t test at the 0.05 level of
significance.
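The 81% figure can be checked with a short calculation. This is a sketch of my own (the function name and structure are illustrative, not from the note), using the two-sample z test that applies when the within group standard deviation is known, so the test statistic is exactly normal:

```python
from statistics import NormalDist

def power_two_sample_z(diff, sd, n_per_group, alpha=0.05):
    """Power of the two-sided two-sample z test for a difference in means."""
    norm = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5    # SE of the difference in means
    z_crit = norm.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    # Probability the standardized difference exceeds the critical value
    # (ignoring the negligible chance of rejecting in the wrong direction)
    return norm.cdf(diff / se - z_crit)

print(round(power_two_sample_z(diff=1, sd=1, n_per_group=16), 2))  # → 0.81
```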

The picture at the left shows what happens to the lower limit of the 95% confidence interval for the population mean difference when the underlying mean difference and within group standard deviation are both 1. There is a 20% chance that the lower confidence limit will be less than 0, in keeping with the 20% chance that the experiment will fail to show a statistically significant difference. As the curve demonstrates, there is also a 50% chance that the lower limit will be less than 0.31 and a 70% chance that it will be less than 0.50; that is, there is a 70% chance that the lower limit of the 95% CI will be less than half of the expected effect!
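These probabilities follow from the fact that, with the standard deviation known, the lower limit is the observed difference minus 1.96 standard errors, and so is itself normally distributed. A sketch of that reasoning (my reconstruction under the same known-SD model, not the note's original computation):

```python
from statistics import NormalDist

norm = NormalDist()
diff, sd, n = 1.0, 1.0, 16
se = sd * (2 / n) ** 0.5          # SE of the difference in means
z = norm.inv_cdf(0.975)           # 1.96

# Lower 95% limit = observed difference - z * se, so it is normal with
# mean (true difference - z * se) and standard deviation se.
lower = NormalDist(mu=diff - z * se, sigma=se)

print(round(lower.cdf(0.0), 2))   # P(lower limit < 0)
print(round(lower.cdf(0.31), 2))  # P(lower limit < 0.31)
print(round(lower.cdf(0.50), 2))  # P(lower limit < 0.50)
```

The three printed values match the roughly 20%, 50%, and 70% figures read off the curve.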

This is not a problem if the goal of a study is merely to
demonstrate a difference in population means. If the goal is to estimate
the difference accurately, the sample size calculations must take this
into account, perhaps by using a method such as the one presented by
Kupper and Hafner in their 1989 article "How Appropriate Are Popular
Sample Size Formulas?" (*The American Statistician*, vol 43,
pp 101-105).
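One way to see what such a method demands (a sketch in the same spirit, not Kupper and Hafner's actual formula; the function name, the 0.5 floor, and the 80% assurance level are my illustrative choices) is to ask for the smallest sample size such that the lower 95% confidence limit has a high probability of exceeding a value of practical importance:

```python
import math
from statistics import NormalDist

def n_for_lower_limit(diff, sd, floor, assurance=0.80, alpha=0.05):
    """Smallest n per group so that P(lower CI limit > floor) >= assurance,
    for a two-sample comparison with known within-group SD."""
    norm = NormalDist()
    z_ci = norm.inv_cdf(1 - alpha / 2)
    z_assure = norm.inv_cdf(assurance)
    # Need (diff - floor) / SE >= z_ci + z_assure, where SE = sd*sqrt(2/n)
    se_max = (diff - floor) / (z_ci + z_assure)
    return math.ceil(2 * (sd / se_max) ** 2)

print(n_for_lower_limit(diff=1, sd=1, floor=0.5))  # → 63
```

With the example's values, requiring an 80% chance that the lower limit exceeds half the expected effect calls for 63 subjects per group, roughly four times the 16 that suffice for 81% power.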