**An Underappreciated Consequence of
Sample Size Calculations As They Are Usually
Performed**

[Someday, this will get integrated into the main sample size note. At the moment, I don't know how to do it without the message getting lost, so I've created this separate note with a provocative title so that it will be noticed.]

My concern with sample size calculations is related to the distinction
between significance tests and confidence intervals. Sample size
calculations as described in these notes and most textbooks are designed
to answer the very simple question, *"How many observations are needed
so that an effect can be detected?"* While this question is often of
great interest, the *magnitude* of the effect is often equally
important. While it is possible to sidestep the issue by asserting
that the magnitude of the effect can be assessed after it's determined that
there's something to assess, the question must be addressed at some
point.

As with significance tests, knowing whether there is an effect tells you something, but leaves a lot unsaid. There are statistically significant effects of great practical importance and statistically significant effects of no practical importance. The problem with sample size calculations as they are usually performed is that there is a substantial chance that one end of the confidence interval will include values of no practical importance. Thus, while an experiment has a large chance of demonstrating the effect if it is what the investigators expect, there is a good chance that the corresponding confidence interval might leave open the possibility that the effect is quite small.

For example, consider a
comparison of two population means where the expected mean difference and
*known* within group standard deviation are both equal to 1. The
standard deviation is treated as known for this example to keep the
mathematics manageable. A sample of 16
subjects per group gives an 81% chance that the hypothesis of no
difference will be rejected by Student's t test at the 0.05 level of
significance.
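The 81% figure can be checked with a short calculation. This is a sketch of my own (the function name and structure are illustrative, not from the note), using the two-sample z test that applies when the within group standard deviation is known, so the test statistic is exactly normal:

```python
from statistics import NormalDist

def power_two_sample_z(diff, sd, n_per_group, alpha=0.05):
    """Power of the two-sided two-sample z test for a difference in means."""
    norm = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5    # SE of the difference in means
    z_crit = norm.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    # Probability the standardized difference exceeds the critical value
    # (ignoring the negligible chance of rejecting in the wrong direction)
    return norm.cdf(diff / se - z_crit)

print(round(power_two_sample_z(diff=1, sd=1, n_per_group=16), 2))  # → 0.81
```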

The picture at the left shows what happens to the lower limit of the 95% confidence interval for the population mean difference when the underlying mean difference and within group standard deviation are both 1. There is a 20% chance that the lower confidence limit will be less than 0, in keeping with the 20% chance that the experiment will fail to show a statistically significant difference. As the curve demonstrates, there is also a 50% chance that the lower limit will be less than 0.31 and a 70% chance that it will be less than 0.50; that is, there is a 70% chance that the lower limit of the 95% CI will be less than half of the expected effect!
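These probabilities follow from the fact that, with the standard deviation known, the lower limit is the observed difference minus 1.96 standard errors, and so is itself normally distributed. A sketch of that reasoning (my reconstruction under the same known-SD model, not the note's original computation):

```python
from statistics import NormalDist

norm = NormalDist()
diff, sd, n = 1.0, 1.0, 16
se = sd * (2 / n) ** 0.5          # SE of the difference in means
z = norm.inv_cdf(0.975)           # 1.96

# Lower 95% limit = observed difference - z * se, so it is normal with
# mean (true difference - z * se) and standard deviation se.
lower = NormalDist(mu=diff - z * se, sigma=se)

print(round(lower.cdf(0.0), 2))   # P(lower limit < 0)
print(round(lower.cdf(0.31), 2))  # P(lower limit < 0.31)
print(round(lower.cdf(0.50), 2))  # P(lower limit < 0.50)
```

The three printed values match the roughly 20%, 50%, and 70% figures read off the curve.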

This is not a problem if the goal of a study is merely to
demonstrate a difference in population means. If the goal is to estimate
the difference accurately, the sample size calculations must take this
into account, perhaps by using a method such as the one presented by
Kupper and Hafner in their 1989 article "How Appropriate Are Popular
Sample Size Formulas?" (*The American Statistician*, vol 43,
pp 101-105).
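One way to see what such a method demands (a sketch in the same spirit, not Kupper and Hafner's actual formula; the function name, the 0.5 floor, and the 80% assurance level are my illustrative choices) is to ask for the smallest sample size such that the lower 95% confidence limit has a high probability of exceeding a value of practical importance:

```python
import math
from statistics import NormalDist

def n_for_lower_limit(diff, sd, floor, assurance=0.80, alpha=0.05):
    """Smallest n per group so that P(lower CI limit > floor) >= assurance,
    for a two-sample comparison with known within-group SD."""
    norm = NormalDist()
    z_ci = norm.inv_cdf(1 - alpha / 2)
    z_assure = norm.inv_cdf(assurance)
    # Need (diff - floor) / SE >= z_ci + z_assure, where SE = sd*sqrt(2/n)
    se_max = (diff - floor) / (z_ci + z_assure)
    return math.ceil(2 * (sd / se_max) ** 2)

print(n_for_lower_limit(diff=1, sd=1, floor=0.5))  # → 63
```

With the example's values, requiring an 80% chance that the lower limit exceeds half the expected effect calls for 63 subjects per group, roughly four times the 16 that suffice for 81% power.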