LARGE SAMPLE Formulas for Confidence Intervals
Involving Population Means

All of these 95% confidence intervals have the same form: the point estimate plus or minus 1.96 times the appropriate measure of uncertainty for the point estimate.

A 95% confidence interval for a single population mean is

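   $\bar{x} \pm 1.96\,\mathrm{SE}(\bar{x})$   or   $\bar{x} \pm 1.96\,\dfrac{s}{\sqrt{n}}$   or   $\bar{x} \pm 1.96\sqrt{\dfrac{s^2}{n}}$.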
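
As a minimal sketch of the single-mean interval, the following computes the sample mean plus or minus 1.96 times the standard error of the mean, s/sqrt(n). The function name `mean_ci_95` and the data are hypothetical, chosen only for illustration; the data set is kept short for readability even though the formula is meant for large samples.

```python
import math
import statistics


def mean_ci_95(sample):
    """Large-sample 95% CI for a population mean: mean +/- 1.96 * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)   # sample SD, computed with divisor n - 1
    sem = s / math.sqrt(n)         # standard error of the mean
    half_width = 1.96 * sem
    return xbar - half_width, xbar + half_width


# Made-up measurements, for illustration only.
data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.2, 4.9]
low, high = mean_ci_95(data)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```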

A 95% confidence interval for the difference between two population means, $\mu_x - \mu_y$, is

   $(\bar{x} - \bar{y}) \pm 1.96\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}$.
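
The interval for a difference that makes no equal-variance assumption measures uncertainty by the square root of the sum of each sample variance divided by its sample size. A small sketch, with the hypothetical function name `diff_ci_95_unpooled` and made-up data:

```python
import math
import statistics


def diff_ci_95_unpooled(x, y):
    """Large-sample 95% CI for mu_x - mu_y without assuming equal variances."""
    nx, ny = len(x), len(y)
    diff = statistics.fmean(x) - statistics.fmean(y)
    # Standard error of the difference: sqrt(sx^2 / nx + sy^2 / ny)
    se = math.sqrt(statistics.variance(x) / nx + statistics.variance(y) / ny)
    return diff - 1.96 * se, diff + 1.96 * se


# Made-up samples, for illustration only.
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]
y = [10.8, 11.1, 10.2, 11.6, 10.9, 10.5, 11.3]
low, high = diff_ci_95_unpooled(x, y)
print(f"95% CI for the difference: ({low:.3f}, {high:.3f})")
```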

When population standard deviations are equal, a 95% confidence interval for the difference between two population means is

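   $(\bar{x} - \bar{y}) \pm 1.96\, s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}$   or   $(\bar{x} - \bar{y}) \pm 1.96\sqrt{s_p^2\left(\dfrac{1}{n_x} + \dfrac{1}{n_y}\right)}$   or   $(\bar{x} - \bar{y}) \pm 1.96\sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}$,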

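where $s_p$ is the pooled sample standard deviation, so called because it combines, or pools, the information from both samples to estimate their common population variance:

   $s_p^2 = \dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$   or   $s_p^2 = \dfrac{\sum_i (x_i - \bar{x})^2 + \sum_j (y_j - \bar{y})^2}{n_x + n_y - 2}$.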

Both expressions are informative. The first shows that $s_p^2$ is a weighted combination of the individual sample variances, with weights equal to one less than the sample sizes. The second shows that it is calculated by summing up the squared deviations from each sample and dividing by 2 less than the combined sample size. It's worth noting that

   $n_x + n_y - 2 = (n_x - 1) + (n_y - 1)$.
The right-hand side is the sum of the denominators that are used when calculating the individual SDs.
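
The two ways of computing the pooled variance can be checked against each other numerically. The sketch below computes it once as a weighted combination of the sample variances and once from the summed squared deviations, then uses it in the equal-variance interval. All function names and data are hypothetical, for illustration only.

```python
import math
import statistics


def pooled_variance_weighted(x, y):
    """Sample variances weighted by (n - 1), divided by nx + ny - 2."""
    nx, ny = len(x), len(y)
    weighted = (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    return weighted / (nx + ny - 2)


def pooled_variance_deviations(x, y):
    """Squared deviations from each sample's own mean, summed, over nx + ny - 2."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    ss = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    return ss / (len(x) + len(y) - 2)


def diff_ci_95_pooled(x, y):
    """Large-sample 95% CI for mu_x - mu_y assuming equal population variances."""
    sp = math.sqrt(pooled_variance_weighted(x, y))
    se = sp * math.sqrt(1 / len(x) + 1 / len(y))
    diff = statistics.fmean(x) - statistics.fmean(y)
    return diff - 1.96 * se, diff + 1.96 * se


# Made-up samples, for illustration only.
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]
y = [10.8, 11.1, 10.2, 11.6, 10.9, 10.5, 11.3]
print(pooled_variance_weighted(x, y), pooled_variance_deviations(x, y))
print(diff_ci_95_pooled(x, y))
```

The two variance functions should agree up to floating-point rounding, which is the point of the identity for the denominator.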

In general, when there are many ways to answer a question, the approach that makes assumptions is better in some sense when the assumptions are met. The 95% CIs that assume equal population variances will have true coverage closer to 95% for smaller sample sizes if the population variances are, in fact, equal. The downside is that the population variances have to be equal (or not so different that it matters).
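
The coverage claim can be explored with a small simulation. The sketch below is only an illustration under assumptions of my choosing: normally distributed data, equal population SDs of 1, a true difference of 0, and unequal sample sizes of 5 and 30. Unequal sizes are used because, when the two samples are the same size, the pooled and unpooled standard errors are algebraically identical. The function name `coverage` and all settings are hypothetical.

```python
import math
import random
import statistics


def coverage(nx, ny, sd_x, sd_y, pooled, reps=5000, seed=0):
    """Fraction of simulated 95% CIs (1.96 multiplier) that contain the true difference, 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd_x) for _ in range(nx)]
        y = [rng.gauss(0.0, sd_y) for _ in range(ny)]
        diff = statistics.fmean(x) - statistics.fmean(y)
        vx, vy = statistics.variance(x), statistics.variance(y)
        if pooled:
            sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
            se = math.sqrt(sp2 * (1 / nx + 1 / ny))
        else:
            se = math.sqrt(vx / nx + vy / ny)
        if abs(diff) <= 1.96 * se:   # the interval covers the true difference of 0
            hits += 1
    return hits / reps


# Same seed for both calls, so both methods see the same simulated samples.
pooled_cov = coverage(5, 30, 1.0, 1.0, pooled=True)
unpooled_cov = coverage(5, 30, 1.0, 1.0, pooled=False)
print(f"pooled coverage:   {pooled_cov:.3f}")
print(f"unpooled coverage: {unpooled_cov:.3f}")
```

Under these settings, I would expect the pooled interval to land closer to 95% than the unpooled one. Setting `sd_x` and `sd_y` to different values shows what happens when the equal-variance assumption fails.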

Many argue that the interval that makes no assumptions should be used routinely for large samples because it will be approximately correct whether or not the assumptions are met. However, methods (yet to be seen) that adjust for the effects of other variables often make assumptions similar to the equality of population SDs. It seems strange to say that the SDs should be treated as unequal unless adjustments are being made! For this reason, I tend to use the common-variance version of the CIs, transforming the data if necessary to better satisfy the requirement for equal population variances. That said, it is important to add that assumptions should not be made when they are known from the start to be false.


Copyright © 2000 Gerard E. Dallal