Involving Population Means

All of these 95% confidence intervals will be of the form *point
estimate* plus and minus *1.96* times *the appropriate measure
of uncertainty for the point estimate*.

A 95% confidence interval for a single population mean is

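$$\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} \quad\text{or}\quad \bar{x} \pm 1.96\,\mathrm{SE}(\bar{x}) \quad\text{or}\quad \bar{x} \pm 1.96\,\mathrm{SEM}.$$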
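As a rough illustration, the single-mean interval (sample mean plus and minus 1.96 standard errors, with the standard error taken as the sample SD divided by the square root of the sample size) might be computed as below. The function name `mean_ci_95` and the data values are made up for the example.

```python
import math
import statistics

def mean_ci_95(sample):
    """Large-sample 95% CI for a population mean: mean +/- 1.96 * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)   # sample SD (divisor n - 1)
    sem = s / math.sqrt(n)         # standard error of the mean
    return xbar - 1.96 * sem, xbar + 1.96 * sem

# Hypothetical measurements, for illustration only
values = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.1]
low, high = mean_ci_95(values)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```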

A 95% confidence interval for the difference between two population
means, $\mu_x - \mu_y$, is

$$(\bar{x} - \bar{y}) \pm 1.96\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}.$$
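A minimal sketch of the interval that makes no assumption about the population standard deviations: the difference in sample means, plus and minus 1.96 times the square root of the sum of each sample variance divided by its sample size. The function name `diff_ci_95_unpooled` and the data are hypothetical.

```python
import math
import statistics

def diff_ci_95_unpooled(x, y):
    """95% CI for mu_x - mu_y without assuming equal population SDs."""
    diff = statistics.mean(x) - statistics.mean(y)
    # Each sample contributes its own variance divided by its own size
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return diff - 1.96 * se, diff + 1.96 * se

# Hypothetical samples, for illustration only
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]
y = [10.8, 11.5, 10.2, 11.1, 10.9]
low, high = diff_ci_95_unpooled(x, y)
print(f"95% CI for the difference: ({low:.3f}, {high:.3f})")
```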

When **population** standard deviations are equal, a 95% confidence
interval for the difference between two population means is

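$$(\bar{x} - \bar{y}) \pm 1.96\, s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \quad\text{or}\quad (\bar{x} - \bar{y}) \pm 1.96\sqrt{s_p^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)} \quad\text{or}\quad (\bar{x} - \bar{y}) \pm 1.96\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}},$$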

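where $s_p$ is the pooled sample standard deviation, so called
because it combines or pools the information from both samples to
estimate their common population variance

$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \quad\text{or}\quad s_p^2 = \frac{\sum_i (x_i - \bar{x})^2 + \sum_j (y_j - \bar{y})^2}{n_x + n_y - 2}.$$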
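The two ways of writing the pooled variance can be checked against each other numerically. The sketch below computes it once as a weighted combination of the sample variances and once from the summed squared deviations, then uses it in the equal-variance interval. All function names and data values are hypothetical.

```python
import math
import statistics

def pooled_variance_weighted(x, y):
    """Sample variances weighted by (sample size - 1)."""
    nx, ny = len(x), len(y)
    return ((nx - 1) * statistics.variance(x)
            + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)

def pooled_variance_deviations(x, y):
    """Squared deviations from each sample's own mean, summed and divided by nx + ny - 2."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    ss = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    return ss / (len(x) + len(y) - 2)

def diff_ci_95_pooled(x, y):
    """95% CI for mu_x - mu_y assuming equal population SDs."""
    sp = math.sqrt(pooled_variance_weighted(x, y))
    se = sp * math.sqrt(1 / len(x) + 1 / len(y))
    diff = statistics.mean(x) - statistics.mean(y)
    return diff - 1.96 * se, diff + 1.96 * se

# Hypothetical samples, for illustration only
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]
y = [10.8, 11.5, 10.2, 11.1, 10.9]
print(pooled_variance_weighted(x, y), pooled_variance_deviations(x, y))
print(diff_ci_95_pooled(x, y))
```

The two printed variances should agree up to floating-point rounding.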

Both expressions are informative. The first shows that
$s_p^2$ is a weighted combination of the individual
sample variances, with weights equal to one less than the sample sizes.
The second shows that it is calculated by summing up the squared
deviations from each sample and dividing by 2 less than the combined
sample size. It's worth noting that

In general, when there are many ways to answer a question, the approach that makes assumptions is better in some sense when the assumptions are met. The 95% CIs that assume equal population variances will have true coverage closer to 95% for smaller sample sizes if the population variances are, in fact, equal. The downside is that the population variances have to be equal (or not so different that it matters).
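The coverage claim can be explored with a small simulation. The sketch below rests on assumptions of my own choosing: two normal populations with the same mean and SD, small unequal sample sizes (4 and 12), 20,000 replications, and a fixed seed. It counts how often each kind of interval, built with the 1.96 multiplier, contains the true difference of zero. Unequal sample sizes are used because the two intervals coincide when the sample sizes are equal.

```python
import math
import random

def mean_and_variance(values):
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)

def simulate_coverage(nx=4, ny=12, reps=20000, seed=1):
    """Fraction of intervals containing the true difference (0) when population SDs are equal."""
    rng = random.Random(seed)
    hits_pooled = hits_unpooled = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(nx)]
        y = [rng.gauss(0.0, 1.0) for _ in range(ny)]
        mx, vx = mean_and_variance(x)
        my, vy = mean_and_variance(y)
        diff = mx - my
        # Interval assuming equal population variances
        sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
        se_pooled = math.sqrt(sp2 * (1 / nx + 1 / ny))
        # Interval making no assumption about the variances
        se_unpooled = math.sqrt(vx / nx + vy / ny)
        hits_pooled += abs(diff) <= 1.96 * se_pooled
        hits_unpooled += abs(diff) <= 1.96 * se_unpooled
    return hits_pooled / reps, hits_unpooled / reps

pooled_cov, unpooled_cov = simulate_coverage()
print(f"pooled: {pooled_cov:.3f}  unpooled: {unpooled_cov:.3f}")
```

With samples this small, both intervals fall short of 95% coverage, but under these settings the equal-variance interval should come out closer to it.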

Many argue that the interval that makes no assumptions should be used routinely for large samples because it will be approximately correct whether or not the assumptions are met. However, methods (yet to be seen) that adjust for the effects of other variables often make assumptions similar to the equality of population SDs. It seems strange to say that the SDs should be treated as unequal unless adjustments are being made! For this reason, I tend to use the common-variance version of the CIs, transforming the data if necessary to better satisfy the requirement for equal population variances. That said, it is important to add that assumptions should not be made when they are known from the start to be false.