With large samples, confidence intervals for population means can be constructed by using only the sample mean, sample standard deviation, sample size, and the properties of the normal distribution. This is true regardless of the distribution of the individual observations.
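The large-sample recipe can be sketched in a few lines of Python. The summary statistics below are made up for illustration; the interval is just the sample mean plus or minus 1.96 estimated standard errors:

```python
from math import sqrt

def large_sample_ci(mean, sd, n, z=1.96):
    """95% large-sample confidence interval for a population mean:
    sample mean +/- z * (sample SD / sqrt(n))."""
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Hypothetical sample: mean 120, SD 15, n = 400.
lo, hi = large_sample_ci(120, 15, 400)
print(round(lo, 2), round(hi, 2))  # 118.53 121.47
```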

Early in the history of statistical practice, it was recognized that there was no similar result for small samples. Even when the individual observations themselves follow a normal distribution exactly, the difference between the sample and population means tends to be greater than the normal distribution predicts. For small samples, confidence intervals for the population mean constructed by using the normal distribution are too short (they contain the population mean less often than expected) and statistical tests (to be discussed) based on the normal distribution reject a true null hypothesis more often than expected. Analysts constructed these intervals and performed these tests for lack of anything better, but they were aware of the deficiencies and treated the results as descriptive rather than inferential.

William Sealey Gosset, who published under the pseudonym 'Student',
discovered that when individual observations follow a
normal distribution, confidence intervals for population means could be
constructed in a manner similar to that for large samples. The only
difference was that the usual multiplier was replaced by one that grew
larger as the sample size became smaller. He also discovered that a
similar method could be used to compare two population means provided
individual observations in both populations follow normal distributions
and the **population** standard deviations are equal (sample standard
deviations are never equal)--the 1.96 is replaced by a multiplier that
depends on the combined sample size. Also, the two sample standard
deviations were combined (or *pooled*) to give a best estimate of the
common population standard deviation. If the samples have standard
deviations s_{1} and s_{2}, and sample sizes
n_{1} and n_{2}, then the pooled standard deviation is

s_{p} = sqrt{[(n_{1} - 1)s_{1}^{2} + (n_{2} - 1)s_{2}^{2}] / (n_{1} + n_{2} - 2)}

and the standard deviation of the difference between the sample means is

s_{p} sqrt{1/n_{1} + 1/n_{2}}

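The pooled standard deviation and the standard error of the difference between the sample means can be sketched directly in code. The sample sizes and standard deviations below are hypothetical:

```python
from math import sqrt

def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of the common population standard deviation."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def se_diff(s1, n1, s2, n2):
    """Standard deviation (standard error) of the difference
    between the two sample means."""
    return pooled_sd(s1, n1, s2, n2) * sqrt(1 / n1 + 1 / n2)

# Hypothetical samples: s1 = 4, n1 = 10; s2 = 5, n2 = 12.
print(round(pooled_sd(4, 10, 5, 12), 3))  # 4.577
print(round(se_diff(4, 10, 5, 12), 3))    # 1.960
```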
It was now possible to perform exact significance tests and construct exact confidence intervals based on small samples in many common situations. Just as the multipliers in the case of large samples came from the normal distribution, the multipliers in the case of small samples came from a distribution which Student named the t distribution. Today, it is known as Student's t distribution.

There isn't just one t distribution. There is an infinite number of them, indexed (numbered) 1, 2, 3, and so on. The index, called "degrees of freedom," allows us to refer easily to any particular t distribution. ("Degrees of freedom" is not hyphenated. The only terms in statistics that are routinely hyphenated are "chi-square" and "goodness-of-fit.") The t distributions are like the normal distribution--unimodal and symmetric about 0--but they are spread out a bit more (heavier in the tails). As the degrees of freedom get larger, the t distribution gets closer to the standard normal distribution. A normal distribution is a t distribution with infinite degrees of freedom.

Each analysis has a particular number of degrees of freedom associated with it. Virtually all computer programs calculate the degrees of freedom automatically, but knowing how to calculate degrees of freedom by hand makes it easy to quickly check that the proper analysis is being performed and the proper data are being used.

When estimating a single population mean, the number of degrees of
freedom is n - 1. When estimating the difference between two population
means, the number of degrees of freedom is n_{1} + n_{2}
- 2.

The only change in tests and confidence intervals from those based on large sample theory is that the value obtained from the normal distribution, such as 1.96, is replaced by a value from a t distribution.

In the old days (B.C.: before computers), when calculations were done by hand, analysts would use the normal distribution if the degrees of freedom were greater than 30 (for 30 df, the proper multiplier is 2.04; for 60 df, it's 2.00). Otherwise, the t distribution was used. This says as much about the availability of tables of the t distribution as anything else.
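Those multipliers can be checked numerically without a table. The sketch below uses only the Python standard library: it evaluates the t density from its textbook formula, integrates it with Simpson's rule to get the CDF, and bisects for the 97.5th percentile (the 95% multiplier):

```python
from math import lgamma, exp, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    logc = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(logc) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via symmetry plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_multiplier(df):
    """97.5th percentile of the t distribution with df degrees of
    freedom (the 95% confidence-interval multiplier), by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 0.975:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (30, 60, 1000):
    print(df, f"{t_multiplier(df):.3f}")  # 2.042, 2.000, 1.962
```

The output reproduces the 2.04 and 2.00 quoted above, and shows the multiplier closing in on the normal distribution's 1.96 as the degrees of freedom grow.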

Today, tables of distributions have been replaced by computer programs. The computer thinks nothing of looking up the t distribution with 2351 degrees of freedom, even if it is almost identical to the standard normal distribution. There is no magic number of degrees of freedom above which the computer switches over to the standard normal distribution. Computer programs that compare sample means use Student's t distribution for every sample size, and the standard normal distribution never comes into play.

We find ourselves in a peculiar position. Before
computers, analysts used the standard normal distribution to
analyze every large data set. It was an approximation, but a
good one. After computers, we use t distributions to analyze
every large data set. It works for large non-normal samples
because a t distribution with a large number of degrees of
freedom is essentially the standard normal distribution. The
output may say *t test*, but it's the large sample theory that
makes the test valid and large sample theory says that the
distribution of a sample mean is approximately normal, not t!