What Student Did

With large samples, confidence intervals for population means can be constructed by using only the sample mean, sample standard deviation, sample size, and the properties of the normal distribution. This is true regardless of the distribution of the individual observations.

Early in the history of statistical practice, it was recognized that there was no similar result for small samples. Even when the individual observations themselves follow a normal distribution exactly, the difference between the sample and population means, measured in units of its estimated standard error, tends to be greater than the normal distribution predicts. For small samples, confidence intervals for the population mean constructed by using the normal distribution are too short (they contain the population mean less often than expected), and statistical tests (to be discussed) based on the normal distribution reject a true null hypothesis more often than expected. Analysts constructed these intervals and performed these tests for lack of anything better, but they were aware of the deficiencies and treated the results as descriptive rather than inferential.
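The shortfall in coverage can be seen in a small simulation. The sketch below is illustrative only: the function name, the seed, the number of trials, and the choice of a standard normal population (true mean 0) are assumptions made for the example. It draws repeated samples, builds the nominal 95% interval using the normal multiplier 1.96, and reports how often the interval contains the true mean.

```python
import math
import random

def normal_interval_coverage(n, trials=20000, seed=1):
    """Fraction of nominal 95% normal-theory intervals that contain the
    true mean (0) when samples of size n come from a standard normal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with the usual n - 1 divisor
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        se = math.sqrt(var / n)
        # Interval is mean +/- 1.96 * se; check whether it covers 0
        if abs(mean) <= 1.96 * se:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for n in (5, 10, 30, 100):
        print(n, normal_interval_coverage(n))
```

For very small samples the reported coverage should fall noticeably below 95%, and it should approach 95% as n grows.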

William Sealy Gosset, who published under the pseudonym 'Student', discovered that when individual observations follow a normal distribution, confidence intervals for population means could be constructed in a manner similar to that for large samples. The only difference was that the usual multiplier (1.96 for a 95% confidence interval) was replaced by one that grew larger as the sample size became smaller. He also discovered that a similar method could be used to compare two population means, provided individual observations in both populations follow normal distributions and the population standard deviations are equal (sample standard deviations are never equal)--the 1.96 is replaced by a multiplier that depends on the combined sample size. Also, the two sample standard deviations are combined (or pooled) to give a best estimate of the common population standard deviation. If the samples have standard deviations s₁ and s₂, and sample sizes n₁ and n₂, then the pooled standard deviation is

s_p = √( [(n₁-1) s₁² + (n₂-1) s₂²] / [n₁ + n₂ - 2] )

and the standard deviation of the difference between the sample means is

s_p √(1/n₁ + 1/n₂)
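As a minimal sketch, the two quantities can be computed as follows. The pooled variance is the weighted average of the two sample variances, and both the pooled standard deviation and the standard deviation of the difference involve a square root, as is standard. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
import math

def pooled_sd(s1, s2, n1, n2):
    """Pooled standard deviation: the square root of the weighted average
    of the two sample variances, with weights n1 - 1 and n2 - 1."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

def sd_of_difference(s1, s2, n1, n2):
    """Estimated standard deviation of the difference between the
    two sample means, based on the pooled standard deviation."""
    return pooled_sd(s1, s2, n1, n2) * math.sqrt(1 / n1 + 1 / n2)

if __name__ == "__main__":
    # Hypothetical summary statistics for two samples
    s1, s2, n1, n2 = 2.0, 4.0, 5, 10
    print("pooled SD:", pooled_sd(s1, s2, n1, n2))
    print("SD of difference:", sd_of_difference(s1, s2, n1, n2))
```

When the two sample standard deviations happen to agree, the pooled value equals that common value, which makes a quick sanity check.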

It was now possible to perform exact significance tests and construct exact confidence intervals based on small samples in many common situations. Just as the multipliers in the case of large samples came from the normal distribution, the multipliers in the case of small samples came from a distribution known today as Student's t distribution.

There isn't just one t distribution. There is an infinite number of them, indexed (numbered) 1, 2, 3, and so on. The index, called "degrees of freedom," allows us to refer easily to any particular t distribution. ("Degrees of freedom" is not hyphenated. The only terms in statistics that are routinely hyphenated are "chi-square" and "goodness-of-fit.") The t distributions are like the standard normal distribution--unimodal and symmetric about 0--but they are spread out a bit more (heavier in the tails). As the degrees of freedom get larger, the t distribution gets closer to the standard normal distribution. The standard normal distribution is the t distribution with infinite degrees of freedom.

Each analysis has a particular number of degrees of freedom associated with it. Virtually all computer programs calculate the degrees of freedom automatically, but knowing how to calculate degrees of freedom by hand makes it easy to quickly check that the proper analysis is being performed and the proper data are being used.

When estimating a single population mean, the number of degrees of freedom is n - 1. When estimating the difference between two population means, the number of degrees of freedom is n₁ + n₂ - 2.

The only change in tests and confidence intervals from those based on large sample theory is that the value obtained from the normal distribution, such as 1.96, is replaced by a value from a t distribution.
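A short sketch makes the point concrete for a single mean: the same formula produces both intervals, and only the multiplier differs. The summary statistics and the function name are hypothetical. The value 2.04 is the tabled 95% multiplier for 30 degrees of freedom, which applies here because n = 31.

```python
import math

def mean_interval(mean, sd, n, multiplier):
    """Interval of the form mean +/- multiplier * sd / sqrt(n)."""
    half_width = multiplier * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical summary statistics for one sample
mean, sd, n = 50.0, 10.0, 31
df = n - 1  # degrees of freedom for a single mean

normal_ci = mean_interval(mean, sd, n, 1.96)  # large-sample multiplier
t_ci = mean_interval(mean, sd, n, 2.04)       # t multiplier for 30 df

print("df:", df)
print("normal-based interval:", normal_ci)
print("t-based interval:", t_ci)
```

The t-based interval is slightly wider than the normal-based one and has the same center.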

In the old days (B.C.: before computers) when calculations were done by hand, analysts would use the normal distribution if the degrees of freedom were greater than 30 (for 30 df, the proper 95% multiplier is 2.04; for 60 df, it's 2.00). Otherwise, the t distribution was used. This says as much about the availability of tables of the t distribution as anything else.
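The multipliers quoted above can be reproduced without a table. The sketch below is one possible approach using only the Python standard library, not the method any particular statistics package uses: it integrates the t density numerically with Simpson's rule and finds the multiplier by bisection. The function names, the number of integration steps, and the number of bisection iterations are assumptions made for the example.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    # Work on the log scale so that large df does not overflow the gamma function
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: one half plus the integral of the density
    from 0 to x, approximated by Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_multiplier(df, confidence=0.95):
    """Two-sided multiplier: the point with (1 - confidence)/2 in each tail."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):            # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for df in (5, 10, 30, 60, 2351):
        print(df, round(t_multiplier(df), 3))
```

The values for 30 and 60 degrees of freedom should come out close to the 2.04 and 2.00 quoted in the text, and the multiplier should shrink toward 1.96 as the degrees of freedom grow.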

Today, tables of distributions have been replaced by computer programs. The computer thinks nothing of looking up the t distribution with 2351 degrees of freedom, even if it is almost identical to the standard normal distribution. There is no magic number of degrees of freedom above which the computer switches over to the standard normal distribution. Computer programs that compare sample means use Student's t distribution for every sample size, and the standard normal distribution never comes into play.

We find ourselves in a peculiar position. Before computers, analysts used the standard normal distribution to analyze every large data set. It was an approximation, but a good one. After computers, we use t distributions to analyze every large data set. It works for large non-normal samples because a t distribution with a large number of degrees of freedom is essentially the standard normal distribution. The output may say t test, but it's the large sample theory that makes the test valid and large sample theory says that the distribution of a sample mean is approximately normal, not t!