Student's t Test for Independent Samples

Student's t test for independent samples is used to determine whether two samples were drawn from populations with different means. If both samples are large, the separate or unequal variances version of the t test has many attractive features. The denominator of the test statistic correctly estimates the standard deviation of the numerator, while the Central Limit Theorem guarantees the validity of the test even if the populations are nonnormal. "Large" sample sizes can be as small as 30 per group if the two populations are roughly normally distributed. The more the populations depart from normality, the larger the sample size needed for the Central Limit Theorem to weave its magic, but we've seen examples to suggest that 100 observations per group is often quite sufficient.
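As a rough illustration of the separate variances version, the sketch below computes the test statistic, whose denominator adds the two squared standard errors, along with the Welch-Satterthwaite approximate degrees of freedom. The function names `welch_t` and `large_sample_p` are made up for this example, and the P value uses the normal approximation, which is reasonable only when both samples are large.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Return (t, df) for the separate (unequal) variances t test."""
    n1, n2 = len(x), len(y)
    # Squared standard error of each sample mean
    v1, v2 = variance(x) / n1, variance(y) / n2
    # The denominator estimates the standard deviation of the numerator
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximate degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def large_sample_p(t):
    """Two-sided P value from the normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))
```

For small samples the statistic would be referred to a t distribution with `df` degrees of freedom rather than to the normal distribution; the standard library has no t distribution, so a table or a statistics package would be needed for that step.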

For small and moderate sample sizes, the equal variances version of the test provides an exact test of the equality of the two population means. The validity of the test demands that the samples be drawn from normally distributed populations with equal (population) standard deviations. Just as one reflexively asks about randomization, blinding, and controls when evaluating a study design, it should become second nature to ask about normality and equal variances when preparing to use Student's t test.
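A minimal sketch of the equal variances version follows. It pools the two sample variances into a single estimate of the common variance and refers the statistic to a t distribution with n1 + n2 - 2 degrees of freedom. The function name `pooled_t` is hypothetical, chosen only for this illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t, df) for the equal variances (pooled) t test."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance, weighted by degrees of freedom
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    # Standard error of the difference in means under equal variances
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df
```

The P value comes from the t distribution with `df` degrees of freedom, which is exact only under the normality and equal variance conditions described above.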

Formal analysis and simulations offer the following guidelines describing the extent to which the assumptions of normality and equal population variances can be violated without affecting the validity of Student's t test for independent samples. [see Rupert Miller, Jr., (1986) Beyond ANOVA, Basics of Applied Statistics, New York: John Wiley & Sons]

• If sample sizes are equal, (a) nonnormality is not a problem and (b) the t test can tolerate population standard deviation ratios of 2 without showing any major ill effect. (For equal sample sizes, the two test statistics are equal.) The worst situation occurs when the population variances and sample sizes are both unequal, in particular when one sample has both a much larger variance and a much smaller sample size than the other. For example, if the variance ratio is 5 and the sample size ratio is 1/5, a nominal P value of 0.05 is actually 0.22.
• Serious distortion of the P value can occur when the skewness of the two populations is different.
• Outliers can distort the mean difference and the t statistic. They tend to inflate the variance and depress the value and corresponding statistical significance of the t statistic.
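The effect of unequal variances on the equal variances test can be checked with a short large-sample calculation. The sketch below is an approximation worked out for this illustration, not necessarily the method behind the published figures: it compares the variance that the pooled formula estimates with the true variance of the difference in means, then uses the normal distribution to find the actual rejection rate. The function name `actual_alpha` and its parameters are assumptions of this example.

```python
from math import sqrt
from statistics import NormalDist


def actual_alpha(var_ratio, n_ratio, nominal=0.05):
    """Large-sample true rejection rate of the pooled t test under H0.

    var_ratio = (variance of group 1) / (variance of group 2)
    n_ratio   = (size of group 1) / (size of group 2)
    """
    # Work in units where group 2 has variance 1 and sample size 1
    pooled = (n_ratio * var_ratio + 1) / (n_ratio + 1)
    estimated = pooled * (1 / n_ratio + 1)   # what the pooled formula targets
    true = var_ratio / n_ratio + 1           # actual variance of the difference
    z_crit = NormalDist().inv_cdf(1 - nominal / 2)
    # The statistic is too large by the factor sqrt(true / estimated)
    return 2 * (1 - NormalDist().cdf(z_crit / sqrt(true / estimated)))


print(round(actual_alpha(5, 1 / 5), 2))  # smaller sample has the larger variance
print(round(actual_alpha(4, 1), 2))      # equal sample sizes, SD ratio of 2
```

Under this approximation the first case gives a rejection rate of about 0.22 for a nominal 0.05 test, while with equal sample sizes the estimated and true variances coincide and the nominal level is preserved.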

Preliminary tests for normality and equality of variances--using Student's t test only if these preliminary tests fail to achieve statistical significance--should be avoided. These preliminary tests often detect differences too small to affect Student's t test. Since Student's t test is such a convenient way to compare two populations, it should not be abandoned without good cause. Important violations of the requirements will be detectable to the naked eye without a formal significance test.

What should be done if the conditions for the validity of Student's t test are violated? The best approach is to transform the data to a scale in which the conditions are satisfied. This will almost always involve a logarithmic transformation. On rare occasions, a square root, inverse, or inverse square root might be used. For proportions, arcsin(sqrt(p)) or log(p/(1-p)) might be used. If no satisfactory transformation can be found, a nonparametric test such as the median test or the Wilcoxon-Mann-Whitney test might be used.
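The transformations named above are simple to apply before running the test. The sketch below collects them in one place; the dictionary name `TRANSFORMS` and its keys are invented for this example. The first four assume strictly positive data (the square root also accepts zero), and the last two assume proportions (strictly between 0 and 1 for the logit).

```python
from math import asin, log, sqrt

# Candidate transformations, keyed by illustrative names
TRANSFORMS = {
    "log": log,                                   # the usual first choice
    "sqrt": sqrt,
    "inverse": lambda x: 1 / x,
    "inverse_sqrt": lambda x: 1 / sqrt(x),
    "arcsine_sqrt": lambda p: asin(sqrt(p)),      # for proportions
    "logit": lambda p: log(p / (1 - p)),          # for proportions
}


def transform(values, name):
    """Apply the named transformation to every observation."""
    f = TRANSFORMS[name]
    return [f(v) for v in values]
```

The test is then carried out on the transformed values, for example on `transform(x, "log")` and `transform(y, "log")`, and the normality and equal variance conditions are judged on that scale.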

The major advantage of transformations is that they make it possible to use standard techniques to construct confidence intervals for estimating between-group differences. In theory, it is possible to construct confidence intervals (for the difference in medians, say) when rank tests are used. However, we are prisoners of our software. Programs that construct these confidence intervals are not readily available.

Gerard E. Dallal