**Student's t Test for Independent Samples**

Student's t test for independent samples is used to determine whether
two samples were drawn from populations with different means. If both
samples are large, the *separate* or *unequal variances*
version of the t test has many attractive features. The denominator of
the test statistic correctly estimates the standard deviation of the
numerator, while the Central Limit Theorem guarantees the validity of the
test even if the populations are nonnormal. "Large" sample
sizes can be as small as 30 per group if the two populations are roughly
normally distributed. The more the populations depart from normality, the
larger the sample size needed for the Central Limit Theorem to weave its
magic, but we've seen examples to suggest that 100 observations per group
is often quite sufficient.
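As a minimal sketch of the separate variances statistic described above, the following Python computes the difference in sample means divided by the square root of the sum of each sample variance over its sample size, then refers it to the standard normal distribution as the large-sample argument suggests. The function names are illustrative, and the normal reference is an assumption that is reasonable only when both samples are large.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def unequal_variance_t(x, y):
    """Separate (unequal) variances t statistic for two independent samples."""
    # Each sample contributes its own variance estimate to the standard error.
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def large_sample_p_value(t):
    """Two-sided P value from the standard normal approximation.

    Assumption: both samples are large enough for the Central Limit
    Theorem to apply, so no degrees-of-freedom correction is made.
    """
    return 2 * (1 - NormalDist().cdf(abs(t)))
```

With smaller samples the statistic would normally be referred to a t distribution with approximate degrees of freedom rather than to the normal distribution used here.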

For small and moderate sample sizes, the *equal variances*
version of the test provides an exact test of the equality of the two
population means. The validity of the test demands that the samples be
drawn from normally distributed populations with equal (population)
standard deviations. Just as one reflexively asks about randomization,
blinding, and controls when evaluating a study design, it should become
second-nature to ask about normality and equal variances when preparing
to use Student's t test.
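The equal variances statistic can be sketched in the same spirit. It pools the two sample variances into a single estimate, weighted by degrees of freedom, and the result is referred to Student's t distribution with n1 + n2 - 2 degrees of freedom. The function name is illustrative, and the sketch stops at the statistic and its degrees of freedom because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Equal variances t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df
```

The P value would then be looked up in a t table, or computed with statistical software, using the returned degrees of freedom.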

Formal analysis and simulations offer the following guidelines
describing the extent to which the assumptions of normality and equal
population variances can be violated without affecting the validity of
Student's test for independent samples. [See Rupert Miller, Jr. (1986),
*Beyond ANOVA, Basics of Applied Statistics*, New York: John Wiley
& Sons.]

- If sample sizes are equal, (a) nonnormality is not a problem and
(b) the t test can tolerate population standard deviation ratios of 2
without showing any major ill effect. (For equal sample sizes, the two
test statistics are equal.) The worst situation occurs when the
population variances and sample sizes are both unequal, specifically
when one sample has both a much larger variance and a much smaller
sample size than the other. Quick simulations of Student's t test for
independent samples, with data drawn from normally distributed
populations, bear this out. For example, if the variance ratio is 5 and
the sample size ratio is 1/5, a nominal P value of 0.05 is actually
0.22.

| Confidence Intervals | Significance Tests |
|---|---|
| construct interval | calculate test statistic |
| is 0 in interval? | is statistic far enough away from 0? |

- Serious distortion of the P value can occur when the skewness of the
two populations is different.

- Outliers can distort the mean difference and the t statistic. They tend to inflate the variance and depress the value and corresponding statistical significance of the t statistic.
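The variance-ratio example in the first guideline can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the calculation behind the quoted figure. It assumes group sizes of 10 and 50 (a sample size ratio of 1/5), variances of 5 and 1, normal populations with equal means, and a hardcoded 5% two-sided critical value for 58 degrees of freedom taken from standard t tables.

```python
import random
from math import sqrt

# Two-sided 5% critical value of Student's t with 10 + 50 - 2 = 58 degrees
# of freedom, from standard tables (hardcoded to avoid extra dependencies).
T_CRIT_58 = 2.0017


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)


def pooled_t(x, y):
    """Equal variances t statistic for two independent samples."""
    n1, n2 = len(x), len(y)
    m1, v1 = mean_and_variance(x)
    m2, v2 = mean_and_variance(y)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))


def rejection_rate(n_sims=10000, seed=1):
    """Fraction of simulated null data sets rejected at the nominal 5% level."""
    rng = random.Random(seed)
    sd_small = sqrt(5)  # the small group has variance 5
    rejections = 0
    for _ in range(n_sims):
        small = [rng.gauss(0, sd_small) for _ in range(10)]
        large = [rng.gauss(0, 1) for _ in range(50)]  # variance 1
        if abs(pooled_t(small, large)) > T_CRIT_58:
            rejections += 1
    return rejections / n_sims
```

Both populations have the same mean, so every rejection is a false positive. Calling `rejection_rate()` should give a rate far above the nominal 0.05, in the neighborhood of the 0.22 quoted in the guideline.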

Preliminary tests for normality and equality of variances--using Student's t test only if these preliminary tests fail to achieve statistical significance--should be avoided. These preliminary tests often detect differences too small to affect Student's t test. Since the test is such a convenient way to compare two populations, it should not be abandoned without good cause. Important violations of the requirements will be detectable to the naked eye without a formal significance test.

What should be done if the conditions for the validity of Student's t test are violated? The best approach is to transform the data to a scale in which the conditions are satisfied. This will almost always involve a logarithmic transformation. On rare occasions, a square root, inverse, or inverse square root might be used. For proportions, arcsin(sqrt(p)) or log(p/(1-p)) might be used. If no satisfactory transformation can be found, a nonparametric test such as the median test or the Wilcoxon-Mann-Whitney test might be used.
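For readers who want to see what the rank-based fallback involves, here is a minimal sketch of the Wilcoxon-Mann-Whitney test. It ranks the combined data (tied values share the average of their ranks), forms the U statistic for the first sample, and uses the large-sample normal approximation. The function names are illustrative. For simplicity the sketch omits the tie and continuity corrections, so its P values are only approximate, especially for small samples.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Ranks 1..N of the values, with ties given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U, two-sided P) from the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u = rank_sum_x - n1 * (n1 + 1) / 2
    # Mean and standard deviation of U under the null hypothesis (no ties).
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, 2 * (1 - NormalDist().cdf(abs(z)))
```

Statistical packages typically offer exact P values for small samples, which are preferable to this approximation when available.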

The major advantage of transformations is that they make it possible
to use standard techniques to construct confidence intervals for
estimating between-group differences. In theory, it is possible to
construct confidence intervals (for the difference in medians,
say) when rank tests are used. However, we are prisoners of our
software. Programs that construct these confidence intervals are not
readily available.
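To illustrate how a transformation keeps standard interval methods available, the sketch below builds the usual equal variances confidence interval on the log scale and back-transforms it. A difference in mean logs becomes a ratio of geometric means on the original scale. The function name is illustrative. The sketch assumes strictly positive data, and the critical value `t_crit` must be supplied by the caller from a t table for len(x) + len(y) - 2 degrees of freedom.

```python
from math import exp, log, sqrt


def ratio_ci_from_logs(x, y, t_crit):
    """Return (lower, estimate, upper) for the ratio of geometric means x / y.

    Assumptions: all data are strictly positive, and t_crit is the
    two-sided critical value of Student's t with len(x) + len(y) - 2
    degrees of freedom.
    """
    lx = [log(v) for v in x]
    ly = [log(v) for v in y]
    n1, n2 = len(lx), len(ly)
    m1, m2 = sum(lx) / n1, sum(ly) / n2
    # Pooled variance of the log-transformed data.
    ss1 = sum((v - m1) ** 2 for v in lx)
    ss2 = sum((v - m2) ** 2 for v in ly)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = m1 - m2
    # Exponentiating turns the interval for a difference in mean logs
    # into an interval for a ratio of geometric means.
    return exp(diff - t_crit * se), exp(diff), exp(diff + t_crit * se)
```

For example, `ratio_ci_from_logs([2, 4, 8], [1, 2, 4], 2.776)` uses the tabled 5% critical value for 4 degrees of freedom and gives a point estimate of 2, since every value in the first sample is twice its counterpart in the second.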