Student's t test for independent samples is equivalent to the linear regression of the response variable on the grouping variable, where the grouping variable is recoded to have numerical values, if necessary.
Here's an example involving glucose levels in two strains of rats, A and B.
First, the data are displayed in a dot plot. Then, glucose is plotted against
A0B1, where A0B1 is created by setting it equal to 0 for strain A and 1 for
strain B.
Student's t test for independent samples yields
Variable: GLU

STRAIN    N          Mean       Std Dev     Std Error
-----------------------------------------------------
A        10   80.40000000   29.20502240    9.23543899
B        12   99.66666667   19.95601223    5.76080452

Variances         T      DF   Prob>|T|
---------------------------------------
Unequal     -1.7700    15.5     0.0965
Equal       -1.8327    20.0     0.0818
The linear regression of glucose on A0B1 gives the equation $\mathrm{GLU} = b_0 + b_{\mathrm{A0B1}} \cdot \mathrm{A0B1}$.
Dependent Variable: GLU

Parameter Estimates

                  Parameter      Standard    T for H0:
Variable    DF     Estimate         Error  Parameter=0   Prob > |T|
INTERCEPT    1    80.400000    7.76436303       10.355       0.0001
A0B1         1    19.266667   10.51299725        1.833       0.0818
The P value for the Equal Variances version of the t test is equal to the P value for the regression coefficient of the grouping variable A0B1 (P = 0.0818). The corresponding t statistics are equal in magnitude (|t| = 1.833). This is not a coincidence. Statistical theory says the two P values must be equal, while the t statistics must be equal in magnitude. The signs of the t statistics will be the same if the t statistic for Student's t test is calculated by subtracting the mean of group 0 from the mean of group 1.
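The agreement can be checked by hand from the summary statistics alone. The following is a minimal sketch in Python, not the software that produced the output above; the variable names are illustrative, and the only inputs are the group sizes, means, and standard deviations printed in the t test table. It computes the equal-variances t statistic and the regression quantities that the same summaries imply, which can then be compared with the regression output.

```python
import math

# Summary statistics copied from the t test output above.
n_a, mean_a, sd_a = 10, 80.40000000, 29.20502240  # strain A (A0B1 = 0)
n_b, mean_b, sd_b = 12, 99.66666667, 19.95601223  # strain B (A0B1 = 1)

# Pooled estimate of the common variance (equal-variances assumption).
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
pooled_sd = math.sqrt(pooled_var)

# Equal-variances t statistic, taken as mean(A) - mean(B) to match
# the sign shown in the t test output.
se_diff = pooled_sd * math.sqrt(1 / n_a + 1 / n_b)
t_test = (mean_a - mean_b) / se_diff

# Regression on the 0/1 variable: the intercept is the mean of group 0,
# the slope is mean(B) - mean(A), and the slope's standard error is the
# same quantity as the standard error of the difference in means.
intercept = mean_a
slope = mean_b - mean_a
se_intercept = pooled_sd / math.sqrt(n_a)
t_slope = slope / se_diff

print(f"t test:     t = {t_test:.4f}, df = {df}")
print(f"regression: slope = {slope:.6f}, SE = {se_diff:.8f}, t = {t_slope:.3f}")
print(f"            intercept = {intercept:.6f}, SE = {se_intercept:.8f}")
```

The two t statistics differ only in sign because the t test output subtracts the mean of B from the mean of A, while the slope is the mean of B minus the mean of A.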
The equal variances version of Student's t test is used to test the
hypothesis of the equality of $\mu_A$ and $\mu_B$, the means of two normally
distributed populations with equal population variances
($H_0: \mu_A = \mu_B$). The population means can be reexpressed as
$\mu_A = \mu$ and $\mu_B = \mu + \delta$, where $\delta = \mu_B - \mu_A$
(that is, data from strain A are normally distributed with mean $\mu$ and
standard deviation $\sigma$, while data from strain B are normally
distributed with mean $\mu + \delta$ and standard deviation $\sigma$), and
the hypothesis can be rewritten as $H_0: \delta = 0$.
The linear regression model says data are normally distributed about
the regression line with constant standard deviation $\sigma$. The predictor variable
A0B1 (the grouping variable) takes on only two
values. Therefore, there are only two locations along the regression line
where there are data (see the display). "Homoscedastic (constant spread
about the regression line) normally distributed values about the
regression line" is equivalent to "two normally distributed populations
with equal variances".
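The algebra behind this equivalence is short. The sketch below uses notation that is assumed here rather than taken from the text: $x_i$ is the value of A0B1 for observation $i$, $n_A$ and $n_B$ are the group sizes, $\bar{y}_A$ and $\bar{y}_B$ are the sample means, and $s_p$ is the pooled standard deviation from the equal-variances t test.

```latex
% Least squares with a 0/1 predictor passes through the two group means:
b_0 = \bar{y}_A, \qquad b_{\mathrm{A0B1}} = \bar{y}_B - \bar{y}_A .

% The residual mean square is therefore the pooled variance:
s^2 = \frac{\sum_{i \in A} (y_i - \bar{y}_A)^2 + \sum_{i \in B} (y_i - \bar{y}_B)^2}
           {n_A + n_B - 2} = s_p^2 .

% With \bar{x} = n_B / (n_A + n_B), the spread of the predictor is
\sum_i (x_i - \bar{x})^2 = \frac{n_A n_B}{n_A + n_B},
\qquad\text{so}\qquad
\mathrm{SE}(b_{\mathrm{A0B1}}) = \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}}
                               = s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}} .

% Hence the t statistic for the slope is the equal-variances t statistic:
t = \frac{b_{\mathrm{A0B1}}}{\mathrm{SE}(b_{\mathrm{A0B1}})}
  = \frac{\bar{y}_B - \bar{y}_A}{s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}},
\qquad \text{with } n_A + n_B - 2 \text{ degrees of freedom.}
```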
Since the probability structure is the same for the two problems (homoscedastic, normally distributed data), test statistics and P values will be the same, too. The numbers confirm this.
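The same check can be run on raw data. The sketch below, in Python, uses a small set of hypothetical measurements invented for illustration (they are not the rat data, which are not listed here). It recodes the grouping variable to 0/1, fits the regression by ordinary least squares, computes the equal-variances t statistic directly from the two groups, and compares the two.

```python
import math

# Hypothetical measurements, invented for illustration (not the rat data).
strain = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
y = [62.0, 91.0, 75.0, 104.0, 98.0, 120.0, 87.0, 110.0, 95.0]

# Recode the grouping variable: 0 for strain A, 1 for strain B.
x = [0 if s == "A" else 1 for s in strain]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares for a single predictor.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
se_slope = math.sqrt(resid_ss / (n - 2) / sxx)
t_regression = slope / se_slope

# Equal-variances t statistic, computed as mean(B) - mean(A) so that
# its sign matches the sign of the slope.
a = [yi for xi, yi in zip(x, y) if xi == 0]
b = [yi for xi, yi in zip(x, y) if xi == 1]
mean_a = sum(a) / len(a)
mean_b = sum(b) / len(b)
ss_within = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
pooled_sd = math.sqrt(ss_within / (n - 2))
t_pooled = (mean_b - mean_a) / (pooled_sd * math.sqrt(1 / len(a) + 1 / len(b)))

print(f"regression t = {t_regression:.6f}")
print(f"pooled t     = {t_pooled:.6f}")
print("equal:", math.isclose(t_regression, t_pooled, rel_tol=1e-9))
```

Both statistics have n - 2 degrees of freedom, so equal t statistics imply equal P values.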