Student's t test for independent samples is equivalent to the linear regression of the response variable on the grouping variable, where the grouping variable is recoded to have numerical values, if necessary.
Here's an example involving glucose levels in two strains of rats, A and B. First, the data are displayed in a dot plot. Then, Glucose is plotted against A0B1, where A0B1 is created by setting it equal to 0 for strain A and 1 for strain B.
Student's t test for independent samples yields
    Variable: GLU

    STRAIN    N          Mean       Std Dev     Std Error
    -----------------------------------------------------
    A        10   80.40000000   29.20502240    9.23543899
    B        12   99.66666667   19.95601223    5.76080452

    Variances         T      DF    Prob>|T|
    ---------------------------------------
    Unequal     -1.7700    15.5      0.0965
    Equal       -1.8327    20.0      0.0818
The linear regression of glucose on A0B1 gives the equation GLU = b_{0} + b_{A0B1} A0B1.
    Dependent Variable: GLU

    Parameter Estimates

                      Parameter      Standard    T for H0:
    Variable   DF      Estimate         Error  Parameter=0   Prob > |T|
    INTERCEPT   1     80.400000    7.76436303       10.355       0.0001
    A0B1        1     19.266667   10.51299725        1.833       0.0818
The P value for the Equal Variances version of the t test is equal to the P value for the regression coefficient of the grouping variable A0B1 (P = 0.0818). The corresponding t statistics are equal in magnitude (|t| = 1.833). This is not a coincidence. Statistical theory says the two P values must be equal, while the t statistics must be equal in magnitude. The signs of the t statistics will be the same if the t statistic for Student's t test is calculated by subtracting the mean of group 0 from the mean of group 1.
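As a check on the numbers, the following Python sketch recomputes both sets of statistics from the summary values printed in the t test output (N, Mean, Std Dev). It assumes the standard pooled-variance formulas and, for the Unequal line, the Welch-Satterthwaite approximation, which is an assumption about how that line was produced. The variable names are made up for this illustration.

```python
import math

# Summary statistics copied from the t test output above.
n_a, mean_a, sd_a = 10, 80.40000000, 29.20502240
n_b, mean_b, sd_b = 12, 99.66666667, 19.95601223

# Pooled variance: within-group sums of squares over the pooled degrees of freedom.
df_pooled = n_a + n_b - 2
var_pooled = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df_pooled
sd_pooled = math.sqrt(var_pooled)

# Equal variances t statistic, computed as mean(A) - mean(B) to match the sign in the output.
se_diff = sd_pooled * math.sqrt(1 / n_a + 1 / n_b)
t_equal = (mean_a - mean_b) / se_diff

# Unequal variances statistic with Welch-Satterthwaite degrees of freedom (assumed method).
va, vb = sd_a**2 / n_a, sd_b**2 / n_b
t_unequal = (mean_a - mean_b) / math.sqrt(va + vb)
df_unequal = (va + vb) ** 2 / (va**2 / (n_a - 1) + vb**2 / (n_b - 1))

# Regression on a 0/1 predictor: the intercept is the mean of group 0 (strain A)
# and the slope is the difference in means, mean(B) - mean(A).
b0 = mean_a
b1 = mean_b - mean_a
se_b0 = sd_pooled / math.sqrt(n_a)
se_b1 = se_diff  # same standard error as the difference in means
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

print(f"equal variances:   t = {t_equal:.4f}, df = {df_pooled}")
print(f"unequal variances: t = {t_unequal:.4f}, df = {df_unequal:.1f}")
print(f"INTERCEPT: estimate = {b0:.6f}, SE = {se_b0:.8f}, t = {t_b0:.3f}")
print(f"A0B1:      estimate = {b1:.6f}, SE = {se_b1:.8f}, t = {t_b1:.3f}")
```

The printed estimates, standard errors, and t statistics should agree with the two listings up to rounding. The regression t statistic for A0B1 is the equal variances t statistic with its sign reversed, because the slope is mean(B) minus mean(A) while the t test output subtracts in the other order.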
The equal variances version of Student's t test is used to test the hypothesis of the equality of μ_{A} and μ_{B}, the means of two normally distributed populations with equal population variances (H_{0}: μ_{A}=μ_{B}). The population means can be reexpressed as μ_{A}=μ and μ_{B}=μ+δ, where δ=μ_{B}-μ_{A} (that is, data from strain A are normally distributed with mean μ and standard deviation σ, while data from strain B are normally distributed with mean μ+δ and standard deviation σ), and the hypothesis can be rewritten as H_{0}: δ=0.
The linear regression model says data are normally distributed about the regression line with constant standard deviation σ. The predictor variable A0B1 (the grouping variable) takes on only two values. Therefore, there are only two locations along the regression line where there are data (see the display). "Homoscedastic (constant spread about the regression line) normally distributed values about the regression line" is equivalent to "two normally distributed populations with equal variances".
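The algebra behind this equivalence can be sketched as follows. The notation is introduced here for illustration: x is the 0/1 predictor A0B1, μ is the mean for strain A, δ is the difference between the strain means, σ is the common standard deviation, and s_p is the pooled sample standard deviation.

```latex
\begin{align*}
y_i &= \mu + \delta x_i + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma^2), \quad x_i \in \{0, 1\} \\
b_0 &= \bar{y}_A, \qquad b_{A0B1} = \bar{y}_B - \bar{y}_A \\
s_p^2 &= \frac{(n_A - 1)\, s_A^2 + (n_B - 1)\, s_B^2}{n_A + n_B - 2} \\
\operatorname{SE}(b_{A0B1}) &= s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}} \\
t &= \frac{b_{A0B1}}{\operatorname{SE}(b_{A0B1})}
   = \frac{\bar{y}_B - \bar{y}_A}{s_p \sqrt{1/n_A + 1/n_B}},
  \qquad \text{df} = n_A + n_B - 2
\end{align*}
```

The last line is the equal variances t statistic, so the test of a zero slope and the test of equal means use the same statistic and the same degrees of freedom.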
Since the probability structure is the same for the two problems (homoscedastic, normally distributed data), test statistics and P values will be the same, too. The numbers confirm this.
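The same agreement can be checked on raw data with a short Python sketch. The data below are hypothetical values invented for illustration (they are not the rat glucose measurements), and the helper names `pooled_t` and `slope_t` are likewise made up. The sketch fits the least squares line by the usual closed-form formulas and compares the slope's t statistic with the equal variances t statistic.

```python
import math

# Hypothetical data, invented for illustration only (NOT the rat glucose data).
group0 = [4.1, 5.3, 6.0, 4.8, 5.5]
group1 = [6.2, 7.1, 5.9, 6.8, 7.4, 6.5]


def mean(values):
    return sum(values) / len(values)


def pooled_t(g0, g1):
    """Equal variances t statistic, computed as mean(g1) - mean(g0)."""
    n0, n1 = len(g0), len(g1)
    m0, m1 = mean(g0), mean(g1)
    ss_within = sum((y - m0) ** 2 for y in g0) + sum((y - m1) ** 2 for y in g1)
    var_pooled = ss_within / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(var_pooled * (1 / n0 + 1 / n1))


def slope_t(x, y):
    """Least squares intercept, slope, and t statistic for the slope."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, b1 / se_b1


# Recode the grouping variable as 0/1, as A0B1 is built in the text.
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

b0, b1, t_regression = slope_t(x, y)
t_test = pooled_t(group0, group1)

print(f"intercept = {b0:.4f}, mean of group 0 = {mean(group0):.4f}")
print(f"slope = {b1:.4f}, difference in means = {mean(group1) - mean(group0):.4f}")
print(f"regression t = {t_regression:.4f}, t test t = {t_test:.4f}")
```

Both statistics are referred to a t distribution with n - 2 degrees of freedom, so equal t statistics imply equal P values.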