Class     Levels    Values
GROUP          3    CC CCM P

Dependent Variable: DBMD05

                                  Sum of
Source              DF           Squares    Mean Square    F Value    Pr > F
Model                2        44.0070120     22.0035060       5.00    0.0090
Error               78       343.1110102      4.3988591
Corrected Total     80       387.1180222

R-Square    Coeff Var    Root MSE    DBMD05 Mean
0.113679    -217.3832    2.097346      -0.964815

Source    DF      Type I SS    Mean Square    F Value    Pr > F
GROUP      2    44.00701202    22.00350601       5.00    0.0090

Source    DF    Type III SS    Mean Square    F Value    Pr > F
GROUP      2    44.00701202    22.00350601       5.00    0.0090

                                   Standard
Parameter         Estimate            Error    t Value    Pr > |t|
Intercept     -1.520689655 B     0.38946732      -3.90      0.0002
GROUP CC       0.075889655 B     0.57239773       0.13      0.8949
GROUP CCM      1.597356322 B     0.56089705       2.85      0.0056
GROUP P        0.000000000 B      .                .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

The GLM Procedure
Least Squares Means

              DBMD05     LSMEAN
GROUP         LSMEAN     Number
CC       -1.44480000          1
CCM       0.07666667          2
P        -1.52068966          3

Least Squares Means for effect GROUP
Pr > |t| for H0: LSMean(i)=LSMean(j)

i/j         1         2         3
1                0.0107    0.8949
2      0.0107              0.0056
3      0.8949    0.0056

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.

Adjustment for Multiple Comparisons: Tukey-Kramer

Least Squares Means for effect GROUP
Pr > |t| for H0: LSMean(i)=LSMean(j)

i/j         1         2         3
1                0.0286    0.9904
2      0.0286              0.0154
3      0.9904    0.0154
The Analysis of Variance table is just like any other ANOVA table. The Total Sum of Squares is the uncertainty that would be present if one had to predict individual responses without any other information. The best one could do is predict each observation to be equal to the overall sample mean. The ANOVA table partitions this variability into two parts. One portion is accounted for by (some say "explained by") the model. It's the reduction in uncertainty that occurs when the ANOVA model, $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, is fitted to the data. The other portion is the uncertainty that remains after the model has been fitted.
Model, Error, Corrected Total, Sum of Squares, Degrees of Freedom, F Value, and Pr > F have the same meanings as for multiple regression. This is to be expected since analysis of variance is nothing more than the regression of the response on a set of indicators defined by the categorical predictor variable.
The degrees of freedom for the model is equal to one less than the number of categories. The F ratio is nothing more than the extra sum of squares principle applied to the full set of indicator variables defined by the categorical predictor variable. The F ratio and its P value are the same regardless of the particular set of indicators (the constraint placed on the $\alpha$s) that is used.
Sums of Squares: The total amount of variability in the response can be written $\sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2$, the sum of the squared differences between each observation and the overall mean. If we were asked to make a prediction without any other information, the best we can do, in a certain sense, is the overall mean. The amount of variation in the data that can't be accounted for by this simple method of prediction is the Total Sum of Squares.
When the Analysis of Variance model is used for prediction, the best that can be done is to predict each observation to be equal to its group's mean. The amount of uncertainty that remains is the sum of the squared differences between each observation and its group's mean, $\sum_{i=1}^{g}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2$. This is the Error sum of squares. The difference between the Total sum of squares and the Error sum of squares is the Model Sum of Squares, which happens to be equal to $\sum_{i=1}^{g} n_i(\bar{y}_i-\bar{y})^2$. In this output it also appears as the GROUP sum of squares (Type I and Type III), since GROUP is the only term in the model.
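The partition described above can be checked numerically. The following is a minimal sketch in Python on a small made-up data set (the group labels and values are hypothetical and are not the bone density data). It computes the three sums of squares directly from their definitions so that the identity Total = Model + Error can be seen to hold.

```python
# Hypothetical data: three groups of unequal size (NOT the study data).
groups = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [5.0, 7.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)
group_means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

# Total SS: squared deviations of each observation from the overall mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# Error SS: squared deviations of each observation from its own group's mean.
ss_error = sum(
    (y - group_means[g]) ** 2 for g, ys in groups.items() for y in ys
)

# Model SS: group size times the squared deviation of the group mean
# from the overall mean, summed over groups.
ss_model = sum(
    len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in groups.items()
)

print("Total:", ss_total, "Error:", ss_error, "Model:", ss_model)
```

Up to floating-point rounding, the printed Total equals the sum of the Error and Model values.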
Each sum of squares has corresponding degrees of freedom (DF) associated with it. The Total df is one less than the number of observations, N-1. The Model df is one less than the number of levels, g-1. The Error df is the difference between the Total df (N-1) and the Model df (g-1), that is, N-g. Another way to calculate the error degrees of freedom is by summing up the error degrees of freedom from each group, $n_i - 1$, over all g groups.
The Mean Squares are the Sums of Squares divided by the corresponding degrees of freedom.
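As a quick check of the bookkeeping, the degrees of freedom and mean squares in the table can be recomputed from the sums of squares. This is only an illustrative sketch in Python; the numbers are copied from the SAS output above, and the variable names are made up for the example.

```python
# Values copied from the SAS output above.
n_obs = 81              # Corrected Total DF (80) plus 1
n_groups = 3            # levels of GROUP: CC, CCM, P
ss_model = 44.0070120
ss_error = 343.1110102

df_total = n_obs - 1               # N - 1
df_model = n_groups - 1            # g - 1
df_error = df_total - df_model     # equivalently N - g

# Mean square = sum of squares / its degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

print(df_model, df_error, df_total)
print(round(ms_model, 7), round(ms_error, 7))
```

The results agree with the DF and Mean Square columns of the table.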
The F Value or F ratio is the test statistic used to decide whether the sample means are within sampling variability of each other. That is, it tests the hypothesis $H_0: \mu_1 = \dots = \mu_g$. This is the same thing as asking whether the model as a whole has statistically significant predictive capability in the regression framework. F is the ratio of the Model Mean Square to the Error Mean Square. Under the null hypothesis that the model has no predictive capability (that is, that all of the population means are equal), the F statistic follows an F distribution with p numerator degrees of freedom and n-p-1 denominator degrees of freedom. Here p = g-1 is the number of indicator variables, so the degrees of freedom are g-1 and N-g (2 and 78 in this output). The null hypothesis is rejected if the F ratio is large. This statistic and P value might be ignored depending on the primary research question and whether a multiple comparisons procedure is used. (See the discussion of multiple comparison procedures.)
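The F ratio and its P value can be reproduced from the two mean squares. The sketch below is in Python and uses only the standard library. It relies on a special case: when the numerator has exactly 2 degrees of freedom, the upper tail of the F distribution has the closed form $(1 + 2f/d)^{-d/2}$, where d is the denominator degrees of freedom. That shortcut does not apply to other numerator degrees of freedom, where a statistics library would be needed. The function name is invented for this example.

```python
def f_upper_tail_num_df_2(f_value, den_df):
    """P(F > f_value) for an F distribution with 2 numerator df.

    Valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_value / den_df) ** (-den_df / 2.0)


# Mean squares and error df copied from the SAS output above.
ms_model = 22.0035060
ms_error = 4.3988591
df_error = 78

f_ratio = ms_model / ms_error
p_value = f_upper_tail_num_df_2(f_ratio, df_error)

print(round(f_ratio, 2), round(p_value, 4))
```

Rounded as SAS prints them, the two values agree with the F Value and Pr > F columns of the Model row.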
The Root Mean Square Error (also known as the standard error of the estimate, and labeled Root MSE in the output) is the square root of the Error (residual) Mean Square. It estimates the common within-group standard deviation.
The parameter estimates from a single factor analysis of variance might best be ignored. Different statistical program packages fit different parametrizations of the one-way ANOVA model to the data. SYSTAT, for example, uses the usual constraint where $\sum_i \alpha_i = 0$. SAS, on the other hand, sets $\alpha_g$ to 0. Any version of the model can be used for prediction, but care must be taken with significance tests involving individual terms in the model to make sure they correspond to hypotheses of interest. In the SAS output above, the Intercept tests whether the mean bone density in the Placebo group is 0 (which is, after all, to be expected) while the coefficients for CC and CCM test whether those means are different from placebo. It is usually safer to test hypotheses directly by using whatever facilities the software provides than by taking a chance on the proper interpretation of the model parametrization the software might have implemented. The possibility of many different parametrizations is the subject of the warning that Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
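To make the point about parametrizations concrete, the sketch below (Python, illustrative only) starts from the three least squares means printed in the output and derives the coefficients under two constraints: the last level set to 0, as in the SAS output, and effects summing to 0. The sum-to-zero version here assumes an unweighted constraint, so its intercept is the plain average of the three group means; a given package may define it differently. Both sets of coefficients give back the same group means.

```python
# Least squares means copied from the SAS output above.
lsmeans = {"CC": -1.44480000, "CCM": 0.07666667, "P": -1.52068966}

# Reference-level coding: the effect of the last level (P) is set to 0,
# so the intercept is the P mean and each effect is a difference from P.
ref_intercept = lsmeans["P"]
ref_effects = {g: m - ref_intercept for g, m in lsmeans.items()}

# Sum-to-zero coding (ASSUMED unweighted): the intercept is the average of
# the group means and each effect is a deviation from that average.
stz_intercept = sum(lsmeans.values()) / len(lsmeans)
stz_effects = {g: m - stz_intercept for g, m in lsmeans.items()}

# Either parametrization predicts the same mean for every group.
ref_pred = {g: ref_intercept + ref_effects[g] for g in lsmeans}
stz_pred = {g: stz_intercept + stz_effects[g] for g in lsmeans}

print("reference coding:", ref_intercept, ref_effects)
print("sum-to-zero coding:", stz_intercept, stz_effects)
```

The reference-coding effects for CC and CCM match the parameter estimates in the output to the precision of the printed means, while the sum-to-zero coefficients are different numbers that answer different questions.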
After the parameter estimates come two examples of multiple comparisons procedures, which are used to determine which groups are different given that they are not all the same. These methods are discussed in detail in the note on multiple comparison procedures. The two methods presented here are Fisher's Least Significant Differences (the first matrix of P values) and Tukey's Honestly Significant Differences (the second matrix, labeled Tukey-Kramer). Fisher's Least Significant Differences is essentially all possible t tests. It differs only in that the estimate of the common within-group standard deviation is obtained by pooling information from all of the levels of the factor and not just the two being compared at the moment. The values in the first matrix of P values comparing groups 1&3 and 2&3 are identical to the values for the CC and CCM parameters in the model.
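The pooled t statistic behind Fisher's Least Significant Differences can be sketched as follows (Python, standard library only). The least squares means and Root MSE are copied from the output. The group sizes are an assumption: they are not printed in the output shown here and were inferred from the standard errors of the parameter estimates. They are consistent with N = 81 and with the overall mean of -0.964815. The function name is made up for the example.

```python
import math

# Copied from the SAS output above.
root_mse = 2.097346   # pooled within-group standard deviation
lsmeans = {"CC": -1.44480000, "CCM": 0.07666667, "P": -1.52068966}

# ASSUMPTION: group sizes are not printed in the output; these values were
# inferred from the standard errors of the parameter estimates.
n = {"CC": 25, "CCM": 27, "P": 29}


def lsd_t(a, b):
    """t statistic comparing groups a and b using the pooled Root MSE."""
    diff = lsmeans[a] - lsmeans[b]
    se = root_mse * math.sqrt(1.0 / n[a] + 1.0 / n[b])
    return diff / se


for a, b in [("CC", "P"), ("CCM", "P"), ("CC", "CCM")]:
    print(a, "vs", b, round(lsd_t(a, b), 2))
```

The statistics for CC vs P and CCM vs P match the t Values of the CC and CCM parameters, which is why the corresponding unadjusted P values agree with the parameter table. Each statistic is referred to a t distribution with the Error degrees of freedom (78), not the degrees of freedom of the two groups alone.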