```
Class         Levels    Values
GROUP              3    CC CCM P

Dependent Variable: DBMD05

                                 Sum of
Source            DF            Squares     Mean Square    F Value    Pr > F
Model              2         44.0070120      22.0035060       5.00    0.0090
Error             78        343.1110102       4.3988591
Corrected Total   80        387.1180222

R-Square     Coeff Var      Root MSE    DBMD05 Mean
0.113679     -217.3832      2.097346      -0.964815

Source            DF        Type I SS     Mean Square    F Value    Pr > F
GROUP              2      44.00701202     22.00350601       5.00    0.0090

Source            DF      Type III SS     Mean Square    F Value    Pr > F
GROUP              2      44.00701202     22.00350601       5.00    0.0090

                                   Standard
Parameter         Estimate            Error    t Value    Pr > |t|
Intercept     -1.520689655 B      0.38946732      -3.90      0.0002
GROUP CC       0.075889655 B      0.57239773       0.13      0.8949
GROUP CCM      1.597356322 B      0.56089705       2.85      0.0056
GROUP P        0.000000000 B       .                .         .

NOTE: The X'X matrix has been found to be singular, and a generalized
inverse was used to solve the normal equations. Terms whose estimates
are followed by the letter 'B' are not uniquely estimable.

                        The GLM Procedure
                        Least Squares Means

                    DBMD05        LSMEAN
GROUP               LSMEAN        Number
CC             -1.44480000             1
CCM             0.07666667             2
P              -1.52068966             3

Least Squares Means for effect GROUP
Pr > |t| for H0: LSMean(i)=LSMean(j)

i/j          1         2         3
1                 0.0107    0.8949
2       0.0107              0.0056
3       0.8949    0.0056

NOTE: To ensure overall protection level, only probabilities
associated with pre-planned comparisons should be used.

Adjustment for Multiple Comparisons: Tukey-Kramer

Least Squares Means for effect GROUP
Pr > |t| for H0: LSMean(i)=LSMean(j)

i/j          1         2         3
1                 0.0286    0.9904
2       0.0286              0.0154
3       0.9904    0.0154
```

The **Analysis of Variance** table is just like any other ANOVA
table. The Total Sum of Squares is the uncertainty that would be present
if one had to predict individual responses without any other information.
The best one could do is predict each observation to be equal to the
overall sample mean. The ANOVA table partitions this variability into two
parts. One portion is accounted for by (some say "explained by") the model.
It is the reduction in uncertainty that occurs when the ANOVA model is used
for prediction, that is, when each observation is predicted by its own
group's mean. The remainder is the Error, or residual, variability.

**Model**, **Error**, **Corrected Total**, **Sum of
Squares**, **Degrees of Freedom**, **F Value**, and **Pr > F**
have the same meanings as for multiple regression. This is to be
expected since analysis of variance is nothing more than the regression
of the response on a set of indicators defined by the categorical
predictor variable.

The degrees of freedom for the model is equal to one less than the number of categories. The F ratio is nothing more than the extra sum of squares principle applied to the full set of indicator variables defined by the categorical predictor variable. The F ratio and its P value are the same regardless of the particular set of indicators (the constraint placed on the α_{i}s) that is used.

**Sums of Squares:** The total amount of variability in the
response can be written Σ_{i}Σ_{j}(y_{ij} - ȳ)², the
sum of the squared differences between each observation and the overall
mean. If we were asked to make a prediction without any other
information, the best we can do, in a certain sense, is the overall mean.
The amount of variation in the data that can't be accounted for by this
simple method of prediction is the Total Sum of Squares.

When the Analysis of Variance model is used for prediction, the best that can be done is to predict each observation to be equal to its group's mean. The amount of uncertainty that remains is the sum of the squared differences between each observation and its group's mean, Σ_{i}Σ_{j}(y_{ij} - ȳ_{i})². This is the Error sum of squares. The difference between the Total sum of squares and the Error sum of squares is the Model Sum of Squares, which happens to be equal to Σ_{i} n_{i}(ȳ_{i} - ȳ)². In this output the Model sum of squares also appears as the GROUP sum of squares.
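As a sketch of this decomposition (using made-up numbers, not the DBMD05 data from the output above), the three sums of squares can be computed directly:

```python
# Illustrative data only -- three hypothetical groups, not the study data.
groups = {
    "CC":  [1.2, 0.8, 1.5, 0.9],
    "CCM": [2.1, 2.4, 1.9, 2.6],
    "P":   [0.7, 1.1, 0.5, 0.9],
}

obs = [y for ys in groups.values() for y in ys]
grand_mean = sum(obs) / len(obs)

# Total SS: squared deviations of every observation from the grand mean
ss_total = sum((y - grand_mean) ** 2 for y in obs)

# Error SS: squared deviations of every observation from its own group mean
ss_error = sum(
    (y - sum(ys) / len(ys)) ** 2
    for ys in groups.values() for y in ys
)

# Model SS by subtraction ...
ss_model = ss_total - ss_error

# ... which agrees with the direct form: sum of n_i * (group mean - grand mean)^2
ss_model_direct = sum(
    len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
    for ys in groups.values()
)
assert abs(ss_model - ss_model_direct) < 1e-9
```

The identity Total SS = Model SS + Error SS holds exactly, which is what lets the ANOVA table "partition" the variability.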

Each sum of squares has corresponding degrees of freedom (DF) associated
with it. The Total df is one less than the number of observations, N-1.
The Model df is one less than the number of levels of the categorical
predictor, g-1. The Error df is the difference between the Total df (N-1)
and the Model df (g-1), that is, N-g. Another way to calculate the error
degrees of freedom is by summing the error degrees of freedom from each
group, n_{i}-1, over all *g* groups.
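With the numbers from the table above (a Corrected Total df of 80 implies N = 81 observations, in g = 3 groups), the bookkeeping works out as follows:

```python
# Degrees of freedom for the output above: N = 81 observations, g = 3 groups.
N, g = 81, 3

df_total = N - 1    # 80, the Corrected Total df
df_model = g - 1    # 2, the Model (GROUP) df
df_error = N - g    # 78, the Error df

# The df partition mirrors the sums-of-squares partition.
assert df_total == df_model + df_error
```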

The **Mean Squares** are the Sums of Squares divided by the
corresponding degrees of freedom.

The **F Value** or **F ratio** is the test statistic used
to decide whether the sample means are within sampling variability of
each other. That is, it tests the hypothesis H_{0}: μ_{1} = μ_{2} = ... = μ_{g}. This is the same thing as asking
whether the model as a whole has statistically significant predictive
capability in the regression framework. **F** is the ratio of the
Model Mean Square to the Error Mean Square. Under the null
hypothesis that the model has no predictive capability--that is, that all
of the population means are equal--the F statistic follows an F
distribution with g-1 numerator degrees of freedom and N-g denominator
degrees of freedom (p and n-p-1 in the regression framework, where the
p = g-1 indicator variables play the role of predictors). The null
hypothesis is rejected if the F ratio is large. This statistic and P value
might be ignored depending on the primary research question and whether a
multiple comparisons procedure is used. (See the discussion of multiple
comparison procedures.)

The **Root Mean Square Error** (also known as **the standard error
of the estimate**) is the square root of the Error (Residual) Mean Square.
It estimates the common within-group standard deviation.
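Starting from the sums of squares and degrees of freedom printed in the table above, the Mean Squares, F ratio, and Root MSE can all be reproduced:

```python
import math

# Values copied from the ANOVA table in the SAS output above.
ss_model, df_model = 44.0070120, 2
ss_error, df_error = 343.1110102, 78

ms_model = ss_model / df_model    # Model Mean Square: 22.0035060
ms_error = ss_error / df_error    # Error Mean Square: 4.3988591
f_value  = ms_model / ms_error    # F Value, printed as 5.00
root_mse = math.sqrt(ms_error)    # Root MSE: 2.097346
```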

The parameter estimates from a single factor analysis of variance
might best be ignored. Different statistical program packages fit
different parametrizations of the one-way ANOVA model to the data.
SYSTAT, for example, uses the usual constraint where Σα_{i}=0. SAS, on the other hand, sets
α_{g} to 0. Any version of the model
can be used for prediction, but care must be taken with significance
tests involving individual terms in the model to make sure they
correspond to hypotheses of interest. In the SAS output above, the
Intercept tests whether the mean bone density in the Placebo group is 0
(which is, after all, to be expected) while the coefficients for CC and
CCM test whether those means are different from placebo. It is usually
safer to test hypotheses directly by using whatever facilities the
software provides than by taking a chance on the proper interpretation of
the model parametrization the software might have implemented. The
possibility of many different parametrizations is the subject of the
warning that *Terms whose estimates are followed by the letter 'B' are
not uniquely estimable.*

After the parameter estimates come two examples of multiple
comparisons procedures, which are used to determine which groups are
different given that they are not all the same. These methods are
discussed in detail in the note on multiple comparison
procedures. The two methods presented here are *Fisher's Least
Significant Differences* and *Tukey's Honestly Significant
Differences*. Fisher's Least Significant Differences is essentially
all possible t tests. It differs only in that the estimate of the common
within group standard deviation is obtained by pooling information from
all of the levels of the factor and not just the two being compared at
the moment. The values in the matrix of P values comparing groups 1&3
and 2&3 are identical to the values for the CC and CCM parameters in the
model.
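As a check, the t values behind those two Fisher comparisons can be rebuilt from the parameter estimates and standard errors in the output:

```python
# Fisher's LSD comparisons of CC and CCM against Placebo, rebuilt from
# the parameter estimates and standard errors in the SAS output above.
# Each t value is the estimate divided by its standard error; the pooled
# (Root MSE) standard deviation is already folded into each SE.
comparisons = {
    "CC vs P":  (0.075889655, 0.57239773),
    "CCM vs P": (1.597356322, 0.56089705),
}
t_values = {k: est / se for k, (est, se) in comparisons.items()}
# Rounded, these match the t values printed for the GROUP CC and
# GROUP CCM parameters: 0.13 and 2.85.
```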