Significantly Different From Each Other

[To explain the concepts in a straightforward manner,
I'm being a bit loose with my language. I am using the word
*similar* to indicate *not shown to be different statistically*
or *within sampling variability of each other.*]

When statistical program packages report the results of a multiple comparisons procedure, the output is usually in the form of a list of pairwise comparisons along with an indication of whether each comparison is statistically significant. When these results are summarized for publication, standard practice is to present a table of means with various superscripts attached and a comment such as,

- "Means sharing the same superscript are not significantly different from each other (Tukey's HSD, P<0.05)" or
- "Means that have no superscript in common are significantly different from each other (Tukey's HSD, P<0.05)."

This procedure is widely used. Nevertheless, at the time of this writing (November 2007; the last version was written in 2003 and, before that, March 2000!), none of the major statistical packages--SAS, SPSS, SYSTAT--provides the superscripts automatically. The analyst must deduce them from the table of P values. The one exception is the MEANS statement of SAS's GLM procedure, which can be used only when the number of observations is the same for each group or treatment. Since the computer software refuses to do the work, the analyst is left to translate the list of pairwise differences into a set of superscripts so that those not judged different from each other share a superscript while those judged different do not have a superscript in common.

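By way of example, consider a set of four groups--A, B, C, D--where A was judged different from B and B was judged different from D. A brute force approach might use a different superscript for each possible comparison, eliminating those superscripts where the pair is judged significantly different. There are six possible comparisons--AB, AC, AD, BC, BD, CD--so the brute force approach would start with six superscripts

A^{abc} B^{ade} C^{bdf} D^{cef}

where, lettering the pairs in the order listed, the superscript a marks the pair AB, b marks AC, c marks AD, d marks BC, e marks BD, and f marks CD. Eliminating the superscripts for the two significantly different pairs, a (AB) and e (BD), leaves

A^{bc} B^{d} C^{bdf} D^{cf}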

This is a true description of the differences between the groups, but it is awkward when you consider that the same set of differences can be written

A^{a} B^{b} C^{ab} D^{a}

In both cases, A & B do not share a
superscript, nor do B & D. However, every other combination *does*
share a superscript. The second expression is much easier to interpret
because

- it contains only two superscripts rather than the four in the first expression and
- it makes it much easier to identify sets of 3 or more similar treatments.
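The brute force approach can be sketched in a few lines of Python. This is only an illustration: the function name `brute_force_labels` is hypothetical, and lettering the six pairs a through f in the order AB, AC, AD, BC, BD, CD is an assumption made for the example.

```python
from itertools import combinations
from string import ascii_lowercase

def brute_force_labels(treatments, dissimilar_pairs):
    """One superscript per pair of treatments, dropped if the pair differs."""
    different = {frozenset(p) for p in dissimilar_pairs}
    labels = {t: "" for t in treatments}
    # Letters are assigned to all possible pairs before any are eliminated.
    for letter, pair in zip(ascii_lowercase, combinations(treatments, 2)):
        if frozenset(pair) in different:
            continue  # significantly different pair: eliminate its superscript
        for t in pair:
            labels[t] += letter
    return labels

brute = brute_force_labels("ABCD", [("A", "B"), ("B", "D")])
print(brute)  # {'A': 'bc', 'B': 'd', 'C': 'bdf', 'D': 'cf'}
```

Four distinct letters survive, which is why the result is harder to read than a marking that uses only two.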

There is a straightforward way to obtain the simpler expression. A computer program to generate the superscripts is now available. The procedure takes sets of similar treatments and divides them if they contain pairs of treatments that have been shown to be different.

**Initially, there is one set of treatments that are judged to be similar. It includes all of the treatments.**

**Then, every dissimilar pair of treatments is considered in turn. The order in which the dissimilar pairs are considered is unimportant. Two operations are performed for each dissimilar pair:**

- Each of the similar sets is considered in turn. If a similar set contains the dissimilar pair under consideration, the similar set is replaced with two smaller sets by rewriting it twice, each without one of the two dissimilar treatments.
- Similar sets that are subsets of other similar sets are deleted.
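The two operations can be sketched in Python as follows. This is a minimal illustration, not the program mentioned above; the function name `similar_sets` and the use of single-character treatment names are assumptions made for the example.

```python
def similar_sets(treatments, dissimilar_pairs):
    """Split the full set of treatments until no set contains a dissimilar pair."""
    sets = [frozenset(treatments)]  # initially one set holding every treatment
    for x, y in dissimilar_pairs:
        split = []
        for s in sets:
            if x in s and y in s:
                # Rewrite the set twice, each time without one of the pair.
                split.extend([s - {x}, s - {y}])
            else:
                split.append(s)
        # Delete duplicates, then sets that are proper subsets of another set.
        unique = list(dict.fromkeys(split))
        sets = [s for s in unique if not any(s < t for t in unique)]
    return sets

result = similar_sets("ABCD", [("A", "B"), ("B", "D")])
print(sorted("".join(sorted(s)) for s in result))  # ['ACD', 'BC']
```

Passing the pairs in the opposite order, `[("B", "D"), ("A", "B")]`, gives the same two sets.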

Example: Consider the situation described earlier: four treatments A, B, C, D where the pairwise differences A&B and B&D have been judged statistically significant.

With four treatments A,B,C,D, the initial similar set is

**ABCD**

There are two dissimilar pairs, (A,B) and (B,D).

**Start with (A,B)**. Since ABCD contains (A,B), replace ABCD by rewriting it twice, once without A and once without B, to get

**ACD** and **BCD**

**Next, consider (B,D)**. Since the similar set ACD does not contain (B,D), leave it alone. The similar set BCD *does* contain (B,D). Therefore, replace BCD by rewriting it twice, once without B and once without D, to get

**ACD, CD,** and **BC**.

CD is eliminated because it is contained in ACD, leaving

**ACD** and **BC**.

Thus, two marks/superscripts are needed. One is attached to the means of A, C, and D. The other is attached to the means of B and C.

A^{a} B^{b} C^{ab} D^{a}

This is consistent with the analysis that said the only statistically significant differences among these means were between A & B and B & D. A & B do not share a superscript, nor do B & D. Every other combination, however, *does* share a superscript.
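Turning the similar sets into superscripts, and checking the result against the list of significant differences, might look like the sketch below. The function name `superscripts` is hypothetical, and giving the sets letters in the order they are supplied is an assumption made for the example.

```python
from itertools import combinations
from string import ascii_lowercase

def superscripts(treatments, sets):
    """Give each similar set its own letter and collect the letters per treatment."""
    labels = {t: "" for t in treatments}
    for letter, members in zip(ascii_lowercase, sets):
        for t in members:
            labels[t] += letter
    return labels

labels = superscripts("ABCD", ["ACD", "BC"])
print(labels)  # {'A': 'a', 'B': 'b', 'C': 'ab', 'D': 'a'}

# Pairs with no superscript in common should be exactly the dissimilar pairs.
unshared = [x + y for x, y in combinations("ABCD", 2)
            if not set(labels[x]) & set(labels[y])]
print(unshared)  # ['AB', 'BD']
```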

**(Never!) Attaching Superscripts To Singletons**

Some researchers have attached unique superscripts to single means that are judged to be different from all other means. For example, suppose when comparing four treatment means,

- D was judged significantly different from A, B, and C,
- while A, B, and C showed no statistically significant differences among themselves.

These researchers would mark the means A^{a} B^{a} C^{a} D^{b}, while I would write A^{a} B^{a} C^{a} D.

I find superscripts affixed to a single mean to be the *worst*
kind of visual clutter. They invite the reader to look for matches that
don't exist. It's similar to reading an article that includes a symbol
indicating a footnote and being unable to find the footnote! Without such
superscripts, unique means stand unadorned and the absence of any
superscript trumpets a mean's uniqueness. For this reason, I **never use
superscripts that would be attached to only one mean**.
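In code, this rule amounts to discarding any similar set with a single member before letters are assigned. The sketch below assumes the similar sets for the example above are ABC and D; the name `drop_singletons` is hypothetical.

```python
from string import ascii_lowercase

def drop_singletons(sets):
    """Keep only similar sets with two or more treatments."""
    return [s for s in sets if len(s) > 1]

kept = drop_singletons(["ABC", "D"])
marks = {t: "" for t in "ABCD"}
for letter, members in zip(ascii_lowercase, kept):
    for t in members:
        marks[t] += letter

print(marks)  # {'A': 'a', 'B': 'a', 'C': 'a', 'D': ''}
```

D ends up with no superscript at all, so it still shares no superscript with any other mean.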