Significance Tests Simplified
If an observed significance level (P value) is less than
0.05, reject the hypothesis under test; if it is greater than
0.05, fail to reject the null hypothesis.
That was the most difficult sentence I've had to write for these
notes, not because it's wrong (indeed, it's what good analysts often
do when they conduct a statistical test) but because its indiscriminate
and blind use is at the root of much bad statistical practice. So, I hate
to just come out and say it.
It is what good analysts do! At some point, all of the
principles of good study design and execution will have been followed, a
well-posed research question will have been developed, and the proper data
will have been collected. An essential part of the subsequent analysis
will address whether the results are "statistically significant" or just
random noise. In classical statistics, significance tests are the way
statistical significance is assessed.
There are two main reasons why the indiscriminate and blind use of
significance tests is at the root of much bad statistical practice.
Significance tests are never the whole answer. They are just a piece of
the puzzle. Statistical significance is irrelevant if the effect is of
no practical importance. That said, significance tests are an important
and useful piece of the puzzle. Every so often, a cry is raised that P
values should no longer be used because of the way they can be abused.
Those who would abandon significance tests entirely because of the
potential of misuse make an even greater mistake than those who abuse them.
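
As a rough illustration of how statistical significance and practical importance can part ways, the sketch below computes two-sided P values from summary statistics using a large-sample z test. Everything in it is an assumption made for illustration: the helper name two_sided_p, the difference of 0.1 units, the common standard deviation of 10, and the two sample sizes are invented and do not come from any real study.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sd, n_per_group):
    """Two-sided P value for a difference between two group means.

    Hypothetical helper: assumes equal group sizes, a common known
    standard deviation, and a large-sample (normal) approximation.
    """
    se = sd * sqrt(2 / n_per_group)   # standard error of the difference
    z = diff / se                     # test statistic
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny observed difference (0.1 units, sd = 10) in both scenarios.
p_small_study = two_sided_p(diff=0.1, sd=10, n_per_group=100)
p_huge_study = two_sided_p(diff=0.1, sd=10, n_per_group=1_000_000)

print(f"n = 100 per group:       P = {p_small_study:.3f}")
print(f"n = 1,000,000 per group: P = {p_huge_study:.2e}")
```

With these made-up numbers, the same 0.1-unit difference is far from significant in the small study and overwhelmingly significant in the huge one. Whether a 0.1-unit difference matters is a question the P value cannot answer.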
- Often, investigators are so blinded by statistical significance
that they all but ignore practical importance. Any result that is
statistically significant is the cause of great joy and celebration, even
if it is of no practical importance whatsoever. Groups whose difference
is not statistically significant are reported as being "the same" or we
are told that "there is no difference between the groups"!
- The value 0.05 takes on mystical characteristics. A good analyst
also knows that 0.05 is not a magic number. There is little difference
between P=0.04 and P=0.06. Construct a few 94% and 96% CIs from the same
data set and see how little they differ (one is about 10% longer than the
other). Significance tests by themselves suggest that the research
question is only about whether there is a difference, no matter the size
or direction. The confidence intervals corresponding to P=0.04 and
P=0.06 will probably show much the same thing, namely, they will rule out
differences of practical importance in a particular direction. They will
also suggest that, if there is a difference, it may or may not be of
practical importance.
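
The suggestion to compare 94% and 96% confidence intervals can be tried with a few lines of code. The sketch below is only one way to do it: it assumes a large-sample (normal-approximation) interval for a mean, the helper name normal_ci is hypothetical, and the sample values are made up for illustration. A 94% interval excludes a hypothesized value exactly when the two-sided P value is below 0.06, and a 96% interval does so when it is below 0.04.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_ci(data, level):
    """Normal-approximation confidence interval for the mean of data.

    Hypothetical helper; a t-based interval would be somewhat wider
    for a sample this small.
    """
    center = mean(data)
    se = stdev(data) / sqrt(len(data))          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical value
    return center - z * se, center + z * se


# Made-up illustrative data, not taken from the original notes.
sample = [4.1, 5.3, 2.8, 6.0, 3.9, 5.1, 4.7, 3.2, 5.8, 4.4,
          4.9, 3.6, 5.5, 4.0, 4.6, 5.2, 3.8, 4.3, 5.0, 4.5]

lo94, hi94 = normal_ci(sample, 0.94)
lo96, hi96 = normal_ci(sample, 0.96)
width_ratio = (hi96 - lo96) / (hi94 - lo94)

print(f"94% CI: ({lo94:.2f}, {hi94:.2f})")
print(f"96% CI: ({lo96:.2f}, {hi96:.2f})")
print(f"96% interval is {100 * (width_ratio - 1):.0f}% longer")
```

Under the normal approximation the ratio of the two lengths depends only on the two critical values, so it comes out a little over 9% whatever data are used. That is the "about 10%" mentioned above.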
Good Analyst, Bad Analyst
The difference between a good analyst and a bad analyst is that the
bad analyst wants a P value and pays no attention to the quality of the
data. The bad analyst sees only P<0.05 or P>0.05 with no regard for
confidence intervals or for the context in which the data were collected.
The good analyst first considers whether all of the principles of
good study design have been followed. The good analyst knows what a test
procedure requires for the resulting P value to be valid. The good
analyst treats the P value as an important part of the analysis, but not
as the whole answer.
Copyright © 2000 Gerard E. Dallal