**Probability Theory**

There's a lot that could be said about probability theory. Probability theory is what makes statistical methods work. Without probability theory, there would be no way to describe the way samples might differ from the populations from which they were drawn. While it is important for mathematical statisticians to understand all of the details, all that is necessary for most analysts is to ensure that random sampling is involved in observational studies and randomization is involved in intervention trials. Beyond that, there are just four things the analyst needs to know about probability.

- The probability of an event E, P(E), is the proportion of times the event occurs in a long series of experiments.
- 0 &le; P(E) &le; 1, where P(E) = 0 if E is an impossible event and P(E) = 1 if E is a sure thing.
- If ~E is the opposite or complement of E, then P(~E) = 1 - P(E). Thus, the probability that (an individual has high blood pressure) is 1 - the probability that (an individual does not have high blood pressure).
- The probability that something is true for an individual selected at random from a population is equal to the fraction of the population for whom it is true. For example, if 10% of a population is left-handed, the probability is 0.10 or 10% that an individual chosen at random will be left-handed.
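The first and last bullets can be seen in a short simulation. This is a minimal sketch, not anything from the original text: it draws simulated individuals from a population in which 10% are left-handed (the figure used above) and shows the observed proportion settling near the true probability as the number of draws grows.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def observed_proportion(n_draws, p_left_handed=0.10):
    """Proportion of simulated individuals who are left-handed."""
    hits = sum(1 for _ in range(n_draws) if random.random() < p_left_handed)
    return hits / n_draws

# The long-run relative frequency approaches P(left-handed) = 0.10.
for n in (100, 10_000, 1_000_000):
    print(n, observed_proportion(n))
```

With a small number of draws the observed proportion bounces around; with a million draws it is very close to 0.10, which is the long-run-frequency definition of probability in the first bullet.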

**Probability, Histograms, and Distributions**

We've seen histograms--bar charts in which the area of the bar is
proportional to the number of observations having values in the range
defining the bar. Just as we can construct histograms of samples, we can
construct histograms of populations. The population histogram describes
the proportion of the population that lies between various limits. It
also describes the behavior of individual observations drawn at random
from the population, that is, it gives the probability that an individual
selected at random from the population will have a value between
specified limits. **It is critical that you understand that population
histograms describe the way individual observations behave. You should
not go on unless you do!**

When we're talking about
populations and probability, we don't use the words "population
histogram". Instead, we refer to *probability densities* and
*distribution functions*. (However, it will sometimes suit my
purposes to refer to "population histograms" to remind you what a density
is.) When the area of a histogram is standardized to 1, the histogram
becomes a probability density function. The area of any portion of the
histogram (the area under any part of the curve) is the proportion of the
population in the designated region. It is also the probability that an
individual selected at random will have a value in the designated region.
For example, if 40% of a population have cholesterol values between 200
and 230 mg/dl, 40% of the area of the histogram will be between 200 and
230 mg/dl. The probability that a randomly selected individual will have
a cholesterol level in the range 200 to 230 mg/dl is 0.40 or 40%.
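The cholesterol example can be checked numerically. This sketch assumes, purely for illustration, that cholesterol follows a normal distribution with mean 215 and standard deviation 28.6 mg/dl; those parameters are hypothetical, chosen only so that the 200-230 mg/dl interval contains roughly 40% of the area, matching the figure in the text.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a Normal(mu, sigma) variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(lo, hi, mu=215.0, sigma=28.6):
    """Area under the normal density between lo and hi."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

# Area between 200 and 230 mg/dl = proportion of the population there
# = probability that a randomly chosen individual falls in that range.
print(prob_between(200, 230))  # ≈ 0.40
```

The same number answers both questions, which is exactly the point of the paragraph: the proportion of the population in a region and the probability for a randomly selected individual are one and the same area under the density.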

Strictly speaking, the histogram is properly a *density*, which
tells you the proportion that lies between specified values. A
*(cumulative) distribution function* is something else. It is a
curve whose value is the proportion with values less than or equal to the
value on the horizontal axis, as the example to the left illustrates.
Densities have the same name as their distribution functions. For
example, a bell-shaped curve is a normal density. Observations that can
be described by a normal density are said to follow a normal
distribution.
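The density/distribution-function relationship can be made concrete for the bell-shaped (normal) case. In this sketch, the distribution function's value at x is computed two ways: as the closed-form normal CDF, and as the accumulated area under the density curve to the left of x, approximated with a simple midpoint Riemann sum.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Height of the bell-shaped normal density curve at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed-form normal distribution function: P(X <= x)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def area_left_of(x, mu=0.0, sigma=1.0, steps=100_000):
    """Midpoint-rule approximation of the area under the density up to x."""
    lo = mu - 10 * sigma  # effectively -infinity for a normal curve
    dx = (x - lo) / steps
    return sum(normal_density(lo + (i + 0.5) * dx, mu, sigma)
               for i in range(steps)) * dx

# Both numbers agree: the CDF is the running area under the density.
print(area_left_of(1.0), normal_cdf(1.0))
```

The agreement between the two values is the whole story: a density tells you the proportion between limits via area, while the distribution function reports the proportion at or below each value directly.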

If you understand that population histograms describe the way
individual observations behave, you're well on your way to understanding
what statistical methods are all about. One of the jobs of the
mathematical statistician is to describe the behavior of things other
than individual observations. If we can describe the behavior of an
individual observation, then perhaps we can describe the behavior of a
sample mean, or a sample proportion, or even the difference between two
sample means. We can! Here is the one sentence condensation of an entire
course in distribution theory: **Starting with a distribution function
that describes the behavior of individual observations, it is possible to
use mathematics to find the distribution functions that describe the
behavior of a wide variety of statistics, including means, proportions,
standard deviations, variances, percentiles, and regression
coefficients.**
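The boldface claim can be illustrated by simulation rather than mathematics. In this sketch (an illustration, not part of the original notes), individual observations come from an arbitrarily chosen exponential population with mean 10; drawing many samples and computing each sample's mean shows that the means themselves follow a predictable distribution, centered at the population mean with spread sigma/sqrt(n).

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

def sample_means(n_samples=5_000, n=30, pop_mean=10.0):
    """Means of repeated samples of size n from an exponential population."""
    return [statistics.mean(random.expovariate(1 / pop_mean) for _ in range(n))
            for _ in range(n_samples)]

means = sample_means()
# Theory predicts the sample means cluster at the population mean (10)
# with standard deviation sigma / sqrt(n) = 10 / sqrt(30) ≈ 1.83.
print(statistics.mean(means))
print(statistics.stdev(means))
```

The simulated means behave just as distribution theory says they should, even though no formula for the distribution of the mean was derived here; that derivation is what a mathematical statistics course supplies.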

If you ever take a mathematical statistics course, you'll go through a large number of examples to learn how the mathematics works. You'll gain the skills to extend statistical theory to derive distributions for statistics that have not previously been studied. However, the basic idea will be the same. Given a distribution function that describes the behavior of individual observations, you'll derive distribution functions that describe the behavior of a wide variety of statistics. In these notes, we will accept the fact that this can be done and we will use the results obtained by others to describe the behavior of statistics that interest us. We will not bother to derive them ourselves.

This is the most important idea, after study design, that we've discussed so far--that distributions describe the behavior of things. They tell us how likely it is that the quantity being described will take on particular values. So far, we've talked about individual observations only. That is, all of the densities we've seen so far describe the behavior of individual observations, such as the individual heights displayed above.

We will soon be seeing distributions that describe the behavior of things such as sample means, sample proportions, and the difference between two sample means and two sample proportions. These distributions are all used the same way. For example, the distribution of the difference between two sample means describes what is likely to happen when two samples are drawn and the difference in their means is calculated. If you ever wanted to verify this, you could repeat the study over and over and construct a histogram of mean differences. You would find that it looks the same as the density function predicted by probability theory.
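The "repeat the study over and over" thought experiment is easy to carry out by computer. This sketch uses hypothetical populations (two normal populations with means 12 and 10 and common standard deviation 3; all numbers are made up for illustration): it draws two samples, records the difference in their means, repeats many times, and summarizes the resulting pile of differences.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is reproducible

def simulated_differences(n_repeats=5_000, n1=25, n2=25):
    """Differences in sample means from two hypothetical normal populations."""
    diffs = []
    for _ in range(n_repeats):
        mean1 = statistics.mean(random.gauss(12.0, 3.0) for _ in range(n1))
        mean2 = statistics.mean(random.gauss(10.0, 3.0) for _ in range(n2))
        diffs.append(mean1 - mean2)
    return diffs

diffs = simulated_differences()
# Probability theory predicts the differences center on the true
# difference (2) with spread sqrt(3^2/25 + 3^2/25) ≈ 0.85.
print(statistics.mean(diffs))
print(statistics.stdev(diffs))
```

A histogram of `diffs` would look the same as the density function predicted by probability theory, which is exactly the verification described in the paragraph above.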