There's a lot that could be said about probability theory. Probability theory is what makes statistical methods work. Without probability theory, there would be no way to describe the way samples might differ from the populations from which they were drawn. While it is important for mathematical statisticians to understand all of the details, all that is necessary for most analysts is to ensure that random sampling is involved in observational studies and randomization is involved in intervention trials. Beyond that, there are just four things the analyst needs to know about probability.
Probability, Histograms, Distributions
We've seen histograms--bar charts in which the area of the bar is proportional to the number of observations having values in the range defining the bar. Just as we can construct histograms of samples, we can construct histograms of populations. The population histogram describes the proportion of the population that lies between various limits. It also describes the behavior of individual observations drawn at random from the population, that is, it gives the probability that an individual selected at random from the population will have a value between specified limits. It is critical that you understand that population histograms describe the way individual observations behave. You should not go on unless you do!
When we're talking about populations and probability, we don't use the words "population histogram". Instead, we refer to probability densities and distribution functions. (However, it will sometimes suit my purposes to refer to "population histograms" to remind you what a density is.) When the area of a histogram is standardized to 1, the histogram becomes a probability density function. The area of any portion of the histogram (the area under any part of the curve) is the proportion of the population in the designated region. It is also the probability that an individual selected at random will have a value in the designated region. For example, if 40% of a population have cholesterol values between 200 and 230 mg/dl, 40% of the area of the histogram will be between 200 and 230 mg/dl. The probability that a randomly selected individual will have a cholesterol level in the range 200 to 230 mg/dl is 0.40 or 40%.
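The cholesterol example can be checked with a small simulation. What follows is only a sketch under stated assumptions: the population is made up, drawn from a normal curve with a mean of 215 mg/dl and a standard deviation of 28.6 mg/dl, numbers I chose so that roughly 40% of the values fall between 200 and 230. It builds a histogram whose total area is 1, adds up the area of the bars between 200 and 230, and compares that area with the share of randomly selected individuals who land in the same range.

```python
import random

random.seed(1)

# Hypothetical population of cholesterol values (mg/dl).
# The mean and standard deviation are illustrative assumptions.
population = [random.gauss(215, 28.6) for _ in range(50_000)]

WIDTH = 10  # each bar covers 10 mg/dl

# Bars start at a multiple of WIDTH, so 200 and 230 are bar edges.
start = WIDTH * int(min(population) // WIDTH)
n_bars = int((max(population) - start) // WIDTH) + 1

counts = [0] * n_bars
for value in population:
    counts[int((value - start) // WIDTH)] += 1

# Scale the bar heights so that the total area of the histogram is 1.
heights = [c / (len(population) * WIDTH) for c in counts]
total_area = sum(h * WIDTH for h in heights)

# Area of the bars covering 200 up to (but not including) 230 mg/dl.
first = (200 - start) // WIDTH
last = (230 - start) // WIDTH
area_200_230 = sum(h * WIDTH for h in heights[first:last])

# The same quantity, counted directly from the population.
direct_share = sum(200 <= v < 230 for v in population) / len(population)

# Select individuals at random and see how often they land in the range.
draws = [random.choice(population) for _ in range(20_000)]
draw_share = sum(200 <= v < 230 for v in draws) / len(draws)

print(f"total area of histogram:        {total_area:.4f}")
print(f"area between 200 and 230:       {area_200_230:.4f}")
print(f"proportion of population there: {direct_share:.4f}")
print(f"share of random draws there:    {draw_share:.4f}")
```

The area and the population proportion agree by construction. The share of random draws is close to both, and it gets closer as the number of draws grows.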
Strictly speaking, the histogram is properly a density, which tells you the proportion that lies between specified values. A (cumulative) distribution function is something else. It is a curve whose value is the proportion with values less than or equal to the value on the horizontal axis, as the example to the left illustrates. Densities have the same name as their distribution functions. For example, a bell-shaped curve is a normal density. Observations that can be described by a normal density are said to follow a normal distribution.
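To make the distinction between a density and a distribution function concrete, here is a minimal sketch using the NormalDist class from Python's standard statistics module. The normal population (mean 215, standard deviation 28.6) is an assumption made only for illustration. The distribution function gives the proportion at or below a value, so the proportion between two values is the difference of two distribution function values, and that difference should match the area under the density between the same two values.

```python
from statistics import NormalDist

# Hypothetical normal population; mean and SD are illustrative assumptions.
cholesterol = NormalDist(mu=215, sigma=28.6)

# Distribution function: proportion of the population with values <= x.
below_200 = cholesterol.cdf(200)
below_230 = cholesterol.cdf(230)

# Proportion between 200 and 230 as a difference of the two values.
between = below_230 - below_200

# The same proportion as an area under the density, approximated by
# adding up thin rectangles (midpoint rule).
steps = 3000
width = (230 - 200) / steps
area = sum(cholesterol.pdf(200 + (i + 0.5) * width) for i in range(steps)) * width

print(f"proportion at or below 200: {below_200:.4f}")
print(f"proportion at or below 230: {below_230:.4f}")
print(f"difference:                 {between:.4f}")
print(f"area under the density:     {area:.4f}")
```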
If you understand that population histograms describe the way individual observations behave, you're well on your way to understanding what statistical methods are all about. One of the jobs of the mathematical statistician is to describe the behavior of things other than individual observations. If we can describe the behavior of an individual observation, then perhaps we can describe the behavior of a sample mean, or a sample proportion, or even the difference between two sample means. We can! Here is the one-sentence condensation of an entire course in distribution theory: Starting with a distribution function that describes the behavior of individual observations, it is possible to use mathematics to find the distribution functions that describe the behavior of a wide variety of statistics, including means, proportions, standard deviations, variances, percentiles, and regression coefficients.
If you ever take a mathematical statistics course, you'll go through a large number of examples to learn how the mathematics works. You'll gain the skills to extend statistical theory to derive distributions for statistics that have not previously been studied. However, the basic idea will be the same. Given a distribution function that describes the behavior of individual observations, you'll derive distribution functions that describe the behavior of a wide variety of statistics. In these notes, we will accept the fact that this can be done and we will use the results obtained by others to describe the behavior of statistics that interest us. We will not bother to derive them ourselves.
This is the most important idea, after study design, that we've discussed so far--that distributions describe the behavior of things. They tell us how likely it is that the quantity being described will take on particular values. So far, we've talked about individual observations only. That is, all of the densities we've seen so far describe the behavior of individual observations, such as the individual heights displayed above.
We will soon be seeing distributions that describe the behavior of things such as sample means, sample proportions, the difference between two sample means, and the difference between two sample proportions. These distributions are all used the same way. For example, the distribution of the difference between two sample means describes what is likely to happen when two samples are drawn and the difference in their means is calculated. If you ever wanted to verify this, you could repeat the study over and over and construct a histogram of mean differences. You would find that it looks essentially the same as the density function predicted by probability theory.
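Repeating a study over and over is tedious in practice but easy on a computer. The sketch below assumes two hypothetical normal populations with means of 170 and 165 and a common standard deviation of 8, and samples of 25 from each; all of these numbers are made up for illustration. It relies on one standard result: for independent samples from normal populations, the difference between the sample means is itself normally distributed, with mean equal to the difference in population means and standard deviation equal to the square root of the sum of each variance divided by its sample size.

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical populations; every number here is an illustrative assumption.
MU_A, MU_B, SIGMA = 170.0, 165.0, 8.0
N_A, N_B = 25, 25
REPEATS = 5_000

def mean_difference():
    """Run the 'study' once: draw two samples, return the difference in means."""
    a = [random.gauss(MU_A, SIGMA) for _ in range(N_A)]
    b = [random.gauss(MU_B, SIGMA) for _ in range(N_B)]
    return statistics.fmean(a) - statistics.fmean(b)

# Repeat the study many times and keep each mean difference.
differences = [mean_difference() for _ in range(REPEATS)]

# Distribution of the mean difference predicted by probability theory.
theory = statistics.NormalDist(MU_A - MU_B, SIGMA * math.sqrt(1 / N_A + 1 / N_B))

simulated_mean = statistics.fmean(differences)
simulated_sd = statistics.stdev(differences)

# Compare how much of each distribution lies within one SD of the center.
low, high = theory.mean - theory.stdev, theory.mean + theory.stdev
simulated_share = sum(low <= d <= high for d in differences) / REPEATS
predicted_share = theory.cdf(high) - theory.cdf(low)

print(f"mean of differences: simulated {simulated_mean:.3f}, theory {theory.mean:.3f}")
print(f"SD of differences:   simulated {simulated_sd:.3f}, theory {theory.stdev:.3f}")
print(f"share within one SD: simulated {simulated_share:.3f}, theory {predicted_share:.3f}")
```

The simulated values will not match the theoretical ones exactly, because 5,000 repetitions is still a finite number, but the agreement improves as the number of repetitions grows.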