Gerard E. Dallal, Ph.D.
Suppose I wish to compare two diets, treatments, whatever. There are two ways to do this: take two independent samples, with each subject experiencing only one of the treatments, or take paired samples, with both treatments measured on the same observational unit.
Paired samples have an obvious attraction.
Here's an extreme case of what could happen:
Consider these data from an experiment in which subjects are assigned at random to one of two diets and their cholesterol levels are measured. Do the data suggest a real difference in the effect of the two diets? The values from Diet A look like they might be a bit lower, but this difference must be judged relative to the variability within each sample. One of your first reactions on looking at these data should be, "Wow! Look at how different the values are. There is so much variability in the cholesterol levels that these data don't provide much evidence for a real difference between the diets." And that response would be correct. With P = 0.47 and a 95% CI for A-B of (-21.3, 9.3) mg/dL, we could say only that diet A produces a mean cholesterol level that could be anywhere from 21 mg/dL less than that from diet B to 9 mg/dL more.
However, suppose you are now told that a mistake had been made. The numbers are correct, but the study was performed by having every subject consume both diets. The order of the diets was selected at random for each subject with a suitable washout period between diets. Each subject's cholesterol values are connected by a straight line in the diagram to the left.
Even though the mean difference is the same (6 mg/dL), we conclude the diets are certainly different, because we now compare the mean difference of 6 to how much the individual differences vary. Each subject's cholesterol level on diet A is exactly 6 mg/dL less than on diet B! There is no question that there is an effect and that it is 6 mg/dL!
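The contrast between the two analyses can be sketched in a few lines of code. The data below are hypothetical, not the values from the study described here; they are built so that subjects vary widely from one another while each subject's two measurements differ by about 6 mg/dL, mimicking the situation above.

```python
# Hypothetical cholesterol levels (mg/dL) for five subjects on each diet.
# Subjects differ a lot from one another, but within each subject
# diet A is consistently about 6 mg/dL lower than diet B.
from statistics import mean, stdev
import math

diet_a = [180, 225, 198, 250, 210]
diet_b = [186, 231, 205, 255, 217]

diffs = [b - a for a, b in zip(diet_a, diet_b)]
n = len(diffs)

# Paired analysis: a one-sample t statistic on the within-subject differences.
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Unpaired analysis (wrongly ignoring the pairing): a two-sample t statistic.
se_unpaired = math.sqrt(stdev(diet_a) ** 2 / n + stdev(diet_b) ** 2 / n)
t_unpaired = (mean(diet_b) - mean(diet_a)) / se_unpaired

print(f"mean difference: {mean(diffs):.1f} mg/dL")
print(f"paired t = {t_paired:.1f}, unpaired t = {t_unpaired:.2f}")
```

With these made-up numbers the paired t statistic is enormous while the unpaired one is negligible: the same mean difference of about 6 mg/dL is judged against the tiny variability of the within-subject differences in one case and against the huge between-subject variability in the other.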
Notice what characterizes the two situations:
IF (note caps and bolding) the goal of a study is to compare the population means of the paired measurements, the way to do it is by calculating differences between the pairs (Diet_A-Diet_B, wife-husband) and constructing a confidence interval for the mean of the single population of differences.
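The recipe just described, reduce each pair to a single difference and then treat the differences as one sample, can be sketched as follows. The differences here are hypothetical, standing in for Diet_B - Diet_A values from five subjects, and the critical value is the 97.5th percentile of a t distribution with n-1 = 4 degrees of freedom.

```python
# Paired-analysis recipe: compute within-pair differences, then build a
# one-sample t-based 95% confidence interval for the mean difference.
from statistics import mean, stdev
import math

diffs = [6, 6, 7, 5, 7]             # hypothetical paired differences (mg/dL)
n = len(diffs)
se = stdev(diffs) / math.sqrt(n)    # standard error of the mean difference
t_crit = 2.776                      # 97.5th percentile of t with n - 1 = 4 df
lo = mean(diffs) - t_crit * se
hi = mean(diffs) + t_crit * se
print(f"95% CI for the mean difference: ({lo:.2f}, {hi:.2f}) mg/dL")
```

The two samples never appear separately in the calculation; once the differences are formed, the problem is an ordinary one-sample confidence interval.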
The point of pairing is to reduce the variability so that what's left will be more like that of the lines than the dots. The between-unit variability will be eliminated. If pairing is effective it will reduce variability enough to justify the effort involved to obtain paired data. For example, if we are interested in the difference in dairy intake of younger and older women, we could take random samples of young women and older women (independent samples). However, we might interview mother/daughter pairs (paired samples), in the hope of removing some of the lifestyle and socioeconomic differences from the age group comparison. However, it will be much harder to recruit mother/daughter pairs than samples of younger and older women. Sometimes pairing turns out to have been a good idea because variability is greatly reduced. Other times it turns out to have been a bad idea, as is often the case with matched samples.
In my experience, the LEAST EFFECTIVE pairing is age/sex matching, even if weight or body mass index is used, too. Under this kind of pairing, subjects are matched on, for example, sex, age within three years, and 1 unit of BMI. If a 35-year-old woman with a BMI of 24 is recruited, we wait until we recruit another woman between 32 and 38 years old with a BMI between 23 and 25. We then randomly assign one member of the pair to one treatment and the other member to the other treatment. The observational unit is then the pair of subjects. While this may sound like an easy way to reduce variability, matching suffers from two issues:
Pairing is not something that is done after the fact. Pairing is part of the study design. Pairing starts with an idea--the notion that pairing will reduce variability. Often it does, but often it doesn't! However, once a study is designed with pairing, it has to be analyzed that way.
If, by mistake, the data were treated as independent samples, the mean difference would be estimated properly, but the amount of uncertainty against which it must be judged would be wrong. The uncertainty will usually be overstated, causing some real differences to be missed. However, although it is unlikely, it is possible for the uncertainty to be understated, causing things to appear different even though the evidence is inadequate. Thus, criticism of an improper analysis cannot be dismissed by claiming that because an unpaired analysis shows a difference, the paired analysis will show a difference, too.
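The unlikely case, where an unpaired analysis understates the uncertainty, arises when the members of a pair are negatively correlated. A small sketch with hypothetical data: here a subject's high value on diet A goes with a low value on diet B, so the within-pair differences are more variable than the independent-samples calculation assumes.

```python
# Negatively correlated pairs: the (incorrect) unpaired standard error
# is SMALLER than the correct paired one, so the unpaired analysis
# understates the uncertainty. Data are hypothetical.
from statistics import mean, stdev
import math

diet_a = [200, 210, 220, 230, 240]
diet_b = [246, 236, 226, 216, 206]   # high A paired with low B

diffs = [b - a for a, b in zip(diet_a, diet_b)]
n = len(diffs)

se_paired = stdev(diffs) / math.sqrt(n)                             # correct SE
se_unpaired = math.sqrt(stdev(diet_a) ** 2 / n + stdev(diet_b) ** 2 / n)  # wrong SE

print(f"paired SE = {se_paired:.1f}, unpaired SE = {se_unpaired:.1f}")
```

This is the mirror image of the usual situation: positive within-pair correlation shrinks the variance of the differences below the sum of the two variances, while negative correlation inflates it.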
Paired data do not have to come from a single person. They come from the same observational unit. The observational unit is often a person, but it could be a brother and sister (rather than a male and female chosen at random) or two plates cultured in an oven at the same time or two lanes on the same gel. In short, the observational units can be defined in any way the investigator thinks will reduce variability, that is, so that the differences between measurements within a unit are less variable than differences between measurements collected independently. Because paired data do not necessarily come from the same individual, it is common for beginning analysts to fail to recognize paired data and, as a result, analyze them as though they were independent samples.
The best way to determine whether data are paired is to look for links between outcome measurements. For example,
Pairing is usually optional. In most cases an investigator can choose to design a study that leads to a paired analysis or one that uses independent samples. The choice is a matter of tradeoffs between cost, convenience, and likely benefit. A paired study requires fewer subjects, but the subjects have to experience both treatments, which might prove a major inconvenience. Subjects with partial data usually do not contribute to the analysis. Also, when treatments must be administered in sequence rather than simultaneously, there are questions about whether the first treatment will affect the response to the second treatment (carry-over effect). In most cases, a research question will not require the investigator to take paired samples, but if a paired study is undertaken, a paired analysis must be used. That is, the analysis must always reflect the design that generated the data.
Warning! It is common to have paired data when independent samples are being compared. In such cases, the pairing is used to construct two independent samples of a single response variable. This leads to the usual analysis for two independent samples, as the following example illustrates: