**Paired Data**

Gerard E. Dallal, Ph.D.

Suppose I wish to compare two diets, treatments, whatever. There are two ways to do this:

- I could randomize a group of subjects so that each subject
receives only one treatment. I could then compare the mean responses
from each treatment. (*independent samples*)
- I could give each subject BOTH treatments, randomizing the order in
which each subject receives them and, if the treatments cannot be
delivered simultaneously, providing a suitable washout period between them.
(*paired samples*)

Paired samples have an obvious attraction.

- When we compare independent samples, we have not only the variability in response to deal with (the kind of variability you'd expect if the treatment were delivered many times to the same individual), but the variability between subjects as well. Some will score high whatever you give them; others will score low. The treatment effect has to be strong enough to stand out against this background.
- When data are paired, we look at the data within each subject, so the comparison is not affected by the way subjects differ. (You have an outcome on treatment A from a blond-haired, green-eyed, 5'6", 130 lb, 40 year old female who exercises 3 times a week? Well, you also have an outcome on treatment B from a blond-haired, green-eyed, 5'6", 130 lb, 40 year old female who exercises 3 times a week...the same woman!)

Here's an extreme case of what could happen:

Consider these data from an
experiment in which subjects are assigned at random to one of two diets
and their cholesterol levels are measured. Do the data suggest a real
difference in the effect of the two diets? The values from Diet A look
like they might be a bit lower, but this difference must be judged
relative to the variability within each sample. One of your first
reactions to looking at these data should be, "Wow! Look at how
different the values are. There is so much variability in the
cholesterol levels that these data don't provide much evidence for a real
difference between the diets." And that response would be correct. With P
= 0.47 and a 95% CI for A-B of (-21.3, 9.3) mg/dL, we could say only
that diet A produces a mean cholesterol level that could be anywhere from
21 mg/dL **less** than that from diet B to 9 mg/dL **more**.

However, suppose you are now told that a mistake had been made. The numbers are correct, but the study was performed by having every subject consume both diets. The order of the diets was selected at random for each subject with a suitable washout period between diets. Each subject's cholesterol values are connected by a straight line in the diagram to the left.

Even though the mean difference is the same (6 mg/dL), we conclude the
diets are certainly different because we now compare the mean difference
of 6 to how much the individual differences vary. Each subject's
cholesterol level on diet A is *exactly* 6 mg/dL less than on diet
B! There is no question that there is an effect and that it is 6 mg/dL!

Notice what characterizes the two situations:

- The effect is the **same** in both cases: 6 mg/dL. This is not an accident. Simple algebra shows that the difference in sample means MUST be equal to the mean of the differences.
- What has changed is the measure of variability that the effect is compared to. When the underlying variability is large, the 6 mg/dL difference does not stand out. However, when the underlying variability is small, the 6 mg/dL mean difference stands out like a beacon!
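The two points above can be checked numerically. The following is a minimal sketch using made-up cholesterol values (they are not the data from the figure): the estimated effect is the same whether the samples are treated as independent or paired, but the standard error it is judged against is very different.

```python
from math import sqrt
from statistics import fmean, variance

# Hypothetical cholesterol values (mg/dL), invented for illustration only.
# Position i in each list belongs to the same subject.
diet_b = [160, 185, 210, 240, 175, 225, 195, 255]
diet_a = [155, 178, 205, 233, 169, 219, 190, 248]

n = len(diet_a)
diffs = [a - b for a, b in zip(diet_a, diet_b)]

# The estimated effect is identical either way.
diff_of_means = fmean(diet_a) - fmean(diet_b)
mean_of_diffs = fmean(diffs)

# Yardstick if the samples were independent: between-subject spread dominates.
se_independent = sqrt(variance(diet_a) / n + variance(diet_b) / n)

# Yardstick for paired data: only the spread of the within-subject differences.
se_paired = sqrt(variance(diffs) / n)

print(f"difference of means   = {diff_of_means:.2f} mg/dL")
print(f"mean of differences   = {mean_of_diffs:.2f} mg/dL")
print(f"SE, independent view  = {se_independent:.2f} mg/dL")
print(f"SE, paired view       = {se_paired:.2f} mg/dL")
```

With these invented numbers both estimates come out to -6 mg/dL, while the paired standard error is far smaller than the independent-samples one, because the subjects differ a great deal from one another but respond to the diets almost identically.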

**IF** (note caps and bolding) the goal of a study is to compare the
population means of the paired measurements, the way to do it is by
calculating differences between the pairs (Diet_A-Diet_B, wife-husband)
and constructing a confidence interval for the mean of the *single*
population of differences.

- If the sample size is large, normal theory applies and the sample mean difference and population mean difference will be within two standard errors of the mean difference 95% of the time.
- If the sample
size is small and the differences appear to be not too far from coming from a
normal distribution, Student's t distributions can be used to obtain the proper
multiplier. In practice, it often happens that even when the individual
observations have distributions that are far from normal, the differences
between them are close enough to normal that the normal distribution can be
used reliably to construct CIs. The moral here is not only to always look at
the data, but to look at the data in the proper way. If the response to be
analyzed is the difference between the individual measurements, then one should
examine the *differences* along with the individual values!
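As a rough illustration of the recipe above, the sketch below builds a confidence interval for the mean of the single population of differences. The function name `paired_mean_diff_ci` and the data are hypothetical. The caller supplies the multiplier: about 2 for large samples, or the appropriate Student's t critical value for small ones.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_mean_diff_ci(first, second, multiplier):
    """Return (low, high) for the mean of the paired differences first - second.

    `multiplier` is chosen by the caller: roughly 2 for large samples, or the
    Student's t critical value with len(first) - 1 degrees of freedom.
    """
    if len(first) != len(second):
        raise ValueError("paired data need one value per unit in each list")
    diffs = [x - y for x, y in zip(first, second)]
    center = fmean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    return center - multiplier * std_err, center + multiplier * std_err

# Hypothetical cholesterol values (mg/dL); position i is the same subject.
diet_b = [160, 185, 210, 240, 175, 225, 195, 255]
diet_a = [155, 178, 205, 233, 169, 219, 190, 248]

# 2.365 is the two-sided 95% t critical value for 8 - 1 = 7 degrees of freedom.
low, high = paired_mean_diff_ci(diet_a, diet_b, 2.365)
print(f"95% CI for mean(A - B): ({low:.2f}, {high:.2f}) mg/dL")
```

Before trusting such an interval, it is still worth plotting or listing `diffs` themselves, since the normality assumption applies to the differences rather than to the individual measurements.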

The point of pairing is to reduce the variability so that what's left
will be more like that of the lines than the dots. The between-unit
variability will be eliminated. If pairing is effective it will reduce
variability enough to justify the effort involved to obtain paired data.
For example, if we are interested in the difference in dairy intake of
younger and older women, we could take random samples of young women and
older women (independent samples). However, we might interview
mother/daughter pairs (paired samples), in the hope of removing some of
the lifestyle and socioeconomic differences from the age group
comparison. However, it will be much harder to recruit mother/daughter
pairs than samples of younger and older women. Sometimes pairing turns
out to have been a good idea because variability *is* greatly reduced.
Other times it turns out to have been a bad idea, as is often the case
with matched samples.

In my experience, the LEAST EFFECTIVE pairing is age/sex matching,
even if weight or body mass index is used, too. Under this kind of
pairing, subjects are matched on, for example, sex, age within three
years, and 1 unit of BMI. If a 35 year old woman with a BMI of 24 is
recruited, we wait until we recruit another woman between 32 and 38 years
old with a BMI of between 23 and 25. We then randomly assign one member
of the pair to one treatment and the other member to the other treatment.
The observational unit is then the *pair* of subjects. While this
may sound like an easy way to reduce variability, matching suffers from
two issues:

- Age/sex/weight matching rarely reduces variability. It is usually more effective to use statistical methods such as analysis of covariance after the fact to adjust for any age/sex/weight differences in the treatment groups.
- You usually end up having to compromise. It gets really hard to find a 5'2" 45-49 year old male, so the criteria are relaxed to "improve" recruitment. But once the age criterion becomes 5 years, you start to wonder "Why bother?" and you are right to. All you've gained from your matching is a much more complicated analysis.

Pairing is not something that is done after the fact. **Pairing is
part of the study design.** Pairing starts with an *idea*--the
notion that pairing will reduce variability. Often it does, but often it
doesn't! However, once a study is designed with pairing, it has to be
analyzed that way.

If, by mistake, the data are treated as independent samples, the mean
difference will be estimated properly but the amount of uncertainty
against which it must be judged will be wrong. The uncertainty will
*usually* be overstated, causing some real differences to be missed.
However, although it is unlikely, it is *possible* for uncertainty to be
*under*stated, causing things to appear to be different even though
the evidence is inadequate. Thus, criticism of an improper analysis
cannot be dismissed by claiming that because an unpaired analysis shows a
difference, the paired analysis will show a difference, too.
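Why the error can go in either direction follows from the variance of a difference: Var(A-B) = Var(A) + Var(B) - 2 Cov(A,B). The independent-samples analysis ignores the covariance term. The sketch below, with invented numbers and a hypothetical helper `standard_errors`, shows both the usual case (measurements within a unit move together, so the unpaired analysis overstates uncertainty) and the unusual one (they move in opposite directions, so it understates uncertainty).

```python
from math import sqrt
from statistics import variance

def standard_errors(first, second):
    """Return (SE treating the samples as independent, SE of the paired differences)."""
    n = len(first)
    diffs = [x - y for x, y in zip(first, second)]
    se_independent = sqrt(variance(first) / n + variance(second) / n)
    se_paired = sqrt(variance(diffs) / n)
    return se_independent, se_paired

# Hypothetical units whose two measurements rise and fall together (the usual case).
together = standard_errors([1, 2, 3, 4, 5], [2, 3, 5, 5, 7])

# Hypothetical units whose two measurements move in opposite directions (unusual).
opposed = standard_errors([1, 2, 3, 4, 5], [6, 5, 4, 3, 2])

for label, (se_ind, se_pair) in [("together", together), ("opposed", opposed)]:
    verdict = "overstates" if se_ind > se_pair else "understates"
    print(f"{label}: independent SE = {se_ind:.3f}, paired SE = {se_pair:.3f} "
          f"-> unpaired analysis {verdict} the uncertainty")
```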

Paired data do not have to come from a single *person*. They come
from the same *observational unit*. The observational unit is often
a person, but it could be a brother and sister (rather than a male and
female chosen at random) or two plates cultured in an oven at the same
time or two lanes on the same gel. In short, the observational units can
be defined in any way that the investigator thinks will reduce
variability, that is, that the differences between measurements within
the unit will be less variable than samples collected independently.
Because paired data do not necessarily come from the same individual,
it is common for beginning analysts to fail to recognize paired data and,
as a result, analyze them as though they were independent samples.

The best way to determine whether data are paired is to look for links between outcome measurements. For example,

- When husbands and wives are studied, there is a natural correspondence between a man and his wife.
- When independent samples of men and women are studied, no particular female is associated with any particular male.

Pairing is usually optional. In most cases an investigator can choose
to design a study that leads to a paired analysis or one that uses
independent samples. The choice is a matter of tradeoffs between cost,
convenience, and likely benefit. A paired study requires fewer subjects,
but the subjects have to experience both treatments, which might prove a
major inconvenience. Subjects with partial data usually do not
contribute to the analysis. Also, when treatments must be administered
in sequence rather than simultaneously, there are questions about whether
the first treatment will affect the response to the second treatment
(*carry-over effect*). In most cases, a research question will not
require the investigator to take paired samples, but if a paired study is
undertaken, a paired analysis **must** be used. That is, **the
analysis must always reflect the design that generated the
data**.

*Warning!* It is common to
have paired data when independent samples are being compared. In such
cases, the pairing is used to construct two independent samples of a
single response variable. This leads to the usual analysis for two
independent samples, as the following example illustrates:

- Suppose an investigator compares the effects of two diets on
cholesterol levels by randomizing subjects to one of the two diets and
measuring their cholesterol levels at the start and end of the study. The
primary outcome will be the **change** in cholesterol levels. **Each subject's before and after measurements are paired because they are made on the same subject.** The pairing will be used to construct a set of **changes in levels.** The diets will be compared by looking at **two independent samples of changes**.
- If, instead, each subject had eaten both diets--that is, if there were two diet periods with a suitable washout between them and the order of diets randomized--a paired analysis would be required because both diets would have been studied on the same people.
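The first scenario can be sketched in two steps: collapse each subject's before/after pair into a single change, then compare the diets as two independent samples of changes. The records and variable names below are hypothetical, chosen only to illustrate the structure of the analysis.

```python
from math import sqrt
from statistics import fmean, variance

# Hypothetical records: (diet, cholesterol at start, cholesterol at end), in mg/dL.
# Each subject was randomized to exactly one diet.
records = [
    ("A", 210, 198), ("A", 185, 180), ("A", 240, 225), ("A", 200, 192),
    ("B", 205, 203), ("B", 190, 192), ("B", 235, 230), ("B", 215, 214),
]

# Step 1: use the within-subject pairing to reduce each subject to one change.
changes = {"A": [], "B": []}
for diet, start, end in records:
    changes[diet].append(end - start)

# Step 2: compare the diets as two INDEPENDENT samples of changes.
mean_a = fmean(changes["A"])
mean_b = fmean(changes["B"])
std_err = sqrt(variance(changes["A"]) / len(changes["A"])
               + variance(changes["B"]) / len(changes["B"]))

print(f"mean change on A = {mean_a:.1f}, on B = {mean_b:.1f} mg/dL")
print(f"difference in mean change (A - B) = {mean_a - mean_b:.1f} mg/dL, SE = {std_err:.2f}")
```

No subject appears in both diet groups here, so nothing links a particular change on diet A to a particular change on diet B, which is why the second step uses the independent-samples standard error.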

- A hypothesis of ongoing clinical interest is that vitamin
C prevents the common cold. In a study involving 20
volunteers, 10 are randomly assigned to receive vitamin C
capsules and 10 are randomly assigned to receive placebo
capsules. The number of colds over a 12 month period is
recorded.
- A topic of current interest in ophthalmology is whether or
not spherical refraction is different between the left and
right eyes. To examine this issue, refraction is measured
in both eyes of 17 people.
- In order to compare the working environment in offices
where smoking is permitted with that in offices where
smoking is not permitted, measurements were made at 2 p.m.
in 40 work areas where smoking was permitted and 40 work
areas where smoking was not permitted.
- A question in nutrition research is whether male and
female college students undergo different mean weight
changes during their freshman year. A data file contains
the September 1994 weight (lbs), May 1995 weight (lbs),
and sex (1=male/2=female) of students from the class of
1998. The file is set up so that each record contains the
data for one student. The first 3 records, for example,
might be
    120 126 2
    118 116 2
    160 149 1
- To determine whether cardiologists and pharmacists are equally
knowledgeable about how nutrition and vitamin K affect anticoagulation
therapy (to prevent clotting), an investigator has 10 cardiologists and
10 pharmacists complete a questionnaire to measure what they know. She
contacts the administrators at 10 hospitals and asks each administrator to
select a cardiologist and pharmacist at random from the hospital's staff
to complete the questionnaire.
- To determine whether the meals served on the meal plans of public
and private colleges are equally healthful, an investigator chooses 7
public colleges and 7 private colleges at random from a list of all
colleges in Massachusetts. On each day of the week, she visits one
public college and one private college. She calculates the mean amount
of saturated fat in the dinner entrees at each school.