Paired Data

Gerard E. Dallal, Ph.D.

Paired Data--In Theory

Suppose I wish to compare two diets, treatments, whatever. There are two ways to do this:

I could randomize a group of subjects so that each subject receives only one treatment. I could then compare the mean responses from each treatment. (independent samples)
I could give each subject BOTH treatments, randomizing the order in which each subject receives them and, if the treatments cannot be delivered simultaneously, providing a suitable washout period between them. (paired samples)

Paired samples have an obvious attraction.

When we compare independent samples, we have not only the variability in response to deal with (the kind of variability you'd expect if the treatment were delivered many times to the same individual), but the variability between subjects as well. Some will score high whatever you give them; others will score low. The treatment effect has to be strong enough to stand out against this background.
When data are paired, we look at the data within each subject, so the comparison is not affected by the way subjects differ. (You have an outcome on treatment A from a blond-haired, green-eyed, 5'6", 130 lb, 40 year old female who exercises 3 times a week? Well, you also have an outcome on treatment B from a blond-haired, green-eyed, 5'6", 130 lb, 40 year old female who exercises 3 times a week...the same woman!)

Here's an extreme case of what could happen:

Consider these data from an experiment in which subjects are assigned at random to one of two diets and their cholesterol levels are measured. Do the data suggest a real difference in the effect of the two diets? The values from Diet A look like they might be a bit lower, but this difference must be judged relative to the variability within each sample. One of your first reactions to looking at these data should be, "Wow! Look at how different the values are. There is so much variability in the cholesterol levels that these data don't provide much evidence for a real difference between the diets." And that response would be correct. With P = 0.47 and a 95% CI for A-B of (-21.3, 9.3) mg/dL, we could say only that diet A produces a mean cholesterol level that could be anywhere from 21 mg/dL less than that from diet B to 9 mg/dL more.

However, suppose you are now told that a mistake had been made. The numbers are correct, but the study was performed by having every subject consume both diets. The order of the diets was selected at random for each subject with a suitable washout period between diets. Each subject's cholesterol values are connected by a straight line in the diagram to the left.

Even though the mean difference is the same (6 mg/dL), we conclude the diets are certainly different because we now compare the mean difference of 6 to how much the individual differences vary. Each subject's cholesterol level on diet A is exactly 6 mg/dL less than on diet B! There is no question that there is an effect and that it is 6 mg/dL!
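The contrast between the two analyses can be sketched in a few lines of Python. The cholesterol values below are hypothetical (they are not the data from the figure) and are constructed so that every subject's diet A value is exactly 6 mg/dL below the diet B value. The function names are illustrative, and the independent-samples standard error uses unpooled variances as one reasonable choice.

```python
import math
import statistics

# Hypothetical cholesterol values (mg/dL), NOT the data from the figure.
# Each subject's diet A value is exactly 6 mg/dL below the diet B value.
diet_b = [172, 185, 198, 210, 224, 239, 251, 266]
diet_a = [b - 6 for b in diet_b]


def independent_se(x, y):
    """Standard error of the difference in means, treating x and y as
    independent samples (unpooled variances)."""
    return math.sqrt(statistics.variance(x) / len(x)
                     + statistics.variance(y) / len(y))


def paired_se(x, y):
    """Standard error of the mean of the within-subject differences."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


mean_diff = statistics.mean(diet_a) - statistics.mean(diet_b)
print("difference in means (A - B):", mean_diff)
print("SE if samples were independent:", round(independent_se(diet_a, diet_b), 2))
print("SE using the pairing:", paired_se(diet_a, diet_b))
```

The estimated effect is the same either way; only the yardstick it is measured against changes. With these made-up numbers the paired standard error is zero because every difference is identical, which is the extreme case described above.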

Notice what characterizes the two situations:

The effect is the same in both cases: 6 mg/dL. This is not an accident. Simple algebra shows that the difference in sample means MUST be equal to the mean of the differences.
What has changed is the measure of variability that the effect is compared to. When the underlying variability is large, the 6 mg/dL difference does not stand out. However, when the underlying variability is small, the 6 mg/dL mean difference stands out like a beacon!
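The algebra behind the first point is short. Writing a_i and b_i for subject i's values on the two diets and n for the number of subjects (notation introduced here only for illustration), the mean of the differences is

```latex
\bar{d} \;=\; \frac{1}{n}\sum_{i=1}^{n} (a_i - b_i)
        \;=\; \frac{1}{n}\sum_{i=1}^{n} a_i \;-\; \frac{1}{n}\sum_{i=1}^{n} b_i
        \;=\; \bar{a} - \bar{b}
```

so the two ways of computing the effect cannot disagree when every subject contributes both values.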

IF (note caps and bolding) the goal of a study is to compare the population means of the paired measurements, the way to do it is by calculating differences between the pairs (Diet_A-Diet_B, wife-husband) and constructing a confidence interval for the mean of the single population of differences.

If the sample size is large, normal theory applies and the sample mean difference will be within two standard errors of the population mean difference about 95% of the time.
If the sample size is small and the differences appear to be not too far from coming from a normal distribution, Student's t distributions can be used to obtain the proper multiplier. In practice, it often happens that even when the individual observations have distributions that are far from normal, the differences between them are close enough to normal that the normal distribution can be used reliably to construct CIs. The moral here is not only to always look at the data, but to look at the data in the proper way. If the response to be analyzed is the difference between the individual measurements, then one should examine the differences along with the individual values!
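As a minimal sketch of the calculation just described, the following Python computes a confidence interval for the mean of the differences. The data are hypothetical, `paired_ci` is an illustrative name, and the multiplier is passed in explicitly: the normal value comes from the standard library, while the Student's t value for 9 degrees of freedom (2.262) is taken from a standard t table rather than computed.

```python
import math
import statistics
from statistics import NormalDist


def paired_ci(a, b, multiplier):
    """CI for the mean of the paired differences a[i] - b[i]:
    mean difference +/- multiplier * standard error."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_d - multiplier * se, mean_d + multiplier * se


# Hypothetical paired measurements (10 subjects), for illustration only.
diet_a = [201, 188, 240, 215, 176, 229, 194, 208, 252, 183]
diet_b = [209, 191, 243, 225, 180, 231, 203, 210, 262, 186]

z = NormalDist().inv_cdf(0.975)  # about 1.96, the large-sample multiplier
t9 = 2.262                       # t table value: 9 degrees of freedom, 97.5th percentile

print("normal-theory 95% CI:", paired_ci(diet_a, diet_b, z))
print("Student's t 95% CI:  ", paired_ci(diet_a, diet_b, t9))
```

With only 10 pairs the t interval is the appropriate one, and it is somewhat wider than the normal-theory interval. Before trusting either, the differences themselves should be plotted, as the text advises.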

Paired Data--In Practice

The point of pairing is to reduce the variability so that what's left will be more like that of the lines than the dots. The between-unit variability will be eliminated. If pairing is effective, it will reduce variability enough to justify the effort involved in obtaining paired data. For example, if we are interested in the difference in dairy intake of younger and older women, we could take random samples of young women and older women (independent samples). Alternatively, we might interview mother/daughter pairs (paired samples), in the hope of removing some of the lifestyle and socioeconomic differences from the age group comparison. However, it will be much harder to recruit mother/daughter pairs than samples of younger and older women. Sometimes pairing turns out to have been a good idea because variability is greatly reduced. Other times it turns out to have been a bad idea, as is often the case with matched samples.

In my experience, the LEAST EFFECTIVE pairing is age/sex matching, even if weight or body mass index is used, too. Under this kind of pairing, subjects are matched on, for example, sex, age within three years, and BMI within 1 unit. If a 35 year old woman with a BMI of 24 is recruited, we wait until we recruit another woman between 32 and 38 years old with a BMI between 23 and 25. We then randomly assign one member of the pair to one treatment and the other member to the other treatment. The observational unit is then the pair of subjects. While this may sound like an easy way to reduce variability, matching suffers from two issues:

Age/sex/weight matching rarely reduces variability. It is usually more effective to use statistical methods such as analysis of covariance after the fact to adjust for any age/sex/weight differences in the treatment groups.
You usually end up having to compromise. It gets really hard to find a 5'2" 45-49 year old male, so the criteria are relaxed to "improve" recruitment. But once the age criterion becomes "within 5 years," you start to wonder "Why bother?" and you are correct. All you've gained from your matching is a much more complicated analysis.

Pairing is not something that is done after the fact. Pairing is part of the study design. Pairing starts with an idea--the notion that pairing will reduce variability. Often it does, but often it doesn't! However, once a study is designed with pairing, it has to be analyzed that way.

If, by mistake, paired data are treated as independent samples, the mean difference will be estimated properly but the amount of uncertainty against which it must be judged will be wrong. The uncertainty will usually be overstated, causing some real differences to be missed. However, although it is unlikely, it is possible for uncertainty to be understated, causing things to appear to be different even though the evidence is inadequate. Thus, criticism of an improper analysis cannot be dismissed by claiming that because an unpaired analysis shows a difference, the paired analysis will show a difference, too.
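Both directions of error follow from the fact that the variance of a difference is Var(A) + Var(B) - 2 Cov(A, B). When the members of a pair rise and fall together the covariance is positive and the independent-samples analysis overstates the uncertainty; when they move in opposite directions it understates it. A small sketch with hypothetical numbers and illustrative function names shows both cases:

```python
import math
import statistics


def se_independent(x, y):
    """SE of the difference in means if x and y are treated as independent."""
    return math.sqrt(statistics.variance(x) / len(x)
                     + statistics.variance(y) / len(y))


def se_paired(x, y):
    """SE of the mean within-pair difference."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


# Hypothetical pairs whose members rise and fall together (the usual case).
x_pos = [10, 20, 30, 40, 50]
y_pos = [12, 19, 33, 41, 48]

# Hypothetical pairs whose members move in opposite directions (rare).
x_neg = [10, 20, 30, 40, 50]
y_neg = [48, 41, 33, 19, 12]

for label, x, y in [("positively related", x_pos, y_pos),
                    ("negatively related", x_neg, y_neg)]:
    print(label,
          "| independent SE:", round(se_independent(x, y), 2),
          "| paired SE:", round(se_paired(x, y), 2))
```

The independent-samples standard error is identical in the two cases because it ignores which values belong together; only the paired standard error responds to the way the pairs are linked.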

Paired data do not have to come from a single person. They come from the same observational unit. The observational unit is often a person, but it could be a brother and sister (rather than a male and female chosen at random) or two plates cultured in an oven at the same time or two lanes on the same gel. In short, the observational units can be defined in any way that the investigator thinks will reduce variability, that is, in any way that makes the differences between measurements within a unit less variable than differences between measurements collected independently. Because paired data do not necessarily come from the same individual, it is common for beginning analysts to fail to recognize paired data and, as a result, analyze them as though they were independent samples.

The best way to determine whether data are paired is to look for links between outcome measurements. For example,

When husbands and wives are studied, there is a natural correspondence between a man and his wife.
When independent samples of men and women are studied, no particular female is associated with any particular male.

Pairing is usually optional. In most cases an investigator can choose to design a study that leads to a paired analysis or one that uses independent samples. The choice is a matter of tradeoffs between cost, convenience, and likely benefit. A paired study requires fewer subjects, but the subjects have to experience both treatments, which might prove a major inconvenience. Subjects with partial data usually do not contribute to the analysis. Also, when treatments must be administered in sequence rather than simultaneously, there are questions about whether the first treatment will affect the response to the second treatment (carry-over effect). In most cases, a research question will not require the investigator to take paired samples, but if a paired study is undertaken, a paired analysis must be used. That is, the analysis must always reflect the design that generated the data.

Warning! It is common to have paired data when independent samples are being compared. In such cases, the pairing is used to construct two independent samples of a single response variable. This leads to the usual analysis for two independent samples, as the following example illustrates:

Suppose an investigator compares the effects of two diets on cholesterol levels by randomizing subjects to one of the two diets and measuring their cholesterol levels at the start and end of the study. The primary outcome will be the change in cholesterol levels. Each subject's before and after measurements are paired because they are made on the same subject. The pairing will be used to construct a set of changes in levels. The diets will be compared by looking at two independent samples of changes.
If, instead, each subject had eaten both diets--that is, if there were two diet periods with a suitable washout between them and the order of diets randomized--a paired analysis would be required because both diets would have been studied on the same people.
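The first design can be sketched as follows. The before/after values are hypothetical, the helper name `changes` is illustrative, and the unpooled standard error is one reasonable choice rather than the only one. The pairing is used once, to turn each subject's two measurements into a single change; the diets are then compared as two independent samples of changes.

```python
import math
import statistics

# Hypothetical (before, after) cholesterol values in mg/dL. Each subject was
# randomized to ONE diet, so the two groups contain different people.
diet_a = [(220, 205), (198, 190), (245, 228), (210, 204), (232, 215)]
diet_b = [(215, 212), (240, 236), (205, 207), (228, 221), (199, 198)]


def changes(records):
    """Use the before/after pairing to reduce each subject to one number."""
    return [after - before for before, after in records]


change_a = changes(diet_a)
change_b = changes(diet_b)

# Compare the two diets as independent samples of changes.
diff_in_mean_change = statistics.mean(change_a) - statistics.mean(change_b)
se = math.sqrt(statistics.variance(change_a) / len(change_a)
               + statistics.variance(change_b) / len(change_b))

print("mean change on A:", statistics.mean(change_a))
print("mean change on B:", statistics.mean(change_b))
print("difference in mean change (A - B):", diff_in_mean_change)
print("standard error (independent samples):", round(se, 2))
```

Nothing links a particular subject on diet A to a particular subject on diet B, which is why the final comparison is an independent-samples one even though paired measurements were used along the way.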

Examples -- Paired or Independent Analysis?

A hypothesis of ongoing clinical interest is that vitamin C prevents the common cold. In a study involving 20 volunteers, 10 are randomly assigned to receive vitamin C capsules and 10 are randomly assigned to receive placebo capsules. The number of colds over a 12-month period is recorded.
A topic of current interest in ophthalmology is whether or not spherical refraction is different between the left and right eyes. To examine this issue, refraction is measured in both eyes of 17 people.
In order to compare the working environment in offices where smoking was permitted with that in offices where smoking was not permitted, measurements were made at 2 p.m. in 40 work areas where smoking was permitted and 40 work areas where smoking was not permitted.
A question in nutrition research is whether male and female college students undergo different mean weight changes during their freshman year. A data file contains the September 1994 weight (lbs), May 1995 weight (lbs), and sex (1=male/2=female) of students from the class of 1998. The file is set up so that each record contains the data for one student. The first 3 records, for example, might be

120 126 2
118 116 2
160 149 1
To determine whether cardiologists and pharmacists are equally knowledgeable about how nutrition and vitamin K affect anticoagulation therapy (to prevent clotting), an investigator has 10 cardiologists and 10 pharmacists complete a questionnaire to measure what they know. She contacts the administrators at 10 hospitals and asks each administrator to select a cardiologist and a pharmacist at random from the hospital's staff to complete the questionnaire.
To determine whether the meals served on the meal plans of public and private colleges are equally healthful, an investigator chooses 7 public colleges and 7 private colleges at random from a list of all colleges in Massachusetts. On each day of the week, she visits one public college and one private college. She calculates the mean amount of saturated fat in the dinner entrees at each school.
