What does pairing really do?

Whether the data are independent samples or paired, the best estimate of the difference between population means is the difference between sample means. When the data are two independent samples of size n with approximately equal sample standard deviations (s_x ≈ s_y ≈ s), a 95% confidence interval for the population mean difference, μ_x − μ_y, is approximately

\[ (\bar{x} - \bar{y}) \pm t^{*}\, s \sqrt{\frac{2}{n}}, \]

where t* is the appropriate 95% critical value (roughly 2 for large n).

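Now suppose the data are n paired samples ((X_i, Y_i): i = 1, ..., n), where the sample standard deviations of the Xs and Ys are roughly equal (s_x ≈ s_y ≈ s) and the correlation between X and Y is r. A 95% confidence interval for the population mean difference, μ_x − μ_y, is approximately

\[ (\bar{x} - \bar{y}) \pm t^{*}\, s \sqrt{\frac{2(1 - r)}{n}}, \]

where t* is the appropriate 95% critical value.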
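The factor (1 − r) in the paired interval comes from the variance of a difference of correlated quantities. A short sketch of that step, assuming s_x = s_y = s exactly and writing s_d for the standard deviation of the within-pair differences d_i = X_i − Y_i (s_d and d_i are symbols introduced here for illustration, not taken from the text above):

```latex
\begin{align*}
s_d^2 &= s_x^2 + s_y^2 - 2 r\, s_x s_y \\
      &= 2 s^2 (1 - r), \\
\mathrm{SE}(\bar{x} - \bar{y}) &= \frac{s_d}{\sqrt{n}} = s \sqrt{\frac{2(1 - r)}{n}}.
\end{align*}
```

Setting r = 0 gives s√(2/n), the same standard error as for two independent samples of size n.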

If the two responses are uncorrelated (that is, if the correlation coefficient is 0), the pairing is ineffective: the confidence interval is no shorter than it would have been had the investigators not taken the trouble to collect paired data. On the other hand, the stronger the positive correlation, the narrower the confidence interval and the more effective the pairing. The formula for paired data also shows that pairing can be worse than ineffective. Had the correlation been negative, the confidence interval would have been longer than it would have been with independent samples.
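The effect of the correlation on interval width can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the default multiplier of 1.96 (a large-sample assumption) and the values of s and n are all made up for illustration.

```python
import math


def half_width_independent(s, n, crit=1.96):
    """Approximate 95% CI half-width for two independent samples of size n.

    crit=1.96 is a large-sample assumption; pass a t critical value
    for small n.
    """
    return crit * s * math.sqrt(2.0 / n)


def half_width_paired(s, n, r, crit=1.96):
    """Approximate 95% CI half-width for n pairs with correlation r."""
    return crit * s * math.sqrt(2.0 * (1.0 - r) / n)


if __name__ == "__main__":
    s, n = 10.0, 25  # hypothetical common SD and sample size
    print(f"independent samples: {half_width_independent(s, n):.2f}")
    for r in (-0.5, 0.0, 0.5, 0.9):
        print(f"paired, r = {r:+.1f}:    {half_width_paired(s, n, r):.2f}")
```

With r = 0 the two half-widths coincide, positive r shrinks the paired interval by a factor of √(1 − r), and negative r makes it wider than the independent-samples interval.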
