A Boy & His Dog

Gerard E. Dallal, Ph.D.

Much if not all of what one needs to understand about confidence intervals can be learned through a simple analogy with a boy and his dog.

Imagine a boy and his dog outside together. The dog is friendly and there are no leash laws, so the boy lets his dog run free. The dog has been to obedience school and knows that he shouldn't stray too far: 68% of the time, he is within a certain distance of the boy, while he is within twice that distance 95% of the time.

While walking down the street, you meet the boy. You know the dog is somewhere nearby. Now, here comes the great insight! One day, you see the dog. You know the boy is somewhere nearby!

Think of the boy as a population mean and the dog as a sample mean. The standard error of the mean (SEM) plays the role of the "certain distance". The sample size reflects the dog's training. The larger the sample size, the smaller the "certain distance" and the better behaved the dog.

Boy	Population Mean
Dog	Sample Mean
Training	Sample Size
"Certain Distance"	Standard Error of the Mean (SEM)
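
To make the link between training and the "certain distance" concrete, here is a minimal sketch in Python. It assumes the usual formula for the SEM, the population standard deviation divided by the square root of the sample size. The function name standard_error and the standard deviation of 10 are made up for illustration.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


# Assumed population standard deviation, chosen only for illustration.
population_sd = 10.0

# Quadrupling the sample size halves the "certain distance".
for n in (4, 25, 100, 400):
    print(f"n = {n:3d}   SEM = {standard_error(population_sd, n):.2f}")
```

Note that the dog's training pays off slowly: halving the SEM takes four times as many observations.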

Probability theory describes the way the dog wanders from the boy. That is, given a population mean (location of the boy), probability theory tells us how likely it is that a sample mean (location of the dog) will differ from the population mean by so many SEMs (wander from the boy).
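
The probability side of the story can be checked by simulation. The sketch below places the boy at a known spot, lets the dog wander many times, and counts how often the dog ends up within 1 and within 2 SEMs. It assumes a normally distributed population; the population mean, standard deviation, sample size, and the name fraction_within are all illustrative choices, not part of the original note.

```python
import math
import random
import statistics


def fraction_within(k, pop_mean=50.0, pop_sd=10.0, n=25, trials=10000, seed=1):
    """Fraction of sample means that land within k SEMs of the population mean."""
    rng = random.Random(seed)
    sem = pop_sd / math.sqrt(n)  # the "certain distance"
    hits = 0
    for _ in range(trials):
        # One walk: draw a sample and see where the dog (sample mean) is.
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        if abs(statistics.fmean(sample) - pop_mean) <= k * sem:
            hits += 1
    return hits / trials


print("within 1 SEM:", fraction_within(1))
print("within 2 SEMs:", fraction_within(2))
```

Under these assumptions the two fractions should come out close to 0.68 and 0.95, matching the dog's behavior described earlier.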

Statistical theory turns everything around.

If knowing the population mean (location of the boy) tells us what the sample mean might be (location of the dog), then knowing the sample mean (location of the dog) gives us some clue about the population mean (location of the boy)!
If 95% of the time the sample mean (dog) will be found within 2 SEMs of the population mean (boy), then 95% of the time the population mean (boy) will be found within 2 SEMs of the sample mean (dog).
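
The turned-around statement can be checked the same way: draw many samples, build the interval "sample mean plus or minus 2 SEMs" around each dog, and count how often the interval contains the boy. This is a rough sketch under stated assumptions. The population is assumed normal, and the SEM is estimated from each sample (sample standard deviation over the square root of n), as it would have to be in practice. The parameter values and the name coverage are hypothetical.

```python
import math
import random
import statistics


def coverage(pop_mean=50.0, pop_sd=10.0, n=100, trials=5000, seed=2):
    """Fraction of intervals (sample mean +/- 2 SEMs) that contain the population mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        mean = statistics.fmean(sample)                 # where the dog is
        sem = statistics.stdev(sample) / math.sqrt(n)   # SEM estimated from the sample
        if mean - 2 * sem <= pop_mean <= mean + 2 * sem:
            covered += 1                                # the boy is inside the interval
    return covered / trials


print("intervals containing the population mean:", coverage())
```

With a sample size this large the fraction should be close to 0.95. For small samples, an interval built from an estimated SEM needs a multiplier somewhat larger than 2 to reach the same coverage.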

This is how confidence intervals work.

We want to know the location of the population mean.
We can observe only the sample mean.
However, if the two are within 2 SEMs of each other 95% of the time, then knowing the sample mean and the SEM gives us some hints about the value of the population mean.
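
In practice, this reasoning turns a single sample into an interval. The following sketch computes an approximate 95% confidence interval as the sample mean plus or minus 2 SEMs, with the SEM estimated from the data. The function name approx_95_interval and the measurements are invented for illustration, and the multiplier of 2 is the rule of thumb used above rather than an exact value.

```python
import math
import statistics


def approx_95_interval(data):
    """Approximate 95% confidence interval: sample mean +/- 2 estimated SEMs."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SEM")
    mean = statistics.fmean(data)
    sem = statistics.stdev(data) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean - 2 * sem, mean + 2 * sem


# Made-up measurements, for illustration only.
measurements = [12.1, 9.8, 11.4, 10.6, 10.9, 11.7, 9.5, 10.2, 11.0, 10.8]

low, high = approx_95_interval(measurements)
print(f"the population mean is plausibly between {low:.2f} and {high:.2f}")
```

We have seen only the dog, yet the interval marks out where the boy is likely to be.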