**Sample Size Calculations: Surveys**

In a survey, there's usually no hypothesis being tested. The sample size determines the precision with which population values can be estimated. The usual rules apply--to cut the uncertainty (for example, the length of a confidence interval) in half, quadruple the sample size, and so on. The sample size for a survey, then, is determined by asking the question, "How accurately do you need to know something?" Darned if I know!
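The "quadruple the sample size" rule can be checked with a short calculation. The sketch below assumes a simple random sample from a large population and the usual normal-approximation interval for a proportion. The function name `margin_of_error` and the proportion 0.30 are illustrative choices, not values from any particular survey.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a
    proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.30  # hypothetical population proportion, chosen only for illustration
for n in (100, 400, 1600):
    # Each fourfold increase in n cuts the half-width in half.
    print(f"n = {n:5d}  margin of error = {margin_of_error(p, n):.4f}")
```

Because the half-width is proportional to 1/sqrt(n), each row of output is half the one before it.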

Sometimes imprecise estimates are good enough. Suppose in some underdeveloped country a 95% confidence interval for the proportion of children with compromised nutritional status was (20%, 40%). Even though the confidence interval is quite wide, every value in that interval points to a problem that needs to be addressed. Even 20% is too high. Would it help (would it change public policy) to know the true figure more precisely?

In his book *Sampling Techniques, 3rd ed.* (pp 72-74), William
Cochran gives the example of an anthropologist who wishes to know the
percentage of inhabitants of some island who belong to blood group O. He
decides he needs to know this to within 5%. Why 5%? Why not 4% or 6%? I
don't know. Neither does Cochran. Cochran writes:

"He strongly suspects that the islanders belong either to a racial type with a P of about 35% or to one with a P of about 50%. An error limit of 5% in the estimate seemed to him small enough to permit classification into one of these types. He would, however, have no violent objection to 4 or 6% limits of error.

"Thus the choice of a 5% limit of error by the anthropologist was to some extent arbitrary. In this respect the example is typical of the way in which a limit of error is often decided on. In fact, the anthropologist was more certain of what he wanted than many other scientists and administrators will be found to be. When the question of desired degree of precision is first raised, such persons may confess that they have never thought about it and have no idea of the answer. My experience has been, however, that after discussion they can frequently indicate at least roughly the size of a limit of error that appears reasonable to them." [Cochran had a lot of experience with sample surveys. I don't. I have yet to have the experience where investigators can "indicate at least roughly the size of a limit of error that appears reasonable to them" with any degree of confidence or enthusiasm. I find the estimate is given more with resignation.]

"Further than this we may not be able to go in many practical situations. Part of the difficulty is that not enough is known about the consequences of errors of different sizes as they affect the wisdom of practical decisions that are made from survey results. Even when these consequences are known, however, the results of many important surveys are used by different people for different purposes, and some of the purposes are not foreseen at the time when the survey is planned."

Thus, the specification of a sample size for a survey invariably contains a large element of guesswork. The more the survey can be made to resemble a controlled trial with comparisons between groups, the easier it is to come up with sample size estimates.
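Once a limit of error has been settled on, however arbitrarily, the arithmetic is easy. The sketch below shows what the anthropologist's 4%, 5%, and 6% limits would imply. It assumes a simple random sample, the normal approximation, and no finite population correction (the island's population size is not given here). It uses P = 50%, the larger-variance of the two values he suspects. The function name `srs_sample_size` is an illustrative choice.

```python
import math

def srs_sample_size(p, d, z=1.96):
    """Smallest n for which z * sqrt(p * (1 - p) / n) <= d, i.e. the
    normal-approximation 95% interval for a proportion has half-width at
    most d. Assumes simple random sampling from a large population."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# P = 0.50 gives the largest variance, so it is the conservative choice.
for d in (0.04, 0.05, 0.06):
    print(f"limit of error {d:.0%}: n = {srs_sample_size(0.50, d)}")
```

Moving the limit of error from 6% to 4% more than doubles the required sample, which is why an "arbitrary" choice of limit matters so much for cost.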

**Sampling Schemes**

With *simple random samples*, every possible sample has the same
probability of being selected. Estimates of population quantities and
their uncertainties are relatively straightforward to calculate. Many
surveys are conducted by using random samples that are not simple. Two of
the most commonly used alternatives to simple random samples are
*stratified random samples* and *cluster samples*.

With *stratified random sampling*, the population is divided into
strata and a simple random sample is selected from each stratum. This
ensures it is possible to make reliable estimates for each stratum as
well as for the population as a whole. For example, if a population
contains a number of ethnic groups, a simple random sample might contain
very few of certain ethnicities. If we sample equal numbers of
each ethnicity, then characteristics of all ethnicities can be estimated
with roughly the same precision. Overall population estimates and their standard
errors can be obtained by combining the stratum-specific estimates in the
proper way.

For example, suppose a population is 90% white and 10% black, a stratified sample of 500 whites and 500 blacks is interviewed, and the mean time per day spent watching television is 4 hours for whites and 2 hours for blacks. Then the estimated mean number of hours spent watching television for the population combines the two stratum-specific estimates by giving 90% of the weight to the mean for whites and 10% of the weight to the mean for blacks, that is, 0.90 × 4 + 0.10 × 2 = 3.8 hours. Similar calculations give the standard error of the overall estimate.
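The weighting in this example can be written out as a short calculation. The population shares, sample sizes, and stratum means below come from the example. The within-stratum standard deviations are hypothetical values, added only so that a standard error can be computed. The standard error uses the usual stratified-sampling formula without a finite population correction.

```python
import math

# Population shares, sample sizes, and stratum means from the example.
weights = {"white": 0.90, "black": 0.10}
sample_n = {"white": 500, "black": 500}
means = {"white": 4.0, "black": 2.0}      # hours of television per day

# Hypothetical within-stratum SDs (hours per day); NOT given in the text.
sds = {"white": 2.0, "black": 1.5}

# Weight each stratum mean by its share of the population.
overall_mean = sum(weights[g] * means[g] for g in weights)

# Standard error of the stratified mean: sqrt(sum of W_h^2 * s_h^2 / n_h).
overall_se = math.sqrt(
    sum(weights[g] ** 2 * sds[g] ** 2 / sample_n[g] for g in weights)
)

# For contrast: pooling the 1000 interviews as if they were one simple
# random sample ignores the 90/10 split and gives a different answer.
naive_mean = sum(sample_n[g] * means[g] for g in means) / sum(sample_n.values())

print(f"stratified estimate: {overall_mean:.1f} hours (SE {overall_se:.3f})")
print(f"naive pooled mean:   {naive_mean:.1f} hours")
```

The pooled mean treats the two groups as equally common in the population, which is why it lands midway between 4 and 2 rather than at 3.8.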

With *cluster sampling*, the population is divided into clusters.
A set of clusters is selected at random and individual units are selected
within each cluster. Cluster sampling is typically used for
convenience. Imagine a country composed of hundreds of villages. Rather
than survey a simple random sample of the population (which might have
the survey team visiting every village), it is usually more practical to
take a simple random sample of villages and then take a random sample of
individuals from each village. A cluster sample is usually less precise
than a simple random sample of the same size, but it is usually a lot
less expensive to obtain. To put it another way, to achieve a specified
level of precision it is often less expensive and more convenient to use
a larger cluster sample than a smaller simple random sample. Once again,
there are special formulas that allow analysts to combine the data from
the clusters to calculate estimates of population quantities and
their standard errors.

Many of the sample size calculations for complex surveys involve
estimates of quantities that are often unavailable. Levy & Lemeshow
(*Sampling of Populations*, New York: John Wiley & Sons, 1991) are
explicit about what investigators face: "These quantities are
population parameters that in general would be unknown and would have to
be either estimated from preliminary studies or else guessed by means of
intuition or past experience." (p 198)

A common method for obtaining sample size calculations for cluster
sampling is by performing them as though simple random sampling were
being used, except that the variances (SD²) used in the
formulas are multiplied by a *Design Effect*, which involves
intraclass correlations, a measure of how much of the variability
between subjects is due to the variability between clusters. It has
never been clear to me how design effects are estimated in practice. The
ones I've seen have invariably been 2.
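For concreteness, here is a minimal sketch of the inflation step. The text above does not give a formula for the design effect. The one used here is the common approximation for equal-sized clusters, 1 + (m - 1) × ICC, where m is the number of individuals sampled per cluster and ICC is the intraclass correlation. The function names and input values are hypothetical.

```python
import math

def design_effect(m, icc):
    """Approximate design effect for clusters of equal size m with
    intraclass correlation icc: 1 + (m - 1) * icc."""
    return 1 + (m - 1) * icc

def cluster_sample_size(n_srs, m, icc):
    """Inflate a simple-random-sample size n_srs by the design effect."""
    return math.ceil(n_srs * design_effect(m, icc))

# Hypothetical inputs: 385 subjects would suffice under simple random
# sampling, and 11 individuals are sampled per village.
for icc in (0.0, 0.05, 0.10):
    deff = design_effect(11, icc)
    n_needed = cluster_sample_size(385, 11, icc)
    print(f"ICC = {icc:.2f}  design effect = {deff:.2f}  n = {n_needed}")
```

With 11 individuals per cluster, an intraclass correlation of 0.10 already produces a design effect of 2, so the default of 2 amounts to an implicit guess about cluster size and intraclass correlation.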

Over the last decade, statistical program packages have been developed for analyzing data from complex sample surveys. The best known of these is SUDAAN (from SUrvey DAta ANalysis), which is available as a stand-alone program or as an add-on to SAS. Lately, SAS has been adding this functionality to its own program with its SURVEYMEANS and SURVEYREG procedures.