Sample Size Calculations
In a survey, there's usually no hypothesis being tested. The sample size determines the precision with which population values can be estimated. The usual rules apply: to cut the uncertainty (for example, the length of a confidence interval) in half, quadruple the sample size, and so on. The sample size for a survey, then, is determined by asking the question, "How accurately do you need to know something?" Darned if I know!
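The "quadruple the sample size" rule follows from the fact that the half-width of a confidence interval for a mean shrinks like 1/sqrt(n). A minimal sketch (the SD of 10 and sample sizes are illustrative numbers, not from any survey):

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Half-width of an approximate 95% CI for a mean: z * SD / sqrt(n)."""
    return z * sd / math.sqrt(n)

# Quadrupling n halves the half-width.
w1 = ci_half_width(sd=10, n=100)
w2 = ci_half_width(sd=10, n=400)
print(round(w1 / w2, 2))  # 2.0
```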
Sometimes imprecise estimates are good enough. Suppose in some underdeveloped country a 95% confidence interval for the proportion of children with compromised nutritional status was (20%, 40%). Even though the confidence interval is quite wide, every value in that interval points to a problem that needs to be addressed. Even 20% is too high. Would it help (would it change public policy) to know the true figure more precisely?
In his book Sampling Techniques, 3rd ed. (pp 72-74), William Cochran gives the example of an anthropologist who wishes to know the percentage of inhabitants of some island who belong to blood group O. He decides he needs to know this to within 5%. Why 5%? Why not 4% or 6%? I don't know. Neither did Cochran, so he asked!
He strongly suspects that the islanders belong either to a racial type with a P of about 35% or to one with a P of about 50%. An error limit of 5% in the estimate seemed to him small enough to permit classification into one of these types. He would, however, have no violent objection to 4 or 6% limits of error.
Thus the choice of a 5% limit of error by the anthropologist was to some extent arbitrary. In this respect the example is typical of the way in which a limit of error is often decided on. In fact, the anthropologist was more certain of what he wanted than many other scientists and administrators will be found to be. When the question of desired degree of precision is first raised, such persons may confess that they have never thought about it and have no idea of the answer. My experience has been, however, that after discussion they can frequently indicate at least roughly the size of a limit of error that appears reasonable to them. [Cochran had a lot of experience with sample surveys. I don't. I have yet to have the experience where investigators can "indicate at least roughly the size of a limit of error that appears reasonable to them" with any degree of confidence or enthusiasm. I find the estimate is given more with resignation.]
Further than this we may not be able to go in many practical situations. Part of the difficulty is that not enough is known about the consequences of errors of different sizes as they affect the wisdom of practical decisions that are made from survey results. Even when these consequences are known, however, the results of many important surveys are used by different people for different purposes, and some of the purposes are not foreseen at the time when the survey is planned.
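For a proportion, the standard sample-size formula is n = z²p(1−p)/d², where d is the desired limit of error. A sketch applied to Cochran's anthropologist (the p values come from his example; the 1.96 multiplier assumes a 95% confidence level):

```python
import math

def n_for_proportion(p, d, z=1.96):
    """Sample size needed to estimate a proportion p to within +/- d."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Worst case, p = 0.5, with a 5% limit of error:
print(n_for_proportion(0.5, 0.05))   # 385
# With the suspected p of about 0.35, the requirement shrinks:
print(n_for_proportion(0.35, 0.05))  # 350
```

Note that the answer is fairly insensitive to p near 0.5, which is one reason p = 0.5 is often used as a conservative default when nothing better is known.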
Thus, the specification of a sample size for a survey invariably contains a large element of guesswork. The more the survey can be made to resemble a controlled trial with comparisons between groups, the easier it is to come up with sample size estimates.
With simple random samples, every possible sample has the same probability of being selected. Estimates of population quantities and their uncertainties are relatively straightforward to calculate. Many surveys are conducted by using random samples that are not simple. Two of the most commonly used alternatives to simple random samples are stratified random samples and cluster samples.
With stratified random sampling, the population is divided into strata and a simple random sample is selected from each stratum. This ensures that reliable estimates can be made for each stratum as well as for the population as a whole. For example, if a population contains a number of ethnic groups, a simple random sample might contain very few members of certain ethnicities. If we sample equal numbers from each ethnicity, then the characteristics of every ethnicity can be estimated with the same precision. Overall population estimates and their standard errors can be obtained by combining the stratum-specific estimates in the proper way.
For example, suppose a population is 90% white and 10% black, a stratified sample of 500 whites and 500 blacks is interviewed, and the mean time per day spent watching television is 4 hours for whites and 2 hours for blacks. Then, the estimated mean number of hours spent watching television for the population combines the two stratum-specific estimates by giving 90% of the weight to the mean for whites and 10% of the weight to the mean for blacks, that is, 0.90*4 + 0.10*2 = 3.8 hours. A similar weighted calculation gives the overall standard error.
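The weighting in this example can be sketched as follows. The weights, means, and sample sizes are from the text; the within-stratum SDs are hypothetical, inserted only so the standard-error combination has something to work with:

```python
import math

# W = population share, mean and n from the example; sd is hypothetical
strata = {
    "white": {"W": 0.90, "mean": 4.0, "sd": 1.5, "n": 500},
    "black": {"W": 0.10, "mean": 2.0, "sd": 1.5, "n": 500},
}

# Overall mean: sum of W_h * mean_h over strata
overall_mean = sum(s["W"] * s["mean"] for s in strata.values())

# Standard error of the stratified mean: sqrt(sum of W_h^2 * sd_h^2 / n_h)
overall_se = math.sqrt(
    sum(s["W"]**2 * s["sd"]**2 / s["n"] for s in strata.values())
)

print(overall_mean)  # 3.8
```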
With cluster sampling, the population is divided into clusters. A set of clusters is selected at random and individual units are selected within each cluster. Cluster sampling is typically used for convenience. Imagine a country composed of hundreds of villages. Rather than survey a simple random sample of the population (which might have the survey team visiting every village), it is usually more practical to take a simple random sample of villages and then take a random sample of individuals from each village. A cluster sample is always less precise than a simple random sample of the same size, but it is usually a lot less expensive to obtain. To put it another way, to achieve a specified level of precision it is often less expensive and more convenient to use a larger cluster sample than a smaller simple random sample. Once again, there are special formulas that allow analysts to combine the data from the clusters to calculate estimates of population quantities and their standard errors.
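The two-stage selection described above can be sketched in a few lines. Everything here is hypothetical (an invented frame of 200 villages of varying size), meant only to show the structure: villages first, then people within the sampled villages:

```python
import random

random.seed(0)

# Hypothetical sampling frame: 200 villages of 50 to 500 inhabitants
villages = {v: [f"village{v}_person{i}" for i in range(random.randint(50, 500))]
            for v in range(200)}

# Stage 1: simple random sample of 10 villages
sampled_villages = random.sample(sorted(villages), 10)

# Stage 2: simple random sample of 20 people within each sampled village
sample = [person
          for v in sampled_villages
          for person in random.sample(villages[v], 20)]

print(len(sample))  # 200 respondents, but only 10 villages to visit
```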
Many of the sample size calculations for complex surveys involve estimates of quantities that are often unavailable. Levy & Lemeshow (Sampling of Populations, New York: John Wiley & Sons, 1991) are explicit about what investigators face: "These quantities are population parameters that in general would be unknown and would have to be either estimated from preliminary studies or else guessed by means of intuition or past experience." (p 198)
A common method for obtaining sample size calculations for cluster sampling is to perform them as though simple random sampling were being used, except that the variances (SD²) used in the formulas are multiplied by a design effect, which involves the intraclass correlation, a measure of how much of the variability between subjects is due to the variability between clusters. It has never been clear to me how design effects are estimated in practice. The ones I've seen have invariably been 2.
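A common textbook form of the design effect for clusters of equal size m with intraclass correlation rho is DEFF = 1 + (m − 1)rho. A sketch of the inflation step (the cluster size of 20 and ICC of 0.05 are illustrative assumptions, not estimates from any survey):

```python
import math

def design_effect(m, rho):
    """DEFF = 1 + (m - 1) * rho for clusters of size m and intraclass corr. rho."""
    return 1 + (m - 1) * rho

def cluster_sample_size(n_srs, m, rho):
    """Inflate a simple-random-sample size by the design effect."""
    return math.ceil(n_srs * design_effect(m, rho))

# Even a small ICC produces a design effect near 2 with clusters of 20:
print(design_effect(20, 0.05))        # 1.95
print(cluster_sample_size(385, 20, 0.05))  # 751
```

This helps explain why design effects of about 2 turn up so often: modest cluster sizes combined with small intraclass correlations land right in that neighborhood.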
Over the last decade, statistical program packages have been developed for analyzing data from complex sample surveys. The best known of these is SUDAAN (from SUrvey DAta ANalysis), which is available as a stand-alone program or as an add-on to SAS. Lately, SAS has been adding this functionality to its own program with its SURVEYMEANS and SURVEYREG procedures.