· Why sample? Most of the time, it is not possible to observe every case of something. Instead, we select a sample from the total population of cases. Sampling always introduces some bias, or deviation from the true distribution of the population, but it is possible to estimate that bias and report it with the results. The estimated bias depends on the size of the sample: the larger the sample, the smaller the bias introduced by sampling. What matters is the absolute size of the sample, not its size in relation to the total population (i.e., the proportion of the population sampled). Whether the total population is large or small, a certain sample size will be needed to obtain a specified “confidence interval.” The necessary size of the sample will depend on the use to which it will be put. The greater the variability in the responses or the more risky a false conclusion, the larger the sample must be.
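The claim that absolute sample size matters, rather than the share of the population sampled, can be checked with a small simulation. The sketch below is illustrative only: the two population sizes, the 50% "yes" share, the sample of 400, and the function name are all assumptions made for the demonstration, not figures from these notes.

```python
import random
import statistics

def spread_of_sample_proportions(population_size, sample_size, true_share=0.5,
                                 repetitions=1000, seed=1):
    """Standard deviation of the sample proportion across repeated samples."""
    rng = random.Random(seed)
    # Cases numbered below this cutoff are the "yes" cases in the population.
    cutoff = int(population_size * true_share)
    proportions = []
    for _ in range(repetitions):
        # Draw a simple random sample of case numbers, without replacement.
        drawn = rng.sample(range(population_size), sample_size)
        yes_count = sum(1 for case in drawn if case < cutoff)
        proportions.append(yes_count / sample_size)
    return statistics.pstdev(proportions)

# Same sample size (400), populations that differ by a factor of 100.
small_town = spread_of_sample_proportions(20_000, 400)
big_city = spread_of_sample_proportions(2_000_000, 400)
print(f"population 20,000:    spread = {small_town:.4f}")
print(f"population 2,000,000: spread = {big_city:.4f}")
```

Both spreads should land close to 0.025, even though the second population is 100 times larger than the first.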
· Types of samples: Some sampling schemes are so common that they have their own names:
o Population “sample”: Sample “everyone”
o Simple random sample: Cases are randomly selected from the total population. This is the “put the names in a hat” method. The modern, quantitative approach is to use a random number generator (BTW, Excel spreadsheets have a function that will do this) to select cases from a list.
o Systematic sample: This is the approach of picking the nth name on each page of a phone book, for example. This will also generate a random sample, if there is no bias in the original list. It could result in an undersampling of a particular ethnic group if they are represented by a small set of family names (e.g., a large number of Koreans share the family name “Lee,” and a large number of Hmong share the family name “Vang”).
o Stratified sample: One way to ensure that groups, perhaps small in number but significant for other reasons, are included is to “stratify” the sample by subgroup, and then randomly select from within each group. If one wanted to look at the effect of, say, a proposed education policy on school districts, one might want to select an equal number of rural and urban districts on the assumption that the impact on them might be different.
o Cluster sample: Sometimes you have access to a number of groups (say, cohorts composed of people who used your agency’s services, grouped by week for all of last year). In that case, you might choose to “sample” the groups and use the information from all the members of each group you sample. This is useful when you have reason to believe that the group characteristics may be important in themselves (perhaps your staff went through a major training exercise midway through the year….).
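The sampling schemes above can be sketched in a few lines of Python. This is a minimal illustration, not a survey tool: the function names, the list of 40 school districts, and the weekly client cohorts are all hypothetical, made up to echo the examples in the text.

```python
import random

def simple_random_sample(cases, n, rng):
    """Pick n cases at random, without replacement ("names in a hat")."""
    return rng.sample(cases, n)

def systematic_sample(cases, step, rng):
    """Pick every step-th case, starting at a random position within the first step."""
    start = rng.randrange(step)
    return cases[start::step]

def stratified_sample(cases, stratum_of, n_per_stratum, rng):
    """Group cases by stratum, then draw the same number at random from each group."""
    strata = {}
    for case in cases:
        strata.setdefault(stratum_of(case), []).append(case)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole groups at random and keep every member of each chosen group."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(42)
# Hypothetical data: 40 school districts, the first 10 rural, the rest urban.
districts = [("rural" if i < 10 else "urban", f"district-{i:02d}") for i in range(40)]
# Hypothetical cohorts: clients grouped by the week they used the agency.
cohorts = {f"week-{w:02d}": [f"client-{w:02d}-{k}" for k in range(5)]
           for w in range(1, 53)}

srs = simple_random_sample(districts, 8, rng)
systematic = systematic_sample(districts, 5, rng)
stratified = stratified_sample(districts, lambda d: d[0], 4, rng)
clustered = cluster_sample(cohorts, 3, rng)
print("simple random:", srs)
print("systematic:   ", systematic)
print("stratified:   ", stratified)
print("cluster:      ", clustered)
```

Note how the stratified draw returns four rural and four urban districts even though rural districts are only a quarter of the list, which is the point of stratifying.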
· Determining Sample Size: There are different ways to determine sample size, depending on how you will be using the sample (what sort of statistical test you will be using):
o For a proportion:
§ $n = r(1 - r)\left( \dfrac{1.96}{p' - p^{*}} \right)^{2}$, where p* is the hypothesized value, p’ is the obtained value, and r is the true proportion underlying the observation.
§ This can be simplified to $n = 0.25\,(1.96/\Delta)^{2}$, where Δ is the difference between the obtained and hypothesized values and the true proportion is hypothesized as .5. This is the most restrictive assumption: as the true proportion deviates from .5, the minimum sample size decreases as well.
§ Note that sample size (n) is determined by the difference you can accept between what you expect (p*) and what you get (p’ ).
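To see why a true proportion of .5 is the most restrictive assumption, here is a short sketch. It reads the formula above as n = r(1 − r)(1.96/Δ)², which is the form that reduces to the simplified version when r = .5; the function name and the trial values of r are assumptions chosen for illustration.

```python
def sample_size_for_proportion(delta, r=0.5, z=1.96):
    """n = r(1 - r) * (z / delta)^2, left unrounded."""
    return r * (1 - r) * (z / delta) ** 2

# Hold the acceptable difference at 0.1 and vary the assumed true proportion.
for r in (0.5, 0.3, 0.1):
    n = sample_size_for_proportion(delta=0.1, r=r)
    print(f"r = {r}: n = {n:.2f}")
```

The required n is largest at r = .5 and shrinks as r moves toward 0 (or, by symmetry, toward 1).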
o For an average:
§ $n = \left( \dfrac{1.96\,\sigma}{x^{*} - \bar{x}} \right)^{2} + 1$, where x* is the hypothesized mean and x̄ is the observed mean (average).
§ This can be simplified to $n = (1.96\,\sigma/\Delta)^{2} + 1$, where Δ is the difference between the observed and hypothesized means.
§ Note here that sample size (n) is determined not only by the acceptable difference between the hypothesized and obtained mean, but also by the variability around the mean (the standard deviation, σ).
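The simplified formula for an average can be obtained by setting the 95% confidence limit for a mean equal to the acceptable difference Δ and solving for n. A short derivation, assuming the confidence limit takes the form 1.96σ/√(n − 1) used elsewhere in these notes:

```latex
\begin{aligned}
\Delta &= 1.96\,\frac{\sigma}{\sqrt{n-1}} \\
\sqrt{n-1} &= \frac{1.96\,\sigma}{\Delta} \\
n &= \left(\frac{1.96\,\sigma}{\Delta}\right)^{2} + 1
\end{aligned}
```

Because Δ sits in the denominator and is squared, halving the acceptable difference roughly quadruples the sample; because σ is in the numerator and squared, halving the standard deviation cuts the sample to roughly a quarter.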
o Applied example
§ Suppose you want to get an estimate of public support for a bond referendum which is coming up in your district. How many people would you have to survey, assuming you can accept a 10-pt. (0.1) difference between what the survey says and what the voters are actually thinking? Suppose you can only accept a 5-pt. (0.05) spread?
· The calculating formula is $0.25\,(1.96/\Delta)^{2}$, so $0.25\,(1.96/0.1)^{2} \approx 96$, so you would need to survey about 100 people.
· If you need to be more precise, $0.25\,(1.96/0.05)^{2} = 384.16$, which rounds up to 385, so you would need to survey almost 4 times as many people to get a doubling in accuracy (you are squaring the difference, remember?).
§ Suppose you want to survey the community to estimate the income security in the community. Suppose you anticipate that the true average will be around 5 (on a 10-pt index), and that the average deviation about that mean will probably be 1 pt. How many people would you have to survey if you need to be accurate within 0.1 pts? Suppose you need to be accurate within 0.05 pts?
· The calculating formula is $(1.96\,\sigma/\Delta)^{2} + 1$
· So, to be accurate within 0.1 pts, you would need to sample 385 people; doubling your accuracy to 0.05 pts. would require 1,538 people.
§ What happens with the previous problem if the average deviation narrows to 0.5 instead of 1 pt.?
· With the average deviation cut in half, you would need to sample only 97 people to get within 0.1 points of the true average value, and a sample of 385 people would get you within 0.05 points.
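The figures in the applied examples can be reproduced with a few lines of Python. This is a sketch using the two simplified formulas as given above; the function names are my own, and the results are left unrounded so you can see how the whole-number answers in the text were reached.

```python
def n_for_proportion(delta, z=1.96):
    """Simplified proportion formula: 0.25 * (z / delta)^2."""
    return 0.25 * (z / delta) ** 2

def n_for_mean(sigma, delta, z=1.96):
    """Simplified mean formula: (z * sigma / delta)^2 + 1."""
    return (z * sigma / delta) ** 2 + 1

print("Bond referendum (proportion):")
for delta in (0.10, 0.05):
    print(f"  within {delta:.2f}: n = {n_for_proportion(delta):.2f}")

print("Income security index (mean):")
for sigma in (1.0, 0.5):
    for delta in (0.10, 0.05):
        n = n_for_mean(sigma, delta)
        print(f"  deviation {sigma}, within {delta:.2f}: n = {n:.2f}")
```

The raw values are fractional; the answers in the text are these values rounded to whole respondents.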
When I was an undergraduate at the
There are a number of terms and techniques used in statistics to estimate the importance of the findings, given the quality of the data. All of these terms, taken together, are called “sensitivity measures.” Sensitivity measures the amount of “error” in a mathematical analysis. “Error,” in this sense, is a technical term meaning the amount of deviation of the observed values from the “true” (or “population”) values, due to
· Constant error: Effects which distort the measures in one direction
· Random error: Effects which obscure possible effects (i.e., which distort the measures in all directions).
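The difference between the two kinds of error shows up clearly in a toy simulation. Everything here is made up for illustration: the "true" score of 50, the 3-point constant error, and the 3-point random scatter are assumptions, not data.

```python
import random
import statistics

rng = random.Random(7)
true_values = [50.0] * 2000  # hypothetical "true" score for every case

# Constant error: every measurement is pushed the same way (here, +3 points).
with_constant_error = [v + 3.0 for v in true_values]
# Random error: measurements scatter in both directions (here, an SD of 3 points).
with_random_error = [v + rng.gauss(0.0, 3.0) for v in true_values]

for label, data in [("constant", with_constant_error), ("random", with_random_error)]:
    print(f"{label:>8} error: mean = {statistics.mean(data):.2f}, "
          f"spread = {statistics.pstdev(data):.2f}")
```

Constant error moves the average away from the true value without adding any scatter; random error leaves the average close to the true value but spreads the measurements around it.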
There are four terms which, taken together, determine the sensitivity of an analysis:
· Reliability: The degree to which a measure generates similar responses over time and across situations. It is mostly affected by random error. Reliability can be tested by:
o Test-Retest: Same subjects retake the test after some lapse of time and results are compared for consistency.
o Alternate forms: A group of subjects take alternate forms of the same test and the results are compared for consistency.
o Subsample: A sample of the subject pool is called back for a retest, another sample is given an alternate form, and the results of all three conditions (first form, alternate form, retest of first form) are compared for consistency.
Each of these techniques carries its own validity problems. Retest is subject to problems of maturation and learning: by the time they sit for the retest, the subjects have had additional life experiences and some initial experience with the test. Alternate forms cannot answer which form is more accurate. Subsample shares all of these weaknesses.
· Validity: Measure of the extent to which measures accurately reflect what they are supposed to measure—to what extent are observed differences “true”? Validity is mostly affected by constant error. You have already considered the issue of validity in your discussion of questionnaire design. To summarize, there are two issues of validity:
o Internal validity: How effective are the measures within the confines of the study?
o External validity: Can the results obtained be generalized to the broader population? This involves questions of the representativeness of the subject group, the possibility that the selection or the measurement process might have affected the outcome (in addition to the experimental treatment), and the possibility that multiple conditions present in the study might be necessary for the experimental effect to occur.
· Confidence limits: Estimate of the error that can be expected in a measure, just due to the nature of measurement. “Confidence” is a term with a very precise meaning in quantitative analysis, so use it in technical writing only when you intend the technical meaning. It can be used to estimate the sample size needed for a study. The general formula is of the form:
o $CL = 1.96\,\sigma / \sqrt{n - 1}$
o Try this formula with the problem above for the sample size for a test of means.
§ Inserting the values for a standard deviation of 1 and a sample of 385, the formula returns a Confidence Interval of 0.1.
§ With a standard deviation of 0.5 and a sample of 385, the Confidence Interval is 0.05.
§ Isn’t it exciting when things work out the way the theory says they should?
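If you would rather let the computer do the arithmetic, here is a minimal check of the confidence limit formula. It uses the two cases from the sample-size example for a test of means (a standard deviation of 1 and of 0.5, each with a sample of 385); the function name is my own.

```python
import math

def confidence_limit(sigma, n, z=1.96):
    """CL = z * sigma / sqrt(n - 1)."""
    return z * sigma / math.sqrt(n - 1)

for sigma in (1.0, 0.5):
    cl = confidence_limit(sigma, 385)
    print(f"standard deviation {sigma}, n = 385: CL = {cl:.3f}")
```

The two results come out at about 0.100 and 0.050, matching the acceptable differences that produced a sample of 385 in the first place.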
· Significance: Test to determine whether observed results could be explained by chance deviation from expected (average) values. “Significance,” too, has a technical meaning and should only be used in technical writing with that intention.
o Significance testing is based on the assumption of a “normal curve” (that is the infamous “bell-shaped” curve, the profile through a pile of sand that has dripped through your fingers—the distribution of objects whose interaction with each other is independent and random).
o The normal curve can be completely described by two terms: the mean (μ) and the standard deviation (σ). [I am using Greek letters here because I am referring to “ideal” parameters of a population, not those actually observed in a sample]. The mean measures the central point (the median and the mode will be at the same point in this ideal distribution). The standard deviation measures the dispersion of cases about the mean.
o Any observation can be located on an ideal normal curve through its “z-score.” Z is a number which converts the value obtained from observation into a location on the normal curve. It takes the form
§ z = (x – μ) / σ
Or, the distance of the observation from the mean, divided by the dispersion (standard deviation) about the mean. Z-scores, then, are commonly used to gauge levels of significance.
o Significance levels represent a trade-off between two potential types of error—the possibility that one could deny that there is a real difference (when there really is one) and the possibility that one could accept that there is a real difference (when there really isn’t one). The relationship between the two is inverse—as one goes up, the other goes down. In the ideal, the two lines cross at the .01 level of significance; this represents the balance between affirming a false result and denying a true result. It is the level commonly used in bench science. In medical science, where the consequences of affirming a false result could be fatal, researchers tend to take a stricter stand and use a .005 level. In the social sciences, where experimental control is weaker and the data are often less precisely measured, we commonly use the .05 level.
o A “.05 level of significance” means that 5% of cases belonging to a normal distribution will fall outside the points with z-scores of ±1.96 (2.5% in each tail). Any smaller deviation from the mean can be expected to occur simply by chance 95% of the time. And this explains what that “1.96” was doing in the formulas above. It was setting the bounds for the .05 level of significance.
o To use the z-score (i.e., 1.96 for the .05 level), simply multiply the sample standard deviation by 1.96. Anything farther than that distance from the mean, in either direction, is not likely to belong to the sample distribution.
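The link between z-scores and significance levels can be explored with Python's standard library. This is a sketch under the normal-curve assumption described above; the function names and the sample observation (a score of 7.2 where the mean is 5 and the standard deviation is 1) are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def z_score(x, mu, sigma):
    """Distance of an observation from the mean, in standard deviations."""
    return (x - mu) / sigma

def two_tailed_probability(z):
    """Share of a normal distribution lying farther from the mean than |z|, on either side."""
    return 2 * (1 - standard_normal.cdf(abs(z)))

def critical_z(level):
    """z-score that leaves `level` of the distribution in the two tails combined."""
    return standard_normal.inv_cdf(1 - level / 2)

# Hypothetical observation: a score of 7.2 where the mean is 5 and the SD is 1.
z = z_score(7.2, mu=5.0, sigma=1.0)
print(f"z = {z:.2f}, two-tailed probability = {two_tailed_probability(z):.4f}")

# Critical z-scores for the three significance levels discussed above.
for level in (0.05, 0.01, 0.005):
    print(f"level {level}: critical z = {critical_z(level):.3f}")
```

For the .05 level the critical value comes out at about 1.96, the multiplier used throughout the sample-size and confidence limit formulas; stricter levels push the boundary farther from the mean.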
© 2006 A.J.Filipovitch
Revised 5 October 2008