Much of the information you will need in attending to the public’s business is unique—it depends on the individual characteristics of a specific situation. You might want to know whether a proposed construction is fifty feet from the lot line, or if someone’s dog is barking outside at night, or whether your assistant is going to get that report to the committee on time for Monday’s meeting. You don’t need fancy quantitative analysis for situations like these; the facts speak for themselves. While the solution may be far from straightforward, the problem at least can be described fairly simply.
Some problems are not so simply described. It might seem that one neighborhood in the city is getting “run down.” You seem to have noticed more houses needing paint—but how many more? And is that “enough” to mean that there is a problem? Or you might wonder if the population in the region is getting older—maybe it seems that you are seeing fewer babies around lately. Neither of these situations can be satisfactorily described by simple observation, because the really interesting information is what is going on between the observations, or beyond the scope of any single observation.
Fortunately (or unfortunately, if you prefer the simple life), there are tools which can summarize a large body of observations and compare those summaries to other, similar summaries. You are probably already familiar with some of them, like “average” and “correlation” (as in, “there is a relationship between smoking and lung cancer”). Others have exotic names, like “Student’s t-Test” and “Chi-square.” All of these tools are jointly known as “statistics,” a word coined in the eighteenth century to describe information which was being collected and analyzed about affairs of state. Since then, statistics have spread to fields other than public administration, but the name remains to remind us of their importance in making sense of the public’s business.
Statistical tools are grouped according to function. Some tools serve to describe a large group of information in just a few numbers. These tools are called “descriptive statistics.” Other tools test to see whether the information obtained is close to what should be expected. These tools are called “inductive statistics.” They have that name because they are based on inductive logic: key terms in the formulas are derived from observation rather than deduced from a definition. In this unit, we will consider two inductive statistics (Student’s t-Test and Chi-square). The next unit will be devoted to the most commonly used inductive statistics—correlation and analysis of variance. The difference between Student’s t-Test and Chi-square lies in the assumptions you can make about the “true” distribution of the data. All inductive statistics work from observed data to create an estimate of the “true” or “population” distribution. Rarely will you have the luxury of actually observing the totality of the phenomenon in which you are interested; most of the time, you must settle for looking at a representative sample of the data.
Nonparametric statistics make no assumptions about the shape of the underlying distribution. Since they make fewer assumptions (they still assume the rules of probability), they can be used in more instances. There is a price to pay for the increased flexibility, however: nonparametric statistics are less sensitive. They are more likely than parametric statistics to overlook a small, but still significant, difference in the data. Chi-square tests whether the observed interaction between two variables could be obtained by chance from a larger population, just from knowing how many in the sample belong in the various categories of each of the variables.
Parametric statistics assume that the true distribution is “normally” distributed. The “normal” distribution is the name given to the way things distribute themselves when they are independent of each other. The sands in an hourglass come to rest in a “normal” distribution. If a test is well-constructed, scores on the test will fall into a “normal” distribution. The simplest of the parametric statistics compares an observed distribution to an ideal distribution or to another observed distribution, and tests for the fit between them. “Student’s t-Test,” a statistic you will learn in this unit, is one such tool. Since observation is based on only a sample of the entire population of events or traits, it is likely that samples drawn at different times will return slightly different observations. So it is important to know whether observed differences are small enough that they could just be due to sampling “error” (or variation), or whether the differences are large enough to justify the claim that there is a “significant” difference. In the next unit, you will learn other, even more powerful parametric statistics.
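Sampling “error” is easy to see with a small simulation. The sketch below is purely illustrative (the population mean of 50 and standard deviation of 10 are made-up numbers, not from the text): repeated samples from the same known population produce sample means that vary, but that cluster around the true mean.

```python
import random
import statistics

# Hypothetical illustration: draw repeated samples from a known
# "population" (normal, mean 50, s.d. 10) and watch the sample
# means vary around the true mean -- that variation is sampling error.
random.seed(1)  # fixed seed so the sketch is reproducible

population_mean, population_sd = 50, 10
sample_means = []
for _ in range(100):  # 100 independent samples of 25 observations each
    sample = [random.gauss(population_mean, population_sd) for _ in range(25)]
    sample_means.append(statistics.mean(sample))

# No single sample mean is exactly 50, but the means cluster around it;
# their spread is roughly sd / sqrt(n) = 10 / 5 = 2.
spread = statistics.stdev(sample_means)
```

Significance testing asks whether an observed difference is larger than this kind of chance-to-chance variation.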
Chi-square (the “ch” is hard—pronounce it like “ky”) is a general test for evaluating whether observed frequencies are significantly different from those expected by chance. It is used whenever the observations can be “cross-classified,” or evaluated for two items of information at the same time (for example, the social class and the owner/renter status for each household in a city). Unlike the other statistics in this unit and the next, Chi-square is not confined to interval data; any means of categorizing the data is acceptable, as long as it is possible to cross-classify the categories.
Crossclassification is usually accomplished through a “contingency table,” like this one:
                    Education
                 Low       High
Income   High    126         99
         Low      71        162

                                    458
The first step in creating a contingency table is to divide the measurements of each characteristic into categories; here, two categories were used for each group (“High” and “Low”). Any number of categories may be used, and it is not necessary to have the same number of categories in the rows as there are in the columns. Each cell in the table then represents the number of observations which fit the category at both the row heading and the column heading. In the example above, the researcher looked at the income and education for 458 different households. Seventy-one households were both low income and low education, and 162 households were both low income and high education. Notice that the total of all the cells (458) is the same as the total number of observations (“458,” in the lower right corner of the contingency table).
The formula for computing Chisquare has two parts:
· “Expected frequencies”: Each cell of the contingency table contains the “observed frequency” of occurrences under that row-and-column heading. The “expected frequency” for a cell is its proportion of the row (the row total divided by the total number of observations—“r/n”) multiplied by its proportion of the column (the column total divided by the total number of observations—“c/n”), multiplied by the total number of observations (n). This expression simplifies to
f_e = (r × c) / n
· Calculating formula: The calculation of Chi-square builds on the difference between the observed and expected frequencies in each cell. It is calculated by:

χ² = Σ (f_o − f_e)² / f_e

df = (r − 1) × (c − 1)
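As a worked illustration, the two formulas can be applied to the income/education table above. This Python sketch (the variable names are my own, not from the text) reproduces the arithmetic:

```python
# Chi-square for the income/education table above.
# Rows = income (High, Low); columns = education (Low, High).
observed = [[126, 99],
            [71, 162]]

n = sum(sum(row) for row in observed)              # 458 observations
row_totals = [sum(row) for row in observed]        # r for each row
col_totals = [sum(col) for col in zip(*observed)]  # c for each column

# Expected frequency for each cell: f_e = (r * c) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum over all cells of (f_o - f_e)^2 / f_e
chi_square = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(2) for j in range(2))

# Degrees of freedom: (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
```

The obtained value (about 30.4 with 1 degree of freedom) far exceeds the 0.01 critical value for one degree of freedom, so the association between income and education in this sample would be judged significant.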

Chi-Square

df      0.05      0.01
 1     3.841     6.635
 2     5.991     9.210
 3     7.815    11.345
 4     9.488    13.277
 5    11.070    15.086
 6    12.592    16.812
10    18.307    23.209
The t-Test is used to test for the significance of the difference between two means. It simply asks whether the observed value of the mean could have been obtained, with random variation due to sampling and chance, from some other group. Or, in other words, is the value obtained from this set of observations sufficiently different that one could conclude that the two distributions are independent of each other?
The formula for the t-Test is fairly simple. The difference between the observed mean (called the “sample mean”) and the criterion mean (called the “population mean”) is divided by the “standard error of the sample,” which is the ratio of the sample standard deviation to the square root of the sample size:
· Standard error: s.e. = s / √n

· t-Test: t = (X̄ − Pop. Mean) / s.e.
Remember that the sample standard deviation is calculated using “n − 1” instead of “n.” The t-statistic is interpreted much the same as Chi-square: examine the t-value to determine whether it exceeds the critical values listed in a table of t-values. Degrees of freedom are determined by the number of observations less one (df = n − 1).
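The steps above can be sketched in Python. The eight observations and the population mean of 45 are hypothetical numbers chosen for illustration, not data from the text; note that Python’s `statistics.stdev` already divides by n − 1, as the formula requires.

```python
import math
import statistics

# Hypothetical example: eight sample observations, tested against
# a claimed population mean of 45.
sample = [52, 48, 55, 50, 47, 53, 49, 51]
population_mean = 45

n = len(sample)
sample_mean = statistics.mean(sample)  # X-bar
s = statistics.stdev(sample)           # sample s.d., computed with n - 1

standard_error = s / math.sqrt(n)      # s.e. = s / sqrt(n)
t = (sample_mean - population_mean) / standard_error
df = n - 1                             # 7 degrees of freedom
```

Here t is about 5.96 with 7 degrees of freedom, well past the 0.05 critical values listed for nearby degrees of freedom, so this sample mean would be judged significantly different from 45.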

t-Test

df      0.05     0.025      0.01
 1     6.314    12.706    31.821
 2     2.920     4.303     6.965
 3     2.353     3.182     4.541
 4     2.132     2.776     3.747
 5     2.015     2.571     3.365
 6     1.943     2.447     3.143
10     1.812     2.228     2.764
Now that you know the mathematics behind the tests, the simple truth is that you will almost never calculate these statistics “by hand.” Instead, you can use the function tools in Microsoft Excel. You can get to them several ways—there is a “Formulas” tab at the top of Excel (where “Home” and “File,” etc., are—depending on which version of Excel you are using). Or, at the very beginning of the data entry bar (just above the matrix of cells, the bar that shows what you have entered or are entering into each cell), there is an “fx” box (it stands for “function of x”). If you click on the “fx” box, it gives you a slate of preloaded formulas (if you don’t find what you need using fx, go up to the Formulas tab and look in the libraries there). The function for Chi-square is called CHISQ.TEST, and the function for the t-Test is T.TEST. The Chi-square and t-Test functions in Excel do not return an obtained value for the test—they go one better, comparing the obtained value to the critical value and giving you the exact probability (i.e., “0.032” rather than “less than 0.05”). You, of course, hope to see a value that is smaller than 0.05.
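The “exact probability” Excel reports can even be recovered by hand in the simplest case. For one degree of freedom, a Chi-square value is the square of a standard normal value, so its probability follows from the normal curve; this sketch (my own helper function, not from the text or from Excel) uses only Python’s math library:

```python
import math

def chi_square_p_value_df1(chi_square):
    """P(a 1-df chi-square exceeds the obtained value).

    Valid only for df = 1, where chi-square is the square of a
    standard normal value, so P = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))

# The 0.05 critical value from the table (3.841) should give p = 0.05,
# and larger obtained values give smaller probabilities.
p_at_critical = chi_square_p_value_df1(3.841)
```

This is exactly the comparison the critical-value tables encode: an obtained value larger than the tabled critical value corresponds to a probability smaller than the column heading.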
Inductive statistics allow you to draw conclusions about the relationship between variables: whether there is one, how strong it is, and sometimes even what it looks like.
But in another sense, you have learned little about these statistics. Each of these tools has been developed in a context of statistical theory, very little of which has been transmitted here. This unit is no substitute for a course in statistics. To really understand parametric statistics, you should study the “normal curve” and its characteristics. To really understand significance testing, you should study probability theory.
There is another sense in which the discussion here is lacking: I have presented the statistics in their basic form. Almost all of them have corrections which must be applied in special circumstances. Chi-square, in the form I have given you, requires a minimum value of 10 for the expected frequency in each cell. There are adjustments to the formula which will allow you to tolerate as few as 5 expected observations per cell; under no circumstances will Chi-square work if an expected cell frequency is 0. Adjustments like these are also the topics of courses in statistics.
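The expected-frequency restriction is easy to check before running the test. This sketch (my own helper, not from the text) finds the smallest expected cell frequency for a contingency table so you can compare it against whatever minimum you adopt:

```python
def smallest_expected_frequency(observed):
    """Return the smallest expected cell frequency, f_e = (r * c) / n,
    over all cells of a contingency table (a list of rows)."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return min(r * c / n for r in row_totals for c in col_totals)

# The income/education table from earlier easily clears a minimum of 10 ...
ok = smallest_expected_frequency([[126, 99], [71, 162]]) >= 10

# ... but a sparse table does not, and basic Chi-square should not be used on it.
sparse_ok = smallest_expected_frequency([[3, 2], [4, 1]]) >= 10
```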
Choose one of the following assignments. Don’t just “tell me” that there is a difference—prove that the difference you observed is, in fact, a “significant” difference (at the .05 level) by reporting and explaining an appropriate quantitative (statistical) test.
1. Using the

2. Compare the average per capita income for Minnesota’s 25 largest cities to the average for the nation and for the West North Central Region (MN, SD, ND, IA, WI). Are people in Minnesota cities better off or worse off than the nation as a whole? Than the region? Does it make a difference if you distinguish between the Metro and out-state cities? Was the same relation apparent ten years ago? What would account for your findings?
© 2006 A.J.Filipovitch
Revised 6 February 2010