Statistics II:  Sources of Data & Calculation Process


The Excel spreadsheet for this unit is called Stats2.  As with the descriptive and inductive statistics, you will  have no problem finding data which can be analyzed using parametric statistics.  If the one of the variables is categorical (“nominal” is the technical term for such data) and the other is interval, use ANOVA.  If both variables are measured on interval scales (or can be assumed to depend on underlying interval scales), use correlation.

 

The table for ANOVA requires that you specify the number of groups and that you provide the upper limits of measurement for each group.  Suppose you have measurements that range from 1 to 100 and you have decided (for some good reason) that there are 4 natural groups, each composed of 25 points.  The upper limit of Group 1 would be 25 (the highest score an observation could have and still be in Group 1), 50 for Group 2, 75 for Group 3, and 100 for Group 4.  The spreadsheet will use these values to divide the observations into groups, based on the score for variable 1 (the”x” variable).  If the data for variable 1 are already measured in categorical terms, simply repeat the specifications in the Summary table.  In other words, if you had already coded the data from 1 to 4 (rather than from 1 to 100), then the upper limits for the groups are 1, 2, 3, and 4.

 

The results of an ANOVA and of a correlation analysis are reported at the top of the spreadsheet in the “Summary Statistics” table.  The data input section is below that.  It provides a great number of columns used to calculate the two statistics:

 

Data Input Sheet

 

 

 

 

 

 

 

 

"x"

"y"

Observed - Average

Deviation Squared

Deviation 1

Prediction

Residual

Residual

Variable 1

Variable 2

Variable 1

Variable 2

Variable 1

Variable 2

* Deviatin 2

Variable 2

Variable 2

Squared

………

……….

#VALUE!

#VALUE!

#VALUE!

#VALUE!

#VALUE!

#NAME?

#NAME?

#NAME?

 

The “Observed – Average” columns calculate the deviation of each observation from its respective mean; the next two columns calculate the squared deviation for each item.  The next four columns are used for correlation.  “Deviation1 * Deviation2” is the covariance.  The next column calculates the value for each observation as predicted by the regression line (y=a+bx).  The last two columns calculate and square the “residuals” (the left-over variation in y which is not explained by the effect of x on y).

 

Grp1 Mean #VALUE!

Grp2 Mean  #VALUE!

Grp3 Mean  #VALUE!

Grp4 Mean  #VALUE!

 

 

 

 

          Group 1

          Group 2

          Group 3

           Group 4

     Squared  Deviation from the Group Mean

Scores

Size

Scores

Size

Scores

Size

Scores

Size

Group 1

Group 2

Group 3

Group 4

0

0

0

0

0

0

0

0

0

0

0

0

 

The remaining columns are used to calculate the group characteristics for ANOVA.  The spreadsheet provides for up to 4 groups in the analysis of variance.  If you are using fewer than four groups or more than four groups, you must make the appropriate adjustments, both in the Summary Statistics Table and in the Data Input sheet.  The “Group Mean”  columns divide the observations for variable 2 (the “y” variable) into groups, using the criteria specified in the Summary Statistics table (columns labeled “Scores”), calculates the number in each group (columns labeled “size”), and calculates the mean for each group.  The last part of the Data Input Sheet calculates the squared deviation from the group mean for each observation.

 

 

Summary Statistics

 

 

 

ANOVA

 

Correlation

N of Groups

…………

r =

0

 

Group 1

…………

t =

0

 

Group 2

…………

 

 

 

Group 3

…………

Slope =

0

 

Group 4

…………

Intercept =

0

 

 

 

 

 

 

E Squared =

#VALUE!

 

 

 

F =

#VALUE!

 

 

 

 

Next Section

Back to Syllabus


 

609

 

© 1996 A.J.Filipovitch
Revised 11 March 2005