Correlation is the statistical tool that most clearly expresses the general linear model. To perform a correlation, you must have observations of two characteristics for each case you wish to include, and both observations must be measured on interval scales. You must further be willing to assume that the distribution underlying the observations is "normal," or balanced about the mean.
There are several formulas connected with correlation:
r = \frac{\sum (y - \bar{y})(x - \bar{x})}{\sqrt{\sum (y - \bar{y})^2 \sum (x - \bar{x})^2}}
The correlation coefficient is an index number which will always fall between –1.0 and +1.0. The closer the value is to 1, the stronger the relationship. If the sign of the coefficient is positive, it means that the value of one variable increases if the value of the other one increases. If the sign is negative, it means that an increase in the value of one variable is associated with a decline in the value of the other.
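As a concrete sketch of the formula above (the data values here are invented for illustration), the coefficient can be computed directly from the deviation sums:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the sum of cross-deviations divided by
    the square root of the product of the squared-deviation sums."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear increasing relationship yields r = +1.0;
# a perfectly linear decreasing one yields r = -1.0.
r_pos = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_neg = pearson_r([1, 2, 3], [3, 2, 1])
```

Any real data will fall somewhere between these two extremes, with values near 0 indicating no linear relationship.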
b = \frac{\sum (y - \bar{y})(x - \bar{x})}{\sum (x - \bar{x})^2}

a = \bar{y} - b\bar{x}
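As a sketch (again with invented data), the slope b and intercept a follow directly from the deviation sums:

```python
def regression_line(x, y):
    """Least-squares slope b and intercept a for the line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Data lying exactly on y = 2x gives slope 2 and intercept 0.
a, b = regression_line([1, 2, 3], [2, 4, 6])
```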
t = \frac{b}{\sqrt{\sum (y - \hat{y})^2 / \left[(n - 2) \sum (x - \bar{x})^2\right]}}

where ŷ is the value of y predicted by the regression line (ŷ = a + bx).
The obtained value of t is compared to the standard values of t in a table. The degrees of freedom are “n-2.” Since you have no way of knowing whether the true value of the correlation is higher or lower than the obtained coefficient, you are testing both possibilities. This is called a “two-tailed” test. In the tables of t-values, the proper values for a two-tailed test at .05 level of probability are listed in the “0.025” column (the .05 is split between the two sides of the distribution).
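A minimal numeric sketch of the test (this assumes the slope form of the statistic, t = b divided by its standard error; the data values are invented):

```python
import math

def t_statistic(x, y):
    """t for the regression slope: b divided by its standard error,
    computed from the residual sum of squares with n - 2 df."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(ss_res / (n - 2) / sxx)

t = t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # about 2.12
```

With n - 2 = 3 degrees of freedom, the obtained t of about 2.12 falls short of the two-tailed .05 critical value of 3.182, so this small sample would not support a claim of a significant relationship.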
Analysis of variance applies the general linear model to situations in which one of the variables is measured on an interval scale, but the other variable (the “x,” or causal variable) is membership in a group. For example, a neighborhood group might be complaining that they are not getting their fair share of the city’s park & recreation money. ANOVA will allow you to determine whether there is merit to such a claim. An advantage of ANOVA over correlation is that no assumption need be made that the relationship between the two variables is a straight line. Analysis of variance will work with “U-shaped” or other curvilinear relationships.
In the natural order of things, not every member of a group will behave in exactly the same way. There will be a certain amount of variability within each group. However, if the groups are truly distinct, then it is reasonable to assume that each group also behaves differently from the others. These two sources of variance, within-group variance and between-group variance, should add up to the total difference observable in the entire community. In an idealized situation, all of the variance would be seen between the groups. Conversely, if the "group" designation is not causing the difference in the other characteristic, there would be no variance between groups and there would be similar variance within each group.
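This additive decomposition can be checked numerically; the two small groups below are invented for illustration:

```python
def sums_of_squares(groups):
    """Return (total, within, between) sums of squared deviations
    for a list of groups of interval-scale observations."""
    everything = [v for g in groups for v in g]
    grand = sum(everything) / len(everything)
    total = sum((v - grand) ** 2 for v in everything)
    within = 0.0
    between = 0.0
    for g in groups:
        gm = sum(g) / len(g)
        within += sum((v - gm) ** 2 for v in g)
        between += len(g) * (gm - grand) ** 2
    return total, within, between

# within + between reproduces the total variability exactly
total, within, between = sums_of_squares([[1, 2, 3], [4, 5, 6]])
```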
Analysis of variance calculates a statistic called E². The formula is:

E^2 = 1 - \frac{\sum (y - \bar{y}_{group})^2}{\sum (y - \bar{y}_{total})^2}
E² measures the proportion of the total variance which is due to group membership ("between-group" variance). One cannot calculate between-group variance directly, but it is what is left over after within-group variance is taken into account. The formula calculates the ratio of within-group variance to total variance and subtracts the result from unity (1), arriving at the ratio of between-group variance to total variance.
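As a sketch of the calculation (using the same invented groups as above), E² is one minus the ratio of within-group to total variance:

```python
def eta_squared(groups):
    """E^2 = 1 - (within-group SS / total SS): the proportion of
    total variance attributable to group membership."""
    everything = [v for g in groups for v in g]
    grand = sum(everything) / len(everything)
    total = sum((v - grand) ** 2 for v in everything)
    within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )
    return 1 - within / total

e2 = eta_squared([[1, 2, 3], [4, 5, 6]])  # about 0.77
```

For this invented data, roughly 77% of the total variance is between the groups.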
Just as a t-test is used to determine whether the correlation coefficient could be explained by chance variation, there is also a test to determine the significance of an analysis of variance. It is similar to t, and is called the "F-test." For analysis of variance, the F-statistic is calculated as the ratio of the variance between groups to the variance within groups, standardizing each to take the number of groups and the size of the sample into account:
F = \frac{\sum (\bar{y}_{group} - \bar{y}_{total})^2 / (\text{groups} - 1)}{\sum (y - \bar{y}_{group})^2 / (\text{observations} - \text{groups})}

where both sums run over every observation: the numerator measures how far each observation's group mean lies from the grand mean, the denominator how far each observation lies from its own group mean.
The two "standardizing" factors are measures of the "degrees of freedom" in each of the terms. As with the t-statistic, one consults a table of F-values, using the two measures of degrees of freedom, to determine the significance of the analysis. The following table contains only a few of the possible values of F. If necessary, consult an F-table in any statistics text or in a collection of standard math tables.
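As a sketch (continuing the same invented groups), the F-statistic follows directly from the two sums of squares and their degrees of freedom:

```python
def f_statistic(groups):
    """F = (between-group SS / (c - 1)) / (within-group SS / (n - c)),
    where c is the number of groups and n the number of observations."""
    everything = [v for g in groups for v in g]
    n, c = len(everything), len(groups)
    grand = sum(everything) / n
    within = 0.0
    between = 0.0
    for g in groups:
        gm = sum(g) / len(g)
        within += sum((v - gm) ** 2 for v in g)
        between += len(g) * (gm - grand) ** 2
    return (between / (c - 1)) / (within / (n - c))

f = f_statistic([[1, 2, 3], [4, 5, 6]])  # 13.5 for this invented data
```

With 1 and 4 degrees of freedom, the .05 critical value (7.71, from a fuller F-table) is well below the obtained 13.5, so the group difference in this toy example would be judged significant.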
Columns: degrees of freedom for Observations (n-c)

| df for Groups (c-1) | 1 | 10 | 20 | 30 | 40 | 50 | infinity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 161.0 | 4.96 | 4.35 | 4.17 | 4.08 | 4.03 | 3.84 |
| 2 | 200.0 | 4.10 | 3.49 | 3.32 | 3.23 | 3.18 | 2.99 |
| 3 | 216.0 | 3.71 | 3.10 | 2.92 | 2.84 | 2.79 | 2.60 |
| 4 | 225.0 | 3.48 | 2.87 | 2.69 | 2.61 | 2.56 | 2.37 |

F-Table for .05 Level of Significance
Analysis of variance tells you if there is a relationship, and how strong it is. The statistics do not tell you where the relationship lies. You must determine that by inspecting the pattern of the group means and the within-group deviation from those means.
© 1996 A.J.Filipovitch
Revised 11 March 2005