The Design of Experiments: Experimental Design
The classical experiment is an artificial setting,
constructed so that subjects are randomly assigned to experimental and control
conditions, and all conditions are held constant except for one (the
experimental condition). As a
result, any difference between the experimental and the control groups must
have been due to the action of the experimental condition. Often, experiments are designed to prove
the obvious. In the real world,
random assignment to conditions and total control of conditions never occur;
until the definitive experiment is done, one can never be sure that what one thinks
is occurring is not due to some confounding effects.
The argument to be tested in the experiment is expressed as a hypothesis.
- The hypothesis (H1) is expressed as a prediction (i.e., “if this occurs,
then that will happen”).
- The hypothesis is restated in the negative (H0, called the “null
hypothesis”).
- One attempts to disprove the null hypothesis, in order to affirm the
original hypothesis.
- The reason for testing the null hypothesis goes back to the character of
inductive reasoning (which is what an experiment is): since you cannot test
all of anything, you try to find the exception to what you think is the
rule. If you provide the optimum circumstances for the exception to occur,
and it still doesn’t occur, then it is likely that it will not occur under
less favorable circumstances.
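The logic of trying to disprove the null hypothesis can be sketched with a
permutation test. This is only an illustration and is not part of the original
notes: the scores are invented, and the function name `permutation_p_value` and
its parameters are hypothetical. If H0 is true, the group labels are
interchangeable, so the sketch shuffles them many times and counts how often a
difference at least as large as the observed one turns up by chance.

```python
import random
from statistics import mean


def permutation_p_value(experimental, control, n_shuffles=10_000, seed=0):
    """Estimate how often a difference at least as large as the observed one
    arises when group labels are assigned at random (that is, when H0 holds)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(experimental) - mean(control)
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel the subjects at random
        diff = mean(pooled[:n_exp]) - mean(pooled[n_exp:])
        if abs(diff) >= abs(observed):
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles


# Invented scores for illustration only, not data from any real study.
experimental_scores = [14, 15, 17, 18, 19, 20]
control_scores = [9, 10, 11, 12, 13, 16]

observed_diff, p_value = permutation_p_value(experimental_scores, control_scores)
print(observed_diff, p_value)
```

A small proportion means the observed difference would rarely occur if H0 were
true, which is the sense in which H0 is "disproved." It remains a statement of
likelihood, not a proof of H1.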
An experiment is designed to provide a formal specification of comparisons.
It follows a rigorous form:
- Given an observation of differences (variances), symbolize it as
“O2 – O1”
- Argue that the difference is produced (it is a result of something):
“X > (O2 – O1)”
- Determine that the difference is real, not an artifact of the design. This
is the issue of “internal validity,” which will be discussed below.
- Determine that X is the causative agent. This is the issue of
“confounding effects,” also discussed below.
The classical experiment is done in the following form:
- O1 X O2 [Experimental Condition]: O2 – O1 = A
- O3   O4 [Control Condition]: O4 – O3 = B
- A – B = X [The experimental effect is the difference between A and B]
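The arithmetic of the classical design can be worked through in a few lines.
This is a minimal sketch under stated assumptions: the scores are invented,
each observation (O1 to O4) is summarized by its group mean, and the function
name `experimental_effect` is hypothetical.

```python
from statistics import mean


def experimental_effect(o1, o2, o3, o4):
    """Return (A, B, X) for the classical design.

    o1, o2: pre- and post-test scores for the experimental group
    o3, o4: pre- and post-test scores for the control group
    """
    a = mean(o2) - mean(o1)  # A: change under the experimental condition
    b = mean(o4) - mean(o3)  # B: change under the control condition
    return a, b, a - b       # X: the experimental effect, A minus B


# Invented scores for illustration only.
o1 = [50, 52, 48, 50]
o2 = [60, 63, 57, 60]
o3 = [51, 49, 50, 50]
o4 = [54, 52, 53, 53]

a, b, x = experimental_effect(o1, o2, o3, o4)
print(a, b, x)
```

With these numbers the experimental group gains 10 points and the control
group gains 3, so the experimental effect is 7. The 3 points that both groups
gained are what the control condition subtracts out.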
Any research design must deal with internal threats to the
validity of the design. If one uses
a classical experimental design, internal validity is assured. These threats to
internal validity are:
- History (order effect): This is the possibility that the observed effect
could be due to the order of presentation. For example, measuring blood
pressure may itself raise a subject’s blood pressure. In a true experiment,
this effect would also occur in the control group, and so be subtracted from
the measurement of the effect in the experimental group.
- Maturation (process within the subject due to the passage of time): For
example, measurements of children’s scholastic achievement at intervals of a
year would be expected to show increases simply from the experience of living
(adult performance measures may be more stable).
- Instrumentation (change in calibration or observer): Human observers will
differ slightly in how they record their observations; mechanical devices are
subject to wear and fatigue over time.
- Testing (prior experience affects subsequent behavior): This can be seen as
a form of maturation, but it is due to experience with the measurement
instrumentation itself. One of the reasons that “teaching to the test” works
is that it gives the students experience with the form and the content of
the questions they are likely to face. The longer the time between testing
events, the smaller the impact of this threat to validity.
- Selection (bias in original assignment): If subjects are not randomly
assigned to conditions, there might be characteristics of the subjects
themselves that lead to differences between the experimental and control
groups. For example, one of the difficulties of assessing the effectiveness
of strategies for alleviating poverty is that there may be characteristics
other than income which put people among the poor (such as educational
achievement, or ….)
- Mortality (differential loss of respondents): Not only may there be
differences in the characteristics of the people who begin in each group,
there may also be differences in who remains in the group to the end. One of
the difficulties in comparing the effectiveness of public and private K-12
education is that the private schools may not retain their more difficult
students to the same extent that the public schools will.
- Regression (groups selected for extreme scores will shift toward the mean):
For example, very tall people tend to have children slightly shorter than
themselves, and very short people tend to have children slightly taller than
themselves.
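Regression toward the mean is easy to underestimate, so a small simulation may
help. This is a sketch under assumed conditions, not an analysis of real data:
each subject has a stable "true" score, each measurement adds independent
random error, and no treatment of any kind is applied between the two
measurements. The function name and all of the numbers are hypothetical.

```python
import random
from statistics import mean


def retest_of_extreme_group(n=10_000, top_fraction=0.10, seed=1):
    """Simulate two noisy measurements of a stable trait and return the mean
    first and second scores of the group selected for high first scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    # Each observed score = stable true score + independent measurement error.
    first = [t + rng.gauss(0, 10) for t in true_scores]
    second = [t + rng.gauss(0, 10) for t in true_scores]
    # Select the subjects whose first score falls in the top fraction.
    cutoff = sorted(first)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if first[i] >= cutoff]
    first_mean = mean([first[i] for i in selected])
    second_mean = mean([second[i] for i in selected])
    return first_mean, second_mean


first_mean, second_mean = retest_of_extreme_group()
print(first_mean, second_mean)
```

The selected group's second mean falls between the population mean (100) and
its own first mean, even though nothing was done to the subjects. A design
without a control group could mistake that shift for an experimental effect.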
If not all of the variables in a setting are controlled, the
initial differences in those variables could interact with the experimental
condition to affect the outcome.
Control can be achieved in several ways.
- Randomization: This is not random sampling (which
responds to threats to validity), but is an expression of the probability
that any individual event is independent of any other. In an experiment, this is handled
by measuring the “error” (unexplained variance) in the control
and the experimental group (they should be the same).
- Matching: If there are known causes of
variance (other than the experimental effect) which cannot be controlled,
then the experimental design should be done so that individuals with that
characteristic are “matched” in both the control and
experimental groups. For
example, if one were assessing techniques for introducing technology into
an office, one would probably want to see to it that people of similar
experience levels are assigned to both the control and experimental
groups.
- Standardization: In field experiments, one often
cannot “assign” individual cases to experimental and control
conditions. But one can carry
out the analysis by establishing comparable groups, or even by comparing
individuals to a “standard.” For example, if one were looking at
housing costs, one would presumably want to see to it that communities of
similar sizes are included in both the control and experimental groups.
- Partialling: A reference population (a “standard”) is often not
available. In such cases, linear regression (a statistical tool) can be used
to create “internal standardization,” called a “partial correlation.” One can
statistically subtract interaction effects among several variables. The more
variables are involved in the analysis, the more observations will be needed
to partition the variability.
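The idea of partialling can be made concrete with a first-order partial
correlation. The sketch below is illustrative only: the data are invented, the
function names `pearson_r` and `partial_r` are hypothetical, and z stands for
any third variable that cannot be controlled directly. The formula used gives
the same result as correlating the residuals left over after regressing x on z
and y on z.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after the linear effect of z has been
    removed from both (first-order partial correlation)."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented data: x and y each equal z plus unrelated noise, so any apparent
# relationship between x and y comes entirely from z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]
y = [2, 1, 4, 3, 4, 7, 6, 9]

raw = pearson_r(x, y)
partial = partial_r(x, y, z)
print(raw, partial)
```

Here the raw correlation between x and y is about 0.84, but once z is
partialled out the correlation is essentially zero. Each additional variable
partialled out in this way uses up information, which is why more observations
are needed as more variables enter the analysis.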
© 1996 A.J.Filipovitch
Revised 11 March 2005