The Design of Experiments: Experimental Design

The classical experiment is an artificial setting, constructed so that subjects are randomly assigned to experimental and control conditions and all conditions are held constant except one (the experimental condition). As a result, any difference between the experimental and control groups must be due to the action of the experimental condition. Often, experiments are designed to prove the obvious. In the real world, random assignment to conditions and total control of conditions never occur; until the definitive experiment is done, one can never be sure that what one thinks is occurring is not due to some confounding effect.

The argument to be tested in the experiment is expressed as a hypothesis.

The hypothesis (H1) is expressed as a prediction (i.e., “if this occurs, then that will happen”).
The hypothesis is restated in the negative (H0—called the “null hypothesis”).
The experiment attempts to disprove the null hypothesis, in order to affirm the original hypothesis.
The reason for testing the null hypothesis goes back to the character of inductive reasoning (which is what an experiment is): Since you cannot test all of anything, you try to find the exception to what you think is the rule. If you provide the optimum circumstances for the exception to occur, and it still doesn’t occur, then it is likely that it will not occur under less favorable circumstances.
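The logic of trying to disprove the null hypothesis can be sketched with an exact permutation test. This is only one of several ways to test H0, and it is not prescribed by these notes; the function name and the scores used with it are hypothetical. If H0 is true, the group labels are arbitrary, so the sketch counts how many relabelings of the pooled scores produce a difference at least as large as the one observed.

```python
from itertools import combinations


def permutation_p_value(experimental, control):
    """One-sided exact permutation test for a difference in group means.

    Returns the share of all possible relabelings whose difference in means
    (experimental minus control) is at least as large as the observed one.
    A small share is evidence against the null hypothesis (H0).
    """
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)
    n_ctl = len(control)
    total = sum(pooled)
    observed = sum(experimental) / n_exp - sum(control) / n_ctl

    at_least_as_large = 0
    n_relabelings = 0
    # Enumerate every way of choosing which pooled scores count as "experimental".
    for chosen in combinations(range(len(pooled)), n_exp):
        exp_sum = sum(pooled[i] for i in chosen)
        diff = exp_sum / n_exp - (total - exp_sum) / n_ctl
        n_relabelings += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_large += 1
    return at_least_as_large / n_relabelings


# Hypothetical scores, invented purely for illustration.
p = permutation_p_value([12, 14, 15, 16], [8, 9, 10, 11])
print(p)
```

With these invented scores every experimental value exceeds every control value, so only 1 of the 70 possible relabelings matches the observed difference, and H0 would be rejected at the conventional 0.05 level.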

An experiment is designed to provide a formal specification of comparisons. It follows a rigorous form:

Given an observation of differences (variances)—symbolize it as “O2 – O1”
Argue that the difference is produced (it is a result of something)—“X > (O2-O1)”
Determine that the difference is real, not an artifact of the design – This is the issue of “internal validity,” which will be discussed below.
Determine that X is the causative agent – This is the issue of “confounding effects,” also discussed below.

The classical experiment is done in the following form:

O1 X O2 [Experimental Condition] O2 – O1 = A
O3 O4 [Control Condition] O4 – O3 = B
A – B = X [Experimental effect is the difference between A and B]
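The arithmetic of the classical design can be sketched in a few lines. The function name and the blood-pressure readings are hypothetical, and group means are assumed as the summary of each observation; the symbols O1 to O4, A, and B follow the layout above.

```python
from statistics import mean


def experimental_effect(o1, o2, o3, o4):
    """Return (A, B, A - B) for the classical pretest/posttest design.

    o1, o2: experimental group before and after the treatment X
    o3, o4: control group before and after (no treatment)
    """
    a = mean(o2) - mean(o1)  # change in the experimental group
    b = mean(o4) - mean(o3)  # change in the control group
    return a, b, a - b       # A - B is the experimental effect


# Hypothetical blood-pressure readings, invented for illustration.
a, b, effect = experimental_effect(
    o1=[120, 130, 125], o2=[110, 118, 117],
    o3=[122, 128, 125], o4=[120, 126, 123],
)
print(a, b, effect)
```

In this invented example the experimental group drops by 10 points and the control group by 2, so the effect attributed to the treatment is a drop of 8 points. The 2-point change shared by both groups is subtracted out.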

Any research design must deal with threats to its internal validity. If one uses a classical experimental design, internal validity is assured, because anything that affects the experimental and control groups alike is subtracted out of the comparison. The threats to internal validity are:

History (order effect)—This is the possibility that the observed effect could be due to the order of presentation. For example, measuring blood pressure may itself raise a subject’s blood pressure. In a true experiment, this effect would also occur in the control group, and so be subtracted from the measurement of the effect in the experimental group.
Maturation (process within subject due to passage of time)—For example, measurements of children’s scholastic achievement at intervals of a year would be expected to show increases simply from the experience of living (adult performance measures may be more stable).
Instrumentation (change in calibration or observer)—Human observers will differ slightly in how they record their observations; mechanical devices are subject to wear and fatigue over time.
Testing (prior experience affects subsequent behavior)—This can be seen as a form of maturation, but it is due to experience with the measurement instrumentation itself. One of the reasons that “teaching to the test” works is that it gives the students experience with the form and the content of the questions they are likely to face. The longer the time between testing events, the smaller the impact of this threat to validity.
Selection (bias in original assignment)—If subjects are not randomly assigned to conditions, there might be characteristics of the subjects themselves which lead to differences between the experimental and control groups. For example, one of the difficulties of assessing the effectiveness of strategies for alleviating poverty is that there may be characteristics other than income which put people among the poor (such as educational achievement, or ….).
Mortality (differential loss of respondents)—Not only may there be differences in the characteristics of the people who begin in each group, there may also be differences in who remains in the group to the end. One of the difficulties in comparing the effectiveness of public and private K-12 education is that the private schools may not retain their more difficult students to the same extent that the public schools will.
Regression (groups selected for extreme scores will shift toward the mean)—Very tall people tend to have slightly shorter children, and very short people tend to have slightly taller children.
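Regression toward the mean, the last threat in the list above, can be illustrated with a small simulation. This is a sketch under stated assumptions, not data from any study: each observed score is modeled as a stable "true" level plus independent random noise, and the function name, sample size, and seed are hypothetical choices.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=10_000, top_fraction=0.10, seed=0):
    """Select the top scorers on a first test and re-measure them.

    Returns (mean of the selected group on test 1,
             mean of the same group on test 2,
             mean of everyone on test 2).
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]    # stable component
    test1 = [a + rng.gauss(0, 1) for a in ability]   # stable + noise
    test2 = [a + rng.gauss(0, 1) for a in ability]   # same people, fresh noise

    # Pick the group by its extreme scores on the first test only.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    return (
        mean([test1[i] for i in top]),
        mean([test2[i] for i in top]),
        mean(test2),
    )


first, second, overall = simulate_regression_to_mean()
print(first, second, overall)
```

The selected group still scores above the overall average on the second test, but noticeably less so than on the first, even though nothing was done to its members between the two measurements. A design without a control group could mistake that shift for a treatment effect.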

If not all of the variables in a setting are controlled, the initial differences in those variables could interact with the experimental condition to affect the outcome. Control can be achieved in several ways.

Randomization: This is not random sampling (which responds to threats to validity), but is an expression of the probability that any individual event is independent of any other. In an experiment, this is handled by measuring the “error” (unexplained variance) in the control and the experimental group (they should be the same).
Matching: If there are known causes of variance (other than the experimental effect) which cannot be controlled, then the experimental design should be done so that individuals with that characteristic are “matched” in both the control and experimental groups. For example, if one were assessing techniques for introducing technology into an office, one would probably want to see to it that people of similar experience levels are assigned to both the control and experimental groups.
Standardization: In field experiments, one often cannot “assign” individual cases to experimental and control conditions. But one can carry out the analysis by establishing comparable groups, or even by comparing individuals to a “standard.” For example, if one were looking at housing costs, one would presumably want to see to it that communities of similar sizes are included in both the control and experimental groups.
Partialling: A reference population (a “standard”) is often not available. In such cases, linear regression (a statistical tool) can be used to create “internal standardization,” called a “partial correlation.” One can statistically subtract interaction effects among several variables. The more variables are involved in the analysis, the more observations will be needed to partition the variability.
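The partial correlation mentioned under partialling can be sketched for the simplest case of one control variable. The function names are hypothetical, and the sketch uses the standard first-order formula, which gives the same result as correlating the residuals of x and y after each is regressed on z.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_from_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with the linear influence of z removed."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_correlation(x, y, z):
    """First-order partial correlation computed from raw observations."""
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))


# Two variables that correlate 0.25 with each other and 0.5 each with a third:
print(partial_from_r(0.25, 0.5, 0.5))
```

In that last line the apparent relationship between x and y disappears entirely once z is held constant, which is the situation partialling is meant to detect. Controlling for more than one variable requires multiple regression and, as noted above, more observations.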
