AN EMPIRICAL EXAMINATION OF
ANALYSIS OF COVARIANCE WITH AND WITHOUT
PORTER'S ADJUSTMENT FOR A FALLIBLE COVARIATE
By
JAMES EDWIN McLEAN
A DISSERTATION PRESENTED TO THE GRADUATE
COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1974
DEDICATION
I dedicate this study to the late Dr. Charles M. Bridges, Jr.,
my teacher, advisor and friend.
ACKNOWLEDGMENTS
I wish to express my grateful appreciation to my committee members
for their guidance in this study. Dr. William B. Ware, my chairman,
suggested the topic and was a major influence in its development. Dr.
James T. McClave was a constant source of assistance, both theoretically
and editorially. Drs. Vynce Hines, William Mendenhall, and P. V. Rao
all provided direction along with many helpful suggestions. This study
could not have been completed without the spirit of cooperation which I
encountered between members of the two departments involved.
The late Dr. Charles M. Bridges, Jr. first encouraged me to enroll
in the program and served as my chairman until his death. His counseling
and guidance were a major reason for my successful completion of graduate
study.
The searching questions of my fellow students led to several
worthwhile modifications and I wish to express my appreciation to them.
I also wish to thank my wife, Sharon, who for the last five years,
has held two jobs (wife and medical technologist), so that I might
further my education. She has also been most understanding about the
many hours spent away from home.
TABLE OF CONTENTS
Page
ACKNOWLEDGMENTS . . . . . iii
LIST OF TABLES . . . . . vi
ABSTRACT . . . . . viii
CHAPTER
I. INTRODUCTION . . . . . 1
    The Problem of Comparing Groups of Differing Abilities . . . . . 1
    Methods of Comparing Groups of Differing Abilities . . . . . 3
    Limitations Used in this Study . . . . . 4
    Procedures . . . . . 5
    Relevance of the Study . . . . . 7
    Organization of the Dissertation . . . . . 7
II. REVIEW OF THE RELATED LITERATURE . . . . . 8
    Historical Review of the Problem . . . . . 8
    Analysis of Covariance . . . . . 10
    Analysis of Covariance with Porter's Adjustment
    for a Fallible Covariate . . . . . 11
    Summary . . . . . 14
III. PROCEDURES . . . . . 16
    The Model . . . . . 16
    Selecting the Reliabilities . . . . . 17
    The Regression of Y on X . . . . . 18
    Selecting Means . . . . . 20
    Generation of Random Normal Deviates with Specified
    Means and Variances . . . . . 26
    Analysis for Comparing the Selected Methods . . . . . 26
IV. RESULTS . . . . . 29
V. DISCUSSION . . . . . 41
    Comparison of the Two Methods of Analysis . . . . . 41
    Factors that Affect Alpha and Beta . . . . . 42
    Predicting Alpha and Power . . . . . 43
    A Direction for Future Research . . . . . 46
VI. SUMMARY . . . . . 47
APPENDIX . . . . . 49
    Fortran Program Used to Perform Analysis . . . . . 50
BIBLIOGRAPHY . . . . . 56
BIOGRAPHICAL SKETCH . . . . . 59
LIST OF TABLES
Table Page
1. CONDITIONS UNDER WHICH COMPARISONS WERE MADE WHEN
THE RELIABILITIES FOR BOTH GROUPS WERE EQUAL . . . . . 24
2. CONDITIONS UNDER WHICH COMPARISONS WERE MADE WHEN
THE RELIABILITIES FOR BOTH GROUPS WERE UNEQUAL . . . . . 25
3. FACTORIAL DESIGN FOR STANDARD ANALYSIS OF COVARIANCE
WHEN MEAN GAIN OF GAIN GROUP WAS ZERO AND CRITERION
VARIABLES ARE MONTE CARLO GENERATED ALPHA VALUES . . . . . 28
4. FRACTION OF SIGNIFICANT F'S AND NUMBER OF TIMES Fs
EXCEEDS Fp WHERE THERE WAS NO GAIN IN EITHER GROUP
AND THE PRETEST RELIABILITIES WERE EQUAL . . . . . 32
5. FRACTION OF SIGNIFICANT F'S AND NUMBER OF TIMES Fs
EXCEEDS Fp WHERE THERE WAS NO GAIN IN EITHER GROUP
AND THE PRETEST RELIABILITIES WERE NOT EQUAL . . . . . 33
6. FRACTION OF SIGNIFICANT F'S AND NUMBER OF TIMES Fs
EXCEEDS Fp WHERE THERE WAS GAIN IN THE GAIN GROUP
AND THE PRETEST RELIABILITIES WERE EQUAL . . . . . 34
7. FRACTION OF SIGNIFICANT F'S AND NUMBER OF TIMES Fs
EXCEEDS Fp WHERE THERE WAS NO GAIN IN THE GAIN
GROUP AND THE PRETEST RELIABILITIES WERE NOT
EQUAL . . . . . 35
8. ANOVA SUMMARY TABLE USING MONTE CARLO GENERATED ALPHAS
FROM THE STANDARD ANALYSIS OF COVARIANCE AS THE CRITERION
VARIABLES . . . . . 36
9. ANOVA SUMMARY TABLE USING MONTE CARLO GENERATED ALPHAS
FROM ANALYSIS OF COVARIANCE WITH PORTER'S ADJUSTMENT
AS THE CRITERION VARIABLES . . . . . 37
10. ANOVA SUMMARY TABLE USING MONTE CARLO GENERATED POWERS
FROM THE STANDARD ANALYSIS OF COVARIANCE AS THE
CRITERION VARIABLES . . . . . 38
11. ANOVA SUMMARY TABLE USING MONTE CARLO GENERATED POWERS
FROM ANALYSIS OF COVARIANCE WITH PORTER'S ADJUSTMENT
AS THE CRITERION VARIABLES . . . . . 39
12. COEFFICIENTS FOR THE LINEAR CONTRASTS USED WHEN
RELIABILITY WAS SIGNIFICANT IN THE ANALYSIS OF VARIANCE . . . . . 40
13. NINETY-FIVE PERCENT CONFIDENCE INTERVALS FOR THE EXPECTED
VALUES OF ALPHAS UNDER SPECIFIED CONDITIONS . . . . . 44
14. NINETY-FIVE PERCENT CONFIDENCE INTERVALS FOR THE EXPECTED
VALUES OF POWERS UNDER SPECIFIED CONDITIONS . . . . . 45
Abstract of Dissertation Presented to the
Graduate Council of the University of Florida in Partial
Fulfillment of the Requirements for the Degree of Doctor of Philosophy
AN EMPIRICAL EXAMINATION OF
ANALYSIS OF COVARIANCE WITH AND WITHOUT
PORTER'S ADJUSTMENT FOR A FALLIBLE COVARIATE
by
James Edwin McLean
August, 1974
Chairman: William B. Ware
Major Department: Foundations of Education
The purpose of this study was to determine whether analysis of covariance,
analysis of covariance with Porter's true covariate substitution
adjustment, or neither is appropriate for analyzing pretest-posttest
educational experiments with less than perfectly reliable measures.
Monte Carlo techniques were employed to generate two thousand
samples for each of forty-eight sets of conditions. These conditions
included six combinations of reliability, two levels of sample size, two
levels of gain, and the equality or inequality of pretest means. Each
sample was analyzed by both of the tested methods. The proportion of
rejections for each method of analysis and the number of times the F
statistic for standard analysis of covariance exceeded that for analysis
of covariance with Porter's adjustment were recorded.
The hypothesis, "The sampling distributions of the test statistics
from analysis of covariance with and without Porter's adjustment are
the same," was rejected at the .01 level of significance using the sign
test for each of the forty-eight sets of conditions.
To gain further insight into how the two methods of analysis
differed, four factorial experiments were conducted using either the
computer generated alphas or powers as the criterion variables. The first
factorial experiment examined analysis of covariance when the mean gain
was zero in both groups; thus, the computer generated alphas were used
as criterion variables. The second experiment examined analysis of
covariance with Porter's adjustment when the mean gain was zero in each
group. Again, computer generated alphas were used as criterion variables.
The third and fourth factorial experiments examined standard analysis of
covariance and analysis of covariance with Porter's adjustment when the
mean gain in only one group was positive. The four factorial experiments
included three factors. These were reliability at six levels, sample
size at two levels, and the equality of pretest means at two levels
(pretest means equal and pretest means not equal). In each experiment,
it was found that unequal pretest means combined with low reliability
were significant sources of variation.
Additional study showed that unequal pretest (covariate) means
combined with low reliability produced misleading results. More
specifically, when either method of analysis produced a significant F
statistic and there was a mean gain of zero in both groups but the pretest
mean of one group was less than that of the other, the adjusted
posttest means indicated that the group with the higher pretest mean had
the larger gain. Likewise, when there was a positive mean gain in the
group with the lower pretest mean, the adjusted posttest means indicated
the group with a mean gain of zero had the larger gain.
The results of this study combined with the results of others
concerned with analysis of covariance where the covariates are fallible
point to the inadequacy of the technique as it now exists. These
results should be kept in mind if the technique is to be used.
CHAPTER I
INTRODUCTION
Regardless of the theory or hypothesis being investigated,
educational researchers must, at some point, relate it to learning. One
common element of most definitions of learning is that a change must
take place in the learner (DeCecco, 1968, p. 243; Hilgard, 1956, p. 3;
Hill, 1971, p. 12). Thus most investigations in education are concerned
with identifying, measuring, and comparing these changes.
In the past decade, analysis of covariance has become a standard
procedure for comparing groups with different levels of ability. The
basic derivation of this procedure requires the assumption that the
covariable does not contain errors of measurement (Cochran, 1957). In
recent years, several researchers have recognized that this assumption
is violated when the covariate is a mental test score (Lord, 1960;
Porter, 1967; Campbell and Erlebacher, 1970). Lord (1960) and Porter
(1967) have proposed adjustments to the analysis of covariance procedure
to use when the covariable is fallible, that is, when measurement error
is present in the covariable. The focus of this study is to determine
if analysis of covariance and analysis of covariance with Porter's
adjustment are different for a fallible covariate and if either is
appropriate under varying conditions of reliability, sample size, gain,
and pretest mean.
The Problem of Comparing Groups of Differing Abilities
The problem of comparing groups based on change manifests itself
prominently when the variables are mental measurements. Most physical
science measurements are made with the aid of some type of physical
instrument. If an engineer is interested in the weight gain of a metal
before and after a galvanizing process, instruments are available to
measure the weight of the metal to within at least one microgram. On
the other hand, if an educator is interested in the change in a student's
I.Q. before and after taking part in an experimental program, an error
of 5 points or more would not be uncommon. Based on the Wechsler
Intelligence Scale for Children with a standard error of measurement of
5.0 (Cronbach, 1970, p. 222), an error of 5.0 I.Q. points or more would
occur in approximately 32 percent of the measurements, assuming a normal
distribution. If the actual change in I.Q. were near zero for an
individual, it is very likely that the error of measurement would exceed
the change itself.
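The 32 percent figure quoted above can be checked with a short calculation. The sketch below assumes, as the text does, that measurement error is normally distributed with a standard deviation equal to the standard error of measurement; the function name is illustrative only.

```python
import math

def prob_error_at_least(threshold, sem):
    """Two-sided probability that a normal error with SD `sem`
    is at least `threshold` in absolute value."""
    z = threshold / sem
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# Error of 5.0 I.Q. points or more with a standard error of measurement of 5.0
p = prob_error_at_least(5.0, 5.0)
print(round(p, 3))  # 0.317, i.e. roughly 32 percent
```

An error of one standard error of measurement or more is therefore expected in about one measurement out of three under these assumptions.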
The problem is compounded further as the groups being compared often
are not of the same ability level. Many of the recent programs featuring
innovative teaching practices for compensatory education are available
only for the most needy and the comparison group is then sampled from
the general population of students (Campbell and Erlebacher, 1970).
Programs such as Head Start and Follow Through have probably been the
victims of tragically misleading analyses, such as in the Westinghouse/
Ohio University study (Campbell and Erlebacher, 1970). Personal contact
with the evaluation of Follow Through projects has emphasized the
magnitude of the problem. Warnings about the inappropriate uses of
existing modes of analysis have come from several sources (Lord, 1960,
1967, 1969; Campbell and Erlebacher, 1970).
Remedies have been offered (Lord, 1960; Porter, 1967), but at this
time there is no conclusive evidence to indicate these remedies are
appropriate. Studies concerning the robustness of the analyses to
violated assumptions have also been in conflict (Peckham, 1970).
Methods of Comparing Groups of Differing Abilities
Two general approaches are available to compare groups with differing
abilities. The first is to compare the average change in one
group with the average change in the other group by means of a t test or
analysis of variance. The second is to use analysis of covariance, where
the pretest score is the covariate.
There are at least three methods of computing change scores to be
considered if the first approach is used. One method is to obtain raw
difference scores by simply subtracting each pretest score from its
corresponding posttest score. Another method is Lord's true gain (Lord,
1956) which was further developed by McNemar (1958). Basically, this
method requires the use of a regression equation to estimate the "true"
gain or change from pretest to posttest. A third method of measuring
gain, the method of residual gain, was proposed by Manning and DuBois
(1962). This method involves regressing the posttest scores on the
pretest scores and obtaining a predicted posttest score for each subject.
The indicator of change is the observed posttest score minus the
predicted posttest score. This method is mathematically equivalent to the
standard analysis of covariance (Neel, 1970, pp. 30-31), the second
approach.
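The residual gain computation described above can be sketched as follows. This is only an illustration: it assumes a single least-squares line fitted to all subjects, and the function name is hypothetical rather than taken from Manning and DuBois (1962).

```python
def residual_gains(pre, post):
    """Observed posttest score minus the posttest score predicted
    from a least-squares regression of posttest on pretest."""
    n = len(pre)
    mean_x = sum(pre) / n
    mean_y = sum(post) / n
    sxx = sum((x - mean_x) ** 2 for x in pre)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]

# Hypothetical pretest and posttest scores for four subjects
gains = residual_gains([90.0, 100.0, 110.0, 120.0], [95.0, 99.0, 114.0, 118.0])
```

By construction the residual gains sum to zero over the subjects used to fit the line, so the scores describe change relative to expectation rather than absolute change.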
The second general approach is the standard analysis of covariance
procedure found in many statistical texts (e.g., Kirk, 1968; Winer, 1971).
One of the reported functions of a covariate is to adjust the treatment
means of the dependent variable for differences in the values of
corresponding independent variables (Hicks, 1965). Logically, it seems that by
using the pretest scores of groups differing in ability as a covariate,
the treatment means can be statistically adjusted for the differences in
pretest means. A deficiency of this method is the fallibility of the
covariate. A fallible variable is one which is measured with error,
that is, with less than perfect reliability (Lord, 1960). Lord (1960),
recognizing this deficiency in the use of analysis of covariance, derived an
adjustment to compensate for the fact that the pretest scores were fallible.
Porter (1967) modified Lord's procedure for more than two groups and
empirically investigated the sampling distribution of Lord's statistic.
Porter's solution has been suggested as an alternative to the standard
analysis of covariance when the covariate is fallible.
Limitations Used in this Study
This study was designed to examine the characteristics of the two
covariance methods within the framework of simulated situations which
may occur in the analysis of a compensatory education project.
Realistically, several limitations were imposed. Even with these limitations,
the two methods of analysis were investigated under forty-eight
combinations of differing reliability, sample size, mean gain, and pretest
means.
The study was limited to two groups, henceforth called the gain and
no-gain groups. Investigating more than two groups would not only
increase the number of possibilities appreciably but would also depart
from the objective stated previously, that is, comparing a compensatory
education group with a "comparison" group. The levels of the
reliabilities being investigated are also limited to the range .50 to .90.
Seldom do mental measurement instruments have reliabilities above .90, and
instruments with reliabilities below .50 are generally considered
inadequate.
The levels of sample size were 10 and 100 subjects per group.
These quantities are representative of small and large sample sizes
encountered in educational experiments.
Furthermore, the true score and true gain variances will be fixed
a priori with the true gain variance always four percent of the true
score variance. These figures are empirically based on Follow Through
achievement data in keeping with the aforementioned objective.
Procedures
The study compared the two methods of analysis under forty-eight
sets of conditions. The two methods were analysis of covariance and
analysis of covariance with Porter's adjustment for a fallible covariate.
Three levels of reliability and two levels of sample size were used.
The three levels of reliability will include situations where the
reliabilities of the pretests are assumed equal and situations where they
differ to more closely simulate compensatory education project
evaluation. Each of these combinations of conditions was repeated for four
separate cases. Case I is where there is no gain in either group and
both groups have equal pretest means. Case II is where there is gain in
only one group (the gain group) and both groups have equal pretest means.
Cases III and IV are with and without gain where the pretest means are
different.
For each of the forty-eight sets of conditions, two thousand samples
were computer generated and analyzed using both methods of analysis and
a .05 level of significance. Two thousand samples provide empirical
estimates of the fraction of type I and type II errors to within .01 of
their true values with ninety-five percent confidence. The samples were
generated from an assumed normal population. The generation technique
is one proposed by Box and Muller (1958) and modified by Marsaglia and
Bray (1964). This procedure generates random normal deviates with a
mean and variance of 0 and 1 respectively. These normal variables are
then transformed to attain the specified means and variances. The IBM
370/165 computer of the North Florida Regional Data Center was used for
the generation and analyses.
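The generation step can be illustrated with the polar form of the Box-Muller transformation, which is the variant usually attributed to Marsaglia and Bray (1964). The sketch below is an assumption-laden illustration in Python, not the Fortran routine used in the study; the function names and the use of Python's built-in uniform generator are choices made here for convenience.

```python
import math
import random

def polar_normal_pair(rng):
    """Return two independent standard normal deviates (polar method)."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        # Accept only points strictly inside the unit circle (and not the origin).
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

def normal_deviates(n, mean, variance, seed=0):
    """Generate n deviates, then shift and scale them to the given mean and variance."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    out = []
    while len(out) < n:
        z1, z2 = polar_normal_pair(rng)
        out.extend((mean + sd * z1, mean + sd * z2))
    return out[:n]

# Hypothetical example: 2000 scores with mean 50 and variance 100
scores = normal_deviates(2000, 50.0, 100.0, seed=1)
```

The transformation to a specified mean and variance is the linear one described in the text: each standard deviate is multiplied by the desired standard deviation and the desired mean is added.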
Using these analyses, two research questions were examined:
1. Is there any difference between the sampling distributions
of the test statistics of standard analysis of covariance
and analysis of covariance with Porter's adjustment?
2. What factors affect each type of analysis?
The first research question can be restated in terms of a null
hypothesis for each set of conditions:
The sampling distributions of the test statistics from
analysis of covariance with and without Porter's adjustment
are the same for each set of conditions.
This hypothesis was tested for each reliability, sample size, gain, and
pretest mean combination.
The second research question was examined in four factorial
experiments. Each factorial experiment included the factors of reliability at
six levels, sample size at two levels, and equality of pretest means at
two levels. The first experiment consisted of examining standard
analysis of covariance with computer generated alpha values as the criterion
variables. The second examined analysis of covariance with Porter's
correction also using computer generated alpha values as the criterion
variables. The third and fourth factorial experiments examined analysis of
covariance without and with Porter's adjustment, respectively, using
computer generated powers as criterion variables.
Relevance of the Study
This study is designed to compare two recommended methods of
analyzing educational experiments under known simulated conditions which
are presumably realistic. The results should indicate whether either
of the methods is appropriate for its recommended use, thus giving
educational researchers some direction when faced with the choice. If both
methods were shown to be appropriate, the study could determine which
one is superior. The study should also indicate if either method of
analysis is appropriate for a set of restricted conditions, e.g. for
only certain levels of reliability.
Organization of the Dissertation
A statement of the problem, a description of possible solutions,
and an overview of the procedure of this study have been included in
Chapter I. A comprehensive review of the related literature is provided
in Chapter II. This review includes a historical overview of the
problem and a detailed description of the methods being compared. A
description of the procedures followed for this study is contained in
Chapter III. The data and analyses are presented in Chapter IV and a
discussion of the results is provided in Chapter V. A summary of the
study is provided in Chapter VI.
CHAPTER II
REVIEW OF THE RELATED LITERATURE
Comparing groups of differing abilities has been a problem for some
time. The literature has included discussions of possible solutions for
at least the past three decades. Many of these discussions have been in
conflict with one another, and there is still no general agreement. The
evaluation of federal compensatory education projects in the last decade
has intensified the debate and the need for a theoretically sound and
practically useful solution. Some of the solutions proposed over the
years are matching subjects, gain scores, and analysis of covariance.
A discussion of these solutions and why they may not be considered
satisfactory is provided in this chapter. The major properties of
analysis of covariance are also considered. Because Porter's adjustment
for a fallible covariable in analysis of covariance is not readily
available in the literature, its derivation is provided in this chapter.
Historical Review of the Problem
A paper by Thorndike (1942) examined in detail the fallacies of
comparing groups of differing abilities by matching subjects. The crux
of his argument was that the regression effects were systematically
different "whenever matched groups are drawn from populations which differ
with regard to the characteristics being studied" (p. 85).
During the late 1950's and early 1960's the literature was inundated
with papers on how to or how not to measure gain or change. One of the
leaders during this period was Lord (1956, 1958, 1959, 1963). His
proposal for estimating the "true" gain was originally set forth in 1956
with further developments in 1958 and 1959. McNemar (1958) extended his
results for the case of unequal variances among the groups. Garside
(1956) also proposed a method for estimating gain scores and Manning and
DuBois (1962) presented their derivation of residual gain scores.
The debate over what techniques, if any, should be used for measuring
gain or change was at its peak in the early 1970's. Cronbach and Furby
(1970) concluded that one should generally rephrase his questions about
gain in other ways. Marks and Martin (1973) underscored the Cronbach
and Furby (1970) conclusion. O'Connor (1972) reviewed developments of
gain scores in terms of classical test theory. Neel (1970) employed
Monte Carlo techniques to compare four identified methods for measuring
gain. The compared methods were raw difference, Lord's true gain,
residual gain, and analysis of covariance. Under equivalent conditions,
he found that Lord's true gain tended to produce a greater significance
level than the user would intend, that is, a higher fraction of type I
errors than alpha.
The analysis of covariance method has been widely recommended by a
number of authorities in the field (Thorndike, 1942; Campbell and Stanley,
1963; O'Connor, 1972). Campbell and Erlebacher (1970) point out that the
Westinghouse/Ohio University Study was evaluated, possibly incorrectly,
using the analysis of covariance. Lord (1960, 1967, 1969) has sounded
a warning about its use that has been echoed by others (Werts and Linn,
1970; Campbell and Erlebacher, 1970; Winer, 1971). The warning stressed
that the analysis of covariance requires the assumption that the covariate
is measured without error. Lord (1960) has proposed an adjustment which
was later generalized by Porter (1967). The adjustment is based on the
substitution of the true score estimate for the observed value of the
covariate.
Analysis of Covariance
The analysis of covariance procedure was originally introduced by
Sir Ronald A. Fisher (1932, 1935). According to Fisher (1946, p. 281),
the analysis of covariance "combines the advantages and reconciles the
requirements of two widely applicable procedures known as regression and
analysis of variance." The procedure is well documented in contemporary
texts (Snedecor and Cochran, 1967, pp. 419-446; Kirk, 1968, pp. 455-489;
Winer, 1971, pp. 752-812). Analysis of covariance is a popular technique
in both the physical and social sciences.
Among the principal uses of analysis of covariance pointed out by
Cochran (1957, p. 264) is "to remove the effects of disturbing variables
in observational studies." It was thought that by using the pretest score
as the covariate and comparing two groups with analysis of covariance,
the effects of different pretest scores could be eliminated. Lord (1960)
pointed out that this was not necessarily the case when the covariate
was fallible. The assumptions necessary for the analysis of covariance
are the same as those for analysis of variance with the addition of the
following:
1. The covariates are measured without error.
2. The regression coefficient is constant across all treatment
groups (Peckham, 1970).
Violation of the first assumption induces a bias in the analysis
of covariance because of "the presence of 'error' and 'uniqueness' in
the covariate, i.e. variance not shared by the dependent variable. If
the proportion of such variance can be correctly estimated, it can be
corrected for," (Campbell and Erlebacher, 1970, p. 199). This position
was upheld by Glass et al. (1972). The basic algebraic derivation of
this correction was presented by Lord (1960) and expanded by Porter (1967).
Analysis of Covariance with Porter's Adjustment for a Fallible Covariate
Lord's (1960) derivation of the analysis of covariance adjustment
was limited to two treatment groups and is slightly more difficult than
is Porter's (1967) procedure. Thus, Porter's procedure will be derived
here and used in the analysis. Porter's adjustment is based on the
substitution of the estimated true value for the covariate.
Let X denote a fallible variable (e.g. a pretest score), r an
estimate of the reliability of X, and $T_i$ the true value of the variable $X_i$.
Let $\hat{T}_i$ denote the estimated true score of $X_i$. The definition of $\hat{T}_i$ for
each individual is
(1) $\hat{T}_i = \bar{X} + r(X_i - \bar{X})$,
or $\hat{T}_i = rX_i + \bar{X}(1 - r)$.
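Porter (1967) derived the mean and variance of $\hat{T}_i$ as follows. By
definition, the mean of all $\hat{T}_i$'s is:
(2) $\bar{\hat{T}} = \frac{\sum \hat{T}_i}{N} = \frac{\sum [rX_i + \bar{X}(1 - r)]}{N}$
by substitution. Thus
$\bar{\hat{T}} = \frac{r\sum X_i + N\bar{X} - Nr\bar{X}}{N} = \bar{X}$,
where N denotes the sample size and it is understood that i is summed
over the values 1, 2, ..., N.
Also, by definition, the variance of $\hat{T}$ is
(3) $S^2_{\hat{T}} = \frac{\sum (\hat{T}_i - \bar{X})^2}{N - 1} = r^2 S^2_X$,
and
(4) $S_{\hat{T}Y} = \frac{\sum (\hat{T}_i - \bar{X})(Y_i - \bar{Y})}{N - 1} = r S_{XY}$,
where Y is the dependent variable.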
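The three results above (the mean of the estimated true scores equals the mean of X, their variance is r squared times the variance of X, and their covariance with Y is r times the covariance of X and Y) can be verified numerically. The following is a minimal sketch with made-up scores and an arbitrary reliability estimate; the function names are illustrative.

```python
from statistics import mean, variance

def estimated_true_scores(x, r):
    """Equation (1): regress each observed score toward the sample mean."""
    x_bar = mean(x)
    return [x_bar + r * (xi - x_bar) for xi in x]

def sample_cov(a, b):
    """Sample covariance with divisor N - 1, as in equation (4)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

# Hypothetical pretest (x) and posttest (y) scores; r is an assumed reliability.
x = [88.0, 97.0, 104.0, 111.0, 120.0]
y = [90.0, 101.0, 100.0, 115.0, 119.0]
r = 0.7
t_hat = estimated_true_scores(x, r)

mean_matches = abs(mean(t_hat) - mean(x)) < 1e-9                     # equation (2)
var_matches = abs(variance(t_hat) - r ** 2 * variance(x)) < 1e-9      # equation (3)
cov_matches = abs(sample_cov(t_hat, y) - r * sample_cov(x, y)) < 1e-9  # equation (4)
```

All three comparisons hold for any data set and any value of r, since the estimated true scores are a linear function of the observed scores.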
Based on these results, Porter (1967) derived an analysis of
covariance procedure replacing the fallible covariate, $X_i$, with the estimated
true score, $\hat{T}_i$. The analysis of covariance requires the computation of
analysis of variance sums of squares for the dependent variable, the
covariable, and on the cross-products of the dependent variable and the
covariable. Porter (1967) showed that the use of estimated true scores for
the covariable did not affect the analysis of variance of the dependent
variable, Y. The changes found by Porter (1967) in the other two cases
are as follows:
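For the analysis of variance of the estimated true scores, $\hat{T}$,
(5) $SS_{W_{\hat{T}}} = \sum\sum (\hat{T}_{ij} - \bar{\hat{T}}_{.j})^2$,
where $SS_{W_{\hat{T}}}$ denotes the within groups sum of squares,
(6) $SS_{B_{\hat{T}}} = n\sum (\bar{\hat{T}}_{.j} - \bar{\hat{T}}_{..})^2 = n\sum (\bar{X}_{.j} - \bar{X}_{..})^2$,
where $SS_{B_{\hat{T}}}$ denotes the between groups sum of squares,
(7) $SS_{T_{\hat{T}}} = \sum\sum (\hat{T}_{ij} - \bar{\hat{T}}_{..})^2 = r^2 \sum\sum X_{ij}^2 + (1 - r^2)\frac{\sum_j (\sum_i X_{ij})^2}{n} - \frac{(\sum\sum X_{ij})^2}{N}$,
where $SS_{T_{\hat{T}}}$ denotes the total sum of squares, n denotes the number of
(T,Y) pairs per treatment group, and N denotes the total number of (T,Y)
pairs. In a similar manner, the cross-products sums of squares are:
(8) $SS_{W_{\hat{T}Y}} = r\sum\sum (X_{ij} - \bar{X}_{.j})(Y_{ij} - \bar{Y}_{.j})$,
(9) $SS_{B_{\hat{T}Y}} = n\sum (\bar{X}_{.j} - \bar{X}_{..})(\bar{Y}_{.j} - \bar{Y}_{..})$,
(10) $SS_{T_{\hat{T}Y}} = r\sum\sum X_{ij}Y_{ij} + (1 - r)\frac{\sum_j (\sum_i X_{ij})(\sum_i Y_{ij})}{n} - \frac{(\sum\sum X_{ij})(\sum\sum Y_{ij})}{N}$.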
Thus, the adjusted sums of squares are
(11) $SS'_W = SS_{W_Y} - \frac{(r\,SS_{W_{XY}})^2}{r^2\,SS_{W_X}} = SS_{W_Y} - \frac{(SS_{W_{XY}})^2}{SS_{W_X}}$,
(12) $SS'_T = SS_{T_Y} - \frac{(r\,SS_{W_{XY}} + SS_{B_{XY}})^2}{r^2\,SS_{W_X} + SS_{B_X}}$,
(13) $SS'_B = SS'_T - SS'_W$.
Note that the adjusted within groups sum of squares remains
unchanged by the substitution of $\hat{T}$ for X but the substitution does alter
the adjusted total sum of squares and consequently, the adjusted between
groups sum of squares.
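The adjusted sums of squares in (11) through (13) can be computed directly from the within and between groups sums of squares and cross-products of the observed scores. The sketch below is an illustration only: the function names are hypothetical, the reliability estimate r is supplied by the caller, and no F ratio is formed because the degrees of freedom are not restated in this section.

```python
def within_between(groups_a, groups_b):
    """Within and between groups sums of cross-products for two variables.
    Passing the same variable twice gives ordinary sums of squares."""
    all_a = [v for g in groups_a for v in g]
    all_b = [v for g in groups_b for v in g]
    grand_a = sum(all_a) / len(all_a)
    grand_b = sum(all_b) / len(all_b)
    within = 0.0
    between = 0.0
    for ga, gb in zip(groups_a, groups_b):
        ma = sum(ga) / len(ga)
        mb = sum(gb) / len(gb)
        within += sum((a - ma) * (b - mb) for a, b in zip(ga, gb))
        between += len(ga) * (ma - grand_a) * (mb - grand_b)
    return within, between

def porter_adjusted_ss(x_groups, y_groups, r):
    """Adjusted within, total, and between sums of squares for Y."""
    wx, bx = within_between(x_groups, x_groups)
    wy, by = within_between(y_groups, y_groups)
    wxy, bxy = within_between(x_groups, y_groups)
    ss_w = wy - (r * wxy) ** 2 / (r ** 2 * wx)                    # equation (11)
    ss_t = (wy + by) - (r * wxy + bxy) ** 2 / (r ** 2 * wx + bx)  # equation (12)
    ss_b = ss_t - ss_w                                            # equation (13)
    return ss_w, ss_t, ss_b

# Hypothetical pretest (x) and posttest (y) scores for two groups of four
x_groups = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 7.0]]
y_groups = [[2.0, 3.0, 5.0, 4.0], [4.0, 6.0, 5.0, 9.0]]
standard = porter_adjusted_ss(x_groups, y_groups, 1.0)   # r = 1 gives the standard analysis
adjusted = porter_adjusted_ss(x_groups, y_groups, 0.6)
```

Setting r to 1 reproduces the standard analysis of covariance, while any other value leaves the adjusted within groups sum of squares untouched and changes only the total and between groups terms, as noted in the text.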
A question arises concerning what value to use as the estimate of
reliability in the formulae. A solution to this problem was proposed
by Campbell and Erlebacher (1970).
In a pretest-posttest situation one may find it reasonable
to make two assumptions that would generate appropriate common
factor coefficients. First, if one has only the pretest-posttest
correlations, one may assume that the correlation in the
experimental group was unaffected by the treatment. (We need a survey
of experience in true experiments to check on this.) Second, one
may assume that the common-factor coefficient is the same for
both pretest and posttest. Under these assumptions, the
pretest-posttest correlation coefficient itself becomes the relevant
common-factor coefficient for the pretest or covariate, the
"reliability" to be used in Lord's and Porter's formulas. (p. 200)
This recommendation will be followed in this study; thus the
correlation between the pretest scores and posttest scores will be used as
the reliability estimate in the formulae.
Summary
As noted, analysis of covariance and analysis of covariance with
Porter's adjustment have been recommended by several authors. Campbell
and Erlebacher (1970) computer generated data for two overlapping
groups with no true treatment effect and concluded that the analysis
of covariance method was inappropriate and that analysis of covariance
with Porter's adjustment should only be undertaken with great
tentativeness. Porter (1967) computer generated data to compare the F
sampling distribution with the theoretical F distribution using his
adjustment. He concluded that samples of 20 or larger were needed to
have a useful approximation to the theoretical F distribution. He also
found that the estimation degenerated further when the reliability was
less than .7. No study was found which compared both techniques under
similar conditions for both gain and no-gain groups.
CHAPTER III
PROCEDURES
The analysis of covariance and the analysis of covariance with
Porter's adjustment were compared under forty-eight sets of conditions
on the basis of computer generated data. Six combinations of reliability,
two sample sizes, two levels of gain, and two different sets of pretest
means were used. Both equal and unequal reliabilities were used in
the comparisons. Two thousand random samples were generated
under each combination of conditions and analyzed by analysis of
covariance with and without Porter's adjustment.
The sampling distributions of the two analyses were statistically
compared with a sign test for each of the forty-eight sets of conditions.
The computer generated alpha values and powers were then used as the
criterion variables in four factorial experiments. Subsequent a
posteriori analyses were performed where warranted.
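The sign test comparison can be sketched as follows. The example assumes the test is applied to the number of samples, out of those generated, in which the F statistic from one method exceeds the F statistic from the other, with ties discarded beforehand; the function name and the counts in the example are hypothetical.

```python
from math import comb

def sign_test_p_value(n_positive, n_trials):
    """Exact two-sided sign test p-value under the null hypothesis
    that either sign is equally likely (probability 0.5)."""
    k = min(n_positive, n_trials - n_positive)
    # Lower-tail binomial count, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n_trials, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n_trials)

# Hypothetical outcome: one F statistic exceeds the other in 1100 of 2000 samples
p = sign_test_p_value(1100, 2000)
reject_at_01 = p < 0.01
```

With two thousand samples even a modest imbalance in the counts leads to rejection, so the test is very sensitive to small differences between the two sampling distributions.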
The Model
A standard model was used to represent the pretest and posttest
scores of one subject. The model follows the traditional measurement
approach as found in Gulliksen (1950) or Lord and Novick (1968) and
extended to gain score theory by O'Connor (1972).
(14) $X = T + E_1$
and
(15) $Y = T + G + E_2$
where
X = observed pretest score,
Y = observed posttest score,
T = true pretest score,
G = true gain,
$E_1$ = random measurement error in pretest score,
$E_2$ = random measurement error in posttest score.
The following properties of $E_1$ and $E_2$ are assumed:
The errors $E_1$ and $E_2$
i) have zero means in the group tested,
ii) have the same variances for both groups,
iii) are independent of each other and of the true parts of
each test.
It is further assumed that T and G are independent (across subjects) and
that all components follow a normal probability distribution. These
assumptions parallel Lord's (1956) stated and implied assumptions.
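A minimal sketch of the model in (14) and (15) follows. It draws independent normal components and adds them to form observed pretest and posttest scores. The parameter values are hypothetical, and Python's built-in normal generator is used here for brevity in place of the generator used in the study.

```python
import random

def simulate_scores(n, mean_t, var_t, mean_g, var_g, var_e1, var_e2, seed=0):
    """Return n (pretest, posttest) pairs with X = T + E1 and Y = T + G + E2."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        t = rng.gauss(mean_t, var_t ** 0.5)    # true pretest score
        g = rng.gauss(mean_g, var_g ** 0.5)    # true gain, independent of T
        e1 = rng.gauss(0.0, var_e1 ** 0.5)     # pretest error, zero mean
        e2 = rng.gauss(0.0, var_e2 ** 0.5)     # posttest error, zero mean
        pairs.append((t + e1, t + g + e2))
    return pairs

# Hypothetical no-gain group: true score variance 50, gain variance 2
sample = simulate_scores(100, 50.0, 50.0, 0.0, 2.0, 50.0, 52.0, seed=3)
```

Because T is the only component shared by X and Y, the covariance between the simulated pretest and posttest scores approaches the true score variance as the sample grows.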
Selecting the Reliabilities
Reliability is related by definition to the variances of the observed
scores, the true scores, and error. This section shows how that
relationship can be used to establish desired reliabilities. The basic
definition of reliability (Helmstadter, 1964, p. 62) is given by
equation (16), where the symbol $\rho_{XX}$ denotes reliability.
(16) $\rho_{XX} = \dfrac{\sigma_X^2 - \sigma_E^2}{\sigma_X^2}$
Then,
(17) $\sigma_E^2 = \sigma_X^2 (1 - \rho_{XX})$
Choosing the variance of X a priori to be 100, the variance of $E_1$ can be
found in the following manner as the reliability of X assumes different
values:
(18) $\sigma_{E_1}^2 = 100 (1 - \rho_{XX})$
The independence of T and $E_1$ in (14) implies
(19) $\sigma_X^2 = \sigma_T^2 + \sigma_{E_1}^2$
Combining (18) and (19) and solving for $\sigma_T^2$ yields
(20) $\sigma_T^2 = 100 \rho_{XX}$
The posttest variances can be selected in a similar manner. Based
on the assumptions,
(21) $\sigma_Y^2 = \sigma_T^2 + \sigma_G^2 + \sigma_{E_2}^2$
The variance of the gain scores is chosen in the manner prescribed in
Chapter I. Then, combining (16) and (21),
(22) $\sigma_{E_2}^2 = \dfrac{\sigma_T^2 + \sigma_G^2}{\rho_{YY}} - \sigma_T^2 - \sigma_G^2$
where $\rho_{YY}$ denotes the established reliability of the posttest scores.
Thus it can be seen that the effect of selecting specified reliabilities
can be obtained by selecting the variances of $E_1$, $E_2$, and T in
accordance with (18), (22), and (20) respectively.
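The selection of variances in equations (18), (20), and (22) can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary keys are hypothetical, and the ratio of gain variance to true-score variance (.04) is an assumption read from the appendix listing rather than from this chapter.

```python
def variance_components(rel_pre, rel_post, var_x=100.0, gain_ratio=0.04):
    """Variances of T, E1, G and E2 implied by equations (18), (20), (22).

    gain_ratio is an assumption taken from the appendix program
    (gain variance = .04 times the true-score variance).
    """
    var_e1 = var_x * (1.0 - rel_pre)                      # equation (18)
    var_t = var_x * rel_pre                               # equation (20)
    var_g = gain_ratio * var_t                            # assumed gain variance
    var_e2 = (var_t + var_g) / rel_post - var_t - var_g   # equation (22)
    return {"T": var_t, "E1": var_e1, "G": var_g, "E2": var_e2}


# Equal pretest and posttest reliabilities of .90
components = variance_components(0.90, 0.90)
for name, value in components.items():
    print(name, round(value, 3))
```

With both reliabilities at .90, the true-score and pretest-error variances add back to the pretest variance of 100, as equation (19) requires.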
The Regression of Y on X
Recall that one of the assumptions necessary for analysis of
covariance (Cochran, 1957) is that the regression slopes of the dependent
variable on the covariate must be equal for each treatment group. Let
$\beta_{Y.X}$ denote the regression slope for one treatment group.
Then,
(23) $\beta_{Y.X} = \rho_{XY} \dfrac{\sigma_Y}{\sigma_X}$ (Ferguson, 1971, p. 113),
implies
(24) $\beta_{Y.X} = \dfrac{\sigma_{XY}}{\sigma_T^2 + \sigma_{E_1}^2}$
The covariance between X and Y is equal to the variance of T since it
has been assumed that all the components of the pretest and posttest are
independent except T with itself. Therefore
(25) $\beta_{Y.X} = \dfrac{\sigma_T^2}{\sigma_T^2 + \sigma_{E_1}^2}$,
and
(26) $\beta_{Y.X} = \rho_{XX}$
by equation (16). Thus, the slope, Y on X, of any group is equal to the
reliability of its pretest. If the reliabilities of the pretests were
the same for both groups, the assumptions concerning slopes would be
satisfied.
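Equation (26) can be checked numerically. The sketch below is an illustration and not part of the original study: it draws scores from the model in (14) and (15) with the gain term omitted for simplicity, and estimates the least-squares slope of Y on X. The function name, sample size, and seed are assumptions.

```python
import random


def simulated_slope(rel, n=50000, var_x=100.0, mean_t=100.0, seed=1):
    """Least-squares slope of Y on X for one simulated group."""
    rng = random.Random(seed)
    sd_t = (var_x * rel) ** 0.5            # true-score s.d., equation (20)
    sd_e = (var_x * (1.0 - rel)) ** 0.5    # error s.d., equation (18)
    xs, ys = [], []
    for _ in range(n):
        t = rng.gauss(mean_t, sd_t)
        xs.append(t + rng.gauss(0.0, sd_e))   # X = T + E1
        ys.append(t + rng.gauss(0.0, sd_e))   # Y = T + E2 (gain omitted)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


for rel in (0.9, 0.7, 0.5):
    print(rel, round(simulated_slope(rel), 3))
```

Each estimated slope should fall close to the reliability used to generate the data, up to sampling error.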
From equation (16), it can be seen that the reliability of a test
is dependent to a certain extent upon the variability of the sample to
which it is given. In equation (16), $\sigma_X^2$ is in both the numerator and
denominator of the fraction, with $\sigma_E^2$ being subtracted from the numerator.
As $\sigma_X^2$ increases, both the numerator and denominator increase by the same
amount, assuming $\sigma_E^2$ is not changed; thus, $\rho_{XX}$ increases. Two groups of
equal ability would likely produce similar variances, hence similar
reliabilities. However, the technique is often used for comparing
compensatory programs with a comparison group sampled from the general
population of untreated children in the same community, such as with the
Westinghouse/Ohio University study (Campbell and Erlebacher, 1970). These
comparison children tend to be higher in ability than the treatment group.
The resulting differences in variation tend to produce different
reliabilities. In order to simulate such a situation, this study used
different reliabilities for the gain and the no-gain groups in addition to
the comparisons when the reliabilities are equal. The reliabilities for
the gain group are decreased by fifteen percent, which roughly
approximates the reduced variation empirically observed in Project Follow
Through achievement data.
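The dependence of reliability on sample variability can be illustrated with equation (16). In this minimal sketch the error variance is held at 10 while the true-score variance grows; the specific variances are hypothetical values chosen only for illustration.

```python
def reliability(var_true, var_error):
    """Equation (16): (var_X - var_E) / var_X, with var_X = var_T + var_E."""
    var_x = var_true + var_error
    return (var_x - var_error) / var_x


# Hold the error variance fixed and let the true-score variance grow.
for var_true in (30.0, 60.0, 90.0):
    print(var_true, round(reliability(var_true, 10.0), 3))
```

A group with the smaller spread of true scores therefore has the lower reliability, even though the test and its error variance are unchanged.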
Selecting Means
The true score mean was set a priori at 100 when both the gain and
no-gain groups have equal means. The situation in which the gain group
has a lower mean was also analyzed. In this case, the true score mean
of the gain group was set at 80. These values have been empirically
chosen based on Project Follow Through data. Based on the assumptions,
(27) $E(X) = E(T + E_1) = \mu_T$,
hence the mean of the observed pretest scores is equal to the mean of
the true scores. Also
(28) $E(Y) = E(T + G + E_2) = \mu_T + \mu_G$.
Thus, the mean of the observed posttest scores is equal to the sum of
the means of the true scores and the gain scores.
Clearly in the case of the no-gain group and in both groups where
no gain was used, the mean gain was zero. The selected value of $\mu_G$ for
the gain situation was based on power considerations; that is, $\mu_G$ was
chosen such that the power of an F test for analysis of covariance was
.50.
A linear model representation of the analysis of covariance for two
groups is
(29) $Y = \beta_0 + \beta_1 X + \beta_2 W + E$
where X is the pretest score (covariate), Y is the posttest score, and
W is a dummy variable designating group membership (W = 1 if gain group,
0 if no-gain group). Testing the hypothesis that $\beta_2$ is equal to 0 in
equation (29) is equivalent to the F test for treatments in the analysis
of covariance procedure. It can be shown that $\beta_2$ in equation (29) is
equivalent to the mean gain, $\mu_G$. The mean of the posttest for the gain
group is
(30) $E(Y_G) = \beta_0 + \beta_1 \mu_{XG} + \beta_2$
where $Y_G$ is a posttest score for the gain group and $\mu_{XG}$ is the mean
pretest score for the gain group. Likewise, the mean of the posttest for
the no-gain group is
(31) $E(Y_{NG}) = \beta_0 + \beta_1 \mu_{XNG}$
where $Y_{NG}$ is a posttest score for the no-gain group and $\mu_{XNG}$ is the
mean pretest score for the no-gain group. But it has been assumed that
$\mu_{XG}$ and $\mu_{XNG}$ are equal. Furthermore, the gain group has mean gain
$\mu_G$ and the no-gain group has mean gain zero; thus
(32) $\mu_G = E(Y_G) - E(Y_{NG}) = \beta_2$
Hence, choosing the value of $\beta_2$ that yields a power of .50 is equivalent
to choosing a value of $\mu_G$ to produce a power of .50 in the analysis of
covariance procedure.
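For a concrete view of the test described by equation (29), the following sketch computes the two-group analysis of covariance F statistic and the adjusted mean difference (the least-squares estimate of $\beta_2$) from sums of squares and cross products. It is a textbook computation written for illustration, not a transcription of the appendix program; the function names and the small data set are hypothetical.

```python
def _sums(xs, ys):
    """Means and corrected sums of squares and cross products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxx, sxy, syy


def ancova_two_groups(x_gain, y_gain, x_nogain, y_nogain):
    """Return (F statistic, adjusted mean difference) for two groups."""
    mxg, myg, sxx_g, sxy_g, syy_g = _sums(x_gain, y_gain)
    mxn, myn, sxx_n, sxy_n, syy_n = _sums(x_nogain, y_nogain)
    # Pooled within-group quantities and the common slope
    sxx_w = sxx_g + sxx_n
    sxy_w = sxy_g + sxy_n
    syy_w = syy_g + syy_n
    slope_w = sxy_w / sxx_w
    sse_within = syy_w - sxy_w ** 2 / sxx_w
    # Total quantities, ignoring group membership
    _, _, sxx_t, sxy_t, syy_t = _sums(x_gain + x_nogain, y_gain + y_nogain)
    sse_total = syy_t - sxy_t ** 2 / sxx_t
    df_error = len(x_gain) + len(x_nogain) - 3   # N - groups - one covariate
    f_stat = (sse_total - sse_within) / (sse_within / df_error)
    adjusted_diff = (myg - myn) - slope_w * (mxg - mxn)
    return f_stat, adjusted_diff


f_stat, diff = ancova_two_groups(
    [1.0, 2.0, 3.0, 4.0], [3.0, 4.1, 4.9, 6.0],
    [1.0, 2.0, 3.0, 4.0], [1.0, 2.1, 2.9, 4.0],
)
print(round(f_stat, 1), round(diff, 3))
```

With 10 subjects per group the error degrees of freedom are 20 - 3 = 17, which matches the degrees of freedom used for the t value in equation (38).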
This power can be obtained from the following probability statement:
(33) $\Pr[t^* > t_{.025}] = .50$
where $t^*$ is a noncentral t statistic. This expression can be
approximated by the substitution of a z statistic for $t^*$:
(34) $\Pr[\hat{\beta}_2 / \sigma_{\hat{\beta}_2} > t_{.025}] = .50$.
Thus a value of $\beta_2$ can be chosen such that the mean of the substituted z
statistic, $\beta_2 / \sigma_{\hat{\beta}_2}$, is equal to $t_{.025}$ with the appropriate degrees of freedom.
This value of $\beta_2$ is the value of the average gain, $\mu_G$, such that the
power of analysis of covariance is .50 under the condition of perfect
reliability. In order to find this value of $\mu_G$, a numerical expression
for $\sigma^2_{\hat{\beta}_2}$ is needed.
The variance of $\hat{\beta}_2$, $\sigma^2_{\hat{\beta}_2}$, can be approximated in the following
manner. Karmel and Polasek (1970, p. 245) state that $\sigma^2_{\hat{\beta}_2}$ is
(35) $\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_Y^2 \sum (X - \bar{X})^2}{[\sum (X - \bar{X})^2][\sum (W - \bar{W})^2] - [\sum (X - \bar{X})(W - \bar{W})]^2}$
Dividing both the numerator and denominator by $N^2$ and substituting
population variances for sample variances, $\sigma^2_{\hat{\beta}_2}$ is approximately equal
to the following:
(36) $\sigma^2_{\hat{\beta}_2} \approx \dfrac{\sigma_Y^2 \sigma_X^2}{N[\sigma_X^2 \sigma_W^2 - (\sigma_{XW})^2]}$
The quantity $\sigma_{XW}$, the covariance between X and W, has been assumed equal
to zero; thus, in equation (36), the $\sigma_X^2$'s divide out. Hence
(37) $\sigma^2_{\hat{\beta}_2} = \dfrac{\sigma_Y^2}{N \sigma_W^2}$
Under the conditions assumed for the model, the variance of Y, $\sigma_Y^2$,
is equal to 4. The variance of W, $\sigma_W^2$, can be computed to be .25. Thus,
by substitution, $\sigma^2_{\hat{\beta}_2}$ equals .80 when N is equal to 20 and equals .08
when N is equal to 200.
Hence, for N = 20, $\mu_G$ can be found by the following expression:
(38) $\mu_{G20} = t_{.025,17}\,\sigma_{\hat{\beta}_2} = 2.11\sqrt{.80} = 1.88$.
Likewise, for N = 200, $\mu_G$ can be found by solving equation (39).
(39) $\mu_{G200} = t_{.025,197}\,\sigma_{\hat{\beta}_2} = 1.97\sqrt{.08} = .56$.
The approximated values of $\mu_G$ were tested using Monte Carlo generated
variables and found to indeed produce a power of .50.
The approximated values of $\mu_G$ under each set of conditions are
shown in Tables 1 and 2.
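The arithmetic behind equations (37) through (39) can be reproduced as follows. This is a sketch under the stated values only: the critical t values (2.11 and 1.97) are copied from the text rather than computed, and the function name is hypothetical.

```python
import math


def gain_for_half_power(n_total, t_crit, var_y=4.0, var_w=0.25):
    """Mean gain giving power of about .50, following (37)-(39)."""
    var_b2 = var_y / (n_total * var_w)    # equation (37)
    return t_crit * math.sqrt(var_b2)     # equations (38) and (39)


print(round(gain_for_half_power(20, 2.11), 3))    # N = 20, 17 df
print(round(gain_for_half_power(200, 1.97), 3))   # N = 200, 197 df
```

The results agree with the reported values of 1.88 and .56 to within rounding in the second decimal place.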
TABLE 1
CONDITIONS UNDER WHICH COMPARISONS WERE MADE WHEN
THE RELIABILITIES FOR BOTH GROUPS WERE EQUAL
              GROUP    GAIN GROUP   PRETEST      PRETEST
              SAMPLE   MEAN         MEAN FOR     MEAN FOR
RELIABILITY   SIZE     GAIN         GAIN GROUP   NO-GAIN GROUP
.90           10       0            100          100
.90           10       0            80           100
.90           100      0            100          100
.90           100      0            80           100
.70           10       0            100          100
.70           10       0            80           100
.70           100      0            100          100
.70           100      0            80           100
.50           10       0            100          100
.50           10       0            80           100
.50           100      0            100          100
.50           100      0            80           100
.90           10       1.88         100          100
.90           10       1.88         80           100
.90           100      .56          100          100
.90           100      .56          80           100
.70           10       1.88         100          100
.70           10       1.88         80           100
.70           100      .56          100          100
.70           100      .56          80           100
.50           10       1.88         100          100
.50           10       1.88         80           100
.50           100      .56          100          100
.50           100      .56          80           100
TABLE 2
CONDITIONS UNDER WHICH COMPARISONS WERE MADE WHEN
THE RELIABILITIES FOR BOTH GROUPS WERE UNEQUAL
RELIABILITY   RELIABILITY   GROUP    GAIN GROUP   PRETEST      PRETEST
FOR GAIN      FOR NO-GAIN   SAMPLE   MEAN         MEAN FOR     MEAN FOR
GROUP         GROUP         SIZE     GAIN         GAIN GROUP   NO-GAIN GROUP
.76           .90           10       0            100          100
.76           .90           10       0            80           100
.76           .90           100      0            100          100
.76           .90           100      0            80           100
.60           .70           10       0            100          100
.60           .70           10       0            80           100
.60           .70           100      0            100          100
.60           .70           100      0            80           100
.42           .50           10       0            100          100
.42           .50           10       0            80           100
.42           .50           100      0            100          100
.42           .50           100      0            80           100
.76           .90           10       1.88         100          100
.76           .90           10       1.88         80           100
.76           .90           100      .56          100          100
.76           .90           100      .56          80           100
.60           .70           10       1.88         100          100
.60           .70           10       1.88         80           100
.60           .70           100      .56          100          100
.60           .70           100      .56          80           100
.42           .50           10       1.88         100          100
.42           .50           10       1.88         80           100
.42           .50           100      .56          100          100
.42           .50           100      .56          80           100
Generation of Random Normal Deviates with Specified Means and Variances
The study required the use of computer-generated normally distributed
random variables with specified means and variances. Two thousand
sets of variables were generated for each of the forty-eight sets of
conditions. Muller (1959) identified and compared six methods of
generating normal deviates on the computer. A method described by Box and
Muller (1958) was judged most attractive from a mathematical standpoint.
According to Muller (1959, p. 379), "Mathematically this approach has the
attractive advantage that the transformation for going from uniform deviates
to normal deviates is exact." This method was endorsed by Marsaglia and
Bray (1964). They modified the algorithm to reduce central processing
computer time without altering its accuracy.
The method first requires the generation of two independent uniform
random variables, $U_1$ and $U_2$, over the interval (-1, 1). For pairs
satisfying $U_1^2 + U_2^2 < 1$, the variables
$Z_1 = U_1 [-2 \ln(U_1^2 + U_2^2) / (U_1^2 + U_2^2)]^{1/2}$
and
$Z_2 = U_2 [-2 \ln(U_1^2 + U_2^2) / (U_1^2 + U_2^2)]^{1/2}$
will be two independent random variables from the same normal
distribution with mean zero and unit variance. The variables were then
transformed to have the desired means and variances.
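The generation step can be sketched in Python as follows. This is an illustration of the polar form of the transformation, not a translation of the RANGEN subroutine called in the appendix; the function names, the seed handling, and the use of Python's built-in uniform generator are assumptions.

```python
import math
import random


def polar_normal_pair(rng):
    """Return two independent standard normal deviates."""
    while True:
        u1 = rng.uniform(-1.0, 1.0)
        u2 = rng.uniform(-1.0, 1.0)
        s = u1 * u1 + u2 * u2
        if 0.0 < s < 1.0:                  # keep only points inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u1 * factor, u2 * factor


def normal_sample(n, mean, variance, seed=0):
    """Generate n deviates rescaled to the requested mean and variance."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    out = []
    while len(out) < n:
        z1, z2 = polar_normal_pair(rng)
        out.extend((mean + sd * z1, mean + sd * z2))
    return out[:n]


sample = normal_sample(20000, mean=100.0, variance=90.0, seed=3)
sample_mean = sum(sample) / len(sample)
sample_var = sum((v - sample_mean) ** 2 for v in sample) / (len(sample) - 1)
print(round(sample_mean, 2), round(sample_var, 2))
```

The final rescaling line corresponds to the last sentence above: a standard deviate Z becomes mean + sqrt(variance) * Z.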
Analysis for Comparing the Selected Methods
Each of the two thousand sets of generated data for the forty-eight
sets of conditions was analyzed by analysis of covariance and analysis
of covariance with Porter's adjustment. The number of times the F
statistic from analysis of covariance exceeded the F statistic from
analysis of covariance with Porter's adjustment was noted along with
the proportion of rejections by each method of analysis.
A sign test (Siegel, 1956, pp. 63-67) was then performed for each
set of conditions to test the null hypotheses in Chapter I, that is,
"the sampling distributions of the test statistics from analysis of
covariance with and without Porter's adjustment will be the same for
each set of conditions." These tests were run at the .01 level of
significance.
The fraction of rejections noted (alphas and powers) was then used
as the dependent variable in four factorial experiments to gain further
insight into what factors affected each method of analysis being studied.
Each factorial experiment had three factors. These were reliability,
which included the six combinations of reliability used in the study; sample
size, which included n = 10 and n = 100 subjects per group; and the
equality of pretest means, which included a level where both pretest means
were equal and a level where they differed. The layout of the factorial
experiments is illustrated in Table 3. The factorial experiment
illustrated is the situation for which there was no gain in either group and
the standard analysis of covariance was used. Thus, the dependent variables
are the alpha values generated by the computer. When significant main
effects or interactions occurred, the appropriate a posteriori analytical
procedures to locate the sources of the variation were followed.
TABLE 3
(Layout of the factorial experiment for the Monte Carlo generated alphas
from the standard analysis of covariance when there was no gain in
either group. The body of the table is not legible in this copy.)
CHAPTER IV
RESULTS
The number of times the F statistic from the standard analysis of
covariance exceeded the F statistic from analysis of covariance with
Porter's adjustment is listed for each set of conditions in Tables 4,
5, 6, and 7. These quantities were used as test statistics for the sign
tests used to test the hypothesis, "There is no difference between the
sampling distributions of the test statistics from analysis of covariance
with and without Porter's adjustment." This hypothesis was tested for
each of the forty-eight sets of conditions at the .01 level of
significance.
Siegel (1956) stated that a large sample test statistic for the
sign test is
(40) $z = \dfrac{x - .5N}{.5\sqrt{N}}$
where N is equal to the number of pairs of observations, x is equal to
the number of times the first measurement of the pair exceeds the second
measurement of the pair, and z is the standard normal variate. For a
level of .01, the null hypothesis would be rejected when z was less than
-2.33 or greater than 2.33. This is equivalent to rejecting the null
hypothesis when x was less than 949 or greater than 1051 and N equals
2000. Thus, an inspection of the last column of Tables 4, 5, 6, and 7
reveals that the null hypothesis was rejected for each of the
forty-eight sets of conditions.
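Equation (40) and the rejection rule can be written out as a short sketch. The function names are hypothetical, and the critical value of 2.33 is the one quoted in the text; cutoffs computed this way may differ by a count or two from the rounded limits given above.

```python
import math


def sign_test_z(x, n_pairs):
    """Large-sample sign test statistic, equation (40)."""
    return (x - 0.5 * n_pairs) / (0.5 * math.sqrt(n_pairs))


def rejects_null(x, n_pairs, z_crit=2.33):
    """True when |z| exceeds the critical value quoted in the text."""
    return abs(sign_test_z(x, n_pairs)) > z_crit


for x in (900, 1000, 1100):
    print(x, round(sign_test_z(x, 2000), 2), rejects_null(x, 2000))
```

With N = 2000 the denominator is about 22.4, so a count of 1000 gives z = 0 and counts far from 1000 in either direction lead to rejection.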
The analysis of variance summary tables are presented in Tables 8,
9, 10, and 11. The analysis of variance summary table for the case when
the Monte Carlo generated alpha values from the standard analysis of
covariance were used as the criterion variables is presented in Table 8.
The analysis of variance summary table for the case when the Monte Carlo
generated alpha values from the analysis of covariance with Porter's
adjustment were used as the criterion variables is presented in Table 9.
The analysis of variance summary tables for the cases when Monte Carlo
generated powers were used as the criterion variables for standard
analysis of covariance and analysis of covariance with Porter's
adjustment are presented in Tables 10 and 11, respectively. Each analysis of
variance table includes analyses of the simple effects where they are
warranted. Scheffe's S method for testing linear contrasts is included
in Table 8. These linear contrasts are defined in Table 12. Each F
statistic which exceeds the critical value at the .05 level is denoted
by an asterisk.
A result of particular interest is applicable to analysis of
covariance both with and without Porter's adjustment. When there was a
mean gain of zero in both groups and the pretest means differed, a
spuriously high fraction of significant F statistics occurred, and the
adjusted posttest means indicated that the gain was in
favor of the group having the larger pretest mean. This result was
more pronounced for lower reliabilities.
When the gain group had a positive gain, the no-gain group had a
mean gain of zero, and the no-gain group also had a larger pretest mean,
a similar situation occurred. In this situation, when a significant F
statistic occurred, the adjusted posttest means usually indicated the
no-gain group had recorded the larger gain. Again, these results
became more pronounced as the reliability of the scores was reduced.
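The direction of this effect can be anticipated from equations (26) and (28). The sketch below is a large-sample approximation added for illustration, not a result taken from the study's tables: it assumes equal reliabilities in the two groups and a common slope equal to the pretest reliability, and the function name is hypothetical.

```python
def expected_adjusted_difference(mean_gain, rel, pre_gain, pre_nogain):
    """Approximate expected adjusted posttest difference (gain minus no-gain)."""
    # E(Y_G) - E(Y_NG) from equation (28): the gain plus the true-score gap
    raw_difference = mean_gain + (pre_gain - pre_nogain)
    # The covariance adjustment removes slope * pretest gap, slope = rel by (26)
    return raw_difference - rel * (pre_gain - pre_nogain)


for rel in (0.9, 0.7, 0.5):
    no_gain_case = expected_adjusted_difference(0.0, rel, 80.0, 100.0)
    gain_case = expected_adjusted_difference(1.88, rel, 80.0, 100.0)
    print(rel, round(no_gain_case, 2), round(gain_case, 2))
```

Under these assumptions the adjusted difference is negative whenever the gain group starts 20 points lower, and it grows more negative as reliability falls, which is the pattern described above. With equal pretest means the adjusted difference reduces to the mean gain itself.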
TABLES 4, 5, 6, AND 7
(For each set of conditions, the number of times the F statistic from the
standard analysis of covariance exceeded the F statistic from analysis of
covariance with Porter's adjustment. The bodies of these tables are not
legible in this copy.)
TABLE 8
ANOVA SUMMARY TABLE USING MONTE CARLO GENERATED ALPHAS FROM THE
STANDARD ANALYSIS OF COVARIANCE AS THE CRITERION VARIABLES
                 SUM OF     DEGREES OF   MEAN
SOURCE           SQUARES    FREEDOM      SQUARE     F
Pretest Mean     1.59960    1            1.59960    (544.08)
PM at R1         .07049     1            .07049     23.98*
PM at R2         .33582     1            .33582     114.22*
PM at R3         .40513     1            .40513     137.80*
PM at R4         .16687     1            .16687     56.76*
PM at R5         .33466     1            .33466     113.83*
PM at R6         .42510     1            .42510     144.59*
PM at SS10       .11525     1            .11525     39.20*
PM at SS100      2.10000    1            2.10000    714.29*
Reliability      .10996     5            .02199     (7.48)
R at PM1         .00008     5            .00001     <1.00
R at PM2         .22278     5            .04455     15.15*
Contrast 1       .00018     1            .00018     <1.00
Contrast 2       .10306     1            .10306     35.05*
Contrast 3       .73933     1            .73933     251.47*
Contrast 4       .04743     1            .04743     16.13*
Sample Size      .59977     1            .59977     (204.00)
SS at PM1        .00005     1            .00005     <1.00
SS at PM2        1.21540    1            1.21540    413.40*
PM x R           .11291     5            .02258     7.68*
PM x SS          .61568     1            .61568     209.41*
R x SS           .01470     5            .00294     1.00
Residual         .01468     5            .00294
Total            3.06730    23
*Sample statistic greater than critical value at .05 level.
TABLE 9
ANOVA SUMMARY TABLE USING MONTE CARLO GENERATED ALPHAS
FROM ANALYSIS OF COVARIANCE WITH PORTER'S ADJUSTMENT
AS THE CRITERION VARIABLES
                 SUM OF     DEGREES OF   MEAN
SOURCE           SQUARES    FREEDOM      SQUARE     F
Pretest Mean     .45844     1            .45844     (25.08)
PM at SS10       .00832     1            .00832     <1.00
PM at SS100      .75050     1            .75050     41.06*
Reliability      .17552     5            .03510     1.92
Sample Size      .33018     1            .33018     (18.06)
SS at PM1        .00035     1            .00035     <1.00
SS at PM2        .63021     1            .63021     34.48*
PM x R           .15727     5            .03145     1.72
PM x SS          .30038     1            .30038     16.43*
R x SS           .10189     5            .02038     1.11
Residual         .09139     5            .01828
Total            1.61507    23
*Sample statistic greater than critical value at .05 level.
TABLE 10
ANOVA SUMMARY TABLE USING MONTE CARLO GENERATED POWERS
FROM THE STANDARD ANALYSIS OF COVARIANCE
AS THE CRITERION VARIABLES
                 SUM OF     DEGREES OF   MEAN
SOURCE           SQUARES    FREEDOM      SQUARE     F
Pretest Mean     .97768     1            .97768     (85.54)
PM at SS10       .01620     1            .01620     1.42
PM at SS100      1.61334    1            1.61334    141.15*
Reliability      .12431     5            .02486     2.17
Sample Size      .65076     1            .65076     (56.93)
SS at PM1        .00001     1            .00001     <1.00
SS at PM2        1.30482    1            1.30482    114.16*
PM x R           .19910     5            .03982     3.48
PM x SS          .65406     1            .65406     57.22*
R x SS           .05665     5            .01133     <1.00
Residual         .05714     5            .01143
Total            2.71970    23
*Sample statistic greater than critical value at .05 level.
TABLE 11
ANOVA SUMMARY TABLE USING MONTE CARLO GENERATED POWERS
FROM ANALYSIS OF COVARIANCE WITH PORTER'S ADJUSTMENT
AS THE CRITERION VARIABLES
                 SUM OF     DEGREES OF   MEAN
SOURCE           SQUARES    FREEDOM      SQUARE     F
Pretest Mean     .15714     1            .15714     (6.48)
PM at SS10       .00157     1            .00157     <1.00
PM at SS100      .36018     1            .36018     14.85*
Reliability      .14179     5            .02836     1.17
Sample Size      .23285     1            .23285     (9.60)
SS at PM1        .00046     1            .00046     <1.00
SS at PM2        .43701     1            .43701     18.01*
PM x R           .16990     5            .03398     1.40
PM x SS          .20461     1            .20461     8.43*
R x SS           .13367     5            .02673     1.10
Residual         .12132     5            .02426
Total            1.16127    23
*Sample statistic greater than critical value at .05 level.
TABLE 12
COEFFICIENTS FOR THE LINEAR CONTRASTS USED WHEN
RELIABILITY WAS SIGNIFICANT IN THE ANALYSIS OF VARIANCE
LEVELS OF RELIABILITY
RELIABILITY OF GAIN GROUP      .90  .70  .50  .76  .60  .42
RELIABILITY OF NO-GAIN GROUP   .90  .70  .50  .90  .70  .50
1 1 1 1 1 1
1 3 3 3 3
1 1 0 1 1 0
2 2 7
CONTRAST
1 0 1 1 0 1
0 1 1 0 1 1
24 2 2 2 
CHAPTER V
DISCUSSION
In general, the results of this study support the positions of
Lord (1967, 1969), Campbell and Erlebacher (1970), and O'Connor (1972)
with respect to their warnings about the implications of analysis of
covariance using unreliable test scores. The study shows that even when
there is no gain in either group, a much higher fraction of rejections
occurs than would be expected. The fraction of rejections is even more
extreme when the reliability is .70 or lower and the pretest means
differ. In addition, Porter's adjustment seems to offer little
improvement.
Comparison of the Two Methods of Analysis
The rejection of all forty-eight null hypotheses concerning the
equality of the sampling distributions for the two methods of analysis
shows that there is a difference in the results obtained from analysis
of covariance and analysis of covariance with Porter's adjustment. A
closer examination shows that these differences are more extreme when
the pretest means of the two groups differ and the reliabilities are
low. Further study shows that although the two methods of analysis are
different, neither method does an adequate job of modeling reality; that
is, both methods tend to produce erroneous proportions of type I and II
errors when pretest means differ and the reliabilities are low. When the
pretest means are equal, the power of the tests seems to be positively
related to the reliabilities when other variables are
held constant.
Possibly the most far-reaching results were a function of reliability.
The data indicated that incorrect decisions about which group had the
larger gain could be made using either method of analysis when the
pretest means of the two groups differed and the reliabilities were low.
When the pretest mean of the gain group was less than that of the no-gain
group, the adjusted posttest means showed that the no-gain group was
superior both when the mean gain was zero in both groups and when the
mean gain was positive only in the gain group. This possibility was
pointed out by Lord (1967) and Campbell and Erlebacher (1970).
Factors that Affect Alpha and Beta
The factorial experiments using the computer-generated alphas and
betas allow one to infer which factors affect the levels of type I and
type II errors. Using Monte Carlo generated alphas from the standard
analysis of covariance in a factorial experiment indicated that an
interaction between pretest means and reliability and an interaction between
sample size and pretest means affected the alphas significantly. Further
analyses (simple effects) showed that the pretest mean factor was
significant at every level of reliability. Both reliability and sample size
were significant when the pretest means differed. The tests of linear
contrasts showed that there was no significant difference between equal
reliabilities and unequal reliabilities when the pretest means differed,
but there were significant differences among the levels of reliability
when the pretest means differed. These results indicate that a difference
in reliabilities between groups, and thus a difference in slopes, has no
effect; the level of reliability, however, does.
The other three factorial experiments indicated that interactions
between pretest means and sample size were the major contributors to the
differing levels of alpha and power. In all three experiments (see Tables
9, 10, and 11) the sample size was significant when the pretest means
differed. Reliability seems to have a somewhat moderate effect in these
cases.
Predicting Alpha and Power
Using the results of this study, a regression equation can be set
up to predict the experimental probability of a type I error or the
experimental probability of rejecting a false null hypothesis when the
established probabilities are .05 and .50 respectively. The basic
regression equation is
(41) $Y = \beta_0 + \beta_1 R_G + \beta_2 R_N + \beta_3 S + \beta_4 M + \beta_5 R_G R_N + \beta_6 R_G S$
$\quad + \beta_7 R_G M + \beta_8 R_N S + \beta_9 R_N M + \beta_{10} S M + E$
where
Y = the predicted alpha or power,
$R_G$ = the reliability of the gain group scores,
$R_N$ = the reliability of the no-gain group scores,
S = sample size,
M = 1 if the pretest means are equal,
0 otherwise, and
$\beta_i$ = the regression coefficient, i.
Tables 13 and 14 provide confidence intervals for the expected
values of alpha and power respectively for each set of conditions noted.
TABLE 13
NINETY-FIVE PERCENT CONFIDENCE INTERVALS FOR THE EXPECTED
VALUES OF ALPHAS UNDER SPECIFIED CONDITIONS
(Column headings: reliability of gain group scores, reliability of
no-gain group scores, group sample size, equality of pretest means*,
lower confidence limit of alpha, upper confidence limit of alpha.
Only the two limit columns are legible in this copy; rows are in the
original order.)
LOWER CONFIDENCE   UPPER CONFIDENCE
LIMIT OF ALPHA     LIMIT OF ALPHA
.000               .137
.000               .137
.000               .100
.556               .741
.017               .162
.108               .325
.000               .141
.780               .945
.000               .125
.297               .451
.000               .120
.932               1.000
.000               .136
.043               .223
.036               .143
.691               .871
.003               .136
.228               .361
.014               .146
.879               1.000
.000               .097
.307               .472
.000               .117
.967               1.000
*If 1, the pretest means are equal and if 0, the pretest means
are not equal.
TABLE 14
NINETY-FIVE PERCENT CONFIDENCE INTERVALS FOR THE EXPECTED
VALUES OF POWER UNDER SPECIFIED CONDITIONS
(Column headings: reliability of gain group scores, reliability of
no-gain group scores, group sample size, equality of pretest means*,
lower confidence limit of power, upper confidence limit of power.
Only the two limit columns are legible in this copy; rows are in the
original order, and one upper limit is missing.)
LOWER CONFIDENCE   UPPER CONFIDENCE
LIMIT OF POWER     LIMIT OF POWER
.000               .300
.000               .115
.000               .191
.351               .666
.019               .266
.048               .295
.000               .236
.677               .924
.000               .176
.155               .417
.000               .224
.863               1.000
.000               .265
.000               .199
.000               .235
.523               .829
.000               .210
.097               .323
.010               .236
.782               1.000
.000               .130
.156               .438
.000               (missing)
.909               1.000
*If 1, the pretest means are equal and if 0, the pretest means
are not equal.
The predicted value of alpha or power can be found by using the
data found in Tables 4, 5, 6, and 7 to fit regression equation (41) to
obtain the following estimated regression parameters:
(42) alpha = .228 + .395 $R_G$ + .332 $R_N$ + .008 S - .661 M
- 1.09 $R_G R_N$ - .004 $R_G$ S + .619 $R_G$ M
+ .003 $R_N$ S + .195 $R_N$ M - .007 S M
and
(43) power = .045 + .536 $R_G$ + .404 $R_N$ + .010 S - .700 M
- 1.228 $R_G R_N$ - .006 $R_G$ S + .846 $R_G$ M
+ .002 $R_N$ S + .218 $R_N$ M - .007 S M.
A confidence interval for the predicted values of alpha and power
could be obtained in the usual manner.
A Direction for Further Research
The results of this study, combined with other research (Lord, 1967;
Campbell and Erlebacher, 1970; O'Connor, 1972), indicate a need for
further study and the
development of a robust method for comparing groups of differing ability
when the scores are not perfectly reliable. Any new technique which is
proposed should first be investigated under known conditions, either
analytically or by Monte Carlo techniques. This would ensure that
another inappropriate method is not used for the evaluation of
compensatory education projects.
CHAPTER VI
SUMMARY
This study was designed to determine if either analysis of
covariance or analysis of covariance with Porter's adjustment is an
appropriate analytical procedure for evaluating educational pretest-posttest
experiments. In particular, these methods were compared with respect to
their use in the analysis of compensatory education projects where the
groups may differ in ability.
The study was carried out by computer generating 2000 sets of
normal data under forty-eight sets of predetermined conditions of
reliability, sample size, gain, and equality of pretest means. Each of the
2000 sets of data for each set of conditions was then analyzed using both
standard analysis of covariance and analysis of covariance with Porter's
adjustment. A sign test was used to compare the two methods of analysis
under each of the forty-eight sets of conditions. It was concluded that
the two methods of analysis yielded different results.
Factors affecting the two methods of analysis were then studied
separately using the computer-generated alphas and powers as criterion
variables in four factorial experiments. The factors included reliability
at six levels, sample size at two levels, and the equality of pretest
means at two levels. From these experiments, it was concluded that
pretest means interacting with sample size and sometimes with reliability
were significant factors. More specifically, sample size was statistically
significant in each case where the pretest means differed. Also, pretest
means were significant at every level of reliability for the
computer-generated alphas produced by the standard analysis of covariance.
It was also learned that when the pretest means differ, both
standard analysis of covariance and analysis of covariance with Porter's
adjustment produced erroneous results with respect to which group, if
either, had a gain. When both groups had a mean gain of zero and the
pretest means differed, significant results usually indicated that the
group with the larger pretest mean had the gain. This would correspond
to the control group sampled from the general population being credited
with the gain in a compensatory education experiment. When there was a
gain in only one group and the pretest mean was lower in that group, the
analyses still indicated that the other group had the gain.
The results of this study point to the recommendation that analysis
of covariance with or without Porter's adjustment should be approached
with caution when the reliabilities are below .90 and the pretest means
(covariate means) are likely to be different for the groups.
APPENDIX
C FORTRAN PROGRAM WHICH PERFORMED DATA GENERATION AND ANALYSIS
C
C
      DIMENSION E1GN(100),E2GN(100),GAINGN(100),XX(200),YY(200),
     *X(100),E1NG(100),E2NG(100),GAINNG(100),TG(100),TN(100)
      REAL MSB,MSE,MSBP,MSEP
      NGROUP=0
C
C INPUT OF GROUP PARAMETERS WHERE
C RELGN  REPRESENTS THE RELIABILITY OF THE GAIN GROUP DATA
C RELNG  REPRESENTS THE RELIABILITY OF THE NO GAIN GROUP
C NPERG  REPRESENTS THE NUMBER OF OBSERVATIONS PER GROUP
C GBAR   REPRESENTS THE MEAN GAIN FOR GAIN GROUP
C PRMNG  REPRESENTS THE GAIN GROUP PRETEST MEAN
C PRMNNG REPRESENTS THE NO GAIN GROUP PRETEST MEAN
C ISEED  REPRESENTS THE SEED FOR RANDOM NUMBER GENERATOR
C
      READ (5,99) ISEED,NGP,NSPL
   99 FORMAT (I ,2X,2I5)
  100 READ (5,101) RELGN,RELNG,NPERG,GBAR,PRMNG,PRMNNG
  101 FORMAT (2F3.2,I3,F3.2,2F3.0)
      NGROUP=NGROUP+1
      NSAMP=0
      NF10=0
      NF100=0
      NFP10=0
      NFP100=0
C
C HEADER CARD FOR NEW SET OF PARAMETERS
C
      WRITE (6,102)
  102 FORMAT ('1',T43,'F STATISTICS BASED ON THE FOLLOWING PARAME'
     *,'TERS')
      WRITE (6,103)
  103 FORMAT ('0',T22,'RELGN ',2X,'RELNG ',2X,'NPERG ',2X,' GBAR'
     *,' ',2X,'PRMNG ',2X,'PRMNNG')
      WRITE (6,104) RELGN,RELNG,NPERG,GBAR,PRMNG,PRMNNG
  104 FORMAT (1X,T22,2F8.5,I5,3X,3F8.3)
      WRITE (6,105)
  105 FORMAT ('0',T45,'SAMPLE NUMBER',5X,'STANDARD F',5X,'PORTER'
     *,'S F')
C
C     COMPUTE VARIANCE COMPONENTS
C
      VARE1G=100.0*(1.0-RELGN)
      VARE1N=100.0*(1.0-RELNG)
      VARTG=100.0*RELGN
      VARTN=100.0*RELNG
      VARGNG=.04*VARTG
      VARGNN=.04*VARTN
      VARE2G=(VARTG+VARGNG)/RELGN-VARTG-VARGNG
      VARE2N=(VARTN+VARGNN)/RELNG-VARTN-VARGNN
C
  106 CONTINUE
      NSAMP=NSAMP+1
C
C     GENERATION OF DATA FOR GAIN GROUP
C
C
C     TRUE SCORES
C
      CALL RANGEN(NPERG,ISEED,X)
      DO 300 I=1,NPERG
      TG(I)=X(I)*SQRT(VARTG)+PRMNG
  300 CONTINUE
C
C     PRETEST ERROR SCORES
C
      CALL RANGEN(NPERG,ISEED,X)
      DO 301 I=1,NPERG
      E1GN(I)=X(I)*SQRT(VARE1G)
  301 CONTINUE
C
C     POSTTEST ERROR SCORES
C
      CALL RANGEN(NPERG,ISEED,X)
      DO 302 I=1,NPERG
      E2GN(I)=X(I)*SQRT(VARE2G)
  302 CONTINUE
C
C     GAIN SCORES
C
      CALL RANGEN(NPERG,ISEED,X)
      DO 303 I=1,NPERG
      GAINGN(I)=X(I)*SQRT(VARGNG)+GBAR
  303 CONTINUE
C
C     PRETEST AND POSTTEST SCORES
C
      DO 304 I=1,NPERG
      XX(I)=TG(I)+E1GN(I)
      YY(I)=TG(I)+GAINGN(I)+E2GN(I)
  304 CONTINUE
C
C
C     GENERATION OF DATA FOR NO GAIN GROUP
C
C     TRUE SCORES
C
      CALL RANGEN(NPERG,ISEED,X)
      DO 400 I=1,NPERG
      TN(I)=X(I)*SQRT(VARTN)+PRMNNG
  400 CONTINUE
C
C     PRETEST ERROR SCORES
C
      CALL RANGEN(NPERG,ISEED,X)
      DO 401 I=1,NPERG
      E1NG(I)=X(I)*SQRT(VARE1N)
  401 CONTINUE
C
C     POSTTEST ERROR SCORES
C
      CALL RANGEN(NPERG,ISEED,X)
      DO 402 I=1,NPERG
      E2NG(I)=X(I)*SQRT(VARE2N)
  402 CONTINUE
C
C     GAIN SCORES
C
      CALL RANGEN(NPERG,ISEED,X)
      DO 403 I=1,NPERG
      GAINNG(I)=X(I)*SQRT(VARGNN)
  403 CONTINUE
C
C     PRETEST AND POSTTEST SCORES
C
      DO 404 I=1,NPERG
      L=NPERG+I
      XX(L)=TN(I)+E1NG(I)
      YY(L)=TN(I)+GAINNG(I)+E2NG(I)
  404 CONTINUE
C
C
C     STANDARD ANALYSIS OF COVARIANCE COMPUTATIONS
C
      N=2*NPERG
C
C     INITIALIZATION
C
      SUMXG=0.0
      SUMX2G=0.0
      SUMYG=0.0
      SUMY2G=0.0
      SUMXYG=0.0
      SUMXN=0.0
      SUMX2N=0.0
      SUMYN=0.0
      SUMY2N=0.0
      SUMXYN=0.0
C
C     GROUP SUMS AND SUMS OF SQUARES
C
      DO 600 I=1,NPERG
C
      SUMXG=SUMXG+XX(I)
      SUMX2G=SUMX2G+XX(I)*XX(I)
      SUMYG=SUMYG+YY(I)
      SUMY2G=SUMY2G+YY(I)*YY(I)
      SUMXYG=SUMXYG+XX(I)*YY(I)
      K=NPERG+I
      SUMXN=SUMXN+XX(K)
      SUMX2N=SUMX2N+XX(K)*XX(K)
      SUMYN=SUMYN+YY(K)
      SUMY2N=SUMY2N+YY(K)*YY(K)
      SUMXYN=SUMXYN+XX(K)*YY(K)
  600 CONTINUE
C
C     TOTAL SUMS AND SUMS OF SQUARES
C
      TSUMX=SUMXG+SUMXN
      TSUMX2=SUMX2G+SUMX2N
      TSUMY=SUMYG+SUMYN
      TSUMY2=SUMY2G+SUMY2N
      TSUMXY=SUMXYG+SUMXYN
C
C     COMPUTE TOTAL SUMS OF SQUARES
C
      CFX=TSUMX*TSUMX/N
      CFY=TSUMY*TSUMY/N
      CFXY=TSUMX*TSUMY/N
      TXX=TSUMX2-CFX
      TYY=TSUMY2-CFY
      TXY=TSUMXY-CFXY
C
C     COMPUTE BETWEEN GROUPS SUMS OF SQUARES
C
      BXX=(SUMXG*SUMXG+SUMXN*SUMXN)/NPERG-CFX
      BYY=(SUMYG*SUMYG+SUMYN*SUMYN)/NPERG-CFY
      BXY=(SUMXG*SUMYG+SUMXN*SUMYN)/NPERG-CFXY
C
C     COMPUTE ERROR SUMS OF SQUARES
C
      EXX=TXX-BXX
      EYY=TYY-BYY
      EXY=TXY-BXY
C
C     COMPUTE ADJUSTED SUMS OF SQUARES
C
      TYYADJ=TYY-TXY*TXY/TXX
      EYYADJ=EYY-EXY*EXY/EXX
      BYYADJ=BYY-(TXY*TXY)/TXX+(EXY*EXY)/EXX
C     COMPUTE ADJUSTED MEAN SQUARES
C
      MSB=BYYADJ/1.0
      MSE=EYYADJ/(N-3)
C
C     COMPUTE F STATISTIC
C
      F=MSB/MSE
C
C
C     ANALYSIS OF COVARIANCE WITH PORTER'S ADJUSTMENT
C
C     COMPUTE CORRELATION BETWEEN X AND Y
C
      DENOM=TXX*TYY
      RXY=TXY/SQRT(DENOM)
C
C     COMPUTE SUMS OF SQUARES WITH PORTER'S ADJUSTMENTS
C
      EPORT=EYYADJ
      TPORT=TYY-((RXY*EXY+BXY)**2)/((RXY*RXY*EXX)+BXX)
      BPORT=TPORT-EPORT
C
C     COMPUTE MEAN SQUARES
C
      MSBP=BPORT/1.0
      MSEP=EPORT/(N-3)
C
C     COMPUTE F STATISTIC WITH PORTER'S ADJUSTMENT
C
      FPORT=MSBP/MSEP
C
      IF (F.GT.4.45) NF10=NF10+1
      IF (F.GT.3.89) NF100=NF100+1
      IF (FPORT.GT.4.45) NFP10=NFP10+1
      IF (FPORT.GT.3.89) NFP100=NFP100+1
C     WRITE OUT RESULTS
C
      WRITE (6,800) NSAMP,F,FPORT
  800 FORMAT (1X,T50,I4,T64,F8.3,T77,F8.3)
C
      IF (NSAMP.LT.NSPL) GO TO 106
C
      WRITE (6,801)
  801 FORMAT ('0',T10,'NF10',T20,'NF100',T30,'NFP10',T40,'NFP100'
     *)
      WRITE (6,802) NF10,NF100,NFP10,NFP100
  802 FORMAT (1X,T10,4(I4,6X))
      IF (NGROUP.LT.NGP) GO TO 100
      STOP
      END
C
C
      SUBROUTINE RANGEN(M,IR,X)
      DIMENSION X(M)
      I=1
    1 CALL RANDU(IR,JR,R1)
      IR=JR
      CALL RANDU(IR,JR,R2)
      IR=JR
      R1=2.0*(R1-.5)
      R2=2.0*(R2-.5)
      S2=R1*R1+R2*R2
      IF(S2.GT.1.0) GO TO 1
      Y=SQRT(-2.0*(ALOG(S2)/S2))
      X(I)=R1*Y
      IF(I.EQ.M) GO TO 2
      X(I+1)=R2*Y
      IF(I+1.EQ.M) GO TO 2
      I=I+2
      GO TO 1
    2 RETURN
      END
C
      SUBROUTINE RANDU(IX,IY,YFL)
      IY=IX*65539
      IF(IY)5,6,6
    5 IY=IY+2147483647+1
    6 YFL=IY
      YFL=YFL*.4656613E-9
      RETURN
      END
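The RANGEN subroutine above appears to follow the polar method described by Marsaglia and Bray (1964), listed in the bibliography: pairs of uniform deviates rescaled to the interval from -1 to 1 are rejected unless they fall inside the unit circle, and each accepted pair is scaled to yield two normal deviates. A minimal Python sketch of the same idea follows. It is not a translation of the program listed here; the name `polar_normals` is hypothetical, and Python's `random.Random` is assumed as the uniform source in place of RANDU.

```python
import math
import random


def polar_normals(m, rng):
    """Return m standard normal deviates generated by the polar method."""
    out = []
    while len(out) < m:
        # Two uniform deviates rescaled to the interval (-1, 1)
        r1 = 2.0 * rng.random() - 1.0
        r2 = 2.0 * rng.random() - 1.0
        s2 = r1 * r1 + r2 * r2
        # Reject points outside the unit circle (and the origin,
        # where the logarithm is undefined)
        if s2 >= 1.0 or s2 == 0.0:
            continue
        y = math.sqrt(-2.0 * math.log(s2) / s2)
        out.append(r1 * y)
        # The second deviate of the pair is kept only if still needed
        if len(out) < m:
            out.append(r2 * y)
    return out
```

A call such as `polar_normals(100, random.Random(1))` returns 100 deviates, which can then be scaled by a standard deviation and shifted by a mean in the way the main program treats the output of RANGEN.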
BIBLIOGRAPHY
Box, G. E. P. and Muller, Mervin E. "A Note on the Generation of Random
Normal Deviates," Annals of Mathematical Statistics, 29 (June,
1958), 610-611.
Campbell, Donald T. and Erlebacher, Albert. "How Regression Artifacts
in Quasi-Experimental Evaluations can Mistakenly Make Compensatory
Education Look Harmful," in Disadvantaged Child, Vol. 3,
Ed. Hellmuth, Jerome. New York: Brunner/Mazel, Inc., 1970.
Campbell, Donald T. and Stanley, Julian C. Experimental and
Quasi-Experimental Designs for Research. Chicago: Rand McNally and
Company, 1963.
Cochran, William G. "Analysis of Covariance: Its Nature and Uses,"
Biometrics, 13 (September, 1957), 261-281.
Cronbach, Lee J. Essentials of Psychological Testing, Third Edition.
New York: Harper and Row, 1970.
Cronbach, Lee J. and Furby, Lita. "How We Should Measure 'Change' or
Should We?," Psychological Bulletin, 74 (January, 1970), 68-80.
DeCecco, John P. The Psychology of Learning and Instruction:
Educational Psychology. Englewood Cliffs, N. J.: Prentice-Hall, Inc.,
1968.
Ferguson, George A. Statistical Analysis in Psychology and Education,
Third Edition. New York: McGraw-Hill Book Company, 1971.
Fisher, Ronald A. Statistical Methods for Research Workers, Fourth
Edition. London: Oliver and Boyd Ltd., 1932.
Fisher, Ronald A. Statistical Methods for Research Workers, Tenth
Edition. London: Oliver and Boyd Ltd., 1946.
Fisher, Ronald A. The Design of Experiments. London: Oliver and Boyd
Ltd., 1935.
Garside, R. F. "The Regression of Gains Upon Initial Scores,"
Psychometrika, 21 (March, 1956), 67-77.
Glass, Gene V., Peckham, Percy D., and Sanders, James R. "Consequences
of Failure to Meet Assumptions Underlying the Fixed Effects
Analysis of Variance and Covariance," Review of Educational
Research, 42 (Summer, 1972), 237-288.
Gulliksen, Harold. Theory of Mental Tests. New York: John Wiley and
Sons, Inc., 1950.
Helmstadter, G. C. Principles of Psychological Measurement. New York:
Appleton-Century-Crofts, 1964.
Hicks, Charles R. "The Analysis of Covariance," Industrial Quality
Control, 20 (December, 1965), 282-287.
Hilgard, Ernest R. Theories of Learning. New York:
Appleton-Century-Crofts, Inc., 1956.
Hill, Winfred F. Learning: A Survey of Psychological Interpretations,
Revised Edition. Scranton: Chandler Publishing Company, 1971.
Karmel, P. H. and Polasek, M. Applied Statistics for Economists, Third
Edition. Bath, Great Britain: Pitman Publishing, 1970.
Kirk, Roger E. Experimental Design: Procedures for the Behavioral
Sciences. Belmont, California: Wadsworth Publishing Company,
Inc., 1968.
Lord, Frederic M. "A Paradox in the Interpretation of Group Comparisons,"
Psychological Bulletin, 68 (1967), 304-305.
Lord, Frederic M. "Elementary Models for Measuring Change," in Problems
in Measuring Change. Ed. Chester W. Harris. Madison, Wisconsin:
The University of Wisconsin Press, 1963.
Lord, Frederic M. "Further Problems in the Measurement of Growth,"
Educational and Psychological Measurement, XVIII (1958), 437-451.
Lord, Frederic M. "Large-Sample Covariance Analysis When the Control
Variable is Fallible," Journal of the American Statistical
Association, 55 (1960), 307-321.
Lord, Frederic M. "Statistical Adjustments When Comparing Preexisting
Groups," Psychological Bulletin, 72 (1969), 336-337.
Lord, Frederic M. "Statistical Inferences about True Scores,"
Psychometrika, 24 (March, 1959), 1-17.
Lord, Frederic M. "The Measurement of Growth," Educational and
Psychological Measurement, XVI (1956), 421-437.
Lord, Frederic M. and Novick, Melvin R. Statistical Theories of Mental
Test Scores. Reading, Massachusetts: Addison-Wesley Publishing
Company, 1968.
Manning, Winton H. and DuBois, Philip H. "Correlational Methods in
Research on Human Learning," Perceptual and Motor Skills,
15 (1962), 287-321.
Marks, Edmond and Martin, Charles G. "Further Comments Relating to the
Measurement of Change," American Educational Research Journal,
10 (Summer, 1973), 179-191.
Marsaglia, G. and Bray, T. A. "A Convenient Method for Generating
Normal Variables," SIAM Review, 6 (July, 1964), 260-264.
McNemar, Quinn. "On Growth Measurement," Educational and Psychological
Measurement, XVIII (1958), 47-55.
Muller, Mervin E. "A Comparison of Methods for Generating Normal
Deviates on Digital Computers," Association for Computing
Machinery Journal, 6 (1959), 376-383.
Neel, John Howard. A Comparative Analysis of Some Measures of Change.
Dissertation, University of Florida (1970).
O'Connor, Edward F., Jr. "Extending Classical Test Theory to the
Measurement of Change," Review of Educational Research, 42
(Winter, 1972), 73-97.
Peckham, Percy D. "The Robustness of the Analysis of Covariance to
Heterogeneous Regression Slopes," Paper read at the Annual
Meeting of the American Educational Research Association,
Minneapolis, Minnesota, (March 5, 1970).
Porter, Andrew Colvin. The Effects of Using Fallible Variables in the
Analysis of Covariance. Dissertation, University of Wisconsin,
University Microfilms, Ann Arbor, Michigan: (1967).
Siegel, Sidney. Nonparametric Statistics for the Behavioral Sciences.
New York: McGraw-Hill Book Company, 1956.
Snedecor, George W. and Cochran, William G. Statistical Methods, Sixth
Edition. Ames, Iowa: The Iowa State University Press, 1967.
Thorndike, R. L. "Regression Fallacies in the Matched Groups Experiment,"
Psychometrika, 7 (June, 1942), 85-102.
Werts, Charles E. and Linn, Robert L. "A General Linear Model for
Studying Growth," Psychological Bulletin, 73 (1970), 17-22.
Winer, B. J. Statistical Principles in Experimental Design, Second
Edition. New York: McGraw-Hill Book Company, 1971.
BIOGRAPHICAL SKETCH
James Edwin McLean was born January 29, 1945, at Greensboro, North
Carolina. He grew up in Orlando, Florida and graduated from Edgewater
High School in June, 1963. He graduated from Orlando Junior College in
January, 1966, and entered the United States Marine Corps Reserve.
In December, 1968, he received the degree, Bachelor of Science,
with a major in mathematics education from the University of Florida.
In January, 1969, he enrolled in the Department of Statistics at the
University and received the degree, Master of Statistics, in June, 1971.
During this period, he worked as a graduate assistant in that department
where he taught elementary statistics and probability.
He accepted a teaching assistantship in the College of Education
at the University of Florida in September, 1971. He currently holds
that position part-time along with the position of research associate
for a Project Follow Through evaluation grant.
James Edwin McLean is a member of the American Educational Research
Association, National Council on Measurement in Education, the American
Statistical Association, and Phi Delta Kappa.
He is married to the former Sharon Elizabeth Robb and they have
no children.
I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.
William B. Ware, Chairman
Associate Professor of Education
I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.
Vynce A. Hines
Professor of Education
I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.
William Mendenhall
Professor of Statistics
I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.
P. V. Rao
Professor of Statistics
I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.
James T. McClave
Assistant Professor of Statistics
This dissertation was submitted to the Dean of the College of Education
and to the Graduate Council, and was accepted as partial fulfillment of
the requirements for the degree of Doctor of Philosophy.
August, 1974
Dean, College of Education
Dean, Graduate School
