Citation
A study of the power of multivariate analysis of variance on standardized achievement testing

Material Information

Title:
A study of the power of multivariate analysis of variance on standardized achievement testing when estimators for omissions utilize mean value and regression approaches
Creator:
Sledjeski, Stephen Stanley, 1942-
Publication Date:
Copyright Date:
1976
Language:
English
Physical Description:
viii, 45 leaves : ; 28cm.

Subjects

Subjects / Keywords:
Achievement tests ( jstor )
Analytical estimating ( jstor )
Consistent estimators ( jstor )
Datasets ( jstor )
Educational research ( jstor )
Estimated cost to complete ( jstor )
Estimation methods ( jstor )
Estimators for the mean ( jstor )
Missing data ( jstor )
Statistical estimation ( jstor )
Dissertations, Academic -- Foundations of Education -- UF ( lcsh )
Estimation theory ( lcsh )
Foundations of Education thesis Ph. D ( lcsh )
Mathematical statistics ( lcsh )
Multivariate analysis ( lcsh )
City of Gainesville ( local )
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis--University of Florida.
Bibliography:
Bibliography: leaves 41-44.
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Stephen S. Sledjeski.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
025273895 ( AlephBibNum )
02759873 ( OCLC )
AAT0025 ( NOTIS )


Full Text









A STUDY OF THE POWER OF MULTIVARIATE ANALYSIS OF
VARIANCE ON STANDARDIZED ACHIEVEMENT TESTING
WHEN ESTIMATORS FOR OMISSIONS UTILIZE MEAN
VALUE AND REGRESSION APPROACHES










By

STEPHEN S. SLEDJESKI


A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY











UNIVERSITY OF FLORIDA

1976

















































ACKNOWLEDGEMENTS


My appreciation is extended to the members of my

doctoral committee for their contributions to the develop-

ment of this dissertation. They are: Drs. Vynce A. Hines

(Chairman), Ira J. Gordon, Zorin R. Pop-Stojanovic, and

Robert S. Soar.

To Dr. Hattie Bessent, no statement can express her

impact and assistance in attaining my educational goals.

Words can be neither sufficient nor appropriate to express

my esteem.

To Drs. Ann Bromley, Molly Harrower, and Wilson H.

Guertin, I present thanks for direction and assistance in

the understanding of my educational commitment.

To my sisters, Helen Brush and Ann Pendzick, and

their families, I can but state our fortuitous interaction

which has allowed not only educational growth but also

complete dispersion while retaining faith in one another's

existence.

To my mother, Helen Sledjeski, and my late father,

Stephen Sledjeski, I wish to express my deepest appreciation

for their successful development of a family unit filled

with motivation, sincerity, trust, and love. This work is

dedicated to their lives and memory.















TABLE OF CONTENTS


ACKNOWLEDGEMENTS

LIST OF TABLES

ABSTRACT

Chapter

I. INTRODUCTION

Nature of the Study
The Problem and the Hypotheses
Significance of the Study

II. REVIEW OF RELATED LITERATURE

Introduction
Historical Overview
Problems of Missing Multiresponse
Observations in Education
Direction of Present Research

III. DESIGN OF THE STUDY

Procedures
Method

IV. RESULTS

Comparison of the Mean Value and the
Regression Estimated Data Sets with
One Another and with the Complete
Data Set at the 2 Percent Level of
Missing Subsamples
Comparison of the Mean Value and the
Regression Estimated Data Sets with
One Another and with the Complete
Data Set at the 5 Percent Level of
Missing Subsamples
Comparison of the Mean Value and the
Regression Estimated Data Sets with
One Another and with the Complete
Data Set at the 10 Percent Level of
Missing Subsamples
Comparison of the Mean Value and the
Regression Estimated Data Sets with
One Another and with the Complete
Data Set at the 15 Percent Level of
Missing Subsamples
Comparison of the Mean Value and the
Regression Estimated Data Sets with
One Another and with the Complete
Data Set at the 20 Percent Level of
Missing Subsamples
Further Results
Summary

V. DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS

Discussion
Conclusions
Recommendations

REFERENCES

BIOGRAPHICAL SKETCH














LIST OF TABLES


Table

1 F-ratios and Complements (P) of the Cumulative
Distribution Function for Fourth- and Fifth-
Grade Samples Having Mean Value and Regression
Estimated Subsamples Consisting of 2
Percent of the Complete Samples

2 F-ratios and Complements (P) of the Cumulative
Distribution Function for Fourth- and Fifth-
Grade Samples Having Mean Value and Regression
Estimated Subsamples Consisting of 5
Percent of the Complete Samples

3 F-ratios and Complements (P) of the Cumulative
Distribution Function for Fourth- and Fifth-
Grade Samples Having Mean Value and Regression
Estimated Subsamples Consisting of 10
Percent of the Complete Samples

4 F-ratios and Complements (P) of the Cumulative
Distribution Function for Fourth- and Fifth-
Grade Samples Having Mean Value and Regression
Estimated Subsamples Consisting of 15
Percent of the Complete Samples

5 F-ratios and Complements (P) of the Cumulative
Distribution Function for Fourth- and Fifth-
Grade Samples Having Mean Value and Regression
Estimated Subsamples Consisting of 20
Percent of the Complete Samples










Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy

A STUDY OF THE POWER OF MULTIVARIATE ANALYSIS OF
VARIANCE ON STANDARDIZED ACHIEVEMENT TESTING
WHEN ESTIMATORS FOR OMISSIONS UTILIZE MEAN
VALUE AND REGRESSION APPROACHES


By

Stephen S. Sledjeski


March, 1976

Chairman: Dr. Vynce A. Hines
Major Department: Foundations of Education


The efficacy of utilizing estimators for omissions

in a multiresponse achievement data set which is analyzed

using multivariate analysis of variance (MANOVA) techniques

is the concern of this study. The estimates were determined

employing mean value and regression methods.

Random samples of fourth- and fifth-grade students

were administered the Stanford Achievement Test, Intermediate

Level I and Intermediate Level II, respectively, in the spring

of 1974. Each sample had an n of 193 consisting of two fixed

groups as the independent variables and the achievement sub-

scores as the dependent variables.

These two samples comprised the complete data sets

from which random subsamples of missing data were removed









from among the dependent variables. The missing subsample

consisted of 2, 5, 10, 15, and 20 percent of the complete

samples, each percent level being investigated five times

for each of the two methods of estimation.

The MANOVA results of the data sets with mean value

and regression estimates were compared to one another and

to the complete data set. The null hypotheses tested were:

There is no difference in MANOVA results for the
complete data set and the mean value estimated
data set with the size of the missing subsample
ranging from 2 to 20 percent of the complete
data set.

There is no difference in MANOVA results for the
complete data set and the regression estimated
data set with the size of the missing subsample
ranging from 2 to 20 percent of the complete
data set.

There is no difference in MANOVA results for the
mean value estimated data set and the regression
estimated data set both with the size of the
missing subsample ranging from 2 to 20 percent
of the complete data set.

The hypotheses were analyzed by comparing the comple-

ment of the cumulative distribution function derived from the

F-ratio of each MANOVA of the complete data set to that of

the estimated data sets. No significant differences were

found for the three hypotheses. Inspection of the results

demonstrated that the regression estimates provided MANOVA

results apparently closer to those of the complete data set

than did mean value estimates.

The research concluded that, within the confines of

this study, one cannot reject the use of mean value and









regression estimates for data sets with missing values which

are to be analyzed using MANOVA.















CHAPTER I

INTRODUCTION


With the increased emphasis on multivariate analysis,

the experimenter has been confronted with multiresponse data

where measurements on all responses are not available for

every experimental unit. Since the time, resources, and

money involved in gathering multiple observations on experi-

mental subjects are greater than for gathering single

observations, multivariate analysis of variance (MANOVA)

must give attention to missing data. It is the purpose of

this study to consider missing observations in MANOVA

utilizing mean value and regression estimators on a set of

achievement data with subsets of randomly chosen missing

data ranging in size from 2 to 20 percent of the complete

data set. The power of MANOVA results will then be

determined.


Nature of the Study

Missing data estimation has been of interest to

educational and statistical researchers for several decades.

Estimation of uniresponse data has been conducted for various

experimental designs. Baird and Kramer (1960) investigated

the balanced incomplete block design. They developed









formulas through minimization of the error sum of squares

for the special case where missing values are within the

same block or treatment. Their method facilitates calcu-

lations but does nothing to restore missing information.

Kramer and Glass (1960) examined the Latin square

design. In the same manner as Baird and Kramer, they

developed formulas through minimizing of the error sums of

squares for several missing values to restore the balance

of the design. The formulas are for the specific cases

described and not for the completely general case.

Preece (1972) studied the two-way classification

design. He developed a method of estimating block and

treatment parameters from the nonmissing data plus the

estimated data.

Mitra (1959) considered the effect of missing value

estimates on the F-test in analysis of variance (ANOVA).

He demonstrated that the numerator in F (the treatment mean

square) and the denominator (the error mean square) cannot

have the same expected value when missing observations exist.

An examination of various missing data procedures

was performed by Wilkinson (1960). He put forth a method

of solving for estimates through simultaneous equations and

compared it to an iterative least squares method and a

covariance method. His method is preferred since it

requires fewer steps and gives the correct residual sums of

squares directly.









Studies investigating multiresponse data estimators

have been less numerous. The works of Kleinbaum (1970),

Srivastava (1967), and Trawinski (1961) are some examples of

early endeavors in multiresponse data. Kleinbaum looked at

the effect of estimation upon hypothesis testing of general-

ized multivariate linear models. In concurrence with Mitra

who investigated the uniresponse situation, he demonstrated

that hypotheses are rejected with bias when utilizing

estimators for missing values.

Srivastava extended the Gauss-Markov theorem to

multivariate linear models.

Trawinski showed that it is not necessary to collect

data on each characteristic of interest for each experimental

unit. She brought out the important fact that in many situa-

tions one needs to have experiments where observations on

some of the responses are missing not by accident, but by

design.

The relevance and importance of missing observations

were demonstrated by Srivastava and McDonald (1969, 1971).

They established, under realistic conditions, the preference

for the hierarchical incomplete models within the groups of

general incomplete multiresponse models.

Dempster (1971) provided an overview of the problems

involved. He surveyed a cross section of the developing

topics in multivariate analysis of data concentrating on

problems of pragmatic data analysis and not on technical

and mathematical detail.








The Problem and the Hypotheses

The present investigation will attempt to determine

the efficacy of two types of estimates of missing data in

MANOVA. One type of estimate will be the mean value of the

variable for a particular treatment; the other, the regres-

sion of one of the MANOVA dependent variables on the remain-

ing dependent variables which then act as independent

variables. The results of these MANOVAs will be compared

to MANOVA results of nonmissing data. The hypotheses to

be investigated are:

H1: There is no difference in MANOVA results for the
complete data set and the mean value estimated
data set with the size of the missing subsample
ranging from 2 to 20 percent of the complete
data set.

H2: There is no difference in MANOVA results for the
complete data set and the regression estimated
data set with the size of the missing subsample
ranging from 2 to 20 percent of the complete
data set.

H3: There is no difference in MANOVA results for the
mean value estimated data set and the regression
estimated data set both with the size of the
missing subsample ranging from 2 to 20 percent
of the complete data set.

For each hypothesis, missing subsamples will be randomly

chosen which will comprise 2, 5, 10, 15, and 20 percent

of the original complete sample. Each subsample percent

level will be investigated five times. Estimated values

will then be substituted and be subjected to MANOVA.

F-values from the MANOVA results will be compared

using the cumulative distribution function to determine the

power of the analyses.









Data used in the analysis will consist of achievement

scores as determined on the Stanford Achievement Test col-

lected in the spring of 1974. Two samples will be investi-

gated: a fourth-grade sample of 193 students who were

administered the Intermediate I Battery (eight variables)

and a fifth-grade sample of 193 students who were adminis-

tered the Intermediate II Battery (seven variables). The

students in each sample were chosen at random from each of

two fixed groups, an experimental group and a control group.

For each MANOVA, the independent variables will be the two

fixed groups.


Significance of the Study

The two types of estimators to be investigated

differ from one another in an important sense. The mean

value estimator considers all nonmissing values of a par-

ticular dependent variable for a specific treatment whereas

the regression estimators consider only those experimental

units with complete data. One approach attempts to utilize

all possible data elements, and the other forms an esti-

mation based on even less information.

Combining the fact of the two approaches with that

of varying subsamples of missing data will provide a thorough

look at omissions in multiresponse data taken from an edu-

cational setting. It is hoped that insights will be

developed for future analysis of similar educational data.








This chapter has presented the problem to be investi-

gated and the nature, significance, and hypotheses of the

study. Chapter II contains a review of literature related

to the problem of the study. The design and procedures

are stated in Chapter III; the results of the study are in

Chapter IV; and the discussion, conclusions, and recommen-

dations are given in Chapter V.














CHAPTER II

REVIEW OF RELATED LITERATURE


Introduction

Missing data have posed a problem in data analysis

for more than four decades. The initial investigations

involving incomplete data sets concerned univariate statis-

tical analysis. With the developments in computational

technology in the past quarter century, multivariate data

analysis has become feasible (Dempster, 1971) as has the

investigation of missing data in multivariate analysis.

The initial focus of researchers concerned the

techniques involved in the estimation of parameters when

there existed missing observations in the data set. It was

a question of developing the parameters and then adjusting

these parameters considering the missing data. The direc-

tion taken in the review of the literature which follows is

first, the estimation of the missing observations and

second, the formulation of the parameters required for

analysis.


Historical Overview

The first researcher to develop analysis procedures

by first estimating values for the missing observations was











Wilks (1932). He examined the incomplete bivariate case of

a bivariate normal distribution using sample means for the

missing observations. He found that the optimum method of

determining the variance between the two variables was the

correlation between the two variables which included only

those pairs that were complete.

Wilks' example of a sample of statistical data from

a multivariate population has been popularized in many

related papers. Srivastava and Zaatar (1972) summarized

Wilks' example as:

[T]he situation when the experimental units are
skulls that have been dug out from a certain
graveyard. Since these skulls may be partly
mutilated, the choice as to which characteristics
should be measured on a particular unit is not
entirely in the hand of the investigator. (One
may suggest that in such a situation, we should
restrict ourselves to those skulls on which all
measurements of interest can be obtained. How-
ever, clearly this would in general not be very
proper unless there were a rather large number
of skulls free from any mutilation.) p. 117

Little more was published on incomplete multivariate

data sets until the 1950s when papers began to appear extend-

ing the work of Wilks. Matthai (1951) developed a method to

determine the correlation between two variates with missing

data using the total available data set. He formulated a

solution for the trivariate case using the correlation

estimates. His estimates, he concluded, were inconsistent.

For example, correlation coefficients could exceed unity.

Federspiel et al. (1959) and Glasser (1964)

generalized this situation. They investigated the









correlation matrix of a general number of variates based

on all available paired data. They studied intuitive

approaches for estimating linear regression coefficients

when an unspecified number and pattern of missing values

exist among the independent values. It is shown that the

efficacy of the approaches depends upon the correlations

among the independent variables as well as the proportion

of observations which are missing.

Lord (1955) demonstrated the solutions for the

trivariate case when the dependent variable is recorded

for all experimental units in the sample. Either of the

two independent variables is recorded for all experimental

units, but not both. He showed that, in this instance,

means and regression coefficients can be estimated

accurately.

The trivariate case was studied by Edgett (1956) in

the opposite sense of Lord. He gave attention to the in-

stance when the dependent variable has missing values and

the two independent variates were complete. Nicholson

(1957) extended Edgett's work to any number of independent

variables. Edgett and Nicholson demonstrated that a maxi-

mum likelihood function for a plausible probability

distribution could provide as good population parameter

estimates as could least squares estimates.

A mode of estimation different from Wilks' method

was provided by Dear (1959). He substituted for each









missing observation of an independent variate the division

of the sum of the value of all observed independent vari-

ables by the sum of the number of observations for all

observed independent variables. This somewhat corresponds

to the grand mean of all the independent variables. It

is clear that serious difficulties would be incurred when

the independent variables are measured on different scales.

Walsh (1959) and Buck (1960) considered omission

estimates in respect to paired simple linear regression.

Walsh studied the utilization of all data available for a

pair of variables in the simple linear regression computa-

tion. Those experimental units for which no data were

missing were looked at by Buck in the paired regression

analysis. Both Walsh and Buck determined that the average

of values obtained from the simple linear regression pro-

vided suitable estimates for missing responses.

Anderson (1957) investigated a particular pattern

of missing observations called a monotone sample. This is

a sample in which the observations on each variate are a subset

of another variate, i.e., each variate is nested within

another variate. He set forth a method of estimation very

similar to Edgett's although greatly simplified in the

amount of necessary mathematical manipulation. Several

writers (Bhargava, 1962; Afifi and Elashoff, 1966, 1967)

have gone beyond the monotone trivariate case of Anderson

and determined solutions for the general variate case.








In addition, Bhargava developed the likelihood ratio tests

for hypotheses dealing with the linear model and equality of

covariance matrices with multivariate monotone samples.

Trawinski and Bargmann (1964) examined a considerably

more complicated pattern of missing data than Anderson (1957),

Bhargava (1962), and Afifi and Elashoff (1966, 1967). The

concern of Trawinski and Bargmann was with observations that

were missing not by accident, but by design. They found that

correlation coefficients were logically consistent estimates

to use with incomplete multivariate data.

In deference to data missing by accident or design,

Hocking and Smith (1968) assumed neither in developing their

analytic procedures. They formulated a procedure to compute

maximum likelihood estimates for parameters but only in the

case of large samples.

Anderson, Trawinski and Bargmann, and Hocking and

Smith used estimates of groups of data. They did not esti-

mate specific missing observations.

The design of experiments which involve multiresponses

and omissions was considered by Srivastava (1968). He pointed

out that an experimenter must give attention to whether or not

each response on each experimental unit is to be measured. He

provides a discussion of what he calls the lack of need of a

regular design. (A regular design is one where all responses

are sought on all experimental units.) Before data collection,

a researcher should set up his design such that the only data

collected will be somewhat convenient or useful.








Haitovsky (1968) compared the methods of Buck and

Walsh. He carried out a simulated data analysis, first

using only complete data, discarding incomplete experi-

mental units and second, using all available observations

to estimate correlations. He found the former procedure

superior. This is the case when the number of missing

entries is not high.

A comparison of a complete data set and an incom-

plete data set which is a subset of the complete set was

conducted by Morrison (1971). He determined that when the

correlations between the complete and incomplete variates

of the data set are small, the multivariate missing value

estimates are less accurate in the estimation of the mean

square error term than the multivariate data set with no

estimates.

An extension of the work of Walsh and Buck was

conducted by Dagenais (1971). He developed a more general-

ized method which not only corrects for data omissions but

also provides for additional corrections during data analysis.

His estimates are consistent when the independent variable is

fixed; each observation contains a value for the dependent

variable and at least one of the independent variables; and

some observations are complete.

Srivastava and Zaatar (1972) dealt with the problem

of classifying a future multiresponse observation into one

of two populations given two incomplete multiresponse









samples, one from each population. They developed a rule for

the classification given the fact that the observation did

come from one of the populations.

Investigations of entire sections of missing data

were performed by Hartwell and Gaylor (1973) and Rubin (1974).

The former examined missing cells employing the method of

unweighted means. They provided a method of cell estimation

using estimated variances. Rubin looked at complete blocks

of missing data by decomposing the original estimation problem

into smaller estimation problems using a technique he denotes

as "factorization." This consists of discovering those

subject responses that are complete and using these response

patterns to estimate missing observations of subjects with a

similar response pattern.


Problems of Missing Multiresponse
Observations in Education

In a paper which is an overview of multivariate data

in education, Pruzek (1971) brought both the educational com-

munity and other areas of research face to face with the

problem of incomplete multiresponse data sets and their

investigation employing multivariate analysis of variance

(MANOVA). He outlined two procedures regarding the phenome-

non of missing data in MANOVA applications. The first is the

situation where several scattered responses are missing for

each dependent variable, and the second is where whole vectors

of responses are missing. No proven method of estimation

for omissions is provided.









Raffeld (1973) and Lord (1974) considered missing item

responses and their estimates. Lord examined ability and item

parameters. His emphasis was on the inappropriateness of

scoring an item as incorrect if it were omitted by the sub-

ject. He uses probability methods to estimate the omitted

data from a minimum of two or three thousand other subjects.

Raffeld pursued estimates of items on standardized achieve-

ment tests using mean value estimates. He concluded that

for omitted items on a standardized achievement test it is

better to assign a value which is the mean of the alternatives

for that item rather than assigning the mean response for the

group omitting the item. Neither Lord nor Raffeld concerned

himself with subscore estimates.


Direction of Present Research

The above review was concerned either with estimates

of missing data and their parameters or estimates of missing

data without concern for analysis. The intention of this

study is to forego parametric concerns, apply simple methods

of data estimation, analyze the estimated data sets, examine

the results of the analysis, and provide results directly

related to educational research. It will use a frequently

employed educational measurement, the achievement test with

several subscores, and investigate estimation methods under-

stood by most researchers and students of research.














CHAPTER III

DESIGN OF THE STUDY


The research conducted in this study focused on the

usefulness of the inclusion of multiresponse data, which

consists of several subscores, in a multivariate analysis

of variance as dependent variables when random missing sub-

scores were estimated using mean value and regression

techniques. The analyses of the data sets formed by the

two methods of estimation were compared to each other and

to the analysis of the complete data set.

The underlying focus of the research concerned the

efficacy of the above method when applied to educationally

related data. Thus the data sets investigated consisted of

achievement scores collected on elementary school students.


Procedures

Two random samples were drawn from two fixed groups.

The first sample consisted of 193 fourth-grade students and

the second of an equal number of fifth-grade students. Both

were administered the Stanford Achievement Test Battery in

the spring of 1974. The fourth-grade sample was given the

Intermediate I Battery and the fifth-grade sample the

Intermediate II Battery providing raw scores for analysis.









In preparing the data for analysis, random subsamples

were drawn comprising 2, 5, 10, 15, and 20 percent of each

of the two original complete data sets. The number of

subjects in each of these subsamples was 5, 10, 20, 29,

and 39, respectively. The subjects in these subsamples

were considered as having missing data. One achievement

subscore was randomly discarded for each subject in each of

the missing subsamples. This procedure was conducted five

times for each of the five percent levels, obtaining five

different random subsamples.
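
The discarding step described above can be sketched as follows. This is a minimal illustration, not the procedure actually programmed for the study: the function name `discard_subscores`, the random seed, and the synthetic scores are assumptions, and only the subsample sizes are taken from the text.

```python
import random

# Subsample sizes reported in the text for the 2, 5, 10, 15, and 20 percent levels.
SUBSAMPLE_SIZES = {2: 5, 5: 10, 10: 20, 15: 29, 20: 39}


def discard_subscores(scores, percent, rng):
    """Return a copy of `scores` in which each subject of a randomly drawn
    missing subsample has one randomly chosen subscore set to None."""
    size = SUBSAMPLE_SIZES[percent]
    damaged = [list(row) for row in scores]
    for i in rng.sample(range(len(damaged)), size):
        j = rng.randrange(len(damaged[i]))  # which subscore to discard
        damaged[i][j] = None
    return damaged


# Hypothetical stand-in for the fourth-grade sample: 193 subjects, 8 subscores.
rng = random.Random(1976)
complete = [[rng.randint(10, 60) for _ in range(8)] for _ in range(193)]

# Five independent random subsamples at the 10 percent level.
replications = [discard_subscores(complete, 10, rng) for _ in range(5)]
```

Working on a copy keeps the complete data set intact, so that each estimated data set can later be compared with it.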

Utilizing the subjects without randomly chosen

missing subscores, means on each achievement test variable

were formed. These means were substituted for the randomly

discarded subscore for each subject in each of the missing

subsamples.
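
A minimal sketch of the mean value substitution follows, assuming the scores are held as a list of rows with `None` marking a discarded subscore. The function name and the toy data are hypothetical. The means are taken over subjects with complete data, as stated above; the function could be applied to each fixed group separately if treatment-specific means are wanted.

```python
def mean_value_impute(scores):
    """Replace each None with the mean of that subscore over complete subjects."""
    complete_rows = [row for row in scores if None not in row]
    n_vars = len(scores[0])
    means = [
        sum(row[j] for row in complete_rows) / len(complete_rows)
        for j in range(n_vars)
    ]
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in scores
    ]


# Hypothetical toy data: four subjects, three subscores, one discarded value.
toy = [[10, 20, 30], [12, 22, 32], [14, 24, 34], [40, None, 50]]
filled = mean_value_impute(toy)
```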

Likewise, the subjects without randomly chosen

missing subscores were subjected to multiple linear regres-

sion analysis. One achievement test subscore was randomly

chosen as the dependent variable, and the remaining sub-

scores were the independent variables. The nondiscarded

subscores of each of the subjects with a missing subscore

were substituted in the corresponding resulting regression

equation. The value obtained from the regression equation

was substituted for the randomly discarded subscores.
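
The regression substitution can be sketched in the same way. The text does not say how the regression equations were computed, so the sketch below assumes ordinary least squares solved through the normal equations, and it assumes one discarded subscore per subject, as in the procedure above. The function names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def regression_impute(scores):
    """Fill each None from a regression fitted on subjects with complete data."""
    complete_rows = [row for row in scores if None not in row]
    filled = [list(row) for row in scores]
    for row in filled:
        if None not in row:
            continue
        target = row.index(None)  # the discarded subscore is the dependent variable
        # Design matrix: an intercept plus the remaining subscores.
        xs = [[1.0] + [v for j, v in enumerate(r) if j != target]
              for r in complete_rows]
        ys = [r[target] for r in complete_rows]
        k = len(xs[0])
        xtx = [[sum(x[p] * x[q] for x in xs) for q in range(k)] for p in range(k)]
        xty = [sum(x[p] * y for x, y in zip(xs, ys)) for p in range(k)]
        beta = solve(xtx, xty)
        predictors = [1.0] + [v for j, v in enumerate(row) if j != target]
        row[target] = sum(b * v for b, v in zip(beta, predictors))
    return filled


# Hypothetical toy data in which the third subscore equals 1 + 2*x1 + 3*x2.
toy = [[1, 2, 9], [2, 1, 8], [3, 5, 22], [4, 4, 21], [5, 7, 32], [6, 3, None]]
filled = regression_impute(toy)
```

Because the toy data follow an exact linear relation, the substituted value reproduces the discarded score; with real subscores the estimate would carry residual error.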









Method

In testing the hypotheses, multivariate analysis of

variance (MANOVA) was conducted on each of the 100 adjusted

samples with missing data and on the complete original sample

with no missing data. The two fixed groups were the inde-

pendent variables, and the achievement test subscores were

the dependent variables in each case. The MANOVA results

of the mean value estimates and the multiple linear regres-

sion estimates were compared to the MANOVA results of the

complete original sample and to each other.

The comparisons of the resulting F-ratios were

determined by the evaluation of the complement of the

cumulative distribution function of the variance ratio

distribution. The method consists of the following series

expansion. Let n and m be the first and second number of

degrees of freedom, respectively, and let


a = tan-' /nF/m


where F is the F-ratio value. Then if n is even, the comple-

ment P is defined as


\[
P(n,m,F) = \cos^{m} a \left[ 1 + \frac{m}{2}\sin^{2} a + \frac{m(m+2)}{(2)(4)}\sin^{4} a + \cdots + \frac{m(m+2)\cdots(m+n-4)}{(2)(4)\cdots(n-2)}\sin^{n-2} a \right]
\]







If m is even,


\[
P(n,m,F) = 1 - \sin^{n} a \left[ 1 + \frac{n}{2}\cos^{2} a + \frac{n(n+2)}{(2)(4)}\cos^{4} a + \cdots + \frac{n(n+2)\cdots(n+m-4)}{(2)(4)\cdots(m-2)}\cos^{m-2} a \right]
\]


If n and m are both odd,


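\[
\begin{aligned}
P(n,m,F) ={}& \frac{2}{\pi}\,\frac{(2)(4)\cdots(m-1)}{(3)(5)\cdots(m-2)}\,\cos^{m} a\,\sin a
\left[ 1 + \frac{m+1}{3}\sin^{2} a + \frac{(m+1)(m+3)}{(3)(5)}\sin^{4} a + \cdots
+ \frac{(m+1)(m+3)\cdots(m+n-4)}{(3)(5)\cdots(n-2)}\sin^{n-3} a \right] \\
&- \frac{2}{\pi}\,\sin a\,\cos a
\left[ 1 + \frac{2}{3}\cos^{2} a + \frac{(2)(4)}{(3)(5)}\cos^{4} a + \cdots
+ \frac{(2)(4)\cdots(m-3)}{(3)(5)\cdots(m-2)}\cos^{m-3} a \right]
- \frac{2a}{\pi} + 1
\end{aligned}
\]

where, if n = 1, the first series is to be taken as zero, and

if m = 1, the second series is to be taken as zero and the

factor $\frac{(2)(4)\cdots(m-1)}{(3)(5)\cdots(m-2)}$ is to be taken as unity (Hopper, 1970).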
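The three cases of the series expansion can be sketched in code as follows. This is an illustrative implementation under the definitions given above (a is the arctangent of the square root of nF/m); the function name `f_complement` is hypothetical and the code is not the Harwell subroutine itself.

```python
import math

def f_complement(n, m, F):
    """Upper-tail probability of the variance ratio (F) distribution with
    n and m degrees of freedom, using the finite trigonometric series."""
    a = math.atan(math.sqrt(n * F / m))
    sin_a, cos_a = math.sin(a), math.cos(a)
    s2, c2 = sin_a ** 2, cos_a ** 2

    if n % 2 == 0:
        # cos^m a * [1 + (m/2) sin^2 a + ...], n/2 terms in total
        term = total = 1.0
        for k in range(1, n // 2):
            term *= (m + 2 * (k - 1)) / (2 * k) * s2
            total += term
        return cos_a ** m * total

    if m % 2 == 0:
        # 1 - sin^n a * [1 + (n/2) cos^2 a + ...], m/2 terms in total
        term = total = 1.0
        for k in range(1, m // 2):
            term *= (n + 2 * (k - 1)) / (2 * k) * c2
            total += term
        return 1.0 - sin_a ** n * total

    # Both n and m odd.
    first = 0.0                      # first series is zero when n == 1
    if n > 1:
        term = first = 1.0
        for k in range(1, (n - 3) // 2 + 1):
            term *= (m + 2 * k - 1) / (2 * k + 1) * s2
            first += term
    second, factor = 0.0, 1.0        # zero series and unit factor when m == 1
    if m > 1:
        term = second = 1.0
        for k in range(1, (m - 3) // 2 + 1):
            term *= (2 * k) / (2 * k + 1) * c2
            second += term
        for j in range(1, (m - 3) // 2 + 1):
            factor *= (2 * j) / (2 * j + 1)
        factor *= m - 1              # (2)(4)...(m-1) / (3)(5)...(m-2)
    two_over_pi = 2.0 / math.pi
    return (two_over_pi * factor * cos_a ** m * sin_a * first
            - two_over_pi * sin_a * cos_a * second
            - two_over_pi * a + 1.0)

p_even = f_complement(2, 2, 3.0)     # closed form for n = m = 2 is 1 / (1 + F)
p_odd = f_complement(1, 1, 1.0)      # median of the F(1, 1) distribution
```

Each term is built from the previous one, so no factorials are formed explicitly. A call such as `f_complement(8, 185, 2.8851)` can be compared with the complement reported for the fourth-grade complete data set; small differences in the last digits would be expected from rounding of the F-ratio.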

If the complement of the complete data set is greater
than 0.05 and the complement of a data set with an estimated
missing subsample is less than or equal to 0.05, then the









MANOVA results are considered significantly different from

one another. Likewise, if the complement of the complete data

set is less than or equal to 0.05 and the complement of a data

set with an estimated missing subsample is greater than 0.05,

then the MANOVA results are considered significantly different

from one another. If both results are either greater than

0.05 or less than or equal to 0.05, then the MANOVA results

are not considered significantly different from one another.
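The decision rule amounts to asking whether the two complements fall on opposite sides of the 0.05 point. A minimal sketch, with a hypothetical function name and the significance level exposed as a parameter:

```python
def manova_results_differ(p_complete, p_estimated, alpha=0.05):
    """True when exactly one of the two complements is at or below alpha,
    i.e. the two analyses lead to different significance decisions."""
    return (p_complete <= alpha) != (p_estimated <= alpha)

same_side = manova_results_differ(0.004745, 0.003756)   # both below 0.05
opposite = manova_results_differ(0.06, 0.05)            # one above, one at 0.05
```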




























This method is contingent upon the level of significance
chosen and relies on the fact that the point of significance
is immutable.














CHAPTER IV

RESULTS


It has been the experience of the researcher that

when conducting data analysis on achievement tests, he

obtains a list of scores which contains missing subscores.

The data on experimental units with missing subscores must

then be discarded, which results in a loss of information.

The present study questioned the applicability of

using estimates for multiresponse data in multivariate

analysis of variance (MANOVA) when one response of an experi-

mental unit is missing. Both mean value and regression

estimates were employed for missing data in the manner

reported in Chapter III.

There were three specific questions investigated

in this study: Do mean value estimates provide different

MANOVA results from those obtained when analyzing the complete

data set? Do regression estimates provide different MANOVA

results from those obtained when analyzing the complete data

set? And, consequently, do mean value estimates provide different

MANOVA results from regression estimates? Each of these

inquiries was looked at for varying percent levels of missing

data (2, 5, 10, 15, and 20 percent of the total sample).

The five different levels were employed on five different









random subsamples of missing data. This was performed on

two different data sets of fourth- and fifth-grade elemen-

tary school students for the two types of estimates. This

resulted in 5 x 5 x 2 x 2 random incomplete samples, or a

total of 100 incomplete samples, that were studied and

compared to the two complete data sets of fourth- and fifth-

grade students.
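The count of 100 incomplete samples follows directly from crossing the four factors. A short sketch that enumerates the design (the variable names are illustrative):

```python
from itertools import product

levels = (2, 5, 10, 15, 20)            # percent of the sample with a missing subscore
replications = range(1, 6)             # five random subsamples per level
grades = ("fourth", "fifth")
estimators = ("mean value", "regression")

# One entry per incomplete sample: 5 x 5 x 2 x 2 combinations.
design = list(product(levels, replications, grades, estimators))
n_samples = len(design)
```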

The presentation of results in this chapter is

according to each of the five percent levels of missing

data for the three aforementioned questions. These three

questions represent the three hypotheses which are stated

as follows:

H1: There is no difference in MANOVA results for the
complete data set and the mean value estimated
data set with the size of the missing subsample
ranging from 2 to 20 percent of the complete
data set.

H2: There is no difference in MANOVA results for the
complete data set and the regression estimated
data set with the size of the missing subsample
ranging from 2 to 20 percent of the complete
data set.

H3: There is no difference in MANOVA results for the
mean value estimated data set and the regression
estimated data set both with the size of the
missing subsample ranging from 2 to 20 percent
of the complete data set.

The MANOVA F-ratios and the corresponding complement of the

cumulative distribution function of the variance ratio

distribution are provided in response to these hypotheses.

MANOVA performed on the complete data set of fourth

graders resulted in F = 2.8851 with 8 and 185 df (degrees

of freedom); for the fifth graders, the result was

F = 3.3229 with 7 and 185 df. Determining the complement

of the cumulative distribution function, the P value

obtained for the fourth-grade data set was 0.004745 and

that for the fifth-grade data set was 0.002341.


Comparison of the Mean Value and the Regression
Estimated Data Sets with One Another and
with the Complete Data Set at the 2
Percent Level of Missing Subsamples

The values of the F-ratio and complement of the

cumulative distribution function for fourth- and fifth-

grade mean value and regression estimated data sets at the

2 percent level are presented in Table 1. For the fourth-

grade sample, no F-ratio of the mean value estimated data

sets differed from the complete data set's F-ratio by more

than 0.1267. Likewise, for the regression estimated data

sets, no F-ratio differed from the complete data set's

F-ratio by more than 0.0675. Equivalent ranges for the

fifth-grade sample were 0.0329 and 0.0397, respectively.

Examining the complement of the cumulative distri-

bution function for the fourth-grade sample, no P of the

mean value estimated data sets differed from the complete

data set's complement by a value greater than 0.001388.

Likewise, for the regression estimated data sets, no comple-

ment differed from the complete data set's complement by

a value greater than 0.000798. Equivalent ranges for the

fifth-grade sample were 0.000196 and 0.000245, respectively.










TABLE 1. F-ratios and Complements (P) of the Cumulative Distribution
Function for Fourth- and Fifth-Grade Samples Having Mean
Value and Regression Estimated Subsamples Consisting of 2
Percent of the Complete Samples

                       Grade Four                            Grade Five
              Mean Value        Regression          Mean Value        Regression
             F       P         F       P           F       P         F       P

Sample 1   2.9708  0.003756  2.9228  0.004282    3.3265  0.002323  3.2832  0.002589
Sample 2   2.8974  0.004589  2.9338  0.004155    3.3126  0.002406  3.2907  0.002541
Sample 3   2.8796  0.004817  2.9096  0.004440    3.3462  0.002212  3.2865  0.002568
Sample 4   3.0118  0.003357  2.9526  0.003947    3.2983  0.002493  3.2852  0.002576
Sample 5   2.9590  0.003878  2.9490  0.003988    3.3558  0.002158  3.2953  0.002512









Since the complement of the complete data set for

both the fourth and fifth grades was less than 0.05 while

at the same time the five complements of the mean value and

the regression estimated data sets were less than 0.05, the

three null hypotheses are not rejected at the 2 percent

level of missing subsamples.
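The largest deviations quoted for the fourth-grade sample at the 2 percent level can be recomputed from the Table 1 F-ratios. The sketch below only illustrates the arithmetic; the variable and function names are hypothetical.

```python
complete_f = 2.8851   # fourth-grade complete data set
mean_value_f = [2.9708, 2.8974, 2.8796, 3.0118, 2.9590]   # Table 1, Grade Four
regression_f = [2.9228, 2.9338, 2.9096, 2.9526, 2.9490]   # Table 1, Grade Four

def max_deviation(values, reference):
    """Largest absolute difference between any value and the reference."""
    return max(abs(v - reference) for v in values)

mv_dev = round(max_deviation(mean_value_f, complete_f), 4)
reg_dev = round(max_deviation(regression_f, complete_f), 4)
```

The same function applied to the complements gives the corresponding ranges for P.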


Comparison of the Mean Value and the Regression
Estimated Data Sets with One Another and
with the Complete Data Set at the 5
Percent Level of Missing Subsamples

The values of the F-ratio and complement of the

cumulative distribution function for fourth- and fifth-grade

mean value and regression estimated data sets at the 5 per-

cent level are presented in Table 2. For the fourth-grade

sample, no F-ratio of the mean value estimated data sets

differed from the complete data set's F-ratio by more than

0.1859. Likewise, for the regression estimated data sets,

no F-ratio differed from the complete data set's F-ratio by

more than 0.0302. Equivalent ranges for the fifth-grade

sample were 0.1268 and 0.1226, respectively.

Examining the complement of the cumulative distri-

bution function for the fourth-grade sample, no P of the

mean value estimated data sets differed from the complete

data set's complement by a value greater than 0.001893.

Likewise, for the regression estimated data sets, no

complement differed from the complete data set's complement

by a value greater than 0.000375. Equivalent ranges for the

fifth-grade sample were 0.000875 and 0.000842, respectively.


TABLE 2. F-ratios and Complements (P) of the Cumulative Distribution
Function for Fourth- and Fifth-Grade Samples Having Mean
Value and Regression Estimated Subsamples Consisting of
5 Percent of the Complete Samples

                       Grade Four                            Grade Five
              Mean Value        Regression          Mean Value        Regression
             F       P         F       P           F       P         F       P

Sample 1   2.9982  0.003484  2.9094  0.004418    3.3587  0.002143  3.3830  0.002016
Sample 2   2.8943  0.004628  2.8848  0.004750    3.2744  0.002647  3.2745  0.002647
Sample 3   2.8706  0.004937  2.8771  0.004851    3.3053  0.002450  3.2786  0.002619
Sample 4   3.0710  0.002852  2.9153  0.004370    3.2904  0.002543  3.3363  0.002267
Sample 5   2.9555  0.003916  2.8999  0.004558    3.1961  0.003219  3.2003  0.003186


Since the complement of the complete data set for

both the fourth and fifth grades was less than 0.05 while

at the same time the five complements of the mean value and

the regression estimated data sets were less than 0.05, the

three null hypotheses are not rejected at the 5 percent

level of missing subsamples.


Comparison of the Mean Value and the Regression
Estimated Data Sets with One Another and
with the Complete Data Set at the 10
Percent Level of Missing Subsamples

The values of the F-ratio and complement of the

cumulative distribution function for fourth- and fifth-grade

mean value and regression estimated data sets at the 10 per-

cent level are presented in Table 3. For the fourth-grade

sample, no F-ratio of the mean value estimated data sets

differed from the complete data set's F-ratio by more than

0.5650. Likewise, for the regression estimated data sets,

no F-ratio differed from the complete data set's F-ratio by

more than 0.1607. Equivalent ranges for the fifth-grade

sample were 0.1006 and 0.0801, respectively.

Examining the complement of the cumulative distri-

bution function for the fourth-grade sample, no P of the

mean value estimated data sets differed from the complete

data set's complement by a value greater than 0.003977.

Likewise, for the regression estimated data sets, no

complement differed from the complete data set's complement

by a value greater than 0.001688. Equivalent ranges for the

fifth-grade sample were 0.000523 and 0.000427, respectively.


TABLE 3. F-ratios and Complements (P) of the Cumulative Distribution
Function for Fourth- and Fifth-Grade Samples Having Mean
Value and Regression Estimated Subsamples Consisting of
10 Percent of the Complete Samples

                       Grade Four                            Grade Five
              Mean Value        Regression          Mean Value        Regression
             F       P         F       P           F       P         F       P

Sample 1   3.0076  0.003395  2.9488  0.003988    3.4235  0.001821  3.4030  0.001917
Sample 2   2.9682  0.003782  2.9043  0.004504    3.2743  0.002648  3.2802  0.002609
Sample 3   2.8678  0.004975  2.8713  0.004928    3.3378  0.002259  3.2773  0.002628
Sample 4   3.4501  0.000998  3.0458  0.003057    3.2941  0.002520  3.3524  0.002177
Sample 5   3.0149  0.003328  2.8983  0.004578    3.2814  0.002601  3.2859  0.002572


Since the complement of the complete data set for

both the fourth and fifth grades was less than 0.05 while

at the same time the five complements of the mean value and

the regression estimated data sets were less than 0.05, the

three null hypotheses are not rejected at the 10 percent

level of missing subsamples.


Comparison of the Mean Value and the Regression
Estimated Data Sets with One Another and
with the Complete Data Set at the 15
Percent Level of Missing Subsamples

The values of the F-ratio and complement of the

cumulative distribution function for fourth- and fifth-grade

mean value and regression estimated data sets at the 15 per-

cent level are presented in Table 4. For the fourth-grade

sample, no F-ratio of the mean value estimated data sets

differed from the complete data set's F-ratio by more than

0.3063. Likewise, for the regression estimated data sets,

no F-ratio differed from the complete data set's F-ratio by

more than 0.1386. Equivalent ranges for the fifth-grade

sample were 0.2364 and 0.0412, respectively.

Examining the complement of the cumulative distri-

bution function for the fourth-grade sample, no P of the mean

value estimated data sets differed from the complete data

set's complement by a value greater than 0.002696. Likewise,

for the regression estimated data sets, no complement differed

from the complete data set's complement by a value

greater than 0.001496. Equivalent ranges for the fifth-grade

sample were 0.001050 and 0.000255, respectively.


TABLE 4. F-ratios and Complements (P) of the Cumulative Distribution
Function for Fourth- and Fifth-Grade Samples Having Mean
Value and Regression Estimated Subsamples Consisting of
15 Percent of the Complete Samples

                       Grade Four                            Grade Five
              Mean Value        Regression          Mean Value        Regression
             F       P         F       P           F       P         F       P

Sample 1   2.9470  0.004008  2.9765  0.003697    3.5593  0.001294  3.3263  0.002325
Sample 2   2.8829  0.004775  2.8880  0.004708    3.2797  0.002612  3.3013  0.002475
Sample 3   2.8862  0.004731  2.8830  0.004773    3.4280  0.001801  3.2971  0.002501
Sample 4   3.1914  0.002049  3.0237  0.003249    3.2777  0.002625  3.2899  0.002547
Sample 5   3.1742  0.002146  2.9796  0.003666    3.3087  0.002430  3.2817  0.002599


Since the complement of the complete data set for

both the fourth and fifth grades was less than 0.05 while

at the same time the five complements of the mean value and

the regression estimated data sets were less than 0.05, the

three null hypotheses are not rejected at the 15 percent

level of missing subsamples.


Comparison of the Mean Value and the Regression
Estimated Data Sets with One Another and
with the Complete Data Set at the 20
Percent Level of Missing Subsamples

The values of the F-ratio and complement of the

cumulative distribution function for fourth- and fifth-grade

mean value and regression estimated data sets at the 20 per-

cent level are presented in Table 5. For the fourth-grade

sample, no F-ratio of the mean value estimated data sets

differed from the complete data set's F-ratio by more than

0.3305. Likewise, for the regression estimated data sets,

no F-ratio differed from the complete data set's F-ratio by

more than 0.1237. Equivalent ranges for the fifth-grade

sample were 0.2711 and 0.0479, respectively.

Examining the complement of the cumulative distri-

bution function for the fourth-grade sample, no P of the

mean value estimated data sets differed from the complete

data set's complement by a value greater than 0.002830.

Likewise, for the regression estimated data sets, no complement

differed from the complete data set's complement by a

value greater than 0.001361. Equivalent ranges for the

fifth-grade sample were 0.001159 and 0.000299, respectively.


TABLE 5. F-ratios and Complements (P) of the Cumulative Distribution
Function for Fourth- and Fifth-Grade Samples Having Mean
Value and Regression Estimated Subsamples Consisting of
20 Percent of the Complete Samples

                       Grade Four                            Grade Five
              Mean Value        Regression          Mean Value        Regression
             F       P         F       P           F       P         F       P

Sample 1   2.9608  0.003859  2.9272  0.004231    3.5940  0.001185  3.3024  0.002468
Sample 2   2.8703  0.004941  2.8637  0.005031    3.3104  0.002419  3.2750  0.002643
Sample 3   2.9036  0.004513  2.8916  0.004663    3.5476  0.001333  3.3119  0.002410
Sample 4   3.0312  0.003183  2.9180  0.004339    3.3004  0.002480  3.3196  0.002364
Sample 5   3.2156  0.001915  3.0088  0.003384    3.3048  0.002453  3.2770  0.002630


Since the complement of the complete data set for

both the fourth and fifth grades was less than 0.05 while

at the same time the five complements of the mean value and

the regression estimated data sets were less than 0.05, the

three null hypotheses are not rejected at the 20 percent

level of missing subsamples.


Further Results

To determine which method of estimation investigated

was the stronger, an inspection of the values of the F-ratios

and complements of the cumulative distribution function was

conducted. The closeness of these values of the incomplete

data sets to that of the appropriate complete data set was

observed. For each group of five incomplete data sets at

each percent level, the range of values was found and

its width examined.

The largest range at each percent level of missing

data for the fourth-grade sample with mean value estimates

varied from 0.001388 to 0.003977, whereas, for the regres-

sion estimated samples, it varied from only 0.000375 to

0.001688. For the fifth-grade samples with mean value









estimates, the range varied from 0.000196 to 0.001159. For

regression estimates, it was 0.000245 to 0.000842. Only at

the 2 percent level of missing values did the mean value

complement range not exceed that of the regression comple-

ment range.

A closer examination of the results revealed addi-

tional information. One might presume that as the percent

of estimated data elements decreased, the smaller the range

would be between the value of the F-ratio of the complete

data set and the most distant value of the F-ratio of the

data sets with estimated values. This was neither consistent

within the fourth- and fifth-grade samples nor within the

method of estimation. Considering the percent level of

missing data with the shortest range to the level with the

longest range, the order for the fourth-grade sample with

mean value estimates is 2, 5, 15, 20, 10; for the fourth-

grade sample with regression estimates, 5, 2, 20, 15, 10;

for the fifth-grade sample with mean value estimates, 2,

10, 5, 15, 20; and for the fifth-grade sample with regres-

sion estimates, 2, 15, 20, 10, 5. The exact results hold

for the complement of the cumulative distribution function.
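The ordering of percent levels by range can be reproduced by sorting the largest F-ratio deviations quoted earlier in this chapter. The sketch below uses the fourth-grade figures; the dictionary names are hypothetical.

```python
# Largest F-ratio deviation from the complete data set, keyed by percent level
# (fourth-grade sample, values as quoted in the text).
mean_value_range = {2: 0.1267, 5: 0.1859, 10: 0.5650, 15: 0.3063, 20: 0.3305}
regression_range = {2: 0.0675, 5: 0.0302, 10: 0.1607, 15: 0.1386, 20: 0.1237}

# Percent levels ordered from the shortest range to the longest.
mv_order = sorted(mean_value_range, key=mean_value_range.get)
reg_order = sorted(regression_range, key=regression_range.get)
```

If the range grew steadily with the percent of missing data, both lists would come out in ascending order of percent level, which is not what is observed.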

Another presumption might be that the value of the

F-ratio of the complete data set would be within the range

of the values of the F-ratios at a particular percent level

of missing data. This is consistent for the fourth- and

fifth-grade samples within a method of estimation but not









between methods of estimation. For both the fourth- and

fifth-grade samples having mean value estimates, the value

of the F-ratio of the complete data set is within the range

of the values of the F-ratios for all percent levels of

missing data. For regression estimated samples, this is

not the case. The fourth-grade samples have F-ratios not

inclusive, range-wise, of the complete data set's F-ratio

at the 2 percent level; for the fifth grade, it is at the

2 and 20 percent levels. The value of the F-ratio of the

complete data set exceeds the values of the F-ratio in the

fifth-grade sample and precedes the values in the fourth-

grade sample.
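Whether the complete data set's F-ratio lies inside the span of the five estimated F-ratios is a simple interval check. A sketch using the fourth-grade values at the 2 percent level from Table 1 (the function name is hypothetical):

```python
def within_range(reference, values):
    """True when `reference` lies between the smallest and largest value."""
    return min(values) <= reference <= max(values)

complete_f = 2.8851   # fourth-grade complete data set
mean_value_f = [2.9708, 2.8974, 2.8796, 3.0118, 2.9590]
regression_f = [2.9228, 2.9338, 2.9096, 2.9526, 2.9490]

mv_inside = within_range(complete_f, mean_value_f)
reg_inside = within_range(complete_f, regression_f)
```

For these values the check succeeds for the mean value estimates and fails for the regression estimates, whose F-ratios all lie above that of the complete data set.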


Summary

In summary, this chapter has presented the statisti-

cal analysis of the data. The results of the study indicated

that no significant differences exist among the MANOVA

results of data sets having missing subscores estimated by

mean values, data sets having missing subscores estimated by

regression, and the complete data set with no missing values.

This was demonstrated for 100 samples with estimated sub-

scores. The estimated subsamples consisted of 2, 5, 10,

15, and 20 percent of the complete samples of fourth- and

fifth-grade students.

Since inspection showed that the regression esti-

mated values provided MANOVA and complement results at each








percent level closer, in all instances, to that of the

complete data set, it is apparently the stronger of the two

estimation procedures. Both methods of estimation, though,

were demonstrated to provide MANOVA results not signifi-

cantly different from the results of the complete data sets.














CHAPTER V

DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS


Discussion

The intention of this study was to examine the

effect of different estimators for missing multiresponse

data on multivariate analysis of variance (MANOVA) results.

Mean value and regression techniques were used in deter-

mining estimates. The MANOVA results for the data sets

which employed the different estimation techniques were

compared to each other and to MANOVA results of the complete

data set.

Specifically investigated were the achievement test

scores of a fourth-grade sample and a fifth-grade sample.

Fifty MANOVAs were conducted on each grade; 25 analyzed the

incomplete data sets with mean value estimates and 25 with

regression estimates. The 25 analyses were subgrouped into

five sets of analyses. Each set contained a different per-

cent level of missing data. These levels were 2, 5, 10,

15, and 20 percent of the complete sample. Five samples

with different missing subsets of data were analyzed at each

level.

The results of Chapter IV demonstrated that the

MANOVA results of both estimation techniques did not differ








significantly from one another nor from the results obtained

from the complete data set. Inspection of the F-ratios and

complements implied that the regression method was apparently

the stronger estimation technique.

The latter result was determined by the closeness of

the values of the F-ratios and the complements of the cumu-

lative distribution function for the estimated samples to

that of the complete data set.

In addition, two a posteriori results were observed.

It was found that as the percent of estimated data elements

decreased, it did not follow that the smaller the range

would be between the value of the F-ratio of the complete

data set and the most distant value of the F-ratio of the

data sets with estimated values. The non sequitur held for

both grades of students and both methods of estimation.

This was likewise true for the complement of the cumulative

distribution function.

A second finding was that the F-ratio of the complete

data set was not within the range of the values of the F-ratios

at all percent levels of missing data estimated by regression

techniques. It did hold for mean value estimated data sets.

The same findings occurred among the complements of the

cumulative distribution function.


Conclusions

Three conclusions were drawn from the present

study:









1. Achievement data with up to 20 percent missing
subscores that are estimated by mean value
techniques when analyzed by MANOVA provide
results which do not differ significantly from
MANOVA results of the same achievement data
without any missing subscores.

2. Achievement data with up to 20 percent missing
subscores that are estimated by regression
techniques when analyzed by MANOVA provide
results which do not differ significantly
from MANOVA results of the same achievement
data without any missing subscores.

3. Achievement data with up to 20 percent missing
subscores that are estimated by mean value
techniques when analyzed by MANOVA provide
results which do not differ significantly
from MANOVA results of achievement data with
up to 20 percent missing subscores that are
estimated by regression techniques.

The above conclusions seem to suggest that there

exist for educators alternatives in data analysis other than

discarding incomplete multiresponse observations. The

alternatives provided here are the two methods of estimation:

mean value and regression. In addition, the mean value

method of estimation was demonstrated to be as appropriate

in MANOVA as the regression method as proven by the non-

rejection of the third hypothesis. Further data consider-

ations revealed that for all levels of missing data, the

F-ratio of the complete data set was located within the

range of the F-values determined for the data sets with

missing subsamples estimated by the mean value methods.

This did not hold for the regression method.

Since the mean value method is straightforward

and has been proved to be an appropriate estimation









technique, data formerly lost to analysis can be retained.

No longer must estimates for omissions be evaded because of

complicated data manipulations, time, money, and resources.


Recommendations

The present study has operated under various limi-

tations which need to be investigated in order to extend

the inferences of this research. Bracht and Glass (1968)

stated:

The intent (sometimes explicitly stated, sometimes
not) of almost all experimenters is to generalize
their findings to some group of subjects and set
of conditions that are not included in the experi-
ment. To the extent and manner in which the
results of an experiment can be generalized to
different subjects, settings, experimenters, and,
possibly, tests, the experimenter possesses
external validity. (pp. 437-438)

The external validity of this study is restricted by the

lack of reported research dealing with statistical analyses

which employ data estimates without parametric estimates.

Areas which require further investigation in reference to

inferential conclusions are presented in the following list:

1. The samples consisted of fourth and fifth
graders. Other educational levels need to
be examined.

2. Achievement scores for two levels of one
standardized achievement test were analyzed.
Other standardized achievement tests need
to be investigated.

3. In addition to achievement tests, other types
of tests which measure not only the cognitive
domain but also the affective domain need to
be studied such as those dealing with self-
concept and social acceptance.









4. Other methods of estimation need to be con-
sidered in a manner similar to the present
investigation and compared to mean value
methods for accuracy and simplicity.

5. Missing subsamples were determined randomly.
Actual missing subsamples need to be investi-
gated for possible commonalities.

6. The levels of missing data should be expanded
in order to determine maximum levels of missing
subsamples.

7. More than one missing subscore per experimental
unit needs inspection.

8. Experimental designs requiring analyses different
from multivariate analysis of variance need
probing.

These recommendations are listed not only to provide closure

to the present study but also to indicate the multidirec-

tional approaches involved in this specific area of research.

Closure is provided with respect to confining the present

research's inferences to the subset of investigations out-

side of the above listing. The expanse of additional

approaches is suggested by the list itself. No one item

of the list is more worthy of study than the other. All

need investigation in order to advance to the universal

set of estimators for omissions of multiresponse data.














REFERENCES


Afifi, A. and Elashoff, R. M. "Missing observations in
multivariate statistics I. Review of the litera-
ture." Journal of the American Statistical
Association, 1966, 61, 595-604.

Afifi, A. and Elashoff, R. M. "Missing observations in
multivariate statistics II. Point estimation in
simple linear regression." Journal of the
American Statistical Association, 1967, 62,
10-29.

Anderson, T. W. "Maximum likelihood estimates for a multi-
variate normal distribution when some observations
are missing." Journal of the American Statistical
Association, 1957, 52, 200-203.

Baird, H. R. and Kramer, C. Y. "Analysis of variance of a
balanced incomplete block design with missing
observations." Applied Statistics, 1960, 9,
189-198.

Bhargava, R. Multivariate tests of hypotheses with incomplete
data. Applied Mathematics and Statistical Labora-
tories, Technical Report 3, 1962.

Bracht, G. H. and Glass, G. V. "The external validity of
experiments." American Educational Research
Journal, 1968, 5, 437-474.

Buck, S. F. "A method of estimation of missing values in
multivariate data suitable for use with an electronic
computer." Journal of the Royal Statistical Society,
Series B, 1960, 22, 302-307.

Dagenais, M. G. "Further suggestions concerning the utili-
zation of incomplete observations in regression
analysis." Journal of the American Statistical
Association, 1971, 66, 93-98.









Dear, R. E. "A principal-component missing-data method for
multiple regression models." SP-86, System Development
Corporation, Santa Monica, California, 1959.

Dempster, A. P. "An overview of multivariate data analysis."
Journal of Multivariate Analysis, 1971, 1, 316-346.

Edgett, G. L. "Multiple regression with missing observa-
tions among the independent variables." Journal of
the American Statistical Association, 1956, 51,
122-131.

Federspiel, C. F., Monroe, R. J., and Greenberg, B. G.
"An investigation of some multiple regression
methods for incomplete samples." University of
North Carolina, Institute of Statistics, Mimeo
Series, No. 236, August 1959.

Glasser, M. "Linear regression analysis with missing
observations among the independent variables."
Journal of the American Statistical Association,
1964, 59, 834-844.

Haitovsky, Y. "Missing data in regression analysis."
Journal of the Royal Statistical Society,
Series B, 1968, 30, 67-82.

Hartwell, T. D. and Gaylor, D. W. "Estimating variance
components for two-way disproportionate data with
missing cells by the method of unweighted means."
Journal of the American Statistical Association,
1973, 68, 379-383.

Hocking, R. R. and Smith, W. B. "Estimation of parameters
in the multivariate normal distribution with
missing observations." Journal of the American
Statistical Association, 1968, 63, 159-173.

Hopper, M. J., comp. Harwell Subroutine Library: A
Catalogue of Subroutines. London: Her Majesty's
Stationery Office, State House, 49 High Holborn,
1970.

Kleinbaum, D. G. Estimation and hypothesis testing for
generalized multivariate linear models. Doctoral
dissertation, University of North Carolina, Chapel
Hill, North Carolina, 1970.

Kramer, C. Y. and Glass, S. "Analysis of variance of a
Latin square design with missing observations."
Applied Statistics, 1960, 9, 43-50.









Lord, F. M. "Estimation of parameters from incomplete data."
Journal of the American Statistical Association,
1955, 50, 870-876.

Lord, F. M. "Estimation of latent ability and item parame-
ters when there are omitted responses." Psycho-
metrika, 1974, 39, 247-264.

Matthai, A. "Estimation of parameters from incomplete data
with applications to design of sample surveys."
Sankhya, 1951, 2, 145-152.

Mitra, S. K. "Some remarks on the missing plot analysis."
Sankhya, 1959, 21, 337-344.

Morrison, D. F. "Expectations and variances of maximum
likelihood estimates of the multivariate normal
distribution parameters with missing data."
Journal of the American Statistical Association,
1971, 66, 602-604.

Nicholson, G. E., Jr. "Estimation of parameters from
incomplete multivariate samples." Journal of
the American Statistical Association, 1957, 52,
523-526.

Preece, D. A. "Query and answer: Non-additivity in two-
way classifications with missing values." Bio-
metrics, 1972, 28, 574-577.

Pruzek, R. M. "Methods and problems in the analysis of
multivariate data." Review of Educational Research,
1971, 41, 163-190.

Raffeld, P. C. The effects of Guttman weights on the
reliability and predictive validity of objective
tests when omissions are not differentially
weighted. Doctoral dissertation, University of
Oregon, 1973.

Rubin, D. B. "Characterizing the estimation of parameters
in incomplete-data problems." Journal of the
American Statistical Association, 1974, 69, 467-
474.

Srivastava, J. N. "On the extension of Gauss-Markov theorem
to complex multivariate linear models." The Annals
of the Institute of Statistical Mathematics, 1967,
19, 417-437.









Srivastava, J. N. "On a general class of designs for multi-
response experiments." The Annals of Mathematical
Statistics, 1968, 39, 1825-1843.

Srivastava, J. N. and McDonald, L. "On the costwise optimality
of hierarchical multiresponse randomized block designs
under the trace criterion." The Annals of the Insti-
tute of Statistical Mathematics, 1969, 21, 507-514.

Srivastava, J. N. and McDonald, L. "On the costwise opti-
mality of certain hierarchical and standard multi-
response models under the determinant criterion."
Journal of Multivariate Analysis, 1971, 1, 118-128.

Srivastava, J. N. and Zaatar, M. K. "On the maximum likeli-
hood classification rule for incomplete multivariate
samples and its admissibility." Journal of Multi-
variate Analysis, 1972, 2, 115-126.

Trawinski, I. M. Incomplete-variable designs. Doctoral
dissertation, Virginia Polytechnic Institute,
Blacksburg, Virginia, 1961.

Trawinski, I. M. and Bargmann, R. E. "Maximum likelihood
estimation with incomplete multivariate data."
The Annals of Mathematical Statistics, 1964, 35,
647-657.

Walsh, J. E. "Computer-feasible general method for fitting
and using regression functions when data are
incomplete." SP-71, System Development Corporation,
Santa Monica, California, 1959.

Wilkinson, G. N. "Comparison of missing value procedures."
Australian Journal of Statistics, 1960, 2, 53-65.

Wilks, S. S. "Moments and distributions of estimates of
population parameters from fragmentary samples."
The Annals of Mathematical Statistics, 1932, 3,
163-195.














BIOGRAPHICAL SKETCH


Stephen S. Sledjeski was born November 27, 1942, in
Greenport, New York. He graduated from Southold High School,
Southold, New York; the Diocesan Preparatory Seminary,
Buffalo, New York (A.A.); St. Bonaventure University, St.
Bonaventure, New York (B.S.); and the University of Florida,
Gainesville, Florida (M.Ed., Ed.S., Ph.D.).

His educational employment experience consists of
working as a middle school mathematics teacher with the
Alachua County Board of Public Instruction, Gainesville,
Florida; a research associate with Santa Fe Community
College, Gainesville, Florida; supervisor of data processing
as a graduate research assistant with the Florida Parent
Education Model of Project Follow Through, University of
Florida, Gainesville, Florida; and Research Specialist at
P. K. Yonge Laboratory School, Gainesville, Florida. In
addition, he has been a statistical and computer consultant
for doctoral students, the Florida State Department of
Health and Rehabilitation Services, and the Career
Opportunities Program, Richmond, Virginia.










I certify that I have read this study and that in
my opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.




Vynce A. Hines, Chairman
Professor of Foundations of Education


I certify that I have read this study and that in
my opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.


Ira J. Gordon
Graduate Research Professor of
Foundations of Education


I certify that I have read this study and that in
my opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.




Robert S. Soar
Professor of Foundations of
Education


I certify that I have read this study and that in
my opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.




Z. R. Pop Stojanovic
Associate Chairman and Professor
of Mathematics










I certify that I have read this study and that in
my opinion it conforms to acceptable standards of scholarly
presentation and is fully adequate, in scope and quality, as
a dissertation for the degree of Doctor of Philosophy.





Hattie Bessent
Assistant Professor of Foundations
of Education


This dissertation was submitted to the Graduate Faculty of
the College of Education and to the Graduate Council, and
was accepted as partial fulfillment of the requirements for
the degree of Doctor of Philosophy.


March, 1976



Dean, College of Education


Dean, Graduate School




Full Text

A STUDY OF THE POWER OF MULTIVARIATE ANALYSIS OF VARIANCE ON STANDARDIZED ACHIEVEMENT TESTING WHEN ESTIMATORS FOR OMISSIONS UTILIZE MEAN VALUE AND REGRESSION APPROACHES

By
STEPHEN S. SLEDJESKI

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1976

ACKNOWLEDGEMENTS

My appreciation is extended to the members of my doctoral committee for their contributions to the development of this dissertation. They are: Drs. Vynce A. Hines (Chairman), Ira J. Gordon, Zorin R. Pop-Stojanovic, and Robert S. Soar.

To Dr. Hattie Bessent, no statement can express her impact and assistance in attaining my educational goals. Words can be neither sufficient nor appropriate to express my esteem.

To Drs. Ann Bromley, Molly Harrower, and Wilson H. Guertin, I present thanks for direction and assistance in the understanding of my educational commitment.

To my sisters, Helen Brush and Ann Pendzick, and their families, I can but state our fortuitous interaction which has allowed not only educational growth but also complete dispersion while retaining faith in one another's existence.

To my mother, Helen Sledjeski, and my late father, Stephen Sledjeski, I wish to express my deepest appreciation for their successful development of a family unit filled with motivation, sincerity, trust, and love. This work is dedicated to their lives and memory.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
ABSTRACT

Chapter

I. INTRODUCTION
   Nature of the Study
   The Problem and the Hypotheses
   Significance of the Study

II. REVIEW OF RELATED LITERATURE
   Introduction
   Historical Overview
   Problems of Missing Multiresponse Observations in Education
   Direction of Present Research

III. DESIGN OF THE STUDY
   Procedures
   Method

IV. RESULTS
   Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 2½ Percent Level of Missing Subsamples
   Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 5 Percent Level of Missing Subsamples
   Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 10 Percent Level of Missing Subsamples
   Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 15 Percent Level of Missing Subsamples
   Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 20 Percent Level of Missing Subsamples
   Further Results
   Summary

V. DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS
   Discussion
   Conclusions
   Recommendations

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

1 F-ratios and Complements (P) of the Cumulative Distribution Function for Fourth- and Fifth-Grade Samples Having Mean Value and Regression Estimated Subsamples Consisting of 2½ Percent of the Complete Samples
2 F-ratios and Complements (P) of the Cumulative Distribution Function for Fourth- and Fifth-Grade Samples Having Mean Value and Regression Estimated Subsamples Consisting of 5 Percent of the Complete Samples
3 F-ratios and Complements (P) of the Cumulative Distribution Function for Fourth- and Fifth-Grade Samples Having Mean Value and Regression Estimated Subsamples Consisting of 10 Percent of the Complete Samples
4 F-ratios and Complements (P) of the Cumulative Distribution Function for Fourth- and Fifth-Grade Samples Having Mean Value and Regression Estimated Subsamples Consisting of 15 Percent of the Complete Samples
5 F-ratios and Complements (P) of the Cumulative Distribution Function for Fourth- and Fifth-Grade Samples Having Mean Value and Regression Estimated Subsamples Consisting of 20 Percent of the Complete Samples

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A STUDY OF THE POWER OF MULTIVARIATE ANALYSIS OF VARIANCE ON STANDARDIZED ACHIEVEMENT TESTING WHEN ESTIMATORS FOR OMISSIONS UTILIZE MEAN VALUE AND REGRESSION APPROACHES

By
Stephen S. Sledjeski
March, 1976

Chairman: Dr. Vynce A. Hines
Major Department: Foundations of Education

The efficacy of utilizing estimators for omissions in a multiresponse achievement data set which is analyzed using multivariate analysis of variance (MANOVA) techniques is the concern of this study. The estimates were determined employing mean value and regression methods.

Random samples of fourth- and fifth-grade students were administered the Stanford Achievement Test, Intermediate Level I and Intermediate Level II, respectively, in the spring of 1974. Each sample had an n of 193 consisting of two fixed groups as the independent variables and the achievement subscores as the dependent variables. These two samples comprised the complete data sets from which random subsamples of missing data were removed from among the dependent variables. The missing subsample consisted of 2½, 5, 10, 15, and 20 percent of the complete samples, each percent level being investigated five times for each of the two methods of estimation. The MANOVA results of the data sets with mean value and regression estimates were compared to one another and to the complete data set.

The null hypotheses tested were:

• There is no difference in MANOVA results for the complete data set and the mean value estimated data set with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.
• There is no difference in MANOVA results for the complete data set and the regression estimated data set with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.
• There is no difference in MANOVA results for the mean value estimated data set and the regression estimated data set both with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.

The hypotheses were analyzed by comparing the complement of the cumulative distribution function derived from the F-ratio of each MANOVA of the complete data set to that of the estimated data sets. No significant differences were found for the three hypotheses. Inspection of the results demonstrated that the regression estimates provide MANOVA results apparently closer to those of the complete data set than did mean value estimates.

The research concluded that, within the confines of this study, one cannot reject the use of mean value and regression estimates for data sets with missing values which are to be analyzed using MANOVA.

CHAPTER I
INTRODUCTION

With the increased emphasis on multivariate analysis, the experimenter has been confronted with multiresponse data where measurements on all responses are not available for every experimental unit. Since the time, resources, and money involved in gathering multiple observations on experimental subjects are greater than for gathering single observations, multivariate analysis of variance (MANOVA) must give attention to missing data.

It is the purpose of this study to consider missing observations in MANOVA utilizing mean value and regression estimators on a set of achievement data with subsets of randomly chosen missing data ranging in size from 2½ to 20 percent of the complete data set. The power of MANOVA results will then be determined.

Nature of the Study

Missing data estimation has been of interest to educational and statistical researchers for several decades. Estimation of uniresponse data has been conducted for various experimental designs. Baird and Kramer (1960) investigated the balanced incomplete block design. They developed formulas through minimization of the error sum of squares for the special case where missing values are within the same block or treatment. Their method facilitates calculations but does nothing to restore missing information.

Kramer and Glass (1960) examined the Latin square design. In the same manner as Baird and Kramer, they developed formulas through minimization of the error sum of squares for several missing values to restore the balance of the design. The formulas are for the specific cases described and not for the completely general case.

Preece (1972) studied the two-way classification design. He developed a method of estimating block and treatment parameters from the nonmissing data plus the estimated data.

Mitra (1959) considered the effect of missing value estimates on the F-test in analysis of variance (ANOVA). He demonstrated that the numerator in F (the treatment mean square) and the denominator (the error mean square) cannot have the same expected value when missing observations exist.

An examination of various missing data procedures was performed by Wilkinson (1960). He put forth a method of solving for estimates through simultaneous equations and compared it to an iterative least squares method and a covariance method. His method is preferred since it requires fewer steps and gives the correct residual sums of squares directly.

Studies investigating multiresponse data estimators have been less numerous. The works of Kleinbaum (1970), Srivastava (1967), and Trawinski (1961) are some examples of early endeavors in multiresponse data. Kleinbaum looked at the effect of estimation upon hypothesis testing of generalized multivariate linear models. In concurrence with Mitra, who investigated the uniresponse situation, he demonstrated that hypotheses are rejected with bias when utilizing estimators for missing values. Srivastava extended the Gauss-Markov theorem to multivariate linear models. Trawinski showed that it is not necessary to collect data on each characteristic of interest for each experimental unit. She brought out the important fact that in many situations one needs to have experiments where observations on some of the responses are missing not by accident, but by design.

The relevance and importance of missing observations were demonstrated by Srivastava and McDonald (1969, 1971). They established, under realistic conditions, the preference for the hierarchical incomplete models within the groups of general incomplete multiresponse models.

Dempster (1971) provided an overview of the problems involved. He surveyed a cross section of the developing topics in multivariate analysis of data, concentrating on problems of pragmatic data analysis and not on technical and mathematical detail.

The Problem and the Hypotheses

The present investigation will attempt to determine the efficacy of two types of estimates of missing data in MANOVA. One type of estimate will be the mean value of the variable for a particular treatment; the other, the regression of one of the MANOVA dependent variables on the remaining dependent variables, which then act as independent variables. The results of these MANOVAs will be compared to MANOVA results of nonmissing data. The hypotheses to be investigated are:

H1: There is no difference in MANOVA results for the complete data set and the mean value estimated data set with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.

H2: There is no difference in MANOVA results for the complete data set and the regression estimated data set with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.

H3: There is no difference in MANOVA results for the mean value estimated data set and the regression estimated data set both with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.

For each hypothesis, missing subsamples will be randomly chosen which will comprise 2½, 5, 10, 15, and 20 percent of the original complete sample. Each subsample percent level will be investigated five times. Estimated values will then be substituted and be subjected to MANOVA. F-values from the MANOVA results will be compared using the cumulative distribution function to determine the power of the analyses.

Data used in the analysis will consist of achievement scores as determined on the Stanford Achievement Test collected in the spring of 1974. Two samples will be investigated: a fourth-grade sample of 193 students who were administered the Intermediate I Battery (eight variables) and a fifth-grade sample of 193 students who were administered the Intermediate II Battery (seven variables). The students in each sample were chosen at random from each of two fixed groups, an experimental group and a control group. For each MANOVA, the independent variables will be the two fixed groups.

Significance of the Study

The two types of estimators to be investigated differ from one another in an important sense. The mean value estimator considers all nonmissing values of a particular dependent variable for a specific treatment whereas the regression estimators consider only those experimental units with complete data. One approach attempts to utilize all possible data elements, and the other forms an estimation based on even less information. Combining the two approaches with varying subsamples of missing data will provide a thorough look at omissions in multiresponse data taken from an educational setting. It is hoped that insights will be developed for future analysis of similar educational data.
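The contrast drawn above, between an estimator that uses every nonmissing value of a variable and one that uses only units with complete data, can be illustrated with a minimal sketch. It assumes scores are held as lists with `None` marking an omitted subscore; the function names and the toy scores are hypothetical and are not taken from the study.

```python
def available_case_mean(rows, var):
    """Mean of one variable over every unit that has a value for it."""
    values = [r[var] for r in rows if r[var] is not None]
    return sum(values) / len(values)


def complete_cases(rows):
    """Units with no omissions; the only units a regression fit can use."""
    return [r for r in rows if all(v is not None for v in r)]


# Hypothetical subscores for one treatment group (three variables per unit).
group = [
    [52, 40, 61],
    [47, None, 58],
    [55, 44, None],
    [50, 41, 60],
]

mean_var0 = available_case_mean(group, 0)      # draws on all four units
usable_for_regression = complete_cases(group)  # only two units remain
```

In this toy group the mean of the first variable still draws on all four units, while a regression fit could draw on only two, which is the loss of information the passage describes.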

This chapter has presented the problem to be investigated and the nature, significance, and hypotheses of the study. Chapter II contains a review of literature related to the problem of the study. The design and procedures are stated in Chapter III; the results of the study are in Chapter IV; and the discussion, conclusions, and recommendations are given in Chapter V.

CHAPTER II
REVIEW OF RELATED LITERATURE

Introduction

Missing data have posed a problem in data analysis for more than four decades. The initial investigations involving incomplete data sets concerned univariate statistical analysis. With the developments in computational technology in the past quarter century, multivariate data analysis has become feasible (Dempster, 1971) as has the investigation of missing data in multivariate analysis.

The initial focus of researchers concerned the techniques involved in the estimation of parameters when there existed missing observations in the data set. It was a question of developing the parameters and then adjusting these parameters considering the missing data. The direction taken in the review of the literature which follows is first, the estimation of the missing observations and second, the formulation of the parameters required for analysis.

Historical Overview

The first researcher to develop analysis procedures by first estimating values for the missing observations was Wilks (1932). He examined the incomplete bivariate case of a bivariate normal distribution using sample means for the missing observations. He found that the optimum method of determining the variance between the two variables was the correlation between the two variables which included only those pairs that were complete.

Wilks' example of a sample of statistical data from a multivariate population has been popularized in many related papers. Srivastava and Zaatar (1972) summarized Wilks' example as:

    [T]he situation when the experimental units are skulls that have been dug out from a certain graveyard. Since these skulls may be partly mutilated, the choice as to which characteristics should be measured on a particular unit is not entirely in the hand of the investigator. (One may suggest that in such a situation, we should restrict ourselves to those skulls on which all measurements of interest can be obtained. However, clearly this would in general not be very proper unless there were a rather large number of skulls free from any mutilation.) (p. 117)

Little more was published on incomplete multivariate data sets until the 1950s when papers began to appear extending the work of Wilks. Matthai (1951) developed a method to determine the correlation between two variates with missing data using the total available data set. He formulated a solution for the trivariate case using the correlation estimates. His estimates, he concluded, were inconsistent. For example, correlation coefficients could exceed unity.

Federspiel et al. (1959) and Glasser (1964) generalized this situation. They investigated the correlation matrix of a general number of variates based on all available paired data. They studied intuitive approaches for estimating linear regression coefficients when an unspecified number and pattern of missing values exist among the independent values. It is shown that the efficacy of the approaches depends upon the correlations among the independent variables as well as the proportion of observations which are missing.

Lord (1955) demonstrated the solutions for the trivariate case when the dependent variable is recorded for all experimental units in the sample. Either of the two independent variables is recorded for all experimental units, but not both. He showed that, in this instance, means and regression coefficients can be estimated accurately.

The trivariate case was studied by Edgett (1956) in the opposite sense of Lord. He gave attention to the instance when the dependent variable has missing values and the two independent variates are complete. Nicholson (1957) extended Edgett's work to any number of independent variables. Edgett and Nicholson demonstrated that a maximum likelihood function for a plausible probability distribution could provide as good population parameter estimates as could least squares estimates.

A mode of estimation different from Wilks' method was provided by Dear (1959). He substituted for each missing observation of an independent variate the sum of the values of all observed independent variables divided by the total number of observations for all observed independent variables. This somewhat corresponds to the grand mean of all the independent variables. It is clear that serious difficulties would be incurred when the independent variables are measured on different scales.

Walsh (1959) and Buck (1960) considered omission estimates in respect to paired simple linear regression. Walsh studied the utilization of all data available for a pair of variables in the simple linear regression computation. Those experimental units for which no data were missing were looked at by Buck in the paired regression analysis. Both Walsh and Buck determined that the average of values obtained from the simple linear regression provided suitable estimates for missing responses.

Anderson (1957) investigated a particular pattern of missing observations called a monotone sample. This is a sample in which the observations on each variate are a subset of those on another variate, i.e., each variate is nested within another variate. He set forth a method of estimation very similar to Edgett's although greatly simplified in the amount of necessary mathematical manipulation. Several writers (Bhargava, 1962; Afifi and Elashoff, 1966, 1967) have gone beyond the monotone trivariate case of Anderson and determined solutions for the general variate case.

In addition, Bhargava developed the likelihood ratio tests for hypotheses dealing with the linear model and equality of covariance matrices with multivariate monotone samples.

Trawinski and Bargmann (1964) examined a considerably more complicated pattern of missing data than Anderson (1957), Bhargava (1962), and Afifi and Elashoff (1966, 1967). The concern of Trawinski and Bargmann was with observations that were missing not by accident, but by design. They found that correlation coefficients were logically consistent estimates to use with incomplete multivariate data.

In deference to data missing by accident or design, Hocking and Smith (1968) assumed neither in developing their analytic procedures. They formulated a procedure to compute maximum likelihood estimates for parameters but only in the case of large samples. Anderson, Trawinski and Bargmann, and Hocking and Smith used estimates of groups of data. They did not estimate specific missing observations.

The design of experiments which involve multiresponses and omissions was considered by Srivastava (1968). He pointed out that an experimenter must give attention to whether or not each response on each experimental unit is to be measured. He provided a discussion of what he calls the lack of need of a regular design. (A regular design is one where all responses are sought on all experimental units.) Before data collection, a researcher should set up his design such that the only data collected will be somewhat convenient or useful.

Haitovsky (1968) compared the methods of Buck and Walsh. He carried out a simulated data analysis, first using only complete data, discarding incomplete experimental units, and second, using all available observations to estimate correlations. He found the former procedure superior. This is the case when the number of missing entries is not high.

A comparison of a complete data set and an incomplete data set which is a subset of the complete set was conducted by Morrison (1971). He determined that when the correlations between the complete and incomplete variates of the data set are small, the multivariate missing value estimates are less accurate in the estimation of the mean square error term than the multivariate data set with no estimates.

An extension of the work of Walsh and Buck was conducted by Dagenais (1971). He developed a more generalized method which not only corrects for data omissions but also provides for additional corrections during data analysis. His estimates are consistent when the independent variable is fixed; each observation contains a value for the dependent variable and at least one of the independent variables; and some observations are complete.

Srivastava and Zaatar (1972) dealt with the problem of classifying a future multiresponse observation into one of two populations given two incomplete multiresponse samples, one from each population. They developed a rule for the classification given the fact that the observation did come from one of the populations.

Investigations of entire sections of missing data were performed by Hartwell and Gaylor (1973) and Rubin (1974). The former examined missing cells employing the method of unweighted means. They provided a method of cell estimation using estimated variances. Rubin looked at complete blocks of missing data by decomposing the original estimation problem into smaller estimation problems using a technique he denotes as "factorization." This consists of discovering those subject responses that are complete and using these response patterns to estimate missing observations of subjects with a similar response pattern.

Problems of Missing Multiresponse Observations in Education

In a paper which is an overview of multivariate data in education, Pruzek (1971) brought both the educational community and other areas of research face to face with the problem of incomplete multiresponse data sets and their investigation employing multivariate analysis of variance (MANOVA). He outlined two situations regarding the phenomenon of missing data in MANOVA applications. The first is the situation where several scattered responses are missing for each dependent variable, and the second is where whole vectors of responses are missing. No proven method of estimation for omissions is provided.

Raffeld (1973) and Lord (1974) considered missing item responses and their estimates. Lord examined ability and item parameters. His emphasis was on the inappropriateness of scoring an item as incorrect if it were omitted by the subject. He used probability methods to estimate the omitted data from a minimum of two or three thousand other subjects. Raffeld pursued estimates of items on standardized achievement tests using mean value estimates. He concluded that for omitted items on a standardized achievement test it is better to assign a value which is the mean of the alternatives for that item rather than assigning the mean response for the group omitting the item. Neither Lord nor Raffeld concerned himself with subscore estimates.

Direction of Present Research

The above review was concerned either with estimates of missing data and their parameters or estimates of missing data without concern for analysis. The intention of this study is to forego parametric concerns, apply simple methods of data estimation, analyze the estimated data sets, examine the results of the analysis, and provide results directly related to educational research. It will use a frequently employed educational measurement, the achievement test with several subscores, and investigate estimation methods understood by most researchers and students of research.

CHAPTER III
DESIGN OF THE STUDY

The research conducted in this study focused on the usefulness of the inclusion of multiresponse data, which consists of several subscores, in a multivariate analysis of variance as dependent variables when random missing subscores were estimated using mean value and regression techniques. The analyses of the data sets formed by the two methods of estimation were compared to each other and to the analysis of the complete data set.

The underlying focus of the research concerned the efficacy of the above method when applied to educationally related data. Thus the data sets investigated consisted of achievement scores collected on elementary school students.

Procedures

Two random samples were drawn from two fixed groups. The first sample consisted of 193 fourth-grade students and the second of an equal number of fifth-grade students. Both were administered the Stanford Achievement Test Battery in the spring of 1974. The fourth-grade sample was given the Intermediate I Battery and the fifth-grade sample the Intermediate II Battery, providing raw scores for analysis.

In preparing the data for analysis, random subsamples were drawn comprising 2½, 5, 10, 15, and 20 percent of each of the two original complete data sets. The number of subjects in each of these subsamples was 5, 10, 20, 29, and 39, respectively. The subjects in these subsamples were considered as having missing data. One achievement subscore was randomly discarded for each subject in each of the missing subsamples. This procedure was conducted five times for each of the five percent levels, obtaining five different random subsamples.

Utilizing the subjects without randomly chosen missing subscores, means on each achievement test variable were formed. These means were substituted for the randomly discarded subscore for each subject in each of the missing subsamples.

Likewise, the subjects without randomly chosen missing subscores were subjected to multiple linear regression analysis. One achievement test subscore was randomly chosen as the dependent variable, and the remaining subscores were the independent variables. The nondiscarded subscores of each of the subjects with a missing subscore were substituted in the corresponding resulting regression equation. The value obtained from the regression equation was substituted for the randomly discarded subscores.
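The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the computation actually used in the study: the function names and the toy scores are hypothetical; rounding up is assumed because it happens to reproduce the reported subsample sizes of 5, 10, 20, 29, and 39 when the smallest level is read as 2.5 percent; means are pooled over all complete subjects; and the regression is fitted by ordinary least squares through the normal equations.

```python
def subsample_size(n, per_mille):
    """Smallest whole number of subjects covering per_mille/1000 of n (rounds up)."""
    return -(-n * per_mille // 1000)


def column_means(rows):
    """Mean of every subscore over the subjects that kept all their subscores."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def regression_estimate(complete_rows, target, partial_row):
    """Predict subscore `target` of partial_row from its remaining subscores."""
    preds = [j for j in range(len(complete_rows[0])) if j != target]
    x_mat = [[1.0] + [r[j] for j in preds] for r in complete_rows]
    y = [r[target] for r in complete_rows]
    p = len(x_mat[0])
    xtx = [[sum(row[i] * row[j] for row in x_mat) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x_mat, y)) for i in range(p)]
    beta = solve(xtx, xty)
    x_new = [1.0] + [partial_row[j] for j in preds]
    return sum(b * v for b, v in zip(beta, x_new))


# Sizes for a sample of 193 at 2.5, 5, 10, 15, and 20 percent.
sizes = [subsample_size(193, p) for p in (25, 50, 100, 150, 200)]

# Hypothetical complete subjects (three subscores) and one subject missing the third.
complete = [[0.0, 0.0, 1.0], [1.0, 0.0, 3.0], [0.0, 1.0, 4.0],
            [1.0, 1.0, 6.0], [2.0, 1.0, 8.0]]
partial = [3.0, 2.0, None]

mean_fill = column_means(complete)[2]
reg_fill = regression_estimate(complete, 2, partial)
```

The toy scores are constructed so that the third subscore is an exact linear function of the other two, which makes the difference between the two fills easy to see: the mean value fill ignores the subject's remaining subscores, while the regression fill uses them.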

PAGE 26

17 Method In testing the hypotheses, multivariate analysis of variance (MANOVA) was conducted on each of the 100 adjusted samples with missing data and on the complete original sample with no missing data. The two fixed groups were the independent variables, and the achievement test subscores were the dependent variables in each case. The MANOVA results of the mean value estimates and the multiple linear regression estimates were compared to the MANOVA results of the complete original sample and to each other. The comparisons of the resulting F-ratios were determined by the evaluation of the complement of the cumulative distribution function of the variance ratio distribution. The method consists of the following series expansion. Let n and m be the first and second number of degrees of freedom, respectively, and let a = tan~\ /nF/m where F is the F-ratio value. Then if n is even, the complement P is defined as P(n,m,F) = cos"^ a . m(m+2) . It , , m(m+2) . . . (m+n-4) . n-2 + T2 )(U) . . . (n-2) ^^^

PAGE 27

If m is even,

$$P(n,m,F) = 1 - \sin^{n} a \left[ 1 + \frac{n}{2}\cos^{2} a + \frac{n(n+2)}{(2)(4)}\cos^{4} a + \cdots + \frac{n(n+2)\cdots(n+m-4)}{(2)(4)\cdots(m-2)}\cos^{m-2} a \right].$$

If n and m are both odd,

$$P(n,m,F) = \frac{2}{\pi}\,\frac{(2)(4)\cdots(m-1)}{(3)(5)\cdots(m-2)}\cos^{m} a \,\sin a \left[ 1 + \frac{m+1}{3}\sin^{2} a + \frac{(m+1)(m+3)}{(3)(5)}\sin^{4} a + \cdots + \frac{(m+1)(m+3)\cdots(m+n-4)}{(3)(5)\cdots(n-2)}\sin^{n-3} a \right] - \frac{2}{\pi}\sin a \,\cos a \left[ 1 + \frac{2}{3}\cos^{2} a + \frac{(2)(4)}{(3)(5)}\cos^{4} a + \cdots + \frac{(2)(4)\cdots(m-3)}{(3)(5)\cdots(m-2)}\cos^{m-3} a \right] + 1 - \frac{2a}{\pi},$$

where, if n = 1, the first series is to be taken as zero, and if m = 1, the second series is to be taken as zero and the factor $\frac{(2)(4)\cdots(m-1)}{(3)(5)\cdots(m-2)}$ is to be taken as unity (Hopper, 1970).

If the complement of the complete data set is greater than 0.05 and the complement of a data set with an estimated missing subsample is less than or equal to 0.05, then the

PAGE 28

MANOVA results are considered significantly different from one another. Likewise, if the complement of the complete data set is less than or equal to 0.05 and the complement of a data set with an estimated missing subsample is greater than 0.05, then the MANOVA results are considered significantly different from one another. If both results are either greater than 0.05 or less than or equal to 0.05, then the MANOVA results are not considered significantly different from one another. This method is contingent upon the level of significance chosen and relies on the fact that the point of significance is immutable.
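The series expansion and the decision rule described in this section can be sketched in a few lines. This is an illustrative reading of the finite-series form of the upper-tail probability of the variance ratio distribution, not the Harwell subroutine itself; the names `f_complement` and `results_differ` are hypothetical, and the 0.05 default is the significance level used in the study.

```python
import math


def f_complement(n, m, F):
    """Complement P(n, m, F) of the F cumulative distribution function,
    for integer degrees of freedom n and m, using the finite series."""
    a = math.atan(math.sqrt(n * F / m))
    sin2, cos2 = math.sin(a) ** 2, math.cos(a) ** 2

    if n % 2 == 0:                      # n even
        term = total = 1.0
        for k in range(1, n // 2):
            term *= (m + 2 * k - 2) / (2 * k) * sin2
            total += term
        return math.cos(a) ** m * total

    if m % 2 == 0:                      # m even
        term = total = 1.0
        for k in range(1, m // 2):
            term *= (n + 2 * k - 2) / (2 * k) * cos2
            total += term
        return 1.0 - math.sin(a) ** n * total

    # n and m both odd
    first = 0.0                         # first series is zero when n = 1
    if n > 1:
        factor = 1.0                    # taken as unity when m = 1
        if m > 1:
            factor = 2.0                # builds (2)(4)...(m-1) / (3)(5)...(m-2)
            for k in range(2, (m - 1) // 2 + 1):
                factor *= 2 * k / (2 * k - 1)
        term = series = 1.0
        for j in range(1, (n - 1) // 2):
            term *= (m + 2 * j - 1) / (2 * j + 1) * sin2
            series += term
        first = (2 / math.pi) * factor * math.cos(a) ** m * math.sin(a) * series

    second = 0.0                        # second series is zero when m = 1
    if m > 1:
        term = series = 1.0
        for j in range(1, (m - 1) // 2):
            term *= (2 * j) / (2 * j + 1) * cos2
            series += term
        second = (2 / math.pi) * math.sin(a) * math.cos(a) * series

    return first - second + 1.0 - 2.0 * a / math.pi


def results_differ(p_complete, p_estimated, alpha=0.05):
    """True when the two complements fall on opposite sides of alpha."""
    return (p_complete <= alpha) != (p_estimated <= alpha)
```

Under these assumptions, `f_complement(8, 185, 2.8851)` and `f_complement(7, 185, 3.3229)` should come out close to the complements of 0.004745 and 0.002341 reported for the complete fourth- and fifth-grade data sets, and `results_differ` returns `False` whenever both complements lie on the same side of 0.05.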

PAGE 29

CHAPTER IV
RESULTS

It has been the experience of the researcher that data analysis on achievement tests typically begins with a list of scores that contains missing subscores. The data on experimental units with missing subscores must then be discarded, which results in a loss of information. The present study questioned the applicability of using estimates for multiresponse data in multivariate analysis of variance (MANOVA) when one response of an experimental unit is missing. Both mean value and regression estimates were employed for missing data in the manner reported in Chapter III.

There were three specific questions investigated in this study: Do mean value estimates provide different MANOVA results from those obtained when analyzing the complete data set? Do regression estimates provide different MANOVA results from those obtained when analyzing the complete data set? And, thus, do mean value estimates provide different MANOVA results from regression estimates? Each of these inquiries was examined for varying percent levels of missing data (2½, 5, 10, 15, and 20 percent of the total sample). The five different levels were employed on five different

PAGE 30

random subsamples of missing data. This was performed on two different data sets of fourth- and fifth-grade elementary school students for the two types of estimates. This resulted in 5 x 5 x 2 x 2 random incomplete samples, or a total of 100 incomplete samples, that were studied and compared to the two complete data sets of fourth- and fifth-grade students. The presentation of results in this chapter is according to each of the five percent levels of missing data for the three aforementioned questions. These three questions represent the three hypotheses, which are stated as follows:

H1: There is no difference in MANOVA results for the complete data set and the mean value estimated data set with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.

H2: There is no difference in MANOVA results for the complete data set and the regression estimated data set with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.

H3: There is no difference in MANOVA results for the mean value estimated data set and the regression estimated data set, both with the size of the missing subsample ranging from 2½ to 20 percent of the complete data set.

The MANOVA F-ratios and the corresponding complement of the cumulative distribution function of the variance ratio distribution are provided in response to these hypotheses. MANOVA performed on the complete data set of fourth graders resulted in an F = 2.8851 with 8 and 185 df (degrees

PAGE 31

of freedom); for the fifth graders, there resulted an F = 3.3229 with 7 and 185 df. Determining the complement of the cumulative distribution function, the P value obtained for the fourth-grade data set was 0.004745 and that for the fifth-grade data set was 0.002341.

Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 2½ Percent Level of Missing Subsamples

The values of the F-ratio and complement of the cumulative distribution function for fourth- and fifth-grade mean value and regression estimated data sets at the 2½ percent level are presented in Table 1. For the fourth-grade sample, no F-ratio of the mean value estimated data sets differed from the complete data set's F-ratio by more than 0.1267. Likewise, for the regression estimated data sets, no F-ratio differed from the complete data set's F-ratio by more than 0.0675. Equivalent ranges for the fifth-grade sample were 0.0329 and 0.0397, respectively.

Examining the complement of the cumulative distribution function for the fourth-grade sample, no P of the mean value estimated data sets differed from the complete data set's complement by a value greater than 0.001388. Likewise, for the regression estimated data sets, no complement differed from the complete data set's complement by a value greater than 0.000798. Equivalent ranges for the fifth-grade sample were 0.000196 and 0.000245, respectively.

PAGE 32


PAGE 33

Since the complement of the complete data set for both the fourth and fifth grades was less than 0.05 while at the same time the five complements of the mean value and the regression estimated data sets were less than 0.05, the three null hypotheses are not rejected at the 2½ percent level of missing subsamples.

Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 5 Percent Level of Missing Subsamples

The values of the F-ratio and complement of the cumulative distribution function for fourth- and fifth-grade mean value and regression estimated data sets at the 5 percent level are presented in Table 2. For the fourth-grade sample, no F-ratio of the mean value estimated data sets differed from the complete data set's F-ratio by more than 0.1859. Likewise, for the regression estimated data sets, no F-ratio differed from the complete data set's F-ratio by more than 0.0302. Equivalent ranges for the fifth-grade sample were 0.1268 and 0.1226, respectively.

Examining the complement of the cumulative distribution function for the fourth-grade sample, no P of the mean value estimated data sets differed from the complete data set's complement by a value greater than 0.001893. Likewise, for the regression estimated data sets, no complement differed from the complete data set's complement

PAGE 34


PAGE 35

by a value greater than 0.000375. Equivalent ranges for the fifth-grade sample were 0.000875 and 0.000842, respectively.

Since the complement of the complete data set for both the fourth and fifth grades was less than 0.05 while at the same time the five complements of the mean value and the regression estimated data sets were less than 0.05, the three null hypotheses are not rejected at the 5 percent level of missing subsamples.

Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 10 Percent Level of Missing Subsamples

The values of the F-ratio and complement of the cumulative distribution function for fourth- and fifth-grade mean value and regression estimated data sets at the 10 percent level are presented in Table 3. For the fourth-grade sample, no F-ratio of the mean value estimated data sets differed from the complete data set's F-ratio by more than 0.5650. Likewise, for the regression estimated data sets, no F-ratio differed from the complete data set's F-ratio by more than 0.1607. Equivalent ranges for the fifth-grade sample were 0.1006 and 0.0801, respectively.

Examining the complement of the cumulative distribution function for the fourth-grade sample, no P of the mean value estimated data sets differed from the complete data set's complement by a value greater than 0.003977. Likewise, for the regression estimated data sets, no

PAGE 36


PAGE 37

complement differed from the complete data set's complement by a value greater than 0.001688. Equivalent ranges for the fifth-grade sample were 0.000523 and 0.000427, respectively.

Since the complement of the complete data set for both the fourth and fifth grades was less than 0.05 while at the same time the five complements of the mean value and the regression estimated data sets were less than 0.05, the three null hypotheses are not rejected at the 10 percent level of missing subsamples.

Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 15 Percent Level of Missing Subsamples

The values of the F-ratio and complement of the cumulative distribution function for fourth- and fifth-grade mean value and regression estimated data sets at the 15 percent level are presented in Table 4. For the fourth-grade sample, no F-ratio of the mean value estimated data sets differed from the complete data set's F-ratio by more than 0.3063. Likewise, for the regression estimated data sets, no F-ratio differed from the complete data set's F-ratio by more than 0.1386. Equivalent ranges for the fifth-grade sample were 0.2364 and 0.0412, respectively.

Examining the complement of the cumulative distribution function for the fourth-grade sample, no P of the mean value estimated data sets differed from the complete data set's complement by a value greater than 0.002696. Likewise,

PAGE 38


PAGE 39

for the regression estimated data sets, no complement differed from the complete data set's complement by a value greater than 0.001496. Equivalent ranges for the fifth-grade sample were 0.001050 and 0.000255, respectively.

Since the complement of the complete data set for both the fourth and fifth grades was less than 0.05 while at the same time the five complements of the mean value and the regression estimated data sets were less than 0.05, the three null hypotheses are not rejected at the 15 percent level of missing subsamples.

Comparison of the Mean Value and the Regression Estimated Data Sets with One Another and with the Complete Data Set at the 20 Percent Level of Missing Subsamples

The values of the F-ratio and complement of the cumulative distribution function for fourth- and fifth-grade mean value and regression estimated data sets at the 20 percent level are presented in Table 5. For the fourth-grade sample, no F-ratio of the mean value estimated data sets differed from the complete data set's F-ratio by more than 0.3305. Likewise, for the regression estimated data sets, no F-ratio differed from the complete data set's F-ratio by more than 0.1237. Equivalent ranges for the fifth-grade sample were 0.2711 and 0.0479, respectively.

Examining the complement of the cumulative distribution function for the fourth-grade sample, no P of the mean value estimated data sets differed from the complete

PAGE 40


PAGE 41

data set's complement by a value greater than 0.002830. Likewise, for the regression estimated data sets, no complement differed from the complete data set's complement by a value greater than 0.001361. Equivalent ranges for the fifth-grade sample were 0.001159 and 0.000299, respectively.

Since the complement of the complete data set for both the fourth and fifth grades was less than 0.05 while at the same time the five complements of the mean value and the regression estimated data sets were less than 0.05, the three null hypotheses were not rejected at the 20 percent level of missing subsamples.

Further Results

To determine which method of estimation investigated was the stronger, an inspection of the values of the F-ratios and complements of the cumulative distribution function was conducted. The closeness of these values of the incomplete data sets to that of the appropriate complete data set was observed. For each group of five incomplete data sets at each percent level, the range of values was found and examined for largeness of width.

The largest range at each percent level of missing data for the fourth-grade sample with mean value estimates varied from 0.001388 to 0.003977, whereas, for the regression estimated samples, it varied from only 0.000375 to 0.001688. For the fifth-grade samples with mean value

PAGE 42

estimates, the range varied from 0.000196 to 0.001159. For regression estimates, it was 0.000245 to 0.000842. Only at the 2½ percent level of missing values did the mean value complement range not exceed that of the regression complement range.

A closer examination of the results revealed additional information. One might presume that as the percent of estimated data elements decreased, the range between the value of the F-ratio of the complete data set and the most distant value of the F-ratio of the data sets with estimated values would become smaller. This was consistent neither within the fourth- and fifth-grade samples nor within the method of estimation. Ordering the percent levels of missing data from the shortest range to the longest range, the order for the fourth-grade sample with mean value estimates is 2½, 5, 15, 20, 10; for the fourth-grade sample with regression estimates, 5, 2½, 20, 15, 10; for the fifth-grade sample with mean value estimates, 2½, 10, 5, 15, 20; and for the fifth-grade sample with regression estimates, 2½, 15, 20, 10, 5. The same results hold for the complement of the cumulative distribution function.

Another presumption might be that the value of the F-ratio of the complete data set would be within the range of the values of the F-ratios at a particular percent level of missing data. This is consistent for the fourth- and fifth-grade samples within a method of estimation but not

PAGE 43

between methods of estimation. For both the fourth- and fifth-grade samples having mean value estimates, the value of the F-ratio of the complete data set is within the range of the values of the F-ratios for all percent levels of missing data. For regression estimated samples, this is not the case. The fourth-grade samples have F-ratios not inclusive, range-wise, of the complete data set's F-ratio at the 2½ percent level; for the fifth grade, it is at the 2½ and 20 percent levels. The value of the F-ratio of the complete data set exceeds the values of the F-ratio in the fifth-grade sample and precedes the values in the fourth-grade sample.

Summary

In summary, this chapter has presented the statistical analysis of the data. The results of the study indicated that no significant differences exist among the MANOVA results of data sets having missing subscores estimated by mean values, data sets having missing subscores estimated by regression, and the complete data set with no missing values. This was demonstrated for 100 samples with estimated subscores. The estimated subsamples consisted of 2½, 5, 10, 15, and 20 percent of the complete samples of fourth- and fifth-grade students.

Since inspection showed that the regression estimated values provided MANOVA and complement results at each

PAGE 44

percent level closer, in all instances, to that of the complete data set, it is apparently the stronger of the two estimation procedures. Both methods of estimation, though, were demonstrated to provide MANOVA results not significantly different from the results of the complete data sets.
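The orderings of percent levels reported under Further Results can be re-derived from the largest deviations quoted in this chapter. The sketch below simply tabulates those quoted figures and sorts them; the dictionary layout and the name `order_by_range` are illustrative assumptions, while the numbers themselves are the ones reported in the text for the 2½, 5, 10, 15, and 20 percent levels.

```python
LEVELS = ["2.5", "5", "10", "15", "20"]  # percent of the sample with a missing subscore

# Largest deviation of an estimated data set's F-ratio from the complete
# data set's F-ratio, as quoted in the text, in the order of LEVELS.
MAX_F_DEVIATION = {
    ("fourth", "mean"): [0.1267, 0.1859, 0.5650, 0.3063, 0.3305],
    ("fourth", "regression"): [0.0675, 0.0302, 0.1607, 0.1386, 0.1237],
    ("fifth", "mean"): [0.0329, 0.1268, 0.1006, 0.2364, 0.2711],
    ("fifth", "regression"): [0.0397, 0.1226, 0.0801, 0.0412, 0.0479],
}

# The same quantity for the complement of the cumulative distribution function.
MAX_P_DEVIATION = {
    ("fourth", "mean"): [0.001388, 0.001893, 0.003977, 0.002696, 0.002830],
    ("fourth", "regression"): [0.000798, 0.000375, 0.001688, 0.001496, 0.001361],
    ("fifth", "mean"): [0.000196, 0.000875, 0.000523, 0.001050, 0.001159],
    ("fifth", "regression"): [0.000245, 0.000842, 0.000427, 0.000255, 0.000299],
}


def order_by_range(deviations):
    """Percent levels sorted from the shortest range to the longest."""
    return [level for _, level in sorted(zip(deviations, LEVELS))]


for key, devs in MAX_F_DEVIATION.items():
    print(key, order_by_range(devs), order_by_range(MAX_P_DEVIATION[key]))
```

Sorting either table gives the same ordering for each grade and method, which is the sense in which the F-ratio orderings carry over to the complements.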

PAGE 45

CHAPTER V
DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS

Discussion

The intention of this study was to examine the effect of different estimators for missing multiresponse data on multivariate analysis of variance (MANOVA) results. Mean value and regression techniques were used in determining estimates. The MANOVA results for the data sets which employed the different estimation techniques were compared to each other and to MANOVA results of the complete data set.

Specifically investigated were the achievement test scores of a fourth-grade sample and a fifth-grade sample. Fifty MANOVAs were conducted on each grade; 25 analyzed the incomplete data sets with mean value estimates and 25 with regression estimates. The 25 analyses were subgrouped into five sets of analyses. Each set contained a different percent level of missing data. These levels were 2½, 5, 10, 15, and 20 percent of the complete sample. Five samples with different missing subsets of data were analyzed at each level.

The results of Chapter IV demonstrated that the MANOVA results of both estimation techniques did not differ

PAGE 46

significantly from one another nor from the results obtained from the complete data set. Inspection of the F-ratios and complements implied that the regression method was apparently the stronger estimation technique. The latter result was determined by the closeness of the values of the F-ratios and the complements of the cumulative distribution function for the estimated samples to that of the complete data set.

In addition, two a posteriori results were observed. It was found that a decrease in the percent of estimated data elements was not necessarily accompanied by a smaller range between the value of the F-ratio of the complete data set and the most distant value of the F-ratio of the data sets with estimated values. This held for both grades of students and both methods of estimation. It was likewise true for the complement of the cumulative distribution function.

A second finding was that the F-ratio of the complete data set was not within the range of the values of the F-ratios at all percent levels of missing data estimated by regression techniques. It did hold for mean value estimated data sets. The same findings occurred among the complements of the cumulative distribution function.

Conclusions

Three conclusions were drawn from the present study:

PAGE 47

1. Achievement data with up to 20 percent missing subscores that are estimated by mean value techniques when analyzed by MANOVA provide results which do not differ significantly from MANOVA results of the same achievement data without any missing subscores.

2. Achievement data with up to 20 percent missing subscores that are estimated by regression techniques when analyzed by MANOVA provide results which do not differ significantly from MANOVA results of the same achievement data without any missing subscores.

3. Achievement data with up to 20 percent missing subscores that are estimated by mean value techniques when analyzed by MANOVA provide results which do not differ significantly from MANOVA results of achievement data with up to 20 percent missing subscores that are estimated by regression techniques.

The above conclusions seem to suggest that there exist for educators alternatives in data analysis other than discarding incomplete multiresponse observations. The alternatives provided here are the two methods of estimation: mean value and regression. In addition, the mean value method of estimation was demonstrated to be as appropriate in MANOVA as the regression method, as proven by the nonrejection of the third hypothesis.

Further data considerations revealed that for all levels of missing data, the F-ratio of the complete data set was located within the range of the F-values determined for the data sets with missing subsamples estimated by the mean value methods. This did not hold for the regression method. Since the mean value method is straightforward and has been proved to be an appropriate estimation

PAGE 48

technique, data formerly lost to analysis can be retained. No longer must estimates for omissions be evaded because of complicated data manipulations, time, money, and resources.

Recommendations

The present study has operated under various limitations which need to be investigated in order to extend the inferences of this research. Bracht and Glass (1968) stated:

The intent (sometimes explicitly stated, sometimes not) of almost all experimenters is to generalize their findings to some group of subjects and set of conditions that are not included in the experiment. To the extent and manner in which the results of an experiment can be generalized to different subjects, settings, experimenters, and, possibly, tests, the experimenter possesses external validity. (pp. 437-438)

The external validity of this study is restricted by the lack of reported research dealing with statistical analyses which employ data estimates without parametric estimates. Areas which require further investigation in reference to inferential conclusions are presented in the following list:

1. The samples consisted of fourth and fifth graders. Other educational levels need to be examined.

2. Achievement scores for two levels of one standardized achievement test were analyzed. Other standardized achievement tests need to be investigated.

3. In addition to achievement tests, other types of tests which measure not only the cognitive domain but also the affective domain need to be studied, such as those dealing with self-concept and social acceptance.

PAGE 49

4. Other methods of estimation need to be considered in a manner similar to the present investigation and compared to mean value methods for accuracy and simplicity.

5. Missing subsamples were determined randomly. Actual missing subsamples need to be investigated for possible commonalities.

6. The levels of missing data should be expanded in order to determine maximum levels of missing subsamples.

7. More than one missing subscore per experimental unit needs inspection.

8. Experimental designs requiring analyses different from multivariate analysis of variance need probing.

These recommendations are listed not only to provide closure to the present study but also to indicate the multidirectional approaches involved in this specific area of research. Closure is provided with respect to confining the present research's inferences to the subset of investigations outside of the above listing. The expanse of additional approaches is suggested by the list itself. No one item of the list is more worthy of study than the other. All need investigation in order to advance to the universal set of estimators for omissions of multiresponse data.

PAGE 50

REFERENCES

Afifi, A. and Elashoff, R. M. "Missing observations in multivariate statistics I. Review of the literature." Journal of the American Statistical Association, 1966, 61, 595-604.

Afifi, A. and Elashoff, R. M. "Missing observations in multivariate statistics II. Point estimation in simple linear regression." Journal of the American Statistical Association, 1967, 62, 10-29.

Anderson, T. W. "Maximum likelihood estimates for a multivariate normal distribution when some observations are missing." Journal of the American Statistical Association, 1957, 52, 200-203.

Baird, H. R. and Kramer, C. Y. "Analysis of variance of a balanced incomplete block design with missing observations." Applied Statistics, 1960, 9, 189-198.

Bhargava, R. Multivariate tests of hypotheses with incomplete data. Applied Mathematics and Statistical Laboratories, Technical Report 3, 1962.

Bracht, G. H. and Glass, G. V. "The external validity of experiments." American Educational Research Journal, 1968, 5, 437-474.

Buck, S. F. "A method of estimation of missing values in multivariate data suitable for use with an electronic computer." Journal of the Royal Statistical Society, Series B, 1960, 22, 302-307.

Dagenais, M. G. "Further suggestions concerning the utilization of incomplete observations in regression analysis." Journal of the American Statistical Association, 1971, 66, 93-98.

PAGE 51

Dear, R. E. "A principal-component missing-data method for multiple regression models." SP-86, System Development Corporation, Santa Monica, California, 1959.

Dempster, A. P. "An overview of multivariate data analysis." Journal of Multivariate Analysis, 1971, 1, 316-346.

Edgett, G. L. "Multiple regression with missing observations among the independent variables." Journal of the American Statistical Association, 1956, 51, 122-131.

Federspiel, C. F., Monroe, R. J., and Greenberg, B. G. "An investigation of some multiple regression methods for incomplete samples." University of North Carolina, Institute of Statistics, Mimeo Series, No. 236, August 1959.

Glasser, M. "Linear regression analysis with missing observations among the independent variables." Journal of the American Statistical Association, 1964, 59, 834-844.

Haitovsky, Y. "Missing data in regression analysis." Journal of the Royal Statistical Society, Series B, 1968, 30, 67-82.

Hartwell, T. D. and Gaylor, D. W. "Estimating variance components for two-way disproportionate data with missing cells by the method of unweighted means." Journal of the American Statistical Association, 1973, 68, 379-383.

Hocking, R. R. and Smith, W. B. "Estimation of parameters in the multivariate normal distribution with missing observations." Journal of the American Statistical Association, 1968, 63, 159-173.

Hopper, M. J., comp. Harwell Subroutine Library: A Catalogue of Subroutines. London: Her Majesty's Stationery Office, State House, 49 High Holborn, 1970.

Kleinbaum, D. G. Estimation and hypothesis testing for generalized multivariate linear models. Doctoral dissertation, University of North Carolina, Chapel Hill, North Carolina, 1970.

Kramer, C. Y. and Glass, S. "Analysis of variance of a Latin square design with missing observations." Applied Statistics, 1960, 9, 43-50.

PAGE 52

Lord, F. M. "Estimation of parameters from incomplete data." Journal of the American Statistical Association, 1955, 50, 870-876.

Lord, F. M. "Estimation of latent ability and item parameters when there are omitted responses." Psychometrika, 1974, 39, 247-264.

Matthai, A. "Estimation of parameters from incomplete data with applications to design of sample surveys." Sankhya, 1951, 2, 145-152.

Mitra, S. K. "Some remarks on the missing plot analysis." Sankhya, 1959, 21, 337-344.

Morrison, D. F. "Expectations and variances of maximum likelihood estimates of the multivariate normal distribution parameters with missing data." Journal of the American Statistical Association, 1971, 66, 602-604.

Nicholson, G. E., Jr. "Estimation of parameters from incomplete multivariate samples." Journal of the American Statistical Association, 1957, 52, 523-526.

Preece, D. A. "Query and answer: Non-additivity in two-way classifications with missing values." Biometrics, 1972, 28, 574-577.

Pruzek, R. M. "Methods and problems in the analysis of multivariate data." Review of Educational Research, 1971, 41, 163-190.

Raffeld, P. C. The effects of Guttman weights on the reliability and predictive validity of objective tests when omissions are not differentially weighted. Doctoral dissertation, University of Oregon, 1973.

Rubin, D. B. "Characterizing the estimation of parameters in incomplete-data problems." Journal of the American Statistical Association, 1974, 69, 467-474.

Srivastava, J. N. "On the extension of Gauss-Markov theorem to complex multivariate linear models." The Annals of the Institute of Statistical Mathematics, 1967, 19, 417-437.

PAGE 53

Srivastava, J. N. "On a general class of designs for multiresponse experiments." The Annals of Mathematical Statistics, 1968, 39, 1825-1843.

Srivastava, J. N. and McDonald, L. "On the costwise optimality of hierarchical multiresponse randomized block designs under the trace criterion." The Annals of the Institute of Statistical Mathematics, 1969, 21, 507-514.

Srivastava, J. N. and McDonald, L. "On the costwise optimality of certain hierarchical and standard multiresponse models under the determinant criterion." Journal of Multivariate Analysis, 1971, 1, 118-

Srivastava, J. N. and Zaatar, M. K. "On the maximum likelihood classification rule for incomplete multivariate samples and its admissibility." Journal of Multivariate Analysis, 1972, 2, 115-125.

Trawinski, I. M. Incomplete-variable designs. Doctoral dissertation, Virginia Polytechnic Institute, Blacksburg, Virginia, 1961.

Trawinski, I. M. and Bargmann, R. E. "Maximum likelihood estimation with incomplete multivariate data." The Annals of Mathematical Statistics, 1964, 35, 647-657.

Walsh, J. E. "Computer-feasible general method for fitting and using regression functions when data are incomplete." SP-71, System Development Corporation, Santa Monica, California, 1959.

Wilkinson, G. N. "Comparison of missing value procedures." Australian Journal of Statistics, 1960, 2, 53-65.

Wilks, S. S. "Moments and distributions of estimates of population parameters from fragmentary samples." The Annals of Mathematical Statistics, 1932, 3, 163-195.

PAGE 54

BIOGRAPHICAL SKETCH

Stephen S. Sledjeski was born November 27, 1942, in Greenport, New York. He graduated from Southold High School, Southold, New York; the Diocesan Preparatory Seminary, Buffalo, New York (A.A.); St. Bonaventure University, St. Bonaventure, New York (B.S.); and the University of Florida, Gainesville, Florida (M.Ed., Ed.S., Ph.D.).

His educational employment experience consists of working as a middle school mathematics teacher with the Alachua County Board of Public Instruction, Gainesville, Florida; a research associate with Santa Fe Community College, Gainesville, Florida; supervisor of data processing as a graduate research assistant with the Florida Parent Education Model of Project Follow Through, University of Florida, Gainesville, Florida; and Research Specialist at P. K. Yonge Laboratory School, Gainesville, Florida. In addition, he has been a statistical and computer consultant for doctoral students, the Florida State Department of Health and Rehabilitation Services, and the Career Opportunities Program, Richmond, Virginia.

PAGE 55

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Vynce A. Hines, Chairman
Professor of Foundations of Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Ira J.
Graduate Research Professor of Foundations of Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Robert S. Soar
Professor of Foundations of Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Z. R. Pop-Stojanovic
Associate Chairman and Professor of Mathematics

PAGE 56

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Hattie Bessent
Assistant Professor of Foundations of Education

This dissertation was submitted to the Graduate Faculty of the College of Education and to the Graduate Council, and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

March, 1976

Dean, College of Education

Dean, Graduate School