THE IMPACT OF MULTIPLE IMPUTATIONS ON
THE ESTIMATION OF COEFFICIENT ALPHA
By
HON KEUNG YUEN
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2000
ACKNOWLEDGMENTS
I am indebted to a few special individuals who made this dissertation possible.
First, I would like to thank my wife, Kit, for her patience and understanding
throughout this process. I would also like to thank my committee members, Dr.
David Miller, Dr. Anne Seraphine, Dr. Kay Walker, and Dr. Arthur Newman, for
their time and support.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1. INTRODUCTION
   Statement of the Problem
   Rationale for the Study
   Purpose and Significance of the Study
2. REVIEW OF LITERATURE
   Common Missing Data Treatments
   Multiple Imputation
   Missing Data Mechanisms
3. METHODOLOGY
   Simulation Procedure
   Design of Study
   Multiple Imputation Procedure
   Evaluating the Performance of Multiple Imputation
4. RESULTS
5. DISCUSSION
   Limitations
   Suggestions for Future Research
REFERENCES
BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
THE IMPACT OF MULTIPLE IMPUTATIONS ON
THE ESTIMATION OF COEFFICIENT ALPHA
By
Hon Keung Yuen
August, 2000
Chairpersons: M. David Miller and Anne Seraphine
Major Department: Educational Psychology
The purpose of this dissertation is to investigate the accuracy of coefficient
alpha on tests when nonrandom missing data are replaced using multiple imputation
under a single-facet crossed model. The performance of multiple imputation was
evaluated under the conditions of three sample sizes (N = 50, 100, or 500), ten
conditions of distribution and percent of missingness, and two omitting patterns
(omitting item responses in the body and omitting responses at the end of the test).
The ten missing conditions were formed from examinees of the three ability levels
(high, medium, and low) with a differential number of missing items. Because the
missingness is nonrandom, examinees with low ability omit more difficult
items, or more items at the end of the test, than those with high ability. A twenty-item
test was used in this study. Results of the one thousand iterations indicated that the
magnitude of the bias obtained in the omitting pattern where missing responses are at
the end of the test was less than 0.03. In contrast, the magnitude of the bias obtained
in the omitting pattern where missing responses are in the body of the test was less
than 0.07. In general, the bias increased as the amount of missingness increased or as
the sample size decreased. However, this pattern is not uniform across all the missing
conditions investigated. Overall, this simulation study confirmed that multiple
imputation is a reasonably good procedure to replace the missing data on tests in
which missing responses are either in the body of the test or at the end of the test.
CHAPTER 1
INTRODUCTION
Accurate measurement of examinees' ability in standardized achievement
assessments requires the test scores to be reliably measured. Internal consistency is one
type of reliability that indicates how strongly the test items within the same construct are
correlated. Internal consistency of a test appeals to educators because it requires only a
single administration of one form of a test. Coefficient alpha (Cronbach, 1971) is a
commonly used index to estimate the internal consistency of a test. The index is not a
direct estimate of the theoretical reliability coefficient but is an estimate of the lower
bound of the internal consistency (Crocker & Algina, 1986). According to Peterson
(1994), the formula for computing coefficient alpha (α) can be expressed as

α = [s / (s − 1)] [1 − (Σ_{i=1}^{s} σ_i²) / σ_x²]    (1-1)

where
s is the number of items in the test,
σ_x² is the variance of the test scores (equal to Σ σ_i² + Σ_{i≠s} σ_i σ_s r_is),
σ_i² is the variance of a single item i,
σ_i σ_s r_is is the covariance between item i and item s, and
r_is is the correlation between item i and item s.
Equivalently,

α = s r̄ / [1 + r̄(s − 1)]    (1-2)

where r̄ is the average inter-item correlation.
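For concreteness, equation (1-1) can be verified numerically. The following Python sketch (not part of the original study; the four-person, three-item data set is hypothetical) computes coefficient alpha from a complete person-by-item matrix:

```python
import numpy as np

def coefficient_alpha(scores):
    """Coefficient alpha (equation 1-1) for a person-by-item score matrix."""
    scores = np.asarray(scores, dtype=float)
    s = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (s / (s - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical complete data: 4 examinees, 3 dichotomously scored items
data = [[1, 0, 1],
        [1, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]
print(coefficient_alpha(data))  # 0.75 for this matrix
```

Note that the function presumes a complete (balanced) matrix; the missing-data treatments discussed below exist precisely to produce such a matrix.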
As in the estimation of the Pearson product moment correlation coefficient,
computation of coefficient alpha requires a rectangular person-by-item data matrix with
no missing data (i.e., a balanced design data set). However, it is well known that missing
data is common in largescale standardized educational achievement tests such as the
National Assessment of Educational Progress (NAEP) (Koretz, Lewis, Skewes-Cox, &
Burstein, 1993; Longford, 1994) and the Test of English as a Foreign Language (TOEFL)
(Yamamoto, 1995). Yamamoto (1995) indicated that about 20% of examinees have
difficulty completing the last 20% of the items in the TOEFL. Two main classes of
nonrandom omitting patterns in a test have been identified: omitting item responses in the
body of the test and omitting item responses at the end of the test (i.e., not reached)
(Longford, 1994). A number of conceivable circumstances can contribute to these
occurrences. Angoff and Schrader (1984) found that response omissions in the body of
the test are common in tests whose instructions indicate that there is a penalty for incorrect
responses, but not for response omissions. In this situation, examinees are more likely to
omit difficult items when they are not sure of the answer (Koretz et al., 1993). For
omitting responses at the end of the test, time constraints are a major factor. Item
difficulty has also been reported to contribute to this type of omitting pattern (Koretz et al.,
1993). Cluxton and Mandeville (1979) found that less capable students tend to omit more
items at the end of the test.
Because of the balanced design requirement in the data set to compute coefficient
alpha, missing data present a challenge when standard methods of data analysis are used.
In the last few decades, a number of missing data treatments (MDTs) have been proposed
(see review in Little & Rubin, 1987). A promising MDT is multiple imputation (MI),
which was originally proposed by Rubin (1987). MI is a model-based estimation
technique for analyzing data with missing scores (Rubin, 1987). Using information from
the observed part of the data set, MI generates k sets of equally plausible values from the
simulated distribution of the missing data to replace the missing scores, where k is greater
than one. The missing scores are imputed k times (Rubin, 1987). As a result, MI creates k
versions of complete data sets with imputed values. Each complete data set can be
analyzed separately by means of standard complete-case analysis methods. The final
adjusted point estimate is obtained by averaging over the k intermediate parameter
estimates. MI has been shown to yield satisfactory parameter estimates with relatively
little bias (Graham & Schafer, 1999). However, MI has not been used widely in
educational settings except for matrix sampling and scaling procedures in the NAEP
(Mislevy, Johnson, & Muraki, 1992; Neal & Nianci, 1997).
Several recent studies compared different MDTs in estimating reliability
coefficients on measures with missing data (Downey & King, 1998; Harrison, 1998;
Marcoulides, 1990). Downey and King (1998) compared the accuracy of coefficient
alpha estimation using item mean and person mean substitution to replace missing data in
Likert scales. Results indicated that item-mean substitution reduces the reliability
estimate whereas person-mean substitution increases the reliability estimate of the scale
as the number of missing items and the number of respondents with missing items
increase beyond 20% (Downey & King, 1998). Marcoulides (1990) compared the
consistency and efficiency of two MDTs (restricted maximum likelihood and analysis of
variance) in estimating variance components on measures with missing data. He found
that restricted maximum likelihood (REML) produces a more efficient and less biased
variance estimate when 20% of the data are randomly deleted (Marcoulides, 1990).
Along the same line of research, Harrison (1998) evaluated six MDTs (listwise deletion,
zero imputation, substituting least square ANOVA, substituting probabilities of correct
answers from logistic regression estimates, Hoyt's ANOVA formula, and REML) in
estimating coefficient alpha on tests with dichotomously scored items under the
conditions of five random and nonrandom missing data patterns crossed with two sample
sizes (50 and 100). Results showed that REML provides reasonable accuracy and
precision for the estimation of coefficient alpha in all five missing data patterns
(Harrison, 1998).
Statement of the Problem
Results of Harrison's (1998) study indicated that the average bias of the
coefficient alpha when using each of the six MDTs is negligible (less than 0.05), except in
two nonrandom missing-data situations where a listwise deletion procedure is used. One
reason for the small discrepancy in the bias among the six MDTs is that the maximum
amount of missing data in Harrison's study is less than 11%. Roth (1994) cited several
simulated and empirical MDT studies indicating that there is little difference in parameter
estimates when the amount of missing data is less than 5-10%, regardless of the missing
data patterns (random or nonrandom). Roth (1994) suggested that the choice of MDTs
becomes more important when the amount of missing data in a data set is beyond
15-20%. Therefore, we still do not know how some of the MDTs behave in situations with a
moderate amount of missingness.
Harrison (1998) found that the mean coefficient alpha produced by REML is
more positively biased (i.e., overestimated) than that computed by Hoyt's ANOVA in
situations where lowability examinees have more omitted items or where they tend to
omit the most difficult items. There are two possible reasons for REML not behaving
well in Harrison's study. One is that REML has been shown to produce estimates that are
significantly biased in situations where sample size is small (N = 20 or 50) because
REML is based on largesample theory (Gross, 1997). The other is that REML produces
biased estimates when data are missing nonrandomly (Jamshidian & Bentler, 1999).
Rationale for the Study
The present study attempts to address some of the limitations of the REML
estimation procedure (Harrison, 1998) by implementing MI, which has been shown to
perform well with small sample sizes (Graham & Schafer, 1999) and in nonrandom
missing-data situations (Graham, Hofer, Donaldson, MacKinnon, & Schafer, 1997).
Although MI is commonly applied to missing continuous data, it may also be applied to
dichotomous missing data (Graham et al., 1997). Because Harrison's study examined the
effectiveness of different MDTs under slight levels of missingness, it is of interest to
examine MI under more extreme levels of missingness. The level of missingness is
therefore set as high as 30%. At this range of missingness, it becomes more apparent
how well MI performs.
It is well known that data missing completely at random (MCAR) seldom occurs
in educational settings (Kromrey & Hines, 1994). The present study, therefore, focuses
on nonrandom missing data.
Purpose and Significance of the Study
The purpose of this study was to investigate, via data simulation, the accuracy of
the coefficient alpha on tests with missing data replaced using MI. The performance of
MI was evaluated under the conditions of three sample sizes (N = 50, 100, or 500), ten
conditions of distribution and percent of missingness, and two omitting patterns (omitting
item responses in the body and omitting responses at the end of the test). The results of
this study provided an indication of how well MI performed in the above-stated missing
data conditions under a single-facet crossed model.
CHAPTER 2
REVIEW OF RELATED LITERATURE
The first section of this chapter provides an overview of some commonly used
missing data treatments (MDTs), which include listwise deletion, variable mean
substitution, regression imputation, and stochastic regression imputation. Limitations of
these MDTs are highlighted. The second section is devoted to the development and
theoretical framework of multiple imputation (MI), the relationship of MI and Bayes'
theorem, assumptions and characteristics of MI, the description of the imputation
methods, and procedures to perform MI. The last section discusses three major types of
missing data mechanisms as proposed by Little and Rubin (1987), and implications of
each missing data mechanism to the application of MI.
Common Missing Data Treatments
Listwise Deletion
In order to transform the missing data matrix into a rectangular one, a common
practice is to exclude those examinees who do not respond to all items. This is called the
listwise deletion procedure or complete-case analysis. The complete data with
reduced sample size are then used to estimate population parameters such as the
reliability coefficient. Listwise deletion is the default option for analysis in many popular
statistical software packages such as the Statistical Analysis System (SAS) and the
Statistical Package for the Social Sciences (SPSS-X). Even though listwise deletion is
the simplest approach to handling missing data, it is by no means the most desirable one. Since
the analysis is based on only those examinees who respond to all items, a substantial
amount of useful data is lost. In a Monte Carlo investigation, Kim and Curry (1977)
found that even with 2% random nonresponses on each of the 10 variables, listwise
deletion results in retaining only 81.7% of the cases. There is an accompanying loss
of efficiency or statistical power in the estimation of the population parameters, especially
when the amount of missing data is high. Raaijmakers (1999) demonstrated that listwise
deletion results in a loss of statistical power ranging from 35% to 98% as the amount of
missing data increases from 10% to 30% in various Likert-type data.
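Kim and Curry's retention figure follows directly from the arithmetic of independent nonresponse: a case survives listwise deletion only if all 10 variables are observed, each with probability 0.98. A one-line check, under that independence assumption:

```python
# Probability that a case is complete when each of 10 variables independently
# has a 2% nonresponse rate (cf. Kim & Curry, 1977)
p_complete = 0.98 ** 10
print(round(p_complete, 3))  # 0.817, i.e., about 81.7% of cases retained
```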
Listwise deletion is based on the assumption that the data are missing completely
at random even though there is little evidence to support this assumption in educational
research (Kromrey & Hines, 1994). When data are not missing completely at random,
estimates are biased. Empirical data indicate that the average bias increases as the amount
of missing data increases (Harrison, 1998). In Harrison's (1998) study, when item
responses are not missing at random, listwise deletion leads to range restriction. The
resulting mean coefficient alpha is then substantially underestimated (Harrison, 1998).
Single Imputation Procedures
Besides using deletion procedures to handle missing data, single imputation
procedures have also been used widely in educational research (see review in Raymond,
1987). Imputation involves filling in each missing response with a plausible value and
then analyzing the resulting data set with the imputed values. The plausible values are
estimated from the observed scores in the data set. Two major advantages of imputation
procedures are as follows:
1. They retain the information from incomplete cases without discarding any scores.
2. The resulting data set with the imputed values can be analyzed by means of standard
complete-case analysis methods.
The three single imputation procedures that are commonly used in educational research
are variable mean substitution, regression imputation, and stochastic regression
imputation (Raymond, 1987; Roth, 1994).
Variable Mean Substitution
To implement variable mean substitution or unconditional mean imputation, each
missing score of a particular item is replaced with its respective mean value of all
nonmissing cases. Even though it seems that the mean is a good estimate and the
procedure is relatively easy to implement, variable mean substitution has several serious
disadvantages. The observed variance of an item with imputed mean value is
systematically underestimated (i.e., negatively biased) because imputing a mean value in
an item is equivalent to adding zero to the sum of the squared deviations, which is the
numerator of the formula for calculating variance. At the same time, there is an increase
in the denominator of the variance formula, (N − 1), as the procedure attempts to restore
the original sample size (Landerman, Land, & Pieper, 1997; Raymond, 1987).
The attenuation of the covariance or correlation between scores with
filled-in means and scores on other items can be explained in a similar fashion.
Conceptually, the imputed values are a constant that is unrelated to scores on other
items; therefore, inter-item correlations are attenuated. Downey and King (1998) showed
that the severity of attenuation on correlation increases as the amount of imputed values
increases. As indicated in equation (1-2), a reduction in the average inter-item correlation
results in a decrease in the coefficient alpha (Downey & King, 1998). Figure 2-1
graphically shows that the coefficient alpha spuriously decreases when there is a reduction in
the average inter-item correlation at the lower end (i.e., r̄ < 0.2).
Another disadvantage related to the attenuation of variability of the item is that
the standard error of estimate is much too small, resulting in biased inferences (Little &
Rubin, 1989). Finally, variable mean substitution does not use information from other
items to improve the accuracy of imputation (Landerman et al., 1997).
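The variance attenuation described above is easy to demonstrate by simulation. The following Python sketch (hypothetical normal scores with roughly 30% of values deleted at random) shows that the variance of a mean-substituted variable is systematically smaller than the variance of the observed scores alone:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(50, 10, size=1000)      # hypothetical complete scores
missing = rng.random(1000) < 0.30      # delete roughly 30% at random

observed = y[~missing]
imputed = y.copy()
imputed[missing] = observed.mean()     # variable mean substitution

# The filled-in constants add zero to the sum of squared deviations while
# the denominator (N - 1) grows back to the full sample size, so the
# variance of the imputed variable is systematically too small.
print(observed.var(ddof=1), imputed.var(ddof=1))
```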
[Figure: plot of the coefficient alpha against the average inter-item correlation (0 to 1).]
Figure 2-1. Relationship between the coefficient alpha and the average inter-item
correlation when s equals 20.
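The steep lower end of Figure 2-1 can be reproduced directly from equation (1-2). This short sketch tabulates alpha for a 20-item test at several hypothetical values of the average inter-item correlation:

```python
# Coefficient alpha from the average inter-item correlation (equation 1-2)
# for a test of s = 20 items
def alpha_from_r(r_bar, s=20):
    return s * r_bar / (1 + r_bar * (s - 1))

for r_bar in (0.05, 0.1, 0.2, 0.4, 0.8):
    print(r_bar, round(alpha_from_r(r_bar), 3))
# Alpha climbs steeply at the low end (0.513 at r = 0.05, 0.833 at r = 0.2)
# and flattens thereafter, matching the shape of Figure 2-1.
```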
Regression Imputation
Regression imputation or conditional mean substitution is used to fill in missing
scores of an item with values predicted from a regression model by utilizing information
from one or more highly related observed variables or predictors. When the response
variable with missing scores is dichotomous in nature, a logistic regression model is used
instead. The logistic regression produces a predicted probability of a response being
missing.
Suppose Y is an N × 1 vector of the responses for examinees, Y = (y_1, ..., y_N), and
is composed of a set of observed scores, Y_o = (y_1, ..., y_{n_o}), and a set of missing
scores, Y_m = (y_{n_o+1}, ..., y_N). Y can be partitioned into Y = (Y_o, Y_m). Let n_o be the number of
respondents for Y_o and n_m be the number of nonrespondents for Y_m. X denotes an N × G
design matrix of relevant explanatory variables or predictors that are highly correlated
with Y. These variables are fully observed (i.e., with no missing data). X_o represents the
predictors for the n_o individuals with observed scores Y_o, and X_m represents the predictors for
the n_m individuals with missing scores Y_m. The Y_m scores are to be estimated. A linear regression
equation can be expressed as

Y = a + bX + e    (2-1)

where
a is a column vector of the intercepts,
b is a column vector of estimated regression coefficients, and
e is a column vector of estimated residuals and is set to zero.
The procedure to impute the predicted values based on a deterministic regression
model is as follows. First, use X_o to estimate the coefficients of the linear regression
equation by regressing Y_o on X_o. After estimating the regression coefficients from the
observed scores, predicted scores for Y_m can be obtained from the prediction equation
Ŷ_m = a + bX_m (Little & Rubin, 1989).
Landerman and associates (1997) explained that the distribution of Y based on
regression imputation is less distorted than Y based on mean substitution because the
imputed values are now distributed across the predicted values Ŷ_m instead of concentrated
at the mean. The variance of Y based on regression imputation is less attenuated than
that based on mean substitution because the numerator for calculating the variance is the sum of
squared deviations of the Ŷ_m from the grand mean, which is not likely to be zero.
In regression imputation, the imputed values of Y_m fall exactly on the predicted
regression line or plane because the estimated residual e in equation (2-1) is set to zero
(Landerman et al., 1997; Little & Rubin, 1989). Therefore, there is no variation in the
distribution of Y_m given X_m (Little & Rubin, 1989). Because of the exact linear (or planar)
relationship between Y_m and X_m, their correlation is spuriously inflated (Graham & Schafer,
1999). Harrison (1998) demonstrated that replacing missing data with either the least
squares estimates or the predicted probabilities overestimates the coefficient alpha. This
phenomenon can be explained by the positive relationship between the coefficient alpha
and the average inter-item correlation in equation (1-2). In addition, the severity of bias
(overestimation of the coefficient alpha) increases as the amount of missing data
increases (Harrison, 1998). Little (1992) concluded that mean substitution or regression
imputation can yield unbiased estimates of aggregate means but leads to distorted
variance and covariance estimates.
Regression imputation conveys a false sense of accuracy: that all missing scores
can be predicted from X_m without error. By treating the imputed values as the known
observed scores, regression imputation fails to account properly for the variability or
uncertainty about not knowing the missing scores (i.e., which value to impute) (Rubin &
Schenker, 1991). Failure of the regression model to incorporate residual variability in the
imputation variance leads to standard errors of estimates that are biased toward zero (i.e., too small)
(Little & Rubin, 1989). For example, Brownstone and Valletta (1996) found that the least
squares standard error estimates are 30% less than their true values.
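The inflation of correlations under deterministic regression imputation can be illustrated as follows. This Python sketch (hypothetical continuous data; in practice a logistic model would be used for dichotomous items) fits the regression on the observed cases, fills in predictions with e set to zero, and compares the correlation in the completed data with the correlation in the original complete data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(0, 1, n)                # fully observed predictor X
y = 2.0 * x + rng.normal(0, 1, n)      # response Y, some scores to be deleted
miss = rng.random(n) < 0.30            # treat 30% of Y as missing

# Regress Y_o on X_o, then fill in Y_m with predictions (residual e set to zero)
b, a = np.polyfit(x[~miss], y[~miss], 1)
y_imp = y.copy()
y_imp[miss] = a + b * x[miss]

# Imputed points lie exactly on the regression line, so the correlation
# in the completed data exceeds the correlation in the true complete data.
r_true = np.corrcoef(x, y)[0, 1]
r_completed = np.corrcoef(x, y_imp)[0, 1]
print(r_true, r_completed)
```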
Stochastic Regression Imputation
In order to restore the prediction errors in the imputed values (i.e., the variability
around the regression line), a random residual (error) is added to each predicted value.
The random residual can be drawn randomly with replacement either from a standard
normal distribution with a mean equal to zero and a standard deviation equal to the
standard error of estimate for Yo (Beaton, 1997), or from the distribution of residuals of
the regression estimate for Yo (Graham et al., 1997). The purpose of drawing with
replacement is to ensure that each drawn value has equal probability. In stochastic
regression imputation, each missing response is replaced by its conditional mean plus a
random residual from Yo (Little & Rubin, 1989).
However, stochastic regression imputation restores only one part of the
variability: the errors of prediction. There is another part, the sampling
variability, which reflects the uncertainty in the values of the estimated regression coefficients.
Graham and Schafer (1999) explained that the regression line estimated from Yo is not the
regression for the population, but is only an estimate from one sample. Stochastic
regression imputation cannot reflect properly the sampling variability because it lacks the
variation of the imputed values among several sets of imputations (Little & Rubin, 1989).
To incorporate the sampling variability in the estimation of the regression parameters,
multiple imputation (MI) is required (Rubin, 1987).
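A minimal sketch of stochastic regression imputation, drawing residuals with replacement from the observed-case residuals as described by Graham et al. (1997); the data and model here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(0, 1, n)                # fully observed predictor
y = 2.0 * x + rng.normal(0, 1, n)      # response with missing scores
miss = rng.random(n) < 0.30

b, a = np.polyfit(x[~miss], y[~miss], 1)
resid = y[~miss] - (a + b * x[~miss])  # residuals of the observed-case fit

# Conditional mean plus a residual drawn randomly with replacement
# from the distribution of observed-case residuals
y_imp = y.copy()
y_imp[miss] = a + b * x[miss] + rng.choice(resid, size=miss.sum(), replace=True)
```

Unlike the deterministic fill-in, the imputed values now scatter around the regression line, restoring the prediction error (though not the sampling variability).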
Multiple Imputation
Introduction
Multiple imputation was originally proposed by Rubin (1987). It is a model-based
estimation technique for analyzing data with missing scores (Rubin, 1987). Using
information from the observed part of the data set, MI generates k sets of equally
plausible values from the simulated distribution of the missing data to replace the missing
scores, where k is greater than one. The missing scores are imputed k times and multiple
imputations within one model are called repetitions (Rubin, 1987). As a result, MI creates
k versions of complete data sets with imputed values. Each complete data set can be
analyzed separately by means of standard complete-case analysis methods. The estimate
and its associated variance from each separate analysis can be combined to form an
unbiased final parameter estimate under the correctly specified model (Little & Rubin,
1989). The final variance incorporates the variability within the imputation (i.e., the
prediction error) and the variation of the imputed values among k sets of imputations (i.e.,
the sampling variability) to reflect the true accuracy of the estimation.
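The combining step can be sketched as follows. Under Rubin's (1987) rules, the final point estimate averages the k intermediate estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/k); the numbers below are hypothetical:

```python
import numpy as np

def pool(estimates, variances):
    """Combine k completed-data analyses into one inference (Rubin, 1987)."""
    k = len(estimates)
    q_bar = np.mean(estimates)            # final adjusted point estimate
    w = np.mean(variances)                # within-imputation variance
    b = np.var(estimates, ddof=1)         # between-imputation variance
    t = w + (1 + 1 / k) * b               # total variance of the estimate
    return q_bar, t

# Hypothetical intermediate alpha estimates and their variances from k = 3
# completed data sets
q_bar, t = pool([0.78, 0.80, 0.83], [0.004, 0.005, 0.004])
print(round(q_bar, 3), round(t, 5))
```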
Theoretical Framework
Let Q denote a scalar population quantity (such as a coefficient alpha or a
regression coefficient) to be estimated and let Q = Q (Yo, Ym) denote a function of the
observed and missing data. Multiple imputation uses information from the observed
scores Yo to replace the missing scores Ym, and then uses the complete data set with
imputed values to estimate the parameter Q. A distribution of the missing data is required
to generate the imputed values. The distribution is drawn from the observed scores Yo. It
is necessary to know the structural or model parameter of the observed scores, which is θ,
where θ represents a vector of q parameters. For example, θ = (μ, σ²) means that θ is a
function of the mean μ and the variance σ². Because θ is unknown, it must be estimated,
resulting in the random variable θ̂ (Michiels & Molenberghs, 1997). Because of the
uncertainty of not knowing θ, Rubin (1987) recommended using Bayesian methodology
to account for this uncertainty in MI. Through a Bayesian procedure, a distribution
function of θ in the form of the posterior probability distribution of θ can be obtained
from the data (Michiels & Molenberghs, 1997).
The derivation of MI is as follows (Rubin, 1996): Inferences for Q are based on
the actual posterior probability distribution of Q, f(Q | Y_o), which can be expressed as

f(Q | Y_o) = ∫ f(Q | Y_o, Y_m) f(Y_m | Y_o) dY_m    (2-2)

where
f denotes a probability distribution function,
f(Q | Y_o, Y_m) is the complete-data posterior probability distribution of Q and is expressed
as the conditional distribution of Q given both the observed and missing data, and
f(Y_m | Y_o) is the predictive probability distribution of the missing scores Y_m given the
observed scores Y_o.
Based on equation (2-2), the actual posterior probability distribution of Q at a
particular value Q_i can be obtained by drawing an infinite number of repeated
independent values for Y_m from f(Y_m | Y_o), calculating f(Q_i | Y_o, Y_m) separately for each
draw, and then averaging the values over the repeated imputations (Little & Schenker,
1995; Rubin, 1996).
The predictive distribution of the missing scores can be parameterized using a
structural parameter θ (Little & Schenker, 1995), and is expressed as

f(Y_m | Y_o) = ∫ f(Y_m | Y_o, θ) f(θ | Y_o) dθ    (2-3)

where
f(θ | Y_o) is the conditional distribution of θ given the observed scores Y_o, and
f(Y_m | Y_o, θ) is the conditional distribution of Y_m given the observed scores Y_o and the
parameter θ.
From the Bayesian perspective, drawing k values for the missing scores Ym in MI
involves two steps (Schafer, 1999):
Step 1. Simulate an independent random draw of the unknown parameter θ* from the
observed-data posterior distribution of θ:

θ* ~ f(θ | Y_o)

where θ* is a draw from the posterior distribution of θ, from which a distribution for the
missing scores is estimated.
Step 2. Randomly draw missing values Y_m* from the conditional predictive distribution of
Y_m given the parameter θ*:

Y_m* ~ f(Y_m | Y_o, θ*)
These two steps are repeated k times to yield k sets of imputed values for the missing
scores Y_m.
In principle, MI involves k repetitions of independent draws from the posterior
predictive distribution of Y_m by specifying a prior distribution of the unknown structural
parameter θ (Little & Schenker, 1995). This forms the k imputations for the missing
scores Y_m.
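For a single normally distributed variable with a noninformative prior, the two steps can be sketched as follows (a standard textbook special case, not the model used in this study; the sample sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
y_obs = rng.normal(10, 2, size=80)     # observed scores Y_o (hypothetical)
n_mis, k = 20, 5                       # number of missing scores, imputations

n = len(y_obs)
ybar, s2 = y_obs.mean(), y_obs.var(ddof=1)

imputations = []
for _ in range(k):
    # Step 1: draw theta* = (mu*, sigma2*) from the observed-data posterior
    # (noninformative prior for a univariate normal model)
    sigma2_star = s2 * (n - 1) / rng.chisquare(n - 1)
    mu_star = rng.normal(ybar, np.sqrt(sigma2_star / n))
    # Step 2: draw Y_m* from the conditional predictive distribution f(Y_m | theta*)
    imputations.append(rng.normal(mu_star, np.sqrt(sigma2_star), size=n_mis))
```

Each pass draws a fresh θ* before drawing Y_m*, which is what distinguishes these k imputations from k runs of stochastic regression imputation with a fixed fitted model.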
Bayes' Theorem
The aim of MI is to estimate the unknown structural parameter θ and to generate
the imputed values. Conditional upon a sample of observed scores Y_o, Bayes' theorem
makes inferences about the unknown parameter θ. Bayes' theorem represents the uncertainty
of not knowing θ by a prior probability distribution (Pollard, 1986).
The conditional distribution of θ given the observed scores Y_o, or the posterior
distribution of θ, is derived from a Bayesian procedure (Pollard, 1986) and is defined as
f(θ | Y_o) = f(Y_o | θ) f(θ) / f(Y_o)    (2-4)

where
f(Y_o | θ) is the conditional probability, or likelihood, of the observed scores Y_o,
f(θ) is the prior distribution of the unknown structural parameter θ and represents
uncertainty about the value of the parameter before any data are seen, and
f(Y_o) is the marginal probability of the observed scores Y_o for an examinee with parameter θ
randomly sampled from a population with the given distribution.
Since f(Y_o) is a constant that serves to make f(θ | Y_o) integrate to one,
equation (2-4) becomes

f(θ | Y_o) ∝ f(Y_o | θ) f(θ)    (2-5)

where ∝ indicates a relationship of proportionality.
Given Y_o, f(Y_o | θ) becomes a likelihood function for θ given Y_o, L(θ | Y_o). The posterior
distribution of θ is then written as

f(θ | Y_o) ∝ L(θ | Y_o) f(θ)    (2-6)

where L(θ | Y_o) expresses the information about the parameter provided by the observed
scores Y_o, and f(Y_o | θ) serves to convert the prior distribution f(θ) into the posterior
distribution f(θ | Y_o) (Pollard, 1986).
It is through the likelihood L(θ | Y_o) of f(Y_o | θ) that the observed scores Y_o
modify the prior distribution f(θ) to determine the posterior distribution f(θ | Y_o) (Pollard,
1986). In essence, Bayesian methodology specifies a distribution of what is expected to
occur based on prior information and combines it with new information (i.e., the
observed scores Y_o) to form inferences about θ.
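A worked special case may clarify the prior-to-posterior conversion: for a binomial likelihood with a Beta prior over a proportion θ, the posterior is again a Beta distribution by conjugacy. The numbers below are hypothetical:

```python
# Prior f(theta): Beta(2, 2) over a proportion theta.
# Observed data Y_o: 7 successes in 10 trials (binomial likelihood).
a0, b0 = 2, 2
successes, trials = 7, 10

# Conjugate update: posterior f(theta | Y_o) is Beta(a0 + 7, b0 + 3)
a1, b1 = a0 + successes, b0 + (trials - successes)
posterior_mean = a1 / (a1 + b1)
print(a1, b1, round(posterior_mean, 3))  # 9 5 0.643
```

The likelihood has pulled the prior mean (0.5) toward the observed proportion (0.7), exactly the mechanism by which Y_o modifies f(θ) in equation (2-6).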
Assumptions Underlying Multiple Imputation
The missing data mechanism is assumed to be ignorable or missing at random
(Graham & Schafer, 1999). Ignorable means it is not necessary to specify a nonresponse
model or to estimate its parameters in order to obtain valid likelihoodbased inferences.
Missing at random means that the missing data are a random sample from the complete
data after conditioning on the measured variables X in the imputation model (Schafer,
1997). However, Graham and associates (1997) have demonstrated that MI produces
satisfactory parameter estimates even when the ignorability assumption is suspect.
In addition, the variables in the data set are assumed to have a multivariate normal
distribution. Simulation studies (Graham, Hofer, & McKinnon, 1996; Wang, Anderson,
& Prentice, 1999) indicated that the MI estimator is robust even when the data
depart from multivariate normality.
Proper Imputation Method
An imputation method is regarded as proper when it incorporates appropriate
variability (i.e., uncertainty about the missing scores and the sampling variability) in
creating multiply imputed values under a correctly specified model (Rubin, 1987, 1996).
Rubin (1987) has shown that one way to achieve proper imputation is for the imputation
procedure to follow Bayes' theorem with infinite independent draws of Ym from its
posterior predictive distribution, as specified in equations (2-2) and (2-3). By incorporating
variability to adjust the standard error of estimates of the parameter, a proper imputation
method leads to valid inferences (Rubin & Schenker, 1991).
The conditions under which an imputation method is proper include the
following:
1. Imputed values are independent repeated draws from a Bayesian posterior predictive
distribution of the missing scores Ym given the observed scores, f(Ym | Yo) (Rubin,
1996).
2. Infinitely many (k → ∞) repeated imputations, since parameter estimates derived from
infinite draws of Ym are fully efficient (Little & Rubin, 1989).
3. The underlying model specification for the complete data is correct.
4. The underlying model specification for the missing data mechanism (i.e., assumptions
about the nonresponse) is correct.
5. Large sample size (N > 100) (Rubin & Schenker, 1986).
6. All causes of missingness are included in the imputation model (Graham et al., 1997).
Imputation Methods
Rubin and Schenker (1986) proposed two types of imputation methods: implicit
and explicit. Implicit or nonparametric methods are applicable for discrete data and
involve drawing values only from Yo and then assigning them to Ym. In contrast, explicit
or parametric methods are applicable for continuous data and involve a statistical model
to form the posterior predictive distribution of Ym, from which imputed values are
drawn. Unlike implicit methods, the values drawn under explicit methods need not appear
in Yo (Rubin & Schenker, 1986).
Implicit Methods
Simple hot deck procedure
The simple hot deck procedure involves random draws with replacement of n_m
imputed values for nonrespondents from the observed scores of matching respondents.
However, like stochastic regression imputation, the simple hot deck procedure ignores
sampling variability because the population distribution of (Ym | Yo) is not known; the
imputed values are estimated from the respondent scores Yo in one sample only (Little &
Schenker, 1995).
Approximate Bayesian bootstrap
In order to incorporate sampling variability in the estimated parameters,
approximate Bayesian bootstrap (ABB) is used (Rubin, 1987). The ABB creates k
repeated imputations from the posterior predictive distribution of the missing data as
follows:
1. Draw n_o values at random with replacement from the n_o observed values to create a
bootstrap sample distribution (such as a scaled multinomial distribution).
2. Then independently draw n_m missing values with replacement from the bootstrap
sample distribution (Rubin & Schenker, 1986).
This process is repeated k times to yield k sets of imputed values, and each set of
imputations comes from a different bootstrap sample of Yo.
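The two ABB steps can be sketched in a few lines of Python (an illustrative sketch only; the function name, arguments, and data are invented, and no particular software implementation is implied):

```python
import random

# Illustrative sketch of the approximate Bayesian bootstrap (ABB).
# Step 1 resamples the observed values; Step 2 draws imputations from
# that bootstrap sample rather than from the observed values directly.

def abb_imputations(y_obs, n_missing, k, seed=0):
    """Create k sets of n_missing imputed values from observed scores y_obs."""
    rng = random.Random(seed)
    n_o = len(y_obs)
    imputations = []
    for _ in range(k):
        # Step 1: bootstrap sample of the n_o observed values, with replacement
        boot = [y_obs[rng.randrange(n_o)] for _ in range(n_o)]
        # Step 2: independently draw the n_m missing values, with replacement,
        # from the bootstrap sample distribution
        imputations.append([boot[rng.randrange(n_o)] for _ in range(n_missing)])
    return imputations

sets = abb_imputations([3, 5, 7, 9, 11], n_missing=2, k=4)
```

Because each of the k imputation sets starts from a different bootstrap sample of Yo, the procedure propagates sampling variability into the imputations, which is what makes ABB approximately proper.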
Explicit Methods
Explicit methods define the model for the distribution of the response variable Y
(e.g., a normal linear regression model or logistic regression model) and a set of predictors
X that enter the model to create imputations (Little, 1992; Rubin & Schenker, 1991).
Fully normal imputation
Once again, suppose Y is an N × 1 vector of the responses for examinees, Y = (y_1,
..., y_N), composed of both a set of observed scores, Yo = (y_1, ..., y_{n_o}), and a set
of missing scores, Ym = (y_{n_o+1}, ..., y_N). Let n_o be the number of respondents for Yo and
n_m be the number of nonrespondents for Ym. The scores of Ym are to be estimated. Rubin
and Schenker (1986) described how to create multiple imputations under the independent
normal model, y_i ~ N(μ, σ²), for i = 1, ..., n_o, where θ = (μ, σ²) is unknown, and θ is a
function of the mean μ and the variance σ². When the prior distribution of θ, f(θ), is
proportional to 1/σ², the conditional posterior distribution of μ given σ², f(μ | σ², Yo), is
N(ȳ_o, σ² / n_o),
where ȳ_o is the sample mean of Yo, and is equal to (1/n_o) Σ_{i=1}^{n_o} y_i;
and the observed-data posterior distribution of σ², f(σ² | Yo), is
(n_o − 1)σ̂² / χ²_{n_o−1},
where σ̂² is the estimated variance of Yo, and is equal to (1/(n_o − 1)) Σ_{i=1}^{n_o} (y_i − ȳ_o)², and
χ²_{n_o−1} denotes a chi-square random variable with n_o − 1 degrees of freedom.
To create an imputation Ym* = (y*_{n_o+1}, ..., y*_N), for l = 1, ..., k, the following three
steps are required.
Step 1. Generate the unknown parameters θ* = (μ*, σ*²) from the observed-data posterior
distribution f(θ | Yo) by first randomly drawing the variance σ*² from (n_o − 1)σ̂² / χ²_{n_o−1},
and then randomly drawing the mean μ* from N(ȳ_o, σ*² / n_o).
Step 2. Independently draw the n_m missing values of Ym from y_i ~ N(μ*, σ*²),
for i = n_o + 1, ..., N (Rubin & Schenker, 1986; Schafer, 1999).
Step 3. Repeat the procedure k times (i.e., l = 1, ..., k) to yield k sets of proper imputations.
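The three steps above can be sketched as follows (an illustrative sketch, not the dissertation's own code; the function and variable names are invented, degrees of freedom are taken as n_o − 1, and the chi-square draw is obtained through the gamma distribution):

```python
import random
import statistics

# Illustrative sketch of fully normal imputation, y_i ~ N(mu, sigma^2).

def fully_normal_imputation(y_obs, n_missing, k, seed=0):
    rng = random.Random(seed)
    n_o = len(y_obs)
    ybar = statistics.fmean(y_obs)          # sample mean of Yo
    s2 = statistics.variance(y_obs)         # estimated variance of Yo
    df = n_o - 1
    imputed_sets = []
    for _ in range(k):                      # Step 3: repeat k times
        # Step 1: draw sigma*^2 from (n_o - 1) s^2 / chi^2_{n_o - 1},
        # then mu* from N(ybar, sigma*^2 / n_o)
        chi2 = rng.gammavariate(df / 2, 2)  # chi-square(df) via gamma(df/2, 2)
        sigma2_star = df * s2 / chi2
        mu_star = rng.gauss(ybar, (sigma2_star / n_o) ** 0.5)
        # Step 2: independently draw the n_m missing values from N(mu*, sigma*^2)
        imputed_sets.append(
            [rng.gauss(mu_star, sigma2_star ** 0.5) for _ in range(n_missing)]
        )
    return imputed_sets

sets = fully_normal_imputation([10, 12, 9, 11, 13, 10, 12], n_missing=3, k=5)
```

Drawing θ* = (μ*, σ*²) afresh in each repetition, rather than reusing the point estimates, is what injects the parameter uncertainty that makes the imputations proper.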
Markov Chain Monte Carlo
In addition to resampling procedures such as bootstrapping, which are
noniterative methods for creating the posterior predictive distribution from Yo,
Markov chain Monte Carlo (MCMC) offers a collection of iterative, simulation-based
methods for generating the posterior distribution of the unknown parameters θ, and these
methods do not require large samples to be effective (Little & Schenker, 1995). The data
augmentation algorithm (Tanner & Wong, 1987) and Gibbs sampling (Gelfand & Smith,
1990) are two of the most common MCMC methods used in MI (Little & Schenker, 1995).
They generate simulation-based estimates of the Bayesian posterior predictive distribution
of the missing data, f(Ym | Yo), and from it perform k independent random draws of Ym
(Schafer, 1997).
Normal Linear Regression Model
Y is modeled by a linear regression model, Y ~ N(Xβ, σ²), with a multivariate
normal distribution, where Xβ is a function of the parameters, X contains g
variables, β is the parameter vector of regression coefficients to be estimated, and σ² is
the regression variance. The algorithm for creating k multiply imputed values involves
the following steps (Rubin, 1987):
Step 1. Regress Yo on X_o to give the ordinary least squares estimates: the estimated
regression coefficient vector β̂ and the estimated regression variance σ̂².
β̂ = V X_o′ Yo (2-7)
where V = (X_o′ X_o)⁻¹.
The vector of predicted responses:
Ŷo = X_o β̂ (2-8)
The maximum likelihood estimator of σ²:
σ̂² = (Yo − Ŷo)′(Yo − Ŷo) / (n_o − g) (2-9)
This is the estimation task for the normal linear regression model (Rubin, 1987).
The imputation task for this model comprises Steps 2 to 4.
Step 2. Estimate σ*² (the square of a random error) to account for the deviations around the
regression line.
σ*² = σ̂²(n_o − g) / L (2-10)
for l = 1, ..., k, where L is a randomly drawn variate from a chi-square distribution
with n_o − g degrees of freedom.
Substituting σ̂² = (Yo − Ŷo)′(Yo − Ŷo)/(n_o − g) into equation (2-10), σ*² becomes (Yo − Ŷo)′(Yo − Ŷo)/L.
Step 3. Estimate the regression coefficients by adding a random error term to account for
the uncertainty about the regression prediction.
β* = β̂ + σ* V^{1/2} Z (2-11)
where
Z is a g-component vector of standard normal deviates, Z ~ N(0, I_g),
I_g is the identity matrix of order g,
Z is formed by drawing g independent variates from N(0, 1),
σ̂² is the mean square error of the regression equation,
σ̂²V is the variance-covariance matrix of β̂, and the square roots of its main diagonal are
the standard errors,
V^{1/2} is the triangular square root of V obtained from the Cholesky decomposition, and
σ* V^{1/2} Z represents the error term.
Each set of randomly drawn coefficients is then used to estimate the missing
values, and different sets of coefficients reflect the variation of the regression lines due to
sampling. Steps 2 and 3 constitute random draws from the posterior distribution of β (van
Buuren, Boshuizen, & Knook, 1999).
Step 4. Predict the missing values of Ym based on the following equation:
Ym* = X_m β* + σ* z (2-12)
where z is a vector of random normal deviates.
Each set of predicted values for Ym is based on a different set of regression predictions
and random components σ*.
Step 5. Repeat Steps 2 to 4 k times to create k sets of imputed values Ym¹, Ym², ..., Ym^k.
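Steps 1 to 5 can be sketched in a deliberately simplified form (an illustration, not Rubin's full matrix algorithm): with a single predictor and no intercept, g = 1, so V = (X′X)⁻¹ is a scalar and the Cholesky factor V^{1/2} reduces to its square root. All names and data below are invented.

```python
import random

# Simplified sketch of regression imputation with one predictor, no intercept.

def regression_imputation(x_obs, y_obs, x_mis, k, seed=0):
    rng = random.Random(seed)
    n_o, g = len(y_obs), 1
    # Step 1 (estimation task): OLS slope and residual variance
    V = 1 / sum(x * x for x in x_obs)                    # (X'X)^{-1}, scalar
    beta_hat = V * sum(x * y for x, y in zip(x_obs, y_obs))
    rss = sum((y - beta_hat * x) ** 2 for x, y in zip(x_obs, y_obs))
    sigma2_hat = rss / (n_o - g)
    sets = []
    for _ in range(k):                                   # Step 5: repeat k times
        # Step 2: sigma*^2 = sigma_hat^2 (n_o - g) / L, with L ~ chi^2_{n_o - g}
        L = rng.gammavariate((n_o - g) / 2, 2)
        sigma2_star = sigma2_hat * (n_o - g) / L
        # Step 3: beta* = beta_hat + sigma* V^{1/2} Z
        beta_star = beta_hat + (sigma2_star * V) ** 0.5 * rng.gauss(0, 1)
        # Step 4: Ym* = Xm beta* + sigma* z
        sets.append(
            [x * beta_star + sigma2_star ** 0.5 * rng.gauss(0, 1) for x in x_mis]
        )
    return sets

sets = regression_imputation([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.8],
                             x_mis=[2.5, 4.5], k=3)
```

With g predictors the same logic applies, except β* requires the full Cholesky factor of V and a g-vector Z of standard normal draws.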
RepeatedImputation Inferences
The final point estimate of the parameter Q approximates the actual posterior
mean of Q, E(Q | Yo). This equals the average of the repeated complete-data posterior
means of Q, and can be expressed as (Rubin, 1996):
E(Q | Yo) = E[E(Q | Yo, Ym) | Yo] (2-13)
where the outer E refers to the expectation over the repeated imputations, and E(Q | Yo, Ym)
approximates the complete-data posterior mean.
The final estimated variance of the parameter Q approximates the actual posterior
variance of Q, Var(Q | Yo). This equals the average of the repeated complete-data
posterior variances of Q plus the variance of the repeated complete-data posterior means
of Q, and can be expressed as (Rubin, 1996):
Var(Q | Yo) = E[Var(Q | Yo, Ym) | Yo] + Var[E(Q | Yo, Ym) | Yo] (2-14)
where Var refers to the variance over the repeated imputations, and Var(Q | Yo, Ym)
approximates the complete-data posterior variance.
Based on the above standard probability derivations (2-13 & 2-14), the k point
estimates and their associated variances obtained from the standard complete-case
analysis method can be combined into a final adjusted estimate and its estimated variance
using Rubin's formulas (Rubin, 1987).
After generating k imputed data sets using an appropriate imputation model and
method, and analyzing each of them separately using a standard complete-case analysis
method, MI yields k intermediate parameter estimates Q̂_i: (Q̂_1, ..., Q̂_k), and k associated
variance estimates Û_i: (Û_1, ..., Û_k), for i = 1, ..., k. The final adjusted point estimate Q̄
is obtained by averaging over the k intermediate parameter estimates, which is
Q̄ = (1/k) Σ_{i=1}^{k} Q̂_i (2-15)
The final estimated total variance is obtained as the sum of the average associated
variance within a set of k imputed values and the variance across independent sets of
imputed values, which is
T = Ū + (1 + k⁻¹)B (2-16)
where Ū is the average within-imputation variance within a set of k imputed values, and
is expressed as
Ū = (1/k) Σ_{i=1}^{k} Û_i (2-17)
and B is the variance across independent sets of imputed values, and is expressed as
B = (1/(k − 1)) Σ_{i=1}^{k} (Q̂_i − Q̄)² (2-18)
Bacik, Murphy, and Anthony (1998) indicated that the within-imputation variance
is a measure of the uncertainty about not knowing the missing data, and the between-
imputation variance is a measure of ordinary sampling variation. The inflation factor
(1 + k⁻¹) accounts for the simulation error in using a finite number of imputations
(i.e., k < ∞) (Barnard & Meng, 1999). Multiple imputation correctly adjusts the standard
error of estimates of the parameter by including both within- and between-imputation
variances.
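The combining rules in equations (2-15) to (2-18) can be sketched directly (an illustrative sketch; the function name and the numbers, standing in for k = 3 estimates of a parameter such as coefficient alpha, are invented):

```python
# Sketch of Rubin's combining rules, equations (2-15) to (2-18).

def combine(estimates, variances):
    k = len(estimates)
    q_bar = sum(estimates) / k                              # (2-15) final estimate
    u_bar = sum(variances) / k                              # (2-17) within
    b = sum((q - q_bar) ** 2 for q in estimates) / (k - 1)  # (2-18) between
    t = u_bar + (1 + 1 / k) * b                             # (2-16) total variance
    return q_bar, t

# e.g., three intermediate alpha estimates and their variances
q_bar, t = combine([0.82, 0.85, 0.80], [0.004, 0.005, 0.004])
```

Note that T exceeds the naive average variance Ū whenever the estimates disagree across imputations, which is how MI widens standard errors to reflect the missing data.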
When there are no missing data, Q̂_1, ..., Q̂_k are identical, the between-
imputation variance B becomes zero, and T is equal to Ū. When k = 1 (i.e., single
imputation), B cannot be estimated; T is then equal to Ū, and the variance is
systematically underestimated (Heitjan & Rubin, 1990). As k increases, the simulation
error in both Q̄ and T decreases, resulting in greater precision of the sample statistics
(Little & Schenker, 1995).
The extent of influence of missing data on the estimation of Q is determined by
both γ and r. The factor r estimates the proportional increase in variance due to missing
data, and can be expressed as
r = (1 + k⁻¹)B / Ū = γ / (1 − γ) (2-19)
where the ratio of B to Ū reflects how much information resides in the missing part of
the data relative to the observed part (Schafer & Olsen, 1998), and γ is an estimate of the
fraction of missing information about Q (Little & Schenker, 1995). Little and Rubin (1989)
pointed out that γ is equal to the fraction of data missing only when the missing data
mechanism is missing completely at random.
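Equation (2-19) translates into a two-line computation (illustrative; the function name and input values are invented, with Ū and B taken from the combining step):

```python
# Sketch of equation (2-19): proportional increase in variance r and the
# fraction of missing information gamma.

def missing_info(u_bar, b, k):
    r = (1 + 1 / k) * b / u_bar      # proportional increase in variance
    gamma = r / (1 + r)              # inverted from r = gamma / (1 - gamma)
    return r, gamma

r, gamma = missing_info(u_bar=0.004, b=0.001, k=5)
```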
Uncertainty
Since the imputed values are not the true observed scores, MI takes into account
the uncertainty about the true values of the missing scores in the parameter estimates by
drawing a parameter θ* from the observed-data posterior distribution f(θ | Yo) and then
drawing imputed values Ym from the conditional predictive distribution of Ym given that
parameter, f(Ym | Yo, θ*) (Rubin & Schenker, 1991).
In addition to incorporating the uncertainty about not knowing the missing scores,
MI also takes into account the fact that the population distribution of Ym given Yo is not
known, and is estimated from the observed scores Yo in one sample (Graham & Schafer,
1999; Little & Schenker, 1995). The variation in estimating the regression line is called
sampling variability (Graham & Schafer, 1999). The third source of uncertainty /
variability comes from the finite number of imputations derived from using
approximations to Bayesian posterior distributions, and is called simulation error (Rubin
& Schenker, 1991). Finally, by comparing parameter estimates across a number of
plausible missingdata models (i.e., sensitivity analysis), MI reveals uncertainty about
reasons for nonresponse (Beaton, 1997; Little & Rubin, 1989).
Number of Imputations
Under the ignorable response assumption, the final adjusted point estimate Q and
its estimated variance based on infinite number of imputations are the same as the ones
obtained from the maximum likelihood estimation (MLE), which is fully efficient and
correct (Little, 1992). The largesample efficiency of the point estimate Q based on k
imputations relative to that based on an infinite number of imputations is 1 + (7 / k), in
standard error units (Rubin, 1996; Schafer & Olsen, 1988). As illustrated, with 30%
missing data (y= 0.3), an estimate based on k = 3 imputations has a standard error of
1 + 0.3/3 = 1.05. This means the standard error is 5% wider than the one obtained from
MLE. Alternatively, the percent efficiency of Q is defined as 1/l + (y/k) (Rubin,
1987). In this example, the percent efficiency is 1/1.05 = .95 or 95%. This means the
efficiency of Q is 5% less than the one obtained from MLE. By increasing k to 5 and 10
imputations, it increases the efficiency of Q to 97% and 99.5% respectively. As shown in
Figure 22, unless the fraction of missing data is unusually high (70% or more), the
efficiency gained by implementing k beyond 510 is minimal. Rubin and Schenker (1986)
concluded that only a few number of repetitions (3 < k < 10) are needed to produce point
estimates that are close to fully efficient when the amount of missing data is moderate
(e.g., 30%).
[Figure omitted: percent efficiency (vertical axis) plotted against the number of imputations, 1 to 10 (horizontal axis).]
Figure 2-2. Percent efficiency of MI estimation using different numbers of imputations at
three levels of missingness (10%, 30%, and 50%).
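The efficiency calculation discussed above can be reproduced in a few lines (illustrative sketch; function names are invented, and efficiency is expressed in standard-error units as in the worked example):

```python
# Sketch of the standard-error inflation and percent efficiency of k
# imputations relative to an infinite number, for a given fraction of
# missing information gamma.

def se_inflation(gamma, k):
    return (1 + gamma / k) ** 0.5

def percent_efficiency(gamma, k):
    return 1 / se_inflation(gamma, k)

inflation = se_inflation(0.3, 3)         # about 1.05 for 30% missing, k = 3
efficiency = percent_efficiency(0.3, 3)  # about 0.95, i.e., 95%
```

Evaluating `percent_efficiency` over a grid of γ and k values reproduces the flattening pattern in Figure 2-2: beyond k of about 5 to 10, the curves barely rise unless γ is very large.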
Several empirical studies have found that standard error estimates of the parameters
were underestimated by 10-20% under single imputation when compared to those under MI
(Crawford, Tennstedt, & McKinlay, 1995; Heitjan & Rubin, 1990; Landerman et al.,
1997). Based on the results of two Monte Carlo simulation studies (Little & Rubin, 1989;
Rubin & Schenker, 1986), Table 2-1 compares the actual confidence
interval (CI) coverage for Q, when k = 1, 2, or 3, with the nominal coverage at 90%,
95%, and 99%. Under the ignorable response assumption, the fraction of missing data in
these two large-sample (N > 100) simulation studies was 30%. As indicated, when k = 1
(single imputation), the discrepancy between the actual and nominal coverage ranges
from 5-13%, whereas when k = 2, the discrepancy is only 2-3%. When k = 3, there is no
discrepancy at all, which means that inferences are valid.
Table 2-1. Analytic Large-Sample (N > 100) Coverage (in %) of Single (k = 1) and
Multiple (k = 2 or 3) Imputation Procedures with 30% Missing Data

          Nominal Coverage
k      90%      95%      99%
1       77       85       94
2       87       93       99
3       90       95       99

Note. Adapted from Little and Rubin (1989) and Rubin and Schenker (1986).
Rubin and Schenker (1986) demonstrated that k should increase from 2 to 3 as the
nonresponse rate increases from 10% to 60% in order to achieve a satisfactory CI
coverage (i.e., close to the nominal value). Rubin and Schenker (1986) also pointed out
that improvements in the actual CI coverage diminish as k increases. The differences in
standard errors or CI coverage between k = 5 and k > 25 have been shown to be
negligible (Heitjan & Rubin, 1990; Wang, Sedransk, & Jinn, 1992).
Based on the literature, the number of imputations depends on:
1. The amount of missing information. As the percentage of missing data increases, the
amount of uncertainty about the imputed values increases; accurately incorporating
this uncertainty requires an increase in the number of imputations. Since imputed
values are averaged over k imputations, the imputation variance is reduced as the
number of imputations increases (Kalton & Kasprzyk, 1986; Rubin, 1987).
2. The type of missing data mechanism. Based on several simulation and empirical
studies, Glynn, Laird, and Rubin (1993) and Raghunathan and Siscovick (1996)
demonstrated that nonignorable nonresponse patterns require a larger number of
imputations (k > 10) than ignorable nonresponse patterns to achieve satisfactory CI
coverage.
Advantages of Multiple Imputation
As in single imputation, the resulting data set with the imputed values can be
analyzed by means of standard completecase analysis methods. Because MI involves
averaging over k intermediate parameter estimates, the final point estimate derived from
MI is more efficient than that from single imputation (Rubin, 1996). The final estimated
variance also reflects the true variance of the parameter.
Studies affirmed that MI produces accurate standard errors (i.e., efficient) for
parameter estimates as it correctly adjusts for nonresponse bias (Heitjan & Little, 1991;
Rubin & Schenker, 1986; Xie & Paik, 1997). The estimated actual CI coverage is close to
the nominal levels (Little & Rubin, 1989; Rubin & Schenker, 1986; Wang et al., 1992),
which means MI yields valid inferences.
MI has been shown to yield satisfactory parameter estimates with relatively little
bias even under the following conditions:
1. Sample sizes are small (e.g., 50) (Graham & Schafer, 1999). Little (1992)
recommended using MI for small samples and MLE for large samples.
2. Data are missing in large amounts (e.g., 50%) (Graham & Schafer, 1999).
3. Models are relatively large and complex (e.g., an 18-predictor model) (Graham
& Schafer, 1999).
4. Ignorability assumption is suspect (Graham et al., 1996, 1997).
5. Data distribution is skewed (Graham et al., 1996; Wang et al., 1999).
6. Model of the data distribution is misspecified (Greenland & Finkle, 1995).
Empirical and simulation studies have shown that MI is far superior to deletion
procedures, mean substitution, regression imputation (Crawford et al., 1995; Graham et
al., 1996), and simple hot deck procedure (DeCanio & Watkins, 1998) with regard to
bias, efficiency, and validity of interval estimates when the underlying MI model
specification is correct.
Limitations of Multiple Imputation
Since the observed scores Yo provide indirect evidence about the likely values of
the missing ones Ym in MI, relevant predictors of Y (i.e., knowledge of the causes of
missingness) are essential to obtain unbiased estimates and valid inferences.
Summary
The development, theoretical framework, and assumptions of MI were
summarized. The procedures for performing MI based on a normal linear regression
model with a univariate Y variable, as well as how to combine k intermediate parameter
estimates into a final adjusted point estimate and its variance, were described. The two
main features of MI are: (i) it takes into consideration the uncertainty of not knowing the
exact values of the missing scores by incorporating the residual variation about the
regression prediction, and (ii) it incorporates the sampling variability to estimate the
population distribution of the missing scores, which are unknown. Because of these two
features, MI has been shown to yield satisfactory parameter estimates with relatively little
bias.
Missing Data Mechanism
Little and Rubin (1987) indicated that valid inferences from MI depend on the
inclusion of a correct mechanism that produces missingness, and that knowledge of the
missing data mechanism is important in selecting an appropriate imputation model. Rubin
(1976) defined the mechanism of missingness in terms of a probability distribution model
of nonresponse. Let R denote an N × 1 vector of binary missing-data indicators whose
distribution depends on a parameter vector ψ for the nonresponse model. If an
examinee responds to an item, R = 1; if an examinee omits an item, R = 0.
Since Y = (Yo, Ym), the probability distribution of the complete
data can be expressed as
f(Y | θ) = f(Yo, Ym | θ) (2-20)
Integrating over the sampling space of the missing scores Ym yields the marginal probability
distribution for the observed scores (Schafer, 1997):
f(Yo | θ) = ∫ f(Yo, Ym | θ) dYm (2-21)
The probability distribution for the observed scores given the parameters θ of the data
model and the parameters ψ of the missing data mechanism can be expressed as
f(Yo, R | X, θ, ψ) = ∫ f(Yo, Ym, R | X, θ, ψ) dYm (2-22)
where θ and ψ are sets of indexing vectors of unknown parameters for their respective
distributions. For example, the parameters in ψ are the proportions of examinees assigned
to each item (Bradlow & Thomas, 1998), whereas θ = (μ, σ²) or θ = (β, σ²).
Equation (2-22) can be factorized as
f(Yo, R | X, θ, ψ) = ∫ f(R | Yo, Ym, X, ψ) f(Yo, Ym | X, θ) dYm (2-23)
where
f(R | Yo, Ym, X, ψ) denotes the conditional distribution of R given Y and represents a
model for the missing data mechanism, and
f(Yo, Ym | X, θ) represents a model for the data.
Little and Rubin (1987) distinguished three types of missing data mechanisms:
missing completely at random (MCAR), missing at random (MAR), and nonignorable
missing (NIM).
Missing Completely at Random
The missing data mechanism is MCAR if the probability distribution of the missing data
indicator, f(R), is independent of both the observed scores Yo and the missing scores Ym in the
model, which means
f(R | Yo, Ym, X, ψ) = f(R | ψ) for all Y (2-24)
An example of MCAR in education occurs when the probability of missing responses to an
item in an achievement test depends on neither the examinees' ability nor the number of
instruction hours on test-taking skills received, and the number of instruction hours
received is known for all examinees.
As indicated in equation (2-24), the distribution of the missing data indicator R
does not depend on the missing scores or any covariates X. The first term of the marginal
probability distribution in equation (2-23) can therefore come out of the integral, and
equation (2-23) can be expressed as
f(Yo, R | X, θ, ψ) = f(R | ψ) ∫ f(Yo, Ym | θ) dYm (2-25)
From equation (2-21), when the MCAR assumption holds, the probability distribution of
the observed scores then becomes
f(Yo, R | X, θ, ψ) = f(R | ψ) f(Yo | θ) (2-26)
where
f(R | ψ) represents a model for the missing data mechanism, and
f(Yo | θ) represents a model for the conditional probability distribution of the observed
scores Yo.
Since Yo and R are independent in equation (2-26), the sampling distribution of
the observed scores is a marginal of the complete-data distribution (Laird, 1988). This
situation implies that sampling-based inferences, such as regression imputation, that make
use of the distributional properties of the marginal distribution of the observed scores are
unbiased and valid (Heitjan, 1997). However, MCAR makes the strongest assumption
among the three types of missing data mechanisms (Little, 1992).
The likelihood function of the observed scores under the MCAR assumption in
equation (2-26) can be factorized into two components, one pertaining solely to the
structural parameter θ of the model and the other pertaining solely to the nuisance
parameter ψ of the missing data mechanism:
L(θ, ψ | Yo, X, R) ∝ f(R | ψ) f(Yo | θ) (2-27)
When the joint parameter space of (θ, ψ) is the product of the parameter space of each
separately, that is, the two parameters θ and ψ are independent, the likelihood of θ based
on Yo, L(θ | Yo), is a function proportional to f(Yo | θ) (Chirembo, 1995),
L(θ | Yo) ∝ f(Yo | θ) (2-28)
Since L(θ | Yo) is proportional to f(Yo | θ), it is not necessary to specify the
missing data mechanism when using likelihood-based inferences to obtain unbiased
estimates (Laird, 1988). Under the MCAR assumption, Bayesian inference or maximum
likelihood estimation of the structural parameters θ will yield valid inferences from the
observed scores f(Yo | θ) without estimating the parameters ψ (Rubin, 1976). That is why
the missing data mechanism is ignorable for likelihood-based inferences.
The pattern of missingness on Y under the MCAR assumption is completely
random. The MCAR assumption can be assessed by comparing the distributions of the
missing variable Y for respondents and nonrespondents on covariates X to
check for evidence of a systematic difference between nonrespondents and respondents
(Curran, Bacchi, Hsu Schmitz, Molenberghs, & Sylvester, 1998).
Missing at Random
MAR is based on a weaker assumption. Under the MAR assumption, the
conditional probability distribution of the missing data indicator R given X depends on the
observed scores Yo, but not on the missing scores Ym (Little, 1995):
f(R | Yo, Ym, X, ψ) = f(R | Yo, X, ψ) for all Ym (2-29)
For example, the probability of missing responses to an item in an achievement test
depends on the scores of the measured variables (e.g., number of instruction hours on test-
taking skills received), but not on the missing scores of the item itself (e.g., item
difficulty).
As indicated in equation (2-29), the distribution of the missing data indicator R
does not depend on the missing scores. The first term of the marginal probability
distribution in equation (2-23) can therefore come out of the integral, and equation (2-23)
can be expressed as
f(Yo, R | X, θ, ψ) = f(R | Yo, X, ψ) ∫ f(Yo, Ym | X, θ) dYm (2-30)
As in equation (2-21), the probability distribution of Yo is the marginal probability
distribution:
f(Yo | X, θ) = ∫ f(Yo, Ym | X, θ) dYm (2-31)
The probability distribution of the observed scores under the MAR assumption then
becomes
f(Yo, R | X, θ, ψ) = f(R | Yo, X, ψ) f(Yo | X, θ) (2-32)
When data are MAR, the sampling distribution of the observed scores no longer
equals the ordinary marginal distribution, but depends upon the missing-data process (Laird,
1988); hence sampling-based inferences are biased. On the other hand, the likelihood
function of the observed scores under the MAR assumption in equation (2-32) can be
factorized into two components:
L(θ, ψ | Yo, X, R) ∝ f(R | Yo, X, ψ) f(Yo | X, θ) (2-33)
Once again, when the two parameters θ and ψ are functionally unrelated, the likelihood
of the structural parameters θ based on Yo is a function proportional to the marginal
probability distribution of Yo (Rubin, 1976):
L(θ | Yo, X) ∝ f(Yo | X, θ) (2-34)
Thus the missing data mechanism under the MAR assumption is also ignorable for
likelihood-based inferences (Rubin, 1976). In summary, when the missing data
mechanism is ignorable, the imputation model does not have to include a distribution of
the missing data indicator R, and the likelihood function of θ is based on only the
observed scores Yo.
Rubin (1976) showed that the response mechanism generating the missing data is
ignorable for likelihood-based inferences if the parameter θ of the data model and the
parameter ψ associated with the missing data mechanism are independent or functionally
unrelated, and the missing data are MAR.
Conceptually, the missing values under the MAR assumption are a random
sample from the complete data after conditioning on the measured variables X in the
imputation model; therefore, the process of creating these missing values can be modeled
using these variables (Barnard, Du, Hill, & Rubin, 1998). For example, the percent of
missing responses in an item of an achievement test differs in groups of examinees with
high, medium, and low cognitiveability scores, and the scores of the cognitive ability for
all examinees are known. Under the MAR assumption, the missing responses are
randomly distributed within these three subgroups of examinees, even when the
responses are not missing at random across subgroups (Roth, 1994). In other words, the
measured variables X (i.e., the cognitive ability in this example) can account for the
differences in the distribution of Y between nonrespondents and respondents (Little &
Schenker, 1995).
In addition to MAR, Roth (1994) identified another pattern of missingness when
missing data are related to other variables. In this pattern, missing data are nonrandomly
distributed across and within subgroups. For example, more scores are missing at the
bottom range of the high cognitive-ability group but relatively few at the
top range of the same group (Roth, 1994).
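The contrast between the mechanisms described above can be made concrete with a small simulation (a hypothetical sketch, not from the text; the covariate, probabilities, and names are invented). Under MCAR the nonresponse probability ignores all scores, whereas under MAR it depends on a measured covariate x (e.g., cognitive ability) but not on the score y itself:

```python
import random

# Hypothetical sketch: generating missing-data indicators R under MCAR
# and MAR. R = 1 means the examinee responded; R = 0 means an omission.

rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]  # (x, y) pairs

def mcar_indicator(x, y):
    return 0 if rng.random() < 0.3 else 1   # constant 30% nonresponse

def mar_indicator(x, y):
    p = 0.5 if x < 0 else 0.1               # depends on measured x only, not y
    return 0 if rng.random() < p else 1

r_mcar = [mcar_indicator(x, y) for x, y in data]
r_mar = [mar_indicator(x, y) for x, y in data]
```

A nonignorable mechanism would instead make `p` depend on the unobserved `y` itself, which is exactly the case the NIM discussion below addresses.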
Accessible Missing Data Mechanism
Graham and Donaldson (1993) defined the missing data mechanism as
"accessible" when the cause of missingness has been measured, whereas "ignorable"
refers to a combination of accessibility and proper use of the cause of missingness in the
analysis. Graham, Hofer, and Piccinin (1994) explained that unless the cause of
missingness is incorporated properly in the analysis, the mechanism will not be ignorable.
Schafer (1997) pointed out that whether the missing data mechanism is ignorable
is closely related to the fullness of the observed scores Yo, the relevant variables X (i.e.,
causes of missingness), and the complexity of the data model f(Yo | X, θ). If Yo and X
contain a lot of information for predicting Ym and are incorporated properly in the
imputation model for analysis, then the residual dependence of R upon Ym after
conditioning on Yo and X will be small (Schafer, 1997). Including relevant variables X
(covariates, variables that relate to the nonresponse, and predictive variables that explain
a considerable amount of the variance of Y in the model) helps to reduce the uncertainty of
the imputations (van Buuren et al., 1999), and thus to adjust for bias associated with the
missing data (Graham et al., 1994, 1997).
Barnard and Meng (1999) advocated the adoption of a "sensible imputation
model," which incorporates as many relevant variables for the cause of missingness as
possible while keeping the model building and fitting feasible, so as to reduce
multicollinearity problems. It has been suggested that including extra variables may affect
the precision, but not the bias, of the inferences (Rubin, 1987); on the other hand, leaving
out relevant causes of missingness will yield biased estimation (Schafer, 1999; Schafer &
Olsen, 1998).
Nonignorable Missing
Under the NIM assumption, the conditional probability distribution of the missing
data indicator R given X is a function of the missing scores Ym, or of the values of
unmeasured relevant variables, and possibly also of the observed scores Yo (Laird, 1988).
The unmeasured variables may be unavailable or inaccessible.
f(R | Yo, Ym, X, ψ) (2-35)
For example, the probability of missing responses to an item in an achievement test
depends on the missing scores of the item itself (e.g., item difficulty) and/or the
examinees' true unobserved parameter (e.g., the examinees' ability).
Since the conditional probability distribution (2-35) cannot be simplified, NIM
requires jointly modeling both the complete data for Y, f(Yo, Ym | X, θ), and the missing
data mechanism, f(R | Yo, Ym, X, ψ), with joint estimation of θ and ψ from Yo and R,
respectively (Schafer, 1997).
Little and Rubin (1987) suggested two ways to factorize the joint distribution of
the complete data Y and missing data indicator R. One is based on the selection models:
f(Y, RI X, y) =f(Y I )f( (R I Y, v) (236)
where
f(Y 0) is the model for the complete data Y, and
f(R I Y, V) is the model for the missing data mechanism.
The other is pattern-mixture models:
f(Y, R | φ, π) = f(Y | R, φ) f(R | π) (2-37)
where
f(Y | R, φ) represents the distribution of Y conditioning on the missing data indicator R,
f(R | π) represents the marginal distribution of the missing data indicator for whether or
not Y is missing, and
φ and π are the two unknown parameters corresponding to the two distributions.
Selection models specify the precise form of the nonresponse model, whereas
pattern-mixture models incorporate the assumption of the missing data mechanism
through restrictions on the parameters (Little, 1995). When R is independent of Y (i.e.,
when θ = φ and ψ = π), the missing data mechanism becomes MCAR, and the selection
models are equivalent to the pattern-mixture models.
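The equivalence claim can be checked by matching factors; the following derivation (an illustrative addition, using the symbols of equations (2-36) and (2-37)) makes the step explicit:

```latex
\begin{align*}
\text{Selection: } & f(Y, R \mid X) = f(Y \mid \theta)\, f(R \mid Y, \psi)
  \overset{R \,\perp\, Y}{=} f(Y \mid \theta)\, f(R \mid \psi) \\
\text{Pattern-mixture: } & f(Y, R \mid X) = f(Y \mid R, \phi)\, f(R \mid \pi)
  \overset{R \,\perp\, Y}{=} f(Y \mid \phi)\, f(R \mid \pi)
\end{align*}
% Matching the factors of the two right-hand sides term by term
% gives \theta = \phi and \psi = \pi, so under MCAR the two
% factorizations describe the same joint distribution.
```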
The likelihood function on the observed scores under the NIM assumption includes
the missing data indicator R and the missing data parameters ψ:
L(θ, ψ | Y_o, X, R) ∝ f(Y_o, R | X, θ, ψ) (2-38)
The joint distribution of Y and R typically involves more parameters, such as the
Y-R interaction term, than can be estimated from Y_o and R alone (i.e., it is underidentified)
(Little, 1995). In order to make them identifiable, so that valid likelihood-based inferences
can be made about the marginal responses, a restriction on the assumptions is required.
Schafer (1997) suggested that a priori restrictions be imposed on either the joint
parameter space for θ and ψ, or the Bayesian prior distribution f(θ, ψ).
Conceptually, under the NIM assumption, the distribution of respondents and
nonrespondents on Y differs systematically, even after conditioning on the values of
measured variables X (Rubin & Schenker, 1991). Compared to equation (2-3), the
posterior predictive probability distribution under the NIM assumption needs to include a
full specification of the probability model with the joint distribution of Y, the nonresponse
pattern R, and the measured variables X.
f(Y_m | Y_o, X, R) = ∫ f(Y_m | Y_o, X, R, θ) f(θ | Y_o, X, R) dθ (2-39)
Sensitivity Analysis
Often, little is known about the nonresponse mechanism that creates the
missing responses in a particular achievement test. Missing responses can arise for a
variety of reasons, including a combination of ignorable and nonignorable mechanisms
(Schafer & Olsen, 1998). However, distinguishing between ignorable and nonignorable
mechanisms (i.e., MAR and NIM) relies on fundamentally untestable assumptions
(Curran et al., 1998). Curran and associates (1998) demonstrated that these assumptions
cannot be tested formally from the empirical data at hand. Analyses should therefore be
conducted to compare the estimates across a number of plausible missing-data models. Inferences
from the sensitivity analysis reveal uncertainty about reasons for nonresponse (Beaton,
1997; Little & Rubin, 1989). Sensitivity analysis can also be conducted across alternative
imputing procedures in a similar manner to reveal uncertainty about different possible
imputation models.
Under the NIM assumption, sensitivity analysis can be performed by comparing
estimates between selection and patternmixture models. If the results are consistent,
confidence about the conclusions is established. On the other hand, if the results depend
on the form of the model, then more specific conditions can be suggested about where the
conclusion can apply (Little, 1995).
Summary
Valid inferences from MI rely on the inclusion of a correct missing data
mechanism. As discussed above, factorization of the posterior predictive probability
distribution depends on whether the missing data mechanism is ignorable or not. When
the missing data mechanism is not ignorable, the missing data indicator has to be
incorporated into the posterior predictive probability distribution, and the likelihood
function of 0 is not just based on the observed scores.
CHAPTER 3
METHODOLOGY
This chapter first describes the data generation procedure, the design of the study,
and the MI procedure. Data generation was based on the three-parameter logistic model
(Hambleton & Swaminathan, 1985). The design of this study involved the distribution
and percent of missing responses as a function of the ability of the examinees and the
difficulty of the items in one omitting pattern, and as a function of the ability of the
examinees and the sequence of the items in another omitting pattern. The procedure of MI
based on a logistic regression model with a univariate Y is outlined. The second part of this
chapter discusses how to evaluate the effectiveness of MI in handling missing data.
To achieve the goal of evaluating the effectiveness of MI in handling missing
data, several steps were required.
Step 1. Simulated a complete data matrix of item responses for a specified number of
examinees.
Step 2. Computed the coefficient alpha. The coefficient alpha of this original complete
data set (i.e., 0% missing) served as a benchmark for later comparison.
Step 3. Nonrandomly deleted a certain percent of item responses from the complete
examinee-by-item matrix generated in Step 1. Each missing data set was generated in a
similar fashion.
Step 4. Replaced the omitted item responses from Step 3 using MI.
Step 5. Computed the coefficient alpha of the data set that was restored by MI.
Step 6. Compared the coefficient alpha from Step 2 with the one from Step 5.
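Steps 1 through 6 can be sketched in code. The following is a minimal illustration, assuming Python with numpy; it uses a simplified Rasch-style generator and a crude single stochastic imputation as a stand-in for the full 3PL simulation and logistic-regression MI described later in this chapter, and all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def coefficient_alpha(data):
    # Coefficient alpha (= KR-20 for 0/1 items):
    # (s / (s - 1)) * (1 - sum of item variances / variance of total scores)
    s = data.shape[1]
    return s / (s - 1) * (1 - data.var(axis=0, ddof=1).sum()
                          / data.sum(axis=1).var(ddof=1))

# Step 1: simulate a complete 100-examinee x 20-item matrix (a Rasch-style
# toy generator here; the dissertation used the full 3PL model).
theta = rng.normal(size=(100, 1))
b = rng.normal(size=(1, 20))
prob = 1 / (1 + np.exp(-1.7 * (theta - b)))
complete = (rng.uniform(size=prob.shape) < prob).astype(float)

# Step 2: benchmark alpha on the complete data (0% missing).
alpha_true = coefficient_alpha(complete)

# Step 3: nonrandom deletion; low-ability examinees omit the hardest items.
incomplete = complete.copy()
low = theta[:, 0] < -1
hardest = np.argsort(b[0])[-8:]
incomplete[np.ix_(low, hardest)] = np.nan

# Steps 4-5: impute (a crude single stochastic imputation from observed
# item p-values, standing in for the logistic-regression MI of this
# chapter) and recompute alpha on the restored matrix.
pvals = np.nanmean(incomplete, axis=0)
mask = np.isnan(incomplete)
draws = (rng.uniform(size=incomplete.shape) < pvals).astype(float)
restored = np.where(mask, draws, incomplete)
alpha_mi = coefficient_alpha(restored)

# Step 6: compare the benchmark alpha with the restored-data alpha.
bias = alpha_true - alpha_mi
```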
Simulation Procedure
Let W be an N x P matrix representing a complete examinee-by-item data set,
where N is the number of examinees in the data set and P is the number of test items. In
this study, P was fixed at 20 in all conditions. A 20-item test was used because it
represents test lengths frequently found in educational and psychological applications
(Yen, 1987), such as the American Mathematical Association of Two-Year Colleges'
Student Mathematics League contest (Isaacson & Smith, 1993). The response to each
item was dichotomous in nature. Simulation of the 20 dichotomously scored item
responses for a specified number of examinees was based on the three-parameter logistic
model (Hambleton & Swaminathan, 1985).
P_i(θ_j) = c_i + (1 - c_i) exp[D a_i (θ_j - b_i)] / {1 + exp[D a_i (θ_j - b_i)]} (3-1)
for i = 1, ..., 20, and j = 1, ..., n,
where
P_i(θ_j) is the probability of the jth examinee with ability θ_j answering the ith item
correctly,
a_i is the discrimination parameter of item i,
b_i is the difficulty parameter of item i,
c_i is the pseudo-chance parameter of item i,
θ_j is the ability of the jth examinee, and
D is a scaling factor, which is 1.7.
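As an illustration, equation (3-1) can be computed directly (a sketch assuming Python; the function name is illustrative):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Equation 3-1: probability that an examinee with ability theta
    answers an item with parameters (a, b, c) correctly."""
    e = math.exp(D * a * (theta - b))
    return c + (1 - c) * e / (1 + e)
```

At theta = b the second term contributes half of (1 - c), and as theta decreases the probability approaches the pseudo-chance floor c.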
To compute P_i(θ_j), the three item parameters (i.e., a, b, and c) and the ability
parameter (i.e., θ) must be known. The three item parameters were drawn from
distributions of known mean and standard deviation. Harrison (1998) used the criteria of
Oshima's (1994) study to sample the three parameters. The criteria of Oshima's (1994)
study were as follows: the item discrimination parameters (a) were randomly drawn from
a lognormal distribution with a mean of 1.13 and a standard deviation of 0.63; the item
difficulty parameters (b) were randomly drawn from a normal distribution with a mean of
0 and a standard deviation of 1; and the pseudo-chance parameters (c) were randomly
drawn from a normal distribution with a mean of 0.25 and a standard deviation of 0.05.
According to Oshima (1994), the distributions of these three parameters were similar to
those of the real data set of a speeded test (i.e., the TOEFL) as reported by Way and Reese
(1991). The ability parameters θ for the examinees were randomly generated from a
standard normal N(0, 1) distribution. The present study used the same values of the three
item parameters as in Harrison's (1998, p. 7) study to generate the item responses (Table
3-1). The correlation between the item difficulty parameters (b) and the item
discrimination parameters (a) was .111 (p = .642, two-tailed), whereas the correlation
between the item difficulty parameters (b) and the pseudo-chance parameters (c) was
.281 (p = .230, two-tailed).
The response to a particular item by an examinee with ability θ_j was determined
by computing the probability P_i(θ_j) of correctly answering that item from the known
item and ability parameters. Since the item responses were dichotomous in nature, the
response probabilities P_i(θ_j) were converted into binary responses by comparison to a
random number r generated from a uniform distribution on the interval between 0 and 1.
The random number r was used to determine whether the score on a particular item was
correct or incorrect: if the response probability obtained from equation (3-1) was greater
than the random number r, a 1 was assigned, indicating that the examinee's response to
that particular item was correct; otherwise a 0 was assigned, indicating that the
examinee's response to that particular item was incorrect.
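The conversion from response probability to a binary score can be sketched as follows (assuming Python; names are illustrative):

```python
import random

def simulate_response(p_correct, rng=random):
    """Score an item 1 (correct) when its response probability exceeds
    a uniform [0, 1) draw, and 0 (incorrect) otherwise."""
    return 1 if p_correct > rng.random() else 0
```

Over many draws the proportion of 1s converges to the response probability, which is what makes the simulated matrix behave like real dichotomous item data.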
Kuder-Richardson 20 Formula
Since the test items are scored dichotomously, the Kuder-Richardson 20 formula
(KR-20) was used to calculate the index of internal consistency for the test items. The
Kuder-Richardson 20 formula is equivalent to coefficient alpha when the item responses
are dichotomous (Kuder & Richardson, 1937).
KR-20 = [s / (s - 1)] [1 - (Σ p_i q_i) / σ_x^2] (3-2)
where
s is the number of items in the test,
σ_x^2 is the variance of the test scores,
p_i is the proportion of subjects answering item i correctly,
q_i is the proportion of subjects answering item i incorrectly, and
p_i q_i is the variance of scores on a single item i.
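Equation (3-2) can be computed directly from a score matrix; the following sketch (assuming Python, with illustrative names and the population form of the variances) mirrors the terms defined above:

```python
def kr20(data):
    """Equation 3-2 applied to a list of examinee score lists (0/1 entries)."""
    n = len(data)                                   # examinees
    s = len(data[0])                                # items
    p = [sum(row[i] for row in data) / n for i in range(s)]
    sum_pq = sum(pi * (1 - pi) for pi in p)         # sum of p_i * q_i
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / n
    var_x = sum((t - mean_t) ** 2 for t in totals) / n   # sigma_x^2
    return s / (s - 1) * (1 - sum_pq / var_x)
```

For a matrix in which every examinee answers all items consistently (all 1s or all 0s), the item variances sum to exactly half the total-score variance structure needed for KR-20 = 1, i.e., perfect internal consistency.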
Table 3-1. Item Parameters Used for Test Simulation

Item    Discrimination (a)    Difficulty (b)    Pseudo-chance (c)
1       0.269                 0.772             0.176
2       0.236                 0.129             0.200
3       2.817                 0.979             0.257
4       8.565                 0.235             0.193
5       1.452                 0.072             0.264
6       1.043                 1.245             0.246
7       1.594                 1.504             0.229
8       1.258                 0.545             0.221
9       5.502                 0.802             0.250
10      2.468                 2.408             0.306
11      1.016                 0.048             0.231
12      3.413                 2.062             0.240
13      2.238                 0.262             0.287
14      2.370                 1.158             0.207
15      2.635                 0.314             0.276
16      0.533                 0.536             0.319
17      1.601                 1.177             0.320
18      2.809                 0.471             0.261
19      0.036                 0.475             0.297
20      7.637                 0.203             0.328

Note. Adapted from Harrison (1998).
Design of Study
This study used a 3 x [(3 x 3) + 1] x 2 design with three fully crossed factors:
sample size (3 levels); missing condition (10 levels, formed by crossing the percent of
examinees with missing items, 3 levels, with the percent of items missing for each
examinee with missing items, 3 levels, plus one additional condition with a
disproportional percent of examinees missing items nonrandomly distributed across and
within each ability group); and omitting pattern (2 levels). The rationales for selecting
the levels of each factor are described below.
Sample Size
The three levels of sample size chosen in this study were N = 50, 100, and 500.
The sample size of 50 examinees is typical for validation studies (Schmidt, Hunter, &
Urry, 1976). The sample size of 500 was the same as in the real data set that Raghunathan
and Siscovick (1996) used in studying the performance of MI. These three levels,
representing small to large sample sizes, were also used by Graham and Schafer
(1999) to evaluate the efficiency of MI in a simulation study. The present study adopted
these three levels of sample size to allow comparison of the performance of MI with that
of other MDTs investigated by Harrison (1998).
Distribution and Percent of Missing Responses
In order to simulate a more realistic distribution and percent of nonresponse
across test items, the distribution and percent of missing responses were based on the
findings from a large-scale study of the Reading Comprehension subtest, Level I, of the
Comprehensive Test of Basic Skills, Form S (Cluxton & Mandeville, 1979). In their
study, Cluxton and Mandeville (1979) stratified one thousand third-grade students into
three ability levels: high, medium, and low. They found the proportion of students with
missing items within each stratified ability level was 0-20% for the high-ability group,
20-80% for the medium-ability group, and 90-100% for the low-ability group. They also
reported the proportion of missing items (out of the 45 items in the subtest) for students
within each stratified ability level was approximately 7-16% for the high-ability group,
18-38% for the medium-ability group, and 40-49% for the low-ability group. The
correlation between the ability of students and the number of items missing in the body of
the test was .76, and the correlation between the ability of students and the number of
items missing at the end of the test was .47 (Cluxton & Mandeville, 1979).
Based on the range of the proportion of students with missing items within each
ability level, and the range of proportion of items missing provided in Cluxton and
Mandeville's (1979) study, the distribution and percent of missing responses in this study
were constructed in four steps:
First, the abilities of the examinees in a sample were rank ordered. Second, the
examinees in each data set were stratified into three ability levels. Stratification was
based on the assumption that the data were normally distributed N(0, 1). Plus and minus
one standard deviation in each sample were used as the cutoffs to stratify the three ability
groups. As a result, approximately 68% of the examinees fell within the one-standard-
deviation band and were classified as the medium-ability group, about 16% of the
examinees were above one standard deviation and were classified as the high-ability
group, and about 16% of the examinees were below minus one standard deviation and
were classified as the low-ability group.
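The stratification rule can be sketched as follows (assuming Python; the function name is illustrative and the theoretical cutoffs of plus and minus 1 are used, per the N(0, 1) assumption above):

```python
def stratify_by_ability(abilities):
    """Split examinee indices into low/medium/high ability groups at
    the +/- 1 standard deviation cutoffs of N(0, 1)."""
    groups = {"low": [], "medium": [], "high": []}
    for idx, theta in enumerate(abilities):
        if theta < -1:
            groups["low"].append(idx)
        elif theta > 1:
            groups["high"].append(idx)
        else:
            groups["medium"].append(idx)
    return groups
```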
Third, for the percent of examinees with missing items (%EMI), three conditions
(%EMI1, %EMI2, and %EMI3) were constructed. In the first condition, %EMI1, the
percents of the high-, medium-, and low-ability examinees missing some test items were
0%, 20%, and 90% respectively. In the second condition, %EMI2, the percents were 10%,
50%, and 95%. In the third condition, %EMI3, the percents were 20%, 80%, and 100%.
These three conditions respectively corresponded to the minimum, the median, and the
maximum percent of examinees with missing items in each ability level provided by
Cluxton and Mandeville (1979).
Fourth, for the percent of items missing for those examinees with missing item
responses (%IM), another three conditions (%IM1, %IM2, and %IM3) were constructed.
The first condition, %IM1, had 7% of the items missing in the high-ability group, 18% in
the medium-ability group, and 40% in the low-ability group. The second condition,
%IM2, had 12% of the items missing in the high-ability group, 28% in the medium-
ability group, and 45% in the low-ability group. The third condition, %IM3, had 16% of
the items missing in the high-ability group, 38% in the medium-ability group, and 49% in
the low-ability group. The three conditions respectively corresponded to the minimum,
the median, and the maximum percent of items missing in each ability level provided by
Cluxton and Mandeville (1979).
The above two sets of conditions were crossed to create nine missing conditions,
as shown in Figure 3-1. For example, one combination resulted in 20% of the
high-ability examinees with three missing items (i.e., 16% of the 20 test items), 80% of
the medium-ability examinees with eight missing items (i.e., 38% of the 20 test items),
and 100% of the low-ability examinees with ten missing items (i.e., 49% of the 20 test
items). The distribution and percent of missing responses represented the typical range of
missing data in educational tests, which is approximately 10-30% (Roth, 1994).
[Figure omitted: a 3 x 3 grid crossing the three %EMI conditions (rows; percents of
high-, medium-, and low-ability examinees with missing items of 0/20/90, 10/50/95, and
20/80/100) with the three %IM conditions (columns; percents of items missing of
7/18/40, 12/28/45, and 16/38/49).]
Note. Max = Maximum, Med = Medium, and Min = Minimum.
Figure 3-1. Distribution and percent of missing responses in the nine missing conditions.
In addition to the nine missing conditions, an additional condition was included in
which a disproportional percent of examinees omitted items that were nonrandomly
distributed across and within each ability group (Roth, 1994). For example, more items
were missing at the bottom range of the high-ability group and relatively fewer items
were missing at the top range of the same group (Roth, 1994). The procedure was to
stratify each ability group (high, medium, and low) into three substrata. Stratification
once again was based on plus and minus one standard deviation within each of the three
ability groups. The percents of examinees missing some items within the substrata
(%EMIs) were: 0, 10, 20 (in the high-ability group); 20, 50, 80 (in the medium-ability
group); and 90, 95, 100 (in the low-ability group). The corresponding percents of items
missing within the ability substrata (%IMs) were: 7, 12, 16 (for the high-ability group);
18, 28, 38 (for the medium-ability group); and 40, 45, 49 (for the low-ability group). The
two situations were then crossed to form the tenth condition. Table 3-2 summarizes the
distribution and percent of missing responses of the ten missing conditions.
Table 3-2. Summary of the Distribution and Percent of Missing Responses of the Ten
Missing Conditions
Condition Description
%EMI1 x %IM1 0% of the high-ability examinees having one missing item (i.e., 7% of the
20 test items) plus 20% of the mediumability examinees having four
missing items (i.e., 18% of the 20 items) plus 90% of the lowability
examinees having eight missing items (i.e., 40% of the 20 items). The total
percent of missing data is approximately 8.4%.
%EMI1 x %IM2 0% of the high-ability examinees having two missing items (i.e., 12% of
the 20 test items) plus 20% of the mediumability examinees having six
missing items (i.e., 28% of the 20 items) plus 90% of the lowability
examinees having nine missing items (i.e., 45% of the 20 items). The total
percent of missing data is approximately 10.5%.
%EMI1 x %IM3 0% of the high-ability examinees having three missing items (i.e., 16% of
the 20 test items) plus 20% of the mediumability examinees having eight
missing items (i.e., 38% of the 20 items) plus 90% of the lowability
examinees having ten missing items (i.e., 49% of the 20 items). The total
percent of missing data is approximately 12.6%.
%EMI2 x %IM1 10% of the high-ability examinees having one missing item (i.e., 7% of the
20 test items) plus 50% of the mediumability examinees having four
missing items (i.e., 18% of the 20 items) plus 95% of the lowability
examinees having eight missing items (i.e., 40% of the 20 items). The total
percent of missing data is approximately 13.3%.
%EMI2 x %IM2 10% of the highability examinees having two missing items (i.e., 12% of
the 20 test items) plus 50% of the mediumability examinees having six
missing items (i.e., 28% of the 20 items) plus 95% of the lowability
examinees having nine missing items (i.e., 45% of the 20 items). The total
percent of missing data is approximately 17.6%.
%EMI2 x %IM3 10% of the highability examinees having three missing items (i.e., 16% of
the 20 test items) plus 50% of the mediumability examinees having eight
missing items (i.e., 38% of the 20 items) plus 95% of the lowability
examinees having ten missing items (i.e., 49% of the 20 items). The total
percent of missing data is approximately 21.9%.
%EMI3 x %IM1 20% of the high-ability examinees having one missing item (i.e., 7% of the
20 test items) plus 80% of the mediumability examinees having four
missing items (i.e., 18% of the 20 items) plus 100% of the lowability
examinees having eight missing items (i.e., 40% of the 20 items). The total
percent of missing data is approximately 17.4%.
%EMI3 x %IM2 20% of the highability examinees having two missing items (i.e., 12% of
the 20 test items) plus 80% of the mediumability examinees having six
missing items (i.e., 28% of the 20 items) plus 100% of the lowability
examinees having nine missing items (i.e., 45% of the 20 items). The total
percent of missing data is approximately 23.8%.
%EMI3 x %IM3 20% of the highability examinees having three missing items (i.e., 16% of
the 20 test items) plus 80% of the mediumability examinees having eight
missing items (i.e., 38% of the 20 items) plus 100% of the lowability
examinees having ten missing items (i.e., 49% of the 20 items). The total
percent of missing data is approximately 30.2%.
%EMIs x %IMs 0% of the upper division of the highability examinees having one missing
item (i.e., 7% of the 20 test items) plus 10% of the middle division of the
highability examinees having two missing items (i.e., 12% of the 20
items) plus 20% of the lower division of the highability examinees having
three missing items (i.e., 16% of the 20 items) plus 20% of the upper
division of the mediumability examinees having four missing items (i.e.,
18% of the 20 test items) plus 50% of the middle division of the medium
ability examinees having six missing items (i.e., 28% of the 20 items) plus
80% of the lower division of the mediumability examinees having eight
missing items (i.e., 38% of the 20 items) plus 90% of the upper division of
the lowability examinees having eight missing items (i.e., 40% of the 20
test items) plus 95% of the middle division of the lowability examinees
having nine missing items (i.e., 45% of the 20 items) plus 100% of the
lower division of the lowability examinees having ten missing items (i.e.,
49% of the 20 items). The total percent of missing data is approximately 18.0%.
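The total percents reported in Table 3-2 follow from the group proportions and the omission rates. For example, using the exact normal group proportions (15.87%, 68.27%, 15.87%), condition %EMI1 x %IM1 works out to about 8.4% (a sketch assuming Python; the function name is illustrative):

```python
def overall_missing_rate(group_props, pct_emi, items_omitted, n_items=20):
    """Expected fraction of all responses deleted: sum over ability groups
    of (proportion of examinees in group) x (%EMI for the group)
    x (fraction of items each flagged examinee omits)."""
    return sum(gp * emi * k / n_items
               for gp, emi, k in zip(group_props, pct_emi, items_omitted))

# Condition %EMI1 x %IM1 (high, medium, low groups): group proportions from
# the +/- 1 SD cutoffs of N(0, 1), with 0/20/90% of examinees omitting
# 1/4/8 of the 20 items.
rate = overall_missing_rate([0.1587, 0.6827, 0.1587],
                            [0.00, 0.20, 0.90],
                            [1, 4, 8])
```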
Omitting Pattern
The two nonrandom omitting patterns were: (1) omitting item responses in the
body of the test (OPB), and (2) omitting item responses at the end of the test (OPE) (i.e.,
not reached). The first situation was similar to missing data mechanism 5 in
Harrison's (1998) study. However, in contrast to Harrison's (1998) study, where the
examinees with the lowest abilities missed the most difficult items, here examinees with
different levels of ability missed the most difficult items differentially: the high-ability
examinees missed fewer difficult items than the medium-ability examinees, who in turn
missed fewer difficult items than the low-ability examinees (see Figure 3-2).
The not-reached pattern was similar to the combination of missing data
mechanisms 2 and 3 in Harrison's (1998) study. However, the selection of missing
responses was based on the examinees' ability rather than being random, as in missing
data mechanism 2 of Harrison's (1998) study. Once again, the high-ability examinees
missed fewer items at the end of the test than the medium-ability examinees, who in turn
missed fewer items at the end of the test than the low-ability examinees (see Figure 3-3).
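The not-reached deletion rule can be sketched as follows (assuming Python with numpy; the function and its arguments are illustrative, and the within-group selection of which examinees omit items, the %EMI percentages, is left out):

```python
import numpy as np

def delete_not_reached(responses, abilities, items_by_group):
    """Sketch of the OPE (not-reached) pattern: each examinee's last k
    item responses are set to NaN, where k depends on the examinee's
    ability group (the +/- 1 SD cutoffs of N(0, 1))."""
    out = responses.astype(float).copy()
    for j, theta in enumerate(abilities):
        group = "high" if theta > 1 else ("low" if theta < -1 else "medium")
        k = items_by_group[group]
        if k:                      # guard: slicing with -0 would wipe the row
            out[j, -k:] = np.nan
    return out
```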
[Figure omitted: a 15-examinee x 20-item matrix of 0/1 responses, with items ordered
from least to most difficult and examinees ordered by ability; omitted responses fall in
the body of the test, with lower-ability examinees omitting more of the difficult items.]
Figure 3-2. Illustration of the omitting pattern of missing item responses in the body of
the test (OPB) with 15 examinees and 20 test items.
[Figure omitted: a 15-examinee x 20-item matrix of 0/1 responses, with examinees
ordered from high to low ability; omitted responses fall at the end of the test, with
lower-ability examinees failing to reach more of the final items.]
Figure 3-3. Illustration of the omitting pattern of missing item responses at the end of
the test (OPE) with 15 examinees and 20 test items.
Cause of Missingness
Under the omitting pattern in which item responses were omitted in the body of the
test, the examinees' ability and the item difficulty provided indirect evidence about the
likely values of the missing responses. On the other hand, when the item responses were
omitted at the end of the test, the examinees' ability and the item effect, which is the
random effect of the test items, provided indirect evidence about the likely values of the
missing responses. Since the cause of missingness in this study was under the
researcher's control, and the differential amount of missing responses was a function of
the examinees' ability and item difficulty, or of the examinees' ability and item effect,
depending on the omitting pattern, the missing data mechanism could be considered
missing at random (Graham et al., 1997). The missing data mechanism was therefore
ignorable.
Iterations
For each of the 60 conditions (3 levels of sample size x 10 levels of distribution
and percent of item-response omission x 2 levels of omitting pattern), one thousand
iterations were performed to ensure stable results. One thousand iterations have been
used in two previous simulation studies evaluating the efficiency of MI (Glynn,
Laird, & Rubin, 1993; Graham et al., 1996). The iterations resulted in 1,000 replicated
data sets for each level of sample size.
Multiple Imputation Procedure
Let Y be an N x 1 vector of measures with Y ~ N(Xβ, σ^2), where X was a matrix
of covariates: examinees' ability and item difficulty variables when the omitted item
responses were in the body of the test, or examinees' ability and item effect variables
when the omitted item responses were at the end of the test; β was a vector of regression
parameters to be estimated. The distribution of Y was assumed to be multivariate normal.
The algorithm for creating the ten multiply imputed Y_m involved the following steps
(Freedman, 1990; Freedman & Wolf, 1995):
Step 1. Specified a particular form of imputation model to predict the value of a missing
variable Y, and estimated the parameter vector β of regression coefficients using the
portion of the sample with complete data. Since the item responses in this study were
dichotomous in nature, the prediction model was a logistic regression model with a
univariate Y of the form
Logit(p_j) = ln[p_j / (1 - p_j)] = β_0 + β_1 X_1j + β_2 X_2j (3-3)
where j = 1, ..., n examinees.
The set of predictors entered into the explicit model to create imputations for OPB
differed from that for OPE. When the omitted item responses were in the body of the
test, X_1j was the examinee's ability and X_2j was the item difficulty; when the omitted
responses were at the end of the test, X_1j was the examinee's ability and X_2j was the
item effect, the random effect of the 20 items. As a result, there were two distinct
logistic regression models for the MI procedure.
The posterior probability of a correct response given X_1j and X_2j was
p_j = E(Y_j | X_1j, X_2j) = Pr(Y_j = 1 | X_1j, X_2j) (3-4)
The logistic regression model assumed that the logit of the posterior probability was a
linear combination of the X_1j and X_2j variables.
Y_j = 1 if logit^{-1}(x_j'β) + ε_j > u, and Y_j = 0 otherwise, (3-5)
where ε_j was a random error.
Regressing Y_o on the corresponding X matrix gave the ordinary least squares estimates:
the estimated regression coefficient vector β̂ and the estimated variance-covariance
matrix Σ̂.
Step 2. Randomly drew from the sampling distribution of the regression coefficients.
Estimated the regression coefficients by adding random error to account for the
uncertainty about the regression prediction, as in equation (2-11). Each repetition used a
distinct value of β*, common across all imputed cases, drawn independently from the
multivariate normal distribution of the estimated vector β̂.
Step 3. Given an estimate of β and the values of X_1 and X_2, the probability of answering
an item correctly could be predicted with equation (3-5) by drawing a value of u from the
uniform [0, 1] distribution. Each set of predicted values Y_m was based on a different set
of regression predictions and an independently drawn value of u. The probability of a
correct response in the kth repetition, p_j^(k), was calculated from the randomly selected
regression coefficients and the values of the corresponding covariates in the logistic
regression:
p_j^(k) = 1 / {1 + exp[-(β_0^(k) + β_1^(k) X_1j + β_2^(k) X_2j)]}
Step 4. The estimated probability p_j^(k) from the logistic regression was compared to a
random number t from the uniform [0, 1] distribution for each missing score. If the
random number t was less than the predicted probability p_j^(k), the imputed value for
Y_j^(k) was assigned a 1; otherwise the imputed value was assigned a 0. These
probabilities were thus used to impute the missing scores.
Step 5. Conducted ten repetitions, that is, repeated Steps 2 to 4 ten times to create a
series of ten imputed values (i.e., ten distinct imputed data sets).
Step 6. Computed the KR-20 (i.e., coefficient alpha) separately in each of the ten imputed
complete data sets. This resulted in ten separate coefficient alphas.
Step 7. Using equation (2-15), the final adjusted coefficient alpha was obtained by
taking the simple arithmetic average of the ten coefficient alphas. This final coefficient
alpha was then compared to the one obtained from the original complete data set.
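Steps 1 through 5 can be sketched as follows (assuming Python with numpy; this is an illustrative reimplementation with hypothetical names, using a Newton-Raphson logistic fit in place of whatever estimation routine was actually used):

```python
import numpy as np

def multiply_impute(y, X, m=10, rng=None):
    """Sketch of Steps 1-5 above: logistic-regression imputation of a
    dichotomous response vector y (np.nan marks omitted responses),
    drawing fresh regression coefficients for each of the m imputations."""
    rng = np.random.default_rng(rng)
    obs = ~np.isnan(y)
    Xo, yo = X[obs], y[obs]
    # Step 1: fit the logistic model on the observed portion by
    # Newton-Raphson (iteratively reweighted least squares).
    beta = np.zeros(X.shape[1])
    for _ in range(25):
        p = 1.0 / (1.0 + np.exp(-Xo @ beta))
        W = p * (1.0 - p)
        H = (Xo * W[:, None]).T @ Xo              # information matrix
        beta += np.linalg.solve(H, Xo.T @ (yo - p))
    cov = np.linalg.inv(H)                         # approx. covariance of beta-hat
    imputations = []
    for _ in range(m):
        # Step 2: draw beta* from N(beta-hat, cov) to propagate
        # estimation uncertainty into the imputations.
        b_star = rng.multivariate_normal(beta, cov)
        # Steps 3-4: compare each predicted probability with a
        # uniform [0, 1) draw to impute a 0/1 response.
        p_mis = 1.0 / (1.0 + np.exp(-X[~obs] @ b_star))
        y_k = y.copy()
        y_k[~obs] = (rng.uniform(size=p_mis.size) < p_mis).astype(float)
        imputations.append(y_k)
    return imputations
```

Steps 6 and 7 would then compute KR-20 on each of the m returned data sets and average the results.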
Evaluating the Performance of Multiple Imputation
The accuracy (bias and precision) of the coefficient alpha obtained from the
restored complete data set in each of the ten missing conditions using MI was assessed by
means of the bias and the root-mean-square error (RMSE). Measures of bias and
RMSE were averaged over the 1,000 iterations of the simulation.
Bias is defined as the average value of the coefficient alphas derived from the
original complete data set with no missing data minus the average value of the coefficient
alphas from the corresponding imputed data set over the 20,000 (i.e., 2 x 10 x 1000)
completed tests for a particular number of examinees. The estimated coefficient alpha is
unbiased when the average deviation (i.e., bias) between the coefficient alpha obtained
from the imputed values and that of the original values in the data set is close to 0.
RMSE is defined as the square root of the average squared difference between the
coefficient alpha derived from the original complete data set with no missing data and the
coefficient alpha from the corresponding imputed data set. The estimated coefficient
alpha is precise when the RMSE is close to 0.
RMSE = √{mean[(α of original data - α of restored data using MI)^2]} (3-6)
The relationship between RMSE and bias is
(RMSE)^2 = (bias)^2 + (SE)^2 (3-7)
where
RMSE is the root mean square error, which represents an overall error,
bias is the average deviation, which represents a systematic error, and
SE is the standard error, which represents random error.
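The two criteria and their relation in equation (3-7) can be computed directly (a sketch assuming Python; names are illustrative):

```python
import numpy as np

def bias_and_rmse(alpha_complete, alpha_restored):
    """Bias, RMSE, and SE over iterations, per the definitions above;
    the identity (RMSE)^2 = (bias)^2 + (SE)^2 (equation 3-7) holds
    exactly when SE is the population standard deviation of the
    alpha differences."""
    d = np.asarray(alpha_complete) - np.asarray(alpha_restored)
    bias = d.mean()                    # systematic error
    rmse = np.sqrt((d ** 2).mean())    # overall error
    se = d.std()                       # random error (population std)
    return bias, rmse, se
```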
CHAPTER 4
RESULTS
In this chapter, the results of the analyses of the data for the two performance criteria
are presented. The two criteria are the bias and the root mean square error (RMSE). The
mean coefficient alpha and its standard deviation of the original complete data set with no
missing data for 50, 100, and 500 examinees were M = 0.765, SD = 0.033; M = 0.764,
SD = 0.023; M = 0.763, SD = 0.01 respectively. Each mean coefficient alpha was based
on the average of 20,000 (10 missing conditions x 2 omitting patterns x 1000 iterations)
values. The results of these mean coefficient alphas were very close to those computed by
Harrison (1998). For example, the mean coefficient alpha for the sample size of 50 in
Harrison's study was 0.769. The mean coefficient alphas and their standard deviations for
the restored completed data sets using MI for the ten missing conditions in each of the
three levels of sample size and two levels of omitting pattern are shown in Figures 4-1 to
4-6.
The biases obtained in each of the ten missing conditions for the two omitting
patterns are summarized in Tables 4-1 and 4-2. Under the omitting pattern where missing
responses are at the end of the test (OPE), the biases (in absolute value) ranged from
0.000 to 0.030. The majority (93%) of the biases in OPE were in the magnitude of less
than 0.02. The biases (in absolute value) obtained in OPE for the sample size 50, 100, and
500 ranged from 0.001 to 0.030, 0.000 to 0.016, and 0.000 to 0.009 respectively. The
biases obtained in the omitting pattern where missing responses are in the body of the
test (OPB) were noticeably higher than those in OPE of the corresponding missing
conditions, the ), the biases (in absolute value) ranged from 0.019 to 0.069. The majority
(97%) of the biases in OPB were less than 0.06. The biases (in absolute value) obtained
in OPB for the sample size 50, 100, and 500 ranged from 0.027 to 0.069, 0.019 to 0.051,
and 0.028 to 0.054 respectively. As expected, the largest bias was in the missing
condition %EMI3 x %IM3 where the small sample size (50) accompanied with the largest
percent (30.2%) ofmissingness. The bias in this condition was 0.069. The coefficient
alphas in OPB were always overestimated (positively biased), whereas in OPE, about half
of the coefficient alphas obtained through MI was overestimated and the other half was
underestimated. Whether MI produced the coefficient alphas that were overestimated or
underestimated in OPE did not depend on the percent ofmissingness. Further research
needs to be conducted to explore why some of the coefficient alphas obtained from MI
were overestimated while others were underestimated under the same omitting pattern.
For condition %EMIs x %IMs, in which the nonrandom distribution of omissions occurs
across and within each ability group (Roth, 1994), the bias obtained, regardless of the
omitting pattern, was similar to that of other missing conditions where the percentage
of missingness was about the same (e.g., missing condition %EMI2 x %IM2). The
RMSEs obtained in each of the ten missing conditions for the two omitting patterns are
summarized in Tables 4-3 and 4-4. The results were very similar to those obtained for the
bias.
In general, the bias (in absolute value) and the RMSE increased as the amount of
missingness increased. Graham and Schafer (1999) explained this phenomenon by
suggesting that MI introduces bias when handling missing data. However, the pattern of
increase in the bias and the RMSE was not unidirectional, as indicated in Tables 4-1 to
4-4. There were irregularities in the magnitude of the bias and the RMSE across the ten
missing conditions within each sample size. That is, in some missing conditions the
magnitude of the bias or the RMSE for the smaller amount of missingness was larger
than that for the larger amount of missingness, even though both conditions had the same
sample size. This kind of irregularity was also noticed in Graham and Schafer's (1999)
simulation study. Another general pattern revealed in this study was that the bias
decreased as the sample size increased. Once again, the pattern of decrease in the bias
was not unidirectional, as indicated in Tables 4-1 and 4-2. There were irregularities across
the three sample sizes, and this kind of irregularity was also noticed in Graham and
Schafer's study. On the other hand, the magnitude of the RMSE in OPE, but not in
OPB, showed a clear pattern of decrease as the sample size increased (see Tables 4-3 and
4-4).
Figure 4-1. The mean coefficient alpha in OPB with sample size of 50. [Bar chart not reproduced; x-axis: missing condition (1-10); y-axis: mean coefficient alpha (0.0 to 1.0).]
Figure 4-2. The mean coefficient alpha in OPE with sample size of 50. [Bar chart not reproduced; x-axis: missing condition (1-10); y-axis: mean coefficient alpha (0.0 to 1.0).]
Figure 4-3. The mean coefficient alpha in OPB with sample size of 100. [Bar chart not reproduced; x-axis: missing condition (1-10); y-axis: mean coefficient alpha (0.0 to 1.0).]
Figure 4-4. The mean coefficient alpha in OPE with sample size of 100. [Bar chart not reproduced; x-axis: missing condition (1-10); y-axis: mean coefficient alpha (0.0 to 1.0).]
Figure 4-5. The mean coefficient alpha in OPB with sample size of 500. [Bar chart not reproduced; x-axis: missing condition (1-10); y-axis: mean coefficient alpha (0.0 to 1.0).]
Figure 4-6. The mean coefficient alpha in OPE with sample size of 500. [Bar chart not reproduced; x-axis: missing condition (1-10); y-axis: mean coefficient alpha (0.0 to 1.0).]
Table 4-1. Bias for the Coefficient Alpha in Omitting Pattern Where Missing Responses
Are in the Body of the Test

                                        Sample size
Missing          Approx. %       50               100              500
Condition        Missingness     Mean (SD)        Mean (SD)        Mean (SD)
%EMI1 x %IM1      8.4%           0.036 (0.017)    0.037 (0.012)    0.028 (0.005)
%EMI1 x %IM2     10.5%           0.028 (0.018)    0.019 (0.014)    0.029 (0.006)
%EMI1 x %IM3     12.6%           0.035 (0.019)    0.030 (0.015)    0.043 (0.006)
%EMI2 x %IM1     13.3%           0.042 (0.020)    0.036 (0.013)    0.036 (0.006)
%EMI3 x %IM1     17.4%           0.059 (0.021)    0.051 (0.015)    0.043 (0.007)
%EMI2 x %IM2     17.6%           0.054 (0.022)    0.050 (0.015)    0.035 (0.007)
%EMIs x %IMs     18.0%           0.034 (0.020)    0.047 (0.015)    0.047 (0.007)
%EMI2 x %IM3     21.9%           0.027 (0.023)    0.043 (0.016)    0.040 (0.007)
%EMI3 x %IM2     23.8%           0.031 (0.024)    0.040 (0.016)    0.054 (0.007)
%EMI3 x %IM3     30.2%           0.069 (0.023)    0.047 (0.018)    0.048 (0.008)
Table 4-2. Bias for the Coefficient Alpha in Omitting Pattern Where Missing Responses
Are at the End of the Test

                                        Sample size
Missing          Approx. %       50               100              500
Condition        Missingness     Mean (SD)        Mean (SD)        Mean (SD)
%EMI1 x %IM1      8.4%           0.017 (0.019)    0.011 (0.014)    0.007 (0.006)
%EMI1 x %IM2     10.5%           0.006 (0.017)    0.010 (0.014)    0.002 (0.006)
%EMI1 x %IM3     12.6%           0.009 (0.020)    0.015 (0.017)    0.007 (0.007)
%EMI2 x %IM1     13.3%           0.001 (0.023)    0.016 (0.013)    0.007 (0.006)
%EMI3 x %IM1     17.4%           0.013 (0.024)    0.010 (0.014)    0.009 (0.008)
%EMI2 x %IM2     17.6%           0.004 (0.022)    0.013 (0.016)    0.008 (0.007)
%EMIs x %IMs     18.0%           0.005 (0.024)    0.000 (0.018)    0.005 (0.007)
%EMI2 x %IM3     21.9%           0.024 (0.025)    0.006 (0.020)    0.008 (0.008)
%EMI3 x %IM2     23.8%           0.030 (0.028)    0.008 (0.020)    0.000 (0.008)
%EMI3 x %IM3     30.2%           0.017 (0.029)    0.006 (0.021)    0.003 (0.009)
Table 4-3. RMSE for the Coefficient Alpha in Omitting Pattern Where Missing
Responses Are in the Body of the Test

                                        Sample size
Missing          Approx. %       50               100              500
Condition        Missingness     Mean (SD)        Mean (SD)        Mean (SD)
%EMI1 x %IM1      8.4%           0.037 (0.017)    0.037 (0.012)    0.028 (0.005)
%EMI1 x %IM2     10.5%           0.029 (0.017)    0.020 (0.013)    0.029 (0.006)
%EMI1 x %IM3     12.6%           0.035 (0.018)    0.030 (0.014)    0.043 (0.006)
%EMI2 x %IM1     13.3%           0.043 (0.019)    0.036 (0.013)    0.036 (0.006)
%EMI3 x %IM1     17.4%           0.059 (0.021)    0.051 (0.015)    0.043 (0.007)
%EMI2 x %IM2     17.6%           0.054 (0.022)    0.050 (0.015)    0.035 (0.007)
%EMIs x %IMs     18.0%           0.034 (0.019)    0.047 (0.015)    0.047 (0.007)
%EMI2 x %IM3     21.9%           0.030 (0.019)    0.043 (0.016)    0.040 (0.007)
%EMI3 x %IM2     23.8%           0.033 (0.021)    0.040 (0.016)    0.054 (0.007)
%EMI3 x %IM3     30.2%           0.069 (0.023)    0.047 (0.018)    0.048 (0.008)
Table 4-4. RMSE for the Coefficient Alpha in Omitting Pattern Where Missing
Responses Are at the End of the Test

                                        Sample size
Missing          Approx. %       50               100              500
Condition        Missingness     Mean (SD)        Mean (SD)        Mean (SD)
%EMI1 x %IM1      8.4%           0.021 (0.014)    0.015 (0.010)    0.008 (0.005)
%EMI1 x %IM2     10.5%           0.015 (0.011)    0.014 (0.010)    0.005 (0.004)
%EMI1 x %IM3     12.6%           0.018 (0.014)    0.018 (0.013)    0.008 (0.005)
%EMI2 x %IM1     13.3%           0.018 (0.014)    0.017 (0.011)    0.008 (0.005)
%EMI3 x %IM1     17.4%           0.022 (0.016)    0.014 (0.011)    0.010 (0.006)
%EMI2 x %IM2     17.6%           0.018 (0.014)    0.017 (0.013)    0.009 (0.006)
%EMIs x %IMs     18.0%           0.020 (0.015)    0.014 (0.011)    0.007 (0.005)
%EMI2 x %IM3     21.9%           0.028 (0.021)    0.016 (0.013)    0.009 (0.006)
%EMI3 x %IM2     23.8%           0.033 (0.024)    0.017 (0.013)    0.006 (0.005)
%EMI3 x %IM3     30.2%           0.027 (0.020)    0.017 (0.013)    0.007 (0.006)
CHAPTER 5
DISCUSSION
Because there was no substantial bias in any of the missing conditions, the results of
this simulation study indicated that MI is a reasonably good procedure for replacing the
missing data in a single-facet crossed model in which missing responses are either in the
body of the test or at the end of the test. The majority of the biases obtained were less
than 0.05, and their magnitude was comparable to those obtained in Harrison's (1998)
study. The most significant difference was that the amount of missingness in the present
study was two to three times more than that used in Harrison's study, and the omitting
patterns were nonignorable.
The present study used the examinee's ability θ and item difficulty b as the
predictors in the logistic regression when the missing responses were in the body of the
test, and the examinee's ability θ and item effect i as the predictors when the missing
responses were at the end of the test. The predictors used in the present study differed
from the ones used by Harrison (1998), who used examinee effect j and item effect i as
the predictors. Results of using j and i as the predictors in the present study indicated
that the biases for the coefficient alpha were unacceptably higher than those obtained
using θ and b, or θ and i. For example, in the missing condition %EMI3 x %IM3 with a
sample size of 50, the bias obtained using j and i as the predictors was 0.211 when
missing responses were in the body of the test, and 0.233 when missing responses were at
the end of the test. This illustrated one of the limitations of MI mentioned in Chapter 3,
namely that inference based on MI will be biased when relevant predictors are not
incorporated (Schafer, 1999; Schafer & Olsen, 1998).
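The dependence of MI on its predictors can be illustrated with a simplified sketch. Here imputations for a single item are drawn from a Rasch-type logistic model using the examinee ability and item difficulty directly; the study's actual procedure (described in Chapter 3) fit a logistic regression to the observed data, so this is only a schematic analogue with made-up values:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: 50 examinees, one item with difficulty b,
# Rasch-type response model (theta and b assumed known for illustration).
theta = rng.normal(size=50)
b = 0.4
p_correct = 1.0 / (1.0 + np.exp(-(theta - b)))

responses = (rng.random(50) < p_correct).astype(float)
responses[rng.random(50) < 0.2] = np.nan      # delete roughly 20% of responses

def impute_once(resp, theta, b, rng):
    """Fill each missing response with a stochastic draw from the model
    probability, using theta and b as the predictors."""
    out = resp.copy()
    missing = np.isnan(out)
    p = 1.0 / (1.0 + np.exp(-(theta[missing] - b)))
    out[missing] = (rng.random(missing.sum()) < p).astype(float)
    return out

# Five imputed versions of the item, as in multiple imputation.
imputations = [impute_once(responses, theta, b, rng) for _ in range(5)]
```

Dropping theta or b from the draw (e.g., imputing from the overall proportion correct) weakens the predictor set in the same way the text describes for the j-and-i model.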
An attempt to include more predictors (i.e., examinee's ability θ, item difficulty b,
examinee effect j, and item effect i) in the logistic model did not help to reduce the bias.
For example, the bias obtained using θ, b, j, and i as the predictors was 0.07 in the missing
condition %EMI3 x %IM3 when the omitting pattern was OPB and the sample size was
50. This illustration affirmed Rubin's (1987) suggestion that extra variables do not affect
the bias in the inferences. Further systematic studies need to be conducted to support
Rubin's claim regarding the relationship between the bias and the number of predictors.
Another possible factor that may affect the accuracy of the obtained coefficient
alphas was the extreme values in some of the item discrimination parameters (e.g., a =
7.637). Unfortunately, a simpler model such as a one-parameter Rasch model with fixed
item discrimination and pseudo-chance parameters did not help to reduce the bias. The
bias for the above missing condition (i.e., %EMI3 x %IM3) was still in the magnitude of
0.07 when using θ and b as the predictors.
When comparing the biases obtained from the two omitting patterns, it appears
that the examinee's ability, rather than the item effect or item parameters, may
contribute more to the accuracy of the parameter estimation. Further systematic
investigation is warranted.
Finally, a surprising finding was obtained when using listwise deletion to estimate
the coefficient alpha in the above missing condition (i.e., %EMI3 x %IM3): the bias was
0.077. This bias (in absolute value) was much smaller than those obtained in
Harrison's (1998) study, even though the amount of missingness was three times greater.
The bias obtained in the nonrandom missing conditions (with 10% of missingness) in
Harrison's study was about 0.2. This surprising finding may have something to do with
the idiosyncratic nature of the missing mechanism in this study. Further research needs
to investigate this issue systematically.
Limitations
The present study used the examinee's ability and item difficulty as the predictors in
OPB, and the examinee's ability and item effect as the predictors in OPE. However, in
real-life testing situations, the ability parameter θ and the item parameter b need to be
estimated first. Accurate estimation of these two parameters may not be possible in
situations with a substantial amount of missing data (Bradlow & Thomas, 1998). Another
limitation is that one may not be sure of the mechanism producing the missing data.
Suggestions for Future Research
The present study illustrated only one way of using MI to analyze the data. It would
be important to perform a sensitivity analysis comparing the results obtained in the present
study with those obtained when the nonresponse model is treated as nonignorable. A
comparison of the coefficient alpha obtained using the selection model approach versus
the pattern-mixture model would certainly be informative.
The bias obtained in the present study as well as in Graham and Schafer's (1999)
study was not a linear function of the amount of missingness or the sample size.
However, no good explanation can be given based on the limited information provided in
the present study as well as in Graham and Schafer's (1999) study. This may be an
important issue for further investigation.
Because of the positively skewed distribution of the biases in OPB and the lower
bound nature of the coefficient alpha, it is suggested that using the median instead of the
mean to compute the final adjusted alpha may be worthwhile to investigate.
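The suggestion amounts to changing only the combining step of MI. A minimal sketch with hypothetical per-imputation alphas shows how a single high draw pulls the mean above the median:

```python
import statistics

# Hypothetical coefficient alphas from m = 5 imputed data sets; the
# positively skewed values mimic the OPB distribution described above.
alphas = [0.762, 0.765, 0.768, 0.771, 0.812]

mean_alpha = statistics.mean(alphas)      # standard MI point estimate
median_alpha = statistics.median(alphas)  # robust alternative suggested above

# With the high outlier, the median sits below the mean.
assert median_alpha < mean_alpha
```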
This study illustrated two of the most commonly encountered omitting patterns:
missing responses in the body of the test and at the end of the test. In most real-life
educational tests, the types of omitting patterns are much more complicated, and the
missing responses, as suggested by Schafer and Olsen (1998), can arise for a variety of
reasons, including a combination of ignorable and nonignorable mechanisms. Systematic
investigation of the effectiveness of different MDTs, especially RMLE and MI, under
conditions involving a combination of ignorable and nonignorable mechanisms is
important for examining different kinds of missing responses.
In Chapter 2, several methods were described for creating the posterior
predictive probability distribution from Yobs; however, to date few studies have attempted
to compare different methods of data simulation applied to MI. Duncan, Duncan, and Li
(1998) illustrated the use of data augmentation and the bootstrap in a structural equation
model. More studies should investigate the effectiveness of these data simulation
methods.
Obviously, the application of MI is not confined to the single-facet situation;
further study should explore the application of MI in multifacet situations where a
generalizability coefficient is obtained. The incorporation of a rater facet in nested designs
can be an extension of the present study to test the effectiveness of MI in handling
missing data in a more complicated situation.
REFERENCES
Angoff, W. H., & Schrader, W. B. (1984). A study of hypotheses basic to the use
of rights and formula scores. Journal of Educational Measurement, 21, 1-17.
Bacik, J. M., Murphy, S. A., & Anthony, J. C. (1998). Drug use prevention data,
missed assessments and survival analysis. Multivariate Behavioral Research, 33, 573-588.
Barnard, J., Du, J. T., Hill, J. L., & Rubin, D. B. (1998). A broader template for
analyzing broken randomized experiments. Sociological Methods and Research, 27, 285-317.
Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in
medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8,
17-36.
Beaton, A. E. (1997). Missing scores in survey research. In J. P. Keeves (Ed.),
Educational research, methodology, and measurement: An international handbook (2nd
ed., pp. 763-766). New York: Pergamon Press.
Bradlow, E. T., & Thomas, N. (1998). Item response theory models applied to
data allowing examinee choice. Journal of Educational and Behavioral Statistics, 23, 236-243.
Brownstone, D., & Valletta, R. G. (1996). Modeling earnings measurement error:
A multiple imputation approach. Review of Economics and Statistics, 78, 705-717.
Chirembo, A. M. (1995). Direct versus indirect methods for the estimation of
variance-covariance matrices and regression parameters when data are skewed and
incomplete. Unpublished doctoral dissertation, University of Florida, Gainesville.
Cluxton, S. E., & Mandeville, G. K. (1979, April). Latent trait models: Ability
estimates and omitted items. Paper presented at the 63rd Annual Meeting of the
American Educational Research Association, San Francisco, CA.
Crawford, S. L., Tennstedt, S. L., & McKinlay, J. B. (1995). A comparison of
analytic methods for nonrandom missingness of outcome data. Journal of Clinical
Epidemiology, 48, 209-219.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory.
New York: Holt, Rinehart, & Winston.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational
measurement (2nd ed.). Washington, DC: American Council on Education.
Curran, D., Bacchi, M., Hsu Schmitz, S. F., Molenberghs, G., & Sylvester, R. J.
(1998). Identifying the types of missingness in quality of life data from clinical trials.
Statistics in Medicine, 17, 739-756.
DeCanio, S. J., & Watkins, W. E. (1998). Investment in energy efficiency: Do the
characteristics of firms matter? Review of Economics and Statistics, 80, 95-107.
Downey, D. G., & King, C. V. (1998). Missing data in Likert ratings: A
comparison of replacement methods. Journal of General Psychology, 125, 175-191.
Duncan, T. E., Duncan, S. C., & Li, F. (1998). A comparison of model- and
multiple imputation-based approaches to longitudinal analyses with partial missingness.
Structural Equation Modeling, 5, 1-21.
Freedman, V. (1990). Using SAS to perform multiple imputation (Discussion
Paper Series UIPSC6). Washington, DC: Urban Institute.
Freedman, V., & Wolf, D. A. (1995). A case-study on the use of multiple
imputation. Demography, 32, 459-470.
Gelfand, A. E., & Smith, A. M. F. (1990). Sampling-based approaches to
calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Glynn, R. J., Laird, N. M., & Rubin, D. B. (1993). Multiple imputation in mixture
models for nonignorable nonresponse with follow-ups. Journal of the American
Statistical Association, 88, 984-993.
Graham, J. W., & Donaldson, S. I. (1993). Evaluating interventions with
differential attrition: The importance of nonresponse mechanisms and use of follow-up
data. Journal of Applied Psychology, 78, 119-128.
Graham, J. W., Hofer, S. M., Donaldson, S. I., MacKinnon, D. P., & Schafer, J. L.
(1997). Analysis with missing data in prevention research. In K. J. Bryant, M. Windle, &
S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and
substance abuse research (pp. 325-366). Washington, DC: American Psychological
Association.
Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the
usefulness of data obtained with planned missing value patterns: An application of
maximum likelihood procedures. Multivariate Behavioral Research, 31, 197-218.
Graham, J. W., Hofer, S. M., & Piccinin, A. M. (1994). Analysis with missing data
in drug prevention research. In L. M. Collins & L. A. Seitz (Eds.), Advances in data
analysis for prevention intervention research (NIDA Research Monograph 142, pp. 13-62).
Washington, DC: National Institute on Drug Abuse.
Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple
imputation for multivariate data with small sample size. In R. H. Hoyle (Ed.), Statistical
strategies for small sample research (pp. 1-29). Thousand Oaks, CA: Sage.
Greenland, S., & Finkle, W. D. (1995). A critical look at methods for handling
missing covariates in epidemiologic regression analyses. American Journal of
Epidemiology, 142, 1255-1264.
Gross, A. L. (1997). Interval estimation of bivariate correlations with missing data
on both variables: A Bayesian approach. Journal of Educational and Behavioral Statistics,
22, 407-424.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles
and applications. Boston: Kluwer-Nijhoff.
Harrison, J. M. (1998). A comparison of strategies for estimating internal
consistency on tests with missing data. Unpublished master's thesis, University of
Florida, Gainesville.
Heitjan, D. F. (1997). Annotation: What can be done about missing data?
Approaches to imputation. American Journal of Public Health, 87, 548-550.
Heitjan, D. F., & Little, R. J. A. (1991). Multiple imputation for the fatal accident
reporting system. Applied Statistics, 40, 13-29.
Heitjan, D. F., & Rubin, D. B. (1990). Inference from coarse data via multiple
imputation with application to age heaping. Journal of the American Statistical
Association, 85, 304-314.
Isaacson, J., & Smith, G. (1993). Hosting a mathematics tournament for two-year
college students. (ERIC Document Reproduction Service No. ED 366 382)
Jamshidian, M., & Bentler, P. M. (1999). ML estimation of mean and covariance
structures with missing data using complete data routines. Journal of Educational and
Behavioral Statistics, 24, 21-41.
Kalton, G., & Kasprzyk, D. (1986). The treatment of missing survey data. Survey
Methodology, 12, 1-16.
Kim, J. O., & Curry, J. (1977). The treatment of missing data in multivariate
analysis. Sociological Methods and Research, 6, 215-241.
Koretz, D., Lewis, E., Skewes-Cox, T., & Burstein, L. (1993). Omitted and non-reached
items in mathematics in the 1990 National Assessment of Educational Progress.
(ERIC Document Reproduction Service No. ED 378 220)
Kromrey, J. D., & Hines, C. V. (1994). Nonrandomly missing data in multiple
regression: An empirical comparison of common missing-data treatments. Educational
and Psychological Measurement, 54, 573-593.
Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test
reliability. Psychometrika, 2, 151-160.
Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine,
7, 305-315.
Landerman, L. R., Land, K. C., & Pieper, C. F. (1997). An empirical evaluation of
the predictive mean matching method for imputing missing values. Sociological Methods
and Research, 26, 3-33.
Little, R. J. A. (1992). Regression with missing X's: A review. Journal of the
American Statistical Association, 87, 1227-1237.
Little, R. J. A. (1995). Modeling the dropout mechanism in repeated-measures
studies. Journal of the American Statistical Association, 90, 1112-1121.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New
York: Wiley.
Little, R. J. A., & Rubin, D. B. (1989). The analysis of social science data with
missing data. Sociological Methods and Research, 18, 292-326.
Little, R. J. A., & Schenker, N. (1995). Missing data. In G. Arminger, C. C.
Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and
behavioral sciences (pp. 39-75). New York: Plenum.
Longford, N. T. (1994). Models for scoring missing responses to multiple-choice
items. (ERIC Document Reproduction Service No. ED 382 650)
Marcoulides, G. A. (1990). An alternative method for estimating variance
components in generalizability theory. Psychological Reports, 66, 379-386.
Michiels, B., & Molenberghs, G. (1997). Protective estimation of longitudinal
categorical data with nonrandom dropout. Communications in Statistics: Theory and
Methods, 26, 65-94.
Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP.
Journal of Educational Statistics, 17, 131-154.
Neal, T., & Nianci, G. (1997). Generating multiple imputations for matrix
sampling data analyzed with item response models. Journal of Educational and
Behavioral Statistics, 22, 425-445.
Oshima, T. C. (1994). The effect of speededness on parameter estimation in item
response theory. Journal of Educational Measurement, 31, 200-219.
Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal
of Consumer Research, 21, 381-391.
Pollard, W. E. (1986). Bayesian statistics for evaluation research: An
introduction. Beverly Hills, CA: Sage.
Raaijmakers, Q. A. (1999). Effectiveness of different missing data treatments in
surveys with Likert-type data: Introducing the relative mean substitution approach.
Educational and Psychological Measurement, 59, 725-728.
Raghunathan, T. E., & Siscovick, D. S. (1996). A multiple-imputation analysis of a
case-control study of the risk of primary cardiac arrest among pharmacologically treated
hypertensives. Applied Statistics, 45, 335-352.
Raymond, M. R. (1987). Missing data in evaluation research. Evaluation and the
Health Professions, 9, 395-420.
Roth, P. L. (1994). Missing data: A conceptual review for applied psychologists.
Personnel Psychology, 47, 537-550.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York:
Wiley.
Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American
Statistical Association, 91, 473-489.
Rubin, D. B., & Schenker, N. (1986). Multiple imputation for interval estimation
from simple random samples with ignorable nonresponse. Journal of the American
Statistical Association, 81, 366-374.
Rubin, D. B., & Schenker, N. (1991). Multiple imputation in health-care
databases: An overview and some applications. Statistics in Medicine, 10, 585-598.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York:
Chapman & Hall.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in
Medical Research, 8, 3-15.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate
missing-data problems: A data analyst's perspective. Multivariate Behavioral
Research, 33, 545-571.
Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in
criterion-related validation studies. Journal of Applied Psychology, 61, 473-485.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions
by data augmentation (with discussion). Journal of the American Statistical Association,
82, 528-550.
van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of
missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681-694.
Wang, C. Y., Anderson, G. L., & Prentice, R. L. (1999). Estimation of the
correlation between nutrient intake measures under restricted sampling. Biometrics, 55,
711-717.
Wang, R., Sedransk, J., & Jinn, J. H. (1992). Secondary data analysis when there
are missing observations. Journal of the American Statistical Association, 87, 952-961.
Way, W. D., & Reese, C. M. (1991). An investigation of the use of simplified IRT
models for scaling and equating the TOEFL test. (ERIC Document Reproduction Service
No. ED 395 024)
Xie, F., & Paik, M. C. (1997). Multiple imputation methods for the missing
covariates in generalized estimating equation. Biometrics, 53, 1538-1546.
Yamamoto, K. (1995). Estimating the effects of test length and test time on
parameter estimation using the HYBRID model. (ERIC Document Reproduction Service
No. ED 395 035)
Yen, W. M. (1987). A comparison of the efficiency and accuracy of BILOG and
LOGIST. Psychometrika, 52, 275-291.
BIOGRAPHIC SKETCH
Hon Keung Yuen was born in 1961 in Hong Kong. He completed his
undergraduate studies at Queensland University, Brisbane, Australia in 1986, where he
majored in occupational therapy. In 1988, he received a master of science degree in
occupational therapy from Western Michigan University. After five years of occupational
therapy practice in the field of traumatic head injury rehabilitation, Mr. Yuen's interest in
research grew. Between 1993 and 1996, he taught occupational therapy at Eastern
Kentucky University and subsequently in the Hong Kong Polytechnic University. In
1996, he began working on his Ph.D. in the College of Education at the University of
Florida, where he majored in research and evaluation methodology.
While pursuing his Ph.D., Mr. Yuen also worked full-time on the faculty of the
Occupational Therapy Department at the University of Florida. Mr. Yuen has published
over twelve articles in the American Journal of Occupational Therapy. Currently, he
serves on the editorial board of the American Journal of Occupational Therapy.
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
M. David Miller, Chair
Professor of Educational Psychology
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Anne E. Seraphine
Assistant Professor of Educational
Psychology
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Arthur J. Newman
Professor of Educational Leadership, Policy,
and Foundations
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.
Kay Walker
Professor of Occupational Therapy
This dissertation was submitted to the Graduate Faculty of the College of
Education and the Graduate School and was accepted as partial fulfillment of the
requirements for the degree of Doctor of Philosophy.
August, 2000
Chairman, Department of Educational Psychology
Dean, College of Education
Dean, Graduate School