THE IMPACT OF MULTIPLE IMPUTATIONS ON THE ESTIMATION OF COEFFICIENT ALPHA

By

HON KEUNG YUEN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2000

ACKNOWLEDGMENTS

I am indebted to a few special individuals who made this dissertation possible. First, I would like to thank my wife, Kit, for her patience and understanding throughout this process. I would also like to thank my committee members, Dr. David Miller, Dr. Anne Seraphine, Dr. Kay Walker, and Dr. Arthur Newman, for their time and support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1. INTRODUCTION
   Statement of the Problem
   Rationale for the Study
   Purpose and Significance of the Study

2. REVIEW OF LITERATURE
   Common Missing Data Treatments
   Multiple Imputation
   Missing Data Mechanisms

3. METHODOLOGY
   Simulation Procedure
   Design of Study
   Multiple Imputation Procedure
   Evaluating the Performance of Multiple Imputation

4. RESULTS

5. DISCUSSION
   Limitations
   Suggestions for Future Research

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

THE IMPACT OF MULTIPLE IMPUTATIONS ON THE ESTIMATION OF COEFFICIENT ALPHA

By Hon Keung Yuen

August 2000

Chairpersons: M. David Miller and Anne Seraphine
Major Department: Educational Psychology

The purpose of this dissertation is to investigate the accuracy of coefficient alpha on tests when nonrandom missing data are replaced using multiple imputation under a single-facet crossed model. The performance of multiple imputation was evaluated under the conditions of three sample sizes (N = 50, 100, or 500), ten conditions of distribution and percent of missingness, and two omitting patterns (omitting item responses in the body of the test and omitting responses at the end of the test). The ten missing conditions were formed from examinees of three ability levels (high, medium, and low) with a differential number of missing items. The nonrandom nature of the missingness means that examinees with low ability miss more difficult items, or more items at the end of the test, than those with high ability. A twenty-item test was used in this study.
Results of the one thousand iterations indicated that the magnitude of the bias obtained in the omitting pattern where missing responses are at the end of the test was less than 0.03. In contrast, the magnitude of the bias obtained in the omitting pattern where missing responses are in the body of the test was less than 0.07. In general, the bias increased as the amount of missingness increased or as the sample size decreased. However, this pattern was not uniform across all the missing conditions investigated. Overall, this simulation study confirmed that multiple imputation is a reasonably good procedure for replacing missing data on tests in which missing responses are either in the body of the test or at the end of the test.

CHAPTER 1
INTRODUCTION

Accurate measurement of examinees' ability in standardized achievement assessments requires the test scores to be reliably measured. Internal consistency is one type of reliability that indicates how strongly the test items within the same construct are correlated. Internal consistency of a test appeals to educators because it requires only a single administration of one form of a test. Coefficient alpha (Cronbach, 1971) is a commonly used index to estimate the internal consistency of a test. The index is not a direct estimate of the theoretical reliability coefficient but is an estimate of the lower bound of the internal consistency (Crocker & Algina, 1986). According to Peterson (1994), the formula for computing coefficient alpha (α) can be expressed as

α = [s / (s − 1)] [1 − (Σ σi²) / σx²]   (1-1)

where s is the number of items in the test, σx² is the variance of the test scores, σi² is the variance of a single item i, and the sum is taken over the s items; σx² comprises the item variances plus the covariances σiσsris between items i and s, where ris is the correlation between items i and s. Or, in terms of the average inter-item correlation,

α = s r̄ / [1 + r̄(s − 1)]   (1-2)

where r̄ is the average inter-item correlation.
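The computation of coefficient alpha from a person-by-item score matrix, and its standardized form based on the average inter-item correlation, can be sketched in a few lines of Python (a minimal illustration; the function names are my own):

```python
import statistics

def coefficient_alpha(data):
    """Coefficient alpha from a person-by-item matrix (rows = examinees)."""
    s = len(data[0])                                           # number of items
    item_vars = [statistics.variance(col) for col in zip(*data)]
    total_var = statistics.variance([sum(row) for row in data])
    return (s / (s - 1)) * (1 - sum(item_vars) / total_var)

def alpha_from_rbar(rbar, s):
    """Standardized alpha from the average inter-item correlation."""
    return s * rbar / (1 + rbar * (s - 1))

# Three perfectly correlated items: both forms give alpha = 1.
scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5]]
print(round(coefficient_alpha(scores), 4))   # -> 1.0
```

The perfectly correlated example makes the lower-bound behavior visible: any attenuation of the inter-item correlations pulls the estimate below the value the complete data would yield.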
As in the estimation of the Pearson product-moment correlation coefficient, computation of coefficient alpha requires a rectangular person-by-item data matrix with no missing data (i.e., a balanced-design data set). However, it is well known that missing data are common in large-scale standardized educational achievement tests such as the National Assessment of Educational Progress (NAEP) (Koretz, Lewis, Skewes-Cox, & Burstein, 1993; Longford, 1994) and the Test of English as a Foreign Language (TOEFL) (Yamamoto, 1995). Yamamoto (1995) indicated that about 20% of examinees have difficulty completing the last 20% of the items in the TOEFL. Two main classes of nonrandom omitting patterns in a test have been identified: omitting item responses in the body of the test and omitting item responses at the end of the test (i.e., not-reached items) (Longford, 1994). A number of conceivable circumstances can contribute to these occurrences. Angoff and Schrader (1984) found that response omissions in the body of the test are common in tests with instructions indicating that there is a penalty for incorrect responses but not for omissions. In this situation, examinees are more likely to omit difficult items when they are not sure of the answer (Koretz et al., 1993). For omissions at the end of the test, time constraints are a major factor; however, item difficulty has also been reported to contribute to this type of omitting pattern (Koretz et al., 1993). Cluxton and Mandeville (1979) found that less capable students tend to omit more items at the end of the test. Because of the balanced-design requirement for computing coefficient alpha, missing data present a challenge when standard methods of data analysis are used. In the last few decades, a number of missing data treatments (MDTs) have been proposed (see review in Little & Rubin, 1987). A promising MDT is multiple imputation (MI), which was originally proposed by Rubin (1987).
MI is a model-based estimation technique for analyzing data with missing scores (Rubin, 1987). Using information from the observed part of the data set, MI generates k sets of equally plausible values from the simulated distribution of the missing data to replace the missing scores, where k is greater than one. The missing scores are imputed k times (Rubin, 1987). As a result, MI creates k versions of complete data sets with imputed values. Each complete data set can be analyzed separately by means of standard complete-case analysis methods. The final adjusted point estimate is obtained by averaging over the k intermediate parameter estimates. MI has been shown to yield satisfactory parameter estimates with relatively little bias (Graham & Schafer, 1999). However, MI has not been used widely in educational settings except for matrix sampling and scaling procedures in the NAEP (Mislevy, Johnson, & Muraki, 1992; Neal & Nianci, 1997). Several recent studies compared different MDTs in estimating reliability coefficients on measures with missing data (Downey & King, 1998; Harrison, 1998; Marcoulides, 1990). Downey and King (1998) compared the accuracy of coefficient alpha estimation using item-mean and person-mean substitution to replace missing data in Likert scales. Results indicated that item-mean substitution reduces the reliability estimate, whereas person-mean substitution increases the reliability estimate of the scale, as the number of missing items and the number of respondents with missing items increase beyond 20% (Downey & King, 1998). Marcoulides (1990) compared the consistency and efficiency of two MDTs (restricted maximum likelihood and analysis of variance) in estimating variance components on measures with missing data. He found that restricted maximum likelihood (REML) produces a more efficient and less biased variance estimate when 20% of the data are randomly deleted (Marcoulides, 1990).
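The combining step described above (averaging the k intermediate point estimates, with a pooled variance that adds between-imputation spread) follows Rubin's (1987) combining rules. A minimal sketch, with my own function and variable names:

```python
import statistics

def pool_mi_estimates(estimates, variances):
    """Combine k complete-data analyses using Rubin's rules.

    estimates: the k intermediate point estimates (one per imputed data set)
    variances: the k within-imputation variances of those estimates
    """
    k = len(estimates)
    q_bar = statistics.mean(estimates)        # final adjusted point estimate
    w = statistics.mean(variances)            # within-imputation variability
    b = statistics.variance(estimates)        # between-imputation variability
    total_var = w + (1 + 1 / k) * b           # total variance of q_bar
    return q_bar, total_var

# k = 3 imputed data sets, each yielding an alpha estimate and its variance.
q, t = pool_mi_estimates([0.80, 0.82, 0.78], [0.010, 0.010, 0.010])
print(round(q, 4))   # -> 0.8
```

The total variance t exceeds the average within-imputation variance whenever the k estimates disagree, which is exactly the extra uncertainty a single imputation would miss.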
Along the same line of research, Harrison (1998) evaluated six MDTs (listwise deletion, zero imputation, substituting least squares ANOVA estimates, substituting probabilities of correct answers from logistic regression estimates, Hoyt's ANOVA formula, and REML) in estimating coefficient alpha on tests with dichotomously scored items under the conditions of five random and nonrandom missing data patterns crossed with two sample sizes (50 and 100). Results showed that REML provides reasonable accuracy and precision for the estimation of coefficient alpha in all five missing data patterns (Harrison, 1998).

Statement of the Problem

Results of Harrison's (1998) study indicated that the average bias of the coefficient alpha when using each of the six MDTs is negligible (less than 0.05) except in two nonrandom missing-data situations where a listwise deletion procedure is used. One reason for the small discrepancy in the bias among the six MDTs is that the maximum amount of missing data in Harrison's study is less than 11%. Roth (1994) cited several simulated and empirical MDT studies indicating that there is little difference in parameter estimates when the amount of missing data is less than 5-10%, regardless of the missing data pattern (random or nonrandom). Roth (1994) suggested that the choice of MDTs becomes more important when the amount of missing data in a data set is beyond 15-20%. Therefore, we still do not know how some of the MDTs behave in situations with a moderate amount of missingness. Harrison (1998) found that the mean coefficient alpha produced by REML is more positively biased (i.e., overestimated) than that computed by Hoyt's ANOVA in situations where low-ability examinees have more omitted items or where they tend to omit the most difficult items. There are two possible reasons for REML not behaving well in Harrison's study.
One is that REML has been shown to produce estimates that are significantly biased in situations where the sample size is small (N = 20 or 50) because REML is based on large-sample theory (Gross, 1997). The other is that REML produces biased estimates when data are missing nonrandomly (Jamshidian & Bentler, 1999).

Rationale for the Study

The present study attempts to address some of the limitations of the REML estimation procedure (Harrison, 1998) by implementing MI, which has been shown to perform well with small sample sizes (Graham & Schafer, 1999) and in nonrandom missing-data situations (Graham, Hofer, Donaldson, MacKinnon, & Schafer, 1997). Although MI is commonly applied to missing continuous data, it may also be applied to dichotomous missing data (Graham et al., 1997). Because Harrison's study examined the effectiveness of different MDTs under slight levels of missingness, it is of interest to examine MI under more extreme levels of missingness. The level of missingness in the present study is therefore set as high as 30%. At this range of missingness, it becomes more obvious how well MI performs. It is well known that data missing completely at random (MCAR) seldom occur in educational settings (Kromrey & Hines, 1994). The present study, therefore, focuses on nonrandom missing data.

Purpose and Significance of the Study

The purpose of this study was to investigate, via data simulation, the accuracy of the coefficient alpha on tests with missing data replaced using MI. The performance of MI was evaluated under the conditions of three sample sizes (N = 50, 100, or 500), ten conditions of distribution and percent of missingness, and two omitting patterns (omitting item responses in the body of the test and omitting responses at the end of the test). The results of this study provide an indication of how well MI performed in the above-stated missing data conditions under a single-facet crossed model.
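The fully crossed design just described (three sample sizes by ten missing conditions by two omitting patterns) can be enumerated in a few lines; the condition labels below are illustrative placeholders, not the study's actual labels:

```python
from itertools import product

sample_sizes = (50, 100, 500)
missing_conditions = tuple(range(1, 11))   # ten distribution/percent-missing conditions
omit_patterns = ("body", "end")            # omissions in the body vs. at the end

# Every combination of the three factors is one simulation cell.
design_cells = list(product(sample_sizes, missing_conditions, omit_patterns))
print(len(design_cells))   # -> 60
```

Enumerating the grid this way makes it easy to loop one simulation (and its replications) per cell.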
CHAPTER 2
REVIEW OF RELATED LITERATURE

The first section of this chapter provides an overview of some commonly used missing data treatments (MDTs), which include listwise deletion, variable mean substitution, regression imputation, and stochastic regression imputation. Limitations of these MDTs are highlighted. The second section is devoted to the development and theoretical framework of multiple imputation (MI), the relationship of MI to Bayes' theorem, the assumptions and characteristics of MI, descriptions of the imputation methods, and procedures to perform MI. The last section discusses the three major types of missing data mechanisms proposed by Little and Rubin (1987), and the implications of each missing data mechanism for the application of MI.

Common Missing Data Treatments

Listwise Deletion

In order to transform the missing data matrix into a rectangular one, a common practice is to exclude those examinees who do not respond to all items. This is called the listwise deletion procedure, or complete-case analysis. The complete data with reduced sample size are then used to estimate population parameters such as the reliability coefficient. Listwise deletion is the default option for analysis in many popular statistical software packages such as the Statistical Analysis System (SAS) and the Statistical Package for the Social Sciences (SPSS-X). Even though listwise deletion is the simplest approach to handling missing data, it is by no means the most desirable one. Since the analysis is based only on those examinees who respond to all items, a substantial amount of useful data is lost. In a Monte Carlo investigation, Kim and Curry (1977) found that even with 2% random nonresponses on each of 10 variables, listwise deletion results in retaining only 81.7% of the cases. There is an accompanying loss of efficiency, or statistical power, in the estimation of the population parameters, especially when the amount of missing data is high.
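Kim and Curry's (1977) figure is easy to reproduce if one assumes missingness is independent across variables: the expected proportion of complete cases is (1 − 0.02)^10 ≈ 0.817. A quick sketch under that independence assumption:

```python
def expected_complete_cases(p_missing, n_variables):
    """Expected proportion of cases retained by listwise deletion,
    assuming missingness occurs independently on each variable."""
    return (1 - p_missing) ** n_variables

# 2% random nonresponse on each of 10 variables.
print(round(expected_complete_cases(0.02, 10), 3))   # -> 0.817
```

The same function shows how quickly retention collapses at higher missingness rates, e.g., 10% missing on 10 variables retains only about 35% of the cases.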
Raaijmakers (1999) demonstrated that listwise deletion results in a loss of statistical power ranging from 35% to 98% as the amount of missing data increases from 10% to 30% in various Likert-type data. Listwise deletion is based on the assumption that the data are missing completely at random, even though there is little evidence to support this assumption in educational research (Kromrey & Hines, 1994). When data are not missing completely at random, estimates are biased. Empirical data support the observation that the average bias increases as the amount of missing data increases (Harrison, 1998). In Harrison's (1998) study, when item responses are not missing at random, listwise deletion leads to range restriction; the resulting mean coefficient alpha is then substantially underestimated (Harrison, 1998).

Single Imputation Procedures

Besides deletion procedures, single imputation procedures have also been used widely in educational research to handle missing data (see review in Raymond, 1987). Imputation involves filling in each missing response with a plausible value and then analyzing the resulting data set with the imputed values. The plausible values are estimated from the observed scores in the study. Two major advantages of imputation procedures are as follows:

1. They retain the information from incomplete cases without discarding any scores.
2. The resulting data set with the imputed values can be analyzed by means of standard complete-case analysis methods.

The three single imputation procedures that are commonly used in educational research are variable mean substitution, regression imputation, and stochastic regression imputation (Raymond, 1987; Roth, 1994).

Variable Mean Substitution

To implement variable mean substitution, or unconditional mean imputation, each missing score on a particular item is replaced with that item's mean value over all non-missing cases.
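Item-mean substitution can be sketched in a few lines (a minimal illustration; None marks a missing response, and the function name is my own):

```python
def item_mean_substitute(data):
    """Replace each missing response (None) with the mean of the
    observed responses on that item (column)."""
    columns = list(zip(*data))
    filled = []
    for row in data:
        new_row = []
        for j, score in enumerate(row):
            if score is None:
                observed = [s for s in columns[j] if s is not None]
                new_row.append(sum(observed) / len(observed))
            else:
                new_row.append(score)
        filled.append(new_row)
    return filled

# Two items, three examinees; one missing response per item.
data = [[1, 4], [None, 5], [3, None]]
print(item_mean_substitute(data))   # -> [[1, 4], [2.0, 5], [3, 4.5]]
```

Note that every missing response on an item receives the same constant, which is the root of the variance and correlation attenuation discussed next.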
Even though it seems that the mean is a good estimate and the procedure is relatively easy to implement, variable mean substitution has several serious disadvantages. The observed variance of an item with imputed mean values is systematically underestimated (i.e., negatively biased) because imputing the mean value for an item is equivalent to adding zero to the sum of the squared deviations, which is the numerator of the formula for calculating the variance. At the same time, there is an increase in the denominator of the variance formula, (N − 1), as the procedure attempts to restore the original sample size (Landerman, Land, & Pieper, 1997; Raymond, 1987). The attenuation of the magnitude of the covariance, or correlation, between scores with filled-in means and scores on other items can be explained in a similar fashion. Conceptually, the imputed values are a constant and are unrelated to scores on other items; therefore, inter-item correlations are attenuated. Downey and King (1998) showed that the severity of the attenuation of the correlation increases as the number of imputed values increases. As indicated in equation (1-2), a reduction in the average inter-item correlation results in a decrease in the coefficient alpha (Downey & King, 1998). Figure 2-1 shows graphically that the coefficient alpha decreases spuriously when there is a reduction in the average inter-item correlation at the lower end (i.e., r̄ < 0.2). Another disadvantage related to the attenuation of the variability of the item is that the standard error of estimate is much too small, resulting in biased inferences (Little & Rubin, 1989). Finally, variable mean substitution does not use information from other items to improve the accuracy of imputation (Landerman et al., 1997).

[Figure 2-1. Relationship between the coefficient alpha and the average inter-item correlation when s equals 20.]
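The steep decline at the lower end of the curve can be verified numerically from the standardized formula α = s r̄ / [1 + r̄(s − 1)] with s = 20 (a quick sketch; the function name is my own):

```python
def alpha_from_rbar(rbar, s=20):
    """Standardized coefficient alpha from the average inter-item correlation."""
    return s * rbar / (1 + rbar * (s - 1))

for rbar in (0.05, 0.10, 0.20, 0.50):
    print(rbar, round(alpha_from_rbar(rbar), 3))
# Alpha climbs from roughly 0.51 at rbar = 0.05 to roughly 0.83 at rbar = 0.20,
# then flattens; small attenuations of already-low inter-item correlations
# therefore cut alpha much more sharply than the same attenuation at high rbar.
```

This is why mean substitution, which drags low inter-item correlations lower still, is especially damaging to the alpha estimate.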
Regression Imputation

Regression imputation, or conditional mean substitution, is used to fill in the missing scores of an item with values predicted from a regression model, utilizing information from one or more highly related observed variables, or predictors. When the response variable with missing scores is dichotomous in nature, a logistic regression model is used instead; the logistic regression produces a predicted probability for each missing response. Suppose Y is an N × 1 vector of the responses for examinees, Y = (y1, ..., yN), composed of a set of observed scores, Yo = (y1, ..., yno), and a set of missing scores, Ym = (yno+1, ..., yN), so that Y can be partitioned into Y = (Yo, Ym). Let no be the number of respondents for Yo and nm be the number of nonrespondents for Ym. X denotes an N × G design matrix of relevant explanatory variables, or predictors, that are highly correlated with Y; these variables are fully observed (i.e., with no missing data). Xo represents the predictors for the no individuals with observed scores Yo, and Xm represents the predictors for the nm individuals with missing scores Ym. The Ym scores are to be estimated. A linear regression equation can be expressed as

Y = a + bX + e   (2-1)

where a is a column vector of intercepts, b is a column vector of estimated regression coefficients, and e is a column vector of estimated residuals, which is set to zero. The procedure for imputing the predicted values based on a deterministic regression model is as follows. First, use Xo to estimate the coefficients of the linear regression equation by regressing Yo on Xo. After estimating the regression coefficients from the observed scores, a predicted score for Ym can be obtained from the prediction equation Ŷm = a + bXm (Little & Rubin, 1989).
Landerman and associates (1997) explained that the distribution of Y based on regression imputation is less distorted than that based on mean substitution because the imputed values are now distributed across the predicted values Ŷm instead of being concentrated at the mean. The variance of Y based on regression imputation is less attenuated than that based on mean substitution because the numerator for calculating the variance is the sum of squared deviations of the Ŷm from the grand mean, which is not likely to be zero. In regression imputation, the imputed values of Ym fall exactly on the predicted regression line or plane, as the estimated residual e in equation (2-1) is set to zero (Landerman et al., 1997; Little & Rubin, 1989). Therefore, there is no variation in the distribution of Ym given Xm (Little & Rubin, 1989). Because of the exact linear (or planar) relationship between Ym and Xm, their correlation is spuriously inflated (Graham & Schafer, 1999). Harrison (1998) demonstrated that replacing missing data with either the least squares estimates or the predicted probabilities overestimates the coefficient alpha. This phenomenon can be explained by the positive relationship between the coefficient alpha and the average inter-item correlation in equation (1-2). In addition, the severity of the bias (overestimation of the coefficient alpha) increases as the amount of missing data increases (Harrison, 1998). Little (1992) concluded that mean substitution or regression imputation can yield unbiased estimates of aggregate means but leads to distorted variance and covariance estimates. Regression imputation conveys a false sense of accuracy: that all missing scores can be predicted from Xm without error. By treating the imputed values as known observed scores, regression imputation fails to account properly for the variability, or uncertainty, of not knowing the missing scores (i.e., which value to impute) (Rubin & Schenker, 1991).
Failure of the regression model to incorporate residual variability in the imputation leads to standard errors of estimates that are biased toward zero (i.e., too small) (Little & Rubin, 1989). For example, Brownstone and Valletta (1996) found that the least squares standard error estimates are 30% less than their true values.

Stochastic Regression Imputation

In order to restore the prediction errors in the imputed values (i.e., the variability around the regression line), a random residual (error) is added to each predicted value. The random residual can be drawn randomly with replacement either from a standard normal distribution with a mean equal to zero and a standard deviation equal to the standard error of estimate for Yo (Beaton, 1997), or from the distribution of residuals of the regression estimate for Yo (Graham et al., 1997). The purpose of drawing with replacement is to ensure that each drawn value has an equal probability of selection. In stochastic regression imputation, each missing response is replaced by its conditional mean plus a random residual from Yo (Little & Rubin, 1989). However, stochastic regression imputation restores only one part of the variability: the errors of prediction. There is another part of the variability, the sampling variability, in which the values of the estimated regression coefficients are uncertain. Graham and Schafer (1999) explained that the regression line estimated from Yo is not the regression for the population but only an estimate from one sample. Stochastic regression imputation cannot properly reflect the sampling variability because it lacks variation of the imputed values among several sets of imputations (Little & Rubin, 1989). To incorporate the sampling variability in the estimation of the regression parameters, multiple imputation (MI) is required (Rubin, 1987).

Multiple Imputation

Introduction

Multiple imputation was originally proposed by Rubin (1987).
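The two regression-based procedures can be sketched together: the deterministic fit gives Ŷm = a + bXm, and the stochastic variant adds a residual drawn with replacement from the observed residuals. This is a simplified single-predictor sketch with my own function names:

```python
import random

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def stochastic_regression_impute(x_obs, y_obs, x_miss, rng=random):
    """Impute each missing y as its conditional mean plus a residual
    drawn with replacement from the observed residuals."""
    a, b = fit_line(x_obs, y_obs)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x_obs, y_obs)]
    return [a + b * xi + rng.choice(residuals) for xi in x_miss]

# With perfectly linear observed data every residual is zero, so the
# stochastic draw adds nothing and the imputation falls on the line.
print(stochastic_regression_impute([1, 2, 3, 4], [2, 4, 6, 8], [5]))   # -> [10.0]
```

Setting the residual draw to zero recovers deterministic regression imputation; in realistic data the nonzero draws restore the prediction-error part of the variability, but, as noted above, not the sampling variability of a and b.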
It is a model-based estimation technique for analyzing data with missing scores (Rubin, 1987). Using information from the observed part of the data set, MI generates k sets of equally plausible values from the simulated distribution of the missing data to replace the missing scores, where k is greater than one. The missing scores are imputed k times, and the multiple imputations within one model are called repetitions (Rubin, 1987). As a result, MI creates k versions of complete data sets with imputed values. Each complete data set can be analyzed separately by means of standard complete-case analysis methods. The estimate and its associated variance from each separate analysis can be combined to form an unbiased final parameter estimate under the correctly specified model (Little & Rubin, 1989). The final variance incorporates the variability within the imputation (i.e., the prediction error) and the variation of the imputed values among the k sets of imputations (i.e., the sampling variability) to reflect the true accuracy of the estimation.

Theoretical Framework

Let Q denote a scalar population quantity (such as a coefficient alpha or a regression coefficient) to be estimated, and let Q = Q(Yo, Ym) denote a function of the observed and missing data. Multiple imputation uses information from the observed scores Yo to replace the missing scores Ym, and then uses the complete data set with imputed values to estimate the parameter Q. A distribution of the missing data is required to generate the imputed values. The distribution is drawn from the observed scores Yo. It is necessary to know the structural or model parameter of the observed scores, θ, where θ represents a vector of q parameters. For example, θ = (μ, σ²) means that θ is a function of the mean μ and the variance σ². Because θ is unknown, it must be estimated, resulting in the random variable θ̂ (Michiels & Molenberghs, 1997).
Because of the uncertainty of not knowing θ, Rubin (1987) recommended using Bayesian methodology to account for this uncertainty in MI. Through a Bayesian procedure, a distribution function of θ, in the form of the posterior probability distribution of θ, can be obtained from the data (Michiels & Molenberghs, 1997). The derivation of MI is as follows (Rubin, 1996). Inferences for Q are based on the actual posterior probability distribution of Q, f(Q | Yo), which can be expressed as

f(Q | Yo) = ∫ f(Q | Yo, Ym) f(Ym | Yo) dYm   (2-2)

where f denotes the probability distribution function, f(Q | Yo, Ym) is the complete-data posterior probability distribution of Q, expressed as the conditional distribution of Q given both the observed and missing data, and f(Ym | Yo) is the predictive probability distribution of the missing scores Ym given the observed scores Yo. Based on equation (2-2), the actual posterior probability distribution of Q at a particular value Qi can be obtained by drawing an infinite number of repeated independent values for Ym from f(Ym | Yo), calculating f(Qi | Yo, Ym) separately for each draw, and then averaging the values over the repeated imputations (Little & Schenker, 1995; Rubin, 1996). The predictive distribution of the missing scores can be parameterized using a structural parameter θ (Little & Schenker, 1995) and is expressed as

f(Ym | Yo) = ∫ f(Ym | Yo, θ) f(θ | Yo) dθ   (2-3)

where f(θ | Yo) is the conditional distribution of θ given the observed scores Yo, and f(Ym | Yo, θ) is the conditional distribution of Ym given the observed scores Yo and the parameter θ. From the Bayesian perspective, drawing k values for the missing scores Ym in MI involves two steps (Schafer, 1999):

Step 1. Simulate an independent random draw of the unknown parameter θ* from the observed-data posterior distribution f(θ | Yo):

θ* ~ f(θ | Yo)

where θ* is a draw from the posterior distribution of θ, from which a distribution for the missing scores can be estimated.

Step 2.
Randomly draw missing values Ym* from the conditional predictive distribution of Ym given the parameter θ*:

Ym* ~ f(Ym | Yo, θ*)

These two steps are repeated k times to yield k sets of imputed values for the missing scores Ym. In principle, MI involves k repetitions of independent draws from the posterior predictive distribution of Ym by specifying a prior distribution for the unknown structural parameter θ (Little & Schenker, 1995). This forms the k imputations for the missing scores Ym.

Bayes' Theorem

The aim of MI is to estimate the unknown structural parameter θ and to generate the imputed values. Conditional upon a sample of observed scores Yo, Bayes' theorem makes inferences about the unknown parameter θ. Bayes' theorem represents the uncertainty of not knowing θ by a prior probability distribution (Pollard, 1986). The conditional distribution of θ given the observed scores Yo, or the posterior distribution of θ, is derived from a Bayesian procedure (Pollard, 1986) and is defined as

f(θ | Yo) = f(Yo | θ) f(θ) / f(Yo)   (2-4)

where f(Yo | θ) is the conditional probability, or likelihood, of the observed scores Yo; f(θ) is the prior distribution of the unknown structural parameter θ and represents uncertainty about the value of the parameter before any data are seen; and f(Yo) is the marginal probability of the observed scores Yo for an examinee with parameter θ randomly sampled from a population with the given distribution. Since f(Yo) is a constant that serves to make f(θ | Yo) integrate to one, equation (2-4) becomes

f(θ | Yo) ∝ f(Yo | θ) f(θ)   (2-5)

where ∝ indicates a relationship of proportionality. Given Yo, f(Yo | θ) becomes a likelihood function for θ given Yo, L(θ | Yo). The posterior distribution of θ is then written as

f(θ | Yo) ∝ L(θ | Yo) f(θ)   (2-6)

where L(θ | Yo) expresses the information about the parameter provided by the observed scores Yo and serves to convert the prior distribution f(θ) into the posterior distribution f(θ | Yo) (Pollard, 1986).
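The two-step draw can be sketched for the simple normal model yi ~ N(μ, σ²) with the standard noninformative prior: Step 1 draws σ²* from a scaled inverse chi-square distribution and μ* from N(ȳ, σ²*/no); Step 2 draws each missing score from N(μ*, σ²*). This is a sketch under those textbook assumptions, not the dissertation's exact procedure:

```python
import random
import statistics

def draw_imputations(y_obs, n_missing, rng=random):
    """One MI repetition for the normal model with a noninformative prior."""
    n = len(y_obs)
    ybar = statistics.mean(y_obs)
    s2 = statistics.variance(y_obs)
    # Step 1: draw theta* = (mu*, sigma2*) from the observed-data posterior.
    chi2 = rng.gammavariate((n - 1) / 2, 2)      # chi-square draw with n-1 df
    sigma2_star = (n - 1) * s2 / chi2            # scaled inverse chi-square
    mu_star = rng.gauss(ybar, (sigma2_star / n) ** 0.5)
    # Step 2: draw the missing scores from f(Ym | Yo, theta*).
    return [rng.gauss(mu_star, sigma2_star ** 0.5) for _ in range(n_missing)]

rng = random.Random(1)
y_obs = [rng.gauss(10, 2) for _ in range(50)]
# Many repetitions: imputations scatter around the observed mean with extra
# spread that reflects both prediction error and parameter uncertainty.
imputations = [draw_imputations(y_obs, 1, rng)[0] for _ in range(5000)]
```

Redrawing θ* on every repetition is exactly what distinguishes this procedure from stochastic regression imputation, which reuses a single parameter estimate.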
It is through the likelihood L(θ | Yo) that the observed scores Yo modify the prior distribution f(θ) to determine the posterior distribution f(θ | Yo) (Pollard, 1986). In essence, Bayesian methodology specifies a distribution of what is expected to occur based on prior information and combines it with new information (i.e., the observed scores Yo) to form inferences about θ.

Assumptions Underlying Multiple Imputation

The missing data mechanism is assumed to be ignorable or missing at random (Graham & Schafer, 1999). Ignorable means that it is not necessary to specify a nonresponse model or to estimate its parameters in order to obtain valid likelihood-based inferences. Missing at random means that the missing data are a random sample from the complete data after conditioning on the measured variables X in the imputation model (Schafer, 1997). However, Graham and associates (1997) have demonstrated that MI produces satisfactory parameter estimates even when the ignorability assumption is suspect. In addition, the variables in the data set are assumed to have a multivariate normal distribution. Simulation studies (Graham, Hofer, & McKinnon, 1996; Wang, Anderson, & Prentice, 1999) support the claim that the MI estimator is robust even when the data model departs from being multivariate normally distributed.

Proper Imputation Method

An imputation method is regarded as proper when it incorporates appropriate variability (i.e., uncertainty about the missing scores and the sampling variability) in creating multiply imputed values under a correctly specified model (Rubin, 1987, 1996). Rubin (1987) has shown that one way to achieve proper imputation is for the imputation procedure to follow Bayes' theorem with infinite independent draws of Ym from its posterior predictive distribution, as specified in equations (2-2) and (2-3).
By incorporating variability to adjust the standard error of parameter estimates, a proper imputation method leads to valid inferences (Rubin & Schenker, 1991). The conditions under which an imputation method is proper include the following:
1. Imputed values are independent repeated draws from a Bayesian posterior predictive distribution of the missing scores Ym given the observed scores, f(Ym | Yo) (Rubin, 1996).
2. Infinitely many (k) repeated imputations, since parameter estimates derived from infinite draws for Ym are fully efficient (Little & Rubin, 1989).
3. The underlying model specification for the complete data is correct.
4. The underlying model specification for the missing data mechanism (i.e., assumptions about the nonresponse) is correct.
5. Large sample size (N > 100) (Rubin & Schenker, 1986).
6. All causes of missingness are included in the imputation model (Graham et al., 1997).

Imputation Methods

Rubin and Schenker (1986) proposed two types of imputation methods: implicit and explicit. Implicit, or nonparametric, methods are applicable to discrete data and involve drawing values only from Yo and then assigning them to Ym. In contrast, explicit, or parametric, methods are applicable to continuous data and involve a statistical model that forms the posterior predictive distribution of Ym from which imputed values are drawn. Unlike implicit methods, the values drawn under explicit methods need not appear in Yo (Rubin & Schenker, 1986).

Implicit Methods

Simple hot deck procedure

The simple hot deck procedure involves random draws with replacement of nm imputed values for nonrespondents from the observed scores of matching respondents. However, like stochastic regression imputation, the simple hot deck procedure ignores sampling variability because the population distribution of (Ym | Yo) is not known; the imputed values are estimated from the respondent scores Yo in one sample only (Little & Schenker, 1995).
Approximate Bayesian bootstrap

In order to incorporate sampling variability in the estimated parameters, the approximate Bayesian bootstrap (ABB) is used (Rubin, 1987). The ABB creates k repeated imputations from the posterior predictive distribution of the missing data as follows:
1. Draw no values at random with replacement from the no observed values to create a bootstrap sample distribution, such as a scaled multinomial distribution.
2. Then independently draw nm missing values with replacement from the bootstrap sample distribution (Rubin & Schenker, 1986).
This process is repeated k times to yield k sets of imputed values, and each set of imputations comes from a different bootstrap sample of Yo.

Explicit Methods

Explicit methods define a model for the distribution of the response variable Y (e.g., a normal linear regression model or a logistic regression model) and a set of predictors X that enters the model to create imputations (Little, 1992; Rubin & Schenker, 1991).

Fully normal imputation

Once again, suppose Y is an N x 1 vector of responses for examinees, Y = (y1, . . ., yN), composed of both a set of observed scores, Yo = (y1, . . ., y_no), and a set of missing scores, Ym = (y_no+1, . . ., yN). Let no be the number of respondents for Yo and nm be the number of nonrespondents for Ym. The scores of Ym are to be estimated. Rubin and Schenker (1986) described how to create multiple imputations under the independent normal model, yi ~ N(μ, σ²), for i = 1, . . ., no, where θ = (μ, σ²) is unknown and is a function of the mean μ and the variance σ².
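The two ABB steps above might be sketched as follows; the observed scores are hypothetical placeholders. The extra bootstrap step is what distinguishes the ABB from the simple hot deck, which would draw imputed values directly from `y_obs`:

```python
import numpy as np

rng = np.random.default_rng(7)

def abb_imputations(y_obs, n_missing, k):
    """Approximate Bayesian bootstrap: each imputation set comes from a
    different bootstrap resample of Yo, adding sampling variability."""
    sets = []
    for _ in range(k):
        # Step 1: resample the n_o observed values with replacement.
        boot = rng.choice(y_obs, size=len(y_obs), replace=True)
        # Step 2: draw the n_m imputed values from the bootstrap sample.
        sets.append(rng.choice(boot, size=n_missing, replace=True))
    return sets

y_obs = np.array([12, 15, 9, 14, 11, 13, 10, 16])
imputed = abb_imputations(y_obs, n_missing=3, k=5)
```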
When the prior distribution of θ, f(θ), is proportional to 1/σ², the conditional posterior distribution of μ given σ², f(μ | σ², Yo), is

N(ȳo, σ² / no)

where ȳo is the sample mean of Yo, ȳo = (1/no) Σ yi for i = 1, . . ., no; and the observed-data posterior distribution of σ², f(σ² | Yo), is

(no − g) σ̂² / χ²(no − g)

where σ̂² is the estimated variance of Yo, σ̂² = Σ (yi − ȳo)² / (no − g), and χ²(no − g) denotes a chi-square random variable with no − g degrees of freedom. To create an imputation set Ym* = (y*(no+1), . . ., y*N), for l = 1, . . ., k, the following three steps are required.

Step 1. Generate the unknown parameters θ* = (μ*, σ*²) from the observed-data posterior distribution f(θ | Yo) by first randomly drawing the variance σ*² from (no − g) σ̂² / χ²(no − g), and then randomly drawing the mean μ* from N(ȳo, σ*² / no).

Step 2. Independently draw the nm missing values of Ym from yi* ~ N(μ*, σ*²), for i = no + 1, . . ., N (Rubin & Schenker, 1986; Schafer, 1999).

Step 3. Repeat the procedure k times (i.e., l = 1, . . ., k) to yield k sets of proper imputations.

Markov Chain Monte Carlo

In addition to resampling procedures such as bootstrapping, which are noniterative methods for creating the posterior predictive distribution from Yo, Markov chain Monte Carlo (MCMC) is a collection of iterative simulation-based methods for generating the posterior distribution of the unknown parameters θ, and these methods do not require large samples for efficacy (Little & Schenker, 1995). The data augmentation algorithm (Tanner & Wong, 1987) and Gibbs sampling (Gelfand & Smith, 1990) are two of the most common MCMC methods used in MI (Little & Schenker, 1995). They generate simulation-based estimates of the Bayesian posterior predictive distribution of the missing data, f(Ym | Yo), from which k independent random draws of Ym are performed (Schafer, 1997).
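The three steps of fully normal imputation might be sketched as follows. The observed scores and the choice g = 1 are illustrative assumptions for this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

def fully_normal_imputations(y_obs, n_missing, k, g=1):
    """Proper imputation under y_i ~ N(mu, sigma^2) with prior f(theta) ∝ 1/sigma^2."""
    n_o = len(y_obs)
    y_bar = y_obs.mean()
    s2 = y_obs.var(ddof=g)  # estimated variance of Yo
    sets = []
    for _ in range(k):      # Step 3: repeat k times
        # Step 1: draw sigma*^2 from (n_o - g) s^2 / chi^2_{n_o - g},
        # then mu* from N(y_bar, sigma*^2 / n_o).
        sigma2_star = (n_o - g) * s2 / rng.chisquare(n_o - g)
        mu_star = rng.normal(y_bar, np.sqrt(sigma2_star / n_o))
        # Step 2: draw the n_m missing values from N(mu*, sigma*^2).
        sets.append(rng.normal(mu_star, np.sqrt(sigma2_star), n_missing))
    return sets

sets = fully_normal_imputations(
    np.array([98.0, 102.0, 95.0, 101.0, 104.0, 99.0]), n_missing=2, k=3)
```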
Normal Linear Regression Model

Y is modeled by a linear regression model with a normal distribution, Y ~ N(Xβ, σ²), where the covariate term Xβ is a function of the parameters, X contains g variables, β is the parameter vector of regression coefficients to be estimated, and σ² is the regression variance. The algorithm for creating k multiply imputed values involves the following steps (Rubin, 1987):

Step 1. Regress Yo on Xo to give the ordinary least squares estimates: the estimated regression coefficient vector β̂ and the estimated regression variance σ̂².

β̂ = V X'o Yo    (2-7)

where V = (X'o Xo)⁻¹. The vector of predicted responses is

Ŷo = Xo β̂    (2-8)

The maximum likelihood estimator of σ² is

σ̂² = (Yo − Ŷo)'(Yo − Ŷo) / (no − g)    (2-9)

This is the estimation task for the normal linear regression model (Rubin, 1987). The imputation task for this model comprises Steps 2 to 4.

Step 2. Estimate σ*² (the square of a random error) to account for the deviations around the regression line:

σ*² = σ̂² (no − g) / L    (2-10)

for l = 1, . . ., k, where L is a randomly drawn variate from a chi-squared distribution with no − g degrees of freedom. Substituting σ̂² = (Yo − Ŷo)'(Yo − Ŷo) / (no − g) into equation (2-10), σ*² becomes (Yo − Ŷo)'(Yo − Ŷo) / L.

Step 3. Estimate the regression coefficients by adding a random error term to account for the uncertainty about the regression prediction:

β* = β̂ + σ* V^(1/2) Z    (2-11)

where Z is a g-component vector of standard normal deviates, Z ~ N(0, Ig), Ig is the identity matrix of order g, and Z is formed by drawing g independent variates from N(0, 1); σ̂²V is the variance-covariance matrix of β̂, and the square roots of its main diagonal are the standard errors; V^(1/2) is the triangular square root of V obtained from the Cholesky decomposition; and σ* V^(1/2) Z represents the error term. Each set of randomly drawn coefficients is then used to estimate the missing values, and different sets of coefficients reflect the variation of the regression lines due to sampling.
Steps 2 and 3 constitute random draws from the posterior distribution of β (van Buuren, Boshuizen, & Knook, 1999).

Step 4. Predict the missing values of Ym from

Ym* = Xm β* + σ* z    (2-12)

where z is an nm-component vector of standard normal deviates. Each set of predicted values for Ym is based on a different set of regression predictions and random components σ*.

Step 5. Repeat Steps 2 to 4 k times to create k sets of imputed values Y1*, Y2*, . . ., Yk*.

Repeated-Imputation Inferences

The final point estimate of the parameter Q approximates the actual posterior mean of Q, E(Q | Yo). This equals the average of the repeated complete-data posterior means of Q, and can be expressed as (Rubin, 1996):

E(Q | Yo) = E[E(Q | Yo, Ym) | Yo]    (2-13)

where the outer E refers to the expectation over the repeated imputations, and E(Q | Yo, Ym) approximates the complete-data posterior mean. The final estimated variance of the parameter Q approximates the actual posterior variance of Q, Var(Q | Yo). This equals the average of the repeated complete-data posterior variances of Q plus the variance of the repeated complete-data posterior means of Q, and can be expressed as (Rubin, 1996):

Var(Q | Yo) = E[Var(Q | Yo, Ym) | Yo] + Var[E(Q | Yo, Ym) | Yo]    (2-14)

where Var refers to the variance over the repeated imputations, and Var(Q | Yo, Ym) approximates the complete-data posterior variance. Based on these standard probability derivations (2-13 & 2-14), the k point estimates and their associated variances obtained from a standard complete-case analysis method can be combined into a final adjusted estimate and its estimated variance using Rubin's formulas (Rubin, 1987). After generating k imputed data sets using an appropriate imputation model and method, and analyzing each of them separately with a standard complete-case analysis method, MI yields k intermediate parameter estimates Q̂i: (Q̂1, . . ., Q̂k) and k associated variance estimates Ui: (U1, . . ., Uk), for i = 1, . . ., k.
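Steps 1 through 5 of the regression-based procedure might be sketched as follows. The design matrices and response vector are hypothetical placeholders, and an intercept column is assumed to be included in X:

```python
import numpy as np

rng = np.random.default_rng(0)

def regression_imputations(X_o, y_o, X_m, k):
    """Proper imputation under the normal linear regression model (a sketch)."""
    n_o, g = X_o.shape
    V = np.linalg.inv(X_o.T @ X_o)
    beta_hat = V @ X_o.T @ y_o                   # Step 1 (eq. 2-7)
    resid = y_o - X_o @ beta_hat
    V_half = np.linalg.cholesky(V)               # triangular square root of V
    sets = []
    for _ in range(k):                           # Step 5
        L = rng.chisquare(n_o - g)
        sigma_star = np.sqrt(resid @ resid / L)  # Step 2 (eq. 2-10)
        beta_star = beta_hat + sigma_star * V_half @ rng.standard_normal(g)  # Step 3 (eq. 2-11)
        z = rng.standard_normal(len(X_m))
        sets.append(X_m @ beta_star + sigma_star * z)  # Step 4 (eq. 2-12)
    return sets

X_o = np.column_stack([np.ones(8), np.arange(8.0)])
y_o = 2.0 + 0.5 * np.arange(8.0) + rng.normal(0, 0.3, 8)
X_m = np.column_stack([np.ones(2), np.array([3.5, 6.0])])
imps = regression_imputations(X_o, y_o, X_m, k=5)
```

Each pass through the loop redraws σ* and β*, so the k sets of predictions vary both around and along the regression line, which is what makes the method proper rather than a single deterministic fill-in.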
The final adjusted point estimate Q̄ is obtained by averaging over the k intermediate parameter estimates:

Q̄ = (1/k) Σ Q̂i    (2-15)

whereas the final estimated total variance is the sum of the average associated variance within a set of k imputed values and the variance across independent sets of imputed values:

T = Ū + (1 + k⁻¹) B    (2-16)

where Ū is the average within-imputation variance within a set of k imputed values,

Ū = (1/k) Σ Ui    (2-17)

and B is the variance across independent sets of imputed values,

B = [1/(k − 1)] Σ (Q̂i − Q̄)²    (2-18)

Bacik, Murphy, and Anthony (1998) indicated that the within-imputation variance is a measure of the uncertainty about not knowing the missing data, and the between-imputation variance is a measure of ordinary sampling variation. The inflation factor (1 + k⁻¹) accounts for the simulation error in using a finite number of imputations (i.e., k < ∞) (Barnard & Meng, 1999). Multiple imputation correctly adjusts the standard error of parameter estimates by including the within- and between-imputation variances. When there are no missing data, Q̂1, . . ., Q̂k are identical, the between-imputation variance B becomes zero, and T is equal to Ū. When k = 1 (i.e., single imputation), B cannot be estimated; T is then equal to Ū, and the variance is systematically underestimated (Heitjan & Rubin, 1990). As k increases, the simulation error in both Q̄ and T decreases, resulting in greater precision of the sample statistics (Little & Schenker, 1995). The extent of the influence of missing data on the estimation of Q is determined by both γ and r. The factor r estimates the proportional increase in variance due to missing data, and can be expressed as

r = (1 + k⁻¹) B / Ū = γ / (1 − γ)    (2-19)

where the ratio of B to Ū reflects how much information resides in the missing part of the data relative to the observed part (Schafer & Olsen, 1998), and γ is an estimate of the fraction of missing information about Q (Little & Schenker, 1995).
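Rubin's combining rules (equations 2-15 through 2-18) reduce to a few lines of code. The three point estimates and within-imputation variances below are made-up numbers for illustration:

```python
import numpy as np

def pool(q_hats, u_hats):
    """Combine k intermediate estimates and their within-imputation
    variances into the final estimate Q-bar and total variance T."""
    q_hats = np.asarray(q_hats, dtype=float)
    u_hats = np.asarray(u_hats, dtype=float)
    k = len(q_hats)
    q_bar = q_hats.mean()            # eq. 2-15
    u_bar = u_hats.mean()            # eq. 2-17: average within-imputation variance
    b = q_hats.var(ddof=1)           # eq. 2-18: between-imputation variance
    t = u_bar + (1 + 1 / k) * b      # eq. 2-16: total variance
    return q_bar, t

# Hypothetical coefficient-alpha estimates from k = 3 imputed data sets.
q_bar, t = pool([0.82, 0.79, 0.85], [0.004, 0.005, 0.004])
```

Note that T exceeds Ū whenever the estimates disagree across imputations, which is exactly the adjustment single imputation fails to make.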
Little and Rubin (1989) pointed out that γ is equal to the fraction of data missing only when the missing data mechanism is missing completely at random.

Uncertainty

Since the imputed values are not the true observed scores, MI takes into account the uncertainty about the true values of the missing scores in the parameter estimates by drawing a parameter θ* from the observed-data posterior distribution f(θ | Yo) and then drawing imputed values Ym* from the conditional predictive distribution of Ym given that parameter, f(Ym | Yo, θ*) (Rubin & Schenker, 1991). In addition to incorporating the uncertainty about not knowing the missing scores, MI also takes into account the fact that the population distribution of Ym given Yo is not known and is estimated from the observed scores Yo in one sample (Graham & Schafer, 1999; Little & Schenker, 1995). The variation in estimating the regression line is called sampling variability (Graham & Schafer, 1999). A third source of uncertainty comes from the finite number of imputations derived from using approximations to Bayesian posterior distributions, and is called simulation error (Rubin & Schenker, 1991). Finally, by comparing parameter estimates across a number of plausible missing-data models (i.e., sensitivity analysis), MI reveals uncertainty about the reasons for nonresponse (Beaton, 1997; Little & Rubin, 1989).

Number of Imputations

Under the ignorable response assumption, the final adjusted point estimate Q̄ and its estimated variance based on an infinite number of imputations are the same as those obtained from maximum likelihood estimation (MLE), which is fully efficient and correct (Little, 1992). In standard error units, the point estimate Q̄ based on k imputations differs from one based on an infinite number of imputations by the factor (1 + γ/k)^(1/2) (Rubin, 1996; Schafer & Olsen, 1998).
As an illustration, with 30% missing data (γ = 0.3), an estimate based on k = 3 imputations has a standard error (1 + 0.3/3)^(1/2) ≈ 1.05 times that obtained from MLE, that is, about 5% wider. Alternatively, the percent efficiency of Q̄ is defined as (1 + γ/k)^(-1/2) (Rubin, 1987). In this example, the percent efficiency is 1/1.05 ≈ .95, or 95%, which means the efficiency of Q̄ is 5% less than that obtained from MLE. Increasing k to 5 and 10 imputations increases the efficiency of Q̄ to about 97% and 98.5%, respectively. As shown in Figure 2-2, unless the fraction of missing data is unusually high (70% or more), the efficiency gained by increasing k beyond 5 to 10 is minimal. Rubin and Schenker (1986) concluded that only a small number of repetitions (3 ≤ k ≤ 10) is needed to produce point estimates that are close to fully efficient when the amount of missing data is moderate (e.g., 30%).

Figure 2-2. Percent efficiency of MI estimation using different numbers of imputations at three levels of missingness (10%, 30%, and 50%).

Several empirical studies have reported that single-imputation standard error estimates of the parameters were underestimated by 10-20% when compared with those from MI (Crawford, Tennstedt, & McKinlay, 1995; Heitjan & Rubin, 1990; Landerman et al., 1997). Based on the results of two Monte Carlo simulation studies (Little & Rubin, 1989; Rubin & Schenker, 1986), Table 2-1 compares the actual confidence interval (CI) coverage for Q when k = 1, 2, or 3 with the nominal coverage at 90%, 95%, or 99%. Under the ignorable response assumption, the fraction of missing data in these two large-sample (N > 100) simulation studies was 30%. As indicated, when k = 1 (single imputation), the discrepancy between the actual and nominal coverage ranges from 5% to 13%, whereas when k = 2, the discrepancy is only 2-3%.
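The efficiency figures quoted above can be checked directly. This small sketch assumes the standard-error-units form (1 + γ/k)^(-1/2) described in the preceding paragraphs:

```python
def percent_efficiency(gamma, k):
    """Efficiency of the MI point estimate based on k imputations relative to
    an infinite number of imputations, in standard error units."""
    return (1 + gamma / k) ** -0.5

# gamma = 0.3, i.e., 30% missing data, as in the example above.
eff_k3 = percent_efficiency(0.3, 3)    # about 0.95
eff_k5 = percent_efficiency(0.3, 5)    # about 0.97
eff_k10 = percent_efficiency(0.3, 10)  # about 0.985
```

The diminishing returns are visible in the numbers themselves: going from k = 3 to k = 5 gains about two points of efficiency, while going from k = 5 to k = 10 gains barely one.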
When k = 3, there is no discrepancy at all, which means that the inferences are valid.

Table 2-1. Analytic Large-Sample (N > 100) Coverage (in %) of Single (k = 1) and Multiple (k = 2 or 3) Imputation Procedures With 30% Missing Data

           Nominal Coverage
k       90%      95%      99%
1        77       85       94
2        87       93       99
3        90       95       99

Note. Adapted from Little and Rubin (1989) and Rubin and Schenker (1986).

Rubin and Schenker (1986) demonstrated that k should increase from 2 to 3 as the nonresponse rate increases from 10% to 60% in order to achieve satisfactory CI coverage (i.e., close to the nominal value). Rubin and Schenker (1986) also pointed out that improvements in the actual CI coverage diminish as k increases. The differences in standard errors or CI coverage between k = 5 and k > 25 have been shown to be negligible (Heitjan & Rubin, 1990; Wang, Sedransk, & Jinn, 1992). Based on the literature, the number of imputations depends on:
1. The amount of missing information. As the percent of missing data increases, the amount of uncertainty about the imputed values increases, and accurately incorporating this uncertainty requires more imputations. Since imputed values are averaged over k imputations, the imputation variance is reduced as the number of imputations increases (Kalton & Kasprzyk, 1986; Rubin, 1987).
2. The type of missing data mechanism. Based on several simulation and empirical studies, Glynn, Laird, and Rubin (1993) and Raghunathan and Siscovick (1996) demonstrated that nonignorable nonresponse patterns require a larger number of imputations (k > 10) than ignorable nonresponse patterns to achieve satisfactory CI coverage.

Advantages of Multiple Imputation

As in single imputation, the resulting data set with the imputed values can be analyzed by means of standard complete-case analysis methods.
Because MI involves averaging over the k intermediate parameter estimates, the final point estimate derived from MI is more efficient than that from single imputation (Rubin, 1996). The final estimated variance also reflects the true variance of the parameter. Studies have affirmed that MI produces accurate (i.e., efficient) standard errors for parameter estimates because it correctly adjusts for nonresponse bias (Heitjan & Little, 1991; Rubin & Schenker, 1986; Xie & Paik, 1997). The estimated actual CI coverage is close to the nominal levels (Little & Rubin, 1989; Rubin & Schenker, 1986; Wang et al., 1992), which means MI yields valid inferences. MI has been shown to yield satisfactory parameter estimates with relatively little bias even under the following conditions:
1. Sample sizes are small (e.g., 50) (Graham & Schafer, 1999). Little (1992) recommended using MI for small samples and MLE for large samples.
2. Data are missing in large amounts (e.g., 50%) (Graham & Schafer, 1999).
3. Models are relatively large and complex (e.g., an 18-predictor model) (Graham & Schafer, 1999).
4. The ignorability assumption is suspect (Graham et al., 1996, 1997).
5. The data distribution is skewed (Graham et al., 1996; Wang et al., 1999).
6. The model of the data distribution is misspecified (Greenland & Finkle, 1995).
Empirical and simulation studies have shown that MI is far superior to deletion procedures, mean substitution, regression imputation (Crawford et al., 1995; Graham et al., 1996), and the simple hot deck procedure (DeCanio & Watkins, 1998) with regard to bias, efficiency, and validity of interval estimates when the underlying MI model specification is correct.

Limitations of Multiple Imputation

Since the observed scores Yo provide only indirect evidence about the likely values of the missing scores Ym in MI, relevant predictors for Y (i.e., knowledge of the causes of missingness) are essential to obtain unbiased estimates and valid inferences.
Summary

This section discussed the development, theoretical framework, and assumptions of MI. It described the procedures for performing MI based on a normal linear regression model with a univariate Y variable, as well as how to combine the k intermediate parameter estimates into a final adjusted point estimate and its variance. The two main features of MI are: (i) it takes into consideration the uncertainty of not knowing the exact values of the missing scores by incorporating the residual variation about the regression prediction, and (ii) it incorporates the sampling variability involved in estimating the population distribution of the missing scores, which is unknown. Because of these two features, MI has been shown to yield satisfactory parameter estimates with relatively little bias.

Missing Data Mechanisms

Little and Rubin (1987) indicated that valid inferences from MI depend on the inclusion of a correct mechanism that produces the missingness, and knowledge of the missing data mechanism is important in selecting an appropriate imputation model. Rubin (1976) defined the mechanism of missingness in terms of a probability distribution model of nonresponse. Let R denote an N x 1 vector of binary missing-data indicators with a distribution depending on a parameter vector ψ for the nonresponse model. If an examinee responds to an item, R = 1; if an examinee omits an item, R = 0. Since Y = (Yo, Ym), the probability distribution of Y for the complete data can be expressed as

f(Y | θ) = f(Yo, Ym | θ)    (2-20)

Integrating over the sampling space of the missing scores Ym yields the marginal probability distribution for the observed scores (Schafer, 1997):
f(Yo | θ) = ∫ f(Yo, Ym | θ) dYm    (2-21)

The probability distribution for the observed scores given the parameters θ of the data model and the parameters ψ of the missing data mechanism can be expressed as

f(Yo, R | X, θ, ψ) = ∫ f(Yo, Ym, R | X, θ, ψ) dYm    (2-22)

where θ and ψ are sets of indexing vectors of unknown parameters for their respective distributions. For example, the parameters in ψ are the proportions of examinees assigned to each item (Bradlow & Thomas, 1998), whereas θ = (μ, σ²) or θ = (β, σ²). Equation (2-22) can be factorized as

f(Yo, R | X, θ, ψ) = ∫ f(R | Yo, Ym, X, ψ) f(Yo, Ym | X, θ) dYm    (2-23)

where f(R | Yo, Ym, X, ψ) denotes the conditional distribution of R given Y and represents a model for the missing data mechanism, and f(Yo, Ym | X, θ) represents a model for the data. Little and Rubin (1987) distinguished three types of missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and nonignorable missing (NIM).

Missing Completely at Random

The missing data mechanism is MCAR if the probability of the missing data indicator, f(R), is independent of both the observed scores Yo and the missing scores Ym in the model, which means

f(R | Yo, Ym, X, ψ) = f(R | ψ) for all Y    (2-24)

An example of MCAR in education is when the probability of missing responses to an item in an achievement test depends neither on the examinees' ability nor on the number of instruction hours on test-taking skills received, and the number of instruction hours received is known for all examinees. As indicated in equation (2-24), the distribution of the missing data indicator R does not depend on the missing scores or any covariates X.
The first term of the marginal probability distribution in equation (2-23) can therefore come out of the integral, and equation (2-23) can be expressed as

f(Yo, R | X, θ, ψ) = f(R | ψ) ∫ f(Yo, Ym | θ) dYm    (2-25)

From equation (2-21), when the MCAR assumption holds, the probability distribution of the observed scores then becomes

f(Yo, R | X, θ, ψ) = f(R | ψ) f(Yo | θ)    (2-26)

where f(R | ψ) represents a model for the missing data mechanism, and f(Yo | θ) represents a model for the conditional probability distribution of the observed scores Yo. Since Yo and R are independent in equation (2-26), the sampling distribution of the observed scores is a marginal of the complete-data distribution (Laird, 1988). This implies that sampling-based inferences, such as regression imputation, that make use of the distributional properties of the marginal distribution of the observed scores are unbiased and valid (Heitjan, 1997). However, MCAR makes the strongest assumption among the three types of missing data mechanisms (Little, 1992). The likelihood function of the observed scores under the MCAR assumption in equation (2-26) can be factorized into two components, one pertaining solely to the structural parameter θ of the model and the other pertaining solely to the nuisance parameter ψ of the missing data mechanism:

L(θ, ψ | Yo, X, R) ∝ f(R | ψ) f(Yo | θ)    (2-27)

When the joint parameter space of (θ, ψ) is the product of the parameter space of each separately, that is, when the two parameters θ and ψ are independent, the likelihood of θ based on Yo, L(θ | Yo), is a function proportional to f(Yo | θ) (Chirembo, 1995):

L(θ | Yo) ∝ f(Yo | θ)    (2-28)

Since L(θ | Yo) is proportional to f(Yo | θ), it is not necessary to specify the missing data mechanism when using likelihood-based inferences to obtain unbiased estimates (Laird, 1988).
Under the MCAR assumption, Bayesian inference or maximum likelihood estimation of the structural parameters θ will yield valid inferences from the observed scores f(Yo | θ) without estimating the parameters ψ (Rubin, 1976). That is why the missing data mechanism is ignorable for likelihood-based inferences. The pattern of missingness on Y under the MCAR assumption is completely randomly determined. The MCAR assumption can be assessed by comparing the distributions of the missing variable Y for respondents and nonrespondents on the covariates X to check for evidence of a systematic difference between nonrespondents and respondents (Curran, Bacchi, Hsu Schmitz, Molenberghs, & Sylvester, 1998).

Missing at Random

MAR is based on a weaker assumption. Under the MAR assumption, the conditional probability distribution of the missing data indicator R given X depends on the observed scores Yo, but not on the missing scores Ym (Little, 1995):

f(R | Yo, Ym, X, ψ) = f(R | Yo, X, ψ) for all Ym    (2-29)

For example, the probability of missing responses to an item in an achievement test depends on the scores of the measured variables (e.g., the number of instruction hours on test-taking skills received), but not on the missing scores of the item itself (e.g., item difficulty). As indicated in equation (2-29), the distribution of the missing data indicator R does not depend on the missing scores. The first term of the marginal probability distribution in equation (2-23) can therefore come out of the integral, and equation (2-23) can be expressed as

f(Yo, R | X, θ, ψ) = f(R | Yo, X, ψ) ∫ f(Yo, Ym | X, θ) dYm    (2-30)

As in equation (2-21), the probability distribution of Yo is the marginal probability distribution:
f(Yo | X, θ) = ∫ f(Yo, Ym | X, θ) dYm    (2-31)

The probability distribution of the observed scores under the MAR assumption then becomes

f(Yo, R | X, θ, ψ) = f(R | Yo, X, ψ) f(Yo | X, θ)    (2-32)

When data are MAR, the sampling distribution of the observed scores no longer equals the ordinary marginal distribution, but depends upon the missing process (Laird, 1988); hence sampling-based inferences are biased. On the other hand, the likelihood function of the observed scores under the MAR assumption in equation (2-32) can be factorized into two components:

L(θ, ψ | Yo, X, R) ∝ f(R | Yo, X, ψ) f(Yo | X, θ)    (2-33)

Once again, when the two parameters θ and ψ are functionally unrelated, the likelihood of the structural parameter θ based on Yo is proportional to the marginal probability distribution of Yo (Rubin, 1976):

L(θ | Yo, X) ∝ f(Yo | X, θ)    (2-34)

Thus the missing data mechanism under the MAR assumption is also ignorable for likelihood-based inferences (Rubin, 1976). In summary, when the missing data mechanism is ignorable, the imputation model does not have to include a distribution of the missing data indicator R, and the likelihood function of θ is based only on the observed scores Yo. Rubin (1976) showed that the response mechanism generating the missing data is ignorable for likelihood-based inferences if the parameter θ of the data model and the parameter ψ associated with the missing data mechanism are independent or functionally unrelated, and the missing data are MAR. Conceptually, the missing values under the MAR assumption are a random sample from the complete data after conditioning on the measured variables X in the imputation model; therefore, the process of creating these missing values can be modeled using these variables (Barnard, Du, Hill, & Rubin, 1998).
For example, suppose the percent of missing responses to an item in an achievement test differs across groups of examinees with high, medium, and low cognitive-ability scores, and the cognitive-ability scores are known for all examinees. Under the MAR assumption, the missing responses are randomly distributed within these three subgroups of examinees, even though the responses are not missing at random across subgroups (Roth, 1994). In other words, the measured variables X (i.e., cognitive ability in this example) can account for the differences in the distribution of Y between nonrespondents and respondents (Little & Schenker, 1995). In addition to MAR, Roth (1994) identified another pattern of missingness in which missing data are related to other variables but are nonrandomly distributed both across and within subgroups. For example, more scores are missing at the bottom range of the high cognitive-ability group but relatively few scores are missing at the top range of the same group (Roth, 1994).

Accessible Missing Data Mechanism

Graham and Donaldson (1993) defined the missing data mechanism as "accessible" when the cause of missingness has been measured, whereas "ignorable" refers to a combination of accessibility and proper use of the cause of missingness in the analysis. Graham, Hofer, and Piccinin (1994) explained that unless the cause of missingness is incorporated properly in the analysis, the mechanism will not be ignorable. Schafer (1997) pointed out that whether the missing data mechanism is ignorable is closely related to the fullness of the observed scores Yo, the relevant variables X (i.e., the causes of missingness), and the complexity of the data model f(Yo | X, θ). If Yo and X contain a lot of information for predicting Ym and are incorporated properly in the imputation model for analysis, then the residual dependence of R upon Ym after conditioning on Yo and X will be small (Schafer, 1997).
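The subgroup example above can be made concrete with a small simulation. Everything here is hypothetical: the omission probabilities depend only on the measured ability score, so within an ability group the missingness carries no extra information about Y, which is the MAR condition:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Hypothetical measured covariate X (cognitive ability) and item score Y.
ability = rng.normal(0, 1, n)
score = 0.7 * ability + rng.normal(0, 0.5, n)

# MAR: the probability of omitting the item depends only on the measured
# ability (low-ability examinees omit more), not on the score itself.
p_omit = np.where(ability < -0.5, 0.4, np.where(ability < 0.5, 0.2, 0.05))
r = rng.random(n) >= p_omit  # R = True means the item was answered

# Within the low-ability group, respondents and nonrespondents should have
# similar score distributions, even though missingness varies across groups.
low = ability < -0.5
```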
Including relevant variables X (covariates, variables that relate to the nonresponse, and predictive variables that explain a considerable amount of the variance of Y) in the model helps to reduce the uncertainty of the imputations (van Buuren et al., 1999), and thus to adjust for the bias associated with the missing data (Graham et al., 1994, 1997). Barnard and Meng (1999) advocated the adoption of a "sensible imputation model," which incorporates as many relevant variables for the causes of missingness as possible while keeping the model building and fitting feasible, so as to reduce multicollinearity problems. It has been suggested that including extra variables may affect precision, but not bias, in the inferences (Rubin, 1987); on the other hand, leaving out relevant causes of missingness will yield biased estimation (Schafer, 1999; Schafer & Olsen, 1998).

Nonignorable Missing

Under the NIM assumption, the conditional probability distribution of the missing data indicator R given X is a function of the missing scores Ym, or of the values of unmeasured relevant variables, and possibly also of the observed scores Yo (Laird, 1988). The unmeasured variables may be unavailable or inaccessible.

f(R | Yo, Ym, X, ψ)    (2-35)

For example, the probability of missing responses to an item in an achievement test depends on the missing scores of the item itself (e.g., item difficulty) and/or the examinees' true unobserved parameter (e.g., the examinees' ability). Since the conditional probability distribution (2-35) cannot be simplified, NIM involves joint probability distribution modeling of both the complete data f(Yo, Ym | X, θ) for Y and the missing data mechanism f(R | Yo, Ym, X, ψ), and the joint estimation of θ and ψ from Yo and R, respectively (Schafer, 1997). Little and Rubin (1987) suggested two ways to factorize the joint distribution of the complete data Y and the missing data indicator R.
One is based on selection models:

f(Y, R | X, θ, ψ) = f(Y | θ) f(R | Y, ψ)    (2-36)

where f(Y | θ) is the model for the complete data Y, and f(R | Y, ψ) is the model for the missing data mechanism. The other is pattern-mixture models:

f(Y, R | φ, π) = f(Y | R, φ) f(R | π)    (2-37)

where f(Y | R, φ) represents the distribution of Y conditioned on the missing data indicator R, f(R | π) represents the marginal distribution of the missing data indicator for whether or not Y is missing, and φ and π are the two unknown parameters corresponding to the two distributions. Selection models specify the precise form of the nonresponse model, whereas pattern-mixture models incorporate the assumption about the missing data mechanism through restrictions on the parameters (Little, 1995). When R is independent of Y (i.e., when θ = φ and ψ = π), the missing data mechanism becomes MCAR, and the selection models are equivalent to the pattern-mixture models. The likelihood function of the observed scores under the NIM assumption includes the missing data indicator R and the missing data parameters ψ:

L(θ, ψ | Yo, X, R) ∝ f(Yo, R | X, θ, ψ)    (2-38)

The joint distribution for Y and R typically involves more parameters, such as the Y-R interaction term, than can be estimated from Yo and R alone (i.e., it is underidentified) (Little, 1995). In order to make the parameters identifiable, so that valid likelihood-based inferences can be made about the marginal responses, a restriction on the assumptions is required. Schafer (1997) suggested that a priori restrictions be imposed on either the joint parameter space for θ and ψ or the Bayesian prior distribution f(θ, ψ). Conceptually, under the NIM assumption, the distribution of respondents and nonrespondents on Y differs systematically, even after conditioning on the values of the measured variables X (Rubin & Schenker, 1991).
Compared to equation (2-3), the posterior predictive probability distribution under the NIM assumption needs to include a full specification of the probability model with the joint distribution of Y, the nonresponse pattern R, and the measured variables X:

f(Y_mis | Y_obs, X, R) = ∫ f(Y_mis | Y_obs, X, R, θ) f(θ | Y_obs, X, R) dθ    (2-39)

Sensitivity Analysis

Often, little is known about the nonresponse mechanism that creates the missing responses in a particular achievement test. Missing responses can arise for a variety of reasons, including a combination of ignorable and nonignorable mechanisms (Schafer & Olsen, 1998). However, distinguishing between ignorable and nonignorable mechanisms (i.e., MAR and NIM) relies on fundamentally untestable assumptions; Curran and associates (1998) demonstrated that these assumptions cannot be tested formally from the empirical data at hand. Analyses should therefore be conducted to compare the estimates across a number of plausible missing-data models. Inferences from such a sensitivity analysis reveal the uncertainty about the reasons for nonresponse (Beaton, 1997; Little & Rubin, 1989). Sensitivity analysis can also be conducted across alternative imputing procedures in a similar manner, to reveal the uncertainty about different possible imputation models. Under the NIM assumption, sensitivity analysis can be performed by comparing estimates between selection and pattern-mixture models. If the results are consistent, confidence in the conclusions is established; on the other hand, if the results depend on the form of the model, then more specific conditions can be suggested about where the conclusions apply (Little, 1995).

Summary

Valid inferences from MI rely on the inclusion of a correct missing data mechanism. As discussed above, factorization of the posterior predictive probability distribution depends on whether the missing data mechanism is ignorable or not.
When the missing data mechanism is not ignorable, the missing data indicator has to be incorporated into the posterior predictive probability distribution, and the likelihood function of θ is not based on the observed scores alone.

CHAPTER 3
METHODOLOGY

This chapter first describes the data generation procedure, the design of the study, and the MI procedure. Data generation was based on the three-parameter logistic model (Hambleton & Swaminathan, 1985). The design of this study involved the distribution and percent of missing responses as a function of the ability of the examinees and the difficulty of the items in one omitting pattern, and of the ability of the examinees and the sequence of the items in the other omitting pattern. The procedure of MI, based on a logistic regression model with a univariate Y, is then outlined. The second part of this chapter discusses how the effectiveness of MI in handling missing data was evaluated.

To evaluate the effectiveness of MI in handling missing data, several steps were required:

Step 1. Simulated a complete data matrix of item responses for a specified number of examinees.
Step 2. Computed the coefficient alpha. The coefficient alpha of this original complete data set (i.e., 0% missing) served as a benchmark for later comparison.
Step 3. Nonrandomly deleted a certain percent of the item responses from the complete examinee-by-item matrix generated in Step 1. Each missing data set was generated in a similar fashion.
Step 4. Replaced the omitted item responses from Step 3 using MI.
Step 5. Computed the coefficient alpha of the data set that was restored by MI.
Step 6. Compared the coefficient alpha from Step 2 with the one from Step 5.

Simulation Procedure

Let W be an N × P matrix representing a complete examinee-by-item data set, where N is the number of examinees in the data set and P is the number of test items. In this study, P was fixed at 20 in all conditions.
A 20-item test was used because it represents test lengths frequently found in educational and psychological applications (Yen, 1987), such as the American Mathematical Association of Two-Year Colleges' Student Mathematics League contest (Isaacson & Smith, 1993). Each item response was dichotomous in nature. Simulation of the 20 dichotomously scored item responses for a specified number of examinees was based on the three-parameter logistic model (Hambleton & Swaminathan, 1985):

P_i(θ_j) = c_i + (1 − c_i) · exp[D a_i(θ_j − b_i)] / {1 + exp[D a_i(θ_j − b_i)]}    (3-1)

for i = 1, ..., 20 and j = 1, ..., n, where P_i(θ_j) is the probability of the jth examinee with ability θ_j answering the ith item correctly, a_i is the discrimination parameter of item i, b_i is the difficulty parameter of item i, c_i is the pseudochance parameter of item i, θ_j is the ability of the jth examinee, and D is a scaling factor equal to 1.7.

Computing P_i(θ_j) required the three item parameters (i.e., a, b, and c) and the ability parameter (i.e., θ) to be known. The three item parameters were drawn from distributions of known mean and standard deviation. Harrison (1998) used the criteria of Oshima's (1994) study to sample the three parameters. The criteria of Oshima's (1994) study were as follows: the item discrimination parameters (a) were randomly drawn from a lognormal distribution with a mean of 1.13 and a standard deviation of 0.63; the item difficulty parameters (b) were randomly drawn from a normal distribution with a mean of 0 and a standard deviation of 1; and the pseudochance parameters (c) were randomly drawn from a normal distribution with a mean of 0.25 and a standard deviation of 0.05. According to Oshima (1994), the distributions of these three parameters were similar to those of the real data set of a speeded test (the TOEFL) as reported by Way and Reese (1991). The ability parameters, θ, for the examinees were randomly generated from a standard normal N(0, 1) distribution.
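The data generation of equation (3-1) can be sketched in a few lines. This is a minimal illustration rather than the study's original program; the function names and the placeholder parameter values below are my own, and the dichotomization rule follows the comparison with a uniform draw described in the text:

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model (equation 3-1)."""
    z = D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + np.exp(-z))   # numerically stable logistic form

def simulate_matrix(theta, a, b, c, rng):
    """Simulate an N x P dichotomous response matrix: assign 1 when the
    response probability exceeds a uniform random draw r, else 0."""
    p = p_3pl(theta[:, None], a[None, :], b[None, :], c[None, :])
    r = rng.uniform(size=p.shape)
    return (p > r).astype(int)

rng = np.random.default_rng(42)
theta = rng.standard_normal(50)      # abilities ~ N(0, 1)
a = np.full(20, 1.13)                # illustrative values, not Oshima's draws
b = rng.standard_normal(20)
c = np.full(20, 0.25)
W = simulate_matrix(theta, a, b, c, rng)   # 50 x 20 complete data matrix
```

With c_i as a lower asymptote, P_i(θ_j) ranges from c_i (very low ability) to 1 (very high ability), which is why the broadcast comparison against uniform draws reproduces the intended item-response probabilities.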
The present study used the same values of the three item parameters as in Harrison's (1998, p. 7) study to generate the item responses (Table 3-1). The correlation between the item difficulty parameters (b) and the item discrimination parameters (a) was .111 (p = .642, two-tailed), whereas the correlation between the item difficulty parameters (b) and the pseudochance parameters (c) was .281 (p = .230, two-tailed).

The response to a particular item given by an examinee with ability θ_j was determined by computing the probability P_i(θ_j) of correctly answering that item from the known item and ability parameters. Since the item responses were dichotomous in nature, each response probability P_i(θ_j) was converted into a binary response by comparison with a random number r generated from a uniform distribution on the interval between 0 and 1. If the response probability obtained from equation (3-1) was greater than the random number r, a 1 was assigned, indicating that the examinee's response to that particular item was correct; otherwise a 0 was assigned, indicating that the response was incorrect.

Kuder-Richardson 20 Formula

Since the test items are scored dichotomously, the Kuder-Richardson 20 formula (KR-20) was used to calculate the index of internal consistency for the test items. The Kuder-Richardson 20 formula is equivalent to coefficient alpha when the item responses are dichotomous in nature (Kuder & Richardson, 1937):

KR-20 = [s / (s − 1)] [1 − (Σ p_i q_i) / σ_x²]    (3-2)

where s is the number of items in the test, σ_x² is the variance of the test scores, p_i is the proportion of subjects answering item i correctly, q_i is the proportion of subjects answering item i incorrectly, and p_i q_i is the variance of scores on a single item i.

Table 3-1.
Item Parameters Used for Test Simulation

Item   Discrimination (a)   Difficulty (b)   Pseudochance (c)
 1         0.269               0.772             0.176
 2         0.236               0.129             0.200
 3         2.817               0.979             0.257
 4         8.565               0.235             0.193
 5         1.452               0.072             0.264
 6         1.043               1.245             0.246
 7         1.594               1.504             0.229
 8         1.258               0.545             0.221
 9         5.502               0.802             0.250
10         2.468               2.408             0.306
11         1.016               0.048             0.231
12         3.413               2.062             0.240
13         2.238               0.262             0.287
14         2.370               1.158             0.207
15         2.635               0.314             0.276
16         0.533               0.536             0.319
17         1.601               1.177             0.320
18         2.809               0.471             0.261
19         0.036               0.475             0.297
20         7.637               0.203             0.328

Note. Adapted from Harrison (1998).

Design of Study

This study used a 3 × [(3 × 3) + 1] × 2 design with three fully crossed factors: sample size (3 levels); missing-data condition, formed by crossing the percent of examinees with missing items (3 levels) with the percent of items missing for each such examinee (3 levels), plus one additional condition; and omitting pattern (2 levels). The additional condition involved a disproportional percent of examinees with missing items that were nonrandomly distributed across and within each ability group. The rationales for selecting the levels of each factor are described below.

Sample Size

The three levels of sample size chosen in this study were N = 50, 100, and 500. A sample size of 50 examinees is typical of validation studies (Schmidt, Hunter, & Urry, 1976). The sample size of 500 was the same as that of the real data Raghunathan and Siscovick (1996) used in studying the performance of MI. These three levels, representing small to large sample sizes, were also used by Graham and Schafer (1999) to evaluate the efficiency of MI in a simulation study. The present study adopted these three levels of sample size to allow comparison of the performance of MI with that of the other MDTs investigated by Harrison (1998).
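The KR-20 statistic of equation (3-2) can be computed directly from a scored examinee-by-item matrix such as those simulated above. A minimal sketch (the function name is my own; note the choice of variance estimator, which the original text does not specify):

```python
import numpy as np

def kr20(X):
    """KR-20 (equation 3-2) for a 0/1 examinee-by-item matrix X.

    Uses the population variance of the total scores; some treatments use
    the (n - 1)-denominator sample variance instead.
    """
    X = np.asarray(X, dtype=float)
    s = X.shape[1]                    # s: number of items
    p = X.mean(axis=0)                # p_i: proportion answering item i correctly
    q = 1.0 - p                       # q_i: proportion answering incorrectly
    sigma2 = X.sum(axis=1).var()      # variance of the total test scores
    return (s / (s - 1.0)) * (1.0 - (p * q).sum() / sigma2)

X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 0],
              [1, 1, 1]])
print(kr20(X))   # 0.75 for this toy matrix
```

For this toy matrix the total scores are 2, 1, 0, and 3 (population variance 1.25) and Σ p_i q_i = 0.625, so KR-20 = (3/2)(1 − 0.5) = 0.75, matching the formula term by term.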
Distribution and Percent of Missing Responses

In order to simulate a realistic distribution and percent of nonresponse across test items, the distribution and percent of missing responses were based on the findings of a large-scale study of the Reading Comprehension subtest, Level I, of the Comprehensive Test of Basic Skills, Form S (Cluxton & Mandeville, 1979). In their study, Cluxton and Mandeville (1979) stratified one thousand third-grade students into three ability levels: high, medium, and low. They found that the proportion of students with missing items within each stratified ability level was 0-20% for the high-ability group, 20-80% for the medium-ability group, and 90-100% for the low-ability group. They also reported that the proportion of missing items (out of the 45 items in the subtest) for students within each stratified ability level was approximately 7-16% for the high-ability group, 18-38% for the medium-ability group, and 40-49% for the low-ability group. The correlation between the ability of the students and the number of items missing in the body of the test was .76, and the correlation between the ability of the students and the number of items missing at the end of the test was .47 (Cluxton & Mandeville, 1979).

Based on the range of the proportion of students with missing items within each ability level, and the range of the proportion of items missing, provided in Cluxton and Mandeville's (1979) study, the distribution and percent of missing responses in this study were constructed in four steps. First, the abilities of the examinees in a sample were rank ordered. Second, the examinees in each data set were stratified into three ability levels. Stratification was based on the assumption that the data were normally distributed, N(0, 1); plus and minus one standard deviation in each sample were used as the cutoffs to stratify the three ability groups.
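The stratification rule of the second step can be sketched as follows. This is an illustration using the theoretical ±1 cutoffs on the N(0, 1) scale (the study applied the cutoffs within each sample); the function name is my own:

```python
import numpy as np

def stratify(theta, cutoff=1.0):
    """Label examinees high / medium / low ability with +/- one SD cutoffs."""
    return np.where(theta > cutoff, "high",
                    np.where(theta < -cutoff, "low", "medium"))

rng = np.random.default_rng(0)
theta = rng.standard_normal(100_000)
groups = stratify(theta)
# Roughly 16% / 68% / 16% of simulated examinees fall in the
# high / medium / low strata, matching the normal-curve proportions.
```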
As a result, approximately 68% of the examinees fell within the one-standard-deviation band and were classified as the medium-ability group; about 16% of the examinees were above one standard deviation and were classified as the high-ability group; and about 16% of the examinees were below one standard deviation and were classified as the low-ability group.

Third, for the percent of examinees with missing items (%EMI), three conditions (%EMI1, %EMI2, and %EMI3) were constructed. In the first condition, %EMI1, the percents of the high-, medium-, and low-ability examinees missing some test items were 0%, 20%, and 90%, respectively. In the second condition, %EMI2, the percents were 10%, 50%, and 95%. In the third condition, %EMI3, the percents were 20%, 80%, and 100%. These three conditions corresponded, respectively, to the minimum, the median, and the maximum percent of examinees with missing items in each ability level provided by Cluxton and Mandeville (1979).

Fourth, for the percent of items missing for those examinees with missing item responses (%IM), another three conditions (%IM1, %IM2, and %IM3) were constructed. The first condition, %IM1, was 7% of the items missing in the high-ability group, 18% in the medium-ability group, and 40% in the low-ability group. The second condition, %IM2, was 12% of the items missing in the high-ability group, 28% in the medium-ability group, and 45% in the low-ability group. The third condition, %IM3, was 16% of the items missing in the high-ability group, 38% in the medium-ability group, and 49% in the low-ability group.
These three conditions corresponded, respectively, to the minimum, the median, and the maximum percent of items missing in each ability level provided by Cluxton and Mandeville (1979).

The two sets of conditions were crossed to create the nine missing conditions shown in Figure 3-1. For example, one combination yielded 20% of the high-ability examinees with three missing items (i.e., 16% of the 20 test items), 80% of the medium-ability examinees with eight missing items (i.e., 38% of the 20 test items), and 100% of the low-ability examinees with ten missing items (i.e., 49% of the 20 test items). The resulting distribution and percent of missing responses represented the typical range of missing data in educational tests, which is approximately 10-30% (Roth, 1994).

Figure 3-1. Distribution and percent of missing responses in the nine missing conditions.

In addition to the nine missing conditions, an additional condition was included in which a disproportional percent of examinees omitted items that were nonrandomly distributed across and within each ability group (Roth, 1994). For example, more items were missing at the bottom range of the high-ability group and relatively fewer items at the top range of the same group (Roth, 1994). The procedure was to stratify each ability group (high, medium, and low) into three substrata; stratification was once again based on plus and minus one standard deviation within each of the three ability groups. The percent of examinees missing some items within each ability substratum (%EMIs) was 0, 10, and 20 in the high-ability group; 20, 50, and 80 in the medium-ability group; and 90, 95, and 100 in the low-ability group.
The corresponding percents of items missing within each ability substratum (%IMs) were 7, 12, and 16 for the high-ability group; 18, 28, and 38 for the medium-ability group; and 40, 45, and 49 for the low-ability group. The two sets of substrata were then crossed to form the tenth condition. Table 3-2 summarizes the distribution and percent of missing responses of the ten missing conditions.

Table 3-2.
Summary of the Distribution and Percent of Missing Responses of the Ten Missing Conditions

Condition       High ability      Medium ability     Low ability        Total % missing
%EMI1 x %IM1    0% (1 item)       20% (4 items)      90% (8 items)       8.4%
%EMI1 x %IM2    0% (2 items)      20% (6 items)      90% (9 items)      10.5%
%EMI1 x %IM3    0% (3 items)      20% (8 items)      90% (10 items)     12.6%
%EMI2 x %IM1    10% (1 item)      50% (4 items)      95% (8 items)      13.3%
%EMI2 x %IM2    10% (2 items)     50% (6 items)      95% (9 items)      17.6%
%EMI2 x %IM3    10% (3 items)     50% (8 items)      95% (10 items)     21.9%
%EMI3 x %IM1    20% (1 item)      80% (4 items)      100% (8 items)     17.4%
%EMI3 x %IM2    20% (2 items)     80% (6 items)      100% (9 items)     23.8%
%EMI3 x %IM3    20% (3 items)     80% (8 items)      100% (10 items)    30.2%
%EMIs x %IMs    0/10/20%          20/50/80%          90/95/100%         18.0%
                (1/2/3 items)     (4/6/8 items)      (8/9/10 items)

Note. Each cell gives the percent of examinees in that ability group with missing items and, in parentheses, the number of the 20 test items missing for each such examinee; the item counts correspond to %IM values of 7%, 12%, and 16% (high), 18%, 28%, and 38% (medium), and 40%, 45%, and 49% (low). For condition %EMIs x %IMs, the three values in each cell refer to the upper, middle, and lower substrata of the ability group.

Omitting Pattern

The two nonrandom omitting patterns were (1) omitting item responses in the body of the test (OPB) and (2) omitting item responses at the end of the test (OPE) (i.e., not-reached items). The first pattern was similar to missing data mechanism 5 in Harrison's (1998) study. However, in contrast to Harrison's (1998) study, in which the examinees with the lowest abilities missed the most difficult items, here examinees at different ability levels missed the most difficult items differentially: the high-ability examinees missed fewer difficult items than the medium-ability examinees, who in turn missed fewer difficult items than the low-ability examinees (see Figure 3-2). The not-reached pattern was similar to a combination of missing data mechanisms 2 and 3 in Harrison's (1998) study.
However, the selection of missing responses was based on the examinees' ability instead of being random, as in missing data mechanism 2 of Harrison's (1998) study. Once again, the high-ability examinees missed fewer items at the end of the test than the medium-ability examinees, who in turn missed fewer items at the end of the test than the low-ability examinees (see Figure 3-3).

Figure 3-2. Illustration of the omitting pattern with missing item responses in the body of the test (OPB), for 15 examinees and 20 test items; items are ordered from least to most difficult and examinees by ability.

Figure 3-3. Illustration of the omitting pattern with missing item responses at the end of the test (OPE), for 15 examinees and 20 test items; examinees are ordered from high to low ability.

Cause of Missingness

Under the omitting pattern in which item responses were omitted in the body of the test, the examinees' ability and the item difficulty provided indirect evidence about the likely values of the missing responses.
On the other hand, when the item responses were omitted at the end of the test, the examinee's ability and the item effect, which is the random effect of the test items, provide an indirect evidence about the likely values of the missing responses. Since the cause of missingness in this study was under the researcher's control, and the differential amount of missing responses was a function of the examinees' ability and item difficulty, or a function of the examinee's ability and item effect depending on the omitting patterns, the missing data mechanism could be considered as missing at random (Graham et al., 1997). The missing data mechanism was therefore ignorable. Iterations For each of the 60 conditions (3 levels of sample size, 10 levels of the distribution and percent of item response omission, and 2 levels of omitting pattern), one thousand iterations were performed to ensure stable results. The one thousand iterations have been used in two previous simulation studies in the evaluation of the efficiency of MI (Glynn, Laird, & Rubin, 1993; Graham et al., 1996). The iterations resulted in generating 1,000 repeated data set for each level of sample size. Multiple Imputation Procedure Let Ybe an N x 1 vector of measures with Y~ N (X,/, o), where X? covariate was a function of the parameters, X was a matrix of examinees' ability and item difficulty variables under the situation when the omitting item responses were in the body of the test, or examinees' ability and item effect variables under the situation when the omitting item responses were at the end of the test, and / was a vector of regression parameters to be estimated. The distribution of was assumed to be multivariate normal. The algorithm for creating ten multiply imputed Ym involved the following steps (Freedman, 1990; Freedman & Wolf, 1995): Step 1. 
Specified a particular form of imputation model to predict the value of the missing variable Y and estimated the parameter vector β of regression coefficients using the portion of the sample with complete data. Since the item responses in this study were dichotomous in nature, the prediction model was a logistic regression model with a univariate Y of the form

logit(p_j) = ln[p_j / (1 − p_j)] = β0 + β1 X1j + β2 X2j    (3-3)

for j = 1, ..., n examinees. The set of predictors entered into the explicit model to create imputations for OPB differed from that for OPE. When the omitted item responses were in the body of the test, X1j was the examinee's ability and X2j was the item difficulty, whereas when the omitted responses were at the end of the test, X1j was the examinee's ability and X2j was the item effect, the random effect of the 20 items. As a result, there were two distinct logistic regression models for the MI procedure. The posterior probability of a correct response given X1j and X2j was

p_j = E(Y_j | X1j, X2j) = Pr(Y_j = 1 | X1j, X2j)    (3-4)

The logistic regression model assumed that the logit of the posterior probability was a linear combination of the X1j and X2j variables:

Y_j = 1 if logit⁻¹(β′X_j) > u_j, and Y_j = 0 otherwise    (3-5)

where u_j was a random error. Regressing Y_obs on the corresponding X matrix gave the ordinary least squares estimates: the estimated regression coefficient vector β̂ and the estimated variance-covariance matrix Σ̂.

Step 2. Randomly drew from the sampling distribution of the regression coefficients. Estimated the regression coefficients by adding random error to account for the uncertainty of the regression prediction, as in equation (2-11). Each repetition used a distinct value of β*, common across all imputed cases, drawn independently from the multivariate normal distribution of the estimated vector β̂.

Step 3.
Given an estimate of β and the values of X1 and X2, the probability of answering an item correctly could be predicted with equation (3-5) by drawing a value of u from the uniform [0, 1] distribution. Each set of predicted values was based on a different set of regression predictions and an independently drawn value of u. The probability of a correct response in the kth repetition, p_j(k), was calculated from the logistic regression with the randomly selected regression coefficients and the corresponding covariate values:

p_j(k) = 1 / (1 + exp{−[β0(k) + β1(k) X1j + β2(k) X2j]})

Step 4. The estimated probability p_j(k) from the logistic regression was compared to a random number t drawn from the uniform [0, 1] distribution for each missing score. If the predicted probability p_j(k) was greater than t, the imputed value for Y_j(k) was assigned a 1; otherwise the imputed value was assigned a 0.

Step 5. Conducted ten repetitions; that is, Steps 2 to 4 were repeated ten times to create ten distinct imputed data sets.

Step 6. Computed the KR-20 (i.e., coefficient alpha) separately in each of the ten imputed complete data sets, resulting in ten separate coefficient alphas.

Step 7. Using equation (2-15), obtained the final adjusted coefficient alpha by taking the simple arithmetic average of the ten coefficient alphas. This final coefficient alpha was then compared to the one obtained from the original complete data set.

Evaluating the Performance of Multiple Imputation

The accuracy (bias and precision) of the coefficient alpha obtained from the restored complete data set in each of the ten missing conditions using MI was assessed by means of the bias and the root-mean-square error (RMSE). Measures of the bias and RMSE were averaged over the 1,000 iterations of the simulation.
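Steps 2 through 4 amount to one repetition of the imputation draw and can be sketched as follows. This is a hedged illustration, not the study's code: `beta_hat` and `cov_hat` stand for the coefficient vector and covariance matrix estimated in Step 1, `X_mis` for the design matrix (intercept, ability, difficulty or item effect) of the cells to be imputed, and all names are my own:

```python
import numpy as np

def impute_once(beta_hat, cov_hat, X_mis, rng):
    """One MI repetition for dichotomous scores via logistic regression."""
    # Step 2: perturb the coefficients to reflect estimation uncertainty.
    beta_star = rng.multivariate_normal(beta_hat, cov_hat)
    # Step 3: predicted probability of a correct response for each missing cell.
    p = 1.0 / (1.0 + np.exp(-(X_mis @ beta_star)))
    # Step 4: dichotomize against an independent uniform draw t.
    t = rng.uniform(size=p.shape[0])
    return (p > t).astype(int)

rng = np.random.default_rng(1)
beta_hat = np.array([0.5, 1.2, -0.8])      # hypothetical Step 1 estimates
cov_hat = 0.01 * np.eye(3)
X_mis = np.array([[1.0, 0.3, -1.5],
                  [1.0, -2.0, 0.9]])
imputed = impute_once(beta_hat, cov_hat, X_mis, rng)
```

Repeating this draw ten times (Step 5) and averaging the ten resulting KR-20 values (Steps 6 and 7) yields the final MI estimate of coefficient alpha.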
Bias is defined as the average value of the coefficient alphas derived from the original complete data sets with no missing data minus the average value of the coefficient alphas from the corresponding imputed data sets, over the 20,000 (i.e., 2 × 10 × 1,000) completed tests for a particular number of examinees. The estimated coefficient alpha is unbiased when this average deviation (i.e., the bias) between the coefficient alpha obtained from the imputed values and that of the original values is close to 0. RMSE is defined as the square root of the average squared difference between the coefficient alpha derived from the original complete data set with no missing data and the coefficient alpha from the corresponding imputed data set. The estimated coefficient alpha is precise when the RMSE is close to 0.

RMSE = √{average [(α of original data − α of data restored using MI)²]}    (3-6)

The relationship between the RMSE and the bias is

(RMSE)² = (bias)² + (SE)²    (3-7)

where the RMSE is the root mean square error, representing overall error; the bias is the average deviation, representing systematic error; and the SE is the standard error, representing random error.

CHAPTER 4
RESULTS

In this chapter, the results of the analyses of the data for the two performance criteria, the bias and the root mean square error (RMSE), are presented. The mean coefficient alphas and standard deviations of the original complete data sets with no missing data for 50, 100, and 500 examinees were M = 0.765, SD = 0.033; M = 0.764, SD = 0.023; and M = 0.763, SD = 0.01, respectively. Each mean coefficient alpha was based on the average of 20,000 (10 missing conditions × 2 omitting patterns × 1,000 iterations) values. These mean coefficient alphas were very close to those computed by Harrison (1998); for example, the mean coefficient alpha for the sample size of 50 in Harrison's study was 0.769.
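The bias and RMSE criteria of equations (3-6) and (3-7) reduce to a few lines of code; a sketch over arrays of per-iteration alphas (the array names and toy values are my own):

```python
import numpy as np

def bias_and_rmse(alpha_orig, alpha_mi):
    """Bias and RMSE (equations 3-6 and 3-7) across simulation iterations."""
    d = np.asarray(alpha_orig) - np.asarray(alpha_mi)
    bias = d.mean()                    # systematic error
    rmse = np.sqrt((d ** 2).mean())    # overall error
    return bias, rmse

# Toy check of the decomposition (RMSE)^2 = (bias)^2 + (SE)^2:
alpha_orig = np.array([0.80, 0.80, 0.80, 0.80])
alpha_mi = np.array([0.75, 0.85, 0.78, 0.82])
bias, rmse = bias_and_rmse(alpha_orig, alpha_mi)
se = (alpha_orig - alpha_mi).std()     # population SD of the deviations
```

In this toy case the positive and negative deviations cancel, so the bias is (numerically) zero and the RMSE reflects pure random error, illustrating how a precise estimate can still show nonzero RMSE through the SE term.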
The mean coefficient alphas and their standard deviations for the data sets restored using MI, for the ten missing conditions at each of the three levels of sample size and two levels of omitting pattern, are shown in Figures 4-1 to 4-6. The biases obtained in each of the ten missing conditions for the two omitting patterns are summarized in Tables 4-1 and 4-2. Under the omitting pattern in which the missing responses were at the end of the test (OPE), the biases (in absolute value) ranged from 0.000 to 0.030, and the majority (93%) of the biases in OPE were smaller than 0.02 in magnitude. The biases (in absolute value) obtained in OPE for sample sizes 50, 100, and 500 ranged from 0.001 to 0.030, 0.000 to 0.016, and 0.000 to 0.009, respectively. The biases obtained under the omitting pattern in which the missing responses were in the body of the test (OPB) were noticeably higher than those in OPE for the corresponding missing conditions; the biases (in absolute value) ranged from 0.019 to 0.069, and the majority (97%) of the biases in OPB were less than 0.06. The biases (in absolute value) obtained in OPB for sample sizes 50, 100, and 500 ranged from 0.027 to 0.069, 0.019 to 0.051, and 0.028 to 0.054, respectively. As expected, the largest bias, 0.069, occurred in missing condition %EMI3 x %IM3, where the small sample size (50) was accompanied by the largest percent (30.2%) of missingness. The coefficient alphas in OPB were always overestimated (positively biased), whereas in OPE about half of the coefficient alphas obtained through MI were overestimated and the other half were underestimated. Whether MI produced overestimated or underestimated coefficient alphas in OPE did not depend on the percent of missingness. Further research is needed to explore why some of the coefficient alphas obtained from MI were overestimated while others were underestimated under the same omitting pattern.
For condition %EMIs x %IMs, in which the nonrandom distribution of omission is across and within each ability group (Roth, 1994), the bias obtained, regardless of the omitting pattern, was similar to that in other missing conditions where the percent of missingness was about the same (e.g., missing condition %EMI2 x %IM2). The RMSEs obtained in each of the ten missing conditions for the two omitting patterns are summarized in Tables 4-3 and 4-4. The results were very similar to those obtained for the bias. In general, the bias (in absolute value) or the RMSE increased as the amount of missingness increased. Graham and Schafer (1999) explained this phenomenon by suggesting that MI introduces bias when handling missing data. However, the pattern of increase in the bias or the RMSE was not unidirectional, as indicated in Tables 4-1 to 4-4. There were irregularities in the magnitude of the bias or the RMSE across the ten missing conditions within each sample size. That is, in some missing conditions the magnitude of the bias or the RMSE for the smaller amount of missingness was larger than that for the larger amount of missingness, even though both conditions had the same sample size. This kind of irregularity was also noticed in Graham and Schafer's (1999) simulation study. Another general pattern revealed in this study was that the bias decreased as the sample size increased. Once again, the pattern of change in the bias was not unidirectional, as indicated in Tables 4-1 and 4-2. There were irregularities across the three sample sizes, and this kind of irregularity was also noticed in Graham and Schafer's study. On the other hand, the magnitude of the RMSE in OPE, but not in OPB, showed a clear pattern of decrease as the sample size increased (see Tables 4-3 and 4-4).

Figure 4-1. The mean coefficient alpha in OPB with sample size of 50.
Figure 4-2. The mean coefficient alpha in OPE with sample size of 50.

Figure 4-3. The mean coefficient alpha in OPB with sample size of 100.

Figure 4-4. The mean coefficient alpha in OPE with sample size of 100.

Figure 4-5. The mean coefficient alpha in OPB with sample size of 500.

Figure 4-6. The mean coefficient alpha in OPE with sample size of 500.

Table 4-1. Bias for the Coefficient Alpha in Omitting Pattern Where Missing Responses are in the Body of the Test

Missing            Approx. %      n = 50          n = 100         n = 500
condition          missingness    Mean (SD)       Mean (SD)       Mean (SD)
%EMI1 x %IM1        8.4%          0.036 (0.017)   0.037 (0.012)   0.028 (0.005)
%EMI1 x %IM2       10.5%          0.028 (0.018)   0.019 (0.014)   0.029 (0.006)
%EMI1 x %IM3       12.6%          0.035 (0.019)   0.030 (0.015)   0.043 (0.006)
%EMI2 x %IM1       13.3%          0.042 (0.020)   0.036 (0.013)   0.036 (0.006)
%EMI3 x %IM1       17.4%          0.059 (0.021)   0.051 (0.015)   0.043 (0.007)
%EMI2 x %IM2       17.6%          0.054 (0.022)   0.050 (0.015)   0.035 (0.007)
%EMIs x %IMs       18.0%          0.034 (0.020)   0.047 (0.015)   0.047 (0.007)
%EMI2 x %IM3       21.9%          0.027 (0.023)   0.043 (0.016)   0.040 (0.007)
%EMI3 x %IM2       23.8%          0.031 (0.024)   0.040 (0.016)   0.054 (0.007)
%EMI3 x %IM3       30.2%          0.069 (0.023)   0.047 (0.018)   0.048 (0.008)

Table 4-2. Bias for the Coefficient Alpha in Omitting Pattern Where Missing Responses are at the End of the Test

Missing            Approx. %      n = 50          n = 100         n = 500
condition          missingness    Mean (SD)       Mean (SD)       Mean (SD)
%EMI1 x %IM1        8.4%          0.017 (0.019)   0.011 (0.014)   0.007 (0.006)
%EMI1 x %IM2       10.5%          0.006 (0.017)   0.010 (0.014)   0.002 (0.006)
%EMI1 x %IM3       12.6%          0.009 (0.020)   0.015 (0.017)   0.007 (0.007)
%EMI2 x %IM1       13.3%          0.001 (0.023)   0.016 (0.013)   0.007 (0.006)
%EMI3 x %IM1       17.4%          0.013 (0.024)   0.010 (0.014)   0.009 (0.008)
%EMI2 x %IM2       17.6%          0.004 (0.022)   0.013 (0.016)   0.008 (0.007)
%EMIs x %IMs       18.0%          0.005 (0.024)   0.000 (0.018)   0.005 (0.007)
%EMI2 x %IM3       21.9%          0.024 (0.025)   0.006 (0.020)   0.008 (0.008)
%EMI3 x %IM2       23.8%          0.030 (0.028)   0.008 (0.020)   0.000 (0.008)
%EMI3 x %IM3       30.2%          0.017 (0.029)   0.006 (0.021)   0.003 (0.009)

Table 4-3. RMSE for the Coefficient Alpha in Omitting Pattern Where Missing Responses are in the Body of the Test

Missing            Approx. %      n = 50          n = 100         n = 500
condition          missingness    Mean (SD)       Mean (SD)       Mean (SD)
%EMI1 x %IM1        8.4%          0.037 (0.017)   0.037 (0.012)   0.028 (0.005)
%EMI1 x %IM2       10.5%          0.029 (0.017)   0.020 (0.013)   0.029 (0.006)
%EMI1 x %IM3       12.6%          0.035 (0.018)   0.030 (0.014)   0.043 (0.006)
%EMI2 x %IM1       13.3%          0.043 (0.019)   0.036 (0.013)   0.036 (0.006)
%EMI3 x %IM1       17.4%          0.059 (0.021)   0.051 (0.015)   0.043 (0.007)
%EMI2 x %IM2       17.6%          0.054 (0.022)   0.050 (0.015)   0.035 (0.007)
%EMIs x %IMs       18.0%          0.034 (0.019)   0.047 (0.015)   0.047 (0.007)
%EMI2 x %IM3       21.9%          0.030 (0.019)   0.043 (0.016)   0.040 (0.007)
%EMI3 x %IM2       23.8%          0.033 (0.021)   0.040 (0.016)   0.054 (0.007)
%EMI3 x %IM3       30.2%          0.069 (0.023)   0.047 (0.018)   0.048 (0.008)

Table 4-4. RMSE for the Coefficient Alpha in Omitting Pattern Where Missing Responses are at the End of the Test
Missing            Approx. %      n = 50          n = 100         n = 500
condition          missingness    Mean (SD)       Mean (SD)       Mean (SD)
%EMI1 x %IM1        8.4%          0.021 (0.014)   0.015 (0.010)   0.008 (0.005)
%EMI1 x %IM2       10.5%          0.015 (0.011)   0.014 (0.010)   0.005 (0.004)
%EMI1 x %IM3       12.6%          0.018 (0.014)   0.018 (0.013)   0.008 (0.005)
%EMI2 x %IM1       13.3%          0.018 (0.014)   0.017 (0.011)   0.008 (0.005)
%EMI3 x %IM1       17.4%          0.022 (0.016)   0.014 (0.011)   0.010 (0.006)
%EMI2 x %IM2       17.6%          0.018 (0.014)   0.017 (0.013)   0.009 (0.006)
%EMIs x %IMs       18.0%          0.020 (0.015)   0.014 (0.011)   0.007 (0.005)
%EMI2 x %IM3       21.9%          0.028 (0.021)   0.016 (0.013)   0.009 (0.006)
%EMI3 x %IM2       23.8%          0.033 (0.024)   0.017 (0.013)   0.006 (0.005)
%EMI3 x %IM3       30.2%          0.027 (0.020)   0.017 (0.013)   0.007 (0.006)

CHAPTER 5
DISCUSSION

Because there was no substantial bias in any of the missing conditions, the results of this simulation study indicated that MI is a reasonably good procedure for replacing the missing data in a single-facet crossed model in which missing responses are either in the body of the test or at the end of the test. The majority of the biases obtained were less than 0.05, and their magnitude was comparable to those obtained in Harrison's (1998) study. The most significant difference was that the amount of missingness in the present study was two to three times more than that used in Harrison's study, and the omitting patterns were nonignorable. The present study used the examinee's ability θ and the item difficulty b as the predictors in the logistic regression when the missing responses were in the body of the test, and the examinee's ability θ and the item effect i as the predictors when the missing responses were at the end of the test. The predictors used in the present study differed from the ones used by Harrison (1998), who used the examinee effect j and the item effect i as the predictors. Results of using j and i as the predictors in the present study indicated that the biases for the coefficient alpha were unacceptably higher than those obtained using θ and b, or θ and i.
For example, in the missing condition %EMI3 x %IM3 with a sample size of 50, the bias obtained using j and i as the predictors was 0.211 when missing responses were in the body of the test, and 0.233 when missing responses were at the end of the test. This illustrated one of the limitations of MI mentioned in Chapter 3, namely that inference based on MI will be biased when relevant predictors are not incorporated (Schafer, 1999; Schafer & Olsen, 1998). An attempt to include more predictors (i.e., the examinee's ability θ, the item difficulty b, the examinee effect j, and the item effect i) in the logistic model did not help to reduce the bias. For example, the bias obtained using θ, b, j, and i as the predictors was 0.07 in the missing condition %EMI3 x %IM3 when the omitting pattern was OPB and the sample size was 50. This illustration affirmed Rubin's (1987) suggestion that extra variables do not affect the bias in the inferences. Further systematic studies need to be conducted to support Rubin's claim regarding the relationship between the bias and the number of predictors. Another possible factor that may have affected the accuracy of the obtained coefficient alphas was the extreme value of some of the item discrimination parameters (e.g., a = 7.637). Unfortunately, a simpler model such as a one-parameter Rasch model with fixed item discrimination and pseudo-chance parameters did not help to reduce the bias. The bias for the above missing condition (i.e., %EMI3 x %IM3) was still in the magnitude of 0.07 when using θ and b as the predictors. A comparison of the biases obtained from the two omitting patterns suggests that the examinee's ability, rather than the item effect or the item parameters, may contribute more to the accuracy of the parameter estimation. Further systematic investigation is warranted. Finally, a surprising finding was obtained when using listwise deletion to estimate the coefficient alpha in the above missing condition (i.e., %EMI3 x %IM3): the bias was 0.077.
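The kind of nonignorable mechanism discussed here can be sketched as follows. This is a hedged illustration, not the study's actual generation code: omission probability follows a logistic model in the examinee's ability θ and the item difficulty b (the intercept and slopes below are assumptions chosen for illustration), after which listwise deletion discards every examinee with any omission before alpha is computed.

```python
import numpy as np

rng = np.random.default_rng(2)
n_examinees, n_items = 50, 20

theta = rng.normal(0.0, 1.0, n_examinees)   # examinee ability
b = rng.normal(0.0, 1.0, n_items)           # item difficulty

# Rasch-style response generation: P(correct) = logistic(theta - b).
p_correct = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
responses = (rng.random((n_examinees, n_items)) < p_correct).astype(float)

# Nonignorable omission: lower ability and harder items raise the
# probability of omission (coefficients are illustrative assumptions).
logit_miss = -3.0 - 1.0 * theta[:, None] + 0.5 * b[None, :]
p_miss = 1.0 / (1.0 + np.exp(-logit_miss))
responses[rng.random((n_examinees, n_items)) < p_miss] = np.nan

# Listwise deletion keeps only examinees with no omitted items.
complete = responses[~np.isnan(responses).any(axis=1)]
if complete.shape[0] > 1:
    k = complete.shape[1]
    alpha_listwise = (k / (k - 1)) * (
        1 - complete.var(axis=0, ddof=1).sum() / complete.sum(axis=1).var(ddof=1)
    )
```

Because omission depends on θ, the examinees who survive listwise deletion are systematically more able, which is one reason a deletion-based alpha can be biased; MI instead models the omission process with θ and b as predictors.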
This bias (in absolute value) was much smaller than those obtained in Harrison's (1998) study, even though the amount of missingness was three times greater. The bias obtained from the nonrandom missing conditions (with 10% of missingness) in Harrison's study was about 0.2. This surprising finding may have something to do with the idiosyncratic nature of the missing mechanism in this study. Further research needs to investigate this issue systematically.

Limitations

The present study used the examinee's ability and the item difficulty as the predictors in OPB, and the examinee's ability and the item effect as the predictors in OPE. However, in real-life testing situations, the ability parameter θ and the item parameter b need to be estimated first. Accurate estimation of these two parameters may not be possible in situations with a substantial amount of missing data (Bradlow & Thomas, 1998). Another limitation is that one may not be sure of the mechanism producing the missing data.

Suggestions for Future Research

The present study illustrated only one way of using MI to analyze the data. It is important to perform a sensitivity analysis comparing the results obtained in the present study with those obtained when the nonresponse model is treated as nonignorable. A comparison of the coefficient alpha obtained using the selection model approach versus the pattern mixture model certainly would be informative. The bias obtained in the present study, as well as in Graham and Schafer's (1999) study, was not a linear function of the amount of missingness or the sample size. However, no good explanation can be given based on the limited information provided in the present study or in Graham and Schafer's (1999) study. This may be an important issue for further investigation.
Because of the positively skewed distribution of the biases in OPB and the lower-bound nature of the coefficient alpha, using the median instead of the mean to compute the final adjusted alpha may be worthwhile to investigate. This study illustrated two of the most commonly encountered omitting patterns: missing responses in the body of the test and at the end of the test. In most real-life educational tests, the types of omitting patterns are much more complicated, and the missing responses, as suggested by Schafer and Olsen (1998), can arise from a variety of reasons, including a combination of ignorable and nonignorable mechanisms. Systematic investigation of the effectiveness of different MDTs, especially RMLE and MI, under conditions involving a combination of ignorable and nonignorable mechanisms is important for examining different kinds of missing responses. In Chapter 2, several methods were described for creating the posterior predictive probability distribution from Yobs; however, to date few studies have attempted to compare the different data simulation methods applied to MI. Duncan, Duncan, and Li (1998) illustrated the use of data augmentation and the bootstrap in a structural equation model. More studies should investigate the effectiveness of these data simulation methods. Obviously, the application of MI is not confined to the single-facet situation; further study should explore the application of MI to multi-facet situations where a generalizability coefficient is obtained. The incorporation of a rater facet in nested designs can be an extension of the present study to test the effectiveness of MI in handling missing data in a more complicated situation.

REFERENCES

Angoff, W. H., & Schrader, W. B. (1984). A study of hypotheses basic to the use of rights and formula scores. Journal of Educational Measurement, 21, 1-17.

Bacik, J. M., Murphy, S. A., & Anthony, J. C. (1998). Drug use prevention data, missed assessments and survival analysis.
Multivariate Behavioral Research, 33, 573-588.

Barnard, J., Du, J. T., Hill, J. L., & Rubin, D. B. (1998). A broader template for analyzing broken randomized experiments. Sociological Methods and Research, 27, 285-317.

Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8, 17-36.

Beaton, A. E. (1997). Missing scores in survey research. In J. P. Keeves (Ed.), Educational research, methodology, and measurement: An international handbook (2nd ed., pp. 763-766). New York: Pergamon Press.

Bradlow, E. T., & Thomas, N. (1998). Item response theory models applied to data allowing examinee choice. Journal of Educational and Behavioral Statistics, 23, 236-243.

Brownstone, D., & Valletta, R. G. (1996). Modeling earnings measurement error: A multiple imputation approach. Review of Economics and Statistics, 78, 705-717.

Chirembo, A. M. (1995). Direct versus indirect methods for the estimation of variance-covariance matrices and regression parameters when data are skewed and incomplete. Unpublished doctoral dissertation, University of Florida, Gainesville.

Cluxton, S. E., & Mandeville, G. K. (1979, April). Latent trait models: Ability estimates and omitted items. Paper presented at the 63rd Annual Meeting of the American Educational Research Association, San Francisco, CA.

Crawford, S. L., Tennstedt, S. L., & McKinlay, J. B. (1995). A comparison of analytic methods for nonrandom missingness of outcome data. Journal of Clinical Epidemiology, 48, 209-219.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart, & Winston.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.

Curran, D., Bacchi, M., Hsu Schmitz, S. F., Molenberghs, G., & Sylvester, R. J. (1998). Identifying the types of missingness in quality of life data from clinical trials.
Statistics in Medicine, 17, 739-756.

DeCanio, S. J., & Watkins, W. E. (1998). Investment in energy efficiency: Do the characteristics of firms matter? Review of Economics and Statistics, 80, 95-107.

Downey, D. G., & King, C. V. (1998). Missing data in Likert ratings: A comparison of replacement methods. Journal of General Psychology, 125, 175-191.

Duncan, T. E., Duncan, S. C., & Li, F. (1998). A comparison of model- and multiple imputation-based approaches to longitudinal analyses with partial missingness. Structural Equation Modeling, 5, 1-21.

Freedman, V. (1990). Using SAS to perform multiple imputation (Discussion Paper Series UIPSC6). Washington, DC: Urban Institute.

Freedman, V., & Wolf, D. A. (1995). A case study on the use of multiple imputation. Demography, 32, 459-470.

Gelfand, A. E., & Smith, A. M. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.

Glynn, R. J., Laird, N. M., & Rubin, D. B. (1993). Multiple imputation in mixture models for nonignorable nonresponse with follow-ups. Journal of the American Statistical Association, 88, 984-993.

Graham, J. W., & Donaldson, S. I. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanisms and use of follow-up data. Journal of Applied Psychology, 78, 119-128.

Graham, J. W., Hofer, S. M., Donaldson, S. I., MacKinnon, D. P., & Schafer, J. L. (1997). Analysis with missing data in prevention research. In K. J. Bryant, M. Windle, & S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 325-366). Washington, DC: American Psychological Association.

Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197-218.

Graham, J. W., Hofer, S. M., & Piccinin, A. M. (1994).
Analysis with missing data in drug prevention research. In L. M. Collins & L. A. Seitz (Eds.), Advances in data analysis for prevention intervention research (NIDA Research Monograph 142, pp. 13-62). Washington, DC: National Institute on Drug Abuse.

Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple imputation for multivariate data with small sample size. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1-29). Thousand Oaks, CA: Sage.

Greenland, S., & Finkle, W. D. (1995). A critical look at methods for handling missing covariates in epidemiologic regression analyses. American Journal of Epidemiology, 142, 1255-1264.

Gross, A. L. (1997). Interval estimation of bivariate correlations with missing data on both variables: A Bayesian approach. Journal of Educational and Behavioral Statistics, 22, 407-424.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.

Harrison, J. M. (1998). A comparison of strategies for estimating internal consistency on tests with missing data. Unpublished master's thesis, University of Florida, Gainesville.

Heitjan, D. F. (1997). Annotation: What can be done about missing data? Approaches to imputation. American Journal of Public Health, 87, 548-550.

Heitjan, D. F., & Little, R. J. A. (1991). Multiple imputation for the fatal accident reporting system. Applied Statistics, 40, 13-29.

Heitjan, D. F., & Rubin, D. B. (1990). Inference from coarse data via multiple imputation with application to age heaping. Journal of the American Statistical Association, 85, 304-314.

Isaacson, J., & Smith, G. (1993). Hosting a mathematics tournament for two-year college students. (ERIC Document Reproduction Service No. ED 366 382)

Jamshidian, M., & Bentler, P. M. (1999). ML estimation of mean and covariance structures with missing data using complete data routines. Journal of Educational and Behavioral Statistics, 24, 21-41.

Kalton, G., & Kasprzyk, D. (1986).
The treatment of missing survey data. Survey Methodology, 12, 1-16.

Kim, J. O., & Curry, J. (1977). The treatment of missing data in multivariate analysis. Sociological Methods and Research, 6, 215-241.

Koretz, D., Lewis, E., Skewes-Cox, T., & Burstein, L. (1993). Omitted and non-reached items in mathematics in the 1990 National Assessment of Educational Progress. (ERIC Document Reproduction Service No. ED 378 220)

Kromrey, J. D., & Hines, C. V. (1994). Nonrandomly missing data in multiple regression: An empirical comparison of common missing-data treatments. Educational and Psychological Measurement, 54, 573-593.

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151-160.

Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305-315.

Landerman, L. R., Land, K. C., & Pieper, C. F. (1997). An empirical evaluation of the predictive mean matching method for imputing missing values. Sociological Methods and Research, 26, 3-33.

Little, R. J. A. (1992). Regression with missing X's: A review. Journal of the American Statistical Association, 87, 1227-1237.

Little, R. J. A. (1995). Modeling the dropout mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-1121.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.

Little, R. J. A., & Rubin, D. B. (1989). The analysis of social science data with missing data. Sociological Methods and Research, 18, 292-326.

Little, R. J. A., & Schenker, N. (1995). Missing data. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 39-75). New York: Plenum.

Longford, N. T. (1994). Models for scoring missing responses to multiple-choice items. (ERIC Document Reproduction Service No. ED 382 650)

Marcoulides, G. A. (1990). An alternative method for estimating variance components in generalizability theory.
Psychological Reports, 66, 379-386.

Michiels, B., & Molenberghs, G. (1997). Protective estimation of longitudinal categorical data with nonrandom dropout. Communications in Statistics: Theory and Methods, 26, 65-94.

Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of Educational Statistics, 17, 131-154.

Neal, T., & Nianci, G. (1997). Generating multiple imputations for matrix sampling data analyzed with item response models. Journal of Educational and Behavioral Statistics, 22, 425-445.

Oshima, T. C. (1994). The effect of speededness on parameter estimation in item response theory. Journal of Educational Measurement, 31, 200-219.

Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21, 381-391.

Pollard, W. E. (1986). Bayesian statistics for evaluation research: An introduction. Beverly Hills, CA: Sage.

Raaijmakers, Q. A. (1999). Effectiveness of different missing data treatments in surveys with Likert-type data: Introducing the relative mean substitution approach. Educational and Psychological Measurement, 59, 725-728.

Raghunathan, T. E., & Siscovick, D. S. (1996). A multiple-imputation analysis of a case-control study of the risk of primary cardiac arrest among pharmacologically treated hypertensives. Applied Statistics, 45, 335-352.

Raymond, M. R. (1987). Missing data in evaluation research. Evaluation and the Health Professions, 9, 395-420.

Roth, P. L. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47, 537-550.

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91, 473-489.

Rubin, D. B., & Schenker, N. (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse.
Journal of the American Statistical Association, 81, 366-374.

Rubin, D. B., & Schenker, N. (1991). Multiple imputation in health-care databases: An overview and some applications. Statistics in Medicine, 10, 585-598.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall.

Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3-15.

Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst's perspective. Multivariate Behavioral Research, 33, 545-571.

Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61, 473-485.

Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82, 528-550.

van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681-694.

Wang, C. Y., Anderson, G. L., & Prentice, R. L. (1999). Estimation of the correlation between nutrient intake measures under restricted sampling. Biometrics, 55, 711-717.

Wang, R., Sedransk, J., & Jinn, J. H. (1992). Secondary data analysis when there are missing observations. Journal of the American Statistical Association, 87, 952-961.

Way, W. D., & Reese, C. M. (1991). An investigation of the use of simplified IRT models for scaling and equating the TOEFL test. (ERIC Document Reproduction Service No. ED 395 024)

Xie, F., & Paik, M. C. (1997). Multiple imputation methods for the missing covariates in generalized estimating equation. Biometrics, 53, 1538-1546.

Yamamoto, K. (1995). Estimating the effects of test length and test time on parameter estimation using the HYBRID model. (ERIC Document Reproduction Service No. ED 395 035)

Yen, W. M. (1987).
A comparison of the efficiency and accuracy of BILOG and LOGIST. Psychometrika, 52, 275-291.

BIOGRAPHICAL SKETCH

Hon Keung Yuen was born in 1961 in Hong Kong. He completed his undergraduate studies at Queensland University, Brisbane, Australia, in 1986, where he majored in occupational therapy. In 1988, he received a master of science degree in occupational therapy from Western Michigan University. After five years of occupational therapy practice in the field of traumatic head injury rehabilitation, Mr. Yuen's interest in research grew. Between 1993 and 1996, he taught occupational therapy at Eastern Kentucky University and subsequently at the Hong Kong Polytechnic University. In 1996, he began working on his Ph.D. in the College of Education at the University of Florida, where he majored in research and evaluation methodology. While pursuing his Ph.D., Mr. Yuen also worked full-time on the faculty of the Occupational Therapy Department at the University of Florida. Mr. Yuen has published over twelve articles in the American Journal of Occupational Therapy. Currently, he serves on the editorial board of the American Journal of Occupational Therapy.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

M. David Miller, Chair
Professor of Educational Psychology

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Anne E. Seraphine
Assistant Professor of Educational Psychology

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Arthur J.
Newman
Professor of Educational Leadership, Policy, and Foundations

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Kay Walker
Professor of Occupational Therapy

This dissertation was submitted to the Graduate Faculty of the College of Education and the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 2000

Chairman, Educational Psychology
Dean, College of Education
Dean, Graduate School