Citation
The impact of multiple imputations on the estimation of coefficient alpha

Material Information

Title:
The impact of multiple imputations on the estimation of coefficient alpha
Creator:
Yuen, Hon Keung, 1961-
Publication Date:
Language:
English
Physical Description:
v, 90 leaves : ill. ; 29 cm.

Subjects

Subjects / Keywords:
Data imputation ( jstor )
Datasets ( jstor )
Mathematical independent variables ( jstor )
Maximum likelihood estimations ( jstor )
Missing data ( jstor )
Modeling ( jstor )
Parametric models ( jstor )
Probability distributions ( jstor )
Sample size ( jstor )
Statistical discrepancies ( jstor )
Dissertations, Academic -- Educational Psychology -- UF ( lcsh )
Educational Psychology thesis, Ph.D ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (Ph.D.)--University of Florida, 2000.
Bibliography:
Includes bibliographical references (leaves 84-89).
General Note:
Printout.
General Note:
Vita.
Statement of Responsibility:
by Hon Keung Yuen.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
024914693 ( ALEPH )
45805646 ( OCLC )












THE IMPACT OF MULTIPLE IMPUTATIONS ON
THE ESTIMATION OF COEFFICIENT ALPHA















By

HON KEUNG YUEN


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2000













ACKNOWLEDGMENTS

I am indebted to a few special individuals who made this dissertation possible.

First, I would like to thank my wife, Kit, for her patience and understanding

throughout this process. Also, I would like to thank my committee members, Dr.

David Miller, Dr. Anne Seraphine, Dr. Kay Walker, and Dr. Arthur Newman, for

their time and support.














TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

1. INTRODUCTION

Statement of the Problem
Rationale for the Study
Purpose and Significance of the Study

2. REVIEW OF LITERATURE

Common Missing Data Treatments
Multiple Imputation
Missing Data Mechanisms

3. METHODOLOGY

Simulation Procedure
Design of Study
Multiple Imputation Procedure
Evaluating the Performance of Multiple Imputation

4. RESULTS

5. DISCUSSION
Limitations
Suggestions to Future Research

REFERENCES

BIOGRAPHICAL SKETCH













Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

THE IMPACT OF MULTIPLE IMPUTATIONS ON
THE ESTIMATION OF COEFFICIENT ALPHA

By

Hon Keung Yuen

August, 2000

Chairpersons: M. David Miller and Anne Seraphine
Major Department: Educational Psychology

The purpose of this dissertation is to investigate the accuracy of coefficient

alpha on tests when nonrandom missing data are replaced using multiple imputation

under a single-facet crossed model. The performance of multiple imputation was

evaluated under the conditions of three sample sizes (N = 50, 100, or 500), ten

conditions of distribution and percent of missingness, and two omitting patterns

(omitting item responses in the body and omitting responses at the end of the test).

The ten missing conditions were formed from examinees of the three ability levels

(high, medium and low) with a differential number of missing items. The nonrandom

nature of missingness leads to examinees with low ability missing more difficult

items or more items at the end of the test than those with high ability. A twenty-item

test was used in this study. Results of the one thousand iterations indicated that the

magnitude of the bias obtained in the omitting pattern where missing responses are at

the end of the test was less than 0.03. In contrast, the magnitude of the bias obtained













in the omitting pattern where missing responses are in the body of the test was less

than 0.07. In general, the bias increased as the amount of missingness increased or as

the sample size decreased. However, this pattern is not uniform across all the missing

conditions investigated. Overall, this simulation study confirmed that multiple

imputation is a reasonably good procedure to replace the missing data on tests in

which missing responses are either in the body of the test or at the end of the test.













CHAPTER 1
INTRODUCTION

Accurate measurement of examinees' ability in standardized achievement

assessments requires the test scores to be reliably measured. Internal consistency is one

type of reliability that indicates how strongly the test items within the same construct are

correlated. Internal consistency of a test appeals to educators because it requires only a

single administration of one form of a test. Coefficient alpha (Cronbach, 1971) is a

commonly used index to estimate the internal consistency of a test. The index is not a

direct estimate of the theoretical reliability coefficient but is an estimate of the lower

bound of the internal consistency (Crocker & Algina, 1986). According to Peterson

(1994), the formula for computing coefficient alpha (α) can be expressed as

\[
\alpha = \frac{s}{s-1}\left(1 - \frac{\sum_{i=1}^{s}\sigma_i^{2}}{\sigma_t^{2}}\right) \tag{1-1}
\]

where

s is the number of items in the test,

$\sigma_t^{2}$ is the variance of the test scores,

$\sigma_i^{2}$ is the variance of a single item i,

$\sigma_i \sigma_s r_{is}$ is the covariance between item i and item s, and

$r_{is}$ is the correlation between item i and item s.









or

\[
\alpha = \frac{s\,\bar{r}}{1 + \bar{r}\,(s-1)} \tag{1-2}
\]

where r̄ is the average inter-item correlation.
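As an illustration of equation (1-1), the following minimal sketch computes coefficient alpha from a complete person-by-item matrix. The function name `coefficient_alpha` and the small data set `demo` are hypothetical and are not taken from the study; sample variances with N - 1 in the denominator are assumed.

```python
import numpy as np

def coefficient_alpha(scores):
    """Coefficient alpha, equation (1-1), for a complete persons-by-items matrix."""
    scores = np.asarray(scores, dtype=float)
    s = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each single item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total test scores
    return (s / (s - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical responses of 5 examinees to 4 dichotomous items (illustrative only).
demo = np.array([[1, 1, 1, 0],
                 [1, 0, 1, 1],
                 [0, 0, 1, 0],
                 [1, 1, 1, 1],
                 [0, 0, 0, 0]])

alpha_demo = coefficient_alpha(demo)
```

The function requires a rectangular matrix with no missing entries, which is the balanced design requirement discussed in the next paragraph.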

As in the estimation of the Pearson product moment correlation coefficient,

computation of coefficient alpha requires a rectangular person-by-item data matrix with

no missing data (i.e., a balanced design data set). However, it is well known that missing

data are common in large-scale standardized educational achievement tests such as the

National Assessment of Educational Progress (NAEP) (Koretz, Lewis, Skewes-Cox, &

Burstein, 1993; Longford, 1994) and the Test of English as a Foreign Language (TOEFL)

(Yamamoto, 1995). Yamamoto (1995) indicated that about 20% of examinees have

difficulty completing the last 20% of the items in the TOEFL. Two main classes of

nonrandom omitting patterns in a test have been identified: omitting item responses in the

body of the test and omitting item responses at the end of the test (i.e., not-reached)

(Longford, 1994). A number of conceivable circumstances can contribute to these

occurrences. Angoff and Schrader (1984) found that response omissions in the body of

the test are common in tests with instructions indicating that there is a penalty for incorrect

responses, but not for response omissions. In this situation, examinees are more likely to

omit difficult items when they are not sure of the answer (Koretz et al., 1993). For

omitting responses at the end of the test, time constraints are a major factor. However, item

difficulty has been reported to contribute to this type of omitting pattern (Koretz et al.,

1993). Cluxton and Mandeville (1979) found that less capable students tend to omit more

items at the end of the test.









Because of the balanced design requirement in the data set to compute coefficient

alpha, missing data present a challenge when standard methods of data analysis are used.

In the last few decades, a number of missing data treatments (MDTs) have been proposed

(see review in Little & Rubin, 1987). A promising MDT is multiple imputation (MI),

which was originally proposed by Rubin (1987). MI is a model-based estimation

technique for analyzing data with missing scores (Rubin, 1987). Using information from

the observed part of the data set, MI generates k sets of equally plausible values from the

simulated distribution of the missing data to replace the missing scores, where k is greater

than one. The missing scores are imputed k times (Rubin, 1987). As a result, MI creates k

versions of complete data sets with imputed values. Each complete data set can be

analyzed separately by means of standard complete-case analysis methods. The final

adjusted point estimate is obtained by averaging over the k intermediate parameter

estimates. MI has been shown to yield satisfactory parameter estimates with relatively

little bias (Graham & Schafer, 1999). However, MI has not been used widely in

educational settings except for matrix sampling and scaling procedures in the NAEP

(Mislevy, Johnson, & Muraki, 1992; Neal & Nianci, 1997).

Several recent studies compared different MDTs in estimating reliability

coefficients on measures with missing data (Downey & King, 1998; Harrison, 1998;

Marcoulides, 1990). Downey and King (1998) compared the accuracy of coefficient

alpha estimation using item mean and person mean substitution to replace missing data in

Likert scales. Results indicated that item-mean substitution reduces the reliability

estimate whereas person-mean substitution increases the reliability estimate of the scale

as the number of missing items and the number of respondents with missing items









increase beyond 20% (Downey & King, 1998). Marcoulides (1990) compared the

consistency and efficiency of two MDTs (restricted maximum likelihood and analysis of

variance) in estimating variance components on measures with missing data. He found

that restricted maximum likelihood (REML) produces a more efficient and less biased

variance estimate when 20% of the data are randomly deleted (Marcoulides, 1990).

Along the same line of research, Harrison (1998) evaluated six MDTs (listwise deletion,

zero imputation, substituting least square ANOVA, substituting probabilities of correct

answers from logistic regression estimates, Hoyt's ANOVA formula, and REML) in

estimating coefficient alpha on tests with dichotomously-scored items under the

conditions of five random and nonrandom missing data patterns crossed with two sample

sizes (50 and 100). Results showed that REML provides reasonable accuracy and

precision for the estimation of coefficient alpha in all five missing data patterns

(Harrison, 1998).



Statement of the Problem

Results of Harrison's (1998) study indicated that the average bias of the

coefficient alpha when using each of the six MDTs is negligible (less than 0.05) except in

two nonrandom missing-data situations where a listwise deletion procedure is used. One

reason for the small discrepancy in the bias among the six MDTs is that the maximum

amount of missing data in Harrison's study is less than 11%. Roth (1994) cited several

simulated and empirical MDT studies indicating that there is little difference in parameter

estimates when the amount of missing data is less than 5-10% regardless of the missing

data patterns (random or nonrandom). Roth (1994) suggested that the choice of MDTs









becomes more important when the amount of missing data in a data set is beyond 15-

20%. Therefore, we still do not know how some of the MDTs behave in situations with a

moderate amount of missingness.

Harrison (1998) found that the mean coefficient alpha produced by REML is

more positively biased (i.e., overestimated) than that computed by Hoyt's ANOVA in

situations where low-ability examinees have more omitted items or where they tend to

omit the most difficult items. There are two possible reasons for REML not behaving

well in Harrison's study. One is that REML has been shown to produce estimates that are

significantly biased in situations where sample size is small (N = 20 or 50) because

REML is based on large-sample theory (Gross, 1997). The other is that REML produces

biased estimates when data are missing nonrandomly (Jamshidian & Bentler, 1999).



Rationale for the Study

The present study attempts to address some of the limitations of the REML

estimation procedure (Harrison, 1998) by implementing MI, which has been shown to

perform well in small sample sizes (Graham & Schafer, 1999) and in nonrandom

missing-data situations (Graham, Hofer, Donaldson, MacKinnon, & Schafer, 1997).

Although MI is commonly applied to missing continuous data, it may also be applied to

dichotomous missing data (Graham et al., 1997). Because Harrison's study examined the

effectiveness of different MDTs under slight levels of missingness, it is of interest to

examine MI under more extreme levels of missingness. The level of missingness is

therefore set as high as 30%. At this range of missingness, it would become more obvious

how well MI performs.









It is well known that data missing completely at random (MCAR) seldom occur

in educational settings (Kromrey & Hines, 1994). The present study, therefore, focuses

on nonrandom missing data.



Purpose and Significance of the Study

The purpose of this study was to investigate, via data simulation, the accuracy of

the coefficient alpha on tests with missing data replaced using MI. The performance of

MI was evaluated under the conditions of three sample sizes (N = 50, 100, or 500), ten

conditions of distribution and percent of missingness, and two omitting patterns (omitting

item responses in the body and omitting responses at the end of the test). The results of

this study provided an indication of how well MI performed in the above-stated missing

data conditions under a single-facet crossed model.













CHAPTER 2
REVIEW OF RELATED LITERATURE

The first section of this chapter provides an overview of some commonly used

missing data treatments (MDTs), which include listwise deletion, variable mean

substitution, regression imputation, and stochastic regression imputation. Limitations of

these MDTs are highlighted. The second section is devoted to the development and

theoretical framework of multiple imputation (MI), the relationship of MI and Bayes'

theorem, assumptions and characteristics of MI, the description of the imputation

methods, and procedures to perform MI. The last section discusses three major types of

missing data mechanisms as proposed by Little and Rubin (1987), and implications of

each missing data mechanism to the application of MI.



Common Missing Data Treatments

Listwise Deletion

In order to transform the missing data matrix into a rectangular one, a common

practice is to exclude those examinees who do not respond to all items. This is called the

listwise deletion procedure or the complete-case analysis. The complete data with

reduced sample size are then used to estimate population parameters such as the

reliability coefficient. Listwise deletion is the default option for analysis in many popular

statistical software packages such as the Statistical Analysis System (SAS) and the

Statistical Package for the Social Sciences (SPSS-X). Even though listwise deletion is









the simplest approach to handle missing data, it is by no means the most desirable one. Since

the analysis is based on only those examinees who respond to all items, a substantial

amount of useful data is lost. In a Monte Carlo investigation, Kim and Curry (1977)

found that even with 2% random nonresponses on each of the 10 variables, listwise

deletion results in retaining only 81.7% of the cases. There is an accompanying loss of

efficiency or statistical power in the estimation of the population parameters especially

when the amount of missing data is high. Raaijmakers (1999) demonstrated that listwise

deletion results in a loss of statistical power ranging from 35% to 98% as the amount of

missing data increases from 10% to 30% in various Likert-type data.
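The 81.7% retention figure reported by Kim and Curry (1977) is consistent with a simple calculation. The sketch below assumes that nonresponse occurs independently on each variable with the same rate, so that the expected share of complete cases is (1 - p) raised to the number of variables. The function name is hypothetical.

```python
def complete_case_rate(p_missing, n_variables):
    """Expected share of cases with no missing value, assuming independent
    nonresponse with the same rate on every variable."""
    return (1.0 - p_missing) ** n_variables

rate = complete_case_rate(0.02, 10)   # 2% nonresponse on each of 10 variables
print(f"{rate:.1%}")                  # prints 81.7%
```

Under the same assumption, raising the nonresponse rate or the number of variables shrinks the complete-case sample quickly, which is why listwise deletion becomes costly as missingness grows.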

Listwise deletion is based on the assumption that the data are missing completely

at random even though there is little evidence to support this assumption in educational

research (Kromrey & Hines, 1994). When data are not missing completely at random,

estimates are biased. Empirical data indicate that the average bias increases as the amount

of missing data increases (Harrison, 1998). In Harrison's (1998) study, when item

responses are not missing at random, listwise deletion leads to range restriction. The

resulting mean coefficient alpha is then substantially underestimated (Harrison, 1998).

Single Imputation Procedures

Besides using deletion procedures to handle missing data, single imputation

procedures have also been used widely in educational research (see review in Raymond,

1987). Imputation involves filling in each missing response with a plausible value and

then analyzing the resulting data set with the imputed values. Also, plausible values are

estimated from observed scores in the study. Two major advantages of imputation

procedures are as follows:









1. They retain the information from incomplete cases without discarding any scores.

2. The resulting data set with the imputed values can be analyzed by means of standard

complete-case analysis methods.

The three single imputation procedures that are commonly used in educational research

are variable mean substitution, regression imputation, and stochastic regression

imputation (Raymond, 1987; Roth, 1994).

Variable Mean Substitution

To implement variable mean substitution or unconditional mean imputation, each

missing score of a particular item is replaced with its respective mean value of all

nonmissing cases. Even though it seems that the mean is a good estimate and the

procedure is relatively easy to implement, variable mean substitution has several serious

disadvantages. The observed variance of an item with imputed mean value is

systematically underestimated (i.e., negatively biased) because imputing a mean value in

an item is equivalent to adding zero to the sum of the squared deviations, which is the

numerator of the formula for calculating variance. At the same time, there is an increase

in the denominator of the variance formula, (N - 1), as the procedure attempts to restore

the original sample size (Landerman, Land, & Pieper, 1997; Raymond, 1987).
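The variance attenuation described above can be checked numerically. The following sketch uses simulated scores (hypothetical values, with about 30% of them deleted completely at random purely for simplicity) and compares the variance of the observed scores with the variance after variable mean substitution.

```python
import numpy as np

rng = np.random.default_rng(0)
item = rng.normal(loc=50.0, scale=10.0, size=200)   # hypothetical complete item scores
missing = rng.random(200) < 0.30                    # about 30% of the scores set missing

observed = item[~missing]
imputed = item.copy()
imputed[missing] = observed.mean()                  # variable mean substitution

var_observed = observed.var(ddof=1)   # sum of squares divided by (n_observed - 1)
var_imputed = imputed.var(ddof=1)     # same sum of squares divided by (N - 1)
```

Because each imputed value adds zero to the sum of squared deviations while the denominator grows from the number of observed cases minus one to N - 1, `var_imputed` equals `var_observed` multiplied by (n_observed - 1) / (N - 1), which is always smaller when any score is imputed.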

Attenuation of the covariance or correlation between the mean-imputed scores of an

item and the scores in other items can be explained in a similar fashion.

Conceptually, the imputed values are a constant and they are unrelated to scores in other

items; therefore, inter-item correlations are attenuated. Downey and King (1998) showed

that the severity of attenuation on correlation increases as the amount of imputed values

increases. As indicated in equation (1-2), a reduction in the average inter-item correlation









results in a decrease in the coefficient alpha (Downey & King, 1998). Figure 2-1

graphically shows that the coefficient alpha spuriously decreases when there is a reduction in

the average inter-item correlation at the lower end (i.e., r̄ < 0.2).

Another disadvantage related to the attenuation of variability of the item is that

the standard error of estimate is much too small resulting in biased inferences (Little &

Rubin, 1989). Finally, variable mean substitution does not use information from other

items to improve the accuracy of imputation (Landerman et al., 1997).


[Figure 2-1 plot not reproduced. Horizontal axis: Average Inter-item Correlation, 0 to 1.]

Figure 2-1. Relationship between the coefficient alpha and the average inter-item
correlation when s equals 20.
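The relationship summarized in Figure 2-1 can be recomputed from equation (1-2). The sketch below evaluates the equation for a 20-item test at a few selected values of the average inter-item correlation; the function name and the chosen values are illustrative.

```python
def alpha_from_mean_r(r_bar, s=20):
    """Equation (1-2): alpha implied by the average inter-item correlation r_bar."""
    return s * r_bar / (1.0 + r_bar * (s - 1))

for r_bar in (0.05, 0.1, 0.2, 0.3, 0.5, 0.8):
    print(f"r_bar = {r_bar:.2f}  alpha = {alpha_from_mean_r(r_bar):.3f}")
```

With s = 20, alpha falls from about 0.83 at an average correlation of 0.2 to about 0.51 at 0.05, whereas it changes by less than 0.04 between 0.5 and 0.8. This is why attenuated correlations at the lower end have a marked effect on alpha.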









Regression Imputation

Regression imputation or conditional mean substitution is used to fill in missing

scores of an item with values predicted from a regression model by utilizing information

from one or more highly related observed variables or predictors. When the response

variable with missing scores is dichotomous in nature, a logistic regression model is used

instead. The logistic regression produces a predicted probability of a correct response

for each missing score.

Suppose Y is an N × 1 vector of the responses for examinees, $Y = (y_1, \ldots, y_n)$, and

is composed of a set of the observed scores, $Y_o = (y_1, \ldots, y_o)$, and a set of the missing

scores, $Y_m = (y_{o+1}, \ldots, y_n)$. Y can be partitioned into Y = (Yo, Ym). Let no be the number of

respondents for Yo and nm be the number of nonrespondents for Ym. X denotes an N × G

design matrix of relevant explanatory variables or predictors that are highly correlated

with Y. These variables are fully observed (i.e., with no missing data). Xo represents the

predictors for the no individuals with observed scores Yo and Xm represents the predictors for

the nm individuals with missing scores Ym. The Ym scores are to be estimated. A linear regression

equation can be expressed as

\[
Y = a + bX + e \tag{2-1}
\]

where

a is a column vector of the intercepts,

b is a column vector of estimated regression coefficients, and

e is a column vector of estimated residuals and is set to zero.

The procedure to impute the predicted values based on a deterministic regression

model is as follows. First, use Xo to estimate the coefficients of the linear regression

equation by regressing Yo on Xo. After estimating the regression coefficients from the

observed scores, a predicted score for Ym can be obtained from the prediction equation:

Ŷm = a + bXm (Little & Rubin, 1989).
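The deterministic procedure above can be sketched as follows, assuming an ordinary least squares fit and missing scores coded as NaN. The function name `regression_impute` and its arguments are hypothetical; this is one possible implementation, not the software used in the studies cited.

```python
import numpy as np

def regression_impute(y, X):
    """Fill NaN entries of y with predictions from an OLS fit on the observed cases."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), X])   # intercept a plus predictors X
    miss = np.isnan(y)
    # Regress the observed scores Yo on their predictors Xo.
    coef, *_ = np.linalg.lstsq(design[~miss], y[~miss], rcond=None)
    y_filled = y.copy()
    y_filled[miss] = design[miss] @ coef             # a + b * Xm, residual e set to zero
    return y_filled, coef
```

Every imputed value lies exactly on the fitted line or plane, which is the property examined in the paragraphs that follow.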

Landerman and associates (1997) explained that the distribution of Y based on the

regression imputation is less distorted than Y based on the mean substitution because the

imputed values are now distributed across the predicted values Ŷm instead of concentrated

at the mean. The variance of Y based on the regression imputation is less attenuated than

that based on the mean substitution because the numerator for calculating variance is the

squared deviations of the Ŷm from the grand mean, which is not likely to be zero.

In regression imputation, the imputed values of Ym fall exactly on the predicted

regression line or plane as the estimated residual e in equation (2-1) is set to zero

(Landerman et al., 1997; Little & Rubin, 1989). Therefore, there is no variation in the

distribution of Ym given Xm (Little & Rubin, 1989). Because of the exact linear (or plane)

relationship between Ym and Xm, their correlation is spuriously inflated (Graham & Schafer,

1999). Harrison (1998) demonstrated that replacing missing data with either the least

squares estimates or the predicted probabilities overestimates the coefficient alpha. This

phenomenon can be explained by the positive relationship between the coefficient alpha

and the average inter-item correlation in equation (1-2). In addition, the severity of bias

(overestimation of the coefficient alpha) increases as the amount of missing data

increases (Harrison, 1998). Little (1992) concluded that mean substitution or regression

imputation can yield unbiased estimates of aggregate means but leads to distorted

variance and covariance estimates.









Regression imputation conveys a false sense of accuracy that all missing scores

can be predicted from Xm without errors. By treating the imputed values as the known

observed scores, regression imputation fails to account properly for the variability or

uncertainty about not knowing the missing scores (i.e., which value to impute) (Rubin &

Schenker, 1991). Failure of the regression model to incorporate residual variability in the

imputation variance leads to standard errors of the estimates that are biased toward zero (i.e., too small)

(Little & Rubin, 1989). For example, Brownstone and Valletta (1996) found that the least

squares standard error estimates are 30% less than their true values.

Stochastic Regression Imputation

In order to restore the prediction errors in the imputed values (i.e., the variability

around the regression line), a random residual / error is added to each predicted value.

The random residual can be drawn with replacement either from a

normal distribution with a mean equal to zero and a standard deviation equal to the

standard error of estimate for Yo (Beaton, 1997), or from the distribution of residuals of

the regression estimate for Yo (Graham et al., 1997). The purpose of drawing with

replacement is to ensure that each drawn value has equal probability. In stochastic

regression imputation, each missing response is replaced by its conditional mean plus a

random residual from Yo (Little & Rubin, 1989).
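A minimal sketch of stochastic regression imputation follows, assuming an ordinary least squares fit, missing scores coded as NaN, and residuals drawn from a normal distribution with mean zero and standard deviation equal to the standard error of estimate. The function name is hypothetical.

```python
import numpy as np

def stochastic_regression_impute(y, X, rng):
    """Replace NaN entries of y with the conditional mean plus a random residual."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), X])
    miss = np.isnan(y)
    coef, *_ = np.linalg.lstsq(design[~miss], y[~miss], rcond=None)
    residuals = y[~miss] - design[~miss] @ coef
    dof = (~miss).sum() - design.shape[1]
    see = np.sqrt(residuals @ residuals / dof)       # standard error of estimate for Yo
    y_filled = y.copy()
    # Conditional mean plus a normal residual with mean 0 and SD equal to see.
    noise = rng.normal(0.0, see, size=miss.sum())
    y_filled[miss] = design[miss] @ coef + noise
    return y_filled
```

The alternative mentioned above, drawing from the observed residuals, would replace the normal draw with `rng.choice(residuals, size=miss.sum(), replace=True)`.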

However, stochastic regression imputation restores only one part of the

variability: the errors of prediction. There is another part of variability, the sampling

variability in which the values of the estimated regression coefficients are uncertain.

Graham and Schafer (1999) explained that the regression line estimated from Yo is not the

regression for the population, but is only an estimate from one sample. Stochastic









regression imputation cannot reflect properly the sampling variability because it lacks the

variation of the imputed values among several sets of imputations (Little & Rubin, 1989).

To incorporate the sampling variability in the estimation of the regression parameters,

multiple imputation (MI) is required (Rubin, 1987).



Multiple Imputation

Introduction

Multiple imputation was originally proposed by Rubin (1987). It is a model-based

estimation technique for analyzing data with missing scores (Rubin, 1987). Using

information from the observed part of the data set, MI generates k sets of equally

plausible values from the simulated distribution of the missing data to replace the missing

scores, where k is greater than one. The missing scores are imputed k times and multiple

imputations within one model are called repetitions (Rubin, 1987). As a result, MI creates

k versions of complete data sets with imputed values. Each complete data set can be

analyzed separately by means of standard complete-case analysis methods. The estimate

and its associated variance from each separate analysis can be combined to form an

unbiased final parameter estimate under the correctly specified model (Little & Rubin,

1989). The final variance incorporates the variability within the imputation (i.e., the

prediction error) and the variation of the imputed values among k sets of imputations (i.e.,

the sampling variability) to reflect the true accuracy of the estimation.
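The combining step can be sketched with the rules usually attributed to Rubin (1987), in which the total variance is the average within-imputation variance plus (1 + 1/k) times the between-imputation variance. The formula is not spelled out in this section, and the function name and the numbers in the usage line are hypothetical.

```python
import numpy as np

def pool_estimates(estimates, variances):
    """Combine k complete-data estimates and their variances into one estimate."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    k = len(q)
    q_bar = q.mean()                 # final point estimate: average of the k estimates
    within = u.mean()                # average within-imputation variance
    between = q.var(ddof=1)          # variance of the estimates across imputations
    total = within + (1.0 + 1.0 / k) * between
    return q_bar, total

# Hypothetical estimates from k = 3 imputed data sets.
q_bar, total_var = pool_estimates([0.80, 0.82, 0.84], [0.001, 0.001, 0.001])
```

When the k estimates agree closely, the between-imputation component is small and the total variance approaches the ordinary complete-data variance.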

Theoretical Framework

Let Q denote a scalar population quantity (such as a coefficient alpha or a

regression coefficient) to be estimated and let Q = Q (Yo, Ym) denote a function of the









observed and missing data. Multiple imputation uses information from the observed

scores Yo to replace the missing scores Ym, and then uses the complete data set with

imputed values to estimate the parameter Q. A distribution of the missing data is required

to generate the imputed values. The distribution is drawn from the observed scores Yo. It

is necessary to know the structural or model parameter of the observed scores, which is θ,

where θ represents a vector of q parameters. For example, θ = (μ, σ²) means that θ is a

function of the mean μ and the variance σ². Because θ is unknown, it must be estimated,

resulting in the random variable θ̂ (Michiels & Molenberghs, 1997). Because of the

uncertainty of not knowing θ, Rubin (1987) recommended using Bayesian methodology

to account for this uncertainty in MI. Through a Bayesian procedure, a distribution

function of θ̂ in the form of the posterior probability distribution of θ can be obtained

from the data (Michiels & Molenberghs, 1997).

The derivation of MI is as follows (Rubin, 1996): Inferences for Q are based on

the actual posterior probability distribution of Q, f(Q | Yo), which can be expressed as

\[
f(Q \mid Y_o) = \int f(Q \mid Y_o, Y_m)\, f(Y_m \mid Y_o)\, dY_m \tag{2-2}
\]

where

f denotes the probability distribution function,

f(Q | Yo, Ym) is the complete-data posterior probability distribution of Q and is expressed

as the conditional distribution of Q given both the observed and missing data, and

f(Ym | Yo) is the predictive probability distribution of the missing scores Ym given the

observed scores Yo.









Based on equation (2-2), the actual posterior probability distribution of Q at a

particular value Qi can be obtained by drawing an infinite number of repeated

independent values for Ym from f(Ym | Yo), calculating f(Qi | Yo, Ym) separately for each

draw, and then averaging the values over the repeated imputations (Little & Schenker,

1995; Rubin, 1996).

The predictive distribution of the missing scores can be parameterized using a

structural parameter θ (Little & Schenker, 1995), and is expressed as

\[
f(Y_m \mid Y_o) = \int f(Y_m \mid Y_o, \theta)\, f(\theta \mid Y_o)\, d\theta \tag{2-3}
\]

where

f(θ | Yo) is the conditional distribution of θ given the observed scores Yo, and

f(Ym | Yo, θ) is the conditional distribution of Ym given the observed scores Yo and the

parameter θ.

From the Bayesian perspective, drawing k values for the missing scores Ym in MI

involves two steps (Schafer, 1999):

Step 1. Simulate an independent random draw of the unknown parameter θ* from the

observed-data posterior distribution f(θ | Yo).

\[
\theta^{*} \sim f(\theta \mid Y_o)
\]

where θ* is a value drawn from the posterior distribution of θ and parameterizes the

distribution estimated for the missing scores.

Step 2. Randomly draw missing values Ym* from the conditional predictive distribution of

Ym given the parameter θ*.

\[
Y_m^{*} \sim f(Y_m \mid Y_o, \theta^{*})
\]









These two steps are repeated k times to yield k sets of imputed values for the missing

scores Ym.

In principle, MI involves k repetitions of independent draws from the posterior

predictive distribution of Ym by specifying a prior distribution of the unknown structural

parameter θ (Little & Schenker, 1995). This forms the k imputations for the missing

scores Ym.
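The two-step draw can be illustrated with a deliberately simple case. The sketch below assumes a single normally distributed variable with no covariates and a noninformative prior proportional to 1/σ², so that the posterior of σ² is a scaled inverse chi-square distribution and the posterior of μ given σ² is normal. These modeling choices and the function name are assumptions made for illustration; they are not the imputation model used in this study.

```python
import numpy as np

def draw_imputations(y_obs, n_missing, k, rng):
    """Return a k-by-n_missing array of imputed values for a univariate normal model."""
    y_obs = np.asarray(y_obs, dtype=float)
    n = len(y_obs)
    y_bar, s2 = y_obs.mean(), y_obs.var(ddof=1)
    imputations = []
    for _ in range(k):
        # Step 1: draw theta* = (mu*, sigma2*) from the posterior f(theta | Yo).
        sigma2_star = (n - 1) * s2 / rng.chisquare(n - 1)
        mu_star = rng.normal(y_bar, np.sqrt(sigma2_star / n))
        # Step 2: draw Ym* from f(Ym | Yo, theta*).
        draw = rng.normal(mu_star, np.sqrt(sigma2_star), size=n_missing)
        imputations.append(draw)
    return np.array(imputations)
```

Because θ* is redrawn on every repetition, the k sets of imputed values differ both through the residual variability in Step 2 and through the parameter uncertainty in Step 1.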

Bayes' Theorem

The aim of MI is to estimate the unknown structural parameter $\theta$ and to generate the imputed values. Conditional upon a sample of observed scores $Y_o$, Bayes' theorem makes inferences about the unknown parameter $\theta$. Bayes' theorem represents the uncertainty of not knowing $\theta$ by a prior probability distribution (Pollard, 1986).

The conditional distribution of $\theta$ given the observed scores $Y_o$, or the posterior distribution of $\theta$, is derived from a Bayesian procedure (Pollard, 1986) and is defined as

$$f(\theta \mid Y_o) = \frac{f(Y_o \mid \theta)\, f(\theta)}{f(Y_o)} \tag{2-4}$$

where

$f(Y_o \mid \theta)$ is the conditional probability, or likelihood, of the observed scores $Y_o$,

$f(\theta)$ is the prior distribution of the unknown structural parameter $\theta$ and represents uncertainty about the value of the parameter before any data are seen, and

$f(Y_o)$ is the marginal probability of observed scores $Y_o$ for an examinee of parameter $\theta$ randomly sampled from a population with the given distribution.

Since $f(Y_o)$ is a constant that serves to make $f(\theta \mid Y_o)$ integrate to one, equation (2-4) becomes

$$f(\theta \mid Y_o) \propto f(Y_o \mid \theta)\, f(\theta) \tag{2-5}$$

where $\propto$ indicates a relationship of proportionality.

Given $Y_o$, $f(Y_o \mid \theta)$ becomes a likelihood function for $\theta$ given $Y_o$, $L(\theta \mid Y_o)$. The posterior distribution of $\theta$ is then written as

$$f(\theta \mid Y_o) \propto L(\theta \mid Y_o)\, f(\theta) \tag{2-6}$$

where $L(\theta \mid Y_o)$ expresses the information about the parameter provided by the observed scores $Y_o$, and $f(Y_o \mid \theta)$ serves to convert the prior distribution $f(\theta)$ into the posterior distribution $f(\theta \mid Y_o)$ (Pollard, 1986).

It is through the likelihood $L(\theta \mid Y_o)$, that is, $f(Y_o \mid \theta)$, that the observed scores $Y_o$ modify the prior distribution $f(\theta)$ to determine the posterior distribution $f(\theta \mid Y_o)$ (Pollard, 1986). In essence, Bayesian methodology specifies a distribution of what is expected to occur based on prior information and combines it with new information (i.e., the observed scores $Y_o$) to form inferences about $\theta$.
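The proportionality between posterior, likelihood, and prior can be illustrated numerically. The following is a minimal sketch, not part of the original study: it assumes a normal model with known standard deviation `sigma`, a flat prior, and a grid of candidate values for the mean; the function name `grid_posterior` is hypothetical.

```python
import numpy as np

def grid_posterior(y_obs, theta_grid, sigma=1.0):
    """Approximate f(theta | Y_o) on a grid as likelihood times prior.

    Assumptions (illustrative only): y_i ~ N(theta, sigma^2) with sigma known,
    and a flat prior f(theta) = constant over the grid.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    theta_grid = np.asarray(theta_grid, dtype=float)
    prior = np.ones_like(theta_grid)                      # flat prior f(theta)
    # Log-likelihood L(theta | Y_o), up to an additive constant
    resid = y_obs[:, None] - theta_grid[None, :]
    loglik = -0.5 * (resid ** 2).sum(axis=0) / sigma ** 2
    unnorm = np.exp(loglik - loglik.max()) * prior        # likelihood x prior
    step = theta_grid[1] - theta_grid[0]
    # Dividing by the grid sum plays the role of f(Y_o): it makes the
    # posterior integrate to one
    return unnorm / (unnorm.sum() * step)

theta = np.linspace(0.0, 10.0, 1001)
posterior = grid_posterior([4.0, 5.0, 6.0], theta)
print(theta[np.argmax(posterior)])  # posterior mode under the flat prior
```

With a flat prior the posterior mode coincides with the value that maximizes the likelihood; an informative prior would shift it.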

Assumptions Underlying Multiple Imputation

The missing data mechanism is assumed to be ignorable or missing at random

(Graham & Schafer, 1999). Ignorable means it is not necessary to specify a nonresponse

model or to estimate its parameters in order to obtain valid likelihood-based inferences.

Missing at random means that the missing data are a random sample from the complete

data after conditioning on the measured variables X in the imputation model (Schafer,

1997). However, Graham and associates (1997) have demonstrated that MI produces

satisfactory parameter estimates even when the ignorability assumption is suspect.









In addition, the variables in the data set are assumed to have a multivariate normal distribution. Simulation studies (Graham, Hofer, & MacKinnon, 1996; Wang, Anderson, & Prentice, 1999) showed that the MI estimator is robust even when the data model departs from multivariate normality.

Proper Imputation Method

An imputation method is regarded as proper when it incorporates appropriate variability (i.e., uncertainty about the missing scores and the sampling variability) in creating multiply imputed values under a correctly specified model (Rubin, 1987, 1996). Rubin (1987) has shown that one way to achieve proper imputation is for the imputation procedure to follow Bayes' theorem, with infinite independent draws of $Y_m$ from its posterior predictive distribution as specified in equations (2-2) and (2-3). By incorporating variability to adjust the standard error of estimates of the parameter, a proper imputation method leads to valid inferences (Rubin & Schenker, 1991).

The conditions under which an imputation method is proper include the

following:

1. Imputed values are independent repeated draws from a Bayesian posterior predictive distribution of the missing scores $Y_m$ given the observed scores, $f(Y_m \mid Y_o)$ (Rubin, 1996).

2. The number of repeated imputations k is infinite, since parameter estimates derived from infinite draws for $Y_m$ are fully efficient (Little & Rubin, 1989).

3. The underlying model specification for the complete data is correct.

4. The underlying model specification for the missing data mechanism (i.e., assumptions about the nonresponse) is correct.

5. The sample size is large (N > 100) (Rubin & Schenker, 1986).

6. All causes of missingness are included in the imputation model (Graham et al., 1997).

Imputation Methods

Rubin and Schenker (1986) proposed two types of imputation methods: implicit and explicit. Implicit or nonparametric methods are applicable for discrete data and involve drawing values only from $Y_o$ and then assigning them to $Y_m$. In contrast, explicit or parametric methods are applicable for continuous data and involve a statistical model to form the posterior predictive distribution of $Y_m$, from which the imputed values are drawn. Unlike implicit methods, the values drawn by explicit methods are not restricted to those in $Y_o$ (Rubin & Schenker, 1986).

Implicit Methods

Simple hot deck procedure

The simple hot deck procedure involves random draws with replacement of $n_m$ imputed values for nonrespondents from observed scores in matching respondents. However, like stochastic regression imputation, the simple hot deck procedure ignores the sampling variability, as the population distribution of $Y_m$ given $Y_o$ is not known. The imputed values are estimated from the respondent scores $Y_o$ in one sample only (Little & Schenker, 1995).
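The simple hot deck draw can be sketched in a few lines. This is an illustration under a simplifying assumption: all respondents are treated as a single matching class, so any observed score may serve as a donor. The function name `simple_hot_deck` is hypothetical.

```python
import numpy as np

def simple_hot_deck(y_obs, n_mis, rng):
    """Draw n_mis donor values with replacement from the observed scores.

    Assumption (illustrative only): respondents form one matching class.
    """
    return rng.choice(np.asarray(y_obs), size=n_mis, replace=True)

rng = np.random.default_rng(2000)
observed = [3, 4, 4, 5, 2, 5, 1]
print(simple_hot_deck(observed, n_mis=3, rng=rng))
```

Every imputed value is an observed score, which is why the procedure carries no information about how the donor pool itself would vary from sample to sample.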

Approximate Bayesian bootstrap

In order to incorporate sampling variability in the estimated parameters, the approximate Bayesian bootstrap (ABB) is used (Rubin, 1987). The ABB creates k repeated imputations from the posterior predictive distribution of the missing data as follows:

1. Draw $n_o$ values at random with replacement from the $n_o$ possible values to create a bootstrap sample distribution such as a scaled multinomial distribution.

2. Then independently draw $n_m$ missing values with replacement from the bootstrap sample distribution (Rubin & Schenker, 1986).

This process is repeated k times to yield k sets of imputed values, and each set of imputations comes from a different bootstrap sample of $Y_o$.
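The two ABB draws can be sketched as follows. This is a minimal illustration, assuming a single variable with no matching classes; the function name `abb_impute` and its arguments are hypothetical.

```python
import numpy as np

def abb_impute(y_obs, n_mis, k, rng):
    """Approximate Bayesian bootstrap: return a (k, n_mis) array of imputations."""
    y_obs = np.asarray(y_obs)
    n_obs = y_obs.size
    imputations = []
    for _ in range(k):
        # Step 1: bootstrap n_obs values with replacement from the observed scores
        boot = rng.choice(y_obs, size=n_obs, replace=True)
        # Step 2: draw n_mis values with replacement from that bootstrap sample
        imputations.append(rng.choice(boot, size=n_mis, replace=True))
    return np.vstack(imputations)

rng = np.random.default_rng(2000)
print(abb_impute([3, 4, 4, 5, 2, 5, 1], n_mis=3, k=5, rng=rng))
```

The extra bootstrap layer in Step 1 is what distinguishes the ABB from a simple hot deck: the donor pool changes from one imputation to the next.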

Explicit Methods

Explicit methods define the model for the distribution of the response variable Y

(e.g., normal linear regression model or logistic regression model) and a set of predictors

X that enters the model to create imputations (Little, 1992; Rubin & Schenker, 1991).

Fully normal imputation

Once again, suppose $Y$ is an $N \times 1$ vector of the responses for examinees, $Y = (y_1, \ldots, y_n)$, and is composed of both a set of the observed scores, $Y_o = (y_1, \ldots, y_{n_o})$, and a set of the missing scores, $Y_m = (y_{n_o+1}, \ldots, y_n)$. Let $n_o$ be the number of respondents for $Y_o$ and $n_m$ be the number of nonrespondents for $Y_m$. The scores of $Y_m$ are to be estimated. Rubin and Schenker (1986) described how to create multiple imputations under the independent normal model, $y_i \sim N(\mu, \sigma^2)$, for $i = 1, \ldots, n$, where $\theta = (\mu, \sigma^2)$ is unknown, and $\theta$ is a function of the mean $\mu$ and the variance $\sigma^2$. When the prior distribution of $\theta$, $f(\theta)$, is proportional to $1/\sigma^2$, the conditional posterior distribution of $\mu$ given $\sigma^2$, $f(\mu \mid \sigma^2, Y_o)$, is

$$N(\bar{y}_o,\ \sigma^2 / n_o),$$

where $\bar{y}_o$ is the sample mean of $Y_o$, and is equal to $\frac{1}{n_o}\sum_{i=1}^{n_o} y_i$;

and the observed-data posterior distribution of $\sigma^2$, $f(\sigma^2 \mid Y_o)$, is

$$(n_o - g)\,\hat{\sigma}^2 / \chi^2_{n_o - g},$$

where $\hat{\sigma}^2$ is the estimated variance of $Y_o$, and is equal to $\frac{1}{n_o - g}\sum_{i=1}^{n_o} (y_i - \bar{y}_o)^2$, and $\chi^2_{n_o - g}$ denotes a chi-square random variable with $n_o - g$ degrees of freedom.

To create an imputation $Y_m^{*(l)} = (y^{*}_{n_o+1}, \ldots, y^{*}_{n})$, for $l = 1, \ldots, k$, the following three steps are required.

Step 1. Generate the unknown parameters $\theta^{*} = (\mu^{*}, \sigma^{*2})$ from the observed-data posterior distribution $f(\theta \mid Y_o)$ by first randomly drawing the variance $\sigma^{*2}$ from $(n_o - g)\,\hat{\sigma}^2 / \chi^2_{n_o - g}$, and then randomly drawing the mean $\mu^{*}$ from $N(\bar{y}_o,\ \sigma^{*2} / n_o)$.

Step 2. Independently draw $n_m$ missing values of $Y_m$ from $y_i^{*} \sim N(\mu^{*}, \sigma^{*2})$, for $i = n_o + 1, \ldots, n$ (Rubin & Schenker, 1986; Schafer, 1999).

Step 3. Repeat the procedure k times (i.e., for $l = 1, \ldots, k$) to yield k sets of proper imputations.
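The three steps of fully normal imputation can be sketched as follows. This is an illustration rather than the code used in the study; it assumes a single variable with only a mean to estimate, so `g = 1` by default, and the function name `fully_normal_impute` is hypothetical.

```python
import numpy as np

def fully_normal_impute(y_obs, n_mis, k, rng, g=1):
    """Return a (k, n_mis) array of imputations under y_i ~ N(mu, sigma^2)."""
    y_obs = np.asarray(y_obs, dtype=float)
    n_obs = y_obs.size
    y_bar = y_obs.mean()
    sigma2_hat = np.sum((y_obs - y_bar) ** 2) / (n_obs - g)
    draws = np.empty((k, n_mis))
    for l in range(k):
        # Step 1a: draw sigma*^2 from (n_o - g) * sigma_hat^2 / chi-square(n_o - g)
        sigma2_star = (n_obs - g) * sigma2_hat / rng.chisquare(n_obs - g)
        # Step 1b: draw mu* from N(y_bar, sigma*^2 / n_o)
        mu_star = rng.normal(y_bar, np.sqrt(sigma2_star / n_obs))
        # Step 2: draw the n_m missing scores from N(mu*, sigma*^2)
        draws[l] = rng.normal(mu_star, np.sqrt(sigma2_star), size=n_mis)
    return draws  # Step 3: one row per repetition

rng = np.random.default_rng(2000)
print(fully_normal_impute([12.0, 15.0, 11.0, 14.0, 13.0, 16.0], n_mis=2, k=3, rng=rng))
```

Because the mean and variance are redrawn for every repetition, the spread across rows reflects uncertainty about the parameters as well as about the individual missing scores.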

Markov Chain Monte Carlo

Resampling procedures such as bootstrapping are noniterative methods for creating the posterior predictive probability distribution from $Y_o$. In contrast, Markov Chain Monte Carlo (MCMC) is a collection of iterative simulation-based methods for generating the posterior distribution of the unknown parameters $\theta$, and these methods do not require large samples for efficacy (Little & Schenker, 1995). The data augmentation algorithm (Tanner & Wong, 1987) and Gibbs sampling (Gelfand & Smith, 1990) are two of the most common MCMC methods used in MI (Little & Schenker, 1995). They generate simulation-based estimates of the Bayesian posterior predictive distribution of the missing data, $f(Y_m \mid Y_o)$, and from it perform k independent random draws of $Y_m$ (Schafer, 1997).

Normal Linear Regression Model

$Y$ is modeled by a linear regression model, $Y \sim N(X\beta, \sigma^2)$, with a multivariate normal distribution, where the mean $X\beta$ is a function of the covariates and the parameters, $X$ contains $g$ variables, $\beta$ is the parameter vector of regression coefficients to be estimated, and $\sigma^2$ is the regression variance. The algorithm for creating k multiply imputed values involves the following steps (Rubin, 1987):

Step 1. Regress $Y_o$ on $X_o$ to give the ordinary least squares estimates: the estimated regression coefficient vector $\hat{\beta}$ and the estimated regression variance $\hat{\sigma}^2$.

$$\hat{\beta} = V X_o' Y_o \tag{2-7}$$

where $V = (X_o' X_o)^{-1}$.

The vector of predicted responses:

$$\hat{Y}_o = X_o \hat{\beta} \tag{2-8}$$

The estimator of $\sigma^2$:

$$\hat{\sigma}^2 = \sum (Y_o - \hat{Y}_o)^2 / (n_o - g) \tag{2-9}$$

This is the estimation task for the normal linear regression model (Rubin, 1987). The imputation task for this model consists of Steps 2 to 4.

Step 2. Estimate $\sigma^{*2}$ (square of a random error) to account for the deviations around the regression line.

$$\sigma^{*2} = \hat{\sigma}^2 (n_o - g) / L \tag{2-10}$$

for $l = 1, \ldots, k$, where $L$ is a randomly drawn variate from a chi-squared distribution with $n_o - g$ degrees of freedom.

Substitute $\hat{\sigma}^2 = \sum (Y_o - \hat{Y}_o)^2 / (n_o - g)$ in equation (2-10), and $\sigma^{*2}$ becomes $\sum (Y_o - \hat{Y}_o)^2 / L$.

Step 3. Estimate the regression coefficients by adding the random error $\sigma^{*}$ to account for the uncertainty about the regression prediction.

$$\beta^{*} = \hat{\beta} + \sigma^{*} V^{1/2} Z \tag{2-11}$$

where

$Z$ is a $g$-component vector of standard normal deviates, $Z \sim N(0, I_g)$,

$I_g$ is the identity matrix of order $g$,

$Z$ is formed by drawing $g$ independent variates from $N(0, 1)$,

$\sigma^{*2}$ is the mean square error of the regression equation, and

$\sigma^{*2} V$ is the variance-covariance matrix $\Sigma$. The square roots of the main diagonal of $\Sigma$ are the standard errors,

$V^{1/2}$ is the triangular square root of $V$ obtained from the Cholesky decomposition, and

$\sigma^{*} V^{1/2} Z$ represents the error term.

Each set of randomly drawn coefficients is then used to estimate the missing values, and different sets of coefficients reflect the variation of the regression lines due to sampling. Steps 2 and 3 indicate random draws from the posterior distribution of $\theta = (\beta, \sigma^2)$ (van Buuren, Boshuizen, & Knook, 1999).

Step 4. Predict the missing values of $Y_m$ based on the following equation:

$$Y_m^{*} = X_m \beta^{*} + \sigma^{*} z \tag{2-12}$$

where $z$ is a random number drawn from the standard normal distribution, independently for each missing value.

Each set of predicted values for $Y_m$ is based on a different set of regression predictions and random components $\sigma^{*}$.

Step 5. Repeat steps 2 to 4 k times to create k sets of imputed values $Y_m^{*(1)}, Y_m^{*(2)}, \ldots, Y_m^{*(k)}$.
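Steps 1 to 5 can be sketched as follows. This is an illustrative sketch, not the software used in the study. It assumes the design matrices already contain a column of ones for the intercept, so that $g$ equals the number of columns; the names `regression_impute`, `X_obs`, and `X_mis` are hypothetical.

```python
import numpy as np

def regression_impute(X_obs, y_obs, X_mis, k, rng):
    """Return a (k, n_mis) array of imputations under Y ~ N(X beta, sigma^2)."""
    X_obs = np.asarray(X_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    X_mis = np.asarray(X_mis, dtype=float)
    n_obs, g = X_obs.shape
    # Step 1: ordinary least squares on the respondents
    V = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = V @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    sigma2_hat = resid @ resid / (n_obs - g)
    V_half = np.linalg.cholesky(V)        # triangular square root of V
    out = np.empty((k, X_mis.shape[0]))
    for l in range(k):
        # Step 2: draw sigma* using a chi-square variate with n_o - g df
        sigma_star = np.sqrt(sigma2_hat * (n_obs - g) / rng.chisquare(n_obs - g))
        # Step 3: draw beta* = beta_hat + sigma* V^(1/2) Z
        beta_star = beta_hat + sigma_star * (V_half @ rng.standard_normal(g))
        # Step 4: predict the missing scores and add residual noise
        out[l] = X_mis @ beta_star + sigma_star * rng.standard_normal(X_mis.shape[0])
    return out                            # Step 5: k repetitions

rng = np.random.default_rng(2000)
x = rng.normal(size=50)
X = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
print(regression_impute(X[:40], y[:40], X[40:], k=3, rng=rng).shape)
```

The Cholesky factor is used here because it satisfies `V_half @ V_half.T == V`, which gives the drawn coefficients the covariance $\sigma^{*2} V$.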

Repeated-Imputation Inferences

The final point estimate of the parameter $Q$ approximates the actual posterior mean of $Q$, $E(Q \mid Y_o)$. This equals the average of the repeated complete-data posterior means of $Q$, and can be expressed as (Rubin, 1996):

$$E(Q \mid Y_o) = E[\,E(Q \mid Y_o, Y_m) \mid Y_o\,] \tag{2-13}$$

where the outer $E$ refers to the expectation over the repeated imputations, and $E(Q \mid Y_o, Y_m)$ approximates the complete-data posterior mean.

The final estimated variance of the parameter $Q$ approximates the actual posterior variance of $Q$, $\mathrm{Var}(Q \mid Y_o)$. This equals the average of the repeated complete-data posterior variances of $Q$ plus the variance of the repeated complete-data posterior means of $Q$, and can be expressed as (Rubin, 1996):

$$\mathrm{Var}(Q \mid Y_o) = E[\,\mathrm{Var}(Q \mid Y_o, Y_m) \mid Y_o\,] + \mathrm{Var}[\,E(Q \mid Y_o, Y_m) \mid Y_o\,] \tag{2-14}$$

where the outer $\mathrm{Var}$ refers to the variance over the repeated imputations, and $\mathrm{Var}(Q \mid Y_o, Y_m)$ approximates the complete-data posterior variance.

Based on the above standard probability derivations (2-13 & 2-14), the k point

estimates and their associated variances obtained from the standard complete-case

analysis method can be combined into a final adjusted estimate and its estimated variance

using Rubin's formulas (Rubin, 1987).

After generating k imputed data sets using an appropriate imputation model and method, and analyzing each of them separately using a standard complete-case analysis method, MI yields k intermediate parameter estimates $\hat{Q}_i$: $(\hat{Q}_1, \ldots, \hat{Q}_k)$, and k associated variance estimates $U_i$: $(U_1, \ldots, U_k)$, for $i = 1, \ldots, k$. The final adjusted point estimate $\bar{Q}$ is obtained by averaging over the k intermediate parameter estimates, which is

$$\bar{Q} = \frac{1}{k}\sum_{i=1}^{k} \hat{Q}_i \tag{2-15}$$

The final estimated total variance is the sum of the average within-imputation variance and the variance across independent sets of imputed values, the latter inflated by $(1 + k^{-1})$, which is

$$T = \bar{U} + (1 + k^{-1})B \tag{2-16}$$

where $\bar{U}$ is the average within-imputation variance over the k sets of imputed values, and is expressed as

$$\bar{U} = \frac{1}{k}\sum_{i=1}^{k} U_i \tag{2-17}$$

and $B$ is the variance across independent sets of imputed values, and is expressed as

$$B = \frac{1}{k-1}\sum_{i=1}^{k} (\hat{Q}_i - \bar{Q})^2 \tag{2-18}$$

Bacik, Murphy, and Anthony (1998) indicated that the between-imputation variance is a measure of the uncertainty about not knowing the missing data and the within-imputation variance is a measure of ordinary sampling variation. The inflation factor $(1 + k^{-1})$ accounts for the simulation errors in using a finite number of imputations (i.e., $k < \infty$) (Barnard & Meng, 1999). Multiple imputation correctly adjusts the standard error of estimates of the parameter by including within- and between-imputation variances.









When there are no missing data, $\hat{Q}_1, \ldots, \hat{Q}_k$ are identical, and the between-imputation variance $B$ becomes zero and $T$ is equal to $\bar{U}$. When k = 1 (i.e., single imputation), $B$ cannot be estimated. $T$ is then equal to $\bar{U}$, and the variance is systematically underestimated (Heitjan & Rubin, 1990). As k increases, the simulation error in $\bar{Q}$ and the total variance $T$ decrease, hence resulting in greater precision of sample statistics (Little & Schenker, 1995).

The extent of influence of missing data on the estimation of $Q$ is determined by both $\gamma$ and $r$. The factor $r$ estimates the proportional increase in variance due to missing data, and can be expressed as

$$r = (1 + k^{-1})B / \bar{U} = \gamma / (1 - \gamma) \tag{2-19}$$

where the ratio of $B$ to $\bar{U}$ reflects how much information is in the missing part of the data relative to the observed part (Schafer & Olsen, 1998), and $\gamma$ is an estimate of the fraction of missing information about $Q$ (Little & Schenker, 1995). Little and Rubin (1989) pointed out that $\gamma$ is equal to the fraction of data missing only when the missing data mechanism is missing completely at random.
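The combining rules in equations (2-15) to (2-19) can be sketched as follows. This is a minimal illustration; the function name `pool_rubin` is hypothetical, and $\gamma$ is obtained by solving $r = \gamma / (1 - \gamma)$ for $\gamma$.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine k complete-data estimates and variances (requires k >= 2)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    k = q.size
    q_bar = q.mean()                      # (2-15) final point estimate
    u_bar = u.mean()                      # (2-17) within-imputation variance
    b = q.var(ddof=1)                     # (2-18) between-imputation variance
    t = u_bar + (1 + 1 / k) * b           # (2-16) total variance
    r = (1 + 1 / k) * b / u_bar           # (2-19) relative increase in variance
    gamma = r / (1 + r)                   # from r = gamma / (1 - gamma)
    return {"Q": q_bar, "U": u_bar, "B": b, "T": t, "r": r, "gamma": gamma}

print(pool_rubin([0.82, 0.79, 0.85], [0.0016, 0.0018, 0.0015]))
```

The input values in the usage line are made-up numbers chosen only to show the call.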

Uncertainty

Since the imputed values are not the true observed scores, MI takes into account the uncertainty about the true values of the missing scores in the parameter estimates by drawing the parameter $\theta^{*}$ from the observed-data posterior distribution $f(\theta \mid Y_o)$ and then the imputed values $Y_m^{*}$ from the conditional predictive distribution of $Y_m$ given that parameter $\theta^{*}$, $f(Y_m \mid Y_o, \theta^{*})$ (Rubin & Schenker, 1991).

In addition to incorporating the uncertainty about not knowing the missing scores, MI also takes into account the fact that the population distribution of $Y_m$ given $Y_o$ is not known, and is estimated from the observed scores $Y_o$ in one sample (Graham & Schafer, 1999; Little & Schenker, 1995). The variation in estimating the regression line is called sampling variability (Graham & Schafer, 1999). The third source of uncertainty or variability comes from the finite number of imputations derived from using approximations to Bayesian posterior distributions, and is called simulation error (Rubin & Schenker, 1991). Finally, by comparing parameter estimates across a number of plausible missing-data models (i.e., sensitivity analysis), MI reveals uncertainty about reasons for nonresponse (Beaton, 1997; Little & Rubin, 1989).

Number of Imputations

Under the ignorable response assumption, the final adjusted point estimate $\bar{Q}$ and its estimated variance based on an infinite number of imputations are the same as the ones obtained from the maximum likelihood estimation (MLE), which is fully efficient and correct (Little, 1992). The large-sample standard error of the point estimate $\bar{Q}$ based on k imputations relative to that based on an infinite number of imputations is $\sqrt{1 + \gamma / k}$, in standard error units (Rubin, 1996; Schafer & Olsen, 1998). As illustrated, with 30% missing data ($\gamma = 0.3$), an estimate based on k = 3 imputations has a relative standard error of $\sqrt{1 + 0.3/3} \approx 1.05$. This means the standard error is 5% wider than the one obtained from MLE. Alternatively, the percent efficiency of $\bar{Q}$ is defined as $1 / \sqrt{1 + \gamma / k}$ (Rubin, 1987). In this example, the percent efficiency is 1/1.05 = .95 or 95%. This means the efficiency of $\bar{Q}$ is 5% less than the one obtained from MLE. Increasing k to 5 and 10 imputations increases the efficiency of $\bar{Q}$ to 97% and 98.5% respectively. As shown in Figure 2-2, unless the fraction of missing data is unusually high (70% or more), the efficiency gained by implementing k beyond 5-10 is minimal. Rubin and Schenker (1986) concluded that only a small number of repetitions (3 < k < 10) are needed to produce point estimates that are close to fully efficient when the amount of missing data is moderate (e.g., 30%).
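The efficiency calculation can be checked with a short script. This sketch assumes the standard-error-scale definition of relative efficiency, $1 / \sqrt{1 + \gamma / k}$; the function names are hypothetical.

```python
import math

def relative_se(gamma, k):
    """Standard error with k imputations relative to infinitely many."""
    return math.sqrt(1.0 + gamma / k)

def percent_efficiency(gamma, k):
    """Efficiency in standard error units, expressed as a percentage."""
    return 100.0 / relative_se(gamma, k)

# Tabulate efficiency for three levels of missing information
for gamma in (0.1, 0.3, 0.5):
    row = [f"{percent_efficiency(gamma, k):5.1f}" for k in (1, 2, 3, 5, 10)]
    print(f"gamma={gamma}: " + " ".join(row))
```

Printing the table for several values of `gamma` makes the diminishing return from additional imputations visible without the figure.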





Figure 2-2. Percent efficiency of MI estimation using different numbers of imputations at three levels of missingness (10%, 30%, and 50%). Horizontal axis: Number of Imputations (1 to 10).









Several empirical studies showed that standard error estimates of the parameters were underestimated by 10-20% in single imputation when compared to the ones in MI (Crawford, Tennstedt, & McKinlay, 1995; Heitjan & Rubin, 1990; Landerman et al., 1997). Based on the results of two Monte Carlo simulation studies (Little & Rubin, 1989; Rubin & Schenker, 1986), Table 2-1 shows the comparison of the actual confidence interval (CI) coverage for $Q$, when k = 1, 2, or 3, with the nominal coverage at 90%, 95%, or 99%. Under the ignorable response assumption, the fraction of missing data in these two large-sample (N > 100) simulation studies was 30%. As indicated, when k = 1 (single imputation), the discrepancy between the actual and nominal coverage ranges from 5-13%, whereas when k = 2, the discrepancy is only 2-3%. When k = 3, there is no discrepancy at all, which means that inferences are valid.





Table 2-1. Analytic Large-Sample (N > 100) Coverage (in %) of Single (k = 1) and
Multiple (k = 2 or 3) Imputation Procedures with 30% Missing Data

            Nominal Coverage
    k       90%     95%     99%
    1       77      85      94
    2       87      93      99
    3       90      95      99

Note. Adapted from Little and Rubin (1989) and Rubin and Schenker (1986).









Rubin and Schenker (1986) demonstrated that k should increase from 2 to 3 as the

nonresponse rate increases from 10% to 60% in order to achieve a satisfactory CI

coverage (i.e., close to the nominal value). Rubin and Schenker (1986) also pointed out

that improvements in the actual CI coverage diminish as k increases. The differences in

standard errors or CI coverage between k = 5 and k > 25 have been shown to be

negligible (Heitjan & Rubin, 1990; Wang, Sedransk, & Jinn, 1992).

Based on the literature, the number of imputations depends on:

1. The amount of missing information. As the percent of missing data increases, the amount of uncertainty about the imputed values increases. Accurately incorporating this uncertainty requires an increase in the number of imputations. Since imputed values are averaged over k imputations, the imputation variance is reduced as the number of imputations increases (Kalton & Kasprzyk, 1986; Rubin, 1987).

2. The type of missing data mechanism. Based on several simulation and empirical studies, Glynn, Laird, and Rubin (1993) and Raghunathan and Siscovick (1996) demonstrated that nonignorable nonresponse patterns require a larger number of imputations (k > 10) than ignorable nonresponse patterns to achieve a satisfactory CI coverage.

Advantages of Multiple Imputation

As in single imputation, the resulting data set with the imputed values can be

analyzed by means of standard complete-case analysis methods. Because MI involves

averaging over k intermediate parameter estimates, the final point estimate derived from

MI is more efficient than that from single imputation (Rubin, 1996). The final estimated

variance also reflects the true variance of the parameter.









Studies affirmed that MI produces accurate standard errors (i.e., efficient) for

parameter estimates as it correctly adjusts for nonresponse bias (Heitjan & Little, 1991;

Rubin & Schenker, 1986; Xie & Paik, 1997). The estimated actual CI coverage is close to

the nominal levels (Little & Rubin, 1989; Rubin & Schenker, 1986; Wang et al., 1992),

which means MI yields valid inferences.

MI has been shown to yield satisfactory parameter estimates with relatively little

bias even under the following conditions:

1. Sample sizes are small (e.g., 50) (Graham & Schafer, 1999). Little (1992) recommended using MI for small samples and MLE for large samples.

2. Data are missing in large amounts (e.g., 50%) (Graham & Schafer, 1999).

3. Models are relatively large and complex (e.g., 18-predictor model) (Graham

& Schafer, 1999).

4. Ignorability assumption is suspect (Graham et al., 1996, 1997).

5. Data distribution is skewed (Graham et al., 1996; Wang et al., 1999).

6. Model of the data distribution is misspecified (Greenland & Finkle, 1995).

Empirical and simulation studies have shown that MI is far superior to deletion

procedures, mean substitution, regression imputation (Crawford et al., 1995; Graham et

al., 1996), and simple hot deck procedure (DeCanio & Watkins, 1998) with regard to

bias, efficiency, and validity of interval estimates when the underlying MI model

specification is correct.









Limitations of Multiple Imputation

Since the observed scores Yo provide indirect evidence about the likely values of

the missing ones Ym in MI, relevant predictors for Y (i.e., knowledge on the cause of

missingness) are essential to obtain unbiased estimates and valid inferences.

Summary

The development, theoretical framework, and assumptions of MI

were summarized. The procedures for performing MI based on a normal linear regression

model with a univariate Y variable as well as how to combine k intermediate parameter

estimates into a final adjusted point estimate and its variance were described. The two

main features of MI are: (i) it takes into consideration the uncertainty of not knowing the

exact values of the missing scores by incorporating the residual variation about the

regression prediction, and (ii) it incorporates the sampling variability to estimate the

population distribution of the missing scores, which are unknown. Because of these two

features, MI has been shown to yield satisfactory parameter estimates with relatively little

bias.



Missing Data Mechanism

Little and Rubin (1987) indicated that valid inferences from MI depend on the inclusion of a correct mechanism that produces missingness, and the knowledge of the missing data mechanism is important in selecting an appropriate imputation model. Rubin (1976) defined the mechanism of missingness in terms of a probability distribution model of nonresponse. Let $R$ denote an $N \times 1$ vector of binary missing-data indicators with distribution depending on a parameter vector $\psi$ for the nonresponse model. If an examinee responds to an item, R = 1; if an examinee omits an item, R = 0.

Since $Y = (Y_o, Y_m)$, the probability distribution function of $Y$ for the complete data can be expressed as

$$f(Y \mid \theta) = f(Y_o, Y_m \mid \theta) \tag{2-20}$$

Integrating over the sampling space of missing scores $Y_m$ yields the marginal probability distribution for the observed scores (Schafer, 1997).

$$f(Y_o \mid \theta) = \int f(Y_o, Y_m \mid \theta)\, dY_m \tag{2-21}$$

The probability distribution for the observed scores given the parameters $\theta$ of the data model and the parameters $\psi$ of the missing data mechanism can be expressed as

$$f(Y_o, R \mid X, \theta, \psi) = \int f(Y_o, Y_m, R \mid X, \theta, \psi)\, dY_m \tag{2-22}$$

where $\theta$ and $\psi$ are sets of indexing vectors of unknown parameters for their respective distributions. For example, the parameters in $\psi$ are the proportions of examinees assigned to each item (Bradlow & Thomas, 1998), whereas $\theta = (\mu, \sigma^2)$ or $\theta = (\beta, \sigma^2)$.

Equation (2-22) can be factorized as

$$f(Y_o, R \mid X, \theta, \psi) = \int f(R \mid Y_o, Y_m, X, \psi)\, f(Y_o, Y_m \mid X, \theta)\, dY_m \tag{2-23}$$

where

$f(R \mid Y_o, Y_m, X, \psi)$ denotes the conditional distribution of $R$ given $Y$ and represents a model for the missing data mechanism, and

$f(Y_o, Y_m \mid X, \theta)$ represents a model for the data.









Little and Rubin (1987) distinguished three types of missing data mechanisms:

missing completely at random (MCAR), missing at random (MAR), and nonignorable

missing (NIM).

Missing Completely at Random

The missing data mechanism is MCAR if the probability of the missing data indicator, $f(R)$, is independent of both the observed scores $Y_o$ and the missing scores $Y_m$ in the model, which means

$$f(R \mid Y_o, Y_m, X, \psi) = f(R \mid \psi) \quad \text{for all } Y \tag{2-24}$$

An example of MCAR in education is when the probability of missing responses to an

item in an achievement test depends on neither the examinees' ability nor the number of

instruction hours on test taking skills received, and the number of instruction hours

received for all examinees are known.

As indicated in equation (2-24), the distribution of the missing data indicator $R$ does not depend on the missing scores or any covariates $X$. The first term of the marginal probability distribution in equation (2-23) can therefore come out of the integral, and equation (2-23) can be expressed as

$$f(Y_o, R \mid X, \theta, \psi) = f(R \mid \psi) \int f(Y_o, Y_m \mid \theta)\, dY_m \tag{2-25}$$

From equation (2-21), when the MCAR assumption holds, the probability distribution of the observed scores then becomes

$$f(Y_o, R \mid X, \theta, \psi) = f(R \mid \psi)\, f(Y_o \mid \theta) \tag{2-26}$$

where

$f(R \mid \psi)$ represents a model for the missing data mechanism, and

$f(Y_o \mid \theta)$ represents a model for the conditional probability distribution of the observed scores $Y_o$.

Since Yo and R are independent in equation (2-26), the sampling distribution of

the observed scores is a marginal of the complete data distribution (Laird, 1988). This

situation implies that sampling-based inferences such as regression imputation that make

use of the distributional properties of the marginal distribution of the observed scores are

unbiased and valid (Heitjan, 1997). However, MCAR makes the strongest assumption

among the three types of missing data mechanisms (Little, 1992).

The likelihood function of the observed scores under the MCAR assumption in equation (2-26) can be factorized into two components, one pertaining solely to the structural parameter $\theta$ of the model and the other pertaining solely to the nuisance parameter $\psi$ of the missing data mechanism.

$$L(\theta, \psi \mid Y_o, X, R) \propto f(R \mid \psi)\, f(Y_o \mid \theta) \tag{2-27}$$

When the joint parameter space of $(\theta, \psi)$ is the product of the parameter space of each separately, that is, the two parameters $\theta$ and $\psi$ are independent, the likelihood of $\theta$ based on $Y_o$, $L(\theta \mid Y_o)$, is a function proportional to $f(Y_o \mid \theta)$ (Chirembo, 1995),

$$L(\theta \mid Y_o) \propto f(Y_o \mid \theta) \tag{2-28}$$

Since $L(\theta \mid Y_o)$ is proportional to $f(Y_o \mid \theta)$, it is not necessary to specify the missing data mechanism when using likelihood-based inferences to obtain unbiased estimates (Laird, 1988). Under the MCAR assumption, Bayesian inference or maximum likelihood estimation of the structural parameters $\theta$ will yield valid inferences from the observed scores $f(Y_o \mid \theta)$ without estimating the parameters $\psi$ (Rubin, 1976). That is why the missing data mechanism is ignorable for likelihood-based inferences.

The pattern of missingness on Y under the MCAR assumption is completely

randomly determined. The MCAR assumption can be assessed by comparing

distributions of missing variable Y for respondents and nonrespondents on covariates X to

check evidence for a systematic difference between nonrespondents and respondents

(Curran, Bacchi, Hsu Schmitz, Molenberghs, & Sylvester, 1998).
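One simple way to look for a systematic difference between respondents and nonrespondents on a covariate is a two-sample comparison of means. The following sketch computes a Welch t statistic; it is an illustration only, the function name `welch_t` is hypothetical, and a large absolute value would count as evidence against MCAR rather than proof of any particular mechanism.

```python
import numpy as np

def welch_t(x, responded):
    """Welch t statistic comparing covariate x between respondents (R = 1)
    and nonrespondents (R = 0)."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(responded, dtype=bool)
    a, b = x[r], x[~r]
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

hours = [1, 2, 3, 4, 5, 6]          # made-up covariate values
responded = [1, 1, 1, 0, 0, 0]      # made-up response indicators
print(welch_t(hours, responded))
```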

Missing at Random

MAR is based on a weaker assumption. Under the MAR assumption, the conditional probability distribution of the missing data indicator $R$ given $X$ depends on the observed scores $Y_o$, but not the missing scores $Y_m$ (Little, 1995)

$$f(R \mid Y_o, Y_m, X, \psi) = f(R \mid Y_o, X, \psi) \quad \text{for all } Y_m \tag{2-29}$$

For example, the probability of missing responses to an item in an achievement test

depends on the scores of the measured variables (e.g., number of instruction hours on test

taking skills received), but not on the missing scores of the item itself (e.g., item

difficulty).

As indicated in equation (2-29), the distribution of the missing data indicator $R$ does not depend on the missing scores. The first term of the marginal probability distribution in equation (2-23) can therefore come out of the integral, and equation (2-23) can be expressed as

$$f(Y_o, R \mid X, \theta, \psi) = f(R \mid Y_o, X, \psi) \int f(Y_o, Y_m \mid X, \theta)\, dY_m \tag{2-30}$$

As in equation (2-21), the probability distribution of $Y_o$ is the marginal probability distribution.

$$f(Y_o \mid X, \theta) = \int f(Y_o, Y_m \mid X, \theta)\, dY_m \tag{2-31}$$

The probability distribution of the observed scores under the MAR assumption then becomes

$$f(Y_o, R \mid X, \theta, \psi) = f(R \mid Y_o, X, \psi)\, f(Y_o \mid X, \theta) \tag{2-32}$$

When data are MAR, the sampling distribution of the observed scores no longer equals the ordinary marginal distribution, but depends upon the missing process (Laird, 1988). Hence sampling-based inferences are biased. On the other hand, the likelihood function of the observed scores under the MAR assumption in equation (2-32) can be factorized into two components:

$$L(\theta, \psi \mid Y_o, X, R) \propto f(R \mid Y_o, X, \psi)\, f(Y_o \mid X, \theta) \tag{2-33}$$

Once again, when the two parameters $\theta$ and $\psi$ are functionally unrelated, the likelihood of the structural parameters $\theta$ based on $Y_o$ is a function proportional to the marginal probability distribution of $Y_o$ (Rubin, 1976).

$$L(\theta \mid Y_o, X) \propto f(Y_o \mid X, \theta) \tag{2-34}$$

Thus the missing data mechanism under the MAR assumption is also ignorable for likelihood-based inferences (Rubin, 1976). In summary, when the missing data mechanism is ignorable, the imputation model does not have to include a distribution of the missing data indicator $R$, and the likelihood function of $\theta$ is based on only the observed scores $Y_o$.









Rubin (1976) showed that the response mechanism generating the missing data is ignorable for likelihood-based inferences if the parameter $\theta$ of the data model and the parameter $\psi$ associated with the missing data mechanism are independent or functionally unrelated, and the missing data are MAR.

Conceptually, the missing values under the MAR assumption are a random

sample from the complete data after conditioning on the measured variables X in the

imputation model; therefore, the process of creating these missing values can be modeled

using these variables (Barnard, Du, Hill, & Rubin, 1998). For example, the percent of

missing responses in an item of an achievement test differs in groups of examinees with

high, medium, and low cognitive-ability scores, and the scores of the cognitive ability for

all examinees are known. Under the MAR assumption, the missing responses are

randomly distributed within these three subgroups of examinees, even when the

responses are not missing at random across subgroups (Roth, 1994). In other words, the

measured variables X (i.e., the cognitive ability in this example) can account for the

differences in the distribution of Y between nonrespondents and respondents (Little &

Schenker, 1995).
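The difference between MCAR and MAR can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the study: a single fully observed covariate `x`, a score `y` that depends linearly on `x`, a 30% MCAR missing rate, and a logistic MAR rule in which examinees with higher `x` are more likely to omit the item. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_mechanisms(n=20000, seed=0):
    """Compare complete-case means of y under MCAR and MAR (true mean is 0)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                       # fully observed covariate X
    y = x + rng.normal(size=n)                   # score Y, related to X
    # MCAR: R does not depend on X or Y (R = 1 means the examinee responded)
    r_mcar = rng.random(n) > 0.3
    # MAR: the probability of omitting depends only on the observed X
    p_miss = 1.0 / (1.0 + np.exp(-2.0 * x))
    r_mar = rng.random(n) > p_miss
    # Conditioning on X: fit y on x among respondents, then predict for everyone
    slope, intercept = np.polyfit(x[r_mar], y[r_mar], 1)
    return {
        "full": y.mean(),
        "cc_mcar": y[r_mcar].mean(),
        "cc_mar": y[r_mar].mean(),
        "mar_adjusted": np.mean(intercept + slope * x),
    }

print(simulate_mechanisms())
```

Under these assumptions the complete-case mean stays close to the full-data mean when data are MCAR, is pulled downward when data are MAR, and is recovered once the covariate is used in the model.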

In addition to MAR, Roth (1994) identified another pattern of missingness when missing data are related to other variables. In this pattern, missing data are nonrandomly distributed across and within subgroups. For example, more scores are missing at the bottom range of the high cognitive-ability group, but relatively few scores are missing at the top range of the same group (Roth, 1994).









Accessible Missing Data Mechanism

Graham and Donaldson (1993) defined the missing data mechanism as "accessible" when the cause of missingness has been measured, whereas "ignorable" refers to a combination of accessibility and proper use of the cause of missingness for analysis. Graham, Hofer, and Piccinin (1994) explained that unless the cause of missingness is incorporated properly in the analysis, the mechanism will not be ignorable.

Schafer (1997) pointed out that whether the missing data mechanism is ignorable

is closely related to the fullness of the observed scores Yo, the relevant variables X (i.e.,

causes of missingness), and the complexity of the data model f(Yo | X, θ). If Yo and X

contain a lot of information for predicting Ym, and are incorporated properly in the

imputation model for analysis, then the residual dependence of R upon Ym after

conditioning on Yo and X will be small (Schafer, 1997). Including relevant variables X

(covariates, variables that relate to the nonresponse, and predictive variables that explain

a considerable amount of variance of Y in the model) helps to reduce the uncertainty of the

imputations (van Buuren et al., 1999), and thus to adjust for bias associated with the missing

data (Graham et al., 1994, 1997).

Barnard and Meng (1999) advocated the adoption of a "sensible imputation

model", which incorporates as many relevant variables for the cause of missingness as possible

while keeping the model-building and fitting feasible so as to reduce

multicollinearity problems. It is suggested that including extra variables may affect

precision, but not bias in the inferences (Rubin, 1987); on the other hand, leaving out

relevant causes of missingness will yield biased estimation (Schafer, 1999; Schafer &

Olsen, 1998).









Nonignorable Missing

Under the NIM assumption, the conditional probability distribution of missing

data indicator R given X is a function of the missing scores Ym, or the values of

unmeasured relevant variables, and possibly also the observed scores Yo (Laird, 1988).

The unmeasured variables may be unavailable or inaccessible.

$$ f(R \mid Y_o, Y_m, X, \psi) \tag{2-35} $$

For example, the probability of missing responses to an item in an achievement test

depends on the missing scores of the item itself (e.g., item difficulty) and/or the

examinees' true unobserved parameter (e.g., examinees' ability).

Since the conditional probability distribution (2-35) cannot be simplified, NIM

involves joint probability modeling of both the complete data model f(Yo, Ym | X, θ)

for Y and the missing data mechanism f(R | Yo, Ym, X, ψ), and the joint estimation of θ and ψ

from Yo and R, respectively (Schafer, 1997).

Little and Rubin (1987) suggested two ways to factorize the joint distribution of

the complete data Y and missing data indicator R. One is based on the selection models:

$$ f(Y, R \mid \theta, \psi) = f(Y \mid \theta)\, f(R \mid Y, \psi) \tag{2-36} $$

where

f(Y | θ) is the model for the complete data Y, and

f(R | Y, ψ) is the model for the missing data mechanism.

The other is pattern-mixture models:

$$ f(Y, R \mid \varphi, \pi) = f(Y \mid R, \varphi)\, f(R \mid \pi) \tag{2-37} $$

where

f(Y | R, φ) represents the distribution of Y conditioning on the missing data indicator R,

f(R | π) represents the marginal distribution of the missing data indicator for whether or

not Y is missing, and

φ and π are the two unknown parameters corresponding to the two distributions.

Selection models specify the precise form of the nonresponse model, whereas

pattern-mixture models incorporate the assumption of the missing data mechanism

through restrictions on the parameters (Little, 1995). When R is independent of Y (i.e.,

when θ = φ and ψ = π), the missing data mechanism becomes MCAR, and the selection

models are equivalent to the pattern-mixture models.
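The two factorizations can be checked numerically on a small example. The sketch below assumes a single binary score Y and a binary missing data indicator R with made-up joint probabilities (the numbers and variable names are illustrative only, not taken from the study); it shows that the selection-model and pattern-mixture products recover the same joint distribution.

```python
import numpy as np

# Hypothetical joint probabilities for a binary score Y (rows: Y=0, Y=1)
# and a binary missingness indicator R (columns: R=0, R=1).
joint = np.array([[0.10, 0.30],
                  [0.40, 0.20]])

# Selection-model factorization: f(Y) * f(R | Y)
f_y = joint.sum(axis=1)                      # marginal distribution of Y
f_r_given_y = joint / f_y[:, None]           # each row sums to 1
selection = f_y[:, None] * f_r_given_y

# Pattern-mixture factorization: f(Y | R) * f(R)
f_r = joint.sum(axis=0)                      # marginal distribution of R
f_y_given_r = joint / f_r[None, :]           # each column sums to 1
pattern_mixture = f_y_given_r * f_r[None, :]

# Both products recover the same joint distribution.
same_joint = np.allclose(selection, joint) and np.allclose(pattern_mixture, joint)
```

The two approaches differ in which conditional distribution is modeled directly, not in the joint distribution they describe.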

The likelihood function based on the observed scores under the NIM assumption includes

the missing data indicator R and the missing data parameters ψ.

$$ L(\theta, \psi \mid Y_o, X, R) \propto f(Y_o, R \mid X, \theta, \psi) \tag{2-38} $$

The joint distribution for Y and R typically involves more parameters such as the

YR interaction term than can be estimated from Yo and R alone (i.e., under-identifiable)

(Little, 1995). In order to make them identifiable so that valid likelihood-based inferences

can be made about the marginal responses, a restriction on the assumption is required.

Schafer (1997) suggested that a priori restrictions be imposed on either the joint

parameter space for θ and ψ or the Bayesian prior distribution of (θ, ψ).

Conceptually, under the NIM assumption, the distribution of respondents and

nonrespondents on Y differs systematically, even after conditioning on the values of

measured variables X (Rubin & Schenker, 1991). Compared to equation (2-3), the

posterior predictive probability distribution under the NIM assumption needs to include a









full specification of the probability model with the joint distribution of Y, the nonresponse

pattern R, and the measured variables X.

$$ f(Y_m \mid Y_o, X, R) = \int f(Y_m \mid Y_o, X, R, \theta)\, f(\theta \mid Y_o, X, R)\, d\theta \tag{2-39} $$

Sensitivity Analysis

Oftentimes, little is known about the nonresponse mechanism that creates the

missing responses in a particular achievement test. Missing responses can arise from a

variety of reasons including a combination of ignorable and nonignorable mechanisms

(Schafer & Olsen, 1998). However, distinguishing between ignorable and nonignorable

mechanisms (i.e., MAR and NIM) relies on fundamentally untestable assumptions

(Curran et al., 1998). Curran and associates (1998) demonstrated that these assumptions

cannot be tested formally from the empirical data at hand. Sensitivity analyses should therefore be

conducted to compare the estimates across a number of plausible missing-data models. Inferences

from the sensitivity analysis reveal uncertainty about reasons for nonresponse (Beaton,

1997; Little & Rubin, 1989). Sensitivity analysis can also be conducted across alternative

imputing procedures in a similar manner to reveal uncertainty about different possible

imputation models.

Under the NIM assumption, sensitivity analysis can be performed by comparing

estimates between selection and pattern-mixture models. If the results are consistent,

confidence about the conclusions is established. On the other hand, if the results depend

on the form of the model, then more specific conditions can be suggested about where the

conclusion can apply (Little, 1995).








Summary

Valid inferences from MI rely on the inclusion of a correct missing data

mechanism. As discussed above, factorization of the posterior predictive probability

distribution depends on whether the missing data mechanism is ignorable or not. When

the missing data mechanism is not ignorable, the missing data indicator has to be

incorporated into the posterior predictive probability distribution, and the likelihood

function of θ is not just based on the observed scores.













CHAPTER 3
METHODOLOGY

This chapter first describes the data generation procedure, the design of the study,

and the MI procedure. Data generation was based on the three-parameter logistic model

(Hambleton & Swaminathan, 1985). The design of this study involved the distribution

and percent of missing responses as a function of the ability of the examinees and the

difficulty of the items in one omitting pattern, and the ability of the examinees and the

sequence of the items in another omitting pattern. The procedure of MI based on a

logistic regression model with a univariate Y was outlined. The second part of this

chapter discusses how to evaluate the effectiveness of MI on handling missing data.

To achieve the goal of evaluating the effectiveness of MI on handling missing

data, several steps were required.

Step 1. Simulated a complete data matrix of item responses for a specified number of

examinees.

Step 2. Computed the coefficient alpha. The coefficient alpha of this original complete

data (i.e., 0% missing) served as a benchmark for later comparison.

Step 3. Nonrandomly deleted a certain percent of item responses from the complete

examinee-by-item matrix generated in Step 1. Each missing data set was generated in a

similar fashion.

Step 4. Replaced the omitted item responses in Step 3 using MI.

Step 5. Computed the coefficient alpha of the data set that was restored by MI.









Step 6. Compared the coefficient alpha from Step 2 with the one from Step 5.



Simulation Procedure

Let W be an N x P matrix representing a complete examinee-by-item data set. N

is the number of examinees in the data set and P is the number of test items. In this study,

P was fixed to be 20 in all conditions. A 20-item test was used because it represented test

lengths frequently found in educational and psychological applications (Yen, 1987) such

as the American Mathematical Association of Two-Year Colleges' Student Mathematics

League contest (Isaacson & Smith, 1993). The response of each item was dichotomous in

nature. Simulation of the 20 dichotomously-scored item responses of a specified number

of examinees was based on the three-parameter logistic model (Hambleton &

Swaminathan, 1985).


$$ P_i(\theta_j) = c_i + (1 - c_i)\, \frac{\exp[D a_i (\theta_j - b_i)]}{1 + \exp[D a_i (\theta_j - b_i)]} \tag{3-1} $$

for i = 1, ..., 20, and j = 1, ..., n,

where

Pi(θj) is the probability of the jth examinee with ability θj answering the ith item

correctly,

ai is the discrimination parameter of item i,

bi is the difficulty parameter of item i,

ci is the pseudo-chance parameter of item i,

θj is the ability of the jth examinee, and

D is a scaling factor, which is 1.7.









To compute Pi(θ), the three item parameters (i.e., a, b, and c) and the ability

parameters (i.e., θ) had to be known. The three parameters were drawn from distributions

of known mean and standard deviation. Harrison (1998) used the criteria of Oshima's

(1994) study to sample the three parameters. The criteria for Oshima's (1994) study were

as follows: the item discrimination parameters (i.e., a) were randomly drawn from a

lognormal distribution with a mean of 1.13 and a standard deviation of 0.63; the item

difficulty parameters (i.e., b) were randomly drawn from a normal distribution with a

mean of 0 and a standard deviation of 1; and the pseudo-chance parameters (i.e., c) were

randomly drawn from a normal distribution with a mean of 0.25 and a standard deviation

of 0.05. According to Oshima (1994), the distributions of these three parameters were

similar to the real data set of a speeded test (i.e., TOEFL) as reported by Way and Reese

(1991). The ability parameters, θ, for the examinees were randomly generated from a

standard normal N(0,1) distribution. The present study used the same values of the three

item parameters as in Harrison's (1998, p. 7) study to generate the item responses (Table

3-1). The correlation between the item difficulty parameters (b) and the item

discrimination parameters (a) was .111 (p = .642, two-tailed), whereas the correlation

between the item difficulty parameters (b) and the pseudo-chance parameters (c) was

.281 (p =.230, two-tailed).

The response of an examinee with a trait level of ability θ to a particular item

was determined by first computing the probability, Pi(θ), of correctly answering that

item based on the known item and ability parameters. Since the item responses were

dichotomous in nature, the response probabilities Pi(θ) were converted into binary

responses by comparing to a random number r generated from a uniform distribution in








the interval between 0 and 1. The random number r was used to determine whether the

score of a particular item was correct or incorrect. If the response probability obtained

from the equation (3-1) was greater than the random number r, a 1 was assigned, which

indicated the examinee's response to that particular item was correct; otherwise a 0 was

assigned, which indicated the examinee's response to that particular item was incorrect.
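The data generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the study: the function names, the random seed, and the choice of 50 examinees are hypothetical, and only the first five items of Table 3-1 are used to keep the example short.

```python
import numpy as np

def three_pl_probability(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model (equation 3-1).

    theta: abilities, shape (N,); a, b, c: item parameters, shape (P,).
    Returns an N x P matrix of probabilities.
    """
    z = D * a[None, :] * (theta[:, None] - b[None, :])
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z))
    return c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))

def simulate_responses(theta, a, b, c, rng):
    """Convert probabilities to 0/1 scores by comparison with uniform draws."""
    p = three_pl_probability(theta, a, b, c)
    r = rng.uniform(0.0, 1.0, size=p.shape)
    return (p > r).astype(int)   # 1 = correct, 0 = incorrect

# First five items of Table 3-1; abilities drawn from N(0, 1).
a = np.array([0.269, 0.236, 2.817, 8.565, 1.452])
b = np.array([-0.772, -0.129, -0.979, 0.235, 0.072])
c = np.array([0.176, 0.200, 0.257, 0.193, 0.264])
rng = np.random.default_rng(2000)          # arbitrary seed
theta = rng.standard_normal(50)
W = simulate_responses(theta, a, b, c, rng)
```

Because the pseudo-chance parameter is a lower asymptote, every simulated probability lies between ci and 1.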

Kuder-Richardson 20 Formula

Since the test items are scored dichotomously, Kuder-Richardson 20 formula (KR

20) was used to calculate the index of internal consistency for the test items. The Kuder-

Richardson 20 formula is equivalent to coefficient alpha when the item responses are

dichotomous in nature (Kuder & Richardson, 1937).


$$ KR\,20 = \frac{s}{s-1} \left[ 1 - \frac{\sum_{i=1}^{s} p_i q_i}{\sigma^2} \right] \tag{3-2} $$

where

s is the number of items in the test,

σ² is the variance of the test scores,

pi is the proportion of subjects answering item i correctly,

qi is the proportion of subjects answering item i incorrectly, and

piqi is the variance of scores on a single item i.
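Equation (3-2) can be computed directly from an examinee-by-item matrix of 0/1 scores. The sketch below is illustrative; the function name and the tiny data set are hypothetical. It assumes the population form of the variance (divisor N) for the total scores, which matches the use of piqi as the item variance.

```python
import numpy as np

def kr20(data):
    """KR-20 for an N x s matrix of 0/1 item scores (equation 3-2)."""
    data = np.asarray(data, dtype=float)
    s = data.shape[1]
    p = data.mean(axis=0)                 # proportion correct per item
    q = 1.0 - p                           # proportion incorrect per item
    total_var = data.sum(axis=1).var()    # population variance (ddof=0) of total scores
    return (s / (s - 1.0)) * (1.0 - (p * q).sum() / total_var)

# Four examinees by three items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
alpha = kr20(scores)   # 0.75 for this small example
```

Here the item variances sum to 0.625 and the total-score variance is 1.25, so KR 20 = (3/2)(1 - 0.5) = 0.75.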











Table 3-1. Item Parameters Used for Test Simulation

Item    Discrimination (a)    Difficulty (b)    Pseudo-chance (c)
  1          0.269               -0.772              0.176
  2          0.236               -0.129              0.200
  3          2.817               -0.979              0.257
  4          8.565                0.235              0.193
  5          1.452                0.072              0.264
  6          1.043               -1.245              0.246
  7          1.594               -1.504              0.229
  8          1.258                0.545              0.221
  9          5.502               -0.802              0.250
 10          2.468                2.408              0.306
 11          1.016               -0.048              0.231
 12          3.413                2.062              0.240
 13          2.238                0.262              0.287
 14          2.370               -1.158              0.207
 15          2.635               -0.314              0.276
 16          0.533               -0.536              0.319
 17          1.601                1.177              0.320
 18          2.809               -0.471              0.261
 19          0.036               -0.475              0.297
 20          7.637               -0.203              0.328

Note. Adapted from Harrison (1998)









Design of Study

This study represents a 3 x [(3 x 3) + 1] x 2 design in which sample size

(3 levels), percent of examinees with missing items (3 levels), percent of items missing

for each examinee with missing items (3 levels), and omitting pattern (2 levels) were

fully crossed. An additional condition with a disproportionate percent of examinees missing

items that were nonrandomly distributed across and within each ability group was

included. The rationales for selecting the levels in each factor are described below.

Sample Size

The three levels of sample size chosen in this study were N = 50, 100, and 500.

The sample size of 50 examinees is typical for validation studies (Schmidt, Hunter, &

Urry, 1976). The sample size of 500 was the same as that of the real data that Raghunathan and

Siscovick (1996) used in studying the performance of MI. These three levels,

representing the small to large-sample sizes, were also used by Graham and Schafer

(1999) to evaluate the efficiency of MI in a simulation study. The present study adopted

these three levels of sample size to allow comparison of the performance of MI with that

of other MDTs investigated by Harrison (1998).

Distribution and Percent of Missing Responses

In order to simulate a more realistic distribution and percent of nonresponse

across test items, the distribution and percent of missing responses were based on the

findings from a large scale study, the Reading Comprehension subtest, Level I, of the

Comprehensive Test of Basic Skills, Form S (Cluxton & Mandeville, 1979). In Cluxton

and Mandeville's (1979) study, they stratified one thousand third grade students into

three ability levels--high, medium, and low. They found the proportion of students with









missing items within each stratified ability level was: 0-20% for the high-ability group,

20-80% for the medium-ability group, and 90-100% for the low-ability group. They also

reported the proportion of missing items (out of the 45 items in the subtest) for students

within each stratified ability level was approximately: 7-16% for the high-ability group,

18-38% for the medium-ability group, and 40-49% for the low-ability group. The

correlation between the ability of students and the number of items missing in the body of

the test was -.76; and the correlation between the ability of students and the number of

items missing at the end of the test was -.47 (Cluxton & Mandeville, 1979).

Based on the range of the proportion of students with missing items within each

ability level, and the range of proportion of items missing provided in Cluxton and

Mandeville's (1979) study, the distribution and percent of missing responses in this study

were constructed in four steps:

First, the abilities of the examinees in a sample were rank ordered. Second, the

examinees in each data set were stratified into three ability levels. Stratification was

based on the assumption that the data were normally distributed N (0,1). Plus and minus

one standard deviation in each sample were used as the cut-off to stratify the three ability

groups. As a result, approximately 68% of the examinees were within the one standard

deviation band and these students were classified as the medium-ability group. About

16% of the examinees were above one standard deviation and these students were

classified as the high-ability group, and about 16% of examinees were below one

standard deviation and these students were classified as the low-ability group.
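The stratification step can be sketched as below. The function name and seed are hypothetical, and the sketch assumes that the cut-offs are placed one sample standard deviation above and below the sample mean; the text does not say whether the sample mean or the theoretical mean of 0 was used.

```python
import numpy as np

def stratify_ability(theta):
    """Label each examinee 'low', 'medium', or 'high' using +/- 1 sample SD."""
    theta = np.asarray(theta, dtype=float)
    mean, sd = theta.mean(), theta.std(ddof=1)
    groups = np.full(theta.shape, "medium", dtype=object)
    groups[theta > mean + sd] = "high"     # more than 1 SD above the mean
    groups[theta < mean - sd] = "low"      # more than 1 SD below the mean
    return groups

rng = np.random.default_rng(1)             # arbitrary seed
theta = rng.standard_normal(500)
groups = stratify_ability(theta)
# Observed share of examinees in each ability group.
shares = {g: float(np.mean(groups == g)) for g in ("high", "medium", "low")}
```

With normally distributed abilities, the shares should fall near 16%, 68%, and 16%, although they vary from sample to sample, particularly when N = 50.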

Third, for the percent of examinees with missing items (%EMI), three conditions

(%EMI1, %EMI2, and %EMI3) were constructed. In the first condition %EMI1, the









percent of the high, medium, and low-ability examinees missing some test items were

0%, 20%, and 90% respectively. In the second condition %EMI2, the percent of the high,

medium, and low-ability examinees missing some test items were 10%, 50%, and 95%.

In the third condition %EMI3, the percent of the high, medium, and low-ability

examinees missing some test items were 20%, 80%, and 100%. The above three

conditions respectively corresponded to the minimum, the median, and the maximum

percent of examinees with missing items in each ability level provided by Cluxton and

Mandeville (1979).

Fourth, for the percent of items missing among those examinees with missing item

responses (%IM), another three conditions (%IM1, %IM2, and %IM3) were constructed.

The first condition %IM1 was 7% of the items missing in the high-ability group, 18% of

the items missing in the medium-ability group, and 40% of the items missing in the low-

ability group. The second condition %IM2 was 12% of the items missing in the high-

ability group, 28% of the items missing in the medium-ability group, and 45% of the

items missing in the low-ability group. The third condition %IM3 was 16% of the items

missing in the high-ability group, 38% of the items missing in the medium-ability group,

and 49% of the items missing in the low-ability group. The three conditions respectively

corresponded to the minimum, the median, and maximum percent of items missing in

each ability level provided by Cluxton and Mandeville (1979).

The above two sets of conditions were crossed to create nine missing conditions

as shown in Figure 3-1. For example, the results of one combination were 20% of the

high-ability examinees with three missing items (i.e., 16% of the 20 test items), 80% of

the medium-ability examinees with eight missing items (i.e., 38% of the 20 test items),








and 100% of the low-ability examinees with ten missing items (i.e., 49% of the 20 test

items). The distribution and percent of missing responses represented the typical range of

missing data in educational tests, which is approximately 10-30% (Roth, 1994).


[Figure 3-1 is a diagram crossing the three %EMI conditions (percent of high-, medium-, and low-ability examinees with missing items: 0/20/90, 10/50/95, and 20/80/100) with the three %IM conditions (percent of items missing in the high-, medium-, and low-ability groups: 7/18/40, 12/28/45, and 16/38/49).]

Note. Max = Maximum, Med = Medium, and Min = Minimum.

Figure 3-1. Distribution and percent of missing responses in the nine missing conditions.









In addition to the nine missing conditions, an additional condition with a

disproportionate percent of examinees with omitted items that were nonrandomly distributed

across and within each ability group was included (Roth, 1994). For example, more items

were missing at the bottom range of the high-ability group and relatively fewer items

were missing at the top range of the same group (Roth, 1994). The procedure was to

stratify each ability group (high, medium, and low) into three sub-strata. Stratification

once again was based on plus and minus one standard deviation of ability within

each of the three ability groups. The percent of examinees who missed some items within each

sub-stratum ability group (%EMIs) was: 0, 10, 20 (in the high-ability group); 20, 50, 80

(in the medium-ability group); 90, 95, 100 (in the low-ability group). The corresponding

percent of items missing within each ability sub-stratum (%IMs) was: 7, 12, 16 (for the

high-ability group); 18, 28, 38 (for the medium-ability group); and 40, 45, 49 (for the

low-ability group). The two situations were then crossed to form the tenth condition.

Table 3-2 summarizes the distribution and percent of missing responses of the ten

missing conditions.









Table 3-2. Summary of the Distribution and Percent of Missing Responses of the Ten
Missing Conditions

Each ability-group entry gives the percent of examinees in that group having missing items, followed by the number of missing items for each such examinee (with the corresponding percent of the 20 test items in parentheses). The last column gives the approximate total percent of missing data.

Condition          High ability     Medium ability    Low ability       Total missing
%EMI1 x %IM1       0%; 1 (7%)       20%; 4 (18%)      90%; 8 (40%)      8.4%
%EMI1 x %IM2       0%; 2 (12%)      20%; 6 (28%)      90%; 9 (45%)      10.5%
%EMI1 x %IM3       0%; 3 (16%)      20%; 8 (38%)      90%; 10 (49%)     12.6%
%EMI2 x %IM1       10%; 1 (7%)      50%; 4 (18%)      95%; 8 (40%)      13.3%
%EMI2 x %IM2       10%; 2 (12%)     50%; 6 (28%)      95%; 9 (45%)      17.6%
%EMI2 x %IM3       10%; 3 (16%)     50%; 8 (38%)      95%; 10 (49%)     21.9%
%EMI3 x %IM1       20%; 1 (7%)      80%; 4 (18%)      100%; 8 (40%)     17.4%
%EMI3 x %IM2       20%; 2 (12%)     80%; 6 (28%)      100%; 9 (45%)     23.8%
%EMI3 x %IM3       20%; 3 (16%)     80%; 8 (38%)      100%; 10 (49%)    30.2%
%EMIs x %IMs (each ability group split into upper, middle, and lower divisions)
  Upper division   0%; 1 (7%)       20%; 4 (18%)      90%; 8 (40%)
  Middle division  10%; 2 (12%)     50%; 6 (28%)      95%; 9 (45%)
  Lower division   20%; 3 (16%)     80%; 8 (38%)      100%; 10 (49%)    18.0% (all divisions combined)
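The way a condition in Table 3-2 translates into an overall missing rate can be approximated as follows. This is a back-of-the-envelope sketch under stated assumptions, not the computation used for the table, which is not described: group sizes are taken as the exact normal-theory shares implied by the plus and minus one standard deviation cut-offs, and the number of missing items is the whole number closest to the target percent of the 20 items. All names are hypothetical.

```python
import math

P_ITEMS = 20
phi1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))   # Phi(1), about 0.8413
# Normal-theory share of examinees in each ability group.
SHARE = {"high": 1.0 - phi1, "medium": 2.0 * phi1 - 1.0, "low": 1.0 - phi1}

def items_missing(percent_items):
    """Whole number of items closest to the target percent of the 20 items."""
    return round(percent_items / 100.0 * P_ITEMS)

def expected_missing_rate(pct_examinees, pct_items):
    """Expected share of missing cells in the examinee-by-item matrix."""
    cells = sum(SHARE[g] * pct_examinees[g] / 100.0 * items_missing(pct_items[g])
                for g in SHARE)
    return cells / P_ITEMS

# Condition %EMI1 x %IM1.
emi1 = {"high": 0, "medium": 20, "low": 90}
im1 = {"high": 7, "medium": 18, "low": 40}
rate = expected_missing_rate(emi1, im1)    # about 0.084
```

For %EMI1 x %IM1 this approximation gives a rate of about 8.4%. Realized rates in any one simulated sample depend on the actual group sizes.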









Omitting Pattern

The two nonrandom omitting patterns were: (1) omitting item responses in the

body of the test (OPB), and (2) omitting item responses at the end of the test (OPE) (i.e.,

non-reached). The first situation was similar to the missing data mechanism 5 in

Harrison's (1998) study. However, in contrast to Harrison's (1998) study, in which

examinees with the lowest abilities missed the most difficult items, examinees at

different ability levels in the present study missed the most difficult items differentially. That is,

the high-ability examinees missed fewer difficult items than the medium-ability

examinees, and in turn the medium-ability examinees missed fewer difficult items than

the low-ability examinees (see Figure 3-2).

The non-reached pattern was similar to the combination of missing data

mechanisms 2 and 3 in Harrison's (1998) study. However, the selection of missing

responses was based on the examinees' ability rather than on random selection as in missing

data mechanism 2 in Harrison's (1998) study. Once again, the high-ability examinees

missed fewer items at the end of the test than the medium-ability examinees, and

in turn the medium-ability examinees missed fewer items at the end of the test than

the low-ability examinees (see Figure 3-3).
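The two omitting patterns can be sketched as a single deletion routine. The function and argument names are hypothetical, and the sketch assumes that under OPB each selected examinee omits his or her k most difficult items and that under OPE each selected examinee omits the last k items in test order; how many items each examinee omits is supplied by the caller from the missing condition.

```python
import numpy as np

def apply_omissions(data, difficulty, n_missing, pattern):
    """Return a float copy of `data` with np.nan marking omitted responses.

    data: N x P matrix of 0/1 scores; difficulty: length-P item difficulties;
    n_missing: length-N count of items to omit per examinee (0 = none);
    pattern: "OPB" omits the most difficult items, "OPE" omits the last items.
    """
    out = np.asarray(data, dtype=float).copy()
    n_items = out.shape[1]
    hardest_first = np.argsort(difficulty)[::-1]    # item indices, hardest first
    for j, k in enumerate(n_missing):
        if k == 0:
            continue
        if pattern == "OPB":
            out[j, hardest_first[:k]] = np.nan      # omit the k hardest items
        elif pattern == "OPE":
            out[j, n_items - k:] = np.nan           # omit the last k items
        else:
            raise ValueError("pattern must be 'OPB' or 'OPE'")
    return out
```

For instance, passing n_missing values of 1, 4, and 8 for selected high-, medium-, and low-ability examinees would correspond to the %IM1 level.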












[Figure 3-2 shows an examinee-by-item matrix of 0/1 item scores, with items ordered by item difficulty from least difficult to most difficult and rows labeled by person ability.]

Figure 3-2. Illustration of the omitting pattern of missing item responses in the body of the test
(OPB) with 15 examinees and 20 test items.












[Figure 3-3 shows an examinee-by-item matrix of 0/1 item scores, with items in test order (1 to 20) and examinees ordered from high ability to lower ability.]

Figure 3-3. Illustration of the omitting pattern of missing item responses at the end of the test
(OPE) with 15 examinees and 20 test items.









Cause of Missingness

Under the omitting pattern in which item responses were omitted in the body of the

test, the examinees' ability and the item difficulty provided indirect evidence about the

likely values of the missing responses. On the other hand, when the item responses were

omitted at the end of the test, the examinees' ability and the item effect, which is the

random effect of the test items, provided indirect evidence about the likely values of the

missing responses. Since the cause of missingness in this study was under the

researcher's control, and the differential amount of missing responses was a function of

the examinees' ability and item difficulty, or a function of the examinee's ability and

item effect depending on the omitting patterns, the missing data mechanism could be

considered as missing at random (Graham et al., 1997). The missing data mechanism was

therefore ignorable.

Iterations

For each of the 60 conditions (3 levels of sample size, 10 levels of the distribution

and percent of item response omission, and 2 levels of omitting pattern), one thousand

iterations were performed to ensure stable results. The one thousand iterations have been

used in two previous simulation studies in the evaluation of the efficiency of MI (Glynn,

Laird, & Rubin, 1993; Graham et al., 1996). The iterations resulted in 1,000

repeated data sets for each level of sample size.



Multiple Imputation Procedure

Let Y be an N x 1 vector of measures with Y ~ N(Xβ, σ²), where the covariate term Xβ

was a function of the parameters, X was a matrix of examinees' ability and item difficulty

variables under the situation when the omitted item responses were in the body of the

test, or examinees' ability and item effect variables under the situation when the omitted

item responses were at the end of the test, and β was a vector of regression parameters to

be estimated. The distribution of β was assumed to be multivariate normal. The algorithm

for creating ten multiply imputed Ym involved the following steps (Freedman, 1990;

Freedman & Wolf, 1995):

Step 1. Specified a particular form of imputation model to predict the value of a missing

variable Y and estimate the parameter vector β of regression coefficients using the portion

of the sample with complete data. Since the item responses in this study were

dichotomous in nature, the prediction model required a logistic regression model with a

univariate Y of the form

$$ \operatorname{logit}(p_j) = \ln\!\left(\frac{p_j}{1 - p_j}\right) = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} \tag{3-3} $$

where j = 1, ..., n examinees.

The set of predictors entered into the explicit model to create imputations for OPB

differed from that for OPE. Under the situation when the omitted item responses were in

the body of the test, X1j was the examinee's ability, and X2j was the item difficulty,

whereas under the situation when the omitted responses were at the end of the test, X1j

was the examinee's ability, and X2j was the item effect, which was the random effect of

the 20 items. As a result, there were two distinct logistic regression models for the MI

procedure.

The posterior probability of the response given X1j and X2j was

$$ p_j = E(Y_j \mid X_{1j}, X_{2j}) = \Pr(Y_j = 1 \mid X_{1j}, X_{2j}) \tag{3-4} $$









The logistic regression model assumed that the logit of the posterior probability was a

linear combination of the X1j and X2j variables.

$$ Y_j = \begin{cases} 1 & \text{if } \operatorname{logit}^{-1}(X_j \hat{\beta}) > u_j, \\ 0 & \text{otherwise,} \end{cases} \tag{3-5} $$

where uj was a random error.

Regressing Yo on the corresponding Xo matrix gave the ordinary least squares estimates:

the estimated regression coefficient vector β̂ and the estimated variance-covariance matrix Σ̂.

Step 2. Randomly drew from the sampling distribution of regression coefficients.

Estimated the regression coefficients by adding the random error to account for the

uncertainty about the regression prediction, which was the same as equation (2-11). Each

repetition used a distinct value of β* common across all imputed cases. This value was

drawn independently from the multivariate normal distribution of the estimated vector β̂.

Step 3. Given an estimate of β and the values of X1 and X2, the probability of answering an

item correctly could be predicted with equation (3-5) by drawing a value of u

from the uniform [0,1] distribution. Each set of predicted values Ym was based on

a different set of regression predictions and an independently drawn value of u. The

probability of a correct response for respondents in the kth repetition, pj(k), was

calculated with the randomly selected regression coefficients and the values of the

corresponding covariates for examinee j from the logistic regression:

$$ p_j^{(k)} = \frac{1}{1 + \exp[-(\beta_0^{(k)} + \beta_1^{(k)} X_{1j} + \beta_2^{(k)} X_{2j})]} $$









Step 4. The estimated probability $p_j^{(k)}$ from the logistic regression was compared
to a random number u from the uniform [0,1] distribution for each missing score. If the
random number u was less than the predicted probability $p_j^{(k)}$, as in equation
(3-5), then the imputed value for $Y_j^{(k)}$ was assigned a 1; otherwise the imputed
value was assigned a 0. These probabilities were used to impute the missing scores.

Step 5. Conducted ten repetitions, that is, repeated steps 2 to 4 ten times to create a
series of ten imputed values for each missing score (i.e., ten distinct imputed data sets).

Step 6. Computed the KR-20 (i.e., coefficient alpha) separately in each of the ten imputed
complete data sets. This resulted in ten separate coefficient alphas.

Step 7. Using the equation (2-15), the final adjusted coefficient alpha was obtained by

taking the simple arithmetic average of the ten coefficient alphas. This final coefficient

alpha was then compared to the one obtained from the original complete data set.
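
The steps above can be sketched in code. The following is a minimal illustration, not the program used in this study. Several things in it are assumptions made only for the example: the function names (`fit_logistic`, `kr20`, `mi_alpha`), the simulated Rasch-type data, the 10% completely random omission used in the demonstration (the study itself used nonrandom omitting patterns), and the use of a maximum likelihood (Newton-Raphson) fit for the logistic model in Step 1.

```python
import numpy as np

def inv_logit(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Step 1 (assumed ML fit): coefficient estimates and their covariance matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):                        # Newton-Raphson updates
        p = inv_logit(X @ beta)
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = inv_logit(X @ beta)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)

def kr20(scores):
    """KR-20 for an examinee-by-item matrix of 0/1 scores."""
    k = scores.shape[1]
    p = scores.mean(axis=0)                        # proportion correct per item
    total_var = scores.sum(axis=1).var()           # variance of total scores
    return (k / (k - 1)) * (1.0 - np.sum(p * (1.0 - p)) / total_var)

def mi_alpha(scores, x1, x2, m=10, seed=0):
    """Steps 1 to 7: scores has np.nan where a response is missing.
    x1 is one value per examinee, x2 is one value per item."""
    rng = np.random.default_rng(seed)
    n, k = scores.shape
    # One row per (examinee, item) pair, in the same order as scores.ravel().
    X = np.column_stack([np.ones(n * k), np.repeat(x1, k), np.tile(x2, n)])
    y = scores.ravel()
    miss = np.isnan(y)
    beta_hat, cov_hat = fit_logistic(X[~miss], y[~miss])          # Step 1
    alphas = []
    for _ in range(m):                                            # Step 5
        beta_star = rng.multivariate_normal(beta_hat, cov_hat)    # Step 2
        p = inv_logit(X[miss] @ beta_star)                        # Step 3
        u = rng.uniform(size=p.size)
        completed = y.copy()
        completed[miss] = (u < p).astype(float)                   # Step 4
        alphas.append(kr20(completed.reshape(n, k)))              # Step 6
    return float(np.mean(alphas)), alphas                         # Step 7

# Hypothetical demonstration data: 100 examinees, 20 items.
rng = np.random.default_rng(42)
n, k = 100, 20
ability = rng.normal(size=n)
difficulty = rng.uniform(-1.5, 1.5, size=k)
p_true = inv_logit(ability[:, None] - difficulty[None, :])
scores = (rng.uniform(size=(n, k)) < p_true).astype(float)
alpha_complete = kr20(scores)

scores_missing = scores.copy()
scores_missing[rng.uniform(size=(n, k)) < 0.10] = np.nan          # assumed 10% omission
alpha_mi, alphas = mi_alpha(scores_missing, ability, difficulty)
print(round(alpha_complete, 3), round(alpha_mi, 3))
```

The final value is the simple arithmetic average of the ten alphas, which can then be compared with the alpha of the complete data set as described in Step 7.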



Evaluating the Performance of Multiple Imputation

The accuracy (bias and precision) of the coefficient alpha obtained from the

restored complete data set in each of the ten missing conditions using MI was assessed by

means of the bias and the root-mean-square error (RMSE). Measures of the bias and

RMSE were averaged over the 1,000 iterations of the simulation.

Bias is defined as the average value of the coefficient alphas derived from the

original complete data set with no missing data minus the average value of the coefficient

alphas from the corresponding imputed data set over the 20,000 (i.e., 2 x 10 x 1000)

completed tests for a particular number of examinees. The estimated coefficient alpha is









unbiased when the average deviation (i.e., bias) between the coefficient alpha obtained

from the imputed values and that of the original values in the data set is close to 0.

RMSE is defined as the square root of the average squared difference between the

coefficient alpha derived from the original complete data set with no missing data and the

coefficient alpha from the corresponding imputed data set. The estimated coefficient

alpha is precise when the RMSE is close to 0.

\[
\mathrm{RMSE} = \sqrt{\operatorname{mean}\!\left[\left(\alpha_{\text{original data}} - \alpha_{\text{restored data using MI}}\right)^2\right]} \tag{3-6}
\]

The relationship between RMSE and bias is

\[
(\mathrm{RMSE})^2 = (\mathrm{bias})^2 + (\mathrm{SE})^2 \tag{3-7}
\]

where

RMSE is the root mean square error, which represents an overall error,

bias is the average deviation, which represents a systematic error, and

SE is the standard error, which represents random error.
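
The two criteria and the relationship in equation (3-7) can be checked numerically. The sketch below is illustrative only: the function name `bias_rmse` and the four pairs of alpha values are hypothetical, and SE is taken here to be the standard deviation (with divisor equal to the number of pairs) of the differences, which is the assumption under which the decomposition holds exactly.

```python
import numpy as np

def bias_rmse(alpha_original, alpha_imputed):
    """Bias, RMSE, and SE of the differences (original minus imputed, as defined in the text)."""
    d = np.asarray(alpha_original) - np.asarray(alpha_imputed)
    bias = d.mean()                    # systematic error
    rmse = np.sqrt((d ** 2).mean())    # overall error
    se = d.std()                       # random error (divisor = number of pairs)
    return bias, rmse, se

# Hypothetical coefficient alphas from four simulated data sets.
alpha_original = [0.765, 0.771, 0.758, 0.762]
alpha_imputed = [0.742, 0.750, 0.731, 0.749]

bias, rmse, se = bias_rmse(alpha_original, alpha_imputed)
print(bias, rmse, se)
print(np.isclose(rmse ** 2, bias ** 2 + se ** 2))
```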













CHAPTER 4
RESULTS

In this chapter results of the analyses of the data for the two performance criteria

are presented. The two criteria are the bias and root mean square error (RMSE). The

mean coefficient alpha and its standard deviation of the original complete data set with no

missing data for 50, 100, and 500 examinees were M = 0.765, SD = 0.033; M = 0.764,

SD = 0.023; M = 0.763, SD = 0.01 respectively. Each mean coefficient alpha was based

on the average of 20,000 (10 missing conditions x 2 omitting patterns x 1000 iterations)

values. The results of these mean coefficient alphas were very close to those computed by

Harrison (1998). For example, the mean coefficient alpha for the sample size of 50 in

Harrison's study was 0.769. The mean coefficient alphas and their standard deviation of

the restored completed data set using MI for the ten missing conditions in each of the

three levels of sample size and two levels of omitting pattern are shown in Figures 4-1 to

4-6.

The biases obtained in each of the ten missing conditions for the two omitting
patterns are summarized in Tables 4-1 and 4-2. Under the omitting pattern where missing
responses are at the end of the test (OPE), the biases (in absolute value) ranged from
0.000 to 0.030. The majority (93%) of the biases in OPE were less than 0.02 in
magnitude. The biases (in absolute value) obtained in OPE for the sample sizes of 50,
100, and 500 ranged from 0.001 to 0.030, 0.000 to 0.016, and 0.000 to 0.009,
respectively. The biases obtained in the omitting pattern where missing responses are in
the body of the test (OPB) were noticeably higher than those in OPE for the
corresponding missing conditions; the biases (in absolute value) ranged from 0.019 to
0.069. The majority (97%) of the biases in OPB were less than 0.06. The biases (in
absolute value) obtained in OPB for the sample sizes of 50, 100, and 500 ranged from
0.027 to 0.069, 0.019 to 0.051, and 0.028 to 0.054, respectively. As expected, the
largest bias was in the missing condition %EMI3 x %IM3, where the smallest sample size
(50) was accompanied by the largest percent (30.2%) of missingness. The bias in this
condition was 0.069. The coefficient alphas in OPB were always overestimated (positively
biased), whereas in OPE, about half of the coefficient alphas obtained through MI were
overestimated and the other half were underestimated. Whether MI produced coefficient
alphas that were overestimated or underestimated in OPE did not depend on the percent
of missingness. Further research needs to be conducted to explore why some of the
coefficient alphas obtained from MI were overestimated while others were underestimated
under the same omitting pattern. For condition %EMIs x %IMs, in which the nonrandom
distribution of omission is across and within each ability group (Roth, 1994), the bias
obtained, regardless of the omitting pattern, was similar to that in other missing
conditions where the percent of missingness was about the same (e.g., missing condition
%EMI2 x %IM2). The RMSEs obtained in each of the ten missing conditions for the two
omitting patterns are summarized in Tables 4-3 and 4-4. The results were very similar to
those obtained for the bias.

In general, the bias (in absolute value) or the RMSE increased as the amount of
missingness increased. Graham and Schafer (1999) explained this phenomenon by
suggesting that MI introduced bias when handling missing data. However, the increase in
the bias or the RMSE was not monotonic, as indicated in Tables 4-1 to 4-4. There were
irregularities in the magnitude of the bias or the RMSE across the ten missing
conditions within each sample size. That is, in some missing conditions, the magnitude
of the bias or the RMSE for the smaller amount of missingness was larger than that for
the larger amount of missingness, even though both conditions had the same sample size.
This kind of irregularity was also noticed in Graham and Schafer's (1999) simulation
study. Another general pattern revealed in this study was that the bias decreased as the
sample size increased. Once again, the decrease in the bias was not monotonic, as
indicated in Tables 4-1 and 4-2. There were irregularities across the three sample
sizes, and this kind of irregularity was also noticed in Graham and Schafer's study. On
the other hand, the magnitude of the RMSE in OPE, but not in OPB, showed a clear pattern
of decrease as the sample size increased (see Tables 4-3 and 4-4).






[Figure omitted. Horizontal axis: Missing Condition (1 to 10); vertical axis scale: 0.0 to 1.0.]

Figure 4-1. The mean coefficient alpha in OPB with sample size of 50.






[Figure omitted. Horizontal axis: Missing Condition (1 to 10); vertical axis scale: 0.0 to 1.0.]

Figure 4-2. The mean coefficient alpha in OPE with sample size of 50.






[Figure omitted. Horizontal axis: Missing Condition (1 to 10); vertical axis scale: 0.0 to 1.0.]

Figure 4-3. The mean coefficient alpha in OPB with sample size of 100.







[Figure omitted. Horizontal axis: Missing Condition (1 to 10); vertical axis scale: 0.0 to 1.0.]

Figure 4-4. The mean coefficient alpha in OPE with sample size of 100.









[Figure omitted. Horizontal axis: Missing Condition (1 to 10); vertical axis scale: 0.0 to 1.0.]

Figure 4-5. The mean coefficient alpha in OPB with sample size of 500.







[Figure omitted. Horizontal axis: Missing Condition (1 to 10); vertical axis scale: 0.0 to 1.0.]

Figure 4-6. The mean coefficient alpha in OPE with sample size of 500.









Table 4-1. Bias for the Coefficient Alpha in Omitting Pattern Where Missing Responses
are in the Body of the Test

                                                  Sample size
Missing          Approx. % of         50               100               500
Condition        Missingness      Mean (SD)         Mean (SD)         Mean (SD)

%EMI1 x %IM1         8.4%        0.036 (0.017)     0.037 (0.012)     0.028 (0.005)
%EMI1 x %IM2        10.5%        0.028 (0.018)     0.019 (0.014)     0.029 (0.006)
%EMI1 x %IM3        12.6%        0.035 (0.019)     0.030 (0.015)     0.043 (0.006)
%EMI2 x %IM1        13.3%        0.042 (0.020)     0.036 (0.013)     0.036 (0.006)
%EMI3 x %IM1        17.4%        0.059 (0.021)     0.051 (0.015)     0.043 (0.007)
%EMI2 x %IM2        17.6%        0.054 (0.022)     0.050 (0.015)     0.035 (0.007)
%EMIs x %IMs        18.0%        0.034 (0.020)     0.047 (0.015)     0.047 (0.007)
%EMI2 x %IM3        21.9%        0.027 (0.023)     0.043 (0.016)     0.040 (0.007)
%EMI3 x %IM2        23.8%        0.031 (0.024)     0.040 (0.016)     0.054 (0.007)
%EMI3 x %IM3        30.2%        0.069 (0.023)     0.047 (0.018)     0.048 (0.008)










Table 4-2. Bias for the Coefficient Alpha in Omitting Pattern Where Missing Responses
are at the End of the Test

                                                  Sample size
Missing          Approx. % of         50               100               500
Condition        Missingness      Mean (SD)         Mean (SD)         Mean (SD)

%EMI1 x %IM1         8.4%        0.017 (0.019)     0.011 (0.014)     0.007 (0.006)
%EMI1 x %IM2        10.5%        0.006 (0.017)    -0.010 (0.014)    -0.002 (0.006)
%EMI1 x %IM3        12.6%        0.009 (0.020)    -0.015 (0.017)     0.007 (0.007)
%EMI2 x %IM1        13.3%        0.001 (0.023)     0.016 (0.013)     0.007 (0.006)
%EMI3 x %IM1        17.4%        0.013 (0.024)     0.010 (0.014)    -0.009 (0.008)
%EMI2 x %IM2        17.6%       -0.004 (0.022)    -0.013 (0.016)    -0.008 (0.007)
%EMIs x %IMs        18.0%       -0.005 (0.024)     0.000 (0.018)     0.005 (0.007)
%EMI2 x %IM3        21.9%       -0.024 (0.025)    -0.006 (0.020)    -0.008 (0.008)
%EMI3 x %IM2        23.8%       -0.030 (0.028)    -0.008 (0.020)     0.000 (0.008)
%EMI3 x %IM3        30.2%        0.017 (0.029)     0.006 (0.021)    -0.003 (0.009)









Table 4-3. RMSE for the Coefficient Alpha in Omitting Pattern Where Missing
Responses are in the Body of the Test

                                                  Sample size
Missing          Approx. % of         50               100               500
Condition        Missingness      Mean (SD)         Mean (SD)         Mean (SD)

%EMI1 x %IM1         8.4%        0.037 (0.017)     0.037 (0.012)     0.028 (0.005)
%EMI1 x %IM2        10.5%        0.029 (0.017)     0.020 (0.013)     0.029 (0.006)
%EMI1 x %IM3        12.6%        0.035 (0.018)     0.030 (0.014)     0.043 (0.006)
%EMI2 x %IM1        13.3%        0.043 (0.019)     0.036 (0.013)     0.036 (0.006)
%EMI3 x %IM1        17.4%        0.059 (0.021)     0.051 (0.015)     0.043 (0.007)
%EMI2 x %IM2        17.6%        0.054 (0.022)     0.050 (0.015)     0.035 (0.007)
%EMIs x %IMs        18.0%        0.034 (0.019)     0.047 (0.015)     0.047 (0.007)
%EMI2 x %IM3        21.9%        0.030 (0.019)     0.043 (0.016)     0.040 (0.007)
%EMI3 x %IM2        23.8%        0.033 (0.021)     0.040 (0.016)     0.054 (0.007)
%EMI3 x %IM3        30.2%        0.069 (0.023)     0.047 (0.018)     0.048 (0.008)









Table 4-4. RMSE for the Coefficient Alpha in Omitting Pattern Where Missing
Responses are at the End of the Test

                                                  Sample size
Missing          Approx. % of         50               100               500
Condition        Missingness      Mean (SD)         Mean (SD)         Mean (SD)

%EMI1 x %IM1         8.4%        0.021 (0.014)     0.015 (0.010)     0.008 (0.005)
%EMI1 x %IM2        10.5%        0.015 (0.011)     0.014 (0.010)     0.005 (0.004)
%EMI1 x %IM3        12.6%        0.018 (0.014)     0.018 (0.013)     0.008 (0.005)
%EMI2 x %IM1        13.3%        0.018 (0.014)     0.017 (0.011)     0.008 (0.005)
%EMI3 x %IM1        17.4%        0.022 (0.016)     0.014 (0.011)     0.010 (0.006)
%EMI2 x %IM2        17.6%        0.018 (0.014)     0.017 (0.013)     0.009 (0.006)
%EMIs x %IMs        18.0%        0.020 (0.015)     0.014 (0.011)     0.007 (0.005)
%EMI2 x %IM3        21.9%        0.028 (0.021)     0.016 (0.013)     0.009 (0.006)
%EMI3 x %IM2        23.8%        0.033 (0.024)     0.017 (0.013)     0.006 (0.005)
%EMI3 x %IM3        30.2%        0.027 (0.020)     0.017 (0.013)     0.007 (0.006)













CHAPTER 5
DISCUSSION

Because there was no substantial bias in any of the missing conditions, the results of
this simulation study indicated that MI is a reasonably good procedure for replacing the
missing data in a single-facet crossed model in which missing responses are either in
the body of the test or at the end of the test. The majority of the biases obtained were
less than 0.05, and the magnitude was comparable to those obtained in Harrison's (1998)
study. The most significant difference was that the amount of missingness in the present
study was two to three times more than that used in Harrison's study, and the omitting
patterns were nonignorable.

The present study used the examinee's ability θ and item difficulty b as the
predictors in the logistic regression when the missing responses were in the body of the
test, and the examinee's ability θ and item effect i as the predictors when the missing
responses were at the end of the test. The predictors used in the present study differed
from the ones used by Harrison (1998), who used examinee effect j and item effect i as
the predictors. Results of using j and i as the predictors in the present study
indicated that the biases for the coefficient alpha were unacceptably higher than those
obtained using θ and b, or θ and i. For example, in the missing condition %EMI3 x %IM3
with a sample size of 50, the bias obtained using j and i as the predictors was -0.211
when missing responses were in the body of the test, and -0.233 when missing responses
were at the end of the test. This illustrated one of the limitations of MI mentioned in
Chapter 3, namely that inference based on MI will be biased when relevant predictors are
not incorporated (Schafer, 1999; Schafer & Olsen, 1998).

An attempt to include more predictors (i.e., examinee's ability θ, item difficulty b,
examinee effect j, and item effect i) in the logistic model did not help to reduce the
bias. For example, the bias obtained using θ, b, j, and i as the predictors was 0.07 in
the missing condition %EMI3 x %IM3 when the omitting pattern was OPB and the sample size
was 50. This illustration was consistent with Rubin's (1987) suggestion that extra
variables did not affect the bias in the inferences. Further systematic studies need to
be conducted to support Rubin's claim regarding the relationship between the bias and
the number of predictors.

Another possible factor that may have affected the accuracy of the obtained coefficient
alphas was the extreme value of some of the item discrimination parameters (e.g., a =
7.637). Unfortunately, a simpler model such as a one-parameter Rasch model with fixed
item discrimination and pseudo-chance parameters did not help to reduce the bias. The
bias for the above missing condition (i.e., %EMI3 x %IM3) was still on the order of
0.07 when using θ and b as the predictors.

When comparing the biases obtained from the two omitting patterns, it is

suggested that examinee's ability rather than item effect or item parameters may

contribute more to the accuracy of the parameter estimation. Further systematic

investigation is warranted.

Finally, a surprising finding was obtained when using listwise deletion to estimate
the coefficient alpha in the above missing condition (i.e., %EMI3 x %IM3): the bias was
-0.077. The bias (in absolute value) was much smaller than those obtained in Harrison's
(1998) study, even though the amount of missingness was three times greater. The bias
obtained from the nonrandom missing conditions (with 10% of missingness) in Harrison's
study was about -0.2. This surprising finding may have something to do with the
idiosyncratic nature of the missing mechanism in this study. Further research needs to
investigate this issue systematically.



Limitations

The present study used examinee's ability and item difficulty as the predictors in
OPB, and used examinee's ability and item effect as the predictors in OPE. However, in
real-life testing situations, the ability parameter θ and item parameter b need to be
estimated first. Accurate estimation of these two parameters may not be possible in
situations with a substantial amount of missing data (Bradlow & Thomas, 1998). Another
limitation is that one may not be sure of the mechanism producing the missing data.



Suggestions for Future Research

The present study only illustrated one way of using MI to analyze the data. It is

important to perform a sensitivity analysis to compare the results obtained in the present

study with those when the nonresponse model was treated as nonignorable. A comparison

of the coefficient alpha obtained using the selection approach model versus the pattern-

mixture model certainly would be informative.

The bias obtained in the present study as well as in Graham and Schafer's (1999)

study was not a linear function of the amount of missingness or the sample size.

However, no good explanation can be given based on the limited information provided in









the present study as well as in Graham and Schafer's (1999) study. This may be an

important issue for further investigation.

Because of the positively skewed distribution of the biases in OPB and the lower

bound nature of the coefficient alpha, it is suggested that using the median instead of the

mean to compute the final adjusted alpha may be worthwhile to investigate.
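
To illustrate the difference between the two pooling rules, the short sketch below compares the mean and the median of ten coefficient alphas. The ten values are hypothetical and were chosen only so that one imputed data set yields an unusually low alpha; they are not results from this study.

```python
import statistics

# Hypothetical coefficient alphas from ten imputed data sets (one low outlier).
alphas = [0.74, 0.75, 0.73, 0.76, 0.75, 0.74, 0.77, 0.75, 0.52, 0.76]

pooled_mean = statistics.mean(alphas)      # simple arithmetic average, as in Step 7
pooled_median = statistics.median(alphas)  # alternative suggested for investigation

print(pooled_mean, pooled_median)
```

In this made-up case the single low alpha pulls the mean down while the median is unaffected, which is the kind of behavior a systematic comparison would need to examine.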

This study illustrated two of the most commonly encountered omitting patterns:
missing responses in the body of the test and at the end of the test. In most real-life
educational tests, the types of omitting patterns are much more complicated, and the
missing responses, as suggested by Schafer and Olsen (1998), can arise from a variety of
reasons including a combination of ignorable and nonignorable mechanisms. Systematic
investigation of the effectiveness of different MDTs, especially RMLE and MI, under
conditions involving a combination of ignorable and nonignorable mechanisms is important
for examining different kinds of missing responses.

In Chapter 2, several methods were described to create the posterior predictive
probability distribution from Yo; however, to date few studies have attempted to compare
different methods of data simulation applied to MI. Duncan, Duncan, and Li (1998)
illustrated the use of data augmentation and the bootstrap in a structural equation
model. More studies should investigate the effectiveness of these data simulation
methods.

Obviously, the application of MI is not confined to the single-facet situation;

further study should explore the application of MI to multi-facet situations where a

generalizability coefficient is obtained. The incorporation of rater facet in nested designs








can be an extension of the present study to test the effectiveness of MI in handling

missing data in a more complicated situation.













REFERENCES


Angoff, W. H., & Schrader, W. B. (1984). A study of hypotheses basic to the use
of rights and formula scores. Journal of Educational Measurement, 21, 1-17.

Bacik, J. M., Murphy, S. A., & Anthony, J. C. (1998). Drug use prevention data,
missed assessments and survival analysis. Multivariate Behavioral Research, 33, 573-
588.

Barnard, J., Du, J. T., Hill, J. L., & Rubin, D. B. (1998). A broader template for
analyzing broken randomized experiments. Sociological Methods and Research, 27, 285-
317.

Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in
medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8,
17-36.

Beaton, A. E. (1997). Missing scores in survey research. In J. P. Keeves (Ed.),
Educational research, methodology, and measurement: An international handbook (2nd
ed., pp. 763-766). New York: Pergamon Press.

Bradlow, E. T., & Thomas, N. (1998). Item response theory models applied to
data allowing examinee choice. Journal of Educational and Behavioral Statistics, 23, 236-
243.

Brownstone, D., & Valletta, R. G. (1996). Modeling earnings measurement error:
A multiple imputation approach. Review of Economics and Statistics, 78, 705-717.

Chirembo, A. M. (1995). Direct versus indirect methods for the estimation of
variance-covariance matrices and regression parameters when data are skewed and
incomplete. Unpublished doctoral dissertation, University of Florida, Gainesville.

Cluxton, S. E., & Mandeville, G. K. (1979, April). Latent trait models: Ability
estimates and omitted items. Paper presented at the 63rd Annual Meeting of the
American Educational Research Association, San Francisco, CA.

Crawford, S. L., Tennstedt, S. L., & McKinlay, J. B. (1995). A comparison of
analytic methods for non-random missingness of outcome data. Journal of Clinical
Epidemiology, 48, 209-219.









Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory.
New York: Holt, Rinehart, & Winston.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational
measurement (2nd ed.). Washington, DC: American Council on Education.

Curran, D., Bacchi, M., Hsu Schmitz, S. F., Molenberghs, G., & Sylvester, R. J.
(1998). Identifying the types of missingness in quality of life data from clinical trials.
Statistics in Medicine, 17, 739-756.

DeCanio, S. J., & Watkins, W. E. (1998). Investment in energy efficiency: Do the
characteristics of firms matter? Review of Economics and Statistics, 80, 95-107.

Downey, D. G., & King, C. V. (1998). Missing data in Likert ratings: A
comparison of replacement methods. Journal of General Psychology, 125, 175-191.

Duncan, T. E., Duncan, S. C., & Li, F. (1998). A comparison of model- and
multiple imputation-based approaches to longitudinal analyses with partial missingness.
Structural Equation Modeling, 5, 1-21.

Freedman, V. (1990). Using SAS to perform multiple imputation. (Discussion
Paper Series UI-PSC-6). Washington, DC: Urban Institute.

Freedman, V., & Wolf, D. A. (1995). A case-study on the use of multiple
imputation. Demography, 32, 459-470.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling based approaches to
calculating marginal densities. Journal of the American Statistical Association, 85,
398-409.

Glynn, R. J., Laird, N. M., & Rubin D. B. (1993). Multiple imputation in mixture
models for nonignorable nonresponse with follow-ups. Journal of the American
Statistical Association, 88, 984-993.

Graham, J. W., & Donaldson, S. I. (1993). Evaluating interventions with
differential attrition: The importance of nonresponse mechanisms and use of follow-up
data. Journal of Applied Psychology, 78, 119-128.

Graham, J. W., Hofer, S. M., Donaldson, S. I., MacKinnon, D. P., & Schafer, J. L.
(1997). Analysis with missing data in prevention research. In K. J. Bryant, M. Windle, &
S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and
substance abuse research (pp. 325-366). Washington, DC: American Psychological
Association.









Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the
usefulness of data obtained with planned missing value patterns: An application of
maximum likelihood procedures. Multivariate Behavioral Research, 31, 197-218.

Graham, J. W., Hofer, S. M., & Piccinin, A. M. (1994). Analysis with missing data
in drug prevention research. In L. M. Collins & L. A. Seitz (Eds.), Advances in data
analysis for prevention intervention research (NIDA Research Monograph 142, pp.
13-62). Washington, DC: National Institute on Drug Abuse.

Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple
imputation for multivariate data with small sample size. In R. H. Hoyle (Ed.), Statistical
strategies for small sample research (pp. 1-29). Thousand Oaks, CA: Sage.

Greenland, S., & Finkle, W. D. (1995). A critical look at methods for handling
missing covariates in epidemiologic regression analyses. American Journal of
Epidemiology, 142, 1255-1264.

Gross, A. L. (1997). Interval estimation of bivariate correlations with missing data
on both variables: A Bayesian approach. Journal of Educational and Behavioral Statistics,
22, 407-424.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles
and applications. Boston: Kluwer-Nijhoff.

Harrison, J. M. (1998). A comparison of strategies for estimating internal
consistency on tests with missing data. Unpublished master's thesis, University of
Florida, Gainesville.

Heitjan, D. F. (1997). Annotation: What can be done about missing data?
Approaches to imputation. American Journal of Public Health, 87, 548-550.

Heitjan, D. F., & Little, R. J. A. (1991). Multiple imputation for the Fatal Accident
Reporting System. Applied Statistics, 40, 13-29.

Heitjan, D. F., & Rubin, D. B. (1990). Inference from coarse data via multiple
imputation with application to age heaping. Journal of the American Statistical
Association, 85, 304-314.

Isaacson, J., & Smith, G. (1993). Hosting a mathematics tournament for two-year
college students. (ERIC Document Reproduction Service No. ED 366 382)

Jamshidian, M., & Bentler, P. M. (1999). ML estimation of mean and covariance
structures with missing data using complete data routines. Journal of Educational and
Behavioral Statistics, 24, 21-41.









Kalton, G., & Kasprzyk, D. (1986). The treatment of missing survey data. Survey
Methodology, 12, 1-16.

Kim, J. O., & Curry, J. (1977). The treatment of missing data in multivariate
analysis. Sociological Methods and Research, 6, 215-241.

Koretz, D., Lewis, E., Skewes-Cox, T., & Burstein, L. (1993). Omitted and non-
reached items in mathematics in the 1990 National Assessment of Educational Progress.
(ERIC Document Reproduction Service No. ED 378 220)

Kromrey, J. D., & Hines, C. V. (1994). Nonrandomly missing data in multiple
regression: An empirical comparison of common missing-data treatments. Educational
and Psychological Measurement, 54, 573-593.

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test
reliability. Psychometrika, 2, 151-160.

Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine,
7, 305-315.

Landerman, L. R., Land, K. C., & Pieper, C. F. (1997). An empirical evaluation of
the predictive mean matching method for imputing missing values. Sociological Methods
and Research, 26, 3-33.

Little, R. J. A. (1992). Regression with Missing X's: A review. Journal of the
American Statistical Association, 87, 1227-1237.

Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated-measures
studies. Journal of the American Statistical Association, 90, 1112-1121.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New
York: Wiley.

Little, R. J. A., & Rubin, D. B. (1989). The analysis of social science data with
missing data. Sociological Methods and Research, 18, 292-326.

Little, R. J. A., & Schenker, N. (1995). Missing data. In G. Arminger, C. C.
Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and
behavioral sciences (pp. 39-75). New York: Plenum.

Longford, N. T. (1994). Models for scoring missing responses to multiple-choice
items. (ERIC Document Reproduction Service No. ED 382 650)

Marcoulides, G. A. (1990). An alternative method for estimating variance
components in generalizability theory. Psychological Reports, 66, 379-386.









Michiels, B., & Molenberghs, G. (1997). Protective estimation of longitudinal
categorical data with nonrandom dropout. Communications in Statistics: Theory and
Methods, 26, 65-94.

Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP.
Journal of Educational Statistics, 17, 131-154.

Neal, T., & Nianci, G. (1997). Generating multiple imputations for matrix
sampling data analyzed with item response models. Journal of Educational and
Behavioral Statistics, 22, 425-445.

Oshima, T. C. (1994). The effect of speededness on parameter estimation in item
response theory. Journal of Educational Measurement, 31, 200-219.

Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal
of Consumer Research, 21, 381-391.

Pollard, W. E. (1986). Bayesian statistics for evaluation research: An
introduction. Beverly Hills, CA: Sage.

Raaijmakers, Q. A. (1999). Effectiveness of different missing data treatments in
surveys with Likert-type data: Introducing the relative mean substitution approach.
Educational and Psychological Measurement, 59, 725-728.

Raghunathana, E., & Siscovick, S. (1996). A multiple-imputation analysis of a
case-control study of the risk of primary cardiac arrest among pharmacologically treated
hypertensives. Applied Statistics, 45, 335-352.

Raymond, M. R. (1987). Missing data in evaluation research. Evaluation and the
Health Professions, 9, 395-420.

Roth, P. L. (1994). Missing data: A conceptual review for applied psychologists.
Personnel Psychology, 47, 537-550.

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York:
Wiley.

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American
Statistical Association, 91, 473-489.

Rubin, D. B., & Schenker, N. (1986). Multiple imputation for interval estimation
from simple random samples with ignorable nonresponse. Journal of the American
Statistical Association, 81, 366-374.









Rubin, D. B., & Schenker, N. (1991). Multiple imputation in health-care
databases: An overview and some applications. Statistics in Medicine, 10, 585-598.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York:
Chapman & Hall.

Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in
Medical Research, 8, 3-15.

Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate
missing-data problems: A data analyst's perspective. Multivariate Behavioral
Research, 33, 545-571.

Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in criterion-
related validation studies. Journal of Applied Psychology, 61, 473-485.

Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions
by data augmentation (with discussion). Journal of the American Statistical Association,
82, 528-550.

van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of
missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681-
694.

Wang, C. Y., Anderson, G. L., & Prentice, R. L. (1999). Estimation of the
correlation between nutrient intake measures under restricted sampling. Biometrics, 55,
711-717.

Wang, R., Sedransk, J., & Jinn, J. H. (1992). Secondary data analysis when there
are missing observations. Journal of the American Statistical Association, 87, 952-961.

Way, W. D., & Reese, C. M. (1991). An investigation of the use of simplified IRT
models for scaling and equating the TOEFL test. (ERIC Document Reproduction Service
No. ED 395 024)

Xie, F., & Paik, M. C. (1997). Multiple imputation methods for the missing
covariates in generalized estimating equation. Biometrics, 53, 1538-1546.

Yamamoto, K. (1995). Estimating the effects of test length and test time on
parameter estimation using the HYBRID model. (ERIC Document Reproduction Service
No. ED 395 035)

Yen, W. M. (1987). A comparison of the efficiency and accuracy of BILOG and
LOGIST. Psychometrika, 52, 275-291.













BIOGRAPHIC SKETCH

Hon Keung Yuen was born in 1961 in Hong Kong. He completed his undergraduate studies
at Queensland University, Brisbane, Australia, in 1986, where he majored in occupational
therapy. In 1988, he received a master of science degree in occupational therapy from
Western Michigan University. After five years of occupational therapy practice in the
field of traumatic head injury rehabilitation, Mr. Yuen's interest in research grew.
Between 1993 and 1996, he taught occupational therapy at Eastern Kentucky University and
subsequently at the Hong Kong Polytechnic University. In 1996, he began working on his
Ph.D. in the College of Education at the University of Florida, where he majored in
research and evaluation methodology.

While pursuing his Ph.D., Mr. Yuen also worked full-time on the faculty of the
Occupational Therapy Department at the University of Florida. Mr. Yuen has published
more than twelve articles in the American Journal of Occupational Therapy. Currently, he
serves on the journal's editorial board.








I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.



M. David Miller, Chair
Professor of Educational Psychology




I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.



Anne E. Seraphine
Assistant Professor of Educational
Psychology




I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.



Arthur J. Newman
Professor of Educational Leadership, Policy,
and Foundations




I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.



Kay Walker
Professor of Occupational Therapy








This dissertation was submitted to the Graduate Faculty of the College of
Education and the Graduate School and was accepted as partial fulfillment of the
requirements for the degree of Doctor of Philosophy.

August, 2000


Chairman, Department of Educational Psychology



Dean, College of Education



Dean, Graduate School






















































































PAGE 1

7+( ,03$&7 2) 08/7,3/( ,0387$7,216 21 7+( (67,0$7,21 2) &2()),&,(17 $/3+$ %\ +21 .(81* <8(1 $ ',66(57$7,21 35(6(17(' 72 7+( *5$'8$7( 6&+22/ 2) 7+( 81,9(56,7< 2) )/25,'$ ,1 3$57,$/ )8/),/0(17 2) 7+( 5(48,5(0(176 )25 7+( '(*5(( 2) '2&725 2) 3+,/2623+< 81,9(56,7< 2) )/25,'$

PAGE 2

$&.12:/('*0(176 DP LQGHEWHG WR D IHZ VSHFLDO LQGLYLGXDOV ZKR PDGH WKLV GLVVHUWDWLRQ SRVVLEOH )LUVW ZRXOG OLNH WR WKDQN P\ ZLIH .LW IRU KHU SDWLHQFH DQG XQGHUVWDQGLQJ WKURXJKRXW WKLV SURFHVV $OVR ZRXOG OLNH WR WKDQN P\ FRPPLWWHH PHPEHUV 'U 'DYLG 0LOOHU DQG 'U $QQH 6HUDSKLQH 'U .D\ :DONHU DQG 'U $UWKXU 1HZPDQ IRU WKHLU WLPH DQG VXSSRUW

PAGE 3

7$%/( 2) &217(176 JDJH $&.12:/(*(0(176 LL $%675$&7 LY &+$37(56 ,1752'8&7,21 6WDWHPHQW RI WKH 3UREOHP 5DWLRQDOH IRU WKH 6WXG\ 3XUSRVH DQG 6LJQLILFDQFH RI WKH 6WXG\ 5(9,(: 2) /,7(5$785( &RPPRQ 0LVVLQJ 'DWD 7UHDWPHQWV 0XOWLSOH ,PSXWDWLRQ 0LVVLQJ 'DWD 0HFKDQLVPV 0(7+2'2/2*< 6LPXODWLRQ 3URFHGXUH 'HVLJQ RI 6WXG\ 0XOWLSOH ,PSXWDWLRQ 3URFHGXUH (YDOXDWLQJ WKH 3HUIRUPDQFH RI 0XOWLSOH ,PSXWDWLRQ 5(68/76 ',6&866,21 /LPLWDWLRQV 6XJJHVWLRQV WR )XWXUH 5HVHDUFK 5()(5(1&(6 LLL %,2*5$3+,&$/ 6.(7&+

PAGE 4

$EVWUDFW RI 'LVVHUWDWLRQ 3UHVHQWHG WR WKH *UDGXDWH 6FKRRO RI WKH 8QLYHUVLW\ RI )ORULGD LQ 3DUWLDO )XOILOOPHQW RI WKH 5HTXLUHPHQWV IRU WKH 'HJUHH RI 'RFWRU RI 3KLORVRSK\ 7+( ,03$&7 2) 08/7,3/( ,0387$7,216 21 7+( (67,0$7,21 2) &2()),&,(17 $/3+$ %\ +RQ .HXQJ
PAGE 5

LQ WKH RPLWWLQJ SDWWHUQ ZKHUH PLVVLQJ UHVSRQVHV DUH LQ WKH ERG\ RI WKH WHVW ZDV OHVV WKDQ ,Q JHQHUDO WKH ELDV LQFUHDVHG DV WKH DPRXQW RI PLVVLQJQHVV LQFUHDVHG RU DV WKH VDPSOH VL]H GHFUHDVHG +RZHYHU WKLV SDWWHUQ LV QRW XQLIRUP DFURVV DOO WKH PLVVLQJ FRQGLWLRQV LQYHVWLJDWHG 2YHUDOO WKLV VLPXODWLRQ VWXG\ FRQILUPHG WKDW PXOWLSOH LPSXWDWLRQ LV D UHDVRQDEO\ JRRG SURFHGXUH WR UHSODFH WKH PLVVLQJ GDWD RQ WHVWV LQ ZKLFK PLVVLQJ UHVSRQVHV DUH HLWKHU LQ WKH ERG\ RI WKH WHVW RU DW WKH HQG RI WKH WHVW

PAGE 6

&+$37(5 ,1752'8&7,21 $FFXUDWH PHDVXUHPHQW RI H[DPLQHHVf DELOLW\ LQ VWDQGDUGL]HG DFKLHYHPHQW DVVHVVPHQWV UHTXLUHV WKH WHVW VFRUHV WR EH UHOLDEO\ PHDVXUHG ,QWHUQDO FRQVLVWHQF\ LV RQH W\SH RI UHOLDELOLW\ WKDW LQGLFDWHV KRZ VWURQJO\ WKH WHVW LWHPV ZLWKLQ WKH VDPH FRQVWUXFW DUH FRUUHODWHG ,QWHUQDO FRQVLVWHQF\ RI D WHVW DSSHDOV WR HGXFDWRUV EHFDXVH LW UHTXLUHV RQO\ D VLQJOH DGPLQLVWUDWLRQ RI RQH IRUP RI D WHVW &RHIILFLHQW DOSKD &URQEDFK f LV D FRPPRQO\ XVHG LQGH[ WR HVWLPDWH WKH LQWHUQDO FRQVLVWHQF\ RI D WHVW 7KH LQGH[ LV QRW D GLUHFW HVWLPDWH RI WKH WKHRUHWLFDO UHOLDELOLW\ FRHIILFLHQW EXW LV DQ HVWLPDWH RI WKH ORZHU ERXQG RI WKH LQWHUQDO FRQVLVWHQF\ &URFNHU t $OJLQD f $FFRUGLQJ WR 3HWHUVRQ f WKH IRUPXOD IRU FRPSXWLQJ FRHIILFLHQW DOSKD Df FDQ EH H[SUHVVHG DV ; A ;:r LV ZKHUH V LV WKH QXPEHU RI LWHPV LQ WKH WHVW FU@ LV WKH YDULDQFH RI WKH WHVW VFRUHV MI LV WKH YDULDQFH RI D VLQJOH LWHP L \cDVUL6 LV WKH FRYDULDQFH EHWZHHQ LWHP L DQG LWHP V DQG ULV LV WKH FRUUHODWLRQ EHWZHHQ LWHP L DQG LWHP V f

PAGE 7

2U D VU f U^V f ZKHUH U LV WKH DYHUDJH LQWHULWHP FRUUHODWLRQ $V LQ WKH HVWLPDWLRQ RI WKH 3HDUVRQ SURGXFW PRPHQW FRUUHODWLRQ FRHIILFLHQW FRPSXWDWLRQ RI FRHIILFLHQW DOSKD UHTXLUHV D UHFWDQJXODU SHUVRQE\LWHP GDWD PDWUL[ ZLWK QR PLVVLQJ GDWD LH D EDODQFHG GHVLJQ GDWD VHWf +RZHYHU LW LV ZHOO NQRZQ WKDW PLVVLQJ GDWD LV FRPPRQ LQ ODUJHVFDOH VWDQGDUGL]HG HGXFDWLRQDO DFKLHYHPHQW WHVWV VXFK DV WKH 1DWLRQDO $VVHVVPHQW RI (GXFDWLRQDO 3URJUHVV 1$(3f .RUHW] /HZLV 6NHZHV&R[ t %XUVWHLQ /RQJIRUG f DQG WKH 7HVW RI (QJOLVK DV D )RUHLJQ /DQJXDJH 72()/f
PAGE 8

%HFDXVH RI WKH EDODQFHG GHVLJQ UHTXLUHPHQW LQ WKH GDWD VHW WR FRPSXWH FRHIILFLHQW DOSKD PLVVLQJ GDWD SUHVHQW D FKDOOHQJH ZKHQ VWDQGDUG PHWKRGV RI GDWD DQDO\VLV DUH XVHG ,Q WKH ODVW IHZ GHFDGHV D QXPEHU RI PLVVLQJ GDWD WUHDWPHQWV 0'7Vf KDYH EHHQ SURSRVHG VHH UHYLHZ LQ /LWWOH t 5XELQ f $ SURPLVLQJ 0'7 LV PXOWLSOH LPSXWDWLRQ 0,f ZKLFK ZDV RULJLQDOO\ SURSRVHG E\ 5XELQ f 0, LV D PRGHOEDVHG HVWLPDWLRQ WHFKQLTXH IRU DQDO\]LQJ GDWD ZLWK PLVVLQJ VFRUHV 5XELQ f 8VLQJ LQIRUPDWLRQ IURP WKH REVHUYHG SDUW RI WKH GDWD VHW 0, JHQHUDWHV N VHWV RI HTXDOO\ SODXVLEOH YDOXHV IURP WKH VLPXODWHG GLVWULEXWLRQ RI WKH PLVVLQJ GDWD WR UHSODFH WKH PLVVLQJ VFRUHV ZKHUH N LV JUHDWHU WKDQ RQH 7KH PLVVLQJ VFRUHV DUH LPSXWHG N WLPHV 5XELQ f $V D UHVXOW 0, FUHDWHV N YHUVLRQV RI FRPSOHWH GDWD VHWV ZLWK LPSXWHG YDOXHV (DFK FRPSOHWH GDWD VHW FDQ EH DQDO\]HG VHSDUDWHO\ E\ PHDQV RI VWDQGDUG FRPSOHWHFDVH DQDO\VLV PHWKRGV 7KH ILQDO DGMXVWHG SRLQW HVWLPDWH LV REWDLQHG E\ DYHUDJLQJ RYHU WKH N LQWHUPHGLDWH SDUDPHWHU HVWLPDWHV 0, KDV EHHQ VKRZQ WR \LHOG VDWLVIDFWRU\ SDUDPHWHU HVWLPDWHV ZLWK UHODWLYHO\ OLWWOH ELDV *UDKDP t 6FKDIHU f +RZHYHU 0, KDV QRW EHHQ XVHG ZLGHO\ LQ HGXFDWLRQDO VHWWLQJV H[FHSW IRU PDWUL[ VDPSOLQJ DQG VFDOLQJ SURFHGXUHV LQ WKH 1$(3 0LVOHY\ -RKQVRQ t 0XUDNL 1HDO t 1LDQFL f 6HYHUDO UHFHQW VWXGLHV FRPSDUHG GLIIHUHQW 0'7V LQ HVWLPDWLQJ UHOLDELOLW\ FRHIILFLHQWV RQ PHDVXUHV ZLWK PLVVLQJ GDWD 'RZQH\ t .LQJ +DUULVRQ 0DUFRXOLGHV f 'RZQH\ DQG .LQJ f FRPSDUHG WKH DFFXUDF\ RI FRHIILFLHQW DOSKD HVWLPDWLRQ XVLQJ LWHP PHDQ DQG SHUVRQ PHDQ VXEVWLWXWLRQ WR UHSODFH PLVVLQJ GDWD LQ /LNHUW VFDOHV 5HVXOWV LQGLFDWHG WKDW LWHPPHDQ VXEVWLWXWLRQ UHGXFHV WKH UHOLDELOLW\ HVWLPDWH ZKHUHDV SHUVRQPHDQ VXEVWLWXWLRQ LQFUHDVHV WKH UHOLDELOLW\ HVWLPDWH RI WKH VFDOH DV WKH QXPEHU RI PLVVLQJ LWHPV DQG WKH QXPEHU RI UHVSRQGHQWV ZLWK PLVVLQJ LWHPV

PAGE 9

LQFUHDVHV EH\RQG b 'RZQH\ t .LQJ f 0DUFRXOLGHV f FRPSDUHG WKH FRQVLVWHQF\ DQG HIILFLHQF\ RI WZR 0'7V UHVWULFWHG PD[LPXP OLNHOLKRRG DQG DQDO\VLV RI YDULDQFHf LQ HVWLPDWLQJ YDULDQFH FRPSRQHQWV RQ PHDVXUHV ZLWK PLVVLQJ GDWD +H IRXQG WKDW UHVWULFWHG PD[LPXP OLNHOLKRRG 5(0/f SURGXFHV D PRUH HIILFLHQW DQG OHVV ELDVHG YDULDQFH HVWLPDWH ZKHQ b RI WKH GDWD DUH UDQGRPO\ GHOHWHG 0DUFRXOLGHV f $ORQJ WKH VDPH OLQH RI UHVHDUFK +DUULVRQ f HYDOXDWHG VL[ 0'7V OLVWZLVH GHOHWLRQ ]HUR LPSXWDWLRQ VXEVWLWXWLQJ OHDVW VTXDUH $129$ VXEVWLWXWLQJ SUREDELOLWLHV RI FRUUHFW DQVZHUV IURP ORJLVWLF UHJUHVVLRQ HVWLPDWHV +R\WfV $129$ IRUPXOD DQG 5(0/f LQ HVWLPDWLQJ FRHIILFLHQW DOSKD RQ WHVWV ZLWK GLFKRWRPRXVO\VFRUHG LWHPV XQGHU WKH FRQGLWLRQV RI ILYH UDQGRP DQG QRQUDQGRP PLVVLQJ GDWD SDWWHUQV FURVVHG ZLWK WZR VDPSOH VL]HV DQG f 5HVXOWV VKRZHG WKDW 5(0/ SURYLGHV UHDVRQDEOH DFFXUDF\ DQG SUHFLVLRQ IRU WKH HVWLPDWLRQ RI FRHIILFLHQW DOSKD LQ DOO ILYH PLVVLQJ GDWD SDWWHUQV +DUULVRQ f 6WDWHPHQW RI WKH 3UREOHP 5HVXOWV RI +DUULVRQfV f VWXG\ LQGLFDWHG WKDW WKH DYHUDJH ELDV RI WKH FRHIILFLHQW DOSKD ZKHQ XVLQJ HDFK RI WKH VL[ 0'7V LV QHJOLJLEOH OHVV WKDQ f H[FHSW LQ WZR QRQUDQGRP PLVVLQJGDWD VLWXDWLRQV ZKHUH D OLVWZLVH GHOHWLRQ SURFHGXUH LV XVHG 2QH UHDVRQ IRU WKH VPDOO GLVFUHSDQF\ LQ WKH ELDV DPRQJ WKH VL[ 0'7V LV WKDW WKH PD[LPXP DPRXQW RI PLVVLQJ GDWD LQ +DUULVRQfV VWXG\ LV OHVV WKDQ b 5RWK f FLWHG VHYHUDO VLPXODWHG DQG HPSLULFDO 0'7 VWXGLHV LQGLFDWLQJ WKDW WKHUH LV OLWWOH GLIIHUHQFH LQ SDUDPHWHU HVWLPDWHV ZKHQ WKH DPRXQW RI PLVVLQJ GDWD LV OHVV WKDQ b UHJDUGOHVV RI WKH PLVVLQJ GDWD SDWWHUQV UDQGRP RU QRQUDQGRPf 5RWK f VXJJHVWHG WKDW WKH FKRLFH RI 0'7V

PAGE 10

EHFRPHV PRUH LPSRUWDQW ZKHQ WKH DPRXQW RI PLVVLQJ GDWD LQ D GDWD VHW LV EH\RQG b 7KHUHIRUH ZH VWLOO GR QRW NQRZ KRZ VRPH RI WKH 0'7V EHKDYH LQ VLWXDWLRQV ZLWK D PRGHUDWH DPRXQW RI PLVVLQJQHVV +DUULVRQ f IRXQG WKDW WKH PHDQ FRHIILFLHQW DOSKD SURGXFHG E\ 5(0/ LV PRUH SRVLWLYHO\ ELDVHG LH RYHUHVWLPDWHGf WKDQ WKDW FRPSXWHG E\ +R\WfV $129$ LQ VLWXDWLRQV ZKHUH ORZDELOLW\ H[DPLQHHV KDYH PRUH RPLWWHG LWHPV RU ZKHUH WKH\ WHQG WR RPLW WKH PRVW GLIILFXOW LWHPV 7KHUH DUH WZR SRVVLEOH UHDVRQV IRU 5(0/ QRW EHKDYLQJ ZHOO LQ +DUULVRQfV VWXG\ 2QH LV WKDW 5(0/ KDV EHHQ VKRZQ WR SURGXFH HVWLPDWHV WKDW DUH VLJQLILFDQWO\ ELDVHG LQ VLWXDWLRQV ZKHUH VDPSOH VL]H LV VPDOO 1 RU f EHFDXVH 5(0/ LV EDVHG RQ ODUJHVDPSOH WKHRU\ *URVV f 7KH RWKHU LV WKDW 5(0/ SURGXFHV ELDVHG HVWLPDWHV ZKHQ GDWD DUH PLVVLQJ QRQUDQGRPO\ -DPVKLGLDQ t %HQWOHU f 5DWLRQDOH IRU WKH 6WXG\ 7KH SUHVHQW VWXG\ DWWHPSWV WR DGGUHVV VRPH RI WKH OLPLWDWLRQV RI WKH 5(0/ HVWLPDWLRQ SURFHGXUH +DUULVRQ f E\ LPSOHPHQWLQJ 0, ZKLFK KDV EHHQ VKRZQ WR SHUIRUP ZHOO LQ VPDOO VDPSOH VL]HV *UDKDP t 6FKDIHU f DQG LQ QRQUDQGRP PLVVLQJGDWD VLWXDWLRQV *UDKDP +RIHU 'RQDOGVRQ 0DF.LQQRQ t 6FKDIHU f $OWKRXJK 0, LV FRPPRQO\ DSSOLHG WR PLVVLQJ FRQWLQXRXV GDWD LW PD\ DOVR EH DSSOLHG WR GLFKRWRPRXV PLVVLQJ GDWD *UDKDP HW DO f %HFDXVH +DUULVRQfV VWXG\ H[DPLQHG WKH HIIHFWLYHQHVV RI GLIIHUHQW 0'7V XQGHU VOLJKW OHYHOV RI PLVVLQJQHVV LW LV RI LQWHUHVW WR H[DPLQH WKH 0, XQGHU PRUH H[WUHPH OHYHOV RI PLVVLQJQHVV 7KH OHYHO RI PLVVLQJQHVV LV WKHUHIRUH VHW DV KLJK DV b $W WKLV UDQJH RI PLVVLQJQHVV LW ZRXOG EHFRPH PRUH REYLRXV KRZ ZHOO 0, SHUIRUPV

PAGE 11

,W LV ZHOO NQRZQ WKDW GDWD PLVVLQJ FRPSOHWHO\ DW UDQGRP 0&$5f VHOGRP RFFXUV LQ HGXFDWLRQDO VHWWLQJV .URPUH\ t +LQHV f 7KH SUHVHQW VWXG\ WKHUHIRUH IRFXVHV RQ QRQUDQGRP PLVVLQJ GDWD 3XUSRVH DQG 6LJQLILFDQFH RI WKH 6WXG\ 7KH SXUSRVH RI WKLV VWXG\ ZDV WR LQYHVWLJDWH YLD GDWD VLPXODWLRQ WKH DFFXUDF\ RI WKH FRHIILFLHQW DOSKD RQ WHVWV ZLWK PLVVLQJ GDWD UHSODFHG XVLQJ 0, 7KH SHUIRUPDQFH RI 0, ZDV HYDOXDWHG XQGHU WKH FRQGLWLRQV RI WKUHH VDPSOH VL]HV 1 RU f WHQ FRQGLWLRQV RI GLVWULEXWLRQ DQG SHUFHQW RI PLVVLQJQHVV DQG WZR RPLWWLQJ SDWWHUQV RPLWWLQJ LWHP UHVSRQVHV LQ WKH ERG\ DQG RPLWWLQJ UHVSRQVHV DW WKH HQG RI WKH WHVWf 7KH UHVXOWV RI WKLV VWXG\ SURYLGHG DQ LQGLFDWLRQ RI KRZ ZHOO 0, SHUIRUPHG LQ WKH DERYHVWDWHG PLVVLQJ GDWD FRQGLWLRQV XQGHU D VLQJOHIDFHW FURVVHG PRGHO

PAGE 12

&+$37(5 5(9,(: 2) 5(/$7(' /,7(5$785( 7KH ILUVW VHFWLRQ RI WKLV FKDSWHU SURYLGHV DQ RYHUYLHZ RI VRPH FRPPRQO\ XVHG PLVVLQJ GDWD WUHDWPHQWV 0'7Vf ZKLFK LQFOXGH OLVWZLVH GHOHWLRQ YDULDEOH PHDQ VXEVWLWXWLRQ UHJUHVVLRQ LPSXWDWLRQ DQG VWRFKDVWLF UHJUHVVLRQ LPSXWDWLRQ /LPLWDWLRQV RI WKHVH 0'7V DUH KLJKOLJKWHG 7KH VHFRQG VHFWLRQ LV GHYRWHG WR WKH GHYHORSPHQW DQG WKHRUHWLFDO IUDPHZRUN RI PXOWLSOH LPSXWDWLRQ 0,f WKH UHODWLRQVKLS RI 0, DQG %D\HVf WKHRUHP DVVXPSWLRQV DQG FKDUDFWHULVWLFV RI 0, WKH GHVFULSWLRQ RI WKH LPSXWDWLRQ PHWKRGV DQG SURFHGXUHV WR SHUIRUP 0, 7KH ODVW VHFWLRQ GLVFXVVHV WKUHH PDMRU W\SHV RI PLVVLQJ GDWD PHFKDQLVPV DV SURSRVHG E\ /LWWOH DQG 5XELQ f DQG LPSOLFDWLRQV RI HDFK PLVVLQJ GDWD PHFKDQLVP WR WKH DSSOLFDWLRQ RI 0, &RPPRQ 0LVVLQJ 'DWD 7UHDWPHQWV /LVWZLVH 'HOHWLRQ ,Q RUGHU WR WUDQVIRUP WKH PLVVLQJ GDWD PDWUL[ LQWR D UHFWDQJXODU RQH D FRPPRQ SUDFWLFH LV WR H[FOXGH WKRVH H[DPLQHHV ZKR GR QRW UHVSRQG WR DOO LWHPV 7KLV LV FDOOHG WKH OLVWZLVH GHOHWLRQ SURFHGXUH RU WKH FRPSOHWHFDVH DQDO\VLV 7KH FRPSOHWH GDWD ZLWK UHGXFHG VDPSOH VL]H DUH WKHQ XVHG WR HVWLPDWH SRSXODWLRQ SDUDPHWHUV VXFK DV WKH UHOLDELOLW\ FRHIILFLHQW /LVWZLVH GHOHWLRQ LV WKH GHIDXOW RSWLRQ IRU DQDO\VLV LQ PDQ\ SRSXODU VWDWLVWLFDO VRIWZDUH SDFNDJHV VXFK DV WKH 6WDWLVWLFDO $QDO\VLV 6\VWHP 6$6f DQG WKH 6WDWLVWLFDO 3DFNDJHV IRU WKH 6RFLDO 6FLHQFHV 6366;f (YHQ WKRXJK OLVWZLVH GHOHWLRQ LV

PAGE 13

WKH VLPSOHVW DSSURDFK WR KDQGOH PLVVLQJ GDWD LW LV E\ QR PHDQV WKH GHVLUDEOH RQH 6LQFH WKH DQDO\VLV LV EDVHG RQ RQO\ WKRVH H[DPLQHHV ZKR UHVSRQG WR DOO LWHPV D VXEVWDQWLDO DPRXQW RI XVHIXO GDWD LV ORVW ,Q D 0RQWH &DUOR LQYHVWLJDWLRQ .LP DQG &XUU\ f IRXQG WKDW HYHQ ZLWK b UDQGRP QRQUHVSRQVHV RQ HDFK RI WKH YDULDEOHV OLVWZLVH GHOHWLRQ UHVXOWV LQ UHWDLQLQJ RQO\ b RI WKH FDVHV 7KHUH LV DQ DFFRPSDQLHG ORVV RI HIILFLHQF\ RU VWDWLVWLFDO SRZHU LQ WKH HVWLPDWLRQ RI WKH SRSXODWLRQ SDUDPHWHUV HVSHFLDOO\ ZKHQ WKH DPRXQW RI PLVVLQJ GDWD LV KLJK 5DDLMPDNHUV f GHPRQVWUDWHG WKDW OLVWZLVH GHOHWLRQ UHVXOWV LQ D ORVV RI VWDWLVWLFDO SRZHU UDQJLQJ IURP b WR b DV WKH DPRXQW RI PLVVLQJ GDWD LQFUHDVHV IURP b WR b LQ YDULRXV /LNHUWW\SH GDWD /LVWZLVH GHOHWLRQ LV EDVHG RQ WKH DVVXPSWLRQ WKDW WKH GDWD DUH PLVVLQJ FRPSOHWHO\ DW UDQGRP HYHQ WKRXJK WKHUH LV OLWWOH HYLGHQFH WR VXSSRUW WKLV DVVXPSWLRQ LQ HGXFDWLRQDO UHVHDUFK .URPUH\ t +LQHV f :KHQ GDWD DUH QRW PLVVLQJ FRPSOHWHO\ DW UDQGRP HVWLPDWHV DUH ELDVHG (PSLULFDO GDWD VXSSRUW WKDW WKH DYHUDJH ELDV LQFUHDVHV DV WKH DPRXQW RI PLVVLQJ GDWD LQFUHDVHV +DUULVRQ f ,Q +DUULVRQfV f VWXG\ ZKHQ LWHP UHVSRQVHV DUH QRW PLVVLQJ DW UDQGRP OLVWZLVH GHOHWLRQ OHDGV WR UDQJH UHVWULFWLRQ 7KH UHVXOWLQJ PHDQ FRHIILFLHQW DOSKD LV WKHQ VXEVWDQWLDOO\ XQGHUHVWLPDWHG +DUULVRQ f 6LQJOH ,PSXWDWLRQ 3URFHGXUHV %HVLGHV XVLQJ GHOHWLRQ SURFHGXUHV WR KDQGOH PLVVLQJ GDWD VLQJOH LPSXWDWLRQ SURFHGXUHV KDYH DOVR EHHQ XVHG ZLGHO\ LQ HGXFDWLRQDO UHVHDUFK VHH UHYLHZ LQ 5D\PRQG f ,PSXWDWLRQ LQYROYHV ILOOLQJ LQ HDFK PLVVLQJ UHVSRQVH ZLWK D SODXVLEOH YDOXH DQG WKHQ DQDO\]LQJ WKH UHVXOWLQJ GDWD VHW ZLWK WKH LPSXWHG YDOXHV $OVR SODXVLEOH YDOXHV DUH HVWLPDWHG IURP REVHUYHG VFRUHV LQ WKH VWXG\ 7ZR PDMRU DGYDQWDJHV RI LPSXWDWLRQ SURFHGXUHV DUH DV IROORZV

PAGE 14

7KH\ UHWDLQ WKH LQIRUPDWLRQ IURP LQFRPSOHWH FDVHV ZLWKRXW GLVFDUGLQJ DQ\ VFRUHV 7KH UHVXOWLQJ GDWD VHW ZLWK WKH LPSXWHG YDOXHV FDQ EH DQDO\]HG E\ PHDQV RI VWDQGDUG FRPSOHWHFDVH DQDO\VLV PHWKRGV 7KH WKUHH VLQJOH LPSXWDWLRQ SURFHGXUHV WKDW DUH FRPPRQO\ XVHG LQ HGXFDWLRQDO UHVHDUFK DUH YDULDEOH PHDQ VXEVWLWXWLRQ UHJUHVVLRQ LPSXWDWLRQ DQG VWRFKDVWLF UHJUHVVLRQ LPSXWDWLRQ 5D\PRQG 5RWK f 9DULDEOH 0HDQ 6XEVWLWXWLRQ 7R LPSOHPHQW YDULDEOH PHDQ VXEVWLWXWLRQ RU XQFRQGLWLRQDO PHDQ LPSXWDWLRQ HDFK PLVVLQJ VFRUH RI D SDUWLFXODU LWHP LV UHSODFHG ZLWK LWV UHVSHFWLYH PHDQ YDOXH RI DOO QRQPLVVLQJ FDVHV (YHQ WKRXJK LW VHHPV WKDW WKH PHDQ LV D JRRG HVWLPDWH DQG WKH SURFHGXUH LV UHODWLYHO\ HDV\ WR LPSOHPHQW YDULDEOH PHDQ VXEVWLWXWLRQ KDV VHYHUDO VHULRXV GLVDGYDQWDJHV 7KH REVHUYHG YDULDQFH RI DQ LWHP ZLWK LPSXWHG PHDQ YDOXH LV V\VWHPDWLFDOO\ XQGHUHVWLPDWHG LH QHJDWLYHO\ ELDVHGf EHFDXVH LPSXWLQJ D PHDQ YDOXH LQ DQ LWHP LV HTXLYDOHQW WR DGGLQJ ]HUR WR WKH VXP RI WKH VTXDUHG GHYLDWLRQV ZKLFK LV WKH QXPHUDWRU RI WKH IRUPXOD IRU FDOFXODWLQJ YDULDQFH $W WKH VDPH WLPH WKHUH LV DQ LQFUHDVH LQ WKH GHQRPLQDWRU RI WKH YDULDQFH IRUPXOD 1 f DV WKH SURFHGXUH DWWHPSWV WR UHVWRUH WKH RULJLQDO VDPSOH VL]H /DQGHUPDQ /DQG t 3LHSHU 5D\PRQG f $WWHQXDWLRQ RI WKH PDJQLWXGH RI WKH FRYDULDQFH RU FRUUHODWLRQ RI VFRUHV ZLWK ILOOHGLQ PHDQ ZLWK VFRUHV LQ RWKHU LWHPV FDQ EH H[SODLQHG LQ D VLPLODU IDVKLRQ &RQFHSWXDOO\ WKH LPSXWHG YDOXHV DUH D FRQVWDQW DQG WKH\ DUH XQUHODWHG WR VFRUHV LQ RWKHU LWHPV WKHUHIRUH LQWHULWHP FRUUHODWLRQV DUH DWWHQXDWHG 'RZQH\ DQG .LQJ f VKRZHG WKDW WKH VHYHULW\ RI DWWHQXDWLRQ RQ FRUUHODWLRQ LQFUHDVHV DV WKH DPRXQW RI LPSXWHG YDOXHV LQFUHDVHV $V LQGLFDWHG LQ HTXDWLRQ f D UHGXFWLRQ LQ WKH DYHUDJH LQWHULWHP FRUUHODWLRQ

PAGE 15

UHVXOWV LQ D GHFUHDVH LQ WKH FRHIILFLHQW DOSKD 'RZQH\ t .LQJ f )LJXUH JUDSKLFDOO\ VKRZV WKH FRHIILFLHQW DOSKD VSXULRXVO\ GHFUHDVHV ZKHQ WKHUH LV D UHGXFWLRQ LQ WKH DYHUDJH LQWHULWHP FRUUHODWLRQ DW WKH ORZHU HQG LH U f $QRWKHU GLVDGYDQWDJH UHODWHG WR WKH DWWHQXDWLRQ RI YDULDELOLW\ RI WKH LWHP LV WKDW WKH VWDQGDUG HUURU RI HVWLPDWH LV PXFK WRR VPDOO UHVXOWLQJ LQ ELDVHG LQIHUHQFHV /LWWOH t 5XELQ f )LQDOO\ YDULDEOH PHDQ VXEVWLWXWLRQ GRHV QRW XVH LQIRUPDWLRQ IURP RWKHU LWHPV WR LPSURYH WKH DFFXUDF\ RI LPSXWDWLRQ /DQGHUPDQ HW DO f )LJXUH 5HODWLRQVKLS EHWZHHQ WKH FRHIILFLHQW DOSKD DQG WKH DYHUDJH LQWHULWHP FRUUHODWLRQ ZKHQ V HTXDOV WR

PAGE 16

5HJUHVVLRQ ,PSXWDWLRQ 5HJUHVVLRQ LPSXWDWLRQ RU FRQGLWLRQDO PHDQ VXEVWLWXWLRQ LV XVHG WR ILOO LQ PLVVLQJ VFRUHV RI DQ LWHP ZLWK YDOXHV SUHGLFWHG IURP D UHJUHVVLRQ PRGHO E\ XWLOL]LQJ LQIRUPDWLRQ IURP RQH RU PRUH KLJKO\ UHODWHG REVHUYHG YDULDEOHV RU SUHGLFWRUV :KHQ WKH UHVSRQVH YDULDEOH ZLWK PLVVLQJ VFRUHV LV GLFKRWRPRXV LQ QDWXUH D ORJLVWLF UHJUHVVLRQ PRGHO LV XVHG LQVWHDG 7KH ORJLVWLF UHJUHVVLRQ SURGXFHV D SUHGLFWHG SUREDELOLW\ RI D UHVSRQVH EHLQJ PLVVLQJ 6XSSRVH LV DQ 79 [ YHFWRU RI WKH UHVSRQVHV IRU H[DPLQHHV < \c \ff DQG LV FRPSRVHG RI D VHW RI WKH REVHUYHG VFRUHV < ^\c \Df DQG D VHW RI WKH PLVVLQJ VFRUHV


PAGE 17

HTXDWLRQ E\ UHJUHVVLQJ < RQ ; $IWHU HVWLPDWLQJ WKH UHJUHVVLRQ FRHIILFLHQWV IURP WKH REVHUYHG VFRUHV D SUHGLFWHG VFRUH IRU


PAGE 18

5HJUHVVLRQ LPSXWDWLRQ FRQYH\V D IDOVH VHQVH RI DFFXUDF\ WKDW DOO PLVVLQJ VFRUHV FDQ EH SUHGLFWHG IURP ;P ZLWKRXW HUURUV %\ WUHDWLQJ WKH LPSXWHG YDOXHV DV WKH NQRZQ REVHUYHG VFRUHV UHJUHVVLRQ LPSXWDWLRQ IDLOV WR DFFRXQW SURSHUO\ IRU WKH YDULDELOLW\ RU XQFHUWDLQW\ DERXW QRW NQRZLQJ WKH PLVVLQJ VFRUHV LH ZKLFK YDOXH WR LPSXWHf 5XELQ t 6FKHQNHU f )DLOXUH RI WKH UHJUHVVLRQ PRGHO WR LQFRUSRUDWH UHVLGXDO YDULDELOLW\ LQ WKH LPSXWDWLRQ YDULDQFH OHDGV WR VWDQGDUG HUURUV RI HVWLPDWHV ELDV WRZDUG ]HUR LH WRR VPDOOf /LWWOH t 5XELQ f )RU H[DPSOH %URZQVWRQH DQG 9DOOHWWD f IRXQG WKDW WKH OHDVW VTXDUHV VWDQGDUG HUURU HVWLPDWHV DUH b OHVV WKDQ WKHLU WUXH YDOXHV 6WRFKDVWLF 5HJUHVVLRQ ,PSXWDWLRQ ,Q RUGHU WR UHVWRUH WKH SUHGLFWLRQ HUURUV LQ WKH LPSXWHG YDOXHV LH WKH YDULDELOLW\ DURXQG WKH UHJUHVVLRQ OLQHf D UDQGRP UHVLGXDO HUURU LV DGGHG WR HDFK SUHGLFWHG YDOXH 7KH UDQGRP UHVLGXDO FDQ EH GUDZQ UDQGRPO\ ZLWK UHSODFHPHQW HLWKHU IURP D VWDQGDUG QRUPDO GLVWULEXWLRQ ZLWK D PHDQ HTXDO WR ]HUR DQG D VWDQGDUG GHYLDWLRQ HTXDO WR WKH VWDQGDUG HUURU RI HVWLPDWH IRU < %HDWRQ f RU IURP WKH GLVWULEXWLRQ RI UHVLGXDOV RI WKH UHJUHVVLRQ HVWLPDWH IRU < *UDKDP HW DO f 7KH SXUSRVH RI GUDZLQJ ZLWK UHSODFHPHQW LV WR HQVXUH WKDW HDFK GUDZQ YDOXH KDV HTXDO SUREDELOLW\ ,Q VWRFKDVWLF UHJUHVVLRQ LPSXWDWLRQ HDFK PLVVLQJ UHVSRQVH LV UHSODFHG E\ LWV FRQGLWLRQDO PHDQ SOXV D UDQGRP UHVLGXDO IURP < /LWWOH t 5XELQ f +RZHYHU VWRFKDVWLF UHJUHVVLRQ LPSXWDWLRQ UHVWRUHV RQO\ RQH SDUW RI WKH YDULDELOLW\ WKH HUURUV RI SUHGLFWLRQ 7KHUH LV DQRWKHU SDUW RI YDULDELOLW\ WKH VDPSOLQJ YDULDELOLW\ LQ ZKLFK WKH YDOXHV RI WKH HVWLPDWHG UHJUHVVLRQ FRHIILFLHQWV DUH XQFHUWDLQ *UDKDP DQG 6FKDIHU f H[SODLQHG WKDW WKH UHJUHVVLRQ OLQH HVWLPDWHG IURP
PAGE 19

UHJUHVVLRQ LPSXWDWLRQ FDQQRW UHIOHFW SURSHUO\ WKH VDPSOLQJ YDULDELOLW\ EHFDXVH LW ODFNV WKH YDULDWLRQ RI WKH LPSXWHG YDOXHV DPRQJ VHYHUDO VHWV RI LPSXWDWLRQV /LWWOH t 5XELQ f 7R LQFRUSRUDWH WKH VDPSOLQJ YDULDELOLW\ LQ WKH HVWLPDWLRQ RI WKH UHJUHVVLRQ SDUDPHWHUV PXOWLSOH LPSXWDWLRQ 0,f LV UHTXLUHG 5XELQ f 0XOWLSOH ,PSXWDWLRQ ,QWURGXFWLRQ 0XOWLSOH LPSXWDWLRQ ZDV RULJLQDOO\ SURSRVHG E\ 5XELQ f ,W LV D PRGHOEDVHG HVWLPDWLRQ WHFKQLTXH IRU DQDO\]LQJ GDWD ZLWK PLVVLQJ VFRUHV 5XELQ f 8VLQJ LQIRUPDWLRQ IURP WKH REVHUYHG SDUW RI WKH GDWD VHW 0, JHQHUDWHV N VHWV RI HTXDOO\ SODXVLEOH YDOXHV IURP WKH VLPXODWHG GLVWULEXWLRQ RI WKH PLVVLQJ GDWD WR UHSODFH WKH PLVVLQJ VFRUHV ZKHUH N LV JUHDWHU WKDQ RQH 7KH PLVVLQJ VFRUHV DUH LPSXWHG N WLPHV DQG PXOWLSOH LPSXWDWLRQV ZLWKLQ RQH PRGHO DUH FDOOHG UHSHWLWLRQV 5XELQ f $V D UHVXOW 0, FUHDWHV N YHUVLRQV RI FRPSOHWH GDWD VHWV ZLWK LPSXWHG YDOXHV (DFK FRPSOHWH GDWD VHW FDQ EH DQDO\]HG VHSDUDWHO\ E\ PHDQV RI VWDQGDUG FRPSOHWHFDVH DQDO\VLV PHWKRGV 7KH HVWLPDWH DQG LWV DVVRFLDWHG YDULDQFH IURP HDFK VHSDUDWH DQDO\VLV FDQ EH FRPELQHG WR IRUP DQ XQELDVHG ILQDO SDUDPHWHU HVWLPDWH XQGHU WKH FRUUHFWO\ VSHFLILHG PRGHO /LWWOH t 5XELQ f 7KH ILQDO YDULDQFH LQFRUSRUDWHV WKH YDULDELOLW\ ZLWKLQ WKH LPSXWDWLRQ LH WKH SUHGLFWLRQ HUURUf DQG WKH YDULDWLRQ RI WKH LPSXWHG YDOXHV DPRQJ N VHWV RI LPSXWDWLRQV LH WKH VDPSOLQJ YDULDELOLW\f WR UHIOHFW WKH WUXH DFFXUDF\ RI WKH HVWLPDWLRQ 7KHRUHWLFDO )UDPHZRUN /HW 4 GHQRWH D VFDODU SRSXODWLRQ TXDQWLW\ VXFK DV D FRHIILFLHQW DOSKD RU D UHJUHVVLRQ FRHIILFLHQWf WR EH HVWLPDWHG DQG OHW 4 4 <
PAGE 20

REVHUYHG DQG PLVVLQJ GDWD 0XOWLSOH LPSXWDWLRQ XVHV LQIRUPDWLRQ IURP WKH REVHUYHG VFRUHV
PAGE 21

%DVHG RQ HTXDWLRQ f WKH DFWXDO SRVWHULRU SUREDELOLW\ GLVWULEXWLRQ RI 4 DW D SDUWLFXODU YDOXH 4c FDQ EH REWDLQHG E\ GUDZLQJ DQ LQILQLWH QXPEHU RI UHSHDWHG LQGHSHQGHQW YDOXHV IRU


PAGE 22

7KHVH WZR VWHSV DUH UHSHDWHG N WLPHV WR \LHOG N VHWV RI LPSXWHG YDOXHV IRU WKH PLVVLQJ VFRUHV


PAGE 23

I?
PAGE 24

,Q DGGLWLRQ WKH YDULDEOHV LQ WKH GDWD VHW DUH DVVXPHG WR KDYH PXOWLYDULDWH QRUPDO GLVWULEXWLRQ 6LPXODWLRQ VWXGLHV *UDKDP +RIHU t 0F.LQQRQ :DQJ $QGHUVRQ t 3UHQWLFH f VXSSRUWHG WKDW WKH 0, HVWLPDWRU LV UREXVW HYHQ ZKHQ WKH GDWD PRGHO GHSDUWV IURP EHLQJ PXOWLYDULDWH QRUPDOO\ GLVWULEXWHG 3URSHU ,PSXWDWLRQ 0HWKRG $Q LPSXWDWLRQ PHWKRG LV UHJDUGHG DV SURSHU ZKHQ LW LQFRUSRUDWHV DSSURSULDWH YDULDELOLW\ LH XQFHUWDLQW\ DERXW WKH PLVVLQJ VFRUHV DQG WKH VDPSOLQJ YDULDELOLW\f LQ FUHDWLQJ PXOWLSO\ LPSXWHG YDOXHV XQGHU D FRUUHFWO\ VSHFLILHG PRGHO 5XELQ f 5XELQ f KDV VKRZQ WKDW RQH ZD\ WR DFKLHYH SURSHU LPSXWDWLRQ LV IRU WKH LPSXWDWLRQ SURFHGXUH WR IROORZ WKH %D\HVf WKHRUHP RI LQILQLWH LQGHSHQGHQW GUDZV RI


PAGE 25

/DUJHVDPSOH VL]H 1 f 5XELQ t 6FKHQNHU f $OO FDXVHV RI PLVVLQJQHVV DUH LQFOXGHG LQ WKH LPSXWDWLRQ PRGHO *UDKDP HW DO f ,PSXWDWLRQ 0HWKRGV 5XELQ DQG 6FKHQNHU f SURSRVHG WZR W\SHV RI LPSXWDWLRQ PHWKRGVf§LPSOLFLW DQG H[SOLFLW ,PSOLFLW RU QRQSDUDPHWHULF PHWKRGV DUH DSSOLFDEOH IRU GLVFUHWH GDWD DQG LQYROYH GUDZLQJ YDOXHV RQO\ IURP < DQG WKHQ DVVLJQLQJ WKHP WR


PAGE 26

'UDZ Q YDOXHV DW UDQGRP ZLWK UHSODFHPHQW IURP WKH Q SRVVLEOH YDOXHV WR FUHDWH D ERRWVWUDS VDPSOH GLVWULEXWLRQ VXFK DV D VFDOHG PXOWLQRPLDO GLVWULEXWLRQ 7KHQ LQGHSHQGHQWO\ GUDZ QP PLVVLQJ YDOXHV ZLWK UHSODFHPHQW IURP WKH ERRWVWUDS VDPSOH GLVWULEXWLRQ 5XELQ t 6FKHQNHU f 7KLV SURFHVV LV UHSHDWHG N WLPHV WR \LHOG N VHWV RI LPSXWHG YDOXHV DQG HDFK VHW RI LPSXWDWLRQV FRPHV IURP D GLIIHUHQW ERRWVWUDS VDPSOH RI < ([SOLFLW 0HWKRGV ([SOLFLW PHWKRGV GHILQH WKH PRGHO IRU WKH GLVWULEXWLRQ RI WKH UHVSRQVH YDULDEOH < HJ QRUPDO OLQHDU UHJUHVVLRQ PRGHO RU ORJLVWLF UHJUHVVLRQ PRGHOf DQG D VHW RI SUHGLFWRUV ; WKDW HQWHUV WKH PRGHO WR FUHDWH LPSXWDWLRQV /LWWOH 5XELQ t 6FKHQNHU f )XOO\ QRUPDO LPSXWDWLRQ 2QFH DJDLQ VXSSRVH < LV DQ 79 [ YHFWRU RI WKH UHVSRQVH IRU H[DPLQHHV < \c \ff DQG LV FRPSRVHG RI ERWK D VHW RI WKH REVHUYHG VFRUHV < L\L \f DQG D VHW RI WKH PLVVLQJ VFRUHV


PAGE 27

^QJf£ [J ZKHUH D LV WKH HVWLPDWHG YDULDQFH RI < DQG LV HTXDO WR 9 \ \%f DQG QFJf WL [LRJ GHQRWHV D FKLVTXDUH UDQGRP YDULDEOH ZLWK QJ GHJUHHV RI IUHHGRP 7R FUHDWH DQ LPSXWDWLRQ
PAGE 28

WKH PLVVLQJ GDWD


PAGE 29

IRU r N ZKHUH / LV D UDQGRPO\ GUDZQ YDULDWH IURP D FKLVTXDUHG GLVWULEXWLRQ ZLWK Q J GHJUHHV RI IUHHGRP 6XEVWLWXWH D <
PAGE 30

(DFK VHW RI SUHGLFWHG YDOXHV IRU

(4?<9DU 4 < ( 4 ? <
PAGE 31

PHWKRG 0, \LHOGV N LQWHUPHGLDWH SDUDPHWHU HVWLPDWHV 4 44Pf DQG N DVVRFLDWHG YDULDQFH HVWLPDWHV Wf IRU N 7KH ILQDO DGMXVWHG SRLQW HVWLPDWH 4 LV REWDLQHG E\ DYHUDJLQJ RYHU WKH N LQWHUPHGLDWH SDUDPHWHU HVWLPDWHV ZKLFK LV H L  Lf N cA? :KHUHDV WKH ILQDO HVWLPDWHG WRWDO YDULDQFH LV REWDLQHG E\ WKH VXP RI WKH DYHUDJH DVVRFLDWH YDULDQFH ZLWKLQ D VHW RI N LPSXWHG YDOXHV DQG WKH YDULDQFH DFURVV LQGHSHQGHQW VHWV RI LPSXWHG YDOXHV ZKLFK LV 7 8 ? Nanf% f ZKHUH 8 LV WKH DYHUDJH ZLWKLQLPSXWDWLRQ YDULDQFH ZLWKLQ D VHW RI N LPSXWHG YDOXHV DQG LV H[SUHVVHG DV f r DQG % LV WKH YDULDQFH DFURVV LQGHSHQGHQW VHWV RI LPSXWHG YDOXHV DQG LV H[SUHVVHG DV % 7AL4!4\ f %DFLN 0XUK\ DQG $QWKRQ\ f LQGLFDWHG WKDW WKH ZLWKLQLPSXWDWLRQ YDULDQFH LV D PHDVXUH RI WKH XQFHUWDLQW\ DERXW QRW NQRZLQJ WKH PLVVLQJ GDWD DQG WKH EHWZHHQ LPSXWDWLRQ YDULDQFH LV D PHDVXUH RI RUGLQDU\ VDPSOLQJ YDULDWLRQ 7KH LQIODWLRQ IDFWRU $If DFFRXQWV IRU WKH VLPXODWLRQ HUURUV LQ XVLQJ D ILQLWH QXPEHU RI LPSXWDWLRQV LH N RFf %DUQDUG t 0HQJ f 0XOWLSOH LPSXWDWLRQ FRUUHFWO\ DGMXVWV WKH VWDQGDUG HUURU RI HVWLPDWHV RI WKH SDUDPHWHU E\ LQFOXGLQJ ZLWKLQ DQG EHWZHHQLPSXWDWLRQ YDQ DQHHV

PAGE 32

:KHQ WKHUH DUH QR PLVVLQJ GDWD 4?4N DUH LGHQWLFDO DQG WKH EHWZHHQ LPSXWDWLRQ YDULDQFH % EHFRPHV ]HUR DQG 7 LV HTXDO WR 8 :KHQ N LH VLQJOH LPSXWDWLRQf % FDQQRW EH HVWLPDWHG 7 LV WKHQ HTXDO WR 8 DQG WKH YDULDQFH LV V\VWHPDWLFDOO\ XQGHUHVWLPDWHG +HLWMDQ t 5XELQ f $V N LQFUHDVHV ERWK 4 DQG 7 GHFUHDVH KHQFH UHVXOWLQJ LQ JUHDWHU SUHFLVLRQ RI VDPSOH VWDWLVWLFV /LWWOH t 6FKHQNHU f 7KH H[WHQW RI LQIOXHQFH RI PLVVLQJ GDWD RQ WKH HVWLPDWLRQ RI 4 LV GHWHUPLQHG E\ ERWK \DQG U 7KH IDFWRU U HVWLPDWHV WKH SURSRUWLRQDO LQFUHDVH LQ YDULDQFH GXH WR PLVVLQJ GDWD DQG FDQ EH H[SUHVVHG DV U ? Nanf%8 \O If f ZKHUH WKH UDWLR RI % WR 8 LV D UHIOHFWLRQ RI KRZ PXFK LQIRUPDWLRQ LQ WKH PLVVLQJ SDUW RI WKH GDWD UHODWLYH WR WKH REVHUYHG SDUW 6FKDIHU t 2OVHQ f DQG \LV DQ HVWLPDWH RI WKH IUDFWLRQ RI PLVVLQJ GDWD DERXW 4 /LWWOH t 6FKHQNHU f /LWWOH DQG 5XELQ f SRLQWHG RXW WKDW \LV HTXDO WR WKH IUDFWLRQ RI GDWD PLVVLQJ RQO\ ZKHQ WKH PLVVLQJ GDWD PHFKDQLVP LV PLVVLQJ FRPSOHWHO\ DW UDQGRP 8QFHUWDLQW\ 6LQFH WKH LPSXWHG YDOXHV DUH QRW WKH WUXH REVHUYHG VFRUHV 0, WDNHV LQWR DFFRXQW WKH XQFHUWDLQW\ DERXW WKH WUXH YDOXHV RI WKH PLVVLQJ VFRUHV LQ WKH SDUDPHWHU HVWLPDWHV E\ GUDZLQJ SDUDPHWHU r IURP WKH REVHUYHGGDWD SRVWHULRU GLVWULEXWLRQI^?
PAGE 33

,Q DGGLWLRQ WR LQFRUSRUDWLQJ WKH XQFHUWDLQW\ DERXW QRW NQRZLQJ WKH PLVVLQJ VFRUHV 0, DOVR WDNHV LQWR DFFRXQW WKH IDFW WKDW WKH SRSXODWLRQ GLVWULEXWLRQ RI


PAGE 34

LPSXWDWLRQV LW LQFUHDVHV WKH HIILFLHQF\ RI 4 WR b DQG b UHVSHFWLYHO\ $V VKRZQ LQ )LJXUH XQOHVV WKH IUDFWLRQ RI PLVVLQJ GDWD LV XQXVXDOO\ KLJK b RU PRUHf WKH HIILFLHQF\ JDLQHG E\ LPSOHPHQWLQJ N EH\RQG LV PLQLPDO 5XELQ DQG 6FKHQNHU f FRQFOXGHG WKDW RQO\ D IHZ QXPEHU RI UHSHWLWLRQV N f DUH QHHGHG WR SURGXFH SRLQW HVWLPDWHV WKDW DUH FORVH WR IXOO\ HIILFLHQW ZKHQ WKH DPRXQW RI PLVVLQJ GDWD LV PRGHUDWH HJ bf )LJXUH 3HUFHQW HIILFLHQF\ RI 0, HVWLPDWLRQ XVLQJ GLIIHUHQW QXPEHU RI LPSXWDWLRQV LQ WKUHH OHYHOV RI PLVVLQJQHVV b b DQG bf

PAGE 35

6HYHUDO HPSLULFDO VWXGLHV VXSSRUWHG WKDW VWDQGDUG HUURU HVWLPDWHV RI WKH SDUDPHWHUV ZHUH XQGHUHVWLPDWHG E\ b LQ VLQJOH LPSXWDWLRQ ZKHQ FRPSDUHG WR WKH RQHV LQ 0, &UDZIRUG 7HQQVWHGW t 0F.LQOD\ +HLWMDQ t 5XELQ /DQGHUPDQ HW DO f %DVHG RQ WKH UHVXOWV RI WZR 0RQWH &DUOR VLPXODWLRQ VWXGLHV /LWWOH t 5XELQ 5XELQ t 6FKHQNHU f 7DEOH VKRZV WKH FRPSDULVRQ RI WKH DFWXDO FRQILGHQFH LQWHUYDO &Of FRYHUDJH IRU 4 ZKHQ N RU ZLWK WKH QRPLQDO FRYHUDJH DW b b RU b 8QGHU WKH LJQRUDEOH UHVSRQVH DVVXPSWLRQ WKH IUDFWLRQ RI PLVVLQJ GDWD LQ WKHVH WZR ODUJHVDPSOH 1 f VLPXODWLRQ VWXGLHV ZDV b $V LQGLFDWHG ZKHQ N VLQJOH LPSXWDWLRQf WKH GLVFUHSDQF\ EHWZHHQ WKH DFWXDO DQG QRPLQDO FRYHUDJH UDQJHV IURP b ZKHUHDV ZKHQ N WKH GLVFUHSDQF\ LV RQO\ b :KHQ N WKHUH LV QR GLVFUHSDQF\ DW DOO ZKLFK PHDQV WKDW LQIHUHQFHV DUH YDOLG 7DEOH $QDO\WLF /DUJH6DPSOH 1! f &RYHUDJH LQ bf RI 6LQJOH N f DQG 0XOWLSOH N RU f ,PSXWDWLRQ 3URFHGXUH ZLWK 0LVVLQJ 'DWD (TXDOV b N 1RPLQDO &RYHUDJH b b b 1RWH $GDSWHG IURP /LWWOH DQG 5XELQ f DQG 5XELQ DQG 6FKHQNHU f

PAGE 36

5XELQ DQG 6FKHQNHU f GHPRQVWUDWHG WKDW N VKRXOG LQFUHDVH IURP WR DV WKH QRQUHVSRQVH UDWH LQFUHDVHV IURP b WR b LQ RUGHU WR DFKLHYH D VDWLVIDFWRU\ &O FRYHUDJH LH FORVH WR WKH QRPLQDO YDOXHf 5XELQ DQG 6FKHQNHU f DOVR SRLQWHG RXW WKDW LPSURYHPHQWV LQ WKH DFWXDO &O FRYHUDJH GLPLQLVK DV N LQFUHDVHV 7KH GLIIHUHQFHV LQ VWDQGDUG HUURUV RU &O FRYHUDJH EHWZHHQ N DQG N! KDYH EHHQ VKRZQ WR EH QHJOLJLEOH +HLWMDQ t 5XELQ :DQJ 6HGUDQVN t -LQQ f %DVHG RQ WKH OLWHUDWXUH WKH QXPEHU RI LPSXWDWLRQV GHSHQGV RQ 7KH DPRXQW RI PLVVLQJ LQIRUPDWLRQ $V WKH SHUFHQW RI PLVVLQJ GDWD LQFUHDVHV WKH DPRXQW RI XQFHUWDLQW\ DERXW WKH LPSXWHG YDOXHV LQFUHDVHV 7R DFFXUDWHO\ LQFRUSRUDWH WKLV XQFHUWDLQW\ LW UHTXLUHV DQ LQFUHDVH LQ WKH QXPEHU RI LPSXWDWLRQV 6LQFH LPSXWHG YDOXHV DUH DYHUDJHG RYHU N LPSXWDWLRQV WKH LPSXWDWLRQ YDULDQFH LV UHGXFHG DV WKH QXPEHU RI LPSXWDWLRQV LQFUHDVHV .DOWRQ t .DVSU]\N 5XELQ f 7KH W\SH RI PLVVLQJ GDWD PHFKDQLVPV %DVHG RQ VHYHUDO VLPXODWLRQ DQG HPSLULFDO VWXGLHV *O\QQ /DLUG DQG 5XELQ f DQG 5DJKXQDWKDQD DQG 6LVFRYLFN f GHPRQVWUDWHG WKDW QRQLJQRUDEOH QRQUHVSRQVH SDWWHUQV UHTXLUH D ODUJHU QXPEHU RI LPSXWDWLRQV N f WKDQ LJQRUDEOH QRQUHVSRQVH SDWWHUQV WR DFKLHYH D VDWLVIDFWRU\ &O FRYHUDJH $GYDQWDJHV RI 0XOWLSOH ,PSXWDWLRQ $V LQ VLQJOH LPSXWDWLRQ WKH UHVXOWLQJ GDWD VHW ZLWK WKH LPSXWHG YDOXHV FDQ EH DQDO\]HG E\ PHDQV RI VWDQGDUG FRPSOHWHFDVH DQDO\VLV PHWKRGV %HFDXVH 0, LQYROYHV DYHUDJLQJ RYHU N LQWHUPHGLDWH SDUDPHWHU HVWLPDWHV WKH ILQDO SRLQW HVWLPDWH GHULYHG IURP 0, LV PRUH HIILFLHQW WKDQ WKDW IURP VLQJOH LPSXWDWLRQ 5XELQ f 7KH ILQDO HVWLPDWHG YDULDQFH DOVR UHIOHFWV WKH WUXH YDULDQFH RI WKH SDUDPHWHU

PAGE 37

6WXGLHV DIILUPHG WKDW 0, SURGXFHV DFFXUDWH VWDQGDUG HUURUV LH HIILFLHQWf IRU SDUDPHWHU HVWLPDWHV DV LW FRUUHFWO\ DGMXVWV IRU QRQUHVSRQVH ELDV +HLWMDQ t /LWWOH 5XELQ t 6FKHQNHU ;LH t 3DLN f 7KH HVWLPDWHG DFWXDO &O FRYHUDJH LV FORVH WR WKH QRPLQDO OHYHOV /LWWOH t 5XELQ 5XELQ t 6FKHQNHU :DQJ HW DO f ZKLFK PHDQV 0, \LHOGV YDOLG LQIHUHQFHV 0, KDV EHHQ VKRZQ WR \LHOG VDWLVIDFWRU\ SDUDPHWHU HVWLPDWHV ZLWK UHODWLYHO\ OLWWOH ELDV HYHQ XQGHU WKH IROORZLQJ FRQGLWLRQV 6DPSOH VL]HV DUH VPDOO HJ f *UDKDP t 6FKDIHU f /LWWOH f UHFRPPHQGHG WR XVH 0, IRU VPDOO VDPSOHV DQG 0/( IRU ODUJH VDPSOHV 'DWD DUH PLVVLQJ LQ ODUJH DPRXQWV HJ bf *UDKDP t 6FKDIHU f 0RGHOV DUH UHODWLYHO\ ODUJH DQG FRPSOH[ HJ SUHGLFWRU PRGHOf *UDKDP t 6FKDIHU f ,JQRUDELOLW\ DVVXPSWLRQ LV VXVSHFW *UDKDP HW DO f 'DWD GLVWULEXWLRQ LV VNHZHG *UDKDP HW DO :DQJ HW DO f 0RGHO RI WKH GDWD GLVWULEXWLRQ LV PLVVSHFLILHG *UHHQODQG t )LQNOH f (PSLULFDO DQG VLPXODWLRQ VWXGLHV KDYH VKRZQ WKDW 0, LV IDU VXSHULRU WR GHOHWLRQ SURFHGXUHV PHDQ VXEVWLWXWLRQ UHJUHVVLRQ LPSXWDWLRQ &UDZIRUG HW DO *UDKDP HW DO f DQG VLPSOH KRW GHFN SURFHGXUH 'H&DQLR t :DWNLQV f ZLWK UHJDUG WR ELDV HIILFLHQF\ DQG YDOLGLW\ RI LQWHUYDO HVWLPDWHV ZKHQ WKH XQGHUO\LQJ 0, PRGHO VSHFLILFDWLRQ LV FRUUHFW

PAGE 38

/LPLWDWLRQV RI 0XOWLSOH ,PSXWDWLRQ 6LQFH WKH REVHUYHG VFRUHV < SURYLGH LQGLUHFW HYLGHQFH DERXW WKH OLNHO\ YDOXHV RI WKH PLVVLQJ RQHV


PAGE 39

GLVWULEXWLRQ GHSHQGLQJ RQ D SDUDPHWHU YHFWRU A IRU WKH QRQUHVSRQVH PRGHO ,I DQ H[DPLQHH UHVSRQGV WR DQ LWHP 5 LI DQ H[DPLQHH RPLWV DQ LWHP 5 6LQFH < <
PAGE 40

/LWWOH DQG 5XELQ f GLVWLQJXLVKHG WKUHH W\SHV RI PLVVLQJ GDWD PHFKDQLVPV PLVVLQJ FRPSOHWHO\ DW UDQGRP 0&$5f PLVVLQJ DW UDQGRP 0$5f DQG QRQLJQRUDEOH PLVVLQJ 1,0f 0LVVLQJ &RPSOHWHO\ DW 5DQGRP 0LVVLQJ GDWD PHFKDQLVP LV 0&$5 LI WKH SUREDELOLW\ RI WKH PLVVLQJ GDWD LQGLFDWRU "f LV LQGHSHQGHQW RI ERWK WKH REVHUYHG VFRUHV < DQG WKH PLVVLQJ VFRUHV


PAGE 41

I^< f UHSUHVHQWV D PRGHO IRU WKH FRQGLWLRQDO SUREDELOLW\ GLVWULEXWLRQ RI WKH REVHUYHG VFRUHV < 6LQFH < DQG 5 DUH LQGHSHQGHQW LQ HTXDWLRQ f WKH VDPSOLQJ GLVWULEXWLRQ RI WKH REVHUYHG VFRUHV LV D PDUJLQDO RI WKH FRPSOHWH GDWD GLVWULEXWLRQ /DLUG f 7KLV VLWXDWLRQ LPSOLHV WKDW VDPSOLQJEDVHG LQIHUHQFHV VXFK DV UHJUHVVLRQ LPSXWDWLRQ WKDW PDNH XVH RI WKH GLVWULEXWLRQDO SURSHUWLHV RI WKH PDUJLQDO GLVWULEXWLRQ RI WKH REVHUYHG VFRUHV DUH XQELDVHG DQG YDOLG +HLWMDQ f +RZHYHU 0&$5 PDNHV WKH VWURQJHVW DVVXPSWLRQ DPRQJ WKH WKUHH W\SHV RI PLVVLQJ GDWD PHFKDQLVPV /LWWOH f 7KH OLNHOLKRRG IXQFWLRQ RI WKH REVHUYHG VFRUHV XQGHU WKH 0&$5 DVVXPSWLRQ LQ HTXDWLRQ f FDQ EH IDFWRUL]HG LQWR WZR FRPSRQHQWV RQH SHUWDLQLQJ VROHO\ WR WKH VWUXFWXUDO SDUDPHWHU RI WKH PRGHO DQG WKH RWKHU SHUWDLQLQJ VROHO\ WR WKH QXLVDQFH SDUDPHWHU \RI WKH PLVVLQJ GDWD PHFKDQLVP /\?< ; 5f FF I^5 \fI< ? f f :KHQ WKH MRLQW SDUDPHWHU VSDFH RI \f LV WKH SURGXFW RI WKH SDUDPHWHU VSDFH RI HDFK VHSDUDWHO\ WKDW LV WKH WZR SDUDPHWHUV DQG \DUH LQGHSHQGHQW WKH OLNHOLKRRG RI $EDVHG RQ < /

observed scores f(

As in equation ([...]), the probability distribution of Y is the marginal probability distribution f(Y | X, θ)

Rubin showed that the response mechanism generating the missing data is ignorable for likelihood-based inferences if the parameter θ of the data model and the parameter γ associated with the missing data mechanism are independent or functionally unrelated, and the missing data are MAR. Conceptually, the missing values under the MAR assumption are a random sample from the complete data after conditioning on the measured variables X in the imputation model; therefore, the process of creating these missing values can be modeled using these variables (Barnard, Du, Hill, & Rubin). For example, the percent of missing responses in an item of an achievement test differs in groups of examinees with high, medium, and low cognitive-ability scores, and the scores of the cognitive ability for all examinees are known. Under the MAR assumption, the missing responses are randomly distributed within these three subgroups of examinees even when the responses are not missing at random across subgroups (Roth). In other words, the measured variables X (i.e., the cognitive ability in this example) can account for the differences in the distribution of Y between nonrespondents and respondents (Little & Schenker).

In addition to MAR, Roth identified another pattern of missingness when missing data are related to other variables. In this pattern, missing data are nonrandomly distributed across and within subgroups. For example, more scores are missing at the bottom range of the high cognitive-ability group, but relatively few scores are missing at the top range of the same group (Roth).

Accessible Missing Data Mechanism

Graham and Donaldson defined the missing data mechanism as "accessible" when the cause of missingness has been measured, whereas "ignorable" refers to a combination of accessibility and proper use of the cause of missingness for analysis. Graham, Hofer, and Piccinin explained that unless the cause of missingness is incorporated properly in the analysis, the mechanism will not be ignorable. Schafer pointed out that whether the missing data mechanism is ignorable is closely related to the fullness of the observed scores Y, the relevant variables X (i.e., causes of missingness), and the complexity of the data model f(Y | X, θ). If Y and X contain a lot of information for predicting


Nonignorable Missing

Under the NIM assumption, the conditional probability distribution of the missing data indicator R given X is a function of the missing scores


f(Y | R, φ) represents the distribution of Y conditioning on the missing data indicator R, f(R | π) represents the marginal distribution of the missing data indicator for whether or not Y is missing, and φ and π are the two unknown parameters corresponding to the two distributions. Selection models specify the precise form of the nonresponse model, whereas pattern-mixture models incorporate the assumption of the missing data mechanism through restrictions on the parameters (Little). When R is independent of Y (i.e., when φ = θ and ψ = π), the missing data mechanism becomes MCAR, and the selection models are equivalent to the pattern-mixture models.

The likelihood function on the observed scores under the NIM assumption includes a missing data indicator R and the missing data parameters ψ: L(θ, ψ | Y, X, R) ∝ f

full specification of the probability model with the joint distribution of Y, the nonresponse pattern R, and the measured variables X).

Sensitivity Analysis

Often, little is known about the nonresponse mechanism that creates the missing responses in a particular achievement test. Missing responses can arise from a variety of reasons, including a combination of ignorable and nonignorable mechanisms (Schafer & Olsen). However, distinguishing between ignorable and nonignorable mechanisms (i.e., MAR and NIM) relies on fundamentally untestable assumptions (Curran et al.). Curran and associates demonstrated that these assumptions cannot be tested formally from the empirical data at hand. Analyses should be conducted to compare the estimates across a number of plausible missing-data models. Inferences from the sensitivity analysis reveal uncertainty about reasons for nonresponse (Beaton; Little & Rubin). Sensitivity analysis can also be conducted across alternative imputing procedures in a similar manner to reveal uncertainty about different possible imputation models.

Under the NIM assumption, sensitivity analysis can be performed by comparing estimates between selection and pattern-mixture models. If the results are consistent, confidence about the conclusions is established. On the other hand, if the results depend on the form of the model, then more specific conditions can be suggested about where the conclusion can apply (Little).

Summary

Valid inferences from MI rely on the inclusion of a correct missing data mechanism. As discussed above, factorization of the posterior predictive probability distribution depends on whether the missing data mechanism is ignorable or not. When the missing data mechanism is not ignorable, the missing data indicator has to be incorporated into the posterior predictive probability distribution, and the likelihood function of θ is not just based on the observed scores

CHAPTER [...]
METHODOLOGY

This chapter first describes the data generation procedure, the design of the study, and the MI procedure. Data generation was based on the three-parameter logistic model (Hambleton & Swaminathan). The design of this study involved the distribution and percent of missing responses as a function of the ability of the examinees and the difficulty of the items in one omitting pattern, and the ability of the examinees and the sequence of the items in another omitting pattern. The procedure of MI based on a logistic regression model with a univariate Y was outlined. The second part of this chapter discusses how to evaluate the effectiveness of MI on handling missing data.

To achieve the goal of evaluating the effectiveness of MI on handling missing data, several steps were required.

Step 1: Simulated a complete data matrix of item responses for a specified number of examinees.
Step 2: Computed the coefficient alpha. The coefficient alpha of this original complete data (i.e., 0% of missing) served as a benchmark for later comparison.
Step 3: Nonrandomly deleted a certain percent of item responses from the complete examinee-by-item matrix generated in Step 1. Each missing data set was generated in a similar fashion.
Step 4: Replaced the omitted item responses in Step 3 using MI.
Step 5: Computed the coefficient alpha of the data set that was restored by MI.
Step 6: Compared the coefficient alpha from Step 2 with the one from Step 5.

Simulation Procedure

Let W be an N x P matrix representing a complete examinee-by-item data set. N is the number of examinees in the data set and P is the number of test items. In this study, P was fixed to be [...] in all conditions. A [...]-item test was used because it represented test lengths frequently found in educational and psychological applications. Item responses were generated from the three-parameter logistic model:

\[ P_i(\theta_j) = c_i + (1 - c_i)\,\frac{\exp[D a_i(\theta_j - b_i)]}{1 + \exp[D a_i(\theta_j - b_i)]}, \quad i = 1, \ldots, P \text{ and } j = 1, \ldots, N \]

where P_i(θ_j) is the probability of the j'th examinee with an ability θ_j answering the i'th item correctly, a_i is the discrimination parameter of item i, b_i is the difficulty parameter of item i, c_i is the pseudo-chance parameter of item i, θ_j is the ability of the j'th examinee, and D is a scaling factor, which is 1.7.
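The response probability defined by the three-parameter logistic model can be sketched in a few lines. This is only an illustrative sketch, not the program used in the study: the function name `p_correct` and the example parameter values are assumptions, and the logistic term is written in the algebraically equivalent form 1 / (1 + exp(-x)) for numerical stability.

```python
import math

D = 1.7  # scaling factor of the logistic model

def p_correct(theta, a, b, c):
    """3PL probability that an examinee of ability theta answers an item correctly.

    a: discrimination, b: difficulty, c: pseudo-chance parameter of the item.
    """
    # exp(x) / (1 + exp(x)) rewritten as 1 / (1 + exp(-x))
    logistic = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return c + (1.0 - c) * logistic

if __name__ == "__main__":
    # Hypothetical item (a=1.0, b=0.0, c=0.2) evaluated at three ability levels.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(p_correct(theta, a=1.0, b=0.0, c=0.2), 3))
```

When the ability equals the item difficulty, the logistic term is 0.5, so the probability is halfway between the pseudo-chance level c and 1.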

To compute P_i(θ_j), the three item parameters (i.e., a, b, and c) and the ability parameters (i.e., θ) had to be known. The three parameters were drawn from distributions of known mean and standard deviation. Harrison used the criteria of Oshima's study to sample the three parameters. The criteria for Oshima's study were as follows: the item discrimination parameters (i.e., a) were randomly drawn from a lognormal distribution with a mean of [...] and a standard deviation of [...]; the item difficulty parameters (i.e., b) were randomly drawn from a normal distribution with a mean of [...] and a standard deviation of [...]; and the pseudo-chance parameters (i.e., c) were randomly drawn from a normal distribution with a mean of [...] and a standard deviation of [...]. According to Oshima, the distributions of these three parameters were similar to the real data set of a speeded test (i.e., TOEFL) as reported by Way and Reese. The ability parameters θ for the examinees were randomly generated from a standard normal, N(0, 1), distribution.

The present study used the same values of the three item parameters as in Harrison's (p. [...]) study to generate the item responses (Table [...]). The correlation between the item difficulty parameters (b) and the item discrimination parameters (a) was [...] (p = [...], two-tailed), whereas the correlation between the item difficulty parameters (b) and the pseudo-chance parameters (c) was [...] (p = [...], two-tailed).

The response of an examinee with a given ability to a particular item was determined by computing the probability P_i(θ_j) of correctly answering that item based on the known item and ability parameters. Since the item responses were dichotomous in nature, the response probabilities P_i(θ_j) were converted into binary responses by comparing them to a random number r generated from a uniform distribution in the interval between 0 and 1. The random number r was used to determine whether the score of a particular item was correct or incorrect. If the response probability obtained from equation ([...]) was greater than the random number r, a 1 was assigned, which indicated the examinee's response to that particular item was correct; otherwise, a 0 was assigned, which indicated the examinee's response to that particular item was incorrect.

Kuder-Richardson Formula 20

Since the test items are scored dichotomously, Kuder-Richardson formula 20 (KR-20) was used to calculate the index of internal consistency for the test items. The Kuder-Richardson formula 20 is equivalent to coefficient alpha when the item responses are dichotomous in nature (Kuder & Richardson):

\[ \mathrm{KR\text{-}20} = \frac{s}{s - 1}\left(1 - \frac{\sum_{i=1}^{s} p_i q_i}{\sigma_x^2}\right) \]

where s is the number of items in the test, σ_x² is the variance of the test scores, p_i is the proportion of subjects answering item i correctly, q_i is the proportion of subjects answering item i incorrectly, and p_i q_i is the variance of scores on a single item i.
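The scoring rule and the KR-20 formula above can be illustrated with a minimal sketch. The function names `score_response` and `kr20` are hypothetical, and the test-score variance is computed with the population formula (dividing by the number of examinees) so that it matches the item variance p_i q_i; whether the study used this or the sample variance is not stated in the text.

```python
def score_response(p, r):
    """Return 1 (correct) if the response probability p exceeds the uniform draw r, else 0."""
    return 1 if p > r else 0

def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores.

    Assumes at least two items and a nonzero variance of the total scores.
    """
    n = len(responses)          # number of examinees
    s = len(responses[0])       # number of items
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for i in range(s):
        p = sum(row[i] for row in responses) / n  # proportion correct on item i
        sum_pq += p * (1.0 - p)                   # item variance p_i * q_i
    return (s / (s - 1)) * (1.0 - sum_pq / var_total)

if __name__ == "__main__":
    # Hypothetical 4-examinee, 3-item data set in which the items agree perfectly.
    data = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
    print(kr20(data))
```

With perfectly consistent items, as in the small example, the sum of the item variances is one third of the total-score variance and the coefficient reaches its maximum of 1.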

Table [...]. Item Parameters Used for Test Simulation
Columns: Item; Discrimination (a); Difficulty (b); Pseudo-chance (c)
Note. Adapted from Harrison.

Design of Study

This study represents a 3 x [(3 x 3) + 1] x 2 design with the factors sample size (3 levels), percent of examinees with missing items (3 levels), percent of items missing for each examinee with missing items (3 levels), and omitting pattern (2 levels), which were fully crossed. An additional condition with a disproportional percent of examinees missing items that were nonrandomly distributed across and within each ability group was included. The rationales for selecting the levels in each factor are described below.

Sample Size

The three levels of sample size chosen in this study were N = [...], [...], or [...]. The sample size of [...] examinees is typical for validation studies (Schmidt, Hunter, & Urry). The sample size of [...] was the same as the real data that Raghunathan and Siscovick used in studying the performance of MI. These three levels, representing small- to large-sample sizes, were also used by Graham and Schafer to evaluate the efficiency of MI in a simulation study. The present study adopted these three levels of sample size to allow comparison of the performance of MI with that of other MDTs investigated by Harrison.

Distribution and Percent of Missing Responses

In order to simulate a more realistic distribution and percent of nonresponse across test items, the distribution and percent of missing responses were based on the findings from a large-scale study: the Reading Comprehension subtest, Level [...], of the Comprehensive Test of Basic Skills, Form S (Cluxton & Mandeville). In Cluxton and Mandeville's study, they stratified one thousand third-grade students into three ability levels: high, medium, and low. They found the proportion of students with missing items within each stratified ability level was [...]% for the high-ability group, [...]% for the medium-ability group, and [...]% for the low-ability group. They also reported the proportion of missing items (out of the [...] items in the subtest) for students within each stratified ability level was approximately [...]% for the high-ability group, [...]% for the medium-ability group, and [...]% for the low-ability group. The correlation between the ability of students and the number of items missing in the body of the test was [...], and the correlation between the ability of students and the number of items missing at the end of the test was [...] (Cluxton & Mandeville).

Based on the range of the proportion of students with missing items within each ability level and the range of the proportion of items missing provided in Cluxton and Mandeville's study, the distribution and percent of missing responses in this study were constructed in four steps. First, the abilities of the examinees in a sample were rank ordered. Second, the examinees in each data set were stratified into three ability levels. Stratification was based on the assumption that the data were normally distributed, N(0, 1). Plus and minus one standard deviation in each sample were used as the cutoffs to stratify the three ability groups. As a result, approximately 68% of the examinees were within the one standard deviation band, and these students were classified as the medium-ability group. About 16% of the examinees were above one standard deviation, and these students were classified as the high-ability group, and about 16% of examinees were below one standard deviation, and these students were classified as the low-ability group.

Third, for the percent of examinees with missing items (%EMI), three conditions (%EMI1, %EMI2, and %EMI3) were constructed. In the first condition, %EMI1, the percent of the high-, medium-, and low-ability examinees missing some test items were [...]%, [...]%, and [...]%, respectively. In the second condition, %EMI2, the percent of the high-, medium-, and low-ability examinees missing some test items were [...]%, [...]%, and [...]%. In the third condition, %EMI3, the percent of the high-, medium-, and low-ability examinees missing some test items were [...]%, [...]%, and [...]%. The above three conditions respectively corresponded to the minimum, the median, and the maximum percent of examinees with missing items in each ability level provided by Cluxton and Mandeville.

Fourth, for the percent of items missing in those examinees with missing item responses (%IM), another three conditions (%IM1, %IM2, and %IM3) were constructed. The first condition, %IM1, was [...]% of the items missing in the high-ability group, [...]% of the items missing in the medium-ability group, and [...]% of the items missing in the low-ability group. The second condition, %IM2, was [...]% of the items missing in the high-ability group, [...]% of the items missing in the medium-ability group, and [...]% of the items missing in the low-ability group. The third condition, %IM3, was [...]% of the items missing in the high-ability group, [...]% of the items missing in the medium-ability group, and [...]% of the items missing in the low-ability group. The three conditions respectively corresponded to the minimum, the median, and the maximum percent of items missing in each ability level provided by Cluxton and Mandeville.

The above two sets of conditions were crossed to create nine missing conditions, as shown in Figure [...]. For example, the results of one combination were [...]% of the high-ability examinees with three missing items (i.e., [...]% of the test items), [...]% of the medium-ability examinees with eight missing items (i.e., [...]% of the test items), and [...]% of the low-ability examinees with ten missing items (i.e., [...]% of the test items). The distribution and percent of missing responses represented the typical range of missing data in educational tests, which is approximately [...]% (Roth).

Note. Max = Maximum, Med = Medium, and Min = Minimum.
Figure [...]. Distribution and percent of missing responses in the nine missing conditions.
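The stratification of examinees into three ability groups by plus and minus one standard deviation can be sketched as follows. This is an illustration under stated assumptions rather than the study's own code: the function name `stratify` is hypothetical, and the cutoffs are taken as the sample mean plus and minus one population standard deviation of the sampled abilities.

```python
import statistics

def stratify(abilities):
    """Label each ability as 'high', 'medium', or 'low' using mean +/- 1 SD cutoffs."""
    mean = statistics.fmean(abilities)
    sd = statistics.pstdev(abilities)
    labels = []
    for theta in abilities:
        if theta > mean + sd:
            labels.append("high")      # above one standard deviation
        elif theta < mean - sd:
            labels.append("low")       # below one standard deviation
        else:
            labels.append("medium")    # within the one standard deviation band
    return labels

if __name__ == "__main__":
    # Hypothetical abilities for five examinees.
    print(stratify([-2.0, -0.5, 0.0, 0.5, 2.0]))
```

For abilities drawn from a standard normal distribution, these cutoffs place roughly 68% of examinees in the medium group and about 16% in each of the other two, which matches the proportions described in the text.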

In addition to the nine missing conditions, an additional condition with a disproportional percent of examinees with omitted items that were nonrandomly distributed across and within each ability group was included (Roth). For example, more items were missing at the bottom range of the high-ability group, but relatively fewer items were missing at the top range of the same group (Roth). The procedure was to stratify each ability group (high, medium, and low) into three substrata. Stratification once again was based on plus and minus one standard deviation within each of the three ability groups. The percent of examinees who missed some items within each substratum ability group (%EMIs) was [...] (in the high-ability group), [...] (in the medium-ability group), and [...] (in the low-ability group). The corresponding percent of items missing within each ability substratum (%IMs) was [...] (for the high-ability group), [...] (for the medium-ability group), and [...] (for the low-ability group). The two situations were then crossed to form the tenth condition. Table [...] summarizes the distribution and percent of missing responses of the ten missing conditions.

Table [...]. Summary of the Distribution and Percent of Missing Responses of the Ten Missing Conditions

Condition: Description

%EMI1 x %IM1: [...]% of the high-ability examinees having one missing item (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having four missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having eight missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMI1 x %IM2: [...]% of the high-ability examinees having two missing items (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having six missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having nine missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMI1 x %IM3: [...]% of the high-ability examinees having three missing items (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having eight missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having ten missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMI2 x %IM1: [...]% of the high-ability examinees having one missing item (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having four missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having eight missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMI2 x %IM2: [...]% of the high-ability examinees having two missing items (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having six missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having nine missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMI2 x %IM3: [...]% of the high-ability examinees having three missing items (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having eight missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having ten missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMI3 x %IM1: [...]% of the high-ability examinees having one missing item (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having four missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having eight missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMI3 x %IM2: [...]% of the high-ability examinees having two missing items (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having six missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having nine missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMI3 x %IM3: [...]% of the high-ability examinees having three missing items (i.e., [...]% of the test items), plus [...]% of the medium-ability examinees having eight missing items (i.e., [...]% of the items), plus [...]% of the low-ability examinees having ten missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

%EMIs x %IMs: [...]% of the upper division of the high-ability examinees having one missing item (i.e., [...]% of the test items), plus [...]% of the middle division of the high-ability examinees having two missing items (i.e., [...]% of the items), plus [...]% of the lower division of the high-ability examinees having three missing items (i.e., [...]% of the items), plus [...]% of the upper division of the medium-ability examinees having four missing items (i.e., [...]% of the test items), plus [...]% of the middle division of the medium-ability examinees having six missing items (i.e., [...]% of the items), plus [...]% of the lower division of the medium-ability examinees having eight missing items (i.e., [...]% of the items), plus [...]% of the upper division of the low-ability examinees having eight missing items (i.e., [...]% of the test items), plus [...]% of the middle division of the low-ability examinees having nine missing items (i.e., [...]% of the items), plus [...]% of the lower division of the low-ability examinees having ten missing items (i.e., [...]% of the items). The total percent of missing data is approximately [...]%.

Omitting Pattern

The two nonrandom omitting patterns were (1) omitting item responses in the body of the test (OPB) and (2) omitting item responses at the end of the test (OPE) (i.e., nonreached). The first situation was similar to the missing data mechanism [...] in Harrison's study. However, in contrast to Harrison's study, where examinees with the lowest abilities missed the most difficult items, examinees with different levels of abilities missed the most difficult items differentially. That meant that the high-ability examinees missed fewer difficult items than the medium-ability examinees, and in turn the medium-ability examinees missed fewer difficult items than the low-ability examinees (see Figure [...]).

The nonreached pattern was similar to the combination of missing data mechanisms [...] and [...] in Harrison's study. However, the selection of missing responses was based on the examinees' ability instead of random selection as in the missing data mechanism [...] in Harrison's study. Once again, the high-ability examinees missed fewer items at the end of a test than the medium-ability examinees, and in turn the medium-ability examinees missed fewer items at the end of a test than the low-ability examinees (see Figure [...]).
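The two omitting patterns can be illustrated with a short sketch. The function names `omit_body` and `omit_end`, the use of `None` to mark an omitted response, and the example values are all assumptions for illustration; in the study, the number of omitted items k would depend on the examinee's ability group and the missing condition.

```python
def omit_body(row, difficulties, k):
    """OPB sketch: blank out the k most difficult items in an examinee's response row."""
    hardest = sorted(range(len(row)), key=lambda i: difficulties[i], reverse=True)[:k]
    return [None if i in hardest else score for i, score in enumerate(row)]

def omit_end(row, k):
    """OPE sketch: blank out the last k items (nonreached) in an examinee's response row."""
    if k <= 0:
        return list(row)
    return list(row[:-k]) + [None] * k

if __name__ == "__main__":
    responses = [1, 0, 1, 1]              # hypothetical scored responses
    difficulty = [0.1, 2.0, -1.0, 1.5]    # hypothetical item difficulty (b) values
    print(omit_body(responses, difficulty, 2))
    print(omit_end(responses, 2))
```

Under this reading, OPB ties missingness to item difficulty, while OPE ties it to item position; in both cases a low-ability examinee would receive a larger k than a high-ability examinee.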

Figure [...]. Illustration of the omitting pattern of missing item responses in the body of the test (OPB) with [...] examinees and [...] test items.

Figure [...]. Illustration of the omitting pattern of missing item responses at the end of the test (OPE) with [...] examinees and [...] test items. (Rows: persons ordered from high ability to lower ability; columns: items.)

Cause of Missingness

Under the omitting pattern in which item responses were omitted in the body of the test, the examinees' ability and the item difficulty provided indirect evidence about the likely values of the missing responses. On the other hand, when the item responses were omitted at the end of the test, the examinee's ability and the item effect, which is the random effect of the test items, provided indirect evidence about the likely values of the missing responses. Since the cause of missingness in this study was under the researcher's control, and the differential amount of missing responses was a function of the examinees' ability and item difficulty, or a function of the examinee's ability and item effect, depending on the omitting pattern, the missing data mechanism could be considered as missing at random (Graham et al.). The missing data mechanism was therefore ignorable.

Iterations

For each of the 60 conditions (3 levels of sample size, 10 levels of the distribution and percent of item response omission, and 2 levels of omitting pattern), one thousand iterations were performed to ensure stable results. One thousand iterations have been used in two previous simulation studies in the evaluation of the efficiency of MI (Glynn, Laird, & Rubin; Graham et al.). The iterations resulted in generating 20,000 repeated data sets for each level of sample size.

Multiple Imputation Procedure

Let Y be an N x 1 vector of measures with Y ~ N(Xβ, σ²), where the covariate term Xβ was a function of the parameters; X was a matrix of examinees' ability and item difficulty variables under the situation when the omitted item responses were in the body of the test, or examinees' ability and item effect variables under the situation when the omitted item responses were at the end of the test; and β was a vector of regression parameters to be estimated. The distribution of β was assumed to be multivariate normal. The algorithm for creating ten multiply imputed


The logistic regression model assumed that the logit of the posterior probability was a linear combination of the X_1j and X_2j variables:

\[ \operatorname{logit}(p_j) = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} \]

[...] distribution. Each set of predicted values Y_j was based on different sets of regression predictions and an independently drawn value of u. The probability of a correct response for respondent j in the k-th repetition, p_j^(k), was calculated with the randomly selected regression coefficients and the values of the corresponding covariates for respondent j from the logistic regression:

\[ p_j^{(k)} = \frac{\exp\left(\beta_0^{(k)} + \beta_1^{(k)} X_{1j} + \beta_2^{(k)} X_{2j}\right)}{1 + \exp\left(\beta_0^{(k)} + \beta_1^{(k)} X_{1j} + \beta_2^{(k)} X_{2j}\right)} \]

Step [...]: The estimated probability p_j^(k) from the logistic regression was compared to a random number u from the uniform [0, 1] distribution for each missing score. If the predicted probability p_j^(k) was less than u, then the imputed value for
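The imputation draw described in this step can be sketched as below. This is a minimal illustration and not the study's algorithm: the function names and coefficient values are hypothetical, and because the source sentence is cut off, the rule that a predicted probability below the uniform draw yields an imputed 0 (incorrect) and otherwise a 1 (correct) is an assumption.

```python
import math
import random

def predicted_probability(beta, x1, x2):
    """Logistic probability of a correct response given coefficients (b0, b1, b2)."""
    b0, b1, b2 = beta
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))

def impute(p, u):
    """Assumed rule: impute 0 when p is less than the uniform draw u, otherwise 1."""
    return 0 if p < u else 1

if __name__ == "__main__":
    rng = random.Random(2000)        # fixed seed so the sketch is repeatable
    beta_draw = (0.2, 1.1, -0.8)     # hypothetical coefficients for one repetition
    p = predicted_probability(beta_draw, x1=0.5, x2=1.0)
    print(p, impute(p, rng.random()))
```

Repeating the draw with a fresh set of coefficients for each of the ten imputations would give ten completed data sets, each differing in the imputed responses.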

unbiased when the average deviation (i.e., bias) between the coefficient alpha obtained from the imputed values and that of the original values in the data set is close to 0.

RMSE is defined as the square root of the average squared difference between the coefficient alpha derived from the original complete data set with no missing data and the coefficient alpha from the corresponding imputed data set. The estimated coefficient alpha is precise when the RMSE is close to 0.

\[ \mathrm{RMSE} = \sqrt{\frac{\sum \left(\alpha_{\text{original data}} - \alpha_{\text{restored data using MI}}\right)^2}{1000}} \]

The relationship between RMSE and bias is

\[ (\mathrm{RMSE})^2 = (\mathrm{bias})^2 + (\mathrm{SE})^2 \]

where RMSE is the root mean square error, which represents an overall error; bias is the average deviation, which represents a systematic error; and SE is the standard error, which represents random error.
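The two performance criteria and the relationship between them can be checked numerically with a small sketch. The function name `performance` and the alpha values are hypothetical; bias is taken as imputed minus original, and SE is computed as the population standard deviation of the differences, which is the definition under which the decomposition holds exactly.

```python
import math

def performance(alpha_original, alpha_imputed):
    """Return (bias, rmse, se) of imputed coefficient alphas against the originals."""
    diffs = [imp - orig for orig, imp in zip(alpha_original, alpha_imputed)]
    n = len(diffs)
    bias = sum(diffs) / n                                    # systematic error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)          # overall error
    se = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n)  # random error
    return bias, rmse, se

if __name__ == "__main__":
    # Hypothetical coefficient alphas from three iterations.
    original = [0.80, 0.80, 0.80]
    imputed = [0.82, 0.79, 0.85]
    bias, rmse, se = performance(original, imputed)
    print(bias, rmse, se)
    print(abs(rmse ** 2 - (bias ** 2 + se ** 2)) < 1e-12)
```

A positive bias in this sketch corresponds to an overestimated coefficient alpha, and a small RMSE requires both a small bias and a small spread of the differences.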

CHAPTER [...]
RESULTS

In this chapter, results of the analyses of the data for the two performance criteria are presented. The two criteria are the bias and the root mean square error (RMSE).

The mean coefficient alpha and its standard deviation of the original complete data set with no missing data for [...], [...], and [...] examinees were M = [...], SD = [...]; M = [...], SD = [...]; and M = [...], SD = [...], respectively. Each mean coefficient alpha was based on the average of 20,000 (10 missing conditions x 2 omitting patterns x 1,000 iterations) values. These mean coefficient alphas were very close to those computed by Harrison. For example, the mean coefficient alpha for the sample size of [...] in Harrison's study was [...]. The mean coefficient alphas and their standard deviations of the restored completed data set using MI for the ten missing conditions in each of the three levels of sample size and two levels of omitting pattern are shown in Figures [...] to [...].

The biases obtained in each of the ten missing conditions for the two omitting patterns are summarized in Tables [...] and [...]. Under the omitting pattern where missing responses are at the end of the test (OPE), the biases (in absolute value) ranged from [...] to [...]. The majority ([...]%) of the biases in OPE were in the magnitude of less than [...]. The biases (in absolute value) obtained in OPE for the sample sizes [...], [...], and [...] ranged from [...] to [...], [...] to [...], and [...] to [...], respectively. The biases obtained in the omitting pattern where missing responses are in the body of the test (OPB) were noticeably higher than those in OPE of the corresponding missing conditions; the biases (in absolute value) ranged from [...] to [...]. The majority ([...]%) of the biases in OPB were less than [...]. The biases (in absolute value) obtained in OPB for the sample sizes [...], [...], and [...] ranged from [...] to [...], [...] to [...], and [...] to [...], respectively. As expected, the largest bias was in the missing condition %EMI3 x %IM3, where the small sample size ([...]) was accompanied by the largest percent ([...]%) of missingness. The bias in this condition was [...].

The coefficient alphas in OPB were always overestimated (positively biased), whereas in OPE, about half of the coefficient alphas obtained through MI were overestimated and the other half were underestimated. Whether MI produced coefficient alphas that were overestimated or underestimated in OPE did not depend on the percent of missingness. Further research needs to be conducted to explore why some of the coefficient alphas obtained from MI were overestimated while others were underestimated under the same omitting pattern.

For condition %EMIs x %IMs, in which the nonrandom distribution of omission is across and within each ability group (Roth), the bias obtained in this condition, regardless of the omitting pattern, was similar to other missing conditions where the percent of missingness was about the same (e.g., missing condition %EMI[...] x %IM[...]).

The RMSEs obtained in each of the ten missing conditions for the two omitting patterns are summarized in Tables [...] and [...]. The results were very similar to those obtained for the bias.

In general, the bias (in absolute value) or the RMSE increased as the amount of missingness increased. Graham and Schafer explained this phenomenon by suggesting that MI introduced bias when handling missing data. However, the pattern of increment in the bias or the RMSE was not unidirectional, as indicated in Tables [...] to [...]. There were irregularities in the magnitude of the bias or the RMSE across the ten missing conditions within each sample size. That means that in some missing conditions, the magnitude of the bias or the RMSE for the smaller amount of missingness was bigger than that of the larger amount of missingness, even though both conditions had the same sample size. This kind of irregularity was also noticed in Graham and Schafer's simulation study.

Another general pattern revealed in this study was that the bias decreased as the sample size increased. Once again, the pattern of change in the bias was not unidirectional, as indicated in Tables [...] and [...]. There were irregularities across the three sample sizes, and this kind of irregularity was also noticed in Graham and Schafer's study. On the other hand, the magnitude of the RMSE in OPE, but not in OPB, showed a clear pattern of decrement as the sample size increased (see Tables [...] and [...]).

Figure [...]. The mean coefficient alpha in OPB with sample size of [...]. (Axes: coefficient alpha by missing condition.)

Figure [...]. The mean coefficient alpha in OPE with sample size of [...]. (Axes: coefficient alpha by missing condition.)

Figure [...]. The mean coefficient alpha in OPB with sample size of [...]. (Axes: coefficient alpha by missing condition.)

Figure [...]. The mean coefficient alpha in OPE with sample size of [...]. (Axes: coefficient alpha by missing condition.)

Figure [...]. The mean coefficient alpha in OPB with sample size of [...]. (Axes: coefficient alpha by missing condition.)

Figure [...]. The mean coefficient alpha in OPE with sample size of [...]. (Axes: coefficient alpha by missing condition.)

Table [...]. Bias for the Coefficient Alpha in Omitting Pattern Where Missing Responses are in the Body of the Test
Columns: Missing Condition; Approx. % of Missingness; Mean (SD) for each of the three sample sizes. Rows: the ten missing conditions.

Table [...]. Bias for the Coefficient Alpha in Omitting Pattern Where Missing Responses are at the End of the Test
Columns: Missing Condition; Approx. % of Missingness; Mean (SD) for each of the three sample sizes. Rows: the ten missing conditions.

Table [...]. RMSE for the Coefficient Alpha in Omitting Pattern Where Missing Responses are in the Body of the Test
Columns: Missing Condition; Approx. % of Missingness; Mean (SD) for each of the three sample sizes. Rows: the ten missing conditions.

Table [...]. RMSE for the Coefficient Alpha in Omitting Pattern Where Missing Responses are at the End of the Test
Columns: Missing Condition; Approx. % of Missingness; Mean (SD) for each of the three sample sizes. Rows: the ten missing conditions.

PAGE 84

CHAPTER […]
DISCUSSION

Because there was no substantial bias in any of the missing conditions, the results of this simulation study indicated that MI is a reasonably good procedure to replace the missing data in a single-facet crossed model in which missing responses are either in the body of the test or at the end of the test. The majority of the biases obtained were less than […], and the magnitude was comparable to those obtained in Harrison's ([…]) study. The most significant difference was that the amount of missingness in the present study was two to three times more than that used in Harrison's study, and the omitting patterns were nonignorable.

The present study used the examinee's ability θ and item difficulty b as the predictors in the logistic regression when the missing responses were in the body of the test, and the examinee's ability θ and item effect i as the predictors when the missing responses were at the end of the test. The predictors used in the present study differed from the ones used by Harrison ([…]). Harrison ([…]) used examinee effect j and item effect i as the predictors. Results of using j and i as the predictors in the present study indicated that the biases for the coefficient alpha were unacceptably higher than those obtained using θ and b or θ and i. For example, in the missing condition […]%EM x […]%IM with a sample size of […], the bias obtained using j and i as the predictors was […] when missing responses were in the body of the test and […] when missing responses were at the end of the test. This illustrated one of the limitations of MI, as mentioned in Chapter […], namely that inference based on MI will be biased when relevant predictors are not incorporated (Schafer, […]; Schafer & Olsen, […]).

An attempt to include more predictors (i.e., examinee's ability θ, item difficulty b, examinee effect j, and item effect i) in the logistic model did not help to reduce the bias. For example, the bias obtained using θ, b, j, and i as the predictors was […] in the missing condition […]%EM x […]%IM when the omitting pattern was OPB and the sample size was […]. This illustration affirmed Rubin's ([…]) suggestion that extra variables did not affect the bias in the inferences. Further systematic studies need to be conducted to support Rubin's claim regarding the relationship between the bias and the number of predictors.

Another possible factor that may affect the accuracy of the obtained coefficient alphas was the extreme value in some of the item discrimination parameters (e.g., a = […]). Unfortunately, a simpler model, such as a one-parameter Rasch model with fixed item discrimination and pseudo-chance parameters, did not help to reduce the bias. The bias for the above missing condition (i.e., […]%EM x […]%IM) was still in the magnitude of […] when using θ and b as the predictors.

When comparing the biases obtained from the two omitting patterns, it is suggested that the examinee's ability, rather than item effect or item parameters, may contribute more to the accuracy of the parameter estimation. Further systematic investigation is warranted.

Finally, a surprising finding was obtained when using listwise deletion to estimate the coefficient alpha in the above missing condition (i.e., […]%EM x […]%IM): the bias was […]. The bias (in absolute value) was much smaller than those obtained in Harrison's study ([…]), even though the amount of missingness was three times greater. The bias obtained from the nonrandom missing conditions (with […]% of missingness) in Harrison's study was about […]. This surprising finding may have something to do with the idiosyncratic nature of the missing mechanism in this study. Further research needs to investigate this issue systematically.

Limitations

The present study used examinee's ability and item difficulty as the predictors in OPB and used examinee's ability and item effect as the predictors in OPE. However, in real-life testing situations, the ability parameter θ and item parameter b need to be estimated first. Accurate estimation of these two parameters may not be possible in situations with a substantial amount of missing data (Bradlow & Thomas, […]). Another limitation is that one may not be sure of the mechanism producing the missing data.

Suggestions for Future Research

The present study only illustrated one way of using MI to analyze the data. It is important to perform a sensitivity analysis to compare the results obtained in the present study with those obtained when the nonresponse model is treated as nonignorable. A comparison of the coefficient alpha obtained using the selection approach model versus the pattern mixture model certainly would be informative.

The bias obtained in the present study, as well as in Graham and Schafer's ([…]) study, was not a linear function of the amount of missingness or the sample size. However, no good explanation can be given based on the limited information provided in the present study or in Graham and Schafer's ([…]) study. This may be an important issue for further investigation.

Because of the positively skewed distribution of the biases in OPB and the lower-bound nature of the coefficient alpha, it may be worthwhile to investigate using the median instead of the mean to compute the final adjusted alpha.

This study illustrated two of the most commonly encountered omitting patterns: missing responses in the body of the test and at the end of the test. In most real-life educational tests, the types of omitting patterns are much more complicated, and the missing responses, as suggested by Schafer and Olsen ([…]), can arise from a variety of reasons, including a combination of ignorable and nonignorable mechanisms. Systematic investigation of the effectiveness of different MDTs, especially RMLE and MI, under conditions involving a combination of ignorable and nonignorable mechanisms is important for examining different kinds of missing responses.

In Chapter […], several methods have been described to create the posterior predictive probability distribution from […] can be an extension of the present study to test the effectiveness of MI in handling missing data in a more complicated situation.
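The suggestion in this section to pool with the median rather than the mean can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the final alpha to be a simple summary of the alphas computed on each imputed data set, and the function name `pool_alphas` and the five example values are hypothetical, not results from the study.

```python
from statistics import mean, median


def pool_alphas(alphas):
    """Summarize per-imputation alpha estimates by their mean and median.

    `alphas` holds one coefficient alpha per imputed data set.
    """
    return {"mean": mean(alphas), "median": median(alphas)}


# Hypothetical alphas from m = 5 imputed data sets; one value is unusually
# high, which gives the set a positive skew (illustrative numbers only).
imputed_alphas = [0.80, 0.81, 0.82, 0.83, 0.95]
pooled = pool_alphas(imputed_alphas)
print(pooled)
```

With a positively skewed set of estimates, the mean is pulled toward the extreme value while the median is not, which is the motivation for comparing the two summaries.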

REFERENCES

Angoff, W. H., & Schrader, W. B. ([…]). A study of hypotheses basic to the use of rights and formula scores. Journal of Educational Measurement.

Bacik, M., Murphy, S. A., & Anthony, C. ([…]). Drug use prevention data, missed assessments and survival analysis. Multivariate Behavioral Research.

Barnard, Du, T., Hill, L., & Rubin, B. ([…]). A broader template for analyzing broken randomized experiments. Sociological Methods and Research.

Barnard & Meng, X. L. ([…]). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research.

Beaton, A. E. ([…]). Missing scores in survey research. In P. Keeves (Ed.), Educational research, methodology, and measurement: An international handbook (2nd ed., pp. […]). New […]

Crocker, L., & Algina ([…]). Introduction to classical and modern test theory. New […]

Graham, J. W., Hofer, S. M., & McKinnon, P. ([…]). Maximizing the usefulness of data obtained with planning missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research.

Graham, J. W., Hofer, S. M., & Piccinin, A. M. ([…]). Analysis with missing data in drug prevention research. In L. M. Collins & L. A. Seitz (Eds.), Advances in data analysis for prevention intervention research (NIDA Research Monograph, pp. […]). Washington, DC: National Institute on Drug Abuse.

Graham, J. W., & Schafer, L. ([…]). On the performance of multiple imputation for multivariate data with small sample size. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. […]). Thousand Oaks, CA: Sage.

Greenland, S., & Finkle, W. ([…]). A critical look at methods for handling missing covariates in epidemiologic regression analyses. American Journal of Epidemiology.

Gross, A. L. ([…]). Interval estimation of bivariate correlations with missing data on both variables: A Bayesian approach. Journal of Educational and Behavioral Statistics.

Hambleton, R., & Swaminathan, H. ([…]). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.

Harrison, M. ([…]). A comparison of strategies for estimating internal consistency on tests with missing data. Unpublished master's thesis, University of Florida, Gainesville.

Heitjan, F. ([…]). Annotation: What can be done about missing data? Approaches to imputation. American Journal of Public Health.

Heitjan, F., & Little, R. A. ([…]). Multiple imputation for the fatal accident reporting system. Applied Statistics.

Heitjan, F., & Rubin, B. ([…]). Inference from coarse data via multiple imputation with application to age heaping. Journal of the American Statistical Association.

Isaacson & Smith ([…]). Hosting a mathematics tournament for two-year college students. (ERIC Document Reproduction Service No. ED […])

Jamshidian, M., & Bentler, P. M. ([…]). ML estimation of mean and covariance structures with missing data using complete data routines. Journal of Educational and Behavioral Statistics.

Kalton & Kasprzyk ([…]). The treatment of missing survey data. Survey Methodology.

Kim, O., & Curry ([…]). The treatment of missing data in multivariate analysis. Sociological Methods and Research.

Koretz, Lewis, E., Skewes-Cox, T., & Burstein, L. ([…]). Omitted and non-reached items in mathematics in the National Assessment of Educational Progress. (ERIC Document Reproduction Service No. ED […])

Kromrey & Hines, C. V. ([…]). Nonrandomly missing data in multiple regression: An empirical comparison of common missing-data treatments. Educational and Psychological Measurement.

Kuder, F., & Richardson, M. W. ([…]). The theory of the estimation of test reliability. Psychometrika.

Laird, N. M. ([…]). Missing data in longitudinal studies. Statistics in Medicine.

Landerman, L. R., Land, C., & Pieper, C. F. ([…]). An empirical evaluation of the predictive mean matching method for imputing missing values. Sociological Methods and Research.

Little, R. A. ([…]). Regression with missing X's: A review. Journal of the American Statistical Association.

Little, R. A. ([…]). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association.

Little, R. A., & Rubin, B. ([…]). Statistical analysis with missing data. New […]

Michiels, B., & Molenberghs ([…]). Protective estimation of longitudinal categorical data with nonrandom dropout. Communication in Statistics - Theory and Method.

Mislevy, R., Johnson, E., & Muraki, E. ([…]). Scaling procedures in NAEP. Journal of Educational Statistics.

Neal, T., & Nianci ([…]). Generating multiple imputations for matrix sampling data analyzed with item response models. Journal of Educational and Behavioral Statistics.

Oshima, T. C. ([…]). The effect of speededness on parameter estimation in item response theory. Journal of Educational Measurement.

Peterson, R. A. ([…]). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research.

Pollard, W. E. ([…]). Bayesian statistics for evaluation research: An introduction. Beverly Hills, CA: Sage.

Raaijmakers, Q. A. ([…]). Effectiveness of different missing data treatments in surveys with Likert-type data: Introducing the relative mean substitution approach. Educational and Psychological Measurement.

Raghunathan, E., & Siscovick, S. ([…]). A multiple-imputation analysis of a case-control study of the risk of primary cardiac arrest among pharmacologically treated hypertensives. Applied Statistics.

Raymond, M. R. ([…]). Missing data in evaluation research. Evaluation and the Health Professions.

Roth, P. L. ([…]). Missing data: A conceptual review for applied psychologists. Personnel Psychology.

Rubin, B. ([…]). Inference and missing data. Biometrika.

Rubin, B. ([…]). Multiple imputation for nonresponse in surveys. New […]

Rubin, B., & Schenker, N. ([…]). Multiple imputation in health-care databases: An overview and some applications. Statistics in Medicine.

Schafer, L. ([…]). Analysis of incomplete multivariate data. New […]

BIOGRAPHIC SKETCH

Hon Keung […]

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

M. David Miller, Chair
Professor of Educational Psychology

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Anne E. Seraphine
Assistant Professor of Educational Psychology

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

[…]
Professor of Educational Leadership, Policy, and Foundations

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Kay Walker
Professor of Occupational Therapy


This dissertation was submitted to the Graduate Faculty of the College of Education and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 2000

Chairman of Educational Psychology

Dean, College of Education

Dean, Graduate School
