Citation
Type I Error Rates for the Kenward-Roger Adjusted Degree of Freedom F-Test for a Split-Plot Design with Missing Values

Material Information

Title:
Type I Error Rates for the Kenward-Roger Adjusted Degree of Freedom F-Test for a Split-Plot Design with Missing Values
Creator:
PADILLA, MIGUEL A. ( Author, Primary )
Copyright Date:
2008

Subjects

Subjects / Keywords:
Covariance ( jstor )
False positive errors ( jstor )
Group size ( jstor )
Matrices ( jstor )
Missing data ( jstor )
Modeling ( jstor )
Sample size ( jstor )
Simulations ( jstor )
Statistical models ( jstor )
Statistics ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Miguel A. Padilla. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
8/31/2007
Resource Identifier:
436098791 ( OCLC )

Full Text

TYPE I ERROR RATES OF THE KENWARD-ROGER ADJUSTED DEGREE OF FREEDOM F-TEST FOR A SPLIT-PLOT DESIGN WITH MISSING VALUES

By

MIGUEL A. PADILLA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

Copyright 2005 by Miguel A. Padilla

To my mother, Juana; and my godmother, Lisa, for their undaunting support and constant encouragement

ACKNOWLEDGMENTS

The completion of this dissertation was made possible through the help of my dissertation committee. They all helped me in their own separate ways. I wish to thank Dr. James Algina and Dr. David Miller for giving me the opportunity to come and study at the University of Florida, for continuing throughout to show confidence in my abilities, and for supporting my academic decisions that were based on their advice. They may never know how much each of them has influenced me. Thanks also go to Dr. Anne Seraphine. Through our first conversation she helped me realize that I needed to be challenged by a more rigorous course of study. She continued to encourage me while I pursued my graduate education. Additionally, I would like to thank Dr. Ronald Randles for teaching me the more mathematical and theoretical side of statistics, which saved me from many headaches. Lastly, I want to thank Elaine Green. Although she was not a member of my supervising committee, her efforts allowed me to do what a graduate student is supposed to do in graduate school: study.

TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
  Methods for Analyzing Data from Split-Plot Designs
    Analysis of Variance (ANOVA)
    Welch-James Adjusted Degree of Freedom Omnibus Test
    Mixed-Effects Linear Models
    Kenward-Roger Adjusted Degrees of Freedom F-Test
  Traditional Methods for Analyzing Data with Missing Values
  Modern Methods for Analyzing Data with Missing Values
  Purpose of the Study
2 LITERATURE REVIEW
3 METHODOLOGY
  Design
  Data Generation
  Setting k and c
  Data Analysis
4 RESULTS AND DISCUSSION
  Criteria for Assessing Model Fit and Type I Errors
  Analysis of Type I Error Rates for the Between-Subjects Main Effect
  Analysis of Type I Error Rates for the Within-Subjects Main Effect
    Effect of Number of Within-Subjects Levels (K)
    Effect of Number of Between-Subjects Levels (J)
    Effect of n_min/(K – 1)
    Effect of Type of Missing Data Mechanism (TMDM)
    Effect of Percent of Missing Data (PM)
  Analysis of Type I Error Rates for the Between- by Within-Subjects Interaction
    Effect of Number of Within-Subjects Levels (K)
    Effect of Number of Between-Subjects Levels (J)
    Effect of n_min/(K – 1)
    Effect of Sample Size Inequality (SSI)
    Effect of Nature of Pairing of Group Sizes with Covariance Matrices (NPSC)
    Effect of Type of Missing Data Mechanism (TMDM)
  Example
5 CONCLUSIONS
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES
Table
2-1 Final n_min/(K – 1) Recommendations for Distribution by Between-Subjects Factor (J) by Effect by Within-Subjects Factor (K)
2-2 Test Statistics from Fai and Cornelius (1996)
3-1 Group Sizes for Each Level of J at K = 3
3-2 Group Sizes for Each Level of J at K = 6
3-3 Pooled Covariance Matrix for K = 3
3-4 Pooled Covariance Matrix for K = 6
3-5 Example of PROC MIXED Program
4-1 Wald Tests of Type I Error for the Between-Subjects Main Effect
4-2 Goodness of Fit Tests for Within-Subjects Main Effect
4-3 Wald Tests of Type I Error for the Within-Subjects Main Effect
4-4 Effect of Number of Within-Subjects Levels (K) on Type I Error Rates for the Within-Subjects Main Effect
4-5 Effect of Number of Between-Subjects Levels (J) on Type I Error Rates for the Within-Subjects Main Effect
4-6 Effect of n_min/(K – 1) on Type I Error Rates for the Within-Subjects Main Effect
4-7 Effect of Type of Missing Data Mechanism (TMDM) on Type I Error Rates for the Within-Subjects Main Effect
4-8 Effect of Percent of Missing Data Mechanism (PM) on Type I Error Rates for the Within-Subjects Main Effect
4-9 Goodness of Fit Tests for the Between- by Within-Subjects Interaction
4-10 Wald Tests of Type I Error for the Between- by Within-Subjects Interaction
4-11 Effect of Number of Within-Subjects Levels (K) on Type I Error Rates for the Between- by Within-Subjects Interaction
4-12 Effect of Number of Between-Subjects Levels (J) on Type I Error Rates for the Between- by Within-Subjects Interaction
4-13 Effect of n_min/(K – 1) on Type I Error Rates for the Between- by Within-Subjects Interaction
4-14 Effect of Sample Size Inequality (SSI) on Type I Error Rates for the Between- by Within-Subjects Interaction
4-15 Effect of Nature of Pairing of Group Sizes with Covariance Matrices (NPSC) on Type I Error Rates for the Between- by Within-Subjects Interaction
4-16 Effect of Type of Missing Data Mechanism (TMDM) on Type I Error Rates for the Between- by Within-Subjects Interaction
4-17 PROC MIXED Program for Example
4-18 Model Information Criteria for Complete Data
4-19 Kenward-Roger F-Tests for Complete Data
4-20 Kenward-Roger F-Tests for Listwise Deleted Data
4-21 Model Information Criteria for Incomplete Data
4-22 Kenward-Roger F-Tests for Incomplete Data

LIST OF FIGURES
Figure
4-1 Between-Subjects Main Effect Distribution of Type I Error Rates
4-2 Within-Subjects Main Effect Distribution of Type I Error Rates
4-3 Between- by Within-Subjects Interaction Distribution of Type I Error Rates
4-4 Male Residual Profiles for Complete Data
4-5 Female Residual Profiles for Complete Data

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TYPE I ERROR RATES OF THE KENWARD-ROGER ADJUSTED DEGREE OF FREEDOM F-TEST FOR A SPLIT-PLOT DESIGN WITH MISSING VALUES

By

Miguel A. Padilla

May 2005

Chairman: James J. Algina
Major Department: Educational Psychology

The Type I error rate of the Kenward-Roger (KR) F-test was assessed as implemented in PROC MIXED in SAS for a split-plot design with ignorable missing values. In PROC MIXED, maximum likelihood (ML) using all available data is used to estimate the model parameters along with the corresponding covariance matrix. The KR test was selected because it combines an adjusted estimator of the sampling covariance matrix of the fixed-effect parameter estimates (the unadjusted estimator is known to be biased) with Satterthwaite-type degrees of freedom based on the adjusted covariance matrix. The simulation study examined a split-plot design with one between- and one within-subjects factor, each having three or six levels. Additional simulation factors were sample size, group sample size inequality, degree of covariance sphericity, pairing of group sample sizes with covariance matrices, type of missing data mechanism, and percent of missing data. The missing data mechanisms investigated were missing completely at random (MCAR) and missing at random (MAR); both are ignorable for ML estimation.

Overall, the KR test controlled the Type I error rate well under all of the simulation factors. However, the number of levels of the within-subjects factor, type of missing data mechanism, percent of missing data, group sample size inequality, pairing of group sample sizes with covariance matrices, and sample size did have some effect on the Type I error rate of the KR test. For the within-subjects main effect and the between- by within-subjects interaction, when the number of levels of the within-subjects factor increased from 3 to 6, the Type I error rate also increased, and the increase was larger when paired with an MAR missing data mechanism or 15% missing data. In general, the Type I error rate for the within-subjects main effect also tended to be higher under MAR or 15% missing data. More severe sample size inequality between the groups and a negative pairing of group sample sizes with covariance matrices tended to increase the Type I error rate of the between- by within-subjects interaction. Lastly, as the sample size increased, the Type I error rate of the between- by within-subjects interaction came closer to the nominal level.

CHAPTER 1
INTRODUCTION

According to Keselman et al. (1998), the split-plot design is one of the most commonly used designs in educational and psychological research. Split-plot designs include both between-subjects and within-subjects factors. Responses on the latter factor are obtained by repeatedly measuring each participant in the study. These repeated measures might be obtained at different points in time or under different treatments.

As an example of a split-plot design, suppose that a researcher is interested in age differences in visual task performance (Maxwell & Delaney, 1999). In particular, the researcher is interested in the extent to which interfering visual stimuli slow the ability of the visual system to recognize letters. Participants are seated in front of a computer screen and told to fixate on the center (0° off-center) of the display, where they will see either the letter T or the letter I embedded in a group of other letters. The target letter appears 0°, 4°, and 8° off-center, and every participant views all three positions in random order. The dependent variable is the reaction time, in milliseconds, needed to identify the correct letter. The researcher also wants to know whether reaction time is slower for adults who are at least 60 years old than for adults who are approximately 20 years old, so the participants are separated into the appropriate age groups. In this example, the angle of display is the within-subjects factor and the age group is the between-subjects factor.

As an example of how the within-subjects factor can be obtained by using different tests, suppose that a researcher is interested in the effects of a new drug on a mood disorder. The researcher randomly assigns participants with the mood disorder to a treatment group or a control group. Participants in the treatment group receive the new drug, and participants in the control group receive a placebo. Then all participants are given the comparably scaled MMPI Social Introversion Scale, MMPI Depression Scale, and MMPI Schizophrenia Scale in random order. The score on each scale is the dependent variable. In this example, the treatment and control groups constitute the between-subjects factor and the three tests constitute the within-subjects factor.

To see how the within-subjects factor can be obtained by measuring participants at different points in time, suppose a researcher wants to estimate the rate of growth in height of preadolescent children. The researcher measures the height of the children annually over a fixed set of ages, such as 6 to 10. Additionally, the researcher is interested in gender differences in the rate of growth in height, so the children are separated into male and female groups. The height measurement is the dependent variable, the gender groups comprise the between-subjects factor, and age is the within-subjects (or repeated measures) factor. Although these three designs are conceptually different, each constitutes a split-plot design and can be analyzed by the methods reviewed next.

In educational and psychological research, data collected in split-plot designs are often incomplete. Missing values can come about for a variety of reasons. Consider participants who drop out of a longitudinal study because of illness or death, refuse to answer questions on a survey because of its length or the sensitivity of the questions, or
are unable to answer questions on a performance assessment test because of time constraints or lack of ability. Each example results in missing values.

Little and Rubin (2002, p. 12) and Rubin (1976) defined three types of missing data mechanisms. To understand the concept of a missing data mechanism, we must recognize that two types of data can be taken into account in the analysis when values are missing. First, there are the independent and dependent variables; second, there is a dichotomous variable indicating whether a particular dependent-variable data point is missing. The missing data mechanism is a relationship between the probability that data are missing on the dependent variables and the values of the independent and dependent variables. The relationship might be modeled, for example, as a logistic or probit regression function relating the presence or absence of the data points to the independent and dependent variables. The missing data mechanisms, ordered from most restrictive to least restrictive, are missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). Generally, NMAR constitutes any missing data condition that is neither MCAR nor MAR. To simplify presentation without important loss of generality, let $\mathbf{Y} = [\mathbf{Y}_1\ \mathbf{Y}_2]$ be an $n \times 2$ matrix of scores (where $n$ is the number of participants), including those scores that are observed in the study and those that would have been observed had there been no missing data. Also let $\mathbf{R} = [\mathbf{R}_1\ \mathbf{R}_2]$ denote an $n \times 2$ matrix, where $r_{ij} = 1$ if the score is missing for the $i$th participant ($i = 1, \ldots, n$) on the $j$th variable ($j = 1, 2$) and $r_{ij} = 0$ otherwise. The missing data mechanism is MCAR if $\Pr(R_j = 1 \mid Y_1, Y_2) = \Pr(R_j = 1)$; in other words, when the pattern of missing values in $\mathbf{Y}$ depends on neither $Y_1$ nor $Y_2$.
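
The MCAR definition can be contrasted with a mechanism in which missingness depends on observed scores. The following Python sketch is a minimal illustration under assumptions that are not part of this study: $Y_1$ and $Y_2$ are bivariate normal with correlation 0.5, $Y_1$ is always observed, and $Y_2$ is made missing either with a constant probability of 0.3 (MCAR) or with a probability that is a logistic function of $Y_1$ (an MAR mechanism of the kind defined next). The function name `simulate_missingness` and all numeric settings are hypothetical choices made only for this illustration.

```python
import numpy as np


def simulate_missingness(n=100_000, rho=0.5, seed=1):
    """Simulate bivariate normal scores and two missingness indicators for Y2.

    Y1 is always observed (R1 = 0). For Y2:
      r_mcar: Pr(R2 = 1) = 0.3, unrelated to Y1 and Y2 (MCAR).
      r_mar:  Pr(R2 = 1 | Y1, Y2) = logistic(-1 + 1.5 * Y1), which depends
              only on the observed Y1 (MAR).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    y = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

    # MCAR: the same missingness probability for every participant.
    r_mcar = (rng.random(n) < 0.3).astype(int)

    # MAR: the missingness probability rises with the observed Y1 score.
    p_mar = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * y[:, 0])))
    r_mar = (rng.random(n) < p_mar).astype(int)
    return y, r_mcar, r_mar


if __name__ == "__main__":
    y, r_mcar, r_mar = simulate_missingness()
    high_y1 = y[:, 0] > 0
    for label, r in (("MCAR", r_mcar), ("MAR", r_mar)):
        # Compare the proportion missing on Y2 for high versus low Y1.
        print(f"{label}: Pr(R2=1 | Y1>0) = {r[high_y1].mean():.3f}, "
              f"Pr(R2=1 | Y1<=0) = {r[~high_y1].mean():.3f}")
```

Under MCAR the two printed proportions should both be close to 0.3; under this MAR mechanism the proportion missing is markedly higher when $Y_1$ is above its mean, even though $Y_2$ itself plays no role in the missingness model.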

If $g_{11} = \Pr(R_1 = 1, R_2 = 1 \mid Y_1, Y_2)$ is independent of $Y_1$ and $Y_2$, $g_{10} = \Pr(R_1 = 1, R_2 = 0 \mid Y_1, Y_2) = \Pr(R_1 = 1, R_2 = 0 \mid Y_2)$, $g_{01} = \Pr(R_1 = 0, R_2 = 1 \mid Y_1, Y_2) = \Pr(R_1 = 0, R_2 = 1 \mid Y_1)$, and $\Pr(R_1 = 0, R_2 = 0 \mid Y_1, Y_2) = 1 - g_{11} - g_{10} - g_{01}$, the missing data mechanism is said to be MAR; that is, the probability of each missingness pattern depends only on the scores that are observed under that pattern. Finally, if the pattern of missing values in $Y_1$ depends on $Y_1$ (alone or together with $Y_2$), or the pattern of missing values in $Y_2$ depends on $Y_2$ (alone or together with $Y_1$), the missing data mechanism is NMAR. Although several methods are available for analyzing data with missing values, the statistical properties of the procedures depend on the missing data mechanism. These statistical properties are discussed next in the context of a split-plot design. Rubin (1976) showed that if the missing data mechanism is MCAR or MAR, ML estimators of the parameters are consistent when the missing data mechanism is ignored, that is, excluded from the data analysis. Thus, the MCAR and MAR missing data mechanisms are ignorable for purposes of ML estimation. If the data are MCAR, both listwise deletion and ML ignoring the missing data mechanism will produce consistent estimators, but the ML estimators will be more precise because they use all of the available data. In listwise deletion, any participant who has missing data on at least one level of the within-subjects factor is dropped from the analysis, leaving only participants who have data on every level of the within-subjects factor. Rubin (1976) also showed that the MCAR missing data mechanism is ignorable for sampling distribution based inference procedures such as hypothesis tests and confidence intervals. So if the data are MCAR, either listwise deletion or ML ignoring the missing data mechanism can be used for inference, but ML
will result in more powerful tests and narrower confidence intervals because it does not delete the observed data for participants missing some data.

When ML estimation is used, whether the MAR missing data mechanism is ignorable for sampling distribution based inference depends on how the sampling variances and covariances are calculated. The MAR mechanism is ignorable for sampling distribution based inferences on the means if the sampling covariance matrix is estimated from the observed information matrix for the means and the covariance parameter estimates, but not if it is estimated from only the portion of the observed information matrix that pertains to the means (Kenward & Molenberghs, 1998). The MAR mechanism may not be ignorable for sampling distribution based inferences if the sampling covariance matrix is estimated from the expected information matrix. In that case, for sampling distribution based inferences to be valid, the expectation of the information matrix must be taken over the actual sampling process implied by the MAR mechanism (Kenward & Molenberghs, 1998). Kenward and Molenberghs call this the unconditional sampling framework, whereas using an expected information matrix that ignores this sampling process is called the naive sampling framework.

If the missing data mechanism is NMAR, it is non-ignorable (NI) for purposes of ML estimation. Consequently, when the missing data mechanism is neither MCAR nor MAR, the pattern of missing values must be taken into account to obtain consistent ML estimates. This can be accomplished by using a selection model, which incorporates a model for the missing values indicator, or by using a pattern
mixture model, which stratifies the data on the basis of the pattern of missing values (Albert & Follmann, 2000; Diggle & Kenward, 1994; Fitzmaurice, Laird, & Shneyer, 2001; Kenward, 1998; Troxel, 1998). Little (1995) provided details about these two approaches. ML estimation that ignores the missing data mechanism can also be viewed as a form of selection modeling.

Methods for Analyzing Data from Split-Plot Designs

Analysis of Variance (ANOVA)

Traditional methods for analyzing data from a split-plot design are a univariate ANOVA for a split-plot design; a univariate ANOVA for a split-plot design with approximate degrees of freedom (ADF) tests of the within-subjects effects, such as the Greenhouse-Geisser (1959) or Huynh-Feldt (1976) tests; or a multivariate ANOVA. All of these tests assume that the repeated measures for different participants are drawn from a multivariate normal distribution and that responses for different participants are independently distributed. In addition, univariate F-tests for the within-subjects main effect and the interaction of the within- and between-subjects factors assume multisample sphericity for the covariance matrices across the levels of the between-subjects factor (Algina & Keselman, 1998; Huynh & Feldt, 1970; Keselman & Keselman, 1993; Mendoza, 1980). For a split-plot design with one between- and one within-subjects factor, let $j = 1, \ldots, J$ index the levels of the between-subjects factor and let $k = 1, \ldots, K$ index the levels of the within-subjects factor. Multisample sphericity is satisfied when

$$\mathbf{C}\boldsymbol{\Sigma}_j\mathbf{C}' = \lambda\mathbf{I}_{K-1}, \qquad j = 1, \ldots, J,$$

where $\mathbf{C}$ is a $(K-1) \times K$ orthonormal contrast matrix representing $K - 1$ linearly independent contrasts of the levels of the within-subjects factor, $\boldsymbol{\Sigma}_j$ is the $K \times K$ variance-covariance matrix for the $j$th group, $\lambda$ is a positive scalar common to all groups, and $\mathbf{I}_{K-1}$ is a $(K-1) \times (K-1)$ identity matrix. Since
multisample sphericity will rarely be met in practice, some authors recommend using an ADF test or a multivariate test (Looney & Stanley, 1989). The ADF tests adjust for violations of the sphericity assumption by modifying the degrees of freedom of the traditional F-test through an estimate of the unknown sphericity parameter and thus, effectively, do not make the sphericity assumption. Multivariate test statistics do not require the sphericity assumption. However, both ADF and multivariate tests require homogeneity of the covariance matrices across the levels of the between-subjects factor, and both can fail to control the Type I error rate when the $\boldsymbol{\Sigma}_j$ are heterogeneous (Keselman & Keselman, 1990; Keselman, Keselman, & Lix, 1995; Keselman, Lix, & Keselman, 1996).

Welch-James Adjusted Degree of Freedom Omnibus Test

In response to these unsatisfactory results, the multivariate Welch-James (WJ) test, which requires neither the sphericity assumption nor the homogeneity of covariance assumption, has been proposed (Algina & Keselman, 1997, 1998; Keselman, Algina, Wilcox, & Kowalchuk, 2000; Keselman, Carriere, & Lix, 1993). Let $\mathbf{L}$ be an $r \times JK$ contrast matrix, and suppose interest lies in testing the hypothesis $H_0\colon \mathbf{L}\boldsymbol{\mu} = \mathbf{0}$, where $\boldsymbol{\mu} = [\boldsymbol{\mu}_1' \ \boldsymbol{\mu}_2' \ \cdots \ \boldsymbol{\mu}_J']'$ and $\boldsymbol{\mu}_j = [\mu_{j1} \ \mu_{j2} \ \cdots \ \mu_{jK}]'$ for $j = 1, 2, \ldots, J$. The Welch-James test statistic has an approximate F distribution and is defined by Johansen (1980) as

$$F = \frac{1}{c}\,\bar{\mathbf{x}}'\mathbf{L}'\left(\mathbf{L}\mathbf{S}\mathbf{L}'\right)^{-1}\mathbf{L}\bar{\mathbf{x}} \tag{1-1}$$
where $\bar{\mathbf{x}}$ is the estimate of $\boldsymbol{\mu}$, $\mathbf{S} = \operatorname{diag}(\mathbf{S}_1/n_1, \mathbf{S}_2/n_2, \ldots, \mathbf{S}_J/n_J)$ is block diagonal, $\mathbf{S}_j$ is the sample variance-covariance matrix of the $j$th group, and $c = r + 2A - 6A/(r + 2)$. The degrees of freedom for the WJ test are $f_1 = r$ and $f_2 = r(r + 2)/(3A)$ for the numerator and denominator, respectively, with

$$A = \frac{1}{2}\sum_{j=1}^{J}\frac{\operatorname{tr}\!\left[\left(\mathbf{S}\mathbf{L}'(\mathbf{L}\mathbf{S}\mathbf{L}')^{-1}\mathbf{L}\mathbf{Q}_j\right)^2\right] + \left[\operatorname{tr}\!\left(\mathbf{S}\mathbf{L}'(\mathbf{L}\mathbf{S}\mathbf{L}')^{-1}\mathbf{L}\mathbf{Q}_j\right)\right]^2}{n_j - 1}. \tag{1-2}$$

The matrix $\mathbf{Q}_j$ is a $JK \times JK$ block-diagonal matrix corresponding to the $j$th group: its $(s, t)$th block is $\mathbf{I}_K$ if $s = t = j$ and $\mathbf{0}$ otherwise. When the sample size is sufficiently large, the WJ test controls the Type I error rate when the dispersion matrices $\boldsymbol{\Sigma}_j$ are heterogeneous. However, it can be applied to data with missing values only through listwise deletion. Because the WJ procedure computes maximum likelihood (ML) estimates of the means and restricted maximum likelihood (REML) estimates of the covariance matrices from the listwise-deleted data, it can only be expected to result in valid inferences if the missing data mechanism is MCAR (Little & Rubin, 2002). In the present context, the missing data mechanism is MCAR when the probability that a data point is missing is related neither to the levels of the between-subjects factor, nor to the levels of the within-subjects (repeated measures) factor, nor to the repeated measures random variables, a highly unlikely situation in applied settings (Graham, Hofer, & MacKinnon, 1996; Muthén, Kaplan, & Hollis, 1987). Inferences from the WJ test will not be valid if the missing data mechanism is MAR because of the listwise deletion required by the WJ test when data are missing. In the present context, the missing data mechanism is MAR when the probability that a data point is missing on a particular level
of the within-subjects factor is related to the random variables for the other levels of the within-subjects factor, but not to the random variable for that particular level.

Mixed-Effects Linear Models

The mixed-effects linear model of Laird and Ware (1982) has become very popular for analyzing data from experimental designs because of the modeling flexibility it offers. Multilevel models and repeated measures models, with or without random effects, are just two of the kinds of models that can be expressed as mixed-effects linear models. An advantage of the mixed-effects linear model approach to analyzing data from split-plot designs is that listwise deletion need not be used. Instead, all of the available data are included in the analysis, and because parameter estimation is carried out by ML, according to Little and Rubin (2002, p. 117), “in one formal sense there is no difference between ML or Bayes inference for incomplete data and ML or Bayes inference for complete data.” It is important to recognize that ML estimation can be carried out either by ignoring or by incorporating the mechanism that accounts for the missing values, because most commercially available programs that estimate the mixed-effects linear model ignore the missing data mechanism by default. An incomplete list of such programs includes HLM 5 (Raudenbush, Bryk, Cheong, & Congdon, 2000), S-PLUS, and SPSS. Recall that the missing data mechanism is a relationship between the probability that data are missing on the dependent variables and the independent and dependent variables. Analyzing only the observed data on the independent and dependent variables is referred to as ignoring the missing data mechanism.
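
To illustrate why ignoring an MAR mechanism is acceptable for ML but not for listwise deletion, the following Python sketch uses the simplest case with a closed-form ML solution: a bivariate normal sample in which $Y_1$ is always observed and $Y_2$ is missing with a probability that depends on $Y_1$. For this monotone pattern, the ML estimate of the mean of $Y_2$ can be obtained from the factored likelihood described by Little and Rubin (2002), that is, from the regression of $Y_2$ on $Y_1$ in the complete cases combined with the mean of all observed $Y_1$ scores. This is a stand-in for the iterative ML estimation performed by mixed-model software such as PROC MIXED, not a reproduction of it; the true means, correlation, logistic missingness model, and the function name `listwise_and_ml_means` are hypothetical choices for the illustration.

```python
import numpy as np


def listwise_and_ml_means(n=200_000, rho=0.6, seed=7):
    """Estimate the mean of Y2 by listwise deletion and by ML under MAR.

    Returns (true_mu2, listwise_mu2, ml_mu2).
    """
    rng = np.random.default_rng(seed)
    mu = np.array([10.0, 20.0])                 # hypothetical population means
    cov = np.array([[1.0, rho], [rho, 1.0]])
    y = rng.multivariate_normal(mu, cov, size=n)

    # MAR: Y2 is more likely to be missing when the observed Y1 is high.
    p_miss = 1.0 / (1.0 + np.exp(-2.0 * (y[:, 0] - mu[0])))
    complete = rng.random(n) >= p_miss          # True where Y2 is observed

    # Listwise deletion: use only participants with both scores.
    listwise_mu2 = y[complete, 1].mean()

    # ML ignoring the mechanism (factored likelihood, monotone pattern):
    # the mean of Y1 uses every participant, and the complete-case
    # regression of Y2 on Y1 carries that information over to Y2.
    mu1_all = y[:, 0].mean()
    y1_cc, y2_cc = y[complete, 0], y[complete, 1]
    slope = np.cov(y1_cc, y2_cc)[0, 1] / np.var(y1_cc, ddof=1)
    ml_mu2 = y2_cc.mean() + slope * (mu1_all - y1_cc.mean())

    return mu[1], listwise_mu2, ml_mu2


if __name__ == "__main__":
    true_mu2, lw, ml = listwise_and_ml_means()
    print(f"true = {true_mu2:.3f}, listwise = {lw:.3f}, ML = {ml:.3f}")
```

With these settings the listwise-deletion mean is pulled below 20, because the participants with high $Y_1$ (and therefore, on average, high $Y_2$) are the ones most often dropped, while the ML estimate stays close to 20.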

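Kenward-Roger Adjusted Degrees of Freedom F-Test

When the mixed-effects linear model is used to analyze the data, likelihood ratio, score, or Wald hypothesis tests can be used. Wald tests seem to be the most common; for example, when PROC MIXED in SAS is used, the default procedure for tests on the fixed effects is the Wald test. Let $\mathbf{L}$ be an $r \times JK$ matrix of full row rank, and let $\boldsymbol{\mu} = [\boldsymbol{\mu}_1' \ \boldsymbol{\mu}_2' \ \cdots \ \boldsymbol{\mu}_J']'$, where each $\boldsymbol{\mu}_j$ is the $K \times 1$ vector of population means for the $j$th group, so that $\boldsymbol{\mu}$ contains the population means for the $JK$ cells of the split-plot design. Each row of $\mathbf{L}$ is a contrast vector. The main effect and interaction hypotheses about the within- and between-subjects factors can be expressed as $H_0\colon \mathbf{L}\boldsymbol{\mu} = \mathbf{0}$, where $\mathbf{0}$ is an $r \times 1$ vector of zeros. Let $\boldsymbol{\Sigma}_j$ denote the $K \times K$ population covariance matrix of the $j$th level of the between-subjects factor, $\mathbf{S}_j$ the $K \times K$ REML estimate of that covariance matrix, and $\boldsymbol{\Sigma}_{ij}$ and $\mathbf{S}_{ij}$ the $K_i \times K_i$ sections of the population and sample covariance matrices, respectively, that pertain to the dependent variables on which the $i$th participant in the $j$th group has observed scores. In addition, let $\mathbf{A}_i$ denote the $K_i \times K$ indicator matrix obtained by eliminating the $k$th row from the $K \times K$ identity matrix whenever the data for the $i$th participant are missing on the $k$th level of the within-subjects factor. For example, suppose that $K = 5$ and the $i$th participant has data at occasions 1, 2, and 4, but not at occasions 3 and 5. Then

$$\boldsymbol{\Sigma}_{ij} = \mathbf{A}_i\boldsymbol{\Sigma}_j\mathbf{A}_i' = \begin{bmatrix}1&0&0&0&0\\0&1&0&0&0\\0&0&0&1&0\end{bmatrix}\begin{bmatrix}\sigma_1^2&\sigma_{12}&\sigma_{13}&\sigma_{14}&\sigma_{15}\\\sigma_{21}&\sigma_2^2&\sigma_{23}&\sigma_{24}&\sigma_{25}\\\sigma_{31}&\sigma_{32}&\sigma_3^2&\sigma_{34}&\sigma_{35}\\\sigma_{41}&\sigma_{42}&\sigma_{43}&\sigma_4^2&\sigma_{45}\\\sigma_{51}&\sigma_{52}&\sigma_{53}&\sigma_{54}&\sigma_5^2\end{bmatrix}\begin{bmatrix}1&0&0\\0&1&0\\0&0&0\\0&0&1\\0&0&0\end{bmatrix} = \begin{bmatrix}\sigma_1^2&\sigma_{12}&\sigma_{14}\\\sigma_{21}&\sigma_2^2&\sigma_{24}\\\sigma_{41}&\sigma_{42}&\sigma_4^2\end{bmatrix}.$$

Applying the same selection to $\mathbf{S}_j$ gives $\mathbf{S}_{ij} = \mathbf{A}_i\mathbf{S}_j\mathbf{A}_i'$. The default test statistic in PROC MIXED for testing the null hypothesis is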
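$$
F = \frac{\bar{\mathbf{x}}'\mathbf{L}'\left(\mathbf{L}\mathbf{M}\mathbf{L}'\right)^{-1}\mathbf{L}\bar{\mathbf{x}}}{r}, \qquad (1\text{-}3)
$$

where $\bar{\mathbf{x}} = (\bar{\mathbf{x}}_1', \bar{\mathbf{x}}_2', \ldots, \bar{\mathbf{x}}_J')'$ is the ML estimate of $\boldsymbol{\mu}$, $r = \operatorname{rank}(\mathbf{L})$, and $\mathbf{M}$ is a block diagonal matrix in which the jth block is $\left(\sum_i \mathbf{A}_i'\mathbf{S}_{ij}^{-1}\mathbf{A}_i\right)^{-1}$. This block is the estimated sampling covariance matrix of the mean vector $\bar{\mathbf{x}}_j$ and is based on the expected information matrix calculated under the naive sampling framework. Even when data are MCAR or there are no missing data, using $\left(\sum_i \mathbf{A}_i'\mathbf{S}_{ij}^{-1}\mathbf{A}_i\right)^{-1}$ has two drawbacks:

1. $\left(\sum_i \mathbf{A}_i'\mathbf{S}_{ij}^{-1}\mathbf{A}_i\right)^{-1}$ is an estimate of $\left(\sum_i \mathbf{A}_i'\boldsymbol{\Sigma}_{ij}^{-1}\mathbf{A}_i\right)^{-1}$, the sampling covariance matrix of the ML estimate of $\boldsymbol{\mu}_j$ that would be obtained if $\boldsymbol{\Sigma}_j$ were known. Results by Kackar and Harville (1984) show that, as a sampling covariance matrix for $\bar{\mathbf{x}}_j$, $\left(\sum_i \mathbf{A}_i'\mathbf{S}_{ij}^{-1}\mathbf{A}_i\right)^{-1}$ tends to be too small because it fails to take into account the uncertainty in $\bar{\mathbf{x}}_j$ introduced by substituting $\mathbf{S}_j$ for $\boldsymbol{\Sigma}_j$.
2. Prasad and Rao (1990) and Booth and Hobert (1998) show that $\left(\sum_i \mathbf{A}_i'\mathbf{S}_{ij}^{-1}\mathbf{A}_i\right)^{-1}$ is a biased estimator of $\left(\sum_i \mathbf{A}_i'\boldsymbol{\Sigma}_{ij}^{-1}\mathbf{A}_i\right)^{-1}$.

Harville and Jeske (1992) developed a better approximation, denoted by $\hat{\mathbf{m}}^{@}$, that can be used to estimate the sampling covariance matrix of $\bar{\mathbf{x}}$. Kenward and Roger (1997) then developed an alternative estimator, denoted by $\hat{\boldsymbol{\Phi}}_A$, that can also be used to estimate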
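the sampling covariance matrix for $\bar{\mathbf{x}}$. Kenward and Roger proposed a test statistic which, in the context of a split-plot design, is

$$
F^{*} = \lambda F = \lambda\,\frac{\bar{\mathbf{x}}'\mathbf{L}'\left(\mathbf{L}\hat{\boldsymbol{\Phi}}_A\mathbf{L}'\right)^{-1}\mathbf{L}\bar{\mathbf{x}}}{r} \;\dot{\sim}\; F_{r,d}, \qquad (1\text{-}4)
$$

where $\bar{\mathbf{x}}$ is the ML estimate of $\boldsymbol{\mu}$, $r = \operatorname{rank}(\mathbf{L})$, and λ is a scaling factor estimated from the data. Using Satterthwaite's (1946) approximation, the scaling factor λ and the denominator degrees of freedom d are derived by equating the first and second moments of both sides of equation 1-4. This gives

$$
\lambda E[F] = \frac{d}{d-2} \qquad \text{and} \qquad \lambda^2 \operatorname{Var}[F] = \frac{2d^2(r+d-2)}{r(d-2)^2(d-4)}.
$$

Solving for λ and d leads to

$$
\lambda = \frac{d}{E[F]\,(d-2)} \qquad \text{and} \qquad d = 4 + \frac{r+2}{r\rho - 1}, \quad \text{where } \rho = \frac{\operatorname{Var}[F]}{2E[F]^2}.
$$

The Kenward-Roger procedure is implemented in SAS's PROC MIXED, but it uses $\hat{\mathbf{m}}^{@}$ in place of $\hat{\boldsymbol{\Phi}}_A$. Additionally, the calculations of E[F] and Var[F] are computationally intensive.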
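
The matrix quantities above can be checked numerically. The following is a minimal NumPy sketch, not the PROC MIXED implementation: the helper names (`indicator_matrix`, `naive_mean_cov`, `block_diag`, `wald_f`, `kr_scale_df`) are hypothetical, and E[F] and Var[F] are treated as given inputs because computing them requires the full Kenward-Roger machinery. The sketch builds $\mathbf{A}_i$ from a participant's observed levels, forms the jth block of $\mathbf{M}$, evaluates the naive statistic in equation 1-3, and solves the moment equations for λ and d.

```python
import numpy as np


def indicator_matrix(observed, K):
    """A_i: the rows of the K x K identity for the observed levels
    (0-based indices); rows for missing levels are eliminated."""
    return np.eye(K)[list(observed), :]


def naive_mean_cov(S_j, patterns):
    """jth block of M: (sum_i A_i' S_ij^{-1} A_i)^{-1}, where S_ij = A_i S_j A_i'.
    `patterns` holds one tuple of observed levels per participant."""
    K = S_j.shape[0]
    info = np.zeros((K, K))
    for observed in patterns:
        A = indicator_matrix(observed, K)
        S_ij = A @ S_j @ A.T
        info += A.T @ np.linalg.solve(S_ij, A)  # A_i' S_ij^{-1} A_i
    return np.linalg.inv(info)


def block_diag(blocks):
    """Place square blocks along the diagonal (M is block diagonal over j)."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    start = 0
    for b in blocks:
        k = b.shape[0]
        out[start:start + k, start:start + k] = b
        start += k
    return out


def wald_f(xbar, L, cov):
    """Equation 1-3 with cov = M: xbar' L' (L cov L')^{-1} L xbar / r."""
    r = np.linalg.matrix_rank(L)
    Lx = L @ xbar
    return float(Lx @ np.linalg.solve(L @ cov @ L.T, Lx)) / r


def kr_scale_df(EF, VarF, r):
    """Solve lambda*E[F] = d/(d-2) and the matching variance equation."""
    rho = VarF / (2.0 * EF ** 2)
    d = 4.0 + (r + 2.0) / (r * rho - 1.0)
    lam = d / (EF * (d - 2.0))
    return lam, d
```

With complete data for every participant, `naive_mean_cov` reduces to $\mathbf{S}_j/n_j$, the usual covariance matrix of a sample mean vector. When E[F] and Var[F] equal the moments of an exact $F_{r,d}$ distribution, `kr_scale_df` returns λ = 1 and recovers the same d.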


Traditional Methods for Analyzing Data with Missing Values

There are several traditional techniques for analyzing data with missing values, but some authors have suggested that modern methods such as ML estimation are preferable (Graham et al., 1996). Some of the traditional methods are listwise deletion, pairwise deletion, and regression-based single imputation. As has already been discussed, listwise deletion excludes all participants with any missing data from the analysis. Listwise deletion works well if the researcher has a large sample, a small percentage of missing data, and an MCAR missing data mechanism. For example, if the researcher has a sample of 500 and 35% of the participants have at least one missing value, the researcher will do the analysis with a sample of 325 and, if the data are MCAR, obtain unbiased estimates while still retaining power. However, if the researcher has a sample of 100 and 35% of the participants have at least one missing value, doing the analysis with a sample of 65 could severely compromise power. Regardless of the sample size and amount of missing data, estimates will be biased and sampling-distribution-based inferences, such as hypothesis tests and confidence intervals, will be invalid if the missing data mechanism is MAR or NMAR.

Another traditional method for analyzing data with missing values is pairwise deletion. In pairwise deletion, participants are excluded from the part of the analysis for which they have missing values. If a researcher has a sample of 60 and is interested in calculating a covariance matrix for $Y_1$, $Y_2$, and $Y_3$, but 10 participants have missing values on $Y_1$, pairwise deletion results in calculating the variance of $Y_1$ and its covariances with the other two variables based on a sample size of 50. The remaining variances and
covariances will all be based on a sample of 60. A problem with this method is that it produces biased parameter estimates (Graham & Hofer, 2000).

A traditional method created as a response to the problems associated with listwise and pairwise deletion is regression-based single imputation. In this method, regression analysis is used to predict plausible values for missing values, usually by regressing a variable with missing values on the remainder of the variables. One problem with this procedure is that it can be severely affected by slight changes in the data (Graham & Hofer, 2000; Hegamin-Younger & Forsyth, 1998). Additionally, the imputed values are only one set of plausible values that could be used to estimate the parameters for the final model used in analyzing the data. Consequently, the parameter estimates and their standard errors are biased, leading some authors to suggest abandoning regression-based single imputation (Graham & Hofer, 2000; Hegamin-Younger & Forsyth, 1998).

Modern Methods for Analyzing Data with Missing Values

Two modern methods for analyzing data with missing values are multiple imputation and ML estimation using all available data. In multiple imputation (MI), m data sets are generated with plausible values replacing missing values in each of the m data sets. The missing values are drawn from a predictive distribution of the missing values under a particular model for nonresponse. Each of the m data sets is then analyzed through the model of interest to produce parameter estimates and standard errors. The m estimates of each parameter are then combined to create a single estimate and its standard error. MI has several advantages. First, a single set of imputed data sets can be used for a variety of analyses. Second, the combined
inferences drawn from the m data sets are valid, provided that the missing data mechanism is MCAR and/or MAR, because MI accounts for missing data uncertainty (Schafer, 1997; Schafer & Olsen, 1998). Third, there is a basis for computing standard errors because missing values have been filled in and the data set, in a sense, is “complete” (Graham & Hofer, 2000). Lastly, MI is very efficient in that it requires only a small number m of imputed data sets to conduct a valid analysis (Rubin, 1987; Schafer, 1997; Schafer & Olsen, 1998). However, generating multiple data sets can be time consuming, depending on the size of the data set, the percentage of missing data, and the number m of data sets to impute. With respect to combining parameter estimates, MI is a relatively new method and it is not widely available in commercial statistical packages. Hence, the programs that can conduct MI may not have the specific algorithm for combining the separate parameter estimates into a single inference for the model of interest. In such cases, the researcher must either find another program with the correct algorithm for combining the separate parameter estimates for the model of interest or write the algorithm according to the rules set forth by Rubin (1987).

A popular method for calculating ML estimates is the expectation-maximization (EM) algorithm. The generality and usefulness of the EM algorithm for analyzing data with missing values were first realized when Dempster, Laird, and Rubin (1977) established its statistical properties. The EM algorithm is an iterative technique that begins by making a guess about the parameter estimates and then generates plausible values for the missing values from the guessed parameters (E-step). These plausible values are then used to revise the parameter estimates (M-step), which in turn are used to
generate a new set of plausible values, and so on. This iterative process continues until there is little change in the parameter estimates. The EM algorithm produces consistent parameter estimates provided that the missing data mechanism is MCAR and/or MAR (Graham & Hofer, 2000; Little & Rubin, 2002). However, a drawback to the procedure is that it does not compute a Hessian (or information) matrix. Because the inverse of the Hessian matrix is used to estimate the sampling variance-covariance matrix, from which standard errors are computed, another method such as the bootstrap or the supplemented EM algorithm (Meng & Rubin, 1991) must be used to compute standard errors. The estimated parameters and covariance matrix can then be used to conduct the analysis of interest, such as regression or ANOVA, but one still needs to write a program to conduct that subsequent analysis.

Another approach to analyzing data sets with missing values is to use ML to directly estimate the parameters of the model of interest and to use the inverse of the Hessian matrix to calculate estimates of the sampling variance-covariance matrix. This is the method implemented in SAS's PROC MIXED when estimating mixed-effects linear models with missing values, and its properties with regard to an MCAR and/or MAR missing data mechanism have already been discussed. Moreover, using PROC MIXED has advantages over the other modern methods for analyzing data with missing values. With respect to MI, there is no need to generate and analyze m imputed data sets or to combine m sets of estimates and standard errors. Unlike the EM algorithm, PROC MIXED computes the Hessian matrix as a byproduct of using the Newton-Raphson algorithm to get ML estimates of the model parameters, and it uses the inverse of the Hessian matrix to get the sampling variance-covariance matrix used in hypothesis testing.
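
For comparison with the MI route described above, Rubin's (1987) combining rules for a single scalar parameter (for example, one fixed effect) take only a few lines. This is a minimal sketch, assuming the m point estimates and their squared standard errors have already been obtained from the m imputed data sets; the function name `pool_rubin` is hypothetical, and multivariate tests such as the F tests in this study would need the matrix versions of these rules, which the sketch does not cover.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, sq_std_errors):
    """Rubin's (1987) rules for one scalar parameter estimated in each of
    m >= 2 imputed data sets. Returns (estimate, standard error, df)."""
    m = len(estimates)
    q_bar = mean(estimates)                 # pooled point estimate
    w = mean(sq_std_errors)                 # within-imputation variance
    b = variance(estimates)                 # between-imputation variance (divisor m - 1)
    t = w + (1.0 + 1.0 / m) * b             # total variance
    if b == 0:
        df = math.inf                       # no between-imputation variability
    else:
        df = (m - 1) * (1.0 + w / ((1.0 + 1.0 / m) * b)) ** 2
    return q_bar, math.sqrt(t), df
```

The pooled estimate is referred to a t distribution with the returned degrees of freedom; the $(1 + 1/m)$ factor is what makes the combined standard error reflect missing data uncertainty.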


In addition, PROC MIXED is a general program for modeling a vast array of mixed-effects linear models, of which the model for the split-plot design in this study is just one modeling option. Furthermore, SAS is a popular and widely available statistical package, making it accessible to educational and psychological researchers. For these reasons, PROC MIXED in SAS is likely to be a popular way of analyzing data from a split-plot design with missing values that conform to the MCAR and/or MAR missing data mechanisms. Hence, it is important to assess the performance of the KR procedure for analyzing such data under a variety of simulation conditions that conform to educational and psychological research.

Purpose of the Study

The major purpose of the study was to empirically evaluate the Type I error rates of the KR adjusted degrees of freedom F test as it is implemented in PROC MIXED with data that have missing values. The particular design that was investigated is a split-plot design with one between-subjects factor and one within-subjects factor. It was assumed that when the split-plot design is applied, observations are planned for all participants on each level of the within-subjects factor. The simulation conditions (or factors) of the study were (a) the number of levels of the between-subjects factor (A), (b) the number of levels of the within-subjects factor (B), (c) the ratio of the smallest $n_j$ to K – 1 (i.e., $n_{\min}/(K-1)$), where K is the number of levels of the within-subjects factor, (d) sample size inequality across the between-subjects factor (SSI), (e) degree of sphericity as quantified with Box's (1954) epsilon (ε), (f) nature of pairing of group sizes with covariance matrices (NPSC), (g) type of missing data mechanism (TMDM), and (h) percentage of missing data (PM). Because of the similarities between the WJ test and the KR test, and because Type I error
rates for the WJ test have been extensively evaluated by Algina and Keselman and their colleagues (Algina & Keselman, 1997; Algina & Oshima, 1994; Keselman et al., 2000; Keselman et al., 1993), the KR test was evaluated under conditions similar to those used and identified by these authors to evaluate the WJ test, with the addition of missing value conditions. In particular, the sample size recommendations set forth by Algina and Keselman and their colleagues for control of the Type I error rate by the WJ test were evaluated.


CHAPTER 2 LITERATURE REVIEW

Keselman et al. (1993) and Algina and Keselman (1997) investigated the performance of the WJ test at controlling the Type I error rate in a split-plot design under several simulation conditions. In the former study the authors investigated (a) the number of levels of the within-subjects factor (K = 4, 8), (b) the ratio of total sample size N to K – 1 (i.e., $N/(K-1)$), (c) the ratio of the smallest $n_j$ to K – 1 (i.e., $n_{\min}/(K-1)$), (d) sample size inequality, (e) pairing of $n_j$ with covariance matrices, and (f) shape of the distribution of the data. In all conditions the between-subjects factor had three levels (J = 3), and heterogeneity of covariance matrices was held constant at a ratio of 1:3:5. The latter study added J = 6, degree of departure from sphericity measured by epsilon (ε), and heterogeneity of covariance matrices with a ratio of 1:5:9. In particular, the authors were interested in the sample sizes required to control the Type I error rate when testing the within-subjects main effect and the between- by within-subjects interaction. In the first study the sample sizes ranged from 30 to 171, and in the second study they ranged from 20 to 714. From these two studies the authors provided sample size guidelines for the WJ test to control the Type I error rate with normal and non-normal data.

With normal data and J = 3, Keselman et al. (1993) concluded in the initial study that the WJ test can provide control of the Type I error rate for the K main effect when the smallest of the unequal $n_j$ is 2 to 3 times larger than K – 1. Furthermore, the authors indicated that when J = 3 the WJ test can provide control of the Type I error rate for the J × K
interaction when the smallest of the unequal $n_j$ is 3 to 4 times larger than K – 1. However, in the subsequent study Algina and Keselman (1997) indicated that the recommendation for J = 3 holds for testing the K main effect and can be slightly relaxed when J = 6. In particular, when J = 6 the smallest of the unequal $n_j$ for testing the K main effect can be relaxed to 1.33 and 1.43 times larger than K – 1 for K = 4 and K = 8, respectively. For testing the interaction, the recommendation for J = 3 still holds but must again be modified for J = 6. Specifically, when J = 6 the smallest of the unequal $n_j$ must be at least 4.75 and 5 times larger than K – 1 for K = 4 and K = 8, respectively, for testing the J × K interaction.

With non-normal data the authors concluded in the initial study that, when J = 3, the WJ test provides control of the Type I error rate for the K main effect when the smallest of the unequal $n_j$ is 3 to 4 times larger than K – 1. Also, for testing the J × K interaction the smallest of the unequal $n_j$ must be 5 to 6 times larger than K – 1 when J = 3. In the subsequent study, the authors concluded that the previous recommendation holds for testing the K main effect when J = 3. For testing the K main effect when J = 6, the smallest of the unequal $n_j$ can again be relaxed to 1.33 and 1.71 times larger than K – 1 for K = 4 and K = 8, respectively. However, the ratio of the smallest of the unequal $n_j$ to K – 1 that is needed for the WJ test to provide reasonable control of the Type I error rate when testing the J × K interaction varies with J and K. Specifically, when J = 3 the smallest of the unequal $n_j$ must be at least 6 and 8 times larger than K – 1 for K = 8 and K = 4, respectively. Lastly, when J = 6 the smallest of the unequal $n_j$ must be at least 10.14 and 14 times larger than K – 1 for K = 8 and K = 4,
respectively. The Algina and Keselman (1997) $n_{\min}/(K-1)$ recommendations for controlling the Type I error rate in a split-plot design are shown in Table 2-1.

Fai and Cornelius (1996) developed and compared four alternative test procedures that can be used to test linear hypotheses on means in multivariate studies. The four test statistics, specialized to the context of this study, are shown in Table 2-2. For each of the four statistics, Fai and Cornelius showed how to use the data to estimate the denominator degrees of freedom. The $F_2$ and $F_4$ statistics have scaling factors $\lambda_1$ and $\lambda_2$ that are estimated from the data. The $F_1$ and $F_2$ statistics use $\left(\sum_i \mathbf{A}_i'\mathbf{S}_{ij}^{-1}\mathbf{A}_i\right)^{-1}$ to estimate the covariance matrix of the mean vector, whereas $F_3$ and $F_4$ use $\hat{\mathbf{m}}^{@}$. The $F_4$ statistic is similar to the statistic obtained by using the KR option in PROC MIXED, but the formulas for the scaling factor and the degrees of freedom are not identical to those used when the KR option is employed in PROC MIXED. The test using $F_1$ is available in SAS when the Satterthwaite option is used in PROC MIXED.

Fai and Cornelius (1996) applied their tests to split-plot designs with a three-level between-subjects factor and a four-level within-subjects factor, that is, a (J = 3) × (K = 4) design. The covariance structure was compound symmetric. The design was unbalanced in that the number of subjects varied across levels of the between-subjects factor, and data were not generated for some combinations of subjects and the within-subjects treatment. Because the missing data were never generated, the missing data mechanism was effectively MCAR. Estimated Type I error rates and power were reported for the main effect of the between-subjects factor. All four tests provided reasonable control of the Type I error rate. The performance of $F_1$ and $F_3$, which do not
include a scaling factor, was very similar. Type I error rates and power for $F_4$ were always larger than for $F_3$.

Schaalje, McBride, and Fellingham (2002), reporting on a study conducted by McBride (2002), reported Type I error rates for $F_1$ and for the test obtained using the KR option in PROC MIXED. McBride investigated the performance of these tests in a split-plot design. The following provides a social science example of the design investigated by McBride. Suppose three methods for structuring interactions among students in a mathematics classroom are to be compared; n schools are randomly assigned to each method, where n was three in half of the conditions studied by McBride and five in the other half. The methods will be implemented for three, six, or nine weeks. Each school contributes K classes. Each class is assigned a single interaction quality score. In half of the conditions studied by McBride, K = 3 and the design was balanced. In the other half, K = 5, so that within each school two classes would be assigned to each of two implementation periods and one class would be assigned to the remaining implementation period. In these conditions the design was unbalanced, but no data were missing. McBride also investigated the effect of the covariance structure, which included the following five structures: compound symmetric (equal correlations and equal variances for the repeated measures), heterogeneous compound symmetric (equal correlations, but unequal variances for the repeated measures), Toeplitz, heterogeneous first-order autoregressive (correlations conform to a first-order autoregressive pattern, but the variances for the repeated measures are unequal), and first-order ante-dependence (see Wolfinger, 1996, for examples of these covariance structures). The results indicated that employing the KR option provided better control than did employing the Satterthwaite
option in PROC MIXED. Type I error rates were closer to the nominal level for balanced designs than for unbalanced designs. For unbalanced designs, Type I error rates improved as n increased.

Kenward and Roger (1997) investigated how well the original KR procedure controlled Type I error rates in four situations: (a) a four-treatment, two-period cross-over design, (b) a row-column design, (c) a random coefficients regression model for repeated measures data, and (d) a split-plot design. In (c) and (d) there were missing data. In (c) the missing data mechanism was MCAR; the missing data mechanism in (d) was not specified. In all situations, the KR test controlled the Type I error rate well.

Kowalchuk, Keselman, Algina, and Wolfinger (2004) compared the performance of the KR and the WJ procedures at controlling the Type I error rate under several simulation conditions for a (J = 3) × (K = 4) split-plot design. The simulation conditions they investigated were (a) type of population covariance structure, (b) degree of group size inequality, (c) positive and negative pairings of covariance matrices and group sample sizes, (d) shape of the data, and (e) type of covariance structure fit to the data. All simulation conditions had heterogeneous covariance matrices across the levels of the between-subjects factor (J) with a ratio of 1:3:5. The KR test coupled with modeling the true covariance structure of the data performed better than did the WJ test under all conditions with small sample sizes. Also, the authors showed that always assuming an unstructured covariance structure performed comparably to modeling the true covariance structure when using the KR test.

As shown in the preceding review, Algina and Keselman (1997), Keselman et al. (2000), and Keselman et al. (1993) demonstrated that the WJ tests of the within-subjects
main effect and the between- by within-subjects interaction in split-plot designs control the Type I error rate even when there is fairly substantial between-subjects covariance heterogeneity, provided the sample sizes are sufficiently large. These authors have recommended minimum sample sizes for use of the WJ test. Results reported by several authors (Fai & Cornelius, 1996; Kenward & Roger, 1997; Kowalchuk et al., 2004; Schaalje et al., 2002) have shown that the KR test and similar tests like the $F_4$ test (Fai & Cornelius, 1996) can control the Type I error rate for a variety of repeated measures designs when there are either missing data but no covariance heterogeneity or covariance heterogeneity but no missing data. The purpose of this study is to investigate control of the Type I error rate by the KR test as it is implemented in PROC MIXED when there are both missing data and covariance heterogeneity. This test will be evaluated under conditions similar to those studied by Algina and Keselman and their colleagues and with sample sizes that meet their recommendations. Because SAS is widely available, the research will provide useful information to researchers about the conditions under which the KR test can be confidently applied to test the within-subjects main effect and the between- by within-subjects interaction in a split-plot design.


Table 2-1. Final $n_{\min}/(K-1)$ Recommendations by Distribution, Between-Subjects Factor (J), Effect, and Within-Subjects Factor (K)

| Distribution | J | K = 4: K effect | K = 4: J × K effect | K = 8: K effect | K = 8: J × K effect |
|---|---|---|---|---|---|
| Normal | 3 | 2.00 | 3.00 | 3.00 | 4.00 |
| Normal | 6 | 1.33 | 4.75 | 1.43 | 5.00 |
| Non-normal | 3 | 3.00 | 8.00 | 4.00 | 6.00 |
| Non-normal | 6 | 1.33 | 14.00 | 1.71 | 10.14 |


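Table 2-2. Test Statistics from Fai and Cornelius (1996)

| Test statistic | Critical value |
|---|---|
| $F_1 = \bar{\mathbf{x}}'\mathbf{L}'(\mathbf{L}\mathbf{M}\mathbf{L}')^{-1}\mathbf{L}\bar{\mathbf{x}}/r$ | $F_{\alpha;\,r,\,df_1}$ |
| $F_2 = \lambda_1\,\bar{\mathbf{x}}'\mathbf{L}'(\mathbf{L}\mathbf{M}\mathbf{L}')^{-1}\mathbf{L}\bar{\mathbf{x}}/r$ | $F_{\alpha;\,r,\,df_2}$ |
| $F_3 = \bar{\mathbf{x}}'\mathbf{L}'(\mathbf{L}\hat{\mathbf{m}}^{@}\mathbf{L}')^{-1}\mathbf{L}\bar{\mathbf{x}}/r$ | $F_{\alpha;\,r,\,df_3}$ |
| $F_4 = \lambda_2\,\bar{\mathbf{x}}'\mathbf{L}'(\mathbf{L}\hat{\mathbf{m}}^{@}\mathbf{L}')^{-1}\mathbf{L}\bar{\mathbf{x}}/r$ | $F_{\alpha;\,r,\,df_4}$ |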


CHAPTER 3 METHODOLOGY

Eight variables were manipulated in this simulation: (a) the number of levels of the between-subjects factor (A), (b) the number of levels of the within-subjects factor (B), (c) the ratio of the smallest $n_j$ to K – 1 (i.e., $n_{\min}/(K-1)$), where K is the number of levels of the within-subjects factor, (d) sample size inequality across the between-subjects factor (SSI), (e) degree of sphericity as quantified with Box's (1954) epsilon (ε), (f) nature of pairing of group sizes with covariance matrices (NPSC), (g) type of missing data mechanism (TMDM), and (h) percentage of missing data (PM). For each combination of levels of the factors, five thousand replications were generated.

Design

Before describing the design of the study, it should be noted that published research typically lacks sufficient detail to assess the degree to which assumptions in a split-plot design, such as multivariate normality and homogeneity of covariance matrices, are violated. So, it is difficult to be sure about the type of data likely to be encountered by educational and psychological researchers. Nevertheless, while keeping the design close to what might be encountered in educational and psychological research, the design of this simulation study is based on Wilcox's (1995a) contention that if a procedure functions well under a variety of simulation conditions, including extreme conditions, the procedure should hold under less stringent conditions than those in the simulation study.
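
Each combination of factor levels yields an empirical Type I error rate: the proportion of the 5,000 replications in which the test rejects a true null hypothesis. The following minimal sketch, assuming the p values of the KR test for one cell have been collected into a list (all function names are hypothetical), shows how that rate, its Monte Carlo standard error, and Bradley's (1978) liberal criterion of .025 to .075 for α = .05, cited later in this chapter, fit together.

```python
import math


def empirical_type1_rate(p_values, alpha=0.05):
    """Proportion of replications in which a true H0 is rejected at level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def monte_carlo_se(rate, replications):
    """Binomial standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / replications)


def meets_bradley_liberal(rate, alpha=0.05):
    """Bradley's (1978) liberal criterion: 0.5*alpha <= rate <= 1.5*alpha."""
    return 0.5 * alpha <= rate <= 1.5 * alpha
```

With 5,000 replications and a true rate of .05, the Monte Carlo standard error is roughly .003, so sampling noise alone rarely moves an estimate far from the nominal level.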


Both the number of levels of the between-subjects factor and the number of levels of the within-subjects factor were investigated in the simulation study. Each of these factors had two levels, with J = 3 and 6 and K = 3 and 6. Algina and Keselman (1997), Keselman et al. (2000), and Keselman et al. (1993) mainly focused on K = 4 and 8. Initially, the present study was planned to investigate K = 4 and 8, but the largest number of levels of the within-subjects factor was reduced from K = 8 to K = 6 because preliminary studies indicated that PROC MIXED took an inordinate amount of time to analyze a single data set when K = 8. In addition, the smaller number of levels was reduced from K = 4 to K = 3. When K = 8 and J = 6, it took PROC MIXED approximately 2.3 minutes to produce results for one replication. The resources of the computer system were as follows: PROC MIXED in SAS 9.0 running under Windows XP Professional on a Pentium 4 computer at 2.0 GHz with 1.0 GB of RAM. Combined with the remaining factors in the simulation and using 5,000 replications, it would have taken one computer running 24 hours per day approximately two years to complete this part of the simulation. In contrast, it took PROC MIXED approximately 45 seconds to produce results when K = 6 and J = 6, and this part of the simulation would have taken approximately eight months to complete on one computer. Using six computers, the completion time was reduced to approximately one month and ten days for K = 6 and J = 6. The remaining parts of the simulation took approximately five days to complete. Hence, the entire simulation took approximately one and a half months to complete.

Keselman et al. (1993) reported that the smallest of the unequal $n_j$ must be 2 to 3 times and 3 to 4 times larger than K – 1 for testing the B main effect and the A × B
interaction, respectively, to get reasonable control of the Type I error rate for the WJ test with normally distributed data. In that study the smallest total sample size was 30 and the largest was 171. In the subsequent study by Algina and Keselman (1997), the recommendations were relaxed for testing the B main effect when J = 6. When J = 3, the recommendations of the previous study were shown to hold. For the A × B interaction, the recommendations of the previous study still held for J = 3, but were modified for J = 6 in that the smallest of the unequal $n_j$ must be 4.75 to 5 times larger than K – 1. In the Algina and Keselman study, the smallest total sample size investigated was 20 and the largest was 714. Kowalchuk et al. (2004) indicated that the KR test is preferable to the WJ test because it controls the Type I error rate better in small samples when testing the B main effect and the A × B interaction for normal or non-normal data. The total sample sizes investigated by Kowalchuk et al. were N = 30 and 45 for a split-plot design with J = 3 and K = 4. But it should be noted that Kowalchuk et al. did not look at larger designs that involve J = 6 and K = 6. None of these previous studies included missing data conditions. Also, a review of the educational and psychological literature conducted by Keselman et al. (1998) found that a total sample size of N ≤ 60 was reported in 55.3% of the split-plot designs and that six split-plot studies reported N ≥ 400.

Tables 3-1 and 3-2 contain the actual sample sizes that were used in this simulation study along with the corresponding levels of the $n_{\min}/(K-1)$ ratios. It should be noted that the sample sizes selected correspond closely to the recommendations given by Keselman et al. (1993) for J = 3 and normal data (i.e., $n_{\min}/(K-1)$ = 3 to 4) and by Algina and Keselman (1997) for J = 6 and normal data (i.e., $n_{\min}/(K-1)$ = 4.75 to 5). In addition, larger values of $n_{\min}/(K-1)$ were investigated.
These sample sizes were
selected taking into consideration the recommendations and results from the previous studies and the possibility that missing values will place more demands on the data analysis.

Keselman et al. (1998) found that unequal sample sizes in split-plot designs were common, making up a little over 50% of the split-plot designs. For this reason, unequal sample sizes were investigated. In particular, moderate and severe group size inequalities were investigated, as defined by Keselman et al. (1993) through the coefficient of variation

$$
C = \frac{1}{\bar{n}}\sqrt{\frac{\sum_{j=1}^{J}\left(n_j - \bar{n}\right)^2}{J}}, \qquad (3\text{-}1)
$$

where $\bar{n}$ is the mean group size, and C = .16 and C = .33 describe moderate and severe group size inequality, respectively. Tables 3-1 and 3-2 contain the actual group sample sizes that were used.

The degree of departure from sphericity, quantified by Box's (1954) epsilon, was also investigated. Note that the sphericity assumption is met when ε = 1. The departures from sphericity corresponded to ε = .60, .75, and .90, where ε = .60 and ε = .75 represent relatively severe and moderate violations of sphericity, respectively. In past studies ε = .40, .57, and .75 were investigated (Algina & Keselman, 1997; Keselman, Keselman, & Shaffer, 1991; Algina & Oshima, 1994). The coefficient ε has a lower bound of 1/(K – 1); for K = 3 the lower bound is ε = .5, so ε = .4 cannot be investigated. Also, according to Huynh and Feldt (1976), ε = .75 represents the lower limit of ε found in educational and psychological data. The epsilon values in this simulation study were chosen based on this contention. In particular, note that ε = .75 is
the middle value and the other two values lie .15 below and above it.

Although the WJ test is a multivariate statistic and hence should not depend on the data conforming to sphericity, Algina and Keselman (1997) found that the Type I error rate for the WJ test does vary somewhat with the value of ε. In particular, for testing the A × B interaction with normal data, the average Type I error rates were .070, .072, .071, and .073 for ε = .40, .57, .75, and 1, respectively. Although these Type I error rates were within Bradley's (1978) liberal criterion of $.025 \le \hat{\alpha} \le .075$, they were close enough to the liberal side to warrant investigation in this study, given the similarity between the WJ and KR test statistics. Additionally, it is reasonable to assume that ε < 1 coupled with missing data could place more demands on the data and hence affect the Type I error rates of the KR test. The covariance matrices corresponding to each ε in this simulation were adapted from Keselman et al. (1991) and were slightly modified for the levels of the within-subjects factor (K = 3, 6). The actual covariance matrices are shown in Tables 3-3 and 3-4.

The direction, positive or negative, of the pairing between the unequal group sizes and the heterogeneous covariance matrices was also investigated. A positive pairing is when the largest $n_j$ is paired with the covariance matrix with the largest elements, and a negative pairing is when the largest $n_j$ is paired with the covariance matrix with the smallest elements. A common condition for heterogeneity of covariance matrices in simulation studies is to set these matrices at a ratio of 1:3:5 for J = 3 and 1:3:5:1:3:5 for J = 6; that is, $\boldsymbol{\Sigma}_1 = \tfrac{1}{3}\boldsymbol{\Sigma}_2$ and $\boldsymbol{\Sigma}_3 = \tfrac{5}{3}\boldsymbol{\Sigma}_2$ for J = 3, and $\boldsymbol{\Sigma}_1 = \tfrac{1}{3}\boldsymbol{\Sigma}_2$, $\boldsymbol{\Sigma}_3 = \tfrac{5}{3}\boldsymbol{\Sigma}_2$, $\boldsymbol{\Sigma}_4 = \tfrac{1}{3}\boldsymbol{\Sigma}_5$, and $\boldsymbol{\Sigma}_6 = \tfrac{5}{3}\boldsymbol{\Sigma}_5$ for J = 6 (Algina & Keselman, 1997; Keselman et al., 1993; Keselman,
Algina, Kowalchuk, & Wolfinger, 1999b). Using this ratio of covariance heterogeneity allows for comparability with previous research results. Furthermore, previous studies have shown that this ratio and the direction of pairing can have a strong impact on the control of the Type I error rate for approximate univariate F tests, such as the Huynh-Feldt (1976) F test, and for multivariate tests, particularly when the sample size is small (Keselman & Keselman, 1990). Specifically, positive pairings produce conservative Type I error rates and negative pairings produce liberal Type I error rates.

Lastly, the MCAR and MAR missing data mechanisms were investigated in connection with a 5% and a 15% probability of missing data at each level of the within-subjects factor except the first level; there were no missing data at the first level (see the “Data Generation” section for an explanation). Only the MCAR and MAR missing data mechanisms were investigated because Padilla and Algina (2004) demonstrated that the NMAR missing data mechanism affects the validity of the KR test statistic. With respect to the probability of missing data, Gleason and Staelin (1975) show that the probability that a participant will have no missing data at any level of the within-subjects factor is $(1 - p)^K$, where p is the probability that a particular data point will be missing. Recall that there were no missing data at the first level of the within-subjects factor. Therefore, the probability of a participant having at least one missing data point is $1 - (1 - p)^{K-1}$, which was .098, .278, .226, and .556 for K = 3 with p = .05 and .15 and K = 6 with p = .05 and .15, respectively. Thus, conditions with a substantial amount of missing data were investigated.
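
The three design quantities in this section (the group-size coefficient of variation in equation 3-1, Box's epsilon, and the probability that a participant has at least one missing value) can be computed as follows. This is a minimal NumPy sketch with hypothetical function names. It uses Helmert contrasts as one convenient set of orthonormal contrasts for epsilon; that choice is an assumption of the sketch rather than the method used to build Tables 3-3 and 3-4, although any orthonormal contrast set yields the same epsilon.

```python
import numpy as np


def group_size_cv(sizes):
    """Equation 3-1: C = sqrt(sum_j (n_j - nbar)^2 / J) / nbar."""
    n = np.asarray(sizes, dtype=float)
    nbar = n.mean()
    return float(np.sqrt(np.mean((n - nbar) ** 2)) / nbar)


def helmert_contrasts(K):
    """(K-1) x K orthonormal contrasts; every row is orthogonal to the unit vector."""
    C = np.zeros((K - 1, K))
    for k in range(1, K):
        C[k - 1, :k] = 1.0
        C[k - 1, k] = -float(k)
        C[k - 1] /= np.sqrt(k * (k + 1))
    return C


def box_epsilon(sigma):
    """Box's (1954) epsilon: tr(V)^2 / ((K-1) tr(V V)), with V = C Sigma C'."""
    sigma = np.asarray(sigma, dtype=float)
    K = sigma.shape[0]
    C = helmert_contrasts(K)
    V = C @ sigma @ C.T
    return float(np.trace(V) ** 2 / ((K - 1) * np.trace(V @ V)))


def prob_any_missing(p, K):
    """P(at least one missing value) when levels 2..K are each missing
    independently with probability p and level 1 is always observed."""
    return 1.0 - (1.0 - p) ** (K - 1)
```

For instance, `prob_any_missing(0.15, 3)` equals 1 – .85² ≈ .278, and `box_epsilon` returns 1 for any compound symmetric matrix because such a matrix satisfies sphericity; a covariance matrix whose contrast variation is concentrated in a single direction attains the lower bound 1/(K – 1).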

Data Generation

The data were generated by using the model

$$Y_{ijk} = \mu_{jk} + e_{ijk}, \qquad (3\text{-}2)$$

where i = 1, 2, ..., $n_j$, j = 1, 2, ..., J, and k = 1, 2, ..., K, or in matrix terms

$$\begin{bmatrix} Y_{ij1} \\ Y_{ij2} \\ \vdots \\ Y_{ijK} \end{bmatrix} = \begin{bmatrix} \mu_{j1} \\ \mu_{j2} \\ \vdots \\ \mu_{jK} \end{bmatrix} + \begin{bmatrix} e_{ij1} \\ e_{ij2} \\ \vdots \\ e_{ijK} \end{bmatrix}. \qquad (3\text{-}3)$$

The mean vector $\boldsymbol{\mu}_j$ was the same for all J groups, and the elements of $\boldsymbol{\mu}_j$ were equal, because the focus of the study is on control of the Type I error rate by the KR test. The common elements were arbitrarily set to zero. The vector $\mathbf{e}_{ij}$ is a K × 1 random vector such that $\mathbf{e}_{ij} \sim \text{NID}(\mathbf{0}, \Sigma_j)$. All data simulations and analyses were conducted using SAS version 9.0. For each combination of levels of the simulation factors, the following steps were used to simulate the data in the jth level of the between-subjects factor.

1. Simulate Z, an $n_j$ × K matrix of pseudorandom standard normal variables, where $n_j$ is the sample size for the jth level of the between-subjects factor of the split-plot design.
2. Calculate T, the K × K upper triangular Cholesky factor of the covariance matrix $\Sigma$.
3. Calculate $\mathbf{Y} = d_j \mathbf{Z}\mathbf{T}$, where $d_j$ is a constant selected to create the required degree of covariance heterogeneity.
4. Select data points for elimination. In all conditions there were no missing data on $Y_{i1}$.
   a. For the MCAR missing data mechanism, $Y_{ik}$ (k = 2, ..., K) was eliminated from the matrix if $U_{ik} < p$. Here $p$ is the expected proportion of missing data on $Y_k$ and $U_{ik}$ is a uniform (0, 1) random variable.
   b. For the MAR missing data mechanism, $Y_{ik}$ (k = 2, ..., K) was eliminated if $U_{ik} < \Phi(dY_{i1} + c)$. Here $\Phi$ is the cumulative standard normal distribution function. The parameters d and c control the dependence of the missing data on the Y variables and the expected proportion of missing data.

In the MAR conditions, all missing values depended on the observed values at the first level of the within-subjects factor ($Y_1$). This might limit the generality of the results to longitudinal studies. One can argue that MAR missing values depending only on the first level of the within-subjects factor do not reflect the commonly used crossover design. In that design, the order in which subjects receive the treatments is permuted across the levels of the within-subjects factor (K) or assigned to a particular order sequence by some form of Latin square (Kuehl, 2000, p. 275). In this case the MAR missing values might not depend on the first level of the within-subjects factor, since that level occurs in different positions for different subjects. Nevertheless, it was felt that a benchmark for the simpler situation should be investigated and established before pursuing the more complex situation of crossover designs.

Setting d and c

The parameter d controls how dependent the missing data are on $Y_1$ in the MAR condition and was set to one. Let

$$R_{ik} = \begin{cases} 1 & \text{if } Y_{ik} \text{ is missing} \\ 0 & \text{otherwise.} \end{cases}$$

With d = 1, the biserial correlation between $R_{ik}$ and $Y_1$ is .5 in the MAR condition. Hence, the missing data indicators depend fairly heavily on $Y_1$. With d = 1, the expected proportion of missing data on $Y_k$ depends on c. In the procedure described above, the relation of the probability that $R_{ik} = 1$ to $Y_1$ is modeled by a normal
ogive (or probit model). Using well-known facts about the normal ogive model (see, for example, Lord & Novick, 1968, equations 16.9.3 and 16.9.4), it can be shown that

$$c = \Phi^{-1}(p)\sqrt{1 + d^2}. \qquad (3\text{-}4)$$

Thus, for d = 1,

$$c = \Phi^{-1}(p)\sqrt{1 + 1^2} = \sqrt{2}\,\Phi^{-1}(p).$$

So for the 5% and 15% missing data conditions, the expression becomes $c = -1.645\sqrt{2}$ and $c = -1.036\sqrt{2}$, respectively.

Data Analysis

The SAS PROC MIXED program used in this simulation is shown in Table 3-5. The following list describes various aspects of the code.

Person is a variable that identifies simulated subjects.
Score is the variable containing scores on the dependent variable.
A is a variable that identifies the levels of the between-subjects factor.
B is a variable that identifies the levels of the within-subjects factor.
ddfm = kenwardroger instructs SAS to use the KR statistic to test the main effects and the interaction.
Repeated is a keyword that tells SAS that B is a repeated measures (within-subjects) factor; it is necessary when there are missing data.
Group = A tells SAS to model a separate covariance matrix for each level of A. That is, it specifies heterogeneity of covariance matrices across the levels of A.
Subject = person tells SAS that the scores are correlated within each person.
Type = un instructs SAS to estimate an unstructured (unconstrained) covariance matrix. That is, there are no constraints on the K estimated variances or on the K(K – 1)/2 estimated covariances.
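
The generation steps described under "Data Generation", together with the choice of c in equation (3-4), can be sketched in Python as below. This is a minimal, hypothetical illustration and not the dissertation's SAS 9.0 program. The function names are invented, and the example covariance matrix is illustrative rather than one of those in Tables 3-3 and 3-4. `mar_intercept` assumes $Y_1$ has unit variance, which is the condition under which (3-4) yields the target proportion p.

```python
import math
import random
from statistics import NormalDist


def cholesky_upper(sigma):
    """Upper-triangular T with T'T = sigma (sigma symmetric positive definite)."""
    k = len(sigma)
    lower = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lower[i][j] = (sigma[i][j] - s) / lower[j][j]
    # T is the transpose of the lower factor L, so T'T = L L' = sigma.
    return [[lower[j][i] for j in range(k)] for i in range(k)]


def mar_intercept(p, d=1.0):
    """Equation (3-4): c = Phi^{-1}(p) * sqrt(1 + d^2); assumes Var(Y_1) = 1."""
    return NormalDist().inv_cdf(p) * math.sqrt(1.0 + d * d)


def simulate_group(n_j, sigma, d_j, p, mechanism, rng, d=1.0):
    """Steps 1-4 for one group; deleted scores are returned as None."""
    k = len(sigma)
    t = cholesky_upper(sigma)                                 # step 2
    c = mar_intercept(p, d)
    phi = NormalDist().cdf
    rows = []
    for _ in range(n_j):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]           # step 1: one row of Z
        y = [d_j * sum(z[m] * t[m][col] for m in range(k))    # step 3: d_j * z T
             for col in range(k)]
        for col in range(1, k):                               # step 4: level 1 kept
            cutoff = p if mechanism == "MCAR" else phi(d * y[0] + c)
            if rng.random() < cutoff:
                y[col] = None
        rows.append(y)
    return rows


rng = random.Random(2005)
example_sigma = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]  # illustrative only
for row in simulate_group(5, example_sigma, d_j=1.0, p=0.15, mechanism="MAR", rng=rng):
    print(row)
```

Because step 3 multiplies by $d_j$, the covariance matrix of group j is $d_j^2\Sigma$. A 1:3:5 ratio of covariance matrices therefore corresponds to $d_j$ values proportional to 1, $\sqrt{3}$, and $\sqrt{5}$.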

With this code the covariance matrix for the repeated measures is estimated by restricted maximum likelihood (REML), which is the default in PROC MIXED. Using REML is consistent with the comparison of REML and ML in McCulloch and Searle (2001). Several covariance structures can be used to model the covariance matrix (Wolfinger, 1996). Only the unstructured between-subjects heterogeneous (UN-H) covariance structure was used in this simulation, for three reasons.

First, it is the most general of the covariance structures that can be modeled. It contains all other covariance structures as special cases and therefore must be a correct covariance structure for the data. Second, the UN-H covariance structure does not require a researcher to have prior knowledge of the underlying true population covariance structure. This makes it especially attractive to educational and psychological researchers. Lastly, using a UN-H covariance structure comes at the cost of estimating more variance and covariance parameters, K(K + 1)/2 for each level of the between-subjects factor. Even so, under similar simulation conditions the KR F-test for the main and interaction effects controlled the Type I error rate when there were no missing data (Kowalchuk et al., 2004).
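
To make the parameter cost concrete, the small Python helper below counts covariance parameters under UN and UN-H. Its name is hypothetical. It takes UN-H to mean one unstructured matrix per level of A, as requested by `group=A`.

```python
def n_cov_params(K: int, J: int = 1, heterogeneous: bool = False) -> int:
    """Variance/covariance parameters for an unstructured K x K matrix:
    K variances + K(K - 1)/2 covariances = K(K + 1)/2. Under UN-H a
    separate matrix is estimated for each of the J groups."""
    per_group = K * (K + 1) // 2
    return per_group * (J if heterogeneous else 1)


for K in (3, 6):
    for J in (3, 6):
        print(f"K = {K}, J = {J}: UN = {n_cov_params(K)}, "
              f"UN-H = {n_cov_params(K, J, heterogeneous=True)}")
```

For example, with K = 6 and J = 6, UN-H estimates 126 covariance parameters versus 21 for a single UN matrix.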

Table 3-1. Group Sizes for Each Level of J at K = 3

Sample size inequality
J = 3    $n_{\min}/(K-1) = 4.0$      $n_{\min}/(K-1) = 6.0$
         C = .16    C = .33          C = .16    C = .33
         8          8                12         12
         10         14               15         20
         12         20               18         28

J = 6    $n_{\min}/(K-1) = 5.0$      $n_{\min}/(K-1) = 7.7$
         C = .16    C = .33          C = .16    C = .33
         10         10               15         15
         13         17               19         25
         16         24               23         35
         10         10               15         15
         13         17               19         25
         16         24               23         35

Table 3-2. Group Sizes for Each Level of J at K = 6

Sample size inequality
J = 3    $n_{\min}/(K-1) = 4.0$      $n_{\min}/(K-1) = 6.0$
         C = .16    C = .33          C = .16    C = .33
         20         20               30         30
         25         34               37         50
         30         48               44         70

J = 6    $n_{\min}/(K-1) = 5.0$      $n_{\min}/(K-1) = 7.7$
         C = .16    C = .33          C = .16    C = .33
         25         25               38         38
         31         42               47         64
         37         59               56         90
         25         25               38         38
         31         42               47         64
         37         59               56         90

Table 3-3. Pooled Covariance Matrix for K = 3
$\varepsilon$ = .90: 18.05.06.0 8.05.0 7.0
$\varepsilon$ = .75: 23.24.57.4 10.35.3 4.3
$\varepsilon$ = .60: 23.81.99.3 9.55.7 3.9

Table 3-4. Pooled Covariance Matrix for K = 6
$\varepsilon$ = .90: 18.05.07.07.06.05.0 12.08.07.06.05.0 10.06.06.05.0 10.05.05.0 9.05.0 8.0
$\varepsilon$ = .75: 29.612.77.57.05.95.9 15.17.96.06.44.9 13.26.96.05.4 9.46.04.8 8.05.0 5.9
$\varepsilon$ = .60: 28.84.810.19.88.37.3 17.48.17.46.94.1 9.97.76.55.7 8.35.64.3 5.64.4 4.3
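
The $\varepsilon$ levels in Tables 3-3 and 3-4 refer to Box's epsilon. The Python sketch below shows one standard way to compute $\varepsilon$ from a population covariance matrix, via the double-centred matrix $S^{*} = H\Sigma H$ with $H = I - \tfrac{1}{K}\mathbf{1}\mathbf{1}'$. It is illustrative only: the function name is hypothetical, and it is not applied to the tabled matrices here.

```python
def box_epsilon(sigma):
    """Box's epsilon for a K x K covariance matrix:
    epsilon = (tr S*)^2 / ((K - 1) * tr(S*^2)), with S* the double-centred
    matrix. Equals 1 under sphericity; its lower bound is 1 / (K - 1)."""
    k = len(sigma)
    row_means = [sum(row) / k for row in sigma]
    col_means = [sum(sigma[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row_means) / k
    s = [[sigma[i][j] - row_means[i] - col_means[j] + grand for j in range(k)]
         for i in range(k)]
    trace = sum(s[i][i] for i in range(k))
    trace_sq = sum(s[i][j] * s[j][i] for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * trace_sq)


compound_symmetry = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
unequal_variances = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
print(box_epsilon(compound_symmetry))   # sphericity holds, so epsilon is 1
print(box_epsilon(unequal_variances))   # departure from sphericity, epsilon < 1
```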

Table 3-5. Example of PROC MIXED Program

proc mixed;
   class person A B;
   model score = A B A*B / ddfm=kenwardroger;
   repeated B / subject=person group=A type=un;
run;

CHAPTER 4

RESULTS AND DISCUSSION

This chapter presents the results of analyses of the Type I error rates of the Kenward-Roger (KR) tests under the investigated conditions (or factors). For each replication of each condition, three KR tests were conducted: one for the within-subjects main effect, one for the between-subjects main effect, and one for the between- by within-subjects interaction. The decisions about the null hypothesis for these tests were then analyzed.

The simulation conditions (or factors) under investigation were:
(a) the number of levels of the between-subjects factor (J);
(b) the number of levels of the within-subjects factor (K);
(c) the ratio of the smallest $n_j$ to K – 1 (i.e., $n_{\min}/(K - 1)$), where K is the number of levels of the within-subjects factor;
(d) sample size inequality across the levels of the between-subjects factor (SSI);
(e) degree of sphericity as quantified by Box's epsilon ($\varepsilon$);
(f) nature of pairing of group sizes with covariance matrices (NPSC);
(g) type of missing data mechanism (TMDM); and
(h) percent of missing data (PM).

All of these conditions were between-subjects factors in the simulation study. For each combination of the investigated conditions, the results of applying the KR tests to 5,000 replications were available. Hence, for each test in each replication of each combination of conditions, the KR test produced one p-value. The result of each test was summarized by a dichotomous variable defined as follows:

$$\text{Type I error} = \begin{cases} 1 & \text{if } p\text{-value} \le .05 \\ 0 & \text{otherwise.} \end{cases}$$
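
The dichotomous variable above and the size of the simulation can be sketched in Python as follows. This is a hypothetical illustration: the function names are invented and the p-values are stand-ins, not KR output. The uniform draws rely on the fact that a valid test's p-values are Uniform(0, 1) under a true null. The condition count assumes the 2 × 2 × 2 × 2 × 3 × 2 × 2 × 2 layout used for the logistic regression analyses.

```python
import random


def type_i_indicator(p_value: float, alpha: float = 0.05) -> int:
    """1 if the test rejects the (true) null hypothesis at level alpha, else 0."""
    return 1 if p_value <= alpha else 0


def empirical_rate(p_values, alpha: float = 0.05) -> float:
    """Estimated Type I error rate for one condition: the mean of the indicators."""
    return sum(type_i_indicator(p, alpha) for p in p_values) / len(p_values)


N_REPS = 5000
n_conditions = 2 ** 7 * 3              # seven two-level factors and three levels of epsilon
n_tests = n_conditions * N_REPS        # tests per effect (between, within, or interaction)
n_per_cell_2x2 = n_tests // 4          # tests behind each cell of a table of two two-level factors

# Stand-in p-values for one condition.
rng = random.Random(42)
stand_in_p = [rng.random() for _ in range(N_REPS)]
print(n_conditions, n_tests, n_per_cell_2x2, empirical_rate(stand_in_p))
```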

This dichotomous Type I error variable was analyzed with a $2\,(J) \times 2\,(K) \times 2\,(n_{\min}/(K-1)) \times 2\,(\text{SSI}) \times 3\,(\varepsilon) \times 2\,(\text{NPSC}) \times 2\,(\text{TMDM}) \times 2\,(\text{PM})$ logistic regression model.

Criteria for Assessing Model Fit and Type I Errors

Both forward selection and backward elimination have been proposed as methods for selecting models. The models analyzed in this study were selected by forward selection. Backward elimination was also applied to the data and selected more complicated models than forward selection did. However, these models did not add any important insight into the effects of the factors on the Type I error rate.

In this study, forward selection started by testing the goodness of fit of the intercept-only model against the saturated model. If the intercept-only model was rejected, a model containing the intercept and main effects was compared to the saturated model. If that model was rejected, a model containing the intercept, main effects, and two-way interactions was compared to the saturated model. This process continued until a non-significant goodness-of-fit test was obtained.

To assess the Type I error rates, Bradley (1978) presented conservative and liberal criteria for identifying conditions in which hypothesis testing procedures work adequately. His conservative criterion is $0.9\alpha \le \hat{\alpha} \le 1.1\alpha$ and his liberal criterion is $0.5\alpha \le \hat{\alpha} \le 1.5\alpha$. Here $\alpha$ is the nominal Type I error rate and $\hat{\alpha}$ is the actual Type I error rate. Using $\alpha = .05$, the ranges are $.045 \le \hat{\alpha} \le .055$ for the conservative criterion and $.025 \le \hat{\alpha} \le .075$ for the liberal criterion.

Another way to judge whether a test controls the Type I error rate is the binomial standard error method. In this method one forms a confidence interval (CI) for the estimated Type I error rate as follows:
$$\hat{\alpha} \pm 1.96\left[\frac{\alpha(1 - \alpha)}{N_R}\right]^{1/2},$$

where $\alpha$ is the nominal alpha level, $\hat{\alpha}$ is the actual Type I error rate, and $N_R$ is the number of replications. The two criteria can lead to different interpretations of the Type I error results. There is no universal standard for judging Type I error rates, and Bradley's liberal criterion was used in this study.

Analysis of Type I Error Rates for the Between-Subjects Main Effect

The distribution of Type I error rates for the between-subjects main effect is shown in Figure 4-1 (M = .0500, SD = .0031). The lowest Type I error rate is .0410 and the highest is .0574. Thus, in all conditions the Type I error rate was well within Bradley's (1978) liberal criterion interval, although these extremes fall outside his conservative interval. Forward selection indicated that the intercept-only logistic model could not be rejected, $\chi^2(383) = 398.64$, p = .28.

Even so, Wald tests of the main effects were conducted to determine whether there was any evidence that the factors affected Type I error rates. Results of these tests are presented in Table 4-1. There are significant main effects for three factors: the number of levels of the within-subjects factor (K), the sample size inequality across the levels of the between-subjects factor (SSI), and the nature of pairing of the group sizes ($n_j$) with the covariance matrices (NPSC). Significant main effects can be interpreted in terms of the effects of the factors on the logits, odds, or probabilities of a Type I error. In the author's opinion, average Type I error rates are more intuitive to interpret than log odds ratios or odds ratios for this study. The average Type I error rates for the number of levels of the within-subjects factor are M = .0492 and M = .0501 for K = 3 and 6, respectively. For the sample size
inequality across the levels of the between-subjects factor, the average Type I error rates are M = .0493 and M = .0500 for SSI = .16 and .33, respectively. For the nature of pairing of the group sizes ($n_j$) with the covariance matrices, they are M = .0493 and M = .0500 for positive and negative pairings, respectively.

The Wald tests show that all of these factors had a significant effect. Nevertheless, their effects are negligible and the average Type I error rates are well within Bradley's liberal criterion. This indicates that the KR between-subjects omnibus test controls the Type I error rate well across the levels of the within-subjects factor, the sample size inequality across the levels of the between-subjects factor, and the nature of pairing of the group sizes ($n_j$) with the covariance matrices. The other factors did not affect the Type I error rate, and the average Type I error rate was .05 across all conditions. It therefore appears that the KR between-subjects omnibus test controls the Type I error rate well at all levels of the factors investigated in this study. These rather small effects are consistent with the finding that the intercept-only logistic model is adequate for the data.

Analysis of Type I Error Rates for the Within-Subjects Main Effect

The distribution of Type I error rates for the within-subjects main effect is shown in Figure 4-2 (M = .0524, SD = .0051). The lowest Type I error rate is .0406 and the highest is .0704. Hence, in all conditions the Type I error rate was within Bradley's (1978) liberal criterion interval. The goodness-of-fit tests for the logistic models fitted to the Type I error rates for the within-subjects main effect are shown in Table 4-2. They indicate that the two-way interaction model fits the data adequately. The backward elimination procedure indicated that the three-way interaction model is adequate for the data. But as
stated previously, examination of the three-way interactions did not add any insight into the effects of the factors on the Type I error rate. Thus, the two-way interaction model was selected for further analysis. The Wald tests for the two-way interaction model are shown in Table 4-3. There are three significant main effects and eleven significant two-way interactions.

As before, the effects were interpreted by inspecting the average Type I error rates. All of the factors with significant main effects enter into some of the significant two-way interactions. Effects are therefore interpreted from two-way tables of the average Type I error rates for each two-way interaction. In all of the following tables, the average Type I error rates are within Bradley's liberal criterion, and hence acceptable. Nevertheless, the Type I error rates are slightly affected by various combinations of the investigated factors.

Effect of Number of Within-Subjects Levels (K)

Although the main effect of K was not significant, K entered into significant interactions with each of $n_{\min}/(K - 1)$, TMDM, and PM. Average Type I error rates for K in combination with each of these factors are shown in Table 4-4. The results indicate that K has about the same size effect at both levels of $n_{\min}/(K - 1)$ and that both effects are fairly small. The effect of K is negligible when the TMDM is MCAR, and larger but still fairly small when the TMDM is MAR. The effect of K is also negligible when the missing data rate is 5%; it is larger but still small when the missing data rate is 15%. Overall, the results suggest that K has, at most, a small effect on the Type I error rate, provided $K \le 6$.
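
Bradley's two criteria and the binomial standard error method described under "Criteria for Assessing Model Fit and Type I Errors" can be applied mechanically to the extreme rates reported in this chapter. The Python sketch below is illustrative, and its function names are hypothetical. It assumes a 1.96 multiplier with the binomial standard error evaluated at the nominal $\alpha$. With that standard error, asking whether $\alpha$ lies within $\hat{\alpha}$ ± the half-width is the same as asking whether $\hat{\alpha}$ lies within $\alpha$ ± the half-width, so a single band around $\alpha$ suffices.

```python
import math

ALPHA = 0.05   # nominal level used throughout the study
N_R = 5000     # replications per condition


def bradley(rate, alpha=ALPHA):
    """(meets conservative criterion, meets liberal criterion) for an estimated rate."""
    conservative = 0.9 * alpha <= rate <= 1.1 * alpha
    liberal = 0.5 * alpha <= rate <= 1.5 * alpha
    return conservative, liberal


def binomial_band(alpha=ALPHA, n_reps=N_R, z=1.96):
    """alpha +/- z * sqrt(alpha * (1 - alpha) / n_reps)."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_reps)
    return alpha - half_width, alpha + half_width


# Lowest and highest condition-level rates reported in this chapter.
extremes = {
    "between-subjects": (0.0410, 0.0574),
    "within-subjects": (0.0406, 0.0704),
    "interaction": (0.0400, 0.0750),
}
low, high = binomial_band()
for effect, rates in extremes.items():
    for rate in rates:
        print(effect, rate, bradley(rate), "inside binomial band:", low <= rate <= high)
print("binomial band:", round(low, 4), round(high, 4))
```

With $N_R$ = 5,000 the binomial standard error is $\sqrt{.05 \times .95 / 5000} \approx .0031$. This is close to the SD of .0031 reported for the between-subjects rates, which is consistent with the intercept-only model being adequate for that test.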

Effect of Number of Between-Subjects Levels (J)

The main effect of J was significant but was modified by significant interactions with SSI, NPSC, and TMDM. The average Type I error rates for combinations of J with SSI, NPSC, and TMDM are shown in Table 4-5. When the coefficient of variation for SSI is .33, the effect of J is negligible. The effect is larger when SSI is .16, but again quite small. Similarly, the effect of J is negligible when the pairing between covariance matrices and sample sizes is negative and very small when the pairing is positive. At most, the effect of J on the Type I error rate of the test of the within-subjects effect is very small. For TMDM, the effect of J is negligible under MCAR and slightly larger, but still small, under MAR. As with the number of within-subjects levels, the overall results suggest that J has, at most, a small effect on the Type I error rate, provided $J \le 6$.

Effect of $n_{\min}/(K - 1)$

The main effect of $n_{\min}/(K - 1)$ was non-significant, but this ratio entered into several significant interactions. Average Type I error rates for $n_{\min}/(K - 1)$ in combination with K and PM are shown in Table 4-6. Despite the significant interactions, the results in Table 4-6 suggest that the effects of $n_{\min}/(K - 1)$ are negligible to very small.

Effect of Type of Missing Data Mechanism (TMDM)

The main effect of TMDM was significant; however, it was modified by significant interactions with K, J, SSI, NPSC, and PM. Average Type I error rates for TMDM in combination with K, J, SSI, NPSC, and PM are shown in Table 4-7. When K = 3, the effect of TMDM is fairly small. The effect is slightly larger when K = 6, in that the Type I error rate increases slightly under MAR. As with the K factor, the effect of TMDM is small when J = 3 and increases slightly when J = 6, in that the Type I error
rate increases a little under MAR. When the coefficient of variation for SSI is .16, the effect of TMDM is small. The effect is larger, but still small, when SSI is .33. The TMDM effect is fairly small when the pairing between covariance matrices and sample sizes is negative and larger, but still small, when the pairing is positive. The effect of TMDM is negligible when the missing data rate is 5%; when the missing data rate is 15%, the Type I error rate increases a little under MAR. Overall, the results suggest that the effect of TMDM is negligible to small, in that the Type I error rate is well controlled under both the MCAR and MAR missing data mechanisms.

Effect of Percent of Missing Data (PM)

Although the main effect of PM was not significant, PM entered into significant interactions with K, $n_{\min}/(K - 1)$, NPSC, and TMDM. Average Type I error rates for PM in combination with each of these factors are shown in Table 4-8. The effect of the missing data rate is small when K = 3 and larger, but still fairly small, when K = 6. The missing data rate has about the same size effect at both levels of $n_{\min}/(K - 1)$. Likewise, PM has about the same size effect at both levels of NPSC, and both effects are small. Additionally, the effect of PM is negligible when TMDM is MCAR; it is larger but still small when TMDM is MAR. On the whole, the results suggest that the effect of PM on the Type I error rate is small, provided PM $\le$ 15%.

Analysis of Type I Error Rates for the Between- by Within-Subjects Interaction

The distribution of Type I error rates for the interaction effect is presented in Figure 4-3 (M = .0537, SD = .0068). The lowest Type I error rate is .0400 and the
highest is .0750. Consequently, in all conditions the Type I error rate was once again within Bradley's liberal criterion interval. The results of the goodness-of-fit tests for the logistic models fitted to the Type I error rates for the interaction effect are shown in Table 4-9. They indicate that the two-way interaction model fits the data adequately. The backward elimination procedure indicated that the three-way interaction model is adequate for the data. However, examination of the three-way interactions did not add any insight into the effects of the factors on the Type I error rate. Thus, the two-way interaction model was selected for further analysis.

Wald tests for the two-way interaction model are shown in Table 4-10. There are three significant main effects and sixteen significant two-way interactions. As with the within-subjects main effect, the effects were interpreted by inspecting the average Type I error rates. All of the main effects enter into some of the two-way interactions, so effects were interpreted from two-way tables of the average Type I error rates for each two-way interaction. As before, in all of the following tables the average Type I error rates are within Bradley's liberal criterion, and hence acceptable. Nevertheless, the Type I error rates are slightly affected by various combinations of the investigated factors.

Effect of Number of Within-Subjects Levels (K)

The main effect of K was significant but was modified by significant interactions with $n_{\min}/(K - 1)$, NPSC, TMDM, and PM. The average Type I error rates for K in combination with each of these factors are shown in Table 4-11. The results suggest that K has a slightly larger effect when $n_{\min}/(K - 1)$ is small than when
$n_{\min}/(K - 1)$ is large. Nevertheless, the effect is small at both levels of $n_{\min}/(K - 1)$. K has about the same size effect at both levels of NPSC. The effect of K is small when TMDM is MCAR and larger, but still small, when TMDM is MAR. Lastly, the effect of K is small when the missing data rate is 5% and larger, but still fairly small, when the missing data rate is 15%. Overall, as with the within-subjects main effect, the results suggest that K has, at most, a small effect on the Type I error rate, provided $K \le 6$.

Effect of Number of Between-Subjects Levels (J)

The main effect of J was not significant, but J entered into significant interactions with TMDM and PM. Average Type I error rates for combinations of J with TMDM and PM are shown in Table 4-12. When TMDM is MCAR, the effect of J is negligible. The effect increases slightly when TMDM is MAR, but is still fairly small. Similarly, the effect of J is negligible when the missing data rate is 5%, and is larger but still small when the missing data rate is 15%. As with the number of within-subjects levels, the overall results suggest that J has, at most, a small effect on the Type I error rate, provided $J \le 6$.

Effect of $n_{\min}/(K - 1)$

The main effect of $n_{\min}/(K - 1)$ was non-significant, but this ratio entered into several significant interactions. Average Type I error rates for $n_{\min}/(K - 1)$ in combination with K, SSI, NPSC, TMDM, and PM are shown in Table 4-13. For every factor except NPSC and TMDM, the effect of $n_{\min}/(K - 1)$ was negligible under the less extreme level of the other factor and small under the more severe level. With regard to NPSC, the effect of $n_{\min}/(K - 1)$ was negligible when the pairing is positive and small when the pairing is negative. Despite the significant interaction with TMDM, the results suggest that the effect of $n_{\min}/(K - 1)$ is negligible to very small at both levels of TMDM.
Overall, the results suggest that the effects of $n_{\min}/(K - 1)$ on the Type I error rate are negligible to small.

Effect of Sample Size Inequality (SSI)

The main effect of SSI was not significant; however, it entered into significant interactions with $n_{\min}/(K - 1)$, NPSC, TMDM, and PM. The average Type I error rates for combinations of SSI with the other factors are shown in Table 4-14. The effect of SSI is negligible when $n_{\min}/(K - 1)$ is large and fairly small when $n_{\min}/(K - 1)$ is small. For NPSC, the effect of SSI was negligible under a positive pairing of covariance matrices with group sample sizes and small under a negative pairing. For the remaining factors, the effect of SSI was negligible under the less extreme level of the factor and small under the more extreme level. Overall, the effects of SSI on the Type I error rate appear to be negligible to small.

Effect of Nature of Pairing of Group Sizes with Covariance Matrices (NPSC)

The main effect of NPSC was significant but was modified by significant interactions with K, $n_{\min}/(K - 1)$, SSI, $\varepsilon$, and PM. The average Type I error rates for NPSC in combination with these other factors are shown in Table 4-15. The effect of NPSC was small and about the same size at both levels of K. The NPSC effect was fairly small for large $n_{\min}/(K - 1)$ and larger, but still small, when $n_{\min}/(K - 1)$ was small. For SSI and PM, the NPSC effect was small under their less extreme levels and a bit larger under their more extreme levels. Lastly, the NPSC effect is small when $\varepsilon$ = .90 and NPSC is negative, and slightly larger, but still fairly small, under the remaining conditions. Overall, the effect of NPSC on the Type I error rate is fairly small.

Effect of Type of Missing Data Mechanism (TMDM)

The main effect of TMDM was significant; however, it was modified by significant interactions with K, J, $n_{\min}/(K - 1)$, SSI, and PM. The average Type I error rates for these two-way interactions are shown in Table 4-16. The effect of TMDM is negligible and about the same at both levels of J and of SSI. The TMDM effect was small and about the same at both levels of each of the remaining factors. Even though the interactions were significant, the results generally suggest that TMDM has a negligible to small effect on the Type I error rate.

Example

Data are taken from Little and Rubin (2002, p. 243). They are distances from the center of the pituitary to the maxillary fissure for 16 boys and 11 girls at ages 8, 10, 12, and 14. In this data set gender is the between-subjects factor and age is the within-subjects factor. The original data were complete but unbalanced. However, using a missing data mechanism designed to be MAR but not MCAR, Little and Rubin deleted nine values at age 10. Specifically, participants with low values at age 8 are more likely to have missing values at age 10.

To create unequal covariance matrices for this example, I multiplied the measurements for boys and girls by 1.16 and .79, respectively. This created a positive pairing condition in which the group with the larger sample size (males) was paired with the covariance matrix with the larger elements. Residual profiles for the complete data are shown in Figures 4-4 and 4-5 and illustrate greater variability for males. The complete data were analyzed with the KR F-test under two models: a homogeneous model, which ignores heterogeneity of covariance, and a heterogeneous model, which accounts for it. The SAS PROC
MIXED program for the heterogeneous model is shown in Table 4-17. To estimate the homogeneous model, delete the code "group=gender." The information criteria and KR F-test results for these analyses are shown in Tables 4-18 and 4-19, respectively. According to the likelihood-ratio test, $\chi^2(10) = 38.5$, p < .001, the heterogeneous model fits the data significantly better than the homogeneous model. Additionally, Akaike's (1974) information criterion (AIC) and the Bayesian information criterion (BIC; Schwarz, 1978) are smaller for the heterogeneous model, indicating that it fits the data better.

One noticeable feature is that the p-value of the KR F-test for the interaction is larger (i.e., the test is more conservative) for the homogeneous model than for the heterogeneous model. This is consistent with past research demonstrating that a positive pairing of group sample size with covariance matrix produces a more conservative test of the between- by within-subjects interaction (Keselman & Keselman, 1990). Nevertheless, the results for the complete data show that both main effects and the interaction are statistically significant under both the homogeneous and the heterogeneous models.

The same analyses were conducted on the listwise-deleted data, in which participants with a missing value at age 10 were excluded. The results are shown in Table 4-20. In this case, the interaction is not significant whether the covariance matrices are assumed to be homogeneous or covariance heterogeneity is modeled. This reflects the loss of power associated with listwise deletion, particularly with small sample sizes. The main effects remain significant, but each F statistic in these analyses is smaller than the corresponding F statistic for the complete data analyses, another indication of the loss of power. As with the complete data, the p-value associated with the interaction is larger for the
homogeneous model than for the heterogeneous model. This result is consistent with the spuriously conservative nature of the test that incorrectly assumes homogeneous covariance matrices.

Lastly, the analysis was conducted on the incomplete data, including all of the observed data points. The information criteria and KR F-tests for these analyses are shown in Tables 4-21 and 4-22, respectively. According to the likelihood-ratio test, $\chi^2(10) = 35.3$, p < .001, the heterogeneous model fits the data significantly better than the homogeneous model. Both AIC and BIC support this conclusion, as they did for the complete data.

Note that the values of the F statistics for these analyses lie between those for the complete data and those for the listwise-deleted data. Because the F values for the main effects are so large, the p-values for the main effects are less than .0001 in all three analyses. As in the two preceding analyses, the p-value associated with the interaction is higher for the homogeneous model than for the heterogeneous model. Under the homogeneous model the interaction is not significant, for two reasons. First, modeling the data incorrectly, by not accounting for the inequality of covariance matrices, negatively affected the KR F-test. Second, there is some loss of power due to the missing data. When the heterogeneity of covariance was taken into account, that is, when the correct model was specified, the KR F-test resulted in a significant interaction.
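
The following Python sketch checks three pieces of arithmetic in this example. The first is the p-values of the $\chi^2(10)$ likelihood-ratio tests. The second is why those tests have 10 df: K = 4 ages give 10 UN parameters, versus 20 when `group=gender` is added. The third is the effect of the 1.16 and .79 multipliers on the covariance matrices. The function name is hypothetical, and the closed-form chi-square tail it uses is exact only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with even df:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# UN for K = 4 ages has 4 * 5 / 2 = 10 parameters; UN-H (two genders) has 20.
lr_df = 2 * (4 * 5 // 2) - 4 * 5 // 2
for label, stat in [("complete data", 38.5), ("incomplete data", 35.3)]:
    print(label, stat, lr_df, chi2_sf_even_df(stat, lr_df))

# Multiplying scores by a constant multiplies the covariance matrix by its square.
boys, girls = 1.16 ** 2, 0.79 ** 2
print("boys:", boys, "girls:", girls, "ratio:", boys / girls)
```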

Table 4-1. Wald Tests of Type I Error for the Between-Subjects Main Effect

Source                  df    Wald    p-value
K                       1     7.35    .0067
J                       1     0.98    .3222
$n_{\min}/(K-1)$        1     0.08    .7751
SSI                     1     5.53    .0187
NPSC                    1     5.04    .0247
$\varepsilon$           2     1.55    .4617
TMDM                    1     2.21    .1367
PM                      1     0.08    .7751
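
The Python sketch below recomputes the p-values in Table 4-1 from the Wald statistics and degrees of freedom. It uses closed-form chi-square tails for 1 df ($\operatorname{erfc}\sqrt{x/2}$) and 2 df ($e^{-x/2}$). The function name is hypothetical. Small discrepancies from the tabled values are expected because the published Wald statistics are rounded to two decimals.

```python
import math


def wald_p(stat: float, df: int) -> float:
    """Chi-square upper-tail p-value for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


# (source, df, Wald statistic, reported p-value) from Table 4-1
table_4_1 = [
    ("K", 1, 7.35, 0.0067), ("J", 1, 0.98, 0.3222),
    ("n_min/(K-1)", 1, 0.08, 0.7751), ("SSI", 1, 5.53, 0.0187),
    ("NPSC", 1, 5.04, 0.0247), ("epsilon", 2, 1.55, 0.4617),
    ("TMDM", 1, 2.21, 0.1367), ("PM", 1, 0.08, 0.7751),
]
for source, df, stat, reported in table_4_1:
    print(f"{source:12s} df={df} Wald={stat:5.2f} "
          f"p={wald_p(stat, df):.4f} (reported {reported:.4f})")
```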

Table 4-2. Goodness-of-Fit Tests for the Within-Subjects Main Effect

Model             Deviance    df     p-value
Intercept-only    994.43      383    < .0001
Main effects      593.84      374    < .0001
Two-way           354.24      339    .2735

Table 4-3. Wald Tests of Type I Error for Within-Subjects Main Effect
p-value df Source Wald K J n min /( K – 1) SSI NPSC TMDM PM K J K n min /( K – 1) K SSI K NPSC K K TMDM K PM J n min /( K – 1) J SSI J NPSC J J TMDM J PM n min /( K – 1) SSI n min /( K – 1) NPSC n min /( K – 1) n min /( K – 1) TMDM n min /( K – 1) PM SSI NPSC SSI SSI TMDM SSI PM NPSC NPSC TMDM NPSC PM TMDM PM TMDM PM 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 2 1 1 2 2 1 2.64 .1045 .0374 .4771 .9355 4.33 0.51 0.01 6.11 0.96 11.95 0.20 0.01 4.27 0.91 0.74 4.13 25.66 35.56 2.67 4.64 17.72 0.61 14.10 3.80 0.03 1.60 5.80 2.91 9.78 2.71 4.06 4.53 1.11 .0135 .6175 .0005 .6573 .9562 .0387 .3391 .3907 .1270 .0001 .0001 .1026 .0311 .0001 0.53 4.06 4.41 0.91 1.53 78.88 .7380 .0002 .0514 .8724 .2057 .0550 .0882 .0018 .0998 .1312 .0333 .2912 .7673 .0440 .0358 .6345 .4649 .0001

Table 4-4. Effect of Number of Within-Subjects Levels (K) on Type I Error Rates for the Within-Subjects Main Effect

Factor           Factor level    K = 3    K = 6
n_min/(K – 1)    Small           .0503    .0541
                 Large           .0514    .0539
TMDM             MCAR            .0502    .0516
                 MAR             .0515    .0564
PM               5%              .0496    .0507
                 15%             .0521    .0574

Note. Each proportion is out of 480,000 hypothesis tests. For K = 3, small and large n_min/(K – 1) are 4.0 and 6.0, respectively; for K = 6, n_min/(K – 1) are 5.0 and 7.7, respectively.
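Each proportion in Tables 4-4 through 4-16 is an observed rejection rate. Its Monte Carlo precision and its standing against Bradley's (1978) criteria can therefore be checked directly. The sketch below assumes independent replications, so the sampling is binomial. It takes the liberal band as .5α to 1.5α (.025 to .075) and the conservative band as .9α to 1.1α (.045 to .055) for α = .05. The helper names are hypothetical.

```python
import math

ALPHA = 0.05


def monte_carlo_se(rate: float, n_tests: int) -> float:
    """Binomial standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_tests)


def bradley_criteria(rate: float, alpha: float = ALPHA) -> dict:
    """Check a rate against Bradley's (1978) liberal and conservative bands."""
    return {
        "liberal": 0.5 * alpha <= rate <= 1.5 * alpha,       # .025 to .075
        "conservative": 0.9 * alpha <= rate <= 1.1 * alpha,  # .045 to .055
    }


n_tests = 480_000
se = monte_carlo_se(ALPHA, n_tests)
print(f"SE at the nominal rate: {se:.5f}")
for rate in (0.0503, 0.0574):  # two rates reported in Table 4-4
    z = (rate - ALPHA) / se
    print(rate, bradley_criteria(rate), f"z = {z:.1f}")
```

With 480,000 tests, the standard error at the nominal rate is about .0003. Differences of .002 to .005 between cells are therefore far larger than simulation noise. This helps explain how effects of this size can be statistically significant while every rate stays inside the liberal band.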

Table 4-5. Effect of Number of Between-Subjects Levels (J) on Type I Error Rates for the Within-Subjects Main Effect

Factor    Factor level    J = 3    J = 6
SSI       .16             .0515    .0530
          .33             .0526    .0527
NPSC      Positive        .0511    .0532
          Negative        .0530    .0525
TMDM      MCAR            .0511    .0507
          MAR             .0529    .0550

Note. Each proportion is out of 480,000 hypothesis tests.

Table 4-6. Effect of n_min/(K – 1) on Type I Error Rates for the Within-Subjects Main Effect

Factor    Factor level    Small    Large
K         3               .0503    .0514
          6               .0541    .0539
PM        5%              .0494    .0509
          15%             .0550    .0545

Note. Each proportion is out of 480,000 hypothesis tests. For K = 3, small and large n_min/(K – 1) are 4.0 and 6.0, respectively; for K = 6, n_min/(K – 1) are 5.0 and 7.7, respectively.

Table 4-7. Effect of Type of Missing Data Mechanism (TMDM) on Type I Error Rates for the Within-Subjects Main Effect

Factor    Factor level    MCAR     MAR
K         3               .0502    .0515
          6               .0516    .0564
J         3               .0511    .0529
          6               .0507    .0550
SSI       .16             .0511    .0534
          .33             .0508    .0545
NPSC      Positive        .0503    .0540
          Negative        .0515    .0539
PM        5%              .0501    .0502
          15%             .0517    .0577

Note. Each proportion is out of 480,000 hypothesis tests.

Table 4-8. Effect of Percent of Missing Data (PM) on Type I Error Rates for the Within-Subjects Main Effect

Factor           Factor level    5%       15%
K                3               .0496    .0521
                 6               .0507    .0574
n_min/(K – 1)    Small           .0494    .0550
                 Large           .0509    .0545
NPSC             Positive        .0502    .0541
                 Negative        .0501    .0554
TMDM             MCAR            .0501    .0517
                 MAR             .0502    .0577

Note. Each proportion is out of 480,000 hypothesis tests. Small and large n_min/(K – 1) are 4.0/5.0 and 6.0/7.7, respectively.

Table 4-9. Goodness-of-Fit Tests for the Between- by Within-Subjects Interaction

Model             Deviance    df    p-value
Intercept-only     1700.15   383    .0001
Main effects        736.58   374    .0001
Two-way             368.79   339    .1276

Table 4-10. Wald Tests of Type I Error for the Between- by Within-Subjects Interaction

Source                     df    Wald   p-value
K                           1    7.01   .0081
J                           1    0.19   .6636
n_min/(K – 1)               1    2.01   .1566
SSI                         1    0.01   .9168
NPSC                        1   27.57   .0001
ε                           2    0.15   .9270
TMDM                        1   77.07   .0001
PM                          1    1.83   .1762
K × J                       1    1.23   .2668
K × n_min/(K – 1)           1   11.19   .0008
K × SSI                     1    0.38   .5362
K × NPSC                    1    5.12   .0237
K × ε                       2    1.04   .5934
K × TMDM                    1   50.16   .0001
K × PM                      1   62.50   .0001
J × n_min/(K – 1)           1    0.89   .3457
J × SSI                     1    0.12   .7338
J × NPSC                    1    3.68   .0550
J × ε                       2    2.06   .3572
J × TMDM                    1   10.31   .0013
J × PM                      1   13.67   .0002
n_min/(K – 1) × SSI         1    7.05   .0079
n_min/(K – 1) × NPSC        1   39.00   .0001
n_min/(K – 1) × ε           2    0.90   .6382
n_min/(K – 1) × TMDM        1   40.12   .0001
n_min/(K – 1) × PM          1    5.29   .0215
SSI × NPSC                  1    8.90   .0029
SSI × ε                     2    2.40   .3009
SSI × TMDM                  1    7.08   .0078
SSI × PM                    1    4.18   .0409
NPSC × ε                    2    6.04   .0488
NPSC × TMDM                 1    1.49   .2228
NPSC × PM                   1   13.95   .0002
ε × TMDM                    2    0.29   .8629
ε × PM                      2    0.35   .8386
TMDM × PM                   1   63.37   .0001

Table 4-11. Effect of Number of Within-Subjects Levels (K) on Type I Error Rates for the Between- by Within-Subjects Interaction

Factor           Factor level    K = 3    K = 6
n_min/(K – 1)    Small           .0509    .0577
                 Large           .0507    .0553
NPSC             Positive        .0482    .0544
                 Negative        .0534    .0586
TMDM             MCAR            .0519    .0552
                 MAR             .0497    .0578
PM               5%              .0492    .0520
                 15%             .0524    .0611

Note. Each proportion is out of 480,000 hypothesis tests. For K = 3, small and large n_min/(K – 1) are 4.0 and 6.0, respectively; for K = 6, n_min/(K – 1) are 5.0 and 7.7, respectively.

Table 4-12. Effect of Number of Between-Subjects Levels (J) on Type I Error Rates for the Between- by Within-Subjects Interaction

Factor    Factor level    J = 3    J = 6
TMDM      MCAR            .0534    .0537
          MAR             .0525    .0550
PM        5%              .0505    .0506
          15%             .0554    .0581

Note. Each proportion is out of 480,000 hypothesis tests.

Table 4-13. Effect of n_min/(K – 1) on Type I Error Rates for the Between- by Within-Subjects Interaction

Factor    Factor level    Small    Large
K         3               .0509    .0507
          6               .0577    .0553
SSI       .16             .0527    .0524
          .33             .0559    .0537
NPSC      Positive        .0509    .0517
          Negative        .0577    .0543
TMDM      MCAR            .0552    .0519
          MAR             .0534    .0542
PM        5%              .0508    .0504
          15%             .0578    .0557

Note. Each proportion is out of 480,000 hypothesis tests. For K = 3, small and large n_min/(K – 1) are 4.0 and 6.0, respectively; for K = 6, n_min/(K – 1) are 5.0 and 7.7, respectively. For all other factors, small and large n_min/(K – 1) are 4.0/5.0 and 6.0/7.7, respectively.

Table 4-14. Effect of Sample Size Inequality (SSI) on Type I Error Rates for the Between- by Within-Subjects Interaction

Factor           Factor level    SSI = .16    SSI = .33
n_min/(K – 1)    Small           .0527        .0559
                 Large           .0524        .0537
NPSC             Positive        .0507        .0519
                 Negative        .0544        .0577
TMDM             MCAR            .0529        .0542
                 MAR             .0522        .0553
PM               5%              .0499        .0513
                 15%             .0552        .0583

Note. Each proportion is out of 480,000 hypothesis tests. Small and large n_min/(K – 1) are 4.0/5.0 and 6.0/7.7, respectively.

Table 4-15. Effect of Nature of Pairing of Group Sizes with Covariance Matrices (NPSC) on Type I Error Rates for the Between- by Within-Subjects Interaction
NPSC Factor levels Positive Factor Negative K n_min/(K – 1) SSI PM 3 6 Small Large .16 .33 .90 .75 .60 5% 15% .0534 .0482 .0586 .0544 .0577 .0509 .0543 .0517 .0544 .0507 .0577 .0519 .0552 .0517 .0561 .0511 .0512 .0567 .0522 .0489 .0537 .0598
Note. Each proportion is out of 480,000 hypothesis tests. For ε, each proportion is out of 320,000 hypothesis tests. Small and large n_min/(K – 1) are 4.0/5.0 and 6.0/7.7, respectively.

Table 4-16. Effect of Type of Missing Data Mechanism (TMDM) on Type I Error Rates for the Between- by Within-Subjects Interaction

Factor           Factor level    MCAR     MAR
K                3               .0519    .0497
                 6               .0552    .0578
J                3               .0534    .0525
                 6               .0537    .0550
n_min/(K – 1)    Small           .0552    .0534
                 Large           .0519    .0542
SSI              .16             .0529    .0522
                 .33             .0542    .0553
PM               5%              .0518    .0494
                 15%             .0553    .0582

Note. Each proportion is out of 480,000 hypothesis tests. Small and large n_min/(K – 1) are 4.0/5.0 and 6.0/7.7, respectively.

Table 4-17. PROC MIXED Program for Example

proc mixed ic;
  class person gender age;
  model measure = gender age gender*age / ddfm=kenwardroger;
  repeated age / subject=person group=gender type=un;
run;

Note. The keyword ic produces the model information criteria.

Table 4-18. Model Information Criteria for Complete Data

Model            Covariance parameters    Deviance    AIC      BIC
Homogeneous              10                430.3     450.3    463.3
Heterogeneous            20                391.8     431.8    457.7
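The parameter counts in Table 4-18 follow from the unstructured covariance specified by type=un in Table 4-17. With K = 4 ages, one unstructured matrix has K(K + 1)/2 = 10 parameters. Adding group=gender estimates a separate matrix for each gender, giving 20. The sketch below recomputes AIC = deviance + 2d and BIC = deviance + d ln(N). It assumes N = 27 subjects, the value that reproduces the tabled BIC values. This subject count is inferred from the table rather than reported in this section. It is also assumed that, as SAS documents for PROC MIXED, BIC uses the number of subjects here.

```python
import math


def unstructured_params(k: int) -> int:
    """Free parameters in one k-by-k unstructured covariance matrix."""
    return k * (k + 1) // 2


def information_criteria(deviance: float, n_params: int, n_subjects: int):
    """AIC and BIC in 'smaller is better' form."""
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_subjects)
    return aic, bic


K_AGES = 4        # ages 8, 10, 12, 14
N_SUBJECTS = 27   # assumption: inferred from the tabled BIC values
homogeneous = unstructured_params(K_AGES)        # one matrix shared by both genders
heterogeneous = 2 * unstructured_params(K_AGES)  # one matrix per gender

for label, dev, d in [("Homogeneous", 430.3, homogeneous),
                      ("Heterogeneous", 391.8, heterogeneous)]:
    aic, bic = information_criteria(dev, d, N_SUBJECTS)
    print(f"{label:13s} d = {d:2d}  AIC = {aic:.1f}  BIC = {bic:.1f}")
```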

Table 4-19. Kenward-Roger F-Tests for Complete Data

Source              Numerator df    Denominator df         F    p-value
Homogeneous model
  Between
    Gender (G)            1              25.0         209.93    < .0001
  Within
    Age (A)               3              23.0          27.69    < .0001
    A × G                 3              23.0           5.70      .0045
Heterogeneous model
  Between
    Gender (G)            1              24.5         230.14    < .0001
  Within
    Age (A)               3              17.7          35.63    < .0001
    A × G                 3              17.7           7.28      .0022

Table 4-20. Kenward-Roger F-Tests for Listwise Deleted Data

Source              Numerator df    Denominator df         F    p-value
Homogeneous model
  Between
    Gender (G)            1              16.0         150.35    < .0001
  Within
    Age (A)               3              14.0          21.25    < .0001
    A × G                 3              14.0           2.45      .1066
Heterogeneous model
  Between
    Gender (G)            1              16.0         187.85    < .0001
  Within
    Age (A)               3              11.4          27.18    < .0001
    A × G                 3              11.4           3.15      .0667

Table 4-21. Model Information Criteria for Incomplete Data

Model            Covariance parameters    Deviance    AIC      BIC
Homogeneous              10                398.6     418.6    431.5
Heterogeneous            20                363.3     403.3    429.2

Table 4-22. Kenward-Roger F-Tests for Incomplete Data

Source              Numerator df    Denominator df         F    p-value
Homogeneous model
  Between
    Gender (G)            1              23.4         173.74    < .0001
  Within
    Age (A)               3              16.2          18.18    < .0001
    A × G                 3              16.2           2.92      .0656
Heterogeneous model
  Between
    Gender (G)            1              22.5         189.22    < .0001
  Within
    Age (A)               3              11.5          23.56    < .0001
    A × G                 3              11.5           3.73      .0433
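The p-values in Tables 4-19 through 4-22 are upper-tail probabilities of an F distribution with the listed Kenward-Roger denominator degrees of freedom, which are generally non-integer. The sketch below evaluates them with the regularized incomplete beta function, using a continued-fraction evaluation in the style of Numerical Recipes and only the standard library. It converts a given F and its degrees of freedom into a p-value. It does not compute the Kenward-Roger adjustment itself, which PROC MIXED performs. Because F and the df are rounded in the tables, small differences in the last digit are expected.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b  # symmetry for faster convergence


def f_test_p_value(f: float, df1: float, df2: float) -> float:
    """Upper-tail P(F > f) for F ~ F(df1, df2); df1 and df2 may be non-integer."""
    x = df2 / (df2 + df1 * f)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)


# Age-by-gender interaction rows of Table 4-22 (incomplete data)
print(f"homogeneous:   p = {f_test_p_value(2.92, 3, 16.2):.4f}  (tabled .0656)")
print(f"heterogeneous: p = {f_test_p_value(3.73, 3, 11.5):.4f}  (tabled .0433)")
```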

Figure 4-1. Between-Subjects Main Effect Distribution of Type I Error Rates

Figure 4-2. Within-Subjects Main Effect Distribution of Type I Error Rates

Figure 4-3. Between- by Within-Subjects Interaction Distribution of Type I Error Rates

Figure 4-4. Male Residual Profiles for Complete Data (residual, -8 to 8, plotted against AGE, 8 to 14)

Figure 4-5. Female Residual Profiles for Complete Data (residual, -8 to 8, plotted against AGE, 8 to 14)

CHAPTER 5
CONCLUSIONS

One of the most popular experimental designs in educational and psychological research is the split-plot design. Split-plot designs have both between- and within-subjects factors. The within-subjects factor can arise from repeatedly measuring each subject on the same measure or from exposing each subject to several treatments in which scores are comparably scaled. In its simplest form, the split-plot design has one between-subjects factor and one within-subjects factor; this is the design investigated in the present study.

In addition, missing values commonly occur in educational and psychological research. The mechanisms that describe the relationships among indicators of missing data and the variables in a study have been placed into three broad categories: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). The MCAR and MAR missing data mechanisms are ignorable for purposes of maximum likelihood (ML) estimation, whereas NMAR is non-ignorable. In addition, the MCAR missing data mechanism is ignorable for the purposes of sampling-distribution-based inference, while the validity of sampling-distribution-based inferences under the MAR mechanism depends on the procedure used to calculate the sampling covariance matrix.

Traditional methods for analyzing split-plot designs are univariate ANOVA, univariate ANOVA with approximate degrees of freedom (ADF), and multivariate ANOVA. Of these three methods, the latter two are preferred in terms of controlling the Type I error rate. However, both require the assumption of homogeneity of covariance matrices across the levels of the between-subjects factor, and both have been shown to fail to control the Type I error rate when the covariance matrices are heterogeneous. As a result, the Welch-James (WJ) test has been proposed. The WJ test does not assume homogeneity of covariance matrices across the levels of the between-subjects factor, and it has been shown to control the Type I error rate with heterogeneous covariances, provided the sample sizes are sufficiently large. Although it uses ML estimation for the means and restricted maximum likelihood (REML) estimation for the covariance matrices, listwise deletion must be performed if the WJ test is to be used on data with missing values. Because of this, it should only be used when the missing data mechanism is MCAR.

The linear mixed-effects model has become increasingly popular for analyzing data from experimental designs because of its modeling flexibility. The mixed-effects model, as implemented in SAS's PROC MIXED, also uses ML estimation like the WJ test. A practical advantage is that it includes all of the available data in the analysis (i.e., it does not perform listwise deletion). Consequently, it can legitimately be used with both types of ignorable missing data mechanisms. Even so, it has been shown that the covariance matrix used in calculating the standard F-test in PROC MIXED is biased, which can affect the Type I error rate. A better approximation to the covariance matrix was introduced by Harville and Jeske (1992). It is implemented in SAS's PROC MIXED, together with the degrees of freedom proposed by Kenward and Roger (1997), as the Kenward-Roger (KR) test.
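A small simulation shows why listwise deletion is restricted to MCAR data while an approach that uses all available data remains valid under MAR. The settings below are hypothetical:

- bivariate normal scores with correlation .6 and true means of 0;
- the second score deleted whenever the first score exceeds 0 (MAR, because missingness depends only on an observed value), or deleted at random with probability .5 (MCAR).

The sketch compares the complete-case mean of the second score with the ML estimate for this monotone bivariate pattern. That estimate adjusts the complete-case mean through the regression of the second score on the first (Little & Rubin, 2002). It illustrates the principle only and is not the simulation design of this study.

```python
import random
import statistics


def simulate(n: int = 20_000, rho: float = 0.6, seed: int = 2005):
    """Draw n pairs (y1, y2) from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1.append(z1)
        y2.append(rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
    return y1, y2


def complete_case_mean(y2, observed):
    """Mean of y2 after listwise deletion."""
    return statistics.fmean(b for b, o in zip(y2, observed) if o)


def ml_mean_monotone(y1, y2, observed):
    """ML estimate of E[y2] when y1 is fully observed and y2 is missing at random."""
    cc1 = [a for a, o in zip(y1, observed) if o]
    cc2 = [b for b, o in zip(y2, observed) if o]
    m1, m2 = statistics.fmean(cc1), statistics.fmean(cc2)
    s11 = sum((a - m1) ** 2 for a in cc1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(cc1, cc2))
    slope = s12 / s11  # regression of y2 on y1 in the complete cases
    return m2 + slope * (statistics.fmean(y1) - m1)


y1, y2 = simulate()
rng = random.Random(1)
mcar = [rng.random() > 0.5 for _ in y1]  # observed with probability .5, unrelated to data
mar = [a <= 0.0 for a in y1]             # y2 observed only when y1 <= 0
for label, obs in [("MCAR", mcar), ("MAR", mar)]:
    print(label, "listwise:", round(complete_case_mean(y2, obs), 3),
          "ML:", round(ml_mean_monotone(y1, y2, obs), 3))
```

Under MCAR, both estimates stay near the true mean of 0. Under MAR, the listwise mean is pulled to roughly -.48, while the ML estimate remains near 0.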

Several studies have shown that the KR test, or tests similar to it, perform well at controlling the Type I error rate in split-plot designs. Some of the conditions investigated in these studies included unbalanced data, which can be thought of as data with an MCAR missing data mechanism. Other conditions included covariance heterogeneity across the levels of the between-subjects factor. However, these studies did not investigate the performance of the KR test or similar tests in conditions that combine missing data and covariance heterogeneity.

This study assessed Type I error rates of the KR tests of the between-subjects main effect, the within-subjects main effect, and the between- by within-subjects interaction. The design was a split-plot design with an ignorable missing data mechanism and heterogeneity of the covariance matrices across the levels of the between-subjects factor. Because the KR test is similar to the WJ test, the KR test was investigated under simulation conditions that researchers had previously used to investigate the WJ test and under which the WJ test was shown to adequately control the Type I error rate.

The results of this study support the conclusion that sampling-distribution-based inferences on the means using ML estimates can be valid for MCAR missing data mechanisms. Such inferences can also be valid when the missing data mechanism is MAR. This holds even though the information matrix under the naïve sampling framework was used instead of the observed information matrix or the expected information matrix under the unconditional sampling framework (Little & Rubin, 2002; Rubin, 1976). The KR test kept the actual Type I error rate within Bradley's (1978) liberal criterion of .025 to .075 under all of the simulation conditions. Even so, certain simulated factors or combinations of factors did influence the Type I error rate of the KR test. These factors were:

- the number of levels of the within-subjects factor;
- the type of missing data mechanism;
- the percent of missing data;
- sample size inequality across the between-subjects factor (SSI);
- the nature of pairing of group sizes with covariance matrices (NPSC);
- the ratio of the smallest group size to K – 1 (i.e., n_min/(K – 1)).

With respect to the between-subjects main effect, the KR test controlled the Type I error rate well across all levels of the simulation factors. All of the Type I error rates were well within both Bradley's liberal and conservative criteria. Thus, none of the simulation factors had a practically meaningful effect on the Type I error rate of the between-subjects main effect.

In regard to the within-subjects main effect, the number of levels of the within-subjects factor, the type of missing data mechanism, and the percent of missing data had statistically significant effects on the Type I error rates. Although these effects were statistically significant, they were generally quite small. The nominal Type I error rate was .05, estimated Type I error rates were between .04 and .071, and the effects were typically on the order of .01 or less. In general, when the number of levels of the within-subjects factor increased from three to six, the Type I error rate increased, and so did the discrepancy between the actual and nominal Type I error rates. The increase was more noticeable when the within-subjects factor had six levels and the missing data were MAR, or when 15% of the data were missing. Additionally, the Type I error rate increased as the missing data mechanism changed from MCAR to MAR, and the increase was more noticeable when combined with 15% missing data. Lastly, the Type I error rate was higher when 15% of the data were missing. In both cases, as the Type I error rate increased, the difference between the actual and nominal Type I error rates increased.

With respect to the between- by within-subjects interaction, estimated Type I error rates for a nominal .05 test ranged from .04 to .075. Statistically significant effects were found for four factors:

- the number of levels of the within-subjects factor;
- sample size inequality across the between-subjects factor;
- the nature of pairing of group sizes with the covariance matrices;
- the n_min/(K – 1) ratio.

As was true for the within-subjects main effect, increasing the levels of the within-subjects factor from three to six increased the Type I error rates. The increase was more noticeable when the missing data mechanism was MAR or when 15% of the data were missing. Additionally, when the sample size inequality across the between-subjects factor was more severe (C = .33), the Type I error rates tended to be higher. The Type I error rates were also higher when the pairing of group sizes with covariance matrices was negative (i.e., when the largest n_j is paired with the covariance matrix with the smallest elements). That increase was slightly larger when 15% of the data were missing. In all cases, when the Type I error rate increased, the difference between the nominal and actual Type I error rates increased. Lastly, the n_min/(K – 1) ratio had a stabilizing effect on the Type I error rate: in general, when n_min/(K – 1) was large, the Type I error rate tended to be closer to the nominal level. Overall, however, all significant effects were small in magnitude.

The effects of the factors on Type I error rates were generally quite small. Nevertheless, these effects must be due to the factors' influence on the accuracy of the F distribution as an approximation to the sampling distribution of the test statistic. The KR test statistic was selected for two reasons: it uses a better estimator of the covariance matrix for small samples, and its Satterthwaite (1946)-type degrees of freedom are based on that better estimate. However, the data may be incomplete in addition to being relatively small, with an MAR missing data mechanism. The accuracy of the approximation may then be worse than when the data are complete.

One possible mediating factor for the effects of the number of levels of the within-subjects factor, the sample size ratio, and the percent of missing data is bias in the estimated means. The ML estimates of the means are known to be consistent under both MCAR and MAR missing data mechanisms, but they may still be biased in finite samples. If so, the bias would be expected to be worse when the sample size ratio is small and the percent of missing data is large. Under these conditions, less data are available per parameter for estimation. Similarly, under MCAR the sampling covariance matrix is consistently estimated but may be biased. Under MAR, the sampling covariance matrix may not be consistently estimated because the information matrix is calculated under the naïve sampling framework. Nevertheless, the estimates may improve with increases in sample size. Future work should address whether bias in the estimated means and the sampling covariance matrix accounts for the poorer performance of the KR test under three conditions: small sample size ratios, large proportions of missing data, and more levels of the within-subjects factor.

The design investigated in this study was a popular split-plot design with one between- and one within-subjects factor. Even so, the positive findings open the door for further simulation work on using ML to directly estimate model parameters from split-plot designs with missing values. One condition that can be investigated is a non-normal distribution of the dependent variable. In this study, the data were generated from a multivariate normal distribution. Because data from educational or psychological research cannot be presumed to be normal, investigating non-normal data can tell applied researchers whether the KR test is robust to violation of the normality assumption. In other words, can the KR test control the Type I error rate when the normality assumption is violated?

All of the Type I error rates of the KR test were within Bradley's (1978) liberal criterion. However, it is not clear at what percent of missing data the KR test will begin to break down. Additionally, it is not clear how small the sample sizes can be while the KR test still provides reasonable control of the Type I error rate. Consequently, future work should determine the percent of missing data and the sample sizes at which the KR test continues to provide reasonable control of the Type I error rate.

An alternative to the estimator of the sampling covariance matrix used in the KR test is the sandwich estimator. The sandwich estimator consistently estimates the covariance matrix provided that the structural model is correct. This property is attractive because it can be difficult to specify a covariance structure correctly, and the data may not be normally distributed. Hence, a simulation study with ignorable missing data could usefully compare how well the F-test using the sandwich estimator and the KR test control the Type I error rate.

Lastly, attrition in longitudinal studies is another form of missing data that is common in educational and psychological research. Attrition occurs when participants drop out before the end of the study and do not return. Participants may drop out for various reasons, such as moving out of town, health problems, or simply refusing to continue to participate in the study. Future work should consider how the KR test performs in controlling the Type I error rate in a split-plot design with attrition, in conjunction with methods used in the medical sciences.
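The sandwich estimator mentioned above can be illustrated in its simplest setting, ordinary least-squares regression with one predictor. There, the model-based variance of the slope assumes constant error variance, while the sandwich (Huber-White, HC0) form does not. The sketch uses hypothetical simulated data whose error standard deviation grows with |x|. It is not the repeated-measures version that would be compared with the KR test.

```python
import math
import random


def slope_standard_errors(x, y):
    """Model-based and sandwich (HC0) standard errors for the OLS slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Model-based: residual mean square / Sxx (assumes constant error variance)
    sigma2 = sum(r * r for r in resid) / (n - 2)
    se_model = math.sqrt(sigma2 / sxx)
    # Sandwich: "bread" 1/Sxx on both sides of the "meat" sum of (x - mx)^2 e^2
    meat = sum(((xi - mx) * r) ** 2 for xi, r in zip(x, resid))
    se_sandwich = math.sqrt(meat) / sxx
    return slope, se_model, se_sandwich


rng = random.Random(42)
x = [rng.uniform(-2.0, 2.0) for _ in range(5_000)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, 0.2 + abs(xi)) for xi in x]  # heteroscedastic errors
slope, se_model, se_sandwich = slope_standard_errors(x, y)
print(f"slope = {slope:.3f}, model-based SE = {se_model:.4f}, sandwich SE = {se_sandwich:.4f}")
```

Because the error variance is largest at the extremes of x, the model-based standard error understates the slope's variability. The sandwich standard error is noticeably larger. With constant error variance, the two would agree closely.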

REFERENCES

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716-723.
Albert, P. S., & Follmann, D. A. (2000). Modeling repeated count data subject to informative dropout. Biometrics, 56, 667-677.
Algina, J., & Keselman, H. J. (1997). Testing repeated measures hypotheses when covariance matrices are heterogeneous: Revisiting the robustness of the Welch-James test. Multivariate Behavioral Research, 32, 255-274.
Algina, J., & Keselman, H. J. (1998). A power comparison of the Welch-James and Improved General Approximation tests in the split-plot design. Journal of Educational and Behavioral Statistics, 23, 152-169.
Algina, J., & Oshima, T. C. (1994). Type I error rates for Huynh's general approximation and improved general approximation tests. British Journal of Mathematical and Statistical Psychology, 47, 151-165.
Booth, J. G., & Hobert, J. P. (1998). Standard errors of prediction in generalized linear mixed models. Journal of the American Statistical Association, 93, 262-272.
Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems: II. Effect of inequality of variance and correlations between errors in the two-way classification. Annals of Mathematical Statistics, 25, 484-498.
Bradley, J. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.
Diggle, P. J., & Kenward, M. G. (1994). Informative dropout in longitudinal data analysis. Applied Statistics, 43, 49-93.

Fai, H. T., & Cornelius, P. L. (1996). Approximate F-tests for multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments. Journal of Statistical Computation and Simulation, 54, 363-378.
Fitzmaurice, G. M., Laird, N. M., & Shneyer, L. (2001). An alternative parameterization of the general linear mixture model for longitudinal data with non-ignorable drop-outs. Statistics in Medicine, 20, 1009-1021.
Gleason, T. C., & Staelin, R. (1975). A proposal for handling missing data. Psychometrika, 40, 229-251.
Graham, J. W., & Hofer, S. M. (2000). Multiple imputation in multivariate research. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multivariate data: Practical issues, applied approaches, and specific examples (pp. 201-218). Hillsdale, NJ: Lawrence Erlbaum Associates.
Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197-218.
Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95-112.
Harville, D. A., & Jeske, D. R. (1992). Mean squared error of estimation or prediction under a general linear model. Journal of the American Statistical Association, 87, 724-731.
Hegamin-Younger, C., & Forsyth, R. (1998). A comparison of four imputation procedures in a two-variable prediction system. Educational and Psychological Measurement, 58, 197-210.
Huynh, H., & Feldt, L. S. (1970). Conditions under which mean square ratios in repeated measures designs have exact F-distributions. Journal of the American Statistical Association, 65, 1582-1589.
Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data in randomized block and split-plot designs. Journal of Educational Statistics, 1, 69-82.
Johansen, S. (1980). The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika, 67, 85-92.
Kackar, R. N., & Harville, D. A. (1984). Approximations for standard errors of estimators of fixed and random effects in mixed linear models. Journal of the American Statistical Association, 79, 853-862.

Kenward, M. G. (1998). Selection models for repeated measurements with non-random dropout: An illustration of sensitivity. Statistics in Medicine, 17, 2723-2732.
Kenward, M. G., & Molenberghs, G. (1998). Likelihood based frequentist inference when data are missing at random. Statistical Science, 13, 236-247.
Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.
Keselman, H. J., Algina, J., Kowalchuk, R. K., & Wolfinger, R. D. (1999b). The analysis of repeated measurements: A comparison of mixed-model Satterthwaite F tests and a nonpooled adjusted degrees of freedom multivariate test. Communications in Statistics – Theory and Methods, 28, 2967-2999.
Keselman, H. J., Algina, J., Wilcox, R. R., & Kowalchuk, R. K. (2000). Testing repeated measures hypotheses when covariance matrices are heterogeneous: Revisiting the robustness of the Welch-James test again. Educational and Psychological Measurement, 60, 925-938.
Keselman, H. J., Carriere, K. C., & Lix, L. M. (1993). Testing repeated measures hypotheses when covariance matrices are heterogeneous. Journal of Educational Statistics, 18, 305-319.
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donohue, B., Kowalchuk, R. K., Lowman, L. L., Petosky, M. D., Keselman, J. C., & Levin, J. R. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350-386.
Keselman, H. J., & Keselman, J. C. (1990). Analysing unbalanced repeated measures designs. British Journal of Mathematical and Statistical Psychology, 43, 265-282.
Keselman, H. J., & Keselman, J. C. (1993). Analysis of repeated measurements. In L. K. Edwards (Ed.), Applied analysis of variance in behavioral science (pp. 105-145). New York: Marcel Dekker.
Keselman, H. J., Keselman, J. C., & Lix, L. M. (1995). The analysis of repeated measurements: Univariate tests, multivariate tests, or both? British Journal of Mathematical and Statistical Psychology, 48, 319-338.
Keselman, H. J., Keselman, J. C., & Shaffer, J. P. (1991). Multiple pairwise comparisons of repeated measures means under violation of multisample sphericity. Psychological Bulletin, 110, 162-170.

Keselman, J. C., Lix, L. M., & Keselman, H. J. (1996). The analysis of repeated measurements: A quantitative research synthesis. British Journal of Mathematical and Statistical Psychology, 49, 275-298.
Kowalchuk, R. K., Keselman, H. J., Algina, J., & Wolfinger, R. D. (2004). The analysis of repeated measurements with mixed-model adjusted F-tests. Educational and Psychological Measurement, 64, 224-242.
Kuehl, R. O. (2000). Design of experiments: Statistical principles of research design and analysis (2nd ed.). Pacific Grove, CA: Brooks/Cole.
Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-974.
Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-1121.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: John Wiley & Sons.
Looney, S. W., & Stanley, W. B. (1989). Exploratory repeated measures analysis for two or more groups: Review and update. American Statistician, 43, 220-225.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores, with contributions by Alan Birnbaum. Reading, MA: Addison-Wesley.
Maxwell, S. E., & Delaney, H. D. (1999). Designing experiments and analyzing data: A model comparison perspective. Mahwah, NJ: Lawrence Erlbaum Associates.
McBride, G. B. (2002). Statistical methods helping and hindering environmental science and management. Journal of Agricultural, Biological, and Environmental Statistics, 7, 300-305.
McCulloch, C. E., & Searle, S. R. (2001). Generalized, linear, and mixed models. New York: John Wiley & Sons.
Mendoza, J. L. (1980). A significance test for multisample sphericity. Psychometrika, 45, 495-498.
Meng, X. L., & Rubin, D. B. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association, 86, 899-909.
Muthen, B., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431-462.

Padilla, M. A., & Algina, J. (2004). Type I error rates for a one factor within-subjects design with missing values. Journal of Modern Applied Statistical Methods, 3, 406-416.
Prasad, N. G. N., & Rao, J. N. K. (1990). The estimation of mean squared error of small-area estimators. Journal of the American Statistical Association, 85, 163-171.
Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. T. (2000). HLM 5: Hierarchical linear and nonlinear modeling. Chicago: Scientific Software International.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons.
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110-114.
Schaalje, G. B., McBride, J. B., & Fellingham, G. W. (2002). Adequacy of approximations to distributions of test statistics in complex mixed linear models. Journal of Agricultural, Biological, and Environmental Statistics, 7, 512-524.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst's perspective. Multivariate Behavioral Research, 33, 545-571.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.
Troxel, A. B. (1998). Analysis of longitudinal data with non-ignorable non-monotone missing values. Applied Statistics, 47, 425-438.
Wilcox, R. R. (1995a). ANOVA: A paradigm for low power and misleading measures of effect size? Review of Educational Research, 65, 51-77.
Wolfinger, R. D. (1996). Heterogeneous variance-covariance structures for repeated measures. Journal of Agricultural, Biological, and Environmental Statistics, 1, 205-230.

BIOGRAPHICAL SKETCH

Miguel A. Padilla was born in San Francisco del Rincon, Guanajuato, Mexico. He received his B.A. in psychology from California State University, Dominguez Hills, in May 1998. In the fall of 1999, he enrolled for graduate studies in the Educational Psychology Department at the University of Florida and received his M.A.E. in research and evaluation methodology in May 2002. In August 2005 he received his Ph.D. in research and evaluation methodology from the Educational Psychology Department, with a minor in statistics from the Statistics Department, at the University of Florida.