Citation
Evaluation of Model Fit in Cognitive Diagnosis Models

Material Information

Title:
Evaluation of Model Fit in Cognitive Diagnosis Models
Creator:
Hu, Jin Xiang
Publisher:
University of Florida
Publication Date:
Language:
English

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Research and Evaluation Methodology
Human Development and Organizational Studies in Education
Committee Chair:
MILLER, DAVID
Committee Co-Chair:
ALGINA, JAMES J
Committee Members:
MANLEY, ANNE CORINNE
BRUMBACK, BABETTE A
Graduation Date:
8/9/2014

Subjects

Subjects / Keywords:
Betting (jstor)
Cognitive models (jstor)
Cognitive psychology (jstor)
Modeling (jstor)
Parametric models (jstor)
Psychometrics (jstor)
Sample size (jstor)
Simulations (jstor)
Statistical models (jstor)
Statistics (jstor)
cdm

Notes

General Note:
Cognitive diagnosis models (CDMs) estimate student ability profiles using latent attributes. There are compensatory and non-compensatory CDMs. The DINA model is a commonly used, highly restrictive CDM. The G-DINA model is a generalization of the DINA model that embraces other CDMs as special cases within its framework. Regardless of which model is fit to the data, model fit needs to be ascertained in order to determine whether inferences from CDMs are valid. This study investigated the usefulness of several popular model fit statistics for detecting CDM fit, including AIC, BIC, CAIC, RMSEA, ABS(fcor), and MAX(χ²). The fit statistics are assessed under different CDM settings with respect to Q-matrix misspecification and CDM misspecification. Results showed that the wrong CDM can be detected by all the statistics, while minor Q-matrix misspecification is hard to detect.

Record Information

Source Institution:
UFRGP
Rights Management:
All applicable rights reserved by the source institution and holding location.
Embargo Date:
8/31/2016

Downloads

This item has the following downloads:


Full Text


EVALUATION OF MODEL FIT IN COGNITIVE DIAGNOSIS MODELS

By

JINXIANG HU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014


© 2014 Jinxiang Hu


This dissertation is a special dedication to my husband.


ACKNOWLEDGMENTS

My deepest gratitude goes to my adviser, Dr. M. David Miller, for providing guidance on this dissertation. Thank you for the solid support during graduate school and for the walk-ins for questions as they came up. I am also thankful to my committee members, professors James Algina and Anne Corinne Manley. Your valuable feedback on this dissertation was very much appreciated, and I am honored to have you as part of this journey. I would also like to thank Dr. Babette Brumback for being my dear outside committee member as well as my tennis partner. I am also thankful to the Graduate School of UF for giving me the opportunity to obtain this doctoral degree. Last but not least, I am grateful to my family. The greatest support and inspiration came from my husband, Nan. Thank you for wholeheartedly embracing all the distance and separation graduate school brought into our lives, especially during the difficult times. I am also thankful to my parents for believing in me and being the best role models.

TABLE OF CONTENTS

ACKNOWLEDGMENTS 4
LIST OF TABLES 7
LIST OF FIGURES 8
ABSTRACT 10

CHAPTER
1 INTRODUCTION 11
2 REVIEW OF LITERATURE 18
  2.1 Cognitive Diagnosis Models 18
  2.2 Non-compensatory and Compensatory Rules in CDMs 19
    2.2.1 Non-compensatory CDMs 19
    2.2.2 Compensatory CDMs 20
  2.3 The DINA Model 22
  2.4 The G-DINA Model 24
  2.5 DINA and A-CDM under the G-DINA Framework 27
  2.6 Misspecification of CDMs 29
    2.6.1 Q-Matrix and Misspecification of the Q-matrix 29
    2.6.2 CDM Misspecification 31
  2.7 Model Fit of CDMs 32
    2.7.1 Relative Fit Statistics 33
    2.7.2 Absolute Fit Statistics 34
  2.8 Review of Studies of Q-matrix Misspecification and CDM Misspecification on CDM Model Fit 36
  2.9 Purpose of the Study 44
3 METHOD 45
  3.1 Simulation Design 45
  3.2 Model Estimation 49
  3.3 Model Fit 50
  3.4 Outcome Variable 50
  3.5 Software 51
4 RESULTS 52
  4.1 Research Question 1. Relative Fit of CDM 53
  4.2 Research Question 2. Absolute Fit of CDM 57
    4.2.1 ABS(fcor) and MAX(χ²) 57
    4.2.2 RMSEA 59
  4.3 An Empirical Example 68
5 DISCUSSION AND CONCLUSION 71
LIST OF REFERENCES 75
BIOGRAPHICAL SKETCH 80

LIST OF TABLES

1-1 An Example of Q-matrix 12
3-1 Research Design 45
3-2 G-DINA Item Parameters 47
3-3 Correct Q-matrix of 20 Items 48
4-1 Item Parameter Recovery 52
4-2 Selection Rate of AIC, BIC and CAIC for the DINA Model 53
4-3 Rejection Rate of ABS(fcor) and MAX(χ²) 57
4-4 Model Fit Results of the Fraction Subtraction Data Analysis 69

LIST OF FIGURES

4-1 RMSEA of DINA with True and Misspecified Q-matrix 61
4-2 RMSEA of G-DINA with True and Misspecified Q-matrix 64
4-3 RMSEA of A-CDM with True and Misspecified Q-matrix 67
4-4 Fraction Subtraction Example and Q-matrix. [Reprinted with permission from Chen, J. 2013. Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement (Page 135, Table 5)] 69

LIST OF ABBREVIATIONS

ABS(fcor) Absolute deviation of Fisher-transformed item pair correlations
AIC Akaike's Information Criterion
BIC Bayesian Information Criterion
CAIC Consistent Akaike's Information Criterion
CDM Cognitive Diagnosis Model
C-RUM Compensatory Reparameterized Unified Model
DINA Deterministic Inputs, Noisy "And" Gate Model
DINO Deterministic Inputs, Noisy "Or" Gate Model
GDM General Diagnostic Model
l Log-odds ratios of item pairs
LCDM Log-linear Cognitive Diagnosis Model
LCM Latent Class Model
MAD Mean Absolute Differences
MAX(χ²) Maximum of the item pair χ² statistics
NC-RUM Non-compensatory Reparameterized Unified Model
NIDA Noisy Inputs, Deterministic "And" Gate Model
NIDO Noisy Inputs, Deterministic "Or" Gate Model
r Residuals between the observed and predicted correlations of item pairs
RMSEA Root Mean Square Error of Approximation

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EVALUATION OF MODEL FIT IN COGNITIVE DIAGNOSIS MODELS

By

Jinxiang Hu

August 2014

Chair: M. David Miller
Major: Research and Evaluation Methodology

Cognitive diagnosis models (CDMs) estimate student ability profiles using latent attributes. There are compensatory and non-compensatory CDMs. The DINA model is a commonly used, highly restrictive CDM. The G-DINA model is a generalization of the DINA model that embraces other CDMs as special cases within its framework. Regardless of which model is fit to the data, model fit needs to be ascertained in order to determine whether inferences from CDMs are valid. This study investigated the usefulness of several popular model fit statistics for detecting CDM fit: Akaike's information criterion (AIC), the Bayesian information criterion (BIC), the consistent Akaike's information criterion (CAIC), the root mean square error of approximation (RMSEA), the absolute deviation of Fisher-transformed item pair correlations (ABS(fcor)), and the maximum item pair χ² statistic (MAX(χ²)). The fit statistics are assessed under different CDM settings with respect to Q-matrix misspecification and CDM misspecification. Results showed that the wrong CDM can be detected by all the statistics, while minor Q-matrix misspecification is hard to detect.


CHAPTER 1
INTRODUCTION

Cognitive diagnosis models (CDMs) are psychometric models used to assess mastery of sets of skills or attributes (Henson, Templin, & Willse, 2009; Templin & Henson, 2006). These combinations of skill sets are called profiles or latent classes in the CDM framework. There are many variations of the CDM that can be applied across a wide range of settings. Some CDMs are non-compensatory, which means that lacking one or more required skills cannot be compensated for by a surplus of other skills. Examples of non-compensatory models include the deterministic inputs, noisy "and" gate model (DINA; Junker & Sijtsma, 2001), the noisy inputs, deterministic "and" gate model (NIDA; Junker & Sijtsma, 2001), and the non-compensatory reparameterized unified model (NC-RUM; Hartz, 2002). On the other hand, some CDMs are compensatory, so that high levels of performance on some skills can compensate for low levels of performance on another skill. Examples of compensatory CDMs include the deterministic inputs, noisy "or" gate model (DINO; Templin & Henson, 2006) and the compensatory reparameterized unified model (C-RUM; Hartz, 2002). These models rely on different assumptions about the modeling process and differ in the number of parameters they contain for items and attributes. These differences in model specification for compensatory and non-compensatory models made it difficult to directly compare models until the recent development of more general models. Recent developments in CDM theory involve more general models in which all the above-mentioned CDMs can be parameterized as special cases.


These general models include the log-linear cognitive diagnosis model (LCDM; Henson, Templin, & Willse, 2009), the general diagnostic model (GDM; von Davier, 2005), and the generalized deterministic inputs, noisy "and" gate model (G-DINA; de la Torre, 2011). The G-DINA is a generalization of the DINA model. If all the interaction terms in the G-DINA model are set to zero, the model is an additive CDM (A-CDM; de la Torre, 2011). For this paper, different CDMs will be constructed under the framework of G-DINA.

The specification of a CDM includes matching attributes (skills) with items. This is done in a table of specifications called the Q-matrix. Each element in the Q-matrix indicates whether an item measures a specific attribute. If an attribute is measured by an item, the entry in the Q-matrix is 1; otherwise the entry is 0. The entries in the Q-matrix indicate whether or not a particular skill or attribute is necessary in the cognitive response process that examinees engage in when they respond to an item. Different Q-matrix specifications reflect different theoretical hypotheses about the cognitive response process, and the appropriateness of the Q-matrix is crucial for evaluating the fit of CDMs. A simple example of a Q-matrix is given in Table 1-1.

Table 1-1. An example of a Q-matrix

          Attribute 1   Attribute 2   Attribute 3
Item 1         1             0             0
Item 2         1             1             0
Item 3         0             0             1

As indicated by the above example, a Q-matrix typically has the items in the rows and the attributes in the columns. Entries of 1 indicate that an attribute is measured by an item; entries of 0 indicate that an attribute is not measured by an item.
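To make the bookkeeping concrete, the Q-matrix in Table 1-1 can be stored as a small binary array and queried for each item's required attributes. The following is a minimal sketch in Python with NumPy; the variable names are illustrative only and are not drawn from any CDM software package.

import numpy as np

# Q-matrix from Table 1-1: rows are items, columns are attributes.
Q = np.array([
    [1, 0, 0],  # item 1 measures attribute 1 only
    [1, 1, 0],  # item 2 measures attributes 1 and 2
    [0, 0, 1],  # item 3 measures attribute 3 only
])

for j in range(Q.shape[0]):
    required = (np.flatnonzero(Q[j]) + 1).tolist()  # 1-based attribute labels
    print(f"Item {j + 1} requires attribute(s) {required}")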


Based on the above example, item 1 requires only attribute 1 in order to be answered correctly. Item 2 requires attributes 1 and 2 at the same time in order to be answered correctly. Item 3 requires only attribute 3 to be answered correctly. In essence, the Q-matrix represents the design of each item in the assessment instrument, and the correct design of the Q-matrix determines the quality and validity of the diagnostic information from the profile and makes the design of better instruction possible. However, the name cognitive diagnosis modeling is often used to refer only to the process of psychometric modeling and parameter estimation; the correctness of the Q-matrix is taken for granted. Yet this assumption of a correct Q-matrix is not always true, and the consequence of overlooking the correctness of the Q-matrix is poor model parameter estimates, wrong classification of respondents, and poor overall model fit. In fact, the cognitive diagnosis modeling process itself should include the construction or validation of the Q-matrix (de la Torre, 2008).

Research on the correct construction and validation of the Q-matrix has been done. Liu, Douglas, and Henson (2009) used factor analysis to develop the Q-matrix. de la Torre (2008) used a delta method to validate the Q-matrix. de la Torre and Douglas (2004), de la Torre (2008), Henson, Templin, and Willse (2009), and DeCarlo (2011, 2012) studied the true Q-matrix assumption, and exploratory components analysis has been used as a supplement to theory in developing the Q-matrix under the DINA model. The above studies show the important role of the Q-matrix as an integral part of the CDM modeling process when we want to draw valid inferences.


However, the exact specification of the Q-matrix is hard to know, and the CDM itself might be misspecified as well. Both the model and the Q-matrix are potential sources of misspecification and can contribute to biased parameter estimates of the CDM. Therefore, it is important for researchers to assess CDM model fit in order to make valid inferences from CDMs (Rupp, Templin, & Henson, 2010). There have been studies that utilize CDMs to analyze real data (e.g., Başokçu, 2014; Başokçu, Ogretmen, & Kelecioglu, 2013; Bradshaw et al., 2014; Lee, Park, & Taylan, 2011; Ravand, Barati, & Widhiarso, 2013; Su et al., 2013; von Davier, 2008; Xu & von Davier, 2008). This study aims to provide more information on CDM model fit so that practitioners can draw valid conclusions.

Various fit statistics and methods have been developed or used for assessing the model fit of CDMs. These include the residuals between the observed and predicted correlations of item pairs (r), the log-odds ratios of item pairs (l), and the residuals between the observed and predicted proportion correct of individual items (e.g., Chen, de la Torre, & Zhang, 2013; de la Torre & Douglas, 2008; Sinharay & Almond, 2007); item discrimination indices (e.g., de la Torre, 2008; de la Torre & Chiu, 2010); χ² and G² statistics based on the observed and predicted item pair responses (e.g., Rupp, Templin, & Henson, 2010); the mean absolute deviation (MAD; e.g., Henson, Templin, & Willse, 2009) between the observed and predicted item correlations; and the root mean square error of approximation (RMSEA; e.g., Kunina-Habenicht, Rupp, & Wilhelm, 2012). Conventional fit statistics such as Akaike's information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978), the consistent Akaike's information criterion (CAIC; Bozdogan, 1987), the deviance information criterion (DIC; Spiegelhalter, Best, Carlin, & van der Linde, 2002), and the Bayes factor (Kass & Raftery, 1995) have been adopted empirically for model fit evaluation (e.g., de la Torre & Douglas, 2004, 2008; DeCarlo, 2011, 2012; Rupp, Templin, & Henson, 2010; Sinharay & Almond, 2007).


Yet few studies have systematically evaluated the extent to which these model fit indexes are sensitive to model-data misfit or are useful for model selection. Choi, Templin, Cohen, and Atwood (2010), Kunina-Habenicht, Rupp, and Wilhelm (2012), and Chen, de la Torre, and Zhang (2013) have looked into the impact of Q-matrix misspecification and CDM misspecification on model fit. CDM misspecification refers to the different CDMs that result from including different interaction effects among the attributes. Choi, Templin, Cohen, and Atwood (2010) looked at the interaction effects of the attributes and Q-matrix misspecification within the log-linear CDM (LCDM) framework. These authors examined the impact of Q-matrix misspecifications for four different CDMs. Results showed that AIC and BIC are able to point to the correct generating model with a high level of consistency. The effect of Q-matrix misspecification on the recovery of item parameters and classifications was also estimated. They found that under-specifying the Q-matrix had a detrimental impact on both the recovery of item parameters and the resulting respondent classifications, whereas over-specifying the Q-matrix had only a negligible impact on parameter recovery and classification. In terms of the effect of sample size on parameter estimation, they found that with sample sizes larger than 500 the intercepts and main effects can be estimated accurately. In contrast, two-way interaction effects are in general difficult to recover even with a sample size of 4,000.


Kunina-Habenicht, Rupp, and Wilhelm (2012) found that AIC and BIC are useful in selecting the correctly specified Q-matrix against misspecified Q-matrices when about 30% of the entries had been randomly permuted or the number of attributes was severely over-specified or under-specified in the log-linear CDM (LCDM). In addition, they found that the AIC was useful in selecting the correct model against the misspecified model when all interaction effects are omitted. However, the usefulness of the mean absolute differences (MAD) and the root mean square error of approximation (RMSEA) was limited for absolute fit evaluation at the test level. Chen, de la Torre, and Zhang (2013) focused on the CDM setting and found that AIC and BIC are useful for detecting the correct CDM against a misspecified CDM and for detecting the correct Q-matrix against a misspecified Q-matrix. The residual between the observed and predicted correlations of item pairs (r) and the residual between the observed and predicted log-odds ratios of item pairs (l) were insensitive to over-specified Q-matrices.

These three papers all examined the model fit of CDMs subject to Q-matrix misspecification and CDM misspecification. However, they are limited in the degree of Q-matrix misspecification examined. For example, Kunina-Habenicht, Rupp, and Wilhelm (2012) defined Q-matrix misspecification as about 30% of the entries being randomly permuted, with wrong Q-matrix dimensionality defined at the test level under the log-linear CDM (LCDM). Choi, Templin, Cohen, and Atwood (2010) and Chen, de la Torre, and Zhang (2013) focused on just one or two items to define the Q-matrix misspecification. Chen, de la Torre, and Zhang (2013) suggested that research needs to be conducted to systematically examine the impact of the degree of Q-matrix misspecification.


In response to this concern, this study investigates the usefulness of various fit statistics with respect to the degree of Q-matrix misspecification under different CDMs in the framework of G-DINA. Diverse situations of misfit are covered, including different misspecifications of the CDM in terms of attribute interactions and the number and degree of misspecified items in the Q-matrix. Through this study, more can be learned about CDM model fit, and the results can be compared to IRT analysis as well as used by related parties to make valid inferences.


CHAPTER 2
REVIEW OF LITERATURE

This chapter contains the following sections. First, a general introduction to cognitive diagnosis models is presented. Second, non-compensatory and compensatory rules in CDMs are introduced. Third, the commonly used DINA model is presented. Fourth, the general framework of G-DINA is presented. Fifth, different CDMs under the framework of G-DINA are specified. Sixth, misspecification of the CDM is illustrated through misspecification of the Q-matrix and CDM misspecification. Seventh, CDM fit and model fit statistics commonly found in the CDM literature are introduced. Eighth, studies on CDM fit that involve both misspecification of the Q-matrix and misspecification of the CDMs are reviewed. To conclude, the purpose of the study is presented.

2.1 Cognitive Diagnosis Models

The discussion of CDMs here is based on dichotomous items for simplicity of exposition; in other words, examinees either answer an item correctly or incorrectly. There are other names for CDMs, such as latent response models (Maris, 1995a), multiple classification latent class models (Maris, 1995b), and structured item response theory models (Rupp & Mislevy, 2007). These names emphasize the statistical process of the model, while the name CDM emphasizes the classification of respondents into different groups (Rupp, Templin, & Henson, 2010) based on the skill sets the examinees master. There are many classification approaches to CDM. The most prominent are knowledge space theory (Doignon & Falmagne, 1999; Schrepp, 2003), rule space theory (Tatsuoka, 1983), the attribute hierarchy method (Gierl, Cui, & Zhou, 2007; Leighton, Gierl, & Hunka, 2004), clustering algorithms (Lattin, Carroll, & Green, 2003), and Bayesian inference networks.


CDMs are confirmatory multidimensional latent variable models. The Q-matrix allows an item to load on more than one attribute (i.e., an item may measure more than one attribute), and CDMs are special cases of the latent class model (LCM; Lazarsfeld & Henry, 1968). Recent developments in CDMs have produced more general models such as the log-linear cognitive diagnosis model (LCDM; Henson, Templin, & Willse, 2009), the general diagnostic model (GDM; von Davier, 2005), and the generalized deterministic inputs, noisy "and" gate model (G-DINA; de la Torre, 2011). These general CDMs can be constrained in different ways to produce many CDMs with different assumptions, such as the DINA, the DINO, the compensatory RUM (reparameterized unified model), and the reduced RUM. Putting different CDMs under the same general framework makes different models nested and enables direct comparison across models. This paper is written in the domain of the G-DINA framework, and a simulation study is conducted focusing on different specifications of the CDMs under the G-DINA framework. Details are delineated beginning in Section 2.3.

2.2 Non-compensatory and Compensatory Rules in CDMs

2.2.1 Non-compensatory CDMs

Non-compensatory CDMs specify that a surplus of one attribute cannot make up for the lack of another attribute (mastery of all required skills is required for a high probability of a correct response). In non-compensatory models, the interaction of skills in examinee responses is modeled such that mastery of each and every skill required by an item leads to a high probability of a correct response. Simply put, non-compensatory rules are mathematically represented by products of latent variables that indicate mastery of certain attributes, so that missing any required attribute will result in a


score of 0. For example, if an item measures two attributes, the expected result is 1 or 0 depending on the presence (1) or absence (0) of attribute 1 and attribute 2 in multiplied form. Missing either attribute 1 or attribute 2 results in an expected result of 0 (Equation 2-1):

Expected Result = Attribute 1 × Attribute 2    (2-1)

The non-compensatory models include the DINA model (deterministic inputs, noisy "and" gate model; Haertel, 1989; Junker & Sijtsma, 2001; de la Torre & Douglas, 2004), the NIDA model (noisy inputs, deterministic "and" gate model; Junker & Sijtsma, 2001), and the non-compensatory reparameterized unified model (NC-RUM; Hartz, 2002). Among them, the DINA is the simplest and most restrictive model; the other two models are less restrictive than the DINA but more complex. The DINA model is commonly used in CDM research studies (e.g., Chen, de la Torre, & Zhang, 2013; Choi, Templin, Cohen, & Atwood, 2010; DeCarlo, 2011, 2012; de la Torre, 2008; de la Torre & Douglas, 2004, 2008; Rupp & Templin, 2008; von Davier, 2014), as well as in applied studies (e.g., Başokçu, 2014; Başokçu, Ogretmen, & Kelecioglu, 2013; Lee, Park, & Taylan, 2011; Ravand, Barati, & Widhiarso, 2013; Su et al., 2013), probably because it is easy to apply and interpret. This paper uses the DINA model as a generating model in the simulation study; accordingly, the DINA model will be explained in detail in the following section, but not the NIDA and NC-RUM models.

2.2.2 Compensatory CDMs

Compensatory CDMs specify that a surplus of one attribute can compensate for the lack of another attribute (mastery of at least one sufficient skill leads to a high probability of a correct response on an item).


Examples of compensatory CDMs include the deterministic inputs, noisy "or" gate model (DINO; Templin & Henson, 2006), the noisy inputs, deterministic "or" gate model (NIDO; Templin & Henson, 2006), and the compensatory reparameterized unified model (C-RUM; Hartz, 2002). Mathematically, compensatory rules contain both sums and products of latent variables, so that the presence of any of the required attributes will result in a score of 1. For example, if an item requires two attributes, the expected response depends on the right-hand side of Equation 2-2. The presence of an attribute is represented by 1, and the presence of either attribute 1 or attribute 2 results in a 0 in the bracketed part, which gives a score of 1 for the expected result:

Expected Result = 1 − [(1 − Attribute 1) × (1 − Attribute 2)]    (2-2)

Statistically, the expected result is based on the probability of a correct response to item j by examinee i given that examinee i is in class c (Equation 2-3):

P(X_i = x_i) = \sum_{c=1}^{C} \nu_c \prod_{j=1}^{J} \pi_{jc}^{x_{ij}} (1 - \pi_{jc})^{1 - x_{ij}}    (2-3)

where π_jc is the probability of a correct response to item j by a respondent in latent class c, x_i is the observed response vector across all items, and x_ij is the observed response of respondent i to item j. The multiplication symbol Π indicates that the probabilities across items 1 through J are multiplied together, under the assumption that the distribution of item scores is independent conditional on the latent class. The ν_c are the estimated proportions of respondents in the population who belong to latent class c, and Σ is the summation symbol (Rupp, Templin, & Henson, 2010).
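The contrast between Equations 2-1 and 2-2 can be seen by evaluating both condensation rules for every mastery pattern of a two-attribute item. The sketch below is a hypothetical Python illustration of the two rules, not code from any CDM package.

from itertools import product

def conjunctive(attrs):
    # Equation 2-1: non-compensatory rule; every required attribute is needed.
    result = 1
    for a in attrs:
        result *= a
    return result

def disjunctive(attrs):
    # Equation 2-2: compensatory rule; any one required attribute suffices.
    miss = 1
    for a in attrs:
        miss *= (1 - a)
    return 1 - miss

for pattern in product([0, 1], repeat=2):
    print(pattern, "conjunctive:", conjunctive(pattern),
          "disjunctive:", disjunctive(pattern))

Only the (1, 1) pattern earns the item under the conjunctive rule, whereas any pattern except (0, 0) earns it under the disjunctive rule.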


2.3 The DINA Model

Research has been conducted with the DINA model on parameter estimation, classification accuracy, Q-matrix validation, and model fit (de la Torre, 2008; de la Torre & Douglas, 2004, 2008; DeCarlo, 2011, 2012; von Davier, 2014). The DINA model is a non-compensatory model with a conjunctive condensation rule, which means that a respondent needs to have mastered all required attributes to obtain a score of 1 on a particular item. Thus examinees are divided into two classes: one class that has mastered all required attributes, and one class that has not (i.e., one or more required attributes is missing). No further differentiation is made among examinees who lack one or more required attributes. At the same time, it is possible that examinees may randomly answer a question correctly even if they have not mastered all required attributes (guessing), or answer a question incorrectly even if they have mastered all required attributes (slipping). Thus the probability of obtaining a correct response to an item is composed of two different error probabilities: the guessing probability, which is the probability of obtaining a score of 1 when at least one required attribute is missing, and the slipping probability, which is the probability of obtaining a score of 0 when all required attributes are present. Whether a respondent possesses all required attributes is determined by a latent variable that takes on the value of 1 or 0: 1 for mastering all required attributes, 0 for missing at least one required attribute.


Let Y_ij represent the observed dichotomous score of respondent i on item j, η_ij the latent response of respondent i on item j, s_j the slipping parameter, 1 − s_j the non-slipping parameter indicating that respondent i, who has mastered all required attributes, correctly applies all of them, and g_j the guessing parameter. The DINA model can be written as in Equation 2-4:

P(Y_{ij} = 1 \mid \eta_{ij}) = (1 - s_j)^{\eta_{ij}} g_j^{1 - \eta_{ij}}    (2-4)

where η_ij is 1 if all the skills required by item j have been mastered by examinee i, and 0 otherwise. η_ij is determined by Equation 2-5:

\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}}    (2-5)

where q_jk is the entry in the Q-matrix indicating whether attribute k is measured by item j (q_jk = 1 if attribute k is measured by item j, q_jk = 0 if not), and α_ik is the entry in the skill profile indicating whether respondent i possesses attribute k (α_ik = 1 if respondent i has mastered attribute k, α_ik = 0 if not). Only those attributes that are measured by item j (i.e., q_jk = 1) contribute to the calculation of the probability of correctly answering item j. Possession of each attribute is assumed to be independent of the others, so the indicators are multiplied together to represent whether respondent i has all the attributes required by item j, yielding the latent response variable η_ij.

The s_j and g_j parameters are the measurement part of the DINA model (Junker & Sijtsma, 2001), where s_j is the slip parameter, indicating the probability that a respondent who has mastered all required attributes makes an error and answers incorrectly.


The slipping and guessing parameters are defined by linking the latent response variable η_ij to the observed response Y_ij, and can be represented by Equations 2-6 and 2-7:

s_j = P(Y_{ij} = 0 \mid \eta_{ij} = 1)    (2-6)

g_j = P(Y_{ij} = 1 \mid \eta_{ij} = 0)    (2-7)
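Equations 2-4 through 2-7 translate directly into code. The sketch below is a minimal Python illustration with hypothetical slipping and guessing values; it mirrors the formulas rather than any estimation routine.

import numpy as np

def dina_prob(Q, alpha, s, g):
    """P(Y = 1) for each item under the DINA model.

    Q:     J x K Q-matrix (Equation 2-5)
    alpha: length-K mastery profile for one examinee
    s, g:  length-J slipping and guessing parameters (Equations 2-6, 2-7)
    """
    # Equation 2-5: eta = 1 only if every required attribute is mastered.
    eta = np.prod(np.where(Q == 1, alpha, 1), axis=1)
    # Equation 2-4: (1 - s)^eta * g^(1 - eta)
    return (1 - s) ** eta * g ** (1 - eta)

Q = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1]])  # Table 1-1
alpha = np.array([1, 0, 1])                      # masters attributes 1 and 3
s = np.array([0.1, 0.1, 0.1])                    # hypothetical slipping values
g = np.array([0.2, 0.2, 0.2])                    # hypothetical guessing values
print(dina_prob(Q, alpha, s, g))                 # prints [0.9 0.2 0.9]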


2.4 The G-DINA Model

The assumption of the DINA model is that the non-mastery group has the same probability of answering an item correctly no matter which, or how many, required attributes its members lack. If there are K attributes measured by an assessment, there are 2^K attribute combinations, or skill profiles, for the whole assessment. Reduced to a specific item, the number of skill profiles measured by item j is 2^{K_j*}, where K_j* is the number of attributes required specifically by item j. Under the DINA model these combinations are classified into two groups: a group that has mastered all the attributes required by item j, and a group that lacks at least one of them. For example, for an item that requires both adding and subtracting skills, an examinee with only the adding skill has the same chance of answering the item correctly as an examinee with neither skill. This is a restrictive assumption: when some respondents master more of the required attributes than others, their probabilities of answering the item correctly should differ. This is the limitation of the DINA model.

The generalized DINA (G-DINA) model proposed by de la Torre (2011) removes this highly restrictive equal-probability assumption. Instead of the two latent groups determined by the DINA model, the G-DINA model partitions the examinees into 2^{K_j*} latent groups on each item, where K_j* is the number of attributes required by item j out of the K attributes measured by the assessment (de la Torre, 2011). As a result, if there are no constraints, the G-DINA model has 2^{K_j*} parameters for item j, affording it greater generality than the DINA. In its most general formulation, the G-DINA model allows examinees possessing more of an item's required attributes to have a higher probability of answering the item correctly than examinees possessing fewer of the required attributes. For example, if an item measures both adding and subtracting skills, examinees are classified into 2² = 4 groups: a group with only the adding skill, a group with only the subtracting skill, a group with both skills, and a group with neither. An examinee with the adding skill then has a greater chance of answering the item correctly than an examinee with neither skill. The G-DINA model has gained popularity for its ability to accommodate different CDMs under its framework; research conducted in the G-DINA framework includes de la Torre (2011), Chen, de la Torre, and Zhang (2013), and Chen and de la Torre (2013).

The G-DINA model is also based on a J × K Q-matrix, where J is the number of items and K is the number of attributes (de la Torre, 2011; Tatsuoka, 1983). The element in row j and column k of the Q-matrix, q_jk, is equal to 1 if the kth attribute is required to answer item j correctly; otherwise it is equal to zero. Several link functions that are linear in the parameters can be used in specifying general models for cognitive diagnosis, namely identity, logit, and log.


In their saturated forms, all the resulting models have 2^{K_j*} parameters for item j and provide identical model-data fit (de la Torre, 2011). This study utilizes the G-DINA identity link. Under the identity link, the G-DINA model is the sum of the effects of each required attribute and their interactions. Specifically, the identity link can be represented by Equation 2-8:

P(X_{ij} = 1 \mid \alpha_{lj}^{*}) = \delta_{j0} + \sum_{k=1}^{K_j^{*}} \delta_{jk}\alpha_{lk} + \sum_{k'=k+1}^{K_j^{*}} \sum_{k=1}^{K_j^{*}-1} \delta_{jkk'}\alpha_{lk}\alpha_{lk'} + \cdots + \delta_{j12\cdots K_j^{*}} \prod_{k=1}^{K_j^{*}} \alpha_{lk}    (2-8)

where X_ij is the observed response of person i on item j, α_lj* is the reduced attribute combination formed from the attributes required specifically by item j, δ_j0 is the intercept for item j when no attributes are mastered, K_j* is the number of attributes measured specifically by item j, δ_jk is the main effect due to the presence of attribute k on item j, δ_jkk' is the interaction effect due to the presence of both α_k and α_k', δ_j12...K_j* is the interaction effect due to all the attributes required by item j, and α_lk indicates the presence of attribute k in the reduced pattern. These parameters can be interpreted as follows: δ_j0 represents the baseline probability of answering item j correctly (i.e., the probability of a correct response when none of the required attributes is present); δ_jk is the increase in the probability of a correct response as a result of mastering a single attribute (i.e., α_k); δ_jkk' is the increase in the probability of a correct response due to the mastery of both α_k and α_k' over and above the additive impact of mastering the same two attributes; and δ_j12...K_j* represents the change in the probability of a correct response due to the mastery of all the required attributes over and above the additive impact of the main and lower-order interaction effects. The intercept is always non-negative, the main effects are typically non-negative, but the interaction effects can take on any values.


The main effects are non-negative if P(X_ij = 1 | α_lj*) ≥ P(X_ij = 1 | 0_{K_j*}) for every reduced pattern α_lj*, where 0_{K_j*} is the null vector of length K_j*. This implies that mastering any one of the required attributes corresponds to some increase in the probability of a correct response on the item (de la Torre, 2011, pp. 182-183).

The logit and log links can also be used under the G-DINA framework. The logit link takes the log-odds of the identity link and results in a model equivalent to the log-linear CDM (LCDM). It can be represented by Equation 2-9:

\mathrm{logit}\,P(X_{ij} = 1 \mid \alpha_{lj}^{*}) = \lambda_{j0} + \sum_{k=1}^{K_j^{*}} \lambda_{jk}\alpha_{lk} + \cdots + \lambda_{j12\cdots K_j^{*}} \prod_{k=1}^{K_j^{*}} \alpha_{lk}    (2-9)

The log link takes the log of the identity link and can be represented by Equation 2-10:

\log P(X_{ij} = 1 \mid \alpha_{lj}^{*}) = \nu_{j0} + \sum_{k=1}^{K_j^{*}} \nu_{jk}\alpha_{lk} + \cdots + \nu_{j12\cdots K_j^{*}} \prod_{k=1}^{K_j^{*}} \alpha_{lk}    (2-10)

The identity, logit, and log links define different natures of the CDM: the identity and logit links define the additive nature of, respectively, the probability and the logit of mastering each attribute in the Q-matrix, while the log link defines the multiplicative nature of the probability of mastering each attribute in the Q-matrix. It is important to note this implication, because applying the same constraints to the different link functions will result in different models, which may in turn result in different model fit (de la Torre, 2011). This paper uses the G-DINA identity link.
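Because the identity link in Equation 2-8 is linear in the indicator products, it can be evaluated by summing the δ terms "switched on" by a mastery pattern. The sketch below is a hypothetical Python illustration for a two-attribute item, with invented δ values; setting the main effects to zero yields the DINA special case of Equation 2-11 below, and setting the interaction to zero yields the A-CDM of Equation 2-12.

def gdina_identity(delta, alpha):
    """Equation 2-8 (identity link) for an item requiring two attributes.

    delta: (d0, d1, d2, d12) -- intercept, two main effects, interaction
    alpha: (a1, a2)          -- mastery indicators for the required attributes
    """
    d0, d1, d2, d12 = delta
    a1, a2 = alpha
    return d0 + d1 * a1 + d2 * a2 + d12 * a1 * a2

delta_gdina = (0.20, 0.30, 0.25, 0.15)  # hypothetical saturated item
delta_dina = (0.20, 0.00, 0.00, 0.70)   # DINA: intercept + highest interaction
delta_acdm = (0.20, 0.35, 0.35, 0.00)   # A-CDM: intercept + main effects only

for alpha in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(alpha,
          gdina_identity(delta_gdina, alpha),
          gdina_identity(delta_dina, alpha),
          gdina_identity(delta_acdm, alpha))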


2.5 DINA and A-CDM under the G-DINA Framework

The DINA and A-CDM can be specified as special cases of the G-DINA model.

DINA. The DINA model is obtained by setting all the parameters in the G-DINA model, except δ_j0 and δ_j12...K_j*, to zero (Equation 2-11):

P(X_{ij} = 1 \mid \alpha_{lj}^{*}) = \delta_{j0} + \delta_{j12\cdots K_j^{*}} \prod_{k=1}^{K_j^{*}} \alpha_{lk}    (2-11)

In terms of the G-DINA parameters, δ_j0 = g_j, because δ_j0 is the probability of an examinee answering an item correctly without possessing any required attributes, and δ_j0 + δ_j12...K_j* = 1 − s_j, because only when an examinee possesses all the attributes required by an item can he or she answer the item correctly. In other words, possessing only some of the required attributes does not add to the probability of answering an item correctly, so the main effects and lower-order interaction terms are all set to zero.

A-CDM. The additive CDM (de la Torre, 2011) includes only the main effects of the attributes required by an item. In the G-DINA framework, the A-CDM is obtained by setting all the interaction terms to zero; that is, the probability of an examinee answering an item correctly is defined by only the intercept and main effects (Equation 2-12):

P(X_{ij} = 1 \mid \alpha_{lj}^{*}) = \delta_{j0} + \sum_{k=1}^{K_j^{*}} \delta_{jk}\alpha_{lk}    (2-12)

where δ_j0 represents the baseline probability of a correct answer when no attribute is mastered, and the remainder of the right-hand side represents the main effects of all the attributes required by the item. There is no interaction between attributes. This model implies that the probability of answering item j correctly is increased by mastering attribute α_lk, and the margin of the increase is δ_jk.


The increase in probability due to mastering α_lk is assumed to be independent of the other attributes. There are K_j* + 1 parameters for item j in the A-CDM, where K_j* is the number of attributes required by item j (de la Torre, 2011).

2.6 Misspecification of CDMs

2.6.1 Q-Matrix and Misspecification of the Q-matrix

Q-matrix. For cognitive diagnosis modeling, the Q-matrix defines which attributes are required by each item, and it determines the number of item parameters to be estimated. In effect, the Q-matrix embodies the design of the assessment instrument in use and hence determines the quality of the diagnostic information obtained through the assessment instrument (Rupp & Templin, 2008). The process of establishing the Q-matrix through substantive knowledge tends to be subjective in nature and has raised serious validity concerns among researchers (de la Torre, 2008; Rupp & Templin, 2008). Let q_jk denote the element in row j and column k of a J × K Q-matrix, where J and K represent the numbers of items and attributes, respectively. The entry q_jk is specified as 1 if mastery of attribute k is required to answer item j correctly and as 0 otherwise. So the Q-matrix contains entries of 0 or 1 constraining whether a particular attribute is required to correctly answer an item. Henson and Templin (2009) point out that, while constructing the Q-matrix is the most crucial and difficult step in CDM, it is often taken for granted: it is assumed that experts correctly identify exactly the skills needed. However, this assumption may not always hold. The appropriateness of the Q-matrix is often overlooked, and as a result, models that fit poorly due to Q-matrix misspecification cannot be identified as poorly fitting (de la Torre, 2008).


Q-matrix misspecification. Misspecification can mean that the attributes specified for an item have been under-specified (some 1s have been incorrectly specified as 0s, treating an attribute that contributes to an item as if it does not), over-specified (some 0s have been incorrectly specified as 1s, treating an attribute that does not contribute to an item as if it does), or both under- and over-specified (some 1s incorrectly specified as 0s and, at the same time, some 0s incorrectly specified as 1s, so that both types of errors are present in the Q-matrix). The effect of Q-matrix misspecification cannot easily be evaluated, because little is known about the implications of misspecifying the Q-matrix for the CDM and its impact on CDM fit. Given that one cannot be sure the Q-matrix is correctly specified, concern for the appropriateness of the Q-matrix has been expressed (e.g., de la Torre, 2008; de la Torre & Douglas, 2004). Because of these concerns, research has increased on developing the correct Q-matrix, validating the Q-matrix, and the consequences of Q-matrix misspecification. Those consequences can be seen in poor parameter estimation, wrong classification of respondents, and poor model-data fit of the CDMs, all of which make the inferences drawn from the CDM invalid. Some studies used factor analysis in diagnosing attribute profiles (Templin & Henson, 2006; Henson, Templin, & Douglas, 2007). Liu, Douglas, and Henson (2009) used factor analysis to develop the Q-matrix. One study examined a method for validating the Q-matrix (de la Torre, 2008), and another proposed an exploratory method for validating the Q-matrix (Close, 2012). de la Torre (2008) developed a delta method for identifying and correcting misspecified Q-matrix entries for the DINA model.


DeCarlo (2011) studied the uncertainty of the Q-matrix, using the DINA model in the framework of the LCDM to analyze the fraction subtraction data. He found that latent class size estimates close to unity for one or more skills raise the possibility that the Q-matrix has been misspecified (with respect to the inclusion of an irrelevant skill). Another DeCarlo paper (2012) addressed the uncertainty of the Q-matrix in a simulation study with a Bayesian estimation approach; results showed that the Bayesian approach was helpful in determining the correct Q-matrix. Both of these papers suggested fitting a CDM with just the main effect of each attribute as a comparison model, because the independence model might reveal whether a skill is irrelevant. The A-CDM in this paper is equivalent to the independence CDM proposed by DeCarlo. This paper will examine the effect on CDM fit indexes of all types of Q-matrix misspecification, as well as the degree of misspecification, represented by the number of misspecified items present in the Q-matrix.

2.6.2 CDM Misspecification

Beyond the Q-matrix, the CDM itself can be misspecified. CDM misspecification in this study refers to the different CDMs specified under the framework of G-DINA with incorrect interaction terms. In this paper, the G-DINA, DINA, and A-CDM will be investigated for the impact of Q-matrix misspecification on model fit. Because there are a maximum of three attributes in the simulation study, the G-DINA model is the saturated model that includes the intercept, the main effects of the attributes required by an item, and two-way and three-way interaction terms, depending on how many attributes are measured by an item; the DINA model includes only the intercept and the highest-order interaction of the attributes required by an item (the population model); and the A-CDM includes only the intercept and the main effects of the attributes required by an item.
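The over- and under-specification manipulations described in Section 2.6.1 amount to flipping selected entries of the true Q-matrix. The following is a hypothetical Python sketch of such a generator, not the study's actual simulation code; a real design would also guarantee that each item still measures at least one attribute after under-specification.

import numpy as np

rng = np.random.default_rng(seed=1)

def misspecify_q(Q, n_flips, direction):
    """Flip n_flips entries of Q: 'over' turns 0s into 1s, 'under' turns 1s into 0s."""
    Q_mis = Q.copy()
    target = 0 if direction == "over" else 1
    rows, cols = np.nonzero(Q_mis == target)
    pick = rng.choice(len(rows), size=n_flips, replace=False)
    Q_mis[rows[pick], cols[pick]] = 1 - target
    return Q_mis

Q_true = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 0, 1],
                   [0, 1, 1]])
print(misspecify_q(Q_true, n_flips=1, direction="over"))   # one 0 becomes a 1
print(misspecify_q(Q_true, n_flips=1, direction="under"))  # one 1 becomes a 0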


2.7 Model Fit of CDMs

Given a CDM, how well the model fits the response data is known as the evaluation of the goodness of fit of the model, or model fit. This step is fundamental to the modeling process, because the results of statistical models are essentially meaningless when model fit is poor, which invalidates the inferences made from the CDMs. The model fit for CDMs discussed in this dissertation is based on categorical data analysis. In the CDM context, various fit statistics and methods have been developed or used to assess model-data fit: for instance, the residuals between the observed and predicted correlations, the log-odds ratios of item pairs, and the residuals between the observed and predicted proportion correct of individual items (de la Torre & Douglas, 2008; Sinharay & Almond, 2007); item discrimination indices (de la Torre, 2008; de la Torre & Chiu, 2010); the χ² and G² statistics (Rupp, Templin, & Henson, 2010); the mean absolute differences (MAD; Henson, Templin, & Willse, 2009) between the observed and predicted item conditional probabilities of success; and the related root mean square error of approximation (RMSEA; von Davier, 2006; Kunina-Habenicht, Rupp, & Wilhelm, 2012). In addition, Akaike's information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978), the consistent Akaike's information criterion (CAIC; Bozdogan, 1987), the deviance information criterion (DIC; Spiegelhalter, Best, Carlin, & van der Linde, 2002), and the Bayes factor (Kass & Raftery, 1995) have been adopted for relative fit evaluation (e.g., de la Torre & Douglas, 2004, 2008; DeCarlo, 2011; Rupp, Templin, & Henson, 2010; Sinharay & Almond, 2007).


This paper adopts AIC, BIC, CAIC, RMSEA, the maximum of the item pair χ² statistics (MAX(χ²); Robitzsch, Kiefer, George, & Uenlue, 2014), and the absolute value of the residuals between the observed and predicted correlations (ABS(fcor); Robitzsch, Kiefer, George, & Uenlue, 2014) as criteria to evaluate CDM fit. Among these statistics, the first three are relative fit statistics; the remaining three are absolute fit statistics, as they compare the predicted frequencies to the observed frequencies to judge how well the model fits the data regardless of how well other models fit.

2.7.1 Relative Fit Statistics

Two commonly used relative fit statistics are the information criteria AIC and BIC (Rupp, Templin, & Henson, 2010). They compare two models, including non-nested ones, to determine which is superior. The relative model fit of CDMs is usually assessed before the absolute fit, once certain candidate models have been selected (Rupp, Templin, & Henson, 2010). AIC and BIC are computed as a function of the maximized likelihood (ML), which is based on the ML estimates of the item parameters with the attributes integrated out (Equation 2-13):

ML = \prod_{i=1}^{N} \sum_{l=1}^{L} L(X_i \mid \alpha_l)\, p(\alpha_l)    (2-13)

where N is the sample size, L is the total number of attribute patterns, X_i is the response vector for examinee i, α_l is the lth attribute vector, L(X_i | α_l) is the likelihood of the response vector of examinee i given α_l, and p(α_l) is the prior probability of α_l. The log-likelihood value of the CDM is calculated as in Equation 2-14:

-2LL = -2\,\ln(ML)    (2-14)


Based on this log-likelihood value, the AIC is calculated as in Equation 2-15:

AIC = -2LL + 2P    (2-15)

where P is the number of parameters in the model. BIC is calculated as in Equation 2-16:

BIC = -2LL + P\,\ln(N)    (2-16)

where N is the sample size, which the AIC does not take into account in its calculation. The CAIC (Bozdogan, 1987) adjusts the AIC by incorporating the sample size (Equation 2-17):

CAIC = -2LL + P(\ln(N) + 1)    (2-17)

These information-based criteria represent statistical compromises between model fit and model parsimony, which means that overly complex models yielding only a small improvement in fit compared to simpler models are penalized. The number of parameters P equals 2J + 2^K − 1, Σ_j(K_j* + 1) + 2^K − 1, and Σ_j 2^{K_j*} + 2^K − 1 for the DINA, A-CDM, and G-DINA, respectively. For all three statistics, a lower value is desirable; the model with the smallest value is selected as the better-fitting model.
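Once the maximized log-likelihood is in hand, Equations 2-15 through 2-17 are one-liners. The helper below is a small Python sketch with invented values for the log-likelihood, parameter count, and sample size; it is not tied to any particular estimation package.

import math

def information_criteria(loglik, P, N):
    """AIC, BIC, and CAIC from Equations 2-15 through 2-17.

    loglik: maximized log-likelihood ln(ML), so -2LL = -2 * loglik
    P:      number of free parameters in the model
    N:      sample size
    """
    neg2ll = -2.0 * loglik
    return {"AIC": neg2ll + 2 * P,                   # Equation 2-15
            "BIC": neg2ll + P * math.log(N),         # Equation 2-16
            "CAIC": neg2ll + P * (math.log(N) + 1)}  # Equation 2-17

# Hypothetical comparison of a restricted and a saturated fit to the same data.
print(information_criteria(loglik=-10250.3, P=47, N=1000))
print(information_criteria(loglik=-10210.8, P=71, N=1000))

Because the penalty grows with P (and, for BIC and CAIC, with N), the saturated G-DINA model wins such a comparison only when its likelihood gain outweighs its extra parameters.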


2.7.2 Absolute Fit Statistics

Absolute model fit is based on the fit of a model to the observed response data without comparison to other models. Absolute fit can be evaluated at the test level or the item level, and it should always be of concern to an analyst (Rupp, Templin, & Henson, 2010). The three absolute model fit statistics examined in this study are RMSEA, MAX(χ²), and ABS(fcor).

RMSEA. The RMSEA (von Davier, 2006) is the root mean square error of approximation. It measures the discrepancy between the predicted and observed probabilities of a correct response on an item by latent class. Item-level RMSEA can be calculated by Equation 2-18:

RMSEA_j = \sqrt{ \sum_{c} \hat{\pi}_c \sum_{k} \left( \frac{n_{jkc}}{N_{jc}} - P_{jk}(\alpha_c) \right)^{2} }    (2-18)

where α_c represents the latent class with a certain attribute combination, \hat{\pi}_c is the estimated class probability of α_c, P_{jk}(α_c) is the estimated item response function for category k, n_jkc is the expected number of students in class α_c responding in category k of item j, and N_jc is the expected number of students in class α_c responding to item j (Robitzsch, Kiefer, George, & Uenlue, 2014). The mean RMSEA is calculated at the test level as the average of the item RMSEAs. Recommended RMSEA cutoffs have not been established in the CDM framework. However, Hu and Bentler (1999) recommend RMSEA < .06 as a criterion for a good structural equation model, McDonald and Mok (1995) recommend RMSEA < .05 as a rule of thumb for IRT, and Maydeu-Olivares and Joe (2014) suggested RMSEA < .089 as adequate fit for multidimensional IRT and RMSEA < .05 as close fit. As a result, I adopted RMSEA < .05 as the criterion for CDMs.
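Under the definitions above, the item-level RMSEA is a class-probability-weighted root mean square of the gaps between model-implied and observed category proportions. The Python sketch below evaluates Equation 2-18 for a dichotomous item using hypothetical inputs; the symbols follow the text, not any package's internals.

import numpy as np

def item_rmsea(pi_hat, p_model, p_obs):
    """Item-level RMSEA (Equation 2-18) for a dichotomous item.

    pi_hat:  estimated latent class probabilities, length C
    p_model: model-implied P(correct | class), length C
    p_obs:   observed proportion correct per class, length C
    """
    # Sum over both response categories k (correct and incorrect).
    sq = (p_obs - p_model) ** 2 + ((1 - p_obs) - (1 - p_model)) ** 2
    return np.sqrt(np.sum(pi_hat * sq))

pi_hat = np.array([0.25, 0.25, 0.25, 0.25])   # hypothetical class sizes
p_model = np.array([0.20, 0.55, 0.55, 0.90])  # hypothetical model-implied values
p_obs = np.array([0.22, 0.50, 0.58, 0.88])    # hypothetical observed values
print(item_rmsea(pi_hat, p_model, p_obs))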


MAX(χ²) and ABS(fcor). These two statistics are global absolute model fit statistics computed from test statistics over all item pairs (X_j and X_j'). The χ²_{jj'} statistic is based on predictions of the pairwise item responses; it tests how similar the observed proportions are to the expected proportions under the assumption of statistical independence (Rupp, Templin, & Henson, 2010). It is defined by Equation 2-19:

\chi^2_{jj'} = \sum_{l=0}^{1} \sum_{l'=0}^{1} \frac{(n_{ll'} - e_{ll'})^{2}}{e_{ll'}}    (2-19)

where Σ is the summation symbol, n_{ll'} is the observed frequency of (X_j = l and X_j' = l'), and e_{ll'} is the expected frequency implied by the model. MAX(χ²) is the maximum of the χ²_{jj'} statistics over all item pairs. MAX(χ²) comes with a p-value, and a significant p-value indicates that the statistical independence of some item pair is violated, so the model does not fit the data. The Holm procedure is applied to the p-values for multiple-comparison purposes (Robitzsch, Kiefer, George, & Uenlue, 2014).

The ABS(fcor) statistic is the absolute value of the deviation of the Fisher-transformed correlations, accompanied by a p-value adjusted by the Holm procedure (Chen, de la Torre, & Zhang, 2013; Robitzsch, Kiefer, George, & Uenlue, 2014). ABS(fcor) can be calculated with Equation 2-20:

ABS(fcor)_{jj'} = \left| Z[\mathrm{Corr}(X_j, X_{j'})] - Z[\mathrm{Corr}(\hat{X}_j, \hat{X}_{j'})] \right|    (2-20)

where X_j is the observed response on item j, \hat{X}_j is the predicted response on item j, Corr is the product-moment correlation, and Z is the Fisher transformation. ABS(fcor) should be close to 0 if the model fit is good (Chen, de la Torre, & Zhang, 2013).
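Both pairwise statistics are computed over all item pairs; the sketch below illustrates Equations 2-19 and 2-20 for a single pair in Python, with invented observed and predicted data. The Holm adjustment of the p-values over all pairs is omitted for brevity.

import numpy as np

def pairwise_chi2(n_obs, e_model):
    """Equation 2-19 for one item pair.

    n_obs, e_model: 2 x 2 arrays of observed and model-implied frequencies
    of the joint patterns (X_j = l, X_j' = l') for l, l' in {0, 1}.
    """
    return float(np.sum((n_obs - e_model) ** 2 / e_model))

def abs_fcor(x_j, x_jp, xhat_j, xhat_jp):
    """Equation 2-20: absolute deviation of Fisher-transformed correlations."""
    r_obs = np.corrcoef(x_j, x_jp)[0, 1]
    r_pred = np.corrcoef(xhat_j, xhat_jp)[0, 1]
    return abs(np.arctanh(r_obs) - np.arctanh(r_pred))  # arctanh = Fisher Z

n_obs = np.array([[30.0, 20.0], [15.0, 35.0]])    # hypothetical observed counts
e_model = np.array([[27.0, 23.0], [18.0, 32.0]])  # hypothetical predicted counts
print(pairwise_chi2(n_obs, e_model))

rng = np.random.default_rng(0)
x_j, x_jp = rng.integers(0, 2, size=(2, 100))        # hypothetical observed responses
xhat_j, xhat_jp = rng.integers(0, 2, size=(2, 100))  # hypothetical predicted responses
print(abs_fcor(x_j, x_jp, xhat_j, xhat_jp))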


2.8 Review of Studies of Q-matrix Misspecification and CDM Misspecification on CDM Model Fit

Studies have investigated the consequences of Q-matrix misspecification for parameter estimates and/or respondent classifications (e.g., Choi, Templin, Cohen, & Atwood, 2010; Henson & Templin, 2009; Kunina-Habenicht, Rupp, & Wilhelm, 2012; Rupp & Templin, 2008). For example, Rupp and Templin (2008) investigated the impact of Q-matrix misspecification on parameter estimates and classification accuracy for the DINA model under several misspecification conditions with over- and under-specification, as well as conditions with incorrect logical dependencies between attributes. The misspecifications of the Q-matrix investigated were of two types: (a) under-specification (replacing 1s with 0s), over-specification (replacing 0s with 1s), or both over- and under-specification (exchanging 0s and 1s while controlling for the overall number of changes); and (b) incorrect dependency assumptions about two attributes (e.g., when one attribute cannot occur in the presence of another). The authors examined the parameter estimates and the mean absolute difference (MAD) values of the parameter estimates. Results showed that the effect of Q-matrix misspecification on the slipping and guessing parameters of the DINA model is confined to the misspecified items. For example, the slipping parameters were overestimated in over-specified items, while the guessing parameters were overestimated in under-specified items. When an item was both under- and over-specified, the item MAD increased for both the slipping and the guessing parameters. Large values of the slipping and guessing parameters can therefore provide empirical evidence of Q-matrix misspecification.

Besides the misspecification itself, model fit statistics under Q-matrix misspecification need attention in order to make valid inferences from CDMs. Of the studies reviewed, most use global relative fit statistics like AIC and BIC to evaluate model fit (e.g., Chen, de la Torre, & Zhang, 2013; Close, 2012; DeCarlo, 2011, 2012; de la Torre & Douglas, 2008; Henson, Templin, & Willse, 2009; Kunina-Habenicht, Rupp, & Wilhelm, 2012; Rupp & Templin, 2008).


Some used the DIC for model fit information (Sinharay & Almond, 2007), and one study used the root mean squared deviation (RMSD) to assess overall model fit (de la Torre & Karelitz, 2009). Some used item-level absolute fit statistics like the MAD (e.g., Chen, de la Torre, & Zhang, 2013; de la Torre & Douglas, 2004; Henson, Roussos, & Templin, 2005; Kunina-Habenicht, Rupp, & Wilhelm, 2012; Templin, Henson, Templin, & Roussos, 2008) and the root mean square error of approximation (RMSEA; e.g., Kunina-Habenicht, Rupp, & Wilhelm, 2012) to evaluate model fit. Other studies used the RMSE (de la Torre & Douglas, 2004; Henson, Roussos, & Templin, 2005; Templin & Henson, 2006). However, only a few studies have examined the effect of Q-matrix misspecification on CDM model fit, the effect of CDM misspecification on CDM model fit, or both. de la Torre and Douglas (2008) suggested investigating the interaction of Q-matrix misspecification and different specifications of CDMs to examine the extent to which these interactions affect different model fit indices. Few studies have done so (Choi, Templin, Cohen, & Atwood, 2010; Kunina-Habenicht, Rupp, & Wilhelm, 2012; Chen, de la Torre, & Zhang, 2013). These three papers were conducted in the general framework of either the LCDM (Choi, Templin, Cohen, & Atwood, 2010; Kunina-Habenicht, Rupp, & Wilhelm, 2012) or the G-DINA (Chen, de la Torre, & Zhang, 2013), so that the misspecifications of the CDMs in terms of attribute interactions are comparable. This paper aims to investigate this issue further; below is a detailed review of the three studies.

Choi, Templin, Cohen, and Atwood (2010) studied Q-matrix misspecification and CDM misspecification within the log-linear CDM (LCDM) framework by investigating the performance of the AIC and BIC statistics.


These authors examined the impact of Q-matrix misspecifications on four CDMs, namely the full LCDM, the DINA, the DINO, and the C-RUM. The full LCDM includes both the main effect parameters and all the interaction effect parameters; the DINA model contains only the two-way interaction effect parameter; the DINO model contains both main effect parameters and a negative interaction effect parameter; and the C-RUM contains only main effect parameters. Thus, the full LCDM is the most flexible model, while the DINA model is the most restrictive, with the remaining two models representing special intermediate cases. Their findings were that relative model-data fit indices were able to detect the correct model with sample sizes larger than 200; that AIC and BIC do not always agree on the best-fitting model, as they penalize model complexity differently; that leaving required attributes out of the Q-matrix was harmful to the recovery of item parameters and respondent classifications, while adding unnecessary attributes to the Q-matrix was not; that estimation of intercepts and main effects was precise for samples larger than 500 but not for interaction effects; and that two-way interaction effects were difficult to recover even in conditions with 4,000 respondents. The findings about the effect of leaving out required attributes or adding unnecessary attributes to the Q-matrix shed insight on research practice. However, these authors investigated only the performance of relative fit statistics such as AIC and BIC; they did not examine the absolute fit of the CDMs. According to Rupp, Templin, and Henson (2010), absolute fit should be examined in addition to relative fit statistics to establish the validity of the CDMs. Meanwhile, these authors did not vary the test length, which our preliminary ANOVA showed is a significant factor affecting model fit index performance. Furthermore, they used a limited number of misspecified Q-matrices.

Kunina-Habenicht, Rupp, and Wilhelm (2012) studied the effect of Q-matrix misspecification in the framework of the log-linear CDM (LCDM). They used a logistic link function and a linear predictor with intercept, slope, and interaction effect parameters to model the relationship between the latent attribute variables and the observed response probabilities for items. The intercept is the probability of answering an item correctly without mastering any attribute; main effects are associated with one attribute, two-way interaction effects with two attributes, and three-way interaction effects with three attributes. The simulation study addressed the evaluation of the LCDM under Q-matrix misspecification and interaction effect misspecification. The paper also examined model fit at the item level with the mean absolute difference index (MAD; Henson, Roussos, Douglas, & He, 2008), the absolute value of the difference between observed and model-predicted response probabilities within latent classes, and with the root mean square error of approximation (RMSEA; von Davier, 2006). The paper also considered relative fit indices such as the AIC (Akaike, 1974) and the BIC (Schwarz, 1978). Results demonstrated that Q-matrix misspecifications have a dramatic effect on classification accuracy and on parameter recovery of latent class distributions, correlations, and attribute proportions, given that the Q-matrix reflects the loading structure of the multidimensional CDMs. AIC and BIC were sensitive to Q-matrix misspecification; results also showed that AIC and BIC were useful in selecting the correctly specified Q-matrix against misspecified Q-matrices when about 30% of the entries had been randomly permuted or when the number of attributes was completely over-specified (from three to five) or completely under-specified (from five to three) in the log-linear CDM. In addition, they also found

that the AIC was useful in selecting the correct model against the misspecified model when all interaction effects were omitted. However, the usefulness of the mean absolute difference (MAD) and the root mean square error of approximation (RMSEA) was limited for absolute fit evaluation at the test level. This study compared several CDMs with different specifications of attribute interactions, but it is limited in the degrees of Q-matrix misspecification it covered: 30% of the Q-matrix entries were randomly permuted, or the dimensionality of the attributes was misspecified at the test level.

Chen, de la Torre, and Zhang (2013) did a comprehensive study on model fit evaluation of CDMs within the G-DINA model framework. They used a simulation study and an empirical example to investigate the usefulness of various fit statistics for evaluating model fit in the context of the DINA, A-CDM, and G-DINA models formulated within the G-DINA framework. The study manipulated Q-matrix misspecification (over-specification and under-specification), CDM misspecification (which in that study refers to incorrect interaction parameterization of the CDMs), and a combination of both to study the performance of the model fit indices. The simulation study used two types of reduced CDMs to generate the data: the DINA model and the A-CDM. In addition to these two models, a saturated model (i.e., the G-DINA model) was also used to fit the data. Two test lengths were considered, J = 15 and J = 30. Five attributes were considered, and each attribute was specified an equal number of times; the maximum number of required attributes was 3. Six model fit statistics were considered in that study: the -2 log-likelihood (-2LL), AIC, BIC, the residual between the observed and predicted proportion correct of individual items, the residual between the observed and predicted Fisher-transformed correlations of item pairs,

and the residual between the observed and predicted log-odds ratios of item pairs. The first three statistics were used for relative fit evaluation, whereas the last three were used for absolute fit evaluation. They found that the BIC, and to some extent the AIC, can be useful for detecting misspecification of the CDM, the Q-matrix, or both. The saturated model can play an important role in detecting CDM or Q-matrix misspecifications: for CDM misspecification, it can be used to distinguish between possibly true and misspecified CDMs; for Q-matrix misspecification, it can be used as the true model to compare across Q-matrices. For absolute fit evaluation, the residual between the observed and predicted Fisher-transformed correlations of item pairs and the residual between the observed and predicted log-odds ratios of pairwise item responses had similar performance and were sensitive to different misspecifications in most conditions, and the saturated model can be used in place of the true model in most cases for these two statistics. However, both were insensitive to over-specified Q-matrices unless highly constrained CDMs were involved. They also suggested further investigating the impact of the degree of misspecification on CDM model fit, as they misspecified only a few items to represent over- or under-specification.

These studies showed some aligned results independent of which diagnostic classification model was used: AIC and BIC do not always agree on the best model, because BIC favors simpler models through the penalty it imposes on extra parameters; BIC can detect a misspecified model and a misspecified Q-matrix; and over-specification is hard to detect while under-specification is easy to detect. Chen, de la Torre, and Zhang (2013) also recommended that the saturated G-DINA model can be used in place of the correct model to detect a wrong Q-matrix. Yet an investigation of the power of relative model fit

statistics and absolute fit statistics subject to various degrees of Q-matrix misspecification is called for, as suggested by Chen, de la Torre, and Zhang (2013). This paper aims to further investigate the Q-matrix misspecification issue and extend the degree of misspecification of the Q-matrix with a simulation study. The degree of Q-matrix misspecification in this study is operationalized as the number of misspecified items. Meanwhile, because the true CDM might not be known either, I studied the effect of three different CDMs specified under the G-DINA framework with different interaction terms. Based on the results of previous studies (Choi, Rupp, & Pan, 2012; Choi, Templin, Cohen, & Atwood, 2010; Chen, de la Torre, & Zhang, 2013) and the suggestion of de la Torre and Douglas (2008), this paper includes the generating DINA model, the main-effect A-CDM, and the saturated G-DINA model as comparison models in the estimation part. In terms of fit indices, AIC and BIC are used because these two relative statistics are commonly used model fit statistics (e.g., Chen, de la Torre, & Zhang, 2013; Close, 2012; DeCarlo, 2011, 2012; de la Torre & Douglas, 2008; Henson, Templin, & Willse, 2009; Kunina-Habenicht, Rupp, & Wilhelm, 2012; Rupp & Templin, 2008). The CAIC is also included because it is a sample-size-adjusted version of AIC that is known for its insensitivity to sample size. In terms of absolute fit indices, the RMSEA is commonly used (e.g., von Davier, 2006; Kunina-Habenicht, Rupp, & Wilhelm, 2012). Moreover, in order to provide alternative absolute fit assessment, two other statistics, ABS(fcor) and MAX(x2), are introduced for the benefit of statistical hypothesis testing. This study investigates the combined effect of Q-matrix misspecification and CDM misspecification.

2.9 Purpose of the Study

This study addresses the core components of model fit of CDMs in two parts:

1. How is the relative fit of the CDM affected by Q-matrix misspecification and CDM misspecification?

2. How is the absolute fit of the CDM affected by Q-matrix misspecification and CDM misspecification?

CHAPTER 3
METHOD

In this simulation study, the manipulated factors are sample size, test length, type of misspecification, and number of misspecified items. There are 72 data generation conditions, and each condition was replicated 500 times. In addition to the correct model (i.e., DINA), the G-DINA model as the saturated model and the reduced A-CDM as a benchmark model are also fit to the generated data.

3.1 Simulation Design

Table 3-1 lists the research design in detail.

Table 3-1. Research Design

Research Design                       Levels
Data generation
  Generating model                    DINA
  Number of attributes                3
  Marginal attribute difficulty       0
  Attribute correlation               .5
  Slipping                            .1
  Guessing                            .1
Manipulated factors
  Sample size                         200, 500, 1000
  Test length                         20, 40
  Type of misspecification            under-specification, over-specification, both
  Number of misspecified items        1, 3, 5, 7
Generating conditions                 2 x 3 x 4 x 3 = 72
Number of replications                500
Estimating models                     DINA, G-DINA, A-CDM

The DINA model is a commonly used CDM in methodological studies (e.g., Chen, de la Torre, & Zhang, 2013; Choi, Templin, Cohen, & Atwood, 2010; DeCarlo, 2011, 2012; de la Torre, 2008; de la Torre & Douglas, 2004, 2008; Rupp & Templin, 2008; von Davier, 2014), as well as in applied studies (e.g., Başokçu, 2014; Başokçu, Ogretmen, & Kelecioglu, 2013; Lee, Park, & Taylan, 2011; Ravand, Barati, & Widhiarso, 2013; Su et al., 2013).

This paper selects the DINA model as the generating model because of its popularity, its ease of application and interpretation, and its low complexity. Three attributes are considered because three and five attributes are common dimensionalities of educational and psychological assessments (Kunina-Habenicht, Rupp, & Wilhelm, 2012); examples of simulation or real data studies with three or five attributes include de la Torre and Douglas (2004, 2008), de la Torre and Lee (2010), Templin and Bradshaw (2014), and Sun, Zhang, and de la Torre (2013). Each attribute is specified an equal number of times in the Q-matrix, which makes the Q-matrix a balanced design, and each item measures one, two, or three attributes. Since the point of this paper is to investigate the effect of the degree of Q-matrix misspecification on CDMs, I adopt a smaller dimension of just three attributes, which makes it easy to apply a fully crossed design. This allows a more precise and complete investigation of the degree of Q-matrix misspecification (1, 3, 5, 7 items) under the different types of misfit: over-specification (0 to 1), under-specification (1 to 0), and both (0 to 1 and 1 to 0).

Marginal attribute difficulty is the attribute difficulty, that is, the threshold for students correctly answering each item. Marginal attribute difficulty is held at 0 following Kunina-Habenicht, Rupp, and Wilhelm (2012), so that the probability of each respondent mastering a given attribute is 50%. I did this by setting the cutoff value at 0 for a normal distribution of the continuous underlying attributes created with mean 0 and a correlation matrix. Through this setting, I hope to minimize the effect of randomly under-specifying and over-specifying attributes of different difficulty in the Q-matrix, so that the conclusions about the effect of type of misspecification on model fit are more trustworthy.
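A minimal sketch of this data-generation step in R, using the MASS package and the CDM package's sim.din() named in Section 3.5 (the alpha argument for supplying fixed attribute profiles is an assumption about the package interface, and all object names are illustrative):

```r
library(MASS)   # for mvrnorm()
library(CDM)    # for sim.din()

# Balanced 20 x 3 Q-matrix of Table 3-3: three repeats of a six-item block
# plus two items that measure all three attributes.
block <- rbind(c(1,0,0), c(0,1,0), c(0,0,1), c(1,1,0), c(1,0,1), c(0,1,1))
Q <- rbind(block, block, block, c(1,1,1), c(1,1,1))

N   <- 500                                    # one of 200, 500, 1000
rho <- 0.5                                    # attribute correlation
Sigma <- matrix(rho, 3, 3); diag(Sigma) <- 1  # 3 x 3 correlation matrix

# Continuous underlying attributes, cut at 0 (marginal difficulty = 0),
# so each attribute is mastered by about half of the examinees.
theta <- mvrnorm(N, mu = rep(0, 3), Sigma = Sigma)
alpha <- 1 * (theta > 0)

# DINA responses with guessing = slipping = .1 for every item.
sim <- sim.din(q.matrix = Q, guess = rep(.1, nrow(Q)),
               slip = rep(.1, nrow(Q)), alpha = alpha)
dat <- sim$dat
```

Cutting the correlated normal draws at 0 reproduces the 50% marginal mastery rates and the .5 attribute correlation described above.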

Attribute correlation is the dichotomous correlation between two attributes. It is held at .5. This value is set based on the DINA setting of this article and on a review of previous literature. The correct model of this paper is the DINA model, which essentially specifies that an examinee needs to possess all required attributes in order to answer an item correctly; in other words, only the three-way interaction is significant. This special feature of the DINA model indicates that the attribute correlation should not be too high, and an investigation of empirical studies showed that .5 is a reasonable value (Henson, Roussos, Douglas, & He, 2008; Henson, Templin, & Douglas, 2007; Kunina-Habenicht, Rupp, & Wilhelm, 2012; Sinharay, Puhan, & Haberman, 2011).

Slipping and guessing are held at .1 following Chen, de la Torre, and Zhang (2013); according to Rupp, Templin, and Henson (2010), slipping at .1 is common in educational research. Table 3-2 presents the item parameters as used in Chen, de la Torre, and Zhang (2013).

Table 3-2. G-DINA Item Parameters

Items          Parameters (success probability for each reduced attribute pattern)
1 attribute    .10  .90
2 attributes   .10  .10  .10  .90
3 attributes   .10  .10  .10  .10  .10  .10  .10  .90

Sample sizes of 200, 500, and 1000 are included in the simulation. These sample sizes are selected based on the design of Chen, de la Torre, and Zhang (2013), who used sample sizes of 500 and 1000. A pilot study showed that smaller sample sizes are beneficial for recognizing the correct CDM. A review of the literature also shows studies that examined small sample sizes specifically:

Başokçu (2014) investigated small sample sizes such as 50, 100, and 200, and Maydeu-Olivares and Joe (2014) included a sample size of 100 in their multidimensional IRT study. Choi, Templin, Cohen, and Atwood (2010) found that with sample sizes of 200 or more, relative model fit indices are able to detect the correct CDM. So it is reasonable to include a sample size of 200 in the study.

Test lengths are set at 20 and 40 because a review of previous studies showed these are common in educational studies (e.g., Choi, Templin, Cohen, & Atwood, 2010; de la Torre & Douglas, 2004, 2008; de la Torre & Lee, 2010; Henson & Douglas, 2005; Henson, Roussos, Douglas, & He, 2008; Henson & Templin, 2009; Henson, Templin, & Douglas, 2007). The correct Q-matrix for 20 items is presented in Table 3-3; the Q-matrix for 40 items is simply a combination of two 20-item Q-matrices.

Table 3-3. Correct Q-matrix of 20 Items

Item   A1 A2 A3      Item   A1 A2 A3
1      1  0  0       11     1  0  1
2      0  1  0       12     0  1  1
3      0  0  1       13     1  0  0
4      1  1  0       14     0  1  0
5      1  0  1       15     0  0  1
6      0  1  1       16     1  1  0
7      1  0  0       17     1  0  1
8      0  1  0       18     0  1  1
9      0  0  1       19     1  1  1
10     1  1  0       20     1  1  1

Misspecifications of the Q-matrix are created by randomly selecting and misspecifying 1, 3, 5, or 7 items to cover over-specification (misspecifying a 0 as 1), under-specification (misspecifying a 1 as 0), and both over- and under-specification (misspecifying a 0 as 1 and a 1 as 0 within the same item, keeping the total

number of attributes measured unchanged). The manipulation of the types of misspecification is akin to Rupp and Templin (2008) and Chen, de la Torre, and Zhang (2013). I vary the number of misspecified items because previous studies examined only a limited degree of Q-matrix misspecification. For example, Kunina-Habenicht, Rupp, and Wilhelm (2012) created wrong Q-matrices by randomly permuting 30% of the correct Q-matrix or by changing the dimensionality of the correct Q-matrix (3 to 5 and vice versa). Yet previous results show that type of misspecification and dimensionality influence the parameter estimation of CDMs (Rupp & Templin, 2008), so whether a random permutation involves over-specification or under-specification should be discussed separately. Chen, de la Torre, and Zhang (2013) defined Q-matrix misspecification by misspecifying a couple of items in their study and suggested that the degree of Q-matrix misspecification be examined further in a future study. Through this setting, I vary the degree of Q-matrix misspecification via the number of misspecified items combined with the type of misspecification (over-specification, under-specification, and both over- and under-specification). Altogether there are 3 (sample size) x 2 (test length) x 4 (number of misspecified items) x 3 (types of misspecification) = 72 simulated conditions. Each condition was replicated 500 times following Chen, de la Torre, and Zhang (2013).
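The random item-level distortion just described can be sketched as below. misspecify_q() and pick() are hypothetical helpers written for this illustration (they are not CDM package functions), and to keep every item measuring at least one attribute the sketch skips under-specifying single-attribute items:

```r
pick <- function(x) x[sample.int(length(x), 1)]   # safe single random draw

misspecify_q <- function(Q, n_items, type = c("over", "under", "both")) {
  type <- match.arg(type)
  Qm <- Q
  for (j in sample.int(nrow(Q), n_items)) {       # randomly chosen items
    if (type %in% c("over", "both") && any(Qm[j, ] == 0))
      Qm[j, pick(which(Qm[j, ] == 0))] <- 1       # over-specify: a 0 becomes 1
    if (type %in% c("under", "both") && sum(Q[j, ]) > 1)
      Qm[j, pick(which(Q[j, ] == 1))] <- 0        # under-specify: a 1 becomes 0
  }
  Qm
}

Q_wrong <- misspecify_q(Q, n_items = 3, type = "both")
```

For the "both" condition, one original 0 is raised and one original 1 is lowered within the same item, so the total number of attributes the item measures stays unchanged, as in the design above.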

3.2 Model Estimation

Each simulated data set was analyzed with the DINA, the G-DINA, and the A-CDM, each combined with the correct Q-matrix as well as the misspecified Q-matrices, with all models but the DINA being considered CDM misspecifications. The DINA is the correct model; the G-DINA is the saturated model, used as a benchmark; and the A-CDM is the additive CDM with only the main effects of each required attribute, used as another benchmark. These fitting models are chosen following Chen, de la Torre, and Zhang (2013) and a review of the following papers: DeCarlo (2011, 2012) suggested using an independent DINA model with only main effects as the comparison model, and the simulation studies of Chen, de la Torre, and Zhang (2013), Choi, Rupp, and Pan (2012), Choi, Templin, Cohen, and Atwood (2010), and Kunina-Habenicht, Rupp, and Wilhelm (2012) all used a main-effect CDM and a saturated CDM as comparison models.

3.3 Model Fit

Absolute model fit (RMSEA, ABS(fcor), and MAX(x2)) and relative model fit (AIC, BIC, and CAIC) statistics of these fitting models will be examined. The absolute model fit statistics are chosen based on a review of Kunina-Habenicht, Rupp, and Wilhelm (2012), Chen, de la Torre, and Zhang (2013), and Rupp, Templin, and Henson (2010). The relative model fit statistics are chosen for the following reasons: AIC and BIC are widely used fit indices for comparing non-nested CDMs (e.g., Chen, de la Torre, & Zhang, 2013; Choi, Rupp, & Pan, 2012; Choi, Templin, Cohen, & Atwood, 2010; DeCarlo, 2011, 2012; de la Torre & Douglas, 2008; Henson & Templin, 2009; Kunina-Habenicht, Rupp, & Wilhelm, 2012; Rupp & Templin, 2008; Rupp, Templin, & Henson, 2010), and the CAIC is included because it is a sample-size-adjusted version of AIC and it appears that no one has assessed CAIC in CDM evaluation studies.
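A minimal sketch of the estimation and absolute-fit step, assuming the din(), gdina(), and modelfit.cor.din() interfaces of the CDM package (argument and component names should be checked against the installed version):

```r
fit_dina  <- din(dat, q.matrix = Q)                    # correct model
fit_acdm  <- gdina(dat, q.matrix = Q, rule = "ACDM")   # additive benchmark
fit_gdina <- gdina(dat, q.matrix = Q, rule = "GDINA")  # saturated benchmark

# Absolute fit: modelfit.cor.din() reports test-level fit statistics,
# including abs(fcor) and max(X2) with their p-values.
abs_fit <- modelfit.cor.din(fit_dina)
```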

3.4 Outcome Variable

Each generated data set is analyzed with the DINA, G-DINA, and A-CDM, and the corresponding model fit statistics (RMSEA, ABS(fcor), MAX(x2), AIC, BIC, and CAIC) are extracted and compared for each analysis. For RMSEA, the RMSEA values of the DINA, G-DINA, and A-CDM under the various conditions are box-plotted and compared to the .05 criterion to assess model fit. For ABS(fcor) and MAX(x2), the rejection rate out of the 500 replications is reported as the outcome variable. For AIC, BIC, and CAIC, whichever fitted CDM produces the lowest value in a replication is recognized as the best-fitting model, and the percentages of AIC, BIC, and CAIC selecting the DINA, G-DINA, or A-CDM are calculated and reported as the outcome variable.

3.5 Software

The Q-matrix generation, data generation, and estimation code are written and run in R (www.r-project.org) with the CDM package 3.0-29 (Robitzsch, Kiefer, George, & Uenlue, 2014).
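For a single replication, the relative-fit outcome can then be computed along the following lines (a sketch; the availability of logLik() methods for the fitted objects is an assumption about the CDM package):

```r
ic_table <- sapply(list(DINA = fit_dina, ACDM = fit_acdm, GDINA = fit_gdina),
                   function(f) {
                     ll <- logLik(f)              # fitted log-likelihood
                     p  <- attr(ll, "df")         # number of free parameters
                     n  <- nrow(dat)
                     c(AIC  = -2 * as.numeric(ll) + 2 * p,
                       BIC  = -2 * as.numeric(ll) + log(n) * p,
                       CAIC = -2 * as.numeric(ll) + (log(n) + 1) * p)
                   })
names(which.min(ic_table["CAIC", ]))   # model selected by CAIC
```

The CAIC row uses Bozdogan's (1987) penalty of (log n + 1) per parameter, which is the sample-size adjustment referred to in Section 3.3.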

CHAPTER 4
RESULTS

The results are organized in three parts. First, item parameter recovery is reported for the simulation study. In the second part, the performance of the relative model fit indices (research question 1) is reported. In the third part, the performance of the absolute model fit indices (research question 2) is reported.

Table 4-1. Item Parameter Recovery

                     J=20                                   J=40
             guessing          slipping            guessing          slipping
Sample size  bias  mean(se)    bias  mean(se)      bias  mean(se)    bias  mean(se)
200          .00   .1(.03)     .00   .1(.11)       .00   .1(.01)     .00   .1(.05)
500          .00   .1(.02)     .00   .1(.07)       .00   .1(.02)     .00   .1(.07)
1000         .00   .1(.01)     .00   .1(.05)       .00   .1(.01)     .00   .1(.05)

From Table 4-1 we can see that the bias for guessing and slipping is 0 across all simulation conditions, and the mean estimates for guessing and slipping are .1, as simulated. The standard errors of guessing are less than .03. The standard errors of slipping are larger than those of guessing because of the presence of two-way or three-way interactions, depending on whether an item measures two or three attributes. This result is not surprising, since Choi, Templin, Cohen, and Atwood (2010) found that main effects are easily recoverable while two-way interaction effects are difficult to recover even in conditions with a sample size of 4,000; slipping in G-DINA terms can be calculated as 1 minus the intercept minus the interaction effects, so if a multi-way interaction is not reliable, slipping is not reliable either. Kunina-Habenicht, Rupp, and Wilhelm (2012) found that trying to extract main and interaction effects on more than three dimensions from a single item score leads to large standard errors of parameter estimates and is also computationally burdensome. In a word, the precision of the parameter estimates was sufficiently high that the uncertainty in the estimates caused by the simulation can be ignored.
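The summaries in Table 4-1 are the usual Monte Carlo quantities; as a sketch, with g_hat a hypothetical replications-by-items matrix of guessing estimates (and analogously for slipping):

```r
# bias: mean deviation of the estimates from the generating value of .1
# mean(se): average over items of the empirical SD across replications
bias_g  <- mean(colMeans(g_hat) - 0.1)
mean_se <- mean(apply(g_hat, 2, sd))
```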

4.1 Research Question 1. Relative Fit of CDM

In this part, the proportions of replications in which AIC, BIC, and CAIC select the DINA, G-DINA, and A-CDM are presented. The results showed that the A-CDM is never selected, so the selection was between the DINA and the G-DINA; consequently, the A-CDM is left out of any further discussion. For simplicity in presenting the results, only the percentage of replications in which the DINA model is selected is reported; the rest of the proportion goes to the G-DINA model (Table 4-2).

Table 4-2. Selection Rate of AIC, BIC, and CAIC for the DINA Model

                    N=200                N=500                N=1000
J   T   No.    AIC   BIC   CAIC     AIC   BIC   CAIC     AIC   BIC   CAIC
20  t   0      1     1     1        1     1     1        1     1     1
    b   1      1     1     1        1     1     1        .99   1     1
        3      .01   .97   1        0     .36   .65      0     0     0
        5      0     .65   .90      0     0     .02      0     0     0
        7      0     .58   .84      0     .01   .03      0     0     0
    o   1      .13   1     1        0     .99   1        0     .16   .52
        3      0     .94   1        0     0     0        0     0     0
        5      0     .02   .19      0     0     0        0     0     0
        7      0     0     0        0     0     0        0     0     0
    u   1      1     1     1        .99   1     1        1     1     1
        3      1     1     1        1     1     1        1     1     1
        5      .99   1     1        .98   1     1        .99   1     1
        7      .81   .99   1        .84   .98   .99      .92   .97   .98
40  t   0      1     1     1        1     1     1        1     1     1
    b   1      1     1     1        1     1     1        1     1     1
        3      .28   1     1        0     1     1        0     .67   .96
        5      .02   1     1        0     .72   .94      0     0     .02
        7      0     .99   1        0     .12   .40      0     0     0
    o   1      .14   1     1        0     1     1        0     .41   .93
        3      0     1     1        0     .06   .58      0     0     0
        5      0     .94   1        0     0     0        0     0     0
        7      0     .02   .29      0     0     0        0     0     0

Table 4-2. Continued

                    N=200                N=500                N=1000
J   T   No.    AIC   BIC   CAIC     AIC   BIC   CAIC     AIC   BIC   CAIC
40  u   1      1     1     1        1     1     1        1     1     1
        3      1     1     1        1     1     1        1     1     1
        5      1     1     1        1     1     1        1     1     1
        7      1     1     1        1     1     1        1     1     1

Note: J = test length. T = type of misspecification. No. = number of misspecified items. t = true Q-matrix. b = both over- and under-specification. o = over-specification. u = under-specification. Cells are selection rates for the DINA model; the selection rate for the G-DINA model is not presented and can be calculated as 1 minus the selection rate for the DINA.

The results are presented in the following pattern for clarity. From Table 4-2 we can see that if the true Q-matrix is applied, the correct DINA model is always selected over the saturated G-DINA model regardless of sample size and test length. If the Q-matrix is misspecified, the discussion is organized by statistic under the conditions of type of misspecification, number of misspecified items, test length, and sample size. All results are based on the DINA being the correct model.

AIC. (1) When there is only under-specification, the selection rate for the DINA model is extremely high (92% or higher in all conditions except one) across all conditions. In other words, if a certain CDM is chosen over the saturated model, it is possible that the selected model is the correct model, while the Q-matrix is either correct or under-specified. (2) When there is only over-specification, the selection rate for the DINA model is extremely low (0 in nearly all conditions, apart from a small rate at 1 over-specified item with sample size 200). In other words, if AIC chooses the saturated model over the other CDMs, there may be over-specification of items in the Q-matrix. (3) When both over- and under-specification of the Q-matrix are involved, the AIC selection rate for the DINA model is low when the number of misspecified items is 3 or larger but high at 1 misspecified item: for example, when the number of misspecifications is 1, AIC chooses

the correct DINA model uniformly across test length and sample size. In other words, if we suspect both over- and under-specification in the Q-matrix and AIC chooses a CDM over the saturated model, it is possible that the selected CDM is the correct model and that the degree of misspecification (both over- and under-specification within an item) is not serious (1 item misspecified). On the other hand, if AIC chooses the saturated model, it is possible that the degree of misspecification (both over- and under-specification within items) is more serious (3 or more items misspecified).

In summary, under-specification can be detected easily by AIC, while AIC prefers the saturated model under over-specification and tolerates only 1 item with both over- and under-specification. To interpret the summary in another way, if a certain CDM is chosen over the saturated model by AIC, it is possible that the selected model is the correct model; the Q-matrix might be the correct Q-matrix, or, if we suspect the Q-matrix, it may have only under-specification in it or involve a minor degree of misspecification (mostly 1 item with both over- and under-specification, sometimes 3 items at test length 40). If the saturated model is chosen by AIC, there might be over-specification in the Q-matrix.

BIC. (1) When there is only under-specification in the items, BIC always selects the DINA (above 97%). (2) When there is only over-specification in the items, sample size, test length, and the number of misspecified items play an important role in BIC selection. For example, at sample size 200, BIC selects the DINA model at 3 or fewer over-specified items at test length 20 and at 5 or fewer over-specified items at test length 40; at sample size 500, BIC selects the DINA model only at 1 over-specified item; and at sample size 1000, BIC switches to the saturated G-DINA model even at 1 over-specified item.

(3) When both over- and under-specification are involved in the items, sample size, test length, and number of misspecified items all play a role. For example, at sample size 200, BIC selects the correct DINA model under most conditions, and the selection rate is extremely high at test length 40. At sample size 500, BIC selects the DINA model only at 1 misspecified item at test length 20 and at 5 or fewer misspecified items at test length 40. At sample size 1000, BIC selects the DINA model only at 1 misspecified item at test length 20 and may select the DINA model at 3 or fewer misspecified items at test length 40.

In summary, when there is only under-specification, BIC chooses the correct DINA model (under-specification can be detected uniformly by BIC); BIC tends to choose the correct DINA model at small sample sizes (500 or less, especially N = 200), small numbers of misspecified items (especially 1 misspecified item), and longer test length (J = 40); as sample size and the number of misspecified items increase, BIC tends to choose the saturated G-DINA model; and as test length increases, BIC tends to choose the correct DINA model. To interpret the summary in another way, if a CDM is chosen by BIC over the saturated model, it is possible that this CDM is the correct model; meanwhile, the Q-matrix might be correct or completely under-specified, or have a minor degree of over-specification (1 over-specified item at test length 20, or 3 over-specified items at test length 40), or minor both over- and under-specification in the items (1 item misspecified). On the other hand, if BIC chooses the saturated model, there might be severe misspecification of the Q-matrix (3 or 5 misspecified items or more, depending on test length).

CAIC. CAIC has a similar pattern to BIC but is more prone to choosing the DINA, especially at small sample size (N = 200) and longer test length (J = 40), and it is more tolerant of misspecification. For example, CAIC chooses the correct DINA model even when there is both over- and under-specification in the Q-matrix when the sample size is 200 at test length 20. In a word, under-specification can be detected uniformly across all conditions; as sample size and number of misspecified items increase, CAIC tends to choose the saturated G-DINA model, and as test length increases, CAIC tends to choose the correct DINA model.

4.2 Research Question 2. Absolute Fit of CDM

4.2.1 ABS(fcor) and MAX(x2)

Table 4-3 shows the rejection rates of ABS(fcor) and MAX(x2) at the .05 level.

Table 4-3. Rejection Rates of ABS(fcor) and MAX(x2)

Cells show the rejection rates as ABS(fcor)/MAX(x2) for each fitted model and sample size.

           DINA                           G-DINA                         A-CDM
J   T  No. N=200     N=500    N=1000      N=200    N=500    N=1000       N=200   N=500   N=1000
20  t  0   0/0       0/0      0/0         0/0      0/0      0/0          1/1     1/1     1/1
    b  1   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
       3   1/1       1/1      1/1         1/.93    1/1      1/1          1/1     1/1     1/1
       5   1/1       1/1      1/1         1/.98    1/1      1/1          1/1     1/1     1/1
       7   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
    o  1   .83/.38   1/.99    1/1         0/0      0/0      0/0          1/1     1/1     1/1
       3   .99/.78   1/1      1/1         .01/0    0/0      0/0          1/1     1/1     1/1
       5   1/1       1/1      1/1         0/0      0/0      0/0          1/1     1/1     1/1
       7   1/1       1/1      1/1         0/0      0/0      0/0          1/1     1/1     1/1
    u  1   .77/.36   1/.98    1/1         .78/.35  1/.97    1/1          1/1     1/1     1/1
       3   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
       5   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
       7   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
40  t  0   .01/0     0/0      0/0         0/0      0/0      0/0          1/1     1/1     1/1
    b  1   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
       3   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
       5   1/1       1/1      1/1         1/.99    1/1      1/1          1/1     1/1     1/1
       7   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1

Table 4-3. Continued

           DINA                           G-DINA                         A-CDM
J   T  No. N=200     N=500    N=1000      N=200    N=500    N=1000       N=200   N=500   N=1000
40  o  1   1/.97     1/1      1/1         0/0      0/0      0/0          1/1     1/1     1/1
       3   1/.98     1/1      1/1         0/0      0/0      0/0          1/1     1/1     1/1
       5   1/.99     1/1      1/1         0/0      0/0      0/0          1/1     1/1     1/1
       7   1/1       1/1      1/1         0/0      0/0      0/0          1/1     1/1     1/1
    u  1   .80/.27   1/.96    1/1         .81/.28  1/.97    1/1          1/1     1/1     1/1
       3   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
       5   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1
       7   1/1       1/1      1/1         1/1      1/1      1/1          1/1     1/1     1/1

Note: J = test length. T = type of misspecification. No. = number of misspecified items. t = true Q-matrix. b = both over- and under-specification. o = over-specification. u = under-specification.

Because Table 4-3 is a full factorial table, the results are presented by fitted CDM for clarity of presentation. The rejection rates of ABS(fcor) and MAX(x2) are almost identical except when the sample size is 200 and the number of misspecified items is 1, so the analysis does not distinguish the two statistics unless otherwise stated. All results are based on the DINA being the correct model.

DINA. With the true Q-matrix, the rejection rate is 0 or .01 across all sample sizes and test lengths, suggesting a low Type I error rate. If the Q-matrix is misspecified, then at sample size 200 the rejection rate is extremely high across all conditions, with a few exceptions for MAX(x2), and at sample sizes of 500 or larger the rejection rates are high (above 96%), suggesting high statistical power to detect Q-matrix misspecification of any type and degree. In other words, if the DINA model is the correct model and the correct Q-matrix is used, the Type I error rate of ABS(fcor) and MAX(x2) is very low. If the DINA model is the correct model and an incorrect Q-matrix of the

kinds specified in this research is used, ABS(fcor) and MAX(x2) have good power to detect Q-matrix misspecification of any type and degree most of the time.

G-DINA. With the true Q-matrix, the rejection rate is 0 or .01 across all sample sizes and test lengths, suggesting a low Type I error rate. If the Q-matrix is misspecified, the rejection rate for over-specification is uniformly 0, while the rejection rates for under-specification and for both over- and under-specification are almost all 100%, except at 1 under-specified item with sample size 200. In other words, the saturated G-DINA model can be used (Chen, de la Torre, & Zhang, 2013) to detect an incorrect Q-matrix that is either under-specified or both under- and over-specified; over-specification of the Q-matrix is hard to detect using the G-DINA model.

A-CDM. When the A-CDM is fitted, rejection rates are 100% across all conditions, suggesting high power to detect the wrong model and the wrong Q-matrix. In other words, an incorrect A-CDM can be detected by ABS(fcor) and MAX(x2) regardless of whether the Q-matrix is correct or not.

4.2.2 RMSEA

Figures 4-1 through 4-3 show the RMSEA values of the DINA, G-DINA, and A-CDM under the various conditions, with the .05 criterion.

Figure 4-1. RMSEA of DINA with True and Misspecified Q-matrix

From Figure 4-1 we can see that if the true Q-matrix is applied to the DINA model, sample size influences the mean RMSEA substantially. To be specific, at sample size 200, the RMSEA values are above the .05 criterion. As sample size increases

to 500 or more, RMSEA drops below the .05 criterion. On the other hand, when a misspecified Q-matrix is applied to the correct DINA model, RMSEA decreases as sample size and test length increase; this echoes Maydeu-Olivares and Joe (2014), who found that RMSEA decreases as sample size increases. At the same time, as the number of misspecified items increases, RMSEA increases. When the sample size is large, RMSEA can fall below .05 even if the Q-matrix is not correct. Another phenomenon is that RMSEA performs the same regardless of the type of misspecification.

In summary, if we know the true CDM but not the true Q-matrix, increases in sample size and test length decrease the RMSEA substantially. Minor misspecification (1 misspecified item) produces low RMSEA values and hence is hard to detect with RMSEA. To interpret the result in a practical way: RMSEA values are below .05 with the true CDM and true Q-matrix, especially at sample sizes of 500 or larger, and when we know the true CDM, RMSEA can be used to detect misspecification in the Q-matrix if the degree of misspecification is serious (RMSEA values above .05 at 3 or more misspecified items). Figure 4-2 shows the RMSEA of the G-DINA model with true and misspecified Q-matrices.

Figure 4-2. RMSEA of G-DINA with True and Misspecified Q-matrix

From Figure 4-2 we can see that, in general, the RMSEA of the saturated G-DINA model is lower than that of the DINA model, which is not surprising: the G-DINA model is more complex, and the extra parameterization brings RMSEA down (Rupp, Templin, & Henson, 2010; Maydeu-Olivares & Joe, 2014). If the true Q-matrix is applied to the G-DINA model, the same pattern as with the DINA model can be seen: RMSEA drops as sample size and test length increase; for example, when the sample size is 500 or larger, RMSEA is completely below .05. If a misspecified Q-matrix is applied to the G-DINA model, RMSEA decreases as sample size and test length increase and increases as the number of misspecified items increases. Over-specification of the Q-matrix produces extremely low RMSEA (consistent with Rupp, Templin, & Henson, 2010), even lower than .05 in all conditions.

With minor misspecification (1 item under-specified or both under- and over-specified), the RMSEA value is close to or lower than .05. In summary, over-specification is hard to detect by RMSEA with the saturated model, and minor misspecification (1 item under-specified or both under- and over-specified) is hard to detect by RMSEA with the saturated model as well. In other words, we can use the saturated model and RMSEA to detect a wrong Q-matrix if the type and degree of misspecification are serious (3 or more items of under-specification or of both under- and over-specification), but we cannot use the saturated model and RMSEA to detect an over-specified Q-matrix. Figure 4-3 shows the RMSEA of the A-CDM with true and misspecified Q-matrices.

Figure 4-3. RMSEA of A-CDM with True and Misspecified Q-matrix

From Figure 4-3 we can see that the RMSEA of the A-CDM is above the .05 criterion regardless of whether the Q-matrix is correct or not.

To sum up both the relative and absolute fit indices, the performance of the absolute fit statistics is consistent with the selection rates of the relative fit indices. Given that the DINA model is the correct model: if we know the correct CDM, all Q-matrix misspecifications can be detected by both relative and absolute fit indices; a wrong CDM can be easily detected by both relative and absolute fit indices; under over-specification, the relative fit indices tend to choose the saturated model, and the absolute fit indices cannot reject an over-specified Q-matrix with the saturated model; and with minor misspecification of the Q-matrix (3 or fewer under-specified or both under- and over-specified items), relative fit indices like BIC and CAIC can still pick the correct model, especially when the sample size is small (500 or less, especially N = 200), and absolute fit indices like ABS(fcor) and MAX(x2) can reject even minor Q-matrix misspecification (1 misspecified item) while RMSEA cannot.

4.3 An Empirical Example

In order to illustrate the results of this article, a subset of the Tatsuoka (1983) fraction subtraction data is used as an example to show how absolute and relative fit indices select among different CDMs. The Tatsuoka fraction subtraction data are responses of 2,144 middle school students to 20 fraction subtraction problems involving 8 attributes (Tatsuoka, 1983). The fraction subtraction example is popular in CDM research; examples include Chen, de la Torre, and Zhang (2013), Close (2012), DeCarlo (2011, 2012), de la Torre (2008), de la Torre and Douglas (2004, 2008), de la Torre and Lee (2010), Henson, Templin, and Willse (2009), Rupp and Templin (2008), and Sinharay and Almond (2007). The attributes used for the 20 items are: (1) converting a whole number to a fraction; (2) separating a whole number from a fraction; (3) simplifying before subtracting; (4) finding a common denominator; (5) borrowing from the whole number; (6) column borrowing; (7) subtracting numerators; and (8) reducing answers to the simplest form. The fraction subtraction data and the corresponding Q-matrix are given in Figure 4-4.

Figure 4-4. Fraction Subtraction Example and Q-matrix. [Reprinted with permission from Chen, J. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement (Page 135, Table 5)]

The DINA, G-DINA, and A-CDM are fit to the fraction subtraction data with the Q-matrix described above; the model fit results, including relative fit (AIC, BIC, CAIC) and absolute fit (ABS(fcor) p-value, MAX(x2) p-value, RMSEA), are given in Table 4-4.

Table 4-4. Model Fit Results of the Fraction Subtraction Data Analysis

Model    AIC      BIC      CAIC      ABS(fcor) p   MAX(x2) p   RMSEA
DINA     8971.78  9301.65  9378.65   0.00          0.00        0.10
G-DINA   8906.81  9879.31  10106.31  0.00          0.00        0.12
A-CDM    8886.52  9370.63  9483.63   0.00          0.00        0.09

For the relative fit indices, BIC and CAIC are lowest for the DINA model (9301.65 and 9378.65, respectively). Based on the results of this study, if a certain CDM is selected by BIC and CAIC over the saturated model, this CDM could be the correct model. The implication is that the DINA model is the correct model for the fraction subtraction data, while the Q-matrix might be correct or completely under-specified, or mildly over-specified (1 over-specified item at test length 20), or minorly both over- and under-specified (1 item misspecified).

For the absolute fit, the fact that even the G-DINA model is rejected by ABS(fcor) and MAX(x2) indicates that there is misspecification (at least under-specification) in the Q-matrix, given that the RMSEA values for the DINA, G-DINA, and A-CDM are all above .05 (.10, .12, and .09, respectively). The reason for this statement is that, based on the results of this study, the G-DINA model with over-specification produces very low RMSEA values. Further investigation to modify and check the under-specification, or both under- and over-specification, in the Q-matrix is called for. Through this example, it is illustrated how relative fit statistics like BIC or CAIC and absolute fit statistics like ABS(fcor), MAX(x2), and RMSEA work well together to help assess the model fit of CDMs: with BIC and CAIC, the correct model is identified, and with ABS(fcor), MAX(x2), and RMSEA, misspecification of the Q-matrix is identified.
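An analysis of this kind can be reproduced along the following lines with the fraction subtraction data bundled with the CDM package (a sketch; the bundled set is a 536-examinee subset, so the resulting values will differ from Table 4-4):

```r
library(CDM)
data(fraction.subtraction.data)
data(fraction.subtraction.qmatrix)

f_dina  <- din(fraction.subtraction.data,
               q.matrix = fraction.subtraction.qmatrix)
f_gdina <- gdina(fraction.subtraction.data,
                 q.matrix = fraction.subtraction.qmatrix, rule = "GDINA")
f_acdm  <- gdina(fraction.subtraction.data,
                 q.matrix = fraction.subtraction.qmatrix, rule = "ACDM")

summary(f_dina)             # reports information criteria for the DINA fit
modelfit.cor.din(f_dina)    # absolute fit statistics for the DINA fit
```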

CHAPTER 5
DISCUSSION AND CONCLUSION

Importance has been attached to the model fit of CDMs in terms of Q-matrix correctness and model appropriateness as a validity concern in CDM studies. This study extends previous studies in the following ways: it generates and analyzes data in the general framework of the G-DINA model following Chen, de la Torre, and Zhang (2013); it introduces various degrees of Q-matrix misspecification by manipulating the number of randomly chosen misspecified items together with the type of misspecification of the Q-matrix; it evaluates the combined condition of Q-matrix misspecification and CDM misspecification; it introduces the CAIC as a relative model fit index for the first time, and the CAIC turns out to be a reliable model fit index; and it carries over .05 as the criterion for RMSEA for model fit purposes from multidimensional IRT to CDMs.

The following conclusions can be drawn from this study (all conclusions and discussion are based on the DINA being the true CDM). (1) When we are sure about the Q-matrix and want to identify the true CDM, AIC, BIC, and CAIC are reliable model fit statistics. This finding is consistent with the conclusion of Chen, de la Torre, and Zhang (2013), who found that BIC, and to some extent AIC, can choose the true model. In contrast, Kunina-Habenicht, Rupp, and Wilhelm (2012) found that BIC does not select the true model with high consistency; this discrepancy is due to the fact that their generating model was a saturated model, and BIC favors parsimonious models with fewer parameters. (2) When we are sure about the CDM but not the Q-matrix, all the statistics are useful in detecting the true Q-matrix against a wrong Q-matrix whenever the true Q-matrix is among the competing matrices. This result

corresponds to the findings of Kunina-Habenicht, Rupp, and Wilhelm (2012) and Chen, de la Torre, and Zhang (2013), who concluded that AIC and BIC are able to point to the correct Q-matrix against a wrong Q-matrix. However, in reality it is possible that we are not sure about the true Q-matrix or the true CDM, so the contribution of this study is that it may cast light on which index to use for model fit evaluation and under what conditions. When we are not sure about the Q-matrix or the CDM, the following conclusions can be made based on the results of this study: (1) All relative and absolute fit indices can detect an incorrect model with high consistency. (2) The true Q-matrix combined with either the correct DINA model or the saturated G-DINA model has low rejection rates (0), suggesting an extremely low Type I error rate. (3) Under-specification is easy to detect by AIC, BIC, CAIC, ABS(fcor), and MAX(x2). (4) Over-specification of the Q-matrix is hard to detect by ABS(fcor) and MAX(x2) with the G-DINA model; this conclusion is consistent with Kunina-Habenicht, Rupp, and Wilhelm (2012). (5) Small sample size (200), longer test length (40), and a small number of misspecified items (1 item), in combination with BIC and CAIC, improve the selection rate towards the true DINA model; Choi, Templin, Cohen, and Atwood (2010), Kunina-Habenicht, Rupp, and Wilhelm (2012), and Chen, de la Torre, and Zhang (2013) also found that BIC is useful in detecting the correct CDM. (6) The .05 criterion for RMSEA is practical for CDMs because Q-matrix misspecification and CDM misspecification both produce large RMSEA values, while the RMSEA value is low for the saturated model with over-specification in the Q-matrix.

To interpret the conclusions in a practical way: if a CDM is chosen by BIC or CAIC over the saturated model, it is possible that this CDM is the correct model. Meanwhile, the

Q-matrix might be correct or completely under-specified, or have a minor or medium degree of over-specification (1 over-specified item at test length 20, or 3 over-specified items at test length 40), or minor both over- and under-specification in the items (1 item misspecified). On the other hand, if BIC and CAIC choose the saturated model, there might be complete over-specification or severe misspecification of other types in the Q-matrix (5 items or more). If ABS(fcor) and MAX(x2) fail to reject a model, it indicates that this model is possibly the correct model with the correct Q-matrix or with only a minor misspecification of the Q-matrix (either under-specified or both under- and over-specified).

The following suggestions can be made based on the results of this study when it is suspected that a restricted CDM is the correct model to apply. In order to determine the model fit of a CDM, practitioners should first conduct a pilot study, preferably with a smaller sample size (200) and a longer test length (40), together with BIC and CAIC, to determine the correct CDM: if a certain CDM is selected over the saturated model, it is possible that this model is the correct model. Practitioners can then fit the chosen CDM and the saturated CDM to the competing Q-matrices to determine the correct Q-matrix using ABS(fcor) and MAX(x2), until a Q-matrix stops being rejected (note that over-specification cannot be rejected with the saturated model). Practitioners can then evaluate the RMSEA value; presumably the RMSEA for the correct model coupled with the true Q-matrix, or with only 1 misspecified item, has a value smaller than .05. Finally, practitioners can re-evaluate the overall fit by re-examining the AIC, BIC, CAIC, and the p-values to determine the final fit of the candidate CDMs and candidate Q-matrices.
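A hedged sketch of this suggested workflow, reusing the illustrative objects from Chapter 3 (whether BIC() applies directly to the fitted objects is an assumption about the CDM package):

```r
# Step 1 (pilot, small N, long test): pick the CDM with the lowest BIC/CAIC.
cand <- list(DINA  = din(dat, q.matrix = Q),
             ACDM  = gdina(dat, q.matrix = Q, rule = "ACDM"),
             GDINA = gdina(dat, q.matrix = Q, rule = "GDINA"))
chosen <- cand[[which.min(sapply(cand, BIC))]]

# Step 2: refit the chosen and saturated models under each competing
# Q-matrix and retain a Q-matrix once ABS(fcor)/MAX(x2) stop rejecting it
# (recall that over-specification cannot be rejected with the saturated model).

# Step 3: check that RMSEA for the retained model/Q-matrix pair is below .05,
# then re-examine AIC, BIC, CAIC, and the test p-values for the final call.
```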

The findings of this study should cast some insight on practical CDM assessment in terms of which statistical indices to use to evaluate CDM fit and when to use them, especially regarding the effect of the number of misspecified items on CDM model fit, as suggested by Chen, de la Torre, and Zhang (2013). This study suggests that a small number of misspecified items (1) is hard to detect with the model fit indices.

There are also limitations to this paper. We do not know whether BIC and CAIC point to the correct model because the DINA is the generating model or because the DINA is the more parsimonious model; more research needs to be done in this regard. In addition, this paper used only dichotomous items to investigate the effects of various Q-matrix misspecifications on model fit indices. Future studies can focus on polytomous items, different parameter estimation methods, classification accuracy of CDMs by latent class, nonparametric methods, CDMs with covariates, or multilevel CDMs.

LIST OF REFERENCES

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716-723.

Başokçu, T. O. (2014). Classification accuracy effects of Q-matrix validation and sample size in DINA and G-DINA models. Journal of Education and Practice, 5(6), 220-230.

Başokçu, T. O., Ogretmen, T., & Kelecioglu, H. (2013). Model data fit comparison between DINA and G-DINA in cognitive diagnostic models. Education Journal, 2(6), 256-262.

Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370.

Bradshaw, L., Izsák, A., Templin, J., & Jacobson, E. (2014). Diagnosing teachers' understandings of rational numbers: Building a multidimensional test within the diagnostic classification framework. Educational Measurement: Issues and Practice, 33, 2-14.

Chen, J., & de la Torre, J. (2013). A general cognitive diagnosis model for expert-defined polytomous attributes. Applied Psychological Measurement, 37, 419-437.

Chen, J., de la Torre, J., & Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50(2), 123-140.

Choi, H.-J., Templin, J. L., Cohen, A. S., & Atwood, C. H. (2010, April). The impact of model misspecification on estimation accuracy in diagnostic classification models. Paper presented at the meeting of the National Council on Measurement in Education (NCME), Denver, CO.

Choi, H.-J., Rupp, A. A., & Pan, M. (2012). Standardized diagnostic assessment design and analysis: Key ideas from modern measurement theory. In Self-directed learning oriented assessments in the Asia-Pacific (pp. 61-86). New York, NY: Springer.

Close, C. N. (2012). An exploratory technique for finding the Q-matrix for the DINA model in cognitive diagnostic assessment: Combining theory with data (Doctoral dissertation, University of Minnesota).

DeCarlo, L. T. (2011). On the analysis of fraction subtraction data: The DINA model, classification, latent class sizes, and the Q-matrix. Applied Psychological Measurement, 35, 8-26.

DeCarlo, L. T. (2012). Recognizing uncertainty in the Q-matrix via a Bayesian extension of the DINA model. Applied Psychological Measurement, 36, 447-468.

de la Torre, J. (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45, 343-362.

de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179-199.

de la Torre, J., & Chiu, C.-Y. (2010, April). A general method of empirical Q-matrix validation using the G-DINA model discrimination index. Paper presented at the meeting of the National Council on Measurement in Education, Denver, CO.

de la Torre, J., & Douglas, J. (2004). A higher-order latent trait model for cognitive diagnosis. Psychometrika, 69, 333-353.

de la Torre, J., & Douglas, J. (2008). Model evaluation and multiple strategies in cognitive diagnosis: An analysis of fraction subtraction data. Psychometrika, 73, 595-624.

de la Torre, J., & Karelitz, T. M. (2009). Impact of diagnosticity on the adequacy of models for cognitive diagnosis under a linear attribute structure: A simulation study. Journal of Educational Measurement, 46(4), 450-469.

de la Torre, J., & Lee, Y.-S. (2010). A note on the invariance of the DINA model parameters. Journal of Educational Measurement, 47, 115-127.

Doignon, J.-P., & Falmagne, J.-C. (1999). Knowledge spaces. Berlin: Springer.

Gierl, M. J., Leighton, J. P., & Hunka, S. M. (2007). Using the attribute hierarchy method to make diagnostic inferences about examinees' cognitive skills. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 242-274). Cambridge, UK: Cambridge University Press.

Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 333-352.

Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL.

Henson, R., & Douglas, J. (2005). Test construction for cognitive diagnosis. Applied Psychological Measurement, 29(4), 262-277.

Henson, R., Roussos, L. A., Douglas, J., & He, X. (2008). Cognitive diagnostic attribute-level discrimination indices. Applied Psychological Measurement, 32(4), 275-288.

Henson, R. A., & Templin, J. L. (2009). Implications of Q-matrix misspecification in cognitive diagnosis. Unpublished paper.

Henson, R., Templin, J. L., & Douglas, J. (2007). Using efficient model based sum-scores for conducting skills diagnoses. Journal of Educational Measurement, 44(4), 361-376.

Unpublished ETS Project Report, Princeton, NJ.

Henson, R. A., Templin, J., & Willse, J. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74, 191-210.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.

Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.

Kunina-Habenicht, O., Rupp, A. A., & Wilhelm, O. (2012). The impact of model misspecification on parameter estimation and item fit assessment in log-linear diagnostic classification models. Journal of Educational Measurement, 49, 59-81.

Lattin, J., Carroll, J. D., & Green, P. E. (2003). Analyzing multivariate data. Pacific Grove, CA: Brooks/Cole-Thomson Learning.

Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.

Lee, Y.-S., Park, Y. S., & Taylan, D. (2011). A cognitive diagnostic modeling of attribute mastery in Massachusetts, Minnesota, and the U.S. national sample using the TIMSS 2007. International Journal of Testing, 11, 144-177.

Leighton, J. P., Gierl, M. J., & Hunka, S. M. (2004). The attribute hierarchy method for cognitive assessment: A variation on Tatsuoka's rule-space approach. Journal of Educational Measurement, 41, 205-237.

Liu, Y., Douglas, J. A., & Henson, R. A. (2009). Testing person fit in cognitive diagnosis. Applied Psychological Measurement, 33(8), 579-598.

Maris, E. (1995a). Psychometric latent response models. Psychometrika, 60, 523-547.

Maris, E. (1995b). Estimating multiple classification latent class models. Psychometrika, 64, 187-212.

Maydeu-Olivares, A., & Joe, H. (2014). Assessing approximate fit in categorical data analysis. Multivariate Behavioral Research (in press).

McDonald, R. P., & Mok, M. M.-C. (1995). Goodness of fit in item response models. Multivariate Behavioral Research, 30(1), 23-40.

Ravand, H., Barati, H., & Widhiarso, W. (2013). Exploring diagnostic capacity of a high stakes reading comprehension test: A pedagogical demonstration. Iranian Journal of Language Testing, 3(1), 1-27.

Robitzsch, A., Kiefer, T., George, A. C., & Uenlue, A. (2014). CDM package (Version 3.0-29) [Computer software].

Rupp, A. A., & Mislevy, R. J. (2007). Cognitive foundations of structured item response theory models. In J. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 205-241). Cambridge, UK: Cambridge University Press.

Rupp, A. A., & Templin, J. L. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68(1), 78-96.

Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York, NY: Guilford Press.

Schrepp, M. (2003). A method for the analysis of hierarchical dependencies between items of a questionnaire. Methods of Psychological Research Online, 8, 43-79.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.

Sinharay, S., & Almond, R. G. (2007). Assessing fit of cognitive diagnostic models: A case study. Educational and Psychological Measurement, 67, 239-257.

Sinharay, S., Puhan, G., & Haberman, S. J. (2011). An NCME instructional module on subscores. Educational Measurement: Issues and Practice, 30(3), 29-40.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583-639.

Su, Y.-L., Choi, K. M., Lee, W.-C., Choi, T., & McAninch, M. (2013). Hierarchical cognitive diagnostic analysis for TIMSS 2003 mathematics (CASMA Research Report 35). Iowa City, IA: Center for Advanced Studies in Measurement and Assessment (CASMA), University of Iowa.

Sun, J., Xin, T., Zhang, S., & de la Torre, J. (2013). A polytomous extension of the generalized distance discriminating method. Applied Psychological Measurement, 37(7), 503-521.

Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354.

Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317-339.

Templin, J., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287-305.

Templin, J. L., Henson, R. A., Templin, S. E., & Roussos, L. (2008). Robustness of hierarchical modeling of skill association in cognitive diagnosis models. Applied Psychological Measurement, 32(7), 559-574.

von Davier, M. (2005). A general diagnostic model applied to language testing data (Research Report RR-05-16). Princeton, NJ: Educational Testing Service.

von Davier, M. (2006). Multidimensional latent trait modelling (MDLTM) [Software program]. Princeton, NJ: Educational Testing Service.

von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287-307.

von Davier, M. (2014). The DINA model as a constrained general diagnostic model: Two variants of a model equivalency. British Journal of Mathematical and Statistical Psychology, 67, 49-71.

Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data (ETS Research Report RR-08-27). Princeton, NJ: Educational Testing Service.

BIOGRAPHICAL SKETCH

Jinxiang Hu earned her college degree in English literature (international information communication and management) at Tianjin Foreign Studies University in 2004. She entered the University of Florida for her doctoral degree in education, specializing in research and evaluation methodology. She is honored by and grateful for the four years she spent at the University of Florida, and great appreciation goes to all the professors in REM and all the friends who escorted her along the way.