Observed Score and True Score Equating for Multidimensional Item Response Theory under Nonequivalent Group Anchor Test Design


Material Information

Title:
Observed Score and True Score Equating for Multidimensional Item Response Theory under Nonequivalent Group Anchor Test Design
Physical Description:
1 online resource (436 p.)
Language:
English
Creator:
Zhang, Ou
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Research and Evaluation Methodology, Human Development and Organizational Studies in Education
Committee Chair:
Miller, M David
Committee Co-Chair:
Algina, James J
Committee Members:
Leite, Walter
Smith, Stephen W

Subjects

Subjects / Keywords:
equating -- irt -- linking -- multidimensional -- neat
Human Development and Organizational Studies in Education -- Dissertations, Academic -- UF
Genre:
Research and Evaluation Methodology thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
In the multidimensional item response theory (MIRT) framework, scale linking adjusts for rotation, correlation, translation, and dilation so that two test forms are linked. In addition, several measures of ability are available for reporting an examinee's proficiency. For each MIRT ability vector on one test form, there may be an infinite number of ability vectors that fall on the equivalent contour of the test characteristic surface of the other, equated test form. Using the number-correct score as the ability measure therefore makes MIRT equating feasible. Several procedures have been developed to conduct MIRT equating (Brossman, 2010). The purpose of this study was to evaluate the performance of MIRT equating under the nonequivalent group anchor test (NEAT) design and to explore how different MIRT linking methods, interacting with these equating procedures, affect the equating results under various testing conditions. Five MIRT linking methods (the direct method, OD; the test characteristic function method, TCF; the item characteristic function method, ICF; Min's method, M; and the non-orthogonal Procrustes method, NOP) and three MIRT equating procedures (full MIRT observed score equating, the unidimensional approximation of MIRT true score equating, and the unidimensional approximation of MIRT observed score equating) were examined. Results indicated that the unidimensional approximation of MIRT true score equating performed best among the three equating procedures across all group distribution conditions and all linking methods. The MIRT equating procedures performed better under the TCF, ICF, and OD linking methods than under the M and NOP linking methods. Under the NOP linking method, the equating procedures showed the worst equating performance in most of the group distribution conditions.
Furthermore, the difference in group ability means had the largest negative effect on the equating results for all three equating procedures across all linking methods. Future studies are expected to address how different MIRT software, the choice of synthetic population weights, the choice of criterion equating function, and the type of rotation influence the performance of MIRT equating.
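The abstract's remark that many ability vectors can fall on the same contour of a test characteristic surface can be illustrated with a small sketch. It assumes a two-dimensional compensatory logistic model with an intercept term; the item parameters, the function names (`item_prob`, `tcs`, `theta2_on_contour`), and the reference ability vector are all invented for illustration and are not taken from the dissertation.

```python
import math

def item_prob(theta, a, d):
    """Compensatory 2PL MIRT item: P = 1 / (1 + exp(-(a . theta + d)))."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def tcs(theta, items):
    """Test characteristic surface: expected number-correct score at theta."""
    return sum(item_prob(theta, a, d) for a, d in items)

def theta2_on_contour(theta1, target, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection for theta2 with tcs((theta1, theta2)) == target.

    Relies on every item having a positive second discrimination, so the
    surface is strictly increasing in theta2.
    """
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if tcs((theta1, mid), items) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0

# Hypothetical item parameters: ((a1, a2), d) for a four-item test.
items = [((1.2, 0.3), -0.5), ((0.4, 1.1), 0.2),
         ((0.8, 0.8), 0.0), ((0.2, 1.5), -1.0)]

# Expected number-correct score at one arbitrary ability vector.
target = tcs((0.5, -0.3), items)

# Three other ability vectors that lie on the same contour.
contour = [(t1, theta2_on_contour(t1, target, items)) for t1 in (-1.0, 0.0, 1.0)]

for point in contour:
    print(point, round(tcs(point, items), 6))
```

Each printed ability vector differs from the others yet yields the same expected number-correct score, which is why a single score, rather than an ability vector, can serve as the quantity being equated.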
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Ou Zhang.
Thesis:
Thesis (Ph.D.)--University of Florida, 2012.
Local:
Adviser: Miller, M David.
Local:
Co-adviser: Algina, James J.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2014-08-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2012
System ID:
UFE0044500:00001




Full Text

PAGE 1

OBSERVED SCORE AND TRUE SCORE EQUATING FOR MULTIDIMENSIONAL ITEM RESPONSE THEORY UNDER NONEQUIVALENT GROUP ANCHOR TEST DESIGN

By

OU ZHANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2012

PAGE 2

© 2012 Ou Zhang

PAGE 3

Having dream is what makes life tolerable!

PAGE 4

ACKNOWLEDGMENTS

I would like to express my sincere appreciation to Dr. M. David Miller, my committee chair, who has always been there to answer my questions, to provide valuable guidance, and to encourage and support me, not only in this dissertation but also in starting my professional career. I would like to extend my endless thanks and appreciation to Dr. James J. Algina, my committee co-chair, for his valuable guidance and selfless devotion to helping me succeed. I am also extremely grateful to my other committee members, Dr. Walter Leite and Dr. Stephen Smith, who have generously given their time and expertise to review my work. My deepest gratitude goes to my dearest parents, Zhigang Zhang and Rongqing Ou, for their constant support, love, and trust. I wouldn't have made it this far without their support.

PAGE 5

TABLE OF CONTENTS

page

Linking and Equating ..... 23
Item Response Theory ..... 24
Multidimensional Item Response Theory ..... 24
MIRT Linking ..... 26
MIRT Equating ..... 27
Limitation in Previous MIRT Equating Research ..... 33
Purpose of the Study ..... 34
2 LITERATURE REVIEW ..... 36
Item Response Theory ..... 36
Unidimensional Item Response Theory ..... 37
Unidimensional Item Response Model for Dichotomous Data ..... 37
Item characteristic curve (ICC) ..... 39
Test characteristic curve (TCC) ..... 40
Multidimensional Item Response Theory ..... 41
Multidimensional IRT Models for Dichotomous Data ..... 41
Compensatory Multidimensional Item Response Model ..... 42
Partially compensatory Multidimensional Item Response Model ..... 44
Summary of MIRT Statistics ..... 46
MDISC ..... 46
MDIFF ..... 49
Lower Asymptote Parameter ..... 50
Item Characteristic Surface ..... 50
Test Characteristic Surface (TCS) ..... 51
Item Information Function ..... 52
Reference Composite and Direction of Best Measurement ..... 53
Linking and Equating ..... 57
Data Collection Design ..... 57
IRT Linking ..... 58

PAGE 6

Invariance of item/person parameters ..... 59
Scale indeterminacy ..... 59
IRT Scale Linking Methods ..... 60
Concurrent calibration ..... 61
Fixed common item parameter calibration (FCIP) ..... 61
Scale linking after separate calibrations ..... 61
UIRT Scale Linking ..... 61
Scale Transformation Procedure ..... 62
Mean/Sigma Method ..... 63
Mean/Mean Method ..... 63
Haebara Method ..... 64
Stocking-Lord Method ..... 65
Summary of UIRT Scale Linking Methods ..... 66
MIRT Scale Linking ..... 66
..... 69
..... 72
..... 73
..... 75
Reckase and Martineau (NOP) Method ..... 77
..... 79
The direct (OD) procedure ..... 80
The equated function procedure ..... 80
The test characteristic function (TCF) procedure ..... 82
The item characteristic function (ICF) procedure ..... 83
Summary of MIRT Scale Linking ..... 83
IRT Equating ..... 85
Measures of Ability ..... 85
UIRT True score Equating ..... 86
UIRT Observed score Equating ..... 88
Summary of UIRT Equating Methods ..... 89
MIRT Equating ..... 90
Full MIRT Observed Score Equating (MOSE) ..... 94
Unidimensional Approximation ..... 96
Unidimensional Approximation of MIRT True Score Equating (ATSE) ..... 100
Unidimensional Approximation of MIRT Observed Score Equating (AOSE) ..... 101
3 METHODOLOGY ..... 102
Simulation Design ..... 102
Model Used to Generate Data ..... 103
Test Length ..... 103
Anchor Test Length ..... 104
Test Structure and MIRT Item Parameter ..... 105
Generating Item Parameters for Simulation ..... 107
Sample Size ..... 114
Number of Replications ..... 115
Ability Distribution Design ..... 115

PAGE 7

Condition Summary ..... 117
Data Generation ..... 118
MIRT Parameter Estimation ..... 118
Selection of IRT Software Program ..... 118
Multidimensional IRT Estimation ..... 119
Identification Issue in Item Parameter Estimation ..... 120
Correcting Dimension Sequence and Direction ..... 121
Linking ..... 121
MIRT Linking ..... 122
Minimization Algorithm used in Linking Procedure ..... 122
Newton-Raphson algorithm ..... 122
Broyden-Fletcher-Goldfarb-Shanno (BFGS) method ..... 123
Criteria and Quadrature Nodes used in Minimization ..... 123
Equating ..... 124
Common Target Population and Synthetic Population Weights ..... 124
MIRT Equating ..... 125
Full MIRT observed score equating (MOSE) ..... 125
Unidimensional approximation for MIRT equating ..... 127
Unidimensional approximation of MIRT true (ATSE)/observed score equating (AOSE) ..... 129
Large Sample Criterion and Criterion Equating Method ..... 129
Evaluation Criteria and Data Analysis ..... 131
Evaluation Criteria ..... 131
Standard error of equating ( ) ..... 132
Equating bias ..... 132
Root mean square deviation ( ) ..... 133
Weighted average root mean square deviation ( ) ..... 133
ANOVA Analysis ..... 133
4 RESULTS ..... 137
Preliminary ANOVA results ..... 138
2 ..... 138
Comparison for the Linking Method x Group Distribution Interaction ..... 139
Comparison for the Equating Method x Group Distribution Interaction ..... 142
Group Distribution Conditions ..... 144
Group Distribution Condition 1 ..... 145
Equivalent score difference ..... 145
SEE, RMSD, and Bias ..... 146
Group Distribution Condition 2 ..... 148
Equivalent score difference ..... 148
SEE, RMSD, and Bias ..... 149
Group Distribution Condition 3 ..... 151
Equivalent score difference ..... 151
SEE, RMSD, and Bias ..... 152
Group Distribution Condition 4 ..... 154

PAGE 8

Equivalent score difference ..... 154
SEE, RMSD, and Bias ..... 155
Group Distribution Condition 5 and 6 ..... 158
5 DISCUSSION AND CONCLUSION ..... 160
Equivalent Score Difference ..... 161
Standard Error of Equating ..... 164
RMSD ..... 165
Bias ..... 166
Effects from IRT Estimation ..... 166
Effects from IRT Linking Methods ..... 167
Limitation and Future Research Direction ..... 168
Conclusion

PAGE 9

LIST OF TABLES

Table page

A-1 Multidimensional model used in previous MIRT linking/equating studies ..... 174
A-2 Linking design, test length and anchor test length in previous MIRT linking/equating studies ..... 174
A-3 Test structure in previous MIRT linking/equating studies ..... 175
A-4 ..... 175
A-5 Ten MIRT discrimination and difficulty levels ..... 175
A-6 Test structure of base form unique test section (approximate simple structure) ..... 176
A-7 Test structure of base form unique test section (complex structure) ..... 177
A-8 Test structure of equated form unique item section (approximate simple structure) ..... 178
A-9 Test structure of equated form unique item section (complex structure) ..... 179
A-10 Test structure of anchor item section (approximate simple structure) ..... 180
A-11 Test structure of anchor item section (complex structure) ..... 181
A-12 Sample size used in previous MIRT linking/equating research ..... 182
A-13 Number of replications used in previous MIRT linking/equating research ..... 182
A-14 Ability distributions for examinee groups ..... 183
A-15 Simulation design condition ((2 base group 6 ability distributions) 2 test structure) ..... 183
A-16 MIRT software used in previous MIRT linking/equating studies ..... 184
A-17 Rotation methods used in previous MIRT linking/equating studies ..... 184
A-18 Repeated measure analysis results for weighted Bias and ARMSD ..... 185
A-19 Weighted mean Bias for linking methods group ..... 186
A-20 Weighted mean ARMSD for linking methods group ..... 186
A-21 Weighted mean Bias for equating methods group ..... 187
A-22 Weighted mean ARMSD for equating methods group ..... 187

PAGE 10

A-23 condition 1 ..... 188
A-24 Equating mean score and score difference for ODL direct method condition 1 ..... 189
A-25 Equating mean score and score difference for TCF linking method condition 1 ..... 190
A-26 Equating mean score and score difference for ICF linking method condition 1 ..... 191
A-27 Equating mean score and score difference for NOP linking method condition 1 ..... 192
A-28 Equating mean score and condition 2 ..... 193
A-29 Equating mean score and score difference for ODL direct method condition 2 ..... 194
A-30 Equating mean score and score difference for TCF linking method condition 2 ..... 195
A-31 Equating mean score and score difference for ICF linking method condition 2 ..... 196
A-32 Equating mean score and score difference for NOP linking method condition 2 ..... 197
A-33 Equating mean score and score difference for Min condition 3 ..... 198
A-34 Equating mean score and score difference for ODL Direct method condition 3 ..... 199
A-35 Equating mean score and score difference for TCF linking method condition 3 ..... 200
A-36 Equating mean score and score difference for ICF linking method condition 3 ..... 201
A-37 Equating mean score and score difference for NOP linking method condition 3 ..... 202
A-38 condition 4 ..... 203
A-39 Equating mean score and score difference for ODL Direct method condition 4 ..... 204
A-40 Equating mean score and score difference for TCF linking method condition 4 ..... 205
A-41 Equating mean score and score difference for ICF linking method condition 4 ..... 206
A-42 Equating mean score and score difference for NOP linking method condition 4 ..... 207
A-43 condition 5 ..... 208
A-44 Equating mean score and score difference for ODL Direct method condition 5 ..... 209
A-45 Equating mean score and score difference for TCF linking method condition 5 ..... 210
A-46 Equating mean score and score difference for ICF linking method condition 5 ..... 211
A-47 Equating mean score and score difference for NOP linking method condition 5 ..... 212

PAGE 11

A-48 condition 6 ..... 213
A-49 Equating mean score and score difference for ODL direct method condition 6 ..... 214
A-50 Equating mean score and score difference for TCF linking method condition 6 ..... 215
A-51 Equating mean score and score difference for ICF linking method condition 6 ..... 216
A-52 Equating mean score and score difference for NOP linking method condition 6 ..... 217
A-53 condition 7 ..... 218
A-54 Equating mean score and score difference for ODL direct method condition 7 ..... 219
A-55 Equating mean score and score difference for TCF linking method condition 7 ..... 220
A-56 Equating mean score and score difference for ICF linking method condition 7 ..... 221
A-57 Equating mean score and score difference for NOP linking method condition 7 ..... 222
A-58 condition 8 ..... 223
A-59 Equating mean score and score difference for ODL direct method condition 8 ..... 224
A-60 Equating mean score and score difference for TCF linking method condition 8 ..... 225
A-61 Equating mean score and score difference for ICF linking method condition 8 ..... 226
A-62 Equating mean score and score difference for NOP linking method condition 8 ..... 227
A-63 condition 9 ..... 228
A-64 Equating mean score and score difference for ODL direct method condition 9 ..... 229
A-65 Equating mean score and score difference for TCF linking method condition 9 ..... 230
A-66 Equating mean score and score difference for ICF linking method condition 9 ..... 231
A-67 Equating mean score and score difference for the NOP linking method condition 9 ..... 232
A-68 condition 10 ..... 233
A-69 Equating mean score and score difference for ODL direct method condition 10 ..... 234
A-70 Equating mean score and score difference for TCF linking method condition 10 ..... 235
A-71 Equating mean score and score difference for ICF linking method condition 10 ..... 236
A-72 Equating mean score and score difference for NOP linking method condition 10 ..... 237

PAGE 12

A-73 condition 11 ..... 238
A-74 Equating mean score and score difference for ODL direct method condition 11 ..... 239
A-75 Equating mean score and score difference for TCF linking method condition 11 ..... 240
A-76 Equating mean score and score difference for ICF linking method condition 11 ..... 241
A-77 Equating mean score and score difference for NOP linking method condition 11 ..... 242
A-78 Equating mean score condition 12 ..... 243
A-79 Equating mean score and score difference for ODL direct method condition 12 ..... 244
A-80 Equating mean score and score difference for TCF linking method condition 12 ..... 245
A-81 Equating mean score and score difference for ICF linking method condition 12 ..... 246
A-82 Equating mean score and score difference for NOP linking method condition 12 ..... 247
A-83 condition 1 ..... 248
A-84 SEE, Bias, and RMSD for ODL direct method condition 1 ..... 249
A-85 SEE, Bias, and RMSD for ODL TCF method condition 1 ..... 250
A-86 SEE, Bias, and RMSD for ODL ICF method condition 1 ..... 251
A-87 SEE, Bias, and RMSD for NOP method condition 1 ..... 252
A-88 condition 2 ..... 253
A-89 SEE, Bias, and RMSD for ODL direct method condition 2 ..... 254
A-90 SEE, Bias, and RMSD for ODL TCF method condition 2 ..... 255
A-91 SEE, Bias, and RMSD for ODL ICF method condition 2 ..... 256
A-92 SEE, Bias, and RMSD for NOP method condition 2 ..... 257
A-93 condition 3 ..... 258
A-94 SEE, Bias, and RMSD for ODL direct method condition 3 ..... 259
A-95 SEE, Bias, and RMSD for ODL TCF method condition 3 ..... 260
A-96 SEE, Bias, and RMSD for ODL ICF method condition 3 ..... 261
A-97 SEE, Bias, and RMSD for NOP method condition 3 ..... 262

PAGE 13

A-98 condition 4 ..... 263
A-99 SEE, Bias, and RMSD for ODL direct method condition 4 ..... 264
A-100 SEE, Bias, and RMSD for ODL TCF method condition 4 ..... 265
A-101 SEE, Bias, and RMSD for ODL ICF method condition 4 ..... 266
A-102 SEE, Bias, and RMSD for NOP method condition 4 ..... 267
A-103 condition 5 ..... 268
A-104 SEE, Bias, and RMSD for ODL direct method condition 5 ..... 269
A-105 SEE, Bias, and RMSD for ODL TCF method condition 5 ..... 270
A-106 SEE, Bias, and RMSD for ODL ICF method condition 5 ..... 271
A-107 SEE, Bias, and RMSD for NOP method condition 5 ..... 272
A-108 condition 6 ..... 273
A-109 SEE, Bias, and RMSD for ODL direct method condition 6 ..... 274
A-110 SEE, Bias, and RMSD for ODL TCF method condition 6 ..... 275
A-111 SEE, Bias, and RMSD for ODL ICF method condition 6 ..... 276
A-112 SEE, Bias, RMSD for NOP method condition 6 ..... 277
A-113 condition 7 ..... 278
A-114 SEE, Bias, and RMSD for ODL direct method condition 7 ..... 279
A-115 SEE, Bias, and RMSD for ODL TCF method condition 7 ..... 280
A-116 SEE, Bias, and RMSD for ODL ICF method condition 7 ..... 281
A-117 SEE, Bias, and RMSD for NOP method condition 7 ..... 282
A-118 condition 8 ..... 283
A-119 SEE, Bias, and RMSD for ODL direct method condition 8 ..... 284
A-120 SEE, Bias, and RMSD for ODL TCF method condition 8 ..... 285
A-121 SEE, Bias, and RMSD for ODL ICF method condition 8 ..... 286
A-122 SEE, Bias, and RMSD for NOP method condition 8 ..... 287

PAGE 14

14 A 123 condition 9 ................................ ..................... 288 A 124 SEE, Bias, and RMSD for ODL direct method condition 9 ................................ ............ 289 A 1 25 SEE, Bias, and RMSD for ODL TCF method condition 9 ................................ .............. 290 A 126 SEE, Bias, and RMSD for ODL ICF method condition 9 ................................ ............... 291 A 127 SEE, Bias, and RMSD for NOP method condition 9 ................................ ...................... 292 A 128 condition 10 ................................ ................... 293 A 129 SEE, Bias, and RMS D for ODL direct method condition 10 ................................ .......... 294 A 130 SEE, Bias, and RMSD for ODL TCF method condition 10 ................................ ............ 295 A 131 SEE, Bias, and RMSD f or ODL ICF method condition 10 ................................ ............. 296 A 132 SEE, Bias, and RMSD for NOP method condition 10 ................................ .................... 297 A 133 ethod condition 11 ................................ ................... 298 A 134 SEE, Bias, and RMSD for ODL direct method condition 11 ................................ .......... 299 A 135 SEE, Bias, and RMSD for ODL TCF meth od condition 11 ................................ ............ 300 A 136 SEE, Bias, and RMSD for ODL ICF method condition 11 ................................ ............. 301 A 137 SEE, Bias, RMSD for NOP method condition 1 1 ................................ ........................... 302 A 138 condition 12 ................................ ................... 303 A 139 SEE, Bias, and RMSD for ODL direct method condition 12 ................................ .......... 304 A 140 SEE, Bias, and RMSD for ODL TCF method condition 12 ................................ ............ 305 A 141 SEE, Bias, and RMSD for ODL ICF method condition 12 ................................ ............. 
306 A 142 SEE, Bias, and RMSD for NOP method condition 12 ................................ .................... 307

LIST OF FIGURES

Figure
1-1 Possible unidimensionalization at different IRT equating stages
2-1 Example item characteristic curve
2-2 Example test characteristic curve (TCC) for a 3-item test and [...]
2-3 Plot of vector that yield exponents of [...] for a test item with parameter [...]
2-4 Graphical representation of [...] in a two-dimensional space, for a test item with parameters [...] by the length of vector arrow and [...]
2-5 Item response surface (IRS) and equiprobable contour plot for an item with parameters [...]
2-6 Test characteristic surface (TCS) for a three-item test with parameters [...]
2-7 Reference composite with item parameter arrows for a five-item MIRT test (M2PL) with parameters [...]
2-8 UIRT and MIRT linking components: [...] represents origin, [...] represents the unit of measurement for Scale [...] and [...]
3-1 Approximate Simple Structure (APSS) (Modified from Min, 2003)
3-2 Complex Structure (CS) (Modified from Min, 2003)
3-3 Item vectors APSS of base form unique item section
3-4 Item vectors CS of base form unique item section
3-5 Item vectors APSS of equated form unique item section
3-6 Item vectors CS of equated form unique item section
3-7 Item vectors APSS of anchor item section
3-8 Item vectors CS of anchor item section
B-1 Equivalent score difference [...] method condition 1
B-2 Equivalent score difference OD method condition 1
B-3 Equivalent score difference TCF method condition 1
B-4 Equivalent score difference ICF method condition 1
B-5 Equivalent score difference NOP method condition 1
B-6 Equivalent score difference [...] method condition 2
B-7 Equivalent score difference OD method condition 2
B-8 Equivalent score difference TCF method condition 2
B-9 Equivalent score difference ICF method condition 2
B-10 Equivalent score difference NOP method condition 2
B-11 Equivalent score difference [...] method condition 3
B-12 Equivalent score difference OD method condition 3
B-13 Equivalent score difference TCF method condition 3
B-14 Equivalent score difference ICF method condition 3
B-15 Equivalent score difference NOP method condition 3
B-16 Equivalent score difference [...] method condition 4
B-17 Equivalent score difference OD method condition 4
B-18 Equivalent score difference TCF method condition 4
B-19 Equivalent score difference ICF method condition 4
B-20 Equivalent score difference NOP method condition 4
B-21 Equivalent score difference [...] method condition 5
B-22 Equivalent score difference OD method condition 5
B-23 Equivalent score difference TCF method condition 5
B-24 Equivalent score difference ICF method condition 5
B-25 Equivalent score difference NOP method condition 5
B-26 Equivalent score difference [...] method condition 6
B-27 Equivalent score difference OD method condition 6
B-28 Equivalent score difference TCF method condition 6
B-29 Equivalent score difference ICF method condition 6
B-30 Equivalent score difference NOP method condition 6
B-31 Equivalent score difference [...] method condition 7
B-32 Equivalent score difference OD method condition 7
B-33 Equivalent score difference TCF method condition 7
B-34 Equivalent score difference ICF method condition 7
B-35 Equivalent score difference NOP method condition 7
B-36 Equivalent score difference [...] method condition 8
B-37 Equivalent score difference OD method condition 8
B-38 Equivalent score difference TCF method condition 8
B-39 Equivalent score difference ICF method condition 8
B-40 Equivalent score difference NOP method condition 8
B-41 Equivalent score difference [...] method condition 9
B-42 Equivalent score difference OD method condition 9
B-43 Equivalent score difference TCF method condition 9
B-44 Equivalent score difference ICF method condition 9
B-45 Equivalent score difference NOP method condition 9
B-46 Equivalent score difference [...] method condition 10
B-47 Equivalent score difference OD method condition 10
B-48 Equivalent score difference TCF method condition 10
B-49 Equivalent score difference ICF method condition 10
B-50 Equivalent score difference NOP method condition 10
B-51 Equivalent score difference [...] method condition 11
B-52 Equivalent score difference OD method condition 11
B-53 Equivalent score difference TCF method condition 11
B-54 Equivalent score difference ICF method condition 11
B-55 Equivalent score difference NOP method condition 11
B-56 Equivalent score difference [...] method condition 12
B-57 Equivalent score difference OD method condition 12
B-58 Equivalent score difference TCF method condition 12
B-59 Equivalent score difference ICF method condition 12
B-60 Equivalent score difference NOP method condition 12
B-61 SEE, RMSD, and Bias [...] method condition 1
B-62 SEE, RMSD, and Bias OD method condition 1
B-63 SEE, RMSD, and Bias TCF method condition 1
B-64 SEE, RMSD, and Bias ICF method condition 1
B-65 SEE, RMSD, and Bias NOP method condition 1
B-66 SEE, RMSD, and Bias [...] method condition 2
B-67 SEE, RMSD, and Bias OD method condition 2
B-68 SEE, RMSD, and Bias TCF method condition 2
B-69 SEE, RMSD, and Bias ICF method condition 2
B-70 SEE, RMSD, and Bias NOP method condition 2
B-71 SEE, RMSD, and Bias [...] method condition 3
B-72 SEE, RMSD, and Bias OD method condition 3
B-73 SEE, RMSD, and Bias TCF method condition 3
B-74 SEE, RMSD, and Bias ICF method condition 3
B-75 SEE, RMSD, and Bias NOP method condition 3
B-76 SEE, RMSD, and Bias [...] method condition 4
B-77 SEE, RMSD, and Bias OD method condition 4
B-78 SEE, RMSD, and Bias TCF method condition 4
B-79 SEE, RMSD, and Bias ICF method condition 4
B-80 SEE, RMSD, and Bias NOP method condition 4
B-81 SEE, RMSD, and Bias [...] method condition 5
B-82 SEE, RMSD, and Bias OD method condition 5
B-83 SEE, RMSD, and Bias TCF method condition 5
B-84 SEE, RMSD, and Bias ICF method condition 5
B-85 SEE, RMSD, and Bias NOP method condition 5
B-86 SEE, RMSD, and Bias [...] method condition 6
B-87 SEE, RMSD, and Bias OD method condition 6
B-88 SEE, RMSD, and Bias TCF method condition 6
B-89 SEE, RMSD, and Bias ICF method condition 6
B-90 SEE, RMSD, and Bias NOP method condition 6
B-91 SEE, RMSD, and Bias [...] method condition 7
B-92 SEE, RMSD, and Bias OD method condition 7
B-93 SEE, RMSD, and Bias TCF method condition 7
B-94 SEE, RMSD, and Bias ICF method condition 7
B-95 SEE, RMSD, and Bias NOP method condition 7
B-96 SEE, RMSD, and Bias [...] method condition 8
B-97 SEE, RMSD, and Bias OD method condition 8
B-98 SEE, RMSD, and Bias TCF method condition 8
B-99 SEE, RMSD, and Bias ICF method condition 8
B-100 SEE, RMSD, and Bias NOP method condition 8
B-101 SEE, RMSD, and Bias [...] method condition 9
B-102 SEE, RMSD, and Bias OD method condition 9
B-103 SEE, RMSD, and Bias TCF method condition 9
B-104 SEE, RMSD, and Bias ICF method condition 9
B-105 SEE, RMSD, and Bias NOP method condition 9
B-106 SEE, RMSD, and Bias [...] method condition 10
B-107 SEE, RMSD, and Bias OD method condition 10
B-108 SEE, RMSD, and Bias TCF method condition 10
B-109 SEE, RMSD, and Bias ICF method condition 10
B-110 SEE, RMSD, and Bias NOP method condition 10
B-111 SEE, RMSD, and Bias [...] method condition 11
B-112 SEE, RMSD, and Bias OD method condition 11
B-113 SEE, RMSD, and Bias TCF method condition 11
B-114 SEE, RMSD, and Bias ICF method condition 11
B-115 SEE, RMSD, and Bias NOP method condition 11
B-116 SEE, RMSD, and Bias [...] method condition 12
B-117 SEE, RMSD, and Bias OD method condition 12
B-118 SEE, RMSD, and Bias TCF method condition 12
B-119 SEE, RMSD, and Bias ICF method condition 12
B-120 SEE, RMSD, and Bias NOP method condition 12

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

OBSERVED SCORE AND TRUE SCORE EQUATING FOR MULTIDIMENSIONAL ITEM RESPONSE THEORY UNDER NONEQUIVALENT GROUP ANCHOR TEST DESIGN

By

Ou Zhang

August 2012

Chair: M. David Miller
Cochair: James Algina
Major: Research and Evaluation Methodology

In the MIRT framework, MIRT scale linking is used to adjust rotation, correlation, translation, and dilation so that two test forms are linked. Additionally, different measures of ability are available to report an examinee's ability proficiency. For each MIRT ability vector on one test form, there may be an infinite number of ability vectors falling on the equivalent contours of the test characteristic surface on the other, equated test form. Therefore, using the number-correct score as the ability measure makes MIRT equating feasible. Several equating procedures have been developed to conduct MIRT equating (Brossman, 2010). The purpose of this study was to evaluate the performance of MIRT equating under the NEAT design and to explore how different MIRT linking methods, interacting with these equating procedures, affect the equating results under various testing conditions. In this study, five MIRT linking methods (i.e., the direct method, OD; the Test Characteristic Function method, TCF; the Item Characteristic Function method, ICF; Min's method, M; and the non-orthogonal Procrustes method, NOP) and three MIRT equating procedures (i.e., the full MIRT observed score equating, the unidimensional approximation of
MIRT true score equating, and the unidimensional approximation of MIRT observed score equating) were examined. Results indicated that the unidimensional approximation of MIRT true score equating procedure performed best compared with the other two equating procedures across all group distribution conditions and all linking methods. The MIRT equating procedures under the TCF, ICF, and OD linking methods showed better equating performance than those under the M and NOP linking methods. The MIRT equating procedures under the NOP linking method demonstrated the worst equating performance within most of the group distribution conditions. Furthermore, the group ability mean difference factor had the largest negative effect on the equating results for all three equating procedures across all linking methods. Future studies are expected to address how different MIRT software, the choice of synthetic population weights, the choice of criterion equating functions, and the selection of rotation type influence the performance of MIRT equating.

CHAPTER 1
INTRODUCTION

Large-scale assessments are widely used in education. The No Child Left Behind Act (2001) has increased interest in measuring student Adequate Yearly Progress (AYP) (Schwarz, Yen, & Schafer, 2001). Under the No Child Left Behind Act of 2001 (NCLB), each state is required to establish achievement standards and assessment systems for measuring student progress, since federal legislation links funding to standardized test score improvement. This connection raises the stakes associated with large-scale assessment systems (Jodoin, Keller, & Swaminathan, 2003). Therefore, standardized test scores from large-scale assessment systems should reflect the most accurate estimates of ability or skills.

Ideally, the same test would be administered to the entire population on multiple occasions. However, administering the test in this way would make it impossible to ensure test security and fairness. Thus, parallel test forms with different test items are widely used to fulfill the test security and fairness requirements. In practice, however, it is nearly impossible to construct multiple forms that are strictly parallel, because there is no way to guarantee that the difficulties of the forms are the same. Thus, the requirement of parallel test forms cannot be rigidly adhered to. A statistical process is therefore needed to adjust test scores from different forms so that scales from the different assessments are transformed to the same metric. Once the scales are on the same metric, direct comparison between assessments becomes feasible. This process is called linking or equating.

Linking and Equating

Test linking and equating have been widely used in educational measurement. Linking is the general idea of a transformation procedure to put performance or scores from two test forms on a common metric. Equating is a statistical process that is used to adjust scores on different test
forms so that scores on the forms are comparable (Kolen & Brennan, 2004). There are two different types of linking/equating methods: observed score linking/equating and true score linking/equating. In general, test linking/equating has two essential components: the data collection design and the linking/equating methods (Holland & Dorans, 2006). The choice of linking/equating methods directly corresponds to the data collection design. Test linking/equating can be categorized as IRT (Item Response Theory) linking/equating, in which linking/equating is conducted under the IRT framework, and non-IRT linking/equating, in which it is not. Several IRT equating methods are [...]

Item Response Theory

Item response theory (IRT) models are commonly used in educational and psychological testing. IRT has been widely used in state educational assessments to estimate student performance on standardized tests. IRT is a family of statistical models for analyzing item responses in a sample of individuals (Lord, 1952). Because of its invariance characteristics when analyzing test response results, IRT provides a stronger statistical approach and more accurate ability (trait) estimation than Classical Test Theory (CTT) in many applications. The essential feature of the IRT framework is its mathematical expression for the probability of a particular item response ($x$) as a function of the examinee's ability ($\theta$) and the item parameter ($\xi$):

$$P(x \mid \theta, \xi) = f(\theta, \xi) \qquad (1\text{-}1)$$

Multidimensional Item Response Theory

IRT models can be classified as either unidimensional IRT or multidimensional IRT, depending on the assumption of dimensionality underlying the response mechanism between
examinees and items. The unidimensional item response theory (UIRT) model is based on the idea that the probability of answering an item correctly is a function of a single ability. The logistic models for unidimensional item response theory are based on the logistic distribution, which gives the probability of a response in a simple form (Embretson & Reise, 2000). The three-parameter logistic (3PL) model (Hambleton & Swaminathan, 1985; Lord, 1980) assumes that the probability of a correct answer to a dichotomously scored item $i$ by an examinee $j$ with ability $\theta_j$ is given as follows:

$$P(X_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{\exp[D a_i (\theta_j - b_i)]}{1 + \exp[D a_i (\theta_j - b_i)]} \qquad (1\text{-}2)$$

where $X_{ij}$ is the item response for examinee $j$ on test item $i$; $a_i$ is the item discrimination parameter; $b_i$ is the item difficulty parameter; $c_i$ is the guessing parameter, or lower asymptote, representing the probability of a correct response when the ability assessed by the item is very low; and $D$ is the scale coefficient, which ensures that the parameters in the logistic model have the same meaning as the parameters in the normal ogive model.

Multidimensional item response theory (MIRT) models have been developed in response to the need for modeling the relationship between more than one ability or construct and the [...] 2005). The probability of a correct response to multidimensional items will vary depending on various combinations of proficiency on multiple abilities or constructs. MIRT models are classified into two types: compensatory and partially compensatory (or non-compensatory) [...] determined differently among the ability dimensions in these two types of models (Min, 2003).
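
As a minimal sketch of the three-parameter logistic model described above (an illustration only, not code from the study), the function below evaluates the probability of a correct response for a single item. The function name `p_3pl`, the example item parameters, and the default value D = 1.7 for the scale coefficient are assumptions made for this example.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (guessing); D: scale coefficient (1.7 assumed here).
    """
    z = D * a * (theta - b)
    # c + (1 - c) * exp(z) / (1 + exp(z)), written in an equivalent form
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical item parameters, chosen only for illustration.
a, b, c = 1.2, 0.5, 0.2
for theta in (-2.0, 0.5, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={p_3pl(theta, a, b, c):.3f}")
```

When ability equals the item difficulty, the probability is halfway between the lower asymptote and 1, which is 0.6 for the hypothetical item above.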

The compensatory model allows high ability on one dimension to compensate for low abilities on other dimensions because of the additive form of the exponent: the probability of a correct response is modeled through a linear combination of the ability dimensions. However, the partially compensatory (or non-compensatory) model allows only limited compensation, because the probability of a correct response is modeled as a function of the product of the probabilities for each ability dimension. The form of the multidimensional extension of the three-parameter logistic (M3PL) model (McKinley & Reckase, 1983) is given by

$$P(X_{ij} = 1 \mid \boldsymbol{\theta}_j) = c_i + (1 - c_i)\,\frac{\exp[D(\mathbf{a}_i' \boldsymbol{\theta}_j + d_i)]}{1 + \exp[D(\mathbf{a}_i' \boldsymbol{\theta}_j + d_i)]} \qquad (1\text{-}3)$$

where $P(X_{ij} = 1 \mid \boldsymbol{\theta}_j)$ is the probability of a correct response ($X_{ij} = 1$) for person $j$ on test item $i$; $X_{ij}$ is the item response for person $j$ on test item $i$; $\mathbf{a}_i$ is an $m$-element vector of item discrimination parameters; $d_i$ is a scalar parameter related to the difficulty of the item; $c_i$ is a lower asymptote or pseudo-guessing parameter; $\boldsymbol{\theta}_j$ is an $m$-element vector of ability parameters for the examinee; $m$ is the number of ability dimensions; and $D$ is the scale coefficient, which ensures that the parameters in the logistic model have the same meaning as the parameters in the normal ogive model.

MIRT Linking

In educational measurement, IRT (both UIRT and MIRT) is widely used because of its advantages over non-IRT approaches (i.e., Classical Test Theory). The parameter invariance characteristic of IRT offers tremendous flexibility in choosing a plan for calibrating and linking test forms. Like UIRT scale linking, MIRT scale linking is a linear transformation, but the transformation is on multiple dimensions. MIRT scale linking generates a
set of transformation coefficients to adjust rotation, correlation, translation, and dilation. According to different data collection designs and different assumptions, several multidimensional linking methods have been studied (Davey, Oshima, & Lee, 1996; Hirsch, 1989; Kim, 2008; Li, 1997; Li & Lissitz, 2000; Min, 2003; Oshima, Davey, & Lee, 2000; Reckase & Martineau, 2004; Simon, 2008; Thompson, Nering, & Davey, 1997; Wei, 2008; Yon, 2006). Specifically, several MIRT linking methods were developed for the nonequivalent group anchor test (NEAT) design, including the Li and Lissitz (2000) method (LL method); the Min (2003) method (M method); the Oshima, Davey, and Lee (2000) methods (i.e., the Test Characteristic Function [TCF] method, the Item Characteristic Function [ICF] method, and the Direct Function method); and the non-orthogonal Procrustes (Reckase & Martineau, 2004) method (NOP method). The specific details and theoretical framework of these MIRT linking methods will be presented in Chapter 2.

MIRT Equating

Different measures are available to report an examinee's ability proficiency. These measures include raw score (i.e., number-correct score), IRT ability estimate, and scale score. In the UIRT framework, because of the invariance characteristic of IRT, scale indeterminacy (Baker, 1992; Lord, 1980) occurs. That is, it is possible to have an infinite number of legitimate translations of IRT parameters under which the invariant relationship between items and examinees is preserved. This indeterminacy comprises the origin indeterminacy and the unit indeterminacy. However, when the IRT ability estimate is a vector in the MIRT framework, a new indeterminacy occurs in addition to the origin and unit indeterminacy of UIRT.
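
The origin and unit indeterminacy described above for UIRT can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the study: it applies the linear transformation θ* = Aθ + B together with a* = a/A and b* = Ab + B, and shows that the 3PL probability does not change. The helper names `p_3pl` and `rescale`, the parameter values, and D = 1.7 are all hypothetical.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D = 1.7 is an assumed constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def rescale(theta, a, b, A, B):
    """Move ability and item parameters to a new scale with unit A and origin B."""
    return A * theta + B, a / A, A * b + B


# Hypothetical examinee and item, chosen only for illustration.
theta, a, b, c = 0.3, 1.1, -0.4, 0.15
for A, B in ((1.0, 0.0), (2.0, -1.0), (0.5, 3.0)):
    theta_s, a_s, b_s = rescale(theta, a, b, A, B)
    print(f"A={A}, B={B}: P(correct)={p_3pl(theta_s, a_s, b_s, c):.6f}")
```

Every choice of A > 0 and B reproduces the same probability, which is why a linking step is needed to fix a common origin and unit before scores from two forms are compared.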

The estimated IRT ability (i.e., $\hat{\theta}$) is a scalar within the UIRT framework. That is, under UIRT, the estimated IRT ability on one test is interchangeable with its corresponding estimated IRT ability on the other test. Thus, the symmetry property (Lord, 1980) of equating is satisfied under UIRT equating. The symmetry property, proposed by Lord (1980), requires that the function used to transform a score on the equated form to the base form scale be the inverse of the function used to transform a score on the base form to the equated form scale (Kolen & Brennan, 2004). If UIRT scale linking is conducted, estimated IRT abilities from different test forms are on the same metric and can be compared directly.

However, in the MIRT framework the estimated IRT ability (i.e., $\hat{\boldsymbol{\theta}}$) is a vector ($\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m$). Under MIRT, the relationship between the location of a person in the ability space and the probability of a correct response to the item is displayed as a contour on the item characteristic surface. Examinees whose ability vectors fall on the same contour of a MIRT item have the same probability of answering the item correctly. Even though these ability vectors contain different numerical values, they are equivalent with respect to the item because they yield the same probability of a correct response. A similar situation occurs at the test level. Under MIRT, different ability vectors falling on the same contour of the test characteristic surface may end up with the same number-correct score on the same test form. Likewise, if two test forms are already on the same scale metric, different ability vectors from the two forms falling on equivalent contours are equivalent and may end up with the same number-correct scores. Thus, the test scores of the two forms for these equivalent ability vectors are equated.
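
The point that different ability vectors on the same contour share one response probability can be checked with a short sketch. It assumes the compensatory M3PL form with exponent D(a'θ + d); the function name `p_m3pl`, the two-dimensional item parameters, and D = 1.7 are hypothetical choices made for illustration only.

```python
import math


def p_m3pl(theta, a, d, c, D=1.7):
    """Compensatory M3PL probability of a correct response.

    theta and a are equal-length sequences (ability and discrimination vectors);
    d is the scalar intercept; c is the lower asymptote; D = 1.7 is assumed.
    """
    z = D * (sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical two-dimensional item.
a, d, c = (1.0, 0.5), -0.2, 0.1

# Three different ability vectors with the same weighted sum a'theta = 1.0,
# so all three lie on the same equiprobable contour of this item.
on_contour = [(1.0, 0.0), (0.0, 2.0), (0.5, 1.0)]
for theta in on_contour:
    print(theta, round(p_m3pl(theta, a, d, c), 6))

# An ability vector off that contour gives a different probability.
print((1.0, 1.0), round(p_m3pl((1.0, 1.0), a, d, c), 6))
```

For a single compensatory item the contours are straight lines in the ability space. Contours of a test characteristic surface, which sums such probabilities over items, are generally curved, but the same many-to-one relationship between ability vectors and expected scores holds.
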
However, the equivalence of these different ability vectors from two different forms falling on the equivalent contours may not be directly recognized
until the two test forms are put on the same scale metric and the same value is obtained by plugging the vectors into a specific MIRT model. Furthermore, because for each ability vector on one test form there is an infinite number of ability vectors falling on the equivalent contours of the test characteristic surface on the other, equated test form, the symmetry property (Lord, 1980) of equating is violated under MIRT. One possible solution to make MIRT equating available is to use the number-correct score or scale score as the ability measure in MIRT. In fact, when the number-correct score or scale score is used as the ability measure, the MIRT ability vector is unidimensionalized so that the symmetry property (Lord, 1980) of equating is satisfied. This process is called unidimensionalization.

Several procedures have been developed within the MIRT framework to conduct MIRT equating (Brossman, 2010). These MIRT equating procedures are the full MIRT observed score equating procedure, the unidimensional approximation of MIRT true score equating procedure, and the unidimensional approximation of MIRT observed score equating procedure. Although three new MIRT equating procedures were developed by Brossman (2010), the appropriateness of these procedures for conducting MIRT equating has not been sufficiently investigated. Moreover, the following questions have not been investigated:

1. Should the number-correct score be used as the ability measure in the MIRT framework?
2. Are the currently existing MIRT equating methodologies adequate?
3. Should we utilize unidimensionalization for test equating in the MIRT framework?
4. Are these unidimensionalization procedures (i.e., unidimensional approximation or the final step of full information MIRT observed score equating) appropriate for MIRT equating?
5. If score unidimensionalization is appropriate, at which step (i.e., IRT modeling, IRT linking, IRT equating) should the unidimensionalization be conducted?

PAGE 30

Before turning to the purpose of the study, these questions must be answered. First, the number-correct score has been widely used in educational measurement and large-scale assessment. Psychometricians and educational practitioners are usually required to include the number-correct score as a reported measure because it is easier for the audience and stakeholders to understand the number-correct score than it is to understand ability estimates. Therefore, the number-correct score is still an important measure to report, even in the MIRT framework. Because of the symmetry property (Lord, 1980) of equating, using the number-correct score to unidimensionalize the MIRT ability vector is one possible solution that makes such MIRT equating possible. However, a number-correct score used to represent a test score is exclusively unidimensional, regardless of whether the test is unidimensional or multidimensional. Some multidimensional features of the MIRT model might be lost but, as stated before, this unidimensionalization procedure is an acceptable solution that makes MIRT equating possible. So, for the purpose of equating two test forms, using the number-correct score is acceptable under the MIRT framework even though it unidimensionalizes the MIRT ability vectors.

Second, even though the number-correct score is unidimensional, the relationship between item scores and examinee abilities is still multidimensional in the MIRT framework. Theoretically, if the MIRT assumptions hold, using the MIRT model together with MIRT linking and MIRT equating procedures to estimate and transform test scores will maintain multidimensional characteristics as much as possible in the IRT estimation, linking, and equating process. By maintaining the multidimensional characteristics of the response data, sampling error and systematic bias are minimized in the process, so that more accurate test equating results

PAGE 31

that reflect the relationship between items and examinees will be obtained, no matter which form of ability measure the test adopts. Therefore, even though the reported measure is the unidimensional number-correct score for a multidimensional test, it is theoretically advisable to use the three-step procedure (MIRT estimation, MIRT linking, MIRT equating) to conduct test form equating.

Third, if the unidimensionalization process is conducted by transforming MIRT ability vectors into unidimensional measures, it is unknown at which step (i.e., IRT estimation, IRT linking, IRT equating) the unidimensionalization should be conducted. Technically, unidimensionalization is possible at any of the three equating steps (i.e., model estimation, linking, and equating). If unidimensionalization is applied in the estimation step, a UIRT model replaces the MIRT model so that a set of unidimensional parameters is obtained and used for later test linking and equating. If unidimensionalization is applied in the linking step, the MIRT model is estimated and then a unidimensional approximation of MIRT (Zhang, 1996; Zhang & Stout, 1999; Zhang & Wang, 1998) is conducted so that UIRT linking and UIRT equating are applied later. If unidimensionalization is applied in the equating step, the multidimensional estimates are unidimensionalized in the final step through the compound binomial function of the observed score equating method. In addition, the unidimensional approximation (Zhang, 1996; Zhang & Stout, 1999; Zhang & Wang, 1998) can also be used as a unidimensionalization for the equating in the final step. The four different unidimensionalization procedures at different stages are depicted graphically in Figure 1-1.

PAGE 32

Figure 1-1. Possible unidimensionalization at different IRT equating stages

Presumably, maintaining the multidimensional features up to the last step in the process and conducting unidimensionalization in the equating step (i.e., full information MIRT observed score equating) is theoretically the best procedure to handle unidimensionalization. However, it

PAGE 33

is unknown which procedure depicted in the above figure performs better than the others, and this is one of the reasons a further investigation is needed. Moreover, since the MIRT [...], this study is proposed to compare the performance of these test equating methods (i.e., the full MIRT observed score equating method, the unidimensional approximation of MIRT true score equating method, and the unidimensional approximation of MIRT observed score equating method) under different IRT frameworks.

Limitation in Previous MIRT Equating Research

Although three MIRT equating procedures were developed by Brossman (2010), limitations exist in his study and further exploration is needed. First, instead of simulated data, a set of real test data was used in the Brossman study. Because the study used real data, the original population parameters are not known. To investigate these MIRT equating procedures systematically, a simulation study is needed so that we can see how each of these procedures performs under a variety of settings. Second, because Brossman used real data rather than simulated data in his study, mean square error and bias could not be obtained for the equating procedures. Since mean square error and bias are usually used to report the efficiency and robustness of a test equating method in the equating methodology literature, using these two measures in a simulation study to see how these equating procedures perform would increase our understanding of the MIRT equating methodology. Third, Brossman only discussed how these MIRT equating procedures performed under the randomly equivalent groups design. Because of the data collection design applied in his study,

PAGE 34

MIRT linking may not have been as significant an issue, provided the constructs were measured equivalently in the two groups from the same population. Thus, MIRT linking procedures were not addressed in his study. Moreover, use of the randomly equivalent groups design to scale and equate test forms in the multidimensional case is not suggested (Davey et al., 1996). Each dimension in the test [...] that dimension with high parameters. The dimension and test structure of one test must match its substantive counterpart in the other test. In practice, although using the randomly equivalent groups design in MIRT equating is theoretically feasible, the prospects of successfully satisfying all the constraints are discouraging enough to suggest not using randomly equivalent groups designs for multidimensional data (Davey et al., 1996).

Purpose of the Study

In practice, the NEAT design is the most widely used data collection design in test equating. As noted above, the NEAT design is not affected by the problems that the lack of common items in the two tests causes for the randomly equivalent groups design. Therefore, the performance of the MIRT equating procedures under the NEAT design deserves further investigation. Moreover, in comparison to the randomly equivalent groups design, under the NEAT design the MIRT scale linking indeterminacies related to translation, dilation, rotation, and correlation have more consequential influences on the test equating results. Thus, a study to investigate how MIRT equating procedures perform under the NEAT design is proposed here. In addition, research on how the MIRT linking methods interact with the MIRT equating procedures to affect the equating results is also necessary.
Therefore, the purpose of this study was to evaluate the performance of the MIRT equating procedures under the NEAT design and to explore how different MIRT linking methods, interacting with these equating procedures, affect the

PAGE 35

equating results, under various testing conditions. In this study, five MIRT linking methods (i.e., the direct method [OD], the Test Characteristic Function method [TCF], the Item Characteristic Function method [ICF], Min's method [M], and the non-orthogonal Procrustes method [NOP]) and three MIRT equating procedures (i.e., the full MIRT observed score equating, the unidimensional approximation of MIRT true score equating, and the unidimensional approximation of MIRT observed score equating) are examined.

PAGE 36

CHAPTER 2
LITERATURE REVIEW

In this chapter, a review of item response theory, including unidimensional item response theory (UIRT) and multidimensional item response theory (MIRT), is presented. Next, some special features of IRT (e.g., the item characteristic curve and the test characteristic curve) that are closely relevant to equating methodology are introduced. Following that, data collection designs and the corresponding procedures for conducting IRT scale linking (i.e., UIRT, MIRT) are summarized. The final section discusses the procedures for conducting IRT equating, including the UIRT equating methods and the methodology for MIRT equating (Brossman, 2010).

Item Response Theory

Item response theory (IRT), proposed by Lord (1952), is a family of statistical models for analyzing item responses in a population of individuals. It applies mathematical models to depict the relationship between examinees and items for measuring abilities, attitudes, or other variables (Lord & Novick, 1968; Rasch, 1960). Because of its sample invariance characteristics when analyzing test response results, IRT provides a better statistical approach and more accurate ability (trait) estimation than classical test theory (CTT) in many applications, including equating and linking. In general, item response theory can be classified as either unidimensional or multidimensional, depending on the assumption of dimensionality underlying the response mechanism between examinees and items. The essential feature of the IRT framework is its mathematical expression for the probability of a particular item response, given the examinee's ability value and the item parameter(s):

(2-1)

PAGE 37

This equation is defined as the item response function (IRF; Yen & Fitzpatrick, 2006).

Unidimensional Item Response Theory

Unidimensional item response theory is based on the idea that the probability of getting an item correct is a function of a single, unidimensional ability. For instance, a person with higher math skills (i.e., unidimensional ability) would be more likely to respond correctly to a given item on a math test. Thus, the probability of responding correctly to the given item depicts the relationship between the item and the ability attributable to the person. Two major assumptions of unidimensional item response theory (UIRT) models, unidimensionality and local independence, are worth mentioning. The assumption of unidimensionality indicates that there is only a single ability underlying the differences in person responses to items (Embretson & Reise, 2000). The local independence assumption states that, conditional on ability, the response to an item is statistically independent of the response to any other item in the test (Hambleton & Swaminathan, 1985). The commonly used UIRT models for dichotomous data are presented below.

Unidimensional Item Response Model for Dichotomous Data

Two types of unidimensional models are commonly used for dichotomous responses: the logistic model and the normal ogive model. Because of its simpler mathematical expression, the logistic model has been widely used in educational measurement applications. The logistic models for unidimensional item response theory are based on the logistic distribution, which gives the probability of a response in a simple form (Embretson & Reise, 2000).

PAGE 38

The three-parameter logistic (3PL) model (Hambleton & Swaminathan, 1985; Lord, 1980) assumes that the probability of a correct answer to a dichotomously scored item $i$ by an examinee $j$ with ability $\theta_j$ is given as follows:

$$P(u_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\frac{\exp[D a_i(\theta_j - b_i)]}{1 + \exp[D a_i(\theta_j - b_i)]}$$ (2-2)

where $u_{ij}$ is the item response for examinee $j$ on test item $i$, $a_i$ is the item discrimination parameter, $b_i$ is the item difficulty parameter, $c_i$ is the guessing parameter or lower asymptote, representing the probability of a correct response when the ability assessed by the item is very low, and $D$ is the scale coefficient, which gives the parameters in the logistic model the same meaning as the parameters in the normal ogive model. Therefore, the item characteristic curve (ICC) from a logistic model is almost identical to the corresponding ICC from a normal ogive model with the same item parameters. If $c_i$ is assumed to equal 0, the 3PL model becomes the two-parameter logistic (2PL) model presented below.

$$P(u_{ij} = 1 \mid \theta_j) = \frac{\exp[D a_i(\theta_j - b_i)]}{1 + \exp[D a_i(\theta_j - b_i)]}$$ (2-3)

In the 2PL model, if $a_i$ is assumed to equal 1, the 2PL model becomes the Rasch model (Rasch, 1960). The Rasch model is a special case of the one-parameter logistic (1PL) model.

$$P(u_{ij} = 1 \mid \theta_j) = \frac{\exp[D(\theta_j - b_i)]}{1 + \exp[D(\theta_j - b_i)]}$$ (2-4)
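The nesting of the three logistic models can be illustrated with a minimal Python sketch. It is not part of the original study: the function name `p_3pl`, the item parameter values, and the default scaling coefficient D = 1.7 (a conventional value that the text does not state) are all assumptions made for illustration.

```python
import math

def p_3pl(theta, a, b, c=0.0, D=1.7):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: pseudo-guessing (lower asymptote); D: scaling coefficient
    (1.7 is an assumed, conventional value).
    """
    z = D * a * (theta - b)
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z))
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# Hypothetical item parameters, chosen only for illustration.
a, b, c = 1.2, 0.5, 0.2

for theta in (-3.0, 0.5, 3.0):
    print(theta, round(p_3pl(theta, a, b, c), 4))

# Setting c = 0 gives the 2PL model; additionally setting a = 1 gives
# the one-parameter form.
print(round(p_3pl(0.5, a, b), 4), round(p_3pl(0.5, 1.0, b), 4))
```

At theta equal to b the 3PL probability is (1 + c) / 2, and it approaches c as theta decreases, which is the lower-asymptote behavior described above.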

PAGE 39

Item characteristic curve (ICC)

In the IRT theoretical framework, the graphical version of the IRF is called the item characteristic curve (ICC). The ICC graphically displays ability on the x axis and probability of response on the y axis (Yen & Fitzpatrick, 2006).

Figure 2-1. Example item characteristic curve

Whereas the item response function describes the probability of a given response to an item, its graphical representation, the item characteristic curve (ICC), defines the expected item score as a function of ability. The expected item score for a dichotomous item is simply equal to the probability of a correct response. The item difficulty parameter indicates the location of the ICC on the ability scale ($\theta$), namely the point at which the probability of a correct response equals 0.5 (one-parameter and two-parameter models) or $(1 + c_i)/2$ (three-parameter model). The item discrimination parameter indicates the magnitude of the slope of the ICC at that same location. A larger discrimination parameter results in more discrimination power and yields a steeper ICC slope.
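As a worked check of these two statements, the location and slope can be derived directly from the 3PL form in Equation 2-2. The symbols $a_i$, $b_i$, $c_i$, and $D$ are assumed to have the meanings given there, and the derivation is an illustrative addition rather than part of the original text. Writing $z = D a_i(\theta - b_i)$ and $L(z) = \exp(z)/[1 + \exp(z)]$, so that $L'(z) = L(z)[1 - L(z)]$ and $L(0) = 1/2$:

```latex
\begin{aligned}
P(\theta) &= c_i + (1 - c_i)\,L(z), \\
P(b_i) &= c_i + \frac{1 - c_i}{2} = \frac{1 + c_i}{2}, \\
\frac{dP}{d\theta} &= (1 - c_i)\,D\,a_i\,L(z)\,[1 - L(z)], \\
\left.\frac{dP}{d\theta}\right|_{\theta = b_i} &= \frac{D\,a_i\,(1 - c_i)}{4}.
\end{aligned}
```

With $c_i = 0$ these reduce to a probability of 0.5 and a slope of $D a_i / 4$ at $\theta = b_i$, so the slope at the item's location grows in direct proportion to the discrimination parameter.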

PAGE 40

The pseudo-guessing parameter ($c_i$) indicates the lower asymptote of the ICC, that is, how an examinee with very low ability should perform on the item.

Test characteristic curve (TCC)

The test characteristic function (TCF) is the sum, over items, of the item scores assigned to each item response times the probability of obtaining that item response (Yen & Fitzpatrick, 2006):

$$T(\theta_j) = \sum_{i=1}^{n}\sum_{x} x\,P(u_{ij} = x \mid \theta_j)$$ (2-5)

where $n$ is the number of items and $x$ ranges over the possible scores on item $i$. For a test composed solely of dichotomous items, if the three-parameter model is used, $T(\theta_j)$ looks like the number-correct score, but it is continuous and has values higher than the sum of the pseudo-guessing parameters ($\sum_i c_i$).

Figure 2-2. Example test characteristic curve (TCC) for a 3-item test

The test characteristic curve (TCC) is the graphical representation of the TCF. In addition, the TCC is the sum of the item characteristic curves across all items and is conceptually viewed as

PAGE 41

the regression of the summed score responses on ability (Lord, 1980). Figure 2-2 displays the TCC for a test composed of 3 dichotomous items. The inverse of the TCC can be used to convert total raw scores to theta estimates.

Multidimensional Item Response Theory

In the unidimensional IRT framework, a single underlying ability is involved across all the items in a test. Although unidimensional IRT has been widely used in educational measurement, research shows that the unidimensionality assumption is often difficult to meet in real-world contexts (Ackerman, 1994; Reckase, 1985). In real-world situations, a test item may require more than one ability or hypothetical construct to solve. The probability of a correct response to such items will vary depending on various combinations of proficiency on multiple abilities or constructs. An example of multidimensional items is a set of science problems that requires examinees to have both scientific skills to solve the problems and reading skills to understand the questions. Therefore, multidimensional item response theory (MIRT) models have been developed in response to the need to model the relationship between more than one ability or construct and the complexities of the interaction between persons and items (Reckase, 2005). Multidimensionality may be caused by many factors, including item context, test content, or the existence of multiple traits on the same item. It is possible that more than one cognitive trait [...]

Multidimensional IRT Models for Dichotomous Data

In general, MIRT models are classified into two types: compensatory and partially compensatory (or non-compensatory) models. The probability of a response is determined differently across the ability dimensions in these two types of models (Min, 2003).

PAGE 42

The compensatory model allows low ability on one dimension to be compensated by ability on the other dimensions because of the additive form of the exponent of $e$: the probability of a correct response is modeled through a linear combination of the elements of the ability vector. In contrast, only limited compensation is allowed in the partially compensatory (or non-compensatory) model, because the probability of a correct response is modeled as a function of the product of probabilities for each ability dimension. Thus, the probability of a correct response for an item that follows the partially compensatory model can never be greater than the probability for the component with the lowest probability (Reckase, 2009).

Compensatory Multidimensional Item Response Model

The multidimensional extension of the three-parameter logistic (M3PL) model (McKinley & Reckase, 1983) is given by

$$P(u_{ij} = 1 \mid \boldsymbol{\theta}_j) = c_i + (1 - c_i)\frac{\exp[D(\mathbf{a}_i\boldsymbol{\theta}_j' + d_i)]}{1 + \exp[D(\mathbf{a}_i\boldsymbol{\theta}_j' + d_i)]}$$ (2-6)

where $P(u_{ij} = 1 \mid \boldsymbol{\theta}_j)$ is the probability of a correct response ($u_{ij} = 1$) for person $j$ on test item $i$, $u_{ij}$ is the item response (0 = wrong; 1 = correct) for person $j$ on test item $i$, $\mathbf{a}_i$ is a $1 \times m$ vector of item discrimination parameters, $d_i$ is a scalar parameter related to the difficulty of the item, $c_i$ is a lower asymptote or pseudo-guessing parameter, $\boldsymbol{\theta}_j$ is a $1 \times m$ vector of ability parameters for the examinee, $m$ is the number of ability dimensions, and $D$ is the scale coefficient, which gives the parameters in the logistic model the same meaning as the parameters in the normal ogive model.
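The compensatory property can be made concrete with a short Python sketch of the M3PL form above. This is an illustrative addition: the function name `p_m3pl`, the item parameters, the ability vectors, and the default D = 1.7 are assumptions, not values taken from the study.

```python
import math

def p_m3pl(theta, a, d, c=0.0, D=1.7):
    """Compensatory M3PL probability of a correct response.

    theta, a: equal-length sequences (abilities, discriminations);
    d: scalar intercept; c: lower asymptote; D: scaling coefficient
    (1.7 is an assumed, conventional value).
    """
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same length")
    z = D * (sum(ak * tk for ak, tk in zip(a, theta)) + d)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# Hypothetical two-dimensional item.
a, d, c = [0.5, 1.5], -1.0, 0.2

# Three ability vectors with the same weighted sum a . theta = 2.0:
# low ability on one dimension is offset by high ability on the other.
profiles = [[1.0, 1.0], [4.0, 0.0], [-2.0, 2.0]]
probs = [p_m3pl(t, a, d, c) for t in profiles]
print([round(p, 4) for p in probs])  # three identical values
```

Because only the weighted sum of abilities enters the exponent, very different ability profiles can yield exactly the same probability, which is why a single number-correct or expected score cannot identify a unique ability vector.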

PAGE 43

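It should be noted that there is a discrimination parameter for each dimension, but only one difficulty parameter is included in the model. In this model, a distinct difficulty parameter for each separate dimension is indeterminate (Zhang & Stout, 1999). If $c_i$ is assumed to equal 0, the M3PL model becomes the compensatory multidimensional two-parameter logistic (M2PL) model presented below.

$$P(u_{ij} = 1 \mid \boldsymbol{\theta}_j) = \frac{\exp[D(\mathbf{a}_i\boldsymbol{\theta}_j' + d_i)]}{1 + \exp[D(\mathbf{a}_i\boldsymbol{\theta}_j' + d_i)]}$$ (2-7)

The linear combination in the exponent of $e$ in the compensatory multidimensional logistic model can be expanded to show the way the elements of the $\mathbf{a}_i$ and $\boldsymbol{\theta}_j$ vectors interact.

$$\mathbf{a}_i\boldsymbol{\theta}_j' + d_i = a_{i1}\theta_{j1} + a_{i2}\theta_{j2} + \cdots + a_{im}\theta_{jm} + d_i = \sum_{k=1}^{m} a_{ik}\theta_{jk} + d_i$$ (2-8)

The exponent is a linear function of the elements of $\boldsymbol{\theta}_j$, with the $d_i$ parameter as the intercept term and the elements of the $\mathbf{a}_i$ vector as slope parameters (Reckase, 2009, p. 86). The components in the exponent are additive, so that ability on one dimension can be compensated by ability on the other dimension(s). When the exponent is set to a constant value $\kappa$, all $\boldsymbol{\theta}_j$ vectors that produce $\mathbf{a}_i\boldsymbol{\theta}_j' + d_i = \kappa$ give the same probability of a correct response. For example, for a two-dimensional test item with $\kappa = 0$, the linear function of the elements of $\boldsymbol{\theta}_j$ becomes $a_{i1}\theta_{j1} + a_{i2}\theta_{j2} + d_i = 0$, which is a straight line in the $(\theta_1, \theta_2)$ plane.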

PAGE 44

Figure 2-3. Plot of the $\boldsymbol{\theta}$ vectors that yield a constant exponent for a test item

This property of the model can be displayed graphically by an equiprobable contour plot. The plot shows that all persons with $\boldsymbol{\theta}$ vectors that fall on the line have a probability of correct response of .5.

Partially Compensatory Multidimensional Item Response Model

Sympson (1978) proposed the partially compensatory MIRT model for the interaction between the person and the test items. The mathematical expression is given by

$$P(u_{ij} = 1 \mid \boldsymbol{\theta}_j) = c_i + (1 - c_i)\prod_{k=1}^{m}\frac{\exp[D a_{ik}(\theta_{jk} - b_{ik})]}{1 + \exp[D a_{ik}(\theta_{jk} - b_{ik})]}$$ (2-9)

where $P(u_{ij} = 1 \mid \boldsymbol{\theta}_j)$ is the probability of a correct response ($u_{ij} = 1$) for person $j$ on test item $i$,

PAGE 45

$u_{ij}$ is the item response (0 = wrong; 1 = correct) for person $j$ on test item $i$, $\mathbf{a}_i$ is a $1 \times m$ vector of item discrimination parameters, $\mathbf{b}_i$ is a $1 \times m$ vector of item difficulty parameters, $c_i$ is a lower asymptote or pseudo-guessing parameter, $\boldsymbol{\theta}_j$ is a $1 \times m$ vector of ability parameters for the examinee, $m$ is the number of ability dimensions, and $D$ is the scale coefficient, which gives the parameters in the logistic model the same meaning as the parameters in the normal ogive model.

The terms in the product have a form similar to the 2PL UIRT model. In a sense, each of these terms gives the probability of being successful on one dimension of the item. The probability of doing all of the dimensions on the item correctly is the product of the probabilities of doing each dimension correctly (Reckase, 2009).

Although two different types of MIRT model have been proposed, research on which model fits real data better is rare (Bolt & Lall, 2003; Spray, Davey, Reckase, Ackerman, & Carlson, 1990). There is no overall conclusion on which form of the model more accurately represents the interactions between persons and test items (Reckase, 2009). Previous empirical studies indicated that, for $\boldsymbol{\theta}$ vectors along the diagonal direction from (-4, -4) to (4, 4), there is no distinguishable difference in the probability of correct response between the two models (Spray et al., 1990). Because estimation with the compensatory models is relatively easy, most research and applications on MIRT have been based on the compensatory models (Kim, 2008). Moreover, most research on MIRT linking and equating has been done using the compensatory

PAGE 46

models (e.g., Davey et al., 1996; Hirsch, 1989; Thompson et al., 1997). Therefore, only the compensatory model, henceforth referred to as the multidimensional model, is considered in this study. We should be aware that, although the compensatory models have been widely used, there is little psychological or cognitive justification for this type of model. Nevertheless, the compensatory model is a good solution to multidimensionality because of its simplicity and because it produces a reasonable approximation to reality (Zhang & Stout, 1999).

Summary of MIRT Statistics

In the multidimensional item response theory framework, a number of statistics have been developed to describe an item's characteristics (e.g., the MDISC and MDIFF parameters and direction cosines). Moreover, some statistics are frequently used to describe the functioning of test items in MIRT linking and equating (e.g., direction cosine, item response surface (IRS), test response surface (TRS), etc.). Specifically, two statistics, the reference composite and the direction of best measurement, are of critical importance for understanding the MIRT equating methodology described in this research. Because of their importance, a summary of these statistics in multidimensional item response theory is given below to explain their meaning.

MDISC

In UIRT, the item discrimination is simple and directly related to the slope of the item characteristic curve where it is steepest. In MIRT, the item discrimination can be generalized from the UIRT case, but the relationship is complex and no longer direct. Unlike in the UIRT models, multidimensional item discrimination is a vector rather than a scalar (Min, 2003). The overall discriminating power of an item, $MDISC_i$, is defined by the following equation (Reckase & McKinley, 1991).

PAGE 47

$$MDISC_i = \sqrt{\sum_{k=1}^{m} a_{ik}^2}$$ (2-10)

where $MDISC_i$ summarizes the slope of item $i$ at its steepest point, and $k$ indexes the $k$th dimension. The interpretation of the MIRT discrimination parameter is similar to that of the UIRT $a$ parameter, but each element of the $\mathbf{a}_i$ vector is related to a direction in the $m$-dimensional space. The magnitude of $a_{ik}$ on a particular dimension, relative to the magnitude of the $a$ parameters on the other dimensions, indicates the degree to which the item measures ability on that particular dimension (Brossman, 2010). Interpreting $MDISC_i$ is complex because, in the item response surface (IRS), the slope of a surface depends on the direction of movement along the surface, so the point of steepest slope depends on the direction being considered (Reckase, 2009). Reckase (2009) stated the meaning of $MDISC_i$ as follows.

At each point in the space, there is a direction that has the maximum slope from that point. If the entire space is considered and the slopes in all directions at each point are evaluated, there is a maximum slope overall for the test item. The value of the maximum slope would be useful summary of the capabilities of the test item for distinguishing between points in the direction of greatest slope. (p. 113)

Another important statistic for describing item discrimination vectors in MIRT is the direction in the latent ability space in which an item measures best (i.e., discriminates best). This direction of best measurement is an angle index and can be expressed through the discrimination of an item at its greatest slope when the angle with each coordinate axis is given (Reckase & McKinley, 1991). Thus, the trigonometric expression of the direction of best measurement of an item is given by

PAGE 48

$$\cos\alpha_{ik} = \frac{a_{ik}}{\sqrt{\sum_{h=1}^{m} a_{ih}^2}} = \frac{a_{ik}}{MDISC_i}$$ (2-11)

Therefore, the direction of best measurement of an item can be obtained as

$$\alpha_{ik} = \arccos\left(\frac{a_{ik}}{\sqrt{\sum_{h=1}^{m} a_{ih}^2}}\right)$$ (2-12)

where the direction of best discrimination in the $m$-dimensional space is defined by the angles $\alpha_{i1}, \ldots, \alpha_{im}$, and $\alpha_{ik}$ is the angle with the axis of the $k$th dimension. These angles and cosines are characteristics of the item's discrimination vector. The cosines are sometimes called direction cosines.

Figure 2-4. Graphical representation of $MDISC_i$ in a two-dimensional space for a test item, shown by the length of the vector arrow and the angle $\alpha_{i1}$

The magnitude of $MDISC_i$ can be graphically displayed as the length of the item vector arrow. For example, for a two-dimensional test item with discrimination parameters $a_{i1}$ and $a_{i2}$, the magnitude

PAGE 49

of $MDISC_i$ is equal to $\sqrt{a_{i1}^2 + a_{i2}^2}$, and the angle between the first dimension ($\theta_1$ axis) and the direction of best measurement for this item is equal to $\arccos(a_{i1}/MDISC_i)$. Assuming orthogonal axes in the item response surface (IRS), the graphical representation of the discrimination parameter of this test item is presented in Figure 2-4.

MDIFF

The item difficulty in the UIRT framework is directly related to the characteristics of the item characteristic curve (ICC). The difficulty parameter indicates the value of $\theta$ that corresponds to the point of steepest slope of the ICC (Reckase, 2009). The multidimensional item difficulty, defined as $MDIFF_i$, is analogous to the unidimensional difficulty. Graphically speaking, for test item $i$, $MDIFF_i$ is the distance between the origin and the point of steepest slope of the item in the $\boldsymbol{\theta}$ space. The sign associated with this distance indicates the relative position of the point with respect to the origin of the space. The mathematical expression is given as

$$MDIFF_i = \frac{-d_i}{\sqrt{\sum_{k=1}^{m} a_{ik}^2}}$$ (2-13)

However, the index $d_i$ is usually used to indicate item difficulty in MIRT models. The index $d_i$, which is parameterized to relate indirectly to the location of the ICS, is the multidimensional equivalent of the unidimensional parameterization ($-a_i b_i$). That is, $d_i = -a_i b_i$ and $b_i = -d_i / a_i$. As a result, the MIRT equivalent of the UIRT difficulty can be computed as

PAGE 50

$$MDIFF_i = \frac{-d_i}{\sqrt{\mathbf{a}_i\mathbf{a}_i'}}$$ (2-14)

In this equation, $d_i$ is the parameterized value for item $i$ and $\mathbf{a}_i$ represents the vector of discrimination parameters for item $i$.

Lower Asymptote Parameter

In the MIRT framework, the lower asymptote parameter $c_i$ is directly comparable to the UIRT lower asymptote parameter.

Item Characteristic Surface

As the direct extension of the item characteristic curve in UIRT, the item response surface (IRS), also called the item characteristic surface (ICS), is used to graphically display the properties of the compensatory MIRT models. For the same example item used before, the item characteristic surface and equiprobable contour plot are given below.

Figure 2-5. Item response surface (IRS) and equiprobable contour plot for an example item
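The item-level statistics summarized above (MDISC, the direction angles, and MDIFF) can be computed in a few lines. The sketch below is an illustrative addition: the function names and the item parameters are hypothetical and were chosen so that the results are easy to verify by hand.

```python
import math

def mdisc(a):
    """Multidimensional discrimination: length of the a-vector."""
    return math.sqrt(sum(ak * ak for ak in a))

def direction_angles(a):
    """Angles (degrees) between the item vector and each coordinate axis."""
    length = mdisc(a)
    return [math.degrees(math.acos(ak / length)) for ak in a]

def mdiff(a, d):
    """Multidimensional difficulty: signed distance from the origin to the
    point of steepest slope, measured along the item's direction."""
    return -d / mdisc(a)

# Hypothetical item parameters.
a, d = [1.2, 0.5], -0.65

print(round(mdisc(a), 3))                          # 1.3
print([round(x, 2) for x in direction_angles(a)])  # [22.62, 67.38]
print(round(mdiff(a, d), 3))                       # 0.5
```

The small angle with the first axis shows that this hypothetical item measures mostly the first dimension, and the positive MDIFF places its point of steepest slope half a unit from the origin along the item's direction.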

PAGE 51

Test Characteristic Surface (TCS)

Similar to the item characteristic surface (ICS), the multidimensional test response surface (TRS), henceforth referred to as the test characteristic surface (TCS), is the direct extension of the unidimensional test characteristic curve (TCC). The TCS is computed as the sum of the item characteristic surfaces (ICSs) across all items and is conceptually viewed as the regression of the sum of the item scores on the $\boldsymbol{\theta}$ vector (Reckase, 2009). For a test composed entirely of dichotomous items, the mathematical expression for the TCS is given by

$$TCS(\boldsymbol{\theta}_j) = \sum_{i=1}^{n} P(u_{ij} = 1 \mid \boldsymbol{\theta}_j)$$ (2-15)

where $n$ is the number of items in the test. For a three-item, two-dimensional test example, the test characteristic surface is displayed in Figure 2-6.

Figure 2-6. Test characteristic surface (TCS) for a three-item test
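Since the TCS is simply the sum of the item surfaces, it is easy to evaluate at any ability vector. The following Python sketch is an illustrative addition that assumes the compensatory M2PL form; the function names, the three items, and the default D = 1.7 are hypothetical choices, not the parameters used for Figure 2-6.

```python
import math

def p_m2pl(theta, a, d, D=1.7):
    """Compensatory M2PL probability of a correct response."""
    z = D * (sum(ak * tk for ak, tk in zip(a, theta)) + d)
    return 1.0 / (1.0 + math.exp(-z))

def tcs(theta, items, D=1.7):
    """Expected number-correct score at theta: sum of the item surfaces."""
    return sum(p_m2pl(theta, a, d, D) for a, d in items)

# Hypothetical three-item, two-dimensional test: (a-vector, d) per item.
items = [([1.0, 0.2], 0.0), ([0.3, 1.1], -0.5), ([0.7, 0.7], 0.5)]

for theta in ([-2.0, -2.0], [0.0, 0.0], [2.0, -1.0], [-1.0, 2.0], [2.0, 2.0]):
    print(theta, round(tcs(theta, items), 3))
```

The surface is bounded by 0 and the number of items, and it rises as either ability increases. Any fixed value of the surface is attained along a whole contour of ability vectors rather than at a single point.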

PAGE 52

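Item Information Function

The item information in MIRT is defined as the squared slope of the regression of the item score on $\boldsymbol{\theta}$ divided by the variance of the item score at $\boldsymbol{\theta}$ (Reckase, 2009). However, at each point in the $\boldsymbol{\theta}$ space, the slope of the multidimensional item response surface varies with the direction of movement from that point. This variation implies that the magnitude of the item information at each point in the space depends on the direction considered. The definition of item information (Reckase, 2009, p. 121) is given by

$$I_{\boldsymbol{\alpha}}(\boldsymbol{\theta}) = \frac{[\nabla_{\boldsymbol{\alpha}} P_i(\boldsymbol{\theta})]^2}{P_i(\boldsymbol{\theta})\,Q_i(\boldsymbol{\theta})}$$ (2-16)

where $\boldsymbol{\alpha}$ is the vector of angles with the coordinate axes that defines the direction taken from the $\boldsymbol{\theta}$ point, $\nabla_{\boldsymbol{\alpha}}$ is the directional derivative in the direction $\boldsymbol{\alpha}$, $P_i(\boldsymbol{\theta})$ is the probability of answering the item correctly at any given $\boldsymbol{\theta}$, and $Q_i(\boldsymbol{\theta})$ is equal to $1 - P_i(\boldsymbol{\theta})$. The directional derivative (Reckase, 2009) is given by

$$\nabla_{\boldsymbol{\alpha}} P_i(\boldsymbol{\theta}) = \frac{\partial P_i(\boldsymbol{\theta})}{\partial\theta_1}\cos\alpha_1 + \frac{\partial P_i(\boldsymbol{\theta})}{\partial\theta_2}\cos\alpha_2 + \cdots + \frac{\partial P_i(\boldsymbol{\theta})}{\partial\theta_m}\cos\alpha_m$$ (2-17)

If the two-parameter MIRT logistic model (M2PL) with scale coefficient $D$ is assumed, the directional derivative can be presented as

$$\nabla_{\boldsymbol{\alpha}} P_i(\boldsymbol{\theta}) = D\,P_i(\boldsymbol{\theta})\,Q_i(\boldsymbol{\theta})\sum_{k=1}^{m} a_{ik}\cos\alpha_k$$ (2-18)

Therefore,

$$I_{\boldsymbol{\alpha}}(\boldsymbol{\theta}) = D^2\,P_i(\boldsymbol{\theta})\,Q_i(\boldsymbol{\theta})\left(\sum_{k=1}^{m} a_{ik}\cos\alpha_k\right)^2$$ (2-19)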
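The dependence of item information on direction can be checked numerically. The sketch below is an illustrative addition that assumes the M2PL model with the scaling coefficient D inside the exponent, so the information in a given direction is D squared times P times Q times the squared sum of each discrimination multiplied by the cosine of the corresponding angle. The function name, the item parameters, the evaluation point, and D = 1.7 are all assumptions.

```python
import math

def item_information(theta, a, d, angles_deg, D=1.7):
    """Information of an M2PL item at theta in the direction given by
    angles_deg (angles with each coordinate axis, in degrees)."""
    z = D * (sum(ak * tk for ak, tk in zip(a, theta)) + d)
    p = 1.0 / (1.0 + math.exp(-z))
    # Directional slope term: sum of a_k * cos(alpha_k).
    slope = sum(ak * math.cos(math.radians(al))
                for ak, al in zip(a, angles_deg))
    return D ** 2 * p * (1.0 - p) * slope ** 2

# Hypothetical item and evaluation point.
a, d = [1.2, 0.5], -0.65
theta = [0.0, 0.0]

# Directions along the first axis, the diagonal, and the second axis.
for angles in ([0.0, 90.0], [45.0, 45.0], [90.0, 0.0]):
    print(angles, round(item_information(theta, a, d, angles), 4))
```

For this hypothetical item the information along the first axis is much larger than along the second, reflecting the larger discrimination on the first dimension. In a two-dimensional space the two angles must sum to 90 degrees for the direction to be valid; the function does not check this.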

PAGE 53

If $\boldsymbol{\alpha}$ is replaced by the direction of greatest slope, the MIRT item information reaches its maximum in that direction. So,

$$I_{\max}(\boldsymbol{\theta}) = D^2\,P_i(\boldsymbol{\theta})\,Q_i(\boldsymbol{\theta})\sum_{k=1}^{m} a_{ik}^2 = D^2\,P_i(\boldsymbol{\theta})\,Q_i(\boldsymbol{\theta})\,MDISC_i^2$$ (2-20)

where $D$ is the scale coefficient of the model.

Reference Composite and Direction of Best Measurement

The reference composite and the direction of best measurement are two important concepts in the MIRT framework. Specifically, these two concepts are essential for MIRT linking and MIRT equating. The reference composite was originally developed by Wang (1985, 1986). When a UIRT model is applied to multidimensional data, a unidimensional projection scale can be found to determine which direction in the ability space is best measured at the test level. This unidimensional projection scale is called the reference composite for the test. The reference composite can be treated as the average direction of the item vectors. The unidimensional reference composite consists of the $\boldsymbol{\theta}$ values projected onto the average direction of the test items at the test level. Wang (1985, 1986) defined the reference composite by the first eigenvector of the matrix $\mathbf{a}'\mathbf{a}$, that is, the eigenvector that corresponds to the largest eigenvalue of the matrix, where $\mathbf{a}$ is the matrix of discrimination parameters for each item on the exam (for each item on each dimension). In the $\mathbf{a}$ matrix, each row corresponds to an item, and each column corresponds to a specific dimension. Each dimension in the multidimensional ability space is represented by one element of the reference composite, and each element of the reference composite is conceptually similar to the discrimination parameter on that dimension (Brossman, 2010). The elements of the eigenvector

PAGE 54

of the $\mathbf{a}'\mathbf{a}$ matrix can be considered direction cosines. These direction cosines give the orientation of the reference composite with respect to the coordinate axes of the $\boldsymbol{\theta}$ space (Reckase, 2009). An example of a reference composite for a five-item, two-dimensional MIRT test is displayed in Figure 2-7. The five black arrows in Figure 2-7 are the item discrimination arrows for the five items. The bold brown arrow passing through the origin of the two-dimensional space is the arrow of the reference composite at the test level.

Figure 2-7. Reference composite with item parameter arrows for a five-item MIRT test (M2PL)

Another important concept for summarizing the characteristics of the multidimensional items at the test level is a unidimensional projection scale called the direction of best measurement (Zhang & Stout, 1999). Similar to the reference composite, the concept of

as the linear composite of abilities, is proposed to describe the relationship between the multidimensional ability space and the unidimensional projection when a UIRT model is applied to multidimensional data. A brief description of the mathematical definition from Zhang and Stout (1999) is given below.

In Zhang and Stout (1999), the total observed score of a test is given by Equation (2-21), and the true score for the test is defined by Equation (2-22). The expected value of the test score given an ability vector is then given by Equation (2-23).

The discriminating power of a test score in the composite direction is defined by how quickly the true score increases as the composite score increases (Zhang & Stout, 1999). Thus, the directional derivative of the true score in the direction of the composite must be considered. A direction matrix (of a composite) is defined in the multidimensional ability space, and the multidimensional critical ratio (MCR) function for a test score at a given ability point in a given direction is defined in Equation (2-24).
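The two quantities discussed so far can be illustrated numerically. The following is a minimal sketch, not the author's implementation: it computes the reference composite as the leading eigenvector of $a'a$ and then evaluates a critical ratio along that direction for a compensatory M2PL test. The form used for the critical ratio (directional derivative of the true score divided by the conditional standard deviation of the number-correct score under local independence) is an assumption here, since Equation (2-24) is not reproduced in this text. All parameter values and function names are hypothetical.

```python
import numpy as np

def m2pl_prob(theta, a, d):
    """Probability of a correct response to each item under a compensatory M2PL."""
    return 1.0 / (1.0 + np.exp(-(a @ theta + d)))

def reference_composite(a):
    """Direction cosines of the reference composite: the unit eigenvector of a'a
    that belongs to the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(a.T @ a)   # a'a is symmetric
    w = eigvecs[:, np.argmax(eigvals)]
    return w if w.sum() >= 0 else -w             # eigenvector sign is arbitrary

def critical_ratio(theta, a, d, alpha):
    """Assumed MCR form: slope of the true score along alpha over the conditional SD."""
    p = m2pl_prob(theta, a, d)
    slope = np.sum(p * (1.0 - p) * (a @ alpha))  # directional derivative of sum of P_i
    cond_sd = np.sqrt(np.sum(p * (1.0 - p)))     # SD of number-correct score given theta
    return slope / cond_sd

# Hypothetical five-item, two-dimensional test (values invented for illustration).
a_demo = np.array([[1.2, 0.3], [0.9, 0.5], [0.4, 1.1], [0.7, 0.7], [1.0, 0.2]])
d_demo = np.array([0.0, -0.5, 0.3, 0.1, -0.2])
theta_demo = np.array([0.3, -0.4])

w_demo = reference_composite(a_demo)
mcr_demo = critical_ratio(theta_demo, a_demo, d_demo, w_demo)
print("reference composite direction:", w_demo)
print("critical ratio along it:", mcr_demo)
```

For a single item, the square of this ratio reduces to $P Q (a \cdot \alpha)^2$, which is the directional item information for a logistic M2PL item without a scaling constant.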

If the MCR function is computed for a single item, the square of the MCR function for that item has the same mathematical expression as the multidimensional item information defined by Reckase and McKinley (1991). This expression is shown in Equation (2-25), which involves the probability of a correct response to the item.

When the discriminating power of an observed score in the composite direction is measured at a point in the latent space, the expected multidimensional critical ratio (EMCR) function is obtained for the total observed score in that direction, as given in Equation (2-26). The EMCR function gives the average discriminating power of the observed score in the composite direction. The direction of best measurement is then defined as follows (Zhang & Stout, 1999, p. 140). The direction vector that maximizes the EMCR function over all possible directions is called the subtest direction of the score, or simply the direction of best measurement of the score. The corresponding composite is called the subtest composite of the score. When the score is the total test score, the corresponding composite is called the test composite and its direction is called the test direction.

Although the reference composite and the direction of best measurement are similar concepts, the two measures differ slightly both mathematically and conceptually. The reference composite is determined solely by the item discrimination parameters (Wang, 1985, 1986). However, by taking into account the multidimensional information, the direction of best measurement is computed by averaging the multidimensional

information function across all directions (Zhang & Stout, 1999). These two statistics are revisited later in this chapter, in the introduction of MIRT equating.

Linking and Equating

Test linking and equating are widely used in educational measurement and psychometrics to place scales from different assessments on the same metric so that the assessments can be compared directly. Linking is the broadest term; it refers to a collection of procedures that put performance or scores on one assessment on a common metric with performance or scores on another assessment. Equating is a statistical process used to adjust scores on different test forms so that scores on the forms are comparable (Kolen & Brennan, 2004). Equating makes the strongest claim and provides a direct link between a score on one test form and a score on another test form. In general, test linking and equating, whether IRT based or not, have two essential components: the data collection design and the linking/equating method (Holland & Dorans, 2006). The choice of linking/equating method corresponds directly to the data collection design.

Data Collection Design

A number of data collection designs are in common use for linking and equating: the single group design (SG), the single group design with counterbalancing (CB), the random groups design (EG), and the common-item nonequivalent groups design (NEAT). In the SG design, the same examinees are administered both test forms. In the CB design, the same examinees are also administered both test forms, but the order of administration is counterbalanced so that the order effects of the single group design are controlled. In

the EG design, examinees are randomly assigned the form to be administered. In the NEAT design, the two test forms have a set of items in common, and different groups of examinees are administered the two forms (Kolen & Brennan, 2004).

Each data collection design has advantages and disadvantages that make it more or less useful in a particular situation. In general, for equating and scaling purposes, the SG design requires the smallest sample size and the EG design requires the largest sample size to achieve the same level of accuracy, as measured by the standard error of equating (Holland & Dorans, 2006). The sample size requirement and other characteristics of the NEAT design fall between those of the EG and SG designs. Because group and form differences are confounded in the NEAT design, a major task in conducting equating with the NEAT design is to separate group differences from form differences (Kolen & Brennan, 2004).

In summary, all four data collection designs can be categorized into three situations (Angoff, 1984):

Common Items: a design in which the test forms administered to separate examinee samples partially overlap by including some number of items in common;
Common Examinees: a design in which the examinee samples partially overlap by including some number of examinees in common;
Randomly Equivalent Groups: a design that administers non-overlapping test forms to randomly equivalent examinee groups.

IRT Linking

In educational measurement, IRT is widely used because of its advantages over non-IRT approaches (e.g., Classical Test Theory) for linking. The parameter invariance characteristic of IRT offers tremendous flexibility in choosing a plan for calibrating and linking test forms. As long as the IRT assumptions hold, sampling error (random error) is handled well by IRT methods compared with sample-dependent non-IRT approaches. However,

because of this parameter invariance characteristic, using IRT may lead to scale indeterminacy. Scale indeterminacy is thus the other characteristic of IRT, arising as a consequence of the invariance property. Invariance of item/person parameters and scale indeterminacy are the two major reasons why IRT linking and equating are conducted. Both characteristics are presented below.

Invariance of item/person parameters

Invariance of item parameters means that the item parameters in IRT remain the same regardless of the ability distribution of the examinees who take the test or the overall difficulty level of the test. Likewise, person parameters in IRT remain the same no matter which test form is given to examinees. As such, the probability of correctly responding to an item is the same for persons with the same ability, regardless of which test they take, provided the tests are designed to measure the same construct. That is, the item/person parameters are invariant across groups and test forms. This characteristic was defined by Lord (1980); because of it, item parameters do not depend on forms (Lord, 1980).

Scale indeterminacy

Because of the invariance characteristic described above, an infinite number of legitimate transformations of the IRT parameters are possible, as long as the invariant relationship holds (Lord, 1980). Scale indeterminacy for UIRT includes origin indeterminacy and unit indeterminacy. It is similar to the identification problem in traditional factor analysis (Min, 2003). In the process of IRT estimation, the choice of origin for the ability scale is

arbitrary (Lord, 1980). In practice, most IRT programs (e.g., BILOG-MG, ConQuest) set the scale by fixing features of the ability or difficulty distribution to specific values before calibration, so that the scale indeterminacy is resolved within each calibration run. Fixing parameters of the ability distribution is usually the more stable choice, because the examinee sample is typically much larger than the set of items. Thus, IRT programs usually fix the ability distribution by setting the origin to zero and the unit to one (i.e., a mean of 0 and a standard deviation of 1) in each calibration, so that the scale indeterminacy appears to be resolved. However, this procedure introduces a new scale indeterminacy when different test forms are administered to different population groups and the groups differ in ability. The indeterminacy arises because the ability distribution of each group is standardized to an origin of zero and a unit of one in its own separate calibration. Because of the group differences, examinees in different groups who have the same ability may obtain different test scores and ability estimates from the separate calibrations. This happens because different metrics are used for different test forms or groups when the item parameters are calibrated. Thus, scaling, linking, or equating is needed to put item and ability parameters on the same metric so that performance or scores on different assessments can be related.

IRT Scale Linking Methods

There are three ways to conduct IRT linking for both UIRT and MIRT: concurrent calibration, fixed common item parameter calibration (FCIP), and scale linking after separate calibrations.
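The scale indeterminacy that motivates these linking approaches can be made concrete with a small numerical sketch. It assumes a two-parameter logistic model without a scaling constant and a second group whose ability distribution has a hypothetical mean of 0.5 and standard deviation of 1.2 on the base metric. Standardizing that group to mean 0 and SD 1 changes every ability and item parameter value, yet the response probabilities are untouched. All numbers and names below are invented for illustration.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Two-parameter logistic item response function (no scaling constant)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item expressed on the base metric.
a_base, b_base = 1.3, 0.4

# Hypothetical mean and SD of the second group on the base metric.
mu, sigma = 0.5, 1.2

theta_base = np.linspace(-2.0, 3.0, 11)   # abilities on the base metric
theta_std = (theta_base - mu) / sigma     # same examinees after standardizing the group
a_std = a_base * sigma                    # discrimination on the standardized metric
b_std = (b_base - mu) / sigma             # difficulty on the standardized metric

same_probabilities = np.allclose(
    p_2pl(theta_base, a_base, b_base),
    p_2pl(theta_std, a_std, b_std),
)
print("item on base metric:        a = %.3f, b = %.3f" % (a_base, b_base))
print("item on standardized metric: a = %.3f, b = %.3f" % (a_std, b_std))
print("probabilities identical:", same_probabilities)
```

Both parameter sets describe the same data equally well, which is why estimates from separately standardized calibrations cannot be compared until they are linked.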

Concurrent calibration

In concurrent calibration, all examinee response data from the separate test administrations are combined and calibrated in a single run (Wingersky & Lord, 1984). Because the data are calibrated concurrently, the item parameters and ability estimates are already on the same metric, so no scale transformation or linking is necessary.

Fixed common item parameter calibration (FCIP)

Fixed common item parameter calibration (FCIP) has been widely used in psychometrics, especially in the psychometrics industry. When item parameters for the common items or ability estimates for the common persons are available from a previous administration, they can be used to put the parameters from the other administration on the same metric as the previous test data (Jodoin, Keller, & Swaminathan, 2003; Kim, 2006). When the FCIP method is used, no scale transformation or linking is necessary, because the item parameters and ability estimates from the different administrations are already on the metric of the previous test data.

Scale linking after separate calibrations

When response data come from different test forms or test administrations and the item parameters are calibrated in separate runs, a scale transformation or linking is necessary so that parameters from the different test forms are placed on the same metric. The linking procedures in this study focus on this type of method.

UIRT Scale Linking

In general, UIRT scale linking is a two-step process involving IRT calibration and a linear scale transformation. In the first step, item parameters are calibrated by applying an IRT model in an IRT calibration program (e.g., BILOG, MULTILOG). Second, the item parameter estimates obtained from the different calibrations are linearly

transformed to a common metric, which is possible as long as the IRT model fits the data and the abilities and items yield the same probabilistic relationships (Kolen & Brennan, 2004).

If separate calibrations are conducted under different data collection designs, whether scale linking is needed depends on the equivalence assumption across groups. When the randomly equivalent groups design is used, the two groups are assumed to be randomly selected from the same population and are therefore equivalent. In that case no scale linking is necessary after separate calibrations, because the means and standard deviations across groups are specified to be equal in the calibration process. When separate calibrations are conducted under the common item design but the two groups are randomly equivalent, scale linking is recommended as an additional safeguard against scale indeterminacy, even though it is not strictly necessary. When separate calibrations are conducted for nonequivalent groups under the NEAT design, scale linking must be conducted because scale indeterminacy exists. The transformation procedure described below focuses on scale linking under the NEAT design.

Scale Transformation Procedure

Following the notation of Kolen and Brennan (2004), let Scale $J$ denote the base IRT scale and Scale $I$ the equated IRT scale. The two scales differ by a linear transformation, so the ability values of examinee $i$ on the two scales are related as

$$\theta_{Ji} = A\,\theta_{Ii} + B, \tag{2-27}$$

where $A$ and $B$ are the transformation coefficients of the linear equation that transforms the equated IRT scale to the base metric, so that the two scales are on the same common metric after

transformation (Kolen & Brennan, 2004, p. 162). The item parameters on the two scales are related by

$$a_{Jj} = \frac{a_{Ij}}{A}, \tag{2-28}$$

$$b_{Jj} = A\,b_{Ij} + B, \tag{2-29}$$

$$c_{Jj} = c_{Ij}, \tag{2-30}$$

where $a$ is the item discrimination, $b$ is the item difficulty, and $c$ is the pseudo-guessing parameter of item $j$, and the subscripts $J$ and $I$ denote the base and equated scales, respectively (Kolen & Brennan, 2004, p. 162). In the following sections, four commonly used UIRT scale linking methods for separate calibration are described.

Mean/Sigma Method

Proposed by Marco (1977), the mean/sigma method uses the means and standard deviations of the item difficulty parameter estimates ($b$) over the common items to calculate the transformation coefficients $A$ and $B$:

$$A = \frac{\sigma(b_J)}{\sigma(b_I)}, \tag{2-31}$$

$$B = \mu(b_J) - A\,\mu(b_I). \tag{2-32}$$

Mean/Mean Method

The mean/mean method was first described by Loyd and Hoover (1980). It uses the mean of the item discrimination parameter estimates ($a$) over the common items to calculate the unit (dilation) coefficient $A$. The origin (translation) coefficient $B$ is then calculated from the means of the item difficulty parameter estimates ($b$) over the common items, using the coefficient $A$ calculated before. These equations are given

by

$$A = \frac{\mu(a_I)}{\mu(a_J)}, \tag{2-33}$$

$$B = \mu(b_J) - A\,\mu(b_I), \tag{2-34}$$

where the subscripts $J$ and $I$ denote the base and equated scales, respectively, and the means are taken over the common items.

Because the mean/sigma and mean/mean methods both use moments of the item parameter estimates to obtain the transformation coefficients, they are called moment transformation methods. One weakness of the moment transformation methods is that they do not consider all of the item parameter estimates simultaneously (Kolen & Brennan, 2004), so that the information in the item parameter estimates may not be fully used. To address this weakness, Haebara (1980) and Stocking and Lord (1983) developed transformation methods that consider all of the item parameters simultaneously. These two UIRT scale linking methods are referred to as characteristic curve transformation methods.

Haebara Method

The Haebara method (Haebara, 1980), also known as an item characteristic curve method, minimizes the sum of the squared differences between the item characteristic curves of the common items on the two forms at given abilities. For a given ability $\theta_i$, the sum over items of the squared differences is

$$Hdiff(\theta_i) = \sum_{j:V}\left[p_{ij}\left(\theta_{Ji};\hat{a}_{Jj},\hat{b}_{Jj},\hat{c}_{Jj}\right) - p_{ij}\left(\theta_{Ji};\frac{\hat{a}_{Ij}}{A}, A\hat{b}_{Ij}+B, \hat{c}_{Ij}\right)\right]^2. \tag{2-35}$$

The summation is over the common items ($j:V$). The difference between the item characteristic curves on the two scales is squared for each item and then summed.
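As a brief illustration of the two moment methods summarized above, the following sketch computes the coefficients $A$ and $B$ from common-item estimates and applies them to place equated-scale parameters on the base metric. It assumes the Kolen and Brennan (2004) form of the mean/sigma and mean/mean coefficients; the function names and the parameter values are hypothetical, and the data are constructed without estimation error so that both methods recover the generating coefficients.

```python
import numpy as np

def mean_sigma(b_base, b_equated):
    """Mean/sigma coefficients from common-item difficulty estimates."""
    A = np.std(b_base) / np.std(b_equated)
    B = np.mean(b_base) - A * np.mean(b_equated)
    return A, B

def mean_mean(a_base, a_equated, b_base, b_equated):
    """Mean/mean coefficients from common-item discrimination and difficulty estimates."""
    A = np.mean(a_equated) / np.mean(a_base)
    B = np.mean(b_base) - A * np.mean(b_equated)
    return A, B

def to_base_metric(a_equated, b_equated, A, B):
    """Apply the linear transformation to equated-scale item parameters."""
    return a_equated / A, A * b_equated + B

# Hypothetical common-item parameters on the equated scale.
a_equated = np.array([0.8, 1.0, 1.4, 1.1])
b_equated = np.array([-1.0, -0.2, 0.3, 1.1])

# Error-free base-scale values generated with known coefficients (illustration only).
true_A, true_B = 1.2, 0.5
a_base = a_equated / true_A
b_base = true_A * b_equated + true_B

ms_A, ms_B = mean_sigma(b_base, b_equated)
mm_A, mm_B = mean_mean(a_base, a_equated, b_base, b_equated)
a_linked, b_linked = to_base_metric(a_equated, b_equated, ms_A, ms_B)
print("mean/sigma:", ms_A, ms_B)
print("mean/mean: ", mm_A, mm_B)
```

With real estimates the two methods generally disagree, because the discrimination and difficulty estimates carry different amounts of estimation error.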

After the item-level function $Hdiff(\theta_i)$ of Equation (2-35) is obtained, it is summed over all examinees. The following criterion is then minimized to obtain the scale transformation coefficients $A$ and $B$:

$$Hcrit = \sum_{i} Hdiff(\theta_i). \tag{2-36}$$

Stocking-Lord Method

Proposed by Stocking and Lord (1983), the Stocking-Lord method minimizes the sum of squared differences over examinees. Unlike the Haebara method, the Stocking-Lord method sums over items for each set of parameter estimates before squaring. The function below is the squared difference between the test characteristic curves for a given ability:

$$SLdiff(\theta_i) = \left[\sum_{j:V} p_{ij}\left(\theta_{Ji};\hat{a}_{Jj},\hat{b}_{Jj},\hat{c}_{Jj}\right) - \sum_{j:V} p_{ij}\left(\theta_{Ji};\frac{\hat{a}_{Ij}}{A}, A\hat{b}_{Ij}+B, \hat{c}_{Ij}\right)\right]^2, \tag{2-37}$$

where the subscripts $J$ and $I$ denote the base and equated scales, respectively. In the IRT framework, the sum of the item characteristic curves, $\sum_{j} p_{ij}(\theta_i)$, is referred to as the test characteristic curve. Thus $SLdiff(\theta_i)$ is the squared difference between the test characteristic curves at a given $\theta_i$. In contrast to $Hdiff(\theta_i)$, which is the sum of the squared differences between the item characteristic curves at a given $\theta_i$, $SLdiff(\theta_i)$ is the squared difference between the test characteristic curves. The scale transformation coefficients $A$ and $B$ of the Stocking-Lord method are obtained by minimizing

$$SLcrit = \sum_{i} SLdiff(\theta_i). \tag{2-38}$$
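The difference between the two characteristic curve criteria is easiest to see in code. The sketch below is an illustration under stated assumptions, not a production routine: it uses a three-parameter logistic model without a scaling constant, evaluates both criteria over a fixed set of ability points on the base scale, and locates the minimum with a coarse grid search instead of a numerical optimizer. All function names, parameter values, and grid settings are hypothetical.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probabilities for every ability (rows) and item (columns)."""
    z = a[None, :] * (theta[:, None] - b[None, :])
    return c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))

def curve_gap(A, B, theta, base, equated):
    """ICC differences: base-scale estimates minus transformed equated-scale estimates."""
    a_j, b_j, c_j = base
    a_i, b_i, c_i = equated
    return p_3pl(theta, a_j, b_j, c_j) - p_3pl(theta, a_i / A, A * b_i + B, c_i)

def haebara_crit(A, B, theta, base, equated):
    """Square each item-level difference, then sum over items and abilities."""
    return np.sum(curve_gap(A, B, theta, base, equated) ** 2)

def stocking_lord_crit(A, B, theta, base, equated):
    """Sum over items first (test characteristic curves), then square and sum over abilities."""
    return np.sum(curve_gap(A, B, theta, base, equated).sum(axis=1) ** 2)

# Hypothetical common items on the equated scale; base values are error-free by construction.
true_A, true_B = 1.2, 0.5
a_i = np.array([0.8, 1.0, 1.4, 1.1])
b_i = np.array([-1.0, -0.2, 0.3, 1.1])
c_i = np.array([0.20, 0.15, 0.25, 0.20])
equated = (a_i, b_i, c_i)
base = (a_i / true_A, true_A * b_i + true_B, c_i)

theta = np.linspace(-3.0, 3.0, 31)   # ability points on the base scale
grid = [(A, B) for A in np.linspace(0.8, 1.6, 9) for B in np.linspace(0.0, 1.0, 11)]

best_h = min(grid, key=lambda ab: haebara_crit(ab[0], ab[1], theta, base, equated))
best_sl = min(grid, key=lambda ab: stocking_lord_crit(ab[0], ab[1], theta, base, equated))
print("Haebara minimum on grid:      ", best_h)
print("Stocking-Lord minimum on grid:", best_sl)
```

In practice the grid search would be replaced by a numerical minimizer and the ability points by estimated abilities or quadrature points; the criterion functions themselves would stay the same.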

Summary of UIRT Scale Linking Methods

A comprehensive literature review comparing these transformation methods was given by Kolen and Brennan (2004). Some research indicated that the mean/sigma method is preferred over the mean/mean method because estimates of item difficulty parameters are usually more stable than estimates of item discrimination parameters. Other research found the opposite, because means are typically more stable than standard deviations (Baker & Al-Karni, 1991; Ogasawara, 2000). The comparison between the two moment methods is therefore inconclusive, and both the mean/mean and mean/sigma methods should be considered when scale linking is conducted. When characteristic curve methods are compared with moment methods, the characteristic curve methods generally produce more stable results than the mean/sigma and mean/mean methods (Baker & Al-Karni, 1991; Hanson & Beguin, 2002; Hung, Wu, & Chen, 1991; Kim & Cohen, 1992; Ogasawara, 2001a, 2001b, 2001c; Way & Tang, 1991). However, comparisons of the four scale transformation methods under various conditions are not conclusive, and it is suggested that all four be considered when scale linking is conducted (Kolen & Brennan, 2004). Previous research also compared these four scale transformation methods with the concurrent calibration and FCIP methods. Because this study does not focus on those two methods, the comparison results are omitted here.

MIRT Scale Linking

Research on MIRT scale linking has not been conducted as intensively as research on unidimensional IRT scale linking because of its mathematical complexity.
Several multidimensional linking methods have been studied (Davey, Oshima, & Lee, 1996; Hirsch, 1989; Kim, 2008; Li, 1997; Li & Lissitz, 2000; Min, 2003; Oshima, Davey, & Lee, 2000; Reckase & Martineau, 2004;

Simon, 2008; Thompson, Nering, & Davey, 1997; Yon, 2006) based on different data collection designs and different assumptions. Like UIRT scale linking, MIRT scale linking is a linear transformation, but the transformation operates on multiple dimensions. Because MIRT calibration yields multiple sets of parameter estimates (e.g., a discrimination vector for each item), the transformation for MIRT scale linking becomes a linear transformation of matrices (Simon, 2008). Moreover, unlike the UIRT scale linking procedure, in which only two transformation coefficients (an origin coefficient and a unit coefficient) are needed to adjust for differences in (1) origin and (2) unit of measurement between separate calibrations, the MIRT scale linking procedure must take more scale indeterminacies into account. Besides unit and origin indeterminacy, these include the rotation difference, which determines a comparable reference system, and the covariance/correlation difference among the ability dimensions obtained from different calibrations (Min, 2003). Consequently, linking scales in the multidimensional case is more complex than in the unidimensional case. Not only must the location of the origin of the space be decided, but units of measurement must also be set on each of the coordinate axes. Furthermore, the orientation of the MIRT coordinate system needs to be determined. Differences between UIRT scale linking and MIRT scale linking are displayed in Figure 2-8. So, besides the origin coefficient and unit coefficient estimated in UIRT scale linking, MIRT scale linking generates a set of transformation coefficients to adjust (1) rotation,

developed (Hirsch, 1989; Li & Lissitz, 2000; Min, 2003; Oshima et al., 2000; Reckase & Martineau, 2004; Thompson et al., 1997). These MIRT scale linking methods all use the multidimensional compensatory model. In these methods, three linking components are estimated: a rotation matrix to deal with rotation indeterminacy, and a translation vector and a dilation vector to deal with origin and unit indeterminacy of the MIRT scale system (Min, 2003). The methods rely on different mathematical solutions and theoretical perspectives to deal with these scale indeterminacies. All existing MIRT scale linking methods cope with scale indeterminacy by transforming the scale through rotation, dilation, and translation, either separately or simultaneously.

Figure 2-8. UIRT and MIRT linking components: the origin and the unit of measurement for the two scales (adapted from Min, 2003).
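The role of the three linking components can be checked numerically. The sketch below assumes a compensatory M2PL model with logit $a'\theta + d$ and a generic parameterization in which abilities are transformed as $\theta^* = M\theta + m$, with $M$ the product of a diagonal dilation matrix and a rotation matrix and $m$ a translation vector. This parameterization is an assumption chosen for illustration and is not necessarily the notation of Min (2003) or of the other methods cited; all values are hypothetical.

```python
import numpy as np

def m2pl_prob(theta, a, d):
    """Compensatory M2PL probabilities: rows are examinees, columns are items."""
    return 1.0 / (1.0 + np.exp(-(theta @ a.T + d)))

# Hypothetical linking components for a two-dimensional space.
angle = np.deg2rad(30.0)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
dilation = np.diag([1.5, 0.8])            # one unit change per dimension
translation = np.array([0.4, -0.3])       # shift of the origin
M = dilation @ rotation

# Hypothetical abilities and item parameters on the original scale.
rng = np.random.default_rng(1)
theta = rng.normal(size=(6, 2))
a = np.array([[1.1, 0.4], [0.6, 1.0], [0.9, 0.9]])
d = np.array([0.2, -0.1, 0.5])

theta_new = theta @ M.T + translation     # theta* = M theta + m
a_new = a @ np.linalg.inv(M)              # a* = a M^{-1}
d_new = d - a_new @ translation           # d* = d - a* m

p_old = m2pl_prob(theta, a, d)
p_new = m2pl_prob(theta_new, a_new, d_new)
print("probabilities unchanged:", np.allclose(p_old, p_new))
```

Because any invertible $M$ and any $m$ leave the probabilities unchanged, separate MIRT calibrations can differ in rotation, unit, and origin at once, which is the indeterminacy the linking methods have to remove.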

Note that the correlation of item parameters between dimensions cannot be transformed on its own. The correlation between ability dimensions changes with the three transformations mentioned above, especially the rotation. When an orthogonal rotation is applied, the correlation of parameters between dimensions remains the same for the base and equated test forms. When an oblique (i.e., nonorthogonal) rotation is used, the correlation between the latent ability dimensions may not remain the same across test forms. In the following sections, six MIRT linking methods are introduced, each taking a different perspective on rotation, translation, and dilation. A summary of MIRT scale linking follows.

Hirsch's Method

The first MIRT scale linking approach was proposed by Hirsch (1989). The common examinee design was used in Hirsch's study, but the common item design can also be used. The procedure is complex and has four basic steps: (1) MIRT item calibration and ability estimation; (2) identification of common orthogonal basis vectors for both test forms; (3) alignment of the basis vectors of the two test forms through orthogonal Procrustes rotation; and (4) a linear transformation that equates the means and standard deviations of the ability estimates on each dimension for the base and equated tests. The rotation indeterminacy between the two test forms is handled in steps 2 and 3. The dilation and translation borrow ideas from principal component analysis (PCA), and orthogonal Procrustes rotation is adopted for the rotation. There are thus two rounds of rotation, in step 2 and step 3.

In step 1, the item parameters and ability estimates are obtained through MIRT item calibration and ability estimation.

In step 2, common orthogonal basis vectors are set up for both test forms so that the parameter estimates from the two forms lie in the same ability space. This step is closely related to the idea of the invariance transformation for MIRT, and it involves the parameter estimates of the common examinees (or anchor items). The ability vectors are related to the common orthogonal basis vectors through a transformation matrix. This step performs the first rotation, given by Equation (2-39) for the base form and Equation (2-40) for the equated form. The common orthogonal basis vectors serve as a common metric for both test forms. This rotation preserves the correlation between the ability values found for the common examinees on the two tests. The transformation matrices transform the discrimination parameter vectors of the tests so that the discrimination parameter vectors of the equated form have a set of basis vectors with the same angles as the basis vectors of the base test (Green, 1976).

In step 3, the second rotation is conducted through a second transformation matrix. This matrix is an orthogonal Procrustes rotation of the ability estimates, which rotates the rotated ability estimates of the equated form from step 2 to the rotated ability

estimates of the base form from step 2. Through this second rotation, the sum of squared differences between the two sets of rotated ability estimates is minimized (Schonemann, 1966). This step resolves the rotational indeterminacy between the two tests (Wang, 1985). The transformation matrix is calculated through the process given in Equations (2-41) to (2-44), in which Equations (2-42) and (2-43) each involve a matrix of eigenvectors and a diagonal matrix of variances.

In step 4, the translation and dilation indeterminacies are resolved by extending the invariance transformation characteristics of UIRT to the multidimensional ability space under certain circumstances. The invariance transformation for MIRT indicates that, for a multidimensional ability space with an ability vector on each dimension, the standardized ability vectors for each dimension are given by Equation (2-45). The probability of a correct response to any item at a specific ability value then remains the same as long as the item discrimination vectors are rescaled by multiplying by the corresponding standard deviations and the item difficulty is

rescaled accordingly. Thus, the means and standard deviations for the common examinees on each dimension are transformed so that the ability estimates on the base and equated tests are on the same metric.

In sum, Hirsch's method includes two rotation steps and one dilation/translation step to resolve the MIRT scale indeterminacy between two test forms. The method provides valuable insight into the principles of MIRT linking, but the procedure is very complicated. For example, in the two-dimensional MIRT case, eight sets of coefficients must be estimated: four rotation matrices, two dilation parameters, and two location parameters (Li, 1997). However, current MIRT calibration programs (e.g., NOHARM, TESTFACT, ConQuest) provide orthogonal ability dimensions as a default option, so the procedures for setting common basis vectors are not needed.

Thompson et al.'s Method

Thompson, Nering, and Davey (1996) developed a MIRT linking framework for the randomly equivalent groups design with non-overlapping test forms. Because the examinee groups are randomly equivalent under this design, the means and standard deviations across groups are specified to be equal. Accordingly, there is no need to resolve the translation and dilation indeterminacies, and only the rotation indeterminacy needs to be resolved in the Thompson et al. method (Thompson et al., 1996). Because randomly equivalent examinee groups are combined with non-overlapping items, the two forms share no common items, and one requirement must be satisfied before the scale transformation: the test forms must be parallel. To satisfy this requirement, cluster analysis (Miller & Hirsch, 1992) must be conducted to ensure that the two test forms have the same test structure, and the direction of the reference composite of traits must be the same for the two tests.
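When only the rotation indeterminacy has to be resolved, the problem is commonly posed as an orthogonal Procrustes problem and solved with a singular value decomposition. The sketch below shows that standard solution on constructed data; it is an illustration under stated assumptions and is not claimed to reproduce the exact steps of Thompson et al. The function name and all values are hypothetical, and the equated-form discriminations are generated as an exact rotation of the base-form discriminations so that the known rotation can be recovered.

```python
import numpy as np

def orthogonal_procrustes(a_equated, a_base):
    """Orthogonal T minimizing trace(E'E), where E = a_equated @ T - a_base."""
    u, _, vt = np.linalg.svd(a_equated.T @ a_base)
    return u @ vt

# Hypothetical base-form discriminations for ten items on two dimensions.
rng = np.random.default_rng(0)
a_base = rng.uniform(0.3, 1.5, size=(10, 2))

# Equated-form discriminations built as an exact rotation of the base form.
angle = np.deg2rad(25.0)
true_rotation = np.array([[np.cos(angle), -np.sin(angle)],
                          [np.sin(angle),  np.cos(angle)]])
a_equated = a_base @ true_rotation.T

T = orthogonal_procrustes(a_equated, a_base)
residual = a_equated @ T - a_base
criterion = np.trace(residual.T @ residual)   # least squares (trace) criterion
print("recovered rotation:\n", T)
print("trace criterion:", criterion)
```

With estimated discriminations the residual will not vanish, and the size of the trace criterion then serves as an index of how well the two forms can be aligned by rotation alone.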

Mathematically, the Thompson et al. (1997) procedure applies an orthogonal Procrustes rotation. In the Procrustes rotation, one matrix holds the item discrimination parameters for the equated form, a second matrix holds the item discrimination parameters for the base form, and a third is the transformation (rotation) matrix. The rotation procedure finds the transformation matrix that minimizes a least squares criterion defined through the matrix trace operator. The main idea of using orthogonal Procrustes rotation to align the rotation of the two test forms is borrowed from Schonemann's (1966) singular value decomposition (SVD) solution. The SVD procedure is expressed in Equations (2-46) to (2-48), which involve a diagonal matrix of the square roots of the eigenvalues and two matrices of eigenvectors. The transformation matrix that resolves the rotation indeterminacy between the two test forms is thereby obtained. Thompson et al. (1997) discussed two criteria for evaluating the rotation results: the congruence coefficient (Tucker, 1951) and the least squares criterion based on the matrix trace operator. Their study indicated that the least squares criterion is better than the congruence coefficient, and it was therefore used as the criterion in the Thompson et al. (1997) study.

Li and Lissitz's Method

Li and Lissitz (2000; see also Li, 1997) made the first attempt to systematically discuss several possible approaches to MIRT linking from different perspectives. Li and Lissitz's method,

henceforth referred to as the LL method, is applied under the nonequivalent groups anchor test (NEAT) design. Drawing on the Thompson et al. (1997) method and traditional factor analysis techniques, Li and Lissitz (2000) identified, among four procedures they proposed, the best composite transformation procedure for resolving MIRT scale indeterminacy. By dealing with rotation, translation, and dilation separately, this procedure seeks three linking components: (1) a rotation matrix from the orthogonal Procrustes solution; (2) a translation vector obtained by a least squares procedure that minimizes the differences between the item difficulty parameters from the base form and the transformed parameters from the equated form; and (3) a central dilation constant computed by the trace method, which minimizes the sum of squared differences between the two centered discrimination parameter matrices of the two forms after rotation. The details of this procedure are as follows.

First, a rotation matrix is obtained from the orthogonal Procrustes solution (Schonemann, 1966). Given the item discrimination estimate matrices for the base and equated test forms, the rotation matrix is obtained by minimizing the residual matrix defined in Equation (2-49).

Second, the translation vector is obtained by minimizing the sum of squared deviations in Equation (2-50),

where the terms are the transformed item difficulty parameter, the number of dimensions, and the number of common test items.

Third, the central dilation constant is obtained by using the trace method to minimize the sum of squared errors of the residual matrix, as given in Equation (2-51). In this third step, the rotation matrix and the unit change coefficient are derived simultaneously by the trace method through Equation (2-52), which involves the trace operator (the sum of the diagonal elements of a matrix) and the centered item discrimination parameter matrices for the base form and the equated form. Once the rotation matrix, the translation vector, and the central dilation constant are obtained, the parameters of the MIRT model for the equated form are transformed as in Equations (2-53) to (2-55).

Min's (M) Method

Building on the LL method, Min (2003) developed a new approach as an improvement for the nonequivalent groups anchor test (NEAT) design. In Li and Lissitz's method, a single dilation constant was considered sufficient to cover the overall dilation adjustment because the multiple dimensions in a test may be strongly related. Accordingly, a change in one dimension may go along with changes in the other dimension(s) with the


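same dilation (Li & Lissitz, 2000). However, this may not be the case in real-world situations. If the multiple dimensions in the test are not closely related, the unit change in one dimension may differ from the unit changes in the other dimension(s), so that the dilation rate varies across dimensions. In Min's procedure, called the M method, a diagonal dilation matrix is proposed to replace the dilation constant $k$ in the LL method, where $\mathbf{K}$ is a diagonal dilation matrix. For the two-dimensional case, $\mathbf{K}$ is defined as

$$\mathbf{K} = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}$$

The off-diagonal elements in $\mathbf{K}$ are set to zero because the correlation between the two dimensions is defined by the orthogonal rotation matrix $\mathbf{T}$. Thus, the transformations of the M method are

$$\mathbf{a}_i^{*\prime} = \mathbf{a}_i'\mathbf{T}\mathbf{K} \tag{2-56}$$

$$d_i^{*} = d_i + \mathbf{a}_i'\mathbf{T}\boldsymbol{\beta} \tag{2-57}$$

$$\boldsymbol{\theta}^{*} = \mathbf{K}^{-1}\left(\mathbf{T}^{-1}\boldsymbol{\theta} - \boldsymbol{\beta}\right) \tag{2-58}$$

The other coefficients and the scale transformations are the same as in the LL method. In the M method, the rotation, dilation, and translation coefficients are calculated based on the orthogonal Procrustes solution:

$$\mathbf{E} = \mathbf{A}_E\mathbf{T} - \mathbf{A}_B \tag{2-59}$$

$$\mathbf{T}'\mathbf{T} = \mathbf{T}\mathbf{T}' = \mathbf{I} \tag{2-60}$$

$$\min_{\mathbf{T}} \operatorname{tr}\left(\mathbf{E}'\mathbf{E}\right) \tag{2-61}$$

where $\mathbf{A}_E$ and $\mathbf{A}_B$ are the item discrimination parameter matrices for the anchor items in the equated form ($n \times m$) and the base form ($n \times m$), $n$ is the number of items, and $m$ is the number of dimensions. $\mathbf{T}$ is the orthogonal rotation matrix that minimizes the least squares difference ($\mathbf{E}$) between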


equated form and base form (i.e., $\mathbf{E} = \mathbf{A}_E\mathbf{T} - \mathbf{A}_B$), and $\mathbf{I}$ is the identity matrix. The criterion is the minimization of the trace function of the product of the least squares difference ($\mathbf{E}$) and its transpose ($\mathbf{E}'$). Note that all these transformations for the two forms are under the NEAT condition. The approach to obtaining the rotation matrix and the translation vector in the M method is the same as in the LL method. Both methods use eigenvalue and eigenvector techniques (singular value decomposition; Schonemann, 1966). The dilation matrix is obtained using the function given in Equation 2-62 (Min, 2003), where $\mathbf{K}$ is the diagonal dilation matrix.

Reckase and Martineau's (NOP) Method

Having identified an important weakness in the Min (2003) approach to MIRT linking, Reckase and Martineau proposed a procedure (2004), henceforward referred to as the NOP method since it is based on a non-orthogonal Procrustes transformation, that is applied for the NEAT data collection design. Reckase and Martineau asserted that the M method performs well when the dimensionality modeled is low, but that when the dimensionality modeled is high, applying the M method in MIRT linking carries a heavy computational burden. Because an interaction between dimensional dominance across forms exists and the ordering of the dimensions may change from one form to another in orthogonal Procrustes rotation, the dimension orders for both forms may


not remain the same. Therefore, the non-orthogonal Procrustes transformation (Mulaik, 1972) is suggested by Reckase and Martineau to solve this problem. There are two advantages to employing the non-orthogonal Procrustes transformation in MIRT linking. First, it automatically aligns each dimension of the item discrimination matrix ($\mathbf{A}_E$) from the equated form with the dimensions of the corresponding matrix ($\mathbf{A}_B$) from the base form, depending upon the best fit of the dimensions of $\mathbf{A}_E$ with the dimensions of $\mathbf{A}_B$. Second, it already includes scale dilations for each dimension, so no dilation parameters or vectors are needed in this procedure. According to Mulaik (1972), the rotation matrix resulting from the non-orthogonal Procrustes procedure is given by

$$\mathbf{T} = \left(\mathbf{A}_E'\mathbf{A}_E\right)^{-1}\mathbf{A}_E'\mathbf{A}_B \tag{2-63}$$

where $\mathbf{T}$ is the rotation matrix, $\mathbf{A}_E$ is the item discrimination matrix from the equated form, and $\mathbf{A}_B$ is the item discrimination matrix from the base form. The MIRT linking transformations are given in Equations 2-64 through 2-67, where $\boldsymbol{\beta}$ is the translation vector, which can be computed as described above.
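The least squares form of the non-orthogonal Procrustes rotation can be sketched in a few lines. This is a minimal illustration, not the NOP implementation used in the cited work: it assumes two dimensions, the solution $\mathbf{T} = (\mathbf{A}_E'\mathbf{A}_E)^{-1}\mathbf{A}_E'\mathbf{A}_B$, and hypothetical anchor-item discriminations; the function and variable names are invented for the example.

```python
# Sketch of a non-orthogonal (least squares) Procrustes rotation for two
# dimensions. All numbers below are hypothetical.

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular; the rotation is not identified")
    return [[d / det, -b / det], [-c / det, a / det]]

def nonorthogonal_procrustes(A_E, A_B):
    """Return T minimizing the sum of squared elements of A_E T - A_B."""
    A_Et = transpose(A_E)
    return matmul(inverse_2x2(matmul(A_Et, A_E)), matmul(A_Et, A_B))

# Anchor-item discriminations on the equated form (4 items, 2 dimensions).
A_E = [[1.2, 0.3], [0.4, 1.1], [0.9, 0.8], [1.5, 0.2]]
# A known, non-orthogonal "true" transformation used to build the base form.
T_true = [[0.9, 0.2], [-0.1, 1.3]]
A_B = matmul(A_E, T_true)

T_hat = nonorthogonal_procrustes(A_E, A_B)
```

Because the base-form matrix here is constructed without error, `T_hat` reproduces `T_true` up to rounding; with estimated parameters the fit would only be approximate. Note that `T_true` both rotates and rescales the axes, which is why no separate dilation step appears.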


Oshima, Davey, and Lee's (ODL) Method

Different from all the MIRT linking methods described above, Oshima, Davey, and Lee (2000) developed a series of MIRT linking methods from the perspective of the IRT framework. Their approach, henceforward referred to as the ODL method, can be treated as the direct extension of UIRT linking methods. The ODL method is used for the NEAT data collection design. Similar to the MIRT linking methods previously described, the linear transformations for both test forms are:

$$\mathbf{a}_i^{*\prime} = \mathbf{a}_i'\mathbf{T}^{-1} \tag{2-68}$$

$$d_i^{*} = d_i - \mathbf{a}_i'\mathbf{T}^{-1}\boldsymbol{\beta} \tag{2-69}$$

$$\boldsymbol{\theta}^{*} = \mathbf{T}\boldsymbol{\theta} + \boldsymbol{\beta} \tag{2-70}$$

where $\mathbf{T}$ is the rotation matrix that adjusts the variances and covariances of the ability dimensions (scale), $\boldsymbol{\beta}$ is the translation vector that adjusts the location of the origin, and the asterisk ($*$) indicates transformed parameters from the equated test form. The series of procedures in the ODL method involves a rotation matrix, which may be non-orthogonal, and a translation vector. Two different rotation procedures are available. If the correlation between dimensions is the same across test forms, the rotation matrix is orthogonal; otherwise, the rotation matrix is non-orthogonal. In the ODL method, the scale dilation is conducted simultaneously with the rotation procedure. The rotation matrix has two functions: a rotation for a proper dimensional orientation and a dilation to adjust the variances of the ability dimensions. Accordingly, no dilation parameter is needed in the ODL method. Note that the rotation matrix and the translation vector are analogous to the dilation coefficient and the translation coefficient in UIRT scale linking.
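A quick numerical check shows why a linear transformation of this kind leaves response probabilities unchanged. The sketch below assumes a two-dimensional compensatory logistic model with logit $\mathbf{a}'\boldsymbol{\theta} + d$ and the transformation $\boldsymbol{\theta}^{*} = \mathbf{T}\boldsymbol{\theta} + \boldsymbol{\beta}$, $\mathbf{a}^{*\prime} = \mathbf{a}'\mathbf{T}^{-1}$, $d^{*} = d - \mathbf{a}'\mathbf{T}^{-1}\boldsymbol{\beta}$; the parameterization, the numbers, and the function names are assumptions made for illustration.

```python
import math

def prob(a, d, theta):
    """Compensatory logistic item response function with logit a'theta + d."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transform_theta(theta, T, beta):
    """theta* = T theta + beta."""
    return [sum(T[r][c] * theta[c] for c in range(2)) + beta[r] for r in range(2)]

def transform_item(a, d, T, beta):
    """a*' = a' T^{-1} and d* = d - a' T^{-1} beta."""
    T_inv = inverse_2x2(T)
    a_star = [sum(a[k] * T_inv[k][j] for k in range(2)) for j in range(2)]
    d_star = d - sum(a_star[j] * beta[j] for j in range(2))
    return a_star, d_star

# Hypothetical item, ability vector, rotation matrix, and translation vector.
a, d = [1.1, 0.6], -0.4
theta = [0.5, -1.0]
T = [[0.9, 0.2], [-0.1, 1.3]]
beta = [0.3, -0.2]

p_before = prob(a, d, theta)
a_star, d_star = transform_item(a, d, T, beta)
p_after = prob(a_star, d_star, transform_theta(theta, T, beta))
```

The two probabilities agree because the terms involving $\mathbf{T}$ and $\boldsymbol{\beta}$ cancel in the logit. This cancellation is the indeterminacy that scale linking has to resolve.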


Four different procedures were developed in the ODL method to minimize a targeted criterion function with respect to the rotation matrix $\mathbf{T}$ and the translation vector $\boldsymbol{\beta}$. These four procedures are the direct method, the equated function method, the test characteristic function (TCF) method, and the item characteristic function (ICF) method. The goal of these four minimization procedures is to make the transformed item parameter estimates (i.e., $\mathbf{a}^{*}_{E}$, $d^{*}_{E}$) from the equated form as similar as possible to the item parameter estimates (i.e., $\mathbf{a}_{B}$, $d_{B}$) from the base form by choosing the "appropriate" rotation matrix and translation vector. Note that the rotation matrix and the translation vector are determined simultaneously in all four ODL procedures.

The direct (OD) procedure

As a multivariate extension of the unidimensional minimum chi-square linking method (Divgi, 1985), the direct procedure estimates the linking functions by minimizing the sum of squared differences between corresponding elements of the item discrimination parameter matrix for the base form and the transformed item discrimination parameter matrix for the equated form, and between the item difficulty vector for the base form and the transformed item difficulty vector for the equated form, over all anchor items. The function to be minimized with respect to $\mathbf{T}$ and $\boldsymbol{\beta}$ is as follows,

$$F = \sum_{i=1}^{n}\sum_{j=1}^{m}\left(a_{Bij} - a^{*}_{Eij}\right)^{2} + \sum_{i=1}^{n}\left(d_{Bi} - d^{*}_{Ei}\right)^{2} \tag{2-71}$$

where $n$ is the number of items and $m$ is the number of dimensions.

The equated function procedure

The equated function procedure is the multidimensional extension of the mean/mean method in UIRT linking. This procedure seeks to minimize the sum of


squared differences between functions defined on sets of selected elements of the discrimination matrices $\mathbf{A}_B$ and $\mathbf{A}^{*}_E$ and the difficulty vectors $\mathbf{d}_B$ and $\mathbf{d}^{*}_E$. The number of functions needed ($N_f$) is determined by the number of unknown values in $\mathbf{T}$ and $\boldsymbol{\beta}$. The function for $N_f$ is

$$N_f = m^{2} + m \tag{2-72}$$

where $m$ is the number of dimensions in the scale system. For example, in UIRT the number of dimensions is equal to one (i.e., $m = 1$); therefore, the number of functions is equal to 2 ($1^{2} + 1$), so that the number of linking coefficients for UIRT is two (i.e., the slope $A$ and the intercept $B$). In addition, if the number of dimensions is equal to two ($m = 2$), the number of functions is six ($2^{2} + 2$). That is, four unknown values in the rotation matrix $\mathbf{T}$ and two unknown values in the translation vector $\boldsymbol{\beta}$ must be calculated for the two-dimensional case. Therefore, six functions defined on the common items are needed for the two-dimensional case. The functions specified are those of Equations 2-73 through 2-78,


where $n$ is the total number of common items and $i$ is the index of the $i$th item. The function to be minimized is

$$F = \sum_{k=1}^{N_f}\left(\bar{f}_{Bk} - \bar{f}^{*}_{Ek}\right)^{2} \tag{2-79}$$

where $N_f$ is the number of elements to be estimated, $\bar{f}_{Bk}$ is the mean function of a separate set of item parameters for the base form, and $\bar{f}^{*}_{Ek}$ is the transformed mean function of the corresponding set of item parameters for the equated form.

The test characteristic function (TCF) procedure

The test characteristic function (TCF) procedure is the multidimensional extension of Stocking and Lord's (1983) unidimensional IRT linking procedure. The TCF method minimizes the sum of squared differences between the multidimensional test characteristic surfaces,

$$F = \sum_{q=1}^{Q} w_q\left[\sum_{i=1}^{n} P_i\left(\boldsymbol{\theta}_q; \mathbf{a}_{Bi}, d_{Bi}\right) - \sum_{i=1}^{n} P_i\left(\boldsymbol{\theta}_q; \mathbf{a}^{*}_{Ei}, d^{*}_{Ei}\right)\right]^{2} \tag{2-80}$$

where $P_i$ is the item response function for the multidimensional model with the corresponding parameter sets ($\mathbf{a}_{Bi}, d_{Bi}$) and ($\mathbf{a}^{*}_{Ei}, d^{*}_{Ei}$). Instead of integrating, the TCF procedure applies a discrete approximation and sets $Q$ as the number of quadrature grid points for the matching $\boldsymbol{\theta}$ vectors. The outer summation approximates the multiple integral over the ability space. There are two versions of the TCF procedure: a weighted version and an unweighted version. The weighted version multiplies the squared difference of the summations by the weight $w_q$ corresponding to each $\boldsymbol{\theta}_q$ value, recognizing that some regions of the scale are more important than others. The unweighted version treats the different $\boldsymbol{\theta}_q$ values equally by assigning the same weight to every grid point.
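The TCF criterion can be sketched as a function of a candidate rotation matrix and translation vector. The code below is an illustration under stated assumptions, not the ODL software: it assumes two dimensions, a compensatory logistic model with logit $\mathbf{a}'\boldsymbol{\theta} + d$, the transformation $\mathbf{a}^{*\prime} = \mathbf{a}'\mathbf{T}^{-1}$ and $d^{*} = d - \mathbf{a}'\mathbf{T}^{-1}\boldsymbol{\beta}$, a coarse hypothetical grid, and made-up anchor items.

```python
import math
from itertools import product

def prob(a, d, theta):
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transform_item(a, d, T, beta):
    """a*' = a' T^{-1} and d* = d - a' T^{-1} beta (assumed parameterization)."""
    T_inv = inverse_2x2(T)
    a_star = [sum(a[k] * T_inv[k][j] for k in range(2)) for j in range(2)]
    return a_star, d - sum(a_star[j] * beta[j] for j in range(2))

def tcf_criterion(items_B, items_E, T, beta, grid, weights=None):
    """Weighted sum of squared differences between test characteristic surfaces."""
    total = 0.0
    for q, theta in enumerate(grid):
        w = 1.0 if weights is None else weights[q]  # unweighted version by default
        tcs_B = sum(prob(a, d, theta) for a, d in items_B)
        tcs_E = sum(prob(*transform_item(a, d, T, beta), theta) for a, d in items_E)
        total += w * (tcs_B - tcs_E) ** 2
    return total

# Hypothetical anchor items on the base scale.
items_B = [([1.0, 0.4], -0.2), ([0.5, 1.2], 0.3), ([0.8, 0.8], 0.0)]
T_true, beta_true = [[0.9, 0.2], [-0.1, 1.3]], [0.3, -0.2]
# The same items expressed on the equated-form scale (inverse of the transformation).
items_E = [([sum(a[k] * T_true[k][j] for k in range(2)) for j in range(2)],
            d + sum(a[k] * beta_true[k] for k in range(2))) for a, d in items_B]

grid = list(product([-2.0, -1.0, 0.0, 1.0, 2.0], repeat=2))
loss_at_truth = tcf_criterion(items_B, items_E, T_true, beta_true, grid)
loss_at_identity = tcf_criterion(items_B, items_E, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], grid)
```

In practice the criterion would be handed to a numerical optimizer over the elements of $\mathbf{T}$ and $\boldsymbol{\beta}$; here it is only evaluated at the generating values, where it is zero up to rounding, and at the identity transformation, where it is positive.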


The item characteristic function (ICF) procedure

Similar to the TCF procedure, the item characteristic function (ICF) procedure minimizes squared differences between response functions. However, instead of comparing test characteristic functions as the TCF procedure does, the ICF procedure minimizes the sum, over items, of the squared differences between the item characteristic surfaces of the two forms. The ICF procedure is treated as the extension of the UIRT linking procedure proposed by Haebara (1980). The minimization function is given by

$$F = \sum_{q=1}^{Q}\sum_{i=1}^{n}\left[P_i\left(\boldsymbol{\theta}_q; \mathbf{a}_{Bi}, d_{Bi}\right) - P_i\left(\boldsymbol{\theta}_q; \mathbf{a}^{*}_{Ei}, d^{*}_{Ei}\right)\right]^{2} \tag{2-81}$$

where $n$ is the number of items and $Q$ is the number of quadrature grid points for the matching $\boldsymbol{\theta}$ vectors.

In summary, the direct procedure in the ODL method minimizes differences between common item parameter estimates; the equated function procedure, as an extension of the mean/mean method in UIRT linking, matches scaling functions in the MIRT framework; the TCF procedure matches test response functions or surfaces for both test forms; and the ICF procedure matches item response functions or surfaces for both test forms.

Summary of MIRT Scale Linking

In general, MIRT linking seeks solutions that resolve the indeterminacy of rotation, translation, and dilation. The correlation between item parameters changes along with the other three types of transformation. The MIRT linking methods described above in this section differ in: (1) the data collection design; (2) the theoretical foundation used to solve the rotation indeterminacy (IRT perspective or factor analysis perspective); (3) the rotation approach (orthogonal or non-orthogonal); (4) whether dilation parameters are included; and (5) what kinds of dilation parameters the methods have.


Hirsch's (1989) method is valued as the first attempt to deal with MIRT linking. Although Hirsch's method is complex and difficult to apply in practice, it provides a set of ideas that guided later MIRT linking methodology development. It highlights the importance of building a common metric to link the scale systems of both test forms. In addition, rotation, dilation, and translation are specified by Hirsch as the three key issues of MIRT linking.

Later, Thompson et al. (1997) developed a MIRT linking procedure for the randomly equivalent group design. The Thompson et al. (1997) procedure is the first and only MIRT linking procedure applied in the randomly equivalent group data collection design. No translation or dilation is needed in this method, but some strong assumptions and constraints must hold before the linking procedure is applied. First, since no common items or common examinees exist across the test forms, the requirements on the test forms are restrictive, meaning that not only must the dimensions of the test forms be the same, but the test structures of the forms must also be similar. Second, besides dimensionality analysis, cluster analysis (Roussos, Stout, & Marden, 1998) must also be used in the Thompson et al. (1997) method to identify the homogeneous clusters in the test. Third, items identified as belonging to the same cluster should measure the same composite of traits.

Several MIRT linking methods were developed for the NEAT design, including the LL method (Li & Lissitz, 2000), the M method (Min, 2003), the ODL method (Oshima et al., 2000), and the NOP method (Reckase & Martineau, 2004). The LL method resolves the three indeterminacy problems separately by using a translation vector, a scalar dilation parameter, and an orthogonal Procrustes rotation matrix, respectively. The M method (Min, 2003) improved on the LL method by replacing the dilation constant of the LL method with a diagonal dilation


matrix that allows for differential dilation/contraction of the scales of the various dimensions (Min, 2003). The NOP method applies a non-orthogonal rotation to correct the weakness of the M method, namely the infeasible computational burden that arises when the dimensionality of the test is high. Among all MIRT linking methods for the NEAT design, the ODL method (Oshima et al., 2000) and the NOP method (Reckase & Martineau, 2004) allow a non-orthogonal rotation approach to solving the rotation indeterminacy problem. In contrast, the LL method (Li & Lissitz, 2000) and the M method (Min, 2003) retain the orthogonal rotation approach. In the ODL method and the NOP method, the dilation indeterminacy and the rotation indeterminacy are solved simultaneously, so that no dilation parameters exist in these two methods.

IRT Equating

Measures of Ability

In the IRT framework, different measures are available to report an examinee's proficiency. These measures include the raw score (i.e., number-correct score), the IRT ability estimate, and the scale score. Under the UIRT framework, the estimated IRT ability (i.e., $\hat{\theta}$) is a scalar, so that the estimated IRT ability on one test is symmetric to its corresponding estimated IRT ability on the other test. If the tests are scored using estimated IRT abilities, an examinee's performance on the two forms can be compared by using his or her estimated IRT ability. No further procedure is needed to develop a relationship between scores on the two test forms. That is, after UIRT scale linking, the estimated IRT abilities from different test forms are on the same metric and can be compared directly. However, several practical problems may occur when estimated IRT abilities are used to report scores. First, the IRT estimation method uses the entire response pattern (i.e., 0/1) rather than the number-correct score to estimate ability, so


that the same number-correct score obtained with different response patterns may lead to different estimated abilities. This can be difficult to explain to examinees. Second, the estimated ability at the high or low end of the population ability distribution typically has greater measurement error than the estimated ability in the middle of the distribution, so that estimates at the extremes may be less accurate. Third, it is hard to convey the meaning of scores to test users if only ability estimates are reported as test scores. Therefore, to date, the number-correct score, rather than the ability estimate, is widely used as the reported test score, even when IRT is used for test development and equating in most large-scale assessments. In this situation, an additional step is required in the IRT process, which we call IRT equating. In the UIRT framework, two equating methods have been proposed, one to equate true scores and one to equate observed scores. These two methods are UIRT true score equating and UIRT observed score equating.

UIRT True Score Equating

In UIRT true score equating, although the number-correct scores on the test forms are used, the association between true scores and ability estimates on the two test forms is established through the test characteristic curve function. Thus, UIRT true score equating can be displayed graphically as true scores related through the test characteristic curves (TCC) of both test forms. The test characteristic function is the sum of the item response functions over the test items. When an examinee's ability ($\theta$) is known, the true score for this examinee can be obtained,


$$\tau(\theta) = \sum_{j=1}^{n} P_j(\theta) \tag{2-82}$$

Therefore, examinees' abilities can, to some extent, be approximately connected with their number-correct scores by substituting the summed score for $\tau$ and then solving for $\theta$ using numerical methods. This function is monotonically increasing. The test characteristic curve function for the number-correct true score on Form X (i.e., the equated test form) that is equivalent to $\theta_i$ is defined as

$$\tau_X(\theta_i) = \sum_{j=1}^{n_X} P_j(\theta_i) \tag{2-83}$$

where $n_X$ is the total number of items on Form X. Likewise, the test characteristic curve function for the number-correct true score on Form Y (i.e., the base test form) that is equivalent to $\theta_i$ is defined as

$$\tau_Y(\theta_i) = \sum_{j=1}^{n_Y} P_j(\theta_i) \tag{2-84}$$

where $n_Y$ is the total number of items on Form Y. The true scores $\tau_X(\theta_i)$ and $\tau_Y(\theta_i)$ are considered to be equivalent for a given $\theta_i$. Thus,

$$irt_Y(\tau_X) = \tau_Y\left(\tau_X^{-1}\right) \tag{2-85}$$

where $\tau_X^{-1}$ is defined as the $\theta_i$ corresponding to true score $\tau_X$.

A three-step procedure is used for UIRT true score equating. First, a true score $\tau_X$ on Form X is specified. Second, an iterative procedure (i.e., the Newton-Raphson method) is typically used to find the $\theta_i$ that corresponds to that true score. Third, the corresponding true score for


Form Y is derived by substituting $\theta_i$ into the test characteristic function of Form Y. The Newton-Raphson iterative procedure will be described in detail in the methodology section.

UIRT Observed Score Equating

The core of UIRT observed score equating is conventional equipercentile equating together with the compound binomial recursion algorithm; IRT plays only a minor role in UIRT observed score equating (Lord & Wingersky, 1984). First, an estimated distribution of observed number-correct scores on each form is generated from the IRT model. Second, conventional equipercentile equating is used to equate the scores on both test forms. The distribution of observed number-correct scores for examinees of a given ability is produced by the compound binomial distribution (see Lord & Wingersky, 1984). Then, these observed score distributions are accumulated over a population of examinees to produce a number-correct observed score distribution for each test form. Therefore, an explicit specification of the distribution of ability in the population of examinees is required.

The compound binomial distribution (see Lord & Wingersky, 1984) is expressed as a recursion formula. First, $f_r(x \mid \theta_i)$ is defined as the conditional observed score distribution over the first $r$ items for examinees of ability $\theta_i$. Thus, $f_1(x = 0 \mid \theta_i) = 1 - P_1(\theta_i)$ is specified as the probability of earning a score of 0 on the first item and $f_1(x = 1 \mid \theta_i) = P_1(\theta_i)$ as the probability of earning a score of 1 on the first item. For $r > 1$, the recursion formula is as follows:

$$f_r(x \mid \theta_i) = \begin{cases} f_{r-1}(x \mid \theta_i)\,[1 - P_r(\theta_i)], & x = 0, \\ f_{r-1}(x \mid \theta_i)\,[1 - P_r(\theta_i)] + f_{r-1}(x - 1 \mid \theta_i)\,P_r(\theta_i), & 0 < x < r, \\ f_{r-1}(x - 1 \mid \theta_i)\,P_r(\theta_i), & x = r. \end{cases} \tag{2-86}$$
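The recursion can be sketched compactly for one ability level. This is a minimal illustration of the Lord and Wingersky (1984) recursion, assuming that the item probabilities at the given ability have already been computed from whatever IRT model is in use; the function name and the three probabilities are hypothetical.

```python
def lord_wingersky(probs):
    """Conditional number-correct distribution given per-item success probabilities.

    probs[r] is the probability of a correct answer to item r at one fixed
    ability level; the result has len(probs) + 1 entries, for scores 0..n.
    """
    dist = [1.0]  # before any item is administered, the score is 0 with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, fx in enumerate(dist):
            new[x] += fx * (1.0 - p)  # item answered incorrectly: score stays at x
            new[x + 1] += fx * p      # item answered correctly: score moves to x + 1
        dist = new
    return dist

# Hypothetical probabilities for a three-item test at one ability level.
cond_dist = lord_wingersky([0.6, 0.7, 0.8])
```

For these three items the result is approximately 0.024, 0.188, 0.452, and 0.336 for scores 0 through 3. Each pass through the outer loop adds one item, so the work grows with the square of the test length rather than with the number of response patterns.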


Through this recursion procedure, the conditional observed score distribution for examinees of a given ability is obtained. Once the conditional observed score distribution is determined, it is multiplied by the ability density ($\psi(\theta)$) so that a joint distribution of observed score and ability is obtained. By either summing or integrating the joint distribution over all levels of ability, the observed marginal distribution ($f(x)$) is determined for each form. When the ability distribution is continuous,

$$f(x) = \int_{\theta} f(x \mid \theta)\,\psi(\theta)\,d\theta \tag{2-87}$$

where $\psi(\theta)$ is the distribution of $\theta$. When the ability distribution is discrete,

$$f(x) = \sum_{i} f(x \mid \theta_i)\,\psi(\theta_i) \tag{2-88}$$

Once the observed marginal distributions (i.e., $f(x)$ and $g(y)$) are determined, they are weighted using synthetic weights to obtain the distributions of X and Y in the synthetic population. Finally, conventional equipercentile methods are used to find score equivalents with the same percentile rank.

Summary of UIRT Equating Methods

UIRT observed score equating and UIRT true score equating each have advantages and disadvantages relative to the other. Compared with UIRT observed score equating, the UIRT true score equating method does not depend on the distribution of ability in the sample and is computationally easier. However, in practice, only observed scores are available, not true scores. According to Lord and


Wingersky (1984), if lower asymptote (pseudo-guessing) parameters are included in the IRT model, true score equivalents are undefined at very low scores and at the top number-correct score (Kolen & Brennan, 2004). UIRT observed score equating, in contrast, reflects the equating relationship for observed scores. Compared with UIRT true score equating, the disadvantages of UIRT observed score equating are its complex compound binomial computation and its sample dependence. When the two groups are randomly equivalent, UIRT observed score equating and traditional equipercentile equating would yield the same results, as long as the IRT model holds. Although these two equating methods differ theoretically, they produce very similar equating results in the NEAT design (Lord & Wingersky, 1984).

MIRT Equating

Similar to UIRT equating, in the MIRT framework an examinee's ability can be reported using several measures, such as the number-correct score and the IRT ability estimate. However, a new indeterminacy occurs when the MIRT ability estimate is used to report ability. As noted above, the estimated IRT ability (i.e., $\hat{\theta}$) is a scalar within the UIRT framework. That is, under UIRT, the estimated IRT ability on one test is symmetric to its corresponding estimated IRT ability on the other test. Once UIRT scale linking is conducted, the estimated IRT abilities from different test forms are on the same metric and can be compared directly. In the MIRT framework, the estimated IRT ability is no longer a scalar but a vector ($\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \ldots, \hat{\theta}_m)'$). Moreover, under UIRT, the relationship between the location of a person


on the ability scale and the probability of a correct response to the item is graphically represented as a point on the item characteristic curve. However, under MIRT, the relationship between the location of a person in the ability space and the probability of a correct response to the item is displayed as a contour on the item characteristic surface. For examinees whose ability vectors fall on the same contour of a MIRT item, the probability of answering this item correctly is the same. Different ability vectors falling on the same contour may end up with the same number-correct score. Accordingly, the ability vectors and their corresponding number-correct scores are no longer symmetric. Mathematically, ability vectors falling on the same contour are equivalent because they yield the same probability. However, the equivalence of these ability vectors cannot be identified directly until the same p value is obtained by substituting them into a specific MIRT model. Moreover, when two MIRT test forms are on a common metric after MIRT linking, the two test forms still have different sets of item parameters. Thus, after MIRT linking, the equivalence relationship between ability vectors from the two forms is still indirect. That is, in the MIRT framework, demonstrating the equivalence between two ability vectors that correspond to different test forms becomes more complex and indirect.

For example, in a two-dimensional compensatory logistic model, for a test item $i$ in the base form with parameters ($\mathbf{a}_i, d_i$) and an examinee with ability vector $\boldsymbol{\theta}_1$, the probability of answering this item correctly is equal to 0.5. Meanwhile, in the same MIRT model, for a test item $j$ in the equated form with parameters ($\mathbf{a}_j, d_j$) and an examinee with ability vector $\boldsymbol{\theta}_2$, the probability of answering this item correctly is also equal to 0.5. The mathematical expressions for the two ability vectors are shown as follows.

$\boldsymbol{\theta}_1$ on item $i$ in base form:

$$P_i(\boldsymbol{\theta}_1) = \frac{1}{1 + \exp\left[-\left(\mathbf{a}_i'\boldsymbol{\theta}_1 + d_i\right)\right]} = 0.5$$


$\boldsymbol{\theta}_2$ on item $j$ in equated form:

$$P_j(\boldsymbol{\theta}_2) = \frac{1}{1 + \exp\left[-\left(\mathbf{a}_j'\boldsymbol{\theta}_2 + d_j\right)\right]} = 0.5$$

The ability vector $\boldsymbol{\theta}_1$ on item $i$ in the base form and the ability vector $\boldsymbol{\theta}_2$ on item $j$ in the equated form are equivalent in that the probabilities that examinees with these ability vectors (i.e., $\boldsymbol{\theta}_1$, $\boldsymbol{\theta}_2$) answer these items (i.e., $i$, $j$) correctly are the same. This equivalence of two ability vectors on two items in different test forms cannot be identified until they are substituted into the MIRT model with the corresponding items. Thus, the equivalence relationship between these ability vectors from the two forms is still indirect. Substituting item parameters and ability vectors into the MIRT model so that the same p value is obtained is the linear combination procedure commonly used in multivariate analysis. In fact, the linear combination procedure devectorizes the vector, or multidimensional, features in the item parameters. This devectorization can also be treated as a process of unidimensionalization.

Likewise, at the test level, for each combination of ability levels (one for each dimension), the probabilities of correct responses to each item are summed to form true scores (i.e., $\tau$) on the test characteristic surface (TCS). For a particular true score, there are infinitely many combinations of ability levels associated with that true score. So, although Hirsch (1989) stated that test equating is available in the MIRT framework,


MIRT equating is not available if the IRT ability estimate vector is used as the ability measure. This is because multiple ability vectors could result in the same number-correct score, so that the symmetry property (Lord, 1980) of equating is no longer satisfied. So, if the IRT ability estimate is used as the ability measure, MIRT test equating for two different forms is not available. Even though many MIRT linking studies have called themselves equating studies, they are in fact still exploring MIRT scale linking.

In order to meet the test equating needs of educational practitioners, the only possible way to make MIRT equating available is to use the number-correct score or the scale score as the ability measure in MIRT. In fact, when the number-correct score is used as the ability measure, the MIRT ability vector is unidimensionalized so that the symmetry property (Lord, 1980) of equating is satisfied. Consequently, the multidimensional characteristics of the MIRT ability measure no longer remain. This process is called unidimensionalization. Therefore, unidimensionalization is the solution that makes MIRT equating possible. Theoretically, by using the number-correct score as the ability measure, we are able to conduct test equating under MIRT, but we may lose the multidimensional characteristics of the MIRT ability measure. Therefore, the questions of whether the number-correct score should be used as the ability measure in the MIRT framework and whether the existing MIRT equating procedures are appropriate need further investigation. This will be discussed later, after the MIRT equating methodology has been introduced.

Brossman's (2010) study was the first attempt to explore MIRT equating methodology. The study was conducted under the randomly equivalent group design.
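The non-uniqueness argument made earlier in this section, that different ability vectors can yield the same response probability, can be checked numerically. The item parameters and ability vectors below are hypothetical values chosen for illustration and are not the values of the original example; a two-dimensional compensatory logistic model with logit $\mathbf{a}'\boldsymbol{\theta} + d$ is assumed.

```python
import math

def prob(a, d, theta):
    """Two-dimensional compensatory logistic model with logit a'theta + d."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical item i on the base form and item j on the equated form.
a_i, d_i = (1.0, 0.5), 0.0
a_j, d_j = (0.8, 1.2), -0.4

# Two different ability vectors, one paired with each item.
theta_1 = (1.0, -2.0)
theta_2 = (0.5, 0.0)
p_1 = prob(a_i, d_i, theta_1)
p_2 = prob(a_j, d_j, theta_2)

# A second vector on the same 0.5 contour of item i.
theta_3 = (-0.5, 1.0)
p_3 = prob(a_i, d_i, theta_3)
```

All three probabilities equal 0.5 even though the ability vectors differ, which is why an ability vector on one form has no unique counterpart on the other.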


In Brossman's (2010) study, three MIRT equating procedures were discussed: two observed score procedures and one true score procedure. The first is the full MIRT observed score equating procedure. The other two MIRT equating methods apply the unidimensional approximation algorithm developed by Zhang and colleagues (Zhang, 1996; Zhang & Stout, 1999; Zhang & Wang, 1998): the unidimensional approximation of MIRT true score equating and the unidimensional approximation of MIRT observed score equating. Unidimensionalization is part of all three MIRT equating procedures developed by Brossman (2010). In the following section, the full MIRT observed score equating procedure is described. Following that, the unidimensional approximation (Zhang, 1996; Zhang & Stout, 1999; Zhang & Wang, 1998), which is closely related to the formation of MIRT true score equating, is described. Later, the unidimensional approximation of MIRT true score equating and the unidimensional approximation of MIRT observed score equating are presented.

Full MIRT Observed Score Equating (MOSE)

The full MIRT observed score equating method (MOSE) is a straightforward extension of UIRT observed score equating. Similar to the UIRT framework, the distribution of observed number-correct scores for examinees of a given ability combination is produced by the compound binomial distribution through a recursion formula (Lord & Wingersky, 1984). But, unlike in UIRT observed score equating, the conditional observed score distributions (i.e., $f(x \mid \boldsymbol{\theta})$) are determined at each combination of ability levels (i.e., each combination of grid points on $\theta_1, \theta_2, \ldots, \theta_m$) in the entire ability space (Kolen & Wang, 2007), where $\boldsymbol{\theta}$ is the ability combination


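vector (i.e., $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_m)'$). In the recursion formula, the single ability scalar $\theta_i$ for UIRT is replaced by the vector $\boldsymbol{\theta}$ of the combination of ability levels as follows:

$$f_r(x \mid \boldsymbol{\theta}) = \begin{cases} f_{r-1}(x \mid \boldsymbol{\theta})\,[1 - P_r(\boldsymbol{\theta})], & x = 0, \\ f_{r-1}(x \mid \boldsymbol{\theta})\,[1 - P_r(\boldsymbol{\theta})] + f_{r-1}(x - 1 \mid \boldsymbol{\theta})\,P_r(\boldsymbol{\theta}), & 0 < x < r, \\ f_{r-1}(x - 1 \mid \boldsymbol{\theta})\,P_r(\boldsymbol{\theta}), & x = r. \end{cases} \tag{2-89}$$

Next, similar to the UIRT observed score equating method, these conditional observed score distributions (i.e., $f(x \mid \boldsymbol{\theta})$) are multiplied by the multivariate ability density ($\psi(\boldsymbol{\theta})$) to obtain the joint distribution of observed score and ability for each test form. Finally, the observed marginal distribution ($f(x)$) is determined for each form by either multivariate summation or multiple integration of the joint distribution over every ability combination in the ability space (i.e., $\theta_1, \theta_2, \ldots, \theta_m$). The mathematical expression is

$$f(x) = \int_{\theta_1}\!\cdots\!\int_{\theta_m} f(x \mid \boldsymbol{\theta})\,\psi(\boldsymbol{\theta})\,d\theta_1 \cdots d\theta_m \tag{2-90}$$

or

$$f(x) = \sum_{\theta_1}\cdots\sum_{\theta_m} f(x \mid \boldsymbol{\theta})\,\psi(\boldsymbol{\theta}) \tag{2-91}$$

where $m$ is defined as the number of dimensions. Note that the multivariate summation or multiple integration used to obtain the marginal distribution of observed scores in the final step performs the unidimensionalization for MIRT equating.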
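The discrete version of the marginal distribution can be sketched for two dimensions. This is an illustration under stated assumptions rather than the procedure's actual implementation: it assumes a compensatory logistic model with logit $\mathbf{a}'\boldsymbol{\theta} + d$, a standard bivariate normal ability density with correlation `rho` evaluated on a coarse rectangular grid and renormalized, and hypothetical item parameters.

```python
import math
from itertools import product

def prob(a, d, theta):
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def lord_wingersky(probs):
    """Conditional number-correct distribution at one ability combination."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, fx in enumerate(dist):
            new[x] += fx * (1.0 - p)
            new[x + 1] += fx * p
        dist = new
    return dist

def marginal_distribution(items, points, rho):
    """Sum f(x | theta) * psi(theta) over a two-dimensional grid of ability points."""
    marginal = [0.0] * (len(items) + 1)
    total_weight = 0.0
    for t1, t2 in product(points, repeat=2):
        # Unnormalized standard bivariate normal density with correlation rho.
        w = math.exp(-(t1 * t1 - 2.0 * rho * t1 * t2 + t2 * t2) / (2.0 * (1.0 - rho * rho)))
        cond = lord_wingersky([prob(a, d, (t1, t2)) for a, d in items])
        for x, fx in enumerate(cond):
            marginal[x] += w * fx
        total_weight += w
    return [v / total_weight for v in marginal]  # renormalize the grid weights

# Hypothetical four-item form and a 9 x 9 grid from -4 to 4.
items = [((1.0, 0.4), -0.2), ((0.5, 1.2), 0.3), ((0.8, 0.8), 0.0), ((1.3, 0.2), -0.5)]
points = [-4.0 + k for k in range(9)]
f_x = marginal_distribution(items, points, rho=0.5)
```

The nested grid is where the cost of the full MIRT approach shows: with $Q$ points per dimension the recursion is run $Q^m$ times, which motivates the unidimensional approximation discussed next.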


Similar to UIRT observed score equating, after these transformations the traditional equipercentile method is applied to equate the two test forms.

Unidimensional Approximation

Note that the unidimensional approximation of MIRT true score equating procedure and the unidimensional approximation of MIRT observed score equating procedure both use the unidimensional approximation algorithm (Zhang & Stout, 1999) as the foundation of their test equating procedures. The details of this unidimensional approximation are presented as follows.

First, a generalized multidimensional compensatory model is defined as

$$P_i(\boldsymbol{\theta}) = H_i\left(\mathbf{a}_i'\boldsymbol{\theta} - b_i\right) \tag{2-91}$$

where the elements of $\mathbf{a}_i$ are nonnegative and not all zero, and $H_i$ is any monotonically non-decreasing function with $0 \le H_i(x) \le 1$ for all $x$ that is not identically zero as $x$ varies (Zhang & Stout, 1999a, p. 133). For this family of generalized multidimensional models, $\mathbf{a}_i$ is the discrimination parameter vector, $b_i$ is an index related to the difficulty parameter, and $H_i$ is a link function (Zhang, 1996; Zhang & Stout, 1999; Zhang & Wang, 1998).

There are two commonly used generalized multidimensional compensatory models: the multidimensional compensatory three-parameter logistic model (M3PL) and the multidimensional compensatory three-parameter normal ogive model (M3NO). In practice, because of its easier computation, the logistic model has been used more often. Thus, the M3PL model can be written as:


$$P_i(\boldsymbol{\theta}) = c_i + (1 - c_i)\,\frac{\exp\left(\mathbf{a}_i'\boldsymbol{\theta} - b_i\right)}{1 + \exp\left(\mathbf{a}_i'\boldsymbol{\theta} - b_i\right)} \tag{2-92}$$

with corresponding link function

$$H_i(x) = c_i + (1 - c_i)\,\frac{\exp(x)}{1 + \exp(x)} \tag{2-93}$$

In previous MIRT research, the ideas of the reference composite and the direction of best measurement have been developed (Zhang, 1996; Zhang & Stout, 1999; Zhang & Wang, 1998). Building on these ideas, Zhang and Stout claimed that any set of item responses adequately modeled by a multidimensional IRT model can be closely approximated by a unidimensional IRT model with an estimated unidimensional ability composite ($\theta_\alpha$) and estimated unidimensional item parameters (Zhang & Stout, 1999). The ability composite of the multidimensional ability vector (i.e., $\boldsymbol{\theta}$) is defined as a standardized linear combination of $\boldsymbol{\theta}$. That is,

$$\theta_\alpha = \boldsymbol{\alpha}'\boldsymbol{\theta} \tag{2-94}$$

where $\boldsymbol{\alpha}$, which is determined from the multidimensional item discrimination estimate matrix, is defined as the direction of the composite, or the unidimensional approximation of the multidimensional ability vector (i.e., $\boldsymbol{\theta}$), and $\operatorname{Var}(\theta_\alpha) = 1$ is constrained for scale specification. Additionally, the sum of the direction of the composite is also defined as 1 (i.e., $\sum_{k}\alpha_k = 1$). This approximation holds for any generalized multidimensional compensatory model. So, the directions for the standardized linear composite are estimated as

(2-95)

Equation 2-95 involves the complete multidimensional ability estimate vector, the score weight, a positive constant that ensures that the direction of the composite sums to 1, and the expectation operator. Under the assumption that all terms in Equation 2-96 are equally weighted, the formula for the direction of the linear composite simplifies to Equation 2-97, which also involves the total number of items on the test. Each term associated with the test direction is completely determined by the product of the score weight and the item model (theoretical) weight, which is mainly determined by the derivative of the link function and the item discrimination parameter vector (Zhang, 1996, p. 25). The item parameters of the UIRT approximation for the MIRT model can then be obtained as follows:

UIRT approximation discrimination: (2-98)
UIRT approximation index of difficulty: (2-99)
UIRT approximation item difficulty, obtained from the two quantities above: (2-100)
UIRT approximation lower asymptote parameter: (2-101)
Variance of the directions for the standardized linear composite: (2-102)

The true score of this unidimensional approximation model for the linear composite $\theta_{\alpha}$ is denoted $\tau(\theta_{\alpha})$. It is the sum, over all items, of the probabilities of a correct response at each composite ability level, and can be expressed as

$$\tau(\theta_{\alpha}) = \sum_{i=1}^{n} P_i(\theta_{\alpha}), \tag{2-103}$$

where $n$ is the number of items. This expression preserves the property of unidimensional IRT true scores that the function is monotonically increasing (e.g., Zhang et al., 1999). The direction of the composite is defined as the direction corresponding to the average multidimensional information function evaluated in all directions, so that the unidimensional composite that best represents this number-correct score can be determined.
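The composite true score just described can be sketched in a few lines. This is only an illustration under stated assumptions: the items follow a unidimensional three-parameter logistic form with scaling constant 1.7, and the item parameter values are hypothetical placeholders rather than values from this study.

```python
import math

D = 1.7  # scaling constant (assumption: common logistic parameterization)

def p_3pl(theta, a, b, c):
    """Probability of a correct response for one approximated unidimensional item."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def true_score(theta, items):
    """Composite true score: sum of item response probabilities at theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

# Hypothetical approximated item parameters (a, b, c), for illustration only.
items = [(0.9, -1.0, 0.0), (1.1, -0.3, 0.0), (1.3, 0.4, 0.2), (0.8, 1.2, 0.1)]

# Evaluate the true score on a grid of composite ability values.
grid = [-4.0 + 0.5 * k for k in range(17)]
scores = [true_score(t, items) for t in grid]
```

Because every item probability increases with the composite ability, the values in `scores` increase along the grid and stay between the sum of the lower asymptotes and the number of items.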

When the direction of best measurement is used as the unidimensional ability scale, the complete multidimensional ability vector $\boldsymbol{\theta}$ is assumed to have a standardized multivariate normal distribution whose covariance (correlation) matrix is positive definite with unit diagonal elements. Under this assumption, the unidimensional (marginal) item parameters corresponding to the approximated unidimensional ability scale are determined. Thus, the coefficients for the direction of the linear composite provide a unidimensional ability estimate that can be viewed conceptually as a unidimensional approximation of the multidimensional combination of proficiencies that the test measures most precisely (Brossman, 2010). Note that the unidimensional approximation is a unidimensionalization procedure and can be applied either at the MIRT linking step or at the MIRT equating step after MIRT linking.

Unidimensional Approximation of MIRT True Score Equating (ATSE)

After the unidimensional approximations are obtained, the UIRT true score equating procedure is used to equate the composite true scores on the two multidimensional test forms. For a given composite ability, the true score on the base form and the true score on the equated form are considered equivalent (Equation 2-104). Throughout the iterative procedure (i.e., the Newton-Raphson method), the function in Equation 2-105 is minimized.

Finally, using the IRT definition of true score, the composite true score on the base form associated with the composite true score on the equated form can be computed as in Equation 2-106.

Unidimensional Approximation of MIRT Observed Score Equating (AOSE)

The procedure for the unidimensional approximation of MIRT observed score equating is the same as that for UIRT observed score equating. First, the unidimensional item parameters and ability estimates are obtained through the unidimensional approximation procedure. Then, the conditional distribution of the observed score is determined at each composite ability level ($\theta_{\alpha}$) through the compound binomial recursion formula (Lord & Wingersky, 1984). Next, the marginal distribution of the observed score is computed by multiplying the conditional distributions by the estimated unidimensional ability distribution in the population of examinees and accumulating across the estimated unidimensional ability space:

$$f(x) = \int f(x \mid \theta_{\alpha})\, \psi(\theta_{\alpha})\, d\theta_{\alpha}, \tag{2-107}$$

or, in discrete form over quadrature points $\theta_{\alpha q}$,

$$f(x) = \sum_{q} f(x \mid \theta_{\alpha q})\, \psi(\theta_{\alpha q}). \tag{2-108}$$

Finally, the conventional equipercentile procedure is applied to equate the test scores on the two forms.
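The recursion and the discrete marginalization described above can be sketched as follows. The sketch assumes a three-parameter logistic item form with scaling constant 1.7, equally spaced quadrature nodes with normalized normal-density weights, and hypothetical item parameters; none of these specifics are taken from this study.

```python
import math

D = 1.7  # scaling constant (assumption)

def p_3pl(theta, a, b, c):
    """Item response probability for one approximated unidimensional item."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def lord_wingersky(probs):
    """Number-correct score distribution given per-item success probabilities."""
    dist = [1.0]  # with zero items, a score of 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, fx in enumerate(dist):
            new[x] += fx * (1.0 - p)   # item answered incorrectly
            new[x + 1] += fx * p       # item answered correctly
        dist = new
    return dist

def marginal_distribution(items, nodes, weights):
    """Weighted sum of conditional score distributions over quadrature nodes."""
    marg = [0.0] * (len(items) + 1)
    for theta, w in zip(nodes, weights):
        cond = lord_wingersky([p_3pl(theta, a, b, c) for a, b, c in items])
        for x, fx in enumerate(cond):
            marg[x] += w * fx
    return marg

# Hypothetical items and a discretized standard normal ability distribution.
items = [(0.9, -1.0, 0.0), (1.1, -0.3, 0.0), (1.3, 0.4, 0.2), (0.8, 1.2, 0.1)]
nodes = [-4.0 + 0.2 * k for k in range(41)]
dens = [math.exp(-0.5 * t * t) for t in nodes]
weights = [v / sum(dens) for v in dens]
marginal = marginal_distribution(items, nodes, weights)
```

The resulting `marginal` list holds one probability per possible number-correct score; computing it for each form gives the two distributions that the equipercentile step then equates.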

CHAPTER 3
METHODOLOGY

The main purpose of this study is to investigate the performance of the MIRT equating procedures under the NEAT design. Five MIRT linking methods and three MIRT equating methods are included, and the study evaluates how the interactions between MIRT linking methods and MIRT equating procedures affect the equating results under various testing conditions. In this chapter, the simulation design and experimental conditions are summarized together with the data generation. Then the details of IRT estimation, IRT linking, and IRT equating are described. IRT estimation is used to obtain the item parameter and ability estimates from the MIRT data; IRT linking (five MIRT linking methods) is used to transform the parameter scales under the NEAT design; and IRT equating (three MIRT equating methods) is used to obtain equivalent scores on the different test forms. Some special features and technical details of MIRT linking and MIRT equating are also introduced, including the available minimization algorithms, the quadrature nodes chosen for the discrete approximation of the integration, and the combination of quadrature nodes and quadrature weights. Finally, the evaluation criteria and data analysis procedure are presented.

Simulation Design

A simulation design was used in this study because it allows systematic error to be separated from random error (Min, 2003). Kolen and Brennan (2004) claimed that a major consideration in designing and conducting equating is to minimize equating error. Because the population parameters are known in a simulation study, the parameter estimates can be compared with the population parameters, so that systematic error can be separated from random error.

The simulation for this study was conducted in four steps: item response generation, IRT calibration, test linking, and test equating.

Model Used to Generate Data

As mentioned in Chapter 2, there are two types of MIRT models: compensatory and partially compensatory (or non-compensatory) models. Because estimation with the compensatory models is relatively easy, most research and applications on MIRT linking and equating have used the compensatory models (e.g., Davey et al., 1996; Hirsch, 1989; Thompson et al., 1997). The MIRT models used in previous MIRT linking/equating research are summarized in Table A-1. Among all MIRT linking/equating articles, ten out of thirteen used the compensatory two-parameter two-dimension logistic model (M2PL) (e.g., Li, 1997; Min, 2003). Only three studies (e.g., Davey et al., 1996; Simon, 2008; Yao, 2008) used a different MIRT model (e.g., the compensatory three-parameter two-dimension logistic model, M3PL). Since the lower asymptote parameters are not directly related to the MIRT linking process, they were not considered in this simulation study for the sake of simplicity. Therefore, the M2PL model was selected as the MIRT model for data simulation. Because TESTFACT (Bock, Gibbons, Schilling, Muraki, Wilson, & Wood, 2003) was used for item calibration in this study and TESTFACT applies the normal ogive model in its estimation process, the scale constant $D$ was added to the M2PL model used to generate the data, so that the item parameters could be compared directly.

Test Length

The length of the total test is an important issue in test linking/equating research. According to the previous linking and equating literature, a test form must be sufficiently long that the purpose of the test and the similarity requirement for the linking test form and alternate forms are satisfied (Kolen & Brennan, 2004).
According to Kolen and Brennan (2004), if other requirements are fulfilled, the rule of thumb is that a test should contain at least 30 to 40 items. Since this study used simulation, the factors that influence test length (i.e., the test content specification, the test structure features, the test statistical specification, etc.) were well controlled. Thus, the length of the total test was set to 40.

Anchor Test Length

The length of the anchor test is also an important factor in ensuring successful equating. There is no absolute agreement about the length of the common/anchor test in educational measurement. Because the characteristics of the set of anchor test items (i.e., the test content specification, the test statistical specification, item parameter features, etc.) vary, the length of the anchor test varies accordingly so that test equating can be conducted successfully. In general, the longer the common/anchor test, the more accurate the equating result (Bastari, 2000). Based on a systematic literature review on linking and equating (Kolen & Brennan, 2004), one rule of thumb is that the number of common items should be at least 20% of the length of a total test containing forty or more items for a unidimensional test (Angoff, 1971; Brennan, 1987; Kolen & Brennan, 2004). Theoretically, because of its multiple dimensions, MIRT linking needs more common/anchor items than UIRT linking does. In the previous MIRT linking/equating literature, the number of common/anchor items used in most studies ranged from 15 to 40. Davey et al. (1996) used 40 common/anchor items. Li (1997) used 15 and 25 common/anchor items in his study. Oshima et al. (2000) used 40 items in the common/anchor test. Min (2003, 2007) set 20 common/anchor items for his MIRT linking study comparing three multidimensional linking methods. Yon (1996) chose 30 items as the common/anchor test in the study.
Simon (2008) and Yao (2008) each included 20 items in the common/anchor test in their studies. Wei (2008) used 20 and 40 common/anchor items in his study. Previous research found that the number of items had a significant influence on the stability of the linking transformation parameter estimates for multidimensional linking (Li, 1997; Oshima et al., 2000; Simon, 2008; Wei, 2008). The numbers of common/anchor items used in previous MIRT linking/equating research are summarized in Table A-2. Therefore, in this study 20 of the 40 test items (i.e., 50% of the total items) formed the common/anchor test section. The items for the two test forms are thus divided into three test sections: the base form unique item section, the equated form unique item section, and the anchor item section. Each item section contains 20 test items.

Test Structure and MIRT Item Parameter

The concept of test structure is essential in the application of IRT. In UIRT, test structure refers to the distribution of item difficulty and item discrimination over the items in the test. Specifically, the UIRT test structure reflects the test difficulty and the strength pattern of the relationship between the item response and ability. It indicates the sensitivity of the test, that is, how examinees at different ability levels respond to the test items. This sensitivity feature of the test structure is demonstrated through the combination and pattern of the item difficulty and item discrimination for all items in the test. In MIRT, the test structure has more features than in UIRT. MIRT test structure refers to the distribution of three MIRT item characteristics over the items in the test: the multidimensional discrimination (MDISC), the multidimensional difficulty (MDIFF), and the direction of best measurement of an item (Ackerman, 1994; Reckase, 1985, 2004; Reckase & McKinley, 1991). Note that the definition of the MIRT test structure is distinct from the idea of the dimensionality of the test response data. Reckase (2004) indicated that dimensionality is a property of the response data, not the test. The test structure is the combination of the item parameters on each dimension of the multidimensional space for the test.
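The three item characteristics named above can be illustrated with a short sketch. It assumes the usual compensatory parameterization with a discrimination vector and an intercept `d`, and the standard definitions of multidimensional discrimination, multidimensional difficulty, and direction angles; the function names and the example item are hypothetical.

```python
import math

def mdisc(a):
    """Multidimensional discrimination: length of the discrimination vector."""
    return math.sqrt(sum(ak * ak for ak in a))

def mdiff(a, d):
    """Multidimensional difficulty: signed distance from the origin (assumes intercept d)."""
    return -d / mdisc(a)

def direction_angles_deg(a):
    """Angle (degrees) between the item vector and each coordinate axis."""
    length = mdisc(a)
    return [math.degrees(math.acos(min(1.0, ak / length))) for ak in a]

# Hypothetical two-dimensional item, for illustration only.
a_item, d_item = (0.6, 0.8), -0.5
summary = (mdisc(a_item), mdiff(a_item, d_item), direction_angles_deg(a_item))
```

For a two-dimensional item with nonnegative discriminations, the two direction angles sum to 90 degrees, so a single angle with the first axis is enough to describe the direction of best measurement.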

According to the literature review by Tate (2003), there are three types of test structure for MIRT: simple structure (SS), approximate simple structure (APSS), and complex structure (CS). The concept of simple structure was first given by Thurstone (1947):

    The combination of a test configuration and the coordinate axes is called a structure. The coordinate axes determine the co-ordinate planes. If each test vector is on one or more of the co-ordinate planes, then the combination of the configuration and the coordinate axes is called simple structure. The corresponding factor pattern will then have one or more zero entries in each row. (p. 181)

In a more restrictive definition given by Reckase (2004), SS allows only one nonzero discrimination parameter across the dimensions for each item. In SS, all item vectors are exactly aligned with one of the coordinate axes in the multidimensional space. SS is an idealized test structure and rarely occurs in practice. Compared with SS, approximate simple structure (APSS) is a more complex test structure. In APSS, each item has a relatively higher loading on one dimension than on the other dimensions. In other words, a cluster of items has high discrimination on the same dimension and low discrimination on the other dimensions (Min, 2003). In general, the items in APSS have one large discrimination parameter, and the other discrimination parameters are near zero. Complex structure (CS) is the most complex of the three test structures. In CS, test items measure a composite of dimensions (Kim, 1994; Roussos, Stout, & Marden, 1998), and each item has relatively similar loadings across dimensions. In general, for a two-dimensional test, the items in CS have discrimination parameters of comparable size on the two dimensions.
In the real world, CS is more common than simple structure and APSS. In agreement with the test structures selected in the previous MIRT linking research shown in Table A-3 (Li, 1997; Li & Lissitz, 2000; Min, 2003, 2007; Simon, 2008; Wei, 2008; Yao, 2008), approximate simple structure (APSS) and complex structure (CS) were applied in this study.

Generating Item Parameters for Simulation

To define the item parameters for this simulation study, the magnitudes of the MIRT item discrimination and item difficulty must be taken into account. According to the previous literature, there are two different approaches to defining item parameters for MIRT simulation (Min, 2003; Wei, 2008). On one hand, based on previous empirical studies (Reckase, 1985, 1997; Reckase & McKinley, 1991), Wei (2008) randomly drew the discrimination parameters from a lognormal distribution with a mean of 1.37 and a standard deviation of 0.54. Similarly, the difficulty parameters for the test items were randomly drawn from a normal distribution with a mean of 0.28 and a standard deviation of 0.69. On the other hand, Min (2003) defined item parameters by fixing a set of values for the discrimination and difficulty magnitudes, shown in Table A-4. These values were obtained from Roussos et al.; the mean of the discrimination values is 1.2, and the mean of the difficulty values is zero. Note that neither of these magnitudes by itself determines the sensitivity or the test structure. Once the magnitudes of discrimination and difficulty for each test item are fixed, the test structure is directly specified by the direction of best measurement for each item, or, in other words, by the angle of the item discrimination vector in the multidimensional space. Specifically, for a two-dimensional test, the test structure is defined by manipulating the angle of each MIRT item with the first coordinate axis of the multidimensional space.

The direction of best measurement was also specified differently in these two simulation studies. Note that a two-dimensional coordinate system is used in both studies. In Wei's (2008) study, APSS was constructed by randomly selecting the angle between the item composite and its dominant dimension from a lognormal distribution with a specified mean and standard deviation. For CS, the angle between each item and the two dimensions was randomly selected from a normal distribution with a specified mean and standard deviation. In Min's (2003) study, the test structure was constructed by selecting the item angles within given ranges. To construct APSS, in which one set of items loaded more strongly on the first dimension and the other set on the second dimension, one angle range was used for the first set of items and another for the second set. To construct CS, in which the two sets of test items can load strongly on both dimensions, two angle ranges were used for each set of test items.

Although both approaches to defining item parameters for MIRT simulation (Min, 2003; Wei, 2008) fulfill the needs of item parameter definition, the solution proposed by Min (2003) is simpler and can be displayed easily in a graphical representation. Therefore, for simplicity, a modification of the procedure used by Min (2003) was applied in this study. In addition, all simulated population parameters were rounded to three decimal places. First, the magnitudes of discrimination and difficulty were defined by assigning fixed values to them. These values are presented in Table A-5. The mean of the discrimination values is 1.1, and the mean of the difficulty values is zero. This pattern of values was repeated two times to cover all twenty items in the base form unique section, the equated form unique section, and the anchor section, respectively.

For the test structure and the specific item discrimination parameter on each dimension, the direction of best measurement was first defined for each item. For APSS, the 20 test items were divided into two clusters of 10 items each. The items in the first cluster loaded mainly on the first dimension, and the items in the second cluster loaded mainly on the second dimension. For CS, the 20 items were divided into four clusters of five items each. Two of these clusters loaded highly on one of the two dimensions, and the remaining two clusters were sensitive to composites of the two dimensions. The geometric representations used to select the direction of best measurement for each item in these two multidimensional test structures are shown in Figure 3-1 and Figure 3-2.

Figure 3-1. Approximate Simple Structure (APSS) (Modified from Min, 2003)

Figure 3-2. Complex Structure (CS) (Modified from Min, 2003)

Once the direction of best measurement for each item was selected, the item discrimination parameters on each dimension and the MIRT equivalent of the UIRT difficulty were computed, similar to Wei (2008), using Equations 3-1 through 3-3.

Note that the procedure to simulate the item parameters for approximate simple structure and complex structure was repeated three times, once for each of the three item sections in the test: the base form unique item section, the equated form unique item section, and the anchor item section. The approximate simple structure and complex structure item parameters for the base form unique item section, the equated form unique item section, and the anchor item section are presented in Table A-6, Table A-7, Table A-8, Table A-9, Table A-10, and Table A-11, respectively. The graphical representations of the two multidimensional test structures used in this study are depicted in Figure 3-3 (APSS) and Figure 3-4 (CS) for the base form unique item section, Figure 3-5 (APSS) and Figure 3-6 (CS) for the equated form unique item section, and Figure 3-7 (APSS) and Figure 3-8 (CS) for the anchor item section.

Figure 3-3. Item vectors: APSS of base form unique item section
Figure 3-4. Item vectors: CS of base form unique item section

Figure 3-5. Item vectors: APSS of equated form unique item section
Figure 3-6. Item vectors: CS of equated form unique item section

Figure 3-7. Item vectors: APSS of anchor item section
Figure 3-8. Item vectors: CS of anchor item section
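The step from a fixed discrimination magnitude, a fixed difficulty magnitude, and a chosen direction of best measurement to two-dimensional item parameters can be sketched as below. The relations used (discrimination magnitude times the direction cosines, and intercept equal to minus difficulty times discrimination) are the standard ones for compensatory models and are an assumption here, since the study's own formulas are not reproduced in this text; the angles and magnitudes are hypothetical.

```python
import math

def item_from_direction(mdisc, mdiff, angle_deg):
    """Return (a1, a2, d) for a two-dimensional item, rounded to three decimals.

    angle_deg is the angle between the item vector and the first axis.
    Assumed relations: a1 = MDISC*cos(angle), a2 = MDISC*sin(angle),
    d = -MDIFF*MDISC.
    """
    angle = math.radians(angle_deg)
    a1 = mdisc * math.cos(angle)
    a2 = mdisc * math.sin(angle)
    d = -mdiff * mdisc
    return round(a1, 3), round(a2, 3), round(d, 3)

# Hypothetical cluster angles: items near the first axis, near the second
# axis, and along a composite of the two dimensions.
near_dim1 = item_from_direction(1.1, 0.0, 15.0)
near_dim2 = item_from_direction(1.1, 0.0, 75.0)
composite = item_from_direction(1.1, 0.0, 45.0)
```

Small angles yield items that load mainly on the first dimension, angles near 90 degrees yield items that load mainly on the second, and intermediate angles yield items sensitive to a composite of the two.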

Sample Size

Previous research suggested that, in general, 2000 or more examinees are needed to ensure accurate recovery of parameter estimates from MIRT calibration (Ackerman, 1994; Reckase, 1995). Yao and Boughton (2007) also found that a sample size of 3,000 was needed for accurate and stable parameter estimation with MIRT polytomous responses. The performance of MIRT linking and equating is also influenced by sample size (Li, 1997; Min, 2003; Simon, 2008; Wei, 2008). Since more parameters need to be estimated in the MIRT model, a larger sample size is required for MIRT calibration than for UIRT calibration. Reckase (1997) compared the MIRT software programs NOHARM (Fraser & McDonald, 1988) and TESTFACT (Wilson, Wood, & Gibbons, 1987) and claimed that both programs produced comparatively stable parameter estimates when sample sizes were 1000 or more. Tate (2003) also found that most of the multidimensional computer programs performed well when the sample size exceeded 2000. In previous MIRT linking/equating research, most studies used sample sizes larger than 2000, except for studies in which the effect of sample size on MIRT calibration and MIRT linking was investigated (Min, 2003, 2007; Simon, 2008; Wei, 2008). The sample sizes used in the past MIRT linking literature are shown in Table A-12. The purpose of this study is to evaluate the stability of MIRT equating results under ideal conditions. Thus, a sample size of 2000 was used in this study. In a typical equating situation, there are two groups of examinees: the base group and the equated group. In the NEAT design, each group takes one form, and both groups take the anchor items. These test forms are the base form and the equated form. The sample size for each group was set to 2000.

Number of Replications

Even though there is no rule of thumb for the number of replications needed for reliable results in a simulation study, previous MIRT linking and equating studies (Li, 1997; Li & Lissitz, 2000; Min, 2003, 2007; Oshima, Davey, & Lee, 2000; Simon, 2008; Wei, 2008; Yao, 2008; Yon, 2006) provide a guideline for setting the number of replications used in this study. The numbers of replications used in the past MIRT linking literature are shown in Table A-13. Specifically, Li (1997) replicated conditions 100 times in his simulation study; Oshima et al. (2000) repeated their conditions 20 times; Min (2003, 2007) chose 25 as the number of replications; and Yao (2008) and Wei (2008) each chose 500 as the number of replications in their studies. The average number of replications in previous research was 200; accordingly, 200 replications were used in this simulation study.

Ability Distribution Design

When data are collected in a NEAT design, it is common for the ability distributions to vary across the groups. Moreover, the similarity of the ability distributions of the simulated populations that take the two tests has a significant impact on MIRT test equating and linking (Li, 1997; Min, 2003; Simon, 2008; Wei, 2008). Note that the characteristics of the ability distributions of the groups also affect the requirements for other factors in test equating, such as the number of common items (Cook & Petersen, 1987). Although the population ability distributions were assumed to be multivariate normal, the means, variances, and covariances of the ability distributions vary across groups. According to the MIRT linking theoretical framework, mean and variance differences can be adjusted through translation, dilation/contraction, and rotation. However, between-population differences in the covariance among dimensions may cause more problems for MIRT linking than mean and variance differences between groups.
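Drawing two-dimensional abilities for a group with a given mean, variance, and correlation can be sketched as follows. This is a minimal illustration using only the Python standard library and a two-dimensional Cholesky-style construction; the group settings shown are hypothetical and are not the conditions used in this study.

```python
import math
import random

def draw_abilities(n, mean, sd, rho, seed=0):
    """Draw n bivariate normal ability vectors with correlation rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mix the two independent draws so the second dimension correlates rho with the first.
        t1 = mean[0] + sd[0] * z1
        t2 = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        out.append((t1, t2))
    return out

def sample_corr(pairs):
    """Pearson correlation between the two ability dimensions."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    s11 = sum((p[0] - m1) ** 2 for p in pairs)
    s22 = sum((p[1] - m2) ** 2 for p in pairs)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs)
    return s12 / math.sqrt(s11 * s22)

# Hypothetical groups: an uncorrelated standard group and a shifted, correlated group.
group_a = draw_abilities(2000, (0.0, 0.0), (1.0, 1.0), 0.0, seed=1)
group_b = draw_abilities(2000, (0.5, 0.5), (1.0, 1.0), 0.5, seed=2)
```

Varying `mean`, `sd`, and `rho` for the second group while holding the first group fixed is one way to set up group differences of the kind discussed in this section.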

Current MIRT calibration programs (i.e., TESTFACT, NOHARM) assume that the ability dimensions are orthogonal, so the correlations among the dimensions are constrained to be 0.0 in the MIRT calibration. When the ability dimensions are in fact correlated but the correlations among the dimensions are constrained to be 0.0, the observed correlations among the item scores must be accounted for solely by the $a$ parameters (Reckase, 1997). Thus, if correlation among the dimensions exists, using current MIRT calibration programs may introduce errors into the MIRT calibration. In addition, the dimensional orientation of the MIRT parameter estimates becomes biased when the ability dimensions are correlated. Nevertheless, because most constructs, and dimensions within a construct, are correlated in education and psychology, the correlated-dimension situation cannot be ignored. Oshima et al. (2000) investigated six conditions defined by the ability distributions of two groups. Min (2003) used four conditions in his study. Wei (2008) examined four MIRT linking methods under four conditions defined by the ability distributions of two groups; these conditions were similar to those investigated by Oshima et al. (2000), but unlike in Min (2003) and Oshima et al. (2000), the ability dimensions were correlated in the base group in Wei (2008). The correlations among ability dimensions were set to .5 in Min (2003) and Wei (2008). In Oshima et al. (2000), the correlations among ability dimensions ranged from .5 to .7. Yao and Boughton (2007) used correlations among dimensions as high as 0.9. Simon (2008) used a correlation of 0.8 as the highest correlation condition in her study. Correlation among dimensions may cause estimation errors and a biased dimensional orientation of the MIRT parameter estimates during MIRT calibration. If these two problems occur, they arise before the MIRT linking and equating operations.
Conceptually, equating error can be separated into two main elements. One is the error in the parameter estimates produced during the MIRT estimation process. The other is the error produced by inaccurate linking or equating transformations. The former usually intertwines with the latter, and the two cannot be separated in practice (Li, 1997; Min, 2003). Thus, although generating correlated abilities is appropriate because the dimensions of multidimensional constructs are typically correlated in education and psychology, correlated abilities may have a negative impact on the performance of MIRT linking. In all conditions, the ability dimensions were uncorrelated in the base group. Six conditions were created by varying the means, variances, and correlations between the ability dimensions for the equated group: (1) no difference between the base and equated groups (the null condition), (2) differences in variances, (3) differences in means, (4) differences in correlation, (5) differences in means and correlations, and (6) differences in means, variances, and correlations. Note that only three conditions for the equated group have correlations among dimensions (conditions 4, 5, and 6). The details of the population design are shown in Table A-14.

Condition Summary

In this study, five MIRT linking methods were investigated: the OD method, the TCF method, and the ICF method from Oshima et al. (2000), the M method from Min (2003), and the NOP method from Reckase and Martineau (2004). Three MIRT equating procedures proposed by Brossman (2010) were examined: the full MIRT observed score equating (MOSE) method, the ATSE method, and the AOSE method. In summary, the design for the data generation was a two-factor completely crossed design: 6 (ability distributions) × 2 (test structures), for a total of 12 data generation conditions. In addition, there were 15 MIRT linking/equating methods: 5 (MIRT linking methods) × 3 (MIRT equating methods). Therefore, there were 180 combinations of data generation and linking/equating conditions in this study. Table A-15 lists all twelve simulation conditions.

Data Generation

The item response data were generated using the statistical language R 2.14 (R Development Core Team, 2007). The R package mvtnorm (Genz, Bretz, & Hothorn, 2010) was used to generate ability scores. The compensatory two-parameter two-dimension logistic model (M2PL) was used to generate the item responses. Response data were generated 200 times from a set of population item parameters (40 items) and population ability parameters (4000 simulees) for each condition. Note that all programs used to generate MIRT item responses were written in R.

MIRT Parameter Estimation

Selection of IRT Software Program

According to past literature, the parameters of MIRT models can be estimated through a variety of estimation methods, including the unweighted least squares (ULS) procedure, the weighted least squares (WLS) procedure, the marginal maximum likelihood (MML) method (Baker, 1992; Bock & Aitkin, 1981), and the Markov chain Monte Carlo (MCMC) method. A number of computer programs have been developed for estimating the parameters of MIRT models. To date, computer programs available for MIRT estimation include TESTFACT (Bock et al., 2003), NOHARM (Fraser, 1998), Mplus (Muthen & Muthen, 1998), ConQuest (Wu, Adams, & Wilson, 1997), and BMIRT (Yao, 2003). Among these, NOHARM and TESTFACT are still the most popular computer programs for MIRT estimation.

TESTFACT (Bock et al., 2003) estimates the item and person parameters of the multidimensional extension of the two-parameter normal ogive model. Although TESTFACT does not estimate the lower asymptote parameters of the items, the lower asymptote parameters can be set so that the item and person parameters are estimated from the multidimensional extension of the three-parameter normal ogive model. TESTFACT uses marginal maximum likelihood (Bock & Aitkin, 1981) to estimate the MIRT model item parameters, which are obtained with the expectation-maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977). TESTFACT assumes a standard multivariate normal distribution of coordinates with zero intercorrelations.

NOHARM (Normal Ogive Harmonic Analysis Robust Method) uses a quite different procedure from that used by TESTFACT to estimate the parameters of MIRT models. Unlike TESTFACT, NOHARM (Fraser, 1998) uses the unweighted least squares method for MIRT estimation. In NOHARM, the parameters are estimated through a quasi-Newton algorithm.

Previous comparative studies (Beguin & Glas, 2001; Gosz & Walker, 2002; Maydeu-Olivares, 2001; Miller, 1991; Stone & Yeh, 2006) showed that neither NOHARM nor TESTFACT was clearly superior to the other in the recovery of item parameters. Each program performs better than the other under specific circumstances. The MML procedure used in TESTFACT and the unweighted least squares procedure used in NOHARM were equally effective at estimating parameters under well-fitting model conditions (Miller, 1991). A summary of the computer software used for MIRT estimation in the past MIRT linking/equating literature is shown in Table A-16. Thus, the computer program TESTFACT was used in this study for MIRT estimation.

Multidimensional IRT Estimation

After the MIRT responses were generated, the item response model was estimated using TESTFACT to obtain the MIRT parameter estimates. The compensatory two-parameter two-dimension logistic model (M2PL) was selected for MIRT calibration. TESTFACT provides two types of estimation results: unrotated results and rotated results. In practice, the calibrated item parameter estimates can be rotated obliquely for better interpretation (Li & Lissitz, 2000). In this study, the rotated parameter estimates produced by the TESTFACT rotation method were used. For consistency, all parameter estimates were rounded to three decimal places after MIRT estimation.

Identification Issue in Item Parameter Estimation

In the UIRT model, the identification problem is solved by standardizing the ability distribution to mean zero and unit variance (Baker, 1992). In the MIRT model, the identification problem is more complicated. It is solved by setting the ability estimates for each dimension to zero mean and unit variance. In addition, the coordinate axes corresponding to all dimensions are always assumed to be orthogonal in these programs (Li & Lissitz, 2000). Accordingly, the metric of the MIRT item parameter estimates is orthogonal and standardized (Li, 1997). Therefore, the observed correlations among the item scores are accounted for solely by the $a$ parameters (Li, 1997; Reckase, 1997; Wei, 2008). Likewise, the original heterogeneous variances and means of the multidimensional abilities are accounted for solely by the parameter estimates (Li, 1997; Reckase, 1997). The heterogeneous variances and mean differences between groups can be adjusted through translation, dilation/contraction, and rotation in MIRT linking. However, correlation among dimensions may cause estimation errors and a biased dimensional orientation of the MIRT parameter estimates when MIRT calibration is carried out with these computer programs (e.g., TESTFACT, NOHARM).
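As a rough illustration of the data generation step described earlier in this chapter, the sketch below simulates dichotomous responses from a two-dimensional compensatory logistic model that includes a scaling constant. The value 1.7 for the constant, the intercept form of the model, and all parameter values are assumptions made for illustration; the study itself generated its data in R.

```python
import math
import random

D = 1.7  # scaling constant (assumed value)

def p_m2pl(theta, a, d):
    """Probability of a correct response under a two-dimensional compensatory logistic model."""
    z = D * (a[0] * theta[0] + a[1] * theta[1] + d)
    return 1.0 / (1.0 + math.exp(-z))

def simulate_responses(thetas, items, seed=0):
    """Return a persons-by-items matrix of 0/1 responses."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_m2pl(theta, a, d) else 0 for a, d in items]
        for theta in thetas
    ]

# Hypothetical examinees and items, for illustration only.
ability_rng = random.Random(42)
thetas = [(ability_rng.gauss(0.0, 1.0), ability_rng.gauss(0.0, 1.0)) for _ in range(200)]
items = [((1.063, 0.285), 0.0), ((0.285, 1.063), 0.0), ((0.778, 0.778), -0.5)]
responses = simulate_responses(thetas, items, seed=7)
```

Each replication in a simulation of this kind would repeat the call with a new seed while keeping the population item parameters fixed.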

Correcting Dimension Sequence and Direction

TESTFACT does not always identify the sequence of ability dimensions in the estimated results exactly as the sequence in the MIRT data (Simon, 2008). In addition, the signs of the item parameter estimates can be reversed from those in the model used to simulate the data. Therefore, the dimension sequence and the direction of the item parameter estimates must be corrected before any linking or equating is conducted (Bateley & Boss, 1993; Simon, 2008).

For both the APSS and CS test structures in this study, the first five item discrimination parameter estimates in each replication were used as indicators to correct the dimension sequence, because for these five items the true item discrimination parameters in the first dimension ($a_1$) were larger than those in the second dimension ($a_2$). If the sum of these five item parameter estimates in the first dimension ($\sum \hat{a}_1$) was less than the sum of the ten item parameter estimates in the second dimension ($\sum \hat{a}_2$), the item discrimination parameter estimates for the entire test were reversed (i.e., estimates identified as Dimension 1 were relabeled as Dimension 2 and vice versa).

Likewise, the direction of the item parameter estimates must be corrected. Since the item discrimination parameters for both dimensions in a two-dimensional test are all positive, the sign was reversed whenever the average of the item discrimination parameter estimates for a dimension was negative (Simon, 2008).

Linking

The linking procedure was conducted in the statistical language R 2.14 (R Development Core Team, 2007). The R code for MIRT linking was adapted from Simon (2008). Note that all five MIRT linking procedures were written in R.
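
The dimension-sequence and direction corrections described above can be sketched as follows. This is only an illustration in Python, not the R code used in the study. The function name `correct_dimensions`, the argument `indicator_items`, and the example values are hypothetical, and two simplifications are assumed: the same indicator items are summed on both dimensions, and the sign check is applied before the sequence check so that reversed signs cannot distort the comparison of sums.

```python
def correct_dimensions(a_hat, indicator_items):
    """Return a corrected copy of a two-dimensional discrimination matrix.

    a_hat: list of [a1, a2] discrimination estimates, one row per item.
    indicator_items: indices of items whose true a1 is larger than a2
        (hypothetical argument; the study used the first five items).
    """
    rows = [list(row) for row in a_hat]
    n_items = len(rows)

    # Direction check: true discriminations are positive, so a negative
    # column average signals a reflected dimension and the sign is flipped.
    for dim in range(2):
        if sum(row[dim] for row in rows) / n_items < 0:
            for row in rows:
                row[dim] = -row[dim]

    # Sequence check: on the indicator items the Dimension 1 sum should
    # exceed the Dimension 2 sum; otherwise the two columns are swapped.
    sum_dim1 = sum(rows[i][0] for i in indicator_items)
    sum_dim2 = sum(rows[i][1] for i in indicator_items)
    if sum_dim1 < sum_dim2:
        rows = [[row[1], row[0]] for row in rows]
    return rows


# Hypothetical estimates: Dimension 1 is reflected and the labels are swapped.
example = [[-0.3, 1.2], [-0.2, 1.0], [-1.1, 0.4]]
corrected = correct_dimensions(example, indicator_items=[0, 1])
```

In this toy case the first column is reflected to positive values and the two columns are then exchanged, so the indicator items end up loading mainly on Dimension 1.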

MIRT Linking

In this study, five MIRT linking methods were investigated: the OD method, the TCF method, and the ICF method from Oshima et al. (2000); the M method from Min (2003); and the NOP method from Reckase and Martineau (2004). In general, MIRT scale linking is conducted to adjust rotation, translation, and dilation. The methods differ in how they adjust rotation, translation, and dilation for the parameter estimate coordinate systems. The theoretical details were provided in Chapter 2. The rotation methods used in previous MIRT linking/equating studies are summarized in Table A-17.

Minimization Algorithm Used in Linking Procedure

The MIRT linking procedure requires a minimization method during the estimation process to obtain the scale transformation coefficients. For the MIRT linking methods, the Newton-Raphson algorithm (Kolen & Brennan, 2004, p. 178) and a nonlinear optimization algorithm (Broyden-Fletcher-Goldfarb-Shanno; BFGS) were available for the minimization.

Newton-Raphson algorithm

The Newton-Raphson algorithm is an iterative process for finding successively better approximations to the roots (or zeroes) of a real-valued function. To use this method, the equation to be solved is first written so that the function equals 0. Based on Kolen and Brennan (2004, p. 177), the function, $f(x)$, is defined as a function of the scale transformation coefficient $x$, and its first derivative is denoted $f'(x)$. To start the Newton-Raphson algorithm, an initial value is given for $x$, which is referred to as $x^{-}$. A new value for $x$ is obtained as

$$x^{+} = x^{-} - \frac{f(x^{-})}{f'(x^{-})} \tag{3-4}$$

Theoretically, $x^{+}$ would be closer to the root of the equation than $x^{-}$. The new value $x^{+}$ is then redefined as $x^{-}$, and the process is repeated until the difference between $x^{+}$ and $x^{-}$ meets a specified level of precision or until the value of $f(x)$ is sufficiently close to 0.

Broyden-Fletcher-Goldfarb-Shanno (BFGS) method

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method (Dennis & Schnabel, 1996) is a method for solving nonlinear optimization problems. It belongs to the family of quasi-Newton methods; it resembles the Newton-Raphson algorithm but imposes fewer constraints. The BFGS method approximates Newton's method in seeking a stationary point of a (preferably twice continuously differentiable) function.

In the MIRT linking process, both the Newton-Raphson and the BFGS methods were applied. The scale transformation results from the two methods were then compared, and the results with the smaller estimation errors were selected to transform the scale systems for both groups onto the common scale.

Criteria and Quadrature Nodes Used in Minimization

In Simon (2008), the maximum number of iterations used in the minimization for both UIRT and MIRT linking was set to 20, and the minimum standard error used in the minimization was set to .2. To improve the minimization, in this study the maximum number of iterations was set to 30 and the minimum standard error was set to .1. The iteration process stops when the sum of the squared differences between the two probability functions for the two forms (i.e., base form parameters and transformed equated form parameters) after scale transformation is smaller than the minimum standard error criterion. Otherwise, the iteration process continues until it reaches the maximum number of iterations (i.e., 30).
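
As a minimal sketch of the Newton-Raphson iteration and its two stopping rules (a precision threshold and an iteration cap), the following Python function may help. It is not the R code used in the study: the function name `newton_raphson`, the generic tolerance `tol`, and the example function are assumptions made for illustration, and the tolerance here is a plain numerical precision rather than the study's criterion on the sum of squared differences between probability functions. Only the cap of 30 iterations is taken from the text.

```python
def newton_raphson(f, f_prime, x0, max_iter=30, tol=1e-8):
    """Find a root of f by Newton-Raphson, starting from x0.

    Returns (estimate, iterations_used). Stops when successive values
    differ by less than tol, when f is within tol of zero, or when
    max_iter iterations have been carried out.
    """
    x_old = x0
    for iteration in range(1, max_iter + 1):
        slope = f_prime(x_old)
        if slope == 0:
            raise ZeroDivisionError("derivative is zero; update undefined")
        # Newton-Raphson update: x_new = x_old - f(x_old) / f'(x_old)
        x_new = x_old - f(x_old) / slope
        if abs(x_new - x_old) < tol or abs(f(x_new)) < tol:
            return x_new, iteration
        x_old = x_new  # the new value becomes the starting value
    return x_old, max_iter


# Illustrative use on f(x) = x^2 - 2, whose positive root is sqrt(2).
root, n_iter = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

In a linking application the function being driven to zero would be the derivative of the linking loss with respect to a transformation coefficient, and a BFGS routine could be run on the same loss so that the two solutions can be compared.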

In Oshima et al. (2000), the number of quadrature nodes in MIRT linking for each dimension was set to seven. Because the compensatory two-parameter two-dimension logistic model (M2PL) was used, the total number of quadrature nodes used for MIRT linking in this study was forty-nine (i.e., $7 \times 7 = 49$).

Equating

Common Target Population and Synthetic Population Weights

Before equating was conducted, the common target population and the synthetic population weights for the two groups in this study had to be considered. In the test equating literature, various properties of equating have been proposed (e.g., Angoff, 1971; Lord, 1980). Among these, the group invariance property is important for defining the population: under this property, the equating relationship is the same regardless of the group of examinees used to conduct the equating (Kolen & Brennan, 2004). In reality, however, it is very difficult for the group invariance assumption to hold in the strictest sense. Thus, for the group invariance property to hold, the population of examinees on which the equating relationship is developed should be clearly stated and representative of all groups of examinees who are administered the test (Kolen & Brennan, 2004). This population of examinees is called the target population. The target population of equating can be either an actual population or a hypothetical population. If the resulting equating functions for different target populations are different enough to have practical consequences, then the population invariance requirement is violated (von Davier, Holland, & Thayer, 2004).

In the NEAT design there are two target populations, and a sample of examinees is drawn from each (i.e., target population 1 for the base group; target population 2 for the equated group). Although the design under consideration here involves two populations, an equating function is still viewed as being defined
within a single population (Kolen & Brennan, 2004). Thus, a common target population for the NEAT design must be defined, and multiple populations must be combined into a single common target population to define an equating relationship. The common target population is a mixture of the two nonequivalent target populations from which the samples of examinees are drawn (von Davier, Holland, & Thayer, 2004). To address this issue, Braun and Holland (1982) introduced the concept of a synthetic population, in which Populations 1 and 2 are weighted by $w_1$ and $w_2$, respectively,

$$T = w_1 P_1 + w_2 P_2 \tag{3-5}$$

where $T$ denotes the synthetic population, $P_1$ and $P_2$ denote Populations 1 and 2, $w_1 + w_2 = 1$, and $w_1, w_2 \ge 0$. In this study, fixed values of $w_1$ and $w_2$ were chosen for Populations 1 and 2 to define the target population. Thus, the score distributions on the two forms in the synthetic population are

$$f_s(x) = w_1 f_1(x) + w_2 f_2(x) \tag{3-6}$$

$$g_s(y) = w_1 g_1(y) + w_2 g_2(y) \tag{3-7}$$

MIRT Equating

In this study, three MIRT equating methods were applied: the full MIRT observed score equating method, the unidimensional approximation of MIRT true score equating method, and the unidimensional approximation of MIRT observed score equating method, all proposed by Brossman (2010).

Full MIRT observed score equating (MOSE)

The MOSE method was fully described in Chapter 2, where the specific details of the procedure were illustrated. The MOSE method is a straightforward extension of UIRT observed score equating. Several steps are used in this equating method. In the first
step, the distribution of observed number-correct scores for examinees of a given ability combination was estimated for the compound binomial distribution through a recursion formula (Lord & Wingersky, 1984). In this step, the conditional observed score distributions (i.e., $f(x \mid \boldsymbol{\theta})$) were determined at each combination of ability levels (i.e., each combination of grid points for $\theta_1$ and $\theta_2$) in the entire ability space (Kolen & Wang, 2007), where $\boldsymbol{\theta}$ is the ability combination vector (i.e., $\boldsymbol{\theta} = (\theta_1, \theta_2)$).

In the next step, quadrature nodes and weights were determined from the multivariate normal distribution. The number of quadrature nodes was equal to $q^m$, where $q$ is the number of quadrature points per dimension and $m$ is the number of dimensions. In this two-dimension case, the quadrature nodes were selected from the range $[-4, 4]$ in equal steps of 0.2. The number of quadrature nodes per dimension was therefore 41 (i.e., $(4 - (-4))/0.2 + 1 = 41$), and so $q^m = 41^2 = 1681$. Accordingly, 1681 quadrature nodes were obtained; that is, there were 1681 combinations of the ability vectors in this two-dimension case (Equation 3-8).
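
The first two steps above can be illustrated with a short Python sketch (not the study's R code). It builds the 41 x 41 grid of ability combinations, attaches normalized weights from an uncorrelated standard bivariate normal density, and applies the Lord-Wingersky recursion at one grid point. The item parameters, the function names, the omission of any scaling constant in the M2PL, and the zero-correlation density are all assumptions made for illustration.

```python
import math


def m2pl_prob(a, d, theta):
    """Probability of a correct response under a compensatory M2PL:
    logistic(a . theta + d). No scaling constant is applied (assumption)."""
    z = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def lord_wingersky(probs):
    """Number-correct score distribution for independent items with the
    given success probabilities, built one item at a time."""
    dist = [1.0]  # before any item, a score of 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)   # item answered incorrectly
            new[score + 1] += mass * p       # item answered correctly
        dist = new
    return dist


def make_grid(low=-4.0, high=4.0, step=0.2):
    """All (theta1, theta2) combinations on an equally spaced grid."""
    n = int(round((high - low) / step)) + 1
    nodes = [round(low + i * step, 10) for i in range(n)]
    return [(t1, t2) for t1 in nodes for t2 in nodes]


def normal_weights(grid):
    """Normalized weights from an uncorrelated standard bivariate normal."""
    dens = [math.exp(-0.5 * (t1 * t1 + t2 * t2)) for t1, t2 in grid]
    total = sum(dens)
    return [d / total for d in dens]


grid = make_grid()
weights = normal_weights(grid)

# Hypothetical three-item test: (discriminations, intercept) per item.
items = [([1.2, 0.3], -0.5), ([0.4, 1.1], 0.2), ([0.8, 0.8], 0.0)]
theta = grid[len(grid) // 2]  # the centre of the grid
conditional = lord_wingersky([m2pl_prob(a, d, theta) for a, d in items])
```

Repeating the last line at every grid point gives the conditional score distributions that the later steps weight and accumulate into a marginal distribution.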

Based on the combinations of the ability vector, the conditional observed score distributions (i.e., $f(x \mid \boldsymbol{\theta})$) were obtained through the recursion formula shown in Equation 2-89 and re-presented below:

$$f_r(x \mid \boldsymbol{\theta}) = \begin{cases} f_{r-1}(x \mid \boldsymbol{\theta})\,(1 - p_r), & x = 0 \\ f_{r-1}(x \mid \boldsymbol{\theta})\,(1 - p_r) + f_{r-1}(x - 1 \mid \boldsymbol{\theta})\,p_r, & 0 < x < r \\ f_{r-1}(x - 1 \mid \boldsymbol{\theta})\,p_r, & x = r \end{cases} \tag{3-9}$$

where $p_r$ is the probability of a correct response to item $r$ at $\boldsymbol{\theta}$. Next, these conditional observed score distributions were multiplied by the ability density ($g(\boldsymbol{\theta})$) so that a joint distribution of the observed score was obtained. The ability density here was treated as a discrete distribution and randomly selected from a multivariate normal distribution (Genz, Bretz, & Hothorn, 2010). In the next step, the observed marginal distribution ($f(x)$) was determined for each form by accumulating the joint distributions over all levels of ability combination in the ability space. The mathematical expression is displayed in Equation 2-90 and re-presented below:

$$f(x) = \sum_{\theta_1} \sum_{\theta_2} \cdots \sum_{\theta_m} f(x \mid \boldsymbol{\theta})\, g(\boldsymbol{\theta}) \tag{3-10}$$

where $m$ is defined as the number of dimensions. After the above transformations, the traditional equipercentile method was applied to equate the two test forms.

Unidimensional approximation for MIRT equating

The unidimensional approximation algorithm (Zhang & Stout, 1999) is the foundation for conducting the unidimensional approximation of MIRT true score equating method and the unidimensional approximation of MIRT observed score equating method. The key function
for unidimensional approximation is the formula for the direction of the linear composite, described in Equation 2-97 and re-presented as Equation 3-11. The MIRT item parameter estimates were transformed to UIRT item parameter estimates by unidimensional approximation according to that equation. The item parameters of the UIRT approximation for the MIRT model can be obtained from Equation 2-98 to Equation 2-102. These transformed unidimensional approximation item parameter estimates were then used in the ATSE method, following the same procedure as UIRT true score equating.

The procedure used for the AOSE method can be described as follows. As in the MOSE procedure, the quadrature nodes and weights for observed score equating were determined for the unidimensional approximation. To obtain the unidimensional quadrature nodes, each vector of multidimensional quadrature points was multiplied by the vector of standardized linear composite coefficients ($\boldsymbol{\alpha}$) in accordance with the test-level direction of best measurement (Brossman, 2010):

$$\theta_{\alpha} = \boldsymbol{\alpha}^{\top} \boldsymbol{\theta} \tag{3-12}$$

The quadrature nodes obtained in this way were then rank-ordered according to their magnitude. In the next step, a set of equal-sized intervals was created, and the number of quadrature nodes was divided by the number of intervals to determine how many of the nodes appear within
each interval. In the next step, the quadrature nodes and weights falling within each interval were combined, so that the full set of quadrature nodes and weights was collapsed into one node and one weight per interval. The mean of the quadrature nodes within each interval was used as the final quadrature node for the interval, and the sum of the quadrature weights within each interval was used as the final quadrature weight.

Unidimensional approximation of MIRT true score equating (ATSE) and observed score equating (AOSE)

After the unidimensional approximation was complete, the ATSE and AOSE procedures were conducted in the same way as UIRT true score equating and UIRT observed score equating. To conduct the ATSE and AOSE procedures, a set of R code was developed in R 2.14 (R Development Core Team, 2007). Note that in the ATSE procedure the true scores at both ends may fall outside the range of possible true scores on the test forms. To solve this problem, the ad hoc procedure proposed by Kolen (1981) and Lord (1980) was applied. Since there were forty items in each test form, there were forty-one possible scores in this study (i.e., 0 to 40). Under the ad hoc procedure (Kolen, 1981; Lord, 1980), the true scores at the two ends were fixed at zero and forty, respectively. Thus, only thirty-nine possible true scores were used for the comparison of test equating methods in this study.

Large Sample Criterion and Criterion Equating Method

Harris and Crouse (1993) stated that when the population equating is known, a true criterion exists for evaluating equating results. In some studies (e.g., Angoff & Cowell, 1985; Heh, 2007; Skaggs, 2005), an equating based on a very large sample has been treated as the population equating and used as a criterion for evaluating equating methods by comparing it with the results of equatings from sample groups of examinees.

In most real test equating settings, it is difficult to obtain a very large sample of data, which makes the large sample criterion impractical. However, because this study is a simulation study, very large samples can be obtained, which makes the large sample criterion appropriate for comparing different MIRT equating methods for sample groups of examinees.

The frequency estimation method for the NEAT design was used as the criterion equating function for comparing the MIRT equating procedures, since this method employs only total test scores and the assumptions associated with the procedure were not expected to be violated in this study. For the frequency estimation method, the frequency distributions of equated Form X and base Form Y for a common synthetic population are estimated as in Equations 3-6 and 3-7. The frequency estimation method assumes that the distributions of X and Y scores conditional on the common item set score V are population invariant; that is,

$$f_1(x \mid v) = f_2(x \mid v) \quad \text{for all } v \tag{3-13}$$

$$g_1(y \mid v) = g_2(y \mid v) \quad \text{for all } v \tag{3-14}$$

With Form X administered to the equated group (Population 2) and Form Y to the base group (Population 1), it then follows that

$$f_1(x) = \sum_{v} f_2(x \mid v)\, h_1(v) \tag{3-15}$$

$$g_2(y) = \sum_{v} g_1(y \mid v)\, h_2(v) \tag{3-16}$$

where $h_1(v)$ and $h_2(v)$ are the marginal distributions of the common item set scores in Populations 1 and 2. All the quantities on the right-hand sides are directly observable with a common-item nonequivalent groups design. Equipercentile equating is then applied to $f_s(x)$ and $g_s(y)$.
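
A minimal Python sketch of the frequency estimation step may make the bookkeeping concrete. It is an illustration under stated assumptions rather than the study's implementation: the function name `frequency_estimation`, the argument names, the toy joint distributions, and the default weight of 0.5 are hypothetical (the weights used in the study are not restated here). Conditional distributions given the anchor score are taken from the group that actually took each form and re-weighted by the other group's anchor distribution.

```python
def frequency_estimation(joint_xv, joint_yv, w_x=0.5):
    """Synthetic-population score distributions for Forms X and Y.

    joint_xv[x][v]: joint proportions of (X, V) in the group that took X.
    joint_yv[y][v]: joint proportions of (Y, V) in the group that took Y.
    w_x: synthetic weight of the Form X group (hypothetical default);
         the Form Y group receives 1 - w_x.
    """
    def anchor_marginal(joint):
        return [sum(row[v] for row in joint) for v in range(len(joint[0]))]

    def score_marginal(joint):
        return [sum(row) for row in joint]

    def project(joint, h_own, h_other):
        # sum over v of P(score | v) * h_other(v); anchor scores that were
        # never observed in the own group are skipped.
        return [
            sum(row[v] / h_own[v] * h_other[v]
                for v in range(len(h_own)) if h_own[v] > 0)
            for row in joint
        ]

    h_x = anchor_marginal(joint_xv)
    h_y = anchor_marginal(joint_yv)
    f_own = score_marginal(joint_xv)          # X in the group that took X
    g_own = score_marginal(joint_yv)          # Y in the group that took Y
    f_other = project(joint_xv, h_x, h_y)     # X in the group that took Y
    g_other = project(joint_yv, h_y, h_x)     # Y in the group that took X

    w_y = 1.0 - w_x
    f_s = [w_x * a + w_y * b for a, b in zip(f_own, f_other)]
    g_s = [w_x * a + w_y * b for a, b in zip(g_other, g_own)]
    return f_s, g_s


# Toy example with two score points per form and two anchor scores.
joint_xv = [[0.3, 0.1], [0.2, 0.4]]
joint_yv = [[0.1, 0.3], [0.1, 0.5]]
f_s, g_s = frequency_estimation(joint_xv, joint_yv, w_x=0.5)
```

The two returned lists are the synthetic-population distributions to which equipercentile equating would then be applied.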

We acknowledge that using the frequency estimation method as the criterion equating function for evaluating different equating methods is arguable. However, the frequency estimation method in this study is a model-free procedure and can be regarded as an independent criterion. Note that the specific choice of synthetic population weights can matter for the frequency estimation method (von Davier et al., 2004). In addition, the common target population is a mixture of the two nonequivalent target populations (von Davier et al., 2004). Therefore, the same synthetic population weights used in the MIRT equating procedures were used for the frequency estimation method.

Evaluation Criteria and Data Analysis

Evaluation Criteria

In the test equating literature, the choice of criteria for evaluating equating methods has been made subjectively (Harris & Crouse, 1993), and different criteria might lead to a preference for different equating methods. It is therefore essential to select the criteria used to evaluate different equating methods. Several criteria were used to evaluate the adequacy of the MIRT equating methods. In general, the error generated in the equating process can be decomposed into two types: systematic error and random error (Kolen & Brennan, 2004). The goal of this study was to find the most accurate equating method, so that both systematic error and random error are minimized. Following previous equating literature (e.g., Harris & Crouse, 1993; Zeng & Kolen, 1995) and other evaluation studies of test equating, the following criteria were used to evaluate the equating methods in this study: the Standard Error of Equating conditional on scores (SEE), equating bias (Livingston, 1993), the Root Mean Square Deviation (RMSD) for each score point, and the weighted average
Root Mean Square Deviation for the entire test form.

Standard error of equating (SEE)

The Standard Error of Equating (SEE) indicates the magnitude of random error in equating, which is due to the sampling of examinees. The SEE is defined as

$$SEE(x_i) = \sqrt{\frac{1}{R} \sum_{r=1}^{R} \left[ \hat{e}_r(x_i) - \bar{e}(x_i) \right]^2} \tag{3-17}$$

and

$$\bar{e}(x_i) = \frac{1}{R} \sum_{r=1}^{R} \hat{e}_r(x_i) \tag{3-18}$$

where $R$ is the total number of replications (i.e., 2000); $x_i$ represents the score on the base form; $\hat{e}_r(x_i)$ denotes the raw score equivalent calculated from one equating procedure in replication $r$; and $\bar{e}(x_i)$ indicates the mean of the equated scores at score $x_i$ over the replications.

Equating bias

Equating bias indicates systematic error in equating. It was defined as the mean difference between an equating method and the criterion equating function (i.e., the frequency estimation method) over replications. The bias at each raw score point is defined as

$$Bias(x_i) = \frac{1}{R} \sum_{r=1}^{R} \left[ \hat{e}_r(x_i) - e_c(x_i) \right] \tag{3-19}$$

where $e_c(x_i)$ indicates the raw score equivalent calculated from the criterion equating function.
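
The conditional criteria can be computed directly from a replications-by-score-points table of equated scores, as in the Python sketch below. This is an illustration, not the study's code: the function name, the dictionary keys, and the toy data are hypothetical, and a divisor of R (rather than R - 1) is assumed. The sketch also computes the root mean square deviation defined in the next subsection, so that the decomposition of squared RMSD into squared SEE plus squared bias can be checked numerically.

```python
import math


def conditional_errors(equated, criterion):
    """SEE, bias, and RMSD at each score point.

    equated: list of R replications, each a list of equated scores
        (one per score point).
    criterion: list of criterion equated scores (one per score point).
    """
    n_reps = len(equated)
    results = []
    for i, crit in enumerate(criterion):
        values = [rep[i] for rep in equated]
        mean = sum(values) / n_reps
        # Random error: spread of the replications around their own mean.
        see = math.sqrt(sum((v - mean) ** 2 for v in values) / n_reps)
        # Systematic error: mean departure from the criterion.
        bias = mean - crit
        # Total error: spread of the replications around the criterion.
        rmsd = math.sqrt(sum((v - crit) ** 2 for v in values) / n_reps)
        results.append({"see": see, "bias": bias, "rmsd": rmsd})
    return results


# Toy data: two replications, two score points.
toy = conditional_errors([[1.0, 2.0], [3.0, 2.0]], criterion=[1.0, 2.5])
```

With a divisor of R, each entry satisfies rmsd squared = see squared + bias squared up to rounding error.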

Root mean square deviation (RMSD)

The Root Mean Square Deviation (RMSD) is a measure of the overall equating accuracy. It is defined as

$$RMSD(x_i) = \sqrt{\frac{1}{R} \sum_{r=1}^{R} \left[ \hat{e}_r(x_i) - e_c(x_i) \right]^2} \tag{3-20}$$

where $\hat{e}_r(x_i)$ denotes the raw score equivalent calculated from one equating procedure in replication $r$, and $e_c(x_i)$ indicates the raw score equivalent calculated from the criterion equating function. The quantity $RMSD^2$ is equal to the sum of the squared Standard Error of Equating ($SEE^2$) and the squared equating bias ($Bias^2$). The RMSD is used to evaluate the total error generated in the equating process.

Weighted average root mean square deviation

The weighted average Root Mean Square Deviation is used to evaluate the discrepancy between each equating procedure and the criterion equating function at the test level. It was computed over all available score points as in Equation 3-21, with the conditional RMSD values weighted by the population proportion of examinees from the target population who have each observed score on the equated form.

ANOVA Analysis

Because five MIRT linking methods and three MIRT equating methods were applied to the same response patterns, a repeated measures ANOVA model was used to detect the effects of simulation conditions (between factors), linking methods (within factors), and equating methods (within
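factors) on the weighted average root mean square deviation and the weighted average bias for each iteration. The repeated measures ANOVA model is defined as

$$Y_{ijklm} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \pi_{i(jk)} + \gamma_l + \delta_m + (\gamma\delta)_{lm} + (\gamma\alpha)_{lj} + (\delta\alpha)_{mj} + (\gamma\delta\alpha)_{lmj} + (\gamma\beta)_{lk} + (\delta\beta)_{mk} + (\gamma\delta\beta)_{lmk} + (\gamma\alpha\beta)_{ljk} + (\delta\alpha\beta)_{mjk} + (\gamma\delta\alpha\beta)_{lmjk} + (\gamma\delta\pi)_{lmi(jk)} \tag{3-22}$$

where
$Y_{ijklm}$ is the weighted root mean square deviation or weighted average bias for the entire test,
$\mu$ is the overall mean of the weighted root mean square deviation in the population,
$\alpha_j$ is the effect of the $j$th distribution group (groups 1-6),
$\beta_k$ is the effect of the $k$th test structure (APSS and CS),
$(\alpha\beta)_{jk}$ is the interaction effect of group and test structure (between-subject factors),
$\pi_{i(jk)}$ is the effect of the $i$th iteration within the $j$th distribution group and the $k$th test structure (i.e., the error term of the between-subject factors),
$\gamma_l$ is the effect of the $l$th MIRT linking method (five MIRT linking methods),
$\delta_m$ is the effect of the $m$th MIRT equating method (three MIRT equating methods),
$(\gamma\delta)_{lm}$ is the interaction effect of MIRT linking method and MIRT equating method,
$(\gamma\alpha)_{lj}$ is the interaction effect of MIRT linking method and distribution group,
$(\delta\alpha)_{mj}$ is the interaction effect of MIRT equating method and distribution group,
$(\gamma\delta\alpha)_{lmj}$ is the three-way interaction effect of MIRT equating method, MIRT linking method, and distribution group,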
$(\gamma\beta)_{lk}$ is the interaction effect of MIRT linking method and test structure,
$(\delta\beta)_{mk}$ is the interaction effect of MIRT equating method and test structure,
$(\gamma\delta\beta)_{lmk}$ is the three-way interaction of MIRT linking method, MIRT equating method, and test structure,
$(\gamma\alpha\beta)_{ljk}$ is the three-way interaction of MIRT linking method, distribution group, and test structure,
$(\delta\alpha\beta)_{mjk}$ is the three-way interaction of MIRT equating method, distribution group, and test structure,
$(\gamma\delta\alpha\beta)_{lmjk}$ is the four-way interaction of MIRT linking method, MIRT equating method, distribution group, and test structure,
$(\gamma\delta\pi)_{lmi(jk)}$ is the interaction effect of MIRT linking method and MIRT equating method with iteration within the $j$th distribution group and the $k$th test structure (i.e., the error term of the within-subject factors).

In Equation 3-22, there are two between-subject factors: test structure and distributional shape of the group. There is also one between-subject factor interaction ($(\alpha\beta)_{jk}$). In addition, there are two within-subject factors, MIRT linking method and MIRT equating method, and one within-subject factor interaction ($(\gamma\delta)_{lm}$). Besides these factors and interactions, there are several between-by-within-subject interaction terms. The two summary statistics were examined to provide detailed patterns of the errors associated with MIRT linking, equating, group distribution difference, and test structure. The proportion-of-variance effect size partial $\eta^2$ was reported and interpreted in this study. Partial $\eta^2$ was computed as

$$\eta_p^2 = \frac{SS_{effect}}{SS_{effect} + SS_{error}} \tag{3-23}$$

where $SS_{error}$ is computed by pooling the between-subjects and within-subjects error variances.
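
As a small worked example of this effect size, the Python sketch below computes partial eta squared from sums of squares. The function name and the numbers are hypothetical, and pooling is assumed here to mean adding the between-subjects and within-subjects error sums of squares; the study's exact pooling rule may differ.

```python
def partial_eta_squared(ss_effect, ss_error_between, ss_error_within):
    """Partial eta squared with a pooled error term.

    The pooled error is taken as the sum of the between-subjects and
    within-subjects error sums of squares (an assumption).
    """
    ss_error = ss_error_between + ss_error_within
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares: effect = 8, pooled error = 1 + 1 = 2.
example_effect_size = partial_eta_squared(8.0, 1.0, 1.0)
```

Here the effect accounts for 8 of the 10 units of effect-plus-error variation, giving a value of 0.8.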

CHAPTER 4
RESULTS

A series of simulation studies and analyses were conducted to examine the effects of four factors on test equating results for two test forms (i.e., base form and equated form), following the procedures described in the previous chapter. Those factors were:

1. Two test structures: Approximate Simple Structure (APSS) and Complex Structure (CS).
2. Three group ability distributions: group mean difference, group correlation difference, and group standard deviation difference.
3. Five MIRT linking methods: the M method, the OD method, the TCF method, the ICF method, and the NOP method.
4. Three MIRT equating methods: the full information MIRT observed score equating (MOSE) method, the unidimensional approximation of MIRT true score equating (ATSE) method, and the unidimensional approximation of MIRT observed score equating (AOSE) method.

The first two factors are between-subject factors, and the last two are within-subject factors. For each combination of levels of the between-subjects factors, 200 replications were conducted.

This chapter summarizes the results of this simulation study. In the first section, the summary of the ANOVA results (i.e., partial $\eta^2$) for the weighted root mean square deviation and the weighted bias for the entire test is presented. This section focuses on how the four factors affected the equating results for the entire test. In the second section, the differences between (1) the equivalent scores from the three MIRT equating methods and (2) the equivalent scores from the equating criterion (i.e., the frequency estimation method) across the score scale are presented for each group distribution condition. Since the SEE, the RMSD, and the Bias were expected to vary at different points along the score scale, the main focus was on how the four factors affect equating results across the entire score scale. Thus, the equating Bias, the SEE, and the RMSD are reported conditional on each score point across linking methods for each group distribution condition.

Table A-18 through Table A-22 present the results of the ANOVA tests and the corresponding averaged mean results, based on the two largest partial $\eta^2$ effect sizes from the ANOVA results. The detailed results of the equated mean scores and score differences for the equating methods across linking procedures under the different group distribution conditions are presented in Tables A-23 to A-82 and Figures B-1 to B-60. Tables A-83 to A-142 and Figures B-61 to B-120 display the SEE, RMSD, and Bias statistics, conditional on each score point, across the MIRT linking methods for all MIRT equating methods used in this study.

To make the comparison across group distribution conditions easier to interpret, the results for SEE, RMSD, and Bias, along with the mean scale score differences, are presented using fixed vertical scales. For the mean difference between the equivalent scores from the MIRT equating methods and the equivalent scores from the criterion equating function, a scale from -4 to 4 is used. For the conditional SEE, a scale from 0 to 1.5 is used. For the conditional RMSD, a scale from 0 to 2.0 is used. For the conditional Bias, a scale from -1.0 to 1.0 is used.

Preliminary ANOVA Results

The results of the ANOVA tests for all four factors are presented in Table A-18. The results of the analysis are the partial $\eta^2$ effect sizes for weighted average Bias and weighted average RMSD for the entire test. The effect sizes for both weighted average bias and weighted average RMSD indicate that the effects of linking and equating on the results were most dependent
upon group distribution changes. The largest effect size was for the interaction of linking method with the group distribution factor, with partial $\eta^2$ equal to .88045 for weighted average bias and .94122 for weighted average RMSD. The second largest effect size was for the interaction of equating method with the group distribution factor, with partial $\eta^2$ equal to .46236 for weighted average bias and .58711 for weighted average RMSD. Test structure and all the interactions that include test structure had very small effect sizes. This pattern is made clearer by the partial $\eta^2$ results in Table A-18. That is, the linking method x group distribution interaction accounted for the largest portions of the variation in weighted average bias and weighted average RMSD in the equating results for the entire test, and the equating method x group distribution interaction accounted for the second largest portions.

In sum, the results of the repeated measures ANOVA show that the group distribution change and the type of MIRT linking had significant effects on equating results (i.e., the linking method x group distribution interaction). Test structure and all the interactions including test structure had very small effects on equating results. The soundness of the equating results depended on the group distribution changes, the linking methods, the equating methods, and their interactions.

Comparison for the Linking Method x Group Distribution Interaction

Since the linking method x group distribution interaction accounted for the largest portions of the variation in weighted average bias and weighted average RMSD in the equating results for the entire test, the means of these two statistics were obtained by averaging over the test structures and the MIRT equating methods, in order to directly compare the equating results of
five MIRT linking methods across different group distribution conditions. These results are shown in Table A-19 and Table A-20.

In general, the TCF method and the ICF method performed best across all group distribution conditions, with means of .8067 and .4022 for weighted average bias and 3.31658 and 1.45064 for weighted average RMSD, as compared with the other three linking methods. The OD method and the M method had less biased and more stable results than the NOP method, with means of .9214 and 2.2417 for weighted average bias and 5.29746 and 13.0437 for weighted average RMSD. The NOP method performed worst and had the largest means (4.0161 and 40.485) compared with the other four linking methods. That is, under the NOP method, the equating results for the entire test had the largest weighted average RMSD (40.485), which was approximately the same as the total test scale.

More specific variation points follow. Under the condition with only a group standard deviation change (i.e., Group 2), the performance pattern across all five linking methods was similar to the general pattern, but the M method performed best, with means of .29248 for weighted average bias and .20884 for weighted average RMSD, compared with the other four linking methods.

Under the condition with both group standard deviation and mean change (i.e., Group 3), the means of both statistics for all five MIRT linking methods increased drastically compared with Group 1 and Group 2. The weighted average bias results for the M method, the OD method, the TCF method, the ICF method, and the NOP method under condition 3 were 5.119, 2.3007, 1.8481, 1.6189, and 9.0351. The corresponding weighted average RMSD results for the M method, the
OD method, the TCF method, the ICF method, and the NOP method under condition 3 were 28.6865, 7.52435, 5.64556, 4.9614, and 87.9546. The performance pattern across all five linking methods was similar to the general pattern, but with much larger means on both statistics. Under the NOP method, the average weighted RMSD was 87.9546, which means that the discrepancy between the average scores of the three MIRT equating procedures and the criterion equating function scores was more than double the entire test scale.

Under the condition where correlation exists between group ability dimensions (i.e., Group 4), the ICF method outperformed the other four linking methods, having the smallest mean (.19166). In general, the results for all five linking methods under condition 4 were comparatively small, with means of .321992 for weighted average bias and 1.06243 for weighted average RMSD. The performance pattern of all five linking methods under correlated ability dimensions was similar to the general pattern across all group distribution conditions.

Under the condition with both a group mean difference and correlation between group ability dimensions (i.e., Group 5), the means of both statistics for all five MIRT linking methods greatly increased, as in Group 3. The ICF method again outperformed the other four linking methods, and the performance pattern of all five linking methods under this condition was similar to the general pattern across all group distribution conditions.

Under the condition with all group differences (i.e., correlation between group ability dimensions, group mean difference, and group standard deviation difference), as in all conditions with a group mean difference, the magnitude of the means of weighted average bias and
for all five MIRT linking methods greatly increased as compared with the null condition. The equating performance pattern of all five linking methods under this condition was similar to the general pattern across all group distribution conditions, with the means equal to 4.7977, 2.2377, 1.7191, 0.4831, and 8.7142 for and 26.0056, 10.8617, 5.6119, 1.6744, and 81.6578 for under the M method, the OD method, the TCF method, the ICF method, and the NOP method. For the general pattern across all group distribution conditions, the equating performance had the means equal to 2.2417, 0.9214, 0.8067, 0.4022, and 4.01613 for and 13.0437, 5.2975, 3.3166, 1.4506, and 40.4850 for under the M method, the OD method, the TCF method, the ICF method, and the NOP method.

Comparison for the Equating Method x Group Distribution Interaction

After the linking method x group distribution interaction, the equating method x group distribution interaction accounts for the next largest portions of and variations in the equating results for the entire test. The mean of and for the equating results are obtained by averaging the mean results from different test structures and the MIRT linking methods, for the purpose of directly comparing equating results of the three MIRT equating methods across different group distribution conditions. These results are shown in Table A-21 and Table A-22.

Overall, it was found that all three MIRT equating methods performed comparatively well when there was no group mean difference, including under the null condition. Generally, the ATSE method demonstrated the best equating performance with the means equal to 1.2752 for and 9.07452 for as compared with the other two equating methods (i.e.,
MOSE, AOSE) across all group distribution conditions. The MOSE method displayed the worst equating performance, with the means equal to 2.4309 for and 18.6353 for. When there was no group distribution mean change, the MOSE method performed better than the AOSE method in terms of smaller values of the mean of and for the equating results for the entire test (i.e., .23211 for and .23195 for under the null condition). When there was a group distribution mean change, both the ATSE method and the AOSE method outperformed the MOSE method with regard to smaller values of the mean of and. However, the magnitude of the means for the and drastically increased for all three MIRT equating methods when group mean differences existed. For example, under group distribution condition 3, the mean results of all three MIRT equating procedures were 2.23063 for and 16.3327 for compared with .36066 for and .372772 for under the null condition. More specific variation points follow.

When only the group standard deviation changes (i.e., Group 2) or only a correlation exists between group ability dimensions (i.e., Group 4), the performance pattern across all three equating methods was similar to the general pattern, with the means equal to 2.23063 for and 16.3327 for. The ATSE procedure outperformed the other two equating methods with the smallest magnitude of the mean for (i.e., .27605).

Under the condition with both group standard deviation and mean change (i.e., Group 3), the values of the means of and for all three MIRT equating methods greatly increased, with the means equal to 2.2306 for and 16.3327 for as compared with the means equal to .2989 for and .37375 for under conditions without group mean
changes (i.e., Group 2). The performance pattern across all three equating methods was similar to the general pattern but with a larger magnitude of the means of and.

Under the condition with both group mean difference and existing correlation between group ability dimensions (i.e., Group 5), similar to the results in Group 3, the values of the means of and for all three MIRT equating methods greatly increased. Also, the performance pattern of all three equating methods under this condition (i.e., both group mean difference and existing correlation between group ability dimensions) was similar to the general pattern across all group distribution conditions.

Under the condition with all group differences (i.e., correlation exists between group ability dimensions, group mean difference, group standard deviation difference), similar to the results from all group mean difference conditions, the values of the means of and for all three MIRT equating methods greatly increased as compared with the null condition. The performance pattern of equating methods under this condition was similar to the general pattern across all group distribution conditions.

Group Distribution Conditions

While the ability distribution for the baseline group was fixed (group mean = zero and standard deviation = 1.0 for both ability dimensions), six bivariate normal distributions were considered as six additional conditions for the equated groups in this study. Because test structure (i.e., APSS and CS) and all the interactions including test structure had a very small effect on the test equating results, the overall trends of the conditions with CS test structure (i.e., conditions 7-12) were similar to their corresponding conditions with APSS test structure (i.e., conditions 1-6). Accordingly, the interpretations for the conditions with CS test structure (i.e.,
conditions 7-12) are left out of this chapter, but corresponding results are shown in Tables A-53 to A-82, Tables A-113 to A-142, and in Figures B-31 to B-60 and B-91 to B-120 in the Appendix.

The results of the difference between 1) equivalent scores from the three MIRT equating methods and 2) the equivalent score from the equating criterion (i.e., Frequency Estimation Method) for the entire score scale are discussed within each group distribution condition. The results of SEE, RMSD, and Bias across the score scale within each group distribution condition are also given. Tables and figures are organized according to different group mean distribution conditions and MIRT linking methods.

Group Distribution Condition 1

In group distribution condition 1, the multivariate ability distribution is the null condition, with mean (i.e., 0) and standard deviation (i.e., 1) matching the ability distribution in the baseline group.

Equivalent score difference

Tables A-23 to A-27 and Figures B-1 to B-5 display the results of the equating mean score and score difference for all three MIRT equating methods across five linking procedures for group distribution condition 1 (i.e., group 1). Under the null group distribution condition, neither the ATSE nor the AOSE equating procedures differed from the MOSE procedure within each of the five linking methods.

Under the TCF, ICF, and OD linking methods, all three equating procedures performed similarly. The differences between the equivalent scores from the criterion equating and the equivalent scores for the three equating procedures tend to be small (i.e., ).

Under the M linking method, all three equating procedures performed similarly within the null group distribution condition. When compared with equivalent scores from the criterion equating function, the three equating procedures performed well at the lower end of the score
scale (from score 1 to 20) with comparatively small equivalent score differences, but had larger positive score differences at the upper end of the score scale (from scale score 21 to 40), and reached their maximum around score 30 (i.e., ).

Under the NOP linking method, all three equating procedures tended to produce higher raw equivalent scores as compared to the criterion, and this tendency was stronger across the middle of the score scale. When compared with corresponding equivalent scores from the criterion equating, the three equating procedures all had positive equivalent score differences across the entire score scale, and reached their maximum around score 23 (i.e., ).

SEE, RMSD, and Bias

Tables A-83 to A-87 and Figures B-61 to B-65 show the results of SEE, RMSD, and Bias statistics across MIRT linking methods for all MIRT equating methods for group distribution condition 1 (i.e., group 1). A visual inspection of these figures suggests that the bulk of RMSD came from SEE, and a relatively small portion came from Bias. As a result, RMSD values more closely resemble those for SEEs under the null group distribution condition.

Under the TCF, ICF, and OD linking methods, values of SEEs, RMSDs, and bias for all three equating procedures were smaller than those under the other two linking methods (i.e., M method and NOP method). Within each of the three linking methods, the AOSE procedure produced larger amounts of SEEs, RMSDs, and Bias across the score scale than either the MOSE or the ATSE procedures. The magnitude of the SEEs for the three equating procedures increased when scale scores reached both ends of the entire score scale. The magnitude of the RMSDs for the three equating procedures increased when scale scores reached both ends of the score scale and peaked at a raw score equal to 35. Within each of the three linking methods, all three equating procedures appeared to have negative bias at the lower tail and smoothly approached positive
bias throughout the middle score range, and their Bias values increased and peaked around raw scores of 35 or 36.

Under the M linking method, values of SEEs, RMSDs, and bias for all three equating procedures were larger than those under the TCF, ICF, and OD linking methods but smaller than those under the NOP linking method. All three equating procedures had a very similar pattern of SEEs and RMSDs. At the lower tail, their SEEs and RMSDs were comparatively high, but quickly decreased throughout the entire middle score range. Then at the upper tail, their RMSDs increased and peaked at a raw score of about 35. Under the M linking method, the AOSE procedure produced larger amounts of SEEs, RMSDs, and bias across the score scale than did the MOSE and ATSE procedures. The ATSE and MOSE equating procedures did not appear to result in much difference in terms of magnitude of SEEs, RMSDs, and bias across the score scale. All three equating procedures appeared to have negative bias at the lower tail and smoothly approached positive bias throughout the middle score range, and then their Bias values increased and peaked around raw scores of 35 or 36.

Under the NOP linking method, all three equating procedures displayed the largest SEEs, RMSDs, and bias as compared with those under the TCF, ICF, and OD linking methods. For all three equating procedures, the magnitudes of SEEs, RMSDs, and bias started increasing from the lower ends of the score scale and then peaked at a raw score of approximately 33. The AOSE procedure produced slightly larger amounts of SEEs, RMSDs, and bias across the score scale than did the MOSE and ATSE procedures. At the lower tail, the SEEs, RMSDs, and bias were comparatively low, but smoothly increased throughout the entire middle score range until their values peaked at a raw score around 35. Then, they decreased at the upper tail.
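
The statement that the bulk of RMSD came from SEE rather than Bias can be read through the usual decomposition of squared RMSD into squared SEE plus squared bias at each raw-score point. The following Python sketch illustrates that decomposition under stated assumptions: SEE is taken as the standard deviation of replicated equated scores (divisor n), bias as the mean equated score minus the criterion equivalent, and the replicated scores are made-up numbers. The function name and data are hypothetical and are not taken from this study.

```python
import math


def see_bias_rmsd(replicated_equivalents, criterion_equivalent):
    """Return (SEE, bias, RMSD) at a single raw-score point.

    Assumed definitions (not confirmed by this chapter):
      bias = mean of replicated equated scores - criterion equivalent
      SEE  = standard deviation of replicated equated scores (divisor n)
      RMSD = root mean squared deviation from the criterion equivalent
    """
    n = len(replicated_equivalents)
    mean_eq = sum(replicated_equivalents) / n
    bias = mean_eq - criterion_equivalent
    see = math.sqrt(sum((e - mean_eq) ** 2 for e in replicated_equivalents) / n)
    rmsd = math.sqrt(
        sum((e - criterion_equivalent) ** 2 for e in replicated_equivalents) / n
    )
    return see, bias, rmsd


# Hypothetical equated scores from five replications at one raw-score point.
reps = [30.2, 30.9, 29.8, 31.1, 30.5]
see, bias, rmsd = see_bias_rmsd(reps, criterion_equivalent=30.4)

# Share of squared RMSD attributable to random error (SEE) versus bias.
see_share = see ** 2 / rmsd ** 2
bias_share = bias ** 2 / rmsd ** 2
print(f"SEE={see:.4f} bias={bias:.4f} RMSD={rmsd:.4f} SEE share={see_share:.3f}")
```

With the divisor-n definitions used here, the two shares sum to one, so a small bias share corresponds to RMSD curves that track the SEE curves, as described for the null condition.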

Group Distribution Condition 2

In group distribution condition 2, the multivariate ability distribution has an ability standard deviation of .8, but otherwise matches the null distribution.

Equivalent score difference

Tables A-28 to A-32 and Figures B-6 to B-10 display the results of the equating mean score and score difference for all three MIRT equating methods across five linking procedures for group distribution condition 2 (i.e., group 2). In general, under group distribution condition 2 the three equating procedures produced larger amounts of equivalent score differences across the five linking methods as compared with those under group distribution condition 1. The AOSE procedure did not differ from the MOSE procedure across each of the five linking methods, but the ATSE procedure performed best among the three MIRT equating procedures, with the fewest differences between its equivalent scores and equivalent scores from the criterion equating function.

Under the M linking method, the equating procedures had the least amount of equivalent score difference, followed by those under the NOP linking method. The equating procedures under the TCF, ICF, and OD linking methods produced the largest amount of equivalent score difference.

Under the TCF, ICF, and OD linking methods, all three equating procedures performed similarly with regard to equivalent score difference changes, but the ATSE procedure had the best performance with the least amount of equivalent score differences across the score scale. All three equating procedures appeared to have negative score differences at the lower tail, but the equivalent score differences smoothly approached positive differences throughout the middle of the score range and then peaked at a raw score around 35 or 36.

Under the NOP linking method, all three equating procedures performed similarly with regard to equivalent score difference changes, but the ATSE procedure had the best performance with the smallest equivalent score difference across the score scale. Unlike their equating performance under the TCF, ICF, and OD linking methods, all three equating procedures appeared to have positive score differences at the lower tail, with equivalent score differences smoothly decreasing throughout the entire middle score range, then increasing again and peaking at a raw score around 35 or 36 for both the AOSE and MOSE procedures. As for the ATSE procedure, the equivalent score differences were distributed more evenly across the entire score scale.

SEE, RMSD, and Bias

Tables A-88 to A-92 and Figures B-66 to B-70 show the results of SEE, RMSD, and bias statistics across MIRT linking methods for all MIRT equating methods for group distribution condition 2 (i.e., group 2). In general, under group distribution condition 2 the three equating procedures produced larger amounts of SEE, RMSD, and bias across the five linking methods as compared with those under group distribution condition 1. A visual inspection of these figures suggests that the bulk of RMSD came from both SEEs and bias. As a result, RMSD values resemble those from both SEEs and bias under this group distribution condition.

Under the TCF, ICF, and OD linking methods, values of SEEs, RMSDs, and bias for all three equating procedures are larger than those under the other two linking methods (i.e., M method and NOP method). This pattern is in opposition to its corresponding pattern under group distribution condition 1. Within each of the three linking methods, the AOSE procedure produced larger amounts of SEEs, RMSDs, and bias across the score scale than did the MOSE and ATSE procedures. The magnitude of the SEEs for the three equating procedures increased when scale scores reached both ends of the entire score scale. The magnitude of the RMSDs for the
three equating procedures increased when scale scores reached both ends of the entire score scale and peaked at a raw score equal to 35. Within each of the three linking methods, all three equating procedures appeared to have negative bias at the lower tail, but smoothly approached a positive bias throughout the entire middle score range. Then, bias increased and peaked around a raw score of 35 or 36. Among the three equating procedures, the ATSE procedure had the least amount of bias across the entire score scale.

Under the M linking method, all three equating procedures had the least amount of SEEs, RMSDs, and bias as compared with those under all other linking methods. All three equating procedures had a very similar pattern of SEEs and RMSDs but with some minor variations. At the lower tail, their SEEs and RMSDs were comparatively high, but quickly decreased throughout the middle score range. Then at the upper tail, their RMSDs increased and peaked at a raw score around 35. Under the M linking method, the AOSE procedure produced larger amounts of SEEs, RMSDs, and bias across most of the score scale than did the MOSE and the ATSE procedures, but the MOSE procedure produced larger amounts of RMSD and bias than did the ATSE and the AOSE in the middle score range. The ATSE and the MOSE equating procedures did not diverge noticeably in the magnitude of SEEs, but the ATSE procedure produced smaller RMSDs than did the MOSE across the score scale. All three equating procedures appeared to have positive bias across the entire range of the score scale, except that the MOSE procedure produced negative bias at the lower tail. Compared with those produced by the MOSE and the AOSE procedures, values of bias from the ATSE were smaller and distributed more evenly.

Under the NOP linking method, all three equating procedures demonstrated the largest SEEs, RMSDs, and bias as compared with those under the TCF, ICF, and OD linking methods.
Overall, the magnitudes of SEEs and RMSDs started to increase from the lower end of the score scale and peaked at a raw score of approximately 33, except those from the AOSE procedure. The AOSE procedure produced slightly SEEs and RMSDs at the lower tail of the score scale than did the ATSE and the MOSE procedures. In general, all three equating procedures produced positive bias across most of the score scale, except for the MOSE procedure at the lower tail of the scale and the ATSE procedure at the upper tail of the scale. The magnitude of bias for the MOSE and the AOSE procedures started with a small amount at the lower tail of the scale, then smoothly increased throughout the entire middle score range and peaked at a raw score of around 35. The magnitude of bias for the ATSE procedure started with a large amount of positive bias at the lower tail, smoothly increased first then decreased throughout the entire middle score range, and then reached the lowest value at the upper tail of the scale.

Group Distribution Condition 3

In group distribution condition 3, the multivariate ability distribution has an ability mean of .5 and an ability standard deviation of .8, but otherwise matches the null distribution.

Equivalent score difference

Tables A-33 to A-37 and Figures B-11 to B-15 display the results of the equating mean score and score difference for all three MIRT equating methods across five linking procedures for group distribution condition 3 (i.e., group 3). In general, under group distribution condition 3 the equivalent score differences for the three equating procedures greatly increased across the five linking methods as compared with those under group distribution condition 1. This was likely due to the group mean changes of this condition, as compared with group distribution conditions 1 and 2. All three equating procedures appeared to have negative score differences throughout the entire score range. The largest absolute equivalent score difference occurred around raw score 18.
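
As a minimal illustration of how an equivalent score difference curve and the location of its largest absolute value might be computed, the Python sketch below subtracts criterion equivalents from equated scores point by point. The function names and the five-point score scale are hypothetical and chosen only for illustration; they are not values from this study, where the criterion is the Frequency Estimation Method.

```python
def score_differences(equated, criterion):
    """Equivalent score difference at each raw score: equated minus criterion."""
    if len(equated) != len(criterion):
        raise ValueError("equated and criterion must cover the same raw scores")
    return [e - c for e, c in zip(equated, criterion)]


def largest_absolute_difference(differences, first_raw_score=0):
    """Return (raw score, signed difference) where |difference| is largest."""
    index = max(range(len(differences)), key=lambda i: abs(differences[i]))
    return first_raw_score + index, differences[index]


# Hypothetical equated and criterion equivalents for raw scores 0 to 4.
equated = [0.0, 0.8, 1.5, 2.1, 3.6]
criterion = [0.0, 1.0, 2.0, 3.0, 4.0]

diffs = score_differences(equated, criterion)
raw_score, diff = largest_absolute_difference(diffs)
print(f"largest |difference| at raw score {raw_score}: {diff:.2f}")
```

Keeping the sign of the difference matters for interpretation: a curve that is negative throughout, as reported for condition 3, means the equated scores fall below the criterion equivalents at every raw score.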

The equating procedures under the TCF, the ICF, and the OD linking methods produced smaller amounts of equivalent score difference as compared with those under the M and NOP linking methods. Under the TCF, ICF, and OD linking methods, all three equating procedures showed a similar pattern, but the ATSE procedure performed best with the least amount of equivalent score differences across the score scale. Alternately, the MOSE procedure performed worst with the largest amount of equivalent score differences across the score scale.

The equivalent score differences produced by all equating procedures under the M and NOP linking methods exceeded the fixed vertical scales set for score differences, which indicates that group mean difference had a huge effect on the equating results for all three equating procedures. Under the M linking method, the equating procedures had the second largest amount of equivalent score difference, followed by those under the NOP linking method. In general, all three equating procedures produced a similar pattern for equivalent score difference changes. The MOSE procedure performed the worst with the largest equivalent score difference across the score scale as compared with the other two procedures. The ATSE procedure performed best among the three MIRT equating procedures with the fewest differences between its equivalent scores and equivalent scores from the criterion equating function.

Under the NOP linking method, all three equating procedures performed similarly with high negative values of equivalent score difference. Although the ATSE procedure demonstrated the smallest equivalent score difference across the score scale, the equivalent score differences for all three equating procedures exceeded the fixed vertical scales set for score differences.

SEE, RMSD, and Bias

Tables A-93 to A-97 and Figures B-71 to B-75 show the results of SEE, RMSD, and bias statistics across MIRT linking methods for all MIRT equating methods for group distribution
condition 3 (i.e., group 3). In general, under group distribution condition 3 the three equating procedures produced extremely large values of SEE, RMSD, and bias across the five linking methods as compared with those under group distribution conditions 1 and 2. This is likely due to the group mean changes of this condition as compared with the null group distribution condition. A visual inspection of these figures suggests that the bulk of RMSD comes from bias instead of from SEEs. As a result, RMSD values resemble those from bias, but with positive values under this group distribution condition. The values of RMSDs and bias produced by all the equating procedures under this ability group condition exceeded the fixed vertical scales set for RMSDs and bias, which indicates that group mean difference had a huge effect on the equating results for all three equating procedures.

Under the TCF and ICF methods, values of SEEs for all three equating procedures were larger than those produced by the other three linking methods (i.e., the OD, M method, and NOP method), but values for RMSDs and bias were comparatively smaller than those under the other three linking methods. The magnitude of the SEEs started very low at the lower tail of the score range, then increased throughout the entire middle score range and reached its maximum around raw score 20. Within each of the three linking methods, the ATSE procedure produced smaller amounts of SEEs, RMSDs, and bias across the score scale than did the MOSE and the AOSE procedures. Among all three equating procedures, the AOSE procedure produced the largest SEEs, but the MOSE had the worst equating performance in terms of largest RMSDs and bias. Within each of the three linking methods, all three equating procedures appeared to have negative bias throughout the entire score scale, with the absolute values of the bias increasing throughout the middle score range, then decreasing when the score range approached the upper
tail. Among the three equating procedures, the ATSE procedure had the least amount of bias across the score scale.

Under the M, OD, and NOP linking methods, all three equating procedures had smaller amounts of SEEs and larger amounts of RMSDs and bias as compared with those under the other two linking methods. All three equating procedures had a very similar pattern of SEEs, RMSDs, and bias but with some minor variations. At the lower tail, the SEEs were comparatively low, except for the AOSE procedure, but quickly increased throughout the middle score range and peaked at a raw score of around 39. Under these three linking methods, the MOSE procedure produced larger amounts of RMSDs and bias across most of the score scale than did the ATSE and the AOSE procedures.

Group Distribution Condition 4

In group distribution condition 4, the multivariate ability distribution has a correlation between ability dimensions of .5, but otherwise matches the null distribution.

Equivalent score difference

Tables A-38 to A-42 and Figures B-16 to B-20 display the results of the equating mean score and score difference for all three MIRT equating methods across five linking procedures for group distribution condition 4 (i.e., group 4). In general, the three equating procedures showed larger equivalent score differences across the five linking methods as compared with those under the null group distribution. This is likely due to the correlation between ability dimensions in this condition.

Under the ICF, NOP, and OD linking methods, the equating procedures produced a smaller amount of equivalent score difference as compared with those under the M and TCF linking methods. Under these three linking methods, the MOSE procedure demonstrated a pattern of equivalent score difference in opposition to that demonstrated by the ATSE and AOSE
procedures. The equivalent score difference for the MOSE procedure increased first at the lower tail of the score range and peaked around raw score 10, then decreased throughout the rest of the score range. Alternately, the equivalent score differences for the ATSE and the AOSE procedures started increasing at the lower tail of the score range and peaked around raw score 35, then decreased at the upper tail of the score scale.

Under the M and TCF linking methods, the equating procedures produced a larger amount of equivalent score differences than those under the ICF, NOP, and OD linking methods. The three equating procedures had a similar pattern of equivalent score difference, starting with positive equivalent score differences at the lower tail of the score range, then increasing and peaking around raw score 10. Later the equivalent score differences decreased and turned negative until they reached their lowest point around raw score 35, then returned to zero at the upper tail of the score scale. The ATSE procedure performed best with the least amount of equivalent score differences across the score scale. Under the M and TCF linking methods, the MOSE procedure and the AOSE procedure did not differ from each other, producing a similar amount of equivalent score difference across the score scale.

SEE, RMSD, and Bias

Tables A-98 to A-102 and Figures B-76 to B-80 show the results of SEE, RMSD, and bias statistics across MIRT linking methods for all MIRT equating methods for group distribution condition 4 (i.e., group 4). In general, under group distribution condition 4 the three equating procedures produced larger amounts of SEE, RMSD, and bias across the five linking methods as compared with those under the null group distribution (condition 1). This is likely due to the correlation between ability dimensions of this condition, as compared with no correlation between ability dimensions under the null group distribution condition. A visual inspection of these figures suggests that the bulk of RMSD came primarily from bias. As a result, these figures
of RMSD resemble those from bias, but with positive values under this group distribution condition. In general, the three equating procedures produced a similar pattern of SEEs across the five linking methods, but the AOSE had the worst equating performance with the largest SEEs throughout the range of the score scale. The ATSE procedure had the least amount of RMSDs and bias as compared with the other two equating procedures under all five linking methods.

Under the M and ICF linking methods, the three equating procedures produced smaller SEEs than those under the other three linking methods (i.e., OD, TCF method, and NOP method). The magnitude of the SEEs started very low at the lower tail of the score range, increased briefly, and then decreased throughout the entire middle score range. Then they increased again and peaked around raw score 36. Under the ICF linking method, all three equating procedures produced the least amount of RMSDs as compared with those under the other four linking methods. Within each of the two linking methods, the ATSE procedure produced smaller amounts of SEEs, RMSDs, and bias across the score scale than did the MOSE and the AOSE procedures. Among all three equating procedures, the AOSE procedure produced the largest SEEs, but the MOSE demonstrated the worst equating performance at the lower end of the score scale in terms of largest RMSDs and bias. When approaching the upper end of the score scale, the AOSE procedure demonstrated the worst equating performance.

Under the ICF linking method, the AOSE and the ATSE procedures appeared to have negative bias at the lower end of the score scale, then it increased throughout the middle of the score range and peaked around raw score 37. Conversely, the MOSE procedure started with negative bias at the lower end of the score scale, then it quickly increased to positive and peaked at raw score 10. The values of bias of the MOSE procedure then decreased from their peak in the middle score range and reached
their lowest value around raw score 34. Finally, the values of bias of the MOSE procedure returned to approximately zero when approaching the upper tail of the score range.

Under the M linking method, the three equating procedures appeared to have negative bias at the lower end of the score scale. Then the values of bias increased throughout the middle score range and peaked between raw score 8 and raw score 12. After that change, the values of bias dropped through the middle of the score range and reached their lowest value around raw score 36. Finally, the values of bias returned to approximately zero when approaching the upper tail of the score range.

Under the TCF, OD, and NOP linking methods, all three equating procedures had larger amounts of SEEs and RMSDs as compared with those under the other two linking methods. Within each of the three linking methods, all three equating procedures had a very similar pattern of SEEs, RMSDs, and bias but with some minor variations. At the lower tail, their SEEs and RMSDs were comparatively low (except for the MOSE procedure under the NOP linking method) but quickly increased throughout the middle score range and then peaked at a raw score of approximately 10. After that change, the values of SEEs and RMSDs dropped from their peak in the middle score range and reached their lowest value around raw score 20. Then, the values of bias increased again, peaking at a raw score around 35. Finally, the values of SEEs and RMSDs dropped as the upper tail of the score range was approached.

Under the NOP and OD linking methods, the AOSE and the ATSE procedures appeared to have negative bias at the lower end of the score scale, then increased throughout the middle score range and peaked around raw score 37. Conversely, the MOSE procedure started with negative bias at the lower end of the score scale, then quickly increased to positive and peaked at raw score 10. The values of bias for the MOSE procedure then decreased throughout the middle of the score range
and reached their lowest value around raw score 34. Finally, the values of bias of the MOSE procedure returned to approximately zero when the score range approached the upper tail. Under the TCF linking method, the patterns of bias for the three equating procedures were similar to those under the M linking method.

Group Distribution Conditions 5 and 6

In group distribution condition 5, the multivariate ability distribution has a correlation between ability dimensions of .5 and an ability mean of .5, but otherwise matches the null distribution. In group distribution condition 6, the multivariate ability distribution has a correlation between ability dimensions of .5, an ability standard deviation of .8, and an ability mean of .5, but otherwise matches the null distribution.

In general, under group distribution conditions 5 and 6, the patterns of equivalent score difference, SEEs, RMSDs, and bias for the three equating procedures are similar to those under group distribution condition 3, with minor variations. This indicates that group ability mean difference had an extremely large effect on the equating results for all three equating procedures as compared with group ability standard deviation difference and correlation between ability dimensions. Because of the large effect of group ability mean difference, the influence of the other two group ability changes on the patterns of equivalent score difference, SEEs, RMSDs, and bias became inconsequential. Since the patterns of equivalent score difference, as well as the SEEs, RMSDs, and bias for the three equating procedures across the different linking methods, are similar to those under group distribution condition 3, the interpretations for these two conditions are omitted from this chapter. However, results are provided in the Appendix in Tables A-43 to A-52
and in Figures B-21 to B-30 (i.e., equivalent score difference), and also in Tables A-103 to A-112 and in Figures B-81 to B-90 (i.e., SEEs, RMSDs, and bias).
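
The six group distribution conditions described in this chapter can be summarized as bivariate normal parameter sets. The Python sketch below encodes them and draws ability vectors using only the standard library. It rests on assumptions not confirmed by this chapter: the stated mean and standard deviation are applied equally to both ability dimensions, and the names `CONDITIONS`, `covariance_matrix`, and `sample_abilities` are hypothetical. It is an illustration of the conditions, not the simulation code used in the study.

```python
import math
import random

# (mean, standard deviation, correlation) for the equated group, per condition.
# Assumption: the same mean and SD apply to both ability dimensions.
CONDITIONS = {
    1: (0.0, 1.0, 0.0),  # null condition, matches the baseline group
    2: (0.0, 0.8, 0.0),  # standard deviation change only
    3: (0.5, 0.8, 0.0),  # mean and standard deviation change
    4: (0.0, 1.0, 0.5),  # correlation between ability dimensions only
    5: (0.5, 1.0, 0.5),  # mean change and correlation
    6: (0.5, 0.8, 0.5),  # mean, standard deviation, and correlation
}


def covariance_matrix(sd, rho):
    """2 x 2 covariance matrix for equal SDs and correlation rho."""
    var = sd * sd
    return [[var, rho * var], [rho * var, var]]


def sample_abilities(condition, n, seed=0):
    """Draw n bivariate normal ability vectors for one condition."""
    mean, sd, rho = CONDITIONS[condition]
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Cholesky-style construction of two correlated standard normals.
        theta1 = mean + sd * z1
        theta2 = mean + sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        draws.append((theta1, theta2))
    return draws


abilities = sample_abilities(condition=6, n=20000, seed=1)
mean_theta1 = sum(t[0] for t in abilities) / len(abilities)
print(covariance_matrix(0.8, 0.5), round(mean_theta1, 2))
```

Listing the conditions side by side makes the grouping used in the text visible: conditions 3, 5, and 6 are the ones with a group mean shift, which is where the chapter reports the large increases in equivalent score difference, RMSD, and bias.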

CHAPTER 5
DISCUSSION AND CONCLUSION

In this chapter, the goals of this study are revisited and study results are described and discussed within the context of those goals. Next, a discussion of the factors that may have affected results is presented. Also, limitations of the study are discussed and suggestions for further research in MIRT equating are proposed. Lastly, major conclusions are presented.

The primary goals of this study were to evaluate the performance of MIRT equating procedures under the NEAT design, and to explore how different MIRT linking methods interact with MIRT equating procedures to impact equating results under various testing conditions. Results were expected to provide some guidelines about which MIRT equating method might be more accurate under various testing conditions.

Previous literature (i.e., Brossman, 2010) has mainly focused on comparing equating study, research results indicated that different UIRT equating procedures performed similarly, different MIRT equating procedures performed similarly, but that UIRT equating procedures and MIRT equating procedures performed differently from one another. Additionally, MIRT equating procedures tended to perform more similarly to the equipercentile equating procedure multidimensional and thus using UIRT procedures may have introduced systematic bias into the equating process, the results obtained matched what the author had predicted. Brossman also found that 1) results from MIRT equating procedures were comparable to those from traditional equipercentile equating and 2) all three MIRT equating procedures performed similarly under the random equivalent groups (EG) design. Furthermore, he claimed that the ATSE procedure performed more similarly to the equipercentile equating procedure than either of the
PAGE 161

161 multidimensional observed score equating procedures (i.e., MOSE and AOSE), but that the differences between the MIRT observed score equating procedures and the ATSE procedure were minimal. This was likely due to the randomly equivalent group design used in his study. However, evaluating MIRT equating under the NEAT design is not as st rai ghtforward as it is in the randomly equivalent group design. First, existing population differences under the NEAT design may have a huge effect on the linking and equating results. Second, unlike MIRT linking under the randomly equivalent group design, tr anslation and dilation must be considered for MIRT linking procedures under the NEAT design. Third, the non equivalent groups are non equivalent not only because of different ability levels, but also because of the correlation between ability dimensions. A ccordingly, the errors from MIRT equating and from MIRT linking procedures are confounded and may not be separated. Thus, MIRT equating procedures must be evaluated and compared within each population condition. Equivalent Score Difference In this study, M IRT equating under the NEAT design was compared under twelve conditions with six different group distributions across two different test structures. From these results, some characteristics of MIRT equating under the NEAT design can be identified. First of all, test structure and all the interactions including test structure had a very small effect on equating results. Second, among all three group distribution factors (i.e., group mean, correlation and standard deviation), the group mean factor influenced equating results the most. The group correlation factor and the group standard deviation factor had a similar level of effect on the equating results, but their impact was not as large as the group mean factor. Third, the interaction of the group distribut ion change and the type of MIRT linking methods had a huge effect on the equating results. 
Fourth, the interaction of the group distribution change and the type of MIRT equating procedures also had a large effect on the equating results.


For the MIRT linking methods, all three equating procedures performed better under the TCF and the ICF methods when there was a significant group distribution difference. When group distribution change existed, the equating results had smaller discrepancies under the OD and the M methods than they did under the NOP method. The equating procedures under the NOP method had the lowest robustness when there were group distribution shape differences. This was consistent with the results found in previous literature (Simon, 2008). Simon found that the M method performed poorly with non-equivalent groups, and although the M method had smaller root mean square errors (RMSE) than the TCF method when groups were equivalent, the TCF method had a smaller item discrimination parameter RMSE than the M method with non-equivalent groups. Moreover, the MIRT equating procedures had the worst performance under the NOP linking method, with larger score differences across the score scale.

In this study, some interesting results were found by comparing MIRT equating procedures with the criterion equating function within each population condition through equivalent score differences. The results of the comparison among the three MIRT equating procedures were obtained by averaging equating results from each MIRT equating procedure across all linking methods. First, the ATSE procedure demonstrated, overall, the best equating performance as compared with the other two equating procedures (i.e., MOSE and AOSE) across all group distribution conditions. Second, all three MIRT equating procedures performed comparatively well when no group mean difference existed, especially under the null condition. Third, the MOSE procedure performed better than the AOSE procedure in terms of the equivalent score difference across the score scale when no group distribution mean change existed. Fourth, both the ATSE procedure and the AOSE procedure outperformed the MOSE procedure when there were group distribution mean changes. However, when a group mean difference existed, the equating results for all three MIRT equating procedures had larger discrepancies than those under conditions with no group mean differences. Fifth, the ATSE procedure performed better than the other two equating procedures when only the group standard deviation changed or only correlation existed between group ability dimensions. When both group standard deviation and mean differed, the discrepancies between equivalent scores from all three MIRT equating procedures and equivalent scores from the criterion equating function greatly increased, but the ATSE procedure still outperformed the other two equating procedures in terms of smaller equivalent score differences. Sixth, the discrepancies between equivalent scores from all three MIRT equating procedures and equivalent scores from the criterion equating function greatly increased (and the ATSE procedure again outperformed the other two equating procedures) under the condition with both a group mean difference and correlation between group ability dimensions, and under the condition with all group differences (i.e., correlation between group ability dimensions, group mean difference, and group standard deviation difference). This pattern was similar to that of the condition with only a group mean difference.

Because MIRT observed score equating and MIRT true score equating are defined differently, the observed score and the true score equating procedures are not expected to perform similarly even under ideal conditions. According to Brossman (2010), because the equipercentile equating procedure is an observed score procedure, it is expected that the MIRT observed score procedures would perform more similarly to the equipercentile equating procedure than the MIRT true score procedure. This might be true under the EG design. However, when the two groups were non-equivalent, the ATSE procedure had, overall, the best equating performance as compared with frequency estimation equating results in this study. This was true even though the frequency estimation equating procedure is an observed score procedure. It is currently unknown why the ATSE procedure has the best equating performance among all three MIRT equating procedures, even when compared with the observed score criterion equating function. This result might be caused by the group ability non-equivalence. More specifically, since the MIRT true score procedure is sample invariant and MIRT observed score procedures may be influenced by sample variation, it is possible that the MIRT true score procedure outperforms the MIRT observed score equating procedures when group non-equivalence exists, as it does with the NEAT design.

Standard Error of Equating

The Standard Error of Equating (SEE) is computed to determine the magnitude of random error within each equating procedure that is due to the sampling of examinees. In general, the results from this study indicated that the magnitude of SEEs for all MIRT equating procedures across all linking methods increased as group distributions changed, especially when a group mean difference existed. First, with no group distribution change, the AOSE procedure produced larger SEEs across the score scale than did the MOSE and the ATSE procedures under all five linking methods. The magnitude of the SEEs increased as scores approached both ends of the score scale. Under the null condition, the ATSE and the MOSE equating procedures did not display an obvious divergence in the magnitude of SEEs. Second, when only the group standard deviation changed or only correlation existed between group ability dimensions, the three equating procedures produced larger SEEs across the five linking methods as compared with those under the condition with no group distribution difference. The magnitude of the SEEs for the three equating procedures increased when scores approached both ends of the score scale. The AOSE procedure produced larger SEEs across the score scale than did the MOSE and the ATSE procedures. The ATSE and the MOSE equating procedures behaved similarly and did not display an obvious divergence in the magnitude of SEEs. Under the NOP linking method, all three equating procedures demonstrated the largest SEEs as compared with those under the M, the TCF, the ICF, and the OD linking methods. Third, when there was a significant mean difference between groups, the three equating procedures produced extremely large SEEs across the five linking methods as compared with those under the group conditions without group mean differences. The magnitude of SEEs for the three equating procedures was similar across different MIRT linking methods, with minor variations. Among the three equating procedures, the AOSE procedure produced the largest SEEs and the ATSE procedure produced the smallest SEEs.

RMSD

The Root Mean Square Deviation (RMSD) is a measure of the overall equating accuracy of an equating procedure. In general, the RMSD results from this study indicate that the magnitudes of RMSDs for all three equating procedures across all linking methods increased as the group distribution changed, especially when a group mean difference existed. Overall, the ATSE procedure outperformed the other two equating procedures, with the smallest RMSDs across the entire score scale under different linking methods and group distribution changes. The AOSE procedure produced larger RMSDs across the score scale than did the MOSE and the ATSE procedures when no group mean difference existed between groups. Across all five linking methods, all three equating procedures displayed the largest RMSDs under the NOP linking method as compared with those under the TCF, the ICF, the M, and the OD linking methods.
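As a rough illustration of how the three error indices discussed in this chapter relate to one another, the sketch below computes bias, SEE, and RMSD at each raw score point from a set of replicated equatings. It assumes the conventional definitions (bias as the mean deviation from the criterion equivalent, SEE as the standard deviation of the equated scores over replications, and RMSD as the root of the mean squared deviation from the criterion). The function name, the array layout, and the toy numbers are hypothetical and are not the data or the code used in this study.

```python
import numpy as np


def equating_error_indices(equated, criterion):
    """Return (bias, see, rmsd) at each score point.

    equated   : array of shape (replications, score_points), hypothetical
                equated scores from repeated simulated equatings
    criterion : array of shape (score_points,), criterion equivalents
    """
    equated = np.asarray(equated, dtype=float)
    criterion = np.asarray(criterion, dtype=float)

    # Systematic error: mean equated score minus the criterion equivalent.
    bias = equated.mean(axis=0) - criterion
    # Random error: spread of equated scores over replications (ddof=0).
    see = equated.std(axis=0)
    # Overall error: root mean squared deviation from the criterion.
    rmsd = np.sqrt(((equated - criterion) ** 2).mean(axis=0))
    return bias, see, rmsd


# Toy example: three replications at three score points (made-up values).
criterion = np.array([0.0, 10.0, 20.0])
equated = np.array([[0.5, 10.0, 19.0],
                    [1.5, 11.0, 21.0],
                    [1.0, 12.0, 20.0]])
bias, see, rmsd = equating_error_indices(equated, criterion)
print(bias, see, rmsd)
```

Under these assumed definitions the squared RMSD equals the squared bias plus the squared SEE at every score point, which is one way to read the pattern in which a procedure with large SEEs or large bias also shows large RMSDs.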


When a group distribution mean change existed, the three equating procedures produced extremely large RMSDs across the five linking methods as compared with those under the group conditions without group mean changes. The ATSE procedure produced smaller RMSDs across the score scale than the MOSE and the AOSE procedures. Among all three equating procedures, the MOSE procedure had the worst equating performance in terms of the largest RMSDs.

Bias

Equating bias is computed to determine the magnitude of systematic error within each equating procedure. Although the patterns of bias for the three MIRT equating procedures under specific group distribution conditions were not as obvious and consistent as the patterns for SEEs and RMSDs, one can still see an overall trend based on study results. In general, under most situations across all group distribution conditions and MIRT linking methods, the bias results from this study indicate that the ATSE procedure performed best among the three MIRT equating procedures, with smaller and more evenly distributed values of bias as compared with those from the MOSE and AOSE procedures.

Effects from IRT Estimation

The first possible effect that could influence MIRT equating results is the MIRT estimation process itself. In this study, item calibration was done by using TESTFACT to obtain MIRT parameter estimates. TESTFACT provides two types of rotation solutions in its IRT estimation [...] rotation solution is recommended because the calibrated item parameter estimates can be rotated [...] in this study. [...] controversial interpreted item parameter estimates (Li & Lissitz, 2000); on the other hand, under such oblique rotation the overall discrimination power for each item (which is related to the geometric length of the item as represented in multidimensional space) may change, since each item is rotated obliquely. Furthermore, the MIRT difficulty parameter (MDIFF) may vary accordingly. That is, it is possible that the direction of best measurement for the [...] may affect the MIRT linking process in the next step.

Since most constructs and dimensions within a construct are correlated in education and psychology, correlated ability dimension conditions are included in this study. Because of the characteristics of the IRT estimation in TESTFACT, correlations among item scores are accounted for solely by the a-parameters (Li, 1997; Reckase, 1997; Wei, 2008). Therefore, by [...] item scores may not be solely accounted for by the a-parameters. This may also affect the MIRT linking process. It is currently unknown which rotation solution will have a greater effect on MIRT equating procedure performance. Furthermore, the amount of error due to item parameter estimates from the MIRT estimation was not examined separately from equating errors in this study. Therefore, further investigation with regard to how MIRT estimation affects equating results is warranted.

Effects from IRT Linking Methods

The process of MIRT linking may also affect MIRT equating results. As mentioned in Chapter 2, different linking methods apply different types of rotation, dilation, and translation approaches. Also, different linking methods utilize different types of optimization approaches in the MIRT linking process. Therefore, applying different MIRT linking methods may result in different equating performance, even within the same MIRT equating procedure. In the TCF, the ICF, and the OD linking methods, the rotation matrix and translation vector are optimized simultaneously through their linking process. In the M linking method, only the rotation matrix is optimized, by minimizing the trace of the product of the least squares difference matrix and its transpose. In the NOP linking method, no optimization process is applied and the MIRT linking process is done solely through the non-orthogonal Procrustes procedure. Thus, the number of optimization processes included in the linking methods may be an underlying reason for differences in equating performance. Accordingly, the fact that both the rotation matrix and the translation vector are optimized in the linking process may explain why the MIRT equating procedures under the TCF, the ICF, and the OD linking methods demonstrate better equating performance than those under the M and NOP linking methods. It is also possible that because no optimization process is applied in the NOP process, the MIRT equating procedures under the NOP linking method demonstrate the worst equating performance among all MIRT linking methods.

In this study, the amount of error due to MIRT linking was not examined separately from equating errors. Therefore, further investigation with regard to the impact of different MIRT linking methods on equating results is suggested.

Limitation and Future Research Direction

This study is the first simulation study to evaluate the performance of MIRT equating procedures. Specifically, this study explores the performance of multiple MIRT equating procedures under the NEAT design. Equating under the NEAT design could have employed more comprehensive factors. Among them, more sophisticated combinations of different populations could have been included. Since it was impossible to include all these factors, only a few of them were considered; thus, this study is limited by the restricted number of conditions considered.

Also, the IRT software used in this study was TESTFACT. As mentioned earlier in this chapter, TESTFACT provides only limited options for rotation solutions in its IRT estimation process. Problems may have been created by the selection of rotation type in the process of MIRT item calibration. Future research should consider additional choices for rotation in the MIRT estimation process, from different MIRT software. Moreover, when there are high correlations between ability dimensions and non-equivalent groups, the choice of a rotation solution for MIRT item calibration becomes much more complex. Better estimation procedures with correlated ability dimensions and non-equivalent groups are needed. Thus, the possibility of using multiple programs for MIRT (Mplus, BMIRT, and IRTPRO) needs to be considered in future studies.

Additionally, the synthetic population weights for both the equated group and the base group (to define the target population) were set as [...] and [...] only in this study. Although the choice of weights makes little practical difference in most real equating applications according to previous equating research (Kolen & Brennan, 2004), research on synthetic population weights has focused only on the unidimensional situation. No investigation of synthetic population weights applied to a multidimensional equating situation has been conducted. Further, in the NEAT design the target population for equating is usually a hypothetical population. This hypothetical population is made up of two or more target test-taking populations, and it is assumed that the sample groups of examinees are drawn from each target population, respectively. The hypothetical population is thus a combination of all non-equivalent target populations from which the samples of examinees are drawn (von Davier, Holland & Thayer, 2004). However, even though the hypothetical population is generated by applying the idea of synthetic population weights, the population invariance requirement may be violated if the resulting equating functions for different target populations are different enough (von Davier, Holland & Thayer, 2004). Thus, the extent to which 1) the difference between target populations and 2) the choice of synthetic population weights affect the performance of MIRT equating procedures needs further exploration and should be considered in future research.

The ability mean difference between groups had the greatest influence on the equating results for all three equating procedures across all linking methods. This is likely due to the violation of the population invariance requirement for equating. It may also have been affected by the fact that no translation is involved in the optimization process in any of the linking methods, so that the adjustment process in MIRT linking may not work effectively.

The criterion equating used in this study was frequency estimation equating, which is based on an observed score equating framework under the NEAT design. This might disadvantage the MIRT equating methods, which are based on an IRT equating framework. As a general rule, using a frequency estimation method as the criterion equating function when evaluating different equating methods is questionable. It is therefore worth considering replicating this study using some other method for the criterion equating, especially one that does not favor any of the equating methods under study.
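The point raised under Effects from IRT Estimation, that an oblique rotation may change an item's overall discrimination (the geometric length of its item vector) whereas an orthogonal rotation leaves that length unchanged, can be checked numerically. The sketch below is only an illustration under stated assumptions: the item vector, the rotation angle, and the non-orthogonal matrix are hypothetical, and a generic matrix is applied directly to the discrimination vector rather than the exact transformation used by any particular linking method.

```python
import numpy as np


def mdisc(a):
    """Overall discrimination, taken here as the Euclidean length of a."""
    return float(np.linalg.norm(a))


# Hypothetical two-dimensional discrimination vector (illustrative only).
a = np.array([0.990, 0.990])

# Orthogonal rotation by 30 degrees: columns are orthonormal.
angle = np.deg2rad(30.0)
orthogonal = np.array([[np.cos(angle), -np.sin(angle)],
                       [np.sin(angle),  np.cos(angle)]])

# A non-orthogonal (oblique) transformation: columns are not orthonormal.
oblique = np.array([[1.0, 0.4],
                    [0.0, 1.0]])

a_orth = orthogonal @ a   # direction changes, length is preserved
a_obl = oblique @ a       # both direction and length change

print(mdisc(a), mdisc(a_orth), mdisc(a_obl))
```

In this toy case the individual discrimination values change under both transformations, but only the non-orthogonal one alters the vector length, which is the property at issue when oblique and orthogonal rotations are compared.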


Another limitation may be that the rotation in the MIRT linking process used in this study is controversial. On one hand, only orthogonal rotation in MIRT linking is recommended by [...] the discrimination parameters changed through the orthogonal rotation, the overall discrimination power and the MIRT difficulty parameter for each item remained the same. In [...] were addressed. He believed that the meaning of the reference axes could change after oblique rotation because the angles among axes are changed when finding the optimal rotation, while the orthogonal rotation maintains the initial structure of a reference system. Neither Brossman nor Min recommends using oblique rotation in the MIRT linking process, an approach that was proposed by Oshima, Davey and Lee (2000). On the other hand, based on the MIRT results obtained from this study, the MIRT equating procedures under the obliquely rotated TCF, ICF, and OD linking methods demonstrated better equating performance than those under the orthogonally rotated M linking method. It is not clear whether researchers need to maintain item vector structure through an orthogonal rotation, nor is it clear to what extent the oblique rotation used in the linking methods changes the vector structures so that the performance of the MIRT equating procedures is influenced. Therefore, further investigation into which types of rotation should be used in the MIRT linking process for MIRT equating is needed.

Lastly, it is worth noting that although test forms to be equated are typically designed to cover the same content domain, the multidimensional feature of some tests implies that different total scores across the entire score scale might carry different weights from different dimensions for each population. This may be true even though unidimensionalization is conducted in the process of obtaining total scores. In previous literature, a centroid plot (Reckase, 2009) is proposed to determine which trait(s) contribute the most towards differences in scores at various score points along the scale. Due to time and space limits, this issue was not discussed in this study. Therefore, further research into this issue in MIRT equating is recommended.

Conclusion

The primary goals of this study were to evaluate the performance of MIRT equating procedures under the NEAT design and to explore how different MIRT linking methods interacting with MIRT equating procedures might impact the equating results under various testing conditions. In this study, three MIRT equating procedures under the NEAT design were evaluated and investigated. There were twelve conditions with six different group distribution situations across two different test structures. Additionally, five linking methods were applied for the MIRT linking.

In conclusion, the ATSE equating procedure seemed to demonstrate the best performance (in terms of smaller magnitudes of equivalent score difference, SEEs, RMSDs, and bias) as compared with the other two equating procedures (i.e., the MOSE and AOSE procedures) across all group distribution conditions and all linking methods. Next, the MIRT equating procedures under the TCF, the ICF, and the OD linking methods showed better equating performance than the MIRT equating procedures under the M and the NOP linking methods. The MIRT equating procedures under the NOP linking method demonstrated the worst equating performance, with the largest values of equivalent score difference, SEEs, RMSDs, and bias within most of the group distribution conditions. This is likely due to how different linking methods are applied in the process of MIRT linking. Furthermore, compared with the other two group ability distribution factors, the group ability mean difference factor had the largest negative effect on the equating results for all three equating procedures across all linking methods.
Group ability standard deviation difference and correlation between ability dimensions had some effects on the equating results for the three equating procedures across all linking methods, but those effects were not as strong as those from group ability mean difference. This is likely due to the violation of the population invariance requirement caused by the group mean difference. It is also likely due to the ineffectiveness of the translation transformation (i.e., no optimization process is available for translation) in the process of MIRT linking. Finally, it was found that test structure had a very small effect on the equating results for all three equating procedures.

Further research is needed on the possibility of using multiple MIRT software programs, the choice of the synthetic population weights under the MIRT framework, the choice of different criterion equating functions, the selection of rotation type in the MIRT linking process for MIRT equating, and the extent to which different total scores carry different weights from different dimensions along the entire score scale.


APPENDIX A
TABLES

Table A-1. Multidimensional model used in previous MIRT linking/equating studies

Previous Studies                    MIRT Model
Hirsch (1989)                       M2PL
Davey, Oshima, Lee (1996)           M3PL
Thompson, Nering, Davey (1997)      M2PL
Li (1997), Li & Lissitz (2000)      M2PL
Oshima, Davey, Lee (2000)           M2PL
Min (2003), Min (2007)              M2PL
Reckase & Martineau (2004)          M2PL
Yao (2008)                          M3PL/M2PPC
Wei (2008)                          M2PL
Simon (2008)                        M3PL
Brossman (2010)                     M2PL

Note: M2PL = Multidimensional compensatory 2-parameter logistic model; M3PL = Multidimensional compensatory 3-parameter logistic model; M2PPC = Multidimensional compensatory 2-parameter partial credit model.

Table A-2. Linking design, test length and anchor test length in previous MIRT linking/equating studies

Studies                             Linking Design         Test Length        Anchor Length
Hirsch (1989)                       Common Examinees       40                 N/A
Davey, Oshima, Lee (1996)           NEAT                   40                 40
Thompson, Nering, Davey (1997)      Randomly Equivalent    200 (item pool)    N/A
Li (1997), Li & Lissitz (2000)      NEAT                   15, 25             15, 25
Oshima, Davey, Lee (2000)           NEAT                   40                 40
Min (2003), Min (2007)              NEAT                   20                 20
Reckase & Martineau (2004)          NEAT                   50                 25
Yon (2006)                          NEAT                   30                 30
Yao (2008)                          NEAT                   60                 15
Wei (2008)                          NEAT                   40/20              40/20
Simon (2008)                        NEAT                   60                 40/20
Brossman (2010)                     Randomly Equivalent    50/40/48           N/A


Table A-3. Test structure in previous MIRT linking/equating studies

Studies                             Test Structure
Hirsch (1989)                       N/A
Davey, Oshima, Lee (1996)           N/A
Thompson, Nering, Davey (1997)      N/A
Li (1997), Li & Lissitz (2000)      APSS/CS
Oshima, Davey, Lee (2000)           N/A
Min (2003), Min (2007)              APSS/CS
Reckase & Martineau (2004)          N/A
Yao (2008)                          CS
Wei (2008)                          APSS/CS
Simon (2008)                        CS
Brossman (2010)                     N/A

Note: APSS = Approximate simple structure; CS = Complex structure.

Table A-4. [...] study (2003)

Level    MDISC    MDIFF
1        0.4      1.5
2        0.8      1
3        1.2      0
4        1.6      1
5        2        1.5
Mean     1.2      0

Table A-5. Ten MIRT discrimination and difficulty levels

Level    MDISC    MDIFF
1        0.2      2.0
2        0.4      1.5
3        0.6      1.0
4        0.8      0.5
5        1        0.0
6        1.2      0.0
7        1.4      1
8        1.6      0.5
9        1.8      1.5
10       2.0      2.0
Mean     1.1      0.0


Table A-6. Test structure of base form unique test section (approximate simple structure)

Item    a1       a2       d        MDISC    MDIFF    α1    α2
1       0.200    0.014    0.400    0.200    2.000    4     86
2       0.390    0.090    0.600    0.400    1.500    13    77
3       0.589    0.114    0.600    0.600    1.000    11    79
4       0.796    0.084    0.400    0.800    0.500    6     84
5       0.993    0.122    0.000    1.000    0.000    7     83
6       1.169    0.270    0.000    1.200    0.000    13    77
7       1.374    0.267    1.400    1.400    1.000    11    79
8       1.599    0.056    0.800    1.600    0.500    2     88
9       1.799    0.063    2.700    1.800    1.500    2     88
10      1.992    0.174    4.000    2.000    2.000    5     85
11      0.048    0.194    0.400    0.200    2.000    76    14
12      0.000    0.400    0.600    0.400    1.500    90    0
13      0.000    0.600    0.600    0.600    1.000    90    0
14      0.180    0.779    0.400    0.800    0.500    77    13
15      0.174    0.985    0.000    1.000    0.000    80    10
16      0.249    1.174    0.000    1.200    0.000    78    12
17      0.122    1.395    1.400    1.400    1.000    85    5
18      0.195    1.588    0.800    1.600    0.500    83    7
19      0.126    1.796    2.700    1.800    1.500    86    4
20      0.209    1.989    4.000    2.000    2.000    84    6


Table A-7. Test structure of base form unique test section (complex structure)

Item    a1       a2       d        MDISC    MDIFF    α1    α2
1       0.194    0.048    0.400    0.200    2.000    14    76
2       0.390    0.090    0.600    0.400    1.500    13    77
3       0.593    0.094    0.600    0.600    1.000    9     81
4       0.790    0.125    0.400    0.800    0.500    9     81
5       0.974    0.225    0.000    1.000    0.000    13    77
6       0.946    0.739    0.000    1.200    0.000    38    52
7       1.147    0.803    1.400    1.400    1.000    35    55
8       1.226    1.028    0.800    1.600    0.500    40    50
9       1.474    1.032    2.700    1.800    1.500    35    55
10      1.638    1.147    4.000    2.000    2.000    35    55
11      0.100    0.173    0.400    0.200    2.000    60    30
12      0.235    0.324    0.600    0.400    1.500    54    36
13      0.401    0.446    0.600    0.600    1.000    48    42
14      0.503    0.622    0.400    0.800    0.500    51    39
15      0.500    0.866    0.000    1.000    0.000    60    30
16      0.084    1.197    0.000    1.200    0.000    86    4
17      0.073    1.398    1.400    1.400    1.000    87    3
18      0.139    1.594    0.800    1.600    0.500    85    5
19      0.374    1.761    2.700    1.800    1.500    78    12
20      0.382    1.963    4.000    2.000    2.000    79    11


Table A-8. Test structure of equated form unique item section (approximate simple structure)

Item    a1       a2       d        MDISC    MDIFF    α1    α2
1       0.194    0.048    0.400    0.200    2.000    14    76
2       0.398    0.035    0.600    0.400    1.500    5     85
3       0.599    0.031    0.600    0.600    1.000    3     87
4       0.779    0.180    0.400    0.800    0.500    13    77
5       0.996    0.087    0.000    1.000    0.000    5     85
6       1.193    0.125    0.000    1.200    0.000    6     84
7       1.392    0.146    1.400    1.400    1.000    6     84
8       1.594    0.139    0.800    1.600    0.500    5     85
9       1.796    0.126    2.700    1.800    1.500    4     86
10      1.975    0.313    4.000    2.000    2.000    9     81
11      0.017    0.199    0.400    0.200    2.000    85    5
12      0.069    0.394    0.600    0.400    1.500    80    10
13      0.063    0.597    0.600    0.600    1.000    84    6
14      0.042    0.799    0.400    0.800    0.500    87    3
15      0.139    0.990    0.000    1.000    0.000    82    8
16      0.000    1.200    0.000    1.200    0.000    90    0
17      0.049    1.399    1.400    1.400    1.000    88    2
18      0.387    1.552    0.800    1.600    0.500    76    14
19      0.188    1.790    2.700    1.800    1.500    84    6
20      0.347    1.970    4.000    2.000    2.000    80    10


Table A-9. Test structure of equated form unique item section (complex structure)

Item    a1       a2       d        MDISC    MDIFF    α1    α2
1       0.196    0.042    0.400    0.200    2.000    12    78
2       0.386    0.104    0.600    0.400    1.500    15    75
3       0.597    0.063    0.600    0.600    1.000    6     84
4       0.796    0.084    0.400    0.800    0.500    6     84
5       0.993    0.122    0.000    1.000    0.000    7     83
6       0.983    0.688    0.000    1.200    0.000    35    55
7       1.118    0.843    1.400    1.400    1.000    37    53
8       1.278    0.963    0.800    1.600    0.500    37    53
9       1.510    0.980    2.700    1.800    1.500    33    57
10      1.509    1.312    4.000    2.000    2.000    41    49
11      0.134    0.149    0.400    0.200    2.000    48    42
12      0.268    0.297    0.600    0.400    1.500    48    42
13      0.300    0.520    0.600    0.600    1.000    60    30
14      0.424    0.678    0.400    0.800    0.500    58    32
15      0.643    0.766    0.000    1.000    0.000    50    40
16      0.146    1.191    0.000    1.200    0.000    83    7
17      0.219    1.383    1.400    1.400    1.000    81    9
18      0.195    1.588    0.800    1.600    0.500    83    7
19      0.435    1.747    2.700    1.800    1.500    76    14
20      0.313    1.975    4.000    2.000    2.000    81    9


Table A-10. Test structure of anchor item section (approximate simple structure)

Item    a1       a2       d        MDISC    MDIFF    α1    α2
1       0.198    0.028    0.400    0.200    2.000    8     82
2       0.397    0.049    0.600    0.400    1.500    7     83
3       0.600    0.010    0.600    0.600    1.000    1     89
4       0.785    0.153    0.400    0.800    0.500    11    79
5       0.996    0.087    0.000    1.000    0.000    5     85
6       1.169    0.270    0.000    1.200    0.000    13    77
7       1.369    0.291    1.400    1.400    1.000    12    78
8       1.576    0.278    0.800    1.600    0.500    10    80
9       1.747    0.435    2.700    1.800    1.500    14    76
10      1.941    0.484    4.000    2.000    2.000    14    76
11      0.024    0.199    0.400    0.200    2.000    83    7
12      0.056    0.396    0.600    0.400    1.500    82    8
13      0.135    0.585    0.600    0.600    1.000    77    13
14      0.097    0.794    0.400    0.800    0.500    83    7
15      0.225    0.974    0.000    1.000    0.000    77    13
16      0.249    1.174    0.000    1.200    0.000    78    12
17      0.315    1.364    1.400    1.400    1.000    77    13
18      0.195    1.588    0.800    1.600    0.500    83    7
19      0.343    1.767    2.700    1.800    1.500    79    11
20      0.209    1.989    4.000    2.000    2.000    84    6


Table A-11. Test structure of anchor item section (complex structure)

Item    a1       a2       d        MDISC    MDIFF    α1    α2
1       0.200    0.014    0.400    0.200    2.000    4     86
2       0.395    0.063    0.600    0.400    1.500    9     81
3       0.582    0.145    0.600    0.600    1.000    14    76
4       0.797    0.070    0.400    0.800    0.500    5     85
5       0.988    0.156    0.000    1.000    0.000    9     81
6       1.029    0.618    0.000    1.200    0.000    31    59
7       0.990    0.990    1.400    1.400    1.000    45    45
8       1.189    1.071    0.800    1.600    0.500    42    48
9       1.438    1.083    2.700    1.800    1.500    37    53
10      1.532    1.286    4.000    2.000    2.000    40    50
11      0.100    0.173    0.400    0.200    2.000    60    30
12      0.218    0.335    0.600    0.400    1.500    57    33
13      0.336    0.497    0.600    0.600    1.000    56    34
14      0.535    0.595    0.400    0.800    0.500    48    42
15      0.656    0.755    0.000    1.000    0.000    49    41
16      0.249    1.174    0.000    1.200    0.000    78    12
17      0.219    1.383    1.400    1.400    1.000    81    9
18      0.278    1.576    0.800    1.600    0.500    80    10
19      0.313    1.773    2.700    1.800    1.500    80    10
20      0.000    2.000    4.000    2.000    2.000    90    0
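Each row of the item tables above appears to list two discrimination parameters, an intercept, MDISC, MDIFF, and two angles in degrees. As a hedged check on that reading, the sketch below rebuilds the first three values from the last four, assuming the usual two-dimensional compensatory relations: each discrimination equals MDISC times the cosine of the angle with its axis, and the intercept equals minus MDISC times MDIFF. The function name is hypothetical, and the sign convention for the intercept is an assumption, since the tabled values appear without signs.

```python
import math


def item_parameters(mdisc, mdiff, angle1_deg):
    """Return (a1, a2, d) for a two-dimensional compensatory item.

    mdisc      : multidimensional discrimination (item vector length)
    mdiff      : multidimensional difficulty
    angle1_deg : angle between the item vector and the first axis, in degrees
    """
    angle = math.radians(angle1_deg)
    a1 = mdisc * math.cos(angle)   # projection on the first ability axis
    a2 = mdisc * math.sin(angle)   # projection on the second ability axis
    d = -mdisc * mdiff             # assumed convention: MDIFF = -d / MDISC
    return a1, a2, d


# Item 7 of Table A-11: MDISC 1.400, MDIFF 1.000, first angle 45 degrees.
a1, a2, d = item_parameters(mdisc=1.4, mdiff=1.0, angle1_deg=45)
print(round(a1, 3), round(a2, 3), round(abs(d), 3))

# Item 1 of Table A-11: MDISC 0.200, MDIFF 2.000, first angle 4 degrees.
b1, b2, e = item_parameters(mdisc=0.2, mdiff=2.0, angle1_deg=4)
print(round(b1, 3), round(b2, 3), round(abs(e), 3))
```

Under these assumptions the rounded discriminations and the intercept magnitudes agree with the tabled rows for both items, which supports reading the two angle columns as complementary angles with the two ability axes.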


Table A-12. Sample size used in previous MIRT linking/equating research

Studies                             Sample Size
Hirsch (1989)                       2000
Davey, Oshima, Lee (1996)           N/A
Thompson, Nering, Davey (1997)      N/A
Li (1997), Li & Lissitz (2000)      1000/2000/4000
Oshima, Davey, Lee (2000)           N/A
Min (2003), Min (2007)              500/1000/2000
Reckase & Martineau (2004)          1734/1855/2044/2074/2167
Yon (2006)                          2000
Yao (2008)                          3000
Wei (2008)                          500/1000/2000
Simon (2008)                        500/1000/3000
Brossman (2010)                     2500

Table A-13. Number of replications used in previous MIRT linking/equating research

Studies                             Number of Replications
Li (1997), Li & Lissitz (2000)      200, 100
Oshima, Davey, Lee (2000)           20
Min (2003), Min (2007)              25
Yon (2006)                          10
Yao (2008)                          500
Wei (2008)                          500
Simon (2008)                        50


Table A-14. Ability distributions for examinee groups

Group
Base Group
Group 1
Group 2
Group 3
Group 4
Group 5
Group 6

Table A-15. Simulation design condition ((2 base group 6 ability distributions) 2 test structure)

Condition    Ability Distribution    Test Structure
Base1        B Group                 APSS
Base2        B Group                 CS
1            Group1                  APSS
2            Group2                  APSS
3            Group3                  APSS
4            Group4                  APSS
5            Group5                  APSS
6            Group6                  APSS
7            Group1                  CS
8            Group2                  CS
9            Group3                  CS
10           Group4                  CS
11           Group5                  CS
12           Group6                  CS

Table A-16. MIRT software used in previous MIRT linking/equating studies

Studies                          Software
Hirsch (1989)                    MIRTE
Davey, Oshima, Lee (1996)        NOHARM
Thompson, Nering, Davey (1997)   NOHARM
Li (1997), Li & Lissitz (2000)   TESTFACT
Oshima, Davey, Lee (2000)        NOHARM
Min (2003), Min (2007)           NOHARM
Reckase & Martineau (2004)       TESTFACT
Yon (2006)                       TESTFACT
Yao (2008)                       BMIRT
Wei (2008)                       NOHARM
Simon (2008)                     TESTFACT
Brossman (2010)                  TESTFACT

Table A-17. Rotation methods used in previous MIRT linking/equating studies

Studies                          Rotation method
Hirsch (1989)                    Orthogonal Procrustes
Davey, Oshima, Lee (1996)        Nonorthogonal Procrustes
Thompson, Nering, Davey (1997)   Orthogonal Procrustes
Li (1997), Li & Lissitz (2000)   Orthogonal Procrustes
Oshima, Davey, Lee (2000)        Nonorthogonal Procrustes
Min (2003), Min (2007)           Orthogonal Procrustes
Reckase & Martineau (2004)       Nonorthogonal Procrustes
Yon (2006)                       Nonorthogonal Procrustes
Yao (2008)                       Nonorthogonal Procrustes
Wei (2008)                       Nonorthogonal Procrustes
Simon (2008)                     Non/orthogonal Procrustes
Brossman (2010)                  N/A

Table A-18. Repeated measure analysis results for weighted Bias and ARMSD

Statistic      Factors  Source                        Partial eta-squared
Weighted Bias  Between  test_str                      0.02067
               Between  group                         0.92557
               Between  test_str*group                0.00458
               Within   link                          0.84970
               Within   link*test_str                 0.00641
               Within   link*group                    0.88045
               Within   link*test_str*group           0.06019
               Within   equat                         0.47878
               Within   equat*test_str                0.01469
               Within   equat*group                   0.46236
               Within   equat*test_str*group          0.00459
               Within   link*equat                    0.00185
               Within   link*equat*test_str           0.00342
               Within   link*equat*group              0.00873
               Within   link*equat*test_str*group     0.00429
ARMSD          Between  test_str                      0.00670
               Between  group                         0.91944
               Between  test_str*group                0.02128
               Within   link                          0.94089
               Within   link*test_str                 0.03362
               Within   link*group                    0.94122
               Within   link*test_str*group           0.15599
               Within   equat                         0.57653
               Within   equat*test_str                0.01727
               Within   equat*group                   0.58711
               Within   equat*test_str*group          0.02497
               Within   link*equat                    0.38335
               Within   link*equat*test_str           0.03872
               Within   link*equat*group              0.40483
               Within   link*equat*test_str*group     0.04714

Note: link = MIRT linking methods; equat = MIRT equating methods; group = group distribution shape; test_str = test structure.
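For reading the effect sizes in Table A-18, and assuming that the "Partial" column reports partial eta-squared (the symbol itself is not legible in the table header), the usual definition for an effect in a repeated measures ANOVA is the ratio below, where SS_effect is the sum of squares for the source and SS_error is the sum of squares of the error term that source is tested against:

```latex
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```

Under that reading, a value such as 0.92557 for group would mean that the group factor accounts for about 93% of the variance left after the other effects are set aside.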

Table A-19. Weighted mean Bias for linking methods by group

         Group Distribution  Linking Methods
Group    Mean  SD   Cor      Min      OD       TCF      ICF      NOP      Mean
Group 1  0.0   1.0  0.0      0.32926  0.24153  0.09908  0.08588  0.58308  0.26777
Group 2  0.0   0.8  0.0      0.29248  0.25659  0.10186  0.07712  0.42949  0.23151
Group 3  0.5   0.8  0.0      5.1191   2.3007   1.8481   1.6189   9.0351   3.9844
Group 4  0.0   1.0  0.5      0.29875  0.64432  0.015    0.0637   0.74563  0.32199
Group 5  0.5   1.0  0.5      4.454    2.1323   1.4589   0.4105   8.1057   3.3123
Group 6  0.5   0.8  0.5      4.7977   2.2377   1.7191   0.4831   8.7142   3.5904
Mean                         2.2417   0.9214   0.8067   0.4022   4.0161   1.6776

Note: SD = standard deviation; Cor = correlation; Min = Min's method; OD = Oshima, Davey, and Lee's direct method; TCF = test characteristic function method; ICF = item characteristic function method; NOP = non-orthogonal Procrustes method.

Table A-20. Weighted mean ARMSD for linking methods by group

         Group Distribution  Linking Methods
Group    Mean  SD   Cor      Min      OD       TCF      ICF      NOP      Mean
Group 1  0.0   1.0  0.0      0.23089  0.28789  0.11543  0.11071  0.69485  0.28795
Group 2  0.0   0.8  0.0      0.20884  0.48575  0.26935  0.26063  0.55671  0.35626
Group 3  0.5   0.8  0.0      28.6865  7.52435  5.64556  4.9614   87.9546  26.9545
Group 4  0.0   1.0  0.5      0.54088  1.33343  2.00157  0.19166  1.24462  1.06243
Group 5  0.5   1.0  0.5      22.5897  11.2916  6.25571  1.50499  70.8016  22.4887
Group 6  0.5   0.8  0.5      26.0056  10.8617  5.61188  1.67444  81.6578  25.1623
Mean                         13.0437  5.29746  3.31658  1.45064  40.485   12.7187

Note: SD = standard deviation; Cor = correlation; Min = Min's method; OD = Oshima, Davey, and Lee's direct method; TCF = test characteristic function method; ICF = item characteristic function method; NOP = non-orthogonal Procrustes method.
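The "Mean" row and column of Table A-20 are plain arithmetic means of the cell entries, which can be confirmed directly from the printed values. The sketch below recomputes them and picks the linking method with the smallest average ARMSD. The dictionary layout and the helper names `row_means`, `column_means`, and `best_method` are illustrative choices, not part of the study's own software.

```python
# Cell values copied from Table A-20 (weighted mean ARMSD).
METHODS = ["Min", "OD", "TCF", "ICF", "NOP"]
ARMSD = {
    "Group 1": [0.23089, 0.28789, 0.11543, 0.11071, 0.69485],
    "Group 2": [0.20884, 0.48575, 0.26935, 0.26063, 0.55671],
    "Group 3": [28.6865, 7.52435, 5.64556, 4.9614, 87.9546],
    "Group 4": [0.54088, 1.33343, 2.00157, 0.19166, 1.24462],
    "Group 5": [22.5897, 11.2916, 6.25571, 1.50499, 70.8016],
    "Group 6": [26.0056, 10.8617, 5.61188, 1.67444, 81.6578],
}


def row_means(table):
    """Mean ARMSD across linking methods for each group."""
    return {group: sum(vals) / len(vals) for group, vals in table.items()}


def column_means(table, methods):
    """Mean ARMSD across groups for each linking method."""
    n_groups = len(table)
    return {
        method: sum(vals[j] for vals in table.values()) / n_groups
        for j, method in enumerate(methods)
    }


def best_method(table, methods):
    """Linking method with the smallest mean ARMSD across groups."""
    means = column_means(table, methods)
    return min(means, key=means.get)


if __name__ == "__main__":
    for group, mean in row_means(ARMSD).items():
        print(f"{group}: {mean:.5f}")
    for method, mean in column_means(ARMSD, METHODS).items():
        print(f"{method}: {mean:.5f}")
    print("lowest mean ARMSD:", best_method(ARMSD, METHODS))
```

The same check does not carry over cleanly to the Bias table, whose marginal means suggest that some entries are signed.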

Table A-21. Weighted mean Bias for equating methods by group

         Group Distribution  Equating Methods
Group    Mean  SD   Cor      MOSE     AOSE     ATSE     Mean
Group 1  0.0   1.0  0.0      0.23211  0.33764  0.23354  0.36066
Group 2  0.0   0.8  0.0      0.24126  0.25349  0.19976  0.2989
Group 3  0.5   0.8  0.0      5.5469   3.3541   3.0522   2.2306
Group 4  0.0   1.0  0.5      0.25488  0.40193  0.30917  0.4932
Group 5  0.5   1.0  0.5      4.7011   2.6595   2.5762   1.6874
Group 6  0.5   0.8  0.5      5.0657   2.9399   2.7655   1.8942
Mean                         2.4309   1.3267   1.2752   1.6776

Note: SD = standard deviation; Cor = correlation; MOSE = Full Information MIRT Observed Score Equating; AOSE = Unidimensional Approximation of MIRT Observed Score Equating; ATSE = Unidimensional Approximation of MIRT True Score Equating.

Table A-22. Weighted mean ARMSD for equating methods by group

         Group Distribution  Equating Methods
Group    Mean  SD   Cor      MOSE     AOSE     ATSE     Mean
Group 1  0.0   1.0  0.0      0.23195  0.39008  0.24183  0.37277
Group 2  0.0   0.8  0.0      0.34226  0.45045  0.27605  0.37375
Group 3  0.5   0.8  0.0      40.9469  21.1754  18.7411  16.3327
Group 4  0.0   1.0  0.5      1.0374   1.34202  0.80787  0.93746
Group 5  0.5   1.0  0.5      32.3293  18.7457  16.3913  13.7932
Group 6  0.5   0.8  0.5      36.924   20.5739  17.989   15.3574
Mean                         18.6353  10.4463  9.07452  12.7187

Note: SD = standard deviation; Cor = correlation; MOSE = Full Information MIRT Observed Score Equating; AOSE = Unidimensional Approximation of MIRT Observed Score Equating; ATSE = Unidimensional Approximation of MIRT True Score Equating.

Table A-23. Equating mean score and score difference for Min's linking method, condition 1

Min's Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.561       0.000       0.561       0.000       0.561       0.000       0.561
    1           1.112       0.987       0.125       1.004       0.108       0.976       0.136
    2           2.049       1.981       0.068       1.983       0.066       1.958       0.091
    3           3.065       2.974       0.091       2.982       0.083       2.943       0.122
    4           4.111       3.970       0.141       3.975       0.136       3.933       0.178
    5           5.082       4.970       0.112       4.969       0.113       4.928       0.155
    6           6.063       5.974       0.088       5.958       0.104       5.930       0.133
    7           7.034       6.983       0.050       6.947       0.086       6.939       0.094
    8           8.016       7.997       0.019       7.937       0.079       7.955       0.061
    9           9.017       9.013       0.004       8.927       0.090       8.976       0.041
   10          10.008      10.033       0.025       9.928       0.080      10.000       0.008
   11          10.993      11.054       0.061      10.941       0.052      11.026       0.033
   12          11.977      12.077       0.099      11.965       0.013      12.054       0.076
   13          12.973      13.101       0.128      12.995       0.022      13.082       0.109
   14          13.979      14.125       0.145      14.030       0.050      14.111       0.132
   15          14.974      15.149       0.175      15.066       0.092      15.140       0.166
   16          15.965      16.173       0.208      16.104       0.139      16.169       0.204
   17          16.973      17.196       0.223      17.144       0.171      17.197       0.224
   18          17.975      18.219       0.245      18.185       0.210      18.224       0.250
   19          18.983      19.241       0.258      19.225       0.242      19.250       0.267
   20          19.986      20.262       0.277      20.265       0.279      20.275       0.290
   21          20.974      21.283       0.308      21.304       0.330      21.298       0.324
   22          21.974      22.302       0.329      22.343       0.369      22.320       0.346
   23          22.987      23.321       0.334      23.382       0.395      23.340       0.353
   24          24.013      24.340       0.326      24.421       0.408      24.359       0.346
   25          25.022      25.357       0.335      25.459       0.437      25.378       0.356
   26          26.030      26.374       0.344      26.500       0.469      26.396       0.366
   27          27.042      27.390       0.348      27.544       0.503      27.416       0.374
   28          28.036      28.405       0.369      28.593       0.556      28.436       0.399
   29          29.015      29.420       0.405      29.640       0.625      29.456       0.441
   30          30.010      30.434       0.424      30.681       0.672      30.474       0.464
   31          31.008      31.445       0.438      31.712       0.704      31.489       0.481
   32          31.998      32.453       0.455      32.728       0.730      32.496       0.498
   33          32.998      33.457       0.459      33.729       0.731      33.494       0.496
   34          34.055      34.454       0.399      34.710       0.655      34.476       0.421
   35          35.093      35.442       0.349      35.671       0.578      35.440       0.347
   36          36.041      36.415       0.373      36.609       0.568      36.380       0.339
   37          37.083      37.367       0.285      37.529       0.446      37.296       0.213
   38          38.056      38.299       0.243      38.435       0.379      38.188       0.132
   39          38.957      39.213       0.256      39.307       0.350      39.071       0.114
   40          39.812      39.980       0.167      39.956       0.144      38.294       1.519

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).
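In the equating tables of this appendix, each "Difference" entry equals the absolute gap between the mean equated score and the population equivalent in the same row (up to rounding of the printed values). The sketch below reproduces that arithmetic for a handful of rows from the Min's method, condition 1 table and averages the gaps over interior scores only, since the note excludes raw scores 0 and 40. It is a simplified illustration with hypothetical helper names, and it is not the weighted Bias or ARMSD statistic reported elsewhere in the appendix.

```python
# (raw score, population equivalent, Full MIRT, Approx Observed, Approx True)
# A few rows copied from the Min's method, condition 1 table.
ROWS = [
    (0, 0.561, 0.000, 0.000, 0.000),
    (1, 1.112, 0.987, 1.004, 0.976),
    (10, 10.008, 10.033, 9.928, 10.000),
    (20, 19.986, 20.262, 20.265, 20.275),
    (40, 39.812, 39.980, 39.956, 38.294),
]
PROCEDURES = ["Full MIRT", "Approx Observed", "Approx True"]


def score_differences(row):
    """Absolute gap between each equated mean score and the population equivalent."""
    _, pop, *equated = row
    return [abs(score - pop) for score in equated]


def mean_abs_difference(rows, min_score=0, max_score=40):
    """Average the gaps per procedure, skipping the two end scores."""
    interior = [r for r in rows if r[0] not in (min_score, max_score)]
    sums = [0.0] * len(PROCEDURES)
    for row in interior:
        for j, diff in enumerate(score_differences(row)):
            sums[j] += diff
    return {name: total / len(interior) for name, total in zip(PROCEDURES, sums)}


if __name__ == "__main__":
    for row in ROWS:
        diffs = ", ".join(f"{d:.3f}" for d in score_differences(row))
        print(f"raw {row[0]:2d}: {diffs}")
    print(mean_abs_difference(ROWS))
```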

Table A-24. Equating mean score and score difference for ODL direct method, condition 1

Oshima's Direct Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.561       0.000       0.561       0.000       0.561       0.000       0.561
    1           1.112       0.985       0.127       1.003       0.109       0.993       0.119
    2           2.049       1.977       0.072       1.990       0.058       1.979       0.070
    3           3.065       2.970       0.095       2.984       0.081       2.964       0.101
    4           4.111       3.963       0.148       3.982       0.129       3.951       0.160
    5           5.082       4.958       0.124       4.981       0.101       4.941       0.141
    6           6.063       5.956       0.107       5.975       0.087       5.935       0.128
    7           7.034       6.956       0.078       6.971       0.063       6.934       0.100
    8           8.016       7.959       0.057       7.968       0.048       7.937       0.079
    9           9.017       8.964       0.053       8.970       0.047       8.943       0.074
   10          10.008       9.972       0.036       9.982       0.026       9.953       0.055
   11          10.993      10.981       0.012      11.001       0.008      10.964       0.029
   12          11.977      11.991       0.014      12.026       0.048      11.978       0.000
   13          12.973      13.003       0.030      13.051       0.078      12.992       0.019
   14          13.979      14.015       0.035      14.074       0.095      14.007       0.027
   15          14.974      15.028       0.054      15.093       0.118      15.022       0.048
   16          15.965      16.041       0.076      16.104       0.140      16.037       0.073
   17          16.973      17.054       0.081      17.109       0.136      17.053       0.080
   18          17.975      18.067       0.092      18.107       0.132      18.068       0.093
   19          18.983      19.079       0.097      19.099       0.116      19.083       0.100
   20          19.986      20.092       0.107      20.087       0.102      20.097       0.112
   21          20.974      21.104       0.130      21.075       0.100      21.111       0.137
   22          21.974      22.116       0.142      22.066       0.092      22.125       0.151
   23          22.987      23.128       0.141      23.062       0.075      23.138       0.150
   24          24.013      24.139       0.126      24.066       0.053      24.150       0.137
   25          25.022      25.150       0.128      25.078       0.056      25.162       0.140
   26          26.030      26.161       0.130      26.099       0.069      26.174       0.144
   27          27.042      27.171       0.129      27.130       0.088      27.187       0.145
   28          28.036      28.181       0.145      28.169       0.133      28.200       0.164
   29          29.015      29.191       0.177      29.213       0.198      29.215       0.200
   30          30.010      30.201       0.191      30.256       0.246      30.229       0.219
   31          31.008      31.210       0.203      31.295       0.287      31.243       0.235
   32          31.998      32.218       0.220      32.325       0.327      32.254       0.255
   33          32.998      33.224       0.226      33.343       0.345      33.259       0.261
   34          34.055      34.227       0.172      34.346       0.291      34.255       0.200
   35          35.093      35.224       0.131      35.332       0.239      35.238       0.145
   36          36.041      36.212       0.171      36.300       0.259      36.206       0.164
   37          37.083      37.189       0.106      37.254       0.171      37.158       0.075
   38          38.056      38.151       0.095      38.197       0.142      38.096       0.040
   39          38.957      39.104       0.147      39.132       0.175      39.033       0.076
   40          39.812      39.958       0.146      39.917       0.104      38.294       1.519

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-25. Equating mean score and score difference for TCF linking method, condition 1

Test Characteristic Curve Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.561       0.000       0.561       0.000       0.561       0.000       0.561
    1           1.112       0.974       0.137       0.993       0.119       0.996       0.116
    2           2.049       1.966       0.082       1.992       0.057       1.982       0.067
    3           3.065       2.958       0.107       2.987       0.078       2.967       0.098
    4           4.111       3.950       0.161       3.984       0.127       3.952       0.159
    5           5.082       4.943       0.139       4.977       0.105       4.939       0.143
    6           6.063       5.938       0.125       5.964       0.099       5.929       0.134
    7           7.034       6.936       0.098       6.953       0.081       6.923       0.111
    8           8.016       7.935       0.081       7.944       0.072       7.921       0.095
    9           9.017       8.938       0.079       8.937       0.080       8.923       0.094
   10          10.008       9.942       0.066       9.937       0.071       9.928       0.080
   11          10.993      10.948       0.045      10.944       0.050      10.935       0.058
   12          11.977      11.956       0.022      11.956       0.022      11.944       0.033
   13          12.973      12.964       0.009      12.971       0.002      12.955       0.018
   14          13.979      13.973       0.006      13.987       0.007      13.966       0.014
   15          14.974      14.983       0.009      15.001       0.027      14.977       0.003
   16          15.965      15.994       0.029      16.012       0.048      15.989       0.025
   17          16.973      17.004       0.031      17.020       0.047      17.002       0.029
   18          17.975      18.014       0.039      18.025       0.050      18.014       0.039
   19          18.983      19.024       0.041      19.025       0.042      19.026       0.043
   20          19.986      20.034       0.049      20.024       0.039      20.038       0.053
   21          20.974      21.044       0.070      21.023       0.049      21.050       0.076
   22          21.974      22.054       0.080      22.024       0.050      22.061       0.088
   23          22.987      23.064       0.077      23.030       0.043      23.072       0.085
   24          24.013      24.073       0.060      24.040       0.027      24.083       0.070
   25          25.022      25.083       0.061      25.056       0.035      25.094       0.072
   26          26.030      26.092       0.062      26.078       0.048      26.105       0.074
   27          27.042      27.102       0.060      27.107       0.065      27.116       0.075
   28          28.036      28.112       0.075      28.141       0.105      28.129       0.092
   29          29.015      29.121       0.106      29.179       0.164      29.142       0.127
   30          30.010      30.131       0.121      30.215       0.205      30.156       0.146
   31          31.008      31.140       0.132      31.245       0.237      31.169       0.161
   32          31.998      32.148       0.150      32.265       0.267      32.179       0.181
   33          32.998      33.154       0.157      33.274       0.276      33.185       0.187
   34          34.055      34.158       0.103      34.268       0.213      34.183       0.128
   35          35.093      35.158       0.065      35.248       0.155      35.170       0.077
   36          36.041      36.150       0.109      36.213       0.171      36.143       0.102
   37          37.083      37.133       0.050      37.167       0.084      37.105       0.022
   38          38.056      38.105       0.050      38.116       0.060      38.058       0.002
   39          38.957      39.072       0.115      39.066       0.109      39.016       0.059
   40          39.812      39.954       0.141      39.894       0.081      38.294       1.519

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-26. Equating mean score and score difference for ICF linking method, condition 1

Item Characteristic Curve Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.561       0.000       0.561       0.000       0.561       0.000       0.561
    1           1.112       0.973       0.139       0.990       0.122       0.994       0.118
    2           2.049       1.963       0.085       1.980       0.069       1.979       0.070
    3           3.065       2.954       0.111       2.977       0.088       2.963       0.102
    4           4.111       3.946       0.166       3.978       0.134       3.947       0.164
    5           5.082       4.938       0.144       4.974       0.108       4.933       0.149
    6           6.063       5.933       0.130       5.962       0.101       5.923       0.140
    7           7.034       6.930       0.104       6.947       0.086       6.917       0.117
    8           8.016       7.930       0.086       7.936       0.080       7.915       0.101
    9           9.017       8.933       0.084       8.930       0.087       8.917       0.100
   10          10.008       9.937       0.071       9.930       0.078       9.923       0.085
   11          10.993      10.944       0.049      10.938       0.055      10.931       0.062
   12          11.977      11.952       0.026      11.952       0.025      11.940       0.037
   13          12.973      12.961       0.012      12.969       0.004      12.951       0.022
   14          13.979      13.971       0.009      13.986       0.007      13.963       0.017
   15          14.974      14.981       0.007      15.001       0.027      14.975       0.001
   16          15.965      15.992       0.027      16.014       0.049      15.988       0.023
   17          16.973      17.003       0.030      17.022       0.049      17.001       0.028
   18          17.975      18.014       0.039      18.026       0.051      18.014       0.039
   19          18.983      19.025       0.042      19.026       0.043      19.027       0.044
   20          19.986      20.036       0.050      20.024       0.038      20.040       0.054
   21          20.974      21.046       0.072      21.022       0.048      21.052       0.078
   22          21.974      22.057       0.083      22.023       0.049      22.064       0.090
   23          22.987      23.067       0.080      23.028       0.041      23.076       0.089
   24          24.013      24.077       0.064      24.039       0.025      24.087       0.074
   25          25.022      25.087       0.066      25.055       0.033      25.099       0.077
   26          26.030      26.097       0.067      26.078       0.047      26.110       0.080
   27          27.042      27.107       0.066      27.108       0.066      27.122       0.080
   28          28.036      28.117       0.081      28.143       0.107      28.135       0.099
   29          29.015      29.128       0.113      29.182       0.167      29.149       0.134
   30          30.010      30.137       0.128      30.219       0.209      30.163       0.153
   31          31.008      31.147       0.139      31.251       0.244      31.176       0.169
   32          31.998      32.156       0.157      32.275       0.276      32.188       0.190
   33          32.998      33.163       0.165      33.286       0.288      33.194       0.196
   34          34.055      34.167       0.112      34.283       0.228      34.193       0.138
   35          35.093      35.167       0.074      35.265       0.172      35.181       0.088
   36          36.041      36.159       0.118      36.231       0.190      36.155       0.114
   37          37.083      37.142       0.059      37.184       0.102      37.116       0.033
   38          38.056      38.114       0.059      38.132       0.076      38.068       0.012
   39          38.957      39.079       0.122      39.080       0.124      39.020       0.063
   40          39.812      39.959       0.147      39.904       0.091      38.294       1.519

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-27. Equating mean score and score difference for NOP linking method, condition 1

NOP Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.561       0.000       0.561       0.000       0.561       0.000       0.561
    1           1.112       1.139       0.027       1.242       0.130       1.021       0.091
    2           2.049       2.159       0.111       2.250       0.201       2.036       0.013
    3           3.065       3.186       0.121       3.257       0.192       3.062       0.003
    4           4.111       4.217       0.106       4.257       0.146       4.102       0.009
    5           5.082       5.252       0.170       5.254       0.171       5.153       0.070
    6           6.063       6.289       0.226       6.253       0.190       6.207       0.144
    7           7.034       7.325       0.292       7.259       0.225       7.260       0.226
    8           8.016       8.360       0.344       8.280       0.264       8.309       0.293
    9           9.017       9.393       0.376       9.324       0.307       9.353       0.336
   10          10.008      10.424       0.416      10.373       0.365      10.392       0.384
   11          10.993      11.452       0.459      11.420       0.427      11.427       0.433
   12          11.977      12.477       0.500      12.464       0.487      12.457       0.480
   13          12.973      13.499       0.526      13.503       0.530      13.485       0.512
   14          13.979      14.517       0.538      14.536       0.556      14.510       0.530
   15          14.974      15.533       0.559      15.565       0.591      15.531       0.557
   16          15.965      16.546       0.581      16.591       0.627      16.550       0.586
   17          16.973      17.556       0.583      17.616       0.643      17.566       0.593
   18          17.975      18.565       0.590      18.637       0.662      18.579       0.605
   19          18.983      19.572       0.589      19.654       0.671      19.589       0.606
   20          19.986      20.578       0.592      20.668       0.682      20.596       0.610
   21          20.974      21.582       0.608      21.677       0.703      21.599       0.625
   22          21.974      22.586       0.612      22.684       0.710      22.601       0.627
   23          22.987      23.588       0.601      23.686       0.699      23.600       0.613
   24          24.013      24.589       0.576      24.687       0.674      24.599       0.586
   25          25.022      25.589       0.567      25.687       0.665      25.597       0.575
   26          26.030      26.588       0.558      26.688       0.657      26.596       0.566
   27          27.042      27.587       0.545      27.690       0.648      27.596       0.554
   28          28.036      28.585       0.548      28.693       0.657      28.596       0.560
   29          29.015      29.582       0.567      29.696       0.682      29.595       0.581
   30          30.010      30.577       0.567      30.698       0.688      30.593       0.583
   31          31.008      31.571       0.563      31.697       0.689      31.585       0.578
   32          31.998      32.561       0.563      32.689       0.691      32.572       0.574
   33          32.998      33.547       0.550      33.672       0.674      33.549       0.551
   34          34.055      34.528       0.473      34.645       0.590      34.513       0.458
   35          35.093      35.500       0.407      35.605       0.512      35.461       0.368
   36          36.041      36.457       0.416      36.550       0.509      36.388       0.347
   37          37.083      37.398       0.315      37.483       0.400      37.293       0.210
   38          38.056      38.317       0.261      38.399       0.343      38.178       0.122
   39          38.957      39.211       0.255      39.270       0.313      39.058       0.102
   40          39.812      39.960       0.148      39.932       0.120      38.294       1.519

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-28. Equating mean score and score difference for Min's linking method, condition 2

Min's Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.351       0.000       0.351       0.000       0.351       0.000       0.351
    1           0.939       0.779       0.160       1.148       0.209       1.030       0.091
    2           1.890       1.741       0.149       2.137       0.247       2.023       0.133
    3           2.788       2.696       0.092       3.114       0.326       2.989       0.201
    4           3.708       3.656       0.052       4.080       0.372       3.943       0.236
    5           4.655       4.626       0.029       5.040       0.385       4.906       0.251
    6           5.606       5.613       0.007       5.991       0.385       5.886       0.280
    7           6.627       6.618       0.009       6.933       0.306       6.884       0.257
    8           7.616       7.640       0.024       7.873       0.257       7.898       0.282
    9           8.637       8.676       0.039       8.830       0.192       8.923       0.286
   10           9.672       9.722       0.050       9.814       0.141       9.955       0.282
   11          10.710      10.777       0.067      10.824       0.114      10.990       0.280
   12          11.748      11.838       0.090      11.855       0.107      12.027       0.279
   13          12.769      12.902       0.133      12.899       0.130      13.063       0.294
   14          13.823      13.969       0.145      13.948       0.124      14.097       0.274
   15          14.874      15.036       0.162      14.998       0.124      15.130       0.256
   16          15.915      16.102       0.188      16.049       0.134      16.160       0.245
   17          16.971      17.167       0.196      17.100       0.128      17.186       0.215
   18          18.014      18.229       0.215      18.150       0.136      18.209       0.195
   19          19.058      19.290       0.232      19.199       0.141      19.229       0.171
   20          20.102      20.348       0.246      20.245       0.143      20.245       0.143
   21          21.143      21.405       0.262      21.291       0.147      21.259       0.116
   22          22.177      22.461       0.284      22.337       0.160      22.272       0.094
   23          23.220      23.518       0.298      23.387       0.167      23.284       0.064
   24          24.266      24.575       0.310      24.441       0.175      24.298       0.032
   25          25.312      25.634       0.322      25.501       0.189      25.316       0.004
   26          26.339      26.696       0.357      26.573       0.234      26.340       0.001
   27          27.349      27.761       0.412      27.662       0.313      27.373       0.024
   28          28.365      28.830       0.466      28.765       0.400      28.413       0.048
   29          29.393      29.903       0.510      29.875       0.482      29.460       0.067
   30          30.434      30.979       0.545      30.980       0.547      30.511       0.077
   31          31.459      32.052       0.593      32.071       0.612      31.560       0.100
   32          32.493      33.120       0.627      33.136       0.643      32.600       0.107
   33          33.432      34.176       0.744      34.166       0.734      33.623       0.191
   34          34.471      35.212       0.740      35.158       0.686      34.622       0.150
   35          35.488      36.216       0.727      36.107       0.618      35.588       0.100
   36          36.521      37.174       0.653      37.015       0.494      36.516       0.004
   37          37.537      38.079       0.541      37.888       0.351      37.403       0.134
   38          38.415      38.925       0.509      38.726       0.310      38.251       0.165
   39          39.361      39.666       0.305      39.493       0.132      39.077       0.284
   40          40.318      40.000       0.318      39.963       0.355      38.294       2.024

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-29. Equating mean score and score difference for ODL direct method, condition 2

Oshima's Direct Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.351       0.000       0.351       0.000       0.351       0.000       0.351
    1           0.939       0.647       0.292       0.733       0.206       0.913       0.026
    2           1.890       1.528       0.362       1.656       0.234       1.851       0.039
    3           2.788       2.405       0.383       2.641       0.147       2.778       0.009
    4           3.708       3.320       0.388       3.623       0.084       3.691       0.017
    5           4.655       4.265       0.390       4.601       0.054       4.607       0.048
    6           5.606       5.236       0.371       5.571       0.036       5.543       0.063
    7           6.627       6.231       0.396       6.525       0.102       6.507       0.120
    8           7.616       7.252       0.364       7.452       0.164       7.500       0.116
    9           8.637       8.295       0.342       8.378       0.259       8.519       0.118
   10           9.672       9.359       0.313       9.339       0.333       9.559       0.113
   11          10.710      10.438       0.271      10.340       0.370      10.615       0.095
   12          11.748      11.530       0.218      11.379       0.368      11.682       0.065
   13          12.769      12.629       0.140      12.452       0.318      12.757       0.012
   14          13.823      13.734       0.089      13.545       0.278      13.836       0.013
   15          14.874      14.842       0.031      14.648       0.225      14.917       0.044
   16          15.915      15.952       0.037      15.755       0.159      15.999       0.084
   17          16.971      17.062       0.091      16.862       0.109      17.080       0.108
   18          18.014      18.171       0.157      17.968       0.047      18.158       0.144
   19          19.058      19.280       0.222      19.070       0.012      19.234       0.176
   20          20.102      20.386       0.285      20.170       0.068      20.306       0.204
   21          21.143      21.493       0.350      21.270       0.126      21.375       0.232
   22          22.177      22.600       0.423      22.374       0.197      22.442       0.265
   23          23.220      23.707       0.486      23.486       0.266      23.508       0.288
   24          24.266      24.813       0.547      24.607       0.341      24.575       0.310
   25          25.312      25.919       0.606      25.741       0.429      25.646       0.334
   26          26.339      27.026       0.687      26.896       0.557      26.724       0.385
   27          27.349      28.136       0.787      28.074       0.725      27.810       0.461
   28          28.365      29.250       0.886      29.270       0.906      28.902       0.538
   29          29.393      30.368       0.975      30.471       1.078      29.997       0.604
   30          30.434      31.483       1.050      31.652       1.219      31.086       0.652
   31          31.459      32.590       1.130      32.797       1.337      32.159       0.700
   32          32.493      33.679       1.186      33.891       1.399      33.206       0.713
   33          33.432      34.744       1.312      34.928       1.496      34.214       0.782
   34          34.471      35.773       1.302      35.901       1.429      35.174       0.703
   35          35.488      36.755       1.266      36.815       1.327      36.080       0.591
   36          36.521      37.675       1.154      37.672       1.151      36.927       0.406
   37          37.537      38.516       0.979      38.469       0.931      37.718       0.181
   38          38.415      39.244       0.828      39.189       0.773      38.460       0.044
   39          39.361      39.833       0.472      39.734       0.372      39.182       0.179
   40          40.318      40.000       0.318      39.970       0.347      38.294       2.024

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-30. Equating mean score and score difference for TCF linking method, condition 2

Test Characteristic Curve Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.351       0.000       0.351       0.000       0.351       0.000       0.351
    1           0.939       0.648       0.291       0.787       0.152       0.930       0.009
    2           1.890       1.541       0.349       1.718       0.173       1.874       0.016
    3           2.788       2.433       0.355       2.690       0.098       2.803       0.015
    4           3.708       3.354       0.354       3.665       0.043       3.716       0.008
    5           4.655       4.296       0.359       4.630       0.025       4.632       0.023
    6           5.606       5.262       0.345       5.590       0.016       5.569       0.038
    7           6.627       6.252       0.375       6.540       0.087       6.531       0.096
    8           7.616       7.267       0.349       7.471       0.145       7.522       0.094
    9           8.637       8.304       0.333       8.395       0.242       8.536       0.101
   10           9.672       9.360       0.312       9.350       0.323       9.570       0.103
   11          10.710      10.431       0.279      10.342       0.368      10.618       0.092
   12          11.748      11.513       0.235      11.371       0.377      11.676       0.071
   13          12.769      12.603       0.166      12.430       0.339      12.741       0.028
   14          13.823      13.699       0.124      13.510       0.314      13.810       0.014
   15          14.874      14.798       0.076      14.600       0.274      14.880       0.006
   16          15.915      15.898       0.017      15.695       0.220      15.950       0.035
   17          16.971      16.999       0.028      16.793       0.179      17.019       0.048
   18          18.014      18.099       0.085      17.892       0.122      18.087       0.072
   19          19.058      19.198       0.140      18.991       0.067      19.151       0.093
   20          20.102      20.297       0.195      20.089       0.013      20.213       0.111
   21          21.143      21.394       0.251      21.188       0.045      21.272       0.129
   22          22.177      22.491       0.314      22.291       0.113      22.330       0.152
   23          23.220      23.589       0.369      23.399       0.179      23.387       0.166
   24          24.266      24.686       0.420      24.513       0.248      24.445       0.180
   25          25.312      25.783       0.470      25.638       0.326      25.508       0.196
   26          26.339      26.882       0.543      26.780       0.441      26.577       0.238
   27          27.349      27.985       0.636      27.942       0.593      27.654       0.305
   28          28.365      29.093       0.728      29.122       0.757      28.738       0.374
   29          29.393      30.205       0.812      30.306       0.912      29.826       0.433
   30          30.434      31.317       0.883      31.472       1.038      30.910       0.476
   31          31.459      32.421       0.962      32.604       1.145      31.981       0.522
   32          32.493      33.511       1.018      33.691       1.199      33.030       0.537
   33          33.432      34.579       1.147      34.724       1.293      34.045       0.613
   34          34.471      35.615       1.143      35.699       1.227      35.017       0.546
   35          35.488      36.607       1.118      36.619       1.130      35.940       0.452
   36          36.521      37.541       1.020      37.485       0.964      36.810       0.289
   37          37.537      38.404       0.866      38.299       0.761      37.628       0.090
   38          38.415      39.153       0.738      39.041       0.625      38.400       0.015
   39          39.361      39.761       0.400      39.650       0.288      39.155       0.207
   40          40.318      40.000       0.318      39.987       0.330      38.294       2.024

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-31. Equating mean score and score difference for ICF linking method, condition 2

Item Characteristic Curve Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.351       0.000       0.351       0.000       0.351       0.000       0.351
    1           0.939       0.650       0.289       0.788       0.152       0.932       0.007
    2           1.890       1.542       0.348       1.716       0.175       1.876       0.014
    3           2.788       2.432       0.356       2.692       0.096       2.805       0.017
    4           3.708       3.352       0.356       3.667       0.041       3.717       0.010
    5           4.655       4.294       0.361       4.635       0.020       4.634       0.021
    6           5.606       5.260       0.347       5.594       0.012       5.569       0.037
    7           6.627       6.250       0.378       6.541       0.086       6.532       0.096
    8           7.616       7.264       0.352       7.471       0.145       7.521       0.095
    9           8.637       8.301       0.337       8.394       0.243       8.534       0.103
   10           9.672       9.356       0.317       9.348       0.324       9.567       0.105
   11          10.710      10.426       0.284      10.340       0.370      10.614       0.095
   12          11.748      11.508       0.240      11.369       0.379      11.672       0.076
   13          12.769      12.597       0.172      12.428       0.342      12.736       0.034
   14          13.823      13.692       0.132      13.507       0.316      13.803       0.020
   15          14.874      14.789       0.084      14.596       0.277      14.872       0.001
   16          15.915      15.889       0.026      15.690       0.225      15.941       0.027
   17          16.971      16.989       0.017      16.787       0.185      17.010       0.038
   18          18.014      18.088       0.074      17.884       0.130      18.076       0.062
   19          19.058      19.186       0.128      18.981       0.077      19.140       0.082
   20          20.102      20.283       0.182      20.076       0.026      20.201       0.099
   21          21.143      21.380       0.237      21.173       0.029      21.259       0.116
   22          22.177      22.477       0.300      22.273       0.096      22.316       0.138
   23          23.220      23.574       0.354      23.379       0.159      23.372       0.152
   24          24.266      24.671       0.406      24.493       0.227      24.430       0.165
   25          25.312      25.768       0.456      25.617       0.304      25.493       0.180
   26          26.339      26.867       0.528      26.758       0.419      26.562       0.223
   27          27.349      27.970       0.621      27.919       0.570      27.639       0.290
   28          28.365      29.077       0.712      29.098       0.733      28.723       0.358
   29          29.393      30.187       0.794      30.281       0.887      29.810       0.416
   30          30.434      31.298       0.865      31.446       1.012      30.893       0.460
   31          31.459      32.402       0.943      32.577       1.118      31.964       0.505
   32          32.493      33.491       0.999      33.664       1.171      33.012       0.519
   33          33.432      34.559       1.128      34.697       1.266      34.027       0.595
   34          34.471      35.596       1.125      35.673       1.202      34.999       0.528
   35          35.488      36.589       1.100      36.594       1.106      35.923       0.435
   36          36.521      37.524       1.003      37.462       0.941      36.795       0.274
   37          37.537      38.387       0.849      38.276       0.739      37.615       0.078
   38          38.415      39.135       0.719      39.021       0.605      38.391       0.024
   39          39.361      39.749       0.388      39.632       0.271      39.150       0.211
   40          40.318      40.000       0.318      39.979       0.339      38.294       2.024

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-32. Equating mean score and score difference for NOP linking method, condition 2

NOP Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.351       0.000       0.351       0.000       0.351       0.000       0.351
    1           0.939       0.820       0.120       1.251       0.312       1.052       0.113
    2           1.890       1.791       0.099       2.246       0.356       2.060       0.170
    3           2.788       2.764       0.024       3.236       0.448       3.043       0.255
    4           3.708       3.742       0.034       4.212       0.505       4.018       0.311
    5           4.655       4.727       0.072       5.178       0.523       5.002       0.347
    6           5.606       5.726       0.120       6.130       0.524       6.000       0.394
    7           6.627       6.740       0.113       7.073       0.445       7.012       0.385
    8           7.616       7.769       0.153       8.022       0.406       8.035       0.419
    9           8.637       8.809       0.171       8.998       0.361       9.065       0.428
   10           9.672       9.858       0.185       9.999       0.327      10.098       0.426
   11          10.710      10.912       0.202      11.020       0.311      11.132       0.422
   12          11.748      11.971       0.223      12.055       0.308      12.164       0.417
   13          12.769      13.032       0.262      13.097       0.328      13.195       0.426
   14          13.823      14.093       0.269      14.139       0.316      14.223       0.400
   15          14.874      15.153       0.279      15.181       0.308      15.248       0.375
   16          15.915      16.211       0.296      16.222       0.307      16.270       0.355
   17          16.971      17.267       0.295      17.261       0.290      17.287       0.316
   18          18.014      18.321       0.306      18.298       0.284      18.301       0.287
   19          19.058      19.372       0.314      19.332       0.274      19.311       0.253
   20          20.102      20.423       0.321      20.363       0.261      20.318       0.216
   21          21.143      21.472       0.329      21.393       0.250      21.322       0.178
   22          22.177      22.521       0.343      22.423       0.246      22.324       0.147
   23          23.220      23.569       0.349      23.454       0.233      23.327       0.106
   24          24.266      24.618       0.352      24.487       0.222      24.331       0.066
   25          25.312      25.668       0.355      25.527       0.215      25.340       0.028
   26          26.339      26.720       0.381      26.578       0.238      26.355       0.016
   27          27.349      27.777       0.428      27.640       0.291      27.378       0.029
   28          28.365      28.838       0.473      28.716       0.351      28.409       0.044
   29          29.393      29.903       0.510      29.799       0.406      29.446       0.053
   30          30.434      30.972       0.538      30.882       0.448      30.487       0.054
   31          31.459      32.038       0.579      31.955       0.496      31.527       0.068
   32          32.493      33.099       0.606      33.011       0.518      32.560       0.068
   33          33.432      34.148       0.717      34.041       0.609      33.580       0.148
   34          34.471      35.179       0.708      35.039       0.567      34.578       0.107
   35          35.488      36.180       0.692      36.001       0.512      35.548       0.060
   36          36.521      37.139       0.618      36.926       0.405      36.483       0.038
   37          37.537      38.041       0.504      37.813       0.275      37.377       0.160
   38          38.415      38.887       0.471      38.669       0.253      38.232       0.183
   39          39.361      39.632       0.271      39.454       0.093      39.066       0.296
   40          40.318      39.997       0.321      39.947       0.371      38.294       2.024

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A-33. Equating mean score and score difference for Min's linking method, condition 3

Min's Method

  Raw       Frequency  Full MIRT               Approx Observed         Approx True
Score  Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
    0           0.500       0.000       0.500       0.000       0.500       0.000       0.500
    1           0.877       0.000       0.877       0.021       0.856       0.760       0.117
    2           1.897       0.010       1.887       0.215       1.682       1.500       0.397
    3           2.776       0.461       2.315       0.775       2.001       2.189       0.587
    4           3.660       0.864       2.797       1.626       2.035       2.815       0.845
    5           4.575       1.490       3.085       2.529       2.046       3.401       1.174
    6           5.419       1.970       3.449       3.422       1.997       3.972       1.447
    7           6.364       2.583       3.781       4.282       2.082       4.551       1.813
    8           7.295       3.127       4.168       5.088       2.207       5.151       2.144
    9           8.272       3.739       4.533       5.797       2.475       5.784       2.488
   10           9.333       4.371       4.962       6.470       2.862       6.455       2.877
   11          10.367       5.002       5.365       7.074       3.293       7.169       3.198
   12          11.369       5.698       5.671       7.701       3.668       7.926       3.443
   13          12.367       6.418       5.949       8.338       4.029       8.723       3.644
   14          13.416       7.160       6.256       9.020       4.396       9.557       3.859
   15          14.475       7.949       6.526       9.765       4.710      10.422       4.053
   16          15.480       8.783       6.697      10.564       4.916      11.314       4.165
   17          16.508       9.655       6.853      11.413       5.095      12.231       4.277
   18          17.532      10.561       6.971      12.307       5.226      13.169       4.363
   19          18.560      11.501       7.059      13.240       5.320      14.128       4.432
   20          19.595      12.475       7.120      14.205       5.390      15.108       4.487
   21          20.595      13.479       7.117      15.196       5.399      16.109       4.486
   22          21.625      14.511       7.114      16.213       5.411      17.133       4.491
   23          22.664      15.568       7.096      17.257       5.407      18.180       4.483
   24          23.697      16.647       7.049      18.327       5.370      19.249       4.448
   25          24.710      17.749       6.961      19.423       5.288      20.335       4.375
   26          25.744      18.873       6.871      20.542       5.201      21.435       4.308
   27          26.771      20.017       6.753      21.683       5.088      22.543       4.228
   28          27.785      21.182       6.604      22.837       4.948      23.652       4.134
   29          28.826      22.365       6.461      23.995       4.831      24.759       4.067
   30          29.875      23.567       6.308      25.149       4.726      25.865       4.010
   31          30.897      24.792       6.104      26.300       4.597      26.978       3.919
   32          31.907      26.040       5.866      27.457       4.450      28.108       3.798
   33          32.926      27.313       5.612      28.630       4.296      29.275       3.650
   34          33.947      28.620       5.327      29.830       4.118      30.502       3.445
   35          34.971      29.970       5.001      31.066       3.905      31.818       3.153
   36          35.935      31.379       4.556      32.353       3.582      33.261       2.674
   37          36.953      32.887       4.067      33.712       3.242      34.869       2.084
   38          38.001      34.544       3.457      35.157       2.845      36.665       1.336
   39          38.920      36.404       2.516      36.699       2.221      38.540       0.381
   40          39.690      38.418       1.272      38.353       1.337      38.294       1.396

Note: Test scores at both ends (i.e., 0 and 40) are not counted due to the ad hoc procedure (Kolen, 1981).

Table A 34. Equating mean score and score difference for ODL Direct method condition 3
Oshima's Direct Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.500 0.000 0.500 0.000 0.500 0.000 0.500
1 0.877 0.000 0.877 0.253 0.624 0.869 0.008
2 1.897 0.523 1.374 0.919 0.978 1.719 0.179
3 2.776 1.037 1.739 1.828 0.948 2.525 0.251
4 3.660 1.720 1.941 2.773 0.888 3.285 0.376
5 4.575 2.425 2.150 3.701 0.874 4.024 0.550
6 5.419 3.046 2.373 4.614 0.805 4.770 0.648
7 6.364 3.756 2.608 5.511 0.853 5.541 0.823
8 7.295 4.490 2.805 6.357 0.938 6.346 0.950
9 8.272 5.211 3.061 7.118 1.154 7.189 1.083
10 9.333 5.975 3.357 7.877 1.456 8.070 1.263
11 10.367 6.786 3.581 8.672 1.695 8.985 1.382
12 11.369 7.628 3.740 9.505 1.864 9.929 1.440
13 12.367 8.497 3.870 10.391 1.976 10.895 1.472
14 13.416 9.394 4.022 11.330 2.086 11.878 1.538
15 14.475 10.321 4.154 12.310 2.164 12.873 1.602
16 15.480 11.277 4.203 13.320 2.160 13.877 1.603
17 16.508 12.261 4.246 14.344 2.163 14.889 1.618
18 17.532 13.271 4.262 15.375 2.157 15.908 1.625
19 18.560 14.302 4.258 16.408 2.152 16.933 1.627
20 19.595 15.351 4.244 17.443 2.152 17.964 1.630
21 20.595 16.415 4.181 18.478 2.118 19.002 1.593
22 21.625 17.491 4.134 19.514 2.110 20.047 1.578
23 22.664 18.578 4.085 20.557 2.107 21.097 1.567
24 23.697 19.676 4.021 21.609 2.088 22.150 1.546
25 24.710 20.784 3.926 22.672 2.038 23.207 1.504
26 25.744 21.903 3.841 23.746 1.998 24.262 1.481
27 26.771 23.031 3.740 24.828 1.942 25.316 1.454
28 27.785 24.167 3.618 25.916 1.869 26.368 1.417
29 28.826 25.310 3.515 27.007 1.819 27.417 1.408
30 29.875 26.461 3.415 28.101 1.774 28.468 1.407
31 30.897 27.621 3.275 29.196 1.701 29.525 1.372
32 31.907 28.796 3.110 30.290 1.617 30.592 1.314
33 32.926 29.986 2.940 31.384 1.542 31.677 1.249
34 33.947 31.188 2.759 32.480 1.467 32.785 1.163
35 34.971 32.405 2.566 33.580 1.390 33.923 1.047
36 35.935 33.652 2.283 34.689 1.246 35.099 0.836
37 36.953 34.952 2.001 35.813 1.140 36.317 0.636
38 38.001 36.309 1.692 36.959 1.042 37.577 0.424
39 38.920 37.747 1.173 38.130 0.791 38.848 0.073
40 39.690 39.232 0.458 39.327 0.363 38.294 1.396
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).
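
As a quick consistency check on Table A 34, the Difference columns as printed agree, to within rounding, with the absolute gap between the population equivalent and each procedure's equated mean score. The sketch below recomputes them for raw scores 1 to 3. The function name `abs_differences` and the tuple layout are illustrative assumptions, not part of the original analysis, and the use of an absolute value is inferred from the printed numbers rather than stated in the text.

```python
# Rows copied from Table A 34 (raw scores 1-3).
# Layout (an assumption for this sketch):
# (raw score, population equivalent, full MIRT, approx. observed, approx. true)
rows = [
    (1, 0.877, 0.000, 0.253, 0.869),
    (2, 1.897, 0.523, 0.919, 1.719),
    (3, 2.776, 1.037, 1.828, 2.525),
]


def abs_differences(row):
    """Return |population equivalent - equated mean| for each procedure."""
    _, pop, *means = row
    return [round(abs(pop - m), 3) for m in means]


# One list of three differences per raw score, in the same procedure order.
diffs = [abs_differences(r) for r in rows]
for (raw, *_), d in zip(rows, diffs):
    print(raw, d)
```

Because the table reports values rounded to three decimals, a recomputed difference can deviate from the printed one by 0.001, as happens for the approximate true score procedure at raw score 2.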

Table A 35. Equating mean score and score difference for TCF linking method condition 3
Test Characteristic Curve Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.500 0.000 0.500 0.000 0.500 0.000 0.500
1 0.877 0.056 0.821 0.390 0.487 0.893 0.016
2 1.897 0.601 1.296 1.129 0.768 1.765 0.132
3 2.776 1.177 1.599 2.051 0.725 2.598 0.177
4 3.660 1.891 1.769 2.997 0.664 3.392 0.269
5 4.575 2.600 1.975 3.923 0.651 4.170 0.404
6 5.419 3.253 2.166 4.843 0.576 4.959 0.460
7 6.364 3.988 2.376 5.749 0.615 5.774 0.590
8 7.295 4.745 2.550 6.607 0.688 6.621 0.674
9 8.272 5.497 2.774 7.391 0.881 7.504 0.768
10 9.333 6.290 3.043 8.185 1.147 8.420 0.913
11 10.367 7.124 3.243 9.015 1.352 9.365 1.002
12 11.369 7.987 3.381 9.882 1.487 10.333 1.035
13 12.367 8.876 3.491 10.797 1.570 11.320 1.047
14 13.416 9.792 3.624 11.759 1.657 12.319 1.097
15 14.475 10.735 3.740 12.758 1.717 13.328 1.147
16 15.480 11.706 3.774 13.781 1.698 14.342 1.138
17 16.508 12.702 3.806 14.819 1.688 15.362 1.146
18 17.532 13.719 3.813 15.864 1.668 16.385 1.148
19 18.560 14.756 3.804 16.912 1.648 17.412 1.148
20 19.595 15.808 3.787 17.962 1.633 18.442 1.153
21 20.595 16.872 3.723 19.012 1.584 19.476 1.120
22 21.625 17.948 3.677 20.062 1.563 20.513 1.111
23 22.664 19.033 3.631 21.116 1.548 21.554 1.109
24 23.697 20.126 3.571 22.174 1.523 22.597 1.100
25 24.710 21.229 3.481 23.240 1.470 23.641 1.069
26 25.744 22.340 3.404 24.311 1.433 24.685 1.059
27 26.771 23.459 3.311 25.386 1.385 25.727 1.044
28 27.785 24.585 3.200 26.462 1.323 26.766 1.019
29 28.826 25.716 3.110 27.538 1.288 27.804 1.022
30 29.875 26.853 3.022 28.613 1.262 28.843 1.032
31 30.897 27.999 2.898 29.685 1.212 29.885 1.012
32 31.907 29.157 2.749 30.752 1.155 30.935 0.972
33 32.926 30.329 2.597 31.813 1.113 31.996 0.930
34 33.947 31.512 2.436 32.872 1.075 33.075 0.873
35 34.971 32.707 2.264 33.932 1.039 34.176 0.795
36 35.935 33.926 2.008 34.996 0.939 35.305 0.629
37 36.953 35.190 1.763 36.072 0.881 36.469 0.485
38 38.001 36.505 1.496 37.168 0.833 37.667 0.334
39 38.920 37.883 1.037 38.289 0.631 38.878 0.042
40 39.690 39.303 0.387 39.430 0.260 38.294 1.396
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 36. Equating mean score and score difference for ICF linking method condition 3
Item Characteristic Curve Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.500 0.000 0.500 0.000 0.500 0.000 0.500
1 0.877 0.053 0.824 0.382 0.495 0.892 0.015
2 1.897 0.598 1.299 1.121 0.776 1.763 0.134
3 2.776 1.172 1.604 2.044 0.731 2.595 0.181
4 3.660 1.884 1.777 2.990 0.671 3.386 0.274
5 4.575 2.592 1.982 3.913 0.661 4.164 0.411
6 5.419 3.246 2.173 4.833 0.586 4.951 0.468
7 6.364 3.979 2.385 5.737 0.627 5.764 0.600
8 7.295 4.735 2.560 6.595 0.700 6.611 0.685
9 8.272 5.487 2.785 7.380 0.892 7.492 0.780
10 9.333 6.279 3.053 8.173 1.160 8.407 0.925
11 10.367 7.113 3.254 9.001 1.366 9.352 1.015
12 11.369 7.976 3.393 9.867 1.502 10.320 1.049
13 12.367 8.864 3.503 10.781 1.586 11.306 1.061
14 13.416 9.779 3.637 11.743 1.673 12.306 1.110
15 14.475 10.723 3.752 12.742 1.733 13.314 1.161
16 15.480 11.693 3.786 13.765 1.714 14.329 1.151
17 16.508 12.689 3.819 14.803 1.704 15.348 1.159
18 17.532 13.707 3.826 15.849 1.684 16.372 1.161
19 18.560 14.743 3.817 16.897 1.663 17.399 1.161
20 19.595 15.796 3.799 17.947 1.647 18.430 1.165
21 20.595 16.861 3.735 18.997 1.598 19.464 1.131
22 21.625 17.937 3.688 20.049 1.576 20.502 1.122
23 22.664 19.022 3.641 21.103 1.561 21.544 1.120
24 23.697 20.116 3.581 22.162 1.534 22.588 1.109
25 24.710 21.220 3.490 23.229 1.481 23.633 1.077
26 25.744 22.332 3.412 24.301 1.443 24.677 1.067
27 26.771 23.451 3.319 25.378 1.393 25.720 1.051
28 27.785 24.578 3.207 26.456 1.329 26.760 1.025
29 28.826 25.710 3.116 27.534 1.292 27.799 1.027
30 29.875 26.848 3.028 28.610 1.265 28.838 1.037
31 30.897 27.994 2.903 29.683 1.214 29.881 1.016
32 31.907 29.154 2.753 30.751 1.156 30.931 0.975
33 32.926 30.326 2.600 31.813 1.113 31.994 0.932
34 33.947 31.510 2.438 32.873 1.074 33.073 0.874
35 34.971 32.705 2.266 33.934 1.037 34.175 0.796
36 35.935 33.924 2.010 34.999 0.936 35.305 0.629
37 36.953 35.189 1.765 36.075 0.878 36.470 0.484
38 38.001 36.503 1.498 37.172 0.829 37.668 0.333
39 38.920 37.882 1.038 38.293 0.627 38.879 0.041
40 39.690 39.305 0.385 39.431 0.259 38.294 1.396
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 37. Equating mean score and score difference for NOP linking method condition 3
NOP Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.500 0.000 0.500 0.000 0.500 0.000 0.500
1 0.877 0.000 0.877 0.000 0.877 0.646 0.231
2 1.897 0.000 1.897 0.000 1.897 1.258 0.640
3 2.776 0.000 2.776 0.045 2.731 1.812 0.964
4 3.660 0.009 3.651 0.352 3.309 2.297 1.363
5 4.575 0.401 4.174 1.046 3.529 2.729 1.846
6 5.419 0.691 4.728 1.874 3.545 3.130 2.289
7 6.364 1.088 5.276 2.713 3.651 3.518 2.846
8 7.295 1.578 5.717 3.500 3.795 3.907 3.388
9 8.272 1.933 6.339 4.130 4.141 4.308 3.963
10 9.333 2.454 6.879 4.728 4.604 4.730 4.603
11 10.367 2.860 7.507 5.272 5.095 5.178 5.189
12 11.369 3.387 7.982 5.780 5.589 5.659 5.710
13 12.367 3.862 8.506 6.274 6.093 6.177 6.191
14 13.416 4.427 8.989 6.757 6.659 6.733 6.682
15 14.475 4.965 9.509 7.236 7.239 7.331 7.143
16 15.480 5.590 9.890 7.740 7.740 7.971 7.509
17 16.508 6.217 10.291 8.261 8.246 8.652 7.856
18 17.532 6.897 10.636 8.837 8.695 9.374 8.159
19 18.560 7.631 10.929 9.469 9.091 10.136 8.424
20 19.595 8.399 11.196 10.147 9.448 10.937 8.657
21 20.595 9.211 11.384 10.889 9.706 11.779 8.816
22 21.625 10.072 11.552 11.690 9.935 12.663 8.962
23 22.664 10.980 11.683 12.544 10.119 13.588 9.075
24 23.697 11.934 11.763 13.451 10.246 14.557 9.140
25 24.710 12.932 11.779 14.408 10.302 15.569 9.141
26 25.744 13.973 11.771 15.414 10.330 16.621 9.122
27 26.771 15.057 11.713 16.467 10.304 17.711 9.060
28 27.785 16.182 11.603 17.565 10.221 18.831 8.954
29 28.826 17.348 11.478 18.702 10.123 19.976 8.850
30 29.875 18.555 11.320 19.872 10.003 21.140 8.736
31 30.897 19.807 11.090 21.065 9.831 22.322 8.574
32 31.907 21.108 10.799 22.277 9.630 23.529 8.377
33 32.926 22.464 10.462 23.506 9.420 24.777 8.149
34 33.947 23.883 10.064 24.761 9.186 26.093 7.854
35 34.971 25.380 9.591 26.069 8.902 27.532 7.439
36 35.935 26.987 8.948 27.479 8.455 29.184 6.750
37 36.953 28.777 8.177 29.062 7.891 31.216 5.737
38 38.001 30.868 7.133 30.911 7.090 33.923 4.078
39 38.920 33.426 5.494 33.151 5.769 37.492 1.428
40 39.690 36.554 3.136 35.970 3.720 38.294 1.396
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 38. Equating mean score and score difference for linking method condition 4
Min's Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.562 0.000 0.562 0.000 0.562 0.000 0.562
1 1.381 1.639 0.257 1.695 0.313 1.173 0.208
2 2.364 2.764 0.400 2.716 0.352 2.272 0.092
3 3.304 3.904 0.600 3.729 0.426 3.376 0.072
4 4.293 5.047 0.754 4.732 0.438 4.493 0.200
5 5.253 6.176 0.923 5.725 0.473 5.599 0.346
6 6.245 7.273 1.028 6.723 0.478 6.676 0.431
7 7.191 8.330 1.138 7.747 0.556 7.723 0.532
8 8.136 9.346 1.211 8.821 0.685 8.744 0.608
9 9.102 10.328 1.226 9.910 0.808 9.743 0.641
10 10.097 11.281 1.184 10.969 0.872 10.728 0.631
11 11.093 12.211 1.118 11.991 0.897 11.703 0.609
12 12.065 13.123 1.058 12.979 0.914 12.671 0.606
13 13.043 14.019 0.976 13.942 0.899 13.633 0.590
14 14.039 14.902 0.864 14.885 0.847 14.591 0.552
15 15.025 15.774 0.749 15.813 0.787 15.543 0.518
16 16.021 16.635 0.614 16.725 0.704 16.491 0.470
17 17.019 17.486 0.467 17.622 0.603 17.432 0.413
18 18.006 18.331 0.324 18.507 0.501 18.368 0.362
19 19.001 19.171 0.170 19.381 0.380 19.298 0.298
20 19.992 20.010 0.017 20.246 0.254 20.225 0.232
21 20.977 20.849 0.128 21.102 0.125 21.149 0.172
22 21.957 21.690 0.267 21.949 0.009 22.073 0.115
23 22.952 22.538 0.413 22.787 0.164 22.999 0.048
24 23.975 23.396 0.579 23.622 0.353 23.931 0.044
25 24.989 24.265 0.724 24.458 0.531 24.869 0.120
26 25.991 25.145 0.846 25.297 0.693 25.814 0.176
27 26.983 26.033 0.950 26.137 0.846 26.763 0.219
28 27.973 26.928 1.045 26.972 1.001 27.713 0.261
29 28.961 27.827 1.135 27.801 1.161 28.658 0.303
30 29.940 28.727 1.213 28.629 1.311 29.596 0.344
31 30.923 29.628 1.295 29.468 1.455 30.524 0.399
32 31.898 30.533 1.365 30.328 1.570 31.445 0.453
33 32.875 31.446 1.430 31.217 1.658 32.362 0.514
34 33.863 32.376 1.487 32.146 1.716 33.283 0.580
35 34.863 33.337 1.526 33.129 1.734 34.220 0.643
36 35.771 34.350 1.421 34.183 1.589 35.193 0.578
37 36.764 35.443 1.321 35.327 1.438 36.232 0.532
38 37.844 36.647 1.197 36.570 1.274 37.388 0.456
39 38.728 37.971 0.757 37.885 0.843 38.715 0.013
40 39.620 39.338 0.281 39.227 0.392 38.294 1.326
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 39. Equating mean score and score difference for ODL Direct method condition 4
Oshima's Direct Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.562 0.000 0.562 0.000 0.562 0.000 0.562
1 1.381 1.439 0.058 1.292 0.090 1.044 0.337
2 2.364 2.518 0.154 2.299 0.065 2.069 0.295
3 3.304 3.604 0.300 3.299 0.004 3.079 0.224
4 4.293 4.695 0.402 4.289 0.004 4.087 0.206
5 5.253 5.786 0.534 5.269 0.016 5.102 0.150
6 6.245 6.868 0.623 6.240 0.006 6.127 0.118
7 7.191 7.931 0.740 7.210 0.018 7.160 0.032
8 8.136 8.973 0.837 8.192 0.056 8.197 0.062
9 9.102 9.992 0.890 9.198 0.096 9.237 0.135
10 10.097 10.991 0.894 10.226 0.129 10.277 0.180
11 11.093 11.973 0.879 11.266 0.173 11.317 0.223
12 12.065 12.940 0.875 12.314 0.249 12.355 0.290
13 13.043 13.893 0.850 13.362 0.319 13.391 0.348
14 14.039 14.835 0.796 14.407 0.369 14.425 0.386
15 15.025 15.766 0.740 15.449 0.423 15.456 0.430
16 16.021 16.687 0.667 16.487 0.466 16.483 0.463
17 17.019 17.602 0.582 17.523 0.503 17.507 0.488
18 18.006 18.510 0.504 18.554 0.548 18.527 0.520
19 19.001 19.415 0.414 19.579 0.578 19.541 0.540
20 19.992 20.318 0.325 20.599 0.607 20.550 0.558
21 20.977 21.220 0.243 21.613 0.636 21.555 0.579
22 21.957 22.124 0.167 22.621 0.664 22.556 0.599
23 22.952 23.031 0.079 23.624 0.672 23.555 0.603
24 23.975 23.942 0.033 24.623 0.648 24.552 0.577
25 24.989 24.859 0.130 25.622 0.633 25.550 0.561
26 25.991 25.782 0.209 26.623 0.632 26.550 0.559
27 26.983 26.710 0.272 27.627 0.645 27.551 0.568
28 27.973 27.645 0.329 28.633 0.659 28.551 0.578
29 28.961 28.584 0.377 29.634 0.673 29.550 0.588
30 29.940 29.529 0.411 30.627 0.687 30.542 0.602
31 30.923 30.479 0.444 31.608 0.685 31.525 0.602
32 31.898 31.434 0.463 32.572 0.675 32.495 0.597
33 32.875 32.398 0.477 33.520 0.644 33.450 0.574
34 33.863 33.372 0.491 34.451 0.588 34.388 0.525
35 34.863 34.360 0.503 35.372 0.508 35.310 0.447
36 35.771 35.367 0.404 36.288 0.517 36.220 0.449
37 36.764 36.402 0.362 37.207 0.443 37.125 0.360
38 37.844 37.476 0.368 38.134 0.290 38.042 0.198
39 38.728 38.595 0.133 39.047 0.319 39.003 0.275
40 39.620 39.704 0.084 39.763 0.143 38.294 1.326
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 40. Equating mean score and score difference for TCF linking method condition 4
Test Characteristic Curve Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.562 0.000 0.562 0.000 0.562 0.000 0.562
1 1.381 1.805 0.424 2.162 0.781 1.344 0.038
2 2.364 3.004 0.640 3.187 0.823 2.502 0.138
3 3.304 4.189 0.886 4.200 0.897 3.654 0.350
4 4.293 5.347 1.054 5.200 0.907 4.810 0.517
5 5.253 6.475 1.222 6.188 0.935 5.935 0.682
6 6.245 7.558 1.313 7.187 0.942 7.010 0.765
7 7.191 8.591 1.399 8.235 1.044 8.039 0.847
8 8.136 9.575 1.440 9.342 1.206 9.028 0.893
9 9.102 10.520 1.418 10.434 1.332 9.989 0.887
10 10.097 11.432 1.335 11.471 1.374 10.928 0.830
11 11.093 12.318 1.225 12.453 1.359 11.851 0.758
12 12.065 13.183 1.118 13.389 1.324 12.764 0.699
13 13.043 14.030 0.987 14.291 1.248 13.668 0.625
14 14.039 14.864 0.826 15.164 1.125 14.565 0.527
15 15.025 15.686 0.661 16.012 0.986 15.457 0.431
16 16.021 16.498 0.477 16.837 0.817 16.342 0.322
17 17.019 17.301 0.282 17.646 0.626 17.223 0.203
18 18.006 18.098 0.092 18.440 0.433 18.098 0.092
19 19.001 18.892 0.109 19.223 0.222 18.970 0.031
20 19.992 19.683 0.309 19.999 0.006 19.840 0.153
21 20.977 20.476 0.501 20.766 0.211 20.709 0.268
22 21.957 21.273 0.685 21.527 0.431 21.579 0.378
23 22.952 22.075 0.876 22.283 0.669 22.453 0.499
24 23.975 22.885 1.090 23.035 0.940 23.332 0.644
25 24.989 23.704 1.285 23.788 1.201 24.217 0.773
26 25.991 24.534 1.456 24.546 1.444 25.108 0.883
27 26.983 25.374 1.609 25.313 1.669 26.003 0.980
28 27.973 26.221 1.752 26.088 1.885 26.899 1.074
29 28.961 27.072 1.890 26.867 2.094 27.794 1.168
30 29.940 27.927 2.013 27.652 2.288 28.686 1.254
31 30.923 28.788 2.135 28.445 2.478 29.577 1.346
32 31.898 29.658 2.240 29.257 2.641 30.471 1.427
33 32.875 30.543 2.332 30.100 2.775 31.374 1.502
34 33.863 31.454 2.409 30.995 2.868 32.297 1.566
35 34.863 32.407 2.456 31.961 2.902 33.259 1.604
36 35.771 33.427 2.344 33.022 2.749 34.287 1.484
37 36.764 34.552 2.213 34.210 2.554 35.426 1.338
38 37.844 35.832 2.012 35.552 2.292 36.755 1.088
39 38.728 37.318 1.410 37.056 1.671 38.387 0.340
40 39.620 38.964 0.656 38.650 0.969 38.294 1.326
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 41. Equating mean score and score difference for ICF linking method condition 4
Item Characteristic Curve Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.562 0.000 0.562 0.000 0.562 0.000 0.562
1 1.381 1.277 0.104 1.023 0.358 0.998 0.383
2 2.364 2.320 0.043 2.007 0.357 1.983 0.381
3 3.304 3.367 0.063 2.998 0.306 2.944 0.360
4 4.293 4.415 0.121 3.977 0.316 3.890 0.404
5 5.253 5.461 0.208 4.940 0.312 4.838 0.414
6 6.245 6.499 0.254 5.889 0.356 5.799 0.446
7 7.191 7.526 0.335 6.834 0.357 6.775 0.416
8 8.136 8.538 0.403 7.775 0.360 7.765 0.370
9 9.102 9.535 0.433 8.708 0.394 8.768 0.334
10 10.097 10.518 0.421 9.665 0.432 9.779 0.318
11 11.093 11.489 0.395 10.648 0.445 10.798 0.296
12 12.065 12.448 0.383 11.653 0.411 11.820 0.245
13 13.043 13.397 0.354 12.675 0.368 12.846 0.197
14 14.039 14.336 0.298 13.707 0.332 13.873 0.165
15 15.025 15.268 0.243 14.742 0.283 14.902 0.124
16 16.021 16.193 0.172 15.780 0.241 15.930 0.091
17 17.019 17.112 0.093 16.820 0.199 16.958 0.062
18 18.006 18.027 0.021 17.863 0.143 17.984 0.022
19 19.001 18.939 0.062 18.906 0.094 19.009 0.008
20 19.992 19.850 0.142 19.949 0.043 20.031 0.039
21 20.977 20.762 0.215 20.991 0.014 21.052 0.075
22 21.957 21.675 0.282 22.032 0.074 22.071 0.114
23 22.952 22.592 0.360 23.071 0.120 23.088 0.137
24 23.975 23.514 0.461 24.109 0.134 24.105 0.130
25 24.989 24.442 0.547 25.146 0.157 25.123 0.133
26 25.991 25.378 0.613 26.183 0.193 26.140 0.150
27 26.983 26.320 0.663 27.225 0.242 27.159 0.177
28 27.973 27.268 0.706 28.271 0.297 28.179 0.206
29 28.961 28.222 0.740 29.315 0.354 29.198 0.237
30 29.940 29.181 0.759 30.352 0.412 30.215 0.275
31 30.923 30.146 0.777 31.375 0.452 31.227 0.304
32 31.898 31.120 0.778 32.383 0.485 32.232 0.334
33 32.875 32.104 0.771 33.371 0.496 33.227 0.351
34 33.863 33.103 0.760 34.342 0.479 34.210 0.347
35 34.863 34.120 0.744 35.295 0.432 35.181 0.318
36 35.771 35.162 0.609 36.235 0.464 36.141 0.370
37 36.764 36.239 0.525 37.168 0.404 37.094 0.330
38 37.844 37.356 0.488 38.101 0.257 38.049 0.205
39 38.728 38.507 0.220 39.050 0.322 39.017 0.290
40 39.620 39.676 0.057 39.888 0.268 38.294 1.326
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 42. Equating mean score and score difference for NOP linking method condition 4
NOP Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.562 0.000 0.562 0.000 0.562 0.000 0.562
1 1.381 1.389 0.008 1.162 0.219 0.996 0.386
2 2.364 2.451 0.087 2.166 0.198 2.000 0.364
3 3.304 3.523 0.220 3.165 0.138 2.991 0.313
4 4.293 4.606 0.313 4.153 0.140 3.979 0.314
5 5.253 5.695 0.443 5.132 0.120 4.981 0.271
6 6.245 6.781 0.536 6.102 0.144 6.003 0.242
7 7.191 7.854 0.662 7.063 0.128 7.041 0.150
8 8.136 8.908 0.773 8.030 0.105 8.092 0.043
9 9.102 9.944 0.842 9.028 0.074 9.151 0.049
10 10.097 10.962 0.865 10.057 0.040 10.213 0.116
11 11.093 11.963 0.870 11.112 0.018 11.277 0.183
12 12.065 12.951 0.886 12.183 0.118 12.341 0.276
13 13.043 13.925 0.882 13.261 0.218 13.403 0.360
14 14.039 14.888 0.849 14.339 0.301 14.464 0.426
15 15.025 15.840 0.815 15.417 0.392 15.523 0.498
16 16.021 16.784 0.763 16.496 0.476 16.578 0.558
17 17.019 17.720 0.700 17.575 0.556 17.630 0.611
18 18.006 18.650 0.643 18.651 0.645 18.677 0.670
19 19.001 19.576 0.575 19.722 0.721 19.718 0.717
20 19.992 20.499 0.507 20.788 0.795 20.753 0.761
21 20.977 21.422 0.446 21.848 0.871 21.783 0.806
22 21.957 22.346 0.389 22.901 0.944 22.807 0.850
23 22.952 23.272 0.321 23.949 0.998 23.828 0.876
24 23.975 24.202 0.227 24.992 1.017 24.846 0.871
25 24.989 25.137 0.147 26.034 1.045 25.864 0.875
26 25.991 26.076 0.086 27.077 1.086 26.882 0.892
27 26.983 27.021 0.039 28.122 1.139 27.901 0.919
28 27.973 27.972 0.002 29.167 1.193 28.920 0.947
29 28.961 28.927 0.034 30.206 1.244 29.936 0.975
30 29.940 29.888 0.052 31.233 1.293 30.945 1.005
31 30.923 30.853 0.070 32.243 1.320 31.942 1.019
32 31.898 31.824 0.073 33.230 1.333 32.923 1.025
33 32.875 32.801 0.074 34.192 1.317 33.883 1.007
34 33.863 33.785 0.078 35.126 1.263 34.818 0.955
35 34.863 34.775 0.088 36.032 1.169 35.726 0.862
36 35.771 35.774 0.003 36.917 1.146 36.604 0.833
37 36.764 36.785 0.020 37.782 1.018 37.455 0.691
38 37.844 37.811 0.033 38.624 0.780 38.286 0.442
39 38.728 38.848 0.120 39.410 0.682 39.115 0.387
40 39.620 39.840 0.220 39.946 0.327 38.294 1.326
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 43. Equating mean score and score difference for linking method condition 5
Min's Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.545 0.000 0.545 0.000 0.545 0.000 0.545
1 1.322 0.483 0.839 0.286 1.036 0.967 0.354
2 2.257 1.138 1.118 1.000 1.257 1.841 0.415
3 3.083 1.856 1.226 1.940 1.143 2.664 0.418
4 4.004 2.592 1.412 2.903 1.101 3.426 0.578
5 4.943 3.250 1.693 3.869 1.073 4.134 0.809
6 5.886 3.894 1.992 4.833 1.054 4.809 1.077
7 6.835 4.548 2.286 5.774 1.061 5.472 1.363
8 7.786 5.165 2.621 6.653 1.133 6.137 1.649
9 8.767 5.796 2.971 7.426 1.341 6.814 1.953
10 9.767 6.435 3.332 8.122 1.645 7.509 2.258
11 10.763 7.065 3.698 8.814 1.949 8.226 2.537
12 11.732 7.719 4.012 9.510 2.222 8.966 2.766
13 12.709 8.380 4.328 10.217 2.492 9.727 2.982
14 13.734 9.044 4.690 10.939 2.794 10.507 3.226
15 14.727 9.731 4.996 11.677 3.049 11.304 3.423
16 15.690 10.427 5.263 12.430 3.260 12.114 3.577
17 16.693 11.127 5.566 13.200 3.493 12.936 3.757
18 17.688 11.848 5.840 13.988 3.700 13.769 3.919
19 18.662 12.590 6.072 14.791 3.871 14.615 4.047
20 19.651 13.342 6.309 15.603 4.047 15.475 4.175
21 20.637 14.112 6.525 16.418 4.219 16.352 4.285
22 21.631 14.909 6.721 17.236 4.395 17.249 4.382
23 22.634 15.734 6.900 18.060 4.573 18.169 4.465
24 23.653 16.586 7.067 18.896 4.757 19.115 4.538
25 24.648 17.466 7.182 19.745 4.903 20.085 4.562
26 25.669 18.376 7.293 20.603 5.066 21.078 4.591
27 26.682 19.314 7.367 21.470 5.212 22.086 4.596
28 27.661 20.279 7.382 22.342 5.319 23.100 4.560
29 28.657 21.265 7.392 23.221 5.435 24.113 4.543
30 29.658 22.270 7.388 24.110 5.548 25.120 4.538
31 30.654 23.292 7.362 25.011 5.643 26.122 4.531
32 31.619 24.337 7.282 25.926 5.693 27.130 4.489
33 32.582 25.414 7.168 26.863 5.719 28.158 4.423
34 33.522 26.542 6.980 27.839 5.684 29.232 4.290
35 34.559 27.754 6.805 28.886 5.673 30.388 4.171
36 35.492 29.107 6.385 30.054 5.438 31.686 3.806
37 36.482 30.694 5.788 31.407 5.075 33.229 3.253
38 37.618 32.663 4.954 33.043 4.574 35.202 2.416
39 38.472 35.157 3.314 35.078 3.394 37.828 0.643
40 39.452 37.983 1.469 37.489 1.962 38.294 1.158
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 44. Equating mean score and score difference for ODL Direct method condition 5
Oshima's Direct Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.545 0.000 0.545 0.000 0.545 0.000 0.545
1 1.322 1.304 0.017 1.958 0.636 1.353 0.031
2 2.257 2.249 0.007 2.997 0.740 2.469 0.213
3 3.083 3.166 0.083 4.002 0.919 3.557 0.474
4 4.004 4.052 0.048 4.985 0.981 4.628 0.624
5 4.943 4.913 0.030 5.957 1.014 5.659 0.716
6 5.886 5.745 0.141 6.934 1.047 6.643 0.756
7 6.835 6.550 0.285 7.941 1.106 7.584 0.750
8 7.786 7.320 0.467 8.990 1.204 8.494 0.707
9 8.767 8.066 0.702 10.033 1.266 9.379 0.612
10 9.767 8.811 0.956 11.031 1.264 10.248 0.481
11 10.763 9.553 1.210 11.974 1.211 11.105 0.342
12 11.732 10.280 1.451 12.867 1.135 11.954 0.222
13 12.709 11.004 1.705 13.719 1.010 12.796 0.087
14 13.734 11.734 2.000 14.537 0.804 13.633 0.100
15 14.727 12.463 2.264 15.326 0.599 14.466 0.261
16 15.690 13.187 2.503 16.086 0.396 15.294 0.396
17 16.693 13.917 2.775 16.820 0.127 16.119 0.574
18 17.688 14.655 3.033 17.531 0.158 16.940 0.748
19 18.662 15.396 3.266 18.223 0.440 17.761 0.901
20 19.651 16.145 3.506 18.903 0.748 18.583 1.068
21 20.637 16.906 3.731 19.576 1.061 19.407 1.230
22 21.631 17.683 3.948 20.241 1.389 20.238 1.393
23 22.634 18.475 4.159 20.905 1.729 21.077 1.557
24 23.653 19.287 4.367 21.568 2.085 21.927 1.727
25 24.648 20.120 4.528 22.233 2.415 22.787 1.861
26 25.669 20.975 4.694 22.904 2.765 23.657 2.012
27 26.682 21.850 4.832 23.584 3.098 24.533 2.149
28 27.661 22.744 4.917 24.278 3.383 25.412 2.249
29 28.657 23.653 5.003 24.989 3.668 26.290 2.367
30 29.658 24.577 5.081 25.718 3.939 27.167 2.490
31 30.654 25.516 5.138 26.468 4.186 28.047 2.607
32 31.619 26.475 5.144 27.244 4.375 28.935 2.683
33 32.582 27.463 5.118 28.059 4.523 29.844 2.738
34 33.522 28.499 5.023 28.937 4.585 30.788 2.734
35 34.559 29.607 4.952 29.906 4.653 31.794 2.765
36 35.492 30.833 4.659 31.011 4.481 32.903 2.589
37 36.482 32.252 4.230 32.315 4.167 34.191 2.291
38 37.618 33.969 3.649 33.894 3.724 35.792 1.826
39 38.472 36.079 2.393 35.792 2.680 37.919 0.552
40 39.452 38.404 1.048 37.954 1.498 38.294 1.158
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 45. Equating mean score and score difference for TCF linking method condition 5
Test Characteristic Curve Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.545 0.000 0.545 0.000 0.545 0.000 0.545
1 1.322 1.368 0.046 1.972 0.650 1.353 0.031
2 2.257 2.326 0.069 3.007 0.751 2.474 0.218
3 3.083 3.253 0.171 4.012 0.929 3.568 0.485
4 4.004 4.148 0.144 4.993 0.989 4.647 0.643
5 4.943 5.011 0.069 5.960 1.017 5.688 0.745
6 5.886 5.847 0.040 6.927 1.041 6.680 0.794
7 6.835 6.654 0.181 7.917 1.082 7.630 0.796
8 7.786 7.427 0.359 8.941 1.154 8.547 0.761
9 8.767 8.172 0.595 9.959 1.191 9.439 0.672
10 9.767 8.912 0.855 10.934 1.167 10.314 0.547
11 10.763 9.651 1.112 11.862 1.099 11.176 0.413
12 11.732 10.376 1.355 12.747 1.015 12.029 0.297
13 12.709 11.095 1.614 13.600 0.891 12.875 0.166
14 13.734 11.819 1.915 14.426 0.693 13.715 0.018
15 14.727 12.543 2.183 15.230 0.503 14.551 0.176
16 15.690 13.262 2.428 16.012 0.322 15.381 0.309
17 16.693 13.986 2.707 16.773 0.080 16.207 0.486
18 17.688 14.717 2.972 17.513 0.176 17.030 0.659
19 18.662 15.451 3.211 18.236 0.426 17.851 0.811
20 19.651 16.192 3.459 18.950 0.701 18.673 0.978
21 20.637 16.946 3.691 19.657 0.980 19.498 1.140
22 21.631 17.714 3.916 20.358 1.273 20.328 1.303
23 22.634 18.498 4.136 21.055 1.579 21.166 1.468
24 23.653 19.300 4.353 21.749 1.904 22.014 1.640
25 24.648 20.124 4.524 22.442 2.205 22.872 1.775
26 25.669 20.968 4.701 23.140 2.529 23.740 1.929
27 26.682 21.833 4.849 23.846 2.835 24.614 2.068
28 27.661 22.715 4.945 24.564 3.097 25.491 2.170
29 28.657 23.613 5.043 25.297 3.360 26.367 2.289
30 29.658 24.524 5.133 26.043 3.615 27.242 2.416
31 30.654 25.450 5.204 26.799 3.854 28.119 2.535
32 31.619 26.393 5.225 27.574 4.045 29.003 2.615
33 32.582 27.365 5.216 28.384 4.197 29.908 2.674
34 33.522 28.382 5.140 29.254 4.268 30.849 2.673
35 34.559 29.472 5.087 30.214 4.345 31.852 2.707
36 35.492 30.681 4.811 31.304 4.188 32.960 2.532
37 36.482 32.093 4.389 32.581 3.901 34.244 2.238
38 37.618 33.812 3.805 34.117 3.501 35.837 1.780
39 38.472 35.953 2.518 35.956 2.516 37.947 0.525
40 39.452 38.338 1.113 38.035 1.416 38.294 1.158
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 46. Equating mean score and score difference for ICF linking method condition 5
Item Characteristic Curve Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.545 0.000 0.545 0.000 0.545 0.000 0.545
1 1.322 0.846 0.476 0.939 0.383 0.977 0.345
2 2.257 1.783 0.474 1.928 0.329 1.947 0.310
3 3.083 2.703 0.380 2.918 0.164 2.896 0.186
4 4.004 3.602 0.403 3.887 0.117 3.833 0.171
5 4.943 4.469 0.473 4.846 0.097 4.777 0.166
6 5.886 5.314 0.572 5.799 0.087 5.738 0.149
7 6.835 6.157 0.677 6.745 0.090 6.719 0.115
8 7.786 7.008 0.779 7.686 0.100 7.720 0.066
9 8.767 7.863 0.904 8.617 0.150 8.736 0.031
10 9.767 8.724 1.043 9.573 0.194 9.763 0.003
11 10.763 9.587 1.176 10.563 0.200 10.798 0.035
12 11.732 10.450 1.281 11.581 0.151 11.836 0.104
13 12.709 11.314 1.395 12.618 0.091 12.875 0.166
14 13.734 12.184 1.550 13.665 0.068 13.915 0.181
15 14.727 13.061 1.666 14.716 0.010 14.953 0.227
16 15.690 13.946 1.745 15.770 0.079 15.990 0.299
17 16.693 14.837 1.855 16.825 0.132 17.023 0.330
18 17.688 15.737 1.951 17.881 0.193 18.053 0.365
19 18.662 16.646 2.016 18.936 0.273 19.080 0.417
20 19.651 17.564 2.087 19.987 0.336 20.102 0.452
21 20.637 18.494 2.143 21.036 0.398 21.121 0.484
22 21.631 19.437 2.194 22.082 0.451 22.138 0.507
23 22.634 20.395 2.239 23.126 0.492 23.152 0.518
24 23.653 21.370 2.283 24.167 0.514 24.165 0.512
25 24.648 22.362 2.285 25.208 0.560 25.179 0.531
26 25.669 23.372 2.298 26.251 0.582 26.194 0.525
27 26.682 24.398 2.284 27.300 0.618 27.211 0.529
28 27.661 25.439 2.222 28.354 0.693 28.230 0.569
29 28.657 26.494 2.162 29.406 0.750 29.250 0.594
30 29.658 27.564 2.094 30.451 0.793 30.270 0.612
31 30.654 28.650 2.004 31.480 0.826 31.286 0.632
32 31.619 29.756 1.863 32.488 0.870 32.295 0.676
33 32.582 30.886 1.695 33.474 0.892 33.293 0.712
34 33.522 32.045 1.477 34.434 0.912 34.278 0.756
35 34.559 33.234 1.326 35.371 0.812 35.247 0.688
36 35.492 34.461 1.031 36.291 0.799 36.201 0.709
37 36.482 35.735 0.747 37.203 0.721 37.141 0.659
38 37.618 37.049 0.569 38.118 0.501 38.075 0.457
39 38.472 38.352 0.120 39.052 0.580 39.020 0.548
40 39.452 39.599 0.147 39.915 0.463 38.294 1.158
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 47. Equating mean score and score difference for NOP linking method condition 5
NOP Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.545 0.000 0.545 0.000 0.545 0.000 0.545
1 1.322 0.000 1.322 0.000 1.322 0.661 0.661
2 2.257 0.024 2.232 0.003 2.254 1.295 0.962
3 3.083 0.541 2.541 0.105 2.978 1.870 1.212
4 4.004 0.905 3.100 0.574 3.430 2.374 1.630
5 4.943 1.483 3.460 1.418 3.525 2.820 2.122
6 5.886 1.871 4.015 2.309 3.577 3.232 2.654
7 6.835 2.395 4.440 3.182 3.653 3.628 3.207
8 7.786 2.814 4.972 3.959 3.827 4.023 3.763
9 8.767 3.305 5.463 4.614 4.154 4.428 4.339
10 9.767 3.763 6.004 5.192 4.575 4.852 4.915
11 10.763 4.251 6.512 5.695 5.068 5.301 5.462
12 11.732 4.745 6.987 6.190 5.542 5.781 5.951
13 12.709 5.259 7.450 6.650 6.059 6.296 6.413
14 13.734 5.786 7.947 7.098 6.636 6.849 6.885
15 14.727 6.346 8.380 7.571 7.156 7.442 7.285
16 15.690 6.912 8.778 8.032 7.658 8.076 7.614
17 16.693 7.526 9.167 8.559 8.134 8.750 7.942
18 17.688 8.146 9.543 9.112 8.576 9.465 8.223
19 18.662 8.805 9.857 9.731 8.931 10.220 8.442
20 19.651 9.498 10.152 10.403 9.248 11.015 8.636
21 20.637 10.216 10.421 11.126 9.512 11.850 8.787
22 21.631 10.971 10.659 11.907 9.724 12.728 8.903
23 22.634 11.766 10.868 12.748 9.886 13.649 8.985
24 23.653 12.595 11.058 13.643 10.010 14.615 9.038
25 24.648 13.464 11.184 14.590 10.058 15.627 9.021
26 25.669 14.372 11.297 15.587 10.082 16.683 8.986
27 26.682 15.320 11.361 16.635 10.047 17.779 8.902
28 27.661 16.308 11.353 17.733 9.928 18.910 8.750
29 28.657 17.334 11.322 18.877 9.779 20.069 8.587
30 29.658 18.401 11.257 20.060 9.597 21.251 8.407
31 30.654 19.511 11.142 21.273 9.381 22.453 8.201
32 31.619 20.673 10.946 22.509 9.110 23.682 7.937
33 32.582 21.897 10.685 23.764 8.818 24.951 7.630
34 33.522 23.201 10.321 25.047 8.475 26.291 7.232
35 34.559 24.615 9.944 26.388 8.172 27.751 6.808
36 35.492 26.191 9.301 27.841 7.651 29.423 6.069
37 36.482 28.036 8.446 29.484 6.998 31.464 5.018
38 37.618 30.341 7.277 31.409 6.209 34.139 3.479
39 38.472 33.367 5.105 33.765 4.707 37.579 0.893
40 39.452 37.061 2.390 36.712 2.740 38.294 1.158
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 48. Equating mean score and score difference for linking method condition 6
Min's Method
Columns: Raw Score; Frequency Pop Equivalent; Full MIRT Mean Score; Full MIRT Difference; Approx Observed Mean Score; Approx Observed Difference; Approx True Mean Score; Approx True Difference
0 0.519 0.000 0.519 0.000 0.519 0.000 0.519
1 1.158 0.047 1.111 0.167 0.991 0.914 0.244
2 2.137 0.646 1.490 0.758 1.379 1.766 0.370
3 2.976 1.289 1.687 1.668 1.308 2.565 0.411
4 3.809 1.881 1.928 2.635 1.175 3.297 0.513
5 4.752 2.534 2.217 3.600 1.152 3.973 0.779
6 5.723 3.116 2.607 4.556 1.167 4.619 1.104
7 6.718 3.724 2.994 5.477 1.241 5.254 1.464
8 7.655 4.328 3.326 6.330 1.325 5.895 1.759
9 8.580 4.925 3.655 7.083 1.497 6.552 2.028
10 9.565 5.554 4.011 7.774 1.791 7.231 2.334
11 10.570 6.179 4.390 8.441 2.129 7.935 2.635
12 11.579 6.827 4.752 9.116 2.463 8.665 2.914
13 12.576 7.501 5.076 9.809 2.767 9.420 3.157
14 13.591 8.179 5.412 10.523 3.068 10.196 3.395
15 14.619 8.880 5.739 11.258 3.361 10.991 3.628
16 15.620 9.607 6.013 12.012 3.608 11.801 3.819
17 16.636 10.344 6.292 12.786 3.850 12.624 4.011
18 17.641 11.096 6.545 13.580 4.061 13.461 4.180
19 18.630 11.875 6.755 14.392 4.238 14.310 4.320
20 19.633 12.679 6.953 15.214 4.419 15.174 4.459
21 20.617 13.508 7.110 16.042 4.576 16.055 4.562
22 21.612 14.358 7.254 16.874 4.739 16.957 4.655
23 22.619 15.239 7.380 17.712 4.907 17.884 4.735
24 23.641 16.153 7.488 18.558 5.083 18.836 4.805
25 24.650 17.101 7.549 19.413 5.237 19.813 4.837
26 25.684 18.081 7.603 20.278 5.407 20.813 4.871
27 26.713 19.091 7.622 21.150 5.563 21.828 4.885
28 27.716 20.123 7.592 22.026 5.690 22.847 4.869
29 28.711 21.174 7.537 22.905 5.806 23.862 4.849
30 29.708 22.236 7.472 23.790 5.918 24.869 4.839
31 30.713 23.308 7.404 24.685 6.028 25.870 4.843
32 31.694 24.393 7.301 25.597 6.098 26.873 4.821
33 32.680 25.500 7.179 26.531 6.149 27.896 4.783
34 33.657 26.649 7.008 27.503 6.154 28.965 4.692
35 34.715 27.871 6.844 28.543 6.172 30.119 4.596
36 35.658 29.218 6.440 29.700 5.959 31.422 4.237
37 36.663 30.777 5.885 31.041 5.622 32.985 3.678
38 37.775 32.677 5.098 32.656 5.119 35.017 2.758
39 38.663 35.040 3.623 34.660 4.003 37.779 0.884
40 39.608 37.763 1.845 37.104 2.504 38.294 1.314
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 49. Equating mean score and score difference for ODL direct method condition 6
Oshima's Direct Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.519       0.000       0.519       0.000       0.519       0.000       0.519
        1       1.158       0.567       0.591       0.924       0.234       1.063       0.095
        2       2.137       1.339       0.798       1.934       0.203       2.059       0.077
        3       2.976       2.072       0.904       2.943       0.033       3.023       0.047
        4       3.809       2.848       0.961       3.927       0.118       3.955       0.145
        5       4.752       3.634       1.117       4.901       0.150       4.857       0.105
        6       5.723       4.391       1.332       5.866       0.143       5.740       0.017
        7       6.718       5.124       1.595       6.822       0.104       6.613       0.105
        8       7.655       5.873       1.782       7.766       0.112       7.485       0.170
        9       8.580       6.639       1.941       8.697       0.117       8.360       0.220
       10       9.565       7.406       2.159       9.621       0.056       9.241       0.324
       11      10.570       8.175       2.394      10.541       0.029      10.129       0.441
       12      11.579       8.960       2.619      11.451       0.128      11.023       0.556
       13      12.576       9.762       2.814      12.351       0.225      11.921       0.655
       14      13.591      10.575       3.015      13.242       0.349      12.823       0.768
       15      14.619      11.393       3.226      14.122       0.497      13.726       0.893
       16      15.620      12.219       3.401      14.989       0.631      14.628       0.991
       17      16.636      13.055       3.580      15.840       0.796      15.530       1.105
       18      17.641      13.905       3.736      16.672       0.969      16.432       1.209
       19      18.630      14.767       3.863      17.487       1.143      17.335       1.295
       20      19.633      15.641       3.992      18.286       1.347      18.240       1.393
       21      20.617      16.530       4.087      19.074       1.543      19.150       1.468
       22      21.612      17.438       4.175      19.857       1.755      20.067       1.546
       23      22.619      18.366       4.253      20.637       1.982      20.993       1.626
       24      23.641      19.318       4.323      21.414       2.227      21.930       1.710
       25      24.650      20.293       4.357      22.194       2.457      22.879       1.771
       26      25.684      21.292       4.392      22.978       2.706      23.837       1.848
       27      26.713      22.310       4.403      23.771       2.942      24.799       1.914
       28      27.716      23.341       4.374      24.573       3.143      25.759       1.957
       29      28.711      24.381       4.330      25.384       3.327      26.714       1.997
       30      29.708      25.422       4.286      26.205       3.503      27.661       2.047
       31      30.713      26.464       4.248      27.037       3.676      28.602       2.110
       32      31.694      27.511       4.183      27.888       3.806      29.546       2.148
       33      32.680      28.573       4.107      28.776       3.904      30.502       2.177
       34      33.657      29.667       3.990      29.721       3.936      31.488       2.169
       35      34.715      30.820       3.895      30.749       3.966      32.527       2.188
       36      35.658      32.070       3.589      31.892       3.766      33.656       2.003
       37      36.663      33.461       3.202      33.193       3.469      34.932       1.731
       38      37.775      35.055       2.720      34.692       3.083      36.455       1.320
       39      38.663      36.863       1.800      36.396       2.267      38.334       0.328
       40      39.608      38.765       0.843      38.253       1.355      38.294       1.314

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 50. Equating mean score and score difference for TCF linking method condition 6
Test Characteristic Curve Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.519       0.000       0.519       0.000       0.519       0.000       0.519
        1       1.158       0.581       0.577       0.937       0.221       1.068       0.090
        2       2.137       1.375       0.761       1.953       0.184       2.068       0.069
        3       2.976       2.113       0.863       2.966       0.010       3.035       0.059
        4       3.809       2.886       0.923       3.953       0.143       3.971       0.161
        5       4.752       3.673       1.078       4.927       0.175       4.877       0.125
        6       5.723       4.437       1.286       5.889       0.166       5.763       0.041
        7       6.718       5.170       1.548       6.841       0.123       6.640       0.078
        8       7.655       5.917       1.738       7.781       0.127       7.514       0.141
        9       8.580       6.681       1.899       8.704       0.124       8.391       0.189
       10       9.565       7.449       2.116       9.619       0.054       9.273       0.292
       11      10.570       8.217       2.353      10.528       0.042      10.161       0.409
       12      11.579       8.998       2.581      11.430       0.149      11.054       0.525
       13      12.576       9.796       2.780      12.324       0.252      11.952       0.624
       14      13.591      10.606       2.985      13.210       0.380      12.852       0.738
       15      14.619      11.421       3.198      14.089       0.529      13.753       0.865
       16      15.620      12.242       3.378      14.959       0.661      14.654       0.966
       17      16.636      13.074       3.562      15.815       0.821      15.554       1.082
       18      17.641      13.918       3.723      16.654       0.987      16.453       1.188
       19      18.630      14.774       3.856      17.477       1.153      17.354       1.276
       20      19.633      15.643       3.989      18.285       1.348      18.256       1.377
       21      20.617      16.527       4.091      19.084       1.534      19.163       1.455
       22      21.612      17.428       4.184      19.877       1.735      20.077       1.536
       23      22.619      18.350       4.268      20.666       1.952      21.000       1.618
       24      23.641      19.296       4.345      21.453       2.188      21.935       1.706
       25      24.650      20.265       4.385      22.241       2.410      22.880       1.770
       26      25.684      21.258       4.427      23.032       2.652      23.834       1.850
       27      26.713      22.269       4.443      23.830       2.882      24.793       1.920
       28      27.716      23.295       4.421      24.637       3.078      25.750       1.966
       29      28.711      24.328       4.383      25.453       3.258      26.702       2.009
       30      29.708      25.363       4.345      26.277       3.431      27.646       2.062
       31      30.713      26.399       4.314      27.111       3.601      28.587       2.126
       32      31.694      27.440       4.254      27.963       3.731      29.530       2.164
       33      32.680      28.497       4.183      28.850       3.830      30.488       2.192
       34      33.657      29.586       4.071      29.792       3.865      31.475       2.181
       35      34.715      30.736       3.979      30.818       3.897      32.517       2.198
       36      35.658      31.984       3.675      31.957       3.702      33.649       2.010
       37      36.663      33.376       3.286      33.251       3.412      34.929       1.734
       38      37.775      34.979       2.796      34.739       3.036      36.455       1.320
       39      38.663      36.804       1.859      36.428       2.235      38.335       0.328
       40      39.608      38.732       0.876      38.269       1.339      38.294       1.314

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 51. Equating mean score and score difference for ICF linking method condition 6
Item Characteristic Curve Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.519       0.000       0.519       0.000       0.519       0.000       0.519
        1       1.158       0.608       0.550       0.857       0.301       0.957       0.201
        2       2.137       1.464       0.673       1.851       0.286       1.924       0.213
        3       2.976       2.242       0.734       2.843       0.133       2.864       0.112
        4       3.809       3.041       0.768       3.812       0.002       3.786       0.024
        5       4.752       3.870       0.882       4.764       0.012       4.711       0.040
        6       5.723       4.710       1.013       5.711       0.012       5.657       0.066
        7       6.718       5.551       1.167       6.647       0.071       6.627       0.091
        8       7.655       6.389       1.266       7.570       0.085       7.620       0.035
        9       8.580       7.235       1.345       8.478       0.102       8.633       0.053
       10       9.565       8.099       1.466       9.418       0.147       9.660       0.095
       11      10.570       8.979       1.590      10.399       0.171      10.696       0.127
       12      11.579       9.875       1.705      11.413       0.166      11.738       0.158
       13      12.576      10.782       1.794      12.452       0.125      12.781       0.205
       14      13.591      11.700       1.890      13.503       0.088      13.824       0.234
       15      14.619      12.628       1.991      14.559       0.060      14.865       0.247
       16      15.620      13.564       2.056      15.616       0.003      15.904       0.284
       17      16.636      14.509       2.127      16.675       0.039      16.938       0.302
       18      17.641      15.463       2.178      17.732       0.091      17.968       0.327
       19      18.630      16.426       2.204      18.785       0.155      18.994       0.364
       20      19.633      17.400       2.233      19.832       0.200      20.015       0.382
       21      20.617      18.386       2.232      20.874       0.256      21.031       0.414
       22      21.612      19.384       2.228      21.909       0.297      22.044       0.432
       23      22.619      20.397       2.221      22.939       0.321      23.055       0.436
       24      23.641      21.425       2.216      23.964       0.323      24.063       0.422
       25      24.650      22.466       2.185      24.984       0.333      25.071       0.421
       26      25.684      23.520       2.164      26.002       0.318      26.080       0.395
       27      26.713      24.585       2.128      27.021       0.308      27.088       0.375
       28      27.716      25.657       2.059      28.041       0.325      28.096       0.380
       29      28.711      26.733       1.978      29.059       0.348      29.102       0.391
       30      29.708      27.812       1.896      30.071       0.363      30.106       0.398
       31      30.713      28.894       1.818      31.073       0.361      31.105       0.392
       32      31.694      29.984       1.710      32.065       0.371      32.098       0.404
       33      32.680      31.085       1.594      33.045       0.365      33.084       0.404
       34      33.657      32.205       1.452      34.013       0.356      34.062       0.405
       35      34.715      33.350       1.365      34.972       0.257      35.035       0.320
       36      35.658      34.529       1.129      35.927       0.269      36.004       0.346
       37      36.663      35.752       0.910      36.885       0.223      36.977       0.314
       38      37.775      37.014       0.761      37.856       0.082      37.964       0.189
       39      38.663      38.287       0.376      38.850       0.187      38.981       0.318
       40      39.608      39.532       0.076      39.797       0.189      38.294       1.314

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 52. Equating mean score and score difference for NOP linking method condition 6
NOP Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.519       0.000       0.519       0.000       0.519       0.000       0.519
        1       1.158       0.000       1.158       0.000       1.158       0.618       0.540
        2       2.137       0.000       2.137       0.000       2.137       1.229       0.908
        3       2.976       0.006       2.970       0.022       2.954       1.782       1.194
        4       3.809       0.403       3.406       0.309       3.501       2.264       1.546
        5       4.752       0.713       4.038       0.989       3.763       2.688       2.063
        6       5.723       1.143       4.579       1.844       3.879       3.079       2.644
        7       6.718       1.606       5.112       2.686       4.033       3.454       3.265
        8       7.655       1.973       5.682       3.452       4.203       3.827       3.828
        9       8.580       2.475       6.105       4.107       4.473       4.209       4.371
       10       9.565       2.863       6.702       4.679       4.886       4.609       4.956
       11      10.570       3.359       7.211       5.214       5.356       5.033       5.537
       12      11.579       3.804       7.776       5.702       5.877       5.487       6.093
       13      12.576       4.310       8.267       6.187       6.389       5.975       6.601
       14      13.591       4.806       8.784       6.646       6.944       6.502       7.089
       15      14.619       5.354       9.265       7.103       7.516       7.068       7.551
       16      15.620       5.898       9.721       7.584       8.036       7.676       7.943
       17      16.636       6.506      10.130       8.067       8.569       8.327       8.309
       18      17.641       7.112      10.529       8.616       9.025       9.020       8.621
       19      18.630       7.774      10.855       9.200       9.429       9.755       8.875
       20      19.633       8.472      11.160       9.845       9.788      10.532       9.101
       21      20.617       9.200      11.417      10.556      10.061      11.352       9.265
       22      21.612       9.975      11.637      11.317      10.295      12.217       9.396
       23      22.619      10.797      11.822      12.137      10.481      13.127       9.492
       24      23.641      11.662      11.979      13.018      10.623      14.084       9.557
       25      24.650      12.573      12.078      13.957      10.693      15.087       9.563
       26      25.684      13.530      12.155      14.949      10.735      16.136       9.548
       27      26.713      14.532      12.180      15.992      10.720      17.226       9.487
       28      27.716      15.577      12.139      17.084      10.631      18.351       9.364
       29      28.711      16.663      12.048      18.222      10.489      19.505       9.206
       30      29.708      17.788      11.920      19.398      10.310      20.683       9.025
       31      30.713      18.955      11.758      20.602      10.110      21.883       8.830
       32      31.694      20.169      11.525      21.830       9.864      23.110       8.584
       33      32.680      21.442      11.238      23.079       9.600      24.379       8.300
       34      33.657      22.790      10.867      24.357       9.300      25.719       7.938
       35      34.715      24.242      10.473      25.688       9.027      27.182       7.533
       36      35.658      25.850       9.808      27.123       8.536      28.862       6.797
       37      36.663      27.708       8.954      28.747       7.916      30.931       5.732
       38      37.775      29.986       7.789      30.664       7.111      33.706       4.069
       39      38.663      32.918       5.745      33.021       5.642      37.427       1.236
       40      39.608      36.562       3.046      36.048       3.560      38.294       1.314

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 53. Equating mean score and score difference for linking method condition 7
Min's Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.183       0.000       0.183       0.000       0.183       0.000       0.183
        1       0.966       1.269       0.303       1.538       0.572       1.127       0.162
        2       1.988       2.288       0.301       2.551       0.564       2.172       0.184
        3       2.950       3.308       0.358       3.539       0.589       3.201       0.251
        4       3.952       4.327       0.375       4.521       0.568       4.233       0.281
        5       4.987       5.345       0.358       5.502       0.515       5.272       0.285
        6       5.980       6.362       0.382       6.483       0.503       6.313       0.333
        7       6.948       7.376       0.428       7.474       0.526       7.349       0.402
        8       7.943       8.386       0.444       8.476       0.533       8.378       0.436
        9       8.949       9.393       0.444       9.491       0.542       9.398       0.449
       10       9.958      10.397       0.439      10.511       0.553      10.409       0.451
       11      10.962      11.397       0.435      11.530       0.568      11.412       0.450
       12      11.949      12.395       0.446      12.544       0.595      12.410       0.461
       13      12.946      13.392       0.446      13.552       0.606      13.404       0.458
       14      13.946      14.387       0.440      14.555       0.609      14.396       0.450
       15      14.938      15.381       0.443      15.555       0.617      15.388       0.450
       16      15.957      16.375       0.419      16.553       0.596      16.379       0.423
       17      16.985      17.369       0.384      17.550       0.565      17.371       0.386
       18      18.002      18.363       0.362      18.550       0.548      18.364       0.363
       19      19.007      19.358       0.351      19.551       0.544      19.358       0.352
       20      20.028      20.353       0.325      20.556       0.528      20.353       0.325
       21      21.052      21.349       0.297      21.564       0.513      21.349       0.297
       22      22.055      22.345       0.291      22.578       0.523      22.344       0.290
       23      23.047      23.343       0.296      23.595       0.548      23.341       0.294
       24      24.027      24.342       0.315      24.616       0.589      24.338       0.311
       25      25.023      25.343       0.320      25.640       0.618      25.339       0.316
       26      26.025      26.347       0.322      26.666       0.641      26.344       0.319
       27      27.032      27.355       0.322      27.690       0.658      27.355       0.323
       28      28.066      28.366       0.300      28.710       0.644      28.373       0.307
       29      29.094      29.380       0.286      29.720       0.627      29.394       0.301
       30      30.087      30.395       0.308      30.718       0.631      30.417       0.329
       31      31.080      31.408       0.328      31.701       0.621      31.434       0.354
       32      32.046      32.414       0.369      32.666       0.621      32.439       0.394
       33      33.018      33.409       0.391      33.617       0.599      33.428       0.410
       34      34.007      34.388       0.381      34.555       0.548      34.394       0.387
       35      34.943      35.352       0.409      35.486       0.543      35.337       0.394
       36      35.907      36.302       0.394      36.414       0.507      36.259       0.352
       37      36.911      37.242       0.330      37.342       0.431      37.170       0.258
       38      37.887      38.178       0.290      38.272       0.385      38.082       0.194
       39      39.036      39.114       0.078      39.198       0.162      39.015       0.021
       40      40.180      39.962       0.218      39.945       0.235      38.755       1.425

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 54. Equating mean score and score difference for ODL direct method condition 7
Oshima's Direct Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.183       0.000       0.183       0.000       0.183       0.000       0.183
        1       0.966       1.214       0.249       1.484       0.519       1.119       0.153
        2       1.988       2.229       0.242       2.490       0.502       2.157       0.169
        3       2.950       3.244       0.294       3.480       0.530       3.180       0.230
        4       3.952       4.259       0.307       4.456       0.503       4.203       0.250
        5       4.987       5.273       0.286       5.431       0.444       5.229       0.242
        6       5.980       6.285       0.305       6.407       0.427       6.255       0.275
        7       6.948       7.296       0.348       7.391       0.444       7.280       0.332
        8       7.943       8.304       0.361       8.385       0.443       8.300       0.357
        9       8.949       9.309       0.360       9.389       0.440       9.314       0.365
       10       9.958      10.312       0.355      10.396       0.438      10.322       0.365
       11      10.962      11.314       0.352      11.402       0.440      11.326       0.364
       12      11.949      12.314       0.365      12.406       0.457      12.326       0.377
       13      12.946      13.314       0.368      13.408       0.462      13.324       0.378
       14      13.946      14.314       0.367      14.411       0.465      14.321       0.374
       15      14.938      15.313       0.375      15.417       0.478      15.318       0.380
       16      15.957      16.313       0.356      16.426       0.469      16.315       0.359
       17      16.985      17.312       0.327      17.440       0.455      17.314       0.329
       18      18.002      18.313       0.311      18.460       0.459      18.314       0.313
       19      19.007      19.314       0.307      19.487       0.480      19.316       0.309
       20      20.028      20.316       0.288      20.518       0.490      20.319       0.290
       21      21.052      21.319       0.267      21.555       0.504      21.322       0.271
       22      22.055      22.324       0.269      22.597       0.542      22.327       0.272
       23      23.047      23.330       0.283      23.641       0.594      23.333       0.286
       24      24.027      24.337       0.310      24.688       0.660      24.340       0.313
       25      25.023      25.348       0.325      25.734       0.712      25.351       0.328
       26      26.025      26.361       0.336      26.779       0.754      26.367       0.342
       27      27.032      27.379       0.346      27.818       0.786      27.390       0.358
       28      28.066      28.399       0.333      28.849       0.783      28.419       0.353
       29      29.094      29.423       0.329      29.868       0.774      29.451       0.358
       30      30.087      30.447       0.360      30.871       0.784      30.483       0.396
       31      31.080      31.468       0.388      31.855       0.775      31.508       0.428
       32      32.046      32.481       0.435      32.820       0.774      32.519       0.474
       33      33.018      33.481       0.463      33.766       0.748      33.511       0.493
       34      34.007      34.464       0.457      34.697       0.690      34.478       0.471
       35      34.943      35.430       0.487      35.616       0.673      35.418       0.475
       36      35.907      36.379       0.471      36.531       0.623      36.333       0.426
       37      36.911      37.313       0.402      37.441       0.529      37.229       0.317
       38      37.887      38.236       0.348      38.349       0.462      38.118       0.230
       39      39.036      39.153       0.117      39.241       0.205      39.022       0.014
       40      40.180      39.954       0.226      39.929       0.251      38.755       1.425

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 55. Equating mean score and score difference for TCF linking method condition 7
Test Characteristic Curve Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.183       0.000       0.183       0.000       0.183       0.000       0.183
        1       0.966       1.083       0.117       1.254       0.288       1.058       0.093
        2       1.988       2.081       0.094       2.266       0.278       2.069       0.082
        3       2.950       3.079       0.129       3.265       0.315       3.067       0.117
        4       3.952       4.075       0.123       4.247       0.294       4.062       0.110
        5       4.987       5.071       0.084       5.222       0.235       5.059       0.072
        6       5.980       6.065       0.085       6.192       0.212       6.058       0.078
        7       6.948       7.060       0.112       7.159       0.212       7.058       0.110
        8       7.943       8.055       0.112       8.135       0.192       8.058       0.115
        9       8.949       9.050       0.101       9.121       0.172       9.058       0.109
       10       9.958      10.046       0.089      10.112       0.154      10.057       0.099
       11      10.962      11.044       0.082      11.104       0.142      11.056       0.094
       12      11.949      12.043       0.094      12.098       0.149      12.054       0.105
       13      12.946      13.044       0.098      13.093       0.147      13.053       0.107
       14      13.946      14.045       0.099      14.092       0.146      14.052       0.105
       15      14.938      15.048       0.110      15.096       0.158      15.052       0.114
       16      15.957      16.052       0.095      16.107       0.150      16.053       0.097
       17      16.985      17.057       0.072      17.124       0.139      17.057       0.072
       18      18.002      18.064       0.062      18.150       0.148      18.063       0.061
       19      19.007      19.073       0.066      19.183       0.176      19.071       0.065
       20      20.028      20.083       0.055      20.223       0.195      20.083       0.055
       21      21.052      21.096       0.045      21.270       0.219      21.098       0.046
       22      22.055      22.112       0.057      22.324       0.269      22.115       0.060
       23      23.047      23.129       0.082      23.382       0.335      23.134       0.087
       24      24.027      24.149       0.121      24.443       0.416      24.155       0.128
       25      25.023      25.171       0.148      25.507       0.484      25.180       0.157
       26      26.025      26.196       0.171      26.569       0.544      26.208       0.183
       27      27.032      27.223       0.191      27.626       0.594      27.242       0.210
       28      28.066      28.254       0.188      28.675       0.609      28.281       0.215
       29      29.094      29.288       0.194      29.711       0.617      29.323       0.229
       30      30.087      30.322       0.235      30.729       0.642      30.365       0.278
       31      31.080      31.353       0.273      31.726       0.646      31.401       0.321
       32      32.046      32.377       0.331      32.701       0.656      32.425       0.380
       33      33.018      33.388       0.370      33.656       0.639      33.430       0.413
       34      34.007      34.382       0.375      34.596       0.589      34.411       0.404
       35      34.943      35.359       0.416      35.524       0.581      35.364       0.420
       36      35.907      36.320       0.412      36.449       0.542      36.292       0.384
       37      36.911      37.267       0.356      37.376       0.464      37.204       0.293
       38      37.887      38.206       0.319      38.308       0.420      38.116       0.229
       39      39.036      39.144       0.107      39.234       0.198      39.047       0.010
       40      40.180      39.980       0.200      39.963       0.217      38.755       1.425

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 56. Equating mean score and score difference for ICF linking method condition 7
Item Characteristic Curve Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.183       0.000       0.183       0.000       0.183       0.000       0.183
        1       0.966       1.108       0.142       1.321       0.355       1.082       0.116
        2       1.988       2.106       0.118       2.321       0.333       2.099       0.111
        3       2.950       3.103       0.153       3.308       0.358       3.099       0.149
        4       3.952       4.098       0.146       4.286       0.334       4.093       0.141
        5       4.987       5.092       0.105       5.258       0.271       5.088       0.101
        6       5.980       6.084       0.104       6.228       0.247       6.083       0.103
        7       6.948       7.076       0.128       7.196       0.249       7.079       0.132
        8       7.943       8.067       0.124       8.170       0.228       8.075       0.132
        9       8.949       9.058       0.109       9.153       0.204       9.070       0.121
       10       9.958      10.050       0.092      10.140       0.182      10.064       0.106
       11      10.962      11.043       0.081      11.129       0.167      11.057       0.095
       12      11.949      12.037       0.088      12.118       0.169      12.050       0.101
       13      12.946      13.032       0.086      13.109       0.163      13.043       0.097
       14      13.946      14.029       0.083      14.103       0.157      14.037       0.090
       15      14.938      15.027       0.089      15.102       0.163      15.031       0.093
       16      15.957      16.026       0.069      16.106       0.149      16.028       0.071
       17      16.985      17.026       0.041      17.117       0.131      17.026       0.041
       18      18.002      18.028       0.026      18.134       0.133      18.027       0.025
       19      19.007      19.032       0.025      19.159       0.152      19.031       0.024
       20      20.028      20.039       0.011      20.191       0.163      20.038       0.010
       21      21.052      21.048       0.004      21.230       0.179      21.049       0.003
       22      22.055      22.060       0.005      22.275       0.221      22.062       0.008
       23      23.047      23.075       0.028      23.325       0.278      23.078       0.032
       24      24.027      24.092       0.065      24.379       0.352      24.098       0.070
       25      25.023      25.113       0.090      25.436       0.413      25.120       0.098
       26      26.025      26.136       0.111      26.492       0.467      26.147       0.122
       27      27.032      27.163       0.131      27.545       0.512      27.179       0.147
       28      28.066      28.193       0.127      28.591       0.525      28.216       0.150
       29      29.094      29.225       0.132      29.626       0.533      29.256       0.162
       30      30.087      30.258       0.171      30.647       0.560      30.296       0.209
       31      31.080      31.289       0.209      31.649       0.569      31.332       0.252
       32      32.046      32.314       0.268      32.631       0.586      32.357       0.312
       33      33.018      33.327       0.309      33.595       0.577      33.366       0.348
       34      34.007      34.325       0.319      34.544       0.537      34.353       0.347
       35      34.943      35.308       0.365      35.482       0.539      35.315       0.372
       36      35.907      36.276       0.369      36.413       0.506      36.253       0.346
       37      36.911      37.231       0.319      37.342       0.431      37.173       0.262
       38      37.887      38.176       0.289      38.272       0.384      38.088       0.201
       39      39.036      39.120       0.083      39.191       0.154      39.020       0.017
       40      40.180      39.972       0.208      39.944       0.235      38.755       1.425

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 57. Equating mean score and score difference for NOP linking method condition 7
NOP Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.183       0.000       0.183       0.000       0.183       0.000       0.183
        1       0.966       1.343       0.378       1.646       0.680       1.149       0.183
        2       1.988       2.377       0.390       2.673       0.686       2.211       0.224
        3       2.950       3.413       0.463       3.674       0.723       3.263       0.312
        4       3.952       4.449       0.497       4.662       0.710       4.321       0.369
        5       4.987       5.485       0.498       5.650       0.663       5.388       0.401
        6       5.980       6.518       0.537       6.638       0.658       6.456       0.475
        7       6.948       7.546       0.599       7.639       0.692       7.515       0.567
        8       7.943       8.569       0.627       8.660       0.717       8.561       0.619
        9       8.949       9.586       0.637       9.695       0.746       9.594       0.645
       10       9.958      10.597       0.639      10.734       0.776      10.613       0.655
       11      10.962      11.602       0.640      11.767       0.805      11.621       0.659
       12      11.949      12.603       0.654      12.792       0.843      12.622       0.673
       13      12.946      13.601       0.655      13.808       0.861      13.617       0.671
       14      13.946      14.597       0.650      14.815       0.869      14.609       0.663
       15      14.938      15.590       0.652      15.816       0.878      15.599       0.661
       16      15.957      16.582       0.626      16.814       0.858      16.589       0.632
       17      16.985      17.573       0.588      17.810       0.825      17.578       0.593
       18      18.002      18.563       0.561      18.805       0.803      18.567       0.565
       19      19.007      19.552       0.545      19.800       0.793      19.555       0.548
       20      20.028      20.540       0.512      20.797       0.769      20.542       0.514
       21      21.052      21.528       0.476      21.796       0.744      21.529       0.477
       22      22.055      22.516       0.461      22.796       0.742      22.514       0.460
       23      23.047      23.504       0.457      23.799       0.753      23.499       0.453
       24      24.027      24.493       0.466      24.805       0.777      24.486       0.459
       25      25.023      25.485       0.463      25.812       0.789      25.477       0.454
       26      26.025      26.481       0.456      26.819       0.794      26.473       0.448
       27      27.032      27.480       0.448      27.825       0.793      27.476       0.444
       28      28.066      28.483       0.418      28.827       0.761      28.486       0.420
       29      29.094      29.490       0.396      29.820       0.726      29.499       0.405
       30      30.087      30.497       0.409      30.802       0.714      30.511       0.424
       31      31.080      31.500       0.420      31.769       0.689      31.517       0.437
       32      32.046      32.496       0.450      32.722       0.677      32.511       0.465
       33      33.018      33.479       0.461      33.662       0.644      33.487       0.469
       34      34.007      34.448       0.441      34.592       0.585      34.443       0.436
       35      34.943      35.402       0.459      35.515       0.572      35.377       0.434
       36      35.907      36.343       0.436      36.439       0.531      36.293       0.385
       37      36.911      37.276       0.364      37.368       0.457      37.197       0.285
       38      37.887      38.202       0.315      38.305       0.417      38.103       0.216
       39      39.036      39.124       0.088      39.232       0.196      39.024       0.013
       40      40.180      39.946       0.234      39.932       0.248      38.755       1.425

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 58. Equating mean score and score difference for linking method condition 8
Min's Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.285       0.000       0.285       0.000       0.285       0.000       0.285
        1       0.693       0.963       0.270       1.441       0.749       1.102       0.409
        2       1.734       1.955       0.221       2.453       0.719       2.143       0.409
        3       2.793       2.950       0.156       3.453       0.660       3.173       0.379
        4       3.804       3.948       0.144       4.448       0.644       4.204       0.400
        5       4.796       4.951       0.155       5.436       0.640       5.238       0.443
        6       5.807       5.959       0.152       6.426       0.619       6.273       0.466
        7       6.794       6.973       0.179       7.424       0.630       7.305       0.510
        8       7.778       7.993       0.215       8.430       0.651       8.330       0.552
        9       8.786       9.019       0.233       9.443       0.657       9.349       0.563
       10       9.793      10.049       0.255      10.458       0.665      10.361       0.568
       11      10.805      11.082       0.277      11.471       0.666      11.367       0.562
       12      11.808      12.118       0.310      12.480       0.672      12.368       0.561
       13      12.837      13.155       0.318      13.485       0.648      13.366       0.529
       14      13.872      14.193       0.321      14.487       0.615      14.361       0.489
       15      14.889      15.231       0.342      15.486       0.597      15.353       0.464
       16      15.936      16.269       0.333      16.484       0.548      16.344       0.408
       17      16.963      17.307       0.344      17.481       0.518      17.335       0.372
       18      17.978      18.345       0.366      18.478       0.500      18.325       0.347
       19      19.018      19.382       0.365      19.476       0.459      19.316       0.298
       20      20.066      20.420       0.354      20.476       0.411      20.306       0.240
       21      21.110      21.457       0.348      21.478       0.369      21.296       0.187
       22      22.131      22.495       0.364      22.483       0.352      22.287       0.156
       23      23.148      23.532       0.384      23.491       0.343      23.278       0.131
       24      24.160      24.570       0.409      24.502       0.341      24.271       0.111
       25      25.177      25.609       0.432      25.515       0.337      25.266       0.089
       26      26.208      26.651       0.443      26.528       0.321      26.266       0.058
       27      27.221      27.697       0.476      27.541       0.320      27.270       0.049
       28      28.236      28.748       0.512      28.550       0.313      28.280       0.044
       29      29.267      29.803       0.536      29.552       0.285      29.293       0.027
       30      30.282      30.858       0.576      30.544       0.262      30.307       0.025
       31      31.274      31.908       0.634      31.525       0.251      31.317       0.042
       32      32.243      32.945       0.702      32.494       0.251      32.318       0.075
       33      33.243      33.960       0.717      33.453       0.210      33.308       0.065
       34      34.227      34.947       0.721      34.405       0.178      34.282       0.055
       35      35.183      35.905       0.722      35.353       0.169      35.239       0.056
       36      36.108      36.833       0.725      36.303       0.195      36.183       0.076
       37      37.021      37.729       0.708      37.258       0.236      37.120       0.099
       38      37.972      38.595       0.623      38.219       0.248      38.061       0.090
       39      39.174      39.420       0.246      39.167       0.007      39.017       0.158
       40      40.244      39.999       0.245      39.941       0.303      38.755       1.489

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 59. Equating mean score and score difference for ODL direct method condition 8
Oshima's Direct Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.285       0.000       0.285       0.000       0.285       0.000       0.285
        1       0.693       0.858       0.165       1.213       0.520       1.029       0.337
        2       1.734       1.821       0.087       2.201       0.467       2.036       0.302
        3       2.793       2.779       0.014       3.201       0.408       3.040       0.247
        4       3.804       3.748       0.056       4.193       0.389       4.044       0.240
        5       4.796       4.729       0.066       5.184       0.389       5.047       0.252
        6       5.807       5.723       0.085       6.177       0.369       6.052       0.244
        7       6.794       6.728       0.067       7.168       0.374       7.059       0.265
        8       7.778       7.744       0.034       8.161       0.382       8.070       0.291
        9       8.786       8.773       0.013       9.156       0.370       9.085       0.299
       10       9.793       9.811       0.018      10.154       0.361      10.103       0.310
       11      10.805      10.858       0.053      11.157       0.352      11.124       0.319
       12      11.808      11.912       0.104      12.165       0.357      12.146       0.339
       13      12.837      12.971       0.134      13.179       0.342      13.169       0.333
       14      13.872      14.033       0.162      14.199       0.327      14.192       0.321
       15      14.889      15.098       0.209      15.226       0.337      15.216       0.327
       16      15.936      16.165       0.229      16.259       0.323      16.239       0.303
       17      16.963      17.234       0.271      17.297       0.335      17.264       0.301
       18      17.978      18.303       0.325      18.341       0.363      18.289       0.310
       19      19.018      19.373       0.356      19.390       0.372      19.314       0.297
       20      20.066      20.444       0.378      20.443       0.378      20.340       0.275
       21      21.110      21.514       0.405      21.501       0.391      21.366       0.257
       22      22.131      22.584       0.453      22.562       0.431      22.392       0.261
       23      23.148      23.653       0.505      23.626       0.479      23.416       0.269
       24      24.160      24.721       0.561      24.693       0.533      24.442       0.281
       25      25.177      25.791       0.613      25.762       0.584      25.470       0.293
       26      26.208      26.863       0.655      26.829       0.621      26.503       0.296
       27      27.221      27.939       0.718      27.893       0.672      27.543       0.322
       28      28.236      29.021       0.785      28.948       0.712      28.589       0.353
       29      29.267      30.106       0.839      29.989       0.722      29.637       0.370
       30      30.282      31.188       0.906      31.010       0.727      30.679       0.397
       31      31.274      32.259       0.985      32.006       0.731      31.710       0.435
       32      32.243      33.308       1.065      32.977       0.734      32.719       0.476
       33      33.243      34.325       1.083      33.927       0.684      33.701       0.459
       34      34.227      35.305       1.079      34.859       0.632      34.654       0.427
       35      35.183      36.245       1.062      35.780       0.597      35.576       0.392
       36      36.108      37.145       1.037      36.695       0.588      36.471       0.364
       37      37.021      38.002       0.981      37.609       0.588      37.349       0.327
       38      37.972      38.822       0.850      38.519       0.547      38.218       0.246
       39      39.174      39.580       0.406      39.389       0.215      39.091       0.084
       40      40.244      39.991       0.253      39.947       0.297      38.755       1.489

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 60. Equating mean score and score difference for TCF linking method condition 8
Test Characteristic Curve Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.285       0.000       0.285       0.000       0.285       0.000       0.285
        1       0.693       0.754       0.062       1.028       0.335       0.985       0.293
        2       1.734       1.704       0.030       2.002       0.268       1.964       0.230
        3       2.793       2.648       0.146       2.996       0.203       2.941       0.148
        4       3.804       3.600       0.204       3.988       0.185       3.916       0.113
        5       4.796       4.561       0.234       4.975       0.179       4.891       0.096
        6       5.807       5.534       0.273       5.957       0.150       5.870       0.063
        7       6.794       6.519       0.275       6.934       0.140       6.854       0.060
        8       7.778       7.519       0.260       7.907       0.129       7.847       0.069
        9       8.786       8.532       0.254       8.883       0.097       8.848       0.062
       10       9.793       9.559       0.235       9.864       0.071       9.856       0.063
       11      10.805      10.597       0.208      10.853       0.048      10.871       0.066
       12      11.808      11.645       0.163      11.851       0.043      11.889       0.081
       13      12.837      12.700       0.136      12.858       0.021      12.910       0.074
       14      13.872      13.761       0.111      13.874       0.003      13.933       0.061
       15      14.889      14.826       0.063      14.900       0.011      14.957       0.068
       16      15.936      15.895       0.041      15.935       0.001      15.982       0.046
       17      16.963      16.966       0.003      16.976       0.014      17.008       0.046
       18      17.978      18.040       0.062      18.025       0.047      18.037       0.059
       19      19.018      19.116       0.098      19.081       0.063      19.068       0.051
       20      20.066      20.194       0.128      20.143       0.077      20.101       0.036
       21      21.110      21.273       0.164      21.210       0.101      21.136       0.026
       22      22.131      22.353       0.222      22.284       0.153      22.172       0.041
       23      23.148      23.433       0.285      23.362       0.214      23.208       0.061
       24      24.160      24.512       0.352      24.444       0.283      24.246       0.086
       25      25.177      25.592       0.415      25.528       0.351      25.285       0.108
       26      26.208      26.673       0.466      26.613       0.405      26.329       0.121
       27      27.221      27.758       0.537      27.694       0.473      27.377       0.156
       28      28.236      28.848       0.612      28.766       0.530      28.430       0.194
       29      29.267      29.940       0.673      29.824       0.557      29.486       0.219
       30      30.282      31.032       0.750      30.860       0.578      30.538       0.255
       31      31.274      32.114       0.840      31.871       0.597      31.579       0.305
       32      32.243      33.176       0.933      32.856       0.613      32.602       0.359
       33      33.243      34.208       0.966      33.817       0.574      33.601       0.358
       34      34.227      35.203       0.977      34.760       0.533      34.570       0.344
       35      35.183      36.158       0.975      35.689       0.506      35.509       0.325
       36      36.108      37.071       0.964      36.613       0.506      36.420       0.312
       37      37.021      37.938       0.917      37.534       0.513      37.312       0.291
       38      37.972      38.764       0.793      38.451       0.479      38.198       0.226
       39      39.174      39.549       0.374      39.337       0.162      39.092       0.083
       40      40.244      40.000       0.244      39.960       0.285      38.755       1.489

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 61. Equating mean score and score difference for ICF linking method condition 8
Item Characteristic Curve Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.285       0.000       0.285       0.000       0.285       0.000       0.285
        1       0.693       0.780       0.087       1.111       0.418       1.019       0.326
        2       1.734       1.734       0.000       2.091       0.357       2.011       0.277
        3       2.793       2.681       0.113       3.079       0.286       2.995       0.202
        4       3.804       3.635       0.169       4.071       0.267       3.973       0.170
        5       4.796       4.598       0.198       5.059       0.263       4.948       0.153
        6       5.807       5.570       0.237       6.038       0.230       5.924       0.117
        7       6.794       6.553       0.241       7.012       0.218       6.904       0.110
        8       7.778       7.549       0.229       7.981       0.203       7.891       0.112
        9       8.786       8.557       0.229       8.953       0.167       8.884       0.098
       10       9.793       9.578       0.215       9.929       0.136       9.885       0.091
       11      10.805      10.609       0.196      10.911       0.107      10.890       0.085
       12      11.808      11.649       0.159      11.901       0.093      11.899       0.092
       13      12.837      12.695       0.141      12.898       0.061      12.911       0.074
       14      13.872      13.747       0.125      13.904       0.032      13.923       0.051
       15      14.889      14.803       0.086      14.919       0.030      14.937       0.048
       16      15.936      15.861       0.075      15.941       0.005      15.951       0.015
       17      16.963      16.923       0.040      16.971       0.008      16.968       0.005
       18      17.978      17.987       0.008      18.008       0.029      17.986       0.008
       19      19.018      19.053       0.036      19.051       0.033      19.007       0.011
       20      20.066      20.122       0.056      20.100       0.034      20.030       0.036
       21      21.110      21.193       0.083      21.155       0.045      21.055       0.054
       22      22.131      22.265       0.134      22.215       0.084      22.083       0.048
       23      23.148      23.338       0.190      23.281       0.133      23.113       0.035
       24      24.160      24.411       0.251      24.350       0.189      24.144       0.016
       25      25.177      25.486       0.309      25.422       0.245      25.178       0.001
       26      26.208      26.563       0.355      26.495       0.287      26.216       0.008
       27      27.221      27.644       0.423      27.566       0.345      27.259       0.038
       28      28.236      28.728       0.492      28.630       0.393      28.306       0.070
       29      29.267      29.816       0.549      29.682       0.415      29.355       0.088
       30      30.282      30.903       0.620      30.716       0.434      30.402       0.119
       31      31.274      31.981       0.707      31.729       0.455      31.440       0.165
       32      32.243      33.043       0.800      32.720       0.477      32.464       0.221
       33      33.243      34.079       0.836      33.691       0.448      33.469       0.226
       34      34.227      35.082       0.855      34.645       0.419      34.449       0.223
       35      35.183      36.047       0.864      35.588       0.405      35.404       0.220
       36      36.108      36.974       0.867      36.525       0.417      36.334       0.226
       37      37.021      37.858       0.837      37.458       0.436      37.246       0.225
       38      37.972      38.702       0.730      38.385       0.413      38.149       0.178
       39      39.174      39.512       0.338      39.283       0.109      39.059       0.115
       40      40.244      40.000       0.244      39.946       0.299      38.755       1.489

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 62. Equating mean score and score difference for NOP linking method condition 8
NOP Method

Raw Score  Pop         Full MIRT               Approx Observed         Approx True
Frequency  Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0       0.285       0.000       0.285       0.000       0.285       0.000       0.285
        1       0.693       1.014       0.321       1.548       0.855       1.120       0.427
        2       1.734       2.019       0.285       2.580       0.846       2.178       0.444
        3       2.793       3.028       0.234       3.596       0.803       3.226       0.433
        4       3.804       4.041       0.237       4.595       0.791       4.277       0.473
        5       4.796       5.060       0.264       5.588       0.792       5.333       0.537
        6       5.807       6.084       0.277       6.581       0.774       6.388       0.581
        7       6.794       7.113       0.319       7.582       0.787       7.438       0.644
        8       7.778       8.147       0.369       8.597       0.818       8.480       0.702
        9       8.786       9.185       0.399       9.624       0.838       9.512       0.726
       10       9.793      10.225       0.431      10.654       0.861      10.533       0.740
       11      10.805      11.266       0.461      11.682       0.877      11.546       0.741
       12      11.808      12.307       0.499      12.703       0.896      12.552       0.744
       13      12.837      13.348       0.511      13.719       0.882      13.552       0.716
       14      13.872      14.388       0.516      14.728       0.856      14.549       0.677
       15      14.889      15.428       0.539      15.732       0.843      15.542       0.653
       16      15.936      16.466       0.530      16.732       0.796      16.534       0.598
       17      16.963      17.504       0.541      17.730       0.768      17.524       0.561
       18      17.978      18.540       0.562      18.727       0.749      18.513       0.534
       19      19.018      19.576       0.558      19.723       0.705      19.500       0.483
       20      20.066      20.610       0.544      20.719       0.653      20.487       0.421
       21      21.110      21.642       0.533      21.716       0.606      21.472       0.362
       22      22.131      22.674       0.543      22.713       0.583      22.456       0.325
       23      23.148      23.704       0.556      23.713       0.565      23.440       0.292
       24      24.160      24.735       0.574      24.714       0.553      24.425       0.265
       25      25.177      25.767       0.590      25.716       0.539      25.414       0.237
       26      26.208      26.803       0.595      26.719       0.511      26.408       0.200
       27      27.221      27.844       0.623      27.721       0.500      27.407       0.186
       28      28.236      28.890       0.653      28.720       0.484      28.413       0.177
       29      29.267      29.939       0.672      29.712       0.445      29.421       0.155
       30      30.282      30.988       0.706      30.695       0.413      30.430       0.147
       31      31.274      32.031       0.756      31.668       0.394      31.433       0.158
       32      32.243      33.058       0.815      32.630       0.387      32.426       0.183
       33      33.243      34.064       0.821      33.582       0.339      33.407       0.165
       34      34.227      35.041       0.815      34.527       0.300      34.373       0.147
       35      35.183      35.988       0.805      35.470       0.287      35.323       0.140
       36      36.108      36.905       0.797      36.414       0.306      36.260       0.152
       37      37.021      37.797       0.776      37.363       0.342      37.187       0.166
       38      37.972      38.655       0.683      38.317       0.346      38.115       0.143
       39      39.174      39.454       0.279      39.253       0.079      39.041       0.133
       40      40.244      39.992       0.252      39.946       0.298      38.755       1.489

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 63. Equating mean score and score difference for linking method condition 9
Min's Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.500           0.000       0.500       0.000       0.500       0.000       0.500
1          0.768           0.000       0.768       0.099       0.669       0.856       0.088
2          1.751           0.087       1.664       0.551       1.200       1.639       0.112
3          2.780           0.619       2.161       1.321       1.459       2.376       0.404
4          3.720           1.113       2.607       2.192       1.528       3.055       0.665
5          4.647           1.678       2.969       3.108       1.539       3.678       0.969
6          5.631           2.208       3.423       4.019       1.612       4.264       1.367
7          6.610           2.749       3.861       4.895       1.716       4.834       1.776
8          7.564           3.302       4.261       5.713       1.850       5.407       2.157
9          8.535           3.843       4.691       6.475       2.060       5.996       2.539
10         9.515           4.427       5.087       7.177       2.337       6.612       2.903
11         10.498          4.995       5.503       7.832       2.666       7.263       3.235
12         11.469          5.616       5.853       8.484       2.985       7.952       3.518
13         12.450          6.245       6.205       9.134       3.315       8.680       3.769
14         13.468          6.904       6.565       9.797       3.671       9.445       4.024
15         14.462          7.605       6.857       10.492      3.970       10.240      4.222
16         15.472          8.330       7.142       11.216      4.255       11.062      4.410
17         16.510          9.084       7.425       11.972      4.538       11.904      4.606
18         17.538          9.880       7.658       12.766      4.772       12.763      4.774
19         18.563          10.712      7.852       13.594      4.969       13.640      4.923
20         19.587          11.578      8.008       14.454      5.133       14.536      5.050
21         20.618          12.479      8.139       15.342      5.276       15.455      5.163
22         21.620          13.418      8.201       16.255      5.365       16.404      5.216
23         22.614          14.398      8.216       17.193      5.421       17.387      5.227
24         23.613          15.421      8.192       18.155      5.458       18.409      5.204
25         24.603          16.486      8.117       19.142      5.461       19.470      5.133
26         25.613          17.590      8.023       20.153      5.460       20.564      5.049
27         26.631          18.729      7.903       21.187      5.445       21.677      4.955
28         27.637          19.897      7.740       22.242      5.395       22.791      4.845
29         28.660          21.086      7.573       23.317      5.343       23.893      4.767
30         29.661          22.289      7.372       24.411      5.249       24.974      4.687
31         30.648          23.501      7.147       25.530      5.118       26.042      4.606
32         31.658          24.728      6.929       26.681      4.977       27.113      4.545
33         32.631          25.983      6.649       27.879      4.752       28.216      4.415
34         33.672          27.289      6.384       29.142      4.530       29.394      4.278
35         34.676          28.686      5.990       30.488      4.187       30.705      3.970
36         35.642          30.226      5.416       31.927      3.715       32.229      3.413
37         36.628          31.970      4.659       33.457      3.171       34.048      2.580
38         37.589          33.956      3.634       35.066      2.524       36.169      1.420
39         38.791          36.142      2.649       36.749      2.043       38.374      0.418
40         39.865          38.366      1.499       38.508      1.356       38.755      1.110
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 64. Equating mean score and score difference for ODL direct method condition 9
Oshima's Direct Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.500           0.000       0.500       0.000       0.500       0.000       0.500
1          0.768           0.064       0.704       0.572       0.196       0.926       0.158
2          1.751           0.644       1.107       1.346       0.405       1.799       0.048
3          2.780           1.269       1.511       2.219       0.561       2.651       0.129
4          3.720           1.878       1.841       3.110       0.610       3.468       0.251
5          4.647           2.577       2.070       4.012       0.634       4.254       0.393
6          5.631           3.228       2.403       4.919       0.712       5.026       0.605
7          6.610           3.883       2.727       5.813       0.798       5.801       0.809
8          7.564           4.590       2.973       6.686       0.877       6.592       0.972
9          8.535           5.294       3.240       7.538       0.996       7.408       1.127
10         9.515           6.015       3.499       8.365       1.150       8.254       1.261
11         10.498          6.778       3.720       9.183       1.315       9.132       1.366
12         11.469          7.569       3.900       10.014      1.456       10.038      1.432
13         12.450          8.388       4.061       10.864      1.585       10.968      1.481
14         13.468          9.238       4.230       11.740      1.728       11.918      1.550
15         14.462          10.121      4.342       12.645      1.817       12.883      1.579
16         15.472          11.033      4.439       13.580      1.891       13.860      1.612
17         16.510          11.973      4.537       14.543      1.967       14.847      1.663
18         17.538          12.941      4.597       15.532      2.006       15.845      1.692
19         18.563          13.934      4.629       16.546      2.018       16.855      1.708
20         19.587          14.954      4.633       17.582      2.004       17.878      1.708
21         20.618          16.002      4.616       18.641      1.977       18.917      1.701
22         21.620          17.079      4.541       19.718      1.901       19.974      1.646
23         22.614          18.187      4.427       20.813      1.801       21.049      1.565
24         23.613          19.326      4.287       21.922      1.691       22.140      1.473
25         24.603          20.491      4.112       23.041      1.562       23.240      1.363
26         25.613          21.675      3.938       24.166      1.447       24.340      1.272
27         26.631          22.868      3.763       25.292      1.340       25.433      1.198
28         27.637          24.062      3.574       26.414      1.223       26.512      1.124
29         28.660          25.249      3.410       27.525      1.135       27.576      1.083
30         29.661          26.426      3.234       28.622      1.039       28.629      1.031
31         30.648          27.596      3.052       29.706      0.943       29.678      0.970
32         31.658          28.768      2.890       30.780      0.877       30.731      0.927
33         32.631          29.955      2.676       31.849      0.783       31.798      0.833
34         33.672          31.172      2.500       32.919      0.754       32.889      0.784
35         34.676          32.431      2.245       33.997      0.678       34.013      0.662
36         35.642          33.744      1.898       35.089      0.553       35.175      0.467
37         36.628          35.119      1.509       36.195      0.433       36.376      0.253
38         37.589          36.530      1.060       37.313      0.277       37.608      0.019
39         38.791          37.972      0.819       38.432      0.359       38.847      0.056
40         39.865          39.335      0.529       39.530      0.335       38.755      1.110
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 65. Equating mean score and score difference for TCF linking method condition 9
Test Characteristic Curve Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.500           0.000       0.500       0.000       0.500       0.000       0.500
1          0.768           0.136       0.632       0.668       0.100       0.951       0.183
2          1.751           0.743       1.008       1.534       0.217       1.865       0.114
3          2.780           1.417       1.363       2.465       0.314       2.758       0.022
4          3.720           2.098       1.622       3.411       0.309       3.621       0.099
5          4.647           2.841       1.806       4.361       0.286       4.458       0.189
6          5.631           3.524       2.106       5.306       0.324       5.286       0.344
7          6.610           4.217       2.393       6.229       0.381       6.120       0.491
8          7.564           4.956       2.608       7.128       0.436       6.969       0.595
9          8.535           5.693       2.841       8.005       0.530       7.841       0.693
10         9.515           6.446       3.069       8.862       0.653       8.738       0.776
11         10.498          7.235       3.263       9.717       0.781       9.660       0.838
12         11.469          8.052       3.418       10.586      0.883       10.603      0.866
13         12.450          8.894       3.556       11.475      0.974       11.563      0.886
14         13.468          9.763       3.705       12.389      1.080       12.535      0.934
15         14.462          10.661      3.801       13.326      1.136       13.514      0.948
16         15.472          11.585      3.886       14.287      1.185       14.500      0.972
17         16.510          12.532      3.978       15.266      1.244       15.490      1.020
18         17.538          13.501      4.037       16.263      1.275       16.486      1.052
19         18.563          14.490      4.073       17.273      1.290       17.489      1.074
20         19.587          15.501      4.085       18.296      1.291       18.501      1.085
21         20.618          16.536      4.082       19.330      1.289       19.524      1.094
22         21.620          17.595      4.024       20.373      1.246       20.558      1.061
23         22.614          18.682      3.932       21.426      1.188       21.605      1.009
24         23.613          19.793      3.820       22.486      1.127       22.661      0.952
25         24.603          20.926      3.676       23.551      1.052       23.723      0.880
26         25.613          22.075      3.538       24.618      0.995       24.783      0.829
27         26.631          23.230      3.401       25.683      0.948       25.836      0.795
28         27.637          24.386      3.251       26.744      0.893       26.878      0.759
29         28.660          25.535      3.125       27.799      0.860       27.910      0.750
30         29.661          26.676      2.985       28.850      0.811       28.936      0.725
31         30.648          27.814      2.835       29.898      0.750       29.963      0.686
32         31.658          28.958      2.700       30.948      0.710       30.998      0.660
33         32.631          30.123      2.508       32.003      0.628       32.049      0.582
34         33.672          31.324      2.349       33.068      0.605       33.123      0.550
35         34.676          32.569      2.106       34.144      0.532       34.226      0.449
36         35.642          33.869      1.773       35.233      0.409       35.362      0.280
37         36.628          35.234      1.395       36.336      0.292       36.530      0.099
38         37.589          36.631      0.959       37.454      0.136       37.721      0.132
39         38.791          38.053      0.739       38.578      0.213       38.911      0.119
40         39.865          39.408      0.457       39.658      0.207       38.755      1.110
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 66. Equating mean score and score difference for ICF linking method condition 9
Item Characteristic Curve Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.500           0.000       0.500       0.000       0.500       0.000       0.500
1          0.768           0.250       0.518       0.873       0.106       0.993       0.225
2          1.751           0.883       0.868       1.786       0.035       1.940       0.189
3          2.780           1.596       1.184       2.739       0.041       2.869       0.089
4          3.720           2.323       1.397       3.702       0.018       3.775       0.055
5          4.647           3.092       1.555       4.665       0.018       4.664       0.017
6          5.631           3.804       1.826       5.620       0.011       5.548       0.082
7          6.610           4.525       2.085       6.551       0.059       6.438       0.172
8          7.564           5.288       2.276       7.468       0.095       7.341       0.223
9          8.535           6.058       2.476       8.376       0.159       8.260       0.275
10         9.515           6.843       2.672       9.275       0.239       9.197       0.318
11         10.498          7.657       2.841       10.179      0.319       10.150      0.348
12         11.469          8.497       2.972       11.092      0.378       11.117      0.352
13         12.450          9.362       3.088       12.020      0.429       12.094      0.355
14         13.468          10.252      3.216       12.967      0.502       13.078      0.390
15         14.462          11.168      3.295       13.930      0.533       14.067      0.396
16         15.472          12.105      3.367       14.907      0.564       15.058      0.413
17         16.510          13.062      3.448       15.897      0.613       16.052      0.458
18         17.538          14.037      3.500       16.897      0.641       17.050      0.488
19         18.563          15.031      3.532       17.906      0.658       18.052      0.512
20         19.587          16.044      3.542       18.921      0.665       19.059      0.528
21         20.618          17.078      3.540       19.943      0.675       20.073      0.545
22         21.620          18.135      3.485       20.970      0.650       21.094      0.525
23         22.614          19.214      3.400       22.001      0.613       22.123      0.491
24         23.613          20.316      3.297       23.035      0.578       23.158      0.455
25         24.603          21.436      3.167       24.071      0.532       24.195      0.408
26         25.613          22.567      3.045       25.105      0.507       25.230      0.382
27         26.631          23.703      2.928       26.137      0.494       26.260      0.372
28         27.637          24.837      2.800       27.164      0.472       27.281      0.356
29         28.660          25.964      2.696       28.186      0.473       28.294      0.366
30         29.661          27.084      2.577       29.204      0.457       29.303      0.358
31         30.648          28.201      2.448       30.221      0.428       30.311      0.337
32         31.658          29.325      2.333       31.239      0.418       31.326      0.332
33         32.631          30.468      2.164       32.264      0.367       32.350      0.281
34         33.672          31.642      2.031       33.298      0.375       33.391      0.282
35         34.676          32.855      1.821       34.343      0.332       34.452      0.224
36         35.642          34.118      1.524       35.403      0.239       35.537      0.105
37         36.628          35.438      1.190       36.476      0.152       36.647      0.019
38         37.589          36.790      0.800       37.565      0.024       37.781      0.191
39         38.791          38.150      0.641       38.663      0.128       38.923      0.131
40         39.865          39.455      0.410       39.725      0.139       38.755      1.110
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 67. Equating mean score and score difference for the NOP linking method condition 9
NOP Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.500           0.000       0.500       0.000       0.500       0.000       0.500
1          0.768           0.000       0.768       0.000       0.768       0.710       0.058
2          1.751           0.000       1.751       0.022       1.729       1.337       0.414
3          2.780           0.000       2.780       0.190       2.590       1.912       0.867
4          3.720           0.063       3.656       0.724       2.996       2.425       1.295
5          4.647           0.494       4.153       1.530       3.117       2.875       1.772
6          5.631           0.757       4.873       2.382       3.248       3.277       2.353
7          6.610           1.160       5.450       3.215       3.395       3.650       2.961
8          7.564           1.587       5.977       4.003       3.560       4.008       3.556
9          8.535           1.915       6.619       4.722       3.813       4.364       4.170
10         9.515           2.367       7.148       5.346       4.169       4.729       4.786
11         10.498          2.749       7.749       5.909       4.588       5.111       5.387
12         11.469          3.168       8.302       6.437       5.032       5.516       5.954
13         12.450          3.625       8.824       6.917       5.532       5.950       6.500
14         13.468          4.052       9.416       7.405       6.064       6.417       7.052
15         14.462          4.553       9.909       7.866       6.596       6.920       7.543
16         15.472          5.026       10.446      8.356       7.116       7.461       8.011
17         16.510          5.569       10.941      8.847       7.663       8.041       8.469
18         17.538          6.107       11.431      9.377       8.161       8.659       8.879
19         18.563          6.703       11.861      9.929       8.635       9.317       9.247
20         19.587          7.329       12.258      10.532      9.054       10.013      9.574
21         20.618          7.994       12.624      11.169      9.449       10.749      9.869
22         21.620          8.718       12.902      11.857      9.763       11.528      10.092
23         22.614          9.493       13.121      12.596      10.018      12.351      10.263
24         23.613          10.320      13.293      13.380      10.233      13.224      10.389
25         24.603          11.206      13.397      14.210      10.393      14.148      10.455
26         25.613          12.148      13.464      15.086      10.527      15.123      10.490
27         26.631          13.145      13.487      16.006      10.626      16.146      10.486
28         27.637          14.193      13.444      16.968      10.669      17.210      10.426
29         28.660          15.290      13.369      17.970      10.690      18.309      10.351
30         29.661          16.438      13.223      19.012      10.649      19.436      10.225
31         30.648          17.640      13.009      20.096      10.552      20.590      10.059
32         31.658          18.907      12.751      21.227      10.431      21.773      9.884
33         32.631          20.256      12.376      22.418      10.213      22.999      9.633
34         33.672          21.710      11.962      23.694      9.978       24.293      9.379
35         34.676          23.305      11.370      25.093      9.583       25.711      8.965
36         35.642          25.092      10.550      26.676      8.966       27.361      8.281
37         36.628          27.159      9.469       28.531      8.097       29.477      7.151
38         37.589          29.663      7.926       30.769      6.820       32.568      5.021
39         38.791          32.811      5.981       33.456      5.336       37.057      1.734
40         39.865          36.519      3.346       36.566      3.299       38.755      1.110
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 68. Equating mean score and score difference for linking method condition 10
Min's Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.148           0.000       0.148       0.000       0.148       0.000       0.148
1          1.269           1.628       0.359       1.824       0.554       1.223       0.047
2          2.381           2.764       0.384       2.847       0.466       2.341       0.040
3          3.423           3.913       0.490       3.848       0.425       3.453       0.030
4          4.411           5.060       0.649       4.845       0.433       4.566       0.155
5          5.416           6.187       0.771       5.839       0.423       5.665       0.249
6          6.393           7.281       0.888       6.855       0.462       6.738       0.344
7          7.360           8.336       0.976       7.897       0.537       7.782       0.422
8          8.328           9.351       1.023       8.966       0.638       8.800       0.472
9          9.283           10.333      1.049       10.037      0.753       9.798       0.515
10         10.253          11.286      1.034       11.083      0.831       10.781      0.528
11         11.219          12.218      0.999       12.101      0.882       11.753      0.534
12         12.178          13.134      0.956       13.093      0.916       12.718      0.541
13         13.136          14.037      0.901       14.065      0.929       13.680      0.544
14         14.085          14.931      0.846       15.021      0.935       14.640      0.554
15         15.046          15.817      0.771       15.964      0.918       15.597      0.552
16         16.044          16.696      0.653       16.899      0.855       16.554      0.510
17         17.056          17.570      0.514       17.827      0.771       17.509      0.453
18         18.041          18.439      0.398       18.749      0.708       18.463      0.422
19         19.013          19.306      0.293       19.666      0.653       19.415      0.402
20         19.997          20.173      0.175       20.579      0.581       20.367      0.370
21         20.996          21.041      0.045       21.487      0.491       21.318      0.322
22         21.986          21.913      0.073       22.391      0.405       22.271      0.286
23         22.945          22.791      0.154       23.292      0.347       23.228      0.283
24         23.912          23.678      0.234       24.189      0.277       24.190      0.278
25         24.897          24.575      0.322       25.084      0.187       25.159      0.262
26         25.881          25.481      0.400       25.976      0.095       26.135      0.254
27         26.863          26.396      0.467       26.870      0.006       27.116      0.253
28         27.849          27.317      0.533       27.766      0.083       28.100      0.250
29         28.843          28.241      0.603       28.668      0.175       29.081      0.237
30         29.804          29.166      0.638       29.580      0.224       30.056      0.251
31         30.754          30.095      0.659       30.508      0.246       31.024      0.270
32         31.719          31.031      0.687       31.455      0.263       31.985      0.266
33         32.643          31.983      0.660       32.429      0.214       32.943      0.300
34         33.593          32.960      0.633       33.431      0.163       33.903      0.310
35         34.547          33.977      0.569       34.460      0.086       34.872      0.325
36         35.522          35.045      0.478       35.515      0.008       35.854      0.332
37         36.528          36.168      0.361       36.586      0.058       36.849      0.320
38         37.474          37.336      0.138       37.664      0.190       37.848      0.374
39         38.682          38.521      0.161       38.738      0.056       38.868      0.186
40         39.887          39.677      0.210       39.756      0.131       38.755      1.132
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 69. Equating mean score and score difference for ODL direct method condition 10
Oshima's Direct Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.148           0.000       0.148       0.000       0.148       0.000       0.148
1          1.269           1.555       0.285       1.466       0.196       1.056       0.214
2          2.381           2.672       0.291       2.498       0.117       2.109       0.272
3          3.423           3.811       0.388       3.513       0.090       3.180       0.243
4          4.411           4.960       0.549       4.530       0.118       4.279       0.132
5          5.416           6.106       0.690       5.552       0.136       5.395       0.020
6          6.393           7.231       0.838       6.584       0.191       6.511       0.118
7          7.360           8.326       0.966       7.639       0.279       7.614       0.254
8          8.328           9.388       1.060       8.724       0.395       8.699       0.371
9          9.283           10.418      1.135       9.825       0.541       9.766       0.483
10         10.253          11.422      1.169       10.920      0.667       10.819      0.566
11         11.219          12.403      1.184       11.999      0.780       11.859      0.640
12         12.178          13.368      1.190       13.062      0.884       12.891      0.713
13         13.136          14.320      1.183       14.110      0.974       13.918      0.782
14         14.085          15.261      1.176       15.147      1.062       14.943      0.858
15         15.046          16.194      1.148       16.176      1.130       15.966      0.920
16         16.044          17.120      1.076       17.199      1.156       16.987      0.943
17         17.056          18.040      0.984       18.218      1.162       18.005      0.950
18         18.041          18.954      0.913       19.233      1.192       19.021      0.980
19         19.013          19.864      0.851       20.244      1.231       20.031      1.018
20         19.997          20.772      0.775       21.250      1.253       21.036      1.039
21         20.996          21.678      0.682       22.252      1.256       22.034      1.039
22         21.986          22.584      0.598       23.248      1.262       23.028      1.042
23         22.945          23.492      0.547       24.239      1.294       24.018      1.073
24         23.912          24.404      0.492       25.224      1.312       25.008      1.096
25         24.897          25.323      0.426       26.203      1.306       26.001      1.104
26         25.881          26.248      0.367       27.174      1.293       27.001      1.120
27         26.863          27.179      0.315       28.137      1.273       28.006      1.142
28         27.849          28.114      0.265       29.089      1.240       29.013      1.163
29         28.843          29.053      0.210       30.031      1.188       30.015      1.172
30         29.804          29.993      0.189       30.962      1.158       31.005      1.201
31         30.754          30.933      0.179       31.885      1.131       31.976      1.222
32         31.719          31.872      0.153       32.803      1.084       32.922      1.203
33         32.643          32.814      0.171       33.721      1.078       33.841      1.198
34         33.593          33.765      0.172       34.646      1.053       34.734      1.141
35         34.547          34.736      0.189       35.584      1.037       35.610      1.063
36         35.522          35.732      0.210       36.537      1.015       36.478      0.956
37         36.528          36.758      0.230       37.502      0.974       37.349      0.821
38         37.474          37.811      0.337       38.469      0.994       38.228      0.753
39         38.682          38.872      0.190       39.366      0.684       39.105      0.423
40         39.887          39.843      0.044       39.892      0.005       38.755      1.132
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 70. Equating mean score and score difference for TCF linking method condition 10
Test Characteristic Curve Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.148           0.000       0.148       0.000       0.148       0.000       0.148
1          1.269           2.237       0.968       2.959       1.689       1.656       0.386
2          2.381           3.517       1.136       4.018       1.637       2.935       0.554
3          3.423           4.770       1.346       5.027       1.604       4.183       0.760
4          4.411           5.996       1.584       6.019       1.607       5.429       1.018
5          5.416           7.161       1.745       7.025       1.609       6.626       1.210
6          6.393           8.249       1.856       8.071       1.678       7.741       1.348
7          7.360           9.260       1.900       9.168       1.808       8.772       1.411
8          8.328           10.202      1.874       10.285      1.957       9.733       1.404
9          9.283           11.089      1.805       11.369      2.086       10.640      1.357
10         10.253          11.932      1.680       12.385      2.132       11.510      1.258
11         11.219          12.744      1.525       13.329      2.110       12.354      1.135
12         12.178          13.534      1.356       14.211      2.033       13.181      1.004
13         13.136          14.305      1.169       15.043      1.907       13.997      0.860
14         14.085          15.063      0.977       15.837      1.751       14.804      0.719
15         15.046          15.810      0.764       16.601      1.555       15.606      0.560
16         16.044          16.549      0.506       17.342      1.298       16.403      0.359
17         17.056          17.282      0.227       18.065      1.010       17.196      0.140
18         18.041          18.011      0.030       18.776      0.735       17.986      0.054
19         19.013          18.737      0.276       19.475      0.462       18.776      0.237
20         19.997          19.463      0.534       20.166      0.169       19.566      0.431
21         20.996          20.194      0.802       20.851      0.145       20.360      0.636
22         21.986          20.930      1.056       21.530      0.456       21.161      0.825
23         22.945          21.674      1.271       22.206      0.739       21.971      0.973
24         23.912          22.430      1.482       22.881      1.031       22.793      1.119
25         24.897          23.199      1.698       23.556      1.341       23.626      1.270
26         25.881          23.978      1.903       24.235      1.646       24.468      1.413
27         26.863          24.765      2.098       24.922      1.942       25.314      1.550
28         27.849          25.558      2.292       25.620      2.229       26.156      1.693
29         28.843          26.353      2.490       26.339      2.504       26.992      1.852
30         29.804          27.150      2.654       27.087      2.718       27.821      1.983
31         30.754          27.950      2.804       27.872      2.882       28.651      2.103
32         31.719          28.762      2.957       28.708      3.011       29.492      2.227
33         32.643          29.602      3.041       29.609      3.034       30.361      2.282
34         33.593          30.492      3.102       30.592      3.002       31.285      2.309
35         34.547          31.461      3.085       31.671      2.876       32.299      2.248
36         35.522          32.550      2.972       32.857      2.665       33.450      2.072
37         36.528          33.802      2.726       34.154      2.374       34.786      1.742
38         37.474          35.259      2.216       35.559      1.915       36.337      1.137
39         38.682          36.931      1.751       37.077      1.605       38.117      0.565
40         39.887          38.760      1.127       38.711      1.176       38.755      1.132
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 71. Equating mean score and score difference for ICF linking method condition 10
Item Characteristic Curve Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.148           0.000       0.148       0.000       0.148       2.786       2.637
1          1.269           1.418       0.149       1.204       0.066       1.034       0.235
2          2.381           2.488       0.107       2.202       0.178       2.045       0.335
3          3.423           3.561       0.138       3.202       0.221       3.054       0.369
4          4.411           4.634       0.222       4.193       0.218       4.068       0.343
5          5.416           5.697       0.281       5.186       0.230       5.084       0.332
6          6.393           6.743       0.350       6.177       0.216       6.093       0.300
7          7.360           7.765       0.405       7.172       0.188       7.092       0.268
8          8.328           8.762       0.433       8.176       0.152       8.084       0.245
9          9.283           9.734       0.451       9.189       0.095       9.069       0.214
10         10.253          10.687      0.434       10.197      0.055       10.051      0.201
11         11.219          11.624      0.405       11.196      0.023       11.032      0.187
12         12.178          12.548      0.370       12.185      0.007       12.012      0.165
13         13.136          13.462      0.326       13.166      0.030       12.993      0.143
14         14.085          14.369      0.284       14.143      0.058       13.975      0.110
15         15.046          15.271      0.225       15.118      0.072       14.959      0.087
16         16.044          16.170      0.126       16.093      0.049       15.944      0.100
17         17.056          17.065      0.010       17.068      0.013       16.930      0.125
18         18.041          17.960      0.081       18.046      0.005       17.919      0.122
19         19.013          18.853      0.160       19.025      0.012       18.910      0.103
20         19.997          19.748      0.249       20.007      0.009       19.904      0.093
21         20.996          20.646      0.350       20.990      0.006       20.901      0.095
22         21.986          21.549      0.437       21.976      0.010       21.901      0.085
23         22.945          22.458      0.486       22.962      0.017       22.904      0.040
24         23.912          23.376      0.536       23.950      0.038       23.910      0.001
25         24.897          24.301      0.596       24.938      0.041       24.919      0.022
26         25.881          25.233      0.648       25.925      0.044       25.929      0.047
27         26.863          26.170      0.694       26.911      0.048       26.939      0.075
28         27.849          27.109      0.740       27.896      0.047       27.948      0.098
29         28.843          28.051      0.793       28.880      0.036       28.954      0.111
30         29.804          28.993      0.811       29.864      0.059       29.958      0.153
31         30.754          29.939      0.815       30.850      0.096       30.957      0.203
32         31.719          30.893      0.826       31.841      0.122       31.954      0.235
33         32.643          31.863      0.780       32.840      0.197       32.948      0.305
34         33.593          32.861      0.732       33.852      0.259       33.943      0.350
35         34.547          33.897      0.649       34.879      0.332       34.942      0.396
36         35.522          34.982      0.540       35.922      0.400       35.950      0.428
37         36.528          36.120      0.409       36.977      0.448       36.969      0.441
38         37.474          37.299      0.175       38.031      0.557       37.996      0.522
39         38.682          38.496      0.186       39.068      0.386       39.014      0.332
40         39.887          39.688      0.198       39.928      0.042       38.755      1.132
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 72. Equating mean score and score difference for NOP linking method condition 10
NOP Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.148           0.000       0.148       0.000       0.148       0.000       0.148
1          1.269           1.536       0.267       1.397       0.128       1.033       0.237
2          2.381           2.646       0.265       2.437       0.056       2.078       0.303
3          3.423           3.778       0.355       3.458       0.035       3.141       0.282
4          4.411           4.924       0.512       4.475       0.063       4.232       0.180
5          5.416           6.069       0.653       5.489       0.073       5.341       0.075
6          6.393           7.198       0.805       6.515       0.122       6.454       0.061
7          7.360           8.298       0.938       7.560       0.200       7.559       0.198
8          8.328           9.366       1.038       8.633       0.305       8.649       0.321
9          9.283           10.403      1.120       9.726       0.443       9.725       0.442
10         10.253          11.414      1.162       10.822      0.569       10.787      0.534
11         11.219          12.404      1.184       11.907      0.688       11.837      0.618
12         12.178          13.376      1.199       12.979      0.802       12.880      0.703
13         13.136          14.336      1.200       14.040      0.904       13.919      0.783
14         14.085          15.286      1.200       15.092      1.007       14.955      0.870
15         15.046          16.227      1.181       16.138      1.092       15.989      0.943
16         16.044          17.162      1.118       17.179      1.136       17.023      0.979
17         17.056          18.091      1.035       18.218      1.162       18.054      0.999
18         18.041          19.014      0.973       19.254      1.214       19.083      1.042
19         19.013          19.934      0.921       20.288      1.275       20.107      1.094
20         19.997          20.850      0.853       21.318      1.321       21.125      1.128
21         20.996          21.765      0.769       22.343      1.348       22.136      1.140
22         21.986          22.679      0.693       23.364      1.379       23.142      1.156
23         22.945          23.595      0.650       24.380      1.435       24.143      1.198
24         23.912          24.515      0.603       25.389      1.477       25.143      1.231
25         24.897          25.440      0.543       26.391      1.494       26.145      1.248
26         25.881          26.371      0.490       27.385      1.504       27.153      1.272
27         26.863          27.309      0.445       28.369      1.506       28.166      1.303
28         27.849          28.251      0.402       29.342      1.492       29.183      1.334
29         28.843          29.197      0.354       30.301      1.458       30.196      1.353
30         29.804          30.144      0.340       31.247      1.443       31.197      1.393
31         30.754          31.092      0.338       32.182      1.428       32.179      1.425
32         31.719          32.040      0.321       33.108      1.389       33.135      1.416
33         32.643          32.990      0.347       34.029      1.386       34.060      1.417
34         33.593          33.948      0.354       34.951      1.358       34.956      1.362
35         34.547          34.919      0.373       35.878      1.331       35.825      1.279
36         35.522          35.911      0.388       36.812      1.290       36.676      1.154
37         36.528          36.925      0.396       37.747      1.219       37.515      0.987
38         37.474          37.955      0.481       38.675      1.201       38.348      0.874
39         38.682          38.980      0.297       39.528      0.846       39.162      0.480
40         39.887          39.906      0.019       39.981      0.094       38.755      1.132
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 73. Equating mean score and score difference for linking method condition 11
Min's Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.054           0.000       0.054       0.000       0.054       2.786       2.732
1          0.974           0.384       0.591       0.336       0.638       0.966       0.009
2          2.036           1.008       1.028       1.021       1.015       1.829       0.207
3          3.114           1.733       1.381       1.921       1.193       2.640       0.474
4          4.107           2.429       1.679       2.863       1.244       3.381       0.726
5          5.108           3.053       2.054       3.811       1.297       4.056       1.052
6          6.078           3.691       2.387       4.750       1.328       4.688       1.390
7          7.009           4.300       2.709       5.654       1.355       5.303       1.706
8          7.948           4.891       3.057       6.510       1.438       5.920       2.028
9          8.891           5.498       3.393       7.292       1.599       6.552       2.339
10         9.842           6.091       3.751       8.012       1.830       7.210       2.632
11         10.809          6.706       4.103       8.704       2.105       7.900       2.909
12         11.788          7.330       4.458       9.387       2.402       8.623       3.165
13         12.775          7.963       4.813       10.074      2.702       9.379       3.396
14         13.751          8.622       5.130       10.778      2.973       10.163      3.588
15         14.726          9.291       5.436       11.503      3.224       10.970      3.757
16         15.719          9.974       5.745       12.251      3.468       11.793      3.926
17         16.706          10.681      6.025       13.026      3.680       12.630      4.076
18         17.679          11.402      6.277       13.824      3.855       13.478      4.201
19         18.646          12.138      6.509       14.644      4.002       14.339      4.307
20         19.624          12.898      6.726       15.483      4.141       15.215      4.409
21         20.642          13.685      6.958       16.336      4.306       16.110      4.532
22         21.644          14.499      7.145       17.203      4.441       17.033      4.611
23         22.606          15.344      7.262       18.082      4.524       17.989      4.617
24         23.576          16.226      7.351       18.972      4.604       18.982      4.595
25         24.553          17.147      7.406       19.874      4.680       20.013      4.540
26         25.542          18.106      7.436       20.785      4.758       21.074      4.468
27         26.530          19.098      7.432       21.705      4.824       22.149      4.381
28         27.502          20.116      7.387       22.635      4.867       23.218      4.285
29         28.497          21.151      7.346       23.576      4.921       24.265      4.233
30         29.476          22.198      7.278       24.533      4.943       25.285      4.190
31         30.443          23.256      7.187       25.514      4.929       26.287      4.156
32         31.415          24.332      7.083       26.534      4.881       27.288      4.127
33         32.343          25.445      6.898       27.615      4.728       28.318      4.025
34         33.309          26.628      6.682       28.780      4.529       29.420      3.890
35         34.284          27.931      6.353       30.059      4.225       30.653      3.631
36         35.249          29.432      5.817       31.476      3.774       32.098      3.151
37         36.247          31.232      5.015       33.039      3.208       33.844      2.403
38         37.281          33.429      3.852       34.741      2.540       35.913      1.368
39         38.507          35.968      2.539       36.570      1.937       38.168      0.339
40         39.695          38.477      1.219       38.518      1.177       38.755      0.940
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 74. Equating mean score and score difference for ODL direct method condition 11
Oshima's Direct Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.054           0.000       0.054       0.000       0.054       0.000       0.054
1          0.974           0.817       0.157       0.001       0.974       1.199       0.225
2          2.036           1.503       0.533       0.027       2.010       2.176       0.140
3          3.114           2.215       0.899       0.246       2.868       3.118       0.004
4          4.107           2.928       1.179       0.846       3.261       4.036       0.071
5          5.108           3.623       1.485       1.711       3.397       4.921       0.187
6          6.078           4.315       1.763       2.612       3.466       5.771       0.307
7          7.009           5.004       2.005       3.488       3.521       6.591       0.418
8          7.948           5.692       2.256       4.293       3.655       7.390       0.558
9          8.891           6.393       2.498       5.013       3.878       8.178       0.713
10         9.842           7.098       2.745       5.681       4.161       8.966       0.876
11         10.809          7.815       2.994       6.275       4.534       9.761       1.048
12         11.788          8.546       3.242       6.801       4.987       10.569      1.219
13         12.775          9.292       3.483       7.326       5.449       11.390      1.385
14         13.751          10.059      3.692       7.815       5.936       12.226      1.526
15         14.726          10.847      3.880       8.328       6.399       13.074      1.653
16         15.719          11.655      4.064       8.841       6.878       13.932      1.787
17         16.706          12.487      4.219       9.393       7.313       14.799      1.907
18         17.679          13.344      4.335       9.968       7.711       15.673      2.006
19         18.646          14.226      4.420       10.596      8.050       16.555      2.091
20         19.624          15.136      4.487       11.261      8.362       17.447      2.177
21         20.642          16.077      4.566       11.977      8.665       18.350      2.292
22         21.644          17.049      4.595       12.747      8.897       19.269      2.375
23         22.606          18.055      4.551       13.563      9.043       20.206      2.400
24         23.576          19.093      4.484       14.423      9.153       21.166      2.410
25         24.553          20.161      4.393       15.326      9.228       22.148      2.405
26         25.542          21.254      4.289       16.268      9.274       23.146      2.397
27         26.530          22.363      4.167       17.248      9.282       24.147      2.383
28         27.502          23.478      4.024       18.262      9.240       25.136      2.367
29         28.497          24.590      3.908       19.309      9.189       26.102      2.395
30         29.476          25.689      3.787       20.387      9.089       27.045      2.430
31         30.443          26.770      3.673       21.497      8.945       27.972      2.471
32         31.415          27.841      3.574       22.647      8.768       28.896      2.520
33         32.343          28.916      3.427       23.853      8.490       29.839      2.504
34         33.309          30.023      3.286       25.143      8.167       30.831      2.478
35         34.284          31.197      3.087       26.563      7.721       31.911      2.373
36         35.249          32.490      2.759       28.180      7.070       33.132      2.117
37         36.247          33.951      2.296       30.077      6.171       34.564      1.684
38         37.281          35.596      1.685       32.320      4.961       36.259      1.022
39         38.507          37.368      1.139       34.888      3.619       38.195      0.312
40         39.695          39.074      0.622       37.666      2.030       38.755      0.940
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 75. Equating mean score and score difference for TCF linking method condition 11
Test Characteristic Curve Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.054           0.000       0.054       0.000       0.054       0.000       0.054
1          0.974           1.528       0.553       2.468       1.493       1.488       0.514
2          2.036           2.553       0.517       3.488       1.452       2.682       0.646
3          3.114           3.536       0.423       4.477       1.363       3.837       0.724
4          4.107           4.473       0.365       5.460       1.352       4.983       0.875
5          5.108           5.362       0.254       6.448       1.340       6.096       0.988
6          6.078           6.204       0.126       7.456       1.378       7.158       1.080
7          7.009           7.012       0.003       8.492       1.483       8.167       1.158
8          7.948           7.793       0.155       9.544       1.596       9.129       1.181
9          8.891           8.552       0.340       10.582      1.691       10.054      1.163
10         9.842           9.289       0.553       11.584      1.741       10.951      1.109
11         10.809          10.019      0.790       12.541      1.732       11.828      1.019
12         11.788          10.746      1.042       13.453      1.665       12.691      0.903
13         12.775          11.468      1.308       14.327      1.551       13.544      0.768
14         13.751          12.187      1.565       15.166      1.415       14.389      0.637
15         14.726          12.908      1.819       15.978      1.251       15.227      0.501
16         15.719          13.629      2.090       16.767      1.048       16.060      0.341
17         16.706          14.351      2.355       17.536      0.830       16.889      0.183
18         17.679          15.078      2.600       18.288      0.609       17.713      0.034
19         18.646          15.812      2.834       19.027      0.381       18.536      0.111
20         19.624          16.554      3.070       19.754      0.131       19.358      0.266
21         20.642          17.309      3.334       20.470      0.172       20.182      0.460
22         21.644          18.080      3.564       21.177      0.467       21.012      0.632
23         22.606          18.871      3.735       21.876      0.730       21.850      0.756
24         23.576          19.682      3.894       22.568      1.008       22.698      0.879
25         24.553          20.516      4.037       23.258      1.295       23.553      1.000
26         25.542          21.369      4.174       23.946      1.596       24.413      1.129
27         26.530          22.236      4.294       24.638      1.891       25.270      1.260
28         27.502          23.111      4.391       25.340      2.162       26.118      1.385
29         28.497          23.990      4.508       26.057      2.441       26.952      1.545
30         29.476          24.870      4.606       26.798      2.677       27.775      1.700
31         30.443          25.754      4.689       27.577      2.866       28.595      1.848
32         31.415          26.651      4.764       28.405      3.011       29.422      1.994
33         32.343          27.580      4.763       29.299      3.044       30.275      2.068
34         33.309          28.569      4.740       30.277      3.032       31.180      2.129
35         34.284          29.663      4.621       31.360      2.924       32.175      2.109
36         35.249          30.922      4.328       32.566      2.683       33.311      1.939
37         36.247          32.423      3.825       33.910      2.337       34.649      1.598
38         37.281          34.243      3.038       35.398      1.883       36.253      1.028
39         38.507          36.381      2.126       37.031      1.476       38.152      0.355
40         39.695          38.614      1.081       38.780      0.915       38.755      0.940
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 76. Equating mean score and score difference for ICF linking method condition 11
Item Characteristic Curve Method
Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
0          0.054           0.000       0.054       0.000       0.054       0.000       0.054
1          0.974           0.905       0.070       1.069       0.094       1.004       0.030
2          2.036           1.846       0.190       2.066       0.029       1.996       0.040
3          3.114           2.768       0.345       3.063       0.051       2.986       0.128
4          4.107           3.668       0.439       4.059       0.048       3.974       0.133
5          5.108           4.538       0.570       5.046       0.062       4.962       0.146
6          6.078           5.370       0.708       6.028       0.049       5.950       0.128
7          7.009           6.186       0.823       7.006       0.003       6.942       0.067
8          7.948           7.002       0.946       7.986       0.038       7.939       0.009
9          8.891           7.822       1.069       8.972       0.081       8.941       0.050
10         9.842           8.645       1.197       9.959       0.117       9.948       0.106
11         10.809          9.469       1.340       10.950      0.141       10.959      0.150
12         11.788          10.294      1.494       11.945      0.157       11.971      0.183
13         12.775          11.128      1.648       12.945      0.170       12.984      0.209
14         13.751          11.971      1.781       13.950      0.199       13.997      0.246
15         14.726          12.823      1.904       14.959      0.232       15.010      0.283
16         15.719          13.682      2.037       15.970      0.251       16.022      0.303
17         16.706          14.548      2.158       16.981      0.275       17.033      0.327
18         17.679          15.421      2.258       17.991      0.312       18.044      0.365
19         18.646          16.305      2.341       18.999      0.353       19.054      0.408
20         19.624          17.203      2.421       20.003      0.379       20.064      0.440
21         20.642          18.117      2.525       21.002      0.359       21.073      0.430
22         21.644          19.051      2.593       21.994      0.351       22.080      0.436
23         22.606          20.006      2.599       22.980      0.374       23.086      0.480
24         23.576          20.983      2.594       23.957      0.380       24.089      0.513
25         24.553          21.978      2.575       24.925      0.371       25.090      0.536
26         25.542          22.987      2.555       25.884      0.341       26.085      0.543
27         26.530          24.004      2.525       26.835      0.305       27.075      0.545
28         27.502          25.023      2.479       27.780      0.278       28.058      0.556
29         28.497          26.039      2.459       28.722      0.224       29.034      0.536
30         29.476          27.052      2.423       29.664      0.189       30.002      0.527
31         30.443          28.069      2.374       30.613      0.171       30.964      0.521
32         31.415          29.102      2.314       31.575      0.159       31.923      0.507
33         32.343          30.167      2.176       32.555      0.212       32.881      0.538
34         33.309          31.287      2.022       33.559      0.250       33.845      0.535
35         34.284          32.489      1.795       34.592      0.308       34.822      0.538
36         35.249          33.793      1.456       35.654      0.405       35.822      0.573
37         36.247          35.195      1.052       36.742      0.495       36.855      0.608
38         37.281          36.671      0.610       37.843      0.562       37.919      0.639
39         38.507          38.157      0.350       38.934      0.427       38.991      0.484
40         39.695          39.508      0.188       39.910      0.214       38.755      0.940
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 77. Equating mean score and score difference for NOP linking method condition 11

NOP Method

Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0           0.054       0.000       0.054       0.000       0.054       0.000       0.054
        1           0.974       0.000       0.974       0.001       0.974       0.668       0.307
        2           2.036       0.100       1.936       0.027       2.010       1.308       0.728
        3           3.114       0.627       2.487       0.246       2.868       1.913       1.201
        4           4.107       1.078       3.029       0.846       3.261       2.464       1.643
        5           5.108       1.616       3.492       1.711       3.397       2.955       2.153
        6           6.078       2.042       4.036       2.612       3.466       3.396       2.681
        7           7.009       2.557       4.452       3.488       3.521       3.807       3.202
        8           7.948       2.965       4.983       4.293       3.655       4.204       3.744
        9           8.891       3.461       5.430       5.013       3.878       4.602       4.289
       10           9.842       3.884       5.958       5.681       4.161       5.013       4.829
       11          10.809       4.372       6.437       6.275       4.534       5.446       5.363
       12          11.788       4.831       6.957       6.801       4.987       5.909       5.879
       13          12.775       5.331       7.445       7.326       5.449       6.407       6.368
       14          13.751       5.828       7.924       7.815       5.936       6.946       6.805
       15          14.726       6.360       8.366       8.328       6.399       7.527       7.200
       16          15.719       6.896       8.823       8.841       6.878       8.150       7.569
       17          16.706       7.474       9.232       9.393       7.313       8.815       7.891
       18          17.679       8.058       9.620       9.968       7.711       9.519       8.160
       19          18.646       8.685       9.961      10.596       8.050      10.260       8.386
       20          19.624       9.337      10.287      11.261       8.362      11.037       8.587
       21          20.642      10.021      10.622      11.977       8.665      11.850       8.793
       22          21.644      10.746      10.897      12.747       8.897      12.702       8.942
       23          22.606      11.512      11.094      13.563       9.043      13.597       9.009
       24          23.576      12.319      11.258      14.423       9.153      14.540       9.036
       25          24.553      13.170      11.383      15.326       9.228      15.534       9.020
       26          25.542      14.066      11.476      16.268       9.274      16.578       8.965
       27          26.530      15.004      11.526      17.248       9.282      17.666       8.864
       28          27.502      15.980      11.522      18.262       9.240      18.786       8.716
       29          28.497      16.991      11.506      19.309       9.189      19.928       8.570
       30          29.476      18.038      11.438      20.387       9.089      21.079       8.397
       31          30.443      19.124      11.319      21.497       8.945      22.234       8.209
       32          31.415      20.259      11.157      22.647       8.768      23.398       8.018
       33          32.343      21.462      10.881      23.853       8.490      24.587       7.756
       34          33.309      22.766      10.543      25.143       8.167      25.840       7.469
       35          34.284      24.218      10.066      26.563       7.721      27.228       7.056
       36          35.249      25.895       9.354      28.180       7.070      28.884       6.366
       37          36.247      27.922       8.325      30.077       6.171      31.049       5.199
       38          37.281      30.508       6.773      32.320       4.961      34.100       3.181
       39          38.507      33.866       4.641      34.888       3.619      37.797       0.710
       40          39.695      37.537       2.159      37.666       2.030      38.755       0.940

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 78. Equating mean score and score difference for Min's linking method condition 12

Min's Method

Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0           0.266       0.000       0.266       0.000       0.266       0.000       0.266
        1           0.911       0.067       0.844       0.354       0.557       0.986       0.075
        2           2.012       0.658       1.354       1.051       0.960       1.849       0.162
        3           2.948       1.297       1.651       1.943       1.005       2.654       0.294
        4           3.937       1.888       2.050       2.859       1.079       3.385       0.553
        5           4.916       2.519       2.397       3.790       1.126       4.049       0.867
        6           5.860       3.091       2.769       4.714       1.146       4.671       1.189
        7           6.818       3.677       3.141       5.614       1.204       5.276       1.542
        8           7.761       4.251       3.510       6.457       1.304       5.882       1.879
        9           8.731       4.823       3.908       7.229       1.502       6.503       2.228
       10           9.722       5.414       4.308       7.950       1.771       7.149       2.573
       11          10.691       6.006       4.685       8.634       2.057       7.825       2.866
       12          11.657       6.626       5.031       9.310       2.346       8.535       3.122
       13          12.642       7.259       5.383       9.996       2.646       9.277       3.365
       14          13.631       7.911       5.720      10.691       2.940      10.046       3.585
       15          14.599       8.592       6.007      11.406       3.193      10.838       3.761
       16          15.593       9.290       6.303      12.147       3.446      11.648       3.945
       17          16.606      10.007       6.599      12.910       3.696      12.472       4.134
       18          17.611      10.754       6.857      13.697       3.914      13.308       4.303
       19          18.610      11.525       7.085      14.506       4.104      14.157       4.453
       20          19.599      12.318       7.281      15.334       4.266      15.023       4.576
       21          20.616      13.141       7.475      16.177       4.439      15.909       4.707
       22          21.624      14.000       7.624      17.034       4.590      16.824       4.800
       23          22.603      14.898       7.705      17.905       4.698      17.772       4.831
       24          23.591      15.838       7.753      18.789       4.803      18.760       4.831
       25          24.590      16.820       7.770      19.684       4.906      19.788       4.802
       26          25.598      17.843       7.755      20.589       5.009      20.847       4.751
       27          26.597      18.902       7.695      21.504       5.093      21.922       4.675
       28          27.590      19.987       7.604      22.430       5.161      22.993       4.597
       29          28.604      21.088       7.517      23.366       5.238      24.043       4.562
       30          29.575      22.196       7.380      24.319       5.257      25.065       4.511
       31          30.517      23.309       7.208      25.296       5.221      26.066       4.451
       32          31.489      24.433       7.056      26.314       5.175      27.064       4.425
       33          32.436      25.586       6.850      27.392       5.044      28.090       4.346
       34          33.440      26.799       6.641      28.557       4.883      29.185       4.254
       35          34.414      28.122       6.292      29.840       4.574      30.415       3.998
       36          35.372      29.626       5.746      31.264       4.108      31.867       3.505
       37          36.388      31.397       4.992      32.838       3.551      33.640       2.749
       38          37.394      33.510       3.884      34.546       2.847      35.766       1.628
       39          38.601      35.914       2.687      36.372       2.229      38.091       0.509
       40          39.758      38.345       1.413      38.327       1.431      38.755       1.003

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 79. Equating mean score and score difference for ODL direct method condition 12

Oshima's Direct Method

Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0           0.266       0.000       0.266       0.000       0.266       0.000       0.266
        1           0.911       0.387       0.524       1.165       0.254       1.073       0.162
        2           2.012       0.951       1.061       1.916       0.096       1.991       0.021
        3           2.948       1.534       1.414       2.693       0.254       2.872       0.075
        4           3.937       2.149       1.789       3.494       0.444       3.721       0.217
        5           4.916       2.769       2.147       4.333       0.582       4.539       0.377
        6           5.860       3.400       2.460       5.214       0.647       5.335       0.525
        7           6.818       4.043       2.775       6.116       0.701       6.119       0.699
        8           7.761       4.704       3.057       7.045       0.716       6.898       0.863
        9           8.731       5.395       3.336       7.977       0.754       7.681       1.050
       10           9.722       6.111       3.611       8.904       0.818       8.476       1.246
       11          10.691       6.861       3.830       9.795       0.896       9.287       1.404
       12          11.657       7.630       4.027      10.659       0.998      10.119       1.538
       13          12.642       8.439       4.203      11.500       1.142      10.973       1.669
       14          13.631       9.279       4.351      12.344       1.287      11.847       1.783
       15          14.599      10.154       4.445      13.186       1.413      12.742       1.857
       16          15.593      11.065       4.527      14.031       1.562      13.655       1.938
       17          16.606      12.013       4.593      14.891       1.716      14.584       2.022
       18          17.611      13.001       4.610      15.770       1.841      15.528       2.083
       19          18.610      14.028       4.582      16.674       1.936      16.488       2.122
       20          19.599      15.096       4.504      17.605       1.994      17.463       2.136
       21          20.616      16.203       4.414      18.559       2.057      18.456       2.160
       22          21.624      17.349       4.275      19.528       2.096      19.469       2.155
       23          22.603      18.535       4.068      20.506       2.097      20.505       2.098
       24          23.591      19.756       3.835      21.495       2.097      21.564       2.027
       25          24.590      21.005       3.585      22.491       2.099      22.643       1.947
       26          25.598      22.268       3.330      23.493       2.105      23.734       1.864
       27          26.597      23.535       3.062      24.500       2.097      24.825       1.772
       28          27.590      24.795       2.795      25.502       2.089      25.898       1.692
       29          28.604      26.035       2.569      26.487       2.117      26.944       1.660
       30          29.575      27.248       2.327      27.453       2.122      27.959       1.616
       31          30.517      28.428       2.089      28.414       2.104      28.952       1.566
       32          31.489      29.580       1.908      29.368       2.121      29.934       1.555
       33          32.436      30.726       1.710      30.331       2.105      30.924       1.512
       34          33.440      31.878       1.562      31.314       2.126      31.945       1.495
       35          34.414      33.059       1.355      32.353       2.060      33.027       1.387
       36          35.372      34.295       1.077      33.479       1.893      34.209       1.163
       37          36.388      35.598       0.790      34.703       1.685      35.525       0.863
       38          37.394      36.942       0.451      36.016       1.377      36.977       0.417
       39          38.601      38.260       0.341      37.413       1.188      38.523       0.078
       40          39.758      39.456       0.302      38.848       0.910      38.755       1.003

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 80. Equating mean score and score difference for TCF linking method condition 12

Test Characteristic Curve Method

Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0           0.266       0.000       0.266       0.000       0.266       0.000       0.266
        1           0.911       0.776       0.135       1.475       0.565       1.202       0.291
        2           2.012       1.636       0.375       2.475       0.464       2.253       0.242
        3           2.948       2.474       0.474       3.475       0.528       3.271       0.323
        4           3.937       3.298       0.639       4.459       0.521       4.273       0.335
        5           4.916       4.107       0.809       5.439       0.524       5.260       0.345
        6           5.860       4.892       0.969       6.417       0.557       6.233       0.372
        7           6.818       5.668       1.150       7.393       0.575       7.189       0.371
        8           7.761       6.437       1.324       8.373       0.612       8.132       0.371
        9           8.731       7.200       1.532       9.354       0.623       9.065       0.333
       10           9.722       7.966       1.756      10.328       0.606       9.991       0.269
       11          10.691       8.745       1.946      11.287       0.596      10.913       0.222
       12          11.657       9.534       2.123      12.230       0.573      11.833       0.176
       13          12.642      10.330       2.312      13.157       0.515      12.751       0.109
       14          13.631      11.137       2.494      14.069       0.438      13.667       0.036
       15          14.599      11.957       2.642      14.969       0.370      14.582       0.017
       16          15.593      12.788       2.805      15.858       0.265      15.495       0.098
       17          16.606      13.628       2.978      16.739       0.133      16.408       0.198
       18          17.611      14.479       3.133      17.611       0.000      17.321       0.290
       19          18.610      15.343       3.267      18.476       0.134      18.237       0.373
       20          19.599      16.225       3.375      19.334       0.265      19.156       0.443
       21          20.616      17.125       3.491      20.185       0.431      20.081       0.535
       22          21.624      18.049       3.575      21.030       0.594      21.015       0.609
       23          22.603      18.997       3.606      21.867       0.736      21.958       0.644
       24          23.591      19.971       3.620      22.698       0.893      22.910       0.681
       25          24.590      20.968       3.623      23.524       1.066      23.868       0.722
       26          25.598      21.982       3.616      24.346       1.252      24.826       0.772
       27          26.597      23.006       3.591      25.167       1.430      25.777       0.820
       28          27.590      24.033       3.558      25.992       1.598      26.716       0.874
       29          28.604      25.054       3.550      26.829       1.775      27.642       0.963
       30          29.575      26.070       3.506      27.686       1.889      28.557       1.019
       31          30.517      27.084       3.433      28.573       1.944      29.469       1.048
       32          31.489      28.110       3.379      29.502       1.987      30.390       1.099
       33          32.436      29.168       3.268      30.485       1.950      31.336       1.100
       34          33.440      30.287       3.153      31.532       1.908      32.330       1.110
       35          34.414      31.500       2.914      32.647       1.767      33.397       1.016
       36          35.372      32.838       2.534      33.833       1.539      34.560       0.812
       37          36.388      34.316       2.072      35.086       1.302      35.821       0.568
       38          37.394      35.917       1.476      36.401       0.992      37.164       0.230
       39          38.601      37.577       1.024      37.767       0.834      38.576       0.025
       40          39.758      39.186       0.572      39.159       0.599      38.755       1.003

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 81. Equating mean score and score difference for ICF linking method condition 12

Item Characteristic Curve Method

Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0           0.266       0.000       0.266       0.000       0.266       0.000       0.266
        1           0.911       0.673       0.238       1.116       0.206       1.024       0.113
        2           2.012       1.572       0.439       2.133       0.122       2.024       0.012
        3           2.948       2.411       0.537       3.138       0.191       3.015       0.068
        4           3.937       3.207       0.731       4.124       0.186       4.002       0.065
        5           4.916       4.015       0.901       5.098       0.183       4.986       0.070
        6           5.860       4.832       1.028       6.072       0.212       5.971       0.111
        7           6.818       5.653       1.165       7.046       0.228       6.959       0.142
        8           7.761       6.465       1.296       8.019       0.258       7.953       0.192
        9           8.731       7.270       1.462       8.997       0.266       8.953       0.222
       10           9.722       8.089       1.633       9.980       0.258       9.957       0.235
       11          10.691       8.928       1.763      10.967       0.276      10.964       0.273
       12          11.657       9.785       1.872      11.959       0.302      11.973       0.316
       13          12.642      10.657       1.985      12.956       0.314      12.982       0.340
       14          13.631      11.540       2.090      13.957       0.326      13.990       0.359
       15          14.599      12.434       2.165      14.963       0.364      14.998       0.399
       16          15.593      13.338       2.255      15.970       0.377      16.005       0.412
       17          16.606      14.252       2.354      16.978       0.372      17.012       0.406
       18          17.611      15.180       2.432      17.985       0.374      18.019       0.407
       19          18.610      16.121       2.489      18.989       0.380      19.025       0.416
       20          19.599      17.078       2.521      19.990       0.391      20.033       0.433
       21          20.616      18.054       2.562      20.986       0.370      21.040       0.424
       22          21.624      19.051       2.573      21.976       0.352      22.047       0.423
       23          22.603      20.069       2.534      22.959       0.356      23.054       0.451
       24          23.591      21.109       2.483      23.933       0.342      24.058       0.467
       25          24.590      22.165       2.425      24.898       0.308      25.060       0.470
       26          25.598      23.231       2.367      25.854       0.256      26.058       0.460
       27          26.597      24.300       2.297      26.802       0.204      27.049       0.452
       28          27.590      25.364       2.226      27.743       0.153      28.033       0.442
       29          28.604      26.418       2.186      28.682       0.078      29.008       0.403
       30          29.575      27.463       2.112      29.624       0.048      29.974       0.398
       31          30.517      28.504       2.013      30.573       0.055      30.933       0.415
       32          31.489      29.552       1.937      31.536       0.047      31.887       0.398
       33          32.436      30.623       1.813      32.520       0.084      32.841       0.406
       34          33.440      31.736       1.704      33.529       0.089      33.803       0.363
       35          34.414      32.909       1.505      34.565       0.152      34.782       0.368
       36          35.372      34.154       1.218      35.629       0.257      35.788       0.416
       37          36.388      35.471       0.917      36.715       0.326      36.830       0.441
       38          37.394      36.854       0.540      37.810       0.416      37.903       0.509
       39          38.601      38.232       0.369      38.896       0.295      38.977       0.376
       40          39.758      39.517       0.241      39.884       0.126      38.755       1.003

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 82. Equating mean score and score difference for NOP linking method condition 12

NOP Method

Raw Score  Frequency       Full MIRT               Approx Observed         Approx True
           Pop Equivalent  Mean Score  Difference  Mean Score  Difference  Mean Score  Difference
        0           0.266       0.000       0.266       0.000       0.266       2.786       3.052
        1           0.911       0.000       0.911       0.000       0.911       0.672       0.239
        2           2.012       0.000       2.012       0.032       1.979       1.306       0.706
        3           2.948       0.171       2.777       0.219       2.728       1.899       1.048
        4           3.937       0.613       3.324       0.755       3.182       2.438       1.500
        5           4.916       0.980       3.936       1.580       3.336       2.915       2.000
        6           5.860       1.488       4.372       2.451       3.409       3.344       2.516
        7           6.818       1.850       4.968       3.303       3.514       3.742       3.076
        8           7.761       2.315       5.446       4.106       3.655       4.126       3.635
        9           8.731       2.726       6.005       4.836       3.895       4.509       4.222
       10           9.722       3.152       6.570       5.493       4.229       4.904       4.818
       11          10.691       3.613       7.078       6.088       4.603       5.319       5.372
       12          11.657       4.041       7.616       6.625       5.032       5.761       5.896
       13          12.642       4.535       8.107       7.141       5.501       6.238       6.404
       14          13.631       4.997       8.634       7.635       5.996       6.752       6.879
       15          14.599       5.522       9.077       8.135       6.463       7.307       7.292
       16          15.593       6.033       9.560       8.645       6.948       7.904       7.689
       17          16.606       6.599      10.008       9.179       7.427       8.542       8.065
       18          17.611       7.173      10.439       9.742       7.869       9.220       8.392
       19          18.610       7.790      10.820      10.347       8.263       9.936       8.674
       20          19.599       8.443      11.156      10.992       8.607      10.691       8.908
       21          20.616       9.131      11.485      11.690       8.927      11.484       9.132
       22          21.624       9.869      11.755      12.434       9.190      12.318       9.306
       23          22.603      10.656      11.947      13.226       9.377      13.197       9.406
       24          23.591      11.492      12.099      14.067       9.524      14.125       9.466
       25          24.590      12.380      12.210      14.953       9.637      15.104       9.486
       26          25.598      13.319      12.280      15.880       9.718      16.134       9.464
       27          26.597      14.305      12.293      16.846       9.751      17.208       9.389
       28          27.590      15.333      12.257      17.848       9.742      18.317       9.273
       29          28.604      16.401      12.204      18.884       9.720      19.448       9.156
       30          29.575      17.505      12.070      19.951       9.624      20.591       8.984
       31          30.517      18.650      11.867      21.052       9.466      21.743       8.774
       32          31.489      19.845      11.644      22.191       9.298      22.906       8.583
       33          32.436      21.109      11.327      23.386       9.050      24.098       8.338
       34          33.440      22.471      10.968      24.664       8.776      25.354       8.086
       35          34.414      23.978      10.436      26.073       8.341      26.744       7.670
       36          35.372      25.695       9.676      27.680       7.692      28.399       6.973
       37          36.388      27.734       8.655      29.576       6.813      30.570       5.818
       38          37.394      30.272       7.121      31.838       5.555      33.696       3.698
       39          38.601      33.510       5.091      34.449       4.152      37.646       0.954
       40          39.758      37.153       2.604      37.292       2.466      38.755       1.003

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 83. SEE, Bias, and RMSD for Min's method condition 1

Min's Method

Raw Score   Full MIRT               Approx Observed         Approx True
              SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
        0   0.000   0.561   0.561   0.000   0.561   0.561   0.120   2.403   2.406
        1   0.213   0.125   0.247   0.560   0.108   0.569   0.182   0.136   0.227
        2   0.228   0.068   0.238   0.593   0.066   0.595   0.226   0.091   0.243
        3   0.245   0.091   0.260   0.579   0.083   0.584   0.243   0.122   0.271
        4   0.260   0.141   0.295   0.537   0.136   0.553   0.258   0.178   0.313
        5   0.272   0.112   0.293   0.493   0.113   0.504   0.276   0.155   0.316
        6   0.278   0.088   0.291   0.447   0.104   0.458   0.291   0.133   0.319
        7   0.277   0.050   0.281   0.408   0.086   0.416   0.298   0.094   0.312
        8   0.269   0.019   0.269   0.390   0.079   0.397   0.297   0.061   0.302
        9   0.256   0.004   0.255   0.403   0.090   0.412   0.286   0.041   0.288
       10   0.237   0.025   0.238   0.414   0.080   0.421   0.268   0.008   0.267
       11   0.216   0.061   0.224   0.414   0.052   0.416   0.244   0.033   0.246
       12   0.193   0.099   0.216   0.399   0.013   0.399   0.218   0.076   0.230
       13   0.169   0.128   0.211   0.372   0.022   0.372   0.190   0.109   0.218
       14   0.147   0.145   0.206   0.337   0.050   0.340   0.162   0.132   0.209
       15   0.128   0.175   0.217   0.298   0.092   0.311   0.138   0.166   0.216
       16   0.116   0.208   0.238   0.256   0.139   0.291   0.121   0.204   0.237
       17   0.112   0.223   0.250   0.215   0.171   0.275   0.115   0.224   0.252
       18   0.118   0.245   0.272   0.179   0.210   0.276   0.122   0.250   0.278
       19   0.133   0.258   0.290   0.155   0.242   0.287   0.140   0.267   0.302
       20   0.153   0.277   0.316   0.151   0.279   0.317   0.165   0.290   0.333
       21   0.177   0.308   0.355   0.169   0.330   0.370   0.194   0.324   0.378
       22   0.203   0.329   0.386   0.203   0.369   0.421   0.225   0.346   0.413
       23   0.230   0.334   0.406   0.247   0.395   0.466   0.256   0.353   0.435
       24   0.258   0.326   0.416   0.295   0.408   0.503   0.286   0.346   0.448
       25   0.286   0.335   0.440   0.346   0.437   0.557   0.316   0.356   0.475
       26   0.312   0.344   0.464   0.399   0.469   0.615   0.346   0.366   0.503
       27   0.338   0.348   0.485   0.457   0.503   0.678   0.375   0.374   0.529
       28   0.362   0.369   0.517   0.516   0.556   0.758   0.403   0.399   0.566
       29   0.385   0.405   0.559   0.572   0.625   0.846   0.429   0.441   0.614
       30   0.407   0.424   0.587   0.621   0.672   0.914   0.451   0.464   0.647
       31   0.425   0.438   0.609   0.659   0.704   0.964   0.469   0.481   0.671
       32   0.440   0.455   0.633   0.684   0.730   1.000   0.478   0.498   0.690
       33   0.451   0.459   0.643   0.693   0.731   1.006   0.476   0.496   0.687
       34   0.455   0.399   0.604   0.687   0.655   0.948   0.461   0.421   0.623
       35   0.449   0.349   0.568   0.664   0.578   0.879   0.429   0.347   0.551
       36   0.429   0.373   0.568   0.623   0.568   0.841   0.380   0.339   0.508
       37   0.390   0.285   0.482   0.569   0.446   0.722   0.311   0.213   0.376
       38   0.330   0.243   0.409   0.500   0.379   0.626   0.221   0.132   0.257
       39   0.242   0.256   0.352   0.379   0.350   0.516   0.120   0.114   0.165
       40   0.068   0.167   0.180   0.141   0.144   0.201   0.107   1.519   1.522

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).
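
The three statistics reported in Table A 83 appear to be linked by the usual decomposition of squared error, RMSD^2 = SEE^2 + Bias^2. This appendix does not state that formula, so it is treated here as an assumption; the tabled values agree with it to rounding. The Python sketch below is an illustration only, not part of the original analysis, and the function name rmsd_from_components is hypothetical. It recomputes RMSD for three rows copied from Table A 83.

```python
import math


def rmsd_from_components(see: float, bias: float) -> float:
    """Combine SEE and bias into RMSD, assuming RMSD^2 = SEE^2 + Bias^2."""
    return math.sqrt(see ** 2 + bias ** 2)


# (SEE, Bias, reported RMSD) copied from Table A 83:
# raw score 1 and 2 under Full MIRT, raw score 0 under Approx True.
rows = [
    (0.213, 0.125, 0.247),
    (0.228, 0.068, 0.238),
    (0.120, 2.403, 2.406),
]

# The tabled inputs are rounded to three decimals, so allow a small tolerance.
TOLERANCE = 0.002

for see, bias, reported in rows:
    recomputed = rmsd_from_components(see, bias)
    status = "consistent" if abs(recomputed - reported) < TOLERANCE else "check"
    print(f"SEE={see:.3f} Bias={bias:.3f} reported={reported:.3f} "
          f"recomputed={recomputed:.3f} {status}")
```

The same check can be applied to any row of the SEE, Bias, and RMSD tables; a row that fails it is more likely to reflect a transcription problem than a different definition of RMSD.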


Table A 84. SEE, Bias, and RMSD for ODL direct method condition 1

Oshima's Direct Method

Raw Score   Full MIRT               Approx Observed         Approx True
              SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
        0   0.000   0.561   0.561   0.000   0.561   0.561   0.120   2.403   2.406
        1   0.171   0.127   0.213   0.510   0.109   0.520   0.160   0.119   0.199
        2   0.181   0.072   0.194   0.519   0.058   0.521   0.187   0.070   0.199
        3   0.193   0.095   0.215   0.489   0.081   0.494   0.191   0.101   0.215
        4   0.206   0.148   0.253   0.452   0.129   0.469   0.195   0.160   0.252
        5   0.220   0.124   0.252   0.410   0.101   0.421   0.208   0.141   0.251
        6   0.233   0.107   0.256   0.365   0.087   0.374   0.225   0.128   0.258
        7   0.243   0.078   0.255   0.325   0.063   0.330   0.241   0.100   0.261
        8   0.251   0.057   0.257   0.301   0.048   0.304   0.253   0.079   0.265
        9   0.257   0.053   0.262   0.312   0.047   0.315   0.261   0.074   0.270
       10   0.261   0.036   0.263   0.329   0.026   0.329   0.265   0.055   0.270
       11   0.263   0.012   0.263   0.342   0.008   0.341   0.267   0.029   0.268
       12   0.265   0.014   0.265   0.347   0.048   0.349   0.267   0.000   0.267
       13   0.266   0.030   0.267   0.345   0.078   0.353   0.268   0.019   0.268
       14   0.267   0.035   0.269   0.339   0.095   0.351   0.269   0.027   0.269
       15   0.267   0.054   0.272   0.330   0.118   0.350   0.270   0.048   0.273
       16   0.268   0.076   0.278   0.320   0.140   0.348   0.271   0.073   0.280
       17   0.268   0.081   0.279   0.311   0.136   0.338   0.273   0.080   0.284
       18   0.269   0.092   0.284   0.305   0.132   0.332   0.276   0.093   0.290
       19   0.271   0.097   0.287   0.306   0.116   0.327   0.278   0.100   0.295
       20   0.273   0.107   0.293   0.314   0.102   0.329   0.281   0.112   0.301
       21   0.276   0.130   0.305   0.329   0.100   0.343   0.283   0.137   0.314
       22   0.279   0.142   0.313   0.349   0.092   0.360   0.286   0.151   0.323
       23   0.283   0.141   0.316   0.372   0.075   0.378   0.289   0.150   0.325
       24   0.287   0.126   0.313   0.396   0.053   0.398   0.292   0.137   0.322
       25   0.291   0.128   0.317   0.420   0.056   0.423   0.297   0.140   0.327
       26   0.296   0.130   0.322   0.446   0.069   0.450   0.303   0.144   0.334
       27   0.300   0.129   0.326   0.475   0.088   0.482   0.310   0.145   0.342
       28   0.306   0.145   0.338   0.509   0.133   0.525   0.320   0.164   0.359
       29   0.312   0.177   0.358   0.547   0.198   0.580   0.332   0.200   0.387
       30   0.319   0.191   0.371   0.584   0.246   0.633   0.345   0.219   0.408
       31   0.326   0.203   0.383   0.618   0.287   0.680   0.358   0.235   0.427
       32   0.334   0.220   0.399   0.642   0.327   0.719   0.368   0.255   0.447
       33   0.341   0.226   0.408   0.652   0.345   0.737   0.374   0.261   0.455
       34   0.346   0.172   0.386   0.647   0.291   0.707   0.370   0.200   0.419
       35   0.348   0.131   0.371   0.623   0.239   0.666   0.352   0.145   0.380
       36   0.341   0.171   0.381   0.584   0.259   0.638   0.317   0.164   0.356
       37   0.321   0.106   0.337   0.528   0.171   0.554   0.260   0.075   0.270
       38   0.281   0.095   0.296   0.456   0.142   0.477   0.182   0.040   0.185
       39   0.222   0.147   0.266   0.362   0.175   0.401   0.097   0.076   0.123
       40   0.095   0.146   0.174   0.170   0.104   0.199   0.107   1.519   1.522

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 85. SEE, Bias, and RMSD for ODL TCF method condition 1

Test Characteristic Curve Method

Raw Score   Full MIRT               Approx Observed         Approx True
              SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
        0   0.000   0.561   0.561   0.000   0.561   0.561   0.120   2.403   2.406
        1   0.155   0.137   0.207   0.507   0.119   0.519   0.158   0.116   0.195
        2   0.169   0.082   0.187   0.509   0.057   0.510   0.184   0.067   0.195
        3   0.183   0.107   0.211   0.478   0.078   0.483   0.190   0.098   0.214
        4   0.195   0.161   0.253   0.445   0.127   0.462   0.197   0.159   0.253
        5   0.206   0.139   0.248   0.405   0.105   0.418   0.209   0.143   0.253
        6   0.215   0.125   0.248   0.364   0.099   0.376   0.221   0.134   0.258
        7   0.221   0.098   0.241   0.331   0.081   0.340   0.230   0.111   0.255
        8   0.223   0.081   0.237   0.313   0.072   0.321   0.234   0.095   0.252
        9   0.223   0.079   0.236   0.322   0.080   0.331   0.233   0.094   0.251
       10   0.221   0.066   0.230   0.332   0.071   0.339   0.230   0.080   0.243
       11   0.218   0.045   0.222   0.337   0.050   0.340   0.225   0.058   0.232
       12   0.214   0.022   0.215   0.335   0.022   0.334   0.220   0.033   0.222
       13   0.210   0.009   0.210   0.325   0.002   0.324   0.215   0.018   0.215
       14   0.207   0.006   0.206   0.311   0.007   0.310   0.210   0.014   0.210
       15   0.203   0.009   0.203   0.294   0.027   0.294   0.207   0.003   0.207
       16   0.200   0.029   0.202   0.276   0.048   0.280   0.205   0.025   0.206
       17   0.198   0.031   0.200   0.259   0.047   0.263   0.203   0.029   0.205
       18   0.197   0.039   0.201   0.245   0.050   0.250   0.203   0.039   0.206
       19   0.198   0.041   0.201   0.238   0.042   0.241   0.203   0.043   0.207
       20   0.199   0.049   0.205   0.239   0.039   0.241   0.205   0.053   0.211
       21   0.203   0.070   0.214   0.250   0.049   0.254   0.208   0.076   0.220
       22   0.207   0.080   0.221   0.270   0.050   0.274   0.211   0.088   0.228
       23   0.212   0.077   0.225   0.296   0.043   0.298   0.216   0.085   0.232
       24   0.218   0.060   0.225   0.326   0.027   0.327   0.222   0.070   0.232
       25   0.224   0.061   0.232   0.360   0.035   0.361   0.229   0.072   0.239
       26   0.231   0.062   0.239   0.398   0.048   0.400   0.238   0.074   0.249
       27   0.240   0.060   0.247   0.442   0.065   0.446   0.249   0.075   0.260
       28   0.249   0.075   0.260   0.491   0.105   0.501   0.264   0.092   0.279
       29   0.261   0.106   0.281   0.544   0.164   0.567   0.282   0.127   0.309
       30   0.274   0.121   0.298   0.597   0.205   0.630   0.304   0.146   0.336
       31   0.288   0.132   0.316   0.644   0.237   0.685   0.327   0.161   0.364
       32   0.304   0.150   0.338   0.682   0.267   0.731   0.349   0.181   0.393
       33   0.321   0.157   0.357   0.706   0.276   0.756   0.368   0.187   0.412
       34   0.337   0.103   0.352   0.710   0.213   0.740   0.378   0.128   0.398
       35   0.348   0.065   0.354   0.695   0.155   0.710   0.373   0.077   0.380
       36   0.351   0.109   0.366   0.658   0.171   0.679   0.349   0.102   0.363
       37   0.338   0.050   0.341   0.598   0.084   0.603   0.300   0.022   0.300
       38   0.303   0.050   0.306   0.518   0.060   0.520   0.222   0.002   0.221
       39   0.245   0.115   0.270   0.417   0.109   0.430   0.115   0.059   0.129
       40   0.122   0.141   0.186   0.227   0.081   0.241   0.107   1.519   1.522

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 86. SEE, Bias, and RMSD for ODL ICF method condition 1

Item Characteristic Curve Method

Raw Score   Full MIRT               Approx Observed         Approx True
              SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
        0   0.000   0.561   0.561   0.000   0.561   0.561   0.120   2.403   2.406
        1   0.140   0.139   0.197   0.483   0.122   0.497   0.153   0.118   0.193
        2   0.147   0.085   0.169   0.490   0.069   0.494   0.174   0.070   0.187
        3   0.154   0.111   0.190   0.461   0.088   0.468   0.173   0.102   0.201
        4   0.163   0.166   0.232   0.427   0.134   0.446   0.170   0.164   0.236
        5   0.172   0.144   0.224   0.387   0.108   0.401   0.174   0.149   0.229
        6   0.180   0.130   0.222   0.336   0.101   0.350   0.183   0.140   0.230
        7   0.187   0.104   0.213   0.286   0.086   0.298   0.192   0.117   0.224
        8   0.192   0.086   0.209   0.255   0.080   0.267   0.198   0.101   0.221
        9   0.194   0.084   0.211   0.258   0.087   0.272   0.201   0.100   0.224
       10   0.196   0.071   0.208   0.268   0.078   0.279   0.202   0.085   0.219
       11   0.197   0.049   0.203   0.276   0.055   0.281   0.202   0.062   0.211
       12   0.198   0.026   0.199   0.279   0.025   0.279   0.201   0.037   0.204
       13   0.198   0.012   0.198   0.276   0.004   0.275   0.201   0.022   0.202
       14   0.199   0.009   0.198   0.270   0.007   0.269   0.201   0.017   0.201
       15   0.199   0.007   0.199   0.262   0.027   0.262   0.202   0.001   0.201
       16   0.199   0.027   0.201   0.252   0.049   0.256   0.203   0.023   0.204
       17   0.200   0.030   0.202   0.244   0.049   0.248   0.205   0.028   0.206
       18   0.201   0.039   0.204   0.237   0.051   0.242   0.206   0.039   0.209
       19   0.203   0.042   0.206   0.236   0.043   0.239   0.208   0.044   0.212
       20   0.205   0.050   0.210   0.240   0.038   0.243   0.210   0.054   0.217
       21   0.207   0.072   0.219   0.252   0.048   0.256   0.213   0.078   0.226
       22   0.210   0.083   0.226   0.269   0.049   0.273   0.215   0.090   0.233
       23   0.214   0.080   0.228   0.291   0.041   0.293   0.218   0.089   0.235
       24   0.218   0.064   0.226   0.314   0.025   0.314   0.222   0.074   0.233
       25   0.222   0.066   0.230   0.339   0.033   0.340   0.226   0.077   0.239
       26   0.226   0.067   0.235   0.367   0.047   0.370   0.233   0.080   0.245
       27   0.231   0.066   0.240   0.401   0.066   0.405   0.241   0.080   0.253
       28   0.237   0.081   0.250   0.440   0.107   0.452   0.252   0.099   0.270
       29   0.245   0.113   0.269   0.484   0.167   0.510   0.265   0.134   0.297
       30   0.253   0.128   0.283   0.528   0.209   0.567   0.281   0.153   0.320
       31   0.263   0.139   0.297   0.568   0.244   0.617   0.298   0.169   0.342
       32   0.274   0.157   0.315   0.598   0.276   0.657   0.313   0.190   0.366
       33   0.284   0.165   0.328   0.614   0.288   0.677   0.324   0.196   0.379
       34   0.294   0.112   0.314   0.613   0.228   0.653   0.327   0.138   0.354
       35   0.301   0.074   0.309   0.595   0.172   0.618   0.317   0.088   0.328
       36   0.299   0.118   0.321   0.558   0.190   0.588   0.288   0.114   0.309
       37   0.285   0.059   0.291   0.503   0.102   0.512   0.238   0.033   0.240
       38   0.253   0.059   0.259   0.435   0.076   0.441   0.167   0.012   0.167
       39   0.203   0.122   0.237   0.353   0.124   0.373   0.090   0.063   0.110
       40   0.085   0.147   0.169   0.177   0.091   0.199   0.107   1.519   1.522

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 87. SEE, Bias, and RMSD for NOP method condition 1

Non-Orthogonal Procrustes Method

Raw Score   Full MIRT               Approx Observed         Approx True
              SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
        0   0.000   0.561   0.561   0.000   0.561   0.561   0.120   2.403   2.406
        1   0.178   0.027   0.180   0.521   0.130   0.536   0.173   0.091   0.196
        2   0.181   0.111   0.212   0.512   0.201   0.549   0.202   0.013   0.202
        3   0.185   0.121   0.221   0.487   0.192   0.522   0.201   0.003   0.201
        4   0.191   0.106   0.218   0.450   0.146   0.472   0.194   0.009   0.193
        5   0.200   0.170   0.262   0.405   0.171   0.439   0.191   0.070   0.203
        6   0.212   0.226   0.310   0.359   0.190   0.405   0.195   0.144   0.242
        7   0.228   0.292   0.370   0.314   0.225   0.386   0.205   0.226   0.305
        8   0.248   0.344   0.424   0.288   0.264   0.390   0.222   0.293   0.367
        9   0.274   0.376   0.465   0.289   0.307   0.421   0.246   0.336   0.416
       10   0.304   0.416   0.515   0.301   0.365   0.473   0.275   0.384   0.472
       11   0.337   0.459   0.569   0.317   0.427   0.532   0.309   0.433   0.532
       12   0.372   0.500   0.622   0.337   0.487   0.592   0.346   0.480   0.591
       13   0.406   0.526   0.664   0.362   0.530   0.641   0.386   0.512   0.641
       14   0.441   0.538   0.695   0.393   0.556   0.680   0.427   0.530   0.680
       15   0.474   0.559   0.732   0.429   0.591   0.729   0.468   0.557   0.727
       16   0.507   0.581   0.770   0.471   0.627   0.783   0.509   0.586   0.775
       17   0.538   0.583   0.793   0.517   0.643   0.824   0.548   0.593   0.807
       18   0.569   0.590   0.819   0.564   0.662   0.869   0.585   0.605   0.840
       19   0.598   0.589   0.838   0.611   0.671   0.907   0.619   0.606   0.865
       20   0.627   0.592   0.861   0.656   0.682   0.945   0.650   0.610   0.891
       21   0.654   0.608   0.892   0.697   0.703   0.989   0.679   0.625   0.922
       22   0.680   0.612   0.913   0.735   0.710   1.020   0.704   0.627   0.942
       23   0.703   0.601   0.924   0.767   0.699   1.037   0.727   0.613   0.950
       24   0.725   0.576   0.924   0.797   0.674   1.042   0.748   0.586   0.949
       25   0.744   0.567   0.934   0.823   0.665   1.056   0.768   0.575   0.958
       26   0.761   0.558   0.942   0.846   0.657   1.070   0.786   0.566   0.967
       27   0.775   0.545   0.946   0.867   0.648   1.081   0.802   0.554   0.973
       28   0.786   0.548   0.957   0.883   0.657   1.099   0.814   0.560   0.986
       29   0.793   0.567   0.973   0.895   0.682   1.123   0.821   0.581   1.004
       30   0.795   0.567   0.975   0.899   0.688   1.130   0.820   0.583   1.004
       31   0.791   0.563   0.970   0.894   0.689   1.127   0.808   0.578   0.992
       32   0.781   0.563   0.961   0.880   0.691   1.117   0.782   0.574   0.969
       33   0.762   0.550   0.938   0.856   0.674   1.088   0.742   0.551   0.922
       34   0.734   0.473   0.872   0.823   0.590   1.011   0.684   0.458   0.822
       35   0.693   0.407   0.802   0.781   0.512   0.932   0.608   0.368   0.709
       36   0.634   0.416   0.757   0.725   0.509   0.884   0.512   0.347   0.618
       37   0.559   0.315   0.641   0.660   0.400   0.770   0.398   0.210   0.449
       38   0.460   0.261   0.528   0.573   0.343   0.667   0.269   0.122   0.294
       39   0.326   0.255   0.413   0.422   0.313   0.525   0.139   0.102   0.172
       40   0.105   0.148   0.181   0.155   0.120   0.196   0.107   1.519   1.522

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 88. SEE, Bias, and RMSD for Min's method condition 2

Min's Method

Raw Score   Full MIRT               Approx Observed         Approx True
              SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
        0   0.000   0.351   0.351   0.000   0.351   0.351   0.120   2.613   2.616
        1   0.142   0.160   0.214   0.525   0.209   0.564   0.179   0.091   0.200
        2   0.164   0.149   0.221   0.541   0.247   0.594   0.208   0.133   0.246
        3   0.194   0.092   0.215   0.518   0.326   0.611   0.219   0.201   0.297
        4   0.221   0.052   0.227   0.494   0.372   0.617   0.233   0.236   0.331
        5   0.240   0.029   0.241   0.465   0.385   0.603   0.252   0.251   0.355
        6   0.250   0.007   0.249   0.429   0.385   0.576   0.268   0.280   0.387
        7   0.252   0.009   0.251   0.391   0.306   0.495   0.277   0.257   0.378
        8   0.247   0.024   0.247   0.370   0.257   0.450   0.277   0.282   0.395
        9   0.236   0.039   0.238   0.380   0.192   0.425   0.268   0.286   0.391
       10   0.220   0.050   0.226   0.390   0.141   0.414   0.252   0.282   0.378
       11   0.203   0.067   0.213   0.388   0.114   0.404   0.231   0.280   0.363
       12   0.184   0.090   0.204   0.373   0.107   0.387   0.206   0.279   0.347
       13   0.164   0.133   0.210   0.346   0.130   0.369   0.181   0.294   0.345
       14   0.144   0.145   0.204   0.313   0.124   0.336   0.156   0.274   0.315
       15   0.126   0.162   0.205   0.275   0.124   0.301   0.133   0.256   0.289
       16   0.113   0.188   0.219   0.236   0.134   0.271   0.116   0.245   0.271
       17   0.107   0.196   0.223   0.197   0.128   0.234   0.107   0.215   0.240
       18   0.109   0.215   0.241   0.161   0.136   0.210   0.110   0.195   0.223
       19   0.120   0.232   0.261   0.136   0.141   0.196   0.123   0.171   0.210
       20   0.137   0.246   0.281   0.131   0.143   0.194   0.144   0.143   0.203
       21   0.159   0.262   0.306   0.148   0.147   0.208   0.169   0.116   0.205
       22   0.184   0.284   0.338   0.181   0.160   0.241   0.197   0.094   0.218
       23   0.211   0.298   0.365   0.223   0.167   0.278   0.226   0.064   0.234
       24   0.240   0.310   0.391   0.270   0.175   0.322   0.255   0.032   0.256
       25   0.268   0.322   0.418   0.322   0.189   0.373   0.285   0.004   0.284
       26   0.296   0.357   0.463   0.379   0.234   0.445   0.315   0.001   0.315
       27   0.324   0.412   0.523   0.443   0.313   0.541   0.347   0.024   0.346
       28   0.351   0.466   0.583   0.509   0.400   0.647   0.378   0.048   0.380
       29   0.378   0.510   0.634   0.577   0.482   0.750   0.409   0.067   0.413
       30   0.402   0.545   0.677   0.636   0.547   0.838   0.437   0.077   0.442
       31   0.423   0.593   0.728   0.683   0.612   0.916   0.459   0.100   0.469
       32   0.440   0.627   0.766   0.712   0.643   0.958   0.474   0.107   0.484
       33   0.452   0.744   0.870   0.724   0.734   1.030   0.475   0.191   0.511
       34   0.454   0.740   0.868   0.718   0.686   0.992   0.461   0.150   0.484
       35   0.443   0.727   0.851   0.695   0.618   0.929   0.429   0.100   0.440
       36   0.415   0.653   0.773   0.655   0.494   0.819   0.377   0.004   0.377
       37   0.373   0.541   0.657   0.606   0.351   0.699   0.306   0.134   0.334
       38   0.329   0.509   0.606   0.530   0.310   0.613   0.218   0.165   0.273
       39   0.250   0.305   0.394   0.401   0.132   0.421   0.123   0.284   0.309
       40   0.000   0.318   0.318   0.123   0.355   0.376   0.107   2.024   2.027

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A 89. SEE, Bias, and RMSD for ODL direct method condition 2

Oshima's Direct Method

Raw Score   Full MIRT               Approx Observed         Approx True
              SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
        0   0.000   0.351   0.351   0.000   0.351   0.351   0.120   2.613   2.616
        1   0.160   0.292   0.333   0.545   0.206   0.582   0.193   0.026   0.194
        2   0.249   0.362   0.439   0.635   0.234   0.676   0.241   0.039   0.243
        3   0.312   0.383   0.494   0.630   0.147   0.645   0.269   0.009   0.269
        4   0.349   0.388   0.521   0.600   0.084   0.604   0.304   0.017   0.304
        5   0.375   0.390   0.540   0.567   0.054   0.568   0.347   0.048   0.350
        6   0.392   0.371   0.539   0.539   0.036   0.539   0.387   0.063   0.391
        7   0.402   0.396   0.564   0.514   0.102   0.523   0.417   0.120   0.433
        8   0.404   0.364   0.543   0.507   0.164   0.531   0.432   0.116   0.446
        9   0.397   0.342   0.523   0.530   0.259   0.589   0.432   0.118   0.447
       10   0.383   0.313   0.494   0.555   0.333   0.646   0.419   0.113   0.433
       11   0.364   0.271   0.454   0.567   0.370   0.676   0.397   0.095   0.407
       12   0.342   0.218   0.405   0.559   0.368   0.669   0.369   0.065   0.374
       13   0.318   0.140   0.347   0.532   0.318   0.619   0.338   0.012   0.337
       14   0.296   0.089   0.309   0.490   0.278   0.562   0.308   0.013   0.308
       15   0.280   0.031   0.281   0.441   0.225   0.494   0.283   0.044   0.286
       16   0.270   0.037   0.272   0.392   0.159   0.422   0.266   0.084   0.279
       17   0.267   0.091   0.282   0.349   0.109   0.365   0.261   0.108   0.282
       18   0.274   0.157   0.315   0.318   0.047   0.321   0.268   0.144   0.303
       19   0.289   0.222   0.364   0.305   0.012   0.304   0.287   0.176   0.336
       20   0.312   0.285   0.422   0.312   0.068   0.319   0.314   0.204   0.374
       21   0.342   0.350   0.488   0.340   0.126   0.362   0.349   0.232   0.418
       22   0.376   0.423   0.565   0.384   0.197   0.431   0.387   0.265   0.468
       23   0.413   0.486   0.637   0.440   0.266   0.513   0.429   0.288   0.516
       24   0.451   0.547   0.708   0.505   0.341   0.608   0.473   0.310   0.564
       25   0.490   0.606   0.779   0.578   0.429   0.718   0.519   0.334   0.616
       26   0.530   0.687   0.867   0.663   0.557   0.864   0.568   0.385   0.685
       27   0.573   0.787   0.972   0.760   0.725   1.049   0.619   0.461   0.770
       28   0.616   0.886   1.078   0.863   0.906   1.250   0.670   0.538   0.858
       29   0.658   0.975   1.175   0.962   1.078   1.443   0.718   0.604   0.936
       30   0.695   1.050   1.258   1.044   1.219   1.603   0.758   0.652   0.998
       31   0.723   1.130   1.341   1.098   1.337   1.729   0.784   0.700   1.049
       32   0.741   1.186   1.398   1.122   1.399   1.791   0.789   0.713   1.062
       33   0.746   1.312   1.508   1.114   1.496   1.864   0.770   0.782   1.096
       34   0.735   1.302   1.494   1.078   1.429   1.789   0.723   0.703   1.007
       35   0.704   1.266   1.448   1.020   1.327   1.672   0.648   0.591   0.876
       36   0.653   1.154   1.325   0.939   1.151   1.484   0.547   0.406   0.681
       37   0.580   0.979   1.137   0.834   0.931   1.249   0.426   0.181   0.461
       38   0.455   0.828   0.945   0.688   0.773   1.034   0.290   0.044   0.293
       39   0.272   0.472   0.544   0.425   0.372   0.564   0.154   0.179   0.236
       40   0.002   0.318   0.318   0.114   0.347   0.365   0.107   2.024   2.027

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-90. SEE, Bias, and RMSD for ODL TCF method condition 2

Test Characteristic Curve Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.351 0.351    0.000 0.351 0.351    0.120 2.613 2.616
 1      0.123 0.291 0.316    0.520 0.152 0.540    0.184 0.009 0.184
 2      0.208 0.349 0.406    0.603 0.173 0.626    0.224 0.016 0.224
 3      0.266 0.355 0.443    0.589 0.098 0.596    0.242 0.015 0.242
 4      0.296 0.354 0.461    0.553 0.043 0.554    0.263 0.008 0.263
 5      0.312 0.359 0.475    0.508 0.025 0.508    0.291 0.023 0.292
 6      0.320 0.345 0.470    0.468 0.016 0.467    0.319 0.038 0.320
 7      0.323 0.375 0.494    0.433 0.087 0.441    0.338 0.096 0.351
 8      0.318 0.349 0.472    0.419 0.145 0.442    0.345 0.094 0.357
 9      0.308 0.333 0.453    0.432 0.242 0.494    0.340 0.101 0.354
10      0.291 0.312 0.426    0.448 0.323 0.551    0.325 0.103 0.340
11      0.270 0.279 0.388    0.453 0.368 0.583    0.301 0.092 0.314
12      0.248 0.235 0.341    0.443 0.377 0.581    0.274 0.071 0.282
13      0.226 0.166 0.280    0.416 0.339 0.536    0.245 0.028 0.246
14      0.208 0.124 0.242    0.377 0.314 0.490    0.219 0.014 0.219
15      0.197 0.076 0.210    0.334 0.274 0.431    0.199 0.006 0.199
16      0.194 0.017 0.194    0.292 0.220 0.365    0.191 0.035 0.194
17      0.200 0.028 0.202    0.258 0.179 0.313    0.195 0.048 0.201
18      0.215 0.085 0.231    0.238 0.122 0.267    0.212 0.072 0.224
19      0.237 0.140 0.275    0.236 0.067 0.244    0.239 0.093 0.256
20      0.265 0.195 0.328    0.252 0.013 0.252    0.272 0.111 0.293
21      0.297 0.251 0.388    0.287 0.045 0.290    0.309 0.129 0.334
22      0.332 0.314 0.457    0.335 0.113 0.353    0.348 0.152 0.379
23      0.369 0.369 0.521    0.393 0.179 0.430    0.388 0.166 0.422
24      0.407 0.420 0.584    0.457 0.248 0.518    0.431 0.180 0.466
25      0.446 0.470 0.647    0.529 0.326 0.620    0.475 0.196 0.512
26      0.485 0.543 0.727    0.615 0.441 0.755    0.521 0.238 0.572
27      0.526 0.636 0.824    0.713 0.593 0.926    0.569 0.305 0.644
28      0.567 0.728 0.922    0.815 0.757 1.111    0.618 0.374 0.721
29      0.606 0.812 1.012    0.914 0.912 1.290    0.664 0.433 0.791
30      0.641 0.883 1.091    0.994 1.038 1.436    0.703 0.476 0.848
31      0.668 0.962 1.170    1.047 1.145 1.550    0.729 0.522 0.896
32      0.685 1.018 1.226    1.073 1.199 1.607    0.737 0.537 0.911
33      0.692 1.147 1.338    1.069 1.293 1.676    0.722 0.613 0.946
34      0.683 1.143 1.331    1.038 1.227 1.606    0.681 0.546 0.872
35      0.655 1.118 1.295    0.986 1.130 1.499    0.614 0.452 0.761
36      0.609 1.020 1.187    0.912 0.964 1.325    0.521 0.289 0.594
37      0.543 0.866 1.022    0.814 0.761 1.113    0.405 0.090 0.414
38      0.430 0.738 0.853    0.670 0.625 0.915    0.274 0.015 0.274
39      0.256 0.400 0.475    0.390 0.288 0.484    0.142 0.207 0.250
40      0.000 0.318 0.318    0.064 0.330 0.337    0.107 2.024 2.027

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-91. SEE, Bias, and RMSD for ODL ICF method condition 2

Item Characteristic Curve Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.351 0.351    0.000 0.351 0.351    0.120 2.613 2.616
 1      0.130 0.289 0.317    0.531 0.152 0.550    0.190 0.007 0.189
 2      0.215 0.348 0.409    0.616 0.175 0.639    0.233 0.014 0.232
 3      0.274 0.356 0.448    0.605 0.096 0.611    0.252 0.017 0.252
 4      0.304 0.356 0.467    0.570 0.041 0.570    0.275 0.010 0.274
 5      0.320 0.361 0.482    0.529 0.020 0.528    0.304 0.021 0.304
 6      0.328 0.347 0.477    0.484 0.012 0.483    0.331 0.037 0.332
 7      0.330 0.378 0.501    0.445 0.086 0.452    0.349 0.096 0.361
 8      0.324 0.352 0.478    0.429 0.145 0.452    0.355 0.095 0.366
 9      0.311 0.337 0.458    0.441 0.243 0.502    0.347 0.103 0.361
10      0.293 0.317 0.431    0.455 0.324 0.558    0.329 0.105 0.345
11      0.270 0.284 0.391    0.458 0.370 0.587    0.304 0.095 0.317
12      0.246 0.240 0.344    0.444 0.379 0.583    0.274 0.076 0.283
13      0.223 0.172 0.282    0.414 0.342 0.536    0.243 0.034 0.245
14      0.206 0.132 0.244    0.372 0.316 0.488    0.217 0.020 0.217
15      0.197 0.084 0.214    0.326 0.277 0.427    0.199 0.001 0.198
16      0.199 0.026 0.200    0.283 0.225 0.361    0.194 0.027 0.195
17      0.210 0.017 0.210    0.250 0.185 0.311    0.204 0.038 0.207
18      0.230 0.074 0.241    0.235 0.130 0.268    0.227 0.062 0.235
19      0.257 0.128 0.287    0.240 0.077 0.252    0.259 0.082 0.271
20      0.290 0.182 0.341    0.267 0.026 0.267    0.298 0.099 0.313
21      0.326 0.237 0.403    0.310 0.029 0.311    0.339 0.116 0.357
22      0.365 0.300 0.472    0.366 0.096 0.377    0.382 0.138 0.405
23      0.406 0.354 0.538    0.430 0.159 0.457    0.426 0.152 0.451
24      0.446 0.406 0.602    0.499 0.227 0.547    0.471 0.165 0.498
25      0.487 0.456 0.666    0.575 0.304 0.650    0.518 0.180 0.547
26      0.529 0.528 0.746    0.665 0.419 0.784    0.566 0.223 0.607
27      0.572 0.621 0.843    0.766 0.570 0.954    0.617 0.290 0.680
28      0.616 0.712 0.940    0.872 0.733 1.138    0.668 0.358 0.756
29      0.658 0.794 1.030    0.975 0.887 1.316    0.716 0.416 0.827
30      0.695 0.865 1.108    1.058 1.012 1.462    0.757 0.460 0.884
31      0.723 0.943 1.187    1.113 1.118 1.575    0.784 0.505 0.931
32      0.740 0.999 1.242    1.138 1.171 1.631    0.792 0.519 0.945
33      0.745 1.128 1.351    1.132 1.266 1.696    0.774 0.595 0.975
34      0.734 1.125 1.342    1.098 1.202 1.626    0.729 0.528 0.899
35      0.703 1.100 1.305    1.043 1.106 1.519    0.656 0.435 0.786
36      0.652 1.003 1.196    0.966 0.941 1.347    0.556 0.274 0.619
37      0.582 0.849 1.029    0.865 0.739 1.136    0.433 0.078 0.439
38      0.463 0.719 0.854    0.710 0.605 0.932    0.294 0.024 0.294
39      0.273 0.388 0.474    0.423 0.271 0.502    0.153 0.211 0.261
40      0.000 0.318 0.318    0.083 0.339 0.349    0.107 2.024 2.027

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-92. SEE, Bias, and RMSD for NOP method condition 2

Non-Orthogonal Procrustes Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.351 0.351    0.000 0.351 0.351    0.120 2.613 2.616
 1      0.113 0.120 0.164    0.483 0.312 0.573    0.177 0.113 0.210
 2      0.126 0.099 0.160    0.480 0.356 0.596    0.197 0.170 0.260
 3      0.140 0.024 0.142    0.458 0.448 0.640    0.196 0.255 0.321
 4      0.154 0.034 0.158    0.432 0.505 0.664    0.194 0.311 0.366
 5      0.172 0.072 0.186    0.402 0.523 0.659    0.198 0.347 0.399
 6      0.189 0.120 0.223    0.363 0.524 0.637    0.204 0.394 0.443
 7      0.207 0.113 0.235    0.324 0.445 0.550    0.211 0.385 0.439
 8      0.227 0.153 0.273    0.295 0.406 0.501    0.220 0.419 0.473
 9      0.250 0.171 0.303    0.297 0.361 0.467    0.232 0.428 0.486
10      0.276 0.185 0.332    0.306 0.327 0.448    0.250 0.426 0.493
11      0.304 0.202 0.365    0.316 0.311 0.443    0.273 0.422 0.502
12      0.334 0.223 0.401    0.326 0.308 0.448    0.301 0.417 0.514
13      0.365 0.262 0.448    0.339 0.328 0.471    0.332 0.426 0.540
14      0.396 0.269 0.478    0.357 0.316 0.476    0.365 0.400 0.541
15      0.426 0.279 0.509    0.381 0.308 0.489    0.400 0.375 0.547
16      0.456 0.296 0.543    0.413 0.307 0.513    0.434 0.355 0.560
17      0.486 0.295 0.567    0.449 0.290 0.533    0.469 0.316 0.564
18      0.514 0.306 0.597    0.489 0.284 0.564    0.502 0.287 0.577
19      0.542 0.314 0.625    0.530 0.274 0.595    0.533 0.253 0.589
20      0.569 0.321 0.652    0.571 0.261 0.627    0.563 0.216 0.601
21      0.596 0.329 0.679    0.612 0.250 0.659    0.590 0.178 0.615
22      0.621 0.343 0.708    0.650 0.246 0.693    0.616 0.147 0.631
23      0.645 0.349 0.732    0.685 0.233 0.722    0.639 0.106 0.646
24      0.668 0.352 0.753    0.719 0.222 0.751    0.661 0.066 0.663
25      0.688 0.355 0.773    0.752 0.215 0.780    0.682 0.028 0.681
26      0.707 0.381 0.802    0.786 0.238 0.820    0.703 0.016 0.701
27      0.724 0.428 0.840    0.821 0.291 0.869    0.723 0.029 0.721
28      0.739 0.473 0.876    0.855 0.351 0.922    0.740 0.044 0.740
29      0.751 0.510 0.907    0.885 0.406 0.971    0.755 0.053 0.754
30      0.759 0.538 0.929    0.907 0.448 1.010    0.762 0.054 0.762
31      0.760 0.579 0.954    0.919 0.496 1.042    0.759 0.068 0.760
32      0.753 0.606 0.965    0.919 0.518 1.053    0.743 0.068 0.744
33      0.736 0.717 1.026    0.905 0.609 1.089    0.710 0.148 0.724
34      0.707 0.708 0.999    0.878 0.567 1.043    0.659 0.107 0.666
35      0.663 0.692 0.957    0.835 0.512 0.978    0.589 0.060 0.590
36      0.605 0.618 0.863    0.780 0.405 0.877    0.499 0.038 0.499
37      0.525 0.504 0.726    0.706 0.275 0.756    0.391 0.160 0.421
38      0.432 0.471 0.639    0.614 0.253 0.663    0.268 0.183 0.324
39      0.300 0.271 0.403    0.458 0.093 0.466    0.144 0.296 0.329
40      0.023 0.321 0.321    0.151 0.371 0.400    0.107 2.024 2.027

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-93. SEE, Bias, and RMSD for Min's method condition 3

Min's Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.500 0.500    0.000 0.500 0.500    0.120 3.464 3.466
 1      0.000 0.877 0.877    0.097 0.856 0.861    0.165 0.117 0.202
 2      0.053 1.887 1.888    0.302 1.682 1.709    0.198 0.397 0.443
 3      0.161 2.315 2.320    0.506 2.001 2.064    0.202 0.587 0.621
 4      0.170 2.797 2.802    0.556 2.035 2.109    0.200 0.845 0.868
 5      0.179 3.085 3.090    0.535 2.046 2.114    0.202 1.174 1.191
 6      0.200 3.449 3.455    0.497 1.997 2.057    0.209 1.447 1.461
 7      0.177 3.781 3.785    0.461 2.082 2.132    0.217 1.813 1.826
 8      0.216 4.168 4.173    0.438 2.207 2.250    0.227 2.144 2.156
 9      0.184 4.533 4.537    0.384 2.475 2.504    0.237 2.488 2.499
10      0.214 4.962 4.967    0.374 2.862 2.887    0.244 2.877 2.887
11      0.205 5.365 5.369    0.367 3.293 3.313    0.248 3.198 3.207
12      0.189 5.671 5.674    0.342 3.668 3.683    0.248 3.443 3.452
13      0.201 5.949 5.952    0.363 4.029 4.046    0.244 3.644 3.652
14      0.196 6.256 6.259    0.363 4.396 4.411    0.235 3.859 3.867
15      0.180 6.526 6.528    0.366 4.710 4.724    0.223 4.053 4.059
16      0.164 6.697 6.699    0.370 4.916 4.930    0.208 4.165 4.171
17      0.151 6.853 6.854    0.365 5.095 5.108    0.190 4.277 4.281
18      0.139 6.971 6.973    0.350 5.226 5.237    0.171 4.363 4.367
19      0.127 7.059 7.060    0.328 5.320 5.330    0.151 4.432 4.435
20      0.114 7.120 7.121    0.302 5.390 5.398    0.132 4.487 4.489
21      0.105 7.117 7.117    0.272 5.399 5.406    0.117 4.486 4.488
22      0.100 7.114 7.114    0.240 5.411 5.417    0.107 4.491 4.493
23      0.104 7.096 7.096    0.206 5.407 5.411    0.107 4.483 4.485
24      0.116 7.049 7.050    0.172 5.370 5.373    0.117 4.448 4.450
25      0.135 6.961 6.962    0.146 5.288 5.290    0.137 4.375 4.377
26      0.159 6.871 6.873    0.135 5.201 5.203    0.161 4.308 4.311
27      0.185 6.753 6.756    0.147 5.088 5.090    0.188 4.228 4.232
28      0.211 6.604 6.607    0.178 4.948 4.951    0.214 4.134 4.139
29      0.237 6.461 6.466    0.218 4.831 4.836    0.240 4.067 4.074
30      0.264 6.308 6.313    0.260 4.726 4.733    0.264 4.010 4.019
31      0.289 6.104 6.111    0.304 4.597 4.607    0.287 3.919 3.929
32      0.309 5.866 5.875    0.350 4.450 4.464    0.309 3.798 3.811
33      0.329 5.612 5.622    0.395 4.296 4.314    0.330 3.650 3.665
34      0.351 5.327 5.339    0.435 4.118 4.140    0.349 3.445 3.463
35      0.367 5.001 5.014    0.463 3.905 3.932    0.365 3.153 3.174
36      0.378 4.556 4.571    0.480 3.582 3.613    0.370 2.674 2.699
37      0.386 4.067 4.085    0.482 3.242 3.277    0.353 2.084 2.114
38      0.373 3.457 3.477    0.456 2.845 2.881    0.293 1.336 1.368
39      0.323 2.516 2.537    0.408 2.221 2.258    0.173 0.381 0.418
40      0.215 1.272 1.290    0.306 1.337 1.371    0.107 1.396 1.400

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-94. SEE, Bias, and RMSD for ODL direct method condition 3

Oshima's Direct Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.500 0.500    0.000 0.500 0.500    0.120 3.464 3.466
 1      0.000 0.877 0.877    0.312 0.624 0.698    0.164 0.008 0.163
 2      0.090 1.374 1.377    0.484 0.978 1.091    0.186 0.179 0.257
 3      0.160 1.739 1.746    0.503 0.948 1.072    0.181 0.251 0.309
 4      0.108 1.941 1.944    0.477 0.888 1.007    0.172 0.376 0.413
 5      0.163 2.150 2.156    0.425 0.874 0.972    0.169 0.550 0.576
 6      0.166 2.373 2.379    0.367 0.805 0.884    0.172 0.648 0.671
 7      0.142 2.608 2.612    0.327 0.853 0.913    0.178 0.823 0.842
 8      0.168 2.805 2.810    0.314 0.938 0.989    0.183 0.950 0.967
 9      0.186 3.061 3.066    0.300 1.154 1.192    0.188 1.083 1.099
10      0.179 3.357 3.362    0.267 1.456 1.480    0.189 1.263 1.277
11      0.176 3.581 3.585    0.283 1.695 1.718    0.189 1.382 1.395
12      0.184 3.740 3.745    0.312 1.864 1.890    0.188 1.440 1.452
13      0.194 3.870 3.875    0.331 1.976 2.004    0.186 1.472 1.484
14      0.201 4.022 4.027    0.341 2.086 2.114    0.184 1.538 1.549
15      0.206 4.154 4.159    0.341 2.164 2.191    0.183 1.602 1.612
16      0.208 4.203 4.208    0.334 2.160 2.186    0.182 1.603 1.613
17      0.211 4.246 4.251    0.321 2.163 2.187    0.181 1.618 1.629
18      0.213 4.262 4.267    0.303 2.157 2.179    0.181 1.625 1.635
19      0.214 4.258 4.264    0.282 2.152 2.170    0.181 1.627 1.637
20      0.215 4.244 4.249    0.259 2.152 2.167    0.180 1.630 1.640
21      0.217 4.181 4.186    0.239 2.118 2.131    0.180 1.593 1.603
22      0.218 4.134 4.139    0.227 2.110 2.122    0.179 1.578 1.588
23      0.219 4.085 4.091    0.227 2.107 2.119    0.179 1.567 1.577
24      0.221 4.021 4.027    0.238 2.088 2.101    0.179 1.546 1.557
25      0.223 3.926 3.932    0.257 2.038 2.054    0.179 1.504 1.514
26      0.226 3.841 3.847    0.281 1.998 2.017    0.180 1.481 1.492
27      0.227 3.740 3.746    0.306 1.942 1.966    0.182 1.454 1.466
28      0.229 3.618 3.625    0.333 1.869 1.899    0.186 1.417 1.430
29      0.231 3.515 3.523    0.360 1.819 1.854    0.193 1.408 1.422
30      0.235 3.415 3.423    0.389 1.774 1.816    0.203 1.407 1.421
31      0.242 3.275 3.284    0.418 1.701 1.752    0.215 1.372 1.388
32      0.248 3.110 3.120    0.445 1.617 1.676    0.229 1.314 1.334
33      0.253 2.940 2.951    0.465 1.542 1.610    0.244 1.249 1.273
34      0.257 2.759 2.771    0.477 1.467 1.543    0.256 1.163 1.190
35      0.265 2.566 2.580    0.479 1.390 1.470    0.263 1.047 1.080
36      0.279 2.283 2.299    0.468 1.246 1.330    0.257 0.836 0.874
37      0.277 2.001 2.021    0.442 1.140 1.222    0.231 0.636 0.676
38      0.255 1.692 1.711    0.398 1.042 1.115    0.178 0.424 0.460
39      0.238 1.173 1.197    0.335 0.791 0.858    0.105 0.073 0.127
40      0.133 0.458 0.476    0.231 0.363 0.430    0.107 1.396 1.400

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-95. SEE, Bias, and RMSD for ODL TCF method condition 3

Test Characteristic Curve Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.500 0.500    0.000 0.500 0.500    0.120 3.464 3.466
 1      0.124 0.821 0.831    0.427 0.487 0.646    0.168 0.016 0.168
 2      0.185 1.296 1.309    0.609 0.768 0.979    0.201 0.132 0.240
 3      0.319 1.599 1.630    0.635 0.725 0.963    0.222 0.177 0.284
 4      0.356 1.769 1.804    0.620 0.664 0.907    0.262 0.269 0.375
 5      0.393 1.975 2.014    0.592 0.651 0.879    0.325 0.404 0.518
 6      0.450 2.166 2.212    0.569 0.576 0.808    0.402 0.460 0.610
 7      0.487 2.376 2.425    0.560 0.615 0.830    0.485 0.590 0.763
 8      0.542 2.550 2.607    0.579 0.688 0.899    0.566 0.674 0.879
 9      0.609 2.774 2.840    0.623 0.881 1.078    0.641 0.768 0.999
10      0.660 3.043 3.113    0.682 1.147 1.334    0.707 0.913 1.153
11      0.704 3.243 3.318    0.763 1.352 1.552    0.763 1.002 1.259
12      0.748 3.381 3.463    0.849 1.487 1.711    0.811 1.035 1.314
13      0.791 3.491 3.579    0.923 1.570 1.820    0.850 1.047 1.347
14      0.831 3.624 3.718    0.979 1.657 1.923    0.881 1.097 1.405
15      0.865 3.740 3.838    1.020 1.717 1.996    0.906 1.147 1.461
16      0.894 3.774 3.878    1.047 1.698 1.994    0.926 1.138 1.465
17      0.917 3.806 3.914    1.067 1.688 1.996    0.940 1.146 1.481
18      0.934 3.813 3.925    1.082 1.668 1.987    0.949 1.148 1.488
19      0.945 3.804 3.919    1.093 1.648 1.976    0.952 1.148 1.490
20      0.951 3.787 3.904    1.099 1.633 1.966    0.950 1.153 1.492
21      0.953 3.723 3.843    1.100 1.584 1.927    0.943 1.120 1.462
22      0.952 3.677 3.798    1.098 1.563 1.908    0.930 1.111 1.447
23      0.947 3.631 3.752    1.090 1.548 1.892    0.913 1.109 1.435
24      0.940 3.571 3.692    1.077 1.523 1.863    0.892 1.100 1.414
25      0.930 3.481 3.603    1.058 1.470 1.810    0.869 1.069 1.376
26      0.916 3.404 3.524    1.034 1.433 1.766    0.845 1.059 1.354
27      0.899 3.311 3.431    1.009 1.385 1.712    0.822 1.044 1.327
28      0.879 3.200 3.318    0.983 1.323 1.647    0.800 1.019 1.294
29      0.856 3.110 3.225    0.957 1.288 1.603    0.778 1.022 1.283
30      0.832 3.022 3.134    0.933 1.262 1.568    0.757 1.032 1.279
31      0.805 2.898 3.007    0.908 1.212 1.513    0.732 1.012 1.248
32      0.776 2.749 2.856    0.883 1.155 1.452    0.704 0.972 1.199
33      0.744 2.597 2.701    0.854 1.113 1.401    0.667 0.930 1.143
34      0.708 2.436 2.536    0.818 1.075 1.350    0.619 0.873 1.069
35      0.669 2.264 2.360    0.772 1.039 1.293    0.557 0.795 0.970
36      0.623 2.008 2.103    0.714 0.939 1.178    0.476 0.629 0.788
37      0.558 1.763 1.849    0.642 0.881 1.089    0.372 0.485 0.610
38      0.470 1.496 1.568    0.555 0.833 1.000    0.244 0.334 0.413
39      0.363 1.037 1.099    0.452 0.631 0.775    0.113 0.042 0.120
40      0.197 0.387 0.434    0.302 0.260 0.398    0.107 1.396 1.400

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-96. SEE, Bias, and RMSD for ODL ICF method condition 3

Item Characteristic Curve Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.500 0.500    0.000 0.500 0.500    0.120 3.464 3.466
 1      0.126 0.824 0.833    0.428 0.495 0.654    0.170 0.015 0.170
 2      0.184 1.299 1.312    0.611 0.776 0.987    0.203 0.134 0.243
 3      0.316 1.604 1.634    0.639 0.731 0.970    0.223 0.181 0.287
 4      0.349 1.777 1.811    0.624 0.671 0.915    0.260 0.274 0.377
 5      0.388 1.982 2.020    0.589 0.661 0.885    0.319 0.411 0.520
 6      0.444 2.173 2.218    0.564 0.586 0.812    0.394 0.468 0.611
 7      0.480 2.385 2.432    0.549 0.627 0.832    0.475 0.600 0.764
 8      0.536 2.560 2.615    0.566 0.700 0.900    0.556 0.685 0.881
 9      0.603 2.785 2.849    0.609 0.892 1.080    0.631 0.780 1.002
10      0.654 3.053 3.122    0.666 1.160 1.337    0.698 0.925 1.158
11      0.699 3.254 3.328    0.747 1.366 1.556    0.755 1.015 1.264
12      0.744 3.393 3.473    0.834 1.502 1.717    0.804 1.049 1.320
13      0.788 3.503 3.590    0.909 1.586 1.827    0.844 1.061 1.354
14      0.829 3.637 3.730    0.968 1.673 1.931    0.876 1.110 1.413
15      0.864 3.752 3.850    1.010 1.733 2.004    0.903 1.161 1.469
16      0.894 3.786 3.890    1.039 1.714 2.003    0.923 1.151 1.474
17      0.917 3.819 3.927    1.061 1.704 2.006    0.939 1.159 1.490
18      0.935 3.826 3.938    1.078 1.684 1.998    0.948 1.161 1.497
19      0.947 3.817 3.932    1.089 1.663 1.986    0.953 1.161 1.500
20      0.954 3.799 3.916    1.096 1.647 1.977    0.951 1.165 1.503
21      0.957 3.735 3.855    1.099 1.598 1.938    0.945 1.131 1.472
22      0.956 3.688 3.809    1.098 1.576 1.919    0.933 1.122 1.458
23      0.952 3.641 3.763    1.091 1.561 1.903    0.916 1.120 1.445
24      0.946 3.581 3.703    1.078 1.534 1.874    0.897 1.109 1.425
25      0.936 3.490 3.613    1.060 1.481 1.820    0.875 1.077 1.386
26      0.923 3.412 3.534    1.037 1.443 1.775    0.852 1.067 1.364
27      0.906 3.319 3.440    1.013 1.393 1.721    0.830 1.051 1.338
28      0.887 3.207 3.327    0.989 1.329 1.655    0.808 1.025 1.304
29      0.865 3.116 3.233    0.965 1.292 1.611    0.788 1.027 1.293
30      0.841 3.028 3.142    0.941 1.265 1.575    0.767 1.037 1.288
31      0.816 2.903 3.015    0.918 1.214 1.520    0.743 1.016 1.257
32      0.787 2.753 2.863    0.893 1.156 1.459    0.715 0.975 1.208
33      0.755 2.600 2.707    0.865 1.113 1.408    0.678 0.932 1.152
34      0.719 2.438 2.541    0.829 1.074 1.356    0.629 0.874 1.077
35      0.679 2.266 2.365    0.783 1.037 1.298    0.566 0.796 0.976
36      0.632 2.010 2.107    0.724 0.936 1.183    0.484 0.629 0.793
37      0.565 1.765 1.853    0.653 0.878 1.093    0.379 0.484 0.614
38      0.475 1.498 1.571    0.565 0.829 1.002    0.249 0.333 0.415
39      0.368 1.038 1.101    0.461 0.627 0.778    0.116 0.041 0.123
40      0.203 0.385 0.435    0.304 0.259 0.399    0.107 1.396 1.400

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-97. SEE, Bias, and RMSD for NOP method condition 3

Non-Orthogonal Procrustes Method

Raw     Full MIRT                Approx Observed          Approx True
Score      SEE   Bias   RMSD       SEE   Bias   RMSD       SEE   Bias   RMSD
 0       0.000  0.500  0.500     0.000  0.500  0.500     0.120  3.464  3.466
 1       0.000  0.877  0.877     0.000  0.877  0.877     0.144  0.231  0.272
 2       0.000  1.897  1.897     0.000  1.897  1.897     0.171  0.640  0.662
 3       0.000  2.776  2.776     0.141  2.731  2.734     0.172  0.964  0.979
 4       0.037  3.651  3.651     0.349  3.309  3.327     0.166  1.363  1.373
 5       0.154  4.174  4.177     0.493  3.529  3.563     0.161  1.846  1.853
 6       0.083  4.728  4.729     0.475  3.545  3.576     0.157  2.289  2.294
 7       0.167  5.276  5.278     0.433  3.651  3.677     0.155  2.846  2.850
 8       0.099  5.717  5.718     0.401  3.795  3.816     0.153  3.388  3.392
 9       0.138  6.339  6.341     0.405  4.141  4.161     0.153  3.963  3.966
10       0.146  6.879  6.880     0.361  4.604  4.618     0.153  4.603  4.605
11       0.133  7.507  7.508     0.365  5.095  5.108     0.154  5.189  5.191
12       0.161  7.982  7.984     0.319  5.589  5.598     0.156  5.710  5.712
13       0.143  8.506  8.507     0.330  6.093  6.102     0.160  6.191  6.193
14       0.165  8.989  8.991     0.272  6.659  6.665     0.164  6.682  6.684
15       0.168  9.509  9.511     0.301  7.239  7.246     0.171  7.143  7.145
16       0.168  9.890  9.891     0.243  7.740  7.744     0.179  7.509  7.511
17       0.201 10.291 10.293     0.281  8.246  8.251     0.190  7.856  7.858
18       0.204 10.636 10.638     0.253  8.695  8.699     0.204  8.159  8.161
19       0.221 10.929 10.931     0.271  9.091  9.095     0.221  8.424  8.427
20       0.252 11.196 11.199     0.282  9.448  9.452     0.241  8.657  8.661
21       0.279 11.384 11.388     0.272  9.706  9.710     0.265  8.816  8.820
22       0.304 11.552 11.556     0.273  9.935  9.939     0.292  8.962  8.967
23       0.333 11.683 11.688     0.280 10.119 10.123     0.322  9.075  9.081
24       0.365 11.763 11.769     0.289 10.246 10.250     0.354  9.140  9.147
25       0.398 11.779 11.785     0.300 10.302 10.306     0.388  9.141  9.150
26       0.431 11.771 11.778     0.315 10.330 10.335     0.424  9.122  9.132
27       0.464 11.713 11.723     0.337 10.304 10.310     0.459  9.060  9.071
28       0.496 11.603 11.613     0.367 10.221 10.227     0.492  8.954  8.967
29       0.526 11.478 11.490     0.403 10.123 10.131     0.522  8.850  8.866
30       0.555 11.320 11.334     0.441 10.003 10.013     0.548  8.736  8.753
31       0.581 11.090 11.105     0.478  9.831  9.843     0.569  8.574  8.593
32       0.605 10.799 10.815     0.512  9.630  9.643     0.586  8.377  8.397
33       0.625 10.462 10.480     0.540  9.420  9.436     0.600  8.149  8.171
34       0.643 10.064 10.085     0.562  9.186  9.204     0.612  7.854  7.878
35       0.655  9.591  9.613     0.582  8.902  8.921     0.625  7.439  7.465
36       0.661  8.948  8.972     0.603  8.455  8.477     0.639  6.750  6.780
37       0.655  8.177  8.203     0.618  7.891  7.915     0.646  5.737  5.773
38       0.627  7.133  7.160     0.615  7.090  7.117     0.611  4.078  4.124
39       0.553  5.494  5.522     0.576  5.769  5.797     0.388  1.428  1.480
40       0.393  3.136  3.160     0.453  3.720  3.747     0.107  1.396  1.400

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-98. SEE, Bias, and RMSD for Min's method condition 4

Min's Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.562 0.562    0.000 0.562 0.562    0.120 2.403 2.405
 1      0.192 0.257 0.321    0.601 0.313 0.676    0.193 0.208 0.284
 2      0.244 0.400 0.468    0.602 0.352 0.696    0.234 0.092 0.251
 3      0.294 0.600 0.668    0.576 0.426 0.715    0.262 0.072 0.271
 4      0.333 0.754 0.824    0.546 0.438 0.699    0.296 0.200 0.356
 5      0.360 0.923 0.991    0.507 0.473 0.693    0.330 0.346 0.478
 6      0.373 1.028 1.093    0.470 0.478 0.669    0.357 0.431 0.559
 7      0.374 1.138 1.198    0.450 0.556 0.715    0.373 0.532 0.649
 8      0.367 1.211 1.265    0.466 0.685 0.828    0.377 0.608 0.715
 9      0.353 1.226 1.275    0.492 0.808 0.945    0.371 0.641 0.740
10      0.334 1.184 1.230    0.507 0.872 1.008    0.356 0.631 0.724
11      0.311 1.118 1.160    0.506 0.897 1.029    0.336 0.609 0.695
12      0.286 1.058 1.096    0.491 0.914 1.037    0.311 0.606 0.681
13      0.261 0.976 1.010    0.464 0.899 1.011    0.285 0.590 0.655
14      0.237 0.864 0.895    0.427 0.847 0.948    0.258 0.552 0.609
15      0.216 0.749 0.779    0.384 0.787 0.876    0.234 0.518 0.568
16      0.199 0.614 0.646    0.338 0.704 0.781    0.215 0.470 0.516
17      0.188 0.467 0.503    0.294 0.603 0.670    0.202 0.413 0.459
18      0.183 0.324 0.372    0.254 0.501 0.561    0.197 0.362 0.412
19      0.185 0.170 0.251    0.224 0.380 0.441    0.201 0.298 0.359
20      0.194 0.017 0.194    0.208 0.254 0.328    0.213 0.232 0.315
21      0.209 0.128 0.245    0.209 0.125 0.243    0.231 0.172 0.287
22      0.229 0.267 0.351    0.226 0.009 0.226    0.253 0.115 0.278
23      0.251 0.413 0.483    0.257 0.164 0.304    0.279 0.048 0.283
24      0.275 0.579 0.641    0.296 0.353 0.460    0.307 0.044 0.310
25      0.302 0.724 0.784    0.340 0.531 0.630    0.338 0.120 0.357
26      0.331 0.846 0.908    0.386 0.693 0.793    0.370 0.176 0.409
27      0.362 0.950 1.016    0.436 0.846 0.951    0.403 0.219 0.458
28      0.394 1.045 1.117    0.488 1.001 1.113    0.436 0.261 0.507
29      0.426 1.135 1.212    0.541 1.161 1.280    0.468 0.303 0.557
30      0.457 1.213 1.296    0.593 1.311 1.439    0.497 0.344 0.604
31      0.486 1.295 1.383    0.643 1.455 1.590    0.520 0.399 0.655
32      0.513 1.365 1.458    0.686 1.570 1.713    0.536 0.453 0.701
33      0.535 1.430 1.526    0.718 1.658 1.806    0.543 0.514 0.746
34      0.552 1.487 1.586    0.737 1.716 1.867    0.538 0.580 0.790
35      0.561 1.526 1.625    0.741 1.734 1.885    0.519 0.643 0.826
36      0.557 1.421 1.526    0.725 1.589 1.746    0.482 0.578 0.752
37      0.533 1.321 1.424    0.684 1.438 1.591    0.421 0.532 0.677
38      0.477 1.197 1.288    0.617 1.274 1.415    0.323 0.456 0.558
39      0.371 0.757 0.842    0.515 0.843 0.987    0.178 0.013 0.178
40      0.191 0.281 0.340    0.341 0.392 0.519    0.107 1.326 1.330

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-99. SEE, Bias, and RMSD for ODL direct method condition 4

Oshima's Direct Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.562 0.562    0.000 0.562 0.562    0.120 2.403 2.405
 1      0.241 0.058 0.247    0.643 0.090 0.647    0.212 0.337 0.398
 2      0.281 0.154 0.320    0.630 0.065 0.632    0.261 0.295 0.394
 3      0.323 0.300 0.440    0.611 0.004 0.609    0.306 0.224 0.379
 4      0.360 0.402 0.539    0.587 0.004 0.586    0.361 0.206 0.415
 5      0.389 0.534 0.660    0.563 0.016 0.561    0.412 0.150 0.438
 6      0.409 0.623 0.745    0.544 0.006 0.543    0.448 0.118 0.462
 7      0.421 0.740 0.851    0.544 0.018 0.543    0.467 0.032 0.467
 8      0.426 0.837 0.939    0.578 0.056 0.579    0.475 0.062 0.478
 9      0.428 0.890 0.987    0.632 0.096 0.638    0.476 0.135 0.493
10      0.432 0.894 0.992    0.667 0.129 0.678    0.475 0.180 0.507
11      0.437 0.879 0.982    0.678 0.173 0.698    0.476 0.223 0.525
12      0.447 0.875 0.982    0.670 0.249 0.714    0.482 0.290 0.562
13      0.462 0.850 0.967    0.653 0.319 0.725    0.494 0.348 0.604
14      0.480 0.796 0.929    0.632 0.369 0.731    0.514 0.386 0.642
15      0.502 0.740 0.894    0.616 0.423 0.746    0.540 0.430 0.689
16      0.528 0.667 0.850    0.612 0.466 0.768    0.572 0.463 0.735
17      0.558 0.582 0.806    0.624 0.503 0.800    0.609 0.488 0.779
18      0.590 0.504 0.775    0.654 0.548 0.852    0.649 0.520 0.831
19      0.625 0.414 0.748    0.701 0.578 0.907    0.691 0.540 0.876
20      0.661 0.325 0.735    0.762 0.607 0.973    0.734 0.558 0.921
21      0.698 0.243 0.738    0.835 0.636 1.048    0.777 0.579 0.967
22      0.737 0.167 0.754    0.914 0.664 1.128    0.819 0.599 1.013
23      0.776 0.079 0.778    0.998 0.672 1.201    0.861 0.603 1.049
24      0.814 0.033 0.813    1.085 0.648 1.262    0.902 0.577 1.069
25      0.851 0.130 0.859    1.176 0.633 1.333    0.942 0.561 1.094
26      0.888 0.209 0.910    1.272 0.632 1.418    0.982 0.559 1.128
27      0.925 0.272 0.962    1.369 0.645 1.510    1.022 0.568 1.167
28      0.961 0.329 1.014    1.463 0.659 1.601    1.062 0.578 1.207
29      0.998 0.377 1.064    1.552 0.673 1.688    1.100 0.588 1.245
30      1.033 0.411 1.110    1.633 0.687 1.768    1.133 0.602 1.281
31      1.068 0.444 1.154    1.699 0.685 1.828    1.159 0.602 1.304
32      1.100 0.463 1.191    1.749 0.675 1.871    1.173 0.597 1.314
33      1.125 0.477 1.220    1.776 0.644 1.885    1.172 0.574 1.302
34      1.140 0.491 1.239    1.772 0.588 1.862    1.148 0.525 1.260
35      1.138 0.503 1.242    1.730 0.508 1.799    1.097 0.447 1.182
36      1.109 0.404 1.177    1.642 0.517 1.718    1.007 0.449 1.100
37      1.037 0.362 1.096    1.497 0.443 1.557    0.863 0.360 0.933
38      0.902 0.368 0.972    1.286 0.290 1.315    0.636 0.198 0.665
39      0.691 0.133 0.702    0.984 0.319 1.032    0.300 0.275 0.407
40      0.396 0.084 0.404    0.523 0.143 0.541    0.107 1.326 1.330

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-100. SEE, Bias, and RMSD for ODL TCF method condition 4

Test Characteristic Curve Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.562 0.562    0.000 0.562 0.562    0.120 2.403 2.405
 1      0.327 0.424 0.535    0.782 0.781 1.104    0.257 0.038 0.259
 2      0.414 0.640 0.762    0.786 0.823 1.137    0.331 0.138 0.358
 3      0.483 0.886 1.008    0.769 0.897 1.180    0.411 0.350 0.539
 4      0.537 1.054 1.182    0.753 0.907 1.177    0.503 0.517 0.720
 5      0.580 1.222 1.352    0.738 0.935 1.190    0.585 0.682 0.898
 6      0.604 1.313 1.444    0.739 0.942 1.196    0.641 0.765 0.997
 7      0.609 1.399 1.526    0.776 1.044 1.299    0.668 0.847 1.078
 8      0.597 1.440 1.558    0.850 1.206 1.474    0.671 0.893 1.116
 9      0.573 1.418 1.528    0.923 1.332 1.619    0.655 0.887 1.101
10      0.540 1.335 1.439    0.961 1.374 1.675    0.624 0.830 1.038
11      0.502 1.225 1.323    0.961 1.359 1.663    0.583 0.758 0.955
12      0.460 1.118 1.208    0.929 1.324 1.617    0.536 0.699 0.880
13      0.414 0.987 1.070    0.873 1.248 1.522    0.483 0.625 0.790
14      0.368 0.826 0.903    0.798 1.125 1.378    0.429 0.527 0.679
15      0.324 0.661 0.736    0.708 0.986 1.213    0.374 0.431 0.570
16      0.283 0.477 0.555    0.607 0.817 1.017    0.322 0.322 0.454
17      0.249 0.282 0.376    0.501 0.626 0.802    0.277 0.203 0.343
18      0.227 0.092 0.244    0.397 0.433 0.587    0.246 0.092 0.262
19      0.221 0.109 0.246    0.308 0.222 0.379    0.236 0.031 0.238
20      0.233 0.309 0.387    0.257 0.006 0.257    0.251 0.153 0.293
21      0.258 0.501 0.564    0.275 0.211 0.346    0.285 0.268 0.391
22      0.295 0.685 0.745    0.352 0.431 0.555    0.333 0.378 0.503
23      0.339 0.876 0.939    0.461 0.669 0.812    0.387 0.499 0.631
24      0.388 1.090 1.157    0.585 0.940 1.106    0.445 0.644 0.782
25      0.437 1.285 1.357    0.715 1.201 1.397    0.504 0.773 0.922
26      0.488 1.456 1.535    0.846 1.444 1.673    0.564 0.883 1.047
27      0.540 1.609 1.696    0.977 1.669 1.933    0.625 0.980 1.162
28      0.595 1.752 1.850    1.107 1.885 2.185    0.688 1.074 1.275
29      0.651 1.890 1.998    1.233 2.094 2.428    0.751 1.168 1.388
30      0.708 2.013 2.134    1.353 2.288 2.656    0.815 1.254 1.494
31      0.765 2.135 2.268    1.466 2.478 2.877    0.877 1.346 1.605
32      0.823 2.240 2.385    1.566 2.641 3.069    0.933 1.427 1.704
33      0.878 2.332 2.491    1.646 2.775 3.225    0.979 1.502 1.792
34      0.929 2.409 2.581    1.698 2.868 3.331    1.010 1.566 1.862
35      0.970 2.456 2.640    1.711 2.902 3.367    1.016 1.604 1.897
36      0.991 2.344 2.544    1.674 2.749 3.217    0.987 1.484 1.781
37      0.977 2.213 2.418    1.570 2.554 2.996    0.900 1.338 1.611
38      0.899 2.012 2.203    1.385 2.292 2.676    0.715 1.088 1.301
39      0.721 1.410 1.582    1.107 1.671 2.003    0.369 0.340 0.502
40      0.446 0.656 0.792    0.707 0.969 1.198    0.107 1.326 1.330

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-101. SEE, Bias, and RMSD for ODL ICF method condition 4

Item Characteristic Curve Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.562 0.562    0.000 0.562 0.562    0.120 2.403 2.405
 1      0.210 0.104 0.234    0.553 0.358 0.658    0.171 0.383 0.420
 2      0.238 0.043 0.242    0.549 0.357 0.654    0.203 0.381 0.431
 3      0.264 0.063 0.271    0.520 0.306 0.602    0.230 0.360 0.427
 4      0.287 0.121 0.311    0.488 0.316 0.580    0.266 0.404 0.483
 5      0.306 0.208 0.369    0.453 0.312 0.549    0.303 0.414 0.513
 6      0.317 0.254 0.406    0.417 0.356 0.548    0.332 0.446 0.556
 7      0.319 0.335 0.462    0.400 0.357 0.535    0.347 0.416 0.542
 8      0.313 0.403 0.509    0.420 0.360 0.553    0.350 0.370 0.509
 9      0.301 0.433 0.527    0.459 0.394 0.604    0.343 0.334 0.479
10      0.285 0.421 0.508    0.483 0.432 0.647    0.329 0.318 0.457
11      0.267 0.395 0.477    0.489 0.445 0.661    0.309 0.296 0.427
12      0.247 0.383 0.456    0.477 0.411 0.629    0.286 0.245 0.376
13      0.227 0.354 0.420    0.452 0.368 0.582    0.262 0.197 0.327
14      0.207 0.298 0.362    0.417 0.332 0.532    0.238 0.165 0.289
15      0.189 0.243 0.307    0.375 0.283 0.470    0.215 0.124 0.247
16      0.173 0.172 0.243    0.331 0.241 0.409    0.194 0.091 0.214
17      0.161 0.093 0.185    0.285 0.199 0.347    0.178 0.062 0.188
18      0.154 0.021 0.155    0.242 0.143 0.281    0.168 0.022 0.169
19      0.153 0.062 0.165    0.207 0.094 0.227    0.166 0.008 0.166
20      0.159 0.142 0.213    0.188 0.043 0.193    0.172 0.039 0.176
21      0.170 0.215 0.274    0.193 0.014 0.193    0.185 0.075 0.199
22      0.186 0.282 0.338    0.221 0.074 0.233    0.203 0.114 0.233
23      0.203 0.360 0.413    0.265 0.120 0.290    0.225 0.137 0.263
24      0.222 0.461 0.512    0.319 0.134 0.345    0.249 0.130 0.280
25      0.243 0.547 0.598    0.378 0.157 0.408    0.273 0.133 0.304
26      0.264 0.613 0.667    0.441 0.193 0.481    0.299 0.150 0.334
27      0.286 0.663 0.722    0.508 0.242 0.562    0.325 0.177 0.370
28      0.310 0.706 0.770    0.578 0.297 0.649    0.353 0.206 0.408
29      0.334 0.740 0.811    0.650 0.354 0.738    0.382 0.237 0.449
30      0.360 0.759 0.839    0.718 0.412 0.826    0.413 0.275 0.495
31      0.387 0.777 0.867    0.777 0.452 0.898    0.442 0.304 0.536
32      0.416 0.778 0.881    0.828 0.485 0.957    0.470 0.334 0.576
33      0.444 0.771 0.889    0.864 0.496 0.994    0.492 0.351 0.604
34      0.468 0.760 0.892    0.880 0.479 1.000    0.506 0.347 0.612
35      0.486 0.744 0.887    0.872 0.432 0.971    0.505 0.318 0.596
36      0.492 0.609 0.782    0.837 0.464 0.955    0.484 0.370 0.608
37      0.479 0.525 0.710    0.765 0.404 0.863    0.431 0.330 0.542
38      0.432 0.488 0.651    0.656 0.257 0.703    0.329 0.205 0.387
39      0.341 0.220 0.405    0.510 0.322 0.602    0.163 0.290 0.332
40      0.221 0.057 0.228    0.292 0.268 0.396    0.107 1.326 1.330

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-102. SEE, Bias, and RMSD for NOP method condition 4

Non-Orthogonal Procrustes Method

Raw     Full MIRT            Approx Observed      Approx True
Score   SEE   Bias  RMSD     SEE   Bias  RMSD     SEE   Bias  RMSD
 0      0.000 0.562 0.562    0.000 0.562 0.562    0.120 2.403 2.405
 1      0.185 0.008 0.184    0.518 0.219 0.562    0.165 0.386 0.419
 2      0.189 0.087 0.208    0.511 0.198 0.547    0.188 0.364 0.409
 3      0.198 0.220 0.296    0.484 0.138 0.502    0.191 0.313 0.366
 4      0.214 0.313 0.379    0.457 0.140 0.477    0.195 0.314 0.369
 5      0.234 0.443 0.501    0.423 0.120 0.439    0.209 0.271 0.342
 6      0.257 0.536 0.594    0.381 0.144 0.406    0.231 0.242 0.334
 7      0.282 0.662 0.720    0.334 0.128 0.357    0.258 0.150 0.298
 8      0.307 0.773 0.831    0.306 0.105 0.323    0.288 0.043 0.291
 9      0.333 0.842 0.905    0.326 0.074 0.334    0.321 0.049 0.324
10      0.359 0.865 0.936    0.361 0.040 0.362    0.355 0.116 0.373
11      0.387 0.870 0.952    0.396 0.018 0.395    0.391 0.183 0.431
12      0.415 0.886 0.978    0.429 0.118 0.444    0.427 0.276 0.507
13      0.443 0.882 0.987    0.462 0.218 0.509    0.464 0.360 0.587
14      0.471 0.849 0.971    0.495 0.301 0.578    0.501 0.426 0.657
15      0.499 0.815 0.955    0.531 0.392 0.659    0.538 0.498 0.732
16      0.525 0.763 0.926    0.571 0.476 0.742    0.574 0.558 0.799
17      0.551 0.700 0.890    0.613 0.556 0.826    0.608 0.611 0.861
18      0.577 0.643 0.863    0.656 0.645 0.918    0.640 0.670 0.926
19      0.601 0.575 0.830    0.698 0.721 1.002    0.668 0.717 0.979
20      0.624 0.507 0.803    0.738 0.795 1.084    0.694 0.761 1.028
21      0.647 0.446 0.784    0.775 0.871 1.164    0.716 0.806 1.077
22      0.668 0.389 0.772    0.808 0.944 1.241    0.735 0.850 1.122
23      0.689 0.321 0.758    0.837 0.998 1.301    0.752 0.876 1.153
24      0.708 0.227 0.742    0.863 1.017 1.333    0.767 0.871 1.160
25      0.727 0.147 0.740    0.889 1.045 1.370    0.783 0.875 1.173
26      0.745 0.086 0.748    0.915 1.086 1.419    0.798 0.892 1.195
27      0.762 0.039 0.761    0.941 1.139 1.476    0.812 0.919 1.225
28      0.778 0.002 0.776    0.963 1.193 1.532    0.825 0.947 1.254
29      0.793 0.034 0.792    0.979 1.244 1.582    0.832 0.975 1.280
30      0.806 0.052 0.806    0.988 1.293 1.625    0.831 1.005 1.303
31      0.815 0.070 0.816    0.986 1.320 1.646    0.820 1.019 1.307
32      0.818 0.073 0.820    0.972 1.333 1.648    0.794 1.025 1.296
33      0.814 0.074 0.815    0.943 1.317 1.618    0.752 1.007 1.256
34      0.799 0.078 0.801    0.901 1.263 1.550    0.692 0.955 1.179
35      0.770 0.088 0.773    0.847 1.169 1.442    0.614 0.862 1.057
36      0.722 0.003 0.720    0.784 1.146 1.387    0.516 0.833 0.979
37      0.645 0.020 0.643    0.708 1.018 1.239    0.400 0.691 0.798
38      0.533 0.033 0.532    0.607 0.780 0.988    0.268 0.442 0.516
39      0.393 0.120 0.410    0.446 0.682 0.814    0.134 0.387 0.410
40      0.208 0.220 0.302    0.147 0.327 0.358    0.107 1.326 1.330

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-103. SEE, Bias, and RMSD for Min's method, condition 5
Min's Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.545  0.545  0.000  0.545  0.545  0.120  2.419  2.422
1      0.159  0.839  0.854  0.362  1.036  1.097  0.171  0.354  0.393
2      0.229  1.118  1.141  0.556  1.257  1.374  0.205  0.415  0.463
3      0.194  1.226  1.241  0.581  1.143  1.282  0.215  0.418  0.470
4      0.204  1.412  1.427  0.561  1.101  1.235  0.220  0.578  0.618
5      0.245  1.693  1.711  0.526  1.073  1.195  0.229  0.809  0.841
6      0.227  1.992  2.005  0.477  1.054  1.156  0.241  1.077  1.104
7      0.233  2.286  2.298  0.423  1.061  1.142  0.253  1.363  1.386
8      0.245  2.621  2.632  0.376  1.133  1.194  0.263  1.649  1.670
9      0.228  2.971  2.980  0.376  1.341  1.392  0.269  1.953  1.972
10     0.234  3.332  3.340  0.380  1.645  1.688  0.270  2.258  2.274
11     0.228  3.698  3.705  0.393  1.949  1.988  0.268  2.537  2.551
12     0.210  4.012  4.018  0.410  2.222  2.259  0.260  2.766  2.778
13     0.209  4.328  4.333  0.418  2.492  2.527  0.249  2.982  2.992
14     0.194  4.690  4.694  0.415  2.794  2.825  0.234  3.226  3.235
15     0.174  4.996  4.999  0.404  3.049  3.076  0.216  3.423  3.430
16     0.165  5.263  5.266  0.389  3.260  3.283  0.197  3.577  3.582
17     0.149  5.566  5.568  0.369  3.493  3.512  0.176  3.757  3.761
18     0.129  5.840  5.842  0.343  3.700  3.716  0.155  3.919  3.922
19     0.114  6.072  6.073  0.312  3.871  3.883  0.135  4.047  4.049
20     0.105  6.309  6.310  0.280  4.047  4.057  0.118  4.175  4.177
21     0.095  6.525  6.526  0.247  4.219  4.227  0.105  4.285  4.286
22     0.092  6.721  6.722  0.211  4.395  4.400  0.101  4.382  4.383
23     0.098  6.900  6.901  0.177  4.573  4.577  0.106  4.465  4.466
24     0.112  7.067  7.068  0.148  4.757  4.759  0.122  4.538  4.540
25     0.134  7.182  7.183  0.129  4.903  4.905  0.143  4.562  4.564
26     0.158  7.293  7.295  0.127  5.066  5.067  0.169  4.591  4.594
27     0.184  7.367  7.370  0.141  5.212  5.214  0.195  4.596  4.600
28     0.211  7.382  7.385  0.169  5.319  5.321  0.221  4.560  4.566
29     0.238  7.392  7.395  0.203  5.435  5.439  0.244  4.543  4.550
30     0.265  7.388  7.393  0.239  5.548  5.553  0.266  4.538  4.546
31     0.291  7.362  7.367  0.276  5.643  5.650  0.285  4.531  4.540
32     0.316  7.282  7.289  0.312  5.693  5.701  0.304  4.489  4.499
33     0.340  7.168  7.176  0.348  5.719  5.729  0.323  4.423  4.435
34     0.364  6.980  6.990  0.386  5.684  5.697  0.343  4.290  4.304
35     0.389  6.805  6.816  0.426  5.673  5.689  0.363  4.171  4.187
36     0.411  6.385  6.398  0.465  5.438  5.458  0.383  3.806  3.825
37     0.434  5.788  5.805  0.493  5.075  5.098  0.395  3.253  3.277
38     0.437  4.954  4.974  0.502  4.574  4.602  0.374  2.416  2.444
39     0.383  3.314  3.336  0.466  3.394  3.426  0.248  0.643  0.689
40     0.230  1.469  1.487  0.348  1.962  1.993  0.107  1.158  1.163

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-104. SEE, Bias, and RMSD for ODL direct method, condition 5
Oshima's Direct Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.545  0.545  0.000  0.545  0.545  0.120  2.419  2.422
1      0.212  0.017  0.213  0.546  0.636  0.838  0.209  0.031  0.210
2      0.253  0.007  0.252  0.549  0.740  0.920  0.257  0.213  0.333
3      0.298  0.083  0.309  0.554  0.919  1.073  0.291  0.474  0.556
4      0.343  0.048  0.346  0.565  0.981  1.131  0.327  0.624  0.704
5      0.357  0.030  0.358  0.580  1.014  1.168  0.364  0.716  0.803
6      0.379  0.141  0.404  0.608  1.047  1.210  0.400  0.756  0.855
7      0.389  0.285  0.482  0.628  1.106  1.271  0.431  0.750  0.864
8      0.396  0.467  0.611  0.653  1.204  1.369  0.455  0.707  0.841
9      0.399  0.702  0.807  0.696  1.266  1.444  0.471  0.612  0.771
10     0.394  0.956  1.033  0.715  1.264  1.451  0.480  0.481  0.679
11     0.389  1.210  1.271  0.720  1.211  1.408  0.482  0.342  0.590
12     0.385  1.451  1.501  0.713  1.135  1.340  0.480  0.222  0.528
13     0.376  1.705  1.746  0.697  1.010  1.227  0.472  0.087  0.479
14     0.364  2.000  2.033  0.674  0.804  1.048  0.458  0.100  0.468
15     0.353  2.264  2.291  0.644  0.599  0.878  0.440  0.261  0.510
16     0.339  2.503  2.526  0.606  0.396  0.722  0.417  0.396  0.574
17     0.322  2.775  2.794  0.560  0.127  0.572  0.390  0.574  0.694
18     0.305  3.033  3.048  0.510  0.158  0.532  0.361  0.748  0.830
19     0.290  3.266  3.279  0.458  0.440  0.634  0.331  0.901  0.959
20     0.274  3.506  3.517  0.403  0.748  0.849  0.299  1.068  1.109
21     0.259  3.731  3.740  0.348  1.061  1.117  0.267  1.230  1.258
22     0.247  3.948  3.956  0.298  1.389  1.421  0.237  1.393  1.413
23     0.240  4.159  4.166  0.254  1.729  1.747  0.209  1.557  1.571
24     0.239  4.367  4.373  0.228  2.085  2.098  0.189  1.727  1.737
25     0.246  4.528  4.535  0.228  2.415  2.425  0.182  1.861  1.869
26     0.262  4.694  4.702  0.260  2.765  2.777  0.191  2.012  2.021
27     0.287  4.832  4.840  0.315  3.098  3.114  0.214  2.149  2.159
28     0.320  4.917  4.927  0.384  3.383  3.405  0.245  2.249  2.262
29     0.358  5.003  5.016  0.462  3.668  3.696  0.282  2.367  2.383
30     0.402  5.081  5.097  0.547  3.939  3.977  0.328  2.490  2.512
31     0.453  5.138  5.158  0.635  4.186  4.234  0.382  2.607  2.635
32     0.513  5.144  5.170  0.725  4.375  4.435  0.445  2.683  2.720
33     0.582  5.118  5.151  0.806  4.523  4.594  0.508  2.738  2.784
34     0.654  5.023  5.065  0.869  4.585  4.667  0.549  2.734  2.788
35     0.713  4.952  5.003  0.877  4.653  4.734  0.552  2.765  2.820
36     0.743  4.659  4.718  0.829  4.481  4.557  0.523  2.589  2.641
37     0.733  4.230  4.293  0.770  4.167  4.237  0.468  2.291  2.338
38     0.665  3.649  3.709  0.687  3.724  3.786  0.379  1.826  1.865
39     0.507  2.393  2.445  0.561  2.680  2.738  0.226  0.552  0.597
40     0.270  1.048  1.082  0.371  1.498  1.543  0.107  1.158  1.163

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-105. SEE, Bias, and RMSD for ODL TCF method, condition 5
Test Characteristic Curve Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.545  0.545  0.000  0.545  0.545  0.120  2.419  2.422
1      0.185  0.046  0.191  0.547  0.650  0.849  0.188  0.031  0.190
2      0.195  0.069  0.207  0.527  0.751  0.917  0.209  0.218  0.302
3      0.199  0.171  0.262  0.499  0.929  1.054  0.212  0.485  0.529
4      0.197  0.144  0.243  0.462  0.989  1.091  0.212  0.643  0.677
5      0.192  0.069  0.204  0.414  1.017  1.098  0.218  0.745  0.776
6      0.191  0.040  0.194  0.364  1.041  1.103  0.226  0.794  0.825
7      0.196  0.181  0.266  0.329  1.082  1.130  0.232  0.796  0.829
8      0.208  0.359  0.414  0.327  1.154  1.200  0.236  0.761  0.796
9      0.210  0.595  0.631  0.336  1.191  1.238  0.237  0.672  0.712
10     0.205  0.855  0.879  0.342  1.167  1.216  0.237  0.547  0.595
11     0.206  1.112  1.131  0.343  1.099  1.151  0.236  0.413  0.475
12     0.215  1.355  1.372  0.339  1.015  1.070  0.236  0.297  0.379
13     0.217  1.614  1.628  0.334  0.891  0.951  0.238  0.166  0.289
14     0.218  1.915  1.927  0.325  0.693  0.765  0.240  0.018  0.240
15     0.224  2.183  2.195  0.312  0.503  0.591  0.244  0.176  0.300
16     0.233  2.428  2.439  0.295  0.322  0.436  0.249  0.309  0.397
17     0.238  2.707  2.718  0.280  0.080  0.291  0.255  0.486  0.549
18     0.245  2.972  2.982  0.273  0.176  0.324  0.262  0.659  0.709
19     0.255  3.211  3.221  0.272  0.426  0.505  0.271  0.811  0.855
20     0.265  3.459  3.469  0.276  0.701  0.753  0.279  0.978  1.016
21     0.274  3.691  3.701  0.288  0.980  1.021  0.288  1.140  1.175
22     0.285  3.916  3.927  0.308  1.273  1.310  0.298  1.303  1.336
23     0.298  4.136  4.147  0.333  1.579  1.614  0.307  1.468  1.500
24     0.310  4.353  4.364  0.362  1.904  1.938  0.317  1.640  1.670
25     0.323  4.524  4.535  0.395  2.205  2.240  0.327  1.775  1.805
26     0.336  4.701  4.713  0.431  2.529  2.565  0.338  1.929  1.958
27     0.351  4.849  4.861  0.471  2.835  2.874  0.349  2.068  2.097
28     0.366  4.945  4.959  0.512  3.097  3.138  0.362  2.170  2.200
29     0.382  5.043  5.058  0.554  3.360  3.405  0.378  2.289  2.320
30     0.398  5.133  5.149  0.595  3.615  3.663  0.396  2.416  2.448
31     0.417  5.204  5.221  0.634  3.854  3.906  0.415  2.535  2.569
32     0.438  5.225  5.243  0.668  4.045  4.099  0.436  2.615  2.651
33     0.461  5.216  5.236  0.700  4.197  4.255  0.455  2.674  2.712
34     0.486  5.140  5.163  0.726  4.268  4.329  0.471  2.673  2.714
35     0.508  5.087  5.112  0.743  4.345  4.408  0.479  2.707  2.749
36     0.526  4.811  4.839  0.745  4.188  4.253  0.474  2.532  2.576
37     0.529  4.389  4.421  0.726  3.901  3.968  0.446  2.238  2.282
38     0.501  3.805  3.838  0.673  3.501  3.564  0.373  1.780  1.819
39     0.409  2.518  2.551  0.570  2.516  2.579  0.216  0.525  0.567
40     0.238  1.113  1.138  0.392  1.416  1.469  0.107  1.158  1.163

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-106. SEE, Bias, and RMSD for ODL ICF method, condition 5
Item Characteristic Curve Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.545  0.545  0.000  0.545  0.545  0.120  2.419  2.422
1      0.125  0.476  0.492  0.508  0.383  0.635  0.160  0.345  0.380
2      0.129  0.474  0.491  0.525  0.329  0.619  0.188  0.310  0.362
3      0.134  0.380  0.403  0.493  0.164  0.519  0.192  0.186  0.267
4      0.143  0.403  0.427  0.453  0.117  0.467  0.194  0.171  0.258
5      0.159  0.473  0.499  0.406  0.097  0.417  0.199  0.166  0.259
6      0.164  0.572  0.595  0.355  0.087  0.365  0.204  0.149  0.252
7      0.158  0.677  0.695  0.306  0.090  0.318  0.204  0.115  0.234
8      0.148  0.779  0.792  0.292  0.100  0.308  0.199  0.066  0.210
9      0.138  0.904  0.914  0.305  0.150  0.340  0.190  0.031  0.192
10     0.131  1.043  1.051  0.318  0.194  0.372  0.179  0.003  0.178
11     0.129  1.176  1.183  0.319  0.200  0.376  0.167  0.035  0.170
12     0.131  1.281  1.288  0.308  0.151  0.343  0.158  0.104  0.189
13     0.133  1.395  1.401  0.287  0.091  0.301  0.152  0.166  0.225
14     0.134  1.550  1.556  0.261  0.068  0.269  0.150  0.181  0.235
15     0.139  1.666  1.671  0.233  0.010  0.233  0.153  0.227  0.273
16     0.147  1.745  1.751  0.208  0.079  0.222  0.160  0.299  0.339
17     0.159  1.855  1.862  0.188  0.132  0.230  0.171  0.330  0.372
18     0.173  1.951  1.959  0.178  0.193  0.262  0.185  0.365  0.409
19     0.190  2.016  2.025  0.181  0.273  0.328  0.202  0.417  0.463
20     0.208  2.087  2.097  0.198  0.336  0.390  0.220  0.452  0.502
21     0.227  2.143  2.155  0.225  0.398  0.457  0.238  0.484  0.539
22     0.246  2.194  2.207  0.260  0.451  0.520  0.258  0.507  0.568
23     0.266  2.239  2.254  0.300  0.492  0.576  0.276  0.518  0.587
24     0.286  2.283  2.301  0.343  0.514  0.618  0.295  0.512  0.590
25     0.307  2.285  2.306  0.388  0.560  0.681  0.312  0.531  0.616
26     0.326  2.298  2.320  0.436  0.582  0.727  0.329  0.525  0.619
27     0.346  2.284  2.310  0.488  0.618  0.787  0.345  0.529  0.631
28     0.365  2.222  2.251  0.542  0.693  0.879  0.362  0.569  0.674
29     0.384  2.162  2.196  0.596  0.750  0.957  0.379  0.594  0.704
30     0.403  2.094  2.132  0.647  0.793  1.022  0.396  0.612  0.729
31     0.422  2.004  2.048  0.692  0.826  1.076  0.414  0.632  0.755
32     0.441  1.863  1.914  0.727  0.870  1.132  0.429  0.676  0.800
33     0.460  1.695  1.756  0.750  0.892  1.164  0.441  0.712  0.837
34     0.477  1.477  1.552  0.755  0.912  1.183  0.446  0.756  0.877
35     0.488  1.326  1.412  0.742  0.812  1.099  0.440  0.688  0.816
36     0.493  1.031  1.142  0.707  0.799  1.066  0.417  0.709  0.821
37     0.475  0.747  0.884  0.644  0.721  0.966  0.368  0.659  0.754
38     0.416  0.569  0.704  0.552  0.501  0.744  0.281  0.457  0.536
39     0.304  0.120  0.326  0.432  0.580  0.723  0.144  0.548  0.566
40     0.183  0.147  0.234  0.248  0.463  0.525  0.107  1.158  1.163

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-107. SEE, Bias, and RMSD for NOP method, condition 5
Non-Orthogonal Procrustes Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.545  0.545  0.000  0.545  0.545  0.120  2.419  2.422
1      0.000  1.322  1.322  0.000  1.322  1.322  0.133  0.661  0.674
2      0.070  2.232  2.234  0.036  2.254  2.254  0.161  0.962  0.975
3      0.097  2.541  2.543  0.206  2.978  2.985  0.165  1.212  1.223
4      0.145  3.100  3.103  0.413  3.430  3.454  0.162  1.630  1.638
5      0.149  3.460  3.463  0.493  3.525  3.559  0.159  2.122  2.128
6      0.140  4.015  4.017  0.461  3.577  3.606  0.157  2.654  2.659
7      0.173  4.440  4.443  0.440  3.653  3.679  0.156  3.207  3.211
8      0.136  4.972  4.974  0.396  3.827  3.847  0.156  3.763  3.766
9      0.177  5.463  5.466  0.355  4.154  4.169  0.157  4.339  4.342
10     0.137  6.004  6.005  0.381  4.575  4.591  0.159  4.915  4.918
11     0.174  6.512  6.515  0.316  5.068  5.077  0.162  5.462  5.465
12     0.146  6.987  6.988  0.346  5.542  5.553  0.166  5.951  5.954
13     0.176  7.450  7.452  0.282  6.059  6.065  0.170  6.413  6.416
14     0.159  7.947  7.949  0.307  6.636  6.643  0.177  6.885  6.887
15     0.183  8.380  8.382  0.269  7.156  7.161  0.185  7.285  7.287
16     0.181  8.778  8.780  0.279  7.658  7.663  0.195  7.614  7.617
17     0.194  9.167  9.169  0.272  8.134  8.139  0.208  7.942  7.945
18     0.213  9.543  9.545  0.287  8.576  8.581  0.224  8.223  8.226
19     0.222  9.857  9.859  0.277  8.931  8.935  0.243  8.442  8.446
20     0.245  10.152  10.155  0.292  9.248  9.252  0.266  8.636  8.640
21     0.270  10.421  10.425  0.294  9.512  9.516  0.291  8.787  8.792
22     0.292  10.659  10.663  0.291  9.724  9.728  0.320  8.903  8.909
23     0.318  10.868  10.873  0.294  9.886  9.890  0.351  8.985  8.992
24     0.348  11.058  11.063  0.303  10.010  10.015  0.385  9.038  9.046
25     0.380  11.184  11.190  0.317  10.058  10.063  0.420  9.021  9.030
26     0.412  11.297  11.304  0.335  10.082  10.088  0.455  8.986  8.998
27     0.445  11.361  11.370  0.361  10.047  10.054  0.490  8.902  8.916
28     0.477  11.353  11.363  0.394  9.928  9.936  0.521  8.750  8.766
29     0.508  11.322  11.333  0.432  9.779  9.789  0.549  8.587  8.605
30     0.538  11.257  11.269  0.471  9.597  9.609  0.571  8.407  8.426
31     0.567  11.142  11.157  0.508  9.381  9.394  0.588  8.201  8.222
32     0.593  10.946  10.962  0.539  9.110  9.126  0.600  7.937  7.960
33     0.616  10.685  10.702  0.563  8.818  8.836  0.608  7.630  7.655
34     0.637  10.321  10.341  0.582  8.475  8.495  0.615  7.232  7.258
35     0.655  9.944  9.966  0.600  8.172  8.193  0.620  6.808  6.836
36     0.669  9.301  9.325  0.621  7.651  7.676  0.624  6.069  6.101
37     0.677  8.446  8.473  0.634  6.998  7.027  0.614  5.018  5.055
38     0.662  7.277  7.307  0.626  6.209  6.240  0.551  3.479  3.522
39     0.581  5.105  5.138  0.576  4.707  4.742  0.314  0.893  0.946
40     0.344  2.390  2.415  0.411  2.740  2.770  0.107  1.158  1.163

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-108. SEE, Bias, and RMSD for Min's method, condition 6
Min's Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.519  0.519  0.000  0.519  0.519  0.120  2.445  2.448
1      0.104  1.111  1.116  0.255  0.991  1.023  0.175  0.244  0.300
2      0.122  1.490  1.495  0.504  1.379  1.468  0.201  0.370  0.421
3      0.243  1.687  1.704  0.582  1.308  1.431  0.211  0.411  0.461
4      0.209  1.928  1.940  0.569  1.175  1.305  0.220  0.513  0.557
5      0.224  2.217  2.228  0.536  1.152  1.270  0.233  0.779  0.812
6      0.254  2.607  2.619  0.496  1.167  1.267  0.248  1.104  1.132
7      0.227  2.994  3.002  0.449  1.241  1.320  0.263  1.464  1.487
8      0.258  3.326  3.336  0.426  1.325  1.392  0.275  1.759  1.781
9      0.242  3.655  3.663  0.420  1.497  1.555  0.284  2.028  2.048
10     0.242  4.011  4.018  0.406  1.791  1.836  0.288  2.334  2.352
11     0.249  4.390  4.397  0.426  2.129  2.171  0.287  2.635  2.650
12     0.228  4.752  4.758  0.441  2.463  2.502  0.282  2.914  2.928
13     0.226  5.076  5.081  0.449  2.767  2.803  0.271  3.157  3.168
14     0.220  5.412  5.416  0.450  3.068  3.101  0.257  3.395  3.404
15     0.197  5.739  5.742  0.444  3.361  3.390  0.239  3.628  3.636
16     0.181  6.013  6.016  0.426  3.608  3.633  0.219  3.819  3.825
17     0.172  6.292  6.294  0.402  3.850  3.871  0.197  4.011  4.016
18     0.151  6.545  6.547  0.377  4.061  4.078  0.175  4.180  4.184
19     0.129  6.755  6.756  0.348  4.238  4.252  0.152  4.320  4.323
20     0.111  6.953  6.954  0.314  4.419  4.430  0.131  4.459  4.461
21     0.100  7.110  7.110  0.275  4.576  4.584  0.113  4.562  4.564
22     0.094  7.254  7.255  0.233  4.739  4.744  0.102  4.655  4.656
23     0.095  7.380  7.380  0.194  4.907  4.910  0.100  4.735  4.736
24     0.107  7.488  7.489  0.159  5.083  5.085  0.111  4.805  4.807
25     0.128  7.549  7.550  0.133  5.237  5.239  0.130  4.837  4.839
26     0.154  7.603  7.604  0.121  5.407  5.408  0.156  4.871  4.873
27     0.183  7.622  7.624  0.131  5.563  5.564  0.183  4.885  4.888
28     0.215  7.592  7.595  0.159  5.690  5.692  0.211  4.869  4.873
29     0.246  7.537  7.541  0.198  5.806  5.809  0.238  4.849  4.855
30     0.277  7.472  7.477  0.241  5.918  5.923  0.263  4.839  4.846
31     0.307  7.404  7.411  0.284  6.028  6.034  0.287  4.843  4.851
32     0.335  7.301  7.309  0.323  6.098  6.106  0.310  4.821  4.831
33     0.361  7.179  7.188  0.361  6.149  6.159  0.333  4.783  4.795
34     0.386  7.008  7.019  0.400  6.154  6.167  0.357  4.692  4.705
35     0.410  6.844  6.857  0.443  6.172  6.188  0.380  4.596  4.612
36     0.431  6.440  6.455  0.485  5.959  5.978  0.403  4.237  4.256
37     0.448  5.885  5.902  0.521  5.622  5.646  0.418  3.678  3.701
38     0.445  5.098  5.117  0.535  5.119  5.147  0.401  2.758  2.786
39     0.394  3.623  3.644  0.511  4.003  4.035  0.270  0.884  0.924
40     0.264  1.845  1.864  0.397  2.504  2.535  0.107  1.314  1.318

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-109. SEE, Bias, and RMSD for ODL direct method, condition 6
Oshima's Direct Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.519  0.519  0.000  0.519  0.519  0.120  2.445  2.448
1      0.103  0.591  0.600  0.501  0.234  0.552  0.185  0.095  0.208
2      0.220  0.798  0.827  0.524  0.203  0.561  0.211  0.077  0.224
3      0.248  0.904  0.937  0.523  0.033  0.523  0.228  0.047  0.232
4      0.235  0.961  0.989  0.522  0.118  0.534  0.250  0.145  0.289
5      0.254  1.117  1.146  0.535  0.150  0.554  0.276  0.105  0.295
6      0.281  1.332  1.361  0.526  0.143  0.544  0.301  0.017  0.301
7      0.279  1.595  1.619  0.496  0.104  0.505  0.321  0.105  0.337
8      0.270  1.782  1.802  0.513  0.112  0.524  0.333  0.170  0.373
9      0.268  1.941  1.960  0.542  0.117  0.553  0.337  0.220  0.401
10     0.269  2.159  2.175  0.565  0.056  0.567  0.332  0.324  0.463
11     0.259  2.394  2.408  0.580  0.029  0.579  0.321  0.441  0.544
12     0.242  2.619  2.630  0.584  0.128  0.596  0.303  0.556  0.633
13     0.224  2.814  2.823  0.577  0.225  0.618  0.282  0.655  0.713
14     0.211  3.015  3.023  0.558  0.349  0.657  0.257  0.768  0.809
15     0.201  3.226  3.232  0.526  0.497  0.723  0.232  0.893  0.923
16     0.190  3.401  3.406  0.482  0.631  0.793  0.209  0.991  1.013
17     0.181  3.580  3.585  0.427  0.796  0.903  0.192  1.105  1.122
18     0.179  3.736  3.741  0.366  0.969  1.035  0.182  1.209  1.222
19     0.186  3.863  3.868  0.305  1.143  1.183  0.184  1.295  1.308
20     0.204  3.992  3.997  0.251  1.347  1.370  0.196  1.393  1.406
21     0.229  4.087  4.093  0.213  1.543  1.558  0.218  1.468  1.484
22     0.260  4.175  4.183  0.209  1.755  1.767  0.247  1.546  1.565
23     0.295  4.253  4.263  0.245  1.982  1.997  0.279  1.626  1.649
24     0.334  4.323  4.336  0.309  2.227  2.248  0.315  1.710  1.739
25     0.374  4.357  4.373  0.389  2.457  2.487  0.354  1.771  1.806
26     0.416  4.392  4.412  0.479  2.706  2.748  0.395  1.848  1.889
27     0.458  4.403  4.426  0.575  2.942  2.997  0.440  1.914  1.964
28     0.500  4.374  4.403  0.666  3.143  3.212  0.485  1.957  2.015
29     0.543  4.330  4.364  0.743  3.327  3.408  0.525  1.997  2.065
30     0.584  4.286  4.325  0.799  3.503  3.593  0.555  2.047  2.121
31     0.624  4.248  4.294  0.839  3.676  3.770  0.571  2.110  2.186
32     0.653  4.183  4.234  0.866  3.806  3.903  0.573  2.148  2.223
33     0.674  4.107  4.162  0.884  3.904  4.002  0.563  2.177  2.249
34     0.682  3.990  4.048  0.892  3.936  4.035  0.541  2.169  2.235
35     0.675  3.895  3.953  0.880  3.966  4.062  0.511  2.188  2.247
36     0.656  3.589  3.648  0.829  3.766  3.856  0.469  2.003  2.057
37     0.617  3.202  3.261  0.763  3.469  3.552  0.411  1.731  1.779
38     0.529  2.720  2.770  0.671  3.083  3.154  0.323  1.320  1.359
39     0.401  1.800  1.844  0.545  2.267  2.332  0.184  0.328  0.376
40     0.242  0.843  0.877  0.369  1.355  1.404  0.107  1.314  1.318

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-110. SEE, Bias, and RMSD for ODL TCF method, condition 6
Test Characteristic Curve Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.519  0.519  0.000  0.519  0.519  0.120  2.445  2.448
1      0.091  0.577  0.585  0.503  0.221  0.548  0.179  0.090  0.200
2      0.194  0.761  0.785  0.509  0.184  0.540  0.190  0.069  0.201
3      0.199  0.863  0.886  0.482  0.010  0.481  0.185  0.059  0.194
4      0.170  0.923  0.939  0.441  0.143  0.463  0.184  0.161  0.244
5      0.176  1.078  1.092  0.404  0.175  0.440  0.190  0.125  0.227
6      0.206  1.286  1.303  0.359  0.166  0.395  0.200  0.041  0.204
7      0.212  1.548  1.563  0.314  0.123  0.336  0.209  0.078  0.223
8      0.197  1.738  1.749  0.299  0.127  0.324  0.214  0.141  0.256
9      0.196  1.899  1.909  0.329  0.124  0.351  0.216  0.189  0.287
10     0.205  2.116  2.126  0.362  0.054  0.365  0.215  0.292  0.362
11     0.205  2.353  2.362  0.386  0.042  0.387  0.211  0.409  0.460
12     0.195  2.581  2.589  0.398  0.149  0.424  0.206  0.525  0.564
13     0.188  2.780  2.786  0.399  0.252  0.471  0.201  0.624  0.656
14     0.187  2.985  2.991  0.389  0.380  0.543  0.197  0.738  0.764
15     0.189  3.198  3.204  0.370  0.529  0.646  0.193  0.865  0.887
16     0.188  3.378  3.383  0.346  0.661  0.745  0.191  0.966  0.984
17     0.185  3.562  3.567  0.316  0.821  0.880  0.189  1.082  1.098
18     0.183  3.723  3.728  0.286  0.987  1.027  0.188  1.188  1.202
19     0.184  3.856  3.860  0.258  1.153  1.181  0.188  1.276  1.290
20     0.188  3.989  3.994  0.236  1.348  1.368  0.188  1.377  1.389
21     0.194  4.091  4.095  0.220  1.534  1.549  0.189  1.455  1.467
22     0.201  4.184  4.189  0.216  1.735  1.749  0.190  1.536  1.547
23     0.209  4.268  4.273  0.225  1.952  1.965  0.191  1.618  1.629
24     0.218  4.345  4.351  0.244  2.188  2.201  0.192  1.706  1.717
25     0.228  4.385  4.391  0.268  2.410  2.425  0.194  1.770  1.781
26     0.239  4.427  4.433  0.298  2.652  2.669  0.195  1.850  1.860
27     0.251  4.443  4.450  0.330  2.882  2.901  0.197  1.920  1.930
28     0.263  4.421  4.429  0.362  3.078  3.100  0.200  1.966  1.976
29     0.275  4.383  4.392  0.394  3.258  3.282  0.204  2.009  2.020
30     0.288  4.345  4.355  0.422  3.431  3.457  0.211  2.062  2.072
31     0.303  4.314  4.324  0.448  3.601  3.629  0.220  2.126  2.137
32     0.319  4.254  4.266  0.474  3.731  3.761  0.231  2.164  2.176
33     0.338  4.183  4.197  0.500  3.830  3.862  0.245  2.192  2.205
34     0.359  4.071  4.087  0.526  3.865  3.900  0.259  2.181  2.197
35     0.382  3.979  3.997  0.548  3.897  3.936  0.273  2.198  2.215
36     0.401  3.675  3.696  0.560  3.702  3.743  0.281  2.010  2.029
37     0.406  3.286  3.311  0.557  3.412  3.457  0.277  1.734  1.756
38     0.397  2.796  2.824  0.528  3.036  3.081  0.243  1.320  1.342
39     0.338  1.859  1.889  0.461  2.235  2.281  0.155  0.328  0.363
40     0.230  0.876  0.906  0.338  1.339  1.381  0.107  1.314  1.318

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-111. SEE, Bias, and RMSD for ODL ICF method, condition 6
Item Characteristic Curve Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.519  0.519  0.000  0.519  0.519  0.120  2.445  2.448
1      0.064  0.550  0.553  0.481  0.301  0.566  0.170  0.201  0.263
2      0.143  0.673  0.688  0.494  0.286  0.570  0.183  0.213  0.280
3      0.176  0.734  0.755  0.469  0.133  0.486  0.179  0.112  0.210
4      0.160  0.768  0.785  0.430  0.002  0.429  0.175  0.024  0.176
5      0.155  0.882  0.895  0.376  0.012  0.376  0.177  0.040  0.181
6      0.165  1.013  1.026  0.323  0.012  0.322  0.183  0.066  0.194
7      0.187  1.167  1.182  0.279  0.071  0.288  0.190  0.091  0.210
8      0.211  1.266  1.283  0.267  0.085  0.279  0.199  0.035  0.201
9      0.229  1.345  1.364  0.279  0.102  0.297  0.211  0.053  0.217
10     0.246  1.466  1.487  0.292  0.147  0.326  0.226  0.095  0.245
11     0.266  1.590  1.612  0.299  0.171  0.344  0.247  0.127  0.277
12     0.287  1.705  1.729  0.302  0.166  0.344  0.270  0.158  0.313
13     0.311  1.794  1.821  0.303  0.125  0.327  0.297  0.205  0.360
14     0.337  1.890  1.920  0.307  0.088  0.318  0.325  0.234  0.399
15     0.364  1.991  2.024  0.317  0.060  0.321  0.353  0.247  0.430
16     0.392  2.056  2.093  0.334  0.003  0.334  0.382  0.284  0.475
17     0.418  2.127  2.168  0.359  0.039  0.361  0.411  0.302  0.509
18     0.445  2.178  2.223  0.391  0.091  0.400  0.440  0.327  0.547
19     0.471  2.204  2.254  0.428  0.155  0.454  0.467  0.364  0.591
20     0.498  2.233  2.288  0.469  0.200  0.508  0.494  0.382  0.623
21     0.524  2.232  2.292  0.512  0.256  0.572  0.518  0.414  0.662
22     0.549  2.228  2.294  0.558  0.297  0.630  0.541  0.432  0.691
23     0.572  2.221  2.293  0.604  0.321  0.683  0.560  0.436  0.709
24     0.594  2.216  2.294  0.651  0.323  0.725  0.577  0.422  0.714
25     0.613  2.185  2.268  0.698  0.333  0.772  0.591  0.421  0.724
26     0.629  2.164  2.253  0.747  0.318  0.810  0.602  0.395  0.719
27     0.644  2.128  2.222  0.796  0.308  0.852  0.613  0.375  0.717
28     0.656  2.059  2.161  0.847  0.325  0.905  0.624  0.380  0.729
29     0.668  1.978  2.087  0.897  0.348  0.960  0.636  0.391  0.745
30     0.679  1.896  2.014  0.945  0.363  1.010  0.649  0.398  0.760
31     0.692  1.818  1.945  0.989  0.361  1.050  0.662  0.392  0.768
32     0.704  1.710  1.849  1.025  0.371  1.088  0.673  0.404  0.783
33     0.717  1.594  1.747  1.049  0.365  1.109  0.678  0.404  0.788
34     0.726  1.452  1.622  1.056  0.356  1.112  0.674  0.405  0.785
35     0.728  1.365  1.546  1.039  0.257  1.068  0.654  0.320  0.727
36     0.714  1.129  1.335  0.995  0.269  1.028  0.610  0.346  0.700
37     0.667  0.910  1.127  0.917  0.223  0.941  0.529  0.314  0.614
38     0.574  0.761  0.952  0.797  0.082  0.799  0.392  0.189  0.434
39     0.421  0.376  0.564  0.639  0.187  0.664  0.180  0.318  0.365
40     0.248  0.076  0.259  0.411  0.189  0.451  0.107  1.314  1.318

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-112. SEE, Bias, and RMSD for NOP method, condition 6
Non-Orthogonal Procrustes Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.519  0.519  0.000  0.519  0.519  0.120  2.445  2.448
1      0.000  1.158  1.158  0.000  1.158  1.158  0.135  0.540  0.556
2      0.000  2.137  2.137  0.000  2.137  2.137  0.160  0.908  0.922
3      0.027  2.970  2.970  0.092  2.954  2.956  0.165  1.194  1.205
4      0.167  3.406  3.411  0.313  3.501  3.515  0.164  1.546  1.554
5      0.096  4.038  4.039  0.471  3.763  3.792  0.162  2.063  2.070
6      0.187  4.579  4.583  0.482  3.879  3.909  0.160  2.644  2.649
7      0.112  5.112  5.113  0.445  4.033  4.057  0.159  3.265  3.269
8      0.154  5.682  5.684  0.423  4.203  4.224  0.159  3.828  3.831
9      0.157  6.105  6.107  0.428  4.473  4.493  0.160  4.371  4.374
10     0.141  6.702  6.703  0.374  4.886  4.900  0.161  4.956  4.959
11     0.174  7.211  7.213  0.390  5.356  5.370  0.163  5.537  5.539
12     0.141  7.776  7.777  0.337  5.877  5.887  0.166  6.093  6.095
13     0.175  8.267  8.269  0.354  6.389  6.399  0.169  6.601  6.603
14     0.152  8.784  8.786  0.293  6.944  6.950  0.173  7.089  7.091
15     0.179  9.265  9.267  0.314  7.516  7.522  0.179  7.551  7.553
16     0.172  9.721  9.723  0.276  8.036  8.040  0.186  7.943  7.945
17     0.186  10.130  10.132  0.286  8.569  8.574  0.195  8.309  8.311
18     0.204  10.529  10.531  0.275  9.025  9.029  0.208  8.621  8.624
19     0.210  10.855  10.857  0.293  9.429  9.434  0.223  8.875  8.878
20     0.236  11.160  11.163  0.281  9.788  9.792  0.243  9.101  9.104
21     0.262  11.417  11.420  0.288  10.061  10.065  0.266  9.265  9.269
22     0.284  11.637  11.641  0.295  10.295  10.300  0.293  9.396  9.400
23     0.311  11.822  11.826  0.296  10.481  10.485  0.324  9.492  9.497
24     0.346  11.979  11.984  0.298  10.623  10.627  0.357  9.557  9.564
25     0.382  12.078  12.084  0.307  10.693  10.698  0.393  9.563  9.571
26     0.420  12.155  12.162  0.323  10.735  10.740  0.430  9.548  9.558
27     0.458  12.180  12.189  0.348  10.720  10.726  0.468  9.487  9.498
28     0.497  12.139  12.149  0.380  10.631  10.638  0.503  9.364  9.378
29     0.535  12.048  12.060  0.419  10.489  10.497  0.536  9.206  9.221
30     0.572  11.920  11.934  0.460  10.310  10.320  0.564  9.025  9.042
31     0.607  11.758  11.774  0.501  10.110  10.123  0.588  8.830  8.849
32     0.639  11.525  11.543  0.539  9.864  9.879  0.608  8.584  8.605
33     0.669  11.238  11.258  0.571  9.600  9.617  0.624  8.300  8.324
34     0.695  10.867  10.889  0.598  9.300  9.319  0.638  7.938  7.963
35     0.717  10.473  10.497  0.624  9.027  9.049  0.654  7.533  7.561
36     0.731  9.808  9.835  0.655  8.536  8.561  0.671  6.797  6.830
37     0.734  8.954  8.984  0.684  7.916  7.945  0.682  5.732  5.772
38     0.714  7.789  7.822  0.693  7.111  7.145  0.643  4.069  4.119
39     0.630  5.745  5.779  0.655  5.642  5.680  0.380  1.236  1.293
40     0.404  3.046  3.073  0.495  3.560  3.594  0.107  1.314  1.318

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-113. SEE, Bias, and RMSD for Min's method, condition 7
Min's Method

Raw    Full MIRT            Approx Observed      Approx True
Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0      0.000  0.183  0.183  0.000  0.183  0.183  0.155  2.969  2.973
1      0.237  0.303  0.385  0.624  0.572  0.845  0.230  0.162  0.281
2      0.252  0.301  0.392  0.608  0.564  0.828  0.263  0.184  0.321
3      0.266  0.358  0.445  0.567  0.589  0.816  0.275  0.251  0.371
4      0.276  0.375  0.465  0.516  0.568  0.767  0.284  0.281  0.399
5      0.283  0.358  0.456  0.478  0.515  0.702  0.296  0.285  0.410
6      0.287  0.382  0.477  0.448  0.503  0.673  0.307  0.333  0.452
7      0.285  0.428  0.514  0.428  0.526  0.677  0.312  0.402  0.508
8      0.279  0.444  0.523  0.418  0.533  0.677  0.309  0.436  0.534
9      0.267  0.444  0.518  0.416  0.542  0.683  0.299  0.449  0.539
10     0.252  0.439  0.506  0.416  0.553  0.692  0.282  0.451  0.531
11     0.234  0.435  0.493  0.410  0.568  0.700  0.260  0.450  0.519
12     0.214  0.446  0.495  0.396  0.595  0.714  0.235  0.461  0.517
13     0.194  0.446  0.486  0.373  0.606  0.711  0.211  0.458  0.504
14     0.176  0.440  0.474  0.343  0.609  0.698  0.188  0.450  0.487
15     0.162  0.443  0.472  0.307  0.617  0.688  0.169  0.450  0.480
16     0.153  0.419  0.446  0.267  0.596  0.653  0.157  0.423  0.451
17     0.152  0.384  0.413  0.226  0.565  0.609  0.154  0.386  0.416
18     0.159  0.362  0.395  0.190  0.548  0.580  0.162  0.363  0.397
19     0.173  0.351  0.391  0.165  0.544  0.569  0.180  0.352  0.395
20     0.193  0.325  0.378  0.160  0.528  0.551  0.203  0.325  0.383
21     0.216  0.297  0.367  0.176  0.513  0.542  0.229  0.297  0.375
22     0.241  0.291  0.377  0.209  0.523  0.563  0.257  0.290  0.387
23     0.266  0.296  0.398  0.253  0.548  0.603  0.285  0.294  0.409
24     0.292  0.315  0.429  0.302  0.589  0.661  0.313  0.311  0.441
25     0.317  0.320  0.450  0.353  0.618  0.711  0.341  0.316  0.464
26     0.342  0.322  0.470  0.404  0.641  0.757  0.369  0.319  0.487
27     0.367  0.322  0.488  0.454  0.658  0.799  0.399  0.323  0.512
28     0.393  0.300  0.493  0.500  0.644  0.814  0.430  0.307  0.528
29     0.418  0.286  0.506  0.538  0.627  0.825  0.462  0.301  0.550
30     0.442  0.308  0.538  0.567  0.631  0.848  0.490  0.329  0.590
31     0.463  0.328  0.567  0.584  0.621  0.851  0.512  0.354  0.621
32     0.478  0.369  0.603  0.589  0.621  0.855  0.522  0.394  0.653
33     0.485  0.391  0.622  0.582  0.599  0.834  0.517  0.410  0.658
34     0.480  0.381  0.612  0.566  0.548  0.787  0.494  0.387  0.626
35     0.463  0.409  0.617  0.543  0.543  0.767  0.453  0.394  0.599
36     0.432  0.394  0.584  0.512  0.507  0.720  0.395  0.352  0.528
37     0.386  0.330  0.507  0.474  0.431  0.639  0.323  0.258  0.413
38     0.323  0.290  0.434  0.422  0.385  0.570  0.237  0.194  0.306
39     0.244  0.078  0.256  0.338  0.162  0.374  0.142  0.021  0.143
40     0.090  0.218  0.236  0.133  0.235  0.270  0.103  1.425  1.428

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-114. SEE, Bias, and RMSD for ODL direct method, condition 7
Oshima's Direct Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.183  0.183  0.000  0.183  0.183  0.155  2.969  2.973
1          0.242  0.249  0.347  0.644  0.519  0.825  0.235  0.153  0.280
2          0.257  0.242  0.352  0.619  0.502  0.796  0.265  0.169  0.314
3          0.272  0.294  0.400  0.578  0.530  0.783  0.271  0.230  0.355
4          0.288  0.307  0.420  0.539  0.503  0.737  0.278  0.250  0.374
5          0.304  0.286  0.417  0.501  0.444  0.668  0.293  0.242  0.379
6          0.320  0.305  0.441  0.464  0.427  0.630  0.313  0.275  0.416
7          0.335  0.348  0.483  0.441  0.444  0.625  0.333  0.332  0.470
8          0.349  0.361  0.502  0.436  0.443  0.620  0.351  0.357  0.500
9          0.362  0.360  0.510  0.445  0.440  0.626  0.365  0.365  0.515
10         0.374  0.355  0.514  0.462  0.438  0.636  0.376  0.365  0.523
11         0.385  0.352  0.521  0.479  0.440  0.649  0.385  0.364  0.529
12         0.396  0.365  0.538  0.492  0.457  0.671  0.395  0.377  0.545
13         0.409  0.368  0.549  0.500  0.462  0.680  0.406  0.378  0.554
14         0.422  0.367  0.559  0.503  0.465  0.684  0.419  0.374  0.561
15         0.438  0.375  0.575  0.502  0.478  0.693  0.435  0.380  0.577
16         0.454  0.356  0.576  0.501  0.469  0.685  0.454  0.359  0.578
17         0.472  0.327  0.574  0.500  0.455  0.675  0.474  0.329  0.576
18         0.491  0.311  0.580  0.504  0.459  0.680  0.496  0.313  0.585
19         0.510  0.307  0.594  0.514  0.480  0.702  0.518  0.309  0.602
20         0.528  0.288  0.600  0.533  0.490  0.723  0.538  0.290  0.610
21         0.545  0.267  0.606  0.560  0.504  0.752  0.557  0.271  0.618
22         0.560  0.269  0.620  0.595  0.542  0.804  0.573  0.272  0.633
23         0.574  0.283  0.639  0.635  0.594  0.869  0.586  0.286  0.651
24         0.587  0.310  0.662  0.677  0.660  0.945  0.598  0.313  0.674
25         0.599  0.325  0.680  0.719  0.712  1.011  0.610  0.328  0.692
26         0.611  0.336  0.696  0.758  0.754  1.068  0.625  0.342  0.711
27         0.625  0.346  0.713  0.792  0.786  1.115  0.643  0.358  0.734
28         0.639  0.333  0.719  0.819  0.783  1.132  0.664  0.353  0.751
29         0.654  0.329  0.730  0.837  0.774  1.138  0.686  0.358  0.772
30         0.666  0.360  0.756  0.843  0.784  1.149  0.704  0.396  0.806
31         0.674  0.388  0.776  0.837  0.775  1.139  0.712  0.428  0.829
32         0.673  0.435  0.800  0.819  0.774  1.126  0.706  0.474  0.849
33         0.661  0.463  0.806  0.792  0.748  1.088  0.681  0.493  0.839
34         0.637  0.457  0.783  0.755  0.690  1.022  0.635  0.471  0.790
35         0.600  0.487  0.771  0.713  0.673  0.979  0.569  0.475  0.740
36         0.550  0.471  0.723  0.664  0.623  0.910  0.484  0.426  0.644
37         0.486  0.402  0.629  0.606  0.529  0.803  0.386  0.317  0.499
38         0.404  0.348  0.532  0.534  0.462  0.705  0.277  0.230  0.360
39         0.302  0.117  0.323  0.424  0.205  0.470  0.166  0.014  0.166
40         0.126  0.226  0.258  0.176  0.251  0.306  0.103  1.425  1.428

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).
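
As a reading aid for the SEE, Bias, and RMSD columns, the reported RMSD values in these tables track the usual decomposition RMSD ≈ sqrt(SEE² + Bias²) to roughly two decimal places. The sketch below checks this for three rows copied from the table immediately above (ODL direct method, condition 7, Full MIRT columns). The function name `rmsd_from_components` is a hypothetical helper introduced only for illustration, and the decomposition is assumed rather than taken from this appendix; small discrepancies in the third decimal remain and are not explained here.

```python
import math


def rmsd_from_components(see: float, bias: float) -> float:
    """Combine a standard error (SEE) and a bias into a root-mean-square deviation.

    Assumes RMSD^2 = SEE^2 + Bias^2; this is an illustrative assumption,
    not a formula quoted from the appendix.
    """
    return math.sqrt(see ** 2 + bias ** 2)


# (SEE, Bias, reported RMSD) copied from the table immediately above,
# Full MIRT columns, raw scores 1, 10, and 20.
rows = [
    (0.242, 0.249, 0.347),
    (0.374, 0.355, 0.514),
    (0.528, 0.288, 0.600),
]

for see, bias, reported in rows:
    computed = rmsd_from_components(see, bias)
    print(f"SEE={see:.3f}  Bias={bias:.3f}  reported={reported:.3f}  computed={computed:.3f}")
```

The same check can be run on any row of the tables in this appendix; because the tabled inputs are rounded to three decimals, agreement beyond the second decimal should not be expected.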


Table A-115. SEE, Bias, and RMSD for ODL TCF method, condition 7
Test Characteristic Curve Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.183  0.183  0.000  0.183  0.183  0.155  2.969  2.973
1          0.143  0.117  0.185  0.487  0.288  0.565  0.176  0.093  0.198
2          0.146  0.094  0.173  0.452  0.278  0.529  0.182  0.082  0.199
3          0.149  0.129  0.196  0.411  0.315  0.517  0.174  0.117  0.210
4          0.153  0.123  0.196  0.379  0.294  0.479  0.167  0.110  0.199
5          0.159  0.084  0.179  0.351  0.235  0.422  0.165  0.072  0.179
6          0.165  0.085  0.185  0.319  0.212  0.382  0.169  0.078  0.186
7          0.172  0.112  0.205  0.286  0.212  0.356  0.177  0.110  0.208
8          0.179  0.112  0.211  0.264  0.192  0.326  0.186  0.115  0.218
9          0.186  0.101  0.211  0.257  0.172  0.309  0.193  0.109  0.221
10         0.192  0.089  0.211  0.258  0.154  0.300  0.198  0.099  0.221
11         0.198  0.082  0.214  0.261  0.142  0.297  0.202  0.094  0.222
12         0.204  0.094  0.224  0.263  0.149  0.302  0.206  0.105  0.231
13         0.210  0.098  0.231  0.263  0.147  0.300  0.211  0.107  0.236
14         0.216  0.099  0.237  0.260  0.146  0.297  0.216  0.105  0.240
15         0.224  0.110  0.249  0.254  0.158  0.299  0.223  0.114  0.250
16         0.231  0.095  0.250  0.250  0.150  0.291  0.231  0.097  0.250
17         0.240  0.072  0.250  0.250  0.139  0.285  0.241  0.072  0.251
18         0.250  0.062  0.257  0.258  0.148  0.297  0.253  0.061  0.259
19         0.260  0.066  0.268  0.278  0.176  0.328  0.265  0.065  0.272
20         0.271  0.055  0.276  0.309  0.195  0.365  0.277  0.055  0.282
21         0.281  0.045  0.284  0.351  0.219  0.412  0.289  0.046  0.292
22         0.292  0.057  0.297  0.398  0.269  0.480  0.300  0.060  0.306
23         0.303  0.082  0.313  0.449  0.335  0.559  0.311  0.087  0.322
24         0.313  0.121  0.335  0.501  0.416  0.650  0.321  0.128  0.345
25         0.323  0.148  0.354  0.550  0.484  0.732  0.331  0.157  0.366
26         0.333  0.171  0.374  0.595  0.544  0.805  0.343  0.183  0.388
27         0.345  0.191  0.393  0.634  0.594  0.867  0.357  0.210  0.413
28         0.357  0.188  0.403  0.665  0.609  0.900  0.375  0.215  0.431
29         0.371  0.194  0.418  0.686  0.617  0.921  0.395  0.229  0.456
30         0.386  0.235  0.451  0.695  0.642  0.945  0.416  0.278  0.499
31         0.399  0.273  0.483  0.692  0.646  0.945  0.433  0.321  0.539
32         0.409  0.331  0.526  0.678  0.656  0.942  0.443  0.380  0.583
33         0.412  0.370  0.553  0.654  0.639  0.913  0.440  0.413  0.602
34         0.407  0.375  0.553  0.624  0.589  0.857  0.421  0.404  0.583
35         0.392  0.416  0.571  0.588  0.581  0.825  0.386  0.420  0.570
36         0.366  0.412  0.551  0.546  0.542  0.768  0.335  0.384  0.510
37         0.325  0.356  0.481  0.493  0.464  0.676  0.269  0.293  0.397
38         0.269  0.319  0.416  0.425  0.420  0.597  0.185  0.229  0.294
39         0.198  0.107  0.225  0.327  0.198  0.382  0.100  0.010  0.100
40         0.052  0.200  0.207  0.112  0.217  0.244  0.103  1.425  1.428

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-116. SEE, Bias, and RMSD for ODL ICF method, condition 7
Item Characteristic Curve Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.183  0.183  0.000  0.183  0.183  0.155  2.969  2.973
1          0.163  0.142  0.216  0.502  0.355  0.614  0.186  0.116  0.219
2          0.164  0.118  0.202  0.469  0.333  0.574  0.194  0.111  0.223
3          0.165  0.153  0.225  0.426  0.358  0.555  0.184  0.149  0.237
4          0.167  0.146  0.221  0.394  0.334  0.515  0.176  0.141  0.225
5          0.169  0.105  0.199  0.365  0.271  0.454  0.173  0.101  0.200
6          0.172  0.104  0.200  0.335  0.247  0.416  0.176  0.103  0.204
7          0.176  0.128  0.217  0.302  0.249  0.391  0.182  0.132  0.224
8          0.181  0.124  0.219  0.277  0.228  0.358  0.188  0.132  0.229
9          0.186  0.109  0.216  0.266  0.204  0.335  0.193  0.121  0.227
10         0.193  0.092  0.214  0.265  0.182  0.321  0.198  0.106  0.225
11         0.201  0.081  0.216  0.266  0.167  0.314  0.204  0.095  0.225
12         0.209  0.088  0.226  0.267  0.169  0.316  0.211  0.101  0.233
13         0.219  0.086  0.235  0.266  0.163  0.311  0.219  0.097  0.239
14         0.229  0.083  0.243  0.262  0.157  0.305  0.228  0.090  0.245
15         0.241  0.089  0.256  0.257  0.163  0.304  0.240  0.093  0.257
16         0.253  0.069  0.262  0.255  0.149  0.295  0.253  0.071  0.262
17         0.266  0.041  0.269  0.257  0.131  0.288  0.267  0.041  0.269
18         0.280  0.026  0.280  0.269  0.133  0.300  0.282  0.025  0.283
19         0.294  0.025  0.294  0.292  0.152  0.329  0.298  0.024  0.298
20         0.307  0.011  0.307  0.327  0.163  0.364  0.313  0.010  0.313
21         0.321  0.004  0.320  0.369  0.179  0.410  0.328  0.003  0.327
22         0.333  0.005  0.333  0.417  0.221  0.471  0.341  0.008  0.340
23         0.345  0.028  0.345  0.467  0.278  0.543  0.353  0.032  0.353
24         0.356  0.065  0.361  0.516  0.352  0.624  0.363  0.070  0.369
25         0.366  0.090  0.376  0.562  0.413  0.696  0.374  0.098  0.386
26         0.376  0.111  0.391  0.602  0.467  0.760  0.385  0.122  0.403
27         0.386  0.131  0.407  0.636  0.512  0.815  0.399  0.147  0.424
28         0.397  0.127  0.416  0.663  0.525  0.844  0.415  0.150  0.440
29         0.408  0.132  0.428  0.682  0.533  0.864  0.432  0.162  0.460
30         0.419  0.171  0.451  0.690  0.560  0.887  0.447  0.209  0.492
31         0.426  0.209  0.474  0.687  0.569  0.891  0.457  0.252  0.521
32         0.429  0.268  0.505  0.673  0.586  0.891  0.458  0.312  0.553
33         0.426  0.309  0.525  0.648  0.577  0.867  0.449  0.348  0.567
34         0.415  0.319  0.522  0.616  0.537  0.816  0.426  0.347  0.548
35         0.396  0.365  0.538  0.579  0.539  0.789  0.389  0.372  0.538
36         0.368  0.369  0.520  0.538  0.506  0.738  0.338  0.346  0.483
37         0.328  0.319  0.457  0.490  0.431  0.652  0.275  0.262  0.379
38         0.273  0.289  0.397  0.430  0.384  0.576  0.198  0.201  0.282
39         0.205  0.083  0.221  0.338  0.154  0.371  0.113  0.017  0.114
40         0.066  0.208  0.218  0.136  0.235  0.272  0.103  1.425  1.428

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-117. SEE, Bias, and RMSD for NOP method, condition 7
Non-Orthogonal Procrustes Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.183  0.183  0.000  0.183  0.183  0.155  2.969  2.973
1          0.196  0.378  0.425  0.591  0.680  0.900  0.239  0.183  0.301
2          0.199  0.390  0.437  0.557  0.686  0.883  0.261  0.224  0.343
3          0.202  0.463  0.505  0.515  0.723  0.887  0.253  0.312  0.401
4          0.206  0.497  0.538  0.466  0.710  0.848  0.239  0.369  0.439
5          0.213  0.498  0.541  0.415  0.663  0.781  0.230  0.401  0.462
6          0.224  0.537  0.582  0.366  0.658  0.752  0.230  0.475  0.528
7          0.240  0.599  0.645  0.332  0.692  0.767  0.237  0.567  0.614
8          0.260  0.627  0.678  0.316  0.717  0.783  0.251  0.619  0.667
9          0.284  0.637  0.697  0.313  0.746  0.809  0.270  0.645  0.699
10         0.310  0.639  0.710  0.318  0.776  0.839  0.294  0.655  0.718
11         0.339  0.640  0.724  0.326  0.805  0.868  0.322  0.659  0.733
12         0.370  0.654  0.751  0.337  0.843  0.908  0.353  0.673  0.759
13         0.402  0.655  0.768  0.351  0.861  0.930  0.387  0.671  0.774
14         0.436  0.650  0.782  0.371  0.869  0.944  0.424  0.663  0.786
15         0.470  0.652  0.803  0.396  0.878  0.963  0.463  0.661  0.806
16         0.504  0.626  0.803  0.428  0.858  0.958  0.502  0.632  0.807
17         0.538  0.588  0.796  0.465  0.825  0.946  0.542  0.593  0.802
18         0.571  0.561  0.800  0.506  0.803  0.949  0.580  0.565  0.809
19         0.603  0.545  0.811  0.549  0.793  0.964  0.616  0.548  0.824
20         0.632  0.512  0.812  0.592  0.769  0.969  0.649  0.514  0.826
21         0.658  0.476  0.811  0.633  0.744  0.976  0.676  0.477  0.826
22         0.680  0.461  0.820  0.672  0.742  1.000  0.699  0.460  0.835
23         0.700  0.457  0.834  0.707  0.753  1.031  0.717  0.453  0.846
24         0.717  0.466  0.853  0.737  0.777  1.070  0.732  0.459  0.863
25         0.732  0.463  0.864  0.762  0.789  1.095  0.747  0.454  0.873
26         0.746  0.456  0.873  0.781  0.794  1.113  0.763  0.448  0.884
27         0.760  0.448  0.881  0.794  0.793  1.121  0.782  0.444  0.898
28         0.774  0.418  0.878  0.801  0.761  1.104  0.802  0.420  0.904
29         0.787  0.396  0.879  0.802  0.726  1.080  0.820  0.405  0.913
30         0.794  0.409  0.892  0.794  0.714  1.067  0.831  0.424  0.931
31         0.795  0.420  0.897  0.780  0.689  1.039  0.827  0.437  0.934
32         0.784  0.450  0.902  0.759  0.677  1.015  0.805  0.465  0.928
33         0.760  0.461  0.887  0.732  0.644  0.974  0.761  0.469  0.892
34         0.722  0.441  0.845  0.699  0.585  0.910  0.695  0.436  0.819
35         0.672  0.459  0.812  0.660  0.572  0.872  0.610  0.434  0.747
36         0.607  0.436  0.746  0.617  0.531  0.813  0.510  0.385  0.638
37         0.531  0.364  0.642  0.567  0.457  0.727  0.400  0.285  0.491
38         0.438  0.315  0.539  0.511  0.417  0.659  0.287  0.216  0.359
39         0.320  0.088  0.331  0.420  0.196  0.463  0.178  0.013  0.178
40         0.125  0.234  0.265  0.166  0.248  0.298  0.103  1.425  1.428

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-118. SEE, Bias, and RMSD for Min's method, condition 8
Min's Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.285  0.285  0.000  0.285  0.285  0.155  3.071  3.074
1          0.206  0.270  0.340  0.582  0.749  0.947  0.215  0.409  0.462
2          0.234  0.221  0.321  0.597  0.719  0.933  0.254  0.409  0.481
3          0.257  0.156  0.300  0.567  0.660  0.869  0.273  0.379  0.467
4          0.277  0.144  0.312  0.521  0.644  0.828  0.291  0.400  0.494
5          0.294  0.155  0.332  0.483  0.640  0.801  0.312  0.443  0.541
6          0.305  0.152  0.340  0.457  0.619  0.768  0.331  0.466  0.571
7          0.310  0.179  0.357  0.443  0.630  0.769  0.342  0.510  0.614
8          0.308  0.215  0.375  0.441  0.651  0.786  0.344  0.552  0.649
9          0.299  0.233  0.378  0.451  0.657  0.796  0.334  0.563  0.654
10         0.285  0.255  0.382  0.459  0.665  0.807  0.317  0.568  0.650
11         0.266  0.277  0.383  0.459  0.666  0.809  0.292  0.562  0.633
12         0.243  0.310  0.394  0.449  0.672  0.808  0.264  0.561  0.619
13         0.220  0.318  0.386  0.426  0.648  0.775  0.234  0.529  0.578
14         0.196  0.321  0.376  0.393  0.615  0.729  0.205  0.489  0.530
15         0.176  0.342  0.385  0.350  0.597  0.692  0.180  0.464  0.498
16         0.162  0.333  0.370  0.303  0.548  0.625  0.163  0.408  0.440
17         0.157  0.344  0.378  0.253  0.518  0.576  0.156  0.372  0.404
18         0.163  0.366  0.401  0.205  0.500  0.540  0.163  0.347  0.383
19         0.178  0.365  0.406  0.169  0.459  0.489  0.181  0.298  0.348
20         0.201  0.354  0.407  0.158  0.411  0.440  0.208  0.240  0.317
21         0.230  0.348  0.417  0.180  0.369  0.410  0.240  0.187  0.303
22         0.261  0.364  0.447  0.226  0.352  0.418  0.274  0.156  0.314
23         0.293  0.384  0.483  0.286  0.343  0.446  0.308  0.131  0.334
24         0.325  0.409  0.522  0.352  0.341  0.490  0.342  0.111  0.358
25         0.356  0.432  0.559  0.421  0.337  0.539  0.375  0.089  0.385
26         0.388  0.443  0.588  0.489  0.321  0.583  0.409  0.058  0.412
27         0.419  0.476  0.634  0.552  0.320  0.636  0.444  0.049  0.446
28         0.452  0.512  0.682  0.606  0.313  0.681  0.481  0.044  0.482
29         0.484  0.536  0.721  0.650  0.285  0.708  0.516  0.027  0.516
30         0.513  0.576  0.770  0.679  0.262  0.726  0.548  0.025  0.547
31         0.536  0.634  0.829  0.693  0.251  0.736  0.571  0.042  0.571
32         0.549  0.702  0.890  0.692  0.251  0.734  0.582  0.075  0.585
33         0.550  0.717  0.903  0.676  0.210  0.707  0.576  0.065  0.578
34         0.536  0.721  0.898  0.649  0.178  0.672  0.549  0.055  0.550
35         0.508  0.722  0.881  0.611  0.169  0.632  0.501  0.056  0.503
36         0.466  0.725  0.861  0.568  0.195  0.600  0.432  0.076  0.437
37         0.407  0.708  0.816  0.513  0.236  0.563  0.344  0.099  0.357
38         0.335  0.623  0.707  0.450  0.248  0.513  0.243  0.090  0.258
39         0.223  0.246  0.331  0.350  0.007  0.349  0.138  0.158  0.210
40         0.006  0.245  0.245  0.133  0.303  0.331  0.103  1.489  1.493

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-119. SEE, Bias, and RMSD for ODL direct method, condition 8
Oshima's Direct Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.285  0.285  0.000  0.285  0.285  0.155  3.071  3.074
1          0.249  0.165  0.298  0.692  0.520  0.864  0.240  0.337  0.413
2          0.302  0.087  0.313  0.733  0.467  0.867  0.303  0.302  0.427
3          0.362  0.014  0.362  0.722  0.408  0.828  0.341  0.247  0.421
4          0.411  0.056  0.414  0.682  0.389  0.784  0.379  0.240  0.448
5          0.450  0.066  0.454  0.643  0.389  0.750  0.425  0.252  0.493
6          0.482  0.085  0.488  0.621  0.369  0.722  0.473  0.244  0.532
7          0.506  0.067  0.509  0.615  0.374  0.719  0.514  0.265  0.577
8          0.523  0.034  0.523  0.627  0.382  0.733  0.542  0.291  0.614
9          0.532  0.013  0.531  0.653  0.370  0.750  0.555  0.299  0.629
10         0.534  0.018  0.533  0.684  0.361  0.772  0.555  0.310  0.635
11         0.532  0.053  0.533  0.708  0.352  0.789  0.546  0.319  0.631
12         0.527  0.104  0.536  0.718  0.357  0.800  0.532  0.339  0.629
13         0.521  0.134  0.537  0.714  0.342  0.790  0.517  0.333  0.614
14         0.517  0.162  0.541  0.696  0.327  0.767  0.506  0.321  0.598
15         0.518  0.209  0.557  0.667  0.337  0.746  0.500  0.327  0.596
16         0.523  0.229  0.570  0.633  0.323  0.709  0.501  0.303  0.585
17         0.533  0.271  0.597  0.599  0.335  0.685  0.510  0.301  0.591
18         0.549  0.325  0.637  0.572  0.363  0.676  0.527  0.310  0.611
19         0.571  0.356  0.671  0.557  0.372  0.668  0.551  0.297  0.625
20         0.596  0.378  0.705  0.559  0.378  0.674  0.580  0.275  0.641
21         0.625  0.405  0.743  0.582  0.391  0.700  0.613  0.257  0.663
22         0.655  0.453  0.795  0.623  0.431  0.757  0.647  0.261  0.696
23         0.686  0.505  0.850  0.679  0.479  0.830  0.682  0.269  0.731
24         0.717  0.561  0.909  0.746  0.533  0.915  0.716  0.281  0.768
25         0.750  0.613  0.967  0.817  0.584  1.003  0.753  0.293  0.806
26         0.783  0.655  1.019  0.891  0.621  1.084  0.792  0.296  0.843
27         0.819  0.718  1.087  0.962  0.672  1.171  0.836  0.322  0.894
28         0.856  0.785  1.160  1.025  0.712  1.246  0.884  0.353  0.950
29         0.894  0.839  1.224  1.076  0.722  1.293  0.932  0.370  1.001
30         0.926  0.906  1.294  1.108  0.727  1.323  0.973  0.397  1.049
31         0.947  0.985  1.365  1.118  0.731  1.334  0.999  0.435  1.087
32         0.953  1.065  1.427  1.106  0.734  1.325  1.001  0.476  1.106
33         0.937  1.083  1.430  1.073  0.684  1.270  0.971  0.459  1.071
34         0.899  1.079  1.403  1.022  0.632  1.200  0.904  0.427  0.998
35         0.839  1.062  1.352  0.955  0.597  1.124  0.801  0.392  0.890
36         0.758  1.037  1.283  0.873  0.588  1.051  0.668  0.364  0.759
37         0.652  0.981  1.177  0.774  0.588  0.970  0.512  0.327  0.607
38         0.522  0.850  0.997  0.656  0.547  0.853  0.347  0.246  0.425
39         0.364  0.406  0.544  0.485  0.215  0.529  0.193  0.084  0.210
40         0.050  0.253  0.258  0.151  0.297  0.333  0.103  1.489  1.493

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-120. SEE, Bias, and RMSD for ODL TCF method, condition 8
Test Characteristic Curve Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.285  0.285  0.000  0.285  0.285  0.155  3.071  3.074
1          0.151  0.062  0.163  0.528  0.335  0.624  0.189  0.293  0.348
2          0.199  0.030  0.200  0.570  0.268  0.628  0.230  0.230  0.325
3          0.246  0.146  0.286  0.560  0.203  0.595  0.250  0.148  0.290
4          0.280  0.204  0.346  0.529  0.185  0.559  0.270  0.113  0.292
5          0.305  0.234  0.384  0.494  0.179  0.525  0.298  0.096  0.312
6          0.321  0.273  0.421  0.469  0.150  0.491  0.327  0.063  0.332
7          0.330  0.275  0.429  0.452  0.140  0.472  0.350  0.060  0.354
8          0.331  0.260  0.420  0.446  0.129  0.463  0.362  0.069  0.367
9          0.324  0.254  0.411  0.456  0.097  0.465  0.361  0.062  0.365
10         0.310  0.235  0.388  0.472  0.071  0.476  0.347  0.063  0.352
11         0.292  0.208  0.357  0.481  0.048  0.482  0.325  0.066  0.331
12         0.269  0.163  0.314  0.478  0.043  0.479  0.296  0.081  0.306
13         0.247  0.136  0.281  0.461  0.021  0.460  0.265  0.074  0.275
14         0.227  0.111  0.252  0.431  0.003  0.430  0.236  0.061  0.243
15         0.214  0.063  0.222  0.390  0.011  0.389  0.214  0.068  0.224
16         0.211  0.041  0.214  0.343  0.001  0.342  0.205  0.046  0.209
17         0.220  0.003  0.220  0.298  0.014  0.297  0.212  0.046  0.216
18         0.242  0.062  0.249  0.264  0.047  0.268  0.235  0.059  0.242
19         0.273  0.098  0.290  0.258  0.063  0.265  0.271  0.051  0.275
20         0.311  0.128  0.336  0.287  0.077  0.297  0.314  0.036  0.316
21         0.353  0.164  0.389  0.348  0.101  0.361  0.361  0.026  0.361
22         0.397  0.222  0.454  0.428  0.153  0.454  0.409  0.041  0.410
23         0.439  0.285  0.523  0.519  0.214  0.560  0.455  0.061  0.458
24         0.481  0.352  0.595  0.614  0.283  0.675  0.500  0.086  0.506
25         0.522  0.415  0.666  0.710  0.351  0.790  0.544  0.108  0.553
26         0.564  0.466  0.730  0.801  0.405  0.896  0.591  0.121  0.601
27         0.608  0.537  0.810  0.886  0.473  1.002  0.641  0.156  0.658
28         0.653  0.612  0.893  0.958  0.530  1.093  0.694  0.194  0.719
29         0.697  0.673  0.968  1.014  0.557  1.154  0.747  0.219  0.777
30         0.736  0.750  1.050  1.048  0.578  1.195  0.794  0.255  0.832
31         0.766  0.840  1.135  1.059  0.597  1.213  0.826  0.305  0.878
32         0.779  0.933  1.214  1.046  0.613  1.210  0.835  0.359  0.907
33         0.773  0.966  1.236  1.013  0.574  1.162  0.815  0.358  0.888
34         0.748  0.977  1.229  0.962  0.533  1.098  0.763  0.344  0.835
35         0.704  0.975  1.201  0.898  0.506  1.029  0.680  0.325  0.752
36         0.643  0.964  1.158  0.819  0.506  0.961  0.570  0.312  0.649
37         0.555  0.917  1.071  0.724  0.513  0.886  0.438  0.291  0.524
38         0.445  0.793  0.908  0.604  0.479  0.770  0.288  0.226  0.366
39         0.298  0.374  0.478  0.434  0.162  0.463  0.141  0.083  0.163
40         0.000  0.244  0.244  0.116  0.285  0.307  0.103  1.489  1.493

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-121. SEE, Bias, and RMSD for ODL ICF method, condition 8
Item Characteristic Curve Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.285  0.285  0.000  0.285  0.285  0.155  3.071  3.074
1          0.155  0.087  0.177  0.549  0.418  0.689  0.195  0.326  0.380
2          0.196  0.000  0.195  0.574  0.357  0.675  0.235  0.277  0.363
3          0.243  0.113  0.268  0.559  0.286  0.626  0.255  0.202  0.325
4          0.276  0.169  0.323  0.530  0.267  0.592  0.274  0.170  0.322
5          0.297  0.198  0.356  0.500  0.263  0.564  0.299  0.153  0.335
6          0.310  0.237  0.390  0.476  0.230  0.528  0.324  0.117  0.344
7          0.315  0.241  0.396  0.459  0.218  0.507  0.343  0.110  0.359
8          0.312  0.229  0.387  0.450  0.203  0.492  0.349  0.112  0.366
9          0.302  0.229  0.379  0.455  0.167  0.484  0.343  0.098  0.356
10         0.286  0.215  0.357  0.466  0.136  0.484  0.326  0.091  0.338
11         0.265  0.196  0.329  0.471  0.107  0.482  0.300  0.085  0.311
12         0.241  0.159  0.288  0.466  0.093  0.474  0.269  0.092  0.283
13         0.218  0.141  0.260  0.448  0.061  0.451  0.237  0.074  0.248
14         0.201  0.125  0.236  0.417  0.032  0.418  0.210  0.051  0.216
15         0.193  0.086  0.211  0.375  0.030  0.376  0.193  0.048  0.198
16         0.198  0.075  0.211  0.327  0.005  0.326  0.191  0.015  0.191
17         0.216  0.040  0.219  0.280  0.008  0.279  0.207  0.005  0.206
18         0.244  0.008  0.244  0.245  0.029  0.246  0.237  0.008  0.237
19         0.281  0.036  0.282  0.239  0.033  0.240  0.278  0.011  0.278
20         0.322  0.056  0.326  0.270  0.034  0.271  0.325  0.036  0.326
21         0.366  0.083  0.374  0.331  0.045  0.334  0.373  0.054  0.376
22         0.410  0.134  0.430  0.410  0.084  0.418  0.421  0.048  0.423
23         0.452  0.190  0.490  0.498  0.133  0.514  0.467  0.035  0.468
24         0.493  0.251  0.552  0.588  0.189  0.616  0.511  0.016  0.510
25         0.533  0.309  0.615  0.677  0.245  0.718  0.554  0.001  0.553
26         0.572  0.355  0.672  0.763  0.287  0.813  0.598  0.008  0.596
27         0.612  0.423  0.743  0.842  0.345  0.908  0.644  0.038  0.644
28         0.654  0.492  0.817  0.911  0.393  0.990  0.693  0.070  0.695
29         0.694  0.549  0.883  0.965  0.415  1.048  0.741  0.088  0.745
30         0.729  0.620  0.956  0.999  0.434  1.087  0.782  0.119  0.789
31         0.754  0.707  1.032  1.011  0.455  1.106  0.809  0.165  0.824
32         0.765  0.800  1.106  0.999  0.477  1.105  0.815  0.221  0.842
33         0.759  0.836  1.128  0.968  0.448  1.065  0.794  0.226  0.824
34         0.734  0.855  1.126  0.921  0.419  1.009  0.743  0.223  0.774
35         0.692  0.864  1.106  0.860  0.405  0.948  0.663  0.220  0.697
36         0.631  0.867  1.071  0.788  0.417  0.890  0.557  0.226  0.600
37         0.547  0.837  0.999  0.700  0.436  0.823  0.429  0.225  0.484
38         0.436  0.730  0.850  0.593  0.413  0.722  0.287  0.178  0.337
39         0.298  0.338  0.450  0.438  0.109  0.451  0.145  0.115  0.185
40         0.000  0.244  0.244  0.127  0.299  0.324  0.103  1.489  1.493

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-122. SEE, Bias, and RMSD for NOP method, condition 8
Non-Orthogonal Procrustes Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.285  0.285  0.000  0.285  0.285  0.155  3.071  3.074
1          0.174  0.321  0.365  0.567  0.855  1.025  0.219  0.427  0.480
2          0.182  0.285  0.338  0.543  0.846  1.005  0.245  0.444  0.507
3          0.190  0.234  0.301  0.502  0.803  0.946  0.242  0.433  0.496
4          0.198  0.237  0.309  0.441  0.791  0.905  0.229  0.473  0.526
5          0.207  0.264  0.336  0.388  0.792  0.882  0.218  0.537  0.580
6          0.220  0.277  0.353  0.347  0.774  0.847  0.215  0.581  0.619
7          0.238  0.319  0.398  0.317  0.787  0.848  0.222  0.644  0.681
8          0.260  0.369  0.451  0.299  0.818  0.871  0.237  0.702  0.740
9          0.287  0.399  0.491  0.291  0.838  0.887  0.259  0.726  0.770
10         0.318  0.431  0.536  0.294  0.861  0.910  0.286  0.740  0.793
11         0.352  0.461  0.579  0.302  0.877  0.927  0.318  0.741  0.806
12         0.387  0.499  0.631  0.316  0.896  0.949  0.353  0.744  0.823
13         0.425  0.511  0.664  0.335  0.882  0.943  0.390  0.716  0.815
14         0.463  0.516  0.693  0.361  0.856  0.928  0.430  0.677  0.801
15         0.502  0.539  0.735  0.392  0.843  0.929  0.471  0.653  0.805
16         0.541  0.530  0.757  0.430  0.796  0.905  0.514  0.598  0.787
17         0.580  0.541  0.792  0.473  0.768  0.901  0.556  0.561  0.789
18         0.618  0.562  0.834  0.520  0.749  0.911  0.598  0.534  0.801
19         0.654  0.558  0.859  0.569  0.705  0.906  0.638  0.483  0.798
20         0.688  0.544  0.875  0.620  0.653  0.899  0.674  0.421  0.793
21         0.718  0.533  0.893  0.670  0.606  0.902  0.706  0.362  0.792
22         0.745  0.543  0.920  0.719  0.583  0.924  0.734  0.325  0.801
23         0.769  0.556  0.948  0.765  0.565  0.950  0.757  0.292  0.810
24         0.790  0.574  0.975  0.808  0.553  0.978  0.778  0.265  0.820
25         0.810  0.590  1.000  0.847  0.539  1.002  0.798  0.237  0.831
26         0.830  0.595  1.019  0.879  0.511  1.015  0.820  0.200  0.842
27         0.849  0.623  1.051  0.905  0.500  1.032  0.844  0.186  0.862
28         0.869  0.653  1.085  0.922  0.484  1.039  0.868  0.177  0.884
29         0.885  0.672  1.110  0.929  0.445  1.028  0.891  0.155  0.902
30         0.897  0.706  1.140  0.924  0.413  1.010  0.905  0.147  0.915
31         0.898  0.756  1.173  0.908  0.394  0.988  0.906  0.158  0.918
32         0.885  0.815  1.202  0.881  0.387  0.960  0.888  0.183  0.904
33         0.854  0.821  1.183  0.843  0.339  0.906  0.845  0.165  0.859
34         0.806  0.815  1.144  0.796  0.300  0.849  0.777  0.147  0.788
35         0.740  0.805  1.092  0.741  0.287  0.793  0.682  0.140  0.694
36         0.656  0.797  1.032  0.678  0.306  0.742  0.565  0.152  0.584
37         0.565  0.776  0.959  0.607  0.342  0.695  0.433  0.166  0.463
38         0.448  0.683  0.816  0.519  0.346  0.623  0.300  0.143  0.331
39         0.297  0.279  0.407  0.404  0.079  0.411  0.179  0.133  0.223
40         0.047  0.252  0.257  0.146  0.298  0.332  0.103  1.489  1.493

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-123. SEE, Bias, and RMSD for Min's method, condition 9
Min's Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.500  0.500  0.000  0.500  0.500  0.155  3.286  3.289
1          0.000  0.768  0.768  0.222  0.669  0.705  0.184  0.088  0.204
2          0.142  1.664  1.670  0.443  1.200  1.279  0.219  0.112  0.246
3          0.118  2.161  2.164  0.572  1.459  1.566  0.227  0.404  0.463
4          0.226  2.607  2.617  0.573  1.528  1.632  0.227  0.665  0.703
5          0.163  2.969  2.973  0.546  1.539  1.632  0.228  0.969  0.995
6          0.233  3.423  3.431  0.503  1.612  1.688  0.233  1.367  1.386
7          0.189  3.861  3.866  0.450  1.716  1.773  0.241  1.776  1.792
8          0.234  4.261  4.268  0.406  1.850  1.894  0.250  2.157  2.171
9          0.207  4.691  4.696  0.393  2.060  2.097  0.260  2.539  2.552
10         0.227  5.087  5.092  0.385  2.337  2.369  0.269  2.903  2.915
11         0.220  5.503  5.507  0.372  2.666  2.691  0.276  3.235  3.247
12         0.214  5.853  5.857  0.387  2.985  3.010  0.279  3.518  3.529
13         0.225  6.205  6.209  0.389  3.315  3.338  0.278  3.769  3.779
14         0.210  6.565  6.568  0.398  3.671  3.693  0.272  4.024  4.033
15         0.205  6.857  6.860  0.406  3.970  3.991  0.262  4.222  4.230
16         0.206  7.142  7.145  0.407  4.255  4.274  0.249  4.410  4.417
17         0.194  7.425  7.428  0.402  4.538  4.555  0.232  4.606  4.612
18         0.179  7.658  7.660  0.393  4.772  4.788  0.215  4.774  4.779
19         0.167  7.852  7.853  0.376  4.969  4.983  0.196  4.923  4.927
20         0.157  8.008  8.010  0.354  5.133  5.145  0.179  5.050  5.054
21         0.150  8.139  8.140  0.325  5.276  5.286  0.163  5.163  5.165
22         0.146  8.201  8.203  0.291  5.365  5.373  0.151  5.216  5.218
23         0.146  8.216  8.217  0.255  5.421  5.427  0.145  5.227  5.229
24         0.154  8.192  8.193  0.217  5.458  5.462  0.147  5.204  5.206
25         0.169  8.117  8.119  0.181  5.461  5.464  0.158  5.133  5.135
26         0.190  8.023  8.025  0.154  5.460  5.462  0.177  5.049  5.052
27         0.216  7.903  7.905  0.145  5.445  5.447  0.201  4.955  4.959
28         0.245  7.740  7.743  0.159  5.395  5.397  0.226  4.845  4.851
29         0.273  7.573  7.578  0.193  5.343  5.346  0.251  4.767  4.774
30         0.300  7.372  7.378  0.239  5.249  5.255  0.274  4.687  4.695
31         0.325  7.147  7.154  0.290  5.118  5.127  0.297  4.606  4.616
32         0.348  6.929  6.938  0.343  4.977  4.989  0.320  4.545  4.556
33         0.369  6.649  6.659  0.392  4.752  4.769  0.346  4.415  4.429
34         0.390  6.384  6.396  0.436  4.530  4.551  0.375  4.278  4.295
35         0.412  5.990  6.004  0.469  4.187  4.213  0.405  3.970  3.991
36         0.430  5.416  5.433  0.483  3.715  3.746  0.431  3.413  3.440
37         0.443  4.659  4.680  0.475  3.171  3.207  0.430  2.580  2.616
38         0.429  3.634  3.659  0.446  2.524  2.563  0.361  1.420  1.465
39         0.360  2.649  2.674  0.399  2.043  2.081  0.208  0.418  0.466
40         0.226  1.499  1.515  0.308  1.356  1.391  0.103  1.110  1.115

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-124. SEE, Bias, and RMSD for ODL direct method, condition 9
Oshima's Direct Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.500  0.500  0.000  0.500  0.500  0.155  3.286  3.289
1          0.249  0.704  0.747  0.757  0.196  0.780  0.343  0.158  0.377
2          0.367  1.107  1.166  0.912  0.405  0.996  0.491  0.048  0.492
3          0.524  1.511  1.599  1.023  0.561  1.165  0.605  0.129  0.617
4          0.630  1.841  1.945  1.143  0.610  1.293  0.699  0.251  0.741
5          0.730  2.070  2.194  1.278  0.634  1.424  0.782  0.393  0.873
6          0.825  2.403  2.540  1.390  0.712  1.558  0.857  0.605  1.047
7          0.904  2.727  2.872  1.464  0.798  1.664  0.928  0.809  1.229
8          0.955  2.973  3.122  1.521  0.877  1.752  0.995  0.972  1.389
9          1.002  3.240  3.391  1.552  0.996  1.841  1.058  1.127  1.544
10         1.037  3.499  3.649  1.570  1.150  1.943  1.115  1.261  1.681
11         1.058  3.720  3.867  1.586  1.315  2.057  1.164  1.366  1.793
12         1.087  3.900  4.048  1.585  1.456  2.149  1.203  1.432  1.868
13         1.105  4.061  4.208  1.585  1.585  2.239  1.228  1.481  1.922
14         1.112  4.230  4.373  1.585  1.728  2.342  1.239  1.550  1.983
15         1.113  4.342  4.482  1.577  1.817  2.403  1.234  1.579  2.002
16         1.105  4.439  4.574  1.555  1.891  2.446  1.212  1.612  2.015
17         1.089  4.537  4.665  1.525  1.967  2.487  1.177  1.663  2.035
18         1.066  4.597  4.718  1.483  2.006  2.493  1.133  1.692  2.035
19         1.040  4.629  4.744  1.425  2.018  2.468  1.086  1.708  2.023
20         1.016  4.633  4.742  1.355  2.004  2.417  1.043  1.708  2.000
21         0.998  4.616  4.722  1.272  1.977  2.350  1.007  1.701  1.975
22         0.991  4.541  4.647  1.191  1.901  2.242  0.983  1.646  1.915
23         0.999  4.427  4.537  1.123  1.801  2.121  0.973  1.565  1.841
24         1.025  4.287  4.407  1.083  1.691  2.007  0.980  1.473  1.768
25         1.068  4.112  4.248  1.084  1.562  1.900  1.004  1.363  1.692
26         1.127  3.938  4.095  1.136  1.447  1.838  1.046  1.272  1.645
27         1.198  3.763  3.948  1.241  1.340  1.824  1.105  1.198  1.628
28         1.276  3.574  3.794  1.386  1.223  1.846  1.176  1.124  1.625
29         1.360  3.410  3.670  1.539  1.135  1.909  1.250  1.083  1.652
30         1.444  3.234  3.541  1.673  1.039  1.966  1.318  1.031  1.671
31         1.524  3.052  3.410  1.768  0.943  1.999  1.367  0.970  1.674
32         1.587  2.890  3.295  1.812  0.877  2.009  1.395  0.927  1.672
33         1.626  2.676  3.130  1.798  0.783  1.957  1.397  0.833  1.624
34         1.636  2.500  2.986  1.741  0.754  1.893  1.373  0.784  1.578
35         1.610  2.245  2.760  1.657  0.678  1.787  1.319  0.662  1.473
36         1.541  1.898  2.442  1.548  0.553  1.641  1.225  0.467  1.308
37         1.411  1.509  2.064  1.396  0.433  1.458  1.079  0.253  1.106
38         1.210  1.060  1.606  1.177  0.277  1.206  0.866  0.019  0.864
39         0.910  0.819  1.223  0.871  0.359  0.941  0.533  0.056  0.535
40         0.471  0.529  0.708  0.488  0.335  0.591  0.103  1.110  1.115

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-125. SEE, Bias, and RMSD for ODL TCF method, condition 9
Test Characteristic Curve Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.500  0.500  0.000  0.500  0.500  0.155  3.286  3.289
1          0.215  0.632  0.667  0.528  0.100  0.536  0.191  0.183  0.265
2          0.253  1.008  1.039  0.660  0.217  0.693  0.246  0.114  0.270
3          0.354  1.363  1.408  0.688  0.314  0.755  0.286  0.022  0.286
4          0.412  1.622  1.673  0.693  0.309  0.757  0.338  0.099  0.351
5          0.455  1.806  1.862  0.703  0.286  0.757  0.410  0.189  0.450
6          0.513  2.106  2.167  0.700  0.324  0.770  0.497  0.344  0.604
7          0.554  2.393  2.456  0.682  0.381  0.780  0.590  0.491  0.766
8          0.596  2.608  2.675  0.685  0.436  0.811  0.680  0.595  0.902
9          0.659  2.841  2.917  0.723  0.530  0.895  0.761  0.693  1.028
10         0.715  3.069  3.151  0.791  0.653  1.024  0.829  0.776  1.134
11         0.761  3.263  3.350  0.868  0.781  1.166  0.883  0.838  1.216
12         0.806  3.418  3.511  0.942  0.883  1.289  0.924  0.866  1.265
13         0.851  3.556  3.655  1.008  0.974  1.400  0.954  0.886  1.301
14         0.892  3.705  3.810  1.064  1.080  1.514  0.976  0.934  1.349
15         0.928  3.801  3.912  1.106  1.136  1.584  0.991  0.948  1.370
16         0.957  3.886  4.002  1.134  1.185  1.639  1.003  0.972  1.394
17         0.981  3.978  4.096  1.150  1.244  1.692  1.010  1.020  1.434
18         1.001  4.037  4.159  1.154  1.275  1.718  1.015  1.052  1.460
19         1.016  4.073  4.198  1.150  1.290  1.726  1.016  1.074  1.477
20         1.027  4.085  4.212  1.137  1.291  1.718  1.012  1.085  1.482
21         1.035  4.082  4.211  1.119  1.289  1.705  1.002  1.094  1.482
22         1.037  4.024  4.155  1.096  1.246  1.658  0.985  1.061  1.446
23         1.034  3.932  4.065  1.070  1.188  1.597  0.961  1.009  1.392
24         1.024  3.820  3.954  1.044  1.127  1.534  0.929  0.952  1.328
25         1.007  3.676  3.811  1.017  1.052  1.461  0.893  0.880  1.252
26         0.984  3.538  3.672  0.993  0.995  1.403  0.857  0.829  1.191
27         0.957  3.401  3.533  0.971  0.948  1.356  0.824  0.795  1.144
28         0.927  3.251  3.380  0.952  0.893  1.303  0.798  0.759  1.100
29         0.898  3.125  3.251  0.934  0.860  1.268  0.780  0.750  1.080
30         0.871  2.985  3.109  0.916  0.811  1.221  0.767  0.725  1.054
31         0.847  2.835  2.958  0.892  0.750  1.164  0.757  0.686  1.020
32         0.825  2.700  2.823  0.861  0.710  1.114  0.744  0.660  0.993
33         0.801  2.508  2.632  0.821  0.628  1.032  0.722  0.582  0.926
34         0.769  2.349  2.471  0.772  0.605  0.979  0.685  0.550  0.877
35         0.726  2.106  2.227  0.714  0.532  0.889  0.625  0.449  0.768
36         0.671  1.773  1.895  0.652  0.409  0.768  0.537  0.280  0.604
37         0.590  1.395  1.514  0.585  0.292  0.653  0.419  0.099  0.429
38         0.486  0.959  1.074  0.511  0.136  0.527  0.278  0.132  0.307
39         0.352  0.739  0.818  0.424  0.213  0.473  0.133  0.119  0.178
40         0.190  0.457  0.495  0.277  0.207  0.345  0.103  1.110  1.115

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).


Table A-126. SEE, Bias, and RMSD for ODL ICF method, condition 9
Item Characteristic Curve Method

           Full MIRT            Approx Observed      Approx True
Raw Score  SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.500  0.500  0.000  0.500  0.500  0.155  3.286  3.289
1          0.244  0.518  0.573  0.590  0.106  0.598  0.182  0.225  0.289
2          0.281  0.868  0.912  0.673  0.035  0.672  0.222  0.189  0.291
3          0.353  1.184  1.235  0.684  0.041  0.684  0.253  0.089  0.268
4          0.433  1.397  1.462  0.675  0.018  0.674  0.308  0.055  0.312
5          0.478  1.555  1.627  0.660  0.018  0.658  0.391  0.017  0.391
6          0.541  1.826  1.904  0.643  0.011  0.642  0.493  0.082  0.498
7          0.590  2.085  2.167  0.636  0.059  0.638  0.599  0.172  0.622
8          0.635  2.276  2.362  0.660  0.095  0.665  0.700  0.223  0.733
9          0.701  2.476  2.573  0.721  0.159  0.737  0.791  0.275  0.835
10         0.766  2.672  2.779  0.805  0.239  0.838  0.868  0.318  0.922
11         0.815  2.841  2.955  0.893  0.319  0.946  0.929  0.348  0.990
12         0.861  2.972  3.094  0.976  0.378  1.045  0.976  0.352  1.036
13         0.909  3.088  3.218  1.051  0.429  1.133  1.012  0.355  1.070
14         0.952  3.216  3.353  1.113  0.502  1.218  1.038  0.390  1.106
15         0.988  3.295  3.439  1.160  0.533  1.274  1.057  0.396  1.126
16         1.017  3.367  3.516  1.193  0.564  1.317  1.072  0.413  1.147
17         1.040  3.448  3.601  1.213  0.613  1.357  1.083  0.458  1.173
18         1.057  3.500  3.656  1.221  0.641  1.377  1.090  0.488  1.191
19         1.070  3.532  3.690  1.219  0.658  1.382  1.091  0.512  1.203
20         1.077  3.542  3.702  1.207  0.665  1.376  1.087  0.528  1.206
21         1.080  3.540  3.700  1.187  0.675  1.363  1.075  0.545  1.203
22         1.077  3.485  3.647  1.160  0.650  1.327  1.054  0.525  1.175
23         1.068  3.400  3.563  1.126  0.613  1.280  1.024  0.491  1.133
24         1.051  3.297  3.459  1.087  0.578  1.229  0.985  0.455  1.083
25         1.026  3.167  3.328  1.045  0.532  1.171  0.940  0.408  1.023
26         0.994  3.045  3.203  1.002  0.507  1.121  0.895  0.382  0.971
27         0.957  2.928  3.080  0.960  0.494  1.077  0.852  0.372  0.928
28         0.916  2.800  2.945  0.919  0.472  1.031  0.815  0.356  0.888
29         0.876  2.696  2.834  0.880  0.473  0.997  0.785  0.366  0.864
30         0.837  2.577  2.709  0.841  0.457  0.955  0.758  0.358  0.837
31         0.801  2.448  2.575  0.800  0.428  0.905  0.732  0.337  0.804
32         0.767  2.333  2.455  0.756  0.418  0.862  0.701  0.332  0.774
33         0.729  2.164  2.283  0.706  0.367  0.795  0.660  0.281  0.716
34         0.684  2.031  2.143  0.652  0.375  0.751  0.605  0.282  0.666
35         0.630  1.821  1.926  0.595  0.332  0.680  0.531  0.224  0.575
36         0.567  1.524  1.626  0.538  0.239  0.588  0.437  0.105  0.449
37         0.487  1.190  1.285  0.480  0.152  0.503  0.327  0.019  0.327
38         0.394  0.800  0.891  0.419  0.024  0.419  0.210  0.191  0.284
39         0.272  0.641  0.696  0.353  0.128  0.375  0.106  0.131  0.168
40         0.155  0.410  0.438  0.251  0.139  0.286  0.103  1.110  1.115

Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 127. SEE, Bias, and RMSD for NOP method condition 9
Non Orthogonal Procrustes Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.500   0.500   0.000   0.500   0.500   0.155   3.286   3.289
    1   0.000   0.768   0.768   0.000   0.768   0.768   0.165   0.058   0.175
    2   0.000   1.751   1.751   0.102   1.729   1.732   0.197   0.414   0.458
    3   0.000   2.780   2.780   0.273   2.590   2.604   0.201   0.867   0.890
    4   0.115   3.656   3.658   0.460   2.996   3.031   0.193   1.295   1.309
    5   0.134   4.153   4.155   0.512   3.117   3.159   0.185   1.772   1.781
    6   0.119   4.873   4.875   0.497   3.248   3.286   0.179   2.353   2.360
    7   0.199   5.450   5.453   0.467   3.395   3.427   0.175   2.961   2.966
    8   0.131   5.977   5.978   0.446   3.560   3.588   0.173   3.556   3.560
    9   0.160   6.619   6.621   0.398   3.813   3.834   0.172   4.170   4.174
   10   0.187   7.148   7.150   0.387   4.169   4.186   0.172   4.786   4.789
   11   0.142   7.749   7.750   0.362   4.588   4.603   0.173   5.387   5.390
   12   0.187   8.302   8.304   0.351   5.032   5.045   0.175   5.954   5.956
   13   0.151   8.824   8.826   0.334   5.532   5.542   0.178   6.500   6.502
   14   0.177   9.416   9.418   0.337   6.064   6.073   0.182   7.052   7.054
   15   0.167   9.909   9.911   0.314   6.596   6.604   0.187   7.543   7.545
   16   0.181  10.446  10.447   0.327   7.116   7.123   0.194   8.011   8.013
   17   0.178  10.941  10.942   0.310   7.663   7.670   0.202   8.469   8.472
   18   0.197  11.431  11.433   0.321   8.161   8.167   0.211   8.879   8.881
   19   0.196  11.861  11.862   0.313   8.635   8.640   0.223   9.247   9.249
   20   0.222  12.258  12.260   0.316   9.054   9.060   0.236   9.574   9.576
   21   0.234  12.624  12.626   0.317   9.449   9.455   0.252   9.869   9.872
   22   0.252  12.902  12.904   0.311   9.763   9.768   0.270  10.092  10.096
   23   0.280  13.121  13.124   0.310  10.018  10.023   0.291  10.263  10.267
   24   0.309  13.293  13.296   0.309  10.233  10.237   0.315  10.389  10.394
   25   0.339  13.397  13.402   0.306  10.393  10.398   0.343  10.455  10.461
   26   0.371  13.464  13.470   0.303  10.527  10.531   0.374  10.490  10.496
   27   0.406  13.487  13.493   0.303  10.626  10.630   0.407  10.486  10.493
   28   0.442  13.444  13.451   0.308  10.669  10.674   0.442  10.426  10.436
   29   0.478  13.369  13.378   0.320  10.690  10.695   0.476  10.351  10.362
   30   0.514  13.223  13.233   0.338  10.649  10.654   0.509  10.225  10.237
   31   0.550  13.009  13.020   0.362  10.552  10.559   0.538  10.059  10.073
   32   0.584  12.751  12.764   0.391  10.431  10.438   0.561   9.884   9.900
   33   0.614  12.376  12.391   0.423  10.213  10.222   0.579   9.633   9.650
   34   0.640  11.962  11.979   0.458   9.978   9.989   0.593   9.379   9.398
   35   0.657  11.370  11.389   0.495   9.583   9.596   0.607   8.965   8.985
   36   0.665  10.550  10.571   0.534   8.966   8.982   0.628   8.281   8.305
   37   0.659   9.469   9.492   0.568   8.097   8.117   0.662   7.151   7.182
   38   0.634   7.926   7.951   0.565   6.820   6.844   0.677   5.021   5.066
   39   0.547   5.981   6.006   0.487   5.336   5.358   0.504   1.734   1.806
   40   0.350   3.346   3.364   0.371   3.299   3.320   0.103   1.110   1.115
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 128. SEE, Bias, and RMSD for Min's method condition 10
Min's Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.148   0.148   0.000   0.148   0.148   0.155   2.637   2.642
    1   0.208   0.359   0.414   0.632   0.554   0.839   0.221   0.047   0.225
    2   0.256   0.384   0.461   0.608   0.466   0.765   0.256   0.040   0.259
    3   0.300   0.490   0.574   0.563   0.425   0.704   0.277   0.030   0.278
    4   0.332   0.649   0.728   0.515   0.433   0.672   0.299   0.155   0.337
    5   0.352   0.771   0.847   0.473   0.423   0.634   0.324   0.249   0.408
    6   0.360   0.888   0.958   0.445   0.462   0.641   0.344   0.344   0.487
    7   0.361   0.976   1.040   0.434   0.537   0.690   0.356   0.422   0.551
    8   0.355   1.023   1.083   0.440   0.638   0.774   0.358   0.472   0.592
    9   0.344   1.049   1.104   0.450   0.753   0.877   0.352   0.515   0.623
   10   0.330   1.034   1.085   0.455   0.831   0.947   0.340   0.528   0.628
   11   0.313   0.999   1.047   0.451   0.882   0.990   0.325   0.534   0.624
   12   0.296   0.956   1.001   0.437   0.916   1.014   0.308   0.541   0.622
   13   0.280   0.901   0.944   0.416   0.929   1.017   0.293   0.544   0.617
   14   0.266   0.846   0.887   0.389   0.935   1.013   0.280   0.554   0.621
   15   0.256   0.771   0.812   0.359   0.918   0.986   0.270   0.552   0.614
   16   0.250   0.653   0.699   0.330   0.855   0.917   0.266   0.510   0.575
   17   0.250   0.514   0.571   0.303   0.771   0.829   0.267   0.453   0.526
   18   0.254   0.398   0.472   0.283   0.708   0.763   0.273   0.422   0.502
   19   0.263   0.293   0.393   0.272   0.653   0.707   0.284   0.402   0.492
   20   0.276   0.175   0.327   0.272   0.581   0.642   0.300   0.370   0.475
   21   0.293   0.045   0.296   0.283   0.491   0.566   0.318   0.322   0.452
   22   0.313   0.073   0.321   0.305   0.405   0.507   0.338   0.286   0.442
   23   0.335   0.154   0.368   0.335   0.347   0.482   0.359   0.283   0.456
   24   0.359   0.234   0.427   0.370   0.277   0.462   0.381   0.278   0.471
   25   0.382   0.322   0.499   0.410   0.187   0.449   0.405   0.262   0.482
   26   0.407   0.400   0.570   0.451   0.095   0.460   0.431   0.254   0.499
   27   0.433   0.467   0.636   0.492   0.006   0.491   0.460   0.253   0.524
   28   0.460   0.533   0.703   0.533   0.083   0.538   0.490   0.250   0.549
   29   0.488   0.603   0.775   0.570   0.175   0.595   0.520   0.237   0.571
   30   0.517   0.638   0.820   0.602   0.224   0.641   0.547   0.251   0.601
   31   0.544   0.659   0.853   0.627   0.246   0.672   0.567   0.270   0.627
   32   0.566   0.687   0.890   0.644   0.263   0.695   0.576   0.266   0.633
   33   0.582   0.660   0.879   0.652   0.214   0.685   0.571   0.300   0.643
   34   0.588   0.633   0.863   0.650   0.163   0.668   0.549   0.310   0.629
   35   0.580   0.569   0.812   0.636   0.086   0.640   0.510   0.325   0.604
   36   0.555   0.478   0.731   0.609   0.008   0.608   0.452   0.332   0.560
   37   0.506   0.361   0.620   0.566   0.058   0.567   0.372   0.320   0.490
   38   0.429   0.138   0.449   0.500   0.190   0.534   0.270   0.374   0.461
   39   0.328   0.161   0.365   0.409   0.056   0.411   0.153   0.186   0.241
   40   0.203   0.210   0.291   0.244   0.131   0.276   0.103   1.132   1.136
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 129. SEE, Bias, and RMSD for ODL direct method condition 10
Oshima's Direct Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.148   0.148   0.000   0.148   0.148   0.155   2.637   2.642
    1   0.263   0.285   0.388   0.691   0.196   0.716   0.268   0.214   0.342
    2   0.341   0.291   0.448   0.696   0.117   0.704   0.345   0.272   0.439
    3   0.399   0.388   0.555   0.689   0.090   0.693   0.393   0.243   0.461
    4   0.452   0.549   0.710   0.671   0.118   0.680   0.433   0.132   0.452
    5   0.500   0.690   0.851   0.632   0.136   0.645   0.468   0.020   0.467
    6   0.537   0.838   0.995   0.614   0.191   0.642   0.499   0.118   0.512
    7   0.568   0.966   1.120   0.617   0.279   0.676   0.529   0.254   0.586
    8   0.596   1.060   1.215   0.634   0.395   0.746   0.559   0.371   0.670
    9   0.622   1.135   1.294   0.657   0.541   0.850   0.590   0.483   0.761
   10   0.647   1.169   1.336   0.678   0.667   0.950   0.621   0.566   0.839
   11   0.672   1.184   1.361   0.690   0.780   1.040   0.653   0.640   0.913
   12   0.697   1.190   1.378   0.698   0.884   1.126   0.688   0.713   0.990
   13   0.722   1.183   1.385   0.709   0.974   1.204   0.725   0.782   1.065
   14   0.749   1.176   1.393   0.725   1.062   1.285   0.766   0.858   1.148
   15   0.776   1.148   1.385   0.750   1.130   1.355   0.809   0.920   1.224
   16   0.805   1.076   1.343   0.786   1.156   1.396   0.855   0.943   1.271
   17   0.834   0.984   1.289   0.831   1.162   1.428   0.901   0.950   1.308
   18   0.864   0.913   1.256   0.884   1.192   1.483   0.946   0.980   1.361
   19   0.893   0.851   1.232   0.943   1.231   1.549   0.988   1.018   1.417
   20   0.921   0.775   1.201   1.006   1.253   1.605   1.026   1.039   1.458
   21   0.946   0.682   1.164   1.069   1.256   1.648   1.057   1.039   1.480
   22   0.970   0.598   1.137   1.132   1.262   1.694   1.084   1.042   1.501
   23   0.992   0.547   1.131   1.194   1.294   1.759   1.106   1.073   1.539
   24   1.013   0.492   1.124   1.254   1.312   1.813   1.128   1.096   1.570
   25   1.034   0.426   1.116   1.310   1.306   1.848   1.151   1.104   1.593
   26   1.056   0.367   1.115   1.363   1.293   1.876   1.178   1.120   1.623
   27   1.080   0.315   1.122   1.409   1.273   1.897   1.209   1.142   1.661
   28   1.105   0.265   1.133   1.448   1.240   1.904   1.240   1.163   1.698
   29   1.130   0.210   1.146   1.478   1.188   1.893   1.267   1.172   1.723
   30   1.153   0.189   1.165   1.496   1.158   1.889   1.283   1.201   1.755
   31   1.170   0.179   1.181   1.502   1.131   1.877   1.284   1.222   1.770
   32   1.178   0.153   1.184   1.494   1.084   1.843   1.264   1.203   1.743
   33   1.170   0.171   1.179   1.472   1.078   1.821   1.221   1.198   1.708
   34   1.143   0.172   1.153   1.432   1.053   1.775   1.152   1.141   1.619
   35   1.091   0.189   1.105   1.374   1.037   1.719   1.052   1.063   1.494
   36   1.009   0.210   1.028   1.293   1.015   1.641   0.920   0.956   1.325
   37   0.892   0.230   0.919   1.182   0.974   1.529   0.757   0.821   1.116
   38   0.742   0.337   0.813   1.034   0.994   1.433   0.562   0.753   0.939
   39   0.554   0.190   0.584   0.804   0.684   1.054   0.310   0.423   0.524
   40   0.277   0.044   0.280   0.397   0.005   0.396   0.103   1.132   1.136
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 130. SEE, Bias, and RMSD for ODL TCF method condition 10
Test Characteristic Curve Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.148   0.148   0.000   0.148   0.148   0.155   2.637   2.642
    1   0.328   0.968   1.022   0.718   1.689   1.835   0.276   0.386   0.474
    2   0.372   1.136   1.195   0.678   1.637   1.771   0.331   0.554   0.645
    3   0.425   1.346   1.411   0.651   1.604   1.731   0.388   0.760   0.853
    4   0.472   1.584   1.653   0.626   1.607   1.724   0.447   1.018   1.111
    5   0.500   1.745   1.815   0.617   1.609   1.723   0.493   1.210   1.306
    6   0.511   1.856   1.925   0.625   1.678   1.790   0.519   1.348   1.444
    7   0.510   1.900   1.967   0.651   1.808   1.921   0.528   1.411   1.506
    8   0.498   1.874   1.939   0.679   1.957   2.071   0.522   1.404   1.498
    9   0.480   1.805   1.868   0.691   2.086   2.197   0.508   1.357   1.448
   10   0.458   1.680   1.741   0.684   2.132   2.239   0.487   1.258   1.348
   11   0.435   1.525   1.586   0.661   2.110   2.211   0.463   1.135   1.226
   12   0.411   1.356   1.417   0.629   2.033   2.128   0.438   1.004   1.095
   13   0.389   1.169   1.231   0.589   1.907   1.995   0.415   0.860   0.955
   14   0.369   0.977   1.044   0.545   1.751   1.834   0.394   0.719   0.819
   15   0.352   0.764   0.841   0.498   1.555   1.632   0.377   0.560   0.674
   16   0.341   0.506   0.609   0.451   1.298   1.374   0.364   0.359   0.511
   17   0.334   0.227   0.403   0.407   1.010   1.088   0.358   0.140   0.384
   18   0.333   0.030   0.334   0.369   0.735   0.822   0.358   0.054   0.362
   19   0.339   0.276   0.437   0.342   0.462   0.575   0.365   0.237   0.435
   20   0.350   0.534   0.638   0.330   0.169   0.370   0.378   0.431   0.573
   21   0.366   0.802   0.882   0.337   0.145   0.366   0.396   0.636   0.748
   22   0.387   1.056   1.124   0.361   0.456   0.581   0.418   0.825   0.924
   23   0.410   1.271   1.335   0.400   0.739   0.840   0.443   0.973   1.069
   24   0.436   1.482   1.544   0.452   1.031   1.125   0.468   1.119   1.212
   25   0.463   1.698   1.760   0.509   1.341   1.434   0.495   1.270   1.363
   26   0.493   1.903   1.966   0.571   1.646   1.742   0.523   1.413   1.506
   27   0.524   2.098   2.162   0.635   1.942   2.043   0.553   1.550   1.645
   28   0.556   2.292   2.358   0.699   2.229   2.335   0.586   1.693   1.791
   29   0.589   2.490   2.559   0.762   2.504   2.617   0.624   1.852   1.953
   30   0.625   2.654   2.727   0.821   2.718   2.839   0.664   1.983   2.091
   31   0.663   2.804   2.881   0.877   2.882   3.012   0.706   2.103   2.218
   32   0.700   2.957   3.038   0.925   3.011   3.149   0.747   2.227   2.348
   33   0.736   3.041   3.129   0.965   3.034   3.183   0.782   2.282   2.412
   34   0.767   3.102   3.195   0.992   3.002   3.161   0.808   2.309   2.445
   35   0.791   3.085   3.184   1.001   2.876   3.044   0.815   2.248   2.390
   36   0.801   2.972   3.078   0.986   2.665   2.841   0.791   2.072   2.217
   37   0.783   2.726   2.836   0.941   2.374   2.552   0.719   1.742   1.884
   38   0.716   2.216   2.328   0.858   1.915   2.098   0.578   1.137   1.275
   39   0.579   1.751   1.844   0.723   1.605   1.760   0.331   0.565   0.655
   40   0.366   1.127   1.184   0.485   1.176   1.271   0.103   1.132   1.136
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 131. SEE, Bias, and RMSD for ODL ICF method condition 10
Item Characteristic Curve Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.148   0.148   0.000   0.148   0.148   0.155   2.637   2.642
    1   0.211   0.149   0.257   0.587   0.066   0.589   0.201   0.235   0.309
    2   0.234   0.107   0.257   0.585   0.178   0.611   0.241   0.335   0.413
    3   0.257   0.138   0.291   0.557   0.221   0.598   0.269   0.369   0.456
    4   0.282   0.222   0.359   0.523   0.218   0.566   0.298   0.343   0.454
    5   0.303   0.281   0.413   0.492   0.230   0.542   0.326   0.332   0.465
    6   0.316   0.350   0.471   0.467   0.216   0.513   0.347   0.300   0.458
    7   0.320   0.405   0.516   0.454   0.188   0.491   0.358   0.268   0.446
    8   0.316   0.433   0.536   0.455   0.152   0.479   0.358   0.245   0.433
    9   0.307   0.451   0.545   0.462   0.095   0.471   0.350   0.214   0.409
   10   0.293   0.434   0.524   0.465   0.055   0.467   0.335   0.201   0.390
   11   0.277   0.405   0.490   0.458   0.023   0.458   0.316   0.187   0.367
   12   0.259   0.370   0.452   0.443   0.007   0.442   0.295   0.165   0.337
   13   0.241   0.326   0.405   0.420   0.030   0.420   0.272   0.143   0.307
   14   0.223   0.284   0.361   0.390   0.058   0.393   0.250   0.110   0.272
   15   0.205   0.225   0.304   0.355   0.072   0.361   0.228   0.087   0.243
   16   0.188   0.126   0.226   0.317   0.049   0.320   0.207   0.100   0.230
   17   0.175   0.010   0.174   0.277   0.013   0.277   0.190   0.125   0.228
   18   0.165   0.081   0.183   0.239   0.005   0.238   0.178   0.122   0.215
   19   0.159   0.160   0.225   0.205   0.012   0.205   0.172   0.103   0.200
   20   0.159   0.249   0.295   0.179   0.009   0.178   0.172   0.093   0.195
   21   0.164   0.350   0.386   0.166   0.006   0.166   0.180   0.095   0.203
   22   0.174   0.437   0.470   0.172   0.010   0.172   0.194   0.085   0.211
   23   0.187   0.486   0.521   0.195   0.017   0.195   0.211   0.040   0.215
   24   0.202   0.536   0.573   0.230   0.038   0.233   0.231   0.001   0.231
   25   0.219   0.596   0.635   0.273   0.041   0.276   0.252   0.022   0.252
   26   0.237   0.648   0.690   0.321   0.044   0.323   0.273   0.047   0.276
   27   0.256   0.694   0.739   0.370   0.048   0.372   0.295   0.075   0.303
   28   0.275   0.740   0.789   0.418   0.047   0.420   0.317   0.098   0.331
   29   0.295   0.793   0.846   0.463   0.036   0.464   0.341   0.111   0.358
   30   0.316   0.811   0.870   0.504   0.059   0.506   0.365   0.153   0.395
   31   0.337   0.815   0.882   0.538   0.096   0.545   0.388   0.203   0.437
   32   0.359   0.826   0.900   0.563   0.122   0.575   0.408   0.235   0.470
   33   0.380   0.780   0.867   0.581   0.197   0.612   0.423   0.305   0.521
   34   0.397   0.732   0.833   0.590   0.259   0.643   0.429   0.350   0.553
   35   0.408   0.649   0.766   0.593   0.332   0.678   0.422   0.396   0.578
   36   0.407   0.540   0.676   0.584   0.400   0.707   0.397   0.428   0.583
   37   0.388   0.409   0.563   0.560   0.448   0.716   0.353   0.441   0.564
   38   0.346   0.175   0.387   0.509   0.557   0.753   0.284   0.522   0.594
   39   0.281   0.186   0.336   0.421   0.386   0.570   0.171   0.332   0.373
   40   0.198   0.198   0.280   0.234   0.042   0.237   0.103   1.132   1.136
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 132. SEE, Bias, and RMSD for NOP method condition 10
Non Orthogonal Procrustes Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.148   0.148   0.000   0.148   0.148   0.155   2.637   2.642
    1   0.171   0.267   0.316   0.564   0.128   0.577   0.219   0.237   0.322
    2   0.190   0.265   0.326   0.547   0.056   0.548   0.253   0.303   0.394
    3   0.217   0.355   0.415   0.507   0.035   0.507   0.255   0.282   0.379
    4   0.245   0.512   0.568   0.465   0.063   0.469   0.249   0.180   0.307
    5   0.275   0.653   0.708   0.421   0.073   0.427   0.254   0.075   0.264
    6   0.305   0.805   0.860   0.383   0.122   0.401   0.276   0.061   0.282
    7   0.338   0.938   0.997   0.362   0.200   0.412   0.313   0.198   0.370
    8   0.373   1.038   1.102   0.363   0.305   0.474   0.357   0.321   0.479
    9   0.410   1.120   1.192   0.388   0.443   0.589   0.404   0.442   0.598
   10   0.448   1.162   1.245   0.429   0.569   0.712   0.451   0.534   0.698
   11   0.487   1.184   1.280   0.474   0.688   0.835   0.498   0.618   0.793
   12   0.526   1.199   1.308   0.520   0.802   0.955   0.546   0.703   0.889
   13   0.565   1.200   1.326   0.567   0.904   1.066   0.595   0.783   0.982
   14   0.605   1.200   1.344   0.615   1.007   1.179   0.644   0.870   1.081
   15   0.644   1.181   1.345   0.667   1.092   1.279   0.693   0.943   1.170
   16   0.683   1.118   1.309   0.722   1.136   1.345   0.742   0.979   1.228
   17   0.720   1.035   1.260   0.778   1.162   1.398   0.789   0.999   1.272
   18   0.755   0.973   1.231   0.834   1.214   1.471   0.833   1.042   1.333
   19   0.787   0.921   1.210   0.888   1.275   1.552   0.871   1.094   1.397
   20   0.816   0.853   1.179   0.940   1.321   1.619   0.902   1.128   1.443
   21   0.842   0.769   1.139   0.986   1.348   1.668   0.927   1.140   1.468
   22   0.864   0.693   1.106   1.027   1.379   1.717   0.946   1.156   1.492
   23   0.884   0.650   1.095   1.062   1.435   1.783   0.961   1.198   1.534
   24   0.901   0.603   1.082   1.089   1.477   1.833   0.976   1.231   1.569
   25   0.917   0.543   1.064   1.110   1.494   1.859   0.991   1.248   1.592
   26   0.933   0.490   1.052   1.122   1.504   1.875   1.010   1.272   1.622
   27   0.951   0.445   1.048   1.126   1.506   1.878   1.030   1.303   1.659
   28   0.968   0.402   1.046   1.120   1.492   1.864   1.047   1.334   1.694
   29   0.984   0.354   1.043   1.103   1.458   1.826   1.056   1.353   1.714
   30   0.994   0.340   1.049   1.076   1.443   1.798   1.050   1.393   1.743
   31   0.996   0.338   1.050   1.039   1.428   1.764   1.023   1.425   1.753
   32   0.986   0.321   1.035   0.993   1.389   1.706   0.972   1.416   1.716
   33   0.959   0.347   1.018   0.937   1.386   1.671   0.897   1.417   1.676
   34   0.913   0.354   0.977   0.878   1.358   1.616   0.800   1.362   1.579
   35   0.849   0.373   0.925   0.807   1.331   1.556   0.684   1.279   1.449
   36   0.762   0.388   0.854   0.732   1.290   1.483   0.556   1.154   1.280
   37   0.655   0.396   0.764   0.639   1.219   1.375   0.422   0.987   1.073
   38   0.529   0.481   0.714   0.540   1.201   1.316   0.292   0.874   0.921
   39   0.377   0.297   0.479   0.392   0.846   0.932   0.176   0.480   0.511
   40   0.146   0.019   0.147   0.087   0.094   0.128   0.103   1.132   1.136
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 133. SEE, Bias, and RMSD for Min's method condition 11
Min's Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.054   0.054   0.000   0.054   0.054   0.155   2.732   2.736
    1   0.194   0.591   0.622   0.363   0.638   0.734   0.189   0.009   0.188
    2   0.217   1.028   1.051   0.560   1.015   1.159   0.221   0.207   0.303
    3   0.188   1.381   1.394   0.573   1.193   1.323   0.230   0.474   0.527
    4   0.242   1.679   1.696   0.546   1.244   1.358   0.234   0.726   0.763
    5   0.251   2.054   2.070   0.515   1.297   1.395   0.241   1.052   1.079
    6   0.237   2.387   2.398   0.470   1.328   1.408   0.252   1.390   1.412
    7   0.266   2.709   2.721   0.427   1.355   1.421   0.264   1.706   1.726
    8   0.254   3.057   3.067   0.407   1.438   1.494   0.275   2.028   2.047
    9   0.260   3.393   3.403   0.404   1.599   1.649   0.285   2.339   2.356
   10   0.261   3.751   3.760   0.403   1.830   1.874   0.292   2.632   2.648
   11   0.251   4.103   4.111   0.413   2.105   2.145   0.296   2.909   2.924
   12   0.254   4.458   4.466   0.424   2.402   2.438   0.295   3.165   3.178
   13   0.242   4.813   4.819   0.434   2.702   2.736   0.289   3.396   3.409
   14   0.232   5.130   5.135   0.438   2.973   3.005   0.278   3.588   3.599
   15   0.226   5.436   5.440   0.439   3.224   3.253   0.264   3.757   3.766
   16   0.209   5.745   5.748   0.433   3.468   3.494   0.247   3.926   3.934
   17   0.194   6.025   6.028   0.420   3.680   3.704   0.228   4.076   4.082
   18   0.183   6.277   6.279   0.399   3.855   3.875   0.209   4.201   4.206
   19   0.169   6.509   6.511   0.372   4.002   4.019   0.190   4.307   4.312
   20   0.154   6.726   6.728   0.340   4.141   4.155   0.174   4.409   4.412
   21   0.145   6.958   6.959   0.305   4.306   4.317   0.162   4.532   4.535
   22   0.144   7.145   7.146   0.267   4.441   4.449   0.157   4.611   4.614
   23   0.150   7.262   7.263   0.228   4.524   4.530   0.160   4.617   4.620
   24   0.163   7.351   7.352   0.192   4.604   4.608   0.174   4.595   4.598
   25   0.184   7.406   7.408   0.166   4.680   4.683   0.195   4.540   4.544
   26   0.212   7.436   7.439   0.155   4.758   4.760   0.222   4.468   4.474
   27   0.244   7.432   7.436   0.166   4.824   4.827   0.251   4.381   4.388
   28   0.279   7.387   7.392   0.195   4.867   4.871   0.280   4.285   4.294
   29   0.314   7.346   7.353   0.235   4.921   4.927   0.306   4.233   4.244
   30   0.348   7.278   7.286   0.282   4.943   4.951   0.331   4.190   4.203
   31   0.381   7.187   7.197   0.331   4.929   4.940   0.355   4.156   4.171
   32   0.411   7.083   7.095   0.381   4.881   4.896   0.380   4.127   4.145
   33   0.441   6.898   6.912   0.432   4.728   4.748   0.407   4.025   4.045
   34   0.471   6.682   6.698   0.481   4.529   4.555   0.437   3.890   3.914
   35   0.504   6.353   6.373   0.524   4.225   4.257   0.470   3.631   3.661
   36   0.541   5.817   5.842   0.553   3.774   3.814   0.498   3.151   3.190
   37   0.572   5.015   5.047   0.557   3.208   3.256   0.499   2.403   2.454
   38   0.560   3.852   3.892   0.528   2.540   2.594   0.423   1.368   1.432
   39   0.448   2.539   2.579   0.459   1.937   1.991   0.244   0.339   0.417
   40   0.248   1.219   1.244   0.321   1.177   1.220   0.103   0.940   0.946
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 134. SEE, Bias, and RMSD for ODL direct method condition 11
Oshima's Direct Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.054   0.054   0.000   0.054   0.054   0.155   2.732   2.736
    1   0.675   0.157   0.692   1.400   0.580   1.512   0.570   0.225   0.611
    2   1.029   0.533   1.156   1.759   0.280   1.777   0.882   0.140   0.891
    3   1.329   0.899   1.602   2.089   0.007   2.084   1.152   0.004   1.149
    4   1.594   1.179   1.980   2.393   0.169   2.393   1.393   0.071   1.391
    5   1.837   1.485   2.359   2.682   0.303   2.693   1.600   0.187   1.607
    6   2.042   1.763   2.694   2.958   0.361   2.972   1.777   0.307   1.799
    7   2.201   2.005   2.973   3.208   0.327   3.217   1.930   0.418   1.970
    8   2.323   2.256   3.234   3.432   0.283   3.435   2.066   0.558   2.135
    9   2.397   2.498   3.458   3.585   0.256   3.586   2.184   0.713   2.293
   10   2.458   2.745   3.680   3.674   0.282   3.676   2.282   0.876   2.439
   11   2.494   2.994   3.892   3.696   0.362   3.705   2.358   1.048   2.575
   12   2.506   3.242   4.094   3.674   0.489   3.698   2.413   1.219   2.698
   13   2.509   3.483   4.290   3.629   0.658   3.679   2.447   1.385   2.807
   14   2.494   3.692   4.452   3.557   0.831   3.644   2.464   1.526   2.893
   15   2.463   3.880   4.592   3.466   1.014   3.603   2.467   1.653   2.965
   16   2.419   4.064   4.726   3.354   1.217   3.560   2.461   1.787   3.036
   17   2.372   4.219   4.837   3.224   1.410   3.511   2.450   1.907   3.100
   18   2.327   4.335   4.918   3.060   1.575   3.434   2.440   2.006   3.154
   19   2.297   4.420   4.979   2.874   1.720   3.343   2.436   2.091   3.206
   20   2.297   4.487   5.038   2.678   1.858   3.254   2.440   2.177   3.265
   21   2.337   4.566   5.127   2.502   2.025   3.214   2.451   2.292   3.352
   22   2.428   4.595   5.194   2.382   2.163   3.213   2.470   2.375   3.422
   23   2.568   4.551   5.222   2.352   2.252   3.252   2.493   2.400   3.456
   24   2.750   4.484   5.257   2.426   2.344   3.369   2.520   2.410   3.482
   25   2.968   4.393   5.297   2.581   2.441   3.548   2.554   2.405   3.503
   26   3.209   4.289   5.352   2.793   2.551   3.778   2.598   2.397   3.529
   27   3.457   4.167   5.409   3.048   2.661   4.040   2.652   2.383   3.560
   28   3.698   4.024   5.459   3.329   2.755   4.315   2.708   2.367   3.592
   29   3.919   3.908   5.528   3.612   2.873   4.608   2.758   2.395   3.648
   30   4.107   3.787   5.578   3.870   2.973   4.872   2.793   2.430   3.697
   31   4.233   3.673   5.597   4.080   3.056   5.090   2.810   2.471   3.737
   32   4.299   3.574   5.582   4.235   3.130   5.258   2.808   2.520   3.768
   33   4.281   3.427   5.475   4.228   3.163   5.272   2.788   2.504   3.742
   34   4.180   3.286   5.309   4.113   3.196   5.201   2.744   2.478   3.692
   35   3.980   3.087   5.029   3.932   3.148   5.029   2.648   2.373   3.551
   36   3.678   2.759   4.591   3.655   2.966   4.700   2.464   2.117   3.244
   37   3.250   2.296   3.973   3.249   2.653   4.188   2.161   1.684   2.735
   38   2.644   1.685   3.130   2.705   2.169   3.462   1.687   1.022   1.969
   39   1.820   1.139   2.143   1.975   1.702   2.604   0.920   0.312   0.969
   40   0.803   0.622   1.014   1.019   1.096   1.495   0.103   0.940   0.946
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 135. SEE, Bias, and RMSD for ODL TCF method condition 11
Test Characteristic Curve Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.054   0.054   0.000   0.054   0.054   0.155   2.732   2.736
    1   0.435   0.553   0.703   1.034   1.493   1.815   0.357   0.514   0.625
    2   0.538   0.517   0.745   1.086   1.452   1.812   0.486   0.646   0.807
    3   0.635   0.423   0.762   1.105   1.363   1.753   0.618   0.724   0.950
    4   0.728   0.365   0.813   1.113   1.352   1.750   0.756   0.875   1.155
    5   0.811   0.254   0.848   1.141   1.340   1.758   0.880   0.988   1.322
    6   0.884   0.126   0.891   1.194   1.378   1.821   0.979   1.080   1.456
    7   0.931   0.003   0.928   1.254   1.483   1.940   1.054   1.158   1.564
    8   0.977   0.155   0.987   1.319   1.596   2.069   1.110   1.181   1.619
    9   1.010   0.340   1.063   1.365   1.691   2.171   1.153   1.163   1.636
   10   1.037   0.553   1.173   1.379   1.741   2.219   1.186   1.109   1.622
   11   1.061   0.790   1.321   1.372   1.732   2.208   1.212   1.019   1.581
   12   1.079   1.042   1.498   1.366   1.665   2.151   1.231   0.903   1.524
   13   1.093   1.308   1.702   1.349   1.551   2.054   1.245   0.768   1.460
   14   1.100   1.565   1.911   1.329   1.415   1.938   1.254   0.637   1.404
   15   1.105   1.819   2.126   1.307   1.251   1.807   1.257   0.501   1.350
   16   1.107   2.090   2.363   1.286   1.048   1.656   1.255   0.341   1.298
   17   1.106   2.355   2.600   1.263   0.830   1.508   1.249   0.183   1.260
   18   1.102   2.600   2.823   1.236   0.609   1.375   1.240   0.034   1.237
   19   1.098   2.834   3.038   1.206   0.381   1.262   1.227   0.111   1.229
   20   1.095   3.070   3.258   1.173   0.131   1.177   1.210   0.266   1.236
   21   1.091   3.334   3.507   1.139   0.172   1.149   1.191   0.460   1.274
   22   1.088   3.564   3.725   1.105   0.467   1.197   1.168   0.632   1.325
   23   1.087   3.735   3.889   1.072   0.730   1.295   1.143   0.756   1.368
   24   1.088   3.894   4.042   1.043   1.008   1.449   1.118   0.879   1.420
   25   1.092   4.037   4.182   1.019   1.295   1.647   1.093   1.000   1.479
   26   1.100   4.174   4.316   1.003   1.596   1.883   1.072   1.129   1.555
   27   1.112   4.294   4.435   0.995   1.891   2.136   1.056   1.260   1.642
   28   1.129   4.391   4.533   0.999   2.162   2.381   1.047   1.385   1.734
   29   1.151   4.508   4.652   1.015   2.441   2.643   1.046   1.545   1.865
   30   1.180   4.606   4.754   1.040   2.677   2.871   1.052   1.700   1.998
   31   1.212   4.689   4.843   1.071   2.866   3.059   1.063   1.848   2.131
   32   1.244   4.764   4.923   1.105   3.011   3.206   1.075   1.994   2.264
   33   1.269   4.763   4.928   1.137   3.044   3.249   1.082   2.068   2.333
   34   1.280   4.740   4.909   1.162   3.032   3.246   1.079   2.129   2.385
   35   1.275   4.621   4.793   1.170   2.924   3.148   1.055   2.109   2.357
   36   1.245   4.328   4.502   1.148   2.683   2.917   0.994   1.939   2.178
   37   1.156   3.825   3.995   1.080   2.337   2.574   0.871   1.598   1.819
   38   0.997   3.038   3.196   0.928   1.883   2.098   0.654   1.028   1.218
   39   0.725   2.126   2.246   0.725   1.476   1.643   0.356   0.355   0.502
   40   0.379   1.081   1.145   0.454   0.915   1.021   0.103   0.940   0.946
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 136. SEE, Bias, and RMSD for ODL ICF method condition 11
Item Characteristic Curve Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.054   0.054   0.000   0.054   0.054   0.155   2.732   2.736
    1   0.147   0.070   0.163   0.527   0.094   0.534   0.181   0.030   0.183
    2   0.156   0.190   0.246   0.505   0.029   0.505   0.208   0.040   0.211
    3   0.167   0.345   0.384   0.481   0.051   0.482   0.217   0.128   0.252
    4   0.177   0.439   0.474   0.452   0.048   0.454   0.224   0.133   0.260
    5   0.191   0.570   0.601   0.418   0.062   0.421   0.231   0.146   0.273
    6   0.200   0.708   0.735   0.385   0.049   0.387   0.237   0.128   0.269
    7   0.197   0.823   0.846   0.354   0.003   0.353   0.240   0.067   0.249
    8   0.189   0.946   0.965   0.336   0.038   0.337   0.238   0.009   0.238
    9   0.182   1.069   1.084   0.334   0.081   0.342   0.232   0.050   0.237
   10   0.176   1.197   1.210   0.335   0.117   0.354   0.222   0.106   0.246
   11   0.172   1.340   1.351   0.334   0.141   0.361   0.211   0.150   0.258
   12   0.167   1.494   1.503   0.326   0.157   0.361   0.199   0.183   0.270
   13   0.160   1.648   1.656   0.312   0.170   0.355   0.189   0.209   0.281
   14   0.153   1.781   1.787   0.292   0.199   0.353   0.180   0.246   0.304
   15   0.149   1.904   1.910   0.268   0.232   0.354   0.173   0.283   0.332
   16   0.147   2.037   2.042   0.241   0.251   0.348   0.170   0.303   0.347
   17   0.150   2.158   2.163   0.215   0.275   0.349   0.169   0.327   0.368
   18   0.156   2.258   2.263   0.192   0.312   0.367   0.172   0.365   0.403
   19   0.164   2.341   2.347   0.176   0.353   0.394   0.180   0.408   0.445
   20   0.174   2.421   2.427   0.168   0.379   0.415   0.190   0.440   0.479
   21   0.188   2.525   2.532   0.170   0.359   0.397   0.203   0.430   0.476
   22   0.205   2.593   2.601   0.184   0.351   0.396   0.218   0.436   0.488
   23   0.223   2.599   2.609   0.206   0.374   0.427   0.233   0.480   0.534
   24   0.243   2.594   2.605   0.235   0.380   0.447   0.249   0.513   0.570
   25   0.263   2.575   2.589   0.269   0.371   0.458   0.264   0.536   0.597
   26   0.283   2.555   2.571   0.305   0.341   0.457   0.279   0.543   0.610
   27   0.303   2.525   2.543   0.343   0.305   0.458   0.294   0.545   0.619
   28   0.322   2.479   2.500   0.380   0.278   0.470   0.310   0.556   0.636
   29   0.341   2.459   2.482   0.415   0.224   0.471   0.328   0.536   0.628
   30   0.360   2.423   2.450   0.446   0.189   0.483   0.347   0.527   0.630
   31   0.380   2.374   2.404   0.470   0.171   0.499   0.367   0.521   0.637
   32   0.400   2.314   2.348   0.486   0.159   0.510   0.384   0.507   0.636
   33   0.421   2.176   2.216   0.494   0.212   0.537   0.398   0.538   0.669
   34   0.442   2.022   2.070   0.495   0.250   0.553   0.404   0.535   0.670
   35   0.460   1.795   1.853   0.490   0.308   0.577   0.396   0.538   0.668
   36   0.467   1.456   1.529   0.480   0.405   0.627   0.370   0.573   0.682
   37   0.446   1.052   1.142   0.463   0.495   0.677   0.323   0.608   0.688
   38   0.394   0.610   0.725   0.429   0.562   0.706   0.256   0.639   0.688
   39   0.291   0.350   0.455   0.362   0.427   0.559   0.154   0.484   0.507
   40   0.168   0.188   0.252   0.215   0.214   0.303   0.103   0.940   0.946
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 137. SEE, Bias, RMSD for NOP method condition 11
Non Orthogonal Procrustes Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.054   0.054   0.000   0.054   0.054   0.155   2.732   2.736
    1   0.000   0.974   0.974   0.009   0.974   0.974   0.158   0.307   0.345
    2   0.139   1.936   1.941   0.111   2.010   2.013   0.186   0.728   0.751
    3   0.092   2.487   2.489   0.311   2.868   2.885   0.190   1.201   1.216
    4   0.179   3.029   3.035   0.487   3.261   3.297   0.183   1.643   1.653
    5   0.125   3.492   3.494   0.488   3.397   3.432   0.175   2.153   2.160
    6   0.171   4.036   4.039   0.466   3.466   3.497   0.170   2.681   2.687
    7   0.148   4.452   4.455   0.440   3.521   3.548   0.166   3.202   3.206
    8   0.160   4.983   4.986   0.419   3.655   3.679   0.164   3.744   3.748
    9   0.167   5.430   5.433   0.388   3.878   3.897   0.163   4.289   4.292
   10   0.157   5.958   5.960   0.338   4.161   4.175   0.164   4.829   4.832
   11   0.177   6.437   6.439   0.343   4.534   4.546   0.165   5.363   5.366
   12   0.163   6.957   6.959   0.309   4.987   4.997   0.169   5.879   5.882
   13   0.182   7.445   7.447   0.323   5.449   5.459   0.175   6.368   6.371
   14   0.176   7.924   7.926   0.294   5.936   5.943   0.183   6.805   6.808
   15   0.194   8.366   8.369   0.310   6.399   6.406   0.194   7.200   7.202
   16   0.198   8.823   8.825   0.291   6.878   6.884   0.208   7.569   7.571
   17   0.214   9.232   9.234   0.302   7.313   7.319   0.225   7.891   7.894
   18   0.230   9.620   9.623   0.296   7.711   7.717   0.245   8.160   8.164
   19   0.246   9.961   9.964   0.297   8.050   8.056   0.268   8.386   8.391
   20   0.273  10.287  10.290   0.302   8.362   8.368   0.295   8.587   8.592
   21   0.296  10.622  10.626   0.299   8.665   8.670   0.324   8.793   8.799
   22   0.322  10.897  10.902   0.300   8.897   8.902   0.356   8.942   8.949
   23   0.354  11.094  11.099   0.305   9.043   9.048   0.391   9.009   9.017
   24   0.388  11.258  11.264   0.313   9.153   9.159   0.429   9.036   9.047
   25   0.424  11.383  11.391   0.324   9.228   9.233   0.469   9.020   9.032
   26   0.461  11.476  11.485   0.340   9.274   9.281   0.512   8.965   8.979
   27   0.500  11.526  11.537   0.362   9.282   9.289   0.555   8.864   8.882
   28   0.540  11.522  11.535   0.391   9.240   9.249   0.596   8.716   8.736
   29   0.581  11.506  11.521   0.424   9.189   9.198   0.633   8.570   8.593
   30   0.620  11.438  11.454   0.459   9.089   9.100   0.662   8.397   8.423
   31   0.658  11.319  11.338   0.496   8.945   8.959   0.683   8.209   8.237
   32   0.692  11.157  11.178   0.533   8.768   8.784   0.695   8.018   8.048
   33   0.723  10.881  10.904   0.569   8.490   8.509   0.702   7.756   7.787
   34   0.749  10.543  10.570   0.606   8.167   8.189   0.709   7.469   7.503
   35   0.771  10.066  10.095   0.644   7.721   7.747   0.722   7.056   7.092
   36   0.789   9.354   9.387   0.681   7.070   7.102   0.747   6.366   6.409
   37   0.802   8.325   8.364   0.703   6.171   6.210   0.776   5.199   5.256
   38   0.791   6.773   6.818   0.668   4.961   5.006   0.717   3.181   3.261
   39   0.663   4.641   4.688   0.553   3.619   3.661   0.436   0.710   0.833
   40   0.352   2.159   2.187   0.373   2.030   2.064   0.103   0.940   0.946
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A 138. SEE, Bias, and RMSD for Min's method condition 12
Min's Method

Raw          Full MIRT            Approx Observed           Approx True
Score     SEE    Bias    RMSD     SEE    Bias    RMSD     SEE    Bias    RMSD
    0   0.000   0.266   0.266   0.000   0.266   0.266   0.155   3.052   3.055
    1   0.126   0.844   0.854   0.392   0.557   0.680   0.202   0.075   0.215
    2   0.154   1.354   1.362   0.587   0.960   1.125   0.235   0.162   0.285
    3   0.262   1.651   1.671   0.608   1.005   1.174   0.246   0.294   0.383
    4   0.234   2.050   2.063   0.575   1.079   1.222   0.254   0.553   0.608
    5   0.256   2.397   2.410   0.541   1.126   1.248   0.264   0.867   0.906
    6   0.279   2.769   2.783   0.502   1.146   1.251   0.277   1.189   1.221
    7   0.262   3.141   3.152   0.469   1.204   1.292   0.290   1.542   1.569
    8   0.289   3.510   3.522   0.449   1.304   1.379   0.302   1.879   1.903
    9   0.271   3.908   3.918   0.450   1.502   1.568   0.312   2.228   2.250
   10   0.283   4.308   4.317   0.448   1.771   1.827   0.319   2.573   2.593
   11   0.276   4.685   4.693   0.452   2.057   2.106   0.321   2.866   2.883
   12   0.268   5.031   5.038   0.465   2.346   2.392   0.319   3.122   3.138
   13   0.270   5.383   5.389   0.474   2.646   2.688   0.312   3.365   3.379
   14   0.253   5.720   5.725   0.476   2.940   2.978   0.300   3.585   3.597
   15   0.243   6.007   6.012   0.476   3.193   3.228   0.284   3.761   3.771
   16   0.235   6.303   6.307   0.470   3.446   3.478   0.265   3.945   3.953
   17   0.215   6.599   6.602   0.454   3.696   3.723   0.244   4.134   4.141
   18   0.196   6.857   6.860   0.431   3.914   3.938   0.222   4.303   4.309
   19   0.181   7.085   7.087   0.403   4.104   4.124   0.200   4.453   4.457
   20   0.167   7.281   7.283   0.371   4.266   4.282   0.179   4.576   4.580
   21   0.153   7.475   7.477   0.333   4.439   4.452   0.162   4.707   4.710
   22   0.144   7.624   7.625   0.293   4.590   4.599   0.152   4.800   4.803
   23   0.144   7.705   7.706   0.252   4.698   4.704   0.151   4.831   4.833
   24   0.156   7.753   7.755   0.214   4.803   4.807   0.161   4.831   4.834
   25   0.178   7.770   7.772   0.183   4.906   4.910   0.182   4.802   4.806
   26   0.208   7.755   7.758   0.166   5.009   5.012   0.210   4.751   4.756
   27   0.244   7.695   7.699   0.168   5.093   5.095   0.240   4.675   4.681
   28   0.281   7.604   7.609   0.188   5.161   5.164   0.270   4.597   4.605
   29   0.319   7.517   7.523   0.222   5.238   5.243   0.298   4.562   4.571
   30   0.353   7.380   7.388   0.264   5.257   5.263   0.322   4.511   4.522
   31   0.385   7.208   7.219   0.309   5.221   5.230   0.345   4.451   4.465
   32   0.413   7.056   7.068   0.357   5.175   5.187   0.368   4.425   4.440
   33   0.440   6.850   6.864   0.405   5.044   5.060   0.393   4.346   4.364
   34   0.466   6.641   6.657   0.453   4.883   4.903   0.421   4.254   4.275
   35   0.494   6.292   6.311   0.497   4.574   4.601   0.452   3.998   4.024
   36   0.523   5.746   5.770   0.529   4.108   4.141   0.482   3.505   3.538
   37   0.547   4.992   5.022   0.539   3.551   3.591   0.491   2.749   2.792
   38   0.535   3.884   3.920   0.519   2.847   2.894   0.430   1.628   1.683
   39   0.443   2.687   2.723   0.462   2.229   2.276   0.253   0.509   0.568
   40   0.264   1.413   1.437   0.337   1.431   1.470   0.103   1.003   1.008
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A-139. SEE, Bias, and RMSD for ODL direct method condition 12

Oshima's Direct Method
Raw Score  Full MIRT            Approx Observed      Approx True
           SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.266  0.266  0.000  0.266  0.266  0.155  3.052  3.055
1          0.394  0.524  0.655  1.054  0.254  1.082  0.468  0.162  0.494
2          0.717  1.061  1.279  1.419  0.096  1.419  0.742  0.021  0.740
3          1.005  1.414  1.734  1.771  0.254  1.785  0.983  0.075  0.983
4          1.259  1.789  2.186  2.102  0.444  2.143  1.202  0.217  1.218
5          1.504  2.147  2.619  2.392  0.582  2.457  1.399  0.377  1.446
6          1.726  2.460  3.003  2.665  0.647  2.736  1.576  0.525  1.657
7          1.913  2.775  3.368  2.945  0.701  3.021  1.737  0.699  1.868
8          2.065  3.057  3.686  3.203  0.716  3.274  1.887  0.863  2.071
9          2.181  3.336  3.983  3.413  0.754  3.487  2.025  1.050  2.277
10         2.273  3.611  4.264  3.544  0.818  3.628  2.148  1.246  2.478
11         2.322  3.830  4.476  3.621  0.896  3.721  2.251  1.404  2.648
12         2.380  4.027  4.674  3.666  0.998  3.790  2.333  1.538  2.789
13         2.406  4.203  4.840  3.690  1.142  3.854  2.391  1.669  2.911
14         2.420  4.351  4.976  3.652  1.287  3.864  2.425  1.783  3.005
15         2.421  4.445  5.058  3.589  1.413  3.849  2.436  1.857  3.058
16         2.409  4.527  5.126  3.511  1.562  3.835  2.425  1.938  3.100
17         2.391  4.593  5.175  3.411  1.716  3.810  2.398  2.022  3.132
18         2.378  4.610  5.184  3.297  1.841  3.769  2.360  2.083  3.143
19         2.384  4.582  5.162  3.174  1.936  3.711  2.319  2.122  3.139
20         2.423  4.504  5.111  3.056  1.994  3.643  2.285  2.136  3.124
21         2.504  4.414  5.072  2.971  2.057  3.608  2.263  2.160  3.125
22         2.629  4.275  5.015  2.919  2.096  3.587  2.258  2.155  3.117
23         2.796  4.068  4.932  2.880  2.097  3.557  2.270  2.098  3.087
24         2.998  3.835  4.864  2.874  2.097  3.552  2.301  2.027  3.062
25         3.216  3.585  4.811  2.920  2.099  3.591  2.351  1.947  3.048
26         3.426  3.330  4.771  3.036  2.105  3.688  2.423  1.864  3.052
27         3.615  3.062  4.731  3.240  2.097  3.852  2.516  1.772  3.072
28         3.783  2.795  4.696  3.492  2.089  4.062  2.616  1.692  3.110
29         3.918  2.569  4.677  3.740  2.117  4.290  2.703  1.660  3.167
30         4.019  2.327  4.635  3.944  2.122  4.470  2.765  1.616  3.197
31         4.063  2.089  4.560  4.103  2.104  4.602  2.799  1.566  3.201
32         4.041  1.908  4.460  4.175  2.121  4.673  2.807  1.555  3.203
33         3.960  1.710  4.304  4.158  2.105  4.652  2.786  1.512  3.164
34         3.798  1.562  4.098  4.033  2.126  4.550  2.718  1.495  3.096
35         3.537  1.355  3.779  3.813  2.060  4.326  2.572  1.387  2.916
36         3.175  1.077  3.345  3.501  1.893  3.973  2.320  1.163  2.590
37         2.706  0.790  2.812  3.068  1.685  3.494  1.937  0.863  2.116
38         2.119  0.451  2.162  2.484  1.377  2.835  1.410  0.417  1.467
39         1.403  0.341  1.440  1.772  1.188  2.129  0.725  0.078  0.727
40         0.620  0.302  0.688  0.904  0.910  1.281  0.103  1.003  1.008
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).
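
The three statistics reported for each equating procedure can be cross-checked against one another. The sketch below assumes the conventional decomposition RMSD^2 = SEE^2 + Bias^2 (an assumption here, since the defining formulas are not restated in this appendix) and recomputes RMSD for a few Full MIRT rows of the table above. The function name `rmsd_from_components` and the choice of rows are illustrative only.

```python
import math


def rmsd_from_components(see: float, bias: float) -> float:
    """Combine SEE and bias into RMSD, assuming RMSD^2 = SEE^2 + Bias^2."""
    return math.sqrt(see ** 2 + bias ** 2)


# Full MIRT columns copied from the table above:
# raw score -> (SEE, Bias, reported RMSD)
rows = {
    1: (0.394, 0.524, 0.655),
    10: (2.273, 3.611, 4.264),
    20: (2.423, 4.504, 5.111),
}

for score, (see, bias, reported) in rows.items():
    recomputed = rmsd_from_components(see, bias)
    # Print the gap between the recomputed and the tabulated value.
    print(f"score {score}: recomputed {recomputed:.3f}, reported {reported:.3f}, "
          f"difference {abs(recomputed - reported):.4f}")
```

Under this assumption the recomputed values agree with the tabulated RMSD to within a few thousandths, which is consistent with rounding of the reported entries.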

Table A-140. SEE, Bias, and RMSD for ODL TCF method condition 12

Test Characteristic Curve Method
Raw Score  Full MIRT            Approx Observed      Approx True
           SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.266  0.266  0.000  0.266  0.266  0.155  3.052  3.055
1          0.305  0.135  0.333  0.917  0.565  1.075  0.282  0.291  0.405
2          0.394  0.375  0.543  0.928  0.464  1.035  0.370  0.242  0.441
3          0.469  0.474  0.666  0.895  0.528  1.037  0.456  0.323  0.558
4          0.522  0.639  0.825  0.871  0.521  1.013  0.548  0.335  0.642
5          0.564  0.809  0.985  0.855  0.524  1.001  0.638  0.345  0.724
6          0.614  0.969  1.146  0.858  0.557  1.021  0.719  0.372  0.808
7          0.665  1.150  1.328  0.890  0.575  1.058  0.789  0.371  0.870
8          0.711  1.324  1.502  0.936  0.612  1.116  0.846  0.371  0.922
9          0.754  1.532  1.706  0.981  0.623  1.161  0.891  0.333  0.949
10         0.796  1.756  1.927  1.015  0.606  1.180  0.926  0.269  0.962
11         0.832  1.946  2.116  1.036  0.596  1.193  0.953  0.222  0.976
12         0.865  2.123  2.292  1.048  0.573  1.192  0.973  0.176  0.987
13         0.896  2.312  2.478  1.054  0.515  1.171  0.990  0.109  0.994
14         0.922  2.494  2.658  1.054  0.438  1.139  1.004  0.036  1.002
15         0.944  2.642  2.805  1.051  0.370  1.112  1.016  0.017  1.014
16         0.963  2.805  2.965  1.045  0.265  1.076  1.026  0.098  1.028
17         0.980  2.978  3.135  1.038  0.133  1.044  1.034  0.198  1.050
18         0.996  3.133  3.286  1.030  0.000  1.027  1.040  0.290  1.077
19         1.010  3.267  3.418  1.022  0.134  1.028  1.043  0.373  1.105
20         1.022  3.375  3.525  1.014  0.265  1.045  1.043  0.443  1.131
21         1.032  3.491  3.640  1.007  0.431  1.093  1.039  0.535  1.166
22         1.042  3.575  3.723  1.001  0.594  1.162  1.031  0.609  1.195
23         1.050  3.606  3.755  0.999  0.736  1.239  1.021  0.644  1.205
24         1.055  3.620  3.770  1.000  0.893  1.339  1.008  0.681  1.214
25         1.059  3.623  3.773  1.005  1.066  1.464  0.994  0.722  1.227
26         1.060  3.616  3.768  1.015  1.252  1.610  0.983  0.772  1.248
27         1.059  3.591  3.743  1.029  1.430  1.760  0.976  0.820  1.273
28         1.058  3.558  3.711  1.047  1.598  1.909  0.974  0.874  1.307
29         1.057  3.550  3.703  1.069  1.775  2.071  0.978  0.963  1.371
30         1.057  3.506  3.661  1.094  1.889  2.182  0.987  1.019  1.416
31         1.061  3.433  3.593  1.119  1.944  2.242  0.996  1.048  1.444
32         1.068  3.379  3.543  1.144  1.987  2.291  1.003  1.099  1.486
33         1.078  3.268  3.440  1.165  1.950  2.270  1.003  1.100  1.487
34         1.088  3.153  3.335  1.174  1.908  2.239  0.994  1.110  1.488
35         1.095  2.914  3.112  1.160  1.767  2.112  0.967  1.016  1.401
36         1.082  2.534  2.754  1.115  1.539  1.898  0.904  0.812  1.214
37         1.028  2.072  2.312  1.029  1.302  1.658  0.774  0.568  0.959
38         0.900  1.476  1.728  0.901  0.992  1.338  0.555  0.230  0.599
39         0.679  1.024  1.227  0.717  0.834  1.099  0.265  0.025  0.265
40         0.360  0.572  0.675  0.429  0.599  0.736  0.103  1.003  1.008
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A-141. SEE, Bias, and RMSD for ODL ICF method condition 12

Item Characteristic Curve Method
Raw Score  Full MIRT            Approx Observed      Approx True
           SEE    Bias   RMSD   SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000  0.266  0.266  0.000  0.266  0.266  0.155  3.052  3.055
1          0.075  0.238  0.249  0.493  0.206  0.533  0.181  0.113  0.213
2          0.107  0.439  0.452  0.473  0.122  0.487  0.195  0.012  0.195
3          0.160  0.537  0.560  0.444  0.191  0.482  0.189  0.068  0.200
4          0.165  0.731  0.749  0.408  0.186  0.447  0.178  0.065  0.189
5          0.151  0.901  0.914  0.367  0.183  0.409  0.172  0.070  0.185
6          0.151  1.028  1.039  0.330  0.212  0.391  0.171  0.111  0.204
7          0.155  1.165  1.175  0.295  0.228  0.372  0.174  0.142  0.224
8          0.168  1.296  1.307  0.268  0.258  0.371  0.177  0.192  0.261
9          0.177  1.462  1.472  0.258  0.266  0.370  0.179  0.222  0.285
10         0.180  1.633  1.643  0.256  0.258  0.363  0.181  0.235  0.297
11         0.183  1.763  1.772  0.257  0.276  0.376  0.185  0.273  0.329
12         0.189  1.872  1.881  0.254  0.302  0.394  0.191  0.316  0.369
13         0.199  1.985  1.995  0.247  0.314  0.399  0.199  0.340  0.394
14         0.212  2.090  2.101  0.237  0.326  0.403  0.211  0.359  0.417
15         0.224  2.165  2.176  0.225  0.364  0.428  0.225  0.399  0.458
16         0.236  2.255  2.267  0.216  0.377  0.435  0.240  0.412  0.477
17         0.248  2.354  2.367  0.212  0.372  0.428  0.256  0.406  0.480
18         0.261  2.432  2.446  0.215  0.374  0.431  0.274  0.407  0.490
19         0.276  2.489  2.504  0.225  0.380  0.441  0.291  0.416  0.507
20         0.291  2.521  2.538  0.242  0.391  0.459  0.309  0.433  0.532
21         0.307  2.562  2.581  0.263  0.370  0.453  0.325  0.424  0.533
22         0.322  2.573  2.593  0.288  0.352  0.454  0.340  0.423  0.542
23         0.338  2.534  2.556  0.316  0.356  0.475  0.352  0.451  0.571
24         0.352  2.483  2.507  0.345  0.342  0.485  0.362  0.467  0.590
25         0.365  2.425  2.452  0.377  0.308  0.486  0.370  0.470  0.598
26         0.377  2.367  2.396  0.409  0.256  0.481  0.376  0.460  0.593
27         0.386  2.297  2.329  0.441  0.204  0.485  0.383  0.452  0.592
28         0.394  2.226  2.261  0.473  0.153  0.496  0.391  0.442  0.590
29         0.401  2.186  2.222  0.501  0.078  0.506  0.401  0.403  0.568
30         0.408  2.112  2.151  0.525  0.048  0.526  0.411  0.398  0.572
31         0.416  2.013  2.056  0.541  0.055  0.543  0.421  0.415  0.591
32         0.424  1.937  1.983  0.549  0.047  0.550  0.427  0.398  0.583
33         0.431  1.813  1.863  0.548  0.084  0.553  0.427  0.406  0.588
34         0.434  1.704  1.758  0.536  0.089  0.542  0.416  0.363  0.551
35         0.427  1.505  1.564  0.515  0.152  0.536  0.390  0.368  0.536
36         0.401  1.218  1.282  0.489  0.257  0.551  0.346  0.416  0.541
37         0.360  0.917  0.985  0.456  0.326  0.560  0.286  0.441  0.525
38         0.301  0.540  0.618  0.414  0.416  0.586  0.215  0.509  0.553
39         0.217  0.369  0.427  0.351  0.295  0.458  0.130  0.376  0.398
40         0.140  0.241  0.278  0.217  0.126  0.251  0.103  1.003  1.008
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

Table A-142. SEE, Bias, and RMSD for NOP method condition 12

Non-Orthogonal Procrustes Method
Raw Score  Full MIRT              Approx Observed      Approx True
           SEE    Bias    RMSD    SEE    Bias   RMSD   SEE    Bias   RMSD
0          0.000   0.266   0.266  0.000  0.266  0.266  0.155  3.052  3.055
1          0.000   0.911   0.911  0.000  0.911  0.911  0.166  0.239  0.291
2          0.001   2.012   2.012  0.124  1.979  1.983  0.200  0.706  0.733
3          0.175   2.777   2.782  0.331  2.728  2.748  0.208  1.048  1.069
4          0.102   3.324   3.326  0.488  3.182  3.219  0.204  1.500  1.514
5          0.184   3.936   3.940  0.525  3.336  3.376  0.197  2.000  2.010
6          0.174   4.372   4.376  0.519  3.409  3.448  0.192  2.516  2.523
7          0.158   4.968   4.970  0.507  3.514  3.551  0.189  3.076  3.082
8          0.207   5.446   5.450  0.478  3.655  3.686  0.188  3.635  3.640
9          0.153   6.005   6.007  0.445  3.895  3.920  0.187  4.222  4.226
10         0.198   6.570   6.573  0.404  4.229  4.248  0.187  4.818  4.822
11         0.168   7.078   7.080  0.397  4.603  4.620  0.187  5.372  5.376
12         0.187   7.616   7.618  0.361  5.032  5.045  0.189  5.896  5.899
13         0.183   8.107   8.109  0.369  5.501  5.513  0.193  6.404  6.407
14         0.189   8.634   8.636  0.342  5.996  6.006  0.198  6.879  6.882
15         0.194   9.077   9.079  0.350  6.463  6.473  0.205  7.292  7.295
16         0.203   9.560   9.562  0.335  6.948  6.956  0.215  7.689  7.692
17         0.209  10.008  10.010  0.341  7.427  7.435  0.227  8.065  8.068
18         0.231  10.439  10.441  0.330  7.869  7.876  0.243  8.392  8.395
19         0.239  10.820  10.823  0.335  8.263  8.270  0.262  8.674  8.678
20         0.266  11.156  11.159  0.330  8.607  8.613  0.284  8.908  8.913
21         0.290  11.485  11.489  0.325  8.927  8.933  0.309  9.132  9.138
22         0.315  11.755  11.759  0.326  9.190  9.196  0.338  9.306  9.312
23         0.347  11.947  11.952  0.325  9.377  9.382  0.369  9.406  9.413
24         0.383  12.099  12.105  0.324  9.524  9.530  0.405  9.466  9.475
25         0.420  12.210  12.217  0.327  9.637  9.643  0.443  9.486  9.496
26         0.460  12.280  12.288  0.336  9.718  9.724  0.485  9.464  9.476
27         0.502  12.293  12.303  0.352  9.751  9.757  0.528  9.389  9.403
28         0.545  12.257  12.269  0.374  9.742  9.749  0.569  9.273  9.291
29         0.587  12.204  12.218  0.401  9.720  9.729  0.606  9.156  9.176
30         0.628  12.070  12.087  0.432  9.624  9.634  0.637  8.984  9.006
31         0.667  11.867  11.886  0.464  9.466  9.477  0.659  8.774  8.799
32         0.700  11.644  11.665  0.496  9.298  9.311  0.673  8.583  8.609
33         0.729  11.327  11.350  0.529  9.050  9.066  0.680  8.338  8.365
34         0.752  10.968  10.994  0.564  8.776  8.794  0.685  8.086  8.115
35         0.770  10.436  10.464  0.602  8.341  8.362  0.696  7.670  7.701
36         0.784   9.676   9.708  0.646  7.692  7.719  0.720  6.973  7.010
37         0.793   8.655   8.691  0.687  6.813  6.847  0.757  5.818  5.867
38         0.782   7.121   7.164  0.686  5.555  5.597  0.721  3.698  3.767
39         0.672   5.091   5.135  0.598  4.152  4.195  0.441  0.954  1.051
40         0.394   2.604   2.634  0.421  2.466  2.501  0.103  1.003  1.008
Note: Test scores at both ends (i.e., 0, 40) are not counted due to ad hoc procedure (Kolen, 1981).

APPENDIX B FIGURES

Figure B-1. Equivalent score difference Min's method condition 1. A) raw, B) criterion.

Figure B-2. Equivalent score difference OD method condition 1. A) raw, B) criterion.

Figure B-3. Equivalent score difference TCF method condition 1. A) raw, B) criterion.

Figure B-4. Equivalent score difference ICF method condition 1. A) raw, B) criterion.

Figure B-5. Equivalent score difference NOP method condition 1. A) raw, B) criterion.

Figure B-6. Equivalent score difference Min's method condition 2. A) raw, B) criterion.

Figure B-7. Equivalent score difference OD method condition 2. A) raw, B) criterion.

Figure B-8. Equivalent score difference TCF method condition 2. A) raw, B) criterion.

Figure B-9. Equivalent score difference ICF method condition 2. A) raw, B) criterion.

Figure B-10. Equivalent score difference NOP method condition 2. A) raw, B) criterion.

Figure B-11. Equivalent score difference Min's method condition 3. A) raw, B) criterion.

Figure B-12. Equivalent score difference OD method condition 3. A) raw, B) criterion.

Figure B-13. Equivalent score difference TCF method condition 3. A) raw, B) criterion.

Figure B-14. Equivalent score difference ICF method condition 3. A) raw, B) criterion.

Figure B-15. Equivalent score difference NOP method condition 3. A) raw, B) criterion.

Figure B-16. Equivalent score difference Min's method condition 4. A) raw, B) criterion.

Figure B-17. Equivalent score difference OD method condition 4. A) raw, B) criterion.

Figure B-18. Equivalent score difference TCF method condition 4. A) raw, B) criterion.

Figure B-19. Equivalent score difference ICF method condition 4. A) raw, B) criterion.

Figure B-20. Equivalent score difference NOP method condition 4. A) raw, B) criterion.

Figure B-21. Equivalent score difference Min's method condition 5. A) raw, B) criterion.

Figure B-22. Equivalent score difference OD method condition 5. A) raw, B) criterion.

Figure B-23. Equivalent score difference TCF method condition 5. A) raw, B) criterion.

Figure B-24. Equivalent score difference ICF method condition 5. A) raw, B) criterion.

Figure B-25. Equivalent score difference NOP method condition 5. A) raw, B) criterion.

Figure B-26. Equivalent score difference Min's method condition 6. A) raw, B) criterion.

Figure B-27. Equivalent score difference OD method condition 6. A) raw, B) criterion.

Figure B-28. Equivalent score difference TCF method condition 6. A) raw, B) criterion.

Figure B-29. Equivalent score difference ICF method condition 6. A) raw, B) criterion.

Figure B-30. Equivalent score difference NOP method condition 6. A) raw, B) criterion.

Figure B-31. Equivalent score difference Min's method condition 7. A) raw, B) criterion.

Figure B-32. Equivalent score difference OD method condition 7. A) raw, B) criterion.

Figure B-33. Equivalent score difference TCF method condition 7. A) raw, B) criterion.

Figure B-34. Equivalent score difference ICF method condition 7. A) raw, B) criterion.

Figure B-35. Equivalent score difference NOP method condition 7. A) raw, B) criterion.

Figure B-36. Equivalent score difference Min's method condition 8. A) raw, B) criterion.

Figure B-37. Equivalent score difference OD method condition 8. A) raw, B) criterion.

Figure B-38. Equivalent score difference TCF method condition 8. A) raw, B) criterion.

Figure B-39. Equivalent score difference ICF method condition 8. A) raw, B) criterion.

Figure B-40. Equivalent score difference NOP method condition 8. A) raw, B) criterion.

Figure B-41. Equivalent score difference Min's method condition 9. A) raw, B) criterion.

Figure B-42. Equivalent score difference OD method condition 9. A) raw, B) criterion.

Figure B-43. Equivalent score difference TCF method condition 9. A) raw, B) criterion.

Figure B-44. Equivalent score difference ICF method condition 9. A) raw, B) criterion.

Figure B-45. Equivalent score difference NOP method condition 9. A) raw, B) criterion.

Figure B-46. Equivalent score difference Min's method condition 10. A) raw, B) criterion.

Figure B-47. Equivalent score difference OD method condition 10. A) raw, B) criterion.

Figure B-48. Equivalent score difference TCF method condition 10. A) raw, B) criterion.

Figure B-49. Equivalent score difference ICF method condition 10. A) raw, B) criterion.

Figure B-50. Equivalent score difference NOP method condition 10. A) raw, B) criterion.

Figure B-51. Equivalent score difference Min's method condition 11. A) raw, B) criterion.

Figure B-52. Equivalent score difference OD method condition 11. A) raw, B) criterion.

Figure B-53. Equivalent score difference TCF method condition 11. A) raw, B) criterion.

Figure B-54. Equivalent score difference ICF method condition 11. A) raw, B) criterion.

Figure B-55. Equivalent score difference NOP method condition 11. A) raw, B) criterion.

Figure B-56. Equivalent score difference Min's method condition 12. A) raw, B) criterion.

Figure B-57. Equivalent score difference OD method condition 12. A) raw, B) criterion.

Figure B-58. Equivalent score difference TCF method condition 12. A) raw, B) criterion.

Figure B-59. Equivalent score difference ICF method condition 12. A) raw, B) criterion.

Figure B-60. Equivalent score difference NOP method condition 12. A) raw, B) criterion.

Figure B-61. SEE, RMSD, and Bias Min's method condition 1. A) SEE, B) RMSD, C) Bias.

Figure B-62. SEE, RMSD, and Bias OD method condition 1. A) SEE, B) RMSD, C) Bias.

Figure B-63. SEE, RMSD, and Bias TCF method condition 1. A) SEE, B) RMSD, C) Bias.

Figure B-64. SEE, RMSD, and Bias ICF method condition 1. A) SEE, B) RMSD, C) Bias.

Figure B-65. SEE, RMSD, and Bias NOP method condition 1. A) SEE, B) RMSD, C) Bias.

Figure B-66. SEE, RMSD, and Bias Min's method condition 2. A) SEE, B) RMSD, C) Bias.

Figure B-67. SEE, RMSD, and Bias OD method condition 2. A) SEE, B) RMSD, C) Bias.

Figure B-68. SEE, RMSD, and Bias TCF method condition 2. A) SEE, B) RMSD, C) Bias.

Figure B-69. SEE, RMSD, and Bias ICF method condition 2. A) SEE, B) RMSD, C) Bias.

Figure B-70. SEE, RMSD, and Bias NOP method condition 2. A) SEE, B) RMSD, C) Bias.

Figure B-71. SEE, RMSD, and Bias Min's method condition 3. A) SEE, B) RMSD, C) Bias.

Figure B-72. SEE, RMSD, and Bias OD method condition 3. A) SEE, B) RMSD, C) Bias.

Figure B-73. SEE, RMSD, and Bias TCF method condition 3. A) SEE, B) RMSD, C) Bias.

Figure B-74. SEE, RMSD, and Bias ICF method condition 3. A) SEE, B) RMSD, C) Bias.

Figure B-75. SEE, RMSD, and Bias NOP method condition 3. A) SEE, B) RMSD, C) Bias.

Figure B-76. SEE, RMSD, and Bias Min's method condition 4. A) SEE, B) RMSD, C) Bias.

Figure B-77. SEE, RMSD, and Bias OD method condition 4. A) SEE, B) RMSD, C) Bias.

Figure B-78. SEE, RMSD, and Bias TCF method condition 4. A) SEE, B) RMSD, C) Bias.

Figure B-79. SEE, RMSD, and Bias ICF method condition 4. A) SEE, B) RMSD, C) Bias.

Figure B-80. SEE, RMSD, and Bias NOP method condition 4. A) SEE, B) RMSD, C) Bias.

Figure B-81. SEE, RMSD, and Bias Min's method condition 5. A) SEE, B) RMSD, C) Bias.

Figure B-82. SEE, RMSD, and Bias OD method condition 5. A) SEE, B) RMSD, C) Bias.

Figure B-83. SEE, RMSD, and Bias TCF method condition 5. A) SEE, B) RMSD, C) Bias.

Figure B-84. SEE, RMSD, and Bias ICF method condition 5. A) SEE, B) RMSD, C) Bias.

Figure B-85. SEE, RMSD, and Bias NOP method condition 5. A) SEE, B) RMSD, C) Bias.

Figure B-86. SEE, RMSD, and Bias Min's method condition 6. A) SEE, B) RMSD, C) Bias.

Figure B-87. SEE, RMSD, and Bias OD method condition 6. A) SEE, B) RMSD, C) Bias.

Figure B-88. SEE, RMSD, and Bias TCF method condition 6. A) SEE, B) RMSD, C) Bias.

Figure B-89. SEE, RMSD, and Bias ICF method condition 6. A) SEE, B) RMSD, C) Bias.

Figure B-90. SEE, RMSD, and Bias NOP method condition 6. A) SEE, B) RMSD, C) Bias.

Figure B-91. SEE, RMSD, and Bias Min's method condition 7. A) SEE, B) RMSD, C) Bias.

Figure B-92. SEE, RMSD, and Bias OD method condition 7. A) SEE, B) RMSD, C) Bias.

Figure B-93. SEE, RMSD, and Bias TCF method condition 7. A) SEE, B) RMSD, C) Bias.

Figure B-94. SEE, RMSD, and Bias ICF method condition 7. A) SEE, B) RMSD, C) Bias.

Figure B-95. SEE, RMSD, and Bias NOP method condition 7. A) SEE, B) RMSD, C) Bias.

Figure B-96. SEE, RMSD, and Bias Min's method condition 8. A) SEE, B) RMSD, C) Bias.

Figure B-97. SEE, RMSD, and Bias OD method condition 8. A) SEE, B) RMSD, C) Bias.

Figure B-98. SEE, RMSD, and Bias TCF method condition 8. A) SEE, B) RMSD, C) Bias.

Figure B-99. SEE, RMSD, and Bias ICF method condition 8. A) SEE, B) RMSD, C) Bias.

Figure B-100. SEE, RMSD, and Bias NOP method condition 8. A) SEE, B) RMSD, C) Bias.

Figure B-101. SEE, RMSD, and Bias Min's method condition 9. A) SEE, B) RMSD, C) Bias.

Figure B-102. SEE, RMSD, and Bias OD method condition 9. A) SEE, B) RMSD, C) Bias.

Figure B-103. SEE, RMSD, and Bias TCF method condition 9. A) SEE, B) RMSD, C) Bias.

Figure B-104. SEE, RMSD, and Bias ICF method condition 9. A) SEE, B) RMSD, C) Bias.

Figure B-105. SEE, RMSD, and Bias NOP method condition 9. A) SEE, B) RMSD, C) Bias.

Figure B-106. SEE, RMSD, and Bias Min's method condition 10. A) SEE, B) RMSD, C) Bias.

Figure B-107. SEE, RMSD, and Bias OD method condition 10. A) SEE, B) RMSD, C) Bias.

Figure B-108. SEE, RMSD, and Bias TCF method condition 10. A) SEE, B) RMSD, C) Bias.

Figure B-109. SEE, RMSD, and Bias ICF method condition 10. A) SEE, B) RMSD, C) Bias.

Figure B-110. SEE, RMSD, and Bias NOP method condition 10. A) SEE, B) RMSD, C) Bias.

Figure B-111. SEE, RMSD, and Bias Min's method condition 11. A) SEE, B) RMSD, C) Bias.

Figure B-112. SEE, RMSD, and Bias OD method condition 11. A) SEE, B) RMSD, C) Bias.

Figure B-113. SEE, RMSD, and Bias TCF method condition 11. A) SEE, B) RMSD, C) Bias.

Figure B-114. SEE, RMSD, and Bias ICF method condition 11. A) SEE, B) RMSD, C) Bias.

Figure B-115. SEE, RMSD, and Bias NOP method condition 11. A) SEE, B) RMSD, C) Bias.

Figure B-116. SEE, RMSD, and Bias Min's method condition 12. A) SEE, B) RMSD, C) Bias.

Figure B-117. SEE, RMSD, and Bias OD method condition 12. A) SEE, B) RMSD, C) Bias.

Figure B-118. SEE, RMSD, and Bias TCF method condition 12. A) SEE, B) RMSD, C) Bias.

Figure B-119. SEE, RMSD, and Bias ICF method condition 12. A) SEE, B) RMSD, C) Bias.

Figure B-120. SEE, RMSD, and Bias NOP method condition 12. A) SEE, B) RMSD, C) Bias.

LIST OF REFERENCES

Ackerman, T. A. (1994). Using multidimensional item response theory to understand what items and tests are measuring. Applied Measurement in Education, 4, 255-278.

Ackerman, T. A. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20, 311-329.

Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). Washington, DC: American Council on Education. (Reprinted as W. H. Angoff, Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service, 1984.)

Angoff, W. H. (1984). Scales, norms and equivalent scores. Princeton, NJ: Educational Testing Service.

Angoff, W. H., & Cowell, W. R. (1985). An examination of the assumption that the equating of parallel forms is population independent (GRE Board Professional Report GREB No. 83-12P, Research Rep. No. 85-22). Princeton, NJ: Educational Testing Service.

Baker, F. B. (1992). Equating tests under the graded response model. Applied Psychological Measurement, 16, 87-96.

Baker, F. B., & Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28, 147-162.

Bastari, B. (2000). Linking multiple choice and constructed response items to a common proficiency scale. Unpublished doctoral dissertation, University of Massachusetts.

Batley, R., & Boss, M. W. (1993). The effects on parameter estimation of correlated dimensions and a distribution restricted trait in a multidimensional item response model. Applied Psychological Measurement, 17, 131-141.

Beguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model fit analysis of multidimensional IRT models. Psychometrika, 66, 541-561.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.

Bock, R. D., Gibbons, R., Schilling, S. G., Muraki, E., Wilson, D. T., & Wood, R. (1999). TESTFACT 3: Test scoring, items statistics, and full information item factor analysis. Chicago: Scientific Software International.

Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395-414.

Braun, H. I., & Holland, P. W. (1982). Observed score test equating: A mathematical analysis of some ETS equating procedures. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 9-49). New York: Academic.

Brennan, R. (1987). Introduction to problems, perspectives, and practical issues in equating. Applied Psychological Measurement, 11, 221-224.

Brossman, B. G. (2010). Observed score and true score equating procedures for multidimensional item response theory. Unpublished doctoral dissertation, University of Iowa. http://ir.uiowa.edu/etd/469.

Cook, L. L., & Petersen, N. S. (1987). Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances. Applied Psychological Measurement, 11, 225-244.

Davey, T. C., Oshima, T. C., & Lee, K. (1996). Linking multidimensional item calibrations. Applied Psychological Measurement, 20, 405-416.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1-38.

Dennis, J. E., & Schnabel, R. B. (1996). Numerical methods for unconstrained optimization and nonlinear equations. Englewood Cliffs, NJ: Prentice Hall.

Divgi, D. R. (1985). A minimum chi-square method for developing a common metric in item response theory. Applied Psychological Measurement, 9, 413-415.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. New Jersey: Lawrence Erlbaum Associates, Inc., Publishers.

Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivariate Behavioral Research, 23, 267-269.

Fraser, C. (1998). NOHARM: A Fortran program for fitting unidimensional and multidimensional normal ogive models in latent trait theory. The University of New England, Center for Behavioral Studies, Armidale, Australia.

Genz, A., Bretz, F., & Hothorn, T. (2005). mvtnorm: Multivariate Normal and t Distribution. R package version 0.7-2, URL http://CRAN.R-project.org/.

Gosz, J. K., & Walker, C. M. (2002). An empirical comparison of multidimensional item response theory using TESTFACT and NOHARM. Paper presented at the annual meeting of the National Council for Measurement in Education, New Orleans.

Green, P. E. (1976). Mathematical tools for applied multivariate analysis. New York: Academic Press.

Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Norwell, MA: Kluwer Academic Publishers.

Hanson, B. A., & Beguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common item equating design. Applied Psychological Measurement, 26, 3-24.

Harris, D. J., & Crouse, J. D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6, 195-240.

Heh, V. K. (2007). Equating accuracy using small samples in the random groups design. Unpublished doctoral dissertation, Ohio University, Athens, OH.

Hirsch, T. M. (1989). Multidimensional equating. Journal of Educational Measurement, 26, 337-349.

Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187-220). Westport, CT: American Council on Education & Praeger Publishers.

Hung, P., Wu, Y., & Chen, Y. (1991). IRT item parameter linking: Relevant issues for the purpose of item banking. Paper presented at the International Academic Symposium on Psychological Measurement, Tainan, Taiwan.

Jodoin, M. G., Keller, L. A., & Swaminathan, H. (2003). A comparison of linear, fixed common item, and concurrent parameter estimation equating procedures in capturing academic growth. The Journal of Experimental Education, 71, 229-250.

Kim, H. (1994). New techniques for the dimensionality assessment of standardized test data. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL.

Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355-381.

Kim, Y. Y. (2008). Effects of test linking methods on proficiency classification: UIRT versus MIRT linking. Unpublished dissertation, Michigan State University, East Lansing, MI.

Kim, S. H., & Cohen, A. S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29(1), 51-66.

Kolen, M. J. (1981). Comparison of traditional and item response theory methods for equating tests. Journal of Educational Measurement, 18, 1-11.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York: Springer.

Kolen, M. J., & Wang, T. Y. (2007). Conditional standard errors of measurement for composite scores using IRT. (Unpublished manuscript).

Li, Y. H. (1997). An evaluation of multidimensional IRT equating methods by assessing the accuracy of transforming parameters onto a target test metric (Doctoral dissertation, University of Maryland, 1997). Dissertation Abstracts International, UMI Number 9816494.

Li, Y. H., & Lissitz, R. W. (2000). An evaluation of the accuracy of multidimensional IRT linking. Applied Psychological Measurement, 24, 115-138.

Livingston, S. A. (1993). Small-sample equating with log-linear smoothing. Journal of Educational Measurement, 30, 23-39.

Lord, F. M. (1952). A theory of test scores. Psychometric Monograph 7.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179-193.

Marco, G. L. (1977). Item characteristic curve solution to three intractable testing problems. Journal of Educational Measurement, 14, 139-160.

Maydeu-Olivares, A. (2001). Multidimensional item response theory modeling of binary data: Large sample properties of NOHARM estimates. Journal of Educational and Behavioral Statistics, 26, 51-71.

McKinley, R. L., & Reckase, M. D. (1983). An extension of the two-parameter logistic model to the multidimensional latent space (Research Report ONR 83-2). Iowa City, IA: American College Testing Program.

Miller, T. R. (1991). Empirical estimation of standard errors of compensatory MIRT model parameters obtained from the NOHARM estimation program (Research Report ONR 91-2). Iowa City, IA: American College Testing Program.

Miller, T. R., & Hirsch, T. M. (1992). Cluster analysis of angular data in applications of multidimensional item response theory. Applied Measurement in Education, 5, 193-211.

Min, K. S. (2003). The impact of scale dilation on the quality of the linking of multidimensional item response theory calibrations. Unpublished dissertation, Michigan State University, East Lansing, MI.

Mulaik, S. A. (1972). The foundations of factor analysis. New York: McGraw-Hill Book Company.

Muthen, L. K., & Muthen, B. (1998). MPLUS: The comprehensive modeling program for applied researchers. Los Angeles: Muthen & Muthen.

Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review, Otaru University of Commerce, 51(1), 1-23.

Ogasawara, H. (2001a). Item response theory true score equatings and their standard errors. Journal of Educational and Behavioral Statistics, 26(1), 31-50.

Ogasawara, H. (2001b). Least squares estimation of item response theory linking coefficients. Applied Psychological Measurement, 25(4), 3-24.

Ogasawara, H. (2001c). Marginal maximum likelihood estimation of item response theory (IRT) equating coefficients for the common examinee design. Japanese Psychological Research, 43(2), 72-82.

Oshima, T. C., Davey, T. C., & Lee, K. (2000). Multidimensional linking: Four practical approaches. Journal of Educational Measurement, 37, 357-373.

R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.

Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401-412.

Reckase, M. D. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15, 361-373.

Reckase, M. D. (1995). A linear logistic multidimensional model for dichotomous item response data. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 271-286). New York: Springer-Verlag.

Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25-36.

Reckase, M. D. (2004). The real world is more complicated than we would like. Journal of Educational and Behavioral Statistics, 29(1), 117-120.

Reckase, M. D., & Martineau, J. (2004, October). The vertical scaling of science achievement tests. Paper commissioned by the Committee on Test Design for K-12 Science Achievement, Center for Education, National Research Council.

Reckase, M. D. (2005). Multidimensional item response theory models. In K. Kempf-Leonard (Ed.), Encyclopedia of social measurement (Vol. 2, pp. 771-777). San Diego, Calif.; London: Academic.

Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.

Roussos, L., Stout, W., & Marden, J. (1998). Using new proximity measures with hierarchical cluster analysis to detect multidimensionality. Journal of Educational Measurement, 35, 1-30.

Schonemann, P. H. (1966). A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31, 1-10.

Schwarz, R., Yen, W., & Schafer, W. (2001). The challenge and attainability of goals for adequate yearly progress. Educational Measurement: Issues and Practice, 20(4), 26-33.

Simon, M. K. (2008). Comparison of concurrent and separate multidimensional IRT linking of item parameters. Unpublished dissertation, University of Minnesota.

Skaggs, G. (2005). Accuracy of random groups equating with very small samples. Journal of Educational Measurement, 42(4), 309-330.

Spray, J. A., Davey, T. C., Reckase, M. D., Ackerman, T. A., & Carlson, J. E. (1990). Comparison of two logistic multidimensional item response theory models (Research Report ONR90-8). ACT, Inc., Iowa City, IA.

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.

Stone, C. A., & Yeh, C.-C. (2006). Assessing the dimensionality and factor structure of multiple choice exams: An empirical comparison of methods using the Multistate Bar Examination. Educational and Psychological Measurement, 66, 193-214.

Sympson, J. B. (1978). A model for testing with multidimensional items. In D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. University of Minnesota, Minneapolis.

Tate, R. (2003). A comparison of selected empirical methods for assessing the structure of responses to test items. Applied Psychological Measurement, 27, 159-203.

Thompson, T. D., Nering, M., & Davey, T. (1997). Multidimensional IRT scale linking without common items or common examinees. Paper presented at the annual meeting of the Psychometric Society, Gatlinburg, TN.

Thurstone, L. L. (1947). Multiple factor analysis: A development and expansion of The Vectors of Mind. Chicago: The University of Chicago Press.

Tucker, L. R. (1951). A method for synthesis of factor analysis studies (Personnel Research Section Report No. 984). Washington, D.C.: Department of the Army.

Wang, M. (1985). Fitting a unidimensional model to multidimensional item response data: The effect of latent space misspecification on the application of IRT (Research Report MW: 6-24-85). University of Iowa, Iowa City, IA.

Wang, M. (1986). Fitting a unidimensional model to multidimensional item response data. Paper presented at the Office of Naval Research Contractors Meeting, Gatlinburg, TN.

Way, W. D., & Tang, K. L. (1991). A comparison of four logistic model equating methods. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Wilson, D., Wood, R., & Gibbons, R. D. (1987). TESTFACT: Test scoring, item statistics, and item factor analysis. Mooresville, IN: Scientific Software.

Wei, Y. (2008). A simulation study on the performance of four multidimensional IRT scale linking methods. Unpublished doctoral dissertation, University of Florida.

Wingersky, M. S., & Lord, F. M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8(3), 347-364.

Wu, M. L., Adams, R. J., & Wilson, M. R. (1997). ConQuest: Generalized item response modeling software. ACER, Victoria, Australia.

Yao, L., & Boughton, K. A. (2007). A multidimensional partial credit model with associated item and test statistics: An application to mixed format tests. Applied Psychological Measurement, 30, 469-492.

Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111-154). Westport, CT: American Council on Education & Praeger Publishers.

Yon, H. (2006). Multidimensional Item Response Theory (MIRT) approaches to vertical scaling. Unpublished doctoral dissertation, Michigan State University, East Lansing, MI.

Zeng, L., & Kolen, M. J. (1995). An alternative approach for IRT observed score equating of number correct scores. Applied Psychological Measurement, 19(3), 231-240.

Zhang, J. (1996). Some fundamental issues in item response theory with applications. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Department of Statistics.

Zhang, J., & Stout, W. F. (1999). Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika, 64, 129-152.

Zhang, J., & Wang, M. (1998, April). Relating reported scores to latent traits in a multidimensional test. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG 3 for Windows: Multiple group IRT analysis and test maintenance for binary items [Computer software]. Lincolnwood, IL: Scientific Software International, Inc.

BIOGRAPHICAL SKETCH

Ou Zhang was born in Chengdu, China. He completed his Bachelor of Science in computer science at Chengdu University of Technology in 2001 and his Master of Education in Educational Research, Measurement, and Evaluation at Boston College in 2007. He received his Master of Arts in Education degree from the program of Research and Evaluation Methodology at the University of Florida in the fall of 2010. He received his Ph.D. from the program of Research and Evaluation Methodology at the University of Florida in the summer of 2012.