Citation
Statistical Analysis of Rater Effects

Material Information

Title:
Statistical Analysis of Rater Effects
Creator:
BUU, YUH-PEY ANNE ( Author, Primary )
Copyright Date:
2008

Subjects

Subjects / Keywords:
Markov chains ( jstor )
Parametric models ( jstor )
Power functions ( jstor )
Psychological assessment ( jstor )
Psychological research ( jstor )
Psychological techniques ( jstor )
Psychometrics ( jstor )
Simulations ( jstor )
Statistical discrepancies ( jstor )
Statistics ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Yuh-Pey Anne Buu. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
6/1/2004
Resource Identifier:
53207343 ( OCLC )


Full Text

PAGE 5

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTERS

1 LITERATURE REVIEW
  1.1 Introduction
  1.2 Item Response Theory
  1.3 Review of the Psychometrics Literature
  1.4 Markov Chain Monte Carlo (MCMC)
  1.5 Overview of This Study
2 A HIERARCHICAL BAYESIAN MODEL FOR ORDINAL RESPONSES
  2.1 Review of the Partial Credit Model
  2.2 The Proposed Hierarchical Bayesian Model
  2.3 The Variance Ratio Method for Monitoring Convergence
  2.4 An Efficient Computing Technique
  2.5 Demonstration Using Real Psychology Data
3 A BAYESIAN MODEL FOR MULTIPLE RATERS
  3.1 Review of Johnson's Multirater Ordinal Model
  3.2 Theoretical and Practical Issues
  3.3 Propriety of the Posterior for Johnson's Model
  3.4 Multivariate Generalization of Johnson's Model
  3.5 A Method for Dealing with Impropriety
  3.6 Computational Issues and Solutions
  3.7 Demonstration Using Real Education Data
4 A PERMUTATION TEST FOR DIFFERENCE IN RATER AGREEMENT
  4.1 Review of Cohen's Kappa
  4.2 Review of The Distance Function Approach
  4.3 A Simulation Study on the Performance of Mielke and Berry's Test
  4.4 The Proposed Hypothesis Testing Method
  4.5 Demonstration Using Real Psychology Data
5 FUTURE RESEARCH

REFERENCES
BIOGRAPHICAL SKETCH

PAGE 7

Due to the evolving concepts of test validity, the recent trend in large-scale standardized testing is towards more essay questions or performance-oriented tasks. Open-ended questions unavoidably introduce a certain amount of subjectivity into the scoring process. Therefore, measuring rater effects has become an important methodological issue in educational and psychological testing. The statistics literature on the methods for analyzing ordinal data collected from multiple raters has been classified into two approaches. The first approach, under the paradigm of Cohen's Kappa, focuses on modeling the agreement among raters. The second approach, under the paradigm of item response theory (IRT), emphasizes the tasks of evaluating rater precision and estimating the relative rankings of individuals. This study provides a comprehensive review of both paradigms.

A hierarchical Bayesian approach based on a popular model for ordinal responses on multiple items, the partial credit model, is proposed to tackle the

PAGE 8

A Bayesian model for multiple raters is proposed to deal with the theoretical and practical issues of Johnson's multirater ordinal model, so that the proposed model is more applicable in a general setting of the psychological and educational testing that employs (1) ordinal responses, (2) multiple items, and (3) multiple raters. The applications of the proposed model are demonstrated by using real education data.

A simulation study is conducted to evaluate the performance of the hypothesis testing method proposed by Mielke and Berry for group differences in rater agreements. A Monte Carlo permutation test is proposed and is shown to outperform Mielke and Berry's test in a simulation study. A real psychology data set containing ratings given by a group of trained raters and a group of untrained raters is used to demonstrate both methods.

The final chapter of this dissertation provides six relevant topics for future research. All the topics involve important methodological issues in psychology and education.

PAGE 9

1.1 Introduction

Due to the evolving concepts of test validity (Cronbach, 1988; Messick, 1995), the recent trend in large-scale standardized testing is towards more essay questions or performance-oriented tasks. Well-known examples are the Scholastic Aptitude Test (SAT) and the National Assessment of Educational Progress (NAEP). Open-ended questions are generally believed to measure higher-order thinking better than multiple-choice questions. Thus, incorporating open-ended questions in nationwide or state-level standardized tests was suggested to have positive influences on classroom instruction and school curriculum (Resnick & Resnick, 1992). However, open-ended questions unavoidably introduce a certain amount of subjectivity into the scoring process even when guidelines and training are provided for raters. Therefore, measuring rater effects has become an important methodological issue in educational and psychological testing. If rater effects can be accurately specified, raters have the opportunity to improve based on their evaluation results and test takers might be able to obtain fairer scores (adjusted for the rater bias).

Traditional methods for assessing rater effects in the psychology and education literature, which are usually called generalizability theory, are mainly analysis of variance (ANOVA) procedures. The detailed principles and applications of generalizability theory can be found in Shavelson and Webb (1991) and Brennan (2001). Bock (1997) commented that generalizability theory shows how to assess separately the amount of error variance attributable to the sampling of items and sampling of raters when numerical values are assigned to the graded categories, but

PAGE 10

Johnson and Albert (1999) reviewed the statistics literature on the methods for analyzing ordinal data collected from multiple raters and classified them into two approaches. The first approach focuses on modeling the agreement among raters. Chapter 4 of this dissertation provides a comprehensive review of the most popular measure of agreement: Cohen's Kappa and its generalized form developed in recent years. The hypothesis testing method that was developed with the generalized measure of agreement is also evaluated and an alternative method is proposed to improve the performance.

The other approach emphasizes the tasks of evaluating rater precision and estimating the relative rankings of individuals. This approach is under the paradigm of item response theory (IRT). The following sections review (1) the general form of IRT, (2) the existing estimation procedures and issues, (3) the commonly used IRT models for ordinal responses and multiple raters, and (4) a numerical integration technique, Markov chain Monte Carlo (MCMC), which is a very useful tool for Bayesian IRT modeling.

1.2 Item Response Theory

Item response theory is a group of generalized linear models which were developed in educational testing to describe how a student's performance on a test depends on his/her ability and the characteristics of the items on the test. Hambleton and Swaminathan (1985) provided a complete review of the principles and applications of this theory. In addition, an interesting historical review of the development of IRT can be found in Bock (1997).

Suppose that there are N individuals who take a test consisting of J items. Each item is scored as correct or incorrect. Let $y_{ij}$ denote the binary response of the ith individual to the jth item. The model assumes that the probability that the

PAGE 11

Assuming that the J responses of individual i are conditionally independent given a value of the person's ability $\theta_i$,

$P(y_{i1},\ldots,y_{iJ}\mid\theta_i)=P(y_{i1}\mid\theta_i)\cdots P(y_{iJ}\mid\theta_i);$

Ghosh (1996) provided a comprehensive review of the existing estimation procedures for item response models. Frequentist methods of inference for these models can be broadly categorized as either conditional or marginal maximum

PAGE 12

In recent years, with the rapid development of computational techniques, the Bayesian approach has received considerable attention. The major difficulty for estimating parameters specified in IRT models is that the number of parameters always exceeds the number of observations. The Bayesian approach takes care of this problem by specifying prior distributions for the unknown parameters in order to reduce the dimension of the parameter space. However, unless one uses objective priors, such an approach will lack robustness against misspecified priors. Using flat priors for both subject ability parameters and item difficulty parameters, the Bayesian analysis is equivalent to the likelihood-based analysis (Swaminathan & Gifford, 1985).

A major concern in choosing priors is potential impropriety of the resulting posteriors. Proper posteriors are vital for any Bayesian analysis since, otherwise,

PAGE 13

PAGE 14

The following section reviews several commonly used item response models in the psychometrics literature. Three classical models for ordinal responses without involving multiple raters are first introduced to establish a foundation for more complex models. Built upon this foundation, some recently published item response models for rater effects are reviewed.

1.3 Review of the Psychometrics Literature

When open-ended questions are employed in psychological measurements, raters are usually instructed to rate participants' answers on an ordinal scale (e.g. 1 to 5) according to some guidelines or behavior examples. This is why the item response modeling for rater effects usually deals with ordinal responses.

Agresti (1990) categorized commonly used logit models for ordinal responses into three types. Let $\{\pi_1(x),\ldots,\pi_K(x)\}$ denote response probabilities at value x for a set of explanatory variables. The three types of logit models are defined as follows: The adjacent-categories logits or Lk = log $\pi_{k+1}(x)$

PAGE 15

1+exp[j(ij;k+1)]1 1+exp[j(ijk)];

Another group of models which became popular in the 1980s belongs to the adjacent-categories logit model, based on the definition of Agresti (1990). Masters (1982) developed the well-known partial credit model: log P(y_ij = k)

PAGE 16

Built upon the partial credit model, Linacre (1989) proposed the facets model to measure the rater effect: log P(y_ijr = k)

22r[k(+r)]2 i = 1, ..., N; j = 1, ..., J; r = 1, ..., R; k = 1, ..., K; = 1, ..., K

PAGE 17

Following the tradition of threshold models, Johnson and Albert (1999) proposed the multirater ordinal model which shares a similar perspective with Samejima's graded response model. Since the multirater ordinal model was originally developed for essay grading (Johnson, 1996) and GPA adjustment (Johnson, 1997), unlike Patz's hierarchical rater model, this model was designed for one item instead of multiple items. Let $y_{ir}$ denote the rating assigned by rater r to individual i. This model assumes that there exists a continuous underlying trait scale on which each rater uses different cutoff points. Let $t_{ir}$ denote rater r's perception of the latent trait for individual i: $t_{ir}=Z_i+e_{ir}$, where $Z_i$ is the true value of individual i's latent trait and $e_{ir}$ is the error term. Individual i is assigned to category k by rater r if $\gamma_{r,k-1}$
PAGE 18

Both Patz's hierarchical rater model and Johnson's multirater ordinal model employ the Bayesian approach using the Markov chain Monte Carlo numerical integration technique for parameter estimations. This is an effective approach given the complexity of the models and the large number of parameters specified in the models. The following section introduces the Markov chain Monte Carlo numerical integration technique which will be used in this study.

1.4 Markov Chain Monte Carlo (MCMC)

Markov chain Monte Carlo (MCMC) is a numerical integration technique that generates Markov chains to conduct Monte Carlo integrations. The MCMC methodology was originally developed by physicists to compute complex integrals by using random number generation (Metropolis et al., 1953). In the recent decade, this technique has been widely applied to a broad class of Bayesian problems which involve simulating posterior distributions. Introductory articles for this technique can be found in Casella and George (1992), Chib and Greenberg (1995), and Gilks, Richardson, and Spiegelhalter (1996).

Let X be a vector of k random variables with distribution $\pi(\cdot)$. To evaluate the expectation $E[g(X)]$ for some function of interest $g(\cdot)$, Monte Carlo integration draws samples $\{X_t,\ t=1,\ldots,n\}$ from $\pi(\cdot)$ and then approximates it using $\frac{1}{n}\sum_{t=1}^{n} g(X_t)$.
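Monte Carlo integration as described here can be illustrated with a minimal sketch. The function name `monte_carlo_expectation`, the choice of a standard normal target, and g(x) = x^2 are assumptions made only for illustration; they are not taken from the dissertation.

```python
import random


def monte_carlo_expectation(g, sampler, n, seed=0):
    """Approximate E[g(X)] by the average of g over n independent draws.

    g: function of interest.
    sampler: function taking a random.Random instance and returning one draw of X.
    """
    rng = random.Random(seed)
    return sum(g(sampler(rng)) for _ in range(n)) / n


# Illustrative target (an assumption): X ~ N(0, 1), g(x) = x^2, so E[g(X)] = 1.
estimate = monte_carlo_expectation(
    g=lambda x: x * x,
    sampler=lambda rng: rng.gauss(0.0, 1.0),
    n=100_000,
)
```

With independent draws the average settles near the true expectation as n grows; MCMC replaces the independent sampler with draws from a Markov chain.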

PAGE 19

The Metropolis-Hastings algorithm is a general method for constructing a Markov chain such that its stationary distribution is precisely our distribution of interest $\pi(\cdot)$. The algorithm was first proposed by physicists (Metropolis et al., 1953) and was then generalized by Hastings (1970) to applications in statistics. For the Metropolis-Hastings algorithm, at each time t, the next state $X_{t+1}$ is chosen by first sampling a candidate point $Y_t$ from a proposal distribution $q(Y\mid X_t)$. The candidate point $Y_t$ is then accepted with probability $\alpha(X_t,Y_t)$ where

$\alpha(X,Y)=\min\left(1,\ \frac{\pi(Y)\,q(X\mid Y)}{\pi(X)\,q(Y\mid X)}\right).$

Different methods for specifying the proposal distribution were reviewed by Chib and Greenberg (1995). A commonly used approach, random walk, is adopted in this study and is reviewed as follows. Let $Y_t=X_t+\epsilon_t$;
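The random walk version of the Metropolis-Hastings algorithm can be sketched for a scalar target as follows. This is a minimal illustration under stated assumptions: the increment is taken to be normal with standard deviation `step_sd`, the target is a standard normal, and all names are hypothetical. Because a symmetric increment makes the proposal terms cancel, the acceptance probability reduces to min(1, pi(Y)/pi(X)).

```python
import math
import random


def random_walk_metropolis(log_target, x0, n_steps, step_sd=1.0, seed=0):
    """Random-walk Metropolis sampler for a scalar target density.

    log_target: log of the (possibly unnormalized) density of interest.
    step_sd: standard deviation of the normal increment (an assumed choice).
    """
    rng = random.Random(seed)
    x = x0
    log_px = log_target(x)
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step_sd)  # candidate Y_t = X_t + increment
        log_py = log_target(y)
        # Symmetric proposal: alpha = min(1, pi(Y) / pi(X)), computed on the log scale.
        log_alpha = min(0.0, log_py - log_px)
        if math.log(1.0 - rng.random()) < log_alpha:
            x, log_px = y, log_py  # accept the candidate
        samples.append(x)  # on rejection the chain stays at X_t
    return samples


def log_std_normal(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


draws = random_walk_metropolis(log_std_normal, x0=0.0, n_steps=20_000, step_sd=2.0)
kept = draws[5_000:]  # discard an initial burn-in portion
mean = sum(kept) / len(kept)
```

Working with log densities avoids numerical underflow when the target is a product of many likelihood terms, as it is for the posteriors considered later.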

PAGE 20

A particular Markov chain algorithm that has been commonly used in many multidimensional problems is the Gibbs sampler. Instead of updating the whole of X, it is often more convenient and computationally efficient to divide X into subvectors $\{X_1,X_2,\ldots,X_h\}$ and then update these subvectors one by one. Let $X_{-i}=\{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_h\}$;
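The idea of updating one subvector at a time from its full conditional can be illustrated with a toy example. The sketch below assumes a bivariate standard normal target with correlation rho, for which each full conditional is N(rho * other, 1 - rho^2); this target and the function names are chosen only for illustration and do not come from the dissertation.

```python
import random


def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs sampler for a bivariate standard normal with correlation rho.

    Each coordinate is drawn in turn from its full conditional given the
    current value of the other coordinate.
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5
    x1, x2 = 0.0, 0.0
    out = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, cond_sd)  # draw X1 | X2
        x2 = rng.gauss(rho * x1, cond_sd)  # draw X2 | X1 (uses the new X1)
        out.append((x1, x2))
    return out


def sample_correlation(pairs):
    """Pearson correlation of a list of (x1, x2) pairs."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    s11 = sum((p[0] - m1) ** 2 for p in pairs)
    s22 = sum((p[1] - m2) ** 2 for p in pairs)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs)
    return s12 / (s11 * s22) ** 0.5


pairs = gibbs_bivariate_normal(rho=0.6, n_steps=20_000)
r = sample_correlation(pairs[1_000:])
```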

PAGE 21

1.5 Overview of This Study

In this study, we start with building a model for ordinal responses on multiple items in Chapter 2. Then, we move on to a more complex model which involves multiple raters in Chapter 3. Model specification and parameter estimation methods for rater effects using a Bayesian approach are proposed. The computational issues and applications of the proposed methods in education and psychology are also addressed. After the IRT paradigm is explored in Chapters 2 and 3, we provide a comprehensive review and evaluation of the paradigm of Cohen's Kappa for measuring rater agreement in Chapter 4. An alternative method for testing group differences in rater agreement is proposed and is shown to outperform a recently developed hypothesis testing method.

In Chapter 2, we review a popular model for ordinal responses on multiple items, the partial credit model (Masters, 1982), and its original method of parameter estimation. We propose a hierarchical Bayesian approach to tackle the inconsistency issue of the existing parameter estimation method. We provide solutions for computational issues of the proposed method including using the variance ratio method to monitor convergence of MCMC and applying the recursive technique to calculate successive sample variances. The applications of the proposed method are demonstrated by using real psychology data which contain the ratings of open-ended responses from 87 adolescents with cystic fibrosis on a psychological measurement: Role-Play Inventory of Situations and Coping Strategies (RISCS; Quittner, 1996).

In Chapter 3, we review Johnson's multirater ordinal model (Johnson, 1996, 1997) and investigate the theoretical and practical issues of the model. To solve the existing problems, we propose an alternative model which is more applicable

PAGE 22

In Chapter 4, we review the most popular measure of agreement: Cohen's Kappa (Cohen, 1960, 1968) and its generalization using the distance function approach (Mielke & Berry, 2001). A simulation study is conducted to evaluate the performance of the hypothesis testing method proposed by Mielke and Berry (2001) for group differences in rater agreement. A Monte Carlo permutation test is proposed and is shown to outperform Mielke and Berry's test by a simulation study. A real psychology data set containing the RISCS ratings given by a group of trained raters and a group of untrained raters is used to demonstrate both methods.

Chapter 5 provides six relevant topics for future research. All the topics involve important methodological issues in psychology and education.

PAGE 23

In this chapter, we first review a popular model for ordinal responses in psychology and education, the partial credit model, and the original method of parameter estimation. We propose a hierarchical Bayesian approach to tackle the limitations of the existing parameter estimation method for the partial credit model. We also provide solutions for computational issues of the proposed method and demonstrate its applications by using real psychology data.

2.1 Review of the Partial Credit Model

We review in this section the partial credit model including the model specification and the parameter estimation method. Motivated by the fact that the questions on most educational tests are ordinal responses in nature and yet are treated as dichotomous items, Masters (1982) proposed a partial credit model to analyze data of ordinal responses. One good example of an item which involves several steps to accomplish was given by Masters (1982): $\sqrt{\ldots/0.3-16}=?$

In order to get this question right, one needs to solve the following three steps: Step 1:

The traditional educational testing scores performance on this kind of question as either failure or success (i.e. dichotomous responses). However, a student who can get step 1 right or both steps 1 and 2 right actually performs differently from

PAGE 24

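Suppose there are K ordinal categories. Let $y_{ij}$ be the response of the ith individual to the jth item (i = 1, ..., N; j = 1, ..., J). Thus, $y_{ij}=k$ if the response belongs to category k (k = 1, ..., K). The partial credit model can be specified as follows:

$\log\frac{P(y_{ij}=k)}{P(y_{ij}=k-1)}=\theta_i-\delta_{jk},\quad k=2,\ldots,K.$

Let $\pi_{ijk}=P(y_{ij}=k)$. Then,

$\pi_{ij1}=\frac{1}{1+\sum_{k=2}^{K}\exp\sum_{l=2}^{k}[\theta_i-\delta_{jl}]},\qquad \pi_{ijk}=\frac{\exp\sum_{l=2}^{k}[\theta_i-\delta_{jl}]}{1+\sum_{k=2}^{K}\exp\sum_{l=2}^{k}[\theta_i-\delta_{jl}]},\quad k=2,\ldots,K.$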
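The category probabilities of the partial credit model can be computed directly from the expressions above. The following sketch is illustrative only: the function name `pcm_probabilities` and the argument names are hypothetical, `theta` stands for one individual's ability, and `deltas` holds the K-1 step parameters of one item for categories 2 through K.

```python
import math


def pcm_probabilities(theta, deltas):
    """Category probabilities [pi_1, ..., pi_K] under the partial credit model.

    theta: ability of one individual.
    deltas: step parameters of one item for categories 2, ..., K.
    """
    cumulative = 0.0
    numerators = [1.0]  # category 1 corresponds to an empty sum, exp(0) = 1
    for delta in deltas:
        cumulative += theta - delta  # running sum of (theta - delta_l), l = 2..k
        numerators.append(math.exp(cumulative))
    total = sum(numerators)
    return [value / total for value in numerators]


# With equal ability and step parameters, all four categories are equally likely.
equal_case = pcm_probabilities(0.0, [0.0, 0.0, 0.0])
example = pcm_probabilities(0.5, [-1.0, 0.2, 1.3])
```

The log ratio of two adjacent entries of the returned list recovers the difference between the ability and the corresponding step parameter, which is the defining property of an adjacent-categories logit model.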

PAGE 25

wherei=PJj=1PKk=2(k1)yijk;jl=PNi=1PKk=lyijk:;Tj=(j2;:::;jK);Tj=(j2;:::;jK).

Since there are no closed form solutions for the MLE of $\theta_i$ (i = 1, ..., N) and $\delta_{jk}$ (j = 1, ..., J; k = 2, ..., K), Masters (1982) proposed to use an iterative numeric procedure such as the Newton-Raphson method to obtain the MLE. However, according to the review in the previous chapter, the resulting MLE may suffer from lack of consistency (i.e. the Neyman-Scott phenomenon). In the next section, we propose a Bayesian approach to tackle this problem.

2.2 The Proposed Hierarchical Bayesian Model

The major difficulty for estimating parameters specified in IRT models is that the number of parameters always exceeds the number of observations. The Bayesian approach takes care of this problem by specifying prior distributions for the unknown parameters in order to reduce the dimension of the parameter space. The consistency problem is resolved for the full Bayesian model where the dimension of the parameter space remains fixed with the change in sample size. A frequentist approach to this problem is to specify some distributions for the nuisance parameters, integrate these parameters out, and base inference on the integrated likelihood. This is essentially a practical Bayes approach in contrast to the full Bayesian approach as taken in this chapter.

The choice of priors is of concern to Bayesian statisticians. If subjective priors are specified, there is doubt whether they reflect the reality. On the other hand, the use of objective priors usually raises the issue of impropriety of posteriors. The hierarchical Bayesian modeling is a popular method for dealing with the uncertainty in the prior information by assigning a distribution to the prior parameters. In this section, we propose a hierarchical Bayesian model to estimate the ability and difficulty parameters in the partial credit model. We demonstrate later in this chapter that the proposed hierarchical Bayesian model is robust to

PAGE 26

AssumethefollowingprioriiidN(0;2)jiidN(0;); 22NXi=12i#jjJ(K1) 2exp"1 2JXj=1Tj1j#2c 2tr(1): 22NXi=12i+d!# 2NXi=12i+d!! 2exp1 2tr1h+PJj=1jTji

PAGE 27

i.e.IWJ(K1)+m;+PJj=1jTj 2Tj1j]

Applying the Gibbs sampling algorithm that iterates between these full conditionals, we can simulate the joint posterior distribution. Although $\sigma^2$ and $\Sigma$ can be directly sampled from the inverse Gamma and the inverse Wishart distributions, respectively, there are not any known distributions corresponding to $\delta_j$ and $\theta_i$. We apply the random walk Metropolis-Hastings algorithm which has been introduced in Chapter 1 to generate these two sets of variables.

2.3 The Variance Ratio Method for Monitoring Convergence

After the Gibbs sampling on the full conditionals converges, the posterior distributions can be simulated by the generated samples. Here one important question is raised: how do we know that the MCMC algorithm converges? Brooks and Roberts (1998) provided a comprehensive review of the existing convergence assessment techniques. In the present study, the variance ratio method is used to monitor convergence because it can be implemented easily and is applicable to monitor multivariate parameters. This method was originally developed by Gelman and Rubin (1992) based on the concept of ANOVA. Suppose m parallel simulations, each of length n, are conducted. Let $\psi_{ij}$ denote the scalar summary of interest for the iteration j of sequence i, j = 1, ..., n; i = 1, ..., m. The

PAGE 28

m1mXi=1( ::=1 W;

This method was later extended by Brooks and Gelman (1998) to monitor convergence of the estimation of a vector parameter with p dimensions. The direct analogue of the univariate approach in higher dimensions is as follows: B=n m1mXi=1(
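The univariate variance ratio can be sketched in a few lines. The version below is an assumption about the exact form: it follows the usual presentation of the potential scale reduction factor (between-sequence variance B/n, within-sequence variance W, pooled estimate V = (n-1)/n * W + (1 + 1/m) * B/n, ratio V/W) and omits any degrees-of-freedom correction. The function name `psrf` is hypothetical.

```python
def psrf(chains):
    """Potential scale reduction factor for m parallel chains of equal length n.

    chains: list of m lists, each holding n draws of one scalar summary.
    Returns V / W, where W is the within-sequence variance and V pools the
    within- and between-sequence variances. No degrees-of-freedom correction
    is applied (an assumption of this sketch).
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [sum(chain) / n for chain in chains]
    grand_mean = sum(chain_means) / m
    # Between-sequence variance divided by n.
    b_over_n = sum((mu - grand_mean) ** 2 for mu in chain_means) / (m - 1)
    # Within-sequence variance.
    w = sum(
        sum((value - mu) ** 2 for value in chain)
        for chain, mu in zip(chains, chain_means)
    ) / (m * (n - 1))
    v_hat = (n - 1) / n * w + (1 + 1 / m) * b_over_n
    return v_hat / w


agreeing = psrf([[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]])
disagreeing = psrf([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]])
```

Chains that explore the same region give a ratio near 1 for long runs, while chains stuck in different regions give a ratio well above 1.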

PAGE 29

The multivariate PSRF is the maximum PSRF of any linear projection of $\psi$:

$\hat{R}^p=\max_a\frac{a'\hat{V}a}{a'Wa}.$

2.4 An Efficient Computing Technique

When the variance ratio method is used to monitor convergence, the value of PSRF needs to be calculated for each iteration. The computation becomes more time consuming as the number of iterations goes up. Ross (1997) introduced an efficient computing technique for recursively computing the successive sample means and sample variances, rather than having to recompute from scratch for each iteration. Consider the sequence of data values $X_1,X_2,\ldots$, and let $\bar{X}_n$ and $S_n^2$ denote the sample mean and sample variance of the first n data values, respectively. The following recursion can be used to successively compute the current value of the sample mean and sample variance. Set $S_1^2=0$ and $\bar{X}_{n+1}=$
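A recursion of this kind can be sketched as below. The update used here is the standard one-pass form, mean_{n+1} = mean_n + (x_{n+1} - mean_n)/(n+1) and S^2_{n+1} = (1 - 1/n) S^2_n + (n+1)(mean_{n+1} - mean_n)^2; it is assumed, not confirmed by the surviving text, that this matches the recursion the chapter goes on to state. The function name is hypothetical.

```python
def running_mean_variance(values):
    """Successive sample means and sample variances via a one-pass recursion.

    Returns a list whose n-th entry is (mean, variance) of the first n values,
    with the sample variance of a single value set to 0.
    """
    mean = 0.0
    var = 0.0
    out = []
    for n, x in enumerate(values):  # n = number of values seen before x
        new_mean = mean + (x - mean) / (n + 1)
        if n >= 1:
            # S^2_{n+1} = (1 - 1/n) S^2_n + (n + 1) (mean_{n+1} - mean_n)^2
            var = (1.0 - 1.0 / n) * var + (n + 1) * (new_mean - mean) ** 2
        mean = new_mean
        out.append((mean, var))
    return out


history = running_mean_variance([1.0, 2.0, 4.0])
```

Each update costs a constant number of operations, so monitoring a run of T iterations costs on the order of T operations rather than T^2.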

PAGE 30

n+1(Xn+1

PAGE 31

=Sn+0+0+n n+1(Xn+1

2.5 Demonstration Using Real Psychology Data

In this section, we first introduce the data of a psychological study. These data are used in this study to demonstrate parameter estimation by the proposed hierarchical Bayesian model under different hyperparameters. The PSRF is computed to monitor convergence of the MCMC.

2.5.1 The Psychology Data

The data used in the analysis contain the ratings of open-ended responses from 87 adolescents with cystic fibrosis on a psychological measurement: Role-Play Inventory of Situations and Coping Strategies (RISCS; Quittner, 1996). RISCS was designed to assess cystic fibrosis patients' coping strategies for some frequent and difficult situations in their everyday lives. The measure categorized the most common problematic situations confronted by teenagers with cystic fibrosis into 7 domains: treatment adherence, parent-teen relationship, eating/weight gain, growing up/long-term health issues, medical procedures and symptoms, school issues, and friends. Each domain contains 2-6 items. The adolescents were encouraged to place themselves in the situations described to them and respond as they normally would as specifically as possible. The rater was instructed to evaluate the adolescent's competency on a 4-point scale, ranging from extremely incompetent/ineffective (1) to extremely competent/effective (4), based on the criteria and behavior examples in the rater's manual. For simplicity, this study

PAGE 32

." What would you say or do in this situation? Item 19:

The data containing the ratings corresponding to these 4 items are shown in Table 2.1.

PAGE 33

The PSRF values for the first 5000 iterations are shown in Figures 2.1-2.4. The estimates for $\sigma^2$ and $\Sigma$ tend to converge more quickly than the ones for $\theta$ and $\delta$. This is because the convergence rate goes down with the number of distinct parameters in each set. In fact, $\sigma^2$ is a scalar. Although $\Sigma$ is a 3 x 3 matrix, it only contains 6 distinct parameters. On the other hand, $\theta$ and $\delta$ contain 87 and 12 parameters, respectively. These figures also show that in order to achieve convergence for all the four sets of parameters, the MCMC has to run about 5000 iterations. Therefore, all the estimates of the model are based on the 5000 samples which were generated after the burn-in period of 5000 iterations.

The estimates of the parameters under different hyperparameters are compared in order to investigate the sensitivity issue. Since $\theta$ contains 87 parameters, the comparison of the point estimates for all of them would be tedious. Instead, the comparison is focused on one adolescent whose ratings on the 4 vignettes consist of all the 4 possible values. Figures 2.5-2.8 show the effects of the choice of prior on the posterior distribution of $\theta_{14}$. In general, the posterior distribution is insensitive to the three values of c, of d, and of m. It is also insensitive to the two different structures of the hyperparameter matrix, I and A, where

$I=\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix},\qquad A=\begin{pmatrix}1&0.5&0.5\\0.5&1&0.5\\0.5&0.5&1\end{pmatrix}.$

PAGE 34

Figure 2.1: Monitoring Convergence of Iterative Simulations on .

Figure 2.2: Monitoring Convergence of Iterative Simulations on $\sigma^2$.

Figure 2.3: Monitoring Convergence of Iterative Simulations on .

Figure 2.4: Monitoring Convergence of Iterative Simulations on .

Figure 2.5: The Effect of the Choice of Prior c on the Posterior Distribution of $\theta_{14}$.

Figure 2.6: The Effect of the Choice of Prior d on the Posterior Distribution of $\theta_{14}$.

Figure 2.7: The Effect of the Choice of Prior m on the Posterior Distribution of $\theta_{14}$.

Figure 2.8: The Effect of the Choice of Prior on the Posterior Distribution of $\theta_{14}$.

PAGE 42

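Table 2.1

adolescent  item 2  item 10  item 14  item 19
 1          4       4        4        3
 2          2       4        2        1
 3          4       4        4        1
 4          2       3        4        3
 5          3       3        4        3
 6          4       3        4        2
 7          4       4        4        1
 8          4       3        4        2
 9          4       2        4        4
10          3       4        4        3
11          4       3        4        1
12          4       3        4        1
13          2       4        2        1
14          1       2        4        3
15          4       3        4        3
16          2       4        4        1
17          2       4        4        4
18          4       4        4        3
19          1       2        1        4
20          1       2        2        2
21          1       2        1        3
22          4       3        3        3
23          4       4        4        1
24          3       2        4        3
25          1       1        1        1
26          4       4        3        2
27          4       2        1        1
28          2       2        4        2
29          1       3        4        4
30          2       4        3        1
31          2       3        3        1
32          4       2        4        3
33          2       3        4        4
34          4       3        4        4
35          4       4        4        3
36          4       3        4        3
37          2       4        4        1
38          3       2        2        4
39          2       2        4        1
40          2       2        1        3
41          4       3        4        1
42          4       3        4        4
43          4       3        4        2
44          4       4        4        1
45          4       4        4        3
46          4       2        4        3
47          2       3        4        3
48          1       2        2        1
49          4       3        4        2
50          2       3        4        3
51          2       3        4        3
52          1       1        1        1
53          4       4        2        2
54          4       3        4        1
55          4       3        2        3
56          4       4        4        4
57          4       4        4        3
58          4       4        4        3
59          1       3        4        1
60          4       3        4        1
61          1       4        4        2
62          4       3        4        3
63          2       3        3        3
64          1       4        3        3
65          4       4        4        3
66          4       3        4        3
67          2       3        4        3
68          2       4        4        4
69          3       3        4        1
70          2       3        4        2
71          2       4        4        2
72          2       3        4        1
73          3       1        4        1
74          2       4        4        3
75          4       2        4        3
76          4       3        3        1
77          4       4        4        3
78          4       4        4        3
79          2       3        1        1
80          4       4        4        1
81          4       4        4        4
82          4       3        4        4
83          4       2        4        1
84          4       3        4        4
85          4       4        4        3
86          4       3        4        1
87          4       3        4        4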

PAGE 47

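parameter         = I       = A
$\sigma^2$        0.424     0.410
$\delta_{12}$    -0.168    -0.163
$\delta_{13}$     0.126     0.161
$\delta_{14}$    -0.406    -0.365
$\delta_{22}$    -0.414    -0.408
$\delta_{23}$    -0.335    -0.369
$\delta_{24}$     0.167     0.174
$\delta_{32}$    -0.118    -0.003
$\delta_{33}$    -0.367    -0.297
$\delta_{34}$    -0.640    -0.641
$\delta_{42}$     0.220     0.244
$\delta_{43}$    -0.014    -0.049
$\delta_{44}$     0.417     0.342
$\Sigma_{11}$     0.070     0.084
$\Sigma_{12}$    -0.003    -0.020
$\Sigma_{13}$    -0.003    -0.020
$\Sigma_{22}$     0.072     0.084
$\Sigma_{23}$    -0.001    -0.015
$\Sigma_{33}$     0.059     0.073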

PAGE 48

The objective of this chapter is to investigate the theoretical and practical issues of Johnson's multirater ordinal model and to propose an alternative model which is more applicable in a general setting of the psychological and educational testing. We also provide solutions for computational issues of the proposed model and demonstrate its applications by using real education data.

3.1 Review of Johnson's Multirater Ordinal Model

We review in this section Johnson's original multirater ordinal model (Johnson, 1996, 1997) including model specification, assumptions for the prior, and the resulting full conditional distributions. Although this model was designed to measure one item, it can be generalized to the multivariate case to measure multiple items. The multivariate generalization of Johnson's model is given in a later section of this chapter.

Let $y_{ir}$ denote the rating assigned by rater r to individual i ($y_{ir}=1,2,\ldots,K$; i = 1, ..., N; r = 1, ..., R). The rater is assumed to rate individual i by first estimating his/her latent trait $x_i$ on a continuous scale and then assigning an ordinal rating based on the estimate and category cutoffs specific to the rater. Let $z_{ir}$ be rater r's estimate of individual i's latent trait. Let $\gamma_r=\{\gamma_{rk}\}$, k = 1, ..., K-1, be the category cutoffs for rater r. Then, the model can be expressed as $z_{ir}=x_i+e_{ir}$;

PAGE 49

whereyir=k()r;k1
PAGE 50

2exp(zirxi)2 2rI[r;yir1
PAGE 51

There are some practical issues that need to be dealt with in this study in order to make the model applicable to a general setting. First, psychological or educational measurements usually employ multiple items instead of one single item to assess a latent trait or construct. Thus, Johnson's multirater ordinal model can be applicable to a general setting only if it can be generalized to the multivariate case. We offer one such generalization.

Second, in real practices, a rater is usually assigned to grade only some of the subjects instead of all of them, due to the limitations of time and funding. However, the model specification in the previous section always assumes that each rater grades all the subjects, for simplicity. To fix this problem, it might require some modifications on the likelihood function and some complications of programming.

Third, it is usually true in practice that some raters never assign certain rating categories to subjects. However, in Johnson's model, the full conditional for the cutoff point $\gamma_{rk}$ requires that rater r has generated both ratings k and k+1. We provide a computational solution for this issue.

Fourth, although the rater-specific cutoffs provide psychologists or educators valuable information concerning rater severity, the information might become overwhelming especially when there are many rating categories or many raters. In this study, some transformations of the estimated cutoffs are proposed in order to provide easily understandable indices for rater severity.

PAGE 52

3.3 Propriety of the Posterior for Johnson's Model

Based on Johnson's multirater ordinal model, the posterior can be written as follows:

(z;x;;2;;jy)/NYi=1RYr=1 2exp(zirxi)2 2rI[r;yir1
PAGE 54

2expf1 2(zirxi1)TV1r(zirxi1)gIAir(zir); i.e.(Vr)/jVrj1 2(m+J+1)exp[1 2tr(V1r)]:Thejointposteriorcanbewrittenas(z;x;;Vjy)/NYi=1RYr=1jVrj1 2(m+J+2)expf1 2[(zirxi1)TV1r(zirxi1)+x2i+tr(V1r)]gIAir(zir):Thus,thefullconditionalscanbederivedasfollows:(zirjx;;V;y)/expf1 2(zirxi1)TV1r(zirxi1)gIAir(zir);i.e.N(xi1;Vr)truncatedinthedomainAir.(xijz;;V;y)/QRr=1expf1 2[(zirxi1)TV1r(zirxi1)+x2i]g;i.e.NPRr=11TV1rzir 1+PRr=11TV1r1:(r;ljz;x;V;y)/QNi=1IAir(zir);i.e.Umaxmaxi;j:y(j)ir=lz(j)ir;minmini;j:y(j)ir=l+1z(j)ir;l=1;:::;K1:

PAGE 55

2(m+J+2)expf1 2[(zirxi1)TV1r(zirxi1)+tr(V1r)]g;i.e.IWm+1;+PNi=1(zirxi1)(zirxi1)T:

3.5 A Method for Dealing with Impropriety

It seems that the impropriety demonstrated in Section 3.3 stems from the improper prior on $\gamma$. One way to solve this problem is to assign proper priors for $\gamma$. Although Johnson (1996; p. 43) proposed to use the uniform on the real line as prior for the cutoff points, Section 3 of his paper (p. 44) seems to suggest that the independently and identically distributed standard normal priors were actually used when a MCMC algorithm was implemented. We consider this as a possible solution for the impropriety. Using this approach, the full conditional for $\gamma_{r,l}$ is a truncated standard normal distribution on the interval

$\left(\max\max_{i,j:\,y_{ir}^{(j)}=l} z_{ir}^{(j)},\ \min\min_{i,j:\,y_{ir}^{(j)}=l+1} z_{ir}^{(j)}\right).$

For all the other parameters, the full conditionals stay the same as those in last section.

The models described so far always assume that each rater grades all the subjects, for simplicity. However, as stated in Section 3.2, a rater is usually assigned to grade only a subset of the subjects in practice. To make the models applicable to a general setting, the product term $\prod_{i=1}^{N}\prod_{r=1}^{R}$ can be changed to a product over raters r = 1, ..., R and, within each rater, over the set of all subjects i graded by rater r. Computationally, it involves setting up an indicator variable for each combination of i and r.

This modified model can be summarized by its likelihood function, prior, and full conditionals as follows:

Likelihood function

$L(z,x,\gamma,V\mid y)\propto\prod_{r=1}^{R}\ \prod_{i:\ r\text{ graded }i}|V_r|^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}(z_{ir}-x_i\mathbf{1})^{T}V_r^{-1}(z_{ir}-x_i\mathbf{1})\right\}I_{A_{ir}}(z_{ir});$
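Drawing a cutoff from a truncated standard normal full conditional can be sketched with the inverse-CDF method. The sketch below makes several assumptions that are not in the source: it treats a single item, it takes the interval bounds to be the largest latent estimate among subjects rated l and the smallest among subjects rated l+1, and it uses the hypothetical function name `sample_cutoff`. It also assumes rater r has used both categories l and l+1, so that both bounds exist.

```python
import random
from statistics import NormalDist


def sample_cutoff(z, y, l, rng):
    """Draw one cutoff from N(0, 1) truncated to (lower, upper).

    z: latent estimates for the subjects graded by one rater (one item).
    y: the ratings the rater assigned to those subjects.
    l: index of the cutoff separating ratings l and l + 1.
    Assumes both ratings l and l + 1 appear in y and that lower < upper.
    """
    lower = max(zi for zi, yi in zip(z, y) if yi == l)
    upper = min(zi for zi, yi in zip(z, y) if yi == l + 1)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_lower = std_normal.cdf(lower)
    p_upper = std_normal.cdf(upper)
    # Uniform draw on the CDF scale, mapped back through the inverse CDF.
    u = p_lower + (p_upper - p_lower) * rng.random()
    draw = std_normal.inv_cdf(u)
    # Guard against rounding that could push the draw just outside the interval.
    return min(max(draw, lower), upper)


rng = random.Random(1)
z_values = [-1.2, -0.3, 0.4, 1.5]
ratings = [1, 1, 2, 2]
cutoff_draws = [sample_cutoff(z_values, ratings, 1, rng) for _ in range(1000)]
```

When a rater grades only a subset of the subjects, passing in only that rater's subjects plays the role of the indicator variable for each combination of i and r.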

PAGE 56

whereAir=8>>>><>>>>:r;y(1)ir1;r;y(1)ir...r;y(J)ir1;r;y(J)ir9>>>>=>>>>;:Prior xN(0;I): r;kN(0;1);undertheconstraint:r;k
PAGE 57

The second computational issue is how to generate truncated multivariate normal variables. We adopt the method proposed by Damien and Walker (2001) that reduces the problem of sampling truncated densities to the sampling of a couple of uniform random variables by introducing a single latent variable. Consider the following truncated multivariate normal density

$f_{X_1,\ldots,X_p}(x_1,\ldots,x_p)\propto\exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right)I(x\in D);$

PAGE 58

Theabovemethodcanbeusedtosimulateourfullconditional(zirjx;;V;y)/N(xi1;Vr); 2exp[1 2(ta)]I(t>a) andF(tjzir;xi;Vr)=1exp[1 2(ta)];

PAGE 59

Next, we rearrange and partition the inverse covariance matrix $V_r^{-1}$ accordingly: V1r=264pppqqpqq375;

The third computational issue is raised by the possibility that some raters never assign certain rating categories to any subjects. For example, in the education data we use in the next section (see Table 3.1), rater 2 has never given any ratings lower than 4 on a 1-7 rating scale. When this happens, we would definitely encounter difficulties in estimating $\gamma_{r,l}$. Recall the full conditional: $\pi(\gamma_{r,l}\mid z,y)\propto N(0,1)$, truncated in the interval

$\left(\max\max_{i,j:\,y_{ir}^{(j)}=l} z_{ir}^{(j)},\ \min\min_{i,j:\,y_{ir}^{(j)}=l+1} z_{ir}^{(j)}\right).$

If rater r has never assigned rating l to any subject i on any item j, the lower bound of the truncated normal for $\gamma_{r,l}$ does not exist. As a result, the Gibbs sampling cannot continue when it cycles through $\gamma_{r,l}$. By the same token, if rating l+1 has never been assigned by rater r, the Gibbs sampling has to stop at $\gamma_{r,l}$ because the upper bound for the distribution of $\gamma_{r,l}$ does not exist.

To solve this problem computationally, for each rater, we first pick out the rating categories he/she has assigned. If rater r has assigned rating l (l=

PAGE 60

After we sample $\gamma_{r,l}$ for all the rating categories rater r has assigned, we move to the unchosen ratings. Let m denote the unassigned rating. If m = 1, we set $\gamma_{r,m}$ to be an arbitrarily small value, say -5 (for a standard normal, the probability that the value of a random variable < -5 is less than $10^{-6}$). Otherwise (for m = 2, ..., K-1), we assign $\gamma_{r,m}=\gamma_{r,m-1}+\epsilon$, where $\epsilon$ is an arbitrarily small value, say $10^{-6}$.

3.7 Demonstration Using Real Education Data

In this section, we first introduce the data of an educational study. These data were analyzed using the proposed multivariate model for rater effects. The results of analysis are shown. The practical implications of the findings are also discussed.

3.7.1 The Education Data

Data collected by an educational study are used to demonstrate the proposed Bayesian model for multiple raters. The data contain the ratings generated by 24 trained raters on a commonly used educational measurement: Early Childhood Environment Rating Scale-Revised (ECERS-R; Harms, Clifford, & Cryer, 1998). ECERS-R was designed to assess process quality in an early childhood care group. Process quality is assessed primarily through the ratings produced by trained raters based on their observations of the arrangement of space, the materials and activities offered to the children, the supervision and interactions that occur in the classroom, schedule of the day, and the support offered to parents and staff. The scale consists of 7 subscales each of which contains 4-10 items. The rater is instructed to evaluate the child care quality for each of the items on a 7-point scale,
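The rule for unassigned rating categories stated at the end of Section 3.6 (a floor of -5 for the first cutoff, and otherwise the previous cutoff plus a tiny increment) can be sketched as follows. The function name, the dictionary input, and the example values are hypothetical; only the constants -5 and 10^-6 come from the text.

```python
def fill_unassigned_cutoffs(sampled, n_categories, floor=-5.0, eps=1e-6):
    """Complete one rater's cutoff vector (gamma_1, ..., gamma_{K-1}).

    sampled: dict mapping cutoff index m to a value already drawn by the
             sampler; indices missing from the dict are treated as unassigned.
    n_categories: K, the number of rating categories.
    floor: value used when the first cutoff is unassigned.
    eps: increment added to the previous cutoff for later unassigned cutoffs.
    """
    cutoffs = []
    for m in range(1, n_categories):
        if m in sampled:
            cutoffs.append(sampled[m])
        elif m == 1:
            cutoffs.append(floor)
        else:
            cutoffs.append(cutoffs[-1] + eps)
    return cutoffs


# Hypothetical rater on a 7-point scale who never used ratings 1-3,
# so only cutoffs 4, 5 and 6 were sampled.
completed = fill_unassigned_cutoffs({4: -0.8, 5: 0.1, 6: 1.2}, n_categories=7)
```

Keeping the filled-in cutoffs strictly increasing preserves the ordering of the cutoff vector while assigning essentially zero probability to the categories the rater never used.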

PAGE 61

Each of the 24 raters, who have completed the training sessions, was assigned to observe and rate 4-8 of the 73 classrooms. Each classroom was evaluated by two raters. For simplicity, this study only analyzes the data for the interaction subscale which consists of 5 items listed as follows: Item 29:

The data containing the ratings corresponding to these 5 items are shown in Table 3.1.

3.7.2 The Results of Analysis

Applying the proposed method to analyze the education data, we obtained some estimated values for the parameters which are demonstrated in Tables 3.2-3.5. The hyperparameters were set as m = 10 and the hyperparameter matrix equal to I. All the point estimates were calculated as medians of the 10,000 samples which were generated after the burn-in period of 10,000 iterations.

Table 3.2 shows the estimated values of the 6 cutoff points ($\gamma_{r1},\ldots,\gamma_{r6}$) for each of the 24 raters. For rater 2, the estimated cutoff points for ratings 1-3 are all equal to the fixed value, -5.0000, because this rater did not assign these three ratings. Although the distances between cutoff points provide some information about a rater's rating criteria under a normal curve, it is very difficult to obtain a clear picture of his/her rating pattern by simply looking at these values. We propose to transform the cutoff points into estimated probabilities that a rater assigns a subject to the corresponding rating categories. Let $\gamma_{r0}=-\infty$ and

PAGE 62

The values in the parentheses of Table 3.3 are the empirical probabilities, $\tilde{p}_{rk}$ ($r = 1, \dots, 24$; $k = 1, \dots, 7$), defined as the proportion of times that rating $k$ was assigned by rater $r$. As shown in the table, although there is a general consistency between $\hat{p}_{rk}$ and $\tilde{p}_{rk}$ in terms of the magnitude, the difference of the two values could be larger than .10 in some cases. One advantage of using the proposed model to estimate the probabilities of rating categories is that we can provide not only point estimates (like the values shown in Table 3.3) but also interval estimates. For example, we could calculate the posterior quantiles based on the generated 10,000 samples and construct 95% credible intervals for the probabilities of rating categories.

Table 3.4 lists the estimated values of each rater's error variances on the 5 items. This table provides valuable information about each rater's precision. As shown in the table, a rater could have different precisions on different items. In general, raters seem to have more trouble with items 1 and 2. Some raters are very accurate (e.g., rater 6), whereas others tend to make more errors (e.g., rater 3).

In current practice, educators who use ECERS-R usually average the scores of the 5 items to get the interaction subscale score for each classroom. If multiple raters grade the same classroom, the average of the subscale scores given by those raters is reported to summarize the classroom's quality on the interaction domain. The resulting average score is denoted by

PAGE 63

The posterior distributions for the four sets of parameters were graphed for rater 7 and classroom 66 and are shown in Figures 3.1-3.4. The shapes of the histograms look consistent with the corresponding full conditionals summarized in Section 3.5.
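The transformation from cutoff points to rating-category probabilities, and the interval estimates mentioned above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: it assumes the probability of category k is the standard normal mass between consecutive cutoffs, with the outermost cutoffs taken as minus and plus infinity, and that posterior draws of a rater's cutoffs are available as rows of an array. The function names `category_probs` and `posterior_summary` are hypothetical.

```python
import numpy as np
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal CDF; handles -inf and +inf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def category_probs(cutoffs):
    """Map K-1 increasing cutoffs to K category probabilities.

    Assumption: P(category k) = Phi(cutoff_k) - Phi(cutoff_{k-1}),
    with cutoff_0 = -inf and cutoff_K = +inf.
    """
    bounds = np.concatenate(([-np.inf], np.asarray(cutoffs, dtype=float), [np.inf]))
    cdf = np.array([normal_cdf(b) for b in bounds])
    return np.diff(cdf)


def posterior_summary(cutoff_draws, level=0.95):
    """Median and equal-tailed credible interval of each category probability.

    cutoff_draws: array of shape (n_draws, K-1), one posterior draw per row.
    """
    probs = np.array([category_probs(d) for d in cutoff_draws])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(probs, [tail, 1.0 - tail], axis=0)
    return np.median(probs, axis=0), lower, upper
```

With cutoffs fixed near $-5$ for ratings a rater never used, the corresponding probabilities come out essentially zero, which matches the intent of the fixed values described earlier.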

PAGE 64

Figure 3.1: The Posterior Distribution of $z^{(1)}_{66,7}$.

PAGE 65

Figure 3.2: The Posterior Distribution of $x_{66}$.

PAGE 66

Figure 3.3: The Posterior Distribution of $\gamma_{7,1}$.

PAGE 67

Figure 3.4: The Posterior Distribution of $v^{(11)}_{7}$.

PAGE 68

classroomrateritem29item30item31item32item33 122444641244463621951472220575753194234432014322415324444163737651511111516311116172342161821311717565767186457681311113814111139134632691454466105445661076657611625414118345741265566612827776135244641373423314361777144177771519113141520454351619354771620424761711474461712555771831323218461111192246577192467677202223534202435413

PAGE 69

Table3.1-Continued. classroomrateritem29item30item31item32item33 2117465762118445762217111112218111112321555772323565772421476772423455772591111125101111126957677261057677271164534271255362281511113281611113291323321291444325301314274301423222311556576311647576323617773241777733193121133204422634222432334241111335224154435244164736177677362776773731323237461111382145777382346574392164676392364574402145777402346574

PAGE 70

Table3.1-Continued. classroomrateritem29item30item31item32item33 411711111411811113421714434421816413439543144310543144419424274420114274513477774514547774613334624614534644715774764716543454815564764816544724911457749274576501132111501213111511111572511211562526216145284657653967777531066777543777775446777755194434755203242756224776756244777757194341457201221258144677582775775915467759256677601715576601814476

PAGE 71

Table3.1-Continued. classroomrateritem29item30item31item32item33 6121556766123555766221256766223276746322244446324245346451132464721111655445746584357466534574667455766753246467746777686423476883646469311677694176777033322370434446711566777126657772516574727165767351747473717576

PAGE 74

Table 3.3-Continued.

PAGE 77

Table 3.5-Continued.

PAGE 78

In this chapter, we first review the most popular measure of agreement: Cohen's Kappa. Then, we review a distance function approach which generalizes the Kappa to (1) multiple raters, (2) multiple items, and (3) any level of measurement. A simulation study is conducted to evaluate the performance of the hypothesis testing method proposed by Mielke and Berry (2001) for the difference of two measures of agreement. A permutation test is proposed and is shown to outperform Mielke and Berry's test by a simulation study on the power function. A real psychology data set is used to demonstrate both methods.

4.1 Review of Cohen's Kappa

In this section, we review the most popular measures of agreement developed by Cohen (1960, 1968). Back in the sixties, the most primitive approach for measuring agreement was simply counting up the proportion of cases in which the raters agreed. This simple measure is still used frequently in today's psychology community because it is easy to calculate and has an intuitive interpretation. However, Cohen (1960) argued that this primitive method does not take account of the fact that a certain amount of the observed agreement is to be expected by chance. To solve this problem, he proposed a chance-corrected measure of agreement: the coefficient Kappa.

4.1.1 Kappa-Nominal Scale

Assume that two raters independently assign each of $N$ subjects into one of $K$ categories. The results can be summarized by a $K$ by $K$ contingency table. Table 4.1 demonstrates a 3 by 3 contingency table.
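As an illustration of the chance correction, the sketch below tabulates two raters' assignments into a $K$ by $K$ table of proportions and computes the unweighted Kappa using the standard definition $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed proportion of agreement and $p_e$ is the agreement expected by chance from the two marginal distributions. The function name and the coding of categories as integers $1, \dots, K$ are assumptions made for the example.

```python
import numpy as np


def cohen_kappa(rater_a, rater_b, n_categories):
    """Unweighted Cohen's Kappa for two raters on a nominal scale.

    rater_a, rater_b: sequences of category labels coded 1..n_categories.
    Returns (p_o, p_e, kappa). Undefined (division by zero) when p_e == 1.
    """
    a = np.asarray(rater_a) - 1
    b = np.asarray(rater_b) - 1
    table = np.zeros((n_categories, n_categories))
    np.add.at(table, (a, b), 1)          # cell (i, j): A chose i, B chose j
    p = table / table.sum()              # cell proportions p_ij
    p_o = np.trace(p)                    # observed agreement: diagonal mass
    p_e = p.sum(axis=1) @ p.sum(axis=0)  # chance agreement from the marginals
    return p_o, p_e, (p_o - p_e) / (1.0 - p_e)
```

For example, `cohen_kappa([1, 1, 2, 2], [1, 2, 1, 2], 2)` gives observed agreement 0.5 and chance agreement 0.5, so Kappa is 0: the raters agree no more often than chance would predict.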

PAGE 79

Let $p_o$ be the observed proportion of cases on which the raters agree. Let $p_e$ denote the proportion of cases for which agreement is expected by chance. The coefficient Kappa is given by $\kappa = (p_o - p_e)/(1 - p_e)$.

The coefficient Kappa was designed for measuring rater agreement on a nominal scale. That is, as long as the two raters assign a subject to different categories, the subject is counted as a case of disagreement no matter which pair of categories the raters choose. However, in the fields of psychology and education, choosing a different pair of categories could reflect a different degree of disagreement. For example, if each student is assigned to a three-point scale: (1) incompetent (2) somewhat competent (3) competent, the degree of disagreement for choosing (1) and (3) is obviously larger than the one for choosing (1) and (2). To incorporate the disagreements of varying degree, Cohen (1968) proposed the weighted Kappa as the corresponding measure of agreement.

Suppose a psychologist assigns an agreement weight $w_{ij}$ to each of the $K^2$ cells ($i, j = 1, \dots, K$) according to his/her practical sense. Let $w_{\max} = \max\{w_{ij} : i, j = 1, \dots, K\}$. The weighted proportions of observed and chance agreement are calculated as $p_o = \sum_{i=1}^{K} \sum_{j=1}^{K} w_{ij} p_{ij}$

PAGE 80

The greatest controversy surrounding the use of Kappa and weighted Kappa is that their values depend heavily on the marginal distributions. Sprott (2000) and Agresti (2002) have provided examples to demonstrate that for a given association (odds ratio), the value of Kappa varies with the values of marginal probabilities.

4.2 Review of The Distance Function Approach

Mielke and Berry (2001) provided a comprehensive review of the methodological development after Cohen's Kappa was published. They called for a generalization of the Kappa to (1) multiple raters, (2) multiple items, and (3) any level of measurement. They proposed a distance function approach for measuring disagreements through Euclidean distances in the context of a multivariate randomized block design so that multiple raters and multiple items can be included in the model. They also proposed a permutation method to calculate the exact p-value for hypothesis testing and an approximation method.

Suppose a test contains $J$ items which are designed to measure the same latent trait of an individual. Let $y_{ir} = (y^{(1)}_{ir}, \dots, y^{(J)}_{ir})^T$ be the ratings assigned by rater $r$ to individual $i$ according to his/her performance on the $J$ items ($i = 1, \dots, N$; $r = 1, \dots, R$). Treating raters as blocks, a multivariate randomized block design can be applied to this setting. Table 4.2 demonstrates such a design.
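To make the distance function idea concrete, the following sketch computes a chance-corrected agreement measure from an array of ratings laid out as in the randomized block design (subjects by raters by items). It rests on assumptions that should be checked against Mielke and Berry (2001): observed disagreement is taken as the average Euclidean distance between two raters' rating vectors for the same subject, expected disagreement as the average distance when the two raters' vectors are paired across all subjects, and agreement as one minus their ratio. The function name `agreement_measure` is hypothetical.

```python
import numpy as np
from itertools import combinations


def agreement_measure(y):
    """Distance-based agreement for ratings y of shape (N subjects, R raters, J items).

    Returns (delta, mu_delta, agreement), where agreement = 1 - delta / mu_delta.
    Undefined when all ratings are identical (mu_delta == 0).
    """
    y = np.asarray(y, dtype=float)
    n_subjects, n_raters, _ = y.shape
    pairs = list(combinations(range(n_raters), 2))
    observed = 0.0
    expected = 0.0
    for a, b in pairs:
        # dist[i, j]: distance between rater a on subject i and rater b on subject j
        dist = np.linalg.norm(y[:, None, a, :] - y[None, :, b, :], axis=2)
        observed += np.trace(dist)   # same-subject pairings only
        expected += dist.sum()       # all cross-subject pairings
    delta = observed / (n_subjects * len(pairs))
    mu_delta = expected / (n_subjects ** 2 * len(pairs))
    return delta, mu_delta, 1.0 - delta / mu_delta
```

Under this form the measure equals 1 when all raters give identical vectors to every subject, is near 0 when same-subject distances are no smaller than cross-subject distances, and can be negative when raters systematically disagree.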

PAGE 81

The observed proportion of disagreement can be written as $\delta = 1 - p_o$

Mielke and Berry (2001) also developed hypothesis testing for $\Re$. Since
PAGE 82

When the number of permutations is large, they proposed to employ the Pearson type III distribution to attain an approximate p-value which usually performs better than the normal approximation, especially when the actual distribution is skewed. Based on the Pearson type III distribution, the probability density function of $y$ characterized by the skewness parameter, $\gamma$, is given as follows: If $\gamma < 0$:

It should be noted that if $\gamma = 0$, $f(y) = (2\pi)^{-1/2} \exp(-y^2/2)$;

The exact mean, variance, and skewness of $\delta$ under the permutation distribution are given by

PAGE 83

In addition to the hypothesis test for the agreement among one group of raters, Mielke and Berry (2001) developed another hypothesis test to evaluate the difference between measures of agreement obtained from two independent groups of raters. Let $\Re_1$

It can be shown easily that the above formulas for $\sigma^2_D$ and $\gamma_D$ are based on the assumption that the expected proportions of disagreement under the permutation

PAGE 84

4.3 A Simulation Study on the Performance of Mielke and Berry's Test

In this section, a simulation study is conducted to investigate the performance of the hypothesis test for the difference between two independent agreement measures proposed by Mielke and Berry. We first review the definition and practical meaning of the finite mixture model. Then, we apply the finite mixture model to generate random samples that satisfy the null hypothesis or the alternative hypothesis and measure the resulting values of the power function. Comparing the values of the power function and the significance level, we could evaluate the performance of Mielke & Berry's test.

4.3.1 Review of Finite Mixture

Flury (1997) defined the finite mixture distribution as follows: Suppose that the random variable $Y$ has a distribution that can be represented by a pdf of the form $f_Y(y) = p_1 f_1(y) + \dots + p_k f_k(y)$;

Flury (1997) also provided a good way to think of finite mixtures: "There are $k$ different possible experiments that can be performed. The outcome of

PAGE 85

The finite mixture is used to simulate the data so that different cases under the null hypothesis (alternative hypothesis) can be easily simulated and the discrete feature of the ratings can be straightforwardly generated. Define $g$ as a multivariate discrete uniform with the parameter space $\{1, \dots, K\}$. Let $z = (z_1, \dots, z_N)^T$ be the latent traits of subjects $1, \dots, N$. Let $x = (x_1, \dots, x_N)^T$ and $y = (y_1, \dots, y_N)^T$ be the ratings generated by a rater from group 1 and a rater from group 2, respectively. The procedure for the experiment can be listed as follows: Step 1:

Notice that different cases under the null hypothesis (alternative hypothesis) can be easily simulated by manipulating the values of $\pi_1$ and $\pi_2$. Moreover, the discrete feature of the ratings $x$ and $y$ can be straightforwardly generated by the finite

PAGE 86

In the simulation experiments, we choose the number of rating categories $K = 10$ because it is probably the largest scale commonly used in practice. The number of raters $R$ is set to be 10 and the maximum for the number of subjects $N$ is assigned to be 20 because these are the usually acceptable numbers based on the consideration of study costs. The number of simulations $M$ is set to be 1000 as a compromise between the computational time and the accuracy.

When $\pi_1$ ($\pi_2$) is large, group 1 (group 2) raters perform well. When $\pi_1$ ($\pi_2$) is small, group 1 (group 2) raters perform poorly. We simulate three cases where the null hypothesis of no group differences in rater agreement is most likely true and monitor the resulting values of $\Re_1$, $\Re_2$, and the power function for subject number $N = 1, \dots, 20$. The results of the simulation experiments are summarized in Tables 4.3, 4.4, and 4.5 for $\pi_1 = \pi_2 = 0.10$, $\pi_1 = \pi_2 = 0.50$, and $\pi_1 = \pi_2 = 0.90$, respectively.

Theoretically, the value of the power function under $H_0$ is supposed to be close to the significance level $\alpha$ (we set $\alpha$ to be 0.05 in our experiments). As shown in Tables 4.3-4.5, the values of $\Re_1$ and $\Re_2$ are very close for the three cases. This is consistent with the null hypothesis of no group differences in rater agreement. When $\pi_1 = \pi_2 = .10$, the agreements are very low for both groups (around .01). When $\pi_1 = \pi_2 = .50$, the agreements are around .20 for both groups. When $\pi_1 = \pi_2 = .90$, the agreements are high for both groups (around .80). However, the values of the power function are never close to 0.05 in any of the 3 cases. In particular, when $\pi_1 = \pi_2 = .50$ or $\pi_1 = \pi_2 = .90$, the values of the power function are far too high (over .70). Therefore, the Type I error rate for Mielke and Berry's test is too high to be acceptable.
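The data-generating mechanism behind these experiments can be illustrated with a small sketch. Because the step-by-step procedure is not reproduced here, the code makes an explicit assumption about the mixture: with a mixing probability (called `pi` below) a rater reports the subject's latent trait, and otherwise reports a draw from the discrete uniform on $\{1, \dots, K\}$. The function names are hypothetical, and the sketch is not the author's simulation code.

```python
import numpy as np


def draw_latent_traits(n_subjects, n_categories, rng):
    """Latent traits drawn uniformly from {1, ..., K} (an assumption)."""
    return rng.integers(1, n_categories + 1, size=n_subjects)


def simulate_ratings(z, pi, n_raters, n_categories, rng):
    """Two-component finite mixture for one group of raters.

    With probability pi a rating equals the latent trait z_i; with
    probability 1 - pi it is uniform on {1, ..., K}.
    Returns an integer array of shape (N subjects, R raters).
    """
    z = np.asarray(z)
    n_subjects = z.shape[0]
    noise = rng.integers(1, n_categories + 1, size=(n_subjects, n_raters))
    on_target = rng.random((n_subjects, n_raters)) < pi
    return np.where(on_target, z[:, None], noise)
```

Setting the same `pi` for both groups yields data under the null hypothesis of equal agreement, while different values yield data under the alternative. Under this mechanism two raters both hit the latent trait with probability `pi` squared, which is in line with the agreement levels of roughly .01, .20, and .80 reported for .10, .50, and .90.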

PAGE 87

We also simulate two cases where there exist some group differences in rater agreement (i.e., under $H_1$). Again, we monitor the resulting values of $\Re_1$, $\Re_2$, and the power function for subject number $N = 1, \dots, 20$. The results of the simulation experiments are summarized in Tables 4.6 and 4.7 for $\pi_1 = .40, \pi_2 = .60$ and $\pi_1 = .40, \pi_2 = .80$, respectively. In Table 4.6, the values of $\Re_2$ are consistently higher than the values of $\Re_1$ because group 2 is set up to have higher agreement than group 1 ($\pi_2 > \pi_1$). In Table 4.7, the differences between $\Re_1$ and $\Re_2$ are even larger because the difference between $\pi_1$ and $\pi_2$ is set up to be larger. As a result, the values of the power function in Table 4.7 tend to be higher than the values in Table 4.6.

4.4 The Proposed Hypothesis Testing Method

As shown in the previous section, the performance of Mielke and Berry's test is less than satisfactory in the present context. Thus, we propose an alternative test in this section. We also conduct a similar simulation study to evaluate the performance of the proposed test and compare the results with the simulation results for Mielke and Berry's test.

Suppose there are two groups of raters who rate subjects $1, \dots, N$ independently. The resulting data structure is illustrated in Table 4.8. Let $x_{ir}$ and $y_{ir}$ ($i = 1, \dots, N$; $r = 1, \dots, R$) be the ratings given by the $r$th raters of groups 1 and 2, respectively, for subject $i$'s performance. Under the null hypothesis of no group differences in rater agreement, one could conduct a permutation test by comparing the statistic of the observed data with the $\binom{2R}{R}$ statistics of all possible permutations of $R$ columns chosen from $2R$ columns (Good, 2000).

It is usually computationally impossible to conduct all the possible permutations for a large $R$, so we propose a Monte Carlo method to calculate the approximate p-value. Let $D_o$ be the observed difference between $\Re_1$ and $\Re_2$. Randomly conducting $B$ permutations of $R$ columns chosen from $2R$ columns, one

PAGE 88

To evaluate the performance of the proposed hypothesis testing method, a simulation study similar to the one in the previous section is conducted. The procedure is listed as follows: Step 1:

4.4.2 The Result of the Simulation Study

As in the simulation experiments conducted in the previous section, we choose the number of rating categories $K = 10$, the number of raters $R = 10$, the maximum for the number of subjects $N_{\max} = 20$, and the number of simulations $M = 1000$. In addition, we choose the number of permutations $B = 1000$.

We again simulate three conditions under which the null hypothesis of no group differences in rater agreement is most likely true and monitor the resulting

PAGE 89

Theoretically, the value of the power function under $H_0$ is supposed to be close to the significance level. As in the simulation study conducted in the previous section, we set the significance level to be 0.05 in our experiments. As shown in Tables 4.9-4.11, the values of $\Re_1$ and $\Re_2$ are very close for the three cases. This is consistent with the null hypothesis of no group differences in rater agreement. When $\pi_1 = \pi_2 = .10$, the agreements are very low for both groups (around .01). When $\pi_1 = \pi_2 = .50$, the agreements are around .20 for both groups. When $\pi_1 = \pi_2 = .90$, the agreements are high for both groups (around .80). Moreover, the values of the power function are very close to 0.05 in all the 3 cases, although when $\pi_1 = \pi_2 = .90$, the values of the power function tend to be a little lower (around 0.001). Therefore, the proposed permutation method performs better than Mielke & Berry's test in terms of controlling the Type I error rate.

We also simulate two cases where there exist some group differences in rater agreement (i.e., under $H_1$). Again, we monitor the resulting values of $\Re_1$, $\Re_2$, and the power function for subject number $N = 1, \dots, 20$. The results of the simulation experiments are summarized in Tables 4.12 and 4.13 for $\pi_1 = .40, \pi_2 = .60$ and $\pi_1 = .40, \pi_2 = .80$, respectively. The values of $\Re_1$ and $\Re_2$ in Tables 4.12 and 4.13 are very close to the corresponding ones in Tables 4.6 and 4.7. This verifies that the results of the two simulation studies are comparable. When $\pi_1 = 0.40$ and $\pi_2 = 0.60$, the differences between $\Re_1$ and $\Re_2$ are not large (about 0.20) so the power function values for the proposed permutation test are not large (about .10). However, the values of the power function for Mielke and Berry's test already reach 0.80 for this amount of difference in rater agreement. When $\pi_1 = 0.40$ and $\pi_2 = 0.80$, the differences between $\Re_1$ and $\Re_2$ are larger (about 0.50) so the power

PAGE 90

4.5 Demonstration Using Real Psychology Data

In this section, the hypothesis testing methods reviewed in previous sections are used to analyze real psychology data. The research question of the psychology study is whether the trained and untrained groups of raters are different in terms of their rater agreements.

4.5.1 The Psychology Data

A psychology study was conducted to investigate if the rater training makes a difference for the ratings on Role-Play Inventory of Situations and Coping Strategies (RISCS; Quittner, 1996), which was reviewed in Chapter 2. Open-ended responses from 6 adolescents with cystic fibrosis on RISCS were presented to 18 trained raters and 18 untrained raters individually. The raters were instructed to evaluate the adolescent's competency on a 4-point scale, ranging from extremely incompetent/ineffective (1) to extremely competent/effective (4), based on the criteria and behavior examples in the rater's manual. For simplicity, this study only analyzes the data for the treatment adherence domain which consists of 4 items described in Chapter 2. The data containing the ratings corresponding to the 4 items are shown in Tables 4.14 and 4.15 for the untrained and trained raters, respectively. The research question is whether the two groups of raters are different in terms of the rater agreement.

4.5.2 The Results of Analysis

Applying the distance function approach to analyze the psychology data, we found that the value of the generalized measure of agreement
PAGE 91

Using the proposed Monte Carlo permutation test with 10,000 permutations, the p-value is equal to 0.59, which is larger than the Pearson type III p-value. This result matches the general finding of our simulation experiments. Although the proposed method leads to the same conclusion as Mielke & Berry's test in this case, they could lead to different conclusions in other cases, as demonstrated in the simulation studies.
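The Monte Carlo permutation test of Section 4.4 can be sketched as follows. The sketch pools the $2R$ rater columns from the two groups, repeatedly reassigns $R$ randomly chosen columns to group 1 and the rest to group 2, and compares each permuted difference in agreement with the observed one. Several details are assumptions rather than the author's specification: the p-value is taken to be two-sided, the ratings are single-item arrays of shape subjects by raters, and the default statistic `pairwise_agreement` (the proportion of rater pairs giving identical ratings) is a simple stand-in for the distance-based agreement measure. All function names are hypothetical.

```python
import numpy as np


def pairwise_agreement(ratings):
    """Stand-in statistic: share of ordered rater pairs with identical ratings."""
    _, n_raters = ratings.shape
    same = (ratings[:, :, None] == ratings[:, None, :]).sum(axis=(1, 2)) - n_raters
    return same.mean() / (n_raters * (n_raters - 1))


def permutation_test(x, y, n_perm=1000, stat=pairwise_agreement, rng=None):
    """Monte Carlo permutation test for a difference in rater agreement.

    x, y: arrays of shape (N subjects, R raters) for groups 1 and 2.
    Returns (observed difference, approximate two-sided p-value).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    y = np.asarray(y)
    n_raters = x.shape[1]
    pooled = np.concatenate([x, y], axis=1)      # N x 2R
    d_obs = stat(x) - stat(y)
    count = 0
    for _ in range(n_perm):
        cols = rng.permutation(2 * n_raters)     # random relabeling of raters
        d = stat(pooled[:, cols[:n_raters]]) - stat(pooled[:, cols[n_raters:]])
        count += int(abs(d) >= abs(d_obs))
    return d_obs, count / n_perm
```

Passing a different `stat` function, for instance one implementing the distance-based measure, changes the test statistic without altering the permutation scheme.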

PAGE 92

Rater A
1    p11   p12   p13   p1
2    p21   p22   p23   p2
3    p31   p32   p33   p3

Table 4.2: A Multivariate Randomized Block Design.

           Rater (block)
Subject    1     2     ...   R
1          y11   y12   ...   y1R
2          y21   y22   ...   y2R
...        ...   ...   ...   ...
N          yN1   yN2   ...   yNR

subject   power   $\Re_1$   $\Re_2$

PAGE 93

subject   power   $\Re_1$   $\Re_2$

PAGE 94

subject   power   $\Re_1$   $\Re_2$

PAGE 95

subject   power   $\Re_1$   $\Re_2$

PAGE 96

subject   power   $\Re_1$   $\Re_2$

Table 4.8: Illustration of The Data Structure of Two Groups of Raters.

           Group 1 raters            Group 2 raters
Subject    1     2     ...   R       1     2     ...   R
1          x11   x12   ...   x1R     y11   y12   ...   y1R
2          x21   x22   ...   x2R     y21   y22   ...   y2R
...        ...   ...   ...   ...     ...   ...   ...   ...
N          xN1   xN2   ...   xNR     yN1   yN2   ...   yNR

PAGE 97

subject   power   $\Re_1$   $\Re_2$

PAGE 98

subject   power   $\Re_1$   $\Re_2$

PAGE 99

subject   power   $\Re_1$   $\Re_2$

PAGE 100

subject   power   $\Re_1$   $\Re_2$

PAGE 101

subject   power   $\Re_1$   $\Re_2$

PAGE 102

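rater  subject  item 2  item 10  item 14  item 19
1      1        4       2        2        1
1      2        4       4        4        3
1      3        1       2        1        1
1      4        2       3        3        1
1      5        2       4        4        4
1      6        4       3        4        3
2      1        4       2        2        3
2      2        3       4        4        3
2      3        2       2        2        1
2      4        1       3        4        1
2      5        2       3        4        4
2      6        4       4        4        3
3      1        4       2        4        3
3      2        4       4        4        3
3      3        1       2        1        1
3      4        2       3        3        1
3      5        2       4        3        4
3      6        3       4        4        4
4      1        4       3        4        1
4      2        4       4        4        3
4      3        2       3        3        1
4      4        1       3        4        1
4      5        2       3        4        4
4      6        4       4        4        3
5      1        4       2        4        3
5      2        4       4        4        3
5      3        1       2        2        1
5      4        1       3        3        1
5      5        2       4        4        4
5      6        4       4        4        3
6      1        4       2        4        1
6      2        4       4        4        3
6      3        3       3        2        1
6      4        1       3        3        1
6      5        2       3        4        4
6      6        4       4        4        3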

PAGE 103

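Table 4.14-Continued.

rater  subject  item 2  item 10  item 14  item 19
7      1        4       4        4        3
7      2        4       4        4        3
7      3        1       2        1        1
7      4        2       3        3        1
7      5        2       3        4        4
7      6        4       4        4        3
8      1        4       2        4        1
8      2        4       4        4        3
8      3        2       2        1        1
8      4        2       3        4        1
8      5        2       3        4        4
8      6        4       4        4        3
9      1        4       2        4        3
9      2        4       4        4        3
9      3        2       2        2        1
9      4        2       4        2        1
9      5        2       3        4        4
9      6        4       4        4        3
10     1        4       3        4        3
10     2        4       4        4        3
10     3        1       2        1        1
10     4        2       3        4        1
10     5        2       3        4        4
10     6        4       4        4        3
11     1        4       3        4        3
11     2        4       3        4        3
11     3        2       4        2        1
11     4        1       3        4        1
11     5        2       3        4        4
11     6        4       4        4        3
12     1        4       2        4        2
12     2        4       3        4        3
12     3        2       4        2        1
12     4        2       3        4        1
12     5        3       3        4        4
12     6        4       4        4        3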

PAGE 104

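Table 4.14-Continued.

rater  subject  item 2  item 10  item 14  item 19
13     1        4       2        4        1
13     2        3       4        4        3
13     3        2       2        1        1
13     4        2       3        3        1
13     5        2       3        4        4
13     6        4       3        4        3
14     1        4       2        4        3
14     2        4       4        4        3
14     3        2       4        2        1
14     4        1       3        4        1
14     5        2       3        4        4
14     6        4       4        4        3
15     1        4       2        2        3
15     2        4       4        2        3
15     3        3       2        1        1
15     4        2       3        2        1
15     5        2       3        2        4
15     6        4       3        2        3
16     1        4       2        4        2
16     2        4       4        4        3
16     3        1       2        1        1
16     4        2       3        3        2
16     5        2       3        4        4
16     6        4       3        4        3
17     1        4       3        4        3
17     2        4       4        4        3
17     3        2       2        2        1
17     4        1       3        4        1
17     5        2       3        4        4
17     6        4       3        4        3
18     1        4       2        4        1
18     2        4       4        4        3
18     3        2       4        2        1
18     4        1       3        3        1
18     5        2       3        4        4
18     6        4       4        4        3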

PAGE 105

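rater  subject  item 2  item 10  item 14  item 19
1      1        4       2        4        2
1      2        4       3        4        3
1      3        2       2        3        1
1      4        2       3        2        2
1      5        2       3        4        4
1      6        4       4        4        3
2      1        4       3        4        2
2      2        4       4        4        3
2      3        2       2        2        1
2      4        2       3        4        1
2      5        2       3        4        4
2      6        4       3        4        3
3      1        3       2        2        1
3      2        4       4        2        3
3      3        1       1        1        1
3      4        2       3        3        1
3      5        2       3        3        4
3      6        4       4        3        3
4      1        4       2        4        3
4      2        3       4        4        3
4      3        2       2        2        1
4      4        2       3        3        1
4      5        3       3        4        4
4      6        4       4        4        3
5      1        4       2        4        1
5      2        4       4        4        3
5      3        1       2        2        1
5      4        2       3        4        1
5      5        3       3        4        4
5      6        4       4        4        3
6      1        4       2        3        1
6      2        4       4        4        3
6      3        1       2        2        1
6      4        2       3        4        1
6      5        2       3        4        4
6      6        4       4        4        3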

PAGE 106

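Table 4.15-Continued.

rater  subject  item 2  item 10  item 14  item 19
7      1        3       2        4        3
7      2        4       4        4        3
7      3        1       2        2        1
7      4        1       3        4        1
7      5        2       3        4        4
7      6        4       4        4        3
8      1        4       3        4        1
8      2        4       4        4        3
8      3        2       2        1        1
8      4        2       3        4        1
8      5        2       3        4        4
8      6        4       4        4        3
9      1        4       2        4        3
9      2        4       4        4        3
9      3        2       2        2        1
9      4        1       3        4        1
9      5        2       3        4        4
9      6        4       3        4        3
10     1        4       2        1        2
10     2        3       4        4        3
10     3        1       2        2        1
10     4        2       3        3        2
10     5        3       3        4        4
10     6        4       3        4        3
11     1        4       2        4        3
11     2        4       4        4        3
11     3        2       2        3        1
11     4        2       3        4        1
11     5        2       3        4        4
11     6        4       4        4        3
12     1        4       3        4        1
12     2        4       4        4        3
12     3        1       4        1        1
12     4        1       3        4        1
12     5        2       3        4        4
12     6        4       4        4        3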

PAGE 107

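Table 4.15-Continued.

rater  subject  item 2  item 10  item 14  item 19
13     1        4       2        4        3
13     2        4       4        4        3
13     3        1       2        2        1
13     4        1       3        4        1
13     5        2       3        4        4
13     6        4       3        4        3
14     1        4       2        4        1
14     2        4       3        4        3
14     3        1       2        1        1
14     4        1       3        4        1
14     5        2       3        4        4
14     6        4       4        4        3
15     1        4       2        4        3
15     2        3       4        4        3
15     3        3       2        2        1
15     4        1       3        4        1
15     5        2       3        4        4
15     6        4       3        4        3
16     1        4       3        4        2
16     2        4       4        4        3
16     3        2       2        2        1
16     4        1       3        4        1
16     5        2       3        4        4
16     6        4       4        4        3
17     1        3       2        4        3
17     2        4       4        4        3
17     3        2       2        2        1
17     4        2       3        4        1
17     5        2       3        4        4
17     6        4       3        4        4
18     1        4       3        4        2
18     2        4       4        4        3
18     3        1       4        1        1
18     4        1       3        4        1
18     5        2       3        4        4
18     6        4       3        4        4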

PAGE 108

The models proposed in the previous chapters are designed for a test measuring one single latent trait. However, a typical psychological or educational measurement usually consists of several subscales each of which contains multiple items. Each subscale is designed to measure one latent trait or construct. If it is assumed that the latent traits in a test are independent, one could fit a model for each subscale separately. However, this assumption of independence is usually not valid. Take the psychological measurement introduced in this study (RISCS) as an example: "treatment adherence" is definitely related to "parent-teen relationship," "eating/weight gain," and "growing up/long-term health issues." Therefore, future model development has to take account of the correlation structure among latent traits.

When a measurement involves categorical ratings, the choice of ordinal scales is usually arbitrary. For example, the psychological measurement introduced in this study (RISCS) uses a 4-point scale whereas the educational measurement (ECERS-R) adopts a 7-point scale. Moreover, the 10-point scale has been commonly used in medical science. Future research might evaluate some statistical criteria such as the power of a test or the goodness-of-fit statistic under different scales so that an objective guideline for choosing different scales can be developed.

Sample size determination is always an important practical issue. In our research context, the question we usually ask is: How many subjects and raters do we need in order to make valid inferences based on the hypothesis testing results? Although our simulation experiments in Chapter 4 do not show much improvement

PAGE 109

In Chapters 2 and 3, the Bayesian models generate estimates for subjects' latent traits. In practice, the averages of item scores are commonly used to estimate the latent traits. Theoretically, the estimates based on the models are supposed to be better estimates for the latent traits because the models have taken account of the rater effects or item difficulties. Future research is required to verify the superiority of the model-based estimates over the average scores by comparing their capacities for predicting later outcomes. For example, if the latent trait is the ability of a student, a better estimate of the ability should be a better predictor of the student's longitudinal academic performance.

In a regular setting of psychological research, multiple raters observing the same subjects are only employed in the training stage. After raters achieve certain criteria, they are usually sent out to the field on their own in order to save cost. However, it is questionable to use one single rater's ratings as the measure for the subject's performance even if the rater has been "calibrated" before. Future research might study how to use the rater effects estimated in the setting of multiple raters to adjust for the ratings generated in the single rater setting.

When raters are in the training stage, there is a practical need to measure the agreement between the raters and the expert (gold standard) in order to evaluate the progress of the training procedure. Mielke and Berry (2001) provided such an agreement measure based on a distance function approach similar to the one reviewed in Chapter 4. They also proposed a hypothesis test of which the approximate probability values are based on the Pearson type III distribution. It is necessary to conduct a future study to evaluate the performance of this particular test and investigate its implications for application.

PAGE 110

Agresti, A. (1990). Categorical data analysis. New York: John Wiley & Sons.
Agresti, A. (2002). Categorical data analysis (2nd ed.). New York: John Wiley & Sons.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika 43, 561-573.
Berry, K. J. & Mielke, P. W. (1988). A generalization of Cohen's kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement 48, 921-933.
Berry, K. J. & Mielke, P. W. (1997). Agreement measure comparisons between two independent sets of raters. Educational and Psychological Measurement 57, 360-364.
Bock, R. D. (1997). A brief history of item response theory. Educational Measurement: Issues & Practice 16, 21-33.
Brennan, R. L. (2001). Generalizability theory. New York, NY: Springer-Verlag.
Brooks, S. P. & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7, 434-455.
Brooks, S. P. & Roberts, G. O. (1998). Convergence assessment techniques for Markov chain Monte Carlo. Statistics and Computing 8, 319-335.
Casella, G. & George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician 46, 167-174.
Chib, S. & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician 49, 327-335.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37-46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70, 213-220.
Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. Braun (Eds.), Test validity. Hillsdale, NJ: Erlbaum.

PAGE 111

Damien, P. & Walker, S. G. (2001). Sampling truncated normal, beta, and gamma densities. Journal of Computational and Graphical Statistics 10, 206-215.
Devroye, L. (1986). Non-uniform random variate generation. New York, NY: Springer-Verlag.
Flury, B. (1997). A first course in multivariate statistics. New York: Springer-Verlag.
Gelman, A. & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7, 457-511.
Ghosh, A. (1996). Bayesian analysis of item response models for binary data. Ph.D. Dissertation, University of Florida.
Ghosh, M. (1995). Inconsistent maximum likelihood estimators for the Rasch model. Statistics and Probability Letters 23, 165-170.
Ghosh, M., Ghosh, A., Chen, M., & Agresti, A. (2000). Noninformative priors for one-parameter item response models. Journal of Statistical Planning and Inference 88, 99-115.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Introducing Markov chain Monte Carlo. In W. R. Gilks, S. Richardson & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice. New York: Chapman & Hall.
Good, P. (2000). Permutation tests. A practical guide to resampling methods for testing hypotheses (2nd ed.). New York: Springer-Verlag.
Hambleton, R. K. & Swaminathan, H. (1985). Item response theory: Principles and applications. Hingham, MA: Kluwer Boston.
Harms, T., Clifford, R. M., & Cryer, D. (1998). Early childhood environment rating scale. Revised edition. New York, NY: Teachers College Press.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97-109.
Hobert, J. P. & Casella, G. (1996). The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association 91, 1461-1473.
Johnson, V. E. (1996). On Bayesian analysis of multirater ordinal data: An application to automated essay grading. Journal of the American Statistical Association 91, 42-51.
Johnson, V. E. (1997). An alternative to traditional GPA for evaluating student performance. Statistical Science 12, 251-278.

PAGE 112

Johnson, V. E. & Albert, J. H. (1999). Ordinal data modeling. New York, NY: Springer-Verlag.
Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago, IL: MESA Press.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika 47, 149-174.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist 50, 741-749.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equations of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087-1092.
Mielke, P. W. & Berry, K. J. (2001). Permutation methods. A distance function approach. New York: Springer-Verlag.
Neyman, J. & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica 16, 1-22.
Patz, R. J., Junker, B. W., & Johnson, M. S. (2000). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Carnegie Mellon University Statistics Department Technical Report No. 712, Pittsburgh.
Quittner, A. L. (1996). Role-Play Inventory of Situations and Coping Strategies (RISCS): Adolescent and parent-adolescent versions. Bloomington, IN: Indiana University.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics 4, 321-333.
Resnick, L. B. & Resnick, D. P. (1992). Assessing the thinking curriculum: New tools for educational reform. In B. R. Gifford & M. C. O'Connor (Eds.), Changing assessments: Alternative views of aptitude, achievement and instruction. Norwell, MA: Kluwer Academic.
Robert, C. P. & Casella, G. (1999). Monte Carlo statistical methods. New York: Springer-Verlag.
Roberts, G. O. (1996). Markov chain concepts related to sampling algorithms. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice. New York: Chapman & Hall.
Ross, S. M. (1997). Simulation. San Diego, CA: Academic Press.

PAGE 113

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement 17.
Shavelson, R. J. & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: SAGE.
Sprott, D. A. (2000). Statistical inference in science. New York: Springer-Verlag.
Swaminathan, H. & Gifford, J. A. (1985). Bayesian estimation in the two-parameter logistic model. Psychometrika 50, 349-364.