Reducing decision errors in the paired comparison of the diagnostic accuracy of screening tests with Gaussian outcomes

Material Information

Title:
Reducing decision errors in the paired comparison of the diagnostic accuracy of screening tests with Gaussian outcomes
Physical Description:
Mixed Material
Language:
English
Creator:
Ringham, Brandy M.
Alonzo, Todd A.
Brinton, John T.
Kreidler, Sarah M.
Munjal, Aarti
Muller, Keith E.
Glueck, Deborah H.
Publisher:
BioMed Central (BMC Medical Research Methodology)
Publication Date:

Notes

Abstract:
Background: Scientists often use a paired comparison of the areas under the receiver operating characteristic curves to decide which continuous cancer screening test has the best diagnostic accuracy. In the paired design, all participants are screened with both tests. Participants with suspicious results or signs and symptoms of disease receive the reference standard test. The remaining participants are classified as non-cases, even though some may have occult disease. The standard analysis includes all study participants, which can create bias in the estimates of diagnostic accuracy since not all participants receive disease status verification. We propose a weighted maximum likelihood bias correction method to reduce decision errors. Methods: Using Monte Carlo simulations, we assessed the method’s ability to reduce decision errors across a range of disease prevalences, correlations between screening test scores, rates of interval cases and proportions of participants who received the reference standard test. Results: The performance of the method depends on characteristics of the screening tests and the disease and on the percentage of participants who receive the reference standard test. In studies with a large amount of bias in the difference in the full areas under the curves, the bias correction method reduces the Type I error rate and improves power for the correct decision. We demonstrate the method with an application to a hypothetical oral cancer screening study. Conclusion: The bias correction method reduces decision errors for some paired screening trials. In order to determine if bias correction is needed for a specific screening trial, we recommend the investigator conduct a simulation study using our software. Keywords: Cancer screening, Differential verification bias, Area under the curve, Type I error, Power, Paired screening trial, Receiver operating characteristic analysis
General Note:
Ringham et al. BMC Medical Research Methodology 2014, 14:37 http://www.biomedcentral.com/1471-2288/14/37; Pages 1-12
General Note:
doi:10.1186/1471-2288-14-37 Cite this article as: Ringham et al.: Reducing decision errors in the paired comparison of the diagnostic accuracy of screening tests with Gaussian outcomes. BMC Medical Research Methodology 2014 14:37.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
© 2014 Ringham et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated
Resource Identifier:
System ID:
AA00021508:00001

Full Text

Ringham et al. BMC Medical Research Methodology 2014, 14:37
http://www.biomedcentral.com/1471-2288/14/37

RESEARCH ARTICLE Open Access

Reducing decision errors in the paired comparison of the diagnostic accuracy of screening tests with Gaussian outcomes

Brandy M Ringham1*, Todd A Alonzo2, John T Brinton3, Sarah M Kreidler3, Aarti Munjal3, Keith E Muller4 and Deborah H Glueck3

Abstract

Background: Scientists often use a paired comparison of the areas under the receiver operating characteristic curves to decide which continuous cancer screening test has the best diagnostic accuracy. In the paired design, all participants are screened with both tests. Participants with suspicious results or signs and symptoms of disease receive the reference standard test. The remaining participants are classified as non-cases, even though some may have occult disease. The standard analysis includes all study participants, which can create bias in the estimates of diagnostic accuracy since not all participants receive disease status verification. We propose a weighted maximum likelihood bias correction method to reduce decision errors.

Methods: Using Monte Carlo simulations, we assessed the method's ability to reduce decision errors across a range of disease prevalences, correlations between screening test scores, rates of interval cases and proportions of participants who received the reference standard test.

Results: The performance of the method depends on characteristics of the screening tests and the disease and on the percentage of participants who receive the reference standard test. In studies with a large amount of bias in the difference in the full areas under the curves, the bias correction method reduces the Type I error rate and improves power for the correct decision. We demonstrate the method with an application to a hypothetical oral cancer screening study.

Conclusion: The bias correction method reduces decision errors for some paired screening trials. In order to determine if bias correction is needed for a specific screening trial, we recommend the investigator conduct a simulation study using our software.
Keywords: Cancer screening, Differential verification bias, Area under the curve, Type I error, Power, Paired screening trial, Receiver operating characteristic analysis

*Correspondence: bringham@ucla.edu
1 Center for Cancer Prevention and Control Research, University of California, Los Angeles, 650 Charles Young Drive South, Room A2-125 CHS, Los Angeles CA 90095, USA
Full list of author information is available at the end of the article

Background

Paired screening trials are common in cancer screening. For instance, one of the designs considered for a planned oral cancer screening study was a paired comparison of the visual and tactile oral exam with the VELscope imaging device [1]. Two recent breast cancer screening studies used a paired design to compare film and digital mammography [2,3].

In paired cancer screening trials, investigators screen all participants with both screening tests. The screening tests may measure a participant's disease status with error. To ascertain participants' disease states more definitively, the study investigator tests each participant with a second, more accurate procedure. We refer to the definitive procedure as a reference standard test. In cancer screening, the most accurate reference standard test is biopsy followed by pathological confirmation of disease. Biopsy is painful and invasive and can only be performed on individuals with a visible lesion. Thus, the study investigator determines participants' disease states as follows. Participants with unremarkable screening test scores on both screening tests enter a follow-up period. Participants with suspicious screening test scores or who show signs and symptoms of disease during follow-up undergo further workup leading to a reference standard test. Participants who do not show signs and symptoms of disease during follow-up are assumed to be disease-free. Follow-up can be thought of as an imperfect reference standard. The reference standard is imperfect because, in truth, some participants may have occult disease.

In the trial by Lewin et al. [2], the investigators used a standard analysis to compare the full areas under the receiver operating characteristic curves. The standard analysis includes all participants, even those whose disease status is not verified with the reference standard test. Because some cases may be misclassified, the estimates of diagnostic accuracy may be biased, causing decision errors [4]. If the bias is severe enough, investigators can detect a difference between screening tests when there is none, or conclude incorrectly that the inferior test is superior. Choosing the inferior test can delay diagnosis, increasing morbidity and mortality.

Screening trials are subject to different biases depending on the choice of reference standard and analysis plan [5-8]. Paired screening trial bias [4], the focus of our research, is a special case of differential verification bias. Differential verification bias occurs when 1) a reference standard is used for some participants and an imperfect reference standard is used for the remaining participants, 2) the decision to use the reference standard depends on the screening test results and 3) data from all participants are included in the analysis [7]. Paired screening trial bias occurs in paired studies when the screening tests are subject to differential verification bias and refer different proportions of participants to the reference standard test [4].
We propose a bias-correction method to reduce decision errors in paired cancer screening trials. Under the assumption that the screening test scores follow a bivariate Gaussian distribution, conditional on disease status, we use an iterative, maximum likelihood approach to reduce the bias in the estimates of the mean, variance and correlation. The resulting estimates are then used to reduce bias in the estimates of the diagnostic accuracy of the screening tests.

In the following sections, we describe the bias correction method and evaluate its performance by simulation. In the Methods section, we explain the study design of interest, outline the assumptions and notation, delineate the bias correction algorithm and describe the design of the simulation studies. In the Results section, we report the results of the simulation studies and demonstrate the utility of the method using a hypothetical oral cancer screening study. Finally, in the Discussion section, we discuss the implications of the results and provide recommendations.

Methods

Study design

The study design of interest is a paired study of two continuous cancer screening tests. A flowchart of the study design is shown in Figure 1.

We consider the screening study from two points of view [9]. The first viewpoint is that of the omniscient observer who knows the true disease status of each participant. The second viewpoint is that of the study investigator, who can only know the disease status observed in the study.
Thestudyinvestigatordeterminesaparticipants observed diseasestatusasfollows.Anyscorethatexceeds thethresholdofsuspiciondefinedforeachscreeningtest triggerstheuseofareferencestandardtest.Casesidentifiedduetoremarkablescreeningtestscoresarereferred toas screen-detected cases.Participantswithunremarkablescreeningtestscoresonbothscreeningtestsentera follow-upperiod.Someparticipantsmayshowsignsand symptomsofdiseaseduringthefollow-upperiod,leading toareferencestandardtestandpathologicalconfirmation ofdisease.Theseparticipantsarereferredtoas interval cases.Werefertothecollectionofscreen-detectedcases andintervalcasesasthe observed cases.Participantswith unremarkablescreeningtestscoreswhodonotshowsigns andsymptomsofdiseaseduringthefollow-upperiodare assumedtobedisease-free,or observed non-cases. Undertheassumptionthatthereferencestandardtest is100%sensitiveandspecific,thestudydesigndescribed abovewillcorrectlyidentifyallnon-cases.However,the designmaycausesomecasestobemisclassifiedasnoncases. Misclassified casesoccurwhenstudyparticipants whoactuallyhavediseasereceiveunremarkablescreening testscoresandshownosignsorsymptomsofdisease. Wepresentagraphofahypotheticaldatasetofscreeningtestscores(Figure2)toillustratehowthestudyinvestigatorobservesdiseasestatus.Theaxesrepresentthe thresholdsofsuspicionforeachscreeningtest.Wecan identifythemisclassifiedcasesbecausewepresentthis graphfromanomniscientpointofview.StandardanalysisInthestandardanalysis,thestudyinvestigatorcompares thediagnosticaccuracyofthetwoscreeningtests,measuredbythefullareaunderthereceiveroperatingcharacteristiccurve.Thegoaloftheanalysisistochoosethe screeningtestwithsuperiordiagnosticaccuracy. Thereceiveroperatingcharacteristiccurvesarecalculatedusingdatafromallcasesandnon-cases observed in

PAGE 3

Ringham etal.BMCMedicalResearchMethodology 2014, 14 :37 Page3of12 http://www.biomedcentral.com/1471-2288/14/37 Signs and symptoms? Screen positive on one or both tests? Reference standard test Follow-up No observed disease Observed disease Yes No No Yes Confirmed disease? Yes No Screening by both tests Figure1 Flowchartofapairedtrialoftwocontinuousscreeningtests. Theflowchartculminatesinthestudyinvestigatorsobservationofthe diseasestatusoftheparticipant. Screening Test 2 ScoreScreening Test 1 ScoreDetected by both screening tests Detected by Screening Test 2 only Detected by Screening Test 1 only Detected only if signs and symptoms of disease Non-Cases Interval Cases Misclassified Cases Screen Detected Cases Partition A Partition B Figure2 Hypotheticaldataforapairedscreeningtrial. Datain partition A (gray)arethesetoftruecaseswhereatleastone screeningtestscorefallsabovethethresholdforthatscreeningtest. Datainpartition B (white)arethesetoftruecaseswherethescores onbothscreeningtestsfallbelowtheirrespectivethresholds.thestudy.Whencasesaremisclassified,thedenominatorofthesensitivitydecreaseswhileboththenumerator anddenominatorofthespecificityincrease.Asaresult, thestudyinvestigatoroverestimatesboththesensitivity andspecificityofthescreeningtest.Theerrorinsensitivityandspecificitycausesconcomitanterrorsinthe areaunderthecurve.Thus,the observed areaunderthe curvecanbebiased.Pairedscreeningtrialbiasoccurs whenthe observed areasunderthecurvesaredifferentiallybiased,causingthedifferencebetweenthe observed areastobeeitherlargerorsmallerthanthetruestateof nature. 
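The effect of misclassified cases on the observed sensitivity and specificity of a single screening test can be illustrated with a small worked example. The counts below are hypothetical and chosen only for illustration; the function name `sensitivity_specificity` is likewise an assumption and not part of the authors' software.

```python
def sensitivity_specificity(cases_pos, cases_total, non_cases_neg, non_cases_total):
    """Return (sensitivity, specificity) from simple counts."""
    return cases_pos / cases_total, non_cases_neg / non_cases_total


# Hypothetical counts for one screening test (illustrative only).
true_cases = 100                  # participants who truly have disease
true_non_cases = 9900             # participants who are truly disease-free
cases_above_threshold = 60        # true cases scoring above this test's threshold
non_cases_below_threshold = 8910  # true non-cases scoring below this test's threshold
misclassified = 25                # true cases below both thresholds with no signs or symptoms

# Omniscient (true) view: every participant's disease status is known.
true_sens, true_spec = sensitivity_specificity(
    cases_above_threshold, true_cases,
    non_cases_below_threshold, true_non_cases,
)

# Investigator (observed) view: misclassified cases leave the case group and
# are counted as screen-negative non-cases.
obs_sens, obs_spec = sensitivity_specificity(
    cases_above_threshold, true_cases - misclassified,
    non_cases_below_threshold + misclassified, true_non_cases + misclassified,
)

print(f"sensitivity: true {true_sens:.3f}, observed {obs_sens:.3f}")
print(f"specificity: true {true_spec:.4f}, observed {obs_spec:.4f}")
```

With these counts the observed sensitivity is inflated far more than the observed specificity, because the misclassified cases are a large share of the cases but a very small share of the non-cases.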
The proposed bias correction method only corrects the estimation of the sensitivity and does not correct specificity. For screening trials using the study design and standard analysis described above, the error in sensitivity may be large [4]. The error in specificity, however, is typically negligible. The large number of non-cases makes the specificity robust to small deviations in the number of observed cases. In scenarios with a higher disease prevalence, the error in the uncorrected specificity may affect the performance of the method.

Assumptions, definitions and notation

We make a series of assumptions. Let $n$ be the total number of study participants and $\pi$ the prevalence of disease in the population. Assuming simple random sampling, the number of participants with disease is $M$, and is distributed

$$M \sim \mathrm{Binomial}(n, \pi). \tag{1}$$

Let $i$ index participants, $j$ index the screening test and $k$ indicate the true presence ($k = 1$) or absence ($k = 0$) of disease. The pair of screening test scores, $X_{i1k}$ and $X_{i2k}$, are independently and identically distributed bivariate Gaussian random variables with means $\mu_{jk}$, variances $\sigma^2_{jk}$, and correlation $\rho_k$.

Let $a_j$ be the threshold of suspicion for screening test $j$. All scores above the threshold will trigger the use of a reference standard test. For screening test $j$, the percent ascertainment is 100 times the number of participants with disease who score above the threshold on screening test $j$, divided by the total number of participants observed to have disease.

Let $I$ be the event that a participant shows signs and symptoms of disease, and let $P(I \mid k = 1)$ be the probability of that event for a participant with disease. Because participants without the target disease are unlikely to show signs and symptoms of that disease, we assume that $P(I \mid k = 0) = 0$. In practice, however, the clinician must respond to any signs and symptoms with further testing, even if those signs and symptoms may not, in fact, be caused by the target disease. Thus, participants who show signs and symptoms of disease during the follow-up period will still receive the reference standard test and subsequent pathological confirmation.

Bias correction algorithm

We describe an algorithm to reduce bias in estimates of diagnostic accuracy. The algorithm corrects the maximum likelihood estimates of the parameters of the distribution of case screening test scores. The algorithm then uses a weighting scheme to reduce the variance of the estimates. The corrected maximum likelihood estimates are used to calculate corrected estimates of the diagnostic accuracy of the screening tests.
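The study design and the assumptions in the Assumptions, definitions and notation section can be sketched as a short simulation. This is a minimal illustration using only the Python standard library, not the authors' software: the function name `simulate_paired_trial`, its arguments and all parameter values are assumptions made for the example. True disease status is drawn independently for each participant (so the number of cases is binomial), scores are bivariate Gaussian given disease status, and only cases can show signs and symptoms.

```python
import math
import random


def simulate_paired_trial(n, prevalence, params, thresholds, symptom_rate, seed=0):
    """Simulate one paired screening trial.

    params[k] = (mean1, mean2, sd1, sd2, rho) for true disease status k.
    Returns a list of (score1, score2, true_status, observed_status).
    """
    rng = random.Random(seed)
    a1, a2 = thresholds
    records = []
    for _ in range(n):
        # True disease status; the total number of cases is Binomial(n, prevalence).
        k = 1 if rng.random() < prevalence else 0
        mean1, mean2, sd1, sd2, rho = params[k]
        # Correlated bivariate Gaussian scores built from two independent normals.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mean1 + sd1 * z1
        x2 = mean2 + sd2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        # A score above either threshold triggers the reference standard test.
        screen_positive = x1 > a1 or x2 > a2
        # Only true cases show signs and symptoms: P(I | k = 0) = 0.
        symptoms = k == 1 and rng.random() < symptom_rate
        # The reference standard is assumed perfect, so a referred case is
        # confirmed; an unreferred case is misclassified as a non-case.
        observed = 1 if k == 1 and (screen_positive or symptoms) else 0
        records.append((x1, x2, k, observed))
    return records


# Illustrative parameter values (assumptions, not taken from the paper).
params = {1: (1.0, 1.0, 1.0, 1.0, 0.1), 0: (0.0, 0.0, 1.0, 1.0, 0.1)}
trial = simulate_paired_trial(5000, 0.14, params, thresholds=(2.0, 0.5),
                              symptom_rate=0.10, seed=1)
true_cases = sum(k for _, _, k, _ in trial)
observed_cases = sum(obs for _, _, _, obs in trial)
print("true cases:", true_cases)
print("observed cases:", observed_cases)
print("misclassified cases:", true_cases - observed_cases)
```

The gap between the true and observed case counts is the set of misclassified cases, all of which score below both thresholds.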
The algorithm requires four steps.

Step 1. Partition

The cases can be stratified into two sets, shown in Figure 2. Let A (data in the gray area) be the set of true cases with at least one screening test score above its respective threshold. Let B (data in the white area) be the set of true cases where the scores on both screening tests fall below their respective thresholds. The percentages of participants observed to have the disease in sets A and B differ: all cases in set A are observed, but only a fraction of cases are observed in set B. The estimation for each set is handled separately in Step 2. Then, in Step 3, the estimates are combined using weighting proportional to the sampling fraction ([10], p. 81, Equation 3.3.1).

Step 2. Maximum likelihood estimation

We obtain maximum likelihood estimates of the bivariate Gaussian parameters for the cases. The estimation process follows the iterative method suggested by Nath [11]. The method allows unbiased estimation of bivariate Gaussian parameters from singly truncated convex sample spaces. To obtain singly truncated convex sets, we further partition the sample space into quadrants $Q_l$, $l \in \{1, 2, 3, 4\}$, as shown in Table 1.

Table 1 Quadrant definitions

Quadrant   Definition
$Q_1$      $\{x_{i1k} \ge a_1;\ x_{i2k} \ge a_2\}$
$Q_2$      $\{x_{i1k} \ge a_1;\ x_{i2k} < a_2\}$
$Q_3$      $\{x_{i1k} < a_1;\ x_{i2k} \ge a_2\}$
$Q_4$      $\{x_{i1k} < a_1;\ x_{i2k} < a_2\}$

The starting values for the iteration are the sample statistics for the observed cases in each quadrant. Using the Nath method for each set of starting values results in four sets of quadrant specific maximum likelihood estimates. From the four quadrant specific estimates, we choose the set that maximizes the log likelihood of the full bivariate Gaussian distribution. We refer to that set as the Nath estimates, denoted by $\hat{\mu}_{11,N}$, $\hat{\mu}_{21,N}$, $\hat{\sigma}^2_{11,N}$, $\hat{\sigma}^2_{21,N}$ and $\hat{\rho}_{1,N}$.

We require the sample variance as a starting value for the Nath algorithm. Thus, quadrant specific estimates are not calculated for quadrants containing fewer than two data points.

Step 3. Weighting

The Nath estimates are based on only one quadrant of data. We use the process described below to calculate weighted estimates which incorporate data from all quadrants, thereby lowering the variance.

First, the Nath estimates are used as inputs for calculating the sampling fraction for sets A and B. Define the estimated probability of A as

$$\hat{\lambda} = 1 - \Phi\left(\frac{a_1 - \hat{\mu}_{11,N}}{\hat{\sigma}_{11,N}},\ \frac{a_2 - \hat{\mu}_{21,N}}{\hat{\sigma}_{21,N}},\ \hat{\rho}_{1,N}\right), \tag{2}$$

where $\Phi$ is the bivariate standard Gaussian cumulative distribution function evaluated with the given correlation.

Second, the observed data are used to calculate the observed sample statistics for sets A and B. The observed sample statistics are defined as follows. Let $k = 1$ if a participant is observed to have disease and $k = 0$ otherwise. For set $s \in \{A, B\}$, screening test $j$ and observed disease status $k$, let $\bar{X}_{jk,s}$ be the sample mean, $S_{jk,s}$ be the sample standard deviation and $r_{k,s}$ be the sample correlation between the screening tests.

Finally, the weighted estimates are calculated as a function of the sampling fraction (Equation 2) and the observed sample statistics for sets A and B. We derived expressions for the weighted estimates using the conditional covariance formula ([12], p. 348, Proposition 5.2) and the definition of the weighted mean ([10], p. 77, Equation 3.2.1). Let $\hat{\mu}_{j1,W}$, $\hat{\sigma}^2_{j1,W}$ and $\hat{\rho}_{1,W}$ be the weighted estimates of the mean, variance and correlation of the screening test scores for the cases, respectively. We define the estimates as follows:

$$\hat{\mu}_{11,W} = \hat{\lambda}\,\bar{X}_{11,A} + (1 - \hat{\lambda})\,\bar{X}_{11,B}, \tag{3}$$

$$\hat{\mu}_{21,W} = \hat{\lambda}\,\bar{X}_{21,A} + (1 - \hat{\lambda})\,\bar{X}_{21,B}, \tag{4}$$

$$\hat{\sigma}^2_{11,W} = G_1 + H_1 - \hat{\mu}^2_{11,W}, \tag{5}$$

$$\hat{\sigma}^2_{21,W} = G_2 + H_2 - \hat{\mu}^2_{21,W} \tag{6}$$

and

$$\hat{\rho}_{1,W} = \hat{\sigma}^{-1}_{11,W}\,\hat{\sigma}^{-1}_{21,W}\left(P + Q - \hat{\mu}_{11,W}\,\hat{\mu}_{21,W}\right), \tag{7}$$

where

$$G_j = \hat{\lambda}\left(\bar{X}^2_{j1,A} + S^2_{j1,A}\right), \tag{8}$$

$$H_j = (1 - \hat{\lambda})\left(\bar{X}^2_{j1,B} + S^2_{j1,B}\right), \tag{9}$$

$$P = \hat{\lambda}\,\bar{X}_{11,A}\,\bar{X}_{21,A} + \hat{\lambda}\,S_{11,A}\,S_{21,A}\,r_{1,A} \tag{10}$$

and

$$Q = (1 - \hat{\lambda})\,\bar{X}_{11,B}\,\bar{X}_{21,B} + (1 - \hat{\lambda})\,S_{11,B}\,S_{21,B}\,r_{1,B}. \tag{11}$$

The weighted estimates are the corrected estimates used to calculate the corrected areas under the receiver operating characteristic curves. If either set A or set B contains only one observation, we do not conduct the weighting and instead use the Nath estimates as the corrected estimates.

Software to implement the method is available at [13].

Evaluation of bias correction

We compared three methods of analysis: true, observed and corrected. For the observed analysis, we used the observed sample statistics to calculate estimates of diagnostic accuracy, replicating the standard analysis performed by the study investigator of a cancer screening trial. For the corrected analysis, we used the proposed bias correction approach. Finally, both the observed and corrected analyses were compared to the true analysis. In the true analysis, we assumed that the study investigator knew the true disease status of every participant.

For each analysis, we tested the null hypothesis that there was no difference in the areas under the binormal receiver operating characteristic curves. The areas under the curves were calculated as described in ([14], Equations 12 and 13). We then calculated the variance of the difference in the areas under the curves and conducted a two-sided hypothesis test using the method of Obuchowski and McClish [15].
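As a minimal sketch of the weighting in Step 3, the function below evaluates Equations 3 to 11 from the estimated probability of set A (Equation 2, passed in as `lam`) and the observed sample statistics of sets A and B. The names `weighted_estimates`, `binormal_auc`, `lam`, `stats_a` and `stats_b` are illustrative assumptions and are not taken from the authors' software [13]. The area under a binormal curve is computed with the standard closed form for Gaussian scores in which higher scores indicate disease; it is assumed, not confirmed, that this matches the equations cited from [14].

```python
from math import sqrt
from statistics import NormalDist


def weighted_estimates(lam, stats_a, stats_b):
    """Weighted case estimates following Equations 3 to 11.

    lam     : estimated probability of set A (Equation 2)
    stats_a : (mean1, mean2, sd1, sd2, r) for observed cases in set A
    stats_b : (mean1, mean2, sd1, sd2, r) for observed cases in set B
    Returns (mu1, mu2, var1, var2, rho).
    """
    m1a, m2a, s1a, s2a, ra = stats_a
    m1b, m2b, s1b, s2b, rb = stats_b
    mu1 = lam * m1a + (1 - lam) * m1b              # Equation 3
    mu2 = lam * m2a + (1 - lam) * m2b              # Equation 4
    g1 = lam * (m1a ** 2 + s1a ** 2)               # Equation 8, j = 1
    g2 = lam * (m2a ** 2 + s2a ** 2)               # Equation 8, j = 2
    h1 = (1 - lam) * (m1b ** 2 + s1b ** 2)         # Equation 9, j = 1
    h2 = (1 - lam) * (m2b ** 2 + s2b ** 2)         # Equation 9, j = 2
    var1 = g1 + h1 - mu1 ** 2                      # Equation 5
    var2 = g2 + h2 - mu2 ** 2                      # Equation 6
    p = lam * (m1a * m2a + s1a * s2a * ra)         # Equation 10
    q = (1 - lam) * (m1b * m2b + s1b * s2b * rb)   # Equation 11
    rho = (p + q - mu1 * mu2) / sqrt(var1 * var2)  # Equation 7
    return mu1, mu2, var1, var2, rho


def binormal_auc(mu_case, var_case, mu_non, var_non):
    """Area under a binormal ROC curve, assuming higher scores indicate disease."""
    return NormalDist().cdf((mu_case - mu_non) / sqrt(var_case + var_non))


# Hypothetical inputs: set A holds the higher-scoring cases, set B the lower.
mu1, mu2, var1, var2, rho = weighted_estimates(
    0.4, (2.1, 1.8, 0.6, 0.7, 0.05), (0.4, 0.3, 0.5, 0.5, 0.10)
)
# Non-case scores are assumed standard Gaussian for this illustration.
print("corrected AUC, Test 1:", round(binormal_auc(mu1, var1, 0.0, 1.0), 3))
print("corrected AUC, Test 2:", round(binormal_auc(mu2, var2, 0.0, 1.0), 3))
```

A quick consistency check on the formulas: when sets A and B have identical sample statistics, the weighted estimates reduce to those common statistics for any value of `lam`.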
To assess screening test performance, we compared the Type I error and power of the observed, corrected and true analyses. Because the estimates of diagnostic accuracy can be biased, the study investigator can correctly conclude that there is a difference between the two screening tests but incorrectly choose to implement the screening test with the lower diagnostic accuracy. To quantify this decision error, we divided power into the correct rejection fraction and the wrong rejection fraction. The correct rejection fraction is the probability that the hypothesis test rejects and the screening test with the larger observed area under the curve is the screening test with larger true area under the curve. The wrong rejection fraction is the probability that the hypothesis test rejects but the screening test with the larger observed area under the curve is the screening test with the smaller true area under the curve.

Design of simulation studies under the Gaussian assumption

Data were simulated per the assumptions listed in the Assumptions, definitions and notation section. We considered two states of nature: one where the null hypothesis holds and one where the alternative hypothesis holds. Under the null, we fixed the true areas under the curves to be 0.78. Under the alternative, we fixed the true area under the curve to be 0.78 for Test 1 and 0.74 for Test 2 for a difference of 0.04. The sample size was fixed at 50,000. The diagnostic accuracy of the screening tests and the sample size were similar to those in the study by Pisano et al. [3]. Except where noted, the correlation between screening test scores for both the cases and non-cases was set to 0.10. Also except where noted, a random sample of 10% of the cases showed signs and symptoms of disease. Recall that showing signs and symptoms of disease only changes the decision to conduct a biopsy if the participant scored negative on both screening tests. The threshold of suspicion for Test 1 was set so that very few cases were referred to the reference standard test. The threshold for Test 2 was set so that nearly all cases were referred to the reference standard test. Different levels of percent ascertainment for each screening test can cause the estimates of diagnostic accuracy to be biased by a different amount [4]. Under the conditions of this simulation study, the differential bias was extreme and, on average, caused the receiver operating characteristic curves to switch orientation relative to the true state of nature.

The simulation studies varied four factors: the disease prevalence, the proportion of cases that exhibited signs and symptoms of disease during follow-up, the correlation between Test 1 and 2 scores and the positions of the thresholds that trigger a reference standard test. The four factors changed the number of observed cases and the amount of bias in the estimates of diagnostic accuracy. We set the disease prevalence to 0.01, 0.14 or 0.24, reflecting cancer rates seen in published cancer studies and surveys [2,3,16-18]. The rate of signs and symptoms was varied across a clinically relevant range of 0 to 0.20 [2,17,19]. We examined a range of correlations between 0 and 1. To assess the effects of smaller degrees of differential bias, we set the thresholds of suspicion to result in 15, 50 and 80 percent ascertainment and examined each of the nine possible pairings. Each pair varied the amount and source of the bias (Test 1 or Test 2). Note that percentages are approximate because the case numbers are discrete.

For each combination of parameter values, we simulated paired screening test scores and a binary indicator of true disease status. Based on the described study design (Figure 1), we deduced the observed disease status. After calculating the true, observed and corrected areas under the curves, decision errors were assessed using the metrics described in the Evaluation of bias correction section. We used 10,000 realizations of the simulated data to ensure that the error in the estimation of probabilities occurred in the second decimal place.

Design of non-Gaussian simulation studies

Although the bias correction method was developed under an assumption that the data were bivariate Gaussian, screening data may not follow the Gaussian distribution. We conducted a second set of simulation studies to examine the performance of the bias correction method for multinomial and zero-weighted data.
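For either set of simulation studies, the decision-error metrics defined in the Evaluation of bias correction section can be tallied over the simulated realizations roughly as follows. This is a hedged sketch: the function name `decision_metrics`, its arguments and the example numbers are hypothetical, and the p-values are assumed to come from whichever hypothesis test is applied to each realization.

```python
def decision_metrics(p_values, estimated_diffs, true_diff, alpha=0.05):
    """Summarize decision errors over simulated realizations.

    p_values        : two-sided p-value from each realization
    estimated_diffs : estimated (AUC Test 1 - AUC Test 2) from each realization
    true_diff       : true (AUC Test 1 - AUC Test 2); 0 under the null
    Returns (rejection_rate, correct_rejection_fraction, wrong_rejection_fraction).
    Under the null the rejection rate is the Type I error rate and both
    fractions are 0; under the alternative the rejection rate is the power.
    """
    n = len(p_values)
    rejected = [p < alpha for p in p_values]
    rejection_rate = sum(rejected) / n
    # Correct rejection: the test rejects and the estimated ordering of the
    # two screening tests agrees with the true ordering.
    correct = sum(
        1 for r, d in zip(rejected, estimated_diffs) if r and d * true_diff > 0
    ) / n
    # Wrong rejection: the test rejects but the estimated ordering is reversed.
    wrong = sum(
        1 for r, d in zip(rejected, estimated_diffs) if r and d * true_diff < 0
    ) / n
    return rejection_rate, correct, wrong


# Four hypothetical realizations under an alternative with a true difference of 0.04.
power, correct_fraction, wrong_fraction = decision_metrics(
    p_values=[0.01, 0.20, 0.03, 0.001],
    estimated_diffs=[0.05, 0.01, -0.04, 0.06],
    true_diff=0.04,
)
print(power, correct_fraction, wrong_fraction)
```

Under the alternative, the correct and wrong rejection fractions add up to the power, which is what allows power to be divided into the two components.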
Multinomial and zero-weighted data occur often in imaging studies. Readers may give the image a score of zero to indicate that no disease is seen, resulting in a dataset where multiple values are zero. Reader preferences for a subset of scores can produce multinomial data. To generate the zero-weighted data where the occurrence of zeroes is correlated between the two screening tests, we created two sets of Bernoulli random variables, one for the cases and one for the non-cases, so that the probability that the score on Test 1 is zero is $p_{1k}$, the probability that the score on Test 2 is zero is $p_{2k}$ and the probability that both screening test scores are zero is $q_k$. If the Bernoulli random variable was one, we replaced the associated screening test score with a zero. Otherwise, the screening test score remained as it was. We set $p_{jk}$ equal to a range of values between 0 and 0.90. The marginal probabilities put constraints on the possible values for $q_k$ [20]. We set $q_k$ to the median allowed agreement for each pairing of $p_{jk}$.

To generate multinomial data, we binned the bivariate Gaussian data. Bin sizes ranged from 1/10 to 2 times the standard deviation. Disease prevalence was 0.01, 0.14 and 0.24. All other parameter values were equivalent to those in the Gaussian simulation studies. The performance of the method was evaluated as described in the Evaluation of bias correction section.

Results

Overview

When compared to the observed analysis, the bias correction method reduced decision errors across all experimental conditions where the percent ascertainment differed between the two screening tests (Figures 3, 4, 5 and Table 2, Rows 1-9). However, the Type I error rate for the corrected analysis was still above nominal for many experimental conditions (Table 2).
Variations in the disease prevalence, the case correlation and the position of the thresholds of suspicion had the largest effect on the Type I error rate and power of the corrected analysis. The difference between the Type I error rate and power of the corrected analysis compared to the true analysis was only slightly modified by changes in the rate of signs and symptoms (details given in Additional file 1). The non-case correlation is not involved in the bias correction calculations and, as expected, had no effect on the performance of the method.

The bias correction method reduced decision errors when screening test scores had a multinomial distribution with bin sizes up to 1/4 the standard deviation and the disease prevalence was medium or high. However, the Type I error rate was above nominal. The Nath algorithm had high failure rates when more than 1% of screening test scores were zero.

Effect of disease prevalence and case correlation

As shown in Figure 3, higher disease prevalence resulted in higher Type I error rates for the true, observed and corrected analyses. Type I error declined with increasing case correlation. The Type I error rate of the corrected analysis was below nominal at low disease prevalence and decreased from 0.09 to below nominal at high prevalence. The Type I error rate of the observed analysis had a high of 0.06 at low prevalence then decreased to below nominal. At high prevalence, the Type I error rate of the observed analysis decreased from 0.95 to 0.05.

Figure 3 Effect of case correlation on the Type I error rate. The nominal Type I error was fixed at 0.05 and is indicated by the red line. (Labels: True, Observed, Corrected; Prevalence = 0.01, Prevalence = 0.24. Horizontal axis: Correlation Between Screening Test Scores for Cases. Vertical axis: Type I Error Rate.)

In Figure 4, higher disease prevalence and case correlation resulted in a higher correct rejection fraction. The correct rejection fraction for the true analysis ranged from 0.74 to 1.0 at low prevalence and was 1.0 at high prevalence. The correct rejection fraction for the corrected analysis ranged from 0.57 to 1.0 at low prevalence and 0.85 to 1.0 at high prevalence. The correct rejection fraction of the observed analysis, however, was 0 except at correlations greater than approximately 0.7 at low prevalence and 0.8 at high prevalence.

Figure 4 Effect of case correlation on the correct rejection fraction. The correct rejection fraction is the proportion of times the hypothesis test rejects when the alternative is true and the choice of the superior screening test is aligned with the true state of nature. (Labels: True, Observed, Corrected; Prevalence = 0.01, Prevalence = 0.24. Horizontal axis: Correlation Between Screening Test Scores for Cases. Vertical axis: Correct Rejection Fraction.)

In Figure 5, the wrong rejection fraction was at or near 0 for the corrected analysis across all experimental conditions. By contrast, the wrong rejection fraction for the observed analysis was 1 at low and medium correlation across all disease prevalences. At high correlation, the wrong rejection fraction for all analyses went to zero.

Figure 5 Effect of case correlation on the wrong rejection fraction. The wrong rejection fraction is the proportion of times the hypothesis test rejects when the alternative is true and the choice of the superior screening test is opposite the true state of nature. (Labels: True, Observed, Corrected; Prevalence = 0.01, Prevalence = 0.24. Horizontal axis: Correlation Between Screening Test Scores for Cases. Vertical axis: Wrong Rejection Fraction.)

Effect of percent ascertainment

Table 2 shows the Type I error of the true, observed and corrected analyses for nine pairs of percent ascertainment levels. We do not discuss the power results since the Type I error of the observed analysis was so high and power is bounded below by Type I error rate.

Table 2 Effect of percent ascertainment on the Type I error rate

Paired screening   Disease      Percent ascertainment   True   Observed   Corrected
trial bias         prevalence   (Test 1 / Test 2)
Yes                0.01         15 / 50                 0.01   0.89       0.36
Yes                0.01         15 / 80                 0.02   0.95       0.25
Yes                0.01         50 / 80                 0.01   0.23       0.12
Yes                0.14         15 / 50                 0.02   1.00       0.82
Yes                0.14         15 / 80                 0.02   1.00       0.60
Yes                0.14         50 / 80                 0.02   1.00       0.20
Yes                0.24         15 / 50                 0.02   1.00       0.95
Yes                0.24         15 / 80                 0.02   1.00       0.91
Yes                0.24         50 / 80                 0.02   1.00       0.40
No                 0.01         15 / 15                 0.01   0.02       0.23
No                 0.01         50 / 50                 0.01   0.02       0.12
No                 0.01         80 / 80                 0.02   0.02       0.18
No                 0.14         15 / 15                 0.02   0.02       0.26
No                 0.14         50 / 50                 0.02   0.02       0.14
No                 0.14         80 / 80                 0.02   0.02       0.03
No                 0.24         15 / 15                 0.02   0.02       0.26
No                 0.24         50 / 50                 0.02   0.02       0.14
No                 0.24         80 / 80                 0.02   0.02       0.04

Type I error rates are calculated over 10,000 realizations of the data for the hypothesis test of a difference in the full areas under the curves. The nominal Type I error is fixed at 0.05.

In general, when the study had some amount of paired screening trial bias (as indicated by a difference in the percent ascertainment), the Type I error rate of the observed analysis was too high (0.23 to 1.0). The Type I error rate of the corrected analysis was closer to nominal than that of the observed analysis, but was also too high (0.12 to 0.95). For pairings with no paired screening trial bias, the observed analysis had lower than nominal Type I error rates while the corrected analysis had Type I error rates up to 0.26. When both screening tests had high percent ascertainment (80 / 80), the Type I error rate of the corrected analysis was below nominal.

Robustness to non-Gaussian data

The results of the non-Gaussian simulation studies are summarized below. A table of the main results is presented in Additional file 2.

At medium and high disease prevalence, the corrected analysis had a lower Type I error rate than the observed analysis for multinomial bin sizes 1/4 the standard deviation or less. At low disease prevalence, the Type I error rate for the corrected analysis was lower than the observed analysis for multinomial bin sizes 1/10 the standard deviation or less.
Fortherangeofmultinomialbinsizesconsideredin thestudy,theTypeIerrorrateofthe true analysis remainedbelownominal.TheTypeIerrorrateofthe corrected analysis,however,wasabovenominalforalldisease prevalencesandbinsizesgreaterthan1 / 10thestandard deviation.The observed analysishadaninflatedTypeI errorrateatmediumandhighdiseaseprevalence.Atlow diseaseprevalence,theTypeIerrorrateofthe observed analysiswasbelownominalexceptatamultinomialbin sizeof2timesthestandarddeviation. Forzero-weighteddata,thesuccessrateoftheNath algorithmdecreasedasthepercentageofzeroscoresfor thecasesincreased.Atlowdiseaseprevalence,when1%of thecaseshadzeroscores,theNathalgorithmconverged foronly33%ofthesimulatedtrials.Forzero-weightsless than1%,theTypeIerrorrateforallthreeanalyseswas abovenominalatmediumandhighdiseaseprevalence. However,theTypeIerrorrateofboththe true and corrected analyseswereclosertonominalthanthatofthe observed analysis.DemonstrationFigure6showsthereceiveroperatingcharacteristic curvesforahypotheticaloralcancerscreeningtrialsimilartothatconsideredbyLingen[1].Oneofthedesigns consideredbyLingenwasapairedtrialcomparingtwo


oral cancer screening modalities: 1) examination by a dentist using a visual and tactile oral examination, and referral for biopsy only for frank cancers (Test 1); and 2) examination by a dentist using a visual and tactile oral exam, a second look with the VELscope oral cancer screening device and stringent instructions to biopsy any lesion detected during either examination (Test 2).

Figure 6 Receiver operating characteristic curves for a hypothetical oral cancer screening study. [Panels: True, Observed, Corrected; curves: Test 1, Test 2; x-axis: 1 - Specificity; y-axis: Sensitivity.] The study is subject to paired screening trial bias. The true areas under the curves for Test 1 and Test 2 are 0.77 and 0.71, respectively, for a true difference of 0.06. The observed difference is -0.06, with the corrected difference at 0.06.

We could find no published oral cancer screening trials of paired continuous tests. Instead, we chose parameter values from a breast cancer screening study [3] and an oral cancer screening demonstration study [17]. We fixed the sample size at 50,000 and the rate of visible lesions at 0.1 [17, rate of suspicious oral cancer and precancerous lesions reported in 28 studies between 1971 and 2002 ranges from 0.02 to 0.17, Table 6]. We approximated the disease prevalence as 0.01 based on the number of Americans with cancer of the oral cavity and pharynx [18] and the 2011 population estimate from the U.S. Census Bureau [21]. For the purposes of the illustration, the true areas under the curves for Test 1 and Test 2 were fixed at 0.77 and 0.71, respectively.

In the hypothetical oral cancer screening trial, we posit that there would be a large difference in the percent ascertainments for each screening modality. In the first arm, the dentist only recommends biopsy for participants with highly suspicious lesions. Thus, we fixed the percent ascertainment to be very low, only 0.01% of the cases.
The oral pathologist recommends biopsy for almost any lesion so we set the percent ascertainment at 97% of the cases. The large difference in percent ascertainment created extreme paired screening trial bias, causing the receiver operating characteristic curves to switch orientation relative to the truth.

When there is an extreme amount of differential bias, the method performs well (Figure 6). The true difference in the areas under the curves was 0.06 (p = 0.001) and the observed difference was -0.06 (p = 0.005). The corrected analysis realigned the curves with the true state of nature, adjusting the difference back to 0.06 (p = 0.001).

In reality, the study investigator would not know which analysis had results closest to the truth. To validate our choice of analysis, we simulated the hypothetical study using the parameter values specified above. The simulated Type I error rate of the corrected analysis was below nominal at 0.03, while the Type I error rate of the observed analysis was above nominal at 0.06. The correct rejection fraction of the corrected analysis was 0.58, while that of the observed analysis was zero. In fact, using the observed analysis, the study investigator would wrongly conclude that Test 2 was superior to Test 1 86% of the time. Based on this simulation, we would recommend the study investigator use the results of the corrected analysis.

Discussion

We could find no other methods that attempted to ameliorate paired screening trial bias. Re-weighting, generalized estimating equations, imputation and Bayesian approaches have been proposed to reduce the effect of partial verification bias (e.g., [22-27]). Maximum likelihood methods [28,29] and latent class models [30] have been proposed to estimate diagnostic accuracy in the presence of imperfect reference standard bias. These methods, however, address problems that are quite different from the one we describe. The proposed approach is the only method that attempts to correct the differential misclassification of disease states.
The bias correction algorithm is a maximum likelihood method. Thus, the accuracy of the estimation depends on the number of cases. We do not recommend using the method for studies with a very small number of cases (< 500) and interval cases (< 5). The performance of the method improves as the disease prevalence and rate of signs and symptoms increase because both factors increase the amount of information (number of


cases) used to form the corrected estimates. As the disease prevalence becomes very large, however, the benefit of the increased amount of case information is constrained by the increasing number of non-cases. The non-case parameter estimates are not corrected and add bias to the estimates of diagnostic accuracy.

For the correlation simulation study, the performance of the method depends upon the observed difference in the areas under the curves. Under the conditions of the study, the average difference in the observed areas under the curves was zero at an approximate correlation of 0.7. At higher correlations, the average observed difference was underestimated but agreed with the true state of nature. At lower correlations the bias was more severe: the average observed difference was overestimated and opposite the true state of nature. The bias correction method performed best under conditions with a large amount of differential bias. Thus, at low correlations the corrected analysis had lower Type I error rates and higher power for the correct decision relative to the observed analysis.
The simulation studies demonstrated that the performance of the bias correction method depends on the amount of differential bias in the study. The amount of bias, in turn, depends on fourteen factors: the means, variances and correlations of the test scores, the disease prevalence, the rate of signs and symptoms and the percent ascertainment for each screening test. After analysis of over 170,000 combinations of the parameter values, we were unable to determine a definitive pattern upon which to base recommendations for using a standard versus a bias-corrected approach. We can, however, provide recommendations for two special cases. We suggest that observed study results be used if: 1) all participants receive a reference standard test, or 2) the two screening tests under consideration ascertain approximately the same percentage of cases. Both situations are plausible in cancer screening. In a proposed oral cancer screening trial [1], the investigators suggested biopsying all oral lesions, under the argument that oral biopsy was minimally invasive, and diagnosis was difficult without biopsy. The second case occurred in studies comparing digital and film mammography, which have similar recall rates [2,3].

In order to determine if bias correction is indicated for a screening trial that is not a special case, we recommend the investigator conduct a simulation study similar to those described in the manuscript. The simulation software, instruction manual and example code are available at [13]. The software simulates Type I error rate and power for both the standard and bias-corrected analyses in the SAS/IML environment. In addition, the software can perform bias correction for a user-provided dataset should the bias-corrected approach be deemed appropriate.
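The kind of check recommended above can be illustrated in miniature. The following Python sketch is not the authors' SAS/IML software and does not implement the bias correction; it only shows, under simplifying assumptions, how unequal percent ascertainment can distort the observed difference in areas under the curves. It assumes that scores are bivariate Gaussian with unit variances conditional on disease status, that a case is verified only when either test score exceeds a test-specific cutoff or signs and symptoms appear, and that unverified cases are analyzed as non-cases. The function names, the mean shift, the correlation and the rate of signs and symptoms are hypothetical choices made for illustration; only the 15 / 80 ascertainment pairing and the prevalence of 0.14 echo values used in the simulation studies.

```python
import math
import random
from statistics import NormalDist


def empirical_auc(case_scores, noncase_scores):
    """Mann-Whitney estimate of the area under the ROC curve (ties ignored)."""
    labeled = [(s, 1) for s in case_scores] + [(s, 0) for s in noncase_scores]
    labeled.sort(key=lambda pair: pair[0])
    rank_sum = sum(rank for rank, (_, is_case) in enumerate(labeled, start=1)
                   if is_case)
    n1, n0 = len(case_scores), len(noncase_scores)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def simulate_trial(n=20000, prevalence=0.14, shift=1.0, rho=0.1,
                   ascertain=(0.15, 0.80), signs_rate=0.10, seed=1):
    """Return (true, observed) AUC differences, each Test 1 minus Test 2.

    Hypothetical setup: both tests have the same true AUC, so the true
    difference is zero up to sampling error.
    """
    rng = random.Random(seed)
    z = NormalDist()
    # Cutoffs chosen so the stated fraction of cases scores above them.
    cut1 = shift + z.inv_cdf(1 - ascertain[0])
    cut2 = shift + z.inv_cdf(1 - ascertain[1])
    # status -> (Test 1 scores, Test 2 scores)
    true_groups = {1: ([], []), 0: ([], [])}
    obs_groups = {1: ([], []), 0: ([], [])}
    for _ in range(n):
        is_case = rng.random() < prevalence
        has_signs = rng.random() < signs_rate
        mean = shift if is_case else 0.0
        e1 = rng.gauss(0, 1)
        e2 = rho * e1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        s1, s2 = mean + e1, mean + e2
        # Assumption: disease status is verified only for flagged participants.
        verified = s1 > cut1 or s2 > cut2 or has_signs
        observed_case = is_case and verified  # missed cases become non-cases
        for groups, status in ((true_groups, is_case),
                               (obs_groups, observed_case)):
            groups[int(status)][0].append(s1)
            groups[int(status)][1].append(s2)

    def auc_difference(groups):
        auc1 = empirical_auc(groups[1][0], groups[0][0])
        auc2 = empirical_auc(groups[1][1], groups[0][1])
        return auc1 - auc2

    return auc_difference(true_groups), auc_difference(obs_groups)


if __name__ == "__main__":
    true_diff, observed_diff = simulate_trial()
    print(f"true difference:     {true_diff:+.3f}")
    print(f"observed difference: {observed_diff:+.3f}")
```

With equal true areas under the curves, the first value should be near zero, while the second is typically negative in this setup: the test with the higher percent ascertainment looks better than it is. Repeating the run over many seeds and recording how often a hypothesis test rejects would turn the sketch into a rough Type I error study.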
Under most circumstances, the study investigator should choose the analysis that has the Type I error rate closest to, but not greater than, the nominal level, the highest correct rejection fraction and the lowest wrong rejection fraction. In some contexts, one type of error may be more important than the other. Controlling the Type I error rate is a priority if there is only one study that is going to be performed and patients could be put at harm if the wrong screening test is selected. A small inflation of the Type I error rate might be less important if there is prior knowledge that the null is not true. For example, say a researcher is designing the last study in a series of studies examining complementary hypotheses. If all previous studies rejected the null hypothesis, then the researcher has prior knowledge that the phenomenon may show an effect. In this situation, the researcher might prioritize the analysis with a slightly higher than nominal Type I error rate in favor of greater discriminatory power under the alternative hypothesis.

Another limitation of the method is the assumption that screening test scores are distributed bivariate Gaussian conditional on disease status. The bivariate Gaussian distribution is the underlying assumption for the binormal receiver operating characteristic curve, a popular form of receiver operating characteristic analysis [31]. We evaluated the robustness of the method to two common deviations from normality: multinomial and zero-weighted data. Based on our simulation studies, we cannot recommend the method for use with datasets where greater than 1% of test scores have zero values. In addition, the method is not recommended for data with multinomial bin sizes greater than 1/4 the standard deviation for medium or high disease prevalence or 1/10 the standard deviation for low prevalence. In future work, the bias correction method could be expanded to handle alternative distributions for the test scores.

This paper provides two contributions to the literature. First, we describe a method to correct for paired screening trial bias, a bias for which there is no other correction technique. Due to the increasing use of continuous biomarkers for cancer detection (see, e.g.
, [32]), a growing number of screening trials have the potential to be subject to paired screening trial bias. The proposed method will counteract bias in the paired trials and allow investigators to compare screening tests with fewer decision errors. Second, we introduce an important metric for evaluating the performance of bias correction techniques, that of reducing decision errors. We recommend that any new correction method be evaluated with a study of Type I error and power.

Conclusions

The proposed bias correction method reduces decision errors in the paired comparison of the full areas under the curves of screening tests with Gaussian outcomes. Because the performance of the bias correction method


is affected by characteristics of the screening tests and the disease being examined, we recommend conducting a simulation study using our free software before choosing a bias-corrected or standard analysis.

Additional files

Additional file 1: Effect of the rate of signs and symptoms. The file contains results for the simulation study examining the effect of varying the rate of signs and symptoms on the Type I error rate and power of the true, observed and corrected analyses.

Additional file 2: Non-Gaussian simulation study. The file contains the main results for the simulation study examining the robustness of the bias correction method to deviations from the Gaussian assumption.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

BMR conducted the literature review, derived the mathematical results, designed and programmed the simulation studies, interpreted the results and prepared the manuscript. TAA assisted with the literature review and provided expertise on the context of the topic in relation to other work in the field. JTB assisted with the mathematical derivations. SMK provided guidance for the design and programming of the simulation studies. AM improved the software and packaged it for public release. KEM reviewed the intellectual content of the work and gave important editorial suggestions. DHG conceived of the topic and guided the development of the work. All authors read and approved the manuscript.

Acknowledgements

The research presented in this paper was supported by two grants. The mathematical derivations, programming of the algorithm and early simulation studies were funded by NCI 1R03CA136048-01A1, a grant awarded to the Colorado School of Public Health, Deborah Glueck, Principal Investigator.
Completion of the simulation studies, including the Type I error and power analyses, was funded by NIDCR 3R01DE020832-01A1S1, a minority supplement awarded to the University of Florida, Keith Muller, Principal Investigator, with a subaward to the Colorado School of Public Health. The content of this paper is solely the responsibility of the authors, and does not necessarily represent the official views of the National Cancer Institute, the National Institute of Dental and Craniofacial Research, nor the National Institutes of Health.

Author details

1 Center for Cancer Prevention and Control Research, University of California, Los Angeles, 650 Charles Young Drive South, Room A2-125 CHS, Los Angeles CA 90095, USA.
2 Department of Preventive Medicine, University of Southern California, 440 E. Huntington Dr, 4th floor, Arcadia CA 91006, USA.
3 Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, 13001 E. 17th Place, Aurora CO 80045, USA.
4 Department of Health Outcomes and Policy, University of Florida, 1329 SW 16th St., Gainesville FL 32608, USA.

Received: 13 December 2013 Accepted: 26 February 2014 Published: 5 March 2014

References

1. Lingen MW: Efficacy of oral cancer screening adjunctive techniques. National Institute of Dental and Craniofacial Research, National Institutes of Health, US Department of Health and Human Services. NIH Project Number 1RC2DE020779-01. 2009.
2. Lewin JM, D'Orsi CJ, Hendrick RE, Moss LJ, Isaacs PK, Karellas A, Cutter GR: Clinical comparison of full-field digital mammography and screen-film mammography for detection of breast cancer. Am J Roentgenol 2002, 179:671-677.
3. Pisano ED, Gatsonis C, Hendrick E, Yaffe M, Baum JK, Acharyya S, Conant EF, Fajardo LL, Bassett L, D'Orsi C, Jong R, Rebner M: Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 2005, 353:1773-1783.
4. Glueck DH, Lamb MM, O'Donnell CI, Ringham BM, Brinton JT, Muller KE, Lewin JM: Bias in trials comparing paired continuous tests can cause researchers to choose the wrong screening modality. BMC Med Res Methodol 2009, 9:4.
5. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, Bossuyt PM: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999, 282:1061-1066.
6. Reitsma JB, Rutjes AWS, Khan KS, Coomarasamy A, Bossuyt PM: A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidemiol 2009, 62:797-806.
7. Rutjes AWS, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PMM: Evidence of bias and variation in diagnostic accuracy studies. Can Med Assoc J 2006, 174:469-476.
8. Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J: Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004, 140:189-202.
9. Ringham BM, Alonzo TA, Grunwald GK, Glueck DH: Estimates of sensitivity and specificity can be biased when reporting the results of the second test in a screening trial conducted in series. BMC Med Res Methodol 2010, 10:3.
10. Kish L: Survey Sampling. Hoboken: John Wiley & Sons; 1965.
11. Nath GB: Estimation in truncated bivariate normal distributions. J Roy Stat Soc C-App 1971, 20:313-319.
12. Ross S: A First Course in Probability. Upper Saddle River: Prentice Hall; 2009.
13. GitHub Repository: Bias Correction Suite. [www.github.com/SampleSizeShop/BiasCorrectionSuite].
14. Metz CE, Herman BA, Shen JH: Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 1998, 17:1033-1053.
15. Obuchowski NA, McClish DK: Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Stat Med 1997, 16:1529-1542.
16. Bunker CH, Patrick AL, Konety BR, Dhir R, Brufsky AM, Vivas CA, Becich MJ, Trump DL, Kuller LH: High prevalence of screening-detected prostate cancer among Afro-Caribbeans: the Tobago prostate cancer survey. Cancer Epidem Biomar 2002, 11:726-729.
17. Lim K, Moles DR, Downer MC, Speight PM: Opportunistic screening for oral cancer and precancer in general dental practice: results of a demonstration study. Brit Dent J 2003, 194:497-502.
18. Howlader N, Noone AM, Krapcho M, Neyman N, Aminou R, Waldron W, Altekruse SF, Kosary CL, Ruhl J, Tatalovich Z, Cho H, Mariotto A, Eisner MP, Lewis DR, Chen HS, Feuer EJ, Cronin KA, Edwards BK (Eds): SEER cancer statistics review, 1975-2008. 2011. [http://seer.cancer.gov/csr/1975_2009_pops09/].
19. Bobo JK, Lee NC, Thames SF: Findings from 752,081 clinical breast examinations reported to a national screening program from 1995 through 1998. J Natl Cancer I 2000, 92:971-976.
20. Alonzo TA: Verification bias-corrected estimators of the relative true and false positive rates of two binary screening tests. Stat Med 2005, 24:403-417.
21. Bureau USC: State and County Quickfacts. [http://quickfacts.census.gov].
22. Begg CB, Greenes RA: Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 1983, 39:207-215.
23. Alonzo TA, Pepe MS: Assessing accuracy of a continuous screening test in the presence of verification bias. J Roy Stat Soc C-App 2005, 54:173-190.
24. Buzoianu M, Kadane JB: Adjusting for verification bias in diagnostic test evaluation: a Bayesian approach. Stat Med 2008, 27:2453-2473.
25. Martinez EZ, Alberto Achcar J, Louzada-Neto F: Estimators of sensitivity and specificity in the presence of verification bias: a Bayesian approach. Comput Stat Data An 2006, 51:601-611.
26. Rotnitzky A, Faraggi D, Schisterman E: Doubly robust estimation of the area under the receiver-operating characteristic curve in the presence of verification bias. J Am Stat Assoc 2006, 101:1276-1288.
27. Toledano AY, Gatsonis C: Generalized estimating equations for ordinal categorical data: arbitrary patterns of missing responses and missingness in a key covariate. Biometrics 1999, 55:488-496.


28. Zhou X: Maximum likelihood estimators of sensitivity and specificity corrected for verification bias. Commun Stat A-Theor 1993, 22:3177-3198.
29. Vacek PM: The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics 1985, 41:959-968.
30. Torrance-Rynard VL, Walter SD: Effects of dependent errors in the assessment of diagnostic test performance. Stat Med 1997, 16:2157-2175.
31. Metz C, Wang P, Kronman HA: New approach for testing the significance of differences between ROC curves measured from correlated data. In Information Processing In Medical Imaging. Edited by Deconinck F. The Hague: Springer; 1984:432-445.
32. Elashoff D, Zhou H, Reiss J, Wang J, Xiao H, Henson B, Hu S, Arellano M, Sinha U, Le A, Messadi D, Wang M, Nabili V, Lingen M, Morris D, Randolph T, Feng Z, Akin D, Kastratovic DA, Chia D, Abemayor E, Wong DTW: Prevalidation of salivary biomarkers for oral cancer detection. Cancer Epidem Biomar 2012, 21:664-672.

doi:10.1186/1471-2288-14-37
Cite this article as: Ringham et al.: Reducing decision errors in the paired comparison of the diagnostic accuracy of screening tests with Gaussian outcomes. BMC Medical Research Methodology 2014 14:37.




Additional file 2

Non-Gaussian simulation study

Overview

We provide a table of important results for the simulation study assessing the robustness of the bias correction method to deviations from the Gaussian assumption. For a description of the simulation study design, refer to the Design of non-Gaussian simulation studies section in the main text.

Tabular Results for the Non-Gaussian Simulation Study

Table S1 displays the effect of increasing the multinomial bin size on the Type I error rate of the true, observed and corrected analyses. Note that bin size is listed as a proportion of the standard deviation of the screening test scores. For example, a bin size of 1/2 indicates that Gaussian screening test scores were rounded to 1/2 the standard deviation to generate the multinomial screening test scores. For additional information on the results of the non-Gaussian simulation studies, see the Results, Robustness to non-Gaussian data section in the main text.
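The rounding step described above can be sketched in a few lines of Python. This is an illustration rather than the authors' code: it assumes that "rounded to 1/2 the standard deviation" means rounding each score to the nearest multiple of the bin width (bin size times the standard deviation), and the function name to_multinomial is hypothetical.

```python
import random


def to_multinomial(scores, sd, bin_fraction):
    """Round each score to the nearest multiple of bin_fraction * sd.

    Assumption: the bin width is the bin size (a proportion of the
    standard deviation) multiplied by the standard deviation itself.
    """
    width = bin_fraction * sd
    return [round(score / width) * width for score in scores]


if __name__ == "__main__":
    rng = random.Random(0)
    gaussian_scores = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    for fraction in (0.1, 0.25, 0.5, 2.0):
        binned = to_multinomial(gaussian_scores, sd=1.0, bin_fraction=fraction)
        # Wider bins leave fewer distinct score values, so the data look
        # less like a continuous Gaussian sample.
        print(fraction, len(set(binned)))
```

Larger bin sizes collapse the scores onto fewer distinct values, which is the departure from the Gaussian assumption that the robustness study examines.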


Table S1 Effect of multinomial bin size on the Type I error rate of the true, observed and corrected analyses

Columns: Bin Size*, Disease Prevalence, True, Observed, Corrected

*Relative to the standard deviation of the screening test scores.




Additional file 1

Effect of the rate of signs and symptoms

Overview

We present the results for the simulation study assessing the effect of the rate of signs and symptoms on the performance of the bias correction method. For a description of the simulation study design, refer to the Design of simulation studies under the Gaussian assumption section in the main text.

Effect of the rate of signs and symptoms

In Figure S1, the Type I error rate declined as the rate of signs and symptoms increased. The Type I error rate of the corrected analysis was below nominal at low disease prevalence and ranged from to at high prevalence. The Type I error rate of the observed analysis ranged between and at low prevalence and to at high prevalence.

In Figure S2, increasing the rate of signs and symptoms had no effect on the correct rejection fraction at low disease prevalence, but improved the correct rejection fraction at high prevalence. The correct rejection fraction for the true analysis was at low prevalence and at high prevalence. The correct rejection fraction for the corrected analysis ranged from to at low prevalence and to at high prevalence. By contrast, the observed analysis had a correct rejection fraction of zero across all prevalences.

In Figure S3, the wrong rejection fraction for the corrected analysis is near zero across all rates of signs and symptoms and disease prevalences. For the observed analysis, however, the wrong rejection fraction ranged from to. Under the conditions of the simulation, a study investigator using the observed results would either incorrectly decide that the worst screening test was best, or conclude that there was no difference between the two screening tests.


Figure S1 Effect of the rate of signs and symptoms on Type I error rate. The nominal Type I error was fixed at 0.05 and is indicated by the red line. [Panels: Prevalence = 0.01, Prevalence = 0.24; series: True, Observed, Corrected; x-axis: Rate of Signs and Symptoms; y-axis: Type I Error Rate.]


Figure S2 Effect of the rate of signs and symptoms on the correct rejection fraction. The correct rejection fraction is the proportion of times the hypothesis test rejects when the alternative is true and the choice of the superior screening test is aligned with the true state of nature. [Panels: Prevalence = 0.01, Prevalence = 0.24; series: True, Observed, Corrected; x-axis: Rate of Signs and Symptoms; y-axis: Correct Rejection Fraction.]


Figure S3 Effect of the rate of signs and symptoms on the wrong rejection fraction. The wrong rejection fraction is the proportion of times the hypothesis test rejects when the alternative is true and the choice of the superior screening test is opposite the true state of nature. [Panels: Prevalence = 0.01, Prevalence = 0.24; series: True, Observed, Corrected; x-axis: Rate of Signs and Symptoms; y-axis: Wrong Rejection Fraction.]
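The decision-error metrics defined in the captions of Figures S2 and S3 can be tallied from simulation output as in the following Python sketch. It is an illustration only: the function names are hypothetical, rejection is assumed to mean a p-value strictly below the nominal level, and the inputs are assumed to be one p-value and one estimated difference in areas under the curves (Test 1 minus Test 2) per simulated trial.

```python
def type_one_error_rate(p_values, alpha=0.05):
    """Fraction of rejections among trials simulated under the null."""
    return sum(p < alpha for p in p_values) / len(p_values)


def rejection_fractions(p_values, estimated_diffs, true_diff, alpha=0.05):
    """Return (correct, wrong) rejection fractions under the alternative.

    A rejection is correct when the estimated difference has the same sign
    as the true difference, and wrong when it has the opposite sign.
    """
    if true_diff == 0:
        raise ValueError("true_diff must be nonzero under the alternative")
    correct = wrong = 0
    for p, diff in zip(p_values, estimated_diffs):
        if p < alpha:
            if diff * true_diff > 0:
                correct += 1
            elif diff * true_diff < 0:
                wrong += 1
    n = len(p_values)
    return correct / n, wrong / n


if __name__ == "__main__":
    # Four made-up trials with a true difference favoring Test 1.
    p_vals = [0.001, 0.005, 0.20, 0.03]
    diffs = [0.06, -0.06, 0.01, 0.05]
    print(rejection_fractions(p_vals, diffs, true_diff=0.06))
```

In the made-up example, two of the four trials reject in the right direction and one rejects in the wrong direction, so the two fractions are 0.5 and 0.25. Splitting power this way separates rejections that pick the truly better test from rejections that pick the worse one.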