Some Contributions to the Equivalence of Prospective and Retrospective Inferences in Case-Control Studies


Material Information

Title:
Some Contributions to the Equivalence of Prospective and Retrospective Inferences in Case-Control Studies
Physical Description:
1 online resource (79 p.)
Language:
English
Creator:
Song, Jihyun
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Statistics
Committee Chair:
Ghosh, Malay
Committee Members:
Daniels, Michael J
Randles, Ronald H
Shuster, Jonathan J

Subjects

Subjects / Keywords:
bayesian -- case -- control -- error -- measurement -- prospective -- retrospective -- semiparametric
Statistics -- Dissertations, Academic -- UF
Genre:
Statistics thesis, Ph.D.
bibliography (marcgt)
theses (marcgt)
government publication (state, provincial, territorial, dependent) (marcgt)
born-digital (sobekcm)
Electronic Thesis or Dissertation

Notes

Abstract:
My dissertation focuses on equivalence issues related to prospective and retrospective inferences in case-control studies. In case-control studies, one traces the exposure history of an individual or a group of individuals conditional on outcome categories of a certain disease. Here, the primary goal is to examine the relationship between the disease and the exposure. The usual prospective inference, based on a model for the probability of disease given exposure, is not obviously applicable in this type of study design. Instead, retrospective inference based on a model for the probability of exposure given disease is needed. Prospective models usually involve fewer parameters than the corresponding retrospective models. Also, standard software is usually available for prospective models. Hence, if inference based on the prospective likelihood is equivalent to inference based on the retrospective likelihood, inferential problems become easier. Thus, investigating the equivalence of the two approaches is an interesting and important statistical question. In my dissertation, I explore the equivalence of the two approaches in both a Bayesian and a frequentist framework. In a Bayesian framework, I present a class of priors under which the posterior distributions of odds ratios based on the prospective likelihood are equivalent to those based on the retrospective likelihood for general models. This is an extension of Seaman and Richardson (2004) and Ghosh, Zhang and Mukherjee (2006), who proposed a class of priors for the equivalence of the posteriors for odds ratio parameters based on prospective and retrospective likelihoods for the logistic regression model. A colorectal cancer study data analysis is included as an application.
In a frequentist framework, Prentice and Pyke (1979) showed that the maximum likelihood estimator for odds ratio parameters obtained from the prospective likelihood is identical to the one obtained from the retrospective likelihood. Also, the estimator is a consistent estimator of the true odds ratio parameter, and its asymptotic variance-covariance matrix estimator based on the inverse of the observed Fisher information matrix converges to the true asymptotic variance-covariance matrix under the retrospective likelihood. Scott and Wild (2001) generalized the work of Prentice and Pyke (1979) to an arbitrary binary regression model with general exposure variables. I extend Scott and Wild (2001) to multiple disease states and general exposure variables. I present a pseudo prospective likelihood approach that provides inference equivalent to that from the retrospective likelihood. However, I take a different approach from Scott and Wild (2001) to construct the pseudo likelihood and to examine the asymptotic properties of the estimators. As Chen (2003) pointed out, parameters not involved in odds ratios are not identifiable with case-control data. I also present a modified pseudo prospective likelihood approach that overcomes this non-identifiability issue. My modified pseudo prospective model coincides with Chen's (2003) baseline logit model. However, my derivations of the model and of the asymptotic distribution of the estimator differ from Chen's. I also extend my pseudo prospective likelihood approach to the case where some of the covariates are measured with error. I present a pseudo joint likelihood approach for which the MLEs of odds ratio parameters are equivalent to those from the retrospective likelihood. This is an extension of Roeder, Carroll, and Lindsay (1996), who studied the equivalence of MLEs in case-control studies for the binary logistic regression model when some of the covariates are measured with error. I also prove the consistency of the MLE under certain conditions.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Jihyun Song.
Thesis:
Thesis (Ph.D.)--University of Florida, 2011.
Local:
Adviser: Ghosh, Malay.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2013-08-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043213:00001




Full Text

SOME CONTRIBUTIONS TO THE EQUIVALENCE OF PROSPECTIVE AND RETROSPECTIVE INFERENCES IN CASE-CONTROL STUDIES

By

JIHYUN SONG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011

© 2011 Jihyun Song

I dedicate this to my mother, late father, husband, and daughter.

ACKNOWLEDGMENTS

I am deeply grateful to my dissertation advisor, Professor Malay Ghosh. His patience and kindness made me comfortable asking any question, and his encouragement and guidance helped me finish this dissertation. I am also grateful to Professor Bhramar Mukherjee for allowing me to use the MECC data and for giving feedback. I would like to thank Professors Dalho Kim and Debashis Ghosh for helpful discussions as well. I would also like to thank Professors Michael Daniels, Ronald Randles, and Jonathan Shuster for serving on my committee. Their kind suggestions and comments during my proposal presentation improved the contents of this dissertation. Finally, I would like to thank my husband and my mother. Without their physical and emotional support, I could not have completed this dissertation.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
ABSTRACT

CHAPTER

1 OVERVIEW: STATISTICAL METHODS FOR CASE-CONTROL STUDIES
1.1 Case-Control Study and Odds Ratio
1.2 Cochran-Mantel-Haenszel Test and Mantel-Haenszel Estimator for Common Odds Ratio
1.3 Logistic Regression and Case-Control Studies
1.4 Equivalence of Inference for Odds Ratio Parameters Based on Prospective and Retrospective Likelihoods of Logistic Regression Model in Case-Control Studies
1.5 Equivalence of Inference for Odds Ratio Parameters Based on Prospective and Retrospective Likelihoods of Logistic Regression Model in Case-Control Studies with Measurement Error
1.6 Case-Control Studies with Known Population Totals
1.7 Case-Control Studies without Supplementary Information
1.8 Topics of My Dissertation
2 ON THE EQUIVALENCE OF POSTERIOR INFERENCE BASED ON RETROSPECTIVE AND PROSPECTIVE LIKELIHOODS
2.1 The Bayesian Equivalence Result
2.2 Multiplicative Intercept Model
2.3 Posterior Equivalence for Stratified Case-Control Data
2.4 Example: Analysis of Colorectal Cancer Data
3 FREQUENTIST PROSPECTIVE APPROACH IN GENERAL CASE-CONTROL FRAMEWORK
3.1 A Prospective Likelihood Approach for General Models in Case-Control Studies
3.1.1 A Prospective Likelihood Approach to Obtain MLE from Retrospective Likelihood
3.1.2 Consistency and Asymptotic Normality of MLE
3.1.3 Asymptotic Covariance Matrix of the MLE
3.2 Nonidentifiability of Parameters Not Involved in Odds Ratios
3.3 A Prospective Likelihood Approach for General Models in Case-Control Studies Where Some of Parameters Are Not Identified
4 SEMIPARAMETRIC MIXTURE APPROACH IN CASE-CONTROL STUDIES WITH MEASUREMENT ERROR IN COVARIATES FOR GENERAL MODELS
4.1 A Joint Likelihood Approach for General Models in Case-Control Studies with Errors in Covariates
4.2 Consistency of $\hat\theta_2$
5 SUMMARY AND FUTURE RESEARCH
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
1-1 2×2 Frequency Table for Binary Exposure and Binary Disease Variables
1-2 Frequency Table for Binary Exposure and Binary Disease for Stratum $k$, $k=1,\ldots,K$
2-1 Posterior means and HPD regions for $\beta_1$, $\beta_2$, $\phi_1$, and odds ratios with various normal priors for $\beta$, and their MLE's and asymptotic C.I.'s
2-2 Sensitivity analysis under prior choice for $\phi_1$: posterior means and HPD regions for $\beta_1$, $\beta_2$, $\phi_1$ with beta priors for $\phi_1$ and prior 5 in Table 2-1 for $\beta$

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SOME CONTRIBUTIONS TO THE EQUIVALENCE OF PROSPECTIVE AND RETROSPECTIVE INFERENCES IN CASE-CONTROL STUDIES

By

Jihyun Song

August 2011

Chair: Malay Ghosh
Major: Statistics

CHAPTER 1
OVERVIEW: STATISTICAL METHODS FOR CASE-CONTROL STUDIES

1.1 Case-Control Study and Odds Ratio

Case-control studies constitute one of the most important areas of research in epidemiological statistics. The primary goal here is to examine the relationship between a certain disease and an exposure, or a set of exposures, that potentially causes the disease. These studies are often employed to study rare diseases (e.g., cancer) because they are very cost-efficient compared with cohort studies. The cost efficiency is achieved since the analysis can be carried out by small teams of researchers in a relatively short period of time. This is an advantage over a cohort study, which follows a large number of healthy individuals with and without exposure over a long period of time, an approach that becomes impractical for rare diseases such as cancer.

A case-control study is retrospective in that subjects are first selected separately from the case and control populations, and their exposure levels are recorded afterward. This is in contrast to a cohort study, in which healthy individuals are followed over time to see whether they develop the disease. In a case-control study, the odds ratio is computed to assess the association between disease and exposure. If $D$ denotes disease (1 for cases, 0 for controls) and $z$ denotes exposure (1 for exposed, 0 for unexposed), then the odds ratio for the binary case is
$$\frac{P(D=1\mid z=1)/P(D=0\mid z=1)}{P(D=1\mid z=0)/P(D=0\mid z=0)}.$$
An odds ratio greater (smaller) than 1 indicates that disease and exposure are positively (negatively) associated, while an odds ratio of 1 implies no association. Another measure of association is the relative risk, given by
$$\frac{P(D=1\mid z=1)}{P(D=1\mid z=0)}.$$
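As a quick numerical comparison of the two measures, the sketch below computes both from prospective probabilities $p_1 = P(D=1\mid z=1)$ and $p_0 = P(D=1\mid z=0)$. The probabilities are hypothetical values chosen for illustration, not data from any study.

```python
def odds_ratio(p1: float, p0: float) -> float:
    """Odds ratio given p1 = P(D=1 | z=1) and p0 = P(D=1 | z=0)."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


def relative_risk(p1: float, p0: float) -> float:
    """Relative risk P(D=1 | z=1) / P(D=1 | z=0)."""
    return p1 / p0


# Hypothetical disease probabilities for a rare and a common disease.
scenarios = {"rare": (0.002, 0.001), "common": (0.40, 0.20)}
for label, (p1, p0) in scenarios.items():
    print(f"{label}: OR = {odds_ratio(p1, p0):.4f}, RR = {relative_risk(p1, p0):.4f}")
# rare: OR = 2.0020, RR = 2.0000
# common: OR = 2.6667, RR = 2.0000
```

For the rare disease the two measures nearly coincide, because $1-p$ is close to 1 in both exposure groups; for the common disease they diverge noticeably.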

The odds ratio approximates the relative risk for a rare disease, since $P(D=0\mid z=i)\approx 1$ for $i=0,1$.

To be specific, consider Table 1-1, involving a binary exposure (exposed or not exposed) and a binary disease status (with or without disease). The $n_{ij}$'s denote the counts in each cell.

Table 1-1. 2×2 Frequency Table for Binary Exposure and Binary Disease Variables

Disease Status       Not exposed (z=0)   Exposed (z=1)   Total
No disease (D=0)     $n_{00}$            $n_{01}$        $n_0$
Disease (D=1)        $n_{10}$            $n_{11}$        $n_1$
Total                $e_0$               $e_1$           $N$

Suppose Table 1-1 is filled with data resulting from a prospective study. The exposure-disease odds ratio, denoted by $\psi$, can be estimated by $\hat\psi = (n_{11}n_{00})/(n_{01}n_{10})$. If, instead, Table 1-1 is filled with data from a case-control study, then the odds ratio should be computed as
$$\frac{P(z=1\mid D=1)/P(z=0\mid D=1)}{P(z=1\mid D=0)/P(z=0\mid D=0)}.$$
However, the following equality enables us to estimate the latter by $\hat\psi$ as well:
$$\frac{P(z=1\mid D=1)/P(z=0\mid D=1)}{P(z=1\mid D=0)/P(z=0\mid D=0)} = \frac{P(D=1\mid z=1)/P(D=0\mid z=1)}{P(D=1\mid z=0)/P(D=0\mid z=0)}.$$
In a more general framework, in which the disease status has multiple categories ($d=0,\ldots,r$) and the exposure variables are multicategory or continuous with baseline category $z_0$, the following equality holds:
$$\frac{\Pr(z\mid D=d)/\Pr(z_0\mid D=d)}{\Pr(z\mid D=0)/\Pr(z_0\mid D=0)} = \frac{\Pr(D=d\mid z)/\Pr(D=0\mid z)}{\Pr(D=d\mid z_0)/\Pr(D=0\mid z_0)}.$$
Both equalities follow from Bayes' theorem, $\Pr(z\mid D=d) = \Pr(D=d\mid z)\Pr(z)/\Pr(D=d)$, because the factors $\Pr(z)$, $\Pr(z_0)$, and $\Pr(D=d)$ cancel in the cross-ratio. This implies that odds ratio estimators from prospective models are valid estimators of the corresponding odds ratios in the retrospective model. Moreover, the equality holds even when $z$ is an exposure vector.
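The invariance can be seen directly on counts: case-control sampling keeps each disease row of Table 1-1 only up to its own sampling fraction, and a row-wise scale factor cancels in $\hat\psi = (n_{11}n_{00})/(n_{01}n_{10})$. The sketch below uses hypothetical cohort counts and hypothetical sampling fractions, with exact rational arithmetic so the comparison is exact; the function name is illustrative.

```python
from fractions import Fraction


def cross_product_ratio(n00, n01, n10, n11):
    """psi_hat = (n11 * n00) / (n01 * n10) for the layout of Table 1-1
    (rows D=0, D=1; columns z=0, z=1), computed exactly with Fraction."""
    return Fraction(n11 * n00) / (n01 * n10)


# Hypothetical cohort (prospective) counts.
cohort = {"n00": 9000, "n01": 2800, "n10": 100, "n11": 100}

# Hypothetical case-control sampling: keep every case but only 1 in 20 controls.
# Each disease row is multiplied by its own sampling fraction.
frac_cases, frac_controls = Fraction(1), Fraction(1, 20)
case_control = {
    "n00": cohort["n00"] * frac_controls,
    "n01": cohort["n01"] * frac_controls,
    "n10": cohort["n10"] * frac_cases,
    "n11": cohort["n11"] * frac_cases,
}

print(cross_product_ratio(**cohort), cross_product_ratio(**case_control))  # 45/14 45/14
```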

Fisher (1935) introduced a likelihood-based test for the odds ratio $\psi$ in a 2×2 table. Suppose we want to test $H_0:\psi=\psi_0$ in Table 1-1. If the marginal totals $n_0$, $n_1$, $e_0$, and $e_1$ are held fixed, then the cell count $n_{11}$ determines the remaining cell counts. Fisher noted that in this setting, under $H_0$, the random variable $n_{11}$ has a noncentral hypergeometric distribution given by
$$\Pr(n_{11}\mid e_0,e_1,n_0,n_1) = \frac{\binom{n_1}{n_{11}}\binom{n_0}{e_1-n_{11}}\psi_0^{n_{11}}}{\sum_u \binom{n_1}{u}\binom{n_0}{e_1-u}\psi_0^{u}},$$
where the sum runs over all values $u$ compatible with the fixed margins. A p-value can then be calculated as $\sum_{u\ge n_{11}}\Pr(u\mid e_0,e_1,n_0,n_1)$, as $\sum_{u\le n_{11}}\Pr(u\mid e_0,e_1,n_0,n_1)$, or, for a two-sided alternative, as the sum of an upper-tail and a lower-tail probability, depending on the alternative hypothesis. This is an exact test, which is used for small samples.

1.2 Cochran-Mantel-Haenszel Test and Mantel-Haenszel Estimator for Common Odds Ratio

Cochran (1954) and Mantel and Haenszel (1959) developed a statistic to test $H_0:\psi=1$, where $\psi$ is a common odds ratio parameter across tables (strata). In a clinical trial, experiments are often performed at several locations, and results may vary with location. In this case, the locations can be considered as strata.

The Cochran-Mantel-Haenszel test is based on conditioning on the marginal totals and on asymptotic inference. For the data structure in Table 1-2, the Cochran-Mantel-Haenszel test statistic is
$$\chi^2 = \frac{\left\{\left|\sum_k \left[n_{11k} - E(n_{11k}\mid e_{0k},e_{1k},n_{0k},n_{1k})\right]\right| - \tfrac{1}{2}\right\}^2}{\sum_k \operatorname{Var}(n_{11k}\mid e_{0k},e_{1k},n_{0k},n_{1k})},$$
where $E(n_{11k}\mid e_{0k},e_{1k},n_{0k},n_{1k}) = e_{1k}n_{1k}/N_k$ and $\operatorname{Var}(n_{11k}\mid e_{0k},e_{1k},n_{0k},n_{1k}) = e_{0k}e_{1k}n_{1k}n_{0k}/\{N_k^2(N_k-1)\}$. The expectation and the variance are calculated, within each stratum, from the hypergeometric distribution above with $\psi_0=1$. The p-values are computed from the fact that the test statistic has an asymptotic chi-square distribution with one degree of freedom.
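The conditional distribution and the one-sided p-values above can be sketched directly with exact binomial coefficients. The function names and the example table are hypothetical. For $\psi_0 = 1$, the upper-tail value should agree with SciPy's `scipy.stats.fisher_exact` called with `alternative='greater'` on the table entered in the layout of Table 1-1.

```python
from math import comb


def nc_hypergeom_pmf(n0, n1, e1, psi0):
    """Distribution of n11 given margins n0, n1, e1 when the odds ratio is psi0.

    Support: max(0, e1 - n0) <= u <= min(n1, e1). Returns {u: probability}.
    """
    lo, hi = max(0, e1 - n0), min(n1, e1)
    weights = {u: comb(n1, u) * comb(n0, e1 - u) * psi0**u for u in range(lo, hi + 1)}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


def fisher_tail_pvalues(n00, n01, n10, n11, psi0=1.0):
    """One-sided p-values for H0: psi = psi0, using the layout of Table 1-1."""
    n0, n1, e1 = n00 + n01, n10 + n11, n01 + n11
    pmf = nc_hypergeom_pmf(n0, n1, e1, psi0)
    upper = sum(p for u, p in pmf.items() if u >= n11)  # alternative psi > psi0
    lower = sum(p for u, p in pmf.items() if u <= n11)  # alternative psi < psi0
    return upper, lower


# Hypothetical table: four controls and four cases, three exposed cases.
print(fisher_tail_pvalues(n00=3, n01=1, n10=1, n11=3))
```

Here the margins are $n_0 = n_1 = e_1 = 4$, so under $\psi_0=1$ the upper tail is $(16+1)/70 = 17/70$.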

Table 1-2. Frequency Table for Binary Exposure and Binary Disease for Stratum $k$, $k=1,\ldots,K$

Disease Status       Not exposed (z=0)   Exposed (z=1)   Total
No disease (D=0)     $n_{00k}$           $n_{01k}$       $n_{0k}$
Disease (D=1)        $n_{10k}$           $n_{11k}$       $n_{1k}$
Total                $e_{0k}$            $e_{1k}$        $N_k$

Fundamentally, this test is similar to Fisher's exact test in the sense that it conditions on the marginal totals. However, it differs in that asymptotic inference is used instead of exact inference. The Cochran-Mantel-Haenszel test is an appropriate test for case-control studies, since the marginal totals of individuals with and without the disease, namely $n_{0k}$ and $n_{1k}$, are fixed by design in case-control studies.

Mantel and Haenszel (1959) proposed a point estimator of the common odds ratio, which takes the form
$$\hat\psi_{MH} = \frac{\sum_{k=1}^{K} n_{11k}n_{00k}/N_k}{\sum_{k=1}^{K} n_{10k}n_{01k}/N_k}.$$
The advantage of this estimator is that it is not affected by zero cell entries in individual strata, and it is a consistent estimator of $\psi$ even for a large number of small strata. Mantel and Haenszel did not provide a variance formula for $\hat\psi_{MH}$. Robins, Breslow and Greenland (1986) and Phillips and Holland (1987) proposed variance estimators for the Mantel-Haenszel estimator, covering the two asymptotic situations: a small number of large strata and a large number of small strata.

The Cochran-Mantel-Haenszel test and the Mantel-Haenszel estimator are designed for a single binary exposure variable. However, they can be used for a single multicategory exposure variable or for multiple categorical variables by considering one factor at a time and treating the other factors as covariates for stratification. This approach does not, however, cover continuous exposure variables unless they are categorized.
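The two stratified formulas can be coded in a few lines. The sketch below (hypothetical function names and data) computes the continuity-corrected Cochran-Mantel-Haenszel statistic and refers it to a chi-square distribution with one degree of freedom, the usual reference for this test, using $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. It also computes $\hat\psi_{MH}$.

```python
import math


def cmh_statistic(tables):
    """Continuity-corrected Cochran-Mantel-Haenszel statistic.

    tables: iterable of (n00k, n01k, n10k, n11k), one tuple per stratum (Table 1-2).
    Returns (chi2, p_value) with a chi-square(1) reference distribution.
    """
    diff = var = 0.0
    for n00, n01, n10, n11 in tables:
        n0, n1 = n00 + n01, n10 + n11   # disease-status margins (fixed by design)
        e0, e1 = n00 + n10, n01 + n11   # exposure margins
        N = n0 + n1
        diff += n11 - e1 * n1 / N                    # n11k - E(n11k | margins)
        var += e0 * e1 * n1 * n0 / (N**2 * (N - 1))  # Var(n11k | margins)
    chi2 = (abs(diff) - 0.5) ** 2 / var
    # For one degree of freedom: P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def mantel_haenszel_or(tables):
    """Mantel-Haenszel estimate of the common odds ratio."""
    num = sum(n11 * n00 / (n00 + n01 + n10 + n11) for n00, n01, n10, n11 in tables)
    den = sum(n10 * n01 / (n00 + n01 + n10 + n11) for n00, n01, n10, n11 in tables)
    return num / den


# Two hypothetical strata, each with stratum-specific odds ratio 4.
strata = [(20, 10, 10, 20), (40, 10, 20, 20)]
print(mantel_haenszel_or(strata), cmh_statistic(strata))
```

Because both strata share the odds ratio 4, each numerator term equals 4 times its denominator term, so $\hat\psi_{MH}$ is exactly 4 (up to floating-point rounding).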

1.3 Logistic Regression and Case-Control Studies

More recent developments in case-control inference are primarily based on logistic regression. The logistic regression model has the form
$$P(D=1\mid z) = 1 - P(D=0\mid z) = \frac{\exp(\alpha+\beta^T z)}{1+\exp(\alpha+\beta^T z)}$$
for a binary response variable and an arbitrary exposure vector $z$. The parameters $\alpha$ and $\beta$ are regarded as an intercept and an odds ratio parameter, respectively, since $\beta^T(z-z_0)$ represents the log odds ratio of disease for exposure level $z$ versus $z_0$, owing to the following equality:
$$\frac{P(D=1\mid z)/P(D=0\mid z)}{P(D=1\mid z_0)/P(D=0\mid z_0)} = \exp\{\beta^T(z-z_0)\}.$$
On most occasions, $\beta$ is the parameter of interest.

Cornfield et al. (1961) presented a relationship between the normal distribution and the logistic regression model. If the exposure vector $z$, conditional on diseased ($D=1$) or undiseased ($D=0$) status, is normally distributed with different means but a common covariance matrix, then the probability of developing the disease for an individual with exposure level $z$ has the logistic form given above.

Generally, regression methods model the probability structure of disease status given any exposure value. Therefore, such models are appropriate for analyzing data obtained from prospective studies such as cohort studies, but in general are not appropriate for data obtained from retrospective studies such as case-control studies. However, Anderson (1972) and Prentice and Pyke (1979) showed that, even for case-control data, inference from the (prospective) logistic regression model is correct for all the odds ratio parameters, which are of main interest. I discuss this issue in detail in Section 1.4.
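A minimal numerical check of this claim for a binary disease and a scalar exposure (a sketch, not the Prentice-Pyke derivation itself): the code builds expected case-control counts $n_d\Pr(z\mid D=d)$ from a hypothetical population, fits an ordinary prospective logistic regression to them by weighted Newton-Raphson, and compares the estimates with the truth. The slope should be recovered exactly, while the intercept should come out shifted to $\alpha + \log(n_1/n_0) - \log(q_1/q_0)$, where $q_d$ is the population proportion with $D=d$. All parameter values and the helper names `expit` and `fit_logistic` are illustrative assumptions.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(zs, ds, ws, iters=50):
    """Weighted Newton-Raphson for logit P(D=1 | z) = a + b * z (scalar z)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, d, w in zip(zs, ds, ws):
            p = expit(a + b * z)
            r = w * (d - p)           # weighted score contribution
            g0 += r
            g1 += r * z
            v = w * p * (1.0 - p)     # weighted information contribution
            h00 += v
            h01 += v * z
            h11 += v * z * z
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # (a, b) += I^{-1} * score
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical population: exposure levels z with marginal g(z) and a logistic truth.
levels, g = [0.0, 1.0, 2.0], [0.5, 0.3, 0.2]
alpha, beta = -4.0, 0.7
p = [expit(alpha + beta * z) for z in levels]  # P(D=1 | z)
q1 = sum(pz * gz for pz, gz in zip(p, g))      # population P(D=1)
q0 = 1.0 - q1
n0, n1 = 300, 100                              # numbers of controls and cases

# Expected case-control data: n_d * P(z | D=d), with P(z | D=d) = P(d | z) g(z) / q_d.
zs, ds, ws = [], [], []
for z, pz, gz in zip(levels, p, g):
    zs += [z, z]
    ds += [1, 0]
    ws += [n1 * pz * gz / q1, n0 * (1.0 - pz) * gz / q0]

a_hat, b_hat = fit_logistic(zs, ds, ws)
delta1 = alpha + math.log(n1 / n0) - math.log(q1 / q0)  # shifted intercept
print(round(b_hat, 6), round(a_hat, 6), round(delta1, 6))
```

With genuinely sampled data the slope estimate is consistent rather than exact; the expected-count construction removes sampling noise so that the identity is visible.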

1.4 Equivalence of Inference for Odds Ratio Parameters Based on Prospective and Retrospective Likelihoods of Logistic Regression Model in Case-Control Studies

Prentice and Pyke (1979) showed that the logistic regression model can be constructed from the odds ratio model. A binary disease status is denoted by $D=0$ (undiseased) and $D=1$ (diseased), and an exposure vector is denoted by $z$, with a baseline category $z_0$. Then the odds ratio model
$$\frac{P(D=1\mid z)/P(D=0\mid z)}{P(D=1\mid z_0)/P(D=0\mid z_0)} = \exp\{\beta^T(z-z_0)\}$$
is equivalent to the logistic regression model
$$P(D=1\mid z) = \frac{\exp\{\alpha+\beta^T(z-z_0)\}}{1+\exp\{\alpha+\beta^T(z-z_0)\}},$$
setting $\alpha = \log\{P(D=1\mid z_0)/P(D=0\mid z_0)\}$.

They also showed that the following retrospective model can be constructed from the odds ratio model:
$$\Pr(z\mid D=1) = \Pr(z_0\mid D=1)\exp\left[\log\left\{\frac{\Pr(z\mid D=0)}{\Pr(z_0\mid D=0)}\right\} + \beta^T(z-z_0)\right].$$
Although this model has a logistic form, it does not correspond to a standard logistic regression model, because the intercept term $\log\{\Pr(z\mid D=0)/\Pr(z_0\mid D=0)\}$ does not change as $d$ changes from 0 to 1, and the normalizing constant $\Pr(z_0\mid D=1)$ is not the inverse of the sum of $\exp[\log\{\Pr(z\mid D=0)/\Pr(z_0\mid D=0)\} + \beta^T(z-z_0)]$ over $d$ ($d=0,1$).

Anderson (1972) and Prentice and Pyke (1979) proved, for discrete exposures and general exposures respectively, that inference for the odds ratio parameters from the prospective likelihood is equivalent to that from the retrospective likelihood for case-control data under the logistic regression model. I discuss the Prentice-Pyke (1979) approach next.

Suppose samples of $n_0\,(>0)$ controls ($d=0$) and $n_d\,(>0)$ cases with disease status $d$ ($d=1,\ldots,r$) are obtained from a case-control study. Let $z_{di}$ denote
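the exposure vector of the $i$th subject with disease status $d$ ($d=0,\ldots,r$). Then the retrospective likelihood is
$$L_R = \prod_{d=0}^{r}\prod_{i=1}^{n_d}\frac{P_{dz_{di}}\,g(z_{di})}{q_d},$$
where $P_{dz} = \exp(\alpha_d+\beta_d^T z)/\sum_{l=0}^{r}\exp(\alpha_l+\beta_l^T z)$ with $\alpha_0=0$, $\beta_0=0$; $g(z)$ is the marginal probability function of $z$ and is left unspecified; and $q_d = \int P_{dz}\,g(z)\,\mu(dz)$, where $\mu$ is some $\sigma$-finite measure. We can rewrite $L_R$ as
$$\prod_{d=0}^{r}\prod_{i=1}^{n_d}\left[\frac{(n_d n_0^{-1}/q_d q_0^{-1})P_{dz_{di}}}{\sum_{l=0}^{r}(n_l n_0^{-1}/q_l q_0^{-1})P_{lz_{di}}}\left\{\sum_{l=0}^{r}(n_l n^{-1}/q_l)P_{lz_{di}}\,g(z_{di})\right\}\Big/\frac{n_d}{n}\right].$$
Now we define
$$p_d(z) = \frac{\exp(\delta_d+\beta_d^T z)}{\sum_{l=0}^{r}\exp(\delta_l+\beta_l^T z)}\quad(d=0,\ldots,r),$$
where $\delta_d = \log(n_d n_0^{-1}/q_d q_0^{-1}) + \alpha_d$, $d=0,\ldots,r$, and
$$h(z) = \sum_{l=0}^{r}(n_l n^{-1}/q_l)P_{lz}\,g(z).$$
Note that $\int h(z)\,\mu(dz) = 1$. Then, apart from the constant factor $\prod_{d=0}^{r}(n/n_d)^{n_d}$, which does not depend on the parameters, the likelihood $L_R$ can be written as
$$L = \left\{\prod_{d=0}^{r}\prod_{i=1}^{n_d}p_d(z_{di})\right\}\left\{\prod_{d=0}^{r}\prod_{i=1}^{n_d}h(z_{di})\right\} = L_1L_2.$$
In the above, $p_d(z)$, or equivalently $(\delta_1,\ldots,\delta_r,\beta_1,\ldots,\beta_r)$, and $h(z)$ are restricted by
$$\int p_d(z)h(z)\,\mu(dz) = \frac{n_d}{n}\quad(d=0,\ldots,r),$$
and the MLEs $(\hat\delta_1,\ldots,\hat\delta_r,\hat\beta_1,\ldots,\hat\beta_r)$ and $\hat h(z)$ need to satisfy this relationship as well. Note also that $L_1$ has the form of a prospective likelihood under the logistic regression model.

Let $(\hat\delta_1,\ldots,\hat\delta_r,\hat\beta_1,\ldots,\hat\beta_r)$ denote the MLE from $L_1$, i.e., the solution of $\partial\log L_1/\partial\delta_m = 0$ ($m=1,\ldots,r$) and $\partial\log L_1/\partial\beta_i = 0$ ($i=1,\ldots,r$). Let $\hat h(z)$ denote a nonparametric MLE of $h(z)$ from $L_2$, which is a discrete density having mass concentrated at the observed values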
$z_1,\ldots,z_K$ with probabilities $\hat h_k = n_k/n$, $k=1,\ldots,K$. Then, clearly, $(\hat\delta_1,\ldots,\hat\delta_r,\hat\beta_1,\ldots,\hat\beta_r)$ and $\hat h(z)$ maximize $L$. Also, these estimates are related by
$$\int\hat p_m(z)\hat h(z)\,\mu(dz) = \sum_{k=1}^{K}\hat p_m(z_k)\hat h_k = \sum_{d=0}^{r}\sum_{i=1}^{n_d}\hat p_m(z_{di})/n = n_m/n\quad(m=0,1,\ldots,r),$$
where $\hat p_m(z) = \exp(\hat\delta_m+\hat\beta_m^T z)/\sum_{l=0}^{r}\exp(\hat\delta_l+\hat\beta_l^T z)$ ($m=0,\ldots,r$). The third equality follows from $\partial\log L_1/\partial\delta_m = 0$ ($m=1,\ldots,r$) at $(\hat\delta_1,\ldots,\hat\delta_r,\hat\beta_1,\ldots,\hat\beta_r)$, and for $m=0$ from $\sum_m\hat p_m(z)=1$. Hence $(\hat\beta_1,\ldots,\hat\beta_r)$ are the semiparametric maximum likelihood estimators of the odds ratio parameters $(\beta_1,\ldots,\beta_r)$.

Prentice and Pyke also proved that $(\hat\beta_1,\ldots,\hat\beta_r)$ is a consistent estimator of $(\beta_1,\ldots,\beta_r)$ by showing that the score functions derived from $L_1$ are unbiased under the retrospective model as well. They further proved that the appropriate component of the inverse of the observed Fisher information matrix provides a consistent estimator of the asymptotic variance-covariance matrix of $\hat\beta$. The equivalence of inference for odds ratio parameters based on prospective and retrospective likelihoods is thus established, since $L_1$ can be viewed as originating from a prospective likelihood. However, the intercept parameters of the logistic regression model cannot be estimated in this approach because of the lack of identifiability between the intercept parameters and the population proportions of the disease statuses. Hsieh et al. (1985) and Wild (1991) proposed methods to solve this inestimability problem using supplementary information. I review their methods in some detail in Section 1.6.

Weinberg and Wacholder (1993) generalized Anderson (1972) to a general multiplicative-intercept model. They showed that, in case-control studies with discrete exposure variables, the parameter estimators and their asymptotic variance-covariance matrix estimators based on the observed Fisher information matrix are correct, up to the multiplicative intercept terms, when the prospective likelihood is used. Scott and Wild (2001) extended this to the case with general exposure variables and binary
response variables. A brief explanation of the multiplicative intercept model is given in Section 1.6.

Seaman and Richardson (2004) presented an alternative proof, using a multinomial-Poisson transformation, of the equivalence of inference for the odds ratio parameters based on prospective and retrospective likelihoods under the logistic regression model. Furthermore, they extended this frequentist equivalence to a Bayesian framework. They showed that, for certain priors, the posterior distribution of the odds ratio parameter based on the prospective likelihood is the same as that based on the retrospective likelihood under the logistic regression model. They assumed binary response variables and discrete exposure vectors. As priors, they used a uniform prior for the intercept parameter and a Dirichlet prior for the probabilities that an undiseased individual has each exposure vector; an arbitrary proper prior was assigned to the odds ratio parameters. Ghosh, Zhang and Mukherjee (2006) extended this result to matched case-control studies, including missingness.

Kagan (2001) explored the link functions of the generalized linear model and showed that, for a binary response variable, the logit is the only link for which the prospective likelihood differs from the retrospective likelihood only by the intercept term. In his retrospective sampling scheme, a sample of size $n$ is drawn from the finite population $\{(D_1,z_1),\ldots,(D_N,z_N)\}$, with the selection probabilities of a case and a control given by $p_1$ and $p_0$, respectively. This sampling scheme is slightly different from the one typically used in a case-control study, because $\{n_i\}$, the sample size for $\{D=i\}$, $i=0,1$, is random under Kagan's framework, whereas $\{n_i\}$ is fixed under the usual case-control sampling scheme. Mukherjee and Liu (2009) extended Kagan's result to multicategory response variables under the same sampling scheme. They proved that the generalized multinomial logit is the only link for which the prospective likelihood differs from the retrospective likelihood only by intercept terms. They also
provided an approximate expression for the bias caused by fitting the prospective likelihood in case-control studies.

1.5 Equivalence of Inference for Odds Ratio Parameters Based on Prospective and Retrospective Likelihoods of Logistic Regression Model in Case-Control Studies with Measurement Error

Roeder, Carroll and Lindsay (1996) gave another proof, by a different route, of the equivalence of inference for odds ratio parameters based on the prospective and retrospective likelihoods in case-control studies for binary logistic regression models. They then extended it to case-control studies with errors in covariates. We now discuss their results. Suppose a likelihood can be represented as
$$L_C(\theta,\phi) = \frac{L_J(\theta,\phi)}{L_M(\theta,\phi)},$$
where $L_M(\theta,\phi)$ is of multinomial form, i.e., $L_M(\theta,\phi) = \prod_{k=0}^{K}\{p_k(\theta,\phi)\}^{n_k}$ with $n=\sum_k n_k$ and $\sum_k p_k(\theta,\phi)=1$. If there exists $\phi^*$ depending on $(\theta,\phi)$ satisfying (a) $L_C(\theta,\phi^*) = L_C(\theta,\phi)$ and (b) $p_k(\theta,\phi^*) = n_k/n$, then the following hold:

(i) $(\hat\theta_C,\hat\phi_C)$, the maximizer of $L_C(\theta,\phi)$ satisfying (b), maximizes $L_J(\theta,\phi)$;

(ii) $(\hat\theta_J,\hat\phi_J)$, any maximizer of $L_J(\theta,\phi)$, satisfies (b) and also maximizes $L_C(\theta,\phi)$.

Moreover, if for every $(\theta,\phi)$ in the parameter space there exists $\phi^*$ satisfying (a) and (b), then the profile likelihood for $\theta$ is the same for $L_C$ and $L_J$.

They noted that for binary logistic regression, the retrospective likelihood $L_C$ can be written as $L_J/L_M$, in which $L_J = \prod_{d=0}^{1}\prod_{i=1}^{n_d}P_{dz_{di}}\,g(z_{di})$ and $L_M = \prod_{d=0}^{1}q_d^{n_d}$, where $P_{1z} = 1-P_{0z} = \exp(\beta_0+\beta_1^T z)/\{1+\exp(\beta_0+\beta_1^T z)\}$, $g(z)$ is an arbitrary probability function, and $q_0+q_1=1$. They also observed that
$$\beta_0^* = \beta_0 + \log\{n_1(1-p)/(n_0 p)\},$$
where $p = \Pr(D=1\mid\beta_0,\beta_1,g)$, and
$$g^*(z) = \frac{\{1+\exp(\beta_0^*+\beta_1^T z)\}/\{1+\exp(\beta_0+\beta_1^T z)\}\,g(z)}{\sum_z\{1+\exp(\beta_0^*+\beta_1^T z)\}/\{1+\exp(\beta_0+\beta_1^T z)\}\,g(z)}$$
satisfy conditions (a) and (b). Hence the $\hat\beta_{1C}$ that maximizes the profile likelihood of $\beta_1$ from the retrospective likelihood $L_C$ can be obtained by maximizing the joint likelihood $L_J$, which amounts to maximizing the prospective logistic regression likelihood over $\beta_0$ and $\beta_1$.

Roeder, Carroll and Lindsay also considered the situation in which some of the covariates are measured with error. Their motivating example was a study of the effect of low-density lipoprotein (LDL) cholesterol ($Z$) on the probability of heart disease ($D$). The case-control data are divided into a group of size $n_C$ (complete data) and another group of size $n_R$ (reduced data). In the complete group, LDL level and total cholesterol ($W$) are measured for $n_0$ controls ($D=0$) and $n_1$ cases ($D=1$); in the reduced group, only total cholesterol is measured, for $m_0$ controls and $m_1$ cases, owing to the high cost of measuring LDL level relative to total cholesterol. The complete data set is $\{W_i,Z_i,D_i,\ i=1,\ldots,n_C\}$ and the reduced data set is $\{W_j,D_j,\ j=1,\ldots,n_R\}$.

They assumed that (i) the conditional probability function of $D$ given $Z$ is modeled as $f_{D|Z}(1\mid z;\beta_0,\beta_1) = 1-f_{D|Z}(0\mid z;\beta_0,\beta_1) = \exp(\beta_0+\beta_1^T z)/\{1+\exp(\beta_0+\beta_1^T z)\}$; (ii) the conditional probability function of $W$ given $(Z,D)$ is modeled parametrically as $f_{W|Z,D}(w\mid z,d;\delta)$; and (iii) the marginal probability function of $Z$, $g(z)$, is left unspecified. Then the retrospective likelihood can be written as
$$L_C = \prod_{j=0}^{1}\prod_{i=1}^{n_j}\frac{f(w_{ji}\mid d_{ji},z_{ji})f(d_{ji}\mid z_{ji})g(z_{ji})}{\int f(d_{ji}\mid z)g(z)\,\mu(dz)}\prod_{j=0}^{1}\prod_{k=1}^{m_j}\frac{\int f(w_{jk}\mid d_{jk},z)f(d_{jk}\mid z)g(z)\,\mu(dz)}{\int f(d_{jk}\mid z)g(z)\,\mu(dz)}.$$
Note that this $L_C$ can be re-expressed as $L_J/L_M$, where
$$L_J = \prod_{j=0}^{1}\prod_{i=1}^{n_j}f(w_{ji}\mid d_{ji},z_{ji})f(d_{ji}\mid z_{ji})g(z_{ji})\prod_{j=0}^{1}\prod_{k=1}^{m_j}\int f(w_{jk}\mid d_{jk},z)f(d_{jk}\mid z)g(z)\,\mu(dz)$$
and
$$L_M = \prod_{j=0}^{1}\left\{\int f(j\mid z)g(z)\,\mu(dz)\right\}^{n_j+m_j}.$$
Now, noting that $\beta_0^*$ and $g^*(z)$ given above satisfy conditions (a) and (b), it follows that $\hat\beta_1$, the maximum likelihood estimator of $\beta_1$ from the retrospective likelihood
$L_C$, can be obtained from the profile likelihood of $\beta_1$ based on the joint likelihood $L_J$. Roeder, Carroll and Lindsay stated that if $L_J$ satisfies the regularity conditions specified by Kiefer and Wolfowitz (1956), then $\hat\beta_1$ is consistent. However, they did not prove that $L_J$ satisfies Kiefer and Wolfowitz's regularity conditions. Later, Murphy and van der Vaart (2001) proved the consistency of $\hat\beta_1$ when $n_0=m_0$ and $n_1=m_1$. They also proved the asymptotic normality of $\hat\beta_1$. However, they did not provide an explicit expression for the asymptotic covariance of $\hat\beta_1$, owing to computational difficulty. Instead, they established the asymptotic chi-square distribution of the likelihood ratio statistic for testing $H_0:\beta_1=\beta_{10}$.

1.6 Case-Control Studies with Known Population Totals

It is known that prospective probabilities, in general, are not estimable from data obtained only from case-control studies (Cosslett, 1981, p. 1297; Hsieh et al., 1985, Chapter 2). For example, in the logistic regression case, the intercept term of the prospective logistic regression model and the marginal distribution of the response variable are unidentifiable, as stated in Section 1.4, and so the prospective probability cannot be estimated without supplementary information.

Hsieh et al. (1985) and Wild (1991) considered the situation in which the population totals of each response category are known in case-control studies. They presented methods to estimate all the parameters of the prospective probability $P(d\mid z)$, and thus the prospective probability itself. Examples of sources of known population totals include hospital records and general population statistics. The situation they considered is as follows. Suppose the population totals for the response categories ($d=0,\ldots,r$) are known to be $N_0,\ldots,N_r$, and samples of sizes $n_0,\ldots,n_r$ are obtained from a case-control study. Then the likelihood can be written as
$$\prod_{d=0}^{r}\left\{\prod_{i=1}^{n_d}\Pr(z_{di}\mid d)\right\}\{\Pr(D=d)\}^{N_d} = \prod_{d=0}^{r}\left\{\prod_{i=1}^{n_d}P_d(z_{di};\theta)\Pr(z_{di})\right\}\left\{\int P_d(z;\theta)\,dF(z)\right\}^{N_d-n_d},$$
where $\Pr(d\mid z,\theta) = P_d(z;\theta)$ and $F(z)$ is the cumulative distribution function of $z$ (Wild, 1991; Scott and Wild, 1997). Note that the above likelihood is identical to the likelihood
PAGE 22

from a two-stage study design in which the response category is recorded for all subjects at the first stage and covariates are measured only for subsamples at the second stage. This likelihood is hard to work with because of the integral terms $\int P_d(z;\theta)\,dF(z)$, which involve the unknown distribution function $F$.

Hsieh et al. (1985) and Wild (1991) presented relatively simple pseudo-likelihood approaches and compared them with the maximum likelihood estimator based on the likelihood above. One of these is the pseudo-conditional likelihood approach, which Hsieh et al. (1985) call conditional maximum likelihood. It produces consistent estimators of the parameters of the prospective model $P_d(z;\theta)$, and these estimators are essentially as efficient as the maximum likelihood estimator.

The pseudo-conditional likelihood is a modified version of the following likelihood:

$$\prod_{d=0}^{r}\prod_{i=1}^{n_d}\frac{(n_d n^{-1}/q_d)\,P_d(z_{di};\theta)}{\sum_{l=0}^{r}(n_l n^{-1}/q_l)\,P_l(z_{di};\theta)},$$

where $q_d = \int P_d(z;\theta)\,dF(z)$ and $n = \sum_{d=0}^{r} n_d$. This is the likelihood for a new prospective model

$$\Pr(D=d\mid z) = \frac{(n_d n^{-1}/q_d)\,P_d(z;\theta)}{\sum_{l=0}^{r}(n_l n^{-1}/q_l)\,P_l(z;\theta)}.$$

This model is the probability function of a response $d$ given a covariate $z$ when the joint distribution of $(z,d)$ is the product of $\Pr(D=d) = n_d/n$ and $\Pr(z\mid d) = P_d(z;\theta)\Pr(z)/q_d$, which is the retrospective model in our setting. Replacing $q_d$ with its natural estimator $N_d/N$, where $N = \sum_{d=0}^{r} N_d$, gives the pseudo-conditional likelihood

$$L_C(\theta) = \prod_{d=0}^{r}\prod_{i=1}^{n_d}\frac{\mu_d\,P_d(z_{di};\theta)}{\sum_{l=0}^{r}\mu_l\,P_l(z_{di};\theta)},$$

where $\mu_d = n_d/N_d$ $(d = 0,\ldots,r)$. The common factor $N/n$ cancels between the numerator and the denominator. The estimator $\hat{\theta}_C$ obtained by maximizing $L_C(\theta)$ is a consistent estimator of $\theta$. In general, however, it is not identical to the maximum likelihood estimator $\hat{\theta}$ from the full likelihood above.
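
Take a binary response with a logistic prospective model. Here $\mu_1P_1/(\mu_0P_0+\mu_1P_1)$ has logit $\operatorname{logit}P_1 + \log(\mu_1/\mu_0)$, so maximizing $L_C(\theta)$ reduces to an ordinary logistic fit with the known offset $\log(\mu_1/\mu_0)$. The sketch below assumes that model with a scalar covariate and uses a hand-written Newton-Raphson iteration. The helper names are hypothetical and are not taken from Hsieh et al. (1985) or Wild (1991).

```python
import math

# Illustrative sketch: binary logistic prospective model, scalar covariate z.
# Helper names are hypothetical, not from Hsieh et al. (1985) or Wild (1991).

def fit_logistic_offset(z, d, offset=0.0, max_iter=100, tol=1e-12):
    """Newton-Raphson MLE of (theta0, theta1) in
    logit Pr(D=1 | z) = offset + theta0 + theta1 * z."""
    n1 = sum(d)
    n0 = len(d) - n1
    # start at the intercept-only MLE, i.e. offset + theta0 = log(n1 / n0)
    t0, t1 = math.log(n1 / n0) - offset, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, di in zip(z, d):
            p = 1.0 / (1.0 + math.exp(-(offset + t0 + t1 * zi)))
            w = p * (1.0 - p)
            g0 += di - p              # score for theta0
            g1 += (di - p) * zi       # score for theta1
            h00 += w                  # observed information entries
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        step1 = (h00 * g1 - h01 * g0) / det
        t0, t1 = t0 + step0, t1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return t0, t1

def pseudo_conditional_fit(z, d, N0, N1):
    """Maximize L_C(theta) with mu_d = n_d / N_d from known population totals N0, N1."""
    n1 = sum(d)
    n0 = len(d) - n1
    mu0, mu1 = n0 / N0, n1 / N1
    # mu1 P1 / (mu0 P0 + mu1 P1) is logistic with the offset log(mu1 / mu0)
    return fit_logistic_offset(z, d, offset=math.log(mu1 / mu0))
```

In this logistic case, the known totals only shift the fitted intercept by $-\log(\mu_1/\mu_0)$. The slope is the same as in an unadjusted logistic fit.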

Hsieh et al. (1985) found that $\hat{\theta}_C$ coincides with the maximum likelihood estimator $\hat{\theta}$ for the multiplicative intercept model, which has the form

$$\Pr(d\mid z) = P_d(z;\theta) = \frac{\exp\{\alpha_d + Q_d(z,\beta)\}}{\sum_{l=0}^{r}\exp\{\alpha_l + Q_l(z,\beta)\}},\qquad \theta = (\alpha_1,\ldots,\alpha_r,\beta),$$

where $\alpha_0 = 0$ and $Q_0(z,\beta) = 0$ for all $z$. The logistic regression model belongs to this class, with $Q_d(z,\beta) = z^T\beta_d$. The stereotype model (Anderson, 1984) is another member, with $Q_d(z,\beta) = \phi_d z^T\beta$, where $\phi_0 = 0$ and the remaining scores $\phi_1,\ldots,\phi_r$ are subject to order constraints.

Scott and Wild (1997) presented an iterative method that gives, for any model in this situation, the maximum likelihood estimator $\hat{\theta}$ and an estimator of its asymptotic variance-covariance matrix. In their approach, the pseudo-conditional likelihood $L_C$ is fitted iteratively, and the values of $(\mu_1,\ldots,\mu_r)$ are updated at each step. $\hat{\theta}_C$ then converges to the maximum likelihood estimator $\hat{\theta}$. Moreover, a simple modification of the observed Fisher information matrix at the final iteration gives a consistent estimate of the asymptotic variance-covariance matrix of $\hat{\theta}$.

1.7 Case-Control Studies without Supplementary Information

Scott and Wild (2001) analyzed case-control data under an arbitrary model even when no supplementary information is available. They developed a maximum likelihood approach for a binary disease status. Suppose $n_0$ controls ($d=0$) and $n_1$ cases ($d=1$) are obtained from a case-control study, and an arbitrary parametric prospective model is assumed for the relationship between the exposure vector $z$ and the disease status $d$. The retrospective likelihood can then be written as

$$\prod_{d=0}^{1}\prod_{i=1}^{n_d}\Pr(z_{di}\mid d) = \prod_{d=0}^{1}\prod_{i=1}^{n_d}\frac{P_d(z_{di};\theta)\Pr(z_{di})}{q_d},$$

where $P_d(z;\theta)$ is any parametric prospective model and $q_d = \int P_d(z;\theta)\,dF(z)$, with $F(z)$ the cumulative distribution function of $z$. Scott and Wild presented a pseudo-likelihood approach which produces the maximum likelihood estimator of $\theta$ from the above
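retrospective likelihood. They showed that this maximum likelihood estimator is consistent for $\theta$ and asymptotically normally distributed. They also gave an analytical expression, based on the pseudo-likelihood, for its asymptotic variance-covariance matrix. Their pseudo-likelihood is derived as follows. Suppose the exposure $z$ has finite support, taking values $z_1,\ldots,z_J$ with probabilities $\delta_j$ $(j=1,\ldots,J)$, and let $\delta = (\delta_1,\ldots,\delta_J)$. The log of the retrospective likelihood above can then be written as

$$\ell(\theta,\delta) = \sum_{d=0}^{1}\sum_{j=1}^{J} n_{dj}\log P_d(z_j;\theta) + \sum_{j=1}^{J} n_{+j}\log\delta_j - \sum_{d=0}^{1} n_{d+}\log\left(\sum_{l=1}^{J} P_d(z_l;\theta)\,\delta_l\right),$$

where $n_{dj}$ is the number of subjects with disease status $d$ in exposure category $j$, $n_{+j} = \sum_{d=0}^{1} n_{dj}$, $n_{d+} = \sum_{j=1}^{J} n_{dj}$, and $\sum_{j=1}^{J}\delta_j = 1$. Write $\kappa = (n_1/q_1)/(n_0/q_0)$ for the ratio of the relative sampling rates. The profile log likelihood of $\theta$ can then be written as

$$\ell_p(\theta) = \ell(\theta,\kappa(\theta)),$$

where

$$\ell(\phi) = \sum_{d=0}^{1}\sum_{j=1}^{J} n_{dj}\log P_d(z_j;\phi),\qquad \phi = (\theta,\kappa),$$

with $\operatorname{logit} P_1(z;\phi) = \operatorname{logit} P_1(z;\theta) + \log\kappa$, or equivalently

$$P_1(z;\phi) = \frac{\kappa P_1(z;\theta)}{\kappa P_1(z;\theta) + P_0(z;\theta)},$$

and $\kappa(\theta)$ satisfies the equation

$$\frac{\partial\ell(\theta,\kappa)}{\partial\kappa} = 0.$$

It follows that the pseudo log likelihood $\ell(\phi)$ produces the same estimator as the profile log likelihood $\ell_p(\theta)$, and the resulting $\hat{\theta}$ is obtained by maximizing $\ell(\phi)$.

The pseudo log likelihood $\ell(\phi)$ has the form of a prospective log likelihood. This is because $P_d(z;\phi)$ can be regarded as a prospective model: $P_0(z;\phi) + P_1(z;\phi) = 1$ regardless of $z$ and $\phi$. In fact, $\ell(\phi)$ is identical to the log likelihood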
of $L_C$ in Section 1.6. However, $\kappa$ in $\ell(\phi)$ is estimated simultaneously with $\theta$ from the case-control data. By contrast, $\mu_1$ in $L_C$ is obtained from supplementary information before $\theta$ is estimated from the case-control data.

Later, Chen (2003) pointed out that the approach of Scott and Wild (2001) is less transparent because of a parameter identifiability problem. For example, consider a logistic regression model with $\operatorname{logit} P_1(z;\theta) = \theta_0 + z^T\theta_1$, $\theta = (\theta_0,\theta_1)$. The pseudo model then also has a logistic form, $\operatorname{logit} P_1(z;\phi) = \theta_0 + \log\kappa + z^T\theta_1$. Hence $\theta_0$ and $\kappa$ cannot be estimated separately. More generally, $\theta_0$ and $\kappa$ cannot be estimated separately for any multiplicative intercept model with $\operatorname{logit} P_1(z;\theta) = \theta_0 + Q(z,\theta_1)$.

Chen (2003) made two contributions. First, he examined which parameters are identifiable in case-control studies, and proved that a parameter is identified from case-control data if and only if it is identified from the odds ratio functions. Second, he suggested a baseline logit regression model of the form

$$\log\frac{P(D=d\mid z)}{P(D=0\mid z)} = \beta_d + \log\eta_{dz},\qquad \eta_{dz} = \frac{P(D=d\mid z)/P(D=0\mid z)}{P(D=d\mid 0)/P(D=0\mid 0)}.$$

The MLEs of the parameters in the $\eta$'s from this model are equivalent to those from the retrospective likelihood.

1.8 Topics of My Dissertation

The equivalence of inference based on prospective and retrospective likelihoods is well studied in the frequentist framework. Anderson (1972) and Prentice and Pyke (1979) showed that, for logistic regression models, the prospective and retrospective likelihoods give equivalent inference for the odds ratio parameter. Weinberg and Wacholder (1993) extended this to multiplicative intercept models. Kagan (2001) and Mukherjee and Liu (2009) considered binary and multi-category response variables, respectively. They proved that, under a certain retrospective sampling scheme, the logit is the only link for which the retrospective likelihood differs from the prospective likelihood only in the intercept term. Roeder, Carroll and Lindsay (1996) explored this equivalence property when covariates are measured with error.
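
The identifiability issue raised by Chen (2003) can be seen numerically. The sketch below assumes a logistic $P_1(z;\theta)$ with a scalar covariate; the function names are illustrative only. It shows three things. First, the Scott-Wild pseudo model shifts the logit by exactly $\log\kappa$. Second, $\theta_0$ and $\kappa$ enter only through $\theta_0 + \log\kappa$. Third, the normalized odds $\eta_{1z}$ of the baseline logit formulation depend on $\theta_1$ alone.

```python
import math

# Illustrative sketch; function names are hypothetical.

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def pseudo_p1(p1, kappa):
    """Pseudo model P1(z; phi) = kappa P1 / (kappa P1 + P0), with P0 = 1 - P1."""
    return kappa * p1 / (kappa * p1 + (1.0 - p1))

def logistic_pseudo_p1(z, theta0, theta1, kappa):
    """Pseudo model built on the logistic model logit P1(z; theta) = theta0 + theta1 z."""
    return pseudo_p1(expit(theta0 + theta1 * z), kappa)

def normalized_odds_1(p1_at_z, p1_at_0):
    """eta_{1z} = [P1(z) / P0(z)] / [P1(0) / P0(0)] for a binary response."""
    return (p1_at_z / (1.0 - p1_at_z)) / (p1_at_0 / (1.0 - p1_at_0))
```

Changing $(\theta_0,\kappa)$ to $(\theta_0 + \log 2 - \log 5,\, 5)$ from $(\theta_0, 2)$ leaves every fitted probability unchanged. This is why only the odds-ratio parameter $\theta_1$ is identified.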

In contrast, in the Bayesian framework, only a few studies have examined the equivalence of posterior inference based on prospective and retrospective likelihoods. Seaman and Richardson (2004) considered the binary logistic regression model with discrete exposure variables. They showed that, for this model, posterior inference for the odds ratio parameter based on the prospective likelihood is identical to that based on the retrospective likelihood. Ghosh, Zhang and Mukherjee (2006) extended this result to matched case-control studies involving missing data. In Chapter 2, I present a general Bayesian equivalence result that applies to any generalized linear model with an arbitrary link.

To prove this general Bayesian equivalence, I first express the prospective model in terms of normalized odds ($\eta$) and a normalizing constant ($\alpha$). I then show that, for a certain class of priors, posterior inference for the normalized odds $\eta$ based on the prospective likelihood is equivalent to that based on the retrospective likelihood. This class of priors is more general than that of Seaman and Richardson (2004).

If the model is a multiplicative intercept model, which includes the logistic regression model, then the odds ratio parameter can be regarded as a reparameterization of the normalized odds parameter $\eta$. Moreover, the priors on $\alpha$ and $\eta$ used in the result can be transformed into priors for the odds ratio parameter and the intercept parameter. These transformed priors include the priors of Seaman and Richardson (2004) as special cases.

In addition, I give a general Bayesian equivalence result for stratified case-control studies. As an application, I perform a Bayesian analysis of data from a colorectal cancer case-control study using the stereotype model.

Scott and Wild (2001) presented a pseudo-likelihood approach that produces maximum likelihood estimators in case-control studies for an arbitrary binary regression model $P_d(z;\theta)$ $(d=0,1)$. They showed that the maximum likelihood estimator of $\theta$ under this pseudo-likelihood is identical to the one under the retrospective likelihood. They also proved consistency and asymptotic normality of the maximum likelihood estimator of $\theta$. In addition, they showed that the asymptotic variance-covariance
matrix estimator for the estimators of $\theta$, obtained from the inverse of the observed Fisher information matrix based on the pseudo-likelihood, is consistent for the asymptotic variance-covariance matrix of the maximum likelihood estimator of $\theta$.

In Chapter 3, I extend their work to arbitrary regression models $P_d(z;\theta)$ $(d=0,\ldots,r)$ for a multi-category response variable. My approach, however, is closer to that of Prentice and Pyke (1979) than to that of Scott and Wild. I first consider an imaginary population with two properties. Its retrospective probability is the same as that of the true population. Its population proportion of each disease status is set equal to the sampling fraction (see, e.g., (15)-(18) in Prentice and Pyke (1979)). I then show that the prospective likelihood based on this imaginary population, which I will refer to as a pseudo-likelihood, produces the same maximum likelihood estimator of $\theta$ as the retrospective likelihood. In fact, this pseudo-likelihood can be constructed with any regression model $P_d(z;\theta)$ $(d=0,\ldots,r)$ and some latent parameters $(\rho_1,\ldots,\rho_r)$. I prove the consistency of the maximum likelihood estimator of $\theta$ using a theorem of Foutz (1977), and its asymptotic normality by applying the Lindeberg-Feller central limit theorem. As stated in the previous paragraph, I also show that a submatrix of the inverse of the observed Fisher information matrix based on the pseudo-likelihood is a consistent estimator of the asymptotic variance-covariance matrix of the maximum likelihood estimator of $\theta$.

Chen (2003) pointed out that parameters not involved in odds ratios are not identifiable in case-control studies. For example, the intercept parameters are not identified under the multiplicative intercept model, since these parameters are not involved in odds ratios. The continuation ratio model also has a non-identifiable parameter. I prove the same result as Chen's using a different approach, which is based on the pseudo model underlying the pseudo-likelihood. I also present a modified pseudo model to deal with the non-identifiability issue. The maximum likelihood estimator from the modified pseudo model is equivalent to that from the retrospective
model for the parameters in odds ratios, and it is consistent. The inverse of the observed Fisher information matrix based on the modified pseudo model provides a consistent estimator of the asymptotic covariance matrix of the MLE. The modified pseudo model coincides with the baseline logit model proposed by Chen (2003).

For the multiplicative intercept model, the modified pseudo model has the same form as the prospective model. It follows that the maximum likelihood estimators of the odds ratio parameters from the retrospective likelihood can be obtained from the prospective likelihood. These estimators are consistent for the odds ratio parameters and asymptotically normally distributed. Their asymptotic variance-covariance matrix can be estimated by the inverse of the observed Fisher information matrix based on the prospective likelihood, and this estimator is consistent. This result holds for an arbitrary exposure vector. Weinberg and Wacholder (1993) showed this for a discrete exposure variable. Scott and Wild (2001) showed it for general exposure variables, but only for binary response variables.

In Chapter 4, I explore the measurement error problem in case-control studies. Roeder, Carroll, and Lindsay (1996) studied the binary logistic regression case with measurement error in covariates. They examined whether inference for the odds ratio parameters based on the prospective and retrospective likelihoods is equivalent in that setting. They considered a situation in which two sets of data are collected. For one set (the reduced data), the surrogate variables ($W$) and the disease status ($D$) are recorded for each subject. For the other set (the complete data), the surrogate variables, the exposure variables ($Z$), and the disease status are recorded. They modeled the conditional probability of $D$ given $Z$ as a binary logistic regression, $f(d\mid z,\theta)$, and the conditional probability of $W$ given $Z$ and $D$ parametrically, as $f(w\mid z,d,\tau)$. However, they did not assume any parametric model for the marginal probability
function of $Z$, $m(z)$, and used a nonparametric approach to estimate it. Under this setting, they showed that the maximum likelihood estimator of the odds ratio parameter can be obtained from the joint likelihood. Murphy and van der Vaart (2001) showed that, under certain conditions and with case-control data, the MLE based on the joint likelihood is consistent and asymptotically normally distributed.

I extend Roeder, Carroll, and Lindsay (1996) to arbitrary regression models with multiple disease statuses. I present a pseudo joint likelihood approach that provides the MLE based on the retrospective likelihood for the parameters in odds ratios for general models. The pseudo joint likelihood is similar to the actual joint likelihood, except that the pseudo model of Chapter 3 is used as the conditional probability of $D$ given $Z$. I show that the MLE from this pseudo joint likelihood is equivalent to that from the retrospective likelihood for the parameters in odds ratios. I also show that the MLE is consistent under the conditions of Murphy and van der Vaart (2001). The asymptotic normality of the MLE is left as a topic for future research.

CHAPTER 2
ON THE EQUIVALENCE OF POSTERIOR INFERENCE BASED ON RETROSPECTIVE AND PROSPECTIVE LIKELIHOODS

The main objective of case-control studies is to measure the degree of association between a certain disease (for example, cancer) and one or more exposure variables under consideration (for example, smoking, family history, obesity, etc.). Statistical analysis of case-control data is primarily based on the exposure-disease, or equivalently the disease-exposure, odds ratio (Cornfield, 1951), which measures the degree of association between the disease and the exposure. Advances in modern medicine and clinical diagnosis now often allow a more precise characterization of disease subtypes. Capturing the risk heterogeneity across these subtypes requires a more general odds ratio function, or equivalently more general models for the probability of disease given exposure than logistic regression models.

Case-control studies are primarily retrospective in nature, since one traces the exposure history of an individual or a group of individuals conditional on the outcome category. As such, the usual prospective analysis suitable for cohort data is not obviously applicable in the case-control context. It is therefore important to know whether a prospective analysis of the disease-exposure odds ratio parameters produces answers equivalent to the corresponding retrospective analysis. This question matters because prospective models usually involve fewer parameters than the corresponding retrospective models. Retrospective modeling requires specifying the (potentially) high-dimensional joint distribution of exposures given the outcome category. Also, standard software is usually available for a prospective regression model, whereas retrospective models may require problem-specific code. Investigating the equivalence of the two approaches is thus an interesting and important statistical question.

With binary outcomes and the prospective logistic regression model, Anderson (1972) provided an equivalence result for inference on the odds ratio parameter,
restricted to discrete exposures. Continuing with this model, the classic paper of Prentice and Pyke (1979) provided a more general result, in which the exposure variables could be discrete, continuous, or a combination of both. In particular, Prentice and Pyke proved that the maximum likelihood estimator (MLE) of the odds ratio parameter based on a prospective model is equivalent to that based on a retrospective model. Moreover, this MLE is consistent even under the retrospective model. In addition, the observed Fisher information matrix for the odds ratio parameters based on the prospective model provides correct standard error estimates under the retrospective model. The work of Prentice and Pyke (1979) spurred further research in this general area; see, among others, Carroll, Gail and Lubin (1993), Roeder, Carroll and Lindsay (1996), Scott and Wild (1986, 1991, 1997), Wild (1991), Breslow and Cain (1988), Breslow and Chatterjee (1999), and Chatterjee (2004).

The equivalence of posteriors for the odds ratio parameter based on the prospective or the retrospective likelihood is a more recent question in the Bayesian context. It was first studied by Seaman and Richardson (2004), who worked with unmatched case-control studies under the logistic regression model. They presented a class of priors for which the posterior of the odds ratio parameters based on the prospective likelihood is equivalent to that based on the retrospective likelihood. Ghosh, Zhang and Mukherjee (2006) extended their work to matched case-control studies using the same class of priors as Seaman and Richardson (2004). Staicu (2010) presented a more general class of priors for the posterior equivalence of odds ratio parameters, which includes the class of Seaman and Richardson (2004). She also considered the case where the nuisance parameter in the prospective likelihood, the nuisance parameter in the retrospective likelihood, and the odds ratio parameter are mutually independent. In that case, she showed that the priors of Seaman and Richardson (2004) are the unique priors giving posterior equivalence for the odds ratio parameter. However, all these papers were restricted to the binary logistic model and discrete exposures.

The purpose of this chapter is to extend the above results in two directions. First, I accommodate multiple, possibly ordered, disease states. Second, I relax the restriction to logistic regression models by including models with the probit and skew-symmetric links. Another interesting case that I can handle is the proportional odds model, and in particular the cumulative logit model (Agresti, 1984, p. 322). However, the results are still limited to models with discrete exposure variables.

A special and important case of the general model structure I consider is the multiplicative intercept model of Weinberg and Wacholder (1993). A further special case is the stereotype model, first introduced by Anderson (1984) and subsequently studied by Greenland (1994), Kuss (2006) and Ahn et al. (2009, 2010), among others. A stereotype model lies between the polytomous logistic model and the proportional odds model, but under suitable order constraints on the parameters it can represent the ordinal characteristics of a response variable.

The remaining sections are organized as follows. Section 2.2 presents the general Bayesian equivalence result with a class of priors, together with conditions ensuring the propriety of the posteriors. Section 2.3 gives an equivalent formulation for the special case of the multiplicative intercept model. Section 2.4 extends the results of Sections 2.2 and 2.3 to stratified case-control studies. Section 2.5 presents an example of Bayesian analysis of a case-control study with multiple cancer stages under the stereotype model.

2.2 The Bayesian Equivalence Result

Consider the situation where there are $r+1$ disease categories $d = 0,1,\ldots,r$ and an exposure vector $X$ which takes the values $x_1,\ldots,x_K$. Let

$$P_{dk} = P(D=d\mid X=x_k),\qquad d=0,1,\ldots,r,\ k=1,\ldots,K.$$

Also let

$$P(X=x_k\mid D=0) = \frac{\gamma_k}{\sum_{l=1}^{K}\gamma_l},\qquad k=1,\ldots,K.$$

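Then, using a technique of Satten and Kupper (1993), it is easy to show that

$$P(X=x_k\mid D=d) = \frac{\gamma_k P_{dk}P_{0k}^{-1}}{\sum_{l=1}^{K}\gamma_l P_{dl}P_{0l}^{-1}},\qquad k=1,\ldots,K.$$

I denote by $Z_{dk}$ the number of individuals with $D=d$ and $X=x_k$. From the definition of $P_{dk}$, the prospective likelihood is

$$L_P = \prod_{d=0}^{r}\prod_{k=1}^{K}P(D=d\mid X=x_k)^{Z_{dk}} = \prod_{d=0}^{r}\prod_{k=1}^{K}P_{dk}^{Z_{dk}},$$

while from the two displays for $P(X=x_k\mid D=0)$ and $P(X=x_k\mid D=d)$, the retrospective likelihood is

$$L_R = \prod_{k=1}^{K}\prod_{d=0}^{r}P(X=x_k\mid D=d)^{Z_{dk}} = \prod_{k=1}^{K}\left[\left(\frac{\gamma_k}{\sum_{l=1}^{K}\gamma_l}\right)^{Z_{0k}}\prod_{d=1}^{r}\left(\frac{\gamma_k P_{dk}P_{0k}^{-1}}{\sum_{l=1}^{K}\gamma_l P_{dl}P_{0l}^{-1}}\right)^{Z_{dk}}\right].$$

I write $\alpha_d = \sum_{l=1}^{K}P_{dl}/P_{0l}$ and $\eta_{dk} = \alpha_d^{-1}(P_{dk}/P_{0k})$, $k=1,\ldots,K$. Note that the $\eta_{dk}$ are normalized odds and $\sum_{l=1}^{K}\eta_{dl} = 1$.

The $\eta_{dk}$ are the parameters of interest, since the odds ratios $(P_{dk}/P_{0k})/(P_{dl}/P_{0l})$ can equivalently be expressed as $\eta_{dk}/\eta_{dl}$. For example, consider the multiplicative intercept model $P_{dk} = \exp\{\rho_d + Q(x_k,\beta_d)\}/\sum_{j=0}^{r}\exp\{\rho_j + Q(x_k,\beta_j)\}$, where $Q$ can be any function of the exposure variable and the parameters. For this model, one gets the odds ratios $\eta_{dk}/\eta_{dl} = \exp[Q(x_k,\beta_d) - Q(x_l,\beta_d)]$, which do not depend on the intercept parameters $\rho_d$.

My objective is to find a class of priors for which the posterior inference for the $\eta_{dk}$ is the same under $L_P$ and under $L_R$. To this end, I first rewrite $L_P$ and $L_R$, respectively, as

$$L_P = \left\{\prod_{d=1}^{r}\prod_{k=1}^{K}(\alpha_d\eta_{dk})^{Z_{dk}}\right\}\prod_{k=1}^{K}\left(1+\sum_{d=1}^{r}\alpha_d\eta_{dk}\right)^{-\sum_{d=0}^{r}Z_{dk}}$$

and

$$L_R = \prod_{k=1}^{K}\left\{\left(\frac{\gamma_k}{\sum_{l=1}^{K}\gamma_l}\right)^{Z_{0k}}\prod_{d=1}^{r}\left(\frac{\gamma_k\eta_{dk}}{\sum_{l=1}^{K}\gamma_l\eta_{dl}}\right)^{Z_{dk}}\right\},$$

since $P_{dk}/P_{0k} = \alpha_d\eta_{dk}$ and $\sum_{l=1}^{K}\eta_{dl} = 1$.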
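
The reparameterization and the two rewritten likelihoods can be checked numerically. The sketch below assumes a stereotype-form multiplicative intercept model, $Q(x_k,\beta_d) = \phi_d x_k^T\beta$. The parameter values, exposure levels, $\gamma$ and counts are hypothetical, and the function names are illustrative. The checks are:

- each row of $\eta$ sums to one;
- $\eta$ is unchanged when the intercepts $\rho_d$ change;
- $\eta_{dk}/\eta_{dl} = \exp[Q(x_k,\beta_d) - Q(x_l,\beta_d)]$;
- the $(\alpha,\eta)$ form of $L_P$ agrees with the original $L_P$;
- the $(\gamma,\eta)$ form of $L_R$ agrees with $L_R$ computed from the Satten-Kupper probabilities.

```python
import math

# Illustrative sketch; model form, parameter values and names are hypothetical.

def mi_probs(rho, beta, phi, xs):
    """P[d][k] = P(D=d | X=x_k) for a multiplicative intercept model with
    Q(x, beta_d) = phi[d] * x'beta (stereotype form), rho[0] = phi[0] = 0."""
    P = [[0.0] * len(xs) for _ in rho]
    for k, x in enumerate(xs):
        lin = sum(xi * bi for xi, bi in zip(x, beta))
        w = [math.exp(r + f * lin) for r, f in zip(rho, phi)]
        for d in range(len(rho)):
            P[d][k] = w[d] / sum(w)
    return P

def normalized_odds(P):
    """alpha_d = sum_l P_dl / P_0l and eta_dk = alpha_d^{-1} P_dk / P_0k, d = 1..r."""
    alpha, eta = [], []
    for d in range(1, len(P)):
        odds = [P[d][k] / P[0][k] for k in range(len(P[0]))]
        alpha.append(sum(odds))
        eta.append([o / alpha[-1] for o in odds])
    return alpha, eta

def satten_kupper(gamma, P):
    """P(X=x_k | D=d) = gamma_k P_dk P_0k^{-1} / sum_l gamma_l P_dl P_0l^{-1}."""
    out = []
    for d in range(len(P)):
        w = [g * P[d][k] / P[0][k] for k, g in enumerate(gamma)]
        out.append([v / sum(w) for v in w])
    return out

def log_lik(probs, Z):
    """sum_{d,k} Z_dk log p_dk for a table of probabilities indexed [d][k]."""
    return sum(z * math.log(p) for zr, pr in zip(Z, probs) for z, p in zip(zr, pr))

def log_lp_reparam(alpha, eta, Z):
    """log L_P written in terms of (alpha, eta)."""
    total = 0.0
    for k in range(len(Z[0])):
        denom = 1.0 + sum(a * e[k] for a, e in zip(alpha, eta))
        total += sum(Z[d][k] * math.log(alpha[d - 1] * eta[d - 1][k])
                     for d in range(1, len(Z)))
        total -= sum(Z[d][k] for d in range(len(Z))) * math.log(denom)
    return total

def log_lr_reparam(gamma, eta, Z):
    """log L_R written in terms of (gamma, eta); alpha_d has cancelled."""
    g_sum = sum(gamma)
    total = sum(Z[0][k] * math.log(g / g_sum) for k, g in enumerate(gamma))
    for d in range(1, len(Z)):
        s = sum(g * e for g, e in zip(gamma, eta[d - 1]))
        total += sum(Z[d][k] * math.log(gamma[k] * eta[d - 1][k] / s)
                     for k in range(len(gamma)))
    return total
```

Because $\alpha_d$ cancels from the $(\gamma,\eta)$ form, $L_R$ depends on the prospective model only through $\eta$. This is what the theorem below exploits.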

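I will also write $\alpha = (\alpha_1,\ldots,\alpha_r)$, $\gamma = (\gamma_1,\ldots,\gamma_K)$, $\eta = (\eta_{11},\ldots,\eta_{1K},\ldots,\eta_{r1},\ldots,\eta_{rK})$ and $Z = (Z_{01},\ldots,Z_{0K},\ldots,Z_{r1},\ldots,Z_{rK})$. The main theorem of this section is as follows.

Theorem 2.1. Consider the prior $\pi(\alpha,\gamma,\eta) \propto \left(\prod_{d=1}^{r}\alpha_d^{-1}\right)\left(\prod_{k=1}^{K}\gamma_k^{-1}\right)\pi_0(\eta)$, where $\pi_0$ is an arbitrary but proper prior. Then the posterior of $\eta$ is the same under $L_P$ and under $L_R$.

Proof. Following Seaman and Richardson (2004) or Ghosh et al. (2006), I begin with the augmented model $Z_{dk}\mid\lambda_{dk} \overset{\text{ind}}{\sim} \text{Poisson}(\lambda_{dk})$, where

$$\log\lambda_{dk} = \log\alpha_d + \log\gamma_k + \log\eta_{dk},\quad d=1,\ldots,r,\ k=1,\ldots,K;\qquad \log\lambda_{0k} = \log\gamma_k,\quad k=1,\ldots,K.$$

Then the augmented likelihood is

$$L_A \propto \exp\left\{-\sum_{k=1}^{K}\gamma_k\left(1+\sum_{d=1}^{r}\alpha_d\eta_{dk}\right)\right\}\prod_{k=1}^{K}\gamma_k^{\sum_{d=0}^{r}Z_{dk}}\prod_{d=1}^{r}\alpha_d^{\sum_{k=1}^{K}Z_{dk}}\prod_{d=1}^{r}\prod_{k=1}^{K}\eta_{dk}^{Z_{dk}}.$$

The posterior based on $L_A$ and the prior $\pi(\alpha,\gamma,\eta)$ is

$$\pi(\alpha,\gamma,\eta\mid Z) \propto \exp\left\{-\sum_{k=1}^{K}\gamma_k\left(1+\sum_{d=1}^{r}\alpha_d\eta_{dk}\right)\right\}\prod_{k=1}^{K}\gamma_k^{\sum_{d=0}^{r}Z_{dk}-1}\prod_{d=1}^{r}\alpha_d^{\sum_{k=1}^{K}Z_{dk}-1}\left(\prod_{d=1}^{r}\prod_{k=1}^{K}\eta_{dk}^{Z_{dk}}\right)\pi_0(\eta).$$

Integrating out $\gamma$, using $\int_0^\infty \gamma^{c-1}e^{-b\gamma}\,d\gamma = \Gamma(c)\,b^{-c}$ for each $\gamma_k$, one gets

$$\pi(\eta,\alpha\mid Z) \propto \prod_{d=1}^{r}\prod_{k=1}^{K}(\alpha_d\eta_{dk})^{Z_{dk}}\prod_{k=1}^{K}\left(1+\sum_{d=1}^{r}\alpha_d\eta_{dk}\right)^{-\sum_{d=0}^{r}Z_{dk}}\left(\prod_{d=1}^{r}\alpha_d^{-1}\right)\pi_0(\eta) = L_P\left(\prod_{d=1}^{r}\alpha_d^{-1}\right)\pi_0(\eta),$$

by the rewritten form of $L_P$ above. If instead $\alpha$ is integrated out of $\pi(\alpha,\gamma,\eta\mid Z)$, it follows that

$$\pi(\gamma,\eta\mid Z) \propto \exp\left(-\sum_{k=1}^{K}\gamma_k\right)\prod_{k=1}^{K}\gamma_k^{\sum_{d=0}^{r}Z_{dk}}\prod_{d=1}^{r}\prod_{k=1}^{K}\eta_{dk}^{Z_{dk}}\prod_{k=1}^{K}\gamma_k^{-1}\prod_{d=1}^{r}\left(\sum_{k=1}^{K}\gamma_k\eta_{dk}\right)^{-\sum_{k=1}^{K}Z_{dk}}\pi_0(\eta).$$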

I now use the transformation $\gamma_k = \psi q_k$ $(k=1,\ldots,K-1)$, so that $\gamma_K = \psi\left(1-\sum_{k=1}^{K-1}q_k\right)$. Let $q = (q_1,\ldots,q_{K-1})$. The Jacobian of the transformation is then $\psi^{K-1}$. From $\pi(\gamma,\eta\mid Z)$, the joint posterior of $(\psi,q,\eta)$ is

$$\pi(\psi,q,\eta\mid Z) \propto \exp(-\psi)\,\psi^{\sum_{k=1}^{K}Z_{0k}-1}\left(\prod_{d=1}^{r}\prod_{k=1}^{K}\eta_{dk}^{Z_{dk}}\right)\prod_{k=1}^{K}q_k^{Z_{0k}-1}\prod_{d=1}^{r}\left\{\left(\prod_{k=1}^{K}q_k^{Z_{dk}}\right)\left(\sum_{k=1}^{K}q_k\eta_{dk}\right)^{-\sum_{k=1}^{K}Z_{dk}}\right\}\pi_0(\eta),$$

where $q_K = 1-\sum_{k=1}^{K-1}q_k$. Integrating out $\psi$ then gives the joint posterior of $q$ and $\eta$:

$$\pi(q,\eta\mid Z) \propto \left(\prod_{k=1}^{K}q_k^{Z_{0k}}\right)\prod_{d=1}^{r}\left\{\prod_{k=1}^{K}\left(\frac{q_k\eta_{dk}}{\sum_{l=1}^{K}q_l\eta_{dl}}\right)^{Z_{dk}}\right\}\left(\prod_{k=1}^{K}q_k^{-1}\right)\pi_0(\eta) = L_R\left(\prod_{k=1}^{K}q_k^{-1}\right)\pi_0(\eta).$$

The last equality follows by writing $\gamma_k = \psi q_k$ $(k=1,\ldots,K)$ in the rewritten form of $L_R$.

Both posteriors $\pi(\eta,\alpha\mid Z)$ and $\pi(q,\eta\mid Z)$ are derived from the same augmented model. Hence the posterior of $\eta$ is the same whether it is obtained from the prospective likelihood $L_P$ or from the retrospective likelihood $L_R$, which proves the theorem.

Remark 2.1. The above result generalizes Seaman and Richardson (2004), who considered a binary disease status ($d = 0,1$) and the logit link for $P_{dk}$.

Remark 2.2. The propriety of the posterior is easy to check from $\pi(q,\eta\mid Z)$. Suppose $\pi_0(\eta)$ is proper. Then $\int\pi(q,\eta\mid Z)\,dq\,d\eta$ is finite whenever $Z_{0k}\ge 1$ $(k=1,\ldots,K)$, i.e., whenever every exposure category contains at least one control. The reason is as follows. Each ratio $q_k\eta_{dk}/\sum_l q_l\eta_{dl}$ is at most 1, so the first expression on the right-hand side for $\pi(q,\eta\mid Z)$ is bounded above by $\prod_{k=1}^{K}q_k^{Z_{0k}-1}\pi_0(\eta)$. This bound clearly has a finite integral.
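
Theorem 2.1 can be checked numerically in the simplest case $r = 1$, $K = 2$. Since $\pi_0(\eta)$ multiplies both marginal posteriors, it suffices to compare two $\eta$-dependent integrals, $\int L_P\,\alpha^{-1}\,d\alpha$ and $\int L_R\,(q_1q_2)^{-1}\,dq_1$, at several values of $\eta_{11}$. The proof integrates in two orders: $\gamma$ first, or $\alpha$ and then $\psi$. Carrying constants through both orders shows that the ratio of the two integrals should be the $\eta$-free constant $\Gamma(\sum_k Z_{0k})\,\Gamma(\sum_k Z_{1k})/\prod_k\Gamma(Z_{0k}+Z_{1k})$. The sketch below uses hypothetical counts with every $Z_{0k}, Z_{1k}\ge 1$. It relies on plain trapezoidal and Simpson rules from the standard library rather than any particular numerical package, and its function names are illustrative.

```python
import math

# Illustrative numerical check of Theorem 2.1 for r = 1, K = 2 (hypothetical counts).

def prospective_marginal(eta1, Z0, Z1, n=8000, u_max=40.0):
    """Integral of L_P(alpha, eta) * alpha^{-1} over alpha > 0, eta = (eta1, 1 - eta1).
    With u = log(alpha), alpha^{-1} d(alpha) = du; trapezoidal rule on [-u_max, u_max]."""
    eta = (eta1, 1.0 - eta1)
    h = 2.0 * u_max / n
    total = 0.0
    for i in range(n + 1):
        alpha = math.exp(-u_max + i * h)
        # log L_P = sum_k [Z1k log(alpha eta_k) - (Z0k + Z1k) log(1 + alpha eta_k)]
        log_f = sum(Z1[k] * math.log(alpha * eta[k])
                    - (Z0[k] + Z1[k]) * math.log1p(alpha * eta[k]) for k in range(2))
        total += (0.5 if i in (0, n) else 1.0) * math.exp(log_f)
    return total * h

def retrospective_marginal(eta1, Z0, Z1, n=8000):
    """Integral of L_R(q, eta) * (q1 q2)^{-1} over q1 in (0, 1), q2 = 1 - q1, by Simpson's rule.
    Assumes every Z0[k] >= 1 and Z1[k] >= 1, so the integrand vanishes at both ends."""
    eta = (eta1, 1.0 - eta1)

    def f(q1):
        if q1 <= 0.0 or q1 >= 1.0:
            return 0.0
        q = (q1, 1.0 - q1)
        s = q[0] * eta[0] + q[1] * eta[1]
        return math.exp(sum((Z0[k] - 1) * math.log(q[k])
                            + Z1[k] * math.log(q[k] * eta[k] / s) for k in range(2)))

    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def predicted_ratio(Z0, Z1):
    """Gamma(sum Z0) Gamma(sum Z1) / prod_k Gamma(Z0k + Z1k)."""
    num = math.gamma(sum(Z0)) * math.gamma(sum(Z1))
    den = math.prod(math.gamma(a + b) for a, b in zip(Z0, Z1))
    return num / den
```

Up to quadrature error, the ratio of the two integrals should match the predicted constant at every value of $\eta_{11}$ tried. That is what equality of the two posteriors of $\eta$ requires.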

2.3 Multiplicative Intercept Model

Multiplicative intercept models are widely used for the analysis of categorical data, and in particular for the analysis of case-control data. For such a model,

$$P_{dk} = P(D=d\mid X=x_k) = \frac{\exp\{\rho_d + Q(x_k,\beta_d)\}}{\sum_{m=0}^{r}\exp\{\rho_m + Q(x_k,\beta_m)\}}.$$

I assume without loss of generality that $\rho_0 = 0$ and $Q(x_k,\beta_0) = 0$ for all $k = 1,\ldots,K$. Hence $\alpha_d = \sum_{l=1}^{K}P_{dl}/P_{0l} = \sum_{l=1}^{K}\exp\{\rho_d + Q(x_l,\beta_d)\}$, so that

$$\log\alpha_d = \rho_d + \log\left[\sum_{l=1}^{K}\exp\{Q(x_l,\beta_d)\}\right]$$

and

$$\eta_{dk} = \alpha_d^{-1}(P_{dk}/P_{0k}) = \frac{\exp\{Q(x_k,\beta_d)\}}{\sum_{l=1}^{K}\exp\{Q(x_l,\beta_d)\}},$$

which depends only on $\beta_d$ and not on the intercept $\rho_d$. Now assign independent uniform$(-\infty,\infty)$ priors to the $\rho_d$ $(d=1,\ldots,r)$ and proper priors to the $\beta_d$ $(d=0,\ldots,r)$. For fixed $\beta_d$, $\log\alpha_d$ is a translation of $\rho_d$. These priors therefore induce independent uniform$(-\infty,\infty)$ priors on the $\log\alpha_d$ $(d=1,\ldots,r)$, that is, the prior $\prod_{d=1}^{r}\alpha_d^{-1}$ for $\alpha$, together with a proper prior for $\eta$. The posterior equivalence result of Section 2.2 therefore holds under this alternative formulation as well.

A special case of this model is the general multinomial logistic model, where $Q(x_k,\beta_d) = x_k^T\beta_d$. The stereotype model (Anderson, 1984) is another special case, with $Q(x_k,\beta_d) = \phi_d x_k^T\beta$, where $\phi_0 = 0$ and the remaining scores $\phi_1,\ldots,\phi_r$ are subject to order constraints. It is one of the simplest models for ordinal data and includes the adjacent category model (Agresti, 1984, p. 318), where $\phi_d = (r-d)/r$.

2.4 Posterior Equivalence for Stratified Case-Control Data

In this section, I extend the results of Section 2.2 to stratified case-control data. Stratification is often needed in case-control studies to eliminate potential confounding effects. For instance, it is often necessary to stratify case-control data by gender, age, race and ethnicity. An example of such stratified data appears in
the next section. For Bayesian analysis of matched case-control data, see Rice (2004, 2008), Diggle et al. (2000), and Ghosh and Chen (2002).

Ghosh et al. (2006) established this posterior equivalence for binary case-control data under a logistic regression model. Once again, I extend their results in two ways. First, I consider multiple and possibly ordered disease statuses. Second, I allow an arbitrary link, not just the logit link.

To this end, I consider $T$ strata $1,\ldots,T$ and denote the stratum indicator by $S$. As in Section 2.2, there are $r+1$ disease categories $d = 0,1,\ldots,r$ and $K$ exposure categories $x_1,\ldots,x_K$. Let

$$P_{sdk} = P(D=d\mid S=s,X=x_k),\qquad s=1,\ldots,T;\ d=0,1,\ldots,r;\ k=1,\ldots,K.$$

Also, let

$$P(X=x_k\mid D=0,S=s) = \frac{\gamma_{sk}}{\sum_{l=1}^{K}\gamma_{sl}},\qquad s=1,\ldots,T.$$

By the Satten-Kupper (1993) technique once again,

$$P(X=x_k\mid S=s,D=d) = \frac{\gamma_{sk}P_{sdk}P_{s0k}^{-1}}{\sum_{l=1}^{K}\gamma_{sl}P_{sdl}P_{s0l}^{-1}}.$$

Let $Z_{sdk}$ now denote the number of individuals with $D=d$ and $X=x_k$ in stratum $s$. The prospective and retrospective likelihoods are then, respectively,

$$L_P = \prod_{s=1}^{T}\prod_{d=0}^{r}\prod_{k=1}^{K}P_{sdk}^{Z_{sdk}}$$

and

$$L_R = \left\{\prod_{s=1}^{T}\prod_{k=1}^{K}\left(\frac{\gamma_{sk}}{\sum_{l=1}^{K}\gamma_{sl}}\right)^{Z_{s0k}}\right\}\left[\prod_{s=1}^{T}\prod_{d=1}^{r}\prod_{k=1}^{K}\left(\frac{\gamma_{sk}P_{sdk}P_{s0k}^{-1}}{\sum_{l=1}^{K}\gamma_{sl}P_{sdl}P_{s0l}^{-1}}\right)^{Z_{sdk}}\right].$$

As in Section 2.2, consider the reparameterization $\alpha_{sd} = \sum_{l=1}^{K}P_{sdl}/P_{s0l}$ and $\eta_{sdk} = \alpha_{sd}^{-1}(P_{sdk}/P_{s0k})$, for $s=1,\ldots,T$, $d=1,\ldots,r$ and $k=1,\ldots,K$. Write $\alpha = (\alpha_{11},\ldots,\alpha_{1r},\ldots,\alpha_{T1},\ldots,\alpha_{Tr})$, $\gamma = (\gamma_{11},\ldots,\gamma_{1K},\ldots,\gamma_{T1},\ldots,\gamma_{TK})$, and $\eta =$
$(\eta_{111},\ldots,\eta_{TrK})$. Consider the prior

$$\pi(\alpha,\gamma,\eta) \propto \left(\prod_{s=1}^{T}\prod_{d=1}^{r}\alpha_{sd}^{-1}\right)\left(\prod_{s=1}^{T}\prod_{k=1}^{K}\gamma_{sk}^{-1}\right)\pi_0(\eta),$$

where $\pi_0$ is a proper prior. Then the posterior of $\eta$ is the same whether it is generated from the reparameterized retrospective likelihood or from the reparameterized prospective likelihood. The proof is similar to the one in Section 2.2.

For the multiplicative intercept model of Section 2.3, $P_{sdk} = \exp\{\rho_{sd} + Q(x_k,\beta_d)\}/\sum_{m=0}^{r}\exp\{\rho_{sm} + Q(x_k,\beta_m)\}$. Assume, without loss of generality, that $\rho_{s0} = 0$ for all $s = 1,\ldots,T$ and $Q(x_k,\beta_0) = 0$ for all $k = 1,\ldots,K$. Then

$$\log\alpha_{sd} = \rho_{sd} + \log\sum_{l=1}^{K}\exp\{Q(x_l,\beta_d)\}$$

and

$$\eta_{sdk} \equiv \eta_{dk} = \frac{\exp\{Q(x_k,\beta_d)\}}{\sum_{l=1}^{K}\exp\{Q(x_l,\beta_d)\}}.$$

Assume independent uniform$(-\infty,\infty)$ priors for the $\rho_{sd}$ and proper priors for the $\beta_d$ $(d=0,\ldots,r)$. These give the proposed priors for $\alpha_{sd}$ and $\eta$, so the posterior equivalence for $\eta$ in this section holds under this alternative prior formulation, as in Section 2.3. I use this formulation in Section 2.5 for the data analysis with certain stereotype models.

2.5 Example: Analysis of Colorectal Cancer Data

The data come from a population-based case-control study of colorectal cancer in Israel, conducted from 1998 to 2004. The Molecular Epidemiology of Colorectal Cancer (MECC) Study is a rich resource of molecular, environmental, dietary and behavioral risk factors, along with basic personal and demographic information (Poynter et al., 2005). I now apply the results of the previous section to a subset of the data with 1,066 cases and 1,337 controls, in which stage information is recorded for the cases. Cancer staging is based on the Tumor, Nodes and Metastasis (TNM) criteria. Among the cases, 665 are in stages 1 and 2 ($d=1$) and 401 are in stages 3 and 4 ($d=2$). The number of disease categories is thus 3, with $d=0$ standing for all 1,337 controls. The objective is to explore the association of colorectal cancer stage with the frequency of grilled red meat intake and with sports participation. Ethnicity may also be associated with colorectal cancer and was treated as a stratification variable. I restricted my analysis to two ethnic groups,
Ashkenazi and Sephardi, which constitute stratum 1 ($s=1$) with 1,799 observations and stratum 2 ($s=2$) with 604 observations, respectively. The exposure vector consists of two variables. FGRMI is coded as 1 if the subject has eaten at least one portion of grilled red meat per week on average, and 0 otherwise. SP is a binary variable whose value is 1 if the subject has participated in sports or other physical activities, and 0 otherwise. The four possible values of the exposure vector with these two dichotomous variables are therefore $(0,0)$, $(0,1)$, $(1,0)$ and $(1,1)$, which I denote by $x_1$, $x_2$, $x_3$ and $x_4$.

Stereotype models are useful here because disease status is ordered. I use $P_{sdk} = \exp(\rho_{sd} + \phi_d x_k^T\beta)/\sum_{m=0}^{2}\exp(\rho_{sm} + \phi_m x_k^T\beta)$, with $\beta = (\beta_1,\beta_2)^T$, $\rho_{10} = \rho_{20} = 0$, $0 = \phi_0$

mean and the covariance matrix in prior 1 is used as a prior covariance matrix. A detailed description of priors 1 to 5 is given in the note below Table 2-1. With these priors, the equivalence in Section 2.4 holds.

Note that the MLE and covariance matrix estimate used in prior 1 and prior 5 are obtained without the constraint $0 = \phi_0$

iii)

$$\pi(\phi_1\mid\cdot) \propto \frac{\exp\left(\phi_1\sum_{s=1}^{2}\sum_{k=1}^{4}x_k^T\beta\,Z_{s1k}\right)}{\prod_{s=1}^{2}\prod_{d=0}^{2}\prod_{k=1}^{4}\left\{\sum_{m=0}^{2}\exp(\rho_{sm}+\phi_m x_k^T\beta)\right\}^{Z_{sdk}}}\,I(0\le\phi_1\le 1).$$

Because the full conditionals are log-concave, I use adaptive rejection sampling for all parameters. Each analysis ran for 50,000 iterations, and the first 30,000 were discarded as burn-in. Graphical analysis of the MCMC samples indicated satisfactory convergence, which was also confirmed by the method of Geweke (1992). Table 2-1 summarizes the MCMC samples, including posterior means and 95% HPD regions for $\beta_1$, $\beta_2$, $\phi_1$ and the odds ratios. For comparison, Table 2-1 also gives the MLEs and 95% asymptotic confidence intervals.

My analysis shows that, controlling for sports participation and ethnicity, the frequency of grilled red meat intake is positively associated with colorectal cancer under every prior for $\beta$. Controlling for the frequency of grilled red meat intake and ethnicity, sports participation is negatively associated with colorectal cancer under every prior for $\beta$. These results are consistent with those from the ML approach. In particular, when prior 1 is used, the posterior means of $\beta_1$ and $\beta_2$ are very close to the MLEs. The endpoints of the 95% HPD regions for $\beta_1$ and $\beta_2$ are also very close to those of the 95% asymptotic C.I.s. For $\phi_1$, however, the posterior mean differs slightly from the MLE. The 95% HPD region for $\phi_1$ lies within the 95% asymptotic C.I. and is much shorter. These results show that analysis without the constraint $0 = \phi_0$

Comparing the results for prior 1 and prior 5 shows the effect of the prior mean for $\beta$. In both priors, the prior covariance for $\beta$ is the covariance matrix estimate from the ML method multiplied by the total number of observations. With this prior covariance, shifting the prior mean of $\beta$ from 0 to the MLE has almost no impact on the posterior means and 95% HPD regions of any parameter of interest.

Keeping prior 5 for $\beta$, I also analyzed the data with beta(2,5), beta(5,5) and beta(5,2) priors for $\phi_1$. Table 2-2 shows that the posterior means and 95% HPD intervals for $\beta_1$ and $\beta_2$ are similar to those under a uniform prior for $\phi_1$. The posteriors of $\phi_1$, however, vary somewhat.

The main objective of this chapter is to show that, for a class of priors, the posterior equivalence of the odds ratio parameters based on prospective and retrospective likelihoods holds in both unmatched and matched case-control studies. I do not insist that one must use this class of priors. For example, an analyst with a properly elicited prior can do a subjective Bayesian analysis. My point is simply that an analyst who uses this class of priors can do either prospective or retrospective inference. The colorectal cancer analysis also shows that these priors can be used to analyze real data. The results generalize those of Seaman and Richardson (2004) and Ghosh, Zhang and Mukherjee (2006), and include the multiplicative intercept model as a special case. These results are important for practitioners planning a Bayesian analysis of case-control data.

The exposure variables are assumed to be discrete. One possible extension is to arbitrary exposures: discrete, continuous, or mixed.

Table 2-1. Posterior means and HPD regions for $\beta_1$, $\beta_2$, $\phi_1$, and odds ratios with various normal priors for $\beta$, and their MLEs and asymptotic C.I.s.

| Parameter | Prior 1 | Prior 2 | Prior 3 | Prior 4 | Prior 5 | ML |
|---|---|---|---|---|---|---|
| $\beta_1$ (FGRMI) | 1.08 (0.78, 1.37) | 1.08 (0.79, 1.37) | 0.82 (0.54, 1.16) | 0.66 (0.41, 0.82) | 1.09 (0.81, 1.40) | 1.06 (0.76, 1.37) |
| $\beta_2$ (SP) | -0.61 (-0.81, -0.40) | -0.61 (-0.84, -0.41) | -0.59 (-0.79, -0.38) | -0.57 (-0.76, -0.38) | -0.62 (-0.83, -0.42) | -0.61 (-0.81, -0.40) |
| $\phi_1$ (cancer stage (CS) = 0-2) | 0.85 (0.70, 1.00) | 0.86 (0.70, 1.00) | 0.89 (0.74, 1.00) | 0.91 (0.77, 1.00) | 0.85 (0.69, 1.00) | 0.90 (0.65, 1.14) |
| $\exp(\phi_1\beta_1)$: odds ratio of FGRMI for CS = 0-2 | 2.53 (1.88, 3.19) | 2.53 (1.91, 3.21) | 2.09 (1.58, 2.77) | 1.83 (1.45, 2.10) | 2.55 (1.91, 3.24) | 2.59 (1.91, 3.28) |
| $\exp(\phi_1\beta_2)$: odds ratio of SP for CS = 0-2 | 0.60 (0.49, 0.71) | 0.60 (0.49, 0.70) | 0.60 (0.49, 0.71) | 0.60 (0.50, 0.71) | 0.59 (0.49, 0.72) | 0.58 (0.47, 0.69) |
| $\exp(\beta_1)$: odds ratio of FGRMI for CS = 3-4 | 2.99 (2.16, 3.88) | 2.98 (2.13, 3.88) | 2.29 (1.69, 3.15) | 1.95 (1.46, 2.2) | 3.02 (2.16, 3.93) | 2.90 (2.00, 3.80) |
| $\exp(\beta_2)$: odds ratio of SP for CS = 3-4 | 0.54 (0.43, 0.66) | 0.55 (0.43, 0.66) | 0.56 (0.45, 0.68) | 0.57 (0.46, 0.67) | 0.55 (0.43, 0.65) | 0.55 (0.43, 0.66) |

Entries for priors 1 to 5 are posterior mean (95% HPD region); entries in the ML column are MLE (95% asymptotic C.I.).

Note: The priors of $\beta$ for prior 1 to prior 5 are, respectively, $N\!\left(0, \begin{pmatrix}59 & -12\\ -12 & 26\end{pmatrix}\right)$, $N\!\left(0, \begin{pmatrix}50 & -20\\ -20 & 50\end{pmatrix}\right)$, $N\!\left(0, \begin{pmatrix}50 & 0\\ 0 & 50\end{pmatrix}\right)$, $N\!\left(0, \begin{pmatrix}50 & 10\\ 10 & 50\end{pmatrix}\right)$ and $N\!\left(\begin{pmatrix}1.06\\ -0.61\end{pmatrix}, \begin{pmatrix}59 & -12\\ -12 & 26\end{pmatrix}\right)$.
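
The odds-ratio rows of Table 2-1 are functions of $(\beta_1,\beta_2,\phi_1)$, as the row labels state. The MLE is invariant to reparameterization, so plugging the reported ML estimates into these functions should reproduce the ML odds ratios. Agreement will only be up to rounding, because the published estimates have two decimals; the discrepancies here are at most about 0.015. The same check does not apply to the Bayesian columns, since the posterior mean of $\exp(\cdot)$ is not the exponential of a posterior mean. The short sketch below only illustrates this arithmetic, and its function name is hypothetical.

```python
import math

def stereotype_odds_ratios(beta1, beta2, phi1):
    """Odds ratios as labelled in Table 2-1 (hypothetical helper):
    exp(phi1 * beta_j) for CS = 0-2 and exp(beta_j) for CS = 3-4,
    with j = 1 (FGRMI) and j = 2 (SP)."""
    return {
        "FGRMI, CS 0-2": math.exp(phi1 * beta1),
        "SP, CS 0-2": math.exp(phi1 * beta2),
        "FGRMI, CS 3-4": math.exp(beta1),
        "SP, CS 3-4": math.exp(beta2),
    }

# ML point estimates from the ML column of Table 2-1 (rounded to two decimals)
ml = stereotype_odds_ratios(beta1=1.06, beta2=-0.61, phi1=0.90)
for label, value in ml.items():
    print(f"{label}: {value:.3f}")
```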


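Table 2-2. Sensitivity analysis for the choice of prior for $\phi_1$: posterior means and 95% HPD regions for $\beta_1$, $\beta_2$, and $\phi_1$ under beta priors for $\phi_1$, with prior 5 of Table 2-1 as the prior for $\beta$. Each entry is the posterior mean (95% HPD region).

| Parameter | beta(5,2) prior for $\phi_1$ | beta(5,5) prior for $\phi_1$ | beta(2,5) prior for $\phi_1$ |
|---|---|---|---|
| $\beta_1$ | 1.10 (0.81, 1.40) | 1.14 (0.84, 1.43) | 1.15 (0.86, 1.45) |
| $\beta_2$ | -0.62 (-0.83, -0.41) | -0.65 (-0.86, -0.44) | -0.66 (-0.88, -0.44) |
| $\phi_1$ | 0.83 (0.69, 0.98) | 0.75 (0.60, 0.89) | 0.72 (0.57, 0.87) |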


CHAPTER 3
FREQUENTIST PROSPECTIVE APPROACH IN GENERAL CASE-CONTROL FRAMEWORK

By nature, case-control studies are retrospective since, for a rare disease such as cancer, it is impractical to follow a cohort for an extended period. Instead, once a person is diagnosed with a disease, one can usually track down a control group of people who bear traits similar to those of the individual with the disease, for example people living in the same community, belonging to the same age group, or sharing any other auxiliary information. Accordingly, any statistical inference for case-control problems needs to be based on the retrospective likelihood (derived from the conditional distribution of the exposure given the disease) rather than the prospective likelihood (derived from the conditional distribution of the disease given the exposure).

Inference with retrospective likelihoods is more difficult than inference with prospective likelihoods because the retrospective model involves the marginal pdf of the exposure in addition to the prospective model. Hence, if one can show that inference from the prospective likelihood provides answers equivalent to those from the retrospective likelihood, the inference problem in a case-control study becomes easier.

Prentice and Pyke (1979), in their seminal paper, showed that maximum likelihood estimation of odds ratios based on a multivariate logistic model for the disease states is the same for a prospective and a retrospective likelihood. They also proved consistency and asymptotic normality of the MLE under the retrospective model. Scott and Wild (2001) extended the work of Prentice and Pyke (1979) to arbitrary binary regression models. In Section 3.1, I extend Scott and Wild (2001) to multiple disease states. Chen (2003) pointed out that, in a general framework, certain parameters are not identified in case-control studies. In Section 3.2, I prove the nonidentifiability of these parameters in a way different from Chen (2003). In Section 3.3, I modify my approach of Section 3.1 to overcome the nonidentifiability problem. The modified model coincides with the baseline logit model of Chen (2003).


3.1 A Prospective Likelihood Approach for General Models in Case-Control Studies

In this section, I present a prospective likelihood approach that yields the MLE from the retrospective likelihood for general models. The consistency and asymptotic normality of the MLE are shown as well. Also, a relatively simple expression for the asymptotic covariance matrix of the MLE is given.

3.1.1 A Prospective Likelihood Approach to Obtain the MLE from the Retrospective Likelihood

Let $d$ denote the disease state and $z$ the vector of covariates; $z$ can be discrete, continuous, or mixed. Suppose $d\mid z\sim P_{dz}(\theta)$ $(d=0,\ldots,r)$, $z\sim m(z)$, and $n_d$ denotes the sample size for the $d$th disease state $(d=0,\ldots,r)$ as collected in a case-control study. Let $n=\sum_{d=0}^{r}n_d$. The parameter $\theta$ is unknown, and $m(z)$ is left unspecified. By Bayes' rule, the retrospective pdf is
$$P(z\mid d)=\frac{P_{dz}(\theta)\,m(z)}{\int P_{dz}(\theta)\,m(z)\,\mu(dz)},$$
where $\mu$ is a $\sigma$-finite measure. Denoting the denominator by $q_d$, one can write the retrospective likelihood as
$$L_R=\prod_{d=0}^{r}\prod_{i=1}^{n_d}\Big[\frac{P_{dz_{di}}(\theta)\,m(z_{di})}{q_d}\Big].$$
Next define
$$p^{\phi}_d(z)=\frac{\exp(\rho_d)\,P_{dz}(\theta)}{\sum_{l=0}^{r}\exp(\rho_l)\,P_{lz}(\theta)},\tag{3.1}$$
where $\rho_0=0$ and $\phi=(\theta,\rho_1,\ldots,\rho_r)$. Here $p^{\phi}_d(z)$ can be viewed as a prospective pdf since $\sum_{d=0}^{r}p^{\phi}_d(z)=1$. The likelihood based on this prospective pdf can then be written as
$$L_P=\prod_{d=0}^{r}\prod_{i=1}^{n_d}p^{\phi}_d(z_{di}).\tag{3.2}$$
Now I show that a semiparametric maximum likelihood estimator for $\theta$ from $L_R$ is equivalent to that from $L_P$. Letting $\rho_d=\log\{(n_dq_d^{-1})/(n_0q_0^{-1})\}$ $(d=0,\ldots,r)$ and
$$h(z)=m(z)\sum_{l=0}^{r}\frac{n_l}{n\,q_l}P_{lz}(\theta),$$
one gets
$$L_R=L_P\prod_{d=0}^{r}\prod_{i=1}^{n_d}\frac{h(z_{di})}{n_dn^{-1}}.\tag{3.3}$$
Note that $h(z)$ does not involve $\phi$ and $\int h(z)\,\mu(dz)=\sum_{l=0}^{r}n_l/n=1$, so that $h(\cdot)$ is a probability function. Note also that $\phi$ and $h$ must satisfy the relation
$$\int p^{\phi}_d(z)\,h(z)\,\mu(dz)=n_d/n\qquad(d=0,\ldots,r).\tag{3.4}$$
In particular, their MLEs must satisfy this relationship as well.

I take a semiparametric approach, estimating $\phi$ parametrically and $h$ nonparametrically. Let $z_1,\ldots,z_K$ denote the distinct observed values of $z$. Also, let $h_k=h(z_k)$ and let $n_{z_k}$ be the number of observations with $z=z_k$. Then the log-likelihood $\ell_R=\log L_R$ is given by
$$\ell_R=\log L_P+\sum_{k=1}^{K}n_{z_k}\log h_k+n\log n-\sum_{d=0}^{r}n_d\log n_d.$$
Note that $\partial\ell_R/\partial\phi=\partial\log L_P/\partial\phi$ and, with $h_K=1-\sum_{k=1}^{K-1}h_k$, $\partial\ell_R/\partial h_k=n_{z_k}/h_k-n_{z_K}/h_K$ $(k=1,\ldots,K-1)$. Hence, if the solution for $\phi$ from $\partial\log L_P/\partial\phi=0$, say $\hat\phi$, and the solution for $h(\cdot)$ from $\partial\ell_R/\partial h_k=0$ $(k=1,\ldots,K-1)$, say $\hat h(\cdot)$, satisfy the constraints in (3.4), then I am done. First, solving $\partial\ell_R/\partial h_k=0$, $\hat h(\cdot)$ is given by a discrete density having mass concentrated at $z_1,\ldots,z_K$ with probabilities $\hat h_k=n_{z_k}/n$, $k=1,\ldots,K$. Second, noting that $\partial\log L_P/\partial\rho_m=n_m-\sum_{d=0}^{r}\sum_{i=1}^{n_d}p^{\phi}_m(z_{di})$, $m=1,\ldots,r$, one gets
$$\sum_{d=0}^{r}\sum_{i=1}^{n_d}p^{\hat\phi}_m(z_{di})/n=n_m/n.\tag{3.5}$$
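For the simplest special case, a binary disease ($r=1$), a binary exposure, and a logistic $P_{dz}(\theta)$, the pseudomodel (3.1) is an ordinary logistic regression whose intercept absorbs $\rho_1$. The sketch below is an illustration only: the counts are hypothetical and the hand-written Newton-Raphson fit is not code from this dissertation. It checks numerically that the fitted probabilities satisfy (3.5), that $\hat h_k=n_{z_k}/n$ satisfies (3.4), and that the fitted slope equals the log cross-product ratio, the classical retrospective MLE of the log odds ratio.

```python
import math

# Hypothetical case-control counts n[(d, z)]: d = disease (0/1), z = exposure (0/1).
counts = {(0, 0): 60, (0, 1): 40, (1, 0): 30, (1, 1): 70}


def fit_logistic(counts, iters=50):
    """Newton-Raphson for logit P(d=1 | z) = gamma + beta * z on grouped data."""
    gamma, beta = 0.0, 0.0
    zs = sorted({z for (_, z) in counts})
    for _ in range(iters):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for z in zs:
            n0, n1 = counts.get((0, z), 0), counts.get((1, z), 0)
            p = 1.0 / (1.0 + math.exp(-(gamma + beta * z)))
            x = (1.0, float(z))
            for a in range(2):
                score[a] += (n1 - (n0 + n1) * p) * x[a]
                for b in range(2):
                    info[a][b] += (n0 + n1) * p * (1.0 - p) * x[a] * x[b]
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        # Newton step: solve info * step = score for the 2x2 system.
        gamma += (info[1][1] * score[0] - info[0][1] * score[1]) / det
        beta += (info[0][0] * score[1] - info[1][0] * score[0]) / det
    return gamma, beta


gamma_hat, beta_hat = fit_logistic(counts)
fitted = {z: 1.0 / (1.0 + math.exp(-(gamma_hat + beta_hat * z))) for z in (0, 1)}
n = sum(counts.values())
n_cases = counts[(1, 0)] + counts[(1, 1)]

# n times the left side of (3.5): expected number of cases under the fit.
expected_cases = sum((counts[(0, z)] + counts[(1, z)]) * fitted[z] for z in (0, 1))

# Constraint (3.4) for d = 1 with the nonparametric MLE h_k = n_{z_k} / n.
h_hat = {z: (counts[(0, z)] + counts[(1, z)]) / n for z in (0, 1)}
constraint_lhs = sum(fitted[z] * h_hat[z] for z in (0, 1))

# Retrospective (cross-product) estimate of the log odds ratio.
log_or = math.log(counts[(1, 1)] * counts[(0, 0)] / (counts[(1, 0)] * counts[(0, 1)]))

print(round(beta_hat, 4), round(log_or, 4))
print(round(expected_cases, 6), n_cases, round(constraint_lhs, 6))
```

With these counts both slope estimates equal $\log 3.5\approx1.2528$, the expected number of cases equals the observed 100, and the constraint evaluates to $n_1/n=0.5$.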


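Observing that $\sum_{k=1}^{K}p^{\hat\phi}_m(z_k)\hat h_k=\sum_{d=0}^{r}\sum_{i=1}^{n_d}p^{\hat\phi}_m(z_{di})/n$, it follows from (3.5) that $\hat\phi$ and $\hat h(\cdot)$ satisfy the constraints in (3.4) for $m=1,\ldots,r$; the constraint for $m=0$ then holds as well, since both sides sum to one over $m$.

3.1.2 Consistency and Asymptotic Normality of the MLE

Suppose $\lim_{n\to\infty}n_d/n=\lambda_d$ $(d=0,\ldots,r)$ and let $\theta^0$ be the true value of $\theta$. Define $\rho^0_d=\log[\lambda_d(q^0_d)^{-1}/\{\lambda_0(q^0_0)^{-1}\}]$ with $q^0_d=\int P_{dz}(\theta^0)m(z)\,\mu(dz)$ for $d=0,1,\ldots,r$, and $\phi^0=(\theta^0,\rho^0_1,\ldots,\rho^0_r)$. Let
$$T_n(\phi)=n^{-1}\sum_{d=0}^{r}\sum_{i=1}^{n_d}\frac{\partial}{\partial\phi}\log p^{\phi}_d(z_{di}),\qquad T^{(1)}_n(\phi)=\big(\big(T^{(1)}_{nss'}(\phi)\big)\big)=n^{-1}\sum_{d=0}^{r}\sum_{i=1}^{n_d}\frac{\partial^2}{\partial\phi\,\partial\phi^T}\log p^{\phi}_d(z_{di}).$$
Throughout, I use $|\cdot|$ for the Euclidean norm. I assume that

(a) $\frac{\partial^2}{\partial\phi\,\partial\phi^T}\log p^{\phi}_d(z)$ exists and is continuous for all $\phi$;

(b) $I(\phi)=\big((I_{ss'}(\phi))\big)=\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\big[-\frac{\partial^2}{\partial\phi\,\partial\phi^T}\log p^{\phi}_d(z)\big]$ is finite, continuous in $\phi$, and positive definite for all $\phi$;

(c) $\big|E^{\theta^0}_{z\mid d}\big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\big|_{\phi=\phi^0}\big]\big|<$ [...] 0 and $\delta>0$,
$$P^{\theta^0}_{z\mid d}\Big[\sup_{\phi\in U_\delta=\{\phi:|\phi-\phi^0|\le\delta\}}\big|T^{(1)}_{nss'}(\phi)+I_{ss'}(\phi)\big|>\epsilon\Big]\to0\quad\text{as }n\to\infty,\ \text{for all }s,s'.$$

In order to prove (i), I begin with
$$\Big|\sum_{d=0}^{r}\Big(\frac{n_d}{n}-\lambda_d\Big)E^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]\Big|\le\max_{0\le d\le r}\Big|\frac{n_d}{n}-\lambda_d\Big|\sum_{d=0}^{r}\Big|E^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]\Big|\to0,$$
by virtue of (c) and (d). Next observe that, for $t=1,\ldots,r$,
$$\begin{aligned}\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\rho_t}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]&=\sum_{d=0}^{r}\lambda_dI[d=t]-\sum_{d=0}^{r}\lambda_d\int\frac{\lambda_t(q^0_t)^{-1}P_{tz}(\theta^0)}{\sum_{l=0}^{r}\lambda_l(q^0_l)^{-1}P_{lz}(\theta^0)}\,\frac{P_{dz}(\theta^0)}{q^0_d}\,m(z)\,\mu(dz)\\&=\lambda_t-\lambda_t(q^0_t)^{-1}\int P_{tz}(\theta^0)m(z)\,\mu(dz)=\lambda_t-\lambda_t(q^0_t)^{-1}q^0_t=0.\end{aligned}$$
Also,
$$\begin{aligned}\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\theta}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]&=\sum_{d=0}^{r}\lambda_d\int\frac{\frac{\partial}{\partial\theta}P_{dz}(\theta)\big|_{\theta=\theta^0}}{P_{dz}(\theta^0)}\,\frac{P_{dz}(\theta^0)}{q^0_d}\,m(z)\,\mu(dz)-\sum_{d=0}^{r}\lambda_d\int\frac{\sum_{l=0}^{r}\lambda_l(q^0_l)^{-1}\frac{\partial}{\partial\theta}P_{lz}(\theta)\big|_{\theta=\theta^0}}{\sum_{l=0}^{r}\lambda_l(q^0_l)^{-1}P_{lz}(\theta^0)}\,\frac{P_{dz}(\theta^0)}{q^0_d}\,m(z)\,\mu(dz)\\&=\sum_{d=0}^{r}\lambda_d(q^0_d)^{-1}\frac{\partial}{\partial\theta}\Big\{\int P_{dz}(\theta)m(z)\,\mu(dz)\Big\}\Big|_{\theta=\theta^0}-\sum_{l=0}^{r}\lambda_l(q^0_l)^{-1}\frac{\partial}{\partial\theta}\Big\{\int P_{lz}(\theta)m(z)\,\mu(dz)\Big\}\Big|_{\theta=\theta^0}=0.\end{aligned}$$
(i) now follows from the three preceding displays.

In order to prove (ii), I begin with the inequality
$$\begin{aligned}\sup_{\phi\in U_\delta}\big|T^{(1)}_{nss'}(\phi)+I_{ss'}(\phi)\big|&=\sup_{\phi\in U_\delta}\big|T^{(1)}_{nss'}(\phi)-T^{(1)}_{nss'}(\phi^0)+T^{(1)}_{nss'}(\phi^0)+I_{ss'}(\phi^0)-I_{ss'}(\phi^0)+I_{ss'}(\phi)\big|\\&\le\sup_{\phi\in U_\delta}\big|T^{(1)}_{nss'}(\phi)-T^{(1)}_{nss'}(\phi^0)\big|+\big|T^{(1)}_{nss'}(\phi^0)+I_{ss'}(\phi^0)\big|+\sup_{\phi\in U_\delta}\big|I_{ss'}(\phi)-I_{ss'}(\phi^0)\big|.\end{aligned}\tag{$*$}$$
By assumption (a), $T^{(1)}_n(\phi)$ is continuous in $\phi$, and so it is uniformly continuous in $\phi$ on the closed neighborhood $U_\delta$ of $\phi^0$. By assumption (b), the same is true for $I(\phi)$. Also, $T^{(1)}_{nss'}(\phi^0)\to-I_{ss'}(\phi^0)$ in $P^{\theta^0}_{z\mid d}$ probability by the law of large numbers. Hence the right-hand side of ($*$) converges to zero in $P^{\theta^0}_{z\mid d}$ probability. This proves (ii). The proof of Theorem 3.1 is now complete.

Theorem 3.2 shows that $\hat\phi$ is asymptotically normally distributed with the sandwich matrix as its covariance matrix; consequently, $\hat\theta$ is also asymptotically normally distributed.

Theorem 3.2. Assume the conditions of Theorem 3.1 with (d) strengthened to $n_d/n-\lambda_d=o(n^{-1/2})$ $(d=0,\ldots,r)$. Also, let each element of $E^{\theta^0}_{z\mid d}\big[\big|\{\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)|_{\phi=\phi^0}\}\{\frac{\partial}{\partial\phi^T}\log p^{\phi}_d(z)|_{\phi=\phi^0}\}\big|\big]$ $(d=0,\ldots,r)$ be finite. Then
$$n^{1/2}(\hat\phi-\phi^0)\xrightarrow{d}N(0,B^{-1}AB^{-1}),$$
where
$$A=\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[\Big\{\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big\}\Big\{\frac{\partial}{\partial\phi^T}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big\}\Big]-\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]E^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi^T}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]$$
and
$$B=\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[-\frac{\partial^2}{\partial\phi\,\partial\phi^T}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big].$$

Proof. A Taylor expansion of $\frac1n\sum_{d=0}^{r}\sum_{i=1}^{n_d}\frac{\partial}{\partial\phi}\log p^{\phi}_d(z_{di})\big|_{\phi=\hat\phi}$ around $\phi^0$ gives
$$-\Big\{\frac1n\sum_{d=0}^{r}\sum_{i=1}^{n_d}\frac{\partial^2}{\partial\phi\,\partial\phi^T}\log p^{\phi}_d(z_{di})\Big|_{\phi=\phi^*}\Big\}\sqrt n\,(\hat\phi-\phi^0)=\sqrt n\Big\{\frac1n\sum_{d=0}^{r}\sum_{i=1}^{n_d}\frac{\partial}{\partial\phi}\log p^{\phi}_d(z_{di})\Big|_{\phi=\phi^0}\Big\},$$
where $|\hat\phi-\phi^*|$ [...]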

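[...] $\frac{1}{\sqrt n}E^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]$. I will show that
$$\sum_{d=0}^{r}\sum_{i=1}^{n_d}Y_{ndi}\xrightarrow{d}N(0,A).$$
In order to prove this, I will use the Lindeberg-Feller central limit theorem. Note that $E^{\theta^0}_{z\mid d}[Y_{ndi}]=0$ and $\lim_{n\to\infty}\sum_{d=0}^{r}\sum_{i=1}^{n_d}V(Y_{ndi})=A$. Also, the Lindeberg condition is equivalent to
$$\lim_{n\to\infty}\sum_{d=0}^{r}n_d\,E^{\theta^0}_{z\mid d}\big[|Y_{nd1}|^2I\{|Y_{nd1}|>\epsilon\}\big]=0\quad\text{for every }\epsilon>0.$$
To prove the Lindeberg condition, first observe that
$$|Y_{nd1}|^2I\{|Y_{nd1}|>\epsilon\}\le|Y_{nd1}|^2\le\frac2n\Big|\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big|^2+\frac2n\Big|E^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]\Big|^2\xrightarrow{a.s.}0\quad\text{as }n\to\infty.$$
Second, $E^{\theta^0}_{z\mid d}[|Y_{nd1}|^2]$ [...] $E^{\theta^0}_{z\mid d}\big[\big|\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\big|_{\phi=\phi^0}\big|^2\big]<$ [...] $\to0$ for $d=0,\ldots,r$. Also, since $n_d/n\to\lambda_d$ $(d=0,\ldots,r)$, the Lindeberg condition follows.

Next I will show that
$$\frac{1}{\sqrt n}\sum_{d=0}^{r}\sum_{i=1}^{n_d}\frac{\partial}{\partial\phi}\log p^{\phi}_d(z_{di})\Big|_{\phi=\phi^0}\xrightarrow{d}N(0,A).$$
In order to prove this, note that
$$\sum_{d=0}^{r}\sum_{i=1}^{n_d}Y_{ndi}=\sqrt n\sum_{d=0}^{r}\sum_{i=1}^{n_d}\Big\{\frac1n\frac{\partial}{\partial\phi}\log p^{\phi}_d(z_{di})\Big|_{\phi=\phi^0}\Big\}-\sqrt n\sum_{d=0}^{r}\Big\{\frac{n_d}{n}E^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]\Big\}.$$
Since $n_d/n-\lambda_d=o(n^{-1/2})$ and $E^{\theta^0}_{z\mid d}[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)|_{\phi=\phi^0}]$ is finite, $\sqrt n\sum_{d=0}^{r}(n_d/n-\lambda_d)E^{\theta^0}_{z\mid d}[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)|_{\phi=\phi^0}]\to0$ as $n\to\infty$. Also, $\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)|_{\phi=\phi^0}]=0$, as shown in the proof of Theorem 3.1. Hence
$$\sqrt n\sum_{d=0}^{r}\Big\{\frac{n_d}{n}E^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]\Big\}\to0\quad\text{as }n\to\infty.$$
From the decomposition of $\sum_d\sum_iY_{ndi}$, the asymptotic normality of $\sum_d\sum_iY_{ndi}$, and the last display, one gets the asymptotic normality of the normalized score stated above. This result, together with the Taylor expansion and the preceding convergence results, leads to the conclusion of Theorem 3.2.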
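The matrices $A$ and $B$ in Theorem 3.2 depend on the unknown $\lambda_d$, $\theta^0$, and $m(z)$. One natural plug-in estimator, offered here as an illustrative assumption rather than as part of the results above, replaces $\lambda_d$ by $n_d/n$, each $E^{\theta^0}_{z\mid d}$ by the corresponding within-group sample average, and $\phi^0$ by $\hat\phi$:
$$\begin{aligned}\hat c_{di}&=\frac{\partial}{\partial\phi}\log p^{\phi}_d(z_{di})\Big|_{\phi=\hat\phi},\qquad\bar c_d=\frac{1}{n_d}\sum_{i=1}^{n_d}\hat c_{di},\\\hat A&=\frac1n\sum_{d=0}^{r}\sum_{i=1}^{n_d}(\hat c_{di}-\bar c_d)(\hat c_{di}-\bar c_d)^T,\qquad\hat B=-\frac1n\sum_{d=0}^{r}\sum_{i=1}^{n_d}\frac{\partial^2}{\partial\phi\,\partial\phi^T}\log p^{\phi}_d(z_{di})\Big|_{\phi=\hat\phi},\\\widehat{\operatorname{Var}}(\hat\phi)&\approx\frac1n\,\hat B^{-1}\hat A\,\hat B^{-1}.\end{aligned}$$
The within-group centering in $\hat A$ mirrors the second term of $A$: for each $d$, $\frac{n_d}{n}\big\{\frac{1}{n_d}\sum_i\hat c_{di}\hat c_{di}^T-\bar c_d\bar c_d^T\big\}=\frac1n\sum_i(\hat c_{di}-\bar c_d)(\hat c_{di}-\bar c_d)^T$, and summing over $d$ gives $\hat A$.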


3.1.3 Asymptotic Covariance Matrix of the MLE

In this subsection, I will show that the asymptotic covariance matrix of $\hat\theta$ is equal to the $(k\times k)$ upper-left block of $B^{-1}$, where $k$ is the dimension of $\theta$. I begin by proving a mathematical lemma. I partition
$$A=\begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix},\qquad B=\begin{pmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{pmatrix},\qquad B^{-1}=\begin{pmatrix}B^{11}&B^{12}\\B^{21}&B^{22}\end{pmatrix}.$$
Write $G=B^{-1}AB^{-1}=\begin{pmatrix}G_{11}&G_{12}\\G_{21}&G_{22}\end{pmatrix}$ and let $B_{11.2}=B_{11}-B_{12}B_{22}^{-1}B_{21}$. Then the following lemma holds.

Lemma 3.1. $B^{11}=G_{11}$ if and only if
$$A_{11}-B_{12}B_{22}^{-1}A_{21}-A_{12}B_{22}^{-1}B_{21}+B_{12}B_{22}^{-1}A_{22}B_{22}^{-1}B_{21}=B_{11.2}.$$

Proof. First, I write
$$G=\begin{pmatrix}B^{11}&B^{12}\\B^{21}&B^{22}\end{pmatrix}\begin{pmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{pmatrix}\begin{pmatrix}B^{11}&B^{12}\\B^{21}&B^{22}\end{pmatrix}=\begin{pmatrix}B^{11}A_{11}+B^{12}A_{21}&B^{11}A_{12}+B^{12}A_{22}\\B^{21}A_{11}+B^{22}A_{21}&B^{21}A_{12}+B^{22}A_{22}\end{pmatrix}\begin{pmatrix}B^{11}&B^{12}\\B^{21}&B^{22}\end{pmatrix}.$$
It follows that
$$G_{11}=B^{11}A_{11}B^{11}+B^{12}A_{21}B^{11}+B^{11}A_{12}B^{21}+B^{12}A_{22}B^{21}.$$
Hence $G_{11}=B^{11}$ if and only if
$$A_{11}+(B^{11})^{-1}B^{12}A_{21}+A_{12}B^{21}(B^{11})^{-1}+(B^{11})^{-1}B^{12}A_{22}B^{21}(B^{11})^{-1}=(B^{11})^{-1}.$$
It follows from Exercise 2.7, p. 33 of Rao (1973) that $B^{12}=-B^{11}B_{12}B_{22}^{-1}$, so that $(B^{11})^{-1}B^{12}=-B_{12}B_{22}^{-1}$ and, by the symmetry of $B$, $B^{21}(B^{11})^{-1}=-B_{22}^{-1}B_{21}$. Hence, from the previous display, one gets
$$A_{11}-B_{12}B_{22}^{-1}A_{21}-A_{12}B_{22}^{-1}B_{21}+B_{12}B_{22}^{-1}A_{22}B_{22}^{-1}B_{21}=B_{11.2},$$
since $B^{11}=B_{11.2}^{-1}$.

Corollary 3.1. Let $A=B-D$, where $D=\begin{pmatrix}D_{11}&D_{12}\\D_{21}&D_{22}\end{pmatrix}$, $D_{22}$ is positive definite, and $D_{11.2}=D_{11}-D_{12}D_{22}^{-1}D_{21}=0$. Then $G_{11}=B^{11}$ if and only if $B_{12}B_{22}^{-1}=D_{12}D_{22}^{-1}$.

Proof. Since $A=B-D$, I rewrite the necessary and sufficient condition of Lemma 3.1 as
$$B_{11}-D_{11}-B_{12}B_{22}^{-1}(B_{21}-D_{21})-(B_{12}-D_{12})B_{22}^{-1}B_{21}+B_{12}B_{22}^{-1}(B_{22}-D_{22})B_{22}^{-1}B_{21}=B_{11.2},$$
or equivalently
$$-(B_{12}B_{22}^{-1}-D_{12}D_{22}^{-1})D_{22}(B_{12}B_{22}^{-1}-D_{12}D_{22}^{-1})^T-(D_{11}-D_{12}D_{22}^{-1}D_{21})=0.$$
Since $D_{12}D_{22}^{-1}D_{21}=D_{11}$, the condition simplifies to
$$(B_{12}B_{22}^{-1}-D_{12}D_{22}^{-1})D_{22}(B_{12}B_{22}^{-1}-D_{12}D_{22}^{-1})^T=0.$$
But since $D_{22}$ is positive definite, one must have $B_{12}B_{22}^{-1}=D_{12}D_{22}^{-1}$.
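Corollary 3.1 can be checked on the smallest case, $k=r=1$, where every block is a scalar. The sketch below uses hypothetical matrices and exact rational arithmetic from Python's standard `fractions` module. It builds a $D$ with $D_{11.2}=0$ and compares $G_{11}$ with $B^{11}$ when $D_{12}D_{22}^{-1}$ does and does not match $B_{12}B_{22}^{-1}$.

```python
from fractions import Fraction as F


def inv2(M):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def upper_left_blocks(B, D):
    """Return (G_11, B^{11}) with A = B - D and G = B^{-1} A B^{-1}."""
    A = [[B[i][j] - D[i][j] for j in range(2)] for i in range(2)]
    B_inv = inv2(B)
    G = matmul(matmul(B_inv, A), B_inv)
    return G[0][0], B_inv[0][0]


def d_with_zero_schur(d12, d22):
    """D with D22 > 0 and D_{11.2} = D11 - D12 D22^{-1} D21 = 0."""
    return [[d12 * d12 / d22, d12], [d12, d22]]


B = [[F(4), F(2)], [F(2), F(3)]]              # B12 B22^{-1} = 2/3
same = d_with_zero_schur(F(1), F(3, 2))       # D12 D22^{-1} = 2/3
diff = d_with_zero_schur(F(1, 2), F(3, 2))    # D12 D22^{-1} = 1/3

print(upper_left_blocks(B, same))   # G_11 and B^{11} agree
print(upper_left_blocks(B, diff))   # they differ
```

With matching ratios both values equal $3/8$; with $D_{12}D_{22}^{-1}=1/3$, $G_{11}=45/128\ne3/8$, as the corollary predicts.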
Applying the above corollary, I will prove that the $(k\times k)$ upper-left block of $B^{-1}AB^{-1}$ in Theorem 3.2 is equal to the $(k\times k)$ upper-left block of $B^{-1}$. Recall that
$$A=\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[\Big\{\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big\}\Big\{\frac{\partial}{\partial\phi^T}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big\}\Big]-\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]E^{\theta^0}_{z\mid d}\Big[\frac{\partial}{\partial\phi^T}\log p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big]$$
and $B=\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\big[-\frac{\partial^2}{\partial\phi\,\partial\phi^T}\log p^{\phi}_d(z)\big|_{\phi=\phi^0}\big]$. Writing $c_d(\phi^0)=\frac{\partial}{\partial\phi}\log p^{\phi}_d(z)\big|_{\phi=\phi^0}$,
$$A=\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_d(\phi^0)c_d(\phi^0)^T]-\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_d(\phi^0)]\,E^{\theta^0}_{z\mid d}[c_d(\phi^0)^T]$$
and
$$B=-\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[\frac{\partial^2}{\partial\phi\,\partial\phi^T}p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big/p^{\phi^0}_d(z)\Big]+\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_d(\phi^0)c_d(\phi^0)^T].$$
Moreover,
$$\begin{aligned}\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}\Big[\frac{\partial^2}{\partial\phi\,\partial\phi^T}p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big/p^{\phi^0}_d(z)\Big]&=\sum_{d=0}^{r}\lambda_d\int\frac{\frac{\partial^2}{\partial\phi\,\partial\phi^T}p^{\phi}_d(z)\big|_{\phi=\phi^0}}{(\lambda_d/q^0_d)P_{dz}(\theta^0)\big/\big\{\sum_{l=0}^{r}(\lambda_l/q^0_l)P_{lz}(\theta^0)\big\}}\,\frac{P_{dz}(\theta^0)\,m(z)}{q^0_d}\,\mu(dz)\\&=\sum_{d=0}^{r}\int\Big\{\frac{\partial^2}{\partial\phi\,\partial\phi^T}p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big\}\Big\{\sum_{l=0}^{r}(\lambda_l/q^0_l)P_{lz}(\theta^0)\Big\}m(z)\,\mu(dz)\\&=\int\Big\{\frac{\partial^2}{\partial\phi\,\partial\phi^T}\Big(\sum_{d=0}^{r}p^{\phi}_d(z)\Big)\Big|_{\phi=\phi^0}\Big\}\Big\{\sum_{l=0}^{r}(\lambda_l/q^0_l)P_{lz}(\theta^0)\Big\}m(z)\,\mu(dz)=0,\end{aligned}$$
since $\sum_{d=0}^{r}p^{\phi}_d(z)=1$. Hence $A=B-D$, where $D=\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_d(\phi^0)]\,E^{\theta^0}_{z\mid d}[c_d(\phi^0)^T]$.

Suppose $\theta$ is a $k$-dimensional vector. Write $c_d(\phi^0)=(c_{1d},\ldots,c_{kd},c_{k+1,d},\ldots,c_{k+r,d})^T$ and partition $D$ as $\begin{pmatrix}D_{11}&D_{12}\\D_{21}&D_{22}\end{pmatrix}$, where $D_{11}$ is a $(k\times k)$ matrix with $(i,j)$th element $\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_{id}]E^{\theta^0}_{z\mid d}[c_{jd}]$ $(1\le i,j\le k)$, $D_{21}=D_{12}^T$ is an $(r\times k)$ matrix with $(i,j)$th element $\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_{i+k,d}]E^{\theta^0}_{z\mid d}[c_{jd}]$ $(1\le i\le r,\ 1\le j\le k)$, and $D_{22}$ is an $(r\times r)$ matrix with $(i,j)$th element $\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_{i+k,d}]E^{\theta^0}_{z\mid d}[c_{j+k,d}]$ $(1\le i,j\le r)$.

Theorem 3.3. $D_{22}$ is positive definite, $D_{11}=D_{12}D_{22}^{-1}D_{21}$, and $B_{12}B_{22}^{-1}=D_{12}D_{22}^{-1}$.

Proof. Let $E_1$ be the $(k\times r)$ matrix with $(i,d)$th element $\lambda_dE^{\theta^0}_{z\mid d}[c_{id}]$ $(1\le i\le k,\ 1\le d\le r)$, let $E_2$ be the $(r\times r)$ matrix with $(i,d)$th element $\lambda_dE^{\theta^0}_{z\mid d}[c_{i+k,d}]$ $(1\le i,d\le r)$, and write $\Lambda=\mathrm{Diag}(\lambda_1^{-1},\ldots,\lambda_r^{-1})+\lambda_0^{-1}1_r1_r^T$, $1_r$ being an $r$-dimensional column vector of all 1's. One can rewrite the $(i,j)$th element of $D_{11}$ as
$$\sum_{d=1}^{r}(\lambda_dE^{\theta^0}_{z\mid d}[c_{id}])\,\lambda_d^{-1}\,(\lambda_dE^{\theta^0}_{z\mid d}[c_{jd}])+(\lambda_0E^{\theta^0}_{z\mid 0}[c_{i0}])\,\lambda_0^{-1}\,(\lambda_0E^{\theta^0}_{z\mid 0}[c_{j0}])=(i,j)\text{th element of }E_1\Lambda E_1^T,$$
since $\lambda_0E^{\theta^0}_{z\mid 0}[c_{i0}]=-\sum_{d=1}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_{id}]$, which is the $i$th element of $-E_11_r$ (because $\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_d(\phi^0)]=0$, as shown in the proof of Theorem 3.1). Thus $D_{11}=E_1\Lambda E_1^T$. Similarly, $D_{21}=D_{12}^T=E_2\Lambda E_1^T$ and $D_{22}=E_2\Lambda E_2^T$. Hence $D_{12}D_{22}^{-1}D_{21}=D_{11}$ and $D_{22}$ is positive definite.

Next I show that $B_{12}=E_1$ and $B_{22}=E_2$. Noting that $c_{j+k,d}=I(d=j)-p^{\phi^0}_j(z)$ $(j=1,\ldots,r)$, the $(i,j)$th element of $B_{12}$ is
$$\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_{id}c_{j+k,d}]=\lambda_jE^{\theta^0}_{z\mid j}[c_{ij}]-\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_{id}\,p^{\phi^0}_j(z)]=(i,j)\text{th element of }E_1-\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_{id}(\phi^0)\,p^{\phi^0}_j(z)],$$
for $1\le i\le k$, $1\le j\le r$. However,
$$\begin{aligned}\sum_{d=0}^{r}\lambda_dE^{\theta^0}_{z\mid d}[c_d(\phi^0)\,p^{\phi^0}_j(z)]&=\sum_{d=0}^{r}\int\Big(\frac{\partial}{\partial\phi}p^{\phi}_d(z)\Big|_{\phi=\phi^0}\Big)\Big\{\sum_{l=0}^{r}(\lambda_l/q^0_l)P_{lz}(\theta^0)\Big\}p^{\phi^0}_j(z)\,m(z)\,\mu(dz)\\&=\int\Big\{\frac{\partial}{\partial\phi}\Big(\sum_{d=0}^{r}p^{\phi}_d(z)\Big)\Big|_{\phi=\phi^0}\Big\}\Big\{\sum_{l=0}^{r}(\lambda_l/q^0_l)P_{lz}(\theta^0)\Big\}p^{\phi^0}_j(z)\,m(z)\,\mu(dz)=0.\end{aligned}$$
Hence $B_{12}=E_1$. Similarly, $B_{22}=E_2$. Now observe that
$$D_{12}D_{22}^{-1}=E_1\Lambda E_2^T(E_2^T)^{-1}\Lambda^{-1}E_2^{-1}=E_1E_2^{-1}=B_{12}B_{22}^{-1}.$$
By Corollary 3.1, it follows from Theorem 3.3 that the $(k\times k)$ upper-left block of $B^{-1}AB^{-1}$ is equal to the $(k\times k)$ upper-left block of $B^{-1}$.
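A small numerical illustration of this conclusion, assuming a binary disease, a binary exposure, and a logistic model (hypothetical counts; $\theta$ is the slope and $\rho_1$ is absorbed into the intercept). The sketch forms empirical analogues of $A$ and $B$ at the fitted values, using $n_d/n$ for $\lambda_d$ and within-group averages for $E^{\theta^0}_{z\mid d}$, and compares the slope entry of $B^{-1}AB^{-1}/n$ with that of $B^{-1}/n$. For this saturated two-by-two table the two agree exactly and also equal Woolf's variance $\sum_{d,z}1/n_{dz}$ of the log odds ratio; in general, Theorem 3.3 is a statement about the population matrices.

```python
from fractions import Fraction as F

# Hypothetical case-control counts n[(d, z)]; theta = slope, intercept plays rho_1.
counts = {(0, 0): 60, (0, 1): 40, (1, 0): 30, (1, 1): 70}
n = sum(counts.values())

# Saturated logistic fit: fitted P(d = 1 | z) is the observed case fraction at z.
p_hat = {z: F(counts[(1, z)], counts[(0, z)] + counts[(1, z)]) for z in (0, 1)}


def score(d, z):
    """Score of log p_d(z) w.r.t. (slope, intercept) at the fitted values."""
    resid = d - p_hat[z]
    return (resid * z, resid)


def outer(u, v):
    return [[u[i] * v[j] for j in range(2)] for i in range(2)]


def add(X, Y, scale=1):
    return [[X[i][j] + scale * Y[i][j] for j in range(2)] for i in range(2)]


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A_hat = [[F(0), F(0)], [F(0), F(0)]]
B_hat = [[F(0), F(0)], [F(0), F(0)]]
for d in (0, 1):
    n_d = counts[(d, 0)] + counts[(d, 1)]
    # Within-group mean score: the plug-in for E_{z|d}[c_d].
    mean = [sum(counts[(d, z)] * score(d, z)[k] for z in (0, 1)) / F(n_d)
            for k in range(2)]
    for z in (0, 1):
        weight = F(counts[(d, z)], n)
        A_hat = add(A_hat, outer(score(d, z), score(d, z)), weight)
        # Minus the Hessian of log p_d(z) is p(1-p) x x^T with x = (z, 1).
        B_hat = add(B_hat, outer((z, 1), (z, 1)), weight * p_hat[z] * (1 - p_hat[z]))
    A_hat = add(A_hat, outer(mean, mean), -F(n_d, n))

B_inv = inv2(B_hat)
sandwich = matmul(matmul(B_inv, A_hat), B_inv)
var_sandwich = sandwich[0][0] / n   # slope entry of B^{-1} A B^{-1} / n
var_naive = B_inv[0][0] / n         # slope entry of B^{-1} / n
woolf = sum(F(1, c) for c in counts.values())
print(var_sandwich, var_naive, woolf)
```

All three printed values are $5/56$.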
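3.2 Nonidentifiability of Parameters Not Involved in Odds Ratios

For the multiplicative intercept model
$$P(d\mid z)=\frac{\exp\{\alpha_d+Q_d(z,\beta)\}}{\sum_{l=0}^{r}\exp\{\alpha_l+Q_l(z,\beta)\}}\qquad(d=0,\ldots,r),$$
with $\alpha_0=0$ and $Q_0(z,\beta)=0$ for all $z$, the prospective pseudomodel (3.1) of Section 3.1 reduces to
$$p^{\phi}_d(z)=\frac{\exp\{(\rho_d+\alpha_d)+Q_d(z,\beta)\}}{\sum_{l=0}^{r}\exp\{(\rho_l+\alpha_l)+Q_l(z,\beta)\}}\qquad(d=0,\ldots,r).$$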
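Because $\rho_d$ and $\alpha_d$ enter the reduced pseudomodel only through their sum, any shift in $\alpha_d$ can be offset by $\rho_d$. A minimal Python check, assuming a hypothetical linear choice $Q_d(z,\beta)=\beta_dz$ and arbitrary parameter values:

```python
import math


def pseudo_probs(alpha, rho, beta, z):
    """Reduced pseudomodel p_d(z), d = 0..r, for the multiplicative intercept model.

    Q_d(z, beta) = beta_d * z is a hypothetical choice made only for illustration;
    index 0 is the reference category, so alpha[0] = rho[0] = beta[0] = 0.
    """
    eta = [(r + a) + b * z for a, r, b in zip(alpha, rho, beta)]
    top = max(eta)                       # subtract the max for numerical stability
    w = [math.exp(e - top) for e in eta]
    total = sum(w)
    return [x / total for x in w]


beta = [0.0, 0.8, -0.4]
z = 1.5
# Two different (alpha, rho) pairs with the same sums rho_d + alpha_d = (0, 1.2, -0.2).
p_a = pseudo_probs([0.0, 1.0, -0.5], [0.0, 0.2, 0.3], beta, z)
p_b = pseudo_probs([0.0, 0.0, 0.0], [0.0, 1.2, -0.2], beta, z)
print([round(v, 6) for v in p_a])
print([round(v, 6) for v in p_b])
```

The two printed probability vectors coincide, so case-control data carrying information only through $p^{\phi}_d(z)$ cannot separate $\alpha_d$ from $\rho_d$.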


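Here, $\alpha_1,\ldots,\alpha_r$ are not separately estimable from $\rho_1,\ldots,\rho_r$. In fact, $\alpha_1,\ldots,\alpha_r$ are not identifiable with case-control study data. I now present a general theorem addressing this nonidentifiability issue.

First I write $P_{dz}(\theta)$ as
$$P_{dz}(\theta)=\frac{\zeta_d(\theta)\,\xi_{dz}(\theta)}{\sum_{l}\zeta_l(\theta)\,\xi_{lz}(\theta)},$$
where $\zeta_d(\theta)=P_{d0}(\theta)/P_{00}(\theta)$ and $\xi_{dz}(\theta)=\{P_{dz}(\theta)/P_{0z}(\theta)\}\{P_{d0}(\theta)/P_{00}(\theta)\}^{-1}$ $(d=0,\ldots,r)$. Note that $\zeta_0(\theta)=\xi_{0z}(\theta)=1$ for all $z$. Let $\theta_2$ denote the set of parameters in $\xi_{1z}(\theta),\ldots,\xi_{rz}(\theta)$, and let $\theta_1$ denote the set of parameters involved in $\zeta_1(\theta),\ldots,\zeta_r(\theta)$ but not in $\xi_{1z}(\theta),\ldots,\xi_{rz}(\theta)$, so that $\theta=(\theta_1,\theta_2)$ with $\theta_1$ and $\theta_2$ disjoint. Then $P_{dz}(\theta)$ can be written as
$$P_{dz}(\theta_1,\theta_2)=\frac{\zeta_d(\theta_1,\theta_2)\,\xi_{dz}(\theta_2)}{\sum_{l}\zeta_l(\theta_1,\theta_2)\,\xi_{lz}(\theta_2)}.$$
In this section, I will show that $\theta_1$ is not identifiable in case-control studies. More specifically, I will show that for a given $\theta_1$ and $m(z)$, there exist a $\theta_1'\ne\theta_1$ and a pdf $m'(z)$ such that the retrospective model with $P_{dz}(\theta_1',\theta_2)$ and $m'(z)$ equals the one with $P_{dz}(\theta_1,\theta_2)$ and $m(z)$. I will use the pseudomodel $p^{\phi}_d(z)$ in (3.1) of Section 3.1 to prove this. Recall that $\phi=(\theta,\rho)=(\theta_1,\theta_2,\rho)$, where $\rho=(\rho_1,\ldots,\rho_r)$. Using $p^{\phi}_d(z)$ and $m(z)$, a retrospective model can be written as
$$\Pr(z\mid d,\phi,m)=\frac{p^{\phi}_d(z)\,m(z)}{\int p^{\phi}_d(z)\,m(z)\,\mu(dz)}=\frac{p^{\phi}_d(z)\,m(z)}{\Pr(D=d\mid\phi,m)},$$
since $\int p^{\phi}_d(z)\,m(z)\,\mu(dz)=\Pr(D=d\mid\phi,m)$. Then the following lemma holds.

Lemma 3.2. For any $\theta$, $\rho=0$, $m(z)$, $0<$ [...]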

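[...] and $\Pr(D=d\mid\theta,\rho',m')=\varphi_d$ $(d=0,\ldots,r)$.

Proof. Let $\rho'_d=\log\{(\varphi_dq_d^{-1})/(\varphi_0q_0^{-1})\}$ $(d=0,\ldots,r)$ with $q_d=\int P_{dz}(\theta)m(z)\,\mu(dz)$, and $m'(z)=m(z)\sum_{l=0}^{r}\varphi_lq_l^{-1}P_{lz}(\theta)$. First write
$$\Pr(z\mid d,\theta,\rho',m')=\frac{p^{\phi'}_d(z)\,m'(z)}{\int p^{\phi'}_d(z)\,m'(z)\,\mu(dz)},$$
letting $\phi'=(\theta,\rho')$. For the numerator,
$$p^{\phi'}_d(z)\,m'(z)=\Big[\frac{(\varphi_dq_d^{-1}/\varphi_0q_0^{-1})P_{dz}(\theta)}{\sum_{l=0}^{r}(\varphi_lq_l^{-1}/\varphi_0q_0^{-1})P_{lz}(\theta)}\Big]m(z)\sum_{l=0}^{r}\varphi_lq_l^{-1}P_{lz}(\theta)=\varphi_dq_d^{-1}P_{dz}(\theta)\,m(z).$$
Since $\int P_{dz}(\theta)m(z)\,\mu(dz)=q_d$,
$$\int p^{\phi'}_d(z)\,m'(z)\,\mu(dz)=\varphi_d.$$
From the last three displays,
$$\Pr(z\mid d,\theta,\rho',m')=P_{dz}(\theta)m(z)/q_d=\frac{P_{dz}(\theta)\,m(z)}{\int P_{dz}(\theta)\,m(z)\,\mu(dz)}=\Pr(z\mid d,\theta,\rho=0,m),$$
since $p^{(\theta,\rho=0)}_d(z)=P_{dz}(\theta)$. Also, $\Pr(D=d\mid\theta,\rho',m')=\int p^{\phi'}_d(z)\,m'(z)\,\mu(dz)=\varphi_d$.

Noting that $p^{(\theta,\rho=0)}_d(z)=P_{dz}(\theta)$, one sees from the above lemma that the combination of $p^{\phi'}_d(z)$ and $m'(z)$ produces the same retrospective model as $P_{dz}(\theta)$ and $m(z)$. In other words,
$$\frac{p^{\phi'}_d(z)\,m'(z)}{\int p^{\phi'}_d(z)\,m'(z)\,\mu(dz)}=\frac{P_{dz}(\theta)\,m(z)}{\int P_{dz}(\theta)\,m(z)\,\mu(dz)}.\tag{$\dagger$}$$
Now I will show that there exists a $\theta_1'\ne\theta_1$ such that
$$P_{dz}(\theta_1',\theta_2)=p^{\phi'}_d(z).\tag{$\ddagger$}$$
Once it is proved that ($\ddagger$) holds for some $\theta_1'$, it follows from ($\dagger$) that
$$\frac{P_{dz}(\theta_1',\theta_2)\,m'(z)}{\int P_{dz}(\theta_1',\theta_2)\,m'(z)\,\mu(dz)}=\frac{P_{dz}(\theta_1,\theta_2)\,m(z)}{\int P_{dz}(\theta_1,\theta_2)\,m(z)\,\mu(dz)},$$
which implies that $\theta_1$ is not identifiable in the retrospective model, and accordingly in case-control studies, when $m(z)$ is unknown.

To prove the existence of $\theta_1'$ satisfying ($\ddagger$), I write $p^{\phi'}_d(z)$ as
$$p^{\phi'}_d(z)=\frac{\exp(\rho'_d)\,\zeta_d(\theta_1,\theta_2)\,\xi_{dz}(\theta_2)}{\sum_{l}\exp(\rho'_l)\,\zeta_l(\theta_1,\theta_2)\,\xi_{lz}(\theta_2)},$$
using the expression for $P_{dz}(\theta)$ in terms of $\zeta_d$ and $\xi_{dz}$. If there exists some $\theta_1'\ne\theta_1$ such that $\zeta_d(\theta_1',\theta_2)=\exp(\rho'_d)\,\zeta_d(\theta_1,\theta_2)$ for $d=0,\ldots,r$, then ($\ddagger$) holds with this $\theta_1'$.

Lemma 3.3. Assume that $\zeta_d(\theta_1,\theta_2)$ is continuous in $\theta_1$ for any $\theta_2$, for $d=1,\ldots,r$. Then for any $\theta_1$ and $\theta_2$ there exist $\theta_1'\ne\theta_1$ and $0\le\varphi_d\le1$ $(d=0,\ldots,r)$ with $\sum_{d=0}^{r}\varphi_d=1$ such that
$$\zeta_d(\theta_1',\theta_2)=\exp(\rho'_d)\,\zeta_d(\theta_1,\theta_2),$$
where $\rho'_d=\log\{(\varphi_dq_d^{-1})/(\varphi_0q_0^{-1})\}$ for $d=0,\ldots,r$.

Proof. Since $\zeta_0(\theta_1,\theta_2)=1$ for all $\theta_1$ and $\theta_2$, the equation holds for $d=0$. For $d=1,\ldots,r$, without loss of generality, I assume that $\zeta_d(\theta_1,\theta_2)$ is a non-degenerate function of $\theta_1$ for $1\le d\le k$, for some $k$ $(1\le k\le r)$, and a degenerate function of $\theta_1$ for $k+1\le d\le r$. Since $\zeta_1(\theta_1,\theta_2)$ is a non-degenerate continuous function of $\theta_1$, there exists a $\theta_1^*$ such that
$$\zeta_1(\theta_1^*,\theta_2)\ne\zeta_1(\theta_1,\theta_2).$$
Then, for $2\le d\le k$, either $\zeta_d(\theta_1^*,\theta_2)=\zeta_d(\theta_1,\theta_2)$ or $\zeta_d(\theta_1^*,\theta_2)\ne\zeta_d(\theta_1,\theta_2)$. For $k+1\le d\le r$, $\zeta_d(\theta_1^*,\theta_2)=\zeta_d(\theta_1,\theta_2)$. Now if $\zeta_d(\theta_1^*,\theta_2)\ne\zeta_d(\theta_1,\theta_2)$, one can choose $\varphi_d/\varphi_0$ so that
$$\zeta_d(\theta_1^*,\theta_2)=\exp(\rho'_d)\,\zeta_d(\theta_1,\theta_2),$$
since $\rho'_d=\log\{(\varphi_dq_d^{-1})/(\varphi_0q_0^{-1})\}$ and $0<$ [...]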

3.3 A Prospective Likelihood Approach for General Models in Case-Control Studies Where Some Parameters Are Not Identified

Using the expression of $P_{dz}(\theta)$ in terms of $\zeta_d$ and $\xi_{dz}$ from Section 3.2, the pseudomodel $p^{\phi}_d(z)$ in (3.1) can be written as
$$p^{\phi}_d(z)=\frac{\exp(\rho_d)P_{dz}(\theta)}{\sum_{l=0}^{r}\exp(\rho_l)P_{lz}(\theta)}=\frac{\exp(\rho_d)\,\zeta_d(\theta_1,\theta_2)\,\xi_{dz}(\theta_2)}{\sum_{l}\exp(\rho_l)\,\zeta_l(\theta_1,\theta_2)\,\xi_{lz}(\theta_2)}=\frac{\exp(\delta_d)\,\xi_{dz}(\theta_2)}{\sum_{l}\exp(\delta_l)\,\xi_{lz}(\theta_2)}=p^{\chi}_d(z),$$
setting $\delta_d=\rho_d+\log\zeta_d(\theta_1,\theta_2)$ for $d=1,\ldots,r$, $\delta_0=0$, and $\chi=(\theta_2,\delta_1,\ldots,\delta_r)$. Now I rewrite $L_P$ in (3.2) of Section 3.1 as
$$L_P=\prod_{d=0}^{r}\prod_{i=1}^{n_d}p^{\chi}_d(z_{di}).$$
Denote the MLE of $\chi=(\theta_2,\delta_1,\ldots,\delta_r)$ by $\hat\chi=(\hat\theta_2,\hat\delta_1,\ldots,\hat\delta_r)$. If one can show
$$\sum_{d=0}^{r}\sum_{i=1}^{n_d}p^{\hat\chi}_m(z_{di})/n=n_m/n$$
for $m=1,\ldots,r$, then $\hat\theta_2$ is the MLE for $\theta_2$ based on the retrospective likelihood, by the same reasoning that showed $\hat\theta$ to be the MLE for $\theta$ based on the retrospective likelihood in Section 3.1. Noting that $\partial\log L_P/\partial\delta_m=n_m-\sum_{d=0}^{r}\sum_{i=1}^{n_d}p^{\chi}_m(z_{di})$ for $m=1,\ldots,r$, this equality holds at the MLE.

Consistency and asymptotic normality of $\hat\theta_2$ can be shown by slightly modifying Theorems 3.1 and 3.2 in Section 3.1. Also, by slightly modifying Theorem 3.3 in Section 3.1, it can be shown that the asymptotic covariance matrix of $\hat\theta_2$ can be consistently estimated by a submatrix of the inverse of the observed information matrix based on $L_P$.

Example 3.3 (Multiplicative intercept model, continued from Example 3.1).
$$p^{\chi}_d(z)=\frac{\exp\{\delta_d+Q_d(z,\beta)\}}{\sum_{l}\exp\{\delta_l+Q_l(z,\beta)\}},\qquad d=0,\ldots,r,$$
where $\delta_0=Q_0(z,\beta)=0$. Since $p^{\chi}_d(z)$ has the form of the multiplicative intercept model, the equivalence of inference based on the prospective and retrospective likelihoods holds for the odds ratio parameter $\beta$ under the multiplicative intercept model. This result extends Weinberg and Wacholder (1993), who considered this equivalence for discrete exposure variables.

Example 3.4 (Continuation ratio model with 3 disease states, continued from Example 3.2). Writing $\Delta(z)=1+\exp(\delta_1+\beta_1z)+\exp(\delta_2+\beta_2z)\{1+\exp(\alpha_1+\beta_1z)\}/\{1+\exp(\alpha_1)\}$,
$$p^{\chi}_0(z)=\frac{1}{\Delta(z)},\qquad p^{\chi}_1(z)=\frac{\exp(\delta_1+\beta_1z)}{\Delta(z)},\qquad p^{\chi}_2(z)=\frac{\exp(\delta_2+\beta_2z)\{1+\exp(\alpha_1+\beta_1z)\}/\{1+\exp(\alpha_1)\}}{\Delta(z)}.$$
The MLEs for $\alpha_1,\beta_1,\beta_2$ from $p^{\chi}_d(z)$ are the same as those from the retrospective model, and they are consistent. Also, $100(1-\alpha)\%$ confidence regions for $\alpha_1,\beta_1,\beta_2$ based on $p^{\chi}_d(z)$ have $100(1-\alpha)\%$ asymptotic coverage probability.

This chapter considers multi-category case-control data where the exposure variables can be discrete, continuous, or mixed. My findings generalize the work of Scott and Wild (2001) and coincide with Chen (2003) when some of the parameters are not identified. However, unlike Scott and Wild (2001) and Chen (2003), I have taken a new approach to proving consistency and asymptotic normality, which gives a relatively simple expression for the asymptotic covariance matrix of the estimators of the prospective model parameters under the correct retrospective model. Applications include the multiplicative intercept model and the continuation ratio model.
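The construction in the proof of Lemma 3.2 (Section 3.2) can be verified numerically when $z$ takes finitely many values, so that the integrals become sums. The sketch below uses a hypothetical table for $P_{dz}(\theta)$ with $r=1$, a hypothetical $m(z)$, and target proportions $\varphi_d$. It builds $m'(z)$ and $p^{\phi'}_d(z)$ as in the proof, using $\exp(\rho'_d)=(\varphi_dq_d^{-1})/(\varphi_0q_0^{-1})$ directly so that exact rational arithmetic applies. It then checks that the retrospective model is unchanged while $\Pr(D=d)$ becomes $\varphi_d$.

```python
from fractions import Fraction as F

# Hypothetical prospective model P[d][z] = P_{dz}(theta); columns sum to 1 over d.
P = [[F(9, 10), F(7, 10), F(1, 2)],   # d = 0
     [F(1, 10), F(3, 10), F(1, 2)]]   # d = 1
m = [F(1, 2), F(3, 10), F(1, 5)]      # exposure pmf m(z), z = 0, 1, 2
phi = [F(1, 4), F(3, 4)]              # target Pr(D = d) under (rho', m')

Z, D = range(len(m)), range(len(P))
q = [sum(P[d][z] * m[z] for z in Z) for d in D]        # q_d
w = [phi[d] / q[d] for d in D]                          # exp(rho'_d) = w[d] / w[0]

# m'(z) = m(z) * sum_l phi_l q_l^{-1} P_{lz}(theta)
m_new = [m[z] * sum(w[l] * P[l][z] for l in D) for z in Z]
# p^{phi'}_d(z) with rho'_d = log(w[d] / w[0]); the common factor w[0] cancels
p_new = [[w[d] * P[d][z] / sum(w[l] * P[l][z] for l in D) for z in Z] for d in D]


def retrospective(prob, pmf, d):
    """Pr(z | d) = prob[d][z] pmf[z] / sum_z prob[d][z] pmf[z]."""
    joint = [prob[d][z] * pmf[z] for z in Z]
    total = sum(joint)
    return [x / total for x in joint]


for d in D:
    print(d, retrospective(P, m, d) == retrospective(p_new, m_new, d),
          sum(p_new[d][z] * m_new[z] for z in Z))
print(sum(m_new))
```

The loop prints `True` for both disease states with marginal probabilities $1/4$ and $3/4$, and $m'(z)$ sums to 1.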


CHAPTER 4
SEMIPARAMETRIC MIXTURE APPROACH IN CASE-CONTROL STUDIES WITH MEASUREMENT ERROR IN COVARIATES FOR GENERAL MODELS

Anderson (1972) and Prentice and Pyke (1979) showed that, for odds ratio parameters under the logistic regression model, inference based on a prospective likelihood is equivalent to inference based on a retrospective one. Roeder, Carroll, and Lindsay (1996) extended Anderson (1972) and Prentice and Pyke (1979) to the case where some covariates are measured with error. In such situations, case-control data are divided into two groups. In one set of data (the complete data set), exposure and surrogate variables are measured given disease status. In the other set (the reduced data set), exposure variables are not measured, but surrogate variables are measured given disease status. They write the retrospective likelihood as the ratio of a joint likelihood to the marginal pdf of disease status. The joint likelihood is a product of the prospective model (the conditional distribution of disease status given exposure), the model for the surrogate variable (the conditional distribution of the surrogate variable given exposure and disease status), the empirical distribution of the exposure, and the mixture model (the integral, with respect to the exposure, of the product of the prospective model, the model for the surrogate variable, and the empirical distribution of the exposure). They showed that, for the odds ratio parameter in the logistic regression model, the MLE based on the joint likelihood is equivalent to that based on the retrospective likelihood.

Murphy and van der Vaart (2001) proved the consistency and asymptotic normality of the MLE for the odds ratio parameters under certain conditions. In this chapter, I extend Roeder, Carroll, and Lindsay (1996) to general models. I present a pseudo joint likelihood approach that produces the MLE from the retrospective likelihood for the parameters in odds ratios for general models with multiple disease states. The consistency of the MLE is also studied under certain conditions.


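4.1 A Joint Likelihood Approach for General Models in Case-Control Studies with Errors in Covariates

Suppose that two sets of data are obtained in a case-control study. Let $Z$ denote an exposure vector and $D$ $(D=0,\ldots,r)$ a disease status. Also, let $W$ denote a surrogate covariate for $Z$. In one set (complete data), $W$ and $Z$ are observed given $D$. In the other set (reduced data), only $W$ is observed given $D$. Let $w^C_{di}$ and $z^C_{di}$ denote the $i$th individual's values of $W$ and $Z$ in the group $\{D=d\}$ $(d=0,\ldots,r;\ i=1,\ldots,n_d)$ in the complete data set, and let $w^R_{dk}$ denote the $k$th individual's value of $W$ in the group $\{D=d\}$ $(d=0,\ldots,r;\ k=1,\ldots,m_d)$ in the reduced data set. I assume that $d\mid z\sim f(d\mid z,\theta)$, $w\mid z,d\sim f(w\mid z,d;\tau)$, and $z\sim m(z)$, which is left unspecified; $f(d\mid z,\theta)$ and $f(w\mid z,d;\tau)$ may be any parametric models. Let $s_d=n_d+m_d$ $(d=0,\ldots,r)$, $n=\sum_{d=0}^{r}n_d$, $m=\sum_{d=0}^{r}m_d$, and $s=\sum_{d=0}^{r}s_d$.

The retrospective likelihood is given by
$$L_R(\theta,\tau,m)=\prod_{d=0}^{r}\prod_{i=1}^{n_d}\frac{f(w^C_{di}\mid d,z^C_{di},\tau)\,f(d\mid z^C_{di},\theta)\,m(z^C_{di})}{q_d}\ \prod_{d=0}^{r}\prod_{k=1}^{m_d}\frac{\int f(w^R_{dk}\mid d,z,\tau)\,f(d\mid z,\theta)\,m(z)\,\mu(dz)}{q_d},\tag{4.1}$$
where $q_d=\int f(d\mid z,\theta)\,m(z)\,\mu(dz)$. In order to address the nonidentifiability of certain parameters as in Chapter 3, I proceed as follows. Let $\zeta_d(\theta)=f(d\mid0,\theta)/f(0\mid0,\theta)$ and $\xi_{dz}(\theta)=\{f(d\mid z,\theta)/f(0\mid z,\theta)\}\{f(d\mid0,\theta)/f(0\mid0,\theta)\}^{-1}$ $(d=0,\ldots,r)$. Note that $\zeta_0(\theta)=\xi_{0z}(\theta)=1$ for all $z$. Let $\theta=(\theta_1,\theta_2)$, where $\theta_2$ denotes the set of parameters in $\xi_{1z}(\theta),\ldots,\xi_{rz}(\theta)$, while $\theta_1$ denotes the set of parameters included in $\zeta_1(\theta),\ldots,\zeta_r(\theta)$ but not in $\xi_{1z}(\theta),\ldots,\xi_{rz}(\theta)$. Then
$$f(d\mid z,\theta_1,\theta_2)=\frac{\zeta_d(\theta_1,\theta_2)\,\xi_{dz}(\theta_2)}{\sum_{l=0}^{r}\zeta_l(\theta_1,\theta_2)\,\xi_{lz}(\theta_2)}.$$
In Section 3.2, I showed that there exist $\theta_1'$ and $m'(z)$ such that
$$\frac{f(d\mid z,\theta_1',\theta_2)\,m'(z)}{\int f(d\mid z,\theta_1',\theta_2)\,m'(z)\,\mu(dz)}=\frac{f(d\mid z,\theta_1,\theta_2)\,m(z)}{\int f(d\mid z,\theta_1,\theta_2)\,m(z)\,\mu(dz)}.\tag{4.2}$$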
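The decomposition of $f(d\mid z,\theta)$ into $\zeta_d$ and $\xi_{dz}$ can be illustrated with a hypothetical baseline-category logistic model $f(d\mid z,\theta)\propto\exp(\alpha_d+\beta_dz)$ with $\alpha_0=\beta_0=0$, chosen only for illustration. In that case $\zeta_d=\exp(\alpha_d)$ and $\xi_{dz}=\exp(\beta_dz)$, so the intercepts play the role of $\theta_1$ and the slopes that of $\theta_2$. The Python sketch computes $\zeta_d$ and $\xi_{dz}$ directly from their definitions and rebuilds $f$ from them.

```python
import math

alpha = [0.0, -1.0, 0.5]   # hypothetical intercepts (theta_1), alpha_0 = 0
beta = [0.0, 0.7, -0.3]    # hypothetical slopes (theta_2), beta_0 = 0


def f(d, z):
    """Hypothetical prospective model f(d | z, theta), d = 0, 1, 2."""
    w = [math.exp(a + b * z) for a, b in zip(alpha, beta)]
    return w[d] / sum(w)


def zeta(d):
    """zeta_d(theta) = f(d | 0, theta) / f(0 | 0, theta)."""
    return f(d, 0.0) / f(0, 0.0)


def xi(d, z):
    """xi_dz(theta) = {f(d | z) / f(0 | z)} / zeta_d(theta)."""
    return (f(d, z) / f(0, z)) / zeta(d)


def rebuilt(d, z):
    """zeta_d xi_dz / sum_l zeta_l xi_lz, which should reproduce f(d | z)."""
    return zeta(d) * xi(d, z) / sum(zeta(l) * xi(l, z) for l in range(len(alpha)))


z = 2.0
for d in range(3):
    print(d, round(f(d, z), 6), round(rebuilt(d, z), 6),
          round(zeta(d), 6), round(math.exp(alpha[d]), 6))
```

Each printed row shows $f(d\mid z)$ equal to its rebuilt value and $\zeta_d$ equal to $\exp(\alpha_d)$.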


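From (4.2), one can see that $L_R(\theta_1,\theta_2,\tau,m)$ in (4.1) is equal to $L_R(\theta_1',\theta_2,\tau,m')$. Hence $\theta_1$ is not identified.

Next let $P_{dz}(\theta)=f(d\mid z,\theta)$ and $p^{\tau}_d(w\mid z)=f(w\mid z,d;\tau)$. Also recall that $p^{\chi}_d(z)$ was defined in Section 3.3 as
$$p^{\chi}_d(z)=\frac{\exp(\delta_d)\,\xi_{dz}(\theta_2)}{\sum_{l}\exp(\delta_l)\,\xi_{lz}(\theta_2)},$$
where $\delta_0=0$ and $\chi=(\theta_2,\delta_1,\ldots,\delta_r)$. Define a joint likelihood as
$$L_J(\chi,\tau,h)=\prod_{d}\prod_{i}p^{\tau}_d(w^C_{di}\mid z^C_{di})\,p^{\chi}_d(z^C_{di})\,h(z^C_{di})\ \prod_{d}\prod_{k}\int p^{\tau}_d(w^R_{dk}\mid z)\,p^{\chi}_d(z)\,h(z)\,\mu(dz),$$
where $h(z)$ is any nonparametric pdf of $z$.

Theorem 4.1. The MLEs for $\theta_2$ and $\tau$ based on $L_J(\chi,\tau,h)$ are equivalent to those based on $L_R(\theta_1,\theta_2,\tau,m)$ in (4.1).

Proof. Write $\delta_d=\log[\{(s_dq_d^{-1})/(s_0q_0^{-1})\}\,\zeta_d(\theta_1,\theta_2)]$ and $q_d=\int P_{dz}(\theta)m(z)\,\mu(dz)$ for $d=0,\ldots,r$, and $h(z)=m(z)\sum_{l=0}^{r}(s_l/s)\,q_l^{-1}P_{lz}(\theta)$. First note that
$$L_R(\theta_1,\theta_2,\tau,m)=L_J(\chi,\tau,h)\Big/\prod_{d=0}^{r}(s^{-1}s_d)^{s_d},$$
and that
$$\int p^{\chi}_d(z)\,h(z)\,\mu(dz)=s^{-1}s_d,\qquad d=0,\ldots,r.\tag{4.3}$$
Following Prentice and Pyke (1979), if the usual MLEs for $\chi$ and $\tau$ and a nonparametric MLE for $h(z)$ based on $L_J$ satisfy the relationship in (4.3), then the MLEs for $\theta_2$ and $\tau$ equal those based on $L_R$. Write
$$\ell_J=\log L_J(\chi,\tau,h)=\sum_{d}\sum_{i}\log\{p^{\tau}_d(w^C_{di}\mid z^C_{di})\,p^{\chi}_d(z^C_{di})\,h(z^C_{di})\}+\sum_{d}\sum_{k}\log\int p^{\tau}_d(w^R_{dk}\mid z)\,p^{\chi}_d(z)\,h(z)\,\mu(dz).\tag{4.4}$$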


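Suppose that $\{z_1,\ldots,z_K\}$ denotes the observed covariates. Define $h_k=h(z_k)$ $(k=1,\ldots,K-1)$ and $h_K=1-\sum_{k=1}^{K-1}h_k$. Since a nonparametric MLE for $h(z)$ is a discrete density having mass concentrated at $\{z_1,\ldots,z_K\}$, I can substitute $h_1,\ldots,h_{K-1}$ for $h(z)$ in the score equations to obtain the semiparametric MLEs (see Scott and Wild 1997, p. 67; Roeder, Carroll, and Lindsay 1996). Then the score equations based on $\ell_J$ are given by

i) $\displaystyle\frac{\partial\ell_J}{\partial\delta_j}=\sum_{d}\sum_{i}\{I(j=d)-p^{\chi}_j(z^C_{di})\}+\sum_{d}\sum_{k}\Big\{I(j=d)-\frac{\sum_{m'=1}^{K}p^{\tau}_d(w^R_{dk}\mid z_{m'})\,p^{\chi}_d(z_{m'})\,p^{\chi}_j(z_{m'})\,h_{m'}}{\sum_{m=1}^{K}p^{\tau}_d(w^R_{dk}\mid z_m)\,p^{\chi}_d(z_m)\,h_m}\Big\}=0\quad(j=1,\ldots,r),$

ii) $\displaystyle\frac{\partial\ell_J}{\partial\tau}=\sum_{d}\sum_{i}\frac{\frac{\partial}{\partial\tau}p^{\tau}_d(w^C_{di}\mid z^C_{di})}{p^{\tau}_d(w^C_{di}\mid z^C_{di})}+\sum_{d}\sum_{k}\frac{\sum_{m'=1}^{K}\{\frac{\partial}{\partial\tau}p^{\tau}_d(w^R_{dk}\mid z_{m'})\}\,p^{\chi}_d(z_{m'})\,h_{m'}}{\sum_{m=1}^{K}p^{\tau}_d(w^R_{dk}\mid z_m)\,p^{\chi}_d(z_m)\,h_m}=0,$

iii) $\displaystyle\frac{\partial\ell_J}{\partial\theta_2}=\sum_{d}\sum_{i}\Big[\frac{\frac{\partial}{\partial\theta_2}\xi_{dz^C_{di}}(\theta_2)}{\xi_{dz^C_{di}}(\theta_2)}-\frac{\sum_{l=0}^{r}\exp(\delta_l)\frac{\partial}{\partial\theta_2}\xi_{lz^C_{di}}(\theta_2)}{\sum_{l=0}^{r}\exp(\delta_l)\,\xi_{lz^C_{di}}(\theta_2)}\Big]+\sum_{d}\sum_{k}\frac{\sum_{u}p^{\tau}_d(w^R_{dk}\mid z_u)\Big[\frac{\exp(\delta_d)\frac{\partial}{\partial\theta_2}\xi_{dz_u}(\theta_2)}{\sum_{l}\exp(\delta_l)\,\xi_{lz_u}(\theta_2)}-\frac{\exp(\delta_d)\,\xi_{dz_u}(\theta_2)\sum_{l}\exp(\delta_l)\frac{\partial}{\partial\theta_2}\xi_{lz_u}(\theta_2)}{\{\sum_{l}\exp(\delta_l)\,\xi_{lz_u}(\theta_2)\}^2}\Big]h_u}{\sum_{m}p^{\tau}_d(w^R_{dk}\mid z_m)\,p^{\chi}_d(z_m)\,h_m}=0,$

iv) $\displaystyle\frac{\partial\ell_J}{\partial h_j}=\sum_{d}\sum_{i}\{h_j^{-1}I(z^C_{di}=z_j)-h_K^{-1}I(z^C_{di}=z_K)\}+\sum_{d}\sum_{k}\frac{p^{\tau}_d(w^R_{dk}\mid z_j)\,p^{\chi}_d(z_j)-p^{\tau}_d(w^R_{dk}\mid z_K)\,p^{\chi}_d(z_K)}{\sum_{m}p^{\tau}_d(w^R_{dk}\mid z_m)\,p^{\chi}_d(z_m)\,h_m}=0\quad(j=1,\ldots,K-1).$

Let the solutions of the above score equations be $\hat\chi=(\hat\theta_2,\hat\delta_1,\ldots,\hat\delta_r)$, $\hat\tau$, and $\hat h_k$ $(k=1,\ldots,K-1)$. Setting
$$t(z)=\sum_{d}\sum_{k}\frac{p^{\tau}_d(w^R_{dk}\mid z)\,p^{\chi}_d(z)}{\sum_{m}p^{\tau}_d(w^R_{dk}\mid z_m)\,p^{\chi}_d(z_m)\,h_m}$$
and letting $n_{z_m}$ be the number of occurrences of $z_m$ in the complete data set, one can simplify $\partial\ell_J/\partial\delta_j=0$ $(j=1,\ldots,r)$ as
$$s_j-\sum_{d}\sum_{i}p^{\chi}_j(z^C_{di})-\sum_{m'}t(z_{m'})\,p^{\chi}_j(z_{m'})\,h_{m'}=0,\tag{4.5}$$
and $\partial\ell_J/\partial h_j=0$ $(j=1,\ldots,K-1)$ as
$$h_j^{-1}n_{z_j}-h_K^{-1}n_{z_K}+t(z_j)-t(z_K)=0.\tag{4.6}$$
The equations in (4.6) and $\sum_{j=1}^{K}t(z_j)h_j=m$ identify $t(z_j)=s-h_j^{-1}n_{z_j}$, $j=1,\ldots,K$. Substituting $t(z_1),\ldots,t(z_K)$ into (4.5), one gets
$$\sum_{m=1}^{K}p^{\hat\chi}_j(z_m)\,\hat h_m=s^{-1}s_j\qquad(j=0,\ldots,r),$$
since $\sum_{d}\sum_{i}p^{\hat\chi}_j(z^C_{di})=\sum_{m'}n_{z_{m'}}p^{\hat\chi}_j(z_{m'})$.

4.2 Consistency of $\hat\theta_2$

In this subsection, I prove that $\hat\theta_2$ is a consistent estimator of $\theta_2$ when $n_d=m_d$ for $d=0,\ldots,r$. The observations in the complete data set are $(w^C_1,z^C_1,d^C_1),\ldots,(w^C_n,z^C_n,d^C_n)$, where the first $n_0$ observations have $D=0$, the next $n_1$ observations have $D=1$, and so on, so that $(w^C_i,z^C_i,d^C_i)$ has $D=d$ if $n_{d-1}<$ [...]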

Theorem 4.2. $\hat\theta_2$ and $\hat\tau$ are consistent estimators of $\theta^0_2$ and $\tau^0$. In addition, $\hat h(z)\xrightarrow{d}h_0(z)=m_0(z)\sum_{l=0}^{r}\lambda_l(q^0_l)^{-1}P_{lz}(\theta^0)$.

Proof. Let $\tilde h(z)$ be the empirical distribution of $z^C_1,\ldots,z^C_n$. Then
$$n^{-1}\sum_{i=1}^{n}\log\hat h(z^C_i)\le n^{-1}\sum_{i=1}^{n}\log\tilde h(z^C_i).$$
Since $\hat\chi,\hat\tau,\hat h$ are the MLEs from $L_J$, one gets
$$\begin{aligned}&n^{-1}\sum_{i=1}^{n}\log\{p^{\hat\tau}_{d_i}(w^C_i\mid z^C_i)\,p^{\hat\chi}_{d_i}(z^C_i)\,\hat h(z^C_i)\}+n^{-1}\sum_{i=1}^{n}\log\Big\{\int p^{\hat\tau}_{d_i}(w^R_i\mid z)\,p^{\hat\chi}_{d_i}(z)\,\hat h(z)\,\mu(dz)\Big\}\\&\quad\ge n^{-1}\sum_{i}\log\{p^{\tau^0}_{d_i}(w^C_i\mid z^C_i)\,p^{\chi^0}_{d_i}(z^C_i)\,\tilde h(z^C_i)\}+n^{-1}\sum_{i}\log\Big\{\int p^{\tau^0}_{d_i}(w^R_i\mid z)\,p^{\chi^0}_{d_i}(z)\,\tilde h(z)\,\mu(dz)\Big\}.\end{aligned}$$
From the two preceding displays, one gets
$$\begin{aligned}&n^{-1}\sum_{i=1}^{n}\log\{p^{\hat\tau}_{d_i}(w^C_i\mid z^C_i)\,p^{\hat\chi}_{d_i}(z^C_i)\}+n^{-1}\sum_{i=1}^{n}\log\Big\{\int p^{\hat\tau}_{d_i}(w^R_i\mid z)\,p^{\hat\chi}_{d_i}(z)\,\hat h(z)\,\mu(dz)\Big\}\\&\quad\ge n^{-1}\sum_{i}\log\{p^{\tau^0}_{d_i}(w^C_i\mid z^C_i)\,p^{\chi^0}_{d_i}(z^C_i)\}+n^{-1}\sum_{i}\log\Big\{\int p^{\tau^0}_{d_i}(w^R_i\mid z)\,p^{\chi^0}_{d_i}(z)\,\tilde h(z)\,\mu(dz)\Big\}.\end{aligned}\tag{$\star$}$$
I use ($\star$) as the starting point of a consistency proof. Noting that $L_R=L_J/\prod_{d}(n_d/n)^{2n_d}$, the difference from an ordinary consistency proof for MLEs, such as those in Kiefer and Wolfowitz (1956) or Wald (1949), is the presence of $\tilde h(z)$ instead of $h_0(z)$. Since $\tilde h\to h_0$ by the WLLN,
$$\begin{aligned}&n^{-1}\sum_{i=1}^{n}\log\{p^{\hat\tau}_{d_i}(w^C_i\mid z^C_i)\,p^{\hat\chi}_{d_i}(z^C_i)\}+n^{-1}\sum_{i=1}^{n}\log\Big\{\int p^{\hat\tau}_{d_i}(w^R_i\mid z)\,p^{\hat\chi}_{d_i}(z)\,\hat h(z)\,\mu(dz)\Big\}\\&\quad\ge n^{-1}\sum_{i}\log\{p^{\tau^0}_{d_i}(w^C_i\mid z^C_i)\,p^{\chi^0}_{d_i}(z^C_i)\}+n^{-1}\sum_{i}\log\Big\{\int p^{\tau^0}_{d_i}(w^R_i\mid z)\,p^{\chi^0}_{d_i}(z)\,h_0(z)\,\mu(dz)\Big\}\end{aligned}$$
with probability going to 1. This implies that $\hat\chi\xrightarrow{p}\chi^0=(\theta^0_2,\delta^0_1,\ldots,\delta^0_r)$, $\hat\tau\xrightarrow{p}\tau^0$, and $\hat h\xrightarrow{p}h_0$ relative to the weak topology under $(\chi^0,\tau^0,h_0)$.

In this chapter, I study the estimation problem in a case-control study where some covariates are measured with error. I present a joint likelihood approach that provides the MLE from the retrospective likelihood for the parameters in odds ratios. My approach is applicable to general models with multiple disease states and general exposure variables. This work is an extension of Roeder, Carroll, and Lindsay (1996). The consistency of the MLE is shown under certain conditions. The asymptotic normality and consistency of the MLE under general conditions remain open questions. My approach can be applied to probit and continuation ratio models.

CHAPTER 5
SUMMARY AND FUTURE RESEARCH

The equivalence of inference based on the prospective and retrospective likelihoods has been studied both in a frequentist framework and in a Bayesian framework. Anderson (1972) and Prentice and Pyke (1979) showed that the equivalence holds for the odds ratio parameter under the logistic regression model in a frequentist framework. Seaman and Richardson (2004) studied this equivalence in a Bayesian framework. They showed that the marginal posterior distribution for odds ratio parameters based on the prospective likelihood is equivalent to the one based on the retrospective likelihood under certain priors. Ghosh, Zhang and Mukherjee (2006) and Staicu (2010) followed Seaman and Richardson (2004). However, they considered only the binary logistic regression model with multi-category discrete exposures. In Chapter 2, I show that the equivalence of the posterior distributions of odds ratios based on the prospective and retrospective likelihoods holds for general models. However, my work is still limited to models with discrete exposure vectors.

In order to prove this general posterior equivalence, I express the prospective model in terms of normalized odds (h) and a normalizing constant (a), and the retrospective model in terms of h and the exposure probability for the controls (g). Then, I show that the posterior distribution for the normalized odds h based on the prospective likelihood is equivalent to the one based on the retrospective likelihood for a certain class of priors. Since odds ratios equal ratios of normalized odds, the equivalence of the posterior distributions of odds ratios follows immediately. This posterior equivalence result is extended to stratified case-control studies.

In applications, researchers would choose to fit the prospective likelihood if the dimension of a, which corresponds to the number of disease states, is smaller than the dimension of g, which corresponds to the number of distinct vectors on which the exposure vector has positive mass, because in this case the prospective likelihood has fewer
parameters to estimate than the retrospective likelihood, and vice versa. Since the dimension of g increases geometrically as the number of exposure variables increases, the prospective likelihood would be the better choice most of the time.

In Chapter 3, I explore maximum likelihood estimation in case-control problems for a general model with multiple disease states and general exposure variables. My work is an extension of Scott and Wild (2001), who considered maximum likelihood estimation in case-control studies for a general model with binary disease states and general exposure variables. I assume a parametric model for $\Pr(d \mid z)$, where d denotes the disease state and z denotes an exposure vector, and leave the joint distribution of the exposure variables, denoted by $m(z)$, unspecified. Here, the parameter of interest is the parameter of the prospective model $\Pr(d \mid z)$, say $\theta$. By Bayes' rule, the retrospective model can be written as $\Pr(z \mid d) = \Pr(d \mid z)\,m(z) \big/ \int \Pr(d \mid z)\,m(z)\,dz$.

Due to the presence of $m(z)$, obtaining the maximum likelihood estimator for $\theta$ becomes computationally very difficult. To avoid this difficulty, I present a pseudolikelihood whose parameters are $\theta$ and $\rho$, where $\rho$ is a vector of new nuisance parameters with dimension $r - 1$ and r is the number of disease states. Essentially, this pseudolikelihood is a prospective likelihood for an imaginary population in which the retrospective probability is the same as that of the true population, but the population proportion of each disease state is set to its sampling fraction. I then show that the maximum likelihood estimator for $\theta$ based on the pseudolikelihood is equivalent to the one based on the retrospective likelihood. The resulting estimators are consistent for $\theta$ under the retrospective likelihood and asymptotically normally distributed. Also, a consistent estimator of the asymptotic variance-covariance matrix can be obtained from the inverse of the observed Fisher information matrix.

Chen (2003) found that parameters not involved in odds ratios are not identifiable in case-control studies; only parameters identified in the union of the odds ratio functions are identifiable. For example, the intercept parameters are not identifiable for the
multiplicative intercept model. The continuation ratio model has one unidentifiable parameter as well. Hence, if one uses the pseudolikelihood for these models, the observed Fisher information matrix will be singular, and thus the estimator of the variance-covariance matrix cannot be calculated. To avoid this problem, I replace the original pseudolikelihood with a modified pseudolikelihood that includes the odds ratios but not the nonidentifiable parameters. The MLE from the modified pseudolikelihood is then equivalent to that from the retrospective likelihood for the parameters in the odds ratios, and it is a consistent estimator of those parameters. A submatrix of the inverse of the observed Fisher information matrix based on the modified pseudolikelihood is a consistent estimator of the asymptotic variance-covariance matrix of the MLE for the parameters in the odds ratios under the retrospective likelihood. The modified pseudo model coincides with the baseline logit model proposed by Chen (2003).

For the multiplicative intercept model, the modified pseudo model is the same as the original prospective model. The equivalence of inferences for odds ratio parameters from the prospective and retrospective likelihoods under the multiplicative intercept model (Prentice and Pyke, 1979) therefore follows automatically from the equivalence of inferences based on the modified pseudo and retrospective likelihoods for odds ratio parameters.

In Chapter 4, I consider the case in which some covariates are measured with error in case-control studies. Two sets of data are collected. For one set (the reduced data set), the values of the surrogate variables (w) and the disease status (d) are recorded for each subject; for the other set (the complete data set), the values of the surrogate variables, the exposure variables (z), and the disease status are recorded. I assume that the conditional probability of d given z and the conditional probability of w given z and d are parametrically modeled, and denote them by $f(d \mid z, \theta)$ and $f(w \mid z, d, \tau)$, respectively. But I do not assume any parametric model for the marginal pdf of z, $m(z)$,
and use a nonparametric approach to estimate it. My setting follows that of Roeder, Carroll and Lindsay (1996), except that they assumed the logistic regression model for $f(d \mid z, \theta)$. They proposed a joint likelihood approach to provide the MLE for the odds ratio parameter under the binary logistic regression model. I extend their work to general models with multiple disease states. I divide $\theta$ into $(\theta_1, \theta_2)$, where $\theta_1$ denotes the parameters not in the odds ratios and $\theta_2$ the parameters in the odds ratios. I then present a pseudo joint likelihood approach that provides maximum likelihood estimators for $\theta_2$ and $\tau$ under the retrospective likelihood for arbitrary models $f(d \mid z, \theta)$ and $f(w \mid z, d, \tau)$.

Since I estimate $m(z)$ nonparametrically, my approach is semiparametric. Also, I use a mixture model for the joint pdf of the reduced data set. Hence my approach is a semiparametric mixture approach. In the semiparametric approach, the consistency and asymptotic normality of the MLE are very difficult to prove unless the parametric and nonparametric parts are completely separated. Murphy and van der Vaart (2001) proved the consistency and asymptotic normality of the MLE for $\theta_2$ for the binary logistic regression model. However, their consistency proof is limited to the situation where the sample sizes of the complete and reduced data sets are the same for each disease status. I prove the consistency of the MLE for $\theta_2$ for general models with multiple disease states under their conditions. The asymptotic normality of the MLE for $\theta_2$ is not proved in my dissertation. The consistency and asymptotic normality of the MLE for $\theta_2$ in general situations remain a topic for future research.

I also plan to study gene-environment interaction in case-control studies. Inferential problems regarding gene-environment interaction have a special feature in that genetic susceptibilities (G) and environmental exposures (E) are often assumed to be independent of each other in the underlying population. Hence the usual approaches applicable to general case-control data may not be applicable under the G-E independence assumption. Piegorsch et al. (1994) found that the multiplicative interaction parameter in the logistic regression model can be estimated by the odds ratio between G and
E among cases alone, assuming a rare disease and G-E independence. Also, this case-only estimator of the interaction is more efficient than the G-E interaction estimator obtained from fitting the prospective logistic regression to data with cases and controls.

Chatterjee and Carroll (2005) found that the equivalence of inference based on the prospective and retrospective likelihoods, which is the basis for using the prospective logistic regression model with case-control data, does not hold under G-E independence. The equivalence relies on the assumption that the pdf of the covariates is completely unspecified. However, under G-E independence, the pdf of the covariates factorizes into the product of a pdf of G and a pdf of E, and hence this assumption is violated.

Umbach and Weinberg (1997) generalized Piegorsch et al. (1994). They presented a constrained log-linear model approach that provides the maximum likelihood estimator of all the parameters of a logistic regression model, assuming a rare disease and categorical exposures. They also showed that, in a simple situation, their log-linear model approach produces the same estimator of the multiplicative interaction parameter as the case-only approach of Piegorsch et al. (1994) under the logistic regression model. Chatterjee and Carroll (2005) proposed a semiparametric approach to obtain the MLEs of the logistic regression model in general settings involving continuous exposures, non-rare diseases, and population stratification. They assumed that the pdf of G is parametric and left the pdf of E unspecified under the G-E independence assumption. Albert et al. (2001) and Mukherjee et al. (2008) showed that methods exploiting G-E independence may incur bias and result in inflated Type-I error rates if the assumption is violated. Mukherjee and Chatterjee (2008), Li and Conti (2009), and Mukherjee et al. (2010) proposed approaches that relax the G-E independence assumption using empirical Bayes, Bayesian model averaging, and proper full Bayesian methods, respectively.
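
To make the case-only idea of Piegorsch et al. (1994) concrete, the sketch below computes the G-E odds ratio among cases exactly for binary G and E that are independent in a hypothetical population with a logistic disease model. All coefficient values and the frequencies of G and E are illustrative assumptions. The point is only to show the case-only log odds ratio recovering the multiplicative interaction when the disease is rare, and drifting away when it is not.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def case_only_log_or(b0, bG, bE, bGE, pG=0.3, pE=0.4):
    """Exact log odds ratio between binary G and E among cases, for a
    hypothetical population where G and E are independent and
    logit Pr(D=1 | G, E) = b0 + bG*G + bE*E + bGE*G*E."""
    def p_case(g, e):
        # Pr(G=g, E=e, D=1) under G-E independence
        pg = pG if g else 1.0 - pG
        pe = pE if e else 1.0 - pE
        return pg * pe * expit(b0 + bG * g + bE * e + bGE * g * e)

    # The marginal frequencies of G and E cancel in this cross-product ratio.
    return math.log(p_case(1, 1) * p_case(0, 0) / (p_case(1, 0) * p_case(0, 1)))


b_GE = 0.7
rare = case_only_log_or(b0=-8.0, bG=0.5, bE=0.3, bGE=b_GE)    # rare disease
common = case_only_log_or(b0=0.0, bG=0.5, bE=0.3, bGE=b_GE)   # common disease
print(f"true interaction {b_GE:.3f}; case-only (rare) {rare:.3f}; case-only (common) {common:.3f}")
```

With baseline log odds of -8 the case-only value lands within about 0.001 of the interaction 0.7, while with baseline log odds of 0 it drops to roughly 0.13, because the rare-disease approximation expit(x) ≈ exp(x) no longer holds.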

There are still many statistical problems related to gene-environment interactions in case-control studies. For example, in a frequentist framework, only the logistic regression model has been studied so far. Also, many methods are based on the assumptions of a rare disease and categorical exposures. Although the approach of Chatterjee and Carroll (2005) does not assume a rare disease and can be applied to continuous exposures, it is restricted by the assumption that the pdf of G follows a parametric model. In a Bayesian framework, a full Bayesian approach with multiple G or continuous E has not yet been studied. In the future, I will work on general gene-environment interaction problems in both Bayesian and frequentist frameworks.
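
The imaginary population behind the Chapter 3 pseudolikelihood, described earlier in this chapter, can be checked numerically in the simplest case covered by Prentice and Pyke (1979): a binary disease with a logistic prospective model. The sketch below assumes a hypothetical three-point exposure, made-up coefficients, and made-up case and control sample sizes. It keeps Pr(z | d) from the source population, resets Pr(D = d) to the sampling fractions, and confirms that the log odds remain linear in z with the original slope, while only the intercept shifts, by log(n1/n0) - log(π1/π0). It illustrates the idea only and is not the dissertation's estimator.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical source population: exposure z in {0, 1, 2} with marginal pmf m(z)
# and a logistic prospective model Pr(D=1 | z) = expit(alpha + beta * z).
m = {0: 0.5, 1: 0.3, 2: 0.2}
alpha, beta = -4.0, 0.8


def pr_case(z):
    return expit(alpha + beta * z)


# Disease prevalence in the source population.
pi1 = sum(m[z] * pr_case(z) for z in m)
pi0 = 1.0 - pi1

# Retrospective distributions Pr(z | D=d), by Bayes' rule.
f1 = {z: m[z] * pr_case(z) / pi1 for z in m}
f0 = {z: m[z] * (1.0 - pr_case(z)) / pi0 for z in m}

# Imaginary population: same Pr(z | d), but Pr(D=d) set to the sampling fractions.
n1, n0 = 200, 400
lam1, lam0 = n1 / (n1 + n0), n0 / (n1 + n0)


def pseudo_log_odds(z):
    """log Pr*(D=1 | z) / Pr*(D=0 | z) in the imaginary population."""
    return math.log(lam1 * f1[z] / (lam0 * f0[z]))


slopes = [pseudo_log_odds(z + 1) - pseudo_log_odds(z) for z in (0, 1)]
shift = pseudo_log_odds(0) - alpha
expected_shift = math.log(n1 / n0) - math.log(pi1 / pi0)
print("slopes:", slopes, "intercept shift:", shift, "expected:", expected_shift)
```

Because the slope survives unchanged and only the intercept absorbs the sampling design, a prospective fit in the imaginary population targets the same odds ratio as the retrospective model, which is the intuition the pseudolikelihood generalizes to multiple disease states.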

REFERENCES
Agresti, A. (1984). Categorical data analysis. John Wiley & Sons.
Agresti, A. (2002). Categorical data analysis. 2nd ed. John Wiley & Sons.
Ahn, J., Mukherjee, B., Banerjee, M., and Cooney, K.A. (2009). Bayesian inference for the stereotype regression model: Application to a case-control study of prostate cancer. Statistics in Medicine 28, 3139-3157.
Ahn, J., Mukherjee, B., Gruber, S.B., and Sinha, S. (2010). Missing exposure data in stereotype regression model: application to matched case-control study with disease sub-classification. Biometrics. Article first published online Jun 16, 2010, doi:10.1111/j.1541-0420.2010.01453.x.
Albert, P.S., Ratnasinghe, D., Tangrea, J., and Wacholder, S. (2001). Limitations of the case-only design for identifying gene-environment interactions. American Journal of Epidemiology 154, 687-693.
Anderson, J.A. (1972). Separate sample logistic discrimination. Biometrika 59, 19-35.
Anderson, J.A. (1984). Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B 46, 1-40.
Bernstein, S. (1917). Theory of Probability. In Russian.
Breslow, N.E. and Cain, K.C. (1988). Logistic regression for two-stage case-control data. Biometrika 75, 11-20.
Breslow, N.E. and Chatterjee, N. (1999). Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis. Applied Statistics 48, 457-468.
Carroll, R.J., Gail, M.H., and Lubin, J.H. (1993). Case-control studies with errors in covariates. Journal of the American Statistical Association 88, 185-199.
Chatterjee, N. (2004). A two-stage regression model for epidemiological studies with multivariate disease classification data. Journal of the American Statistical Association 99, 127-138.
Chatterjee, N. and Carroll, R.J. (2005). Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies. Biometrika 92, 399-418.
Chen, H.Y. (2003). A note on prospective analysis of outcome-dependent samples. Journal of the Royal Statistical Society, Series B 65, 575-584.
Cochran, W.G. (1954). Some methods for strengthening the common χ² tests. Biometrics 10, 417-451.
Cornfield, J. (1951). A method of estimating comparative rates from clinical data: Applications to the cancer of the lung, breast and cervix. Journal of the National Cancer Institute 11, 1269-1275.
Cornfield, J., Gordon, T., and Smith, W.W. (1961). Quantal response curves for experimentally uncontrolled variables. Bulletin of the International Statistical Institute 38, 97.
Cosslett, S. (1981). Maximum likelihood estimators for choice-based samples. Econometrica 49, 1289-1316.
Diggle, P.J., Morris, S.E., and Wakefield, J.C. (2000). Point-source modelling using matched case-control data. Biostatistics 1, 89-105.
Fisher, R.A. (1935). The design of experiments (8th ed., 1966). Edinburgh: Oliver & Boyd.
Foutz, R.V. (1977). On the unique consistent solution to the likelihood equations. Journal of the American Statistical Association 72, 147-148.
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In Bayesian Statistics 4 (eds J.M. Bernardo, A.F.M. Smith, A.P. Dawid and J.O. Berger), 169-193. Oxford: Oxford University Press.
Ghosh, M. and Chen, M.-H. (2002). Bayesian inference for matched case-control studies. Sankhya, Series B 64, 107-127.
Ghosh, M., Zhang, L., and Mukherjee, B. (2006). Equivalence of posteriors in the Bayesian analysis of the multinomial-Poisson transformation. Metron - International Journal of Statistics LXIV, 19-28.
Godambe, V.P. (1960). An optimum property of regular maximum likelihood estimation. Annals of Mathematical Statistics 31, 1208-1211.
Greenland, S. (1994). Alternative models for ordinal logistic regression. Statistics in Medicine 13, 1665-1677.
Hsieh, D.A., Manski, C.F., and McFadden, D. (1985). Estimation of response probabilities from augmented retrospective observations. Journal of the American Statistical Association 80, 651-662.
Kagan, A. (2001). A note on the logistic link function. Biometrika 88, 599-601.
Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics 27, 887-906.
Kuss, O. (2006). On the estimation of the stereotype regression model. Computational Statistics and Data Analysis 50, 1877-1890.
Li, D. and Conti, D.V. (2009). Detecting gene-environment interactions using a combined case-only and case-control approach. American Journal of Epidemiology 169, 497-504.
Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22, 719.
Mukherjee, B., Ahn, J., Gruber, S.B., Ghosh, M., and Chatterjee, N. (2010). Case-control studies of gene-environment interaction: Bayesian design and analysis. Biometrics 66, 934-948.
Mukherjee, B. and Chatterjee, N. (2008). Exploiting gene-environment independence for analysis of case-control studies: An empirical Bayes-type shrinkage estimator to trade off between bias and efficiency. Biometrics 64, 685-694.
Mukherjee, B. and Liu, I. (2009). A note on bias due to fitting prospective multivariate generalized linear models to categorical outcomes ignoring retrospective sampling schemes. Journal of Multivariate Analysis 100, 459-472.
Murphy, S.A. and van der Vaart, A.W. (2001). Semiparametric mixtures in case-control studies. Journal of Multivariate Analysis 79, 1-32.
Phillips, A. and Holland, P.W. (1987). Estimators of the variance of the Mantel-Haenszel log-odds-ratio estimate. Biometrics 43, 425.
Piegorsch, W.W., Weinberg, C.R., and Taylor, J. (1994). Nonhierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Statistics in Medicine 13, 153-162.
Poynter, J.N., Gruber, S.B., Higgins, P.D., et al. (2005). Statins and the risk of colorectal cancer. New England Journal of Medicine 352, 2184-2192.
Prentice, R.L. and Pyke, R. (1979). Logistic disease incidence models and case-control studies. Biometrika 66, 403-411.
Rao, C.R. (1973). Linear statistical inference and its applications. 2nd ed. New York: Wiley.
Rice, K.M. (2004). Equivalence between conditional and mixture approaches to the Rasch model and matched case-control studies, with applications. Journal of the American Statistical Association 99, 510-522.
Rice, K.M. (2008). Equivalence between conditional and random-effects likelihoods for pair-matched case-control studies. Journal of the American Statistical Association 103, 385-396.
Robins, J., Breslow, N., and Greenland, S. (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics 42, 311.
Roeder, K., Carroll, R.J., and Lindsay, B.G. (1996). A semiparametric mixture approach to case-control studies with errors in covariables. Journal of the American Statistical Association 91, 722-732.
Satten, G.A. and Kupper, L.L. (1993). Inferences about exposure-disease associations using probability-of-exposure information. Journal of the American Statistical Association 88, 200-208.
Scott, A.J. and Wild, C.J. (1986). Fitting logistic models under case-control or choice based sampling. Journal of the Royal Statistical Society, Series B 48, 170-182.
Scott, A.J. and Wild, C.J. (1991). Fitting logistic models in stratified case-control studies. Biometrics 47, 497-510.
Scott, A.J. and Wild, C.J. (1997). Fitting regression models to case-control data by maximum likelihood. Biometrika 84, 57-71.
Scott, A.J. and Wild, C.J. (2001). Maximum likelihood for generalised case-control studies. Journal of Statistical Planning and Inference 96, 3-27.
Seaman, S.R. and Richardson, S. (2004). Equivalence of prospective and retrospective models in the Bayesian analysis of case-control studies. Biometrika 91, 15-25.
Staicu, A.M. (2010). On the equivalence of prospective and retrospective likelihood methods in case-control studies. Biometrika. Advance Access published September 16, 2010, doi:10.1093/biomet/asq054.
Umbach, D.M. and Weinberg, C.R. (1997). Designing and analysing case-control studies to exploit independence of genotype and exposure. Statistics in Medicine 16, 1731-1743.
Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Annals of Mathematical Statistics 20, 595-601.
Weinberg, C.R. and Wacholder, S. (1993). Prospective analysis of case-control data under general multiplicative-intercept risk models. Biometrika 80, 461-465.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1-25.

BIOGRAPHICAL SKETCH
Jihyun Song was born in 1974 in South Korea. She was the first child of her parents and has one younger brother. She received her Bachelor of Science in statistics from Seoul National University, South Korea, in 1997, and her Master of Science in statistics from Seoul National University in 1999. After working five and a half years as a statistician in industry in South Korea, she moved to the U.S. to pursue her Ph.D. degree in statistics at the University of Florida in August 2005. She received her Ph.D. in statistics from the University of Florida in August 2011.