Citation
Conditional Pseudo-Likelihood and Generalized Linear Mixed Model Methods to Adjust for Confounding Due to Cluster

Material Information

Title:
Conditional Pseudo-Likelihood and Generalized Linear Mixed Model Methods to Adjust for Confounding Due to Cluster
Creator:
Cai, Zhuangyu
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
Language:
english
Physical Description:
1 online resource (117 p.)

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Biostatistics
Committee Chair:
BRUMBACK,BABETTE A
Committee Co-Chair:
WU,SAMUEL SHANGWU
Committee Members:
LU,XIAOMIN
XU,XIAOHUI
Graduation Date:
8/9/2014

Subjects

Subjects / Keywords:
Age groups ( jstor )
Behavioral Risk Factor Surveillance System ( jstor )
Consistent estimators ( jstor )
Datasets ( jstor )
Estimation bias ( jstor )
Estimation methods ( jstor )
Estimators ( jstor )
Simulations ( jstor )
Standardization ( jstor )
Survey data ( jstor )
Biostatistics -- Dissertations, Academic -- UF
cluster -- confounding -- glmm -- pseudo-likelihood
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Biostatistics thesis, Ph.D.

Notes

Abstract:
This dissertation focuses on adjusting for confounding due to unmeasured cluster covariates using generalized linear mixed models (GLMMs). In social epidemiology, GLMMs are often used to model associations between health outcomes and predictors. Those predictors may include both individual-level factors, such as personal characteristics, and neighborhood-level factors, such as average neighborhood household income. Social epidemiology data are often collected using complex surveys. Three issues must be taken into account in order to obtain consistent estimates of the parameters of interest. First, the number of sampled neighborhoods is usually large, but the number of sampled individuals within each neighborhood is often small. Second, unequal sampling rates are often used to sample neighborhoods and individuals within each neighborhood. Third, neighborhood-level factors are unmeasured. If these three issues are not handled correctly, estimated coefficients can be seriously biased. In generalized linear mixed models, neighborhood-level effects can be handled using a random intercept, and individual-level coefficients can be consistently estimated by a pairwise conditional pseudo-likelihood (CPL) method. The first aim of this dissertation is to apply the CPL method to multinomial, count, and nonnegative continuous outcomes in order to estimate individual-level coefficients. When neighborhood factors are unmeasured and correlated with other covariates, estimators based on a generalized linear mixed model are biased. A between-within model has been proposed to avoid the bias. However, when sample sizes within neighborhoods are small and the number of neighborhoods is large, a between-within model approach to adjusting for confounding due to unmeasured cluster covariates is itself prone to bias. When neighborhood factors are unmeasured and sample sizes within each neighborhood are small, neighborhood sample means are often used in place of the true neighborhood means, which introduces extra variation into the models. In those situations, neighborhood-level coefficients are attenuated. The second aim of this dissertation is therefore to identify the source of bias in between-within models and to correct the attenuated neighborhood-level coefficients. The third aim of this dissertation is to use model-based standardization to account for unmeasured confounding due to cluster with complex survey data while estimating a marginal effect of an individual covariate. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2014.
Local:
Adviser: BRUMBACK,BABETTE A.
Local:
Co-adviser: WU,SAMUEL SHANGWU.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2016-08-31
Statement of Responsibility:
by Zhuangyu Cai.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Embargo Date:
8/31/2016
Classification:
LD1780 2014 ( lcc )


Full Text


Research Article

Received 10 October 2011, Accepted 27 August 2012, Published online 13 September 2012 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.5625

Conditional pseudolikelihood methods for clustered ordinal, multinomial, or count outcomes with complex survey data

Babette A. Brumback (a, *, †), Zhuangyu Cai (a), Zhulin He (a), Hao W. Zheng (a) and Amy B. Dailey (b)

In order to adjust individual-level covariate effects for confounding due to unmeasured neighborhood characteristics, we have recently developed conditional pseudolikelihood methods to estimate the parameters of a proportional odds model for clustered ordinal outcomes with complex survey data. The methods require sampling design joint probabilities for each within-neighborhood pair. In the present article, we develop a similar methodology for a baseline category logit model for clustered multinomial outcomes and for a loglinear model for clustered count outcomes. All of the estimators and asymptotic sampling distributions we present can be conveniently computed using standard logistic regression software for complex survey data, such as SAS PROC SURVEYLOGISTIC. We demonstrate validity of the methods theoretically and also empirically by using simulations. We apply the new method for clustered multinomial outcomes to data from the 2008 Florida Behavioral Risk Factor Surveillance System survey in order to investigate disparities in frequency of dental cleaning both unadjusted and adjusted for confounding by neighborhood. Copyright © 2012 John Wiley & Sons, Ltd.
Keywords: causal inference; complex survey data; composite likelihood; conditional likelihood; confounding; multinomial outcome; count outcome

(a) Department of Biostatistics, College of Public Health and Health Professions, College of Medicine, University of Florida, Gainesville, FL 32611, U.S.A.
(b) Department of Health Sciences, Gettysburg College, Gettysburg, PA 17325, U.S.A.
(*) Correspondence to: Babette A. Brumback, Department of Biostatistics, College of Public Health and Health Professions, College of Medicine, University of Florida, Gainesville, FL 32611, U.S.A.
(†) E-mail: brumback@ufl.edu

Copyright © 2012 John Wiley & Sons, Ltd. Statist. Med. 2013, 32 1325–1335

1. Introduction

Confounding of the effect of within-cluster covariates by unmeasured between-cluster variables is a common problem in many contexts, for example, in longitudinal data analysis and multi-center randomized clinical trials. Our interest in this problem stems from an application in social epidemiology, in which scientific interest is in estimating the effect of individual-level covariates (e.g., race/ethnicity) on an ordinal, multinomial, or count health behavior outcome (e.g., frequency of dental cleaning) in the presence of confounding by unmeasured neighborhood-level characteristics (e.g., at the level of zip codes, and including, e.g., socio-economic status, cultural factors, and availability of health services). For illustration in this article, we apply the methods for clustered multinomial outcomes to an analysis of the 2008 Florida Behavioral Risk Factor Surveillance System (BRFSS) survey data. The BRFSS conducts a representative survey of the non-institutionalized population in each state using disproportionate stratified sampling.

For data obtained with either simple random sampling or ordinary cluster sampling, models enlisting cluster-specific intercepts and estimation based on conditional maximum likelihood or generalized linear mixed model (GLMM) regression are the two primary approaches available for adjusting for confounding by cluster. Particularly when the within-cluster sample sizes are small, the GLMM approach must


correctly model the association between the random effect and the collection of within-cluster covariates in order to consistently estimate the effects of the within-cluster covariates. To this end, Neuhaus and Kalbfleisch [1] proposed including the cluster averages of within-cluster variables as covariates in the GLMM. Neuhaus and McCulloch [2] dubbed this approach the 'poor man's' alternative to conditional maximum likelihood methods. Brumback et al. [3] pointed out that the approach will be consistent when the random effect is a linear function of the cluster averages plus an independent Gaussian error term but typically not when the random effect is some other function of the within-cluster covariates, such as a linear function of their maximum. To avoid the problem of specifying a model for the random effect, one can turn to approaches based on conditional maximum likelihood when they are available.

Extending conditional maximum likelihood methods for complex survey data is not straightforward. Brumback et al. [4] considered applying conditional logistic regression to a pseudosample constructed by replicating observations according to integer-scaled versions of their sampling weights; their simulations documented that this approach can be badly inconsistent. Graubard and Korn [5] also showed with a simple example that the approach is inconsistent. He and Brumback [6] speculate that one factor leading to the inconsistency is the similarity between conditional logistic regression applied to a pseudosample constructed by replicating observations a large number of times and ordinary logistic regression that treats the cluster-specific intercepts as fixed effects; the authors proved an exact equivalence between those two methods for a simple matched pairs design and infinite replication. For a binary outcome, Graubard and Korn [5] generalized the method of Liang [7] to provide the first consistent conditional maximum pseudolikelihood estimator for complex survey data. Brumback and He [8] showed, using an idea of Breslow et al. [9] presented in Agresti [10], how the estimator and its sampling distribution are easily computed using standard software for complex survey data, such as SAS PROC SURVEYLOGISTIC.
Brumback, Dailey, and Zheng [11] then extended the approach to accommodate ordinal outcomes and complex survey data using a proportional odds model [10]; again, the estimators and sampling distributions are easily computed using SAS PROC SURVEYLOGISTIC. To our knowledge, even for noncomplex sampling, the approach of [11] is the first one based on conditional likelihood methods to be developed for the proportional odds model; review papers focusing on ordinal data [12, 13] imply that no such approach existed prior to 2005.

In the present paper, we review the previous approaches for clustered binary and ordinal outcomes, and we present new approaches for clustered multinomial or count outcomes and complex survey data. We model clustered multinomial outcomes using the baseline category logit model [10], and we use a loglinear model [10] for clustered count outcomes. We show that the methods for these different types of outcomes are, surprisingly, quite similar to one another and can all be implemented using simple programming with SAS PROC SURVEYLOGISTIC.

The paper is organized as follows. In Section 2, we give details of the motivating BRFSS example. In Section 3, we introduce the conditional pseudolikelihood methods for clustered ordinal, multinomial, or count outcomes and complex survey data. We also present the theory that validates these methods. In Section 4, we demonstrate validity of the new methods empirically using a simulation study. In Section 5, we present the analysis of the BRFSS data. We conclude with a discussion in Section 6.

2. Motivating Behavioral Risk Factor Surveillance System example

We are interested in investigating disparities in frequency of dental cleaning both unadjusted and adjusted for confounding by neighborhood; we also wish to adjust for other measured individual-level characteristics. Our outcome represents how long it has been since an individual has had his or her teeth cleaned by a dentist or dental hygienist (within the past year, within the past 2 years, within the past 5 years, 5 or more years ago, or never). We could dichotomize this outcome to treat it as binary, we could treat it as ordinal, or we could treat it as multinomial. If the question had been phrased differently, asking
instead how many years it has been since the individual had his or her teeth cleaned, we would have a count outcome that we might analyze with a loglinear model. In Section 5, we illustrate our method for the baseline category logit model by collapsing our dental cleaning outcome into three categories (within the past year, from 1 to 5 years ago, and 5+ years ago or never). Our covariate of primary interest is race/ethnicity, which we categorized into white non-Hispanic, African American non-Hispanic, Hispanic, and other. We also included the individual-level covariates gender, age (18–34, 35–54, 55–64, and 65+ years), education (< high school versus > high school), and health insurance status (covered versus not covered). We operationalized neighborhood as the intersection of an individual's zip code and BRFSS survey stratum. The Florida BRFSS uses 164 strata, defined as the 67 counties crossed with two


telephone density strata. There were 1968 such neighborhoods in the sample, on average containing 4.6 sampled individuals; 39% contained one sampled individual, 15% two individuals, 11% three individuals, 8% four individuals, 5.5% five individuals, 12% 6–10 individuals, 4% 11–15 individuals, and 5% 16–83 individuals. Thus, the sample sizes in the neighborhoods were relatively small.

The Florida BRFSS uses disproportionate stratified sampling, in which only one person per household can be selected. In 2008, the BRFSS sampled 10,874 Floridians, including 9745 Floridians with teeth. Each individual is assigned a sampling weight, representing the inverse probability of being selected into the sample multiplied by a post-stratification adjustment, constructed so the joint distribution of race/ethnicity, gender, and age matches that of the most recent state census. For our analysis, we will need to estimate the inverse probability of selecting each possible pair of individuals within a given zip code and stratum. We are approximating this as the product of the two individual sampling weights. This approximation would be nearly exact if no post-stratification adjustment had been made. However, if we assume that the inverse of the post-stratification factor represents the conditional probability of responding to the survey given the survey design variables and race/ethnicity, gender, and age, and that the probability of one individual in a pair responding is independent of whether the other responded, then our approximation is valid. We discuss construction of pairwise inverse probability weights in more generality at the end of Section 3.
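As a minimal sketch of the approximation just described (not the authors' SAS code), the pairwise weight for every within-neighborhood pair can be formed as the product of the two individual sampling weights. The function name `pair_weights` and the toy inputs are hypothetical, and the product rule rests on the assumption stated above that the two individuals are selected and respond independently.

```python
from itertools import combinations

def pair_weights(cluster_ids, weights):
    """Return {(j, l): w_j * w_l} for every within-cluster pair with j < l.

    cluster_ids[j] is the neighborhood label of sampled individual j and
    weights[j] is that individual's sampling weight (illustrative inputs).
    """
    members_by_cluster = {}
    for idx, c in enumerate(cluster_ids):
        members_by_cluster.setdefault(c, []).append(idx)

    out = {}
    for members in members_by_cluster.values():
        # Clusters with a single sampled individual contribute no pairs.
        for j, l in combinations(members, 2):
            out[(j, l)] = weights[j] * weights[l]
    return out

# Toy input: three sampled individuals in neighborhood "a", one in "b".
example = pair_weights(["a", "a", "b", "a"], [2.0, 3.0, 5.0, 4.0])
```

Neighborhoods with only one sampled individual, which made up 39% of the BRFSS neighborhoods, produce no pairs and therefore drop out of any pairwise analysis.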
We need to point out that the probability of pairs within the same household within a given zip code and stratum being selected into the sample is 0. Strictly speaking, this violates the 'positivity' [14] assumption, that is, that all pairs within a given zip code and stratum have a positive chance of selection into the sample. However, even if the BRFSS were to allow multiple individuals per household to be selected into the sample, such individuals would represent a negligible fraction of the sample. Thus for all practical purposes, the positivity assumption is satisfied in that its violation in our context results in negligible bias. If one were concerned that exclusion of within-household pairs could lead to important bias, one should use a sampling approach designed to include many such pairs; for example, one could sample all persons per household. This would be useful if, for example, there were an important genetic component related to tooth decay, which could also be associated with habits of dental cleaning. In that case, we might want to adjust for confounding by unmeasured household factors rather than unmeasured zip code factors. However, the design of the BRFSS constrains our choice to be the latter.

We excluded individuals with missing data or a 'don't know' response to any of the questions we used in our analysis, except for the question on whether the individual has teeth. This resulted in a final sample of 8989 individuals; that is, we excluded 7.8% of Floridians in the original sample who had teeth.

3. Conditional pseudolikelihood methods for complex survey data

Our conditional pseudolikelihood methods for complex survey data can be applied to clustered ordinal, multinomial, or count outcomes. We present the estimators and method of implementation for each type of outcome in turn, and then we discuss estimation of sampling distributions. Finally, we discuss approximation of the pairwise sampling probabilities.
3.1. Ordinal outcomes

The method of [11] is based on a proportional odds model for the finite population of $M$ clusters (or neighborhoods) with $N_i$ individuals in the population residing in neighborhood $i$. Let $Y_{ij}$ be an ordinal outcome with categories $k = 1, \ldots, K$ for individual $j$, $j = 1, \ldots, N_i$, in cluster $i$, $i = 1, \ldots, M$, let $Y_i = (Y_{i1}, \ldots, Y_{iN_i})$, and let $X_i = (X_{i1}, \ldots, X_{iN_i})$. Let $V_{kij} = 1$ if $Y_{ij} \le k$, and $0$ otherwise. We can write the proportional odds model for clustered data as

$$\operatorname{logit}(E(V_{kij} \mid X_i, b_i)) = X_{ij}\beta + \alpha_k + b_i, \quad k = 1, \ldots, K-1, \qquad (1)$$

where for $j \ne l$, $V_{kij} \perp\!\!\!\perp V_{kil}$ given $X_i$ and $b_i$. We further assume that the $M$ clusters are sampled independently and identically distributed (i.i.d.) from a superpopulation of clusters, so that the $(Y_i, X_i, b_i)$, $i = 1, \ldots, M$, are i.i.d. The $\alpha_k$ are increasing with $k$, as $P(Y_{ij} \le k \mid X_i, b_i)$ increases in $k$. Furthermore, we do not include an intercept in $X_i$ because of the inclusion of $b_i$ in the model. The goal is to estimate $\beta$, the coefficients of the individual-level covariates. When $K > 2$, model (1) does not admit a proper conditional likelihood [10].


Let $(V_{kij}, V_{kil})$ be randomly selected from the population of all within-cluster pairs. The conditional likelihood that $V_{kij} = 1$ and $V_{kil} = 0$ given that their sum is 1 equals

$$L_{kijl}(\beta; 1, 0) = \frac{\exp((X_{ij} - X_{il})\beta)}{1 + \exp((X_{ij} - X_{il})\beta)}. \qquad (2)$$

Also, the conditional likelihood $L_{kijl}(\beta; x, x)$ that $V_{kij} = V_{kil} = x$ given that their sum equals $2x$ is equal to 1. Treating the set of all randomly selected pairs from the population of within-cluster pairs as independent and additionally treating pairs for different $k$ as independent, we can form the composite conditional likelihood [15]

$$L(\beta) = \prod_{k=1}^{K-1} \prod_{i=1}^{M} \prod_{j=1}^{N_i - 1} \prod_{l=j+1}^{N_i} L_{kijl}(\beta; V_{kij}, V_{kil}). \qquad (3)$$

With complex survey data, we have a sample of $m$ clusters and $n_i$ individuals per cluster, obtained typically with unequal probabilities. Let $w_{ijl}$ be the inverse probability of sampling cluster $i$ and then individuals $j$ and $l$ within that cluster. Our estimator of $\beta$ maximizes the weighted composite conditional likelihood

$$L_w(\beta) = \prod_{k=1}^{K-1} \prod_{i=1}^{m} \prod_{j=1}^{n_i - 1} \prod_{l=j+1}^{n_i} \left(L_{kijl}(\beta; V_{kij}, V_{kil})\right)^{w_{ijl}}. \qquad (4)$$

The weights in (4) are survey weights, rather than precision weights, and as such must be correctly specified to ensure a consistent estimator. The derivative of the log of $L_w(\beta)$, $U(\beta)$, is an unbiased estimating equation for $\beta$, assuming that our proportional odds model (1) is correct, that our weights $w_{ijl}$ are correct, and that all within-cluster pairs in the population have a positive probability of selection into the sample. We note that the estimating equation derived for just a single fixed value of $k$ would be an unbiased estimating equation for $\beta$. Therefore, treating the pairs for different $k$ as independent amounts to summing the $K - 1$ unbiased estimating equations, which in turn leads to an unbiased estimating equation incorporating information for $k = 1, \ldots, K - 1$.
Computation of the estimator is very simple if one makes use of the observation first noted by Breslow et al. [9] that the likelihood at (2) can be interpreted as a marginal likelihood for ordinary logistic regression with outcome $V_{kijl} = 1$ and covariates $(X_{ij} - X_{il})$ (with no intercept). Note that when $V_{kij} = 0$ and $V_{kil} = 1$, the covariate is $(X_{il} - X_{ij})$. Likewise, we can interpret the weighted composite conditional likelihood at (4) as a weighted marginal likelihood for weighted logistic regression. Note that pairs with $V_{kij} = V_{kil}$ drop out of (4). The remainder give rise to a constant outcome $V_{kijl} = 1$ paired with covariates $(X_{ij} - X_{il})$ or $(X_{il} - X_{ij})$ (depending on whether $V_{kij} > V_{kil}$ or vice versa) and weights $w_{ijl}$. Observe that we can arbitrarily recode the outcome as $V_{kijl} = 0$ provided we recode the covariates with their negatives. Then we can use standard software for ordinary logistic regression with complex survey data, such as SAS PROC SURVEYLOGISTIC, to solve the estimating equation for $\beta$. As we will show, we can also use that software to estimate the asymptotic sampling distribution of our estimator. Unlike SAS PROC LOGISTIC, SAS PROC SURVEYLOGISTIC will not accept a constant outcome, and thus we need to recode some of the outcomes to $V_{kijl} = 0$ with the corresponding recoding of covariates.
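The pairing-and-differencing recipe above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SAS implementation: the function names `ordinal_pairs` and `fit_no_intercept_logit` and the callable `w_pair(j, l)` returning the pairwise weight are hypothetical, and a plain Newton-Raphson solver stands in for PROC SURVEYLOGISTIC. Because the solver is hand-written, it accepts the constant outcome 1 directly and no recoding to 0 is needed.

```python
import numpy as np
from itertools import combinations

def ordinal_pairs(y, X, cluster, w_pair, K):
    """Build one pseudo-observation per discordant (pair, threshold k).

    Each row is (X_j - X_l) * (V_kj - V_kl), so that the implied logistic
    outcome is always 1; concordant pairs are skipped because they drop out
    of the weighted composite conditional likelihood (4).
    """
    ids = np.asarray(cluster)
    rows, wts = [], []
    for c in np.unique(ids):
        members = np.flatnonzero(ids == c)
        for j, l in combinations(members, 2):
            for k in range(1, K):                       # k = 1, ..., K-1
                vj, vl = int(y[j] <= k), int(y[l] <= k)
                if vj != vl:
                    rows.append((X[j] - X[l]) * (vj - vl))
                    wts.append(w_pair(j, l))
    return np.array(rows, dtype=float), np.array(wts, dtype=float)

def fit_no_intercept_logit(D, w, iters=25):
    """Maximize sum_r w_r * log expit(D_r beta) by Newton-Raphson."""
    beta = np.zeros(D.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-D @ beta))
        score = D.T @ (w * (1.0 - p))                   # sum of terms like (5)
        hess = (D * (w * p * (1.0 - p))[:, None]).T @ D
        beta = beta + np.linalg.solve(hess, score)
    return beta
```

Here `X` is assumed to be a two-dimensional array with one row per sampled individual and `y` an integer array with values in 1, ..., K. The point estimate alone does not give valid standard errors; those require the design-based variance discussed in Section 3.4.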
We note that the independence assumption, for $j \ne l$, $V_{kij} \perp\!\!\!\perp V_{kil} \mid X_i, b_i$, is not strictly necessary for consistency of our estimator of $\beta$. What suffices is that the conditional likelihood that $V_{kij} = 1$ and $V_{kil} = 0$ given that $V_{kij} + V_{kil} = 1$ equals the expression at (2). We can express the contribution to the estimating equation from pair $(ij, il)$ as

$$w_{ijl}(X_{ij} - X_{il})(V_{kij} - V_{kil})\left(1 - \operatorname{expit}\left((X_{ij} - X_{il})\beta\,(V_{kij} - V_{kil})\right)\right), \qquad (5)$$

which is easily shown to have mean 0 under the independence assumption and the rest of our assumptions surrounding model (1). However, model (1) still holds if the $V_{kij}$ are generated as conditionally independent (given $X_{ij}$, $b_i$, and $\epsilon_{ij}$) Bernoulli random variables with mean $\operatorname{expit}(X_{ij}\beta + \alpha_k + b_i)\,\epsilon_{ij}$, where $\epsilon_{ij}$ is independent of $X_{ij}$ and $b_i$ and has expectation 1. Under this generalized model, the independence assumption is violated, and whether Equation (5) has expectation 0 would seem to depend on the distribution of $\epsilon_{ij}$. Necessary and sufficient conditions for (5) to have expectation 0 are an interesting question for further research.

3.2. Multinomial outcomes

Our new method for multinomial outcomes is based on a baseline category logit model [10] for the population of $M$ clusters with $N_i$ individuals per cluster. Let $Y_{ij}$ now be the multinomial outcome with


categories $k = 1, \ldots, K$ for individual $j$ in cluster $i$. We can write the baseline category logit model for clustered data as

$$\log\left(\frac{P(Y_{ij} = k \mid X_i, b_i)}{P(Y_{ij} = K \mid X_i, b_i)}\right) = X_{ij}\beta_k + \alpha_k + b_i, \quad k = 1, \ldots, K-1, \qquad (6)$$

where for $j \ne l$, $Y_{ij} \perp\!\!\!\perp Y_{il}$ given $X_i$ and $b_i$. We again assume that the $M$ clusters are sampled i.i.d. from a superpopulation of clusters, so that the $(Y_i, X_i, b_i)$, $i = 1, \ldots, M$, are i.i.d. The goal is to estimate $\beta = (\beta_1, \ldots, \beta_{K-1})$, the coefficients of the individual-level covariates.

Let $(Y_{ij}, Y_{il})$ be randomly selected from the population of all within-cluster pairs. The conditional likelihood that $Y_{ij} = k$ and $Y_{il} = K$ for $k \ne K$, given that $(Y_{ij}, Y_{il}) = (k, K)$ or $(K, k)$, equals

$$L_{ijl}(\beta; k, K) = \frac{\exp((X_{ij} - X_{il})\beta_k)}{1 + \exp((X_{ij} - X_{il})\beta_k)}, \qquad (7)$$

whereas the conditional likelihood that $Y_{ij} = k_1$ and $Y_{il} = k_2$ for $k_1 \ne k_2$, $k_1 \ne K$, $k_2 \ne K$, given that $(Y_{ij}, Y_{il}) = (k_1, k_2)$ or $(k_2, k_1)$, equals

$$L_{ijl}(\beta; k_1, k_2) = \frac{\exp((X_{ij} - X_{il})(\beta_{k_1} - \beta_{k_2}))}{1 + \exp((X_{ij} - X_{il})(\beta_{k_1} - \beta_{k_2}))}. \qquad (8)$$

Also, the conditional likelihood $L_{ijl}(\beta; k, k)$ that $Y_{ij} = Y_{il} = k$, given that $(Y_{ij}, Y_{il}) = (k, k)$, equals 1. Treating the set of all randomly selected pairs from the population of within-cluster pairs as independent, we can form the composite conditional likelihood [15]

$$L(\beta) = \prod_{i=1}^{M} \prod_{j=1}^{N_i - 1} \prod_{l=j+1}^{N_i} L_{ijl}(\beta; Y_{ij}, Y_{il}). \qquad (9)$$

With complex survey data, our estimator of $\beta$ maximizes the weighted composite conditional likelihood

$$L_w(\beta) = \prod_{i=1}^{m} \prod_{j=1}^{n_i - 1} \prod_{l=j+1}^{n_i} \left(L_{ijl}(\beta; Y_{ij}, Y_{il})\right)^{w_{ijl}}. \qquad (10)$$

Once again, the weights in (10) are survey weights, rather than precision weights, and as such must be correctly specified to ensure a consistent estimator. The derivative of the log of $L_w(\beta)$ is an unbiased estimating equation for $\beta$, assuming that our baseline category logit model (6) is correct, that our weights $w_{ijl}$ are correct, and that all within-cluster pairs in the population have a positive probability of selection into the sample.
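The conditional likelihoods (7) and (8) can be checked numerically: computing the pair probability directly from model (6) should give the same value whatever the intercepts and the cluster effect are. The sketch below does this for arbitrary illustrative parameter values; the function names and the symbols `alpha` (category intercepts) and `b` (cluster effect) are assumptions made for the example.

```python
import numpy as np

def category_probs(x, beta, alpha, b):
    """P(Y = k | x, b), k = 1..K, under model (6); category K is the baseline.

    x: (p,) covariates, beta: (p, K-1) coefficients, alpha: (K-1,) intercepts.
    """
    eta = np.append(x @ beta + alpha + b, 0.0)   # linear predictor, 0 for K
    e = np.exp(eta - eta.max())
    return e / e.sum()

def conditional_pair_prob(k1, k2, xj, xl, beta, alpha, b):
    """P(Y_j = k1, Y_l = k2 | {Y_j, Y_l} = {k1, k2}), computed from model (6)."""
    pj = category_probs(xj, beta, alpha, b)
    pl = category_probs(xl, beta, alpha, b)
    num = pj[k1 - 1] * pl[k2 - 1]
    return num / (num + pj[k2 - 1] * pl[k1 - 1])

def closed_form(k1, k2, xj, xl, beta):
    """Right-hand side of (7) when k2 = K, and of (8) when k1, k2 < K."""
    K = beta.shape[1] + 1
    coef = lambda k: np.zeros(beta.shape[0]) if k == K else beta[:, k - 1]
    z = (xj - xl) @ (coef(k1) - coef(k2))
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values with K = 3 and two covariates.
beta = np.array([[0.5, -1.0], [1.5, 0.3]])
alpha = np.array([0.2, 0.4])
xj, xl = np.array([1.0, 0.0]), np.array([0.0, 2.0])
direct = conditional_pair_prob(1, 2, xj, xl, beta, alpha, b=2.5)
formula = closed_form(1, 2, xj, xl, beta)
```

Treating the baseline category as having a zero coefficient vector lets a single expression cover both (7) and (8), which is also why the two cases can share one logistic regression design matrix.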
Computation of the estimator is very simple if one again makes use of the observation of Breslow et al. [9]. For ease of exposition, we illustrate the computation for our BRFSS example, in which $K = 3$. We can interpret the conditional likelihood at (7) as a marginal likelihood for ordinary logistic regression with outcome $Y_{ijl} = 1$, covariates $((X_{ij} - X_{il}), 0)$, and parameters $(\beta_1, \beta_2)$ for pairs with $k = 1$ and with outcome $Y_{ijl} = 1$, covariates $(0, (X_{ij} - X_{il}))$, and parameters $(\beta_1, \beta_2)$ for pairs with $k = 2$. The conditional likelihood at (8) refers only to pairs with $k_1 = 1$ and $k_2 = 2$ or vice versa, and, for pairs with $k_1 = 1$ and $k_2 = 2$, we can interpret it as a marginal likelihood for ordinary logistic regression with outcome $Y_{ijl} = 1$, covariates $((X_{ij} - X_{il}), (X_{il} - X_{ij}))$, and parameters $(\beta_1, \beta_2)$. Therefore, we can interpret the weighted composite conditional likelihood at (10) as a weighted marginal likelihood for weighted logistic regression. Note that pairs with $Y_{ij} = Y_{il}$ drop out of (10). The remainder give rise to a constant outcome $Y_{ijl} = 1$ paired with the covariates specified earlier and weights $w_{ijl}$. Observe that we can, again, arbitrarily recode the outcome for some pairs as $Y_{ijl} = 0$ provided we recode the covariates for those pairs with their negatives. Then we can use standard software for ordinary logistic regression with complex survey data, such as SAS PROC SURVEYLOGISTIC, to solve the estimating equation for $\beta$; we will show that we can also use the software to estimate the asymptotic sampling distribution of our estimator.

3.3. Count outcomes

Our new method for count outcomes is based on a loglinear model for Poisson outcomes [10] for the population of $M$ clusters with $N_i$ individuals per cluster. As we will show, it is not necessary that the distribution of the outcomes is Poisson, even though the method is derived via this assumption. Let $Y_{ij}$ now be the count outcome, that is, a nonnegative integer, for individual $j$ in cluster $i$. We can write the loglinear model for clustered data as

$$\log(E(Y_{ij} \mid X_i, b_i)) = X_{ij}\beta + b_i. \qquad (11)$$


We again assume that the $M$ clusters are sampled i.i.d. from a superpopulation of clusters, so that the $(Y_i, X_i, b_i)$, $i = 1, \ldots, M$, are i.i.d. For the purpose of deriving an estimator of $\beta$, we treat the distribution of $Y_{ij}$ as Poisson, and we assume that for $j \ne l$, $Y_{ij} \perp\!\!\!\perp Y_{il}$ given $X_i$ and $b_i$.

Let $(Y_{ij}, Y_{il})$ be randomly selected from the population of all within-cluster pairs. The conditional likelihood that $Y_{ij} = y_{ij}$ and $Y_{il} = y_{il}$, given that $Y_{ij} + Y_{il} = y_{ij} + y_{il}$, equals, up to a binomial coefficient that does not depend on $\beta$,

$$L_{ijl}(\beta) = \frac{\exp((X_{ij} y_{ij} + X_{il} y_{il})\beta)}{\left(\exp(X_{ij}\beta) + \exp(X_{il}\beta)\right)^{y_{ij} + y_{il}}}, \qquad (12)$$

which, upon rearrangement, equals

$$L_{ijl}(\beta) = \left(\frac{\exp((X_{ij} - X_{il})\beta)}{1 + \exp((X_{ij} - X_{il})\beta)}\right)^{y_{ij}} \left(\frac{\exp((X_{il} - X_{ij})\beta)}{1 + \exp((X_{il} - X_{ij})\beta)}\right)^{y_{il}}. \qquad (13)$$

Observe that we can interpret $L_{ijl}(\beta)$ as a weighted marginal likelihood for a logistic regression with two independent outcomes. The first outcome equals 1, with covariates $(X_{ij} - X_{il})$, no intercept, and weight $y_{ij}$. The second outcome also equals 1, with covariates $(X_{il} - X_{ij})$, no intercept, and weight $y_{il}$. Treating the set of all randomly selected pairs from the population of within-cluster pairs as independent, we can form the composite conditional likelihood [15]

$$L(\beta) = \prod_{i=1}^{M} \prod_{j=1}^{N_i - 1} \prod_{l=j+1}^{N_i} L_{ijl}(\beta). \qquad (14)$$

With complex survey data, our estimator of $\beta$ maximizes the weighted composite conditional likelihood

$$L_w(\beta) = \prod_{i=1}^{m} \prod_{j=1}^{n_i - 1} \prod_{l=j+1}^{n_i} L_{ijl}(\beta)^{w_{ijl}}. \qquad (15)$$

The derivative of the log of $L_w(\beta)$ is an unbiased estimating equation for $\beta$, assuming that our loglinear model (11) is correct, that our weights $w_{ijl}$ are correct, and that all within-cluster pairs in the population have a positive probability of selection into the sample. To see that the assumption of the Poisson distribution is unnecessary for the estimating equation to be unbiased, write the estimating equation as $S(\beta) = \sum_{ijl} w_{ijl} S_{ijl}$, where

$$S_{ijl} = X_{ij} Y_{ij} + X_{il} Y_{il} - (Y_{ij} + Y_{il})\,\frac{X_{ij}\exp(X_{ij}\beta) + X_{il}\exp(X_{il}\beta)}{\exp(X_{ij}\beta) + \exp(X_{il}\beta)}. \qquad (16)$$

Supposing that the pair $(ij, il)$ is randomly selected from the population of all within-cluster pairs, the expectation of $S_{ijl}$ conditional on $X_i$ and $b_i$ using model (11) is 0. Therefore, assuming that the weight $w_{ijl}$ correctly specifies the inverse probability of selecting pair $(ij, il)$ into the sample and
that the positivity assumption holds, the expectation of $S(\beta)$ is 0, and therefore it is an unbiased estimating equation.

Computation of the estimator is very simple if one makes use of the rearrangement of $L_{ijl}(\beta)$ at (13). We can also interpret the weighted composite conditional likelihood at (15) as a weighted marginal likelihood for logistic regression with two independent outcomes per pair $(ij, il)$. Therefore, one need only pair the within-cluster observations and then for each pair form two observations, one with outcome equal to 1, covariates $(X_{ij} - X_{il})$, no intercept, and weight $Y_{ij} w_{ijl}$ and the other with outcome equal to 1, covariates $(X_{il} - X_{ij})$, no intercept, and weight $Y_{il} w_{ijl}$. Observe that we can, again, arbitrarily recode some of the outcomes as 0 provided that we recode their covariates with their negatives. Then we can use standard software for ordinary logistic regression with complex survey data, such as SAS PROC SURVEYLOGISTIC, to solve the estimating equation for $\beta$. As we will next show, we can also use that software to estimate the asymptotic sampling distribution of our estimator.

3.4. Estimating the sampling distribution

In the population models (1), (6), and (11), the $M$ population clusters are sampled i.i.d., so that the $(Y_i, X_i, b_i)$, $i = 1, \ldots, M$, are i.i.d. The first stage of a complex sampling design typically consists of primary sampling units nested within primary strata $h = 1, \ldots, H$. Provided that our complex sampling design is such that either (a) each population cluster is perfectly nested within a primary sampling unit, or (b) each primary sampling unit is perfectly nested within a population cluster, and each population cluster is perfectly nested within a primary stratum, then estimating the sampling distribution of our


estimator of $\beta$ is straightforward. In case (a), we let $c = 1, \ldots, C_h$ index the primary sampling units within primary stratum $h$. In case (b), we let $c = 1, \ldots, C_h$ index the population clusters nested within primary stratum $h$. Let $(Y_c, X_c, b_c)$ represent the collection of sampled data pertaining to $c$. In either case (a) or case (b), the finite-population sampling followed by the complex sampling design renders the $(Y_c, X_c, b_c)$, $c = 1, \ldots, C_h$, independent. In the BRFSS example, the primary sampling units are the individuals, which are perfectly nested within the population clusters, and the population clusters were defined to be nested within the primary strata. Hence, we satisfy the requirements for case (b).

We can therefore estimate the asymptotic sampling distribution of our estimator $\hat{\beta}$ of $\beta$ using the usual sandwich estimator of variance for complex survey data [16–18]. Specifically, we can express the estimating equation $U(\beta) = 0$ as the sum $\sum_{h=1}^{H} \sum_{c=1}^{C_h} U_{hc}(\beta) = 0$, and its estimated variance is given by

$$\widehat{\mathrm{var}}(\hat{\beta}) = \{-\nabla U(\hat{\beta})\}^{-1}\, V(\hat{\beta})\, \{-\nabla U(\hat{\beta})^{T}\}^{-1}, \qquad (17)$$

where $\nabla U(\beta)$ is the gradient of $U(\beta)$ with respect to $\beta$, and

$$V(\hat{\beta}) = \sum_{h=1}^{H} \frac{C_h}{C_h - 1} \sum_{c=1}^{C_h} \left(U_{hc}(\hat{\beta}) - U_{h\cdot}(\hat{\beta})\right)\left(U_{hc}(\hat{\beta}) - U_{h\cdot}(\hat{\beta})\right)^{T}, \qquad (18)$$

where $U_{h\cdot}(\hat{\beta}) = (1/C_h) \sum_{c=1}^{C_h} U_{hc}(\hat{\beta})$.

Our estimator of the asymptotic sampling distribution is design consistent; however, we need for $\sum_{h=1}^{H} C_h$ to be reasonably large for the estimator to perform well in practice. By the law of large numbers and the central limit theorem, $\hat{\beta}$ is approximately distributed as multivariate normal with mean $\beta$ and variance $\widehat{\mathrm{var}}(\hat{\beta})$.

Inference based on the estimated asymptotic sampling distribution is simple if one uses SAS PROC SURVEYLOGISTIC. One need only specify the cluster as the variable corresponding to $c$ and the stratum as the variable corresponding to $h$; then, confidence intervals, standard errors (SEs), and $p$-values output by SAS PROC SURVEYLOGISTIC are design consistent.
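For readers working outside SAS, the sandwich estimator in (17) and (18) can be sketched directly in Python. The function name `sandwich_variance` and its argument layout are hypothetical; the sketch assumes the per-unit score contributions $U_{hc}(\hat{\beta})$ and the matrix $-\nabla U(\hat{\beta})$ have already been computed at the fitted value.

```python
import numpy as np

def sandwich_variance(U_hc, strata, neg_grad):
    """Design-based variance estimate following (17) and (18).

    U_hc     : (n_units, p) array, row c holds U_hc(beta_hat)
    strata   : (n_units,) array of stratum labels h
    neg_grad : (p, p) array equal to -grad U(beta_hat)
    """
    U_hc = np.asarray(U_hc, dtype=float)
    strata = np.asarray(strata)
    p = U_hc.shape[1]

    V = np.zeros((p, p))
    for h in np.unique(strata):
        U_h = U_hc[strata == h]
        C_h = U_h.shape[0]
        if C_h < 2:
            raise ValueError("each stratum needs at least two units for (18)")
        resid = U_h - U_h.mean(axis=0)          # U_hc - U_h.
        V += C_h / (C_h - 1) * resid.T @ resid

    A_inv = np.linalg.inv(np.asarray(neg_grad, dtype=float))
    return A_inv @ V @ A_inv.T

# Tiny worked case: one stratum, two units, scalar parameter.
var_hat = sandwich_variance([[1.0], [-1.0]], ["h1", "h1"], [[2.0]])
```

In the worked case the stratum mean is 0, the sum of squared deviations is 2, the factor $C_h/(C_h - 1)$ is 2, so $V = 4$ and the variance estimate is $4 / 2^2 = 1$. The explicit error for strata with a single unit reflects the division by $C_h - 1$ in (18).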
An alternative approach to estimating the sampling distribution would be to use the bootstrap, resampling the entities $c = 1, \ldots, C_h$ with replacement within their respective strata. For some sampling designs, the $C_h$ are very small; for example, in the public-use National Health Interview Survey (NHIS) file, all $C_h = 2$. Adaptations of the bootstrap are available [19, 20] for those complex survey data situations.

3.5. Approximating the pairwise sampling probabilities

A drawback of our methods is that they require accurate specification of the pairwise sampling probabilities, whereas standard methods for complex survey data only require accurate specification of the individual sampling probabilities. Public-use survey data sets contain the individual sampling probabilities, but often there is no information on the pairwise sampling probabilities. Our BRFSS example is special in that the sampling design is relatively simple, which allowed us to derive a reasonable method for approximating the pairwise sampling probabilities, as described in Section 2. In [8], we analyzed public-use NHIS data and treated household as the population cluster; this also allowed us to easily compute simple pairwise sampling probabilities because all individuals in a household were sampled for the variables in our analysis. Therefore, the pairwise sampling probabilities equaled the individual sampling probability for individuals within the same household. In [4], we analyzed in-house NHIS data and treated the secondary sampling unit (SSU) of the survey as the population cluster. We requested data that allowed us to compute $w_i$, the inverse probability of selecting the SSU, and $w_{j \mid i}$, the inverse probability of selecting an individual given that his/her SSU was selected. To construct pairwise weights, one could adopt the theory that the individuals in the finite-population SSU were sampled i.i.d. from a superpopulation, so that the inverse pairwise sampling probability for pair $(ij, il)$ would be approximated as $w_i w_{j \mid i} w_{l \mid i}$. In that analysis, we ignored the individual-level post-stratification weights because it is very difficult to translate them into accurate pairwise post-stratification weights. However, see Graubard and Korn [5] for an approximate method.
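The two NHIS constructions described in this subsection reduce to one-line formulas, sketched here with hypothetical function names and made-up numbers. Each function returns an inverse pairwise sampling probability, which is what the weight $w_{ijl}$ in the weighted composite conditional likelihoods represents.

```python
def pair_weight_whole_cluster(w_individual):
    """All members of a sampled household are observed, so a pair is selected
    exactly when either member is: the pair weight equals the individual weight."""
    return w_individual

def pair_weight_two_stage(w_cluster, w_j_given_cluster, w_l_given_cluster):
    """Cluster (SSU) selected first, then individuals j and l treated as
    sampled independently within it: w_i * w_{j|i} * w_{l|i}."""
    return w_cluster * w_j_given_cluster * w_l_given_cluster

# Made-up illustration: SSU selected with probability 1/50, each of the two
# individuals selected with probability 1/4 and 1/5 given the SSU.
w_household = pair_weight_whole_cluster(120.0)
w_ssu_pair = pair_weight_two_stage(50.0, 4.0, 5.0)
```

Neither function accounts for post-stratification adjustments, in line with the simplification described above for the in-house NHIS analysis.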
3.6. Including the intercept

For each of our methods, when we translate the weighted composite likelihood into a weighted likelihood for logistic regression, we find that the logistic regression does not include the intercept.

However, it is possible to include the intercept if one randomly selects which observations to use when switching the outcome from 1 to 0. This results in an asymptotically equivalent estimator.

4. Simulation study

In this section, we use simulation studies to empirically demonstrate consistency of $\hat\beta$ for the multinomial and count methods and also to empirically demonstrate consistency of $\widehat{\mathrm{var}}(\hat\beta)$ of (17). We also investigate the performance of the bootstrap estimator of the SE for $\hat\beta$. In the simulation studies, we have applied the version of our method that does not include the intercept.

For the multinomial method, we first generated the finite population as follows. We simulated $u_i$ and $d_i$, $i = 1, \ldots, M = 1000$, independently of one another using i.i.d. $N(0,1)$ random variables. We let $p_i = \exp(u_i)/(1 + \exp(u_i))$, and for each $i$, we let $X_{ij}$ be i.i.d. Bernoulli$(p_i)$, $j = 1, \ldots, N_i = 1000$. Denote by $\bar{X}_i$ the population mean $(1/N_i)\sum_{j=1}^{N_i} X_{ij}$; we generated $b_i = \bar{X}_i + d_i$. Then, we generated $Y_{ij}$ as multinomial according to model (6) using the $X_{ij}$ and $b_i$ with $K = 3$. We let $\alpha_1 = 0.2$, $\alpha_2 = 0.4$, $\beta_1 = 1$, and $\beta_2 = 2$. Our interest lies in estimating $\beta = (\beta_1, \beta_2)$. Next, we generated the sampled data by sampling population observations independently but with informative probabilities as follows. If either $Y_{ij} = 1$ and $X_{ij} = 0$ or $Y_{ij} = 3$ and $X_{ij} = 1$, then we sampled the observation with probability 0.004. If either $Y_{ij} = 2$ and $X_{ij} = 0$ or $Y_{ij} = 2$ and $X_{ij} = 1$, then we sampled the observation with probability 0.003. If either $Y_{ij} = 3$ and $X_{ij} = 0$ or $Y_{ij} = 1$ and $X_{ij} = 1$, then we sampled the observation with probability 0.002. This sampling is informative in that if we do not account for it, we will underestimate $\beta$ because within each cluster we are oversampling the off-diagonals of the two $2 \times 2$ contingency tables corresponding to $\beta_1$ and $\beta_2$. Because of our independent sampling, the pairwise weight for sampled pair $(ij, il)$ is proportional to the product of the individual inverse probabilities of sampling; for example, we let the weight be $(1/4) \times (1/3)$ if $Y_{ij} = 1$, $X_{ij} = 0$, $Y_{il} = 2$, and $X_{il} = 0$.
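The covariate generation and the informative sampling scheme of the multinomial simulation can be sketched in Python with NumPy as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the population is much smaller than $M = N_i = 1000$, and the outcome $Y_{ij}$ is taken as given because model (6) is not restated in this section. The relative weight is scaled so that a sampling probability of 0.001 corresponds to a weight of 1, which reproduces the $(1/4) \times (1/3)$ example.

```python
import numpy as np

# Sampling probabilities keyed by (Y, X), copied from the text above.
SAMPLING_PROB = {
    (1, 0): 0.004, (3, 1): 0.004,
    (2, 0): 0.003, (2, 1): 0.003,
    (3, 0): 0.002, (1, 1): 0.002,
}


def generate_covariates(n_clusters, cluster_size, rng):
    """Generate X_ij ~ Bernoulli(p_i) and b_i = mean_j(X_ij) + d_i."""
    u = rng.standard_normal(n_clusters)
    d = rng.standard_normal(n_clusters)
    p = np.exp(u) / (1.0 + np.exp(u))
    x = rng.binomial(1, p[:, None], size=(n_clusters, cluster_size))
    b = x.mean(axis=1) + d
    return x, b


def informative_sample(y, x, rng):
    """Boolean mask: each observation is kept independently with the
    probability that the text assigns to its (Y, X) combination."""
    probs = np.array([SAMPLING_PROB[(int(a), int(c))] for a, c in zip(y, x)])
    return rng.random(probs.shape[0]) < probs


def relative_pair_weight(y_j, x_j, y_l, x_l):
    """Product of the two individual inverse sampling probabilities,
    rescaled so that probability 0.001 maps to weight 1 (assumed scaling)."""
    return (0.001 / SAMPLING_PROB[(y_j, x_j)]) * (0.001 / SAMPLING_PROB[(y_l, x_l)])


rng = np.random.default_rng(0)
x, b = generate_covariates(n_clusters=50, cluster_size=20, rng=rng)
mask = informative_sample(np.array([1, 2, 3, 1, 2, 3]),
                          np.array([0, 0, 0, 1, 1, 1]), rng)
example_weight = relative_pair_weight(1, 0, 2, 0)  # (1/4) * (1/3)
```

Because only the ratio of weights matters for the weighted pseudo-likelihood, any common rescaling of the inverse probabilities gives the same point estimate.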
For the count method, we first generated the finite population as follows. We generated $u_i$, $d_i$, $p_i$, $X_{ij}$, and $b_i$ the same as for the multinomial simulation. Then we generated the mean of $Y_{ij}$ according to model (11); that is, we set $\mu_{ij} = E(Y_{ij} \mid X_i, b_i) = \exp(X_{ij}\beta + b_i)$ with $\beta = 3$, and we simulated the $Y_{ij}$ as independent Poisson random variables with mean $\mu_{ij}$. Our interest lies in estimating $\beta$. Next, we generated the sampled data by sampling population observations independently but with informative probabilities as follows. If either $Y_{ij} < 2$ and $X_{ij} = 0$ or $Y_{ij} > 2$ and $X_{ij} = 1$, then we sampled the observation with probability 0.02. If either $Y_{ij} < 2$ and $X_{ij} = 1$ or $Y_{ij} > 2$ and $X_{ij} = 0$, then we sampled the observation with probability 0.002. This sampling is informative in that if we do not account for it, we will overestimate the measure of association $\beta$ between $X_{ij}$ and $Y_{ij}$. Because of our independent sampling, the pairwise weight for sampled pair $(ij, il)$ is proportional to the product of the individual inverse probabilities of sampling; for example, we let the weight be $1 \times 10$ if $Y_{ij} = 1$, $X_{ij} = 0$, $Y_{il} = 1$, $X_{il} = 1$.

Results of applying our methods of estimating $\beta$, $\widehat{\mathrm{var}}(\hat\beta)$, and the bootstrap are as follows. For the multinomial simulation for one sampled dataset, using our method with SAS PROC SURVEYLOGISTIC resulted in $\hat\beta_1 = 1.032$ with an estimated SE of 0.187 and $\hat\beta_2 = 2.088$ with an estimated SE of 0.157. In contrast, when we do not adjust for the informative sampling, that is, when we let all $w_{ijl} = 1$, we find $\hat\beta_1 = 0.354$ with an estimated SE of 0.183 and $\hat\beta_2 = 1.419$ with an estimated SE of 0.153. Thus, as we anticipated, failure to adjust for the informative sampling results in underestimates of $\beta_1$ and $\beta_2$. Using the bootstrap to resample from the sampled dataset 500 times resulted in a sampling distribution for $\hat\beta_1$ with a mean of 1.024 and standard deviation (SD) of 0.181 and in a sampling distribution for $\hat\beta_2$ with a mean of 2.082 and an SD of 0.147. Thus, our bootstrap results suggest that $\hat\beta$ is approximately unbiased and that our estimate of $\widehat{\mathrm{var}}(\hat\beta)$ is correct. Using 500 simulated datasets to approximate the true sampling distribution of $\hat\beta$, we found the distribution of $\hat\beta_1$ to have a mean of 0.991 and an SD of 0.178, and we found the distribution of $\hat\beta_2$ to have a mean of 2.013 and an SD of 0.169. For those 500 simulated datasets, we also computed the average of $\widehat{\mathrm{var}}(\hat\beta)$ and then took the square root; for $\beta_1$ it equaled 0.180, and for $\beta_2$, 0.158.

For the Poisson simulation for one sampled dataset, using our method with SAS PROC SURVEYLOGISTIC resulted in $\hat\beta = 2.997$ with an estimated SE of 0.031. When we do not adjust for the informative sampling, that is, when we let all $w_{ijl} = 1$, we find $\hat\beta = 3.491$ with an estimated SE of 0.020. Using the bootstrap to resample from that dataset 500 times resulted in a sampling distribution for $\hat\beta$ with a mean of 2.999 and SD of 0.031. Thus, our bootstrap results suggest that $\hat\beta$ is approximately

unbiased and that our estimate of $\widehat{\mathrm{var}}(\hat\beta)$ is correct. Using 500 simulated datasets to approximate the true sampling distribution of $\hat\beta$, we found the distribution of $\hat\beta$ to have a mean of 3.003 and an SD of 0.029. For those 500 simulated datasets, we also computed the average of the square root of $\widehat{\mathrm{var}}(\hat\beta)$; it equaled 0.030.

In both sets of simulations, the empirical results coincide with our theoretical results of Section 3. They document that our estimator of $\beta$, our method of estimating the asymptotic sampling distribution, and our bootstrap perform well when the requisite modeling assumptions are correct.

5. Analysis of the Florida Behavioral Risk Factor Surveillance System

We present the results of applying three multinomial regressions to the BRFSS data in Table I. The first column displays the crude (unadjusted) odds ratios representing the associations between race/ethnicity and the multinomial outcome. We have chosen white non-Hispanic as the reference group for race/ethnicity and a most recent dental cleaning of 5+ years ago as the reference group for the multinomial outcome. We see that African American non-Hispanic persons have a statistically significantly lower odds of a cleaning within 1 year versus 5+ years than the odds for white non-Hispanic persons, with an odds ratio and 95% confidence interval of 0.58 (0.40, 0.84). The analogous odds ratio for Hispanic persons is statistically insignificant (0.85 (0.57, 1.28)). The second column displays the odds ratios adjusted for gender, age, education, and health insurance. The adjusted odds ratio for African American non-Hispanic persons versus white persons for the odds of cleaning within 1 year versus 5+ years is 0.70 (0.47, 1.05), which is almost statistically significant; that for Hispanic persons is 1.43 (0.95, 2.16). The third column displays the results of our new method that adjusts the odds ratios for confounding by neighborhood in addition to gender, age, education, and health insurance. The adjusted odds ratio of interest for African American non-Hispanic persons versus white persons is no longer close to statistically significant (0.73 (0.31, 1.73)), and that for Hispanic persons versus white persons now indicates a reverse disparity (3.46 (1.34, 8.98)). These results suggest that, for persons living in the same neighborhood and with similar values of the confounding variables, African American non-Hispanic persons have similar dental preventative health behaviors as do white non-Hispanic persons, and Hispanic persons have enhanced dental preventative health behaviors versus white non-Hispanic persons.

Also noteworthy are the odds ratios for education in the second and third columns. We see that adjusting for confounding by neighborhood attenuates the effect of education. This is to be expected because we can think of neighborhood as a proxy for socio-economic status, which includes education.

Table I. Estimated odds ratios and 95% confidence intervals for the associations between recency of dental cleaning and race/ethnicity.

Covariate                                   Crude               Model 1             Model 2
Race/ethnicity
  White non-Hispanic                        Reference           Reference           Reference
  African American non-Hispanic, < 1 year   0.58 (0.40, 0.84)   0.70 (0.47, 1.05)   0.73 (0.31, 1.73)
  African American non-Hispanic, 1–5 years  1.17 (0.75, 1.82)   1.20 (0.77, 1.87)   0.81 (0.35, 1.88)
  Hispanic, < 1 year                        0.85 (0.57, 1.28)   1.43 (0.95, 2.16)   3.46 (1.34, 8.98)
  Hispanic, 1–5 years                       1.22 (0.77, 1.93)   1.45 (0.91, 2.31)   2.20 (0.88, 5.48)
  Other, < 1 year                           0.97 (0.54, 1.73)   1.43 (0.69, 2.99)   1.67 (0.73, 3.81)
  Other, 1–5 years                          1.22 (0.61, 2.44)   1.45 (0.64, 3.25)   1.31 (0.52, 3.31)
Gender
  Male                                                          Reference           Reference
  Female, < 1 year                                              1.17 (0.91, 1.51)   1.40 (0.95, 2.05)
  Female, 1–5 years                                             1.00 (0.75, 1.35)   1.06 (0.68, 1.65)
Age
  18–34                                                         Reference           Reference
  35–54, < 1 year                                               1.45 (1.04, 2.04)   1.98 (1.15, 3.41)
  35–54, 1–5 years                                              0.98 (0.67, 1.44)   0.97 (0.53, 1.78)
  55–64, < 1 year                                               1.52 (1.03, 2.24)   2.52 (1.30, 4.88)
  55–64, 1–5 years                                              0.78 (0.50, 1.21)   0.96 (0.48, 1.93)
  65+, < 1 year                                                 1.31 (0.92, 1.86)   1.66 (0.93, 2.97)
  65+, 1–5 years                                                0.54 (0.36, 0.82)   0.50 (0.26, 0.97)
Education
  < High school                                                 Reference           Reference
  > High school, < 1 year                                       3.34 (2.20, 5.07)   2.00 (1.05, 3.79)
  > High school, 1–5 years                                      2.02 (1.25, 3.27)   1.75 (0.82, 3.72)
Health insurance
  No insurance                                                  Reference           Reference
  Insurance, < 1 year                                           4.93 (3.56, 6.83)   4.38 (2.22, 8.64)
  Insurance, 1–5 years                                          2.13 (1.49, 3.04)   1.81 (0.93, 3.52)

Results from three models are shown: Crude, unadjusted for confounding; Model 1, adjusted for confounding by gender, age, education, and health insurance; Model 2, additionally adjusted for confounding by neighborhood. Reference level for recency of dental cleaning is 5+ years.

6. Discussion

We have reviewed our conditional pseudolikelihood method for binary and ordinal outcomes with complex survey data, and we have presented new conditional pseudolikelihood methods for multinomial and count outcomes. We have shown that all of the methods are easily implemented using standard software for logistic regression with complex survey data, such as SAS PROC SURVEYLOGISTIC. We presented theoretical explanations and empirical results based on simulations and the bootstrap to validate our methods.
We wish to point out that our ordinal or multinomial method for binary outcomes with a binary covariate, which reduces to the method of Graubard and Korn [5] in that case, can be viewed as a generalized Mantel–Haenszel [21] estimator of a common odds ratio for complex survey data. We can use the result of Breslow [22] on the equivalence of the conditional logistic regression estimator for matched pairs and the Mantel–Haenszel estimator to show that, in this simple context, our estimator of the common odds ratio is

$$\exp(\hat\beta) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{l=j+1}^{n_i} w_{ijl} A_{ijl} D_{ijl}}{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{l=j+1}^{n_i} w_{ijl} B_{ijl} C_{ijl}}, \tag{19}$$

where $A_{ijl} = X_{ij}Y_{ij} + X_{il}Y_{il}$, $B_{ijl} = (1 - X_{ij})(1 - Y_{ij}) + (1 - X_{il})(1 - Y_{il})$, $C_{ijl} = (1 - X_{ij})Y_{ij} + (1 - X_{il})Y_{il}$, and $D_{ijl} = X_{ij}(1 - Y_{ij}) + X_{il}(1 - Y_{il})$. Graubard, Fears, and Gail [23] presented a different generalization of the Mantel–Haenszel estimator for complex survey data; their estimator is design consistent for so-called large-strata limiting models, in which the number of strata remains bounded as the number of sampled units within strata increases to infinity. Brumback and He [24] point out that the estimator of [23] is not design consistent for the sparse-data limiting model, in which the number of clusters increases to infinity as the number of sampled units within clusters remains bounded; an alternative estimator that is consistent for the sparse-data limiting model but not for the large-strata limiting model is also presented in [24]. In contrast, the estimator at (19) is dually consistent, that is, consistent for both limiting models.

We note that our estimators and implementation for adjusting for confounding by cluster apply not only to complex survey data but also to simpler sampling designs. For simpler sampling designs, it has been easy to apply conditional logistic regression for binary outcomes when we use, for example, SAS PROC LOGISTIC with the strata statement. However, it has not been easy to apply conditional regression methods with multinomial or count outcomes, and it has not been possible to apply conditional regression methods with ordinal outcomes using the cumulative logit link. Our data example enlisted an outcome that was ordinal, but one may sometimes prefer the generalized logit link to the more restrictive cumulative logit link.
We developed our conditional logistic regression estimators using a weighted composite conditional likelihood [15] approach that treats all within-cluster pairs as independent even though they are not and, furthermore, for the ordinal method, treats all outcome pairs as though they are independent when they are not (see Section 3 for justification). The recent article by Varin, Reid, and Firth [15] overviews composite likelihood methods for several types of applications, but it does not address complex survey data. Further theoretical developments in this direction would be useful.

Acknowledgements

We would like to thank Youjie Huang, Jamie Forrest, and Melissa Murray from the Florida BRFSS Office for their helpful support. We would also like to acknowledge the support of the National Science Foundation, the USDA/National Agricultural Statistics, the Department of Education/National Center for Educational Statistics, the Social Security Administration, and the Department of Agriculture/Economic Research Service through grant NSF SES-1115618.

References

1. Neuhaus JM, Kalbfleisch JD. Between- and within-cluster covariate effects in the analysis of clustered data. Biometrics 1998; 54:638–645.
2. Neuhaus JM, McCulloch CE. Separating between- and within-cluster covariate effects by using conditional and partitioning methods. Journal of the Royal Statistical Society, Series B 2006; 68:859–872.
3. Brumback BA, Dailey AB, Brumback LC, Livingston MD, He Z. Adjusting for confounding by cluster using generalized linear mixed models. Statistics and Probability Letters 2010; 80:1650–1654.
4. Brumback BA, Dailey AB, He Z, Brumback LC, Livingston MD. Efforts to adjust for confounding by neighborhood using complex survey data. Statistics in Medicine 2010; 29:1890–1899.
5. Graubard BI, Korn EL. Conditional logistic regression with survey data. Statistics in Biopharmaceutical Research 2011; 3:398–408.
6. He Z, Brumback BA. An equivalence of conditional and unconditional maximum likelihood estimators via infinite replication of observations. Communications in Statistics – Theory and Methods 2011. In press.
7. Liang K. Extended Mantel–Haenszel estimating procedure for multivariate logistic regression models. Biometrics 1987; 43:289–299.
8. Brumback BA, He Z. Adjusting for confounding by neighborhood using complex survey data. Statistics in Medicine 2011; 30:965–972.
9. Breslow N, Day N, Halvorsen K, Prentice R, Sabai C. Estimation of multiple relative risk functions in matched case–control studies. American Journal of Epidemiology 1978; 108:299–307.
10. Agresti A. Categorical Data Analysis, Second Edition. John Wiley & Sons: Hoboken, 2002.
11. Brumback BA, Dailey AB, Zheng HW. Adjusting for confounding by neighborhood using a proportional odds model and complex survey data. American Journal of Epidemiology 2011; 175(11):1133–1141.
12. Agresti A, Natarajan R. Modeling clustered ordered categorical data: a survey. International Statistical Review 2001; 69:345–371.
13. Liu I, Agresti A. The analysis of ordered categorical data: an overview and a survey of recent developments. Test 2005; 14:1–73.
14. Cole SR, Hernan MA. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology 2008; 168:656–664.
15. Varin C, Reid N, Firth D. An overview of composite likelihood methods. Statistica Sinica 2011; 21:5–42.
16. Binder DA. On the variances of asymptotically normal estimators from complex surveys. International Statistical Review 1983; 51:279–292.
17. Skinner CJ, Holt D, Smith TMF. Analysis of Complex Surveys. John Wiley & Sons: Sussex, 1989.
18. Korn EL, Graubard BI. Analysis of Health Surveys. John Wiley & Sons: New York, 1999.
19. McCarthy PJ, Snowden CB. The bootstrap and finite population sampling. In Vital and Health Statistics 2-95. Public Health Service Publication: U.S. Government Printing Office, Washington, DC; 85–1369.
20. Shao J. Impact of the bootstrap on sample surveys. Statistical Science 2003; 18(2):191–198.
21. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 1959; 22:719–748.
22. Breslow N. Odds ratio estimators when the data are sparse. Biometrika 1981; 68:73–84.
23. Graubard BI, Fears TR, Gail MH. Effects of cluster sampling on epidemiologic analysis in population-based case–control studies. Biometrics 1989; 45:1053–1071.
24. Brumback BA, He Z. The Mantel–Haenszel estimator adapted for complex survey designs is not dually consistent. Statistics and Probability Letters 2011; 81:1465–1470.



CONDITIONAL PSEUDO-LIKELIHOOD AND GENERALIZED LINEAR MIXED MODEL METHODS TO ADJUST FOR CONFOUNDING DUE TO CLUSTER

By

ZHUANGYU CAI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014

© 2014 Zhuangyu Cai

To my family

ACKNOWLEDGMENTS

I would like to express my appreciation to the many people who gave me their invaluable assistance and generous help in the writing of this dissertation. First of all, I would like to express the deepest appreciation to my advisor, Dr. Babette Brumback. Without her extraordinary guidance, persistent help and encouragement, I would never have been able to finish my dissertation. Next, I want to thank the members of my doctoral committee, Dr. Samuel Wu, Dr. Xiaomin Lu and Dr. Xiaohui Xu, who have provided abundant support throughout the classes, exams and dissertation. I also want to thank the Department of Biostatistics for its help during my graduate study. Finally, I want to thank my family members, including my parents, my wife, my son, and my brothers and sisters, who have provided me with great support.

Chapter 2 is partially reprinted from Statistics in Medicine 2013, volume 32, Brumback BA, Cai Z, He Z, Zheng H, and Dailey AB. Conditional Pseudolikelihood Methods for Clustered Ordinal, Multinomial, or Count Outcomes with Complex Survey Data, pages 1325-1335, with permission from John Wiley & Sons.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
ABSTRACT

CHAPTER

1 LITERATURE REVIEW
  1.1 Introduction
  1.2 Applications of the CPL Method
    1.2.1 The CPL Method for Binary Outcomes
    1.2.2 The CPL Method for Ordinal Outcomes
  1.3 Bias and Measurement Error
    1.3.1 Bias in Between-Within Models
    1.3.2 Measurement Error in LMs
  1.4 Marginal Effect
  1.5 Model-Based Standardization
2 THE CPL METHOD
  2.1 Introduction
  2.2 Multinomial Outcomes in Baseline Category Logit Models
  2.3 Count Outcomes in Loglinear Models
  2.4 Nonnegative Continuous Outcomes in Loglinear Models
  2.5 Application of the CPL Method
3 BIAS AND MEASUREMENT ERROR DUE TO UNMEASURED CLUSTER COVARIATES
  3.1 Introduction
  3.2 Bias due to Adjusting for Confounding by Unmeasured Cluster Covariates
  3.3 Measurement Error in LMMs due to Unmeasured Cluster Covariates
4 MODEL-BASED STANDARDIZATION TO ACCOUNT FOR UNMEASURED CLUSTER CONFOUNDING WITH COMPLEX SURVEY DATA
  4.1 Introduction
  4.2 Methods
    4.2.1 Method 1: The CPL Method without Using the Census Data
    4.2.2 Method 2: The Census Method
  4.3 Simulation Studies
  4.4 Application
  4.5 Discussion
5 ACHIEVEMENTS, LIMITATIONS AND FUTURE RESEARCH

APPENDIX

A EXPECTATION OF THE SCORE FUNCTION IS ZERO
B PROOF OF IDENTICAL ESTIMATORS IN TWO MODELS
C DERIVATION OF THE DISTRIBUTION OF THE CLUSTER EFFECTS
D DISTRIBUTION OF RANDOM EFFECTS
E SAS CODE AND STATA CODE IN CHAPTER 4
  E.1 SAS Code for Simulation Studies
  E.2 Stata Code for Simulation Studies
  E.3 SAS Code for Computing Individual-Level Coefficients Using the CPL Method and Generating Bootstrapping Datasets from the 2008 BRFSS Survey Data
  E.4 Stata Code for Estimating the Distribution of Random Effect Using the CPL Method
  E.5 Stata Code for Computing Individual- and Neighborhood-Level Coefficients and the Distribution of Random Effect Using the Census Method
  E.6 SAS Code for Computing Model-Based Standardized Proportions Using the CPL Method
  E.7 SAS Code for Computing Model-Based Standardized Proportions Using the Census Method

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Simulation Results for a Multinomial Distribution Based on 100 Simulation Data Sets Using the CPL Method (True $\beta_1 = 1$ and $\beta_2 = 2$)
2-2 Simulation Results for a Poisson Distribution Based on 100 Simulation Data Sets Using the CPL Method (True $\beta = 0.5$)
2-3 Simulation Results for a Gamma Distribution Based on 100 Simulation Data Sets Using the CPL Method (True $\beta = 3$)
2-4 Associations between Average Drinks and Covariates in the 2008 Florida BRFSS Complex Survey Data
3-1 Simulation Results Based on 100 Simulation Data Sets
3-2 Simulation Results for a Linear Mixed Model Based on 100 Simulation Data Sets (True $\beta_0 = 2$, $\beta_1 = 2$, and $\beta_2 = -3$)
4-1 Simulation Results Based on 100 Simulation Data Sets
4-2 Associations between Outcomes and Covariates in the 2008 Florida BRFSS Complex Survey Data
4-3 Model-Based Standardized Proportions of People Who Drink Alcohol, Stratified by Age Group, to Account for Unmeasured Confounding due to Cluster

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CONDITIONAL PSEUDO-LIKELIHOOD AND GENERALIZED LINEAR MIXED MODEL METHODS TO ADJUST FOR CONFOUNDING DUE TO CLUSTER

By

Zhuangyu Cai

August 2014

Chair: Babette Brumback
Major: Biostatistics

This dissertation focuses on adjusting for confounding due to unmeasured cluster covariates using generalized linear mixed models (GLMMs). In social epidemiology, GLMMs are often used to model associations between health outcomes and predictors. Those predictors may include individual-level factors such as personal characteristics, and neighborhood-level factors such as average neighborhood house income. Social epidemiology data are often collected using complex surveys. The following three issues must be taken into account in order to find consistent estimates of the parameters of interest. The first issue is that the number of sampled neighborhoods is usually large but the number of sampled individuals within each neighborhood is often small. The second issue is that unequal sampling rates are often used to sample neighborhoods and individuals within each neighborhood. The third issue is that neighborhood-level factors are unmeasured. If those three issues are not handled correctly, estimated coefficients could be seriously biased. In generalized linear mixed models, neighborhood-level effects can be handled using a random intercept, and individual-level coefficients can be consistently estimated by a pairwise conditional pseudo-likelihood (CPL) method. The first aim of this dissertation is to apply the CPL method to multinomial, count, and nonnegative continuous outcomes for finding individual-level coefficients. When neighborhood factors are unmeasured and correlated with other covariates, estimators based on a generalized linear mixed model are biased. A between-within model has been proposed to avoid the bias. However, when sample sizes within neighborhoods are small and the number of neighborhoods is large, a between-within model approach to adjust for confounding due to unmeasured cluster covariates is itself prone to bias. When neighborhood factors are unmeasured and sample sizes within each neighborhood are small, sample means in each neighborhood are often used to replace true neighborhood means, which introduces extra variation into models. In those situations, neighborhood-level coefficients are attenuated. So the second aim of this dissertation is to find the source of bias in the use of between-within models and to correct attenuated neighborhood-level coefficients. The third aim of this dissertation is to use model-based standardization to account for unmeasured confounding due to cluster with complex survey data while estimating a marginal effect of an individual covariate.

CHAPTER 1
LITERATURE REVIEW

1.1 Introduction

This chapter presents a literature review regarding the application of the conditional pseudo-likelihood (CPL) method, bias and measurement error due to unmeasured cluster covariates, and model-based standardization. Section 1.2 will present the review of literature regarding the application of the CPL method in social epidemiology with complex survey data. Binary outcomes in logit models will be introduced first, and then ordinal outcomes in proportional odds models will be introduced. Section 1.3 will present the review of literature regarding bias and measurement error due to unmeasured cluster covariates. Bias in between-within models due to unmeasured cluster covariates will be introduced first. Measurement error in simple linear models and multiple linear models will be presented next. Section 1.4 will present the review of literature regarding marginal effects in generalized linear mixed models. The last section will present the review of literature regarding model-based standardization in generalized linear mixed models.

1.2 Applications of the CPL Method

In 1974, Besag [4] first proposed the conditional pseudo-likelihood for approximate inference in spatial processes. The pseudo-likelihood was defined as the product of the conditional densities of a single observation given its nearest neighbors. In 1978, Breslow et al. [5] first noted that a conditional density of an observation given a pair of observations in a matched case-control study could be expressed as an ordinary logistic regression without intercept. In 1987, Liang [24] further studied the conditional likelihood in a stratified case-control study. The conditional likelihood was defined as the product of the densities given the sum of a pair of outcomes based on all possible pairs of observations within each stratum. In 2010, Graubard et al. [20] extended the conditional pseudo-likelihood method to survey data and used special programming for

estimating the parameters of interest. In 2010, Brumback et al. [8] first used SAS PROC SURVEYLOGISTIC for estimating the parameters of interest and greatly simplified the programming. The CPL method was applied to an analysis of 2009 National Health Interview Survey (NHIS) public-use data in 2010 [8] and to an analysis of the 2008 Florida Behavioral Risk Factor Surveillance System (BRFSS) survey data in 2012 [10]. More details about the application of the CPL method will be presented in the rest of this section.

In social epidemiology, geographic neighborhood is viewed as an important determinant of health behaviors and outcomes. The effects of measured or unmeasured neighborhood characteristics on health behaviors and outcomes might be of interest to researchers and, in some situations, cannot be ignored. So health behaviors and outcomes could depend on both individual characteristics and corresponding neighborhood characteristics. Effects of neighborhood characteristics can be modeled using a random intercept in data analyses. Suppose a population includes $M$ clusters and $N_i$ individuals per cluster and the outcome studied is whether an individual has a certain health behavior such as tooth hygiene, physical exercise, alcohol consumption, smoking, drug abuse and addiction, and so on. Such health behaviors are not only related to personal characteristics but also possibly related to corresponding neighborhood characteristics. Let $i = 1, \ldots, M$ index neighborhoods in the population and $j = 1, \ldots, N_i$ index individuals within the $i$th neighborhood. Suppose $Y_{ij}$ is an outcome, $X_{ij}$ is a vector of individual-level characteristics, and $b_i$ is a vector of neighborhood-level characteristics. The following generalized linear mixed model [1] can be used to model the association between outcomes and two levels of characteristics,

$$E(Y_{ij} \mid X_i, b_i) = h(b_i + X_{ij}\beta), \tag{1-1}$$

where $h$ is an inverse link function and $\beta$ are the parameters of interest. Further assume that the $M$ neighborhoods are sampled independently and identically distributed (i.i.d.) from a super-population of neighborhoods, so the $(Y_i, X_i, b_i)$, $i = 1, \ldots, M$, are i.i.d. The CPL method has been developed and applied to responses that are binary or ordinal outcomes.

1.2.1 The CPL Method for Binary Outcomes

All settings for model (1-1) are retained, but further assume that the response is a binary outcome. Therefore, $Y_{ij} \mid (b_i, X_i) \sim \mathrm{Ber}(p_{ij})$, $p_{ij} = \mathrm{expit}(b_i + X_{ij}\beta)$, $X_i = \{X_{i1}, \ldots, X_{iN_i}\}$. The following generalized linear mixed model can be used to model associations between an outcome and covariates,

$$E(Y_{ij} \mid X_i, b_i) = \mathrm{expit}(b_i + X_{ij}\beta). \tag{1-2}$$

When the number of neighborhoods is large, the number of $b_i$ we need to estimate is also large. When complex cluster sampling is used for collecting data in social epidemiology and the sample size within each cluster is small, as is usually the case, the standard maximum likelihood theory is invalid. Neyman and Scott [27] showed that when the number of nuisance parameters tends toward infinity as the sample size grows, the estimated $\beta$ are biased. There are two approaches to avoid the bias. One is the conditional likelihood method, which conditions on sufficient statistics for $b_i$ to avoid estimating nuisance parameters. The second is a generalized linear mixed model, which treats neighborhood effects $b_i$ as random effects that are assumed to follow a normal distribution. The conditional likelihood method will be chosen to estimate the parameters of interest and eliminate the nuisance parameters in the first goal of this dissertation. More specifically, a pairwise conditional pseudo-likelihood method will be used to estimate the parameters of interest and eliminate the nuisance parameters. Suppose $n_i$ individuals are randomly sampled per cluster; then all possible pairs within each cluster are formed.
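The pairing step can be sketched in a few lines of Python. This is only an illustrative sketch; the function name and the example cluster sizes are hypothetical. For a cluster with $n_i$ sampled individuals, it enumerates every unordered pair $(j, l)$ with $j < l$, so a cluster contributes $n_i(n_i - 1)/2$ pairs and a cluster with a single sampled individual contributes none.

```python
from itertools import combinations


def within_cluster_pairs(cluster_sizes):
    """Return all triples (i, j, l) with j < l, where i indexes the cluster
    and j, l index sampled individuals within that cluster (0-based)."""
    pairs = []
    for i, n_i in enumerate(cluster_sizes):
        pairs.extend((i, j, l) for j, l in combinations(range(n_i), 2))
    return pairs


# Hypothetical sampled cluster sizes n_i = 3, 1, 4.
pairs = within_cluster_pairs([3, 1, 4])
n_pairs = len(pairs)  # 3 + 0 + 6 pairs
```

The number of pairs grows quadratically in the cluster size, which is one reason it can help to discard pairs that carry no information before fitting a model.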

There are $n_i(n_i - 1)/2$ pairs in total per cluster whenever $n_i$ is greater than 1. Let $(Y_{ij}, Y_{il})$ denote a pair of outcomes and $S_{ijl} = Y_{ij} + Y_{il}$. According to model (1-2), the joint distribution of $(Y_{ij}, Y_{il})$ given $S_{ijl}$, $X_i$, and $b_i$ can be derived as follows:

$$\begin{aligned}
P(Y_{ij}=1, Y_{il}=0 \mid S_{ijl}=1, X_i, b_i)
&= \frac{P(Y_{ij}=1, Y_{il}=0 \mid X_i, b_i)}{P(S_{ijl}=1 \mid X_i, b_i)} \\
&= \frac{P(Y_{ij}=1, Y_{il}=0 \mid X_i, b_i)}{P(Y_{ij}=1, Y_{il}=0 \mid X_i, b_i) + P(Y_{ij}=0, Y_{il}=1 \mid X_i, b_i)} \\
&= \frac{P(Y_{ij}=1 \mid X_i, b_i)\,P(Y_{il}=0 \mid X_i, b_i)}{P(Y_{ij}=1 \mid X_i, b_i)\,P(Y_{il}=0 \mid X_i, b_i) + P(Y_{ij}=0 \mid X_i, b_i)\,P(Y_{il}=1 \mid X_i, b_i)} \\
&= \frac{e^{X_{ij}\beta}}{e^{X_{ij}\beta} + e^{X_{il}\beta}}
= \frac{e^{(X_{ij}-X_{il})\beta}}{1 + e^{(X_{ij}-X_{il})\beta}}.
\end{aligned}$$

Similarly,

$$P(Y_{ij}=0, Y_{il}=1 \mid S_{ijl}=1, X_i, b_i) = \frac{e^{(X_{il}-X_{ij})\beta}}{1 + e^{(X_{il}-X_{ij})\beta}} = \frac{1}{1 + e^{(X_{ij}-X_{il})\beta}},$$

$$P(Y_{ij}=1, Y_{il}=1 \mid S_{ijl}=2, X_i, b_i) = P(Y_{ij}=0, Y_{il}=0 \mid S_{ijl}=0, X_i, b_i) = 1.$$

So pairs with $S_{ijl} = 2$ or $S_{ijl} = 0$ make no contribution to the likelihood function for estimating $\beta$. All pairs with $Y_{ij} = Y_{il}$ can be dropped from the likelihood function. Dropping pairs with $Y_{ij} = Y_{il}$ reduces the size of the dataset and speeds up the computation. The likelihood function for estimating $\beta$ has the following form:

$$L(\beta; X_i, Y_i) = \prod_{i=1}^{M} \prod_{j=1}^{n_i-1} \prod_{l=j+1}^{n_i} \left(\frac{e^{(X_{ij}-X_{il})\beta}}{1 + e^{(X_{ij}-X_{il})\beta}}\right)^{Y_{ijl}} \left(\frac{1}{1 + e^{(X_{ij}-X_{il})\beta}}\right)^{1-Y_{ijl}} \tag{1-3}$$

Likelihood function (1-3) can be viewed as an ordinary logistic regression model with a new covariate $X_{ijl} = X_{ij} - X_{il}$, a new response $Y_{ijl} = Y_{ij}$, and no intercept within the $i$th neighborhood. Breslow et al. [5] first noted that a conditional maximum likelihood method for matched pairs is equivalent to an ordinary logistic regression without an intercept. Therefore, $\beta$ can be estimated by using standard logistic regression software such as SAS PROC SURVEYLOGISTIC. The following is the SAS code for estimating $\beta$.

    proc surveylogistic data=all_pairs;
      ods output ParameterEstimates=one_estimates;
      /* paired response on the differenced covariate, no intercept */
      model Y_ijl = X_ijl / link=logit noint;
      cluster i;
    run;

Now suppose $n_i$ individuals are sampled per cluster by using unequal sampling rates; then sampling weights must be taken into account in the above likelihood function. Let $W_{ijl} = W_{ij}W_{il}$, where $W_{ij}$ and $W_{il}$ are the reciprocals of the sampling rates for $Y_{ij}$ and $Y_{il}$. Likelihood function (1-3) can then be written as

$$L_W(\beta; X_i, Y_i) = \prod_{i=1}^{M} \prod_{j=1}^{n_i-1} \prod_{l=j+1}^{n_i} \left\{\left(\frac{e^{(X_{ij}-X_{il})\beta}}{1 + e^{(X_{ij}-X_{il})\beta}}\right)^{Y_{ijl}} \left(\frac{1}{1 + e^{(X_{ij}-X_{il})\beta}}\right)^{1-Y_{ijl}}\right\}^{W_{ijl}} \tag{1-4}$$

Graubard et al. [20] extended the conditional pseudo-likelihood method to survey data and used special programming for estimating $\beta$. Brumback et al. [8] first used SAS PROC SURVEYLOGISTIC for estimating $\beta$ and greatly simplified the programming. Now the pairwise conditional pseudo-likelihood method can be used to estimate $\beta$; all pairs within each cluster are treated as if they were independent. The following is the SAS code for estimating $\beta$.

    proc surveylogistic data=all_pairs;
      ods output ParameterEstimates=one_estimates;
      model Y_ijl = X_ijl / link=logit noint;
      cluster i;
      /* pairwise weight W_ijl = W_ij * W_il */
      weight W_ijl;
    run;

1.2.2 The CPL Method for Ordinal Outcomes

The CPL method was applied to a binary outcome first. After that, the CPL method was applied to an ordinal outcome by Brumback et al. [9]. All settings for model (1-1) are retained except that the response is an ordinal outcome with $K$ categories. Let

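$Y_{ij}$ be an ordinal outcome for individual $j$ in the $i$th neighborhood and the value of $Y_{ij}$ be $k$, where $k = 1, \ldots, K$. The proportional odds model can be written as

$$P(Y_{ij} \le k \mid X_i, b_i) = \mathrm{expit}(X_{ij}\beta + \alpha_k + b_i), \quad k = 1, \ldots, K-1, \tag{1-5}$$

where $\alpha_k$ is increasing with $k$, since $P(Y_{ij} \le k \mid X_i, b_i)$ increases in $k$. Let $(Y_{ij}, Y_{il})$ be one of all possible paired outcomes within the $i$th neighborhood. Let $V^k_{ij} = 1$ if $Y_{ij} \le k$ and $V^k_{ij} = 0$ otherwise, and let $S^k_{ijl} = V^k_{ij} + V^k_{il}$. So the conditional probability that $Y_{ij} \le k$ and $Y_{il} > k$ given $X_i$, $b_i$, and $S^k_{ijl} = 1$ can be written as

$$\begin{aligned}
P(Y_{ij} \le k, Y_{il} > k \mid X_i, b_i, S^k_{ijl}=1)
&= \frac{P(Y_{ij} \le k, Y_{il} > k \mid X_i, b_i)}{P(S^k_{ijl}=1 \mid X_i, b_i)} \\
&= \frac{P(Y_{ij} \le k, Y_{il} > k \mid X_i, b_i)}{P(Y_{ij} \le k, Y_{il} > k \mid X_i, b_i) + P(Y_{ij} > k, Y_{il} \le k \mid X_i, b_i)} \\
&= \frac{P(Y_{ij} \le k \mid X_i, b_i)\,P(Y_{il} > k \mid X_i, b_i)}{P(Y_{ij} \le k \mid X_i, b_i)\,P(Y_{il} > k \mid X_i, b_i) + P(Y_{ij} > k \mid X_i, b_i)\,P(Y_{il} \le k \mid X_i, b_i)} \\
&= \frac{\exp((X_{ij}-X_{il})\beta)}{1 + \exp((X_{ij}-X_{il})\beta)}.
\end{aligned}$$

Similarly,

$$P(Y_{ij} > k, Y_{il} \le k \mid X_i, b_i, S^k_{ijl}=1) = \frac{1}{1 + \exp((X_{ij}-X_{il})\beta)},$$

$$P(Y_{ij} \le k, Y_{il} \le k \mid X_i, b_i, S^k_{ijl}=2) = P(Y_{ij} > k, Y_{il} > k \mid X_i, b_i, S^k_{ijl}=0) = 1.$$

Let $L^k_{ijl}(\beta; V^k_{ij}, V^k_{il})$ denote the likelihood for each of the above conditional distributions. Treating all pairs within a neighborhood as independent and treating all pairs for different $k$ as independent, the composite conditional likelihood [32] can be written as

$$L(\beta) = \prod_{k=1}^{K-1} \prod_{i=1}^{M} \prod_{j=1}^{N_i-1} \prod_{l=j+1}^{N_i} L^k_{ijl}(\beta; V^k_{ij}, V^k_{il}). \tag{1-6}$$

When unequal sampling rates are used for sampling individuals, sampling weights must be taken into account. Let $W_{ijl} = W_{ij}W_{il}$, where $W_{ij}$ and $W_{il}$ are the reciprocals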

of the sampling rates for $Y_{ij}$ and $Y_{il}$. The weighted composite conditional likelihood can be written as

\[
L_W(\beta) = \prod_{k=1}^{K-1} \prod_{i=1}^{m} \prod_{j=1}^{n_i-1} \prod_{l=j+1}^{n_i} \left( L^k_{ijl}(\beta; V^k_{ij}, V^k_{il}) \right)^{W_{ijl}}. \tag{1}
\]

The data must be transformed in order to use the SAS SURVEYLOGISTIC procedure. The new covariate is $X_{ijl} = X_{ij} - X_{il}$. The new outcome is $Y^k_{ijl} = 1$ when $Y_{ij} \le k$ and $Y_{il} > k$, and $Y^k_{ijl} = 0$ when $Y_{ij} > k$ and $Y_{il} \le k$. All pairs with $V^k_{ij} = V^k_{il}$ can be dropped from the computation because they do not contribute to the likelihood function for estimating $\beta$. Dropping these pairs reduces the size of the data set and speeds up the computation. The following SAS code estimates $\beta$.

    proc surveylogistic data=all_pairs;
      ods output ParameterEstimates=one_estimates;
      model Y_kijl = X_ijl / link=glogit noint;
      cluster i;
      weight W_ijl;
    run;

The CPL method can also be applied to count outcomes and nonnegative continuous outcomes. The first aim of this dissertation is to develop methods for applying the CPL method to count outcomes and nonnegative continuous outcomes, which will extend the scope of application of the CPL method.

1.3 Bias and Measurement Error

1.3.1 Bias in Between-Within Models

We will focus on inconsistent estimators, which do not converge in probability to the true values of the parameters as the sample size increases. Inconsistent estimators are also asymptotically biased. Bias is the expected deviation of an estimate from the true quantity to be estimated [2]. Bias is also referred to as systematic errors that decrease

the validity of estimates. Bias is different from random errors, which decrease the precision of estimates. Bias can arise in seven different stages of research: literature review, study design, study execution, data collection, analysis, interpretation of results, and publication. When an estimator $\hat{\beta}$ of a parameter $\beta$ converges to $\beta + b$ as the sample size increases, $\hat{\beta}$ has asymptotic bias $b$. The bias we focus on in this subsection is confounding bias due to unmeasured cluster covariates. That bias will not disappear merely by increasing the number of sampled neighborhoods.

Brumback et al. [7, 10] have been investigating racial and ethnic disparities in dental preventive care using complex survey data from the 2008 Florida Behavioral Risk Factor Surveillance System (BRFSS). They want to know whether estimated health disparities based on race/ethnicity might disappear or even reverse when unmeasured neighborhood characteristics are included in the model. In the BRFSS survey data, sample sizes within neighborhoods are usually very small and the number of neighborhoods is large, so the number of nuisance parameters to be estimated is also large. Neyman and Scott [27] pointed out that ordinary maximum likelihood estimators are biased in this situation. There are several approaches to avoiding the bias. The generalized linear mixed model (GLMM) [1] is one possible approach; it treats unmeasured neighborhood characteristics as normal random variables. However, when unmeasured neighborhood characteristics are correlated with other covariates and sample sizes within each neighborhood are small, the estimators are biased [7, 10, 22, 25]. Consider a simple linear mixed model

\[
Y_{ij} = b_i + X_{ij}\beta + \epsilon_{ij}, \tag{1}
\]

where $i = 1, \ldots, m$, $j = 1, \ldots, n_i$, $X_{ij} \sim (0, \sigma_x^2)$, $b_i \sim (0, \sigma_b^2)$, $\epsilon_{ij} \sim N(0, \sigma^2)$, and $\epsilon_{ij}$ is independent of both $b_i$ and $X_{ij}$. When $n_i = n$ for all $i$ and $m$ goes to infinity, Neuhaus and McCulloch [25] showed that the probability limit of the estimated $\beta$ is

\[
\hat{\beta}_{gls} \xrightarrow{\;pr\;} \beta + \frac{\sigma_{xb}}{\sigma_x^2} \cdot \frac{\sigma^2}{\sigma^2 + (n-1)\sigma_b^2}, \tag{1}
\]

where $\sigma_{xb}$ is the covariance between $b_i$ and $X_{ij}$. The above estimator of $\beta$ is asymptotically unbiased when $b_i$ is not correlated with $X_{ij}$ or when the sample size $n$ within each neighborhood is large; otherwise, it is biased. In social epidemiology, between-within models [25, 26, 31] are very popular for adjusting for confounding due to unmeasured neighborhood characteristics in order to avoid this bias. Schempf and Kaufman [30] support using between-within models more often to account for contextual confounding. Recently, biased estimators based on between-within models have been reported. Brumback et al. [7, 10] present a simulation study showing that when the link function is a logit function, the estimated parameters can be biased. The reason for the bias is not clear, so further research is needed.

1.3.2 Measurement Error in LMs

In social epidemiology, surveys and interviews are often used to collect data such as salt intake, household income, hours of daily exercise, and many others. Such data often suffer from substantial measurement error. It is well known that naive estimators that ignore measurement error are biased [3, 11–14]. There are two common types of measurement error models: the classical measurement error model and the Berkson measurement error model. Let $X$ and $W$ denote the unobserved and observed values of a measurement, and let $U$ denote the measurement error. The classical measurement error model is of the form

\[
W = X + U, \quad E(U \mid X) = 0. \tag{1}
\]

The Berkson measurement error model is of the form

\[
X = W + U, \quad E(U \mid W) = 0. \tag{1}
\]

This subsection presents more details about classical measurement error in a simple linear model and in a multiple linear model.
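As a concrete illustration of the classical model $W = X + U$, the following Python sketch computes the population least-squares slope of $Y$ on $W$ by exact enumeration. It is only a sketch under stated assumptions: $X$ and $U$ are taken to be independent and uniform on small finite supports (a choice made here so that no simulation is needed, not part of the models in this subsection), and the names `naive_slope` and `variance` are illustrative. The slope shrinks toward zero by the factor $\mathrm{Var}(X) / (\mathrm{Var}(X) + \mathrm{Var}(U))$, which is the attenuation discussed in this subsection.

```python
from itertools import product


def naive_slope(x_support, u_support, beta0, beta1):
    """Population least-squares slope from regressing Y on W = X + U.

    Assumption for this sketch: X and U are independent and uniform on
    small finite supports, so every moment can be enumerated exactly.
    The outcome error is omitted because it is independent of W and
    therefore does not change Cov(W, Y).
    """
    pairs = list(product(x_support, u_support))  # equally likely (x, u) pairs
    n = len(pairs)
    w = [x + u for x, u in pairs]
    y = [beta0 + beta1 * x for x, _ in pairs]
    w_bar = sum(w) / n
    y_bar = sum(y) / n
    cov_wy = sum((wi - w_bar) * (yi - y_bar) for wi, yi in zip(w, y)) / n
    var_w = sum((wi - w_bar) ** 2 for wi in w) / n
    return cov_wy / var_w


def variance(support):
    """Variance of a uniform distribution on a finite support."""
    mean = sum(support) / len(support)
    return sum((v - mean) ** 2 for v in support) / len(support)


x_support = [0.0, 1.0, 2.0, 3.0]
u_support = [-1.0, 1.0]  # mean zero, so E(U | X) = 0
beta1 = 2.0

slope = naive_slope(x_support, u_support, beta0=0.5, beta1=beta1)
reliability = variance(x_support) / (variance(x_support) + variance(u_support))
print(slope, reliability * beta1)  # the two values agree
```

Setting `u_support = [0.0]` removes the measurement error, and the naive slope then equals `beta1`.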

Let us first look at a simple linear model and see how measurement error leads to biased coefficients. Assume $Y_i \mid X_i = \beta_0 + \beta_1 X_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma_e^2)$ and $X_i \sim N(\mu_X, \sigma_X^2)$. Assume a normal classical measurement error model, so $W_i = X_i + u_i$, where $u_i \sim N(0, \sigma_u^2)$ and $u_i \perp \{Y_i, X_i, \epsilon_i\}$. Then we have a naive model

\[
Y_i = \gamma_0 + \gamma_1 W_i + \epsilon_i^{*}, \tag{1}
\]

According to the least squares estimators,

\[
\hat{\gamma}_1 = \frac{\sum_{i=1}^{n} (W_i - \bar{W})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (W_i - \bar{W})^2} = \frac{S_{WY}}{S_{WW}}, \tag{1}
\]

where $S_{WY} = (n-1)^{-1} \sum_{i=1}^{n} (W_i - \bar{W})(Y_i - \bar{Y})$ and $S_{WW} = (n-1)^{-1} \sum_{i=1}^{n} (W_i - \bar{W})^2$. Therefore, as $n$ goes to $\infty$,

\[
\gamma_1 = \frac{\sigma_{WY}}{\sigma_W^2} = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_u^2}\, \beta_1, \qquad
\gamma_0 = \beta_0 + \frac{\sigma_u^2}{\sigma_X^2 + \sigma_u^2}\, \beta_1 \mu_X.
\]

So the naive estimate of $\beta_1$ is always attenuated as long as $\sigma_u^2$ is not zero.

Now suppose multiple covariates are measured with error and $X_i$ is a $p$-dimensional column vector. We have the linear model

\[
Y_i = \beta_0 + X_i' \beta_1 + \epsilon_i, \quad i = 1, \ldots, n, \tag{1}
\]

which is equivalent to

\[
Y_i - \bar{Y} = (X_i' - \bar{X}') \beta_1 + \epsilon_i, \quad i = 1, \ldots, n, \tag{1}
\]

where $\bar{Y} = \beta_0 + \bar{X}' \beta_1 = n^{-1} \sum_{i=1}^{n} Y_i$. According to the least squares estimators, $\hat{\beta}_1 = (X'X)^{-1} X'Y = S_{XX}^{-1} S_{XY}$. The form $\hat{\beta}_1 = S_{XX}^{-1} S_{XY}$ will be used very often in the rest of this section, so it will be called the centered form of the least squares estimators. A

normal classical measurement error model is defined as follows

\[
W_i = X_i + u_i, \quad \text{where } u_i \sim N(0, \Sigma_{uu}) \tag{1}
\]

and $u_i$ is independent of $X_i$ and $Y_i$. The naive model is

\[
Y_i = \gamma_0 + W_i' \gamma_1 + \epsilon_i^{*}, \quad i = 1, \ldots, n. \tag{1}
\]

According to the centered form of the least squares estimators, $\hat{\gamma}_1 = S_{WW}^{-1} S_{WY}$. Therefore, as $n$ goes to $\infty$,

\[
\gamma_1 = \Sigma_{WW}^{-1} \Sigma_{WY} = \Sigma_{WW}^{-1} \Sigma_{XY} = (\Sigma_{XX} + \Sigma_{uu})^{-1} \Sigma_{XX}\, \beta_1 \tag{1}
\]

(see Buonaccorsi [11], p. 109). So the naive estimator of $\beta_1$ is biased as long as $\Sigma_{uu}$ is not a zero matrix.

1.4 Marginal Effect

A marginal effect of a covariate on an outcome is frequently referred to as a population-averaged effect. Researchers might be interested in estimating a marginal effect. The $\beta$ estimated in previous sections does not represent a marginal effect of the corresponding covariate on an outcome. This section gives more details about marginal effects. Suppose that GLMMs are used to model associations between health outcomes and covariates as follows

\[
E(Y_{ij} \mid X_i, b_i) = h(b_i + X_{ij}\beta), \tag{1}
\]

\[
b_i = q(X_i, \gamma) + \epsilon_i, \tag{1}
\]

where $q(X_i, \gamma)$ is a function of $X_i$ and $\gamma$. A neighborhood is denoted by $i$, $i = 1, \ldots, M$, and an individual within a neighborhood is denoted by $j$, $j = 1, \ldots, N_i$; $Y_{ij}$ is a health outcome or social behavior; $h$ is an inverse link function; $b_i$ summarizes the neighborhood-level covariates; $X_{ij}$ is a vector of individual-level covariates; $X_i = \{X_{i1}, \ldots, X_{iN_i}\}$; and $\epsilon_i \sim N(0, 1)$. The $\beta$ estimated in model (1) are subject-level effects of covariates, not population-averaged effects of covariates, when the link function $g$ is a logit function.

Based on the approach presented by Brumback et al. [6], a marginal effect, denoted here by $\beta_m$, can be characterized as follows

\[
\beta_m = g\!\left( \int h(b_i + x_{ij}\beta)\, dF(b_i) \right) - g\!\left( \int h(b_i + (x_{ij} - 1)\beta)\, dF(b_i) \right), \tag{1}
\]

where $F(\cdot)$ is the cumulative distribution of $b_i$. Formula (1) is a general form for a marginal effect. If the link function $g$ is an identity function, a log function, or a logit function, then the marginal effect is defined as a risk difference, a log relative risk, or a log odds ratio, respectively. Without loss of generality, $X_{ij}$ is assumed to be a single covariate from now on in this section.

When the link function $g$ is an identity function, the inverse link function $h$ is also an identity function. The marginal effect in this case is a risk difference. Formula (1) becomes

\[
\beta_m = \int (b_i + x_{ij}\beta)\, dF(b_i) - \int (b_i + (x_{ij} - 1)\beta)\, dF(b_i) = (E(b_i) + x_{ij}\beta) - (E(b_i) + (x_{ij} - 1)\beta) = \beta.
\]

So the subject-level effect is identical to the population-averaged effect when $h$ is an identity function.

When the link function $g$ is a log function, the inverse link function $h$ is an exponential function. The marginal effect in this case is a log relative risk. Formula (1) becomes

\[
\beta_m = \log \int e^{b_i + x_{ij}\beta}\, dF(b_i) - \log \int e^{b_i + (x_{ij} - 1)\beta}\, dF(b_i) = \beta.
\]

When the link function $g$ is a logit function, the inverse link function $h$ is an expit function. The marginal effect in this case is a log odds ratio. Formula (1) becomes

\[
\beta_m = \operatorname{logit} \int \operatorname{expit}(b_i + x_{ij}\beta)\, dF(b_i) - \operatorname{logit} \int \operatorname{expit}(b_i + (x_{ij} - 1)\beta)\, dF(b_i) \ne \beta.
\]
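The three cases above can be checked numerically. The Python sketch below is illustrative only: it assumes $b_i \sim N(0, \sigma^2)$ with $\sigma = 2$, takes $\beta = 1$ and $x_{ij} = 1$, and approximates the integrals with a simple midpoint rule; the function names and these numerical values are choices made for this example and do not come from the cited work.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def normal_average(f, sigma, half_width=10.0, steps=4000):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) with a midpoint rule."""
    lo = -half_width * sigma
    step = 2.0 * half_width * sigma / steps
    norm_const = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(steps):
        b = lo + (k + 0.5) * step
        density = math.exp(-0.5 * (b / sigma) ** 2) / norm_const
        total += f(b) * density * step
    return total


def marginal_effect(g, h, beta, x, sigma):
    """g(E[h(b + x*beta)]) - g(E[h(b + (x - 1)*beta)]) for link g, inverse link h."""
    upper = normal_average(lambda b: h(b + x * beta), sigma)
    lower = normal_average(lambda b: h(b + (x - 1.0) * beta), sigma)
    return g(upper) - g(lower)


def identity(t):
    return t


beta, x, sigma = 1.0, 1.0, 2.0
m_identity = marginal_effect(identity, identity, beta, x, sigma)
m_log = marginal_effect(math.log, math.exp, beta, x, sigma)
m_logit = marginal_effect(logit, expit, beta, x, sigma)
print(m_identity, m_log, m_logit)
```

Under these assumptions the identity and log links return $\beta$ up to numerical error, while the logit link returns a value closer to zero than $\beta$.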

In this case, the estimated $\beta$ are not population-averaged effects of a covariate, that is, not marginal effects. The marginal effect must be computed based on formula (1). When $X_i$ is available and $b_i$ is a linear function of $X_i$ and $\gamma$, Stata's gllamm can be used directly to estimate all parameters. Brumback et al. [10] presented details on how to apply Stata gllamm to complex survey data. When $X_i$ is not available but $b_i$ is still assumed to be a linear function of $X_i$ and $\gamma$, weighted sample means within each neighborhood are often used in place of $X_i$. Simulations [10] show that the estimated $\beta$ are then biased, so the marginal effect based on formula (1) will be biased too.

1.5 Model-Based Standardization

Standardization is the process of making things of the same type have the same basic features [15]. Standardization is used in many different ways in statistics. The standard normal deviate is its most frequently used application. The standard normal deviate measures the distance between a random variable and its mean in units of the standard deviation [2]. Standardization methods are often used to adjust for the effects of age, sex, and other possible factors when comparing specific rates between two or more populations [2]. Indirect and direct standardization are the two most widely used methods for standardizing rates in epidemiology. For example, a directly standardized rate is defined as the sum of weighted rates as follows:

\[
r_S = \sum_{i=1}^{I} \frac{N_i}{N}\, r_i, \tag{1}
\]

where $N = \sum_{i=1}^{I} N_i$, $N_i$ is the total number of people in group $i$ in a standard population, and $r_i$ is the crude rate of group $i$ in a study population. So the directly standardized rate is nothing more than a weighted average of the corresponding rates. This standardization is computationally simple, but it relies heavily on the choice of the standard population.
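A small worked example may help make the formula for $r_S$ concrete. The Python sketch below uses made-up group sizes and rates (they are illustrative assumptions, not data from this dissertation) and applies two different standard populations to the same study rates, which shows how strongly the result depends on the chosen standard.

```python
def directly_standardized_rate(standard_counts, study_rates):
    """Weighted average of group-specific study rates r_i with weights N_i / N."""
    total = sum(standard_counts)
    return sum(n_i / total * r_i for n_i, r_i in zip(standard_counts, study_rates))


# Hypothetical crude rates for three age groups in the study population.
study_rates = [0.010, 0.020, 0.050]

# Two hypothetical standard populations with the same total size.
younger_standard = [5000, 3000, 2000]
older_standard = [2000, 3000, 5000]

rate_younger = directly_standardized_rate(younger_standard, study_rates)
rate_older = directly_standardized_rate(older_standard, study_rates)
print(rate_younger, rate_older)  # about 0.021 and 0.033
```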

Indirect and direct standardization are also referred to as nonparametric methods, which tend to suffer from instability (low precision) [21]. As a result, parametric estimators have been proposed to achieve higher precision. Parametric estimators are also referred to as model-based standardized estimators. Greenland [21] proposed a general approach for constructing estimators of standardized parameters using generalized linear models and showed that model-based estimators could have an exceptionally simple form in some common special cases. The main difference between nonparametric standardization and model-based standardization is the type of measurement being standardized. Nonparametric standardization finds weighted crude measurements. Model-based standardization finds weighted fitted measurements.

Model-based standardization can be used to standardize measurements of both association and effect. Rothman [29] presented the differences between association and effect. "A measure of effect compares what would happen to one population under two possible but distinct life courses or conditions, of which at most only one can occur. It is a theoretical concept insofar as it is logically impossible to observe the population under both conditions, and hence it is logically impossible to see directly the size of the effect. In contrast, a measure of association compares what happens in two distinct populations, although the two distinct populations may correspond to one population in different time periods. Subject to physical and social limitations, we can observe both populations and so can directly observe an association". Since an effect of a factor measures the difference in population characteristics under two distinct conditions or life courses, the effect of a factor is often called a potential-outcome or counterfactual measure.

Let $Y$, $X$, and $Z$ denote the outcome, exposure, and confounder. The outcome model, the regression of $Y$ on $X$ and $Z$, can be written as follows:

\[
E(Y \mid X = x, Z = z) = h(\alpha + \beta x + \gamma z), \tag{1}
\]

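where $h$ is an inverse link function. Let $W$ be a set of weights $w(z)$, with $\sum_z w(z) = 1$, which denote a standard distribution for the potential confounder $Z$. Then the regression of $Y$ on $X$ standardized to $W$ is the average of $E(Y \mid X = x, Z = z)$ weighted by $w(z)$, and it can be written as

\[
E_W(Y \mid X = x) = \sum_{z} w(z)\, E(Y \mid X = x, Z = z). \tag{1}
\]

If $Y$ is binary, then (1) can be rewritten as

\[
P_W(Y = 1 \mid X = x) = \sum_{z} w(z)\, P(Y = 1 \mid X = x, Z = z). \tag{1}
\]

There are several approaches to model-based standardization. One approach is to use the model-fitted outcome in (1). Model-based standardization has some nice properties. When the outcome model is of the additive form, (1) can be written as

\[
E(Y \mid X = x, Z = z) = \alpha + \beta x + \gamma z, \tag{1}
\]

and then $E_W(Y \mid X = x+1) - E_W(Y \mid X = x)$ is the standardized difference for a unit increase in $X$, which is equal to the coefficient $\beta$ of covariate $X$, regardless of the weighting $W$. When the outcome model is of the log-additive form, (1) can be written as

\[
E(Y \mid X = x, Z = z) = \exp(\alpha + \beta x + \gamma z), \tag{1}
\]

and then

\[
E_W(Y \mid X = x+1) = \sum_{z} w(z) \exp(\alpha + \beta (x+1) + \gamma z) = \sum_{z} w(z) \exp(\alpha + \beta x + \gamma z) \exp(\beta) = E_W(Y \mid X = x) \exp(\beta),
\]

so $\log[E_W(Y \mid X = x+1) / E_W(Y \mid X = x)]$ is the log standardized ratio for a unit increase in $X$, which is equal to the $X$ coefficient $\beta$, regardless of the weighting $W$. When the outcome model is of the logit form, those nice properties do not carry over. The third aim of this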

dissertation is to find a reasonable approach for computing a model-based standardized effect for complex survey data.
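The properties of model-based standardization described in this section can be illustrated numerically. The Python sketch below is a minimal example under assumed values: a binary confounder $Z$, coefficients $\alpha = -1$, $\beta = 0.7$, $\gamma = 1.5$, and two arbitrary standard distributions for $Z$. None of these numbers come from the dissertation. It computes the standardized contrast for a unit increase in $X$ under the additive, log-additive, and logit forms.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def identity(t):
    return t


def standardized_mean(h, x, weights, alpha=-1.0, beta=0.7, gamma=1.5):
    """E_W(Y | X = x) = sum over z of w(z) * h(alpha + beta*x + gamma*z)."""
    return sum(w * h(alpha + beta * x + gamma * z) for z, w in weights.items())


def unit_contrasts(weights):
    """Standardized contrasts for X = 1 versus X = 0 under three outcome models."""
    difference = (standardized_mean(identity, 1, weights)
                  - standardized_mean(identity, 0, weights))
    log_ratio = math.log(standardized_mean(math.exp, 1, weights)
                         / standardized_mean(math.exp, 0, weights))
    log_odds_ratio = (logit(standardized_mean(expit, 1, weights))
                      - logit(standardized_mean(expit, 0, weights)))
    return difference, log_ratio, log_odds_ratio


# Two hypothetical standard distributions w(z) for a binary confounder Z.
weights_a = {0: 0.8, 1: 0.2}
weights_b = {0: 0.3, 1: 0.7}

contrasts_a = unit_contrasts(weights_a)
contrasts_b = unit_contrasts(weights_b)
print(contrasts_a)
print(contrasts_b)
```

With these inputs the first two contrasts equal 0.7 for both weightings, while the standardized log odds ratio changes with the weighting and differs from 0.7.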

CHAPTER 2
THE CPL METHOD

2.1 Introduction

Brumback et al. [8, 10] have developed the CPL method for binary and ordinal outcomes with complex survey data. This chapter extends the scope of application of the CPL method with complex survey data. In Section 2.2, the CPL method is applied to multinomial outcomes in baseline category models. In Section 2.3, the CPL method is applied to count outcomes in loglinear models. In Section 2.4, the CPL method is applied to nonnegative continuous outcomes in loglinear models. In Section 2.5, the CPL method is applied to the 2008 Florida BRFSS survey data to assess the associations between average alcohol consumption and individual characteristics. This chapter develops the methods for applying the CPL method to various types of outcomes and shows why those methods work. The SAS code corresponding to the different types of outcomes is provided as well.

2.2 Multinomial Outcomes in Baseline Category Logit Models

Suppose a population includes $M$ clusters and $N_i$ individuals per cluster, and the response studied is a multinomial outcome such as categories of an individual's body mass index, days of physical exercise each week, and so on. Such health outcomes and behaviors are related not only to personal characteristics but also to neighborhood characteristics. Let $i = 1, \ldots, M$ index neighborhoods in the population and $j = 1, \ldots, N_i$ index individuals within a given neighborhood. Suppose $Y_{ij}$ is a multinomial outcome with $K$ categories, $X_{ij}$ denotes an individual characteristic, and $b_i$ denotes a neighborhood characteristic. Assume that $Y_{ij}$ given $b_i$ and $X_i$ independently follows $MN(\pi_1, \ldots, \pi_K)$, where $X_i = \{X_{i1}, \ldots, X_{iN_i}\}$. Many models, such as a baseline category logit model, an adjacent categories logit model, and a cumulative logit model, can be used to model associations between a multinomial outcome and covariates. The CPL method will be applied to baseline category logit models. The baseline category logit

model for the above data can be written as

\[
\log \frac{P(Y_{ij} = k \mid X_i, b_i)}{P(Y_{ij} = K \mid X_i, b_i)} = X_{ij}\beta_k + \alpha_k + b_i, \quad k = 1, \ldots, K-1, \tag{2}
\]

where the $\beta_k$ are the parameters of interest. Suppose $n_i$ individuals are randomly sampled per cluster; then all possible pairs within each cluster are formed. There are $n_i(n_i - 1)/2$ pairs in total per cluster whenever $n_i$ is greater than 1. Let $(Y_{ij}, Y_{il})$ denote an arbitrary pair of outcomes within a cluster. By applying (2), the conditional likelihood that $Y_{ij} = k$ and $Y_{il} = K$ given $X_i$, $b_i$, and $S_{ijl} = \{(Y_{ij} = k, Y_{il} = K) \text{ or } (Y_{ij} = K, Y_{il} = k) \text{ for } k \ne K\}$ equals

\[
L_{ijl}(\beta; Y_{ij}, Y_{il}) = P(Y_{ij} = k, Y_{il} = K \mid X_i, b_i, S_{ijl}) = \frac{\exp((X_{ij} - X_{il})\beta_k)}{1 + \exp((X_{ij} - X_{il})\beta_k)}.
\]

Similarly, the conditional likelihood that $Y_{ij} = K$ and $Y_{il} = k$ given $X_i$, $b_i$, and the same $S_{ijl}$ equals

\[
L_{ijl}(\beta; Y_{ij}, Y_{il}) = P(Y_{ij} = K, Y_{il} = k \mid X_i, b_i, S_{ijl}) = \frac{1}{1 + \exp((X_{ij} - X_{il})\beta_k)}.
\]

The conditional likelihood that $Y_{ij} = k_1$ and $Y_{il} = k_2$ given $X_i$, $b_i$, and $S_{ijl} = \{(Y_{ij} = k_1, Y_{il} = k_2) \text{ or } (Y_{ij} = k_2, Y_{il} = k_1) \text{ for } k_1 \ne k_2,\ k_1 \ne K, \text{ and } k_2 \ne K\}$ equals

\[
L_{ijl}(\beta; Y_{ij}, Y_{il}) = P(Y_{ij} = k_1, Y_{il} = k_2 \mid X_i, b_i, S_{ijl}) = \frac{\exp((X_{ij} - X_{il})\beta_{k_1} + (X_{il} - X_{ij})\beta_{k_2})}{1 + \exp((X_{ij} - X_{il})\beta_{k_1} + (X_{il} - X_{ij})\beta_{k_2})}.
\]

Similarly, the conditional likelihood that $Y_{ij} = k_2$ and $Y_{il} = k_1$ given $X_i$, $b_i$, and the same $S_{ijl}$ equals

\[
L_{ijl}(\beta; Y_{ij}, Y_{il}) = P(Y_{ij} = k_2, Y_{il} = k_1 \mid X_i, b_i, S_{ijl}) = \frac{1}{1 + \exp((X_{ij} - X_{il})\beta_{k_1} + (X_{il} - X_{ij})\beta_{k_2})}.
\]
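The key feature of these conditional likelihoods is that neither $b_i$ nor the intercepts $\alpha_k$ appear in them. The following Python sketch checks this numerically for a three-category outcome. It is an illustration under assumed values ($\beta_1 = 1$, $\beta_2 = 2$, and arbitrary intercepts and cluster effects); the function names are hypothetical and not taken from any library.

```python
import math


def category_probs(x, b, betas, alphas):
    """Baseline category logit probabilities; category K (listed last) is the reference."""
    scores = [math.exp(x * beta_k + alpha_k + b) for beta_k, alpha_k in zip(betas, alphas)]
    denom = 1.0 + sum(scores)
    return [s / denom for s in scores] + [1.0 / denom]


def pair_conditional(k1, k2, x_j, x_l, b, betas, alphas):
    """P(Y_ij = k1, Y_il = k2 | the pair equals (k1, k2) in some order); categories are 1-based."""
    p_j = category_probs(x_j, b, betas, alphas)
    p_l = category_probs(x_l, b, betas, alphas)
    observed = p_j[k1 - 1] * p_l[k2 - 1]
    swapped = p_j[k2 - 1] * p_l[k1 - 1]
    return observed / (observed + swapped)


def closed_form(k1, k2, x_j, x_l, betas):
    """expit((X_ij - X_il) * (beta_k1 - beta_k2)), with beta_K = 0 for the reference category."""
    full = list(betas) + [0.0]
    eta = (x_j - x_l) * (full[k1 - 1] - full[k2 - 1])
    return 1.0 / (1.0 + math.exp(-eta))


betas = [1.0, 2.0]
checks = []
for b, alphas in [(-2.0, [0.2, 0.4]), (0.5, [0.2, 0.4]), (0.5, [-1.0, 3.0])]:
    for k1, k2 in [(1, 3), (3, 1), (1, 2), (2, 1), (2, 3)]:
        direct = pair_conditional(k1, k2, 1.0, 0.0, b, betas, alphas)
        checks.append(abs(direct - closed_form(k1, k2, 1.0, 0.0, betas)))
print(max(checks))  # numerically zero
```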

Pairs with $Y_{ij} = Y_{il} = k$ make no contribution to estimating $\beta_k$, because the conditional likelihood $L_{ijl}(\beta; Y_{ij}, Y_{il})$ given that $(Y_{ij}, Y_{il}) = (k, k)$ does not depend on $\beta$. Using the CPL method, all pairs within each cluster in the sample data are treated as if they were independent, so the conditional likelihood function for estimating $\beta_k$ is

\[
L(\beta) = \prod_{i=1}^{m} \prod_{j=1}^{n_i-1} \prod_{l=j+1}^{n_i} L_{ijl}(\beta; Y_{ij}, Y_{il}). \tag{2}
\]

The proof that the expectation of the score function $S(\beta)$ is zero is presented in Appendix A. Now suppose unequal sampling rates are used for collecting data. Let $W_{ij}$ and $W_{il}$ denote the reciprocals of the sampling rates for $Y_{ij}$ and $Y_{il}$, respectively, so that $W_{ijl} = W_{ij} W_{il}$ denotes the weight of the pair of outcomes. The likelihood function for the sample data can then be written as follows:

\[
L_W(\beta) = \prod_{i=1}^{m} \prod_{j=1}^{n_i-1} \prod_{l=j+1}^{n_i} \left( L_{ijl}(\beta; Y_{ij}, Y_{il}) \right)^{W_{ijl}}. \tag{2}
\]

The weights in (2) are survey weights, rather than precision weights, and as such must be correctly specified to ensure a consistent estimator. The derivative of the log of $L_W(\beta)$ is an unbiased estimating equation for $\beta$, assuming that our baseline category logit model (2) is correct, that our weights $W_{ijl}$ are correct, that all within-cluster pairs in the population have a positive probability of selection into the sample, and that the expectation of the score function is 0.

SAS PROC SURVEYLOGISTIC can be used to estimate $\beta_k$. In order to use SAS PROC SURVEYLOGISTIC, the data have to be transformed as follows. For convenience, assume the outcome has $K = 3$ categories in this case.

When $Y_{ij} = 1$ and $Y_{il} = 3$, the new response is $Y_{ijl} = 1$ and the new covariates are $X_{ijl1} = X_{ij} - X_{il}$ and $X_{ijl2} = 0$.
When $Y_{ij} = 2$ and $Y_{il} = 3$, the new response is $Y_{ijl} = 1$ and the new covariates are $X_{ijl1} = 0$ and $X_{ijl2} = X_{ij} - X_{il}$.

When $Y_{ij} = 3$ and $Y_{il} = 1$, the new response is $Y_{ijl} = 0$ and the new covariates are $X_{ijl1} = X_{ij} - X_{il}$ and $X_{ijl2} = 0$.
When $Y_{ij} = 3$ and $Y_{il} = 2$, the new response is $Y_{ijl} = 0$ and the new covariates are $X_{ijl1} = 0$ and $X_{ijl2} = X_{ij} - X_{il}$.
When $Y_{ij} = 1$ and $Y_{il} = 2$, the new response is $Y_{ijl} = 1$ and the new covariates are $X_{ijl1} = X_{ij} - X_{il}$ and $X_{ijl2} = X_{il} - X_{ij}$.
When $Y_{ij} = 2$ and $Y_{il} = 1$, the new response is $Y_{ijl} = 0$ and the new covariates are $X_{ijl1} = X_{ij} - X_{il}$ and $X_{ijl2} = X_{il} - X_{ij}$.

Now we can use standard software for ordinary logistic regression with the transformed complex survey data $(Y_{ijl}, X_{ijl1}, X_{ijl2})$ and $W_{ijl}$ to estimate $\beta_k$.

    proc surveylogistic data=all_pairs;
      ods output ParameterEstimates=one_estimates;
      cluster i;
      model Y_ijl(ref='0') = X_ijl1 X_ijl2 / link=glogit noint;
      weight W_ijl;
    run;

A simulation study shows that when the selection bias is strong, ignoring sampling weights can result in a strongly biased estimate of $\beta$. The simulation study is conducted as follows. A population data set is generated based on the following model

\[
\log \frac{P(Y_{ij} = k \mid X_i, b_i)}{P(Y_{ij} = K \mid X_i, b_i)} = X_{ij}\beta_k + \alpha_k + \bar{X}_i \gamma + \gamma_2 \epsilon_i, \quad k = 1, \ldots, K-1, \tag{2}
\]

where $\beta_1 = 1$, $\beta_2 = 2$, $\alpha_1 = 0.2$, $\alpha_2 = 0.4$, $\gamma = -1$, $\gamma_2 = 1$, $i = 1, \ldots, 1000$, and $j = 1, \ldots, 1000$. The data $X_{ij}$, $Y_{ij}$, and $\epsilon_i$ are generated as follows: $u_i \sim N(0, 1)$, $\epsilon_i \sim N(0, 1)$, $X_{ij} \sim \mathrm{Ber}(\operatorname{expit}(u_i))$, $Y_{ij} \sim \mathrm{multinomial}(p_1, p_2, p_3)$, $p_1 = exp_1 / p$, $p_2 = exp_2 / p$, $p_3 = 1 / p$, $p = 1 + exp_1 + exp_2$, $exp_1 = \exp(X_{ij}\beta_1 + \alpha_1 + \bar{X}_i \gamma + \epsilon_i)$, and $exp_2 = \exp(X_{ij}\beta_2 + \alpha_2 + \bar{X}_i \gamma + \epsilon_i)$. 100 simulation data sets are sampled as follows.

1. When $X_{ij} = 0$ and $Y_{ij} = 1$, the sampling rate is set to 0.004 and the corresponding sampling weight is set to $W_{ij} = 1/4$.
2. When $X_{ij} = 0$ and $Y_{ij} = 2$, the sampling rate is set to 0.003 and the corresponding sampling weight is set to $W_{ij} = 1/3$.
3. When $X_{ij} = 0$ and $Y_{ij} = 3$, the sampling rate is set to 0.002 and the corresponding sampling weight is set to $W_{ij} = 1/2$.
4. When $X_{ij} = 1$ and $Y_{ij} = 1$, the sampling rate is set to 0.002 and the corresponding sampling weight is set to $W_{ij} = 1/2$.
5. When $X_{ij} = 1$ and $Y_{ij} = 2$, the sampling rate is set to 0.003 and the corresponding sampling weight is set to $W_{ij} = 1/3$.
6. When $X_{ij} = 1$ and $Y_{ij} = 3$, the sampling rate is set to 0.004 and the corresponding sampling weight is set to $W_{ij} = 1/4$.

The CPL method is applied to the 100 simulated sample data sets with and without sampling weights. The estimates of $\beta$ are presented in Table 2-1. From Table 2-1, we see that the 95% confidence intervals of our estimates of $\beta$ without sampling weights do not cover the true values of $\beta$, but the 95% confidence intervals of our estimates with sampling weights do. We also see that the relative biases of our estimates of $\beta_1$ and $\beta_2$ without weights are -138.3% and -33.4% on average, respectively.

Table 2-1. Simulation Results for a Multinomial Distribution Based on 100 Simulation Data Sets Using the CPL Method (True $\beta_1 = 1$ and $\beta_2 = 2$)

    Coefficient   Using Weight   Mean     SD      Range              95% CI for Mean
    $\beta_1$     Yes             1.003   0.200   (0.553, 1.621)     (0.957, 1.043)
    $\beta_2$     Yes             2.026   0.181   (1.577, 2.707)     (1.990, 2.062)
    $\beta_1$     No             -0.383   0.201   (-0.842, 0.232)    (-0.423, -0.343)
    $\beta_2$     No              1.332   0.181   (0.883, 2.020)     (1.296, 1.368)

2.3 Count Outcomes in Loglinear Models

Suppose a population includes $M$ clusters and $N_i$ individuals per cluster, and the response studied is a count outcome such as the number of sick days in one year, the number of cigarettes smoked in one day, etc. Such health outcomes and behaviors are

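related not only to personal characteristics but also to neighborhood characteristics. Let $i = 1, \ldots, M$ index neighborhoods in the population and $j = 1, \ldots, N_i$ index individuals within a given neighborhood. Suppose $Y_{ij}$ is a count outcome, $X_{ij}$ denotes an individual characteristic, and $b_i$ denotes a neighborhood characteristic. Assume that $Y_{ij}$ given $b_i$ and $X_i$ independently follows $\mathrm{Poisson}(\lambda_{ij})$, where $X_i = \{X_{i1}, \ldots, X_{iN_i}\}$. The following generalized linear model can be used to model associations between outcomes and covariates:

\[
h(E(Y_{ij} \mid X_i, b_i)) = b_i + X_{ij}\beta, \tag{2}
\]

where $h$ is a log function and $\beta$ are the parameters of interest. Therefore, $\lambda_{ij} = \exp(b_i + X_{ij}\beta)$. Suppose $n_i$ individuals are randomly sampled per cluster; then all possible pairs within each cluster are formed. There are $n_i(n_i - 1)/2$ pairs in total per cluster whenever $n_i$ is greater than 1. Let $S_{ijl} = Y_{ij} + Y_{il}$ and let $(Y_{ij}, Y_{il})$ denote a pair of outcomes; then $S_{ijl} \sim \mathrm{Poisson}(\lambda_{ij} + \lambda_{il})$. See Casella et al. [16] for more details about the distribution of the sum of two independent random variables. By applying (2), the joint distribution of $(Y_{ij}, Y_{il})$ given $S_{ijl}$, $X_i$, and $b_i$ can be derived as follows:

\[
\begin{aligned}
P(Y_{ij}, Y_{il} \mid S_{ijl}, X_i, b_i)
&= \frac{P(Y_{ij}, Y_{il}, S_{ijl} \mid X_i, b_i)}{P(S_{ijl} \mid X_i, b_i)}
 = \frac{P(Y_{ij} \mid X_i, b_i)\, P(Y_{il} \mid X_i, b_i)}{P(Y_{ij} + Y_{il} \mid X_i, b_i)} \\
&= \frac{\lambda_{ij}^{Y_{ij}} \exp(-\lambda_{ij})}{Y_{ij}!} \cdot \frac{\lambda_{il}^{Y_{il}} \exp(-\lambda_{il})}{Y_{il}!} \cdot \frac{(Y_{ij} + Y_{il})!}{(\lambda_{ij} + \lambda_{il})^{Y_{ij} + Y_{il}} \exp(-(\lambda_{ij} + \lambda_{il}))} \\
&= \frac{(Y_{ij} + Y_{il})!}{Y_{ij}!\, Y_{il}!} \cdot \frac{(\exp(b_i + X_{ij}\beta))^{Y_{ij}} (\exp(b_i + X_{il}\beta))^{Y_{il}}}{(\exp(b_i + X_{ij}\beta) + \exp(b_i + X_{il}\beta))^{Y_{ij} + Y_{il}}} \\
&= \frac{(Y_{ij} + Y_{il})!}{Y_{ij}!\, Y_{il}!} \left( \frac{\exp(X_{ij}\beta)}{\exp(X_{ij}\beta) + \exp(X_{il}\beta)} \right)^{Y_{ij}} \left( \frac{\exp(X_{il}\beta)}{\exp(X_{ij}\beta) + \exp(X_{il}\beta)} \right)^{Y_{il}},
\end{aligned}
\]

so that

\[
P(Y_{ij}, Y_{il} \mid S_{ijl}, X_i, b_i) = \frac{(Y_{ij} + Y_{il})!}{Y_{ij}!\, Y_{il}!} \left( \frac{\exp((X_{ij} - X_{il})\beta)}{1 + \exp((X_{ij} - X_{il})\beta)} \right)^{Y_{ij}} \left( \frac{1}{1 + \exp((X_{ij} - X_{il})\beta)} \right)^{S_{ijl} - Y_{ij}}. \tag{2}
\]

Using the CPL method, all pairs within each neighborhood in the sample data are treated as if they were independent, so the conditional likelihood function for all pairs within each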

neighborhood can be written as follows:

\[
L(\beta \mid X_i, b_i) = \prod_{i=1}^{M} \prod_{j=1}^{n_i-1} \prod_{l=j+1}^{n_i} \frac{(Y_{ij} + Y_{il})!}{Y_{ij}!\, Y_{il}!} \left( \frac{\exp((X_{ij} - X_{il})\beta)}{1 + \exp((X_{ij} - X_{il})\beta)} \right)^{Y_{ij}} \left( \frac{1}{1 + \exp((X_{ij} - X_{il})\beta)} \right)^{Y_{il}} \tag{2}
\]

The proof that the expectation of the derivative of the log of $L(\beta \mid X_i, b_i)$ is equal to zero will be presented in the next section. Two approaches can be used to interpret the above result. The first approach is to treat the right-hand side of (2) as two independent observations. Brumback et al. [9] first interpreted the right-hand side of (2) in this way. The first observation has a covariate $X_{ijl} = X_{ij} - X_{il}$, no intercept, an outcome $Y_{ijl} = 1$, and a weight $W_{ijl} = Y_{ij}$; the second observation has a covariate $X_{ijl} = X_{ij} - X_{il}$, no intercept, an outcome $Y_{ijl} = 0$, and a weight $W_{ijl} = Y_{il}$. So $\beta$ can be estimated using the SAS SURVEYLOGISTIC procedure. The second approach is to treat the right-hand side of (2) as a binomial distribution with $p_{ijl} = \exp((X_{ij} - X_{il})\beta) / (1 + \exp((X_{ij} - X_{il})\beta))$, number of trials $S_{ijl} = Y_{ij} + Y_{il}$, number of successes $Y_{ij}$, and weights $w_{ijl} = 1$. Similarly, $\beta$ can be estimated using the SAS SURVEYLOGISTIC procedure.

Now suppose unequal sampling rates are used for collecting data. Let $W_{ij}$ and $W_{il}$ denote the reciprocals of the sampling rates for $Y_{ij}$ and $Y_{il}$, respectively, so that $W_{ijl} = W_{ij} W_{il}$ denotes the weight of the pair of outcomes $(Y_{ij}, Y_{il})$. The likelihood function of a pair of observations can then be written as follows:

\[
L_{W_{ijl}}(\beta \mid X_i, b_i) = \left( \frac{(Y_{ij} + Y_{il})!}{Y_{ij}!\, Y_{il}!} \left( \frac{\exp((X_{ij} - X_{il})\beta)}{1 + \exp((X_{ij} - X_{il})\beta)} \right)^{Y_{ij}} \left( \frac{1}{1 + \exp((X_{ij} - X_{il})\beta)} \right)^{S_{ijl} - Y_{ij}} \right)^{W_{ijl}}. \tag{2}
\]

The derivative of the log of $L_{W_{ijl}}(\beta \mid X_i, b_i)$ is an unbiased estimating equation for $\beta$, assuming that our loglinear model (2) is correct, that our weights $W_{ijl}$ are correct, that all within-cluster pairs in the population have a positive probability of selection into the sample, and that the expectation of the score function is zero.
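Both interpretations can be checked with a few lines of code. The Python sketch below is illustrative only (the function names and numerical values are assumptions for this example). It verifies that conditioning a pair of Poisson counts on their sum gives the binomial form in (2) for any value of $b_i$, and that the two weighted pseudo-observations of the first approach reproduce the same log-likelihood kernel, optionally scaled by a pair weight $W_{ijl}$.

```python
import math


def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)


def conditional_pair_prob(y_j, y_l, x_j, x_l, b, beta):
    """P(Y_ij, Y_il | Y_ij + Y_il) computed directly from the Poisson model."""
    lam_j = math.exp(b + x_j * beta)
    lam_l = math.exp(b + x_l * beta)
    joint = poisson_pmf(y_j, lam_j) * poisson_pmf(y_l, lam_l)
    return joint / poisson_pmf(y_j + y_l, lam_j + lam_l)


def binomial_form(y_j, y_l, x_j, x_l, beta):
    """Binomial form: y_j successes in y_j + y_l trials with p = expit((x_j - x_l) * beta)."""
    p = 1.0 / (1.0 + math.exp(-(x_j - x_l) * beta))
    return math.comb(y_j + y_l, y_j) * p ** y_j * (1.0 - p) ** y_l


def pair_to_rows(y_j, y_l, x_j, x_l, pair_weight=1.0):
    """First approach: two pseudo-observations weighted by the outcomes themselves."""
    x_diff = x_j - x_l
    return [
        {"outcome": 1, "x": x_diff, "weight": pair_weight * y_j},
        {"outcome": 0, "x": x_diff, "weight": pair_weight * y_l},
    ]


def weighted_logistic_loglik(rows, beta):
    """Weighted no-intercept logistic log-likelihood of the pseudo-observations."""
    total = 0.0
    for row in rows:
        p = 1.0 / (1.0 + math.exp(-row["x"] * beta))
        total += row["weight"] * math.log(p if row["outcome"] == 1 else 1.0 - p)
    return total


y_j, y_l, x_j, x_l, beta = 3, 2, 1.0, 0.0, 0.5
target = binomial_form(y_j, y_l, x_j, x_l, beta)
direct = [conditional_pair_prob(y_j, y_l, x_j, x_l, b, beta) for b in (-1.0, 0.7)]
kernel = weighted_logistic_loglik(pair_to_rows(y_j, y_l, x_j, x_l), beta)
print(direct, target, kernel)
```

The same `pair_to_rows` expansion works when the outcomes are not integers, since only the weights change.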

Again, the conditional pseudo-likelihood method can be used to estimate $\beta$; all pairs within each cluster are treated as if they were independent. In order to find a consistent estimator of $\beta$, SAS SURVEYLOGISTIC can be used to fit a generalized linear model with a new response (the number of events) $Y_{ijl} = Y_{ij}$, a new covariate $X_{ijl} = X_{ij} - X_{il}$, a new weight $W_{ijl} = W_{ij} W_{il}$, the number of trials $S = Y_{ij} + Y_{il}$, and no intercept. The following SAS code estimates $\beta$ based on the first approach.

    proc surveylogistic data=all_pairs;
      ods output ParameterEstimates=one_estimates;
      model Y_ijl = X_ijl / link=glogit noint;
      cluster i;
      weight W_ijl;
    run;

The following SAS code estimates $\beta$ based on the second approach.

    proc surveylogistic data=all_pairs;
      ods output ParameterEstimates=one_estimates;
      model Y_ijl/S_ijl = X_ijl / link=glogit noint;
      cluster i;
      weight W_ijl;
    run;

A simulation study shows that when the selection bias is strong, ignoring sampling weights can result in a strongly biased estimate of $\beta$. The simulation study is conducted as follows. A population data set is generated based on the following model

\[
\log(E(Y_{ij} \mid X_i, b_i)) = X_{ij}\beta + \gamma_0 + \bar{X}_i \gamma_1 + \gamma_2 \epsilon_i, \tag{2}
\]

where $\beta = 0.5$, $\gamma_0 = -2$, $\gamma_1 = 2$, $\gamma_2 = 1$, $i = 1, \ldots, 1000$, and $j = 1, \ldots, 1000$. The data $X_{ij}$, $Y_{ij}$, and $\epsilon_i$ are generated as follows: $u_i \sim N(0, 1)$, $\epsilon_i \sim N(0, 1)$, $X_{ij} \sim \mathrm{Ber}(\operatorname{expit}(u_i))$,

$Y_{ij} \sim \mathrm{Poisson}(\lambda_{ij})$, and $\lambda_{ij} = \exp(X_{ij}\beta + \gamma_0 + \bar{X}_i \gamma_1 + \epsilon_i)$. 100 simulation data sets are sampled as follows.

1. When $X_{ij} = 0$ and $Y_{ij} < 2$, the sampling rate is set to 0.002 and the corresponding sampling weight is set to $W_{ij} = 2$.
2. When $X_{ij} = 1$ and $Y_{ij} < 2$, the sampling rate is set to 0.004 and the corresponding sampling weight is set to $W_{ij} = 1$.
3. When $X_{ij} = 0$ and $Y_{ij} \ge 2$, the sampling rate is set to 0.004 and the corresponding sampling weight is set to $W_{ij} = 1$.
4. When $X_{ij} = 1$ and $Y_{ij} \ge 2$, the sampling rate is set to 0.002 and the corresponding sampling weight is set to $W_{ij} = 2$.

The CPL method is applied to the 100 simulated sample data sets with and without sampling weights. Our estimates of $\beta$ are presented in Table 2-2. From Table 2-2, we see that the 95% confidence interval of our estimate of $\beta$ without sampling weights does not cover the true value of $\beta$, but the 95% confidence interval of our estimate with sampling weights does. We also see that the relative bias of our estimate of $\beta$ without weights is -64.2% on average.

Table 2-2. Simulation Results for a Poisson Distribution Based on 100 Simulation Data Sets Using the CPL Method (True $\beta = 0.5$)

    Coefficient   Using Weight   Mean    SD      Range             95% CI for Mean
    $\beta$       Yes            0.495   0.066   (0.349, 0.652)    (0.482, 0.508)
    $\beta$       No             0.179   0.064   (0.053, 0.325)    (0.166, 0.182)

2.4 Nonnegative Continuous Outcomes in Loglinear Models

Suppose that the population includes $M$ clusters and $N_i$ individuals per cluster, and the response studied is a nonnegative continuous outcome such as an individual's BMI, blood pressure, and so on. Such health outcomes are related not only to personal characteristics but also to neighborhood characteristics. Let $i = 1, \ldots, M$ index neighborhoods in the population and $j = 1, \ldots, N_i$ index individuals within neighborhood $i$. Suppose $Y_{ij}$ is a nonnegative continuous outcome, $X_{ij}$ denotes an

individual characteristic, and $b_i$ denotes a neighborhood characteristic. Assume that $Y_{ij}$ given $b_i$ and $X_i$ independently follows $\mathrm{gamma}(\lambda_{ij}, \theta)$ or some other nonnegative continuous distribution, where $X_i = \{X_{i1}, \ldots, X_{iN_i}\}$. Model (2) can be used to model associations between a nonnegative continuous outcome and covariates. Suppose $n_i$ individuals are randomly sampled per cluster; then all possible pairs within each cluster are formed. There are $n_i(n_i - 1)/2$ pairs in total per cluster whenever $n_i$ is greater than 1. Let $S_{ijl} = Y_{ij} + Y_{il}$ and let $(Y_{ij}, Y_{il})$ denote a pair of outcomes. Brumback et al. [9] mention a very useful extension of (2): it can be applied to a nonnegative continuous outcome to find a consistent estimator of $\beta$ as long as the expectation of the outcome given $X_i$ and $b_i$ satisfies equation (2). The following is the justification for this extension. Let $C = (Y_{ij} + Y_{il})! / (Y_{ij}!\, Y_{il}!)$ and let $L_{ijl}(\beta \mid X_i, b_i)$ be the likelihood function of a pair of outcomes $(Y_{ij}, Y_{il})$. Then $L_{ijl}(\beta \mid X_i, b_i)$ can be written as follows:

\[
\begin{aligned}
L_{ijl}(\beta \mid X_i, b_i) &= P(Y_{ij}, Y_{il} \mid S_{ijl}, X_i, b_i) \\
&= C \left( \frac{\exp((X_{ij} - X_{il})\beta)}{1 + \exp((X_{ij} - X_{il})\beta)} \right)^{Y_{ij}} \left( \frac{1}{1 + \exp((X_{ij} - X_{il})\beta)} \right)^{S_{ijl} - Y_{ij}} \\
&= C \exp(Y_{ij}(X_{ij} - X_{il})\beta) \left( 1 + \exp((X_{ij} - X_{il})\beta) \right)^{-(Y_{ij} + Y_{il})},
\end{aligned}
\]

so the log likelihood can be written as

\[
\log(L_{ijl}(\beta \mid X_i, b_i)) = \log C + Y_{ij}(X_{ij} - X_{il})\beta - (Y_{ij} + Y_{il}) \log(1 + \exp((X_{ij} - X_{il})\beta)),
\]

and the score function of $\log[L_{ijl}(\beta \mid X_i, b_i)]$ is

\[
S(\beta \mid X_i, b_i) = (X_{ij} - X_{il}) \left( Y_{ij} - (Y_{ij} + Y_{il}) \frac{\exp((X_{ij} - X_{il})\beta)}{1 + \exp((X_{ij} - X_{il})\beta)} \right). \tag{2}
\]
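Because the score above is linear in $Y_{ij}$ and $Y_{il}$, its expectation depends on the outcome distribution only through the conditional means. The following Python sketch, offered as an illustration with assumed values rather than as part of the original derivation, plugs the means $\exp(b_i + X_{ij}\beta)$ and $\exp(b_i + X_{il}\beta)$ into the score and evaluates it at the true $\beta$ and at a wrong value.

```python
import math


def expected_pair_score(beta_eval, beta_true, x_j, x_l, b):
    """Expectation of the pair score at beta_eval when E(Y) = exp(b + x * beta_true).

    Only the conditional means enter, so the result holds for any nonnegative
    outcome distribution with this mean (Poisson, gamma, and so on).
    """
    mu_j = math.exp(b + x_j * beta_true)
    mu_l = math.exp(b + x_l * beta_true)
    x_diff = x_j - x_l
    p = 1.0 / (1.0 + math.exp(-x_diff * beta_eval))
    return x_diff * (mu_j - (mu_j + mu_l) * p)


# At the true beta the expected score vanishes for every cluster effect b.
at_truth = [expected_pair_score(0.5, 0.5, 1.0, 0.0, b) for b in (-1.0, 0.0, 2.0)]
# Away from the true beta it does not.
off_truth = expected_pair_score(0.2, 0.5, 1.0, 0.0, 0.0)
print(at_truth, off_truth)
```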

Taking the expectation of both sides of the score function (2) and applying the loglinear model (2) to the expectations of $Y_{ij}$ and $Y_{il}$, we have

\[
E(S(\beta \mid X_i, b_i)) = (X_{ij} - X_{il}) \exp(b_i) \left( \exp(X_{ij}\beta) - (\exp(X_{ij}\beta) + \exp(X_{il}\beta)) \frac{\exp(X_{ij}\beta)}{\exp(X_{ij}\beta) + \exp(X_{il}\beta)} \right) = 0.
\]

We see that when an outcome has any nonnegative continuous distribution, the method of consistently estimating $\beta$ for an outcome with a Poisson distribution can be applied directly, without any change. Therefore, the Poisson assumption is unnecessary.

Again, the conditional pseudo-likelihood method can be used to estimate $\beta$; all pairs within each cluster are treated as if they were independent. In order to use SAS PROC SURVEYLOGISTIC, the right-hand side of (2) has to be interpreted as two independent observations because $Y_{ij}$ is no longer an integer. The first observation has $Y_{ijl} = 1$, a covariate $X_{ijl} = X_{ij} - X_{il}$, a weight $W_{ijl} = Y_{ij}$, id as the cluster, and no intercept. The second observation has $Y_{ijl} = 0$, a covariate $X_{ijl} = X_{ij} - X_{il}$, a weight $W_{ijl} = Y_{il}$, id as the cluster, and no intercept.

Suppose unequal sampling rates are used for collecting data. Let $W_{ij}$ and $W_{il}$ denote the reciprocals of the sampling rates for $Y_{ij}$ and $Y_{il}$, respectively, so that $W_{ijl} = W_{ij} W_{il}$ denotes the weight of the pair of outcomes $(Y_{ij}, Y_{il})$. In order to find a consistent estimator of $\beta$, the weights must be reconsidered. The new weight for the first observation is $W_{ijl} Y_{ij}$ and the new weight for the second observation is $W_{ijl} Y_{il}$. The SAS code is the same as the code for the count outcome based on the first approach.

A simulation study shows that when the selection bias is strong, ignoring sampling weights can result in a strongly biased estimate of $\beta$. The simulation study is conducted as follows. A population data set is generated based on the following model

\[
\log(E(Y_{ij} \mid X_i, b_i)) = X_{ij}\beta + \gamma_0 + \bar{X}_i \gamma_1 + \gamma_2 \epsilon_i, \tag{2}
\]

where $\beta = 3$, $\gamma_0 = -2$, $\gamma_1 = 2$, $\gamma_2 = 1$, $i = 1, \ldots, 1000$, and $j = 1, \ldots, 1000$. The data $X_{ij}$, $Y_{ij}$, and $\epsilon_i$ are generated as follows: $u_i \sim N(0, 1)$, $\epsilon_i \sim N(0, 1)$, $X_{ij} \sim \mathrm{Ber}(\operatorname{expit}(u_i))$, $Y_{ij} \sim \mathrm{gamma}(\lambda_{ij}, 1)$, and $\lambda_{ij} = \exp(X_{ij}\beta + \gamma_0 + \bar{X}_i \gamma_1 + \epsilon_i)$. 100 simulation data sets are sampled as follows.

1. When $X_{ij} = 0$ and $Y_{ij} < 2$, the sampling rate is set to 0.001 and the corresponding sampling weight is set to $W_{ij} = 4$.
2. When $X_{ij} = 1$ and $Y_{ij} < 2$, the sampling rate is set to 0.004 and the corresponding sampling weight is set to $W_{ij} = 1$.
3. When $X_{ij} = 0$ and $Y_{ij} \ge 2$, the sampling rate is set to 0.004 and the corresponding sampling weight is set to $W_{ij} = 1$.
4. When $X_{ij} = 1$ and $Y_{ij} \ge 2$, the sampling rate is set to 0.001 and the corresponding sampling weight is set to $W_{ij} = 4$.

The CPL method is applied to the 100 simulated sample data sets with and without sampling weights. The estimates of $\beta$ are presented in Table 2-3. From Table 2-3, we see that the 95% confidence interval of our estimate of $\beta$ without sampling weights does not cover the true value of $\beta$, but the 95% confidence interval of our estimate with sampling weights does. We also see that the relative bias of our estimate of $\beta$ without weights is -14.7% on average.

Table 2-3. Simulation Results for a Gamma Distribution Based on 100 Simulation Data Sets Using the CPL Method (True $\beta = 3$)

    Coefficient   Using Weight   Mean    SD      Range             95% CI for Mean
    $\beta$       Yes            3.018   0.104   (2.785, 3.237)    (2.997, 3.039)
    $\beta$       No             2.559   0.102   (2.286, 2.753)    (2.539, 2.579)

2.5 Application of the CPL Method

The CPL method will be applied to a nonnegative continuous distribution to adjust for confounding by neighborhood characteristics with the 2008 Florida BRFSS complex survey data. The health behavior chosen as the response variable for the data analysis is average alcohol consumption (AVEDRNKS) in the past 30 days. The response variable

PAGE 38

AVEDRNKS approximately follows a Poisson distribution. Although the Poisson distribution is discrete, the CPL estimating equations for nonnegative continuous outcomes require only a nonnegative outcome with a log-linear mean, so the same approach applies here. Multiple factors are included in this data analysis to find associations between the health behavior and those factors while adjusting for confounding due to neighborhood characteristics. Those factors include gender, age, race, education, and household income. Categorized variables are used in the data analysis. Age is categorized into four groups: 18 to 34, 35 to 54, 55 to 64, and 65+. Race is categorized into five groups: whites, blacks, Hispanics, Asians, and others. Education is categorized into two groups and household income is also categorized into two groups.

In the 2008 Florida BRFSS data set, there are 10283 observations. 201 observations are excluded from the data analysis due to missing response values and a further 1120 observations are excluded due to missing covariate values. The remaining 8962 observations are used to fit three different models with survey weights. Model 1 is a generalized linear model that does not consider neighborhood characteristics. In model 1 the distribution of the response is set to be a Poisson distribution and a log link function is used. Model 2 is a Poisson loglinear regression model that considers neighborhood characteristics. The loglinear model is translated into a logistic regression model by the CPL method. The CPL method is used to avoid estimating nuisance parameters. Model 3 is a generalized linear mixed model that considers neighborhood characteristics and treats them as random variables to avoid estimating nuisance parameters. SAS PROC GENMOD is used in model 1 to find associations between the response and covariates. SAS PROC SURVEYLOGISTIC is used in model 2 to find associations between responses and covariates while adjusting for neighborhood characteristics. The Stata program gllamm is used in model 3 to find associations between responses and covariates while adjusting for neighborhood characteristics.
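The pairwise data expansion behind Model 2 can be sketched in Python. This is only an illustration under stated assumptions, not the SAS code used in the analysis: the function names `expand_pairs` and `fit_cpl_slope`, the restriction to a single scalar covariate, and the hand-written Newton-Raphson solver are all choices made for this example. Each within-cluster pair $(j, l)$ becomes two weighted pseudo-observations, and a no-intercept weighted logistic regression on the covariate difference gives the CPL estimate of $\beta$.

```python
from itertools import combinations

import numpy as np


def expand_pairs(cluster_ids, x, y, w=None):
    """Return pseudo-observations as rows (response, x_diff, weight).

    For every within-cluster pair (j, l) two rows are created:
      response 1, covariate x_j - x_l, weight w_j * w_l * y_j
      response 0, covariate x_j - x_l, weight w_j * w_l * y_l
    """
    cluster_ids = np.asarray(cluster_ids)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sampling weights default to 1 (equal sampling rates).
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    rows = []
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        for j, l in combinations(members, 2):
            diff = x[j] - x[l]
            pair_weight = w[j] * w[l]
            rows.append((1.0, diff, pair_weight * y[j]))
            rows.append((0.0, diff, pair_weight * y[l]))
    return np.array(rows, dtype=float).reshape(-1, 3)


def fit_cpl_slope(pseudo, n_iter=50):
    """Weighted no-intercept logistic fit by Newton-Raphson (scalar beta)."""
    response, diff, weight = pseudo.T
    beta = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-beta * diff))
        score = np.sum(weight * diff * (response - p))
        info = np.sum(weight * diff ** 2 * p * (1.0 - p))
        if info <= 0.0:  # no informative pairs
            break
        beta += score / info
    return beta


# Toy data (hypothetical): one cluster, x = (1, 0), y = (3.0, 1.0).
pseudo = expand_pairs([1, 1], [1.0, 0.0], [3.0, 1.0])
beta_hat = fit_cpl_slope(pseudo)
```

In this toy example the estimating equation reduces to $Y_{ij} - (Y_{ij} + Y_{il})\,\mathrm{expit}(\beta) = 0$, so the fitted slope equals $\log(3/1)$. Standard errors that respect the clustering (which PROC SURVEYLOGISTIC supplies) are not part of this sketch.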

Table 2-4. Associations between Average Drinks and Covariates in the 2008 Florida BRFSS Complex Survey Data

                        Model 1                     Model 2                     Model 3
Factor        Group     $\beta$ (SE)    p-value     $\beta$ (SE)    p-value     $\beta$ (SE)    p-value
Age           18-34     0                           0                           0
              35-54     -0.41 (.001)    .0001       -0.37 (0.13)    0.0038      -0.38 (0.10)    .0001
              55-64     -0.53 (.001)    .0001       -0.65 (0.17)    .0001       -0.60 (0.16)    .0001
              65+       -0.77 (.001)    .0001       -0.74 (0.13)    .0001       -0.71 (0.11)    .0001
Sex           M         0                           0                           0
              F         -0.65 (.001)    .0001       -0.63 (0.09)    .0001       -0.58 (0.08)    .0001
Race          White     0                           0                           0
              Black     -0.67 (.001)    .0001       -0.59 (0.17)    0.0005      -0.47 (0.17)    0.005
              Hisp      -0.29 (.001)    .0001       -0.28 (0.14)    0.0434      -0.29 (0.15)    0.051
              Asian     -1.06 (.004)    .0001       -0.99 (0.46)    0.0299      -0.88 (0.49)    0.075
              Other     -0.25 (.001)    .0001       -0.21 (0.24)    0.3637      -0.10 (0.25)    0.707
Edu. (Years)  13-       0                           0                           0
              14+       -0.02 (.001)    .0001       0.18 (0.13)     0.1604      0.12 (0.10)     0.202
Income        Low       0                           0                           0
              High      -0.25 (.001)    .0001       0.20 (0.11)     0.0692      0.19 (0.11)     0.082

Note: Model 1 is a GLM and its parameters are estimated by using the SAS GENMOD procedure. Model 2 is a GLMM without specifying $\psi(\bar{X}_i, \delta_i)$ and its parameters are estimated by using the CPL method. Model 3 is a GLMM with $\psi(\bar{X}_i, \delta_i) = \bar{X}_i \gamma + \delta_i$ specified and its parameters are estimated by using the Stata program gllamm.

We know that estimators from the CPL method are consistent whether or not a neighborhood effect exists. If neighborhood characteristics have no effect on the associations between outcomes and covariates, then the estimated coefficients from model 1 and model 2 should be close to each other. Let us look at the estimated coefficients from the three models. We see that many coefficients differ among the models. First, education and income have significant associations with average drinks in model 1, but the significance disappears in both model 2 and model 3. Second, the directions of those associations in model 1 are opposite to the directions of the associations in models 2 and 3. Last, the p-values of the first four estimated coefficients are significant in all three models, and the differences of those estimated coefficients between model 2 and model 3 are not greater than those between model 2 and model 1, except for the coefficient of female. So neighborhood-level covariates should
be included to model associations between responses and covariates. We see that there are some differences in coefficients between model 2 and model 3. They might be caused by unmeasured neighborhood covariates. In model 3, weighted neighborhood means are used as neighborhood covariates. In the 2008 Florida BRFSS survey data, more than half of the neighborhoods have sample sizes of 5 or less. Therefore, weighted neighborhood means could be significantly different from the true neighborhood means, especially when a covariate is categorical.
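To make the last point concrete, the weighted neighborhood mean of a covariate can be computed as in the following sketch. The function name `weighted_cluster_means` and the data values (zip codes, weights, a binary high-income indicator) are hypothetical and serve only as an illustration.

```python
import numpy as np


def weighted_cluster_means(cluster_ids, x, w):
    """Weighted mean of x within each cluster: sum(w * x) / sum(w)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    ids, inverse = np.unique(np.asarray(cluster_ids), return_inverse=True)
    numerator = np.bincount(inverse, weights=w * x)
    denominator = np.bincount(inverse, weights=w)
    return ids, numerator / denominator


# Hypothetical data: two zip codes with 3 and 2 respondents.
zip_ids = np.array([32601, 32601, 32601, 32603, 32603])
high_income = np.array([1, 0, 0, 1, 1])          # binary covariate
survey_w = np.array([2.0, 1.0, 1.0, 1.0, 3.0])   # sampling weights
ids, means = weighted_cluster_means(zip_ids, high_income, survey_w)
```

In the second zip code both sampled respondents report high income, so the estimated neighborhood mean is 1 whatever the true neighborhood proportion is. With only a handful of respondents per neighborhood, a binary covariate leaves very few attainable values for the sample mean.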

CHAPTER 3
BIAS AND MEASUREMENT ERROR DUE TO UNMEASURED CLUSTER COVARIATES

3.1 Introduction

Chapter 3 presents bias and measurement error due to unmeasured cluster covariates in between-within models. Section 3.2 discusses the bias due to adjusting for confounding by unmeasured cluster covariates in between-within models. We use simulation studies to show that estimators are biased when between-within models are used, the errors are heteroscedastic, and the heteroscedasticity is not handled properly. Section 3.3 discusses the measurement error in linear mixed models (LMMs) due to unmeasured cluster covariates, presents how the unmeasured cluster covariates lead to biased estimators in LMMs, and shows how to correct the bias of estimators in LMMs. A simulation study is also conducted to show the biased estimators and the correction of the biased estimators. The results obtained for LMMs can be directly applied to linear models (LMs).

3.2 Bias due to Adjusting for Confounding by Unmeasured Cluster Covariates

Between-within models are often used in social epidemiology in order to adjust for confounding by neighborhood characteristics. In many situations, the approach adequately adjusts for confounding due to unmeasured cluster covariates. However, there are some situations in which the approach can fail. In this section, heteroscedasticity of errors is used to show that the approach can fail. A between-within model has the following form

$$g(E(Y_{ij} \mid X_i, \delta_i)) = \alpha + (X_{ij} - \bar{X}_i)\beta_W + \bar{X}_i \beta_B + \delta_i, \quad i = 1, \dots, m;\ j = 1, \dots, n_i, \qquad (3)$$

which is equivalent to

$$g(E(Y_{ij} \mid X_i, \delta_i)) = \alpha + X_{ij}\beta + \bar{X}_i \gamma + \delta_i, \quad i = 1, \dots, m;\ j = 1, \dots, n_i, \qquad (3)$$

$$\beta = \beta_W \text{ and } \gamma = \beta_B - \beta_W, \qquad (3)$$
where $g(\cdot)$ is a link function, $X_i = \{X_{i1}, X_{i2}, \dots, X_{in_i}\}$, $\bar{X}_i = n_i^{-1} \sum_j X_{ij}$ is a cluster covariate in cluster $i$, $X_{ij}$ is a covariate of the $j$th observation in cluster $i$, and the $\delta_i$ are independent and identically distributed with mean 0 and variance $\sigma^2$. In this model, the cluster covariate $\bar{X}_i$ appears in a linear form, which is the usual choice. In practice, we might not know exactly how the cluster covariate $\bar{X}_i$ affects an outcome $Y_{ij}$. Therefore, more generally, we can let $b_i = \psi(\bar{X}_i, \gamma) + \delta_i$ appear in the between-within model. This gives the following more general form of the between-within model

$$g(E(Y_{ij} \mid X_i, b_i)) = X_{ij}\beta + b_i. \qquad (3)$$

When $g(\cdot)$ is the identity function and errors are normally distributed and homoscedastic, the consistent estimators of $\beta$ in models (3) and (3) are identical and denoted by $\hat{\beta}_{BW}$. The estimator has the following form

$$\hat{\beta}_{BW} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i) Y_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2}, \qquad (3)$$

where we assume $n_i = n$ for $i = 1, \dots, m$. See Appendix B for details about this formula.

Suppose model (3) is the true model. When $b_i$ happens to be equal to $\alpha + \bar{X}_i \gamma + \delta_i$, the usual between-within model approach is valid. Because $\delta_i$ is assumed to be independent of $X_{ij}$, this implies that $\bar{X}_i$ is a sufficient confounder for estimating the effect of $X_{ij}$ on $Y_{ij}$. It means that no other unmeasured confounders are needed. Therefore, an alternative between-within model could have the following simpler version

$$E(Y_{ij} \mid X_i) = \alpha + X_{ij}\beta + \bar{X}_i \gamma. \qquad (3)$$

The estimator of $\beta$ based on the above model is identical to the estimator at (3).

When the link function $g(\cdot)$ is the identity and errors are normally distributed and heteroscedastic with variances $1/w_{ij}$, one may wish to take heteroscedasticity into account. If weights $w_{ij}$ are added in model (3), the simulation study that we will present shows that the estimator of $\beta$ is biased. This estimator is denoted by $\hat{\beta}_{BWH}$. If
weights $w_{ij}$ are added in model (3), the simulation study shows that the estimator of $\beta$ is also biased. This estimator is denoted by $\hat{\beta}_{BWHa}$. When errors are heteroscedastic, one way of handling the heteroscedastic errors is to replace $\bar{X}_i$ in model (3) by $\bar{X}^w_i = (\sum_j w_{ij} X_{ij}) / (\sum_j w_{ij})$. Therefore, another version of the between-within model with heteroscedastic errors might be

$$E(Y_{ij} \mid X_i, \delta_i) = \alpha + X_{ij}\beta + \bar{X}^w_i \gamma + \delta_i. \qquad (3)$$

However, the simulation study shows that the estimator of $\beta$ based on (3) with weights $w_{ij}$ is still biased. This estimator is denoted by $\hat{\beta}_{BWHb}$. When model (3) is true and errors are normally distributed and heteroscedastic, a consistent estimator of $\beta$ based on model (3) can be written as

$$\hat{\beta} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} (X_{ij} - \bar{X}^w_i) Y_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} (X_{ij} - \bar{X}^w_i)^2}, \qquad (3)$$

which is identical to the estimator based on the following model

$$E(Y_{ij} \mid X_i) = \alpha + X_{ij}\beta + \bar{X}^w_i \gamma, \qquad (3)$$

assuming heteroscedasticity with variance $1/w_{ij}$. Model (3) is a new version of the between-within model that better accommodates the heteroscedasticity of errors.

The simulation study is conducted as follows. We set 10,000 clusters and 5 observations per cluster. We let $u_i \sim N(0, 0.25^2)$, $X_{ij} \sim N(u_i, 1)$, $\delta_i \sim N(0, 1)$, $b_i = 5u_i + \delta_i$ or $b_i = 5u_i$, and $Y_{ij} \sim N(X_{ij}\beta + b_i, 1/w_{ij})$, where $\beta = 0.5$ and $w_{ij} = \mathrm{expit}(X_{ij}\beta + b_i)(1 - \mathrm{expit}(X_{ij}\beta + b_i))$. Simulation results are presented in Table 3-1. The first estimator is based on the incorrect assumption that errors are homoscedastic. The next four estimators are based on the correct assumption that errors are heteroscedastic. $\hat{\beta}_{BW}$ can be estimated by using SAS PROC GENMOD based on model (3) without using weights $w_{ij}$ or SAS PROC REG based on model (3) without using
weights $w_{ij}$. Equation (3) can also be used to estimate $\hat{\beta}_{BW}$. The estimator $\hat{\beta}_{BW}$ is consistent but much more variable than $\hat{\beta}$ because it ignores the heteroscedasticity of errors. $\hat{\beta}_{BWH}$ is estimated by using SAS PROC GENMOD based on model (3) using weights $w_{ij}$. This estimator is biased due to using $\bar{X}_i$ rather than $\bar{X}^w_i$. $\hat{\beta}_{BWHa}$ is estimated by using SAS PROC REG based on model (3) using weights $w_{ij}$. This estimator is biased due to using $\bar{X}_i$ rather than $\bar{X}^w_i$. $\hat{\beta}_{BWHb}$ is estimated by using SAS PROC GENMOD based on model (3) using weights $w_{ij}$. This estimator is biased due to using $\delta_i$. $\hat{\beta}$ is estimated by using SAS PROC REG based on model (3) using weights $w_{ij}$. Equation (3) can be used to estimate $\hat{\beta}$ as well. This estimator is consistent and much less variable than $\hat{\beta}_{BW}$. In summary, when between-within models are used and the errors are heteroscedastic, heteroscedasticity must be handled properly; otherwise, estimators will be biased or have a larger variance than that of $\hat{\beta}$.

Table 3-1. Simulation Results Based on 100 Simulation Data Sets

$b_i$              Estimator              Mean   SD     Range            95% CI for Mean
$5u_i + \delta_i$  $\hat{\beta}_{BW}$     0.497  0.239  (-0.153, 1.046)  (0.450, 0.544)
                   $\hat{\beta}_{BWH}$    0.280  0.049  (0.160, 0.402)   (0.270, 0.290)
                   $\hat{\beta}_{BWHa}$   0.214  0.051  (0.088, 0.341)   (0.204, 0.224)
                   $\hat{\beta}_{BWHb}$   0.406  0.051  (0.279, 0.536)   (0.396, 0.416)
                   $\hat{\beta}$          0.496  0.050  (0.367, 0.624)   (0.486, 0.506)
$5u_i$             $\hat{\beta}_{BW}$     0.473  0.156  (0.067, 0.792)   (0.442, 0.504)
                   $\hat{\beta}_{BWH}$    0.305  0.044  (0.209, 0.467)   (0.296, 0.314)
                   $\hat{\beta}_{BWHa}$   0.250  0.045  (0.146, 0.417)   (0.241, 0.259)
                   $\hat{\beta}_{BWHb}$   0.412  0.044  (0.319, 0.561)   (0.403, 0.421)
                   $\hat{\beta}$          0.497  0.043  (0.410, 0.639)   (0.489, 0.505)

3.3 Measurement Error in LMMs due to Unmeasured Cluster Covariates

If you believe a model should include unmeasured neighborhood characteristics and you are interested in not only estimating the effect of personal factors on outcomes but also estimating the effect of neighborhood factors on outcomes, then an ordinary
estimator for neighborhood factors is biased. Next we turn to a multiple linear model with measurement error. In order to understand how measurement error affects the corresponding coefficient and might further affect other coefficients, the best way is to begin with a motivating example. The following setting is fictional and not based on any observational study. It might be true or partly true. Suppose there are two neighborhoods A and B and the average ages of people in neighborhoods A and B are 30 and 60, respectively. People in neighborhood A are highly likely to do more outdoor activities than people in neighborhood B, so elderly people in neighborhood A are also highly likely to do more outdoor activities than elderly people in neighborhood B. Therefore, high blood pressure is associated with both personal age and the average age of people in a neighborhood, since more outdoor activities can reduce high blood pressure. When surveys or interviews are used to collect people's ages and the average age of people in a neighborhood, personal age is usually measured without error but the average age of people in a neighborhood will be measured with error due to the small sample size within each neighborhood.

Let $Y_{ij}$ denote personal blood pressure, let $X_{ij}$ denote personal age minus 45, an observed variable without measurement error, and let $\bar{X}_i = N_i^{-1} \sum_{j=1}^{N_i} X_{ij}$ denote the average age of people in a neighborhood, which is unobserved because of the small sample size in each neighborhood. Let $\bar{X}^s_i = n_i^{-1} \sum_{j=1}^{n_i} X_{ij}$ denote the sample average age, which is the variable actually observed. Assume $X_{ij} \mid u_i \sim (u_i, \sigma_x^2)$, $u_i \sim (0, \sigma_u^2)$, $\varepsilon_{ij} \sim (0, \sigma_\varepsilon^2)$, $\delta_i \sim (0, \sigma_n^2)$, $u_i \perp \varepsilon_{ij}$, and $\delta_i \perp \{u_i, \varepsilon_{ij}, X_i\}$. Therefore, $X_{ij} \sim (0, \sigma_x^2 + \sigma_u^2)$, $\bar{X}_i \mid u_i \sim (u_i, \sigma_x^2 / N_i)$, $\bar{X}_i \sim (0, \sigma_u^2 + \sigma_x^2 / N_i)$, and $\bar{X}^s_i \sim (0, \sigma_u^2 + \sigma_x^2 / n_i)$. The following models can then be used to model the association between personal blood pressure and personal age and the average age of people in a neighborhood. When $\bar{X}_i$ is available, the true model is

$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \beta_2 \bar{X}_i + \delta_i + \varepsilon_{ij}, \qquad (3)$$
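where $i = 1, \dots, M$ and $j = 1, \dots, n_i$. For sample data, $\bar{X}_i$ is not available and $\bar{X}^s_i$ is used in its place, so a naive model is

$$Y_{ij} = \beta_0^* + \beta_1^* X_{ij} + \beta_2^* \bar{X}^s_i + \delta_i + \varepsilon_{ij}, \qquad (3)$$

where $i = 1, \dots, M$ and $j = 1, \dots, n_i$. Based on the conditional distribution of $X_{ij}$ given $u_i$, the marginal distribution of $\bar{X}_i$, and the assumption that $X_{i1}, \dots, X_{in_i}$ are independent given $u_i$, the covariance between $X_{ij}$ and $\bar{X}_i$ can be derived as follows:

$$\begin{aligned}
\mathrm{Cov}(X_{ij}, \bar{X}_i) &= E(X_{ij} \bar{X}_i) \\
&= E\Bigl(X_{ij}^2 / N_i + X_{ij} \bigl(\textstyle\sum_{j' \ne j} X_{ij'}\bigr) / N_i\Bigr) \\
&= E(X_{ij}^2 / N_i) + E\Bigl(X_{ij} \bigl(\textstyle\sum_{j' \ne j} X_{ij'}\bigr) / N_i\Bigr) \\
&= E(X_{ij}^2 / N_i) + E\Bigl(E\bigl(X_{ij} \bigl(\textstyle\sum_{j' \ne j} X_{ij'}\bigr) / N_i \mid u_i\bigr)\Bigr) \\
&= E(X_{ij}^2 / N_i) + (N_i - 1) E(u_i^2) / N_i \quad \text{(independence)} \\
&= \sigma_u^2 + \sigma_x^2 / N_i.
\end{aligned}$$

Therefore,

$$\mathrm{Cov}(X_{ij}, \bar{X}_i) = \mathrm{Var}(\bar{X}_i) = \sigma_u^2 + \sigma_x^2 / N_i. \qquad (3)$$

Similarly,

$$\mathrm{Cov}(X_{ij}, \bar{X}^s_i) = \mathrm{Var}(\bar{X}^s_i) = \sigma_u^2 + \sigma_x^2 / n_i. \qquad (3)$$

The difference between $\bar{X}_i$ and $\bar{X}^s_i$ is due to the small sample size $n_i$ within each neighborhood, and the difference is not independent of $X_{ij}$. The difference will be called measurement error in the rest of this section. The following shows how the measurement error due to the small sample size affects coefficients in LMMs. The rest of this section has four parts. The first part shows that the coefficient of $X_{ij}$ is always consistently and unbiasedly estimated in LMMs. The second part shows that the estimated coefficient of $\bar{X}^s_i$ is biased in LMMs. The third part shows how to correct the biased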
estimation of $\beta_2$ in LMMs. The last part shows that the approach used for LMMs can be directly applied to LMs without any change.

In Appendix B, formulas for estimating $\beta_1$ in LM (B) and LMM (B) are established. Comparing the two equations (B) and (B) for estimating $\beta_1$, we see they are exactly the same estimator. The LM (B) does not include $\bar{X}^s_i$, so $\hat{\beta}_1$ in the LM (B) is not affected by the measurement error of $\bar{X}^s_i$ and is unbiased and consistent. This implies that $\hat{\beta}_1$ in the LMM (B) is not affected by the measurement error of $\bar{X}^s_i$ and is consistent even when the LMM (B) does include $\bar{X}^s_i$, which has measurement error due to small sample size.

Based on the LMM (B) and the matrices $X' V^{-1} X$ and $X' V^{-1} Y$, we can find the estimate of $\beta_1 + \beta_2$, which is

$$\hat{\beta}_1^* + \hat{\beta}_2^* = \frac{\sum_{ij} \bar{X}^s_i Y_{ij} - \hat{\beta}_0^* \sum_{ij} \bar{X}^s_i}{\sum_{ij} (\bar{X}^s_i)^2}. \qquad (3)$$

If $\bar{X}_i$ is available, then the estimate of $\beta_1 + \beta_2$ is the following

$$\hat{\beta}_1 + \hat{\beta}_2 = \frac{\sum_{ij} \bar{X}_i Y_{ij} - \hat{\beta}_0 \sum_{ij} \bar{X}_i}{\sum_{ij} (\bar{X}_i)^2}. \qquad (3)$$

Because there is a difference between $\bar{X}_i$ and $\bar{X}^s_i$, $\hat{\beta}_1^* + \hat{\beta}_2^*$ is different from $\hat{\beta}_1 + \hat{\beta}_2$. A simulation study will be conducted and will show that $\hat{\beta}_2^*$ has around 50% bias. Now we want to correct the bias of $\hat{\beta}_1^* + \hat{\beta}_2^*$. We know $\hat{\beta}_1^*$ is an unbiased estimator of $\beta_1$, so we have the following

$$\hat{\beta}_1 = \frac{\sum_{ij} (X_{ij} - \bar{X}_i) Y_{ij}}{\sum_{ij} (X_{ij} - \bar{X}_i)^2} = \frac{\sum_{ij} X_{ij} Y_{ij} - \sum_{ij} \bar{X}_i Y_{ij}}{\sum_{ij} (X_{ij}^2 - \bar{X}_i^2)}.$$
Given $\mathrm{Var}(X_{ij}) = \sigma_x^2 + \sigma_u^2$ and $\mathrm{Var}(\bar{X}_i) = \sigma_u^2 + \sigma_x^2 / N_i$, we have $\mathrm{Var}(X_{ij}) - \mathrm{Var}(\bar{X}_i) = \sigma_x^2 - \sigma_x^2 / N_i \doteq \sigma_x^2$ since $N_i$ is large. The estimator of $\beta_1$ can then be written as

$$\hat{\beta}_1 = \frac{\sum_{ij} X_{ij} Y_{ij} - \sum_{ij} \bar{X}_i Y_{ij}}{(mn)(\mathrm{Var}(X_{ij}) - \mathrm{Var}(\bar{X}_i))} \doteq \frac{\sum_{ij} X_{ij} Y_{ij} - \sum_{ij} \bar{X}_i Y_{ij}}{(mn)\sigma_x^2}.$$

From the last equation, we can see that

$$\sum_{ij} \bar{X}_i Y_{ij} \doteq \sum_{ij} X_{ij} Y_{ij} - (mn)\sigma_x^2 \hat{\beta}_1 = \sum_{ij} X_{ij} Y_{ij} - (mn)\sigma_x^2 \hat{\beta}_1^*,$$

then

$$\hat{\beta}_1 + \hat{\beta}_2 = \frac{\sum_{ij} \bar{X}_i Y_{ij} - \hat{\beta}_0 \sum_{ij} \bar{X}_i}{\sum_{ij} (\bar{X}_i)^2} \doteq \frac{\sum_{ij} X_{ij} Y_{ij} - (mn)\sigma_x^2 \hat{\beta}_1^* - \hat{\beta}_0^* \sum_{ij} \bar{X}_i}{(mn)\mathrm{Var}(\bar{X}_i)}.$$

The correction of $\hat{\beta}_2^*$ is

$$\begin{aligned}
\hat{\beta}_2^c &\doteq \frac{\sum_{ij} X_{ij} Y_{ij} - (mn)\sigma_x^2 \hat{\beta}_1^* - \hat{\beta}_0^* \sum_{ij} \bar{X}_i}{(mn)\mathrm{Var}(\bar{X}_i)} - \hat{\beta}_1^* \\
&= \frac{(mn)^{-1} \sum_{ij} X_{ij} (Y_{ij} - \bar{Y} + \bar{Y}) - (\sigma_x^2 + \sigma_u^2)\hat{\beta}_1^* - \hat{\beta}_0^* \bar{X}}{\sigma_u^2} \\
&= \frac{(mn)^{-1} \sum_{ij} X_{ij} (Y_{ij} - \bar{Y}) + (mn)^{-1} \sum_{ij} (X_{ij} \bar{Y}) - (\sigma_x^2 + \sigma_u^2)\hat{\beta}_1^* - \hat{\beta}_0^* \bar{X}}{\sigma_u^2} \\
&= \frac{(mn)^{-1} \sum_{ij} X_{ij} (Y_{ij} - \bar{Y}) - (\sigma_x^2 + \sigma_u^2)\hat{\beta}_1^*}{\sigma_u^2} \\
&\doteq \frac{\mathrm{Cov}(X_{ij}, Y_{ij}) - (\sigma_x^2 + \sigma_u^2)\hat{\beta}_1^*}{\sigma_u^2} \quad \text{as } mn \to \infty \\
&\doteq \frac{\mathrm{Cov}(X_{ij}, \hat{\beta}_1^* X_{ij} + \hat{\beta}_2^* \bar{X}^s_i) - (\sigma_x^2 + \sigma_u^2)\hat{\beta}_1^*}{\sigma_u^2} \\
&= \frac{\hat{\beta}_1^* \mathrm{Var}(X_{ij}) + \hat{\beta}_2^* \mathrm{Cov}(X_{ij}, \bar{X}^s_i) - (\sigma_x^2 + \sigma_u^2)\hat{\beta}_1^*}{\sigma_u^2} \\
&= \frac{\hat{\beta}_2^* \mathrm{Cov}(X_{ij}, \bar{X}^s_i)}{\sigma_u^2}.
\end{aligned}$$
For the computation of the correction of $\beta_2$, we use the following

$$\hat{\beta}_2^c \doteq \frac{(mn)^{-1} \sum_{ij} X_{ij} Y_{ij} - (\hat{\sigma}_x^2 + \hat{\sigma}_u^2)\hat{\beta}_1^* - \hat{\beta}_0^* \bar{X}}{\hat{\sigma}_u^2}, \qquad (3)$$

where $\sum_{ij} X_{ij} Y_{ij}$ and $\bar{X}$ can be computed directly from the raw data, $\sigma_x^2$ and $\sigma_u^2$ can be computed from the raw data with the equalities $\mathrm{Cov}(X_{ij}, \bar{X}^s_i) = \mathrm{Var}(\bar{X}^s_i) = \sigma_u^2 + \sigma_x^2 / n_i$ and $\mathrm{Var}(X_{ij}) = \sigma_x^2 + \sigma_u^2$, and $\hat{\beta}_0^*$ and $\hat{\beta}_1^*$ are taken from the naive model. A simulation study verified the above correction of $\hat{\beta}_2^*$.

The simulation study is conducted as follows. A population data set is generated based on model (3). We set $\beta_0 = 2$, $\beta_1 = 2$, $\beta_2 = -3$, $\bar{X}_i = N_i^{-1} \sum_{j=1}^{N_i} X_{ij}$, $i = 1, \dots, 1000$, and $j = 1, \dots, 1000$, where $i$ and $j$ index neighborhood and individual, respectively. Data $X_{ij}$, $Y_{ij}$, $\delta_i$, and $\varepsilon_{ij}$ are generated as follows: $\varepsilon_{ij} \sim N(0, 1)$, $\delta_i \sim N(0, 1)$, $u_i \sim N(0, 1)$, $X_{ij} \sim N(u_i, 4)$, and $Y_{ij}$ is generated by model (3). 100 simulation data sets are sampled as follows. For each simulation data set, four individuals are randomly selected from each neighborhood, so $n_i = n = 4$ for all $i$. We estimate $\beta_0$, $\beta_1$, and $\beta_2$ using the sample data sets. The parameters are therefore estimated based on the naive model (3), because $N_i^{-1} \sum_{j=1}^{N_i} X_{ij}$ is different from $n_i^{-1} \sum_{j=1}^{n_i} X_{ij}$ due to the small sample size within each neighborhood. The estimated parameters are presented in Table 3-2 with a superscript star (*). We see that the estimates of $\beta_0$ and $\beta_1$ from the naive model are still very good, but the naive estimator of $\beta_2$ is biased and its relative bias is 49.3%. We can correct the bias based on formula (3). The corrected estimate of $\beta_2$ based on formula (3) is presented in Table 3-2 with a superscript letter c.

Table 3-2. Simulation Results for a Linear Mixed Model Based on 100 Simulation Data Sets (True $\beta_0 = 2$, $\beta_1 = 2$, and $\beta_2 = -3$)

Coefficient   Mean    SD     Range             95% CI for Mean
$\beta_0^*$   2.029   0.050  (1.902, 2.142)    (2.019, 2.039)
$\beta_1^*$   2.000   0.009  (1.974, 2.019)    (1.998, 2.002)
$\beta_2^*$   -1.523  0.040  (-1.635, -1.432)  (-1.531, -1.515)
$\beta_2^c$   -3.019  0.202  (-3.734, -2.526)  (-3.059, -2.979)
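A single run of a simulation in this spirit can be sketched in Python. This is an illustration under stated assumptions rather than the code behind Table 3-2: it fits the naive model by ordinary least squares instead of an LMM (the text states the correction carries over to LMs), it uses 2000 neighborhoods rather than 1000 to make one run more stable, it takes the first four individuals of each neighborhood as the random sample, and all variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2014)

# Assumed settings: M neighborhoods of N people, n sampled per neighborhood.
M, N, n = 2000, 1000, 4
beta0, beta1, beta2 = 2.0, 2.0, -3.0
sigma_x2, sigma_u2 = 4.0, 1.0

u = rng.normal(0.0, np.sqrt(sigma_u2), size=M)
X_pop = rng.normal(u[:, None], np.sqrt(sigma_x2), size=(M, N))
xbar_pop = X_pop.mean(axis=1)   # true neighborhood mean (unobserved in practice)
X = X_pop[:, :n]                # individuals are i.i.d. given u, so this is a random sample
xbar_s = X.mean(axis=1)         # observed sample mean, measured with error
delta = rng.normal(size=M)
eps = rng.normal(size=(M, n))
Y = beta0 + beta1 * X + beta2 * xbar_pop[:, None] + delta[:, None] + eps

# Naive fit: regress Y on X_ij and the sample mean.
design = np.column_stack([np.ones(M * n), X.ravel(), np.repeat(xbar_s, n)])
b0, b1, b2_naive = np.linalg.lstsq(design, Y.ravel(), rcond=None)[0]

# Moment estimates from Var(X_ij) = sx2 + su2 and Var(xbar_s) = su2 + sx2 / n.
var_x = X.var()
var_xbar_s = xbar_s.var()
s_x2 = (var_x - var_xbar_s) * n / (n - 1)
s_u2 = var_x - s_x2

# Corrected estimate following the correction formula in the text.
b2_corrected = (np.mean(X * Y) - (s_x2 + s_u2) * b1 - b0 * X.mean()) / s_u2
```

With these settings $\mathrm{Cov}(X_{ij}, \bar{X}^s_i) / \sigma_u^2 = (1 + 4/4)/1 = 2$, so by the last line of the derivation the naive coefficient should sit near half of $\beta_2$, in line with the -1.523 reported in Table 3-2, and the corrected value should be close to -3.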

The above correction of $\hat{\beta}_2^*$ can be directly applied to LMs without any change because LMs are special cases of LMMs with $\mathrm{Var}(\delta_i) = 0$.

CHAPTER 4
MODEL-BASED STANDARDIZATION TO ACCOUNT FOR UNMEASURED CLUSTER CONFOUNDING WITH COMPLEX SURVEY DATA

Model-based standardization depends on a statistical model to estimate an unconfounded population-averaged effect. We propose two methods to conduct model-based standardization to account for unmeasured cluster confounding with complex survey data. The first method is to use the conditional pseudo-likelihood (CPL) method in a generalized linear mixed model when census data are not available. The second method is to use a generalized linear mixed model when census data are available. We show how to estimate the standardized effect using these two methods with complex survey data. In our simulation studies, the estimated parameters from these two methods are almost identical. We apply these two methods to the 2008 Florida Behavioral Risk Factor Surveillance System (BRFSS) survey data and estimate standardized proportions of people who drink alcohol, stratified by age group, to account for unmeasured cluster confounding. We compare and discuss standardized proportions estimated by the two methods.

4.1 Introduction

Between-within models decompose covariate effects into between-cluster and within-cluster effects [25]. Using between-within models in social epidemiology is equivalent to assuming that individual-level factors and neighborhood-level factors should be treated separately. The terms cluster and neighborhood are often used interchangeably in this chapter. In social epidemiology, neighborhood-level factors are often viewed as important causes of health outcomes and social behaviors, in addition to effects of individual factors. Our interest is motivated by an analysis of the 2008 Florida BRFSS complex survey data. We are interested in estimating the standardized proportions of people who drink alcohol, stratified by age group, to account for unmeasured confounding due to cluster. Drinking is a health behavior which is related to many personal factors such as education, income, age, gender, and race/ethnicity. It is
also possibly influenced by social-economic factors at the neighborhood level. BRFSS survey data has a complex sampling design, so unequal sampling weights are used to sample individuals. We define neighborhoods as zip codes. The sample size within each zip code is typically 5 or less in the 2008 Florida BRFSS survey data. We want to compare standardized proportions across age groups using identical distributions of individual and neighborhood level confounders within each age group category. This is model-based standardization. We face three problems: small sample sizes within cluster, unequal sampling weights, and unmeasured cluster-level confounding. They have to be solved simultaneously in order to estimate the standardized proportions, stratified by age group, to account for unmeasured confounding due to cluster.

This chapter is organized as follows. Section 4.2 describes the methods we propose. In Section 4.3, we present simulation studies to show that our methods can estimate the parameters we need for computing the model-based standardized effect. In Section 4.4, we apply our proposed methods to the 2008 Florida BRFSS survey data. The final section concludes with a brief discussion.

4.2 Methods

We will introduce our methods with our motivating example. The methods we propose for computing model-based standardizations will be applied to the 2008 Florida BRFSS survey data. Our interest is in the model-based standardized proportions of people who drink alcohol, stratified by age group, accounting for unmeasured confounding due to cluster. Let $i = 1, \dots, M$ denote the $i$th neighborhood in the population and $j = 1, \dots, N_i$ denote the $j$th individual in the $i$th neighborhood. Let $X_{ij}$ and $Z_{ij}$ denote vectors of individual-level exposures and confounders, respectively. Let $u_i$ denote a random effect that follows a certain distribution. Then the following model can be used to model the association between response and individual-level exposure to adjust for
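confounding $Z_{ij}$ and a cluster-level random effect $u_i$.

$$E(Y_{ij} \mid X_i, Z_i, u_i) = h(X_{ij}\beta + Z_{ij}\eta + u_i), \qquad (4)$$

$$u_i = \psi(\bar{X}_i, \bar{Z}_i, \delta_i), \qquad (4)$$

where $\psi$ is an unknown function and $\delta_i \sim N(0, \sigma^2)$, $u_i$ may be associated in an arbitrary manner with $X_i$ and $Z_i$, $X_i = (X_{i1}, \dots, X_{ip})$, $Z_i = (Z_{i1}, \dots, Z_{iq})$, and $h$ is an inverse link function. When $Y_{ij}$ is a binary outcome, $h$ can be chosen to be the expit function, so model (4) can be written as

$$E(Y_{ij} \mid X_i, Z_i, u_i) = \mathrm{expit}(X_{ij}\beta + Z_{ij}\eta + u_i). \qquad (4)$$

The model-based standardized proportion corresponding to $X_{ij} = X_0$ can be defined as

$$E(Y_{ij} \mid X_{ij} = X_0) = E_{Z_{ij}} \int \mathrm{expit}(X_0\beta + Z_{ij}\eta + u_i)\, dF(u_i). \qquad (4)$$

For data from a random sample, the model-based standardized proportion corresponding to $X_{ij} = X_0$ can be estimated as follows:

$$\hat{E}(Y_{ij} \mid X_{ij} = X_0) = m^{-1} \sum_{i=1}^{m} \left( n_i^{-1} \sum_{j=1}^{n_i} \int \mathrm{expit}(X_0\hat{\beta} + Z_{ij}\hat{\eta} + u_i)\, d\hat{F}(u_i) \right). \qquad (4)$$

For data from a complex sample, the model-based standardized proportion corresponding to $X_{ij} = X_0$ can be estimated as follows:

$$\hat{E}(Y_{ij} \mid X_{ij} = X_0) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \frac{W_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n_i} W_{ij}} \int \mathrm{expit}(X_0\hat{\beta} + Z_{ij}\hat{\eta} + u_i)\, d\hat{F}(u_i), \qquad (4)$$

where $W_{ij}$ is a sampling weight, the inverse of the probability of selecting individual $j$ in neighborhood $i$. The parameters $\beta$ and $\eta$ and the distribution of $u_i$ must be estimated first before $E(Y_{ij} \mid X_{ij} = X_0)$ can be estimated. One way to find $\beta$, $\eta$, and $u_i$ is to apply an ordinary logistic regression model to model (4) and treat $u_i$ as a set of dummy variables.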
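The complex-sample estimator above can be evaluated numerically once $\hat{\beta}$, $\hat{\eta}$, and $\hat{F}$ are in hand. The sketch below assumes that $\hat{F}$ is a normal distribution with mean `mu_u` and standard deviation `sigma_u` and approximates the integral by Gauss-Hermite quadrature; the function name, argument names, and the use of quadrature are illustrative choices, not part of the original analysis.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def standardized_proportion(x0_beta, z_eta, weights, mu_u, sigma_u, n_nodes=40):
    """Weighted average over sampled individuals of
    integral expit(x0_beta + z_eta_ij + u) dF(u), with u ~ N(mu_u, sigma_u**2).

    x0_beta : scalar linear predictor X_0 * beta_hat for the exposure level
    z_eta   : array of Z_ij * eta_hat, one value per sampled individual
    weights : array of sampling weights W_ij
    """
    z_eta = np.asarray(z_eta, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Gauss-Hermite rule for integrals against exp(-t**2).
    nodes, quad_w = np.polynomial.hermite.hermgauss(n_nodes)
    u = mu_u + np.sqrt(2.0) * sigma_u * nodes      # change of variable
    probs = expit(x0_beta + z_eta[:, None] + u[None, :])
    integrals = probs @ quad_w / np.sqrt(np.pi)    # one integral per individual
    return float(np.sum(weights * integrals) / np.sum(weights))


# Hypothetical inputs for two sampled individuals.
p_hat = standardized_proportion(0.3, [0.0, 1.0], [1.0, 3.0], mu_u=0.2, sigma_u=1.0)
```

Setting all weights equal does not in general reproduce the random-sample formula, which averages within neighborhoods first; the two coincide when every neighborhood contributes the same number of individuals.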

The problem with this method is that the Neyman-Scott problem [27] arises in our situation because the sample size within each neighborhood is typically small. Another way to find $\beta$, $\eta$, and the distribution of $u_i$ is to apply a generalized linear mixed model to model (4). However, this method is fraught with difficulties arising from the confounding due to unmeasured neighborhood factors and the complex sampling design of the BRFSS. We propose two methods. The first method is to use the conditional pseudo-likelihood method to estimate $\beta$ and $\eta$, use the Stata program gllamm to estimate the distribution of $u_i$, and use formula (4) to estimate the standardized proportion. This method will be referred to as the CPL method. The second method is to use the Stata program gllamm with the census data to estimate $\beta$, $\eta$, and the distribution of $u_i$, and to use formula (4) to estimate the standardized proportion. This method will be referred to as the census method.

4.2.1 Method 1: The CPL Method without Using the Census Data

As mentioned in the Introduction section, a health behavior is usually related to personal factors and is also possibly influenced by neighborhood-level factors. In model (4), $u_i$ depends on $\bar{X}_i$, $\bar{Z}_i$, and $\delta_i$ via a function $\psi$. However, the function $\psi$ is usually unknown because $u_i$ is unmeasured and we do not know which neighborhood factors $u_i$ depends on or how $u_i$ depends on $\bar{X}_i$, $\bar{Z}_i$, and $\delta_i$. A frequently used simple assumption is to let $u_i$ depend on $\delta_i$ and the cluster means $\bar{X}_i$ and $\bar{Z}_i$ through a linear combination. This assumption will be used in the census method. Even this simple assumption cannot be applied in certain situations. For example, the BRFSS survey data do not provide information about cluster means. If the cluster means are replaced by the weighted sample cluster means, Brumback et al. [10] show that this replacement does not work well because the estimated $\beta$, $\eta$, and distribution of $u_i$ in model (4) are all biased when the response is binary and sample sizes within each cluster are small, such as 5 or less. To avoid those problems, we propose the CPL method.
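For a binary outcome, the pairwise conditional pseudo-likelihood step that Method 1 relies on can be sketched as follows. The sketch assumes the usual conditional-logistic form for a pair, namely that among discordant pairs $P(Y_{ij} = 1, Y_{il} = 0 \mid Y_{ij} + Y_{il} = 1) = \mathrm{expit}((C_{ij} - C_{il})\theta)$ where $C$ stacks the individual-level covariates, with pair weight $W_{ij} W_{il}$. The function name `binary_cpl` and the Newton-Raphson solver are invented for the example, and estimating the random-effect distribution is not covered.

```python
from itertools import combinations

import numpy as np


def binary_cpl(cluster_ids, covariates, y, w=None, n_iter=25):
    """Pairwise CPL coefficients for a binary outcome.

    Assumes at least one discordant pair and a non-singular information matrix.
    """
    cluster_ids = np.asarray(cluster_ids)
    C = np.asarray(covariates, dtype=float)
    if C.ndim == 1:
        C = C[:, None]
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)

    diffs, resp, pair_w = [], [], []
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        for j, l in combinations(members, 2):
            if y[j] != y[l]:  # concordant pairs carry no information
                diffs.append(C[j] - C[l])
                resp.append(y[j])
                pair_w.append(w[j] * w[l])
    D, r, pw = np.array(diffs), np.array(resp), np.array(pair_w)

    # Newton-Raphson for a weighted logistic regression without intercept.
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(D @ coef)))
        score = D.T @ (pw * (r - p))
        info = (D * (pw * p * (1.0 - p))[:, None]).T @ D
        coef = coef + np.linalg.solve(info, score)
    return coef


# Toy data (hypothetical): four clusters of size two with x = (1, 0).
ids = [1, 1, 2, 2, 3, 3, 4, 4]
x = [1, 0, 1, 0, 1, 0, 1, 0]
y = [1, 0, 1, 0, 1, 0, 0, 1]
coef = binary_cpl(ids, x, y)
```

In the toy data three discordant pairs favor the exposed member and one favors the unexposed member, so the fitted coefficient is $\log(3/1)$. Cluster intercepts never enter the calculation, which is how the approach sidesteps both the Neyman-Scott problem and the need for cluster means.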

There are two steps in this method. The first step is to estimate $\beta$ and $\eta$ by using the conditional pseudo-likelihood method. The CPL method forms all possible pairs within each cluster, and all pairs within each cluster are treated as if they were independent in the conditional likelihood function. All neighborhood-level covariates are eliminated from the conditional likelihood function by conditioning on each of all possible pairs of outcomes within each neighborhood, so the CPL method estimates $\beta$ and $\eta$ without making any assumption about $u_i$. The estimated $\beta$ and $\eta$ obtained by the CPL method are within-cluster covariate effects because the CPL method removes all cluster-level covariates from the conditional likelihood function [26]. Brumback et al. [9] show that the CPL method can consistently and unbiasedly estimate within-cluster covariate effects when outcomes are binary, multinomial, count, and non-negative continuous in complex survey data.

The second step is to estimate the distribution of $u_i$ by treating $X_{ij}\hat{\beta} + Z_{ij}\hat{\eta}$ as an offset in model (4). The Stata program gllamm is used to estimate the distribution of $u_i$ because it can take sampling weights into account. Using gllamm implies the assumption that the random effect $u_i$ is normally distributed and the assumption that the random effect $u_i$ is independent of individual-level factors. Let $F_s(u_i)$ denote the estimated distribution of $u_i$ from the Stata program gllamm. The mean of $F_s(u_i)$ satisfies the following relationship

$$\begin{aligned}
\mu_s = E^s(u) &= E^s_{X,Z,Y} E^s_{u \mid X,Z,Y}(u) \\
&= E_{X,Z,Y} E^s_{u \mid X,Z,Y}(u) \\
&= E_{X,Z,Y} E^s_{u \mid Y}(u) \\
&= \mu \quad (\text{when } u \perp \{X, Z\}).
\end{aligned}$$
Similarly, the variance of $F_s(u_i)$ satisfies the following relationship

$$\begin{aligned}
\sigma_s^2 &= E_{X,Z,Y} E^s_{u \mid X,Z,Y}(u - \mu_s)^2 \\
&= E_{X,Z,Y} E^s_{u \mid Y}(u - \mu_s)^2 \\
&= \sigma^2 \quad (\text{when } u \perp \{X, Z\}).
\end{aligned}$$

More details about the two relationships can be found in Appendix D. When both the normality and independence assumptions mentioned above are true, then $F_s(u_i) = F(u_i)$. Once the distribution of $u_i$ and parameters $\beta$ and $\eta$ are obtained, the computation of a standardized proportion can be carried out straightforwardly based on formula (4).

4.2.2 Method 2: The Census Method

Drinking behavior is related to both personal factors and social-economic factors at the neighborhood level. In this method, those social-economic factors at the neighborhood level in model (4) will be denoted by their neighborhood-level means from the 2000 census data. In the census method, $\psi$ in model (4) is assumed to be a linear function, so $u_i$ is equal to a linear combination of $\bar{X}_i$, $\bar{Z}_i$, and $\delta_i$. Therefore, $u_i$ in model (4) will include the neighborhood-level means and a random effect. It can be denoted by

$$u_i = \bar{X}_i \gamma_x + \bar{Z}_i \gamma_z + \delta_i, \qquad (4)$$

where $\delta_i$ is assumed to be independent of all factors on the two levels and to follow a normal distribution with mean 0 and variance $\sigma^2$. Substituting $u_i$ in model (4), model (4) can be written as

$$E(Y_{ij} \mid X_i, Z_i, u_i) = h(X_{ij}\beta + Z_{ij}\eta + \bar{X}_i \gamma_x + \bar{Z}_i \gamma_z + \delta_i). \qquad (4)$$

Comparing model (4) with a usual between-within model, we see that $\beta$ and $\eta$ here are within-cluster covariate effects, and $\gamma_x$ and $\gamma_z$ are the differences between the between-cluster effects and the corresponding within-cluster effects. If a between-cluster effect and the corresponding within-cluster effect are identical, then $\gamma$ is zero. If both $\gamma_x$ and $\gamma_z$ are zero, then there are no effects of neighborhood-level covariates on outcomes. The
2008 Florida BRFSS survey data with the 2000 census data will be fitted to model (4) by using the Stata program gllamm. All the parameters will be estimated by gllamm.

4.3 Simulation Studies

We use simulation studies to demonstrate that our proposed methods can generate consistent estimates of the parameters. We conduct three simulation studies based on 100 simulation data sets using three different methods. First, we generate a population data set which includes $M = 1000$ neighborhoods and $N_i = 1000$ individuals per neighborhood. We let $p_{ij} = \mathrm{expit}(X_{ij}\beta + u_i)$, $u_i = \delta_i + \gamma_1 \bar{X}_i$, $\delta_i \sim N(\gamma_0, \sigma^2)$, and $\bar{X}_i = N_i^{-1} \sum_{j=1}^{N_i} X_{ij}$. Assume $Y_{ij} \sim \mathrm{Ber}(p_{ij})$ given $u_i$ and $X_{ij}$, $X_{ij} \sim \mathrm{Ber}(\mathrm{expit}(\nu_i))$ given $\nu_i \sim N(0, 1)$, and $\delta_i \perp \{\nu_i, X_{ij}\}$. We sample population data within each neighborhood. Sampling rates of 0.2% and 0.4% are used to sample concordant observations and discordant observations, respectively. Therefore, the corresponding sampling weights are 2 and 1 for sampled concordant observations and discordant observations. We let $\{\beta, \gamma_1, \gamma_0, \sigma^2\} = \{-0.5, 2, 0.5, 1\}$. From Appendix C we know that $u_i \sim N(\gamma_0 + \gamma_1 / 2, \sigma^2 + \gamma_1^2 / 23)$, so $E(u_i) = 1.5$ and $\mathrm{Var}(u_i) = 1.17$.

In the population data set, the unmeasured cluster confounding $u_i$ is generated based on $u_i = \delta_i + \gamma_1 \bar{X}_i$, where $\bar{X}_i$ can be a vector of neighborhood-level covariates when $X_{ij}$ is a vector of individual-level covariates. Model (4) can be used to model the association between binary responses and covariates. In the first simulation study, the CPL method is used when $\bar{X}_i$ is assumed to be unavailable in the sample data sets. The CPL method is used to estimate $\beta$ first, then the mean and variance of $u_i$ are estimated by treating $X_{ij}\hat{\beta}$ as an offset in the Stata program gllamm. The first simulation study shows that the estimated mean and variance of $u_i$ are consistent. In the second simulation study, census data are used to construct $\bar{X}_i$. We use the Stata program gllamm to optimize the weighted composite likelihood function for estimating all the parameters. The second simulation study shows that the estimators are consistent. In the third simulation study, $\bar{X}_i$ is replaced by the weighted sample cluster
mean $\bar{X}^W_i$ when $\bar{X}_i$ is assumed to be unavailable in the sample data sets. This method is referred to as the weighted method. The Stata program gllamm is again used to optimize the weighted composite likelihood function for estimating all parameters. This simulation study shows that the estimator of $\beta$ is biased. When $\bar{X}_i$ is available, we can use both the CPL method and the census method we proposed. When $\bar{X}_i$ is not available, we can only use the CPL method.

Table 4-1. Simulation Results Based on 100 Simulation Data Sets

Method    Estimate             Obs  Mean    SD     Minimum  Maximum
CPL       $\beta_{cpl}$        100  -0.532  0.158  -0.894   -0.196
          $E(u_i)$             100  1.535   0.127  1.258    1.809
          $\mathrm{Var}(u_i)$  100  1.194   0.229  0.669    1.962
Census    $\beta$              100  -0.526  0.144  -0.831   -0.190
          $\gamma_1$           100  1.850   0.360  0.676    2.727
          $\gamma_0$           100  0.585   0.185  0.150    1.038
          $\sigma^2$           100  0.931   0.181  0.505    1.732
          $E(u_i)$             100  1.510   0.106  1.261    1.729
          $\mathrm{Var}(u_i)$  100  1.085   0.202  0.593    1.878
Weighted  $\beta$              100  -0.850  0.164  -1.250   -0.501
          $\gamma_1$           100  1.808   0.254  1.163    2.595
          $\gamma_0$           100  0.734   0.129  0.410    0.979
          $\sigma^2$           100  0.814   0.177  0.360    1.584
          $E(u_i)$             100  1.638   0.110  1.390    1.861
          $\mathrm{Var}(u_i)$  100  0.959   0.186  0.482    1.750

4.4 Application

We apply the two proposed methods to the 2008 Florida BRFSS survey data to estimate model-based standardized proportions, stratified by age group, adjusting for confounding due to unmeasured cluster covariates. We also apply a fixed effect model to the same survey data. Table 4-2 presents estimated coefficients for the three models. The symbol * in Table 4-2 indicates that covariates are at the neighborhood level; otherwise, they are at the individual level. In Table 4-2, model 1 uses the CPL method, model 2 uses a fixed effect method, and model 3 uses the census method. We focus on comparing the estimated coefficients of individual-level covariates in the three different models. Comparing the coefficients of individual-level covariates
estimated from model 2 and model 3 with those from model 1, we see that the differences of corresponding estimated coefficients between model 1 and model 3 are much smaller than those between model 1 and model 2 on average.

Table 4-2. Associations between Outcomes and Covariates in the 2008 Florida BRFSS Complex Survey Data

                          Model 1                  Model 2                  Model 3
Factor         Group      Coeff. (95% CI)          Coeff. (95% CI)          Coeff. (95% CI)
Sex            M          0                        0                        0
               F          -0.50 (-0.714, -0.292)   -0.62 (-0.791, -0.453)   -0.53 (-0.721, -0.334)
Edu. (Years)   13-        0                        0                        0
               14+        0.51 (0.233, 0.792)      0.55 (0.344, 0.751)      0.50 (0.247, 0.755)
Income         Low        0                        0                        0
               High       0.58 (0.335, 0.818)      0.80 (0.614, 0.986)      0.64 (0.413, 0.860)
Age            18-34      0                        0                        0
               35-54      -0.36 (-0.690, -0.027)   -0.35 (-0.601, -0.103)   -0.35 (-0.645, -0.604)
               55-64      -0.66 (-1.042, -0.269)   -0.52 (-0.803, -0.228)   -0.65 (-1.006, -0.300)
               65+        -0.57 (-0.913, -0.228)   -0.46 (-0.713, -0.217)   -0.59 (-0.904, -0.286)
Race           White      0                        0                        0
               Black      -0.64 (-1.123, -0.154)   -0.82 (-1.164, -0.472)   -0.84 (-1.273, -0.402)
               Hisp       -0.62 (-1.052, -0.188)   -0.56 (-0.844, -0.268)   -0.64 (-1.040, -0.236)
               Other      -0.83 (-1.351, -0.311)   -0.60 (-1.021, -0.180)   -0.94 (-1.457, -0.425)
Intercept                 0.34 (0.242, 0.444)      0.18 (-0.097, 0.458)     -1.74 (-4.592, 1.120)
Sex*           M          NA                       NA                       0
               F                                                            -3.16 (-7.916, 1.586)
Edu.* (Years)  13-        NA                       NA
               14+                                                          3.56 (0.830, 6.298)
Income*        Low        NA                       NA                       0
               High                                                         0.22 (-1.572, 2.009)
Age*           18-34      NA                       NA                       0
               35-54                                                        0.12 (-3.532, 3.769)
               55-64                                                        3.62 (-0.525, 7.774)
               65+                                                          0.36 (-1.976, 2.696)
Race*          White      NA                       NA                       0
               Black                                                        1.09 (0.060, 2.115)
               Hisp                                                         1.15 (0.087, 2.210)
               Other                                                        -3.17 (-9.668, 3.323)

It means that model 1 and model 3 agree with each other more than model 1 and model 2 do. Brumback et al. [9] showed that estimated coefficients from the CPL method are consistent estimates. Therefore, neighborhood-level confounding should be taken
intoaccount.Weseethatthecoefcientofneighborhood-levelcovariateeducationinmodel3hassignicantassociationwithdrinkingalcohol.Soitfurthersupportsthatneighborhood-levelconfoundingshouldbetakenintoaccount.Wealsonoticethatourestimatesof2inmodel1andmodel3are2.9e)]TJ /F5 11.955 Tf 12.9 0 Td[(18and9.0e)]TJ /F5 11.955 Tf 12.9 0 Td[(15withstandarderrors1.42e)]TJ /F5 11.955 Tf 12.53 0 Td[(15and1.31e)]TJ /F5 11.955 Tf 12.53 0 Td[(12,respectively.Sothevaluesof2intwomodelsareapproximatelyequaltozero.Model3areisalsoreferredtoasabetween-withinmodelsoestimatedcoefcientsofindividual-levelcovariatesinmodel3arereferredtoaswithin-clustereffects.Weuseformulas( 4 )and( 4 )tocomputemodel-basedstandardizedproportionsbasedonmodel1andmodel3,respectively.Model-basedstandardizedproportionsarepresentedinTable 4-3 .ThestandardizedproportionsfromtheCPLmethodinTable 4-3 arecomputedbasedonmodel1inTable 4-2 .ThestandardizedproportionsfromthecensusmethodinTable 4-3 arecomputedbasedonmodel3inTable 4-2 .Themeanstandardizedproportionsestimatedfromthetwomethodsbasedon100bootstrappingdatasetsarealmostidentical,sothetwomethodsgeneratealmostthesameresults.Ifneighborhood-levelmeansarenotavailable,theCPLmethodcanbeusedtoestimatestandardizedproportions.Ifneighborhood-levelmeansareavailable,boththeCPLmethodandthecensusmethodcanbeusedtoestimatestandardizedproportions. 4.5DiscussionWehavepresentedtwomethodstondmodel-basedstandardizedproportions.InTable 4-3 ,wehaveseenthatstandardizedproportionsfromtheCPLmethodarealmostidenticaltothosefromthecensusmethodbutestimatedstandarddeviationsfromtheCPLmethodishigherthanthosefromthecensusmethodonaverage.ThereasonisthattheCPLmethodcanonlyusediscordantpairsofobservationstoestimateindividual-levelparametersbutthecensusmethodusesallthedatatoestimateparameters.Ifcensusdataareavailable,weprefertousethecensusmethodbecauseitprovidesnarrower95%condenceintervalthantheCPLmethoddoes.Ifcensusdata 60

PAGE 61

Table4-3. Model-BasedStandardizedProportionsofPeopleWhoDrinkAlcohol,Stratiedbyagegroup,toAccountforUnmeasuredConfoundingduetoCluster MethodAgeGroupNMeanSD95%CIforMean CPLGroup1:18-341000.6180.030(0.612,0.624)Group2:35-541000.5380.020(0.534,0.542)Group3:55-641000.4770.033(0.471,0.484)Group4:65+1000.4920.023(0.487,0.497)Group2-Group1100-0.0800.037(-0.087,-0.073)Group3-Group1100-0.1410.046(-0.150,-0.132)Group4-Group1100-0.1250.042(-0.133,-0.117)CensusGroup1:18-341000.6130.026(0.608,0.618)Group2:35-541000.5380.018(0.534,0.542)Group3:55-641000.4760.027(0.471,0.481)Group4:65+1000.4870.019(0.483,0.491)Group2-Group1100-0.0750.030(-0.081,-0.069)Group3-Group1100-0.1370.039(-0.145,-0.129)Group4-Group1100-0.1260.034(-0.132,-0.119) arenotavailable,wecanusetheCPLmethodtoestimatestandardizedproportionsandtheir95%condenceinterval.TheStataprogramgllammcanbeveryslowwhentherearemanyparameterstobeestimated,manylatentvariables,andmanyobservations,asmentionedintheStataprogramgllammmanualbyRabe-Hesketh[ 28 ].ThenumberofparameterstobeestimatedinthecensusmethodistwotimesthatintheCPLmethodbutcomputationtimeinthecensusmethodismorethan30timesthatintheCPLmethod.AshortperiodofcomputationtimegivestheCPLmethodanadvantageoverthecensusmethodbecausetheCPLmethoddoesnotneedneighborhood-levelcovariatesforestimatingthedistributionofrandomeffects.InTable 4-2 ,weseethatmanyp-valuesofneighborhood-levelcoefcientsinmodel3aregreaterthan0.05sowecanspeedupthecomputationofthecensusmethodbyremovingneighborhood-levelcovariateswhosecoefcientshavep-valuesthataregreaterthan0.05.Thecensusmethodneedstospecifyneighborhood-levelcovariatesusedinamodelbuttheCPLmethoddoesnot.Thecensusmethodreliesontheassumption 61

PAGE 62

thatisalinearfunctionofarandomeffectandneighborhood-levelcovariatesbuttheCPLmethoddoesnotneedthisassumption.SotheCPLmethodismorerobustthanthecensusmethodbasedonthespecicationofneighborhoodcovariatesandtheassumptionabout.FromTable 4-2 ,weseethatamajorityofindividual-leveleffectsestimatedfrommodel1andmodel3areclosetoeachothersotheassumptionisapproximatelytrue.Weonlyprovidesimulationstudiestoshowthatourmethodsworkwellwhentheassumptionholds.Ideally,wewouldliketohavetheoreticalproofforourmethods.Itwillbeourfurtherresearchinthefuture. 62

CHAPTER 5
ACHIEVEMENTS, LIMITATIONS AND FUTURE RESEARCH

Chapter 2 has presented the applications of the CPL method for many types of outcomes. Brumback et al. [7] first applied the CPL method to binary outcomes in complex survey data using the SAS SURVEYLOGISTIC procedure. This method greatly simplified programming and simultaneously overcame many difficult issues such as small sample size within neighborhood, complex sampling design, and unmeasured confounding. After that, this method has been gradually extended to many different types of outcomes including ordinal outcomes, nominal multinomial outcomes, count outcomes, and nonnegative outcomes. For each type of outcome, we use the conditional pseudo-likelihood function to develop a method for transforming the response and covariates. After the original data are transformed by the rules we developed based on the conditional pseudo-likelihood function, the transformed data are ready to be used in the SAS SURVEYLOGISTIC procedure. We not only present theory and simulation to show that our method can find consistent estimators of the parameters of interest but also provide the SAS code for computing the estimated parameters.

So far, we have developed all methods and SAS code for applying the conditional pseudo-likelihood method to each type of outcome. The SAS code for each type of outcome is a separate file and has to be run independently to get results. This is not convenient for users. It is possible to integrate all SAS code for each type of outcome into one particular procedure like many SAS procedures. Doing this will make the CPL method much easier to use because users do not need to consider how to transform data by themselves. After the rules of transforming data were developed, the majority of the work is to transform data in order to use the CPL method. If we can develop a general procedure which can transform data automatically, it will be very convenient for users. In order to transform data automatically, the procedure needs some information such as the type of outcome, a list of individual-level covariates in the model, the distribution of the response, the cluster id, a link function, and sampling weights for each observation.

When outcomes are categorical, such as binary, ordinal, and multinomial, the CPL method does not use all pairs. When a pair of transformed outcomes have the same value, they do not contribute to the likelihood function. Those pairs of data can be dropped from the data set. When a generalized linear mixed model can be used, it will have higher efficiency than the CPL method. When a generalized linear mixed model cannot be used directly, for instance, when neighborhood covariates do not follow a normal distribution, the CPL method can still be used to find consistent estimators.

Chapter 3 has presented the bias and measurement error that arise in between-within models when adjusting for confounding by unmeasured cluster covariates. Between-within models are popular in social epidemiology to adjust for confounding by neighborhood characteristics. There are some situations under which the approach fails to adequately adjust for confounding due to unmeasured cluster covariates. This chapter presented several simulation studies to show that the failure can happen when heteroscedasticity of error is not handled properly. Those simulation studies are conducted for linear mixed models. We have also found that the failure can happen in generalized linear mixed models. In the future, simulation studies for generalized linear mixed models should be conducted to show how heteroscedasticity of error can lead to bias when it is not handled properly.

In survey data, the sample size within each neighborhood is typically small, which causes trouble when using between-within models to adjust for confounding due to cluster covariates. When census data are not available, a simple sample mean within each neighborhood or a weighted sample mean within each neighborhood is used to replace the true mean within each neighborhood. This replacement leads to failure to adjust for confounding because the replacement is often much different from the true mean within each neighborhood due to the small sample size within each neighborhood. The difference between the replacement and the true cluster mean leads to biased estimators of the coefficients of neighborhood covariates in models. This chapter presented how to correct the biased estimators of neighborhood covariates in linear mixed models. We also found that this happened in generalized linear mixed models. The correction of biased estimators in generalized linear mixed models is more challenging than the one in linear mixed models.

Chapter 4 has presented the methods of computing model-based standardized proportions with complex survey data. Personal behaviors depend not only on personal characteristics but also possibly on socio-economic factors such as neighborhood-level factors. When we model the association of a personal behavior with personal factors and neighborhood-level factors in survey data, neighborhood-level factors are usually not available or unmeasured. Using simple sample means or weighted sample means to replace neighborhood-level factors can lead to serious bias, which was demonstrated by our simulation studies. We proposed the CPL method to circumvent the replacement: we estimate individual-level coefficients first, then treat the sum of the products of individual factors and corresponding individual coefficients as an offset, and then use the Stata program gllamm to estimate the distribution of neighborhood-level factors. After that, we computed model-based standardized proportions. When census data are available, we can use the census method, which directly uses the Stata program gllamm to estimate all parameters, and then compute model-based standardized proportions. The simulation study shows that 95% CI estimates of $\beta$ generated by these two methods cover the true value of $\beta$ while those from the weighted method do not. When we applied these two methods to the 2008 Florida BRFSS survey data, the standardized proportions estimated by the two methods were almost identical. From this application, we found that the program based on the CPL method runs more than 30 times faster than the one based on the census method. When a data set gets large, the CPL method has a computational advantage over the census method. The census method has to specify the neighborhood-level covariates used in the model but the CPL method does not, so the CPL method is more robust than the census method.
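
The pair transformation for a binary outcome that this chapter describes can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the dissertation's SAS code: the function name `discordant_pairs`, the tuple layout `(y, x, w)` and the toy data are hypothetical. The rules it applies (drop pairs with equal outcomes, code the pair outcome as 1 when the first member has y = 1, use the covariate difference as the predictor and the product of sampling weights as the pair weight) follow the data steps listed in Appendix E.1.

```python
from itertools import combinations


def discordant_pairs(cluster):
    """Build transformed pair records for one neighborhood.

    cluster: list of (y, x, w) tuples with binary outcome y,
    a single covariate x and a sampling weight w (hypothetical layout).
    """
    pairs = []
    for (y1, x1, w1), (y2, x2, w2) in combinations(cluster, 2):
        if y1 == y2:
            # Concordant pairs do not contribute to the conditional likelihood.
            continue
        pairs.append({
            "y": 1 if (y1, y2) == (1, 0) else 0,  # pair-level outcome
            "diff": x1 - x2,                      # covariate difference
            "weight": w1 * w2,                    # product of sampling weights
        })
    return pairs


# Toy neighborhood with three sampled individuals.
cluster = [(1, 0.5, 2.0), (0, 1.5, 1.0), (1, 2.0, 1.0)]
pairs = discordant_pairs(cluster)
for record in pairs:
    print(record)
```

The resulting records would then be passed to a weighted logistic regression without an intercept, with `diff` as the only predictor, which is the role the SURVEYLOGISTIC procedure plays in the dissertation.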

APPENDIX A
EXPECTATION OF THE SCORE FUNCTION IS ZERO

A baseline category logit model is defined as

$$\log\frac{P(Y_{ij}=k \mid X_i, b_i)}{P(Y_{ij}=K \mid X_i, b_i)} = X_{ij}\beta_k + \alpha_k + b_i, \qquad k=1,\dots,K-1, \tag{A-1}$$

where $\beta_k$ are the parameters of interest, $i$ indexes neighborhood, $j$ indexes individual, $X_{ij}$ are individual-level covariates, $b_i$ are neighborhood-level covariates, and $X_i=\{X_{i1},\dots,X_{in_i}\}$. Assume $Y_{ij}$ and $Y_{il}$ are independent given $X_i$ and $b_i$. For simplicity, the notation of the condition given $X_i$ and $b_i$ will be omitted from all following conditional probability mass functions and probability density functions. For example, $P(\cdot \mid X_i, b_i, S_{ij})$ and $P(\cdot \mid X_i, b_i)$ will be simplified as $P(\cdot \mid S_{ij})$ and $P(\cdot)$, respectively. $K$ is set to 3 in this proof.

Let $Z=(Y_{ij},Y_{il}) \in \{Z_1,Z_2,Z_3,Z_4,Z_5,Z_6\}$ where $Z_1=(Y_{ij}=1,Y_{il}=2)$, $Z_2=(Y_{ij}=1,Y_{il}=3)$, $Z_3=(Y_{ij}=2,Y_{il}=1)$, $Z_4=(Y_{ij}=2,Y_{il}=3)$, $Z_5=(Y_{ij}=3,Y_{il}=1)$, $Z_6=(Y_{ij}=3,Y_{il}=2)$. For $Z=Z_1$, let $S_{13}$ denote the event $Z \in \{Z_1,Z_3\}$; then the probability of $(Y_{ij}=1,Y_{il}=2)$ given $X_i$, $b_i$, and $S_{13}$ is

$$P(Z_1 \mid S_{13}) = \frac{P(Y_{ij}=1)P(Y_{il}=2)}{P(Y_{ij}=1)P(Y_{il}=2)+P(Y_{ij}=2)P(Y_{il}=1)}. \tag{A-2}$$

Based on model (A-1), we have

$$P(Y_{ij}=k) = P(Y_{ij}=3)\exp(X_{ij}\beta_k+\alpha_k+b_i), \qquad k=1,\dots,K-1. \tag{A-3}$$

Therefore,

$$P(Y_{ij}=1) = \exp(X_{ij}\beta_1+\alpha_1+b_i)\,P(Y_{ij}=3), \tag{A-4}$$
$$P(Y_{ij}=2) = \exp(X_{ij}\beta_2+\alpha_2+b_i)\,P(Y_{ij}=3), \tag{A-5}$$
$$P(Y_{il}=1) = \exp(X_{il}\beta_1+\alpha_1+b_i)\,P(Y_{il}=3), \tag{A-6}$$
$$P(Y_{il}=2) = \exp(X_{il}\beta_2+\alpha_2+b_i)\,P(Y_{il}=3). \tag{A-7}$$

Plugging (A-4), (A-5), (A-6), and (A-7) into (A-2), we can simplify (A-2) as

$$P(Z_1 \mid S_{13}) = \frac{\exp((X_{ij}-X_{il})(\beta_1-\beta_2))}{1+\exp((X_{ij}-X_{il})(\beta_1-\beta_2))}. \tag{A-8}$$

Similarly, the probability of $(Y_{ij}=2,Y_{il}=1)$ given $X_i$, $b_i$, and $S_{13}$ is

$$P(Z_3 \mid S_{13}) = \frac{1}{1+\exp((X_{ij}-X_{il})(\beta_1-\beta_2))}. \tag{A-9}$$

For $Z=Z_2$, let $S_{25}$ denote the event $Z \in \{Z_2,Z_5\}$; then the probability of $(Y_{ij}=1,Y_{il}=3)$ given $X_i$, $b_i$, and $S_{25}$ is

$$\begin{aligned}
P(Z_2 \mid S_{25}) &= \frac{P(Z_2)}{P(Z_2)+P(Z_5)} = \frac{P(Y_{ij}=1)P(Y_{il}=3)}{P(Y_{ij}=1)P(Y_{il}=3)+P(Y_{ij}=3)P(Y_{il}=1)} \\
&= \frac{P(Y_{ij}=3)\exp(X_{ij}\beta_1+\alpha_1+b_i)P(Y_{il}=3)}{P(Y_{ij}=3)\exp(X_{ij}\beta_1+\alpha_1+b_i)P(Y_{il}=3)+P(Y_{ij}=3)P(Y_{il}=3)\exp(X_{il}\beta_1+\alpha_1+b_i)} \\
&= \frac{\exp(X_{ij}\beta_1)}{\exp(X_{ij}\beta_1)+\exp(X_{il}\beta_1)}.
\end{aligned}$$

It is equivalent to

$$P(Z_2 \mid S_{25}) = \frac{\exp((X_{ij}-X_{il})\beta_1)}{1+\exp((X_{ij}-X_{il})\beta_1)}. \tag{A-10}$$

Similarly, we can find the following probabilities of pairs $(Y_{ij},Y_{il})$ given $X_i$, $b_i$, and $S$:

$$P(Z_5 \mid S_{25}) = \frac{1}{1+\exp((X_{ij}-X_{il})\beta_1)}, \tag{A-11}$$
$$P(Z_4 \mid S_{46}) = \frac{\exp((X_{ij}-X_{il})\beta_2)}{1+\exp((X_{ij}-X_{il})\beta_2)}, \tag{A-12}$$
$$P(Z_6 \mid S_{46}) = \frac{1}{1+\exp((X_{ij}-X_{il})\beta_2)}. \tag{A-13}$$

Therefore, the log conditional pseudo-likelihood function of all possible pairs within each neighborhood for estimating $\beta$ can be written as follows:

$$\begin{aligned}
L(\beta) = \sum_{i=1}^{m}\sum_{j=1}^{n_i-1}\sum_{l=j+1}^{n_i} \big\{ & I(Z=Z_1)\log P(Z_1 \mid S_{13}) + I(Z=Z_3)\log P(Z_3 \mid S_{13}) \\
& + I(Z=Z_2)\log P(Z_2 \mid S_{25}) + I(Z=Z_5)\log P(Z_5 \mid S_{25}) \\
& + I(Z=Z_4)\log P(Z_4 \mid S_{46}) + I(Z=Z_6)\log P(Z_6 \mid S_{46}) \big\},
\end{aligned}$$

where $I(Z=Z_i)$ is an indicator function for $i=1,\dots,6$. Plugging (A-8), (A-9), (A-10), (A-11), (A-12), and (A-13) into the above log conditional pseudo-likelihood function, we have

$$\begin{aligned}
L(\beta) = \sum_{i=1}^{m}\sum_{j=1}^{n_i-1}\sum_{l=j+1}^{n_i} \Bigg( & I(Z=Z_1)\log\frac{\exp((X_{ij}-X_{il})(\beta_1-\beta_2))}{1+\exp((X_{ij}-X_{il})(\beta_1-\beta_2))} + I(Z=Z_3)\log\frac{1}{1+\exp((X_{ij}-X_{il})(\beta_1-\beta_2))} \\
& + I(Z=Z_2)\log\frac{\exp((X_{ij}-X_{il})\beta_1)}{1+\exp((X_{ij}-X_{il})\beta_1)} + I(Z=Z_5)\log\frac{1}{1+\exp((X_{ij}-X_{il})\beta_1)} \\
& + I(Z=Z_4)\log\frac{\exp((X_{ij}-X_{il})\beta_2)}{1+\exp((X_{ij}-X_{il})\beta_2)} + I(Z=Z_6)\log\frac{1}{1+\exp((X_{ij}-X_{il})\beta_2)} \Bigg).
\end{aligned}$$

Taking the derivative with respect to $\beta_1$ on both sides of the above log-likelihood function, we get the score function

$$\begin{aligned}
\frac{\partial L}{\partial \beta_1} = \sum_{i=1}^{m}\sum_{j=1}^{n_i-1}\sum_{l=j+1}^{n_i} \Bigg( & \frac{I(Z=Z_1)(X_{ij}-X_{il})}{1+\exp((X_{ij}-X_{il})(\beta_1-\beta_2))} - \frac{I(Z=Z_3)(X_{ij}-X_{il})\exp((X_{ij}-X_{il})(\beta_1-\beta_2))}{1+\exp((X_{ij}-X_{il})(\beta_1-\beta_2))} \\
& + \frac{I(Z=Z_2)(X_{ij}-X_{il})}{1+\exp((X_{ij}-X_{il})\beta_1)} - \frac{I(Z=Z_5)(X_{ij}-X_{il})\exp((X_{ij}-X_{il})\beta_1)}{1+\exp((X_{ij}-X_{il})\beta_1)} \Bigg).
\end{aligned}$$

Taking the expectation with respect to the responses on both sides of the above score function, we get the expectation of the score function as follows:

$$\begin{aligned}
E\left(\frac{\partial L}{\partial \beta_1}\right) = \sum_{i=1}^{m}\sum_{j=1}^{n_i-1}\sum_{l=j+1}^{n_i} \Bigg( & \frac{E(I(Z=Z_1))(X_{ij}-X_{il})}{1+\exp((X_{ij}-X_{il})(\beta_1-\beta_2))} - \frac{E(I(Z=Z_3))(X_{ij}-X_{il})\exp((X_{ij}-X_{il})(\beta_1-\beta_2))}{1+\exp((X_{ij}-X_{il})(\beta_1-\beta_2))} \\
& + \frac{E(I(Z=Z_2))(X_{ij}-X_{il})}{1+\exp((X_{ij}-X_{il})\beta_1)} - \frac{E(I(Z=Z_5))(X_{ij}-X_{il})\exp((X_{ij}-X_{il})\beta_1)}{1+\exp((X_{ij}-X_{il})\beta_1)} \Bigg).
\end{aligned}$$

Based on the assumption that $Y_{ij}$ and $Y_{il}$ are independent given $X_i$ and $b_i$, we can find $E(I(Z=Z_i))$ for $i=1,\dots,6$:

$$\begin{aligned}
E(I(Z=Z_1)) &= P(Z=Z_1) = P(Y_{ij}=1,Y_{il}=2) = P(Y_{ij}=1)P(Y_{il}=2), \\
E(I(Z=Z_2)) &= P(Z=Z_2) = P(Y_{ij}=1,Y_{il}=3) = P(Y_{ij}=1)P(Y_{il}=3), \\
E(I(Z=Z_3)) &= P(Z=Z_3) = P(Y_{ij}=2,Y_{il}=1) = P(Y_{ij}=2)P(Y_{il}=1), \\
E(I(Z=Z_4)) &= P(Z=Z_4) = P(Y_{ij}=2,Y_{il}=3) = P(Y_{ij}=2)P(Y_{il}=3), \\
E(I(Z=Z_5)) &= P(Z=Z_5) = P(Y_{ij}=3,Y_{il}=1) = P(Y_{ij}=3)P(Y_{il}=1), \\
E(I(Z=Z_6)) &= P(Z=Z_6) = P(Y_{ij}=3,Y_{il}=2) = P(Y_{ij}=3)P(Y_{il}=2).
\end{aligned}$$

Plugging (A-4), (A-5), (A-6), and (A-7) into the above expressions, the expectations $E(I(Z=Z_i))$ for $i=1,\dots,6$ can be written as

$$\begin{aligned}
E(I(Z=Z_1)) &= \exp(X_{ij}\beta_1+\alpha_1+b_i)P(Y_{ij}=3)\exp(X_{il}\beta_2+\alpha_2+b_i)P(Y_{il}=3), \\
E(I(Z=Z_2)) &= \exp(X_{ij}\beta_1+\alpha_1+b_i)P(Y_{ij}=3)P(Y_{il}=3), \\
E(I(Z=Z_3)) &= \exp(X_{ij}\beta_2+\alpha_2+b_i)P(Y_{ij}=3)\exp(X_{il}\beta_1+\alpha_1+b_i)P(Y_{il}=3), \\
E(I(Z=Z_4)) &= \exp(X_{ij}\beta_2+\alpha_2+b_i)P(Y_{ij}=3)P(Y_{il}=3), \\
E(I(Z=Z_5)) &= \exp(X_{il}\beta_1+\alpha_1+b_i)P(Y_{ij}=3)P(Y_{il}=3), \\
E(I(Z=Z_6)) &= \exp(X_{il}\beta_2+\alpha_2+b_i)P(Y_{ij}=3)P(Y_{il}=3).
\end{aligned}$$

Plugging the corresponding expressions of $E(I(Z=Z_i))$ into the right-hand side of the expectation of the score function, it is easy to show that both the combined numerator of the first two fractions and the combined numerator of the last two fractions in the expectation of the score function are zero. For example, $E(I(Z=Z_3))\exp((X_{ij}-X_{il})(\beta_1-\beta_2)) = E(I(Z=Z_1))$, so the first two terms cancel. Therefore, the expectation of the score function of $\beta_1$ is zero based on the conditional pseudo-likelihood function. Similarly, we can show that the expectation of the score function of $\beta_2$ is zero based on the conditional pseudo-likelihood function.
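
The key step in this appendix is that the neighborhood term $b_i$ and the intercepts cancel once we condition on the unordered pair of outcomes. The following sketch checks that cancellation numerically for the pair $(1,2)$ versus $(2,1)$ with $K=3$. It is an illustration only: the parameter values, the scalar covariate and the function names are assumptions chosen for the example, not quantities from the dissertation.

```python
import math


def category_probs(x, b, beta, alpha):
    """Baseline-category logit probabilities for K = 3 (category 3 is the reference)."""
    scores = [math.exp(x * beta[k] + alpha[k] + b) for k in range(2)] + [1.0]
    total = sum(scores)
    return [s / total for s in scores]


def cond_prob_z1(x_j, x_l, b, beta, alpha):
    """P(Y_ij = 1, Y_il = 2 | pair is (1, 2) or (2, 1)), from the full model."""
    p_j = category_probs(x_j, b, beta, alpha)
    p_l = category_probs(x_l, b, beta, alpha)
    numerator = p_j[0] * p_l[1]
    return numerator / (numerator + p_j[1] * p_l[0])


def closed_form(x_j, x_l, beta):
    """Closed form that depends only on the covariate difference and beta_1 - beta_2."""
    t = (x_j - x_l) * (beta[0] - beta[1])
    return math.exp(t) / (1.0 + math.exp(t))


# Hypothetical values for illustration.
beta = (0.8, -0.3)
alpha = (0.2, -0.5)
x_j, x_l = 1.2, -0.4

# The conditional probability should be the same for every value of b.
values = [cond_prob_z1(x_j, x_l, b, beta, alpha) for b in (-2.0, 0.0, 3.0)]
target = closed_form(x_j, x_l, beta)
print(values, target)
```

Because the conditional probability is free of $b_i$, no distributional assumption about the neighborhood effect is needed to estimate $\beta_1-\beta_2$ from such pairs.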

APPENDIX B
PROOF OF IDENTICAL ESTIMATORS IN TWO MODELS

We show that the coefficients of $X_{ij}$ in the following two models are equal. Model 1 is

$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \alpha_i + \epsilon_{ij}, \tag{B-1}$$

and model 2 is

$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \beta_2 \bar{X}_{si} + \delta_i + \epsilon_{ij}, \tag{B-2}$$

where $i=1,\dots,m$, $j=1,\dots,n$, $\epsilon_{ij} \sim N(0,\sigma^2)$, $\delta_i \sim N(0,\sigma_n^2)$, $\delta_i \perp \{X_{ij},\epsilon_{ij}\}$, and $\bar{X}_{si} = n^{-1}\sum_{j=1}^{n} X_{ij}$. We prove that the estimators of $\beta_1$ are identical in the two models.

For model 1, let $\beta=(\beta_0,\beta_1,\alpha_1,\dots,\alpha_m)'$, $Y=(Y_{11},\dots,Y_{mn})'$, $\epsilon=(\epsilon_{11},\dots,\epsilon_{mn})'$, and $\epsilon \sim N(0,\sigma^2 I_N)$ where $N=mn$. Let

$$X = \begin{pmatrix}
1 & X_{11} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{1n} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{m1} & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{mn} & 0 & \cdots & 1
\end{pmatrix},$$

then

$$X'X = \begin{pmatrix}
N & N\bar{X} & n & \cdots & n \\
N\bar{X} & \sum_{ij} X_{ij}^2 & n\bar{X}_{s1} & \cdots & n\bar{X}_{sm} \\
n & n\bar{X}_{s1} & n & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
n & n\bar{X}_{sm} & 0 & \cdots & n
\end{pmatrix}$$

and $X'Y = (N\bar{Y}, \sum_{ij} X_{ij}Y_{ij}, n\bar{Y}_1, \dots, n\bar{Y}_m)'$, where $\bar{Y} = N^{-1}\sum_{i=1}^{m}\sum_{j=1}^{n} Y_{ij}$, $\bar{X} = N^{-1}\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}$, $\bar{Y}_i = n^{-1}\sum_{j=1}^{n} Y_{ij}$, and $\bar{X}_{si} = n^{-1}\sum_{j=1}^{n} X_{ij}$.

Because $X'X$ is not of full rank, only $m+1$ parameters can be estimated. One common constraint is $\sum_{i=1}^{m}\alpha_i = 0$. Based on the least squares normal equations, $X'X\hat{\beta} = X'Y$, we have the following:

$$\hat{\beta}_0 = \bar{Y} - \bar{X}\hat{\beta}_1, \tag{B-3}$$
$$\hat{\alpha}_i = \bar{Y}_i - (\hat{\beta}_0 + \bar{X}_{si}\hat{\beta}_1) \quad \text{for } i=1,\dots,m, \tag{B-4}$$
$$N\bar{X}\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}^2 + n\sum_{i=1}^{m}\bar{X}_{si}\hat{\alpha}_i = \sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}Y_{ij}. \tag{B-5}$$

Plugging equations (B-3) and (B-4) into (B-5), we get

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}Y_{ij} - N\bar{X}\bar{Y} - n\sum_{i=1}^{m}\bar{X}_{si}(\bar{Y}_i - \bar{Y})}{\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}^2 - N\bar{X}^2 - n\sum_{i=1}^{m}\bar{X}_{si}(\bar{X}_{si} - \bar{X})} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij}-\bar{X}_{si})Y_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij}-\bar{X}_{si})^2}. \tag{B-6}$$

Model 2 can be rewritten as

$$Y_{ij} = \beta_0 + \beta_1(X_{ij} - \bar{X}_{si}) + (\beta_1+\beta_2)\bar{X}_{si} + \delta_i + \epsilon_{ij}. \tag{B-7}$$

Let $\beta=(\beta_0,\beta_1,\beta_1+\beta_2)'$, $Y=(Y_{11},\dots,Y_{mn})'$, $\epsilon=(\epsilon_{11},\dots,\epsilon_{mn})'$, $\epsilon \sim N(0,\sigma^2 I_N)$, $\delta=(\delta_1,\dots,\delta_m)'$, $R=\mathrm{Var}(\epsilon)=\sigma^2 I_N$, and $G=\mathrm{Var}(\delta)=\sigma_n^2 I_m$. Let

$$X = \begin{pmatrix}
1 & X_{11}-\bar{X}_{s1} & \bar{X}_{s1} \\
\vdots & \vdots & \vdots \\
1 & X_{mn}-\bar{X}_{sm} & \bar{X}_{sm}
\end{pmatrix}
\quad\text{and}\quad
Z = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 1 \\
\vdots & & \vdots & \vdots \\
0 & \cdots & 0 & 1
\end{pmatrix},$$

where $Z$ is the $N \times m$ cluster indicator matrix. Model 2 in matrix form is $Y = X\beta + Z\delta + \epsilon$. So

$$V = \mathrm{Var}(Y) = Z\,\mathrm{Var}(\delta)\,Z' + \mathrm{Var}(\epsilon) = ZGZ' + R = \sigma_n^2 ZZ' + \sigma^2 I_N.$$

Since each cluster contains $n$ observations, $V$ is block diagonal with $m$ identical $n \times n$ blocks,

$$V = \begin{pmatrix} A & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A \end{pmatrix}, \qquad A = \sigma^2 I_n + \sigma_n^2 J_n.$$

So

$$A^{-1} = (\sigma^2)^{-1} I_n - \frac{\sigma_n^2}{\sigma^2(\sigma^2 + n\sigma_n^2)} J_n, \qquad
V^{-1} = \begin{pmatrix} A^{-1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A^{-1} \end{pmatrix}, \qquad
A^{-1} = \begin{pmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{pmatrix},$$

where

$$a = \frac{\sigma^2 + (n-1)\sigma_n^2}{\sigma^2(\sigma^2 + n\sigma_n^2)} \quad\text{and}\quad b = \frac{-\sigma_n^2}{\sigma^2(\sigma^2 + n\sigma_n^2)}.$$

Now we can find $X'V^{-1}$ and then $X'V^{-1}X$ and $X'V^{-1}Y$ as follows. Writing $c = (\sigma^2 + n\sigma_n^2)^{-1}$,

$$X'V^{-1} = \begin{pmatrix}
1 & \cdots & 1 \\
X_{11}-\bar{X}_{s1} & \cdots & X_{mn}-\bar{X}_{sm} \\
\bar{X}_{s1} & \cdots & \bar{X}_{sm}
\end{pmatrix}
\begin{pmatrix} A^{-1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A^{-1} \end{pmatrix}
= \begin{pmatrix}
c & \cdots & c \\
(\sigma^2)^{-1}(X_{11}-\bar{X}_{s1}) & \cdots & (\sigma^2)^{-1}(X_{mn}-\bar{X}_{sm}) \\
c\,\bar{X}_{s1} & \cdots & c\,\bar{X}_{sm}
\end{pmatrix},$$

then

$$X'V^{-1}X = \begin{pmatrix}
Nc & c\sum_{ij}(X_{ij}-\bar{X}_{si}) & c\sum_{ij}\bar{X}_{si} \\
(\sigma^2)^{-1}\sum_{ij}(X_{ij}-\bar{X}_{si}) & (\sigma^2)^{-1}\sum_{ij}(X_{ij}-\bar{X}_{si})^2 & (\sigma^2)^{-1}\sum_{ij}(X_{ij}-\bar{X}_{si})\bar{X}_{si} \\
c\sum_{ij}\bar{X}_{si} & c\sum_{ij}(X_{ij}-\bar{X}_{si})\bar{X}_{si} & c\sum_{ij}\bar{X}_{si}^2
\end{pmatrix}
= \begin{pmatrix}
Nc & 0 & c\sum_{ij}\bar{X}_{si} \\
0 & (\sigma^2)^{-1}\sum_{ij}(X_{ij}-\bar{X}_{si})^2 & 0 \\
c\sum_{ij}\bar{X}_{si} & 0 & c\sum_{ij}\bar{X}_{si}^2
\end{pmatrix},$$

because $\sum_{j}(X_{ij}-\bar{X}_{si}) = 0$ within every cluster, and

$$X'V^{-1}Y = \begin{pmatrix}
c\sum_{ij} Y_{ij} \\
(\sigma^2)^{-1}\sum_{ij}(X_{ij}-\bar{X}_{si})Y_{ij} \\
c\sum_{ij}\bar{X}_{si}Y_{ij}
\end{pmatrix}.$$

Based on the generalized least squares estimator, $(X'V^{-1}X)\hat{\beta} = X'V^{-1}Y$, we can find the estimator of $\beta_1$, which is

$$\hat{\beta}_1 = \frac{\sum_{ij}(X_{ij}-\bar{X}_{si})Y_{ij}}{\sum_{ij}(X_{ij}-\bar{X}_{si})^2}. \tag{B-8}$$

Comparing (B-6) with (B-8), we see they are identical. Similarly, we can find the estimate of $\beta_0$, which is

$$\hat{\beta}_0 = N^{-1}\sum_{ij} Y_{ij} - (\hat{\beta}_1+\hat{\beta}_2)\,N^{-1}\sum_{ij}\bar{X}_{si}. \tag{B-9}$$

If the mean of the marginal distribution of $X_{ij}$ is zero, then $\hat{\beta}_0 \approx N^{-1}\sum_{ij} Y_{ij}$, which is not affected by $\bar{X}_{si}$.
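
The identity between (B-6) and (B-8) can also be checked numerically. The sketch below is a simplified illustration, not the dissertation's derivation: it simulates a balanced data set with hypothetical parameter values, computes the within-cluster estimator directly, and compares it with the coefficient of $X_{ij}$ from an ordinary least squares fit of $Y_{ij}$ on $(1, X_{ij}, \bar{X}_{si})$. Ordinary least squares is used here as a stand-in for generalized least squares, which is an assumption of the sketch; the helper names `solve` and `ols` are likewise hypothetical.

```python
import random


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)


random.seed(1)
m_clusters, n_per = 30, 5          # balanced design: n observations per cluster
x, y, xbar = [], [], []
for _ in range(m_clusters):
    effect = random.gauss(0, 1)    # unmeasured cluster effect, correlated with X
    xs = [random.gauss(effect, 1) for _ in range(n_per)]
    mean_x = sum(xs) / n_per
    for xij in xs:
        x.append(xij)
        xbar.append(mean_x)
        y.append(1.0 + 0.5 * xij + 2.0 * effect + random.gauss(0, 1))

# Within-cluster estimator, as in (B-6) and (B-8).
within = (sum((a - b) * c for a, b, c in zip(x, xbar, y))
          / sum((a - b) ** 2 for a, b in zip(x, xbar)))

# Between-within fit: intercept, X_ij, and the cluster mean of X.
bw = ols([[1.0, a, b] for a, b in zip(x, xbar)], y)
print(within, bw[1])
```

The two printed numbers agree up to rounding error because the centered covariate $X_{ij}-\bar{X}_{si}$ is orthogonal to both the intercept column and the cluster-mean column, which is the same fact that makes the off-diagonal entries of $X'V^{-1}X$ vanish above.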

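APPENDIX C
DERIVATION OF THE DISTRIBUTION OF THE CLUSTER EFFECTS

Let $p_{ij} = \mathrm{expit}(X_{ij}\beta + u_i)$, $u_i = \delta_i + \gamma_1\bar{X}_i$, $\delta_i \sim N(\gamma_0,\tau^2)$, and $\bar{X}_i = N_i^{-1}\sum_{j=1}^{N_i} X_{ij}$. Assume $Y_{ij} \sim g(p_{ij})$ given $u_i$ and $X_{ij}$, $X_{ij} \sim \mathrm{Ber}(\mathrm{expit}(\eta_i))$ given $\eta_i \sim N(0,1)$, and $\delta_i \perp \{\eta_i, X_{ij}\}$. One can find the distribution of $u_i$ as follows.

In order to find the marginal distribution of $u_i$, we can use the conditional expectation and variance of $u_i$ given $\eta_i$:

$$\begin{aligned}
E(u_i \mid \eta_i) &= E(\delta_i + \gamma_1\bar{X}_i \mid \eta_i) = E(\delta_i \mid \eta_i) + \gamma_1 E(\bar{X}_i \mid \eta_i) \\
&= E(\delta_i) + (\gamma_1/N_i)\sum_{j=1}^{N_i} E(X_{ij} \mid \eta_i) = \gamma_0 + \gamma_1 E(X_{ij} \mid \eta_i) = \gamma_0 + \gamma_1\frac{e^{\eta_i}}{1+e^{\eta_i}},
\end{aligned}$$

$$\begin{aligned}
\mathrm{Var}(u_i \mid \eta_i) &= \mathrm{Var}(\delta_i + \gamma_1\bar{X}_i \mid \eta_i) = \mathrm{Var}(\delta_i \mid \eta_i) + \gamma_1^2\,\mathrm{Var}(X_{ij} \mid \eta_i)/N_i \\
&= \mathrm{Var}(\delta_i) + (\gamma_1^2/N_i)\,\mathrm{Var}(X_{ij} \mid \eta_i) = \tau^2 + \gamma_1^2 N_i^{-1}\frac{e^{\eta_i}}{(1+e^{\eta_i})^2}.
\end{aligned}$$

Then the mean and variance of the marginal distribution of $u_i$ are computed as follows:

$$E(u_i) = E(E(u_i \mid \eta_i)) = E\left(\gamma_0 + \gamma_1\frac{e^{\eta_i}}{1+e^{\eta_i}}\right) = \gamma_0 + \gamma_1 E\left(\frac{e^{\eta_i}}{1+e^{\eta_i}}\right),$$

$$\mathrm{Var}(u_i) = E(\mathrm{Var}(u_i \mid \eta_i)) + \mathrm{Var}(E(u_i \mid \eta_i)) = \tau^2 + \gamma_1^2 N_i^{-1} E\left(\frac{e^{\eta_i}}{(1+e^{\eta_i})^2}\right) + \gamma_1^2\,\mathrm{Var}\left(\frac{e^{\eta_i}}{1+e^{\eta_i}}\right).$$

Therefore, the estimated mean and variance of the distribution of $u_i$ for a population with a finite number $M$ of clusters are of the form

$$\widehat{E}(u_i) = \gamma_0 + \frac{\gamma_1}{M}\sum_{i=1}^{M}\frac{e^{\eta_i}}{1+e^{\eta_i}},$$

$$\widehat{\mathrm{Var}}(u_i) = \tau^2 + \frac{\gamma_1^2}{M N_i}\sum_{i=1}^{M}\frac{e^{\eta_i}}{(1+e^{\eta_i})^2} + \frac{\gamma_1^2}{M}\sum_{i=1}^{M}\left(\frac{e^{\eta_i}}{1+e^{\eta_i}} - \bar{e}\right)^2, \qquad \bar{e} = \frac{1}{M}\sum_{i=1}^{M}\frac{e^{\eta_i}}{1+e^{\eta_i}}.$$

Since $\eta_i \sim N(0,1)$, it is very simple to use Monte Carlo integration for computing $M^{-1}\sum_{i=1}^{M} e^{\eta_i}/(1+e^{\eta_i})$, $M^{-1}\sum_{i=1}^{M} e^{\eta_i}/(1+e^{\eta_i})^2$, and $M^{-1}\sum_{i=1}^{M}\left(e^{\eta_i}/(1+e^{\eta_i}) - \bar{e}\right)^2$, and the estimated values can be highly accurate when $M$ is large. These three quantities are approximately equal to 1/2, 1/5, and 1/23, respectively. According to the central limit theorem, $\bar{X}_i$ has an approximately normal distribution since $N_i$ is large. Because $N_i$ is large, the middle term of the variance of $u_i$ is approximately zero. Given that $\delta_i$ has a normal distribution and $\delta_i \perp X_{ij}$, we have $\delta_i \perp \bar{X}_i$. Therefore, $u_i$ has an approximately normal distribution as follows:

$$u_i \sim N(\gamma_0 + \gamma_1/2,\; \tau^2 + \gamma_1^2/23). \tag{C-1}$$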
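
The three constants quoted in this appendix can be reproduced with a short numerical check. The sketch below replaces the Monte Carlo average described in the text with a deterministic trapezoid rule over the standard normal density, which is a substitution made here for reproducibility; the function names and integration grid are assumptions of the sketch.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def normal_expectation(f, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoid-rule approximation of E[f(eta)] for eta ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        t = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * f(t) * math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return total * h


mean_p = normal_expectation(expit)                                   # about 1/2
mean_pq = normal_expectation(lambda t: expit(t) * (1.0 - expit(t)))  # about 1/5
var_p = normal_expectation(lambda t: (expit(t) - mean_p) ** 2)       # about 1/23


def u_moments(gamma0, gamma1, tau2):
    """Approximate mean and variance of u_i for large N_i (middle term dropped)."""
    return gamma0 + gamma1 * mean_p, tau2 + gamma1 ** 2 * var_p


print(mean_p, mean_pq, var_p)
print(u_moments(0.5, 2.0, 1.0))
```

With the simulation settings listed in Appendix E (gamma0 = 0.5, gamma1 = 2, tau = 1), `u_moments(0.5, 2.0, 1.0)` gives a mean of about 1.5 and a variance of about 1.17, which is broadly in line with the $E(u_i)$ and $Var(u_i)$ rows for the CPL and census methods in Table 4-1.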

APPENDIXDDISTRIBUTIONOFRANDOMEFFECTSLetp(u)N(R,2R),p(Xju),andp(YjX,u)berealdistributions.Letps(u)N(,2),ps(Xju),andps(YjX,u)bestandarddistributions.Letp(u)N(E,2E),p(Xju),andp(YjX,u)beanotherstandarddistributions.TheassumptionX?uisheldinbothstandarddistributionsbutitisnotheldfortherealdistributions.ThesedistributionshavethefollowingrelationshipspS(YjX,u)=p(YjX,u)p(YjX,u)=p(YjX,u)pS(Xju)=p(X)p(Xju)=p(X)pS(u)=p(u)p(u)6=p(u).ThepairedsampledataaretintotheStatagllamm,whichistomaximize XiXj

Therefore,@ @pS(YjX;)=Zp(YjX,u)@ @p(u;)du (D)Morespecically,@ @pS(YjX;)=Zp(YjX,u)p(u;)u)]TJ /F3 11.955 Tf 11.95 0 Td[( 2du (D)@ @pS(YjX;)=Zp(YjX,u)p(u;)(u)]TJ /F3 11.955 Tf 11.95 0 Td[()2 3)]TJ /F5 11.955 Tf 13.22 8.09 Td[(1 du (D)Hence,E[S()]=NZZp(YjX,u)p(u;)u)]TJ /F3 11.955 Tf 11.96 0 Td[( 2dup(YjX;) pS(YjX;)dy (D)E[S())]=NZZp(YjX,u)p(u;)(u)]TJ /F3 11.955 Tf 11.96 0 Td[()2 3)]TJ /F5 11.955 Tf 13.22 8.09 Td[(1 dup(YjX;) pS(YjX;)dy (D)ThenE[S()]canbewrittenasE[S()]=EYjXESupS(YjX,u) pS(YjX)u)]TJ /F3 11.955 Tf 11.95 0 Td[( 2ind.=EYjXESupS(ujX,Y) pS(u)u)]TJ /F3 11.955 Tf 11.96 0 Td[( 2=EYjXESu)]TJ /F3 11.955 Tf 11.95 0 Td[( 2jX,YEXE[S()]=1 2EX,Y(ES(ujX,Y))]TJ /F3 11.955 Tf 11.96 0 Td[()Similarly,EXE[S()]=1 3EX,YES((u)]TJ /F3 11.955 Tf 11.96 0 Td[()2jX,Y))]TJ /F5 11.955 Tf 13.22 8.09 Td[(1 IfEX,YES(ujX,Y).=andEX,YEs((u)]TJ /F3 11.955 Tf 12.38 0 Td[()2jX,Y).=2,thenourestimatingequationhasthemean0. 79

Nextforanotherstandarddistributions,estimatingequationssolveXiXj

BasedonBayesianrule,wehavep(ujX,Y)=p(u,YjX) p(YjX)=p(YjX,u)p(ujX) p(YjX)=p(YjX,u)p(u) Rp(YjX,u)p(u)du=p(YjX,u)p(u) Rp(YjX,u)p(u)duThentheestimatedmeanandvarianceofuicanbewrittenas^=XiXj

APPENDIX E
SAS CODE AND STATA CODE IN CHAPTER 4

E.1 SAS Code for Simulation Studies

***************************************************************;
* Title: Simulation for Binary Response and unequally Sampling;
***************************************************************;
* The location of unequal Sample Data;
libname mydata "C:\S4B\data\unequalSample";

* Part 1: Generate population data set;
%macro gpop(M, Ni, beta, gamma0, gamma1, tau);
* Generate x_ij ~ bin(&Ni, p_i) where logit(px_i) = u_i and u_i ~ N(0,1);
data simx;
  do i = 1 to &M;
    u_i = rand('normal', 0, 1);
    px_i = exp(u_i) / (1 + exp(u_i));
    do j = 1 to &Ni;
      x_ij = rand('binomial', px_i, 1);
      output;
    end;
  end;
run;
* Find the mean of x_ij in each of clusters;
proc means data=simx noprint;
  class i;
  var x_ij;
  output out=xmean mean=xi_bar;
run;

* Generate b_i = gamma*x_bar + del_i where del_i ~ N(0,1);
data xmeana(drop=i);
  set xmean(keep=i xi_bar);
  where (i ^= .);
  del_i = rand('normal', &gamma0, &tau);
  b_i = &gamma1*xi_bar + del_i;
  * b_i = &gamma0 + &gamma1*xi_bar + &tau*del_i;
run;
data sim_di(drop=i);
  set xmeana;
  do i = 1 to &Ni;
    output;
  end;
run;
* Generate py_ij for each of y_ij based on x_ij, beta, and b_i;
* Population data set is saved in mydata.simxyp;
data mydata.simxyp(keep=i j x_ij u_i xi_bar del_i b_i py_ij y_ij);
  merge simx sim_di;
  py_ij = exp(&beta*x_ij + b_i) / (1 + exp(&beta*x_ij + b_i));
  y_ij = rand('Binomial', py_ij, 1);
run;
%mend gpop;
%gpop(M=1000, Ni=1000, beta=-0.5, gamma0=0.5, gamma1=2, tau=1);

*****************************************************;
* Part 2: Unequally Sample population data many times;
*------------------------------------------------------------;
%macro unequal_sample(ts);

%do rep = 1 %to &ts;
  data ssimxyw;
    set mydata.simxyp;
    replicate = &rep;
    w_ij = 1;
    if y_ij = x_ij then do;
      w_ij = 2;
      select = rand('binomial', .002, 1);
    end;
    else select = rand('binomial', .004, 1);
    if select = 1 then do;
      xw_ij = x_ij*w_ij;
      output;
    end;
  run;
  %if &rep = 1 %then %do;
    data mydata.unequal_ssimxyw;
      set ssimxyw;
    run;
  %end;
  %else %do;
    proc append base=mydata.unequal_ssimxyw data=ssimxyw;
    run;
  %end;
%end;
%mend unequal_sample;
%unequal_sample(ts=100)

/* Compute weighted sample mean for each cluster */
data temp;
  set mydata.unequal_ssimxyw;
run;
proc sort data=temp;
  by replicate i;
run;
* compute weighted sample means for each cluster in each of sample;
data wsm4c(keep=replicate i wsm b_i);
  set temp;
  by replicate i;
  if first.i then do;
    sum_xw = 0;
    sum_w = 0;
  end;
  sum_xw + xw_ij;
  sum_w + w_ij;
  if last.i then do;
    wsm = sum_xw / sum_w;
    output;
  end;
run;
* Merge weighted sample means with a unequal_sample data set;

* by replicate and cluster i;
proc sql;
  create table unequal_ssimxyw_wsm4c as
  select one.*, two.wsm as xi_hat
  from temp one, wsm4c two
  where (one.replicate = two.replicate and one.i = two.i);
quit;
* Make pairs for GLMM within each cluster of each sample;
proc sql;
  create table match as
  select one.i as i, one.j as one_j, two.j as two_j,
         one.x_ij as x_ij1, two.x_ij as x_ij2,
         one.y_ij as y_ij1, two.y_ij as y_ij2,
         one.w_ij*two.w_ij as weight_prod,
         one.b_i as b_i, one.xi_bar as xi_bar, one.xi_hat as xi_hat,
         one.del_i as delta_i, one.replicate as replicate
  from unequal_ssimxyw_wsm4c one, unequal_ssimxyw_wsm4c two
  where (one.replicate = two.replicate and one.i = two.i and one.j

data wcc_pairs;
  set match;
  if y_ij1 = y_ij2 then delete;
  if y_ij1 = 1 and y_ij2 = 0 then do;
    y_ijl = 1;
    diff = x_ij1 - x_ij2;
  end;
  if y_ij1 = 0 and y_ij2 = 1 then do;
    y_ijl = 0;
    diff = x_ij1 - x_ij2;
  end;
run;
%macro findbeta(unequal_sample);
%do rs = 1 %to &unequal_sample;
  * select one sample;
  data all_pairs;
    set wcc_pairs;
    where (replicate = &rs);
  run;
  * Use all pairs to estimate beta;
  title "Sample &rs";
  proc surveylogistic data=all_pairs;
    ods output ParameterEstimates=one_estimates;
    cluster i;
    model y_ijl(ref='0') = diff / link=glogit noint;
    Weight Weight_prod;
  run;
  title;
  data temp2;
    set one_estimates;
    replicate = &rs;
  run;
  %if &rs = 1 %then %do;
    data mydata.all_estimates;
      set temp2;
    run;
  %end;
  %if &rs ^= 1 %then

  %do;
    proc append base=mydata.all_estimates data=temp2;
    run;
  %end;
%end;
* Compute means and standard errors of parameters;
title "Estimate based on &unequal_sample sample";
proc means data=mydata.all_estimates;
  class variable;
  var estimate StdErr;
run;
title;
%mend findbeta;
%findbeta(unequal_sample=100);
* merge estimated beta with a match data set;
proc sql;
  create table mydata.Match4unequalSample as
  select one.*, two.estimate as beta
  from match one, mydata.all_estimates two
  where (one.replicate = two.replicate);
quit;
* merge estimated beta with a sample data set;
proc sql;
  create table mydata.unequal_ssimxyw_wsm4c_beta as
  select one.*, two.estimate as beta
  from unequal_ssimxyw_wsm4c one, mydata.all_estimates two
  where (one.replicate = two.replicate);

quit;
PROC EXPORT DATA=mydata.unequal_ssimxyw_wsm4c_beta
  OUTFILE="C:\S4B\Data\unequalSample\unequal_ssimxyw_wsm4c_beta.dta"
  DBMS=STATA REPLACE;
RUN;
PROC EXPORT DATA=MYDATA.Match4unequalSample
  OUTFILE="C:\S4B\Data\unequalSample\Match4unequalSample.dta"
  DBMS=STATA REPLACE;
RUN;
PROC EXPORT DATA=MYDATA.All_estimates
  OUTFILE="C:\S4B\Data\unequalSample\all_estimates.dta"
  DBMS=STATA REPLACE;
RUN;

E.2 Stata Code for Simulation Studies

//------------------------------------------------------------------------
// Using unequal sample method
// True beta=-0.5, gamma1=2, gamma0=0.5, and tau=1
// Population with 1000 clusters, 1000 observations per cluster.
//-----------------------------------------------------------------------
program define getdist
set more off
mat myest = J(100,16,.)
mat colname myest = A_Beta A_g1 A_g0 A_tau2 B_Beta B_g1 B_g0 B_tau2 C_g0 C_tau2 mu_a var_a mu_b var_b mu_c var_c
forvalues i = 1/100 {
use "C:\S4B\Data\unequalSample\Match4unequalSample.dta", clear
keep if replicate == `i'
gen final_id = _n
reshape long x_ij y_ij, i(final_id) j(pair_id)

gen pwt2 = weight_prod/10000
gen pwt1 = 1
gen xb = x_ij*beta
// Model A: using GLLAMM and census data
// so b_i = gamma1*xi_bar + delta_i and x_ij are using in the model A.
gllamm y_ij x_ij xi_bar, i(final_id) pweight(pwt) link(logit) family(binomial) robust cluster(i) adapt
mat A = e(b)
mat myest[`i',1] = A[1,1]
mat myest[`i',2] = A[1,2]
mat myest[`i',3] = A[1,3]
mat myest[`i',4] = A[1,4]*A[1,4]
// Model B: using GLLAMM
// so b_i = gamma1*xi_hat + delta_i and x_ij are using in the model B.
gllamm y_ij x_ij xi_hat, i(final_id) pweight(pwt) link(logit) family(binomial) robust cluster(i) adapt
mat B = e(b)
mat myest[`i',5] = B[1,1]
mat myest[`i',6] = B[1,2]
mat myest[`i',7] = B[1,3]
mat myest[`i',8] = B[1,4]*B[1,4]
// Model C: using CPL + GLLAMM so offset(xb) using in the model C.
gllamm y_ij, offset(xb) i(final_id) pweight(pwt) link(logit) family(binomial) robust cluster(i) adapt
mat C = e(b)
mat myest[`i',9] = C[1,1]
mat myest[`i',10] = C[1,2]*C[1,2]

mat myest[`i',11] = A[1,3] + A[1,2]/2
mat myest[`i',12] = A[1,4]*A[1,4] + A[1,2]*A[1,2]/23
mat myest[`i',13] = B[1,3] + B[1,2]/2
mat myest[`i',14] = B[1,4]*B[1,4] + B[1,2]*B[1,2]/23
mat myest[`i',15] = C[1,1]
mat myest[`i',16] = C[1,2]*C[1,2]
}
svmat myest, names(col)
keep A_Beta A_g1 A_g0 A_tau2 B_Beta B_g1 B_g0 B_tau2 C_g0 C_tau2 mu_a var_a mu_b var_b mu_c var_c
drop if A_Beta == .
summarize
save "C:\S4B\Data\UnequalSample\allparameters.dta", replace
end
getdist

E.3 SAS Code for Computing Individual-Level Coefficients Using the CPL Method and Generating Bootstrapping Datasets from the 2008 BRFSS Survey Data

**********************************************************************;
* Uses: 1. Compute individual-level coefficients using the CPL method.
*       2. Generate bootstrapping datasets from the 2008 Florida BRFSS
*          survey data.
**********************************************************************;
%let local = 1;
%let test = 1;
%let sim = 2;
%macro lib();
%if &local = 0 %then %do;
  libname mydata '/home/zycai/originaldata/data';
%end;

* for running on a local computer;
%if &local = 1 %then %do;
  libname mydata "C:\OriginalData\data";
%end;
%if &test = 1 %then %do;
  %let sim = 4;
%end;
%if &test = 0 %then %do;
  %let sim = 100;
%end;
%mend;
%lib;
data mydata.allxb mydata.allbetas mydata.allprop;
  set _null_;
run;
data brfss08(keep=gender race age educa income drnkany avedrnk drnkdy
             pmale pfemale pwhite pblack phisp pasian pother pao
             plt25 p25to50 pgt50 plths pgths page1 page2 page3 page4
             _finalwt zipid);
  set mydata.alldeid08;
  if pasian ^= .;
  rename race2=race sex=gender income2=income drnkany4=drnkany
         avedrnk2=avedrnk _DRNKDY3=drnkdy;
  pao = pasian + pother;
  pmale = 1 - pfemale;
  page1 = 1 - page2 - page3 - page4;
run;
* Exclude observations with missing values;
data brfss;

    set brfss08;
    if drnkany in (1,2);
    if drnkany=2 then drnkany=0;
    ** Delete missing and unknown data;
    if race not in (9,.);
    if educa not in (9,.);
    if age not in (7,9,.);
    if income not in (77,99,.);
run;

proc sort data=brfss;
    by zipid;
run;

* Add numbers to denote individuals in each of zip codes;
data alcohol;
    set brfss;
    by zipid;
    if first.zipid then ID_J=0;
    ID_J+1;
    output;
run;

* Remove an observation that is only one sample in a zip code;
data alcohol_r;
    set alcohol;
    by zipid;
    if first.zipid=1 and last.zipid=1 then delete;
run;

** Categorize alcohol data;
data alcoholc;
    set alcohol_r;

    female=0; if gender=2 then female=1;
    * education: College or above vs less than College;
    edu_h=0; if educa>4 then edu_h=1;
    income_h=0; if income>5 then income_h=1;
    black=0; if race=2 then black=1;
    hisp=0; if race=8 then hisp=1;
    asian=0; if race=3 then asian=1;
    other=0; if race in (4,5,6,7) then other=1;
    ao=asian+other;
    * age: reference is [18,34];
    age2=0; if 35<=age<=54 then age2=1;
    age3=0; if 55<=age<=64 then age3=1;
    age4=0; if age>=65 then age4=1;
run;

** Collect all distinct zip codes;
data zipcode (keep=zipid);
    set alcohol_r;
    by zipid;
    if first.zipid=1 then output;
run;

%macro simu(sim);
%do s=1 %to &sim;
* Resample data by zip code;
proc surveyselect data=zipcode out=zipcode_bt method=urs samprate=1 outhits

    rep=1;
run;
* Sort data by zip code and NumberHits;
proc sort data=zipcode_bt;
    by zipid NumberHits;
run;
* Collect all distinct zip codes after resampling;
data zipcode_bs;
    set zipcode_bt;
    by zipid NumberHits;
    if first.NumberHits=1 then output;
run;
* Generate new zip code based on old zip code and the number of hits
* in resampling data;
data zipcode_new (drop=NH);
    set zipcode_bs;
    simu=&s;
    NH=NumberHits;
    do while (NH>0);
        zipid_new=zipid+(NH-1)*0.01;
        NH=NH-1;
        output;
    end;
run;
* Merge new zip code data with categorized alcohol data by old zip code;
proc sql;
    create table alcohol_bs as

    select a.*, b.zipid_new, b.simu
    from alcoholc as a, zipcode_new as b
    where (a.zipid=b.zipid);
run;
* drop original five covariates;
data alcohol_bs;
    set alcohol_bs;
    drop Age Educa Income Gender Race;
run;
proc sort data=alcohol_bs;
    by zipid_new;
run;
* Form all possible pairs within each new zip code;
* and transform data to use ordinary logistic regression;
proc sql;
    create table pairs as
    select a.simu, a.zipid_new,
           a.ID_J as ID_J1, b.ID_J as ID_J2,
           a.drnkany as drnkany1, b.drnkany as drnkany2,
           a._finalwt as _finalwt1, b._finalwt as _finalwt2,
           a.edu_h-b.edu_h as d_edu_h,
           a.female-b.female as d_female,
           a.income_h-b.income_h as d_income_h,
           a.black-b.black as d_black,
           a.hisp-b.hisp as d_hisp,
           a.asian-b.asian as d_asian,
           a.other-b.other as d_other,

           a.ao-b.ao as d_ao,
           a.age2-b.age2 as d_age2,
           a.age3-b.age3 as d_age3,
           a.age4-b.age4 as d_age4
    from alcohol_bs as a, alcohol_bs as b
    where (a.zipid_new=b.zipid_new and a.ID_J

** Collect coefficients from above results;
data beta;
    set one_estimate (keep=estimate);
run;
proc transpose data=beta out=temp;
run;
data betas;
    set temp;
    simu=&s;
    rename COL1=b_female COL2=b_edu COL3=b_income
           COL4=b_age2 COL5=b_age3 COL6=b_age4
           COL7=b_black COL8=b_hisp COL9=b_ao;
    drop _name_;
RUN;
** d_female d_edu_h d_income_h d_age2 d_age3 d_age4 d_black d_hisp d_ao;
** Compute linear predictors for each observation;
** Create a variable demo_r including all covariates except age;
proc sql;
    create table alcoholxb as
    select a.simu, a.zipid_new, a.ID_J, a.avedrnk, a.drnkany, a._finalwt,

           a.female, a.edu_h, a.income_h, a.age2, a.age3, a.age4,
           a.black, a.hisp, a.ao,
           a.pfemale, a.pgths, a.pgt50, a.pblack, a.phisp, a.pao,
           a.page2, a.page3, a.page4,
           a.female*b.b_female + a.edu_h*b.b_edu + a.income_h*b.b_income
           + a.age2*b.b_age2 + a.age3*b.b_age3 + a.age4*b.b_age4
           + a.black*b.b_black + a.hisp*b.b_hisp + a.ao*b.b_ao as xb,

           a.female*1 + a.edu_h*2 + a.income_h*4 + a.black*8 + a.hisp*16 + a.ao*32 as demo_r
    from alcohol_bs as a, betas as b
    where (a.simu=b.simu);
run;
** frequency according to demo by age;
data allxbc;
    set alcoholxb;
    age=put(age2,1.)||put(age3,1.)||put(age4,1.);
    demo=put(female,1.)||put(edu_h,1.)||put(income_h,1.)||put(black,1.)||put(hisp,1.)||put(ao,1.);
run;
proc freq data=allxbc;
    table demo*age / nofreq norow nocol out=results sparse;
    weight _finalwt;
run;
data temp1;
    set results (keep=percent);
run;
proc transpose data=temp1 out=prop;
run;
data prop;
    set prop (drop=_NAME_ _LABEL_);
    simu=&s;
run;
* Save alcohol data, linear predictor, beta, proportions in library mydata;
%if &s^=1 %then %do;
    proc append base=mydata.allbetas data=betas; run;
    proc append base=mydata.allxb data=alcoholxb; run;
    proc append base=mydata.allprop data=prop; run;
%end;
%if &s=1 %then

%do;
    data mydata.allbetas; set betas; run;
    data mydata.allxb; set alcoholxb; run;
    data mydata.allprop; set prop; run;
%end;
%end;
%mend simu;
%simu(&sim);

title "Simulation Results for &sim Sampled Data Sets with Weight";
proc means data=mydata.allbetas;
run;
title;

** Form a paired data set for being used in Stata;
proc sql;
    create table pairs as
    select a.simu, a.zipid_new, a.ID_J,
           a.pfemale, a.pgths, a.pgt50, a.pblack, a.phisp, a.pao,
           a.page2, a.page3, a.page4,
           a.xb as xb1, a.drnkany as drnkany1, a.female as female1,

           a.edu_h as edu_h1, a.income_h as income_h1,
           a.age2 as age21, a.age3 as age31, a.age4 as age41,
           a.black as black1, a.hisp as hisp1, a.ao as ao1,
           b.xb as xb2, b.drnkany as drnkany2, b.female as female2,
           b.edu_h as edu_h2, b.income_h as income_h2,
           b.age2 as age22, b.age3 as age32, b.age4 as age42,
           b.black as black2, b.hisp as hisp2, b.ao as ao2,
           a._finalwt*b._finalwt as w_ijl
    from mydata.allxb as a, mydata.allxb as b
    where (a.simu=b.simu and a.zipid_new=b.zipid_new and a.ID_J

    OUTFILE="/home/zycai/originaldata/data/pairs.dta"
    DBMS=STATA REPLACE;
RUN;
PROC EXPORT DATA=MYDATA.allbetas
    OUTFILE="/home/zycai/originaldata/data/allbetas.dta"
    DBMS=STATA REPLACE;
RUN;
PROC EXPORT DATA=MYDATA.allprop
    OUTFILE="/home/zycai/originaldata/data/allprop.dta"
    DBMS=STATA REPLACE;
RUN;
%end;
%if &local=1 %then %do;
PROC EXPORT DATA=pairs
    OUTFILE="C:\OriginalData\data\pairs.dta"
    DBMS=STATA REPLACE;
RUN;
PROC EXPORT DATA=MYDATA.allbetas
    OUTFILE="C:\OriginalData\data\allbetas.dta"
    DBMS=STATA REPLACE;
RUN;
PROC EXPORT DATA=MYDATA.allprop
    OUTFILE="C:\OriginalData\data\allprop.dta"
    DBMS=STATA REPLACE;
RUN;
%end;
%mend;
%output();

E.4 Stata Code for Estimating the Distribution of Random Effect Using the CPL Method

set more off
program define getdist
set more off

set memory 20g
set matsize 400
mat myest = J(400,2,.)
mat colname myest = mu_b var_b
forvalues i=1/100 {
    use "/home/zycai/originaldata/data/pairs.dta", clear
    keep if simu==`i'
    gen final_id = _n
    reshape long xb drnkany, i(final_id) j(pair_id)
    gen pwt2 = w_ijl/10000
    gen pwt1 = 1
    // Model: using CPL + GLLAMM, so offset(xb) and b_i = delta_i are used.
    gllamm drnkany, offset(xb) i(final_id) pweight(pwt) link(logit) family(binomial) robust cluster(zipid_new) adapt
    mat A = e(b)
    mat myest[`i',1] = A[1,1]
    mat myest[`i',2] = A[1,2]*A[1,2]
}
svmat myest, names(col)
keep mu_b var_b
drop if mu_b==.
gen simu = _n
sort simu
save "/home/zycai/originaldata/data/allmv.dta", replace
end
getdist
clear all

exit

E.5 Stata Code for Computing Individual- and Neighborhood-Level Coefficients and the Distribution of Random Effect Using the Census Method

set more off
program define getdist
set more off
set memory 20g
set matsize 400
mat myest = J(400,20,.)
mat colname myest = b_female b_edu b_income b_age2 b_age3 b_age4 b_black b_hisp b_other b_pfemale b_pgths b_pgt50 b_page2 b_page3 b_page4 b_pblack b_phisp b_pother mu_b var_b
forvalues i=1/100 {
    use "/home/zycai/originaldata/data/pairs.dta", clear
    keep if simu==`i'
    gen final_id = _n
    reshape long drnkany female edu_h income_h age2 age3 age4 black hisp ao, i(final_id) j(pair_id)
    gen pwt2 = w_ijl/10000
    gen pwt1 = 1
    // Model: using CPL + GLLAMM, so offset(xb) and b_i = delta_i are used.
    gllamm drnkany female edu_h income_h age2 age3 age4 black hisp ao pfemale pgths pgt50 page2 page3 page4 pblack phisp pao, i(final_id) pweight(pwt) link(logit) family(binomial) robust cluster(zipid) adapt
    mat A = e(b)
    mat myest[`i',1] = A[1,1]

    mat myest[`i',2] = A[1,2]
    mat myest[`i',3] = A[1,3]
    mat myest[`i',4] = A[1,4]
    mat myest[`i',5] = A[1,5]
    mat myest[`i',6] = A[1,6]
    mat myest[`i',7] = A[1,7]
    mat myest[`i',8] = A[1,8]
    mat myest[`i',9] = A[1,9]
    mat myest[`i',10] = A[1,10]
    mat myest[`i',11] = A[1,11]
    mat myest[`i',12] = A[1,12]
    mat myest[`i',13] = A[1,13]
    mat myest[`i',14] = A[1,14]
    mat myest[`i',15] = A[1,15]
    mat myest[`i',16] = A[1,16]
    mat myest[`i',17] = A[1,17]
    mat myest[`i',18] = A[1,18]
    mat myest[`i',19] = A[1,19]
    mat myest[`i',20] = A[1,20]*A[1,20]
}
svmat myest, names(col)
keep b_female b_edu b_income b_age2 b_age3 b_age4 b_black b_hisp b_other b_pfemale b_pgths b_pgt50 b_page2 b_page3 b_page4 b_pblack b_phisp b_pother mu_b var_b
drop if mu_b==.
gen simu = _n
sort simu

save "/home/zycai/originaldata/data/allparameters.dta", replace
end
getdist
clear all
exit

E.6 SAS Code for Computing Model-Based Standardized Proportions Using the CPL Method

*************************************************************;
* Use: Computing model-based standardized proportions of people
*      who drink alcohol, stratified by age group, using CPL method;
*************************************************************;
%let local=1;
%macro lib();
%if &local=0 %then %do;
    libname mydata "/home/zycai/originaldata/data/cpl";
%end;
* for running on local computer;
%if &local=1 %then %do;
    libname mydata "C:\OriginalData\data\cpl";
%end;
%mend;
%lib;
proc sort data=mydata.allmv; by simu; run;
proc sort data=mydata.allxb; by simu; run;
proc sort data=mydata.allbetas; by simu; run;
data alldata;
    merge mydata.allxb mydata.allmv mydata.allbetas;
    by simu;
run;

proc sort data=alldata;
    by simu;
run;

%macro standardization(age2,age3,age4,num_int);
%if &age2=0 and &age3=0 and &age4=0 %then %do;
data stand_1 (keep=simu age_1);
%end;
%if &age2=1 and &age3=0 and &age4=0 %then %do;
data stand_2 (keep=simu age_2);
%end;
%if &age2=0 and &age3=1 and &age4=0 %then %do;
data stand_3 (keep=simu age_3);
%end;
%if &age2=0 and &age3=0 and &age4=1 %then %do;
data stand_4 (keep=simu age_4);
%end;
    set alldata;
    by simu;
    if first.simu then do;
        sumw=0;
        sumwe=0;
    end;
    item=female*b_female+edu_h*b_edu+income_h*b_income
         +&age2*b_age2+&age3*b_age3+&age4*b_age4
         +black*b_black+hisp*b_hisp+ao*b_ao;
    sum=0;
    do i=1 to &num_int;
        bi=rand('normal',mu_b,sqrt(var_b));
        sum=sum+exp(item+bi)/(1+exp(item+bi));
    end;
    expect=sum/&num_int;
    sumw+_FINALWT;
    sumwe+_FINALWT*expect;
    if last.simu then do;
        %if &age2=0 and &age3=0 and &age4=0 %then %do;
        age_1=sumwe/sumw;
        %end;
        %if &age2=1 and &age3=0 and &age4=0 %then

        %do;
        age_2=sumwe/sumw;
        %end;
        %if &age2=0 and &age3=1 and &age4=0 %then %do;
        age_3=sumwe/sumw;
        %end;
        %if &age2=0 and &age3=0 and &age4=1 %then %do;
        age_4=sumwe/sumw;
        %end;
        output;
    end;
run;
%mend;

%let num_int=1000;
* Standardization to age group 1;
%let age2=0; %let age3=0; %let age4=0;
%standardization(&age2,&age3,&age4,&num_int);
* Standardization to age group 2;
%let age2=1; %let age3=0; %let age4=0;
%standardization(&age2,&age3,&age4,&num_int);
* Standardization to age group 3;
%let age2=0; %let age3=1; %let age4=0;
%standardization(&age2,&age3,&age4,&num_int);
* Standardization to age group 4;
%let age2=0; %let age3=0; %let age4=1;
%standardization(&age2,&age3,&age4,&num_int);

proc sort data=stand_1; by simu; run;
proc sort data=stand_2; by simu; run;
proc sort data=stand_3; by simu; run;
proc sort data=stand_4; by simu; run;
data allagec;

    merge stand_1 stand_2 stand_3 stand_4;
    by simu;
    d_age2=age_2-age_1;
    d_age3=age_3-age_1;
    d_age4=age_4-age_1;
    label age_1='Age 18~34'
          age_2='Age 35~54'
          age_3='Age 55~64'
          age_4='Age 65+'
          d_age2='Age 35~54 - Age 18~34'
          d_age3='Age 55~64 - Age 18~34'
          d_age4='Age 65+ - Age 18~34';
run;
ods rtf file='C:\OriginalData\expection_cpl.rtf';
title1 "Model-based Standardization to the Proportion of Population";
title2 "Who Drink Alcohol, Stratified by Age Group, to Account for";
title3 "Unmeasured Confounding due to Cluster.";
proc means data=allagec maxdec=3;
    var age_1 age_2 age_3 age_4 d_age2 d_age3 d_age4;
run;
title;
ods rtf close;

E.7 SAS Code for Computing Model-Based Standardized Proportions Using the Census Method

*************************************************************;
* Use: Computing model-based standardized proportions of people
*      who drink alcohol, stratified by age group, using census

*      method;
*************************************************************;
%let local=1;
%macro lib();
%if &local=0 %then %do;
    libname mydata "/home/zycai/originaldata/data/cen";
%end;
* for running on local computer;
%if &local=1 %then %do;
    libname mydata "C:\OriginalData\data\cen";
%end;
%mend;
%lib;
PROC IMPORT OUT=WORK.ALLparameters
    DATAFILE="C:\OriginalData\data\cen\allparameters.dta"
    DBMS=STATA REPLACE;
RUN;
proc sort data=allparameters; by simu; run;
proc sort data=mydata.allxb; by simu; run;
data alldata;
    merge mydata.allxb allparameters;
    by simu;
run;
proc sort data=alldata; by simu; run;

%macro standardization(age2,age3,age4,num_int);
%if &age2=0 and &age3=0 and &age4=0 %then %do;
data stand_1 (keep=simu age_1);
%end;
%if &age2=1 and &age3=0 and &age4=0 %then %do;
data stand_2 (keep=simu age_2);
%end;
%if &age2=0 and &age3=1 and &age4=0 %then %do;
data stand_3 (keep=simu age_3);
%end;
%if &age2=0 and &age3=0 and &age4=1 %then

%do;
data stand_4 (keep=simu age_4);
%end;
    set alldata;
    by simu;
    if first.simu then do;
        sumw=0;
        sumwe=0;
    end;
    item=female*b_female+edu_h*b_edu+income_h*b_income
         +&age2*b_age2+&age3*b_age3+&age4*b_age4
         +black*b_black+hisp*b_hisp+ao*b_other
         +pfemale*b_pfemale+pgths*b_pgths+pgt50*b_pgt50
         +page2*b_page2+page3*b_page3+page4*b_page4
         +pblack*b_pblack+phisp*b_phisp+pao*b_pother;
    sum=0;
    do i=1 to &num_int;
        bi=rand('normal',mu_b,sqrt(var_b));
        sum=sum+exp(item+bi)/(1+exp(item+bi));
    end;
    expect=sum/&num_int;
    sumw+_FINALWT;
    sumwe+_FINALWT*expect;
    if last.simu then do;
        %if &age2=0 and &age3=0 and &age4=0 %then %do;
        age_1=sumwe/sumw;
        %end;
        %if &age2=1 and &age3=0 and &age4=0 %then %do;
        age_2=sumwe/sumw;
        %end;
        %if &age2=0 and &age3=1 and &age4=0 %then %do;
        age_3=sumwe/sumw;
        %end;
        %if &age2=0 and &age3=0 and &age4=1 %then %do;
        age_4=sumwe/sumw;
        %end;
        output;

    end;
run;
%mend;

%let num_int=1000;
* Standardization to age group 1;
%let age2=0; %let age3=0; %let age4=0;
%standardization(&age2,&age3,&age4,&num_int);
* Standardization to age group 2;
%let age2=1; %let age3=0; %let age4=0;
%standardization(&age2,&age3,&age4,&num_int);
* Standardization to age group 3;
%let age2=0; %let age3=1; %let age4=0;
%standardization(&age2,&age3,&age4,&num_int);
* Standardization to age group 4;
%let age2=0; %let age3=0; %let age4=1;
%standardization(&age2,&age3,&age4,&num_int);

proc sort data=stand_1; by simu; run;
proc sort data=stand_2; by simu; run;
proc sort data=stand_3; by simu; run;
proc sort data=stand_4; by simu; run;
data allagec;
    merge stand_1 stand_2 stand_3 stand_4;
    by simu;
    d_age2=age_2-age_1;
    d_age3=age_3-age_1;
    d_age4=age_4-age_1;
    label age_1='Age 18~34'

          age_2='Age 35~54'
          age_3='Age 55~64'
          age_4='Age 65+'
          d_age2='Age 35~54 - Age 18~34'
          d_age3='Age 55~64 - Age 18~34'
          d_age4='Age 65+ - Age 18~34';
run;
ods rtf file='C:\OriginalData\expection_cen.rtf';
title1 "Model-based Standardization to the Proportions of Population";
title2 "Who Drinks Alcohol, stratified by Age Group, to Account for";
title3 "Unmeasured Confounding due to Cluster.";
proc means data=allagec maxdec=3;
    var age_1 age_2 age_3 age_4 d_age2 d_age3 d_age4;
run;
title;
ods rtf close;
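
As a reading aid, the quantity that the standardization macros in Sections E.6 and E.7 approximate can be written out as follows. This is a sketch inferred from the code listings, not a formula quoted from the dissertation text. The notation is introduced here for illustration only: w_i stands for _FINALWT, \eta_i^{(k)} for the variable item evaluated with the age indicators set to age group k, M for num_int, and \hat{p}_k for age_k; \mu_b and \sigma_b^2 correspond to mu_b and var_b.

```latex
\[
\hat{p}_k \;=\; \frac{\sum_{i} w_i \,\hat{E}_i^{(k)}}{\sum_{i} w_i},
\qquad
\hat{E}_i^{(k)} \;=\; \frac{1}{M}\sum_{m=1}^{M}
\frac{\exp\!\big(\eta_i^{(k)} + b_{im}\big)}{1+\exp\!\big(\eta_i^{(k)} + b_{im}\big)},
\qquad
b_{im} \sim N\!\big(\mu_b,\ \sigma_b^2\big).
\]
```

The sums over i run over all observations within one bootstrap replicate (one value of simu), and the random-intercept draws b_{im} are generated afresh for each observation. The reported contrasts d_age2, d_age3, and d_age4 are then \hat{p}_k - \hat{p}_1 for k = 2, 3, 4. In E.7 the linear predictor additionally includes the neighborhood-level proportions.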

BIOGRAPHICAL SKETCH

Zhuangyu Cai was born in Jiangxi, China. He received the degree of Bachelor of Science in Physics from Jiangxi University in China. At the University of Florida, he received the degree of Master of Public Health with a concentration in biostatistics in the spring of 2009, the degree of Master of Science in Biostatistics in the fall of 2012, and a Ph.D. in Biostatistics in the summer of 2014.