Causal Inference with Complex Sampling Designs

Material Information

Title:
Causal Inference with Complex Sampling Designs
Physical Description:
1 online resource (110 p.)
Language:
english
Creator:
He, Zhulin
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Biostatistics
Committee Chair:
Brumback, Babette
Committee Members:
Lu, Xiaomin
Cantrell, Amy
Rheingans, Richard D

Subjects

Subjects / Keywords:
causal-inference -- complex-survey-data -- confounding -- instrumental-variable -- logistic-regression -- matched-pairs -- pseudolikelihood -- structural-nested-models
Biostatistics -- Dissertations, Academic -- UF
Genre:
Biostatistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
This PhD dissertation concerns causal inference with data from complex sampling designs. It consists of several aspects regarding newly developed or implemented methodologies with complex survey data, which are described as follows. First, we consider the problem of adjusting for confounding by cluster in the context of complex multistage sampling and a binary outcome. We investigate three categories of approaches -- ordinary logistic regression for survey data, with either no effect or a fixed effect for each cluster; conditional logistic regression extended for survey data; and generalized linear mixed model (GLMM) regression for survey data. We use theory, simulation, and analyses of the 2005 National Health Interview Survey (NHIS) data to compare and contrast all of these methods. One conclusion is that all of the methods perform poorly when the sampling bias is strong, which motivates us to find another method that works properly with strongly biased sampling designs. We then show that for logistic regression with a simple matched-pairs design, infinitely replicating observations and maximizing the conditional likelihood results in an estimator identical to the unconditional maximum likelihood estimator (MLE) with a fixed effect for each pair based on the original sample. Therefore, applying conditional likelihood methods to a pseudosample with observations replicated a large number of times can lead to a biased estimator. This casts doubt on one alternative approach to conditional logistic regression with complex survey data. In the third chapter, we generalize binary conditional logistic regression for complex survey data by implementing the method based on a weighted pseudo-likelihood, in which the contribution from each neighborhood involves all pairs of cases and controls in the neighborhood. We show that it corresponds to an equivalent ordinary weighted log-likelihood formulation with binary outcomes.
We explain how to program the method using standard software for ordinary logistic regression with complex survey data. We then apply the method to 2009 National Health Interview Survey (NHIS) public use data, to estimate the effect of education on health insurance coverage, adjusting for confounding by neighborhood. Last, we concentrate on adjusting for unmeasured confounding of the effect of cluster-level adherence on an individual binary outcome with complex sampling designs. Seeking new methodologies for adjusting for confounding due to cluster effects, we use double inverse-probability weighting to adjust for the disproportionate sampling and the association of individual-level confounders with randomization. Then we develop and apply methods based on structural nested models to estimate effects of adherence assessed in terms of relative risk and risk difference, using cluster-level randomization as an instrumental variable and using the double weights to adjust for complex sampling and individual-level confounding. As an important application, we wish to estimate the effect of school-level adherence on individual absenteeism in the context of a school-based water, sanitation, and hygiene intervention (WASH) in Western Kenya.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Zhulin He.
Thesis:
Thesis (Ph.D.)--University of Florida, 2012.
Local:
Adviser: Brumback, Babette.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2013-08-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2012
System ID:
UFE0044538:00001


Full Text

CAUSAL INFERENCE WITH COMPLEX SAMPLING DESIGNS

By

ZHULIN HE

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2012

© 2012 Zhulin He

To my family, who offered me unconditional love and support throughout the course of my Ph.D. study

ACKNOWLEDGMENTS

I would like to express my sincere thanks to many individuals who offered their invaluable assistance and generous help in the preparation of this dissertation. I am heartily thankful to my advisor, Dr. Babette Brumback, whose encouragement, guidance and support from the initial to the final level enabled me to develop an understanding of biostatistics, especially in causal modeling and inference and complex survey data analysis. During my four-year Ph.D. study, she not only mentored me in how to do research in statistics/biostatistics, but also gave me generous help and suggestions on my further career. I consider myself very lucky to have been studying under her guidance, and I will always be thankful for everything I learned from her.

I want to thank the Department of Biostatistics for giving me all the chances and facilities for my graduate study. The faculty and staff are very nice to the graduate students and always gave me a lot of advice and help for my graduate study and life in Gainesville. I really enjoy the peaceful and great research environment in the Department of Biostatistics at the University of Florida.

I would like to thank the members of my doctoral committee for their input, valuable discussions and accessibility. In particular, I would like to thank Dr. Xiaomin Lu and Dr. Amy Cantrell, who gave me advice and help in my graduate study, encouraged me in my further career and recommended me without reservation. I would also like to thank Dr. Richard Rheingans, whose enthusiastic instruction and generous encouragement have helped me greatly through my research work with Dr. Babette Brumback in the SWASH project.

Finally, and most importantly, I would express my thanks to my parents and my husband, who always give me their endless love and boundless support. Thank you for giving me great confidence and strong support in my life.

Chapter 1 is reprinted from Statistics in Medicine, Volume 29, Issue 18, Brumback BA, Dailey AB, He Z, Brumback LC, and Livingston MD, Efforts to Adjust for Confounding by Neighborhood Using Complex Survey Data, pages 1890-1899, 2010, with permission from John Wiley & Sons. Chapter 2 is partially reprinted from Communications in Statistics - Theory and Methods, He Z, Brumback BA, An Equivalence of Conditional and Unconditional Maximum Likelihood Estimators via Infinite Replication of Observations, in press, with permission from Taylor & Francis Group. Chapter 3 is reprinted from Statistics in Medicine, Volume 30, Issue 9, Brumback BA, He Z, Adjusting for confounding by neighborhood using complex survey data, pages 965-972, 2011, with permission from John Wiley & Sons.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 EFFORTS TO ADJUST FOR CONFOUNDING BY NEIGHBORHOOD USING COMPLEX SURVEY DATA
  1.1 Outline
  1.2 Introduction
  1.3 Motivating NHIS Example
  1.4 Modeling Framework
  1.5 Approaches to Estimation with Complex Survey Data
    1.5.1 Ordinary Logistic Regression for Complex Survey Data
    1.5.2 Conditional Logistic Regression for Complex Survey Data
    1.5.3 GLMM Regression for Complex Survey Data
  1.6 Simulation Study
    1.6.1 Ordinary Logistic Regression for Complex Survey Data
    1.6.2 Conditional Logistic Regression for Complex Survey Data
    1.6.3 GLMM Regression for Complex Survey Data
    1.6.4 Results
  1.7 Application to NHIS Data
  1.8 Discussion

2 AN EQUIVALENCE OF CONDITIONAL AND UNCONDITIONAL MAXIMUM LIKELIHOOD ESTIMATORS VIA INFINITE REPLICATION OF OBSERVATIONS
  2.1 Outline
  2.2 Introduction
  2.3 Main Results
  2.4 Simulation Study
  2.5 Discussion

3 ADJUSTING FOR CONFOUNDING BY NEIGHBORHOOD USING COMPLEX SURVEY DATA
  3.1 Outline
  3.2 Introduction
  3.3 Motivating NHIS Example
  3.4 Modeling Framework
  3.5 Estimation with Complex Survey Data
    3.5.1 Simplified Implementation
    3.5.2 Extension to Multiple Individual-Level Covariates
    3.5.3 Specifying the Pairwise Weights in Practice
  3.6 Simulation Study
    3.6.1 Results
  3.7 Application to NHIS Data
  3.8 Discussion

4 ESTIMATING THE EFFECT OF CLUSTER-LEVEL ADHERENCE ON AN INDIVIDUAL BINARY OUTCOME WITH A COMPLEX SAMPLING DESIGN
  4.1 Outline
  4.2 Introduction
    4.2.1 Instrumental Variable
    4.2.2 Structural Nested Models
  4.3 Methods
    4.3.1 Estimation with complex sampling designs
    4.3.2 Variance Estimation
  4.4 Simulation Study
    4.4.1 Simulation with strong instrument
    4.4.2 Simulation with weak instrument
  4.5 Results on School-based Water, Sanitation, and Hygiene (SWASH) Project
  4.6 Conclusions

APPENDIX

A PROOF OF LEMMA 1 (CHAPTER 2)
B PROOF OF LEMMA 2 (CHAPTER 2)
C SAS CODE (CHAPTER 3)
D ESTIMATION OF INSTRUMENTAL VARIABLE ON STRUCTURE NESTED MODELS
E VARIANCE ESTIMATION

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1 Results of the moderately biased sampling simulation
1-2 Results of the strongly biased sampling simulation
1-3 Results of the unbiased sampling simulation
1-4 Results of analyzing the NHIS data
2-1 Results of simulation study with $\beta = (0.5, 0.8)$. The MLE is $\hat{\beta} = (1.0814, 1.6508)$ with $SD(\hat{\beta}) = (0.3838, 0.1935)$.
4-1 Distribution of $Y(0) \mid A = a, Z$.
4-2 Distribution of $Y(a) \mid A = a, Z$.
4-3 Simulation results for IV method on a structural nested model with strong instrument.
4-4 Simulation results for IV method on a structural nested model with weak instrument.

LIST OF FIGURES

2-1 $f_k(\beta)$ as a function of $k$ for $\beta = 0.5$
2-2 $f_k(\beta)$ as a function of $k$ for $\beta = -0.5$
4-1 DAG represents the causal relationship between variables.
4-2 DAG represents the causal relationship between variables (with individual level confounders).

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CAUSAL INFERENCE WITH COMPLEX SAMPLING DESIGNS

By

Zhulin He

August 2012

Chair: Babette Brumback
Major: Biostatistics

This PhD dissertation concerns causal inference with data from complex sampling designs. It consists of several aspects regarding newly developed or implemented methodologies with complex survey data, which are described as follows. First, we consider the problem of adjusting for confounding by cluster in the context of complex multistage sampling and a binary outcome. We investigate three categories of approaches -- ordinary logistic regression for survey data, with either no effect or a fixed effect for each cluster; conditional logistic regression extended for survey data; and generalized linear mixed model (GLMM) regression for survey data. We use theory, simulation, and analyses of the 2005 National Health Interview Survey (NHIS) data to compare and contrast all of these methods. One conclusion is that all of the methods perform poorly when the sampling bias is strong, which motivates us to find another method that works properly with strongly biased sampling designs. We then show that for logistic regression with a simple matched-pairs design, infinitely replicating observations and maximizing the conditional likelihood results in an estimator identical to the unconditional maximum likelihood estimator (MLE) with a fixed effect for each pair based on the original sample. Therefore, applying conditional likelihood methods to a pseudosample with observations replicated a large number of times can lead to a biased estimator. This casts doubt on one alternative approach to conditional logistic regression with complex survey data.

In the third chapter, we generalize binary conditional logistic regression for complex survey data by implementing the method based on a weighted pseudo-likelihood, in which the contribution from each neighborhood involves all pairs of cases and controls in the neighborhood. We show that it corresponds to an equivalent ordinary weighted log-likelihood formulation with binary outcomes. We explain how to program the method using standard software for ordinary logistic regression with complex survey data. We then apply the method to 2009 National Health Interview Survey (NHIS) public use data, to estimate the effect of education on health insurance coverage, adjusting for confounding by neighborhood. Last, we concentrate on adjusting for unmeasured confounding of the effect of cluster-level adherence on an individual binary outcome with complex sampling designs. Seeking new methodologies for adjusting for confounding due to cluster effects, we use double inverse-probability weighting to adjust for the disproportionate sampling and the association of individual-level confounders with randomization. Then we develop and apply methods based on structural nested models to estimate effects of adherence assessed in terms of relative risk and risk difference, using cluster-level randomization as an instrumental variable and using the double weights to adjust for complex sampling and individual-level confounding. As an important application, we wish to estimate the effect of school-level adherence on individual absenteeism in the context of a school-based water, sanitation, and hygiene intervention (WASH) in Western Kenya.

CHAPTER 1
EFFORTS TO ADJUST FOR CONFOUNDING BY NEIGHBORHOOD USING COMPLEX SURVEY DATA

1.1 Outline

In this chapter, we focus on the estimation of an individual exposure effect in the presence of confounding by neighborhood effects, motivated by an analysis of National Health Interview Survey (NHIS) public use data (see Brumback et al. [14]). In the analysis, we operationalize neighborhood as the secondary sampling unit of the survey, which consists of small groups of neighboring census blocks. Thus the neighborhoods may be sampled with unequal probabilities, as may be individuals within neighborhoods. We then develop and compare several approaches for the analysis of the effect of dichotomized individual-level education on the receipt of adequate mammography screening. We investigate three categories of approaches: ordinary logistic regression for complex survey data, with either no effect or a fixed effect for each cluster; a simple modification of conditional logistic regression for complex survey data; and generalized linear mixed model (GLMM) regression for complex survey data [51]. We use theory, simulation, and analysis of the NHIS data to compare and contrast all of these methods. One conclusion is that all of the methods perform poorly when the sampling bias is strong, which motivates us to find another method that works properly with strongly biased sampling designs.

1.2 Introduction

In social epidemiology, the population may be widely distributed geographically. Social epidemiologists often consider neighborhood characteristics as some of the most important influences on health outcomes. Since individuals in the same neighborhood usually share similar characteristics (such as medical insurance coverage, health care access, standard of living), one may be interested in estimating the effects of the measured or unmeasured neighborhood characteristics, or estimating effects adjusted for confounding due to neighborhood characteristics. This research is motivated by
an analysis of National Health Interview Survey (NHIS) public use data [43] to assess the effect of individual-level exposures on the receipt of a binary outcome: adequate mammography screening (mammogram within the past 24 months) in women above 40. In this chapter, we investigate different methods of adjusting for confounding by neighborhood using complex survey data with possibly informative sampling designs.

Recently, the problem of adjusting for confounding in causal inference has generated a large literature. Greenland, Robins, and Pearl [25] provided a historical review of the concept of confounding and discussed the distinction between confounding and collapsibility. They pointed out that confounding is one of many problems that plague studies of cause and effect. Greenland and Robins [24], Greenland and Morgenstern [23], Pearl [48], and Hernan, Brumback, and Robins [29] discussed the problems in control of confounding. Some recent papers study the problem of adjusting for confounding by cluster. Berlin et al. [8] presented an example of adjusting for confounding due to cluster (centers) in a clinical study. Localio, Berlin, and Ten Have [40] used simulations to investigate the performance of several regression methods (conditional regression, random effects models, generalized estimating equations (GEE), survey sampling methods, and fixed effects regression) for adjusting for confounding by cluster (center). They also pointed out and verified that for a large number of clusters and small sample sizes within clusters, the fixed effects estimator is positively biased. Unfortunately, in our literature search, we found no literature that explored the topic of adjusting for confounding due to cluster with complex survey data.

Agresti [1] described ordinary logistic regression and conditional logistic regression for simple cluster sampling designs, where ordinary logistic regression is conducted with a fixed effect for each cluster. However, similarly to Localio et al. [40], Agresti [1] explained that ordinary logistic regression leads to a biased estimator of the individual exposure effect when the number of sampled individuals within clusters stays fixed while the number of sampled clusters tends to infinity. Neyman and Scott [47] explained that this
problem is due to the number of nuisance parameters tending to infinity as the number of sampled clusters increases, which invalidates the theory of standard maximum likelihood estimation and inference. Generalized linear mixed models (GLMMs) with a logit link [1] are an alternative method. Neuhaus and McCulloch [46] showed that GLMMs fail to adequately adjust for confounding by cluster when the number of sampled clusters is small. Neuhaus and Kalbfleisch [44] added cluster average exposure into the model. Berlin et al. [8] and Begg and Parides [7] applied this approach to some real data analyses. For some nonbinary outcomes, Verbeke, Spiessens and Lesaffre [63] developed conditional linear mixed models with longitudinal data. Goetgeluk and Vansteelandt [19] developed conditional generalized estimating equations (CGEE), comparing them to generalized estimating equations (GEE), which were first introduced by Liang and Zeger [36]. But Goetgeluk and Vansteelandt [19] excluded the logit link in their paper.

In this chapter, we examine three categories of approaches (ordinary logistic regression, conditional logistic regression and GLMMs) for adjusting for confounding due to cluster on a binary outcome in the setting of complex survey data. In Section 1.3, we introduce the motivating NHIS data and describe the sampling design setting of the survey. Then we set up the modeling framework in Section 1.4 and explicate the three categories of approach in Section 1.5. In Section 1.6, we study the estimation performance on complex survey data with different sampling designs with respect to each approach. For the ordinary logistic regression approach, we consider two design settings, either including or not including a fixed effect for each cluster. With regard to conditional logistic regression, we generalize the approach using a pseudo-sample method to replicate the observations according to integer-scaled approximations of the sampling weights. We then use GLMMs to adjust for confounding by cluster with simulated complex survey data. We base our GLMM program on the work of Rabe-Hesketh and Skrondal [51], using pseudo-likelihood methods. We then
compare the results of the three categories of approaches with complex survey data in different design settings. In Section 1.7, we apply the methods to the NHIS data to estimate the effect of education on mammography screening, adjusting for confounding by neighborhood. The discussion comes in the last section.

1.3 Motivating NHIS Example

Our research is motivated by the analysis of 2005 NHIS public use data [43] to estimate the effect of individual-level exposures on the receipt of adequate mammography screening (at least one mammogram within the last 2 years). In our study, we consider education, health insurance status, age category, and race/ethnicity as the individual-level exposures, and measured census tract variables (such as poverty and segregation indices, and the high-dimensional categorical variable that is neighborhood itself) as the neighborhood-level predictors. Because of unmeasured confounding by neighborhood, such as differential availability of health services, we want to adjust for confounding by neighborhood itself. Our concern is how we could adjust for confounding of individual-level variables by unmeasured neighborhood factors using the high-dimensional categorical variable. We will focus on the effect of dichotomized education (more than high school versus high school or less) on the receipt of adequate mammography screening, adjusting for confounding by neighborhood. Since we wanted to measure neighborhood locally, we operationalized it as the secondary sampling unit (SSU) of the survey, which consists of small groups of neighboring census blocks.

The NHIS is an annual cross-sectional survey, using face-to-face interviews, that enlists a complex multistage sampling design, described in detail by Botman et al. [10]. It is one of the nation's largest in-person surveys, conducted annually by the National Center for Health Statistics regarding access to health care. In the survey, the U.S. is partitioned into approximately 2000 primary sampling units (PSUs), which are individual counties or groups of adjacent counties. Approximately 350 PSUs are selected into the sample with unequal sampling probability, and these same PSUs are
used for several years in a row. Then the PSUs are each subdivided into 20 substrata based on the census concentration of black and Hispanic persons. The substrata definitions are consistent across PSUs; hence in some PSUs, some of the substrata may be empty. SSUs are sampled from each substratum within each PSU with unequal, varying probability. Then households are sampled within SSUs, and finally, one individual per household is sampled for the cancer component of the survey, on which the mammography questions are asked. We excluded men and women under 40 from our sample; therefore there are approximately 8300 women residing in approximately 4800 SSUs in the final sample, and the number of sampled women per SSU ranged from 1 to 8.

1.4 Modeling Framework

Let $i = 1, \ldots, M$ index all neighborhoods in the population and $j = 1, \ldots, N_i$ index all individuals within neighborhood $i$. Let $Y_{ij}$ be a binary outcome for individual $j$ in neighborhood $i$, and $X_{ij}$ be the corresponding individual-level covariate. Here in the NHIS study, $Y_{ij}$ indicates the receipt of adequate mammography screening and $X_{ij}$ denotes the education level. Then a model for the effect of the individual-level exposure $X_{ij}$ on outcome $Y_{ij}$ that accounts for confounding by unmeasured neighborhood effects $b_i$ is

\[ h(E(Y_{ij} \mid X_{ij}, b_i)) = \beta X_{ij} + b_i, \tag{1-1} \]

where $h$ is the logit link function. Let $X_i = (X_{i1}, \ldots, X_{iN_i})$ and $Y_i = (Y_{i1}, \ldots, Y_{iN_i})$. We have three assumptions:

(1) $Y_{ij} \perp\!\!\!\perp \{X_{ij'}\}_{j' \neq j} \mid b_i$, which means $X_{ij}$ is the only component of $X_i$ that influences the distribution of $Y_{ij}$ given $b_i$;
(2) $(Y_i, X_i, b_i) \perp\!\!\!\perp (Y_{i'}, X_{i'}, b_{i'})$ for $i \neq i'$, which indicates that individuals from different neighborhoods are independent;
(3) $Y_{ij} \perp\!\!\!\perp Y_{ij'} \mid X_i, b_i$ for $j \neq j'$, which means the individual outcomes are independent of each other given all information with respect to neighborhood $i$, including all individual exposures and neighborhood effects.

Let $Y^x_{ij}$ be the potential outcome regarding $X_{ij} = x$. When the neighborhood is the only confounder, we assume that

\[ \{Y^x_{ij}\} \perp\!\!\!\perp X_{ij} \mid b_i, \tag{1-2} \]

where $\{Y^x_{ij}\}$ denotes the set of potential outcomes defined by all possible values of $x$. Besides, we also need a consistency assumption, which assumes that the potential outcomes $Y^x_{ij}$ are well defined and that those for a given individual are independent of the realized exposures for other individuals. Then a causal model can be set up based on the potential outcomes $\{Y^x_{ij}\}$ regarding all possible values of $x$:

\[ h(E(Y^x_{ij} \mid b_i)) = \beta x + b_i, \tag{1-3} \]

where $\beta$ in model (1-3) is identical to $\beta$ in model (1-1).

However, in some conditions, the consistency assumption and causal model (1-3) could be dubious. For example, regarding the NHIS study, if $X_{ij}$ indicates individual education level and $b_i$ represented a neighborhood mean education level (that is, $b_i$ is inextricably linked with $X_i = (X_{i1}, \ldots, X_{iN_i})$), then we cannot manipulate $X_{ij}$ without in turn manipulating $b_i$. In this case, model (1-3) would be nonsensical and the consistency assumption would be violated too. Please refer to Greenland [22] for more related commentary. Instead, one might prefer to assume that $b_i$ is only linked to stationary characteristics of the neighborhood, such as availability of health services, to avoid this violation.

In this chapter, we prefer model (1-1) rather than model (1-3). This is because model (1-1) will not be violated even when a causal interpretation is untenable, such as the cases presented in the last paragraph; one may also want to know the association
level between individual exposures and outcomes that can be accounted for by unmeasured neighborhood-level factors.

1.5 Approaches to Estimation with Complex Survey Data

In this section, we elaborate the estimation approaches (ordinary logistic regression, conditional logistic regression and GLMM) with complex survey data. Suppose the number of sampled neighborhoods is $m$ and the number of sampled individuals within neighborhood $i$ is $n_i$. Let $p_i$ denote the probability of selecting neighborhood $i$ (corresponding to an SSU), and let $p_{j|i}$ denote the probability of selecting individual $j$ within neighborhood $i$, conditional on having selected neighborhood $i$. Then it is easy to see that the unconditional probability of selecting individual $j$ within neighborhood $i$ is $p_{ji} = p_i p_{j|i}$. Correspondingly, define the weights $w_i$, $w_{j|i}$ and $w_{ji}$ as the inverse probabilities, i.e. $1/p_i$, $1/p_{j|i}$ and $1/p_{ji}$.

We use the method of variance estimation based on the so-called design-consistent ultimate clusters [58]. As is common in complex survey data analysis, we approximate the sampling distribution of the estimators as arising from resampling PSUs with replacement within primary strata, ignoring subsequent steps of the multistage design. For the conditional logistic regression method in particular, we approximate the sampling distribution as arising from resampling PSUs with replacement, ignoring the primary strata.

Suppose the population of $M$ neighborhoods and $N_i$ individuals per neighborhood in model (1-1) is conceptualized as a random two-stage sample from the superpopulation [58]. Then model (1-1) holds except for the condition that $M$ and $N_i$ approach infinity. In the next subsections, we present the three estimation approaches (ordinary logistic regression, conditional logistic regression and GLMM) with complex survey data in turn.

1.5.1 Ordinary Logistic Regression for Complex Survey Data

In the past decade, ordinary logistic regression and conditional logistic regression methods have been commonly used in survey data analysis, because many variables
are generally measured as categorical and the two methods can incorporate a large number of explanatory variables [33]. For ordinary logistic regression with complex survey data, we can use SAS PROC SURVEYLOGISTIC to estimate the effect of $X_{ij}$ on the outcomes. With this approach, we would simply regress $Y_{ij}$ on the intercept and $X_{ij}$ using weighted logistic regression (with weights $w_{ji}$) and adjust the standard errors for the multistage design using Taylor-series linearization (see [34], [58]). Another option for ordinary logistic regression is to set the $b_i$ as fixed effects in the logistic regression. However, this approach leads to inconsistent estimators of $\beta$ due to the Neyman-Scott problem [47], which arises when the number of sampled individuals per cluster stays fixed while the number of sampled clusters tends to infinity. In our NHIS study, $n_i$ is fairly small (ranging from 1 to 8), and the number of clusters, $m$, is relatively large compared to $n_i$. Therefore we would not expect this approach to lead to a good estimate for our study.

1.5.2 Conditional Logistic Regression for Complex Survey Data

If the clusters in a survey were sampled with equal probability and individuals within clusters were also sampled with equal probability, conditional logistic regression would have an advantage for adjusting for confounding by cluster. One can implement this procedure using SAS PROC LOGISTIC with the STRATA statement. Conditional logistic regression is based on sufficient statistics for the $b_i$, which are $\sum_{j=1}^{N_i} Y_{ij}$, given that the clusters and the individuals within clusters are sampled with equal probability from the superpopulation. Then we can derive the maximum likelihood estimator of the conditional likelihood. Agresti [1] described properties of conditional logistic regression in his book. Unfortunately, in our literature search, there was no published research regarding the generalization of conditional logistic regression for use with complex survey data. For complex survey data with no equality restriction on sampling probabilities, denote the sufficient statistic for $b_i$ as $S_i$; then, the conditional likelihood for the population is of
the form

\[
\begin{aligned}
L(\beta \mid S_i, i = 1, \ldots, M) &= \prod_{i=1}^{M} \prod_{j=1}^{N_i} P(Y_{ij} \mid X_i, b_i, S_i) \\
&= \prod_{i=1}^{M} P(Y_{i1}, \ldots, Y_{iN_i} \mid X_i, b_i) / P(S_i \mid X_i, b_i) \\
&= \prod_{i=1}^{M} \frac{\prod_{j=1}^{N_i} \exp(\beta X_{ij})^{Y_{ij}}}{\sum_{Y_i \in C_i} \prod_{j=1}^{N_i} \exp(\beta X_{ij})^{Y_{ij}}},
\end{aligned}
\tag{1-4}
\]

where $S_i = \sum_{j=1}^{N_i} Y_{ij}$ and $C_i$ denotes the set of all possible $N_i$-vectors of binary elements $Y_i = (Y_{i1}, \ldots, Y_{iN_i})$ such that $\sum_{j=1}^{N_i} Y_{ij} = S_i$. Due to the complicated presence of the denominator $P(S_i \mid X_i, b_i)$ in (1-4), it seems impossible to find a consistent estimate of the population conditional likelihood using pseudo-likelihood with weights $w_i$ and $w_{j|i}$.

We turn to integer-scaled weights instead of the inverse probability weights $w_i$ and $w_{j|i}$ to construct a pseudosample with pseudo-likelihood. Specifically, we scale $w_i$ and $w_{j|i}$ to be integer-valued, and denote the new integer-scaled weights as $w_i^*$ and $w_{j|i}^*$. In practice, one can scale the weights and convert them to integers by dividing all of the weights by the smallest weight, multiplying by a moderate number, and then rounding all of the weights to the nearest integer. The moderate number needs to be small enough so that the estimation procedure concludes in a reasonable amount of time and without exceeding memory limitations. Then the pseudosample is constructed by replicating observations according to the weights, i.e. replicating individual $j$ within neighborhood $i$ $w_{j|i}^*$ times and then replicating neighborhood $i$ $w_i^*$ times. Then, the sampling distribution of the estimator could be approximated using a bootstrap that resamples the PSUs with replacement, ignoring primary strata.

This approach may work better than ignoring the complex sampling design and applying conditional logistic regression to the unweighted data. However, the approach fails to yield an unbiased estimator when the weights $w_{j|i}^* = k$ tend to be large. He and Brumback [28] showed that when $k$ approaches infinity, based on the pseudosample,
theconditionalmaximumlikelihoodestimatorisequivalenttotheordinarymaximumlikelihoodestimator.AproofanddetailsoftheequivalencewillbeprovidedinChapter2.DuetoNeyman-Scottproblem[ 47 ],theestimatoroftheconditionallogisticregressionwouldbebiasedwhenitapproximatestheordinarylogisticregressionestimatorwithaxedeffectforeachcluster. 1.5.3GLMMRegressionforComplexSurveyDataTherearetwostandardsolutionstotheNeyman-Scottproblem[ 47 ].Oneisconditionallogisticregressioninthecontextofmodel( 1 )forsimpleclustersampling(substitutingmforMandniforNi).TheotheristheGLMMregression,specicallyislogisticregressionwithrandomintercepts(onepercluster).ToimplementtheGLMMregression,weneedtospecifyadistributionofthebi.Thestandardandmostcommonlyusedspecicationisi.i.d.N(0,2b).Neuhaus,KalbeischandHauck[ 45 ]statedonedisadvantageoftheGLMMthatthechoiceofdistributionforthebicaneffecttheconsistencyoftheGLMMestimatorof.Regardingourresearchoncomplexsurveydatawithconfoundingbyneighborhoodeffect,anotherdisadvantageoftheGLMMisthatitpresumesthatbiisindependentofXij,whileitisclearlyviolatedwhenthereisconfoundingbyneighborhood.TodescribethesecondweaknessoftheGLMM,weassumemodel( 1 )holdsforthepopulation,weaddanothermodeloftheformwithrespecttobi bi=E(bijXi)+0i,(1)whereXi=(Xi1,,XiNi),0ifollowsi.i.d.N(0,2),and0iisassumedtobeindependentofXi.Moreoverweassumethat E(bijXi)=q(Xi; ),(1)whereq(Xi; )isalinearfunctionof and isnite-dimensional.Itfollowsthat E(YijjXi,0i)=h)]TJ /F9 7.97 Tf 6.59 0 Td[(1(Xij+q(Xi; )+0i),(1) 21

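where the distribution of the $\epsilon'_i$ conditional on $X_i$ is i.i.d. $N(0,\sigma^2)$. Furthermore, the $\beta$ of (1) has the same interpretation as the $\beta$ in model (1), owing to the assumption that model (1) holds. We have therefore constructed a GLMM for the population with a random effect $\epsilon'_i$ that is independent of $X_{ij}$; hence it no longer matters that $b_i$ is associated with $X_{ij}$. The approach of Neuhaus and Kalbfleisch [44] can be derived as a special case of model (1), where $\alpha=(\alpha_0,\alpha_1)^T$, $q(X_i;\alpha)=\alpha_0+\alpha_1\bar X_i$ and $\bar X_i=(1/N_i)\sum_{j=1}^{N_i}X_{ij}$. This leads to the population GLMM

\[
E(Y_{ij}\mid X_i,\epsilon_i) = \operatorname{expit}(\alpha_0+\alpha_1\bar X_i + X_{ij}\beta + \sigma\epsilon_i), \tag{1}
\]

where $\operatorname{expit}(x)=\exp(x)/(1+\exp(x))$ and $\epsilon_i=\epsilon'_i/\sigma$ has a standard normal distribution given $X_i$, i.e. i.i.d. $N(0,1)$. Following Grilli and Pratesi [26], we define $\theta=(\alpha_0,\alpha_1,\beta,\sigma)^T$. The logarithm of the likelihood corresponding to (1) is then

\[
\log L(\theta) = \sum_{i=1}^{M}\log\int_{-\infty}^{\infty}\left[\exp\Big\{\sum_{j=1}^{N_i}\log L_{ij}(\theta;\epsilon)\Big\}\right]\phi(\epsilon)\,d\epsilon. \tag{1}
\]

Assuming that the number of sampled clusters and the number of sampled individuals within clusters tend to infinity, the pseudo-likelihood corresponding to the population likelihood (1) is

\[
\log\hat L(\theta) = \sum_{i=1}^{m}w_i\log\int_{-\infty}^{\infty}\left[\exp\Big\{\sum_{j=1}^{n_i}w_{j|i}\log L^{w}_{ij}(\theta;\epsilon)\Big\}\right]\phi(\epsilon)\,d\epsilon, \tag{1}
\]

where $L^{w}_{ij}(\theta;\epsilon)$ is identical to $L_{ij}(\theta;\epsilon)$ but with $\bar X^W_i=\sum_{j=1}^{n_i}w_{j|i}X_{ij}\big/\sum_{j=1}^{n_i}w_{j|i}$ substituted for $\bar X_i$. As explained in Grilli and Pratesi [26] for the special case $\alpha_1=0$, and in Pfeffermann et al. [49] for the case of the identity link function and $\alpha_1=0$, the maximum pseudo-likelihood estimator $\hat\theta_{MPL}$ approaches the finite-population maximum likelihood estimator $\hat\theta$ as $m$ and $n_i$ approach $M$ and $N_i$, respectively. Because $\hat\theta$ is consistent for the super-population parameter $\theta$ as $M$ goes to infinity, we can speculate that $\hat\theta_{MPL}$ is a good estimator as well.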
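To make the weighted pseudo-log-likelihood above concrete, the following is a minimal Python sketch, not the GLLAMM implementation. It assumes a scalar exposure, approximates the integral over the standard normal random effect with a plain trapezoid rule (GLLAMM uses adaptive quadrature), and uses hypothetical names (`pseudo_loglik`, `weighted_cluster_mean`, and the dictionary keys `w`, `w_ji`, `x`, `y`) chosen only for illustration.

```python
import math


def log_expit(x):
    """log(expit(x)) computed without overflow."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_cluster_mean(x, w_ji):
    """Weighted within-cluster mean of the exposure (X-bar^W_i in the text)."""
    return sum(w * xj for w, xj in zip(w_ji, x)) / sum(w_ji)


def pseudo_loglik(theta, clusters, half_width=8.0, n_grid=1601):
    """Weighted pseudo-log-likelihood of the random-intercept logistic model.

    theta    = (alpha0, alpha1, beta, sigma)
    clusters = list of dicts with keys 'w' (cluster weight w_i), 'w_ji'
               (within-cluster weights), 'x' and 'y' (lists of 0/1 values).
    The integral over epsilon ~ N(0, 1) is approximated by the trapezoid
    rule on [-half_width, half_width]; this is an illustrative assumption.
    """
    alpha0, alpha1, beta, sigma = theta
    step = 2.0 * half_width / (n_grid - 1)
    grid = [-half_width + t * step for t in range(n_grid)]
    total = 0.0
    for c in clusters:
        xbar_w = weighted_cluster_mean(c["x"], c["w_ji"])
        integrand = []
        for eps in grid:
            s = 0.0
            for w, xj, yj in zip(c["w_ji"], c["x"], c["y"]):
                eta = alpha0 + alpha1 * xbar_w + beta * xj + sigma * eps
                # weighted log contribution of one individual
                s += w * (log_expit(eta) if yj == 1 else log_expit(-eta))
            phi = math.exp(-0.5 * eps * eps) / math.sqrt(2.0 * math.pi)
            integrand.append(math.exp(s) * phi)
        integral = step * (sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))
        total += c["w"] * math.log(integral)
    return total
```

Maximizing `pseudo_loglik` over `theta` with a generic optimizer would give a rough analogue of the maximum pseudo-likelihood estimator; the fixed grid is a simplification that trades accuracy for transparency.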

One obstacle to applying maximum pseudo-likelihood estimators in the GLMM to the NHIS study is that the cluster sizes in the survey data are quite small. It is not obvious that the maximum pseudo-likelihood estimator $\hat\theta_{MPL}$ is consistent for the true values, since $n_i$ is not close to $N_i$ and the weights $w_{j|i}$ could be quite unbalanced within clusters. For a finite population, the pseudo-likelihood yields a biased estimator for two reasons: it involves the $w_{j|i}$, and $\bar X^W_i$ estimates $\bar X_i$ with measurement error. Regarding the term involving the $w_{j|i}$, there has been much discussion among statisticians about how to scale the $w_{j|i}$ weights so that the procedure performs well for small cluster sizes (see [51], [26], [49]). One of the most intuitive suggestions is to scale the $w_{j|i}$ so that their sum over $j$ equals $n_i$, the sample size for the cluster (see [49], [41]). However, there has been no discussion of the implications of substituting $\bar X^W_i$ for $\bar X_i$, except that Grilli and Rampichini [27] discussed one solution to this problem for the case in which the $w_{j|i}$ are all equal to one and $n_i$ is much less than $N_i$. Their results suggest that, under that condition, the consistency of the estimator of $\alpha_1$ is affected, but the consistency of the estimator of $\beta$ is not much affected. The GLLAMM software of Rabe-Hesketh and Skrondal [51] can be used to maximize the pseudo-likelihood (1) and to derive a 95% confidence interval for $\beta$.

One might be interested in a standardized population-average effect for a hypothetical population in which the distribution of $b_i$ is the same for each possible value of $X_{ij}$. Take the distribution of $b_i$ to be its distribution in the entire population, and denote it by $F(b)$. The standardized population-average effect is then

\[
\beta_p = \operatorname{logit}\left(\int\operatorname{logit}^{-1}(\beta+b)\,dF(b)\right) - \operatorname{logit}\left(\int\operatorname{logit}^{-1}(b)\,dF(b)\right). \tag{1}
\]
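The two computations just described can be sketched in a few lines of Python. The sketch assumes that $F$ is represented by a (possibly weighted) empirical distribution of cluster effects; the function names `scale_to_cluster_size` and `population_average_effect` are hypothetical and chosen for illustration only.

```python
import math


def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(p):
    return math.log(p / (1.0 - p))


def scale_to_cluster_size(w_ji):
    """Rescale within-cluster weights so that they sum to n_i."""
    n_i = len(w_ji)
    total = sum(w_ji)
    return [n_i * w / total for w in w_ji]


def population_average_effect(beta, b_values, b_weights=None):
    """beta_p, with F taken to be the weighted empirical distribution
    of b_values (an assumption made for this illustration)."""
    if b_weights is None:
        b_weights = [1.0] * len(b_values)
    tot = sum(b_weights)
    p1 = sum(w * expit(beta + b) for w, b in zip(b_weights, b_values)) / tot
    p0 = sum(w * expit(b) for w, b in zip(b_weights, b_values)) / tot
    return logit(p1) - logit(p0)
```

With a degenerate $F$ (a single value of $b$) the function returns $\beta$ itself; with the spread-out values `[-3.0, 0.0, 3.0]` and `beta = 0.5` it returns a value between 0 and 0.5, which illustrates how averaging over cluster effects can move the population-average effect away from the cluster-specific one.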

We can then use the predicted random effects $\hat\epsilon_i$ and $\hat\theta_{MPL}$ to estimate the standardized population-average effect according to (1):

\[
\hat\beta_p = \operatorname{logit}\left(\sum_{i=1}^{m}\frac{w_i}{\sum_{i=1}^{m}w_i}\left[\frac{\sum_{j=1}^{n_i}w_{j|i}\operatorname{expit}(\hat\beta+\hat\alpha_0+\bar X^W_i\hat\alpha_1+\hat\sigma\hat\epsilon_i)}{\sum_{j=1}^{n_i}w_{j|i}}\right]\right) - \operatorname{logit}\left(\sum_{i=1}^{m}\frac{w_i}{\sum_{i=1}^{m}w_i}\left[\frac{\sum_{j=1}^{n_i}w_{j|i}\operatorname{expit}(\hat\alpha_0+\bar X^W_i\hat\alpha_1+\hat\sigma\hat\epsilon_i)}{\sum_{j=1}^{n_i}w_{j|i}}\right]\right). \tag{1}
\]

1.6 Simulation Study

As stated in the previous sections, none of the three categories of approaches performs well under all conditions. All of the proposed methods have limitations, and it is not theoretically clear which method performs better for a given application. We therefore use several simulations to compare the methods further. We simulated complex survey data using (a) moderately biased sampling, (b) strongly biased sampling, and (c) unbiased sampling with respect to our estimand of interest. Set $M=m=1000$ and $N_i=1000$ for each cluster $i$. Let the $X_{ij}$ be conditionally independent Bernoulli random variables with probability $\operatorname{expit}(\eta_i)$, where the $\eta_i$ are i.i.d. $N(0,1)$. Set $b_i=-5\bar X_i+\epsilon_i$, where the $\epsilon_i$ are i.i.d. $N(0,1)$. We then generated the binary outcome $Y_{ij}$ according to model (1), with our target of estimation $\beta$ set equal to 0.5. In this way we simulated a population with $M=1000$ clusters.

Next, we sampled from the population following designs (a), (b) and (c), respectively. First, we defined a binary concordance variable $C$ and set its value according to whether or not $Y_{ij}=X_{ij}$. That is, we divided the individual observations into concordant ($C=1$) and discordant ($C=0$) groups. We then generated for each observation a Bernoulli random variable $B$, whose distribution determines the level of sampling bias. For (a), we set the probability that $B=1$ equal to 0.6 if $C=1$ and 0.4 if $C=0$. For (b), we simply let $B=C$. For (c), unbiased sampling, we let $B$ be an

i.i.d. Bernoulli variable with probability 0.5. Finally, we included observations in the sample independently with probability 0.002 if $B=1$ and 0.004 if $B=0$. All $m=1000$ neighborhoods were included in the sample, and $n_i$ varied across clusters because of the sampling scheme. In accordance with the sampling scheme, sampled individuals were given weights $w_i=1$, $w_{j|i}=2$ if $B=1$, and $w_{j|i}=1$ if $B=0$. Each simulation was repeated 100 times for each method of estimating $\beta$, whose true value is 0.5. We describe how we carried out the estimation procedures in software in Section 1.6.1, Section 1.6.2 and Section 1.6.3, respectively, and we present the estimation results in Section 1.6.4.

1.6.1 Ordinary Logistic Regression for Complex Survey Data

We applied weighted ordinary logistic regression to the three categories of sampled data, with and without a fixed effect for each cluster. We used SAS PROC LOGISTIC (version 9.2) to estimate $\beta$. The data were simulated using SAS software.

1.6.2 Conditional Logistic Regression for Complex Survey Data

We applied conditional logistic regression to the simulated data based on a pseudo-sample. We used three different methods of constructing the pseudo-sample. First, we let the pseudo-sample equal the sample, i.e. all of the weights equal 1 (pseudosample 1). Second, we constructed the pseudo-sample with weights $w_{j|i}=2$ for $B=1$ and $w_{j|i}=1$ for $B=0$ (pseudosample 2). Third, we constructed the pseudo-sample with weights $w_{j|i}=20$ for $B=1$ and $w_{j|i}=10$ for $B=0$ (pseudosample 3). Comparing the estimators from the second and third methods shows the effect of scaling the weights to smaller versus larger values. We used SAS PROC LOGISTIC (version 9.2) with the STRATA statement to estimate $\beta$. The data were simulated using SAS software.
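The sampling scheme of Section 1.6 and the pseudo-samples of Section 1.6.2 can be sketched in Python as follows. This is an illustration only: the original data were simulated in SAS and Stata, and the function names (`simulate_sample`, `integer_scale`, `build_pseudosample`), the record layout, and the use of Python's `random` module are assumptions made here. Inverse-probability weights are rescaled by dividing by the smallest weight, multiplying by a chosen number, and rounding, as described in Section 1.5.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_sample(design, n_clusters=1000, cluster_size=1000, beta=0.5,
                    p_b1=0.002, p_b0=0.004, seed=0):
    """Simulate one population and draw a sample under design 'a'
    (moderately biased), 'b' (strongly biased) or 'c' (unbiased)."""
    rng = random.Random(seed)
    sample = []
    for i in range(n_clusters):
        eta = rng.gauss(0.0, 1.0)
        x = [1 if rng.random() < expit(eta) else 0 for _ in range(cluster_size)]
        b_i = -5.0 * sum(x) / cluster_size + rng.gauss(0.0, 1.0)
        for xj in x:
            yj = 1 if rng.random() < expit(beta * xj + b_i) else 0
            concordant = 1 if yj == xj else 0
            if design == "a":
                b_flag = 1 if rng.random() < (0.6 if concordant else 0.4) else 0
            elif design == "b":
                b_flag = concordant
            else:
                b_flag = 1 if rng.random() < 0.5 else 0
            p_incl = p_b1 if b_flag else p_b0
            if rng.random() < p_incl:
                # inverse-probability weight of the sampled individual
                sample.append({"cluster": i, "x": xj, "y": yj,
                               "B": b_flag, "w": 1.0 / p_incl})
    return sample


def integer_scale(weights, multiplier=1):
    """Divide by the smallest weight, multiply, round to nearest integer."""
    smallest = min(weights)
    return [int(math.floor(w / smallest * multiplier + 0.5)) for w in weights]


def build_pseudosample(sample, multiplier=1):
    """Replicate each record according to its integer-scaled weight."""
    scaled = integer_scale([rec["w"] for rec in sample], multiplier)
    pseudo = []
    for rec, k in zip(sample, scaled):
        pseudo.extend([rec] * k)
    return pseudo
```

Under these assumptions, `build_pseudosample(sample, 1)` corresponds to pseudosample 2 (weights 2 and 1) and `build_pseudosample(sample, 10)` to pseudosample 3 (weights 20 and 10), while the unreplicated `sample` plays the role of pseudosample 1.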

1.6.3 GLMM Regression for Complex Survey Data

We applied GLMM regression with and without weights to the simulated data. Here we scaled the weights so that their sum over $j$ equals $n_i$, as mentioned in Section 1.5.3 (see [49], [41]). We treated the distribution of the $b_i$ as Gaussian, and fitted both the naive model (1) and the population GLMM (1) (which adjusts for confounding) based on its pseudo-likelihood approximation (1). In addition, to explore the effect of the measurement error in $\bar X^W_i$ as an estimate of $\bar X_i$ discussed earlier, we also fitted model (1) based on the pseudo-likelihood approximation (1) but without the measurement error. This last approach would be impossible in practice unless $\bar X_i$ were observed, but it provides useful information on the effect of measurement error in this simulation study. The GLLAMM macro with 20 adaptive quadrature points and the pweight option in Stata version 10 was used for the estimation. The data were simulated using Stata.

1.6.4 Results

The results for the moderately biased sampling scheme are presented in Table 1-1. Comparing the relative biases of all eight methods, conditional logistic regression with the smaller replication weights performs best (method 4, mean $\hat\beta=0.46$; true $\beta=0.5$). Both ordinary logistic regression without a cluster effect (method 1, mean $\hat\beta=-0.29$) and the naive GLMM approach, which assumes $\alpha_1=0$ and does not account for confounding (method 6, mean $\hat\beta=-0.20$), perform disastrously: the estimated effect has the wrong sign. The other two GLMM methods, which correctly adjust for confounding (method 7, mean $\hat\beta=0.36$, and method 8, mean $\hat\beta=0.40$), give similar and nearly correct results. Ordinary logistic regression with a cluster effect (method 2, mean $\hat\beta=0.62$) gives a better result than method 1. Comparing method 4 with conditional logistic regression using larger scaled weights (method 5, mean $\hat\beta=0.59$) indicates that method 5 uses excessive replication and that the scale of the weights indeed

has an impact on the estimation procedure. The effect of measurement error in the GLMM procedure is not obvious when comparing the results of methods 7 and 8.

The results for the strongly biased sampling scheme are presented in Table 1-2. All eight methods perform poorly in estimating the true $\beta$. Ordinary logistic regression without adjustment for the cluster effect (method 1, mean $\hat\beta=-0.29$), method 3 (mean $\hat\beta=-0.92$), and all of the GLMM methods (method 6, mean $\hat\beta=-0.80$; method 7, mean $\hat\beta=-0.27$; and method 8, mean $\hat\beta=-0.14$) estimate an effect in the wrong direction. The results of method 2 (mean $\hat\beta=0.14$), method 4 (mean $\hat\beta=0.10$), and method 5 (mean $\hat\beta=0.12$) are quite similar, indicating that the Neyman-Scott problem [47] is not the main problem in this simulation setting. Comparing methods 3, 4 and 5, we clearly observe that failing to weight the sample at all can lead to substantial bias in conditional logistic regression, whereas excessive replication has little visible effect on the estimation procedure. We also observe that the conditional logistic regression methods that incorporate the weighting perform much better than all three GLMM methods. Comparing method 7 and method 8, we find that including measurement error does lead to additional bias, which could be owing to the poor performance of the GLMM methods.

To investigate the failure of the GLMM methods further, we conducted a simulation without confounding; i.e. we set $b_i=-0.5+\epsilon_i$, so that $b_i$ is independent of $X_i$. We then used the `naive' method, which is the appropriate GLMM method among the eight methods for this new simulation setting. The mean estimate of $\beta$ was -0.18, with a standard deviation of 0.11, over 100 iterations of the simulation. This is disastrous performance given that the true value is 0.5. It implies that the methodology of Rabe-Hesketh and Skrondal [51] does not work well when the sampling bias is strong.

The results for the unbiased sampling scheme are presented in Table 1-3. As in Table 1-1 and Table 1-2, we find that method 1 (mean $\hat\beta=-0.28$) and method 6 (mean $\hat\beta=-0.04$) fail to identify the correct direction of the effect, owing to failing to

Table 1-1. Results of the moderately biased sampling simulation

Category                         Method                                       Mean    SD     Range            Relative bias
Ordinary logistic regression     (1) Without cluster effect                   -0.29   0.12   (-0.61, -0.05)   1.58
                                 (2) With cluster effect                       0.62   0.22   (-0.01, 1.11)    0.24
Conditional logistic regression  (3) Pseudosample 1                            0.21   0.16   (-0.22, 0.78)    0.58
                                 (4) Pseudosample 2                            0.46   0.18   (-0.05, 1.05)    0.08
                                 (5) Pseudosample 3                            0.59   0.20   (-0.04, 1.05)    0.18
GLMM                             (6) Naive                                    -0.20   0.13   (-0.45, 0.06)    1.40
                                 (7) Adjusted                                  0.36   0.16   (0.04, 0.80)     0.28
                                 (8) Adjusted without measurement error        0.40   0.13   (0.06, 0.68)     0.20

Mean, SD, and range are based on 100 simulations. Different simulated datasets are used for each result, except for the pairs (pseudosample 1, pseudosample 2) and (naive, adjusted). True value of estimand is 0.50. Relative bias is calculated as the absolute value of (mean - 0.5)/0.5.

Table 1-2. Results of the strongly biased sampling simulation

Category                         Method                                       Mean    SD     Range            Relative bias
Ordinary logistic regression     (1) Without cluster effect                   -0.29   0.11   (-0.61, 0.11)    1.58
                                 (2) With cluster effect                       0.14   0.22   (-0.40, 0.70)    0.72
Conditional logistic regression  (3) Pseudosample 1                           -0.92   0.14   (-1.19, -0.58)   2.84
                                 (4) Pseudosample 2                            0.10   0.15   (-0.22, 0.47)    0.80
                                 (5) Pseudosample 3                            0.12   0.22   (-0.36, 0.86)    0.76
GLMM                             (6) Naive                                    -0.80   0.15   (-1.13, 0.44)    2.60
                                 (7) Adjusted                                 -0.27   0.18   (-0.62, 0.16)    1.54
                                 (8) Adjusted without measurement error       -0.14   0.16   (-0.53, 0.19)    1.24

Mean, SD, and range are based on 100 simulations. Different simulated datasets are used for each result, except for the pairs (pseudosample 1, pseudosample 2) and (naive, adjusted). True value of estimand is 0.50. Relative bias is calculated as the absolute value of (mean - 0.5)/0.5.

adjust or to properly adjust for the neighborhood effect. In addition, the failure of method 6 highlights the problem with standard applications of GLMMs to adjust for confounding by neighborhood, which all assume that $E(b_i\mid X_i)=\alpha_0$. Comparing method 3 (no replication; mean $\hat\beta=0.51$), method 4 (few replications; mean $\hat\beta=0.59$), and method 5 (excessive replication; mean $\hat\beta=0.71$) shows the adverse effect of replicating observations on conditional logistic regression, which suggests that no weighting is needed for conditional logistic regression when the sampling is unbiased. Finally, comparing method 7 (mean $\hat\beta=0.54$) and method 8 (mean $\hat\beta=0.53$), we conclude that, for the unbiased sampling scheme, the effect of measurement error on the performance of the GLMM methods is negligible.

1.7 Application to NHIS Data

Table 1-4 presents the results of applying the methods of Section 1.5 to the NHIS data to estimate the effect of education on receipt of adequate mammography screening, adjusting for confounding by neighborhood. Estimated odds ratios and 95% confidence intervals are presented for five methods: ordinary weighted logistic regression without a cluster effect; conditional logistic regression with replication; GLMM with the naive model $E(b_i\mid X_i)=\alpha_0$; GLMM with the model for confounding adjustment, $E(b_i\mid X_i)=\alpha_0+\alpha_1\bar X^W_i$; and the GLMM estimator of the population-averaged effect, $\beta_p$, computed according to equation (1). All five estimators show the same direction of the effect of education on the outcome. The ordinary logistic regression without a cluster effect (method 1, OR = 1.62, 95% CI (1.45, 1.81)) and the GLMM regression with the naive model (method 3, OR = 1.66, 95% CI (1.48, 1.86)) give similar results. Both conditional logistic regression with replication (method 2, OR = 1.44, 95% CI (1.15, 1.79)) and the adjusted GLMM regression (method 4, OR = 1.24, 95% CI (1.05, 1.46)) show a reduction in the effect of education after adjusting for confounding by neighborhood, which is reasonable. The GLMM estimate of the population-averaged effect (method 5, OR = 1.18, 95% CI (1.04,

Table 1-3. Results of the unbiased sampling simulation

Category                         Method                                       Mean    SD     Range            Relative bias
Ordinary logistic regression     (1) Without cluster effect                   -0.28   0.12   (-0.54, 0.14)    1.56
                                 (2) With cluster effect                       0.66   0.21   (-0.07, 1.15)    0.32
Conditional logistic regression  (3) Pseudosample 1                            0.51   0.16   (0.13, 0.99)     0.02
                                 (4) Pseudosample 2                            0.59   0.19   (0.19, 1.12)     0.18
                                 (5) Pseudosample 3                            0.71   0.22   (0.23, 1.19)     0.42
GLMM                             (6) Naive                                    -0.04   0.13   (-0.33, 0.35)    1.08
                                 (7) Adjusted                                  0.54   0.16   (0.21, 0.91)     0.08
                                 (8) Adjusted without measurement error        0.53   0.13   (0.27, 0.84)     0.06

Mean, SD, and range are based on 100 simulations. Different simulated datasets are used for each result, except for the pairs (pseudosample 1, pseudosample 2) and (naive, adjusted). True value of estimand is 0.50. Relative bias is calculated as the absolute value of (mean - 0.5)/0.5.

1.35)) is quite different from the estimate of the population-averaged effect in method 1, which does not adjust for confounding by neighborhood.

Table 1-4. Results of analyzing the NHIS data

Method                                                       OR estimate   95% CI
(1) Ordinary logistic regression without cluster effect      1.62          (1.45, 1.81)
(2) Conditional logistic regression with replication         1.44          (1.15, 1.79)
(3) GLMM naive                                               1.66          (1.48, 1.86)
(4) GLMM adjusted                                            1.24          (1.05, 1.46)
(5) GLMM population-averaged effect, adjusted                1.18          (1.04, 1.35)

1.8 Discussion

We have concentrated on the problem of adjusting for confounding by cluster in the context of complex survey data with a binary outcome. We have investigated three categories of approaches in terms of theory, simulation, and practice: ordinary logistic regression for complex survey data, with either no effect or a fixed effect for each cluster; conditional logistic regression extended for complex survey data; and generalized linear mixed model (GLMM) regression for complex survey data. The theory indicates that all of the methods considered have limitations. Our simulation study indicates that all of the methods perform poorly when the sampling bias is strong, and that some of the methods yield a good estimator when there is no sampling bias or only moderate sampling bias. This motivates us to look for another method that works properly with strongly biased sampling designs.
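As a small worked check of the relative-bias column used in the simulation tables, the following Python snippet applies the formula from the table footnotes, the absolute value of (mean - 0.5)/0.5, to the means reported in Table 1-1. The dictionary name `table_1_1_means` is only an illustrative label.

```python
TRUE_BETA = 0.5


def relative_bias(mean_estimate, truth=TRUE_BETA):
    """Absolute value of (mean - truth) / truth, as in the table footnotes."""
    return abs(mean_estimate - truth) / truth


# Means of the eight methods as reported in Table 1-1
table_1_1_means = {1: -0.29, 2: 0.62, 3: 0.21, 4: 0.46,
                   5: 0.59, 6: -0.20, 7: 0.36, 8: 0.40}

for method, mean in sorted(table_1_1_means.items()):
    print(method, round(relative_bias(mean), 2))
```

A relative bias above 1 arises here only when the mean estimate has the opposite sign to the true value of 0.5, which is why methods 1 and 6 stand out in every table.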

CHAPTER 2
AN EQUIVALENCE OF CONDITIONAL AND UNCONDITIONAL MAXIMUM LIKELIHOOD ESTIMATORS VIA INFINITE REPLICATION OF OBSERVATIONS

2.1 Outline

In this chapter, we prove that, for logistic regression with a matched-pairs design and a binary outcome, after infinitely replicating the observations the conditional maximum likelihood estimator (CMLE) is equivalent to the unconditional maximum likelihood estimator (MLE) with a fixed effect for each pair based on the original sample. We also show that the CMLE for the pseudosample that replicates the original sample $k$ times falls in a range determined by the MLE for the original sample. It is known that for binary matched-pairs data with a single predictor, the unconditional logistic regression estimator is biased. Therefore, applying conditional likelihood methods to a pseudosample with observations replicated a large number of times can lead to a biased estimator. This result casts doubt on one possible approach to conditional logistic regression with complex survey data.

2.2 Introduction

In Chapter 1, motivated by an application with complex survey data, we compared several methods, including conditional logistic regression and unconditional logistic regression, using simulated complex survey data [14]. The results show that, for conditional logistic regression, replicating observations more times (i.e. larger weights) yields a different estimator from replicating observations fewer times (i.e. smaller weights). In addition, as the number of times that observations are replicated increases to infinity, we conjecture an asymptotic equivalence between the conditional and unconditional logistic regression methods. Therefore, applying conditional likelihood methods to a pseudosample with observations replicated a large number of times can lead to a biased estimator. One consequence is that a possible approach to applying conditional likelihood methods to complex survey data is prone to bias. Specifically, applying conditional likelihood methods to a pseudosample

constructed by replicating observations according to integer-scaled versions of complex sampling weights can lead to a biased estimator when the unconditional maximum likelihood methods are biased.

In the logistic regression context, our modelling framework is as follows. Let $i$ index clusters and let $j$ index individuals within cluster in the population. To estimate the effect of a $p$-dimensional individual-level exposure $X_{ij}$ on a binary outcome $Y_{ij}$, one population-level model that adjusts for confounding by cluster is given by

\[
\operatorname{logit}\big(E(Y_{ij}\mid X_{ij},b_i)\big) = X_{ij}^T\beta + b_i, \tag{2}
\]

where $\beta$ is the effect of interest and all outcomes are assumed independent of one another given the cluster effects, $b_i$. With ordinary cluster sampling, $m$ clusters and $n_i$ individuals within each cluster $i$ are selected at random from a conceptually infinite population of clusters and individuals within cluster. Neyman and Scott [47] pointed out that the asymptotic theory for unconditional maximum likelihood estimation is invalid for estimating $\beta$ and the collection of $b_i$ when the $n_i$ remain small while $m$ approaches infinity. Andersen [2] and Agresti [1] point out that, for the one-dimensional covariate version of model (2), the unconditional maximum likelihood estimator (MLE) of $\beta$ is inconsistent in that setting, and especially so for the simple matched pairs design in which $n_i=2$, $X_{i1}=0$ and $X_{i2}=1$. For that design, let $n_{12}$ be the number of pairs with $Y_{i1}=1$ and $Y_{i2}=0$, and $n_{21}$ be the number of pairs with $Y_{i1}=0$ and $Y_{i2}=1$. Then the MLE of $\beta$ is $2\log(n_{21}/n_{12})$, whereas the conditional maximum likelihood estimator (CMLE), which is consistent, is $\log(n_{21}/n_{12})$. Thus, the MLE is biased by a factor of two for the simple matched pairs design.

For more general designs employing ordinary cluster sampling, denote the contribution to the likelihood corresponding to (2) from individual $j$ of cluster $i$ as $L_{ij}(\beta,b_i)=\exp(X_{ij}^T\beta+b_i)^{Y_{ij}}/(1+\exp(X_{ij}^T\beta+b_i))$, and let $L_i(\beta,b_i)=\prod_{j=1}^{n_i}L_{ij}(\beta,b_i)$, $b=(b_1,\ldots,b_m)$, and $L(\beta,b)=\prod_{i=1}^{m}L_i(\beta,b_i)$. Let $\theta=(\beta,b)$ and let $l(\theta)$ and $l_{ij}(\theta)$ denote the natural

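logarithms of $L(\beta,b)$ and $L_{ij}(\beta,b_i)$, respectively. The unconditional maximum likelihood estimator of $\theta$ solves

\[
\frac{\partial l(\theta)}{\partial\theta} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\frac{\partial l_{ij}(\theta)}{\partial\theta} = 0.
\]

With complex survey data, denote the reciprocal of the probability of selecting cluster $i$ into the sample as $w_i$, and that of the conditional probability of selecting individual $j$ from cluster $i$, given selection of cluster $i$, as $w_{j|i}$. Let $w_{ij}=w_iw_{j|i}$. The maximum pseudolikelihood estimator (MPLE, [58]) for $\theta$ is defined as the solution to

\[
\sum_{i=1}^{m}\sum_{j=1}^{n_i}w_{ij}\frac{\partial l_{ij}(\theta)}{\partial\theta} = 0.
\]

If the weights $w_{ij}$ were scaled to be integers, the MPLE would equal the MLE based on a pseudosample in which observation $ij$ is replicated $w_{ij}$ times. Unfortunately, the MPLE inherits inconsistencies from the MLE for settings in which $w_{ij}$ is constant. Thus, it is not a method that will generally deliver consistent estimators of $\beta$.

Generalizing conditional maximum likelihood estimation for use with complex survey data is difficult (see [14], [20] and [15]), because the pseudolikelihood approach cannot be directly applied with a conditional likelihood. To understand why, denote the sufficient statistic for $b_i$ as $S_i=\sum_{j=1}^{n_i}Y_{ij}$, and let $V_i$ denote the set of all possible $n_i$-vectors of binary elements $Y_i$ such that $\sum_{j=1}^{n_i}Y_{ij}=S_i$. Then the conditional likelihood corresponding to (2) is of the form

\[
L(\beta\mid S_i, i=1,\ldots,m) = \prod_{i=1}^{m}\prod_{j=1}^{n_i}P(Y_{ij}\mid X_i,b_i,S_i) = \prod_{i=1}^{m}\frac{P(Y_{i1},\ldots,Y_{in_i}\mid X_i,b_i)}{P(S_i\mid X_i,b_i)} = \prod_{i=1}^{m}\frac{\prod_{j=1}^{n_i}\exp(X_{ij}^T\beta)^{Y_{ij}}}{\sum_{Y_i\in V_i}\prod_{j=1}^{n_i}\exp(X_{ij}^T\beta)^{Y_{ij}}}. \tag{2}
\]

The $b_i$ drop out of $L(\beta\mid S_i, i=1,\ldots,m)$ owing to conditioning on the $S_i$. Because of the denominator term $P(S_i\mid X_i,b_i)$, the pseudolikelihood approach, in which contributions to the log-likelihood are weighted by the $w_{j|i}$ and $w_i$, is not possible. Nonetheless, it is possible to use integer-scaled weights to construct a pseudosample and then maximize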

(2) for the pseudosample. Specifically, scale the weights $w_{ij}$ to be integer-valued; denote these scaled weights as $w^*_{ij}$. In practice, one can scale the weights and convert them to integers by dividing all of the weights by the smallest weight, multiplying by a moderate number, and then rounding all of the weights to the nearest integer. The moderate number needs to be small enough that the estimation procedure concludes in a reasonable amount of time and without exceeding memory limitations. Then, the pseudosample is constructed by replicating individual $j$ within neighbourhood $i$ $w^*_{ij}$ times. The likelihood at (2) becomes

\[
\frac{\prod_{j=1}^{n_i}\prod_{k=1}^{w^*_{ij}}\exp(X_{ij}^T\beta)^{Y_{ij}}}{\sum_{Y_i\in C_i}\prod_{j=1}^{n_i}\prod_{k=1}^{w^*_{ij}}\exp(X_{ij}^T\beta)^{Y_{ijk}}}, \tag{2}
\]

where $C_i$ denotes the set of all possible binary vectors $Y_i$ of size $\sum_{j=1}^{n_i}w^*_{ij}$ such that $\sum_{j=1}^{n_i}\sum_{k=1}^{w^*_{ij}}Y_{ijk}=\sum_{j=1}^{n_i}\sum_{k=1}^{w^*_{ij}}Y_{ij}$, and $Y_{ijk}$ is one or zero depending on the designated element of $C_i$. We then maximize the product of (2) over $i$ as a function of $\beta$ to obtain the conditional likelihood estimator based on the pseudosample. The sampling distribution of the estimator can be approximated using a bootstrap that resamples PSUs (primary sampling units) with replacement, ignoring primary strata; this is a conservative approach in that it can only overestimate the true sampling variability.

Intuition suggested, and simulation studies supported, the conjecture of Chapter 1 that the conditional maximum likelihood estimator based on a pseudosample with large-scale replication approximates the unconditional MLE based on that pseudosample. The intuition derives from the approximation of the conditional maximum likelihood estimator by a generalized linear mixed model estimator (e.g. see [37], [45], [55], [54] and [57]). For the generalized linear mixed model with a smooth mixing distribution having positive support on the entire real line, large-scale replication of observations would lead to an estimator that tends towards treating the random effects as fixed effects; treating the random effects as fixed effects leads to the unconditional MLE. In Section 2.3, we formalize and prove a result even stronger than the conjecture

for a special case, that of the matched pairs design. In Section 2.4, we conduct a simulation study in order to investigate a more complex scenario in which there is no restriction on $X_{ij}$. Section 2.5 concludes with a discussion of implications and future directions.

2.3 Main Results

Our main result pertains to the matched pairs design of Section 2.2, in which $w_{j|i}=w_i=1$.

Theorem. Consider model (2) and the matched pairs design with $n_i=2$, where $X_{i1}$ and $X_{i2}$ are the $p$-dimensional covariates. Define $C_i=X_{i2}-X_{i1}=(C_{i1},\ldots,C_{ip})^T$, a $p\times 1$ vector with $C_{ij}\neq 0$ for at least one $j$. Denote by $\hat\beta_c(k)$ the CMLE that maximizes (2) based on pretending that a pseudosample in which $w^*_{ij}=k$ is the observed sample, and by $\hat\beta$ the standard MLE based on the original sample. Then
(1) as $k\to\infty$, $\hat\beta_c(k)$ approaches $\hat\beta$ monotonically; i.e. $\lim_{k\to\infty}\hat\beta_c(k)=\hat\beta$;
(2) $\hat\beta_c(1)=\hat\beta/2$.

Proof. We need the following two lemmas.

Lemma 1. For $C_i\neq 0_{p\times 1}$,

\[
\lim_{k\to\infty}\frac{\sum_{r=0}^{k}\binom{k}{r}^2(k-r)\,e^{(k-r)C_i^T\beta}}{\sum_{r=0}^{k}\binom{k}{r}^2k\,e^{(k-r)C_i^T\beta}} = \frac{e^{C_i^T\beta/2}}{1+e^{C_i^T\beta/2}}. \tag{2}
\]

The proof of Lemma 1 is in Appendix A.

Lemma 2. Without loss of generality, suppose $C_{is}>0$, $s=1,\ldots,p$. When $\beta_s\geq 0$,

\[
\frac{\sum_{r=0}^{k}\binom{k}{r}^2(k-r)\,e^{(k-r)C_{is}\beta_s}}{\sum_{r=0}^{k}\binom{k}{r}^2k\,e^{(k-r)C_{is}\beta_s}}
\]

is non-increasing as $k\to\infty$; when $\beta_s<0$, the same ratio is non-decreasing as $k\to\infty$. The proof of Lemma 2 is in Appendix B.

Let $Y^k_{ij}=(y_{ij1},y_{ij2},\ldots,y_{ijk})^T$, $j=1,2$, and $Y^k_i=(Y^{kT}_{i1},Y^{kT}_{i2})^T$. Further let $S_i=\sum_j\sum_l y_{ijl}$. By construction of the pseudosample, $S_i=0$, $k$ or $2k$.

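From model (2), the probabilities of the two discordant outcomes $\{Y_{i1}=1,Y_{i2}=0\}$ and $\{Y_{i1}=0,Y_{i2}=1\}$ are

\[
P(Y_{i1}=1,Y_{i2}=0) = \frac{e^{b_i+x_{i1}^T\beta}}{1+e^{b_i+x_{i1}^T\beta}}\cdot\frac{1}{1+e^{b_i+x_{i2}^T\beta}},
\]

and

\[
P(Y_{i1}=0,Y_{i2}=1) = \frac{1}{1+e^{b_i+x_{i1}^T\beta}}\cdot\frac{e^{b_i+x_{i2}^T\beta}}{1+e^{b_i+x_{i2}^T\beta}}.
\]

Supposing that model (2) generated the pseudosample, we have that $P(Y^k_i=\mathbf{0}\mid S_i=0)=1$ and $P(Y^k_i=\mathbf{1}\mid S_i=2k)=1$. Furthermore,

\[
P(S_i=k) = \sum_{r=0}^{k}P\Big(\sum_l y_{i1l}=r \text{ and } \sum_l y_{i2l}=k-r\Big),
\]

so that

\[
P(S_i=k) = \sum_{r=0}^{k}\binom{k}{r}\left(\frac{e^{b_i+x_{i1}^T\beta}}{1+e^{b_i+x_{i1}^T\beta}}\right)^{r}\left(\frac{1}{1+e^{b_i+x_{i1}^T\beta}}\right)^{k-r}\binom{k}{r}\left(\frac{e^{b_i+x_{i2}^T\beta}}{1+e^{b_i+x_{i2}^T\beta}}\right)^{k-r}\left(\frac{1}{1+e^{b_i+x_{i2}^T\beta}}\right)^{r}
\]
\[
= \sum_{r=0}^{k}\binom{k}{r}^2\left(\frac{1}{1+e^{b_i+x_{i1}^T\beta}}\right)^{k}\left(\frac{1}{1+e^{b_i+x_{i2}^T\beta}}\right)^{k}\left(e^{b_i+x_{i1}^T\beta}\right)^{r}\left(e^{b_i+x_{i2}^T\beta}\right)^{k-r}.
\]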

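For the pseudosample, when $S_i=k$, either $Y^k_{i1}=\mathbf{1}$ and $Y^k_{i2}=\mathbf{0}$, or $Y^k_{i1}=\mathbf{0}$ and $Y^k_{i2}=\mathbf{1}$. Using the preceding expression,

\[
P(Y^k_{i1}=\mathbf{1},Y^k_{i2}=\mathbf{0}\mid S_i=k) = \frac{\left(\dfrac{e^{b_i+x_{i1}^T\beta}}{1+e^{b_i+x_{i1}^T\beta}}\right)^{k}\left(\dfrac{1}{1+e^{b_i+x_{i2}^T\beta}}\right)^{k}}{\sum_{r=0}^{k}\binom{k}{r}^2\left(\dfrac{1}{1+e^{b_i+x_{i1}^T\beta}}\right)^{k}\left(\dfrac{1}{1+e^{b_i+x_{i2}^T\beta}}\right)^{k}\left(e^{b_i+x_{i1}^T\beta}\right)^{r}\left(e^{b_i+x_{i2}^T\beta}\right)^{k-r}} = \frac{1}{\sum_{r=0}^{k}\binom{k}{r}^2\left(e^{C_i^T\beta}\right)^{k-r}},
\]

and

\[
P(Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\mid S_i=k) = \frac{\left(\dfrac{1}{1+e^{b_i+x_{i1}^T\beta}}\right)^{k}\left(\dfrac{e^{b_i+x_{i2}^T\beta}}{1+e^{b_i+x_{i2}^T\beta}}\right)^{k}}{\sum_{r=0}^{k}\binom{k}{r}^2\left(\dfrac{1}{1+e^{b_i+x_{i1}^T\beta}}\right)^{k}\left(\dfrac{1}{1+e^{b_i+x_{i2}^T\beta}}\right)^{k}\left(e^{b_i+x_{i1}^T\beta}\right)^{r}\left(e^{b_i+x_{i2}^T\beta}\right)^{k-r}} = \frac{\left(e^{C_i^T\beta}\right)^{k}}{\sum_{r=0}^{k}\binom{k}{r}^2\left(e^{C_i^T\beta}\right)^{k-r}}.
\]

Note that $c_i=x_{i2}-x_{i1}$, and that $I\{Y^k_{i1}=\mathbf{1},Y^k_{i2}=\mathbf{0}\}$ and $I\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\}$ are the indicators of the events $\{Y^k_{i1}=\mathbf{1},Y^k_{i2}=\mathbf{0}\}$ and $\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\}$, respectively. The conditional likelihood (2) applied to the pseudosample is then

\[
L_k(\beta\mid S_i,i=1,\ldots,m) = \prod_{i=1}^{m}\left[\frac{1}{\sum_{r=0}^{k}\binom{k}{r}^2\left(e^{c_i^T\beta}\right)^{k-r}}\right]^{I\{Y^k_{i1}=\mathbf{1},Y^k_{i2}=\mathbf{0}\}}\left[\frac{\left(e^{c_i^T\beta}\right)^{k}}{\sum_{r=0}^{k}\binom{k}{r}^2\left(e^{c_i^T\beta}\right)^{k-r}}\right]^{I\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\}}
= \prod_{i=1}^{m}\frac{\left(e^{c_i^T\beta}\right)^{k\,I\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\}}}{\left[\sum_{r=0}^{k}\binom{k}{r}^2\left(e^{c_i^T\beta}\right)^{k-r}\right]^{I\{Y^k_{i1}=\mathbf{1},Y^k_{i2}=\mathbf{0}\}+I\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\}}}. \tag{2}
\]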
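The simplification in the conditional probabilities above, in particular the fact that $b_i$ cancels, can be checked numerically. The following Python sketch compares the probability computed directly from the binomial terms with the closed form; the function names are hypothetical, and the scalar arguments `xb1` and `xb2` stand for $x_{i1}^T\beta$ and $x_{i2}^T\beta$.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def cond_prob_direct(k, b, xb1, xb2):
    """P(all k copies of member 1 equal 1 and all k copies of member 2
    equal 0 | S_i = k), computed from the binomial probabilities."""
    p1, p2 = expit(b + xb1), expit(b + xb2)
    joint = p1 ** k * (1.0 - p2) ** k
    p_s = sum(math.comb(k, r) ** 2
              * p1 ** r * (1.0 - p1) ** (k - r)
              * p2 ** (k - r) * (1.0 - p2) ** r
              for r in range(k + 1))
    return joint / p_s


def cond_prob_closed_form(k, c_beta):
    """Closed form 1 / sum_r C(k, r)^2 exp((k - r) * c_beta), free of b_i."""
    denom = sum(math.comb(k, r) ** 2 * math.exp((k - r) * c_beta)
                for r in range(k + 1))
    return 1.0 / denom
```

For example, `cond_prob_direct(3, b, 0.3, 1.1)` should agree with `cond_prob_closed_form(3, 0.8)` for any value of `b`, up to floating-point error.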

Thus the conditional log-likelihood is

\[
l_k(\beta) = \sum_{i=1}^{m}k\,c_i^T\beta\,I\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\} - \sum_{i=1}^{m}\big(I\{Y^k_{i1}=\mathbf{1},Y^k_{i2}=\mathbf{0}\}+I\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\}\big)\log\left[\sum_{r=0}^{k}\binom{k}{r}^2e^{(k-r)c_i^T\beta}\right],
\]

which is maximized when

\[
\frac{\partial l_k(\beta)}{\partial\beta_s} = \sum_{i=1}^{m}k\,c_{is}\,I\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\} - \sum_{i=1}^{m}c_{is}\big(I\{Y^k_{i1}=\mathbf{1},Y^k_{i2}=\mathbf{0}\}+I\{Y^k_{i1}=\mathbf{0},Y^k_{i2}=\mathbf{1}\}\big)\frac{\sum_{r=0}^{k}\binom{k}{r}^2(k-r)\,e^{(k-r)c_i^T\beta}}{\sum_{r=0}^{k}\binom{k}{r}^2e^{(k-r)c_i^T\beta}} = 0, \tag{2}
\]

for all $s=1,\ldots,p$.

Andersen [2] explained how to obtain the MLE for simple matched pairs data. We now extend his method to matched pairs designs with $p$-dimensional covariates. Model (2) can be rewritten as

\[
\operatorname{logit}\big(E(Y_{ij}\mid X^*_{ij},b^*_i)\big) = X^{*T}_{ij}\beta + b^*_i, \tag{2}
\]

where $X^*_{ij}=0_{p\times 1}$ for $j=1$, $X^*_{ij}=C_i$ for $j=2$, and $b^*_i=b_i+X_{i1}^T\beta$. The likelihood for the $i$th stratum is then

\[
L_i = \pi_{i1}^{y_{i1}}(1-\pi_{i1})^{1-y_{i1}}\,\pi_{i2}^{y_{i2}}(1-\pi_{i2})^{1-y_{i2}},
\]

where

\[
\pi_{i1} = \frac{\exp(b^*_i)}{1+\exp(b^*_i)}, \qquad \pi_{i2} = \frac{\exp(b^*_i+c_i^T\beta)}{1+\exp(b^*_i+c_i^T\beta)}.
\]

The corresponding log-likelihood is

\[
l_i = y_{i1}\log\pi_{i1} + (1-y_{i1})\log(1-\pi_{i1}) + y_{i2}\log\pi_{i2} + (1-y_{i2})\log(1-\pi_{i2}).
\]

The partial derivatives are

\[
\frac{\partial l_i}{\partial\pi_{i1}} = \frac{y_{i1}-\pi_{i1}}{\pi_{i1}(1-\pi_{i1})}, \qquad \frac{\partial l_i}{\partial\pi_{i2}} = \frac{y_{i2}-\pi_{i2}}{\pi_{i2}(1-\pi_{i2})},
\]
\[
\frac{\partial\pi_{i1}}{\partial\beta_s} = 0, \qquad \frac{\partial\pi_{i2}}{\partial\beta_s} = c_{is}\,\pi_{i2}(1-\pi_{i2}), \qquad s=1,\ldots,p.
\]

The corresponding log-likelihood can be expressed as

\[
l(\beta) = \sum_{i=1}^{m}\Big[b^*_i\,I\{Y_{i1}=1,Y_{i2}=0\} + (b^*_i+c_i^T\beta)\,I\{Y_{i1}=0,Y_{i2}=1\} + (2b^*_i+c_i^T\beta)\,I\{Y_{i1}=1,Y_{i2}=1\} - \log(1+e^{b^*_i}) - \log(1+e^{b^*_i+c_i^T\beta})\Big], \tag{2}
\]

which is maximized when

\[
\frac{\partial l(\beta)}{\partial\beta_s} = \sum_{i=1}^{m}c_{is}\big(I\{Y_{i1}=1,Y_{i2}=1\}+I\{Y_{i1}=0,Y_{i2}=1\}\big) - \sum_{i=1}^{m}c_{is}\frac{e^{b^*_i+c_i^T\beta}}{1+e^{b^*_i+c_i^T\beta}} = \sum_{i=1}^{m}c_{is}\,I\{Y_{i2}=1\} - \sum_{i=1}^{m}c_{is}\frac{e^{b^*_i+c_i^T\beta}}{1+e^{b^*_i+c_i^T\beta}} = \sum_{i=1}^{m}c_{is}(y_{i2}-\pi_{i2}) = 0, \tag{2}
\]

for $s=1,\ldots,p$. It is also easy to see that $E(Y_{i1}+Y_{i2})=E(Y_{i+})=\pi_{i1}+\pi_{i2}$. Thus, we have the likelihood equations

\[
\sum_{i=1}^{m}c_{is}\,y_{i2} = \sum_{i=1}^{m}c_{is}\,\pi_{i2}, \tag{2}
\]
\[
y_{i+} = \sum_{j=1}^{2}y_{ij} = \sum_{j=1}^{2}\pi_{ij}, \tag{2}
\]

where $s=1,\ldots,p$.

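Define the total counts as $n_{11}=\sum_{i=1}^{m}I\{Y_{i1}=0,Y_{i2}=0\}$, $n_{12}=\sum_{i=1}^{m}I\{Y_{i1}=1,Y_{i2}=0\}$, $n_{21}=\sum_{i=1}^{m}I\{Y_{i1}=0,Y_{i2}=1\}$, and $n_{22}=\sum_{i=1}^{m}I\{Y_{i1}=1,Y_{i2}=1\}$. Substitute

\[
\frac{\exp(b^*_i)}{1+\exp(b^*_i)} + \frac{\exp(b^*_i+c_i^T\beta)}{1+\exp(b^*_i+c_i^T\beta)}
\]

for the right-hand side of the second likelihood equation (2). Then
(a) $\hat b^*_i=-\infty$ (i.e. $\hat\pi_{i1}=\hat\pi_{i2}=0$) for the $n_{11}$ pairs with $y_{i+}=0$, i.e. $\{Y_{i1}=0,Y_{i2}=0\}$;
(b) $\hat b^*_i=+\infty$ (i.e. $\hat\pi_{i1}=\hat\pi_{i2}=1$) for the $n_{22}$ pairs with $y_{i+}=2$, i.e. $\{Y_{i1}=1,Y_{i2}=1\}$;
(c) $\hat b^*_i=-c_i^T\hat\beta/2$ (i.e. $\hat\pi_{i1}+\hat\pi_{i2}=1$) for the $(n_{12}+n_{21})$ pairs with $y_{i+}=1$, i.e. $\{Y_{i1}=0,Y_{i2}=1\}$ or $\{Y_{i1}=1,Y_{i2}=0\}$.

Breaking the sum into components for the sets of pairs having $y_{i+}=0$, $y_{i+}=2$ and $y_{i+}=1$, with $\hat\pi_{i2}=0$ in case (a) and $\hat\pi_{i2}=1$ in case (b), the right-hand side of the first likelihood equation (2) is

\[
\sum_{i=1}^{m}c_{is}\left(I\{Y_{i1}=0,Y_{i2}=0\}\cdot 0 + I\{Y_{i1}=1,Y_{i2}=1\}\cdot 1 + \big[I\{Y_{i1}=1,Y_{i2}=0\}+I\{Y_{i1}=0,Y_{i2}=1\}\big]\frac{\exp(c_i^T\hat\beta/2)}{1+\exp(c_i^T\hat\beta/2)}\right)
\]
\[
= \sum_{i=1}^{m}c_{is}\left(I\{Y_{i1}=1,Y_{i2}=1\} + \big[I\{Y_{i1}=1,Y_{i2}=0\}+I\{Y_{i1}=0,Y_{i2}=1\}\big]\frac{\exp(c_i^T\hat\beta/2)}{1+\exp(c_i^T\hat\beta/2)}\right), \tag{2}
\]

where $s=1,2,\ldots,p$. Meanwhile, the left-hand side of the same likelihood equation (2) is

\[
\sum_{i=1}^{m}c_{is}\,y_{i2} = \sum_{i=1}^{m}c_{is}\,I\{Y_{i2}=1\} = \sum_{i=1}^{m}c_{is}\big(I\{Y_{i1}=1,Y_{i2}=1\}+I\{Y_{i1}=0,Y_{i2}=1\}\big). \tag{2}
\]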
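For the simple matched pairs design of Section 2.2 ($X_{i1}=0$, $X_{i2}=1$), the fixed-effect solution in case (c) and the resulting factor-of-two relation between the MLE and the CMLE can be verified numerically. The Python sketch below maximizes the unconditional likelihood by profiling out each pair's intercept with a ternary search; the optimizer and all function names are assumptions made for illustration, not the thesis's own computation. Concordant pairs are omitted because their contribution tends to zero as the fitted intercept diverges and does not depend on $\beta$.

```python
import math


def log_expit(x):
    """log(expit(x)) computed without overflow."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def argmax_concave(f, lo, hi, iters=200):
    """Ternary search for the maximiser of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def pair_loglik(beta, b, y1, y2):
    """Log-likelihood of one pair with X_i1 = 0, X_i2 = 1 and intercept b."""
    first = log_expit(b) if y1 else log_expit(-b)
    second = log_expit(b + beta) if y2 else log_expit(-(b + beta))
    return first + second


def profile_loglik(beta, n12, n21):
    """Unconditional log-likelihood of the discordant pairs, with each
    pair-specific intercept maximised out numerically."""
    total = 0.0
    for count, (y1, y2) in ((n12, (1, 0)), (n21, (0, 1))):
        b_hat = argmax_concave(lambda b: pair_loglik(beta, b, y1, y2), -30.0, 30.0)
        total += count * pair_loglik(beta, b_hat, y1, y2)
    return total


def unconditional_mle(n12, n21):
    return argmax_concave(lambda be: profile_loglik(be, n12, n21), -10.0, 10.0)


def conditional_mle(n12, n21):
    # conditional log-likelihood without replication (k = 1)
    return argmax_concave(
        lambda be: n21 * log_expit(be) + n12 * log_expit(-be), -10.0, 10.0)
```

With, say, `n12 = 30` and `n21 = 60`, `conditional_mle` should return approximately $\log 2$ and `unconditional_mle` approximately $2\log 2$, matching the closed forms $\log(n_{21}/n_{12})$ and $2\log(n_{21}/n_{12})$.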

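Setting the two expressions (2) equal to each other gives

\[
\sum_{i=1}^{m}c_{is}\,I\{Y_{i1}=0,Y_{i2}=1\} - \sum_{i=1}^{m}c_{is}\big(I\{Y_{i1}=0,Y_{i2}=1\}+I\{Y_{i1}=1,Y_{i2}=0\}\big)\frac{\exp(c_i^T\hat\beta/2)}{1+\exp(c_i^T\hat\beta/2)} = 0, \tag{2}
\]

for $s=1,2,\ldots,p$. The solution $\hat\beta$ of equation (2) is the MLE. Let $\hat\beta_c(k)$ denote the CMLE, which is the solution to the conditional score equation (2). If we set $k=1$, that equation becomes

\[
\sum_{i=1}^{m}c_{is}\,I\{Y_{i1}=0,Y_{i2}=1\} - \sum_{i=1}^{m}c_{is}\big(I\{Y_{i1}=0,Y_{i2}=1\}+I\{Y_{i1}=1,Y_{i2}=0\}\big)\frac{\exp(c_i^T\beta)}{1+\exp(c_i^T\beta)} = 0, \tag{2}
\]

of which $\hat\beta_c(1)$ is the solution (CMLE). Comparing the last two equations (2), it is easy to see that $\hat\beta_c(1)=\hat\beta/2$. Moreover, by Lemma 1, $\lim_{k\to\infty}\hat\beta_c(k)=\hat\beta$, and by Lemma 2 the convergence is monotonic.

Since $\hat\beta_c(k)$ converges monotonically from $\hat\beta/2$ to $\hat\beta$ as $k$ increases from 1 to $\infty$, each element of $\hat\beta_c(k)$ lies in the range $\big[\min(\hat\beta_s/2,\hat\beta_s),\max(\hat\beta_s/2,\hat\beta_s)\big]$, for $s=1,\ldots,p$ and $k=1,2,\ldots,\infty$.

For simple matched pair designs, we have the Corollary and Lemma 3 and Lemma 4 below as a special case of the Theorem.

Corollary. For model (2) and the matched pairs design with $n_i=2$, $X_{i1}=0$ and $X_{i2}=1$, let $n_{12}$ be the number of pairs with $Y_{i1}=1$ and $Y_{i2}=0$, and $n_{21}$ be the number of pairs with $Y_{i1}=0$ and $Y_{i2}=1$. Denote by $\hat\beta_c(k)$ the CMLE that maximizes (2) based on pretending that a pseudosample in which $w_i=1$ and $w_{j|i}=k$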

is the observed sample. Then as $k \to \infty$, $\hat{\beta}_c(k)$ approaches $\hat{\beta}$ monotonically, where $\hat{\beta}$ is the standard MLE based on the original sample.

Lemma 3.

$$\lim_{k\to\infty} 2\log \frac{\sum_{r=0}^{k-1} \binom{k}{r}^2 \frac{k-r}{k}\, e^{\beta(k-r)}}{1 + \sum_{r=0}^{k-1} \binom{k}{r}^2 \frac{r}{k}\, e^{\beta(k-r)}} = \beta. \qquad (2)$$

Lemma 4. When $\beta \ge 0$,

$$2\log \frac{\sum_{r=0}^{k-1} \binom{k}{r}^2 \frac{k-r}{k}\, e^{\beta(k-r)}}{1 + \sum_{r=0}^{k-1} \binom{k}{r}^2 \frac{r}{k}\, e^{\beta(k-r)}}$$

is non-increasing as $k \to \infty$; when $\beta < 0$, the same expression is non-decreasing as $k \to \infty$.

Lemma 3 and Lemma 4 were first proved by He and Brumback [28]. Simplified proofs can be derived from the proofs of Lemmas 1 and 2 by noticing that

$$\frac{\sum_{r=0}^{k} \binom{k}{r}^2 \frac{k-r}{k}\, e^{\beta(k-r)}}{\sum_{r=0}^{k} \binom{k}{r}^2 \frac{r}{k}\, e^{\beta(k-r)}} = \left(1 - \frac{\sum_{r=0}^{k} \binom{k}{r}^2 (k-r)\, e^{\beta(k-r)}}{\sum_{r=0}^{k} \binom{k}{r}^2 k\, e^{\beta(k-r)}}\right)^{-1} - 1.$$

Hence

$$\lim_{k\to\infty} 2\log \frac{\sum_{r=0}^{k} \binom{k}{r}^2 \frac{k-r}{k}\, e^{\beta(k-r)}}{\sum_{r=0}^{k} \binom{k}{r}^2 \frac{r}{k}\, e^{\beta(k-r)}} = 2\log\left(\left(1 - \frac{e^{\beta/2}}{1+e^{\beta/2}}\right)^{-1} - 1\right) = \beta,$$

and the monotonicity of

$$\frac{\sum_{r=0}^{k} \binom{k}{r}^2 \frac{k-r}{k}\, e^{\beta(k-r)}}{\sum_{r=0}^{k} \binom{k}{r}^2 \frac{r}{k}\, e^{\beta(k-r)}}$$

is the same as that of

$$\frac{\sum_{r=0}^{k} \binom{k}{r}^2 (k-r)\, e^{\beta(k-r)}}{\sum_{r=0}^{k} \binom{k}{r}^2 k\, e^{\beta(k-r)}}.$$

Denote

$$f_k(\beta) = 2\log \frac{\sum_{r=0}^{k-1} \binom{k}{r}^2 \frac{k-r}{k}\, e^{\beta(k-r)}}{1 + \sum_{r=0}^{k-1} \binom{k}{r}^2 \frac{r}{k}\, e^{\beta(k-r)}}.$$

We used Maple (version 12) to construct Figure 2-1 and Figure 2-2 to illustrate that $\lim_{k\to\infty} f_k(\beta) = \beta$ and that the convergence is monotone. One observes that for $k = 1$, $f_1(0.5) = 1$ and $f_1(-0.5) = -1$. This coincides with equation ( 2 ), in which $f_1(\beta) = 2\beta$; in other words, $\hat{\beta}_c(1) = \log(n_{21}/n_{12})$, as we would expect of the conditional maximum likelihood estimator assuming no replication ($k = 1$).
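The behavior described in Lemma 3 and Lemma 4 can also be checked numerically without Maple. The following is a minimal sketch, assuming the reading of $f_k(\beta)$ given above; the function names `log_sum_exp` and `f_k` are illustrative and do not come from the original analysis. The sums are evaluated on the log scale because the squared binomial coefficients overflow ordinary floating point for large $k$.

```python
import math


def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) for a non-empty list of log-scale terms."""
    peak = max(log_values)
    return peak + math.log(sum(math.exp(v - peak) for v in log_values))


def f_k(k, beta):
    """Evaluate f_k(beta) from Lemma 3, working on the log scale."""

    def log_core(r):
        # log of C(k, r)^2 * exp(beta * (k - r))
        log_binom = math.lgamma(k + 1) - math.lgamma(r + 1) - math.lgamma(k - r + 1)
        return 2.0 * log_binom + beta * (k - r)

    # Numerator: r = 0..k-1 with weight (k - r)/k.
    log_num = log_sum_exp([log_core(r) + math.log((k - r) / k) for r in range(k)])
    # Denominator: r = 1..k with weight r/k. The r = k term equals 1,
    # which is the leading "1 +" in the lemma; the r = 0 term is zero.
    log_den = log_sum_exp([log_core(r) + math.log(r / k) for r in range(1, k + 1)])
    return 2.0 * (log_num - log_den)


if __name__ == "__main__":
    for k in (1, 2, 3, 10, 100, 1000):
        print(k, round(f_k(k, 0.5), 4), round(f_k(k, -0.5), 4))
```

Under these assumptions the printed values start at $2\beta$ for $k = 1$ and move toward $\beta$ as $k$ grows, from above for $\beta = 0.5$ and from below for $\beta = -0.5$.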


Figure 2-1. $f_k(\beta)$ as a function of $k$ for $\beta = 0.5$


Figure 2-2. $f_k(\beta)$ as a function of $k$ for $\beta = -0.5$

2.4 Simulation Study

We conducted a simulation study using a matched-pairs design with two covariates. We sampled $m = 1000$ clusters and let $n_i = 2$. We let the first covariate $X_{ij}$ be Bernoulli with probability $p_{x_{ij}} = \mathrm{logit}^{-1}(u_i)$, where the $u_i$, $i = 1, \ldots, m$, are i.i.d. $N(0, 1)$. We let the second covariate $Z_{ij}$ be i.i.d. standard normal, $N(0, 1)$. Let $\epsilon_i$, $i = 1, \ldots, m$, also be i.i.d. $N(0, 1)$ (independent of the $u_i$), $\bar{X}_i = (1/n_i)\sum_j X_{ij}$ and $\bar{Z}_i = (1/n_i)\sum_j Z_{ij}$,


and we simulated $b_i = -5\bar{X}_i + 3\bar{Z}_i + \epsilon_i$. Finally, we simulated binary responses $Y_{ij}$ with success probability $\mathrm{logit}^{-1}(0.5 X_{ij} + 0.8 Z_{ij} + b_i)$. The true $\beta$ is $\beta = (0.5, 0.8)$. We repeated the simulation 100 times. Again denote by $\hat{\beta}_c(k)$ the conditional maximum likelihood estimator that maximizes ( 2 ) based on pretending that a pseudosample in which $w_i = 1$ and $w_{j|i} = k$ is the observed sample. Let $\hat{\beta}$ denote the unconditional MLE based on the original sample. Table 2-1 presents the average and standard deviation of $\hat{\beta}_c(k)$ over the 100 simulations. For comparison, the average of $\hat{\beta}$ is (1.0814, 1.6508), with standard deviations 0.3838 and 0.1935, respectively. One readily observes that $\hat{\beta}_c(k)$ tends towards $\hat{\beta}$ monotonically as $k$ increases.
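The data-generating steps of this simulation can be sketched in a few lines. This is an illustrative reconstruction under the stated design, not the code used for the reported results; the function name `simulate_matched_pairs`, the seed handling, and the use of NumPy are assumptions.

```python
import numpy as np


def simulate_matched_pairs(m=1000, beta=(0.5, 0.8), seed=0):
    """Generate one replicate of the matched-pairs design (n_i = 2)."""
    rng = np.random.default_rng(seed)
    n_i = 2
    u = rng.normal(size=m)                    # cluster-level u_i ~ N(0, 1)
    p_x = 1.0 / (1.0 + np.exp(-u))            # logit^{-1}(u_i)
    x = (rng.random((m, n_i)) < p_x[:, None]).astype(int)  # Bernoulli X_ij
    z = rng.normal(size=(m, n_i))             # Z_ij ~ N(0, 1)
    eps = rng.normal(size=m)                  # epsilon_i, independent of u_i
    # Cluster effect depends on the cluster means of both covariates.
    b = -5.0 * x.mean(axis=1) + 3.0 * z.mean(axis=1) + eps
    eta = beta[0] * x + beta[1] * z + b[:, None]
    y = (rng.random((m, n_i)) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
    return x, z, y


if __name__ == "__main__":
    x, z, y = simulate_matched_pairs()
    discordant = int(np.sum(y[:, 0] != y[:, 1]))
    print("pairs with discordant outcomes:", discordant)
```

Only pairs with discordant outcomes contribute to the conditional likelihood when $k = 1$, which is why the sketch reports their count.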


Table 2-1. Results of simulation study with $\beta = (0.5, 0.8)$. The MLE is $\hat{\beta} = (1.0814, 1.6508)$ with SD($\hat{\beta}$) = (0.3838, 0.1935).

  k      Mean of $\hat{\beta}_c(k)$     SD($\hat{\beta}_c(k)$)
  1      (0.5407, 0.8254)               (0.1919, 0.0967)
  2      (0.7243, 1.1044)               (0.2550, 0.1201)
  3      (0.8148, 1.2422)               (0.2866, 0.1334)
  4      (0.8691, 1.3252)               (0.3058, 0.1424)
  5      (0.9055, 1.3808)               (0.3187, 0.1490)
  6      (0.9316, 1.4205)               (0.3281, 0.1541)
  7      (0.9511, 1.4503)               (0.3351, 0.1581)
  8      (0.9662, 1.4735)               (0.3406, 0.1615)
  9      (0.9783, 1.4921)               (0.3450, 0.1642)
  10     (0.9882, 1.5072)               (0.3487, 0.1666)
  15     (1.0187, 1.5540)               (0.3599, 0.1744)
  20     (1.0343, 1.5782)               (0.3658, 0.1789)
  50     (1.0628, 1.6221)               (0.3767, 0.1876)
  100    (1.0722, 1.6366)               (0.3802, 0.1906)

2.5 Discussion

We have shown that for the matched-pairs design and a logistic regression model with an intercept for each pair, replicating observations $k$ times and maximizing the conditional likelihood results in an estimator whose limit as $k$ approaches infinity is identical to the MLE with a fixed effect for each pair. We presented a simulation study and figures to illustrate our results. One implication is that the use of conditional logistic regression with a pseudosample for complex survey designs will sometimes lead to a biased estimator. It would be interesting to investigate further whether the equivalence of infinitely replicated conditional logistic regression with unconditional logistic regression extends to general designs. The approach taken in the main results section extends to general designs for matched pairs with binary outcomes. An alternative approach, as mentioned in the introduction, would be to approximate the conditional maximum likelihood estimator with a generalized linear mixed model estimator based on a smooth mixing distribution having positive support on the entire real line. Then, large-scale replication would lead to a GLMM estimator tending towards treating the random effects as fixed


effects (because the replicated observations would dominate the Bayesian prior), and treating the random effects as fixed effects yields the unconditional MLE. Previous authors have identified some results on the exact equivalence of CMLEs with GLMM estimators in the special case of Rasch models [52]. Lindsay et al. [37] studied estimation techniques in Rasch models using conditional likelihood methods, while Rice ([54] and [55]) analysed the equivalence between conditional likelihood methods and random effects models for Rasch models and matched-pair case-control studies. For these results to be useful, even for constructing a proof alternative to the one presented here for the matched-pair design, we would need to know not only that a mixing distribution yielding exact equivalence exists, but also certain details of that mixing distribution (such as its smoothness and support) for k:k matched studies. However, for k:k matched studies, the form of the mixing distribution remains unknown [55]. Yet another possibility would be to use the result in Proposition 3.2 of Severini [57] to approximate the conditional likelihood function to the third order with a generalized linear mixed model based on a complicated mixing distribution. This would lead to an asymptotic equivalence rather than an exact equivalence.


CHAPTER 3
ADJUSTING FOR CONFOUNDING BY NEIGHBORHOOD USING COMPLEX SURVEY DATA

3.1 Outline

Recently, Graubard and Korn [20] adapted an older method to complex survey data. The method is based on a weighted pseudo-likelihood, in which the contribution from each neighborhood involves all pairs of cases and controls in the neighborhood. In this chapter, we implement their method by translating the pairwise pseudo-conditional likelihood into an equivalent ordinary weighted log-likelihood formulation with binary outcomes. We explain how to program the method using standard software for ordinary logistic regression with complex survey data. We then apply the method to 2009 National Health Interview Survey (NHIS) public-use data to estimate the effect of education on health insurance coverage, adjusting for confounding by neighborhood.

3.2 Introduction

In Chapter 1, we considered the problem of adjusting for confounding by neighborhood of an individual exposure effect on a binary outcome, using complex survey data in which neighborhoods are sampled with unequal probabilities, as are individuals within neighborhoods [14]. That work was motivated by an analysis of NHIS public-use data to estimate the effect of education on mammography screening, adjusting for confounding by neighborhood. When only some neighborhoods are sampled, the problem of adjusting for confounding by neighborhood is one of adjusting for confounding by cluster. In Chapter 1, we examined three categories of methods: ordinary logistic regression for survey data, with either no effect or a fixed effect for each cluster; conditional logistic regression extended for survey data; and generalized linear mixed model (GLMM) regression for survey data. Because the conditional log-likelihood cannot be weighted to produce a pseudolikelihood, conditional maximum likelihood was applied to a pseudosample constructed by replicating observations according to integer-scaled


versions of the weights. The GLMM method utilized the generalized linear latent and mixed model (GLLAMM) Stata software of Rabe-Hesketh and Skrondal [51] combined with an adaptation for survey data of the 'poor man's' approach proposed by Neuhaus and Kalbfleisch [44] and discussed by Neuhaus and McCulloch [46] and Brumback et al. [14]. The approach models the association of the latent neighborhood effect with the neighborhood-level vector of individual-level exposures using a Gaussian model linear in the neighborhood mean of the individual exposures. The GLLAMM software implements maximum pseudolikelihood with complex survey data. In Chapter 1, we showed that these methods worked in some well-defined settings, but they failed dramatically when the sampling bias was strong and the sample sizes of the neighborhoods were small. This finding motivates us to find a new method that works properly when the sampling bias is strong.

Subsequently, Graubard and Korn [20] showed how to adapt to complex survey data a method due to Liang [35] for adjusting for confounding by cluster. Their adaptation achieves a consistent estimator even when neighborhood sample sizes are small and the selection bias is strongly informative. The method is based on weighted pseudolikelihoods, where the contribution from each neighborhood involves all pairs of cases and controls in the neighborhood. The case-control pairs are treated as if they were independent, a pairwise pseudo-conditional likelihood is thus derived, and the corresponding score equation is then weighted with the inverse probabilities of sampling each case-control pair.

In this chapter, we simplify the implementation of Graubard and Korn's [20] approach, using a method due to Breslow et al. [12] which translates the pairwise pseudo-conditional likelihood into an equivalent ordinary weighted log-likelihood; Agresti [1] (Section 10.2.6 and Section 4) described the approach of Breslow et al. [12]. Using the simplified implementation, we show how to program the method easily using software for ordinary logistic regression for complex survey data (e.g. SAS PROC SURVEYLOGISTIC). We also broaden the sampling scenarios considered by Graubard


and Korn [20]. We show that the method applies even to sampling situations in which the primary sampling units (PSUs) are nested within neighborhood, as is the case with survey designs such as the Behavioral Risk Factor Surveillance System (BRFSS) survey [53]. We demonstrate the validity of our simplified implementation by applying it to the simulation considered by Brumback et al. [14], for which the previous methods failed; the new method performs well. We also apply the new method to an analysis of 2009 NHIS public-use data [10], to estimate the effect of education on health insurance coverage, adjusting for confounding by neighborhood. We operationalize neighborhood as a person's household, which is present in the NHIS public-use files.

In the next several sections, we present our simplified implementation and broaden the sampling scenarios in which the method works. In Section 3.3 we describe the motivating NHIS example; Section 3.4 outlines the modeling framework; and Section 3.5 details the approach to estimation with complex survey data and the simplified implementation. We present simulation results in Section 3.6, and in Section 3.7 we analyze the NHIS data as an application. Section 3.8 concludes with a discussion.

3.3 Motivating NHIS Example

Our research has been motivated by an analysis of 2005 NHIS data discussed by Brumback et al. [14]. It is a study of individual-level and small neighborhood-level predictors of health-related outcomes such as mammography screening and health insurance status [10]. Individual-level predictors that we have studied include education, age category, and race/ethnicity. Neighborhood-level predictors include measured census-tract variables such as poverty and segregation indices, as well as the high-dimensional categorical variable that is neighborhood itself. In the course of our research, the question arose of how to adjust for confounding of individual-level variables by unmeasured neighborhood factors using this high-dimensional categorical variable. In this section, for expository purposes, we focus on the effect of dichotomized education (more than high school versus high school or


less) on health insurance status (covered or not covered), adjusting for confounding by neighborhood, operationalized as household, because census-tract or secondary sampling unit (SSU) information is not included in the public-use file. We used the 2009 NHIS data and considered only adults over the age of 18; the total sample size was 64,616 adults. We then excluded individuals with missing or unknown responses to the education and insurance coverage questions; this resulted in a final sample size of 62,904 adults nested in 33,429 households ranging in size from 1 to 9. Ninety-five percent of the households in the final sample had three or fewer included adult respondents. We used the interim annual survey weights (WTIA in the public-use file), which do not include the final poststratification adjustment. Poststratification adjusts weights associated with the sampled data so that the joint distribution of a set of post-stratifying variables matches the known population joint distribution ([31], [39]). Poststratification adjustments may interfere with the computation of the pairwise weights that we need for the methodology, and we will discuss this in detail in Section 3.5.

3.4 Modeling Framework

Let $i = 1, \ldots, M$ index all neighborhoods in the population, and $j = 1, \ldots, N_i$ index all individuals in the population within a given neighborhood. Let $Y_{ij}$ be a binary variable indicating health insurance coverage for person $j$ in neighborhood $i$, and let $X_{ij}$ similarly indicate education level. Then a simple model for the effect of exposure $X_{ij}$ on outcome $Y_{ij}$ that accounts for confounding by a general household effect $b_i$ is

$$h(E(Y_{ij} \mid X_i, b_i)) = \beta X_{ij} + b_i, \qquad (3)$$

where $h$ is the logit link function and $X_i = (X_{i1}, \ldots, X_{iN_i})$ denotes all the exposure information within neighborhood $i$.

A causal interpretation of $\beta$ in model ( 3 ) would be based on a potential outcome framework with $Y^x_{ij}$ being the potential outcome to setting $X_{ij} = x$. If household was the


only confounder, we would assume

$$\{Y^x_{ij}\} \perp\!\!\!\perp X_{ij} \mid b_i, \qquad (3)$$

where $\{Y^x_{ij}\}$ denotes the set of potential outcomes defined by all possible values of $x$. We would also need a consistency assumption, which states that the potential outcomes $Y^x_{ij}$ are well-defined and that those for a given individual do not depend on the realized exposures of other individuals. With these assumptions, $\beta$ of model ( 3 ) is identical to $\beta$ of the following causal model:

$$h(E(Y^x_{ij} \mid b_i)) = \beta x + b_i. \qquad (3)$$

The causal model ( 3 ) is based on the potential outcomes $\{Y^x_{ij}\}$ for all possible values of $x$. In Chapter 1, we discussed conditions under which the consistency assumption and causal model ( 3 ) could be dubious. Therefore we focus on model ( 3 ) rather than the causal model ( 3 ). Model ( 3 ) retains interest even when a causal interpretation is untenable; one may wish to know how much of the association between $X_{ij}$ and $Y_{ij}$ can be accounted for by unmeasured household-level factors.

3.5 Estimation with Complex Survey Data

Denote the probability of selecting individual $j$ from household $i$ into the sample as $p_{ij}$ and the corresponding inverse-probability weight as $w_{ij}$. If we had an ordinary cluster sample according to the models in Section 3.4, we could use conditional logistic regression to estimate $\beta$. Conditional logistic regression conditions on sufficient statistics for the $b_i$, and finds the maximum likelihood estimator based on the conditional likelihood [1]. Estimation and inference are implemented in, for example, SAS PROC LOGISTIC using the STRATA statement. As demonstrated by Graubard and Korn [20], this method does not readily generalize to the complex survey sample setting. Following their exposition, order the observations so that, within cluster


$i$, the first $K_i$ individuals have $Y_{ij} = 1$ and the remaining $N_i - K_i$ have $Y_{ij} = 0$. Then the conditional likelihood contribution from cluster $i$ is

$$L_i = \frac{\prod_{j=1}^{K_i} \exp(\beta X_{ij})}{\sum_c \prod_{j=1}^{K_i} \exp(\beta X_{ic(j)})}, \qquad (3)$$

where the sum in the denominator is over all $N_i$ choose $K_i$ ways (indexed by $c$) of selecting $K_i$ elements from the set $\{1, 2, \ldots, N_i\}$ [1], and $c(j)$ represents the $j$th of the $K_i$ elements selected by $c$. As given by Graubard and Korn [20], the contribution to the score equation from cluster $i$ thus has the form

$$S_i(\beta) = \sum_{j=1}^{K_i} X_{ij} - \frac{\sum_c \left(\sum_{j=1}^{K_i} X_{ic(j)}\right)\left(\prod_{j=1}^{K_i} \exp(\beta X_{ic(j)})\right)}{\sum_c \prod_{j=1}^{K_i} \exp(\beta X_{ic(j)})}. \qquad (3)$$

The total score equation is then $S(\beta) = \sum_{i=1}^{M} S_i(\beta)$. One can observe that it is not a simple sum of contributions from each person in the population indexed by $i$ and $j$. Therefore, it is not possible to weight contributions of such a sum with weights $w_{ij}$, as would be done with pseudolikelihood estimation. Following Liang [35], Graubard and Korn [20] resolved the problem by considering the cluster-specific pairwise conditional likelihood

$$LL_i = \prod_{j=1}^{K_i} \prod_{l=K_i+1}^{N_i} \frac{\exp(\beta X_{ij})}{\exp(\beta X_{ij}) + \exp(\beta X_{il})}. \qquad (3)$$

This conditional likelihood ( 3 ) would arise in cluster $i$ if all possible pairs $(j, l)$ of observations with $Y_{ij} = 1$ and $Y_{il} = 0$ were treated as if they were independent (see Agresti [1], Section 10.2.6). Graubard and Korn [20] give the score equations corresponding to $LL_i$ as

$$SL_i = \sum_{j=1}^{K_i} \sum_{l=K_i+1}^{N_i} \left[ X_{ij} - \frac{X_{ij}\exp(\beta X_{ij}) + X_{il}\exp(\beta X_{il})}{\exp(\beta X_{ij}) + \exp(\beta X_{il})} \right]. \qquad (3)$$

The score equation ( 3 ) sums over all possible pairs $(j, l)$ of positive and negative outcomes for cluster $i$. The total score equation is then $SL(\beta) = \sum_{i=1}^{M} SL_i(\beta)$.
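As a brief worked step, the pairwise score can be obtained by differentiating the log of the pairwise conditional likelihood with respect to $\beta$. The derivation below is a sketch using only the symbols defined above; the final simplified form is an algebraic rearrangement added for illustration and is not quoted from Graubard and Korn [20].

```latex
\begin{align*}
\log LL_i
  &= \sum_{j=1}^{K_i} \sum_{l=K_i+1}^{N_i}
     \left[ \beta X_{ij} - \log\!\left( e^{\beta X_{ij}} + e^{\beta X_{il}} \right) \right], \\
SL_i = \frac{\partial \log LL_i}{\partial \beta}
  &= \sum_{j=1}^{K_i} \sum_{l=K_i+1}^{N_i}
     \left[ X_{ij} - \frac{X_{ij} e^{\beta X_{ij}} + X_{il} e^{\beta X_{il}}}
                          {e^{\beta X_{ij}} + e^{\beta X_{il}}} \right] \\
  &= \sum_{j=1}^{K_i} \sum_{l=K_i+1}^{N_i}
     \frac{X_{ij} - X_{il}}{1 + e^{\beta (X_{ij} - X_{il})}} .
\end{align*}
```

The last line shows that each pair contributes only through the within-pair covariate difference, which is what makes a reformulation as ordinary logistic regression on differences possible.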


Next we estimate ( 3 ) using complex survey data. We denote the probability of observing pair $(j, l)$ as $q_{ijl}$, and we let $w_{ijl}$ denote its reciprocal. For the NHIS data, the probability of observing a pair from the same cluster is the same as the probability that we observe one member of that cluster, because all adults in a household were sampled. In other words, $w_{ijl} = w_{ij} = w_{il}$. If we were using BRFSS data, in which individuals are sampled roughly independently of one another, the probability of observing a pair would be approximately the product of the probabilities that we observe each member of the pair. For our simulation in Section 3.6, the probability of the pair is also the product of the probabilities, so that $w_{ijl} = w_{ij} w_{il}$. We assume that there are $m$ sampled clusters, $k_i$ sampled individuals with $Y_{ij} = 1$ in cluster $i$, and $(n_i - k_i)$ sampled individuals with $Y_{ij} = 0$ in cluster $i$. Using the pairwise weights $w_{ijl}$, we estimate $SL_i$ by

$$\widehat{SL}_i = \sum_{j=1}^{k_i} \sum_{l=k_i+1}^{n_i} w_{ijl} \left[ X_{ij} - \frac{X_{ij}\exp(\beta X_{ij}) + X_{il}\exp(\beta X_{il})}{\exp(\beta X_{ij}) + \exp(\beta X_{il})} \right], \qquad (3)$$

and correspondingly we estimate $SL$ by $\widehat{SL} = \sum_{i=1}^{m} \widehat{SL}_i$.

3.5.1 Simplified Implementation

Graubard and Korn [20] used special programming to solve the estimating equation $\widehat{SL} = 0$ and to compute standard errors and confidence intervals. However, if one makes use of the correspondence between conditional maximum likelihood for matched pairs and ordinary maximum likelihood first proposed by Breslow et al. [12], the programming greatly simplifies. Consider the contribution to $LL_i$ from pair $(j, l)$, in which $Y_{ij} = 1$ and $Y_{il} = 0$, which is given by

$$P(Y_{ij} = 1, Y_{il} = 0 \mid Y_{ij} + Y_{il} = 1) = \frac{\exp(\beta x_{ij})}{\exp(\beta x_{ij}) + \exp(\beta x_{il})}. \qquad (3)$$

Agresti ([1], Section 10.2.6) explains that if we divide the numerator and denominator by $\exp(\beta x_{ij})$, the equation has the form of logistic regression with no intercept, with $Y_{i(j,l)} = 0$ as a constant outcome, and predictor values $X_{i(j,l)} = X_{il} - X_{ij}$. Then


Equation ( 3 ) can be rewritten in ordinary logistic regression form as

$$P(Y_{ij} = 1, Y_{il} = 0 \mid Y_{ij} + Y_{il} = 1) = P(Y_{i(j,l)} = 0 \mid Y_{ij} + Y_{il} = 1) = \frac{1}{1 + \exp(\beta x_{i(j,l)})}. \qquad (3)$$

This gives rise to an estimating equation $\widehat{SL}$ that has the form of weighted ordinary logistic regression. Therefore, SAS PROC LOGISTIC (without the STRATA statement) could be used with $Y_{i(j,l)}$ and $X_{i(j,l)}$ to estimate $\beta$ when the weights $w_{ijl}$ are all equal to one, but the estimated sampling variability would be inconsistent because the pairs $(j, l)$ are not independent. A convenient feature of software for survey data, such as SAS PROC SURVEYLOGISTIC, is that consistent estimators of sampling variability are obtained provided that the specified PSUs are independent. Furthermore, the weights $w_{ijl}$ need not equal one, and the procedures can accommodate complex multistage sampling. To accommodate complex multistage sampling designs, we can use a sandwich estimator for the variance of the estimating equation $\widehat{SL}$ provided that the PSUs are independent [58].

We note that one can partition the dataset into alternative PSUs and still obtain consistent estimators of the variance, provided that two alternative PSUs do not contain observations belonging to a single PSU, and also that the alternative PSUs reside in the same stratum as the single PSUs nested within them. Therefore, we can use the simplified procedure even when the PSU represents an individual, as is the case with the BRFSS survey data. In other words, for the BRFSS data, we can let the alternative PSU be the neighborhood, provided that the neighborhood lies within a single stratum. Even if a neighborhood overlaps strata, we can obtain conservative estimates of variability by ignoring the stratification, because incorporating stratification can only reduce the estimated variance. For the NHIS data that we analyze in this chapter, neighborhood (i.e. household) is nested within PSU, and therefore we can use the public-use PSUs in conjunction with the CLUSTER statement of PROC SURVEYLOGISTIC.
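The point-estimation part of this simplified implementation can be sketched outside SAS. The following is a minimal illustration, assuming NumPy and a plain Newton-Raphson solver; the function names `build_pairs` and `fit_pairwise` and the `independent_sampling` flag are hypothetical and are not part of the original SAS code. It forms every case-control pair within a cluster, takes the covariate difference $X_{il} - X_{ij}$, and fits a weighted no-intercept logistic regression in which every pair has outcome 0. It does not compute design-based (sandwich) standard errors, which is the part PROC SURVEYLOGISTIC supplies.

```python
import numpy as np


def build_pairs(cluster, y, x, w, independent_sampling=True):
    """Form all (case j, control l) pairs within each cluster.

    Returns covariate differences X_il - X_ij and pairwise weights.
    Pair weights are w_ij * w_il under independent sampling within
    cluster, or w_ij when whole clusters are sampled (w_ij = w_il).
    """
    cluster, y = np.asarray(cluster), np.asarray(y)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    w = np.asarray(w, dtype=float)
    diffs, weights = [], []
    for c in np.unique(cluster):
        idx = np.flatnonzero(cluster == c)
        cases, controls = idx[y[idx] == 1], idx[y[idx] == 0]
        for j in cases:
            for l in controls:
                diffs.append(x[l] - x[j])
                weights.append(w[j] * w[l] if independent_sampling else w[j])
    return np.array(diffs).reshape(-1, x.shape[1]), np.array(weights)


def fit_pairwise(diffs, weights, n_iter=50, tol=1e-10):
    """Maximize sum_pairs w * log(1 / (1 + exp(d' beta))) by Newton-Raphson."""
    beta = np.zeros(diffs.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(diffs @ beta)))   # P(pair outcome = 1)
        grad = diffs.T @ (weights * p)              # minus the score
        hess = (diffs * (weights * p * (1.0 - p))[:, None]).T @ diffs
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy data: three households, one case and one control in each.
    cluster = [1, 1, 2, 2, 3, 3]
    y = [1, 0, 1, 0, 1, 0]
    x = [1, 0, 0, 1, 1, 0]
    w = [1, 1, 1, 1, 1, 1]
    d, pw = build_pairs(cluster, y, x, w)
    print(fit_pairwise(d, pw))
```

For the toy data, two pairs have the exposed member as the case and one pair has the exposed member as the control, so with unit weights the estimate reduces to the familiar matched-pairs log odds ratio, log(2/1).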


Unlike SAS PROC LOGISTIC, SAS PROC SURVEYLOGISTIC will not work when all of the outcomes are equal to zero. However, following the above argument in Agresti [1], we can handle this problem by setting any fraction (e.g. those corresponding to the first thousand households) of the paired observations to have $Y_{i(j,l)} = 1$ and $X_{i(j,l)} = X_{ij} - X_{il}$. In the appendix, we provide simple SAS code for the data management and the use of PROC SURVEYLOGISTIC.

All inference will be directed at superpopulation [58] parameters. Suppose the population of $M$ neighborhoods and $N_i$ individuals per neighborhood in model ( 3 ) is conceptualized as a random two-stage sample from the superpopulation, for which a version of model ( 3 ) holds but for infinite $M$ and $N_i$. In the first stage, $M$ neighborhoods are selected completely at random from a hypothetically infinite population of neighborhoods, and in the second stage, $N_i$ individuals are selected completely at random from a hypothetically infinite population of individuals within each neighborhood.

3.5.2 Extension to Multiple Individual-Level Covariates

The above methods extend very simply to handle multiple individual-level covariates in model ( 3 ). For $p$-dimensional covariates, model ( 3 ) is revised as

$$h(E(Y_{ij} \mid \mathbf{X}_i, b_i)) = \mathbf{X}_{ij}^T \boldsymbol{\beta} + b_i, \qquad (3)$$

where $\mathbf{X}_{ij} \in \mathbb{R}^p$, $\boldsymbol{\beta} \in \mathbb{R}^p$, and $\mathbf{X}_i$ denotes all the covariate information within cluster $i$, i.e. $\mathbf{X}_i = (\mathbf{X}_{i1}, \ldots, \mathbf{X}_{iN_i})^T$. We form a covariate vector, $\mathbf{X}_{i(j,l)} = \mathbf{X}_{il} - \mathbf{X}_{ij}$, analogous to $X_{i(j,l)}$, for each of the multiple covariates. The implementation and interpretation then proceed as described in Section 3.5.1.

3.5.3 Specifying the Pairwise Weights in Practice

In practice, specifying the pairwise weights may be challenging or require additional assumptions for validity. With our NHIS example, we are using the survey design


weights, which include neither the final poststratification adjustment for unit nonresponse nor adjustments for item nonresponse. This leads to accurate inference provided that the nonresponse is not informative. An alternative possibility would be to multiply $w_{ijl}$ by the factor $a_{ijl}$, which is intended to represent the inverse probability of complete data from pair $(ij, il)$ conditional upon sampling pair $(ij, il)$ and on the survey design variables. In practice, $a_{ijl}$ would likely be modeled as $a_{ij} a_{il}$, where $a_{ij}$ is either a poststratification adjustment for individual $ij$ or the product of the poststratification adjustment and a weight for item nonresponse. This method would lead to accurate inference provided that $a_{ij}$ accurately represents the inverse probability of complete data from person $ij$ conditional on having sampled person $ij$ and on the design variables, and that the conditional probability of complete data from person $ij$ is independent of that for person $il$. Typically, however, $a_{ij}$ is an ad hoc adjustment that does not truly represent the inverse probability of complete data from person $ij$ conditional on having sampled person $ij$ and on the design variables. For example, the nonresponse probabilities might actually depend on the survey design variables, instead of being constant across them as is a typical poststratification adjustment factor. However, for situations in which nonresponse is believed to be informative, it may be preferable to use the factor $a_{ij} a_{il}$ and invoke the assumption that it approximates the true $a_{ijl}$.

3.6 Simulation Study

In this section, we demonstrate the validity of the new method and implementation using the strong sampling bias simulation settings of Chapter 1 (Brumback et al. [14]), where previous methods failed. First, we simulated the population with $M$ arbitrarily large (in practice, we needed only set $M = m = 1000$) and $N_i = 1000$ for each $i$. We let the $X_{ij}$ be independent (given $u_i$) Bernoulli random variables with probability $\mathrm{expit}(u_i)$, where the $u_i$ are i.i.d. $N(0, 1)$. We let $b_i = -5\bar{X}_i + \epsilon_i$, with $\epsilon_i$ i.i.d. $N(0, 1)$. Finally, we generated $Y_{ij}$ according to model ( 3 )


with our target of estimation $\beta$ set equal to 0.5. Second, we sampled from the population with a strongly biased sampling scheme. We first divided the individual observations into concordant ($C = 1$) and discordant ($C = 0$) groups based on whether $Y_{ij} = X_{ij}$ or not. We then included observations in the sample independently with probability 0.002 if $C = 1$ and 0.004 if $C = 0$. All $m = 1000$ neighborhoods were included in the sample, and $n_i$ varied across clusters owing to the sampling scheme. Observations were given survey weights $w_{ij} = 2$ if $C = 1$, and $w_{ij} = 1$ if $C = 0$. Owing to independent sampling within neighborhoods, $w_{ijl} = w_{ij} w_{il}$. The simulation was repeated 100 times to study the performance of the method in estimating $\beta$ (true value = 0.5).

3.6.1 Results

The mean of the estimates of $\beta$ was 0.5108, with a standard deviation of 0.18 and a range of (0.16, 0.99). This indicates that the method and the implementation perform very well, especially when compared with the methods considered in Chapter 1 (Brumback et al. [14]). In Chapter 1, for the same simulation settings, the mean of the estimates for conditional logistic regression with a pseudosample was 0.10 (SD = 0.15, range = (-0.22, 0.47)), and for the GLLAMM method the mean of the estimates was -0.27 (SD = 0.18, range = (-0.62, 0.16)) (see Table 1-2). Comparing these simulation results, the new method is clearly superior to the other two methods under the strongly biased sampling scheme.

3.7 Application to NHIS Data

Next we applied the new method and implementation to the NHIS data to estimate the effect of education on health insurance coverage, adjusting for confounding by household. The adjusted odds ratio was 1.23 with a 95 percent confidence interval of (1.10, 1.38). For comparison, we also computed the crude odds ratio using ordinary logistic regression for complex survey data [32]; it was 1.27 with a 95 percent confidence interval of (1.18, 1.37). Thus, at first glance, there appears to be not much confounding by household, because the adjusted odds ratio estimate is similar to


that of the crude odds ratio. As we might expect, adjusting for household attenuates the estimated association between education and health insurance coverage, although the attenuation is slight. Unlike the risk difference and relative risk, the odds ratio is not collapsible even assuming no confounding [25], and our outcome is relatively prevalent (the estimated percent of adults with health insurance coverage is 82.9 percent with a 95 percent confidence interval of (82.3, 83.4)), so that the odds ratio does not approximate the relative risk. Conversely, that the household-specific odds ratio and the population-averaged odds ratio are equal does not imply that there is no confounding. In Chapter 1 (Brumback et al. [14]), we discussed a method for estimating a population-averaged odds ratio that is adjusted (i.e. standardized) for confounding by cluster; however, that method utilized a GLMM approach rather than the conditional logistic regression approach considered in Chapter 1. Estimating a population-averaged odds ratio that is standardized for confounding by cluster using complex survey data is still an open problem.

3.8 Discussion

We have considered the problem of adjusting for confounding by cluster in the context of complex multistage sampling and a binary outcome. We have achieved our goal of finding a method that works in the setting of the strongly biased sampling simulation first considered by Brumback et al. [14]. We have also succeeded in implementing the method using standard software and basic data management. Furthermore, we have pointed out that the applicability of the method is broader than that considered by Graubard and Korn [20].

Nevertheless, it would be of further interest to find a method that would allow us to estimate the distribution of the random cluster effects conditional on cluster-level aggregate functions of the individual covariates. In other words, whereas we have found a generalization of conditional logistic regression for complex survey data, we have not found a generalization of GLMMs for complex


survey data that works when the sampling bias is strong and the cluster sample sizes are small. Further research is still needed.

Additionally, the method used in this chapter requires knowledge of the probability that a given pair of observations within a neighborhood is selected into the sample. This probability was easily obtained for our operationalization of neighborhood, because every member of a household is sampled for the NHIS. However, in Chapter 1 (Brumback et al. [14]), we operationalized neighborhood as SSU, and not every person in an SSU is sampled. In that application, individuals in the dataset were from different households. Based on the superpopulation model, which supposes that the superpopulation neighborhood is infinite in size, one might plausibly assume that individuals $j$ and $l$ were sampled independently from the superpopulation, given that their SSU was sampled. For the NHIS application in this chapter, if we had to define neighborhood as SSU, we would need to know whether individuals $j$ and $l$ were in the same household or not, which would not be a problem, because this information is available to us. For other applications, however, it could be the case that the defined neighborhood is bigger than the clusters entailed by the final clustering stage of the survey design, and that those clusters are furthermore not identified in the dataset (e.g. as with many public-use datasets). In these cases, the methodology we described would not be possible to implement.


CHAPTER 4
ESTIMATING THE EFFECT OF CLUSTER-LEVEL ADHERENCE ON AN INDIVIDUAL BINARY OUTCOME WITH A COMPLEX SAMPLING DESIGN

4.1 Outline

In this chapter, we wish to estimate the effect of school-level adherence on individual absenteeism in the context of a school-based water, sanitation, and hygiene intervention in Western Kenya. Schools within strata were disproportionately sampled and randomized to one of three interventions. Next, students within schools were disproportionately sampled and measured for outcomes such as absenteeism. We use double inverse-probability weighting to adjust for the disproportionate sampling and the association of individual-level confounders with randomization. We develop and apply methods based on structural nested models to estimate effects of adherence assessed in terms of relative risks, using school-level randomization as an instrumental variable and using the double weights to adjust for complex sampling and individual-level confounding.

4.2 Introduction

In this chapter, we want to estimate the effect of the intervention arm received by the school on pupil absenteeism in a cluster-randomized trial of the school-based water, sanitation and hygiene (SWASH) project conducted in Western Kenya from 2007 to 2008. The study was conducted in 86 schools within two areas in Western Kenya, Rachuonyo District and Suba District. The 86 schools were disproportionately sampled and randomized to one of three interventions; the intervention arms are

1. HP&WT: the schools are provided with hygiene promotion and water treatment; here hygiene promotion is considered as providing soap.

2. HP&WT + sanitation: the schools are provided not only with hygiene promotion and water treatment, but also with a sanitation intervention, such as latrine construction.

3. Control: the schools are provided none of the above interventions during the study period. They receive the full intervention package at the end of the study.


Students within schools were disproportionately sampled into the study. The study staff then recorded whether students were absent from school on the day of measurement, along with some individual-level and household-level characteristics. Thus, the final dataset consists of 1029 pupils from 86 schools within Rachuonyo District and Suba District in Western Kenya. In this chapter, we apply the instrumental variable method with a structural nested model to the SWASH project.

4.2.1 Instrumental Variable

The method of instrumental variables (IV) is widely used in statistics, econometrics, epidemiology, genetics and related disciplines. The concept of IVs was first introduced by Wright [66] in 1928 (see [59] and [3]). Wright investigated the elasticities of supply and demand for products such as flaxseed (the source of linseed oil). In the book, he explained how to use curve shifters (now called IVs) to address the existence of confounding: "Such additional factors may be factors which (A) affect demand conditions without affecting cost conditions or which (B) affect cost conditions without affecting demand conditions." The IV method can be adapted to observational studies to adjust for unmeasured confounding. By comparison, most methods are used to control for measured confounders, which are known and included in the model as covariates.

After its invention, the IV method was mainly used in macroeconomics and microeconomics. In recent decades, IV methods began to enter the health sciences to estimate the causal effect of a treatment or gene on an outcome, adjusting for unmeasured confounding. Angrist et al. [4] outlined the framework of causal inference using IVs in the setting of a randomized treatment design when compliance is not ignorable. Greenland [21] gave epidemiologists a brief introduction to the IV method and its assumptions, with a causal diagram for a treatment randomization experiment. Hernán and Robins [30] discussed the IV method applied to estimate parameters of several causal models, including structural equation models and structural mean models. In the field of genetics, the IV method is


usually called Mendelian randomization [65] and has grown over the last 10 years. Recently, Glymour et al. [18] discussed approaches to evaluating the assumptions for a valid IV in a genome-wide association study.

In applications of the IV method, many researchers conduct two-stage least squares (2SLS) with the instrument, the endogenous variable and the outcome (see [64], [60], [50], [13], [5] and [6]). As the name implies, one conducts 2SLS IV by applying standard ordinary least squares (OLS) in two stages. In the first stage, one conducts an OLS regression of the endogenous variable on the instrument and computes the predicted endogenous variable for each subject. In the second stage, one applies an OLS regression of the outcome on the predicted endogenous variable. The estimated parameter in the second stage is the IV estimator. However, 2SLS IV does not always work; it can only be applied to certain linear causal models. For example,

$$Y = \beta X + U_1, \qquad X = \alpha Z + U_2, \qquad (4)$$

where $Y$ is the outcome, $Z$ is the instrumental variable, and $X$ is the endogenous variable. $Z$ is uncorrelated with the errors $U_1$ and $U_2$ (the exclusion assumption), and $Z$ is correlated with $X$. Bun and Windmeijer [16] compared the bias of the 2SLS estimator under several scenarios with different numbers and strengths of instruments. The 2SLS estimator can perform poorly in finite samples with weak instruments, many instruments, or both.

The advantage of IV is that it can be used to adjust for unmeasured confounding in observational studies. Several articles have compared IV methods with other commonly used methods, including intention-to-treat analysis, as-treated analysis, and per-protocol analysis ([38] and [61]). Many researchers recognize that the IV method relies strongly on its assumptions and discuss the limitations of IV (see [11] and [42]).
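The two-stage procedure described above can be illustrated with a short simulation. This is a minimal sketch under assumed values (a true effect of 2.0, a single continuous instrument, and a shared unmeasured confounder `u` that induces correlation between the two error terms); the function names and all numeric settings are hypothetical and chosen only for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z, one endogenous variable x, outcome y."""
    # Stage 1: regress X on Z and form the predicted endogenous variable.
    slope1 = ols_slope(z, x)
    x_hat = x.mean() + slope1 * (z - z.mean())
    # Stage 2: regress Y on the predicted endogenous variable.
    return ols_slope(x_hat, y)


def simulate(n=20000, beta=2.0, seed=1):
    """Linear model in which u confounds the X-Y relationship."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                    # instrument
    u = rng.normal(size=n)                    # unmeasured confounder
    x = 1.0 * z + u + rng.normal(size=n)      # first-stage equation
    y = beta * x + 3.0 * u + rng.normal(size=n)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print("naive OLS slope:", ols_slope(x, y))
    print("2SLS estimate:  ", two_stage_least_squares(z, x, y))
```

In this setup the naive OLS slope is pulled away from the assumed true effect by the confounder, while the 2SLS estimate stays close to it because the instrument is strong and the model is linear. The standard errors from a naive second-stage OLS fit would not be valid and are not computed here.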


[Figure: directed acyclic graph with arrows Z → A → Y, U → A, and U → Y.]

Figure 4-1. DAG representing the causal relationship between variables.

Next, we describe the IV assumptions and a particular causal directed acyclic graph (DAG) commonly used to represent the IV method. Let $U$ be the unmeasured confounder, $Z$ the randomization of the treatment (the instrument), $A$ the endogenous variable (such as compliance), and $Y$ the observed outcome. We can then represent the relationship between the variables using a DAG, as in Figure 4-1 [21]. The assumptions of IV are stated in [24]:

1. Z is independent of U;

2. Z is associated with A;

3. Z is independent of Y given U and A.

We will discuss the modified assumptions for the application to the School-based Water, Sanitation, and Hygiene (SWASH) project in a later section.

4.2.2 Structural Nested Models

Structural nested models were first developed by Robins [56], and they can be adapted to many link functions, such as the linear link and the logit link. Vansteelandt and Goetghebeur [62] extended and applied structural nested models to some instrumental-variables settings, but not including complex sampling designs. Below is the notation we will use.

$i$: cluster in the data, $i = 1, \ldots, m$;

$j$: individual $j$ within school $i$, $j = 1, \ldots, n_i$;

$Y_{ij}$: binary individual-level outcome;

$Z_i$: cluster-level randomization;


$A_i$: cluster-level adherence; its degrees of freedom are no greater than the degrees of freedom of $Z_i$; otherwise, the structural nested model must have fewer parameters;
$U_i$: cluster-level unmeasured confounders;
$X_{ij}$: individual-level confounders;
$Y_{ij}(a)$: potential outcome under adherence $A_i = a$.

When the cluster-level adherence variable $A_i$ is one-dimensional (dichotomous), a structural nested model involves potential outcomes, and the simple linear model is
\[ E(Y_{ij}(a) \mid A_i = a, Z_i) = E(Y_{ij}(0) \mid A_i = a, Z_i) + a\psi, \]
where $E(Y_{ij}(a) - Y_{ij}(0) \mid A_i = a) = a\psi$, especially when $a = 1$, is the risk difference among individuals with observed adherence equal to $a$. Here we assume that the cluster-level randomization $Z_i$ has no direct effect on the individual binary outcome $Y_{ij}$ except through the cluster-level adherence $A_i$, which is also the exclusion assumption considered by Angrist et al. [4] (refer to Figure 4-1).

The linear structural nested model works well for some cases of dichotomous $A_i$; we can use linear structural nested models even with binary outcomes, provided that the model is saturated as a function of adherence. But for some choices of $A_i$, the linear model above might be poorly specified, and we turn to logistic or logarithmic structural nested models. In the SWASH study, $A_i$ is ordinal, so the linear model might predict that $E(Y_{ij}(0) \mid A_i = a)$ is negative or greater than one, which is undesirable with a binary outcome $Y_{ij}$. In this case, we appeal to a link function as a solution, and we consider the logit link instead of the linear link.

Before setting up the logit structural nested model, we denote by $\psi$ the effect of adherence, and let $\mu_0 = E(Y_{ij}(0))$. Also let $\pi(A_i, Z_i; \beta)$ be a parametric model for $E(Y_{ij} \mid A_i, Z_i)$ with parameter $\beta$; here we let
\[ \pi(A_i, Z_i; \beta) = h^{-1}(A_i\beta_1 + Z_i\beta_2 + A_iZ_i\beta_3), \]
where the link $h(p) = \log\{p/(1-p)\}$ is the logit transform and $A_iZ_i$ represents a multidimensional interaction.


Figure 4-2. DAG representing the causal relationships between the variables X, Z, A, Y, and U (with individual-level confounders).

Then the structural nested model can be set up as
\[ h\big(E(Y_{ij}(a) \mid A_i = a, Z_i)\big) = h\big(E(Y_{ij}(0) \mid A_i = a, Z_i)\big) + a\psi, \]
where $\psi$ represents the effects of adherence expressed as log odds ratios.

4.3 Methods

In this chapter, we show how to adapt the structural nested model to the instrumental-variable setting with complex survey data.

4.3.1 Estimation with complex sampling designs

In the SWASH study, we consider the individual-level covariates $X_{ij}$, which may confound the estimated effect of intervention arm reception on the binary outcome. We therefore construct confounding weights $w^c_{ij}$ to adjust for the effect of individual-level confounders. Assume that the treatment arm is independent of the potential baseline outcome given the individual-level confounders $X_{ij}$, i.e. $Z_i \perp Y_{ij}(0) \mid X_{ij}$, so that $X_{ij}$ forms a sufficient set of confounders. The confounding weight $w^c_{ij}$ is then estimated as the inverse of the probability that individual $j$ receives treatment $Z_i$ given the individual-level confounders $X_{ij}$. We can therefore use a new DAG (see Figure 4-2) that includes the individual-level confounders $X_{ij}$ to represent the causal relationships. The three IV assumptions regarding the new DAG can be expressed as

1. $Z$ is independent of $U$;
2. $Z$ is associated with $A$;
3. $Z$ is independent of $Y$ given $U$, $X$, and $A$.
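The logit structural nested model above says that moving between the baseline potential outcome and the potential outcome under adherence $a$ is a shift of $a\psi$ on the logit scale. A minimal Python sketch of that mapping follows; the function names are hypothetical, and the value $\psi = \log 2$ is used only as an illustration (it matches the value chosen later in the simulation study).

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def potential_from_baseline(p0, a, psi):
    """E(Y(a) | A=a, Z) implied by the baseline mean p0 = E(Y(0) | A=a, Z)."""
    return expit(logit(p0) + a * psi)

def baseline_from_potential(pa, a, psi):
    """Invert the shift: recover E(Y(0) | A=a, Z) from E(Y(a) | A=a, Z)."""
    return expit(logit(pa) - a * psi)

psi = math.log(2)                              # illustrative log odds ratio
p1 = potential_from_baseline(0.25, 1, psi)     # odds 1/3 doubled to 2/3
p0 = baseline_from_potential(p1, 1, psi)       # shifts back to the baseline
```

Because the shift acts on the logit scale, the implied probabilities always stay strictly between 0 and 1, which is the motivation given above for preferring the logit link with ordinal adherence.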


For the logit structural nested model, we can then estimate $\psi$ by solving the following estimating equations:
\[ \sum_i\sum_j w_{ij} Z_i^T \Big( h^{-1}\big( h(\pi(A_i, Z_i; \beta)) - A_i\psi \big) - \mu_0 \Big) = 0, \]
\[ \sum_i\sum_j w_{ij} D_i^T \big( Y_{ij} - \pi(A_i, Z_i; \beta) \big) = 0, \]
where $D_i^T$ is a function of $A_i$ and $Z_i$, e.g. $(A_i, Z_i, A_iZ_i)^T$, and $w_{ij}$ is the double weight for individual $j$ within cluster $i$, which is the product of the complex sampling weight and the confounding weight $w^c_{ij}$. Assuming that the weights $w_{ij}$, the structural nested model, and $\pi(\cdot)$ are correctly specified, these estimating equations yield a consistent estimator of $\psi$, because $Y_{ij}(0)$ is independent of $Z_i$ under ideal (no confounding and equal probability) sampling, and
\[ E_{A_i \mid Z_i}\Big[ h^{-1}\big( h(E(Y_{ij}(a) \mid A_i = a, Z_i)) - A_i\psi \big) \Big] = E(Y_{ij}(0) \mid Z_i) = \mu_0. \]

Using a Taylor expansion, the expit link $h^{-1}$ can be linearized as
\[ h^{-1}(l_i - A_i\psi^{(n+1)}) \approx h^{-1}(l_i - A_i\psi^{(n)}) + \left.\frac{\partial h^{-1}(l_i - A_i\psi)}{\partial \psi}\right|_{\psi = \psi^{(n)}} (\psi^{(n+1)} - \psi^{(n)}) = \tilde{Y}_i - \tilde{A}_i\psi^{(n+1)}, \]
where
\[ \tilde{Y}_i = \operatorname{expit}(l_i - A_i\psi^{(n)}) + \frac{\exp(l_i - A_i\psi^{(n)})}{(1 + \exp(l_i - A_i\psi^{(n)}))^2} A_i\psi^{(n)}, \qquad \tilde{A}_i = \frac{\exp(l_i - A_i\psi^{(n)})}{(1 + \exp(l_i - A_i\psi^{(n)}))^2} A_i. \]
After linearization, the first estimating equation can be written as
\[ \sum_i\sum_j w_{ij} Z_i^T (\tilde{Y}_i - \tilde{A}_i\psi - \mu_0) = 0. \]
Then, with updated $\psi$ and $\mu_0$, the first estimating equation can be solved using iterative standard weighted instrumental variables (IV) estimation. The second estimating equation can be solved using weighted generalized linear model regression.


Next, we estimate $E(Y_{ij}(a) \mid A_i = a, Z_i)$ by the weighted mean of the individual outcomes $Y_{ij}$ conditional on $Z_i$ and $A_i$, which is
\[ \hat{Y}_{\{A_i = a, Z_i = z\}}(a) = \frac{1}{\sum_i\sum_j w_{ij} I\{A_i = a, Z_i = z\}} \sum_i\sum_j w_{ij} Y_{ij} I\{A_i = a, Z_i = z\}. \]
Then we derive the estimator of $E(Y_{ij}(0) \mid A_i = a, Z_i)$ using the estimated $\hat{\psi}$, the structural nested model, and the estimator of $E(Y_{ij}(a) \mid A_i = a, Z_i)$, i.e. $\hat{Y}_{ij}(0) = h^{-1}(h(Y_{ij}) - a\hat{\psi})$ and
\[ \hat{Y}_{\{A_i = a, Z_i = z\}}(0) = \frac{1}{\sum_i\sum_j w_{ij} I\{A_i = a, Z_i = z\}} \sum_i\sum_j w_{ij} \hat{Y}_{ij}(0) I\{A_i = a, Z_i = z\}. \]
We use the relationships $E(Y_{ij}(a) \mid A_i = a) = E_{Z_i \mid A_i} E(Y_{ij}(a) \mid A_i = a, Z_i)$ and $E(Y_{ij}(0) \mid A_i = a) = E_{Z_i \mid A_i} E(Y_{ij}(0) \mid A_i = a, Z_i)$ to estimate the relative risk (RR) and risk difference (RD) for each level of $A_i$, i.e.
\[ RR(A_i = a) = \frac{E(Y_{ij}(a) \mid A_i = a)}{E(Y_{ij}(0) \mid A_i = a)}, \]
\[ RD(A_i = a) = E(Y_{ij}(a) \mid A_i = a) - E(Y_{ij}(0) \mid A_i = a). \]
Please refer to Appendix D for the detailed calculation.

4.3.2 Variance Estimation

The method above mainly focuses on solving an unbiased estimating equation of the form $U(\theta) = \sum_{p=1}^{P}\sum_{c=1}^{C_p} U_{pc}(\theta) = 0$, where $\theta$ is the vector of parameters of interest, $c$ indexes groups that play the role of primary sampling units (in the SWASH study, $c$ refers to schools), $p$ indexes the primary strata (in the SWASH study, $p$ refers to the two district areas), and $U_{pc}(\theta)$ is a sum of weighted estimating equations with the weighted components


each having a zero expectation. Let $\nabla U(\hat{\theta})$ be the gradient of $U(\hat{\theta})$, and let
\[ \Omega_p = \sum_{c=1}^{C_p} \big(U_{pc}(\hat{\theta}) - \bar{U}_p(\hat{\theta})\big)\big(U_{pc}(\hat{\theta}) - \bar{U}_p(\hat{\theta})\big)^T, \qquad V(\hat{\theta}) = \sum_{p=1}^{P} \frac{C_p}{C_p - 1}\,\Omega_p, \]
where $\bar{U}_p(\hat{\theta})$ is the mean of $\{U_{p1}(\hat{\theta}), \ldots, U_{pC_p}(\hat{\theta})\}$. We then use the sandwich estimator based on Taylor series (Binder [9]) to estimate the variance $\mathrm{Var}(\hat{\theta})$:
\[ \widehat{\mathrm{Var}}(\hat{\theta}) = [\nabla U(\hat{\theta})]^{-1}\, V(\hat{\theta})\, [\nabla U(\hat{\theta})^T]^{-1}. \]
With the estimated variance of the parameters $\widehat{\mathrm{Var}}(\hat{\theta})$, we can estimate the variances of the relative risks, $\widehat{\mathrm{Var}}(RR(A_i = 1))$ and $\widehat{\mathrm{Var}}(RR(A_i = 2))$, through the delta method [17]. The confidence intervals for the relative risks then follow directly. Please refer to Appendix E for the detailed variance estimation calculation.

4.4 Simulation Study

In this section, we discuss the validity of IV on the structural nested model.

4.4.1 Simulation with strong instrument

Suppose the size of the dataset is 10,000. Let $Z$ be a 3-category variable taking the values 0, 1, and 2 with equal probabilities. We then generate another 3-category variable $A$ that depends on $Z$ with unequal probabilities. That is, for $Z = 0$,

Value of A   Probability
0            3/4
1            1/8
2            1/8

For $Z = 1$,


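Value of A   Probability
0            1/8
1            3/4
2            1/8

For $Z = 2$,

Value of A   Probability
0            1/8
1            1/8
2            3/4

Then the structural nested model can be set up as
\[ \operatorname{logit}\big(E(Y(a) \mid A = a, Z)\big) = \operatorname{logit}\big(E(Y(0) \mid A = a, Z)\big) + a\psi, \]
where we set $\psi = \log 2$. The probabilities of $Y(0)$ can be set up as the following 3 × 3 table.

Table 4-1. Distribution of Y(0) | A = a, Z.

P(Y(0) | A = a, Z)   Z = 0   Z = 1   Z = 2
A = 0                1/5     1/4     1/3
A = 1                1/4     1/5     1/4
A = 2                1/3     1/3     1/5

According to the distribution in Table 4-1, we can verify that $Y(0) \perp Z$ in mean, i.e.
\[ E(Y(0) \mid Z = 0) = \tfrac{1}{5}\cdot\tfrac{3}{4} + \tfrac{1}{4}\cdot\tfrac{1}{8} + \tfrac{1}{3}\cdot\tfrac{1}{8} = 0.2229, \]
\[ E(Y(0) \mid Z = 1) = \tfrac{1}{4}\cdot\tfrac{1}{8} + \tfrac{1}{5}\cdot\tfrac{3}{4} + \tfrac{1}{3}\cdot\tfrac{1}{8} = 0.2229, \]
\[ E(Y(0) \mid Z = 2) = \tfrac{1}{3}\cdot\tfrac{1}{8} + \tfrac{1}{4}\cdot\tfrac{1}{8} + \tfrac{1}{5}\cdot\tfrac{3}{4} = 0.2229. \]
Therefore $E(Y(0) \mid Z = 0) = E(Y(0) \mid Z = 1) = E(Y(0) \mid Z = 2)$. Then, using the structural nested model, the 3 × 3 table for the probability of $Y(a)$ is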


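Table 4-2. Distribution of Y(a) | A = a, Z.

P(Y(a) | A = a, Z)   Z = 0   Z = 1   Z = 2
A = 0                1/5     1/4     1/3
A = 1                2/5     1/3     2/5
A = 2                2/3     2/3     1/2

Therefore we can generate the binary outcome $Y$ according to Table 4-2 and then calculate the conditional expectations from Table 4-1 and Table 4-2:
\[ E(Y(0) \mid A = 1) = \tfrac{1}{4}\cdot\tfrac{1}{8} + \tfrac{1}{5}\cdot\tfrac{3}{4} + \tfrac{1}{4}\cdot\tfrac{1}{8} = 0.2125, \]
\[ E(Y(0) \mid A = 2) = \tfrac{1}{3}\cdot\tfrac{1}{8} + \tfrac{1}{3}\cdot\tfrac{1}{8} + \tfrac{1}{5}\cdot\tfrac{3}{4} = 0.2333, \]
\[ E(Y(1) \mid A = 1) = \tfrac{2}{5}\cdot\tfrac{1}{8} + \tfrac{1}{3}\cdot\tfrac{3}{4} + \tfrac{2}{5}\cdot\tfrac{1}{8} = 0.35, \]
\[ E(Y(2) \mid A = 2) = \tfrac{2}{3}\cdot\tfrac{1}{8} + \tfrac{2}{3}\cdot\tfrac{1}{8} + \tfrac{1}{2}\cdot\tfrac{3}{4} = 0.5417. \]
Therefore the RRs and RDs for each level of $A$ are
\[ RR(A = 1) = \frac{E(Y(1) \mid A = 1)}{E(Y(0) \mid A = 1)} \approx 1.647, \qquad RR(A = 2) = \frac{E(Y(2) \mid A = 2)}{E(Y(0) \mid A = 2)} \approx 2.321, \]
\[ RD(A = 1) = E(Y(1) \mid A = 1) - E(Y(0) \mid A = 1) = 0.1375, \qquad RD(A = 2) = E(Y(2) \mid A = 2) - E(Y(0) \mid A = 2) = 0.3083. \]
We repeat the simulation 100 times. The means of the estimated RRs and RDs are shown in Table 4-3.

Table 4-3. Simulation results for the IV method on a structural nested model with a strong instrument.

               RR(A=1)   RR(A=2)   RD(A=1)   RD(A=2)
True value     1.647     2.321     0.1375    0.3083
IV estimator   1.657     2.331     0.1379    0.3079

Comparing the simulation results (see Table 4-3), the estimators are quite close to the true values. We can see that IV on the structural nested model with complex survey data performs very well.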
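The true values in the strong-instrument design can be recomputed exactly from the tabulated probabilities. The following Python sketch (not part of the original SAS analysis) uses exact fractions; the variable and function names are hypothetical, and the only inputs are the tables above together with $\psi = \log 2$ and $P(Z = z) = 1/3$.

```python
from fractions import Fraction as F

# p_a_given_z[z][a] = P(A = a | Z = z), strong-instrument design
p_a_given_z = [[F(3, 4), F(1, 8), F(1, 8)],
               [F(1, 8), F(3, 4), F(1, 8)],
               [F(1, 8), F(1, 8), F(3, 4)]]
# p_y0[a][z] = P(Y(0) = 1 | A = a, Z = z), Table 4-1
p_y0 = [[F(1, 5), F(1, 4), F(1, 3)],
        [F(1, 4), F(1, 5), F(1, 4)],
        [F(1, 3), F(1, 3), F(1, 5)]]

def shift(p, a):
    """Apply the logit model with psi = log 2: odds are multiplied by 2**a."""
    odds = p / (1 - p) * 2 ** a
    return odds / (1 + odds)

# Table 4-2 follows from Table 4-1 and the structural nested model.
p_ya = [[shift(p_y0[a][z], a) for z in range(3)] for a in range(3)]

def mean_y0_given_z(z):
    return sum(p_a_given_z[z][a] * p_y0[a][z] for a in range(3))

def mean_given_a(table, a):
    # P(Z = z | A = a) is proportional to P(A = a | Z = z) since P(Z = z) = 1/3.
    wts = [p_a_given_z[z][a] for z in range(3)]
    return sum(w * table[a][z] for z, w in enumerate(wts)) / sum(wts)

rr = {a: mean_given_a(p_ya, a) / mean_given_a(p_y0, a) for a in (1, 2)}
rd = {a: mean_given_a(p_ya, a) - mean_given_a(p_y0, a) for a in (1, 2)}
```

With exact arithmetic, `mean_y0_given_z(z)` is the same fraction for all three values of `z`, and `rr` and `rd` agree with the true values reported in Table 4-3 after rounding.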


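4.4.2 Simulation with weak instrument

As in Section 4.4.1, suppose the size of the dataset is 10,000. Let $Z$ be a 3-category variable taking the values 0, 1, and 2 with equal probabilities. We then generate another 3-category variable $A$ that depends on $Z$ with unequal probabilities. That is, for $Z = 0$,

Value of A   Probability
0            1/6
1            1/3
2            1/2

For $Z = 1$,

Value of A   Probability
0            1/2
1            1/6
2            1/3

For $Z = 2$,

Value of A   Probability
0            1/3
1            1/2
2            1/6

The probabilities of $Y(0)$ and $Y(a)$ are set up as in Section 4.4.1 (see Table 4-1 and Table 4-2). Similarly, we can verify that $Y(0) \perp Z$ in mean, i.e.
\[ E(Y(0) \mid Z = 0) = \tfrac{1}{5}\cdot\tfrac{1}{6} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{3}\cdot\tfrac{1}{2} = 0.2833, \]
\[ E(Y(0) \mid Z = 1) = \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{5}\cdot\tfrac{1}{6} + \tfrac{1}{3}\cdot\tfrac{1}{2} = 0.2833, \]
\[ E(Y(0) \mid Z = 2) = \tfrac{1}{3}\cdot\tfrac{1}{2} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{5}\cdot\tfrac{1}{6} = 0.2833. \]
Therefore $E(Y(0) \mid Z = 0) = E(Y(0) \mid Z = 1) = E(Y(0) \mid Z = 2)$.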
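As a cross-check of the weak-instrument design, $E(Y(0) \mid Z = z)$ can be recomputed by pairing each tabulated $P(A = a \mid Z = z)$ with the matching cell of Table 4-1. The Python sketch below (names hypothetical) takes the three conditional distributions of $A$ exactly as tabulated in this section. Read that way, the check gives 17/60 for $Z = 0$ but 97/360 for $Z = 1$ and $Z = 2$, so the weights in the displayed calculations for $Z = 1$ and $Z = 2$ do not appear to match the tabulated rows; this may be a transcription issue in the tables or the displays rather than a property of the simulation that was actually run.

```python
from fractions import Fraction as F

# p_a_given_z[z][a] = P(A = a | Z = z), as tabulated for the weak instrument
p_a_given_z = [[F(1, 6), F(1, 3), F(1, 2)],
               [F(1, 2), F(1, 6), F(1, 3)],
               [F(1, 3), F(1, 2), F(1, 6)]]
# p_y0[a][z] = P(Y(0) = 1 | A = a, Z = z), Table 4-1
p_y0 = [[F(1, 5), F(1, 4), F(1, 3)],
        [F(1, 4), F(1, 5), F(1, 4)],
        [F(1, 3), F(1, 3), F(1, 5)]]

def mean_y0_given_z(z):
    """Average the Z = z column of Table 4-1 over P(A = a | Z = z)."""
    return sum(p_a_given_z[z][a] * p_y0[a][z] for a in range(3))

means = [mean_y0_given_z(z) for z in range(3)]
```

The later conditional expectations given $A$ use the column weights $P(A = a \mid Z = z)$ across $z$, which are consistent with the tables as read here.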


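Therefore we can generate the binary outcome $Y$ according to Table 4-2 and then calculate the conditional expectations from Table 4-1 and Table 4-2:
\[ E(Y(0) \mid A = 1) = \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{5}\cdot\tfrac{1}{6} + \tfrac{1}{4}\cdot\tfrac{1}{2} = 0.2417, \]
\[ E(Y(0) \mid A = 2) = \tfrac{1}{3}\cdot\tfrac{1}{2} + \tfrac{1}{3}\cdot\tfrac{1}{3} + \tfrac{1}{5}\cdot\tfrac{1}{6} = 0.3111, \]
\[ E(Y(1) \mid A = 1) = \tfrac{2}{5}\cdot\tfrac{1}{3} + \tfrac{1}{3}\cdot\tfrac{1}{6} + \tfrac{2}{5}\cdot\tfrac{1}{2} = 0.3889, \]
\[ E(Y(2) \mid A = 2) = \tfrac{2}{3}\cdot\tfrac{1}{2} + \tfrac{2}{3}\cdot\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{6} = 0.6389. \]
Therefore the RRs and RDs for each level of $A$ are
\[ RR(A = 1) = \frac{E(Y(1) \mid A = 1)}{E(Y(0) \mid A = 1)} \approx 1.609, \qquad RR(A = 2) = \frac{E(Y(2) \mid A = 2)}{E(Y(0) \mid A = 2)} \approx 2.054, \]
\[ RD(A = 1) = E(Y(1) \mid A = 1) - E(Y(0) \mid A = 1) \approx 0.147, \qquad RD(A = 2) = E(Y(2) \mid A = 2) - E(Y(0) \mid A = 2) \approx 0.328. \]
We repeat the simulation 100 times. The means of the estimated RRs and RDs are shown in Table 4-4.

Table 4-4. Simulation results for the IV method on a structural nested model with a weak instrument.

               RR(A=1)   RR(A=2)   RD(A=1)   RD(A=2)
True value     1.609     2.054     0.147     0.328
IV estimator   1.831     2.454     0.172     0.376

Comparing the simulation results (see Table 4-4), we can see that the biases are relatively small. We conclude that IV on the structural nested model with a weak instrument also performs well.

4.5 Results on School-based Water, Sanitation, and Hygiene (SWASH) Project

First, we use the notation below to index the variables used for analysis in the study.

$i$: schools in the SWASH study, $i = 1, \ldots, 86$;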


$j$: individual pupil within school $i$;
$Y_{ij}$: pupil absenteeism, which is a binary individual-level outcome;
$Z_i$: intervention arm, which is the cluster-level randomization. We use 0 for the control group, 1 for the HP&WT group, and 2 for the HP&WT + sanitation group;
$A_i$: adherence to the intervention, which is considered cluster-level adherence. Here the adherence is not assigned according to $Z_i$. Since some schools were observed with safe water but no soap, we simply let 0 refer to schools that had the lowest levels of safe water, soap, and sanitation (better latrine construction), 1 refer to the group that has only one treatment (safe water, soap, or latrine construction), and 2 refer to the group that has at least two treatments; the construction of $A_i$ is described in the next paragraph;
$U_i$: cluster-level unmeasured confounders;
$X_{ij}$: individual-level confounders, including educational level, age, and gender;
$w_{ij}$: the double weight for individual $j$ within cluster $i$, which is the product of the complex sampling weight and the confounding weight. The confounding weight represents the estimated inverse probability of treatment as a function of individual-level covariates.

Note that $Z_i$ and $A_i$ are three-level categorical variables. We generate three new binary variables $C_1$, $C_2$, and $C_3$, with $C_1$ representing whether soap was supplied on the day of measurement, $C_2$ representing whether the school had a satisfactory ratio of latrines to students on the day of measurement, and $C_3$ indicating whether safe water was supplied on the day of measurement. Next, we categorize the cluster-level adherence into three levels represented by $A_i$. Set $A_i = 0$ if $C_1 + C_2 + C_3 = 0$, $A_i = 1$ if $C_1 + C_2 + C_3 = 1$, and $A_i = 2$ if $C_1 + C_2 + C_3 = 2$ or $3$. We then generate two binary variables corresponding to $A_i$, i.e. $A_{i1} = 1$ if $A_i = 1$ and $A_{i2} = 1$ if $A_i = 2$. Similarly, set $Z_{i1} = 1$ if $Z_i = 1$ and $Z_{i2} = 1$ if $Z_i = 2$. We model the adherence effects using two dimensions, i.e. $\psi = (\psi_1, \psi_2)^T$. We thereby replace the multi-dimensional variables $Z_i$ and $A_i$ with a group of one-dimensional variables ($Z_{i1}$, $Z_{i2}$, $A_{i1}$, and $A_{i2}$).

Now we apply the IV method with the logit-form structural nested model to the SWASH study. We estimate $\psi$ in the structural nested model by solving the estimating


equations using SAS software (Version 9.2). The estimates are $\hat{\psi}_1 = 1.3474$ and $\hat{\psi}_2 = 0.4749$. The estimated relative risks are $\widehat{RR}(A_i = 1) = 0.4086$ and $\widehat{RR}(A_i = 2) = 0.6938$. The corresponding 95% confidence intervals are (0.19, 0.86) and (0.41, 1.17).

4.6 Conclusions

We developed and implemented a methodology to estimate the effect of school-level adherence on individual absenteeism in the SWASH trial in Western Kenya. Specifically, we extended instrumental variable (IV) methods based on structural nested models to accommodate data from complex sampling designs. We also considered ordinal compliance in a three-armed trial; most previous research considers binary compliance and two-armed trials.

Our results indicate that, within the group of schools that adopted exactly one of the three provisions (soap, sanitation, or safe water), adopting that provision versus adopting no provision corresponds to an estimated relative risk of absenteeism of 0.41, that is, a risk about 41% of the risk under no provision; the 95% confidence interval for this relative risk is (0.19, 0.86). Further, within the group of schools that adopted two or three provisions, adopting two or more of the three provisions versus adopting no provision corresponds to an estimated relative risk of 0.69; the 95% confidence interval for this relative risk is (0.41, 1.17). Thus, our results suggest that the provisions are effective in reducing absenteeism.


APPENDIX A
PROOF OF LEMMA 1 (CHAPTER 2)

Lemma 1. For $C_i \neq 0_{p \times 1}$,
\[ \lim_{k \to \infty} \frac{\sum_{r=0}^{k} \binom{k}{r}^2 (k - r)\, e^{(k-r) C_i^T\beta}}{\sum_{r=0}^{k} \binom{k}{r}^2 k\, e^{(k-r) C_i^T\beta}} = \frac{e^{C_i^T\beta/2}}{1 + e^{C_i^T\beta/2}}. \]

Proof. Let $y = C_i^T\beta$ and $g_k(y) = \sum_{r=0}^{k} \binom{k}{r}^2 e^{(k-r)y}$. The left-hand side of the limit in Lemma 1 may be expressed as
\[ \lim_{k \to \infty} \frac{\sum_{r=0}^{k} \binom{k}{r}^2 (k - r)\, e^{(k-r)y}}{\sum_{r=0}^{k} \binom{k}{r}^2 k\, e^{(k-r)y}} = \lim_{k \to \infty} \frac{g_k'(y)}{k\, g_k(y)}. \]
Now let $x = (1 + e^y)/(1 - e^y)$. Clearly, $|x| > 1$ for any $y \neq 0$, and
\[ g_k(y) = (1 + x)^{-k} \sum_{r=0}^{k} \binom{k}{r}^2 (x - 1)^{k-r}(x + 1)^r = 2^k (1 + x)^{-k} P_k(x), \]
and
\[ g_k'(y) = 2^k (1 + x)^{-k} \left\{ \frac{-k}{1 + x} P_k(x) + P_k'(x) \right\} \frac{dx}{dy}, \]
where $P_k(x) = 2^{-k} \sum_{r=0}^{k} \binom{k}{r}^2 (x - 1)^{k-r}(x + 1)^r$ is the Legendre polynomial of degree $k$. The Legendre polynomial $P_k(x)$ has the following well-known properties for all $|x| > 1$:

(a) $P_k^2(x) - P_{k+1}(x) P_{k-1}(x) \le 0$,
(b) $(k + 1) P_{k+1}(x) - (2k + 1)\, x P_k(x) + k P_{k-1}(x) = 0$ (Bonnet's recursion formula),
(c) $(1 - x^2) P_k'(x) = (k + 1)\{x P_k(x) - P_{k+1}(x)\}$.

Since $P_k(x) > 0$ for $x > 1$ and $P_k(x) = (-1)^k P_k(-x)$ for $x < -1$, we may easily see from (a) that $P_{k+1}(x)/P_k(x) \ge P_k(x)/P_{k-1}(x) > 0$ for $x > 1$, and $P_{k+1}(x)/P_k(x) \le P_k(x)/P_{k-1}(x) < 0$ for $x < -1$. From (b), we see that
\[ \frac{P_{k+1}(x)}{P_k(x)} = \frac{2k + 1}{k + 1}\, x - \frac{k}{k + 1}\, \frac{P_{k-1}(x)}{P_k(x)} \; \begin{cases} < 2x & \text{if } x > 1, \\ > 2x & \text{if } x < -1, \end{cases} \]


which means the sequence $P_{k+1}(x)/P_k(x)$ is a non-decreasing positive sequence with upper bound $2x$ if $x > 1$, and a non-increasing negative sequence with lower bound $2x$ if $x < -1$. Let $a_P = \lim_{k \to \infty} P_{k+1}(x)/P_k(x)$. Dividing the equality (b) by $k P_k(x)$ and letting $k \to \infty$, we have that $a_P - 2x + 1/a_P = 0$, which gives $a_P = x \pm \sqrt{x^2 - 1}$. That is, the sequence $P_{k+1}(x)/P_k(x)$ converges to
\[ \lim_{k \to \infty} \frac{P_{k+1}(x)}{P_k(x)} = \begin{cases} x + \sqrt{x^2 - 1} & \text{if } x > 1, \\ x - \sqrt{x^2 - 1} & \text{if } x < -1. \end{cases} \]
As a result, the limit in Lemma 1 may be written as
\[ \lim_{k \to \infty} \frac{g_k'(y)}{k\, g_k(y)} = \lim_{k \to \infty} \frac{\left\{ \frac{-k}{1 + x} P_k(x) + P_k'(x) \right\} \frac{dx}{dy}}{k P_k(x)} = \left[ \frac{-1}{1 + x} + \frac{1}{x^2 - 1} \left\{ \lim_{k \to \infty} \frac{P_{k+1}(x)}{P_k(x)} - x \right\} \right] \frac{x^2 - 1}{2} \]
\[ = \begin{cases} \tfrac{1}{2}\big(1 - x + \sqrt{x^2 - 1}\big) & \text{if } x > 1, \\ \tfrac{1}{2}\big(1 - x - \sqrt{x^2 - 1}\big) & \text{if } x < -1, \end{cases} \; = \frac{e^{y/2}}{1 + e^{y/2}} = \frac{e^{C_i^T\beta/2}}{1 + e^{C_i^T\beta/2}}. \]
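The limit in Lemma 1 can be checked numerically for a fixed value of $y = C_i^T\beta$. The Python sketch below is only a sanity check, not part of the proof; the function names are hypothetical, and the sums are evaluated on the log scale (via `math.lgamma`) to avoid overflow for large $k$.

```python
import math

def log_binom(k, r):
    """Natural log of the binomial coefficient C(k, r)."""
    return math.lgamma(k + 1) - math.lgamma(r + 1) - math.lgamma(k - r + 1)

def lemma1_ratio(k, y):
    """sum_r C(k,r)^2 (k-r) e^{(k-r)y} divided by k * sum_r C(k,r)^2 e^{(k-r)y}."""
    logs = [2 * log_binom(k, r) + (k - r) * y for r in range(k + 1)]
    top = max(logs)
    wts = [math.exp(v - top) for v in logs]   # rescaled terms, largest is 1
    num = sum(w * (k - r) for r, w in enumerate(wts))
    return num / (k * sum(wts))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

y = 1.0
ratio_small = lemma1_ratio(1, y)      # for k = 1 the ratio equals expit(y)
ratio_large = lemma1_ratio(2000, y)   # should be close to expit(y / 2)
limit = expit(y / 2)
```

For $k = 1$ the ratio is $e^y/(1 + e^y)$, and as $k$ grows it moves toward $e^{y/2}/(1 + e^{y/2})$, in line with the lemma.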


APPENDIX B
PROOF OF LEMMA 2 (CHAPTER 2)

Lemma 2. Without loss of generality, suppose $C_{is} > 0$, $s = 1, \ldots, p$. When $\beta_s \ge 0$,
\[ \frac{\sum_{r=0}^{k} \binom{k}{r}^2 (k - r)\, e^{(k-r) C_{is}\beta_s}}{\sum_{r=0}^{k} \binom{k}{r}^2 k\, e^{(k-r) C_{is}\beta_s}} \]
is non-increasing as $k \to \infty$; when $\beta_s < 0$, the same ratio is non-decreasing as $k \to \infty$.

Proof. From the argument in Appendix A,
\[ \frac{\sum_{r=0}^{k} \binom{k}{r}^2 (k - r)\, e^{(k-r) C_i^T\beta}}{\sum_{r=0}^{k} \binom{k}{r}^2 k\, e^{(k-r) C_i^T\beta}} = \left[ \frac{-1}{1 + x} + \frac{k + 1}{k (x^2 - 1)} \left\{ \frac{P_{k+1}(x)}{P_k(x)} - x \right\} \right] \frac{x^2 - 1}{2}, \]
where $x = (1 + e^{C_i^T\beta})/(1 - e^{C_i^T\beta})$, so the monotonicity of this ratio is the same as that of the sequence
\[ h_k(x) = \frac{k + 1}{k} \left\{ \frac{P_{k+1}(x)}{P_k(x)} - x \right\}. \]
After some algebra, we have
\[ h_k(x) - h_{k+1}(x) = \frac{k + 1}{k} \left\{ \frac{P_{k+1}(x)}{P_k(x)} - x \right\} - \frac{k + 2}{k + 1} \left\{ \frac{P_{k+2}(x)}{P_{k+1}(x)} - x \right\} \]
\[ = \frac{k + 1}{k} \left\{ \frac{P_{k+1}(x)}{P_k(x)} - x \right\} - \frac{k + 2}{k + 1} \left[ \frac{\frac{1}{k + 2}\{(2k + 3)\, x P_{k+1}(x) - (k + 1) P_k(x)\}}{P_{k+1}(x)} - x \right] \]
\[ = \frac{(k + 1) P_{k+1}^2(x) - (2k + 1)\, x P_k(x) P_{k+1}(x) + k P_k^2(x)}{k P_k(x) P_{k+1}(x)} = \frac{P_k^2(x) - P_{k+1}(x) P_{k-1}(x)}{P_k(x) P_{k+1}(x)}, \]
where the second and last equalities are due to property (b) in Appendix A. Now, because of the inequality (a) in Appendix A and the fact that $P_k(x) P_{k+1}(x) > 0$ for $x > 1$ and $P_k(x) P_{k+1}(x) = (-1)^{2k+1} P_k(-x) P_{k+1}(-x) < 0$ for $x < -1$, we see that $h_k(x) - h_{k+1}(x) \le 0$ for $x > 1$ and $\ge 0$ for $x < -1$. That is, the ratio above is non-decreasing if $x > 1$, i.e. $C_i^T\beta < 0$, and non-increasing if $x < -1$, i.e. $C_i^T\beta > 0$.


APPENDIX C
SAS CODE (CHAPTER 3)

All programming was done using SAS software (version 9.2). We used SAS PROC SQL to create the paired dataset. Specifically,

    proc sql;
      /* pair each covered person with each uncovered person in the same household */
      create table match as
      select one.hhx, one.wtia as wtiapair, one.strat, one.psu,
             (one.educ - two.educ) as educd,
             (one.cover - two.cover) as coverd
      from nhis2009.personclean one, nhis2009.personclean two
      where (one.hhx = two.hhx and one.cover > two.cover);
    quit;

In the above code, nhis2009.personclean is the NHIS dataset that has been cleaned in the sense of deleting observations with missing data for education or coverage. The variable hhx identifies an individual's household. The variable wtia is the interim annual weight in the NHIS data, and it is identical for individuals within the same household. The variables strat and psu are the NHIS strata and PSU variables, and educ and cover are the variables we created for dichotomous education and health insurance coverage. The new dataset with paired observations is called match, and it has variables hhx, wtiapair, strat, psu, educd, and coverd. The variable educd is the differenced education variable, whereas coverd is equal to one for everyone. To create some observations with outcomes equal to zero, we used the SAS code

    data nhis2009.match;
      set match;
      /* flip the sign of the pair difference for a subset of households */
      if hhx < 1000 then do;
        coverd = 0;
        educd = -educd;
      end;
    run;

Finally, to implement the logistic regression, we used the SAS code

    proc surveylogistic data=nhis2009.match;
      strata strat;
      cluster psu;
      /* no-intercept logistic regression on the differenced covariate */
      model coverd(event='1') = educd / noint;
      weight wtiapair;
    run;


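APPENDIX D
ESTIMATION OF INSTRUMENTAL VARIABLE ON STRUCTURAL NESTED MODELS

Suppose $Z_i$ and $A_i$ are 3-category variables in the logit structural nested model of Chapter 4. First, we generate two new variables for each of $A_i$ and $Z_i$, i.e. $A_{i1}$, $A_{i2}$, $Z_{i1}$, and $Z_{i2}$. They are generated as follows:

$Z_{i1} = 0$ and $Z_{i2} = 0$ for $Z_i = 0$;
$Z_{i1} = 1$ and $Z_{i2} = 0$ for $Z_i = 1$;
$Z_{i1} = 0$ and $Z_{i2} = 1$ for $Z_i = 2$;
$A_{i1} = 0$ and $A_{i2} = 0$ for $A_i = 0$;
$A_{i1} = 1$ and $A_{i2} = 0$ for $A_i = 1$;
$A_{i1} = 0$ and $A_{i2} = 1$ for $A_i = 2$.

$\pi(A_i, Z_i; \beta)$ is defined as a parametric model for $E(Y_{ij} \mid A_i, Z_i)$ with parameter $\beta$, where in Chapter 4 it is defined as
\[ \pi(A_i, Z_i; \beta) = h^{-1}(A_i\beta_1 + Z_i\beta_2 + A_iZ_i\beta_3) \]
\[ = \operatorname{expit}(\beta_{10} + A_{i1}\beta_{11} + A_{i2}\beta_{12} + Z_{i1}\beta_{21} + Z_{i2}\beta_{22} + A_{i1}Z_{i1}\beta_{31} + A_{i1}Z_{i2}\beta_{32} + A_{i2}Z_{i1}\beta_{33} + A_{i2}Z_{i2}\beta_{34}), \]
where expit means the inverse mapping of logit, i.e. $\operatorname{expit}(x) = \exp(x)/\{1 + \exp(x)\} = 1/\{1 + \exp(-x)\}$. Here we define the linear sum in the equation above as $l_i$, so that $\pi(A_i, Z_i; \beta) = \operatorname{expit}(l_i)$.

Assume the treatment arm is independent of the potential baseline outcome given the individual confounders $X_{ij}$, i.e. $Z_i \perp Y_{ij}(0) \mid X_{ij}$. With double weights ($w_{ij}$), the product of the sampling design weights and a weight representing the estimated inverse probability of treatment as a function of individual-level confounders, the potential baseline outcome is independent of the treatment arm, $Z_i \perp Y_{ij}(0)$.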


We are interested in estimating the relative risk for each level of $A_i$ by estimating $\psi$, $\mu_0$, and $\beta$. We can estimate the parameters by solving the estimating equations
\[ \sum_i\sum_j w_{ij} Z_i^T \Big( h^{-1}\big( h(\pi(A_i, Z_i; \beta)) - A_i\psi \big) - \mu_0 \Big) = 0, \]
\[ \sum_i\sum_j w_{ij} D_i^T \big( Y_{ij} - \pi(A_i, Z_i; \beta) \big) = 0, \]
where $D_i^T$ is a function of $A_i$ and $Z_i$, e.g. $(A_i, Z_i, A_iZ_i)$. We also know that
\[ \mu_0 = E(Y_{ij}(0)) = E(Y_{ij}(0) \mid Z_i) = E_{A_i \mid Z_i}\Big[ h^{-1}\big( h(E(Y_{ij}(a) \mid A_i = a, Z_i)) - A_i\psi \big) \Big]. \]
Writing the second estimating equation in detail,
\[ \sum_i\sum_j w_{ij} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ \sum_i\sum_j w_{ij} A_{i1} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ \sum_i\sum_j w_{ij} A_{i2} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ \sum_i\sum_j w_{ij} Z_{i1} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ \sum_i\sum_j w_{ij} Z_{i2} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ \sum_i\sum_j w_{ij} A_{i1}Z_{i1} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ \sum_i\sum_j w_{ij} A_{i1}Z_{i2} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ \sum_i\sum_j w_{ij} A_{i2}Z_{i1} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ \sum_i\sum_j w_{ij} A_{i2}Z_{i2} (Y_{ij} - \operatorname{expit}(l_i)) = 0. \]
Then we can estimate $\beta$ using the weighted generalized linear model (wGLM) regression method. We obtain the estimators using the SURVEYLOGISTIC procedure in SAS software (Version 9.2).


Next, we substitute the logit transform for $h$ in the first estimating equation, and we get
\[ \sum_i\sum_j w_{ij} Z_i^T \big( \operatorname{expit}(l_i - A_i\psi) - \mu_0 \big) = 0. \]
Written in detail, this is
\[ \sum_i\sum_j w_{ij} \big( \operatorname{expit}(l_i - A_{i1}\psi_1 - A_{i2}\psi_2) - \mu_0 \big) = 0, \]
\[ \sum_i\sum_j w_{ij} Z_{i1} \big( \operatorname{expit}(l_i - A_{i1}\psi_1 - A_{i2}\psi_2) - \mu_0 \big) = 0, \]
\[ \sum_i\sum_j w_{ij} Z_{i2} \big( \operatorname{expit}(l_i - A_{i1}\psi_1 - A_{i2}\psi_2) - \mu_0 \big) = 0. \]
To linearize the expit part of the equations above, we define $f(\psi) = \operatorname{expit}(l_i - A_i\psi)$. The gradient of $f(\psi)$ is
\[ \frac{\partial f(\psi)}{\partial \psi} = -\frac{\exp(l_i - A_i\psi)}{[1 + \exp(l_i - A_i\psi)]^2} A_i. \]
Using a Taylor expansion around the known $(n-1)$th iterate $\psi^{(n-1)}$, we approximately have
\[ f(\psi^{(n)}) \approx f(\psi^{(n-1)}) + \left.\frac{\partial f(\psi)}{\partial \psi}\right|_{\psi = \psi^{(n-1)}} (\psi^{(n)} - \psi^{(n-1)}) \]
\[ = \operatorname{expit}(l_i - A_i\psi^{(n-1)}) - \frac{\exp(l_i - A_i\psi^{(n-1)})}{[1 + \exp(l_i - A_i\psi^{(n-1)})]^2} A_i (\psi^{(n)} - \psi^{(n-1)}) = \tilde{y}_i - \tilde{A}_i\psi^{(n)}, \]
where
\[ \tilde{y}_i = \operatorname{expit}(l_i - A_i\psi^{(n-1)}) + \frac{\exp(l_i - A_i\psi^{(n-1)})}{[1 + \exp(l_i - A_i\psi^{(n-1)})]^2} A_i\psi^{(n-1)} = \frac{m_i}{1 + m_i} + \frac{m_i}{(1 + m_i)^2} A_i\psi^{(n-1)}, \]
\[ \tilde{A}_i = \frac{\exp(l_i - A_i\psi^{(n-1)})}{[1 + \exp(l_i - A_i\psi^{(n-1)})]^2} A_i = \frac{m_i}{(1 + m_i)^2} A_i, \qquad m_i = \exp(l_i - A_i\psi^{(n-1)}). \]
After linearization, the first estimating equation becomes
\[ \sum_i\sum_j w_{ij} Z_i^T (\tilde{y}_i - \tilde{A}_i\psi - \mu_0) = 0. \]
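One way to use the linearized equation is to re-solve it repeatedly, recomputing $\tilde{y}_i$ and $\tilde{A}_i$ at the current iterate. The Python sketch below illustrates this under stated assumptions: it is not the SAS implementation used in this work, the function name `solve_linearized_iv` is hypothetical, the linear predictors $l_i$ are taken as known, and the stopping rule (sum of absolute parameter changes below `tol`) is an assumption. The worked example feeds in the population cells of the strong-instrument simulation, so the loop should return the values that design implies.

```python
import numpy as np

def solve_linearized_iv(l, A, Z, w, n_iter=50, tol=1e-12):
    """Iterate the linearized equation sum w Z^T (y~ - A~ psi - mu0) = 0.

    l: (n,) linear predictors; A: (n, 2) adherence dummies;
    Z: (n, 3) columns (1, Z1, Z2); w: (n,) weights.
    """
    mu0, psi = 0.0, np.zeros(A.shape[1])
    for _ in range(n_iter):
        m = np.exp(l - A @ psi)
        v = m / (1.0 + m) ** 2
        y_t = m / (1.0 + m) + v * (A @ psi)           # linearized outcome
        A_t = v[:, None] * A                          # linearized adherence
        X = np.column_stack([np.ones(len(l)), A_t])   # multiplies (mu0, psi)
        new = np.linalg.solve(Z.T @ (w[:, None] * X), Z.T @ (w * y_t))
        change = np.abs(new - np.r_[mu0, psi]).sum()
        mu0, psi = new[0], new[1:]
        if change < tol:                              # assumed stopping rule
            break
    return mu0, psi

# Population cells (a, z) of the strong-instrument design in Chapter 4.
p_a_given_z = np.array([[3/4, 1/8, 1/8], [1/8, 3/4, 1/8], [1/8, 1/8, 3/4]])  # [z, a]
p_ya = np.array([[1/5, 1/4, 1/3], [2/5, 1/3, 2/5], [2/3, 2/3, 1/2]])         # [a, z]
cells = [(a, z) for a in range(3) for z in range(3)]
l = np.array([np.log(p_ya[a, z] / (1 - p_ya[a, z])) for a, z in cells])
A = np.array([[a == 1, a == 2] for a, z in cells], dtype=float)
Z = np.array([[1, z == 1, z == 2] for a, z in cells], dtype=float)
w = np.array([p_a_given_z[z, a] / 3 for a, z in cells])

mu0_hat, psi_hat = solve_linearized_iv(l, A, Z, w)
```

In this example the adherence effects are coded through the dummies $A_{i1}$ and $A_{i2}$, so the targets are $\psi_1 = \log 2$ and $\psi_2 = 2\log 2$, with $\mu_0 = 107/480$.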


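In summary, the algorithm for estimating $\psi$ is described as follows:

1. Set initial values $\mu_0^{(0)}$ and $\psi^{(0)}$; set $j = 0$, and let $\mu_0^{(j)} = \mu_0^{(0)}$, $\psi^{(j)} = \psi^{(0)}$.
2. Obtain $m_i^{(j)}$, $\tilde{y}_i^{(j)}$, and $\tilde{A}_i^{(j)}$ as
\[ m_i^{(j)} = \exp(l_i - A_i\psi^{(j)}), \qquad \tilde{y}_i^{(j)} = \frac{m_i^{(j)}}{1 + m_i^{(j)}} + \frac{m_i^{(j)}}{(1 + m_i^{(j)})^2} A_i\psi^{(j)}, \qquad \tilde{A}_i^{(j)} = \frac{m_i^{(j)}}{(1 + m_i^{(j)})^2} A_i. \]
3. Solve the linearized estimating equation with $\tilde{y}_i^{(j)}$ and $\tilde{A}_i^{(j)}$ to obtain $\mu_0^{(j+1)}$ and $\psi^{(j+1)}$.
4. For some criterion $a$, if $\|\mu_0^{(j+1)} - \mu_0^{(j)}\| + \|\psi^{(j+1)} - \psi^{(j)}\|$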

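We then obtain the estimate of $E(Y_{ij}(0) \mid A_i)$, i.e.
\[ \frac{\sum_{\text{all possible } z} \Big( \sum_i\sum_j w_{ij} I\{A_i = a, Z_i = z\}\, q(A_i = a, Z_i = z) \Big)}{\sum_i\sum_j w_{ij} I\{A_i = a\}}. \]
Finally, we can obtain $RR(A_i = a)$ from the derived estimators of $E(Y_{ij}(a) \mid A_i = a)$ and $E(Y_{ij}(0) \mid A_i = a)$.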
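The final step, combining weighted cell quantities into RR and RD for one adherence level, can be sketched as follows. This Python illustration makes two explicit assumptions: the cell-level baseline quantity $q(A_i = a, Z_i = z)$ is taken to be the weighted cell mean of $Y_{ij}$ shifted back by the adherence effect on the logit scale, and the effect for each level is supplied through a hypothetical dictionary `psi_by_level`. The function name `rr_rd_by_level` and the toy data are likewise illustrative only.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def rr_rd_by_level(y, a, z, w, psi_by_level):
    """Return {level: (RR, RD)} from weighted (A, Z) cell means."""
    out = {}
    for level, psi in psi_by_level.items():
        num_a = num_0 = den = 0.0
        for zval in np.unique(z):
            cell = (a == level) & (z == zval)
            if not cell.any():
                continue
            wsum = w[cell].sum()
            p_a = np.average(y[cell], weights=w[cell])  # estimate of E(Y(a) | A=a, Z=z)
            q = expit(logit(p_a) - psi)                 # assumed form of q(A=a, Z=z)
            num_a += wsum * p_a
            num_0 += wsum * q
            den += wsum
        e_a, e_0 = num_a / den, num_0 / den
        out[level] = (e_a / e_0, e_a - e_0)
    return out

# Toy data: eight individuals with adherence level 1, in two arms.
y = np.array([1, 1, 0, 0, 0, 1, 0, 0], dtype=float)
a = np.ones(8, dtype=int)
z = np.array([0, 0, 0, 0, 0, 1, 1, 1])
w = np.ones(8)
result = rr_rd_by_level(y, a, z, w, {1: np.log(2)})
```

The sketch requires every cell mean to lie strictly between 0 and 1, since the logit is undefined at the boundaries.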


APPENDIX E
VARIANCE ESTIMATION

We now turn to estimating the variances of $RR(A_i = 1)$, $RR(A_i = 2)$, $RD(A_i = 1)$, and $RD(A_i = 2)$ by estimating the variances of $\hat{\psi}$, $\hat{\mu}_0$, and $\hat{\beta}$. The method we developed here mainly focuses on solving an unbiased estimating equation of the form $U(\theta) = \sum_{p=1}^{P}\sum_{c=1}^{C_p} U_{pc}(\theta) = 0$, where $\theta$ is the vector of interest, $c$ indexes groups that play the role of primary sampling units, $p$ indexes the primary strata, and $U_{pc}(\theta)$ is a sum of weighted estimating equations with the weighted components each having a zero expectation. Let $\nabla U(\hat{\theta})$ be the gradient of $U(\hat{\theta})$, and let
\[ V(\hat{\theta}) = \sum_{p=1}^{P} \frac{C_p}{C_p - 1} \sum_{c=1}^{C_p} \big(U_{pc}(\hat{\theta}) - \bar{U}_p(\hat{\theta})\big)\big(U_{pc}(\hat{\theta}) - \bar{U}_p(\hat{\theta})\big)^T, \]
where $\bar{U}_p(\hat{\theta})$ is the mean of $\{U_{p1}(\hat{\theta}), U_{p2}(\hat{\theta}), \ldots, U_{pC_p}(\hat{\theta})\}$. We then use the sandwich estimator based on Taylor series to estimate the variance $\mathrm{Var}(\hat{\theta})$:
\[ \widehat{\mathrm{Var}}(\hat{\theta}) = [\nabla U(\hat{\theta})]^{-1}\, V(\hat{\theta})\, [\nabla U(\hat{\theta})^T]^{-1}. \]
Therefore, for the first estimating equation of Appendix D, let
\[ U_{a1} = \sum_i\sum_j w_{ij} \big( \operatorname{expit}(l_i - A_{i1}\psi_1 - A_{i2}\psi_2) - \mu_0 \big) = 0, \]
\[ U_{a2} = \sum_i\sum_j w_{ij} Z_{i1} \big( \operatorname{expit}(l_i - A_{i1}\psi_1 - A_{i2}\psi_2) - \mu_0 \big) = 0, \]
\[ U_{a3} = \sum_i\sum_j w_{ij} Z_{i2} \big( \operatorname{expit}(l_i - A_{i1}\psi_1 - A_{i2}\psi_2) - \mu_0 \big) = 0. \]
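The sandwich formula above can be sketched in a few lines once the per-PSU estimating-function totals and the gradient are available. This Python illustration is an assumption-laden sketch, not the SAS code used here: the function name `sandwich_variance` and its inputs are hypothetical, and each stratum is assumed to contain at least two PSUs.

```python
import numpy as np

def sandwich_variance(scores, strata, grad):
    """Stratified sandwich variance estimate.

    scores: (n_psu, q) array, row c holding U_pc evaluated at theta-hat;
    strata: (n_psu,) stratum label of each PSU;
    grad:   (q, q) gradient of U evaluated at theta-hat.
    """
    q = scores.shape[1]
    V = np.zeros((q, q))
    for p in np.unique(strata):
        U = scores[strata == p]
        C = U.shape[0]                    # PSUs in stratum p (needs C >= 2)
        dev = U - U.mean(axis=0)          # U_pc minus the stratum mean
        V += C / (C - 1) * dev.T @ dev
    G_inv = np.linalg.inv(grad)
    return G_inv @ V @ G_inv.T

# Toy check: estimating a mean from four single-observation PSUs in one stratum.
y = np.array([1.0, 2.0, 3.0, 4.0])
theta_hat = y.mean()
scores = (y - theta_hat)[:, None]         # U_pc = y_c - theta
grad = np.array([[-float(len(y))]])       # dU/dtheta = -n
var_hat = sandwich_variance(scores, np.zeros(len(y), dtype=int), grad)
```

In this toy case the result reduces to the usual sample variance of the mean, which is a convenient sanity check on the implementation.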


For the second estimating equation of Appendix D, let
\[ U_{b1} = \sum_i\sum_j w_{ij} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ U_{b2} = \sum_i\sum_j w_{ij} A_{i1} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ U_{b3} = \sum_i\sum_j w_{ij} A_{i2} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ U_{b4} = \sum_i\sum_j w_{ij} Z_{i1} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ U_{b5} = \sum_i\sum_j w_{ij} Z_{i2} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ U_{b6} = \sum_i\sum_j w_{ij} A_{i1}Z_{i1} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ U_{b7} = \sum_i\sum_j w_{ij} A_{i1}Z_{i2} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ U_{b8} = \sum_i\sum_j w_{ij} A_{i2}Z_{i1} (Y_{ij} - \operatorname{expit}(l_i)) = 0, \]
\[ U_{b9} = \sum_i\sum_j w_{ij} A_{i2}Z_{i2} (Y_{ij} - \operatorname{expit}(l_i)) = 0. \]
Let
\[ v_i = \frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1 + \exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}. \]
Then we take partial derivatives of $U_{a1}, \ldots, U_{a3}, U_{b1}, \ldots, U_{b9}$ with respect to $\mu_0, \psi_1, \psi_2, \beta_{10}, \ldots, \beta_{34}$.

For $\mu_0$,
\[ \frac{\partial U_{a1}}{\partial \mu_0} = -\sum_i\sum_j w_{ij}, \qquad \frac{\partial U_{a2}}{\partial \mu_0} = -\sum_i\sum_j w_{ij} Z_{i1}, \qquad \frac{\partial U_{a3}}{\partial \mu_0} = -\sum_i\sum_j w_{ij} Z_{i2}, \]
\[ \frac{\partial U_{b1}}{\partial \mu_0} = \cdots = \frac{\partial U_{b9}}{\partial \mu_0} = 0. \]


For $\psi_1$, with $v_i = \exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)/[1 + \exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2$,
\[ \frac{\partial U_{a1}}{\partial \psi_1} = -\sum_i\sum_j w_{ij} \frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1 + \exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2} A_{i1} = -\sum_i\sum_j w_{ij} v_i A_{i1}, \]
\[ \frac{\partial U_{a2}}{\partial \psi_1} = -\sum_i\sum_j w_{ij} v_i Z_{i1} A_{i1}, \qquad \frac{\partial U_{a3}}{\partial \psi_1} = -\sum_i\sum_j w_{ij} v_i Z_{i2} A_{i1}, \]
\[ \frac{\partial U_{b1}}{\partial \psi_1} = \cdots = \frac{\partial U_{b9}}{\partial \psi_1} = 0. \]
For $\psi_2$,
\[ \frac{\partial U_{a1}}{\partial \psi_2} = -\sum_i\sum_j w_{ij} v_i A_{i2}, \qquad \frac{\partial U_{a2}}{\partial \psi_2} = -\sum_i\sum_j w_{ij} v_i Z_{i1} A_{i2}, \qquad \frac{\partial U_{a3}}{\partial \psi_2} = -\sum_i\sum_j w_{ij} v_i Z_{i2} A_{i2}, \]
\[ \frac{\partial U_{b1}}{\partial \psi_2} = \cdots = \frac{\partial U_{b9}}{\partial \psi_2} = 0. \]


For $\beta_{10}$, with $v_i = \exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)/[1 + \exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2$ and, in the derivatives of $U_{b1}, \ldots, U_{b9}$, $m_i = \exp(l_i)/[1 + \exp(l_i)]^2$,
\[ \frac{\partial U_{a1}}{\partial \beta_{10}} = \sum_i\sum_j w_{ij} v_i, \qquad \frac{\partial U_{a2}}{\partial \beta_{10}} = \sum_i\sum_j w_{ij} v_i Z_{i1}, \qquad \frac{\partial U_{a3}}{\partial \beta_{10}} = \sum_i\sum_j w_{ij} v_i Z_{i2}, \]
\[ \frac{\partial U_{b1}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} \frac{\exp(l_i)}{[1 + \exp(l_i)]^2} = -\sum_i\sum_j w_{ij} m_i, \]
\[ \frac{\partial U_{b2}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} m_i A_{i1}, \qquad \frac{\partial U_{b3}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} m_i A_{i2}, \]
\[ \frac{\partial U_{b4}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} m_i Z_{i1}, \qquad \frac{\partial U_{b5}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} m_i Z_{i2}, \]
\[ \frac{\partial U_{b6}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \qquad \frac{\partial U_{b7}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \]
\[ \frac{\partial U_{b8}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \qquad \frac{\partial U_{b9}}{\partial \beta_{10}} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}. \]


For $\beta_{11}$, with $v_i = \exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)/[1 + \exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2$ and $m_i = \exp(l_i)/[1 + \exp(l_i)]^2$,
\[ \frac{\partial U_{a1}}{\partial \beta_{11}} = \sum_i\sum_j w_{ij} v_i A_{i1}, \qquad \frac{\partial U_{a2}}{\partial \beta_{11}} = \sum_i\sum_j w_{ij} v_i Z_{i1} A_{i1}, \qquad \frac{\partial U_{a3}}{\partial \beta_{11}} = \sum_i\sum_j w_{ij} v_i Z_{i2} A_{i1}, \]
\[ \frac{\partial U_{b1}}{\partial \beta_{11}} = -\sum_i\sum_j w_{ij} m_i A_{i1}, \qquad \frac{\partial U_{b2}}{\partial \beta_{11}} = -\sum_i\sum_j w_{ij} m_i A_{i1}, \]
\[ \frac{\partial U_{b3}}{\partial \beta_{11}} = -\sum_i\sum_j w_{ij} A_{i2}\, m_i A_{i1} = 0 \quad (\because A_{i1}A_{i2} = 0), \]
\[ \frac{\partial U_{b4}}{\partial \beta_{11}} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \qquad \frac{\partial U_{b5}}{\partial \beta_{11}} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \]
\[ \frac{\partial U_{b6}}{\partial \beta_{11}} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \qquad \frac{\partial U_{b7}}{\partial \beta_{11}} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \]
\[ \frac{\partial U_{b8}}{\partial \beta_{11}} = 0, \qquad \frac{\partial U_{b9}}{\partial \beta_{11}} = 0. \]


For $\beta_{12}$,
\[
\begin{aligned}
\frac{\partial U_{a1}}{\partial\beta_{12}} &= \sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i2} = \sum_i\sum_j w_{ij} v_i A_{i2}, \\
\frac{\partial U_{a2}}{\partial\beta_{12}} &= \sum_i\sum_j w_{ij} Z_{i1}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i2} = \sum_i\sum_j w_{ij} v_i A_{i2}Z_{i1}, \\
\frac{\partial U_{a3}}{\partial\beta_{12}} &= \sum_i\sum_j w_{ij} Z_{i2}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i2} = \sum_i\sum_j w_{ij} v_i A_{i2}Z_{i2}, \\
\frac{\partial U_{b1}}{\partial\beta_{12}} &= -\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}, \\
\frac{\partial U_{b2}}{\partial\beta_{12}} &= 0, \\
\frac{\partial U_{b3}}{\partial\beta_{12}} &= -\sum_i\sum_j w_{ij} A_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}, \\
\frac{\partial U_{b4}}{\partial\beta_{12}} &= -\sum_i\sum_j w_{ij} Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \\
\frac{\partial U_{b5}}{\partial\beta_{12}} &= -\sum_i\sum_j w_{ij} Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}, \\
\frac{\partial U_{b6}}{\partial\beta_{12}} &= 0, \\
\frac{\partial U_{b7}}{\partial\beta_{12}} &= 0, \\
\frac{\partial U_{b8}}{\partial\beta_{12}} &= -\sum_i\sum_j w_{ij} A_{i2}Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \\
\frac{\partial U_{b9}}{\partial\beta_{12}} &= -\sum_i\sum_j w_{ij} A_{i2}Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}.
\end{aligned}
\]


For $\beta_{21}$,
\[
\begin{aligned}
\frac{\partial U_{a1}}{\partial\beta_{21}} &= \sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,Z_{i1} = \sum_i\sum_j w_{ij} v_i Z_{i1}, \\
\frac{\partial U_{a2}}{\partial\beta_{21}} &= \sum_i\sum_j w_{ij} Z_{i1}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,Z_{i1} = \sum_i\sum_j w_{ij} v_i Z_{i1}, \\
\frac{\partial U_{a3}}{\partial\beta_{21}} &= 0, \\
\frac{\partial U_{b1}}{\partial\beta_{21}} &= -\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i1} = -\sum_i\sum_j w_{ij} m_i Z_{i1}, \\
\frac{\partial U_{b2}}{\partial\beta_{21}} &= -\sum_i\sum_j w_{ij} A_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \\
\frac{\partial U_{b3}}{\partial\beta_{21}} &= -\sum_i\sum_j w_{ij} A_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \\
\frac{\partial U_{b4}}{\partial\beta_{21}} &= -\sum_i\sum_j w_{ij} Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i1} = -\sum_i\sum_j w_{ij} m_i Z_{i1}, \\
\frac{\partial U_{b5}}{\partial\beta_{21}} &= 0, \\
\frac{\partial U_{b6}}{\partial\beta_{21}} &= -\sum_i\sum_j w_{ij} A_{i1}Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \\
\frac{\partial U_{b7}}{\partial\beta_{21}} &= 0, \\
\frac{\partial U_{b8}}{\partial\beta_{21}} &= -\sum_i\sum_j w_{ij} A_{i2}Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \\
\frac{\partial U_{b9}}{\partial\beta_{21}} &= 0.
\end{aligned}
\]


For $\beta_{22}$,
\[
\begin{aligned}
\frac{\partial U_{a1}}{\partial\beta_{22}} &= \sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,Z_{i2} = \sum_i\sum_j w_{ij} v_i Z_{i2}, \\
\frac{\partial U_{a2}}{\partial\beta_{22}} &= 0, \\
\frac{\partial U_{a3}}{\partial\beta_{22}} &= \sum_i\sum_j w_{ij} Z_{i2}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,Z_{i2} = \sum_i\sum_j w_{ij} v_i Z_{i2}, \\
\frac{\partial U_{b1}}{\partial\beta_{22}} &= -\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i2} = -\sum_i\sum_j w_{ij} m_i Z_{i2}, \\
\frac{\partial U_{b2}}{\partial\beta_{22}} &= -\sum_i\sum_j w_{ij} A_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \\
\frac{\partial U_{b3}}{\partial\beta_{22}} &= -\sum_i\sum_j w_{ij} A_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}, \\
\frac{\partial U_{b4}}{\partial\beta_{22}} &= 0, \\
\frac{\partial U_{b5}}{\partial\beta_{22}} &= -\sum_i\sum_j w_{ij} Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i2} = -\sum_i\sum_j w_{ij} m_i Z_{i2}, \\
\frac{\partial U_{b6}}{\partial\beta_{22}} &= 0, \\
\frac{\partial U_{b7}}{\partial\beta_{22}} &= -\sum_i\sum_j w_{ij} A_{i1}Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \\
\frac{\partial U_{b8}}{\partial\beta_{22}} &= 0, \\
\frac{\partial U_{b9}}{\partial\beta_{22}} &= -\sum_i\sum_j w_{ij} A_{i2}Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}.
\end{aligned}
\]


For $\beta_{31}$,
\[
\begin{aligned}
\frac{\partial U_{a1}}{\partial\beta_{31}} &= \sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i1}Z_{i1} = \sum_i\sum_j w_{ij} v_i A_{i1}Z_{i1}, \\
\frac{\partial U_{a2}}{\partial\beta_{31}} &= \sum_i\sum_j w_{ij} Z_{i1}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i1}Z_{i1} = \sum_i\sum_j w_{ij} v_i A_{i1}Z_{i1}, \\
\frac{\partial U_{a3}}{\partial\beta_{31}} &= 0, \\
\frac{\partial U_{b1}}{\partial\beta_{31}} &= -\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i1}Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \\
\frac{\partial U_{b2}}{\partial\beta_{31}} &= -\sum_i\sum_j w_{ij} A_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i1}Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \\
\frac{\partial U_{b3}}{\partial\beta_{31}} &= 0, \\
\frac{\partial U_{b4}}{\partial\beta_{31}} &= -\sum_i\sum_j w_{ij} Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i1}Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \\
\frac{\partial U_{b5}}{\partial\beta_{31}} &= 0, \\
\frac{\partial U_{b6}}{\partial\beta_{31}} &= -\sum_i\sum_j w_{ij} A_{i1}Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i1}Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i1}, \\
\frac{\partial U_{b7}}{\partial\beta_{31}} &= 0, \\
\frac{\partial U_{b8}}{\partial\beta_{31}} &= 0, \\
\frac{\partial U_{b9}}{\partial\beta_{31}} &= 0.
\end{aligned}
\]


For $\beta_{32}$,
\[
\begin{aligned}
\frac{\partial U_{a1}}{\partial\beta_{32}} &= \sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i1}Z_{i2} = \sum_i\sum_j w_{ij} v_i A_{i1}Z_{i2}, \\
\frac{\partial U_{a2}}{\partial\beta_{32}} &= 0, \\
\frac{\partial U_{a3}}{\partial\beta_{32}} &= \sum_i\sum_j w_{ij} Z_{i2}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i1}Z_{i2} = \sum_i\sum_j w_{ij} v_i A_{i1}Z_{i2}, \\
\frac{\partial U_{b1}}{\partial\beta_{32}} &= -\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i1}Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \\
\frac{\partial U_{b2}}{\partial\beta_{32}} &= -\sum_i\sum_j w_{ij} A_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i1}Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \\
\frac{\partial U_{b3}}{\partial\beta_{32}} &= 0, \\
\frac{\partial U_{b4}}{\partial\beta_{32}} &= 0, \\
\frac{\partial U_{b5}}{\partial\beta_{32}} &= -\sum_i\sum_j w_{ij} Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i1}Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \\
\frac{\partial U_{b6}}{\partial\beta_{32}} &= 0, \\
\frac{\partial U_{b7}}{\partial\beta_{32}} &= -\sum_i\sum_j w_{ij} A_{i1}Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i1}Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i1}Z_{i2}, \\
\frac{\partial U_{b8}}{\partial\beta_{32}} &= 0, \\
\frac{\partial U_{b9}}{\partial\beta_{32}} &= 0.
\end{aligned}
\]


For $\beta_{33}$,
\[
\begin{aligned}
\frac{\partial U_{a1}}{\partial\beta_{33}} &= \sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i2}Z_{i1} = \sum_i\sum_j w_{ij} v_i A_{i2}Z_{i1}, \\
\frac{\partial U_{a2}}{\partial\beta_{33}} &= \sum_i\sum_j w_{ij} Z_{i1}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i2}Z_{i1} = \sum_i\sum_j w_{ij} v_i A_{i2}Z_{i1}, \\
\frac{\partial U_{a3}}{\partial\beta_{33}} &= 0, \\
\frac{\partial U_{b1}}{\partial\beta_{33}} &= -\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2}Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \\
\frac{\partial U_{b2}}{\partial\beta_{33}} &= 0, \\
\frac{\partial U_{b3}}{\partial\beta_{33}} &= -\sum_i\sum_j w_{ij} A_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2}Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \\
\frac{\partial U_{b4}}{\partial\beta_{33}} &= -\sum_i\sum_j w_{ij} Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2}Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \\
\frac{\partial U_{b5}}{\partial\beta_{33}} &= 0, \\
\frac{\partial U_{b6}}{\partial\beta_{33}} &= 0, \\
\frac{\partial U_{b7}}{\partial\beta_{33}} &= 0, \\
\frac{\partial U_{b8}}{\partial\beta_{33}} &= -\sum_i\sum_j w_{ij} A_{i2}Z_{i1}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2}Z_{i1} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i1}, \\
\frac{\partial U_{b9}}{\partial\beta_{33}} &= 0.
\end{aligned}
\]


For $\beta_{34}$,
\[
\begin{aligned}
\frac{\partial U_{a1}}{\partial\beta_{34}} &= \sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i2}Z_{i2} = \sum_i\sum_j w_{ij} v_i A_{i2}Z_{i2}, \\
\frac{\partial U_{a2}}{\partial\beta_{34}} &= 0, \\
\frac{\partial U_{a3}}{\partial\beta_{34}} &= \sum_i\sum_j w_{ij} Z_{i2}\,\frac{\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)}{[1+\exp(l_i - A_{i1}\psi_1 - A_{i2}\psi_2)]^2}\,A_{i2}Z_{i2} = \sum_i\sum_j w_{ij} v_i A_{i2}Z_{i2}, \\
\frac{\partial U_{b1}}{\partial\beta_{34}} &= -\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2}Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}, \\
\frac{\partial U_{b2}}{\partial\beta_{34}} &= 0, \\
\frac{\partial U_{b3}}{\partial\beta_{34}} &= -\sum_i\sum_j w_{ij} A_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2}Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}, \\
\frac{\partial U_{b4}}{\partial\beta_{34}} &= 0, \\
\frac{\partial U_{b5}}{\partial\beta_{34}} &= -\sum_i\sum_j w_{ij} Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2}Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}, \\
\frac{\partial U_{b6}}{\partial\beta_{34}} &= 0, \\
\frac{\partial U_{b7}}{\partial\beta_{34}} &= 0, \\
\frac{\partial U_{b8}}{\partial\beta_{34}} &= 0, \\
\frac{\partial U_{b9}}{\partial\beta_{34}} &= -\sum_i\sum_j w_{ij} A_{i2}Z_{i2}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\,A_{i2}Z_{i2} = -\sum_i\sum_j w_{ij} m_i A_{i2}Z_{i2}.
\end{aligned}
\]
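The derivative blocks above all follow one pattern: every entry is a weighted sum of v_i or m_i multiplied by a product of two covariates. The Python sketch below accumulates all of them in loops instead of entry by entry. It is only an illustration under stated assumptions: the linear predictor l_i is taken to be the saturated logistic form with an intercept, A_i1, A_i2, Z_i1, Z_i2 and their four products (inferred from the derivative pattern, not stated in this appendix), the nine coefficients are passed as `beta` in that order, and the names `design_row` and `jacobian_blocks` are hypothetical.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def design_row(a1, a2, z1, z2):
    """Covariates in the assumed coefficient order:
    intercept, A1, A2, Z1, Z2, A1*Z1, A1*Z2, A2*Z1, A2*Z2."""
    return [1.0, a1, a2, z1, z2, a1 * z1, a1 * z2, a2 * z1, a2 * z2]

def jacobian_blocks(rows, beta, psi):
    """rows: iterable of (w_ij, A_i1, A_i2, Z_i1, Z_i2), one per sampled unit.
    Returns (dUa, dUb): the 3x9 and 9x9 blocks of derivatives of the
    U_a and U_b estimating functions with respect to the nine coefficients."""
    dUa = [[0.0] * 9 for _ in range(3)]
    dUb = [[0.0] * 9 for _ in range(9)]
    for w, a1, a2, z1, z2 in rows:
        x = design_row(a1, a2, z1, z2)
        l = sum(b * xk for b, xk in zip(beta, x))
        p = expit(l)
        m = p * (1.0 - p)          # m_i = exp(l) / [1 + exp(l)]^2
        q = expit(l - a1 * psi[0] - a2 * psi[1])
        v = q * (1.0 - q)          # v_i, same form evaluated at l - A*psi
        d = [1.0, z1, z2]          # multipliers of U_a1, U_a2, U_a3
        for r in range(3):
            for c in range(9):
                dUa[r][c] += w * v * d[r] * x[c]
        for r in range(9):
            for c in range(9):
                dUb[r][c] -= w * m * x[r] * x[c]
    return dUa, dUb
```

Because the dummy variables are mutually exclusive (A_i1 A_i2 = 0 and Z_i1 Z_i2 = 0), the zero entries listed above appear automatically as products that vanish.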


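Let
\[
U(\alpha_0,\psi,\beta) = (U_{a1}, U_{a2}, U_{a3}, U_{b1}, \ldots, U_{b9})^T,
\]
and the gradient of $U(\alpha_0,\psi,\beta)$ is
\[
\nabla U(\alpha_0,\psi,\beta) =
\begin{pmatrix}
\frac{\partial U_{a1}}{\partial\alpha_0} & \cdots & \frac{\partial U_{a1}}{\partial\beta_{34}} \\
\vdots & \ddots & \vdots \\
\frac{\partial U_{b9}}{\partial\alpha_0} & \cdots & \frac{\partial U_{b9}}{\partial\beta_{34}}
\end{pmatrix}.
\]
Then the variance is
\[
\mathrm{Var}(\hat\alpha_0,\hat\psi,\hat\beta) = [\nabla U(\hat\alpha_0,\hat\psi,\hat\beta)]^{-1}\, G\, [\nabla U(\hat\alpha_0,\hat\psi,\hat\beta)^T]^{-1},
\]
where
\[
G = \sum_{p=1}^{P} \frac{C_p}{C_p-1} \sum_{i=1}^{C_p} \bigl(U_{pi}(\hat\alpha_0,\hat\psi,\hat\beta) - \bar U_p(\hat\alpha_0,\hat\psi,\hat\beta)\bigr)\bigl(U_{pi}(\hat\alpha_0,\hat\psi,\hat\beta) - \bar U_p(\hat\alpha_0,\hat\psi,\hat\beta)\bigr)^T.
\]
In the SWASH study, $p$ indexes the geographic stratum (Rachuonyo District and Suba District) and $i$ indexes clusters (schools within the two districts); $U_{pi}(\hat\alpha_0,\hat\psi,\hat\beta)$ is the sum of $\{U_{pij}(\hat\alpha_0,\hat\psi,\hat\beta)\}$ over $j$, and $\bar U_p(\hat\alpha_0,\hat\psi,\hat\beta)$ is the average of $\{U_{pi}(\hat\alpha_0,\hat\psi,\hat\beta)\}$ over the $C_p$ clusters $i$ in stratum $p$.

Next we derive the estimators ($U_1$ and $U_2$, respectively) of $E(Y_{ij}(a)\mid A_i=a)$ and $E(Y_{ij}(0)\mid A_i=a)$. We have
\[
E(Y_{ij}(a)\mid A_i=a) = E_{Z_i\mid A_i}\bigl(\mu(A_i,Z_i;\beta)\bigr) = E_{Z_i\mid A_i}\,\mathrm{expit}(l_i),
\]
and
\[
U_1 \overset{\mathrm{def}}{=} \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\mathrm{expit}(l_i)\, I\{A_i=a, Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}.
\]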
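The between-cluster matrix G in the sandwich variance above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the SWASH analysis: the input layout (a dict mapping each stratum to a list of cluster-level score totals) and the name `meat_matrix` are hypothetical, and the within-stratum centering value is taken to be the average of the cluster totals, which is the usual convention for this linearization estimator and is an assumption here.

```python
def meat_matrix(scores):
    """scores: dict mapping stratum p -> list of cluster totals U_pi,
    each U_pi being a list of floats of the same length.
    Returns G = sum_p C_p/(C_p - 1) * sum_i (U_pi - Ubar_p)(U_pi - Ubar_p)^T."""
    dim = len(next(iter(scores.values()))[0])
    G = [[0.0] * dim for _ in range(dim)]
    for clusters in scores.values():
        C = len(clusters)
        if C < 2:
            raise ValueError("each stratum needs at least two clusters")
        # Within-stratum average of the cluster totals (assumed centering).
        mean = [sum(u[k] for u in clusters) / C for k in range(dim)]
        scale = C / (C - 1)
        for u in clusters:
            dev = [u[k] - mean[k] for k in range(dim)]
            for r in range(dim):
                for c in range(dim):
                    G[r][c] += scale * dev[r] * dev[c]
    return G
```

The full variance would then be obtained by pre- and post-multiplying G by the inverse of the gradient matrix of the estimating functions, for example with a linear-algebra library.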


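Similarly,
\[
E(Y_{ij}(0)\mid A_i=a) = E_{Z_i\mid A_i}\bigl[h^{-1}\bigl(h(\mathrm{expit}(l_i)) - A_i\psi\bigr)\bigr] = E_{Z_i\mid A_i}\bigl[\mathrm{expit}(l_i - A_i\psi)\bigr],
\]
and
\[
U_2 \overset{\mathrm{def}}{=} \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\mathrm{expit}(l_i - A_i\psi)\, I\{A_i=a, Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}.
\]

Delta Method: Suppose $T$ is a consistent estimator of its true value $\theta$ and that, by the central limit theorem,
\[
\sqrt{n}\,(T-\theta) \xrightarrow{D} N(0,\Sigma),
\]
where $n$ is the number of observations and $\Sigma$ is the covariance. Then for any function $h$ for which $\nabla h(\theta)$ exists and is non-zero,
\[
\sqrt{n}\,\bigl(h(T)-h(\theta)\bigr) \xrightarrow{D} N\bigl(0,\ \nabla h(\theta)^T\,\Sigma\,\nabla h(\theta)\bigr).
\]
In our case, let
\[
\theta = \begin{pmatrix} E(Y_{ij}(a)\mid A_i=a) \\ E(Y_{ij}(0)\mid A_i=a) \end{pmatrix}
\quad\text{and}\quad
T = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix},
\]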


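as the respective estimator of $\theta$. We obtain the following partial derivatives:
\[
\begin{aligned}
\frac{\partial U_1}{\partial\alpha_0} &= 0, \qquad \frac{\partial U_1}{\partial\psi_1} = 0, \qquad \frac{\partial U_1}{\partial\psi_2} = 0, \\
\frac{\partial U_1}{\partial\beta_{10}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_1}{\partial\beta_{11}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, A_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_1}{\partial\beta_{12}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, A_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_1}{\partial\beta_{21}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, Z_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_1}{\partial\beta_{22}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, Z_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_1}{\partial\beta_{31}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, A_{i1}Z_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_1}{\partial\beta_{32}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, A_{i1}Z_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_1}{\partial\beta_{33}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, A_{i2}Z_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_1}{\partial\beta_{34}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i)}{[1+\exp(l_i)]^2}\, A_{i2}Z_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}.
\end{aligned}
\]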


For $U_2$, where $A_i\psi = A_{i1}\psi_1 + A_{i2}\psi_2$,
\[
\begin{aligned}
\frac{\partial U_2}{\partial\alpha_0} &= 0, \\
\frac{\partial U_2}{\partial\psi_1} &= -\frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, A_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\psi_2} &= -\frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, A_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{10}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{11}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, A_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{12}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, A_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{21}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, Z_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{22}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, Z_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{31}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, A_{i1}Z_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{32}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, A_{i1}Z_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{33}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, A_{i2}Z_{i1}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}, \\
\frac{\partial U_2}{\partial\beta_{34}} &= \frac{\sum_{\text{all possible } z}\sum_i\sum_j w_{ij}\,\frac{\exp(l_i - A_i\psi)}{[1+\exp(l_i - A_i\psi)]^2}\, A_{i2}Z_{i2}\, I\{A_i=a,Z_i=z\}}{\sum_i\sum_j w_{ij}\, I\{A_i=a\}}.
\end{aligned}
\]
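The plug-in estimators U1 and U2 are weighted averages over the clusters with A_i = a, since summing the indicator over all possible z simply recovers the indicator of A_i = a. The Python sketch below computes both under the same assumptions as before: a saturated linear predictor in A_i1, A_i2, Z_i1, Z_i2 and their products, with hypothetical names (`plug_in_means`, the dict keys `A`, `Z`, `weights`) chosen only for illustration.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def plug_in_means(clusters, a, beta, psi):
    """clusters: list of dicts with keys
         'A': (A_i1, A_i2), 'Z': (Z_i1, Z_i2), 'weights': list of w_ij.
    a: adherence level of interest as a tuple (a1, a2).
    Returns (U1, U2), the weighted averages of expit(l_i) and
    expit(l_i - A_i psi) over units in clusters with A_i == a."""
    num1 = num2 = den = 0.0
    for c in clusters:
        if tuple(c["A"]) != tuple(a):
            continue  # indicator I{A_i = a}
        a1, a2 = c["A"]
        z1, z2 = c["Z"]
        x = [1.0, a1, a2, z1, z2, a1 * z1, a1 * z2, a2 * z1, a2 * z2]
        l = sum(b * xk for b, xk in zip(beta, x))
        w = sum(c["weights"])  # l_i is constant within cluster i
        num1 += w * expit(l)
        num2 += w * expit(l - a1 * psi[0] - a2 * psi[1])
        den += w
    if den == 0.0:
        raise ValueError("no sampled units with A_i equal to a")
    return num1 / den, num2 / den
```

The risk ratio and risk difference then follow as U1 / U2 and U1 - U2.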


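Therefore,
\[
\nabla h(T) = \left(\begin{array}{cccccccccccc}
\frac{\partial U_1}{\partial\alpha_0} & \frac{\partial U_1}{\partial\psi_1} & \frac{\partial U_1}{\partial\psi_2} & \frac{\partial U_1}{\partial\beta_{10}} & \frac{\partial U_1}{\partial\beta_{11}} & \frac{\partial U_1}{\partial\beta_{12}} & \frac{\partial U_1}{\partial\beta_{21}} & \frac{\partial U_1}{\partial\beta_{22}} & \frac{\partial U_1}{\partial\beta_{31}} & \frac{\partial U_1}{\partial\beta_{32}} & \frac{\partial U_1}{\partial\beta_{33}} & \frac{\partial U_1}{\partial\beta_{34}} \\
\frac{\partial U_2}{\partial\alpha_0} & \frac{\partial U_2}{\partial\psi_1} & \frac{\partial U_2}{\partial\psi_2} & \frac{\partial U_2}{\partial\beta_{10}} & \frac{\partial U_2}{\partial\beta_{11}} & \frac{\partial U_2}{\partial\beta_{12}} & \frac{\partial U_2}{\partial\beta_{21}} & \frac{\partial U_2}{\partial\beta_{22}} & \frac{\partial U_2}{\partial\beta_{31}} & \frac{\partial U_2}{\partial\beta_{32}} & \frac{\partial U_2}{\partial\beta_{33}} & \frac{\partial U_2}{\partial\beta_{34}}
\end{array}\right).
\]
Then, using the delta method,
\[
\mathrm{Var}(T) = \nabla h(T)\,\mathrm{Var}(\hat\alpha_0,\hat\psi,\hat\beta)\,\nabla h(T)^T.
\]
Now we turn to RR and RD. Let
\[
h\begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} U_1/U_2 \\ U_1 - U_2 \end{pmatrix};
\]
then the gradient, with rows corresponding to RR and RD, is
\[
\nabla h\begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} \frac{1}{U_2} & -\frac{U_1}{U_2^2} \\ 1 & -1 \end{pmatrix}.
\]
Using the delta method again, we obtain the variance of RR and RD:
\[
\mathrm{Var}\begin{pmatrix} RR \\ RD \end{pmatrix} = \nabla h\begin{pmatrix} U_1 \\ U_2 \end{pmatrix} \mathrm{Var}(T)\, \nabla h\begin{pmatrix} U_1 \\ U_2 \end{pmatrix}^T.
\]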
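The final delta-method step for RR and RD involves only 2x2 matrices, so it can be checked by hand or with a short Python sketch. The function name `rr_rd_variance` is hypothetical, and the 2x2 matrix of partial derivatives is treated as a Jacobian J whose rows correspond to RR and RD, so that the covariance is J Var(T) J^T; that layout is an assumption of this sketch.

```python
def rr_rd_variance(u1, u2, var_t):
    """u1, u2: the estimates U1 and U2.
    var_t: 2x2 covariance matrix of (U1, U2) as nested lists.
    Returns (rr, rd, cov) where cov is the 2x2 delta-method
    covariance of (RR, RD) = (U1/U2, U1 - U2)."""
    if u2 == 0:
        raise ValueError("U2 must be non-zero for the risk ratio")
    rr = u1 / u2
    rd = u1 - u2
    # Jacobian of h(U1, U2) = (U1/U2, U1 - U2); rows are RR and RD.
    J = [[1.0 / u2, -u1 / u2 ** 2],
         [1.0, -1.0]]
    # cov = J * var_t * J^T
    JV = [[sum(J[r][k] * var_t[k][c] for k in range(2)) for c in range(2)]
          for r in range(2)]
    cov = [[sum(JV[r][k] * J[c][k] for k in range(2)) for c in range(2)]
           for r in range(2)]
    return rr, rd, cov
```

Standard errors for RR and RD are the square roots of the diagonal entries of `cov`.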


BIOGRAPHICAL SKETCH

Zhulin He was born in Tianjin, China. After graduating from Tianjin Nankai High School in 2002, she entered Nankai University for her four-year undergraduate study at the School of Mathematical Science, majoring in Statistics. From 2006 to 2008, she attended the University of Calgary in Canada. She finished her thesis on Estimation for Censored Single-Index Models Using MAWVE and OPWG Methods under the guidance of her advisor, Dr. Xuewen Lu, and received the degree of Master of Science in Statistics in 2008. During the following years, she studied biostatistics under the guidance of her advisor, Dr. Babette Brumback. She mainly studied and worked on causal inference with complex sampling designs. In the summer of 2012, she received her Ph.D. from the University of Florida.