
Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2008-02-29.

Permanent Link: http://ufdc.ufl.edu/UFE0021374/00001

Material Information

Title: Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2008-02-29.
Physical Description: Book
Language: english
Creator: Li, Hongying
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2007

Subjects

Subjects / Keywords: Statistics -- Dissertations, Academic -- UF
Genre: Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Statement of Responsibility: by Hongying Li.
Thesis: Thesis (Ph.D.)--University of Florida, 2007.
Local: Adviser: Wu, Rongling.
Electronic Access: INACCESSIBLE UNTIL 2008-02-29

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2007
System ID: UFE0021374:00001


Full Text

PAGE 4

I thank my committee chair, Professor Rongling Wu, for his support, encouragement, and guidance. Dr. Wu provided me with an invaluable opportunity to study statistical genetics and led me to this fascinating area. He is always generous with his help to me. I thank Dr. Ghosh for teaching me statistical inference, which has benefited me remarkably. I thank Dr. Ramon Littell, Dr. Malay Ghosh, Dr. Xueli Liu, and Dr. Julie Johnson for serving on my dissertation committee. Thanks go to all my friends, both at the University of Florida and elsewhere, who made the last five years full of fun. Finally, special thanks go to my family for their love and support.

PAGE 5

page
ACKNOWLEDGMENTS ... 4
LIST OF TABLES ... 7
LIST OF FIGURES ... 8
ABSTRACT ... 9
CHAPTER
1 INTRODUCTION ... 11
  1.1 Genetic Mapping ... 11
  1.2 Functional Mapping ... 12
  1.3 Genetic Mapping: From QTL to QTN ... 14
    1.3.1 Single Nucleotide Polymorphisms ... 15
    1.3.2 Quantitative Trait Nucleotides ... 16
  1.4 Nonignorable Dropouts ... 18
  1.5 Dissertation Goals ... 19
2 GENERAL FRAMEWORK FOR FUNCTIONAL QTN MAPPING ... 22
  2.1 Introduction ... 22
  2.2 Model ... 23
    2.2.1 Likelihood Function ... 23
    2.2.2 Modeling the Mixture Proportions ... 25
    2.2.3 Modeling the Mean Vector ... 28
    2.2.4 Modeling the Covariance Matrix ... 29
  2.3 Computational Algorithm ... 31
  2.4 Hypothesis Testing ... 31
3 MODELS FOR ANALYZING LONGITUDINAL DATA WITH NON-IGNORABLE DROPOUT ... 34
  3.1 Introduction ... 34
  3.2 Joint Models ... 35
    3.2.1 Selection Models ... 36
    3.2.2 Mixture Models ... 39
  3.3 Conclusions ... 40
4 PATTERN MIXTURE MODEL FOR FUNCTIONAL MAPPING OF QUANTITATIVE TRAIT NUCLEOTIDES ... 41
  4.1 Introduction ... 41
  4.2 Likelihood Functions ... 42
    4.2.1 Notation ... 42

PAGE 6

    ... 42
    4.2.3 Joint Likelihood ... 43
      4.2.3.1 Conditional distribution of longitudinal data ... 43
      4.2.3.2 Conditional distribution of dropout time ... 44
      4.2.3.3 Observed likelihood function ... 45
  4.3 Parameter Estimation ... 45
  4.4 Variance Estimation ... 49
  4.5 Hypothesis Testing ... 50
  4.6 A Worked Example ... 52
    4.6.1 Background and Data Summary ... 52
    4.6.2 Data Analysis ... 54
      4.6.2.1 Model 1: analysis using completers only ... 54
      4.6.2.2 Model 2: analysis using all longitudinal data ... 54
      4.6.2.3 Model 3: analysis using joint model ... 55
      4.6.2.4 Marginal distribution over the dropout time ... 57
  4.7 Simulation Studies ... 58
    4.7.1 Simulation Scenarios ... 58
    4.7.2 Heritability ... 59
    4.7.3 Simulation Parameters ... 61
    4.7.4 Simulation Results ... 61
  4.8 Discussion ... 62
5 SELECTION MODEL FOR FUNCTIONAL MAPPING OF QUANTITATIVE TRAIT NUCLEOTIDES ... 79
  5.1 Introduction ... 79
  5.2 Likelihood Functions ... 79
  5.3 Parameter Estimation ... 82
  5.4 Asymptotic Sampling Variances ... 85
  5.5 Monte Carlo Simulation ... 85
    5.5.1 Simulation Scenarios ... 85
    5.5.2 Results ... 87
  5.6 Discussion ... 87
6 CONCLUSIONS AND PROSPECTS ... 89
REFERENCES ... 91
BIOGRAPHICAL SKETCH ... 99

PAGE 7

Table   page
2-1 Possible diplotypes and their frequencies for each of nine genotypes ... 33
4-1 Completers and dropouts information ... 70
4-2 Likelihood ratio test statistics and p-values for the tests of four possible risk haplotypes from candidate gene β2AR using completers only ... 70
4-3 Estimated parameters assuming [GG] as the risk haplotype using completers only ... 70
4-4 Likelihood ratio test statistics and p-values for the tests of four possible risk haplotypes from candidate gene β2AR using all longitudinal data ... 71
4-5 Comparison of the three joint models ... 71
4-6 Estimated parameters for the joint model IV ... 72
4-7 True parameters describing the simulation models ... 73
4-8 Maximum likelihood estimates (MLEs) of SNP allele frequencies and linkage disequilibrium ... 73
4-9 Simulation results for scenario 1 ... 74
4-10 Simulation results for scenario 2 ... 75
4-11 Simulation results for scenario 3 ... 76
4-12 Simulation results for scenario 4 ... 77
4-13 p-values of the tests for QTN under the three models for all four scenarios under simulation ... 78
5-1 Maximum likelihood estimates (MLEs) of SNP allele frequencies and linkage disequilibrium ... 88
5-2 Maximum likelihood estimates (MLEs) of the parameters which describe the longitudinal curve and dropout process ... 88

PAGE 8

Figure   page
1-1 SNPs, haplotypes and tag SNPs ... 20
1-2 Haplotype configuration of a diplotype for two hypothesized SNPs ... 21
1-3 Diplotype configuration of a double heterozygote genotype ... 21
4-1 Dobutamine drug response experiment data ... 65
4-2 Estimated mean curves using completers only (model 1) ... 65
4-3 Sample means of heart rates at different dosages ... 66
4-4 Estimated mean curves from joint model I ... 66
4-5 Estimated mean curves from joint model II ... 67
4-6 Estimated mean curves from joint model III ... 67
4-7 Estimates of the marginal curves of heart rate ... 68
4-8 Estimates of the marginal curves of heart rate compared with the curves from model 2 ... 69
4-9 The simulation mean curves for all the four scenarios ... 69

PAGE 9

Although a handful of statistical tools have been available to associate genotypes with phenotypes, there is a pressing need for tools built on more sophisticated statistical models that can detect specific DNA sequence variants (called quantitative trait nucleotides or QTNs) that encode complex phenotypes. Some key issues related to the identification of QTNs responsible for longitudinal responses have been solved by functional mapping. However, the utilization of functional mapping may be limited in practice by the problem of missing or incomplete responses that is ubiquitous in clinical trials. In clinical trials, there are always some patients who have to drop out early due to reasons like physiological side effects or limited duration, presenting a significant challenge in statistical inference.

In my research, I derive a statistical model for mapping QTNs that control longitudinal responses subject to non-ignorable dropout by embedding the idea of pattern mixture models into the framework of functional mapping. A general Newton-Raphson algorithm was derived to maximize the likelihood function. A number of clinically meaningful hypotheses were formulated to test the genetic control mechanisms for longitudinal responses and the dropout process. The model was applied to a study from a pharmacogenomic project, in which a QTN was found to affect heart rate response to increasing doses of dobutamine. Extensive simulation studies were performed to investigate the statistical properties of the model and validate its usefulness. The new model can be used to

PAGE 10


PAGE 11

[1]. These so-called complex traits are usually controlled by a network of genes that operate independently or interactively to affect the formation of a trait and its progression. Furthermore, the expression of the genes involved depends on the genetic background of an organism, the stage of its development and the environment in which the organism is reared. Because of these characteristics, classic Mendelian genetic approaches have been of little use for the study of the precise genetic architecture of a complex trait.

In order to map the genes, we need the facility of gene markers. Gene markers are detectable genetic traits or distinctive segments of DNA that serve as landmarks for a target gene. Markers are genetically linked to the target gene, so that through linkage analysis the latter can be inferred from the former. With the discovery of polymorphic molecular markers that can be generated almost without limit from the genome of any organism, the genetic analysis of complex traits has entered a new era in which genes that control a complex trait can be located individually on the genome and their genetic effects can be estimated [2]. The motive force that allows for the identification of individual genes for complex traits, or quantitative trait loci (QTL), comes from the publication of Lander and Botstein's seminal paper on QTL mapping [3]. Lander and Botstein proposed a statistical model within the context of a Gaussian mixture model, implemented with the expectation-maximization (EM) algorithm, to test the existence of QTL and estimate their genetic effects on a complex trait. The past 18 years have witnessed a resurgence of interest in the development of statistical models and algorithms for genetic mapping, aimed to improve the precision of QTL mapping [4]-[7] and equip the mapping models to suit different genetic designs (F2/backcross or full-sib family), marker types (dominant

PAGE 12

[8]-[11]. These statistical mapping models have played an important role in the characterization of the genetic architecture of complex traits, such as grain yield and its components in agriculture [12], life-history traits in evolutionary biology [13],[14] and diseases in humans [15].

[16]-[19] or infinite-dimensional characters [20]. Other examples of function-valued traits include the continuous change of a morphological or physiological variable with body size (allometric scaling) [21]-[23] and responsive phenotypes of a given genotype to a changing environment (reaction norm) [24]. The common property of these function-valued traits is that they can be described as a function (or stochastic process) of some independent and continuous variable consisting of an infinite number of points, such as age, temperature, light intensity or biological size.

To determine the dynamic changes of genetic effects in development, one can measure trait values at a series of discrete time points or states and then extend the traditional interval mapping approach to accommodate the multivariate nature of time-dependent traits. However, this extension is limited in three aspects:
(1) Expected means of different QTL genotypes at all points and all elements in the covariance matrix need to be estimated, resulting in substantial computational difficulties when the vector and matrix dimensions are large;
(2) The result from this approach may not be biologically meaningful because the underlying biological principle for the trait is not incorporated;
(3) This approach cannot be well deployed in a practical scheme because of (2). Thus, some biologically interesting questions cannot be asked and answered.
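Limitation (1) can be made concrete with a small count. The sketch below tallies the parameters a multivariate extension of interval mapping would have to estimate, assuming one mean per QTL genotype per time point and a single unstructured covariance matrix shared by all genotypes. The function name and the default of three genotypes are illustrative assumptions, not taken from the thesis.

```python
def multivariate_param_count(n_timepoints: int, n_genotypes: int = 3) -> int:
    """Count free parameters of an unstructured multivariate mapping model.

    Assumes (for illustration only) one mean per genotype per time point
    and one unstructured covariance matrix shared by all genotypes.
    """
    n_means = n_genotypes * n_timepoints
    # A symmetric K x K matrix has K(K+1)/2 distinct elements.
    n_cov = n_timepoints * (n_timepoints + 1) // 2
    return n_means + n_cov


if __name__ == "__main__":
    for k in (6, 10, 20):
        print(k, multivariate_param_count(k))
```

Under these assumptions the count grows quadratically with the number of time points, which is the computational difficulty the first item describes.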

PAGE 13

[25]-[29], reviewed in [9]. The basic principle of this method, called functional mapping, is to express the genotypic means of a QTL at different time points in terms of a continuous growth function with respect to time t. Under this principle, the parameters describing the shape of growth curves, rather than the genotypic means as expected in traditional mapping strategies, are estimated within a maximum likelihood framework. Also unlike traditional mapping strategies, functional mapping estimates the parameters that model the structure of the covariance matrix among multiple different time points and, therefore, largely reduces the number of parameters being estimated for variances and covariances, especially when the number of time points is large.

Although the idea of functional mapping was originally stimulated from a growth model for a controlled cross, this method has been extensively expanded to a wide spectrum of biomedical and evolutionary fields. Wang and Wu [30] and Zhu et al. [31] combined the principle of functional mapping and mathematical models for HIV dynamics established by Ho and colleagues [32] to map host QTL that govern dynamic changes of HIV viral loads in a human body. Functional mapping has been extended to study the genetic control of drug response [33] by implementing well-developed pharmacodynamic and pharmacokinetic models [34]. "Naturally-occurring" or "programmed" cell death (PCD), in which the cell uses specialized cellular machinery to kill itself, is a ubiquitous phenomenon that occurs early in organ development. A recently extended functional mapping model has made it possible to identify and detect QTL that are responsible for PCD in any organism [35].

In theory, genetic information contained within any kind of longitudinally measured data can be extracted by functional mapping. The advantages of functional mapping in model flexibility, stability and power result from its statistical formulation including the mathematical approximation of the mean vector and the modeling of the covariance

PAGE 14

[25] or nonstationary models [36]. Although the original functional mapping was derived for biological processes that can be described by parametric functions, functional mapping has been modified within the nonparametric context to accommodate the situation in which biologically meaningful mathematical equations do not exist [37]. Analysis of longitudinal data needs a number of mathematical manipulations of the covariance matrix, such as the calculation of the matrix determinant and inverse. In modeling the structure of the covariance matrix by parametric functions, however, these manipulations may become infeasible because of the matrix's sparse structure, especially when the dimension of the matrix is too large. To overcome this problem, Zhao et al. [38] proposed a de-noising approach based on wavelet transformation to reduce the dimensionality of longitudinal data. Preserving the favorable properties of functional mapping, the wavelet thresholding approach creates a new direction in statistical genetics to manipulate high-dimensional multivariate dynamic data in both statistically and biologically meaningful ways.

[39]. DNA sequence variations between individuals are responsible for our biological and physical differences. Although approximately 99.9% of human DNA sequences are the same across a population, the remaining 0.1% variations in these sequences can have a major impact on how humans respond to disease, bacteria, viruses, toxins, chemicals and drugs [40]-[43].

PAGE 15

1-1 are from [44]. In this figure, four DNA sequences of the same chromosome region from four different persons are compared. Most of the DNA sequence is identical; only three bases vary. We call bases of this kind Single Nucleotide Polymorphisms or SNPs (Figure 1-1a). By taking out the SNPs from the original DNA sequence, a particular combination of bases at nearby SNPs can be constructed, which is called a haplotype (Figure 1-1b). SNPs are the most frequent form of sequence variation among individuals, accounting for around 90% of sequence differences [45]. The rest may be due to insertions or deletions, repeats and rearrangements. These SNPs are present at high density in the genome, highly conserved and also evolutionarily stable, making them powerful tools for the mapping and diagnosis of disease-related alleles. SNPs are of great value for biomedical research and for developing pharmaceutical products or medical diagnostics.

PAGE 16

1-1c). Recent molecular surveys suggest that the human genome contains many discrete haplotype blocks that are sites of closely located SNPs [46],[47]. Each block may have a few common haplotypes which account for a large proportion of chromosomal variation. Between adjacent blocks there are large regions, called hotspots, in which recombination events occur with high frequencies. Several algorithms have been developed to identify a minimal subset of SNPs, i.e., "tagging" SNPs, that can characterize the most common haplotypes. The number and type of tagging SNPs within each haplotype block can be assumed to have been determined prior to association studies.

1-2). In a practical genetic analysis, we can only observe the genotype expressed as Aa/Bb. However, the double heterozygote may be one (and only one) of two possible diplotypes

PAGE 17

1-3). In practice, it is important to estimate haplotype effects on a quantitative trait based on the diplotypes and therefore genotypes. For example, if an animal carries haplotype [AB], it will grow better than other animals that carry any other haplotypes, [Ab], [aB] and [ab]. For this reason, the same genotype Aa/Bb may perform differently, depending on what diplotype it carries. If this genotype is diplotype [AB][ab], then it will have better growth. If the animal is diplotype [Ab][aB], its growth will be poorer. The statistical model being developed will be used to determine which diplotype is associated with better growth in experimental crosses.

Genetic mapping with the diplotypes has power to detect particular DNA sequences that encode a complex trait. The nucleotide components of a haplotype or diplotype that affect a quantitative trait are called quantitative trait nucleotides (QTNs). As mentioned above, the human genome project generates massive amounts of DNA sequence data across the human genome [44] and this will facilitate the complete identification of QTNs responsible for complex traits, their number and distribution as well as the number and order of nucleotides that comprise the QTNs. The frequencies of haplotypes for a QTN are also important for the understanding of the population structure and diversity.

As a series of DNA sequences, a QTN that encodes a particular trait can be detected with SNPs genotyped from candidate genes or throughout the whole genome. The recent development of the human haplotype map (HapMap) has made it possible to detect, genome-wide, QTNs that are responsible for complex diseases [44]. Liu et al. [48] were among the first to propose a general statistical model for the characterization of QTN variants that encode a complex phenotype in a natural population. A strategy based on QTL information has also been developed to identify QTN for complex traits in controlled crosses [49].

PAGE 18

[9]. Functional mapping provides a quantitative framework for testing the interplay between genetic actions/interactions and the pattern of responses across different times or states, providing a comprehensive picture of a network of genetic regulations that determine the formation and progression of a disease as well as the prospective effects of drugs designed to treat the disease.

In practice, however, the utilization of functional mapping may be limited by the problem of missing or incomplete responses that is ubiquitous in clinical trials. For an ideal longitudinal study design, data are collected on every subject in the sample at each time, but this usually cannot happen because there are always unobserved responses for some subjects. For example, in a pharmacogenetic project by Prof. J. A. Johnson at the University of Florida, a group of 163 men and women aged from 32 to 83 years were involved in longitudinal measures of heart rate. Dobutamine was injected into these subjects to investigate their response in heart rate to this drug. The subjects received increasing doses of dobutamine until they achieved the target heart rate response or the predetermined maximum dose. The dose levels used were 0 (baseline), 5, 10, 20, 30 and 40 mcg/min, at each of which heart rate was measured. A time interval of 3 minutes was allowed between two successive doses for subjects to reach a plateau in response to that dose. Of the subjects studied, 112 (69%) completed the tests of heart rate at all six dose levels. The others dropped out before the completion of the trial because heart rates at any higher dose levels would be beyond their physiological limits, because they reached other physiological limits (for example, blood pressure that was too high or too low), or because they experienced other adverse effects. A total of 31 (19%), 15 (9%) and 5 (3%) subjects

PAGE 19

[25],[27]-[29], my model incorporates the statistical principle for handling non-ignorable dropouts into the analysis. Specifically, my research achieves the following goals:
(1) Derive a likelihood-based model for functional mapping of longitudinal data with non-ignorable dropouts, providing a statistical tool for studying the genetic control mechanisms of complex biomedical processes;
(2) Frame a statistical procedure for testing pleiotropic effects of QTNs on longitudinal and failure-time outcomes and further characterizing the shared genetic basis of these two processes;
(3) Provide a quantitative framework for testing the interplay between genetic actions and the dynamic pattern of longitudinal variables.

The statistical properties of my models are investigated through simulation studies. The usefulness of the model is also validated by a real data set about the longitudinal pharmacogenetic study of drug response.

PAGE 20

Figure 1-1. SNPs, haplotypes and tag SNPs ([44])

PAGE 21

Figure 1-2. Haplotype configuration of a diplotype for two hypothesized SNPs.

Figure 1-3. Diplotype configuration of a double heterozygote genotype for two hypothesized SNPs.

PAGE 22

[50], despite a considerable number of QTL reported in the literature.

A more accurate and useful approach for the characterization of genetic variants contributing to quantitative variation is to directly analyze DNA sequences associated with a particular disease. If a string of DNA sequence is known to increase disease risk, this risk can be reduced by silencing the expression of this string of DNA sequence with a specialized drug. The control of this disease can be made more efficient if all possible DNA sequences determining its variation are identified in the genome. [48] proposed a haplotyping method to identify DNA sequence variants that are associated with quantitative variation. This method was derived on the basis of multilocus haplotype analysis using a finite number of tag SNPs. A closed form solution was statistically derived to estimate the effects of haplotypes, haplotype frequencies, allele frequencies and the degrees of linkage disequilibria of various orders among tag SNPs underlying the

PAGE 23

[51] extended the idea of haplotyping to map QTNs that regulate a longitudinal trait. In this chapter, I will describe the basic principle of the haplotyping method for the identification of DNA sequence variants that encode a longitudinal response. The motivation and application of this method to study the genetic architecture of drug response will be discussed.

Functional mapping attempts to jointly model mean-covariance structures in longitudinal studies, an area that has recently received considerable interest in the statistical literature [52]-[55]. However, in contrast to general longitudinal modeling, functional mapping integrates the estimation and test process of its underlying parameters within a mixture-based likelihood framework. Each mixture component in the likelihood model is given a particular biological rationale. For a finite mixture model, each observation is assumed to have arisen from one of a known or unknown number of components, each component being modeled by a density from the parametric family.
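The finite mixture idea in the last sentence can be illustrated with a minimal sketch. It evaluates the log-likelihood of scalar observations under a normal mixture with a common standard deviation; the thesis works with multivariate normal components, so the univariate form, the function names and the shared standard deviation are simplifying assumptions made only for illustration.

```python
import math


def normal_pdf(y: float, mean: float, sd: float) -> float:
    """Density of a univariate normal distribution."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(ys, weights, means, sd):
    """Log-likelihood of observations under a finite normal mixture.

    weights: mixture proportions (non-negative, summing to one)
    means:   one mean per component
    sd:      common standard deviation (an illustrative simplification)
    """
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be non-negative and sum to one")
    total = 0.0
    for y in ys:
        # Each observation contributes a weighted sum of component densities.
        density = sum(w * normal_pdf(y, m, sd) for w, m in zip(weights, means))
        total += math.log(density)
    return total
```

In the mapping setting the weights would play the role of composite diplotype frequencies and each component mean would be replaced by a genotype-specific mean vector.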

PAGE 24

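\[ f_q(y_i; \Theta_q) = (2\pi)^{-K/2} |\Sigma_i|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y_i - u_q) \Sigma_i^{-1} (y_i - u_q)' \right\}, \]

where $\Theta_q$ is a generic parameter set, in this case equal to the set of parameters used to describe $(u_q, \Sigma_i)$; $y_i = [y_i(t_1), \ldots, y_i(t_K)]$ is a vector of observations measured at $K$ time points for subject $i$, and $u_q = [u_q(t_1), \ldots, u_q(t_K)]$ is a vector of expected values for composite diplotype $q$ at different points. $\Sigma_i$ is the covariance matrix for subject $i$.

Thus, marginally, the longitudinal data for subject $i$ follow a mixture model:

\[ y_i \sim p(y_i \mid \varpi, \Theta) = \varpi_0 f_0(y_i; \Theta_0) + \varpi_1 f_1(y_i; \Theta_1) + \varpi_2 f_2(y_i; \Theta_2), \tag{2-2} \]

where $\varpi = (\varpi_0, \varpi_1, \varpi_2)$ are the mixture proportions (i.e., composite diplotype frequencies), which are constrained to be non-negative and sum to unity, and $\Theta = (\Theta_0, \Theta_1, \Theta_2)$ denotes the generic parameter sets.

So the overall likelihood function would be:

\[ L(\varpi, \Theta \mid y) = \prod_{i=1}^{N} \sum_{q=0}^{2} \varpi_q \, (2\pi)^{-K/2} |\Sigma_i|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y_i - u_q) \Sigma_i^{-1} (y_i - u_q)' \right\} \tag{2-3} \]

In the above likelihood function, we can see that its derivation needs three necessary steps as follows:
(1) Model the mixture proportions in terms of composite diplotype frequencies,
(2) Model the mean vector for each composite diplotype using parametric or nonparametric approaches,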

PAGE 25

(3) Model the covariance matrix based on the structured or non-structured form.

In the first step, genetic knowledge about the structure of a sampled population is required. In step 2, biologically meaningful mathematical functions that define the dynamic curve for the trait are incorporated.

Let $p_{r_1}, \ldots, p_{r_L}$ be the allele frequencies for these $L$ SNPs. The frequency of haplotype $[r_1, \ldots, r_L]$, denoted as $p_{r_1 r_2 \cdots r_L}$, is calculated as follows: where the $D$'s are the linkage disequilibria of different orders among particular SNPs. For tightly linked loci, cosegregation may lead to non-random associations between alleles in

PAGE 26

In practice, only genotypes can be observed, denoted as $n_{r_1 r_1'/\cdots/r_L r_L'}$, with haplotypes and diplotypes being unobservable. The number of diplotypes is larger than the number of genotypes because the genotypes that are heterozygous at two or more SNPs contain multiple different diplotypes. Diplotype (and therefore genotype) frequencies can be expressed in terms of haplotype frequencies. Diplotype and genotype frequencies are expressed, respectively, as

\[ P_{[r_1^m \cdots r_L^m][r_1^p \cdots r_L^p]} = \begin{cases} p_{r_1 \cdots r_L}^2 & \text{for } [r_1^m \cdots r_L^m] = [r_1^p \cdots r_L^p] = [r_1 \cdots r_L] \\ 2\, p_{r_1^m \cdots r_L^m}\, p_{r_1^p \cdots r_L^p} & \text{for } [r_1^m \cdots r_L^m] \neq [r_1^p \cdots r_L^p]; \end{cases} \]

PAGE 27

2-1 lists all the possible genotypes, diplotypes and diplotype frequencies. Based on Table 2-1, it can be seen that every genotype corresponds to a single diplotype except for the double heterozygous genotype 10/10. This heterozygote can be either diplotype [11][00] or [10][01], each with a different frequency.

The frequency of each genotype is expressed in terms of the frequencies of the four haplotypes $(p_{11}, p_{10}, p_{01}, p_{00})$. Genotype $G_i$ follows a multinomial distribution with 9 possible outcomes whose probabilities are given in Table 2-1. The likelihood function is constructed as where $p = \{p_{11}, p_{10}, p_{01}\}$, $\sum_{l,l'=0}^{1} p_{ll'} = 1$.

The EM algorithm is readily used to obtain the maximum likelihood estimates (MLEs) of $p$. Then we can calculate the MLE of the relative frequency in Table 2-1. Composite diplotype $Q_i$ given $G_i$ and risk haplotype $A$ is determined from Table 2-1. It has three possible categories 0, 1, 2. Theoretically, this estimation process can be extended to any number of SNPs genotyped from the genome.

The frequencies of composite diplotypes, as the mixture proportions for the mixture model (2-2), reflect the frequencies of composite diplotypes that cannot be observed directly but can be inferred from observable SNP genotypes. In total there are 3 different composite diplotypes: AA, AĀ and ĀĀ.

In practice, the risk haplotype cannot be readily detected from marker data. By assigning each of the four haplotypes as the risk haplotype, the corresponding likelihood value is then calculated. Thus, an optimal risk haplotype is one that corresponds to the maximum of the four likelihood values [33],[48].
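The EM step for haplotype frequencies described above can be sketched for the two-SNP case. The code below is a minimal illustration, assuming genotype counts keyed by the "ab/cd" labels of Table 2-1 (alleles at SNP 1, then alleles at SNP 2) and equal starting frequencies; the function names, starting values and stopping rule are assumptions and are not taken from the thesis.

```python
HAPLOTYPES = ("11", "10", "01", "00")


def unambiguous_haplotypes(genotype: str):
    """Return the two haplotypes of a genotype with at most one heterozygous SNP.

    A genotype is written 'ab/cd': 'ab' are the alleles at SNP 1 and 'cd'
    the alleles at SNP 2. Pairing alleles position by position then gives
    the only possible diplotype.
    """
    snp1, snp2 = genotype.split("/")
    return [snp1[0] + snp2[0], snp1[1] + snp2[1]]


def em_haplotype_freqs(counts: dict, n_iter: int = 200, tol: float = 1e-10):
    """Estimate haplotype frequencies from two-SNP genotype counts by EM."""
    n_total = sum(counts.values())
    p = {h: 0.25 for h in HAPLOTYPES}  # illustrative starting values
    for _ in range(n_iter):
        hap_counts = {h: 0.0 for h in HAPLOTYPES}
        for genotype, n in counts.items():
            if genotype == "10/10":
                # E-step: split double heterozygotes between diplotypes
                # [11][00] and [10][01] by their current relative frequency.
                a = p["11"] * p["00"]
                b = p["10"] * p["01"]
                w = a / (a + b) if (a + b) > 0 else 0.5
                hap_counts["11"] += n * w
                hap_counts["00"] += n * w
                hap_counts["10"] += n * (1 - w)
                hap_counts["01"] += n * (1 - w)
            else:
                for h in unambiguous_haplotypes(genotype):
                    hap_counts[h] += n
        # M-step: each subject carries two haplotypes.
        new_p = {h: hap_counts[h] / (2 * n_total) for h in HAPLOTYPES}
        change = max(abs(new_p[h] - p[h]) for h in HAPLOTYPES)
        p = new_p
        if change < tol:
            break
    return p
```

Selecting the risk haplotype would then amount to repeating the downstream likelihood calculation once per entry of `HAPLOTYPES` and keeping the one with the largest likelihood, as the last paragraph describes.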

PAGE 28

[56]. In functional QTN mapping, we often study the genetic architecture of biological traits such as heights or weights measured at consecutive ages, or blood pressures measured at particular times for different patients or different doses of a certain drug. These longitudinal traits often go asymptotically to an upper or lower bound or exhibit sudden changes in behavior during the study [57]. In these cases, nonlinear curves based on the underlying biological rationale are more appropriate for model fitting. So the longitudinal mean of a composite diplotype $g$ at different times or states can often be modeled by a mathematical equation of biological meaning. Depending on the situation, the curves include logistic growth curves, Emax curves, etc. If no obvious biological rationale is available for the study, we can use ordinary methods such as polynomial curves and nonparametric methods.

Consider a typical dose response curve in pharmacodynamic studies, which specifies the relationship between drug concentration ($c$) and drug effect ($E$) based on

\[ E(c) = E_0 + \frac{E_{\max}\, c^{H}}{EC_{50}^{H} + c^{H}}, \]

where $E_0$ is the baseline value, $E_{\max}$ is the asymptotic (limiting) effect, $EC_{50}$ is the drug concentration that results in 50% of the maximal effect, and $H$ is the slope parameter that determines the slope of the concentration-response curve. The larger $H$, the steeper the linear phase of the log-concentration-effect curve. This is called an Emax curve. By estimating these curve parameters separately for different composite diplotypes, functional mapping can determine how the DNA sequence variants influence drug response based on the shape differences among the three curves.
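As a small illustration of the Emax curve, the sketch below evaluates the standard sigmoid form $E(c) = E_0 + E_{\max} c^H / (EC_{50}^H + c^H)$, which matches the parameter definitions given above. The function name and the numerical values in the usage example are hypothetical and are not estimates from the thesis.

```python
def emax_effect(c: float, e0: float, emax: float, ec50: float, h: float) -> float:
    """Sigmoid Emax model: E(c) = E0 + Emax * c^H / (EC50^H + c^H)."""
    if c < 0 or ec50 <= 0:
        raise ValueError("concentration must be >= 0 and EC50 must be > 0")
    return e0 + emax * c**h / (ec50**h + c**h)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the curve shape
    # over the six dose levels mentioned in the introduction.
    for dose in (0, 5, 10, 20, 30, 40):
        print(dose, round(emax_effect(dose, e0=70.0, emax=60.0, ec50=15.0, h=2.0), 2))
```

At `c = 0` the function returns the baseline `e0`, and at `c = ec50` it returns the baseline plus half of `emax`, which is the defining property of $EC_{50}$.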

PAGE 29

By estimating the parameter vector $u_g = (E_0, E_{\max g}, EC_{50g}, H_g)$, the dose-dependent drug effect curves for different composite diplotypes can be estimated. From these estimates, the longitudinal changes of various genetic effects can be tested.

Modeling the mean vector by the Emax equation has biological merits in terms of its solid biological foundation and the implementation of biologically meaningful hypotheses. The modeling of the mean vector is also statistically advantageous. Compared to conventional multivariate interval mapping for studying multi-dose drug response data, functional mapping estimates far fewer parameters.

[58]. Because of its elegant mathematical and statistical properties, the autoregressive process has been widely used for studies of longitudinal data measurements. The first-order autoregressive model has been successfully applied to model the structure of the within-subject covariance matrix for functional mapping. The AR(1) model is based on two simplifying assumptions: variance stationarity, i.e., the residual variance ($\sigma^2$) is unchanged over time points, and covariance stationarity, i.e., the correlation between different measurements decreases proportionally (in $\rho$) with increased time interval. In practice, the two simplifying assumptions of the AR(1) model may not hold. Zimmerman and Núñez-Antón [56] proposed a so-called structured antedependence (SAD) model to model the age-specific change of correlation in the analysis of longitudinal traits. The SAD model

PAGE 30

[36].

If the measurement times are assumed to be $(t_1, t_2, \ldots, t_K)$, a second order SAD model can be written as for $k = 3, \ldots, K$. Here the error terms $\epsilon(t_k)$ are assumed to be normally distributed with mean zero and variance $\sigma^2(t_k)$, termed "innovation variances" at time $t_k$. The innovation variances can be assumed to be a parametric function of time, such as a polynomial function [59]. The antedependence parameters $\phi_1$ and $\phi_2$ are unconstrained, so any parametric function of time can be considered. For example, $\phi_1(t_k, t_{k-1})$ and $\phi_2(t_k, t_{k-2})$ can be taken as exponential functions of the lags $t_k - t_{k-1}$ and $t_k - t_{k-2}$, respectively.

For first order SAD models with an antedependence parameter $\phi$, if we assume the innovation variances $\sigma^2(t)$ to be a constant $\sigma^2$ over time, explicit forms of the variance and correlation functions can be obtained: For this simplest SAD(1) model, the variance and correlation functions are non-stationary. They change as time and the time interval change.
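The non-stationarity of the simplest SAD(1) model can be checked numerically. The sketch below builds the covariance matrix implied by a first-order antedependence recursion, assuming the common form $e(t_1) = \epsilon(t_1)$ and $e(t_k) = \phi\, e(t_{k-1}) + \epsilon(t_k)$ with independent innovations of constant variance. This recursion, the function names and the example values are assumptions made for illustration.

```python
def sad1_covariance(n_times: int, phi: float, innov_var: float):
    """Covariance matrix implied by a first-order antedependence recursion.

    Assumes e(t_1) = eps(t_1) and e(t_k) = phi * e(t_{k-1}) + eps(t_k),
    with independent innovations of constant variance innov_var.
    """
    var = [0.0] * n_times
    var[0] = innov_var
    for k in range(1, n_times):
        # Variance accumulates: Var e_k = phi^2 * Var e_{k-1} + innov_var.
        var[k] = phi * phi * var[k - 1] + innov_var
    cov = [[0.0] * n_times for _ in range(n_times)]
    for i in range(n_times):
        for j in range(i, n_times):
            # For j >= i, Cov(e_i, e_j) = phi^(j - i) * Var(e_i).
            cov[i][j] = cov[j][i] = phi ** (j - i) * var[i]
    return cov


def correlation(cov, i: int, j: int) -> float:
    """Correlation between time points i and j from a covariance matrix."""
    return cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5


if __name__ == "__main__":
    cov = sad1_covariance(4, phi=0.5, innov_var=1.0)
    print([round(cov[k][k], 4) for k in range(4)])
    print(round(correlation(cov, 0, 1), 4), round(correlation(cov, 1, 2), 4))
```

Under this recursion the variance grows with the time index and the correlation between equally spaced measurements depends on where the pair sits in time, in contrast to the constant variance and lag-only correlation of AR(1).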

PAGE 31

[56].

[60], can be implemented to obtain the maximum likelihood estimates (MLEs) of three groups of unknown parameters in a QTN mapping model, that is, the QTN-segregating parameters ($\Theta_p$) that specify the segregation patterns of a QTN in the population, the curve parameters ($\Theta_{u_j}$) that model the mean vector, and the parameters ($\Theta_v$) that model the structure of the covariance matrix. All these unknowns $\Theta = (\Theta_p, \Theta_{u_j}, \Theta_v)$ are contained within the multinomial likelihood function (2-5) and the mixture model described by equation (2-2).

This can be tested by formulating the following hypotheses, where $H_0$ corresponds to the reduced model, in which the data can be fitted by a single curve, and $H_1$ corresponds to the full model, in which there exist different curves to fit the data. The likelihood ratio test statistic for testing the hypotheses in (2-8) is calculated as:

\[ LR = -2\left[ \log L(\tilde{\Theta} \mid y) - \log L(\hat{\Theta} \mid y, G) \right], \]

where $\tilde{\Theta}$ and $\hat{\Theta}$ denote the MLEs of the unknown parameters under $H_0$ and $H_1$, respectively. Note that the estimation of $\hat{\Theta}$ depends on both phenotypic values ($y$) and genetic marker data ($G$), whereas the estimation of $\tilde{\Theta}$ only depends on $y$. The critical threshold for the declaration of a QTN can be determined approximately from a $\chi^2$

PAGE 32

One of the most important biological advantages of functional mapping is that it provides a quantitative and testable framework for understanding the relationship between gene action and various developmental events [27]. For example, when does the QTL or QTN turn on or off to determine the dynamic process of a trait during the time course? In the case of growth curves, does the same QTL/QTN pleiotropically affect the exponentially and asymptotically growing phases? How does the QTL/QTN mediate the transition point of the two phases?
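Each of these questions leads to a comparison of a reduced and a full model through a likelihood ratio. The sketch below computes the statistic LR = -2[log L(H0) - log L(H1)] from two maximized log-likelihoods and an approximate chi-square tail probability. The closed-form tail used here only covers even degrees of freedom, and the appropriate degrees of freedom for a given test are an assumption the user must supply; the function names are hypothetical.

```python
import math


def likelihood_ratio(loglik_h0: float, loglik_h1: float) -> float:
    """LR = -2 [log L under H0 - log L under H1]."""
    return -2.0 * (loglik_h0 - loglik_h1)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total
```

For example, `chi2_sf_even_df(likelihood_ratio(ll0, ll1), df)` gives an approximate p-value once `ll0`, `ll1` and `df` are available from fitted models. When the chi-square approximation is doubtful, an empirical threshold would be needed instead.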

Table 2-1. Possible diplotypes and their frequencies for each of nine genotypes at two SNPs within a QTN, and composite diplotypes obtained by assigning each of the four haplotypes, respectively, as a risk haplotype (A). Ā denotes a non-risk haplotype.

Genotype  Diplotype       Frequency   Relative frequency           Composite diplotype
          configuration                                            A=[11]  A=[10]  A=[01]  A=[00]
11/11     [11][11]        p11^2       1                            AA      ĀĀ      ĀĀ      ĀĀ
11/10     [11][10]        2p11p10     1                            AĀ      AĀ      ĀĀ      ĀĀ
11/00     [10][10]        p10^2       1                            ĀĀ      AA      ĀĀ      ĀĀ
10/11     [11][01]        2p11p01     1                            AĀ      ĀĀ      AĀ      ĀĀ
10/10     [11][00]        2p11p00     p11p00/(p11p00 + p10p01)     AĀ      ĀĀ      ĀĀ      AĀ
          [10][01]        2p10p01     p10p01/(p11p00 + p10p01)     ĀĀ      AĀ      AĀ      ĀĀ
10/00     [10][00]        2p10p00     1                            ĀĀ      AĀ      ĀĀ      AĀ
00/11     [01][01]        p01^2       1                            ĀĀ      ĀĀ      AA      ĀĀ
00/10     [01][00]        2p01p00     1                            ĀĀ      ĀĀ      AĀ      AĀ
00/00     [00][00]        p00^2       1                            ĀĀ      ĀĀ      ĀĀ      AA
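
The bookkeeping behind this table can be sketched in a few lines. The code below is an illustration only: it assumes Hardy-Weinberg equilibrium, the function names are hypothetical, and the inputs ($p_A = p_B = 0.6$, $D = 0.08$) are illustrative values. It enumerates the ten diplotypes, collapses them into the nine observable genotypes, and computes the distribution of the composite diplotype, coded here as 0 for two copies of the risk haplotype, 1 for one copy and 2 for none.

```python
from itertools import combinations_with_replacement


def haplotype_freqs(p_a, p_b, d):
    """Haplotype frequencies from allele frequencies and linkage disequilibrium D."""
    return {
        "11": p_a * p_b + d,
        "10": p_a * (1 - p_b) - d,
        "01": (1 - p_a) * p_b - d,
        "00": (1 - p_a) * (1 - p_b) + d,
    }


def diplotype_freqs(hap):
    """Hardy-Weinberg frequencies of the ten unordered diplotypes."""
    out = {}
    for h1, h2 in combinations_with_replacement(sorted(hap, reverse=True), 2):
        out[(h1, h2)] = hap[h1] * hap[h2] * (1 if h1 == h2 else 2)
    return out


def genotype_of(diplotype):
    """Observed two-SNP genotype, e.g. ('11', '00') -> '10/10'."""
    h1, h2 = diplotype
    snp1 = "".join(sorted(h1[0] + h2[0], reverse=True))
    snp2 = "".join(sorted(h1[1] + h2[1], reverse=True))
    return snp1 + "/" + snp2


def composite_distribution(dip, risk):
    """[P(Q=0), P(Q=1), P(Q=2)], with Q = 0 for two copies of the risk haplotype."""
    probs = [0.0, 0.0, 0.0]
    for pair, freq in dip.items():
        probs[2 - pair.count(risk)] += freq
    return probs


hap = haplotype_freqs(0.6, 0.6, 0.08)
dip = diplotype_freqs(hap)
genotypes = {}
for pair, freq in dip.items():
    key = genotype_of(pair)
    genotypes[key] = genotypes.get(key, 0.0) + freq
q_probs = composite_distribution(dip, risk="11")
print(len(dip), len(genotypes), q_probs)
```

Only the double heterozygote 10/10 merges two diplotypes, which is why its composite diplotype cannot be read directly from the genotype.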

[61], [62]. Average rates of change over an independent variable, or slopes, in various stages are a common summary measure. During the study, subjects may withdraw or be removed from an assigned treatment for various physiological or personal reasons. It is generally suggested that such an event may be related to both observed and unobserved longitudinal outcomes. Special statistical techniques are needed to analyze these longitudinal data with dropout.

In 1987, Little and Rubin [63] classified the missing-data mechanisms into three categories:

(1) Missing completely at random (MCAR): the probability of non-response depends on neither the observed nor the unobserved longitudinal outcomes,
(2) Missing at random (MAR): non-response depends on the observed part but not the unobserved part of the longitudinal responses,
(3) Informative dropout (or non-ignorable non-response (NINR)): the probability of dropout depends on the unobserved part of the longitudinal response.

More explicitly, we let $Y_i$ denote the intended complete set of longitudinal measurements, $Y_i^o$ the measurements that have been taken, $Y_i^m$ the missing measurements and $D_i$ the failure time for subject $i$ ($i = 1, \ldots, N$), respectively. Then under missing completely at random, we have [...] Under the missing at random mechanism, [...] Under informative dropout, [...]

[64], under MAR, likelihood-based statistical inferences about the measurement process can ignore the dropout process. However, ignoring an informative dropout process will potentially result in biased inferences about the measurement process. In this chapter, I will provide a general review of the ways in which longitudinal data subject to non-ignorable dropout are modelled and analyzed. For more details, see the work in [61], [65]-[70].

Based on whether there are missing data and how the data are missing, all the subjects studied can be sorted into three categories:

(1) The subjects who complete measurements of responses from the first to the intended completion time point,
(2) The subjects whose responses are missing at one time but are then measured at subsequent time(s), leading to a sparse data structure for these subjects,
(3) The subjects who drop out from the study before the planned completion time, and thus have incomplete responses.

Functional mapping was originally proposed to map and identify genes that control response curves for data composed of the first category of subjects. Functional mapping can be readily modified to consider the second category of subjects, measured at unevenly spaced time intervals [71], if the intermittent missing responses are ignorable. The reasons for the dropouts of the third category of subjects may include lack of efficacy, adverse effects or death, and thus it is reasonable to consider their dropouts as informative. In this case, we need more complex models than the traditional functional mapping model to draw a precise picture of the genetic architecture of longitudinal responses.
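
The bias caused by ignoring informative dropout can be illustrated with a deliberately simplified calculation. The sketch below is not the model of this chapter: it assumes a single standard normal outcome $Y$ and a hypothetical logistic dropout rule, logit P(dropout | Y = y) = intercept + slope * y, and it evaluates the mean of $Y$ among the subjects who remain by numerical integration on a grid. All names and parameter values are assumptions for illustration.

```python
import math


def completer_mean(slope, intercept=-1.0, grid=4001, half_width=8.0):
    """Mean of Y ~ N(0, 1) among subjects who stay in the study when
    logit P(dropout | Y = y) = intercept + slope * y."""
    num = den = 0.0
    for i in range(grid):
        y = -half_width + 2 * half_width * i / (grid - 1)
        density = math.exp(-0.5 * y * y)                      # unnormalised N(0, 1) density
        stay = 1.0 / (1.0 + math.exp(intercept + slope * y))  # 1 - P(dropout | y)
        num += y * density * stay
        den += density * stay
    return num / den


mcar_mean = completer_mean(slope=0.0)          # dropout unrelated to the outcome
informative_mean = completer_mean(slope=1.5)   # high responders drop out more often
print(mcar_mean, informative_mean)
```

With a zero slope the completers reproduce the population mean of zero; with a positive slope the completers' mean is pulled below it, even though the population itself has not changed.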

[...] by conditioning $D_i$ on $Y_i$, whereas the joint density function in a mixture model is specified as

  $f(Y_i, D_i; \Theta) = f(D_i; \Theta_D)\, f(Y_i \mid D_i; \Theta_{Y|D})$

by first conditioning $Y_i$ upon $D_i$, where $\Theta$ denotes the parameter sets characterizing the joint distribution of $Y_i$ and $D_i$; $\Theta_Y$ and $\Theta_D$ are the parameters characterizing the marginal distributions of $Y_i$ and $D_i$, respectively; and $\Theta_{Y|D}$ and $\Theta_{D|Y}$ characterize the conditional distributions of $Y_i$ given $D_i$ and of $D_i$ given $Y_i$, respectively.

In the formulation of selection models, the failure time distribution may depend directly on elements of $Y_i$, which can be called an outcome-dependent selection model, or may depend on $Y_i$ through the individual random effects used to describe its distribution, which [...]

Outcome-dependent selection models are suited to situations with a fixed discrete set of measurement times $t = (t_1, \ldots, t_K)$ for $Y_i$, $i = 1, \ldots, N$, and in which dropout depends on the most recent missing outcome or on both observed and missing outcomes. Diggle and Kenward [72] proposed an outcome-dependent selection model in which the probability of dropout at time $t_k$ is a function of both the longitudinal outcome history prior to $t_k$ and the unobserved $Y_{ik}$. In their formulation, the longitudinal outcomes follow a multivariate normal distribution, $Y_i \sim N(W_i\beta, \Sigma)$, and the dropout probability is modeled with a logistic regression such as

  $\mathrm{logit}[\Pr\{D = t_k \mid D \ge t_k, H_{ik}, Y_{ik}, Z_i\}] = \gamma_0 + \gamma_1' H_{ik} + \gamma_2 Y_{ik} + \gamma_3' Z_i$,

[...] [73]. This formulation only accommodates monotone dropout, and the measurement times are a fixed set for all subjects.

Sometimes dropout is more directly related to a longitudinal trend. For example, in clinical trials, patients are typically removed from the study if the treatment has no effect or has severe adverse effects over a fixed period of time. These situations imply that the dropout process depends on an underlying disease progression related to $Y_i$, rather than on actual outcomes. Random-effects-dependent selection models, in which the event time distribution and the longitudinal data are taken to depend on a common set of latent random effects, are [...]

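The linear mixed effects model for longitudinal measurements is attractive in this situation since it does not require $Y_i$ to be observed at the same set of occasions or to have the same dimension. The model is formulated as

  $Y_i \mid \beta_i = X_i\alpha + Z_i\beta_i + e_i$ and $f(d_i \mid y_i, \beta_i; \Theta_{D|Y}) = f(d_i \mid \beta_i; \Theta_{D|Y})$,

where $X_i$ is the design matrix of the fixed effects $\alpha$ and $Z_i$ is the design matrix of the random effects $\beta_i$. The measurement error vector $e_i = \{e_{i1}, \ldots, e_{ik_i}\}$ is independent of $\beta_i$, with $e_{ij} \sim N(0, \sigma^2)$, $j = 1, \ldots, k_i$, and $\beta_i \sim N(\beta, V)$. In a simple case where $Y_i$ follows a linear time trend, $\beta_i = (\beta_{0i}, \beta_{1i})'$ represents an individual's random intercept and slope and [...]

Wu and Carroll [74] first proposed a random effects model for longitudinal data in the presence of informative censoring, in which the individual random effects included intercepts and slopes and dropout only occurred at discrete time points where the [...]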

Wulfsohn and Tsiatis [75] proposed such a random-effects-dependent selection model:

  $Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}$,

where $\beta_i = (\beta_{0i}, \beta_{1i})'$ is the random effects vector, distributed as $N(\beta, V)$, $e_{ij} \sim N(0, \sigma_e^2)$ is the measurement error, and $\beta_i$ and $e_{ij}$ are independent. They proposed using the Cox model for the dropout process, where the hazard of dropout depends on the longitudinal data through its current unobserved true value: [...]

[...] [76] extended the model of Wulfsohn and Tsiatis [75]. They assume that the measurement and event processes are conditionally independent given a latent bivariate Gaussian process. Schluchter [77] and De Gruttola and Tu [78] modelled the random effects and the failure time as a multivariate normal distribution. Follmann and Wu [79] extended the linear mixed model for longitudinal data to the generalized linear mixed effects model.

Little [67] proposes a pattern-mixture model in which, for each potential dropout outcome, there is a different pattern for $Y_i$, so that $D_i$ acts as a qualitative covariate. He assumes that $Y_i$ follows a multivariate normal distribution conditional on the dropout time $D_i$.
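
For the random intercept and slope model above, the marginal moments of $Y_i$ follow directly: the mean at time $t_{ij}$ is $\beta_0 + \beta_1 t_{ij}$, and the covariance between two occasions is $V_{00} + (t_{ij} + t_{ik}) V_{01} + t_{ij} t_{ik} V_{11}$, plus $\sigma_e^2$ on the diagonal. The short sketch below evaluates these expressions for an unevenly spaced set of times; the function name and all numerical values are hypothetical.

```python
def marginal_moments(times, beta, V, sigma2_e):
    """Marginal mean and covariance of Y_ij = b0i + b1i * t_ij + e_ij,
    with (b0i, b1i) ~ N(beta, V) and e_ij ~ N(0, sigma2_e)."""
    mean = [beta[0] + beta[1] * t for t in times]
    cov = [[V[0][0] + (tj + tk) * V[0][1] + tj * tk * V[1][1]
            + (sigma2_e if j == k else 0.0)
            for k, tk in enumerate(times)]
           for j, tj in enumerate(times)]
    return mean, cov


mean, cov = marginal_moments([0, 1, 2, 4], beta=(10.0, 2.0),
                             V=[[4.0, 0.5], [0.5, 1.0]], sigma2_e=1.5)
print(mean)
print(cov[3][3])   # the variance grows with time because of the random slope
```

Because the function takes an arbitrary list of times, each subject can have its own measurement schedule and its own number of observations.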

Random-effects mixture models treat the dropout as a random effect for the longitudinal process. The model in Pawitan and Self [80] was a random-effects mixture model in which the failure time process was modelled with a Weibull distribution. Other random-effects mixture models include [61], [65], [81].

Guo et al. [82] proposed a random pattern-mixture model. They generalized the definition of a pattern, which can now be defined according to any good surrogate for the dropout process. For example, the pattern can be defined according to a baseline or time-varying covariate, or according to the time to dropout as originally. They treated the pattern-specific parameters as nuisance parameters and modeled them as random. A constraint is then imposed on the pattern by linking it to the time to dropout using a random-effects survival model, e.g., [83].

[...] [62] and Little [68]. Generally speaking, selection models can be appropriate for both survival and longitudinal settings, whereas mixture models have mainly been used in longitudinal studies. Mixture models are attractive because of their robustness in modeling the dropout process. Both selection and mixture models have drawbacks such as sensitivity to parametric assumptions and computational complexity. Selection models need distributional assumptions for the dropout process. Mixture models often need simplifying assumptions to ensure the identifiability of the parameters.

[...] [9]. But the original functional mapping needs complete longitudinal measurements on a designed time scale and does not allow data to go missing at random during the trial. In reality, it is often inevitable that some subjects drop out permanently from the study before it ends. Ignoring these dropout data and the underlying dropout mechanisms may lead to biased inference about the genetic effects on longitudinal responses.

In statistics, an effective method for analyzing longitudinal data with non-ignorable dropout is based on pattern-mixture models, which construct a likelihood on the joint distribution of the longitudinal responses and dropout times and factor the joint likelihood into the marginal distribution of the dropout time and the conditional distribution of the longitudinal responses given the dropout time [65], [67], [68]. Pattern-mixture models have now been used in many applications in which non-ignorable dropout data are present [61], [65]. In this chapter, I will propose a pattern-mixture model within the framework of functional mapping. In this method, we jointly model the observed longitudinal data and the dropout time given the genetic information. The parameters are estimated by a general Newton-Raphson algorithm. I formulate a number of hypothesis tests regarding the genetic control of longitudinal responses and dropout times. Simulation studies and real data examples are used to demonstrate the power and usefulness of the model. Functional mapping integrated with the theory of non-ignorable dropout will open up a gateway to studying the genetic architecture of complex, biologically and clinically meaningful processes.

4.2.1 Notation

[...] 2.2.2, the observed two-SNP genotype is expressed as $r_1 r_1' / r_2 r_2'$ ($r_1 \ge r_1'$, $r_2 \ge r_2' = 1, 0$). Let $G_i$ denote the SNP genotype and $Q_i$ denote the composite diplotype for subject $i$. The two SNPs form four haplotypes, with the respective frequencies defined in Section 2.2.2, ten diplotypes and nine genotypes. For each subject, $G_i$ is observed and $Q_i$ may be unobservable. Also for each subject, longitudinal data are collected at multiple time points, along with dropout information. The longitudinal observation vector for subject $i$ ($i = 1, \ldots, N$) is denoted by $Y_i = (Y_{i1}, \ldots, Y_{im_i})$, measured at a series of time points $t_i = (t_{i1}, \ldots, t_{im_i})$. Let $D_i$ be the event time for subject $i$, which may be right censored at time $C_i$. The censoring indicator is denoted by $\delta_i = I(D_i \le C_i)$, which is equal to zero if $D_i$ is censored and one otherwise. The observed event time is expressed as $\tilde{D}_i = \min(D_i, C_i)$. $D_i$ takes values in $S = \{s_1, s_2, \ldots, s_L\}$, where the last time point $s_L$ is reserved for the trial completion time. Thus, the data for subject $i$ consist of the longitudinal and event time observations and the SNP genotype information, denoted as $(Y_i, t_i, \tilde{D}_i, \delta_i, G_i)$. Next we will formulate the likelihood function for the observed data.
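
The event-time notation can be made concrete with a small sketch. The helper names below are hypothetical, and the support S = {10, 20, 30, 40} is used only as an example. The second function lists the values of $D_i$ that remain compatible with the observed pair $(\tilde{D}_i, \delta_i)$, which is the information a censored subject contributes.

```python
def observe(dropout_time, censoring_time):
    """Return the observed event time min(D, C) and the indicator I(D <= C)."""
    observed = min(dropout_time, censoring_time)
    delta = 1 if dropout_time <= censoring_time else 0
    return observed, delta


def candidate_times(observed, delta, support):
    """Values of D in the support that are compatible with the observed data."""
    if delta == 1:
        return [observed]                       # dropout time observed exactly
    return [s for s in support if s > observed]  # censored: D lies beyond the observed time


S = [10, 20, 30, 40]
print(observe(20, 30))                 # dropout observed at 20
print(observe(40, 25))                 # censored at 25
print(candidate_times(25, 0, S))       # dropout can only be at 30 or 40
```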

[...] (2-4). The frequency of each genotype is expressed in terms of the frequencies of the four haplotypes $(p_{11}, p_{10}, p_{01}, p_{00})$. Genotype $G_i$ follows a multinomial distribution with 9 possible outcomes, with probabilities given in Table 2-1. The likelihood function is constructed as [...]

The EM algorithm is readily used to obtain the maximum likelihood estimates (MLEs) of $p$. We can then calculate the MLEs of the relative frequencies in Table 2-1. The composite diplotype $Q_i$, given $G_i$ and the risk haplotype A, is determined from Table 2-1. It has three possible categories: 0, 1, 2. Theoretically, this estimation process can be extended to any number of SNPs genotyped from the genome.

The joint density distribution of $Y_i$ and $D_i$ given $Q_i$ is [...]

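For the covariance structure of the measurement process, various parametric forms have been proposed, as discussed in Section 2.2.4 of Chapter 2. We will use the structured antedependence model of order one as an illustration.

The conditional density function for the longitudinal measurements given the dropout time and composite diplotype of the $i$th subject is

  $f(y_i \mid D_i = s_l, Q_i = q) = (2\pi)^{-m_i/2}\,|\Sigma_i|^{-1/2} \exp\{-\tfrac{1}{2}(y_i - \mu_{iql})\,\Sigma_i^{-1}\,(y_i - \mu_{iql})'\}$   (4-2)

where

  $\mu_{iql} = \Big(E_{0ql} + \dfrac{E_{\max ql}\, t_{i1}^{H_{ql}}}{EC_{50ql}^{H_{ql}} + t_{i1}^{H_{ql}}},\ \ldots,\ E_{0ql} + \dfrac{E_{\max ql}\, t_{im_i}^{H_{ql}}}{EC_{50ql}^{H_{ql}} + t_{im_i}^{H_{ql}}}\Big)$ [...]

[...] where $\xi_{iq} = I(Q_i = q)$ and $\zeta_{il} = I(D_i = s_l)$, $\pi_{ql} = \Pr(D_i = s_l \mid Q_i = q)$, and $\sum_{l=1}^{L} \pi_{ql} = 1$.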
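
The mean vector is built from an Emax-type dose-response curve with parameters $E_0$, $E_{\max}$, $EC_{50}$ and $H$. The sketch below assumes the usual sigmoid Emax form, $E_0 + E_{\max} d^H / (EC_{50}^H + d^H)$; the function name is hypothetical, and the parameter values are the completers-only estimates reported for composite diplotype Q = 0 in Table 4-3, used here purely for illustration.

```python
def emax_curve(dose, e0, emax, ec50, h):
    """Sigmoid Emax curve: baseline e0, maximum effect emax,
    half-maximal dose ec50 and slope (Hill) parameter h."""
    return e0 + emax * dose ** h / (ec50 ** h + dose ** h)


doses = [0, 5, 10, 20, 30, 40]
params = dict(e0=79.35, emax=116.38, ec50=39.82, h=1.74)
curve = [emax_curve(d, **params) for d in doses]
print([round(v, 2) for v in curve])
```

Two properties are useful when reading the estimates: the curve equals $E_0$ at dose zero, and it reaches $E_0 + E_{\max}/2$ exactly at the dose $EC_{50}$.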

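[...] $\Big[(2\pi)^{-m_i/2}\,|\Sigma_i|^{-1/2} \exp\{-\tfrac{1}{2}(y_i - \mu_{iql})\,\Sigma_i^{-1}\,(y_i - \mu_{iql})'\}\ \pi_{ql}\, f(Q_i = q \mid G_i)\Big]^{\zeta_{il}}$

If there is censoring in the dropout time,

[...] $(2\pi)^{-m_i/2}\,|\Sigma_i|^{-1/2} \exp\{-\tfrac{1}{2}(y_i - \mu_{iql})\,\Sigma_i^{-1}\,(y_i - \mu_{iql})'\}\ f(D_i = s_l \mid C_i, \delta_i = 0, Q_i = q)\, f(Q_i = q \mid G_i)$ [...]

where [...]

We array all parameters used to describe the model as $\Theta = \{E_{0ql}, E_{\max ql}, EC_{50ql}, H_{ql}, \phi, \sigma_e^2, \pi_{ql};\ q = 0, 1, 2;\ l = 1, 2, \ldots, L\}$, and we let $\Theta_{ql}$ denote the parameter set that describes the distribution of the longitudinal data for subjects who carry composite diplotype $q$ and drop out at $s_l$, that is, $\Theta_{ql} = \{E_{0ql}, E_{\max ql}, EC_{50ql}, H_{ql}, \phi, \sigma_e^2\}$.

[...] [73]. This can be done easily using Matlab software. We can also derive an explicit general Newton-Raphson algorithm for the MLEs.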

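[...] $\Big[(2\pi)^{-m_i/2}\,|\Sigma_i|^{-1/2} \exp\{-\tfrac{1}{2}(y_i - \mu_{iql})\,\Sigma_i^{-1}\,(y_i - \mu_{iql})'\}\ \pi_{ql}\, f(Q_i = q \mid G_i)\Big]^{\zeta_{il}\xi_{iq}}$

Let $Z^o$ be the generic observed data, and let $E$, $E_i$ denote the generic conditional expectations conditioned on $Z^o$ and the current estimated parameters. The conditional expectation of the complete log-likelihood (omitting the constant term) is expressed as

  [...] $= \sum_{i=1}^{N} \sum_{l=1}^{L} \sum_{q=0}^{2} E_{il} E_{iq} \Big[-\tfrac{1}{2}\log(|\Sigma_i|) - \tfrac{1}{2}(y_i - \mu_{iql})\,\Sigma_i^{-1}\,(y_i - \mu_{iql})' + \log(\pi_{ql}) + \log(f(Q_i = q \mid G_i))\Big]$

Here, $E_{il} = E(I(D_i = s_l) \mid z_i^o)$ and $E_{iq} = E(I(Q_i = q) \mid z_i^o)$.

For subjects whose dropout times are observed, $E_{il}$ is equal to $\zeta_{il}$. Writing $f_{iql}$ for the density (4-2) evaluated at $y_i$,

  $E_{iq} = \dfrac{\sum_{l=1}^{L} f_{iql}\, \pi_{ql}\, f(Q_i = q \mid G_i)\, I(D_i = s_l)}{\sum_{q'=0}^{2} \sum_{l=1}^{L} f_{iq'l}\, \pi_{q'l}\, f(Q_i = q' \mid G_i)\, I(D_i = s_l)}$   (4-8)

For subjects whose dropout times are right censored,

  $E_{il} = \dfrac{\sum_{q=0}^{2} f_{iql}\, \pi_{ql}\, f(Q_i = q \mid G_i)\, I(s_l > \tilde{D}_i)}{\sum_{q=0}^{2} \sum_{l'=1}^{L} f_{iql'}\, \pi_{ql'}\, f(Q_i = q \mid G_i)\, I(s_{l'} > \tilde{D}_i)}$   (4-9)

  $E_{iq} = \dfrac{\sum_{l=1}^{L} f_{iql}\, \pi_{ql}\, f(Q_i = q \mid G_i)\, I(s_l > \tilde{D}_i)}{\sum_{q'=0}^{2} \sum_{l=1}^{L} f_{iq'l}\, \pi_{q'l}\, f(Q_i = q' \mid G_i)\, I(s_l > \tilde{D}_i)}$   (4-10)

Let $\Theta$ denote the generic parameter set for the log-likelihood function. We know that [...]

For obtaining the MLEs of $\Theta$ in the observed log-likelihood function $l_o(\Theta \mid Z^o)$, a Newton-Raphson algorithm is iterated as follows:

  $\Theta^{(J+1)} = \Theta^{(J)} + \{V^{-1}\, \partial l_o / \partial \Theta\}$ [...]

In fact, if $V$ is replaced with any $V_1$ such that $V_1 > V$, the algorithm still works but needs more steps to converge. Since $\mathrm{var}(X) = E(XX^T) - E(X)E(X)^T > 0$, it is always true that $E(XX^T) > E(X)E(X)^T$. Let $l_f(\Theta \mid Z)$ denote the full log-likelihood function. If $X = \partial l_f / \partial \Theta$ [...]

Now let $V_1$ be the corresponding expectation of the second derivatives of $l_f$ [...] (4-18) and (4-20), we know that $V_1 > V$. A general Newton-Raphson algorithm is iterated as:

  $\Theta^{(J+1)} = \Theta^{(J)} + \{V_1^{-1}\, \partial l_o / \partial \Theta\}$ [...]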

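We know that, asymptotically, the MLEs satisfy $\hat{\Theta} \sim N(\Theta, I^{-1}(\Theta))$. $I(\hat{\Theta})$ and $B(\hat{\Theta}, z^o)$ are both consistent estimators of the Fisher information matrix $I(\Theta)$. So if we can calculate $I(\hat{\Theta})$ or $B(\hat{\Theta}, z^o)$, we can estimate the Fisher information matrix and thus the variance of the maximum likelihood estimators.

Under the regularity conditions in [84], which guarantee that the MLEs solve the log-likelihood equations and that the Fisher information matrix exists, Louis [85] described one method for obtaining the observed information matrix of the maximum likelihood estimates. From [85],

  [...]   (4-19)

When $\Theta = \hat{\Theta}$, $B(\hat{\Theta}, z^o) = E(B(\hat{\Theta}, z) \mid z^o) - E(S(\hat{\Theta}, z)\, S^T(\hat{\Theta}, z) \mid z^o)$, since $S(\hat{\Theta}, z^o) = 0$. This method needs to calculate the conditional expectation (conditional on the observed data) [...]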

A different approach, the so-called supplemented EM (SEM) algorithm, was proposed by Meng and Rubin [86] to estimate the asymptotic variance-covariance matrices, and it can also be used to calculate the sampling errors of the MLEs of the parameters. This method avoids calculating the conditional expectation (conditional on the observed data) of the square of the complete-data scores, $E(S(\hat{\Theta}, z)\, S^T(\hat{\Theta}, z) \mid z^o)$, but needs to calculate the convergence rate of a "forced" EM algorithm.

There is another, easier method for estimating the variance [62]. Meilijson [87] advocated an empirical Fisher information method. The sample covariance matrix of the individual scores $S_i(\Theta; z_i^o)$ is

  $\hat{I}(\Theta; z^o) = \sum_{i=1}^{N} S_i(\Theta; z_i^o)\, S_i(\Theta; z_i^o)^T - \frac{1}{N}$ [...]

Under $H_0$, we assume that the dropout is ignorable under the assumption that there is no QTN effect. Under $H_1$, we assume that the dropout is informative, also under there [...]

After testing the informativeness of the dropout, we can test the existence of the QTN based on the result of test (4-20). Testing the QTN takes two steps. First, we test whether the assumed QTN affects the distribution of the dropout time; then, based on the result of this test, we test whether the assumed QTN affects the longitudinal curve.

In order not to confound the QTN effects on the dropout time with the QTN effects on the longitudinal response, the test for QTN effects on the dropout time is done under the assumption that there is no QTN effect on the longitudinal data.

If the dropout is informative, we can formulate the test for the QTN affecting the distribution of the dropout time as: [...]

Or, if the dropout is ignorable, as: [...]

If the dropout is ignorable and test (4-22) indicates that the assumed QTN affects the distribution of the dropout time, the test for the QTN affecting the distribution of the longitudinal curve is: [...]

If the dropout is ignorable but test (4-22) indicates that the assumed QTN does not affect the distribution of the dropout time, the test for the QTN affecting the distribution of the longitudinal curve is [...]

[...] (4-21) indicates that the assumed QTN affects the distribution of the dropout time, then the test for the QTN affecting the distribution of the longitudinal curve is: [...]

Otherwise, if the dropout is non-ignorable and test (4-21) indicates that the assumed QTN does not affect the distribution of the dropout time, the test for the QTN affecting the distribution of the longitudinal curve is: [...]

For all these tests, the likelihood ratio test can be used, and the test statistics are calculated in the same way as in test (2-8). The test statistics approximately follow a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of parameters between the models under $H_0$ and $H_1$.

4.6.1 Background and Data Summary

[...] [88]. In each of the two genes there are several polymorphisms common in the population. Two common polymorphisms are identified at codons 49 (Ser49Gly) and 389 (Arg389Gly) for the β1AR gene and at codons 16 (Arg16Gly) and 27 (Gln27Glu) for the β2AR gene, respectively. The four haplotype frequencies within each gene were estimated and used to estimate allele frequencies and linkage disequilibrium. Highly significant linkage disequilibrium, $\hat{D}_1 = 0.04$ for β1AR and $\hat{D}_2 = 0.13$ for β2AR, was detected between the two SNPs of each gene (p-value < 0.001) based on the hypothesis test [51]. So [...]

A clinical trial was conducted to detect whether haplotype variants within these candidate genes affect the heart rate response to a drug named dobutamine, and how they affect the drug response curve.

There were 163 people in the study, but the genetic information for some of them is not available, so only 143 men and women were included in the analysis. Dobutamine was injected into these subjects to investigate their heart rate response to the drug. The subjects received increasing doses of dobutamine until they achieved the target heart rate response or the predetermined maximum dose. The dose levels used were 0 (baseline), 5, 10, 20, 30 and 40 mcg/min, at each of which heart rate was measured. A time interval of 3 minutes was allowed between two successive doses for subjects to reach a plateau in response to that dose.

Many subjects reached their heart rate thresholds before the highest dosage (Figure 4-1), so only incomplete measurements were taken for them. Table 4-1 shows that about 3%, 10% and 19% of subjects dropped out at dose levels 10, 20 and 30, respectively. In total, about 32% of subjects did not complete the study. [51] simply ignored the subjects that had incomplete measurements; their analysis included only subjects for whom there were heart rate data at all six dose levels (Figure 4-1). But the reasons for these dropouts are the physical limitation on high heart rate, blood pressure that is too high or too low, and other adverse effects. So it is reasonable to think that some of the dropouts are informative, and the dropout information should be incorporated to make correct inferences.

It is of interest to compare the results from the analysis using completers only, the analysis using all the subjects but considering the dropout as [...]

4.6.2.1 Model 1: analysis using completers only

Table 4-2 shows the likelihood ratio test statistics and the corresponding p-values for the test of the risk haplotype, assuming each of the four haplotypes in turn as the risk haplotype. At a significance level of 0.05, we can conclude that [GG] is a risk haplotype.

We therefore assume that [GG] is a risk haplotype. Table 4-3 shows the maximum likelihood estimates of the parameters, and Figure 4-2 shows that subjects having [GG][GG] have the highest response curve to the drug and [GG][ [...]

[...] From Table 4-4, at the 0.05 significance level we cannot detect any risk haplotype, so we would conclude that there is no QTN affecting the longitudinal curve. In particular, the likelihood ratio test statistic for testing whether [GG] is a risk haplotype is 12.8027 with 14 - 6 = 8 degrees of freedom, and the p-value is 0.1188. So it is not [...]

In this case, the dropout time set is $S = \{s_1, s_2, s_3, s_4\} = \{10, 20, 30, 40\}$, and we consider that there is no censoring of the dropout times, i.e., $\delta_i = 1$ for $i = 1, \ldots, N$. Since the sample sizes for $D = s_1, s_2, s_3$ are small (4, 14 and 27, respectively), a feasible way to obtain more reliable estimates of the curve parameters is to combine observations across dropout patterns to reduce the number of parameters.

Figure 4-3 shows the sample mean heart rates of the subjects according to their dropout times. Exploratory study shows that for subjects who have only 3 measurements, it is reasonable to assume a linear curve, which ensures parameter identifiability. For subjects who have 4 measurements, we assume a quadratic curve. For subjects who have 5 or 6 measurements, we assume an Emax curve.

We have fitted three joint models under different assumptions. Joint model I is the full model, with no constraints on the parameters for the four dropout patterns. Joint model II combines the observations across the first three dropout patterns, i.e., dropouts at dose levels 10, 20 and 30 have the same distribution: $\Theta_{q1} = \Theta_{q2} = \Theta_{q3}$, $q = 0, 1, 2$. In this way, we can use the Emax curve to fit the observations from the first three dropout patterns. Joint model III adds to joint model II a constraint that combines the subjects with composite genotype $Q = 0$ across different dropout patterns: $\Theta_{q1} = \Theta_{q2} = \Theta_{q3}$, $q = 0, 1, 2$, and $\Theta_{0l} = \Theta_{04}$, $l = 1, 2, 3$.

[...] (Figure 4-4). It is then of interest to assume that, given dropout time 10, 20 or 30, the longitudinal data have the same distribution, thereby reducing the number of parameters. So we fit the data using joint model II, which assumes that the conditional distributions of the longitudinal data given dropout times 10, 20 and 30 are the same and differ from the conditional distribution of the longitudinal data for completers ($\Theta_{q1} = \Theta_{q2} = \Theta_{q3}$, $q = 0, 1, 2$). From Figure 4-5, we can see that the fitted curves for the dropouts and completers having composite diplotype 0 are similar. So we would like to fit the model which assumes that $\Theta_{q1} = \Theta_{q2} = \Theta_{q3}$, $q = 0, 1, 2$, and $\Theta_{0l} = \Theta_{04}$, $l = 1, 2, 3$. We call this joint model III.

Table 4-5 compares the maximum log-likelihood values of the three joint models. We select the appropriate model based on two commonly used information criteria, the Bayesian information criterion (BIC) [89] and Akaike's information criterion (AIC) [90]. These are calculated as:

  $AIC = -2\,l(\hat{\Theta}) + 2p$
  $BIC = -2\,l(\hat{\Theta}) + \log(n)\,p$

For both AIC and BIC, smaller is better, and by either criterion joint model III is the best. Joint model III is also nested in joint model II, and the p-value for the test of joint model III against joint model II is 0.91. So we finally choose joint model III for our analysis.

After we fit joint model III, we can test whether the dropout is informative by testing [...]

Now we know that the dropout is informative. Next, before we test the existence of the QTN affecting the longitudinal trait, we need to test whether the QTN affects the dropout time, as in test (4-21). The test statistic is 0.2887 with 2 degrees of freedom, and the p-value is 0.87. So this does not support the hypothesis that the composite diplotype with risk haplotype [GG] affects the distribution of the dropout time. Thus we need to adjust joint model III by adding one more constraint, $\pi_{ql} = \pi_l$ ($q = 0, 1, 2$). We call this joint model IV.

Then, given that the dropout is informative and that risk haplotype [GG] does not affect the distribution of the dropout time, we can test whether risk haplotype [GG] affects the longitudinal trait using test (4-26). The test statistic is 22.44 with 12 degrees of freedom, and the p-value is 0.0329. So we can conclude that [GG] is a risk haplotype. Table 4-6 shows the parameter estimates from joint model IV.

From Figure 4-6 and Table 4-6, we can see that the QTN has different effects on the drug response in completers and in dropouts. In completers, subjects carrying composite diplotype 0 have the highest response curve, whereas in dropouts, subjects carrying composite diplotype 2 have the highest response curve. This may be due to genetic effects other than the genetic variants in gene β2AR that we are studying here.

[...] 4-7 represent the average curves which would have been observed had all the subjects completed the study, under the assumption that the future observations conditioned on the dropout time and composite diplotype can be inferred from the curves estimated using our observed data. You will see that subjects carrying [ [...]

Overall, we can see that ignoring the dropout information would not let us detect this QTN, whereas the joint model is able to detect it. We compare the estimated curves of the marginal means over the dropout times with the estimated means from model 2, even though model 2 is not able to detect the QTN (Fig. 4-8). From the comparison, we can see that while the estimated curves for composite diplotypes 0 and 2 are very similar, the estimated curve for composite diplotype 1 from model 2 lies below that from the marginal mean of joint model IV. This is because the result of model 2 is not adjusted for the effect of the dropout.

4.7.1 Simulation Scenarios

We assume that the population from which a set of samples is drawn for QTN mapping is at Hardy-Weinberg equilibrium. We simulate a 2-SNP QTN (SNP 1 with alleles A (1) and a (0), SNP 2 with alleles B (1) and b (0)) that encodes the variation in the longitudinal response. The allele frequencies at the two SNPs and their linkage disequilibrium (LD) in the population are assumed as in Table 4-8. By assuming one of the four haplotypes as a risk haplotype, we specify three different composite diplotypes, [...]

[...] 4-7. For each subject, the phenotypic values of a longitudinal trait are simulated at nine evenly spaced time points, $T = (0, 5, 10, 15, 20, 25, 30, 35, 40)$, allowing a certain proportion of subjects to drop out non-ignorably at time 25. These subjects therefore do not have complete measurements. The set from which the dropout time $D$ takes its values is $S = \{s_1, s_2\} = \{25, 40\}$, i.e., there are two possible patterns for the longitudinal data: dropouts (at time 25) and completers (at time 40). Based on the assumed dropout probabilities for the different composite diplotypes, we first simulate the dropout time for each subject. Then, conditional on the dropout pattern and the composite QTN type, the longitudinal data for each subject are simulated from a multivariate normal distribution with an Emax mean curve and a SAD(1) covariance structure.

[...] 2-1, we know that marginally, under the assumption that [11] is a risk haplotype, the composite diplotype $Q$ has the distribution:

  $Q = 0$ with probability $P(Q = 0) = p_{11}^2$
  $Q = 1$ with probability $P(Q = 1) = 2p_{11}p_{10} + 2p_{11}p_{01} + 2p_{11}p_{00}$
  $Q = 2$ with probability $P(Q = 2) = p_{10}^2 + 2p_{10}p_{01} + 2p_{10}p_{00} + p_{01}^2 + 2p_{01}p_{00} + p_{00}^2$

The variance of the longitudinal observation at time $t$ is: [...]

The genetic variance at time $t$ is: [...] where [...]

[...] 4-7. After the data were simulated for each scenario, we analyzed them with three models: model 1, using only the completers; model 2, using all the longitudinal data but ignoring the dropout times; and model 3, the joint model we proposed. Because this is a simulation study, we already know which haplotype is the risk haplotype, so we analyze the data by assuming the true risk haplotype as the risk haplotype.

From Table 4-8, we can see that the estimates of the allele frequencies and linkage disequilibrium are very good for all four scenarios. The standard errors are almost the same for scenarios 1 and 2, and for scenarios 3 and 4, and scenarios 3 and 4 have smaller standard errors than scenarios 1 and 2. This is because the estimates of the allele frequencies and linkage disequilibrium use the same multinomial likelihood function [...]

[...]. The maximum likelihood estimates are unbiased, and the standard errors decrease as the sample size increases from 200 (scenarios 1 and 2) to 500 (scenarios 3 and 4).

From Tables 4-9, 4-10, 4-11 and 4-12, we can see that in all four scenarios the estimates from model 1, which uses only the subjects with complete longitudinal measurements, have the largest bias. The biases of the estimates from model 2, which uses all the subjects but treats the dropout as ignorable, are smaller than those of model 1 but larger than those of model 3, which uses the joint model of longitudinal data and dropout data. This advantage of model 3 is more obvious when the study has a large dropout rate. The estimates from models 2 and 3 are also clearly more precise than those from model 1. The standard errors for model 3 are smaller than those for model 2 at the first 6 time points (at which all subjects still remain in the study). At the last 3 time points, some of the subjects have dropped out, and the biases for model 3 are smaller than for model 2, at the price of larger standard errors. So the advantage of model 3 is that, by including the dropout information, we reduce the estimation biases and also the standard errors of the means before the dropout occurs. The improvement in precision is also larger when the attrition rate is higher. After the dropout occurs, model 3 may sacrifice some precision to reduce the bias.

From Table 4-13, we can see that for the first three scenarios, the power of model 3 to detect the existence of the QTN is larger than that of both model 2 and model 1, with a more significant p-value. In scenario 4, which has a small heritability (0.05) and a large dropout proportion (around 70%), model 3 and model 2 have comparable power. But in this case, the parameter estimates from model 2 are more severely biased than those from model 3.

By applying this joint model to a real example and to simulation studies, we can see that in most cases this model has a better ability to detect QTNs than the model which ignores the dropout process. A problem with fitting this model is having enough observed events to reliably estimate $\Theta_{Y|(D,Q)}$. So we often need to make some simplifying assumptions about $f_{Y|(D,Q)}$ to ensure the identifiability of the model parameters and the stability of the algorithm used for ML estimation. Also, by averaging over the dropout probabilities, we obtain the marginal distribution of the longitudinal data. This is based on the assumption that $f(Y^o \mid D, Q) = f(Y \mid D, Q)$.

In human genetic studies, there are many other variables (besides time and genetic information) which may influence the experiment, such as age and sex. Sometimes these variables should be included in the model as covariates, allowing the individual variation to be characterized more precisely. Suppose there are $p$ subject-specific covariates in the study. We denote their effects as $\beta$, which is a $p \times 1$ vector, and the design matrix for $\beta$ of subject $i$ at time $t_{ij}$ as $X_{ij}$, which is an $N \times p$ matrix. We can then modify the conditional distribution of the longitudinal data in Chapter 4 to the following: [...]

The model can be further extended to model interactions between different QTNs and between QTNs and environments. These modified models will find an immediate [...]

Figure 4-1. The dobutamine drug response experiment data. The curves in yellow are the complete responses covering all dose levels, whereas the curves in other colors are the incomplete responses for subjects that dropped out early.

Figure 4-2. Estimated mean curves using completers only (model 1).

Figure 4-3. The sample means of heart rates at different dosages, grouped according to the dropout times.

Figure 4-4. Estimated mean curves from joint model I, which assumes a linear curve for subjects who drop out at dose level 10, a quadratic curve for subjects who drop out at dose level 20, and Emax curves for subjects who drop out at dose level 30 and for completers.

Figure 4-5. Estimated mean curves from joint model II, which combines observations from dropout patterns 10, 20 and 30 into a single dropout pattern.

Figure 4-6. Estimated mean curves from joint model III, which combines observations from dropout patterns 10, 20 and 30 into a single dropout pattern and also assumes that, for subjects carrying [GG][GG], the dropouts and completers have the same distribution.

Figure 4-7. Estimates of the marginal curves of heart rate, averaging the conditional means over the dropout probabilities.
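
The averaging described in this caption can be sketched as follows. This is an illustration under assumptions, not the code used for the figure: it assumes the sigmoid Emax mean curve, two patterns (dropouts and completers), and a dropout probability shared by all composite diplotypes. The numerical values are the joint model IV estimates for composite diplotype Q = 1 and the estimated dropout probability as read from Table 4-6; the function names are hypothetical.

```python
def emax_curve(dose, e0, emax, ec50, h):
    """Sigmoid Emax mean curve."""
    return e0 + emax * dose ** h / (ec50 ** h + dose ** h)


# Estimates for composite diplotype Q = 1 (Table 4-6, joint model IV)
dropout = dict(e0=80.49, emax=131.57, ec50=36.68, h=1.71)
completer = dict(e0=76.18, emax=62.82, ec50=29.53, h=2.00)
pi_dropout = 0.3147   # estimated probability of dropping out


def marginal_mean(dose):
    """Pattern-specific means averaged over the dropout probabilities."""
    return (pi_dropout * emax_curve(dose, **dropout)
            + (1 - pi_dropout) * emax_curve(dose, **completer))


for dose in [0, 5, 10, 20, 30, 40]:
    print(dose, round(marginal_mean(dose), 2))
```

At every dose the marginal mean lies between the two pattern-specific means, weighted towards the completers because they form the larger group.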

Figure 4-8. Estimates of the marginal curves of heart rate, averaging the conditional means over the dropout probabilities, compared with the curves from model 2, which considers the dropout as ignorable.

Figure 4-9. The simulation mean curves for all four scenarios.

Table 4-1. Completers and dropouts information.

                  Dropouts at dose level
                  10       20       30       Completers   Total
Number            4        14       27       98           143
Percentage (%)    2.8      9.79     18.88    68.53

Table 4-2. Likelihood ratio test statistics and p-values for the tests of four possible risk haplotypes from candidate gene β2AR using completers only.

Risk haplotype   LR        p-value
[GC]             10.0284   0.2630
[GG]             16.1594   0.0402
[AC]             12.8265   0.1180
[AG]             11.3492   0.1827

Table 4-3. Estimated parameters assuming [GG] as the risk haplotype using completers only.

Composite diplotype   Parameter      Estimate
Q = 0                 E0             79.35
                      Emax           116.38
                      EC50           39.82
                      H              1.74
Q = 1                 E0             76.18
                      Emax           62.90
                      EC50           29.55
                      H              2.00
Q = 2                 E0             76.92
                      Emax           59.65
                      EC50           22.72
                      H              2.06
-                     $\phi$         0.97
-                     $\sigma_e^2$   53.26
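
The p-values in Table 4-2 can be reproduced from the likelihood ratio statistics. The sketch below is only a check under an assumption: it takes the degrees of freedom to be 8, the value stated in the text for the analogous test on all longitudinal data (14 - 6 = 8). For an even number of degrees of freedom the $\chi^2$ upper tail probability has a closed form, $e^{-x/2} \sum_{j=0}^{df/2 - 1} (x/2)^j / j!$, so no statistical library is needed. The function name is hypothetical.

```python
import math


def chi2_sf_even(x, df):
    """Upper tail probability of a chi-square variable with an even number
    of degrees of freedom, using the closed-form finite sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


lr_stats = {"GC": 10.0284, "GG": 16.1594, "AC": 12.8265, "AG": 11.3492}
p_values = {hap: chi2_sf_even(lr, 8) for hap, lr in lr_stats.items()}
for hap, p in p_values.items():
    print(hap, round(p, 4))
```

Only [GG] falls below the 0.05 level, in agreement with the conclusion drawn from the table.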

Table 4-4. Likelihood ratio test statistics and p-values for the tests of four possible risk haplotypes from candidate gene β2AR using all longitudinal data.

Risk haplotype   LR        p-value
[GC]             4.4589    0.8135
[GG]             12.8027   0.1188
[AC]             13.9799   0.0823
[AG]             13.8630   0.0854

Table 4-5. Comparison of the three joint models.

                  Log-likelihood   Parameters   AIC      BIC
Joint model I     -3000            50           6100     6248.1
Joint model II    -2961.3          29           5980.6   6066.5
Joint model III   -2961.8          25           5973.6   6047.7
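
The AIC and BIC columns of Table 4-5 follow from the log-likelihoods and parameter counts. The short sketch below recomputes them, assuming that $n$ in the BIC is the number of subjects analyzed (143); the function and variable names are hypothetical.

```python
import math


def aic(loglik, n_params):
    """Akaike's information criterion: -2 l + 2 p."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 l + log(n) p."""
    return -2.0 * loglik + math.log(n) * n_params


models = {"I": (-3000.0, 50), "II": (-2961.3, 29), "III": (-2961.8, 25)}
n_subjects = 143   # assumed sample size entering the BIC

scores = {name: (aic(ll, p), bic(ll, p, n_subjects))
          for name, (ll, p) in models.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
for name, (a, b) in scores.items():
    print(name, round(a, 1), round(b, 1))
print(best_by_aic, best_by_bic)
```

Joint model III loses only 0.5 units of log-likelihood relative to joint model II while saving four parameters, which is why both criteria prefer it.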

Table 4-6. Estimated parameters for joint model IV. MLE denotes the maximum likelihood estimate and STD denotes the standard error. STD is estimated using the method described in [85].

Dropout pattern   Composite diplotype   Parameter      MLE      STD
Dropout           Q = 0                 E0             78.74    1.21
                                        Emax           120.33   32.06
                                        EC50           41.47    12.88
                                        H              1.66     0.22
                  Q = 1                 E0             80.49    0.95
                                        Emax           131.57   57.79
                                        EC50           36.68    15.42
                                        H              1.71     0.25
                  Q = 2                 E0             82.80    1.06
                                        Emax           72.58    11.39
                                        EC50           17.55    2.78
                                        H              2.07     0.23
Completer         Q = 0                 E0             78.74    1.21
                                        Emax           120.33   32.06
                                        EC50           41.47    12.88
                                        H              1.66     0.22
                  Q = 1                 E0             76.18    0.66
                                        Emax           62.82    10.02
                                        EC50           29.53    5.06
                                        H              2.00     0.28
                  Q = 2                 E0             76.91    0.55
                                        Emax           59.62    6.99
                                        EC50           22.73    2.77
                                        H              2.06     0.25
-                 -                     $\phi$         0.96     0.01
-                 -                     $\sigma_e^2$   62.06    1.94
-                 -                     $\pi_1$        0.3147   0.02

Table 4-7. True parameters describing the models under four different simulation scenarios.

Composite diplotype   Parameter   Dropouts   Completers
Q = 0                 E0          83         77
                      Emax        73         69
                      EC50        18         25
                      H           2          1
Q = 1                 E0          80         77
                      Emax        72         60
                      EC50        18         23
                      H           1          2
Q = 2                 E0          79         76
                      Emax        80         61
                      EC50        23         28
                      H           1.7        2
-                     $\phi$      0.8

             Dropout probabilities   Completion probabilities   $\sigma_e^2$
             (Q = 0, 1, 2)           (Q = 0, 1, 2)
Scenario 1   (0.35, 0.3, 0.25)       (0.65, 0.7, 0.75)          33.6
Scenario 2   (0.75, 0.7, 0.65)       (0.25, 0.3, 0.35)          30.4
Scenario 3   (0.35, 0.3, 0.25)       (0.65, 0.7, 0.75)          97.5
Scenario 4   (0.75, 0.7, 0.65)       (0.25, 0.3, 0.35)          91.6

Table 4-8. Maximum likelihood estimates (MLEs) of SNP allele frequencies and linkage disequilibrium. The standard errors (STDs) of the MLEs were calculated from 500 simulation replicates.

                   pA       pB       D
Scenario 1   MLE   0.6001   0.5983   0.0803
             STD   0.0247   0.0247   0.0159
Scenario 2   MLE   0.5997   0.6005   0.0800
             STD   0.0239   0.0231   0.0147
Scenario 3   MLE   0.5998   0.6001   0.0792
             STD   0.0158   0.0155   0.0096
Scenario 4   MLE   0.6005   0.6013   0.0797
             STD   0.0150   0.0155   0.0092
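
The data-generating scheme behind these true parameters can be sketched as below. This is a minimal illustration, not the simulation code of the study. It assumes the sigmoid Emax mean curve, a SAD(1) residual process that starts from zero with constant innovation variance, and that subjects who drop out contribute measurements up to and including time 25. It uses the Q = 0 curves, dropout probability 0.35 and variance 33.6 of scenario 1; the function names and the random seed are hypothetical.

```python
import random

TIMES = [0, 5, 10, 15, 20, 25, 30, 35, 40]

# True curve parameters for composite diplotype Q = 0 (Table 4-7)
CURVES = {
    25: dict(e0=83, emax=73, ec50=18, h=2),   # dropouts
    40: dict(e0=77, emax=69, ec50=25, h=1),   # completers
}


def emax_curve(t, e0, emax, ec50, h):
    """Sigmoid Emax mean curve."""
    return e0 + emax * t ** h / (ec50 ** h + t ** h)


def simulate_subject(rng, p_dropout=0.35, phi=0.8, sigma2=33.6):
    """Draw the dropout pattern first, then the trajectory given the pattern."""
    dropout_time = 25 if rng.random() < p_dropout else 40
    times = [t for t in TIMES if t <= dropout_time]
    y, resid = [], 0.0
    for t in times:
        resid = phi * resid + rng.gauss(0.0, sigma2 ** 0.5)   # SAD(1) recursion
        y.append(emax_curve(t, **CURVES[dropout_time]) + resid)
    return dropout_time, times, y


rng = random.Random(2007)
sample = [simulate_subject(rng) for _ in range(200)]
n_dropouts = sum(1 for d, _, _ in sample if d == 25)
print(n_dropouts, "of", len(sample), "subjects dropped out")
```

Because the mean curve itself depends on the dropout pattern, the dropout in this scheme is informative by construction.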

Table 4-9. Simulation results for scenario 1: N = 200, h^2 = 0.1, small dropout proportion (around 30%).

                                                          Time
                            0        5       10       15       20       25       30       35       40
Q=0  Model 1  Bias      -2.05    -1.84    -3.11    -4.72    -5.94    -6.67    -7.02    -7.12    -7.06
              Std        1.15     1.42     1.57     1.67     1.73     1.70     1.65     1.69     1.89
     Model 2  Bias       0.04    -0.14     0.16     0.26     0.02    -0.42    -0.95    -1.49    -2.01
              Std        1.05     1.18     1.45     1.71     1.91     2.01     2.07     2.16     2.29
     Model 3  Bias       0.03     0.07     0.13     0.22     0.20     0.15     0.10     0.10     0.13
              Std        0.96     1.12     1.26     1.35     1.38     1.43     1.62     1.97     2.40
Q=1  Model 1  Bias      -0.88    -2.26    -3.82    -4.60    -4.81    -4.78    -4.68    -4.58    -4.49
              Std        0.73     0.81     1.02     1.07     1.07     1.03     1.00     1.06     1.22
     Model 2  Bias       0.02    -0.05     0.15     0.30     0.19    -0.08    -0.40    -0.71    -0.97
              Std        0.62     0.81     1.08     1.17     1.17     1.12     1.09     1.13     1.25
     Model 3  Bias       0.01     0.03     0.06     0.10     0.11     0.11     0.10     0.09     0.10
              Std        0.61     0.71     0.86     0.90     0.90     0.92     1.01     1.18     1.43
Q=2  Model 1  Bias      -0.73    -1.61    -2.91    -3.84    -4.36    -4.63    -4.78    -4.87    -4.92
              Std        0.97     1.04     1.31     1.41     1.44     1.46     1.42     1.43     1.60
     Model 2  Bias       0.04    -0.16    -0.18    -0.07    -0.10    -0.34    -0.72    -1.14    -1.56
              Std        0.83     0.97     1.29     1.45     1.53     1.55     1.53     1.57     1.72
     Model 3  Bias       0.03     0.04    -0.02    -0.01     0.01     0.01     0.01     0.04     0.10
              Std        0.84     0.93     1.11     1.16     1.18     1.23     1.35     1.62     2.07


Simulation results for scenario 2: N = 200, $h^2 = 0.1$, large dropout proportion (around 70%).

                    Time
Q    Model    Stat  0        5        10       15       20       25       30       35       40
Q=0  Model 1  Bias  -4.56    -4.36    -7.29    -10.77   -13.34   -14.84   -15.52   -15.65   -15.45
              Std   2.07     2.40     2.70     2.81     2.82     2.71     2.59     2.67     3.05
     Model 2  Bias  -0.04    -0.29    -0.09    0.02     -0.18    -0.58    -1.03    -1.47    -1.87
              Std   1.03     1.14     1.37     1.58     1.76     1.91     2.11     2.38     2.68
     Model 3  Bias  -0.06    -0.05    -0.09    -0.05    -0.03    -0.01    0.03     0.10     0.19
              Std   0.96     1.09     1.21     1.26     1.28     1.35     1.64     2.09     2.60
Q=1  Model 1  Bias  -2.15    -5.30    -8.97    -10.88   -11.44   -11.40   -11.15   -10.87   -10.62
              Std   1.04     1.25     1.58     1.63     1.62     1.58     1.54     1.57     1.72
     Model 2  Bias  0.002    0.003    0.20     0.26     0.12     -0.13    -0.40    -0.63    -0.82
              Std   0.58     0.80     1.05     1.16     1.18     1.19     1.26     1.39     1.56
     Model 3  Bias  -0.02    0.06     0.05     0.04     0.04     0.06     0.09     0.14     0.19
              Std   0.56     0.71     0.82     0.86     0.87     0.94     1.14     1.44     1.76
Q=2  Model 1  Bias  -5.48    -8.88    -15.04   -19.88   -22.35   -23.10   -22.94   -22.37   -21.66
              Std   1.30     1.39     1.78     1.90     1.89     1.87     1.85     1.91     2.18
     Model 2  Bias  0.03     -0.28    -0.23    -0.10    -0.25    -0.69    -1.27    -1.89    -2.48
              Std   0.74     1.04     1.44     1.60     1.65     1.68     1.76     1.94     2.20
     Model 3  Bias  0.04     0.07     0.01     -0.004   0.01     0.03     0.08     0.15     0.27
              Std   0.73     0.90     1.07     1.13     1.13     1.21     1.57     2.18     2.91


Simulation results for scenario 3: N = 500, $h^2 = 0.05$, small dropout proportion (around 30%).

                    Time
Q    Model    Stat  0        5        10       15       20       25       30       35       40
Q=0  Model 1  Bias  -2.04    -1.88    -3.21    -4.84    -6.06    -6.78    -7.11    -7.19    -7.11
              Std   1.34     1.59     1.70     1.76     1.80     1.79     1.77     1.83     2.03
     Model 2  Bias  0.04     -0.23    0.10     0.22     -0.06    -0.62    -1.28    -1.96    -2.60
              Std   1.03     1.26     1.47     1.53     1.59     1.61     1.67     1.79     1.98
     Model 3  Bias  0.001    0.04     -0.001   0.05     0.08     0.11     0.17     0.26     0.39
              Std   1.03     1.24     1.39     1.44     1.46     1.52     1.72     2.08     2.54
Q=1  Model 1  Bias  -0.90    -2.23    -3.80    -4.62    -4.86    -4.83    -4.72    -4.5849  -4.46
              Std   0.75     0.82     1.04     1.10     1.12     1.10     1.06     1.0877   1.22
     Model 2  Bias  0.04     -0.09    0.11     0.24     0.09     -0.26    -0.68    -1.0686  -1.41
              Std   0.66     0.75     0.98     1.07     1.10     1.07     1.04     1.0780   1.19
     Model 3  Bias  0.02     0.06     0.06     0.07     0.07     0.07     0.09     0.1135   0.16
              Std   0.65     0.72     0.87     0.93     0.95     0.97     1.04     1.1935   1.42
Q=2  Model 1  Bias  -3.13    -6.72    -11.33   -14.23   -15.27   -15.22   -14.69   -14.0555 -13.45
              Std   0.97     1.07     1.41     1.48     1.44     1.42     1.39     1.3971   1.56
     Model 2  Bias  0.05     -0.18    -0.05    0.13     0.09     -0.25    -0.79    -1.3901  -1.98
              Std   0.85     1.02     1.37     1.42     1.39     1.36     1.33     1.3778   1.55
     Model 3  Bias  0.04     0.08     0.05     0.08     0.12     0.13     0.14     0.1659   0.23
              Std   0.85     0.98     1.22     1.25     1.22     1.26     1.43     1.7564   2.25


Simulation results for scenario 4: N = 500, $h^2 = 0.05$, large dropout proportion (around 70%).

                    Time
Q    Model    Stat  0        5        10       15       20       25       30       35       40
Q=0  Model 1  Bias  -4.70    -4.37    -7.24    -10.75   -13.37   -14.89   -15.57   -15.70   -15.50
              Std   1.96     2.47     2.66     2.75     2.79     2.74     2.70     2.81     3.14
     Model 2  Bias  -0.10    -0.54    -0.24    -0.05    -0.44    -1.19    -2.03    -2.84    -3.56
              Std   1.01     1.19     1.45     1.54     1.59     1.67     1.83     2.08     2.35
     Model 3  Bias  -0.08    -0.11    -0.14    -0.08    -0.07    -0.09    -0.09    -0.06    0.003
              Std   0.97     1.17     1.36     1.44     1.46     1.60     2.01     2.58     3.19
Q=1  Model 1  Bias  -2.17    -5.31    -9.06    -11.04   -11.65   -11.61   -11.35   -11.03   -10.72
              Std   1.19     1.34     1.72     1.82     1.83     1.77     1.68     1.71     1.93
     Model 2  Bias  -0.01    -0.17    0.06     0.12     -0.18    -0.66    -1.18    -1.67    -2.09
              Std   0.64     0.87     1.08     1.12     1.08     1.06     1.15     1.32     1.53
     Model 3  Bias  -0.01    0.04     -0.002   -0.04    -0.05    -0.03    0.004    0.07     0.15
              Std   0.63     0.80     0.96     0.99     0.96     1.03     1.27     1.63     2.02
Q=2  Model 1  Bias  -5.55    -8.87    -15.00   -19.85   -22.36   -23.17   -23.07   -22.53   -21.82
              Std   1.41     1.49     1.83     1.91     1.93     1.96     1.94     1.99     2.31
     Model 2  Bias  0.05     -0.29    -0.04    0.21     -0.09    -0.86    -1.87    -2.92    -3.92
              Std   0.78     1.01     1.30     1.30     1.28     1.32     1.45     1.70     2.03
     Model 3  Bias  0.01     0.11     0.05     0.02     0.01     0.01     0.03     0.10     0.22
              Std   0.77     0.96     1.13     1.15     1.12     1.18     1.51     2.10     2.84


Scenario 1   Model 1   0.0038 (0.0158)   0.980
             Model 2   0.0048 (0.0289)   0.982
             Model 3   0.0027 (0.0272)   0.992
Scenario 2   Model 1   0.0704 (0.1419)   0.730
             Model 2   0.0081 (0.0405)   0.964
             Model 3   0.0011 (0.0066)   0.992
Scenario 3   Model 1   0.0076 (0.0274)   0.956
             Model 2   0.0034 (0.0182)   0.978
             Model 3   0.0032 (0.0139)   0.986
Scenario 4   Model 1   0.0975 (0.1546)   0.62
             Model 2   0.0025 (0.0119)   0.984
             Model 3   0.0027 (0.0138)   0.982


... [79]. One strategy for handling non-ignorable dropouts is to model the joint distribution of the dropout time and the complete longitudinal responses and then integrate out the missing values. This can be done in two ways. The first assumes a selection model, in which the distribution of dropout time is conditioned on the repeated measures. The second assumes a pattern-mixture model, in which the distribution of repeated measures is a mixture of distributions for subjects within distinct sub-groups determined by the patterns of dropout. In Chapter 4, I implemented a pattern-mixture model in the functional mapping framework to consider the effects of non-ignorable dropouts. But in many practical settings, the selection model based on the outcome-dependent dropout assumption [72], [91] is appealing; a typical example is the case of censoring over cutoff values. It is therefore also of interest to explore the use of selection models in functional mapping with incomplete longitudinal data.

In this chapter, I incorporate the idea of the selection model to map quantitative trait nucleotides (QTNs) that encode longitudinal responses with non-ignorable dropouts. The EM algorithm is implemented to obtain the maximum likelihood estimates of the parameters. A simulation study with continuous repeated measures and non-ignorable dropouts is performed to investigate the statistical properties of this model.


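A given subject $i$ may carry one of the three possible composite diplotypes, coded 0, 1 and 2, at a QTN that affects a longitudinal trait. The changes of a longitudinal trait across different time points may follow a particular pattern that can be described by a biologically sensible mathematical function. For illustration purposes, I assume that the longitudinal trait follows a linear trend. More sophisticated mathematical equations can readily be incorporated into functional mapping, although this extension will be computationally intensive.

Given composite diplotype $Q_i$, the phenotypic value of the longitudinal trait measured at a particular time point in terms of this QTN is written as

$$y_{ij} = \sum_{s=1}^{2} (a_s + b_s t_{ij})\, I(Q_i = s) + \theta_{0i} + \theta_{1i} t_{ij} + e_{ij},$$

where $a_1, b_1, a_2, b_2$ are fixed effects, $Q_i$ is the composite diplotype for subject $i$, which is assumed to affect the longitudinal trait $Y_i$, and the indicator $I(Q_i = 1)$ (or $I(Q_i = 2)$) is 1 when subject $i$ carries the composite diplotype coded 1 (or 2) and 0 otherwise. $\theta_i = (\theta_{0i}, \theta_{1i})'$ are subject-specific random effects. $\theta_i$ is assumed to have a bivariate Gaussian distribution $N(\mu, V)$, or

$$\begin{pmatrix} \theta_{0i} \\ \theta_{1i} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_0 \\ \mu_1 \end{pmatrix}, \begin{pmatrix} \sigma_{00} & \sigma_{01} \\ \sigma_{01} & \sigma_{11} \end{pmatrix} \right).$$

... 4.2.2. The relative frequencies of $Q_i$ are obtained from Table 2-1.

For the event time distribution, the popular extended Cox proportional hazards model is used to model the hazard of failure and, similar to that in [75], the hazard depends on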


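$$\lambda(t \mid \theta_i, Q_i) = \lambda_0(t) \exp\Big\{ \beta \Big( \sum_{s=1}^{2} (a_s + b_s t)\, I(Q_i = s) + \theta_{0i} + \theta_{1i} t \Big) \Big\},$$

where the baseline hazard function $\lambda_0(t)$ is left unspecified. So the longitudinal and event time distributions share the same latent random process $\theta_i$ and the same fixed effects $a_1, b_1, a_2, b_2$.

According to Section 3.2, the joint distribution of longitudinal and event time data is a selection model and can characterize the informative dropout. Let $L_o$ be the observed likelihood function. Since the marginal distribution for $Y_i$ is easily obtained, it is convenient to factorize the likelihood function as the product of this marginal distribution and the conditional distribution of event times given $Y$ (equation 5-3), where

$$f(y_{ij} \mid \theta_i, Q_i, a_1, b_1, a_2, b_2, \sigma_e^2) = (2\pi\sigma_e^2)^{-1/2} \exp\Big( -\frac{1}{2\sigma_e^2} \Big( y_{ij} - \sum_{s=1}^{2} (a_s + b_s t_{ij})\, I(Q_i = s) - \theta_{0i} - \theta_{1i} t_{ij} \Big)^2 \Big),$$

$$f(\theta_i \mid \mu, V) = (2\pi |V|)^{-1/2} \exp\Big( -\frac{1}{2} (\theta_i - \mu)' V^{-1} (\theta_i - \mu) \Big),$$

$$f(\tilde{D}_i, \Delta_i \mid Q_i, \lambda_0, \beta, a_1, b_1, a_2, b_2, \theta_i) = \Big[ \lambda_0(\tilde{D}_i) \exp\Big\{ \beta \Big( \sum_{s=1}^{2} (a_s + b_s \tilde{D}_i)\, I(Q_i = s) + \theta_{0i} + \theta_{1i} \tilde{D}_i \Big) \Big\} \Big]^{\Delta_i} \exp\Big[ -\int_0^{\tilde{D}_i} \lambda_0(u) \exp\Big\{ \beta \Big( \sum_{s=1}^{2} (a_s + b_s u)\, I(Q_i = s) + \theta_{0i} + \theta_{1i} u \Big) \Big\}\, du \Big],$$

$f(Q_i \mid G_i, A)$ is determined from Table 2-1, and $A$ is the assumed risk haplotype.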


... [60]. This is done by iterating between the E-step and the M-step. In the E-step, the algorithm computes the expected complete log-likelihood conditional on the observed data and the current estimates of the parameters. In the M-step, new estimates are obtained by maximizing the expected complete log-likelihood from the previous E-step.

In the current situation, the complete data for each individual are $(Y_i, t_i, \tilde{D}_i, \Delta_i, \theta_i, Q_i)$, where $\theta_i$ is not observed and $Q_i$ may or may not be observed. The complete data likelihood is the product over all $N$ individuals of the integrand of the expression given by equation 5-3. Let $Z_o$ be the observed data and let $E$, $E_i$ denote the generic conditional expectations conditioned on $Z_o$ and the current parameter estimates. The conditional expectation of the complete log-likelihood (omitting the constant term) is expressed as
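The alternation between E-step and M-step described above can be illustrated on a much simpler problem than the joint model. The sketch below is not the dissertation's algorithm: it assumes a single normally distributed phenotype per subject, an unobserved three-level class (standing in for an unobserved composite diplotype) with known class frequencies, class-specific means and a common variance. All names and numerical values are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_mixture(y, weights, means, var, n_iter=200, tol=1e-8):
    """EM for a Gaussian mixture with known weights, unknown means, common variance.

    Returns (means, var, loglik); loglik is evaluated at the parameter values
    that entered the final E-step.
    """
    means = list(means)
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: posterior class probabilities given current parameters.
        post = []
        ll = 0.0
        for yi in y:
            joint = [w * normal_pdf(yi, m, var) for w, m in zip(weights, means)]
            total = sum(joint)
            ll += math.log(total)
            post.append([j / total for j in joint])
        # M-step: closed-form maximizers of the expected complete log-likelihood.
        for k in range(len(means)):
            nk = sum(p[k] for p in post)
            means[k] = sum(p[k] * yi for p, yi in zip(post, y)) / nk
        var = sum(p[k] * (yi - means[k]) ** 2
                  for p, yi in zip(post, y)
                  for k in range(len(means))) / len(y)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return means, var, ll


# Toy data: hypothetical class frequencies and class means.
random.seed(0)
weights = [0.25, 0.5, 0.25]
true_means = [70.0, 80.0, 90.0]
data = [random.gauss(true_means[random.choices([0, 1, 2], weights)[0]], 2.0)
        for _ in range(300)]
print(em_mixture(data, weights, [65.0, 80.0, 95.0], 25.0))
```

Because each M-step maximizes the expected complete log-likelihood exactly, the observed-data log-likelihood cannot decrease between iterations, which is the property the stopping rule relies on.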


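$$
\begin{aligned}
E\{l_f \mid Z_o\} ={}& -\frac{1}{2} \sum_{i=1}^{N} m_i \log \sigma_e^2 - \frac{1}{2\sigma_e^2} \sum_{i=1}^{N} \sum_{j=1}^{m_i} E_i \Big( y_{ij} - \Big( \sum_{s=1}^{2} (a_s + b_s t_{ij})\, I(Q_i = s) + \theta_{0i} + \theta_{1i} t_{ij} \Big) \Big)^2 \\
& - \frac{N}{2} \log |V| - \frac{1}{2} \sum_{i=1}^{N} E_i (\theta_i - \mu)' V^{-1} (\theta_i - \mu) + \sum_{i=1}^{N} \Delta_i \log \lambda_0(\tilde{D}_i) \\
& + \sum_{i=1}^{N} \Delta_i\, \beta\, E_i \Big( \sum_{s=1}^{2} (a_s + b_s \tilde{D}_i)\, I(Q_i = s) + \theta_{0i} + \theta_{1i} \tilde{D}_i \Big) \\
& - \sum_{i=1}^{N} \int_0^{\tilde{D}_i} \lambda_0(u)\, E_i \Big[ \exp\Big( \beta \Big( \sum_{s=1}^{2} (a_s + b_s u)\, I(Q_i = s) + \theta_{0i} + \theta_{1i} u \Big) \Big) \Big]\, du.
\end{aligned}
$$

Let $u_1, \ldots, u_k$ denote the ordered failure times; the nonparametric maximum likelihood estimate of $\lambda_0(t)$ only takes mass at $u_j$, $j = 1, \ldots, k$. By setting the derivatives of $E\{l_f \mid Z_o\}$ to zero, a group of score equations can be derived as follows:

$$0 = \frac{\partial E\{l_f \mid Z_o\}}{\partial \mu} = V^{-1} \sum_{i=1}^{N} E_i(\theta_i - \mu) \qquad (5\text{-}6)$$

$$0 = \frac{\partial E\{l_f \mid Z_o\}}{\partial V} = -\frac{N}{2} V^{-1} + \frac{1}{2} V^{-1} \Big( \sum_{i=1}^{N} E_i (\theta_i - \mu)(\theta_i - \mu)' \Big) V^{-1}$$

$$0 = \frac{\partial E\{l_f \mid Z_o\}}{\partial \sigma_e^2} = \sum_{i=1}^{N} \Big[ -\frac{m_i}{2\sigma_e^2} + \frac{1}{2\sigma_e^4} \sum_{j=1}^{m_i} E_i \Big( y_{ij} - \sum_{s=1}^{2} (a_s + b_s t_{ij})\, I(Q_i = s) - \theta_{0i} - \theta_{1i} t_{ij} \Big)^2 \Big]$$

$$0 = \frac{\partial E\{l_f \mid Z_o\}}{\partial \cdots}$$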


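$$0 = \frac{\partial E\{l_f \mid Z_o\}}{\partial \cdots},$$

where $s = 1, 2$.

$$0 = \frac{\partial E\{l_f \mid Z_o\}}{\partial \cdots}.$$

The closed forms for estimating the MLEs are derived as

$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} E_i(\theta_i), \quad \ldots$$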


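$$\frac{\partial^2 f}{\partial x\, \partial y} = \lim_{h_1 \to 0,\ h_2 \to 0} \frac{[f(x + h_1, y + h_2) - f(x + h_1, y - h_2)] - [f(x - h_1, y + h_2) - f(x - h_1, y - h_2)]}{4 h_1 h_2}$$

$$\frac{\partial^2 f}{\partial x^2} = \lim_{h \to 0} \frac{f(x + h) - 2 f(x) + f(x - h)}{h^2}$$

5.5.1 Simulation Scenarios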
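The two difference quotients above translate directly into a numerical Hessian. The sketch below applies them with a small fixed step instead of a limit; the function name `numerical_hessian`, the common step size `h` for all coordinates, and the test function are assumptions made for illustration.

```python
def numerical_hessian(f, point, h=1e-3):
    """Approximate the Hessian of f at `point` with central differences.

    f takes a list of floats and returns a float. Diagonal entries use
    [f(x+h) - 2 f(x) + f(x-h)] / h^2; off-diagonal entries use the
    four-point mixed difference divided by 4 h^2.
    """
    n = len(point)
    hess = [[0.0] * n for _ in range(n)]

    def shifted(i, di, j=None, dj=0.0):
        x = list(point)
        x[i] += di
        if j is not None:
            x[j] += dj
        return f(x)

    f0 = f(list(point))
    for i in range(n):
        hess[i][i] = (shifted(i, h) - 2.0 * f0 + shifted(i, -h)) / h ** 2
        for j in range(i + 1, n):
            mixed = ((shifted(i, h, j, h) - shifted(i, h, j, -h))
                     - (shifted(i, -h, j, h) - shifted(i, -h, j, -h))) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = mixed
    return hess


def example(v):
    # f(x, y) = x^2 y + 3 x y^2, chosen because its Hessian is easy to check by hand.
    return v[0] ** 2 * v[1] + 3.0 * v[0] * v[1] ** 2


print(numerical_hessian(example, [1.0, 2.0]))
```

Applied to a log-likelihood at its maximum, the negative of such a Hessian approximates the observed information matrix, whose inverse gives approximate variances; the step size trades truncation error against rounding error and usually needs tuning per problem.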


We assume that the population from which a set of samples is drawn for QTN mapping is at Hardy-Weinberg equilibrium. We simulate a 2-SNP QTN that encodes the variation in longitudinal response. The allele frequencies at the two SNPs and their linkage disequilibrium (LD) in the population are assumed as in Table 5-1. By assuming one of the four haplotypes to be a risk haplotype, we specify three different composite diplotypes, with corresponding frequencies determined by the allele frequencies and LD, and with the response curve parameters given in Table 5-2. The phenotypic value of the longitudinal response is simulated from

$$Y_{it} = (a_1 + b_1 t)\, I(Q_i = 1) + (a_2 + b_2 t)\, I(Q_i = 2) + \theta_{0i} + \theta_{1i} t + e_{it}.$$

The same simulated data set is analyzed by the traditional model, which assumes MCAR or MAR dropouts. The precision of parameter estimation and the power to detect a significant QTN under the traditional model are compared with those under this non-ignorable dropout model.
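A simulation of this kind can be sketched as follows. The longitudinal part follows the equation above; the dropout part is an assumption made for illustration only: a constant baseline hazard `lam0`, a hazard proportional to exp(beta × current trajectory), and dropout decided visit by visit. All parameter values, the function name `simulate_subject` and the diplotype frequencies are hypothetical, since the values of Tables 5-1 and 5-2 are not available in this section.

```python
import math
import random


def simulate_subject(rng, times, diplo_freqs, fixed, mu, cov, sigma_e, lam0, beta):
    """Simulate one subject's diplotype code and observed responses.

    fixed maps diplotype code -> (a, b); code 0 is the baseline with (0, 0).
    cov = (s00, s01, s11) is the random-effect covariance matrix.
    """
    q = rng.choices([0, 1, 2], weights=diplo_freqs)[0]
    a, b = fixed[q]
    s00, s01, s11 = cov
    # Bivariate normal random intercept and slope via a Cholesky factor.
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    theta0 = mu[0] + math.sqrt(s00) * z1
    theta1 = mu[1] + (s01 / math.sqrt(s00)) * z1 + math.sqrt(s11 - s01 ** 2 / s00) * z2
    observed = []
    for k, t in enumerate(times):
        trajectory = a + b * t + theta0 + theta1 * t
        observed.append(trajectory + rng.gauss(0.0, sigma_e))
        if k + 1 < len(times):
            # Assumed dropout rule: constant hazard over the interval to the next visit.
            hazard = lam0 * math.exp(beta * trajectory)
            p_drop = 1.0 - math.exp(-hazard * (times[k + 1] - t))
            if rng.random() < p_drop:
                break
    return q, observed


rng = random.Random(1)
times = list(range(0, 45, 5))
fixed = {0: (0.0, 0.0), 1: (1.0, 0.2), 2: (2.0, 0.4)}  # hypothetical values
q, ys = simulate_subject(rng, times, (0.2, 0.5, 0.3), fixed,
                         mu=(10.0, 0.5), cov=(1.0, 0.1, 0.04),
                         sigma_e=1.0, lam0=0.01, beta=0.05)
print(q, len(ys), [round(v, 2) for v in ys])
```

Because the dropout probability depends on the subject's own trajectory, which includes the unobserved random effects, the resulting missingness is non-ignorable; setting `lam0=0` gives complete data for every subject.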


... 5-1 and 5-2. The estimates include population genetic parameters, i.e., allele frequencies and LD, and quantitative genetic parameters, i.e., composite diplotype-specific curve parameters and covariance matrix-structuring parameters. The MLEs of the population genetic parameters are nearly unbiased and precise, since they are estimated from a simple multinomial likelihood function. The estimates of the quantitative genetic parameters are also reasonably precise, as indicated by the small standard errors of the estimates (Table 5-2).

Both models successfully detected the risk haplotype. The results from the joint model are comparable with those from the traditional model, although the latter is slightly more biased and less precise than the former, except for $\sigma_{00}$ and $b_2$.

... [72], [79], [91].

Overall, the selection model does not display an attractive advantage over the pattern-mixture model, because the former needs much longer computation times than the latter. The strength of the selection model lies in the modeling of the event time: by conditioning on the longitudinal data, we can better capture the event time distribution. In many clinical trials or survival studies, this method is used to find a biomarker for disease development. In our case of QTN mapping with a longitudinal response, however, the pattern-mixture joint model is more appropriate: by including the dropout time as a covariate for the longitudinal development, we can study the longitudinal curve more precisely with mixture models.


Table 5-1. Maximum likelihood estimates (MLEs) of SNP allele frequencies and linkage disequilibrium. The standard errors (STDs) of the MLEs were calculated from 100 simulation replicates.

Population genetic parameters

Table 5-2. Maximum likelihood estimates (MLEs) of the parameters that describe the longitudinal curve and dropout process under the dropout and traditional models. The standard errors (STDs) of the MLEs were calculated from 100 simulation replicates. The traditional model fits longitudinal responses without consideration of dropout information.

          Dropout model     Traditional model
Given     MLE     STD       MLE     STD


While biological research is moving toward a systematic understanding of biology by measuring and modeling more comprehensive biological variables, this movement brings thorny computational and analytical complexity. By simplifying the biological complexity with mathematical models, functional mapping opens up great opportunities to study the genetic architecture of complex biological and pathogenetic processes with a well-designed clinical trial. However, previous functional mapping models ignore the possible withdrawal or removal of some subjects before the study ends for various reasons. Ignoring such dropouts results in biased inference and inefficient estimation. Missing data are often encountered in repeated measures studies, so new models that take the missing information into consideration are in demand. Also, if the reasons that cause subjects to drop out at an early stage of the study involve biological mechanisms, it is fundamentally meaningful to detect the genetic basis of such early dropouts and to test whether the genes underlying the dropouts are consistent with those for the longitudinal responses.

This dissertation developed a statistical framework for mapping quantitative trait nucleotides (QTNs) that regulate longitudinal traits with non-ignorable dropouts. The framework is based on the strategy of a pattern-mixture model, which can handle non-ignorable dropouts. It is applied to a pharmacogenomic project; in pharmacogenomic research, the Emax curve has been used to characterize effect-dose responses. We have shown that this joint model can detect a QTN that affects the heart rate response to a drug, whereas no QTN can be detected if the dropout information is not considered. By incorporating the dropout process into the statistical model for QTN mapping, new discoveries can be made. Simulation studies were undertaken to demonstrate the advantages of the new model in terms of reduced estimation bias and increased estimation efficiency.


With the incorporation of mathematical functions, the model proposed in this dissertation can find immediate applications in different areas, such as the sigmoid curve for tumor growth, bi-exponential equations for HIV dynamics, and differential equations for biological clocks. As the dropout problem is ubiquitous in longitudinal studies, the model will be useful for the detection of specific genetic variants that encode complex longitudinal phenotypes.


[1] M. Lynch and B. Walsh, Genetics and Analysis of Quantitative Traits, Sinauer, Sunderland, MA, 1998.
[2] D. Drayna, K. Davies, D. Hartley, J. L. Mandel, G. Camerino, R. Williamson, and R. White, "Genetic mapping of the human X-chromosome by using restriction fragment length polymorphisms," Proc. Natl. Acad. Sci. USA, vol. 81, no. 9, pp. 2836-2839, May 1984.
[3] E. S. Lander and D. Botstein, "Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps," Genetics, vol. 121, pp. 185-199, 1989.
[4] R. C. Jansen and P. Stam, "High resolution of quantitative traits into multiple loci via interval mapping," Genetics, vol. 136, pp. 1447-1455, Apr. 1994.
[5] C. H. Kao, Z. B. Zeng, and R. D. Teasdale, "Multiple interval mapping for quantitative trait loci," Genetics, vol. 152, pp. 1203-1216, Jul. 1999.
[6] Z. B. Zeng, "Theoretical basis of separation of multiple linked gene effects on mapping quantitative trait loci," Proc. Natl. Acad. Sci. USA, vol. 90, pp. 10972-10976, Dec. 1993.
[7] Z. B. Zeng, "Precision mapping of quantitative trait loci," Genetics, vol. 136, pp. 1457-1468, Apr. 1994.
[8] M. Lin, X. Y. Lou, and R. L. Wu, "A general statistical framework for mapping quantitative trait loci in non-model systems: Issue for characterizing linkage phases," Genetics, vol. 165, no. 2, pp. 901-913, Oct. 2003.
[9] R. L. Wu and M. Lin, "Functional mapping - How to map and study the genetic architecture of dynamic complex traits," Nature Reviews Genetics, vol. 7, no. 3, pp. 229-237, Mar. 2006.
[10] R. L. Wu, C. X. Ma, and G. Casella, "Joint linkage and linkage disequilibrium mapping of quantitative trait loci in natural populations," Genetics, vol. 160, pp. 779-792, Feb. 2002.
[11] S. Z. Xu and W. R. Atchley, "A random model approach to interval mapping of quantitative trait loci," Genetics, vol. 141, pp. 1189-1197, Nov. 1995.
[12] C. B. Li, A. L. Zhou, and T. Sang, "Rice domestication by reducing shattering," Science, vol. 311, no. 5769, pp. 1936-1939, Mar. 2006.
[13] R. R. H. Anholt and T. F. C. Mackay, "Quantitative genetic analyses of complex behaviours in Drosophila," Nature Reviews Genetics, vol. 5, pp. 838-849, Nov. 2004.
[14] T. F. C. Mackay, "Quantitative trait loci in Drosophila," Nature Reviews Genetics, vol. 2, pp. 11-20, Jan. 2001.


[15] L. Peltonen and V. A. McKusick, "Dissecting human disease in the postgenomic era," Science, vol. 291, no. 5507, pp. 1224-1229, Feb. 2001.
[16] F. Jaffrezic and S. D. Pletcher, "Statistical models for estimating the genetic basis of repeated measures and other function-valued traits," Genetics, vol. 156, pp. 913-922, Oct. 2000.
[17] F. Jaffrezic, R. Thompson, and W. G. Hill, "Structured antedependence models for genetic analysis of repeated measures on multiple quantitative traits," Genetical Research, vol. 82, no. 1, pp. 55-65, Aug. 2003.
[18] S. D. Pletcher and C. J. Geyer, "The genetic analysis of age-dependent traits: Modeling the character process," Genetics, vol. 153, pp. 825-835, Oct. 1999.
[19] S. D. Pletcher and F. Jaffrezic, "Generalized character process models: Estimating the genetic basis of traits that cannot be observed and that change with age or environmental conditions," Biometrics, vol. 58, no. 1, pp. 157-162, Mar. 2002.
[20] M. Kirkpatrick and N. Heckman, "A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters," Journal of Mathematical Biology, vol. 27, no. 4, pp. 429-450, 1989.
[21] K. J. Niklas, Plant Allometry: The Scaling of Form and Process, University of Chicago Press, Chicago, IL, 1994.
[22] G. B. West, J. H. Brown, and B. J. Enquist, "A general model for the origin of allometric scaling laws in biology," Science, vol. 276, no. 5309, pp. 122-126, Apr. 1997.
[23] G. B. West, J. H. Brown, and B. J. Enquist, "The fourth dimension of life: Fractal geometry and allometric scaling of organisms," Science, vol. 284, no. 5420, pp. 1607-1609, June 1999.
[24] S. Via, R. Gomulkiewicz, G. de Jong, S. E. Scheiner, C. D. Schlichting, and P. van Tienderen, "Adaptive phenotypic plasticity: consensus and controversy," Trends in Ecology and Evolution, vol. 10, no. 5, pp. 212-217, May 1995.
[25] C. X. Ma, G. Casella, and R. L. Wu, "Functional mapping of quantitative trait loci underlying the character process: a theoretical framework," Genetics, vol. 161, no. 4, pp. 1751-1762, Aug. 2002.
[26] R. L. Wu, C. X. Ma, R. C. Littell, and G. Casella, "A statistical model for the genetic origin of allometric scaling laws in biology," Journal of Theoretical Biology, vol. 219, no. 1, pp. 121-135, Nov. 2002.
[27] R. L. Wu, C. X. Ma, M. Lin, and G. Casella, "A general framework for analyzing the genetic architecture of developmental characteristics," Genetics, vol. 166, pp. 1541-1551, Mar. 2004.


[28] R. L. Wu, C. X. Ma, M. Lin, Z. H. Wang, and G. Casella, "Functional mapping of quantitative trait loci underlying growth trajectories using a transform-both-sides logistic model," Biometrics, vol. 60, no. 3, pp. 729-738, Sep. 2004.
[29] R. L. Wu, Z. H. Wang, W. Zhao, and J. M. Cheverud, "A mechanistic model for genetic machinery of ontogenetic growth," Genetics, vol. 168, pp. 2383-2394, Dec. 2004.
[30] Z. H. Wang and R. L. Wu, "A statistical model for high-resolution mapping of quantitative trait loci determining HIV dynamics," Statistics in Medicine, vol. 23, no. 19, pp. 3033-3051, 2004.
[31] Y. Zhu, W. Hou, and R. L. Wu, "A haplotype block model for fine mapping of quantitative trait loci regulating HIV-1 pathogenesis," Journal of Theoretical Medicine, vol. 5, pp. 227-234, 2003.
[32] D. D. Ho, A. U. Neumann, A. S. Perelson, W. Chen, J. M. Leonard, and M. Markowitz, "Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection," Nature, vol. 373, pp. 123-126, Jan. 1995.
[33] M. Lin, C. Aquilante, J. A. Johnson, and R. L. Wu, "Sequencing drug response with HapMap," The Pharmacogenomics Journal, vol. 5, pp. 149-156, Mar. 2005.
[34] H. Derendorf and B. Meibohm, "Modeling of pharmacokinetic/pharmacodynamic (PK/PD) relationships: Concepts and perspectives," Pharmaceutical Research, vol. 16, no. 2, pp. 176-185, Feb. 1999.
[35] Y. H. Cui, J. Zhu, and R. L. Wu, "Functional mapping for genetic control of programmed cell death," Physiological Genomics, vol. 25, pp. 458-469, 2006.
[36] W. Zhao, Y. Q. Chen, G. Casella, J. M. Cheverud, and R. L. Wu, "A nonstationary model for functional mapping of complex traits," Bioinformatics, vol. 21, no. 10, pp. 2469-2477, 2005.
[37] M. Lin and R. L. Wu, "A joint model for nonparametric functional mapping of longitudinal trajectory and time-to-event," BMC Bioinformatics, vol. 7, no. 1, 138, Mar. 2006.
[38] W. Zhao, H. Y. Li, and R. L. Wu, "A wavelet thresholding approach for functional genetic mapping of longitudinal traits," Journal of the American Statistical Association, revised, 2006.
[39] R. S. Cooper and B. M. Psaty, "Genomics and medicine: Distraction, incremental progress, or the dawn of a new age?," Annals of Internal Medicine, vol. 138, no. 7, pp. 576-580, Apr. 2003.


[40] G. Camerino, K. H. Grzeschik, M. Jaye, H. De La Salle, P. Tolstoshev, J. P. Lecocq, R. Heilig, and J. L. Mandel, "Regional localization on the human X chromosome and polymorphism of the coagulation factor IX gene (hemophilia B locus)," Proc. Natl. Acad. Sci. USA, vol. 81, no. 2, pp. 498-502, Jan. 1984.
[41] M. Halling-Brown, Computational Techniques for the Prediction of Minor Histocompatibility and T cell Antigens, PhD Thesis, University of London, 2006.
[42] I. Johansson, M. Oscarson, Q. Y. Yue, L. Bertilsson, F. Sjoqvist, and M. Ingelman-Sundberg, "Genetic analysis of the Chinese cytochrome P4502D locus: characterization of variant CYP2D6 genes present in subjects with diminished capacity for debrisoquine hydroxylation," Molecular Pharmacology, vol. 46, no. 3, pp. 452-459, Sep. 1994.
[43] M. Tatsuguchi, M. Furutani, J. Hinagata, T. Tanaka, Y. Furutani, S. Imamura, M. Kawana, T. Masaki, H. Kasanuki, T. Sawamura, and R. Matsuoka, "Oxidized LDL receptor gene (OLR1) is associated with the risk of myocardial infarction," Biochemical and Biophysical Research Communications, vol. 303, no. 1, pp. 247-250, Mar. 2003.
[44] The International HapMap Consortium, "The International HapMap Project," Nature, vol. 426, pp. 789-796, Dec. 2003.
[45] F. S. Collins, L. D. Brooks, and A. Chakravarti, "A DNA polymorphism discovery resource for research on human genetic variation," Genome Research, vol. 8, no. 12, pp. 1229-1231, Dec. 1998.
[46] E. Dawson, G. R. Abecasis, S. Bumpstead, Y. Chen, S. Hunt, D. M. Beare, J. Pabial, T. Dibling, E. Tinsley, S. Kirby, D. Carter, M. Papaspyridonos, S. Livingstone, R. Ganske, E. Lohmussaar, J. Zernant, N. Tonisson, M. Remm, R. Magi, T. Puurand, J. Vilo, A. Kurg, K. Rice, P. Deloukas, R. Mott, A. Metspalu, D. R. Bentley, L. R. Cardon, and I. Dunham, "A first-generation linkage disequilibrium map of human chromosome 22," Nature, vol. 418, pp. 544-548, Aug. 2002.
[47] S. B. Gabriel, S. F. Schaffner, H. Nguyen, J. M. Moore, J. Roy, B. Blumenstiel, J. Higgins, M. DeFelice, A. Lochner, M. Faggart, S. N. Liu-Cordero, C. Rotimi, A. Adeyemo, R. Cooper, R. Ward, E. S. Lander, M. J. Daly, and D. Altshuler, "The structure of haplotype blocks in the human genome," Science, vol. 296, no. 5576, pp. 2225-2229, June 2002.
[48] T. Liu, J. A. Johnson, G. Casella, and R. L. Wu, "Sequencing complex diseases with HapMap," Genetics, vol. 168, pp. 503-511, Sep. 2004.
[49] W. Hou, M. Lin, T. Liu, and R. L. Wu, "Mapping quantitative trait nucleotides underlying complex traits in a controlled cross," Genomics, submitted.


[50] D. L. Remington, M. C. Ungerer, and M. D. Purugganan, "Map-based cloning of quantitative trait loci: progress and prospects," Genetical Research, vol. 78, pp. 213-218, 2001.
[51] M. Lin, H. Y. Li, W. Hou, J. A. Johnson, and R. L. Wu, "Modeling sequence-sequence interactions for drug response," Bioinformatics, vol. 23, no. 10, pp. 1251-1257, 2007.
[52] J. X. Pan and G. Mackenzie, "On modelling mean-covariance structures in longitudinal studies," Biometrika, vol. 90, no. 1, pp. 239-244, 2003.
[53] M. Pourahmadi, "Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation," Biometrika, vol. 86, no. 3, pp. 677-690, 1999.
[54] M. Pourahmadi, "Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix," Biometrika, vol. 87, no. 2, pp. 425-435, 2000.
[55] W. B. Wu and M. Pourahmadi, "Nonparametric estimation of large covariance matrices of longitudinal data," Biometrika, vol. 90, no. 4, pp. 831-844, 2003.
[56] D. L. Zimmerman and V. Núñez-Antón, "Parametric modeling of growth curve data: An overview," Test, vol. 10, no. 1, pp. 1-73, Jun. 2001.
[57] R. L. Sandland and C. A. McGilchrist, "Stochastic growth curve analysis," Biometrics, vol. 35, no. 1, pp. 255-271, Mar. 1979.
[58] P. J. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford, UK: Oxford University Press, 2002.
[59] V. Núñez-Antón and D. L. Zimmerman, "Modeling nonstationary longitudinal data," Biometrics, vol. 56, no. 3, pp. 699-705, Sep. 2000.
[60] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977.
[61] G. M. Fitzmaurice and N. M. Laird, "Generalized linear mixture models for handling nonignorable dropouts in longitudinal studies," Biostatistics, vol. 1, no. 2, pp. 141-156, 2000.
[62] J. W. Hogan and N. M. Laird, "Model-based approaches to analysing incomplete longitudinal and failure time data," Statistics in Medicine, vol. 16, no. 3, pp. 259-272, 1997.
[63] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, New York: Wiley, 1987.


[64] N. M. Laird, "Missing data in longitudinal studies," Statistics in Medicine, vol. 7, pp. 305-315, 1988.
[65] J. W. Hogan and N. M. Laird, "Mixture models for the joint distribution of repeated measures and event times," Statistics in Medicine, vol. 16, no. 3, pp. 239-257, 1997.
[66] F. Hsieh, Y. K. Tseng, and J. L. Wang, "Joint modeling of survival and longitudinal data: likelihood approach revisited," Biometrics, vol. 62, no. 4, pp. 1037-1043, Dec. 2006.
[67] R. J. A. Little, "Pattern-mixture models for multivariate incomplete data," Journal of the American Statistical Association, vol. 88, no. 421, pp. 125-134, Mar. 1993.
[68] R. J. A. Little, "Modeling the dropout mechanism in repeated-measures studies," Journal of the American Statistical Association, vol. 90, no. 431, pp. 1112-1121, Sep. 1995.
[69] M. C. Wu and K. R. Bailey, "Estimation and comparison of changes in the presence of informative right censoring: conditional linear model," Biometrics, vol. 45, no. 3, pp. 939-955, Sep. 1989.
[70] M. G. Yu, N. J. Law, J. M. G. Taylor, and H. M. Sandler, "Joint longitudinal-survival-cure models and their application to prostate cancer," Statistica Sinica, vol. 14, pp. 835-862, 2004.
[71] W. Hou, C. W. Garvan, W. Zhao, M. Behnke, F. D. Eyler, and R. L. Wu, "A general model for detecting genetic determinants underlying longitudinal traits with unequally spaced measurements and nonstationary covariance structure," Biostatistics, vol. 6, no. 3, pp. 420-433, 2005.
[72] P. J. Diggle and M. G. Kenward, "Informative drop-out in longitudinal data analysis," Applied Statistics, vol. 43, no. 1, pp. 49-93, 1994.
[73] J. A. Nelder and R. Mead, "A simplex method for function minimization," The Computer Journal, vol. 7, pp. 303-313, 1965.
[74] M. C. Wu and R. J. Carroll, "Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process," Biometrics, vol. 44, no. 1, pp. 175-188, Mar. 1988.
[75] M. S. Wulfsohn and A. A. Tsiatis, "A joint model for survival and longitudinal data measured with error," Biometrics, vol. 53, no. 1, pp. 330-339, Mar. 1997.
[76] R. Henderson, P. Diggle, and A. Dobson, "Joint modeling of longitudinal measurements and event time data," Biostatistics, vol. 1, no. 4, pp. 465-480, 2000.
[77] M. D. Schluchter, "Methods for the analysis of informatively censored longitudinal data," Statistics in Medicine, vol. 11, no. 14-15, pp. 1861-1870, 1992.


[78] V. De Gruttola and X. M. Tu, "Modelling progression of CD4-lymphocyte count and its relationship to survival time," Biometrics, vol. 50, no. 4, pp. 1003-1014, Dec. 1994.
[79] D. Follmann and M. C. Wu, "An approximate generalized linear model with random effects for informative missing data," Biometrics, vol. 51, no. 1, pp. 151-168, Mar. 1995.
[80] Y. Pawitan and S. Self, "Modeling disease marker processes in AIDS," Journal of the American Statistical Association, vol. 88, no. 423, pp. 719-726, Sep. 1993.
[81] G. M. Fitzmaurice, N. M. Laird, and L. Shneyer, "An alternative parameterization of the general linear mixture model for longitudinal data with non-ignorable drop-outs," Statistics in Medicine, vol. 20, no. 7, pp. 1009-1021, 2001.
[82] W. S. Guo, S. J. Ratcliffe, and T. T. Ten Have, "A random pattern-mixture model for longitudinal data with dropouts," Journal of the American Statistical Association, vol. 99, no. 468, pp. 929-937, Dec. 2004.
[83] D. Clayton and J. Cuzick, "Multivariate generalizations of the proportional hazards model (with discussion)," Journal of the Royal Statistical Society. Series A (General), vol. 148, no. 2, pp. 82-117, 1985.
[84] S. Zacks, The Theory of Statistical Inference, New York: John Wiley & Sons, Inc.
[85] T. A. Louis, "Finding the observed information matrix when using the EM algorithm," Journal of the Royal Statistical Society. Series B (Methodological), vol. 44, no. 2, pp. 226-233, 1982.
[86] X. L. Meng and D. B. Rubin, "Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm," Journal of the American Statistical Association, vol. 86, no. 416, pp. 899-909, Dec. 1991.
[87] I. Meilijson, "A fast improvement to the EM algorithm on its own terms," Journal of the Royal Statistical Society. Series B (Methodological), vol. 51, no. 1, pp. 127-138, 1989.
[88] V. Large, L. Hellstrom, S. Reynisdottir, F. Lonnqvist, P. Eriksson, L. Lannfelt, and P. Arner, "Human beta-2 adrenoceptor gene polymorphisms are highly frequent in obesity and associate with altered adipocyte beta-2 adrenoceptor function," Journal of Clinical Investigation, vol. 100, no. 12, pp. 3005-3012, Dec. 1997.
[89] G. Schwarz, "Estimating the dimension of a model," Annals of Statistics, vol. 6, no. 2, pp. 461-464, Mar. 1978.
[90] H. Akaike, "Information theory and an extension of the maximum likelihood principle," Second International Symposium on Information Theory, pp. 267-281, 1973.


[91] J. J. Heckman, "The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models," Annals of Economic and Social Measurement, vol. 5, pp. 475-492, 1976.


Hongying Li was born in Changge, Henan Province, People's Republic of China. She received her bachelor of science degree from the University of Science and Technology of China (USTC) in June 2002 and her master of science degree from the University of Florida (UF) in August 2004. Both degrees were in statistics. She received her PhD in statistics from UF in 2007.


Hongying Li
(352) 392-1946 ext. 239
Department of Statistics
Chair: Rongling Wu
Degree: Doctor of Philosophy
Graduation Date: August 2007

Functional mapping has been developed to detect genes that affect the developmental process of a complex biological trait. Functional mapping needs phenotypic measures at a multitude of time points. However, in many longitudinal studies dropout is prevalent, which complicates statistical inference. In this dissertation, I propose a joint functional mapping model for mapping and identifying the DNA sequence variants that encode dynamic traits, using longitudinal data subject to informative dropout. I used the model to re-analyze genetic and pharmacodynamic data, leading to the discovery of a gene that affects heart rate responses to a drug. Extensive simulation studies were performed to demonstrate the usefulness and advantages of the model. The new model opens a gateway to revealing the genetic control mechanisms underlying the development and process of a longitudinal trait.