Citation
Functional Mapping of Longitudinal Trajectories with Unevenly and Irregularly Spaced Data

Material Information

Title:
Functional Mapping of Longitudinal Trajectories with Unevenly and Irregularly Spaced Data
Creator:
HOU, WEI ( Author, Primary )
Copyright Date:
2008

Subjects

Subjects / Keywords:
Covariance ( jstor )
Genotypes ( jstor )
Longitudinal data ( jstor )
Mathematical independent variables ( jstor )
Modeling ( jstor )
Parametric models ( jstor )
Phenotypic traits ( jstor )
Quantitative trait loci ( jstor )
Statistical models ( jstor )
Trajectories ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Wei Hou. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
11/30/2006
Resource Identifier:
445519719 ( OCLC )

Downloads

This item is only available as the following downloads:


Full Text

PAGE 4

iv

PAGE 5

TABLEOFCONTENTS page ACKNOWLEDGMENTS............................. iv LISTOFTABLES................................. vii LISTOFFIGURES................................ viii ABSTRACT.................................... ix CHAPTER 1INTRODUCTIONTOGENETICMAPPING............... 1 1.1BasicMethods............................. 1 1.2FunctionalMapping.......................... 3 1.3CurrentDevelopmentsandChallenges................ 4 1.4StructureandOrganization..................... 8 2AGENERALMODELFORUNEQUALLYSPACEDMEASUREMENTS ANDNONSTATIONARYCOVARIANCESTRUCTURE....... 9 2.1Introduction.............................. 9 2.2AFrameworkforHuntingLongitudinalQTL............ 10 2.3ModelGeneralization......................... 13 2.3.1Measurementschedule..................... 13 2.3.2ModellingtheCovarianceStructure............. 14 2.3.3ThelikelihoodandEstimation................ 15 2.3.4HypothesisTests........................ 17 2.4AWorkedExample.......................... 18 2.5Discussion............................... 24 3CHARACTERIZINGENVIRONMENT-INDUCEDQTLFORLONGITUDINALTRAJECTORIES....................... 27 3.1Introduction.............................. 27 3.2Methodology.............................. 29 3.2.1TheFiniteMixtureModel.................. 29 3.2.2TheLikelihoodFunction................... 30 3.2.3ModellingtheMeanVectorandCovarianceMatrix..... 31 3.2.4ComputationalAlgorithms.................. 33 3.2.5ModelSelection........................ 35 v

PAGE 6

3.3HypothesisTests........................... 36 3.3.1GlobalTest........................... 36 3.3.2LocalTest........................... 37 3.3.3RegionalTest......................... 37 3.3.4DierentialDevelopmentTest................ 38 3.3.5Genotype-EnvironmentInteractionTest........... 38 3.4Application.............................. 39 3.4.1ALongitudinalPrenatalCocaine-ExposureDatabase... 39 3.4.2Results............................. 40 3.5Discussion............................... 44 4NONLINEARMIXED-EFFECTMIXTUREMODELS.......... 48 4.1Introduction.............................. 48 4.2LinearMixed-EectsModel..................... 51 4.3NonlinearMixed-EectsModel................... 53 4.3.1LindstromandBates'Approximation............ 55 4.3.2BealandSheiner'sApproximation.............. 57 4.4LikelihoodforFunctionalMapping................. 58 4.5EstimationandComputationalAlgorithm............. 60 4.5.1LindstromandBates'LMEMethod............. 60 4.5.2BealandSheiner'sMethod.................. 62 4.6Hypotheses.............................. 62 4.7AWorkedExample.......................... 63 4.7.1MappingPopulation...................... 63 4.7.2QTLScanningandEstimation................ 65 4.8MonteCarloSimulation....................... 68 4.9Discussion............................... 69 5CONCLUSIONSANDPROSPECTS.................... 76 5.1Summary............................... 76 5.2FutureDirections........................... 77 5.2.1StatisticalInfrastructureNonparametricApproaches.... 78 5.2.2Subject-SpecicCovariates.................. 78 5.2.3MissingData.......................... 78 5.2.4ModelDiagnostics....................... 79 5.2.5BiologicalExtensions..................... 79 APPENDIXMLEsOFPOPULATIONGENETICPARAMETERS..... 80 REFERENCES................................... 82 BIOGRAPHICALSKETCH............................ 90 vi

PAGE 7

LISTOFTABLES Table page 2{1MLEsofpopulationandquantitativegeneticparametersforamajorQTL aectinggrowthtrajectoriesofheadcircumferencein145normalchildrenbasedontwodierentalgorithms.................. 23 3{1Parameterestimatesofamajorgeneaectinggrowthcurvesofheadcircumferencein144exposedand144controlchildren........... 43 4{1MLEsofQTLgenotype-specicparametersthatdenestemdiameter growthtrajectoriesinpoplartreesfromLindstromandBates'LME model.................................... 66 4{2TrueparametervaluesforMonteCarlosimulationsbasedonstemdiametersofpoplartrees............................ 69 4{3MLEsofparametersforsimulateddatasetwithasamplesizeof80... 70 4{4MLEsofparametersforsimulateddatasetwithasamplesizeof200.. 70 vii

PAGE 8

LISTOFFIGURES Figure page 1{1Fourrepresentativepatternsforthegeneticcontrolofgrowthtrajectories byadynamicQTL............................. 5 2{1Plotsofheadcircumferencevs.agesforeachofthe145childrensampled fromanaturalpopulation......................... 19 2{2Fourrandomlysampledsubjectsfromthesetofsubjectsstudied..... 20 2{3Threegrowthcurves(foreground)eachpresentingoneofthethreegroups ofgenotypesatthemajorQTLdetectedbyournonstationarymodel. 22 3{1Plotsofheadcircumferencevs.agesfortheexposed(A)andcontrol(B) groupseachcomposedof144children(inyellow)sampledfromanaturalpopulation............................... 40 3{2Threegrowthcurveseachpresentingthreegroupsofgenotypes, QQ (solid curves), Qq (dotcurves)and qq (brokencurves),intheexposed(red) andcontrol(blue)childrenatthegene,detectedbyourjointmodel.. 44 4{1Theproleoflog-likelihoodratio(LR)betweenthefullandreducedmodel estimatedfromLindstromandBates'LMEmodelforanalyzingstem diametergrowthtrajectoriesinaninterspecicpoplarhybridprogeny acrossallthelinkagegroups........................ 67 4{2Twogrowthcurves(foreground)eachpresentingoneofthetwogroupsof genotypesatthemajorQTLdetectedbyournonlinearrandommixedeectsmodel(LindstromandBates'approximation)onlinkagegroups D9andD10................................. 68 viii

PAGE 9

ix

PAGE 10

x

PAGE 11

1

PAGE 12

literatureconcerningthedevelopmentofstatisticalmethodsformappingcomplextraits(Zeng1994;Jansenetal.1994;SenandChurchill2001).AnalyticalstrategiesforQTLmappinghavebeenexpandedtowhole-genomemappingofepistaticQTLbymakinguseofallmarkers(Xu2003).Suchmappingstrategiesneedbeperformedinanexperimentalcross(backcross,F2orfull-sibfamily),astructuredpedigreeoranaturalpopulation,inwhichputativeQTLandmarkersareco-segregatingduetotheirphysicallinkage.Althoughuseful,traditionalstatisticalapproachesforQTLmappingneglectthedevelopmentalfeaturesoftraitformation.Forexample,bodyheightandweight,milkproduction,tumorsize,HIVload,circadianclockanddrugresponseallchangewithtimeorotherindependentvariablesandsogeneticcontrolofthetraitshouldbeaccordinglyexempliedasafunctionofanindependentvariable.Anapproximateapproachtodetectingtime-dependentgeneticeectsforthesedynamictraitshasbeentoassociatemarkerswithphenotypesfordierenttimesorstagesofdevelopmentandtocomparethedierencesacrossthesestages.Moreeectively,single-traitintervalmappinghasbeenextendedtoaccommodatethemultivariatenatureoftime-seriestraits(JiangandZeng1995).However,thisextensionislimitedinthreeaspects:(1)ExpectedvaluesofdierentQTLgenotypesatalltimepointsandallelementsinthecovariancematrixneedtobeestimated,resultinginsubstantialcomputationaldiculties,especiallywhenthenumberoftimepointsisthreeormore;(2)Theresultsmaynotbebiologicallymeaningfulbecausetheunderlyingbiologicalprinciplefortheformationofdynamictraitsisnotincorporated;(3)Nottakingintoaccounttheautocorrelationstructureoftimeseriestraits(Diggleetal.2002),suchamultivariateapproachmayaectthestatisticalpowertodetectsignicantQTL.Importantly,thelasttwoissuesmeanthatthisapproachcannotbeeectivelyusedinpractice.

PAGE 14

weight(vonBertalany1957;Richards1959)thatoccurwhenevertheanabolicormetabolicrateexceedstherateofcatabolism.Basedonfundamentalprinciplesbehindbiologicalorbiochemicalnetworks,Westetal.(2001)havemathematicallyproventheuniversalityofthesegrowthequations.WithmathematicalfunctionsincorporatedintotheQTLmappingframework,functionalmappingestimatesparametersthatdetermineshapesandfunctionsofaparticularbiologicalnetwork,insteadofdirectlyestimatingthegeneeectsatallpossibletimepoints.Becauseofsuchconnectionsamongthesepointsthroughmathematicalfunctions,functionalmappingstrikinglyreducesthenumberofparameterstobeestimatedand,hence,displaysincreasedstatisticalpower.Statistically,functionalmappingpresentsaproblemofjointlymodellingmean-covariancestructuresinlongitudinalstudies,anareathathasrecentlyreceivedconsiderableinterestinthestatisticalliterature(Pourahmadi,1999,2000;PanandMackenzie,2003;DanielsandPourahmadi,2002;WuandPourahmadi,2003).How-ever,dierentfromgenerallongitudinalmodelling,functionalmappingintegratestheparameterestimationandtestprocesswithinabiologicallymeaningfulmixture-basedlikelihoodframework.FunctionalmappingisthusadvantageousintermsofbiologicalrelevancebecausebiologicalprinciplesareembeddedintotheestimationprocessofQTLparameters.Theresultsderivedfromfunctionalmappingwillbeclosertobiologicalreality.1.3CurrentDevelopmentsandChallengesAlthoughtheideaoffunctionalmappingwasoriginallystimulatedfromagrowthmodelforacontrolledcross(Wuetal.2002),thismethodhasbeenextensivelyexpandedtoawidespectrumofbiomedicalandevolutionaryelds.WangandWu(2004)andZhuetal.(2004)combinedtheprincipleoffunctionalmappingandmathematicalmodelsforHIVdynamicsestablishedbyHoetal.(1995)tomaphostQTLthatgoverndynamicchangesofHIVviralloadsina

PAGE 15

Figure1{1:Fourrepresentativepatternsforthegeneticcontrolofgrowthtrajecto-riesbyadynamicQTL.EachcurveispresentedbyadierentQTLgenotype.A)PermanentQTL.SomeQTLarepermanentlyexpressed,whichgivesrisetotwoparallelgrowthcurvesinwhichonegenotypeisconsistentlybetterthantheotherthroughouttheentiregrowthprocess.TheexpressionofthisQTLisnotaectedbydevelopment,andthereforeshowsnointeractionwithage.B)EarlyQTL.SomeQTLareexpressedatearlydevelopmentalstagesandareswitchedoafterapar-ticularage.Thetwogenotypeshavedierentgrowthatearlystages,buttendtoconvergeatlaterstages.ThisQTLdisplaysinteractionswithagebecausethereisachangeofvarianceoftheQTLeectsduringdevelopment.C)LateQTL.SomeQTLremainsilentduringearlystagesandareexpressedonlyafteraparticularage.Thetwogenotypesdisplaysimilargrowthatearlystages,buttendtodivergeatlaterstages.AnalogoustoPatternB,thereisaQTLageinteractioninthiscase,operatingwiththeconditionalneutralitymechanism.D)InverseQTL.Onegeno-typeperformsbetterthantheotherattheearlystageofgrowth,butthischangesatthelaterstage.Thisgenedisplaysinverseeectsataparticularage.GenotypeageinteractionsoccurbecausethereisachangeofthedirectionoftheQTLeectsduringdevelopment.

PAGE 16

humanbody.Dierentfromtheoriginallinkageanalysisstrategyusedtoderivefunctionalmappinginacontrolledcross,functionalmappinginnaturalhumanpopulationsisfoundedonlinkagedisequilibriumthatdenethedegreeofmarker-QTLcosegregationornon-randomassociationatthepopulationlevel.Wangetal.(2005)proposedageneticmodelthatincorporatestheinteractionbetweendierentQTLderivedfromviralandhostgenomes.Veryrecently,Wuetal.(2005)havegeneralizedfunctionalmappingforHIVdynamicsbyconsideringagenome-widescanforhostQTLbasedonmultilocuslinkagedisequilibriummapping.Functionalmappinghasbeenextendedtostudythegeneticcontrolofdrugresponse(Lin2005;LinandWu2005a;Linetal.2005)byimplementingwell-developedpharmacodynamicandpharmacokineticmodels(DerendorfandMeibohm1999).Functionalmapping,inconjunctionwiththeDNAsequencedatafromtheHumanGenomeProject(TheInternationalHapMapConsortium2003),displaysgreatpotentialtoopenanovelareaforpharmacogeneticorpharmacogenomicresearch,anactivecuttingedgedisciplinethathasrecentlyreceivedincreasedattentioningeneralbiomedicalsciences(WattersandMcleod2003).Functionalmappingallowstoidentifyspecicgeneticvariantsthatcontributetointer-patientvariationinresponsetodierentdrugsordierentdosagesofthesamedrug.Theresultsfromfunctionalmappingwillprovidescienticguidanceforthedesignofpersonalizedmedicationsbasedonapatient'sgeneticmake-up.\Naturally-occurring"or\programmed"celldeath(PCD)inwhichthecellusesspecializedcellularmachinerytokillitselfisaubiquitousphenomenonthatoccursearlyinorgandevelopment.Suchacellsuicidemechanismthatenablesmetazoanstocontrolcellnumberandeliminatecellsthreateningtheorganism'ssurvivalhasbeenthoughttobeundergeneticcontrol.The2002NobelPrizeinmedicinewasawardedtothreescientistsbecauseoftheirdiscoveriesofthegeneticregulationofPCD(Horvitz2003).Arecentlyextendedfunctionalmappingmodel

PAGE 17

hasmadeitpossibletoidentifyanddetectQTLthatareresponsibleforPCDinanyorganism(Cui2005;CuiandWu2005).ThisextendedmodelincorporatesthebiologicalmechanismsofPCDthatundergoestwodierentdevelopmentalstages,exponentialgrowthandpolynomialdeath.PCD-incorporatedfunctionalmappingwillopenanovelavenueforthestudyofthegeneticarchitectureofimportantdiseases,suchascancer,thatundergouniversalPCDprocesses.Intheory,geneticinformationcontainedwithinanykindoflongitudinallymeasureddatacanbeextractedbyfunctionalmapping.Theadvantagesoffunc-tionalmappinginmodelexibility,stabilityandpowerresultfromitsstatisticalformulationincludingthemathematicalapproximationofthemeanvectorandthemodellingofthecovariancestructurebystationary(Maetal.2002)ornon-stationarymodels(Zhaoetal.2005a).Althoughoriginalfunctionalmappingwasderivedforthebiologicalprocessesthatcanbedescribedbyparametricfunctions,functionalmappinghasbeenmodiedwithinthenonparametriccontexttoac-commodatethesituationinwhichbiologicallymeaningfulmathematicalequationsdonotexist(LinandWu2005b).Analysesoflongitudinaldataneedanumberofmathematicalmanipulationsforthecovariancematrix,suchasthecalculationofthematrixdeterminantandinverse.Inmodellingthestructureofcovariancematrixbyparametricfunctions,however,thesemathematicalmanipulationswillnotbemadepossiblebecauseofthematrix'ssparsestructure,especiallywhenthedimen-sionofthematrixistoolarge.Toovercomethisproblem,Zhao(2005)proposedade-nosingapproachbasedonwavelettransformationtoreducethedimensionalityoflongitudinaldata.Preservingthefavorablepropertiesoffunctionalmapping,waveletthresholdingapproachcreatesanewdirectioninstatisticalgeneticstomanipulatehigh-dimensonalmultivariatedynamicdatainbothstatisticallyandbiologicallymeaningfulways(ZhaoandWu2005).

PAGE 18

Inpractice,functionalmappingmayencounteracommontypeoflongitudinaldatainagriculturalandbiomedicalresearch,whererepeatedmeasurementsareirregularlyspacedorsparseandthenumbersofrepeatedmeasurementsvaryamongsubjects.Thistypeofsparselongitudinaldatamayalsoconsistofnoisymeasurementswithunderlyingsmoothrandomtrajectoriesforeachsubjectinasample.Somestatisticaltheoryandmethodshavebeenestablishedtoanalyzesparseandirregularlongitudinaldata(Yaoetal.2005),althoughthereisstillmuchroomfortheirimprovement.Topushfunctionalmappingtowardabroadrangeofapplications,especiallyinHIV/AIDstudies,pharmacogeneticandpharmacogenomicresearchandgeneticmodellingofprogrammedcelldeathprocesses,thereisapressingneedtoconsiderthesparsityandirregularityoflongitudinaldatawithinthecontextoffunctionalmapping.1.4StructureandOrganizationTheoverallpurposeofthisdissertationistodeveloppowerfulstatisticalmodelsfordetectingandmappingQTLthatcontrolbiologicalprocesseswithcommonlyseensparseandirregularlongitudinaldata.InChapter2,ageneralmodelfordetectinggeneticdeterminantswithunequallyspacedmeasurementsandnonstationarycovariancematrixwasderived.Chapter3extendsageneralfunctionalmappingmodeltoexaminedierentialexpressionofQTLinresponsetochangingenvironmentalagentsandmodelgeneenvironmentinteractionsfromadevelopmentalordynamicperspective.Thedevelopmentofnonlinearmixed-eectsmodelforQTLmappingisthefocusofChapter4wherefunctionalmappingisintegratedwithoneofthemostactivestatisticalmethodsforfunctionaldataanalysis.ThelastchapterdiscussesfurtherdevelopmentofstatisticalmethodologiesforQTLmappingoflongitudinaltraits.

PAGE 19

9

PAGE 20

geneticmachineryofeachoftheseprocesses.Weanticipatethattheapplicationofthismethodtopracticalgeneticmappingdatasetscouldleadtothediscoveryofgenesthatarebiologicallyorclinicallyimportant(Wuetal.2004c).OurstatisticalmodelsfordetectinglongitudinalQTLwerepreviouslycon-structedasnitemixturemodels(Wuetal.2002a;Maetal.2002)withstationaryrst-orderautoregressiveresidualcovariancestructures(Diggleetal.2002).Theautoregressiveassumptioniscomputationallyconvenientbutthestationarityassumptionmaybeunrealistic.Althoughvariancescanoftenbestabilizedbytransformingthedata(BoxandCox,1964;CarrollandRuppert,1984),itismorediculttodealwithnonstationarycorrelationstructure.InSection2.2,weincorporatenonstationarycorrelationstructureintoourstatisticalframeworkfordetectinggeneticdeterminantsofgrowthtrajectories.InSection2.3themodelisfurtherextendedtoaccommodateunequallyspacedmeasurementtimesanddierentmeasurementpatterns.Section2.4describesanapplicationtoheadcircumferencegrowthinchildrenfordierentsubjectsfrombirthtoage7years.Section2.5describesthewiderimplicationsandextensionsofourapproach.2.2AFrameworkforHuntingLongitudinalQTLThestatisticalfoundationofQTLmappingisamixturemodelinwhicheachobservationyisassumedtohavearisenfromoneofaknownorunknownnumberofcomponents.AssumingthatthereareLQTLgenotypescontributingtothevariationofaquantitativetrait,thismixturemodelisexpressedasyp(yj$;';)=$1f(y;'1;)++$Lf(y;'L;); where$=($1;;$L)0arethemixtureproportions(i.e.,QTLgenotypefrequencies)whichareconstrainedtobenon-negativeandsumtounity;'=('1;;'L)0arethecomponent(orQTLgenotype)specicparameters,with

PAGE 21

wheremixtureproportions,$1;;$L,canbeviewedasthepriorprobabilitiesofQTLgenotypes.TraditionalQTL-detectingmethods(LanderandBotstein1989)canbereadilyextendedtoaccommodatetime-dependenttraits.Themultivariatenormaldistributionofindividualimeasuredattimepointsisexpressedasf(yi;'l;)=1 (2)=2jj1=2exp1 2(yiml)1(yiml)0; whereyi=[yi(t)]1=[yi(1);;yi()]isavectorofobservationmeasuredattimepointsandml=[l(t)]1=[l(1);;l()]isavectorofexpectedvaluesforQTLgenotypelatdierentpoints.Ataparticulartimet,therelationshipbetweentheobservationandexpectedmeancanbedescribedbyalinearregressionmodel,yi(t)=xil(t)+ei(t);wherexiistheindicatorvariabledenotedas1ifaQTLgenotypelisconsid-eredforindividualiand0otherwise;ei(t)astheresidualerrorsequencesas-sumediidnormalwithmeanzeroandvariance2(t);andcovariancestructure(tj1;tj2)=fei(t1),ei(t2)g.TheuseofatraditionalmultivariateapproachtomapQTLforgrowthprocessesislimitedintwoaspects:individualexpectedmeansofdierentQTLgenotypesatallpointsandallelementsinthematrixneedtobeestimated,resultinginsubstantialcomputationaldicultieswhenthevectorandmatrix

PAGE 22

dimensionislarge;failuretoincorporatetheunderlyingbiologicalprincipleofgrowthlimitsitspracticalapplicabilityandinterpretation.WehavedevelopedanewstatisticalframeworkfordetectingQTLthataectgrowthtrajectories(Wuetal.2002a;Maetal.2002).Thisframeworkincludestwotasks.Firstly,weneedtomodelthetime-dependentexpectedmeansofQTLgenotypelusingagrowthequation,l(t)=g(t;ml); whichisspeciedbyasetofcurveparametersml.Severalmathematicalmodels,suchaslogisticorS-shapedcurves,havebeenproposedtodescribenon-descreasinggrowthtrajectories.Thefundamentalbiologicalaspectsofmathematicalmod-ellingofgrowthcurveshavebeenfoundedbyearliermathematicalbiologists,e.g.Bertalany(1957),andrecentlyrevisitedbyWestetal.(2001).TheoverallformofthegrowthcurveofQTLgenotypelisdeterminedbythesetofcurveparameterscontainedinml.IfdierentgenotypesataputativeQTLhavedierentcombi-nationsoftheseparameters,thisimpliesthatthisQTLplaysaroleingoverningthedierentiationofgrowthtrajectories.Thus,bytestingfordierencesinmlamongdierentgenotypes,wecandeterminewhetherthereexistsaspecicQTLthatconfersaneectongrowthcurves.Secondly,weneedtomodelthestructureofthewithin-subjectcovariancematrixusingthestationaryAR(1)model,withcovariancestructurecov(t1;t2)=2jt2t1j:Theparametersthatdeterminethecovariancestructurearedenotedbyv=(;2).WehaveimplementedtheEMalgorithm,originallyproposedbyDempsteretal.(1977),toestimatethreegroupsofunknownparametersinaQTL-huntingmodel,thatis,thepopulationgeneticparameters(p)thatspecifyQTLgenotype

PAGE 23

frequenciesinthepopulation,curveparameters(ml)andcovarianceparame-ters(v).Theseunknownscorrespondtomixtureproportions(denotedbyl),component-specicparameters(denotedby'l)andcommonparameters(denotedby),respectively,inthemixturemodeldescribedbyEquation2-1.AdetaileddescriptionoftheEMalgorithmwasgiveninWuetal.(2002)andMaetal.(2002).Todealwithheteroscedasticresidualvariance,weembeddedthetransform-both-sidesapproach(CarrollandRuppert,1984)intothegrowth-incorporatednitemixturemodel(Wuetal.2004b).BothempiricalanalyseswithrealexamplesandcomputersimulationssuggestthatthisTBS-basedmodelcanincreasetheprecisionofparameterestimationandcomputationaleciency.Furthermore,thismodelpreservesoriginalbiologicalinterpretationofthecurveparametersalthoughstatisticalanalysesarebasedontransformeddata.2.3ModelGeneralization2.3.1MeasurementscheduleWenowgeneralizeourQTL-huntingmodeltoconsidertypicalcharacteristicsofactuallongitudinaldata:irregularlyspacedmeasurementtimesnotcommontoallsubjects,andnonstationarycorrelationstructure.Letyi=[y(ti1);;y(tii)]bethevectorofimeasurementsonsubjectiandletti=(ti1;;tii)bethecorrespondingvectorofmeasurementtimes.Thelongitudinalmeasurementsforsubjectiatdierenttimepointscanbeapproximatedbyamultivariatenormaldistributionwrittenasf(yi)=1 (2)i=2jij1=2exp1 2(yimli)1i(yimli)0; wheretheobservationvectoratdierenttimepoints,yi=[y(ti1);;y(tii)],themeanvectorforQTLgenotypel,mli=[l(ti1);;l(tii)],andthe(ii)within-subjectcovariancematrix,i,areallsubject-specic.

PAGE 24

where1j1j2i;0<<1and2,and,collectivelydenotedbyv,areunknownparameterstobettedtothedata.Theparameterdeterminesthebehaviorofthecorrelationbetweendierentmeasurementsonthesamesubject(ZimmermanandNu~nez-Anton2001).ThetransformationapproachformodellingthecovariancestructureincontinuoustimedescribedbyEquation2-6correspondstotheAR(1)structurewhen=1.AswiththeBox-Coxtransformation,thelogtransformationisobtainedwhenisnearzero.Nu~nez-AntonandWoodworth(1994)usedresultsinBarret(1979)toshowthattheinverse(1i)ofthecovariancematrixconstructedfrom(2.6)istridiago-nal,with8>>>>>>>>>>><>>>>>>>>>>>:2(1i)11=h12(ti2ti1)=i1;2(1i)ii=h12(tiiti(i1))=i12(1i)j(j+1)=h12(ti(j+1)tij)=i12(ti(j+1)tij)=;1ji1;2(1i)jj=nh12(tijti(j1))=ih12(ti(j+1)tij))=io1h12(ti(j+1)ti(j1))=i;11:(2.7)AgeneralformofthedeterminantcanbederivedbyapplyingtheCholeskyfactorizationtothecovariancematrix(Nu~nez-AntonandWoodworth,1994),which

PAGE 25

isexpressedasjij=2iiYj=2h12(tijti(j1))=i;0>>><>>>>:$2=q2+d;forQTLgenotypeAA$1=2q(1q)2d;forQTLgenotypeAa$0=(1q)2+d;forQTLgenotypeaa:IftwoQTLarettedtothedata,wecanestimatetheallelefrequenciesandlinkagedisequilibrium(LD)betweentheQTL.Letq1and1q1betheallele

PAGE 26

frequenciesofAandaatQTLAandq2and1q2betheallelefrequenciesofBandbatQTLB.UndertheassumptionofHardy-Weinbergequilibrium,thesetwoQTL,withthecoecientofLDdenotedbyD,formfourgameteswhosefrequenciesare(LynchandWalsh,1998)8>>>>>>><>>>>>>>:p11=q1q2+D;forgameteABp10=q1(1q2)D;forgameteAbp01=(1q1)q2D;forgameteaBp00=(1q1)(1q2)+D;forgameteab: Denoteatwo-QTLgenotypebyl1l2wherel1;l2=2;1;0arethenumbersofthecapitalalleleatQTLAandB,respectively.Weexpressthegenotypefrequenciesby$22=p211forAABB,$21=2p11p10forAABb,$20=p210forAAbb,$12=2p11p01forAaBB,$11=2(p11p00+p10p01)forAaBb,$10=2p10p00forAabb,$02=p201foraaBB,$01=2p01p00foraaBband$00=p200foraabb.Thus,wesubstituteQTLgenotypefrequenciesinthemixturemodel(Eq.2-9)bygametefrequencies.Thepopulationgeneticparametersaredenotedbyp=(q;d)or(q1;q2;D)fortheone-andtwo-QTLmodel,respectively.WedevelopanEM-simplexhybridalgorithmtoobtainthemaximumlikelihoodestimates(MLEs)ofthepopulationgeneticparameterspwithaclosedformoftheEMalgorithm(seeAppendix)andtheMLEsofthequantitativegeneticparametersmlandvwiththeMatlabfunctionfminsearchbasedonthemodiedsimplexalgorithm(Lagariusetal.1998).Thesimplexalgorithm,originallyproposedbyNelderandMead(1965),hasbeenshowntobeecientforestimatingthecurveandmatrix-structuringparameters(Zhaoetal.2004).Inpracticalimplementation,weintegratethesimplexalgorithmforquantitativegeneticparameterestimatesasanexpandedMsteptotheEMalgorithmdesignedtoestimatepopulationgeneticparametersineachiterativeloop.TheEM-simplexhybridalgorithmprovidesthepointestimatesofpopulationgeneticparameters,curveparametersandmatrix-structuringparameters.Wederive

PAGE 27

theasymptoticvariance-covariancematrixandevaluatethesamplingerrorsoftheestimates.Thetechniquesforsodoinginvolvecalculationoftheincomplete-datainformationmatrixwhichisthenegativesecond-orderderivativeoftheincomplete-datalog-likelihood.Theincomplete-datainformationcanbecalculatedbyextractingtheinformationforthemissingdatafromtheinformationforthecompletedata(Louis,1982).Adierentso-calledsupplementedEMalgorithmorSEMalgorithmwasproposedbyMengandRubin(1991)toestimatetheasymptoticvariance-covariancematrices,whichcanalsobeusedforthecalculationsofthesamplingerrorsfortheMLEsoftheparameters(p;ml;v).Weimplementedanalternativealgorithmforttingthenormalmixturemodel(Eq.2-9),theRfunctionoptim()(VenablesandRipley,2002).Thisfunctionusesseveraldierentalgorithmincludingseveraldierentmethods,SANN,L-BFGS,CG,BFGandNelder-Mead,asdefault.Theadvantageofthisalgorithmisthatitdirectlyprovidesestimatesofstandarderrorsforallparameterestimates.However,theEM-simplexhybridalgorithmiscomputationallymoreecient.2.3.4HypothesisTestsQTL-eecthypotheses:Incontrasttotraditionalmappingapproaches,ourQTL-huntingmodelbasedongrowthlawsallowsforthetestsofanumberofbiologicallyrelevanthypotheses(Wuetal.2004a).ThesetestsincludeaglobaltestfortheexistenceofsignicantQTL,alocaltestforthegeneticeectongrowthataparticulartimepoint,aregionaltestfortheoveralleectofQTLonaparticularperiodofgrowth,andaninteractiontestforthechangeofQTLexpressionacrossages(Wuetal.2004a).Asanexample,weformulatethehypothesisabouttheexistenceofaQTLaectinganoverallgrowthcurveas:H0:m2=m1=m0=m(2.11)

PAGE 28

i.e.thedatacanbettedbyasinglelogisticcurve,whereasunderthealternative,therearetwodierentlogisticcurves.UnderH0,thelog-likelihoodratiostatistic,LRsay,isasymptotically2-distributedwithsixdegreesoffreedom.Weuseanempiricalapproachfordeterminingthecriticalthresholdbasedonpermutationtests,asadvocatedbyChurchillandDoerge(1994).Byrepeatedlyshuingtherelationshipsbetweenmarkergenotypesandphenotypes,aseriesofthemaximumlog-likelihoodratiosarecalculated,fromthedistributionofwhichthecriticalthresholdisdetermined.Populationstructurehypotheses:TheoccurrenceofHWDand/orLDisrelatedtopopulationhistoryandstructureundertheoperationofvariousevolu-tionaryforces,suchasselection,mutationandadmixture(LynchandWalsh,1998).Bytestingthesetwoparameters,ourmodelcaninferpopulationstructureandorganization.Fortheone-QTLmodel,atestofwhetherthesampledpopulationdeviatesfromaHardy-WeinbergequilibriumisformulatedbyH0:d=0vs.H1:d6=0.Similarly,ifthetwo-QTLmodelisused,LDbetweentwoputativeQTLcanbetestedbycomparingthelikelihoodvaluesunderH0:D=0andH1:D6=0.Whend=0,theQTLgenotypefrequenciesaredeterminedsolelybyallelefre-quencies.WhenD=0,theMLEsofgametefrequenciescanbeobtainedwiththeconstraintp11p00=p10p01.2.4AWorkedExampleSubjects:Aspartofourongoingprospectivelongitudinalprojectonchilddevelopment(Eyleretal.1998a,1998b),wesampledagroupofchildren(n=145)fromahumanpopulationinFloridawhowereborntonormal,nondrug-usingmothers.Thechildrensampledwerefollowedfrombirthtoageseven.Interviewersandcliniciansconductedabatteryofpsychological,neurologicalandmedicaltestsatnineagepoints:birth,onemonth,sixmonths,1year,eighteenmonths,2years,3years,5years,and7years.Growthdatawereobtainedatallnineagepoints

PAGE 29

includedheadcircumference,weightandlength.However,thenumberandagepointsofmeasurementsactuallytakenvariedbetweensubjects.Asacasestudyforvalidatingthemodelpresentedinthisarticle,weusedheadcircumferencegrowthtotestwhetherthereexistspecicQTLthattriggereectsongrowthtrajectories.Inordertoadjustforgestationsthatwerelessthanthetypical40weeks,ageinthisstudyisdenedastimesinceestimatedconception,i.e.chronologicageplusactualweeksgestation.Toaccuratelyreectthetimecourseofmeasures,weusedtheageindecimalyearswhentheheadcircumferencewasmeasured.In14ofthe1,126observationsphysicallyimpossiblechangesinrecordedheadcircumferencefromonetimetothenext(>5cm)wereconsideredtobeinerrorandwereeliminated.Results:Figure2{1illustratestheplotsofheadcircumferenceagainstagesfrombirthtoage7yearsforallsubjects.Comparedtogrowthtrajectoriesbasedonrawdata(Fig.2{1A),growthtrajectoriesbasedonthelogtransformationdisplaysmallerandmoreuniformvariationthroughoutheaddevelopment(Fig.2{1B).ThisdierencesuggeststhattheTBS-basedmodelispreferabletoamodelthat

PAGE 30

Figure2{2:Fourrandomlysampledsubjectsfromthesetofsubjectsstudied.ThegrowthtrajectoriesofheadcircumferencecanbettedbytheJenss-Bayleymodel(Eq.2-12). usesuntransformeddata.Statisticaltestssuggestthatheadcircumferencegrowthtrajectoriesforallsubjects,exceptforthosewithtoofewrepeatedmeasurements,canbewellttedbytheJenss-Bayley(1937)model,g(t)=+tet; wherethegrowthparametersm=(;;;)describetheasymptoticgrowth,initialgrowthandrelativegrowthrate,respectively.Asevidence,werandomlychosefoursubjectsfromthestudysamples,ineachofwhichtheJenss-Bayleymodelisagoodttoheadcircumferencedata(Fig.2{2).Inthistypicalexample,therefore,Equation2-12isusedtodescribethemeanvalueofQTLgenotypelattimetasshownbyEquation2-4.

PAGE 31

Tomaximizethelikelihood,weusedtwodierentalgorithms,theMatlabfunc-tionfminsearchandRfunctionoptim().WetestedvedierentmethodsincludedintheRfunctionoptim().TheSANNmethodcouldnotconverge.TheL-BFGS-B,CGandBFGmethodsconvergedbutcouldnotgeneratepositivedeniteHessianmatrices.TheNelder-MeadmethodachievedthehighestmaximumlikelihoodamongtheconvergedmethodsandgeneratedapositivedeniteHessianmatrix.BoththeMatlabfunctionfminsearchandRfunctionoptim()haveconsistentlydetectedasignicantQTLwithmajoreectsondierentiationingrowthtrajec-toriesofheadcircumferenceandalsoprovidedsimilarestimatesoftheparametersandoftheirstandarderrors(Table2-1),buttheMatlabfunctionfminsearchused20minutesfewerthantheRfunctionoptim()tocompleteparameterestimatesforourexampleonaPCwith1.7GHzCPUand512MBRAM.Thelog-likelihoodratio(LR)teststatisticscalculatedfromtheMatlabfunctionfminsearchunderthefullmodel(thereisaQTL)andthereducedmodel(thereisnoQTL,i.e.,m2=m1=m0=m)is62.87,greaterthanthecriticalthresholdvalue(27.6)atthe0.01signicancelevel.ThecriticalvalueforclaimingtheexistenceofthisQTLwasdeterminedonthebasisofsimulationstudies.Inthisexample,theempiricalestimateofthecriticalvalueisobtainedfrom300simulations.LetAandadenotetheallelesthatcauseincreasedordecreasedheadcir-cumferencegrowth,respectively.Theincreasingalleleismorecommonthanthedecreasingallele,with^q=0:58foralleleAand1^q=0:42forallelea.ThecoecientofHardy-Weinbergdisequilibriumbetweenthetwoalleleswasestimatedas^d=0:036.ThehypothesistestindicatedthatthepopulationfromwhichoursamplesweredrawnisatHardy-Weinbergequilibriumbecause^disnotsignicantlydierentfromzerobasedonthelog-likelihoodtest.GivenconsiderabledierencesamongtheestimatesofthethreeQTLgenotypefrequencies,thisQTLshoulddisplayafairlyhighheterozygosityinthepopulation.Wecalculatedthethree

PAGE 32

Figure2{3:Threegrowthcurves(foreground)eachpresentingoneofthethreegroupsofgenotypesatthemajorQTLdetectedbyournonstationarymodel.ThedierentiationofgrowthcurvesshowstheeectoftheQTLongrowthtrajectories. genotypefrequenciesas0.37forthehomozygote(AA)comprisingoftheincreasedallele,0.21forthehomozygote(aa)comprisingofthedecreasedalleleand0.42fortheheterozygote(Aa).TheparametersdescribinggrowthcurvesofdierentQTLgenotypeswereestimatedwithreasonableprecision,withsamplingerrorsoftheMLEslessthanonetenthoftheestimatedvalues(Table2-1).However,theestimationprecisionvariesbetweenparameters.TheJenss-Bayleymodelcontainstwoparts,onereectedbyalinearregressiondescribedbyandandthesecondpartreectedbyanexponentialfunctiondescribedbyand.Ineachpart,theslopes,or,havealwayslargersamplingerrorsthantheintercepts,or.Ofthethreecovarianceparameters,thetransformationparameter()haspoorerestimationprecisionthanthecorrelation()andvariance(2).

PAGE 33

Table2{1:MLEsofpopulationandquantitativegeneticparametersforamajorQTLaectinggrowthtrajectoriesofheadcircumferencein145normalchildrenbasedontwodierentalgorithms,fminsearchandoptim().ThestandarderrorsoftheMLEsaregivenintheparentheses. ParametersGenotypeAAGenotypeAaGenotypeaa

PAGE 34

ThegrowthcurvesforthethreegenotypesatthedetectedQTLareshowninFigure2{3.ThedierentgrowthcurvesoftheQTLgenotypesshowthedynamicchangeofQTLexpressionovertime.ItappearsthattheQTLdisplaysanincreasedeectonheadgrowthfrombirthto1.5yearsoldandmaintainsaconsistentlargeeectonheadgrowththereafter(Fig.2{3).TheheterozygotecomprisingtheincreasinganddecreasingQTLallelesshowintermediategrowthrelativetothetwohomozygotes,suggestingthatthisQTLaectsgrowthtrajectoriesinanadditivegeneactionmode(Fig.2{3).ApproximateparallelismofthethreecurvesoftheQTLgenotypes(Fig.2{3)impliesthatthemajorgenehaslittleinteractionwithage.WeperformedaMonteCarlosimulationstudytoinvestigatethestatisticalpropertiesofparameterestimatesfromourmodel.WemimickedtheapplicationdescribedabovebysimulatingthreegrowthcurvesorQTLgenotypeseachfollowingadierentlogisticequation.Ourmodelwasthenusedtoestimatethesimulateddataandcalculatethecurveandmatrixparameters.Thesamesimulationschemewasrepeated400timestoestimatethetruesamplingerrorsoftheMLEsoftheparameters,whicharegiveninTable2-1.Alltheparametersincludinggenotypefrequencies,curveparametersandmatrixparametersarewellestimated.Theestimationprecisionbasedonthissimulationstudyisbroadlyinagreementwiththatfromthenumericalanalysis.2.5DiscussionInourpreviouspublications,aseriesofstatisticalmodelsforcharacterizingspecicquantitativetraitloci(QTL)regulatinggrowthprocesseshavebeendevelopedbyimplementinggrowthlawsderivedfromphysiologicalprinciples(Westetal.2001)orbuiltonthetnessofobservationaldata(Bertalany,1957).Althoughthesemodelshavebeenusedsuccessfullyforgenedetectionfrompracticallongitudinaldatainforesttrees(Wuetal.2003a)andHIV-1dynamics(Wuet

PAGE 35

al.2004c),theyarestilllimitedunlesssomeunderlyingassumptionsarerelaxed.Theseassumptions,includingequallyspacedtimeintervals,identicalmeasurementscheduleforallsubjectsandwithin-subjectstationarycorrelationstructure,areoftennotmetinpractice,especiallyinhumangeneticresearchwheresubjectsunderstudycannotbeexperimentallyunderfullcontrol.Thegoalofthisarticlewastoderiveamoregeneralmodelbyrelaxingeachoftheseassumptions,soasrowidentheapplicabilityofQTL-detectingmodels.Weusedanexampleforchildheadcircumferencegrowthtovalidatetheuse-fulnessofournewmodel.AQTLthataectsgrowthtrajectoriesofchildheadcircumferencefrombirthtoage7yearshasbeensuccessfullydetectedinasampleofnormalchildren(145)drawnfromapopulation,representativeofchildrendeliv-eredatourhospital.ThisQTLwasdetectedtoexertaconsistenteectonheadsizegrowthafteragetwoyears.Samuelsenetal.(2004)alsospeculatedgeneticcontroloverchildheadsizegrowthinaNorwegianpopulation.ThestandarderrorsofparameterestimatesinourexamplewereevaluatedfromtheFisherinformationmatrixderivedonthebasisofthesecondderivativeofthelikelihoodfunction.Theresultsarebroadlyconcordantwiththosefromanalternativeapproachbasedonsimulationstudiesthatmimicsthestructureoftheexampleused.Ournewmodelisrobustinthatitcanhandleanypatternoflongitudinaldata,includingmissingvaluesprovidedthatthesearemissingatrandom.Itsadditionalvalueliesinitsabilitytodescribenonstationarycovariancestructureswhileavoidingtheoverparameterizationoftheunstructuredmultivariatenormalmodel.Inthisstudy,ourderivationtakesadvantageofstationaryvarianceonthelog-scalethroughimplementingtheTBSmodel.Itispossiblethatnonstationarityinvariancestillexistsaftertransformation,inwhichcaseamoregeneralmodelisneededwhichallowsforboththewithin-subjectresidualvarianceandcorrelationto

PAGE 36

betimedependent(Nu~nez-AntonandZimmerman,2000;ZimmermanandNu~nez-Anton,2001).Insomeearlierwork,severalauthorshaveexploredthefeasibilityofmodellingnonstationaryvariance(Wolnger,1996;Nu~nez-Anton,1997;Zimmermanetal.1998).OurmodelisdevelopedtodetecttheQTLresponsibleforgrowthtrajectories.TheincorporationofmolecularmarkersintothismodelcandiscoverthepositionofaQTLonthegenome.AnadditionaltaskforsuchgeneticmappingistoderivetheconditionalprobabilitiesofunknownQTLgenotypesgivenknownmarkergenotypes.Forexperimentalcrossesorwellstructuredpedigrees,theseconditionalprobabilitiesthatcorrespondtomixtureproportionsinthenitemixturemodeldescribedbyEquation2-1areexpressedasafunctionoftherecombinationfractionbetweenthemarkersandputativeQTL.Ifamappingpopulationisrandomlydrawnfromanaturalpopulation,asusedinthisstudy,wecanimplementaclosed-formsolution,asderivedinLouetal.(2003),toestimatetheallelicfrequenciesofQTLallelesandthelinkagedisequilibriumbetweenthemarkersandQTL.TheestimationoftherecombinationfractionforacrossorpedigreesandthelinkagedisequilibriumforanaturalpopulationisarststeptowardpositioncloningQTLaectingacomplextrait.GivenmarkedevidencefortheexistenceofQTLforgrowthinchildheadcircumference,itwouldbeworthwhilegenotypingmolecularmarkersand,ultimately,identifyingtheunderlyingQTLhiddeninthispopulation.

PAGE 37

27

PAGE 38

comparethegeneticdierencebetweenagroupofsubjectswhoareexposedtoachemical(case)andasecondgroupwhoarenot(control).Bothgroupsaresampledfromanaturalpopulation.Statisticalmethodshavebeenavailabletostudygene-environmentinterac-tionsforbehaviorsordiseasesincase-controlstudies(Spinkaetal.2005).Butthesemethodshavenotconsideredthedevelopmentalchangeofatraitduringatimecourseandtheunderlyingbiologicalcontrolmechanismsthatcausesuchadevelopmentalchange.Inearlierwork(Wuetal.2002a),wehaveproposedastatisticalmodelforcharacterizinggenotypicsegregationatspecicgenesthatregulategrowthtrajectoriesinasinglepopulation.Thismodelisimplementedwithubiquitousgrowthequationsthatarederivedfromfundamentalbiologicalprinciples(Westetal.2001)andincorporatedbyarst-orderautoregressive[AR(1)]modelforstructuringtime-dependentcovariancematricesforlongitudinaltraits(Diggleetal.2002).TheadvantagesoftheAR(1)modelareitssimplestructureandeasymathematicalmanipulation,whereasitsdisadvantageincludespossibleviolationsofthetwounderlyingassumptions,stationaryvarianceandcorrelation.Wuetal.(2004a)appliedatransformationapproach,calledthetransform-both-sides(TBS)modelbyCarrollandRuppert(1984),tomodifyvarianceheteroscedasticity.Inarealexampleofaforesttree,thisTBS-basedmappingmodelcanincreasetheprecisionofparameterestimation,ascomparedtothemodelwithoutincorporatingtransformation(Wuetal.2004a).Inthisarticle,weextendWuetal.'s(2002a,2004a)modelstodetectdier-entialexpressionsofthesamegeneacrossenvironmentalchangesinacase-controlstudyandaccommodateamorecomplicatedsituationwithunequallyspacedmea-surementsandnonstationarycovariance.Specically,theextendedmodelcanbeusedtocharacterizewhetherandhowgeneticfactorsinteractwithenvironmentalagentstodetermineontogeneticdevelopment.Thismodelwasvalidatedbyareal

PAGE 39

exampleforacase-controlstudyofparentalcocaineexposure.Wefounddierentdegreesoftheassociationbetweenasegregatinggeneandthegrowthcurveofheadcircumferenceinyoungchildren,dependingonwhethersubjectsareprenatalco-caineexposed.Thisndingdemonstratestheeectofgene-environmentinteractiononcocaine-dependentgrowthtrajectories.3.2Methodology3.2.1TheFiniteMixtureModelFinitemixturemodelsareatypeofdensitymodelthatcomprisesanumberofcomponentfunctions,usuallyGaussian.Thesecomponentfunctionsarecombinedtoprovideamultimodaldensity.Gaussianmixturemodelscanbeemployedtomodelgenotypicsegregationofspecicgeneticfactorsthatdeterminegrowthcurves(Wuetal.2002a).Accordingtomixturemodels,eachcurvettedbyanitesetofmeasurementswithTtimepointsforsubjecti,arrayedbyy=(y(1);;y(T)),isassumedtohavearisenfromoneofaknownorunknownnumberofcomponents,eachcomponentbeingmodelledbyamultivariatenormaldistributiondensity.AssumingthatthereareJgenotypescontributingtothevariationamongdierentcurves,suchamixturemodelisexpressedasyp(yj$;';)=JXj=1$jfj(y;'j;); where=(1;;J)arethemixtureproportions(i.e.,genotypefrequencies)whichareconstrainedtobenon-negativeandsumtounity;'=('1;;'J)arethecomponent-(orgenotype)specicparameters,with'jbeingspecictocomponentj;andareparameterswhicharecommontoallcomponents.Thelikelihoodoflongitudinaldataywillbeconstructedonthebasisofthismixturemodel.

PAGE 40

=2Yk=1nkYi=1"3mXj=1$jfj(yik;'jk;k)# =2Yk=1Lk($;'k;kjyk); wherethemixtureproportions($j)arereectedbygenotypefrequencies.Equation3-2assumesthatallsubjectsareindependentintheirresponsestotimes,whereasEquation3-3assumesthatthetwogroupsareindependent,inwhichwecancalculatethelikelihoodfunctionLk($;'k;kjyk)separatelyforeachgroupandcombinethetwolikelihoodfunctionsthroughtheirmultiplication.Here,we

PAGE 41

assumethatthegenotypefrequenciesofthegenesareidenticalbetweenthetwogroups,althoughtheirgeneticeectsaregroup-dependent.Traditionalgeneticmethodscanbereadilyextendedtoaccommodatethemultivariatenatureoftime-dependenttraits.Forsubjectifromgroupk,itsmultivariatenormaldistribution,fj(yik;'jk;k),hasthemeanvectorthatcorrespondstocomponent-specicparameters,'k,fordierentgroups,speciedbyujik=(ujik(1);;ujik(Ti)):Ataparticulartimetiforsubjecti,therelationshipbetweentheobservationandgenotypicvaluecanbedescribedbyalinearregressionmodel,yik(ti)=JXj=1ikujk(ti)+eik(ti);whereikistheindicatorvariabledenotedas1ifgenotypejisconsideredforsubjectiand0otherwise,andeik(ti)istheresidualerrorthatisiidnormalwithmeanzeroandvariance2ik(ti).Theerrorsforsubjectifromgroupkattwodierenttimepoints,ti1andti2,arecorrelatedwithcovariancek(ti1;ti2).These(co)variancescomprisethe(TiTi)matrixikoffj(yik;'jk;k)whoseelementsarethecommonparametersofthemixturemodel(1).3.2.3ModellingtheMeanVectorandCovarianceMatrixTheestimationofthemeanvectorujkandthe(co)variancematrixikisstatisticallydicultbecauseitinvolvestoomanyunknownparametersgivenapossiblesamplesize.Also,suchdirectestimationdoesnottakeintoaccountthebiologicalprinciplesoftraitformationanddevelopment.Weincorporatethemathematicalaspectsoftraitformationintotheestimationprocessofthelikelihoodfunction(Eq.3-3).Themeanvalueofgenotypejforsubjectifromgroupkat

PAGE 42

timeticanbedescribedbyamathematicalfunctionexpressedasujk(ti)=G(ti:jk); wherejkisasetofparametersthatdeterminesjointlytheshapeoflongitudinalcurves.Thus,withEquation3-5,weonlyneedtoestimatethemathematicalparameters,ratherthanestimategenotypicvaluesateverytimepoint,inordertodetectgenotypicdierencesinatraitprocess.Thiscansignicantlyreducethenumberofunknownparameterstobeestimated,especiallywhenthenumberoftimepointsislarge.Thecomparisonsofjkamongthedierentgenotypesacrossdierentgroupscandeterminewhetherandhowthisputativegenedierentlyaectsgrowthtrajectoriesforthesetwogroups.Similarly,the(co)variancematrixcanbestructuredwithanappropriatemodel.Statisticalanalysisoflongitudinaldatahasestablishedanumberofstructuralmodelsthatcapturemostoftheinformationcontainedinthematrix(DavidianandGiltinan1995;Diggleetal.2002).Nu~nez-AntonandWoodworth(1994)proposedapowertransformationmodelforttingthecorrelationsasafunctionoftime.Byestimatingthepowerparameter,thisnonstationarymodelisregardedasmoregeneralcomparedtothestationaryAR(1)model.Nu~nez-AntonandWoodworth(1994)andNu~nez-AntonandZimmerman(2000)proposedageneralformofthe(co)variancematrix(i)forsubjectiwithTiobservationsattimes0<>:2(tij2tij1)=;6=02ln(tij2=tij1);=00j1j2Ti;0<<1; where2,andarethevariance,correlationbetweenaunittimeintervalandpower,respectively.Theseparametersneedtobeestimatedfromthedata,butshouldbedenotedbyk=(2k;k;k)dierentlyformatrixikingroupk.

PAGE 43

BasedonBarret's(1979)theoremthatleadstothewell-knowntridiagonalorJacobimatrix,Nu~nez-AntonandWoodworth(1994)derivedtheinverseofthe(co)variancematrixexpressedinEquation3-6.TheyalsoderivedaclosedformofthedeterminantbyapplyingtheCholeskyfactorizationtothecovariancematrix.Wecanincorporatetheclosedformsoftheinverseanddeterminantofthematrixintoamixture-basedgenedetectingmodel.3.2.4ComputationalAlgorithmsAsclassiedabove,theunknownparametersthatbuildupthelikelihoodfunction(Eq.3-3)includethecurveparameters,matrix-structuringparametersandthegenotypefrequenciesinthesampledrawnfromanaturalpopulation.Thersttwotypesofparametersarethequantitativegeneticparameters,arrayedbyQk=(fjkg3mj=1;k)fordierentgroups,whereasthelasttypeisthepopulationgeneticparameters,P.Alltheunknowns=fkg2k=1=(fQkg2k=1;P)canbeestimatedthroughdierentiatingthelog-likelihoodfunctionofEquation3-3withrespecttoeachunknown,settingthederivativeequaltozeroandsolvingthelog-likelihoodequations.ThisestimationprocesscanbeimplementedwiththeEMalgorithmasdescribedbelow.Thelog-likelihoodfunctionoflongitudinaldataforthetwogroupsbasedonEquation3-4isgivenbylogL(jy)=2Xk=1logLk(kjyk)=2Xk=1nkXi=1log3mXj=1[$jfj(yik;Qk;P)];

PAGE 44

withthederivativewithrespecttoanyelement`@ @`logL(jy)=2Xk=1nkXi=1(fj(yik;Qk;P)@$j @Qkfj(yik;Qk;P) @Qklogfj(yik;Qk;P)=2Xk=1nkXi=1jjik1 @Qklogfj(yik;Qk;P)wherewedenejjik=$jfj(yik;Qk) @`logL(jy)=0(3.8)togettheestimatesofintheMstep(Eq.3-8).Theestimatesarethenusedtoupdate,andtheprocessisrepeatedbetweenEquations3-7and3-8untilconvergence.Thevaluesatconvergencearethemaximumlikelihoodestimates(MLEs)of.TheiterativeexpressionofestimatingP,i.e.,genotypicfrequencies,isderivedasb$j=P2k=1(Pnki=1jjikPnki=1Pj06=j;3mj0jik)

PAGE 45

ButsinceNEC1isnotdenedascanbeseenfromabove,weareunabletocomparesituationsJ=1vs.J>1directlywithNECJ.Here,weuseageneralprocedurefortheonecomponentcasetodecidebetweenJ=andJ>1(Biernackietal.1999).LetJbethevaluethatminimizesNECJ(2JJsup)withJsupbeingan

PAGE 46

upperboundforthenumberofmixturecomponents.WechooseJifNECJ1,otherwisewedeclareonecomponentforthedata.3.3HypothesisTestsDierentfromtraditionalmappingapproaches,ourfunctionalmappingforlongitudinaltraitsallowsforthetestsofanumberofbiologicallymeaningfulhypotheses(Wuetal.2004a).Thesehypothesestestscanbeaglobaltestfortheexistenceofasignicantgene,alocaltestforthegeneticeectongrowthataparticulartimepoint,aregionaltestfortheoveralleectofgeneonaparticularperiodofgrowthprocess,oradierentialdevelopmenttestforthechangeofgeneticexpressionacrosstime.Thesetestsatdierentlevelscanbeformulatedtotesttheeectsofgenotypegroupinteractionontheshapeofgrowth.3.3.1GlobalTestTestingwhetheraspecicgeneexisttoaectdevelopmentaltrajectoriesisarststeptowardtheunderstandingofthegeneticarchitectureofgrowthanddevelopment.Thegeneticcontroloverentiregrowthprocessescanbetestedbyformulatingthefollowinghypotheses:8><>:H0:jkk;irrespectiveofj(k=1;2)H1:Notalltheseequalitiesabovehold(3.10)TheH0statesthattherearenogeneaectinggrowthtrajectoriesandthethreegenotypiccurvesineachgroupoverlap(thereducedmodel),whereastheH1proposesthatsuchagenedoexist(thefullmodel).TheteststatisticfortestingthehypothesesinEquation3-9iscalculatedasthelog-likelihoodratioofthereducedtothefullmodel:LR=2[logL(ejy)logL(bjy)];

PAGE 47

whereeandbdenotetheMLEsoftheunknownparametersunderH0andH1,respectively.Anempiricalapproachfordeterminingthecriticalthresholdisbasedonpermutationtests,asadvocatedbyChurchillandDoerge(1994).Byrepeatedlyshuingtherelationshipsbetweenmarkergenotypesandphenotypes,aseriesofthemaximumlog-likelihoodratiosarecalculated,fromthedistributionofwhichthecriticalthresholdisdetermined.3.3.2LocalTestThelocaltestcanexaminethesignicanceofthemaineectsofgeneanditsinteractionwithexposureongrowthtraitsmeasuredatatimepoint(t)ofinterest.Forexample,thehypothesisfortestingtheeectofgeneongrowthatagiventimetcanbeformulatedas8><>:H0:uj1(t)=u1(t);uj2(t)=u2(t);j=1;:::;3mH1:Notalltheequalitieshold(3.12)whichisequivalenttotestingthedierenceofthefullmodelwithnorestrictionandthereducedmodelwitharestrictionassetinthenullhypothesis.3.3.3RegionalTestSometimesweareinterestedintestingthedierenceoftraitprocessesinatimeintervalratherthansimplyatatimepoint.Thequestionofhowageneexertsitseectsonaperiodoftimecourse[t1;t2]canbetestedusingaregionaltestapproachbasedontheareas,Ajk=Zt2t1G(t:jk)dt:coveredbylongitudinalcurves.Thehypothesistestforthegeneticeectonaperiodoftraitprocessisequivalenttotestingthedierencebetweenthefullmodelwithnorestrictionandthereducedmodelwitharestriction.

PAGE 48

forgroup1,andj22;j=1;:::;3m; forgroup2.Therejectionofonlyoneofthetwonullhypotheses(Eq.3-13)and(Eq.3-14)detectedtheeectofenvironment-specicgenes.Ifboththenullhypothesesarerejected,thismeansthatthedetectedgene(s)simultaneouslyaectdevelopmentalcurvesintwodierentenvironments.Whethersuchso-calledpleiotropicgenes(i.e.,thosethatdisplayeectsonmultiplephenotypes)areenvironment-sensitivecanbefurthertested.Thiscanbedonebyformulatingthenullhypothesisj1j212;j=1;:::;3m;

PAGE 49

whichstatesthatthedierenceingenotypiclongitudinalcurvesbetweentwodier-entgroupsisidenticalacrossdierentgenotypes.Therejectionofthishypothesissuggeststhat,althoughthesamegene(s)aectdevelopmentaltrajectoriesindier-entenvironments,thereisasignicantgenotype-environmentinteractiontoshapelongitudinalcurves.3.4Application3.4.1ALongitudinalPrenatalCocaine-ExposureDatabaseAprospectivelongitudinalstudywasinitiatedtoinvestigatetheeectsofprenatalcocaineuseonneurodevelopmentaloutcomesoftheexposedchildren.Theserecruitedwomenwere>18yearsold,hadnomajorchronicillnessdiagnosedpriortopregnancy,didnotuseillicitsubstancesasidefromcocaineandmarijuana,andhadnochronicuseofprescriptionandover-the-counterdrugs.Researchersat-temptedtoenrollallwomendesignatedtodeliverinalocalcarecenter.Over2500womenwereapproachedand85%agreedtoparticipate.Fromthoseconsenting,154womenwereidentiedascocaineusersfromconrmedpositiveurinetoxicologyscreens(obtainedatstudyenrollmentanddelivery)oradmissionofuseinprivate,structuredinterviewsobtainedattheendofeachtrimesterofpregnancy.There-mainderofthosecensentingwereretainedaspotentialmatches.Thisexposedgroupwasmatchedforsocioeconomicstatus,race,parity,andlocationofprenatalcare(thatrelatedtoperinatalrisk)toacontrolgroupof154womenfromthematchpoolwhowerenotcocaineusers(Eyleretal.1998a,1998b).Thechildrenfrombothgroupshavebeenfollowedfrombirthtoageeight.Interviewersandcliniciansconductedabatteryofpsychological,neurological,andmedicaltestsateachofninetimepointsthatcorrespondedtothefollowingages:birth,onemonth,sixmonths,1year,eighteenmonths,2years,3years,5years,and7years.Measurementsofgrowthdataincludedheadcircumference,bodyweightandbodyheightatallninetimepoints.Inthisarticle,wefocusedoninvestigatingthe

PAGE 50

Figure3{1:Plotsofheadcircumferencevs.agesfortheexposed(A)andcontrol(B)groupseachcomposedof144children(inyellow)sampledfromanaturalpopu-lation.Ageistimeinyearssinceestimatedconception.AmeancurveineachgroupisttedusingEquation3-6.Notethatafewsubjectshaveonlyalimitednumberofmeasurementpoints. geneticcontributiontothegrowthofheadcircumference.Toaccuratelyreectthetimecourseofmeasures,weusedtheageindecimalyearswhentheheadcircumferencewasmeasured.Fortysixoutofthe2,232observationsweredeletedbecausetheywereobviousoutliers.3.4.2ResultsFigure3-1illustratestheplotsofheadcircumferenceagainstagesfrombirthtoage7yearsforallsubjects.Themeangrowthcurveisslightlylowerfortheexposed(Fig.3-1A)thancontrolgroup(Fig.3-1B),althoughvariationingrowthtrajectoriesamongsubjectsaresimilarbetweenthetwogroups.Statisticaltestssuggestthatheadcircumferencegrowthtrajectoriesforallsubjects,exceptforthoseinwhichtherearenotenoughrepeatedmeasurements,canbewellttedbyJenssandBayley's(1937)growthcurvethatcandescribegrowthduringchildhood,expressedasg(t)=a+btecdt;

PAGE 51

wherethegrowthparameterset(a;b;c;d)determinesjointlytheshapeofgrowthtrajectories.Smallerandmoreuniformvariationingrowthtrajectorieswasdetectedthroughoutheaddevelopmentafterthelogtransformationthanfortherawdata.Wewillthereforebaseouranalysisonthelog-transformeddatabyincorporatingthetransform-both-sides(TBS)modeladvocatedbyWuetal.(2004b).Thestatisticalmodelbasedontransformeddatawasusedtoidentifywhetherthereisaspecicgeneofmajoreectsthatcontrolheadcircumferencegrowthcurvesintheexposedandcontrolgroups.NECanalysisbyEquation3-9indicatesthataone-genemodel(withthreemixturecomponents)ismoreparsimoniousforexplaininggrowthdatathanatwo-genemodel.Basedonthelog-likelihoodratio(LR)test,amajorgenewasfoundtosignicantlyaectdierentiationingrowthtrajectoriesofheadcircumferenceinthesetwogroups.TheLRvaluecalculatedunderthefullmodel(thereisagene)andthereducedmodel(thereisnogene)was497.6,greaterthanthe=0:01criticalthresholdvalue(32.69)determinedonthebasisof100simulationreplications.Weexpressedthefrequenciesofthreegenotypesatthedetectedgeneintermsofallelefrequency(p)andHardy-Weinbergdisequilibrium(D)(LynchandWalsh1998).Specically,thesegenotypefrequenciesareexpressedas$1=p2+DforQQ,$2=2p(1p)2DforQqand$3=(1p)2+Dforqq,respectively.Theallelicfrequenciesatthedetectedgeneinthepooledexposedandcontrolgroupswereestimatedas0.3169foronealleleand0.6831forthesecondallele,withtheHWDcoecientof^D=0:0675.Thegrowthcurvesforthethreegenotypes,homozygotefortherarerallele(denotedasQQ),heterozygote(denotedasQq)andhomozygoteforthemorecommonallele(denotedasqq),atthedetectedgeneestimatedfromeachgroup(Table3-1)weredrawntoshowhowtheyareexpressedoverage(Fig.3-2).Thereweremarkeddierencesamongsixdierentgrowthcurves,validatingtheexistenceofthisgeneresponsiblefordevelopmentaltrajectoriesofheadsize.

PAGE 52

Afurtherindividualhypothesistestwasperformedtoexaminewhetherthisgeneisimportantineachofthetwogroups.TheLRvaluesfortheexposed(89.1,P<0:01)andcontrol(73.4,P<0:01)groupssuggesttheoccurrenceofthisgeneinbothgroups.Thegenotype(QQ)thatislessfrequentthantheothertwogenotypesmaintainsthefastestgrowththroughoutthegrowthprocessinthecontrolgroup(Table3-1;Fig.3-2).Theheterozygotedisplayssmallergrowththaneachofthetwohomozygoteinthecontrolgroup,suggestingthatthisgeneisoverdominantinadirectiontowarddecreasingheadgrowthinnon-cocaine-exposedchildren.Thegeneticexpressionofthismajorgeneshowsacompletelydierentdirectionintheexposedgroup,inwhichthehomozygotefortherareralleleexpressesleastduringtheentiregrowthperiodofmeasurement(Fig.3-2).Theheterozygotegrowsbetterthanthetwohomozygotes,demonstratingastrongoverdominantgeneactionmode.Surprisingly,theheterozygoteperformsbetterincocaine-exposedchildrenthantheunexposedcontrolchildren.Morestudies,e.g.,throughgeneticmappingbasedonmolecularmarkers,areneededtovalidatetheeectofthisgene.Weusedthehypothesistest(Eq.3-15)toidentifyhowthisgeneinteractswithprenatalcocaineexposuretodetermineoverallgrowthcurves.Becausethesamegenotypeintwodierentgroupsdonotoverlap,theareaundercurve(A)isanappropriatecriterionforthegenotypegroupinteractiontest,expressedasA=ZT0a+btecdtdt=aT+1 2bT2+1 (3.17) i.e.,thedierencebetweentheareasundercurvesfordierentgroupsissetequalforthethreemajorgenotypes.TheLRvalueforthistestis87.5(P<0:01),

PAGE 53

Table3{1:Parameterestimatesofamajorgeneaectinggrowthtrajectoriesofheadcircumferencein144exposedand144controlchildren ParametersGenotypeQQGenotypeQqGenotypeqq suggestingthatthegeneticeectofthisgeneisexpresseddierentlybetweentheexposedandcontrolgroupsand,thus,thisdetectedgeneisenvironment-sensitive.Dierenteectsofthegeneongrowthcurvesbetweenthetwogroupsaresummarizedbelow.First,overallgeneticeectsappeartobesimilarbetweentheexposedandcontrolgroup(Fig.3-2).However,asshownbycurvedierences,thelargestgeneticeectsoccuratagesonethroughthreeyearsintheexposedgroup,whereasonlyafterthisperiodisgeneticdierentiationobservedinthecontrolgroup.Second,thesamehomozygoticgenotypealwaysdisplaysmoreretardedgrowthintheexposedthancontrolgroup,especiallyforonecomposedoftherarerallele.Thehomozygotefortherareralleleisaectedtothelargestextentwhenchildrenwereexposedtococaine.

PAGE 54

Figure3{2:Threegrowthcurveseachpresentingthreegroupsofgenotypes,QQ(solidcurves),Qq(dotcurves)andqq(brokencurves),intheexposed(red)andcontrol(blue)childrenatthegene,detectedbyourjointmodel.Ageistimeinyearssinceestimatedconception.

PAGE 55

oneoftheimportantmechanismsfortheorganismtowelladapttoenvironmentalchanges(Wu1998).Theincorporationofallelicsensitivityintoourstatisticalmodelmakesitbiologicallyrelevantandpotent.Ourmodelwasusedtoanalyzelongitudinalgrowthdatafromawelldesignedstudyincludingcocaine-exposed(case)andunexposed(control)groupsofchildren.Thepubliclyinterestingquestionofwhetherprenatalexposuretococaineaectschilddevelopmenthaslongbeendebatedinmedicalandbehavioralsciences(Zuckermanetal.2002).Inanextensiveliteraturesurvey,Franketal.(2001)didnotndconvincingevidencethatprenatalcocaineexposureisassociatedwithdevelopmentaldelaysinearlychildhood.Theysuggestedthatobservedimpactsofcocaineexposuremaybeduetoothercocaine-unrelatedfactorsincludingthequalityofthechild'senvironment.However,inotherstudies,thesefactorswereconsideredasacauseforthepaucityofcocaine-relatedndings(TronickandBeeghly1999).Morerecentworktendstosupportthenegativeassociationbetweenchildren'sdevelopmentaltestscoresandprenatalexposuretococaine(Lester1999;Singeretal.2002;Bandstraetal.2004).Kolbetal.(2003)usedratsasamodelsystemtodetectthechangeofspecicbraincellsinresponsetostimulantdrugssuchasamphetamineorcocaineandthepersistentbehavioralandcognitivedecitsassociatedwithdrugabuse.Therearetwoapproachesforresolvingthecurrentdebateregardingcocaineexposure.First,morecarefully-designed,large-scale,prospectivelongitudinalstudiesareneededinwhichmanyconfoundingriskordemographicfactorscanbecontrolled.Second,theinvestigationofacocaine-relatedresponsealsoshouldbebasedonamechanisticunderstandingofitsgeneticcontrol.Theresponsetoprenatalcocaineexposureisbiologicalandundoubtedlyinvolvessomegeneticcomponent.Dierentpopulationsanddierentindividualswithinthesamepopulationmayresponddierentlytococaineexposurebecausetheycarrydierent

PAGE 56

geneticblueprints.Andreticetal.(1999)haveidentiedbiologicalclockgeneslinkedtococainesensitizationinfruities.Wellbeyondtheresolutionofthedebate,thecharacterizationofspecicgenesthatregulatetheeectofcocaineonchildren'scognitiveandmotordevelopmentcanhelptoprovidescienticguidanceforpreventingandcontrollingtheeectsofprenatalcocaineexposure.Ourmodelhastremendouspowertodetectmajorgeneshiddeninthisdatasetthatalterheadsizegrowthpatternduringchildhood.Itisparticularlysuitableforastudyinwhichmeasurementsarecollectedathighlyirregularandsubject-specictimepoints,asisthecaseforalmostallhuman-relatedstudies.Wefoundthatthesamegenotypesatthedetectedgenedisplayslowergrowththroughoutontogenyintheexposedthancontrolgroup.Inparticular,thehomozygoticgenotypeforthemorecommonalleleismostaectedbyprenatalcocaineexposurethroughouttheentiregrowthprocessinearlychildhood.Ourmodelhasadditionalpowertodetectwhetherandhowthedetectedgenetriggersaneectonotherbiologicallyandclinicallyimportantvariables,suchasthetimingofmaximalgrowthrate,thetimingofsexualmaturityanddevelopmentaltestscores.Specicgenesthatareassociatedwithgrowthretardationduetococaineexposurehavealreadybeendetectedintheanimalmodelsystems,includingfruitiesandrats(Kolbetal.2003).Althoughthereisconvincingevidencefortheexistenceofacocaine-specicgenefromourpracticaldataset,wehavenoideaaboutthegenomicpositionofthisgene.Weareevenunclearwhetherthisdetectedgeneactsasawholeorcontainsasuiteofmanygenesofsmalleectsthatacttogethertoaectresponsetococaineexposure.Theseuncertaintiescanbesolvedwithmolecularmarkers(LanderandBotstein1989).Themodelreportedherecanbereadilyextendedtoincludemolecularmarkerstoassociatewithdynamicchangesofaphenotypictrait(seeMaetal.2002;Wuetal.2004a).Itwouldbeworthwhiletogenotypemarkers

PAGE 57

fromthesetwococaine-exposedandunexposedgroups.Usingthesemarkers,ourgeneticmodelsbasedonlinkagedisequilibriumanalysis(Louetal.2003)canbeusedtodetecttheunderlyingquantitativetraitlociforcocaine-inducedalternationingrowthtrajectories.Inthisparticularcocainestudy,earlygrowthduringchildhoodmaynotonlybecontrolledbythechild'sowngenome,butalsobythegenomeoftheirmotherandfather.Inheritanceofbothparentalgenomesisessentialfornormalgrowthanddevelopment,becauseofimprintedgenesthatareexpressedonlyfromeitherthematernalorpaternalchromosome.Imprintedgeneshavebeendocumentedinmanymammals(Cattanachetal.1985).Thegeneticmappingofimprintedgenesthatplayacentralroleinshapingdevelopmentaltrajectoriesincocaine-exposedchildrencanbedetectedwithanewstatisticalmethodincorporatingmaternal-ospringinteractions.Inaddition,thecurrentmodelassumesthatsamplesindierentgroupsaredrawnrandomlyfromanaturalunstructuredpopulation.However,thisassumptionmaybeviolatedinpracticebecausethepeoplesampledmayhavedierentancestrybackground.Muchresearchhasbeenlaunchedintotheanalysisofpopulationstratication(Pritchardetal.2000).Ourcurrentmodel,alongwiththepopulationstructuringtheoryandlinkagedisequilibriumanalysis(Louetal.2003),willlayafoundationforthederivationofamorecomprehensivemodelthatshouldshedmorelightonchildgrowthanddevelopment.

PAGE 58

48

PAGE 59

Asatypeoftimeseriesdata,longitudinaltraitsexhibitastrongautocorrela-tionbetweensuccessivetimepoints.Structuringsuchatime-dependentcovariancematrixbyastationaryornonstationaryapproachcanincreasethemodel'sexi-bility,stabilityandstatisticalpowertodetectQTL.Theapproachesformodellingthecovariancestructureinfunctionalmappinghavebeenbasedonautoregressive(Maetal.2002)orantedependencemodels(Zhaoetal.2005a,2005b).Inallsuchmodellingwork,repeatedmeasurementsareassumedtobeindependentamongdierentsubjectsand,thus,onlywithin-subjectcovariancestructureshavebeenconsidered.Inageneralsettingoflongitudinaldataanalysis,threecomponentsofrandomvariabilityinthemodellingprocessshouldbedistinguished,thatis,therandomeectsthatstemsfromheterogeneitybetweenindividualproles,serialcor-relationbetweenobservationswithinsamplingunitandmeasurementerror(Diggleetal.2002).Thus,althoughcurrentapproachesforapproximatingacovariancestructuremerelybasedonserialcorrelationsarethoughttobeparsimonious,theyhaveseriouslimitationsthatwillpreventawideimplicationoffunctionalmapping.Theselimitationsareshowninthefollowingaspects:First,themathematicalparametersforindividuallongitudinalcurveswiththesameQTLgenotypemaynotbeindependentamongsubjects.Theignoranceofamong-subjectdependenceforthecurveparameterswouldoverestimatethegeneticeectofQTLonlongitudinaltrajectories.Todrawvalidscienticinferencesforlongitudinaldata,randomeectsthatcaptureheterogeneityamongsubjectsshouldbeconsidered,inaconjecturewithdirectmodellingofthewithinsubjectcorrela-tion(ChiandReinsel1989;Schabenberger1995).Second,thecurveparametersmaybeaectedbyanarrayofbiologicalordemographiccovariates,suchasage,sex,raceandbodyweight.Forthosebiologicalcovariates,itispossiblethattheyareunderthecontrolofgeneticsystemsthatarethesameasordierentfromthose

PAGE 60

forthelongitudinaltraitsunderconsideration.Testingthedierenceofgeneticcon-trolfordierenttraitsorprocessespresentsaninterestingandchallenginggeneticissue.Inthischapter,nonlinearmixed-eects(NLME)models,orhierarchicalnonlinearmodels,willbeincorporatedintothecontextoffunctionalmappingbasedonamixturemodel,aimedtocircumventtheabove-mentionedlimitationsofcurrentfunctionalmappingstrategies.Sincetheirrstemergenceintheearly1980s(BealandSheiner1982;reviewedinDavidianandGiltiman1995),NLMEmodelshavequicklybecomeapopularstatisticalmethodforstudyinglongitudinaldata(LindstromandBates1990)andtheirextensionsandmodicationstobettersuitnewchallengeshavecontinuouslyappeared(Voneshetal.2002;Wu2004).ThemajoradvantageofNLMEmodelsisintheircapacityandexibilitytomodelvariousstructuresofcovariancematrices.Also,theydisplayauniqueabilitytoaccommodateageneralintraindividualcovariancestructureforunbalanceddatawheremeasurementsaresparseforsomesubjectsanddierentsubjectsreceivedierentmeasurementpatterns.ApplicationofNLMEmodelsisapromisingapproachforimprovingparameterestimationandvalidinferencesoflongitudinaldata.Whilemixedeectsmodelsarewidelyusedinmanyotherareassuchasinhealthsciencesincludingpharmacoki-neticsandHIVdynamics,theyarenotyetdevelopedinanygeneticmappingstudy.Accordingly,thepurposeofthisworkistodevelopNLMEmodelsforestimationoftheontogeneticcontrolofcomplexlongitudinaltraits.IwillintegrateNLMEmodelswiththetheoryoffunctionalmappingwithinamixturemodelframework.IwilldescribetheprincipleofNLMEmodelsbystartingwithasimplerlinearmixedeectsmodel.

PAGE 61

whereZiisa(iq)subject-specicdesignmatrix,iisa(q1)vectorofsubject-specicunknownregressioncoecientsandiisa(i1)errorterm,andisusuallyassumedtohaveanormaldistributionwithmeanvector0andcovariancematrix.Notethatisa(ii)serialcovariancematrix,whichcanbestructuredbyasetofparameters(Diggleetal.2002).Stage2:i=Ki+bi whereKiisa(qp)vectorofcovariates,isa(p1)vectorofunknownpopulationparametersandbiisassumedtohaveanormaldistributionwithmeanvector0and(qq)covariancematrixD.Elementsinbiareindependentofeachotherandofthei.Thisstagecapturestheinter-individualsystematicandrandomvariation.

PAGE 62

Indeed,thetwostagemodelcanbecombinedinagenerallinearmixed-eectsmodel,expressedas8>>>><>>>>:yi=Xi+Zibi+i;biN(0;D);iN(0;i):(4.3)whereXi=ZiKiisthedesignmatrixthatlinksxedeectstoyiandbiissubject-specicrandomeects.Itcanbeshownthatconditionallyonbi,yihasanormaldistributionwithmeanXiandcovariancematrixVi=i+ZiDZTi.ThemarginallikelihoodfunctionofthenprogenyisthengivenbyL(yi;;)=nYi=1(2)i 2exp1 2(yiXi)TV1i()(yiXi)(4.4)whereisthesetofunknownparametersthatareusedtomodelthestructureofVi.AssumingthatViisknown,thegeneralizedleastsquareestimatorsofandbibymaximizingthelikelihoodfunctioncanbederived(LairdandWare1982;Harville1976).Themaximumlikelihoodestimator(MLE)ofanditsstandarderrorcanbederivedas^=(nXi=1XTiV1iXi)1nXi=1XTiV1iyi; var(^)=nXi=1XTiV1iXi!1 AnintuitiveapproachtotheinferenceofbiisbasedonBayesiantechniques.Theposteriordistributionbigivenyicanbewrittenasf(bijyi)=f(yijbi)f(bi)

PAGE 63

Thus,bicanbeestimatedbytheposteriormean^bi(;)=Zbif(bijyi)dbi=DZTiV1()(yiXi): Thestandarderrorof^biisthengivenas:var(^bi)=DZTi8<:V1iV1iXinXi=1XTiV1iXi!1XTiV1i9=;ZiD:(4.9)Itcanbeshownthat^biisalsotheweightedleastsquareestimatorsandBLUPforbi.Therefore,theestimates^and^bijointlymaximizetheposteriorfunction(LindstromandBates1992)f(;bijyi)/exp1 2(yiXiZibi)T1i(yiXiZibi)1 2bTiD1bi: ForunknowncovariancematricesViandi,theestimateofcanbeusedandthus^Viand^willreplaceViandiinEquations4-5and4-8,respectively.Therefore,theestimators^and^bidependonmatrix-structuringparameters^.Estimatingthestructureofacovariancematrixbyparameterswithintheframeworkofthelinearmixed-eectsmodelsarewelladdressedinstatisticalliteratures.Maximumlikelihood(ML)estimatesorrestrictedMLestimates(REML)canbeobtainedthroughbuilt-inproceduresfrommanysoftwaresuchasSAS,RandbyusingdierentkindsofalgorithmssuchasNewton-RaphsonandEM.Thesetechniquesareprovedtobecomputationecientandprecise,andmostimportantofall,theycanbeeasilyextendedtothenonlinearmixed-eectsmodels.4.3NonlinearMixed-EectsModelInthissection,Iwillapplythetechniquesderivedforthelinearmixed-eectstothenonlinearmixed-eectsmodels.Fortheresponsevectoryi=(yi1;:::;yii)0

PAGE 64

measuredatdierenttimepointsti(ti=1;:::;i),ageneralnonlinearmixed-eectmultivariatemodelisdenedinthefollowingtwostages(LindstromandBates1990;DavidianandGiltinan1995):Stage1.Fortheithprogeny,yi=i(i;ti)+i whereisanonlinearfunctionofiandti.Alltheothervariablesaresimilartothesettingsoflinearmixed-eectsmodels.Stage2.Fortheintra-individualvariation,i=Ai+Bibi whereisa(p1)vectoroftheunknownpopulationparameters,biisa(k1)vectoroftherandomeectsandhasanormaldistributionwithmeanvector0and(kk)covariancematrixD,andAiandBiaredesignmatricesofsizeqpandqkforandbi,respectively.Thisstagecapturestheinter-individualsystematicandrandomvariation.ThismodelinageneralformcanhandleanykindsofnonlinearfunctionandthedesignmatricesAiandBicanvaryfordierentgroups,covariatesorevenfordierentsubjects.Inthischapter,thesematricesaresimpliedasI.Themarginaldistributionoftheithprogenyisthengivenas(DavidianandGiltinan1995):Zfyijbi(yij;bi;)fbi(bijD)dbi wherefyijbi(yij;bi;)=1 (2)i=2jij1=2exp1 2(yi(+bi))T1i(yi(+bi))

PAGE 65

andfbi(bijD)=1 (2)i=2jDj1=2exp1 2bTiD1bi: Similartothelinearmixed-eectsmodels,theestimates^and^biwilljointlymaximizetheposteriordistributionfunctionas:f(;bijyi)/exp1 2[yi(+bi)]T1i[yi(+bi)]1 2bTiD1bi:(4.16)Inferenceoftheunknownparametersbasedonmaximizingthisdistributionfunctionwillbedicult.Therearecertainlimitationstomaximizethisdistributionfunction.Theexpectationfunctionisnotlinearintheunknownparameters;bi.Mostwell-developedlineartechniquescannotbeapplieddirectlytothenonlinearmodels.Therandomeectsbicannotbeeasilyintegratedoutduetothecomplexityofthenonlinearmodel.Noclosedformcanbeobtainedforthemaximumlikelihoodestimates.Therearesomealternativeapproachesaddressedinstatisticalliteraturesthatincludenumericalevaluationoftheintegral(DavidianandGiltinan2003)orapproximationstothenonlinearlikelihoodfunction(LindstormandBates1990;TierneyandKadane1986;Geweke1989andWolnger1993).Inthischapter,wewillusetwolinearizationapproximationmethods:oneproposedbyLindstromandBates(1990)andtheotherproposedbyBealandSheiner(1982).Bothofthemeth-odsusetherstorderTaylorexpansiontoapproximatethenonlinearexpectationfunction.TheLindstromandBatesmethodapproximatestheexpectationfunctionaroundthecurrentestimatesofandbi,whiletheBealandSheinermethodexpandstheTaylorseriesonlyaboutthemeanofE(bi)=0.4.3.1LindstromandBates'ApproximationLindstromandBates(1990)proposedalinearmixed-eects(LME)approxima-tionstepbasedontherst-orderTaylorexpansionsaroundthecurrentestimates

PAGE 66

ateachiterativestep.PinheiroandBates(1995)showedthatthisapproximationisquitecomputationallyecientandaccurate.Let(^(!);^b(!)i;^(!))bethecurrentestimatesof(;bi;).TherstorderTaylorexpansionof(;bi;)isexpressedas(;bi;)(^(!);^b(!)i;^(!))+^Xi(^(!))+^Zi(bi^b(!)i) (4.17) where^Xi=^Xi(^(!);^b(!)i;^(!))=@ @Tj^(!);^b(!)i ^Zi=^Zi(^(!);^b(!)i;^(!))=@ @bTij^(!);^b(!)i: Let~yi=yii(^(!);^b(!)i;^(!))+^Xi^(!)+^Zi^b(!)i.Itisstraightforwardtoshowthat~yihasanapproximatemarginalnormaldistributionwithmeanvector^Xiandcovariancematrix^Vi=i+^ZiD^ZTi.UndertheLMEapproximation,theapproximateposteriordistributionfunctionofbigivenyiis:f(;bijyi)/exp1 2(~yi^Xi)T1i(~yi^Xi)1 2bTiD1bi:(4.20)Similartothelinearmixed-eectsmodels,theestimates^and^biwilljointlymaximizetheposteriorfunction.Theresultsoflinear-mixedeectsmodelsgivenbyLaidandWare(1982)canbeappliedas:^b(!+1)i=^D(!)Z(!)Ti^V1(!)i(~yi^X(!)i^(!)); ^(!+1)=nXi=1^X(!)Ti^V1(!)i^X(!)i!1nXi=1^X(!)Ti^V1(!)i~yi: Notethat^biand^idependonthecurrentestimates^(!)l,^b(!)iand^(!)andthersttermsofTaylorexpansionareassumedtobeexact.UnderthisLMEapproxi-matestep,allthetechniquesareinthelinearmixed-eectsmodelframework.Thisapproximationreducesthecomputation.

PAGE 67

Toestimate,weneedtomaximizethemarginallog-likelihoodfunctionL(;jy)=1 2nXi=1i+^ZiD^ZTi1 2nXi=1~yi^XiTi+^ZiD^ZTi1~yi^Xi: Maximumlikelihood(ML)estimatesorrestrictedMLestimates(REML)ofcanbeobtainedthroughNewton-Raphson,orEMalgorithms.Anupdatedestimateofcanalsobeobtainedbymaximizingthisobjectivefunction.Atwo-stepiterationisconstructed.Intherststep,theleastsquareestima-tors^biand^arecalculatedbymaximizingtheapproximateposteriorlikelihoodfunctionand,inthesecondLMEstep,^and^areobtainedbymaximizingtheapproximatelikelihoodfunction.Thesetwostepsareiterateduntilacertainconvergencecriterionismet.4.3.2BealandSheiner'sApproximationBealandSheiner(1982)proposedalinearizationapproximationtothenon-linearmixedeectsmodelsaroundthemeanvalueofE(bi)=0insteadofthecurrentestimateofbiundertheLMEapproximationmodel.TherstorderTaylorexpansionsofyi(;bi)aboutE(bi)=0canbewrittenasyi(;bi)yi(;0)+^Zi(;0)bi(4.24)where^Zi(;0)=^Zi()=@ @bTij^;bi=0:(4.25)DierentfromtheLMEmethod,thereisnoposteriordistributionfunctionofbigivenyibecausebiisalwayssetto0.^Zi(;0)dependsonlyon,butnoton^(!)and^b(!)i.Inthiscase,(;0)isnonlinearin.

PAGE 68

Themarginaldistributionofandgivenyisapproximatenormalandthelog-likelihoodfunctioncanbeexpressedasL(;jy)=1 2nXi=1logji+^ZiD^ZTij1 2nXi=1n[yi(;0)]T(i+^ZiD^ZTi)1[yi(;0)]o Thismethoddoesnotneedtheintegrationof^bi.ThecomputationburdenintherststepoftheLMEmethodisreduced.Therefore,underBealandSheiner's(1982)model,onejustneedstomaximizethemarginallikelihoodfunctionofandgiveny.Thiscanbedonewithinonealgorithm.Forexample,usingthesimplexoptimizationmethod,wecanestimateandsimultaneouslybymaximizingthemarginallog-likelihoodfunction.Thismethodcouldresultinpoorestimatescomparedtotheapproximationaroundthecurrentestimateofbi(LindstromandBates1990).Thecomputationeciency,however,willdrawinterestsinanalyzinglargedatasets.4.4LikelihoodforFunctionalMappingThestatisticalfoundationforQTLmappingwithmolecularmarkersisanitemixturemodel.Accordingtothemixturemodel,thetraitvalueofanindividualisassumedtohavearisenfromoneandonlyoneofaknownorunknownnumberofmixturecomponents,eachcomponentwitharelativeproportionandbeingmodelledbyanormaldistributiondensity.Inamappingpopulation,weassumethatthereareLQTLgenotypeseachcorrespondingtoamixturecomponent.Letyibethetime-dependentmeasurementsforalongitudinaltrait,whosedistributionisexpressed,withinthemixturemodelcontext,asyi/f(yij$;;bi;)=$1f1(yij1;bi1;)++$LfL(yijL;biL;);

PAGE 69

where$=($1;:::;$L)0aretheQTLgenotypefrequencieswhichareconstrainedtobenonnegativeandsumtounity;=(1;:::;L)0andbi=(bi1;:::;biL)0arethecomponent(orQTLgenotype)specicparameters,withlandbilbeingspe-cictoQTLgenotypel;,whichisthecommonparameterstoallQTLgenotypes,isthesetofunknownparametersthatconstructDandi,andtheparametricfamily(fl)forthelthQTLgenotypeisdenedasfl(yijl;;bil)=1 (2) 2expn1 2[yii(l+bil)]T1i[yii(l+bil)]o: Here,weassume(bi1;:::;biL)iidN(0;D)andtheerrortermofeachcomponentiidN(0;i).Thus,allthecomponentssharethesamecovariancematricesDandi.Inarealanalysis,theapproximationmodelinsteadoftheexactlikelihoodmodelwillbeused.ThemixtureproportionorQTLgenotypefrequency$ldependsonthetypeofmappingpopulation,suchasthebackcrossorF2.ThefrequenciesofQTLgenotypescanbeinferredbyobservedmarkergenotypesbecausemarkersandQTLareassumedtobeco-segregatinginthemappingpopulation.AssumethataputativeQTLislocatedbetweentwoankingmarkersthatbrackettheQTL.Thus,themixtureproportionscanbeexpressedastheconditionalprobabilitiesofQTLgenotypesgivenaankingmarkergenotypeand,accordingly,$lshouldbewrittenas$lji,interpretedastheconditionalprobabilityofQTLgenotypelgiventheankingmarkergenotypeofprogenyi.TheconditionalprobabilitycanbederivedintermsofrecombinationfractionsbetweentheQTLandeachofthetwomarkersandbetweenthetwomarkers.Thelikelihoodofunknownparametersgiventhelongitudinalmeasurementsandmarkerinformation(M)foramappingpopulationofsizenisformulated,in

PAGE 70

termsofamixturemodel,asL($;;bi;jyi;M)=nYi=1$1jif1(yij1;bi1;)++$LjifL(yijL;biL;): Severalalgorithmshavebeendevelopedtoestimatetheunknownparameters($;;bi;)containedinthelikelihood(Eq.4-29).4.5EstimationandComputationalAlgorithmInthissection,IwillproposetheEMalgorithmtoobtainthemaximumlikelihoodestimates(MLEs)oftheunknownparameters($;;bi;)thatdescribethegenomiclocationofQTL($),QTLgenotype-speciccurveparameters(;bi)andthecovariance-structuringparameters().Inpractice,theQTLlocationcanbetreatedasaconstantbecauseaputativeQTLcanbesearchedatevery1or2cMonanintervaloftwoankingmakersthroughouttheentirelinkagegroup.Thelog-likelihoodratiotestscoresareplottedagainstthelinkagemapdistance.Thelinkagemappositioncorrespondingtoapeakofthelog-likelihoodratioplotwillbedeterminedastheMLEoftheQTLlocation.Thetwoapproximatemethods,LindstromandBates'LMEandBealandSheiner's,willbeusedtodealwiththenonlinearexpectationfunction(;bi;).ThesetwomethodsareincorporatedintotheframeworkoftheEMalgorithm.4.5.1LindstromandBates'LMEMethodOneachscanninglocationofaQTL,themixturelikelihoodwillonlydependonl,biland.FortheLMEapproximationmodel,itwilldependonthecur-rentestimationofthoseparametersateachstepintheiterationloop.Ifnormalconditionsaresatised,themarginaldistributionofyiisexpressedasZLXl=1$lfl(yijl;bil;)!f(bi1;:::;biL)TjDd(bi1;:::;biL)T=LXl=1$lZfl(yijl;bil;))f(biljD)dbi1;

PAGE 71

wherebilisintegratedforeachcomponent.Now,weproposeamodiedLME-simplexhybridalgorithmintwostepstoestimatetheunknownparametersfornonlinearcurvesofeachQTLgenotypeandtheparametersthatstructurethecovariancematrix:Estep:Fortheithprogeny,theposteriordistributionoflthQTLgenotypeisf(l;biljyi)/expn1 2(~yil^Xill)T1i(~yil^Xill)1 2bTilD1bilo;(4.31)where~yil=yii(^(!)l;^b(!)il;^(!))+^Xil^(!)+^Zil^b(!)il.Here,~yiland^Xilnotonlydependonthecurrentestimateof,butalsoonthecurrentestimatesoflandbilfortheltheQTLgenotypeinthemixturemodel.Theposteriorestimator^bilisgivenas^b(!+1)il=^D(!)Z(!)Til^V1(!)il(~yi^X(!)i^(!));(4.32)where^bilalsodependsonthecurrentestimates^(!)l;^b(!)iland^(!).Dierentfromthenormalnonlinearmixed-eectsmodels,lfortheltheQTLgenotypecannotbedirectlyestimatedatthisstepduetothecharacteristicsofamixturemodel.TheywillbeestimatedwiththecovariancematrixparametersintheMstep.Mstep:EstimateQTLgenotype-speciccurveparameterslandcovariancematrixparametersbymaximizingthemixturelikelihoodfunctionusingthesimplexalgorithmimplementedwiththeMatlabfunctionfminsearchbasedonthemodiedsimplexalgorithm,adirectsearchmethodfornonlinearunconstrainedoptimization(Lagariusetal.1998).Theapproximatemixturelikelihoodmodelisnowwrittenasf(;j~yi)=$1f1(1;j~yi)++$Lfl(L;j~yi) (4.33) wherefl(l;j~yi)=1 (2)=2j^Vij1=2exp1 2f~yi^XillgT^V1if~yi^Xillg:

PAGE 72

Notethat^biliscalculatedattheEstep,whereaslandareestimatedsimultane-ouslybysearchingthemaximumvalueofthelikelihoodfunction.4.5.2BealandSheiner'sMethodBealandSheiner's(1982)methodissimplebecausenoEMalgorithmisneededduetobi=0.Thesimplexalgorithmisdirectlyusedtosimultaneouslyestimatelandbymaximizingtheapproximatemixturelog-likelihoodfunction,denedasf(;jyi)=nXi=1log[$1f1(1;jyi)++$Lfl(L;jyi)] (4.35) wherefl(l;jyi)=1 (2) 2expn1 2[yi(l;0)]T^V1i[yi(l;0)]o:

PAGE 73

Todeterminethesignicanceofthelikelihoodratiotest,weusethecriticalthresh-oldgeneratedbypermutationtests(ChurchillandDoerge1994).Byrepeatedlyshuingtherelationshipsbetweenmarkergenotypesandphenotypes,aseriesofthemaximumlog-likelihoodratiosarecalculated,fromthedistributionofwhichthecriticalthresholdisobtained.TheLRstatisticisplottedagainsttestlocationsforallthelinkagegroups.AlocationwhichhasahighpeakofLRisconsideredcorrespondingtothepositionofQTL.Inaddition,thehypothesistestforthetimeatwhichthedetectedQTLturnsonoroitseectonlongitudinaltrajectoriescanbeperformed,bycomparingthedierenceoftheexpectedmeansbetweendierentgenotypesatvarioustimepoints.Withinthefunctionalmappingframework,theeectoftheQTLonaperiodoftimecourseanditsinteractionwithagecanalsobetested(seeWuetal.2004a).4.7AWorkedExample4.7.1MappingPopulationThenewlydevelopedmodelisusedtoanalyzeapublishedrealdatasetinforesttrees.TheplantmaterialsusedwasderivedfromthetriplehybridizationofPopulus.AP.deltoidesclone(designatedI-69)wasusedasafemaleparenttomatewithaninterspecicP.deltoidesP.nigraclone(designatedI-45)asamaleparent(Wuetal.1992).BothP.deltoidesI-69andP.euramericana(P.deltoidesP.nigra)I-45wereselectedattheResearchInstituteforPoplarsinItalyinthe1950sandwereintroducedtoChinain1972.Inthespringof1988,atotalof450one-year-oldrootedthree-wayhybridseedlingswereplantedataspacingof45mataforestfarmnearXuzhouCity,JiangsuProvince,China.Thetotalstemheightsanddiametersmeasuredattheendofeachof11growingseasonsareusedinthisexample.Asubset(90)ofgenotypesrandomlyselectedfromthe450hybridswereusedtoconstructparent-dependentgeneticlinkagemapswithrandomamplied

PAGE 74

polymorphicDNAs,ampliedfractionlengthpolymorphismsandinter-simplesequencerepeatsbasedonatwo-waypseudo-testbackcrossdesigninwhichtherearetwodierentQTLgenotypesQqandqq,ataputativeQTL(Yinetal.2002).Thesemapsprovideabasicframeworkformappinggrowth-specicQTLwithourbivariatemodel.Asanexample,ourQTLanalysiswillbebasedonthegeneticmapconstructedfromheterozygousmarkerssegregatingintheparentP.deltoides.Thegrowthofthestemdiametercanbewelltbyalogisticequationex-pressedas(t)=a

PAGE 75

twoassumptionscanberelaxedbyintroducingmorecomplicatednon-stationarymodels(seeChapter3).4.7.2QTLScanningandEstimationTheexactlikelihoodfunctionisapproximatedonthebasisoftheTaylorexpansionaroundthecurrentestimateofQTLgenotype-speciccurveparameterslandrandomeectsbil.Inthisexample,theLMEapproximationmethod(LindstromandBates1990)isusedfortheestimateofbil.InEstep,theposteriormeanofbilforthelthQTLgenotypeinthemixturemodeliscalculatedand,intheMstep,lisestimatedbysimplexalgorithm.AccordingtoLindstromandBates(1990),theresidualyi(l;bil)near^bilcanbeapproximatedasyi(l;bil)yi(l;^bil)+^Zil(bil^bil);(4.41)where^Zil=^Zil(^l;^bil)=@ @bTil^l;^bil: Itcanbeshownthatyi(l;^bil)hasanapproximatenormaldistributionwithmeanvector^Zil^bilandcovariancematrix+^ZilD^ZTil.Theapproximateposteriordistributionfunctionofbigivenyiisf(l;biljyi)/expn1 2(yi(l;^bil)+^Zil^bil)T1i(yi(l;^bil)+^Zil^bil)1 2bTilD1bilo; andtheposteriormeanofbilis^b(!+1)il=^DZTil^V1i(yi(l;^bil)+^Zil^bil):(4.44)

PAGE 76

Table4{1:MLEsofQTLgenotype-specicparametersthatdenestemdiametergrowthtrajectoriesinpoplartreesfromLindstromandBates'LMEmodel.ThestandarderrorsoftheMLEsaregivenintheparentheses. ParametersGenotypeQqGenotypeqq TheapproximatelikelihoodfunctionofyiiswrittenasL(yi)/j+^ZilD^ZTilj1 2expf1 2(yi(l;^bil)+^Zil^bil)T(+^ZilD^ZTil)1(yi(l;^bil)+^Zil^bil)g: TheNLME-basedmappingmodelisusedtogenomewidescanforallpossibleQTL,theirexistenceandchromosomaldistribution.WedetecttwoQTLonlinkagegroups9and10thataectdiametergrowthtrajectoriesinpoplartrees.Figure4-1illustratesaplotofthelikelihoodratios(LR)betweenthefull(thereisaQTL)andreducedmodel(thereisnoQTL)acrossallthelinkagegroups.ThesetwodetectedQTLarelocatedat111.1cMfromtherstleftmarkeronlinkagegroupD9and12cMfromtherstleftmarkeronlinkagegroupD10becausetheLRpeaksatthesepositionsfarexceedthegenome-widecriticalthreshold.Permutation

PAGE 77

Figure4{1:Theproleoflog-likelihoodratio(LR)betweenthefullandreducedmodelestimatedfromLindstromandBates'LMEmodelforanalyzingstemdi-ametergrowthtrajectoriesinaninterspecicpoplarhybridprogenyacrossallthelinkagegroups.Thedashedlineisthe1%cutopointfrompermutationtests.ThelinkagemapusedisoneconstructedwithheterozygousmarkersfromtheP.deltoidesparent(Yinetal.2002). testswereperformedtodeterminetheempiricalthreshold(31.64)fordeclaringthegenome-wideexistenceofQTLthroughoutallthelinkagegroups.TheMLEsofthecurveparametersforeachoftwoQTLgenotypes,Qqandqq,andtheparametersthatmodelthestructureofthevariancematrixweretabulatedinTable4-1,alongwiththeapproximatestandarderrorsoftheseestimatesestimatedfromFisher'sinformationmatrix.Alltheparameterscanbeestimatedwithreasonableprecision.TheMLEsofthecurveparametersinTable4-1wereusedtodrawgrowthcurvesateachQTLfordiametergrowth(Figs.4-2Aand4-2B).ThepatternofthedierentiationingrowthcurvesbetweentwoQTL

PAGE 78

Figure4{2:Twogrowthcurves(foreground)eachpresentingoneofthetwogroupsofgenotypesatthemajorQTLdetectedbyournonlinearrandommixed-eectsmodel(LindstromandBates'approximation)onlinkagegroupsD9andD10.ThedierentiationofgrowthcurvesshowstheeectoftheQTLongrowthtrajectories. genotypesateachQTLsuggeststhatthesetwodetectedQTLdonottriggeraneectongrowthatanearlystageoftreedevelopment,butareactivatedatage5{6yearsandkeepoperationalafterwards.ThetimingofQTLtobeswitchedonseemstobeconcordantwiththeemergingageofinter-treecompetitionforresourcesavailability.Foracomparison,wealsoanalyzethesamedatausingBealandSheiner'smethod.ResultsfromthismethodarebroadlyconsistentwiththosefromLind-stromandBates'LMEmodel.4.8MonteCarloSimulationInordertoexaminethestatisticalpropertiesoftheNLMEmodelforQTLmapping,fourMonteCarlosimulationstrategieswereperformed.Thesimulationstudiesmimictheexampleofpoplartrees.Twosamplesizes(80and200)wereused,ineachcaseofwhichdataweresimulatedaccordingtobytwodierentstrategies,NLMEandsimplefunctionalmapping(SFM)inwhichonlyserialcorrelationsaremodelledwiththeAR(1)process.Thesimulateddatasetsunderdierentstrategiesareanalyzed,respectively,bytheNLMEandSFMmodels.Such

PAGE 79

Table4{2:TrueparametervaluesforMonteCarlosimulationsbasedonstemdiam-etersofpoplartrees ParametersGenotypeQQGenotypeQq al28.2823.85bl12.2411.52rl0.560.6324.270.91D1110.25D227.28D220.001 reciprocaldesignsarethoughttobehelpfulforthemethodologicalcomparisonofQTLmapping.Table4-2givesalltheparametersusedtosimulatedatabasedondierentstrategies.Asexpected,ifthedataaresimulatedbytheNLMEstrategy,theNLMEmodeldisplaysbetterperformanceforparameterestimationandstatisticalpowerthandoestheSFMmodel.Thisdierenceismoreremarkableforalow(80)(Table4-3)thanlarger(200)samplesize(Table4-4).ForthedatasimulatedundertheSFMstrategy,thetwoanalyticalmodels,NLMEandSFM,performsimilarlyintheprecisionofparameterestimationandpowerforlowandlargesamplesizes(Tables4-5and4-6).Theestimatesofheritabilitybythetwomodelsareconsistentwiththetruevalue.InconjunctionwiththeNLME-simulateddata,allthesesuggestthattheNLMEmodelismoregeneralandcanbeusedinabroaderrangeofdatatypesthantheSFMmodel.4.9DiscussionInalltheorganisms,thedevelopmentofmorphological,anatomicalandphys-iologicaltraitsandtheirsubsequentcomplicationstakeplaceincharacteristicontogeneticperiods.Eectivemodellingofthegeneticcontrolofparticularphys-iologicalalterationsemerginginthecourseofthedevelopmentalprocess(from

PAGE 80

Table4{3:MLEsofparametersforsimulateddatasetwithasamplesizeof80.ThestandarderrorsoftheMLEsaregivenintheparentheses. ParametersGenotypeQQGenotypeQq Table4{4:MLEsofparametersforsimulateddatasetwithasamplesizeof200.ThestandarderrorsoftheMLEsaregivenintheparentheses. ParametersGenotypeQQGenotypeQq

PAGE 81

theirearlyonsetuntiltheirlateconsequences)requirestheuseofadequatestatis-ticalmodels.Somebasicstatisticalmodelsforthegeneticstudyofdevelopmentaldynamicshavebeenproposed,inanattempttoidentifytheontogeneticgeneticfactorsorquantitativetraitloci(QTL)thatcontrolthestructureandfunctionofadevelopmentalsystem(Maetal.2002;Wuetal.2002,2003,2004a,2004b,2004c;Zhaoetal.2004a,2004b,2005).Thesesocalledfunctionalmappingmodelshavebeenexpandedintovariousgeneticeldsrelatedtobiomedicalsciences,suchascancergrowth(Liuetal.2005),HIVdynamics(WangandWu2004;Wangetal.2005;Wuetal.2005)anddrugresponse(LinandWu2004a).Thecentralideaoffunctionalmappingistomodelthemeanvectorandcovariancematrixstructurebyparametricornonparametricapproaches.Previousfunctionalmappingapproacheshavemodelledthestructureofthecovariancematrixbyconsideringautocorrelationcomponents,butignoringothersourcesthatalsoaectthecovariancestructure,suchasrandomeectsandmeasurementerrors(Diggleetal.2002).Thestudyofthischapterisaimedtogeneralizefunctionalmappingtomodeltheeectsofrandomeectsontheparameterestimationoffunctionalmappinganditsrelevanthypothesistests,thusbroadeningthevisibilityoffunctionalmapping.Theincorporationofrandomeectswithfunctionalmappingbasedonnon-linearmixedeects(NLME)models(BealandSheiner1982;LindstromandBates1990;DavidianandGiltiman1995;Voneshetal.2002;Wu2004)isrobustinthatitcanprovidesucientpowertodetectontogeneticQTLforlongitudinaldatameasuredatunevenspacesandirregularlyfordierentsubjects.TheNLME-incorporatedfunctionalmappingmodelhasbeenusedtoanalyzeapublishedgrowthdatasetinpoplartrees.Ascomparedtoprevioussimplerfunctionalmapping(SFM)(Maetal.2002),thenewmodelgeneratesagreeableresultsforthedetectionofQTL,theirchromosomallocationsandontogeneticeectsduringatimecourse.However,simulationstudiesbasedonreciprocal

PAGE 82

designs,thatis,thedatasimulatedand,then,analyzedbyNLMEandSFMmodels,respectively,suggestthatwhileQTLcontainedintheSFM-simulateddatacanbedetectedbybothmodels,QTLintheNLME-simulateddatacanonlywellbedetectedbytheNLMEmodel.AllthisimpliesthattheNLMEmodelismoregeneralandcanbeusedmorewidelyinpracticethantheSFMmodel.Perhaps,themostsignicantadvantageofNLME-basedfunctionalmappingisitsexibilitytoextendtheideaoffunctionalmappingtoabroadspectrumofbiologicalandbiomedicalareas.NLMEmodelsincludetwo-stagehierarchicalcharacterizationofintra-andinter-subjectvariation.Intherststage,anyformofparametricmodelscanbeincorporatedthataredenedbybiologicallymeaningfulmathematicalparameters;forexample,growthrateparameterinthegrowthequa-tion(Richards1959)isrelatedtothedevelopmentalstatusofanorganisminatimeperiod.Thesemathematicalparametersmaybecorrelatedwithotherphysiologicalvariablesorexpresseddierentlyunderdierentenvironmentalconditionsorgeneticbackgrounds.ThegeneticcontrolofthesebiologicalphenomenacanbeintegratedintothesecondstageoftheNLMEmodelatwhichspecicunderlyingQTLcanbemodelled,estimatedandtested.NLME-incorporatedfunctionalmappingpro-videsaquantitativeframeworkforcharacterizingthedevelopmentalmachineryofthegeneticcontrolofcomplextraitsattheinterplaybetweentraitformationandprogressionandtheenvironmentinwhichtheorganismisgrown.Statistically,thereisstillmuchroomfortheimprovementofNLMEmodels.Forexample,itwouldbeinterestingtoestimatethecovariancematricesbasedontherestrictedmaximumlikelihoodmethod(REML).Theestimatefromgeneralmaximumlikelihoods(ML)isbiaseddownwardbecauseMLdoesnottakeintoaccountthedegreeoffreedomlostinestimatingthexedeectsl.Assumingthatthexedeectshasanoninformativeimproperprior(;1),REMLestimatorsisthemodeofposteriordistributionafterxedeectsareintegratedout(Davidian

PAGE 83

andGiltinan1995).Forthelinearmixed-eectsmodel,themarginallikelihoodfunctionoflinearbyintegratingoutbothandb:LREML(jy)=ZZLML(;b;jy)dbd:(4.46)ItcanbeshowedthatLREML(jy)=jXTV1Xj1 2LML():(4.47)wherey=266666664y1y2...yn377777775;X=266666664X1X2...Xn377777775;V=diag266666664V1V2...Vn377777775:ThereforeREMLislessbiased.ThedierencebetweentheMLandREMLesti-matorsarelargeasthenumberofxedeectsincrease(VerbekeandMolenberghs,2000).ForthenonlinearLMEmodel,LREML(j^(!);^b(!);^(!);y)=j^X(!)T^V(!)1^X(!)j1 2LML(j(!);^b(!);^(!);y):where^X(!)Tistheapproximationaroundthecurrentestimationof;b;andasdenedinthesection4.3.Gianolaetal.(2004)usedGibbssamplertotasimpleREMLmixturemodel.Theirmodeldoesnotincorporaterepeatedmeasuresandassumetheresidualcovariancematrixas2I.WeproposeamodiedREMLmodelfornonlinearmixed-eectsmodelusingthebackcrossdesign.Asstatedinbackcrossdesign,L(y)=nYi=1[1jif1(yi)+0jif0(yi)](4.48)DeneciBernoulli(),beanindependent(apriori)randomvariabletakingthevalueci=1withprobability1jiifQTLisQq,orthevalueci=0withprobability

PAGE 84

2diag(ci)(~yi^Xi11)T^V1(~yi^Xi11)]exp[1 2(Idiag(ci))(~yi^Xi00)T^V1(~yi^Xi00)]p(bjD)nYi=1(1ji)ci(0ji)1ci:Sothedistributionof1jb;c;0;;;yisf(1jb;c;0;;;y)/exp[1 2diag(ci)(~yi^Xi11)T^V1(~yi^Xi11)]=exp[1 2(~yi^Xi11)Tdiag(ci)^V1(~yi^Xi11)]:andsimilarly0jb;c;1;;;yhasthedensityas:f(0jb;c;1;;;y)/exp[1 2(Idiag(ci))(~yi^Xi00)T^V1(~yi^Xi00)]=exp[1 2(~yi^Xi00)T(Idiag(ci))^V1(~yi^Xi00)]

PAGE 85

Givenci,ciand1ciareeither0or1.For1,onlythesubjectswhichhasci=0contributeinformation.Therefore,theposteriordistribution1jb;c;0;;;yisrecognizedasNormalwithmean(XT1diag(ci)^V1X1)1XT1diag(ci)^V1~yandcovariancematrix(XT1diag(ci)^V1X1)1(SorensenandGianola2002).Likewise,theposteriordistribution0jb;c;1;;;yisNormalwithmean(XT0diag(ci)^V1X0)1XT0(Idiag(ci))^V1~yandcovariancematrix(XT0(Idiag(ci))^V1X0)1.ThenateachEstep,theGibbssamplercanbedrawnsuccessivelyfromtheposteriordistributionsof1and0.bi1andbi2willalsobeestimatedatthisstep.ThelikelihoodfunctionhastobemodiedaccordinglyintheSimplexstep.Sincethesettingthistechniqueisquitedierentfromourproposedmodel,itwillnotbeincludedinthisdissertationwork.

PAGE 86

76

PAGE 87

InChapter2,Iproposedageneralmodelforfunctionmappingwhichcanhandleirregularscheduledrepeatedmeasurements.Theassumptionsofevenspacedtimepointsorconstantresidualvarianceisrelaxed.Thismodelisusedtoanalyzehumandatainwhichamissing-dataproblemoccurscommonly.InChapter3,Iproposedastatisticalmodelforformulatinggenotypeenvironmentinteractionsforlongitudinaltraits.Irelaxedtheassumptionsofvariance-covariancestationaritytoallowthepatternoftime-dependentvariancesandcorrelationstochangeinresponsetodierentenvironmentalagents.ThemodelprovidesausefulframeworkfortestingmanyfundamentalquestionsregardingtheimpactsofenvironmentalheterogeneityontheexpressionofQTLeectinadevelopmentalsense.Thedevelopmentofnon-linearmixedeects(NLME)modelsforfunctionalmappingisthefocusofChapter4.TheNLMEmodelscancharacterizetheintra-andinter-individualvariationofmathematicalparametersthatspecifylongitudinalcurves.Insomecases,thesemodelshaveproventobemorepowerfulforstudyingrepeatmeasuredatathannon-randomeectsmodels.Thegeneralformofthenonlinearfunctionallowsforthemodellingofawidevarietyofmeanfunctionstructures.Theresidualcovariancematrixcanalsobemodelledusingdierentapproaches.Ihaveforthersttimeincorporatedsuchamixedeectsmodelintothemixture-based,functionalmappingframework.5.2FutureDirectionsAlthoughsignicantresultshavebeenachievedfrommydissertationandpreviousworkbyotherauthors,therearestillanumberofopenquestionsthathavenotbeensolvedbutareveryimportantforfunctionalmapping.Below,Ipinpointthethefollowingaspectsoffunctionalmappingwhichdeservefurtherworkinthefuture.

PAGE 89

Moregenerally,missingcovariatesisaproblemwhichcaneasilyhappenincollectinghumandata.Wu(2004)proposedaGibbssamplingschemetodealwithmissingcovariatesundernonlinearmixed-eectsmodelsandthemissingcovariatescanbecontinuous,categoricalormixedtype.IwillusetheGibbssamplinginthemixturemodeltodealwiththecovariatemissingproblem.5.2.4ModelDiagnosticsTheQQplotwillbeanintuitivetooltotesttheassumptionofnormaldistribution.CriteriaAICandBICcanbeusedtoselectnitemixturemodels(McLanchlanandPeel2000).However,thesetoolshavenotbeenwelldevelopedforthenonlinearmixturemixed-eectsmodels.Iwillworkonndingtestablediagnostictoolstovalidatethenonlinearmixturemixed-eectsmodeltting,totesttheoutliersandtochoosethesuitablecovariancematricesiandD.5.2.5BiologicalExtensionsIwillincorporatemorestatisticalteststotestsomeimportantbiologicaltime-to-events,suchasthetimepointatwhichageneisturnedonoro.Theeectsofgenotype-environmentinteractionsplayanimportantroleinshapinggrowthtrajectories.Iwilldevelopstatisticalmodelstotesthowtheseinteractioneectstriggeranimpactonthechangesofgrowthpatterns.

PAGE 90

li=fl(yi;ml;v) IntheMstep,obtaintheestimatesofpbysolvingthelog-likelihoodequa-tions.Fortheone-QTLmodel,wehave^q=Pni=1(1i0i) ^d=bp whereb=(2q22q+1)nXi=11i+(12q)nXi=1[(1q)2iq0i];c=q(1q)nXi=1q(1q)1i(1q)22iq20iIneachMstep,theestimateofdwithtwopossiblesolutionsbasedonequation(A4)shouldbetakenfromitsspace,max[q2;(1q)2]d2q(1q):

PAGE 91

Forthetwo-QTLmodel,weobtaintheestimatesofthegameticfrequenciesusing^p11=Pni=1(222i+21i+12i+11i) 2n; ^p10=Pni=1(220i+21i+10i+(1)11i) 2n; ^p01=Pni=1(202i+12i+01i+(1)11i) 2n; ^p00=Pni=1(200i+10i+01i+11i) 2n: Iterationsarerepeatedbetweenequations(A1){(A4)fortheone-QTLmodelandbetweenequations(A1),(A2)and(A5){(A8)forthetwo-QTLmodeluntilconvergence.Accordingtotheinvariancepropertyofthemaximumlikelihoodmethod,theMLEsoftheallelefrequenciesofthetwoQTLandtheirLDcanbeobtainedfromtheMLEsofgametefrequenciesbysolvingequation(2.10).

PAGE 92

82

PAGE 93

Chi,E.M.andReinsel,G.C.(1989)ModelsforlongitudinaldatawithrandomeectsandAR(1)errors.JournaloftheAmericanStatisticalAssociation84:452-459.Churchill,G.A.andDoerge,R.W.(1994).Empiricalthresholdvaluesforquantita-tivetraitmapping.Genetics138:963-971.Cui,Y.H.(2005).Statisticalmodelsforfunctionalmappingofprogrammedcelldeath.PhDDissertation,UniversityofFlorida,Gainesville,Florida.Cui,Y.H.andWu,R.L.(2005).Functionalmappingforgeneticcontrolofprogrammedcelldeath.PhysiologicalGenomics(accepted).Daniels,M.J.andPourahmadi,M.(2002).Bayesiananalysisofcovariancematricesanddynamicmodelsforlongitudinaldata.Biometrika89:553-566.Davidian,M.andGiltinan,D.M.(2003)Nonlinearmodelsforrepeatedmea-surements:Anoverviewandupdate.Editor'sInvitedpaper.JournalofAgricultural,Biological,andEnvironmentalStatstics8:387-419.Davidian,M.andGiltinan,D.(1995).NonlinearModelsforRepeatedMeasurementData.NewYork:ChapmanandHall.Dempster,A.P.,Laird,N.M.andRubin,D.B.(1977).MaximumlikelihoodfromincompletedataviaEMalgorithm.JournaloftheRoyalStatisticalSociety:SeriesB39:1-38.Derendorf,H.andMeibohm,B.(1999).Modelingofpharmacokinetic/pharmaco-dynamic(PK/PD)relationships:Conceptsandperspectives.PharmaceuticalResearch16:176-185.Diggle,P.J.,Heagerty,P.,Liang,K.Y.andZeger,S.L.(2002).AnalysisofLongitudinalData.Oxford,UK:OxfordUniversityPress.Eyler,F.D.,Behnke,M.,Conlon,M.,Woods,N.S.andWobie,K.(1998a).Birthoutcomefromaprospectivematchedstudyofprenatalcrack/cocaineuse.PartI:Interactiveanddoseeectsonhealthandgrowth.Pediatrics101:229-237.Eyler,F.D.,Behnke,M.,Conlon,M.,Woods,N.S.andWobie,K.(1998b).Birthoutcomefromaprospective,matchedstudyofprenatalcrack/cocaineuse.PartII:Interactiveanddoseeectsonneurobehavioralassessment.Pediatrics101:237-241.Frank,D.A.,Augustyn,M.,Knight,W.G.,Pell,T.andZuckerman,B.(2001).Growth,development,andbehaviorinearlychildhoodfollowingprenatalcocaineexposure:Asystematicreview.JournaloftheAmericanMedicalAssociation.285:1613-1625.Gabriel,K.R.(1962).Antedependenceanalysisofanorderedsetofvariables.AnnalsofMathematicalStatistics33:201-212.

PAGE 94

GewekeJ.(1989).BayesianinferenceinEconometricsmodelsusingMonteCarlointegration.Econometrica57:13171339.GianolaD.,deg_ardJ.,HeringstadB.,SorensenD.,MadsenP.,JensenJ.andDetilleuxJ.(2004)Mixturemodelforinferringsusceptibilitytomastitisindairycattle:aprocedureforlikelihood-basedinference.GeneticsSelectionEvolution36:327.Gong,Y.,Wang,Z.H.,Liu,T.,Zhao,W.,Zhu,Y.,Johnson,J.A.andWu,R.L.(2004).Astatisticalmodelforhigh-resolutionmappingofquantitativetraitlociaectingpharmacodynamicprocesses.ThePharmacogenomicsJournal4:315-321.Harville,D.A.(1976).ExtensionoftheGaussMarkovtheoremtoincludetheestimationofrandomeects.AnnalofStatistics4:384395.HoD.D.,NeumannA.U.,PerelsonA.S.,ChenW,LeonardJ.M.andMarkowitzM.(1995).RapidturnoverofplasmavirionsandCD4lymphocytesinHIVinfection.Nature373:12326Horvitz,H.R.(2003).Worms,life,anddeath(Nobellecture).Chembiochem4:697-711.Hou,W.,Garvan,C.W.,Zhao,W.,Behnke,M.,Eyler,F.D.andWu,R.L.(2005).Ageneralizedmodelfordetectinggeneticdeterminantsunderlyinglongitudinaltraitswithunequallyspacedmeasurementsandtime-dependentcorrelatederrors.Biostatistics6:420-433.TheInternationalHapMapConsortium.TheInternationalHapMapProject.(2003).Nature426:789-94.Jansen,R.C.andStam,P.(1994).Highresolutionmappingofquantitativetraitsintomultiplelociviaintervalmapping.Genetics136:1447-1455.Jenss,R.M.andBayley,N.(1937).Amathematicalmethodforstudyingthegrowthofachild.HumanBiology9:556-563.Jiang,C.andZeng,Z.B.(1995).MultipletraitanalysisofgeneticTW00997toT.F.C.M.andE.G.P.ThisisapublicationoftheW.M.mappingforquantitativetraitloci.Genetics140:1111-1127.Kolb,B.,Gorny,G.,Li,Y.,Samaha,A.N.andRobinson,T.E.(2003).Ampheta-mineorcocainelimitstheabilityoflaterexperiencetopromotestructuralplasticityintheneocortexandnucleusaccumbens.Proc.Natl.Acad.Sci.USA100:10523-10528.Lagarius,J.C.,Reeds,J.A.,Wright,M.H.andWright,P.E.(1998).ConvergencepropertiesoftheNeler-Meadsimplexmethodinlowdimensions.SIAMJournalofOptimization9(1):112-147.Laird,N.M.andWare,J.H.(1982).Randomeectsmodelsforlongitudinaldata.Biometrics38:963-974.

PAGE 95

Lander,E.S.andBotstein,D.(1989).MappingMenelianfactorsunderlyingquantitativetraitsusingRELPlinkagemaps.Genetics121:185-199.Lester,B.M.(Ed).(1999).PrenatalDrugExposureandChildOutcome.Philadel-phia:W.B.SaundersCompany.Lin,M.(2005)MathematicalandstatisticalmethodsforidentifyingDNAsequencevariantsthatencodedrugresponse.PhDDissertation,UniversityofFlorida,Gainesville,Florida.Lin,M.andWu,R.L.(2005a).Theoreticalbasisfortheidenticationofallelicvariantsthatencodedrugecacyandtoxicity.Genetics170:919-928.Lin,M.andWu,R.L.(2005b)Aunifyingmodelfornonparametricfunctionalmappingoflongitudinaltrajectoriesandtime-to-events.BMCBioinformatics(accepted).Lindstrom,M.J.andBates,D.M.(1990),NonlinearMixedEectsModelsforRepeatedMeasuresData.Biometrics46:673687.Lo,Y.T.,Mendell,N.R.andRubin,D.B.(2001).Testingthenumberofcompo-nentsinanormalmixture.Biometrika88:767-778.Lou,X.-Y.,Casella,G.,Littell,R.C.,Yang,M.K.C.andWu,R.L.(2003).Ahaplotype-basedalgorithmformultilocuslinkagedisequilibriummappingofquantitativetraitlociwithepistasisinnaturalpopulations.Genetics163:1533-1548.Louis,T.A.(1982).FindingtheobservedinformationmatrixwhenusingtheEMalgorithm.JournaloftheRoyalStatisticalSocietySeriesB44:226-233.Lynch,M.andWalsh,B.(1998).GeneticsandAnalysisofQuantitativeTraits.Sinauer,Sunderland,MA.Ma,C.-X.,Casella,G.andWu,R.L.(2002).Functionalmappingofquantitativetraitlociunderlyingthecharacterprocess:Atheoreticalframework.Genetics161:1751-1762.Ma,C.-X.,Wu,R.L.andCasella,G.(2004).FunMap:Functionalmappingofcomplextraits.Bioinformatics20:1808-1811.McLaehlan,G.andPeel,D.(2000).FiniteMixtureModels.NewYork:JohnWileyandSponsorship.Macchiavelli,R.E.andArnold,S.F.(1994).Variableorderantedependencemodels.CommuicationsinStatistics:TheoryandMethods23:2683-2699.Meng,X.-L.andRubin,D.B.(1991).UsingEMtoobtainasymptoticvariance-covariancematrices:TheSEMalgorithm.JournaloftheAmericanStatisticalAssociation86:899-909.Nelder,J.A.andMead,R.(1965).Asimplexmethodforfunctionminimization.ComputerJournal7:308-313.

PAGE 96

Nu~nez-Anton,V.(1997).Longitudinaldataanalysis:non-stationaryerrorstruc-turesandantedependencemodels.AppliedStochasticModelsandDataAnalysis13:279-287.Nu~nez-Anton,V.andWoodworth,G.G.(1994).Analysisoflongitudinaldatawithunequallyspacedobservationsandtime-dependentcorrelatederrors.Biometrics50:445-456.Nu~nez-Anton,V.andZimmerman,D.L.(2000).Modelingnonstationarylongitudi-naldata.Biometrics56:699-705.Pan,J.X.andMackenzie,G.(2003).Onmodellingmean-covariancestructuresinlongitudinalstudies.Biometrika90:239-244.Pinheiro,J.C.andBates,D.M.(1995).ApproximationstotheLog-LikelihoodFunctionintheNonlinearMixed-EectsModel.JournalofComputationalandGraphicalStatistics4:12-35.Pourahmadi,M.(1999).Jointmean-covariancemodelswithapplicationstolongitudinaldata:Unconstrainedparameterisation.Biometrika86:677-690.Pourahmadi,M.(2000).Maximumlikelihoodestimationofgeneralisedlinearmodelsformultivariatenormalcovariancematrix.Biometrika87:425-435.Pritchard,J.K.,Stephens,M.,Rosenberg,N.A.andDonnelly,P.(2000).Associa-tionmappinginstructuredpopulations.AmericanJournalofHumanGenetics67:170-181.Richards,F.J.(1959).Aexiblegrowthfunctionforempiricaluse.JournalofExperimentalBotany10:290-300.Samuelsen,S.O.,Stene,L.C.andBakketeig,L.S.(2004).Associationofheadcircumferenceatbirthamongsiblingpairs.PaediatricandPerinatalEpidemi-ology18:26-34.Schabenberger,O.(1995).Theuseofordinalresponsemethodologyinforestry.ForestScience41:321-336.Sen,S.andChurchill,G.A.(2001)Astatisticalframeworkforquantitativetraitmapping.Genetics159:371387.Singer,L.T.,Arendt,R.,Minnes,S.,Farkas,K.,Salvator,A.,Kirchner,H.L.andKliegman,R.(2002).Cognitiveandmotoroutcomesofcocaine-exposedinfants.JournaloftheAmericanMedicalAssociation287:1952-1960.SorensenD.andGianolaD.(2002).Likelihood,BayesianandMCMCMethodsinQuantitativegeneticsNewYork:Springer-Verlag.Spellman,P.T.,Sherlock,G.,Zhang,M.Q.,Iyer,V.R.,Anders,K.,Eisen,M.B.,Brown,P.O.,Botstein,D.andFutcherB.(1998).Comprehensiveidenticationofcell-cycleregulatedgenesinSaccharomycescerevisiaebyMicroarrayHybridization.MolecularBiologyoftheCell95:14863-14868(1998).

PAGE 97

Spinka,C.,Carroll,R.J.andChatterjee,N.(2005).Analysisofcase-controlstudiesofgeneticandenvironmentalfactorswithmissinggeneticinformationandhaplotype-phaseambiguity.GeneticEpidemiology29:108-127.Tierney,L.andKadane,J.B.(1986).Accurateapproximationsforposteriormo-mentsandmarginaldensities.JournalofTheAmericanStatisticalAssociation81:82-86.Tronick,E.Z.andBeeghly,M.(1999).Prenatalcocaineexposure,childdevelop-ment,andthecompromisingeectsofcumulativerisk.InB.M.Lester(Ed.)ClinicsinPerinatology.Montreal,Canada:W.B.Saunders.Venables,W.N.andRipley,B.D.(2002).ModernAppliedStatisticswithS.FourthEdition.NewYork:Springer.Verbeke,G.andMolenberghs,G.(2000).Linearmixedmodelsforlongitudinaldata.NewYork:Springer-Verlag.vonBertalany,L.(1957).Quantitativelawsinmetabolismandgrowth.QuarterlyReviewOFBiology32:217-231.VoneshE.F.,WangH.,NieL.andMajumdarD.2002.Conditionalsecond-ordergeneralizedestimatingequationsforgeneralizedlinearandnonlinearmixed-eectsmodels.JournalofAmericanStatisticalAssociation97:271283.WangZ.H.,Hou,W.andWu,R.L.(2005).AStatisticalModeltoAnalyzeQuantitativeTraitLocusInteractionsforHIVDynamicsfromtheVirusandHumanGenomes.StatisticsinMedicine(inpress).Wang,Z.H.andWu,R.L.(2004).Astatisticalmodelforhigh-resolutionmappingofquantitativetraitlocidetermininghumanHIV-1dynamics.StatisticsinMedicine23:3033-3051.Watters,J.W.andMcLeod,H.L.(2003).Usinggenome-widemappinginthemousetoidentifygenesthatinuencedrugresponse.TrendsinPharmacologi-calSciences24:55-58.West,G.B.,Brown,J.H.andEnquist,B.J.(2001).Ageneralmodelforontoge-neticgrowth.Nature413:628-631.West,G.B.,Brown,J.H.andEnquist,B.J.(1997).Ageneralmodelfortheoriginofallometricscalinglawsinbiology.Science276:122-126.Wolnger,R.D.(1996).Heterogeneousvariance-covariancestructuresforrepeatedmeasures.JournalofAgricultural,Biological,andEnvironmentalStatistics1:205-230.Wolnger,R.D.(1993).Laplace'sApproximationforNonlinearMixedModels.Biometrika80:791-795.

PAGE 98

Wu,L.(2004).ExactandApproximateInferencesforNonlinearMixed-EectsModelsWithMissingCovariates.JournaloftheAmericanStatisticalAssocia-tion99:700-709(10)Wu,R.L.(1998).Thedetectionofplasticitygenesinheterogeneousenvironments.Evolution54,967-977.Wu,R.L.,Ma,C.-X.,Chang,M.,Littell,R.C.,Wu,S.S.,Yin,T.M.,Huang,M.R.,Wang,M.X.andCasella,G.(2002a).Alogisticmixturemodelforcharacterizinggeneticdeterminantscausingdierentiationingrowthtrajectories.GeneticalResearch19:235-245.Wu,R.L.,Ma,C.-X.,Lin,M.andCasella,G.(2004a).Ageneralframeworkforanalyzingthegeneticarchitectureofdevelopmentalcharacteristics.Genetics166:1541-1551.Wu,R.L.,Ma,C.-X.,Lin,M.,Wang,Z.H.andCasella,G.(2004b).Functionalmappingofquantitativetraitlociunderlyinggrowthtrajectoriesusingatransform-both-sideslogisticmodel.Biometrics60:729-738.Wu,R.L.,Ma,C.-X.,Littell,R.C.andCasella,G.(2002b).Astatisticalmodelforthegeneticoriginofallometricscalinglawsinbiology.JournalofTheoreticalBiology217:275-287.Wu,R.L.,Ma,C.-X.,Wang,Z.H.,Zhu,Y.,Li,H.H.andCasella,G.(2004c).AmajorgenedetectedtoaectHIV-1dynamics.JournalofTheoreticalBiology(accepted).Wu,R.L.,Ma,C.-X.,Yang,M.C.K.,Chang,M.,Santra,U.,Wu,S.S.,Huang,M.,Wang,M.andCasella,G.(2003a).QuantitativetraitlociforgrowthinPopulus.GeneticalResearch81:51-64.Wu,R.L.,Ma,C.-X.,Zhao,W.andCasella,G.(2003b).Functionalmappingofquantitativetraitlociunderlyinggrowthrates:Aparametricmodel.PhysiologicalGenomics14:241-249.Wu,S.,Yang,J.andWu,R.L.(2005)MultilocuslinkagedisequilibriummappingofquantitativetraitlocithataectHIVdynamics:Asimulationapproach.StatisticsinMedicine(accepted).Wu,W.B.andPourahmadi,M.(2003)Nonparametricestimationoflargecovari-ancematricesoflongitudinaldata.Biometrika90:831-844.Xu,S.Z.(2003).Estimatingpolygeniceectsusingmarkersoftheentiregenome.Genetics163:789801.Yao,F.,Mller,H.-G.andWang,J.-L.(2005)Functionaldataanalysisforsparselongitudinaldata.oftheAmericanStatisticalAssociation100:577-590.Yin,T.M.,Zhang,X.Y.,Huang,M.R.,Wang,M.X.,Zhuge,Q.,Zhu,L.H.andWu,R.L.(2002)MolecularlinkagemapsofthePopulusgenome.Genome45:541-555.

PAGE 99

Zeng,Z.-B.(1994).Precisionmappingofquantitativetraitloci.Genetics136:1457-1468.Zhao,W.(2005).WaveletDimensionalityReductionModelsforFunctionalMappingofLongitudinalTraits.PhDDissertation,UniversityofFlorida,Gainesville,Florida.Zhao,W.,Chen,Y.Q.,Casella,G.,Cheverud,J.M.andWu,R.L.(2005).Anonstationarymodelforfunctionalmappingofcomplextraits.Bioinformatics21:2469-2477.Zhao,W.andWu,R.L.(2005).Awaveletthresholdingapproachforfunctionalgeneticmappingoflongitudinaltraits.JournaloftheAmericanStatisticalAssociation(revised).Zhao,W.Wu,R.L.,Ma,C.-X.andCasella,G.(2004).Afastalgorithmforfunctionalmappingofcomplextraits.Genetics167:2133-2137.Zhu,Y.,Hou,W.andWu,R.L.(2004).AHaplotypeBlockModelforFineMappingofQuantitativeTraitLociRegulatingHIV-1Pathogenesis.JournalofTheoreticalMedicine5:1-7.Zimmerman,D.L.andNu~nez-Anton,V.(2001).Parametricmodelingofgrowthcurvedata:Anoverview(withdiscussion).Test10:1-73.Zimmerman,D.L.,Nu~nez-Anton,V.andElBarmi,H.(1998).Computationalaspectsoflikelihood-basedestimationofrst-orderantedependencemodels.JournalofStatisticalComputationandSimulation60:67-84.Zimmerman,D.L.andNu~nez-Anton,V.(1997).Structuredantedependencemodelsforlongitudinaldata.InG.Gregoire,D.R.Brillinger,P.J.Diggle,E.Russek-Cohen,W.G.WarrenandR.D.Wolnger,(eds.),ModellingLongitudinalandSpatiallyCorrelatedData:Methods,Applications,andFutureDirections.LectureNotesSeriesinStatisticsNo.122.NewYork:Springer-Verlag,pp.63-76Zuckerman,B.,Frank,D.A.andMayes,L.(2002).Cocaine-exposedinfantsanddevelopmentaloutcomes.JournaloftheAmericanMedicalAssociation287:1990-1991.

PAGE 100

90