A Robust Approach for Genetic Mapping of Complex Traits

Permanent Link: http://ufdc.ufl.edu/UFE0022399/00001

Material Information

Title: A Robust Approach for Genetic Mapping of Complex Traits
Physical Description: 1 online resource (137 p.)
Language: english
Creator: Wu, Song
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: functional, haplotype, hardy, integrated, l2e, linkage, qtl, statistical, zygotic
Statistics -- Dissertations, Academic -- UF
Genre: Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Genetic mapping has proven to be a powerful tool for studying the genetic architecture of complex traits by localizing individual quantitative trait loci (QTLs) that underlie the traits. For this reason, the past two decades have witnessed a surge of interest in developing statistical methods for QTL mapping. The central theme of QTL mapping is to assign to each individual all possible genotypes at the unobservable QTL, with probabilities determined from observed markers linked to the QTL, and then model the phenotypic distribution of each QTL genotype by parametric or non-parametric approaches. All the statistical mapping methods rely on several key assumptions, which include (1) markers and QTL are segregating in an equilibrium state, (2) a continuous trait obeys a parametric distribution, and (3) there is a direct relationship between genotypes and phenotypes. It is possible that violation of any of these assumptions will lead to biased inference about QTL locations and effects and to spurious QTL discoveries. In this dissertation, I will derive a battery of robust statistical approaches for QTL mapping, which do not rely on these assumptions and push mapping work toward more practical settings. Assumption 1 states that the segregation of genetic loci does not deviate from Mendel's first law in an experimental cross or from Hardy-Weinberg equilibrium (HWE) in a natural population. In a cross, differences in viability may occur among gametes or zygotes due to some unknown mechanisms, leading to distorted segregation. In a natural population affected by various evolutionary forces, individuals may not be randomly mating, leaving the population in Hardy-Weinberg disequilibrium (HWD). I will develop a general framework for relaxing this equilibrium assumption. By focusing on a natural population, I will demonstrate my model framework for relaxing the HWE assumption at the haplotypic and zygotic levels.
Assumption 2 requires that the trait distribution can be exactly fit by a parametric function, so that maximum likelihood or Bayesian approaches can be formulated for parameter estimation. However, it is not possible to know the true underlying model for observed phenotypes. I will derive a robust approach that is flexible enough to accommodate a certain degree of misspecification of the true model. Here, I will incorporate the idea of integrated square errors or $L_2E$ into the genetic mapping framework and formulate the hypothesis testing by defining a new test statistic: Energy Difference. Assumption 3 suggests that the outcome phenotype of a complex trait is determined by causal QTLs in a direct way, neglecting the biological pathway or process of trait formation on a time or space scale. A statistical model, called functional mapping, has been proposed to model the dynamic pattern of the genetic control of a trait over a time course by biologically meaningful mathematical equations, with the aim of relaxing the assumption of direct genotype-phenotype relationships. Here, I will extend the $L_2E$ approach to functional mapping by relaxing the widely used multivariate normality assumption, greatly expanding the breadth of use of functional mapping. My dissertation is divided into three parts, each corresponding to an assumption mentioned above. In each part, I first formulate a general likelihood function, derive computing algorithms for parameter estimation, and provide and prove theorems behind a typical issue. Then, I perform extensive simulation studies to investigate the statistical properties of each approach and compare the results from my newly derived approaches with those from traditional ones. Lastly, analyses of real examples are conducted to demonstrate the usefulness and utilization of the new approaches in a practical genetic setting. In the last chapter of this dissertation, I discuss several issues pertaining to the future direction of QTL mapping.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Song Wu.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Wu, Rongling.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2009-08-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0022399:00001



Full Text

PAGE 4

First and most importantly, I would like to express my warmest gratitude to my PhD advisor and committee chair, Dr. Rongling Wu. If it were not for him, I might have pursued another academic path. It is he who has introduced me into this wonderful world of statistical genetics. From Day One when I joined Dr. Wu's group, his everlasting enthusiasm has inspired me to be a good researcher. From him, I learnt not only knowledge about statistics and genetics, but also about his attitude toward life and research. Dr. Wu always had confidence in me and supported me throughout my doctoral studies. It has been a great pleasure that I work my dissertation under his guidance.

Also, I would like to thank my other committee members: Drs. Malay Ghosh, Xueli Liu and Hartmut Derendorf, for giving me valuable suggestions on my dissertation. Their reading and commenting were very helpful.

My special thanks go to my family, who have unconditionally and constantly loved me and supported me for years. I deeply appreciate the sacrifices my parents had to make to give me the opportunities for a better education. I also owe the greatest debt of gratitude to my beloved wife, Jie, who has backed me whenever possible. Without their understanding and continuous support, this dissertation would not have been possible. Further, I want to thank my son, David, a joyful angel, who brings so much happiness and pride to our life and can always cheer me up when I feel down.

Last but not least, I want to thank all of my friends at the University of Florida. With their humor, friendship, and unwavering support, I will always recall the good times spent at Gainesville during my graduate study.

PAGE 5

                                                                        page
ACKNOWLEDGMENTS .... 4
LIST OF TABLES .... 8
LIST OF FIGURES .... 10
ABSTRACT .... 11
CHAPTER
1 FUNDAMENTAL OF GENETIC MAPPING .... 14
  1.1 Introduction .... 14
  1.2 Basic Concepts .... 14
    1.2.1 Model .... 14
    1.2.2 Marker, QTL, Genotype and Haplotype .... 15
  1.3 Strategies for QTL Mapping .... 18
    1.3.1 History .... 18
    1.3.2 Mapping Populations .... 19
      1.3.2.1 Experimental crosses .... 20
      1.3.2.2 Nuclear families .... 21
      1.3.2.3 Natural populations with unrelated individuals .... 21
      1.3.2.4 Natural populations with unrelated families .... 22
    1.3.3 Principles of QTL Mapping .... 23
      1.3.3.1 Linkage mapping .... 23
      1.3.3.2 Linkage disequilibrium mapping .... 25
      1.3.3.3 Joint linkage and linkage disequilibrium mapping .... 26
      1.3.3.4 Functional mapping .... 26
  1.4 Statistical Method for QTL Mapping .... 27
    1.4.1 The Mixture Model .... 27
    1.4.2 Structure .... 29
    1.4.3 Estimation .... 30
    1.4.4 Hypothesis Testing .... 33
      1.4.4.1 Marker-QTL association .... 33
      1.4.4.2 QTL effect .... 35
  1.5 Dissertation Objectives .... 36
2 GENETIC MAPPING IN A NON-EQUILIBRIUM POPULATION .... 38
  2.1 Introduction .... 38
  2.2 The Principle of Linkage Disequilibrium Mapping .... 40
    2.2.1 Hardy-Weinberg Equilibrium .... 40
    2.2.2 Evolutionary force: Wahlund effect .... 41
    2.2.3 Linkage Disequilibrium .... 43

PAGE 6

    [entry title not recoverable] .... 44
  2.3 Genetic Mapping in a Non-Equilibrium Population at the Haplotype Level .... 45
    2.3.1 Notation and Definitions .... 45
    2.3.2 Model Formulation .... 49
      2.3.2.1 Linear model .... 49
      2.3.2.2 Likelihood function .... 49
      2.3.2.3 EM algorithm .... 50
    2.3.3 Hypothesis Tests .... 52
    2.3.4 Computer Simulation .... 54
      2.3.4.1 Schemes .... 54
      2.3.4.2 Results .... 55
  2.4 Genetic Mapping in a Non-Equilibrium Population at the Zygotic Level .... 61
    2.4.1 Introduction .... 61
    2.4.2 Notation .... 61
    2.4.3 Relationships with Gametic LD .... 63
    2.4.4 Model .... 64
    2.4.5 Hypothesis Tests .... 66
    2.4.6 Computer Simulation .... 67
      2.4.6.1 Schemes .... 67
      2.4.6.2 Results .... 68
  2.5 Discussion .... 73
3 GENETIC MAPPING OF COMPLEX TRAITS BY MINIMIZING INTEGRATED SQUARE ERRORS .... 75
  3.1 Introduction .... 75
  3.2 L2E Approach .... 77
  3.3 Method .... 78
    3.3.1 Mapping Population .... 78
    3.3.2 Statistical Modeling .... 80
      3.3.2.1 Approach 1: modeling error density .... 81
      3.3.2.2 Approach 2: modeling observation density .... 83
  3.4 Hypothesis Testing .... 86
  3.5 Monte Carlo Simulation .... 87
  3.6 A Worked Example .... 89
  3.7 Discussion .... 90
4 FUNCTIONAL MAPPING OF DEVELOPMENTAL TRAITS BY AN L2E METHOD .... 100
  4.1 Introduction .... 100
  4.2 Statistical Models .... 101
    4.2.1 Notations .... 101
    4.2.2 L2E Model .... 102
    4.2.3 Mean Curve Modeling .... 103
    4.2.4 Covariance Modeling .... 104

PAGE 7

  [entry title not recoverable] .... 105
    4.3.1 Computational Algorithm .... 107
  4.4 Real Data Analysis .... 108
5 FUTURE DIRECTION .... 116
APPENDIX
A L2E ASYMPTOTICS .... 119
  A.1 Proposition and Examples .... 119
  A.2 Useful Formulas .... 123
  A.3 Simulation Study .... 124
B COVARIANCE MATRICES .... 125
  B.1 SAD(1) .... 125
  B.2 AR(1) .... 126
REFERENCES .... 128
BIOGRAPHICAL SKETCH .... 137

PAGE 8

Table                                                                   page
1-1 Genotypic and phenotypic data structure for a small set of random samples from a mapping population .... 28
2-1 The Wahlund effect: Population 1 obeys the HWE, and so does population 2. However, when these two populations mix together, the overall population is not in HWE. .... 42
2-2 Estimation of population and quantitative genetic parameters in a HWD population at the haplotype level analyzed by the non-equilibrium model. Numbers in parentheses are the standard errors of the estimates. .... 56
2-3 Estimation of population and quantitative genetic parameters in a HWD population at the haplotype level analyzed by the equilibrium model. Numbers in parentheses are the standard errors of the estimates. .... 57
2-4 Estimation of population and quantitative genetic parameters in a HWE population analyzed by the non-equilibrium model. Numbers in parentheses are the standard errors of the estimates. .... 59
2-5 Estimation of population and quantitative genetic parameters in a HWE population analyzed by the equilibrium model. Numbers in parentheses are the standard errors of the estimates. .... 60
2-6 Data are simulated from a nonequilibrium population by a zygotic LD model, and are analyzed with the zygotic LD model .... 69
2-7 Data are simulated from a nonequilibrium population by a zygotic LD model, and are analyzed with the gametic LD model .... 70
2-8 Data are simulated from an equilibrium population by a gametic LD model, and are analyzed with the zygotic LD model. .... 71
2-9 Data are simulated from an equilibrium population by a gametic LD model, and are analyzed with the gametic LD model .... 72
3-1 Conditional probabilities ($\omega_{j|i}$) of QTL genotypes given marker genotypes of M1 and M2 for an F2 population. r1, r2 and r are the recombination fractions between marker M1 and the QTL, between the QTL and marker M2, and between the two flanking markers. .... 92
3-2 No noise .... 94
3-3 10% noise points with [symbol lost] = 45 .... 94
3-4 10% noise points with [symbol lost] = 55 .... 95

PAGE 9

[entry title not recoverable] .... 98
4-1 The parameter estimation for QTLs detected by MLE method .... 115
4-2 The parameter estimation for QTLs detected by L2E method .... 115

PAGE 10

Figure                                                                  page
1-1 Genome, chromosome and gene .... 17
1-2 Genotype and diplotype .... 18
1-3 The extent of LD for appropriate resolution of association studies .... 22
1-4 Mixture distribution with three components .... 31
3-1 The empirical density of mice data .... 76
3-2 The normal density of simulated data .... 77
3-3 The principle of L2E .... 78
3-4 Comparison between approximating error and data densities .... 93
3-5 Illustration of the mouse markers .... 96
3-6 Interval mapping-scan .... 96
3-7 Empirical distribution of the univariate trait .... 97
3-8 L2E scan profile of the week 5 of the mice growth data .... 98
3-9 MLE scan profile of the week 5 of the mice growth data .... 99
4-1 MLE scan profile for the growth curve of the mice data .... 111
4-2 The mean curve plots for all significant QTLs detected by MLE .... 112
4-3 L2E scan profile for the growth curve of the mice data .... 113
4-4 The mean curve plots for the QTLs estimated by L2E .... 114

PAGE 14

1.2.1 Model

In 1918, Fisher argued that the variation of a quantitative trait is contributed by three factors: the genetic effect that can be inherited through generations, the environment effect that changes the expression of the trait from place to place, and the gene by environment interaction effect in which different genes respond differently to the environmental change. Estimating the relative contributions of these factors is the central

PAGE 15

Standard analytic approaches like analysis of variance can be applied to estimate the contributing variance components. With the advent of molecular marker technologies, these variance components can be understood at the individual gene level. Statistical models for dissecting the phenotypic variance into its components have been well developed (Lander and Botstein 1989; Haley and Knott 1992; Kao et al. 1999, 2002; reviewed in Wu et al. 2007). Assuming that these components are independent of each other, phenotypic variance can be partitioned into genetic variance, environmental variance, genotype by environment interaction variance, and residual variance, expressed as

$$\mathrm{Var}(P) = \mathrm{Var}(G) + \mathrm{Var}(E) + \mathrm{Var}(G \times E) + \mathrm{Var}(e).$$

To quantify the proportion of phenotypic variation in a population that is attributable to genetic variation among individuals, a simple quantity, heritability ($H^2$), is defined as

$$H^2 = \frac{\mathrm{Var}(G)}{\mathrm{Var}(P)}.$$

[...] (1-1), we will not be able to observe the design matrix for the genetic factors. Thus, standard regression techniques cannot be directly applied to estimate and test

PAGE 16

Figure 1-1 illustrates how DNA segments are packed into chromosomes, giving a concrete view of how genes and markers are linearly aligned on the chromosomes. All the gene-carrying chromosomes together form the so-called genome, which is also the total sample space for all functional genes. Genomes vary widely in size from species to species; for instance, the genome of a bacterium may contain as few as 600 thousand DNA base pairs, while human and mouse genomes have about 3 billion base pairs. Mapping a specific locus in a genome like the human's would be an extremely difficult task. For diploid species, the chromosomes always come in pairs (Fig. 1-1), one inherited from the father and the other from the mother. These pairs are called homologous chromosomes. Since routine genotyping procedures to elucidate the genotype of an individual with a biological assay cannot separate the two homologous chromosomes, there

PAGE 17

Figure 1-1. Genome, chromosome and gene: image from the National Human Genome Research Institute (NHGRI) genetic illustrations for chromosome

is no definite way to determine which one of the two homologous chromosomes an allele actually comes from. Thus, for an individual, only the genetic constitution or the specific allele makeup (called the genotype) is directly observable. The genotype information cannot fully reveal the genetic structure of the individual. As shown in Fig. 1-2, the observed double heterozygous genotype MmQq at two loci may actually come from two different diplotypes: MQ|mq or Mq|mQ. This further complicates our analysis, and the analysis can become more difficult as more loci are involved. A haplotype is a combination of alleles at multiple linked loci that are transmitted together on the same chromosome. Two haplotypes form a diplotype, which subsequently corresponds to a genotype (Fig. 1-2). Analysis of the patterns of how haplotypes are

PAGE 18

Figure 1-2. Genotype is the genetic constitution of an individual. A double heterozygous genotype may come from two different diplotypes, which in turn are formed by different haplotypes.

transmitted through generations can help us understand the recombination events occurring between the markers and QTLs, establishing a basis to derive the conditional probability relationship between them (table 1-4 and table 3-1).

1.3.1 History

QTL analysis can date back to as early as 1923, when Sax first discovered that pattern and pigment markers were closely linked with seed size in beans. The rationale behind Sax's QTL analysis is that markers near a functional QTL can absorb part of the QTL effect due to linked inheritance. The closer a marker is to a QTL, the more of the QTL effect this marker absorbs. Thus, a simple regression analysis and ANOVA or maximum likelihood can be applied on a single marker to estimate and test its main effect that may actually derive from a nearby QTL (Soller et al. 1976; Weller 1986, 1987; Simpson 1989). The direct analysis of genetic markers could infer the existence of a potential QTL. However, this analysis is not able to precisely identify the location of the QTL because it

PAGE 22

1-3). Otherwise, a moderate-density map will be sufficient if LD extends over a wide distance. This design is appropriate for mapping QTLs that govern infectious diseases like HIV/AIDS.

Figure 1-3. The extent of LD for appropriate resolution of association studies. (a) LD declines slowly with increasing distance from the gene (circle) responsible for the phenotype on a chromosome. In this case, even a low density of markers (shown as vertical lines) is sufficient to identify associated markers (thick arrows). (b) LD declines very rapidly around the causative gene (circle), and a much greater density of markers is required to identify an associated marker (thick arrow). Adapted from Rafalski (2002).

PAGE 25

(1) Increase the marker density. The more markers we have, the smaller the interval size, and therefore the higher the map resolution. This is technically the easiest approach for increasing the mapping resolution, although it is a labor-intensive and time-consuming "wet lab" undertaking.

(2) Increase the crossover density by increasing current recombinants. By generating more progeny of a mapping population, one can increase the crossover density. However, this is an expensive approach because the number of backcross or F2 progeny required to map a QTL down to 5 cM or less is enormous (5000) (Darvasi 1998). One alternative for increasing the crossover density is to generate advanced intercross lines, such as F3, F4, ..., Fn.

(3) Increase the crossover density by utilizing historical recombinants. This approach capitalizes on the non-random allelic association, i.e., LD, between a QTL and closely linked markers at the population level. In the following section, this approach will be discussed in detail.

PAGE 27

1.4.1 The Mixture Model

Consider a set of n random samples drawn from a mapping population in equilibrium. For an experimental cross, there is no distorted segregation, i.e., marker

PAGE 28

Table 1-1. Genotypic and phenotypic data structure for a small set of random samples from a mapping population

Columns: Subject; Marker (M1, M2, ..., Mm); Heart Rate (y). [Table body not recoverable.]

and QTL segregation follows Mendel's first law. For a natural population, random mating is assumed, leading all genetic loci to be at Hardy-Weinberg equilibrium. Assume that all sampled subjects are genotyped for multiple SNPs and measured for a complex trait. Table 1-1 tabulates part of an assumed marker and phenotypic data set. Three genotypes at each SNP with two alleles M and m are labeled by 2 (MM), 1 (Mm) and 0 (mm). Genetic mapping intends to detect a QTL with two alleles Q and q responsible for a quantitative trait (y) that is associated with the marker considered. The statistical foundation for QTL mapping is laid out in the mixture model (McLachlan and Peel 2000), in which each subject must carry one and only one of the three QTL genotypes, labeled by 2 (QQ), 1 (Qq) and 0 (qq). Thus, the likelihood of the subjects given their trait phenotypes (y) and marker genotypes (M) is expressed as a mixture of density functions, i.e.,

$$L(\omega, \mu, \sigma^2 \mid y, M) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right], \qquad (1\text{-}3)$$

where $\omega = (\omega_{2|i}, \omega_{1|i}, \omega_{0|i})$ is the vector of conditional probabilities of QTL genotypes given the marker genotype of subject i, $\mu = (\mu_2, \mu_1, \mu_0)$ is the vector of genotypic values of the three QTL genotypes, and $\sigma^2$ is the residual variance.
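As a minimal sketch of how the mixture likelihood above can be evaluated, the following Python fragment computes the log-likelihood for a handful of subjects, assuming normal component densities with a common residual variance. The function names, the toy trait values and the conditional probabilities are hypothetical and chosen only for illustration; they are not taken from the dissertation.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_log_likelihood(y, omega, mu, sigma2):
    """Log of prod_i sum_j omega[i][j] * f_j(y_i | mu[j], sigma2).

    y      : trait values, one per subject
    omega  : omega[i][j] = P(QTL genotype j | marker genotype of subject i),
             with j = 0 (qq), 1 (Qq), 2 (QQ)
    mu     : genotypic values (mu_0, mu_1, mu_2)
    sigma2 : common residual variance
    """
    total = 0.0
    for y_i, w_i in zip(y, omega):
        total += math.log(sum(w_i[j] * normal_pdf(y_i, mu[j], sigma2) for j in range(3)))
    return total

# Hypothetical toy data: three subjects with made-up conditional probabilities.
y = [9.8, 12.1, 14.2]
omega = [[0.80, 0.15, 0.05], [0.20, 0.60, 0.20], [0.05, 0.15, 0.80]]
loglik = mixture_log_likelihood(y, omega, mu=(10.0, 12.0, 14.0), sigma2=1.0)
```

When every row of `omega` puts all its mass on one genotype, the function reduces to a sum of ordinary normal log-densities, which is a convenient sanity check.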

PAGE 29

[...] (1-3) provides a framework by which observations may be grouped into different QTL genotypes, and it includes three types of tasks:

(1) Derive the structure of the model including the mixture proportions (denoted as the frequencies of QTL genotypes) and the density functions (expressed in terms of QTL effects and residual variance).

(2) Optimize the setting of QTL mapping from an experimental design perspective.

(3) Develop a statistical method for the estimation of the unknown parameters defined in the mixture model.

The first task is the prerequisite for QTL mapping. In current approaches, two key assumptions have been used to derive the structures of QTL mapping, although they are violated in many practical situations. Below, I will use linkage disequilibrium (LD) mapping as an example to explain these two assumptions.

Hardy-Weinberg Equilibrium (HWE): In LD mapping, the conditional probabilities contained within equation (1-3) are expressed in terms of haplotype frequencies, i.e., $p_{11}$ for MQ, $p_{10}$ for Mq, $p_{01}$ for mQ, and $p_{00}$ for mq. These four haplotypes randomly unite to generate 16 diplotypes (i.e., pairs of haplotypes). Under the HWE, the diplotype frequencies can be expressed as the products of the frequencies of each haplotype that constitutes the diplotype. A total of 16 diplotypes will be collapsed into nine distinguishable genotypes. To this end, the genotype frequencies are tabulated as

$$\begin{array}{c|ccc}
 & QQ & Qq\ (Q|q + q|Q) & qq \\ \hline
MM & p_{11}^2 & 2p_{11}p_{10} & p_{10}^2 \\
Mm & 2p_{11}p_{01} & 2p_{11}p_{00} + 2p_{10}p_{01} & 2p_{10}p_{00} \\
mm & p_{01}^2 & 2p_{01}p_{00} & p_{00}^2
\end{array} \qquad (1\text{-}4)$$

Of these genotypes, double heterozygote MmQq includes two different diplotypes (a set of haplotype pairs, Fig. 1-2) each with a different frequency of formation. If we can observe

PAGE 30

[...] (1-3), each QTL genotype group is specified by a known distribution, depending on the type of the complex trait. For continuous traits, a normal distribution is often assumed (Wu et al. 2007), whereas, for count traits, a Poisson distribution may be an optimal approximation (Ma et al. 2007). By assuming that the variation of a phenotypic trait follows a normal distribution, each of the three possible genotype classes can be described by

$$f_2(y_i \mid \mu_2, \sigma^2) = N(\mu_2, \sigma^2); \quad f_1(y_i \mid \mu_1, \sigma^2) = N(\mu_1, \sigma^2); \quad f_0(y_i \mid \mu_0, \sigma^2) = N(\mu_0, \sigma^2);$$

where $\mu_2$, $\mu_1$ and $\mu_0$ are the phenotypic means for the three QTL genotypes QQ, Qq and qq, respectively, and $\sigma^2$ is the residual variance within each QTL genotype. QTL genotype-specific normal distributions are diagrammed in Fig. 1-4.

[...] (1-3). The detailed procedure is described below:

PAGE 31

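Figure 1-4. A mixture distribution composed of three components. The three red dashed lines are the normal distributions for each component involved and the blue solid line is an observed mixture normal distribution.

Differentiating the log-likelihood of equation (1-3) with respect to any unknown parameter $\Omega_\ell$ yields the likelihood equations

$$\frac{\partial}{\partial \Omega_\ell} \log L(\Omega_p, \Omega_q \mid y, M) = \sum_{i=1}^{n} \sum_{j=0}^{2} \frac{\frac{\partial \omega_{j|i}}{\partial \Omega_\ell} f_j(y_i \mid \mu_j, \sigma^2) + \omega_{j|i} \frac{\partial}{\partial \Omega_\ell} f_j(y_i \mid \mu_j, \sigma^2)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(y_i \mid \mu_{j'}, \sigma^2)} = \sum_{i=1}^{n} \sum_{j=0}^{2} \Pi_{j|i} \left[ \frac{1}{\omega_{j|i}} \frac{\partial \omega_{j|i}}{\partial \Omega_\ell} + \frac{\partial}{\partial \Omega_\ell} \log f_j(y_i \mid \mu_j, \sigma^2) \right],$$

where

$$\Pi_{j|i} = \frac{\omega_{j|i} f_j(y_i)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(y_i)} \qquad (1\text{-}5)$$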

PAGE 32

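[...] (1-3), $\omega_{j|i}$ is the probability that subject i carries QTL genotype j. This (prior) probability takes the values in the above table, depending on the (known) marker genotype that individual i carries. Directly solving the above equation is a difficult task. Fortunately, a closed-form EM algorithm can be derived to estimate the haplotype frequencies by using the calculated posterior probability with the log-likelihood equations:

$$\hat{p}_{11} = \frac{1}{2n} \left[ \sum_{i=1}^{n_2} (2\Pi_{2|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} (\Pi_{2|i} + \phi \Pi_{1|i}) \right],$$

$$\hat{p}_{10} = \frac{1}{2n} \left\{ \sum_{i=1}^{n_2} (\Pi_{1|i} + 2\Pi_{0|i}) + \sum_{i=1}^{n_1} [\Pi_{0|i} + (1 - \phi) \Pi_{1|i}] \right\},$$

$$\hat{p}_{01} = \frac{1}{2n} \left\{ \sum_{i=1}^{n_0} (2\Pi_{2|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} [\Pi_{2|i} + (1 - \phi) \Pi_{1|i}] \right\},$$

$$\hat{p}_{00} = \frac{1}{2n} \left[ \sum_{i=1}^{n_0} (2\Pi_{0|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} (\Pi_{0|i} + \phi \Pi_{1|i}) \right], \qquad (1\text{-}6)$$

where $\phi = \dfrac{p_{11} p_{00}}{p_{11} p_{00} + p_{10} p_{01}}$, and $n_2$, $n_1$ and $n_0$ are the numbers of subjects with marker genotypes MM, Mm and mm, respectively. [...]

$$\hat{\mu}_j = \frac{\sum_{i=1}^{n} \Pi_{j|i} y_i}{\sum_{i=1}^{n} \Pi_{j|i}} \qquad (1\text{-}7)$$

Iteration is made between equations (1-5) (E step) and equations (1-6) and (1-7) (M step). This iterative procedure continues until stable estimates of these parameters are obtained.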
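The E and M steps described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the residual-variance update is the usual posterior-weighted mean square (its formula did not survive in the text above), the expected haplotype counts for Mm subjects split the double heterozygote between MQ|mq and Mq|mQ by the share `phi`, and the simulated data, starting values and all function names are hypothetical.

```python
import math
import random

def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def joint_freqs(p11, p10, p01, p00):
    """HWE joint genotype frequencies; key = marker genotype, list index = QTL
    genotype (2 = MM or QQ, 1 = heterozygote, 0 = mm or qq)."""
    return {
        2: [p10 ** 2, 2 * p11 * p10, p11 ** 2],
        1: [2 * p10 * p00, 2 * (p11 * p00 + p10 * p01), 2 * p11 * p01],
        0: [p00 ** 2, 2 * p01 * p00, p01 ** 2],
    }

def em_step(markers, y, p, mu, sigma2):
    """One E step followed by one M step; returns updated (p, mu, sigma2)."""
    p11, p10, p01, p00 = p
    joint = joint_freqs(p11, p10, p01, p00)
    phi = p11 * p00 / (p11 * p00 + p10 * p01)  # share of MQ|mq among MmQq subjects
    n = len(y)
    hap = [0.0, 0.0, 0.0, 0.0]  # expected counts of MQ, Mq, mQ, mq
    posts = []
    for g, y_i in zip(markers, y):
        # E step: posterior of each QTL genotype. Joint frequencies can replace the
        # conditional ones because the marker-genotype normaliser cancels.
        dens = [joint[g][j] * normal_pdf(y_i, mu[j], sigma2) for j in range(3)]
        total = sum(dens)
        post = [d / total for d in dens]
        posts.append(post)
        if g == 2:    # MM subjects carry only MQ or Mq haplotypes
            hap[0] += 2 * post[2] + post[1]
            hap[1] += post[1] + 2 * post[0]
        elif g == 1:  # Mm subjects
            hap[0] += post[2] + phi * post[1]
            hap[1] += post[0] + (1 - phi) * post[1]
            hap[2] += post[2] + (1 - phi) * post[1]
            hap[3] += post[0] + phi * post[1]
        else:         # mm subjects carry only mQ or mq haplotypes
            hap[2] += 2 * post[2] + post[1]
            hap[3] += 2 * post[0] + post[1]
    # M step: haplotype frequencies, genotypic means, residual variance.
    new_p = tuple(h / (2.0 * n) for h in hap)
    new_mu = [sum(ps[j] * y_i for ps, y_i in zip(posts, y)) / sum(ps[j] for ps in posts)
              for j in range(3)]
    new_sigma2 = sum(ps[j] * (y_i - new_mu[j]) ** 2
                     for ps, y_i in zip(posts, y) for j in range(3)) / n
    return new_p, new_mu, new_sigma2

def simulate(n, p, mu, sigma2, seed=0):
    """Hypothetical data generator: two haplotypes per subject, normal trait."""
    rng = random.Random(seed)
    haps = [(1, 1), (1, 0), (0, 1), (0, 0)]  # (carries M, carries Q) for MQ, Mq, mQ, mq
    markers, y = [], []
    for _ in range(n):
        h1, h2 = rng.choices(haps, weights=p, k=2)
        markers.append(h1[0] + h2[0])
        y.append(rng.gauss(mu[h1[1] + h2[1]], math.sqrt(sigma2)))
    return markers, y

markers, y = simulate(500, p=(0.4, 0.1, 0.1, 0.4), mu=(8.0, 10.0, 12.0), sigma2=1.0)
p, mu, sigma2 = (0.3, 0.2, 0.2, 0.3), [min(y), sum(y) / len(y), max(y)], 1.0
for _ in range(200):
    p, mu, sigma2 = em_step(markers, y, p, mu, sigma2)
```

A fixed number of iterations is used here for brevity; in practice one would stop when the change in the estimates or in the log-likelihood falls below a tolerance. A useful invariant is that the estimated frequency of allele M, `p[0] + p[1]`, equals the observed marker allele frequency after every M step.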

PAGE 33

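1.4.4.1 Marker-QTL association

Whether or not the marker considered can detect a significant QTL can be tested on the association between the marker and QTL. Traditional LD mapping is based on gametic linkage disequilibrium that describes the non-random association between alleles at different loci within a haplotype. According to this definition, haplotype frequencies can be partitioned into different components expressed as

$$p_{11} = pq + D, \quad p_{10} = p(1-q) - D, \quad p_{01} = (1-p)q - D, \quad p_{00} = (1-p)(1-q) + D,$$

where p and q are the allele frequencies of M and Q, whereas the allele frequencies of m and q are $1-p$ and $1-q$, and D is the gametic linkage disequilibrium between the marker and QTL. From these expressions, it can be shown that $D = p_{11}p_{00} - p_{10}p_{01}$. Under linkage equilibrium, the gamete frequencies are expressed as

$$p_{11} = pq, \quad p_{10} = p(1-q), \quad p_{01} = (1-p)q, \quad p_{00} = (1-p)(1-q).$$

Under linkage equilibrium, the genotype frequencies are tabulated as

$$\begin{array}{c|ccc}
 & QQ & Qq & qq \\ \hline
MM & p^2 q^2 & 2p^2 q(1-q) & p^2 (1-q)^2 \\
Mm & 2p(1-p) q^2 & 4p(1-p) q(1-q) & 2p(1-p)(1-q)^2 \\
mm & (1-p)^2 q^2 & 2(1-p)^2 q(1-q) & (1-p)^2 (1-q)^2
\end{array}$$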
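The relationship between allele frequencies, D and haplotype frequencies can be checked numerically. The sketch below assumes the standard two-locus decomposition (haplotype frequency equals the product of allele frequencies plus or minus D) and recovers D as $p_{11}p_{00} - p_{10}p_{01}$; the function names and the numerical values are hypothetical.

```python
def haplotype_freqs(p, q, D):
    """Haplotype frequencies (p11, p10, p01, p00) for MQ, Mq, mQ, mq, given the
    frequency p of allele M, the frequency q of allele Q, and gametic LD D."""
    return (p * q + D, p * (1 - q) - D, (1 - p) * q - D, (1 - p) * (1 - q) + D)

def gametic_ld(p11, p10, p01, p00):
    """Gametic linkage disequilibrium recovered from haplotype frequencies."""
    return p11 * p00 - p10 * p01

# Hypothetical values chosen so that all four frequencies stay positive.
p11, p10, p01, p00 = haplotype_freqs(p=0.6, q=0.3, D=0.05)
D_recovered = gametic_ld(p11, p10, p01, p00)
```

Setting `D=0` returns the linkage-equilibrium haplotype frequencies, so the same helper covers both the full and the reduced model.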

PAGE 34

[...] (1-3) is rewritten as

$$\ldots\, (1-p)^2 f_1(y_i) + (1-p)^2 (1-q)^2 \,\ldots$$

[...] (1-13)

The log-likelihood ratio test statistic for the significance of LD is calculated by comparing the likelihood values under the H1 (full model, equation (2-24)) and H0 (reduced model, equation (1-12)) using

$$LR_1 = -2 [\log L_0(\tilde{\omega}, \tilde{\mu}, \tilde{\sigma}^2 \mid y, M) - \log L_1(\hat{\omega}, \hat{\mu}, \hat{\sigma}^2 \mid y, M)], \qquad (1\text{-}14)$$

where the tildes and hats denote the MLEs of parameters under the null and alternative hypotheses of (1-13), respectively. By comparing the structures of likelihoods (2-24) and (1-12), the latter (null hypothesis) is nested within the former (alternative hypothesis). Thus, the $LR_1$ is considered to asymptotically follow a $\chi^2$ distribution with one degree


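(2-24) and (1-12) will be reduced to $\prod_{i=1}^{n} f(y_i)$ if the QTL effect is not significant, where $f(y_i)$ has mean $\mu$, the overall mean. The effect of the QTL can be tested using the following hypotheses expressed as

$$H_0: \mu_2 = \mu_1 = \mu_0 \equiv \mu \quad \text{versus} \quad H_1: \text{at least one of the equalities does not hold}. \qquad (1\text{-}16)$$

The likelihood under the alternative hypothesis of (1-16) has the same form as (2-24). The likelihood under its null hypothesis is expressed as

$$L_0(\mu,\sigma^2 \mid y) = \prod_{i=1}^{n} f(y_i;\mu,\sigma^2),$$

which is not related to marker information. The null hypothesis of (1-16) is not nested within its alternative, because the $D$ value is fixed in the alternative whereas it is free in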


$$LR_2 = -2\left[\log L_0(\tilde{\mu},\tilde{\sigma}^2 \mid y) - \log L_1(\hat{\omega},\hat{\mu},\hat{a},\hat{d},\hat{\sigma}^2 \mid y, M)\right], \qquad (1\text{-}18)$$

where the tildes and hats denote the MLEs of parameters under the null and alternative hypotheses of (1-16), respectively. $LR_2$ may not follow a $\chi^2$ distribution with two degrees of freedom. The threshold for the significance of the QTL effect can be determined from permutation tests (Churchill and Doerge 1994). The additive and dominance effects of the significant QTL can be tested separately by formulating the null hypothesis $H_0: a = 0$ or $H_0: d = 0$, respectively. The estimates of quantitative genetic parameters under these null hypotheses can be made with the same EM procedure as described in equations (1-7), except for a constraint expressed as $\tilde{\mu}_2 = \tilde{\mu}_0$ for $H_0: a = 0$, and $\tilde{\mu}_1 = \frac{1}{2}(\tilde{\mu}_2 + \tilde{\mu}_0)$ for $H_0: d = 0$. In both cases, the calculated log-likelihood ratios follow a $\chi^2$ distribution with one degree of freedom.
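A permutation threshold of the kind cited above (Churchill and Doerge 1994) can be sketched as follows: the phenotypes are shuffled against the marker data many times, the test statistic is recomputed for each shuffle, and an upper quantile of these null values is taken as the critical value. This is a minimal sketch, not the dissertation's implementation. The toy statistic `mean_difference`, the function names and the simulated data are assumptions; in a real analysis the statistic would be the log-likelihood ratio (maximized over the genome in a genome-wide scan).

```python
import random


def mean_difference(phenotypes, marker_groups):
    """Toy statistic (an assumption): absolute difference in group means."""
    g1 = [y for y, g in zip(phenotypes, marker_groups) if g == 1]
    g0 = [y for y, g in zip(phenotypes, marker_groups) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def permutation_threshold(phenotypes, marker_groups, statistic,
                          n_perm=500, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the statistic under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any phenotype-marker association
        null_stats.append(statistic(shuffled, marker_groups))
    null_stats.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_stats[index]


# Hypothetical data: two marker groups whose trait means differ by 3.
rng = random.Random(1)
markers = [0] * 50 + [1] * 50
trait = ([rng.gauss(0.0, 1.0) for _ in range(50)]
         + [rng.gauss(3.0, 1.0) for _ in range(50)])
observed = mean_difference(trait, markers)
threshold = permutation_threshold(trait, markers, mean_difference)
print(observed > threshold)
```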


(1) Incorporate genetically meaningful parameters that define the mixture proportions of QTL genotypes into the mixture model, and then derive a general approach for the estimation of population genetic parameters (Chapter 2);
(2) Propose a general model for approximating the distribution of a univariate phenotypic trait by minimizing the integrated square error (Chapter 3);
(3) Extend the methodology developed in Chapter 3 to the multivariate setting and derive a general procedure for the genome-wide enumeration of QTL that control a complex trait (Chapter 4).

I will perform extensive simulation studies to investigate the statistical properties of my robust mapping model in terms of its estimation precision and power for QTL detection. Real examples will be used to demonstrate the utilization of the model and validate its usefulness in practical genomic projects.


$k(k+1)/2 - 1$ parameters are needed to describe the full model. Thus, the test statistic yields a $\chi^2$ test with $k(k-1)/2$ degrees of freedom. In particular, a marker with two alleles would yield a $\chi^2$ test with one degree of freedom. Typically, a violation of HWE is viewed as a signal of genotyping errors in unrelated individuals (Gomes et al. 1999; Hosking et al. 2004), but this is not true in all cases. For example, in a hybrid population arising from the mixing of genes between two or more populations, alleles derived from the same population tend to cluster together, either because of the Wahlund effect (1928, section 2.2.2), or because of strong selection against hybrids, or because of both. These types of HWD arise from the heterogeneity of the population, but within each subpopulation HWE is still satisfied. So the resulting Hardy-Weinberg disequilibrium (HWD) at individual loci, or at haplotypes constructed from multiple loci, may be maintained in the mixed population, and it would be inappropriate to discard a locus that shows HWD in the whole population.
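For the biallelic case mentioned above, the HWE test can be sketched in a few lines of Python. The example computes the usual Pearson chi-square statistic from observed genotype counts and the k(k-1)/2 degrees of freedom for a marker with k alleles. The function names and the counts are hypothetical, and the marker is assumed to be polymorphic (so that no expected count is zero).

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square statistic for HWE at a biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # estimated frequency of allele A
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def hwe_degrees_of_freedom(k):
    """Degrees of freedom of the HWE test for a marker with k alleles."""
    return k * (k - 1) // 2


# Hypothetical counts with a heterozygote deficit (p = 0.6, n = 100).
print(hwe_chi_square(46, 28, 26), hwe_degrees_of_freedom(2))
```

A heterozygote deficit of this kind is what a mixture of subpopulations would produce, so a large statistic alone does not identify genotyping error.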


2.2.1 Hardy-Weinberg Equilibrium

A natural population sampled to map a QTL based on linkage disequilibrium consists of offspring from a mixture of different mating types, and the ratios of the different genotypes in such a population are weighted averages of the segregation ratios of the different mating types, the weights being the relative frequencies of the different mating types. When the mating type frequencies arise from random mating, the ratios of the different genotypes follow a mathematical model established independently by the English mathematician Hardy (1908) and the German physician Weinberg (1908). Consider a locus with allele A at a frequency of $p$ and allele a at a frequency of $1-p$. Let the frequencies of the three genotypes AA, Aa and aa in a large population at generation $t$ be $P_2$, $P_1$ and $P_0$, which sum to one. Hardy and Weinberg's result was that, if individuals in the population mated with each other at random, these frequencies would satisfy the relationship

$$P_2 = p^2,\quad P_1 = 2p(1-p),\quad P_0 = (1-p)^2.$$

When this relationship holds, the following two theorems can be proven:


where $t$ denotes a generation of the population.


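Figure: The Wahlund effect. Population 1 (mixing proportion $\lambda$) obeys HWE, and so does population 2. However, when these two populations mix together, the overall population is not in HWE.

The heterozygote frequency in the mixed population ($H_1$) and the heterozygote frequency expected under HWE ($H_0$) are given by

$$H_1 = \lambda\,2p_1(1-p_1) + (1-\lambda)\,2p_2(1-p_2) = 2\left[\lambda p_1(1-p_1) + (1-\lambda)p_2(1-p_2)\right],$$
$$H_0 = 2p(1-p) = 2\left[p - p^2\right].$$

We have $H_0 \ge H_1$, since

$$\begin{aligned}
p - p^2 &= \lambda p_1 + (1-\lambda)p_2 - \left[\lambda p_1 + (1-\lambda)p_2\right]^2 \\
&= \lambda p_1(1-p_1) + (1-\lambda)p_2(1-p_2) + \lambda p_1^2 + (1-\lambda)p_2^2 - \left[\lambda p_1 + (1-\lambda)p_2\right]^2 \\
&= \lambda p_1(1-p_1) + (1-\lambda)p_2(1-p_2) + \lambda(1-\lambda)\left[p_1^2 + p_2^2 - 2p_1p_2\right] \\
&\ge \lambda p_1(1-p_1) + (1-\lambda)p_2(1-p_2).
\end{aligned}$$

This is the so-called Wahlund effect, which can also be proved by Jensen's inequality.

Proof: Let the random variable $X$ be equal to $p_1$ with probability $\lambda$ and $p_2$ with probability $(1-\lambda)$, so that $E(X) = \lambda p_1 + (1-\lambda)p_2 = p$. Let $g(x) = 2x(1-x)$, which is a concave function. Then $g(E(X)) = 2p(1-p)$ and $E[g(X)] = 2\left[\lambda p_1(1-p_1) + (1-\lambda)p_2(1-p_2)\right]$. By Jensen's inequality,

$$g(E(X)) \ge E[g(X)] \;\Rightarrow\; p - p^2 \ge \lambda p_1(1-p_1) + (1-\lambda)p_2(1-p_2).$$

The Wahlund effect is one of the major causes for a studied population to be in HWD.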
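The Wahlund effect can be illustrated numerically. The sketch below mixes two subpopulations that are each in HWE and compares the heterozygosity expected under HWE in the pooled population with the heterozygosity actually present in the mixture. The deficit equals 2w(1-w)(p1-p2)^2, where w is the mixing proportion. The function names, the argument name `w` and the example values are assumptions.

```python
def heterozygosity(p):
    """Heterozygote frequency under HWE for allele frequency p."""
    return 2 * p * (1 - p)


def wahlund(p1, p2, w):
    """Return (H0, H1, deficit) for a mixture with proportion w from population 1."""
    p_bar = w * p1 + (1 - w) * p2
    h_expected = heterozygosity(p_bar)                                # H0
    h_mixture = w * heterozygosity(p1) + (1 - w) * heterozygosity(p2)  # H1
    deficit = 2 * w * (1 - w) * (p1 - p2) ** 2                        # H0 - H1
    return h_expected, h_mixture, deficit


# Illustrative values: two equally sized subpopulations with p1 = 0.2, p2 = 0.8.
print(wahlund(0.2, 0.8, 0.5))
```

The deficit vanishes only when the two subpopulations share the same allele frequency or when one of them contributes nothing to the mixture.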


If we assume that the allelic frequencies are constant from generation to generation, that is, that $p(t)q(t) = pq$, etc., and we define $D(t) = p_{11}(t) - pq$, then we can write (2-4) as $D(t+1) = (1-r)D(t)$;


where we define $D(1) = p_{11} - pq$. We conclude that $D(t+1)$ converges to 0 at the geometric rate $1-r$. Thus, linkage equilibrium, or gamete phase equilibrium, $p_{11} = pq$, is approached gradually and without oscillation. The larger $r$ is, the faster the rate of convergence, the most rapid being $(1/2)^t$ for unlinked loci.

The principle of linkage disequilibrium decaying with generations builds up an alternative mapping strategy (Weiss and Clark 2002). The approach of using linkage disequilibrium to construct a high-density linkage map allows us to increase the sample size by using all the recombination events occurring between different markers since the origin of the mutation, instead of only those events in the past few generations of a family. According to equation (2-5), a significant $D(t+1)$ value detected in the current generation implies that the decay factor after $t$ generations, $(1-r)^t$, should be significantly different from zero. This can occur only when $r \to 0$ under the assumption that the disequilibrium was generated a long time ago ($t$ is large). This assumption is plausible because it does take mutations/LD a long time to spread in a population. Therefore, linkage disequilibrium analysis provides an important tool for fine mapping of genes affecting a quantitative trait.
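The geometric decay of LD is easy to explore numerically. The following sketch applies the factor (1 - r) once per generation and also inverts the relationship to ask how many generations are needed for D to fall to a given level. The function names and the example values are assumptions; the recombination fraction is assumed to satisfy 0 < r < 1.

```python
import math


def ld_after_generations(d_initial, r, t):
    """Gametic LD after t generations of random mating: D * (1 - r)^t."""
    return d_initial * (1.0 - r) ** t


def generations_until(d_initial, d_target, r):
    """Generations needed for LD to decay from d_initial to d_target."""
    return math.log(d_target / d_initial) / math.log(1.0 - r)


# Illustrative values: after 100 generations, only tightly linked loci retain LD.
for r in (0.5, 0.1, 0.01, 0.001):
    print(r, ld_after_generations(0.05, r, 100))
```

With a large t, the printed values are close to zero unless r is very small, which is the reasoning behind using LD for fine mapping.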


(2-6) provides a simple way to test the significance of LD. It can be shown that $\chi^2 = 2nr^2$ has a chi-square distribution with one degree of freedom. This shows a nice feature of $r^2$.

2.3.1 Notation and Definitions

Consider a marker with alleles M (at a frequency of $p$) and m (at a frequency of $1-p$) as well as an associated QTL with two alleles Q (at a frequency of $q$) and q (at a frequency of $1-q$). Let $P_2$, $P_1$, $P_0$ and $Q_2$, $Q_1$, $Q_0$ be the frequencies of genotypes at the marker and QTL in a natural population, respectively. The frequencies of four haplotypes formed by the marker and QTL are denoted as $p_{11}$ for MQ (1), $p_{10}$ for Mq (2), $p_{01}$ for mQ (3) and $p_{00}$ for mq (4). The HWD at the haplotype level implies that the diplotype frequencies


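Diplotype    Frequency
MQ|MQ        $p_{11}^2 + D_{11}$
MQ|Mq        $2p_{11}p_{10} + D_{12}$
MQ|mQ        $2p_{11}p_{01} + D_{13}$
MQ|mq        $2p_{11}p_{00} + D_{14}$
Mq|Mq        $p_{10}^2 + D_{22}$
Mq|mQ        $2p_{10}p_{01} + D_{23}$
Mq|mq        $2p_{10}p_{00} + D_{24}$
mQ|mQ        $p_{01}^2 + D_{33}$
mQ|mq        $2p_{01}p_{00} + D_{34}$
mq|mq        $p_{00}^2 + D_{44}$

where the $D_{\cdot\cdot}$'s are the coefficients of HWD for a pair of haplotypes. From the above, we express the frequency of haplotype MQ as

$$\begin{aligned}
p_{11} &= \tfrac{1}{2}\left[2(p_{11}^2 + D_{11}) + 2p_{11}p_{10} + D_{12} + 2p_{11}p_{01} + D_{13} + 2p_{11}p_{00} + D_{14}\right] \\
&= (p_{11}^2 + p_{11}p_{10} + p_{11}p_{01} + p_{11}p_{00}) + D_{11} + \tfrac{1}{2}(D_{12} + D_{13} + D_{14}) \\
&= p_{11} + D_{11} + \tfrac{1}{2}(D_{12} + D_{13} + D_{14}),
\end{aligned}$$

which leads to

$$D_{11} = -\tfrac{1}{2}(D_{12} + D_{13} + D_{14}).$$

Similarly, we can obtain

$$D_{22} = -\tfrac{1}{2}(D_{12} + D_{23} + D_{24}),\quad D_{33} = -\tfrac{1}{2}(D_{13} + D_{23} + D_{34}),\quad D_{44} = -\tfrac{1}{2}(D_{14} + D_{24} + D_{34}).$$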
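The constraint that each homozygote coefficient equals minus one half of the sum of the heterozygote coefficients involving the same haplotype can be verified numerically. The sketch below builds the ten diplotype frequencies from four haplotype frequencies and a set of heterozygote HWD coefficients, using heterozygote frequencies of the form 2*p_k*p_l + D_kl, and checks that the haplotype frequencies are recovered as marginals. The function names, the 0-based haplotype indexing (0 = MQ, 1 = Mq, 2 = mQ, 3 = mq) and the coefficient values are assumptions chosen for illustration.

```python
from itertools import combinations


def diplotype_frequencies(hap, d_het):
    """hap: haplotype frequencies; d_het: {(k, l): D_kl} with k < l (0-based)."""
    n = len(hap)

    def d(k, l):
        return d_het.get((min(k, l), max(k, l)), 0.0)

    freqs = {}
    for k in range(n):
        # Homozygote coefficient implied by the heterozygote coefficients.
        d_kk = -0.5 * sum(d(k, l) for l in range(n) if l != k)
        freqs[(k, k)] = hap[k] ** 2 + d_kk
    for k, l in combinations(range(n), 2):
        freqs[(k, l)] = 2.0 * hap[k] * hap[l] + d(k, l)
    return freqs


def haplotype_marginals(freqs, n=4):
    """Recover haplotype frequencies from diplotype frequencies."""
    out = []
    for k in range(n):
        het = sum(f for (a, b), f in freqs.items() if a != b and k in (a, b))
        out.append(freqs[(k, k)] + 0.5 * het)
    return out


# Hypothetical values: haplotype frequencies for p = q = 0.6, D = 0.05.
hap = [0.41, 0.19, 0.19, 0.21]
freqs = diplotype_frequencies(hap, {(0, 1): 0.02, (1, 2): -0.01})
print(sum(freqs.values()), haplotype_marginals(freqs))
```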


Display (2-7), joint marker-QTL genotype frequencies $P_{22}$, $P_{21}$, $P_{20}$, ... (QTL genotypic values $\mu_2$, $\mu_1$, $\mu_0$); recoverable entries: $2p_{11}p_{10} + 2D_{12}$, $2p_{11}p_{01} + 2D_{13}$, $2p_{10}p_{00} + 2D_{24}$, $2p_{01}p_{00} + 2D_{34}$.

Current population genetic models for LD mapping assume that the haplotypes constructed from individual loci are in HWE. With this assumption, joint QTL-marker genotype frequencies can be expressed as the products of the corresponding haplotype frequencies, as shown in equation (1-4). According to traditional quantitative genetic theory, if individual loci are assumed to be at HWE, then the haplotypes constructed from these loci are also at HWE (Lynch and Walsh 1998). Below, we show the inconsistency of HWE at individual loci and their haplotypes.

Proposition: The haplotypes may not be in HWE even if the individual loci that constitute the haplotypes are at HWE.

Proof: If a population is in HWE at the haplotype level, then we will have $D_{12} = D_{13} = D_{14} = D_{23} = D_{24} = D_{34} = 0$. From display (2-7), we calculate the genotype frequencies as


(2-8)

From equations (2-8) and (2-9), it is not possible to assure $D_{12} = D_{13} = D_{14} = D_{23} = D_{24} = D_{34} = 0$. However, if the haplotypes are at HWE, individual loci must be at HWE.


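2.3.2.1 Linear model

The phenotypic value of a complex trait for subject $i$ at the putative QTL is expressed as

$$y_i = \sum_{j=0}^{2}\xi_{ij}\mu_j + e_i,\quad i = 1,\ldots,n,$$

where $\xi_{ij}$ is the indicator variable defined as 1 if subject $i$ has QTL genotype $j$ ($j = 2$ for QQ, 1 for Qq and 0 for qq) and 0 otherwise, and $e_i$ is the residual error, normally distributed as $N(0,\sigma^2)$. We can decompose the QTL genotypic values $\mu_j$ into their underlying genetic effects by defining $\mu_2 = \mu + a$, $\mu_1 = \mu + d$ and $\mu_0 = \mu - a$, where $\mu$ is the overall mean, $a$ is the additive effect, and $d$ is the dominance effect.

$$\log L = \sum_{i=1}^{n}\log\left\{\sum_{j=0}^{2}\omega_{j|i}\,\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2\sigma^2}(y_i - \mu_j)^2\right]\right\},$$

where $\omega_{j|i}$ denotes the conditional probability of individual $i$ containing QTL genotype $j$ given its marker information.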


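$$\cdots\; -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y_i - \mu_j)^2 + \log(\omega_{j|i})$$

Now, suppose the parameters at step $r$ are $\Theta^{(r)} = (\Theta_p^{(r)}, \Theta_q^{(r)})$; then the conditional density of $z_i$ given $y_i$ at step $r$ is

$$p(z_i = j \mid y_i, \Theta^{(r)}) = \frac{f(y_i, z_i = j; \Theta^{(r)})}{f(y_i; \Theta^{(r)})}$$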


$$E[\ell_c] = \sum_{k=0}^{2}\sum_{i=1}^{n_k}\sum_{j=0}^{2}\Pi_{jk|i}\left[-\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y_i - \mu_j)^2 + \log(\omega_{j|i})\right]$$

In the M-step, we need to maximize the above $E[\ell_c]$ function with respect to $(\Theta_p, \Theta_q)$. Marker-QTL diplotype frequencies, as shown in equation (2-7), are estimated by equation (2-11). Quantitative genetic parameters $\Theta_q = (\mu_2, \mu_1, \mu_0, \sigma^2)$ are estimated by equation (2-12). The E and M steps are iterated between equations (2-10) and (2-11) and (2-12) until the estimates are stable. The stable values are the maximum likelihood estimates (MLEs) of parameters.


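(2-7), it is easy to obtain the MLEs of allele frequencies at the marker and QTL and their haplotype frequencies as

$$p = (P_{22} + P_{12} + P_{02}) + \tfrac{1}{2}(P_{21} + P_{11} + P_{01}),$$
$$q = (P_{22} + P_{21} + P_{20}) + \tfrac{1}{2}(P_{12} + P_{11} + P_{10}),$$
$$p_{11} = P_{22} + \tfrac{1}{2}(P_{12} + P_{21} + P_{11}),$$
$$p_{10} = P_{02} + \tfrac{1}{2}(P_{12} + P_{01} + P_{101}),$$
$$p_{01} = P_{20} + \tfrac{1}{2}(P_{21} + P_{10} + P_{101}),$$
$$p_{00} = P_{00} + \tfrac{1}{2}(P_{01} + P_{10} + P_{11}).$$

The gametic LD and six haplotype HWD coefficients are estimated by equations (2-13) to (2-19). The log-likelihood ratio is then estimated and compared with the predetermined threshold from permutation tests. The additive and dominance effects can also be tested individually to explore the inheritance mode of the QTL for the complex trait.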


The log-likelihood ratio calculated under these hypotheses follows a $\chi^2$ distribution with one degree of freedom. The estimates of the parameters under the null hypothesis of (2-21) can be obtained with the EM algorithm derived above, with a constraint expressed as

$$P_{22} + \tfrac{1}{2}(P_{12} + P_{21} + P_{11}) = \left[(P_{22} + P_{12} + P_{02}) + \tfrac{1}{2}(P_{21} + P_{11} + P_{01})\right]\left[(P_{22} + P_{21} + P_{20}) + \tfrac{1}{2}(P_{12} + P_{11} + P_{10})\right].$$

Also, the six HWD coefficients at the haplotype level are tested by the following hypotheses:

$$H_0: D_{12} = D_{13} = D_{14} = D_{23} = D_{24} = D_{34} = 0 \quad \text{versus} \quad H_1: \text{at least one of the equalities does not hold}.$$

The log-likelihood ratio calculated under these hypotheses follows a $\chi^2$ distribution with six degrees of freedom (equal to the number of disequilibria). All these different HWD coefficients can be tested separately or jointly. The estimates of all the parameters except for those disequilibria tested can be obtained from the EM algorithm described above, with the added constraints that the tested parameters are set equal to zero using equations (2-14), (2-15), (2-16), (2-17), (2-18), or (2-19). The log-likelihood ratio for each of these disequilibrium tests is $\chi^2$-distributed.


2.3.4.1 Schemes

To show the robustness of the new model, I performed Monte Carlo simulation studies by designing two different schemes. Both schemes simulate marker and phenotypic data; the first scheme assumes that the population is in non-equilibrium (i.e., in Hardy-Weinberg disequilibrium, HWD, at the haplotype level), whereas the second scheme assumes that the population is in HWE. The two sets of data under the different schemes are then analyzed by my newly developed model (referred to as the non-equilibrium model) and the traditional LD model (referred to as the equilibrium model). The results from the different sets of data analyzed by the different models are then compared.

The HWD data were simulated by assuming that diplotype frequencies for a molecular marker and its associated QTL are not just the products of the corresponding haplotype frequencies derived from male and female parents. I used the genotype frequencies shown in equation (2-7) to simulate the distribution of genotypes in a non-equilibrium population. The population genetic parameters used for my simulation are $p = 0.6$ and $q = 0.6$ for the allele frequencies of the marker and QTL, $D = 0.05$ for the gametic linkage disequilibrium between the marker and QTL, and $D_{12} = 0.05$, $D_{23} = 0.05$, $D_{13} = D_{14} = D_{24} = D_{34} = 0$ for the coefficients of HWD at the haplotype level. A positive $D$ value implies that the common alleles of the marker (M) and QTL (Q) are in coupling phase. The signs of the haplotype HWD coefficients are related to the differences of homozygote and heterozygote frequencies in the simulated population from those expected at haplotype HWE.

The phenotypic values of a quantitative trait were expressed as the sum of genotypic values and residual errors. The genotypic value for a QTL genotype is determined by the overall mean ($\mu = 15$), additive effect ($a = 5$), and dominance effect ($d = 0$). The residual error is assumed to follow a normal distribution with mean zero and variance $\sigma^2$, which is scaled to meet a predetermined heritability level. In this simulation, I assume


(2-7). Three levels of sample size are assumed, $n$ = 250, 500, and 1000.

The results are given in Tables 2-2 and 2-3. It is not surprising that both models provide precise and accurate estimates of the allele frequency at the marker, because the marker genotypes are observable. Both models also provide reasonably precise estimates of the allele frequency at the QTL, but the non-equilibrium model seems to perform better than its counterpart when the sample size is relatively small (250). The non-equilibrium model is precise enough to estimate all the disequilibrium coefficients, including the gametic LD and HWD at the haplotype level, although the estimation precision increases with sample size. These disequilibrium parameters and the QTL allele frequency are population genetic parameters, whose estimates are affected more strongly by sample size than by the heritability of the QTL (Table 2-2).

The non-equilibrium model can generally estimate well the additive and dominance effects of the QTL hidden in a non-equilibrium population, as well as the residual variance. It appears that the equilibrium model provides better estimates of these parameters, although the data were derived from a haplotype HWD population. However, the estimation precision of the non-equilibrium model can greatly increase with increasing sample size and heritability. Unlike population genetic parameters, quantitative genetic parameters (including QTL effects and the residual variance) are more sensitive


Table 2-2. Estimation of population and quantitative genetic parameters in a HWD population at the haplotype level analyzed by the non-equilibrium model. Numbers in parentheses are the standard errors of the estimates.
Columns: true value; $H^2$ = 0.4, 0.2 and 0.1, each with $n$ = 250, 500 and 1000.


Table 2-3. Estimation of population and quantitative genetic parameters in a HWD population at the haplotype level analyzed by the equilibrium model. Numbers in parentheses are the standard errors of the estimates.
Columns: true value; $H^2$ = 0.4, 0.2 and 0.1, each with $n$ = 250, 500 and 1000.


(Table 2-3). The power of QTL detection by the non-equilibrium model is reasonably high, ranging from 0.30 for a small heritability and small sample size, to 0.42 for a small heritability but large sample size, to 0.53 for a large heritability but small sample size, to 0.68 for a large heritability and large sample size. The power to detect a QTL in a non-equilibrium population by the traditional equilibrium model is low, regardless of increasing heritability and sample size.

In the second simulation scheme, the data were simulated by assuming no haplotype HWD. This is a special case of the first scheme, obtained by letting $D_{12} = D_{23} = D_{13} = D_{14} = D_{24} = D_{34} = 0$. For all the other parameters, I took the same values as used in the first scheme. The simulated data were subject to analysis by the non-equilibrium and equilibrium models. Overall, the non-equilibrium model can provide good estimates of all genetic parameters, especially when there is a large sample size and/or a large heritability (Table 2-4). Also, the power of QTL detection by the non-equilibrium model is reasonably high, increasing with heritability and sample size. However, the equilibrium model seems to provide better estimates and power for an equilibrium population than the non-equilibrium model. Thus, it is recommended that for such an equilibrium population the equilibrium model be used.

In practice, there is no information about the nature of a mapping population, which affects the decision of choosing a more appropriate model. This question can be addressed by introducing model selection criteria. After the estimation of parameters by the different models, we further calculate AIC or BIC values. A lower AIC or BIC value corresponds to a more appropriate model.
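The model choice described above can be sketched with the standard definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of free parameters and n the sample size. The log-likelihood values and parameter counts below are hypothetical and serve only to show the mechanics; they are not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def preferred_model(fits, n_obs, criterion="bic"):
    """fits: {name: (log_likelihood, n_params)}; return the name with the lowest score."""
    def score(item):
        ll, k = item[1]
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n_obs)
    return min(fits.items(), key=score)[0]


# Hypothetical fits, for illustration only.
fits = {"equilibrium": (-1250.0, 7), "non-equilibrium": (-1248.5, 12)}
print(preferred_model(fits, n_obs=500, criterion="aic"),
      preferred_model(fits, n_obs=500, criterion="bic"))
```

In this made-up example the small gain in log-likelihood does not justify the extra parameters, so both criteria favor the simpler model.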


Table 2-4. Estimation of population and quantitative genetic parameters in a HWE population analyzed by the non-equilibrium model. Numbers in parentheses are the standard errors of the estimates.
Columns: true value; $H^2$ = 0.4, 0.2 and 0.1, each with $n$ = 250, 500 and 1000.


Table 2-5. Estimation of population and quantitative genetic parameters in a HWE population analyzed by the equilibrium model. Numbers in parentheses are the standard errors of the estimates.
Columns: true value; $H^2$ = 0.4, 0.2 and 0.1, each with $n$ = 250, 500 and 1000.


2.4.1 Introduction

In section 2.3, the non-equilibrium property of a mapping population was described by specifying Hardy-Weinberg disequilibrium at the haplotype level. This treatment leads to the structure of marker-QTL genotype frequencies shown in equation (2-7). The non-equilibrium property of a population can also be described by specifying marker-QTL nonrandom associations at the zygotic level. Several early papers have examined possible genetic and evolutionary causes for zygotic associations in a non-equilibrium population (Haldane 1949; Bennett and Binet 1956; Charlesworth 1991; Barton and Gale 1993). Also, the theory of zygotic disequilibria has been well developed in the literature (Weir 1996; Yang 2000, 2002). However, zygotic disequilibria have not been applied to the genetic mapping of complex traits, although many mapping populations may show zygotic associations.

Zygotic nonrandom associations can be divided into different components, i.e., HWD at individual loci, gametic and non-gametic linkage disequilibria, and trigenic and quadrigenic linkage disequilibria between the marker and QTL. By testing these different types of disequilibria, we can generalize a framework for inferring the existence of the underlying QTL for a complex trait. Liu et al. (2006) used these five types of disequilibria to study the genetic structure of a hybrid population in canines. They examined the relationships of composite gametic and non-gametic disequilibrium, trigenic disequilibrium, and quadrigenic disequilibrium with genetic distance throughout the canine genome.

In this chapter, I will first outline the description of nonrandom association and then propose a statistical model for mapping complex traits by considering nonrandom associations. Simulation studies will be used to demonstrate the statistical properties of the new model.


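        QQ                                   Qq                                              qq
MM      $P_{22} = P_2Q_2 + D_1$              $P_{12} = P_2Q_1 - (D_1 + D_2)$                 $P_{02} = P_2Q_0 + D_2$
Mm      $P_{21} = P_1Q_2 - (D_1 + D_3)$      $P_{11} = P_1Q_1 + (D_1 + D_2 + D_3 + D_4)$     $P_{01} = P_1Q_0 - (D_2 + D_4)$
mm      $P_{20} = P_0Q_2 + D_3$              $P_{10} = P_0Q_1 - (D_3 + D_4)$                 $P_{00} = P_0Q_0 + D_4$

These disequilibria completely characterize the linkage disequilibria among the genotypes between the marker and QTL. In a random-mating population, all these $D$ values are equal to 0. Below, I will show that these linkage disequilibria among the genotypes are independent of the Hardy-Weinberg disequilibria at each locus, and thus are suitable for zygotic LD mapping.

Theorem: The HWDs at individual loci are independent of zygotic LDs between these loci.

Proof: From display (2-23), the allele frequencies at the marker and QTL can be derived as

$$p = P_2 + \tfrac{1}{2}P_1,\quad q = Q_2 + \tfrac{1}{2}Q_1.$$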


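The HWD coefficients at the two loci are then

$$D_M = P_2 - p^2 = P_2 - (P_2 + \tfrac{1}{2}P_1)^2 = P_2P_0 - \tfrac{1}{4}P_1^2,$$

and

$$D_Q = Q_2 - q^2 = Q_2 - (Q_2 + \tfrac{1}{2}Q_1)^2 = Q_2Q_0 - \tfrac{1}{4}Q_1^2.$$

Neither $D_M$ nor $D_Q$ contains a zygotic LD. The independence between HWDs at individual loci and their zygotic LDs suggests that the QTL-marker associations at the zygotic level can be used for LD mapping, regardless of whether the population is in HWE or HWD.

Comparing (1-4) and (2-23), we derive the relationships between the zygotic disequilibria and the gametic LD as follows:

$$D_M = 0,\quad D_Q = 0,$$
$$D_1 = 2Dp_Ap_Q + D^2,\quad D_2 = -2Dp_Ap_q + D^2,\quad D_3 = -2Dp_ap_Q + D^2,\quad D_4 = 2Dp_ap_q + D^2.$$

It can be seen that zygotic LD mapping reduces to gametic LD mapping.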
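The reduction from zygotic to gametic disequilibria can be checked with a short calculation. Assuming HWE at both loci, the sketch below computes the four zygotic coefficients from the allele frequencies and the gametic LD as D1 = 2*D*p*q + D^2, D2 = -2*D*p*(1-q) + D^2, D3 = -2*D*(1-p)*q + D^2 and D4 = 2*D*(1-p)*(1-q) + D^2. The function name and the example values (p = q = 0.6, D = 0.05) are assumptions.

```python
def zygotic_from_gametic(p, q, D):
    """Zygotic disequilibria (D1, D2, D3, D4) implied by gametic LD under HWE."""
    d1 = 2 * D * p * q + D ** 2
    d2 = -2 * D * p * (1 - q) + D ** 2
    d3 = -2 * D * (1 - p) * q + D ** 2
    d4 = 2 * D * (1 - p) * (1 - q) + D ** 2
    return d1, d2, d3, d4


# Illustrative values: the double-homozygote frequency P2*Q2 + D1 should equal
# the squared frequency of haplotype MQ, (p*q + D)^2.
p, q, D = 0.6, 0.6, 0.05
d1, d2, d3, d4 = zygotic_from_gametic(p, q, D)
print(d1, d2, d3, d4, p * p * q * q + d1, (p * q + D) ** 2)
```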


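where the parameter vector $\omega = (P_2, P_1, Q_2, Q_1, D_1, D_2, D_3, D_4)$ contains the genotypic frequencies of the marker and QTL and their nonrandom disequilibria, the vector $\mu = (\mu_2, \mu_1, \mu_0)$ contains three genotypic values which can be partitioned into additive ($a$) and dominance ($d$) effects, and $\sigma^2$ is the residual variance. Based on the structure of genotype frequencies in equation (2-23),

$$\begin{aligned}
P_{22} &= P_2Q_2 + D_1 \\
P_{12} &= P_2Q_1 - (D_1 + D_2) \\
P_{02} &= P_2Q_0 + D_2 \\
P_{21} &= P_1Q_2 - (D_1 + D_3) \\
P_{11} &= P_1Q_1 + (D_1 + D_2 + D_3 + D_4) \\
P_{01} &= P_1Q_0 - (D_2 + D_4) \\
P_{20} &= P_0Q_2 + D_3 \\
P_{10} &= P_0Q_1 - (D_3 + D_4) \\
P_{00} &= P_0Q_0 + D_4;
\end{aligned}$$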
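The nine joint genotype frequencies can be assembled from the eight parameters in the vector above, and the construction can be checked by confirming that the row and column sums return the marker and QTL genotype frequencies. This is a minimal sketch; the function name, the row and column layout and the example values (HWE at both loci with p = q = 0.6 and gametic LD of 0.05) are assumptions.

```python
def joint_genotype_table(P2, P1, Q2, Q1, D1, D2, D3, D4):
    """3x3 table of joint frequencies: rows MM, Mm, mm; columns QQ, Qq, qq."""
    P0 = 1.0 - P2 - P1
    Q0 = 1.0 - Q2 - Q1
    return [
        [P2 * Q2 + D1, P2 * Q1 - (D1 + D2), P2 * Q0 + D2],
        [P1 * Q2 - (D1 + D3), P1 * Q1 + (D1 + D2 + D3 + D4), P1 * Q0 - (D2 + D4)],
        [P0 * Q2 + D3, P0 * Q1 - (D3 + D4), P0 * Q0 + D4],
    ]


# Illustrative values only.
table = joint_genotype_table(0.36, 0.48, 0.36, 0.48,
                             0.0385, -0.0215, -0.0215, 0.0185)
row_sums = [sum(row) for row in table]
col_sums = [sum(row[c] for row in table) for c in range(3)]
print(row_sums, col_sums)
```

Because every disequilibrium term enters a row and a column with opposite signs, the marginals do not depend on D1 to D4.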


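The EM algorithm is derived to estimate the unknown parameters. In the E step, the posterior probability of a QTL genotype ($j$) given a marker genotype ($k$) for individual $i$ is calculated by

$$\Pi_{jk|i} = \frac{P_{jk}f_j(y_i)}{\sum_{j'=0}^{2}P_{j'k}f_{j'}(y_i)}.$$

In the M step, marker-QTL genotype frequencies are estimated through the calculated posterior probabilities by equation (2-34), and the genotypic values of the QTL genotypes and the residual variance are estimated by equations (2-35) and (2-36).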


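The E and M steps are iterated between equations (2-33) and (2-34), (2-35), and (2-36) until the estimates are stable. The MLEs of marker-QTL genotypic frequencies are used to obtain the MLEs of $\omega = (P_2, P_1, Q_2, Q_1, D_1, D_2, D_3, D_4)$ with equations (2-25)-(2-32).

$$p = P_2 + \tfrac{1}{2}P_1,\quad D_M = P_2 - p^2$$

for the marker, and

$$q = Q_2 + \tfrac{1}{2}Q_1,\quad D_Q = Q_2 - q^2$$

for the QTL. The degree of overall disequilibria for the marker and QTL can be tested by the following hypotheses: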


(2-37), (2-38), (2-29), (2-30), (2-31), or (2-32). The log-likelihood ratio for each of these disequilibrium tests is $\chi^2$-distributed.

2.4.6.1 Schemes

Simulation studies were performed to investigate the statistical properties of the model (referred to as the zygotic model) derived to incorporate non-random associations between the marker and QTL at the zygotic level. I will use two different schemes to simulate the data. The first one assumes that the population is in a non-equilibrium state because of linkage disequilibrium at the zygotic level, whereas the second one assumes that the population is in HWE for individual loci although linkage disequilibria between different loci occur. The data simulated under the two schemes were analyzed by both the zygotic model and the traditional gametic model, which allows comparisons between the two models.

Table 2-6 gives the population and quantitative genetic parameters used to simulate the data for the first scheme by the zygotic LD model. The marker with two alleles M and m has allele frequencies of $p = 0.6$ and $1-p = 0.4$. The two allele frequencies at the QTL are $q = 0.6$ for allele Q and $1-q = 0.4$ for allele q. The marker genotypes and QTL genotypes are associated at the zygotic level, with two HWD parameters given as $D_M = 0.03$ and $D_Q = 0.05$ and four zygotic association parameters given as $D_1 = 0.01$, $D_2 = 0.03$, $D_3 = 0.01$, and $D_4 = 0.02$. The probability relationship between the marker and QTL genotypes follows display (2-23).

In the second scheme, the data were simulated under HWE for both the marker and QTL, but allowing nonrandom associations between these two loci. This implies $D_M = D_Q = 0$, with $D_1$, $D_2$, $D_3$, and $D_4$ each taking a non-zero value, given as 0.0385, $-0.0215$, $-0.0215$, and 0.0185, respectively.


(2-7). Three levels of sample size are assumed, $n$ = 250, 500, and 1000.

The results are given in Table 2-6 by the zygotic model and Table 2-7 by the gametic model. The zygotic model provides reasonably good estimates of all the genetic parameters. As expected, the allele frequency and HWD of the marker can be estimated very well because there are no "missing" data for the marker. The zygotic model also gives good estimates of the allele frequency and HWD of the QTL, although the precision increases with heritability and especially with sample size. The four zygotic association parameters ($D_1$, $D_2$, $D_3$, and $D_4$) can be precisely estimated, with increasing heritability and sample size leading to better estimation (Table 2-6). In general, the zygotic model performs well in estimating the genetic effects of the QTL and the residual variance, but a large sample and/or large heritability is recommended to obtain adequately precise estimates. The influence of increasing heritability on the estimation precision of these quantitative genetic parameters is larger than that of increasing sample size.

If the gametic model is used to analyze this set of non-equilibrium data, as done in current LD mapping studies, allele frequencies at the marker and QTL can be estimated well (Table 2-7). Moreover, the quantitative genetic parameters can be more precisely estimated by the gametic model than by the zygotic model. However, a significant drawback of the gametic model in handling the non-equilibrium population is that it cannot capture the


Table 2-6. Data are simulated from a non-equilibrium population by a zygotic LD model, and are analyzed with the zygotic LD model.
Columns: true value; $H^2$ = 0.4, 0.2 and 0.1, each with $n$ = 250, 500 and 1000.


Table 2-7. Data are simulated from a non-equilibrium population by a zygotic LD model, and are analyzed with the gametic LD model.
Columns: true value; $H^2$ = 0.4, 0.2 and 0.1, each with $n$ = 250, 500 and 1000.


Table 2-8. Data are simulated from an equilibrium population by a gametic LD model, and are analyzed with the zygotic LD model.
Columns: true value; $H^2$ = 0.4, 0.2 and 0.1, each with $n$ = 250, 500 and 1000.


Table 2-9. Data are simulated from an equilibrium population by a gametic LD model, and are analyzed with the gametic LD model.
Columns: true value; $H^2$ = 0.4, 0.2 and 0.1, each with $n$ = 250, 500 and 1000.


(Table 2-8). The data simulated under the second scheme were analyzed by the gametic model. This model can precisely estimate the allele frequencies at the marker and QTL, the genetic effects of the QTL, and the residual variance (Table 2-9). However, it estimates a single gametic linkage disequilibrium by collapsing all zygotic associations. Its inability to discern the differences among the six zygotic disequilibria will affect the interpretation and inference of the marker-QTL association, although the power of QTL detection by its associated marker is reasonably high with the gametic model. By comparing the results of the analyses of the data simulated under HWE, it can be seen that the zygotic model actually covers the gametic model. In other words, no matter whether a mapping population is at HWE or HWD, the zygotic model can always be used to analyze the data, although it is computationally more expensive than the gametic model.


Figure 3-1 illustrates the empirical marginal densities for body mass growth measured at ten different ages


Because the true distribution of such data (Fig. 3-1) is impossible to know, there is a pressing need to develop a more robust statistical approach for genetic mapping that can deal with the distorted distribution of a trait effectively and efficiently.

Figure 3-1. Empirical density plots for mouse growth data at 10 different ages (in weeks).

In this chapter, I will derive a robust approach for genetic mapping of complex traits with empirical distributions, as shown in Fig. 3-1. The integrated square error has been used as the goodness-of-fit criterion in nonparametric density estimation (Scott 1992). Some previous studies have shown that the estimator from this method is robust to the


Figure 3-2. Standard normal density (A) and a mixture of three contrasting normal densities (B).

model specification (Beran 1977; Hjort 1994; Basu et al. 1998; Scott 2001). This method allows the deviation of a proposed density function from the true underlying density. I will incorporate integrated square errors into genetic mapping in a parametric way.

Figure 3-3 describes the principle of the L2E method. Suppose $\mathcal{F}$ is the space that contains all the density functions, and the true density $f$ is somewhere in $\mathcal{F}$. Assume that there is another parametric density family $\varphi$ near the true density. The goal of the parametric L2E method is to search for a $\varphi$ that is the closest density function to $f$. It is plausible to assume the existence of such a parametric density family, because if the data collected are really governed by an underlying density, the errors produced should not change this density dramatically. Consider a standard normal distribution $N(0,1)$ with 500 samples. If 10% of the samples are drawn wrongly from a point mass at value 10, leading to outliers or noise, the true distribution for the data collected will be a mixture of a standard normal with a probability of 0.9 and a point mass at 10 with a probability of 0.1. It is easy to verify that the maximum likelihood method based on a single normal density will yield a biased


Figure 3-3. Graphic illustration of the principle of the L2E method. $\mathcal{F}$ is the space of all possible densities; $f$ is our true density from which the data are generated; $\varphi$ is the parametric family we will be working on.

3.3.1 Mapping Population

Genetic mapping is based on a segregating population. The choice of a mapping population depends on the purpose of molecular studies, the biological nature of the organism studied, and the availability of materials. In the preceding chapters, I proposed a series of robust mapping approaches that can precisely capture the segregation pattern of markers and QTLs in a population. I used the principle of linkage disequilibrium to map


(Table 3-1) is calculated as the ratio of the joint marker-QTL genotype frequency over the marker genotype frequency. Each QTL genotype has a genotype-specific mean $\mu_j$. The comparisons of these means among the three different genotypes can determine whether and how this putative QTL affects the trait. The trait phenotype of progeny $i$ due to the QTL can be expressed by the following linear statistical model:


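Two methods are proposed for model (3-1), one based on the true density of the error term ($e_i$) and the second based on the true density of the observed data ($y_i$). In each case, an energy function $E$ can be defined as follows:

$$E = \int|\varphi - f|^2\,du = \int\varphi^2\,du - 2\int\varphi f\,du + \int f^2\,du,$$

where $u$ can be either the error term $e$ or the observed data $y$, and $\varphi$ is some known parametric density function used to approximate the unknown objective true density function $f$. The goal is to minimize $E$. Since the term $\int f^2\,du$ contains only the objective density, which does not have any unknown parameters, it can be dropped from the above equation for the purpose of minimization. So, the energy function becomes

$$E = \int\varphi^2\,du - 2\int\varphi f\,du = \int\varphi^2\,du - 2E(\varphi).$$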


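$$E \approx \int\varphi^2\,du - \frac{2}{N}\sum_{i=1}^{N}\varphi(u_i),$$

where $\sum_{i=1}^{N}\varphi(u_i)$ is the term in which the data come into play. In addition, if $\int\varphi^2\,du$ can be integrated out, we will have an explicit formula to minimize. Since the law of large numbers has been used in the formula derivation, the L2E method is not suitable for data with a small sample size.

In model (3-1), the randomness is derived from the underlying error term. Thus, it is natural to work directly on the density of the error term $f(e)$. In a continuous case, a normal density function $\varphi(e \mid 0, \sigma^2)$ is used to approximate the true error density $f(e)$. Thus, the energy function (3-2) becomes

$$E_e = \int\varphi^2(e)\,de - 2E[\varphi(e)] = \frac{1}{2\pi\sigma^2}\int e^{-e^2/\sigma^2}\,de - 2E[\varphi(e)] = \frac{1}{2\sqrt{\pi}\,\sigma} - 2E[\varphi(e)].$$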
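The behaviour of the L2E criterion for a single normal working density can be illustrated on contaminated data of the kind described earlier (a standard normal sample with 10% of the points placed at 10). For a normal density with mean mu and standard deviation sigma, the integral of the squared density is 1/(2*sqrt(pi)*sigma), so the criterion is that term minus twice the average fitted density at the data points. This is a minimal sketch under stated assumptions: the function names, the crude grid search, the seed and the reduced sample size of 200 (chosen to keep the pure-Python search fast) are not from the original analysis.

```python
import math
import random


def l2e_energy(y, mu, sigma):
    """L2E criterion for a N(mu, sigma^2) working density."""
    n = len(y)
    norm = sigma * math.sqrt(2.0 * math.pi)
    mean_density = sum(math.exp(-0.5 * ((v - mu) / sigma) ** 2) for v in y) / (n * norm)
    return 1.0 / (2.0 * math.sqrt(math.pi) * sigma) - 2.0 * mean_density


def fit_l2e(y, mu_grid, sigma_grid):
    """Grid search for the minimizer (an assumption; any optimizer could be used)."""
    best = min((l2e_energy(y, m, s), m, s) for m in mu_grid for s in sigma_grid)
    return best[1], best[2]


rng = random.Random(1)
clean = [rng.gauss(0.0, 1.0) for _ in range(180)]
data = clean + [10.0] * 20                       # 10% of points at a point mass at 10

mu_grid = [i / 10.0 for i in range(-20, 121)]    # -2.0 to 12.0
sigma_grid = [i / 10.0 for i in range(5, 41)]    # 0.5 to 4.0

mle_mean = sum(data) / len(data)                 # maximum likelihood estimate of the mean
l2e_mu, l2e_sigma = fit_l2e(data, mu_grid, sigma_grid)
print(mle_mean, l2e_mu, l2e_sigma)
```

The sample mean is pulled toward the outliers, while the L2E location estimate stays near the centre of the uncontaminated component.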


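$$\hat{\theta} = \arg\min_{\theta}(E_e) \approx \arg\min_{\theta}\left[\frac{1}{2\sqrt{\pi}\,\sigma} - \frac{2}{N}\sum_{i=1}^{N}\varphi(e_i \mid 0, \sigma^2)\right]$$

At a position of the QTL between two markers, $E_e$ is approximated twice, once by the law of large numbers and once by the calculation of $\varphi(e_i)$. However, if the QTL is located at an exact marker position, this will be different. At marker positions, we assume that the marker genotype is exactly the QTL genotype. There is no mixture density in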


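$$E_e = \frac{1}{2\sqrt{\pi}\,\sigma} - \frac{2}{N\sqrt{2\pi}\,\sigma}\sum_{k=0}^{2}\sum_{i=1}^{N_k}\exp\left[-\frac{(y_i - \mu_k)^2}{2\sigma^2}\right],$$

where $N_k$ is the number of individuals in marker group $k$ and $\sum_{k=0}^{2}N_k = N$.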


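$$\int\Big(\sum_{j=0}^{2}\omega_{kj}\varphi_j\Big)^2 dy = \frac{1}{2\pi\sigma^2}\int\Big[\sum_{j=0}^{2}\omega_{kj}\,e^{-\frac{(y-\mu_j)^2}{2\sigma^2}}\Big]^2 dy = \frac{1}{2\sqrt{\pi}\,\sigma}\Big[\sum_{j=0}^{2}\omega_{kj}^2 + 2\sum_{j<l}\omega_{kj}\omega_{kl}\,e^{-\frac{(\mu_j-\mu_l)^2}{4\sigma^2}}\Big]$$

and,

$$\int\Big(\sum_{j=0}^{2}\omega_{kj}\varphi_j\Big)f\,dy \approx \frac{1}{N_k\sqrt{2\pi}\,\sigma}\sum_{i=1}^{N_k}\sum_{j=0}^{2}\omega_{kj}\,e^{-\frac{(y_i-\mu_j)^2}{2\sigma^2}}.$$

So,

$$E_k^d = \frac{1}{2\sqrt{\pi}\,\sigma}\Big[\sum_{j=0}^{2}\omega_{kj}^2 + 2\sum_{j<l}\omega_{kj}\omega_{kl}\,e^{-\frac{(\mu_j-\mu_l)^2}{4\sigma^2}}\Big] - \frac{2}{N_k\sqrt{2\pi}\,\sigma}\sum_{i=1}^{N_k}\sum_{j=0}^{2}\omega_{kj}\,e^{-\frac{(y_i-\mu_j)^2}{2\sigma^2}}.$$

At a marker position,

$$E_k^d = \frac{1}{2\sqrt{\pi}\,\sigma} - \frac{2}{N_k\sqrt{2\pi}\,\sigma}\sum_{i=1}^{N_k}e^{-\frac{(y_i-\mu_k)^2}{2\sigma^2}}.$$

Mathematically, we can find a set of $\{\hat{\mu}_2^{(k)}, \hat{\mu}_1^{(k)}, \hat{\mu}_0^{(k)}, \hat{\sigma}_k^2\}$ for each marker group $k$ that minimizes $E_k$. However, since the underlying QTLs that govern the trait are actually the same across all marker groups, the parameter estimates from all marker groups have to be consistent. Therefore, we need to find one set of $\{\hat{\mu}_2, \hat{\mu}_1, \hat{\mu}_0, \hat{\sigma}^2\}$ that can minimize the $E_k$ simultaneously.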


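(1) Since $E_k^d$ measures the goodness of fit with which the true density $f_k(y)$ is approximated by the known density function, it is clear that the more subjects a marker group contains, the more accurately the true density for that marker group can be approximated. Also, a large sample size guarantees better usage of the law of large numbers. So, in summarizing all the energy functions, we need to give more weight to the marker groups containing more subjects. That is, the weight for marker group $k$ should increase as the marker group size $N_k$ increases. Thus, the summary energy function may look like this,

$$E_d = \sum_{k=1}^{K}h(N_k)\,E_k^d,$$

where $N_k$ is the number of progenies in marker group $k$ and $h(x)$ is an increasing function with respect to $x$.

(2) We know that the first method, using the true density of the error term, and the second method, using the true density of the observed data, are actually derived from the same model (Equation 3-1). At marker positions, both methods only apply the law of large numbers. Logically, these two methods should agree with each other at the marker positions. From equation (3-6), we know that the energy function for marker group $k$ at marker positions is

$$E_k^d = \frac{1}{2\sqrt{\pi}\,\sigma} - \frac{2}{N_k\sqrt{2\pi}\,\sigma}\sum_{i=1}^{N_k}e^{-\frac{(y_i-\mu_k)^2}{2\sigma^2}}.$$

At marker positions, plugging Equation (3-7) into Equation (3-6) yields

$$E_d = \sum_{k=1}^{K}h(N_k)\Big[\frac{1}{2\sqrt{\pi}\,\sigma} - \frac{2}{N_k\sqrt{2\pi}\,\sigma}\sum_{i=1}^{N_k}e^{-\frac{(y_i-\mu_k)^2}{2\sigma^2}}\Big] = \frac{1}{2\sqrt{\pi}\,\sigma}\sum_{k=1}^{K}h(N_k) - \frac{2}{\sqrt{2\pi}\,\sigma}\sum_{k=1}^{K}\frac{h(N_k)}{N_k}\sum_{i=1}^{N_k}e^{-\frac{(y_i-\mu_k)^2}{2\sigma^2}}.$$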


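Comparing Equation (3-4) and Equation (3-8), we have

$$\frac{1}{2\sqrt{\pi}\,\sigma} - \frac{2}{N\sqrt{2\pi}\,\sigma}\sum_{k}\sum_{i=1}^{N_k}e^{-\frac{(y_i-\mu_k)^2}{2\sigma^2}} = \frac{1}{2\sqrt{\pi}\,\sigma}\sum_{k}h(N_k) - \frac{2}{\sqrt{2\pi}\,\sigma}\sum_{k}\frac{h(N_k)}{N_k}\sum_{i=1}^{N_k}e^{-\frac{(y_i-\mu_k)^2}{2\sigma^2}},$$

and we can see that if $h(N_k) = N_k/N$, the above two energy functions are exactly the same. Hence, the final form of the estimators for the unknown parameters using the true density of the observed data is:

$$\hat{\theta} = \arg\min(E_d) = \arg\min\Big[\sum_{k=1}^{K}N_kE_k/N\Big] = \arg\min\Big[\sum_{k=1}^{K}N_kE_k\Big]$$
$$= \arg\min\Big\{\frac{1}{2\sqrt{\pi}\,\sigma}\sum_{k=1}^{K}N_k\Big[\sum_{j=0}^{2}\omega_{kj}^2 + 2\sum_{j<l}\omega_{kj}\omega_{kl}\,e^{-\frac{(\mu_j-\mu_l)^2}{4\sigma^2}}\Big] - \frac{2}{\sqrt{2\pi}\,\sigma}\sum_{k=1}^{K}\sum_{i=1}^{N_k}\sum_{j=0}^{2}\omega_{kj}\,e^{-\frac{(y_i-\mu_j)^2}{2\sigma^2}}\Big\}$$

In section 3.3.2, I have introduced how to find the L2E estimates under H1 and the corresponding energy $E_{H_1}$. Under H0, the derivation is very similar to the situation under H1 and is in fact even simpler. Under H0, we assume that no QTL underlies the trait and thus there is no mixture structure in the null model. The true density is approximated by a single normal density, and the corresponding energy functions under H0 ($E_{H_0}$) for using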


$$E_{H_0} = \frac{1}{2\sqrt{\pi}\,\sigma} - \frac{2}{N\sqrt{2\pi}\,\sigma}\sum_{i=1}^{N}e^{-\frac{(y_i-\mu)^2}{2\sigma^2}}$$

This makes perfect sense because under H0, just like at marker positions, there is no difference between these two methods. Analogous to the likelihood ratio (LR) test statistic, I define an energy difference (ED) test statistic for our hypothesis testing:

$$ED = E_{H_0} - E_{H_1}$$

Since a mixture of normals is a bigger density family than a single normal, $E_{H_1}$ is minimized over a bigger space than $E_{H_0}$, and thus $E_{H_1}$ is never larger than $E_{H_0}$, i.e., the ED test statistic is always non-negative. Although it is not easy to see what kind of asymptotic distribution ED has, the exact distribution of ED is not that critical in this case. The reason is that even if the distribution of the ED test statistic can be derived, it can only be derived for hypotheses at a fixed locus. Since we want to search for QTL across the whole genome and the QTL position is unidentified, the final global ED value will not follow the asymptotic distribution derived from a single locus. To overcome the limitations due to the failure of the test statistic to follow a standard statistical distribution, we can use simulation studies (Lander and Botstein 1989; Van Ooijen 1992; Chen and Chen 2005) and permutation tests (Churchill and Doerge 1994; Doerge and Churchill 1996; Good 2000) to calculate the critical threshold value.

In section 3.3.2, two statistical methods have been proposed to implement the L2E principle. Here, extensive Monte Carlo simulation experiments were performed to examine the statistical properties of the proposed models and to compare these two methods.


Figure 3-4 shows typical ED profiles obtained by using the true density of either the error term or the observed data. The peak value from the error term density always occurs at a marker position, although the true QTL location is placed right between the fifth and sixth markers. In comparison, the method using the observed data density can find a peak ED value at the true QTL location. The ED values from both methods agree at the exact marker positions, as expected. This indicates that the method using the observed data density performs better. Therefore, we will use this method to analyze the real data set, and from now on, unless otherwise indicated, the L2E method means by default the L2E method using the true density of the observed data.

To further investigate the properties of the L2E method, we compared it with the conventional maximum-likelihood (MLE) based method. The simulation setup is almost the same as before, except that this time we add outlier points to the simulated data. The outliers are generated from a point mass on the upper side of the true mixture density. Different percentages of noise points (0, 5%, 10%, 20%) have been considered in our simulations, but here we only present the results from the 10% case because it shows the general points. For each simulation setup, 100 replicates are carried out to

Table 3-2 without noise points, Table 3-3 with 10% outliers at 45 and Table 3-4 with 10% outliers at 55, and summarized as below:
1). If there are no noise points, both the L2E and MLE methods give consistent estimators, but the MLE estimators have better efficiency, as evidenced by their smaller MSE. This result is expected, as shown in Appendix A.
2). When noise points are added to the data, the L2E estimators stay consistent but the MLE estimators become biased. As the outliers move further away from the true density (from 45 to 55), the L2E method becomes more advantageous.
3). Another observation is that as the heritability becomes smaller, the difference between the L2E and MLE methods also becomes smaller. This is because as the heritability goes down, the variability of the mixture density goes up, and thus the outliers become relatively closer to the bulk of the data. This is consistent with point 2).

3.7 ). The F2 progeny were measured for body mass at 10 weekly intervals starting at age 7 days. The raw weights were corrected for the effects of each covariate due to dam, litter size at birth and parity (Vaughn et al., 1999). The growth data of the 502 mice can be seen as the black curves in Fig. 4-2. The phenotypic trait considered here is the growth rate from week 5 to week 10, which is defined as the ratio of mouse body weight at week 10 to the body weight at week 5. From Fig. 3-7, we can see that the empirical distribution of this trait contains a tail at the right end, which may suggest outliers in the data. Both the L2E and MLE methods are applied to perform interval mapping on these data. The scan profiles in Fig. 3-8 and Fig. 3-9 show that the basic shapes of both profiles are similar, i.e. if there

Table 3-5. Although the MLE method can detect more QTLs at the chromosome level, it cannot identify any of them as significant at the genome level. In contrast, the L2E method successfully detects one genome-wide significant QTL. This example shows the difference in how the two methods handle the data. The L2E downweights the data which are not "good" for the fitted model.
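The downweighting behaviour can be illustrated on a much simpler problem than QTL mapping. The sketch below is an assumption-laden toy, not the simulation reported here: it fits the location of a single normal with known standard deviation to data containing a 10% point mass, using the single-normal L2E criterion (the integral of the squared density minus twice the average fitted density) and a plain grid search in place of a proper optimizer. All function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def l2e_criterion(data, mu, sigma):
    """Integral of the squared N(mu, sigma^2) density, which is 1 / (2 sigma sqrt(pi)),
    minus twice the average fitted density at the observations."""
    integral_sq = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    fitted = sum(normal_pdf(x, mu, sigma) for x in data) / len(data)
    return integral_sq - 2.0 * fitted

def l2e_location(data, sigma, step=0.01):
    """Grid search for the location minimizing the L2E criterion (sigma assumed known)."""
    lo, hi = min(data), max(data)
    n_steps = int((hi - lo) / step) + 1
    grid = [lo + i * step for i in range(n_steps)]
    return min(grid, key=lambda mu: l2e_criterion(data, mu, sigma))

rng = random.Random(1)
clean = [rng.gauss(0.0, 1.0) for _ in range(180)]
data = clean + [10.0] * 20            # 10% point-mass contamination on the upper side
mle_mean = sum(data) / len(data)      # normal-theory MLE of the mean
l2e_mean = l2e_location(data, sigma=1.0)
print(f"MLE mean: {mle_mean:.3f}   L2E location: {l2e_mean:.3f}")
```

Because the fitted density at an observation far from the bulk is essentially zero, such points contribute almost nothing to the L2E criterion, whereas every point pulls the sample mean with equal weight.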


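Table 3-1. Conditional probabilities ($\omega_{j|i}$) of QTL genotypes given marker genotypes of M1 and M2 for an F2 population. r1, r2 and r are the recombination fractions between marker M1 and the QTL, between the QTL and marker M2, and between the two flanking markers. Each conditional probability is the joint probability listed under the QTL genotype divided by the marker genotype frequency.

Marker genotype | Frequency | QQ | Qq | qq
M1M1M2M2 | (1/4)(1-r)^2 | (1/4)(1-r1)^2(1-r2)^2 | (1/2)r1 r2(1-r1)(1-r2) | (1/4)r1^2 r2^2
M1M1M2m2 | (1/2)r(1-r) | (1/2)(1-r1)^2 r2(1-r2) | (1/2)r1(1-r1)(1-2r2+2r2^2) | (1/2)r1^2 r2(1-r2)
M1M1m2m2 | (1/4)r^2 | (1/4)(1-r1)^2 r2^2 | (1/2)r1 r2(1-r1)(1-r2) | (1/4)r1^2(1-r2)^2
M1m1M2M2 | (1/2)r(1-r) | (1/2)r1(1-r1)(1-r2)^2 | (1/2)(1-2r1+2r1^2)r2(1-r2) | (1/2)r1(1-r1)r2^2
M1m1M2m2 | (1/2)(1-2r+2r^2) | r1(1-r1)r2(1-r2) | (1/2)(1-2r1+2r1^2)(1-2r2+2r2^2) | r1(1-r1)r2(1-r2)
M1m1m2m2 | (1/2)r(1-r) | (1/2)r1(1-r1)r2^2 | (1/2)(1-2r1+2r1^2)r2(1-r2) | (1/2)r1(1-r1)(1-r2)^2
m1m1M2M2 | (1/4)r^2 | (1/4)r1^2(1-r2)^2 | (1/2)r1 r2(1-r1)(1-r2) | (1/4)(1-r1)^2 r2^2
m1m1M2m2 | (1/2)r(1-r) | (1/2)r1^2 r2(1-r2) | (1/2)r1(1-r1)(1-2r2+2r2^2) | (1/2)(1-r1)^2 r2(1-r2)
m1m1m2m2 | (1/4)(1-r)^2 | (1/4)r1^2 r2^2 | (1/2)r1 r2(1-r1)(1-r2) | (1/4)(1-r1)^2(1-r2)^2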
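The conditional probabilities in this table can also be generated directly from gamete frequencies, which gives a convenient numerical check on the entries. The sketch below assumes an F1 parent of phase M1-Q-M2 / m1-q-m2 and no crossover interference (so that r = r1 + r2 - 2 r1 r2); the function names and the coding of alleles as 0/1 are illustrative choices, not notation from this work.

```python
from itertools import product

def gamete_probs(r1, r2):
    """Gamete frequencies from an F1 parent M1-Q-M2 / m1-q-m2 without interference.
    Alleles are coded 1 (M1, Q, M2) or 0 (m1, q, m2)."""
    probs = {}
    for m1, q, m2 in product((0, 1), repeat=3):
        p = 0.5
        p *= r1 if m1 != q else 1.0 - r1     # recombination in the M1-QTL interval
        p *= r2 if q != m2 else 1.0 - r2     # recombination in the QTL-M2 interval
        probs[(m1, q, m2)] = p
    return probs

def conditional_qtl_probs(r1, r2):
    """Return {(n_M1, n_M2): [P(qq), P(Qq), P(QQ)]} for the F2 marker genotypes,
    where n_M1 and n_M2 count the M1 and M2 alleles carried (0, 1 or 2).
    The list index is the number of Q alleles, matching j = 0, 1, 2."""
    g = gamete_probs(r1, r2)
    joint = {}
    for (a, pa), (b, pb) in product(g.items(), repeat=2):
        key = (a[0] + b[0], a[2] + b[2])
        joint.setdefault(key, [0.0, 0.0, 0.0])[a[1] + b[1]] += pa * pb
    # divide each joint probability by the marker genotype frequency
    return {k: [x / sum(v) for x in v] for k, v in joint.items() if sum(v) > 0}

cond = conditional_qtl_probs(0.1, 0.2)
for marker_genotype in sorted(cond, reverse=True):
    print(marker_genotype, [round(p, 4) for p in cond[marker_genotype]])
```

Here the key (2, 2) corresponds to M1M1M2M2 and (1, 1) to the double heterozygote M1m1M2m2.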

Figure 3-4. Comparison between two methods: a. using the true density of the error term and b. using the true density of the observed data. The red line in b indicates the peak value.

Table 3-2. No noise
N=400        H2=0.4        H2=0.2        H2=0.1
True Value   L2E   MLE     L2E   MLE     L2E   MLE

Table 3-3. 10% noise points at 45
N=400        H2=0.4        H2=0.2        H2=0.1
True Value   L2E   MLE     L2E   MLE     L2E   MLE

Table 3-4. 10% noise points at 55
N=400        H2=0.4        H2=0.2        H2=0.1
True Value   L2E   MLE     L2E   MLE     L2E   MLE

Figure 3-5. Relative positions of the genetic markers used in the study of LG/J and SM/J mice (Vaughn et al. 1999).
Figure 3-6. Interval mapping demonstration

Figure 3-7. The empirical distribution of the mice growth rate from week 5 to week 10.

Figure 3-8. L2E scan profile for the week 5 weight measure of the mice data. The y-axis is the ED test statistic. The green line and the red line are the chromosome-wide and genome-wide 95% significance cutoff points based on 1000 permutations, respectively.

Table 3-5. Significant QTLs detected by both the L2E and MLE methods at the 95% significance level, genome-wide and chromosome-wide.
          Significant linkage groups detected
Models    Genome-wide    Chromosome-wide

Figure 3-9. MLE scan profile for the week 5 weight measure of the mice data. The y-axis is the LR test statistic. The green line and the red line are the chromosome-wide and genome-wide 95% significance cutoff points based on 1000 permutations, respectively.


4.2.1 Notations

Consider an F2 population of size n, in which each of the F2 progeny is measured for T different traits. Let $\vec{y}_i = (y_{i1}, \ldots, y_{iT})$ be the phenotypic vector for individual i. It is assumed that a pleiotropic QTL with alleles Q and q affects these traits jointly. This QTL is bracketed by two flanking markers, M1 with alleles M1 and m1 and M2 with alleles M2 and m2. Let r1, r2 and r be the recombination fractions between M1 and the QTL, between the QTL and M2, and between the two markers, respectively. Although QTL genotypes are not known, the probability with which an individual i carries a QTL genotype can be inferred from the marker genotype of this individual as a function of the recombination fractions (r1, r2, and r). The conditional probabilities of QTL genotype j (j = 2 for QQ, 1 for Qq, and 0 for qq), conditional upon one of the nine genotypes of the flanking markers, are given in Table 3-1. Each QTL genotype has a genotypic mean vector $\vec{g}_j = (g_{j1}, \ldots, g_{jT})$ for the T traits. Comparisons of these mean vectors among the three different genotypes can determine whether and how this putative QTL affects the traits. The trait phenotype of progeny i due to the

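3-1): $\vec{y}_i = \vec{\xi}_i G + \vec{e}_i$, where $\vec{\xi}_i \in \{(1,0,0), (0,1,0), (0,0,1)\}$ is an indicator vector that is defined to be $\vec{\xi}_0 = (0,0,1)$ for j = 2, $\vec{\xi}_1 = (0,1,0)$ for j = 1 and $\vec{\xi}_2 = (1,0,0)$ for j = 0; $G = \begin{pmatrix} \vec{g}_0^T \\ \vec{g}_1^T \\ \vec{g}_2^T \end{pmatrix}$, with $\vec{g}_2$, $\vec{g}_1$, $\vec{g}_0$ being the mean vectors for the three different QTL genotypes; and $\vec{e}_i \sim f(\vec{e})$ is the residual effect of individual i, including the aggregate effect of polygenes and the error effect. We assume that $f(\vec{e})$ is the unknown true density of $\vec{e}_i$ with mean $\vec{0}$. Then, the density of $\vec{y}_i$ would be a mixture of f with means $\vec{g}_j$.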

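$$\int\Big[\sum_{j=0}^{2}\omega_{kj}\,\frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}}\,e^{-\frac{(\vec{y}-\vec{g}_j)'\Sigma^{-1}(\vec{y}-\vec{g}_j)}{2}}\Big]^2 d\vec{y}$$
$$=\frac{1}{(2\pi)^{m}|\Sigma|}\Big[\int\sum_{j=0}^{2}\omega_{kj}^2\,e^{-(\vec{y}-\vec{g}_j)'\Sigma^{-1}(\vec{y}-\vec{g}_j)}\,d\vec{y}+2\int\sum_{i<j}\omega_{ki}\omega_{kj}\,e^{-\frac{(\vec{y}-\vec{g}_i)'\Sigma^{-1}(\vec{y}-\vec{g}_i)+(\vec{y}-\vec{g}_j)'\Sigma^{-1}(\vec{y}-\vec{g}_j)}{2}}\,d\vec{y}\Big]$$
$$=\frac{1}{(2\pi)^{m}|\Sigma|}\Big[\pi^{m/2}|\Sigma|^{1/2}\sum_{j=0}^{2}\omega_{kj}^2+2\int\sum_{i<j}\omega_{ki}\omega_{kj}\,e^{-\big(\vec{y}-\frac{\vec{g}_i+\vec{g}_j}{2}\big)'\Sigma^{-1}\big(\vec{y}-\frac{\vec{g}_i+\vec{g}_j}{2}\big)}\,e^{-\frac{(\vec{g}_i-\vec{g}_j)'\Sigma^{-1}(\vec{g}_i-\vec{g}_j)}{4}}\,d\vec{y}\Big]$$
$$=\frac{1}{(2\pi)^{m}|\Sigma|}\Big[\pi^{m/2}|\Sigma|^{1/2}\sum_{j=0}^{2}\omega_{kj}^2+2\pi^{m/2}|\Sigma|^{1/2}\sum_{i<j}\omega_{ki}\omega_{kj}\,e^{-\frac{(\vec{g}_i-\vec{g}_j)'\Sigma^{-1}(\vec{g}_i-\vec{g}_j)}{4}}\Big]$$
and the cross term $\int\big(\sum_{j=0}^{2}\omega_{kj}\varphi_j\big) f\,d\vec{y}$ is estimated by the sample average of
$$\sum_{j=0}^{2}\omega_{kj}\,\frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}}\,e^{-\frac{(\vec{y}-\vec{g}_j)'\Sigma^{-1}(\vec{y}-\vec{g}_j)}{2}}$$
over the $N_k$ individuals in marker genotype group k. So,
$$E_k=\frac{1}{2^{m}\pi^{m/2}|\Sigma|^{1/2}}\Big[\sum_{j=0}^{2}\omega_{kj}^2+2\sum_{i<j}\omega_{ki}\omega_{kj}\,e^{-\frac{(\vec{g}_i-\vec{g}_j)'\Sigma^{-1}(\vec{g}_i-\vec{g}_j)}{4}}\Big]-\frac{2}{N_k(2\pi)^{m/2}|\Sigma|^{1/2}}\sum_{i\in k}\sum_{j=0}^{2}\omega_{kj}\,e^{-\frac{(\vec{y}_i-\vec{g}_j)'\Sigma^{-1}(\vec{y}_i-\vec{g}_j)}{2}}$$
Thus, by using the weighted sum formula as in equation (3-9), the L2E estimator $\hat{\Theta}$ for a multivariate trait is
$$\hat{\Theta}=\arg\min(E)=\arg\min\Big[\sum_{k=1}^{9}N_kE_k/N\Big]=\arg\min\Big[\sum_{k=1}^{9}N_kE_k\Big]$$
$$=\arg\min\Big\{\frac{1}{2^{m}\pi^{m/2}|\Sigma|^{1/2}}\sum_{k=1}^{9}N_k\Big[\sum_{j=0}^{2}\omega_{kj}^2+2\sum_{i<j}\omega_{ki}\omega_{kj}\,e^{-\frac{(\vec{g}_i-\vec{g}_j)'\Sigma^{-1}(\vec{g}_i-\vec{g}_j)}{4}}\Big]-\frac{2}{(2\pi)^{m/2}|\Sigma|^{1/2}}\sum_{i=1}^{N}\sum_{j=0}^{2}\omega_{ij}\,e^{-\frac{(\vec{y}_i-\vec{g}_j)'\Sigma^{-1}(\vec{y}_i-\vec{g}_j)}{2}}\Big\}$$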
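For intuition, the same criterion can be written out for a single trait (m = 1), where the integral of the squared mixture density has the closed form used above. The sketch below is a simplified illustration under that assumption, with a common residual standard deviation; the function names are hypothetical and the numbers are made up for the check only.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_sq_integral(weights, means, sigma):
    """Closed-form integral of the squared normal-mixture density (common sigma).
    The double sum covers the squared terms and each cross term twice."""
    total = 0.0
    for wi, gi in zip(weights, means):
        for wj, gj in zip(weights, means):
            total += wi * wj * math.exp(-((gi - gj) ** 2) / (4.0 * sigma ** 2))
    return total / (2.0 * sigma * math.sqrt(math.pi))

def l2e_energy(data, weights, means, sigma):
    """Univariate L2E criterion for one marker genotype group."""
    fitted = sum(
        sum(w * normal_pdf(y, g, sigma) for w, g in zip(weights, means))
        for y in data
    )
    return mixture_sq_integral(weights, means, sigma) - 2.0 * fitted / len(data)

# Numerical check of the closed form against a Riemann sum.
weights, means, sigma = (0.25, 0.5, 0.25), (-2.0, 0.0, 3.0), 1.5
step = 0.001
riemann = step * sum(
    sum(w * normal_pdf(-30.0 + i * step, g, sigma) for w, g in zip(weights, means)) ** 2
    for i in range(60000)
)
print(mixture_sq_integral(weights, means, sigma), riemann)
```

In a full analysis, this energy would be summed over the nine marker genotype groups with weights N_k and minimized over the genotypic means and the variance.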

$$g(t)=\frac{\alpha}{1+\beta e^{-\gamma t}}$$
where $\alpha$ is the asymptotic or limiting value of g when $t \to \infty$, … 4-2), we can see that the F2 mice follow the S-shaped (logistic) growth curve well. Therefore, we use Equation (4-2) to fit the mean vector for growth-controlling QTL genotype j, with $\Theta_j = (\alpha_j, \beta_j, \gamma_j)$, in our mapping model. If the parameters are different for different genotypes at the putative QTL, it implies that this QTL plays a role in governing the difference of growth trajectories.
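As a small illustration of how a logistic curve turns three parameters into a genotypic mean vector over the measurement times, consider the sketch below. It assumes the standard three-parameter logistic form g(t) = a / (1 + b exp(-r t)), with Latin parameter names borrowed from the parameter tables; the parameter values and the helper names are hypothetical and are not estimates from the mouse data.

```python
import math

def logistic_growth(t, a, b, r):
    """Logistic curve g(t) = a / (1 + b * exp(-r * t)); a is the limiting value."""
    return a / (1.0 + b * math.exp(-r * t))

def genotype_mean_vector(times, params):
    """Mean trait values of one QTL genotype at the given measurement times."""
    a, b, r = params
    return [logistic_growth(t, a, b, r) for t in times]

weeks = list(range(1, 11))                      # 10 weekly measurements
hypothetical_params = {                         # made-up (a, b, r) per genotype
    "QQ": (40.0, 9.0, 0.60),
    "Qq": (38.0, 9.0, 0.58),
    "qq": (34.0, 9.0, 0.55),
}
curves = {g: genotype_mean_vector(weeks, p) for g, p in hypothetical_params.items()}
for genotype, curve in curves.items():
    print(genotype, [round(v, 1) for v in curve])
```

Under this parameterization the initial value is a / (1 + b) and the curve passes through a / 2 at t = ln(b) / r, so genotype differences in any of the three parameters change the shape or timing of the trajectory.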

4-2), can be tested by formulating the following hypotheses:
$H_{01}$: $\Theta_2 \equiv \Theta_1 \equiv \Theta_0$
$H_{11}$: Not all the equalities above hold
$H_{01}$ states that there is no QTL affecting the mouse growth curve (the reduced model), whereas $H_{11}$ proposes that such a QTL does exist (the full model).

Regional test: How the detected QTLs trigger their effects on the difference of growth trajectories during a particular period [t1, t2] can be tested. This test can be based

3.4. It is calculated as: $ED = E(\tilde{\Theta}_p, \tilde{\Theta}_q) - E(\hat{\Theta}_p, \hat{\Theta}_q)$, where the tilde and the hat denote the L2Es of the unknown parameters under the null and alternative hypotheses, respectively. An empirical approach for determining the critical threshold is based on permutation tests, as advocated by Churchill and Doerge (1994). By repeatedly shuffling the relationships between marker genotypes and phenotypes, a series of energy differences is calculated, from the distribution of which the critical threshold is determined. We can also test the global effects of different genetic components, additive,

4-1). It is a complex function, and it is almost impossible to find a closed-form solution that minimizes it. Thus, we will use numerical methods such as Newton-Raphson or Simplex to search for the L2E estimates. However, one common problem with numerical optimization methods is that they are easily trapped in a local mode and usually cannot guarantee a global mode. This problem is almost unsolvable in general, but a good starting point for the search greatly helps. Below I will show how to empirically choose initial values for parameter estimation in a global test:

Under $H_{01}$, we assume that all the observed data come from the same distribution. Therefore, we can calculate the means of the phenotypic data at each time point, and nonlinearly fit them with equation (4-2) to find the initial values (a0, b0, r0). Since there are three parameters, we need three equations to solve for them. Choose three time points (t1, t2, t3) which best characterize the whole curve shape; then the following equations can be established: a
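One concrete way to turn three chosen time points into starting values is sketched below. It is an illustration under stated assumptions rather than the procedure used here: the curve is taken to be the logistic form g(t) = a / (1 + b exp(-r t)), the three time points are assumed to be equally spaced, and the function name `three_point_start` is hypothetical. The key observation is that 1/g is linear in exp(-r t), so the three equations can be solved in closed form.

```python
import math

def logistic_growth(t, a, b, r):
    return a / (1.0 + b * math.exp(-r * t))

def three_point_start(t1, spacing, y1, y2, y3):
    """Solve a / (1 + b*exp(-r*t)) = y at t1, t1 + spacing and t1 + 2*spacing.

    y1, y2, y3 are the phenotypic means at those three time points.
    """
    u1, u2, u3 = 1.0 / y1, 1.0 / y2, 1.0 / y3    # 1/g = 1/a + (b/a) * exp(-r t)
    ratio = (u2 - u3) / (u1 - u2)                # equals exp(-r * spacing)
    if not 0.0 < ratio < 1.0:
        raise ValueError("means are not consistent with an increasing logistic curve")
    r = -math.log(ratio) / spacing
    slope = (u1 - u2) / (math.exp(-r * t1) - math.exp(-r * (t1 + spacing)))  # b / a
    intercept = u1 - slope * math.exp(-r * t1)   # 1 / a
    if intercept <= 0.0:
        raise ValueError("means imply a non-positive asymptote")
    a = 1.0 / intercept
    return a, slope * a, r

# Round trip on noise-free values from a known curve (hypothetical parameters).
true_a, true_b, true_r = 40.0, 9.0, 0.5
means = [logistic_growth(t, true_a, true_b, true_r) for t in (1.0, 5.0, 9.0)]
a0, b0, r0 = three_point_start(1.0, 4.0, *means)
print(a0, b0, r0)
```

With noisy means the solution is only approximate, and the two checks can fail when the chosen points do not bracket the bend of the curve, which is why points that characterize the whole curve shape are preferable.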

Fig. 4-1 and Fig. 4-3, respectively. It is quite interesting that the MLE method identifies 6 genome-wide significant QTLs (on chromosomes 4, 6, 7, 10, 11 and 15), while L2E identifies only two of them as significant (on chromosomes 6 and 11). For simplicity, I will call the QTLs identified as significant by MLE the MLE QTLs, and the QTLs identified as significant by L2E the L2E QTLs. The most notable differences between the MLE QTLs and the L2E QTLs are on chromosome 7, where MLE

Fig. 4-2 and the corresponding parameter estimates are in Table 4-1. Also, the mean curves of the L2E QTLs are plotted in Fig. 4-4 (labeled with * in the figure) and the corresponding parameter estimates are in Table 4-2. For comparison purposes, the mean curves for the QTLs identified as significant by MLE but not by L2E are also plotted in Fig. 4-4, with the corresponding parameters estimated by L2E. For all the significant QTLs detected by both methods, the QTL genotype effects are in the order QQ > Qq > qq. This agrees with the experimental design, in which the larger inbred mouse line (LG/J) carries the favorable genes for growth.

Since the L2E has been shown to be more robust than the MLE, and less sensitive to model misspecification, the L2E QTLs and their genetic effects are more trustworthy. So, here I will focus only on the two L2E QTLs, which are also detected by MLE. On chromosome 6, where an L2E QTL is located, the MLE profile contains two almost equal LR peaks, which makes it difficult to know which one comes from the true QTL signal or whether there are two nearby QTLs. However, L2E gives only one peak and eliminates the ambiguity. On chromosome 11, the L2E QTL genotypes show a very clear dominant effect, i.e. QQ = Qq, indicating that the allele from the LG/J mouse line is dominant over the allele from the SM/J mouse line. The MLE cannot show this information for the same QTL. In fact, the MLE QTL effect is quite small and the signal is almost missed. These differences between the L2E and MLE indicate that the mice data may not follow the multivariate normal distribution very well. Further biological experiments such as positional cloning are strongly recommended to pinpoint the exact genes in the two significant loci on chromosomes 6 and 11. For the QTLs on chromosomes 4, 7, 10 and 15, which are identified as significant only by the MLE method, we need to be conservative in interpreting the results. They may be spurious results arising from misspecification of the model.


Figure 4-1. MLE scan profile for the growth curve of the mice data. The red line is the genome-wide cutoff point generated by the permutation test.

Figure 4-2. The mean curve plots for all significant QTLs detected by MLE

Figure 4-3. L2E scan profile for the growth curve of the mice data. The red line is the genome-wide cutoff point generated by the permutation test.

Figure 4-4. The mean curve plots for all the corresponding QTLs detected by MLE but with their parameters estimated by L2E. The * indicates the two significant QTLs detected by L2E.

Table 4-1. The parameter estimation for QTLs detected by the MLE method
chromosome   QTL location   aQQ   bQQ   rQQ   aQq   bQq   rQq   aqq   bqq   rqq

Table 4-2. The parameter estimation for QTLs detected by the L2E method
chromosome   QTL location   aQQ   bQQ   rQQ   aQq   bQq   rQq   aqq   bqq   rqq


(1) Extend this robust approach to estimate epistatic interactions of QTLs for a complex trait, providing a quantitative framework for testing the role of epistasis in trait control and development;
(2) Extend my models to more complicated mapping populations, such as those with family structure;
(3) Inspired by the paper of Basu et al. (1998), we may add one more parameter to the L2E method to fine-tune the robustness of this method;
(4) Model a complex network of genotype-by-environment interactions to test whether different genes act singly or interact with other genes to determine environment-dependent responses for a complex trait.

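3-2). Under mild conditions, the L2E parameters are consistent and asymptotically normal:
$$\sqrt{n}\,(\hat{\theta}-\theta_0)\to N\big(0,\;A^{-1}(B_2-B_1B_1^T)A^{-1}\big),$$
where
$$A=\int\frac{\partial\varphi}{\partial\theta_0}\frac{\partial\varphi}{\partial\theta_0^T}\,dx,\qquad B_2=\int\frac{\partial\varphi}{\partial\theta_0}\frac{\partial\varphi}{\partial\theta_0^T}\,\varphi\,dx,\qquad B_1=\int\frac{\partial\varphi}{\partial\theta_0}\,\varphi\,dx.$$
Proof: The estimating function for (3-2) is $\psi_n=\partial E/\partial\theta=\frac{1}{n}\sum_{i}\psi(x_i,\theta)$, with $A(\theta_0)=E\big[\partial\psi/\partial\theta^T\big]$ and $B(\theta_0)=E\big[\psi\psi^T\big]$. In this case,
$$A(\theta_0)=E\Big[\frac{\partial\psi}{\partial\theta^T}\Big]_{\theta=\theta_0}=E\Big[\int\frac{\partial^2\varphi}{\partial\theta\,\partial\theta^T}\varphi\,dx+\int\frac{\partial\varphi}{\partial\theta}\frac{\partial\varphi}{\partial\theta^T}\,dx-\frac{\partial^2\varphi}{\partial\theta\,\partial\theta^T}\Big]_{\theta=\theta_0}=\int\frac{\partial\varphi}{\partial\theta_0}\frac{\partial\varphi}{\partial\theta_0^T}\,dx,$$
$$B(\theta_0)=E\big[\psi\psi^T\big]_{\theta=\theta_0}=\int\frac{\partial\varphi}{\partial\theta_0}\frac{\partial\varphi}{\partial\theta_0^T}\,\varphi\,dx-\int\frac{\partial\varphi}{\partial\theta_0}\,\varphi\,dx\int\frac{\partial\varphi}{\partial\theta_0^T}\,\varphi\,dx.$$
Then the results follow.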
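The sandwich form $A^{-1}(B_2 - B_1 B_1^T)A^{-1}$ can be checked numerically in the simplest setting, the mean of a single normal density. The sketch below is my own numerical illustration, not part of the derivation: it evaluates the three integrals with a midpoint rule (a hypothetical helper `integrate`) and compares the result with the closed form $8\sigma^2/(3\sqrt{3}) \approx 1.54\,\sigma^2$, which exceeds the variance $\sigma^2$ of the MLE of the mean.

```python
import math

def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def l2e_mean_asymptotic_variance(sigma):
    """Asymptotic variance of sqrt(n) * (L2E mean - true mean) for N(0, sigma^2)."""
    def phi(x):
        return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

    def dphi(x):
        return (x / sigma ** 2) * phi(x)        # derivative of phi with respect to the mean

    lo, hi = -12.0 * sigma, 12.0 * sigma
    a = integrate(lambda x: dphi(x) ** 2, lo, hi)
    b2 = integrate(lambda x: dphi(x) ** 2 * phi(x), lo, hi)
    b1 = integrate(lambda x: dphi(x) * phi(x), lo, hi)
    return (b2 - b1 ** 2) / a ** 2

variance = l2e_mean_asymptotic_variance(1.0)
print(variance, 8.0 / (3.0 * math.sqrt(3.0)))
```

The ratio of roughly 1.54 is the efficiency price paid for robustness when the data really are normal, in line with the smaller MSE of the MLE in the noise-free simulations.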

22+p2e(x2)2 22+p3e(x3)2 22:TostudytheasymptoticpropertiesoftheL2Estimators,weneedtocalculatethefollowing:A=266666664R@' @12dxR@' @1@' @2dxR@' @1@' @3dxR@' @1@' @2dxR@' @1@' @2dxR@' @22dxR@' @2@' @3dxR@' @2@' @2dxR@' @1@' @3dxR@' @2@' @3dxR@' @32dxR@' @3@' @2dxR@' @1@' @2dxR@' @2@' @2dxR@' @3@' @2dxR(@' @2)2dx377777775B1=0BBBBBBB@R@' @1'dxR@' @2'dxR@' @3'dxR@' @2'dx1CCCCCCCAB2=266666664R@' @12'dxR@' @1@' @2'dxR@' @1@' @3'dxR@' @1@' @2'dxR@' @1@' @2'dxR@' @22'dxR@' @2@' @3'dxR@' @2@' @2'dxR@' @1@' @3'dxR@' @2@' @3'dxR@' @32'dxR@' @3@' @2'dxR@' @1@' @2'dxR@' @2@' @2'dxR@' @3@' @2'dxR@' @22'dx377777775Dierentiatingthedensityfunction'(x)overallparametersyield:@' @1=p1 22@' @2=p2 22@' @3=p3 22@' @2=1 22p 22(xj)2

@j2dx=p2j 2Z@' @j@' @kdx=pjpkek2jk 212k2jkj6=kZ@' @j@' @2dx=Xk=2;3pjpkek2jk(jk) 8p 23 2k2jkZ@' @22dx=1 8p 2"3 43Xj=1p2j+Xj6=k2pjpk3 42k2jk+k4jkek2jk#:FormatrixB1andforj;k=1;2;3,wehaveZ@' @j'dx=pj @2'dx=1 82p @j2'dx=p2j @j@' @k'dx=pjpk @j@' @2'dx=pj 2"Xk6=j(2cjk5c2jk)pjpk+2cjk(1cjk)p2ke3c2jkljXk6=j(l2k1)Yk6=jpkec123#Z(@' @22'dx=1 12p 2Xj6=kpjpk(pj+pk)[3(k2jk1)(3k2jk1)1]e3k2jk+6p1p2p3[Xk6=j(l2j1)(l2k1)1]ec123#Thus,theasymptoticcovariancematrixfortheL2estimatorscanbeexpressedinaclosedform,althoughtheexpressionformisverycomplicated.Inasimplernormalcase, 121

22:then,A=264R@' @2dxR@' @@' @2dxR@' @@' @2dxR(@' @2)2dx375=2641 4p 2003 32p 2375A1=2644p 20032p 2 226410082 @2'dxR@' @@' @2'dxR@' @@' @2'dxR(@' @2)2'dx375=1 6p 22375B1=264R@' @'dxR@' @2'dx375=26401 82p 82p 82p 64(2)3375Hence,A1(B2B1BT1)A1=A1B2A1A1B1BT1A1=26482

2dt=r aZx2eax2dx=1 2a1 2dt=1 2ap 2ap 4a21 2dt=3 4a2p 4a2p 123

2;Ze3 2x2dx=r 2x2dx=1 3r 2x2dx=1 3r 3.5 forfurtherdetails. 124

1. Alberch, P., S. J. Gould, G. F. Oster and D. B. Wake, 1979 Size and shape in ontogeny and phylogeny. Paleobiology 5:296-317.
2. Allison, D. B., 1997 Transmission-disequilibrium tests for quantitative traits. American Journal of Human Genetics 60:676-690.
3. Atchley, W. R. and J. Zhu, 1997 Developmental quantitative genetics, conditional epigenetic variability and growth in mice. Genetics 147:765-776.
4. Andersson, L., C. S. Haley, H. Ellegren, S. A. Knott, M. Johansson, K. Andersson, L. Andersson-Eklund, I. Edfors-Lilja, M. Fredholm, I. Hansson, J. Hakansson and K. Lundstrom, 1994 Genetic mapping of quantitative trait loci for growth and fatness in pigs. Science 263:1771-1774.
5. Barton, N. H. and K. S. Gale, 1993 Genetic analysis of hybrid zones, in Hybrid Zones and the Evolutionary Process. Oxford University Press, Oxford.
6. Basu, A., I. R. Harris, N. L. Hjort and M. C. Jones, 1998 Robust and efficient estimation by minimising a density power divergence. Biometrika 85:549-559.
7. Beran, R., 1977 Robust location estimates. The Annals of Statistics 5:431-444.
8. Bennett, J. H. and F. E. Binet, 1956 Association between Mendelian factors with mixed selfing and random mating. Heredity 10:51-55.
9. Broman, K. W., 2005 The genomes of recombinant inbred lines. Genetics 169:1133-1146.
10. Brown, G. R., D. L. Bassoni, G. P. Gill, J. R. Fontana, N. C. Wheeler, R. R. Megraw, M. F. Davis, M. M. Sewell, G. A. Tuskan and D. B. Neale, 2003 Identification of quantitative trait loci influencing wood property traits in loblolly pine (Pinus taeda L.). III. QTL verification and candidate gene mapping. Genetics 164:1537-1546.
11. Camp, N. J., 1998 Genomewide transmission/disequilibrium testing: A consideration of the genotype relative risks at disease loci. American Journal of Human Genetics 61:1424-1430.
12. Charlesworth, B., 1991 The evolution of sex chromosomes. Science 251:1030-1033.
13. Chen, Z. and H. Chen, 2005 On some statistical aspects of the interval mapping for QTL detection. Statistica Sinica 15:909-925.
14. Cheverud, J. M., E. J. Routman, F. A. M. Duarte, B. van Swinderen and K. Cothran, 1996 Quantitative trait loci for murine growth. Genetics 142:1305-1319.
15. Churchill, G. A. and R. W. Doerge, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138:963-971.

16. Corva, P. M. and J. F. Medrano, 2001 Quantitative trait loci (QTLs) mapping for growth traits in the mouse: A review. Genet. Sel. Evol. 22:105-132.
17. Daniels, M. J. and M. Pourahmadi, 2002 Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika 89:553-566.
18. Darvasi, A., 1998 Experimental strategies for the genetic dissection of complex traits in animal models. Nature Genetics 18:19-24.
19. Doerge, R. W. and G. A. Churchill, 1996 Permutation test for multiple loci affecting a quantitative character. Genetics 142:285-294.
20. Du, F. X., P. Sorensen, G. Thaller and I. Hoeschele, 2002 Joint linkage disequilibrium and linkage mapping of quantitative trait loci. Proceedings of the 7th World Congress Genetics Applied to Livestock Production 32:661-668.
21. Emebiri, L. C., M. E. Devey, A. C. Matheson and M. U. Slee, 1998 Age-related changes in the expression of QTLs for growth in radiata pine seedlings. Theor. Appl. Genet. 97:1053-1061.
22. Frary, A., T. C. Nesbitt, A. Frary, S. Grandillo, E. van der Knaap, B. Cong, J. P. Liu, J. Meller, R. Elber, K. B. Alpert and S. D. Tanksley, 2000 A quantitative trait locus key to the evolution of tomato fruit size. Science 289:85-88.
23. Farnir, F., B. Grisart, W. Coppieters and J. Riquet et al, 2002 Simultaneous mining of linkage and LD to fine map QTL in outbred half-sib pedigrees: revisiting the location of a QTL with major effect on milk production on bovine chromosome 14. Genetics 161:275-287.
24. Fisher, R. A., 1918 The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc Edinb 52:399-433.
25. Fulker, D. W. and L. R. Cardon, A sib-pair approach to interval mapping of quantitative trait loci. American Journal of Human Genetics 54:1092-1103.
26. Georges, M., Mapping, fine mapping, and molecular dissection of quantitative trait loci in domestic animals. Annual Review of Genomics and Human Genetics 8:131-162.
27. Gomes, I., A. Collins, C. Lonjou, N. S. Thomas, J. Wilkinson, M. Watson and N. Morton, 1999 Hardy-Weinberg quality control. Ann Hum Genet 63:535-538.
28. Good, P., 2000 Permutation tests: a practical guide to resampling methods for testing hypotheses. Springer-Verlag New York, Inc.
29. Gould, S. J., 1977 Ontogeny and Phylogeny. Harvard University Press, Cambridge, MA.

30. Greenwood, P. E., 1996 A guide to chi-squared testing. New York: John Wiley & Sons.
31. Haldane, J. B. S., 1949 The association of characters as a result of inbreeding and linkage. Ann. Eugen 15:15-23.
32. Haley, C. and S. Knott, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.
33. Hardy, G. H., 1908 Mendelian proportions in a mixed population. Science 28:49-50.
34. Hartwell, L., L. Hood, M. L. Goldberg, A. Reynolds, L. M. Silver and R. C. Veres, 2006 Genetics: From Genes to Genomes. McGraw-Hill Companies.
35. Hedrick, P. W., 1987 Gametic disequilibrium measures: proceed with caution. Genetics 117:331-341.
36. Hjort, N. L., 1994 Minimum L2 and robust Kullback-Leibler estimation. Proceedings of the 12th Prague Conference: 102-105.
37. Hoeschele, I. and P. VanRaden, 1993 Bayesian analysis of linkage between genetic markers and quantitative trait loci. I. Prior knowledge. Theoret. Appl. Genet. 85:953-960.
38. Hosking, L., S. Lumsden, K. Lewis, A. Yeo, L. McCarthy, A. Bansal, J. Riley, I. Purvis and C. F. Xu, 2004 Detection of genotyping errors by Hardy-Weinberg equilibrium testing. Eur J Hum Genet 12:395-399.
39. Hurvich, C. M. and C. L. Tsai, 1989 Regression and time series model selection in small samples. Biometrika 76:297-307.
40. Kao, C. H., Z. B. Zeng and R. D. Teasdale, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216.
41. Kao, C. H. and Z. B. Zeng, 2002 Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics 160:1243-1261.
42. Kirkpatrick, M. and N. Heckman, 1989 A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J. Math. Biol. 27:429-450.
43. Knott, S. A. and C. S. Haley, 1992 Maximum likelihood mapping of quantitative trait loci using full-sib families. Genetics 132:1211-1222.
44. Knott, S. A., L. Marklund, C. S. Haley, K. Andersson, W. Davies, H. Ellegren, M. Fredholm, I. Hansson, B. Hoyheim, K. Lundstrom, M. Moller and L. Andersson, 1998 Multiple marker mapping of quantitative trait loci in a cross between outbred wild boar and Large White pigs. Genetics 149:1069-1080.
45. Kruglyak, L., 1999 Genetic isolates: separate but equal? PNAS 96:1170-1172.

46. Jansen, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135:205-211.
48. Jansen, R. C. and P. Stam, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136:1447-1455.
49. Li, C. B., A. L. Zhou and T. Sang, 2006 Rice domestication by reducing shattering. Science 311:1936-1939.
50. Liu, T., R. J. Todhunter, Q. Lu, L. Schoettinger, H. Y. Li, R. C. Littell, N. Burton-Wurster, G. M. Acland, G. Lust and R. L. Wu, 2006 Modeling extent and distribution of zygotic disequilibrium: Implications for a multigenerational canine pedigree. Genetics 174:439-453.
51. Liu, T., R. J. Todhunter, S. Wu, W. Hou, R. Mateescu, Z. Zhang, N. I. Burton-Wurster, G. M. Acland, G. Lust and R. L. Wu, 2007a A random model for mapping imprinted quantitative trait loci in a structured pedigree: An implication for mapping canine hip dysplasia. Genomics 90:276-284.
52. Liu, T., 2007b Bayesian functional mapping of complex dynamic traits. University of Florida Electronic Dissertation.
53. Lander, E. and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199.
54. Lewontin, R. C., 1964 The interaction of selection and linkage. General considerations, heterotic models. Genetics 49:49-67.
55. Lewontin, R. C., 1988 On measures of gametic disequilibrium. Genetics 120:849-852.
56. Lin, D. Y. and D. Zeng, 2006 Likelihood-based inference on haplotype effects in genetic association studies (with discussion). Journal of the American Statistical Association 101:89-118.
57. Lou, X. Y., G. Casella, R. J. Todhunter, M. C. K. Yang and R. L. Wu, 2005 A general statistical framework for unifying interval and linkage disequilibrium mapping: toward high-resolution mapping of quantitative traits. Journal of the American Statistical Association 100:158-172.
58. Lynch, M. and B. Walsh, 1998 Genetics and analysis of quantitative traits. Sinauer Associates, Sunderland, MA.
59. Ma, C. X., G. Casella and R. L. Wu, 2002 Functional mapping of quantitative trait loci underlying the character process: A theoretical framework. Genetics 161:1751-1762.

60. Ma, C. X., Q. B. Yu, A. Berg, D. Drost, E. Novaes, G. F. Fu, J. S. Yap, A. X. Tan, M. Kirst, Y. H. Cui and R. L. Wu, 2007 A pleiotropic model for mapping phenotypic plasticity of a count trait. Genetics.
61. Mackay, T. F. C., 2001 Quantitative trait loci in Drosophila. Nature Reviews Genetics 2:11-20.
62. McLachlan, G. J. and D. Peel, 2002 Finite mixture model. New York, John Wiley & Sons, Inc.
63. McRae, A. F., J. C. McEwan, K. G. Dodds, T. Wilson and A. M. Crawford et al, 2002 Linkage disequilibrium in domestic sheep. Genetics 160:1113-1122.
64. Mott, R., C. J. Talbot, M. G. Turri, A. C. Collins and J. Flint, 2000 A method for fine mapping quantitative trait loci in outbred animal stocks. PNAS 97:12649-12654.
65. Nuzhdin, S. V., E. G. Pasyukova, C. L. Dilda, Z-B. Zeng and T. F. C. Mackay, 1997 Sex-specific quantitative trait loci affecting longevity in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 94:9734-9739.
66. Paterson, A. H., E. S. Lander, J. D. Hewitt, S. Peterson and S. E. Lincoln et al, 1988 Resolution of quantitative traits into Mendelian factors by using a complete linkage map of restriction fragment length polymorphisms. Nature 335:721-726.
67. Paterson, A. H., 2006 Leafing through the genomes of our major crop plants: strategies for capturing unique information. Nature Reviews Genetics 7:174-184.
68. Peltonen, L. and V. A. McKusick, 2001 Genomics and medicine: Dissecting human disease in the postgenomic era. Science 291:1224-1229.
69. Pletcher, S. D. and C. J. Geyer, 1999 The genetic analysis of age-dependent traits: modeling the character process. Genetics 153:825-835.
70. Pletcher, S. D. and F. Jaffrezic, 2002 Generalized character process models: estimating the genetic basis of traits that cannot be observed and that change with age or environmental conditions. Biometrics 58:157-162.
71. Pourahmadi, M., 1999 Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika 86:677-690.
72. Pourahmadi, M., 2000 Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika 87:425-443.
73. Rabinowitz, D., 1997 A transmission disequilibrium test for quantitative trait loci. Human Heredity 47:342-350.
74. Rafalski, A., 2002 Applications of single nucleotide polymorphisms in crop genetics. Current Opinion in Plant Biology 5:94-100.

75. Reyes-Valdés, M. H. and D. M. Stelly, 1995 A maximum likelihood algorithm for genome mapping of cytogenetic loci from meiotic configuration data. Proc. Natl. Acad. Sci. 92:9824-9828.
76. Salvi, S. and R. Tuberosa, 2005 To clone or not to clone plant QTLs: present and future challenges. Trends in Plant Science 10:297-304.
77. Sax, K., 1923 The association of size difference with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8:552-560.
78. Schwarz, G., 1978 Estimating the dimension of a model. The Annals of Statistics 6:461-464.
79. Scott, D. W., 1992 Multivariate density estimation: theory, practice and visualization. John Wiley, New York.
80. Scott, D. W., 2001 Parametric statistical modeling by minimum integrated square error. Technometrics 43:273-285.
81. Siegmund, D. and B. Yakir, 2007 The statistics of gene mapping. Springer-Verlag New York, Inc.
82. Simpson, S. P., 1989 Detection of linkage between quantitative trait loci and restriction fragment length polymorphisms using inbred lines. Theoretical and Applied Genetics 77:815-819.
83. Spelman, R. J., W. Coppieters, L. Karim, J. K. van Arendonk and H. Bovenhuis, 1996 Quantitative trait loci analysis for five milk production traits on chromosome six in the Dutch Holstein-Friesian population. Genetics 144:1799-1808.
84. Stuber, C. W., S. E. Lincoln, D. W. Wolff, T. Helentjaris and E. S. Lander, 1992 Identification of genetic factors contributing to heterosis in a hybrid from two elite maize inbred lines using molecular markers. Genetics 132:823-839.
85. Tai, J. J., 1989 Application of Bayesian decision procedure to the inference of genetic linkage. J. Am. Stat. Assoc. 84:669-673.
86. Tanksley, S. D., 1993 Mapping genes. Annu. Rev. Genet. 27:205-233.
87. Van Ooijen, J. W., 1992 Accuracy of mapping quantitative trait loci in autogamous species. Theoretical and Applied Genetics 84:803-811.
88. Verhaegen, D., C. Plomion, J. M. Gion, M. Poitel and P. Costa, 1997 Quantitative trait dissection analysis in Eucalyptus using RAPD markers. 1. Detection of QTL in interspecific hybrid progeny, stability of QTL expression across different ages. Theor. Appl. Genet. 95:597-608.
89. Von Bertalanffy, L., 1957 Quantitative laws in metabolism and growth. Q. Rev. Biol. 32:217-231.

90. Wahlund, S., 1928 Zusammensetzung von Population und Korrelationserscheinung vom Standpunkt der Vererbungslehre aus betrachtet. Hereditas 11:65-106.
91. Wall, J. D. and J. K. Pritchard, 2003 Haplotype blocks and linkage disequilibrium in the human genome. Nature Reviews Genetics 4:587-597.
92. Walling, G. A., P. M. Visscher, L. Andersson, M. F. Rothschild, L. Wang, G. Moser, M. A. Groenen, J. P. Bidanel, S. Cepica, A. L. Archibald, H. Geldermann, D. J. de Koning, D. Milan and C. S. Haley, 2000 Combined analyses of data from quantitative trait loci mapping studies: chromosome 4 effects on porcine growth and fatness. Genetics 155:1369-1378.
93. Weinberg, W., 1908 Über den Nachweis der Vererbung beim Menschen. Jahresh. Verein f. vaterl. Naturk. Württem. 64:368-382.
94. Weir, B., 1996 Genetic data analysis II. Sinauer Associates, Sunderland, MA.
95. Weiss, K. M. and A. G. Clark, 2002 Linkage disequilibrium and the mapping of complex human traits. Trends Genet 18:19-24.
96. Weiss, L. A., G. Kosova, R. J. Delahanty, L. Jiang, E. H. Cook, C. Ober and J. S. Sutcliffe, 2006 Variation in ITGB3 is associated with whole blood serotonin level and autism susceptibility. Eur J Hum Genet 14:923-31.
97. Weller, J. I., 1986 Maximum likelihood techniques for the mapping and analysis of quantitative trait loci with the aid of genetic markers. Biometrics 42:627-640.
98. Weller, J. I., 1987 Mapping and analysis of quantitative trait loci in Lycopersicon (tomato) with the aid of genetic markers using approximate maximum likelihood methods. Heredity 59:413-421.
99. West, G. B., J. H. Brown and B. J. Enquist, 2001 A general model for ontogenetic growth. Nature 413:628-631.
100. Wittke-Thompson, J. K., A. Pluzhnikov and N. J. Cox, 2005 Rational inferences about departures from Hardy-Weinberg equilibrium. Am J of Hum Genet 76:967-968.
101. Wu, R. L., 1998 The detection of plasticity genes in heterogeneous environments. Evolution 52:967-977.
102. Wu, R. L. and Z. B. Zeng, 2001 Joint linkage and linkage disequilibrium mapping in natural populations. Genetics 157:899-909.
103. Wu, R. L., X. Y. Lou, C. X. Ma, X. L. Wang, B. A. Larkins and G. Casella, 2002a An improved genetic model generates high-resolution mapping of QTL for protein quality in maize endosperm. PNAS 99:11281-11286.
104. Wu, R. L., C. X. Ma and G. Casella, 2002b A bivalent polyploid model for linkage analysis in outcrossing tetraploid species. Theoretical Population Biology 62:129-151.

105. Wu, R. L., C. X. Ma, R. C. Littell, S. S. Wu, T. M. Yin, M. Huang, M. Wang and G. Casella, 2002c A logistic mixture model for characterizing genetic determinants causing differentiation in growth trajectories. Genet. Res. 79:235-245.
106. Wu, R. L., C. X. Ma, W. Zhao and G. Casella, 2003 Functional mapping of quantitative trait loci underlying growth rates: A parametric model. Physiol. Genomics 14:241-249.
107. Wu, R. L., C. X. Ma, M. Lin, Z. H. Wang and G. Casella, 2004a Functional mapping of growth quantitative trait loci using a transform-both-sides logistic model. Biometrics 60:729-38.
108. Wu, R. L., C. X. Ma, M. Lin and G. Casella, 2004b A general framework for analyzing the genetic architecture of developmental characteristics. Genetics 166:1541-1551.
109. Wu, R. L., C. X. Ma and G. Casella, 2007 Statistical genetics of quantitative traits: linkage, map and QTL. Springer-Verlag New York, Inc.
110. Wu, S., J. Yang and R. L. Wu, 2007 Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics. Bioinformatics 23:i569-i576.
111. Wu, W. B. and M. Pourahmadi, 2003 Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90:831-844.
112. Wu, W. R., W. M. Li, D. Z. Tang, H. R. Lu and A. J. Worland, 1999 Time-related mapping of quantitative trait loci underlying tiller number in rice. Genetics 151:297-303.
113. Xu, S. and W. R. Atchley, 1995 A random model approach to interval mapping of quantitative trait loci. Genetics 141:1189-1197.
114. Xu, S., 2003 Estimating polygenic effects using markers of the entire genome. Genetics 163:789-801.
115. Yang, J., 2006 Nonparametric functional mapping of quantitative trait loci. PhD Dissertation at University of Florida.
116. Yang, R. C., 2000 Zygotic associations and multilocus statistics in a nonequilibrium diploid population. Genetics 155:1449-1458.
117. Yang, R. C., 2002 Analysis of multilocus zygotic associations. Genetics 161:435-45.
118. Zeng, Z. B., 1994 Precision mapping of quantitative trait loci. Genetics 136:1457-1468.

119. Zeng, Z. B., J. Liu, L. F. Stam, C. H. Kao, J. M. Mercer and C. C. Laurie, 2000 Genetic architecture of a morphological shape difference between two Drosophila species. Genetics 154:299-310.
120. Zhang, Q., D. Boichard, I. Hoeschele, C. Ernst, A. Eggen, B. Murkve, M. Pfister-Genskow, L. A. Witte, F. E. Grignola, P. Uimari, G. Thaller and M. D. Bishop, 1998 Mapping quantitative trait loci for milk production and health of dairy cattle in a large outbred pedigree. Genetics 149:1959-1973.

Song Wu was born in Jingzhou, a beautiful mid-sized city near the Changjiang River. The city is located in the middle part of China and is full of historical sites. In 1996, Song left his hometown and enrolled in the University of Science & Technology of China in Hefei, Anhui. Five years later, he obtained his bachelor's degree in biological sciences. Right after his college graduation in 2001, Song was admitted into an interdisciplinary PhD program in the College of Medicine at the University of Florida (Gainesville, FL, USA). He was a Ph.D. candidate in the concentration of Molecular Cell Biology for three years. During that period, Song learned a great deal about the molecular mechanisms of gene regulation, which laid a solid foundation for his future research work. Motivated by his strong interest in statistical genetics, Song joined the Department of Statistics at the University of Florida in 2005 to start his training in statistics. Song obtained his master's degree in statistics in 2006, and his PhD degree in August of 2008. Song has worked in the Department of Biostatistics as a faculty member since August of 2008. Song's research goal is to build a bridge across diverse sciences.