
Nonparametric Covariance Estimation in Functional Mapping of Complex Dynamic Traits

Permanent Link: http://ufdc.ufl.edu/UFE0022595/00001

Material Information

Title: Nonparametric Covariance Estimation in Functional Mapping of Complex Dynamic Traits
Physical Description: 1 online resource (113 p.)
Language: english
Creator: Yap, John
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: covariance, functional, likelihood, mapping, mixture, nonparametric, penalized, qtl, quantitative, trait
Statistics -- Dissertations, Academic -- UF
Genre: Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: One of the fundamental objectives in agricultural, biological and biomedical research is the identification of genes that control the developmental pattern of complex traits, their responses to the environment, and the way these genes interact in a coordinated manner to determine the final expression of the trait. More recently, a new statistical framework, called functional mapping, has been developed to identify and map quantitative trait loci (QTLs) that determine developmental trajectories by integrating biologically meaningful mathematical models of trait progression into a mixture model for unknown QTL genotypes. Functional mapping has emerged to be a powerful statistical tool for mapping QTLs controlling the responsiveness (reaction norm) of a trait to developmental and environmental signals. From a statistical perspective, functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits is virtually a joint mean-covariance likelihood model. Appropriate choices of the model for the mean and covariance structures are of critical importance to statistical inference about QTL locations and actions/interactions. While a battery of statistical and mathematical models have been proposed for mean vector modeling, the analysis of covariance structure has been mostly limited to parametric structures like autoregressive one (AR(1)) or structured antedependence (SAD) model. In functional mapping of reaction norms that respond to two environmental signals, a model, expressed as a Kronecker product of two AR(1) structures, has been proposed to test differences of the genetic control of responses to different environments. For practical longitudinal data sets, parametric modeling may be too simple to capture the complex pattern and structure of the covariance. 
There is a pressing need to develop a robust approach for modeling any possible structure of longitudinal covariance, ultimately broadening the use of functional mapping. Our study proposes a nonparametric covariance estimator in functional mapping of quantitative trait locus. We adopt the Huang et al. approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized positive-definite covariance estimator is obtained using a normal penalized likelihood with an L2 penalty. This approach is embedded within the mixture likelihood framework of functional mapping by using a reparameterized version of the derivative of the log-likelihood. We extend the idea of functional mapping to model the covariance structure of interaction effects between the two environmental signals in a non-separable way. The extended model allows the quantitative test of several fundamental biological questions. Is there a pleiotropic QTL that regulates genotypic responses to different environmental signals? What is the difference in the timing and duration of QTL expression between environment-specific responsiveness? How is an environment-dependent QTL regulated by a development-related QTL? We performed various simulation studies to reveal the statistical properties of the new models and demonstrate the advantages of the proposed estimator. By analyzing real examples in genetic studies, we illustrated the utilization and usefulness of the methodology. The new methods will provide a useful tool for genome-wide scanning for the existence, distribution and interactions of QTLs underlying a dynamic trait important to agriculture, biology and health sciences.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by John Yap.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Wu, Rongling.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2009-08-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0022595:00001



Full Text

PAGE 4

I realized that I have been in school for most of my life and this dissertation is the culmination of my formal education - but certainly not the end to learning. I would like to thank everyone who has contributed to the accumulation of my knowledge and honing of my skills, those who have helped me in all aspects of my career, and all who have affected my life. In particular, thanks to

The people in my early years of education: Mabel Nakasas, my first tutor, for her tireless effort even when I was daydreaming or falling asleep while she was teaching me Math; Rico Santos who gave us the opportunity to better our Math skills; Mr. and Mrs. Yeban for all their support and mentorship; Dr. Aurello Ramos, Jr. for giving me a job at LSC; Dr. Augusto Hermosilla for all his precious pieces of advice regarding my career; all my colleagues at the Ateneo de Manila University Math Department for all their friendship and support.

All my recommenders: Dr. Jose Marasigan, Dr. Reginald Marcelo, and Dr. Gerry Salas (for initial admission to graduate school); Dr. Stephen Agard, Dr. Dennis Cook, Dr. Chris Bingham, and Dr. John Baxter (for admission to the Ph.D. program in Statistics at the University of Florida); Dr. Rongling Wu, Dr. James Hobert, Dr. Mark Yang, and Dr. Wendy London (for job applications).

Dr. Wendy London for giving me the opportunity to work at COG and learn about children's cancer and my COG colleagues Patrick McGrady, Chenguang Wang, and Stephen Linda.

My colleagues, officemates and friends in the Statistics Department at UF: Aixin Tan who has helped me a lot in my statistics career, Song Wu for all his help in statistical

PAGE 6

page
ACKNOWLEDGMENTS ..... 4
LIST OF TABLES ..... 8
LIST OF FIGURES ..... 10
ABSTRACT ..... 11
CHAPTER
1 INTRODUCTION ..... 13
1.1 Basic Genetics and QTL Mapping ..... 14
1.1.1 Terminology ..... 14
1.1.2 Experimental Crosses ..... 15
1.1.3 Linkage and Markers ..... 16
1.1.4 Interval Mapping ..... 17
1.2 Functional Mapping of QTL ..... 20
1.2.1 Model Formulation ..... 20
1.2.2 Parameter Estimation via the EM Algorithm ..... 23
1.2.3 Hypothesis Tests ..... 25
1.3 Other QTL Mapping Models ..... 26
1.4 Goals ..... 28
2 NONPARAMETRIC COVARIANCE ESTIMATION IN FUNCTIONAL MAPPING OF QTL ..... 30
2.1 Introduction ..... 30
2.2 Covariance Estimation ..... 31
2.2.1 Modified Cholesky Decomposition and Regression Interpretation ..... 31
2.2.2 Regularized Covariance Estimators ..... 33
2.2.3 Ridge Regression and LASSO ..... 35
2.2.4 Penalized Likelihood ..... 38
2.3 Covariance Estimation in Functional Mapping ..... 41
2.3.1 Computing the Penalized Likelihood Estimates ..... 41
2.3.2 From EM to ECM Algorithm ..... 44
2.3.3 Selection of Tuning Parameter ..... 46
2.4 Numerical Results ..... 46
2.4.1 Simulations ..... 46
2.4.2 Real Data Analysis ..... 56
2.5 Summary and Discussion ..... 62

PAGE 7

3 [chapter title missing in source] ..... 64
3.1 Introduction ..... 64
3.2 Functional Mapping of Reaction Norms to Multiple Environmental Signals ..... 66
3.2.1 Likelihood ..... 68
3.2.2 Mean and Covariance Models ..... 69
3.2.3 Hypothesis Tests ..... 70
3.3 Spatio-temporal Covariance Functions ..... 71
3.3.1 Introduction ..... 71
3.3.2 Basic Ideas, Notation, and Assumptions ..... 72
3.3.3 Separable Covariance Structures ..... 73
3.3.4 Nonseparable Covariance Structures ..... 75
3.3.4.1 Spectral method by Cressie and Huang (1999) ..... 75
3.3.4.2 Monotone function method by Gneiting (2002) ..... 77
3.4 Simulations ..... 78
3.5 Summary and Discussion ..... 90
4 CONCLUDING REMARKS ..... 93
4.0.1 Summary ..... 93
4.0.2 Future Directions ..... 94
APPENDIX
A DERIVATION OF EM ALGORITHM FORMULAS ..... 97
B DERIVATION OF EQUATION 2-9 ..... 99
C MINIMIZATION OF 2-33 ..... 100
D DEFINITION OF KRONECKER PRODUCT ..... 102
E DERIVATION OF EQUATION 3-20 ..... 103
REFERENCES ..... 104
BIOGRAPHICAL SKETCH ..... 113

PAGE 8

Table page
1-1 Conditional genotype probability in a backcross ..... 19
1-2 Conditional genotype probability in an F2 ..... 19
2-1 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (NP, Normal Data). ..... 52
2-2 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (AR(1), Normal Data). ..... 53
2-3 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (NP, Data from t-distribution). ..... 54
2-4 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for three QTL genotypes in an F2 population under different sample sizes (n) based on 100 simulation replicates (AR(1), Data from t-distribution). ..... 55
2-5 Available markers and phenotype data of a linkage map in an F2 population of mice (data from Vaughn et al., 1999). ..... 59
3-1 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Nonseparable Model). ..... 81
3-2 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (Nonseparable Model). ..... 82
3-3 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (NP). ..... 83

PAGE 9

3-4 [...] ..... 84
3-5 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (C1 with n = 400 and σ² = 2, 4). ..... 88
3-6 Averaged QTL position, mean curve parameters, maximum log-likelihood ratios (maxLR), entropy and quadratic losses and their standard errors (given in parentheses) for two QTL genotypes in a backcross population under different sample sizes (n) based on 100 simulation replicates (C1 with n = 400, increased irradiance and temperature levels, and σ² = 1, 2). ..... 89

PAGE 10

Figure page
1-1 Experimental crosses from pure inbred line parents P1 and P2 ..... 15
1-2 Crossing-over ..... 17
1-3 Weights of mice measured every week for 10 weeks ..... 22
1-4 Hypothetical plot of LR vs. linkage map ..... 27
2-1 Penalized likelihood in curve estimation ..... 40
2-2 Log-likelihood ratio (LR) plots based on simulated data under three different covariance structures ..... 49
2-3 The profile of the log-likelihood ratios (LR) between the full model (there is a QTL) and reduced (there is no QTL) model for body mass growth trajectories across the genome in a mouse F2 population ..... 58
2-4 Log-likelihood ratio (LR) plots for chromosomes 6 and 7 of the mice data ..... 60
2-5 Three growth curves each presenting a genotype at each of seven QTLs detected on mouse chromosomes 1, 4, 6, 7, 10, 11, and 15 for growth trajectories of mice in an F2 population. ..... 61
3-1 Reaction norm surface of photosynthetic rate as a function of irradiance and temperature ..... 68
3-2 Boxplots of the values of the log-likelihood under the alternative model, H1 ..... 85
3-3 Covariance plots ..... 86
3-4 Contour plots ..... 87
4-1 Formation of a phenotype by a landscape ..... 95

PAGE 11

One of the fundamental objectives in agricultural, biological and biomedical research is the identification of genes that control the developmental pattern of complex traits, their responses to the environment, and the way these genes interact in a coordinated manner to determine the final expression of the trait. More recently, a new statistical framework, called functional mapping, has been developed to identify and map quantitative trait loci (QTLs) that determine developmental trajectories by integrating biologically meaningful mathematical models of trait progression into a mixture model for unknown QTL genotypes. Functional mapping has emerged to be a powerful statistical tool for mapping QTLs controlling the responsiveness (reaction norm) of a trait to developmental and environmental signals.

From a statistical perspective, functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits is virtually a joint mean-covariance likelihood model. Appropriate choices of the model for the mean and covariance structures are of critical importance to statistical inference about QTL locations and actions/interactions. While a battery of statistical and mathematical models have been proposed for mean vector modeling, the analysis of covariance structure has been mostly limited to parametric structures like autoregressive one (AR(1)) or structured antedependence (SAD) model. In functional mapping of reaction norms that respond to two environmental signals, a model, expressed as a Kronecker product of two AR(1) structures, has been proposed to test differences of the genetic control of responses to different environments. For practical longitudinal data sets, parametric modeling may be too simple to capture the complex pattern and structure of the covariance. There is a pressing need to develop a robust approach for modeling any possible structure of longitudinal covariance, ultimately broadening the use of functional mapping.

PAGE 12

Our study proposes a nonparametric covariance estimator in functional mapping of quantitative trait locus. We adopt Huang et al.'s (2006) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized positive-definite covariance estimator is obtained using a normal penalized likelihood with an L2 penalty. This approach is embedded within the mixture likelihood framework of functional mapping by using a reparameterized version of the derivative of the log-likelihood. We extend the idea of functional mapping to model the covariance structure of interaction effects between the two environmental signals in a non-separable way. The extended model allows the quantitative test of several fundamental biological questions. Is there a pleiotropic QTL that regulates genotypic responses to different environmental signals? What is the difference in the timing and duration of QTL expression between environment-specific responsiveness? How is an environment-dependent QTL regulated by a development-related QTL? We performed various simulation studies to reveal the statistical properties of the new models and demonstrate the advantages of the proposed estimator. By analyzing real examples in genetic studies, we illustrated the utilization and usefulness of the methodology. The new methods will provide a useful tool for genome-wide scanning for the existence, distribution and interactions of QTLs underlying a dynamic trait important to agriculture, biology and health sciences.

PAGE 13

A number of biological traits are quantitatively inherited. Examples of such traits include the height of trees, the weight or body mass of animals, the yield of agricultural crops, or even disease progression and drug response. Genetic mapping of quantitative traits and subsequent cloning of the underlying genes have become a considerable focus in agricultural, biological, and biomedical research. Since the publication of the seminal mapping paper by Lander and Botstein (1989), there has been a large amount of literature concerning the development of statistical methods for mapping complex traits (reviewed in Jansen, 2000; Hoeschele, 2000; Wu et al., 2007b). Although the idea of associating a continuously varying phenotype with a discrete trait (marker) dates back to the work of Sax (1923), it was Lander and Botstein (1989) who first established an explicit principle for linkage analysis. They also provided a tractable statistical algorithm for dissecting a quantitative trait into its individual genetic locus components, referred to as quantitative trait loci (QTLs).

The success of Lander and Botstein in developing a powerful method for linkage analysis of a complex trait has roots in two different developments. First, the rapid development of molecular technologies in the middle 1980s led to the generation of a virtually unlimited number of markers that specify the genome structure and organization of any organism (Drayna et al., 1984). Second, almost simultaneously, improved statistical and computational techniques, such as the EM algorithm (Dempster et al., 1977), made it possible to tackle complex genetic and genomic problems.

Lander and Botstein's (1989) model for interval mapping of QTLs is regarded as appropriate for an ideal (simplified) situation, in which the segregation patterns of all markers can be predicted on the basis of the Mendelian laws of inheritance and a trait under study is strictly controlled by one QTL on a chromosome. This work was extended and improved by many researchers (Jansen and Stam, 1994; Zeng, 1994; Haley et al., 1994;

PAGE 14

This chapter provides an overview of basic genetic concepts related to QTL mapping of complex traits. Fundamental procedures for functional mapping (Ma et al., 2002) will be emphasized. Functional mapping is a statistical and genetic model for mapping QTLs that underlie a complex dynamic trait. This chapter is organized as follows: Section 1.1 introduces basic genetic concepts and describes how QTL mapping, via interval mapping, is done using the idea of linkage in structured populations called experimental crosses. Section 1.2 introduces the functional mapping model. Section 1.3 describes a few other QTL mapping methods and finally, Section 1.4 states the main goals of this dissertation and gives the outline of the rest of the chapters.

1.1.1 Terminology

PAGE 15

Figure 1-1. Experimental crosses from pure inbred line parents P1 and P2: F1 × P1 or F1 × P2 produces a backcross while F1 × F1 produces an intercross or F2. (Adapted from Broman, 1997).

[...] 'A' and 'a', the possible genotypes are AA, Aa and aa. The genotype determines the trait or phenotype. Variation due to a QTL results from phenotypes determined by different genotypes. However, because environmental factors also contribute to the total phenotypic variation, it is difficult to infer an offspring's genotype from its phenotype.

[...] (Figure 1-1). Each parent contributes a chromosome strand to create an offspring called the first filial or F1 which has genotype Aa. If the F1 is mated to one of its parents, say P2, the offspring is called a backcross, with genotype either Aa or aa. During meiosis (the production of sex cells or gametes), each parental strand replicates and exchanges genetic material with other strands. This exchange is known as crossing-over.

PAGE 16

[...] Figure 1-2. The chromosome strands from P1 and P2 pair up and then replicate to form a tetrad. Crossing-over, as illustrated here, occurs at one point between the inner strands and involves an exchange of genetic material. The end result is four chromosomes that either resemble the parental strands (nonrecombinant, NR) or not (recombinant, R). If crossing-over does occur, it can also do so more than once and include the other two outer strands. In general, recombinant strands are formed when there is an odd number of crossover points.

Genes on the same chromosome have an association called linkage. The tendency during meiosis is for the genes to remain on the same strand. This means that there will be more nonrecombinant than recombinant chromosomes. If $r$ is the recombination fraction or the proportion of recombinant chromosomes, then the proportion of nonrecombinant chromosomes is $1 - r$. Because of linkage, it is generally true that $r < 1/2$. The value of $r$ depends on the distance between gene loci. Genes that are far apart usually have high values of $r$ because the large portion of chromosome in between gives crossing-over a better chance to occur. If two genes are very close to each other, there is a high probability that no crossing-over will occur and they will end up on the same chromosome.

Linkage provides a way of locating a QTL in a chromosome by using known or identified genes called genetic or molecular markers. Markers do not affect the QTL's phenotype directly and as such they are said to be phenotypically neutral. But they

PAGE 17

may affect other visible phenotypes such as eye color, making it possible to distinguish their genotypes. If a marker is closely linked with a QTL, then both of their alleles could possibly end up on the same chromosome in a backcross or F2 offspring. The resulting marker genotype, which can be identified, is informative in predicting the QTL. Thus, a prerequisite for QTL mapping is the construction of a linkage map of markers that spans an entire chromosome or the set of all chromosomes in an organism (genome). The more markers there are, the greater the chance of QTL detection. Some of the popularly used markers include the restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs), and single nucleotide polymorphisms (SNPs).

Figure 1-2. Crossing-over: (1) Parental chromosome strands align (2) Each strand replicates to form a tetrad, crossing-over starts between inner strands (3) Recombinant (R) or nonrecombinant (NR) gametes.

PAGE 18

A unit map distance, expressed in centiMorgans (cM), between two loci is defined as the expected number of crossovers between loci in 100 meiotic products. Assuming that crossovers occur at random and are independent of each other, the Haldane map function (Haldane, 1919; Wu et al., 2007c) can be used to relate a distance of $d$ cM to the recombination frequency $r$ in the following way:

$$d = -\frac{100\,\ln(1 - 2r)}{2} \quad\text{or}\quad r = \frac{1}{2}\left(1 - e^{-2d/100}\right). \qquad (1\text{-}1)$$

The distances between markers across the genome are known and usually expressed in cM. However, QTL mapping models utilize probabilities which are expressed in terms of $r$. Thus, when a linkage map of markers is given in cM, the distances are converted to $r$ using Eq. 1-1.

For simplicity, we assume a backcross population with two possible genotypes at a locus denoted by 1 (for Aa) or 0 (for aa). Consider an interval on a chromosome with two linked markers, M and N, as endpoints and let $r$ be the recombination fraction between them. We refer to M as the left marker and N as the right marker. Suppose there exists a QTL, Q, within the marker interval. Let $r_1$ be the recombination fraction between M and Q and $r_2$ between Q and N. It is easy to show using Eq. 1-1 that $r = r_1 + r_2 - 2 r_1 r_2$. The QTL genotypes are unknown but their conditional probabilities given the marker genotypes can be derived. These conditional probabilities are shown in Table 1-1. As an illustration, if an offspring has genotype 1 (Mm) at marker M and 0 (nn) at marker N, then the marker interval genotype is 10. The conditional probability that a QTL has genotype 1 (Qq) is

PAGE 19

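Table 1-1. Conditional genotype probability in a backcross

Marker interval genotype    QTL genotype 1    QTL genotype 0
11    $(1-r_1)(1-r_2)/(1-r)$    $r_1 r_2/(1-r)$
[rows 10, 01 and 00 are not recoverable from the source]

Table 1-2. Conditional genotype probability in an F2

Marker interval genotype    QTL genotype 2    QTL genotype 1    QTL genotype 0
22    $(1-r_1)^2(1-r_2)^2/(1-r)^2$    $2r_1(1-r_1)r_2(1-r_2)/(1-r)^2$    $r_1^2 r_2^2/(1-r)^2$
11    $2r_1(1-r_1)r_2(1-r_2)/(1-2r+2r^2)$    $(1-2r_1+2r_1^2)(1-2r_2+2r_2^2)/(1-2r+2r^2)$    $2r_1(1-r_1)r_2(1-r_2)/(1-2r+2r^2)$
[the remaining rows are not recoverable from the source]

Tables 1-1 and 1-2 were obtained using a three-point analysis of genes (see for example, chapter 4 of Wu et al., 2007c). A QTL is usually searched or scanned at consecutive equidistant points or intervals in the genome. For example, a given chromosome is searched starting at the leftmost marker to the opposite end at every 2 or 4 cM. At each search point, the numerical values of the conditional probabilities of a QTL can be calculated using Tables 1-1 and 1-2. These conditional probabilities form the weights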

PAGE 20

[...] Section 1.2.1. Notice that for a given marker interval in either table, $\sum_{k=1}^{J} p_{k|i} = 1$, where $J$ is the number of genotypes ($J = 2, 3$ for a backcross and intercross, respectively). This means that all entries for a given row in either table add up to 1.

A complete interval mapping model involves the phenotype data aside from the markers. We shall see in the next section (1.2.1) how functional mapping, which is based on interval mapping, incorporates phenotype data.

1.2.1 Model Formulation

[...] where the mean genotype value $g_k$ and covariance $\Sigma$ are specified, and $k = 1, \ldots, J$.

The likelihood function can be represented by a multivariate mixture model

$$L(\Omega) = \prod_{i=1}^{n}\left[\sum_{k=1}^{J} p_{k|i}\, f_k(y_i \mid \Omega)\right] \qquad (1\text{-}3)$$

where $\Omega$ is the parameter vector which we will specify shortly. $p_{k|i}$ is the probability of a QTL genotype given the genotypes of two flanking markers (Section 1.1.4). As stated earlier, a QTL is searched at different points throughout the genome. At any given point, the numerical value of $p_{k|i}$ can be computed based on Tables 1-1 and 1-2. Thus, for a given search position, the log-likelihood, $\log L(\Omega)$, can be maximized to obtain the maximum likelihood estimates (MLEs) of the mean, $\mu$, and covariance, $\Sigma$, parameters. Therefore, $\Omega = (\mu, \Sigma)$.
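The conversion from map distance to recombination fraction and the resulting mixture weights can be sketched in a few lines of Python. This is a minimal illustration rather than the dissertation's code: the function names are hypothetical, and the four rows of the backcross table are written out from the standard three-point argument under the assumption of no crossover interference.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction r for a map distance of d_cm centiMorgans (Eq. 1-1)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def haldane_d(r):
    """Map distance in cM for a recombination fraction r < 1/2 (inverse of Eq. 1-1)."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def backcross_qtl_probs(d_mq, d_qn):
    """Conditional probabilities (QTL genotype 1, QTL genotype 0) for each
    marker-interval genotype in a backcross.

    d_mq, d_qn: distances (cM) from the left marker M to the QTL and from the
    QTL to the right marker N. Assumes no crossover interference.
    """
    r1, r2 = haldane_r(d_mq), haldane_r(d_qn)
    r = r1 + r2 - 2.0 * r1 * r2  # recombination fraction between M and N
    return {
        "11": ((1 - r1) * (1 - r2) / (1 - r), r1 * r2 / (1 - r)),
        "10": ((1 - r1) * r2 / r, r1 * (1 - r2) / r),
        "01": (r1 * (1 - r2) / r, (1 - r1) * r2 / r),
        "00": (r1 * r2 / (1 - r), (1 - r1) * (1 - r2) / (1 - r)),
    }

# Example: a QTL 4 cM to the right of M inside a 14 cM marker interval.
weights = backcross_qtl_probs(4.0, 10.0)
```

Each value in `weights` is a pair that sums to 1, which is the row-sum property noted above, and `r1 + r2 - 2*r1*r2` agrees with `haldane_r(14.0)` because Haldane distances are additive.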

PAGE 21

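[...] (Figure 1-3) can be modeled by a logistic function defined by

$$g(t) = \frac{a}{1 + b\,e^{-rt}} \qquad (1\text{-}4)$$

(Niklas, 1994; West et al., 2001; Zhao et al., 2004). This model has a few desirable descriptive properties. The curve starts with an exponential phase and reaches an inflection point where the maximum rate of growth occurs. Then growth continues asymptotically towards the value $a$. The value at the onset of growth is $a/(1+b)$ at $t = 0$ while at the point of inflection it is $a/2$ at $t = \log b / r$. These properties can be used to derive hypothesis tests (Ma et al., 2002; Zhao et al., 2004). Other parametric mean models include the sigmoid Emax equation which relates drug concentration and drug effect (Lin et al., 2007), the Richards and Gompertzian curves for time-dependent tumor growth (Li et al., 2006), and the biexponential model for HIV-I dynamics (Wang et al., 2006). In the absence of structural forms, semiparametric (Cui et al., 2006; Wu et al., 2007d; Yang et al., 2007) or nonparametric (Zhao, W., 2005a; Yang, J., 2006; Yang et al., 2007) approaches can be used to model the mean.

The covariance, $\Sigma$, is assumed to be the same for each genotype group $k$. The usual parametric model is the first-order autoregressive, or AR(1), model

$$\Sigma = \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{m-1} \\ \rho & 1 & \rho & \cdots & \rho^{m-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{m-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{m-1} & \rho^{m-2} & \cdots & \cdots & 1 \end{bmatrix}, \qquad (1\text{-}5)$$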

PAGE 22

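Figure 1-3. Weights of mice measured every week for 10 weeks. Data from the study of Vaughn et al. (1999).

which is popular in the longitudinal data literature (Diggle et al., 2002; Verbeke and Molenberghs, 2005). This model assumes variance stationarity (equal residual variances $\sigma^2$ at each time point) and covariance stationarity (proportionally decreasing covariances between time points). Explicit forms for the inverse and determinant are easily obtained by matrix algebra:

$$\Sigma^{-1} = \frac{1}{\sigma^2(1-\rho^2)} \begin{bmatrix} 1 & -\rho & 0 & \cdots & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -\rho & 1+\rho^2 & -\rho \\ 0 & \cdots & 0 & -\rho & 1 \end{bmatrix} \qquad (1\text{-}6)$$

$$|\Sigma| = (\sigma^2)^m (1-\rho^2)^{m-1} \qquad (1\text{-}7)$$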
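The closed forms for the AR(1) inverse (tridiagonal, with $1+\rho^2$ on the interior diagonal and $-\rho$ off the diagonal) and determinant can be checked numerically. The following is a small sketch in plain Python, assuming $m \ge 2$ and $|\rho| < 1$; the helper names are illustrative and not taken from the dissertation.

```python
import math

def ar1_cov(m, sigma2, rho):
    """m x m AR(1) covariance matrix with entries sigma2 * rho**|i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(m)] for i in range(m)]

def ar1_inverse(m, sigma2, rho):
    """Closed-form tridiagonal inverse of the AR(1) covariance (m >= 2)."""
    c = 1.0 / (sigma2 * (1.0 - rho ** 2))
    inv = [[0.0] * m for _ in range(m)]
    for i in range(m):
        # Corner diagonal entries are 1, interior ones are 1 + rho^2.
        inv[i][i] = c * (1.0 if i in (0, m - 1) else 1.0 + rho ** 2)
        if i + 1 < m:
            inv[i][i + 1] = inv[i + 1][i] = -c * rho
    return inv

def ar1_logdet(m, sigma2, rho):
    """log|Sigma| = m*log(sigma2) + (m - 1)*log(1 - rho^2)."""
    return m * math.log(sigma2) + (m - 1) * math.log(1.0 - rho ** 2)

def matmul(a, b):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# The product of the covariance and its closed-form inverse should be the identity.
product = matmul(ar1_cov(5, 2.0, 0.6), ar1_inverse(5, 2.0, 0.6))
```

Because neither quantity requires a general matrix factorization, each likelihood evaluation costs only O(m) operations once the residual vector is available.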

PAGE 23

Thus, an AR(1) model is computationally efficient. Furthermore, when using the ECM algorithm, it is possible to obtain CM-step iteration solutions for the parameters in logistic (Ma et al., 2002) or rational function (Yap et al., 2007) mean models. Despite the advantages of an AR(1) model, the assumptions of variance and covariance stationarity may not always hold, especially for real data. This is evident from Figure 1-3, where the data appear to 'fan out' across time instead of being stationary. Wu et al. (2004b) tried to resolve this problem by applying a transform-both-sides (Carroll and Ruppert, 1984) log-transformation on the data to achieve approximate stationarity. The AR(1) can then be used on the transformed data (Zhao et al., 2004). However, such a transformation may not produce a stationary covariance, so that an AR(1) is still not an appropriate model. To get rid of stationarity issues, Zhao et al. (2005) proposed using a structured antedependence model (SAD) (Zimmerman and Núñez-Antón, 2001) which can handle non-stationary data and is more robust and less data-dependent than AR(1). The elements of an SAD covariance structure of order 1 are given by

$$\operatorname{var}(y_{ij}) = \frac{1-\phi^{2j}}{1-\phi^{2}}\,\nu^2, \qquad \operatorname{cov}(y_{ij_1}, y_{ij_2}) = \phi^{\,j_2-j_1}\,\frac{1-\phi^{2j_1}}{1-\phi^{2}}\,\nu^2 \quad (j_2 \ge j_1), \qquad (1\text{-}8)$$

where $\phi$ is the generalized autoregressive parameter, $\nu^2$ is the innovation variance, and $j = 1, \ldots, m$. Eq. 1-8 shows that the variances are not constant and the covariances are not only dependent on $|j_2 - j_1|$ but also on the reference point $j_1$. However, Zhao et al. still recommend modeling data by SAD in conjunction with AR(1).

[...] Eq. 1-3, can be written as

$$\log L(\Omega) = \sum_{i=1}^{n} \log\left[\sum_{k=1}^{J} p_{k|i}\, f_k(y_i \mid \Omega)\right] \qquad (1\text{-}9)$$
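The mixture log-likelihood in Eq. 1-9 can be evaluated directly once a mean curve and a covariance structure are chosen. The sketch below assumes a multivariate normal density for $f_k$, a logistic mean for each genotype, and an AR(1) covariance with at least two time points; all function and argument names are hypothetical. The inner sum is computed on the log scale to avoid underflow.

```python
import math

def logistic_mean(t, a, b, r):
    """Logistic growth curve a / (1 + b * exp(-r * t))."""
    return a / (1.0 + b * math.exp(-r * t))

def ar1_logpdf(y, mean, sigma2, rho):
    """Log multivariate normal density with AR(1) covariance (len(y) >= 2),
    using the closed-form inverse and determinant instead of a matrix solve."""
    m = len(y)
    z = [yi - mi for yi, mi in zip(y, mean)]
    quad = (z[0] ** 2 + z[-1] ** 2
            + (1.0 + rho ** 2) * sum(v * v for v in z[1:-1])
            - 2.0 * rho * sum(z[j] * z[j + 1] for j in range(m - 1)))
    quad /= sigma2 * (1.0 - rho ** 2)
    logdet = m * math.log(sigma2) + (m - 1) * math.log(1.0 - rho ** 2)
    return -0.5 * (m * math.log(2.0 * math.pi) + logdet + quad)

def mixture_loglik(ys, weights, times, curve_params, sigma2, rho):
    """Eq. 1-9: sum_i log sum_k p_{k|i} f_k(y_i).

    ys: list of trajectories; weights[i][k] = p_{k|i};
    curve_params[k] = (a_k, b_k, r_k) for QTL genotype k.
    """
    means = [[logistic_mean(t, *p) for t in times] for p in curve_params]
    total = 0.0
    for y, w in zip(ys, weights):
        terms = [math.log(wk) + ar1_logpdf(y, mu, sigma2, rho)
                 for wk, mu in zip(w, means) if wk > 0.0]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(v - top) for v in terms))
    return total
```

A genome scan would call `mixture_loglik` once per search position, with `weights` recomputed from the flanking markers at that position.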

PAGE 24

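$$\frac{\partial}{\partial\theta} \log L(\Omega) = 0 \tag{1–10}$$

where $\theta \in \Omega$. Note that for the logistic mean, $\Theta = \{a_k, b_k, r_k \mid k = 1, \ldots, J\}$, and AR(1) covariance, $\Psi = \{\sigma^2, \rho\}$, models, $\Omega = (\Theta, \Psi)$. However, the left side of Eq. 1–10 can be reparameterized as

$$\frac{\partial}{\partial\theta} \log L(\Omega) = \sum_{i=1}^{n}\sum_{k=1}^{J} \frac{p_{k|i}\, f_k(y_i \mid \Omega)}{\sum_{k'=1}^{J} p_{k'|i}\, f_{k'}(y_i \mid \Omega)}\, \frac{\partial}{\partial\theta} \log f_k(y_i \mid \Omega) = \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\, \frac{\partial}{\partial\theta} \log f_k(y_i \mid \Omega) \tag{1–11}$$

where

$$P_{k|i} = \frac{p_{k|i}\, f_k(y_i \mid \Omega)}{\sum_{k'=1}^{J} p_{k'|i}\, f_{k'}(y_i \mid \Omega)} \tag{1–12}$$

is interpreted as the posterior probability that progeny $i$ has QTL genotype $k$ (McLachlan and Peel, 2000; Ma et al., 2002). Let $P = \{P_{k|i};\ k = 1, \ldots, J;\ i = 1, \ldots, n\}$. The Expectation and Maximization (EM) algorithm (Dempster et al., 1977) at the $(j+1)$th iteration proceeds as follows:

1. The current value of $\Omega$ is $\Omega^{(j)}$.
2. E-step: Update $P^{(j)} = \{P^{(j)}_{k|i}\}$ using
$$P^{(j)}_{k|i} = \frac{p_{k|i}\, f_k(y_i \mid \Omega^{(j)})}{\sum_{k'=1}^{J} p_{k'|i}\, f_{k'}(y_i \mid \Omega^{(j)})}. \tag{1–13}$$
3. M-step: Solve
$$\sum_{i=1}^{n}\sum_{k=1}^{J} P^{(j)}_{k|i}\, \frac{\partial}{\partial\theta} \log f_k(y_i \mid \Omega) = 0 \tag{1–14}$$
to get $\Omega^{(j+1)}$.
4. Repeat until some convergence criterion is met.

The values at convergence are the MLEs of $\Omega$.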
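The EM steps above can be sketched on a deliberately simplified model. The following is a minimal illustration, not the functional mapping implementation: it assumes univariate phenotypes, known mixing proportions (standing in for $p_{k|i}$), genotype-specific means, and a common variance, so that the M-step has a closed form. All names (`em_mixture`, `prior`, `post`) and the toy data are hypothetical.

```python
import math

def normal_pdf(y, mu, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(y, prior, mu, var):
    """Mixture log-likelihood: sum_i log sum_k p_{k|i} f_k(y_i)."""
    return sum(math.log(sum(p * normal_pdf(yi, m, var) for p, m in zip(pi, mu)))
               for yi, pi in zip(y, prior))

def em_mixture(y, prior, mu, var, max_iter=200, tol=1e-8):
    """EM for a normal mixture with known mixing proportions prior[i][k]."""
    n, J = len(y), len(mu)
    history = [log_likelihood(y, prior, mu, var)]
    post = []
    for _ in range(max_iter):
        # E-step: posterior probability that individual i belongs to group k.
        post = []
        for yi, pi in zip(y, prior):
            w = [p * normal_pdf(yi, m, var) for p, m in zip(pi, mu)]
            total = sum(w)
            post.append([wk / total for wk in w])
        # M-step: posterior-weighted means and pooled variance.
        mu = [sum(post[i][k] * y[i] for i in range(n)) /
              sum(post[i][k] for i in range(n)) for k in range(J)]
        var = sum(post[i][k] * (y[i] - mu[k]) ** 2
                  for i in range(n) for k in range(J)) / n
        history.append(log_likelihood(y, prior, mu, var))
        if history[-1] - history[-2] < tol:
            break
    return mu, var, post, history

# Toy data: two well-separated groups, equal prior genotype probabilities.
y = [1.0, 1.2, 0.8, 3.1, 2.9, 3.3]
prior = [[0.5, 0.5] for _ in y]
mu_hat, var_hat, post, history = em_mixture(y, prior, mu=[0.0, 4.0], var=1.0)
```

In functional mapping the same structure applies, but $f_k$ is a multivariate normal density whose mean follows a curve such as the logistic function, so the M-step generally requires numerical optimization.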

PAGE 25

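Appendix A shows the derivation of 1–13 and 1–14 based on a missing data argument. This derivation can also be used to show the extension from a maximum likelihood to a maximum penalized likelihood algorithm (Section 2.3.2). For a more thorough treatment of the EM algorithm, the reader is referred to McLachlan and Peel (2000).

Although the EM algorithm provides efficient computation of the model parameters in functional mapping, other methods can be used as well, such as Newton-Raphson or the Nelder-Mead simplex algorithm (Nelder and Mead, 1965; Zhao et al., 2004), which is a direct nonlinear optimization procedure. These methods are particularly useful when no closed-form formulas for the parameter estimates can be obtained.

$$H_0: (a_k, b_k, r_k) \equiv (a, b, r) \ \text{for all } k = 1, \ldots, J \quad \text{vs.} \quad H_1: \text{at least one of the equalities does not hold},$$

where $H_0$ is the reduced (or null) model, so that only a single logistic curve can fit the phenotype data, and $H_1$ is the full (or alternative) model, in which case more than one logistic curve fits the phenotype data due to the existence of a QTL. Note that the likelihood function corresponding to the null model is given simply by

$$L_0(\Omega) = \prod_{i=1}^{n} f(y_i \mid \Omega),$$

where $f$ is the multivariate normal density with mean vector $g$ and covariance $\Sigma$, and $g(t) = a/(1 + b e^{-rt})$ in the case of a logistic mean model. A number of other important hypotheses can be tested, as outlined in Wu et al. (2004a).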

PAGE 26

plotted over the entire linkage map, where $\tilde{\Omega}$ and $\hat{\Omega}$ denote the MLEs under $H_0$ and $H_1$, respectively. The peak of the LR plot, which we shall from here on refer to as maxLR, would suggest a putative QTL because this corresponds to where $H_1$ is most likely over $H_0$. The distribution of LR is difficult to determine because of two major issues: the unidentifiability of the QTL position under $H_0$ and a multiple testing problem that arises because tests across the genome are not mutually independent (Wu et al., 2007c). However, a nonparametric method called permutation tests by Doerge and Churchill (1996) can be used to find an approximate distribution and a significance threshold for the existence of a QTL. In permutation tests, the functional mapping model is applied to several random permutations of the phenotype data on the markers, and a threshold is determined from the set of maxLR values obtained from each permutation test run. The idea here is to disassociate the markers and phenotypes so that repeated application of the model on permuted data will produce an approximate empirical null distribution.

Figure 1-4 shows a hypothetical plot of the log-likelihood ratio test statistic over a linkage map on a chromosome. The markers are spaced out at {0, 32, 46, 58, 68, 82, 100, 112} cM and the QTL search was done at 2 cM intervals from the leftmost marker at 0. The two horizontal lines are thresholds based on permutation tests. The solid red line that crosses the LR plot suggests a significant QTL while the broken green line does not.
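The permutation procedure described above can be sketched as follows. This is a hedged illustration only: the statistic `mean_diff_stat` is a hypothetical stand-in for the likelihood ratio computed by the functional mapping model, and the phenotype and marker values are made up. The part that mirrors the text is the loop that shuffles phenotypes against fixed marker data, records the maximum statistic per run, and reads the threshold off the empirical distribution.

```python
import math
import random

def mean_diff_stat(phenotypes, genotype):
    """Stand-in test statistic: squared difference of the two genotype group means."""
    g1 = [y for y, g in zip(phenotypes, genotype) if g == 1]
    g0 = [y for y, g in zip(phenotypes, genotype) if g == 0]
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) ** 2

def max_stat(phenotypes, markers, stat):
    """Maximum of the statistic over all markers (the analogue of maxLR)."""
    return max(stat(phenotypes, g) for g in markers)

def permutation_threshold(phenotypes, markers, stat, n_perm=100, level=0.95, seed=0):
    """Empirical `level` quantile of the max statistic under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the marker-phenotype association
        null_max.append(max_stat(shuffled, markers, stat))
    null_max.sort()
    index = min(n_perm - 1, int(math.ceil(level * n_perm)) - 1)
    return null_max[index], null_max

# Hypothetical toy data: 10 individuals, 2 binary markers.
phenotypes = [5.1, 4.8, 5.5, 6.9, 7.2, 7.0, 5.0, 6.8, 5.2, 7.1]
markers = [
    [0, 0, 0, 1, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 1, 0],
]
observed = max_stat(phenotypes, markers, mean_diff_stat)
threshold, null_max = permutation_threshold(phenotypes, markers, mean_diff_stat)
significant = observed > threshold
```

Taking the maximum over all markers within each permutation run is what makes the resulting threshold genome-wide rather than pointwise.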

PAGE 27

Figure 1-4. Hypothetical plot of LR vs. linkage map. The latter consists of markers spaced out at {0, 32, 46, 58, 68, 82, 100, 112} cM along the chromosome represented by the x-axis. The QTL search was at 2 cM intervals. The location corresponding to the peak of the plot, maxLR, suggests a QTL position. A threshold that crosses the plot (red horizontal line) indicates a significant QTL; if not (green horizontal broken line), the QTL is not significant.

mapping (CIM; Zeng, 1994) were proposed to address this issue. To increase the power of interval mapping, both methods use a subset of marker loci beyond the marker interval under consideration as covariates in a partial regression analysis. The effects of the subset of markers are used to estimate the effects of other QTLs. The problem with these methods, though, is how to select the appropriate markers to include in the model.

Multiple interval mapping (MIM; Kao et al., 1999) uses multiple marker intervals simultaneously to identify multiple QTLs. This model allows estimation of main and epistatic (interaction) effects among all detected QTLs. However, an issue with this method is model comparison and searching through models (Broman, 2001).

PAGE 28

1.2) provides a useful framework for genetic mapping through mean and covariance modeling of multivariate or longitudinal traits. Because it requires a small number of model parameters to estimate, it is computationally efficient and can be used on data that have limited sample sizes. Functional mapping has shown potential as a powerful statistical method in QTL mapping.

Although parametric models such as AR(1) and SAD are suitable covariance structures for likelihood-based functional mapping, severe bias could be introduced in the estimation process if the underlying data structure is significantly different. Specifically, a biased covariance estimate can affect the estimates for QTL location, QTL effects (the estimated mean model), and even the value of maxLR, which is needed in permutation tests for significance. Thus, there is a need for a robust estimator that can provide more accurate and precise results. In this regard, we propose a nonparametric approach. A nonparametric covariance estimator was proposed by Huang et al. (2006) for the null model (Section 1.2.3). These authors used a penalized likelihood procedure in solving a set of regression equations obtained from the modified Cholesky decomposition of the covariance matrix. Their estimator is regularized and guaranteed to

PAGE 29

The rest of the chapters are organized as follows:

In Chapter 2, we describe the modified Cholesky decomposition approach to covariance estimation and Huang et al.'s nonparametric procedure. We provide some discussion on ridge regression and LASSO techniques for solving a regression, and on penalized likelihood, because these are the main concepts behind their method. Then we show the extension to the mixture likelihood case and apply our model to simulated and real longitudinal data.

In Chapter 3, we extend the use of our proposed estimator to functional mapping of reaction norms with two environmental signals. We consider photosynthetic rate as the reaction norm and irradiance and temperature as the two environmental signals. The previously proposed covariance model was parametric and separable. In situations where the underlying data structure is nonseparable, our nonparametric estimator is shown to be more reliable based on the simulation results.

In Chapter 4, we conclude this dissertation.

PAGE 30

The main difficulties associated with covariance estimation are the number of parameters that need to be estimated (which grows quadratically with dimension) and the positive-definite constraint. Pourahmadi (1999) discovered that the latter can be taken care of if one uses the modified Cholesky decomposition. A few published studies that followed Pourahmadi's suggestion proposed ways of regularization to provide an efficient covariance estimator (Wu and Pourahmadi, 2003; Huang et al., 2006 and 2007b; Levina et al., 2008). In this chapter, we adopt the approach proposed by Huang et al. (2006), which uses a penalized likelihood procedure, and extend it to the mixture likelihood framework of functional mapping. Such an extension is possible if the posterior probability reparameterization of the derivative of the log-likelihood function, 1–11, is used. The new approach is a nonparametric covariance estimator in functional mapping. The term nonparametric, which refers to distribution-free methods, may not be exactly appropriate. As we shall see in this chapter, the estimator is really for unstructured covariances and

PAGE 31

This chapter is organized as follows: In Section 2.2, we discuss the modified Cholesky decomposition and its regression interpretation, review the methods proposed in the literature for regularized covariance estimators, and discuss ridge regression (Hoerl and Kennard, 1970), the least absolute shrinkage and selection operator or LASSO (Tibshirani, 1996), and penalized likelihood. In Section 2.3, we describe how Huang et al.'s approach can be extended to functional mapping. Section 2.4 is devoted to simulations and an analysis of real data from an intercross progeny of mice using our proposed methodology. The last section (2.5) summarizes the chapter and provides some discussion.

2.2.1 Modified Cholesky Decomposition and Regression Interpretation

$$T \Sigma T' = D, \tag{2–1}$$

where

$$T = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\phi_{21} & 1 & 0 & \cdots & 0 \\ -\phi_{31} & -\phi_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\phi_{m1} & -\phi_{m2} & \cdots & -\phi_{m,m-1} & 1 \end{bmatrix},$$

PAGE 32

where $-\phi_{tj}$ is the $(t, j)$th entry (for $j$
PAGE 33

Pourahmadi (1999) recognized that the GARPs and the logarithm of the IVs are unconstrained parameters and hence can be modeled in terms of covariates. His approach was to estimate the Cholesky components $T$ and $D$, instead of estimating $\Sigma$ directly. That is, find estimates $\hat{T}$ and $\hat{D}$ of $T$ and $D$, respectively, so that an estimator of $\Sigma$ is $\hat{\Sigma} = \hat{T}^{-1} \hat{D} (\hat{T}^{-1})'$, which is positive-definite. He did this by suggesting a link function $g(\cdot)$ for $\Sigma$ defined by

$$g(\Sigma) = 2I - T - T' + \log D$$

(Wu and Pourahmadi, 2003), where $\log D$ means the matrix $D$ where the logarithm is taken on each diagonal element and $I$ is the identity matrix. This formulation is analogous to a link function for the mean in generalized linear model theory (McCullagh and Nelder, 1989).

Although the Cholesky components still have the same number of parameters to estimate as any unstructured covariance matrix, this number can be reduced considerably by using covariates and modeling the entries of $T$ and $D$ either parametrically, nonparametrically, or in a Bayesian way (Pourahmadi, 1999; Daniels and Pourahmadi, 2002; Wu and Pourahmadi, 2003). Pourahmadi (1999) illustrated the parametric approach by using time lags as covariates for the entries of $T$ and $D$ in analyzing the cattle data (Kenward, 1987). Wu and Pourahmadi (2003) and Huang et al. (2007b) each used a nonparametric approach by capitalizing on the regression representation Eq. 2–2. Intuitively, for longitudinal data, terms far away in the regression are expected to be small. That is, the lag-$j$ regression coefficient $\phi_{t,t-j}$ is expected to be small for a fixed $t$ and large $j$. This means that for a given row of the $T$ matrix, the terms are expected
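As a concrete illustration of the decomposition discussed above, the sketch below computes a unit lower-triangular $T$ and the diagonal of $D$ such that $T \Sigma T' = D$. It is only one possible route, assumed here for brevity: it goes through the ordinary Cholesky factor rather than through the sequence of regressions, and the function names are hypothetical. The example covariance is an AR(1) matrix with $\sigma^2 = 1$ and $\rho = 0.5$, chosen because its decomposition is easy to verify by hand.

```python
import math

def cholesky(a):
    """Lower-triangular C with a = C C' (a must be symmetric positive-definite)."""
    m = len(a)
    c = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c

def modified_cholesky(sigma):
    """Return (T, d): T unit lower triangular with T Sigma T' = diag(d)."""
    m = len(sigma)
    c = cholesky(sigma)
    d = [c[t][t] ** 2 for t in range(m)]  # innovation variances
    # Unit lower-triangular factor L with Sigma = L diag(d) L'.
    low = [[c[i][j] / c[j][j] for j in range(m)] for i in range(m)]
    # T = L^{-1}, obtained by forward substitution on each identity column.
    T = [[0.0] * m for _ in range(m)]
    for col in range(m):
        for i in range(m):
            rhs = 1.0 if i == col else 0.0
            T[i][col] = rhs - sum(low[i][k] * T[k][col] for k in range(i))
    return T, d

# AR(1) example: sigma^2 = 1, rho = 0.5, m = 3.
sigma = [[0.5 ** abs(i - j) for j in range(3)] for i in range(3)]
T, d = modified_cholesky(sigma)
# The regression coefficients are the negated below-diagonal entries of T.
phi = [[-T[t][j] for j in range(t)] for t in range(3)]
```

For this AR(1) example only the lag-1 coefficients are nonzero, which illustrates the intuition that coefficients far from the diagonal of $T$ tend to be small for longitudinal data.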

PAGE 34

Although Huang et al.'s nonparametric approach of smoothing the first few subdiagonals of the $T$ matrix and setting the rest to zero produces a statistically efficient estimator of $\Sigma$, it may not be adequate in cases when the diagonals are not smooth or when there may be small but nonzero elements in $T$. An alternative approach is to use penalized likelihood as proposed by Huang et al. (2006). By imposing roughness penalties on the negative log-likelihood function, this procedure essentially provides a solution to the sequence of regression equations 2–2. The class of $L_p$ penalties with $p = 1, 2$ is considered under the general framework of penalized likelihood for regression models (Fan and Li, 2001). The $L_1$ and $L_2$ penalties allow shrinkage on the GARPs in a similar fashion as LASSO and ridge regression, respectively. Moreover, the $L_1$ penalty can shrink some of the GARPs toward zero and thus provide a selection scheme for regression coefficients. This is, however, different from banding $T$, where $\phi_{t,t-j}$ is set to zero for all $j > m_0$. With the $L_1$ penalty, the zeroes can be irregularly placed in any given row of $T$. Levina et al. (2008) proposed a similar penalized likelihood procedure called adaptive banding. Their method uses a nested lasso penalty which sets $\phi_{t,t-j}$ to zero for all $j > k$, where $k$ may

PAGE 35

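We adopt the $L_2$ penalty approach of Huang et al. (2006) and propose an extension of this method to covariance estimation in the mixture likelihood framework of functional mapping. Such an extension is possible by capitalizing on the posterior probability representation of the mixture log-likelihood used in the implementation of the EM algorithm (Section 2.3.1). Estimation is then carried out by using the ECM algorithm (Meng and Rubin, 1993) with two CM-steps (Section 2.3.2).

Assume the linear regression model, $y = X\beta + \varepsilon$, where $y = (y_1, y_2, \ldots, y_n)'$ is the vector of responses,

$$X = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{p1} \\ x_{12} & x_{22} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1n} & x_{2n} & \cdots & x_{pn} \end{bmatrix}$$

is the design matrix, $\beta = (\beta_1, \ldots, \beta_p)'$ is the vector of regression coefficients, and $\varepsilon$ is the vector of errors. The ordinary least squares (OLS) estimator of $\beta$ is

$$\hat{\beta} = (X'X)^{-1} X'y, \tag{2–6}$$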

PAGE 36

and gives the minimum variance among unbiased linear estimators of $\beta$. A drawback of the OLS estimator 2–6 is that it is not unique when the design matrix $X$ is less than full rank, i.e. rank($X$)
PAGE 37

2–9 and 2–10, small eigenvalues can cause the expected value and variance of the squared distance from $\hat{\beta}$ to $\beta$ to be large or, as shown by Eq. 2–11, the regression coefficients themselves to be too large in magnitude. Similarly, the variance of $\hat{\beta}$ can also be inflated since $\mathrm{var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$. As a result, the mean squared error (MSE) also becomes inflated and predictions based on the OLS estimator (2–6) are not very reliable.

One way of resolving multicollinearity is through ridge regression. The idea of ridge regression is to make $X'X$ closer to the identity matrix by replacing it with $X'X + \kappa I$, where $\kappa$ is some positive number and $I$ is the identity matrix. The resulting estimator is

$$\hat{\beta}_r = (X'X + \kappa I)^{-1} X'y, \tag{2–12}$$

which is essentially a version of $\hat{\beta}$ shrunk toward 0. $\hat{\beta}_r$ yields larger eigenvalues $\lambda_i + \kappa$, for $i = 1, \ldots, p$, and therefore smaller prediction variances. It should be noted, however, that there is a trade-off for this because, unlike $\hat{\beta}$, $\hat{\beta}_r$ is not unbiased. But on average, we still get lower MSE and better overall prediction. Hoerl and Kennard (1970) provide ways for selecting $\kappa$.

LASSO takes a similar approach to ridge regression in reducing variance by sacrificing bias. LASSO also shrinks the regression coefficients towards 0 but goes further by possibly allowing some of them to be 0. This is desirable in the sense that the resulting model is parsimonious and has better interpretation, because only the coefficients with strong effects are included in the model. The LASSO estimate of $\beta$, which we shall denote as $\hat{\beta}_l$, can be obtained by minimizing

$$\sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ji} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t, \tag{2–13}$$

where $t \ge 0$ is a tuning parameter. This is a quadratic programming problem with linear inequality constraints, and Tibshirani (1996) provides efficient and stable algorithms to

PAGE 38

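2–13 is equivalent to the penalized residual sum of squares

$$\sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \tag{2–14}$$

(Gill et al., 1981; Tibshirani, 1996), where $\lambda$ is a tuning parameter.

Ridge regression can also be expressed as a constrained optimization problem, as a minimization of

$$\sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ji} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t, \tag{2–15}$$

where $t \ge 0$ is a tuning parameter or, equivalently, a minimization of

$$\sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \tag{2–16}$$

(Tibshirani, 1996), where $\lambda$ is a tuning parameter. Therefore, 2–16 also leads to 2–12 with $\kappa = \lambda$.

$$L(\theta) + \lambda\, p(\theta), \tag{2–17}$$

where $p(\cdot)$ is called the penalty function and $\lambda > 0$ is a tuning parameter. The idea behind penalized likelihood is, in some sense, similar to mean squared error

$$\text{mse} = \text{bias}^2 + \text{variance} \tag{2–18}$$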

PAGE 39

where $y_i$ is the response in a regression on $x_i$, $\varepsilon_i$ is i.i.d. $N(0, \sigma^2)$ and $f$ is a function to be estimated. Here, $\theta = f$ and $L(f) = \sum_{i=1}^{n} (y_i - f(x_i))^2$, the residual sum of squares, without the constants. A completely unbiased estimate of $f$ is a curve, $\hat{f}$, that fits all the $y_i$'s exactly. However, this curve would have a high variance because of its rapid local variation. We say that $\hat{f}$ in this case is "rough", so that if we want to control the "roughness" aspect of $f$, we can use it as a penalty, $p(f)$, in 2–17. Towards the other extreme, another estimate of $f$ can be very smooth but severely biased. Figure 2-1 shows three different estimates of $f$ with varying degrees of roughness. The "in between" curve seems to be the best estimate because it has a good balance between fit and roughness. The tuning parameter $\lambda$ controls the relative importance of these two.

A popular choice for assessing roughness is the integrated squared second derivative (ISSD)

$$p_{ISSD}(f) = \int [f''(x)]^2\, dx,$$

which measures the curvature of a function (Ramsay and Silverman, 1997; Green, 1999). Highly variable functions will have high values for $p_{ISSD}$. A linear function has $p_{ISSD} = 0$ and therefore has the least curvature.

Another example of penalized likelihood is ridge regression, which we have seen in Section 2.2.3. In ridge regression, 2–16 has the same form as 2–17 with $\theta = \beta = (\beta_1, \ldots, \beta_p)'$. The aspect of $\beta$ that is being controlled is the size of its elements (the regression coefficients), quantified by $p_r(\beta) = \beta'\beta = \sum_{j=1}^{p} \beta_j^2$. LASSO also controls the size of the regression coefficients but by using the penalty $p_l(\beta) = \sum_{j=1}^{p} |\beta_j|$ instead. Ridge regression and LASSO are special cases of a family of penalized regressions called bridge regression, which imposes a penalty function of the form $\sum_{j=1}^{p} |\beta_j|^{\gamma}$, $\gamma \ge 1$ (Fu, 1998).
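The effect of the ridge penalty can be seen on a small, nearly collinear design. The sketch below is an illustration under assumptions: the data are made up, `penalty_weight` denotes the positive constant added to the diagonal of $X'X$ in 2–12, and a plain Gaussian elimination routine stands in for a linear algebra library.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def ridge(X, y, penalty_weight):
    """(X'X + penalty_weight * I)^{-1} X'y; a weight of 0 gives OLS if X'X is invertible."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) + (penalty_weight if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)

def l2_penalty(beta):
    """Ridge penalty: sum of squared coefficients."""
    return sum(b * b for b in beta)

def l1_penalty(beta):
    """LASSO penalty: sum of absolute coefficients."""
    return sum(abs(b) for b in beta)

# Hypothetical data with two almost identical predictors.
X = [[1.0, 1.01], [2.0, 1.98], [3.0, 3.02], [4.0, 3.99]]
y = [2.0, 4.1, 5.9, 8.1]
beta_ols = ridge(X, y, 0.0)
beta_ridge = ridge(X, y, 1.0)
```

With predictors this close to collinear, the OLS coefficients are large and of opposite sign, while the ridge coefficients are much smaller in magnitude; this is the variance reduction bought at the cost of some bias.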

PAGE 40

Figure 2-1. Penalized likelihood in curve estimation. Depending on the penalty, the estimated curve can be rough, smooth or in between. The in-between curve illustrates a balance between bias and variance.

Penalized likelihood is also used in model selection. Suppose we have model $M_i$ with parameter vector $\theta_i$ and likelihood function $L_i(\theta_i)$. Let $\Lambda = L_1(\hat{\theta}_1)/L_2(\hat{\theta}_2)$ where $\hat{\theta}_i = \arg\max L_i(\theta_i)$, $i = 1, 2$. Suppose model $M_2$ is nested within model $M_1$. Then by the likelihood ratio test, the simpler model $M_2$ is rejected if $2 \log \Lambda$ exceeds a certain percentile of the $\chi^2_{p_1 - p_2}$ distribution, where $p_i$ is the number of parameters in $M_i$. However, if the amount of data is large, the more complex model $M_1$ tends to be selected even when the simpler model $M_2$ is true (Lindley, 1957; Green, 1999). Various approaches that utilize penalized likelihood have been developed to alleviate this problem. One such approach is the Bayesian Information Criterion (BIC; Schwarz, 1978) which, for model $M_i$, is defined as

PAGE 41

which shows a penalized form of the likelihood ratio test statistic. Other approaches such as Akaike's Information Criterion (AIC; Akaike, 1974) and Mallows' $C_p$ (Mallows, 1973) use different forms of the penalties. The main idea behind these criteria is to penalize more complex models to favor the simpler ones based on the number of parameters in each of them.

Recall that LASSO shrinks the regression coefficients towards zero and even allows them to be exactly zero. Therefore, in addition to controlling the size of the regression coefficients through $p_l(\beta) = \sum_{j=1}^{p} |\beta_j|$, LASSO also implements model selection by providing simpler or more parsimonious regression models.

For more about penalized likelihood see Ramsay and Silverman (1997) and Green (1999), and for asymptotic theory, Cox et al. (1990).

2.3.1 Computing the Penalized Likelihood Estimates

(2–25)

where $y_{ki} = y_i - g_k$, $k = 1, \ldots, J$.

PAGE 42

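$$\log L(\Omega) = \sum_{i=1}^{n} \log\left[\sum_{k=1}^{J} p_{k|i}\, f_k(y_i \mid \Omega)\right] \tag{2–26}$$

and that

$$\frac{\partial}{\partial\theta} \log L(\Omega) = \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\, \frac{\partial}{\partial\theta} \log f_k(y_i \mid \Omega) \tag{2–27}$$

where

$$P_{k|i} = \frac{p_{k|i}\, f_k(y_i \mid \Omega)}{\sum_{k'=1}^{J} p_{k'|i}\, f_{k'}(y_i \mid \Omega)} \tag{2–28}$$

and $\theta \in \Omega = (\Theta, \Psi)$.

It follows from $T \Sigma T' = D$, Eq. 2–1, that $\Sigma^{-1} = T' D^{-1} T$ and $|\Sigma| = |D|$. Therefore, if $\Theta$ is given,

$$\begin{aligned}
\frac{\partial}{\partial\theta} \log L(\Omega) &= \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\, \frac{\partial}{\partial\theta}\left[-\frac{m}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\, y_{ki}' \Sigma^{-1} y_{ki}\right] \\
&= -\frac{1}{2} \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\, \frac{\partial}{\partial\theta}\left[\log|\Sigma| + y_{ki}' \Sigma^{-1} y_{ki}\right] \\
&= -\frac{1}{2} \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\, \frac{\partial}{\partial\theta}\left[\sum_{t=1}^{m} \log \sigma_t^2 + \sum_{t=1}^{m} \frac{\varepsilon_{kit}^2}{\sigma_t^2}\right]
\end{aligned} \tag{2–29}$$

where $\varepsilon_{ki1} = y_{ki1}$ and $\varepsilon_{kit} = y_{kit} - \sum_{j=1}^{t-1} \phi_{tj}\, y_{kij}$ for $t = 2, \ldots, m$. It is implicitly assumed, therefore, that $\sigma_t^2 = \mathrm{var}(\varepsilon_{kt})$ for $k = 1, \ldots, J$. Note that if $\varepsilon_k = (\varepsilon_{k1}, \ldots, \varepsilon_{km})'$ and $y_k = (y_{k1}, \ldots, y_{km})'$ then $\varepsilon_k = T y_k$, so that $\mathrm{var}(\varepsilon_k) = T \Sigma T' = D$.

Define the penalized negative log-likelihood as

$$-2 \log L(\Omega) + \lambda\, p(\{\phi_{tj}\}). \tag{2–30}$$

Assuming the $L_2$ penalty $p(\{\phi_{tj}\}) = \sum_{t=2}^{m}\sum_{j=1}^{t-1} \phi_{tj}^2$, we have

$$-2 \log L(\Omega) + \lambda \sum_{t=2}^{m}\sum_{j=1}^{t-1} \phi_{tj}^2. \tag{2–31}$$

Our problem is immediately solved if 2–31 can be expressed in the same form as 2–16, where the $\phi_{tj}$'s correspond to the $\beta_j$'s. However, the first term on the right side of 2–31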

PAGE 43

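2–27. Thus, by taking the derivative of 2–31 and using 2–29, we get

$$\begin{aligned}
\frac{\partial}{\partial\theta}\left[-2 \log L(\Omega) + \lambda\, p(\{\phi_{tj}\})\right]
&= -2 \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\, \frac{\partial}{\partial\theta} \log f_k(y_i \mid \Omega) + \lambda\, \frac{\partial}{\partial\theta} \sum_{t=2}^{m}\sum_{j=1}^{t-1} \phi_{tj}^2 \\
&= \sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\, \frac{\partial}{\partial\theta}\left[\sum_{t=1}^{m} \log \sigma_t^2 + \sum_{t=1}^{m} \frac{\varepsilon_{kit}^2}{\sigma_t^2}\right] + \lambda\, \frac{\partial}{\partial\theta} \sum_{t=2}^{m}\sum_{j=1}^{t-1} \phi_{tj}^2 \\
&= \frac{\partial}{\partial\theta}\left[\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i} \left(\sum_{t=1}^{m} \log \sigma_t^2 + \sum_{t=1}^{m} \frac{\varepsilon_{kit}^2}{\sigma_t^2}\right) + \lambda \sum_{t=2}^{m}\sum_{j=1}^{t-1} \phi_{tj}^2\right] \\
&= \frac{\partial}{\partial\theta}\left[\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i} \left(\log \sigma_1^2 + \frac{\varepsilon_{ki1}^2}{\sigma_1^2}\right)\right] + \frac{\partial}{\partial\theta} \sum_{t=2}^{m}\left[\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i} \left(\log \sigma_t^2 + \frac{\varepsilon_{kit}^2}{\sigma_t^2}\right) + \lambda \sum_{j=1}^{t-1} \phi_{tj}^2\right].
\end{aligned}$$

The second term is of the same form as 2–16 when written in terms of $\phi_{tj}$ because $\varepsilon_{kit} = y_{kit} - \sum_{j=1}^{t-1} \phi_{tj}\, y_{kij}$ for $t = 2, \ldots, m$. Thus, we need to minimize

$$\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i} \left(\log \sigma_1^2 + \frac{\varepsilon_{ki1}^2}{\sigma_1^2}\right) \tag{2–32}$$

and

$$\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i} \left(\log \sigma_t^2 + \frac{\varepsilon_{kit}^2}{\sigma_t^2}\right) + \lambda \sum_{j=1}^{t-1} \phi_{tj}^2 \tag{2–33}$$

for each $t = 2, \ldots, m$.

The minimizer of 2–32 can be obtained by solving

$$\frac{\partial}{\partial \sigma_1^2}\left[\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i} \left(\log \sigma_1^2 + \frac{\varepsilon_{ki1}^2}{\sigma_1^2}\right)\right] = 0,$$

which yields

$$\hat{\sigma}_1^2 = \frac{\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}\, y_{ki1}^2}{\sum_{i=1}^{n}\sum_{k=1}^{J} P_{k|i}}. \tag{2–34}$$

For $t = 2, \ldots, m$, 2–33 can be minimized by alternating minimization over $\sigma_t^2$ and $\phi_{tj}$, $j = 1, 2, \ldots, t-1$ (see Appendix C). The solutions are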

PAGE 44

and

where $\phi_{t(t)} = (\phi_{t1}, \phi_{t2}, \ldots, \phi_{t,t-1})'$ and $H_t$, $I_t$ and $g_t$ are given in Appendix C. Notice the similarity of 2–36 to 2–12 and that in formulas 2–34, 2–35 and 2–36, the posterior probabilities, the $P_{k|i}$'s, are the weights for the genotype groups, $k = 1, \ldots, J$. A nonparametric covariance estimate, $\hat{\Sigma}_{NP}$, can therefore be obtained through $\hat{\Sigma}_{NP} = \hat{T}^{-1} \hat{D} (\hat{T}^{-1})'$, where the elements of $\hat{D}$ are given by 2–34 and 2–35, and the elements of $\hat{T}$ are given by 2–36.

The preceding calculations were based on the $L_2$ penalty, $p(\{\phi_{tj}\}) = \sum_{t=2}^{m}\sum_{j=1}^{t-1} \phi_{tj}^2$. If the $L_1$ penalty, $p(\{\phi_{tj}\}) = \sum_{t=2}^{m}\sum_{j=1}^{t-1} |\phi_{tj}|$, is used instead, closed-form solutions like 2–35 and 2–36 cannot be obtained and an iterative algorithm is needed. This is carried out by using an iterative local quadratic approximation of $\sum_{j=1}^{t-1} |\phi_{tj}|$ (Fan and Li, 2001; Öjelund et al., 2001). The reader is referred to Huang et al. (2006) for additional details.

1.2.2), a penalty term, $\lambda p(\Omega)$, on the model parameters, can be introduced to the complete data log-likelihood, A–1, to get the penalized complete data log-likelihood

$$\log L_c^P(\Omega) = \sum_{i=1}^{n}\sum_{k=1}^{J} x_{ik} \left[\log p_{k|i} + \log f_k(y_i \mid \Omega)\right] + \lambda\, p(\Omega). \tag{2–37}$$

Clearly, taking the conditional expectation of 2–37 does not affect the penalty term because the expectation is taken with respect to the missing variable $x$. Thus, at the E-step, we have

PAGE 45

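$$\frac{\partial}{\partial\theta} Q_P(\Omega \mid \Omega^{(j)}) = \sum_{i=1}^{n}\sum_{k=1}^{J} P^{(j)}_{k|i}\, \frac{\partial}{\partial\theta} \log f_k(y_i \mid \Omega) + \frac{\partial}{\partial\theta}\, \lambda\, p(\Omega) = 0 \tag{2–39}$$

to get $\Omega^{(j+1)}$, where $\theta \in \Omega$.

The derived formulas for $\Sigma$ in the preceding section cannot be directly used in the EM algorithm because we assumed that $\Theta$ was given. We instead use a variant of the EM algorithm called the Expectation and Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993), which partitions the parameter set $\Omega$ according to mean and covariance parameters, $\Theta$ and $\Psi$, respectively. The ECM algorithm differs from EM in that the M-step involves a conditional optimization with respect to each partition of $\Omega$. More precisely, for the $(j+1)$th iteration, the ECM algorithm proceeds as follows:

1. Initialize $\Omega^{(j)} = (\Theta^{(j)}, \Psi^{(j)})$.
2. E-step: Update the posterior probabilities $P^{(j)}$ using $\Omega^{(j)}$.
3. CM-steps: Conditional on $\Psi^{(j)}$, solve for $\Theta^{(j+1)}$. Then, conditional on $\Theta^{(j+1)}$, update $\Psi^{(j+1)}$ using 2–34, 2–35 and 2–36 (Section 2.3.1).
4. Repeat steps (2)–(3) until some convergence criterion is met.

Unless a structure, such as AR(1), is imposed on the covariance matrix, it is difficult to find closed-form CM-step solutions for the mean parameters in functional mapping. Hence, estimation in this case is carried out by using the Nelder-Mead simplex algorithm (Nelder and Mead, 1965; Zhao et al., 2004), which can be readily implemented by popular software. See for example the fminsearch built-in function in Matlab or optim in R.

In a backcross population design, Ma et al. (2002) provide closed-form iteration formulas for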

PAGE 46

For a $K$-fold cross-validation, let $Z$ denote the full data set. $Z$ is randomly split into $K$ subsets of about the same size. Each subset, say $Z_s$ ($s = 1, \ldots, K$), is used to validate the log-likelihood based on the parameters estimated using the data $Z \setminus Z_s$. The value of $\lambda$ that maximizes the average of all cross-validated log-likelihoods is used to select an estimate for $\Sigma$.

The cross-validated log-likelihood criterion is given by

$$C(\lambda) = \frac{1}{K} \sum_{s=1}^{K} \log L_s(\hat{\Omega}_{-s}),$$

where $\hat{\Omega}_{-s}$ is an estimate of $\Omega$ which is based on the data set $Z \setminus Z_s$ and $L_s$ is the likelihood based on $Z_s$. $\lambda = \hat{\lambda}$ is chosen to maximize $C(\lambda)$.

Note that there really are two sets of tuning parameters in our setting: one under the null model and another under the alternative. However, because the log-likelihood under the null model is constant throughout a marker interval, we shall assume that the corresponding tuning parameter has been estimated accordingly and, in the succeeding sections, simply refer to the tuning parameters as the ones for the alternative model.

2.4.1 Simulations

2.3.1), is assessed and compared to an AR(1)-structured estimator, $\hat{\Sigma}_{AR(1)}$ (Section 1.2.1). We investigate data generated from both multivariate normal and t-distributions. We begin with the former.
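The $K$-fold cross-validated log-likelihood criterion described earlier in this section can be sketched on a toy problem. The example below is an assumption-laden stand-in for the actual procedure: instead of the penalized covariance estimator, it tunes the penalty of a shrunken mean under a univariate normal model with unit variance. Function names, the data, and the grid of candidate values are all hypothetical; only the structure (fit on $Z \setminus Z_s$, evaluate the log-likelihood on $Z_s$, average over folds, maximize over the tuning parameter) follows the text.

```python
import math
import random

def k_fold_indices(n, K, seed=0):
    """Randomly split indices 0..n-1 into K folds of about the same size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[s::K] for s in range(K)]

def shrunken_mean(values, lam):
    """Minimizer of sum (y - mu)^2 + lam * mu^2 (a ridge-type estimate)."""
    return sum(values) / (len(values) + lam)

def normal_loglik(values, mu, var=1.0):
    """Log-likelihood of the held-out values under N(mu, var)."""
    return sum(-0.5 * math.log(2.0 * math.pi * var) - 0.5 * (v - mu) ** 2 / var
               for v in values)

def cv_criterion(y, lam, folds):
    """Average held-out log-likelihood over the K folds."""
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        test = [y[i] for i in held_out]
        total += normal_loglik(test, shrunken_mean(train, lam))
    return total / len(folds)

def select_lambda(y, grid, K=5, seed=0):
    """Return the grid value maximizing the CV criterion, plus all scores."""
    folds = k_fold_indices(len(y), K, seed)
    scores = {lam: cv_criterion(y, lam, folds) for lam in grid}
    return max(scores, key=scores.get), scores

# Hypothetical data and candidate tuning parameters.
y = [0.3, -0.1, 0.8, 0.2, -0.4, 0.5, 0.1, 0.0, 0.6, -0.2]
grid = [0.0, 0.5, 1.0, 5.0]
best_lambda, scores = select_lambda(y, grid)
```

The same folds are reused for every candidate value so that differences in the criterion reflect the tuning parameter rather than the random split.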

PAGE 47

1.1.2) for QTL mapping, we randomly generated 6 markers equally spaced on a chromosome 100 cM long with 1 QTL between the second and third markers, 12 cM from the second marker (or 32 cM from the leftmost marker in the chromosome). Each phenotype associated with the simulated QTL had $m = 10$ measurements and was sampled from a multivariate normal distribution, using logistic curves as genotype means under 3 different covariance structures. The mean parameters were $a_1 = 30$, $a_2 = 28.5$, $a_3 = 27.5$, $b_1 = b_2 = b_3 = 5$, and $r_1 = r_2 = r_3 = 0.5$, and the covariance structures were as follows:

PAGE 48

2-2, the broken line LR plot is the result of our procedure while the solid one is based on individual $\lambda_c$'s that have each been separately cross-validated. For $n = 400$, these two plots are indistinguishable. The reason for this is that the cross-validated $\lambda$'s at each search point within a marker interval are not that different from one another. Thus, using one $\lambda$ for each marker interval (the one that produces the maximum LR) will not significantly alter the general shape of the LR plot. The two dotted line plots were based on $\lambda_c$, for all $c = 1, 2, \ldots, 26$, set to two different arbitrary values of $\lambda$.

To evaluate the estimate $\hat{\Sigma}_l$ ($l = 1, 2, 3$) of the true covariance structure $\Sigma_l$, a number of criteria can be used. Among them are the matrix norm losses

PAGE 49

Figure 2-2. Log-likelihood ratio (LR) plots based on simulated data under three different covariance structures. The solid line plot is based on cross-validated (CV) tuning parameters at each search point (individual $\lambda$'s). The broken line plot is based on cross-validated tuning parameters (max $\lambda$'s) corresponding to the maximum LR in each marker interval. The dotted line plots are based on two different arbitrary tuning parameter values, each assumed at all search points.

PAGE 50

where $I$ is the identity matrix. These losses are all nonnegative and equality to zero holds when $\hat{\Sigma}_l = \Sigma_l$. There is no agreement as to which of these norms is appropriate for a particular situation, but any of them may be used and the results would qualitatively be the same (Levina et al., 2008). Here, we use $L_E$ and $L_Q$, which were also used by Wu and Pourahmadi (2003), Huang et al. (2006 and 2007b), and Levina et al. (2008). The corresponding risk functions are defined by

$$R_E(\Sigma_l, \hat{\Sigma}_l) = E[L_E(\Sigma_l, \hat{\Sigma}_l)]$$

and

$$R_Q(\Sigma_l, \hat{\Sigma}_l) = E[L_Q(\Sigma_l, \hat{\Sigma}_l)].$$

One hundred simulation runs were carried out and the averages over all runs of the estimated QTL location, logistic mean parameter estimates, maxLR, and entropy and quadratic losses, including the respective Monte Carlo standard errors (SE), were recorded. The results are shown in Tables 2-1 and 2-2. For $\Sigma_1$, $\hat{\Sigma}_{AR(1)}$ does well as expected, but $\hat{\Sigma}_{NP}$ also does a good job. Both provide better precision with increased sample size. The maxLR values are comparable: 38.52 and 112.03 from Table 2-1 versus 37.78 and 128.21 from Table 2-2, respectively, are not too different from each other.

For $\Sigma_2$ and $\Sigma_3$, $\hat{\Sigma}_{NP}$ does better than $\hat{\Sigma}_{AR(1)}$. $\hat{\Sigma}_{AR(1)}$ shows high values for both averaged losses, which translates to significantly biased estimates of QTL location and poor mean parameter estimates, particularly for $\Sigma_3$ at the second and third genotype groups. Increased sample size does not help and even makes mean parameter estimates worse in the case of $\Sigma_3$. Values of maxLR for $\hat{\Sigma}_{NP}$ and $\hat{\Sigma}_{AR(1)}$ are very different in these

PAGE 51

To assess the robustness of our proposed nonparametric estimator, we modeled simulated data from a t-distribution with 5 degrees of freedom. That is, samples were taken from

$$y = g_k + \frac{X}{\sqrt{Z/\nu}},$$

where $X \sim N(0, \Sigma)$, $Z \sim \chi^2(\nu)$ and $g_k$ is the logistic mean for genotype $k = 1, 2, 3$. The results are presented in Tables 2-3 and 2-4. We excluded the column for maxLR because it is not appropriate in this scenario. The results show that despite inflated average losses, $\hat{\Sigma}_{NP}$ still outperforms $\hat{\Sigma}_{AR(1)}$. Notice that the quadratic loss is severely inflated because of the fat tails of the t-distribution. It may not be a reliable measure of performance, but we present the results here for illustration.
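Sampling from a multivariate t-distribution as described above can be sketched as follows. This is an illustration under assumptions: it uses the standard construction of a t-distributed vector as a normal vector divided by the square root of an independent scaled chi-square variable, takes the logistic parameters $a = 30$, $b = 5$, $r = 0.5$ and $m = 10$ from the simulation design, and uses an AR(1) covariance with hypothetical values $\sigma^2 = 1$ and $\rho = 0.6$. The function names are hypothetical as well.

```python
import math
import random

def cholesky(a):
    """Lower-triangular C with a = C C' (a must be symmetric positive-definite)."""
    m = len(a)
    c = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c

def sample_mvt(mean, sigma, df, rng):
    """One draw of mean + X / sqrt(Z / df), X ~ N(0, sigma), Z ~ chi-square(df)."""
    m = len(mean)
    c = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    x = [sum(c[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
    # chi-square(df) is a gamma distribution with shape df / 2 and scale 2.
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    scale = math.sqrt(chi2 / df)
    return [mu + xi / scale for mu, xi in zip(mean, x)]

def logistic_mean(a, b, r, times):
    """Logistic growth curve a / (1 + b exp(-r t)) evaluated at each time."""
    return [a / (1.0 + b * math.exp(-r * t)) for t in times]

times = list(range(1, 11))
g1 = logistic_mean(30.0, 5.0, 0.5, times)
# Hypothetical AR(1) covariance for illustration only.
sigma = [[0.6 ** abs(i - j) for j in range(10)] for i in range(10)]
sample = sample_mvt(g1, sigma, df=5, rng=random.Random(1))
```

With only 5 degrees of freedom the divisor is occasionally very small, which produces the extreme observations responsible for the inflated quadratic loss noted in the text.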

PAGE 52

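                               QTL genotype 1                          QTL genotype 2                          QTL genotype 3
Covariance   n   QTL Location   $\hat{a}_1$   $\hat{b}_1$   $\hat{r}_1$   $\hat{a}_2$   $\hat{b}_2$   $\hat{r}_2$   $\hat{a}_3$   $\hat{b}_3$   $\hat{r}_3$   maxLR   $L_E$   $L_Q$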

PAGE 53

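                               QTL genotype 1                          QTL genotype 2                          QTL genotype 3
Covariance   n   QTL Location   $\hat{a}_1$   $\hat{b}_1$   $\hat{r}_1$   $\hat{a}_2$   $\hat{b}_2$   $\hat{r}_2$   $\hat{a}_3$   $\hat{b}_3$   $\hat{r}_3$   maxLR   $L_E$   $L_Q$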

PAGE 54

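                               QTL genotype 1                          QTL genotype 2                          QTL genotype 3
Covariance   n   QTL Location   $\hat{a}_1$   $\hat{b}_1$   $\hat{r}_1$   $\hat{a}_2$   $\hat{b}_2$   $\hat{r}_2$   $\hat{a}_3$   $\hat{b}_3$   $\hat{r}_3$   $L_E$   $L_Q$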

PAGE 55

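                               QTL genotype 1                          QTL genotype 2                          QTL genotype 3
Covariance   n   QTL Location   $\hat{a}_1$   $\hat{b}_1$   $\hat{r}_1$   $\hat{a}_2$   $\hat{b}_2$   $\hat{r}_2$   $\hat{a}_3$   $\hat{b}_3$   $\hat{r}_3$   $L_E$   $L_Q$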

PAGE 56

1.1.2) population of 259 male and 243 female progeny with 96 markers in a total of 19 chromosomes. The mice were measured for their body mass at 10 weekly intervals starting at age 7 days. Corrections were made for the effects due to dam, litter size at birth, parity, and sex (Cheverud et al., 1996; Kramer et al., 1998). A plot of the weight data is shown in Figure 1-3.

Functional mapping was first used to analyze this data in Zhao et al. (2004), who investigated QTL × sex interaction. The authors used a logistic curve (Eq. 1–4) to model the genotype means and employed the transform-both-sides (TBS; Section 1.2.1) technique for variance stabilization in order to utilize an AR(1) structure. Their method identified 4 of 19 chromosomes that each had significant QTLs, and they concluded that there were sex differences in body mass growth in mice. Zhao et al. (2005) applied an SAD covariance structure in functional mapping and found 3 QTLs. Liu and Wu (2007) likewise analyzed the same data using a Bayesian approach in functional mapping and detected only 3 significant QTLs.

Here, we applied our proposed nonparametric estimator, $\hat{\Sigma}_{NP}$, in a genome-wide scan for growth QTL without regard to sex. We scanned the linkage map at intervals of 4 cM. Figure 2-3 shows the LR plots for all 19 chromosomes. They were obtained using $\lambda$'s that were cross-validated at each search point. We conducted a permutation test (Doerge and Churchill, 1996; Section 1.2.3) to identify significant QTLs. For every permutation run, we calculated maxLR$_e$ for chromosome $e = 1, \ldots, 19$ using the same general procedure as in the simulations (Section 2.4.1). In this mice data set, however, some markers were either missing or not genotyped and we used only the available markers (Table 2-5). Thus, every marker interval had different sets of available phenotype data. But we believe this did not affect the results because of the large sample size of the available data. We looked at chromosomes 6 and 7 and found this to be the case. Figure 2-4 shows LR plots based

PAGE 57

2-3 correspond to 95% (broken) and 99% (solid) thresholds based on 100 permutation test runs. There were 9 chromosomes with significant QTLs (1, 4, 6, 7, 9, 10, 11, 14 and 15) based on the 95% threshold but only 7 under 99% (1, 4, 6, 7, 10, 11 and 15). The two chromosomes that did not make the 99% threshold (9 and 14) barely made the 95%. For this mice data set, we recommend using the 99% threshold because there were only 100 permutation test runs. Zhao et al. (2004) identified QTLs in chromosomes 6, 7, 11 and 15, and Zhao et al. (2005) and Liu and Wu (2007) found QTLs in chromosomes 6, 7 and 10. These were all at the 95% threshold. Our findings verified the results of these previous studies that made use of the functional mapping method and even detected more QTLs. Although there is a discrepancy between our results and the others, one cannot conclude that the additional QTLs detected by our proposed model are nonexistent. In fact, Vaughn et al. (1999) identified 17 QTLs, although most of them are suggestive, using simple interval mapping.

The estimated genotype mean curves for the detected QTLs are shown in Figure ??. The three genotypes at a QTL have different growth curves, indicating the temporal genetic effects of this QTL on growth processes for mouse body mass. Some QTLs, like those on chromosomes 6, 7 and 10, act in an additive manner because the heterozygotes (Qq, broken curves) are intermediate between the two homozygotes (QQ, solid curves and qq, dotted curves). Some QTLs, such as the one on chromosome 11, operate in a dominant way since the heterozygote is very close to one of the homozygotes.

PAGE 58

The genomic position corresponding to the peak of the curve is the optimal likelihood estimate of the QTL localization, indicated by vertical broken lines. The ticks on the x-axis indicate the positions of markers on the chromosome. The map distances (in centiMorgans) between two markers are calculated using the Haldane mapping function. The thresholds for claiming the genome-wide existence of a QTL are shown by horizontal lines.

PAGE 59

Table 2-5. Available markers and phenotype data of a linkage map in an F2 population of mice (data from Vaughn et al., 1999).

              Marker Intervals
Chromosome    1     2     3     4     5     6     7     8
 1           378   433   483   467   450   440   466
 2           414   404   453   465   430
 3           477   491   489   476   475
 4           461   475   481   481   491
 5           441   439   449   381   385
 6           467   483   485   481
 7           407   424   459   452   378   372   428   415
 8           395   453   472
 9           498   496   498
10           401   406   481   490   497
11           431   451   468   464   446
12           497   489   483   488
13           450   443   466
14           443   475   495
15           491   494   468
16           498
17           371   394
18           487   479   420
19           445   468   468

PAGE 60

Figure 2-4. Log-likelihood ratio (LR) plots for chromosomes 6 and 7 of the mice data. The solid line plot is based on cross-validated (CV) tuning parameters at each search point (individual $\lambda$'s). The broken line plot is based on cross-validated tuning parameters (max $\lambda$'s) corresponding to the maximum LR in each marker interval. The dotted line plots are based on two different arbitrary tuning parameter values, each assumed at all search points. Slight differences between the solid and broken line plots may be due to different sample sizes among marker intervals (see Table 2-5).

PAGE 61


PAGE 62

In this chapter, we adopted Huang et al.'s $L_2$ penalty approach in functional mapping. This penalty works best when the true $T$ matrix has many small elements. Using the $L_1$ penalty gives a better estimator when some of the elements of $T$ are actually zero. However, we believe that the differences in results between the two penalties will not be significant unless the dimension is very large. Nonetheless, the $L_1$ penalty can be easily incorporated into our scheme. We have shown how to integrate Huang et al.'s procedure into the mixture likelihood framework of functional mapping. The key was to utilize the posterior probability representation of the derivative of the log-likelihood, 2–27, and apply an $L_2$ penalty to the negative log-likelihood. Estimation was then carried out using the ECM algorithm (Section 2.3.2) with two CM-steps, based on a partition of the mean and covariance parameters. Our simulations have shown better accuracy and precision in estimates for QTL location, genotype mean parameters, and maxLR values, by $\hat{\Sigma}_{NP}$ compared to $\hat{\Sigma}_{AR(1)}$. The maxLR values are important because the complete LR plot provides the amount of evidence for the existence of a QTL. LR values noticeably

PAGE 63

2-3) seemed to have the largest evidence for QTL existence. The LR plots are also used in permutation tests (Section 1.2.3) to find a significance threshold. More precise estimates of the covariance structure mean better estimates of the peak of the LR plot and therefore more reliable permutation test results.

With regard to the utilization of our proposed model, we suggest a preliminary analysis of the data by checking variance and covariance stationarity. If these latter conditions are satisfied, then AR(1) may be an appropriate model. If covariance stationarity is not an issue, then a TBS method (Section 1.2.1) coupled with AR(1) is applicable. If no stationarity is detected, then an SAD (Section 1.2.1) or $\hat{\Sigma}_{NP}$ may be more useful. Although we did not assess the comparative performance of these two models, we think that SAD becomes more computationally intensive if the data exhibit long-term dependence, in which case $\hat{\Sigma}_{NP}$ may be more appropriate. $\hat{\Sigma}_{NP}$ should also be considered if other parametric structures are suspect.

PAGE 64

1.2), which addresses the latter difficulty by using a biologically relevant mathematical function to model reaction norms. The authors considered a parametric model of photosynthetic rate as a function of light irradiance and temperature and studied the genetic mechanism of such a process. They showed, through extensive simulations, that in a backcross population with one or two QTLs, their method accurately and precisely estimated the QTL location(s) and the parameters of the mean model. However, they assumed the covariance matrix to be a Kronecker product of two AR(1) structures, each modeling a reaction norm due to one environmental factor. This type of covariance model is said to be separable. Although computationally attractive, such a model only captures separate reaction norm effects and fails to incorporate interactions. A more general approach is therefore needed.

PAGE 65

In this chapter, we show through simulations that, in functional mapping of reaction norms to two environmental signals, (1) nonseparable structures can be utilized as covariance models and used to generate data of processes that exhibit interactions, (2) the separable model proposed by Wu et al. (2007a), which we shall call $\Sigma_{AR(1)}$, may not be appropriate for such data, and (3) the nonparametric covariance estimator, $\hat{\Sigma}_{NP}$, developed in Chapter 2, is a more reliable covariance model than $\Sigma_{AR(1)}$. By utility in (1), we mean that a nonseparable model can analyze data generated by the same nonseparable model. With regard to (2), our results are surprising because, for some variance of the process or a certain number of levels in the environmental signals, the estimated QTL location and mean model parameters are generally robust to a biased separable covariance estimate, $\hat{\Sigma}_{AR(1)}$, of a nonseparable underlying structure. That is, if the covariance of data generated from a nonseparable structure is estimated by the separable model, $\Sigma_{AR(1)}$, the estimate is biased, as expected, but the QTL location and mean model parameters are still accurately and precisely estimated. However, the estimated maxLR (Section

PAGE 66

)isnotaccuratebecausethetrueunderlyingcovariancestructureandthe(biased)estimate,^AR(1),producedierentlog-likelihoodvalues.RecallthatmaxLRisimportantbecauseitisusedinpermutationteststoassesssignicanceofQTLexistence.Butwhenboththevarianceandthenumberoflevelsintheenvironmentalsignalsareincreased,theestimatedQTLlocationisseverelybiasedwhilethemeanparametersareonlymildlyaected.NPprovidesconsistentlybetterresultsoverAR(1).Ofcourse,ifnonseparablecovariancemodelsthemselvesareusedtoanalyzedatathatexhibitinteractions,theresultsareexpectedtobemuchbetter.However,inreality,theunderlyingstructureofthedataisunknownanditisverydiculttoidentifyanappropriatenonseparablemodeltouseinthiscase.Modelersoftenemploystrategiesthataremainlyadhocorspecictoaproblem.Unfortunately,therearenogeneralguidelinesthatareavailableinapproachingthesetypeofproblems.Wewill,however,usenonseparablecovariancemodelstogeneratesimulateddatawithinteractionsanduseittocompareNPandAR(1). Thischapterisorganizedasfollows:InSection 3.2 ,wedescribethefunctionalmappingmodelproposedbyWuetal.(2007a)forreactionnorms.InSection 3.3 ,wediscussseparableandnonseparablemodelsusedinspatio-temporalmodeling.InSection 3.4 ,wepresentasimulationstudyusingsomenonseparablestructuresintroducedinSection 3.3 andthenconcludewithasummaryanddiscussioninSection 3.5 .Inthischapter,wemayalternatelyusethetermscovariancematrix,structureorfunction.Theyallrefertothesamething. 66

An example of a reaction norm that illustrates a surface landscape is photosynthesis (Wu et al., 2007a), which is the process by which light energy is converted to chemical energy by plants and other living organisms. It is an important but complex process because it involves several factors such as the age of a leaf (where photosynthesis takes place in most plants), the concentration of carbon dioxide in the environment, temperature, light irradiance, available nutrients and water in the soil, etc. A mathematical expression for the rate of single-leaf photosynthesis, $P$, without photorespiration is

$$P = \frac{1}{2\theta}\left[\alpha I + P_m - \sqrt{(\alpha I + P_m)^2 - 4\theta\alpha I P_m}\right] \tag{3-1}$$

(Thornley and Johnson, 1990), where $\theta \in (0,1)$ is a dimensionless parameter, $\alpha$ is the photochemical efficiency, $I$ is the irradiance, and $P_m$ is the asymptotic photosynthetic rate at a saturating irradiance. $P_m$ is a linear function of the temperature, $T$,

$$P_m = P_m(20)\,\frac{T - T^*}{20 - T^*} \tag{3-2}$$

where $P_m(20)$ is the value of $P_m$ at the reference temperature of 20°C and $T^*$ is the temperature at which photosynthesis stops. $T^*$ is chosen over a range of temperatures, such as 5°C-25°C, to provide a good fit to observed data.

Wu et al. (2007a) studied the reaction norm of photosynthetic rate, defined by Eqs. 3-1 and 3-2, as a function of irradiance ($I$) and temperature ($T$). That is, the authors considered $P = P(I,T)$. Here, we assume that $T^* = 5$ so that the reaction norm model parameters are $(\alpha, P_m(20), \theta)$. The surface landscape that describes the reaction norm of $P(I,T)$, with parameters $(\alpha, P_m(20), \theta) = (0.02, 1, 0.9)$, is shown in Figure 3-1. As stated earlier, each reaction norm surface corresponds to a specific genetic effect. Thus, if a QTL

Figure 3-1. Reaction norm surface of photosynthetic rate as a function of irradiance and temperature. Model is based on Eqs. 3-1 and 3-2 with parameters $(\alpha, P_m(20), \theta) = (0.02, 1, 0.9)$. Adapted from Wu et al. (2007a).

1.1.2) with one QTL. Extensions to more complicated designs and the two-QTL case, as in Wu et al. (2007a), are straightforward. Assume a backcross plant population of size $n$ with a single QTL affecting the phenotypic trait of photosynthetic rate. The photosynthetic rate for each progeny $i$ $(= 1, \ldots, n)$ is measured at different irradiance $(s = 1, \ldots, S)$ and temperature $(t = 1, \ldots, T)$ levels. This choice of variables is adopted for consistency in later discussions as we will be working with spatio-temporal covariance models. The set of phenotype measurements or observations can be written in vector form as

$$y_i = [\underbrace{y_i(1,1), \ldots, y_i(1,T)}_{\text{irradiance } 1}, \ldots, \underbrace{y_i(S,1), \ldots, y_i(S,T)}_{\text{irradiance } S}]'. \tag{3-3}$$
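The mean surface and the stacking order of Eq. 3-3 can be sketched numerically. The sketch below assumes the non-rectangular hyperbola form of Thornley and Johnson (1990) for the photosynthetic rate and a linear temperature response with $T^* = 5$; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def photosynthetic_rate(irradiance, temperature, alpha, pm20, theta, t_star=5.0):
    """Single-leaf photosynthetic rate P(I, T) (assumed form of Eqs. 3-1 and 3-2)."""
    # Asymptotic rate as a linear function of temperature.
    pm = pm20 * (temperature - t_star) / (20.0 - t_star)
    s = alpha * irradiance + pm
    disc = s ** 2 - 4.0 * theta * alpha * irradiance * pm
    return (s - np.sqrt(disc)) / (2.0 * theta)


def mean_vector(irr_levels, temp_levels, alpha, pm20, theta, t_star=5.0):
    """Stack the surface as in Eq. 3-3: temperature varies fastest within each irradiance level."""
    irr_grid, temp_grid = np.meshgrid(irr_levels, temp_levels, indexing="ij")  # shape (S, T)
    surface = photosynthetic_rate(irr_grid, temp_grid, alpha, pm20, theta, t_star)
    return surface.ravel()  # row-major flattening keeps the irradiance blocks contiguous


# Parameters quoted for Figure 3-1; the level grids are those used later in the simulations.
mu = mean_vector([0, 50, 100, 200, 300], [15, 20, 25, 30], alpha=0.02, pm20=1.0, theta=0.9)
```

With this ordering, the mean vector lines up with a covariance matrix built as a Kronecker product whose first factor indexes irradiance and whose second factor indexes temperature.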

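1.1). Because we assume a backcross design, the QTL has two possible genotypes (as do the markers), which shall be indexed by $k = 1, 2$. The likelihood function based on the phenotype and marker data can be formulated as

$$L(\Omega) = \prod_{i=1}^{n} \sum_{k=1}^{2} p_{k|i}\, f_k(y_i \mid \Omega) \tag{3-4}$$

where $p_{k|i}$ is the conditional probability of a QTL genotype given the genotype of a marker interval for progeny $i$ (Section 1.1.4). We assume a multivariate normal density for the phenotype vector $y_i$ with genotype-specific means

$$\mu_k = [\underbrace{\mu_k(1,1), \ldots, \mu_k(1,T)}_{\text{irradiance } 1}, \ldots, \underbrace{\mu_k(S,1), \ldots, \mu_k(S,T)}_{\text{irradiance } S}]' \tag{3-5}$$

and covariance matrix $\Sigma = \mathrm{cov}(y_i)$. The mean in Eq. 3-5 can be modeled using Eqs. 3-1 and 3-2 as

$$\mu_k(s,t) = \frac{1}{2\theta_k}\left[\alpha_k s + P_{mk} - \sqrt{(\alpha_k s + P_{mk})^2 - 4\theta_k \alpha_k s P_{mk}}\right] \tag{3-6}$$

where

$$P_{mk} = P_{mk}(20)\,\frac{t - T^*}{20 - T^*} \tag{3-7}$$

and $k = 1, 2$.

Wu et al. (2007a) used a separable structure (Mitchell et al., 2005) for the $ST \times ST$ covariance matrix as

$$\Sigma_{AR(1)} = \Sigma_1 \otimes \Sigma_2 \tag{3-8}$$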

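D). Note that $\Sigma_1$ and $\Sigma_2$ are unique only up to multiples of a constant because, for any constant $c$ with $|c| > 0$, $c\Sigma_1 \otimes (1/c)\Sigma_2 = \Sigma_1 \otimes \Sigma_2$. Each of $\Sigma_1$ and $\Sigma_2$ is modeled using an AR(1) structure with a common error variance, $\sigma^2$, and correlation parameters $\rho_1$ and $\rho_2$:

$$\Sigma_1 = \sigma^2 \begin{bmatrix} 1 & \rho_1 & \cdots & \rho_1^{S-1} \\ \rho_1 & 1 & \cdots & \rho_1^{S-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_1^{S-1} & \rho_1^{S-2} & \cdots & 1 \end{bmatrix}, \quad \Sigma_2 = \sigma^2 \begin{bmatrix} 1 & \rho_2 & \cdots & \rho_2^{T-1} \\ \rho_2 & 1 & \cdots & \rho_2^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_2^{T-1} & \rho_2^{T-2} & \cdots & 1 \end{bmatrix} \tag{3-9}$$

Separable covariance structures, however, cannot model interaction effects of each reaction norm to temperature and irradiance. Thus, there is a need for a more general model for this purpose.

Note that with 3-6, 3-7, 3-8 and 3-9, $\Omega = \{\alpha_1, P_{m1}(20), \theta_1, \alpha_2, P_{m2}(20), \theta_2, \sigma^2, \rho_1, \rho_2\}$ in 3-4. These model parameters may be estimated using the ECM algorithm, but closed-form solutions at the CM-step could be very complicated. A more efficient method is the Nelder-Mead simplex algorithm (Section 2.3.2).

This means that if the reaction norm curves are distinct (in terms of their respective estimated parameters), then a QTL possibly exists. Of course, a slight difference in parameter estimates does not automatically mean a QTL exists. But the significance of the results can be tested by doing permutation tests using the log-likelihood ratio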

1.2.3). A procedure described in Wu et al. (2004a) can be used to test the additive effects of a QTL. Other hypotheses can be formulated and tested, such as the genetic control of the reaction norm to each environmental factor, interaction effects between environmental factors on the phenotype, and the marginal slope of the reaction norm with respect to each environmental factor or the gradient of the reaction norm itself. The reader is referred to Wu et al. (2007a) for more details.

3.3.1 Introduction

The construction of valid (positive-definite) nonseparable covariance models has taken great strides in recent years. Schabenberger and Gotway (2005) describe four main approaches: (1) Gneiting's (2002) monotone function, (2) Cressie and Huang's (1999) spectral method, (3) mixture (Ma, 2007), and (4) Jones and Zhang's (1997) partial differential equation. (1) and (2) utilize mainly statistical principles whereas (3) and (4) are mostly mathematical in nature. We shall discuss (1) and (2) in Section 3.3.4 and use examples derived from these approaches in the simulations (Section 3.4).

Consider a random process

$$Y(s,t), \quad (s,t) \in \mathbb{R}^d \times \mathbb{R}, \tag{3-10}$$

where observations are collected at $N$ spatio-temporal coordinates $(s_1,t_1), (s_2,t_2), \ldots, (s_N,t_N)$ and $d \in \mathbb{Z}^+$. The data are only a partial realization of the process because, for practical reasons, the process cannot be observed at each coordinate. Gneiting (2002) notes that mathematically, the space-time domain $\mathbb{R}^d \times \mathbb{R}$ and the purely spatial domain $\mathbb{R}^{d+1}$ are equivalent. This means that the space-time covariance functions in $\mathbb{R}^d \times \mathbb{R}$ and spatial covariance functions in $\mathbb{R}^{d+1}$ belong to the same class. However, the notation $\mathbb{R}^d \times \mathbb{R}$ is used to highlight the distinction between the respective domains. In this study, we will only be concerned with the case $d = 1$ so that, from here on, we will use $\mathbb{R}$ instead of $\mathbb{R}^d$ for the spatial domain. Aside from those mentioned in the introduction (Section 3.1), $Y$ may also represent ozone levels, disease incidence, ocean current patterns, water temperatures, etc. In our study, $Y$ represents photosynthetic rate.

If $\mathrm{var}(Y(s,t)) < \infty$ for all $(s,t) \in \mathbb{R} \times \mathbb{R}$, then the mean $E[Y(s,t)]$ and covariance $\mathrm{cov}(Y(s,t), Y(s+u,t+v))$, where $u$ and $v$ are spatial and temporal lags, respectively, both exist. We assume that the covariance is stationary in space and time so that for some function $C$,

$$\mathrm{cov}(Y(s,t), Y(s+u,t+v)) = C(u,v). \tag{3-11}$$

This means that the covariance function, $C$, depends only on the lags and not on the values of the coordinates themselves. Stationarity is often assumed to allow estimation of the covariance function from the data (Cressie and Huang, 1999). We also assume that the covariance function is isotropic, which means that it depends only on the absolute lags and not on the direction or orientation of the coordinates to each other:

$$\mathrm{cov}(Y(s,t), Y(s+u,t+v)) = C(|u|,|v|). \tag{3-12}$$

Stationary and isotropic covariance functions are said to be translation and rotation-invariant (about the origin) (Waller and Gotway, 2004). Note that $C(u,0)$ and $C(0,v)$ correspond to purely spatial and purely temporal covariance functions, respectively.

To be a valid covariance function, $C$ must be positive definite. This means that for any $(s_1,t_1), \ldots, (s_k,t_k) \in \mathbb{R} \times \mathbb{R}$, any real coefficients $a_1, \ldots, a_k$, and any positive integer $k$,

$$\sum_{i=1}^{k} \sum_{j=1}^{k} a_i a_j\, C(s_i - s_j, t_i - t_j) \ge 0. \tag{3-13}$$

Note that based on Eq. 3-13, $C$ should really be called nonnegative-definite. However, this is the way it is defined in the literature and we will adhere to this convention.

In spatio-temporal analysis, the ultimate goal is optimal prediction (or kriging) of an unobserved part of the random process using an appropriate covariance function model. In this study, we utilize a nonseparable covariance to calculate the mixture likelihood associated with functional mapping.
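The validity condition can be checked numerically on any finite set of coordinates: the matrix of covariances between all pairs of coordinates must have no negative eigenvalues. The following is a minimal sketch, assuming a stationary and isotropic covariance function passed in as a callable of the absolute lags; the helper names are hypothetical. A numerical check on one set of coordinates can only reveal a violation; it does not prove validity in general.

```python
import numpy as np


def covariance_matrix(cov_fn, coords):
    """Covariance matrix implied by a stationary, isotropic function C(|u|, |v|)."""
    coords = np.asarray(coords, dtype=float)
    u = coords[:, None, 0] - coords[None, :, 0]  # pairwise spatial lags
    v = coords[:, None, 1] - coords[None, :, 1]  # pairwise temporal lags
    return cov_fn(np.abs(u), np.abs(v))


def is_nonnegative_definite(matrix, tol=1e-10):
    """True if the smallest eigenvalue of the symmetrized matrix is not below -tol."""
    eigvals = np.linalg.eigvalsh((matrix + matrix.T) / 2.0)
    return bool(eigvals.min() >= -tol)


# Coordinates on an irradiance-by-temperature grid (levels chosen only for illustration).
coords = [(s, t) for s in (0, 50, 100, 200, 300) for t in (15, 20, 25, 30)]


def exponential_product(u, v):
    # An exponential product covariance with illustrative decay rates.
    return np.exp(-0.01 * u) * np.exp(-0.1 * v)


valid = is_nonnegative_definite(covariance_matrix(exponential_product, coords))
```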

where $\sigma^2 = C(0,0)$ is the variance of the process.

With representation 3-14, separable models have an advantage. For example, models for $C(u,v \mid \theta)$ can be easily constructed by selecting suitable and readily available choices for each of $C_1(u \mid \theta_1)$ and $C_2(v \mid \theta_2)$. Because many of these choices are positive-definite, $C(u,v \mid \theta)$ is guaranteed to be positive-definite also. An example is

$$C(u,v \mid a,b) = \sigma^2\, C_1(u \mid a)\, C_2(v \mid b),$$

where $C_1(u \mid a) = \exp(-a|u|)$ and $C_2(v \mid b) = \exp(-b|v|)$. Notice that for any given spatial lags $u_1$ and $u_2$, $C(u_1,v \mid a,b)$ and $C(u_2,v \mid a,b)$ are proportional to each other. This means that the plots of the temporal covariances have the same shapes at these spatial lags. This property is important in the spectral construction of valid nonseparable models proposed by Cressie and Huang (1999) (Section 3.3.4.1). For separable models, the processes in the spatial and temporal domains do not act on each other and hence the selection of an appropriate model for $C(u,v \mid \theta)$ can be facilitated by doing separate (conditional) exploratory data analyses of spatial and temporal patterns.

A more general definition for separability is as a Kronecker product, as in Eq. 3-8. From Eq. 3-8, it can be shown that $\Sigma_{AR(1)}^{-1} = \Sigma_1^{-1} \otimes \Sigma_2^{-1}$ and $|\Sigma_{AR(1)}| = |\Sigma_1|^{T}\,|\Sigma_2|^{S}$, where $|\cdot|$ denotes the determinant of a matrix and $\Sigma_1$ and $\Sigma_2$ are of dimensions $S \times S$ and $T \times T$, respectively. Thus, another advantage of separable models is computational efficiency, particularly in likelihood models where the inverse and determinant of the covariance matrix are calculated. For a large covariance matrix of dimension $UV$, its inverse can be calculated from the inverses of its Kronecker component

3-14 as

where $u = 1, \ldots, U$, $v = 1, \ldots, V$. This model assumes equidistant or regularly spaced coordinates. Thus, two consecutive or closest neighbor coordinates will have the same correlation structure as another pair even if their respective distances are different. A more appropriate model might be

where $a$ and $b$ are scale parameters. However, this model is more complex than $\Sigma_{AR(1)}$ in the sense that it has more parameters (5 vs 3) to estimate. The question of which model is better will lead us to a model selection issue.
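The computational advantage of the Kronecker form can be illustrated with a small numerical check. The sketch below builds two AR(1) matrices with arbitrary illustrative correlations, forms their Kronecker product, and compares the direct inverse and log-determinant with those obtained from the two small factors. It uses the standard identity that, for an $S \times S$ matrix $\Sigma_1$ and a $T \times T$ matrix $\Sigma_2$, $|\Sigma_1 \otimes \Sigma_2| = |\Sigma_1|^T |\Sigma_2|^S$. The helper name `ar1_matrix` is hypothetical.

```python
import numpy as np


def ar1_matrix(size, rho, variance=1.0):
    """AR(1) covariance matrix: variance * rho^|i - j|."""
    idx = np.arange(size)
    return variance * rho ** np.abs(idx[:, None] - idx[None, :])


S, T = 5, 4                       # numbers of irradiance and temperature levels
sigma_irr = ar1_matrix(S, 0.6)    # S x S factor
sigma_temp = ar1_matrix(T, 0.3)   # T x T factor
sigma = np.kron(sigma_irr, sigma_temp)  # ST x ST separable covariance

# Inverse: direct versus Kronecker product of the small inverses.
inv_direct = np.linalg.inv(sigma)
inv_kron = np.kron(np.linalg.inv(sigma_irr), np.linalg.inv(sigma_temp))

# Log-determinant: direct versus weighted sum of the small log-determinants.
logdet_direct = np.linalg.slogdet(sigma)[1]
logdet_kron = T * np.linalg.slogdet(sigma_irr)[1] + S * np.linalg.slogdet(sigma_temp)[1]
```

Only the $S \times S$ and $T \times T$ factors need to be inverted, which is the saving that matters when the likelihood is evaluated many times during optimization.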

E) that Eq. 3-19 can be expressed as

Eq. 3-20 can be used to find valid covariance functions by selecting appropriate forms for $\rho(\omega,v)$ and $k(\omega)$. To get nonseparable structures, $\rho(\omega,v)$ must not be independent of $\omega$. Otherwise, $C(u,v)$ will be separable.

Cressie and Huang gave seven examples of valid nonseparable covariance functions constructed from certain choices for $\rho(\omega,v)$ and $k(\omega)$ and using equation 3-20. We present three of them here and use the first two in the simulations.

where $a, b \ge 0$ are the scaling parameters of time and space, respectively, and $\sigma^2 = C(0,0)$.

$$C(u,v) = \frac{\sigma^2 (a|v| + 1)}{(a|v| + 1)^2 + b^2 |u|^2}, \tag{3-22}$$

where $a, b \ge 0$ are the scaling parameters of time and space, respectively, and $\sigma^2 = C(0,0)$.
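The difference between separable and nonseparable structures can be seen numerically through the proportionality property of separable models: for a separable model the ratio $C(u_1,v)/C(u_2,v)$ does not depend on $v$, whereas for a nonseparable model it does. The sketch below compares an exponential product model with a covariance of the form of Eq. 3-22. It assumes the numerator $\sigma^2(a|v|+1)$ from the corresponding Cressie and Huang (1999) example with one spatial dimension; the function names and parameter values are hypothetical.

```python
import numpy as np


def c_separable(u, v, a=0.5, b=0.5, variance=1.0):
    """Separable product of exponential spatial and temporal covariances."""
    return variance * np.exp(-a * np.abs(u)) * np.exp(-b * np.abs(v))


def c_nonseparable(u, v, a=0.5, b=0.5, variance=1.0):
    """Nonseparable covariance of the form of Eq. 3-22 (numerator is an assumption)."""
    t = a * np.abs(v) + 1.0
    return variance * t / (t ** 2 + b ** 2 * np.abs(u) ** 2)


def ratio_across_temporal_lags(cov_fn, u1, u2, v_lags):
    """C(u1, v) / C(u2, v) evaluated over a range of temporal lags v."""
    v = np.asarray(v_lags, dtype=float)
    return cov_fn(u1, v) / cov_fn(u2, v)


v_lags = np.linspace(0.0, 4.0, 9)
ratio_sep = ratio_across_temporal_lags(c_separable, 0.0, 2.0, v_lags)
ratio_nonsep = ratio_across_temporal_lags(c_nonseparable, 0.0, 2.0, v_lags)
```

A constant `ratio_sep` and a varying `ratio_nonsep` reflect the absence and presence, respectively, of interaction between the two lags.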

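3-23 reduces to a separable model.

3-19 or 3-20. Gneiting (2002) developed an approach that does not rely on Fourier transform pairs and avoids this kind of limitation.

Let $\varphi(x)$ and $\psi(x)$ be functions with nonnegative domains. Suppose $\varphi(x)$ is completely monotone and $\psi(x)$ is positive with a completely monotone derivative. Then

$$C(u,v) = \frac{\sigma^2}{\psi(|v|^2)^{1/2}}\, \varphi\!\left(\frac{|u|^2}{\psi(|v|^2)}\right), \tag{3-24}$$

where $\sigma^2 = C(0,0) > 0$, is a valid nonseparable spatio-temporal covariance model. $\varphi(x)$ and $\psi(x)$ can be associated with spatial and temporal structures, respectively, and Gneiting (2002) provides a list of functions that can be used for each. For example, using

Multiplying 3-25 by the purely temporal covariance function $(a|v|^{2\alpha} + 1)^{-\delta}$, $v \in \mathbb{R}$, with $\delta \ge 0$, produces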

3-26 is

where $\tau \ge 1/2$ replaces $\delta + \beta/2$. $\beta$ is a space-time interaction parameter which implies a separable structure when $\beta = 0$ and a nonseparable structure otherwise. Increasing values of $\beta$ indicate strengthening spatio-temporal interaction.

3-21, 3-22 (Examples 1 and 2 of Section 3.3.4.1) and 3-27, denoted as follows:

$$C_2(u,v) = \frac{\sigma^2 (a|v| + 1)}{(a|v| + 1)^2 + b^2 |u|^2}, \quad a, b \ge 0, \; \sigma^2 > 0, \tag{3-29}$$

To simplify our analysis, we assume for $C_3(u,v)$ that $\alpha = 1/2$, $\gamma = 1/2$, and $\tau = 1$ so that

We then generate data using these nonseparable structures to simulate interaction effects between the two environmental signals in functional mapping of a reaction norm. Simulations using separable and nonseparable covariance structures for spatio-temporal processes were studied by Huang et al. (2007a). The generated data are analyzed using the nonparametric estimator, $\Sigma_{NP}$, developed in chapter 2 and $\Sigma_{AR(1)}$ to assess their performance. We also want to test whether the separable model, $\Sigma_{AR(1)}$, can be used to

2.4.1).

Using a backcross design (2 genotype groups; Section 1.1.2) for the QTL mapping population, we randomly generated 6 markers equally spaced on a chromosome 100 cM long. One QTL was simulated between the fourth and fifth markers, 12 cM from the fourth marker (or 72 cM from the leftmost marker of the chromosome). The QTL had two possible genotypes which determined two distinct mean photosynthetic rate reaction norm surfaces defined by Eqs. 3-1 and 3-2 (see Figure 3-1). The surface parameters for each genotype group were $(\alpha_1, P_{m1}(20), \theta_1) = (0.02, 2, 0.9)$ and $(\alpha_2, P_{m2}(20), \theta_2) = (0.01, 1.5, 0.9)$. Phenotype observations were obtained by sampling from a multivariate normal distribution with mean surface based on irradiance and temperature levels of $\{0, 50, 100, 200, 300\}$ and $\{15, 20, 25, 30\}$, respectively, and covariance matrix $C_l(u,v)$, $l = 1, 2, 3$.

The functional mapping model was applied to the marker and phenotype data with $n = 200, 400$ samples. The surface defined by Eqs. 3-1 and 3-2 was used as the mean model and $C_l(u,v)$ as the covariance model to analyze the data generated using $C_l(u,v)$. That is, we modeled data generated by the same mean and covariance used in the model. We carried out 100 simulation runs and recorded the averages over all runs of the estimated QTL location, mean parameter estimates, maxLR, and entropy and quadratic losses (see Section 2.4.1), together with the respective Monte Carlo standard errors (SE). The results are shown in Tables 3-1 and 3-2. Table 3-2 also includes the results for $\Sigma_{AR(1)}$. Both tables show accurate and precise estimates of QTL location, mean surface and covariance parameters.

Next, $\Sigma_{NP}$ and $\Sigma_{AR(1)}$ were used to analyze the data generated by each of $C_l(u,v)$, $l = 1, 2, 3$. Tables 3-3 and 3-4 show the results of these respective analyses. The results for $\Sigma_{NP}$ are very good. However, those for $\Sigma_{AR(1)}$ are somewhat unexpected. Apparently, the estimated QTL location and mean parameters are accurate and precise! This would imply

3-1 and 3-2, which should be (almost) the true values. The average losses, however, are inflated for $C_1$ and $C_2$. Upon close inspection, it turns out that it is misleading to look at maxLR in this situation. What should be considered are the log-likelihood values under the null and alternative models from which maxLR is derived. Figure 3-2 provides boxplots of the log-likelihood values under the alternative model based on the 100 simulation runs. These plots reveal clearly biased estimates of $C_1$ and $C_2$ by $\Sigma_{AR(1)}$, and the degrees of bias are consistent with the average losses. The results for the null model are very similar but are not presented here. We also provide the covariance and corresponding contour plots of $C_l(u,v)$, $l = 1, 2, 3$, and the $\Sigma_{AR(1)}$ estimates of these in Figures 3-3 and 3-4.

We conducted further simulations under $C_1$ with $n = 400$, the case where $\Sigma_{AR(1)}$ performed the worst. We considered two scenarios: increased variance ($\sigma^2 = 2, 4$) and increased number of irradiance ($\{0, 50, 100, 150, 200, 250, 300\}$) and temperature ($\{15, 18, 21, 24, 27, 30\}$) levels. The results are shown in Tables 3-5 and 3-6, respectively. The results show that under these two scenarios, the estimate of the QTL location is severely biased if one uses $\Sigma_{AR(1)}$. This is not the case for $\Sigma_{NP}$.
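The average losses reported in the tables compare an estimated covariance matrix with the true one. Section 2.4.1 is not reproduced here, so the sketch below assumes the entropy and quadratic loss definitions that are common in the covariance estimation literature, $L_E = \mathrm{tr}(\Sigma^{-1}\hat{\Sigma}) - \log|\Sigma^{-1}\hat{\Sigma}| - m$ and $L_Q = \mathrm{tr}(\Sigma^{-1}\hat{\Sigma} - I)^2$; the definitions used in this work may differ in detail. The function names are hypothetical.

```python
import numpy as np


def entropy_loss(sigma_true, sigma_hat):
    """Entropy loss tr(S^-1 S_hat) - log|S^-1 S_hat| - m (assumed definition)."""
    m = sigma_true.shape[0]
    ratio = np.linalg.solve(sigma_true, sigma_hat)  # S^-1 S_hat without an explicit inverse
    logdet = np.linalg.slogdet(ratio)[1]
    return float(np.trace(ratio) - logdet - m)


def quadratic_loss(sigma_true, sigma_hat):
    """Quadratic loss tr[(S^-1 S_hat - I)^2] (assumed definition)."""
    m = sigma_true.shape[0]
    d = np.linalg.solve(sigma_true, sigma_hat) - np.eye(m)
    return float(np.trace(d @ d))


# Both losses vanish when the estimate equals the truth and grow with the discrepancy.
sigma_true = np.array([[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]])
loss_e_same = entropy_loss(sigma_true, sigma_true)
loss_q_double = quadratic_loss(sigma_true, 2.0 * sigma_true)
```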

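Entries are averages over simulation runs, with Monte Carlo standard errors in parentheses. Columns 4-6 refer to QTL genotype 1 and columns 7-9 to QTL genotype 2.

Covariance | n | QTL location | $\hat{\alpha}_1$ | $\hat{P}_{m1}(20)$ | $\hat{\theta}_1$ | $\hat{\alpha}_2$ | $\hat{P}_{m2}(20)$ | $\hat{\theta}_2$ | maxLR | $L_E$ | $L_Q$ | $\sigma^2$ | $a$ | $b$
$C_1$ | 200 | 71.96 | 0.02 | 2.00 | 0.90 | 0.01 | 1.54 | 0.88 | 131.46 | 0.02 | 0.19 | 1.00 | 0.50 | 0.01
 | | (0.32) | (0.00) | (0.01) | (0.00) | (0.00) | (0.02) | (0.01) | (2.31) | (0.00) | (0.04) | (0.00) | (0.00) | (0.00)
 | 400 | 72.00 | 0.02 | 2.01 | 0.90 | 0.01 | 1.52 | 0.89 | 262.11 | 0.01 | 1.13 | 1.00 | 0.50 | 0.01
 | | (0.20) | (0.00) | (0.01) | (0.00) | (0.00) | (0.01) | (0.01) | (3.00) | (0.00) | (0.02) | (0.00) | (0.00) | (0.00)

Covariance | n | QTL location | $\hat{\alpha}_1$ | $\hat{P}_{m1}(20)$ | $\hat{\theta}_1$ | $\hat{\alpha}_2$ | $\hat{P}_{m2}(20)$ | $\hat{\theta}_2$ | maxLR | $L_E$ | $L_Q$ | $\sigma^2$ | $\rho_1$ | $\rho_2$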

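Entries are averages over simulation runs, with Monte Carlo standard errors in parentheses. Columns 4-6 refer to QTL genotype 1 and columns 7-9 to QTL genotype 2.

Covariance | n | QTL location | $\hat{\alpha}_1$ | $\hat{P}_{m1}(20)$ | $\hat{\theta}_1$ | $\hat{\alpha}_2$ | $\hat{P}_{m2}(20)$ | $\hat{\theta}_2$ | maxLR | $L_E$ | $L_Q$ | $\sigma^2$ | $a$ | $b$ | $c$
$C_3$ | 200 | 71.96 | 0.02 | 2.01 | 0.89 | 0.01 | 1.55 | 0.87 | 126.80 | 0.02 | 0.19 | 1.00 | 1.03 | 0.01 | 0.62
 | | (0.34) | (0.00) | (0.01) | (0.01) | (0.00) | (0.02) | (0.01) | (2.20) | (0.00) | (0.04) | (0.00) | (0.02) | (0.00) | (0.02)
 | 400 | 71.92 | 0.02 | 2.01 | 0.90 | 0.01 | 1.52 | 0.89 | 253.38 | 0.01 | 0.13 | 1.00 | 1.01 | 0.01 | 0.61
 | | (0.20) | (0.00) | (0.01) | (0.00) | (0.00) | (0.01) | (0.01) | (2.83) | (0.00) | (0.02) | (0.00) | (0.01) | (0.00) | (0.02)

Covariance | n | QTL location | $\hat{\alpha}_1$ | $\hat{P}_{m1}(20)$ | $\hat{\theta}_1$ | $\hat{\alpha}_2$ | $\hat{P}_{m2}(20)$ | $\hat{\theta}_2$ | maxLR | $L_E$ | $L_Q$ | $\sigma^2$ | $\rho_1$ | $\rho_2$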

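Covariance | n | QTL location | $\hat{\alpha}_1$ | $\hat{P}_{m1}(20)$ | $\hat{\theta}_1$ | $\hat{\alpha}_2$ | $\hat{P}_{m2}(20)$ | $\hat{\theta}_2$ | maxLR | $L_E$ | $L_Q$

Covariance | n | QTL location | $\hat{\alpha}_1$ | $\hat{P}_{m1}(20)$ | $\hat{\theta}_1$ | $\hat{\alpha}_2$ | $\hat{P}_{m2}(20)$ | $\hat{\theta}_2$ | maxLR | $L_E$ | $L_Q$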

Figure 3-2. Boxplots of the values of the log-likelihood under the alternative model, $H_1$. Significantly biased estimates by $\Sigma_{AR(1)}$ are apparent for $C_1$.

Figure 3-3. Covariance plots. Plots of $C_l$, $l = 1, 2, 3$, versus irradiance ($|u|$) and temperature ($|v|$) lags are in the left column. In the right column are the estimates of $C_l$ by $\Sigma_{AR(1)}$.

Figure 3-4. Contour plots. Contour plots of $C_l$, $l = 1, 2, 3$, are in the left column. In the right column are the contour plots of the estimates of $C_l$ by $\Sigma_{AR(1)}$.

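Covariance | $\sigma^2$ | QTL location | $\hat{\alpha}_1$ | $\hat{P}_{m1}(20)$ | $\hat{\theta}_1$ | $\hat{\alpha}_2$ | $\hat{P}_{m2}(20)$ | $\hat{\theta}_2$ | log-likelihood $H_0$ | log-likelihood $H_1$ | maxLR | $L_E$ | $L_Q$

Covariance | $\sigma^2$ | QTL location | $\hat{\alpha}_1$ | $\hat{P}_{m1}(20)$ | $\hat{\theta}_1$ | $\hat{\alpha}_2$ | $\hat{P}_{m2}(20)$ | $\hat{\theta}_2$ | log-likelihood $H_0$ | log-likelihood $H_1$ | maxLR | $L_E$ | $L_Q$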

A few issues need to be discussed. First, $\Sigma_{NP}$ was developed in chapter 2 based on a sequence of regressions obtained from the modified Cholesky decomposition of the covariance matrix of a one-dimensional (longitudinal) vector which has an ordering of variables. In this chapter, the phenotype vector consists of observations based on two levels of irradiance and temperature measurements, i.e.

$$y_i = [\underbrace{y_i(1,1), \ldots, y_i(1,T)}_{\text{irradiance } 1}, \ldots, \underbrace{y_i(S,1), \ldots, y_i(S,T)}_{\text{irradiance } S}]'. \tag{3-32}$$

While the order of the variables in this vector is predefined, there is no natural ordering like in longitudinal data. Instead of $\Sigma_{NP}$, a more appropriate method might be to adopt the sparse permutation invariant covariance estimator (SPICE) proposed by Rothman et al. (2008), which is invariant to variable permutations. SPICE is derived by decomposing the inverse of the covariance matrix as

$$\Sigma^{-1} = C'C \tag{3-33}$$

where $C = [c_{tj}]$ is a lower triangular matrix. In terms of the components of the sequence of regression equations,

where $\{\phi_{tj}\}$ and $\{\sigma_t^2\}$ are the GARPs and IVs (Section 2.2.1). However, our simulation results suggest that $\Sigma_{NP}$ can still be directly applied to observations that have no variable orderings such as 3-32. Furthermore, Rothman et al. stated that, under variable

A second issue pertains to using nonseparable models in functional mapping, where the simulations in this chapter showed very good results. This might be a good idea if the model closely reflects the structure of the data. Unfortunately, this is not often the case. In fact, it is not even known whether the data exhibit interactions or not. Before deciding on what model to use, spatio-temporal modelers utilize tests for separability (Mitchell et al., 2005; Fuentes, 2005). If separable models are appropriate, there is a wealth of options. Otherwise, it is difficult to choose from a number of complex models because there are as yet no general guidelines that can help one decide on a specific nonseparable model. The model $C_3$ that was used in the simulations (Section 3.4) has an easy-to-interpret interaction parameter $\beta \in [0,1]$. However, despite an interaction "strength" of $\beta = 0.6$, the separable model, $\Sigma_{AR(1)}$, estimated the data generated by $C_3$ quite well. Thus, the added complexity of a nonseparable model over a separable one may not be worth it. Another option is to use separable approximations to nonseparable covariances (Genton, 2007). The nonseparable covariances that we considered were assumed to be stationary and isotropic (Section 3.3.2). These two assumptions may not always hold for real data. Although not specifically addressed here, using $\Sigma_{NP}$ may work for data that do not satisfy these assumptions.

In this chapter, we only considered two environmental signals with interactions: irradiance and temperature. However, the reaction norm of photosynthetic rate is a very complex process because there are more environmental signals at play than these two. The spatial domain of spatio-temporal nonseparable covariance models can be extended to more than one dimension. For example, a two-dimensional spatial domain models an area on a flat surface while a three-dimensional domain models space. However, this extension cannot be used to increase the number of signals unless the signals have the same unit of measurement or one assumes separability or no interaction. Thus,

The analyses conducted in this chapter were all based on simulated data, which makes our proposed model theoretical and not (yet) practical. However, we hope that our theoretical framework can either stimulate and motivate researchers to conduct experiments and studies to produce data that our model can analyze, or at least lead research in a direction that we consider theoretically possible.

2-34, 2-35 and 2-36 based on an $L_2$ penalty. This provided not only an intuitive interpretation but, more importantly, a way to extend a null model covariance to an alternative (or mixture) model covariance. Thus, the nested LASSO (Levina et al., 2008) and SPICE (Rothman et al., 2008) models can potentially be implemented by our method to produce other regularized estimators.

We considered two main applications of functional mapping: traits measured across time or longitudinal traits (chapter 2) and reaction norms to two environmental signals (chapter 3). For longitudinal traits, simulations showed that our estimator can more precisely estimate the QTL location and its effects compared to AR(1). For reaction norms, simulations again showed our estimator to be more reliable than Wu et al.'s (2007a) proposed estimator in the presence of interaction effects between the two environmental signals. The interaction effects were simulated using nonseparable covariance structures. In both applications, the nonparametric estimator is more flexible because it does not assume a parametric or structural form and is therefore suited to analyze data with varying structures. Therefore, the nonparametric estimator can be used as an alternative to, or a guide for, parametric modeling of the covariance in the practical deployment of functional mapping.

1.3). Composite functional mapping allows modeling of marker effects beyond the interval considered by using a partial regression analysis. This significantly improves the accuracy and precision of functional mapping in multiple QTL detection. However, composite functional mapping assumes an AR(1) covariance structure. It would be advantageous to incorporate our proposed nonparametric estimator into this method to further improve its power.

The development of complex traits is the consequence of interactions among a multitude of genetic and environmental factors that each trigger an impact on every step of trait development. This process is inherently complicated, but can be illustrated by a landscape of phenotype formed by genetic and environmental variables (Rice 2002; Wolf 2002). The surface of such a phenotype landscape defines the phenotype determined by a particular combination of underlying genetic (such as additive, dominant or epistatic) and environmental factors (such as temperature, light or moisture) that interact with each other through developmental pathways. The number of underlying factors contributing to phenotypic variation defines the number of dimensions of the landscape space. In theory, the number of underlying factors can be unlimited, implying that a landscape can exist in very-high-dimensional space (i.e., hyperspace) (Wolf 2002). Figure 4-1 shows a hypothetical landscape, where the phenotype of an individual is determined by the values of two underlying factors. By characterizing the topographical features of such a landscape, a fundamental question of how each underlying factor contributes to the expression of a particular trait individually or through an interactive web can be addressed. These features typically include "slope", "curvature", "peak-valley" and "ridge". The description of the topography of a three-dimensional landscape (Fig. 4-1) is most intuitive, but

Figure 4-1. Formation of a phenotype by a landscape. The phenotypic formation is a function of the value of underlying factors 1 and 2 ($u_1$ and $u_2$) that interact during trait development. Two shaded ovals present two different areas on the surface, one being steeper (pointing to Inset A) and the second being flatter (pointing to Inset B). The steeper one is associated with a dramatic change in phenotypic expression contributed by a small change in the underlying factors (indicated by the distribution in Inset A), whereas the flatter one is associated with a different pattern in which dramatic changes in the underlying factors only lead to a minor change in phenotypic expression (indicated by the distribution in Inset B). Adapted from Wolf (2002).

Because biological traits are derived from developmental processes and physiological regulatory mechanisms, complex multivariate systems that undergo such processes should


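Let $x = (x_1', \ldots, x_n')'$, where $x_i = (x_{i1}, \ldots, x_{iJ})'$, $i = 1, \ldots, n$, is a vector that indicates which genotype group $y_i = (y_{i1}, \ldots, y_{im})'$ belongs to. We assume that the $x_i$'s are independent and identically distributed (i.i.d.) realized values from a multinomial$(1, p_i)$ distribution where $p_i = (p_{1|i}, \ldots, p_{J|i})'$. Thus, $x_{ik} = 1$ or $0$, depending on whether or not $y_i$ belongs to genotype group $k = 1, \ldots, J$. In reality, $x$ is unknown (or missing) so that $y = (y_1', \ldots, y_n')'$ can be viewed as incomplete data. The complete data is $(x', y')'$ with log-likelihood

$$\log L_c(\Omega) = \log \prod_{i=1}^{n} \prod_{k=1}^{J} \left[p_{k|i}\, f_k(y_i \mid \Omega)\right]^{x_{ik}} = \sum_{i=1}^{n} \sum_{k=1}^{J} x_{ik} \left[\log p_{k|i} + \log f_k(y_i \mid \Omega)\right].$$

The EM algorithm at the $(j+1)$th iteration proceeds as follows:

1. The current value of $\Omega$ is $\Omega^{(j)}$.

2.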

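$$P(X_{ik} = 1 \mid y_i, \Omega^{(j)}) = \frac{P(y_i, X_{ik} = 1 \mid \Omega^{(j)})}{P(y_i \mid \Omega^{(j)})} = \frac{P(y_i \mid X_{ik} = 1, \Omega^{(j)})\, P(X_{ik} = 1 \mid \Omega^{(j)})}{P(y_i \mid \Omega^{(j)})} = P^{(j)}_{k|i}$$

so that A-2 becomes

$$Q(\Omega \mid \Omega^{(j)}) = \sum_{i=1}^{n} \sum_{k=1}^{J} P^{(j)}_{k|i} \left[\log p_{k|i} + \log f_k(y_i \mid \Omega)\right].$$

Therefore, this step involves updating $P^{(j)}_{k|i}$ using $\Omega^{(j)}$ as in 1-13.

3. Solve

$$\frac{\partial}{\partial \Omega} Q(\Omega \mid \Omega^{(j)}) = \sum_{i=1}^{n} \sum_{k=1}^{J} P^{(j)}_{k|i}\, \frac{\partial}{\partial \Omega} \log f_k(y_i \mid \Omega) = 0 \tag{A-5}$$

to get $\Omega^{(j+1)}$.

4. Repeat until some convergence criterion is met.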
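The update of the posterior genotype probabilities in step 2 can be sketched as follows. This is a minimal illustration, assuming multivariate normal genotype-specific densities with a common covariance matrix; the function names are hypothetical, and the computation is done on the log scale only for numerical stability.

```python
import numpy as np


def log_mvn_density(y, mean, cov):
    """Log multivariate normal density for each row of y (shape (n, m))."""
    m = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (y - mean).T)  # whitened residuals, shape (m, n)
    quad = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)


def e_step(y, prior, means, cov):
    """Posterior probabilities P_{k|i} proportional to p_{k|i} * f_k(y_i).

    y: (n, m) phenotypes; prior: (n, J) conditional genotype probabilities;
    means: (J, m) genotype-specific mean vectors; cov: (m, m) common covariance.
    """
    log_f = np.column_stack([log_mvn_density(y, mu, cov) for mu in means])
    with np.errstate(divide="ignore"):  # a prior of exactly 0 gives log(0) = -inf
        log_w = np.log(prior) + log_f
    log_w -= log_w.max(axis=1, keepdims=True)  # guard against underflow
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)
```

The M-step then maximizes the weighted log-likelihood with these probabilities held fixed, for example with a general-purpose optimizer when no closed form is available.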

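Suppose $X'X$ is in correlation form. Then the eigenvalue decomposition of $X'X$ is $X'X = V \Lambda V'$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$, so that

$$(X'X)^{-1} = V \begin{bmatrix} 1/\lambda_1 & 0 & \cdots & 0 \\ 0 & 1/\lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\lambda_k \end{bmatrix} V'.$$

2-9 then follows.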

PAGE 100

Forxedtj,j=1;2;:::;t1, 2{33 isminimizedwithrespectto2tbysolving@ @2t"nXi=1JXk=1Pkjilog2t+kit2 yielding sincekit=ykitPt1j=1ykijtj. Forxed2t, 2{33 isminimizedwithrespecttotjbytheminimizerof Lett(t)=(t1;t2;:::;t;t1)0andyki(t)=(yki1;yki2;:::;yki;t1)0.Thersttermof C{2 is 1

PAGE 101

C{2 becomes forxed2t. 101

PAGE 102

If

PAGE 103

ByFouriertransformation, 22ZZei(u!+v)C(u;v)dudv=1 2Zeiv1 2Zeiu!C(u;v)dudv=1 2Zeivh(!;v)dv where istheinverseFouriertransformofginorthespatialspectraldensityfortemporallag.Using E{2 3{19 becomes Let where(!;v)isavalidcontinuousautocorrelationfunctioninvforeach!and(!)>0.IfR(!;v)dv<1andRk(!)d!<1,thenintermsof E{4 E{3 becomes 3{20 103

[1] Andersson, L., Haley, C.S., Ellegren, H., Knott, S.A., Johansson, M., Andersson, K., Andersson-Eklund, L., Edfors-Lilja, I., Fredholm, M., Hansson, I., Hakansson, J. and Lundstrom, K. (1994). "Genetic mapping of quantitative trait loci for growth and fatness in pigs", Science 263, 1771-1774.
[2] Angilletta, Jr., M.J. and Sears, M.W. (2004). "Evolution of thermal reaction norms for growth rate and body size in ectotherms: an introduction to the symposium", Integr. Comp. Biol. 44, 401-402.
[3] Akaike, H. (1974). "A new look at the statistical model identification", IEEE Transactions on Automatic Control 19(6), 716-723.
[4] Banerjee, O., d'Aspremont, A. and El Ghaoui, L. (2006). "Sparse covariance selection via robust maximum likelihood estimation", Proceedings of ICML.
[5] Bickel, P. and Levina, E. (2008). "Regularized estimation of large covariance matrices", Ann. Statist. 36(1), 199-227.
[6] Bochner, S. (1955). Harmonic Analysis and the Theory of Probability, University of California Press, Berkeley and Los Angeles.
[7] Broman, K. (1997). Identifying quantitative trait loci in experimental crosses, Ph.D. Dissertation, Department of Statistics, University of California, Berkeley.
[8] Broman, K. (2001). "Review of statistical methods for QTL mapping in experimental crosses", Lab Animal 30, no. 7, 44-52.
[9] Carlborg, O., Andersson, L. and Kinghorn, B. (2000). "The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci", Genetics 155, 2003-2010.
[10] Carroll, R.J. and Ruppert, D. (1984). "Power transformations when fitting theoretical models to data", J. Am. Statist. Assoc. 79, 321-328.
[11] Cox, D.D. and O'Sullivan, F. (1990). "Asymptotic analysis of penalized likelihood and related estimators", Annals of Statistics 18, 1676-1695.
[12] Cressie, N. and Huang, H-C. (1999). "Classes of nonseparable, spatio-temporal stationary covariance functions", J. Am. Statist. Assoc. 94, no. 448, 1330-1340.
[13] Cui, H.J., Zhu, J. and Wu, R. (2006). "Functional mapping for genetic control of programmed cell death", Physiol. Genom. 25, 458-469.
[14] Daniels, M.J. and Pourahmadi, M. (2002). "Bayesian analysis of covariance matrices and dynamic models for longitudinal data", Biometrika 89, 553-566.
[15] de Boor, C. (2001). A Practical Guide to Splines, Revised ed., Springer, New York.

[16] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). "Maximum likelihood from incomplete data via the EM algorithm", J. Roy. Statist. Soc. B 39, 1-38.
[17] Diggle, P.J., Heagerty, P., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data, Oxford University Press, UK.
[18] Doerge, R.W. (2002). "Mapping and analysis of quantitative trait loci in experimental populations", Nat. Rev. Genet. 3, 43-52.
[19] Doerge, R.W. and Churchill, G.A. (1996). "Permutation tests for multiple loci affecting a quantitative character", Genetics 142, 285-294.
[20] Drayna, D., Davies, K., Hartley, D., Mandel, J.L., Camerino, G., Williamson, R. and White, R. (1984). "Genetic mapping of the human X-chromosome by using restriction fragment length polymorphisms", Proc. Natl. Acad. Sci. USA 81, 2836-2839.
[21] Eilers, P.H.C. and Marx, B.D. (1996). "Flexible smoothing with B-splines and penalties", Statistical Science 11, no. 2, 89-121.
[22] Fan, J. and Li, R. (2001). "Variable selection via nonconcave penalized likelihood and its oracle properties", J. Am. Statist. Assoc. 96, 1348-1360.
[23] Fu, W. (1998). "Penalized regressions: The bridge versus the lasso", J. Comput. Graph. Statist. 7, 397-416.
[24] Fuentes, M. (2005). "Testing separability of spatial-temporal covariance functions", Journal of Statistical Planning and Inference 136, no. 2, 447-466.
[25] Furrer, R. and Bengtsson, T. (2007). "Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants", Journal of Multivariate Analysis 98, no. 2, 227-255.
[26] Genton, M. (2007). "Separable approximations of space-time covariance matrices", Environmetrics 18, 681-695.
[27] Gill, P., Murray, W. and Wright, M. (1981). Practical Optimization, Academic Press, New York.
[28] Gneiting, T. (2002). "Nonseparable, stationary covariance functions for space-time data", J. Am. Statist. Assoc. 97, no. 458, 590-600.
[29] Gneiting, T., Genton, M. and Guttorp, P. (2006). "Geostatistical space-time models, stationarity, separability and full symmetry", Statistical Methods for Spatio-temporal Systems (Monographs on Statistics and Applied Probability), B. Finkenstadt, L. Held and V. Isham, editors, Chapman & Hall/CRC.
[30] Green, P. (1990). "On use of the EM algorithm for penalized likelihood estimation", J. Roy. Statist. Soc. B 52, 443-452.

[31] Green, P. (1999). "Penalized likelihood", Encyclopedia of Statistical Sciences 3, 578-586.
[32] Griffiths, A.J., Wessler, S.R., Lewontin, R.C., Gelbart, W.G., Suzuki, D.T. and Miller, J.H. (2005). Introduction to Genetic Analysis, W.H. Freeman and Company, New York.
[33] Haldane, J.B.S. (1919). "The combination of linkage values and the calculation of distance between the loci of linked factors", Journal of Genetics 8, 299-309.
[34] Haley, C.S., Knott, S.A. and Elsen, J.M. (1994). "Genetic mapping of quantitative trait loci in cross between outbred lines using least squares", Genetics 136, 1195-1207.
[35] Hoerl, A. and Kennard, R. (1970). "Ridge regression: biased estimation for nonorthogonal problems", Technometrics 12, 55-67.
[36] Hoeschele, I. (2000). Mapping quantitative trait loci in outbred pedigrees. In: Handbook of Statistical Genetics, edited by D.J. Balding, M. Bishop and C. Cannings. Wiley, New York. 567-597.
[37] Huang, H.C., Martinez, F., Mateu, J. and Montes, F. (2007a). "Model comparison and selection for stationary space-time models", Comp. Statistics and Data Analysis 51, 4577-4596.
[38] Huang, J., Liu, L. and Liu, N. (2007b). "Estimation of large covariance matrices of longitudinal data with basis function approximations", J. Comput. Graph. Statist. 16, 189-209.
[39] Huang, J., Liu, N., Pourahmadi, M. and Liu, L. (2006). "Covariance selection and estimation via penalised normal likelihood", Biometrika 93, 85-98.
[40] Ibanez, M.V. and Simo, A. (2007). "Spatio-temporal modeling of perimetric test data", Statistical Methods in Medical Research 16, no. 6, 497-522.
[41] Jansen, R.C. (2000). Quantitative trait loci in inbred lines. In: Handbook of Statistical Genetics, edited by D.J. Balding, M. Bishop and C. Cannings. Wiley, New York. 567-597.
[42] Jansen, R.C. and Stam, P. (1994). "High resolution of quantitative traits into multiple loci via interval mapping", Genetics 136, 1447-1455.
[43] Jones, R.H. and Zhang, Y. (1997). "Models for continuous stationary space-time process", In Modelling Longitudinal and Spatially Correlated Data, Lecture Notes in Statistics 122, Springer, New York, 289-298.
[44] Kao, C.H., Zeng, Z-B. and Teasdale, R.D. (1999). "Multiple interval mapping for quantitative trait loci", Genetics 152, 1203-1216.

[45] Knott, S.A., Neale, D.B., Sewell, M.M. and Haley, C.S. (1997). "Multiple marker mapping of quantitative trait loci in an outbred pedigree of loblolly pine", Theor. Appl. Genet. 94, 810-820.
[46] Kramer, M.G., Vaughn, T.T., Pletscher, L.S., King-Ellison, K., Erikson, C. and Cheverud, J.M. (1998). "Genetic variation in body weight growth and composition in the intercross of large (LG/J) and small (SM/J) inbred strains of mice", Genetics and Molecular Biology 21, 211-218.
[47] Kenward, M.G. (1987). "A method for comparing profiles of repeated measurements", Appl. Statist. 36, 296-308.
[48] Kingsolver, J.G., Izem, R. and Ragland, G.J. (2004). "Plasticity of size and growth in fluctuating thermal environments: comparing reaction norms and performance curves", Integr. Comp. Biol. 44, 450-460.
[49] Kirkpatrick, M. and Heckman, N. (1989). "A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters", J. Math. Biol. 27, 429-450.
[50] Kolovos, A., Christakos, G., Hristopulos, D.T. and Serre, M.L. (2004). "Methods for generating non-separable spatiotemporal covariance models with potential environmental applications", Advances in Water Resources 27, 815-830.
[51] Krishnaiah, P. (1985). Multivariate Analysis, Elsevier Science Publishers B.V., New York.
[52] Lander, E.S. and Botstein, D. (1989). "Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps", Genetics 121, 185-199.
[53] Ledoit, O. and Wolf, M. (2003). "A well-conditioned estimator for large-dimensional covariance matrices", Journal of Multivariate Analysis 88, 365-411.
[54] Levina, E., Rothman, A. and Zhu, J. (2008). "Sparse estimation of large covariance matrices via a nested lasso penalty", Ann. Appl. Statist. 2, no. 1, 245-263.
[55] Li, H.Y., Kim, B-R. and Wu, R.L. (2006). "Identification of quantitative trait nucleotides that regulate cancer growth: A simulation approach", J. Theor. Biol. 242, 426-439.
[56] Li, Y., Wang, N., Hong, M., Turner, N., Lupton, J. and Carroll, R. (2007). "Nonparametric estimation of correlation functions in longitudinal and spatial data, with applications to colon carcinogenesis experiments", Annals of Statistics 35, no. 4, 1608-1643.
[57] Lin, M., Li, H.Y., Hou, W., Johnson, J.A. and Wu, R.L. (2007). "Modeling sequence-sequence interactions for drug response", Bioinformatics 23, no. 10, 1251-1257.

[58] Lin, M., Lou, X-Y., Chang, M. and Wu, R.L. (2003). "A general statistical framework for mapping quantitative trait loci in non-model systems: Issue for characterizing linkage phases", Genetics 165, 901-913.
[59] Lindley, D.V. (1957). "A statistical paradox", Biometrika 44, 187-192.
[60] Liu, T., Liu, X-L., Chen, Y.M. and Wu, R.L. (2007). "A unifying differential equation model for functional genetic mapping of circadian rhythms", Theor. Biol. Medical Model. 4, 5.
[61] Liu, T. and Wu, R.L. (2007). "A general Bayesian framework for functional mapping of dynamic complex traits", Genetics (tentatively accepted 2007).
[62] Liu, T., Zhao, W., Tian, L. and Wu, R.L. (2005). "An algorithm for molecular dissection of tumor progression", J. Math. Biol. 50, 336-354.
[63] Long, F., Chen, Y.Q., Cheverud, J.M. and Wu, R.L. (2006). "Genetic mapping of allometric scaling laws", Genet. Res. 87, 207-216.
[64] Lynch, M. and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
[65] Ma, C. (2007). "Recent developments on the construction of spatial-temporal covariance models", Stoch. Environ. Res. Risk Assess., Springer-Verlag, 22, s39-s47.
[66] Ma, C., Casella, G. and Wu, R.L. (2002). "Functional mapping of quantitative trait loci underlying the character process: A theoretical framework", Genetics 161, 1751-1762.
[67] Madsen, H. and Thyregod, P. (2001). "Calibration with absolute shrinkage", J. Chemomet. 15, 497-509.
[68] Mallows, C.L. (1973). "Some Comments on Cp", Technometrics 15, 661-675.
[69] Matern, B. (1986). Spatial Variation, Springer, New York, 2nd Ed.
[70] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, Chapman and Hall, London.
[71] McLachlan, G. and Peel, D. (2000). Finite Mixture Models, John Wiley and Sons, Inc., New York.
[72] Meng, X-L. and Rubin, D. (1993). "Maximum likelihood estimation via the ECM algorithm: A general framework", Biometrika 80, 267-278.
[73] Mitchell, M.W., Genton, M.G. and Gumpertz, M.L. (2005). "Testing for separability of space-time covariances", Environmetrics 16, 819-831.
[74] Molenberghs, G. and Verbeke, G. (2005). Models for Discrete Longitudinal Data, Springer Science+Business Media, Inc., New York.

[75] Myers, R. (1990). Classical and Modern Regression with Applications, PWS-Kent Publishing Company, Boston.
[76] Nelder, J.A. and Mead, R. (1965). "A simplex method for function minimization", Comput. J. 7, 308-313.
[77] Newton, H.J. (1988). TIMESLAB: A Time Series Analysis Laboratory, Wadsworth & Brooks/Cole, Pacific Grove, CA.
[78] Niklas, K.J. (1994). Plant Allometry: The Scaling of Form and Process, University of Chicago, Chicago.
[79] Nychka, D., Wikle, C. and Royle, A. (2002). "Multiresolution models for nonstationary spatial covariance functions", Statistical Modeling 2, 315-331.
[80] Ojelund, H., Madsen, H. and Thyregod, P. (2001). "Calibration with absolute shrinkage", J. Chemomet. 15, 497-509.
[81] Pan, J.X. and Mackenzie, G. (2003). "On modelling mean-covariance in longitudinal studies", Biometrika 90, 239-244.
[82] Porcu, E. and Mateu, J. (2006). "Nonseparable stationary anisotropic space-time covariance functions", Stoch. Environ. Res. Risk Assess. 21, 113-122.
[83] Porcu, E., Mateu, J., Zini, A. and Pini, R. (2007). "Modelling spatio-temporal data: A new variogram and covariance structure proposal", Statistics and Probability Letters 77, 83-89.
[84] Pourahmadi, M. (1999). "Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterization", Biometrika 86, 677-690.
[85] Pourahmadi, M. (2000). "Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix", Biometrika 87, 425-435.
[86] Ramsay, J.O. and Silverman, B.W. (1997). Functional Data Analysis, Springer-Verlag, New York.
[87] Rothman, A., Bickel, P., Levina, E. and Zhu, J. (2007). "Sparse permutation invariant covariance estimation", Dept. of Statistics, Univ. of Michigan (Technical Report no. 467).
[88] Sampson, P. and Guttorp, P. (1992). "Nonparametric estimation of nonstationary spatial covariance structure", J. Am. Statist. Assoc. 87, 108-119.
[89] Satagopan, J.M., Yandell, B.S., Newton, M.A. and Osborn, T.C. (1996). "A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo", Genetics 144, 805-816.
[90] Sax, K. (1923). "The association of size difference with seed-coat pattern and pigmentation in Phaseolus vulgaris", Genetics 8, 552-560.

[91] Schabenberger, O. and Gotway, C. (2005). Statistical Methods for Spatial Data Analysis, Chapman and Hall/CRC, Boca Raton.
[92] Schwarz, G. (1978). "Estimating the dimension of a model", Annals of Statistics 6(2), 461-464.
[93] Sillanpaa, M.J. and Arjas, E. (1999). "Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data", Genetics 151, 1605-1619.
[94] Smith, M. and Kohn, R. (2002). "Parsimonious covariance matrix estimation for longitudinal data", J. Am. Statist. Assoc. 97, no. 460, 1141-1153.
[95] Stein, M. (2005). "Space-time covariance functions", J. Am. Statist. Assoc. 100, no. 469, 310-321.
[96] Stratton, D. (1998). "Reaction norm functions and QTL-environment interactions for flowering time in Arabidopsis thaliana", Heredity 81, 144-155.
[97] Stroud, J. (2001). "Dynamic models for spatiotemporal data", J. R. Statist. Soc. B 63, 673-698.
[98] Tibshirani, R. (1996). "Regression shrinkage and selection via the Lasso", J. Roy. Statist. Soc. B 58, 267-288.
[99] Vaughn, T., Pletscher, S., Peripato, A., King-Ellison, K., Adams, E., Erikson, C. and Cheverud, J. (1999). "Mapping of quantitative trait loci for murine growth: A closer look at genetic architecture", Genet. Res. 74, 313-322.
[100] Thornley, J.H.M. and Johnson, I.R. (1990). Plant and Crop Modelling: A Mathematical Approach to Plant and Crop Physiology, Clarendon Press, Oxford.
[101] Waller, L. and Gotway, C. (2004). Applied Spatial Statistics for Public Health Data, Wiley-Interscience, Hoboken, N.J.
[102] Wang, Z., Hou, W. and Wu, R.L. (2006). "A statistical model to analyse quantitative trait locus interactions for HIV dynamics from the virus and human genomes", Statist. Med. 25, 495-511.
[103] Wang, Z. and Wu, R.L. (2004). "A statistical model for high-resolution mapping of quantitative trait loci determining HIV dynamics", Statist. Med. 23, 3033-3051.
[104] Weiss, R. (2005). Modeling Longitudinal Data, Springer-Verlag, New York.
[105] West, G.B., Brown, J.H. and Enquist, B.J. (2001). "A general model for ontogenetic growth", Nature 413, 628-631.
[106] Wolf, J.B. (2002). "The geometry of phenotypic evolution in developmental hyperspace", Proceedings of the National Academy of Sciences of the USA 99, 15849-15851.

[107] Wong, F., Carter, C.K. and Kohn, R. (2003). "Efficient estimation of covariance selection models", Biometrika 90, 809-830.
[108] Wu, J., Zeng, Y., Huang, J., Hou, W., Zhu, J. and Wu, R.L. (2007a). "Functional mapping of reaction norms to multiple environmental signals", Genet. Res. Camb. 89, 27-38.
[109] Wu, R.L., Hou, W., Cui, Y., Li, H.Y., Wu, S., Ma, C-X. and Zeng, Y. (2007b). "Modeling the genetic architecture of complex traits with molecular markers", Recent Patents on Nanotechnology 1, 41-49.
[110] Wu, R.L., Ma, C-X. and Casella, G. (2007c). Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL, Springer-Verlag, New York.
[111] Wu, R.L., Ma, C-X., Lin, M. and Casella, G. (2004a). "A general framework for analyzing the genetic architecture of developmental characteristics", Genetics 166, 1541-1551.
[112] Wu, R.L., Ma, C-X., Lin, M., Wang, Z. and Casella, G. (2004b). "Functional mapping of quantitative trait loci underlying growth trajectories using a transform-both-sides logistic model", Biometrics 60, 729-738.
[113] Wu, R.L., Ma, C-X., Littell, R. and Casella, G. (2002). "A statistical model for the genetic origin of allometric scaling laws in biology", J. Theor. Biol. 217, 275-287.
[114] Wu, W.B. and Pourahmadi, M. (2003). "Nonparametric estimation of large covariance matrices of longitudinal data", Biometrika 90, 831-844.
[115] Wu, S., Yang, J. and Wu, R.L. (2007d). "Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics", Bioinformatics 23, 569-576.
[116] Xu, S.Z. (1996). "Mapping quantitative trait loci using four-way crosses", Genet. Res. 68, 175-181.
[117] Xu, S.Z. and Yi, N.J. (2000). "Mixed model analysis of quantitative trait loci", Proc. Natl. Acad. Sci. USA 97, 14542-14547.
[118] Yang, J. (2006). Nonparametric functional mapping of quantitative trait loci, Ph.D. Dissertation, Department of Statistics, University of Florida.
[119] Yang, R.Q., Gao, H.J., Wang, X., Zheng, Z-B. and Wu, R.L. (2007). "A semiparametric model for composite functional mapping of dynamic quantitative traits", Genetics 177, 1859-1870.
[120] Yap, J.S., Wang, C.G. and Wu, R.L. (2007). "A simulation approach for functional mapping of quantitative trait loci that regulate thermal performance curves", PLoS ONE 2(6), e554.

[121] Yuan, M. and Lin, Y. (2007). "Model selection and estimation in the Gaussian graphical model", Biometrika 94(1), 19-35.
[122] Zeng, Z-B. (1994). "Precision mapping of quantitative trait loci", Genetics 136, 1457-1468.
[123] Zhao, W. (2005a). Statistical modelling for functional mapping of longitudinal and multiple longitudinal traits: structured antedependence model and wavelet dimensionality reduction, Ph.D. Dissertation, Department of Statistics, University of Florida.
[124] Zhao, W., Chen, Y., Casella, G., Cheverud, J.M. and Wu, R.L. (2005b). "A non-stationary model for functional mapping of complex traits", Bioinformatics 21, 2469-2477.
[125] Zhao, W., Ma, C-X., Cheverud, J.M. and Wu, R.L. (2004). "A unifying statistical model for QTL mapping of genotype × sex interaction for developmental trajectories", Physiol. Genomics 19, 218-227.
[126] Zimmerman, D. and Núñez-Antón, V. (2001). "Parametric modeling of growth curve data: An overview (with discussions)", Test 10, 1-73.

John Stephen F. Yap was born in the town of Tagoloan, Misamis Oriental, Philippines to Rhoda and Lizardo Yap. He has an older brother, Mark. John earned a B.S. in mathematics from Ateneo de Manila University in the Philippines and upon graduation worked as an actuarial assistant at Watson Wyatt. He also spent a year as an assistant instructor in the Mathematics Department of Ateneo de Manila University. John obtained an M.S. in mathematics with emphasis in actuarial science from the University of Minnesota in Minneapolis and a Ph.D. in statistics from the University of Florida in Gainesville. He will work at the Food and Drug Administration as a mathematical statistician.