A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure


Material Information

Title:
A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure
Series Title:
Genome Biology
Physical Description:
Book
Language:
English
Creator:
Zuccolo, Andrea
Bowers, John E.
Estill, James C.
Xiong, Zhiyong
Luo, Meizhong
Sebastian, Aswathy
Goicoechea, José Luis
Collura, Kristi
Yu, Yeisoo
Jiao, Yuannian
Duarte, Jill
Tang, Haibao
Ayyampalayam, Saravanaraj
Rounsley, Steve
Kudrna, Dave
Paterson, Andrew H.
Pires, J.Chris
Chanderbali, Andre
Soltis, Douglas E.
Chamala, Srikar
Barbazuk, Brad
Soltis, Pamela S.
Albert, Victor A.
Ma, Hong
Mandoli, Dina
Banks, Jody
Carlson, John E.
Tomkins, Jeffrey
dePamphilis, Claude W.
Wing, Rod A.
Leebens-Mack, Jim
Publisher:
BioMed Central
Publication Date:

Notes

Abstract:
Background: Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results: Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions: When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
General Note:
Additional files available under Downloads Tab: Additional file 1: Supplemental tables and figures cited with additional details for the physical map and shotgun sequences. Additional file 2: Synteny analysis of Amborella BAC ends and Vitis genes.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00009670:00001




Full Text


RESEARCH Open Access

A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

Andrea Zuccolo1, John E Bowers2, James C Estill2, Zhiyong Xiong3, Meizhong Luo1,4, Aswathy Sebastian1, José Luis Goicoechea1, Kristi Collura1, Yeisoo Yu1, Yuannian Jiao5, Jill Duarte5, Haibao Tang2,6,7, Saravanaraj Ayyampalayam2, Steve Rounsley8,9, Dave Kudrna1, Andrew H Paterson2,7, J Chris Pires3, Andre Chanderbali10, Douglas E Soltis10, Srikar Chamala10, Brad Barbazuk10, Pamela S Soltis11, Victor A Albert12, Hong Ma5,13, Dina Mandoli14, Jody Banks15, John E Carlson16, Jeffrey Tomkins17, Claude W dePamphilis5, Rod A Wing1 and Jim Leebens-Mack2*

Abstract

Background: Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Results: Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.

Conclusions: When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.

Background

The origin and rapid diversification of the angiosperms (flowering plants) were pivotal events in the evolutionary history of Earth's biota. Over the past 130 to 150 million years angiosperms have diversified to include approximately 350,000 species occupying nearly all habitable terrestrial and many aquatic environments. Angiosperms generate the vast majority of human food either directly or indirectly as animal feed, and they account for a huge proportion of land-based photosynthesis and carbon sequestration. Comparative analyses of genome sequences and gene function for a growing number of species are shedding light on how gene and genome duplications have contributed to the diversification within major flowering plant lineages (for example, Rosidae, Asteridae, Monocotyledoneae [1]), but elucidation of the genetic and genomic processes underlying the key innovations associated with the origin of flowering plants (for example, typically
*Correspondence: jleebensmack@plantbio.uga.edu
2 Department of Plant Biology, University of Georgia, 4504 Miller Plant Sciences, Athens, GA 30602, USA
Full list of author information is available at the end of the article

Zuccolo et al. Genome Biology 2011, 12:R48 http://genomebiology.com/2011/12/5/R48

© 2011 Zuccolo et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


bisexual flowers, endosperm formation, double fertilization, ovules with two integuments, seed development within the carpel) requires comparisons between lineages that diverged from the last common ancestor of all extant angiosperms [2,3].

Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree or shrub species endemic to the forests of New Caledonia, as the sister species to all other extant angiosperms [4-8]. Amborella is no more "ancient" or "primitive" than any other extant flowering plant species, but comparisons between Amborella and other angiosperms are allowing researchers to triangulate on characteristics of their last common ancestor. Using a similar approach, researchers have used the complete genome sequence of platypus, Ornithorhynchus anatinus, representing the sister group of all other extant mammals, to elucidate mammalian genome evolution [9].

Previous comparisons of transcriptome content [10], gene expression patterns [11-13], and gene function [14,15] between Amborella and other flowering plant species have suggested that much of the floral development program that has been characterized in Arabidopsis, snapdragon and maize existed in the last common ancestor of extant angiosperms. While gene duplications in the MADS-box transcription factor family likely contributed to the earliest floral development regulatory networks [11,12,16-19], it is not clear whether these were single gene duplications or the product of polyploidization. Genome duplications have occurred repeatedly throughout angiosperm history [20-23] but there is uncertainty in the timing of polyploidy events relative to the origin of the angiosperms and important innovations in flowering plant history [24].
Here we describe a BAC-based draft physical map for A. trichopoda and use BAC end sequences (BESs) to compare the structure of the Amborella genome to representative eudicot (Vitis, Populus and Arabidopsis) and grass (Oryza) genomes. Comparative analyses of sequences for two large contiguous regions (487.3 and 629.7 kb in the Amborella genome) were also performed. In addition we use a large transcriptome assembly to identify BAC ends matching protein-coding sequences [25]. Our aim here is to begin to investigate whether regions of these genomes have remained syntenic throughout angiosperm history, and determine whether ancient genome duplications discovered in eudicot and grass genomes [26-29] occurred before or after the divergence of these lineages from the Amborella lineage. In addition, the physical map and sequence analyses establish a framework for future studies of all flowering plant genomes, including the Amborella genome itself.

Results and discussion

BAC library and physical map

The structure and composition of the 870 Mbp/1C [30] A. trichopoda genome were investigated through physical mapping of clones from a 5.2× coverage BAC library. The library was constructed after partial digest of high-molecular-weight DNA with HindIII. The library, which comprises 36,684 BAC clones with an estimated average insert size of 123 kb, is available through the Arizona Genomics Institute [31]. The BAC library was double spotted in high density onto Hybond N+ filters. All 36,684 clones were end-sequenced, and a physical map was constructed after high information content fingerprinting (HICF) [32,33]. A total of 32,719 fingerprinted BACs was assembled into 3,106 contigs and 1,356 singletons using the program FPC version 7.2 [34].
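The library coverage quoted above can be checked from the clone count, the average insert size and the genome size. The following is a minimal sketch of that arithmetic, not code from the study; the function name `library_coverage` is an illustrative assumption.

```python
# Figures quoted in the text above.
N_CLONES = 36_684        # BAC clones in the library
MEAN_INSERT_KB = 123     # estimated average insert size (kb)
GENOME_MBP = 870         # estimated 1C genome size (Mbp)


def library_coverage(n_clones, mean_insert_kb, genome_mbp):
    """Return the number of genome equivalents held in a clone library."""
    total_mbp = n_clones * mean_insert_kb / 1_000  # kb -> Mbp
    return total_mbp / genome_mbp


coverage = library_coverage(N_CLONES, MEAN_INSERT_KB, GENOME_MBP)
print(f"Estimated library coverage: {coverage:.1f}x")
```

Under these inputs the estimate rounds to the 5.2× coverage stated for the library.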
The quality of the physical map was assessed by screening the arrayed library with probes developed for Amborella homologs for eight genes that have been found to be single-copy in sequenced plant genomes [35,36]. Probes derived from Amborella cDNA clones or PCR amplicons were putative homologs of the following single-copy Arabidopsis genes: ASD (At1g14810), DWARF1 (At3g19820), GIGANTEA (At1g22770), LEAFY (At5g61850), a dienelactone hydrolase gene (At2g32520), a cytochrome-C oxidase-related gene (At4g37830), EIF3K (At4g33250) and a hypothetical protein-coding gene with strong similarity to rice gene Os02g0593400 (At5g63135). All verified positive clones mapped to the same FPC contig for six of the eight probes (Figure S1 in Additional file 1). Positive clones for the EIF3K and the hypothetical protein-coding gene probes were each distributed between two FPC contigs and inspection of the HICF bands for these contigs suggests that the genes have been duplicated in the Amborella lineage. In accordance with the expected library coverage, the single copy nuclear gene probes hybridized to 3 to 13 clones (mean 6.9).

The correlation between HICF bands and the number of BACs included in each FPC contig was 0.655 for all contigs and 0.917 after removing two contigs derived from the chloroplast and mitochondrial genomes and one contig composed largely of repetitive elements (Figure S2 in Additional file 1). We used a calibration of average insert size (123 kb) over the average number of HICF bands per BAC clone (128) to obtain a rough estimate of FPC contig lengths. Of 77 FPC contigs with 39 or more BACs (not including the contigs with the plastome and repetitive elements), estimated lengths ranged from 308 to 1,429 kb.

BAC end sequencing was performed on all fingerprinted BACs producing 69,466 Sanger reads with an average length of 695 bp after quality and vector trimming. This corresponds to 48.25 Mbp, or roughly 5.4% of the Amborella genome. BESs were related to the


physical map and used to identify regions of synteny between regions of the Amborella genome and the sequenced Arabidopsis, Populus, Vitis (grape), and Oryza (rice) genomes (see below). In addition, end sequences were used to verify the identity of the three excluded FPC contigs described above. All BESs mapping at least 100 bp apart on the plastid genome [37] were found in the same FPC contig. This contig included just 532 BACs, indicating very low (1.6%) plastid DNA contamination.

Characterization of repeats in BAC end and shotgun sequences

Repeat composition and frequency in the Amborella genome were characterized through analysis of the BAC end and whole genome survey sequences. Reads were first compared with sequences in Repbase (v.15.08) [38] using BLASTN [39]. In order to minimize the effect of divergence between Amborella genes and homologous repeats from other species, we used relaxed BLASTN settings (-q -4 -r 5) to accommodate an estimated 160 million years of sequence divergence since the last common ancestor of extant flowering plants [8,40-42] while maintaining rigorous support for significant hits (E-value threshold was set at 1e-10). All BAC end sequences without significant hits were then compared with the non-redundant protein database in GenBank using BLASTX and an E-value threshold of 1e-5. Finally, the remaining sequences without matches in Repbase or the GenBank nr database were compared with sequences that did have matches in either database using BLASTN with an E-value threshold of 1.0e-10. We report results both excluding these "internal" BLAST searches and including them (I). Together these results provide estimates of transposable element (TE) content based on conservative and more comprehensive (and possibly more permissive; I) search strategies.
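The three-tier search strategy described above can be sketched as a simple labeling routine. This is a hedged illustration that assumes the BLAST searches have already been run and filtered at the stated E-value thresholds; the function `classify_reads`, its arguments and the label strings are hypothetical and are not part of the authors' pipeline.

```python
def classify_reads(read_ids, repbase_hits, nr_hits, internal_hits):
    """Assign each read to the first search tier that matched it.

    repbase_hits / nr_hits: sets of read IDs with significant hits in
        tier 1 (BLASTN vs Repbase) and tier 2 (BLASTX vs GenBank nr).
    internal_hits: dict mapping a read ID to the ID of another read it
        matched in the tier 3 "internal" BLASTN search.
    """
    labels = {}
    for rid in read_ids:
        if rid in repbase_hits:
            labels[rid] = "repbase"
        elif rid in nr_hits:
            labels[rid] = "nr"
        else:
            labels[rid] = "unmatched"

    # Tier 3: a still-unmatched read inherits a label only from a read
    # that was matched directly in tier 1 or tier 2 (no chaining).
    for rid, target in internal_hits.items():
        if labels.get(rid) == "unmatched" and labels.get(target) in ("repbase", "nr"):
            labels[rid] = "internal:" + labels[target]
    return labels


# Hypothetical read IDs, for illustration only.
example = classify_reads(
    read_ids=["r1", "r2", "r3", "r4"],
    repbase_hits={"r1"},
    nr_hits={"r2"},
    internal_hits={"r3": "r1"},
)
print(example)
```

Keeping the tier 3 labels distinct (prefixed with `internal:`) makes it straightforward to report totals both with and without the internal searches, as the text does with its (I) values.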
With the more comprehensive strategy (I), slightly more than half of all the Amborella BESs matched known TE sequences. Not surprisingly, the most highly represented TE class was long terminal repeat (LTR) retrotransposons, accounting for 7.65% (I: 30.01%) of all BESs and 56.58% (I: 57.5%) of all those with hits to Repbase. Hits to Ty1-copia type sequences were slightly more common (3.11%; I: 13.79%) than matches to Ty3-gypsy-like LTRs (3.50%; I: 12.09%); the remaining LTR retrotransposon matches (1.04%; I: 4.13%) were not classified. LINEs also represented a significant fraction of Amborella BAC ends: 2.70% (I: 11.60%) of the total, 19.98% of all the repeats (I: 22.22%). This is noteworthy because LINEs are usually significantly less numerous than LTR retrotransposons in plant genomes [43-47] with some notable exceptions, such as the element del2 in Lilium speciosum [48]. The complete set of DNA TE-related BESs accounts for just 1.63% (I: 4.51%) of the total, and the most represented classes are those of hAT and MuDR elements: 0.92% (I: 2.41%) and 0.49% (I: 1.04%) of the total BESs, respectively. Results from the same analyses replicated on the set of 2,695 random sheared Sanger sequences (Table 1) and 648,519 454 reads (Table S1 in Additional file 1) are generally in very good agreement with those obtained using BES data.
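The percentages in this paragraph follow from the read counts reported in Table 1. As a rough check, the sketch below recomputes the LTR retrotransposon shares from those counts; the variable names and the `percent` helper are illustrative assumptions, and small differences from the published figures can arise because the table sums already-rounded values.

```python
TOTAL_BES = 69_466  # BAC end reads analysed

# Table 1 counts: (conservative, including internal BLASTN searches)
ltr_counts = {
    "Ty1-copia": (2_162, 9_578),
    "Ty3-gypsy": (2_431, 8_395),
    "unclassified": (720, 2_868),
}
all_repeats = (9_390, 36_259)  # all BESs with repeat matches


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


ltr_cons = sum(cons for cons, _ in ltr_counts.values())
ltr_incl = sum(incl for _, incl in ltr_counts.values())

print("LTR share of all BESs (conservative):", percent(ltr_cons, TOTAL_BES))
print("LTR share of repeat hits (conservative):", percent(ltr_cons, all_repeats[0]))
print("LTR share of repeat hits (with internal):", percent(ltr_incl, all_repeats[1]))
```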
A de novo search for novel miniature inverted repeat transposable elements (MITEs) overlooked by the similarity search approach was carried out using the pipeline MUST [49]. The most abundant candidates identified by the pipeline were manually inspected to confirm features typical of MITEs, such as small size, terminal inverted repeats, high A+T nucleotide content and target site duplications. Three putative high-copy MITEs were identified. All of these were small elements (174 to 500 bp) with terminal inverted repeats, target site duplications, and A+T content greater than 65% (Figure S3 in Additional file 1). Repeat copy numbers estimated from the BESs and random sheared sequences were extrapolated to obtain genome-wide estimates using the procedure developed by Hawkins et al. [50]. Copy number ranges from 3,300 copies for MITE_2 to 17,000 copies for MITE_1. The estimates inferred from BESs were generally consistent with those calculated for random sheared reads (with the possible exception of MITE_3; Table 2).

The conserved reverse transcriptase domains of LTR retrotransposons and LINEs were collected and used to estimate maximum likelihood trees (Figure 1). In the case of LTR retroelements, the trees indicate substitution rate heterogeneity (that is, variation in root-to-tip distances) and no evidence for recent retrotranspositional bursts of single families (that is, short terminal branches). In the case of LINEs, the phylogenetic tree displays very long branches suggestive of an ancient diversification or very rapid substitution rates. As has been described for other plants [51], Amborella LINEs exhibit high sequence divergence and extreme heterogeneity.
The Amborella BESs were also searched for microsatellites (that is, simple sequence repeats (SSRs)); for comparison, the search was also conducted on the Amborella random sheared reads and on BESs (from other HindIII BAC libraries) from Glycine (soybean) and Oryza rufipogon. In comparison to the other two species, Amborella shows a higher frequency of SSRs, particularly mono- and dinucleotide repeats, with a particularly high frequency of AG dinucleotide microsatellites. The results of SSR analysis in BESs were confirmed by those obtained from the randomly sheared Amborella sequences (Table 3).

Repeat profiles in the shotgun sequences were also assessed using Tallymer to characterize K-mer frequencies [52]. The Amborella K-mer frequency profiles were compared with those of Arabidopsis thaliana, Oryza sativa (rice), Sorghum bicolor and Zea mays (maize).
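A K-mer frequency profile of the kind described here can be illustrated with a small in-memory counter. This sketch is a plain-Python stand-in and does not reproduce Tallymer; the function `kmer_copy_profile` is a hypothetical name, it counts K-mers on the given strand only (no reverse-complement merging), and it is only practical for small inputs.

```python
from collections import Counter


def kmer_copy_profile(reads, k=20):
    """Return {copy_number: fraction of distinct k-mers seen that often}."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    if not counts:
        return {}
    histogram = Counter(counts.values())  # copy number -> number of k-mers
    n_distinct = len(counts)
    return {copies: n / n_distinct for copies, n in sorted(histogram.items())}


# Toy reads with k=4, for illustration only.
profile = kmer_copy_profile(["ACGTACGT", "ACGTTTTT"], k=4)
print(profile)
```

A genome rich in recently amplified repeats would place a larger fraction of its K-mers at high copy numbers, which is the contrast the comparison among species relies on.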


While the Amborella genome size is closest to Sorghum's (870 and 740 Mbp/1C, respectively), its K-mer frequency profiles were more similar to those of Arabidopsis and rice, with much smaller genome sizes (157 and 490 Mbp/1C, respectively [53]) (Figure 2).

Distribution of BESs with matches to protein-coding regions of reference genomes

All BESs and shotgun sequences were compared to the GenBank nr database using BLASTX [39] with an e-value threshold of 1e-5. After the removal of sequences similar to TEs, the overall frequencies of sequences finding matches in the protein database were 11.9% and 8.05% for the BES and Sanger shotgun sequences, respectively. For BESs from FPC contigs with ten or more BACs, we found a negative correlation between the frequencies of BESs matching protein-coding genes and LTR retrotransposons (r = -0.423, P < 0.0001). As has been described for other genomes [54-56], gene density seems to be negatively correlated with retrotransposon density in the Amborella genome.

Identification of syntenic blocks between Amborella, Arabidopsis, rice, poplar and grape

Taking advantage of the availability of a phase I physical map assembly, we mapped the Amborella contigs onto the genomes of A. thaliana, Populus trichocarpa, Vitis vinifera, and O. sativa. We focused on the 77 largest contigs with at least 39 clones. BLAST analyses of BESs were done within the context of their linkages within FPC contigs. All of the contig BESs classified as repeats (see above) were discarded. Those remaining were compared against the four reference genomes. Because of the large evolutionary time that separates Amborella from the other four sequenced genomes [41,42,57], the comparisons were carried out at the protein level using tBLASTX; only the best hits were taken into account.
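The per-contig correlation between gene-matching and LTR-matching BES frequencies reported above is a Pearson coefficient. The sketch below shows how such a coefficient is computed; the per-contig fractions are made-up values for illustration only and are not data from the study, and `pearson_r` is an illustrative helper rather than the authors' code.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-contig fractions of BESs matching genes and LTR elements.
gene_fraction = [0.20, 0.15, 0.10, 0.05]
ltr_fraction = [0.05, 0.10, 0.12, 0.30]

r = pearson_r(gene_fraction, ltr_fraction)
print(f"r = {r:.3f}")  # negative when gene-rich contigs are LTR-poor
```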
Amborella FPC contigs were considered for further analyses if at least two BESs had matches with bit scores greater than 80 (typically a maximum e-value of 1.0E-20 over 100 amino acid residues) to loci separated by less than 500 kb within one of the four genomes being compared. Positive matches were used as anchors to circumscribe 4-Mbp tracts within the

Table 1 Frequencies of BAC end sequences and Sanger shotgun sequences matching sequences in Repbase

Type                    Number in BESs    % of BESs       % of repeats in BESs   Number in SGSs   % of SGSs       % of repeats in SGSs
DNA TEs
  hAT                   642 (1,671)       0.92 (2.41)     6.84 (4.61)            20 (41)          0.74 (1.52)     5.73 (2.94)
  MuDR                  343 (724)         0.49 (1.04)     3.65 (2.00)            7 (30)           0.26 (1.11)     2.00 (2.15)
  CACTA                 27 (75)           0.04 (0.11)     0.29 (0.21)            0 (4)            0 (0.15)        0 (0.29)
  Helitrons             12 (69)           0.02 (0.10)     0.13 (0.19)            0 (3)            0 (0.11)        0 (0.22)
  Other                 108 (595)         0.15 (0.86)     1.15 (1.64)            1 (24)           0.04 (0.89)     0.29 (1.72)
  Total                 1,132 (3,134)     1.63 (4.51)     12.06 (8.64)           28 (102)         1.04 (3.78)     8.02 (7.31)
Retrotransposons
  LTR Ty1-copia         2,162 (9,578)     3.11 (13.79)    23.02 (26.42)          64 (314)         2.37 (11.65)    18.34 (22.51)
  LTR Ty3-gypsy         2,431 (8,395)     3.50 (12.09)    25.89 (23.15)          129 (377)        4.78 (13.98)    36.96 (27.03)
  LTR not classified    720 (2,868)       1.04 (4.13)     7.67 (7.91)            51 (139)         1.89 (5.16)     14.61 (9.96)
  LINEs                 1,876 (8,055)     2.70 (11.60)    19.98 (22.22)          55 (294)         2.04 (10.91)    15.76 (21.08)
  SINEs                 11 (183)          0.02 (0.26)     0.12 (0.50)            0 (4)            0 (0.15)        0 (0.29)
  Retro not classified  1,058 (4,046)     1.52 (5.82)     11.27 (11.16)          23 (165)         0.85 (6.12)     6.59 (11.83)
  Total                 8,258 (33,125)    11.89 (47.69)   87.94 (91.36)          321 (1,293)      11.91 (47.96)   91.98 (92.69)
Total                   9,390 (36,259)    13.52 (52.20)   100 (100)              349 (1,395)      12.95 (51.74)   100 (100)

Results in parentheses include internal BLASTN searches. Repbase v.15.08 was used [38]. SINE, short interspersed element; SGS, Sanger shotgun sequence.
Table 2 Putatively high-copy MITEs identified in the BESs and Sanger shotgun sequences using MUST pipeline

          Length   Inverted repeat length   BES hits   Copy number estimate   SGS hits   Copy number estimate   AT%
MITE_1    358      26                       542        ~17,000                18         ~17,200                68.80
MITE_2    190      19                       140        ~3,300                 8          ~3,100                 68.70
MITE_3    516      47                       394        ~17,900                8          ~11,300                75.20

Copy number estimates based on procedure of Hawkins et al. [50]. SGS, Sanger shotgun sequence.


reference genomes and a second, more focused tBLASTX search was performed comparing the BESs with these regions. An e-value threshold of 1.0E-4 was used for the second set of tBLASTX searches and all significant hits were used to identify syntenic regions. We considered a contig as anchored if the contig had at least four positive hits (e-value lower than 1.0e-4) to at least three distinct genes.

Non-repetitive BESs were also compared to a database of 246,196 Amborella cDNA unigene assemblies with lengths greater than 100 bp. These cDNAs were derived from comprehensive sequencing of nine cDNA libraries (Table 4) [25]. Sixty-six percent of the non-repetitive BESs matched cDNA sequences in BLASTN searches with an e-value cutoff of 1.0e-10.

Using the search strategy described above, 29 large Amborella BAC contigs (>39 BAC clones) showed

Figure 1 Maximum likelihood trees for reverse transcriptase genes classified as Copia-type and Gypsy-type LTR and LINE elements. (a) Copia-type; (b) Gypsy-type LTRs; (c) Gypsy-type LINEs. The maximum likelihood trees show rate heterogeneity and no recent expansive radiations (that is, short terminal branches). Reverse transcriptase sequences were mined from the BAC end sequence set.

Table 3 Simple sequence repeats identified in BESs and Sanger shotgun sequences

Repeat   Amborella (BES)a   Amborella (RS)a   Soybeana   Oryza rufipogona
Mono     149.66             152.89            72.74      50.79
Di       225.03             211.00            77.89      63.94
Tri      72.49              78.96             110.01     144.06
Tetra    89.88              90.70             100.67     102.25
Penta    74.85              89.73             64.54      56.00
Total    611.92             623.28            425.85     417.04

a Values are presence per million base pairs. RS, random sheared.

Figure 2 K-mer analyses of Sanger shotgun sequences reveal low frequencies of short repeats in the Amborella genome relative to the sorghum and maize genomes. (Plot axes: fraction of 20mers against copy number of 20mer; series: Arabidopsis, Amborella, Rice, Sorghum, Maize.)


synteny with at least one of the four sequenced genomes, and nine of these showed synteny with at least one region in all four genomes. All BESs mapping to these syntenic regions also exhibited significant matches to the sequences in the Amborella cDNA assembly (Table 4; Table S2 in Additional file 1). Whereas 25 of these Amborella BAC contigs mapped to at least one tract in the Vitis genome, 15, 16, and 24 contigs were found to be syntenic with one or more tracts in the Oryza, Arabidopsis, and Populus genomes, respectively (Table S2 in Additional file 1). These results provide a novel, albeit coarse, first view of the ancestral genome for all flowering plants and the timing of rearrangements and other structural changes (for example, genome duplications, fractionation, chromosomal fissions and fusions) that have reduced synteny between the monocot and eudicot genomes analyzed here (Figure 3). Parsimony mapping of synteny loss onto a phylogeny consisting of Amborella and the other four species indicates variation in rates of change in genome structure. In agreement with previous studies [29,45], Vitis seems to have been the most stable of the sequenced genomes, and the rate of change slowed in the lineage leading to Populus following divergence from the lineage leading to Arabidopsis (Figure 3).

Paleopolyploidy in angiosperm genomes

Paleopolyploidy events have been well characterized in all four sequenced genomes analyzed here [29,45,58-60], and the syntenic Amborella FPC contigs described above often match multiple regions in these genomes. The most ancient of these paleopolyploidy events is the so-called γ triplication that has been inferred to have occurred before the divergence of the Asteridae (represented by tomato, Solanum lycopersicon) and the Rosidae, including Vitis, Populus and Arabidopsis [29].
Given the very incomplete view of the Amborella genome that is available in the BES data, we are not able to assess synteny between Amborella FPC contigs. Nevertheless, comparisons between the Amborella contigs and sets of syntenic blocks in the Vitis genome indicate that the γ triplication most likely occurred sometime after the divergence of all other angiosperms from the lineage leading to Amborella.

All BESs were compared to all annotated protein-coding genes in the Vitis genome placed within the context of the pre-triplication ancestral gene blocks and post-triplication syntenic segments identified by Tang et al. [29]. A total of 328 Amborella FPC contigs had between two and eight genes with significant best BLASTX matches (e-value threshold 1.0E-6) to Vitis genes corresponding to pre-triplication gene blocks in the ancestral genome. In most of these cases (199 of 328; Additional file 2), best hits were distributed between two or three homeologous (that is, post-triplication) syntenic Vitis genome segments. Of the remaining 129 Amborella FPC contigs with BESs showing significant BLASTX hits to a single Vitis subgenome (that is, single copy of a triplicated ancestral block), most (113) included just 2 genes mapping to the ancestral Vitis gene blocks (14 including 3 genes, and 2 including 4 genes) (Additional file 2). All 21

Table 4 Statistics for cDNA sequences included in multi-library transcriptome assembly of 246,196 unigenes with lengths greater than 100 bp

Tissue - library name                     Sequencing method   Number of reads   Unscreened reads   Total passing bases (MB)
Apical meristem - Atr12                   454 FLX Titanium    794,746           688,305            201.90
Male flowers - Atr15                      454 FLX Titanium    277,023           255,213            73.49
Old leaves - Atr14                        454 FLX Titanium    280,097           260,563            73.49
Old stem - Atr13                          454 FLX Titanium    259,431           238,156            68.70
Pre-meiotic female flower buds - Atr10    454 FLX GS          895,000           812,325            176.97
Pre-meiotic female flower bud - Atr02     Sanger              13,263            13,141             7.17
Pre-meiotic male flower bud - Atr01       Sanger              25,343            25,006             14.17
Root - Atr11                              454 FLX GS          324,070           300,275            64.88
Stem - Atr16                              454 FLX Titanium    410,098           388,436            120.03

Assemblies and raw data can be downloaded from the Ancestral Angiosperm Genome Project website [25]. A BLAST portal for the assembly is also available at
the project website.

Figure 3 Variation in rates of structural evolution evident in parsimony mapping of losses of synteny with 29 gene blocks inferred for the last common ancestor of all extant flowering plant lineages.


FPC contigs with best BLASTX matches to five or more genes within the ancestral Vitis blocks were distributed among two or three post-triplication subgenomes. Complete sequences for the Amborella BAC contigs may reveal more even distribution of segments among Vitis subgenomes, but the results described here suggest that triplication, fractionation and divergence of homeologous segments in the Vitis genome postdate the divergence between lineages leading to Vitis and Amborella (that is, the last common ancestor of all extant angiosperms).

Analysis of complete sequences for two Amborella BAC contigs

Two of the larger (approximately 500 kb) BAC contigs (IDs 431 and 1003) mapping to multiple segments in all four sequenced reference genomes were identified for further investigation. A minimum tiling path was constructed for each contig, and fluorescence in situ hybridizations were performed to verify that the BACs mapped to a single contiguous region in the Amborella genome (Figure 4). Each BAC in the tiling paths was subcloned and sequenced to 8× coverage on an ABI 3730xl sequencer. Gaps were closed for each scaffold, and contiguous 487,318 and 629,678 bp phase II sequences were assembled for contigs 431 and 1003, respectively.

The DAWGPAWS suite of scripts was used to organize ab initio gene predictions, BLAST results and the output of repeat identification tools [61,62].
Ab initio gene predictions were generated using FGENESH [63], AUGUSTUS [64], SNAP [65], GeneID [66] and GenScan [67]. In addition, Amborella EST sequences produced by the 454 Titanium platform (2,943,273 reads; total read size of approximately 776 Mbp; average read length of 263.60 bp) and Sanger sequencing (38,147 reads; total read size of approximately 21.3 Mbp; average read length of 559.57 bp) were splice-aligned to the contigs using GMAP (Genomic Mapping and Alignment Program) [68] with the PASA (Program to Assemble Spliced Alignments) genome annotation tool [69]. All predictions were manually compared with BLASTX results against gene annotations from Arabidopsis [70], Vitis [45], Z. mays [56], Medicago [71], Oryza [72,73], and Sorghum [55] as well as tBLASTX results against the Amborella transcript assemblies. GBrowse views of gene annotations and BLAST results for each contig are available at the Ancestral Angiosperm Genome Project website [25].

Rigorous assessments of synteny between these Amborella contigs and the aforementioned four angiosperm genomes were performed using LASTZ [74,75]. Dot plots comparing the Amborella contigs and the Vitis

Figure 4 Hybridization of three BAC clones in the minimum tiling paths for contigs 1003 and 431 to mitotic squashes (2n = 26) verifies the FPC assemblies. (a-e) Results for contig 1003; (f-j) results for contig 431. Panels (a) and (f) show all three BAC-FISH probes merged; (e,j) DAPI staining; (b,c,d) show each of three BACs (red, green, white) for contig 1003; (g,h,i) show each of three BACs (red, green, white) for contig 431.


genome show that contigs are syntenic with previously triplicated blocks [29]. Regions of contig 1003 match genes on syntenic segments of chromosomes 1, 14 and 17 in the Vitis genome (Figure 5) and contig 431 mapped to syntenic portions of Vitis chromosomes 6, 8 and 13 (Figure 6). These findings support the conclusion from the BES analyses suggesting that the γ triplication occurred after the first branching event in the phylogeny of extant angiosperms.

At least two genome duplications (ρ and σ) have been inferred to have occurred within the monocot lineage leading to rice since divergence of monocots and eudicots [28]. These duplications were evident in comparisons with both Amborella contigs. Regions of contig 1003 were found to be syntenic with portions of rice chromosomes 2 and 4 derived from the ρ duplication and a portion of chromosome 10 (Figure 5) that is related to these two regions through the earlier σ duplication [28]. The LASTZ analysis of contig 431 revealed synteny with seven regions in the rice genome (Figure 6) and one of the "putative ancestral regions" (PAR17) characterized by Tang et al. [28]. These PARs were defined as regions of synteny between the rice and Vitis genomes. Phylogenetic analyses of genes in Amborella contig 431 and syntenic regions of the rice and Vitis genomes may elucidate the timing of the γ triplication and genome duplications

Figure 5 LASTZ dot plots comparing BAC contig 1003 syntenic regions in the grape and rice genomes. (a) Grape genome; (b) rice genome.

Figure 6 LASTZ dot plots comparing BAC contig 431 syntenic regions in the grape and rice genomes. (a) Grape genome; (b) rice genome.


evident in synteny analyses of the rice genome relative to the divergence of monocots and eudicots.

Phylogenetic analyses of gene families represented in sequenced Amborella contigs

While the fractionation process has resulted in the loss of most duplicated genes following the ancient polyploidy events evident in the syntenic Vitis and rice segments shown in Figures 5 and 6, duplicate Vitis genes have been retained for homologs of three Amborella genes located on contig 431 (Figure 6a). These genes were used to search the PlantTribes gene family database [35]. The three gene sets identified in the synteny analysis correspond to three gene families (auxin-independent growth promoter, ceramidase and plant uncoupling mitochondrial protein) circumscribed through OrthoMCL clustering [76] of gene annotations from the available Arabidopsis, Carica (papaya), Populus, Medicago (alfalfa), Glycine, Cucumis (cucumber), Vitis, Mimulus, Oryza, Sorghum, Selaginella (spikemoss) and Physcomitrella genomes. Homologous genes sampled from exemplar asterid, ranunculid, non-grass monocot and gymnosperm species were obtained from EST assembly databases [25,77,78] and were added to each gene family set. Sequences in each gene family set were aligned using MUSCLE [79], and RAxML [80] run with the GTRGAMMA substitution model was used to obtain maximum likelihood estimates of gene trees. Inspection of the resulting gene trees shows support for the inference drawn from the BAC end sequence analysis.
The γ triplication (hexaploidy event) clearly occurred after Amborella diverged from other extant angiosperm lineages (Figure 7). The placement of the γ triplication with respect to the divergence of monocots and eudicots or core eudicots and the Ranunculales varies among the three gene trees. This incongruence among gene trees is likely due to artifacts associated with substitution rate variation and insufficient taxon sampling. Analyses of additional gene families with broader taxon sampling will be necessary to obtain better resolution for the timing of the γ triplication with respect to the divergence of monocots, eudicots, Ranunculales (that is, "basal" eudicots) and core eudicots.

Conclusions

A. trichopoda is the sister species to the large clade encompassing all other extant flowering plants. As such,

Figure 7 Gene trees for auxin-independent growth promoter (AXI1), ceramidase and plant uncoupling mitochondrial protein 1 (PUMP1) gene families. (a) Auxin-independent growth promoter (AXI1); (b) ceramidase; (c) plant uncoupling mitochondrial protein 1 (PUMP1) gene families. The gene trees show genes on Amborella contig 431 diverging from lineages leading to Vitis γ homeologs mapping to syntenic blocks on chromosomes 6, 8 and 13 (shown in red). Genes sampled from major angiosperm lineages are highlighted.

PAGE 10

comparativeanalysesof Amborella andotherflowering plantsofferauniquelyinformativeperspectiveonthe mostrecentcommonancestorofallextantangiosperms. ThephysicalmapandBACendsequencesdescribedin thisstudyprovidealow-resolutionviewofthe Amborella genome.Nonetheless,thesedatashedlighton genomicfeaturesofthelastcommonancestoroffloweringplants.Moreover,the Amborella genomeprovidesa uniquereferenceforunderstandinggenomeevolution throughoutangiospermhistory.Whenplacedinthe contextofthephysicalmap,BESsrepresentingjust5.4% ofthe Amborella genomeallowedreconstructionof ancestralgeneblocksinregionsrepresentedby29BAC contigsandinferenceofthetimingofstructuralmutationsthatdisruptedtheseblocks(Figure3). AnalysesofBESsandBACcontigsalsoindicatethat theancient g polyploidyeventinferredfromthe Arabidopsis [58], Carica [81], Populus [60],and Vitis [45] genomesoccurredafterthe Amborella lineagediverged fromtherestoftheangiosperms.Therefore,iftheoriginofangiospermswasassociatedwithagenomeduplicationashasbeenhypothesizedelsewhere[16,20,23], thatpolyploidyeventpredatedthe g event.MaterialsandmethodsBAClibraryconstructionProtocolsforDNAmegabasepreparation,libraryconstruction,pickingandarrayingproposedinLuoand Wing[82]werefollowed.FingerprintingTheSNaPshotfingerprintingtechniquewasadopted [32]withthemodificationsdescribedbyKim etal .[83]. 
SNaPshot reactions were loaded into ABI 3730xl DNA sequencers. Analysis of data for each contig was carried out using the ABI Data Collection Program.

Physical map construction
Fingerprints were assembled into contigs using the program FPC version 7.2 [34]. The initial assembly was carried out using a Sulston score threshold of e-50 followed by three rounds of dequeuing at the same stringency and auto-merging of contigs at e-21.

BAC end extraction and sequencing
BAC DNA was extracted and end sequenced from 36,684 clones using the methods described by Ammiraju et al. [83,84]. Sequence quality assessment and trimming were carried out using the programs Phred [85] and Lucy [86].

Random sheared library
A random sheared library was constructed as previously described [87].

cDNA sequencing and assembly
Additional Sanger ESTs were generated from available male and female flower bud cDNA libraries [10] (Table 4). Libraries for 454 sequencing were constructed from the tissues listed in Table 4 using the Mint cDNA synthesis kit (Evrogen, Moscow, Russia). Total RNAs for cDNA synthesis were isolated using a combination of CTAB extraction and the RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA) as previously described for basal angiosperms [11]. Two rounds of messenger RNA isolation were performed with the Poly(A)Purist mRNA Purification Kit (Ambion Inc., Austin, TX, USA) according to the manufacturer's recommendation. Contaminant DNA was removed with DNA-free (Ambion Inc.) and mRNA quality was verified using a Bioanalyzer (Agilent Inc., Santa Clara, CA, USA). Vector and adaptor sequences were trimmed from 454 Titanium (2,943,273 reads; total read size of approximately 776 Mbp; average read length of 263.60 bp) and Sanger sequences (38,147 reads; total read size of approximately 21.3 Mbp; average read length of 559.57 bp) using seqclean [88] and assembled using MIRA [89].

Similarity searches, repeat classification and contig anchoring
Similarity searches were carried out using the programs BLASTN and BLASTX [39]. BLASTN was run under relaxed settings (-q -4 -r 5) in order to accommodate the evolutionary distance between Amborella and the species included in the repeat databases used; the significance threshold was set at 1e-10. In the case of BLASTX searches the threshold was set at 1e-5, or 1e-4 for the BES synteny analysis. tBLASTX was used to anchor the contigs to the reference genomes (see Results for details).

Databases
The databases used in similarity searches were RepBase version 15.08 [38], the GenBank non-redundant (nr) database, and the Oryza, Arabidopsis, Vitis and Populus genome sequences.

Validation of repeat searches and MITE identification
The program MUST [49] was used for de novo characterization of highly repeated sequences; results were then inspected for the presence of MITE features. Inverted repeats were identified by manually parsing the results of dot-plot comparisons made using the program Dotter [90].

Simple sequence repeat searches
Microsatellites were identified using the program Sputnik [91]. SSR composition, length and distribution were parsed and analyzed using the tools and the strategy used by Morgante et al. [92].
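As a rough illustration of what a microsatellite search involves, the following Python sketch finds perfect tandem repeats of 2 to 5 bp motifs with a regular expression. It is not Sputnik's algorithm, and it makes no attempt to tolerate imperfect repeats. The function name find_ssrs, the motif-length range, the min_repeats cutoff and the example sequence are all assumptions chosen for the example.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=5, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Assumed, illustrative parameters; this is a simple regex scan,
    not the method implemented in Sputnik.
    """
    seq = seq.upper()
    # ([ACGT]{m,n}?) lazily captures the shortest motif; \1{k,} then
    # requires at least k further copies (min_repeats copies in total).
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip homopolymer runs that match as a repeated "AA", "TTT", etc.
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Invented test sequence: an (AG)6 run and a (CTT)4 run in unique flanks.
example = "GGCTA" + "AG" * 6 + "TTCGA" + "CTT" * 4 + "GC"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Start positions are 0-based. Summaries of SSR composition and length, as described above, could then be tallied from the returned tuples.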

Fluorescence in situ hybridization
FPC contigs were validated by hybridizing BAC DNAs to Amborella chromosome squashes. DNA was prepared for BACs mapping to the middle and both ends of BAC contigs 431 and 1003 and used to prepare fluorescently labeled BAC-FISH probes. Chromosome squashes were prepared from root tips and labeled BAC-FISH probes were prepared as described by Xiong et al. [93].

Contig sequencing and annotation
Minimum tiling paths of seven and six BACs were identified for contigs 1003 and 431, respectively, by visual inspection of the FPC assemblies. Adjacent clones were chosen based on their reciprocal position and the probability value associated with their overlapping fingerprinted bands as shown by FPC. Sequencing of selected minimum tiling path BACs was done to phase II quality as previously described [73]. Phase II BAC sequences were then assembled into 1003 and 431 contig sequences based on dot plot comparisons and overlap similarity between adjacent clones.

Perl scripts available from the DAWGPAWS package [61,62] were used to convert computational annotation results from multiple sources into a single GFF3 file for combined evidence annotation in Apollo [94] and publication in Gbrowse [95].
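To give a sense of what merging annotation evidence into one GFF3 file entails, here is a minimal Python sketch. It does not reproduce the DAWGPAWS Perl scripts. The record layout, the to_gff3 helper and the example features (IDs, coordinates, scores) are hypothetical. Only the nine tab-separated columns and the "##gff-version 3" header line come from the GFF3 format itself.

```python
# Hypothetical records standing in for parsed output from several tools.
# Coordinates, scores and IDs are invented for illustration.
predictions = [
    {"seqid": "contig431", "source": "FGENESH", "type": "gene",
     "start": 1200, "end": 4800, "score": None, "strand": "+",
     "attrs": {"ID": "fgenesh_gene1"}},
    {"seqid": "contig431", "source": "BLASTX", "type": "match",
     "start": 1350, "end": 2100, "score": 1e-20, "strand": "+",
     "attrs": {"ID": "blastx_hit1", "Name": "example_protein"}},
    {"seqid": "contig431", "source": "AUGUSTUS", "type": "gene",
     "start": 900, "end": 4750, "score": 0.92, "strand": "+",
     "attrs": {"ID": "augustus_gene1"}},
]


def to_gff3(records):
    """Render feature records as GFF3 text, sorted by sequence and start."""
    lines = ["##gff-version 3"]
    ordered = sorted(records, key=lambda r: (r["seqid"], r["start"], r["end"]))
    for rec in ordered:
        score = "." if rec["score"] is None else str(rec["score"])
        attributes = ";".join(f"{k}={v}" for k, v in rec["attrs"].items())
        columns = [
            rec["seqid"], rec["source"], rec["type"],
            str(rec["start"]), str(rec["end"]),
            score, rec["strand"],
            ".",            # phase: only meaningful for CDS features
            attributes,
        ]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


gff_text = to_gff3(predictions)
print(gff_text, end="")
```

Keeping each tool's name in the source column lets a curator view the evidence from each program as a separate track in an annotation editor.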
Ab initio gene annotation programs used in this process included FGENESH [63], AUGUSTUS [64], SNAP [65], GeneID [66] and GenScan [67]. Because Amborella-specific gene model parameterizations were not available for these programs, multiple plant models were used for each ab initio program. The sequence of the entire contig was BLASTx (e < 1e-5) searched against gene annotations from Arabidopsis [70], Vitis [45], Z. mays [56], Medicago [71], Oryza [72], and Sorghum [55] as well as tBLASTx (e < 1e-5) searched against a database of comprehensive Amborella transcript assemblies [25]. In addition, Amborella EST sequences (reads and assemblies; Table 4) were splice-aligned to the contigs using GMAP (Genomic Mapping and Alignment Program) [68] with the PASA (Program to Assemble Spliced Alignments) genome annotation tool [69]. The ab initio gene predictions and BLAST search results were manually combined into gene models using the Apollo genome annotation curation tool [94].

Synteny analysis of sequenced BAC contigs with Vitis and Oryza genomes
Sequenced Amborella BAC contigs 431 (487,318 bp) and 1003 (629,678 bp) were compared to the International Rice Genome Sequencing Project (IRGSP) rice genome assembly (version 5) and the Genoscope 12X Vitis genome assembly using LASTZ and default parameters. Prior to LASTZ comparisons, all genomic sequences were masked using NCBI's WindowMasker to remove simple repeats. Significant matches after repeat masking were visualized as dot plots. Gene annotations for the rice and Vitis genomes were obtained from the Rice Annotation Project [96] and Genoscope [97], respectively, and plotted on the vertical axes of the dot plots (Figures 5 and 6). FGENESH [63] annotations for the Amborella contigs were included on the horizontal axes of the dot plots. LASTZ scores were summed for all aligned Amborella-rice or Amborella-Vitis blocks within 100 kb of each other in sequenced genomes. All regions with summed scores > 100,000 were considered syntenic and included in Figures 5 and 6.

Phylogenetic analysis
All alignments were carried out using the program MUSCLE [79] run under default settings. Maximum likelihood analyses were run on aligned DNA and amino acid sequences using RAxML [80] and the GTRGAMMA nucleotide substitution model.

Submission of data to GenBank databases
BESs (HR616970 to HR686434), full-length BAC sequences (AC243594.1 to AC243606.1), Sanger shotgun sequences (HR614237 to HR616931), 454 shotgun sequences (SRP006044), Sanger ESTs (FD425831.1 to FD443502.1) and 454 cDNA sequences (SRX018174, SRX018165, SRX018164, SRX018163, SRX018157, SRX018156) have been deposited in the appropriate NCBI GenBank sequence databases. All sequences are also available at the Ancestral Angiosperm Genome Project website [25].

Additional material
Additional file 1: Supplemental tables and figures cited with additional details for the physical map and shotgun sequences.
Additional file 2: Synteny analysis of Amborella BAC ends and Vitis genes.

Abbreviations
bp: base pair; BAC: bacterial artificial chromosome; BES: BAC end sequence; EST: expressed sequence tag; FISH: fluorescence in situ hybridization; HICF: high information content fingerprinting; LINE: long interspersed element; LTR: long terminal repeat; MITE: miniature inverted repeat transposable element; SSR: simple sequence repeat; TE: transposable element.

Acknowledgements
This work was supported with funding from National Science Foundation grants 0208502, 0638595 and 0922742. We also acknowledge helpful comments and suggestions provided by anonymous reviewers.

Author details
1 Arizona Genomics Institute, School of Plant Sciences and BIO5 Institute for Collaborative Research, University of Arizona, 1657 East Helen Street, Tucson, AZ 85721, USA.
2 Department of Plant Biology, University of Georgia, 4504 Miller Plant Sciences, Athens, GA 30602, USA.
3 Department of Biological Sciences, University of Missouri, 371B Life Sciences Center, Columbia, MO 65211, USA.
4 College of Life Sciences and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China.
5 Intercollege Graduate Degree Program in Plant Biology and Institute of Molecular Evolutionary Genetics, Huck Institutes of the Life Sciences, The Pennsylvania State University, 405 Life Sciences Building, University Park, Pennsylvania 16802, USA.
6 Department of Plant and Microbiology, College of Natural Resources, University of California, 311 Koshland Hall, Berkeley, CA 94709, USA.
7 Plant Genome Mapping Laboratory, University of Georgia, 111 Riverbend Road, Athens, GA 30605, USA.
8 School of Plant Sciences and BIO5, University of Arizona, 1657 East Helen Street, Tucson, AZ 85721, USA.
9 Dow Agrosciences LLC, 9330 Zionsville Road, Indianapolis, IN 46268, USA.
10 Department of Biology, University of Florida, 220 Bartram Hall, Gainesville, FL 32611, USA.
11 Florida Museum of Natural History, Museum Road and Newell Drive, University of Florida, Gainesville, FL 32611, USA.
12 Department of Biological Sciences, University at Buffalo (SUNY), 637 Hochstetter Hall, Buffalo, NY 14260, USA.
13 State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Plant Biology, Center for Evolutionary Biology, and Institutes of Biomedical Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China.
14 Northern Lights, 4500 NE 40th Street, Seattle, WA 98105, USA.
15 Department of Botany and Plant Pathology, Purdue University, B028 Whistler Hall, West Lafayette, IN 47906, USA.
16 School of Forest Resources, The Pennsylvania State University, 323 Forest Resources Building, University Park, PA 16802, USA.
17 Clemson University Genomics Institute, Clemson University, 51 Cherry St, Clemson, SC 29634, USA.
Authors' contributions
JLM, AZ, RAW and CWD designed and coordinated the study. The Amborella BAC library was constructed and characterized in the Arizona Genomics Institute (AGI) by DK, YY, KC, JLG, AS and ML. cDNA library production and sequencing was performed by AC at the University of Florida and assemblies were performed by SA. Funding for BAC library construction was obtained by DM, JB, JEC, JT, CWD and RAW. Comparative analyses were performed by AZ, AS, JEB, JCE, JD, HT, SR, AHP, DES, PSS, VAA, HM, CWD and JL-M. Fluorescence in situ hybridizations were performed by ZX and JCP. BAC contig annotations were performed by JEB, JCE, SC, BB and JLM. AZ and JLM wrote the first draft of the manuscript and all authors contributed to refinement.

Received: 21 November 2010 Revised: 19 May 2011 Accepted: 27 May 2011 Published: 27 May 2011

References
1. Cantino P, Doyle JJ, Graham S, Judd W, Olmstead R, Soltis D, Soltis P, Donoghue MJ: Towards a phylogenetic nomenclature of Tracheophyta. Taxon 2007, 56:822-846.
2. Leebens-Mack JH, Wall PK, Duarte J, Zheng Z, Oppenheimer D, dePamphilis CW: A genomics approach to the study of floral developmental genetics: strengths and limitations. Adv Bot Res 2006, 44:527-549.
3. Soltis DE, Albert VA, Leebens-Mack J, Palmer JD, Wing RA, dePamphilis CW, Ma H, Carlson JE, Altman N, Kim S, Wall PK, Zuccolo A, Soltis PS: The Amborella genome: an evolutionary reference for plant biology. Genome Biol 2008, 9:402.
4. Mathews S, Donoghue MJ: The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 1999, 286:947-950.
5. Qiu YL, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Zimmer EA, Chen Z, Savolainen V, Chase MW: The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 1999, 402:404-407.
6. Soltis PS, Soltis DE, Chase MW: Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 1999, 402:402-404.
7. Jansen RK, Cai Z, Raubeson LA, Daniell H, dePamphilis CW, Leebens-Mack J, Muller KF, Guisinger-Bellian M, Haberle RC, Hansen AK, Chumley TW, Lee SB, Peery R, McNeal JR, Kuehl JV, Boore JL: Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci USA 2007, 104:19369-19374.
8. Moore MJ, Bell CD, Soltis PS, Soltis DE: Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc Natl Acad Sci USA 2007, 104:19363-19368.
9. Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP, Grutzner F, Belov K, Miller W, Clarke L, Chinwalla AT, Yang SP, Heger A, Locke DP, Miethke P, Waters PD, Veyrunes F, Fulton L, Fulton B, Graves T, Wallis J, Puente XS, Lopez-Otin C, Ordonez GR, Eichler EE, Chen L, Cheng Z, Deakin JE, Alsop A, Thompson K, Kirby P, et al: Genome analysis of the platypus reveals unique signatures of evolution. Nature 2008, 453:175-183.
10. Albert VA, Soltis DE, Carlson JE, Farmerie WG, Wall PK, Ilut DC, Solow TM, Mueller LA, Landherr LL, Hu Y, Buzgo M, Kim S, Yoo MJ, Frohlich MW, Perl-Treves R, Schlarbaum SE, Bliss BJ, Zhang X, Tanksley SD, Oppenheimer DG, Soltis PS, Ma H, dePamphilis CW, Leebens-Mack JH: Floral gene resources from basal angiosperms for comparative genomics research. BMC Plant Biol 2005, 5:5.
11. Kim S, Koh J, Yoo MJ, Kong H, Hu Y, Ma H, Soltis PS, Soltis DE: Expression of floral MADS-box genes in basal angiosperms: implications for the evolution of floral regulators. Plant J 2005, 43:724-744.
12. Soltis DE, Chanderbali AS, Kim S, Buzgo M, Soltis PS: The ABC model and its applicability to basal angiosperms. Ann Bot 2007, 100:155-163.
13. Vialette-Guiraud AC, Adam H, Finet C, Jasinski S, Jouannic S, Scutt CP: Insights from ANA-grade angiosperms into the early evolution of CUP-SHAPED COTYLEDON genes. Ann Bot 2011, 107:1511-1519.
14. Fourquin C, Vinauger-Douard M, Chambrier P, Berne-Dedieu A, Scutt CP: Functional conservation between CRABS CLAW orthologues from widely diverged angiosperms. Ann Bot 2007, 100:651-657.
15. Fourquin C, Vinauger-Douard M, Fogliani B, Dumas C, Scutt CP: Evidence that CRABS CLAW and TOUSLED have conserved their roles in carpel development since the ancestor of the extant angiosperms. Proc Natl Acad Sci USA 2005, 102:4649-4654.
16. Zahn LM, Kong H, Leebens-Mack JH, Kim S, Soltis PS, Landherr LL, Soltis DE, dePamphilis CW, Ma H: The evolution of the SEPALLATA subfamily of MADS-box genes: a preangiosperm origin with multiple duplications throughout angiosperm history. Genetics 2005, 169:2209-2223.
17. Zahn LM, Leebens-Mack J, dePamphilis CW, Ma H, Theissen G: To B or Not to B a flower: the role of DEFICIENS and GLOBOSA orthologs in the evolution of the angiosperms. J Hered 2005, 96:225-240.
18. Zahn LM, Leebens-Mack JH, Arrington JM, Hu Y, Landherr LL, dePamphilis CW, Becker A, Theissen G, Ma H: Conservation and divergence in the AGAMOUS subfamily of MADS-box genes: evidence of independent sub- and neofunctionalization events. Evol Dev 2006, 8:30-45.
19. Shan H, Zahn L, Guindon S, Wall PK, Kong H, Ma H, dePamphilis CW, Leebens-Mack J: Evolution of plant MADS box transcription factors: evidence for shifts in selection associated with early angiosperm diversification and concerted gene duplications. Mol Biol Evol 2009, 26:2229-2244.
20. Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, Albert VA, Ma H, dePamphilis CW: Widespread genome duplications throughout the history of flowering plants. Genome Res 2006, 16:738-749.
21. Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K: The flowering world: a tale of duplications. Trends Plant Sci 2009, 14:680-688.
22. Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg LH: The frequency of polyploid speciation in vascular plants. Proc Natl Acad Sci USA 2009, 106:13875-13879.
23. De Bodt S, Maere S, Van de Peer Y: Genome duplication and the origin of angiosperms. Trends Ecol Evol 2005, 20:591-597.
24. Soltis DE, Albert VA, Leebens-Mack J, Bell CD, Paterson AH, Zheng C, Sankoff D, dePamphilis CW, Wall PK, Soltis PS: Polyploidy and angiosperm diversification. Am J Bot 2009, 96:336-348.
25. Ancestral Angiosperm Genome Project. [http://ancangio.uga.edu/].
26. Lyons E, Pedersen B, Kane J, Alam M, Ming R, Tang H, Wang X, Bowers J, Paterson A, Lisch D, Freeling M: Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol 2008, 148:1772-1781.
27. Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH: Synteny and collinearity in plant genomes. Science 2008, 320:486-488.
28. Tang H, Bowers JE, Wang X, Paterson AH: Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad Sci USA 2010, 107:472-477.
29. Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH: Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res 2008, 18:1944-1954.

30. Leitch I, Hanson L: DNA C-values in seven families fill phylogenetic gaps in the basal angiosperms. Bot J Linn Soc 2002, 140:175-179.
31. Arizona Genome Institute. [http://www.genome.arizona.edu/orders/direct.html?library=AT_SBa].
32. Luo MC, Thomas C, You FM, Hsiao J, Ouyang S, Buell CR, Malandro M, McGuire PE, Anderson OD, Dvorak J: High-throughput fingerprinting of bacterial artificial chromosomes using the snapshot labeling kit and sizing of restriction fragments by capillary electrophoresis. Genomics 2003, 82:378-389.
33. Nelson WM, Bharti AK, Butler E, Wei F, Fuks G, Kim H, Wing RA, Messing J, Soderlund C: Whole-genome validation of high-information-content fingerprinting. Plant Physiol 2005, 139:27-38.
34. Soderlund C, Humphray S, Dunham A, French L: Contigs built with fingerprints, markers, and FPC V4.7. Genome Res 2000, 10:1772-1787.
35. Wall PK, Leebens-Mack J, Muller KF, Field D, Altman NS, dePamphilis CW: PlantTribes: a gene and gene family resource for comparative genomics in plants. Nucleic Acids Res 2008, 36:D970-976.
36. Duarte JM, Wall PK, Edger PP, Landherr LL, Ma H, Pires JC, Leebens-Mack J, dePamphilis CW: Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evol Biol 2010, 10:61.
37. Goremykin VV, Hirsch-Ernst KI, Wolfl S, Hellwig FH: Analysis of the Amborella trichopoda chloroplast genome sequence suggests that amborella is not a basal angiosperm. Mol Biol Evol 2003, 20:1499-1505.
38. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 2005, 110:462-467.
39. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403-410.
40. Leebens-Mack J, Raubeson LA, Cui L, Kuehl JV, Fourcade MH, Chumley TW, Boore JL, Jansen RK, dePamphilis CW: Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one's way out of the Felsenstein zone. Mol Biol Evol 2005, 22:1948-1963.
41. Magallon S: Using fossils to break long branches in molecular dating: a comparison of relaxed clocks applied to the origin of angiosperms. Syst Biol 2010, 59:384-399.
42. Smith SA, Beaulieu JM, Donoghue MJ: An uncorrelated relaxed-clock analysis suggests an earlier origin for flowering plants. Proc Natl Acad Sci USA 2010, 107:5897-5902.
43. Baucom RS, Estill JC, Chaparro C, Upshaw N, Jogi A, Deragon JM, Westerman RP, Sanmiguel PJ, Bennetzen JL: Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genet 2009, 5:e1000732.
44. International Rice Genome Sequencing Project: The map-based sequence of the rice genome. Nature 2005, 436:793-800.
45. Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyere C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, et al: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 2007, 449:463-467.
46. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, et al: Genome sequence of the palaeopolyploid soybean. Nature 2010, 463:178-183.
47. Vershinin AV, Druka A, Alkhimova AG, Kleinhofs A, Heslop-Harrison JS: LINEs and gypsy-like retrotransposons in Hordeum species. Plant Mol Biol 2002, 49:1-14.
48. Leeton PR, Smyth DR: An abundant LINE-like element amplified in the genome of Lilium speciosum. Mol Gen Genet 1993, 237:97-104.
49. Chen Y, Zhou F, Li G, Xu Y: MUST: a system for identification of miniature inverted-repeat transposable elements and applications to Anabaena variabilis and Haloquadratum walsbyi. Gene 2009, 436:1-7.
50. Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF: Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res 2006, 16:1252-1261.
51. Schmidt T: LINEs, SINEs and repetitive DNA: non-LTR retrotransposons in plant genomes. Plant Mol Biol 1999, 40:903-910.
52. Kurtz S, Narechania A, Stein JC, Ware D: A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics 2008, 9:517.
53. KEW C-Value Database. [http://data.kew.org/cvalues/].
54. Bowers JE, Arias MA, Asher R, Avise JA, Ball RT, Brewer GA, Buss RW, Chen AH, Edwards TM, Estill JC, Exum HE, Goff VH, Herrick KL, Steele CL, Karunakaran S, Lafayette GK, Lemke C, Marler BS, Masters SL, McMillan JM, Nelson LK, Newsome GA, Nwakanma CC, Odeh RN, Phelps CA, Rarick EA, Rogers CJ, Ryan SP, Slaughter KA, Soderlund CA, et al: Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses. Proc Natl Acad Sci USA 2005, 102:13206-13211.
55. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, et al: The Sorghum bicolor genome and the diversification of grasses. Nature 2009, 457:551-556.
56. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, et al: The B73 maize genome: complexity, diversity, and dynamics. Science 2009, 326:1112-1115.
57. Bell CD, Soltis DE, Soltis P: The age and diversification of angiosperms re-revisited. Am J Bot 2010, 97:1296-1303.
58. Bowers JE, Chapman BA, Rong J, Paterson AH: Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 2003, 422:433-438.
59. Paterson AH, Bowers JE, Chapman BA: Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA 2004, 101:9903-9908.
60. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, Cunningham R, et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:1596-1604.
61. Estill JC, Bennetzen JL: The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes. Plant Methods 2009, 5:8.
62. DAWGPAWS. [http://dawgpaws.sourceforge.net].
63. FGENESH. [http://softberry.com].
64. Stanke M, Schoffmann O, Morgenstern B, Waack S: Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 2006, 7:62.
65. Korf I: Gene finding in novel genomes. BMC Bioinformatics 2004, 5:59.
66. Blanco E, Abril JF: Computational gene annotation in new genome assemblies using GeneID. Methods Mol Biol 2009, 537:243-261.
67. Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol 1997, 268:78-94.
68. Wu TD, Watanabe CK: GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 2005, 21:1859-1875.
69. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 2003, 31:5654-5666.
70. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, Radenbaugh A, Singh S, Swing V, Tissier C, Zhang P, Huala E: The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res 2008, 36:D1009-1014.
71. Cannon SB, Sterck L, Rombauts S, Sato S, Cheung F, Gouzy J, Wang X, Mudge J, Vasdewani J, Schiex T, Spannagl M, Monaghan E, Nicholson C, Humphray SJ, Schoof H, Mayer KF, Rogers J, Quetier F, Oldroyd GE, Debelle F, Cook DR, Retzel EF, Roe BA, Town CD, Tabata S, Van de Peer Y, Young ND: Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc Natl Acad Sci USA 2006, 103:14959-14964.

72. Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, Antonio BA, Aono H, Apweiler R, Bruskiewich R, Bureau T, Burr F, Costa de Oliveira A, Fuks G, Habara T, Haberer G, Han B, Harada E, Hiraki AT, Hirochika H, Hoen D, Hokari H, Hosokawa S, Hsing YI, Ikawa H, Ikeo K, Imanishi T, Ito Y, Jaiswal P, Kanno M, et al: Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res 2007, 17:175-183.
73. International Rice Genome Sequencing Project: The map-based sequence of the rice genome. Nature 2005, 436:793-800.
74. Harris RS: Improved pairwise alignment of genomic DNA. PhD Thesis Pennsylvania State University, Biology Department; 2007.
75. Miller Lab Software. [http://www.bx.psu.edu/miller_lab/].
76. Li L, Stoeckert CJ Jr, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 2003, 13:2178-2189.
77. Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V: PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res 2008, 36:D959-965.
78. PlantGDB. [http://www.plantgdb.org/].
79. Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 2004, 5:113.
80. Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 2006, 22:2688-2690.
81. Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL, Salzberg SL, Feng L, Jones MR, Skelton RL, Murray JE, Chen C, Qian W, Shen J, Du P, Eustice M, Tong E, Tang H, Lyons E, Paull RE, Michael TP, Wall K, Rice DW, Albert H, Wang ML, Zhu YJ, et al: The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 2008, 452:991-996.
82. Luo M, Wing RA: An improved method for plant BAC library construction. Methods Mol Biol 2003, 236:3-20.
83. Kim H, SanMiguel P, Nelson W, Collura K, Wissotski M, Walling JG, Kim JP, Jackson SA, Soderlund C, Wing RA: Comparative physical mapping between Oryza sativa (AA genome type) and O. punctata (BB genome type). Genetics 2007, 176:379-390.
84. Ammiraju JS, Luo M, Goicoechea JL, Wang W, Kudrna D, Mueller C, Talag J, Kim H, Sisneros NB, Blackmon B, Fang E, Tomkins JB, Brar D, MacKill D, McCouch S, Kurata N, Lambert G, Galbraith DW, Arumuganathan K, Rao K, Walling SJ, Gill N, Yu Y, SanMiguel P, Soderlund C, Jackson S, Wing RA: The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res 2006, 16:140-147.
85. Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998, 8:175-185.
86. Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics 2001, 17:1093-1104.
87. Zuccolo A, Sebastian A, Talag J, Yu Y, Kim H, Collura K, Kudrna D, Wing RA: Transposable element distribution, abundance and role in genome size variation in the genus Oryza. BMC Evol Biol 2007, 7:152.
88. SeqClean. [http://sourceforge.net/projects/seqclean/].
89. Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WE, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res 2004, 14:1147-1159.
90. Sonnhammer EL, Durbin R: A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 1995, 167:GC1-10.
91. Sputnik. [http://espressosoftware.com/sputnik/index.html].
92. Morgante M, Hanafey M, Powell W: Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 2002, 30:194-200.
93. Xiong Z, Kim JS, Pires JC: Integration of genetic, physical, and cytogenetic maps for Brassica rapa chromosome A7. Cytogenet Genome Res 2010, 129:190-198.
94. Lee E, Harris N, Gibson M, Chetty R, Lewis S: Apollo: a community resource for genome annotation editing. Bioinformatics 2009, 25:1836-1837.
95. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S: The generic genome browser: a building block for a model organism system database. Genome Res 2002, 12:1599-1610.
96. Rice Annotation Project. [http://rapdb.dna.affrc.go.jp/].
97. Grape Genome Browser. [http://www.genoscope.cns.fr/externe/GenomeBrowser/Vitis/].

doi:10.1186/gb-2011-12-5-r48
Cite this article as: Zuccolo et al.: A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure. Genome Biology 2011, 12:R48.

