
Whole Genome Based Nucleotide Diversity

# pairwise comparisons | 55
# sites considered | 1,495,930
# mismatches | 22,539,297
π | 0.273946909280514
5% π | 0.0136973454640257
# mismatches caused by 1 sequencing error | 10
# mismatches needed to change π by 5% | 1,126,965
# sequencing errors needed | 112,696
Bermuda model | 0.9999
# sequencing errors assuming Bermuda model | 150
change in π | 1.82E-05
% change in π | 0.01%
genome size (# sites considered) needed to create 5% change in π | 1.13E+09
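The arithmetic behind these figures can be retraced with a short script. This is a minimal sketch, not the authors' own calculation: it assumes that the 55 pairwise comparisons come from 11 genomes, that one sequencing error in one genome adds one mismatch to each of its 10 comparisons, and that the "Bermuda model" value is a per-base accuracy of 0.9999. All variable names are illustrative.

```python
from math import comb

# Values copied from the table; N_GENOMES is inferred from 55 = C(11, 2).
N_GENOMES = 11
SITES = 1_495_930
MISMATCHES = 22_539_297
BERMUDA_ACCURACY = 0.9999  # assumed to mean per-base accuracy

pairs = comb(N_GENOMES, 2)

# Nucleotide diversity: mismatches per site per pairwise comparison.
pi = MISMATCHES / (pairs * SITES)

# One erroneous base in one genome disagrees with the other 10 genomes.
mismatches_per_error = N_GENOMES - 1

# Number of sequencing errors that would shift pi by 5%.
mismatches_for_5pct = 0.05 * MISMATCHES
errors_for_5pct = mismatches_for_5pct / mismatches_per_error

# Errors expected at the assumed accuracy, and their effect on pi.
error_rate = 1 - BERMUDA_ACCURACY
expected_errors = SITES * error_rate
delta_pi = expected_errors * mismatches_per_error / (pairs * SITES)

# Number of sites that would be needed for such errors to shift pi by 5%.
sites_for_5pct = errors_for_5pct / error_rate

print(f"pi = {pi:.6f}")
print(f"errors needed for a 5% change = {errors_for_5pct:.0f}")
print(f"expected errors = {expected_errors:.0f}, change in pi = {delta_pi:.2e}")
print(f"sites needed for a 5% change = {sites_for_5pct:.2e}")
```

Under these assumptions the expected number of sequencing errors is roughly three orders of magnitude below the number needed to move π by 5%.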




Additional file 3 Estimates for Genome Sizes (in Mbp).

Species | Sequence in contigs (> 200 bp) | Sequence in contigs bp) | Genome size (sequence estimate) | Optical map size | Genome size (map estimate) | Scaffold size
Y. aldovae | 4.29 | 0.03 | 4.33 | AflII: 4.30 | AflII: 4.22 | AflII: 4.22
Y. bercovieri | 4.32 | 0.11 | 4.37 | AflII: 4.54, NheI: 4.50 | AflII: 4.19, NheI: 4.24 | AflII: 4.51, NheI: 4.52
Y. frederiksenii | 4.87 | 0.12 | 4.90 | AflII: 5.34, NheI: 5.40 | AflII: 4.96, NheI: 4.88 | AflII: 5.31, NheI: 5.30
Y. intermedia | 4.69 | 0.13 | 4.71 | AflII: 4.95, NheI: 5.07 | AflII: 4.74, NheI: 4.56 | AflII: 5.03, NheI: 5.00
Y. kristensenii | 4.65 | 0.04 | 4.77 | AflII: 4.63 | AflII: 4.46 | AflII: 4.65
Y. mollaretii | 4.54 | 0.16 | 4.57 | AflII: 4.93, NheI: 4.92 | AflII: 4.75, NheI: 4.57 | AflII: 4.88, NheI: 4.86
Y. rohdei | 4.31 | 0.02 | 4.34 | AflII: 4.65, NheI: 4.65 | AflII: 4.30, NheI: 4.20 | AflII: 4.56, NheI: 4.55
Y. ruckeri | 3.73 | 0.03 | 3.79 | AflII: 3.90, NheI: 3.96 | AflII: 3.76, NheI: 3.58 | AflII: 3.89, NheI: 3.85

was used to compute the sequence estimate. Contigs mapped onto the optical map were used to estimate the expansion in optical map size and this was used to compute the map estimate of genome size.
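As a rough illustration of the expansion correction described in the note above, the implied expansion of each optical map can be back-calculated from the two map columns. This is a sketch under an assumption that the source does not state explicitly: that the map estimate equals the optical map size divided by a single expansion factor. The function name and the choice of rows are illustrative.

```python
# (optical map size, map-based genome size estimate) in Mbp, copied from the table.
rows = {
    ("Y. aldovae", "AflII"): (4.30, 4.22),
    ("Y. ruckeri", "AflII"): (3.90, 3.76),
    ("Y. ruckeri", "NheI"): (3.96, 3.58),
}


def implied_expansion(map_size, map_estimate):
    """Expansion factor, assuming map_estimate = map_size / expansion."""
    return map_size / map_estimate


# Percentage by which each optical map exceeds its corrected estimate.
expansion_pct = {
    key: 100 * (implied_expansion(*sizes) - 1) for key, sizes in rows.items()
}

for (species, enzyme), pct in expansion_pct.items():
    print(f"{species} {enzyme}: optical map about {pct:.1f}% larger than map estimate")
```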




[Gel image]
Lanes, Yersiniae: 1. Y. aldovae; 2. Y. bercovieri; 3. Y. frederiksenii; 4. Y. intermedia; 5. Y. kristensenii; 6. Y. mollaretii; 7. Y. rohdei; 8. Y. ruckeri
Lanes, E. coli: 1. Y1088; 2. C600/P1; 3. GM119/pRK2/pINT; 4. INV110
Size markers: 48 kb, 20 kb, 12 kb, 5 kb, 4 kb, 3 kb, 5.8 kb, 60 kb, 93 kb; 5 kb ladder; 1 kb ladder

Legend: Pulsed-field gel electrophoresis (PFGE) of plasmid preparations from Yersiniae strains and known E. coli strains carrying different sized plasmids. The PFGE gel (1% agarose) was run in 0.5X TBE buffer at 14 °C, with a switch time of 1-6 seconds, for 18 hrs at a voltage gradient of 6 V/cm.




RESEARCH Open Access

Genomic characterization of the Yersinia genus

Peter E Chen 1†, Christopher Cook 1†, Andrew C Stewart 1†, Niranjan Nagarajan 2,7, Dan D Sommer 2, Mihai Pop 2, Brendan Thomason 1, Maureen P Kiley Thomason 1, Shannon Lentz 1, Nichole Nolan 1, Shanmuga Sozhamannan 1, Alexander Sulakvelidze 3, Alfred Mateczun 1, Lei Du 4, Michael E Zwick 1,5, Timothy D Read 1,5,6*

Abstract

Background: New DNA sequencing technologies have enabled detailed comparative genomic analyses of entire genera of bacterial pathogens. Prior to this study, three species of the enterobacterial genus Yersinia that cause invasive human diseases (Yersinia pestis, Yersinia pseudotuberculosis, and Yersinia enterocolitica) had been sequenced. However, there were no genomic data on the Yersinia species with more limited virulence potential, frequently found in soil and water environments.

Results: We used high-throughput sequencing-by-synthesis instruments to obtain 25- to 42-fold average redundancy, whole-genome shotgun data from the type strains of eight species: Y. aldovae, Y. bercovieri, Y. frederiksenii, Y. kristensenii, Y. intermedia, Y. mollaretii, Y. rohdei, and Y. ruckeri. The deepest branching species in the genus, Y. ruckeri, causative agent of red mouth disease in fish, has the smallest genome (3.7 Mb), although it shares the same core set of approximately 2,500 genes as the other members of the genus, whose genomes range in size from 4.3 to 4.8 Mb. Yersinia genomes had a similar global partition of protein functions, as measured by the distribution of Cluster of Orthologous Groups families. Genome to genome variation in islands with genes encoding functions such as ureases, hydrogenases and B-12 cofactor metabolite reactions may reflect adaptations to colonizing specific host habitats.

Conclusions: Rapid high-quality draft sequencing was used successfully to compare pathogenic and non-pathogenic members of the Yersinia genus. This work underscores the importance of the acquisition of horizontally transferred genes in the evolution of Y. pestis and points to virulence determinants that have been gained and lost on multiple occasions in the history of the genus.

* Correspondence: tread@emory.edu
† Contributed equally
1 Biological Defense Research Directorate, Naval Medical Research Center, 503 Robert Grant Avenue, Silver Spring, Maryland 20910, USA
Chen et al. Genome Biology 2010, 11:R1 http://genomebiology.com/2010/11/1/R1
© 2010 Chen et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Of the millions of species of bacteria that live on this planet, only a very small percentage cause serious human diseases [1]. Comparative genetic studies are revealing that many pathogens have only recently emerged from protean environmental, commensal or zoonotic populations [2-5]. For a variety of reasons, most research effort has been focused on characterizing these pathogens, while their closely related non-pathogenic relatives have only been lightly studied. As a result, our understanding of the population biology of these clades remains biased, limiting our knowledge of the evolution of virulence and our ability to design reliable assays that distinguish pathogen signatures from the background in the clinic and environment [6].

The recent development of second generation sequencing platforms (reviewed by Mardis [7,8] and Shendure [7,8]) offers an opportunity to change the direction of microbial genomics, enabling the rapid genome sequencing of large numbers of both pathogenic and non-pathogenic strains. Here we describe the deployment of new sequencing technology to extensively sample eight genomes from the Yersinia genus of the family Enterobacteriaceae. The first published sequencing studies on the Yersinia genus have focused exclusively on invasive human disease-causing species that included five Yersinia pestis genome sequences (one of which, strain 91001, is from the avirulent 'microtus' biovar) [9-12], two Yersinia pseudotuberculosis [13,14] and one Yersinia enterocolitica biotype 1B [15]. Primarily a zoonotic pathogen, Y. pestis, the causative agent of bubonic
plague and a category A select agent, is a recently emerged lineage that has since undergone global expansion [2]. Following introduction into a human through flea bite [16], Y. pestis is engulfed by macrophages and taken to the regional lymph nodes. Y. pestis then escapes the macrophages and multiplies to cause a highly lethal bacteremia if untreated with antibiotics. Y. pseudotuberculosis and Y. enterocolitica (primarily biotype 1B) are enteropathogens that cause gastroenteritis following ingestion and translocation of the Peyer's patches. Like Y. pestis, the enteropathogenic Yersiniae can escape macrophages and multiply outside host cells, but unlike their more virulent congener, they usually cause only self-limiting inflammatory diseases.

The generally accepted pathway for the evolution of these more severe disease-causing Yersiniae is memorably encapsulated by the recipe, 'add DNA, stir, reduce' [17]. In each species DNA has been 'added' by horizontal gene transfer in the form of plasmids and genomic islands. All three human pathogens carry a 70-kbp pYV virulence plasmid (also known as pCD), which carries the Ysc type III secretion system and Yops effectors [18-20], that is not detected in non-pathogenic species. Y. pestis also has two additional plasmids, pMT (also known as pFra), containing the F1 capsule-like antigen and murine toxin, and pPla (also known as pPCP1), which carries plasminogen-activating factor, Pla. Y. pestis, Y. pseudotuberculosis, and biotype 1B Y. enterocolitica also contain a chromosomally located, mobile, high-pathogenicity island (HPI) [21]. The HPI includes a cluster of genes for biosynthesis of yersiniabactin, an iron-binding siderophore necessary for systemic infection [22]. 'Stir' refers to intra-genomic change, notably the recent expansion of insertion sequences (IS) within Y. pestis (3.7% of the Y. pestis CO92 genome [9]) and a high level of genome structural variation [23].
'Reduce' describes the loss of functions via deletions and pseudogene accumulation in Y. pestis [9,13] due to shifts in selection pressure caused by the transition from Y. pseudotuberculosis-like enteropathogenicity to a flea-borne transmission cycle. This description of Y. pestis evolution is, of course, oversimplified. Y. pestis strains show considerable diversity at the phenotypic level and there is evidence of acquisition of plasmids and other horizontally transferred genes ([12,24,25]; DNA microarray, [26,27]).

While most attention is focused on the three well known human pathogens, several other, less familiar Yersinia species have been split off from Y. enterocolitica over the past 40 years based on biochemistry, serology and 16S RNA sequence [28,29]. Y. ruckeri is an agriculturally important fish pathogen that is a cause of 'red mouth' disease in salmonid fish. The species has sufficient phylogenetic divergence from the rest of the Yersinia genus to stir controversy about its taxonomic assignment [30]. Y. frederiksenii, Y. kristensenii, Y. intermedia, Y. mollaretii, Y. bercovieri, and Y. rohdei have been isolated from human feces, fresh water, animal feces and intestines and foods [28]. There have been reports associating some of the species with human diarrheal infections [31] and lethality for mice [32]. Y. aldovae is most often isolated from fresh water but has also been cultured from fish and the alimentary tracts of wild rodents [33]. There is no report of isolation of Y. aldovae from human feces or urine [28].

Using microbead-based, massively parallel sequencing by synthesis [34] we rapidly and economically obtained high redundancy genome sequence of the type strains of each of these eight lesser known Yersinia species. From these genome sequences, we were able to determine the core gene set that defines the Yersinia genus and to look for clues to distinguish the genomes of human pathogens from less virulent strains.

Results

High-redundancy draft genome sequences of eight Yersinia species

Whole genome shotgun coverage of eight previously unsequenced Yersinia species (Table 1) was obtained by single-end bead-based pyrosequencing [34] using the 454 Life Sciences GS-20 instrument. Each of the eight genomes was sequenced to a high level of redundancy (between 25 and 44 sequencing reads per base) and assembled de novo into large contigs (Table 2; Additional file 1). Excluding contigs that covered repeat regions and therefore had significantly increased copy number, the quality of the sequence of the draft assemblies was high, with less than 0.1% of the sequence of each genome having a consensus quality score [35] less than 40. Moreover, a more recent assessment of quality of GS-20 data suggests that the scores generated by the 454 Life Sciences software are an underestimation of the true sequence quality [36]. The most common sequencing error encountered when assembling pyrosequencing data is the rare calling of incorrect numbers of homopolymers caused by variation in the intensity of fluorescence emitted upon extension with the labeled nucleoside [34].

Previous studies and our experience suggest that at this level of sequence coverage the assembly gaps fall in repeat regions that cannot be spanned by single-end sequence reads (average length 109 nucleotides in this study) [34]. Fewer RNA genes are observed compared to published Yersinia genomes finished using traditional Sanger sequencing technology (Additional file 1), reflecting the greater difficulty of uniquely assembling repetitive sequences with single-end reads. We assessed the quality of our assemblies using metrics implemented in
the amosvalidate package [37]. Specifically, we focused on three measures frequently correlated with assembly errors: density of polymorphisms within assembled reads, depth of coverage, and breakpoints in the alignment of unassembled reads to the final assembly. Regions in each genome where at least one measure suggested a possible mis-assembly were validated by manual inspection (Additional file 2). Many of the suspect regions corresponded to collapsed repeats, where the location of individual members of the repeat family within the genome could not be accurately determined. Based on the results of the amosvalidate analysis and the optical map alignment we found no evidence of mis-assemblies leading to chimeric contigs in the eight genomes we sequenced. Genomic regions flagged by the amosvalidate package are made available in GFF format (compatible with most genome browsers) in Additional file 3.

Genome sizes were estimated initially as the sum of the sizes of the contigs from the shotgun assembly, with corrections for contigs representing collapsed repeats (Table 2). We also derived an independent estimate for the genome size from the whole-genome optical restriction mapping of the species [38] (Additional file 4).
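The sequence-based size estimate can be illustrated with a small sketch. The text says only that contig lengths were summed with corrections for collapsed repeats; the depth-based copy-number rule below (a contig's copy number is its read depth divided by the median depth, rounded) is an assumption made for illustration, and the function name and toy contigs are hypothetical.

```python
from statistics import median


def estimate_genome_size(contigs):
    """Sum contig lengths, counting likely collapsed repeats more than once.

    contigs: list of (length_bp, mean_read_depth) tuples.
    Assumption: a contig whose depth is about k times the median depth
    represents k collapsed copies of a repeat.
    """
    typical_depth = median(depth for _, depth in contigs)
    total = 0
    for length, depth in contigs:
        copies = max(1, round(depth / typical_depth))
        total += length * copies
    return total


# Toy assembly: three unique contigs and one short contig at triple depth.
toy_contigs = [(100_000, 30.0), (80_000, 31.0), (50_000, 29.0), (2_000, 90.0)]

naive_size = sum(length for length, _ in toy_contigs)
corrected_size = estimate_genome_size(toy_contigs)
print(naive_size, corrected_size)
```

In the toy data the high-depth contig is counted three times, so the corrected estimate exceeds the plain sum of contig lengths.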

Table 1 Strains sequenced in this study
Species | ATCC number | Other designations | Year isolated | Location isolated | Description | Optimum growth temperature | Reference
Y. aldovae | 35236T | CNY 6065 | NR | Czechoslovakia | Drinking water | 26 °C | [100]
Y. bercovieri | 43970T | CDC 2475-87 | NR | France | Human stool | 26 °C | [101]
Y. frederiksenii | 33641T | CDC 1461-81, CIP 80-29 | NR | Denmark | Sewage | 26 °C | [102]
Y. intermedia | 29909T | CIP 80-28 | NR | NR | Human urine | 37 °C | [103]
Y. kristensenii | 33638T | CIP 80-30 | NR | NR | Human urine | 26 °C | [104]
Y. mollaretii | 43969T | CDC 2465-87 | NR | USA | Soil | 26 °C | [101]
Y. rohdei | 43380T | H271-36/78, CDC 3022-85 | 1978 | Germany | Dog feces | 26 °C | [105]
Y. ruckeri | 29473T | 2396-61 | 1961 | Idaho, USA | Rainbow trout (Oncorhynchus mykiss) with red mouth disease | 26 °C | [67]
NR, not reported in reference publication.

Table 2 Genomes summary
Species | Type strain | NCBI project ID | GenBank accession number | Total reads | Number of contigs >500 nt | Total length of large contigs | % large contigs

Alignment of contigs to the optical maps [39] suggested that the optical maps consistently overestimated sizes (2 to 10% on average). After correction, the map-based estimates and sequence-based estimates agreed well (within 7%). Two species, Y. aldovae (4.22 to 4.33 Mbp) and Y. ruckeri (3.58 to 3.89 Mbp), have a substantially reduced total genome size compared with the 4.6 to 4.8 Mbp seen in the genus generally. The agreement between the optical maps and sequence-based estimates of genome sizes tallied with experimental evidence for the lack of large plasmids in the sequenced genomes (Additional file 5). A screen for matches to known
plasmid genes produced only a few candidate plasmid contigs, totaling less than 10 kbp of sequence in each genome.

The number of IS elements per genome for the eight species (12 to 167 matches) discovered using the ISfinder database [40] was much lower than in the Y. pestis genome (1,147 matches; copy number estimates took into account the possibility of mis-assembly and were accordingly adjusted; see Methods). Furthermore, the non-pathogenic species with the most IS matches, namely Y. bercovieri (167 matches), Y. aldovae (143 matches) and Y. ruckeri (136 matches), have comparatively smaller genomes. We also searched for novel repeat families using a de novo repeat-finder [41] and collected a non-redundant set of 44 repeat sequence families in the Yersinia genus (Table 3; Additional file 6). Interestingly, the well-known ERIC element [42] was recovered by our de novo search and was found to be present in many copies in all the pathogenic species, but was relatively rare in the non-pathogenic ones. On the other hand, a similar and recently discovered element, YPAL [43] (also recovered by the de novo search), was abundant in all the Yersinia genomes except the fish pathogen Y. ruckeri. Insertion sequence IS1541C in the ISfinder database, which has expanded in Y. pestis (to more than 60 copies), had only a handful of strong matches in Y. enterocolitica, Y. pseudotuberculosis, and Y. bercovieri and no discernible matches in the other Yersinia genomes.

Table 3 Distribution of common repeat sequences
Repeat families (columns, in order): ERIC (127 bp), YPAL (167 bp), Kristensenii 39 (142 bp), IS1541C (708 bp), Aldovae3 (154 bp)
E. coli 03505
Y. pestis 5443336138
Y. pseudotuberculosis 555229536
Y. enterocolitica 63144100375
Y. aldovae 68446040
Y. bercovieri 9456913
Y. frederiksenii 057605
Y. intermedia 29148043
Y. kristensenii 29970059
Y. mollaretii 66226020
Y. rohdei 037807
Y. ruckeri 452002
Three of the repeat sequences found using de novo searches matched the known repeat elements ERIC, YPAL, and IS1541C and are identified as such. Kristensenii 39 and Aldovae3 are elements found from de novo searches in the Y. kristensenii and Y. aldovae genomes, respectively.

New Yersinia genome data reduce the pool of unique detection targets for Y. pestis and Y. enterocolitica

The sequences generated in this study provide new background information for validating genus detection and diagnosis assays targeting pathogenic members of the Yersinia genus. The assay design process commonly starts by computationally identifying genomic regions that are unique to the targeted genus ('signatures'); an ideal signature is shared by all targeted pathogens but not found in a background comprising non-pathogenic near neighbors or in other unrelated microbes. While many pathogens are well characterized at the genomic level, the background set is only sparsely represented in genomic databases, thereby limiting the ability to computationally screen out non-specific candidate assays (false positives). As a result, many assays may fail experimental field tests, thereby increasing the costs of assay development efforts. To evaluate whether the new genomic sequences generated in our study can reduce the incidence of false positives in assay development, we computed signatures for the Y. pestis and Y. enterocolitica species using the Insignia pipeline [44], the system previously used to successfully develop assays for the detection of V. cholerae [44]. We identified 171 and 100 regions within the genomes of Y. pestis and Y. enterocolitica, respectively, that represent good candidates for the design of detection assays. In Y. pestis these regions tended to cluster around the origin of replication, whereas in Y. enterocolitica there was a more even distribution. The average G+C content of the regions for the unique sequences in both species was close to the Yersinia average (47%) and there was not a strong association with putative genome islands (Additional files 7, 8, 9, 10, 11, 12, [45]). For both species, most regions overlapped predicted genes (161 of 171 (94%) and 96 of 100 (96%) in Y. pestis and Y. enterocolitica, respectively). Interestingly, 171 Y. pestis gene regions were spread over only 70 different genes, whereas the 96 Y. enterocolitica regions were found overlapping only 90 genes. There was no obvious trend in the nature of the genes harboring these putative signals except that many could be arguably classed as 'non-core' functions,
encoding phage endonucleases, invasins, hemolysins and hypothetical proteins.

Ten Y. pestis-specific and 31 Y. enterocolitica-specific putative signatures have significant matches in the new genome sequence data (Additional files 7, 8, 9, 10), indicating assays designed within these regions would result in false positive results. This result underscores the need for a further sampling of genomes of the Yersinia genus in order to assist the design of diagnostic assays.

Yersinia whole-genome comparisons

We performed a multiple alignment of the 11 Yersinia species using the MAUVE algorithm [46] (from here on Y. pestis CO92 and Y. pseudotuberculosis IP32953 were used as the representative genomes of their species) and obtained 98 locally collinear blocks (LCBs; Additional files 13, 14, [47]). The mean length of the LCBs was 23,891 bp. The shortest block was 1,570 bp, and the longest was 201,130 bp. This multiple alignment of the 'core' region on average covered 52% of each Yersinia genome. The nucleotide diversity (π) for the concatenated aligned region was 0.27, or an approximate genus-wide nucleotide sequence homology of 73%. As expected for a set of bacteria with this level of diversity, the alignment of the genomes shows evidence of multiple large genome rearrangements [23] (Additional file 13).

Using an automated pipeline for annotation and clustering of protein orthologs based on the Markov chain clustering tool MCL [48], we estimated the size of the Yersinia protein core set to be 2,497 and the pan-genome [49] to be 27,470 (Additional files 15, 16, 17, 18). The core number falls asymptotically as genomes are introduced and hence this estimate is somewhat lower than the recent analysis of only the Y.
enterocolitica, Y. pseudotuberculosis and Y. pestis genomes (2,747 core proteins) [15]. We found 681 genes to be in exactly one copy in each Yersinia genome and to be nearly identical in length. We used ClustalW [50] to align the members of this highly conserved set, and concatenated individual gene product alignments to make a dataset of 170,940 amino acids for each of the species. Uninformative characters were removed from the dataset and a phylogeny of the genus was computed using Phylip [51] (Figure 1). The topology of this tree was identical whether distance or parsimony methods were used (Additional files 19, 20) and was also identical to a tree based on the nucleotide sequence of the approximately 1.5 Mb of the core genome in LCBs (see above). The genus broke down into three major clades: the outlying fish pathogen, Y. ruckeri; Y. pestis/Y. pseudotuberculosis; and the remainder of the 'enterocolitica'-like species. Y. kristensenii ATCC 33638T was the nearest neighbor of Y. enterocolitica 8081. The outlying position of Y. ruckeri was confirmed further when we analyzed the contribution of the genome to reducing the size of the Yersinia core protein families set. If Y. ruckeri was excluded, the Yersinia core would be 2,232 protein families with two or more members rather than 2,072 (Table 4). In contrast, omission of any one of the 10 other species increased the set by a maximum of only 22 families.
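The leave-one-out comparison behind Table 4 can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: it treats each protein family as the set of genomes in which it has at least one member, it ignores the additional requirement stated with Table 4 that families have two or more members, and the genome and family names are toy placeholders.

```python
def core_families(families, genomes):
    """Return the names of families present in every listed genome."""
    required = set(genomes)
    return {name for name, present_in in families.items() if required <= present_in}


def leave_one_out_core_sizes(families, genomes):
    """Core size for the full genome set and with each genome excluded in turn."""
    sizes = {"None": len(core_families(families, genomes))}
    for excluded in genomes:
        kept = [g for g in genomes if g != excluded]
        sizes[excluded] = len(core_families(families, kept))
    return sizes


# Toy data: three close relatives and one divergent genome.
genomes = ["A", "B", "C", "outlier"]
families = {
    "fam1": {"A", "B", "C", "outlier"},
    "fam2": {"A", "B", "C"},
    "fam3": {"A", "B", "C"},
    "fam4": {"A", "B", "outlier"},
    "fam5": {"B", "C"},
}

sizes = leave_one_out_core_sizes(families, genomes)
for excluded, size in sizes.items():
    print(f"excluded {excluded}: {size} core families")
```

As with Y. ruckeri in Table 4, removing the most divergent genome gives the largest increase in the core, because families it lacks no longer fail the "present in every genome" test.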

Clustering the significant Cluster of Orthologous Groups (COG) hits [52] for each genome hierarchically (Figure 2) yielded a similar pattern for the three basic clades. The overall composition of the COG matches in each genome, as measured by the proportion of the numbers in each COG supercategory, was similar throughout the genus, with the notable exceptions of the high percentage of group L COGs in Y. pestis due to the expansion of IS recombinases and the relatively low number of group G (sugar metabolism) COGs in Y. ruckeri (Figure 2).

Shared protein clusters in pathogenic Yersinia: yersiniabactin biosynthesis is the key chromosomal function specific to high virulence in humans

The Yersinia proteomes were investigated for common clusters in the three high virulence species missing from the low human virulence genomes (Figure 3). Because of the close evolutionary relationship of the 'enterocolitica' clade strains, the number of unique protein clusters in Y. enterocolitica was reduced to a greater degree than the more phylogenetically isolated Y. pestis and Y. pseudotuberculosis. Many of the same genome islands identified as recent horizontal acquisition by Y. pestis and/or Y. pseudotuberculosis [9,13,15] were not present in any of the newly sequenced genomes. However, some genes, interesting from the perspective of the host specificity of the Y. pestis/Y. pseudotuberculosis ancestor, were detected in other Yersinia species for the first time. These included orthologs of YPO3720/YPO3721, a hemolysin and activator protein in Y. intermedia, Y. bercovieri and Y. frederiksenii; YPO0599, a heme utilization protein also found in Y. intermedia; and YPO0399, an enhancin metalloprotease that had an ortholog in Y. kristensenii (ykris0001_41250). Enhancin was originally identified as a factor promoting baculovirus infection of gypsy moth midgut by degradation of mucin [53]. Other loci in Y.
pestis/Y. pseudotuberculosis linked with insect infection, the TccC and TcABC toxin clusters [54], were also found in Y. mollaretii. In Y. mollaretii the Tca and Tcc proteins show about 90% sequence identity to Y. pestis/Y. pseudotuberculosis and share identical flanking chromosomal locations. Further work will need to be undertaken to resolve whether the insertion of the toxin genes in Y. mollaretii is an independent horizontal transfer event or occurred prior to divergence of the species.

After comparison of the new low virulence genomes, the number of protein clusters shared by Y. enterocolitica and the other two pathogens was reduced to 12 and 13 for Y. pseudotuberculosis and Y. pestis, respectively (Figure 3). The remaining shared proteins were either identified as phage-related or of unknown role, providing few clues to possible functions that might define distinct pathogenic niches. Performing a similar analysis strategy between other genomes of the 'enterocolitica' clade and Y. pestis or Y. pseudotuberculosis gave a similar result in terms of numbers and types of shared protein clusters.

Only sixteen clusters of chromosomal proteins were found to be common to all three high-virulence species but absent from all eight non-pathogens (Figure 3). Eleven of these are components of the yersiniabactin biosynthesis operon (Additional file 21), further highlighting the critical importance of this iron binding siderophore for invasive disease. The other proteins are generally small proteins that are likely included because they fall in unassembled regions of the eight draft genomes. One other small island of three proteins constituting a multi-drug efflux pump (YE0443 to YE0445) was common to the high-virulence species but missing from the eight draft low-virulence species.

Table 4 Yersinia core size reduction by exclusion of one species
Species excluded | Core protein families
None | 2,072
Y. enterocolitica | 2,074
Y. aldovae | 2,085
Y. bercovieri | 2,079
Y. frederiksenii | 2,077
Y. intermedia | 2,080
Y. kristensenii | 2,076
Y. mollaretii | 2,078
Y. rohdei | 2,091
Y. ruckeri | 2,232
Y. pseudotuberculosis | 2,076
Y. pestis | 2,094
The core protein families with number of members 2 or greater were recalculated in each case (see Materials and methods) with the protein set from one genome missing.

Figure 1 Yersinia whole-genome phylogeny. The phylogeny of the Yersinia genus was constructed from a dataset of 681 concatenated, conserved protein sequences using the Neighbor-Joining (NJ) algorithm implemented by PHYLIP [51]. The tree was rooted using E. coli. The scale measures number of substitutions per residue. Tree topologies computed using maximum likelihood and parsimony estimates are identical with each other and the NJ tree (Additional file 20). The only branches not supported in more than 99% of the 1,000 bootstrap replicates using both methods are marked with asterisks. Both these branches were supported by >57% of replicates.

Variable regions of Y. enterocolitica clade genomes

The basic metabolic similarities of Y. enterocolitica and the seven species on the main branch of the Yersinia genus phylogenetic tree are further illustrated in Figure 4, where the best protein matches against each Y. enterocolitica 8081 gene product [15] are plotted against a circular genome map. Very few genes exclusive to Y. enterocolitica 8081 were found outside of prophage regions, which is a typical result when groups of closely
related bacterial genomes are compared [55]. One of the largest islands found in Y. enterocolitica 8081 was the 66-kb Y. pseudotuberculosis adhesion pathogenicity island (YAPIye) [15,56,57], a unique feature of biotype 1B strains. YAPIye, containing a type IV pilus gene cluster and other putative virulence determinants, such as arsenic resistance, is similar to a 99-kb YAPIpst that is found in several other serotypes of Y. pseudotuberculosis [14,57] but is missing in Y. pestis and the serotype I Y. pseudotuberculosis strain IP32953 [14]. A model has been proposed for the acquisition of YAPI in a common ancestor of Y. pseudotuberculosis and Y. enterocolitica and subsequent degradation to various degrees within the Y. pseudotuberculosis clade. However, the complete absence of YAPI from any of the seven species in the Y. enterocolitica branch (Figure 4), as well as from most strains of Y. enterocolitica [15], argues against an ancient acquisition of YAPI, but instead suggests the recent independent acquisition of related islands by both Y. enterocolitica biogroup 1B and Y. pseudotuberculosis.

Figure 2 Comparison of major COG groups in Yersinia genomes. Bars represent the number of proteins assigned to COG superfamilies [52] for each genome, based on matches to the Conserved Domain Database [95] with an E-value threshold < 10^-10. The COG groups are: U, intracellular trafficking; G, carbohydrate transport and metabolism; R, general function prediction; I, lipid transport and metabolism; D, cell cycle control; H, coenzyme transport and metabolism; B, chromatin structure; P, inorganic ion transport and metabolism; W, extracellular structures; O, post-translational modification; J, translation; A, RNA processing and editing; L, replication, recombination and repair; C, energy production; M, cell wall/membrane biogenesis; Q, secondary metabolite biosynthesis; Z, cytoskeleton; V, defense mechanisms; E, amino acid transport and metabolism; K, transcription; N, cell motility; T, signal transduction; F, nucleotide transport; S, function unknown.

Figure 3 Distribution of protein clusters across Y. enterocolitica 8081, Y. pestis CO92, and Y. pseudotuberculosis IP32953. (a) The Venn diagram shows the number of protein clusters unique or shared between the two other high virulence Yersinia species (see Materials and methods). (b) The number of shared and unique clusters that do not contain a single member of the eight low human virulence genomes sequenced in this study.

Figure 4 Protein-based comparison of Y. enterocolitica 8081 to the Yersinia genus. The map represents the blast score ratio (BSR) [98,99] to the protein encoded by Y. enterocolitica [15]. Blue indicates a BSR > 0.70 (strong match); cyan 0.69 to 0.4 (intermediate); green < 0.4 (weak). Red and pink outer circles are locations of the Y. enterocolitica genes on the + and - strands. The genomes are ordered from outside to inside based on the greatest overall similarity to Y. enterocolitica: Y. kristensenii, Y. frederiksenii, Y. mollaretii, Y. intermedia, Y. bercovieri, Y. aldovae, Y. rohdei, Y. ruckeri, Y. pseudotuberculosis, and Y. pestis. The black bars on the outside refer to genome islands in Y. enterocolitica identified by Thomson et al. [15].

Many genes previously thought to be unique to Y. enterocolitica in general and biotype 1B in particular turned out to have orthologs in the low human virulence species sequenced in this study. These included several putative biotype 1B-specific genes identified by microarray-based screening [58], including YE0344 HylD hemophore (yinte0001_41550 has 78% nucleotide identity), YE4052 metalloprotease (yinte0001_36030 has 95% nucleotide identity), and YE4088, a two-component sensor kinase, which had orthologs in all species. Large portions of the biogroup 1B-specific island containing the Yts1 type II secretion system were found in Y. ruckeri, Y. mollaretii, and Y. aldovae. Y. aldovae and Y. mollaretii also had islands containing ysa type three secretion systems (TTSS) with 75 to 85% nucleotide identity to the homolog in Y. enterocolitica 1B. The
ysa genes are a chromosomal cluster [9,13,15] that in Y. enterocolitica, at least, appears to play a role in virulence [59]. The Y. enterocolitica ysa genes are found in the plasticity zone (Figure 4) and have very low similarity to the Y. pestis and Y. pseudotuberculosis ysa genes (which are more similar to the Salmonella SPI-2 island [60,61]) and are found between orthologs of YPO0254 and YPO0274 [9]. Species within the Yersinia genus had either the Y. enterocolitica type of ysa TTSS locus or the Y. pestis/SPI-2 type (with the exception of Y. aldovae, which has both; Additional file 22). This suggested the exchange of chromosomal TTSS genes within Yersinia.

The modular nature of the islands found in the Y. enterocolitica genome was demonstrated further by two examples gleaned from comparison with the evolutionarily closest low human virulence genome, Y. kristensenii ATCC 33638T (Figure 1). The YGI-3 island [15] in Y. enterocolitica 8081 is a degraded integrated plasmid; at the same chromosomal locus in Y. kristensenii ATCC 33638T a prophage was found, suggesting that the YGI-3 location may be a recombinational hotspot. Another Y. enterocolitica 8081 island, YGI-1, encodes a 'tight adherence' (tad) locus responsible for non-specific surface binding. Y. kristensenii ATCC 33638T had an identical 13 gene tad locus in the same position, but the nucleotide sequence identity of the region to Y. enterocolitica 8081 was uniformly lower than that found for the rest of the genome, suggesting there had been either a gene conversion event replacing the tad locus with a set of new alleles in the recent history of Y. kristensenii or Y. enterocolitica or the locus was under very high positive selective pressure.

Niche-specific metabolic adaptations in the Yersinia genus

Comparison of the Y. enterocolitica genome to Y. pestis and Y. pseudotuberculosis revealed some potentially significant metabolic differences that may account for varying tropisms in gastric infections [62]. Y. enterocolitica 8081 alone contained entire gene clusters for cobalamin (vitamin B12) biosynthesis (cbi), 1,2-propanediol utilization (pdu), and tetrathionate respiration (ttr). In Y.
enterocolitica and Salmonella typhimurium [63,64], vitamin B12 is produced under anaerobic conditions where it is used as a cofactor in 1,2-propanediol degradation, with tetrathionate serving as an electron acceptor. This study showed the genes for this pathway to be a general feature of species in the ‘enterocolitica’ branch of the Yersinia genus (with the caveat that some portions are missing in some species; for example, Y. rohdei is missing the pdu cluster (Table 5)). Additionally, Y. intermedia, Y. bercovieri, and Y. mollaretii contained gene clusters encoding degradation of the membrane lipid constituent ethanolamine. Ethanolamine metabolism under anaerobic conditions also requires the B12 cofactor. Y. intermedia contained the full 17-gene cluster reported in S. typhimurium [65], including structural components of the carboxysome organelle. Another discovery from the Y. enterocolitica genome analysis was the presence of two compact hydrogenase gene clusters, Hyd-2 and Hyd-4 [15]. Hydrogen released from fermentation by intestinal microflora is imputed to be an important energy source for enteric gut pathogens [66]. Both gene clusters are conserved across all the other seven enterocolitica-branch species, but are missing from Y. pestis and Y. pseudotuberculosis. Y. ruckeri contained a single [NiFe]-containing hydrogenase complex.

Y. ruckeri, the most evolutionarily distant member of the genus (Figure 1) with the smallest genome (3.7 Mb), had several features that were distinctive from its congeners. The Y. ruckeri O-antigen operon contained a neuB sialic acid synthase gene; therefore, the bacterium was predicted to produce a sialylated outer surface structure.
Table 5 Key niche-specific genes in Yersinia

Species                cbi  pdu  ttr  eut                   hyd-2                hyd-4                ure  mtn  opg
Y. enterocolitica      +    +    +    -                     +                    +                    +    +    +
Y. aldovae             +    +    -    -                     +                    +                    +    +    +
Y. bercovieri          +    +    +    eutABC                +                    +                    +    +    +
Y. frederiksenii       +    +    +    -                     +                    +                    +    +    +
Y. intermedia          +    +    +    eutSPQTDMNEJGHABCLKR  +                    +                    +    +    +
Y. kristensenii        +    +    +    -                     +                    +                    +    +    +
Y. mollaretii          +    +    ?    eutABC                +                    +                    +    +    +
Y. rohdei              +    -    +    -                     +                    +                    +    +    +
Y. ruckeri             -    -    -    -                     +/- hyfABCGHINfdhF   +/- (hyaD, hypEDB)   -    -    +
Y. pseudotuberculosis  -    -    -    -                     -                    -                    +    +    +
Y. pestis              -    -    -    -                     -                    -                    +/-  -    -

Abbreviations: cbi, cobalamin (vitamin B12) biosynthesis; pdu, 1,2-propanediol utilization; ttr, tetrathionate respiration; eut, ethanolamine degradation; hyd-2 and hyd-4, hydrogenases 2 and 4, respectively; ure, urease; mtn, methionine salvage pathway; opg, osmoprotectant (synthesis of periplasmic branched glucans).

Among the common Yersinia genes that are missing
only in Y. ruckeri were those for xylose utilization and urease activity, consistent with phenotypes that have long been known in clinical microbiology [67] (Table 3). Surprisingly, we discovered that Y. ruckeri was also missing the mtnKADCBEU gene cluster that comprises the majority of the methionine salvage pathway [68] found in most other Yersiniae. These genes have also been deleted from Y. pestis, but as with Y. ruckeri, the mtnN (methylthioadenosine nucleosidase) is maintained. The loss of these genes in Y. pestis has been interpreted as a consequence of adaptation to an obligate host-dwelling lifecycle, where the availability of the sulfur-containing amino acids is not a nutritional limitation [15].

Discussion

Whole-genome shotgun sequencing by high-throughput bead-based pyrosequencing has proved remarkably useful for the large-scale sequencing of closely related bacteria [49,69-74]. High-quality de novo assemblies can be obtained with relatively few errors and gaps when the sequence read coverage redundancy is 15-fold or greater. Closing all the gaps in each genome sequence is time-consuming and costly; therefore, in the near future there will be an excess of draft bacterial sequences versus closed genomes in public databases. Our analysis strategy here melds both draft and complete genomes using consistent automated annotation that is scalable to encompass potentially much larger datasets. High quality draft sequencing is likely to shortly supersede comparative genome hybridization using microarrays [25,58,75,76] as the most popular strategy for genome-wide bacterial comparisons. Genome sequence datasets can be used to shed light on the novel functions in close relatives that may have been lost in the pathogen of interest, as well as orthologs in genomes that fall below the threshold for hybridization-based detection. The problems of using microarrays for comparisons of more diverse bacterial taxa are illustrated in a study of the Yersinia genus, using many of the strains sequenced in this work, where the estimated number of core genes was found to be only 292 [25].
We cannot claim complete coverage of all the type strains of the Yersinia genus, as three new species have been created [77-79] since our work began. Nonetheless, from this extensive genomic survey we have attempted to categorize the features that define Yersinia. The core of about 2,500 proteins present in all 11 species is not a subset of any other enterobacterial genome. Species of the Y. enterocolitica clade (Figure 1) have overall a similar array of protein functions and contain a number of conserved gene clusters (cobalamin, hydrogenases, ureases, and so on) found in other bacteria (Helicobacter, Campylobacter, Salmonella, Escherichia coli) that colonize the mammalian gut. Y. pestis has lost many of these genes by deletion or disruption since its split from the enteric pathogen Y. pseudotuberculosis and adoption of an insect vector-mediated pathogenicity mode. The smaller Y. ruckeri chromosome does not appear to result from recent reductive evolution (as is the case of Y. pestis), evidenced by the relatively low number of frameshifts and pseudogenes, and the normal amount of repetitive contigs in the newbler genome assembly. Like Y. pestis, Y. ruckeri lacks urease, methionine salvage genes, and B12-related metabolism. The prevailing consensus is that the pathway of transmission of red mouth disease in fish is gastrointestinal, yet the similarities of Y. ruckeri genome reduction to Y. pestis hint at an alternative mode of infection for Y. ruckeri.

This comparative genomic study reaffirms that the distinguishing feature of the high-level mammalian pathogens is the acquisition of a particular set of mobile elements: HPI, the pYV, pMT1 and pPCP plasmids, and the YAPI island.
However, the eight species sequenced in this study, believed to have either low or zero potential for human infection, contain numerous, apparently horizontally transferred genes that would be considered putative virulence determinants if discovered in the genome of a more serious pathogen. Two examples are yaldo0001_40900 (bile salt hydrolase) and yfred0001_36480, an ortholog of the TibA adhesin of enterotoxigenic E. coli. Bile salt hydrolase in pathogenic Brucella abortus has been shown to enhance bile resistance during oral mouse infections [80] and the TibA adhesin forms a biofilm that mediates human cell invasion [81]. The low-virulence species contain a similar (and in some cases greater) number of matches to known drug resistance mechanisms that have been curated in the Antibiotic Resistance Genes Database [82] (Additional file 23, [83]). Adding DNA, stirring and reducing [17] is, therefore, the general recipe for Yersinia genome evolution rather than a formula specific to pathogens. Comparative genomic studies such as these can be used to enhance our ability to rapidly assess the virulence potential of a genome sequence of an emerging pathogen, and we plan to continue to build more extensive databases of non-pathogenic Yersinia genomes that will allow us to draw conclusions with more statistical power than is possible with just 11 representative species.

Conclusions

Genomes of the 11 Yersinia species studied range in estimated size from 3.7 to 4.8 Mb. The nucleotide diversity (π) of the conserved backbone based on large collinear conserved blocks was calculated to be 0.27. There were no orthologs of genes and predicted proteins in the virulence-associated plasmids pYV, pMT1, and pPla,
and the HPI of Y. pestis in the genomes of the type strains of eight non- or low-pathogenic Yersinia species. Apart from functions encoded on the aforementioned plasmids, HPI and YAPI regions, only nine proteins detected as common to all three Yersinia pathogen species (Y. pestis, Y. enterocolitica and Y. pseudotuberculosis) were not found in at least one of the other eight species. Therefore, our study is in agreement with the hypothesis that genes acquired by recent horizontal transfer effectively define the members of the Yersinia genus virulent for humans.

The core proteome of the 11 Yersinia species consists of approximately 2,500 proteins. Yersinia genomes had a similar global partition of protein functions, as measured by the distribution of COG families. Genome-to-genome variation in islands with genes encoding functions such as ureases, hydrogenases and B12 cofactor metabolite reactions may reflect adaptations to colonizing specific host habitats.

Y. ruckeri, a salmonid fish pathogen, is the earliest branching member of the genus and has the smallest genome (3.7 Mb). Like Y. pestis, Y. ruckeri lacks functional urease, methionine salvage genes, and B12-related metabolism. These losses may reflect adaptation to a lifestyle that does not include colonization of the mammalian gut.

The absence of the YAPI island in any of the seven ‘Y. enterocolitica clade’ genomes likely indicates that YAPI was acquired independently in Y. enterocolitica and Y.
pseudotuberculosis.

We identified 171 and 100 regions within the genomes of Y. pestis and Y. enterocolitica, respectively, that represented potential candidates for the design of nucleotide sequence-based assays for unique detection of each pathogen.

Materials and methods

Bacterial strains

Type strains of the eight Yersinia species sequenced in this study (Table 1) were acquired from the American Type Culture Collection (ATCC) and propagated at 37°C or 25°C (Y. ruckeri) on Luria media. DNA for genome sequencing was prepared from overnight broth cultures propagated from single colonies streaked on a Luria agar plate using the Promega Wizard Maxiprep System (Promega, Madison, WI, USA).

Genome sequencing and assembly

Genomes were sequenced using the Genome Sequencer 20 Instrument (454 Life Sciences Inc., Branford, CT) [34]. Libraries for sequencing were prepared from 5 μg of genomic DNA. The sequencing reads for each project were assembled de novo using the newbler program (version 01.51.02; 454 Life Sciences Inc.).

Optical mapping

Optical maps [38] for each genome using the restriction enzymes AflII and NheI (Y. aldovae and Y. kristensenii only have maps for AflII) were constructed by Opgen Inc. (Madison, WI). The newbler assemblies for each genome were scaffolded using the optical maps and the SOMA package [39] (Additional file 4). Assemblies that did not align against the optical map were tested for high read coverage, unusual GC content, and good matches to plasmid-associated genes from the ACLAME database [84] (BLAST E-value less than 10^-20) to identify sequences that could potentially be part of an extrachromosomal element.

Detection of disrupted genes

We used two methods for detecting disrupted proteins. In the first method, clustered protein groups were used to adduce evidence for possible gene disruption events. The clusters were parsed for pairs of proteins that met the following criteria: both from the same genome; encoded by genes located on the same strand with less than 200 bp separating their frames; and total length of the combined genes was not greater than 120% of the longest gene in the cluster. The second method used was the FSFIND algorithm [85] with a
standard bacterial gene model to compare the accumulation of predicted frameshifts across different genomes.

Assembly validation

In order to rule out artifacts due to undocumented features of the newbler assemblies, new assemblies were generated for validation purposes by re-mapping all the shotgun reads to the sequence of the assembled contigs using AMOScmp [86]. The resulting assembly was then subjected to analysis using the amosvalidate package [37]. The output of this program includes a list of genomic regions that contain inconsistencies highlighting possible misassemblies. The resulting regions were manually inspected to reduce the possibility of assembly errors. The regions flagged by the amosvalidate package are provided in GFF (general feature format), compatible with most genome browsers (Additional file 3).

Insertion sequences and de novo repeat finding

The presence of repeats is known to confound assembly programs and the newbler assembler is known to collapse high-fidelity repeat instances into a single contig. To account for the possibility of such misassemblies, we computed the copy number of contigs based on coverage statistics and used this information to correct our estimates for the abundance of classes of repeats (Additional file 3). To find known insertion sequences, the genomes were scanned for matches using the ISfinder web service
[40] with a BLAST E-value threshold of 10^-10 (matches to known repeat contigs were counted as multiple matches based on the coverage of the contig). In addition, we searched for common repeat sequences in the genome using the RepeatScout program [41] after duplicating known repeat contigs. The repeats found in each genome were collected (64 sequences) and transformed into a non-redundant set of 44 sequences using the CD-HIT program [87] (Additional file 6). The repeats found were then searched against all the genomes using BLAST with an E-value threshold of 10^-10 to record matches. The resultant figures for repeat content are estimations that may be lower than the true number found in the genomes.

Finding unique DNA signatures in Y. pestis and Y. enterocolitica

DNA signatures for the Y. pestis and the Y. enterocolitica genomes were identified using the Insignia pipeline [44]. Signatures of 100 bp or longer were considered good candidates for the design of detection assays. These signatures were then compared with the genomes of the Yersinia strains sequenced during the current study using the MUMmer package [88] with default parameters. Signatures that matched by more than 40 bp were deemed invalidated, as they would likely lead to false-positive results.

Automated annotation

For automated annotation we used DIYA [89], a pipeline for integrating bacterial analysis tools. Using DIYA, the assemblies generated by newbler were scaffolded based on the optical map, concatenated, and used as a template for the programs GLIMMER [90], tRNASCAN-SE [91], and RNAmmer [92] for prediction of open reading frames and RNA genes, respectively. All predicted proteins encoded by each coding sequence were compared against a database of all proteins predicted from the canonical annotation of Y.
pestis CO92 [9] as a preliminary screen for potentially novel functions. The GenBank format files created from the eight genomes sequenced in this study were combined with other DIYA-annotated, published whole genomes to form a dataset for analysis. All proteins were searched against the UniRef50 database (July 2008) [93] using BLASTP [94] and against the Conserved Domain Database [95] using RPSBLAST [96] with an E-value threshold of 10^-10 to record matches.

Database accession numbers

The annotated genome data were submitted to NCBI GenBank and the sequence data submitted to the NCBI Short Read Archive (SRA). The accession numbers are:
Y. rohdei ATCC_43380: [Genbank:ACCD00000000]/[SRA:SRA009766.1];
Y. ruckeri ATCC_29473: [Genbank:ACCC00000000]/[SRA:SRA009767.1];
Y. aldovae ATCC_35236: [Genbank:ACCB00000000]/[SRA:SRA009760.1];
Y. kristensenii ATCC_33638: [Genbank:ACCA00000000]/[SRA:SRA009764.1];
Y. intermedia ATCC_29909: [Genbank:AALF00000000]/[SRA:SRA009763.1];
Y. frederiksenii ATCC_33641: [Genbank:AALE00000000]/[SRA:SRA009762.1];
Y. mollaretii ATCC_43969: [Genbank:AALD00000000]/[SRA:SRA009765.1];
Y. bercovieri ATCC_43970: [Genbank:AALC00000000]/[SRA:SRA009761.1].

Whole-genome alignment using MAUVE

Yersinia genomes were aligned using the standard MAUVE [46] algorithm with default settings. A cutoff of 1,500 bp was set as the minimum LCB length.
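The nucleotide diversity calculation that builds on this alignment can be sketched in Python as follows. This is a minimal illustration and not the authors' in-house script: the function names and the toy input are hypothetical, and π is assumed here to be the total number of pairwise mismatches divided by (number of genome pairs × number of sites with a base in every genome), which is the form consistent with the counts reported in Additional file 24 (55 pairwise comparisons, 1,495,930 sites, 22,539,297 mismatches). The robustness estimate assumes that one erroneous base in one genome changes at most n - 1 pairwise comparisons at that site.

```python
from itertools import combinations
from math import comb

BASES = set("ACGT")


def nucleotide_diversity(aligned_seqs):
    """Return (pi, sites, mismatches) for equal-length aligned sequences.

    Only alignment columns in which every sequence has an unambiguous
    base (A, C, G or T) are counted; gapped columns are skipped.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    sites = 0
    mismatches = 0
    for column in zip(*(s.upper() for s in aligned_seqs)):
        if not all(c in BASES for c in column):
            continue  # skip columns with a gap or ambiguity code
        sites += 1
        mismatches += sum(1 for a, b in combinations(column, 2) if a != b)

    pi = pi_from_counts(mismatches, len(aligned_seqs), sites)
    return pi, sites, mismatches


def pi_from_counts(mismatches, n_genomes, sites):
    """Pairwise mismatches per site, averaged over all genome pairs."""
    pairs = comb(n_genomes, 2)  # 11 genomes give 55 pairs
    if sites == 0:
        return float("nan")
    return mismatches / (pairs * sites)


def errors_to_shift_pi(mismatches, n_genomes, fraction=0.05):
    """Sequencing errors needed to change pi by the given fraction.

    Assumes each error alters n_genomes - 1 pairwise comparisons.
    """
    return int(fraction * mismatches / (n_genomes - 1))


if __name__ == "__main__":
    toy = ["ACGT-A", "ACGTTA", "ACCTTA"]  # hypothetical three-genome alignment
    print(nucleotide_diversity(toy))
    print(pi_from_counts(22_539_297, 11, 1_495_930))
    print(errors_to_shift_pi(22_539_297, 11))
```

With the counts from Additional file 24, `errors_to_shift_pi` rounds 0.05 × 22,539,297 / 10 down to 112,696, the figure quoted in this section.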
LCBs for each genome were extracted from the output of the program and concatenated. From the alignment, nucleotide diversity was calculated by an in-house script using positions where there was a base in all 11 genomes. Because of the size of the dataset, the calculated value of π is very robust in terms of sequence error. We calculated that 112,696 nucleotides of sequence in the concatenated core would have to be wrong to alter the estimation of π by 5% (Additional file 24). PHYLIP [51] programs were used to build a consensus tree of the MAUVE alignment with bootstrapping 1,000 replicates. The underlying model for each replicate was Fitch-Margoliash. The final phylogeny was resolved according to the majority consensus rule.

Clustering protein orthologs

The complete predicted proteome from all genomes annotated in this study was searched against itself using BLASTP with default parameters. We removed short, spurious, and non-homologous hits by setting a bit score/alignment length filtering threshold of 0.4 and minimum protein length of 30. Predicted proteins passing this filter were clustered into families based on these normalized distances using the MCL algorithm [48] with an inflation parameter value of 4. These parameters were based on an investigation of clustering 12 completed E. coli genomes, which produced very similar results to a previous study [42].

Whole-genome phylogenetic reconstruction

From the results of clustering analysis, 681 proteins were found that had exactly one member in each of the genomes and the length of each protein in the cluster was nearly identical. These protein sequences were
aligned using ClustalW [50], and individual gene alignments were concatenated into a string of 170,940 amino acids for each genome. Uninformative characters were removed from the dataset using Gblocks [97] and a phylogeny reconstructed with PHYLIP [51] under a neighbor-joining model. To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed.

Additional file 1: Statistics from running DIYA [89] and frameshift detection programs on the eight genomes sequenced in this study and various other enterobacterial genomes downloaded from NCBI.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S1.xls]

Additional file 2: Results of amosvalidate [37] analysis on the eight genomes of this study.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S2.doc]

Additional file 3: Additional annotation files. These consist of ISfinder [40], RepeatScout [41] and amosvalidate [37] results (GFF format); repeats found by RepeatScout in fasta format, scaffold files (NCBI AGP format); and information about length of contigs, read count, estimated repeat number, count in scaffold and whether or not the contig was placed by SOMA [39].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S3.gz]

Additional file 4: Estimates for genome sizes (in Mbp) based on optical map data.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S4.doc]

Additional file 5: Pulsed field gel analysis of the eight sequenced Yersinia species and failure to detect plasmids. An E. coli strain with known plasmids was a positive control.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S5.doc]

Additional file 6: Sequences of the detected repeat families.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S6.txt]

Additional file 7: Y. pestis CO92 signatures longer than 100 bp computed by the Insignia [44] pipeline.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S7.txt]

Additional file 8: Sequences of the new genomes that match (that is, invalidate) the Y. pestis CO92 signatures listed in Additional file 7.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S8.txt]

Additional file 9: Y. enterocolitica signatures longer than 100 bp computed by the Insignia pipeline.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S9.txt]

Additional file 10: Sequences of the new genomes that match (that is, invalidate) the Y. enterocolitica signatures.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S10.txt]

Additional file 11: Y. pestis genome with the Insignia-identified repeats and genome islands identified using IslandViewer [45] plotted. The figure was created using DNAPlotter [106].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S11.png]

Additional file 12: Y. enterocolitica genome with the Insignia-identified repeats and genome islands identified using IslandViewer [45] plotted. The figure was created using DNAPlotter [106].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S12.png]

Additional file 13: Output of the MAUVE [46] alignment of 11 Yersinia species. The eight genomes sequenced in this study are represented as pseudocontigs, ordered by a combination of optical mapping and alignment to the closest completed reference genome.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S13.jpeg]

Additional file 14: Whole genome multiple alignment produced by MAUVE of the 11 Yersinia genomes in XMFA format [106].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S14.zip]

Additional file 15: Output of the cluster analysis of the 11 Yersinia species. The top level directory consists of a directory called Additional_cluster_files and 5010 directories, one for each multi-protein cluster family. (This top level directory has been split into three data files for uploading purposes (Additional files 15, 16, 17).) Within the directory are the following files:
PGL1_unique_Yersinia_unclustered.out - list of all protein singletons that MCL did not group into a cluster (see Materials and Methods);
PGL1_Yersinia_unique_locus_tags.txt - names of the 11 locus tag prefixes used for each genome;
PGL1_unique_Yersinia.gff - mapping each Yersinia protein to a cluster in tab delimited GFF;
PGL1_unique_Yersinia.sigfile - list of the longest protein in each cluster;
PGL1_unique_Yersinia.summary - summary table of features of each of the clusters;
PGL1_unique_Yersinia.table - summary table of each protein in the clusters.
Within each cluster directory are the following files, where ‘x’ is the cluster name:
PGL1_unique_Yersinia-x.faa - multifasta file of the proteins in the cluster;
PGL1_unique_Yersinia-x.summary - summary of the properties of the proteins;
PGL1_unique_Yersinia-x.matches - blast matches between the proteins of the cluster;
PGL1_unique_Yersinia-x.muscle.fasta - muscle alignment of the proteins;
PGL1_unique_Yersinia-x.muscle.fasta.gblo - gblocks output of muscle alignment (that is, autotrimmed alignment);
PGL1_unique_Yersinia-x.muscle.fasta.gblo.htm - as above in html format;
PGL1_unique_Yersinia-x.muscle.tree - tree file from muscle alignment;
PGL1_unique_Yersinia-x.sif - matches between proteins in simple interaction format for display on graphing software.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S15.zip]

Additional file 16: Output of the cluster analysis of the 11 Yersinia species. One of the three data files (Additional files 15, 16, 17) into which the top level directory was split for uploading purposes; the directory contents are as listed for Additional file 15.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S16.zip]

Additional file 17: Output of the cluster analysis of the 11 Yersinia species. One of the three data files (Additional files 15, 16, 17) into which the top level directory was split for uploading purposes; the directory contents are as listed for Additional file 15.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S17.zip]

Additional file 18: Complete protein sets for the 11 species of Yersinia.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S18.zip]

Additional file 19: Inferred evolutionary trees reconstructed using PHYLIP [51] of the 11 Yersinia species proteomes based on parsimony. To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed. E. coli was used as an outgroup species.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S19.pdf]

Additional file 20: Inferred evolutionary trees reconstructed using PHYLIP [51] of the 11 Yersinia species proteomes based on maximum likelihood. To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed. E. coli was used as an outgroup species.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S20.pdf]

Additional file 21: Twenty proteins conserved in pathogenic strains but missing from the non-pathogen set. A curve showing the rate of decline in number of this set as more non-pathogen genomes are added is also included.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S21.doc]

Additional file 22: Phylogeny of TTSS component YscN in Yersinia and other enterobacteria species.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S22.doc]

Additional file 23: Putative antibiotic resistance genes in the Yersinia genus determined using the Antibiotic Resistance Genes Database [45].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S23.xls]

Additional file 24: Calculations for the estimation of π from aligned Yersinia core genomes.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S24.doc]

Abbreviations

ATCC: American Type Culture Collection; COG: Cluster of Orthologous Groups; HPI: high-pathogenicity island; IS: insertion sequence; LCB: locally collinear block; SRA: Short Read Archive; TTSS: type III secretion system; YAPI: Y. pseudotuberculosis adhesion pathogenicity island.

Acknowledgements

We would like to thank Ayra Akmal, Kim Bishop-Lilly, Mike Cariaso, Brian Osborne, Bill Klimke, Tim Welch, Jennifer Tsai, Cheryl Timms Strauss and members of the 454 Service Center for their help and advice in completing this manuscript. This work was supported by grant TMTI0068_07_NM_T from the Joint Science and Technology Office for Chemical and Biological Defense (JSTO-CBD), Defense Threat Reduction Agency Initiative to TDR. The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the US Department of the Navy, US Department of Defense, or the US Government. Some of the authors are employees of the US Government, and this work was prepared as part of their official duties. Title 17 USC §105 provides that ‘Copyright protection under this title is not available for any work of the United States Government.’ Title 17 USC §101 defines a US Government work as a work prepared by a military service member or employee of the US Government as part of that person’s official duties.

Author details

1 Biological Defense Research Directorate, Naval Medical Research Center, 503 Robert Grant Avenue, Silver Spring, Maryland 20910, USA. 2 University of Maryland Institute for Advanced Computer Sciences, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA. 3 Emerging Pathogens Institute and Department of Molecular Genetics and Microbiology, University of Florida College of
Medicine, Gainesville, Florida 32610, USA. 4 454 Life Sciences Inc., 15 Commercial Street, Branford, Connecticut 06405, USA. 5 Department of Human Genetics, Emory University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322, USA. 6 Division of Infectious Diseases, Emory University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322, USA. 7 Current address: Computational and Mathematical Biology, Genome Institute of Singapore, Singapore-127726.

Authors’ contributions

TDR, MEZ, LD, and SS were involved in study design. AS and AM were involved in materials. LD, MPKT, SL, and NNo were involved in 454 sequencing. SS, MPKT, and CC were involved in additional experiments. PEC, TDR, CC, MEZ, ACS, NN, MP, BT, and DDS were involved in data analysis. TDR, MP, and NN wrote the paper.

Received: 23 May 2009 Revised: 7 October 2009 Accepted: 4 January 2010 Published: 4 January 2010

References

1. Ecker DJ, Sampath R, Willett P, Wyatt JR, Samant V, Massire C, Hall TA, Hari K, McNeil JA, Buchen-Osmond C, Budowle B: The Microbial Rosetta Stone Database: a compilation of global and emerging infectious microorganisms and bioterrorist threat agents. BMC Microbiol 2005, 5:19.
2. Achtman M, Zurth K, Morelli G, Torrea G, Guiyoule A, Carniel E: Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc Natl Acad Sci USA 1999, 96:14043-14048.
3. van Baarlen P, van Belkum A, Summerbell RC, Crous PW, Thomma BP: Molecular mechanisms of pathogenicity: how do pathogenic microorganisms develop cross-kingdom host jumps? FEMS Microbiol Rev 2007, 31:239-277.
4. Van Ert MN, Easterday WR, Huynh LY, Okinaka RT, Hugh-Jones ME, Ravel J, Zanecki SR, Pearson T, Simonson TS, U’Ren JM, Kachur SM, Leadem-Dougherty RR, Rhoton SD, Zinser G, Farlow J, Coker PR, Smith KL, Wang B, Kenefic LJ, Fraser-Liggett CM, Wagner DM, Keim P: Global Genetic Population Structure of Bacillus anthracis. PLoS ONE 2007, 2:e461.
5. Zwick ME, McAfee F, Cutler DJ, Read TD, Ravel J, Bowman GR, Galloway DR, Mateczun A: Microarray-based resequencing of multiple Bacillus anthracis isolates. Genome Biol 2005, 6:R10.
6. Ahmed N, Dobrindt U, Hacker J, Hasnain SE: Genomic fluidity and pathogenic bacteria: applications in diagnostics, epidemiology and intervention. Nat Rev Microbiol 2008, 6:387-394.
7. Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet 2008, 24:133-141.
8. Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol 2008, 26:1135-1145.
9. Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, Prentice MB, Sebaihia M, James KD, Churcher C, Mungall KL, Baker S, Basham D, Bentley SD, Brooks K, Cerdeño-Tárraga AM, Chillingworth T, Cronin A, Davies RM, Davis P, Dougan G, Feltwell T, Hamlin N, Holroyd S, Jagels K, Karlyshev AV, Leather S, Moule S, Oyston PC, Quail M, Rutherford K, et al: Genome sequence of Yersinia pestis, the causative agent of plague. Nature 2001, 413:523-527.
10. Deng W, Burland V, Plunkett G, Boutin A, Mayhew GF, Liss P, Perna NT, Rose DJ, Mau B, Zhou S, Schwartz DC, Fetherston JD, Lindler LE, Brubaker RR, Plano GV, Straley SC, McDonough KA, Nilles ML, Matson JS, Blattner FR, Perry RD: Genome sequence of Yersinia pestis KIM. J Bacteriol 2002, 184:4601-4611.
11. Song Y, Tong Z, Wang J, Wang L, Guo Z, Han Y, Zhang J, Pei D, Zhou D, Qin H, Pang X, Han Y, Zhai J, Li M, Cui B, Qi Z, Jin L, Dai R, Chen F, Li S, Ye C, Du Z, Lin W, Wang J, Yu J, Yang H, Wang J, Huang P, Yang R: Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans. DNA Res 2004, 11:179-197.
12. Chain PS, Hu P, Malfatti SA, Radnedge L, Larimer F, Vergez LM, Worsham P, Chu MC, Andersen GL: Complete genome sequence of Yersinia pestis strains Antiqua and Nepal516: evidence of gene reduction in an emerging pathogen. J Bacteriol 2006, 188:4453-4463.
13. Chain PS, Carniel E, Larimer FW, Lamerdin J, Stoutland PO, Regala WM, Georgescu AM, Vergez LM, Land ML, Motin VL, Brubaker RR, Fowler J, Hinnebusch J, Marceau M, Medigue C, Simonet M, Chenal-Francisque V, Souza B, Dacheux D, Elliott JM, Derbise A, Hauser LJ, Garcia E: Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc Natl Acad Sci USA 2004, 101:13826-13831.
14.EppingerM,RosovitzMJ,FrickeWF,RaskoDA,KokorinaG,FayolleC, LindlerLE,CarnielE,RavelJ: ThecompletegenomesequenceofYersinia pseudotuberculosisIP31758,thecausativeagentofFarEastscarlet-like fever. PLoSGenet 2007, 3 :e142. 15.ThomsonNR,HowardS,WrenBW,HoldenMT,CrossmanL,ChallisGL, ChurcherC,MungallK,BrooksK,ChillingworthT,FeltwellT,AbdellahZ, HauserH,JagelsK,MaddisonM,MouleS,SandersM,WhiteheadS, QuailMA,DouganG,ParkhillJ,PrenticeMB: TheCompleteGenome SequenceandComparativeGenomeAnalysisoftheHighPathogenicity YersiniaenterocoliticaStrain8081. PLoSGenet 2006, 2 :e206. 16.RollinsSE,RollinsSM,RyanET: Yersiniapestisandtheplague. AmJClin Pathol 2003, 119(Suppl) :S78-85. 17.WrenBW: Theyersiniae – amodelgenustostudytherapidevolutionof bacterialpathogens. NatRevMicrobiol 2003, 1 :55-64. 18.CornelisGR: TheYersiniaYsc-Yopvirulenceapparatus. IntJMedMicrobiol 2002, 291 :455-462. 19.JurisSJ,ShaoF,DIxonJE: Yersinia effectorstargetmammaliansignaling pathways. CellMicrobiol 2002, 4 :201-211. 20.ViboudGI,BliskaJB: Yersinia outerproteins:roleinmodulationofhostcell signalingresponsesandpathogenesis. AnnuRevMicrobiol 2005, 59 :69-89. 21.SchubertS,RakinA,HeesemannJ: TheYersiniahigh-pathogenicityisland (HPI):evolutionaryandfunctionalaspects. IntJMedMicrobiol 2004, 294 :83-94. 22.CarnielE: TheYersiniahigh-pathogenicityisland:aniron-uptakeisland. MicrobesInfect 2001, 3 :561-569. 23.DarlingAE,MiklosI,RaganMA: Dynamicsofgenomerearrangementin bacterialpopulations. PLoSGenet 2008, 4 :e1000128. 24.AnisimovAP,LindlerLE,PierGB: IntraspecificdiversityofYersiniapestis. ClinMicrobiolRev 2004, 17 :434-464. 25.WangX,HanY,LiY,GuoZ,SongY,TanY,DuZ,RakinA,ZhouD,YangR: YersiniagenomediversitydisclosedbyYersiniapestisgenome-wide DNAmicroarray. CanJMicrobiol 2007, 53 :1211-1221. 26.WelchTJ,FrickeWF,McDermottPF,WhiteDG,RossoML,RaskoDA, MammelMK,EppingerM,RosovitzMJ,WagnerD,RahalisonL,LeclercJE, HinshawJM,LindlerLE,CebulaTA,CarnielE,RavelJ: Multipleantimicrobial resistanceinplague:anemergingpublichealthrisk. 
PLoSONE 2007, 2 :e309. 27.DerbiseA,Chenal-FrancisqueV,PouillotF,FayolleC,PrvostMC, MdigueC,HinnebuschBJ,CarnielE: Ahorizontallyacquiredfilamentous phagecontributestothepathogenicityoftheplaguebacillus. Mol Microbiol 2007, 63 :1145-1157. 28.SulakvelidzeA: YersiniaeotherthanY.enterocolitica,Y. pseudotuberculosis,andY.pestis:theignoredspecies. MicrobesInfect 2000, 2 :497-513. 29.BottoneEJ,BercovierH,MollaretHH: GenusXLI.YersiniaVanLoghem 1944,15AL. Bergey ’ sManualofSystematicBacteriology 2005, 2 :838-846. 30.KotetishviliM,KregerA,WautersG,MorrisJGJr,SulakvelidzeA,StineOC: Multilocussequencetypingforstudyinggeneticrelationshipsamong Yersiniaspecies. JClinMicrobiol 2005, 43 :2674-2684. 31.NobleMA,BartelukRL,FreemanHJ,SubramaniamR,HudsonJB: Clinical significanceofvirulence-relatedassayofYersiniaspecies. JClinMicrobiol 1987, 25 :802-807. 32.Robins-BrowneRM,CianciosiS,BordunAM,WautersG: Pathogenicityof Yersiniakristenseniiformice. InfectImmun 1991, 59 :162-167. 33.FukushimaH,GomyodaM,KanekoS: Miceandmolesinhabiting mountainousareasofShimanePeninsulaassourcesofinfectionwith Yersiniapseudotuberculosis. JClinMicrobiol 1990, 28 :2448-2455. 34.MarguliesM,EgholmM,AltmanWE,AttiyaS,BaderJS,BembenLA,BerkaJ, BravermanMS,ChenYJ,ChenZ,DewellSB,DuL,FierroJM,GomesXV, GodwinBC,HeW,HelgesenS,HoCH,HoCH,IrzykGP,JandoSC, AlenquerML,JarvieTP,JirageKB,KimJB,KnightJR,LanzaJR,LeamonJH, LefkowitzSM,LeiM, etal : Genomesequencinginmicrofabricatedhighdensitypicolitrereactors. Nature 2005, 437 :376-380. 35.EwingB,GreenP: Base-callingofautomatedsequencertracesusing phred.II.Errorprobabilities. GenomeRes 1998, 8 :186-194. 36.BrockmanW,AlvarezP,YoungS,GarberM,GiannoukosG,LeeWL,RussC, LanderES,NusbaumC,JaffeDB: QualityscoresandSNPdetectionin sequencing-by-synthesissystems. GenomeRes 2008, 18 :763-770.Chen etal GenomeBiology 2010, 11 :R1 http://genomebiology.com/2010/11/1/R1 Page16of18


37. Phillippy AM, Schatz MC, Pop M: Genome assembly forensics: finding the elusive mis-assembly. Genome Biol 2008, 9:R55.
38. Samad AH, Cai WW, Hu X, Irvin B, Jing J, Reed J, Meng X, Huang J, Huff E, Porter B: Mapping the genome one molecule at a time – optical mapping. Nature 1995, 378:516-517.
39. Nagarajan N, Read TD, Pop M: Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 2008, 24:1229-35.
40. Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M: ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res 2006, 34:D32-36.
41. Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics 2005, 21(Suppl 1):i351-358.
42. Hulton CS, Higgins CF, Sharp PM: ERIC sequences: a novel family of repetitive elements in the genomes of Escherichia coli, Salmonella typhimurium and other enterobacteria. Mol Microbiol 1991, 5:825-834.
43. De Gregorio E, Silvestro G, Venditti R, Carlomagno MS, Di Nocera PP: Structural organization and functional properties of miniature DNA insertion sequences in yersiniae. J Bacteriol 2006, 188:7876-7884.
44. Phillippy AM, Mason JA, Ayanbule K, Sommer DD, Taviani E, Huq A, Colwell RR, Knight IT, Salzberg SL: Comprehensive DNA signature discovery and validation. PLoS Comput Biol 2007, 3:e98.
45. Langille MG, Brinkman FS: IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics 2009, 25:664-665.
46. Darling AC, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 2004, 14:1394-1403.
47. MAUVE Aligner User Guide. http://asap.ahabs.wisc.edu/mauve-aligner/mauve-user-guide/.
48. Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 2002, 30:1575-1584.
49. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, et al: Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA 2005, 102:13950-13955.
50. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: ClustalW and ClustalX version 2.0. Bioinformatics 2007, 23:2947-2948.
51. Felsenstein J: PHYLIP: Phylogeny Inference Package, version 3.6. Seattle, WA, USA: University of Washington 2001.
52. Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000, 28:33-36.
53. Lepore LS, Roelvink PR, Granados RR: Enhancin, the granulosis virus protein that facilitates nucleopolyhedrovirus (NPV) infections, is a metalloprotease. J Invertebr Pathol 1996, 68:131-140.
54. Bowen D, Rocheleau TA, Blackburn M, Andreev O, Golubeva E, Bhartia R, ffrench-Constant RH: Insecticidal toxins from the bacterium Photorhabdus luminescens. Science 1998, 280:2129-2132.
55. Brussow H, Canchaya C, Hardt WD: Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol Rev 2004, 68:560-602.
56. Collyn F, Guy L, Marceau M, Simonet M, Roten CA: Describing ancient horizontal gene transfers at the nucleotide and gene levels by comparative pathogenicity island genometrics. Bioinformatics 2006, 22:1072-1079.
57. Collyn F, Billault A, Mullet C, Simonet M, Marceau M: YAPI, a new Yersinia pseudotuberculosis pathogenicity island. Infect Immun 2004, 72:4784-4790.
58. Howard SL, Gaunt MW, Hinds J, Witney AA, Stabler R, Wren BW: Application of comparative phylogenomics to study the evolution of Yersinia enterocolitica and to identify genetic differences relating to pathogenicity. J Bacteriol 2006, 188:3645-3653.
59. Haller JC, Carlson S, Pederson KJ, Pierson DE: A chromosomally encoded type III secretion pathway in Yersinia enterocolitica is important in virulence. Mol Microbiol 2000, 36:1436-1446.
60. Hensel M, Shea JE, Baumler AJ, Gleeson C, Blattner F, Holden DW: Analysis of the boundaries of Salmonella pathogenicity island 2 and the corresponding chromosomal region of Escherichia coli K-12. J Bacteriol 1997, 179:1105-1111.
61. Shea JE, Hensel M, Gleeson C, Holden DW: Identification of a virulence locus encoding a second type III secretion system in Salmonella typhimurium. Proc Natl Acad Sci USA 1996, 93:2593-2597.
62. Thomson NR, Howard S, Wren BW, Prentice MB: Comparative genome analyses of the pathogenic Yersiniae based on the genome sequence of Yersinia enterocolitica strain 8081. Adv Exp Med Biol 2007, 603:2-16.
63. Prentice MB, Cuccui J, Thomson N, Parkhill J, Deery E, Warren MJ: Cobalamin synthesis in Yersinia enterocolitica 8081. Functional aspects of a putative metabolic island. Adv Exp Med Biol 2003, 529:43-46.
64. Roth JR, Lawrence JG, Bobik TA: Cobalamin (coenzyme B12): synthesis and biological significance. Annu Rev Microbiol 1996, 50:137-181.
65. Kofoid E, Rappleye C, Stojiljkovic I, Roth J: The 17-gene ethanolamine (eut) operon of Salmonella typhimurium encodes five homologues of carboxysome shell proteins. J Bacteriol 1999, 181:5317-5329.
66. Maier RJ: Use of molecular hydrogen as an energy substrate by human pathogenic bacteria. Biochem Soc Trans 2005, 33:83-85.
67. Ewing WH, Ross AJ, Brenner DJ, Fanning GR: Yersinia ruckeri sp. nov., the Redmouth (RM) Bacterium. Int J Syst Bacteriol 1978, 28:37-44.
68. Sekowska A, Dénervaud V, Ashida H, Michoud K, Haas D, Yokota A, Danchin A: Bacterial variations on the methionine salvage pathway. BMC Microbiol 2004, 4:9.
69. Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE, Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J, Bentley SD, Post JC, Ehrlich GD, Hu FZ: Comparative Genomic Analyses of Seventeen Streptococcus pneumoniae Strains: Insights into the Pneumococcal Supragenome. J Bacteriol 2007, 189:8186-95.
70. Hogg JS, Hu FZ, Janto B, Boissy R, Hayes J, Keefe R, Post JC, Ehrlich GD: Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains. Genome Biol 2007, 8:R103.
71. Mathee K, Narasimhan G, Valdes C, Qiu X, Matewish JM, Koehrsen M, Rokas A, Yandava CN, Engels R, Zeng E, Olavarietta R, Doud M, Smith RS, Montgomery P, White JR, Godfrey PA, Kodira C, Birren B, Galagan JE, Lory S: Dynamics of Pseudomonas aeruginosa genome evolution. Proc Natl Acad Sci USA 2008, 105:3100-3105.
72. Holt K, Parkhill J, Mazzoni C, Roumagnac P, Weill F, Goodhead I, Rance R, Baker S, Maskell D, Wain J, Dolecek C, Achtman M, Dougan G: High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat Genet 2008, 40:987-93.
73. Simmons S, Dibartolo G, Denef V, Goltsman D, Thelen M, Banfield J, Eisen J: Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation. PLoS Biol 2008, 6:e177.
74. Rasko D, Rosovitz M, Myers G, Mongodin E, Fricke W, Gajer P, Crabtree J, Sperandio V, Ravel J: The pan-genome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol 2008, 190:6881-93.
75. Read TD, Peterson SN, Tourasse N, Baillie LW, Paulsen IT, Nelson KE, Tettelin H, Fouts DE, Eisen JA, Gill SR, Holtzapple EK, Okstad OA, Helgason E, Rilstone J, Wu M, Kolonay JF, Beanan MJ, Dodson RJ, Brinkac LM, Gwinn M, DeBoy RT, Madupu R, Daugherty SC, Durkin AS, Haft DH, Nelson WC, Peterson JD, Pop M, Khouri HM, Radune D, et al: The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 2003, 423:81-86.
76. Tettelin H, Masignani V, Cieslewicz MJ, Eisen JA, Peterson S, Wessels MR, Paulsen IT, Nelson KE, Margarit I, Read TD, Madoff LC, Wolf AM, Beanan MJ, Brinkac LM, Daugherty SC, DeBoy RT, Durkin AS, Kolonay JF, Madupu R, Lewis MR, Radune D, Fedorova NB, Scanlan D, Khouri H, Mulligan S, Carty HA, Cline RT, Van Aken SE, Gill J, Scarselli M, et al: Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae. Proc Natl Acad Sci USA 2002, 99:12391-12396.
77. Sprague LD, Neubauer H: Yersinia aleksiciae sp. nov. Int J Syst Evol Microbiol 2005, 55:831-835.


78. Sprague LD, Scholz HC, Amann S, Busse HJ, Neubauer H: Yersinia similis sp. nov. Int J Syst Evol Microbiol 2008, 58:952-958.
79. Merhej V, Adekambi T, Pagnier I, Raoult D, Drancourt M: Yersinia massiliensis sp. nov., isolated from fresh water. Int J Syst Evol Microbiol 2008, 58:779-784.
80. Delpino MV, Marchesini MI, Estein SM, Comerci DJ, Cassataro J, Fossati CA, Baldi PC: A bile salt hydrolase of Brucella abortus contributes to the establishment of a successful infection through the oral route in mice. Infect Immun 2007, 75:299-305.
81. Sherlock O, Vejborg RM, Klemm P: The TibA adhesin/invasin from enterotoxigenic Escherichia coli is self recognizing and induces bacterial aggregation and biofilm formation. Infect Immun 2005, 73:1954-1963.
82. Liu B, Pop M: ARDB – Antibiotic Resistance Genes Database. Nucleic Acids Res 2009, 37:D443-447.
83. Antibiotic Resistance Genes Database. http://ardb.cbcb.umd.edu/.
84. Leplae R, Hebrant A, Wodak SJ, Toussaint A: ACLAME: a CLAssification of Mobile genetic Elements. Nucleic Acids Res 2004, 32:D45-49.
85. Kislyuk A, Lomsadze A, Lapidus AL, Borodovsky M: Frameshift detection in prokaryotic genomic sequences. Int J Bioinform Res Appl 2009, 5:458-477.
86. Pop M, Phillippy A, Delcher AL, Salzberg SL: Comparative genome assembly. Brief Bioinform 2004, 5:237-248.
87. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22:1658-1659.
88. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL: Versatile and open software for comparing large genomes. Genome Biol 2004, 5:R12.
89. Stewart AC, Osborne B, Read TD: DIYA: A bacterial annotation pipeline for any genomics lab. Bioinformatics 2009, 25:962-3.
90. Salzberg SL, Delcher AL, Kasif S, White O: Microbial gene identification using interpolated Markov models. Nucleic Acids Res 1998, 26:544-548.
91. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997, 25:955-964.
92. Lagesen K, Hallin P, Rødland EA, Staerfeldt HH, Rognes T, Ussery DW: RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007, 35:3100-3108.
93. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH: UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 2007, 23:1282-1288.
94. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403-410.
95. Conserved Domain Database (CDD). http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd.
96. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.
97. Talavera G, Castresana J: Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 2007, 56:564-577.
98. Read TD, Myers GS, Brunham RC, Nelson WC, Paulsen IT, Heidelberg J, Holtzapple E, Khouri H, Federova NB, Carty HA, Umayam LA, Haft DH, Peterson J, Beanan MJ, White O, Salzberg SL, Hsia RC, McClarty G, Rank RG, Bavoil PM, Fraser CM: Genome sequence of Chlamydophila caviae (Chlamydia psittaci GPIC): examining the role of niche-specific genes in the evolution of the Chlamydiaceae. Nucleic Acids Res 2003, 31:2134-2147.
99. Rasko DA, Myers GS, Ravel J: Visualization of comparative genomic analyses by BLAST score ratio. BMC Bioinformatics 2005, 6:2.
100. Bercovier H, Steigerwalt AG, Guiyoule A, Huntley-Carter G, Brenner DJ: Yersinia aldovae (Formerly Yersinia enterocolitica-Like Group X2): a New Species of Enterobacteriaceae Isolated from Aquatic Ecosystems. Int J Syst Bacteriol 1984, 34:166-172.
101. Wauters G, Janssens M, Steigerwalt AG, Brenner DJ: Yersinia mollaretii sp. nov. and Yersinia bercovieri sp. nov., Formerly Called Yersinia enterocolitica Biogroups 3A and 3B. Int J Syst Bacteriol 1988, 38:424.
102. Ursing J, Brenner DJ, Bercovier H, Fanning GR, Steigerwalt AG, Brault J, Mollaret HH: Yersinia frederiksenii: A new species of enterobacteriaceae composed of rhamnose-positive strains (formerly called atypical Yersinia enterocolitica or Yersinia enterocolitica-Like). Curr Microbiol 1980, 4:213-217.
103. Brenner DJ, Bercovier HH, Ursing J, Alonso JM, Steigerwalt AG, Fanning GR, Carter GP, Mollaret HH: Yersinia intermedia: A new species of enterobacteriaceae composed of rhamnose-positive, melibiose-positive, raffinose-positive strains (formerly called Yersinia enterocolitica or Yersinia enterocolitica-like). Curr Microbiol 1980, 4:207-212.
104. Bercovier H, Ursing J, Brenner DJ, Steigerwalt AG, Fanning GR, Carter GP, Mollaret HH: Yersinia kristensenii: A new species of enterobacteriaceae composed of sucrose-negative strains (formerly called atypical Yersinia enterocolitica or Yersinia enterocolitica-Like). Curr Microbiol 1980, 4:219-224.
105. Aleksic S, Steigerwalt AG, Bockemuehl J: Yersinia rohdei sp. nov. isolated from human and dog feces and surface water. Int J Syst Bacteriol 1987.
106. Carver T, Thomson N, Bleasby A, Berriman M, Parkhill J: DNAPlotter: circular and linear interactive genome visualization. Bioinformatics 2009, 25:119-120.

doi:10.1186/gb-2010-11-1-r1
Cite this article as: Chen et al.: Genomic characterization of the Yersinia genus. Genome Biology 2010 11:R1.


Number of regions identified within the 8 Yersinia genomes by amosvalidate. High polymorphism rate indicates regions in the assembly where multiple high-quality polymorphisms were found within 500 bp of each other. A polymorphism was considered high quality if two or more reads disagree with the consensus sequence and the sum of their quality values exceeds 40 (likelihood of error lower than 1 in 10,000). Manual inspection of these polymorphisms indicated that the majority are due to homopolymer tracts. High coverage indicates regions in the assembly that are deeper than 3 times the average coverage for the genome. Break indicates a region in the assembly where two or more singleton reads disagree with the contig structure, i.e. these reads only partially align to the contig, indicating that an alternative assembly of this genomic region is possible. Breaks near contig end indicates features that occur within 20 bp of the end of a contig. Such breaks are often an end-of-contig artifact rather than an indication of mis-assembly.
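The thresholds in this legend can be written out as a few small predicates. The following Python sketch is only an illustration of the stated criteria and is not the amosvalidate implementation: the function names are hypothetical, quality values are assumed to be Phred-scaled, and "within 500 bp of each other" is interpreted here as a gap of at most 500 bp between neighbouring polymorphic positions.

```python
def phred_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-q / 10)


def is_high_quality_polymorphism(disagreeing_read_quals, min_reads=2, min_quality_sum=40):
    """Two or more disagreeing reads whose summed quality values exceed 40."""
    return (len(disagreeing_read_quals) >= min_reads
            and sum(disagreeing_read_quals) > min_quality_sum)


def polymorphism_clusters(positions, window=500):
    """Group positions whose neighbours lie within `window` bp (assumed reading
    of the legend); keep only groups with at least two positions."""
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > window:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def is_high_coverage(depth, mean_depth, fold=3.0):
    """Deeper than 3 times the average coverage for the genome."""
    return depth > fold * mean_depth


if __name__ == "__main__":
    print(phred_error_probability(40))                # about 1 in 10,000
    print(is_high_quality_polymorphism([25, 20]))     # two reads, sum 45
    print(polymorphism_clusters([100, 400, 2000, 5000, 5300, 5700]))
```

A summed quality of 40 corresponds to an error probability of 10^-4, which is where the "1 in 10,000" figure in the legend comes from.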


Additional file 3 Estimates for Genome Sizes (in Mbp).

Species | Sequence in contigs (> 200 bp) | Sequence in contigs ([...] bp) | Genome size (sequence estimate) | Optical map size | Genome size (map estimate) | Scaffold size
Y. aldovae | 4.29 | 0.03 | 4.33 | AflII: 4.30 | AflII: 4.22 | AflII: 4.22
Y. bercovieri | 4.32 | 0.11 | 4.37 | AflII: 4.54; NheI: 4.50 | AflII: 4.19; NheI: 4.24 | AflII: 4.51; NheI: 4.52
Y. frederiksenii | 4.87 | 0.12 | 4.90 | AflII: 5.34; NheI: 5.40 | AflII: 4.96; NheI: 4.88 | AflII: 5.31; NheI: 5.30
Y. intermedia | 4.69 | 0.13 | 4.71 | AflII: 4.95; NheI: 5.07 | AflII: 4.74; NheI: 4.56 | AflII: 5.03; NheI: 5.00
Y. kristensenii | 4.65 | 0.04 | 4.77 | AflII: 4.63 | AflII: 4.46 | AflII: 4.65
Y. mollaretii | 4.54 | 0.16 | 4.57 | AflII: 4.93; NheI: 4.92 | AflII: 4.75; NheI: 4.57 | AflII: 4.88; NheI: 4.86
Y. rohdei | 4.31 | 0.02 | 4.34 | AflII: 4.65; NheI: 4.65 | AflII: 4.30; NheI: 4.20 | AflII: 4.56; NheI: 4.55
Y. ruckeri | 3.73 | 0.03 | 3.79 | AflII: 3.90; NheI: 3.96 | AflII: 3.76; NheI: 3.58 | AflII: 3.89; NheI: 3.85

[...] was used to compute the sequence estimate. Contigs mapped onto the optical map were used to estimate the expansion in optical map size and this was used to compute the map estimate of genome size.
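The relationship between the raw optical map size and the map-based genome size estimate can be explored with a short calculation. The Python sketch below takes a few (optical map size, map estimate) pairs from the table and reports the expansion they imply. The exact correction formula is not given in the legend, so dividing the raw size by (1 + expansion) is an assumption made here for illustration, and the function names are hypothetical.

```python
# Values in Mbp, copied from the table: (optical map size, map estimate).
OPTICAL_VS_MAP_ESTIMATE = {
    ("Y. aldovae", "AflII"): (4.30, 4.22),
    ("Y. bercovieri", "AflII"): (4.54, 4.19),
    ("Y. ruckeri", "AflII"): (3.90, 3.76),
    ("Y. ruckeri", "NheI"): (3.96, 3.58),
}


def implied_expansion(optical_map_size: float, map_estimate: float) -> float:
    """Fractional overestimate of the raw optical map relative to the corrected size."""
    return optical_map_size / map_estimate - 1.0


def corrected_size(optical_map_size: float, expansion: float) -> float:
    """Assumed correction: shrink the raw map size by the expansion factor."""
    return optical_map_size / (1.0 + expansion)


if __name__ == "__main__":
    for (species, enzyme), (raw, corrected) in OPTICAL_VS_MAP_ESTIMATE.items():
        expansion = implied_expansion(raw, corrected)
        print(f"{species} ({enzyme}): implied expansion {expansion:.1%}")
```

Under this assumption the implied expansions for the rows shown run from roughly 2% to a little over 10%.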


[Figure: PFGE gel image. Yersiniae lanes: 1. Y. aldovae; 2. Y. bercovieri; 3. Y. frederiksenii; 4. Y. intermedia; 5. Y. kristensenii; 6. Y. mollaretii; 7. Y. rohdei; 8. Y. ruckeri. E. coli lanes: 1. Y1088; 2. C600/P1; 3. GM119/pRK2/pINT; 4. INV110. Markers: 5 kb ladder and 1 kb ladder; labeled sizes 3 kb, 4 kb, 5 kb, 5.8 kb, 12 kb, 20 kb, 48 kb, 60 kb and 93 kb.]

Legend: Pulse field gel electrophoresis of plasmid preparations from Yersiniae strains and known E. coli strains carrying different sized plasmids. PFGE gel (1% agarose) was run under the following conditions in 0.5X TBE buffer at 14 °C, switch time 1-6 seconds, for 18 hrs at a voltage gradient of 6 V/cm.


[Figure: Parsimony Tree (with percent bootstrap support at nodes). Taxa: Y. enterocolitica, Y. kristensenii, Y. frederiksenii, Y. rohdei, Y. bercovieri, Y. mollaretii, Y. intermedia, Y. aldovae, Y. pseudotuberculosis, Y. pestis, Y. ruckeri, E. coli. Bootstrap values shown: 100, 100, 100, 100, 100, 100, 100, 67.4, 64.5.]


[Figure: Maximum Likelihood Tree (with percent bootstrap support at nodes). Taxon labels: yberc0001, ymoll0001, ykris0001, yente0001X, yrohd0001, yfred0001, yinte0001, yaldo0001, ypest0001X, ypseu0001X, yruck0001, ecoli0001X. Bootstrap values shown: 100, 99.9, 99.9, 100, 59.1, 60.4, 99.8, 97.8, 100.]


Proteins conserved in pathogenic species but missing from non pathogens

Y. enterocolitica 8081 | Length (aa) | Description | Y. pestis CO92 | Y. pseudotuberculosis IP32953 | Notes
YE0393 | 106 | primosomal replication protein n | YPO3538 | YPTB0439 |
YE0445 | 468 | probable outer membrane efflux lipoprotein (ileB) | YPO3481 | YPTB0493 |
YE0759 | 66 | Hypothetical protein | YPO3367 | YPTB0763 | *In the same location as YPO3367 but ortholog in CO92 called on the opposite strand
YE2612 | 434 | putative salicylate synthetase | YPO1916 | YPTB1601 |
YE2613 | 426 | putative signal transducer | YPO1915 | YPTB1600 |
YE2614 | 600 | inner membrane ABC transporter YbtQ | YPO1914 | YPTB1599 |
YE2615 | 600 | lipoprotein inner membrane ABC transporter | YPO1913 | YPTB1598 |
YE2616 | 319 | transcriptional regulator YbtA | YPO1912 | YPTB1597 |
YE2617 | 2035 | yersiniabactin biosynthetic protein | YPO1911 | YPTB1596 |
YE2618 | 3161 | yersiniabactin biosynthetic protein | YPO1910 | YPTB1595 |
YE2619 | 366 | yersiniabactin biosynthetic protein YbtU | YPO1909 | YPTB1594 |
YE2620 | 267 | yersiniabactin biosynthetic protein YbtT | YPO1908 | YPTB1593 |
YE2621 | 525 | yersiniabactin siderophore biosynthetic protein | YPO1907 | YPTB1592 |
YE2622 | 673 | pesticin/yersiniabactin receptor protein | YPO1906 | YPTB1591 |
YE3032 | 108 | Hypothetical protein | | YPTB1208 | No locus assigned in Y. pestis CO92 but present in other Y. pestis
YE3036 | 215 | Putative uncharacterized protein | YPO2821 | YPTB1040 | Truncated in CO92
YE3368 | 78 | Hypothetical protein | YPO0876 | YPTB3119 |
YE3390 | 145 | Hypothetical protein | YPO0904 | YPTB3179 |
YE3910 | 101 | 30S ribosomal protein S14 | YPO0222a | YPTB3685 |

Decline in number of unique genes as more genomes are added. Each value is the average of 8 random genome permutations.


YscN orthologs from the 8 Yersinia genomes in this study were grouped with the Y. pestis YscN UniRef50 group. All proteins within 25 amino acids in length of the 439-amino-acid YscN protein of Y. pestis were aligned using ClustalW and trimmed manually. A neighbor-joining tree was constructed using PHYLIP. Plasmid and chromosomal clusters of Yersinia form separate branches, and the Y. enterocolitica/Y. mollaretii branch is further from other enterocolitica-like strains than the Y. pestis/Y. pseudotuberculosis ysa ortholog.
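The length filter applied before alignment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and the toy sequences are hypothetical, 439 is taken to be the length of the Y. pestis YscN reference in amino acids, and "within 25 amino acids" is read as an absolute length difference of at most 25.

```python
def within_length_window(proteins, reference_length=439, tolerance=25):
    """Keep proteins whose length differs from the reference by at most `tolerance`."""
    return {
        name: seq
        for name, seq in proteins.items()
        if abs(len(seq) - reference_length) <= tolerance
    }


if __name__ == "__main__":
    # Toy placeholder sequences (hypothetical); only their lengths matter here.
    candidates = {
        "toy_full": "M" * 439,
        "toy_short": "M" * 413,
        "toy_edge": "M" * 464,
        "toy_long": "M" * 470,
    }
    kept = within_length_window(candidates)
    print(sorted(kept))
```

The retained sequences would then go to the aligner (ClustalW in the legend) and to manual trimming, neither of which is modelled here.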


Whole Genome Based Nucleotide Diversity

# pairwise comparisons | 55
# sites considered | 1,495,930
# mismatches | 22,539,297
pi | 0.273946909280514
5% of pi | 0.0136973454640257
# mismatches caused by 1 sequencing error | 10
# mismatches needed to change pi by 5% | 1126965
# sequencing errors needed | 112696
bermuda model | 0.9999
# sequencing errors assuming bermuda model | 150
change in pi | 1.82E-05
% change in pi | 0.01%
genome size (# sites considered) needed to create 5% change in pi | 1.13E+09
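The figures in this table can be reproduced with simple arithmetic, which makes the reasoning behind the sensitivity estimate easier to follow. The Python sketch below is a reconstruction, not the authors' worksheet. It assumes that 11 genomes were compared (because 55 = C(11, 2) and one erroneous base then creates 10 mismatches), that the "bermuda model" value 0.9999 is a per-base accuracy (error rate 10^-4), and that pi is the number of mismatches divided by pairwise comparisons times sites. The function name and dictionary keys are hypothetical.

```python
from math import comb


def pi_sensitivity(n_genomes=11, sites=1_495_930, mismatches=22_539_297,
                   error_rate=1e-4, target_fraction=0.05):
    """Recompute the table's quantities under the assumptions stated above."""
    pairs = comb(n_genomes, 2)                     # 55 pairwise comparisons
    pi = mismatches / (pairs * sites)              # average pairwise mismatch rate
    # Assumption: one erroneous base in one genome adds a mismatch against
    # each of the other genomes at that site.
    per_error = n_genomes - 1
    mismatches_needed = target_fraction * mismatches
    errors_needed = mismatches_needed / per_error
    expected_errors = sites * error_rate           # errors expected at the assumed accuracy
    delta_pi = expected_errors * per_error / (pairs * sites)
    return {
        "pairs": pairs,
        "pi": pi,
        "mismatches_needed": mismatches_needed,
        "errors_needed": errors_needed,
        "expected_errors": expected_errors,
        "delta_pi": delta_pi,
        "relative_change": delta_pi / pi,
        "sites_needed": errors_needed / error_rate,
    }


if __name__ == "__main__":
    for key, value in pi_sensitivity().items():
        print(f"{key}: {value:.6g}")
```

Read this way, the table argues that the roughly 150 sequencing errors expected at 99.99% accuracy would shift pi by far less than 5%, and that the compared region would need to be on the order of 10^9 sites before errors at that rate could produce a 5% change.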




RESEARCH Open Access

Genomic characterization of the Yersinia genus

Peter E Chen¹†, Christopher Cook¹†, Andrew C Stewart¹†, Niranjan Nagarajan²,⁷, Dan D Sommer², Mihai Pop², Brendan Thomason¹, Maureen P Kiley Thomason¹, Shannon Lentz¹, Nichole Nolan¹, Shanmuga Sozhamannan¹, Alexander Sulakvelidze³, Alfred Mateczun¹, Lei Du⁴, Michael E Zwick¹,⁵, Timothy D Read¹,⁵,⁶*

Abstract

Background: New DNA sequencing technologies have enabled detailed comparative genomic analyses of entire genera of bacterial pathogens. Prior to this study, three species of the enterobacterial genus Yersinia that cause invasive human diseases (Yersinia pestis, Yersinia pseudotuberculosis, and Yersinia enterocolitica) had been sequenced. However, there were no genomic data on the Yersinia species with more limited virulence potential, frequently found in soil and water environments.

Results: We used high-throughput sequencing-by-synthesis instruments to obtain 25- to 42-fold average redundancy, whole-genome shotgun data from the type strains of eight species: Y. aldovae, Y. bercovieri, Y. frederiksenii, Y. kristensenii, Y. intermedia, Y. mollaretii, Y. rohdei, and Y. ruckeri. The deepest branching species in the genus, Y. ruckeri, causative agent of red mouth disease in fish, has the smallest genome (3.7 Mb), although it shares the same core set of approximately 2,500 genes as the other members of the species, whose genomes range in size from 4.3 to 4.8 Mb. Yersinia genomes had a similar global partition of protein functions, as measured by the distribution of Cluster of Orthologous Groups families. Genome to genome variation in islands with genes encoding functions such as ureases, hydrogenases and B-12 cofactor metabolite reactions may reflect adaptations to colonizing specific host habitats.

Conclusions: Rapid high-quality draft sequencing was used successfully to compare pathogenic and non-pathogenic members of the Yersinia genus. This work underscores the importance of the acquisition of horizontally transferred genes in the evolution of Y. pestis and points to virulence determinants that have been gained and lost on multiple occasions in the history of the genus.
Background Ofthemillionsofspeciesofbacteriathatliveonthis planet,onlyaverysmallpercentagecauseserious humandiseases[1].Comparativegeneticstudiesare revealingthatmanypathogenshaveonlyrecently emergedfromproteanenvironmental,commensalor zoonoticpopulations[2-5].Foravarietyofreasons, mostresearchefforthasbeenfocusedoncharacterizing thesepathogens,whiletheircloselyrelatednon-pathogenicrelativeshaveonlybeenlightlystudied.Asa result,ourunderstandingofthepopulationbiologyof thesecladesremainsbiased,limitingourknowledgeof theevolutionofvirulenceandourabilitytodesign reliableassaysthatdistinguishpathogensignaturesfrom thebackgroundintheclinicandenvironment[6]. Therecentdevelopmentofsecondgenerationsequencingplatforms(reviewedbyMardis[7,8]andShendure [7,8])offersanopportunitytochangethedirectionof microbialgenomics,enablingtherapidgenomesequencingoflargenumbersofstrainsofbothpathogenicand non-pathogenicstrains.Herewedescribethedeploymentofnewsequencingtechnologytoextensivelysampleeightgenomesfromthe Yersinia genusofthefamily Enterobacteriaceae.Thefirst publishedsequencingstudiesonthe Yersinia genushavefocusedexclusivelyon invasivehumandisease-causingspeciesthatincluded five Yersiniapestis genomesequences(oneofwhich, strain91001,isfromtheavirulent ‘ microtus ’ biovar) [9-12],two Yersiniapseudotuberculosis [13,14]andone Yersiniaenterocolitica biotype1B[15].Primarilyazoonoticpathogen, Y.pestis ,thecausativeagentofbubonic *Correspondence:tread@emory.edu † Contributedequally 1 BiologicalDefenseResearchDirectorate,NavalMedicalResearchCenter,503 RobertGrantAvenue,SilverSpring,Maryland20910,USA Chen etal GenomeBiology 2010, 11 :R1 http://genomebiology.com/2010/11/1/R1 2010Chenetal.;licenseeBioMedCentralLtd.ThisisanopenaccessarticledistributedunderthetermsoftheCreativeCommons AttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse,distribution,andreproductionin anymedium,providedtheoriginalworkisproperlycited.

PAGE 2

plagueandacategoryAselectagent,isarecently emergedlineagethathassinceundergoneglobalexpansion[2].Followingintroductionintoahumanthrough fleabite[16], Y.pestis isengulfedbymacrophagesand takentotheregionallymphnodes. Y.pestis then escapesthemacrophagesandmultipliestocausea highlylethalbacteremiaifun treatedwithantibiotics. Y. pseudotuberculosis and Y.enterocolitica (primarilybiotype1B)areenteropathogenst hatcausegastroenteritis followingingestionandtranslocationofthePeyer ’ s patches.Like Y.pestis ,theenteropathogenicYersiniae canescapemacrophagesandmultiplyoutsidehostcells, butunliketheirmorevirulentcogener,theyonlyusually causeself-limitinginflammatorydiseases. Thegenerallyacceptedpathwayfortheevolutionof thesemoreseveredisease-causingYersiniaeismemorablyencapsulatedbytherecipe, ‘ addDNA,stir,reduce ’ [17].IneachspeciesDNAhasbeen ‘ added ’ byhorizontalgenetransferintheformofplasmidsandgenomic islands.Allthreehumanpathogenscarrya70-kbpYV virulenceplasmid(alsoknownaspCD),whichcarries theYsctypeIIIsecretionsystemandYopseffectors [18-20],thatisnotdetectedinnon-pathogenicspecies. Y.pestis alsohastwoadditionalplasmids,pMT(also knownaspFra),containingtheF1capsule-likeantigen andmurinetoxin,andpPla(alsoknownaspPCP1), whichcarriesplasminogen-activatingfactor,Pla. Y.pestis Y.pseudotuberculosis ,andbiotype1B Y.enterocolitica alsocontainachromosomallylocated,mobile,highpathogenicityisland(HPI)[21].TheHPIincludesa clusterofgenesforbiosynthesisofyersiniabactin,an iron-bindingsiderophorenecessaryforsystemicinfection[22]. ‘ Stir ’ referstointra-genomicchange,notably therecentexpansionofinsertionsequences(IS)within Y.pestis (3.7%ofthe Y.pestis CO92genome[9])anda highlevelofgenomestructuralvariation[23]. 
‘ Reduce ’ describesthelossoffunctionsviadeletionsandpseudogeneaccumulationin Y.pestis [9,13]duetoshiftsin selectionpressurecausedbythetransitionfrom Y.pseudotuberculosis -likeenteropathoge nicitytoaflea-borne transmissioncycle.Thisdescriptionof Y.pestis evolutionis,ofcourse,oversimplified. Y.pestis strainsshow considerablediversityatthe phenotypiclevelandthere isevidenceofacquisitionofplasmidsandotherhorizontallytransferredgenes[[12,24,25]DNAmicroarray, [26,27]]. Whilemostattentionisfocusedonthethreewellknownhumanpathogens,severalother,lessfamiliar Yersinia specieshavebeensplitofffrom Y.enterocolitica overthepast40yearsbasedonbiochemistry,serology and16SRNAsequence[28,29]. Y.ruckeri isanagriculturallyimportantfishpathogenthatisacauseof ‘ red mouth ’ diseaseinsalmonidfish.Thespecieshassufficientphylogeneticdivergencefromtherestofthe Yersinia genustostircontroversyaboutitstaxonomic assignment[30]. Y.fredricksenii Y.kristensenii Y.inter-media Y.mollaretii Y.bercovieri ,and Y.rohdei have beenisolatedfromhumanfeces,freshwater,animal fecesandintestinesandfoods[28].Therehavebeen reportsassociatingsomeofthespecieswithhuman diarrhealinfections[31]andlethalityformice[32]. Y. aldovae ismostoftenisolatedfromfreshwaterbuthas alsobeenculturedfromfishandthealimentarytractsof wildrodents[33].Thereisnoreportofisolationof Y. aldovae fromhumanfecesorurine[28]. 
Usingmicrobead-based,massivelyparallelsequencing bysynthesis[34]werapidlyandeconomicallyobtained highredundancygenomesequenceofthetypestrainsof eachoftheseeightlesserknown Yersinia species.From thesegenomesequences,wewereabletodeterminethe coregenesetthatdefinesthe Yersinia genusandto lookforcluestodistinguishthegenomesofhuman pathogensfromlessvirulentstrains.ResultsHigh-redundancydraftgenomesequencesofeight Yersinia speciesWholegenomeshotguncove rageofeightpreviously unsequenced Yersinia species(Table1)wasobtainedby single-endbead-basedpyrosequencing[34]usingthe 454LifeSciencesGS-20instrument.Eachoftheeight genomeswassequencedtoahighlevelofredundancy (between25and44sequencingreadsperbase)and assembled denovo intolargecontigs(Table2;Additionalfile1).Excludingcontigsthatcoveredrepeat regionsandthereforehadsignificantlyincreasedcopy number,thequalityofthesequenceofthedraftassemblieswashigh,withlessthan0.1%ofthesequenceof eachgenomehavingaconsensusqualityscore[35]less than40.Moreover,amorerecentassessmentofquality ofGS-20datasuggeststhatthescoresgeneratedbythe 454LifeSciencessoftwareareanunderestimationofthe truesequencequality[36 ].Themostcommonsequencingerrorencounteredwhenassemblingpyrosequencingdataistherarecallingofincorrectnumbersof homopolymerscausedbyvariationintheintensityof fluorescenceemitteduponextensionwiththelabeled nucleoside[34]. Previousstudiesandourexperiencesuggestthatat thislevelofsequencecoveragetheassemblygapsfallin repeatregionsthatcannotbespannedbysingle-end sequencereads(averagelength109nucleotidesinthis study)[34].FewerRNAgenesareobservedcomparedto published Yersinia genomesfinishedusingtraditional Sangersequencingtechnology(Additionalfile1),reflectingthegreaterdifficultyofuniquelyassemblingrepetitivesequenceswithsingle-endreads.Weassessedthe qualityofourassembliesusingmetricsimplementedinChen etal GenomeBiology 2010, 11 :R1 http://genomebiology.com/2010/11/1/R1 Page2of18


the amosvalidate package [37]. Specifically, we focused on three measures frequently correlated with assembly errors: density of polymorphisms within assembled reads, depth of coverage, and breakpoints in the alignment of unassembled reads to the final assembly. Regions in each genome where at least one measure suggested a possible mis-assembly were validated by manual inspection (Additional file 2). Many of the suspect regions corresponded to collapsed repeats, where the location of individual members of the repeat family within the genome could not be accurately determined. Based on the results of the amosvalidate analysis and the optical map alignment we found no evidence of mis-assemblies leading to chimeric contigs in the eight genomes we sequenced. Genomic regions flagged by the amosvalidate package are made available in GFF format (compatible with most genome browsers) in Additional file 3.

Genome sizes were estimated initially as the sum of the sizes of the contigs from the shotgun assembly, with corrections for contigs representing collapsed repeats (Table 2). We also derived an independent estimate for the genome size from the whole-genome optical restriction mapping of the species [38] (Additional file 4).
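The sequence-based size estimate described above can be illustrated with a short sketch. It is not the authors' pipeline: the `Contig` record, the 500-nt length cutoff, and the use of the median depth of large contigs as the single-copy depth are assumptions made purely for illustration. Each contig contributes its length multiplied by an estimated copy number, so a collapsed repeat is counted as many times as its read depth suggests.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Contig:
    """Hypothetical record for one assembled contig."""
    name: str
    length: int        # contig length in bp
    mean_depth: float  # mean read depth over the contig


def estimate_genome_size(contigs, min_length=500):
    """Sum contig lengths, counting suspected collapsed repeats more than once.

    Assumption: the median depth of the large contigs approximates the
    depth of single-copy sequence, so depth / median estimates copy number.
    """
    large = [c for c in contigs if c.length >= min_length]
    if not large:
        return 0
    single_copy_depth = median(c.mean_depth for c in large)
    total = 0
    for c in large:
        # A contig is never counted less than once.
        copies = max(1, round(c.mean_depth / single_copy_depth))
        total += c.length * copies
    return total
```

With this kind of correction a 500-bp contig at three times the typical depth contributes 1,500 bp to the estimate rather than 500 bp.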
Alignment of contigs to the optical maps [39] suggested that the optical maps consistently overestimated sizes (2 to 10% on average). After correction, the map-based estimates and sequence-based estimates agreed well (within 7%). Two species, Y. aldovae (4.22 to 4.33 Mbp) and Y. ruckeri (3.58 to 3.89 Mbp), have a substantially reduced total genome size compared with the 4.6 to 4.8 Mbp seen in the genus generally. The agreement between the optical maps and sequence-based estimates of genome sizes tallied with experimental evidence for the lack of large plasmids in the sequenced genomes (Additional file 5). A screen for matches to known

Table 1 Strains sequenced in this study
Species | ATCC number | Other designations | Year isolated | Location isolated | Description | Optimum growth temperature | Reference
Y. aldovae | 35236T | CNY 6065 | NR | Czechoslovakia | Drinking water | 26°C | [100]
Y. bercovieri | 43970T | CDC 2475-87 | NR | France | Human stool | 26°C | [101]
Y. frederiksenii | 33641T | CDC 1461-81, CIP 80-29 | NR | Denmark | Sewage | 26°C | [102]
Y. intermedia | 29909T | CIP 80-28 | NR | NR | Human urine | 37°C | [103]
Y. kristensenii | 33638T | CIP 80-30 | NR | NR | Human urine | 26°C | [104]
Y. mollaretii | 43969T | CDC 2465-87 | NR | USA | Soil | 26°C | [101]
Y. rohdei | 43380T | H271-36/78, CDC 3022-85 | 1978 | Germany | Dog feces | 26°C | [105]
Y. ruckeri | 29473T | 2396-61 | 1961 | Idaho, USA | Rainbow trout (Oncorhynchus mykiss) with red mouth disease | 26°C | [67]
NR, not reported in reference publication.

Table 2 Genomes summary
Species | Type strain | NCBI project ID | GenBank accession number | Total reads | Number of contigs >500 nt | Total length of large contigs | % large contigs

plasmid genes produced only a few candidate plasmid contigs, totaling less than 10 kbp of sequence in each genome.

The number of IS elements per genome for the eight species (12 to 167 matches) discovered using the ISfinder database [40] was much lower than in the Y. pestis genome (1,147 matches; copy number estimates took into account the possibility of mis-assembly and were accordingly adjusted; see Methods). Furthermore, the non-pathogenic species with the most IS matches, namely Y. bercovieri (167 matches), Y. aldovae (143 matches) and Y. ruckeri (136 matches), have comparatively smaller genomes. We also searched for novel repeat families using a de novo repeat-finder [41] and collected a non-redundant set of 44 repeat sequence families in the Yersinia genus (Table 3; Additional file 6). Interestingly, the well-known ERIC element [42] was recovered by our de novo search and was found to be present in many copies in all the pathogenic species, but was relatively rare in the non-pathogenic ones. On the other hand, a similar and recently discovered element, YPAL [43] (also recovered by the de novo search), was abundant in all the Yersinia genomes except the fish pathogen Y. ruckeri. Insertion sequence IS1541C in the ISfinder database, which has expanded in Y. pestis (to more than 60 copies), had only a handful of strong matches in Y. enterocolitica, Y. pseudotuberculosis, and Y. bercovieri and no discernible matches in the other Yersinia genomes.

New Yersinia genome data reduce the pool of unique detection targets for Y. pestis and Y. enterocolitica

The sequences generated in this study provide new background information for validating genus detection and diagnosis assays targeting pathogenic members of the Yersinia genus. The assay design process commonly starts by computationally identifying genomic regions that are unique to the targeted genus (‘signatures’); an ideal signature is shared by all targeted pathogens but not found in a background comprising non-pathogenic near neighbors or in other unrelated microbes. While many pathogens are well characterized at the genomic level, the background set is only sparsely represented in
genomic databases, thereby limiting the ability to computationally screen out non-specific candidate assays (false positives). As a result, many assays may fail experimental field tests, thereby increasing the costs of assay development efforts. To evaluate whether the new genomic sequences generated in our study can reduce the incidence of false positives in assay development, we computed signatures for the Y. pestis and Y. enterocolitica genomes using the Insignia pipeline [44], the system previously used to successfully develop assays for the detection of V. cholerae [44]. We identified 171 and 100 regions within the genomes of Y. pestis and Y. enterocolitica, respectively, that represent good candidates for the design of detection assays. In Y. pestis these regions tended to cluster around the origin of replication, whereas in Y. enterocolitica there was a more even distribution. The average G+C content of the regions for the unique sequences in both species was close to the Yersinia average (47%) and there was not a strong association with putative genome islands (Additional files 7, 8, 9, 10, 11, 12, [45]). For both species, most regions overlapped predicted genes (161 of 171 (94%) and 96 of 100 (96%) in Y. pestis and Y. enterocolitica, respectively). Interestingly, 171 Y. pestis gene regions were spread over only 70 different genes, whereas the 96 Y. enterocolitica regions were found overlapping only 90 genes. There was no obvious trend in the nature of the genes harboring these putative signals except that many could arguably be classed as ‘non-core’ functions,

Table 3 Distribution of common repeat sequences
Columns: ERIC (127 bp); YPAL (167 bp); Kristensenii39 (142 bp); IS1541C (708 bp); Aldovae3 (154 bp)
E. coli 03505
Y. pestis 5443336138
Y. pseudotuberculosis 555229536
Y. enterocolitica 63144100375
Y. aldovae 68446040
Y. bercovieri 9456913
Y. frederiksenii 057605
Y. intermedia 29148043
Y. kristensenii 29970059
Y. mollaretii 66226020
Y. rohdei 037807
Y. ruckeri 452002
Three of the repeat sequences found using de novo searches matched the known repeat elements ERIC, YPAL, and IS1541C and are identified as such.
Kristensenii39 and Aldovae3 are elements found from de novo searches in the Y. kristensenii and Y. aldovae genomes, respectively.


encoding phage endonucleases, invasins, hemolysins and hypothetical proteins.

Ten Y. pestis-specific and 31 Y. enterocolitica-specific putative signatures have significant matches in the new genome sequence data (Additional files 7, 8, 9, 10), indicating that assays designed within these regions would give false positive results. This result underscores the need for further sampling of genomes of the Yersinia genus in order to assist the design of diagnostic assays.

Yersinia whole-genome comparisons

We performed a multiple alignment of the 11 Yersinia species using the MAUVE algorithm [46] (from here on Y. pestis CO92 and Y. pseudotuberculosis IP32953 were used as the representative genomes of their species) and obtained 98 locally collinear blocks (LCBs; Additional files 13, 14, [47]). The mean length of the LCBs was 23,891 bp. The shortest block was 1,570 bp, and the longest was 201,130 bp. This multiple alignment of the ‘core’ region on average covered 52% of each Yersinia genome. The nucleotide diversity (π) for the concatenated aligned region was 0.27, or an approximate genus-wide nucleotide sequence identity of 73%. As expected for a set of bacteria with this level of diversity, the alignment of the genomes shows evidence of multiple large genome rearrangements [23] (Additional file 13).

Using an automated pipeline for annotation and clustering of protein orthologs based on the Markov chain clustering tool MCL [48], we estimated the size of the Yersinia protein core set to be 2,497 and the pan-genome [49] to be 27,470 (Additional files 15, 16, 17, 18). The core number falls asymptotically as genomes are introduced and hence this estimate is somewhat lower than the recent analysis of only the Y.
enterocolitica, Y. pseudotuberculosis and Y. pestis genomes (2,747 core proteins) [15]. We found 681 genes to be in exactly one copy in each Yersinia genome and to be nearly identical in length. We used ClustalW [50] to align the members of this highly conserved set, and concatenated individual gene product alignments to make a dataset of 170,940 amino acids for each of the species. Uninformative characters were removed from the dataset and a phylogeny of the genus was computed using Phylip [51] (Figure 1). The topology of this tree was identical whether distance or parsimony methods were used (Additional files 19, 20) and was also identical to a tree based on the nucleotide sequence of the approximately 1.5 Mb of the core genome in LCBs (see above). The genus broke down into three major clades: the outlying fish pathogen, Y. ruckeri; Y. pestis/Y. pseudotuberculosis; and the remainder of the ‘enterocolitica’-like species. Y. kristensenii ATCC 33638T was the nearest neighbor of Y. enterocolitica 8081. The outlying position of Y. ruckeri was confirmed further when we analyzed the contribution of the genome to reducing the size of the Yersinia core protein families set. If Y. ruckeri was excluded, the Yersinia core would be 2,232 protein families with two or more members (N ≥ 2) rather than 2,072 (Table 4). In contrast, omission of any one of the 10 other species increased the set by a maximum of only 22 families.
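The leave-one-out comparison behind Table 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: the input format (a dictionary mapping a family identifier to a list of `(genome, protein_id)` pairs) and both function names are hypothetical, and the sketch filters one fixed clustering rather than recalculating clusters after removing a genome.

```python
def core_families(families, genomes, min_members=2):
    """Count families with at least min_members proteins and
    at least one member in every genome listed."""
    required = set(genomes)
    count = 0
    for members in families.values():
        present = {genome for genome, _ in members}
        if len(members) >= min_members and required <= present:
            count += 1
    return count


def leave_one_out(families, genomes, min_members=2):
    """Core size with no genome excluded ('None') and with each genome excluded in turn."""
    result = {"None": core_families(families, genomes, min_members)}
    for excluded in genomes:
        kept = [g for g in genomes if g != excluded]
        # Drop the excluded genome's proteins from every family (assumption:
        # the clustering itself is left unchanged).
        reduced = {
            fam: [(g, p) for g, p in members if g != excluded]
            for fam, members in families.items()
        }
        result[excluded] = core_families(reduced, kept, min_members)
    return result
```

A genome whose removal raises the count far more than any other, as Y. ruckeri does in Table 4, is the one missing the most families that the remaining genomes share.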
Clustering the significant Cluster of Orthologous Groups (COG) hits [52] for each genome hierarchically (Figure 2) yielded a similar pattern for the three basic clades. The overall composition of the COG matches in each genome, as measured by the proportion of the numbers in each COG supercategory, was similar throughout the genus, with the notable exceptions of the high percentage of group L COGs in Y. pestis due to the expansion of IS recombinases and the relatively low number of group G (sugar metabolism) COGs in Y. ruckeri (Figure 2).

Shared protein clusters in pathogenic Yersinia: yersiniabactin biosynthesis is the key chromosomal function specific to high virulence in humans

The Yersinia proteomes were investigated for common clusters in the three high virulence species missing from the low human virulence genomes (Figure 3). Because of the close evolutionary relationship of the ‘enterocolitica’ clade strains, the number of unique protein clusters in Y. enterocolitica was reduced to a greater degree than the more phylogenetically isolated Y. pestis and Y. pseudotuberculosis. Many of the same genome islands identified as recent horizontal acquisition by Y. pestis and/or Y. pseudotuberculosis [9,13,15] were not present in any of the newly sequenced genomes. However, some genes, interesting from the perspective of the host specificity of the Y. pestis/Y. pseudotuberculosis ancestor, were detected in other Yersinia species for the first time. These included orthologs of YPO3720/YPO3721, a hemolysin and activator protein in Y. intermedia, Y. bercovieri and Y. frederiksenii; YPO0599, a heme utilization protein also found in Y. intermedia; and YPO0399, an enhancin metalloprotease that had an ortholog in Y. kristensenii (ykris0001_41250). Enhancin was originally identified as a factor promoting baculovirus infection of gypsy moth midgut by degradation of mucin [53]. Other loci in Y.
pestis/Y. pseudotuberculosis linked with insect infection, the TccC and TcABC toxin clusters [54], were also found in Y. mollaretii. In Y. mollaretii the Tca and Tcc proteins show about 90% sequence identity to Y. pestis/Y. pseudotuberculosis and share identical flanking chromosomal locations. Further work will need to be undertaken to resolve whether the insertion of the toxin genes in Y. mollaretii is an independent horizontal transfer event or occurred prior to divergence of the species.
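The presence/absence comparison used throughout this section (protein clusters shared by the high-virulence species but missing from the low-virulence genomes) reduces to a set filter. The sketch below assumes a hypothetical input, a dictionary mapping each cluster identifier to the genomes that contribute at least one member; the function name and genome labels are illustrative only.

```python
def clusters_specific_to(clusters, required, excluded):
    """Return IDs of clusters found in every `required` genome
    and in none of the `excluded` genomes."""
    required = set(required)
    excluded = set(excluded)
    hits = []
    for cluster_id, genomes in clusters.items():
        present = set(genomes)
        if required <= present and not (present & excluded):
            hits.append(cluster_id)
    return sorted(hits)
```

Passing the three pathogens as `required` and the eight newly sequenced genomes as `excluded` would give the kind of list summarized in Figure 3, subject to the caveat raised in the text that a cluster may appear specific only because its counterpart falls in an unassembled region of a draft genome.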


After comparison of the new low virulence genomes, the number of protein clusters shared by Y. enterocolitica and the other two pathogens was reduced to 12 and 13 for Y. pseudotuberculosis and Y. pestis, respectively (Figure 3). The remaining shared proteins were either identified as phage-related or of unknown role, providing few clues to possible functions that might define distinct pathogenic niches. Applying a similar analysis strategy to other genomes of the ‘enterocolitica’ clade and Y. pestis or Y. pseudotuberculosis gave a similar result in terms of numbers and types of shared protein clusters.

Only sixteen clusters of chromosomal proteins were found to be common to all three high-virulence species but absent from all eight non-pathogens (Figure 3). Eleven of these are components of the yersiniabactin biosynthesis operon (Additional file 21), further highlighting the critical importance of this iron-binding siderophore for invasive disease. The other proteins are generally small proteins that are likely included because they fall in unassembled regions of the eight draft genomes. One other small island of three proteins constituting a multi-drug efflux pump (YE0443 to YE0445) was common to the high-virulence species but missing from the eight draft low-virulence species.

Variable regions of Y. enterocolitica clade genomes

The basic metabolic similarities of Y. enterocolitica and the seven species on the main branch of the Yersinia genus phylogenetic tree are further illustrated in Figure 4, where the best protein matches against each Y. enterocolitica 8081 gene product [15] are plotted against a circular genome map. Very few genes exclusive to Y.
enterocolitica 8081 were found outside of prophage regions, which is a typical result when groups of closely

Table 4 Yersinia core size reduction by exclusion of one species
Species excluded | Core protein families
None | 2,072
Y. enterocolitica | 2,074
Y. aldovae | 2,085
Y. bercovieri | 2,079
Y. frederiksenii | 2,077
Y. intermedia | 2,080
Y. kristensenii | 2,076
Y. mollaretii | 2,078
Y. rohdei | 2,091
Y. ruckeri | 2,232
Y. pseudotuberculosis | 2,076
Y. pestis | 2,094
The core protein families with number of members 2 or greater were recalculated in each case (see Materials and methods) with the protein set from one genome missing.

[Figure panels A to D: Sensitivity versus 1-Specificity, both axes 0.00 to 1.00]

Figure 1 Yersinia whole-genome phylogeny. The phylogeny of the Yersinia genus was constructed from a dataset of 681 concatenated, conserved protein sequences using the Neighbor-Joining (NJ) algorithm implemented by PHYLIP [51]. The tree was rooted using E. coli. The scale measures number of substitutions per residue. Tree topologies computed using maximum likelihood and parsimony estimates are identical with each other and the NJ tree (Additional file 20). The only branches not supported in more than 99% of the 1,000 bootstrap replicates using both methods are marked with asterisks. Both these branches were supported by >57% of replicates.


related bacterial genomes are compared [55]. One of the largest islands found in Y. enterocolitica 8081 was the 66-kb Y. pseudotuberculosis adhesion pathogenicity island (YAPIye) [15,56,57], a unique feature of biotype 1B strains. YAPIye, containing a type IV pilus gene cluster and other putative virulence determinants, such as arsenic resistance, is similar to a 99-kb YAPIpst that is found in several other serotypes of Y. pseudotuberculosis [14,57] but is missing in Y. pestis and the serotype I Y. pseudotuberculosis strain IP32953 [14]. A model has been proposed for the acquisition of YAPI in a common ancestor of Y. pseudotuberculosis and Y. enterocolitica and subsequent degradation to various degrees within the Y. pseudotuberculosis clade. However, the complete absence of YAPI from any of the seven species in the Y. enterocolitica branch (Figure 4), as well as from most strains of Y. enterocolitica [15], argues against an ancient acquisition of YAPI, but instead suggests the recent

Figure 2 Comparison of major COG groups in Yersinia genomes. Bars represent the number of proteins assigned to COG superfamilies [52] for each genome, based on matches to the Conserved Domain Database [95] with an E-value threshold < 10^-10. The COG groups are: U, intracellular trafficking; G, carbohydrate transport and metabolism; R, general function prediction; I, lipid transport and metabolism; D, cell cycle control; H, coenzyme transport and metabolism; B, chromatin structure; P, inorganic ion transport and metabolism; W, extracellular structures; O, post-translational modification; J, translation; A, RNA processing and editing; L, replication, recombination and repair; C, energy production; M, cell wall/membrane biogenesis; Q, secondary metabolite biosynthesis; Z, cytoskeleton; V, defense mechanisms; E, amino acid transport and metabolism; K, transcription; N, cell motility; T, signal transduction; F, nucleotide transport; S, function unknown.


Figure 3 Distribution of protein clusters across Y. enterocolitica 8081, Y. pestis CO92, and Y. pseudotuberculosis IP32953. (a) The Venn diagram shows the number of protein clusters unique or shared between the two other high virulence Yersinia species (see Materials and methods). (b) The number of shared and unique clusters that do not contain a single member of the eight low human virulence genomes sequenced in this study.


independent acquisition of related islands by both Y. enterocolitica biogroup 1B and Y. pseudotuberculosis.

Many genes previously thought to be unique to Y. enterocolitica in general and biotype 1B in particular turned out to have orthologs in the low human virulence species sequenced in this study. These included several putative biotype 1B-specific genes identified by microarray-based screening [58], including YE0344 HylD hemophore (yinte0001_41550 has 78% nucleotide identity), YE4052 metalloprotease (yinte0001_36030 has 95% nucleotide identity), and YE4088, a two-component sensor kinase, which had orthologs in all species. Large portions of the biogroup 1B-specific island containing the Yts1 type II secretion system were found in Y. ruckeri, Y. mollaretii, and Y. aldovae. Y. aldovae and Y. mollaretii also had islands containing ysa type three secretion systems (TTSS) with 75 to 85% nucleotide identity to the homolog in Y. enterocolitica 1B. The

Figure 4 Protein-based comparison of Y. enterocolitica 8081 to the Yersinia genus. The map represents the blast score ratio (BSR) [98,99] to the protein encoded by Y. enterocolitica [15]. Blue indicates a BSR > 0.70 (strong match); cyan 0.69 to 0.4 (intermediate); green < 0.4 (weak). Red and pink outer circles are locations of the Y. enterocolitica genes on the + and - strands. The genomes are ordered from outside to inside based on the greatest overall similarity to Y. enterocolitica: Y. kristensenii, Y. frederiksenii, Y. mollaretii, Y. intermedia, Y. bercovieri, Y. aldovae, Y. rohdei, Y. ruckeri, Y. pseudotuberculosis, and Y. pestis. The black bars on the outside refer to genome islands in Y. enterocolitica identified by Thomson et al. [15].


ysa genes are a chromosomal cluster [9,13,15] that in Y. enterocolitica, at least, appears to play a role in virulence [59]. The Y. enterocolitica ysa genes are found in the plasticity zone (Figure 4) and have very low similarity to the Y. pestis and Y. pseudotuberculosis ysa genes, which are more similar to the Salmonella SPI-2 island [60,61] and are found between orthologs of YPO0254 and YPO0274 [9]. Species within the Yersinia genus had either the Y. enterocolitica type of ysa TTSS locus or the Y. pestis/SPI-2 type (with the exception of Y. aldovae, which has both; Additional file 22). This suggested the exchange of chromosomal TTSS genes within Yersinia.

The modular nature of the islands found in the Y. enterocolitica genome was demonstrated further by two examples gleaned from comparison with the evolutionarily closest low human virulence genome, Y. kristensenii ATCC 33638T (Figure 1). The YGI-3 island [15] in Y. enterocolitica 8081 is a degraded integrated plasmid; at the same chromosomal locus in Y. kristensenii ATCC 33638T a prophage was found, suggesting that the YGI-3 location may be a recombinational hotspot. Another Y. enterocolitica 8081 island, YGI-1, encodes a ‘tight adherence’ (tad) locus responsible for non-specific surface binding. Y. kristensenii ATCC 33638T had an identical 13-gene tad locus in the same position, but the nucleotide sequence identity of the region to Y. enterocolitica 8081 was uniformly lower than that found for the rest of the genome, suggesting either that a gene conversion event had replaced the tad locus with a set of new alleles in the recent history of Y. kristensenii or Y. enterocolitica, or that the locus was under very high positive selective pressure.

Niche-specific metabolic adaptations in the Yersinia genus

Comparison of the Y. enterocolitica genome to Y. pestis and Y. pseudotuberculosis revealed some potentially significant metabolic differences that may account for varying tropisms in gastric infections [62]. Y. enterocolitica 8081 alone contained entire gene clusters for cobalamin (vitamin B12) biosynthesis (cbi), 1,2-propanediol utilization (pdu), and tetrathionate respiration (ttr). In Y.
enterocolitica and Salmonella typhimurium [63,64], vitamin B12 is produced under anaerobic conditions where it is used as a cofactor in 1,2-propanediol degradation, with tetrathionate serving as an electron acceptor. This study showed the genes for this pathway to be a general feature of species in the ‘enterocolitica’ branch of the Yersinia genus (with the caveat that some portions are missing in some species; for example, Y. rohdei is missing the pdu cluster (Table 5)). Additionally, Y. intermedia, Y. bercovieri, and Y. mollaretii contained gene clusters encoding degradation of the membrane lipid constituent ethanolamine. Ethanolamine metabolism under anaerobic conditions also requires the B12 cofactor. Y. intermedia contained the full 17-gene cluster reported in S. typhimurium [65], including structural components of the carboxysome organelle. Another discovery from the Y. enterocolitica genome analysis was the presence of two compact hydrogenase gene clusters, Hyd-2 and Hyd-4 [15]. Hydrogen released from fermentation by intestinal microflora is imputed to be an important energy source for enteric gut pathogens [66]. Both gene clusters are conserved across all the other seven enterocolitica-branch species, but are missing from Y. pestis and Y. pseudotuberculosis. Y. ruckeri contained a single [NiFe]-containing hydrogenase complex.

Y. ruckeri, the most evolutionarily distant member of the genus (Figure 1) with the smallest genome (3.7 Mb), had several features that distinguished it from its congeners. The Y. ruckeri O-antigen operon contained a neuB sialic acid synthase gene; therefore, the bacterium was predicted to produce a sialylated outer surface structure.
Among the common Yersinia genes that are missing

Table 5 Key niche-specific genes in Yersinia
Columns: cbi; pdu; ttr; eut; hyd-2; hyd-4; ure; mtn; opg
Y. enterocolitica +++-+++++
Y. aldovae ++--+++++
Y. bercovieri +++ eutABC +++++
Y. frederiksenii +++-+++++
Y. intermedia +++ eutSPQTDMNEJGHABCLKR +++++
Y. kristensenii +++-+++++
Y. mollaretii ++eutABC +++++
Y. rohdei +-+-+++++
Y. ruckeri ----+/-hyfABCGHINfdhF+/-(hyaD,hypEDB)--+
Y. pseudotuberculosis ------+++
Y. pestis ------+/---
Abbreviations: cbi, cobalamin (vitamin B12) biosynthesis; pdu, 1,2-propanediol utilization; ttr, tetrathionate respiration; eut, ethanolamine degradation; hyd-2 and hyd-4, hydrogenases 2 and 4, respectively; ure, urease; mtn, methionine salvage pathway; opg, osmoprotectant (synthesis of periplasmic branched glucans).


only in Y. ruckeri were those for xylose utilization and urease activity, consistent with phenotypes that have long been known in clinical microbiology [67] (Table 3). Surprisingly, we discovered that Y. ruckeri was also missing the mtnKADCBEU gene cluster that comprises the majority of the methionine salvage pathway [68] found in most other Yersiniae. These genes have also been deleted from Y. pestis, but as with Y. ruckeri, the mtnN (methylthioadenosine nucleosidase) gene is maintained. The loss of these genes in Y. pestis has been interpreted as a consequence of adaptation to an obligate host-dwelling lifecycle, where the availability of the sulfur-containing amino acids is not a nutritional limitation [15].

Discussion

Whole-genome shotgun sequencing by high-throughput bead-based pyrosequencing has proved remarkably useful for the large-scale sequencing of closely related bacteria [49,69-74]. High-quality de novo assemblies can be obtained with relatively few errors and gaps when the sequence read coverage redundancy is 15-fold or greater. Closing all the gaps in each genome sequence is time-consuming and costly; therefore, in the near future there will be an excess of draft bacterial sequences versus closed genomes in public databases. Our analysis strategy here melds both draft and complete genomes using consistent automated annotation that is scalable to encompass potentially much larger datasets. High quality draft sequencing is likely to shortly supersede comparative genome hybridization using microarrays [25,58,75,76] as the most popular strategy for genome-wide bacterial comparisons. Genome sequence datasets can be used to shed light on the novel functions in close relatives that may have been lost in the pathogen of interest, as well as orthologs in genomes that fall below the threshold for hybridization-based detection. The problems of using microarrays for comparisons of more diverse bacterial taxa are illustrated in a study of the Yersinia genus, using many of the strains sequenced in this work, where the estimated number of core genes was found to be only 292 [25].
We cannot claim complete coverage of all the type strains of the Yersinia genus, as three new species have been created [77-79] since our work began. Nonetheless, from this extensive genomic survey we have attempted to categorize the features that define Yersinia. The core of about 2,500 proteins present in all 11 species is not a subset of any other enterobacterial genome. Species of the Y. enterocolitica clade (Figure 1) have overall a similar array of protein functions and contain a number of conserved gene clusters (cobalamin, hydrogenases, ureases, and so on) found in other bacteria (Helicobacter, Campylobacter, Salmonella, Escherichia coli) that colonize the mammalian gut. Y. pestis has lost many of these genes by deletion or disruption since its split from the enteric pathogen Y. pseudotuberculosis and adoption of an insect vector-mediated pathogenicity mode. The smaller Y. ruckeri chromosome does not appear to result from recent reductive evolution (as is the case of Y. pestis), as evidenced by the relatively low number of frameshifts and pseudogenes, and the normal amount of repetitive contigs in the newbler genome assembly. Like Y. pestis, Y. ruckeri lacks urease, methionine salvage genes, and B12-related metabolism. The prevailing consensus is that the pathway of transmission of red mouth disease in fish is gastrointestinal, yet the similarities of Y. ruckeri genome reduction to Y. pestis hint at an alternative mode of infection for Y. ruckeri.

This comparative genomic study reaffirms that the distinguishing feature of the high-level mammalian pathogens is the acquisition of a particular set of mobile elements: HPI, the pYV, pMT1 and pPCP plasmids, and the YAPI island.
However, the eight species sequenced in this study, believed to have either low or zero potential for human infection, contain numerous, apparently horizontally transferred genes that would be considered putative virulence determinants if discovered in the genome of a more serious pathogen. Two examples are yaldo0001_40900 (bile salt hydrolase) and yfred0001_36480, an ortholog of the TibA adhesin of enterotoxigenic E. coli. Bile salt hydrolase in pathogenic Brucella abortus has been shown to enhance bile resistance during oral mouse infections [80] and the TibA adhesin forms a biofilm that mediates human cell invasion [81]. The low-virulence species contain a similar (and in some cases greater) number of matches to known drug resistance mechanisms that have been curated in the Antibiotic Resistance Genes Database [82] (Additional file 23, [83]). Adding DNA, stirring and reducing [17] is, therefore, the general recipe for Yersinia genome evolution rather than a formula specific to pathogens. Comparative genomic studies such as these can be used to enhance our ability to rapidly assess the virulence potential of a genome sequence of an emerging pathogen, and we plan to continue to build more extensive databases of non-pathogenic Yersinia genomes that will allow us to draw conclusions with more statistical power than is possible with just 11 representative species.

Conclusions

Genomes of the 11 Yersinia species studied range in estimated size from 3.7 to 4.8 Mb. The nucleotide diversity (π) of the conserved backbone based on large collinear conserved blocks was calculated to be 0.27. There were no orthologs of genes and predicted proteins in the virulence-associated plasmids pYV, pMT1, and pPla,


and the HPI of Y. pestis in the genomes of the type strains of eight non- or low-pathogenic Yersinia species. Apart from functions encoded on the aforementioned plasmids, HPI and YAPI regions, only nine proteins detected as common to all three Yersinia pathogen species (Y. pestis, Y. enterocolitica and Y. pseudotuberculosis) were not found in at least one of the other eight species. Therefore, our study is in agreement with the hypothesis that genes acquired by recent horizontal transfer effectively define the members of the Yersinia genus virulent for humans.

The core proteome of the 11 Yersinia species consists of approximately 2,500 proteins. Yersinia genomes had a similar global partition of protein functions, as measured by the distribution of COG families. Genome to genome variation in islands with genes encoding functions such as ureases, hydrogenases and B12 cofactor metabolite reactions may reflect adaptations to colonizing specific host habitats.

Y. ruckeri, a salmonid fish pathogen, is the earliest branching member of the genus and has the smallest genome (3.7 Mb). Like Y. pestis, Y. ruckeri lacks functional urease, methionine salvage genes, and B12-related metabolism. These losses may reflect adaptation to a lifestyle that does not include colonization of the mammalian gut.

The absence of the YAPI island in any of the seven ‘Y. enterocolitica clade’ genomes likely indicates that YAPI was acquired independently in Y. enterocolitica and Y.
pseudotuberculosis.

We identified 171 and 100 regions within the genomes of Y. pestis and Y. enterocolitica, respectively, that represented potential candidates for the design of nucleotide sequence-based assays for unique detection of each pathogen.

Materials and methods

Bacterial strains

Type strains of the eight Yersinia species sequenced in this study (Table 1) were acquired from the American Type Culture Collection (ATCC) and propagated at 37°C or 25°C (Y. ruckeri) on Luria media. DNA for genome sequencing was prepared from overnight broth cultures propagated from single colonies streaked on a Luria agar plate using the Promega Wizard Maxiprep System (Promega, Madison, WI, USA).

Genome sequencing and assembly

Genomes were sequenced using the Genome Sequencer 20 Instrument (454 Life Sciences Inc., Branford, CT) [34]. Libraries for sequencing were prepared from 5 μg of genomic DNA. The sequencing reads for each project were assembled de novo using the newbler program (version 01.51.02; 454 Life Sciences Inc).

Optical mapping

Optical maps [38] for each genome using the restriction enzymes AflII and NheI (Y. aldovae and Y. kristensenii only have maps for AflII) were constructed by Opgen Inc. (Madison, WI). The newbler assemblies for each genome were scaffolded using the optical maps and the SOMA package [39] (Additional file 4). Assemblies that did not align against the optical map were tested for high read coverage, unusual GC content, and good matches to plasmid-associated genes from the ACLAME database [84] (BLAST E-value less than 10^-20) to identify sequences that could potentially be part of an extrachromosomal element.

Detection of disrupted genes

We used two methods for detecting disrupted proteins. In the first method, clustered protein groups were used to adduce evidence for possible gene disruption events. The clusters were parsed for pairs of proteins that met the following criteria: both from the same genome; encoded by genes located on the same strand with less than 200 bp separating their frames; and total length of the combined genes not greater than 120% of the longest gene in the cluster. The second method used was the FSFIND algorithm [85] with a
standard bacterial gene model to compare the accumulation of predicted frameshifts across different genomes.

Assembly validation

In order to rule out artifacts due to undocumented features of the newbler assemblies, new assemblies were generated for validation purposes by re-mapping all the shotgun reads to the sequence of the assembled contigs using AMOScmp [86]. The resulting assembly was then subjected to analysis using the amosvalidate package [37]. The output of this program includes a list of genomic regions that contain inconsistencies highlighting possible misassemblies. The resulting regions were manually inspected to reduce the possibility of assembly errors. The regions flagged by the amosvalidate package are provided in GFF (general feature format), compatible with most genome browsers (Additional file 3).

Insertion sequences and de novo repeat finding

The presence of repeats is known to confound assembly programs and the newbler assembler is known to collapse high-fidelity repeat instances into a single contig. To account for the possibility of such misassemblies, we computed the copy number of contigs based on coverage statistics and used this information to correct our estimates for the abundance of classes of repeats (Additional file 3). To find known insertion sequences, the genomes were scanned for matches using the ISfinder web service

PAGE 13

[40]withaBLASTE-valuethresholdof10-10(matchesto knownrepeatcontigswerecountedasmultiplematches basedonthecoverageofthecontig).Inaddition,we searchedforcommonrepeatsequencesinthegenome usingtheRepeatScoutprogram[41]afterduplicating knownrepeatcontigs.Therepeatsfoundineachgenome werecollected(64sequences)andtransformedintoa non-redundantsetof44sequencesusingtheCD-HIT program[87](Additionalfile6).Therepeatsfoundwere thensearchedagainstallthegenomesusingBLASTwith anE-valuethresholdof10-10torecordmatches.The resultantfiguresforrepeatc ontentareestimationsthat maybelowerthanthetruenumberfoundinthe genomes.FindinguniqueDNAsignaturesin Y.pestis and Y. enterocoliticaDNAsignaturesforthe Y.pestis andthe Y.enterocolitica genomeswereidentifiedusingtheInsigniapipeline [44].Signaturesof100bporlongerwereconsidered goodcandidatesforthedesignofdetectionassays. Thesesignatureswerethencomparedwiththegenomes ofthe Yersinia strainssequencedduringthecurrent studyusingtheMUMmerpackage[88]withdefault parameters.Signaturesthatmatchedbymorethan40 bpweredeemedinvalidated,astheywouldlikelyleadto false-positiveresults.AutomatedannotationWeusedDIYA[89]forautomatedannotation,which isapipelineforintegratingbacterialanalysistools. UsingDIYA,theassembliesgeneratedby newbler were scaffoldedbasedontheopti calmap,concatenated,and usedasatemplatefortheprogramsGLIMMER[90], tRNASCAN-SE[91],andRNAmmer[92]forpredictionofopenreadingframesandRNAgenes,respectively.Allpredictedproteinsencodedbyeachcoding sequencewerecomparedagainstadatabaseofallproteinspredictedfromthecanonicalannotationof Y. 
pestis CO92[9]asapreliminaryscreenforpotentially novelfunctions.TheGenBankformatfilescreated fromtheeightgenomessequencedinthisstudywere combinedwithotherDIYA-annotated,published wholegenomestoformadatasetforanalysis.AllproteinsweresearchedagainsttheUniRef50database (July2008)[93]usingBLASTP[94]andagainstthe ConservedDomainDatabase[95]usingRPSBLAST [96]withanE-valuethresholdof10-10torecord matches.DatabaseaccessionnumbersTheannotatedgenomedataweresubmittedtoNCBI GenBankandthesequencedatasubmittedtotheNCBI ShortReadArchive(SRA).Theaccessionnumbersare: Y.rohdei ,ATCC_43380:[Genbank:ACCD00000000]/ [SRA:SRA009766.1]; Y.ruckeri ATCC_29473:[Genbank: ACCC00000000]/[SRA:SRA009767.1]; Y.aldovae ATCC_35236:[Genbank:ACCB00000000]/[SRA: SRA009760.1]; Y.kristensenii ATCC_33638:[Genbank: ACCA00000000]/[SRA:SRA009764.1]; Y.intermedia ATCC_29909:[Genbank:AALF00000000]/[SRA: SRA009763.1]; Y.frederiksenii ATCC_33641:[Genbank: AALE00000000]/[SRA:SRA009762.1]; Y.mollaretii ATCC_43969:[Genbank:AALD00000000]/[SRA: SRA009765.1]; Y.bercovieri ATCC_43970:[Genbank: AALC00000000]/[SRA:SRA009761.1].Whole-genomealignmentusingMAUVEYersinia genomeswerealignedusingthestandard MAUVE[46]algorithmwithde faultsettings.Acutoff for1,500bpwassetastheminimumLCBlength. 
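A minimal sketch, in Python, of how the 1,500 bp LCB cutoff and the nucleotide diversity calculation described in this section could fit together. The in-house script is not described in detail, so everything here is an assumption for illustration: the function names, the in-memory `blocks` structure (one dict of genome name to aligned sequence per LCB), and the rule that each sequencing error adds a mismatch against all 10 other genomes. The three counts in the last lines (55 pairwise comparisons, 1,495,930 sites, 22,539,297 mismatches) are those given in the supplementary worksheet for the nucleotide diversity calculation.

```python
from itertools import combinations


def core_columns(blocks, n_genomes, min_length=1500):
    """Yield alignment columns with an unambiguous base in every genome.

    `blocks` is a hypothetical in-memory form of the aligner output: a list
    of dicts mapping genome name -> aligned sequence (gaps as '-').
    Blocks missing a genome or shorter than `min_length` are skipped.
    """
    for block in blocks:
        rows = [seq.upper() for seq in block.values()]
        if len(rows) != n_genomes or len(rows[0]) < min_length:
            continue
        for column in zip(*rows):
            if all(base in "ACGT" for base in column):
                yield column


def nucleotide_diversity(columns):
    """Return (pi, sites, mismatches): mean pairwise mismatches per site."""
    sites = mismatches = pairs_per_site = 0
    for column in columns:
        pairs = list(combinations(column, 2))
        pairs_per_site = len(pairs)
        mismatches += sum(a != b for a, b in pairs)
        sites += 1
    if sites == 0 or pairs_per_site == 0:
        return 0.0, sites, mismatches
    return mismatches / (pairs_per_site * sites), sites, mismatches


def errors_to_shift_pi(mismatches, n_genomes, fraction=0.05):
    """Erroneous bases needed to change the mismatch total by `fraction`.

    Assumption: each error adds a mismatch against all other genomes.
    """
    return fraction * mismatches / (n_genomes - 1)


# Counts taken from the supplementary worksheet for the pi calculation.
pi_reported = 22_539_297 / (55 * 1_495_930)
errors_needed = errors_to_shift_pi(22_539_297, n_genomes=11)
print(f"pi = {pi_reported:.4f}; errors for a 5% shift = {errors_needed:,.0f}")
```

With those counts the sketch gives a nucleotide diversity of about 0.274 and roughly 112,696 erroneous bases for a 5% shift, in line with the figure quoted in this section.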
LCBs for each genome were extracted from the output of the program and concatenated. From the alignment, nucleotide diversity was calculated by an in-house script using positions where there was a base in all 11 genomes. Because of the size of the dataset, the calculated value of π is very robust in terms of sequence error. We calculated that 112,696 nucleotides of sequence in the concatenated core would have to be wrong to alter the estimation of π by 5% (Additional file 24). PHYLIP [51] programs were used to build a consensus tree of the MAUVE alignment with bootstrapping of 1,000 replicates. The underlying model for each replicate was Fitch-Margoliash. The final phylogeny was resolved according to the majority consensus rule.

Clustering protein orthologs
The complete predicted proteome from all genomes annotated in this study was searched against itself using BLASTP with default parameters. We removed short, spurious, and non-homologous hits by setting a bit score/alignment length filtering threshold of 0.4 and minimum protein length of 30. Predicted proteins passing this filter were clustered into families based on these normalized distances using the MCL algorithm [48] with an inflation parameter value of 4. These parameters were based on an investigation of clustering 12 completed E. coli genomes, which produced very similar results to a previous study [42].

Whole-genome phylogenetic reconstruction
From the results of clustering analysis, 681 proteins were found that had exactly one member in each of the genomes and for which the length of each protein in the cluster was nearly identical. These protein sequences were aligned using ClustalW [50], and individual gene alignments were concatenated into a string of 170,940 amino acids for each genome. Uninformative characters were removed from the dataset using Gblocks [97] and a phylogeny reconstructed with PHYLIP [51] under a neighbor-joining model. To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed.

Additional file 1: Statistics from DIYA and frameshift detection programs on eight genomes sequenced in this study and other enterobacterial genomes from NCBI. Statistics from running DIYA [89] and frameshift detection programs on the eight genomes sequenced in this study and various other enterobacterial genomes downloaded from NCBI.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S1.xls]
Additional file 2: Results of amosvalidate analysis on the eight genomes of this study. Results of amosvalidate [37] analysis on the eight genomes of this study.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S2.doc]
Additional file 3: Additional annotation files. These consist of ISfinder [40], RepeatScout [41] and amosvalidate [37] results (GFF format); repeats found by RepeatScout in fasta format, scaffold files (NCBI AGP format); and information about length of contigs, read count, estimated repeat number, count in scaffold and whether or not the contig was placed by SOMA [39].
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S3.gz]
Additional file 4: Estimates for genome sizes (in Mbp) based on optical map data. Estimates for genome sizes (in Mbp) based on optical map data.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S4.doc]
Additional file 5: Pulsed field gel analysis of the eight sequenced Yersinia species and failure to detect plasmids. An E. coli strain with known plasmids was a positive control.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S5.doc]
Additional file 6: Sequences of the detected repeat families. Sequences of the detected repeat families.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S6.txt]
Additional file 7: Y. pestis CO92 signatures longer than 100 bp computed by the Insignia pipeline. Y. pestis CO92 signatures longer than 100 bp computed by the Insignia [44] pipeline.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S7.txt]
Additional file 8: Sequences of the new genomes that match (that is, invalidate) the Y. pestis CO92 signatures listed in Additional file 7. Sequences of the new genomes that match (that is, invalidate) the Y. pestis CO92 signatures listed in Additional file 7.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S8.txt]
Additional file 9: Y. enterocolitica signatures longer than 100 bp computed by the Insignia pipeline. Y. enterocolitica signatures longer than 100 bp computed by the Insignia pipeline.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S9.txt]
Additional file 10: Sequences of the new genomes that match (that is, invalidate) the Y. enterocolitica signatures. Sequences of the new genomes that match (that is, invalidate) the Y. enterocolitica signatures.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S10.txt]
Additional file 11: Y. pestis genome with the Insignia-identified repeats and genome islands plotted. Y. pestis genome with the Insignia-identified repeats and genome islands identified using IslandViewer [45] plotted. The figure was created using DNAPlotter [106].
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S11.png]
Additional file 12: Y. enterocolitica genome with the Insignia-identified repeats and genome islands plotted. Y. enterocolitica genome with the Insignia-identified repeats and genome islands identified using IslandViewer [45] plotted. The figure was created using DNAPlotter [106].
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S12.png]
Additional file 13: Output of the MAUVE [46] alignment of 11 Yersinia species. The eight genomes sequenced in this study are represented as pseudocontigs, ordered by a combination of optical mapping and alignment to the closest completed reference genome.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S13.jpeg]
Additional file 14: Whole genome multiple alignment produced by MAUVE of the 11 Yersinia genomes. Whole genome multiple alignment produced by MAUVE of the 11 Yersinia genomes in XMFA format [106].
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S14.zip]
Additional file 15: Output of the cluster analysis of the 11 Yersinia species. The top level directory consists of a directory called Additional_cluster_files and 5010 directories, one for each multi-protein cluster family. (This top level directory has been split into three data files for uploading purposes (Additional files 15, 16, 17).) Within the directory are the following files: PGL1_unique_Yersinia_unclustered.out - list of all protein singletons that MCL did not group into a cluster (see Materials and Methods); PGL1_Yersinia_unique_locus_tags.txt - names of the 11 locus tag prefixes used for each genome; PGL1_unique_Yersinia.gff - mapping each Yersinia protein to a cluster in tab delimited GFF; PGL1_unique_Yersinia.sigfile - list of the longest protein in each cluster; PGL1_unique_Yersinia.summary - summary table of features of each of the clusters; PGL1_unique_Yersinia.table - summary table of each protein in the clusters. Within each cluster directory are the following files, where 'x' is the cluster name: PGL1_unique_Yersinia-x.faa - multifasta file of the proteins in the cluster; PGL1_unique_Yersinia-x.summary - summary of the properties of the proteins; PGL1_unique_Yersinia-x.matches - BLAST matches between the proteins of the cluster; PGL1_unique_Yersinia-x.muscle.fasta - MUSCLE alignment of the proteins; PGL1_unique_Yersinia-x.muscle.fasta.gblo - Gblocks output of MUSCLE alignment (that is, auto-trimmed alignment); PGL1_unique_Yersinia-x.muscle.fasta.gblo.htm - as above in HTML format; PGL1_unique_Yersinia-x.muscle.tree - tree file from MUSCLE alignment; PGL1_unique_Yersinia-x.sif - matches between proteins in simple interaction format for display on graphing software.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S15.zip]
Additional file 16: Output of the cluster analysis of the 11 Yersinia species. The top level directory consists of a directory called Additional_cluster_files and 5010 directories, one for each multi-protein cluster family. (This top level directory has been split into three data files for uploading purposes (Additional files 15, 16, 17).) Within the directory are the following files: PGL1_unique_Yersinia_unclustered.out - list of all protein singletons that MCL did not group into a cluster (see Materials and Methods); PGL1_Yersinia_unique_locus_tags.txt - names of the 11 locus tag prefixes used for each genome; PGL1_unique_Yersinia.gff - mapping each Yersinia protein to a cluster in tab delimited GFF; PGL1_unique_Yersinia.sigfile - list of the longest protein in each cluster; PGL1_unique_Yersinia.summary - summary table of features of each of the clusters; PGL1_unique_Yersinia.table - summary table of each protein in the clusters. Within each cluster directory are the following files, where 'x' is the cluster name: PGL1_unique_Yersinia-x.faa - multifasta file of the proteins in the cluster; PGL1_unique_Yersinia-x.summary - summary of the properties of the proteins; PGL1_unique_Yersinia-x.matches - BLAST matches between the proteins of the cluster; PGL1_unique_Yersinia-x.muscle.fasta - MUSCLE alignment of the proteins; PGL1_unique_Yersinia-x.muscle.fasta.gblo - Gblocks output of MUSCLE alignment (that is, auto-trimmed alignment); PGL1_unique_Yersinia-x.muscle.fasta.gblo.htm - as above in HTML format; PGL1_unique_Yersinia-x.muscle.tree - tree file from MUSCLE alignment; PGL1_unique_Yersinia-x.sif - matches between proteins in simple interaction format for display on graphing software.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S16.zip]
Additional file 17: Output of the cluster analysis of the 11 Yersinia species. The top level directory consists of a directory called Additional_cluster_files and 5010 directories, one for each multi-protein cluster family. (This top level directory has been split into three data files for uploading purposes (Additional files 15, 16, 17).) Within the directory are the following files: PGL1_unique_Yersinia_unclustered.out - list of all protein singletons that MCL did not group into a cluster (see Materials and Methods); PGL1_Yersinia_unique_locus_tags.txt - names of the 11 locus tag prefixes used for each genome; PGL1_unique_Yersinia.gff - mapping each Yersinia protein to a cluster in tab delimited GFF; PGL1_unique_Yersinia.sigfile - list of the longest protein in each cluster; PGL1_unique_Yersinia.summary - summary table of features of each of the clusters; PGL1_unique_Yersinia.table - summary table of each protein in the clusters. Within each cluster directory are the following files, where 'x' is the cluster name: PGL1_unique_Yersinia-x.faa - multifasta file of the proteins in the cluster; PGL1_unique_Yersinia-x.summary - summary of the properties of the proteins; PGL1_unique_Yersinia-x.matches - BLAST matches between the proteins of the cluster; PGL1_unique_Yersinia-x.muscle.fasta - MUSCLE alignment of the proteins; PGL1_unique_Yersinia-x.muscle.fasta.gblo - Gblocks output of MUSCLE alignment (that is, auto-trimmed alignment); PGL1_unique_Yersinia-x.muscle.fasta.gblo.htm - as above in HTML format; PGL1_unique_Yersinia-x.muscle.tree - tree file from MUSCLE alignment; PGL1_unique_Yersinia-x.sif - matches between proteins in simple interaction format for display on graphing software.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S17.zip]
Additional file 18: Complete protein sets for the 11 species of Yersinia. Complete protein sets for the 11 species of Yersinia.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S18.zip]
Additional file 19: Inferred evolutionary trees reconstructed using PHYLIP [51] of the 11 Yersinia species proteomes based on parsimony. To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed. E. coli was used as an outgroup species.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S19.pdf]
Additional file 20: Inferred evolutionary trees reconstructed using PHYLIP [51] of the 11 Yersinia species proteomes based on maximum likelihood. To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed. E. coli was used as an outgroup species.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S20.pdf]
Additional file 21: Twenty proteins conserved in pathogenic strains but missing from the non-pathogen set. A curve showing the rate of decline in number of this set as more non-pathogen genomes are added is also included.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S21.doc]
Additional file 22: Phylogeny of TTSS component YscN in Yersinia and other enterobacteria species. Phylogeny of TTSS component YscN in Yersinia and other enterobacteria species.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S22.doc]
Additional file 23: Putative antibiotic resistance genes in the Yersinia genus determined using the Antibiotic Resistance Genes Database. Putative antibiotic resistance genes in the Yersinia genus determined using the Antibiotic Resistance Genes Database [82].
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S23.xls]
Additional file 24: Calculations for the estimation of π from aligned Yersinia core genomes. Calculations for the estimation of π from aligned Yersinia core genomes.
Click here for file [http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1S24.doc]

Abbreviations
ATCC: American Type Culture Collection; COG: Cluster of Orthologous Groups; HPI: high-pathogenicity island; IS: insertion sequence; LCB: locally collinear block; SRA: Short Read Archive; TTSS: type III secretion system; YAPI: Y. pseudotuberculosis adhesion pathogenicity island.

Acknowledgements
We would like to thank Ayra Akmal, Kim Bishop-Lilly, Mike Cariaso, Brian Osborne, Bill Klimke, Tim Welch, Jennifer Tsai, Cheryl Timms Strauss and members of the 454 Service Center for their help and advice in completing this manuscript. This work was supported by grant TMTI0068_07_NM_T from the Joint Science and Technology Office for Chemical and Biological Defense (JSTO-CBD), Defense Threat Reduction Agency Initiative to TDR. The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the US Department of the Navy, US Department of Defense, or the US Government. Some of the authors are employees of the US Government, and this work was prepared as part of their official duties. Title 17 USC §105 provides that 'Copyright protection under this title is not available for any work of the United States Government.' Title 17 USC §101 defines a US Government work as a work prepared by a military service member or employee of the US Government as part of that person's official duties.

Author details
¹Biological Defense Research Directorate, Naval Medical Research Center, 503 Robert Grant Avenue, Silver Spring, Maryland 20910, USA. ²University of Maryland Institute for Advanced Computer Sciences, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA. ³Emerging Pathogens Institute and Department of Molecular Genetics and Microbiology, University of Florida College of
Medicine, Gainesville, Florida 32610, USA. ⁴454 Life Sciences Inc., 15 Commercial Street, Branford, Connecticut 06405, USA. ⁵Department of Human Genetics, Emory University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322, USA. ⁶Division of Infectious Diseases, Emory University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322, USA. ⁷Current address: Computational and Mathematical Biology, Genome Institute of Singapore, Singapore-127726.

Authors' contributions
TDR, MEZ, LD, and SS were involved in study design. AS and AM were involved in materials. LD, MPKT, SL, and NNo were involved in 454 sequencing. SS, MPKT, and CC were involved in additional experiments. PEC, TDR, CC, MEZ, ACS, NN, MP, BT, and DDS were involved in data analysis. TDR, MP, and NN wrote the paper.

Received: 23 May 2009 Revised: 7 October 2009 Accepted: 4 January 2010 Published: 4 January 2010

References
1. Ecker DJ, Sampath R, Willett P, Wyatt JR, Samant V, Massire C, Hall TA, Hari K, McNeil JA, Buchen-Osmond C, Budowle B: The Microbial Rosetta Stone Database: a compilation of global and emerging infectious microorganisms and bioterrorist threat agents. BMC Microbiol 2005, 5:19.
2. Achtman M, Zurth K, Morelli G, Torrea G, Guiyoule A, Carniel E: Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc Natl Acad Sci USA 1999, 96:14043-14048.
3. van Baarlen P, van Belkum A, Summerbell RC, Crous PW, Thomma BP: Molecular mechanisms of pathogenicity: how do pathogenic microorganisms develop cross-kingdom host jumps? FEMS Microbiol Rev 2007, 31:239-277.
4. Van Ert MN, Easterday WR, Huynh LY, Okinaka RT, Hugh-Jones ME, Ravel J, Zanecki SR, Pearson T, Simonson TS, U'Ren JM, Kachur SM, Leadem-Dougherty RR, Rhoton SD, Zinser G, Farlow J, Coker PR, Smith KL, Wang B, Kenefic LJ, Fraser-Liggett CM, Wagner DM, Keim P: Global Genetic Population Structure of Bacillus anthracis. PLoS ONE 2007, 2:e461.
5. Zwick ME, McAfee F, Cutler DJ, Read TD, Ravel J, Bowman GR, Galloway DR, Mateczun A: Microarray-based resequencing of multiple Bacillus anthracis isolates. Genome Biol 2005, 6:R10.
6. Ahmed N, Dobrindt U, Hacker J, Hasnain SE: Genomic fluidity and pathogenic bacteria: applications in diagnostics, epidemiology and intervention. Nat Rev Microbiol 2008, 6:387-394.
7. Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet 2008, 24:133-141.
8. Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol 2008, 26:1135-1145.
9. Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, Prentice MB, Sebaihia M, James KD, Churcher C, Mungall KL, Baker S, Basham D, Bentley SD, Brooks K, Cerdeño-Tárraga AM, Chillingworth T, Cronin A, Davies RM, Davis P, Dougan G, Feltwell T, Hamlin N, Holroyd S, Jagels K, Karlyshev AV, Leather S, Moule S, Oyston PC, Quail M, Rutherford K, et al: Genome sequence of Yersinia pestis, the causative agent of plague. Nature 2001, 413:523-527.
10. Deng W, Burland V, Plunkett G, Boutin A, Mayhew GF, Liss P, Perna NT, Rose DJ, Mau B, Zhou S, Schwartz DC, Fetherston JD, Lindler LE, Brubaker RR, Plano GV, Straley SC, McDonough KA, Nilles ML, Matson JS, Blattner FR, Perry RD: Genome sequence of Yersinia pestis KIM. J Bacteriol 2002, 184:4601-4611.
11. Song Y, Tong Z, Wang J, Wang L, Guo Z, Han Y, Zhang J, Pei D, Zhou D, Qin H, Pang X, Han Y, Zhai J, Li M, Cui B, Qi Z, Jin L, Dai R, Chen F, Li S, Ye C, Du Z, Lin W, Wang J, Yu J, Yang H, Wang J, Huang P, Yang R: Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans. DNA Res 2004, 11:179-197.
12. Chain PS, Hu P, Malfatti SA, Radnedge L, Larimer F, Vergez LM, Worsham P, Chu MC, Andersen GL: Complete genome sequence of Yersinia pestis strains Antiqua and Nepal516: evidence of gene reduction in an emerging pathogen. J Bacteriol 2006, 188:4453-4463.
13. Chain PS, Carniel E, Larimer FW, Lamerdin J, Stoutland PO, Regala WM, Georgescu AM, Vergez LM, Land ML, Motin VL, Brubaker RR, Fowler J, Hinnebusch J, Marceau M, Medigue C, Simonet M, Chenal-Francisque V, Souza B, Dacheux D, Elliott JM, Derbise A, Hauser LJ, Garcia E: Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc Natl Acad Sci USA 2004, 101:13826-13831.
14. Eppinger M, Rosovitz MJ, Fricke WF, Rasko DA, Kokorina G, Fayolle C, Lindler LE, Carniel E, Ravel J: The complete genome sequence of Yersinia pseudotuberculosis IP31758, the causative agent of Far East scarlet-like fever. PLoS Genet 2007, 3:e142.
15. Thomson NR, Howard S, Wren BW, Holden MT, Crossman L, Challis GL, Churcher C, Mungall K, Brooks K, Chillingworth T, Feltwell T, Abdellah Z, Hauser H, Jagels K, Maddison M, Moule S, Sanders M, Whitehead S, Quail MA, Dougan G, Parkhill J, Prentice MB: The Complete Genome Sequence and Comparative Genome Analysis of the High Pathogenicity Yersinia enterocolitica Strain 8081. PLoS Genet 2006, 2:e206.
16. Rollins SE, Rollins SM, Ryan ET: Yersinia pestis and the plague. Am J Clin Pathol 2003, 119(Suppl):S78-85.
17. Wren BW: The yersiniae – a model genus to study the rapid evolution of bacterial pathogens. Nat Rev Microbiol 2003, 1:55-64.
18. Cornelis GR: The Yersinia Ysc-Yop virulence apparatus. Int J Med Microbiol 2002, 291:455-462.
19. Juris SJ, Shao F, Dixon JE: Yersinia effectors target mammalian signaling pathways. Cell Microbiol 2002, 4:201-211.
20. Viboud GI, Bliska JB: Yersinia outer proteins: role in modulation of host cell signaling responses and pathogenesis. Annu Rev Microbiol 2005, 59:69-89.
21. Schubert S, Rakin A, Heesemann J: The Yersinia high-pathogenicity island (HPI): evolutionary and functional aspects. Int J Med Microbiol 2004, 294:83-94.
22. Carniel E: The Yersinia high-pathogenicity island: an iron-uptake island. Microbes Infect 2001, 3:561-569.
23. Darling AE, Miklos I, Ragan MA: Dynamics of genome rearrangement in bacterial populations. PLoS Genet 2008, 4:e1000128.
24. Anisimov AP, Lindler LE, Pier GB: Intraspecific diversity of Yersinia pestis. Clin Microbiol Rev 2004, 17:434-464.
25. Wang X, Han Y, Li Y, Guo Z, Song Y, Tan Y, Du Z, Rakin A, Zhou D, Yang R: Yersinia genome diversity disclosed by Yersinia pestis genome-wide DNA microarray. Can J Microbiol 2007, 53:1211-1221.
26. Welch TJ, Fricke WF, McDermott PF, White DG, Rosso ML, Rasko DA, Mammel MK, Eppinger M, Rosovitz MJ, Wagner D, Rahalison L, Leclerc JE, Hinshaw JM, Lindler LE, Cebula TA, Carniel E, Ravel J: Multiple antimicrobial resistance in plague: an emerging public health risk. PLoS ONE 2007, 2:e309.
27. Derbise A, Chenal-Francisque V, Pouillot F, Fayolle C, Prévost MC, Médigue C, Hinnebusch BJ, Carniel E: A horizontally acquired filamentous phage contributes to the pathogenicity of the plague bacillus. Mol Microbiol 2007, 63:1145-1157.
28. Sulakvelidze A: Yersiniae other than Y. enterocolitica, Y. pseudotuberculosis, and Y. pestis: the ignored species. Microbes Infect 2000, 2:497-513.
29. Bottone EJ, Bercovier H, Mollaret HH: Genus XLI. Yersinia Van Loghem 1944, 15AL. Bergey's Manual of Systematic Bacteriology 2005, 2:838-846.
30. Kotetishvili M, Kreger A, Wauters G, Morris JG Jr, Sulakvelidze A, Stine OC: Multilocus sequence typing for studying genetic relationships among Yersinia species. J Clin Microbiol 2005, 43:2674-2684.
31. Noble MA, Barteluk RL, Freeman HJ, Subramaniam R, Hudson JB: Clinical significance of virulence-related assay of Yersinia species. J Clin Microbiol 1987, 25:802-807.
32. Robins-Browne RM, Cianciosi S, Bordun AM, Wauters G: Pathogenicity of Yersinia kristensenii for mice. Infect Immun 1991, 59:162-167.
33. Fukushima H, Gomyoda M, Kaneko S: Mice and moles inhabiting mountainous areas of Shimane Peninsula as sources of infection with Yersinia pseudotuberculosis. J Clin Microbiol 1990, 28:2448-2455.
34. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437:376-380.
35. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998, 8:186-194.
36. Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum C, Jaffe DB: Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res 2008, 18:763-770.
37. Phillippy AM, Schatz MC, Pop M: Genome assembly forensics: finding the elusive mis-assembly. Genome Biol 2008, 9:R55.
38. Samad AH, Cai WW, Hu X, Irvin B, Jing J, Reed J, Meng X, Huang J, Huff E, Porter B: Mapping the genome one molecule at a time – optical mapping. Nature 1995, 378:516-517.
39. Nagarajan N, Read TD, Pop M: Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 2008, 24:1229-35.
40. Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M: ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res 2006, 34:D32-36.
41. Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics 2005, 21(Suppl 1):i351-358.
42. Hulton CS, Higgins CF, Sharp PM: ERIC sequences: a novel family of repetitive elements in the genomes of Escherichia coli, Salmonella typhimurium and other enterobacteria. Mol Microbiol 1991, 5:825-834.
43. De Gregorio E, Silvestro G, Venditti R, Carlomagno MS, Di Nocera PP: Structural organization and functional properties of miniature DNA insertion sequences in yersiniae. J Bacteriol 2006, 188:7876-7884.
44. Phillippy AM, Mason JA, Ayanbule K, Sommer DD, Taviani E, Huq A, Colwell RR, Knight IT, Salzberg SL: Comprehensive DNA signature discovery and validation. PLoS Comput Biol 2007, 3:e98.
45. Langille MG, Brinkman FS: IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics 2009, 25:664-665.
46. Darling AC, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 2004, 14:1394-1403.
47. MAUVE Aligner User Guide. http://asap.ahabs.wisc.edu/mauve-aligner/mauve-user-guide/.
48. Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 2002, 30:1575-1584.
49. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, et al: Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA 2005, 102:13950-13955.
50. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: ClustalW and ClustalX version 2.0. Bioinformatics 2007, 23:2947-2948.
51. Felsenstein J: PHYLIP: Phylogeny Inference Package, version 3.6. Seattle, WA, USA: University of Washington; 2001.
52. Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000, 28:33-36.
53. Lepore LS, Roelvink PR, Granados RR: Enhancin, the granulosis virus protein that facilitates nucleopolyhedrovirus (NPV) infections, is a metalloprotease. J Invertebr Pathol 1996, 68:131-140.
54. Bowen D, Rocheleau TA, Blackburn M, Andreev O, Golubeva E, Bhartia R, ffrench-Constant RH: Insecticidal toxins from the bacterium Photorhabdus luminescens. Science 1998, 280:2129-2132.
55. Brussow H, Canchaya C, Hardt WD: Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol Rev 2004, 68:560-602.
56. Collyn F, Guy L, Marceau M, Simonet M, Roten CA: Describing ancient horizontal gene transfers at the nucleotide and gene levels by comparative pathogenicity island genometrics. Bioinformatics 2006, 22:1072-1079.
57. Collyn F, Billault A, Mullet C, Simonet M, Marceau M: YAPI, a new Yersinia pseudotuberculosis pathogenicity island. Infect Immun 2004, 72:4784-4790.
58. Howard SL, Gaunt MW, Hinds J, Witney AA, Stabler R, Wren BW: Application of comparative phylogenomics to study the evolution of Yersinia enterocolitica and to identify genetic differences relating to pathogenicity. J Bacteriol 2006, 188:3645-3653.
59. Haller JC, Carlson S, Pederson KJ, Pierson DE: A chromosomally encoded type III secretion pathway in Yersinia enterocolitica is important in virulence. Mol Microbiol 2000, 36:1436-1446.
60. Hensel M, Shea JE, Baumler AJ, Gleeson C, Blattner F, Holden DW: Analysis of the boundaries of Salmonella pathogenicity island 2 and the corresponding chromosomal region of Escherichia coli K-12. J Bacteriol 1997, 179:1105-1111.
61. Shea JE, Hensel M, Gleeson C, Holden DW: Identification of a virulence locus encoding a second type III secretion system in Salmonella typhimurium. Proc Natl Acad Sci USA 1996, 93:2593-2597.
62. Thomson NR, Howard S, Wren BW, Prentice MB: Comparative genome analyses of the pathogenic Yersiniae based on the genome sequence of Yersinia enterocolitica strain 8081. Adv Exp Med Biol 2007, 603:2-16.
63. Prentice MB, Cuccui J, Thomson N, Parkhill J, Deery E, Warren MJ: Cobalamin synthesis in Yersinia enterocolitica 8081. Functional aspects of a putative metabolic island. Adv Exp Med Biol 2003, 529:43-46.
64. Roth JR, Lawrence JG, Bobik TA: Cobalamin (coenzyme B12): synthesis and biological significance. Annu Rev Microbiol 1996, 50:137-181.
65. Kofoid E, Rappleye C, Stojiljkovic I, Roth J: The 17-gene ethanolamine (eut) operon of Salmonella typhimurium encodes five homologues of carboxysome shell proteins. J Bacteriol 1999, 181:5317-5329.
66. Maier RJ: Use of molecular hydrogen as an energy substrate by human pathogenic bacteria. Biochem Soc Trans 2005, 33:83-85.
67. Ewing WH, Ross AJ, Brenner DJ, RFG: Yersinia ruckeri sp. nov., the Redmouth (RM) Bacterium. Int J Syst Bacteriol 1978, 28:37-44.
68. Sekowska A, Dénervaud V, Ashida H, Michoud K, Haas D, Yokota A, Danchin A: Bacterial variations on the methionine salvage pathway. BMC Microbiol 2004, 4:9.
69. Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE, Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J, Bentley SD, Post JC, Ehrlich GD, Hu FZ: Comparative Genomic Analyses of Seventeen Streptococcus pneumoniae Strains: Insights into the Pneumococcal Supragenome. J Bacteriol 2007, 189:8186-95.
70. Hogg JS, Hu FZ, Janto B, Boissy R, Hayes J, Keefe R, Post JC, Ehrlich GD: Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains. Genome Biol 2007, 8:R103.
71. Mathee K, Narasimhan G, Valdes C, Qiu X, Matewish JM, Koehrsen M, Rokas A, Yandava CN, Engels R, Zeng E, Olavarietta R, Doud M, Smith RS, Montgomery P, White JR, Godfrey PA, Kodira C, Birren B, Galagan JE, Lory S: Dynamics of Pseudomonas aeruginosa genome evolution. Proc Natl Acad Sci USA 2008, 105:3100-3105.
72. Holt K, Parkhill J, Mazzoni C, Roumagnac P, Weill F, Goodhead I, Rance R, Baker S, Maskell D, Wain J, Dolecek C, Achtman M, Dougan G: High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat Genet 2008, 40:987-93.
73. Simmons S, Dibartolo G, Denef V, Goltsman D, Thelen M, Banfield J, Eisen J: Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation. PLoS Biol 2008, 6:e177.
74. Rasko D, Rosovitz M, Myers G, Mongodin E, Fricke W, Gajer P, Crabtree J, Sperandio V, Ravel J: The pan-genome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol 2008, 190:6881-93.
75. Read TD, Peterson SN, Tourasse N, Baillie LW, Paulsen IT, Nelson KE, Tettelin H, Fouts DE, Eisen JA, Gill SR, Holtzapple EK, Okstad OA, Helgason E, Rilstone J, Wu M, Kolonay JF, Beanan MJ, Dodson RJ, Brinkac LM, Gwinn M, DeBoy RT, Madupu R, Daugherty SC, Durkin AS, Haft DH, Nelson WC, Peterson JD, Pop M, Khouri HM, Radune D, et al: The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 2003, 423:81-86.
76. Tettelin H, Masignani V, Cieslewicz MJ, Eisen JA, Peterson S, Wessels MR, Paulsen IT, Nelson KE, Margarit I, Read TD, Madoff LC, Wolf AM, Beanan MJ, Brinkac LM, Daugherty SC, DeBoy RT, Durkin AS, Kolonay JF, Madupu R, Lewis MR, Radune D, Fedorova NB, Scanlan D, Khouri H, Mulligan S, Carty HA, Cline RT, Van Aken SE, Gill J, Scarselli M, et al: Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae. Proc Natl Acad Sci USA 2002, 99:12391-12396.
77. Sprague LD, Neubauer H: Yersinia aleksiciae sp. nov. Int J Syst Evol Microbiol 2005, 55:831-835.
78. Sprague LD, Scholz HC, Amann S, Busse HJ, Neubauer H: Yersinia similis sp. nov. Int J Syst Evol Microbiol 2008, 58:952-958.
79. Merhej V, Adekambi T, Pagnier I, Raoult D, Drancourt M: Yersinia massiliensis sp. nov., isolated from fresh water. Int J Syst Evol Microbiol 2008, 58:779-784.
80. Delpino MV, Marchesini MI, Estein SM, Comerci DJ, Cassataro J, Fossati CA, Baldi PC: A bile salt hydrolase of Brucella abortus contributes to the establishment of a successful infection through the oral route in mice. Infect Immun 2007, 75:299-305.
81. Sherlock O, Vejborg RM, Klemm P: The TibA adhesin/invasin from enterotoxigenic Escherichia coli is self recognizing and induces bacterial aggregation and biofilm formation. Infect Immun 2005, 73:1954-1963.
82. Liu B, Pop M: ARDB – Antibiotic Resistance Genes Database. Nucleic Acids Res 2009, 37:D443-447.
83. Antibiotic Resistance Genes Database. http://ardb.cbcb.umd.edu/.
84. Leplae R, Hebrant A, Wodak SJ, Toussaint A: ACLAME: a CLAssification of Mobile genetic Elements. Nucleic Acids Res 2004, 32:D45-49.
85. Kislyuk A, Lomsadze A, Lapidus AL, Borodovsky M: Frameshift detection in prokaryotic genomic sequences. Int J Bioinform Res Appl 2009, 5:458-477.
86. Pop M, Phillippy A, Delcher AL, Salzberg SL: Comparative genome assembly. Brief Bioinform 2004, 5:237-248.
87. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22:1658-1659.
88. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL: Versatile and open software for comparing large genomes. Genome Biol 2004, 5:R12.
89. Stewart AC, Osborne B, Read TD: DIYA: A bacterial annotation pipeline for any genomics lab. Bioinformatics 2009, 25:962-3.
90. Salzberg SL, Delcher AL, Kasif S, White O: Microbial gene identification using interpolated Markov models. Nucleic Acids Res 1998, 26:544-548.
91. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997, 25:955-964.
92. Lagesen K, Hallin P, Rødland EA, Staerfeldt HH, Rognes T, Ussery DW: RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007, 35:3100-3108.
93. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH: UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 2007, 23:1282-1288.
94. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403-410.
95. Conserved Domain Database (CDD). http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd.
96. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.
97. Talavera G, Castresana J: Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 2007, 56:564-577.
98. Read TD, Myers GS, Brunham RC, Nelson WC, Paulsen IT, Heidelberg J, Holtzapple E, Khouri H, Federova NB, Carty HA, Umayam LA, Haft DH, Peterson J, Beanan MJ, White O, Salzberg SL, Hsia RC, McClarty G, Rank RG, Bavoil PM, Fraser CM: Genome sequence of Chlamydophila caviae (Chlamydia psittaci GPIC): examining the role of niche-specific genes in the evolution of the Chlamydiaceae. Nucleic Acids Res 2003, 31:2134-2147.
99. Rasko DA, Myers GS, Ravel J: Visualization of comparative genomic analyses by BLAST score ratio. BMC Bioinformatics 2005, 6:2.
100. Bercovier H, Steigerwalt AG, Guiyoule A, Huntley-Carter G, Brenner DJ: Yersinia aldovae (Formerly Yersinia enterocolitica-Like Group X2): a New Species of Enterobacteriaceae Isolated from Aquatic Ecosystems. Int J Syst Bacteriol 1984, 34:166-172.
101. Wauters G, Janssens M, Steigerwalt AG, Brenner DJ: Yersinia mollaretii sp. nov. and Yersinia bercovieri sp. nov., Formerly Called Yersinia enterocolitica Biogroups 3A and 3B. Int J Syst Bacteriol 1988, 38:424.
102. Ursing J, Brenner DJ, Bercovier H, Fanning GR, Steigerwalt AG, Brault J, Mollaret HH: Yersinia frederiksenii: A new species of enterobacteriaceae composed of rhamnose-positive strains (formerly called atypical yersinia enterocolitica or Yersinia enterocolitica-Like). Current Microbiology 1980, 4:213-217.
103. Brenner DJ, Bercovier HH, Ursing J, Alonso JM, Steigerwalt AG, Fanning GR, Carter GP, Mollaret HH: Yersinia intermedia: A new species of enterobacteriaceae composed of rhamnose-positive, melibiose-positive, raffinose-positive strains (formerly called Yersinia enterocolitica or Yersinia enterocolitica-like). Current Microbiology 1980, 4:207-212.
104. Bercovier H, Ursing J, Brenner DJ, Steigerwalt AG, Fanning GR, Carter GP, Mollaret HH: Yersinia kristensenii: A new species of enterobacteriaceae composed of sucrose-negative strains (formerly called atypical Yersinia enterocolitica or Yersinia enterocolitica-Like). Current Microbiology 1980, 4:219-224.
105. Aleksic S, Steigerwalt AG, Bockemuehl J: Yersinia rohdei sp. nov. isolated from human and dog feces and surface water. Int J Syst Bacteriol 1987.
106. Carver T, Thomson N, Bleasby A, Berriman M, Parkhill J: DNAPlotter: circular and linear interactive genome visualization. Bioinformatics 2009, 25:119-120.
doi:10.1186/gb-2010-11-1-r1
Cite this article as: Chen et al.: Genomic characterization of the Yersinia genus. Genome Biology 2010 11:R1.



PAGE 1

[Figure: Parsimony tree (with percent bootstrap support at nodes). Taxa: Y. enterocolitica, Y. kristensenii, Y. frederiksenii, Y. rohdei, Y. bercovieri, Y. mollaretii, Y. intermedia, Y. aldovae, Y. pseudotuberculosis, Y. pestis, Y. ruckeri, E. coli. Bootstrap support values at nodes: seven nodes at 100, one at 67.4 and one at 64.5.]



PAGE 1

[Figure: Maximum likelihood tree (with percent bootstrap support at nodes). Tip labels: yberc0001, ymoll0001, ykris0001, yente0001X, yrohd0001, yfred0001, yinte0001, yaldo0001, ypest0001X, ypseu0001X, yruck0001, ecoli0001X. Bootstrap support values at nodes: 100, 99.9, 99.9, 100, 59.1, 60.4, 99.8, 97.8, 100.]



PAGE 1

Number of regions identified within the 8 Yersinia genomes by amosvalidate. High polymorphism rate indicates regions in the assembly where multiple high quality polymorphisms were found within 500 bp of each other. A polymorphism was considered high quality if two or more reads disagree with the consensus sequence and the sum of their quality values exceeds 40 (likelihood of error lower than 1 in 10,000). Manual inspection of these polymorphisms indicated that the majority are due to homopolymer tracts. High coverage indicates regions in the assembly that are deeper than 3 times the average coverage for the genome. Break indicates a region in the assembly where two or more singleton reads disagree with the contig structure, i.e. these reads only partially align to the contig, indicating that an alternative assembly of this genomic region is possible. Breaks near contig end indicates features that occur within 20 bp of the end of a contig. Such breaks are often an end-of-contig artifact rather than an indication of mis-assembly.
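The thresholds in this legend can be written as simple predicates. The following is a minimal Python sketch and not the amosvalidate implementation: the function names and the list-of-quality-values input are assumptions made for illustration. It also applies the standard Phred relation, under which a quality value of 40 corresponds to an error probability of 1 in 10,000.

```python
from typing import List


def phred_to_error_probability(quality: float) -> float:
    """Standard Phred relation: P(error) = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def is_high_quality_polymorphism(disagreeing_read_quals: List[int],
                                 min_reads: int = 2,
                                 min_quality_sum: int = 40) -> bool:
    """Legend criterion: two or more reads disagree with the consensus
    and the sum of their quality values exceeds 40."""
    return (len(disagreeing_read_quals) >= min_reads
            and sum(disagreeing_read_quals) > min_quality_sum)


def is_high_coverage(depth: float, mean_depth: float, factor: float = 3.0) -> bool:
    """Legend criterion: region deeper than 3 times the average coverage."""
    return depth > factor * mean_depth


if __name__ == "__main__":
    # Hypothetical quality values for two reads disagreeing with the consensus.
    print(is_high_quality_polymorphism([25, 20]))
    print(phred_to_error_probability(40))
```

The comparison on the quality sum is strict because the legend says "exceeds 40"; whether the tool itself treats the boundary the same way is not stated here.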



PAGE 1

Proteins conserved in pathogenic species but missing from non-pathogens
Y. enterocolitica 8081 | Length (aa) | Description | Y. pestis CO92 | Y. pseudotuberculosis IP32953 | Notes
YE0393 | 106 | primosomal replication protein n | YPO3538 | YPTB0439 |
YE0445 | 468 | probable outer membrane efflux lipoprotein (ileB) | YPO3481 | YPTB0493 |
YE0759 | 66 | Hypothetical protein | YPO3367 | YPTB0763 | *In the same location as YPO3367 but ortholog in CO92 called on the opposite strand
YE2612 | 434 | putative salicylate synthetase | YPO1916 | YPTB1601 |
YE2613 | 426 | putative signal transducer | YPO1915 | YPTB1600 |
YE2614 | 600 | inner membrane ABC transporter YbtQ | YPO1914 | YPTB1599 |
YE2615 | 600 | lipoprotein inner membrane ABC transporter | YPO1913 | YPTB1598 |
YE2616 | 319 | transcriptional regulator YbtA | YPO1912 | YPTB1597 |
YE2617 | 2035 | yersiniabactin biosynthetic protein | YPO1911 | YPTB1596 |
YE2618 | 3161 | yersiniabactin biosynthetic protein | YPO1910 | YPTB1595 |
YE2619 | 366 | yersiniabactin biosynthetic protein YbtU | YPO1909 | YPTB1594 |
YE2620 | 267 | yersiniabactin biosynthetic protein YbtT | YPO1908 | YPTB1593 |
YE2621 | 525 | yersiniabactin siderophore biosynthetic protein | YPO1907 | YPTB1592 |
YE2622 | 673 | pesticin/yersiniabactin receptor protein | YPO1906 | YPTB1591 |
YE3032 | 108 | Hypothetical protein | | YPTB1208 | No locus assigned in Y. pestis CO92 but present in other Y. pestis
YE3036 | 215 | Putative uncharacterized protein | YPO2821 | YPTB1040 | Truncated in CO92
YE3368 | 78 | Hypothetical protein | YPO0876 | YPTB3119 |
YE3390 | 145 | Hypothetical protein | YPO0904 | YPTB3179 |
YE3910 | 101 | 30S ribosomal protein S14 | YPO0222a | YPTB3685 |

Decline in number of unique genes as more genomes are added. Each value is the average of 8 random genome permutations.
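The permutation averaging named in this caption can be illustrated with a short Python sketch. It is an illustration only, under the assumption that each genome is reduced to a set of gene (or protein cluster) identifiers; the function name, the toy gene sets and the fixed random seed are hypothetical and are not taken from the study.

```python
import random
from typing import List, Set


def unique_gene_curve(genomes: List[Set[str]],
                      permutations: int = 8,
                      seed: int = 0) -> List[float]:
    """For each position k, average (over random genome orders) the number
    of genes in the k-th genome added that were not seen in earlier ones."""
    rng = random.Random(seed)
    totals = [0.0] * len(genomes)
    for _ in range(permutations):
        order = list(genomes)
        rng.shuffle(order)
        seen: Set[str] = set()
        for position, genes in enumerate(order):
            totals[position] += len(genes - seen)  # genes new at this step
            seen |= genes
    return [total / permutations for total in totals]


if __name__ == "__main__":
    # Toy gene sets, purely hypothetical.
    toy = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"}]
    print(unique_gene_curve(toy))
```

Whatever the order, the per-position counts sum to the size of the union of all gene sets, so the averaged curve sums to that value as well; the curve typically falls as shared genes are exhausted.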



PAGE 1

YscN orthologs from the 8 Yersinia genomes in this study were grouped with the Y. pestis YscN UniRef50 group. All proteins within 25 amino acids in length of the 439-amino-acid YscN protein of Y. pestis were aligned using ClustalW and trimmed manually. A neighbor-joining tree was constructed using PHYLIP. Plasmid and chromosomal clusters of Yersinia form separate branches, and the Y. enterocolitica/Y. mollaretii branch is further from the other enterocolitica-like strains than the Y. pestis/Y. pseudotuberculosis ysa ortholog.


Genomic characterization of the Yersinia genus
Abstract
Background
New DNA sequencing technologies have enabled detailed comparative genomic analyses of entire genera of bacterial pathogens. Prior to this study, three species of the enterobacterial genus Yersinia that cause invasive human diseases (Yersinia pestis, Yersinia pseudotuberculosis, and Yersinia enterocolitica) had been sequenced. However, there were no genomic data on the Yersinia species with more limited virulence potential, frequently found in soil and water environments.
Results
We used high-throughput sequencing-by-synthesis instruments to obtain 25- to 42-fold average redundancy, whole-genome shotgun data from the type strains of eight species: Y. aldovae, Y. bercovieri, Y. frederiksenii, Y. kristensenii, Y. intermedia, Y. mollaretii, Y. rohdei, and Y. ruckeri. The deepest branching species in the genus, Y. ruckeri, causative agent of red mouth disease in fish, has the smallest genome (3.7 Mb), although it shares the same core set of approximately 2,500 genes as the other members of the genus, whose genomes range in size from 4.3 to 4.8 Mb. Yersinia genomes had a similar global partition of protein functions, as measured by the distribution of Clusters of Orthologous Groups families. Genome-to-genome variation in islands with genes encoding functions such as ureases, hydrogenases and B-12 cofactor metabolite reactions may reflect adaptations to colonizing specific host habitats.
Conclusions
Rapid high-quality draft sequencing was used successfully to compare pathogenic and non-pathogenic members of the Yersinia genus. This work underscores the importance of the acquisition of horizontally transferred genes in the evolution of Y. pestis and points to virulence determinants that have been gained and lost on multiple occasions in the history of the genus.
Authors
Chen, Peter E
Cook, Christopher
Stewart, Andrew C
Nagarajan, Niranjan
Sommer, Dan D
Pop, Mihai
Thomason, Brendan
Thomason, Maureen PK
Lentz, Shannon
Nolan, Nichole
Sozhamannan, Shanmuga
Sulakvelidze, Alexander
Mateczun, Alfred
Du, Lei
Zwick, Michael E
Read, Timothy D
Language: en
Type: Journal article
Available: 2010-01-04
Publisher: BioMed Central Ltd
Status: Peer reviewed
Copyright holder: Peter E Chen et al.; licensee BioMed Central Ltd.
License: http://creativecommons.org/licenses/by/2.0
Access rights: Open access
Bibliographic citation: Genome Biology. 2010 Jan 04;11(1):R1
Identifier: http://dx.doi.org/10.1186/gb-2010-11-1-r1


Research
Genomic characterization of the Yersinia genus
Authors
A1 Peter E Chen (I1; ce: yes) peter.chen@med.navy.mil
A2 Christopher Cook christopher.cook@med.navy.mil
A3 Andrew C Stewart andrew.stewart@med.navy.mil
A4 Niranjan Nagarajan (I2, I7) ninagarajann@gis.a-star.edu.sg
A5 Dan D Sommer dsommer@umiacs.umd.edu
A6 Mihai Pop mpop@umiacs.umd.edu
A7 Brendan Thomason bthomason@afmic.detrick.army.mil
A8 Maureen P Kiley Thomason kileyma@mail.nih.gov
A9 Shannon Lentz scourtney12@gmail.com
A10 Nichole Nolan nichole.nolan@med.navy.mil
A11 Shanmuga Sozhamannan shanmuga.sozhamannan@med.navy.mil
A12 Alexander Sulakvelidze (I3) asulakvelidze@ufl.edu
A13 Alfred Mateczun alfred.mateczun@med.navy.mil
A14 Lei Du (I4) lei.du@roche.com
A15 Michael E Zwick (I5) mzwick@emory.edu
A16 Timothy D Read (I6; corresponding author) tread@emory.edu
Affiliations
I1 Biological Defense Research Directorate, Naval Medical Research Center, 503 Robert Grant Avenue, Silver Spring, Maryland 20910, USA
I2 University of Maryland Institute for Advanced Computer Sciences, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
I3 Emerging Pathogens Institute and Department of Molecular Genetics and Microbiology, University of Florida College of Medicine, Gainesville, Florida 32610, USA
I4 454 Life Sciences Inc., 15 Commercial Street, Branford, Connecticut 06405, USA
I5 Department of Human Genetics, Emory University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322, USA
I6 Division of Infectious Diseases, Emory University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322, USA
I7 Current address: Computational and Mathematical Biology, Genome Institute of Singapore, Singapore-127726
Source: Genome Biology
ISSN: 1465-6906
Publication date: 2010
Volume: 11
Issue: 1
First page: R1
URL: http://genomebiology.com/2010/11/1/R1
PubMed ID: 20047673
DOI: 10.1186/gb-2010-11-1-r1
History: received 23 May 2009; revised 7 October 2009; accepted 4 January 2010; published 4 January 2010
Copyright 2010 Chen et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Short title: Yersinia genomics
Short abstract: Comparative Yersinia genomics identifies features responsible for the colonization of specific host habitats and the horizontal transfer of virulence determinants.
Classifications (type BMC, subtype man_spc_id): 30010008 Evolution; 300100010 Genome studies; 300100014 Microbiology and parasitology
Background
Of the millions of species of bacteria that live on this planet, only a very small percentage cause serious human diseases [1]. Comparative genetic studies are revealing that many pathogens have only recently emerged from protean environmental, commensal or zoonotic populations [2-5]. For a variety of reasons, most research effort has been focused on characterizing these pathogens, while their closely related non-pathogenic relatives have only been lightly studied. As a result, our understanding of the population biology of these clades remains biased, limiting our knowledge of the evolution of virulence and our ability to design reliable assays that distinguish pathogen signatures from the background in the clinic and environment [6].
The recent development of second generation sequencing platforms (reviewed by Mardis [7,8] and Shendure [7,8]) offers an opportunity to change the direction of microbial genomics, enabling the rapid genome sequencing of large numbers of both pathogenic and non-pathogenic strains. Here we describe the deployment of new sequencing technology to extensively sample eight genomes from the Yersinia genus of the family Enterobacteriaceae. The first published sequencing studies on the Yersinia genus have focused exclusively on invasive human disease-causing species that included five Yersinia pestis genome sequences (one of which, strain 91001, is from the avirulent 'microtus' biovar) [9-12], two Yersinia pseudotuberculosis [13,14] and one Yersinia enterocolitica biotype 1B [15]. Primarily a zoonotic pathogen, Y. pestis, the causative agent of bubonic plague and a category A select agent, is a recently emerged lineage that has since undergone global expansion [2]. Following introduction into a human through flea bite [16], Y. pestis is engulfed by macrophages and taken to the regional lymph nodes. Y. pestis then escapes the macrophages and multiplies to cause a highly lethal bacteremia if untreated with antibiotics. Y. pseudotuberculosis and Y. enterocolitica (primarily biotype 1B) are enteropathogens that cause gastroenteritis following ingestion and translocation of the Peyer's patches. Like Y. pestis, the enteropathogenic Yersiniae can escape macrophages and multiply outside host cells, but unlike their more virulent congener, they usually cause only self-limiting inflammatory diseases.
The generally accepted pathway for the evolution of these more severe disease-causing Yersiniae is memorably encapsulated by the recipe, 'add DNA, stir, reduce' [17]. In each species DNA has been 'added' by horizontal gene transfer in the form of plasmids and genomic islands. All three human pathogens carry a 70-kb pYV virulence plasmid (also known as pCD), which carries the Ysc type III secretion system and Yops effectors [18-20], that is not detected in non-pathogenic species. Y. pestis also has two additional plasmids, pMT (also known as pFra), containing the F1 capsule-like antigen and murine toxin, and pPla (also known as pPCP1), which carries plasminogen-activating factor, Pla. Y. pestis, Y. pseudotuberculosis, and biotype 1B Y. enterocolitica also contain a chromosomally located, mobile, high-pathogenicity island (HPI) [21]. The HPI includes a cluster of genes for biosynthesis of yersiniabactin, an iron-binding siderophore necessary for systemic infection [22]. 'Stir' refers to intra-genomic change, notably the recent expansion of insertion sequences (IS) within Y. pestis (3.7% of the Y. pestis CO92 genome [9]) and a high level of genome structural variation [23]. 'Reduce' describes the loss of functions via deletions and pseudogene accumulation in Y. pestis [9,13] due to shifts in selection pressure caused by the transition from Y. pseudotuberculosis-like enteropathogenicity to a flea-borne transmission cycle. This description of Y. pestis evolution is, of course, oversimplified. Y. pestis strains show considerable diversity at the phenotypic level and there is evidence of acquisition of plasmids and other horizontally transferred genes [12,24,25; DNA microarray, 26,27].
While most attention is focused on the three well-known human pathogens, several other, less familiar Yersinia species have been split off from Y. enterocolitica over the past 40 years based on biochemistry, serology and 16S RNA sequence [28,29]. Y. ruckeri is an agriculturally important fish pathogen that is a cause of 'red mouth' disease in salmonid fish. The species has sufficient phylogenetic divergence from the rest of the Yersinia genus to stir controversy about its taxonomic assignment [30]. Y. frederiksenii, Y. kristensenii, Y. intermedia, Y. mollaretii, Y. bercovieri, and Y. rohdei have been isolated from human feces, fresh water, animal feces and intestines and foods [28]. There have been reports associating some of the species with human diarrheal infections [31] and lethality for mice [32]. Y. aldovae is most often isolated from fresh water but has also been cultured from fish and the alimentary tracts of wild rodents [33]. There is no report of isolation of Y. aldovae from human feces or urine [28].
Using microbead-based, massively parallel sequencing by synthesis [34] we rapidly and economically obtained high redundancy genome sequence of the type strains of each of these eight lesser known Yersinia species. From these genome sequences, we were able to determine the core gene set that defines the Yersinia genus and to look for clues to distinguish the genomes of human pathogens from less virulent strains.
Results
High-redundancy draft genome sequences of eight Yersinia species
Whole genome shotgun coverage of eight previously unsequenced Yersinia species (Table 1) was obtained by single-end bead-based pyrosequencing [34] using the 454 Life Sciences GS-20 instrument. Each of the eight genomes was sequenced to a high level of redundancy (between 25 and 44 sequencing reads per base) and assembled de novo into large contigs (Table 2; Additional file 1). Excluding contigs that covered repeat regions and therefore had significantly increased copy number, the quality of the sequence of the draft assemblies was high, with less than 0.1% of the sequence of each genome having a consensus quality score [35] less than 40. Moreover, a more recent assessment of quality of GS-20 data suggests that the scores generated by the 454 Life Sciences software are an underestimation of the true sequence quality [36]. The most common sequencing error encountered when assembling pyrosequencing data is the rare calling of incorrect numbers of homopolymers caused by variation in the intensity of fluorescence emitted upon extension with the labeled nucleoside [34].
Table 1. Strains sequenced in this study
Species | ATCC number | Other designations | Year isolated | Location isolated | Description | Optimum growth temperature | Reference
Y. aldovae | 35236T | CNY 6065 | NR | Czechoslovakia | Drinking water | 26°C | [100]
Y. bercovieri | 43970T | CDC 2475-87 | NR | France | Human stool | 26°C | [101]
Y. frederiksenii | 33641T | CDC 1461-81, CIP 80-29 | NR | Denmark | Sewage | 26°C | [102]
Y. intermedia | 29909T | CIP 80-28 | NR | NR | Human urine | 37°C | [103]
Y. kristensenii | 33638T | CIP 80-30 | NR | NR | Human urine | 26°C | [104]
Y. mollaretii | 43969T | CDC 2465-87 | NR | USA | Soil | 26°C | [101]
Y. rohdei | 43380T | H271-36/78, CDC 3022-85 | 1978 | Germany | Dog feces | 26°C | [105]
Y. ruckeri | 29473T | 2396-61 | 1961 | Idaho, USA | Rainbow trout (Oncorhynchus mykiss) with red mouth disease | 26°C | [67]
NR, not reported in reference publication.
Table 2. Genomes summary
Species | Type strain | NCBI project ID | GenBank accession number | Total reads | Number of contigs >500 nt | Total length of large contigs | % large contigs | Number of contigs aligned to chromosomal scaffold
Y. rohdei | ATCC_43380 | 29767 | ACCD00000000 | 991,106 | 83 | 4,303,720 | 0.11 | 60
Y. ruckeri | ATCC_29473 | 29769 | ACCC00000000 | 1,347,304 | 103 | 3,716,658 | 0.004 | 68
Y. aldovae | ATCC_35236 | 29741 | ACCB00000000 | 1,125,002 | 104 | 4,277,123 | 0.006 | 60
Y. kristensenii | ATCC_33638 | 29761 | ACCA00000000 | 1,374,452 | 86 | 4,637,246 | 0.003 | 63
Y. intermedia | ATCC_29909 | 29755 | AALF00000000 | 1,768,909 | 74 | 4,684,150 | 0.003 | 68
Y. frederiksenii | ATCC_33641 | 29743 | AALE00000000 | 1,504,985 | 90 | 4,864,031 | 0.005 | 56
Y. mollaretii | ATCC_43969 | 16105 | AALD00000000 | 1,825,876 | 110 | 4,535,932 | 0.003 | 80
Y. bercovieri | ATCC_43970 | 16104 | AALC00000000 | 1,263,275 | 144 | 4,316,521 | 0.006 | 91
Additional file 1. Statistics from running DIYA [89] and frameshift detection programs on the eight genomes sequenced in this study and various other enterobacterial genomes downloaded from NCBI. File: gb-2010-11-1-r1-S1.xls
Previous studies and our experience suggest that at this level of sequence coverage the assembly gaps fall in repeat regions that cannot be spanned by single-end sequence reads (average length 109 nucleotides in this study) [34]. Fewer RNA genes are observed compared to published Yersinia genomes finished using traditional Sanger sequencing technology (Additional file 1), reflecting the greater difficulty of uniquely assembling repetitive sequences with single-end reads. We assessed the quality of our assemblies using metrics implemented in the amosvalidate package [37]. Specifically, we focused on three measures frequently correlated with assembly errors: density of polymorphisms within assembled reads, depth of coverage, and breakpoints in the alignment of unassembled reads to the final assembly. Regions in each genome where at least one measure suggested a possible mis-assembly were validated by manual inspection (Additional file 2). Many of the suspect regions corresponded to collapsed repeats, where the location of individual members of the repeat family within the genome could not be accurately determined. Based on the results of the amosvalidate analysis and the optical map alignment we found no evidence of mis-assemblies leading to chimeric contigs in the eight genomes we sequenced. Genomic regions flagged by the amosvalidate package are made available in GFF format (compatible with most genome browsers) in Additional file 3.
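As a rough cross-check of the reported redundancy, the read counts and large-contig lengths in Table 2 can be combined with the 109-nucleotide average read length. The Python sketch below is a back-of-the-envelope calculation, not the authors' method: it assumes the study-wide average read length applies to each genome and uses the total length of large contigs as the denominator. The function name is hypothetical.

```python
def fold_redundancy(total_reads: int,
                    mean_read_length_nt: float,
                    assembly_length_nt: int) -> float:
    """Approximate average reads per base: sequenced bases / assembled bases."""
    return total_reads * mean_read_length_nt / assembly_length_nt


# Average read length reported for this study (nucleotides).
MEAN_READ_LENGTH_NT = 109

# (total reads, total length of large contigs) taken from Table 2.
TABLE_2 = {
    "Y. rohdei": (991_106, 4_303_720),
    "Y. mollaretii": (1_825_876, 4_535_932),
}

if __name__ == "__main__":
    for species, (reads, length_nt) in TABLE_2.items():
        fold = fold_redundancy(reads, MEAN_READ_LENGTH_NT, length_nt)
        print(f"{species}: about {fold:.1f}-fold")
```

Under these assumptions the two genomes come out near 25-fold and 44-fold, which is in line with the range of 25 to 44 reads per base stated in the text.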
suppl id="S2"
title
pAdditional file 2/p
/title
caption
pResults of itamosvalidate /itanalysis on the eight genomes of this study/p
/caption
text
pResults of itamosvalidate /itabbrgrpabbr bid="B37"37/abbr/abbrgrp analysis on the eight genomes of this study./p
/text
file name="gb-2010-11-1-r1-S2.doc"
pClick here for file/p
/file
/suppl
suppl id="S3"
title
pAdditional file 3/p
/title
caption
pAdditional annotation files/p
/caption
text
pThese consist of ISfinder abbrgrpabbr bid="B40"40/abbr/abbrgrp, RepeatScout abbrgrpabbr bid="B41"41/abbr/abbrgrp and itamosvalidate /itabbrgrpabbr bid="B37"37/abbr/abbrgrp results (GFF format); repeats found by RepeatScout (FASTA format); scaffold files (NCBI AGP format); and, for each contig, its length, read count, estimated repeat number, count in scaffold, and whether or not it was placed by SOMA abbrgrpabbr bid="B39"39/abbr/abbrgrp./p
/text
file name="gb-2010-11-1-r1-S3.gz"
pClick here for file/p
/file
/suppl
pGenome sizes were estimated initially as the sum of the sizes of the contigs from the shotgun assembly, with corrections for contigs representing collapsed repeats (Table tblr tid="T2"2/tblr). We also derived an independent estimate for the genome size from the whole-genome optical restriction mapping of the species abbrgrpabbr bid="B38"38/abbr/abbrgrp (Additional file supplr sid="S4"4/supplr). Alignment of contigs to the optical maps abbrgrpabbr bid="B39"39/abbr/abbrgrp suggested that the optical maps consistently overestimated sizes (2 to 10% on average). After correction, the map-based estimates and sequence-based estimates agreed well (within 7%). Two species, itY. aldovae /it(4.22 to 4.33 Mbp) and itY. ruckeri /it(3.58 to 3.89 Mbp), have a substantially reduced total genome size compared with the 4.6 to 4.8 Mbp seen in the genus generally. The agreement between the optical maps and sequence-based estimates of genome sizes tallied with experimental evidence for the lack of large plasmids in the sequenced genomes (Additional file supplr sid="S5"5/supplr). A screen for matches to known plasmid genes produced only a few candidate plasmid contigs, totaling less than 10 kbp of sequence in each genome./p
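pThe two size estimates described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the per-contig copy-number input and the example numbers are all hypothetical. The sequence estimate counts a collapsed repeat once per estimated copy, and the map estimate divides the raw optical map size by the expansion measured over the regions where contigs align to the map./p

```python
def sequence_estimate(contigs):
    """Sum contig lengths, counting a collapsed repeat once per estimated copy.

    contigs: iterable of (length_bp, estimated_copy_number) pairs.
    """
    return sum(length * copies for length, copies in contigs)


def map_estimate(optical_map_bp, aligned_map_bp, aligned_contig_bp):
    """Correct an optical map size for its measured expansion.

    The expansion factor is the ratio of map length to sequence length over
    the regions where contigs were placed on the map.
    """
    expansion = aligned_map_bp / aligned_contig_bp
    return optical_map_bp / expansion


# Hypothetical toy assembly: two unique contigs and one repeat collapsed from 3 copies.
toy_contigs = [(2_000_000, 1), (1_500_000, 1), (5_000, 3)]
seq_size = sequence_estimate(toy_contigs)

# Hypothetical map that reads 5% long over the aligned regions.
map_size = map_estimate(3_700_000, 1_050_000, 1_000_000)
```

pIn this toy case the repeat contributes 15,000 bp rather than 5,000 bp to the sequence estimate, and the 5% expansion brings a 3.70 Mbp raw map down to roughly 3.52 Mbp./p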
suppl id="S4"
title
pAdditional file 4/p
/title
caption
pEstimates for genome sizes (in Mbp) based on optical map data/p
/caption
text
pEstimates for genome sizes (in Mbp) based on optical map data./p
/text
file name="gb-2010-11-1-r1-S4.doc"
pClick here for file/p
/file
/suppl
suppl id="S5"
title
pAdditional file 5/p
/title
caption
pPulsed field gel analysis of the eight sequenced itYersinia /itspecies and failure to detect plasmids/p
/caption
text
pAn itE. coli /itstrain with known plasmids was a positive control./p
/text
file name="gb-2010-11-1-r1-S5.doc"
pClick here for file/p
/file
/suppl
pThe number of IS elements per genome for the eight species (12 to 167 matches), discovered using the ISfinder database abbrgrpabbr bid="B40"40/abbr/abbrgrp, was much lower than in the itY. pestis /itgenome (1,147 matches; copy number estimates were adjusted to account for possible mis-assembly; see Materials and methods). Furthermore, the non-pathogenic species with the most IS matches, namely itY. bercovieri /it(167 matches), itY. aldovae /it(143 matches) and itY. ruckeri /it(136 matches), have comparatively smaller genomes. We also searched for novel repeat families using a itde novo /itrepeat-finder abbrgrpabbr bid="B41"41/abbr/abbrgrp and collected a non-redundant set of 44 repeat sequence families in the itYersinia /itgenus (Table tblr tid="T3"3/tblr; Additional file supplr sid="S6"6/supplr). Interestingly, the well-known ERIC element abbrgrpabbr bid="B42"42/abbr/abbrgrp was recovered by our itde novo /itsearch and was found to be present in many copies in all the pathogenic species, but was relatively rare in the non-pathogenic ones. On the other hand, a similar and recently discovered element, YPAL abbrgrpabbr bid="B43"43/abbr/abbrgrp (also recovered by the itde novo /itsearch), was abundant in all the itYersinia /itgenomes except the fish pathogen itY. ruckeri/it. Insertion sequence IS1541C in the ISfinder database, which has expanded in itY. pestis /it(to more than 60 copies), had only a handful of strong matches in itY. enterocolitica/it, itY. pseudotuberculosis/it, and itY. bercovieri /itand no discernible matches in the other itYersinia /itgenomes./p
tbl id="T3"titlepTable 3/p/titlecaptionpDistribution of common repeat sequences/p/captiontblbdy cols="6"
r
c
p/
/c
c ca="right"
p
bERIC/b
/p
p
b(127 bp)/b
/p
/c
c ca="right"
p
bYPAL/b
/p
p
b(167 bp)/b
/p
/c
c ca="right"
p
bitKristensenii/it39/b
/p
p
b(142 bp)/b
/p
/c
c ca="right"
p
bIS1541C/b
/p
p
b(708 bp)/b
/p
/c
c ca="right"
p
bAldovae3/b
/p
p
b(154 bp)/b
/p
/c
/r
r
c cspan="6"
hr/
/c
/r
r
c ca="left"
p
itE. coli/it
/p
/c
c ca="right"
p0/p
/c
c ca="right"
p3/p
/c
c ca="right"
p5/p
/c
c ca="right"
p0/p
/c
c ca="right"
p5/p
/c
/r
r
c ca="left"
p
itY. pestis/it
/p
/c
c ca="right"
p54/p
/c
c ca="right"
p43/p
/c
c ca="right"
p33/p
/c
c ca="right"
p61/p
/c
c ca="right"
p38/p
/c
/r
r
c ca="left"
p
itY. pseudotuberculosis/it
/p
/c
c ca="right"
p55/p
/c
c ca="right"
p52/p
/c
c ca="right"
p29/p
/c
c ca="right"
p5/p
/c
c ca="right"
p36/p
/c
/r
r
c ca="left"
p
itY. enterocolitica/it
/p
/c
c ca="right"
p63/p
/c
c ca="right"
p144/p
/c
c ca="right"
p100/p
/c
c ca="right"
p3/p
/c
c ca="right"
p75/p
/c
/r
r
c ca="left"
p
itY. aldovae/it
/p
/c
c ca="right"
p6/p
/c
c ca="right"
p84/p
/c
c ca="right"
p46/p
/c
c ca="right"
p0/p
/c
c ca="right"
p40/p
/c
/r
r
c ca="left"
p
itY. bercovieri/it
/p
/c
c ca="right"
p9/p
/c
c ca="right"
p45/p
/c
c ca="right"
p6/p
/c
c ca="right"
p9/p
/c
c ca="right"
p13/p
/c
/r
r
c ca="left"
p
itY. frederiksenii/it
/p
/c
c ca="right"
p0/p
/c
c ca="right"
p57/p
/c
c ca="right"
p6/p
/c
c ca="right"
p0/p
/c
c ca="right"
p5/p
/c
/r
r
c ca="left"
p
itY. intermedia/it
/p
/c
c ca="right"
p2/p
/c
c ca="right"
p91/p
/c
c ca="right"
p48/p
/c
c ca="right"
p0/p
/c
c ca="right"
p43/p
/c
/r
r
c ca="left"
p
itY. kristensenii/it
/p
/c
c ca="right"
p2/p
/c
c ca="right"
p99/p
/c
c ca="right"
p70/p
/c
c ca="right"
p0/p
/c
c ca="right"
p59/p
/c
/r
r
c ca="left"
p
itY. mollaretii/it
/p
/c
c ca="right"
p6/p
/c
c ca="right"
p62/p
/c
c ca="right"
p26/p
/c
c ca="right"
p0/p
/c
c ca="right"
p20/p
/c
/r
r
c ca="left"
p
itY. rohdei/it
/p
/c
c ca="right"
p0/p
/c
c ca="right"
p37/p
/c
c ca="right"
p8/p
/c
c ca="right"
p0/p
/c
c ca="right"
p7/p
/c
/r
r
c ca="left"
p
itY. ruckeri/it
/p
/c
c ca="right"
p45/p
/c
c ca="right"
p2/p
/c
c ca="right"
p0/p
/c
c ca="right"
p0/p
/c
c ca="right"
p2/p
/c
/r
/tblbdytblfn
pThree of the repeat sequences found using itde novo /itsearches matched the known repeat elements ERIC, YPAL, and IS1541C and are identified as such. itKristensenii/it39 and Aldovae3 are elements found from itde novo /itsearches in the itY. kristensenii /itand itY. aldovae /itgenomes, respectively./p
/tblfn/tbl
suppl id="S6"
title
pAdditional file 6/p
/title
caption
pSequences of the detected repeat families/p
/caption
text
pSequences of the detected repeat families./p
/text
file name="gb-2010-11-1-r1-S6.txt"
pClick here for file/p
/file
/suppl
/sec
sec
st
pNew itYersinia /itgenome data reduce the pool of unique detection targets for itY. pestis /itand itY. enterocolitica/it/p
/st
pThe sequences generated in this study provide new background information for validating detection and diagnosis assays targeting pathogenic members of the itYersinia /itgenus. The assay design process commonly starts by computationally identifying genomic regions that are unique to the targeted organisms ('signatures'); an ideal signature is shared by all targeted pathogens but not found in a background comprising non-pathogenic near neighbors or in other unrelated microbes. While many pathogens are well characterized at the genomic level, the background set is only sparsely represented in genomic databases, thereby limiting the ability to computationally screen out non-specific candidate assays (false positives). As a result, many assays may fail experimental field tests, thereby increasing the costs of assay development efforts. To evaluate whether the new genomic sequences generated in our study can reduce the incidence of false positives in assay development, we computed signatures for itY. pestis /itand itY. enterocolitica /itusing the Insignia pipeline abbrgrpabbr bid="B44"44/abbr/abbrgrp, the system previously used to successfully develop assays for the detection of itV. cholerae /itabbrgrpabbr bid="B44"44/abbr/abbrgrp. We identified 171 and 100 regions within the genomes of itY. pestis /itand itY. enterocolitica/it, respectively, that represent good candidates for the design of detection assays. In itY. pestis /itthese regions tended to cluster around the origin of replication, whereas in itY. enterocolitica /itthere was a more even distribution. The average G+C content of the unique regions in both species was close to the itYersinia /itaverage (47%) and there was not a strong association with putative genome islands (Additional files supplr sid="S7"7/supplr, supplr sid="S8"8/supplr, supplr sid="S9"9/supplr, supplr sid="S10"10/supplr, supplr sid="S11"11/supplr, supplr sid="S12"12/supplr, abbrgrpabbr bid="B45"45/abbr/abbrgrp). 
For both species, most regions overlapped predicted genes (161 of 171 (94%) and 96 of 100 (96%) in itY. pestis /itand itY. enterocolitica/it, respectively). Interestingly, the 161 itY. pestis /itregions were spread over only 70 different genes, whereas the 96 itY. enterocolitica /itregions overlapped 90 genes. There was no obvious trend in the nature of the genes harboring these putative signatures except that many could arguably be classed as 'non-core' functions, encoding phage endonucleases, invasins, hemolysins and hypothetical proteins./p
suppl id="S7"
title
pAdditional file 7/p
/title
caption
pitY. pestis /itCO92 signatures longer than 100 bp computed by the Insignia pipeline/p
/caption
text
pitY. pestis /itCO92 signatures longer than 100 bp computed by the Insignia abbrgrpabbr bid="B44"44/abbr/abbrgrp pipeline./p
/text
file name="gb-2010-11-1-r1-S7.txt"
pClick here for file/p
/file
/suppl
suppl id="S8"
title
pAdditional file 8/p
/title
caption
pSequences of the new genomes that match (that is, invalidate) the itY. pestis /itCO92 signatures listed in Additional file supplr sid="S7"7/supplr/p
/caption
text
pSequences of the new genomes that match (that is, invalidate) the itY. pestis /itCO92 signatures listed in Additional file supplr sid="S7"7/supplr./p
/text
file name="gb-2010-11-1-r1-S8.txt"
pClick here for file/p
/file
/suppl
suppl id="S9"
title
pAdditional file 9/p
/title
caption
pitY. enterocolitica /itsignatures longer than 100 bp computed by the Insignia pipeline/p
/caption
text
pitY. enterocolitica /itsignatures longer than 100 bp computed by the Insignia pipeline./p
/text
file name="gb-2010-11-1-r1-S9.txt"
pClick here for file/p
/file
/suppl
suppl id="S10"
title
pAdditional file 10/p
/title
caption
pSequences of the new genomes that match (that is, invalidate) the itY. enterocolitica /itsignatures/p
/caption
text
pSequences of the new genomes that match (that is, invalidate) the itY. enterocolitica /itsignatures./p
/text
file name="gb-2010-11-1-r1-S10.txt"
pClick here for file/p
/file
/suppl
suppl id="S11"
title
pAdditional file 11/p
/title
caption
pitY. pestis /itgenome with the Insignia-identified repeats and genome islands plotted/p
/caption
text
pitY. pestis /itgenome with the Insignia-identified repeats and genome islands identified using IslandViewer abbrgrpabbr bid="B45"45/abbr/abbrgrp plotted. The figure was created using DNAPlotter abbrgrpabbr bid="B106"106/abbr/abbrgrp./p
/text
file name="gb-2010-11-1-r1-S11.png"
pClick here for file/p
/file
/suppl
suppl id="S12"
title
pAdditional file 12/p
/title
caption
pitY. enterocolitica /itgenome with the Insignia-identified repeats and genome islands plotted/p
/caption
text
pitY. enterocolitica /itgenome with the Insignia-identified repeats and genome islands identified using IslandViewer abbrgrpabbr bid="B45"45/abbr/abbrgrp plotted. The figure was created using DNAPlotter abbrgrpabbr bid="B106"106/abbr/abbrgrp./p
/text
file name="gb-2010-11-1-r1-S12.png"
pClick here for file/p
/file
/suppl
pTen itY. pestis/it-specific and 31 itY. enterocolitica/it-specific putative signatures have significant matches in the new genome sequence data (Additional files supplr sid="S7"7/supplr, supplr sid="S8"8/supplr, supplr sid="S9"9/supplr, supplr sid="S10"10/supplr), indicating that assays designed within these regions would yield false positives. This result underscores the need for further sampling of genomes of the itYersinia /itgenus to assist the design of diagnostic assays./p
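pThe idea of a signature being invalidated by new background genomes can be illustrated with a deliberately naive check. The sketch below looks for an exact copy of each candidate signature, on either strand, in each background sequence. It is not the Insignia matching procedure, which is not reproduced here; the function names and the toy sequences are hypothetical./p

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def invalidated_signatures(signatures, background_genomes):
    """Map each invalidated signature to the background genomes containing it.

    signatures: dict mapping signature name -> sequence.
    background_genomes: dict mapping genome name -> sequence.
    A signature counts as invalidated if it occurs exactly, on either strand,
    in at least one background genome.
    """
    hits = {}
    for name, sig in signatures.items():
        rc = reverse_complement(sig)
        matched = [g for g, seq in background_genomes.items()
                   if sig in seq or rc in seq]
        if matched:
            hits[name] = matched
    return hits


# Toy data (hypothetical sequences, far shorter than real 100 bp signatures).
signatures = {"sig1": "ACGTTGCA", "sig2": "GGGCCCAT"}
background = {"new_genome_A": "TTTATGGGCCCTTT", "new_genome_B": "CCCCCCCC"}
result = invalidated_signatures(signatures, background)
```

pHere sig2 is invalidated because its reverse complement occurs in new_genome_A, whereas sig1 remains a candidate. A production screen would tolerate mismatches rather than require exact identity./p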
/sec
sec
st
pitYersinia /itwhole-genome comparisons/p
/st
pWe performed a multiple alignment of the 11 itYersinia /itspecies using the MAUVE algorithm abbrgrpabbr bid="B46"46/abbr/abbrgrp (from here on itY. pestis /itCO92 and itY. pseudotuberculosis /itIP32953 were used as the representative genomes of their species) and obtained 98 locally collinear blocks (LCBs; Additional files supplr sid="S13"13/supplr, supplr sid="S14"14/supplr, abbrgrpabbr bid="B47"47/abbr/abbrgrp). The mean length of the LCBs was 23,891 bp. The shortest block was 1,570 bp, and the longest was 201,130 bp. This multiple alignment of the 'core' region on average covered 52% of each itYersinia /itgenome. The nucleotide diversity (Π) for the concatenated aligned region was 0.27, corresponding to an approximate genus-wide nucleotide sequence identity of 73%. As expected for a set of bacteria with this level of diversity, the alignment of the genomes shows evidence of multiple large genome rearrangements abbrgrpabbr bid="B23"23/abbr/abbrgrp (Additional file supplr sid="S13"13/supplr)./p
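pThe reported diversity can be checked against the counts given in the nucleotide diversity worksheet that accompanies this article (55 pairwise comparisons, 1,495,930 sites, 22,539,297 mismatches), taking Π as the mean number of mismatches per site over all genome pairs. The sketch below is illustrative only: the function name and the gap handling are assumptions, and the toy alignment is hypothetical./p

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean number of mismatches per site over all pairs of aligned sequences.

    Columns containing a gap ('-') in either sequence of a pair are skipped
    for that pair; this gap handling is an assumption of the sketch.
    """
    mismatches = 0
    compared = 0
    for a, b in combinations(aligned_seqs, 2):
        for x, y in zip(a, b):
            if x == "-" or y == "-":
                continue
            compared += 1
            mismatches += x != y
    return mismatches / compared


# Toy alignment: 3 sequences, 4 sites -> 3 pairs x 4 sites = 12 comparisons.
toy_pi = nucleotide_diversity(["ACGT", "ACGA", "TCGA"])

# Arithmetic check with the worksheet counts: 11 genomes give 55 pairs.
pairs = 11 * 10 // 2
reported_pi = 22_539_297 / (pairs * 1_495_930)
```

pWith the worksheet counts the ratio is approximately 0.274, which rounds to the 0.27 quoted above and leaves about 73% of compared positions identical./p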
suppl id="S13"
title
pAdditional file 13/p
/title
caption
pOutput of the MAUVE abbrgrpabbr bid="B46"46/abbr/abbrgrp alignment of 11 itYersinia /itspecies/p
/caption
text
pThe eight genomes sequenced in this study are represented as pseudocontigs, ordered by a combination of optical mapping and alignment to the closest completed reference genome./p
/text
file name="gb-2010-11-1-r1-S13.jpeg"
pClick here for file/p
/file
/suppl
suppl id="S14"
title
pAdditional file 14/p
/title
caption
pWhole genome multiple alignment produced by MAUVE of the 11 itYersinia /itgenomes/p
/caption
text
pWhole genome multiple alignment produced by MAUVE of the 11 itYersinia /itgenomes in XMFA format abbrgrpabbr bid="B106"106/abbr/abbrgrp./p
/text
file name="gb-2010-11-1-r1-S14.zip"
pClick here for file/p
/file
/suppl
pUsing an automated pipeline for annotation and clustering of protein orthologs based on the Markov chain clustering tool MCL abbrgrpabbr bid="B48"48/abbr/abbrgrp, we estimated the size of the itYersinia /itprotein core set to be 2,497 and the pan-genome abbrgrpabbr bid="B49"49/abbr/abbrgrp to be 27,470 (Additional files supplr sid="S15"15/supplr, supplr sid="S16"16/supplr, supplr sid="S17"17/supplr, supplr sid="S18"18/supplr). The core number falls asymptotically as genomes are introduced and hence this estimate is somewhat lower than the recent analysis of only the itY. enterocolitica/it, itY. pseudotuberculosis /itand itY. pestis /itgenomes (2,747 core proteins) abbrgrpabbr bid="B15"15/abbr/abbrgrp. We found 681 genes to be in exactly one copy in each itYersinia /itgenome and to be nearly identical in length. We used ClustalW abbrgrpabbr bid="B50"50/abbr/abbrgrp to align the members of this highly conserved set, and concatenated individual gene product alignments to make a dataset of 170,940 amino acids for each of the species. Uninformative characters were removed from the dataset and a phylogeny of the genus was computed using Phylip abbrgrpabbr bid="B51"51/abbr/abbrgrp (Figure figr fid="F1"1/figr). The topology of this tree was identical whether distance or parsimony methods were used (Additional files supplr sid="S19"19/supplr, supplr sid="S20"20/supplr) and was also identical to a tree based on the nucleotide sequence of the approximately 1.5 Mb of the core genome in LCBs (see above). The genus broke down into three major clades: the outlying fish pathogen, itY. ruckeri/it; itY. pestis/it/itY. pseudotuberculosis/it; and the remainder of the 'enterocolitica'-like species. itY. kristensenii /itATCC33638T was the nearest neighbor of itY. enterocolitica /it8081. The outlying position of itY. ruckeri /itwas confirmed further when we analyzed the contribution of the genome to reducing the size of the itYersinia /itcore protein families set. If itY. ruckeri /itwas excluded, the itYersinia /itcore would be 2,232 protein families (counting families with two or more members) rather than 2,072 (Table tblr tid="T4"4/tblr). In contrast, omission of any one of the 10 other species increased the set by at most 22 families./p
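pThe removal of uninformative characters before tree building can be illustrated as follows. The text does not state which criterion was applied, so this sketch assumes the usual parsimony definition: a column is kept only if at least two different residues each occur in at least two sequences. The function name and the toy alignment are hypothetical./p

```python
from collections import Counter


def informative_columns(alignment):
    """Keep columns with at least two residues that each occur in >= 2 sequences.

    alignment: list of equal-length aligned sequences (strings).
    Returns the filtered alignment as a list of strings.
    """
    keep = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        counts.pop("-", None)  # gaps do not count as a character state
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            keep.append(i)
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy protein alignment: column 0 is constant, column 1 has a single variant,
# column 2 splits the sequences two against two.
toy = ["MKA", "MKA", "MRG", "MKG"]
filtered = informative_columns(toy)
```

pOnly the third column survives in this example, because it is the only one that can favor one tree topology over another under parsimony./p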
tbl id="T4"titlepTable 4/p/titlecaptionpitYersinia /itcore size reduction by exclusion of one species/p/captiontblbdy cols="2"
r
c ca="left"
p
bSpecies excluded/b
/p
/c
c ca="center"
p
bCore protein families/b
/p
/c
/r
r
c cspan="2"
hr/
/c
/r
r
c ca="left"
pNone/p
/c
c ca="center"
p2,072/p
/c
/r
r
c ca="left"
p
itY. enterocolitica/it
/p
/c
c ca="center"
p2,074/p
/c
/r
r
c ca="left"
p
itY. aldovae/it
/p
/c
c ca="center"
p2,085/p
/c
/r
r
c ca="left"
p
itY. bercovieri/it
/p
/c
c ca="center"
p2,079/p
/c
/r
r
c ca="left"
p
itY. frederiksenii/it
/p
/c
c ca="center"
p2,077/p
/c
/r
r
c ca="left"
p
itY. intermedia/it
/p
/c
c ca="center"
p2,080/p
/c
/r
r
c ca="left"
p
itY. kristensenii/it
/p
/c
c ca="center"
p2,076/p
/c
/r
r
c ca="left"
p
itY. mollaretii/it
/p
/c
c ca="center"
p2,078/p
/c
/r
r
c ca="left"
p
itY. rohdei/it
/p
/c
c ca="center"
p2,091/p
/c
/r
r
c ca="left"
p
itY. ruckeri/it
/p
/c
c ca="center"
p2,232/p
/c
/r
r
c ca="left"
p
itY. pseudotuberculosis/it
/p
/c
c ca="center"
p2,076/p
/c
/r
r
c ca="left"
p
itY. pestis/it
/p
/c
c ca="center"
p2,094/p
/c
/r
/tblbdytblfn
pThe core protein families with number of members 2 or greater were recalculated in each case (see Materials and methods) with the protein set from one genome missing./p
/tblfn/tbl
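pThe leave-one-out recalculation summarized in Table tblr tid="T4"4/tblr can be sketched with simple set operations. The sketch assumes that each protein family is represented by the set of genomes contributing at least one member and that a core family is one present in every genome considered. The function names and the toy data are hypothetical and do not reproduce the clustering pipeline of the study./p

```python
def core_families(clusters, genomes):
    """Count clusters that contain at least one protein from every genome.

    clusters: dict mapping cluster id -> set of genome names represented.
    genomes: collection of genome names that must all be present.
    """
    required = set(genomes)
    return sum(1 for members in clusters.values() if required <= members)


def leave_one_out(clusters, genomes):
    """Recompute the core size with each genome excluded in turn."""
    return {g: core_families(clusters, [x for x in genomes if x != g])
            for g in genomes}


# Toy data (hypothetical): family f3 is missing only from genome "R".
genomes = ["P", "E", "R"]
clusters = {
    "f1": {"P", "E", "R"},
    "f2": {"P", "E", "R"},
    "f3": {"P", "E"},
    "f4": {"P"},
}
full_core = core_families(clusters, genomes)
without = leave_one_out(clusters, genomes)
```

pIn the toy data the core grows only when the outlying genome "R" is dropped, which mirrors the pattern seen for itY. ruckeri /itin the table./p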
fig id="F1"titlepFigure 1/p/titlecaptionpitYersinia /itwhole-genome phylogeny/p/captiontext
pbitYersinia /itwhole-genome phylogeny/b. The phylogeny of the itYersinia /itgenus was constructed from a dataset of 681 concatenated, conserved protein sequences using the Neighbor-Joining (NJ) algorithm implemented by PHYLIP abbrgrpabbr bid="B51"51/abbr/abbrgrp. The tree was rooted using itE. coli/it. The scale measures number of substitutions per residue. Tree topologies computed using maximum likelihood and parsimony estimates are identical with each other and the NJ tree (Additional file supplr sid="S20"20/supplr). The only branches not supported in more than 99% of the 1,000 bootstrap replicates using both methods are marked with asterisks. Both these branches were supported by 57% of replicates./p
/textgraphic file="gb-2010-11-1-r1-1"//fig
suppl id="S15"
title
pAdditional file 15/p
/title
caption
pOutput of the cluster analysis of the 11 itYersinia /itspecies/p
/caption
text
pThe top level directory consists of a directory called Additional_cluster_files and 5,010 directories, one for each multi-protein cluster family. (This top level directory has been split into three data files for uploading purposes (Additional files supplr sid="S15"15/supplr, supplr sid="S16"16/supplr, supplr sid="S17"17/supplr).) Within the directory are the following files: PGL1_unique_itYersinia/it_unclustered.out, list of all protein singletons that MCL did not group into a cluster (see Materials and methods); PGL1_itYersinia/it_unique_locus_tags.txt, names of the 11 locus tag prefixes used for each genome; PGL1_unique_itYersinia/it.gff, mapping of each itYersinia /itprotein to a cluster in tab-delimited GFF; PGL1_unique_itYersinia/it.sigfile, list of the longest protein in each cluster; PGL1_unique_itYersinia/it.summary, summary table of features of each of the clusters; PGL1_unique_itYersinia/it.table, summary table of each protein in the clusters. Within each cluster directory are the following files, where 'x' is the cluster name: PGL1_unique_itYersinia/it-x.faa, multi-FASTA file of the proteins in the cluster; PGL1_unique_itYersinia/it-x.summary, summary of the properties of the proteins; PGL1_unique_itYersinia/it-x.matches, BLAST matches between the proteins of the cluster; PGL1_unique_itYersinia/it-x.muscle.fasta, MUSCLE alignment of the proteins; PGL1_unique_itYersinia/it-x.muscle.fasta.gblo, Gblocks output of the MUSCLE alignment (that is, auto-trimmed alignment); PGL1_unique_itYersinia/it-x.muscle.fasta.gblo.htm, as above in HTML format; PGL1_unique_itYersinia/it-x.muscle.tree, tree file from the MUSCLE alignment; PGL1_unique_itYersinia/it-x.sif, matches between proteins in simple interaction format for display in graphing software./p
/text
file name="gb-2010-11-1-r1-S15.zip"
pClick here for file/p
/file
/suppl
suppl id="S16"
title
pAdditional file 16/p
/title
caption
pOutput of the cluster analysis of the 11 itYersinia /itspecies/p
/caption
text
pThe top level directory consists of a directory called Additional_cluster_files and 5,010 directories, one for each multi-protein cluster family. (This top level directory has been split into three data files for uploading purposes (Additional files supplr sid="S15"15/supplr, supplr sid="S16"16/supplr, supplr sid="S17"17/supplr).) Within the directory are the following files: PGL1_unique_itYersinia/it_unclustered.out, list of all protein singletons that MCL did not group into a cluster (see Materials and methods); PGL1_itYersinia/it_unique_locus_tags.txt, names of the 11 locus tag prefixes used for each genome; PGL1_unique_itYersinia/it.gff, mapping of each itYersinia /itprotein to a cluster in tab-delimited GFF; PGL1_unique_itYersinia/it.sigfile, list of the longest protein in each cluster; PGL1_unique_itYersinia/it.summary, summary table of features of each of the clusters; PGL1_unique_itYersinia/it.table, summary table of each protein in the clusters. Within each cluster directory are the following files, where 'x' is the cluster name: PGL1_unique_itYersinia/it-x.faa, multi-FASTA file of the proteins in the cluster; PGL1_unique_itYersinia/it-x.summary, summary of the properties of the proteins; PGL1_unique_itYersinia/it-x.matches, BLAST matches between the proteins of the cluster; PGL1_unique_itYersinia/it-x.muscle.fasta, MUSCLE alignment of the proteins; PGL1_unique_itYersinia/it-x.muscle.fasta.gblo, Gblocks output of the MUSCLE alignment (that is, auto-trimmed alignment); PGL1_unique_itYersinia/it-x.muscle.fasta.gblo.htm, as above in HTML format; PGL1_unique_itYersinia/it-x.muscle.tree, tree file from the MUSCLE alignment; PGL1_unique_itYersinia/it-x.sif, matches between proteins in simple interaction format for display in graphing software./p
/text
file name="gb-2010-11-1-r1-S16.zip"
pClick here for file/p
/file
/suppl
suppl id="S17"
title
pAdditional file 17/p
/title
caption
pOutput of the cluster analysis of the 11 itYersinia /itspecies/p
/caption
text
pThe top level directory consists of a directory called Additional_cluster_files and 5,010 directories, one for each multi-protein cluster family. (This top level directory has been split into three data files for uploading purposes (Additional files supplr sid="S15"15/supplr, supplr sid="S16"16/supplr, supplr sid="S17"17/supplr).) Within the directory are the following files: PGL1_unique_itYersinia/it_unclustered.out, list of all protein singletons that MCL did not group into a cluster (see Materials and methods); PGL1_itYersinia/it_unique_locus_tags.txt, names of the 11 locus tag prefixes used for each genome; PGL1_unique_itYersinia/it.gff, mapping of each itYersinia /itprotein to a cluster in tab-delimited GFF; PGL1_unique_itYersinia/it.sigfile, list of the longest protein in each cluster; PGL1_unique_itYersinia/it.summary, summary table of features of each of the clusters; PGL1_unique_itYersinia/it.table, summary table of each protein in the clusters. Within each cluster directory are the following files, where 'x' is the cluster name: PGL1_unique_itYersinia/it-x.faa, multi-FASTA file of the proteins in the cluster; PGL1_unique_itYersinia/it-x.summary, summary of the properties of the proteins; PGL1_unique_itYersinia/it-x.matches, BLAST matches between the proteins of the cluster; PGL1_unique_itYersinia/it-x.muscle.fasta, MUSCLE alignment of the proteins; PGL1_unique_itYersinia/it-x.muscle.fasta.gblo, Gblocks output of the MUSCLE alignment (that is, auto-trimmed alignment); PGL1_unique_itYersinia/it-x.muscle.fasta.gblo.htm, as above in HTML format; PGL1_unique_itYersinia/it-x.muscle.tree, tree file from the MUSCLE alignment; PGL1_unique_itYersinia/it-x.sif, matches between proteins in simple interaction format for display in graphing software./p
/text
file name="gb-2010-11-1-r1-S17.zip"
pClick here for file/p
/file
/suppl
suppl id="S18"
title
pAdditional file 18/p
/title
caption
pComplete protein sets for the 11 species of itYersinia/it/p
/caption
text
pComplete protein sets for the 11 species of itYersinia/it./p
/text
file name="gb-2010-11-1-r1-S18.zip"
pClick here for file/p
/file
/suppl
suppl id="S19"
title
pAdditional file 19/p
/title
caption
pInferred evolutionary trees reconstructed using PHYLIP abbrgrpabbr bid="B51"51/abbr/abbrgrp of the 11 itYersinia /itspecies proteomes based on parsimony/p
/caption
text
pTo evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed. itE. coli /itwas used as an outgroup species./p
/text
file name="gb-2010-11-1-r1-S19.pdf"
pClick here for file/p
/file
/suppl
suppl id="S20"
title
pAdditional file 20/p
/title
caption
pInferred evolutionary trees reconstructed using PHYLIP abbrgrpabbr bid="B51"51/abbr/abbrgrp of the 11 itYersinia /itspecies proteomes based on maximum likelihood/p
/caption
text
pTo evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed. itE. coli /itwas used as an outgroup species./p
/text
file name="gb-2010-11-1-r1-S20.pdf"
pClick here for file/p
/file
/suppl
pClustering the significant Clusters of Orthologous Groups (COG) hits abbrgrpabbr bid="B52"52/abbr/abbrgrp for each genome hierarchically (Figure figr fid="F2"2/figr) yielded a similar pattern for the three basic clades. The overall composition of the COG matches in each genome, as measured by the proportion of the numbers in each COG supercategory, was similar throughout the genus, with the notable exceptions of the high percentage of group L COGs in itY. pestis /itdue to the expansion of IS recombinases and the relatively low number of group G (sugar metabolism) COGs in itY. ruckeri /it(Figure figr fid="F2"2/figr)./p
fig id="F2"titlepFigure 2/p/titlecaptionpComparison of major COG groups in itYersinia /itgenomes/p/captiontext
pbComparison of major COG groups in itYersinia /itgenomes/b. Bars represent the number of proteins assigned to COG superfamilies abbrgrpabbr bid="B52"52/abbr/abbrgrp for each genome, based on matches to the Conserved Domain Database abbrgrpabbr bid="B95"95/abbr/abbrgrp with an E-value threshold <10sup-10/sup. The COG groups are: U, intracellular trafficking; G, carbohydrate transport and metabolism; R, general function prediction; I, lipid transport and metabolism; D, cell cycle control; H, coenzyme transport and metabolism; B, chromatin structure; P, inorganic ion transport and metabolism; W, extracellular structures; O, post-translational modification; J, translation; A, RNA processing and editing; L, replication, recombination and repair; C, energy production; M, cell wall/membrane biogenesis; Q, secondary metabolite biosynthesis; Z, cytoskeleton; V, defense mechanisms; E, amino acid transport and metabolism; K, transcription; N, cell motility; T, signal transduction; F, nucleotide transport; S, function unknown./p
/textgraphic file="gb-2010-11-1-r1-2"//fig
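pThe comparison of COG composition between genomes rests on per-category proportions. The sketch below converts raw hit counts into proportions and computes a distance between two genome profiles, the kind of input a hierarchical clustering would take. The Euclidean metric, the function names and the counts are assumptions made for illustration; the text does not specify the distance measure or linkage that was used./p

```python
from math import sqrt


def cog_proportions(counts):
    """Convert per-category COG hit counts into proportions of the genome total."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two COG proportion profiles."""
    cats = set(p) | set(q)
    return sqrt(sum((p.get(c, 0.0) - q.get(c, 0.0)) ** 2 for c in cats))


# Hypothetical counts for three categories in two genomes.
genome_a = cog_proportions({"L": 30, "G": 50, "E": 20})
genome_b = cog_proportions({"L": 10, "G": 50, "E": 40})
dist = profile_distance(genome_a, genome_b)
```

pWorking with proportions rather than raw counts keeps genomes of different sizes comparable, so that an expansion in one category, such as group L in itY. pestis/it, stands out./p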
/sec
sec
st
pShared protein clusters in pathogenic itYersinia/it: yersiniabactin biosynthesis is the key chromosomal function specific to high virulence in humans/p
/st
pThe itYersinia /itproteomes were investigated for common clusters in the three high virulence species missing from the low human virulence genomes (Figure figr fid="F3"3/figr). Because of the close evolutionary relationship of the 'enterocolitica' clade strains, the number of unique protein clusters in itY. enterocolitica /itwas reduced to a greater degree than in the more phylogenetically isolated itY. pestis /itand itY. pseudotuberculosis/it. Many of the genome islands identified as recent horizontal acquisitions by itY. pestis /itand/or itY. pseudotuberculosis /itabbrgrpabbr bid="B9"9/abbrabbr bid="B13"13/abbrabbr bid="B15"15/abbr/abbrgrp were not present in any of the newly sequenced genomes. However, some genes, interesting from the perspective of the host specificity of the itY. pestis/it/itY. pseudotuberculosis /itancestor, were detected in other itYersinia /itspecies for the first time. These included orthologs of YPO3720/YPO3721, a hemolysin and activator protein, in itY. intermedia/it, itY. bercovieri /itand itY. frederiksenii/it; YPO0599, a heme utilization protein also found in itY. intermedia/it; and YPO0399, an enhancin metalloprotease that had an ortholog in itY. kristensenii /it(ykris0001_41250). Enhancin was originally identified as a factor promoting baculovirus infection of gypsy moth midgut by degradation of mucin abbrgrpabbr bid="B53"53/abbr/abbrgrp. Other loci in itY. pestis/it/itY. pseudotuberculosis /itlinked with insect infection, the TccC and TcABC toxin clusters abbrgrpabbr bid="B54"54/abbr/abbrgrp, were also found in itY. mollaretii/it. In itY. mollaretii /itthe Tca and Tcc proteins show about 90% sequence identity to itY. pestis/it/itY. pseudotuberculosis /itand share identical flanking chromosomal locations. Further work will need to be undertaken to resolve whether the insertion of the toxin genes in itY. mollaretii /itis an independent horizontal transfer event or occurred prior to divergence of the species./p
fig id="F3"titlepFigure 3/p/titlecaptionpDistribution of protein clusters across itY. enterocolitica /it8081, itY. pestis /itCO92, and itY. pseudotuberculosis /itIP32953/p/captiontext
pbDistribution of protein clusters across itY. enterocolitica /it8081, itY. pestis /itCO92, and itY. pseudotuberculosis /itIP32953/b. b(a) /bThe Venn diagram shows the number of protein clusters unique to each species or shared with the two other high virulence itYersinia /itspecies (see Materials and methods). b(b) /bThe number of shared and unique clusters that do not contain a single member of the eight low human virulence genomes sequenced in this study./p
/textgraphic file="gb-2010-11-1-r1-3"//fig
pAfter comparison of the new low virulence genomes, the number of protein clusters shared by itY. enterocolitica /itand the other two pathogens was reduced to 12 and 13 for itY. pseudotuberculosis /itand itY. pestis/it, respectively (Figure figr fid="F3"3/figr). The remaining shared proteins were either identified as phage-related or of unknown role, providing few clues to possible functions that might define distinct pathogenic niches. Applying a similar analysis strategy to other genomes of the 'enterocolitica' clade and itY. pestis /itor itY. pseudotuberculosis /itgave a similar result in terms of numbers and types of shared protein clusters./p
pOnly sixteen clusters of chromosomal proteins were found to be common to all three high-virulence species but absent from all eight non-pathogens (Figure figr fid="F3"3/figr). Eleven of these are components of the yersiniabactin biosynthesis operon (Additional file supplr sid="S21"21/supplr), further highlighting the critical importance of this iron binding siderophore for invasive disease. The other proteins are generally small proteins that are likely included because they fall in unassembled regions of the eight draft genomes. One other small island of three proteins constituting a multi-drug efflux pump (YE0443 to YE0445) was common to the high-virulence species but missing from the eight draft low-virulence species./p
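pThe filter behind this result, clusters found in every high-virulence species and in none of the low-virulence genomes, can be expressed as a pair of set tests. The sketch below assumes each cluster is summarized by the set of genomes that contribute a member. The function name, cluster identifiers and memberships are hypothetical./p

```python
def pathogen_specific(clusters, pathogens, non_pathogens):
    """Return clusters present in every pathogen and in no non-pathogen.

    clusters: dict mapping cluster id -> set of genome names represented.
    """
    required = set(pathogens)
    excluded = set(non_pathogens)
    return sorted(cid for cid, members in clusters.items()
                  if required <= members and not (members & excluded))


# Toy data (hypothetical cluster ids and membership).
pathogens = ["Y. pestis", "Y. pseudotuberculosis", "Y. enterocolitica"]
non_pathogens = ["Y. aldovae", "Y. ruckeri"]
clusters = {
    "c1": set(pathogens),                          # pathogen-specific
    "c2": set(pathogens) | {"Y. aldovae"},         # also in a non-pathogen
    "c3": {"Y. pestis", "Y. pseudotuberculosis"},  # missing one pathogen
}
specific = pathogen_specific(clusters, pathogens, non_pathogens)
```

pBecause the non-pathogen genomes are drafts, a cluster can pass this filter simply because its gene lies in an unassembled region, which is the caveat raised above for the small proteins./p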
suppl id="S21"
title
pAdditional file 21p
title
caption
pTwenty proteins conserved in pathogenic strains but missing from the non-pathogen setp
caption
text
pA curve showing the rate of decline in number of this set as more non-pathogen genomes are added is also included.p
text
file name="gb-2010-11-1-r1-S21.doc"
file
suppl
sec
sec
st
pVariable regions of itY. enterocolitica itclade genomesp
st
pThe basic metabolic similarities of itY. enterocolitica itand the seven species on the main branch of the itYersinia itgenus phylogenetic tree are further illustrated in Figure figr fid="F4"4figr, where the best protein matches against each itY. enterocolitica it8081 gene product abbrgrpabbr bid="B15"15abbrabbrgrp are plotted against a circular genome map. Very few genes exclusive to itY. enterocolitica it8081 were found outside of prophage regions, which is a typical result when groups of closely related bacterial genomes are compared abbrgrpabbr bid="B55"55abbrabbrgrp. One of the largest islands found in itY. enterocolitica it8081 was the 66-kb itY. pseudotuberculosis itadhesion pathogenicity island (YAPIsubyesub) abbrgrpabbr bid="B15"15abbrabbr bid="B56"56abbrabbr bid="B57"57abbrabbrgrp, a unique feature of biotype 1B strains. YAPIsubyesub, containing a type IV pilus gene cluster and other putative virulence determinants, such as arsenic resistance, is similar to a 99-kb YAPIsubpst subthat is found in several other serotypes of itY. pseudotuberculosis itabbrgrpabbr bid="B14"14abbrabbr bid="B57"57abbrabbrgrp but is missing in itY. pestis itand the serotype I itY. pseudotuberculosis itstrain IP32953 abbrgrpabbr bid="B14"14abbrabbrgrp. A model has been proposed for the acquisition of YAPI in a common ancestor of itY. pseudotuberculosis itand itY. enterocolitica itand subsequent degradation to various degrees within the itY. pseudotuberculosis itclade. However, the complete absence of YAPI from any of the seven species in the itY. enterocolitica itbranch (Figure figr fid="F4"4figr), as well as from most strains of itY. enterocolitica itabbrgrpabbr bid="B15"15abbrabbrgrp, argues against an ancient acquisition of YAPI, but instead suggests the recent independent acquisition of related islands by both itY. enterocolitica itbiogroup 1B and itY. pseudotuberculosisit.p
fig id="F4"titlepFigure 4ptitlecaptionpProtein-based comparison of itY. enterocolitica it8081 to the itYersinia itgenuspcaptiontext
pbProtein-based comparison of itY. enterocolitica it8081 to the itYersinia itgenusb. The map represents the blast score ratio (BSR) abbrgrpabbr bid="B98"98abbrabbr bid="B99"99abbrabbrgrp to the protein encoded by itY. enterocolitica itabbrgrpabbr bid="B15"15abbrabbrgrp. Blue indicates a BSR of at least 0.70 (strong match); cyan 0.69 to 0.4 (intermediate); green <0.4 (weak). Red and pink outer circles are locations of the itY. enterocolitica itgenes on the + and - strands. The genomes are ordered from outside to inside based on the greatest overall similarity to itY. enterocoliticait: itY. kristenseniiit, itY. frederikseniiit, itY. mollaretiiit, itY. intermediait, itY. bercovieri, Y. aldovaeit, itY. rohdeiit, itY. ruckeriit, itY. pseudotuberculosisit, and itY. pestisit. The black bars on the outside refer to genome islands in itY. enterocolitica itidentified by Thomson itet alit. abbrgrpabbr bid="B15"15abbrabbrgrp.p
textgraphic file="gb-2010-11-1-r1-4"fig
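pThe color classes used in Figure 4 can be sketched as a small calculation. This is a minimal illustration, not the authors' script: it assumes the usual definition of the BSR as the bit score of a query protein against its best match in another genome divided by the bit score of the query against itself, and it treats the 0.70 boundary as inclusive. The function names are hypothetical.p

```python
def blast_score_ratio(match_bitscore, self_bitscore):
    """Bit score of the query-vs-subject match divided by the query's self-match score."""
    if self_bitscore <= 0:
        raise ValueError("self_bitscore must be positive")
    return match_bitscore / self_bitscore


def classify_bsr(bsr):
    """Map a BSR value onto the three classes named in the figure legend."""
    if bsr >= 0.70:          # assumption: the 0.70 boundary is inclusive
        return "strong"      # blue
    if bsr >= 0.40:
        return "intermediate"  # cyan
    return "weak"            # green


# Hypothetical bit scores for one reference protein against two other genomes
close_relative = classify_bsr(blast_score_ratio(400.0, 500.0))
distant_relative = classify_bsr(blast_score_ratio(150.0, 500.0))
```

pNormalizing by the self-match score makes the values comparable across proteins of different lengths, which is why a single pair of cutoffs can be applied to the whole genome map.p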
pMany genes previously thought to be unique to itY. enterocoliticait in general and biotype 1B in particular turned out to have orthologs in the low human virulence species sequenced in this study. These included several putative biotype 1B-specific genes identified by microarray-based screening abbrgrpabbr bid="B58"58abbrabbrgrp, including YE0344 HylD hemophore (yinte0001_41550 has 78% nucleotide identity), YE4052 metalloprotease (yinte0001_36030 has 95% nucleotide identity), and YE4088, a two-component sensor kinase, which had orthologs in all species. Large portions of the biogroup 1B-specific island containing the Yts1 type II secretion system were found in itY. ruckeriit, itY. mollaretiiit, and itY. aldovaeit. itY. aldovaeit and itY. mollaretiiit also had islands containing itysait type III secretion systems (TTSS) with 75 to 85% nucleotide identity to the homolog in itY. enterocoliticait 1B. The itysa itgenes are a chromosomal cluster abbrgrpabbr bid="B9"9abbrabbr bid="B13"13abbrabbr bid="B15"15abbrabbrgrp that in itY. enterocoliticait, at least, appears to play a role in virulence abbrgrpabbr bid="B59"59abbrabbrgrp. The itY. enterocolitica ysa itgenes are found in the plasticity zone (Figure figr fid="F4"4figr) and have very low similarity to the itY. pestis itand itY. pseudotuberculosis ysa itgenes (which are more similar to the itSalmonella itSPI-2 island abbrgrpabbr bid="B60"60abbrabbr bid="B61"61abbrabbrgrp) and are found between orthologs of YPO0254 and YPO0274 abbrgrpabbr bid="B9"9abbrabbrgrp. Species within the itYersinia itgenus had either the itY. enterocolitica ittype of itysa itTTSS locus or the itY. pestis itSPI-2 type (with the exception of itY. aldovaeit, which has both; Additional file supplr sid="S22"22supplr). This suggests the exchange of chromosomal TTSS genes within itYersiniait.p
suppl id="S22"
title
pAdditional file 22p
title
caption
pPhylogeny of TTSS component YscN in itYersinia itand other enterobacteria speciesp
caption
text
pPhylogeny of TTSS component YscN in itYersinia itand other enterobacteria species.p
text
file name="gb-2010-11-1-r1-S22.doc"
file
suppl
pThe modular nature of the islands found in the itY. enterocolitica itgenome was demonstrated further by two examples gleaned from comparison with the evolutionarily closest low human virulence genome, itY. kristensenii itATCC 33638T (Figure figr fid="F1"1figr). The YGI-3 island abbrgrpabbr bid="B15"15abbrabbrgrp in itY. enterocolitica it8081 is a degraded integrated plasmid; at the same chromosomal locus in itY. kristensenii itATCC 33638T a prophage was found, suggesting that the YGI-3 location may be a recombinational hotspot. Another itY. enterocolitica it8081 island, YGI-1, encodes a 'tight adherence' (ittadit) locus responsible for non-specific surface binding. itY. kristensenii itATCC 33638T had an identical 13 gene ittad itlocus in the same position, but the nucleotide sequence identity of the region to itY. enterocolitica it8081 was uniformly lower than that found for the rest of the genome, suggesting there had been either a gene conversion event replacing the ittad itlocus with a set of new alleles in the recent history of itY. kristensenii itor itY. enterocolitica itor the locus was under very high positive selective pressure.p
sec
sec
st
pNiche-specific metabolic adaptations in the itYersinia itgenusp
st
pComparison of the itY. enterocolitica itgenome to itY. pestis itand itY. pseudotuberculosis itrevealed some potentially significant metabolic differences that may account for varying tropisms in gastric infections abbrgrpabbr bid="B62"62abbrabbrgrp. itY. enterocolitica it8081 alone contained entire gene clusters for cobalamin (vitamin B12) biosynthesis (itcbiit), 1,2-propanediol utilization (itpduit), and tetrathionate respiration (itttrit). In itY. enterocolitica itand itSalmonella typhimurium itabbrgrpabbr bid="B63"63abbrabbr bid="B64"64abbrabbrgrp, vitamin B12 is produced under anaerobic conditions where it is used as a cofactor in 1,2-propanediol degradation, with tetrathionate serving as an electron acceptor. This study showed the genes for this pathway to be a general feature of species in the 'enterocolitica' branch of the itYersinia itgenus (with the caveat that some portions are missing in some species; for example, itY. rohdei itis missing the itpdu itcluster; Table tblr tid="T5"5tblr). Additionally, itY. intermediait, itY. bercovieriit, and itY. mollaretii itcontained gene clusters encoding degradation of the membrane lipid constituent ethanolamine. Ethanolamine metabolism under anaerobic conditions also requires the B12 cofactor. itY. intermedia itcontained the full 17-gene cluster reported in itS. typhimurium itabbrgrpabbr bid="B65"65abbrabbrgrp, including structural components of the carboxysome organelle. Another discovery from the itY. enterocolitica itgenome analysis was the presence of two compact hydrogenase gene clusters, Hyd-2 and Hyd-4 abbrgrpabbr bid="B15"15abbrabbrgrp. Hydrogen released from fermentation by intestinal microflora is imputed to be an important energy source for enteric gut pathogens abbrgrpabbr bid="B66"66abbrabbrgrp. Both gene clusters are conserved across all seven other enterocolitica-branch species, but are missing from itY. pestis itand itY. pseudotuberculosisit. itY. ruckeri itcontained a single [NiFe]-containing hydrogenase complex.p
tbl id="T5"titlepTable 5ptitlecaptionpKey niche-specific genes in itYersiniaitpcaptiontblbdy cols="10"
Species | itcbiit | itpduit | itttrit | iteutit | ithyd-2it | ithyd-4it | itureit | itmtnit | itopgit
itY. enterocoliticait | + | + | + | - | + | + | + | + | +
itY. aldovaeit | + | + | - | - | + | + | + | + | +
itY. bercovieriit | + | + | + | iteutABCit | + | + | + | + | +
itY. frederikseniiit | + | + | + | - | + | + | + | + | +
itY. intermediait | + | + | + | iteutSPQTDMNEJGHABCLKRit | + | + | + | + | +
itY. kristenseniiit | + | + | + | - | + | + | + | + | +
itY. mollaretiiit | + | + | - | iteutABCit | + | + | + | + | +
itY. rohdeiit | + | - | + | - | + | + | + | + | +
itY. ruckeriit | - | - | - | - | +/- hyfABCGHINfdhF | +/- (hyaD, hypEDB) | - | - | +
itY. pseudotuberculosisit | - | - | - | - | - | - | + | + | +
itY. pestisit | - | - | - | - | - | - | +/- | - | -
tblbdytblfn
pAbbreviations: itcbiit, cobalamin (vitamin B12) biosynthesis; itpduit, 1,2-propanediol utilization; itttrit, tetrathionate respiration; iteutit, ethanolamine degradation; ithyd-2 itand ithyd-4it, hydrogenases 2 and 4, respectively; itureit, urease; itmtnit, methionine salvage pathway; itopgit, osmoprotectant (synthesis of periplasmic branched glucans).p
tblfntbl
pitY. ruckeriit, the most evolutionarily distant member of the genus (Figure figr fid="F1"1figr) with the smallest genome (3.7 Mb), had several features that were distinct from its congeners. The itY. ruckeri itO-antigen operon contained a itneuB itsialic acid synthase gene; the bacterium was therefore predicted to produce a sialylated outer surface structure. Among the common itYersinia itgenes that are missing only in itY. ruckeri itwere those for xylose utilization and urease activity, consistent with phenotypes that have long been known in clinical microbiology abbrgrpabbr bid="B67"67abbrabbrgrp (Table tblr tid="T3"3tblr). Surprisingly, we discovered that itY. ruckeri itwas also missing the itmtnKADCBEU itgene cluster that comprises the majority of the methionine salvage pathway abbrgrpabbr bid="B68"68abbrabbrgrp found in most other Yersiniae. These genes have also been deleted from itY. pestisit, but as with itY. ruckeriit, the itmtnN it(methylthioadenosine nucleosidase) is maintained. The loss of these genes in itY. pestis ithas been interpreted as a consequence of adaptation to an obligate host-dwelling lifecycle, where the availability of the sulfur-containing amino acids is not a nutritional limitation abbrgrpabbr bid="B15"15abbrabbrgrp.p
sec
sec
sec
st
pDiscussionp
st
pWhole-genome shotgun sequencing by high-throughput bead-based pyrosequencing has proved remarkably useful for the large-scale sequencing of closely related bacteria abbrgrpabbr bid="B49"49abbrabbr bid="B69"69abbrabbr bid="B70"70abbrabbr bid="B71"71abbrabbr bid="B72"72abbrabbr bid="B73"73abbrabbr bid="B74"74abbrabbrgrp. High-quality itde novo itassemblies can be obtained with relatively few errors and gaps when the sequence read coverage redundancy is 15-fold or greater. Closing all the gaps in each genome sequence is time-consuming and costly; therefore, in the near future there will be an excess of draft bacterial sequences versus closed genomes in public databases. Our analysis strategy here melds both draft and complete genomes using consistent automated annotation that is scalable to encompass potentially much larger datasets. High quality draft sequencing is likely to shortly supersede comparative genome hybridization using microarrays abbrgrpabbr bid="B25"25abbrabbr bid="B58"58abbrabbr bid="B75"75abbrabbr bid="B76"76abbrabbrgrp as the most popular strategy for genome-wide bacterial comparisons. Genome sequence datasets can be used to shed light on the novel functions in close relatives that may have been lost in the pathogen of interest, as well as orthologs in genomes that fall below the threshold for hybridization-based detection. The problems of using microarrays for comparisons of more diverse bacterial taxa are illustrated in a study of the itYersinia itgenus, using many of the strains sequenced in this work, where the estimated number of core genes was found to be only 292 abbrgrpabbr bid="B25"25abbrabbrgrp.p
pWe cannot claim complete coverage of all the type strains of the itYersinia itgenus, as three new species have been created abbrgrpabbr bid="B77"77abbrabbr bid="B78"78abbrabbr bid="B79"79abbrabbrgrp since our work began. Nonetheless, from this extensive genomic survey we have attempted to categorize the features that define itYersiniait. The core of about 2,500 proteins present in all 11 species is not a subset of any other enterobacterial genome. Species of the itY. enterocolitica itclade (Figure figr fid="F1"1figr) have overall a similar array of protein functions and contain a number of conserved gene clusters (cobalamin, hydrogenases, ureases, and so on) found in other bacteria (itHelicobacterit, itCampylobacterit, itSalmonellait, itEscherichia coliit) that colonize the mammalian gut. itY. pestis ithas lost many of these genes by deletion or disruption since its split from the enteric pathogen itY. pseudotuberculosis itand adoption of an insect vector-mediated pathogenicity mode. The smaller itY. ruckeri itchromosome does not appear to result from recent reductive evolution (as is the case for itY. pestisit), as evidenced by the relatively low number of frameshifts and pseudogenes, and the normal amount of repetitive contigs in the itnewbler itgenome assembly. Like itY. pestisit, itY. ruckeri itlacks urease, methionine salvage genes, and B12-related metabolism. The prevailing consensus is that the pathway of transmission of red mouth disease in fish is gastrointestinal, yet the similarities of itY. ruckeri itgenome reduction to itY. pestis ithint at an alternative mode of infection for itY. ruckeriit.p
pThis comparative genomic study reaffirms that the distinguishing feature of the high-level mammalian pathogens is the acquisition of a particular set of mobile elements: HPI, the pYV, pMT1 and pPCP plasmids, and the YAPI island. However, the eight species sequenced in this study, believed to have either low or zero potential for human infection, contain numerous, apparently horizontally transferred genes that would be considered putative virulence determinants if discovered in the genome of a more serious pathogen. Two examples are yaldo0001_40900 (bile salt hydrolase) and yfred0001_36480, an ortholog of the TibA adhesin of enterotoxigenic itE. coliit. Bile salt hydrolase in pathogenic itBrucella abortus ithas been shown to enhance bile resistance during oral mouse infections abbrgrpabbr bid="B80"80abbrabbrgrp and the TibA adhesin forms a biofilm that mediates human cell invasion abbrgrpabbr bid="B81"81abbrabbrgrp. The low-virulence species contain a similar (and in some cases greater) number of matches to known drug resistance mechanisms that have been curated in the Antibiotic Resistance Genes Database abbrgrpabbr bid="B82"82abbrabbrgrp (Additional file supplr sid="S23"23supplr, abbrgrpabbr bid="B83"83abbrabbrgrp). Adding DNA, stirring and reducing abbrgrpabbr bid="B17"17abbrabbrgrp is, therefore, the general recipe for itYersinia itgenome evolution rather than a formula specific to pathogens. Comparative genomic studies such as this one can be used to enhance our ability to rapidly assess the virulence potential of a genome sequence of an emerging pathogen, and we plan to continue to build more extensive databases of non-pathogenic itYersinia itgenomes that will allow us to draw conclusions with more statistical power than is possible with just 11 representative species.p
suppl id="S23"
title
pAdditional file 23p
title
caption
pPutative antibiotic resistance genes in the itYersinia itgenus determined using the Antibiotic Resistance Genes Databasep
caption
text
pPutative antibiotic resistance genes in the itYersinia itgenus determined using the Antibiotic Resistance Genes Database abbrgrpabbr bid="B45"45abbrabbrgrp.p
text
file name="gb-2010-11-1-r1-S23.xls"
file
suppl
sec
sec
st
pConclusionsp
st
pGenomes of the 11 itYersinia itspecies studied range in estimated size from 3.7 to 4.8 Mb. The nucleotide diversity (Π) of the conserved backbone based on large collinear conserved blocks was calculated to be 0.27. No orthologs of the genes and predicted proteins of the virulence-associated plasmids pYV, pMT1, and pPla, or of the HPI of itY. pestisit, were found in the genomes of the type strains of the eight non- or low-pathogenic itYersinia itspecies.p
pApart from functions encoded on the aforementioned plasmids, HPI and YAPI regions, only nine proteins detected as common to all three itYersinia itpathogen species (itY. pestisit, itY. enterocolitica itand itY. pseudotuberculosisit) were not found on at least one of the other eight species. Therefore, our study is in agreement with the hypothesis that genes acquired by recent horizontal transfer effectively define the members of the itYersinia itgenus virulent for humans.p
pThe core proteome of the 11 itYersinia itspecies consists of approximately 2,500 proteins. itYersinia itgenomes had a similar global partition of protein functions, as measured by the distribution of COG families. Genome to genome variation in islands with genes encoding functions such as ureases, hydrogenases and B12 cofactor metabolite reactions may reflect adaptations to colonizing specific host habitats.p
pitY. ruckeriit, a salmonid fish pathogen, is the earliest branching member of the genus and has the smallest genome (3.7 Mb). Like itY. pestisit, itY. ruckeri itlacks functional urease, methionine salvage genes, and B12-related metabolism. These losses may reflect adaptation to a lifestyle that does not include colonization of the mammalian gut.p
pThe absence of the YAPI island in any of the seven 'itY. enterocolitica itclade' genomes likely indicates that YAPI was acquired independently in itY. enterocolitica itand itY. pseudotuberculosisit.p
pWe identified 171 and 100 regions within the genomes of itY. pestis itand itY. enterocoliticait, respectively, that represented potential candidates for the design of nucleotide sequence-based assays for unique detection of each pathogen.p
sec
sec
st
pMaterials and methodsp
st
sec
st
pBacterial strainsp
st
pType strains of the eight itYersinia itspecies sequenced in this study (Table tblr tid="T1"1tblr) were acquired from the American Type Culture Collection (ATCC) and propagated at 37°C or 25°C (itY. ruckeriit) on Luria media. DNA for genome sequencing was prepared from overnight broth cultures propagated from single colonies streaked on a Luria agar plate using the Promega Wizard Maxiprep System (Promega, Madison, WI, USA).p
sec
sec
st
pGenome sequencing and assemblyp
st
pGenomes were sequenced using the Genome Sequencer 20 Instrument (454 Life Sciences Inc., Branford, CT) abbrgrpabbr bid="B34"34abbrabbrgrp. Libraries for sequencing were prepared from 5 μg of genomic DNA. The sequencing reads for each project were assembled itde novo itusing the itnewbler itprogram (version 01.51.02; 454 Life Sciences Inc.).p
sec
sec
st
pOptical mappingp
st
pOptical maps abbrgrpabbr bid="B38"38abbrabbrgrp for each genome using the restriction enzymes itAflitII and itNheitI (itY. aldovae itand itY. kristensenii itonly have maps for itAflitII) were constructed by Opgen Inc. (Madison, WI). The itnewbler itassemblies for each genome were scaffolded using the optical maps and the SOMA package abbrgrpabbr bid="B39"39abbrabbrgrp (Additional file supplr sid="S4"4supplr). Assemblies that did not align against the optical map were tested for high read coverage, unusual GC content, and good matches to plasmid-associated genes from the ACLAME database abbrgrpabbr bid="B84"84abbrabbrgrp (BLAST E-value less than 10sup-20sup) to identify sequences that could potentially be part of an extrachromosomal element.p
sec
sec
st
pDetection of disrupted genesp
st
pWe used two methods for detecting disrupted proteins. In the first method, clustered protein groups were used to adduce evidence for possible gene disruption events. The clusters were parsed for pairs of proteins that met the following criteria: both from the same genome; encoded by genes located on the same strand with less than 200 bp separating their frames; and total length of the combined genes not greater than 120% of the longest gene in the cluster. The second method was the FSFIND algorithm abbrgrpabbr bid="B85"85abbrabbrgrp, used with a standard bacterial gene model to compare the accumulation of predicted frameshifts across different genomes.p
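pThe three pairing criteria of the first method can be sketched as a short filter. This is a minimal illustration under stated assumptions, not the authors' implementation: the codeGenecode record and the function name are hypothetical, coordinates are taken to be 1-based and inclusive on a single replicon, overlapping frames are treated as separated by less than 200 bp, and the "longest gene in the cluster" is taken over all cluster members including the candidates themselves.p

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gene:
    genome: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive

    @property
    def length(self):
        return self.end - self.start + 1


def candidate_disruptions(cluster, max_gap=200, max_ratio=1.2):
    """Return gene pairs in one protein cluster that look like a single split gene."""
    longest = max(g.length for g in cluster)
    ordered = sorted(cluster, key=lambda g: (g.genome, g.start))
    pairs = []
    for a, b in combinations(ordered, 2):
        # Criterion 1 and 2: same genome and same strand
        if a.genome != b.genome or a.strand != b.strand:
            continue
        # Criterion 2 (continued): fewer than max_gap bp between the two frames
        if b.start - a.end - 1 >= max_gap:
            continue
        # Criterion 3: combined length at most 120% of the longest cluster member
        if a.length + b.length > max_ratio * longest:
            continue
        pairs.append((a, b))
    return pairs


# Hypothetical cluster: genome B carries two adjacent fragments of the gene
cluster = [
    Gene("A", "+", 100, 1000),
    Gene("B", "+", 100, 500),
    Gene("B", "+", 600, 1000),
    Gene("C", "+", 1, 900),
    Gene("C", "-", 950, 1400),
]
flagged = candidate_disruptions(cluster)
```

pIn this toy cluster only the two fragments from genome B are flagged; the pair from genome C is rejected because the genes lie on opposite strands.p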
sec
sec
st
pAssembly validationp
st
pIn order to rule out artifacts due to undocumented features of the itnewbler itassemblies, new assemblies were generated for validation purposes by re-mapping all the shotgun reads to the sequence of the assembled contigs using AMOScmp abbrgrpabbr bid="B86"86abbrabbrgrp. The resulting assembly was then subjected to analysis using the itamosvalidate itpackage abbrgrpabbr bid="B37"37abbrabbrgrp. The output of this program includes a list of genomic regions that contain inconsistencies highlighting possible misassemblies. The resulting regions were manually inspected to reduce the possibility of assembly errors. The regions flagged by the itamosvalidate itpackage are provided in GFF (general feature format), compatible with most genome browsers (Additional file supplr sid="S3"3supplr).p
sec
sec
st
pInsertion sequences and itde novo itrepeat findingp
st
pThe presence of repeats is known to confound assembly programs and the itnewbler itassembler is known to collapse high-fidelity repeat instances into a single contig. To account for the possibility of such misassemblies, we computed the copy number of contigs based on coverage statistics and used this information to correct our estimates for the abundance of classes of repeats (Additional file 3). To find known insertion sequences, the genomes were scanned for matches using the IS finder web service abbrgrpabbr bid="B40"40abbrabbrgrp with a BLAST E-value threshold of 10sup-10 sup(matches to known repeat contigs were counted as multiple matches based on the coverage of the contig). In addition, we searched for common repeat sequences in the genome using the RepeatScout program abbrgrpabbr bid="B41"41abbrabbrgrp after duplicating known repeat contigs. The repeats found in each genome were collected (64 sequences) and transformed into a non-redundant set of 44 sequences using the CD-HIT program abbrgrpabbr bid="B87"87abbrabbrgrp (Additional file supplr sid="S6"6supplr). The repeats found were then searched against all the genomes using BLAST with an E-value threshold of 10sup-10 supto record matches. The resultant figures for repeat content are estimations that may be lower than the true number found in the genomes.p
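pThe text does not say how copy number was derived from coverage, so the following is only one plausible sketch: each contig's read coverage is divided by the median coverage across contigs and rounded, with a floor of one copy. The function name, the use of the median as the single-copy baseline, and the example values are all assumptions made for illustration.p

```python
from statistics import median


def estimate_copy_numbers(contig_coverage):
    """Estimate contig copy number as coverage relative to the median contig coverage."""
    baseline = median(contig_coverage.values())  # assumed single-copy coverage
    if baseline <= 0:
        raise ValueError("median coverage must be positive")
    return {
        name: max(1, round(coverage / baseline))
        for name, coverage in contig_coverage.items()
    }


# Hypothetical coverage values: contig c4 looks like a collapsed three-copy repeat
copies = estimate_copy_numbers({"c1": 20.0, "c2": 21.0, "c3": 19.0, "c4": 62.0})
```

pA length-weighted baseline would be more robust when an assembly contains many short repeat contigs; the unweighted median is used here only to keep the sketch short.p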
sec
sec
st
pFinding unique DNA signatures in itY. pestis itand itY. enterocoliticaitp
st
pDNA signatures for the itY. pestis itand the itY. enterocolitica itgenomes were identified using the Insignia pipeline abbrgrpabbr bid="B44"44abbrabbrgrp. Signatures of 100 bp or longer were considered good candidates for the design of detection assays. These signatures were then compared with the genomes of the itYersinia itstrains sequenced during the current study using the MUMmer package abbrgrpabbr bid="B88"88abbrabbrgrp with default parameters. Signatures that matched by more than 40 bp were deemed invalidated, as they would likely lead to false-positive results.p
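pThe two thresholds in this procedure can be sketched as a simple filter. This is a minimal illustration with hypothetical names and inputs: it assumes that the signature lengths and the longest match of each signature against the newly sequenced genomes have already been computed (for example from Insignia and MUMmer output), and it does not call either tool.p

```python
def validate_signatures(signature_lengths, longest_match, min_length=100, max_match=40):
    """Keep signatures of at least min_length bp whose longest match to the
    non-target genomes does not exceed max_match bp."""
    kept = []
    for sig_id, length in signature_lengths.items():
        if length < min_length:
            continue  # too short to be a good assay candidate
        if longest_match.get(sig_id, 0) > max_match:
            continue  # matches a non-target genome: likely false positives
        kept.append(sig_id)
    return kept


# Hypothetical signatures (lengths in bp) and their longest non-target matches
lengths = {"s1": 150, "s2": 90, "s3": 200, "s4": 120}
matches = {"s1": 0, "s3": 55, "s4": 40}
valid = validate_signatures(lengths, matches)
```

pA signature with a 40-bp match is retained here because the text invalidates only matches of more than 40 bp.p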
sec
sec
st
pAutomated annotationp
st
pWe used DIYA abbrgrpabbr bid="B89"89abbrabbrgrp for automated annotation, which is a pipeline for integrating bacterial analysis tools. Using DIYA, the assemblies generated by itnewbler itwere scaffolded based on the optical map, concatenated, and used as a template for the programs GLIMMER abbrgrpabbr bid="B90"90abbrabbrgrp, tRNASCAN-SE abbrgrpabbr bid="B91"91abbrabbrgrp, and RNAmmer abbrgrpabbr bid="B92"92abbrabbrgrp for prediction of open reading frames and RNA genes, respectively. All predicted proteins encoded by each coding sequence were compared against a database of all proteins predicted from the canonical annotation of itY. pestis itCO92 abbrgrpabbr bid="B9"9abbrabbrgrp as a preliminary screen for potentially novel functions. The GenBank format files created from the eight genomes sequenced in this study were combined with other DIYA-annotated, published whole genomes to form a dataset for analysis. All proteins were searched against the UniRef50 database (July 2008) abbrgrpabbr bid="B93"93abbrabbrgrp using BLASTP abbrgrpabbr bid="B94"94abbrabbrgrp and against the Conserved Domain Database abbrgrpabbr bid="B95"95abbrabbrgrp using RPSBLAST abbrgrpabbr bid="B96"96abbrabbrgrp with an E-value threshold of 10sup-10 supto record matches.p
sec
sec
st
pDatabase accession numbersp
st
pThe annotated genome data were submitted to NCBI GenBank and the sequence data submitted to the NCBI Short Read Archive (SRA). The accession numbers are: itY. rohdeiit, ATCC_43380: [Genbank:ext-link ext-link-id="ACCD00000000" ext-link-type="gen"ACCD00000000ext-link][SRA:SRA009766.1]; itY. ruckeri itATCC_29473: [Genbank:ext-link ext-link-id="ACCC00000000" ext-link-type="gen"ACCC00000000ext-link][SRA:SRA009767.1]; itY. aldovae itATCC_35236: [Genbank:ext-link ext-link-id="ACCB00000000" ext-link-type="gen"ACCB00000000ext-link][SRA:SRA009760.1]; itY. kristensenii itATCC_33638: [Genbank:ext-link ext-link-id="ACCA00000000" ext-link-type="gen"ACCA00000000ext-link][SRA:SRA009764.1]; itY. intermedia itATCC_29909: [Genbank:ext-link ext-link-id="AALF00000000" ext-link-type="gen"AALF00000000ext-link][SRA:SRA009763.1]; itY. frederiksenii itATCC_33641: [Genbank: ext-link ext-link-id="AALE00000000" ext-link-type="gen"AALE00000000ext-link][SRA:SRA009762.1]; itY. mollaretii itATCC_43969: [Genbank:ext-link ext-link-id="AALD00000000" ext-link-type="gen"AALD00000000ext-link][SRA:SRA009765.1]; itY. bercovieri itATCC_43970: [Genbank:ext-link ext-link-id="AALC00000000" ext-link-type="gen"AALC00000000ext-link][SRA:SRA009761.1].p
sec
sec
st
pWhole-genome alignment using MAUVEp
st
pitYersinia itgenomes were aligned using the standard MAUVE abbrgrpabbr bid="B46"46abbrabbrgrp algorithm with default settings. A cutoff of 1,500 bp was set as the minimum LCB length. LCBs for each genome were extracted from the output of the program and concatenated. From the alignment, nucleotide diversity was calculated by an in-house script using positions where there was a base in all 11 genomes. Because of the size of the dataset, the calculated value of Π is very robust in terms of sequence error. We calculated that 112,696 nucleotides of sequence in the concatenated core would have to be wrong to alter the estimation of Π by ± 5% (Additional file supplr sid="S24"24supplr). PHYLIP abbrgrpabbr bid="B51"51abbrabbrgrp programs were used to build a consensus tree of the MAUVE alignment with 1,000 bootstrap replicates. The underlying model for each replicate was Fitch-Margoliash. The final phylogeny was resolved according to the majority consensus rule.p
suppl id="S24"
title
pAdditional file 24p
title
caption
pCalculations for the estimation of Π from aligned itYersinia itcore genomesp
caption
text
pCalculations for the estimation of Π from aligned itYersinia itcore genomes.p
text
file name="gb-2010-11-1-r1-S24.doc"
file
suppl
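pThe calculation of Π and its robustness to sequencing error can be sketched as follows. The in-house script is not available, so the function below is a hypothetical stand-in that assumes Π is the number of pairwise mismatches divided by the number of pairwise comparisons times the number of sites with a base in every genome. The counts in the second half are those reported in Additional file 24 (55 pairwise comparisons, 1,495,930 sites, 22,539,297 mismatches), with one sequencing error in one genome assumed to create 10 mismatches, one against each of the other 10 genomes.p

```python
from itertools import combinations

VALID_BASES = frozenset("ACGT")


def nucleotide_diversity(aligned):
    """Pairwise mismatches per site, over columns with a base in every sequence."""
    n_pairs = len(aligned) * (len(aligned) - 1) // 2
    sites = 0
    mismatches = 0
    for column in zip(*aligned):
        if not all(base in VALID_BASES for base in column):
            continue  # skip columns with gaps or ambiguous bases
        sites += 1
        mismatches += sum(a != b for a, b in combinations(column, 2))
    if sites == 0 or n_pairs == 0:
        raise ValueError("need at least two sequences and one fully covered site")
    return mismatches / (n_pairs * sites)


# Toy alignment: the third column is skipped because of the gap
toy_pi = nucleotide_diversity(["ACGT", "ACGA", "AC-T"])

# Reproducing the reported figures from the counts in Additional file 24
n_genomes = 11
n_pairs = n_genomes * (n_genomes - 1) // 2        # 55 pairwise comparisons
sites = 1_495_930
mismatches = 22_539_297
pi = mismatches / (n_pairs * sites)               # about 0.274

# One erroneous base differs from the other 10 genomes, adding 10 mismatches
errors_for_5_percent = 0.05 * mismatches / (n_genomes - 1)
```

pWith these counts, roughly 112,696 erroneous bases would be needed to shift Π by 5%, which agrees with the figure quoted in the text.p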
sec
sec
st
pClustering protein orthologsp
st
pThe complete predicted proteome from all genomes annotated in this study was searched against itself using BLASTP with default parameters. We removed short, spurious, and non-homologous hits by setting a bitscore/alignment length filtering threshold of 0.4 and a minimum protein length of 30. Predicted proteins passing this filter were clustered into families based on these normalized distances using the MCL algorithm abbrgrpabbr bid="B48"48abbrabbrgrp with an inflation parameter value of 4. These parameters were based on an investigation of clustering 12 completed itE. coli itgenomes, which produced very similar results to a previous study abbrgrpabbr bid="B42"42abbrabbrgrp.p
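pThe hit filter applied before clustering can be sketched as below. This is a minimal illustration that assumes the filtering statistic is the BLASTP bit score divided by the alignment length, that hits at exactly 0.4 are kept, and that the length minimum of 30 applies to both proteins in a hit. The function name and tuple layout are hypothetical, and the MCL clustering step itself is not shown.p

```python
def filter_hits(hits, protein_lengths, min_ratio=0.4, min_length=30):
    """Keep BLASTP hits whose bit score per aligned position is at least min_ratio.

    hits: iterable of (query, subject, bitscore, alignment_length)
    protein_lengths: dict mapping protein id to its length in residues
    """
    kept = []
    for query, subject, bitscore, aln_len in hits:
        if protein_lengths[query] < min_length or protein_lengths[subject] < min_length:
            continue  # drop hits involving very short proteins
        if aln_len <= 0:
            continue
        ratio = bitscore / aln_len
        if ratio >= min_ratio:
            kept.append((query, subject, ratio))  # ratio can serve as the edge weight
    return kept


# Hypothetical hits for one query protein
lengths = {"a": 300, "b": 310, "c": 25, "d": 400}
hits = [
    ("a", "b", 450.0, 300),  # strong match
    ("a", "c", 40.0, 25),    # subject shorter than 30 residues
    ("a", "d", 30.0, 200),   # weak match, ratio 0.15
]
edges = filter_hits(hits, lengths)
```

pThe retained ratios could then be written out as weighted edges for a graph clustering tool.p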
sec
sec
st
pWhole genome phylogenetic reconstructionp
st
pFrom the results of clustering analysis, 681 proteins were found that had exactly one member in each of the genomes and the length of each protein in the cluster was nearly identical. These protein sequences were aligned using ClustalW abbrgrpabbr bid="B50"50abbrabbrgrp, and individual gene alignments were concatenated into a string of 170,940 amino acids for each genome. Uninformative characters were removed from the dataset using Gblocks abbrgrpabbr bid="B97"97abbrabbrgrp and a phylogeny reconstructed with PHYLIP abbrgrpabbr bid="B51"51abbrabbrgrp under a neighbor-joining model. To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed.p
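pSelecting clusters suitable for the concatenated alignment can be sketched as follows. The text does not define "nearly identical" length, so the 5% tolerance used here is an assumption, as are the function name and the input layout (each cluster as a list of genome and protein-length pairs).p

```python
from collections import Counter


def single_copy_clusters(clusters, genomes, max_length_spread=0.05):
    """Return ids of clusters with exactly one member per genome and similar lengths."""
    selected = []
    for cluster_id, members in clusters.items():
        counts = Counter(genome for genome, _ in members)
        # Exactly one member in each genome, and no genome missing
        if set(counts) != set(genomes) or any(n != 1 for n in counts.values()):
            continue
        lengths = [length for _, length in members]
        # Assumed tolerance: relative spread between shortest and longest member
        if (max(lengths) - min(lengths)) / max(lengths) <= max_length_spread:
            selected.append(cluster_id)
    return selected


# Hypothetical clusters over three genomes
genomes = ["g1", "g2", "g3"]
clusters = {
    "c1": [("g1", 300), ("g2", 302), ("g3", 298)],              # accepted
    "c2": [("g1", 300), ("g1", 300), ("g2", 300), ("g3", 300)],  # paralog in g1
    "c3": [("g1", 300), ("g2", 300)],                            # missing from g3
    "c4": [("g1", 300), ("g2", 150), ("g3", 300)],               # truncated in g2
}
core = single_copy_clusters(clusters, genomes)
```

pRestricting the set to single-copy families of similar length avoids mixing paralogs or truncated proteins into the concatenated alignment.p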
sec
sec
sec
st
pAbbreviationsp
st
pATCC: American Type Culture Collection; COG: Cluster of Orthologous Groups; HPI: high-pathogenicity island; IS: insertion sequence; LCB: locally collinear block; SRA: Short Read Archive; TTSS: type III secretion system; YAPI: itY. pseudotuberculosis itadhesion pathogenicity island.p
sec
sec
st
pAuthors' contributionsp
st
pTDR, MEZ, LD, and SS were involved in study design. AS and AM were involved in materials. LD, MPKT, SL, and NNo were involved in 454 sequencing. SS, MPKT, and CC were involved in additional experiments. PEC, TDR, CC, MEZ, ACS, NN, MP, BT, and DDS were involved in data analysis. TDR, MP, and NN wrote the paper.p

Acknowledgements

We thank Ayra Akmal, Kim Bishop-Lilly, Mike Cariaso, Brian Osborne, Bill Klimke, Tim Welch, Jennifer Tsai, Cheryl Timms Strauss and members of the 454 Service Center for their help and advice in completing this manuscript. This work was supported by grant TMTI0068_07_NM_T from the Joint Science and Technology Office for Chemical and Biological Defense (JSTO-CBD), Defense Threat Reduction Agency Initiative to TDR. The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the US Department of the Navy, US Department of Defense, or the US Government. Some of the authors are employees of the US Government, and this work was prepared as part of their official duties. Title 17 USC §105 provides that 'Copyright protection under this title is not available for any work of the United States Government.' Title 17 USC §101 defines a US Government work as a work prepared by a military service member or employee of the US Government as part of that person's official duties.

1. Ecker DJ, Sampath R, Willett P, Wyatt JR, Samant V, Massire C, Hall TA, Hari K, McNeil JA, Buchen-Osmond C, Budowle B: The Microbial Rosetta Stone Database: a compilation of global and emerging infectious microorganisms and bioterrorist threat agents. BMC Microbiol 2005, 5:19. doi:10.1186/1471-2180-5-19; PMCID: PMC1127111; PMID: 15850481.
2. Achtman M, Zurth K, Morelli G, Torrea G, Guiyoule A, Carniel E: Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc Natl Acad Sci USA 1999, 96:14043-14048. doi:10.1073/pnas.96.24.14043; PMCID: PMC24187; PMID: 10570195.
3. van Baarlen P, van Belkum A, Summerbell RC, Crous PW, Thomma BP: Molecular mechanisms of pathogenicity: how do pathogenic microorganisms develop cross-kingdom host jumps? FEMS Microbiol Rev 2007, 31:239-277. doi:10.1111/j.1574-6976.2007.00065.x; PMID: 17326816.
4. Van Ert MN, Easterday WR, Huynh LY, Okinaka RT, Hugh-Jones ME, Ravel J, Zanecki SR, Pearson T, Simonson TS, U'Ren JM, Kachur SM, Leadem-Dougherty RR, Rhoton SD, Zinser G, Farlow J, Coker PR, Smith KL, Wang B, Kenefic LJ, Fraser-Liggett CM, Wagner DM, Keim P: Global Genetic Population Structure of Bacillus anthracis. PLoS ONE 2007, 2:e461. doi:10.1371/journal.pone.0000461; PMCID: PMC1866244; PMID: 17520020.
5. Zwick ME, McAfee F, Cutler DJ, Read TD, Ravel J, Bowman GR, Galloway DR, Mateczun A: Microarray-based resequencing of multiple Bacillus anthracis isolates. Genome Biol 2005, 6:R10. doi:10.1186/gb-2004-6-1-r10; PMCID: PMC549062; PMID: 15642093.
6. Ahmed N, Dobrindt U, Hacker J, Hasnain SE: Genomic fluidity and pathogenic bacteria: applications in diagnostics, epidemiology and intervention. Nat Rev Microbiol 2008, 6:387-394. doi:10.1038/nrmicro1889; PMID: 18392032.
7. Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet 2008, 24:133-141. PMID: 18262675.
8. Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol 2008, 26:1135-1145. doi:10.1038/nbt1486; PMID: 18846087.
9. Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, Prentice MB, Sebaihia M, James KD, Churcher C, Mungall KL, Baker S, Basham D, Bentley SD, Brooks K, Cerdeño-Tárraga AM, Chillingworth T, Cronin A, Davies RM, Davis P, Dougan G, Feltwell T, Hamlin N, Holroyd S, Jagels K, Karlyshev AV, Leather S, Moule S, Oyston PC, Quail M, Rutherford K, et al: Genome sequence of Yersinia pestis, the causative agent of plague. Nature 2001, 413:523-527. doi:10.1038/35097083; PMID: 11586360.
10. Deng W, Burland V, Plunkett G, Boutin A, Mayhew GF, Liss P, Perna NT, Rose DJ, Mau B, Zhou S, Schwartz DC, Fetherston JD, Lindler LE, Brubaker RR, Plano GV, Straley SC, McDonough KA, Nilles ML, Matson JS, Blattner FR, Perry RD: Genome sequence of Yersinia pestis KIM. J Bacteriol 2002, 184:4601-4611. doi:10.1128/JB.184.16.4601-4611.2002; PMCID: PMC135232; PMID: 12142430.
11. Song Y, Tong Z, Wang J, Wang L, Guo Z, Han Y, Zhang J, Pei D, Zhou D, Qin H, Pang X, Han Y, Zhai J, Li M, Cui B, Qi Z, Jin L, Dai R, Chen F, Li S, Ye C, Du Z, Lin W, Wang J, Yu J, Yang H, Wang J, Huang P, Yang R: Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans. DNA Res 2004, 11:179-197. doi:10.1093/dnares/11.3.179.
12. Chain PS, Hu P, Malfatti SA, Radnedge L, Larimer F, Vergez LM, Worsham P, Chu MC, Andersen GL: Complete genome sequence of Yersinia pestis strains Antiqua and Nepal516: evidence of gene reduction in an emerging pathogen. J Bacteriol 2006, 188:4453-4463. doi:10.1128/JB.00124-06; PMCID: PMC1482938; PMID: 16740952.
13. Chain PS, Carniel E, Larimer FW, Lamerdin J, Stoutland PO, Regala WM, Georgescu AM, Vergez LM, Land ML, Motin VL, Brubaker RR, Fowler J, Hinnebusch J, Marceau M, Medigue C, Simonet M, Chenal-Francisque V, Souza B, Dacheux D, Elliott JM, Derbise A, Hauser LJ, Garcia E: Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc Natl Acad Sci USA 2004, 101:13826-13831. doi:10.1073/pnas.0404012101; PMCID: PMC518763; PMID: 15358858.
14. Eppinger M, Rosovitz MJ, Fricke WF, Rasko DA, Kokorina G, Fayolle C, Lindler LE, Carniel E, Ravel J: The complete genome sequence of Yersinia pseudotuberculosis IP31758, the causative agent of Far East scarlet-like fever. PLoS Genet 2007, 3:e142. doi:10.1371/journal.pgen.0030142; PMCID: PMC1959361; PMID: 17784789.
15. Thomson NR, Howard S, Wren BW, Holden MT, Crossman L, Challis GL, Churcher C, Mungall K, Brooks K, Chillingworth T, Feltwell T, Abdellah Z, Hauser H, Jagels K, Maddison M, Moule S, Sanders M, Whitehead S, Quail MA, Dougan G, Parkhill J, Prentice MB: The Complete Genome Sequence and Comparative Genome Analysis of the High Pathogenicity Yersinia enterocolitica Strain 8081. PLoS Genet 2006, 2:e206. doi:10.1371/journal.pgen.0020206; PMCID: PMC1698947; PMID: 17173484.
16. Rollins SE, Rollins SM, Ryan ET: Yersinia pestis and the plague. Am J Clin Pathol 2003, 119(Suppl):S78-85. PMID: 12951845.
17. Wren BW: The yersiniae--a model genus to study the rapid evolution of bacterial pathogens. Nat Rev Microbiol 2003, 1:55-64. doi:10.1038/nrmicro730; PMID: 15040180.
18. Cornelis GR: The Yersinia Ysc-Yop virulence apparatus. Int J Med Microbiol 2002, 291:455-462. doi:10.1078/1438-4221-00153; PMID: 11890544.
19. Juris SJ, Shao F, Dixon JE: Yersinia effectors target mammalian signaling pathways. Cell Microbiol 2002, 4:201-211. doi:10.1046/j.1462-5822.2002.00182.x; PMID: 11952637.
20. Viboud GI, Bliska JB: Yersinia outer proteins: role in modulation of host cell signaling responses and pathogenesis. Annu Rev Microbiol 2005, 59:69-89. doi:10.1146/annurev.micro.59.030804.121320; PMID: 15847602.
21. Schubert S, Rakin A, Heesemann J: The Yersinia high-pathogenicity island (HPI): evolutionary and functional aspects. Int J Med Microbiol 2004, 294:83-94. doi:10.1016/j.ijmm.2004.06.026; PMID: 15493818.
22. Carniel E: The Yersinia high-pathogenicity island: an iron-uptake island. Microbes Infect 2001, 3:561-569. doi:10.1016/S1286-4579(01)01412-5; PMID: 11418330.
23. Darling AE, Miklos I, Ragan MA: Dynamics of genome rearrangement in bacterial populations. PLoS Genet 2008, 4:e1000128. doi:10.1371/journal.pgen.1000128; PMCID: PMC2483231; PMID: 18650965.
24. Anisimov AP, Lindler LE, Pier GB: Intraspecific diversity of Yersinia pestis. Clin Microbiol Rev 2004, 17:434-464. doi:10.1128/CMR.17.2.434-464.2004; PMCID: PMC387406; PMID: 15084509.
25. Wang X, Han Y, Li Y, Guo Z, Song Y, Tan Y, Du Z, Rakin A, Zhou D, Yang R: Yersinia genome diversity disclosed by Yersinia pestis genome-wide DNA microarray. Can J Microbiol 2007, 53:1211-1221. doi:10.1139/W07-087; PMID: 18026215.
26. Welch TJ, Fricke WF, McDermott PF, White DG, Rosso ML, Rasko DA, Mammel MK, Eppinger M, Rosovitz MJ, Wagner D, Rahalison L, Leclerc JE, Hinshaw JM, Lindler LE, Cebula TA, Carniel E, Ravel J: Multiple antimicrobial resistance in plague: an emerging public health risk. PLoS ONE 2007, 2:e309. doi:10.1371/journal.pone.0000309; PMCID: PMC1819562; PMID: 17375195.
27. Derbise A, Chenal-Francisque V, Pouillot F, Fayolle C, Prévost MC, Médigue C, Hinnebusch BJ, Carniel E: A horizontally acquired filamentous phage contributes to the pathogenicity of the plague bacillus. Mol Microbiol 2007, 63:1145-1157. doi:10.1111/j.1365-2958.2006.05570.x; PMID: 17238929.
28. Sulakvelidze A: Yersiniae other than Y. enterocolitica, Y. pseudotuberculosis, and Y. pestis: the ignored species. Microbes Infect 2000, 2:497-513. doi:10.1016/S1286-4579(00)00311-7; PMID: 10865195.
29. Bottone EJ, Bercovier H, Mollaret HH: Genus XLI. Yersinia Van Loghem 1944, 15AL. Bergey's Manual of Systematic Bacteriology 2005, 2:838-846.
30. Kotetishvili M, Kreger A, Wauters G, Morris JG Jr, Sulakvelidze A, Stine OC: Multilocus sequence typing for studying genetic relationships among Yersinia species. J Clin Microbiol 2005, 43:2674-2684. doi:10.1128/JCM.43.6.2674-2684.2005; PMCID: PMC1151872; PMID: 15956383.
31. Noble MA, Barteluk RL, Freeman HJ, Subramaniam R, Hudson JB: Clinical significance of virulence-related assay of Yersinia species. J Clin Microbiol 1987, 25:802-807. PMCID: PMC266092; PMID: 3584418.
32. Robins-Browne RM, Cianciosi S, Bordun AM, Wauters G: Pathogenicity of Yersinia kristensenii for mice. Infect Immun 1991, 59:162-167. PMCID: PMC257721; PMID: 1987029.
33. Fukushima H, Gomyoda M, Kaneko S: Mice and moles inhabiting mountainous areas of Shimane Peninsula as sources of infection with Yersinia pseudotuberculosis. J Clin Microbiol 1990, 28:2448-2455. PMCID: PMC268204; PMID: 2254420.
34. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437:376-380. PMCID: PMC1464427; PMID: 16056220.
35. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998, 8:186-194. PMID: 9521922.
36. Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum C, Jaffe DB: Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res 2008, 18:763-770. doi:10.1101/gr.070227.107; PMCID: PMC2336812; PMID: 18212088.
37. Phillippy AM, Schatz MC, Pop M: Genome assembly forensics: finding the elusive mis-assembly. Genome Biol 2008, 9:R55. doi:10.1186/gb-2008-9-3-r55; PMCID: PMC2397507; PMID: 18341692.
38. Samad AH, Cai WW, Hu X, Irvin B, Jing J, Reed J, Meng X, Huang J, Huff E, Porter B: Mapping the genome one molecule at a time--optical mapping. Nature 1995, 378:516-517. doi:10.1038/378516a0; PMID: 7477412.
39. Nagarajan N, Read TD, Pop M: Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 2008, 24:1229-1235. doi:10.1093/bioinformatics/btn102; PMCID: PMC2373919; PMID: 18356192.
40. Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M: ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res 2006, 34:D32-36. doi:10.1093/nar/gkj014; PMCID: PMC1347377; PMID: 16381877.
41. Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics 2005, 21(Suppl 1):i351-358. doi:10.1093/bioinformatics/bti1018; PMID: 15961478.
42. Hulton CS, Higgins CF, Sharp PM: ERIC sequences: a novel family of repetitive elements in the genomes of Escherichia coli, Salmonella typhimurium and other enterobacteria. Mol Microbiol 1991, 5:825-834. doi:10.1111/j.1365-2958.1991.tb00755.x; PMID: 1713281.
43. De Gregorio E, Silvestro G, Venditti R, Carlomagno MS, Di Nocera PP: Structural organization and functional properties of miniature DNA insertion sequences in yersiniae. J Bacteriol 2006, 188:7876-7884. doi:10.1128/JB.00942-06; PMCID: PMC1636318; PMID: 16963573.
44. Phillippy AM, Mason JA, Ayanbule K, Sommer DD, Taviani E, Huq A, Colwell RR, Knight IT, Salzberg SL: Comprehensive DNA signature discovery and validation. PLoS Comput Biol 2007, 3:e98. doi:10.1371/journal.pcbi.0030098; PMCID: PMC1868776; PMID: 17511514.
45. Langille MG, Brinkman FS: IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics 2009, 25:664-665. doi:10.1093/bioinformatics/btp030; PMCID: PMC2647836; PMID: 19151094.
46. Darling AC, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 2004, 14:1394-1403. doi:10.1101/gr.2289704; PMCID: PMC442156; PMID: 15231754.
47. MAUVE Aligner User Guide. http://asap.ahabs.wisc.edu/mauve-aligner/mauve-user-guide
48. Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 2002, 30:1575-1584. doi:10.1093/nar/30.7.1575; PMCID: PMC101833; PMID: 11917018.
49. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, et al: Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA 2005, 102:13950-13955. doi:10.1073/pnas.0506758102; PMCID: PMC1216834; PMID: 16172379.
50. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23:2947-2948. doi:10.1093/bioinformatics/btm404; PMID: 17846036.
51. Felsenstein J: PHYLIP: Phylogeny Inference Package, version 3.6. Seattle, WA, USA: University of Washington; 2001.
52. Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000, 28:33-36. doi:10.1093/nar/28.1.33; PMCID: PMC102395; PMID: 10592175.
53. Lepore LS, Roelvink PR, Granados RR: Enhancin, the granulosis virus protein that facilitates nucleopolyhedrovirus (NPV) infections, is a metalloprotease. J Invertebr Pathol 1996, 68:131-140. doi:10.1006/jipa.1996.0070; PMID: 8858909.
54. Bowen D, Rocheleau TA, Blackburn M, Andreev O, Golubeva E, Bhartia R, ffrench-Constant RH: Insecticidal toxins from the bacterium Photorhabdus luminescens. Science 1998, 280:2129-2132. doi:10.1126/science.280.5372.2129; PMID: 9641921.
55. Brussow H, Canchaya C, Hardt WD: Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol Rev 2004, 68:560-602. doi:10.1128/MMBR.68.3.560-602.2004; PMCID: PMC515249; PMID: 15353570.
56. Collyn F, Guy L, Marceau M, Simonet M, Roten CA: Describing ancient horizontal gene transfers at the nucleotide and gene levels by comparative pathogenicity island genometrics. Bioinformatics 2006, 22:1072-1079. doi:10.1093/bioinformatics/bti793; PMID: 16303795.
57. Collyn F, Billault A, Mullet C, Simonet M, Marceau M: YAPI, a new Yersinia pseudotuberculosis pathogenicity island. Infect Immun 2004, 72:4784-4790. doi:10.1128/IAI.72.8.4784-4790.2004; PMCID: PMC470613; PMID: 15271940.
58. Howard SL, Gaunt MW, Hinds J, Witney AA, Stabler R, Wren BW: Application of comparative phylogenomics to study the evolution of Yersinia enterocolitica and to identify genetic differences relating to pathogenicity. J Bacteriol 2006, 188:3645-3653. doi:10.1128/JB.188.10.3645-3653.2006; PMCID: PMC1482848; PMID: 16672618.
59. Haller JC, Carlson S, Pederson KJ, Pierson DE: A chromosomally encoded type III secretion pathway in Yersinia enterocolitica is important in virulence. Mol Microbiol 2000, 36:1436-1446. doi:10.1046/j.1365-2958.2000.01964.x; PMID: 10931293.
60. Hensel M, Shea JE, Baumler AJ, Gleeson C, Blattner F, Holden DW: Analysis of the boundaries of Salmonella pathogenicity island 2 and the corresponding chromosomal region of Escherichia coli K-12. J Bacteriol 1997, 179:1105-1111. PMCID: PMC178805; PMID: 9023191.
61. Shea JE, Hensel M, Gleeson C, Holden DW: Identification of a virulence locus encoding a second type III secretion system in Salmonella typhimurium. Proc Natl Acad Sci USA 1996, 93:2593-2597. doi:10.1073/pnas.93.6.2593; PMCID: PMC39842; PMID: 8637919.
62. Thomson NR, Howard S, Wren BW, Prentice MB: Comparative genome analyses of the pathogenic Yersiniae based on the genome sequence of Yersinia enterocolitica strain 8081. Adv Exp Med Biol 2007, 603:2-16. PMID: 17966400.
63. Prentice MB, Cuccui J, Thomson N, Parkhill J, Deery E, Warren MJ: Cobalamin synthesis in Yersinia enterocolitica 8081. Functional aspects of a putative metabolic island. Adv Exp Med Biol 2003, 529:43-46. PMID: 12756726.
64. Roth JR, Lawrence JG, Bobik TA: Cobalamin (coenzyme B12): synthesis and biological significance. Annu Rev Microbiol 1996, 50:137-181. doi:10.1146/annurev.micro.50.1.137; PMID: 8905078.
65. Kofoid E, Rappleye C, Stojiljkovic I, Roth J: The 17-gene ethanolamine (eut) operon of Salmonella typhimurium encodes five homologues of carboxysome shell proteins. J Bacteriol 1999, 181:5317-5329. PMCID: PMC94038; PMID: 10464203.
66. Maier RJ: Use of molecular hydrogen as an energy substrate by human pathogenic bacteria. Biochem Soc Trans 2005, 33:83-85. doi:10.1042/BST0330083; PMID: 15667272.
67. Ewing WH, Ross AJ, Brenner DJ, Fanning GR: Yersinia ruckeri sp. nov., the Redmouth (RM) Bacterium. Int J Syst Bacteriol 1978, 28:37-44.
68. Sekowska A, Dénervaud V, Ashida H, Michoud K, Haas D, Yokota A, Danchin A: Bacterial variations on the methionine salvage pathway. BMC Microbiol 2004, 4:9. doi:10.1186/1471-2180-4-9; PMCID: PMC395828; PMID: 15102328.
69. Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE, Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J, Bentley SD, Post JC, Ehrlich GD, Hu FZ: Comparative Genomic Analyses of Seventeen Streptococcus pneumoniae Strains: Insights into the Pneumococcal Supragenome. J Bacteriol 2007, 189:8186-8195. doi:10.1128/JB.00690-07; PMCID: PMC2168654; PMID: 17675389.
bibl id="B70"titlepCharacterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains.ptitleaugausnmHoggsnmfnmJSfnmauausnmHusnmfnmFZfnmauausnmJantosnmfnmBfnmauausnmBoissysnmfnmRfnmauausnmHayessnmfnmJfnmauausnmKeefesnmfnmRfnmauausnmPostsnmfnmJCfnmauausnmEhrlichsnmfnmGDfnmauaugsourceGenome Biolsourcepubdate2007pubdatevolume8volumefpageR103fpagexrefbibpubidlistpubid 
idtype="doi"10.1186gb-2007-8-6-r103pubidpubid idtype="pmcid"2394751pubidpubid idtype="pmpid" link="fulltext"17550610pubidpubidlistxrefbibbiblbibl id="B71"titlepDynamics of Pseudomonas aeruginosa genome evolution.ptitleaugausnmMatheesnmfnmKfnmauausnmNarasimhansnmfnmGfnmauausnmValdessnmfnmCfnmauausnmQiusnmfnmXfnmauausnmMatewishsnmfnmJMfnmauausnmKoehrsensnmfnmMfnmauausnmRokassnmfnmAfnmauausnmYandavasnmfnmCNfnmauausnmEngelssnmfnmRfnmauausnmZengsnmfnmEfnmauausnmOlavariettasnmfnmRfnmauausnmDoudsnmfnmMfnmauausnmSmithsnmfnmRSfnmauausnmMontgomerysnmfnmPfnmauausnmWhitesnmfnmJRfnmauausnmGodfreysnmfnmPAfnmauausnmKodirasnmfnmCfnmauausnmBirrensnmfnmBfnmauausnmGalagansnmfnmJEfnmauausnmLorysnmfnmSfnmauaugsourceProc Natl Acad Sci USAsourcepubdate2008pubdatevolume105volumefpage3100fpagelpage3105lpagexrefbibpubidlistpubid idtype="doi"10.1073pnas.0711982105pubidpubid idtype="pmcid"2268591pubidpubid idtype="pmpid" link="fulltext"18287045pubidpubidlistxrefbibbiblbibl id="B72"titlepHigh-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi.ptitleaugausnmHoltsnmfnmKfnmauausnmParkhillsnmfnmJfnmauausnmMazzonisnmfnmCfnmauausnmRoumagnacsnmfnmPfnmauausnmWeillsnmfnmFfnmauausnmGoodheadsnmfnmIfnmauausnmRancesnmfnmRfnmauausnmBakersnmfnmSfnmauausnmMaskellsnmfnmDfnmauausnmWainsnmfnmJfnmauausnmDoleceksnmfnmCfnmauausnmAchtmansnmfnmMfnmauausnmDougansnmfnmGfnmauaugsourceNat Genetsourcepubdate2008pubdatevolume40volumefpage987fpagelpage93lpagexrefbibpubidlistpubid idtype="doi"10.1038ng.195pubidpubid idtype="pmcid"2652037pubidpubid idtype="pmpid" link="fulltext"18660809pubidpubidlistxrefbibbiblbibl id="B73"titlepPopulation Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation.ptitleaugausnmSimmonssnmfnmSfnmauausnmDibartolosnmfnmGfnmauausnmDenefsnmfnmVfnmauausnmGoltsmansnmfnmDfnmauausnmThelensnmfnmMfnmauausnmBanfieldsnmfnmJfnmauausnmEisensnmfnmJfnmauaugsourcePlos 
Biolsourcepubdate2008pubdatevolume6volumefpagee177fpagexrefbibpubidlistpubid idtype="doi"10.1371journal.pbio.0060177pubidpubid idtype="pmcid"2475542pubidpubid idtype="pmpid" link="fulltext"18651792pubidpubidlistxrefbibbiblbibl id="B74"titlepThe pan-genome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates.ptitleaugausnmRaskosnmfnmDfnmauausnmRosovitzsnmfnmMfnmauausnmMyerssnmfnmGfnmauausnmMongodinsnmfnmEfnmauausnmFrickesnmfnmWfnmauausnmGajersnmfnmPfnmauausnmCrabtreesnmfnmJfnmauausnmSperandiosnmfnmVfnmauausnmRavelsnmfnmJfnmauaugsourceJournal of Bacteriologysourcepubdate2008pubdatevolume190volumefpage6881fpagelpage93lpagexrefbibpubidlistpubid idtype="doi"10.1128JB.00619-08pubidpubid idtype="pmcid"2566221pubidpubid idtype="pmpid" link="fulltext"18676672pubidpubidlistxrefbibbiblbibl id="B75"titlepThe genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria.ptitleaugausnmReadsnmfnmTDfnmauausnmPetersonsnmfnmSNfnmauausnmTourassesnmfnmNfnmauausnmBailliesnmfnmLWfnmauausnmPaulsensnmfnmITfnmauausnmNelsonsnmfnmKEfnmauausnmTettelinsnmfnmHfnmauausnmFoutssnmfnmDEfnmauausnmEisensnmfnmJAfnmauausnmGillsnmfnmSRfnmauausnmHoltzapplesnmfnmEKfnmauausnmOkstadsnmfnmOAfnmauausnmHelgasonsnmfnmEfnmauausnmRilstonesnmfnmJfnmauausnmWusnmfnmMfnmauausnmKolonaysnmfnmJFfnmauausnmBeanansnmfnmMJfnmauausnmDodsonsnmfnmRJfnmauausnmBrinkacsnmfnmLMfnmauausnmGwinnsnmfnmMfnmauausnmDeBoysnmfnmRTfnmauausnmMadpusnmfnmRfnmauausnmDaughertysnmfnmSCfnmauausnmDurkinsnmfnmASfnmauausnmHaftsnmfnmDHfnmauausnmNelsonsnmfnmWCfnmauausnmPetersonsnmfnmJDfnmauausnmPopsnmfnmMfnmauausnmKhourisnmfnmHMfnmauausnmRadunesnmfnmDfnmauetalaugsourceNaturesourcepubdate2003pubdatevolume423volumefpage81fpagelpage86lpagexrefbibpubidlistpubid idtype="doi"10.1038nature01586pubidpubid idtype="pmpid" link="fulltext"12721629pubidpubidlistxrefbibbiblbibl id="B76"titlepComplete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V 
Streptococcus agalactiae.ptitleaugausnmTettelinsnmfnmHfnmauausnmMasignanisnmfnmVfnmauausnmCieslewiczsnmfnmMJfnmauausnmEisensnmfnmJAfnmauausnmPetersonsnmfnmSfnmauausnmWesselssnmfnmMRfnmauausnmPaulsensnmfnmITfnmauausnmNelsonsnmfnmKEfnmauausnmMargaritsnmfnmIfnmauausnmReadsnmfnmTDfnmauausnmMadoffsnmfnmLCfnmauausnmWolfsnmfnmAMfnmauausnmBeanansnmfnmMJfnmauausnmBrinkacsnmfnmLMfnmauausnmDaughertysnmfnmSCfnmauausnmDeBoysnmfnmRTfnmauausnmDurkinsnmfnmASfnmauausnmKolonaysnmfnmJFfnmauausnmMadupusnmfnmRfnmauausnmLewissnmfnmMRfnmauausnmRadunesnmfnmDfnmauausnmFedorovasnmfnmNBfnmauausnmScanlansnmfnmDfnmauausnmKhourisnmfnmHfnmauausnmMulligansnmfnmSfnmauausnmCartysnmfnmHAfnmauausnmClinesnmfnmRTfnmauausnmVan AkensnmfnmSEfnmauausnmGillsnmfnmJfnmauausnmScarsellisnmfnmMfnmauetalaugsourceProc Natl Acad Sci USAsourcepubdate2002pubdatevolume99volumefpage12391fpagelpage12396lpagexrefbibpubidlistpubid idtype="doi"10.1073pnas.182380799pubidpubid idtype="pmcid"129455pubidpubid idtype="pmpid" link="fulltext"12200547pubidpubidlistxrefbibbiblbibl id="B77"titlepYersinia aleksiciae sp. nov.ptitleaugausnmSpraguesnmfnmLDfnmauausnmNeubauersnmfnmHfnmauaugsourceInt J Syst Evol Microbiolsourcepubdate2005pubdatevolume55volumefpage831fpagelpage835lpagexrefbibpubidlistpubid idtype="doi"10.1099ijs.0.63220-0pubidpubid idtype="pmpid" link="fulltext"15774670pubidpubidlistxrefbibbiblbibl id="B78"titlepYersinia similis sp. nov.ptitleaugausnmSpraguesnmfnmLDfnmauausnmScholzsnmfnmHCfnmauausnmAmannsnmfnmSfnmauausnmBussesnmfnmHJfnmauausnmNeubauersnmfnmHfnmauaugsourceInt J Syst Evol Microbiolsourcepubdate2008pubdatevolume58volumefpage952fpagelpage958lpagexrefbibpubidlistpubid idtype="doi"10.1099ijs.0.65417-0pubidpubid idtype="pmpid" link="fulltext"18398201pubidpubidlistxrefbibbiblbibl id="B79"titlepYersinia massiliensis sp. 
nov., isolated from fresh water.ptitleaugausnmMerhejsnmfnmVfnmauausnmAdekambisnmfnmTfnmauausnmPagniersnmfnmIfnmauausnmRaoultsnmfnmDfnmauausnmDrancourtsnmfnmMfnmauaugsourceInt J Syst Evol Microbiolsourcepubdate2008pubdatevolume58volumefpage779fpagelpage784lpagexrefbibpubidlistpubid idtype="doi"10.1099ijs.0.65219-0pubidpubid idtype="pmpid" link="fulltext"18398169pubidpubidlistxrefbibbiblbibl id="B80"titlepA bile salt hydrolase of Brucella abortus contributes to the establishment of a successful infection through the oral route in mice.ptitleaugausnmDelpinosnmfnmMVfnmauausnmMarchesinisnmfnmMIfnmauausnmEsteinsnmfnmSMfnmauausnmComercisnmfnmDJfnmauausnmCassatarosnmfnmJfnmauausnmFossatisnmfnmCAfnmauausnmBaldisnmfnmPCfnmauaugsourceInfect Immunsourcepubdate2007pubdatevolume75volumefpage299fpagelpage305lpagexrefbibpubidlistpubid idtype="doi"10.1128IAI.00952-06pubidpubid idtype="pmcid"1828384pubidpubid idtype="pmpid" link="fulltext"17088355pubidpubidlistxrefbibbiblbibl id="B81"titlepThe TibA adhesininvasin from enterotoxigenic Escherichia coli is self recognizing and induces bacterial aggregation and biofilm formation.ptitleaugausnmSherlocksnmfnmOfnmauausnmVejborgsnmfnmRMfnmauausnmKlemmsnmfnmPfnmauaugsourceInfect Immunsourcepubdate2005pubdatevolume73volumefpage1954fpagelpage1963lpagexrefbibpubidlistpubid idtype="doi"10.1128IAI.73.4.1954-1963.2005pubidpubid idtype="pmcid"1087433pubidpubid idtype="pmpid" link="fulltext"15784535pubidpubidlistxrefbibbiblbibl id="B82"titlepARDB--Antibiotic Resistance Genes Database.ptitleaugausnmLiusnmfnmBfnmauausnmPopsnmfnmMfnmauaugsourceNucleic Acids Ressourcepubdate2009pubdatevolume37volumefpageD443fpagelpage447lpagexrefbibpubidlistpubid idtype="doi"10.1093nargkn656pubidpubid idtype="pmcid"2686595pubidpubid idtype="pmpid" link="fulltext"18832362pubidpubidlistxrefbibbiblbibl id="B83"titlepAntibiotic Resistance Genes Databaseptitleurlhttp:ardb.cbcb.umd.eduurlbiblbibl id="B84"titlepACLAME: a CLAssification of Mobile genetic 
Elements.ptitleaugausnmLeplaesnmfnmRfnmauausnmHebrantsnmfnmAfnmauausnmWodaksnmfnmSJfnmauausnmToussaintsnmfnmAfnmauaugsourceNucleic Acids Ressourcepubdate2004pubdatevolume32volumefpageD45fpagelpage49lpagexrefbibpubidlistpubid idtype="doi"10.1093nargkh084pubidpubid idtype="pmcid"308818pubidpubid idtype="pmpid" link="fulltext"14681355pubidpubidlistxrefbibbiblbibl id="B85"titlepFrameshift detection in prokaryotic genomic sequences.ptitleaugausnmKislyuksnmfnmAfnmauausnmLomsadzesnmfnmAfnmauausnmLapidussnmfnmALfnmauausnmBorodovskysnmfnmMfnmauaugsourceInt J Bioinform Res Applsourcepubdate2009pubdatevolume5volumefpage458fpagelpage477lpagexrefbibpubidlistpubid idtype="doi"10.1504IJBRA.2009.027519pubidpubid idtype="pmpid" link="fulltext"19640832pubidpubidlistxrefbibbiblbibl id="B86"titlepComparative genome assembly.ptitleaugausnmPopsnmfnmMfnmauausnmPhillippysnmfnmAfnmauausnmDelchersnmfnmALfnmauausnmSalzbergsnmfnmSLfnmauaugsourceBrief Bioinformsourcepubdate2004pubdatevolume5volumefpage237fpagelpage248lpagexrefbibpubidlistpubid idtype="doi"10.1093bib5.3.237pubidpubid idtype="pmpid" link="fulltext"15383210pubidpubidlistxrefbibbiblbibl id="B87"titlepCd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.ptitleaugausnmLisnmfnmWfnmauausnmGodziksnmfnmAfnmauaugsourceBioinformaticssourcepubdate2006pubdatevolume22volumefpage1658fpagelpage1659lpagexrefbibpubidlistpubid idtype="doi"10.1093bioinformaticsbtl158pubidpubid idtype="pmpid" link="fulltext"16731699pubidpubidlistxrefbibbiblbibl id="B88"titlepVersatile and open software for comparing large genomes.ptitleaugausnmKurtzsnmfnmSfnmauausnmPhillippysnmfnmAfnmauausnmDelchersnmfnmALfnmauausnmSmootsnmfnmMfnmauausnmShumwaysnmfnmMfnmauausnmAntonescusnmfnmCfnmauausnmSalzbergsnmfnmSLfnmauaugsourceGenome Biolsourcepubdate2004pubdatevolume5volumefpageR12fpagexrefbibpubidlistpubid idtype="doi"10.1186gb-2004-5-2-r12pubidpubid idtype="pmcid"395750pubidpubid idtype="pmpid" 
link="fulltext"14759262pubidpubidlistxrefbibbiblbibl id="B89"titlepDIYA: A bacterial annotation pipeline for any genomics lab.ptitleaugausnmStewartsnmfnmACfnmauausnmOsbornesnmfnmBfnmauausnmReadsnmfnmTDfnmauaugsourceBioinformaticssourcepubdate2009pubdatevolume25volumefpage962fpagelpage3lpagexrefbibpubidlistpubid idtype="doi"10.1093bioinformaticsbtp097pubidpubid idtype="pmcid"2660880pubidpubid idtype="pmpid" link="fulltext"19254921pubidpubidlistxrefbibbiblbibl id="B90"titlepMicrobial gene identification using interpolated Markov models.ptitleaugausnmSalzbergsnmfnmSLfnmauausnmDelchersnmfnmALfnmauausnmKasifsnmfnmSfnmauausnmWhitesnmfnmOfnmauaugsourceNucleic Acids Ressourcepubdate1998pubdatevolume26volumefpage544fpagelpage548lpagexrefbibpubidlistpubid idtype="doi"10.1093nar26.2.544pubidpubid idtype="pmcid"147303pubidpubid idtype="pmpid" link="fulltext"9421513pubidpubidlistxrefbibbiblbibl id="B91"titleptRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.ptitleaugausnmLowesnmfnmTMfnmauausnmEddysnmfnmSRfnmauaugsourceNucleic Acids Ressourcepubdate1997pubdatevolume25volumefpage955fpagelpage964lpagexrefbibpubidlistpubid idtype="doi"10.1093nar25.5.955pubidpubid idtype="pmcid"146525pubidpubid idtype="pmpid" link="fulltext"9023104pubidpubidlistxrefbibbiblbibl id="B92"titlepRNAmmer: consistent and rapid annotation of ribosomal RNA genes.ptitleaugausnmLagesensnmfnmKfnmauausnmHallinsnmfnmPfnmauausnmRødlandsnmfnmEAfnmauausnmStaerfeldtsnmfnmHHfnmauausnmRognessnmfnmTfnmauausnmUsserysnmfnmDWfnmauaugsourceNucleic Acids Ressourcepubdate2007pubdatevolume35volumefpage3100fpagelpage3108lpagexrefbibpubidlistpubid idtype="doi"10.1093nargkm160pubidpubid idtype="pmcid"1888812pubidpubid idtype="pmpid" link="fulltext"17452365pubidpubidlistxrefbibbiblbibl id="B93"titlepUniRef: comprehensive and non-redundant UniProt reference 
clusters.ptitleaugausnmSuzeksnmfnmBEfnmauausnmHuangsnmfnmHfnmauausnmMcGarveysnmfnmPfnmauausnmMazumdersnmfnmRfnmauausnmWusnmfnmCHfnmauaugsourceBioinformaticssourcepubdate2007pubdatevolume23volumefpage1282fpagelpage1288lpagexrefbibpubidlistpubid idtype="doi"10.1093bioinformaticsbtm098pubidpubid idtype="pmpid" link="fulltext"17379688pubidpubidlistxrefbibbiblbibl id="B94"titlepBasic local alignment search tool.ptitleaugausnmAltschulsnmfnmSFfnmauausnmGishsnmfnmWfnmauausnmMillersnmfnmWfnmauausnmMyerssnmfnmEWfnmauausnmLipmansnmfnmDJfnmauaugsourceJ Mol Biolsourcepubdate1990pubdatevolume215volumefpage403fpagelpage410lpagexrefbibpubid idtype="pmpid"2231712pubidxrefbibbiblbibl id="B95"titlepConserved Domain Database(CDD)ptitleurlhttp:www.ncbi.nlm.nih.govsitesentrezdb=cddurlbiblbibl id="B96"titlepGapped BLAST and PSI-BLAST: a new generation of protein database search programs.ptitleaugausnmAltschulsnmfnmSFfnmauausnmMaddensnmfnmTLfnmauausnmSchaffersnmfnmAAfnmauausnmZhangsnmfnmJfnmauausnmZhangsnmfnmZfnmauausnmMillersnmfnmWfnmauausnmLipmansnmfnmDJfnmauaugsourceNucleic Acids Ressourcepubdate1997pubdatevolume25volumefpage3389fpagelpage3402lpagexrefbibpubidlistpubid idtype="doi"10.1093nar25.17.3389pubidpubid idtype="pmcid"146917pubidpubid idtype="pmpid" link="fulltext"9254694pubidpubidlistxrefbibbiblbibl id="B97"titlepImprovement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments.ptitleaugausnmTalaverasnmfnmGfnmauausnmCastresanasnmfnmJfnmauaugsourceSyst Biolsourcepubdate2007pubdatevolume56volumefpage564fpagelpage577lpagexrefbibpubidlistpubid idtype="doi"10.108010635150701472164pubidpubid idtype="pmpid" link="fulltext"17654362pubidpubidlistxrefbibbiblbibl id="B98"titlepGenome sequence of Chlamydophila caviae (Chlamydia psittaci GPIC): examining the role of niche-specific genes in the evolution of the 
Chlamydiaceae.ptitleaugausnmReadsnmfnmTDfnmauausnmMyerssnmfnmGSfnmauausnmBrunhamsnmfnmRCfnmauausnmNelsonsnmfnmWCfnmauausnmPaulsensnmfnmITfnmauausnmHeidelbergsnmfnmJfnmauausnmHoltzapplesnmfnmEfnmauausnmKhourisnmfnmHfnmauausnmFederovasnmfnmNBfnmauausnmCartysnmfnmHAfnmauausnmUmayamsnmfnmLAfnmauausnmHaftsnmfnmDHfnmauausnmPetersonsnmfnmJfnmauausnmBeanansnmfnmMJfnmauausnmWhitesnmfnmOfnmauausnmSalzbergsnmfnmSLfnmauausnmHsiasnmfnmRCfnmauausnmMcClartysnmfnmGfnmauausnmRanksnmfnmRGfnmauausnmBavoilsnmfnmPMfnmauausnmFrasersnmfnmCMfnmauaugsourceNucleic Acids Ressourcepubdate2003pubdatevolume31volumefpage2134fpagelpage2147lpagexrefbibpubidlistpubid idtype="doi"10.1093nargkg321pubidpubid idtype="pmcid"153749pubidpubid idtype="pmpid" link="fulltext"12682364pubidpubidlistxrefbibbiblbibl id="B99"titlepVisualization of comparative genomic analyses by BLAST score ratio.ptitleaugausnmRaskosnmfnmDAfnmauausnmMyerssnmfnmGSfnmauausnmRavelsnmfnmJfnmauaugsourceBMC Bioinformaticssourcepubdate2005pubdatevolume6volumefpage2fpagexrefbibpubidlistpubid idtype="doi"10.11861471-2105-6-2pubidpubid idtype="pmcid"545078pubidpubid idtype="pmpid" link="fulltext"15634352pubidpubidlistxrefbibbiblbibl id="B100"titlepYersinia aldovae (Formerly Yersinia enterocolitica-Like Group X2): a New Species of Enterobacteriaceae Isolated from Aquatic Ecosystems.ptitleaugausnmBercoviersnmfnmHfnmauausnmSteigerwaltsnmfnmAGfnmauausnmGuiyoulesnmfnmAfnmauausnmHuntley-CartersnmfnmGfnmauausnmBrennersnmfnmDJfnmauaugsourceInt J Syst Bacteriolsourcepubdate1984pubdatevolume34volumefpage166fpagelpage172lpagebiblbibl id="B101"titlepYersinia mollaretii sp. nov. and Yersinia bercovieri sp. 
nov., Formerly Called Yersinia enterocolitica Biogroups 3A and 3B.ptitleaugausnmWauterssnmfnmGfnmauausnmJanssenssnmfnmMfnmauausnmSteigerwaltsnmfnmAGfnmauausnmBrennersnmfnmDJfnmauaugsourceInt J Syst Bacteriolsourcepubdate1988pubdatevolume38volumefpage424fpagebiblbibl id="B102"titlepYersinia frederiksenii: A new species of enterobacteriaceae composed of rhamnose-positive strains (formerly called atypical yersinia enterocolitica or Yersinia enterocolitica -Like).ptitleaugausnmUrsingsnmfnmJfnmauausnmBrennertsnmfnmDJfnmauausnmBercoviersnmfnmHfnmauausnmFanningsnmfnmGRfnmauausnmSteigerwaltsnmfnmAGfnmauausnmBraultsnmfnmJfnmauausnmMollaretsnmfnmHHfnmauaugsourceCurrent Microbiologysourcepubdate1980pubdatevolume4volumefpage213fpagelpage217lpagexrefbibpubid idtype="doi"10.1007BF02605859pubidxrefbibbiblbibl id="B103"titlepYersinia intermedia: A new species of enterobacteriaceae composed of rhamnose-positive, melibiose-positive, raffinose-positive strains (formerly called Yersinia enterocolitica or Yersinia enterocolitica -like).ptitleaugausnmBrennersnmfnmDJfnmauausnmBercoviersnmfnmHHfnmauausnmUrsingsnmfnmJfnmauausnmAlonsosnmfnmJMfnmauausnmSteigerwaltsnmfnmAGfnmauausnmFanningsnmfnmGRfnmauausnmCartersnmfnmGPfnmauausnmMollaretsnmfnmHHfnmauaugsourceCurrent Microbiologysourcepubdate1980pubdatevolume4volumefpage207fpagelpage212lpagexrefbibpubid idtype="doi"10.1007BF02605858pubidxrefbibbiblbibl id="B104"titlepYersinia kristensenii: A new species of enterobacteriaceae composed of sucrose-negative strains (formerly called atypical Yersinia enterocolitica or Yersinia enterocolitica -Like).ptitleaugausnmBercoviersnmfnmHfnmauausnmUrsingsnmfnmJfnmauausnmBrennersnmfnmDJfnmauausnmSteigerwaltsnmfnmAGfnmauausnmFanningsnmfnmGRfnmauausnmCartersnmfnmGPfnmauausnmMollaretsnmfnmHHfnmauaugsourceCurrent Microbiologysourcepubdate1980pubdatevolume4volumefpage219fpagelpage224lpagexrefbibpubid idtype="doi"10.1007BF02605860pubidxrefbibbiblbibl id="B105"titlepYersinia rohdei sp. nov. 
isolated from human and dog feces and surface water.ptitleaugausnmAleksicsnmfnmSfnmauausnmSteigerwaltsnmfnmAGfnmauausnmBockemuehlsnmfnmJfnmauaugsourceInt J Syst Bacteriolsourcepubdate1987pubdatebiblbibl id="B106"titlepDNAPlotter: circular and linear interactive genome visualization.ptitleaugausnmCarversnmfnmTfnmauausnmThomsonsnmfnmNfnmauausnmBleasbysnmfnmAfnmauausnmBerrimansnmfnmMfnmauausnmParkhillsnmfnmJfnmauaugsourceBioinformaticssourcepubdate2009pubdatevolume25volumefpage119fpagelpage120lpagexrefbibpubidlistpubid idtype="doi"10.1093bioinformaticsbtn578pubidpubid idtype="pmcid"2612626pubidpubid idtype="pmpid" link="fulltext"18990721pubidpubidlistxrefbibbiblrefgrp


49772 49881 NS3669_Yersinia_kristensenii_ATCC33638
111737 111857 NS3669_Yersinia_kristensenii_ATCC33638
348903 349027 NS3669_Yersinia_kristensenii_ATCC33638
750113 750226 NS2456_Yersinia_fredriksenii_ATCC33641
785707 785820 NS3669_Yersinia_kristensenii_ATCC33638
818180 818290 NS2255_Yersinia_aldovae_ATCC35236
819932 820044 NS2255_Yersinia_aldovae_ATCC35236
1015727 1015828 NS2255_Yersinia_aldovae_ATCC35236
1062196 1062298 NS2255_Yersinia_aldovae_ATCC35236
1313974 1314093 NS2255_Yersinia_aldovae_ATCC35236
1434719 1434830 NS3669_Yersinia_kristensenii_ATCC33638
1545342 1545445 NS2455_Yersinia_intermedia_ATCC29909
1622305 1622414 NS2255_Yersinia_aldovae_ATCC35236
1808550 1808649 NS3669_Yersinia_kristensenii_ATCC33638
2031675 2031805 NS2457_Yersinia_rohdei_ATCC43380
2427351 2427503 NS3669_Yersinia_kristensenii_ATCC33638
2587144 2587247 NS2456_Yersinia_fredriksenii_ATCC33641 NS2459_Yersinia_bercovieri_ATCC43970
2948790 2948921 NS2255_Yersinia_aldovae_ATCC35236
2953365 2953470 NS2255_Yersinia_aldovae_ATCC35236
3061320 3061430 NS2255_Yersinia_aldovae_ATCC35236 NS2456_Yersinia_fredriksenii_ATCC33641 NS2458_Yersinia_mollaretii_ATCC43969 NS2459_Yersinia_bercovieri_ATCC43970
3114314 3114415 NS3669_Yersinia_kristensenii_ATCC33638
3178485 3178602 NS2456_Yersinia_fredriksenii_ATCC33641
3736081 3736198 NS2255_Yersinia_aldovae_ATCC35236
3739389 3739493 NS3669_Yersinia_kristensenii_ATCC33638
3755485 3755593 NS2255_Yersinia_aldovae_ATCC35236
3828223 3828329 NS2455_Yersinia_intermedia_ATCC29909
3831448 3831551 NS2455_Yersinia_intermedia_ATCC29909
4048243 4048358 NS3669_Yersinia_kristensenii_ATCC33638
4190436 4190536 NS3669_Yersinia_kristensenii_ATCC33638
4251087 4251191 NS2255_Yersinia_aldovae_ATCC35236
4561058 4561196 NS3669_Yersinia_kristensenii_ATCC33638


>kristensenii0 count=179
TATACCCAAAATAATTGGAGTTGCAGGAAGGCAGCAAGCGAACTAATCCCGATGAGCTTACACAAGTAAGTGATTCGGGT
GAGTGAGCGCAGCTAACACCCCTGCAGCTTCAAGGACGAAGGGTATA
>kristensenii1 count=84
TATACCCTAAATAATTCGAGTTGCAGGAAGGCGGCAAGCGAGAGACAAATCGGTCGGGAACCGATTTGAACAGCATTTAT
GCTAGCCCGCAGGGTGAGCCTCAAGGATGAGGCTCATTAATCCCGATGAGCTTACTCAAGTAAGTGATTCGGGTGAGTGA
GAGCAGCCAACACCCATGCAGCTTGAAGTATGACGGGTATA
>kristensenii2 count=50
TATAGCTCAGTGAGTGAAGGGTTTACCGCACCTAGGGGCACTCCGGCGGCTTACGCCGCTACGACCCCAACGGCACGTTT
CCCCTTCATTTGACTTTGTCAGCAGTCTGA
>kristensenii4 count=10
TGCAAATAATCAGACAATCTGTGTGGACACTGCGCAATGCGTATCGTTAGGTAAGGAGGTGATCCAACCGCAGGTTCCCC
TACGGTTACCTTGTTACGACTTCACCCCAGTCATGAATCACAAAGTGGTAAGCGCCCTCCCGAAGGTTAAGCTACCTACT
TCTTTTGCAACCCACTCCCATGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGTAGCATTCTGATCTA
CGATTACTAGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGACAGACTTTATGTGGTCCGCTT
GCTCTCGCGAGTTCGCTTCACTTTGTATCTGCCATTGTAGCACGTGTGTAGCCCTACTCGTAAGGGCCATGATGACTTGA
CGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCTCCCTTGAGTTCCCACCATTACGTGCTGGCAACAAAGGATAA
GGGTTGCGCTCGTTGCGGGACTTAACCCAACATTTCACAACACGAGCTGACGACAGCCATGCAGCACCTGTCTCACAGTT
CCCGAAGGCACCTAAGCATCTCTGCTAAGTTCTGTGGATGTCAAGAGTAGGTAAGGTTCTTCGCGTTGCATCGAATTAAA
CCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCATTTGAGTTTTAACCTTGCGGCCGTACTCCCCAGGCGGTCGA
CTTAACGCGTTAGCTCCGGAAGCCACGCCTCAAGGGCACAACCTCCAAGTCGACATCGTTTACAGCGTGGACTACCAGGG
TATCTAATCCTGTTTGCTCCCCACGCTTTCGCACCTGAGCGTCAGTCTTTGTCCAGGGGGCCGCCTTCGCCACCGGTATT
CCTCCAGATCTCTACGCATTTCACCGCTACACCTGGAATTCTACCCCCCTCTACAAGACTCTAGCTTGCCAGTTTCAAAT
GCAGTTCCCACGTTAAGCGCGGGGATTTCACATCTGACTTAACAAACCGCCTGCGTGCGCTTTACGCCCAGTAATTCCGA
TTAACGCTTGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGCGAGTAACGTCAATGCT
CAGTGCTATTAACACTGAACCCTTCCTCCTCGCTGAAAGTGCTTTACAACCCGAAGGCCTTCTTCACACACGCGGCATGG
CTGCATCAGGCTTGCGCCCATTGTGCAATATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGACCGTGTCTCAGTTCCAGT
GTGGCTGGTCATCCTCTCAGACCAGCTAGGGATCGTCGCCTAGGTGAGCCATTACCCCACCTACTAGCTAATCCCATCTG
GGCACATCCGATGGCGTGAGGCCCGAAGGTCCCCCACTTTGCTCTTGCGAGGTCATGCGGTATTAGCTACCGTTTCCAGT
AGTTATCCCCCTCCATCAGGCAGTTTCCCAGACATTACTCACCCGTCCGCCGCTCGCCGGCAAAGTAGTAAACTACTTCC
CGCTGCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATT
TGTTTGATTTGCTATGAGTTAACATAGCGATGCTCAAAGATTACTTTCTGCAAATAATCAGACAATCTGTGTGGACACTG
CGCAATGCGTATCGTTAGGTAAGGAGGTGATCCAACCGCAGGTTCCCCTACGGTTACCTTGTTACGACTTCACCCCAGTC
ATGAATCACAAAGTGGTAAGCGCCCTCCCGAAGGTTAAGCTACCTACTTCTTTTGCAACCCACTCCCATGGTGTGACGGG
CGGTGTGTACAAGGCCCGGGAACGTATTCACCGTAGCATTCTGATCTACGATTACTAGCGATTCCGACTTCATGGAGTCG
AGTTGCAGACTCCAATCCGGACTACGACAGACTTTATGTGGTCCGCTTGCTCTCGCGAGTTCGCTTCACTTTGTATCTGC
CATTGTAGCACGTGTGTAGCCCTACTCGTAAGGGCCATGATGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCG
GCAGTCTCCCTTGAGTTCCCACCATTACGTGCTGGCAACAAAGGATAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACA
TTTCACAACACGAGCTGACGACAGCCATGCAGCACCTGTCTCACAGTTCCCGAAGGCACCTAAGCATCTCTGCTAAGTTC
TGTGGATGTCAAGAGTAGGTAAGGTTCTTCGCGTTGCATCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGT
CAATTCATTTGAGTTTTAACCTTGCGGCCGTACTCCCCAGGCGGTCGACTTAACGCGTTAGCTCCGGAAGCCACGCCTCA
AGGGCACAACCTCCAAGTCGACATCGTTTACAGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGC
ACCTGAGCGTCAGTCTTTGTCCAGGGGGCCGCCTTCGCCACCGGTATTCCTCCAGATCTCTACGCATTTCACCGCTACAC
CTGGAATTCTACCCCCCTCTACAAGACTCTAGCTTGCCAGTTTCAAATGCAGTTCCCACGTTAAGCGCGGGGATTTCACA
TCTGACTTAACAAACCGCCTGCGTGCGCTTTACGCCCAGTAATTCCGATTAACGCTTGCACCCTCCGTATTACCGCGGCT
GCTGGCACGGAGTTAGCCGGTGCTTCTTCTGCGAGTAACGTCAATGCTCAGTGCTATTAACACTGAACCCTTCCTCCTCG
CTGAAAGTGCTTTACAACCCGAAGGCCTTCTTCACACACGCGGCATGGCTGCATCAGGCTTGCGCCCATTGTGCAATATT
CCCCACTGCTGCCTCCCGTAGGAGTCTGGACCGTGTCTCAGTTCCAGTGTGGCTGGTCATCCTCTCAGACCAGCTAGGGA
TCGTCGCCTAGGTGAGCCATTACCCCACCTACTAGCTAATCCCATCTGGGCACATCCGATGGCGTGAGGCCCGAAGGTCC
CCCACTTTGCTCTTGCGAGGTCATGCGGTATTAGCTACCGTTTCCAGTAGTTATCCCCCTCCATCAGGCAGTTTCCCAGA
CATTACTCACCCGTCCGCCGCTCGCCGGCAAAGTAGTAAACTACTTCCCGCTGCCGCTCGACTTGCATGTGTTAGGCCTG
CCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATTTGTTTGATTTGCTATGAGTTAACATAGCGATG
CTCAAAGATTACTTTCTGCAAATAATCAGACAATCTGTGTGGACACTGCGCAATGCGTATCGTTAGGTAAGGAGGTGATC
CAACCGCAGGTTCCCCTACGGTTACCTTGTTACGACTTCACCCCAGTCATGAATCACAAAGTGGTAAGCGCCCTCCCGAA
GGTTAAGCTACCTACTTCTTTTGCAACCCACTCCCATGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACC
GTAGCATTCTGATCTACGATTACTAGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGACAGAC
TTTATGTGGTCCGCTTGCTCTCGCGAGTTCGCTTCACTTTGTATCTGCCATTGTAGCACGTGTGTAGCCCTACTCGTAAG
GGCCATGATGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCTCCCTTGAGTTCCCACCATTACGTGC
TGGCAACAAAGGATAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATTTCACAACACGAGCTGACGACAGCCATGCAG
CACCTGTCTCACAGTTCCCGAAGGCACCTAAGCATCTCTGCTAAGTTCTGTGGATGTCAAGAGTAGGTAAGGTTCTTCGC
GTTGCATCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCATTTGAGTTTTAACCTTGCGGCCGTA
CTCCCCAGGCGGTCGACTTAACGCGTTAGCTCCGGAAGCCACGCCTCAAGGGCACAACCTCCAAGTCGACATCGTTTACA
GCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGCACCTGAGCGTCAGTCTTTGTCCAGGGGGCCGC
CTTCGCCACCGGTATTCCTCCAGATCTCTACGCATTTCACCGCTACACCTGGAATTCTACCCCCCTCTACAAGACTCTAG
CTTGCCAGTTTCAAATGCAGTTCCCACGTTAAGCGCGGGGATTTCACATCTGACTTAACAAACCGCCTGCGTGCGCTTTA
CGCCCAGTAATTCCGATTAACGCTTGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGC
GAGTAACGTCAATGCTCAGTGCTATTAACACTGAACCCTTCCTCCTCGCTGAAAGTGCTTTACAACCCGAAGGCCTTCTT
CACACACGCGGCATGGCTGCATCAGGCTTGCGCCCATTGTGCAATATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGACC
GTGTCTCAGTTCCAGTGTGGCTGGTCATCCTCTCAGACCAGCTAGGGATCGTCGCCTAGGTGAGCCATTACCCCACCTAC
TAGCTAATCCCATCTGGGCACATCCGATGGCGTGAGGCCCGAAGGTCCCCCACTTTGCTCTTGCGAGGTCATGCGGTATT
AGCTACCGTTTCCAGTAGTTATCCCCCTCCATCAGGCAGTTTCCCAGACATTACTCACCCGTCCGCCGCTCGCCGGCAAA
GTAGTAAACTACTTCCCGCTGCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAAC
TCTTCAATTTAAGATTTGTTTGATTTGCTATGAGTTAACATAGCGATGCTCAAAGATTACTTTCTGCAAATAATCAGACA
ATCTGTGTGGACACTGCGCAATGCGTATCGTTAGGTAAGGAGGTGATCCAACCGCAGGTTCCCCTACGGTTACCTTGTTA
CGACTTCACCCCAGTCATGAATCACAAAGTGGTAAGCGCCCTCCCGAAGGTTAAGCTACCTACTTCTTTTGCAACCCACT
CCCATGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGTAGCATTCTGATCTACGATTACTAGCGATTC
CGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGACAGACTTTATGTGGTCCGCTTGCTCTCGCGAGTTCGC
TTCACTTTGTATCTGCCATTGTAGCACGTGTGTAGCCCTACTCGTAAGGGCCATGATGACTTGACGTCATCCCCACCTTC
CTCCGGTTTGTCACCGGCAGTCTCCCTTGAGTTCCCACCATTACGTGCTGGCAACAAAGGATAAGGGTTGCGCTCGTTGC
GGGACTTAACCCAACATTTCACAACACGAGCTGACGACAGCCATGCAGCACCTGTCTCACAGTTCCCGAAGGCACCTAAG
CATCTCTGCTAAGTTCTGTGGATGTCAAGAGTAGGTAAGGTTCTTCGCGTTGCATCGAATTAAACCACATGCTCCACCGC
TTGTGCGGGCCCCCGTCAATTCATTTGAGTTTTAACCTTGCGGCCGTACTCCCCAGGCGGTCGACTTAACGCGTTAGCTC
CGGAAGCCACGCCTCAAGGGCACAACCTCCAAGTCGACATCGTTTACAGCGTGGACTACCAGGGTATCTAATCCTGTTTG
CTCCCCACGCTTTCGCACCTGAGCGTCAGTCTTTGTCCAGGGGGCCGCCTTCGCCACCGGTATTCCTCCAGATCTCTACG
CATTTCACCGCTACACCTGGAATTCTACCCCCCTCTACAAGACTCTAGCTTGCCAGTTTCAAATGCAGTTCCCACGTTAA
GCGCGGGGATTTCACATCTGACTTAACAAACCGCCTGCGTGCGCTTTACGCCCAGTAATTCCGATTAACGCTTGCACCCT
CCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGCGAGTAACGTCAATGCTCAGTGCTATTAACACT
GAACCCTTCCTCCTCGCTGAAAGTGCTTTACAACCCGAAGGCCTTCTTCACACACGCGGCATGGCTGCATCAGGCTTGCG
CCCATTGTGCAATATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGACCGTGTCTCAGTTCCAGTGTGGCTGGTCATCCTC
TCAGACCAGCTAGGGATCGTCGCCTAGGTGAGCCATTACCCCACCTACTAGCTAATCCCATCTGGGCACATCCGATGGCG
TGAGGCCCGAAGGTCCCCCACTTTGCTCTTGCGAGGTCATGCGGTATTAGCTACCGTTTCCAGTAGTTATCCCCCTCCAT
CAGGCAGTTTCCCAGACATTACTCACCCGTCCGCCGCTCGCCGGCAAAGTAGTAAACTACTTCCCGCTGCCGCTCGACTT
GCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATTTGTTTGATTTGCTATG
AGTTAACATAGCGATGCTCAAAGATTACTTTCTGCAAATAATCAGACAATCTGTGTGGACACTGCGCAATGCGTATCGTT
AGGTAAGGAGGTGATCCAACCGCAGGTTCCCCTACGGTTACCTTGTTACGACTTCACCCCAGTCATGAATCACAAAGTGG
TAAGCGCCCTCCCGAAGGTTAAGCTACCTACTTCTTTTGCAACCCACTCCCATGGTGTGACGGGCGGTGTGTACAAGGCC
CGGGAACGTATTCACCGTAGCATTCTGATCTACGATTACTAGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAAT
CCGGACTACGACAGACTTTATGTGGTCCGCTTGCTCTCGCGAGTTCGCTTCACTTTGTATCTGCCATTGTAGCACGTGTG
TAGCCCTACTCGTAAGGGCCATGATGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCTCCCTTGAGT
TCCCACCATTACGTGCTGGCAACAAAGGATAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATTTCACAACACGAGCT
GACGACAGCCATGCAGCACCTGTCTCACAGTTCCCGAAGGCACCTAAGCATCTCTGCTAAGTTCTGTGGATGTCAAGAGT
AGGTAAGGTTCTTCGCGTTGCATCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCATTTGAGTTT
TAACCTTGCGGCCGTACTCCCCAGGCGGTCGACTTAACGCGTTAGCTCCGGAAGCCACGCCTCAAGGGCACAACCTCCAA
GTCGACATCGTTTACAGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGCACCTGAGCGTCAGTCT
TTGTCCAGGGGGCCGCCTTCGCCACCGGTATTCCTCCAGATCTCTACGCATTTCACCGCTACACCTGGAATTCTACCCCC
CTCTACAAGACTCTAGCTTGCCAGTTTCAAATGCAGTTCCCACGTTAAGCGCGGGGATTTCACATCTGACTTAACAAACC
GCCTGCGTGCGCTTTACGCCCAGTAATTCCGATTAACGCTTGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAG
CCGGTGCTTCTTCTGCGAGTAACGTCAATGCTCAGTGCTATTAACACTGAACCCTTCCTCCTCGCTGAAAGTGCTTTACA
ACCCGAAGGCCTTCTTCACACACGCGGCATGGCTGCATCAGGCTTGCGCCCATTGTGCAATATTCCCCACTGCTGCCTCC
CGTAGGAGTCTGGACCGTGTCTCAGTTCCAGTGTGGCTGGTCATCCTCTCAGACCAGCTAGGGATCGTCGCCTAGGTGAG
CCATTACCCCACCTACTAGCTAATCCCATCTGGGCACATCCGATGGCGTGAGGCCCGAAGGTCCCCCACTTTGCTCTTGC
GAGGTCATGCGGTATTAGCTACCGTTTCCAGTAGTTATCCCCCTCCATCAGGCAGTTTCCCAGACATTACTCACCCGTCC
GCCGCTCGCCGGCAAAGTAGTAAACTACTTCCCGCTGCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATC
TGAGCCATGATCAAACTCTTCAATTTAAGATTTGTTTGATTTGCTATGAGTTAACATAGCGATGCTCAAAGATTACTTTC
TGCAAATA
>kristensenii5 count=14
TAGCTCAGTTGGTAGAGCAATGCCTTGTCAAACCTGGGGTCGCGAGTTCGAGTCTCGTTACCCGCACCAA
>kristensenii9 count=19
AGGCGTTATTAGCGCAGGCAGTTTGGATGCGGCCAGCGCGCAGAAACCGGAGCGTACACGCAGTACGTGAGGATTTCGAG
CACTGCACAAGCTCAAAATGGCAA
>kristensenii10 count=14
TCCTGCTCAAGCGCGCTTTTCGCCGCCGTCCATGGCGGCTCACTCGGCCTACCGTCTTTGTCGGTAATATTCCTGATTGC
GCTCAAGGTCAAAAGAGAAAATCAAAACCATGGTTTTAGATTTTGATGTTAAAAGCACATTTGAGCAGCCGAGTGAAGAG
TGTAGCAAGGGAAATCCGCCATGGACGGCGGATTTAGGCGTCATGAGCAGGGATGCGAATAAAGCCGTCCCGCCAGCGAA
ATGATGAGTGAGGGAACCCACGCAGTGGGCAGGCGATACGTGCGAAACCCCGGATTCCACAAGGGGGCGCGCCCGCACCC
TTTGGCGGTTGGTTCATCGGAGGTTAAGTGAACTCAGTTGCTTTAACAACCGAACCAGATCACCAAA
>kristensenii13 count=10
CAGTACATGAGGATGGCGAGCACTGCACAAGCCCAGAATGACAACGCGCAGTAGCCG
>kristensenii28 count=15
GAAATTGCCGCGGGCAATACTGATTTATCAGCACGAACCGAACAACAAGCCGCATCACTAGAAGAAACAGCAGCCAGTAT
GGAGCAACTGACTGCCACGGTTAAACAGAATGCCGAAAATGCTCACCACGCTAGTCAACTGGCAACGAATGCATCAGAAA
CCGCGCAACAAGGTGGCAAAATCGTCAATGATGTGGTGCACACCATGAATGAAATATCCGGCAGCTCAAAGAAAATTGCT
GAAATCACCAACGTAATTAATAGCATTGCTTTCCAAACCAATATTCTGGCATTGAATGCCGCAGTTGAAGCAGCCCGTGC
CGGTGAGCAAGGCCGTGGATTTGCCGTCGTCGCCAGTGAAGTGCGCAATTTGGCACAGCGCAGTGCACAAGCAGCAAAAG
AAATTGAAGGGTTGATTGAAGACTCAGTTAACCATGTCGATGATGGCTCAATGCTGGTACAAAATGCGGGTAAAACCATG
GAAGATATCGTAAAAGCCGTCACCCATGTGACCGATATTATGGGAGAAATCGCATCAGCCTCTGATGAACAAAGCCGGGG
TATCAACCAGGTCGGTCAGGCTGTATCCGAAATGGACAGCGTAACACAACAAAATGCCTCTTTGGTAGAAGAATCAGCAG
CAGCTGCCGCATCACTGGAAGAACAAGCCAACATACTGACCCAAGCCGTCGCAGTATT
>kristensenii38 count=10
ATGAAAGGTCAAGTAAAGTGGTTCAACGAGTCTAAAGGCTTCGGTTTCATCACTCCAGCTGACGGCAGCAAAGATGTGTT
CGTACACTTCTCTGCAATCCA
>kristensenii39 count=56
CAATACCCTAAATAATTCGAGTTGCAGGAAGGCGGCAATTGAGTCAATCCCCAGAAGCTTATTGATTTTAAGGTCAATAG
CAGTAAGTGACTGGGGTGAACGAAAGCAGCCAACACACATGCAGCTTGAAGTATGACGGGTA
>kristensenii47 count=12
AATAGGCTATTTTACTTGCCATTTTGAACCTGAGCAGTGCTCGAAATCCTCACGTACTTGCGTGTACGCTCCGGTTTCTG
TGCGCTG
>kristensenii50 count=11
TCTTCAGCTTGGAAGGCTGAGGTAATAGCCATTATACGATGACCGCTTGGTGGCTCTTACCGACAAGCCCAAGTTTGTGA
TCGGGTAGAAGTAAGTCACATACGCTGGTTAGTACAGAAAGAGCCGAGGCCGCAGTAATAACGATGACCAATACTGTAAG
TGTCCTACCCCCTTACAGCTATTACTACAGCGGCAGGAATATTACTCAACCGTAAAAAATATTCAACAATCCAATTAATA
ACAAAAAGTTAGATTTTGTTTCTTTTTACTTCAGAGGTTATTGTTCGGTGCTTAAAAAGCAAAAACCCGCACAGTGGCGG
GTTCGTTGATATAAGTGGTATTACTTTGTAAGCACTTTTAATAGATTAATTACCCTTTTGTTATAACACAACCATTTTTA
CATTCAGTTTTTGTCAGAAACCAAGGATGATTATATAGCCTTGCAACAGGCGGTCATTAACCTCGACATTGCAGTAAGCG
TCAGTGCGGCCATACTATGATGGTTGACGATATCATCGTCAGGAATAACGGTGACATCGACCAGCGGAAAAAACCACTAT
AGCGTTTTCTTTTCCGTCACTGATGAATGCATACTAATTCTTTCGTGCGCTAGCCACCCGTCTTATTCCTCTTGCTCGAT
CTTGTTTGGTATCGTGACGGCCATATAACCATTGGGGCCACACCCAAAAACTCAGCAATAATCCGCTCACCTTTAGGCCA
AGGGCGCGTAAGAGCATTTGCCAGAGTGGATGAACTTAACCCCGCTAATCTCGATAATTCTGCCAGAGAGGAACCTTTCT
TGTGTATTGCGGCAATGATATCGGCTTTATGCCAGTCATTTGAATACTGAATCATACAAATCCCTATGTTATTTATGATA
AGAATATTTAATATTATGAATCTAATAATTAAATAGCGTAAAACAGATCGCTATCACAAATAAGCATTCAAAAATGAGTC
TGTGTATTAAAAAAGATGTATGATACTTGCTTTGTTATATCGCATCTTCTTATCATATTTATATAATTATTAAGGTTTTA
ATATGGATGTTAATAATGAATTAATGGAATGCAATATAAAAACCATGCGAAGAGTATTTAATAGCTGCTTTGATTTTAGT
TTCATGGATATTAATTACTCTATTATAGATATAGACAGAAAAATAATACTAACTTTACCCAGCAACTACGAGTGGCATTT
GATTTACTGGCATGATAATTTAGACCTACATATCAGTGAACGCATAATTCCCGGCATACAACATTGGTGTAATTACTCAT
CATGCCATGGTGAAACACTCAGTAAAAGTGGCAAGAAAGAAACAAAAGTTGATATATGTACACAGAGTGGTTCTGTTTTT
GAAATAATGTCAGTTAACGCAGATCGTAAGTTATCGACTGCTGAAACACTGGCAATCTTAAAATACAAGCCCGTCATTTC
TGACTTTGCCAGAAAATTATGGAAACCGCCAAAACAGCAAAGCGTAATTTTACCTATCAGAGAGCAAATCAGTTATCCGA
ACCCCAAAATGGAAGATGAACATGAAGACGCACTTAATAGTCATCCTTACATGCGTTTTGGTGAGATTGTTTTTACACGG
AAAGAGATGATAACTATTCGTCATTTATTATCACATCGCAGAATAAAAGAGATCGCAGCCATTCAGGGGTGTTCTTTGAG
TGTTGAGCAACAACGCATTCATCGTATAAAAGAGAAAGTGAACTGCCAGAATCAGCCATTATCTGCACTTTTCAATGCTC
TTAAACTTCATGGAGTGACTTTATCATGCCTAGATATATTCATCACATATCCATAACTTGTGGGAATATAAATAACACAC
TTACATTTTATAAAATTTTGGGTTTCGTAAAAGAAAATCCTATCGTGATGACGAATGCATGATTTATCATCTATATGACA
GGGCCGGTTTCATTATTGAAGTCTTTCATTATATACAGAACCATCATTTAAATAAAAAAAGAGTATTCTCAACTGAGATA
CTGGGAATAACACACGTTGCTTTTATTGTCAGTAATATCGATGAGGTTTTACATACCCTGAAAAATCAGGAGATTGAGTG
TGGCAATATCATCACCTCCAGAATAGGTTCATACCGGTATTTTTTTACCTATGATCCCGATGGCAATGCTATCGAAATTA
TTGAGGAAATAAAATGAAATGGAAATTCAGAGTGGGGGCCGTTGCAGGTAATGCACTGGAGTTTTATGATATTGCTGTAT
TTGGGGCAATATCAGTTTACCTTGCAGCAGAGCTTGAACGACAAGGCTATACGCAAGGCATTATCATGGTGTGGGGAATA
TTTGCTTTACGCTTCATCGTCAGGCCTTTGGGCGGCTATGTAATAGGGCGCTATGCTGATATGGTGGGCCGAAAACCGGC
GTTGATCCTTACAAGTTTTATCACCGGTCTTGCTACACTCGGGATGGCACTTCTACCCGTGTCGCTTTTGGGACACTACA
CACCGCTTGCGCTGTTGTTATTACAAATGACACTGGCACTCAGCTACGGAGGGGAATACCCCACACTGATAACCTATCTT
TTTAGTGATGCCCAGGATAACCAGCGTGCAAGGATCAGCGCACTTATTGTTTGCAGCTCCATTGTCGGCGTGCTGGCTTC
TCTCCTGATTGTTTATTTCCTTGAAAGCCATCTCAGTCCTGAAATGATGCAGTCCATTGGCTGGCGCGTTCCTTTGCTCG
TTGGCGTCTTCAATATAGCCATCAGTTTTTGGTTCAGGGCCAGATTACCCGTCCAGTTAGTGCAACCCGTTCGGGCGTTA
AAAGTCCACAAAATGAATATGCTGCATGTTTTTCTCTTTACTGTTCCTGGGGCGGTGATATTTTATGTCCATAATATGTC
AACAACACTATTGAGAGAGGTGCTACATCTGGATTTTATTAAGGGAATTTATCCTATTATCTCATCCGGCTTATTATTGT
TTTTACTGGTTGTTTGTGGTTGGTTAACAGATAAATATGCAACCCCATCGAAAGTGTTCAGACTCGGCGTTATATTTTTG
GTCATTTTTTCCGTTCCGCTTTATTTTATATTAAACAGTAAGACTATTGAGTTTATGATGGCAGCACAGTTTATTATTAC
ACTCAATGCTTCAATGATATTATGTAATCTGGCGGCGGTATTATTTGACGTTTCGCGAGGTCATACAACAACGCTTGGGA
TGGGATATAATCTTTCACTTTCACTATTTGGTGGTTTAACACCGCTCATTATCAACTTTCTAATATCATACAGTGTCATT
TATGCAGGACTCTATATTTCATTGTCTGGTTGCTCCCTTTTGGTTTCTTATTATATCGATAAGAAAATCTATCAGTTATC
ACACTGACCGACCACATCAATAATTGAACAGTAATTAAGATGAAAAAAGAATGGTTCTCAGCAAGTGAATTGTTATTTCA
CCCTTTACTTCCGACAACAACACAAGGTATTCATAAAAAGGCAAAACGAGAGGGCTGGATATCCAGAAAAAGAATGGGTA
TTCAGGGAAAAGGCATCGAGTATTTTATAGACAGTCTTCCTGACAATGTCAGCACTTATATTTATCATCACAGTGAGTTG
TTCGAGAGGATTAGTATTCAGGAACACTTTGATATCTGGATTAATGCGTTTAAACAGTTAACACCGAGTGAGCGTTATAA
ATTCTCAGAATATATATTACGGGAGGGAATCATCAGTATTTATAAAAACCTGTAACTTTTATTCTTAATATTAATTTGAT
AAGGAATATCAATGAAATCAATCAATACTTACTTTTTTTGCGCCTTATTCATAACCACTCCGACTCTGGCATCCGATGAG
GGCGATACCAATATTATAGAATGTCCGGTATCTTATGATGTGGGTTTCTCGCCAAGGGGAAATGCCCTGGATTTAATATT
GGCAGAATTAGCTTCCGCTAAGGAATCTATTTATCTTGCCGCTTACAGTTTCACCTCTCAACCAGTTGTCGATGCTTTAA
TCGCAGCCCATGAACGGGGCGTGACGGTTTCAGTGGTGGTGAATAAAGGTTCAATTAACGGGAATGGTGCCAAGGCCCGT
TATTTACAAGAAAATAACATTCCCATAAAAATGAATGCAAAATACAGTATTATGCATAATAAATTCTTTCTTGTGGATAA
TAAGTCACTTAAAACCGGATCGTTTAATTACAGTGCGGCGGCACATAAGAGAAATGCGGAAAATATTCTTATAATCAGAT
GCGTTGATGATGTTATTACACAATATCAGCAAGAGTTTACACGGTTATGGGATGAATCCATTGAAATCCCCGAAATTTCT
ACATCCTGATAATTAATTCGGCGGCCCATAGCCCGAACTGACGTTTTGACTCTGGTGCTGCCTCAAATCCTTTATCACTT
GAGTTGGCCATATCACCATCCCTGTTTCTCTTGGCTTTTCCCGCCTATTTAATTTCGTTGCTTCCAACAGCCACTTTTGC
TTTGTTCACAATACATATAACTGTTCTATCCAGTGGGTGAATGAAGTGAGAACGGTGTTACTTCAGGGTATCAGGCGGCC
TTTGGTTCCAGCCATGCGATAATGTCATCAAATAGCCTTCCGGAGACCTCAATTTCATCGCTACAGACTCACGCTACATA
TAGTGTTCCATCTCCAACGGTACACCTAACGCTGCCAAGCAACCAGCAATAAACCCTTCCGCCCTTTGTAATCGGATCAC
TACCTGATTCAACGAAACTCCTAATTTCCCAGCCATCGCACGTAACGGCATGCACTTGATGTAATGCCATTCAAGCAAAG
TACAAAGGTATGGGTCTTTCTTTTTCAAAACATTCATCGCAGAACTGATGATCAATCCATCATCGTCACTACATGAAGCG
CGTGATTTACGTGTATAAGGTAATAACCCTTTAAATCCAGCAGCAATAGGCGAATAGTAAACATTCCCTCCTTCACTCGC
AGCCCATGCCCCCCAACGTTCCAACACCATTTGAATGTCACGCATTATGCCACCTTACCCATCTGGCTC
>aldovae0 count=168
TATACCCTTCGTCCTTGAAGCTGCAGGGGTGTTAGCTGCGCTCACTCACCCGAATCACTTACTTGTGTAAGCTCATCGGG
ATTCGTTCGCTTGCTGCCTACCTGCAACTCCAATGACTTTGGGTATA
>aldovae1 count=51
CCGTCATACTTCAAGCTGCATATGCGTTCCCTGCGTTCTATCACCCCAGTCACTTACATATGTAAGCTACCGAGGTGTCA
CTCAGTTGCCGCCTTCCTGCAACTCGAATTATTTAGGGTATA
>aldovae2 count=19
CTGTCGGTTATTACAGCTTGCAAACTGACCTCTTTATCGCGGGCCTCTTTTTATCGGCGCGGTACTGACTGGCGAGAAAA
AGATAAGGTCGTTATCGATGCCATTCAGGCTGTGTTGTCTGAGTCGCCTCAGGCTGGTTTCTGGAAATGCTATTATCGTC
TCAGATTTAAAGGGTTTATCTTCAACCATAAACGTGTTTATCGTGTGTATTGCCGGTTAGGTTTAAACCTTAAACGCAGG
ATCAAAAAAACGCTGCCAAAACGTGAGAATAAGCCGTTATCAATTGTGAATCTGCCTGATATTCAGTGGGCACTGGACTT
TATGCATGATGCTCTTTACTGCGGTAAGCGGTTCAGAACACTAAATATAATTGACGAAGGAACGCGTGAATGTCTGGCCA
TTGAAGTTGATACTTCTTTACCGACAGACCGGGTTATCCGTGTACTTGATCGACTGAAAAAAGAACGTGGGTTACCTCAG
CAACTCAGAGTTGATAATGGCCCAGAACTGATATCAGTTAACTTATTGAATTACTGCGAATATAATCACATTACACTGTG
CCATATTCAGCCGGGTAAACCACAACAAAACGGATTTATTGAGCGTTTTAATGGCTCATTTCGTCGGGAGTTTTTGAATG
CTTACTTGTTTGAATCATTAAGCCAAGTCAGAGAAATGGCTTGGTTCTGGCAACAGGATTATAACCTGAATCGTACACAT
GAAAGTCTGGGGCATCTTCCCCCGGAGACATACCGAAAACAGCTAGAAAACTCTAATCAGGACTGTCTCAGATAACGGGG
AGTGGACGTCCACTCCCCCTAAAATGAGACAGCTATAATTAGATTTTCTGCCTTTACCTTTGCAGAGGTCAAAATGAAAA
AAGCACGCTTTACTGAAACTCAGATCGTTAATATCCTGAAACTGGCTGATTCAGGTATAAAAGTTGAAGAAATTTGCCGT
CAGAACGGAATTAGCAGTGCCACCTACTACAATTGGAAATCTAAGTATGGTGGGATGGAAGCTAATGATGTTAAACGATT
AAAAGAGCTTGAAGAGGAAAATACTCGTCTCAAAAAATTGTTTGCTGAAGTCAGTCTTGAAAATCATGCCATGAAGGAGC
TTTTTGCAAAAAAGGGTTGGTAATAACCGAGAAAAAGTCCTGTGCCGGATTATTAACGGCTTCAGGCCTGTCGGTTATTA
CAGCTTGCAAACTGACCTCTTTATCGCGGGCCTCTTTTTATCGGCGCGGTACTGACTGGCGAGAAAAAGATAAGGTCGTT
ATCGATGCCATTCAGGCTGTGTTGTCTGAGTCGCCTCAGGCTGGTTTCTGGAAATGCTATTATCGTCTCAGATTTAAAGG
GTTTATCTTCAACCATAAACGTGTTTATCGTGTGTATTGCCGGTTAGGTTTAAACCTTAAACGCAGGATCAAAAAAACGC
TGCCAAAACGTGAGAATAAGCCGTTATCAATTGTGAATCTGCCTGATATTCAGTGGGCACTGGACTTTATGCATGATGCT
CTTTACTGCGGTAAGCGGTTCAGAACACTAAATATAATTGACGAAGGAACGCGTGAATGTCTGGCCATTGAAGTTGATAC
TTCTTTACCGACAGACCGGGTTATCCGTGTACTTGATCGACTGAAAAAAGAACGTGGGTTACCTCAGCAACTCAGAGTTG
ATAATGGCCCAGAACTGATATCAGTTAACTTATTGAATTACTGCGAATATAATCACATTACACTGTGCCATATTCAGCCG
GGTAAACCACAACAAAACGGATTTATTGAGCGTTTTAATGGCTCATTTCGTCGGGAGTTTTTGAATGCTTACTTGTTTGA
ATCATTAAGCCAAGTCAGAGAAATGGCTTGGTTCTGGCAACAGGATTATAACCTGAATCGTACACATGAAAGTCTGGGGC
ATCTTCCCCCGGAGACATACCGAAAACAGCTAGAAAACTCTAATCAGGACTGTCTCAGATAACGGGGAGTGGACGTCCAC
TCCCCCTAAAATGAGACAGCTATAATTAGATTTTCTGCCTTTACCTTTGCAGAGGTCAAAATGAAAAAAGCACGCTTTAC
TGAAACTCAGATCGTTAATATCCTGAAACTGGCTGATTCAGGTATAAAAGTTGAAGAAATTTGCCGTCAGAACGGAATTA
GCAGTGCCACCTACTACAATTGGAAATCTAAGTATGGTGGGATGGAAGCTAATGATGTTAAACGATTAAAAGAGCTTGAA
GAGGAAAATACTCGTCTCAAAAAATTGTTTGCTGAAGTCAGTCTTGAAAATCATGCCATGAAGGAGCTTTTTGCAAAAAA
GGGTTGGTAATAACCGAGAAAAAGTCCTGTGCCGGATTATTAACGGCTTCAGGCCTGTCGGTTATTACAGCTTGCAAACT
GACCTCTTTATCGCGGGCCTCTTTTTATCGGCGCGGTACTGACTGGCGAGAAAAAGATAAGGTCGTTATCGATGCCATTC
AGGCTGTGTTGTCTGAGTCGCCTCAGGCTGGTTTCTGGAAATGCTATTATCGTCTCAGATTTAAAGGGTTTATCTTCAAC
CATAAACGTGTTTATCGTGTGTATTGCCGGTTAGGTTTAAACCTTAAACGCAGGATCAAAAAAACGCTGCCAAAACGTGA
GAATAAGCCGTTATCAATTGTGAATCTGCCTGATATTCAGTGGGCACTGGACTTTATGCATGATGCTCTTTACTGCGGTA
AGCGGTTCAGAACACTAAATATAATTGACGAAGGAACGCGTGAATGTCTGGCCATTGAAGTTGATACTTCTTTACCGACA
GACCGGGTTATCCGTGTACTTGATCGACTGAAAAAAGAACGTGGGTTACCTCAGCAACTCAGAGTTGATAATGGCCCAGA
ACTGATATCAGTTAACTTATTGAATTACTGCGAATATAATCACATTACACTGTGCCATATTCAGCCGGGTAAACCACAAC
AAAACGGATTTATTGAGCGTTTTAATGGCTCATTTCGTCGGGAGTTTTTGAATGCTTACTTGTTTGAATCATTAAGCCAA
GTCAGAGAAATGGCTTGGTTCTGGCAACAGGATTATAACCTGAATCGTACACATGAAAGTCTGGGGCATCTTCCCCCGGA
GACATACCGAAAACAGCTAGAAAACTCTAATCAGGACTGTCTCAGATAACGGGGAGTGGACGTCCACTCCCCCTAAAATG
AGACAGCTATAATTAGATTTTCTGCCTTTACCTTTGCAGAGGTCAAAATGAAAAAAGCACGCTTTACTGAAACTCAGATC
GTTAATATCCTGAAACTGGCTGATTCAGGTATAAAAGTTGAAGAAATTTGCCGTCAGAACGGAATTAGCAGTGCCACCTA
CTACAATTGGAAATCTAAGTATGGTGGGATGGAAGCTAATGATGTTAAACGATTAAAAGAGCTTGAAGAGGAAAATACTC
GTCTCAAAAAATTGTTTGCTGAAGTCAGTCTTGAAAATCATGCCATGAAGGAGCTTTTTGCAAAAAAGGGTTGGTAATAA
CCGAGAAAAAGTCCTGTGCCGGATTATTAACGGCTTCAGGCCTGTCGGTTATTACAGCTTGCAAACTGACCTCTTTATCG
CGGGCCTCTTTTTATCGGCGCGGTACTGACTGGCGAGAAAAAGATAAGGTCGTTATCGATGCCATTCAGGCTGTGTTGTC
TGAGTCGCCTCAGGCTGGTTTCTGGAAATGCTATTATCGTCTCAGATTTAAAGGGTTTATCTTCAACCATAAACGTGTTT
ATCGTGTGTATTGCCGGTTAGGTTTAAACCTTAAACGCAGGATCAAAAAAACGCTGCCAAAACGTGAGAATAAGCCGTTA
TCAATTGTGAATCTGCCTGATATTCAGTGGGCACTGGACTTTATGCATGATGCTCTTTACTGCGGTAAGCGGTTCAGAAC
ACTAAATATAATTGACGAAGGAACGCGTGAATGTCTGGCCATTGAAGTTGATACTTCTTTACCGACAGACCGGGTTATCC
GTGTACTTGATCGACTGAAAAAAGAACGTGGGTTACCTCAGCAACTCAGAGTTGATAATGGCCCAGAACTGATATCAGTT
AACTTATTGAATTACTGCGAATATAATCACATTACACTGTGCCATATTCAGCCGGGTAAACCACAACAAAACGGATTTAT
TGAGCGTTTTAATGGCTCATTTCGTCGGGAGTTTTTGAATGCTTACTTGTTTGAATCATTAAGCCAAGTCAGAGAAATGG
CTTGGTTCTGGCAACAGGATTATAACCTGAATCGTACACATGAAAGTCTGGGGCATCTTCCCCCGGAGACATACCGAAAA
CAGCTAGAAAACTCTAATCAGGACTGTCTCAGATAACGGGGAGTGGACGTCCACTCCCCCTAAAATGAGACAGCTATAAT
TAGATTTTCTGCCTTTACCTTTGCAGAGGTCAAAATGAAAAAAGCACGCTTTACTGAAACTCAGATCGTTAATATCCTGA
AACTGGCTGATTCAGGTATAAAAGTTGAAGAAATTTGCCGTCAGAACGGAATTAGCAGTGCCACCTACTACAATTGGAAA
TCTAAGTATGGTGGGATGGAAGCTAATGATGTTAAACGATTAAAAGAGCTTGAAGAGGAAAATACTCGTCTCAAAAAATT
GTTTGCTGAAGTCAGTCTTGAAAATCATGCCATGAAGGAGCTTTTTGCAAAAAAGGGTTGGTAATAACCGAGAAAAAGTC
CTGTGCCGGATTATTAACGGCTTCAGGCCTGTCGGTTATTACAGCTTGCAAACTGACCTCTTTATCGCGGGCCTCTTTTT
ATCGGCGCGGTACTGACTGGCGAGAAAAAGATAAGGTCGTTATCGATGCCATTCAGGCTGTGTTGTCTGAGTCGCCTCAG
GCTGGTTTCTGGAAATGCTATTATCGTCTCAGATTTAAAGGGTTTATCTTCAACCATAAACGTGTTTATCGTGTGTATTG
CCGGTTAGGTTTAAACCTTAAACGCAGGATCAAAAAAACGCTGCCAAAACGTGAGAATAAGCCGTTATCAATTGTGAATC
TGCCTGATATTCAGTGGGCACTGGACTTTATGCATGATGCTCTTTACTGCGGTAAGCGGTTCAGAACACTAAATATAATT
GACGAAGGAACGCGTGAATGTCTGGCCATTGAAGTTGATACTTCTTTACCGACAGACCGGGTTATCCGTGTACTTGATCG
ACTGAAAAAAGAACGTGGGTTACCTCAGCAACTCAGAGTTGATAATGGCCCAGAACTGATATCAGTTAACTTATTGAATT
ACTGCGAATATAATCACATTACACTGTGCCATATTCAGCCGGGTAAACCACAACAAAACGGATTTATTGAGCGTTTTAAT
GGCTCATTTCGTCGGGAGTTTTTGAATGCTTACTTGTTTGAATCATTAAGCCAAGTCAGAGAAATGGCTTGGTTCTGGCA
ACAGGATTATAACCTGAATCGTACACATGAAAGTCTGGGGCATCTTCCCCCGGAGACATACCGAAAACAGCTAGAAAACT
CTAATCAGGACTGTCTCAGATAACGGGGAGTGGACGTCCACTCCCCCTAAAATGAGACAGCTATAATTAGATTTTCTGCC
TTTACCTTTGCAGAGGTCAAAATGAAAAAAGCACGCTTTACTGAAACTCAGATCGTTAATATCCTGAAACTGGCTGATTC
AGGTATAAAAGTTGAAGAAATTTGCCGTCAGAACGGAATTAGCAGTGCCACCTACTACAATTGGAAATCTAAGTATGGTG
GGATGGAAGCTAATGATGTTAAACGATTAAAAGAGCTTGAAGAGGAAAATACTCGTCTCAAAAAATTGTTTGCTGAAGTC
AGTCTTGAAAATCATGCCATGAAGGAGCTTTTTGCAAAAAAGGGTTGGTAATAACCGAGAAAAAGTCCTGTGCCGGATTA
TTAACGGCTTCAGGCCTGTCGGTTATTACAGCTTGCAAACTGACCTCTTTATCGCGGGCCTCTTTTTATCGGCGCGGTAC
TGACTGGCGAGAAAAAGATAAGGTCGTTATCGATGCCATTCAGGCTGTGTTGTCTGAGTCGCCTCAGGCTGGTTTCTGGA
AATGCTATTATCGTCTCAGATTTAAAGGGTTTATCTTCAACCATAAACGTGTTTATCGTGTGTATTGCCGGTTAGGTTTA
AACCTTAAACGCAGGATCAAAAAAACGCTGCCAAAACGTGAGAATAAGCCGTTATCAATTGTGAATCTGCCTGATATTCA
GTGGGCACTGGACTTTATGCATGATGCTCTTTACTGCGGTAAGCGGTTCAGAACACTAAATATAATTGACGAAGGAACGC
GTGAATGTCTGGCCATTGAAGTTGATACTTCTTTACCGACAGACCGGGTTATCCGTGTACTTGATCGACTGAAAAAAGAA
CGTGGGTTACCTCAGCAACTCAGAGTTGATAATGGCCCAGAACTGATATCAGTTAACTTATTGAATTACTGCGAATATAA
TCACATTACACTGTGCCATATTCAGCCGGGTAAACCACAACAAAACGGATTTATTGAGCGTTTTAATGGCTCATTTCGTC
GGGAGTTTTTGAATGCTTACTTGTTTGAATCATTAAGCCAAGTCAGAGAAATGGCTTGGTTCTGGCAACAGGATTATAAC
CTGAATCGTACACATGAAAGTCTGGGGCATCTTCCCCCGGAGACATACCGAAAACAGCTAGAAAACTCTAATCAGGACTG
TCTCAGATAACGGGGAGTGGACGTCCACTCCCCCTAAAATGAGACAGCTATAATTAGATTTTCTGCCTTTACCTTTGCAG
AGGTCAAAATGAAAAAAGCACGCTTTACTGAAACTCAGATCGTTAATATCCTGAAACTGGCTGATTCAGGTATAAAAGTT
GAAGAAATTTGCCGTCAGAACGGAATTAGCAGTGCCACCTACTACAATTGGAAATCTAAGTATGGTGGGATGGAAGCTAA
TGATGTTAAACGATTAAAAGAGCTTGAAGAGGAAAATACTCGTCTCAAAAAATTGTTTGCTGAAGTCAGTCTTGAAAATC
ATGCCATGAAGGAGCTTTTTGCAAAAAAGGGTTGGTAATAACCGAGAAAAAGTCCTGTGCCGGATTATTAACGGCTTCAG
GCCTGTCGGTTATTACAGCTTGCAAACTGACCTCTTTATCGCGGGCCTCTTTTTATCGGCGCGGTACTGACTGGCGAGAA
AAAGATAAGGTCGTTATCGATGCCATTCAGGCTGTGTTGTCTGAGTCGCCTCAGGCTGGTTTCTGGAAATGCTATTATCG
TCTCAGATTTAAAGGGTTTATCTTCAACCATAAACGTGTTTATCGTGTGTATTGCCGGTTAGGTTTAAACCTTAAACGCA
GGATCAAAAAAACGCTGCCAAAACGTGAGAATAAGCCGTTATCAATTGTGAATCTGCCTGATATTCAGTGGGCACTGGAC
TTTATGCATGATGCTCTTTACTGCGGTAAGCGGTTCAGAACACTAAATATAATTGACGAAGGAACGCGTGAATGTCTGGC
CATTGAAGTTGATACTTCTTTACCGACAGACCGGGTTATCCGTGTACTTGATCGACTGAAAAAAGAACGTGGGTTACCTC
AGCAACTCAGAGTTGATAATGGCCCAGAACTGATATCAGTTAACTTATTGAATTACTGCGAATATAATCACATTACACTG
TGCCATATTCAGCCGGGTAAACCACAACAAAACGGATTTATTGAGCGTTTTAATGGCTCATTTCGTCGGGAGTTTTTGAA
TGCTTACTTGTTTGAATCATTAAGCCAAGTCAGAGAAATGGCTTGGTTCTGGCAACAGGATTATAACCTGAATCGTACAC
ATGAAAGTCTGGGGCATCTTCCCCCGGAGACATACCGAAAACAGCTAGAAAACTCTAATCAGGACTGTCTCAGATAACGG
GGAGTGGACGTCCACTCCCCCTAAAATGAGACAGCTATAATTAGATTTTCTGCCTTTACCTTTGCAGAGGTCAAAATGAA
AAAAGCACGCTTTACTGAAACTCAGATCGTTAATATCCTGAAACTGGCTGATTCAGGTATAAAAGTTGAAGAAATTTGCC
GTCAGAACGGAATTAGCAGTGCCACCTACTACAATTGGAAATCTAAGTATGGTGGGATGGAAGCTAATGATGTTAAACGA
TTAAAAGAGCTTGAAGAGGAAAATACTCGTCTCAAAAAATTGTTTGCTGAAGTCAGTCTTGAAAATCATGCCATGAAGGA
GCTTTTTGCAAAAAAGGGTTGGTAATAACCGAGAAAAAGTCCTGTGCCGGATTATTAACGGCTTCAGGCCTGTCGGTTAT
TACAGCTTGCAAACTGACCTCTTTATCGCGGGCCTCTTTTTATCGGCGCGGTACTGACTGGCGAGAAAAAGATAAGGTCG
TTATCGATGCCATTCAGGCTGTGTTGTCTGAGTCGCCTCAGGCTGGTTTCTGGAAATGCTATTATCGTCTCAGATTTAAA
GGGTTTATCTTCAACCATAAACGTGTTTATCGTGTGTATTGCCGGTTAGGTTTAAACCTTAAACGCAGGATCAAAAAAAC
GCTGCCAAAACGTGAGAATAAGCCGTTATCAATTGTGAATCTGCCTGATATTCAGTGGGCACTGGACTTTATGCATGATG
CTCTTTACTGCGGTAAGCGGTTCAGAACACTAAATATAATTGACGAAGGAACGCGTGAATGTCTGGCCATTGAAGTTGAT
ACTTCTTTACCGACAGACCGGGTTATCCGTGTACTTGATCGACTGAAAAAAGAACGTGGGTTACCTCAGCAACTCAGAGT
TGATAATGGCCCAGAACTGATATCAGTTAACTTATTGAATTACTGCGAATATAATCACATTACACTGTGCCATATTCAGC
CGGGTAAACCACAACAAAACGGATTTATTGAGCGTTTTAATGGCTCATTTCGTCGGGAGTTTTTGAATGCTTACTTGTTT
GAATCATTAAGCCAAGTCAGAGAAATGGCTTGGTTCTGGCAACAGGATTATAACCTGAATCGTACACATGAAAGTCTGGG
GCATCTTCCCCCGGAGACATACCGAAAACAGCTAGAAAACTCTAATCAGGACTGTCTCAGATAACGGGGAGTGGACGTCC
ACTCCCCCTAAAATGAGACAGCTATAATTAGATTTTCTGCCTTTACCTTTGCAGAGGTCAAAATGAAAAAAGCACGCTTT
ACTGAAACTCAGATCGTTAATATCCTGAAACTGGCTGATTCAGGTATAAAAGTTGAAGAAATTTGCCGTCAGAACGGAAT
TAGCAGTGCCACCTACTACAATTGGAAATCTAAGTATGGTGGGATGGAAGCTAATGATGTTAAACGATTAAAAGAGCTTG
AAGAGGAAAATACTCGTCTCAAAAAATTGTTTGCTGAAGTCAGTCTTGAAAATCATGCCATGAAGGAGCTTTTTGCAAAA
AAGGGTTGGTAATAACCGAGAAAAAGTCCTGTGCCGGATTATTAACGGCTTCAGGCCTGTCGGTTATTACAGCTTGCAAA
CTGACCTCTTTATCGCGGGCCTCTTTTTATCGGCGCGGTACTGACTGGCGAGAAAAAGATAAGGTCGTTATCGATGCCAT
TCAGGCTGTGTTGTCTGAGTCGCCTCAGGCTGGTTTCTGGAAATGCTATTATCGTCTCAGATTTAAAGGGTTTATCTTCA
ACCATAAACGTGTTTATCGTGTGTATGGCCGCGTACACTCAAACCTAAAACACACAATCAAAAAAACACTGCCAAAACGG
GAGAAGAAACCGCTACCAAGGAAGAAGCCCCAGAATATCAAGTGGACAATGCACGTTATAAACCAGGCGCCAGAACCAAG
CAAGCGCTCCAAAACACCAAAAAGAATCAAACAAGGAACCAGTCAAAAACTCCCCACGAAAGGAGACACTAAAACACCCA
AAAAACCCGTTATCCGGGGACTTAACCGACTGAAAAAGGAACAGGGTAACCTCAGCAAATCAGAAGTAATAAAAGAACAG
AACTGATATCAGTTCTCGGACATAATTACCTCTGAGTTACTCACATAACACACTGCCATTTTTAACCCGGTCAAGCACAA
GGATAACCGAGTCATTGAGTATTGTAATAGCAAATTTAATGGGCAGACTTTCACGCCTTACTTCGTCGAATAAATAAACC
AAGCCAAACAAATGACCGCAGTAAAGAAAAACAAGCATAAACTCAAGCGCACACAGAAAAGCAGGCAGATTCACAACCGA
AAACAGACCAAAAACACCTAGAAAAAGCGAATCAGGAATCCCGCAGATAACGGGGAAACCAAACCGGCAATACACAAGAT
AAACACGTTTAAGGATGAAGATAAACCCTTTAAATCTGAGACGATAATAGCATTTCCAGAAACCAGCCTGAGGCGACTCA
GACAACACAGCCTGAATGGCATCGATAACGACCTTATCTTTTTCTCGCCAGTCAGTACCGCGCCGATAAAAAGAGGCCCG
CGATAAAGAGGTCAGTTTGCAAGCTGTAATAACCGACAGGCCTGAAGCCGTTAATAATCCGGCACAGGACTTTTTCTCGG
TTATTACCAACCCTTTTTTGCAAAAAGCTCCTTCATGGCATGATTTTCAAGACTGACTTCAGCAAACAATTTTTTGAGAC
GAGTATTTTCCTCTTCAAGCTCTTTTAATCGTTTAACATCATTAGCTTCCATCCCACCATACTTAGATTTCCAATTGTAG
TAGGTGGCACTGCTAATTCCGTTCTGACGGCAAATTTCTTCAACTTTTATACCTGAATCAGCCAGTTTCAGGATATTAAC
GATCTGAGTTTCAGTAAAGCGTGCTTTTTTCATTTTGACCTCTGCAAAGGTAAAGGCAGAAAATCTAATTATAGCTGTCT
CATTTTAGGGGGAGTGGACGTCCACTCCCCGTTATCTGAGACAGTCCTGATTAGAGTTTTCTAGCTGTTTTCGGTATGTC
TCCGGGGGAAGATGCCCCAGACTTTCATGTGTACGATTCAGGTTATAATCCTGTTGCCAGAACCAAGCCATTTCTCTGAC
TTGGCTTAATGATTCAAACAAGTAAGCATTCAAAAACTCCCGACGAAATGAGCCATTAAAACGCTCAATAAATCCGTTTT
GTTGTGGTTTACCCGGCTGAATATGGCACAGTGTAATGTGATTATATTCGCAGTAATTCAATAAGTTAACTGATATCAGT
TCTGGGCCATTATCAACTCTGAGTTGCTGAGGTAACCCACGTTCTTTTTTCAGTCGATCAAGTACACGGATAACCCGGTC
TGTCGGTAAAGAAGTATCAACTTCAATGGCCAGACATTCACGCGTTCCTTCGTCAATTATATTTAGTGTTCTGAACCGCT
TACCGCAGTAAAGAGCATCATGCATAAAGTCCAGTGCCCACTGAATATCAGGCAGATTCACAATTGATAACGGCTTATTC
TCACGTTTTGGCAGCGTTTTTTTGATCCTGCGTTTAAGGTTTAAACCTAACCGGCAATACACACGATAAACACGTTTATG
GTTGAAGATAAACCCTTTAAATCTGAGACGATAATAGCATTTCCAGAAACCAGCCTGAGGCGACTCAGACAACACAGCCT
GAATGGCATCGATAACGACCTTATCTTTTTCTCGCCAGTCAGTACCGCGCCGATAAAAAGAGGCCCGCGATAAAGAGGTC
AGTTTGCAAGCTGTAATAACCGACAGGCCTGAAGCCGTTAATAATCCGGCACAGGACTTTTTCTCGGTTATTACCAACCC
TTTTTTGCAAAAAGCTCCTTCATGGCATGATTTTCAAGACTGACTTCAGCAAACAATTTTTTGAGACGAGTATTTTCCTC
TTCAAGCTCTTTTAATCGTTTAACATCATTAGCTTCCATCCCACCATACTTAGATTTCCAATTGTAGTAGGTGGCACTGC
TAATTCCGTTCTGACGGCAAATTTCTTCAACTTTTATACCTGAATCAGCCAGTTTCAGGATATTAACGATCTGAGTTTCA
GTAAAGCGTGCTTTTTTCATTTTGACCTCTGCAAAGGTAAAGGCAGAAAATCTAATTATAGCTGTCTCATTTTAGGGGGA
GTGGACGTCCACTCCCCGTTATCTGAGACAGTCCTGATTAGAGTTTTCTAGCTGTTTTCGGTATGTCTCCGGGGGAAGAT
GCCCCAGACTTTCATGTGTACGATTCAGGTTATAATCCTGTTGCCAGAACCAAGCCATTTCTCTGACTTGGCTTAATGAT
TCAAACAAGTAAGCATTCAAAAACTCCCGACGAAATGAGCCATTAAAACGCTCAATAAATCCGTTTTGTTGTGGTTTACC
CGGCTGAATATGGCACAGTGTAATGTGATTATATTCGCAGTAATTCAATAAGTTAACTGATATCAGTTCTGGGCCATTAT
CAACTCTGAGTTGCTGAGGTAACCCACGTTCTTTTTTCAGTCGATCAAGTACACGGATAACCCGGTCTGTCGGTAAAGAA
GTATCAACTTCAATGGCCAGACATTCACGCGTTCCTTCGTCAATTATATTTAGTGTTCTGAACCGCTTACCGCAGTAAAG
AGCATCATGCATAAAGTCCAGTGCCCACTGAATATCAGGCAGATTCACAATTGATAACGGCTTATTCTCACGTTTTGGCA
GCGTTTTTTTGATCCTGCGTTTAAGGTTTAAACCTAACCGGCAATACACACGATAAACACGTTTATGGTTGAAGATAAAC
CCTTTAAATCTGAGACGATAATAGCATTTCCAGAAACCAGCCTGAGGCGACTCAGACAACACAGCCTGAATGGCATCGAT
AACGACCTTATCTTTTTCTCGCCAGTCAGTACCGCGCCGATAAAAAGAGGCCCGCGATAAAGAGGTCAGTTTGCAAGCTG
TAATAACCGACAGGCCTGAAGCCGTTAATAATCCGGCACAGGACTTTTTCTCGGTTATTACCAACCCTTTTTTGCAAAAA
GCTCCTTCATGGCATGATTTTCAAGACTGACTTCAGCAAACAATTTTTTGAGACGAGTATTTTCCTCTTCAAGCTCTTTT
AATCGTTTAACATCATTAGCTTCCATCCCACCATACTTAGATTTCCAATTGTAGTAGGTGGCACTGCTAATTCCGTTCTG
ACGGCAAATTTCTTCAACTTTTATACCTGAATCAGCCAGTTTCAGGATATTAACGATCTGAGTTTCAGTAAAGCGTGCTT
TTTTCATTTTGACCTCTGCAAAGGTAAAGGCAGAAAATCTAATTATAGCTGTCTCATTTTAGGGGGAGTGGACGTCCACT
CCCCGTTATCTGAGACAGTCCTGATTAGAGTTTTCTAGCTGTTTTCGGTATGTCTCCGGGGGAAGATGCCCCAGACTTTC
ATGTGTACGATTCAGGTTATAATCCTGTTGCCAGAACCAAGCCATTTCTCTGACTTGGCTTAATGATTCAAACAAGTAAG
CATTCAAAAACTCCCGACGAAATGAGCCATTAAAACGCTCAATAAATCCGTTTTGTTGTGGTTTACCCGGCTGAATATGG
CACAGTGTAATGTGATTATATTCGCAGTAATTCAATAAGTTAACTGATATCAGTTCTGGGCCATTATCAACTCTGAGTTG
CTGAGGTAACCCACGTTCTTTTTTCAGTCGATCAAGTACACGGATAACCCGGTCTGTCGGTAAAGAAGTATCAACTTCAA
TGGCCAGACATTCACGCGTTCCTTCGTCAATTATATTTAGTGTTCTGAACCGCTTACCGCAGTAAAGAGCATCATGCATA
AAGTCCAGTGCCCACTGAATATCAGGCAGATTCACAATTGATAACGGCTTATTCTCACGTTTTGGCAGCGTTTTTTTGAT
CCTGCGTTTAAGGTTTAAACCTAACCGGCAATACACACGATAAACACGTTTATGGTTGAAGATAAACCCTTTAAATCTGA
GACGATAATAGCATTTCCAGAAACCAGCCTGAGGCGACTCAGACAACACAGCCTGAATGGCATCGATAACGACCTTATCT
TTTTCTCGCCAGTCAGTACCGCGCCGATAAAAAGAGGCCCGCGATAAAGAGGTCAGTTTGCAAGCTGTAATAACCGACAG
GCCTGAAGCCGTTAATAATCCGGCACAGGACTTTTTCTCGGTTATTACCAACCCTTTTTTGCAAAAAGCTCCTTCATGGC
ATGATTTTCAAGACTGACTTCAGCAAACAATTTTTTGAGACGAGTATTTTCCTCTTCAAGCTCTTTTAATCGTTTAACAT
CATTAGCTTCCATCCCACCATACTTAGATTTCCAATTGTAGTAGGTGGCACTGCTAATTCCGTTCTGACGGCAAATTTCT
TCAACTTTTATACCTGAATCAGCCAGTTTCAGGATATTAACGATCTGAGTTTCAGTAAAGCGTGCTTTTTTCATTTTGAC
CTCTGCAAAGGTAAAGGCAGAAAATCTAATTATAGCTGTCTCATTTTAGGGGGAGTGGACGTCCACTCCCCGTTATCTGA
GACAGTCCTGATTAGAGTTTTCTAGCTGTTTTCGGTATGTCTCCGGGGGAAGATGCCCCAGACTTTCATGTGTACGATTC
AGGTTATAATCCTGTTGCCAGAACCAAGCCATTTCTCTGACTTGGCTTAATGATTCAAACAAGTAAGCATTCAAAAACTC
CCGACGAAATGAGCCATTAAAACGCTCAATAAATCCGTTTTGTTGTGGTTTACCCGGCTGAATATGGCACAGTGTAATGT
GATTATATTCGCAGTAATTCAATAAGTTAACTGATATCAGTTCTGGGCCATTATCAACTCTGAGTTGCTGAGGTAACCCA
CGTTCTTTTTTCAGTCGATCAAGTACACGGATAACCCGGTCTGTCGGTAAAGAAGTATCAACTTCAATGGCCAGACATTC
ACGCGTTCCTTCGTCAATTATATTTAGTGTTCTGAACCGCTTACCGCAGTAAAGAGCATCATGCATAAAGTCCAGTGCCC
ACTGAATATCAGGCAGATTCACAATTGATAACGGCTTATTCTCACGTTTTGGCAGCGTTTTTTTGATCCTGCGTTTAAGG
TTTAAACCTAACCGGCAATACACACGATAAACACGTTTATGGTTGAAGATAAACCCTTTAAATCTGAGACGATAATAGCA
TTTCCAGAAACCAGCCTGAGGCGACTCAGACAACACAGCCTGAATGGCATCGATAACGACCTTATCTTTTTCTCGCCAGT
CAGTACCGCGCCGATAAAAAGAGGCCCGCGATAAAGAGGTCAGTTTGCAAGCTGTAATAACCGACAGGCCTGAAGCCGTT
AATAATCCGGCACAGGACTTTTTCTCGGTTATTACCAACCCTTTTTTGCAAAAAGCTCCTTCATGGCATGATTTTCAAGA
CTGACTTCAGCAAACAATTTTTTGAGACGAGTATTTTCCTCTTCAAGCTCTTTTAATCGTTTAACATCATTAGCTTCCAT
CCCACCATACTTAGATTTCCAATTGTAGTAGGTGGCACTGCTAATTCCGTTCTGACGGCAAATTTCTTCAACTTTTATAC
CTGAATCAGCCAGTTTCAGGATATTAACGATCTGAGTTTCAGTAAAGCGTGCTTTTTTCATTTTGACCTCTGCAAAGGTA
AAGGCAGAAAATCTAATTATAGCTGTCTCATTTTAGGGGGAGTGGACGTCCACTCCCCGTTATCTGAGACAGTCCTGATT
AGAGTTTTCTAGCTGTTTTCGGTATGTCTCCGGGGGAAGATGCCCCAGACTTTCATGTGTACGATTCAGGTTATAATCCT
GTTGCCAGAACCAAGCCATTTCTCTGACTTGGCTTAATGATTCAAACAAGTAAGCATTCAAAAACTCCCGACGAAATGAG
CCATTAAAACGCTCAATAAATCCGTTTTGTTGTGGTTTACCCGGCTGAATATGGCACAGTGTAATGTGATTATATTCGCA
GTAATTCAATAAGTTAACTGATATCAGTTCTGGGCCATTATCAACTCTGAGTTGCTGAGGTAACCCACGTTCTTTTTTCA
GTCGATCAAGTACACGGATAACCCGGTCTGTCGGTAAAGAAGTATCAACTTCAATGGCCAGACATTCACGCGTTCCTTCG
TCAATTATATTTAGTGTTCTGAACCGCTTACCGCAGTAAAGAGCATCATGCATAAAGTCCAGTGCCCACTGAATATCAGG
CAGATTCACAATTGATAACGGCTTATTCTCACGTTTTGGCAGCGTTTTTTTGATCCTGCGTTTAAGGTTTAAACCTAACC
GGCAATACACACGATAAACACGTTTATGGTTGAAGATAAACCCTTTAAATCTGAGACGATAATAGCATTTCCAGAAACCA
GCCTGAGGCGACTCAGACAACACAGCCTGAATGGCATCGATAACGACCTTATCTTTTTCTCGCCAGTCAGTACCGCGCCG
ATAAAAAGAGGCCCGCGATAAAGAGGTCAGTTTGCAAGCTGTAATAACCGACAGGCCTGAAGCCGTTAATAATCCGGCAC
AGGACTTTTTCTCGGTTATTACCAACCCTTTTTTGCAAAAAGCTCCTTCATGGCATGATTTTCAAGACTGACTTCAGCAA
ACAATTTTTTGAGACGAGTATTTTCCTCTTCAAGCTCTTTTAATCGTTTAACATCATTAGCTTCCATCCCACCATACTTA
GATTTCCAATTGTAGTAGGTGGCACTGCTAATTCCGTTCTGACGGCAAATTTCTTCAACTTTTATACCTGAATCAGCCAG
TTTCAGGATATTAACGATCTGAGTTTCAGTAAAGCGTGCTTTTTTCATTTTGACCTCTGCAAAGGTAAAGGCAGAAAATC
TAATTATAGCTGTCTCATTTTAGGGGGAGTGGACGTCCACTCCCCGTTATCTGAGACAGTCCTGATTAGAGTTTTCTAGC
TGTTTTCGGTATGTCTCCGGGGGAAGATGCCCCAGACTTTCATGTGTACGATTCAGGTTATAATCCTGTTGCCAGAACCA
AGCCATTTCTCTGACTTGGCTTAATGATTCAAACAAGTAAGCATTCAAAAACTCCCGACGAAATGAGCCATTAAAACGCT
CAATAAATCCGTTTTGTTGTGGTTTACCCGGCTGAATATGGCACAGTGTAATGTGATTATATTCGCAGTAATTCAATAAG
TTAACTGATATCAGTTCTGGGCCATTATCAACTCTGAGTTGCTGAGGTAACCCACGTTCTTTTTTCAGTCGATCAAGTAC
ACGGATAACCCGGTCTGTCGGTAAAGAAGTATCAACTTCAATGGCCAGACATTCACGCGTTCCTTCGTCAATTATATTTA
GTGTTCTGAACCGCTTACCGCAGTAAAGAGCATCATGCATAAAGTCCAGTGCCCACTGAATATCAGGCAGATTCACAATT
GATAACGGCTTATTCTCACGTTTTGGCAGCGTTTTTTTGATCCTGCGTTTAAGGTTTAAACCTAACCGGCAATACACACG
ATAAACACGTTTATGGTTGAAGATAAACCCTTTAAATCTGAGACGATAATAGCATTTCCAGAAACCAGCCTGAGGCGACT
CAGACAACACAGCCTGAATGGCATCGATAACGACCTTATCTTTTTCTCGCCAGTCAGTACCGCGCCGATAAAAAGAGGCC
CGCGATAAAGAGGTCAGTTTGCAAGCTGTAATAACCGACAGGCCTGAAGCCGTTAATAATCCGGCACAGGACTTTTTCTC
GGTTATTACCAACCCTTTTTTGCAAAAAGCTCCTTCATGGCATGATTTTCAAGACTGACTTCAGCAAACAATTTTTTGAG
ACGAGTATTTTCCTCTTCAAGCTCTTTTAATCGTTTAACATCATTAGCTTCCATCCCACCATACTTAGATTTCCAATTGT
AGTAGGTGGCACTGCTAATTCCGTTCTGACGGCAAATTTCTTCAACTTTTATACCTGAATCAGCCAGTTTCAGGATATTA
ACGATCTGAGTTTCAGTAAAGCGTGCTTTTTTCATTTTGACCTCTGCAAAGGTAAAGGCAGAAAATCTAATTATAGCTGT
CTCATTTTAGGGGGAGTGGACGTCCACTCCCCGTTATCTGAGACAGTCCTGATTAGAGTTTTCTAGCTGTTTTCGGTATG
TCTCCGGGGGAAGATGCCCCAGACTTTCATGTGTACGATTCAGGTTATAATCCTGTTGCCAGAACCAAGCCATTTCTCTG
ACTTGGCTTAATGATTCAAACAAGTAAGCATTCAAAAACTCCCGACGAAATGAGCCATTAAAACGCTCAATAAATCCGTT
TTGTTGTGGTTTACCCGGCTGAATATGGCACAGTGTAATGTGATTATATTCGCAGTAATTCAATAAGTTAACTGATATCA
GTTCTGGGCCATTATCAACTCTGAGTTGCTGAGGTAACCCACGTTCTTTTTTCAGTCGATCAAGTACACGGATAACCCGG
TCTGTCGGTAAAGAAGTATCAACTTCAATGGCCAGACATTCACGCGTTCCTTCGTCAATTATATTTAGTGTTCTGAACCG
CTTACCGCAGTAAAGAGCATCATGCATAAAGTCCAGTGCCCACTGAATATCAGGCAGATTCACAATTGATAACGGCTTAT
TCTCACGTTTTGGCAGCGTTTTTTTGATCCTGCGTTTAAGGTTTAAACCTAACCGGCAATACACACGATAAACACGTTTA
TGGTTGAAGATAAACCCTTTAAATCTGAGACGATAATAGCATTTCCAGAAACCAGCCTGAGGCGACTCAGACAACACAGC
CTGAATGGCATCGATAACGACCTTATCTTTTTCTCGCCAGTCAGTACCGCGCCGATAAAAAGAGGCCCGCGATAAAGAGG
TCAGTTTGCAAGCTGTAATAACCGACAGGCCTGAAGCCGTTAATAATCCGGCACAGGACTTTTTCTCGGTTATTACCAAC
CCTTTTTTGCAAAAAGCTCCTTCATGGCATGATTTTCAAGACTGACTTCAGCAAACAATTTTTTGAGACGAGTATTTTCC
TCTTCAAGCTCTTTTAATCGTTTAACATCATTAGCTTCCATCCCACCATACTTAGATTTCCAATTGTAGTAGGTGGCACT
GCTAATTCCGTTCTGACGGCAAATTTCTTCAACTTTTATACCTGAATCAGCCAGTTTCAGGATATTAACGATCTGAGTTT
CAGTAAAGCGTGCTTTTTTCATTTTGACCTCTGCAAAGGTAAAGGCAGAAAATCTAATTATAGCTGTCTCATTTTAGGGG
GAGTGGACGTCCACTCCCCGTTATCTGAGACAGTCCTGATTAGAGTTTTCTAGCTGTTTTCGGTATGTCTCCGGGGGAAG
ATGCCCCAGACTTTCATGTGTACGATTCAGGTTATAATCCTGTTGCCAGAACCAAGCCATTTCTCTGACTTGGCTTAATG
ATTCAAACAAGTAAGCATTCAAAAACTCCCGACGAAATGAGCCATTAAAACGCTCAATAAATCCGTTTTGTTGTGGTTTA
CCCGGCTGAATATGGCACAGTGTAATGTGATTATATTCGCAGTAATTCAATAAGTTAACTGATATCAGTTCTGGGCCATT
ATCAACTCTGAGTTGCTGAGGTAACCCACGTTCTTTTTTCAGTCGATCAAGTACACGGATAACCCGGTCTGTCGGTAAAG
AAGTATCAACTTCAATGGCCAGACATTCACGCGTTCCTTCGTCAATTATATTTAGTGTTCTGAACCGCTTACCGCAGTAA
AGAGCATCATGCATAAAGTCCAGTGCCCACTGAATATCAGGCAGATTCACAATTGATAACGGCTTATTCTCACGTTTTGG
CAGCGTTTTTTTGATCCTGCGTTTAAGGTTTAAACCTAACCGGCAATACACACGATAAACACGTTTATGGTTGAAGATAA
ACCCTTTAAATCTGAGACGATAATAGCATTTCCAGAAACCAGCCTGAGGCGACTCAGACAACACAGCCTGAATGGCATCG
ATAACGACCTTATCTTTTTCTCGCCAGTCAGTACCGCGCCGATAAAAAGAGGCCCGCGATAAAGAGGTCAGTTTGCAAGC
TGTAATAACCGA
>aldovae3 count=63
ACCCGAATCACTTACCTGAGTAAGCTCATCGGGATTAATGAGCCTCATCCTTGAGGCTCACCCTGCGGGCTAGCATAAAT
GCTGTTCAAATCGGTTCCCGACCGATTTGTCACTCGCTTGCCGCCTTCCTGCAACTCGAATTATTTTGGGTATA
>aldovae4 count=41
ATACCCTTCATCCTTGAAGTTGCAGGGGTGTTGGCTGCGCTCGTTACTCGGCCCGTCCATGGGCCTCGCCTCTGCGAGGC
CGCTGCAAGCAGCGTTCAAATCTGCTCCCGACAGATTTGTCACCCGAATCACTTACTTGTGTAAGCTCATCGGGATT
>aldovae5 count=25
TGAGGTTAATGACAAAGTGCCCGCAGCGGTGAAAACAGGCAGATCGTAAAGACGCCGTAAACACATCCCTGGAGGCTCGA
GCCGCGCCATCCCTGGCGCGGACGCTTTACTCTTCTGCCTGTCCTCACCTTGCAAGATCGAGTCGTCGGGGTTTGTCAGC
AGTCTG
>aldovae11 count=17
TGGTGCGGGCAACGAGACTCGAACTCGCGACCCCCAGCTTAGCAAGCAGGTGCTCTACCAACTGAGCTATGCCCGCA
>mollaretii1 count=33
CTGAGGTTATTGACAAAGTGCCAGCGGCGGAGAGAACAGGCGGATCGTAAAGACGCCGTAAATCCATCCTTGGAGGCTCG
AGCCGCGCCATCCTTGGCGCGGACGCTTTACTCTTCTGCCTGTCCTCACCGCTTCAGATCGAGTCAT
>mollaretii7 count=27
GGTTTCGGTTGATATTCGTTTAGGTGATCGGTTCGGTTGACATAAGATCGGTGCTCACTTGGCCTCCGATGCCTCAACCG
CCAAAGAGGTTACAAGCGCGTGACCCCTTTGGAATCCCCGCGCTAGCGCACGTATCGCTTGCCCACTTCGTGGGTACCCT
C
>mollaretii9 count=18
AGGCGTAATTGGCGCAGACAGTTTGGACGCGGACAGCGCGCAAAAACCGGAGCGTACACGTAGTACGTGAGGATTCTGAG
CACTGCCCAAGCCCAAAATGGCAAGCAAAATAGCC
>mollaretii10 count=27
TGCAGGAAGGCGGCAAACGAAAGAATCCCGATGAGCTTACACAAGTAAGTGATTCGGGTGACAAATCTGCCGGGAGCAGA
TTTGAACGCTGCTTGCAGCGGCCTCGCAGAGGCGAGGCCCATGGATGGGCCGAGTAACGAGTGCAGCCAACACACCTGCA
GCTTGAAAGATGAAGG
>mollaretii11 count=16
ATCTCTTTGGCCGCATGGGCACTGCGTTGTGCCAAATTACGCACTTCACCAGCGACCACAGCAAAACCACGGCCCTGCTC
ACCTGCTCGCGCCGCTTCAACAGCGGCATTCAATGCCAGAATATT
>fredriksenii2 count=18
TAGCTCAGTTGGTAGAGCACCTGACTTGTAATCAGTGGGTCGCGAGTTCGAGTCCCGTAGCCCGCACCAA
>fredriksenii3 count=31
GAACGCCCGGAAGCCAGCGGTTACAGCGCCAACCTGGCAGCAGCCAACAATATGTTTGTCACTCGTTTGCATGACCGTCT
GGGAGAAACCCAATATATCGATGCACTGACTGGTGAGCAGAAAGTCACCAGCATGTGGCTGCGTAATGAAGGCGGGCATA
ACCGTTCGCGTGATACTCAGGACCAACTGCGCACCCAAAGTAACCGCTATGTGATGCAACTGGGGGGCGATATTGCCCAG
TGGAGCAACAATGGTGCAGACCGTTTCCATCTGGGGCTGATGGCCGGTTACGGTAACAGCAAAAGCACCACCGAATCACG
TTTATCCGGTTACAGCGCCCGCGCCTCGGTAGATGGCTACAGCACCGGTGTCTATGGCACCTGGTATGCCAATAGCGCCG
ATAAAACCGGCCTGTATGTGGACAGCTGGGCGCAATACAGCTGGTTCAATAACACTGTTAATGGTCAGGGTTTGGCGGCT
GAAGAGTACAAATCCAAGGGGGTCACCGCCTCGGTTGAAAGCGGCTATACCTTTAAAGTGGGCGAAAATACCGCGAAAAA
TGCCACATACTTTATCCAGCCGAAAGCGCAGGTCACCTGGATGGGGGTCAAAGCTGATGACCATAAAGAAGCCAATGGCA
CACATGTTTCAGGTGAAGGCAATGGCAATATCCAGACCCGTCTAGGGGTGAAAGCCTTTATGAACGGCTACGCCGAACAG
GATAAAGGCAAAGACCGGGTATTCCAGCCGTTTGTCGAAGCCAACTGGATCCACAACACCAAAGACTTTGGTACCACCAT
GGATGGCATTACGGTGAAACAGGCCGGTGCGGCAAATATTGGTGAACTGAAACTCGGTGTGGAAGGCCAAATTAATAAGA
AAGTCAATCTCTGGGGCAATGTCGGCCAGCAAGTCGGTAATAAAGGCTACAGTGATACCGCAGTGATGTTAGGGGTTAAA
TACAACTTCTAAT
>fredriksenii4 count=17
GTAGTCATAGGCCCCGGCAACGATACGCCCCTGCTGAGTAAACTCACCCTCAGACACGCCGCCCACCGTGATCAGCTCAA
TGCCGTTCAGGGTCTGTGCGCCACTGCCGCCCAGATTATTGACCTGCACATAGGTGTTACCGGAGGTATTGCCAGCGATA
ATCAGTTTATCGGTAACAGAATCGTCATCTTCCAGCACCGAATTCATATTCAGCAGGCCATTGTTGCCGGTATAATCGCC
GGTAATGGTCAGGGTGTTGCCGCCAATCCCGCCAAAGTTCACAGCCCCGCCATTATTGAGGGACTGAAGTGTCTG
>fredriksenii9 count=11
CGGTAACCGGCGGGGCGACCTTTGCGGGCGGCACCCAGACCAGTCTGGCGGTGACGACTGACAACAACGGCGTGGCGACA
GCCAATCTGGTGAGTCTGGTGGCGGGCGACCACCCTGTCACAGCGACCGTGGGAACCAATACTACGGCGGCGAAGAACGC
GACCTTTATTGCCGATGAAACCACCGCGGTCATTGCCGCGACCGACTTTACCGTTGCCAGCGGAGCCGTGGCTGATAATA
GTGCCACTAATGCGTTAAGTGCCACAGTGAAAGATGCCGGGGGCAATACCGTGCCGAATGTCAGTGTCACTTTTGCGGTA
ACCGGCGGGGCGACCTTTGCGGGCGGCACCCAAACCAGTCTGGCGGTGACTACTGACAACAACGGGGTGGCGACAGCCAA
TCTGGTGAGTCTGGTGGCCGGCGATCATCCGGTCACGGCCACCGTGGGGACCAACACCACGGCGGCGAAGAACTCGACCT
TTATTGCCGATGAAACCACCGCCGTGATTGCGGCGACCGACTTTACCGTTGCCAGCGGAGCCGTGGCTGATAATAGTGCC
ACTAATGCACTAAGTGCCACGGTGAAAGACAGCCATGGCAATACCGTGCCGAACGTCAGTGTCACCTTTGC
>fredriksenii13 count=11
ACCGATGCACAAGGAAACCGAGTACCGAATCTGCTGGTGAATTTCAGTGCCAATAATGGTGCGGTGATTGCTGCCAGTGG
CACCACCGGTACCGATGGTTCGGTGAC
>bercovieri1 count=49
TGACAAACCCCGATGACTCGATCAGGAACGGTGAGGACAGGCAGAAGAGTAAAGCGTCCGCGCCAGGGATGGCGCGGCTC
GAGCCTCCAGGGATGGATTTACGGCGTCTTTACGATCTGCCTGTTCTCTCCGCTGCGGGCACTTTGTCAATAACCTCA
>bercovieri2 count=11
TATCCTCCGGCATAGCCGGAGGTTTTTCATATGCGCCTATAAGGCTCTCTTACCTGCCGCGCCCTAACAGGCGCATCGCG
ATCTGACATTTGCATCACAATTCGTTACTTACGGCCCGTAAACGGGCTACCCGGATAGGGGATCGACAACTGCTCACCCA
TTTTATCCTCTTCCAGTTGGTGTTTTATGTATTCTTGTATCCTGGCTGTATTTTTACCCACCGTATCAACGTAATACCCT
CGACACCAAAACTCCCGGTTACGATATTTGAACTTCAAATCGCCAAACTGCTCATAAAGCATCAGGCTACTCTTTCCCTT
TAGGTATCCCATAAAGCCCGACACACTCATCTTGGGTGGGATTTCCAGAAGCATATGGATGTGATCCACACAACATTCCG
CTTCCAGAATATTCACGTTCTTCCATTCGCACAGCTTTCTTAAAATACTGCCAATCGCTTTGCGCTTTTCCCCGTAGAAC
ACCTGTCTTCGGTACTTCGGCGCAAATACTATGTGATATTTACAGTTCCATCGCGTGTGCGCTAAGCTCTTTTCGTCCCT
CATAGGGACCCCCTTTTGATTTCTTGTTGAACGTTTGCAGTTGCCAGACCGCAAACTGTTTTAACAAATCAAAAGGGGTT
TTTATAACTGACTCAAAGCTGAAAGCTTTACGGAACCCCCAGCCTAGCTGGGGGTTTTCTGTGCTATCCTCCGGCATAGC
CGGAGGTTTTTCATATGCGCCTATAAGGCTCTCTTACCTGCCGCGCCCTAACAGGCGCATCGCGATCTGACATTTGCATC
ACAATTCGTTACTTACGGCCCGTAAACGGGCTACCCGGATAGGGGATCGACAACTGCTCACCCATTTTATCCTCTTCCAG
TTGGTGTTTTATGTATTCTTGTATCCTGGCTGTATTTTTACCCACCGTATCAACGTAATACCCTCGACACCAAAACTCCC
GGTTACGATATTTGAACTTCAAATCGCCAAACTGCTCATAAAGCATCAGGCTACTCTTTCCCTTTAGGTATCCCATAAAG
CCCGACACACTCATCTTGGGTGGGATTTCCAGAAGCATATGGATGTGATCCACACAACATTCCGCTTCCAGAATATTCAC
GTTCTTCCATTCGCACAGCTTTCTTAAAATACTGCCAATCGCTTTGCGCTTTTCCCCGTAGAACACCTGTCTTCGGTACT
TCGGCGCAAATACTATGTGATATTTACAGTTCCATCGCGTGTGCGCTAAGCTCTTTTCGTCCCTCATAGGGACCCCCTTT
TGATTTCTTGTTGAACGTTTGCAGTTGCCAGACCGCAAACTGTTTTAACAAATCAAAAGGGGTTTTTATAACTGACTCAA
AGCTGAAAGCTTTACGGAACCCCCAGCCTAGCTGGGGGTTTTCTGTGCTATCCTCCGGCATAGCCGGAGGTTTTTCATAT
GCGCCTATAAGGCTCTCTTACCTGCCGCGCCCTAACAGGCGCATCGCGATCTGACATTTGCATCACAATTCGTTACTTAC
GGCCCGTAAACGGGCTACCCGGATAGGGGATCGACAACTGCTCACCCATTTTATCCTCTTCCAGTTGGTGTTTTATGTAT
TCTTGTATCCTGGCTGTATTTTTACCCACCGTATCAACGTAATACCCTCGACACCAAAACTCCCGGTTACGATATTTGAA
CTTCAAATCGCCAAACTGCTCATAAAGCATCAGGCTACTCTTTCCCTTTAGGTATCCCATAAAGCCCGACACACTCATCT
TGGGTGGGATTTCCAGAAGCATATGGATGTGATCCACACAACATTCCGCTTCCAGAATATTCACGTTCTTCCATTCGCAC
AGCTTTCTTAAAATACTGCCAATCGCTTTGCGCTTTTCCCCGTAGAACACCTGTCTTCGGTACTTCGGCGCAAATACTAT
GTGATATTTACAGTTCCATCGCGTGTGCGCTAAGCTCTTTTCGTCCCTCATAGGGACCCCCTTTTGATTTCTTGTTGAAC
GTTTGCAGTTGCCAGACCGCAAACTGTTTTAACAAATCAAAAGGGGTTTTTATAACTGACTCAAAGCTGAAAGCTTTACG
GAACCCCCAGCCTAGCTGGGGGTTTTCTGTGCTATCCTCCGGCATAGCCGGAGGTTTTTCATATGCGCCTATAAGGCTCT
CTTACCTGCCGCGCCCTAACAGGCGCATCGCGATCTGACATTTGCATCACAATTCGTTACTTACGGCCCGTAAACGGGCT
ACCCGGATAGGGGATCGACAACTGCTCACCCATTTTATCCTCTTCCAGTTGGTGTTTTATGTATTCTTGTATCCTGGCTG
TATTTTTACCCACCGTATCAACGTAATACCCTCGACACCAAAACTCCCGGTTACGATATTTGAACTTCAAATCGCCAAAC
TGCTCATAAAGCATCAGGCTACTCTTTCCCTTTAGGTATCCCATAAAGCCCGACACACTCATCTTGGGTGGGATTTCCAG
AAGCATATGGATGTGATCCACACAACATTCCGCTTCCAGAATATTCACGTTCTTCCATTCGCACAGCTTTCTTAAAATAC
TGCCAATCGCTTTGCGCTTTTCCCCGTAGAACACCTGTCTTCGGTACTTCGGCGCAAATACTATGTGATATTTACAGTTC
CATCGCGTGTGCGCTAAGCTCTTTTCGTCCCTCATAGGGACCCCCTTTTGATTTCTTGTTGAACGTTTGCAGTTGCCAGA
CCGCAAACTGTTTTAACAAATCAAAAGGGGTTTTTATAACTGACTCAAAGCTGAAAGCTTTACGGAACCCCCAGCCTAGC
TGGGGGTTTTCTGTGCTATCCTCCGGCATAGCCGGAGGTTTTTCATATGCGCCTATAAGGCTCTCTTACCTGCCGCGCCC
TAACAGGCGCATCGCGATCTGACATTTGCATCACAATTCGTTACTTACGGCCCGTAAACGGGCTACCCGGATAGGGGATC
GACAACTGCTCACCCATTTTATCCTCTTCCAGTTGGTGTTTTATGTATTCTTGTATCCTGGCTGTATTTTTACCCACCGT
ATCAACGTAATACCCTCGACACCAAAACTCCCGGTTACGATATTTGAACTTCAAATCGCCAAACTGCTCATAAAGCATCA
GGCTACTCTTTCCCTTTAGGTATCCCATAAAGCCCGACACACTCATCTTGGGTGGGATTTCCAGAAGCATATGGATGTGA
TCCACACAACATTCCGCTTCCAGAATATTCACGTTCTTCCATTCGCACAGCTTTCTTAAAATACTGCCAATCGCTTTGCG
CTTTTCCCCGTAGAACACCTGTCTTCGGTACTTCGGCGCAAATACTATGTGATATTTACAGTTCCATCGCGTGTGCGCTA
AGCTCTTTTCGTCCCTCATAGGGACCCCCTTTTGATTTCTTGTTGAACGTTTGCAGTTGCCAGACCGCAAACTGTTTTAA
CAAATCAAAAGGGGTTTTTATAACTGACTCAAAGCTGAAAGCTTTACGGAACCCCCAGCCTAGCTGGGGGTTTTCTGTGC
TATCCTCCGGCATAGCCGGAGGTTTTTCATATGCGCCTATAAGGCTCTCTTACCTGCCGCGCCCTAACAGGCGCATCGCG
ATCTGACATTTGCATCACAATTCGTTACTTACGGCCCGTAAACGGGCTACCCGGATAGGGGATCGACAACTGCTCACCCA
TTTTATCCTCTTCCAGTTGGTGTTTTATGTATTCTTGTATCCTGGCTGTATTTTTACCCACCGTATCAACGTAATACCCT
CGACACCAAAACTCCCGGTTACGATATTTGAACTTCAAATCGCCAAACTGCTCATAAAGCATCAGGCTACTCTTTCCCTT
TAGGTATCCCATAAAGCCCGACACACTCATCTTGGGTGGGATTTCCAGAAGCATATGGATGTGATCCACACAACATTCCG
CTTCCAGAATATTCACGTTCTTCCATTCGCACAGCTCTCTTAAAATACTGCCAATAGCGATCCGCTTGTCCCCATAGAAC
ACCTGCCTTCGGTACTTCGGCGCAAACACAAAACCACAAGCAAAGCTCCAGCGCCCGTAAACCAAGCACCTTTCAGCCAG
CAAAAAAAACCCCTTTTGATTTCTTAATAAACGTTGCCGTCTGCCAACTGCAAACGTTCAAAAAAAAACAAAAGGGGGTC
CCAAAAAGGAACCAAAACAGAAAACCGCACACAACACCCAACCGAAAAGAGCACATACTATGCGCGCCGAAGTACCGAAG
ACAGGTGTTCTACGGGGAAAAGCGCAAAGCGATTGGCAGTATTTTAAGAAAGCTGTGCGAATGGAAGAACGTGAATATTC
TGGAAGCGGAATGTTGTGTGGATCACATCCATATGCTTCTGGAAATCCCACCCAAGATGAGTGTGTCGGGCTTTATGGGA
TACCTAAAGGGAAAGAGTAGCCTGATGCTTTATGAGCAGTTTGGCGATTTGAAGTTCAAATATCGTAACCGGGAGTTTTG
GTGTCGAGGGTATTACGTTGATACGGTGGGTAAAAATACAGCCAGGATACAAGAATACATAAAACACCAACTGGAAGAGG
ATAAAATGGGTGAGCAGTTGTCGATCCCCTATCCGGGTAGCCCGTTTACGGGCCGTAAGTAACGAATTGTGATGCAAATG
TCAGATCGCGATGCGCCTGTTAGGGCGCGGCAGGTAAGAGAGCCTTATAGGCGCATATGAAAAACCTCCGGCTATGCCGG
AGGATAGCACAGAAAACCCCCAGCTAGGCTGGGGGTTCCGTAAAGCTTTCAGCTTTGAGTCAGTTATAAAAACCCCTTTT
GATTTGTTAAAACAGTTTGCGGTCTGGCAACTGCAAACGTTCAACAAGAAATCAAAAGGGGGTCCCTATGAGGGACGAAA
AGAGCTTAGCGCACACGCGATGGAACTGTAAATATCACATAGTATTTGCGCCGAAGTACCGAAGACAGGTGTTCTACGGG
GAAAAGCGCAAAGCGATTGGCAGTATTTTAAGAAAGCTGTGCGAATGGAAGAACGTGAATATTCTGGAAGCGGAATGTTG
TGTGGATCACATCCATATGCTTCTGGAAATCCCACCCAAGATGAGTGTGTCGGGCTTTATGGGATACCTAAAGGGAAAGA
GTAGCCTGATGCTTTATGAGCAGTTTGGCGATTTGAAGTTCAAATATCGTAACCGGGAGTTTTGGTGTCGAGGGTATTAC
GTTGATACGGTGGGTAAAAATACAGCCAGGATACAAGAATACATAAAACACCAACTGGAAGAGGATAAAATGGGTGAGCA
GTTGTCGATCCCCTATCCGGGTAGCCCGTTTACGGGCCGTAAGTAACGAATTGTGATGCAAATGTCAGATCGCGATGCGC
CTGTTAGGGCGCGGCAGGTAAGAGAGCCTTATAGGCGCATATGAAAAACCTCCGGCTATGCCGGAGGATAGCACAGAAAA
CCCCCAGCTAGGCTGGGGGTTCCGTAAAGCTTTCAGCTTTGAGTCAGTTATAAAAACCCCTTTTGATTTGTTAAAACAGT
TTGCGGTCTGGCAACTGCAAACGTTCAACAAGAAATCAAAAGGGGGTCCCTATGAGGGACGAAAAGAGCTTAGCGCACAC
GCGATGGAACTGTAAATATCACATAGTATTTGCGCCGAAGTACCGAAGACAGGTGTTCTACGGGGAAAAGCGCAAAGCGA
TTGGCAGTATTTTAAGAAAGCTGTGCGAATGGAAGAACGTGAATATTCTGGAAGCGGAATGTTGTGTGGATCACATCCAT
ATGCTTCTGGAAATCCCACCCAAGATGAGTGTGTCGGGCTTTATGGGATACCTAAAGGGAAAGAGTAGCCTGATGCTTTA
TGAGCAGTTTGGCGATTTGAAGTTCAAATATCGTAACCGGGAGTTTTGGTGTCGAGGGTATTACGTTGATACGGTGGGTA
AAAATACAGCCAGGATACAAGAATACATAAAACACCAACTGGAAGAGGATAAAATGGGTGAGCAGTTGTCGATCCCCTAT
CCGGGTAGCCCGTTTACGGGCCGTAAGTAACGAATTGTGATGCAAATGTCAGATCGCGATGCGCCTGTTAGGGCGCGGCA
GGTAAGAGAGCCTTATAGGCGCATATGAAAAACCTCCGGCTATGCCGGAGGATAGCACAGAAAACCCCCAGCTAGGCTGG
GGGTTCCGTAAAGCTTTCAGCTTTGAGTCAGTTATAAAAACCCCTTTTGATTTGTTAAAACAGTTTGCGGTCTGGCAACT
GCAAACGTTCAACAAGAAATCAAAAGGGGGTCCCTATGAGGGACGAAAAGAGCTTAGCGCACACGCGATGGAACTGTAAA
TATCACATAGTATTTGCGCCGAAGTACCGAAGACAGGTGTTCTACGGGGAAAAGCGCAAAGCGATTGGCAGTATTTTAAG
AAAGCTGTGCGAATGGAAGAACGTGAATATTCTGGAAGCGGAATGTTGTGTGGATCACATCCATATGCTTCTGGAAATCC
CACCCAAGATGAGTGTGTCGGGCTTTATGGGATACCTAAAGGGAAAGAGTAGCCTGATGCTTTATGAGCAGTTTGGCGAT
TTGAAGTTCAAATATCGTAACCGGGAGTTTTGGTGTCGAGGGTATTACGTTGATACGGTGGGTAAAAATACAGCCAGGAT
ACAAGAATACATAAAACACCAACTGGAAGAGGATAAAATGGGTGAGCAGTTGTCGATCCCCTATCCGGGTAGCCCGTTTA
CGGGCCGTAAGTAACGAATTGTGATGCAAATGTCAGATCGCGATGCGCCTGTTAGGGCGCGGCAGGTAAGAGAGCCTTAT
AGGCGCATATGAAAAACCTCCGGCTATGCCGGAGGATAGCACAGAAAACCCCCAGCTAGGCTGGGGGTTCCGTAAAGCTT
TCAGCTTTGAGTCAGTTATAAAAACCCCTTTTGATTTGTTAAAACAGTTTGCGGTCTGGCAACTGCAAACGTTCAACAAG
AAATCAAAAGGGGGTCCCTATGAGGGACGAAAAGAGCTTAGCGCACACGCGATGGAACTGTAAATATCACATAGTATTTG
CGCCGAAGTACCGAAGACAGGTGTTCTACGGGGAAAAGCGCAAAGCGATTGGCAGTATTTTAAGAAAGCTGTGCGAATGG
AAGAACGTGAATATTCTGGAAGCGGAATGTTGTGTGGATCACATCCATATGCTTCTGGAAATCCCACCCAAGATGAGTGT
GTCGGGCTTTATGGGATACCTAAAGGGAAAGAGTAGCCTGATGCTTTATGAGCAGTTTGGCGATTTGAAGTTCAAATATC
GTAACCGGGAGTTTTGGTGTCGAGGGTATTACGTTGATACGGTGGGTAAAAATACAGCCAGGATACAAGAATACATAAAA
CACCAACTGGAAGAGGATAAAATGGGTGAGCAGTTGTCGATCCCCTATCCGGGTAGCCCGTTTACGGGCCGTAAGTAACG
AATTGTGATGCAAATGTCAGATCGCGATGCGCCTGTTAGGGCGCGGCAGGTAAGAGAGCCTTATAGGCGCATATGAAAAA
CCTCCGGCTATGCCGGAGGATAGCACAGAAAACCCCCAGCTAGGCTGGGGGTTCCGTAAAGCTTTCAGCTTTGAGTCAGT
TATAAAAACCCCTTTTGATTTGTTAAAACAGTTTGCGGTCTGGCAACTGCAAACGTTCAACAAGAAATCAAAAGGGGGTC
CCTATGAGGGACGAAAAGAGCTTAGCGCACACGCGATGGAACTGTAAATATCACATAGTATTTGCGCCGAAGTACCGAAG
ACAGGTGTTCTACGGGGAAAAGCGCAAAGCGATTGGCAGTATTTTAAGAAAGCTGTGCGAATGGAAGAACGTGAATATTC
TGGAAGCGGAATGTTGTGTGGATCACATCCATATGCTTCTGGAAATCCCACCCAAGATGAGTGTGTCGGGCTTTATGGGA
TACCTAAAGGGAAAGAGTAGCCTGATGCTTTATGAGCAGTTTGGCGATTTGAAGTTCAAATATCGTAACCGGGAGTTTTG
GTGTCGAGGGTATTACGTTGATACGGTGGGTAAAAATACAGCCAGGATACAAGAATACATAAAACACCAACTGGAAGAGG
ATAAAATGGGTGAGCAGTTGTCGATCCCCTATCCGGGTAGCCCGTTTACGGGCCGTAAGTAACGAATTGTGATGCAAATG
TCAGATCGCGATGCGCCTGTTAGGGCGCGGCAGGTAAGAGAGCCTTATAGGCGCATATGAAAAACCTCCGGCTATGCCGG
AGGATA
>bercovieri5 count=10
CGTATTGCTCTTTAACAATCTGGAACAAGCTGAAAATTGAAACAATACAGCTGAAACTTATCTCTCCGTAGATGTACTGA
GATAAGGAGTAGCCTGTATTAGAGTCTCTCAAATAATCGCAACGCAAGGGTCTGCAAAGACACCTTCGGGTTGTGAGGTT
AAGCGACTAAGCGTACACGGTGGATGCCTAGGCAGTCAGAGGCGATGAAGGGCGTGCTAATCTGCGAAAAGCGTCGGTAA
GCTGATATGAAGCGTTACAACCGACGATACCCGAATGGGGAAACCCAGTGCAATTCGTTGCACTATTGCATGGTGAATAC
ATAGCCATGCAAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTA
GCGGCGAGCGAACGGGGAAGAGCCCAGAGTCTGAATCAGTTTATGTGTTAGTGGAAGCGTCTGGAAAGTCGCACGGTACA
GGGTGATAGTCCCGTACACAAAAATGCATATTCTGTGAACTCGATGAGTAGGGCGGGACACGTGACATCCTGTCTGAATA
TGGGGGGACCATCCTCCAAGGCTAAATACTCCTGACTGACCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAAAAGAAC
CCCGGCGAGGGGAGTGAAATAGAACCTGAAACCGTGTACGTACAAGCAGTGGGAGCACCTTCGTGGTGTGACTGCGTACC
TTTTGTATAATGGGTCAGCGACTTATATTTTGTAGCAAGGTTAACCGAATAGGGGAGCCGTAGGGAAACCGAGTCTTAAC
TGGGCGTCTAGTTGCAAGGTATAGACCCGAAACCCGGTGATCTAGCCATGGGCAGGTTGAAGGTTGGGTAACACTAACTG
GAGGACCGAACCGACTAATGTTGAAAAATTAGCGGATGACTTGTGGCTGGGGGTGAAAGGCCAATCAAACCGGGAGATAG
CTGGTTCTCCCCGAAAGCTATTTAGGTAGCGCCTCGTGAACTCATCTTCGGGGGTAGAGCACTGTTTCGGCTAGGGGGTC
ATCCCGACTTACCAAACCGATGCAAACTCCGAATACCGAAGAATGTTATCACGGGAGACACACGGCGGGTGCTAACGTCC
GTCGTGAAGAGGGAAACAACCCAGACCGCCAGCTAAGGTCCCAAAGTCATGGTTAAGTGGGAAACGATGTGGGAAGGCAC
AGACAGCCAGGATGTTGGCTTAGAAGCAGCCATCATTTAAAGAAAGCGTAATAGCTCACTGGTCGAGTCGGCCTGCGCGG
AAGATGTAACGGGGCTAAACCATGCACCGAAGCTGCGGCAGCGTATTGCTCTTTAACAATCTGGAACAAGCTGAAAATTG
AAACAATACAGCTGAAACTTATCTCTCCGTAGATGTACTGAGATAAGGAGTAGCCTGTATTAGAGTCTCTCAAATAATCG
CAACGCAAGGGTCTGCAAAGACACCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGATGCCTAGGCAGTCAG
AGGCGATGAAGGGCGTGCTAATCTGCGAAAAGCGTCGGTAAGCTGATATGAAGCGTTACAACCGACGATACCCGAATGGG
GAAACCCAGTGCAATTCGTTGCACTATTGCATGGTGAATACATAGCCATGCAAGGCGAACCGGGGGAACTGAAACATCTA
AGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAAGAGCCCAGAGTCTGAATCAG
TTTATGTGTTAGTGGAAGCGTCTGGAAAGTCGCACGGTACAGGGTGATAGTCCCGTACACAAAAATGCATATTCTGTGAA
CTCGATGAGTAGGGCGGGACACGTGACATCCTGTCTGAATATGGGGGGACCATCCTCCAAGGCTAAATACTCCTGACTGA
CCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAATAGAACCTGAAACCGTGTAC
GTACAAGCAGTGGGAGCACCTTCGTGGTGTGACTGCGTACCTTTTGTATAATGGGTCAGCGACTTATATTTTGTAGCAAG
GTTAACCGAATAGGGGAGCCGTAGGGAAACCGAGTCTTAACTGGGCGTCTAGTTGCAAGGTATAGACCCGAAACCCGGTG
ATCTAGCCATGGGCAGGTTGAAGGTTGGGTAACACTAACTGGAGGACCGAACCGACTAATGTTGAAAAATTAGCGGATGA
CTTGTGGCTGGGGGTGAAAGGCCAATCAAACCGGGAGATAGCTGGTTCTCCCCGAAAGCTATTTAGGTAGCGCCTCGTGA
ACTCATCTTCGGGGGTAGAGCACTGTTTCGGCTAGGGGGTCATCCCGACTTACCAAACCGATGCAAACTCCGAATACCGA
AGAATGTTATCACGGGAGACACACGGCGGGTGCTAACGTCCGTCGTGAAGAGGGAAACAACCCAGACCGCCAGCTAAGGT
CCCAAAGTCATGGTTAAGTGGGAAACGATGTGGGAAGGCACAGACAGCCAGGATGTTGGCTTAGAAGCAGCCATCATTTA
AAGAAAGCGTAATAGCTCACTGGTCGAGTCGGCCTGCGCGGAAGATGTAACGGGGCTAAACCATGCACCGAAGCTGCGGC
AGCGTATTGCTCTTTAACAATCTGGAACAAGCTGAAAATTGAAACAATACAGCTGAAACTTATCTCTCCGTAGATGTACT
GAGATAAGGAGTAGCCTGTATTAGAGTCTCTCAAATAATCGCAACGCAAGGGTCTGCAAAGACACCTTCGGGTTGTGAGG
TTAAGCGACTAAGCGTACACGGTGGATGCCTAGGCAGTCAGAGGCGATGAAGGGCGTGCTAATCTGCGAAAAGCGTCGGT
AAGCTGATATGAAGCGTTACAACCGACGATACCCGAATGGGGAAACCCAGTGCAATTCGTTGCACTATTGCATGGTGAAT
ACATAGCCATGCAAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAG
TAGCGGCGAGCGAACGGGGAAGAGCCCAGAGTCTGAATCAGTTTATGTGTTAGTGGAAGCGTCTGGAAAGTCGCACGGTA
CAGGGTGATAGTCCCGTACACAAAAATGCATATTCTGTGAACTCGATGAGTAGGGCGGGACACGTGACATCCTGTCTGAA
TATGGGGGGACCATCCTCCAAGGCTAAATACTCCTGACTGACCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAAAAGA
ACCCCGGCGAGGGGAGTGAAATAGAACCTGAAACCGTGTACGTACAAGCAGTGGGAGCACCTTCGTGGTGTGACTGCGTA
CCTTTTGTATAATGGGTCAGCGACTTATATTTTGTAGCAAGGTTAACCGAATAGGGGAGCCGTAGGGAAACCGAGTCTTA
ACTGGGCGTCTAGTTGCAAGGTATAGACCCGAAACCCGGTGATCTAGCCATGGGCAGGTTGAAGGTTGGGTAACACTAAC
TGGAGGACCGAACCGACTAATGTTGAAAAATTAGCGGATGACTTGTGGCTGGGGGTGAAAGGCCAATCAAACCGGGAGAT
AGCTGGTTCTCCCCGAAAGCTATTTAGGTAGCGCCTCGTGAACTCATCTTCGGGGGTAGAGCACTGTTTCGGCTAGGGGG
TCATCCCGACTTACCAAACCGATGCAAACTCCGAATACCGAAGAATGTTATCACGGGAGACACACGGCGGGTGCTAACGT
CCGTCGTGAAGAGGGAAACAACCCAGACCGCCAGCTAAGGTCCCAAAGTCATGGTTAAGTGGGAAACGATGTGGGAAGGC
ACAGACAGCCAGGATGTTGGCTTAGAAGCAGCCATCATTTAAAGAAAGCGTAATAGCTCACTGGTCGAGTCGGCCTGCGC
GGAAGATGTAACGGGGCTAAACCATGCACCGAAGCTGCGGCAGCGTATTGCTCTTTAACAATCTGGAACAAGCTGAAAAT
TGAAACAATACAGCTGAAACTTATCTCTCCGTAGATGTACTGAGATAAGGAGTAGCCTGTATTAGAGTCTCTCAAATAAT
CGCAACGCAAGGGTCTGCAAAGACACCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGATGCCTAGGCAGTC
AGAGGCGATGAAGGGCGTGCTAATCTGCGAAAAGCGTCGGTAAGCTGATATGAAGCGTTACAACCGACGATACCCGAATG
GGGAAACCCAGTGCAATTCGTTGCACTATTGCATGGTGAATACATAGCCATGCAAGGCGAACCGGGGGAACTGAAACATC
TAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAAGAGCCCAGAGTCTGAATC
AGTTTATGTGTTAGTGGAAGCGTCTGGAAAGTCGCACGGTACAGGGTGATAGTCCCGTACACAAAAATGCATATTCTGTG
AACTCGATGAGTAGGGCGGGACACGTGACATCCTGTCTGAATATGGGGGGACCATCCTCCAAGGCTAAATACTCCTGACT
GACCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAATAGAACCTGAAACCGTGT
ACGTACAAGCAGTGGGAGCACCTTCGTGGTGTGACTGCGTACCTTTTGTATAATGGGTCAGCGACTTATATTTTGTAGCA
AGGTTAACCGAATAGGGGAGCCGTAGGGAAACCGAGTCTTAACTGGGCGTCTAGTTGCAAGGTATAGACCCGAAACCCGG
TGATCTAGCCATGGGCAGGTTGAAGGTTGGGTAACACTAACTGGAGGACCGAACCGACTAATGTTGAAAAATTAGCGGAT
GACTTGTGGCTGGGGGTGAAAGGCCAATCAAACCGGGAGATAGCTGGTTCTCCCCGAAAGCTATTTAGGTAGCGCCTCGT
GAACTCATCTTCGGGGGTAGAGCACTGTTTCGGCTAGGGGGTCATCCCGACTTACCAAACCGATGCAAACTCCGAATACC
GAAGAATGTTATCACGGGAGACACACGGCGGGTGCTAACGTCCGTCGTGAAGAGGGAAACAACCCAGACCGCCAGCTAAG
GTCCCAAAGTCATGGTTAAGTGGGAAACGATGTGGGAAGGCACAGACAGCCAGGATGTTGGCTTAGAAGCAGCCATCATT
TAAAGAAAGCGTAATAGCTCACTGGTCGAGTCGGCCTGCGCGGAAGATGTAACGGGGCTAAACCATGCACCGAAGCTGCG
GCAGCGTATTGCTCTTTAACAATCTGGAACAAGCTGAAAATTGAAACAATACAGCTGAAACTTATCTCTCCGTAGATGTA
CTGAGATAAGGAGTAGCCTGTATTAGAGTCTCTCAAATAATCGCAACGCAAGGGTCTGCAAAGACACCTTCGGGTTGTGA
GGTTAAGCGACTAAGCGTACACGGTGGATGCCTAGGCAGTCAGAGGCGATGAAGGGCGTGCTAATCTGCGAAAAGCGTCG
GTAAGCTGATATGAAGCGTTACAACCGACGATACCCGAATGGGGAAACCCAGTGCAATTCGTTGCACTATTGCATGGTGA
ATACATAGCCATGCAAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCC
AGTAGCGGCGAGCGAACGGGGAAGAGCCCAGAGTCTGAATCAGTTTATGTGTTAGTGGAAGCGTCTGGAAAGTCGCACGG
TACAGGGTGATAGTCCCGTACACAAAAATGCATATTCTGTGAACTCGATGAGTAGGGCGGGACACGTGACATCCTGTCTG
AATATGGGGGGACCATCCTCCAAGGCTAAATACTCCTGACTGACCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAAAA
GAACCCCGGCGAGGGGAGTGAAATAGAACCTGAAACCGTGTACGTACAAGCAGTGGGAGCACCTTCGTGGTGTGACTGCG
TACCTTTTGTATAATGGGTCAGCGACTTATATTTTGTAGCAAGGTTAACCGAATAGGGGAGCCGTAGGGAAACCGAGTCT
TAACTGGGCGTCTAGTTGCAAGGTATAGACCCGAAACCCGGTGATCTAGCCATGGGCAGGTTGAAGGTTGGGTAACACTA
ACTGGAGGACCGAACCGACTAATGTTGAAAAATTAGCGGATGACTTGTGGCTGGGGGTGAAAGGCCAATCAAACCGGGAG
ATAGCTGGTTCTCCCCGAAAGCTATTTAGGTAGCGCCTCGTGAACTCATCTTCGGGGGTAGAGCACTGTTTCGGCTAGGG
GGTCATCCCGACTTACCAAACCGATGCAAACTCCGAATACCGAAGAATGTTATCACGGGAGACACACGGCGGGTGCTAAC
GTCCGTCGTGAAGAGGGAAACAACCCAGACCGCCAGCTAAGGTCCCAAAGTCATGGTTAAGTGGGAAACGATGTGGGAAG
GCACAGACAGCCAGGATGTTGGCTTAGAAGCAGCCATCATTTAAAGAAAGCGTAATAGCTCACTGGTCGAGTCGGCCTGC
GCGGAAGATGTAACGGGGCTAAACCATGCACCGAAGCTGCGGCAGCGTATTGCTCTTTAACAATCTGGAACAAGCTGAAA
ATTGAAACAATACAGCTGAAACTTATCTCTCCGTAGATGTACTGAGATAAGGAGTAGCCTGTATTAGAGTCTCTCAAATA
ATCGCAACGCAAGGGTCTGCAAAGACACCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGATGCCTAGGCAG
TCAGAGGCGATGAAGGGCGTGCTAATCTGCGAAAAGCGTCGGTAAGCTGATATGAAGCGTTACAACCGACGATACCCGAA
TGGGGAAACCCAGTGCAATTCGTTGCACTATTGCATGGTGAATACATAGCCATGCAAGGCGAACCGGGGGAACTGAAACA
TCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAAGAGCCCAGAGTCTGAA
TCAGTTTATGTGTTAGTGGAAGCGTCTGGAAAGTCGCACGGTACAGGGTGATAGTCCCGTACACAAAAATGCATATTCTG
TGAACTCGATGAGTAGGGCGGGACACGTGACATCCTGTCTGAATATGGGGGGACCATCCTCCAAGGCTAAATACTCCTGA
CTGACCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAAAAGAACCCCGGCGAGGGGAGTGAAATAGAACCTGAAACCGT
GTACGTACAAGCAGTGGGAGCACCTTCGTGGTGTGACTGCGTACCTTTTGTATAATGGGTCAGCGACTTATATTTTGTAG
CAAGGTTAACCGAATAGGGGAGCCGTAGGGAAACCGAGTCTTAACTGGGCGTCTAGTTGCAAGGTATAGACCCGAAACCC
GGTGATCTAGCCATGGGCAGGTTGAAGGTTGGGTAACACTAACTGGAGGACCGAACCGACTAATGTTGAAAAATTAGCGG
ATGACTTGTGGCTGGGGGTGAAAGGCCAATCAAACCGGGAGATAGCTGGTTCTCCCCGAAAGCTATTTAGGTAGCGCCTC
GTGAACTCATCTTCGGGGGTAGAGCACTGTTTCGGCTAGGGGGTCATCCCGACTTACCAAACCGATGCAAACTCCGAATA
CCGAAGAATGTTATCACGGGAGACACACGGCGGGTGCTAACGTCCGTCGTGAAGAGGGAAACAACCCAGACCGCCAGCTA
AGGTCCCAAAGTCATGGTTAAGTGGGAAACGATGTGGGAAGGCACAGACAGCCAGGATGTTGGCTTAGAAGCAGCCATCA
TTTAAAGAAAGCGTAATAGCTCACTGGTCGAGTCGGCCTGCGCGGAAGATGTAACGGGGCTAAACCATGCACCGAAGCTG
CGGCAGCGTATTGCTCTTTAACAATCTGGAACAAGCTGAAAATTGAAACAATACAGCTGAAACTTATCTCTCCGTAGATG
TACTGAGATAAGGAGTAGCCTGTATTAGAGTCTCTCAAATAATCGCAACGCAAGGGTCTGCAAAGACACCTTCGGGTTGT
GAGGTTAAGCGACTAAGCGTACACGGTGGATGCCTAGGCAGTCAGAGGCGATGAAGGGCGTGCTAATCTGCGAAAAGCGT
CGGTAAGCTGATATGAAGCGTTACAACCGACGATACCCGAATGGGGAAACCCAGTGCAATTCGTTGCACTATTGCATGGT
GAATACATAGCCATGCAAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCC
CCAGTAGCGGCGAGCGAACGGGGAAGAGCCCAGAGTCTGAATCAGTTTATGTGTTAGTGGAAGCGTCTGGAAAGTCGCAC
GGTACAGGGTGATAGTCCCGTACACAAAAATGCATATTCTGTGAACTCGATGAGTAGGGCGGGACACGTGACATCCTGTC
TGAATATGGGGGGACCATCCTCCAAGGCTAAATACTCCTGACTGACCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAA
AAGAACCCCGGCGAGGGGAGTGAAATAGAACCTGAAACCGTGTACGTACAAGCAGTGGGAGCACCTTCGTGGTGTGACTG
CGTACCTTTTGTATAATGGGTCAGCGACTTATATTTTGTAGCAAGGTTAACCGAATAGGGGAGCCGTAGGGAAACCGAGT
CTTAACTGGGCGTCTAGTTGCAAGGTATAGACCCGAAACCCGGTGATCTAGCCATGGGCAGGTTGAAGGTTGGGTAACAC
TAACTGGAGGACCGAACCGACTAATGTTGAAAAATTAGCGGATGACTTGTGGCTGGGGGTGAAAGGCCAATCAAACCGGG
AGATAGCTGGTTCTCCCCGAAAGCTATTTAGGTAGCGCCTCGTGAACTCATCTTCGGGGGTAGAGCACTGTTTCGGCTAG
GGGGTCATCCCGACTTACCAAACCGATGCAAACTCCGAATACCGAAGAATGTTATCACGGGAGACACACGGCGGGTGCTA
ACGTCCGTCGTGAAGAGGGAAACAACCCAGACCGCCAGCTAAGGTCCCAAAGTCATGGTTAAGTGGGAAACGATGTGGGA
AGGCACAGACAGCCAGGATGTTGGCTTAGAAGCAGCCATCATTTAAAGAAAGCGTAATAGCTCACTGGTCGAGTCGGCCT
GCGCGGAAGATGTAACGGGGCTAAACCATGCACCGAAGCTGCGGCAGCG
>bercovieri6 count=15
GTGATTGACGGCATTGCTTTCCAAACCAATATTCTGGCGCTGAATGCCGCCGTTGAAGCCGCCCGTGCCGGTGAGCAGGG
GCGTGGTTTTGCTGTGGTCGCCGGTGAAGTGCGTAATTTGGCACAGCGCAGTGCGC
>bercovieri10 count=10
CTATCAGACAATCTGTGTGGACACTGCGCAATGCGTATCGTGAGGTAAGGAGGTGATCCAACCGCAGGTTCCCCTACGGT
TACCTTGTTACGACTTCACCCCAGTCATGAATCACAAAGTGGTAAGCGCCCTCCCGAAGGTTAAGCTACCTACTTCTTTT
GCAACCCACTCCCATGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGTAGCATTCTGATCTACGATTA
CTAGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGACAGACTTTATGTGGTCCGCTTGCTCTC
GCGAGTTCGCTTCACTTTGTATCTGCCATTGTAGCACGTGTGTAGCCCTACTCGTAAGGGCCATGATGACTTGACGTCAT
CCCCACCTTCCTCCGGTTTGTCACCGGCAGTCTCCCTTGAGTTCCCACCATTACGTGCTGGCAACAAAGGATAAGGGTTG
CGCTCGTTGCGGGACTTAACCCAACATTTCACAACACGAGCTGACGACAGCCATGCAGCACCTGTCTCACCTATCAGACA
ATCTGTGTGGACACTGCGCAATGCGTATCGTGAGGTAAGGAGGTGATCCAACCGCAGGTTCCCCTACGGTTACCTTGTTA
CGACTTCACCCCAGTCATGAATCACAAAGTGGTAAGCGCCCTCCCGAAGGTTAAGCTACCTACTTCTTTTGCAACCCACT
CCCATGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGTAGCATTCTGATCTACGATTACTAGCGATTC
CGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGACAGACTTTATGTGGTCCGCTTGCTCTCGCGAGTTCGC
TTCACTTTGTATCTGCCATTGTAGCACGTGTGTAGCCCTACTCGTAAGGGCCATGATGACTTGACGTCATCCCCACCTTC
CTCCGGTTTGTCACCGGCAGTCTCCCTTGAGTTCCCACCATTACGTGCTGGCAACAAAGGATAAGGGTTGCGCTCGTTGC
GGGACTTAACCCAACATTTCACAACACGAGCTGACGACAGCCATGCAGCACCTGTCTCACCTATCAGACAATCTGTGTGG
ACACTGCGCAATGCGTATCGTGAGGTAAGGAGGTGATCCAACCGCAGGTTCCCCTACGGTTACCTTGTTACGACTTCACC
CCAGTCATGAATCACAAAGTGGTAAGCGCCCTCCCGAAGGTTAAGCTACCTACTTCTTTTGCAACCCACTCCCATGGTGT
GACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGTAGCATTCTGATCTACGATTACTAGCGATTCCGACTTCATG
GAGTCGAGTTGCAGACTCCAATCCGGACTACGACAGACTTTATGTGGTCCGCTTGCTCTCGCGAGTTCGCTTCACTTTGT
ATCTGCCATTGTAGCACGTGTGTAGCCCTACTCGTAAGGGCCATGATGACTTGACGTCATCCCCACCTTCCTCCGGTTTG
TCACCGGCAGTCTCCCTTGAGTTCCCACCATTACGTGCTGGCAACAAAGGATAAGGGTTGCGCTCGTTGCGGGACTTAAC
CCAACATTTCACAACACGAGCTGACGACAGCCATGCAGCACCTGTCTCACCTATCAGACAATCTGTGTGGACACTGCGCA
ATGCGTATCGTGAGGTAAGGAGGTGATCCAACCGCAGGTTCCCCTACGGTTACCTTGTTACGACTTCACCCCAGTCATGA
ATCACAAAGTGGTAAGCGCCCTCCCGAAGGTTAAGCTACCTACTTCTTTTGCAACCCACTCCCATGGTGTGACGGGCGGT
GTGTACAAGGCCCGGGAACGTATTCACCGTAGCATTCTGATCTACGATTACTAGCGATTCCGACTTCATGGAGTCGAGTT
GCAGACTCCAATCCGGACTACGACAGACTTTATGTGGTCCGCTTGCTCTCGCGAGTTCGCTTCACTTTGTATCTGCCATT
GTAGCACGTGTGTAGCCCTACTCGTAAGGGCCATGATGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAG
TCTCCCTTGAGTTCCCACCATTACGTGCTGGCAACAAAGGATAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATTTC
ACAACACGAGCTGACGACAGCCATGCAGCACCTGTCTCACCTATCAGACAATCTGTGTGGACACTGCGCAATGCGTATCG
TGAGGTAAGGAGGTGATCCAACCGCAGGTTCCCCTACGGTTACCTTGTTACGACTTCACCCCAGTCATGAATCACAAAGT
GGTAAGCGCCCTCCCGAAGGTTAAGCTACCTACTTCTTTTGCAACCCACTCCCATGGTGTGACGGGCGGTGTGTACAAGG
CCCGGGAACGTATTCACCGTAGCATTCTGATCTACGATTACTAGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCA
ATCCGGACTACGACAGACTTTATGTGGTCCGCTTGCTCTCGCGAGTTCGCTTCACTTTGTATCTGCCATTGTAGCACGTG
TGTAGCCCTACTCGTAAGGGCCATGATGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCTCCCTTGA
GTTCCCACCATTACGTGCTGGCAACAAAGGATAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATTTCACAACACGAG
CTGACGACAGCCATGCAGCACCTGTCTCACC
>bercovieri11 count=23
GGTTGATGACAAACCTATTTTGTCACTCCGAAACTGGCGGGGCGACTGCACCGATGGGCGCTCCGACGGCTCACGCCGTT
ACGACCCATTCGGCACATTTCCCCGCTTATCAACTTTGTCAGCAGTCTGA
>intermedia6 count=21
TGGTGCGGCCTGCGGGACTCGAACCCGCGACCCACTGATTAGCAGTCAGGTGCTCTACCAACTGAGCTATGGCCGCA
>intermedia12 count=10
CTCGGTTGAGGCGACAGAGCTAAGATGAACACTGAACGATTATCAACCGAAACC
>intermedia13 count=37
TATACCCGTCATACTTCAAGCTGCATGTGCGTTGGCTGCCTTCGTTCACCCCAGTCACTTACTTATGTAAGCTCCTGGGG
ATT
>rohdei5 count=65
TCAGACTGCTGACAAAGTTGATTGGCGGGGAAATGTGCCGAATGGGTCGTAGCCGCGTAAGCCGCCGGAGCGCCCATAGG
TGCAGTCGCCCCGGCTGACTCGAAACTGAAAATATCGATTTGTCATCAA
>rohdei6 count=16
TAGCTCAGTTGGTAGAGCACCAGCCTCCTAAGCAGGGGGTCGCGAGTTCGAGTCTCGTTGCCCGCACCAAATT
>ruckeri0 count=70
TCAGACTGCTGACAAACCTCGATCCTGCGATCTCTGGTGGGGAAGACAGGGAGAAGAGTCAAGCGTCCGCGCCAGGGATG
GCGCGGAACGAGCCGACAAGGACGTATTCACGGCGTCTTGACGATCTGCCGGTTTTCACCGCAGCGAGCACTTTGTCATT
AACCTCA
>ruckeri1 count=10
CGCCCTACTCATCGAGTTCACAGCCAGTGTGTTTTTGTGTACGGGACTATCACCCTGTACCGTGCGACTTTCCAGACGCT
TCCACTAACACACCAACTGATTCAGACTCTGGGCTCTTCCCCGTTCGCTCGCCGCTACTGGGGGAATCTCGGTTGATTTC
TTTTCCTCGGGGTACTTAGATGTTTCAGTTCCCCCGGTTCGCCTTGCATGGCTATGTATTCACCATGCAATAGTGCAACG
AATTGCACTGGGTTTCCCCATTCGGGTATCGTCGGTTATTACGCTTCATATCAGCTTACCGACGCTTTTCGCAGATTAGC
ACGCCCTTCATCGCCTCTGACTGCCTAGGCATCCACCGTGTACGCTTAGTCGCTTAACCTCACAACCCGAAGGTGTCTTT
TCTTTACTGGTGTGGTGGTGAACACACTCTGCGTTATTGTGCAGTACGCGCCATCCTCATGAACCTCAAGTTCACTTCGG
TTGCCGTGCGCTGTACGCTTCGCCGAGTATCTTCTCACTCACACTGCAACAAAGGATAAACACATCGCATTGCAATTATT
TGAGAGACTCTGATACAGGGTACTCCTTATCTCAGTACTGCTACGGAGAGACAAGTTTCAGCTGTATCGTTTCAATTTTC
AGCTTGTTCCAGATTGTTAAAGAGCAATATTCTAACACGACCCACGTAGACTGGCGTCCACGCTTCGATGCCTCCCACCT
ATCCTACACATCAAGGCTCAATATTCAGTGTCAAGCTATAGTAAAGGTTCACGGGGTCTTTCCGTCTTGCCGCGGGTACA
CTGCATCTTCACAGCGAGTTCAATTTCACTGAGTCTCGGGTGGAGACAGCCTGGCCATCATTACGCCATTCGTGCAGGTC
GGAACTTACCCGACAAGGAATTTCGCTACCTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCGATCAAGAG
CTTCGCCTTGCGGCTGACCCCATCAATTAACCTTCCAGCACCGGGCAGGCGTCACACCGTATACGTCCACTTTCGTGTTT
GCACAGTGCTGTGTTTTTATTAAACAGTTGCAGCCAGCTGGTATCTGCGACTGGCTTCGGCGCCGAGAGCAAGTCTCTTT
ACCTAATGCCAGCGTGCCTTCTCCCGAAGTTACGGCACCATTTTGCCTAGTTCCTTCACCCGAGTTCTCTCAAGCGCCTG
AGTATTCTCTACCTGACCACCTGTGTCGGTTTGGGGTACGATTGAATGTCACCTGATGCTTAGAGGCTTTTCCTGGAAGC
TTGGCATCAACTACTTCACCACCGTAGTGGCTCGTCATCACACCTCAGCGTTGATAAGCAACCGGATTTACCAAGTCACT
CCGCCTACATGCTTAAACCGGGACAACCGTCGCCCGGCTAGCCTAGCCTTCTCCGTCCCCCCTTCGCAGTCACACCCAGT
ACAGGAATATTAACCTGTTTCCCATCGACTACGCTTTTCAGCCTCGCCTTAGGGGTCGACTCACCCTGCCCCGATTAACG
TTGGACAGGAACCCTTGGTCTTCCGGCGTGCGGGTTTTTCACCCGCATTATCGTTACTTATGTCAGCATTCGCACTTCTG
ATACCTCCAGCAGCCCTCACAGGCCACCTTCAACGGCTTACAGAACGCTCCCCTACCCAACAACGCCTAAGCGTCGCTGC
CGCAGCTTCGGTGCATGGTTTAGCCCCGTTACATCTTCCGCGCAGGCCGACTCGACCAGTGAGCTATTACGCTTTCTTTA
AATGATGGCTGCTTCTAAGCCAACATCCTGGCTGTCTATGCCTTCCCACATCGTTTCCCACTTAACCATGACTTTGGGAC
CTTAGCTGGCGGTCTGGGTTGTTTCCCTCTTCACGACGGACGTTAGCACCCGCCGTGTGTCTCCCGTGATAACATTCTTC
GGTATTCGGAGTTTGCATCGGTTTGGTAATCCGGGATGGACCCCTAGCCGAAACAGTGCTCTACCCCCGAAGATGAGTTC
ACGAGGCGCTACCTAAATAGCTTTCGGGGAGAACCAGCTATCTCCCGGTTTGATTGGCCTTTCACCCCCAGCCACAAGTC
ATCCGCTAATTTTTCAACATTAGTCGGTTCGGTCCTCCAGTTAGTGTTACCCAACCTTCAACCTGCCCATGGCTAGATCA
CCGGGTTTCGGGTCTATACCTTGCAACTAGACGCCCAGTTAAGACTCGGTTTCCCTACGGCTCCCCTATTCGGTTAACCT
TGCTACAAAATATAAGTCGCTGACCCATTATACAAAAGGTACGCAGTCACACCACGAAGGTGCTCCCACTGCTTGTACGT
ACACGGTTTCAGGTTCTATTTCACTCCCCTCGCCGGGGTTCTTTTCGCCTTTCCCTCACGGTACTGGTTCACTATCGGTC
AGTCAGGAGTATTTAGCCTTGGAGGATGGTCCCCCCATATTCAGACAGGATGTCACGTGTCCCGCCCTACTCATCGAGTT
CACAGCCAGTGTGTTTTTGTGTACGGGACTATCACCCTGTACCGTGCGACTTTCCAGACGCTTCCACTAACACACCAACT
GATTCAGACTCTGGGCTCTTCCCCGTTCGCTCGCCGCTACTGGGGGAATCTCGGTTGATTTCTTTTCCTCGGGGTACTTA
GATGTTTCAGTTCCCCCGGTTCGCCTTGCATGGCTATGTATTCACCATGCAATAGTGCAACGAATTGCACTGGGTTTCCC
CATTCGGGTATCGTCGGTTATTACGCTTCATATCAGCTTACCGACGCTTTTCGCAGATTAGCACGCCCTTCATCGCCTCT
GACTGCCTAGGCATCCACCGTGTACGCTTAGTCGCTTAACCTCACAACCCGAAGGTGTCTTTTCTTTACTGGTGTGGTGG
TGAACACACTCTGCGTTATTGTGCAGTACGCGCCATCCTCATGAACCTCAAGTTCACTTCGGTTGCCGTGCGCTGTACGC
TTCGCCGAGTATCTTCTCACTCACACTGCAACAAAGGATAAACACATCGCATTGCAATTATTTGAGAGACTCTGATACAG
GGTACTCCTTATCTCAGTACTGCTACGGAGAGACAAGTTTCAGCTGTATCGTTTCAATTTTCAGCTTGTTCCAGATTGTT
AAAGAGCAATATTCTAACACGACCCACGTAGACTGGCGTCCACGCTTCGATGCCTCCCACCTATCCTACACATCAAGGCT
CAATATTCAGTGTCAAGCTATAGTAAAGGTTCACGGGGTCTTTCCGTCTTGCCGCGGGTACACTGCATCTTCACAGCGAG
TTCAATTTCACTGAGTCTCGGGTGGAGACAGCCTGGCCATCATTACGCCATTCGTGCAGGTCGGAACTTACCCGACAAGG
AATTTCGCTACCTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCGATCAAGAGCTTCGCCTTGCGGCTGAC
CCCATCAATTAACCTTCCAGCACCGGGCAGGCGTCACACCGTATACGTCCACTTTCGTGTTTGCACAGTGCTGTGTTTTT
ATTAAACAGTTGCAGCCAGCTGGTATCTGCGACTGGCTTCGGCGCCGAGAGCAAGTCTCTTTACCTAATGCCAGCGTGCC
TTCTCCCGAAGTTACGGCACCATTTTGCCTAGTTCCTTCACCCGAGTTCTCTCAAGCGCCTGAGTATTCTCTACCTGACC
ACCTGTGTCGGTTTGGGGTACGATTGAATGTCACCTGATGCTTAGAGGCTTTTCCTGGAAGCTTGGCATCAACTACTTCA
CCACCGTAGTGGCTCGTCATCACACCTCAGCGTTGATAAGCAACCGGATTTACCAAGTCACTCCGCCTACATGCTTAAAC
CGGGACAACCGTCGCCCGGCTAGCCTAGCCTTCTCCGTCCCCCCTTCGCAGTCACACCCAGTACAGGAATATTAACCTGT
TTCCCATCGACTACGCTTTTCAGCCTCGCCTTAGGGGTCGACTCACCCTGCCCCGATTAACGTTGGACAGGAACCCTTGG
TCTTCCGGCGTGCGGGTTTTTCACCCGCATTATCGTTACTTATGTCAGCATTCGCACTTCTGATACCTCCAGCAGCCCTC
ACAGGCCACCTTCAACGGCTTACAGAACGCTCCCCTACCCAACAACGCCTAAGCGTCGCTGCCGCAGCTTCGGTGCATGG
TTTAGCCCCGTTACATCTTCCGCGCAGGCCGACTCGACCAGTGAGCTATTACGCTTTCTTTAAATGATGGCTGCTTCTAA
GCCAACATCCTGGCTGTCTATGCCTTCCCACATCGTTTCCCACTTAACCATGACTTTGGGACCTTAGCTGGCGGTCTGGG
TTGTTTCCCTCTTCACGACGGACGTTAGCACCCGCCGTGTGTCTCCCGTGATAACATTCTTCGGTATTCGGAGTTTGCAT
CGGTTTGGTAATCCGGGATGGACCCCTAGCCGAAACAGTGCTCTACCCCCGAAGATGAGTTCACGAGGCGCTACCTAAAT
AGCTTTCGGGGAGAACCAGCTATCTCCCGGTTTGATTGGCCTTTCACCCCCAGCCACAAGTCATCCGCTAATTTTTCAAC
ATTAGTCGGTTCGGTCCTCCAGTTAGTGTTACCCAACCTTCAACCTGCCCATGGCTAGATCACCGGGTTTCGGGTCTATA
CCTTGCAACTAGACGCCCAGTTAAGACTCGGTTTCCCTACGGCTCCCCTATTCGGTTAACCTTGCTACAAAATATAAGTC
GCTGACCCATTATACAAAAGGTACGCAGTCACACCACGAAGGTGCTCCCACTGCTTGTACGTACACGGTTTCAGGTTCTA
TTTCACTCCCCTCGCCGGGGTTCTTTTCGCCTTTCCCTCACGGTACTGGTTCACTATCGGTCAGTCAGGAGTATTTAGCC
TTGGAGGATGGTCCCCCCATATTCAGACAGGATGTCACGTGTCCCGCCCTACTCATCGAGTTCACAGCCAGTGTGTTTTT
GTGTACGGGACTATCACCCTGTACCGTGCGACTTTCCAGACGCTTCCACTAACACACCAACTGATTCAGACTCTGGGCTC
TTCCCCGTTCGCTCGCCGCTACTGGGGGAATCTCGGTTGATTTCTTTTCCTCGGGGTACTTAGATGTTTCAGTTCCCCCG
GTTCGCCTTGCATGGCTATGTATTCACCATGCAATAGTGCAACGAATTGCACTGGGTTTCCCCATTCGGGTATCGTCGGT
TATTACGCTTCATATCAGCTTACCGACGCTTTTCGCAGATTAGCACGCCCTTCATCGCCTCTGACTGCCTAGGCATCCAC
CGTGTACGCTTAGTCGCTTAACCTCACAACCCGAAGGTGTCTTTTCTTTACTGGTGTGGTGGTGAACACACTCTGCGTTA
TTGTGCAGTACGCGCCATCCTCATGAACCTCAAGTTCACTTCGGTTGCCGTGCGCTGTACGCTTCGCCGAGTATCTTCTC
ACTCACACTGCAACAAAGGATAAACACATCGCATTGCAATTATTTGAGAGACTCTGATACAGGGTACTCCTTATCTCAGT
ACTGCTACGGAGAGACAAGTTTCAGCTGTATCGTTTCAATTTTCAGCTTGTTCCAGATTGTTAAAGAGCAATATTCTAAC
ACGACCCACGTAGACTGGCGTCCACGCTTCGATGCCTCCCACCTATCCTACACATCAAGGCTCAATATTCAGTGTCAAGC
TATAGTAAAGGTTCACGGGGTCTTTCCGTCTTGCCGCGGGTACACTGCATCTTCACAGCGAGTTCAATTTCACTGAGTCT
CGGGTGGAGACAGCCTGGCCATCATTACGCCATTCGTGCAGGTCGGAACTTACCCGACAAGGAATTTCGCTACCTTAGGA
CCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCGATCAAGAGCTTCGCCTTGCGGCTGACCCCATCAATTAACCTTCC
AGCACCGGGCAGGCGTCACACCGTATACGTCCACTTTCGTGTTTGCACAGTGCTGTGTTTTTATTAAACAGTTGCAGCCA
GCTGGTATCTGCGACTGGCTTCGGCGCCGAGAGCAAGTCTCTTTACCTAATGCCAGCGTGCCTTCTCCCGAAGTTACGGC
ACCATTTTGCCTAGTTCCTTCACCCGAGTTCTCTCAAGCGCCTGAGTATTCTCTACCTGACCACCTGTGTCGGTTTGGGG
TACGATTGAATGTCACCTGATGCTTAGAGGCTTTTCCTGGAAGCTTGGCATCAACTACTTCACCACCGTAGTGGCTCGTC
ATCACACCTCAGCGTTGATAAGCAACCGGATTTACCAAGTCACTCCGCCTACATGCTTAAACCGGGACAACCGTCGCCCG
GCTAGCCTAGCCTTCTCCGTCCCCCCTTCGCAGTCACACCCAGTACAGGAATATTAACCTGTTTCCCATCGACTACGCTT
TTCAGCCTCGCCTTAGGGGTCGACTCACCCTGCCCCGATTAACGTTGGACAGGAACCCTTGGTCTTCCGGCGTGCGGGTT
TTTCACCCGCATTATCGTTACTTATGTCAGCATTCGCACTTCTGATACCTCCAGCAGCCCTCACAGGCCACCTTCAACGG
CTTACAGAACGCTCCCCTACCCAACAACGCCTAAGCGTCGCTGCCGCAGCTTCGGTGCATGGTTTAGCCCCGTTACATCT
TCCGCGCAGGCCGACTCGACCAGTGAGCTATTACGCTTTCTTTAAATGATGGCTGCTTCTAAGCCAACATCCTGGCTGTC
TATGCCTTCCCACATCGTTTCCCACTTAACCATGACTTTGGGACCTTAGCTGGCGGTCTGGGTTGTTTCCCTCTTCACGA
CGGACGTTAGCACCCGCCGTGTGTCTCCCGTGATAACATTCTTCGGTATTCGGAGTTTGCATCGGTTTGGTAATCCGGGA
TGGACCCCTAGCCGAAACAGTGCTCTACCCCCGAAGATGAGTTCACGAGGCGCTACCTAAATAGCTTTCGGGGAGAACCA
GCTATCTCCCGGTTTGATTGGCCTTTCACCCCCAGCCACAAGTCATCCGCTAATTTTTCAACATTAGTCGGTTCGGTCCT
CCAGTTAGTGTTACCCAACCTTCAACCTGCCCATGGCTAGATCACCGGGTTTCGGGTCTATACCTTGCAACTAGACGCCC
AGTTAAGACTCGGTTTCCCTACGGCTCCCCTATTCGGTTAACCTTGCTACAAAATATAAGTCGCTGACCCATTATACAAA
AGGTACGCAGTCACACCACGAAGGTGCTCCCACTGCTTGTACGTACACGGTTTCAGGTTCTATTTCACTCCCCTCGCCGG
GGTTCTTTTCGCCTTTCCCTCACGGTACTGGTTCACTATCGGTCAGTCAGGAGTATTTAGCCTTGGAGGATGGTCCCCCC
ATATTCAGACAGGATGTCACGTGTCCCGCCCTACTCATCGAGTTCACAGCCAGTGTGTTTTTGTGTACGGGACTATCACC
CTGTACCGTGCGACTTTCCAGACGCTTCCACTAACACACCAACTGATTCAGACTCTGGGCTCTTCCCCGTTCGCTCGCCG
CTACTGGGGGAATCTCGGTTGATTTCTTTTCCTCGGGGTACTTAGATGTTTCAGTTCCCCCGGTTCGCCTTGCATGGCTA
TGTATTCACCATGCAATAGTGCAACGAATTGCACTGGGTTTCCCCATTCGGGTATCGTCGGTTATTACGCTTCATATCAG
CTTACCGACGCTTTTCGCAGATTAGCACGCCCTTCATCGCCTCTGACTGCCTAGGCATCCACCGTGTACGCTTAGTCGCT
TAACCTCACAACCCGAAGGTGTCTTTTCTTTACTGGTGTGGTGGTGAACACACTCTGCGTTATTGTGCAGTACGCGCCAT
CCTCATGAACCTCAAGTTCACTTCGGTTGCCGTGCGCTGTACGCTTCGCCGAGTATCTTCTCACTCACACTGCAACAAAG
GATAAACACATCGCATTGCAATTATTTGAGAGACTCTGATACAGGGTACTCCTTATCTCAGTACTGCTACGGAGAGACAA
GTTTCAGCTGTATCGTTTCAATTTTCAGCTTGTTCCAGATTGTTAAAGAGCAATATTCTAACACGACCCACGTAGACTGG
CGTCCACGCTTCGATGCCTCCCACCTATCCTACACATCAAGGCTCAATATTCAGTGTCAAGCTATAGTAAAGGTTCACGG
GGTCTTTCCGTCTTGCCGCGGGTACACTGCATCTTCACAGCGAGTTCAATTTCACTGAGTCTCGGGTGGAGACAGCCTGG
CCATCATTACGCCATTCGTGCAGGTCGGAACTTACCCGACAAGGAATTTCGCTACCTTAGGACCGTTATAGTTACGGCCG
CCGTTTACTGGGGCTTCGATCAAGAGCTTCGCCTTGCGGCTGACCCCATCAATTAACCTTCCAGCACCGGGCAGGCGTCA
CACCGTATACGTCCACTTTCGTGTTTGCACAGTGCTGTGTTTTTATTAAACAGTTGCAGCCAGCTGGTATCTGCGACTGG
CTTCGGCGCCGAGAGCAAGTCTCTTTACCTAATGCCAGCGTGCCTTCTCCCGAAGTTACGGCACCATTTTGCCTAGTTCC
TTCACCCGAGTTCTCTCAAGCGCCTGAGTATTCTCTACCTGACCACCTGTGTCGGTTTGGGGTACGATTGAATGTCACCT
GATGCTTAGAGGCTTTTCCTGGAAGCTTGGCATCAACTACTTCACCACCGTAGTGGCTCGTCATCACACCTCAGCGTTGA
TAAGCAACCGGATTTACCAAGTCACTCCGCCTACATGCTTAAACCGGGACAACCGTCGCCCGGCTAGCCTAGCCTTCTCC
GTCCCCCCTTCGCAGTCACACCCAGTACAGGAATATTAACCTGTTTCCCATCGACTACGCTTTTCAGCCTCGCCTTAGGG
GTCGACTCACCCTGCCCCGATTAACGTTGGACAGGAACCCTTGGTCTTCCGGCGTGCGGGTTTTTCACCCGCATTATCGT
TACTTATGTCAGCATTCGCACTTCTGATACCTCCAGCAGCCCTCACAGGCCACCTTCAACGGCTTACAGAACGCTCCCCT
ACCCAACAACGCCTAAGCGTCGCTGCCGCAGCTTCGGTGCATGGTTTAGCCCCGTTACATCTTCCGCGCAGGCCGACTCG
ACCAGTGAGCTATTACGCTTTCTTTAAATGATGGCTGCTTCTAAGCCAACATCCTGGCTGTCTATGCCTTCCCACATCGT
TTCCCACTTAACCATGACTTTGGGACCTTAGCTGGCGGTCTGGGTTGTTTCCCTCTTCACGACGGACGTTAGCACCCGCC
GTGTGTCTCCCGTGATAACATTCTTCGGTATTCGGAGTTTGCATCGGTTTGGTAATCCGGGATGGACCCCTAGCCGAAAC
AGTGCTCTACCCCCGAAGATGAGTTCACGAGGCGCTACCTAAATAGCTTTCGGGGAGAACCAGCTATCTCCCGGTTTGAT
TGGCCTTTCACCCCCAGCCACAAGTCATCCGCTAATTTTTCAACATTAGTCGGTTCGGTCCTCCAGTTAGTGTTACCCAA
CCTTCAACCTGCCCATGGCTAGATCACCGGGTTTCGGGTCTATACCTTGCAACTAGACGCCCAGTTAAGACTCGGTTTCC
CTACGGCTCCCCTATTCGGTTAACCTTGCTACAAAATATAAGTCGCTGACCCATTATACAAAAGGTACGCAGTCACACCA
CGAAGGTGCTCCCACTGCTTGTACGTACACGGTTTCAGGTTCTATTTCACTCCCCTCGCCGGGGTTCTTTTCGCCTTTCC
CTCACGGTACTGGTTCACTATCGGTCAGTCAGGAGTATTTAGCCTTGGAGGATGGTCCCCCCATATTCAGACAGGATGTC
ACGTGTCCCGCCCTACTCATCGAGTTCACAGCCAGTGTGTTTTTGTGTACGGGACTATCACCCTGTACCGTGCGACTTTC
CAGACGCTTCCACTAACACACCAACTGATTCAGACTCTGGGCTCTTCCCCGTTCGCTCGCCGCTACTGGGGGAATCTCGG
TTGATTTCTTTTCCTCGGGGTACTTAGATGTTTCAGTTCCCCCGGTTCGCCTTGCATGGCTATGTATTCACCATGCAATA
GTGCAACGAATTGCACTGGGTTTCCCCATTCGGGTATCGTCGGTTATTACGCTTCATATCAGCTTACCGACGCTTTTCGC
AGATTAGCACGCCCTTCATCGCCTCTGACTGCCTAGGCATCCACCGTGTACGCTTAGTCGCTTAACCTCACAACCCGAAG
GTGTCTTTTCTTTACTGGTGTGGTGGTGAACACACTCTGCGTTATTGTGCAGTACGCGCCATCCTCATGAACCTCAAGTT
CACTTCGGTTGCCGTGCGCTGTACGCTTCGCCGAGTATCTTCTCACTCACACTGCAACAAAGGATAAACACATCGCATTG
CAATTATTTGAGAGACTCTGATACAGGGTACTCCTTATCTCAGTACTGCTACGGAGAGACAAGTTTCAGCTGTATCGTTT
CAATTTTCAGCTTGTTCCAGATTGTTAAAGAGCAATATTCTAACACGACCCACGTAGACTGGCGTCCACGCTTCGATGCC
TCCCACCTATCCTACACATCAAGGCTCAATATTCAGTGTCAAGCTATAGTAAAGGTTCACGGGGTCTTTCCGTCTTGCCG
CGGGTACACTGCATCTTCACAGCGAGTTCAATTTCACTGAGTCTCGGGTGGAGACAGCCTGGCCATCATTACGCCATTCG
TGCAGGTCGGAACTTACCCGACAAGGAATTTCGCTACCTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCG
ATCAAGAGCTTCGCCTTGCGGCTGACCCCATCAATTAACCTTCCAGCACCGGGCAGGCGTCACACCGTATACGTCCACTT
TCGTGTTTGCACAGTGCTGTGTTTTTATTAAACAGTTGCAGCCAGCTGGTATCTGCGACTGGCTTCGGCGCCGAGAGCAA
GTCTCTTTACCTAATGCCAGCGTGCCTTCTCCCGAAGTTACGGCACCATTTTGCCTAGTTCCTTCACCCGAGTTCTCTCA
AGCGCCTGAGTATTCTCTACCTGACCACCTGTGTCGGTTTGGGGTACGATTGAATGTCACCTGATGCTTAGAGGCTTTTC
CTGGAAGCTTGGCATCAACTACTTCACCACCGTAGTGGCTCGTCATCACACCTCAGCGTTGATAAGCAACCGGATTTACC
AAGTCACTCCGCCTACATGCTTAAACCGGGACAACCGTCGCCCGGCTAGCCTAGCCTTCTCCGTCCCCCCTTCGCAGTCA
CACCCAGTACAGGAATATTAACCTGTTTCCCATCGACTACGCTTTTCAGCCTCGCCTTAGGGGTCGACTCACCCTGCCCC
GATTAACGTTGGACAGGAACCCTTGGTCTTCCGGCGTGCGGGTTTTTCACCCGCATTATCGTTACTTATGTCAGCATTCG
CACTTCTGATACCTCCAGCAGCCCTCACAGGCCACCTTCAACGGCTTACAGAACGCTCCCCTACCCAACAACGCCTAAGC
GTCGCTGCCGCAGCTTCGGTGCATGGTTTAGCCCCGTTACATCTTCCGCGCAGGCCGACTCGACCAGTGAGCTATTACGC
TTTCTTTAAATGATGGCTGCTTCTAAGCCAACATCCTGGCTGTCTATGCCTTCCCACATCGTTTCCCACTTAACCATGAC
TTTGGGACCTTAGCTGGCGGTCTGGGTTGTTTCCCTCTTCACGACGGACGTTAGCACCCGCCGTGTGTCTCCCGTGATAA
CATTCTTCGGTATTCGGAGTTTGCATCGGTTTGGTAATCCGGGATGGACCCCTAGCCGAAACAGTGCTCTACCCCCGAAG
ATGAGTTCACGAGGCGCTACCTAAATAGCTTTCGGGGAGAACCAGCTATCTCCCGGTTTGATTGGCCTTTCACCCCCAGC
CACAAGTCATCCGCTAATTTTTCAACATTAGTCGGTTCGGTCCTCCAGTTAGTGTTACCCAACCTTCAACCTGCCCATGG
CTAGATCACCGGGTTTCGGGTCTATACCTTGCAACTAGACGCCCAGTTAAGACTCGGTTTCCCTACGGCTCCCCTATTCG
GTTAACCTTGCTACAAAATATAAGTCGCTGACCCATTATACAAAAGGTACGCAGTCACACCACGAAGGTGCTCCCACTGC
TTGTACGTACACGGTTTCAGGTTCTATTTCACTCCCCTCGCCGGGGTTCTTTTCGCCTTTCCCTCACGGTACTGGTTCAC
TATCGGTCAGTCAGGAGTATTTAGCCTTGGAGGATGGTCCCCCCATATTCAGACAGGATGTCACGTGTCCCGCCCTACTC
ATCGAGTTCACAGCCAGTGTGTTTTTGTGTACGGGACTATCACCCTGTACCGTGCGACTTTCCAGACGCTTCCACTAACA
CACCAACTGATTCAGACTCTGGGCTCTTCCCCGTTCGCTCGCCGCTACTGGGGGAATCTCGGTTGATTTCTTTTCCTCGG
GGTACTTAGATGTTTCAGTTCCCCCGGTTCGCCTTGCATGGCTATGTATTCACCATGCAATAGTGCAACGAATTGCACTG
GGTTTCCCCATTCGGGTATCGTCGGTTATTACGCTTCATATCAGCTTACCGACGCTTTTCGCAGATTAGCACGCCCTTCA
TCGCCTCTGACTGCCTAGGCATCCACCGTGTACGCTTAGTCGCTTAACCTCACAACCCGAAGGTGTCTTTTCTTTACTGG
TGTGGTGGTGAACACACTCTGCGTTATTGTGCAGTACGCGCCATCCTCATGAACCTCAAGTTCACTTCGGTTGCCGTGCG
CTGTACGCTTCGCCGAGTATCTTCTCACTCACACTGCAACAAAGGATAAACACATCGCATTGCAATTATTTGAGAGACTC
TGATACAGGGTACTCCTTATCTCAGTACTGCTACGGAGAGACAAGTTTCAGCTGTATCGTTTCAATTTTCAGCTTGTTCC
AGATTGTTAAAGAGCAATATTCTAACACGACCCACGTAGACTGGCGTCCACGCTTCGATGCCTCCCACCTATCCTACACA
TCAAGGCTCAATATTCAGTGTCAAGCTATAGTAAAGGTTCACGGGGTCTTTCCGTCTTGCCGCGGGTACACTGCATCTTC
ACAGCGAGTTCAATTTCACTGAGTCTCGGGTGGAGACAGCCTGGCCATCATTACGCCATTCGTGCAGGTCGGAACTTACC
CGACAAGGAATTTCGCTACCTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCGATCAAGAGCTTCGCCTTG
CGGCTGACCCCATCAATTAACCTTCCAGCACCGGGCAGGCGTCACACCGTATACGTCCACTTTCGTGTTTGCACAGTGCT
GTGTTTTTATTAAACAGTTGCAGCCAGCTGGTATCTGCGACTGGCTTCGGCGCCGAGAGCAAGTCTCTTTACCTAATGCC
AGCGTGCCTTCTCCCGAAGTTACGGCACCATTTTGCCTAGTTCCTTCACCCGAGTTCTCTCAAGCGCCTGAGTATTCTCT
ACCTGACCACCTGTGTCGGTTTGGGGTACGATTGAATGTCACCTGATGCTTAGAGGCTTTTCCTGGAAGCTTGGCATCAA
CTACTTCACCACCGTAGTGGCTCGTCATCACACCTCAGCGTTGATAAGCAACCGGATTTACCAAGTCACTCCGCCTACAT
GCTTAAACCGGGACAACCGTCGCCCGGCTAGCCTAGCCTTCTCCGTCCCCCCTTCGCAGTCACACCCAGTACAGGAATAT
TAACCTGTTTCCCATCGACTACGCTTTTCAGCCTCGCCTTAGGGGTCGACTCACCCTGCCCCGATTAACGTTGGACAGGA
ACCCTTGGTCTTCCGGCGTGCGGGTTTTTCACCCGCATTATCGTTACTTATGTCAGCATTCGCACTTCTGATACCTCCAG
CAGCCCTCACAGGCCACCTTCAACGGCTTACAGAACGCTCCCCTACCCAACAACGCCTAAGCGTCGCTGCCGCAGCTTCG
GTGCATGGTTTAGCCCCGTTACATCTTCCGCGCAGGCCGACTCGACCAGTGAGCTATTACGCTTTCTTTAAATGATGGCT
GCTTCTAAGCCAACATCCTGGCTGTCTATGCCTTCCCACATCGTTTCCCACTTAACCATGACTTTGGGACCTTAGCTGGC
GGTCTGGGTTGTTTCCCTCTTCACGACGGACGTTAGCACCCGCCGTGTGTCTCCCGTGATAACATTCTTCGGTATTCGGA
GTTTGCATCGGTTTGGTAATCCGGGATGGACCCCTAGCCGAAACAGTGCTCTACCCCCGAAGATGAGTTCACGAGGCGCT
ACCTAAATAGCTTTCGGGGAGAACCAGCTATCTCCCGGTTTGATTGGCCTTTCACCCCCAGCCACAAGTCATCCGCTAAT
TTTTCAACATTAGTCGGTTCGGTCCTCCAGTTAGTGTTACCCAACCTTCAACCTGCCCATGGCTAGATCACCGGGTTTCG
GGTCTATACCTTGCAACTAGACGCCCAGTTAAGACTCGGTTTCCCTACGGCTCCCCTATTCGGTTAACCTTGCTACAAAA
TATAAGTCGCTGACCCATTATACAAAAGGTACGCAGTCACACCACGAAGGTGCTCCCACTGCTTGTACGTACACGGTTTC
AGGTTCTATTTCACTCCCCTCGCCGGGGTTCTTTTCGCCTTTCCCTCACGGTACTGGTTCACTATCGGTCAGTCAGGAGT
ATTTAGCCTTGGAGGATGGTCCCCCCATATTCAGACAGGATGTCACGTGTCCCGCCCTACTCATCGAGTTCACAGCCAGT
GTGTTTTTGTGTACGGGACTATCACCCTGTACCGTGCGACTTTCCAGACGCTTCCACTAACACACCAACTGATTCAGACT
CTGGGCTCTTCCCCGTTCGCTCGCCGCTACTGGGGGAATCTCGGTTGATTTCTTTTCCTCGGGGTACTTAGATGTTTCAG
TTCCCCCGGTTCGCCTTGCATGGCTATGTATTCACCATGCAATAGTGCAACGAATTGCACTGGGTTTCCCCATTCGGGTA
TCGTCGGTTATTACGCTTCATATCAGCTTACCGACGCTTTTCGCAGATTAGCACGCCCTTCATCGCCTCTGACTGCCTAG
GCATCCACCGTGTACGCTTAGTCGCTTAACCTCACAACCCGAAGGTGTCTTTTCTTTACTGGTGTGGTGGTGAACACACT
CTGCGTTATTGTGCAGTACGCGCCATCCTCATGAACCTCAAGTTCACTTCGGTTGCCGTGCGCTGTACGCTTCGCCGAGT
ATCTTCTCACTCACACTGCAACAAAGGATAAACACATCGCATTGCAATTATTTGAGAGACTCTGATACAGGGTACTCCTT
ATCTCAGTACTGCTACGGAGAGACAAGTTTCAGCTGTATCGTTTCAATTTTCAGCTTGTTCCAGATTGTTAAAGAGCAAT
ATTCTAACACGACCCACGTAGACTGGCGTCCACGCTTCGATGCCTCCCACCTATCCTACACATCAAGGCTCAATATTCAG
TGTCAAGCTATAGTAAAGGTTCACGGGGTCTTTCCGTCTTGCCGCGGGTACACTGCATCTTCACAGCGAGTTCAATTTCA
CTGAGTCTCGGGTGGAGACAGCCTGGCCATCATTACGCCATTCGTGCAGGTCGGAACTTACCCGACAAGGAATTTCGCTA
CCTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCGATCAAGAGCTTCGCCTTGCGGCTGACCCCATCAATT
AACCTTCCAGCACCGGGCAGGCGTCACACCGTATACGTCCACTTTCGTGTTTGCACAGTGCTGTGTTTTTATTAAACAGT
TGCAGCCAGCTGGTATCTGCGACTGGCTTCGGCGCCGAGAGCAAGTCTCTTTACCTAATGCCAGCGTGCCTTCTCCCGAA
GTTACGGCACCATTTTGCCTAGTTCCTTCACCCGAGTTCTCTCAAGCGCCTGAGTATTCTCTACCTGACCACCTGTGTCG
GTTTGGGGTACGATTGAATGTCACCTGATGCTTAGAGGCTTTTCCTGGAAGCTTGGCATCAACTACTTCACCACCGTAGT
GGCTCGTCATCACACCTCAGCGTTGATAAGCAACCGGATTTACCAAGTCACTCCGCCTACATGCTTAAACCGGGACAACC
GTCGCCCGGCTAGCCTAGCCTTCTCCGTCCCCCCTTCGCAGTCACACCCAGTACAGGAATATTAACCTGTTTCCCATCGA
CTACGCTTTTCAGCCTCGCCTTAGGGGTCGACTCACCCTGCCCCGATTAACGTTGGACAGGAACCCTTGGTCTTCCGGCG
TGCGGGTTTTTCACCCGCATTATCGTTACTTATGTCAGCATTCGCACTTCTGATACCTCCAGCAGCCCTCACAGGCCACC
TTCAACGGCTTACAGAACGCTCCCCTACCCAACAACGCCTAAGCGTCGCTGCCGCAGCTTCGGTGCATGGTTTAGCCCCG
TTACATCTTCCGCGCAGGCCGACTCGACCAGTGAGCTATTACGCTTTCTTTAAATGATGGCTGCTTCTAAGCCAACATCC
TGGCTGTCTATGCCTTCCCACATCGTTTCCCACTTAACCATGACTTTGGGACCTTAGCTGGCGGTCTGGGTTGTTTCCCT
CTTCACGACGGACGTTAGCACCCGCCGTGTGTCTCCCGTGATAACATTCTTCGGTATTCGGAGTTTGCATCGGTTTGGTA
ATCCGGGATGGACCCCTAGCCGAAACAGTGCTCTACCCCCGAAGATGAGTTCACGAGGCGCTACCTAAATAGCTTTCGGG
GAGAACCAGCTATCTCCCGGTTTGATTGGCCTTTCACCCCCAGCCACAAGTCATCCGCTAATTTTTCAACATTAGTCGGT
TCGGTCCTCCAGTTAGTGTTACCCAACCTTCAACCTGCCCATGGCTAGATCACCGGGTTTCGGGTCTATACCTTGCAACT
AGACGCCCAGTTAAGACTCGGTTTCCCTACGGCTCCCCTATTCGGTTAACCTTGCTACAAAATATAAGTCGCTGACCCAT
TATACAAAAGGTACGCAGTCACACCACGAAGGTGCTCCCACTGCTTGTACGTACACGGTTTCAGGTTCTATTTCACTCCC
CTCGCCGGGGTTCTTTTCGCCTTTCCCTCACGGTACTGGTTCACTATCGGTCAGTCAGGAGTATTTAGCCTTGGAGGATG
GTCCCCCCATATTCAGACAGGATGTCACGTGTCCCGCCCTACTCATCGAGTTCACAGCCAGTGTGTTTTTGTGTACGGGA
CTATCACCCTGTACCGTGCGACTTTCCAGACGCTTCCACTAACACACCAACTGATTCAGACTCTGGGCTCTTCCCCGTTC
GCTCGCCGCTACTGGGGGAATCTCGGTTGATTTCTTTTCCTCGGGGTACTTAGATGTTTCAGTTCCCCCGGTTCGCCTTG
CATGGCTATGTATTCACCATGCAATAGTGCAACGAATTGCACTGGGTTTCCCCATTCGGGTATCGTCGGTTATTACGCTT
CATATCAGCTTACCGACGCTTTTCGCAGATTAGCACGCCCTTCATCGCCTCTGACTGCCTAGGCATCCACCGTGTACGCT
TAGTCGCTTAACCTCACAACCCGAAGGTGTCTTTTCTTTACTGGTGTGGTGGTGAACACACTCTGCGTTATTGTGCAGTA
CGCGCCATCCTCATGAACCTCAAGTTCACTTCGGTTGCCGTGCGCTGTACGCTTCGCCGAGTATCTTCTCACTCACACTG
CAACAAAGGATAAACACATCGCATTGCAATTATTTGAGAGACTCTGATACAGGGTACTCCTTATCTCAGTACTGCTACGG
AGAGACAAGTTTCAGCTGTATCGTTTCAATTTTCAGCTTGTTCCAGATTGTTAAAGAGCAATATTCTAACACGACCCACG
TAGACTGGCGTCCACGCTTCGATGCCTCCCACCTATCCTACACATCAAGGCTCAATATTCAGTGTCAAGCTATAGTAAAG
GTTCACGGGGTCTTTCCGTCTTGCCGCGGGTACACTGCATCTTCACAGCGAGTTCAATTTCACTGAGTCTCGGGTGGAGA
CAGCCTGGCCATCATTACGCCATTCGTGCAGGTCGGAACTTACCCGACAAGGAATTTCGCTACCTTAGGACCGTTATAGT
TACGGCCGCCGTTTACTGGGGCTTCGATCAAGAGCTTCGCCTTGCGGCTGACCCCATCAATTAACCTTCCAGCACCGGGC
AGGCGTCACACCGTATACGTCCACTTTCGTGTTTGCACAGTGCTGTGTTTTTATTAAACAGTTGCAGCCAGCTGGTATCT
GCGACTGGCTTCGGCGCCGAGAGCAAGTCTCTTTACCTAATGCCAGCGTGCCTTCTCCCGAAGTTACGGCACCATTTTGC
CTAGTTCCTTCACCCGAGTTCTCTCAAGCGCCTGAGTATTCTCTACCTGACCACCTGTGTCGGTTTGGGGTACGATTGAA
TGTCACCTGATGCTTAGAGGCTTTTCCTGGAAGCTTGGCATCAACTACTTCACCACCGTAGTGGCTCGTCATCACACCTC
AGCGTTGATAAGCAACCGGATTTACCAAGTCACTCCGCCTACATGCTTAAACCGGGACAACCGTCGCCCGGCTAGCCTAG
CCTTCTCCGTCCCCCCTTCGCAGTCACACCCAGTACAGGAATATTAACCTGTTTCCCATCGACTACGCTTTTCAGCCTCG
CCTTAGGGGTCGACTCACCCTGCCCCGATTAACGTTGGACAGGAACCCTTGGTCTTCCGGCGTGCGGGTTTTTCACCCGC
ATTATCGTTACTTATGTCAGCATTCGCACTTCTGATACCTCCAGCAGCCCTCACAGGCCACCTTCAACGGCTTACAGAAC
GCTCCCCTACCCAACAACGCCTAAGCGTCGCTGCCGCAGCTTCGGTGCATGGTTTAGCCCCGTTACATCTTCCGCGCAGG
CCGACTCGACCAGTGAGCTATTACGCTTTCTTTAAATGATGGCTGCTTCTAAGCCAACATCCTGGCTGTCTATGCCTTCC
CACATCGTTTCCCACTTAACCATGACTTTGGGACCTTAGCTGGCGGTCTGGGTTGTTTCCCTCTTCACGACGGACGTTAG
CACCCGCCGTGTGTCTCCCGTGATAACATTCTTCGGTATTCGGAGTTTGCATCGGTTTGGTAATCCGGGATGGACCCCTA
GCCGAAACAGTGCTCTACCCCCGAAGATGAGTTCACGAGGCGCTACCTAAATAGCTTTCGGGGAGAACCAGCTATCTCCC
GGTTTGATTGGCCTTTCACCCCCAGCCACAAGTCATCCGCTAATTTTTCAACATTAGTCGGTTCGGTCCTCCAGTTAGTG
TTACCCAACCTTCAACCTGCCCATGGCTAGATCACCGGGTTTCGGGTCTATACCTTGCAACTAGACGCCCAGTTAAGACT
CGGTTTCCCTACGGCTCCCCTATTCGGTTAACCTTGCTACAAAATATAAGTCGCTGACCCATTATACAAAAGGTACGCAG
TCACACCACGAAGGTGCTCCCACTGCTTGTACGTACACGGTTTCAGGTTCTATTTCACTCCCCTCGCCGGGGTTCTTTTC
GCCTTTCCCTCACGGTACTGGTTCACTATCGGTCAGTCAGGAGTATTTAGCCTTGGAGGATGGTCCCCCCATATTCAGAC
AGGATGTCACGTGTCCCGCCCTACTCATCGAGTTCACAGCCAGTGTGTTTTTGTGTACGGGACTATCACCCTGTACCGTG
CGACTTTCCAGACGCTTCCACTAACACACCAACTGATTCAGACTCTGGGCTCTTCCCCGTTCGCTCGCCGCTACTGGGGG
AATCTCGGTTGATTTCTTTTCCTCGGGGTACTTAGATGTTTCAGTTCCCCCGGTTCGCCTTGCATGGCTATGTATTCACC
ATGCAATAGTGCAACGAATTGCACTGGGTTTCCCCATTCGGGTATCGTCGGTTATTACGCTTCATATCAGCTTACCGACG
CTTTTCGCAGAT
>ruckeri5 count=11
GTTACGCCCAGCTCGCTCACCCAAGTCTTTCATGTCGAGCCAGATTTTTCGATAACCATAAACCGTACCACTTTCAAGCC
ACAGTTGTTTAATGAGGCCTGTTTGGCGCTCATCTTGCCTGGCGCGCAGAGATTGAGGCCGTTTTAGCCACTGGTAATAA
CCGCTGTAATGGAGGCCAAGAACCAGACACAACCGGCGTACCGCATAAAAGGTCGACATTTTTTTAATGAAGGCGTACTT
CAGCCTGACGTTCTTGCAAAGTACGCGGCGGCCTTTTTTAAAATATCTCGCTCTTCAGTCACCCGCTTAAGTTCTTGCCT
GAGGCGCTTTACTTCGTGCTGGAGGGCATCGTCCTGCTTGCGCTGTGGTTCGGGTTTTTGGTAACGCTTGATCCATGCGT
AAAGGCTATTGGTGGAAACACCAAGTCGTGCCGCAACCTCAGAAACAGGGTATCCCTGCTCGTTAATCTGTTTTACAGCT
TCAATTTTGAATTCTTCAGTGAACGTTTTGACTGACATAAACACTCCTTCATTAGGCCTCCATTATGAGGCAAAAAAGTG
TCTACGAAACCCGGGTCTATTCATCTAGGTTACGCCCAGCTCGCTCACCCAAGTCTTTCATGTCGAGCCAGATTTTTCGA
TAACCATAAACCGTACCACTTTCAAGCCACAGTTGTTTAATGAGGCCTGTTTGGCGCTCATCTTGCCTGGCGCGCAGAGA
TTGAGGCCGTTTTAGCCACTGGTAATAACCGCTGTAATGGAGGCCAAGAACCAGACACAACCGGCGTACCGCATAAAAGG
TCGACATTTTTTTAATGAAGGCGTACTTCAGCCTGACGTTCTTGCAAAGTACGCGGCGGCCTTTTTTAAAATATCTCGCT
CTTCAGTCACCCGCTTAAGTTCTTGCCTGAGGCGCTTTACTTCGTGCTGGAGGGCATCGTCCTGCTTGCGCTGTGGTTCG
GGTTTTTGGTAACGCTTGATCCATGCGTAAAGGCTATTGGTGGAAACACCAAGTCGTGCCGCAACCTCAGAAACAGGGTA
TCCCTGCTCGTTAATCTGTTTTACAGCTTCAATTTTGAATTCTTCAGTGAACGTTTTGACTGACATAAACACTCCTTCAT
TAGGCCTCCATTATGAGGCAAAAAAGTGTCTACGAAACCCGGGTCTATTCATCTAGGTTACGCCCAGCTCGCTCACCCAA
GTCTTTCATGTCGAGCCAGATTTTTCGATAACCATAAACCGTACCACTTTCAAGCCACAGTTGTTTAATGAGGCCTGTTT
GGCGCTCATCTTGCCTGGCGCGCAGAGATTGAGGCCGTTTTAGCCACTGGTAATAACCGCTGTAATGGAGGCCAAGAACC
AGACACAACCGGCGTACCGCATAAAAGGTCGACATTTTTTTAATGAAGGCGTACTTCAGCCTGACGTTCTTGCAAAGTAC
GCGGCGGCCTTTTTTAAAATATCTCGCTCTTCAGTCACCCGCTTAAGTTCTTGCCTGAGGCGCTTTACTTCGTGCTGGAG
GGCATCGTCCTGCTTGCGCTGTGGTTCGGGTTTTTGGTAACGCTTGATCCATGCGTAAAGGCTATTGGTGGAAACACCAA
GTCGTGCCGCAACCTCAGAAACAGGGTATCCCTGCTCGTTAATCTGTTTTACAGCTTCAATTTTGAATTCTTCAGTGAAC
GTTTTGACTGACATAAACACTCCTTCATTAGGCCTCCATTATGAGGCAAAAAAGTGTCTACGAAACCCGGGTCTATTCAT
CTAGGTTACGCCCAGCTCGCTCACCCAAGTCTTTCATGTCGAGCCAGATTTTTCGATAACCATAAACCGTACCACTTTCA
AGCCACAGTTGTTTAATGAGGCCTGTTTGGCGCTCATCTTGCCTGGCGCGCAGAGATTGAGGCCGTTTTAGCCACTGGTA
ATAACCGCTGTAATGGAGGCCAAGAACCAGACACAACCGGCGTACCGCATAAAAGGTCGACATTTTTTTAATGAAGGCGT
ACTTCAGCCTGACGTTCTTGCAAAGTACGCGGCGGCCTTTTTTAAAATATCTCGCTCTTCAGTCACCCGCTTAAGTTCTT
GCCTGAGGCGCTTTACTTCGTGCTGGAGGGCATCGTCCTGCTTGCGCTGTGGTTCGGGTTTTTGGTAACGCTTGATCCAT
GCGTAAAGGCTATTGGTGGAAACACCAAGTCGTGCCGCAACCTCAGAAACAGGGTATCCCTGCTCGTTAATCTGTTTTAC
AGCTTCAATTTTGAATTCTTCAGTGAACGTTTTGACTGACATAAACACTCCTTCATTAGGCCTCCATTATGAGGCAAAAA
AGTGTCTACGAAACCCGGGTCTATTCATCTAGGTTACGCCCAGCTCGCTCACCCAAGTCTTTCATGTCGAGCCAGATTTT
TCGATAACCATAAACCGTACCACTTTCAAGCCACAGTTGTTTAATGAGGCCTGTTTGGCGCTCATCTTGCCTGGCGCGCA
GAGATTGAGGCCGTTTTAGCCACTGGTAATAACCGCTGTAATGGAGGCCAAGAACCAGACACAACCGGCGTACCGCATAA
AAGGTCGACATTTTTTTAATGAAGGCGTACTTCAGCCTGACGTTCTTGCAAAGTACGCGGCGGCCTTTTTTAAAATATCT
CGCTCTTCAGTCACCCGCTTAAGTTCTTGCCTGAGGCGCTTTACTTCGTGCTGGAGGGCATCGTCCTGCTTGCGCTGTGG
TTCGGGTTTTTGGTAACGCTTGATCCATGCGTAAAGGCTATTGGTGGAAACACCAAGTCGTGCCGCAACCTCAGAAACAG
GGTATCCCTGCTCGTTAATCTGTTTTACAGCTTCAATTTTGAATTCTTCAGTGAACGTTTTGACTGACATAAACACTCCT
TCATTAGGCCTCCATTATGAGGCAAAAAAGTGTCTACGAAACCCGGGTCTATTCATCTAGGT


Insignia V0.7
Signatures calculated: Thu Aug 28 2008 14:46:27

Reference Organism:
Yersinia pestis CO92 CO92

Target Organism(s):
Yersinia pestis KIM KIM
Yersinia pestis CO92 CO92
Yersinia pestis biovar Medievalis str. 91001 91001
Yersinia pestis Antiqua Antiqua


Signatures:
Start Stop Sequence
2022 2132 TTATTTAATCAAATTTGCCGAATCTTCTTCCTCGTTTCCCAATATCGGCCAGTCAATATCTGGAGCCAACGAAGTATCAACTCGGTTAAGCAAGACACGGTACTTTTTCAA
2453 2594 AGTCAAAAACTGCGGCAAACCCTTCCGTAGTCGTCGGTGGTTCAATCTCAGTGCAGTTGGCCGGTAACCCAGTAAACGGTGGGATATAGGCATCTCCAACTCCGATATATTCATTTGTCCCCTCCAGCAGATTGTACACTTT
2774 2888 CATCCACCTGACCAAACTGTTCGTAGCACTGACAATGCAGTCAAACTCAGCGCTCGCTTCGAGCGACCTGTAAATTATCTCGACATTTTCGCTATATCCAAATTCACATGTACAA
3403 3562 TGGTTTTGTACCAAGATCTGTACTGGAAGCGGCTGCATTGTGCGCATGTGACTTAATTCCATCTTGCTCCAGTGACAACACTGCACGAGAAGTTGGTTTACCCTTAATCATCCATCCACGCATGTCAGGTAACACACCTGATGGGTATGCGGCCGCCAGC
3545 3697 GGGTATGCGGCCGCCAGCTTCGGATACACACTCTTGTCAAACGTCTGCCCTTGCATGATGGCGAAACCAGACGGGGCCACATCGTTCGGCCACGGAATAGGGGCACCTACTGGATATGACTCAGGCGGTGGTAGAACACTACTGTACAATATA
3903 4043 ATGTTGCTGTGATGCCAACAGGATTCCAGTTCCCTGATCGCAACACTATTTCGCTTATCGCTGCCTGGTCATAAAGCCCCGCGTTGTATCCAGCACCTCCAATAAGAGTAATTAAAGCGGTTGAAGCTCCTTGAGGCATTG
4137 4273 CAGGAATATCTGCCCCATTCTGGGATTTCTGAATTGAGCCAGCCGCCAGGTTAACCGTATTCAACAAGCCAAGATAATCAATTACCCCTTGAGGTGAACGACCAGAAAGCGCAGTTAGTGTTCCATCCAAAGGCTGC
4627 4769 GCCTTCAGCTATTTCCTGAGCTTCGTCTCTAAATCGCTTGGCGTCATCGGCAAAACCTTGAGCACTTGCGCTAGCACTGTTTGCCTTTTCCGCCTCTGATTTAACCGCCGACAGGAGCTGATCAGATGTTTTCGTGGCTGCTT
5501 5630 AACCAGCTGCGCATCGTCTTCGAGTCGGCTGACATACGGGTCATAAGCGCGGTAAAACGTGCGCTGAACTGCGTTAAGTCGCCCTCATAGGTTGTGATGATGCGGCACGGAACTTCCTTCTGGGTCTCCC
5635 5791 GTACGGCTCTGCGAGAATCAGACTTGTGTCGCTAATTACGCGCTTAATTTCGTACAACTTGTTGTCAGGGCCGATGACGATCATGCCCGGCAGAACGCCGTTGGCGGTCACATTCCAGTACGTGCCAGTGCCAGAAAGTGCGTCACTTCCCTGTGTA
6108 6279 GAAGAAATTGTCAGGGAGACCGCTTTCCGCTGCTTTAAGCGAGATATCCAAACGTTCTTTGGAAATGTCATTCGCGTCCCAGCGGTGGCCGTTCCATTCGAAATAAACGGTGCGACACTCCTGACGAAGACGCCAGTCATTAATTACTGCCAGTTTCTCAACTTTCAATCTT
6546 6660 AACACCAGAATGAGATAAGGTTCATTCACGCGGTGATCGTATTTACGCTGTCTATGCCAGGTATAAACATCAACAGTCGCAGACTGACCTGCCGGCAAATTGTAATAACCAAACA
6957 7074 GCAGCAATATTTCCCCTGAACGTACCGTTGTTGAACTCAGGGCTACCATCTTTGGGGATAATCCAGCCTCTACTGCCTGCGGCGTAATCGTTCGACTGAATGACATTACCAATTTTGG
7088 7218 GCCGTCTTTGATATAAGCGCCGTTCATATACGCCACGCTGTTTTCGATGACAAATGGCGTAGTGATCTTCCCGTTTACGGAGTTGACCAGGCCAAAACGGTCTGCCTGCACCAAAAATTGTGATAAACCGG
7295 7486 GGCAGCAGAAACCTTCTTATCGGTATCGGCAATCGCCTTCGCTTGCTGCTGAATGAGAGCCGAGTTGCCATCAACTTCAGCCTTAACCGTATCAATTCTCTTACCGAGAGCACCGTCAGCGTTTGCACGCGCGGTGGCTTCGGATGTGATAGCAGCGTTGATATCTTTGCCGGTCTGAGCCTGGAGATTGGT
7666 7771 TCTGTGCCTTCAGTTGGGTAATTTGCTTCGATAGCGCCTCATCCGCTGTGGCTCGGGCAATCTGCTCTTCAGTAAGTGCAGCGCTGATATCACCTTCAACTTTCGC
7869 7990 GCTTCCAGCTTGGAGATATGCGTCGCAATTGTCTTGTCAGCCTCTACGCGAGCCGTTGTCTCTTCCAGAATCGACGCCCGGATGTCCTCACCAATTTCTGCGCGGATTTCTTCCACTTTTGA
8101 8206 TGGTTTTGGCCAGAGCCTCGATATTGGTCGACGCTTCCGCGCTGGCGCGTTCGACTTCGGCAACGGTCTTTTTCATCTCTTCGACGGCGGCAACACTGTCATCGAC
8275 8391 CAGCAATCAGCGCGGAGGCGTTGTCGAGCGCAGCGTGCGCCGCTTCCACAGCGCCAGCTATGGCTTTGCCCTGCTCGGAAACGGTTTCCTGCAACTCGACCAGAGCCGCATTAGAGT
19576 19698 CGGCCAGCGGCGCAGTGATAGTGATGTTACGACCCTGTACCAGCTCGGAGATGGTGGTCTGCCCCAGCTGGTCTACGGTGACTTTCAGAGTTTCGGTCGCAACTTCAACCTGAACACCGCCTT
43518 43621 CTCAGCAGTCTGGGCTTTCGTCACTGGCAATGGACCGAGATCTGCGCATTCGAGGCTGCCTTACCGGCGCTTATGCTTATCCATTTGCTGGTAAAAATGAGCCA
49756 49907 AGCCTGTCTGGCTGGAATGATGCCGATTCTGTTTCTGGAAATGATGTCCCAGGCATCTACCGGATTGTTGCACCAGTCGAAAGAACCCGCTCCGTTTGGCCCAATCGCAATCACTTTTGTGTAATCTCCAACTTGTGAGCCTTTCCAGCCAT
55259 55378 GGAGCTTATAGAAGGAAGAAGTTAGGGGGTGGGGTCGGTAAGTTGATGAAAACAGGGAGTGTTCTGATAATCTATAGCCGTTTTCTGCGGCCGTAGACGTCAGAAAATAGACAGTGTTGA
55904 56043 GCAGCCATTGCATCAGCGGTCGACAATATCGAATCTGCCGGTATCGGCGCGATTGTAGAATCAGTCGACTCGGCCTTAGTTGGCCTGAGCGATGTAGCTGAAATGACGGGCATGTCCCGACAAGCAATTACCATGCTTAA
59223 59343 TGCCGCCCTGTTCACACAAGGGCGGCTCTCCGTTATCAGGCGGCGTGTGCCGTGTTGGTGGTGTCGTTGTCAGTGCTGTCGGTATCGGAAACCGCGTCGTCAGCACCGGATTCAGCCGCTG
59721 59854 AGGGCTTTCAGCCTTGCCCGTTCGGTCATGAGCATCTCAAACGCTGCGCCATCCTTACCGGACGGGGCGTTCTCCGTCAGGCTGCCGTGGGACACGGTCAGACTGATACGGAACGGGTGCGTCGTGGTCAGACA
60041 60143 GGTTTCTTCCGGCATATCCGCTTTCAGGCGCACACCGCGCTGCACGTACACCTGCCCTTCGTACAGGCTGACCATCACTCCCGCGCCTGACTTCATGTCGTCA
60127 60259 CTGACTTCATGTCGTCAGTCCACGCGCTGGCTTCTGCCGCCTCCTGAATGGCGAGGATGTCCGCTTCGATCGCATCTGACTCGTCGCAAACGCTGTCATATGTATCGTACTGCTCCCGCAGCGCATCAAGACG
60262 60376 GCTGCTGCTCAGGGGTGTACACCGGATCGGGTTCAACAGGCTGCACGAACTCCTTACCGTCTTCGCCGTGGTACCAGATGCGGGCTGCGCGTCCGCGACTCCATGCCCAGCCCTC
60436 60703 GCACGCTGTCTGCCGTGCCGTCGCCCTCGTCCTGGCTGAACAGATCTTCACGGACGTAACCCCCTGCTGCCTCGTAGGCATCGCGCCCGATAAAACGAAATTTGGCGTTGTCGGTGCGCATCTCGGTTTCGGTAATAGCGCGTTTAATCAGGTGTGCGGGCGCGTTCGACCAGCTGGCCTTCACGTTTTCGAATACCTCGACCTGACGGGCAGGATCGTCCTCGAGACAGAGCGCCTGACACTGCTCAACGGTCAGTTCGTCCTGCGC
60854 60966 CATGGCGGCGCGCTGCTCGTTCTCGGCAACCGAGGCGAGGGCCGCAATGTCGTCTGAGACTCGCTTCACCATGACGGTATGATCAGCCGCAAGACGATCTTCCTGCGCCAGCA
61034 61232 AATCAGGTTCTGCAGCAGTCCGAGCGCCTCGATGGAGTCAGCCAGACCGCGAACGCTGTCCACGGAATACGGGATAGTGCGCACGTTCAGCGGGGATTTAGCGAGACTGGACAGCGGAACATACTCGATGACAGCGGCCTCAAGGGCGGCTTTCAGAGCTTCTTCCTGCGCCTTCGTGATTTTTTTGCTGCTCGCTTTT
62431 62536 TCCTCACTTTCCGCCTCGCCGCGGTATGCATCATCAAACGCGTCATAGTCACACTCGCCGGTATACTCCGCCCAGGCGATAAAAGCGGCTTCCCTGCCCTCTTCTT
62615 62721 CCTGGAACATAAATTCTGCGTCCCATTCATCGGCGTGCAGCGCCTGGCAGGCCTCGTAAAAATCCTCCCTGCCGTCAAACTCCGTCAGTTCAAACCATTTGCCAAAA
62784 62897 GCGCATAAAAAATCCTCCATAGTCGATGTTGTCTGAGTTACGGCTTACGCCGTGCCGACACGACAGCCCGACAGGGAGCGAAAAATGCAGGGGCGATGCATGAAGCCGCAGCGC
63112 63227 GGAGGAAGAGCCGTAAAAGCGGAGCCGGCGACATCCTTCGTGGTAGATTCAGGGGGCCGCCGCCCCCTGCTGTTCGCTACACTGGCCGGATTGCTGACAGGGCATTGCGGACAAGA
63523 63638 GCGTGGTCAGCGCCGCCTTTATCCGGCGCGCATCCTCCCATACCTGCCGGGATACATCCCGCACGGCGCGGGACAGGCAATCCGTTACAGCCTGAATCTCCGGCTCCGTCAGCCAT
63893 64030 AGGCCATCCATAATCAGCAATCAAATCGCGCGCACGGCGCATCACAAGGGCGTTTAACGTGGTGGTTTCCTGCATAATGGGTTCCTCTGAACGGGGTCAGTACACGCCGCCCGGGCGGCGTGTTGATTTCATTTTTTC
64021 64144 TCATTTTTTCAGTAATGCGCGAACTTTCGCCGCATCAGCCAGGCGGCGGGTAACGCTTTCCGGGACCGGATGATTGCTGTCACGGGCGGTCCATTCCCTGTGCCATAACGCCTGTTCGGCAGTG
64199 64432 TTCACCTGTTCAATCAGATCCATCACTGCACCGGCAGACACGCCCGACTCAAGCTGTTGTTCTTCCGGTATGGCGTCCATACGCGCCAGTACCGCCCTTGTGTCCTGTTCCGAAACGGCATATTCCATTCCGTCGGCCAGTGCTTCGATATCCGCCGGGGACCAGACAATGAGGTTCAGCGGGGTTTCTGCGGGATACTGCTCACGCAACAAGCGGCAGAGGGTTTCACAGGTT
65350 65536 GGTAGTTCTCCTTTCTCAAATGCGGCGCCAGTTACGCCGTGCCGATACGACAGCCGGGCGAAAAGCGGAGAATGCAGGGGCGGTGCCGCAAGCCGCAGCGCAGTGAAAAAGCCCACGCTGGCTGGACGGTTTGACGGCGAAAATTGAGACGGAAAACGACGGGAAGACAGCGGGTAAAGCCTTTGAA
65519 65707 GCGGGTAAAGCCTTTGAAATAAACTGCGACCGGGTGAGGACACCGGCAGAGCGAACCCTTGCATTCGCAGCGGCCCGGCTACAGTAGCGGCACGGCGTTCACAGCAGAGCGATATGAATGAGGAGCGCAGCGACATCGTATTTTCATCGTTGCTGTCCACAATTTCGGTGGGCAATCCTGTGTATAAAA
66000 66112 GGTCGTGCCCCACAGCCTGAAGAAGATGTTAGCAGCATCCCCCGACACTAAAACAGCATTAAGATGTTAGCCTCGAAAATTGAGTAATTGCCTGATCAAGAGACAGGAGAAGA
66100 66205 AGACAGGAGAAGAGCAAGACTCCGCTGTACGGGTAACTCGACATGATGCAAACGCTGAATACCCGAGTATTGCATATATAATTCGCGGGTATTCCATTCATTTACA
66441 66557 CCATCCTTCCCGACAAGCGCATGGCACAGAATAGATCCGTGATTCTCTTTACGAAGCCGAGCAAAACGACTGTCGAAGTTTTTCTGCGTCAGAGGTTCAATGTCGAGGTACTTCATC
66588 66705 TTCATTCTGTTGAGGTCTGCACCTTCAGCCATCGCCCAAAGAATGTGATGCATATGCTCATCTCGGCAGGTAGCTGCCTCATAACATGATCTGAGCGTTTCCTCAGCAGACAATACAG
66943 67061 CAATCTGTCTCTGACTGGACAGATGTCCGCCAATCAAATCAGATACCGACTGCCCAATCCCGGTGAAGATGATACGAACGGGTACATCCCCATCGCTTAACTGCTTGAGCAAAACACCA
67295 67519 CATTCCTTGGCTAATAACCGATTCGATGACTGTTTCCAGGGTCGAGTCATGGTCACAACTGACGGTAATAGGACGGTTATCCGAAGACTGGATAAGCGAAGCGGCTGTGTGGGCGAGGGAGGTTTTACCTACTCCGCGATCGCCATAAATGAACACATGTCGGCCCGGTGAATGAAGTGCCAGTTGTATCGTCTCCAACTGCTTATCCCGCCCAAAAAGCTTTTC
67517 67655 TTCAGGTGTACGCACTGGCACACTGAGAGTAAGCACTTCATTGAGCCGGGTGGCAAAAGTTGGTAAATCGCAACCTGCAATAGCCATGTCCATTCACTCCGAGAAGATGATTTAGGGGCTATTATACTGATTTCCTGAA
67739 67892 GGGAAAAGTGTGGTTTGCTGGAAGGGGTAGCGTCTGAATGGCATATAAAACGGGAGATGCTTTCGCCTGCCTATACAACTCGCTGGTCTGAACTGCCAATAGCGAAATTATGAGTCACGAAGAGGTTGAAAAGCGTGGTGATTTCAGTGTGAAA
68431 68541 CTTTCGCCGGAGAGAGATTTTCTTTCAACGCAATATCTTCTTTCGTCATGCTGGTTCCGTAGGTAACTTCGAGTCGCTTCCCCAGCTCGCGAAGACTATGTTCTCTGGCAG
68661 68815 ATCTCAATTAGTCCATTAACCTCCCGACCAATAGCCGGAAAGAACTGCTGCAGCTTTATCGTCCGGGAAATATCACTGACGGACTCGCGCGTAAGCATTGTCTGATCACGTCCATTAACCGAAGCATCAACGAACGTTTTTGATTCAACTTCTCC
68944 69095 CTATAACCTGTTTCATTTTGACGCTCCCCTGATATATTCAATTCGGTCAAATACGGCCTTAGTAAACCGTTCAGCTTCAGTACGTGCCTTTTTCAATGCCTCTGCACTTCCCGGGTACGAAACAGGGTTAGCACTAATGATGGTGTCAAAAG
69085 69227 GGTGTCAAAAGACTCACCACATCGTTCGAATCCGTCCAACCTCGGGAGCGATGAGTCTAGGATGTTACTCGCATACACCTCACGAGCCAGACTATGGGACGTTTCGTGATCGCGCTTACTGGTCATTTTTGACATAAATCCAA
71042 71145 TCAGAGCTTGCATCTGAGGCGAAGGGTTGGATAAGCCCTACGGGCGCGTCAGCCTGTCACATTCGGGTATACTTCTGAGATACCGCATTAAAACATGGTATTCC
71277 71385 ATGCACTGAAGAATTTTGCTCGCCTACATGGCCACCGAATTGCAGGCTGGTATGTCGACAATATCTCTGGCGCAACGATGACACGTCCAAAACTCGTGCAACTTTTGGA
71584 71712 TAATGGCATGATGCTGGATATGCTCGCGGCTATATCCAGAAAAGATTATGAGGATCGACGTCGCCGTCAGGCGGAAGGGATCAGCAAAGCGAAAGCGGAAGGTAAATATCGGGGCAGGGTGGCTGATGC
71697 71829 GCAGGGTGGCTGATGCACAAAAACACGAGCTCATCCGAACGCTACGCCTATTGCATGGTAAGTCATTGCGTGAAACAGCCCGTCTCGCAGGGGTATCCAAAATGACGGTGATCCGCGTCTGCCAGCAAGCTGA
72083 72281 ACTTCAGCGATGTGCTTTACAGCCTCGATACCGTGGAAGGCGAGGGCTACGTCCATGTGTTGATCGAGCATCAAAGCTCACCCGATAAGCATATGGCTTTTCGCTTAATTCGTTACGCGATCGCTGCCATGCAACGCCACCTTGAAGCGGGCCATGCTAAGCTGCCGCTGGTGATACCGGTGCTCTTTTACGTGGGTAA
72264 72378 GCTCTTTTACGTGGGTAAACGCAGTCCTTATCCGTACTCCACGCGGTGGTTAGATGAGTTTGACGACCCTGAGTTGGCACATAAACTCTATAGCGGTGCATTTCCACTGGTTGAT
72727 72832 ATAGAGAAAGGCAAACTCGAAGTGGCACGCAGTTTGTTAAAAATGGGCATGCCGATTGAGTCGGTCCAAGAAGCAACCGGCCTTTCTGAAGACGATTTGGCGCAAA
72880 72990 TCTCATGTTCATTTAGATGCGGTGGTACCACCACAAACGGCGGGCCTTTAGCTTGGTGTATCGCTTCTTTCAATACCCGCAGTAGTAGCGCATCATTCCAGTTTGCATGAA
72991 73233 TGTGGATAACCGGTTTTCCCACGGCAAAGCGGGCGGCAATTAACTCACGCAGCGTATCTTCCGCCATGAGCTGCGCGGACGTTTTGGCCGGTTTTGGCTTTTCCGAACGTACCGGTGGCAGCTGTGATAAACGTAAACCACGACGCGACATTGATACGTCCTCCACGTTAATAGGGGTTTGACGCTCAACGGAACGGTTTCACACGTTAAGGGCTTTTTGTCCGATAAATCAAAACATTATAG
74557 74665 TGGTAAAGAAATCATGCAGCAATCTTACTTACGGCAAGAGGAGTTTGCAGGGGCAATGACTGAAGTACTGCGGAATTCGCGTTATCGCTGTGATATATCATCACTTAAT
76303 76534 AGTTTTGACGCCGACATCGGTGGTCGTTTTAGGTAGCGTCCGAGATATTTTACGCATGTGCCATGCGCCCCAAGTCTTCTTAGCGAAGTGGACTTTCTAATAACGACCATACTGGACTGTTCTGGTCAAGTTAAACGACAGGTCGCCTTCAATTACCCAGCGCCCTCGTGTTTCCCCACAATCGGTGAGCTGATCGCATGATGGACGCTGCGTTCAGGCCGGAAGCCATATG
77704 77878 TGGCGGCAGAGCAAACCTTCTTAACGTGATATAGCCACCCTGACGGACAGACTGGCTACGTTGCTGATGGCAGATTATCTGTCTTCACCGCAGGTAATGGCGCTGATACACTATTTATTACAGGCCGGCGAGTCTGCTGACTCCGAAGCCTTTGTTCGCGAACTGGCACAGCGTG
78128 78239 TGAATCGTCATCGATTCGGGCCCTGATGAAAGACGCGCGTGAGGTTCTTGTTCGTTAAGCGGATACGTATATCCTGTACAATAGCCAGATAGGCCTCACGGGGTCTAACCCT
82265 82368 GCATTCTAGATAGTGGGCCGGCGCCGGGACTTAGCTATTTGCGCATACCCAGCAACACCAATCTTAGCTATTTGTGCACGCGCATCAATATCAAAATTAGCTAT
83989 84123 AAACAACAAGGATGGCAGCGTTCGTATATTTATGCCGAGCGAGGATTGAATACAATTAAGAGCCGTTTGACATTGGGGGAAACCTATTCTGATAGCAGTATCTTTGACAGTATCCCGATTAAGGGGATAAAAATT
84628 84735 AAAAAAGAAAGCGGCCAACGTTGGCGCGTTCGATATAATAAGTACTTGCAGAGTGGAACATCGTTAAACATTGCTAGCGAGGAATACGCCACAGAAGGATTTAACAAA
85637 85753 ATAAACCTGTTCCATTTGGTTCTATAGTTACCATTGAAGGGCAATCATCCAGCTCTGGCATTGTCGGAGATAATAGCGGTGTCTATTTGACTGGACTACCTAAAAAATCAAAAATAC
86302 86413 ATATCTCTCCTAAGGTAAACGGTGAGAACCTTGTGGGGGATGACGTCGTCTTGGCTACGGGCAGCCAGGATTTCTTTGTTCGCTCAATTGGTTCCAAAGGCGGTAAACTTGC
87947 88051 TCCGTCAAATAAGAAGAAGGGCGTTGATACACCAATGTCGAAGGTGGCCATTGCACAAGCGTTGAGAAGAATTCTGGAGCGCCCAGAGCTGATGGATTTGGAACC
88473 88705 TTCCAGGACCATCAGCATTAGCCGAACAGGGTCAGCGTCCAGAGCTACGGCCAGCGCTCTTACCTTATCAACGGGGAGAGGGATCTTTCCGCTTTTGATGAGAGACAGGTTGTTGGCGTTTTTATACCCTACTTCCTTGGCAATAGTAGCCTGACTTTTCGGTGAGATAGTGATCAGGGAATCAATATAAGCCGCGTAGCGACCTTGTTTGCTATCTGTTTCGTTGGTAGCCA
403100 403256 AAAAATAATCAATACAGCCCCGACTGGGGCAAACACTATCGTCAGCTCGTAGAACAGGTCCAAGCCGAAAAACCCAACTTCTCTGCGGATACAATTCGGCAGATTTGGTACGAACGTAGCAATGGTATCTCCAGCCTCAGGCAGGGCGGTATGTCCC
403239 403373 GGCAGGGCGGTATGTCCCAGTTTGAATTCGAAACGGCGCAAAAACAGCTGCTCGCTCTCACCCGTCAGATGGCAGATGAGTGTACGCAGGAGAACTACGATCGGGTGATTGAGCGCCTTGTTAAACTTAAAACAG
403546 403651 CCTTGATCTCAAGAAAGTGTTGCATCGAGCATTGGGCGATGATGTCGATCCTATCGAACTGAATATGTCGCTCTGGCATCTCTACACGCAGGTGATTCAGAAGAAG
403884 404029 GACTGATTGCTGAAAAGCGGGTGCGCTTTATCACCTTCCATCAGAGTTTTGGCTATGAAGAGTTTATCGAAGGGCTAAGAGCAGAAACCACCGATGATGGCAATGTACGCTATGAAGTCAAAGCGGGCATTTTTAAGCAAATTTGT
404406 404514 TTAACCGTGGCAACATCTCCAAAATCTTCGGCGAACTGATCACCCTGATAGAAACCTCTAAACGTGCCGGGGAACCGGAGGCGCTTAGCGTGATTCTGCCTTACTCTTC
404949 405115 AGAAACAGTTTGCCGATCTTTTTGGCAATAACGGAACTGAAGATCTGCATGATATAGGTGCCAGTTTCCATCTAGCCCCTGAGGACGACAAGGTTTGGGATAACCCGCTCGCCTGGCAACAAATTTATGCGCCACATAAAGGTAACCCGGTGAGTGGGGAATGAAGT
405098 405276 GTGAGTGGGGAATGAAGTCTCGTAAGGACTTTACCGTCTTCGAATATGGTCACCTGATTAGCGACAAAGACAGTGGCCGCATAAATAATGCGGTCTCGCTGCCTCCGCAGCTGTTTGAGTGGCTTGAGCAACGTTGCCTGAAAGATGACGCGGATAACACTCACCGTCTGCTCACCCTT
405503 405632 TTTTTATTCAGCAGTTTATCGATAGCGTACAGCAGATTGTGCGCCAGGGGCTAAAGCGTGACTATTTACGCCAGCAGGACAATCTGCCTTGGATGAAAGGGAAGTTACGGGTCTCAGCTCAAATAAGCAA
411304 411440 ATCCAGATTATCTTCACGCAGTCCCAGCATCACGTTATACCCCGCCATCAGCAAGGGGTCGCCCATACGTGGCAGCGGTGCGAGATTCATCGGGATTGTCAGTGTCAGGCTGGCATCGTAATCACACCCCAGATAAA
411493 411600 TTTGGCCTCTGCCGTGTCTTCAGTTAACATTTCGACCCCGATGAAGTAGTTAGCATCTGAAGTTTCGCTGCCGAGGACCAGATGATCCCCCAGCCTTTACTACTCTGA
411606 411832 AGAATCTCGCCTGTTTTCCGGCGGGCATCCGCGTGGCCTTATGCGGCTTCCCTGTGACTCGGGTGTTGGGCGCCTGCGTGGGCAGTTCCTTCCGCCGTGCGGGTGGTTCTTACCAGTGTCAGCGACTGCAATAAGCCAGTAAACGCGACGTGGGCACGTCTCCGCTGTCAGTGATGCCGGTCAGGGCCATCAGGCTGCGCGAAGTTTTGTCGGTACTGTCAACGCTA
412181 412289 CAGCAGGGGGCGTACATTAAATTGCAGGAACAAGCCAATGCGTTAGCACATATTCAGGCGTTGAACTTCGAGAGTATTGACCTACCCACTGCGCAGCGCCAACTGGAAG
412280 412381 CAACTGGAAGAATTACAGGCTCGTCTGGATAGGCTGACCCATCCGCAGTCCGATATCGCTATTGCTAAGGCCGCACTGGATGAGGCTGAAGCACGGCAAAAA
412564 412794 TCAGGACATTGAACGTGAGCATGCACGCCAATTGCAAGGCCGACTCAATGAGGTCAATGACCGATTACAACGGTTACGGGAAGAGCTTATCAAGCGGATGTCCGACGCTAAAAAAGAGGATACTGGCGCGCTTTCGGAAGTGGGGCGGGAATTAGACGATATTCCCAACTATCTGGCGCGCCTACAGACCTTGACCGAAGAGGCGCTTCCCGAAAAACTTAAGCGTTTCCA
437241 437342 AATACACATTTTTTGTTTAGCCGCGGCACCCTTTATGGGTGCCGTTTAACGTTAATGGCCTGATGAATCAATCGTACTCATAAATCAATGGTCTGATGAACA
796218 796337 AAGCAAGAAATAGAGACTCCGAAGGCTGTGAATGAATACGTTGTCAAACATTCTGCGGATAGCGGTGTCATGAATATGCAATTGGCCGATAAAGATCTGGCGATGAAGGCAGATAAAAAG
816895 817002 CACCTTGAGTTGTTTCACCATATTTCGTAACATCACCGAAATCACGATCACGCAAAGTATACCCTTCTGCTGGTTTATCAGTATCATGGCTATTTCCATGTGCTGAAT
1047061 1047168 AATTAGAGGAGGGTGGCAACTGAATGATAAATCGTCGAAAACTAATTTGAACCGCATTTATACCTAAGTGACGTCGAGATACAGCTAGGCGGCAACCGAATAACCATT
1131895 1132020 TGCCCAATCTCCTGCCAACCTATTTTGTTGTTCCGGCGGCAACCAGGTGGCAACAATGTTACGGAGACTCGCTGGTGGTAGATTGTCTAATTCGCTCCAGTTAATCCGTGGCCCCTGATAATACGG
1132586 1132685 TTCAGTTTTTGCAGGTTGTCGGGCAGTGTTGCTGGCAACTCAGTTAACTTATTACCCATGAGATCCAGTTCGTTACACGGCGGCAGAATGTCTGGTAATG
1133451 1133559 CCCATGATTCATGACTGCTTTTGCATATTGTTTAAGTCGATTCAGCTAATCCGTGGCCCCTGATAATTCGGCTTAGTACTGAGACGCAGTAGTGCCAGCAGCGTGCTGG
1135403 1135520 AGTTTCTTTTTCGATCTTTGCCCAATCTTCTGCCAACGATTTTTGTTGTTCCGGCGGCACCCAGGTGGCAACAATGTTACGGAGACTCACTGGTGGTAGATTGTCTAATTCGCGGCAG
1135823 1135929 CAATGTAGGCAGTTCAGTTAAGCGACAGTTTGCCATACCAAGAGACTCCAGATTCGGGGGCAAGCGGTTTGGGCGTTGCAGTGACTCATTACGACCAACGTCTAATT
1136289 1136391 TCAGCCGGAGTTAAGGCTGCTGTTGATGTCCGTGCAGAATGGATTTCACGGTCTGGCTCGATGTTCGGCATTGACTCGTTCGACGTTATATTTGATAGATTCA
1235412 1235547 AAAAATCAGTTTAACAGTCCGGTATTCGAAGTCATTCATCGTATGGTCTTAGTGGGATTTAGTGCTAAGGCCATTTAACAAGAAATTATGTACCGTTTTGATAAAATCAGTACCAAGCCTAAATCCTTCCCCCATC
1235808 1236053 CAGTTTCCAGCTTGTCGAGCCAGCGGGTAAATTTCTTCGCAACATCAAAACCCATGCCGCGCATATCATCCCTCACCAGCGATTTTGTACAGGCCTCACCACTGGCGGTACGGGAGCGGATAGATTGCCACAGCGCGATATGATTAGCCGTCAGGCGCTGAATACTCGCCAGTTCCGCCTCATAGGGTGAGCCTTCATCGCTGACTTCACGGCCTTTATCACATAGCACCAATGAGTTAACTTCTT
1236360 1236460 GTGCCAGCGTATCCAGTACAATCAGGCGCACCGGCGAGCCGGTTTGCTCGGTGACATCCTGAGCCGCTTTAATCACCTGAATGGCGCTCTCCGGGCTGGCG
1236443 1236609 GCGCTCTCCGGGCTGGCGGGAAATACCGGGCAGTCGATGCGGTAGAGCGATTCAATCGGTGTGCCGCCGTTAAATTCCATTCCCCACGCGCGGATACGCCGTGGCACGCCGATACCGCCTTCTCCGACGATATAGACGACGGAACCTTGGGTCACACGCCGCGAGGC
1236592 1236727 GGTCACACGCCGCGAGGCCCACGGCTTGCCGGTGGCAATATGACAGGCCCATGAGACGGCCAAAAAGGATTTATACGAGCCGCTTGGCCCGTAAATACTGGCCAGTGTGTTACTTGGCAAGTGGCCCTTAATATTG
1236927 1237096 GGTGGGTGGCAGGGTCACTAAGCCATTGACTGCAAGAGCGGCGTATTCCGCCTGTTGTTTGCCGGGATTGTTGTCAGAGTTTGTTACGTCGTTATCCGCCGCGATAATAATCTGCGCGTCGGGATAACGTTCACGCAGTGCCACAGCAACCTGGGTCAGATTAGTGGCTG
1237201 1237303 GGGATAAATGAAGATTTAATCCGCGAGCCGGGCAGTAAACGCTTTTCTCCCCGGCGATTGATTAACAGCGCACCGGTAGTGTTGCCACTCAGATCCACCAGTG
1237923 1238054 GATCAACAATTTCTTCAATCGTGAGCGGGGAAGAGTTATGCATAGCACACCCCCGGATGGGTACGAACGTATTGCGGGTACAGACAGTGAAGGGTAGTAGCCATTGGGCCGCCTCCAAGAAGTAGTTGTAAT
1238041 1238291 GAAGTAGTTGTAATGACTACCACCTGAGACGCTAATCCCTTGGGTGGTAGCCCAGACGGAGTTAGCGTTACCGGACTTCTTGGAAACCGGCGCTCCTTTCGGAGCCCCCGCCTGGGCCACCATTGACGGGTGTGCGCAAGCCATACCTATAAAAGACCATGTCTGAATGACGAGGCTAGGTATCAAGAAGATTTTCGGAACGCTAATTCCGGCTACGGATTTTGCCGCAGCGAAAGGATTATACTGCAAAG
1238657 1238758 GCCCAGGCATAGCGCAGGCTGTGGGGGGAGTAATGTCCGGTCAGCCCTAAGCGGGTGGTATGGGAGCGCCAGAAATTCATCGCGGTTTTCAGGTCAGGTTTA
1239272 1239379 ACCTGAATATTAAGGGCCAGCAGATGGCGGCTAAACCTGTCCATAATCCGAATACGATCATGAACGGTTTTATGGCTACCGCCAGCTTGTTTTGCCAGCATTTTCATC
1239698 1239839 TGCTGTGGCGTCACTTTCACTCGTAAAAGTGACGCCACGAGCAACACGCAACAGGGCAGGCGCTAGCAGATGCCTTTCGGCATAGCAGAGCGCAGCCAGCGGCGTTTAGCACGCGCCAGAGATTGAGGTATAAGGAGTCTTT
1239860 1239963 ATTTAAGTGTGCAAAAACAGCGCCGTCATGGTAGACCACCGGTTTAAGAGGCCGTTGCTCTACTGAGGCGAGTCGTTCGCTTAGTTGATAACCAATGACAGTAT
1239946 1240046 GATAACCAATGACAGTATCCCCGATGGGCAATACCGGTTGGTCAGGAAATGCTGGTAATTAGCCATTTCGTTAGTGTGATAAGAGTACCTTTATCACCTTA
1240040 1240154 CACCTTAGTGGGGTGACACCCGTTTTCAAAGGTTCAACGGGTTTACCGCTCTTACAGCGCTACTTTTCAGGTTTCGGTAGCTATTTTGTGGTCGCTCGAAAGCGTCAGTCTCAGA
1240228 1240331 CTGATACTGGAAACATTGGGTGCTGTTATCGACAGCGAGTTTACATGGGGGAAATTACGGCAAAAACTGAATAGCGGCAAGTTATTGGAAGCTGTAATAAGGAA
1409939 1410123 TATCTCTGGTGGAATTCAGGGACGGAATGAAGCAACATCGGCTAGACTTTTCAGCTCTAACGCCACTGGAGTTTTTCGCACTGACGGTCAATTTGGATCTTATGCTGCTAGCGCAGATGTGGCAGTAGGCGTTACGGATGATAGACTCGCCGAACTTTTTTTTGACGCTTCCAGATCTGTACCCA
1414265 1414376 TATGTATAGGTATCCAGTGATGATGAACCGATCTCTATACAATAACGACTTTCATCGCTCATTAACGATGGGGCCTCCTTGTCTAAGCGGGCCTTATAGCCATTATAAGTAA
1414744 1414849 AAAATCTCTTTTAATGAGTACCAGATTACCAACATCATTGGTTCCGCTATCGTCGAGCGGTAGCTTATGGTGCACATCCCAACCTTTCGGGGCCTTGCCATTTGCC
1901237 1901347 ATCGATTTCACAATAAAGATTGCGTTGTTCGATAGCGTGTCTCACCCGTGCAAGATATGTTGGGTCATCGCTACTCAACACACCAAATTCAGGGTGAAGGCCCTCATCTGG
1902075 1902185 GAAATGAGAGTTTTACACAGGATGTCAATAACCTGGTCCAAAGCCACCTTGAACCGGTTAAGAGTATGGTTTATGCGAATAAAGACATCCCTAGTAAGAACAAAAATTTCG
1902915 1903025 TTGTTGTGGCGTTTTCTACTATTGCTTTACCACCAATGGCATTAGCAGCATGCACCAGCCCCGGGGTGGGAACCTATGTCTGCGAAGGTGAAAATACCGACGGCATCATCC
1904010 1904178 TTGTTGCTGAAAATGGCCTCGGTTTCATGGTGATGGGGGGCGCTTCCGAAGGCAACTCCACTATGATCATCAATGCGAATAACATTAGTTCTGGTAGTCAAGCACTCAATATTTACAACTATAGCGGTTTAGGTTCGGCATTTACTGCCGTTACCGCAACCGGGCATCT
2064702 2064801 GTTACGAGAGGTCAGAAAAGGGGGGGTAAACCCCGCCGAAACGGGGCGATAAGCAGCGACAGGAGAAGGGTTACGCCAATTTCTGTAATTCTTTACGTGC
2064792 2064921 CTTTACGTGCTGCTTCGGCTAGAAAATTGCTTCGGCTGGCATACTCGGGACGCACTTTAACAATAGTATCGATTTGATGAATCAGCCGGTGCGGAAGAGTGACGTTAATTCGTTCAACTTTTCCATCATA
2064907 2065047 AACTTTTCCATCATATTTCGACATATCAATATTCACGTTATACCACTGTCCGCCATCAGCGTAATCGCATGGGTTACGGTAAACGTGGTAAGGCATGTCATGAGCTTCAGGAATTTCGATACCTTCAGACACCAAAGCTTC
2065108 2065258 AAAAATACAGCCTTTGACATCGGGGAAATAACCGCTTGCCGAGCCATCATTATCAACGTGAACATAGGCGGGATAGATAGCCATATAAAACTCCTGCGTTGCTTAATAGCACCCTAGAGGGCGCTTTGTTAACTAAGGTTTGCATCTTTCA
2135213 2135411 TATCCAGCACCAGCTCTCATGAAAGCGTCGGTTTCATTGACCAGCCCATCACGTTATGGGCATAGAGGTTGATAATCGACACTTTGACGTGTCACCGCTTCCCGCATTGCATTGATATCTGAGGCAGGTAGGTATCGCTTGGCGCAGAGCTTCGCTACCACTGAAATCCAAGACCCGCATAATGAGGAAAAGCGAATCT
2278912 2279020 TTCAGTCTAATAGGGGGGATGGATATCAAACGTTGGTAGCGTTTGACTGGGCTTTGACCGTGTTATCCGATCCTGACTATCAATGGTTAGAGACAGATTCAGTGAAATG
2279003 2279122 GACAGATTCAGTGAAATGGCTGGTCGATGAGGTTGTGATAGGGCGGACTGACGGTACAAGAATGTGTTGGCGGCAACCGCGAAACATCTGACCACGTATCCTACCACCTGGCGTTTATGT
2279137 2279278 TTATGTCATATGGTTATTACCGCTACTAACGTCTCATGGGCAGGATGCCGTGGCCATTGAGCATGATTGCTCGGTACACGGTAAAAGCGTGGCGCACCAGCTGAGTGCCTCGATAAAACCGGCAGAAGACCCTCAACAAGCT
2279261 2279414 GAAGACCCTCAACAAGCTGCGACTACCTTTTTTGCCGGTAGTTTCTCGTATGCGCTGGAGCGGGCAGCCAACTGGGCACACGACTCTCCTGGAGGTGCTCGTCTGATCTTGGCGGATGATGCGTTTTCTGCAGAACTGGTGCCCGCACTGCTGG
2279435 2279547 TCAAGACAGATTTATGTAGCGGCGATAGCCCCGCAACCCATAGGCGAAAAAGCTTCCACGCTGGCTTCTGCTTGGCACCAACGCTGGGGGGGAACGCCTGGAGGCTATTTTTG
2279780 2279961 TATTTTTCGGTAAGGGAGCACAACATGAAACCGATTATGGTCGTCAGCGTGTGGCGCGCTGGATAGCGAACGTATCGGCCCCGCTGAGTGGGAAAATAGAGGTAACCGGCCATGTGCGCCGCTTTATCCCTGAAAAATTGCGCAGTGCTGAGCCGATTTTTTATCTGCATGGTGGTGGACTG
2279974 2280095 ACCGAGGTATTCAGCGCCTTTTTAAGCACCCTGGCGCATATCAGTGGACGGGAGATCCGCGCATTTGATTATCCAAAAGCGCCGGAAACGCCAATGGCCGAGCTCGTCGGCCATCTGGAACA
2280639 2280743 AGAACAGCGAAAATGTCGTTGTGGTCGATGGACGCAGTAAAGAGGTGTACCAGAGGGAACATATTCCGGGAGCTATCAACCAGCCTCACAAAGAAATCTGTTTTA
2280913 2281018 TATCGAGGGGAAAGAAGGTACACCGATTTCCTGCGGTTGCTGAGGATCCGGATATAGACCTGTCCTGATATAGGTTGACACTTTTTTGCCTTAAAACGCCTGATGA
2281176 2281347 CATTCTGATAACAGTCATATCCGTCCGTCATCGAGCACGTGACACCATACCGTTGATACAACGCCTGATATTCCTGCGAGCAGTACTGGATACCCCAGTCGGAGTGATGTATCAACGCAGAGGTCGTGCGTCTGCTCACCAGCGCCATTTTAAGCGCGGCGGCCACATGCCG
2281737 2281853 TGTAGAAGAGGGGGAAGACCAGTCGAGTTGACCGTACTTGCGTAGCCACTTCAAGACGGTAGAGCAGCCTTGAATACCGTAACGGTCTTGCGCCTGGCGGTAAGTCATTTCACCTTT
2322326 2322442 AGCCATACCAATGATAAAAGCAGTAGCGTTACATCTAAACTCTCGAAGGTCACGCTCAAGAAGTTCGAGAAACGGACTCAAGAAAAGTTAGACTCAGGTATTAAAGACATTGGGTTA
2322672 2322778 CTGAACAATCCGGTATTTTCGGTATTGATGAGACGAATATTTCGGTAAAAGGGACCACCAAACTACACGGCGCGGAAATAAGTAGTGGATCAGGGCAGCTCACCTTG
2363133 2363263 CTGTGAAGATGTGTGCCCCATCTGGTTGGCGACAAAATTCGGATTGGCTCCTGCACTCAGCGCCCAACATGCAAAAGTGTGCCGGGTCTCGTATGCCTTCCGATGGCGGATACCTGCTTTCTTCAATGCAA
2363302 2363409 TCTCCTCCTTTCCCATTGCGAGCTGAAAGGCGAGGGGAGAACACGAAGGTACAATCATCTTTGCGGATTTTGCCAAATTCGCGAAGGTGGACATCTACCTTTATTTTT
2363707 2363842 GATTTGCTCTTTTTCAACGGTGAAATACCGACGAAAGGCGATCGCTCTATGTACCCATTCGCAACAGCGAACTCCATCATCGCGGCAAAACAAGAAAGGTAGACGTTTACCGTTCTGACAGACCGACCTTTTTTCA
2854499 2854650 TGATAGCAAATTCACAAAGCGATTCTCTACGGTAGTCATCAAGCCTATATTCTGGTTTGATGTTTCCACGTGAAACGGGTTGATGTGTATTTGTCCACGCCTGTTCTACAATCAGCAAACTCAGCGCGACACTAGTGTATGATTTTTCCAAA
3224277 3224393 AAAGAAGCTCAACAAGCTAAAACCGAACAAGACGGAAAAGAACTGGCGTCCGGCGGGTATAAAGAACTCTTCGAGGAGGATAAGAAAACTAGCGGTTACTTTGGCATTGCGGAAAAT
3839315 3839432 GCACAGGGTCTCTTCACTTATTCGCCCATCAATCTCAGTAAAGGGTCGTAACTTGCCATTAACACGCACCTCGAATTCAGGATCGTGGATGAAGCGGGCGGATAATACAGTGAGTATT
3839761 3839898 TACCATGACCATCGTCTGTCACAGTCAATATTCCACCTATATCGGTAGGTAAAACCAAGTCTACTCGTGACGCTCCGGCATCCCAGGCGTTAGCAACTAACTCTGTTAAAGCGACCTGTGGAACATGTGCTACCTGCC
3840121 3840277 ACACTGGAGTCCAAACTTGCGGCAAAGAAGTTCTGTTTGGCATAGTGCACCCGCTTCAGGGGAAACGGTTTGTCCTGCAGGCTCCGGTTAACATCTGCTGGTAAGAGTAAAATTGACGCAACATGATTGCGCCAATCTTGAAATTCCTGCTCAGTAC
3840600 3840851 AGACACACTTGAGTATCCAACCCGGACATAATTCACTACTCGACGCATCAACCAGATGTCCAAATAAGTAGCTGTGACCGCCAGTTTGCGATCCACTGTCTCGTCGTCGTCAGTTTCTATCAACGATGCTAGGAGCACCGTATTCTGCCAAGTAAAATCATTGTGAGCGTTGTAGTAGACCGCTCGTAGCTTCGGTGTGTAATGCTTGCTGGTATGTAGGATACGTCGGTACGCTTTGGCAAAAAATGGAAA
3840870 3841074 CTGATTTTCCTGAGCTTTACCCAAACCCAGGCGAGAGTTGTTTTCTCGCACCCAGCGGTGAAAGACTGAACCGATCAACTCCCAATCTTTATCGACCGAGCCCGCCTTACGTTCACGGATACTTTCTGCGTACTGTGCCCGTAACCACGCCTTAATACAGTTGGCATCACGCTCCGGTTCGTGCCCCCCTCCCCAAGAGATCAAT
4034077 4034176 TGTTTTTTGAAAAAGAAACCGGGGGAATGTATGAGGGTCAGGAACGCTATAAAAACACTATGCTAAGGATGGTGTACCCGGTTAATGTGATTAATCATCA
4101318 4101439 GCTGGCGATTAGCTGACCGGAGGCGTTATAGCGGTTGCGGCTGATCAGTTCGTCAACGGTTTCATTATGATCGCGGCGGTTATAAGCCAGTGTACGAATAGCTAACCCCCGGTTATCGTGGA
4103744 4103860 TTTCGCCAAACCAGTCAACTGGGCCTTTGGGATTCGTCAGTAGACGGCGCTGTTGTTGCAGTGGTACGCCAATAATCGACTGGTTAATCAGTGCCTGCAAACCGGCGCTGTCGTAAT
4104174 4104289 GCTGGCGATTAGCTGACCGGAGGCGTTATAGCGGTTGCGGCTGATCAGTTCGCCAATGGTTTCATTATGATCGCGGCGGTTATAAGCCAGTGTGCGAATAGCGAACCCCCGGTTAT
4105999 4106157 CGTACTCGGTACGGCCAGAGACCGCCCAGCGTTGATCGACTTCGAGTTGCACCAGATTGCCTTTGTCATCAACTAAGAGGCCGTTTTCCGGATCGAATGAGTAAGAGGGTCCTGTCTCCACCCGTCGTGCGCTGAGTAATACGCGGCCAAAGCCGTCAC
4316319 4316427 CTAAATTGTCCGAAGCGGCCAGTGCGCAAGCCGGTGAAAATGTTAGCTTTGGTTAAAGGCAGTTGGATCAAAAAAGGACACGATTTTTACGTGTCAGACTGCTGACAAA
4433246 4433369 TTATGGGCACCTACTTTGGTCTTAACATTGATGATTCTATCGCGAATACTCGTGCTATCGGCGCAGTCTTGGGTGGAATGCTAGGGGGGCCATCGGTCGGGTTTCTGGTAGGGCTAACCGGAGG
4435970 4436095 TTGACGGCATCCCCGTTATTATCGGTCAGTACCGTTACCTTACTCAGTTGCGCACCACGGTTGGCGGTAAATGTCACGGCCATATCGGCCACCGAATTGCCCGTACTATCGACAACATGCGCCCGG
4436166 4436270 TATTTTTGTTCGCGATAAATGCGGCACCAAACGTGTCCGTATAGTTATTCGATACCAAGGTCCCCGTAATAGGGTGAACCCCAGCCTGAGTACTCGTGACAGATA
4437762 4437872 TATGTGTAAAGGTTGTGCTAAACGTCCCACTGCTATCCGTCATCCCTGATGTTGGTGTCAGTAGTGCGCCATTATCCGAGGTATAACCCACAAACATACTCGGCACCGGGT
4438176 4438283 CCTTCGCATCGGTGACCCGTGCATGGACCTCATTTTTATCCAGGTTATCCGCGACCGCGTCATCAACAATAACCATCAAATCACTTGCAGCAATGTGTGCCGTCGTCA
4446000 4446101 TGTCCAAAAAAAACACGAACCAGAGTTCCAAGATGGATGGCTTTCATTCAGCGTCAGCCTTGATGCTGTAAGGCAAGATTCTGCACATCATGCAAAAAATCT
4446150 4446254 GTGTTAAGCTCATCATCAAGTGTATTGGGATACGCGGCTGTAAATTCAGCTATGTCTGAAATCAGCGTTTCAATCTGCGAAGGCGAGTTGTCTTGAAGATAAAAA
4447016 4447137 ATTTATTAGGTAGAAGTATGTGCGACAGCATACGATAAACAAAGCCCGCACTCTGGCGGGCTTACGGGTCTTCCACGACACTATGCGACGCTAAAAAATAGTCTACAGCAGAATTCGCAGCA
4456102 4456207 CTTATTTACCTGTTTGCAACATTATCCGTTTGCAACATTACCCACTTGCAACATTATCCGTTTGCAACAGTTACGCGTCTGCTGGCGTTTTCCTCGTGAAAAATAA
4534172 4534314 GGATGAAAAGCTTCATCGAAAAGGAGCGCTTAGGAATTGGGCCAACATGTGGAGCTAACTTGTTGATTACGGAAGATATAATGCAAGAAGTTAACGGGAAAGTGATTGTCACTCTATTCTTTGATAGTGCATGGAAATATAAA
4544105 4544213 CAACGTGACCGATGCGGAGGCATTCCCGGGGCTTATCCGCCAGACACATCGGAAAATCAGGGTCGCTTCAGCCAATGGGGCTTATGACACAAAACAGTGCCACGACGAG
4544284 4544393 TATATCACTGGCGCTAACAATGGTCATCGAGTATACCGACAGAAATCAGGCTGTGGCGAGGCAACGGCTGACGGGGAGCAACGCTTACTGGAAATGGAACACGGCTTATA
4544409 4544523 CTGAAACGGCGATGTATCGGGTCAAACAACTGTTTGGTAGGCATCTTACACTGCGTGACTACGATGCCCTGATCGGGGAGACTATAGCGATGATCCGTGCCTTAAACAAAATGAC
4545192 4545295 AGTGCATGATAAAGTCAGAGCGCGTAGGCGGGGGAGTATTGCGCGCGCAATGGTTTTAATCAACTTTAGGTCGCATAACTGCCTGCTCTAGAAACTCATAATAT
4546729 4546852 TCTTATCAACGTGAGCCAATTGTAGCTCAATCCATGCACTATCATAGCCCTGCTCATGCAATATCGTTGACATGGTATGTCTAAATCCGTGACCCGTCGCACGGCCTTTATAACCCAGTAATTC
4547013 4547177 TCTGCTGGAATCTCCCACAAGCCGCCCTCAAGATCGATTTCCTGCCAAGTAGCAAAACGCATTTCCTTAGTTCGCACACCCGTTAGCATAAGTATCTTTGCCGCGTTCTTGGTTATGATGCTGCCGGTGTAACCCTCTAGATCACGAATAAAATATGGCATCTCT


403546 403651 NS2456_Yersinia_fredriksenii_ATCC33641
403884 404029 NS2456_Yersinia_fredriksenii_ATCC33641
404406 404514 NS2456_Yersinia_fredriksenii_ATCC33641
404949 405115 NS2456_Yersinia_fredriksenii_ATCC33641
405098 405276 NS2456_Yersinia_fredriksenii_ATCC33641
405503 405632 NS2456_Yersinia_fredriksenii_ATCC33641
1235808 1236053 NS2255_Yersinia_aldovae_ATCC35236
1236927 1237096 NS2255_Yersinia_aldovae_ATCC35236
4546729 4546852 NS2456_Yersinia_fredriksenii_ATCC33641 NS2458_Yersinia_mollaretii_ATCC43969 NS3669_Yersinia_kristensenii_ATCC33638
4547013 4547177 NS2456_Yersinia_fredriksenii_ATCC33641 NS2458_Yersinia_mollaretii_ATCC43969
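
The rows above appear to pair a coordinate interval with one or more genome identifiers. Some intervals for the same identifier overlap (for example 404949-405115 and 405098-405276). The following is a minimal sketch, assuming whitespace-separated rows and inclusive coordinates, of how such rows could be grouped per genome and overlapping intervals merged. The function names are hypothetical and are not part of Insignia.

```python
from collections import defaultdict


def parse_shared_row(line):
    """Split a row into (start, stop, [genome identifiers])."""
    fields = line.split()
    return int(fields[0]), int(fields[1]), fields[2:]


def merge_overlapping(intervals):
    """Merge inclusive (start, stop) intervals that share at least one base."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def intervals_by_genome(lines):
    """Group intervals by genome identifier, then merge overlaps per genome."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        start, stop, genomes = parse_shared_row(line)
        for genome in genomes:
            grouped[genome].append((start, stop))
    return {genome: merge_overlapping(ivs) for genome, ivs in grouped.items()}


# Rows copied from the table above.
rows = [
    "404406 404514 NS2456_Yersinia_fredriksenii_ATCC33641",
    "404949 405115 NS2456_Yersinia_fredriksenii_ATCC33641",
    "405098 405276 NS2456_Yersinia_fredriksenii_ATCC33641",
    "4547013 4547177 NS2456_Yersinia_fredriksenii_ATCC33641 NS2458_Yersinia_mollaretii_ATCC43969",
]
result = intervals_by_genome(rows)
print(result["NS2456_Yersinia_fredriksenii_ATCC33641"])
```

Merging only on true overlap (not adjacency) keeps neighbouring but distinct intervals separate; whether that is the right choice depends on how the intervals are used downstream.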


Insignia V0.7
Signatures calculated: Wed Oct 15 2008 10:06:16

Reference Organism:
Yersinia enterocolitica subsp. enterocolitica 8081

Target Organism(s):


Signatures:
Start Stop Sequence Contig ID Signature Type
2756869 2756968 TTCCGGCCACCACTGCGGGGCTAAACTGCGAACGAAAACTTTCGCCGAGGCGATAATCCATTTGTAAGTTCCATTGTATTTCATGCTGCCGACTTTTTCC NC_008800
405895 405994 CTCGTACCTTCACTAACGCATTGTAATCTCACCACAGCCCTCCTTGATAAAGGGGGGGCTTGTGTCGCTATTCGGGAACATCCAAACGGCAGAAAGCAAA NC_008800
4285088 4285187 TTAGCTTTACCACTGACCCTTGTTTATCCCCAGAGAGCCCTTTGTTACATAAACAGTTGTGGGGTCAAGCACTTAGCGGCCAACAATGGATTATCCGCAG NC_008800
3345448 3345547 GATACAAGCAATCTTCTAATCCGAGTGCTATGTTAGCAATAGCCCGAGTAATTTTACGAGTAACAAACGTTTCCCCACGGCGAGGAGATTCATGGTTGAA NC_008800
1685163 1685262 CTTCACCATCAGCTAAACAACGCACGGTAAGCTGGATAAATGGGCTTAAACCGGGTGTGGAAGAGAGGGTATAAGCACGTAGAGTTTCATCACTGTTACG NC_008800
3861299 3861398 CCGTCACCAATGGTGTGGTAACCGGTCGGCGCAAATTGACGCTAACCGGCGGATTATCAACCCGACACAGAGTGGTGAATAGAGGTTCTGATGATGCAGT NC_008800
3019065 3019164 TGCAGTTGTTTAACCGCAGTGTAGCGTTCATAGCGCTCTAAGCCTTTAACATCGCTGGAGCTACGCGTAACAGCCTGCTGAATAAGTTGCTGAATGTTCA NC_008800
2331322 2331421 ATGCCTGGCGTCGGGATAGCGCAAACTTCTAACCGCCACAGGGTTAATGTGACATTGTCACGTCATCCGCACACAAACCTGCCTCCGCGATCGTCTGATC NC_008800
2453761 2453860 TAACCAAATCGACACCATCGTTTGGCGATATAAGAGAAACTCAGTGGTAAACTACGGTGCCTGAATCCTATGGGGATTAGGTTGGCTACAAAACAAAAAC NC_008800
2060704 2060803 TATTGAACCAGCGAATGTTGTTGCTCATAGTGATATAGCACCGCTGAGAAAGTCTGACCCTGGCCCGTTATTCCCCTGGAAACGTTTAGCGACACAGGGT NC_008800
191056 191155 GATTCTCTATCCGTTGGCAAACCACTACCGGTATATTTCAGCCTTAATGTTTGGCCCGCATTCGCTTCAATACGGAAGAAAGGGGGAGTGACAATAAAGG NC_008800
4410717 4410816 GTCGCCGTTCAAGAGAGTTTCACTCTTGAGGCACAAGCTAAATAACGCCACCTAAGCCTTTACGCCACATCAGGGAAACATAACCCCAGGCTGACATAAG NC_008800
134843 134942 CGGTGCCTGGCTGTTATCCCACCGCATCCCAACTGGCGCTTAAGCCGTTGGTCGATGGTGGTTTGCTCAATAATGCGCAGTGGCCCGTGATTAACGCCGT NC_008800
382755 382854 GTGAAAGCGGTCACACTTGAGCAGGTTAATGTCTTAGCTGCCAAATGGTTAGGGCGCAATCCAAAAGTGTTTAGCCTGAGTCCGGCGAAAATGTAATTTT NC_008800
1601827 1601926 GGCGCAACAAGCCACGTTTAATCGGGAAAGCCACTTGTAGCCCTTTGACTTCAAGTAAGGGTTGCAGGCCCGTCTGGAGTGGAAGTGGTTCGCCCTGTGG NC_008800
4515499 4515598 ACCCGGCGGTACATCGGGCACAGGACGTGATGTCAGCAGAGCCACAAGCGGAGTGGTCACTACTTCAAGTGGCGCACAAAGCACATGTCAGCAGCCGTCA NC_008800
259291 259390 CGAGCAAGGCGCAAAGGGCACTCTACGCCTTCAAGGGACTCGCTATATTAGCTTTGTCCCAAATTCAGTCGCTAAATAACCTGTCGCCGTAACAGAAATA NC_008800
4354637 4354736 TGCTCGTAAGGCTGTATTGGTTGATAGCGTAATATGAGGTAACAGTAAGGGCTTAGAATCATCCATACGTGTTCGCTCCATGGTGTAACGTTATATGAAC NC_008800
3471797 3471896 AACAATCACAATAGCGTGTTTACATTAGGGAAACATCATGATCGTACTCAGAGAGATGCGTGCTGAGGAACTTAGCGACTATCGGAATTTATTTATTACC NC_008800
2091484 2091583 GTTTGTCGTGCCAATTTCTTATTTATGGCGCCATTTTATGACATGAGAATTTTCGAGTATGAGCGATTAACTTCCTGAAACGGTAGACTTCTTTGCCAAT NC_008800
2403913 2404012 CACCAGAAAGTTGTGACAACATGCGGGGTAACTTATCGGTTAAGCGTAATCTCTCACAGAGATAGCCAATGGTAGTAGCAATGGCGTCCAAGGAAGCCCC NC_008800
1005649 1005749 GAAATGGCCGCTTAACCAGACTGACCCGCTCACCTGTGTAGTGCCGCTACTCAATGCCCTGGCTTATCAGCGTGATGTAACACAGTTTGCCGGCGAACCGC NC_008800
387230 387330 AGTTAGAGCAAGTTTACCAAGGGGGTTGGAATAGCGACCGTACTCTGATGAATCGCTATTATTCTGGTGAAGACTCCTCGCTTAATGCGTTTGCGGTGGAT NC_008800
272230 272330 GATTATCAGACAGTAACTTTAGGCTGGCATGATCCGTACCCCAGCTTATATTAGCAGGGGAGAACTCCTTAACTGGAGAGTAGACAACCCCTGAGCGGCGG NC_008800
3102197 3102297 TTTATCATCGATAGGGATGGATATGCCCGAAATAGTTATCAGGTTAAGTGGGGGATAAAAGCTGCGGGTAACAATATCACTGATGGAATTAATAGCGGTCA NC_008800
2179856 2179956 TCAGAACTCATATTTGATAGCGAAGGGAGTACATCGGCAGAGGAAGACAGTGTCGGTGTCATGGTGCGAAAGTTCCTGTTAGTGGTGATAAAACGGCGGCA NC_008800
4190436 4190536 TTGAACGTTTTGACCGACGACTATCGAATGATAGCCGCTGGATTGTCAGGTTACCGCAAGAGGATATGTGCCAAGCGACTGGGACTTCCCCCTTAAAAAAA NC_008800
1835939 1836039 ATCAATTGCGTCATTGCCTGGCGGTATTACGCAAGAAAGCCGCGATTTAAAATGCGGCTAGCCGTTTTAAAGCTATCCGTATTTTAAAGCTATCCGCATTT NC_008800
3146988 3147088 ATTCCTGCCCGCTATTTTGTCAGTAGCCTGCAAACTCTATTTTTGGCGGGGAATATCGGTACCGTTTTGGTGATTAACCTATTGTTCCTGATCGCATCAGC NC_008800
2123087 2123187 AAAAATTGAGTACAAAAGTCATCCTATTGAGCGCGCTGACTCTTTGCATCACGTTCTCGCTACCTTACGCTAATGCCGAACGCTATCCGGATGTACCTGTT NC_008800
2039376 2039476 AGTAATTTAATAAGTGACGGTCACTAATAGCGTATCGGTGTAATTCCCCGCGGTGGGGAATGAACTACTGGCAAACAATTGCGCATATAAAACGATCTGCT NC_008800
3960142 3960243 CATCGCCTTCGCTCAATAAAACTCTCTGACATTTAGCCTCACGGGATGCACCACCAAATTGGAATACCGCTGGCAGGCCCAACGAAACGGAGACAATGGGCT NC_008800
3275682 3275783 AAGACCCACAAAACTTCAGTCAAGTGGGTCTGGCGGCATGGTACGGTGAAGAGGCTAATGGCAACTCTACAGCAACGGGCGAAATCTTTGATCCCAATGCAT NC_008800
3909143 3909244 CATTATGCTGTGCGATAAGTTGCCGGGCTGTGACGTGTTGTCTATCCGCAGTAGTATGGGCATCAGAGGCAATGGTCAGAGCGTATCCCTTACTGGCTGCAA NC_008800
4425540 4425641 ACCATTGGCCGCACCTTCAGCTGAACCACGGCGTTCGTGCAAACTGCCTAAAGTGGTAAAACCATAGAATGCATTTAACCACGGCGCGGCAATATATCCTGG NC_008800
369784 369885 CATCCATACCGGGTAGCCCTAAATCTATTATGACAATGTCTGGCTCGGTTTGGCGGCATAAGTTATACACATCCAACCCATTCTCCGCTTCACCGACAATCT NC_008800
3660534 3660635 TCCCCAAAGACTCTAAATCTAATTTGAGGCTATGTTTGTCTAGCAACCGCATGACAACACGTTCACCATGATTTACTGGTAATATCGAGACTCGAACATCTA NC_008800
2633962 2634063 TTGTTCCATGGGTCTTCCCCGATTATGGTGGGAAAAATTCGCGCCTACTGTCTTGTTAGGGGTGAAACACTTATATGATAGGCTCTGTTTCTTGCTATTATT NC_008800
619641 619742 ATCAAATGAATTTATAACTGAAGCGTGTATATCAAGCTGGCTGGCTAATACTGATAACTCCGGTAACACCGGATCACTGACGATAACGGCATGAATATCATC NC_008800
4492506 4492607 TTTATTCCGTTTGTACTCGTGCCTATGGTCAATGCCACACTCGCCTACTTCGCACTGGATTTTGATTTGGTCTCCCGGGTAGTTCAAATGACACCTTGGACA NC_008800
2161913 2162014 CCATCAGCAGCAATAGTGACGCGGCACACTAATACCGGGCGGCGTTCATTCGGTCGCAGTGAGCAAAGATTATCGGATAAATCACGTGGTAGCATCGGGATG NC_008800
1421628 1421729 TGAACAAGATCACCCCCTGCAATTCTAACGGATAAGGTCGCTATACATCGGGGGCTGCCCGCGCTACCTTAAATAGAATCACGTAAAGTGACTATTCGTGCC NC_008800
380562 380664 TTGCGCCAAGGTGTGGGGTTCCGTATCAATCAGGCTTTGGAGCAGCTACGTTACCACGGTAGCCGCTATGCAGAACGTGATCCTATTGGGTTGCTGGCTGTGG NC_008800
1124304 1124406 AAGGAGTTGGCATGAGTATATCTGTTAGTTTAGCCCGGCCTGTTCCATTAGCAGGCTTGATGAACCAGGTGCAACCAATAGAGCAGGTCAAGAAGGAGAACTC NC_008800
1062196 1062298 GTTGCTGACAGCTTGCAATAAAATGCGGGCGGGAAAGTGTAGCTCCTAATGTCGCGATAGCACTGTAGCCCCCTTTTGAATGGCCTATCACCCCGATGGCTTG NC_008800
1035921 1036023 ATGCTGCTTCCTGTCAGTGGTGTCACTAATCCTGCCGCGTTACAGCAGTTGGCCAAATCAGTGCAAGGAGTGACCTGGGTTGACAGGAAAACTGAATTCAACA NC_008800
1904300 1904402 TTTAACGTGTTTTATAAGCGAGTGGCATTAACATTGGCGCGGGTTTCTCACATCTGAAACCCGCGTCATTAATGGCTTAAGCTACGCGATGCTGAGCAAATGC NC_008800
2104392 2104494 AATAACGAGTTACAGGTTCAATTAAGAGATGTTTATACCGCTTCCGCTAAGGGCAATATAGCGGGAGCGGTAAAATTTATTGCTCGTTCAACGCGTTGTTGGT NC_008800
660226 660328 CAGCCCACGATTTCTCGTCAATTAACCCGGTTGAAAGAGCGGGTACTAAAGTTAGGGAAAGGAAAGTCTACCCGTTATGCTTTACTTCGCTTGGTTGCAGGTG NC_008800
612419 612522 TCATGCGGCTGGATCGCAAACGGACATTGACGGTGCAAACTGATCCAACTCAGCAAGGGGGCGAAACCTCGGCTGAGTTATTACAGCGCATCAAGCCAGGGGTT NC_008800
1642902 1643005 GCCAGTGAGGAATTAGCAACCTGGATTCGGGCTAACTACCGCGGTAAAACCTCAGAATACCTTAATCGCACCAGTATGGGGAAAACCACCCCGTCTTATACTTT NC_008800
2743196 2743299 ATGCCATAAATACGCTGGAACCGGTATGCCCCACCGTCATGGTACGGTTTGAGTCTACCCGTTGTTGAATCGCAACATCCCCCCCTACGTAAGTCACTTCACCT NC_008800
1615228 1615331 GGTATAAAAACATGGTCGGTATAAAAGCATTCGTGGGATAACCGCATTAGTCGGTATAACAAAATTGTATGCACCTGAACAGCTTACGTAGAAGGTAAAAATAA NC_008800
530970 531073 TGACGAGCGTTATAAATAGTGCCGTCATAAGTGCCTAATTGGATAATCGCCAGGCGGAATGGGCGTGCTTCGTTGGCACGTTCCGGGGCTACGCTACGGATTTG NC_008800
2256590 2256693 ATAATATGAATAGTCAACTCGTATCTTCTGCACGCGTCACGCTATTGTTAACAGTGCGCAACCACCCTGGGGTCATGTCCCATATCTGTGGTTTATTTGCCCGC NC_008800
3837468 3837571 CCAGAACCAGCCAATGACGGACAAACCCACCACTGGGATGATAACCCCCCAGACGGTAATTTCACTGATCTTACCGGTAATGCGTGCACCACCGAAGTTGGCAA NC_008800
2548089 2548192 CGCATCCCCGCACCCCGTAATACACAAAATAAGGGGTCACCATGAAATACGATGACAATCACAGGAGTAAGACCAGCAAGGATGCTGTGGAGTACGCCCTCGCC NC_008800
364413 364516 CCATTCTGGAAACCGGTTATACGCCCACCCGCAGTGTAATAATCGCCACGTGAACGGGTTCGTTTAGATGCCCAGTAGGTGATATATAGTGTTGCGCCGACAAA NC_008800
2156785 2156888 GACAGCGAAGCCATGCACGCGTTTGAGTAAACCCCGTTTCTCCAGCAGATTGAGATCTTGCCGAATGGTGACTTCTGATACATGGGTTACTTGGGATAATTCGC NC_008800
3178875 3178978 TGCTGCAGCTATTGATAACTGCCTTGTCGGCTGCCCGACCGGTGGTAGTGATCAAACAATTATCCGTGATGTGTATACGCTGAACAATAACAGTCATACCAAAT NC_008800
2024425 2024528 AAATATACAATGATAGTCACTAATCGGTGGACAAAACACAGATCCCAGTCCTCTATTAAGCAGTCTTTCAGAATTCACAAGCTAACGTGTTATAGTGTAATTTT NC_008800
3623771 3623874 GCGATGGCAGCCAAAGTACTCGTCCCCCGCTGTTGCTCCAGTCTCATAAATTATCCCCACGAATGGTATAAGTCAGTTGACGGTGTATTGTCGGATTACCTCTT NC_008800
1476651 1476754 ATTTTCTTATGGTGGATAGTGGCTTTTCTCGGATTAATCTGACCGATTGGATGATAGAATGAAGCCACATTGAGTTCTCATTACGTATTATGGCGGCAGGTGTT NC_008800
2778913 2779017 CTGGCAGGATTATGCCTTATCGGATGGCTCTGATATTCGCAATCTGGGGCCGCAACTCTCCCATCCCTCACGCGCACTCGGTGTGTTAGGGATGCCGGGCTTTAC NC_008800
3368152 3368256 CATCAAATATCGCCTTGTTGTGGTTCGTTGCATGCAAAACGTCTGAATTCACCGTTCCCCTACCTAACAGTCTTAAGCGAGTGGCTTGTTGATATCCGGTAAATC NC_008800
3777322 3777426 TGACTGGGATAGGGTAATTTATTCTGTTAGCCGTGTTATACGCCACGGCAGCAGTTCACTGGTAATCCCCGTAAAAATAACTGCGCTGATTAAAAGCCAGCAATA NC_008800
47945 48049 TTCTTCGAATACTCGAAATCTTTCCCCCTTATGTAGAAGTCTCTTTACGCCGATTACGTTATATTTCACGTGATCGGCGGTTTGTGTTCAGAGTCATTGCTGACA NC_008800
1434726 1434830 CAAAACCATAGTTGGTAAGCAATTCAAAATTGATCGGCCCCAACTTCACGCCATGTATTTTGGAGGTCAGAGCATAATGACCGTTATCACCGGTTCCTGAACCAC NC_008800
2742602 2742706 ACCCCAAACCTTAGTGGGAAAAGGTGACATACCAAAAAGTGTACCCGGTTTAAGTACACCTAAGAGGTAACCTGCGGGGCTGACCCACAGGTATATACCCGTCAT NC_008800
1623633 1623737 GTTCAAAAGACTCGCCAGCCTCCATATGGCCACCGGGAATGGACCAGTAAGGCGCATGCTGACTGCTGCGTTTGCCCAACAGTACCTCACCTTGTTGATTGATAA NC_008800
425321 425425 TAAATTATAGGGGAGGGAGCTCCCCTTGTTTAACCCATATTAACTCCTCCACCGGGAAACCATAATGCTTCGCGACAGATACCAAACGATCGCGGGTTTGGTTAT NC_008800
3069850 3069954 TACTCATGGTAGTGATGACTTCTCAACCGCGTTGTCTATTACTACAGCGCTAACTTACGCAGCTATTTTGCTCGCCCACTACCTCTTTTTGCTGACGCTTACTTT NC_008800
48698 48803 CACCAGTGTTAGCTCATCGAGTTGTAGGCGGTGGCTATCACGCAGCCAGTTACCTTGAGCTCTTAACAGACCATCCTGCCAACGAGTGGAGAACTGATTGATTGCC NC_008800
3032966 3033071 CACCCAAGCAACGCTGGCCGCATTAGGCTGTAGCGATGTAAAACGCGTGACATTTTAAACAGATAAAACCGCAACGGGGTTACCTGCTAACCCCGATAAACACCAA NC_008800
2055915 2056020 AGGGATAAACGCCTTTATCAAGAGTGGGAAGTGATTACCAGCTATATGAAAGGCCGGGCTAAAAGAGGTGGGGATGAGACAACCCTAATTGATTTTATTAGTGACT NC_008800
2512693 2512798 AAATAAAGGTTCCATATCCCTATTCCCATCCATAACAGTTCGTTACATGATATCAGGGGGATAGTGCAAGGCAACGCAAAATGGTAGGAGCTAGAATGCAAAAACC NC_008800
3815209 3815314 ACCATGTACAAGTCGCCAACCTGCGAGAGTTACTGGCAATAGTTGTCGCACAACTAAGCACAGCCTAAGAAAATGAGATGTGGGCCAGACTTGCGCCCAAGGGTGC NC_008800
3136691 3136796 TTATAAACTAAATGGTTAGTGTTTCGGAACGGCAACGTGTAATTTGGCTCTCGGAATGTTGCCATTTTCTCGGTCGTAGTGCTTAGCTGGCCTATGAGTGGATATT NC_008800
3234960 3235065 TGAGTCAGTTGATTCAGCGATAGCAACGTATAACCAGATACACCGCCCCATAGCAGGCAAAATGCCCCAAGCACCCAAAGTAACGCGGCTCTGATAGAAATATTTC NC_008800
4425220 4425325 CCTTGGCCAACGAGGTCAATATAGCCACCTTGTGGCGTCATTAAGGTGTAATAACCACCCAGACCATAAGCATCAGTCTTAATCTTACCCGTATTGACTGATAGCT NC_008800
2164893 2164998 CTTAGCAATTAATTCAAGAGCACTTTCCCAACTAATCGGCTGATAGCGGTCAGTCTGTGGGTTATAGCGCATGGGATGGGTGATTCGCCCTTGATATTCAAGAAAG NC_008800
1437464 1437570 TGTTTCCAATTGCGATAGAACGTAGAGGTATCATTCGGGCCAATCTGTTGGTGAAAGCGCGACGGTTGACTGGCGTACCAGGCATCCCAGGTTGTGGGGCTTTTGCC NC_008800
3290401 3290507 TGGATGGAGTTGACGCCAATTGTGCTTAATAGATACGATCATTAAATGACAGGCAGAATATAGCGGCCCCCAAACAAAGGACACCGGTATATTCGCTTCCAGCAACA NC_008800
1631899 1632005 TGTCACCGGGGCCTTTTAAGTTAACTGATATGCTGCCAAGTCACTCAAACGGTGATTTGCATGTGACGGTGTTGGAAAGTAACGGCACAACGCAGCAGTTTACCGTG NC_008800
2675105 2675211 AAACACGCCAATTGTCAGCTATCGGTGGGCTATATAATAGAGTCTCGACACCATAGTCCTTATTGCCACTCACTGGGCTATTGGAAGAGAGCGTGCGATTACTGGAA NC_008800
628564 628670 TGATAATTGTAATATCGTGGCGCGTGATCGAAGTGATATTGAGCCGCCTTATGGCTATATCACGGCCCCGAGTACATTAACGACCTCGCCTTATGGTTTGATATTTA NC_008800
3301931 3302038 CTTCTCTCACCGTGACCCTGCTATGGTGGAATGGCTAACCGGTCGTAAACTCGATGTCTGATACTGTTAACCCGCTCGGTATAATGTCTTAAATCACAATCAGCGACG NC_008800
3647923 3648030 CAATAACCTGATTAACTCACCACTGGCAATATACGGCGCGGCAGAGTAACTGGGCTGCAAAGTGATACCGGCACCCTCTAGTGCTCCGCTGAGTAGAACAACTGATTC NC_008800
866412 866519 TACCGAGCAAAAGCGCCAATATAGCCGCGGTAAGCTTAAACGTTTGTTGAACCGATGCAACGAAAGGGTGGTTCCTGCTTGCCCTCACTTCGGAATTTGTGGAGGCTG NC_008800
2655043 2655150 TACCAAGGGTCATTCATTCCCACAGCCACTTCTATAGCACTGACGCCCACCACTTTCTCTCGCTCAGTGTCTTCTAATACGAACAGGTAACCTTGCTCACCCACAGCC NC_008800
4484578 4484685 TATCAATAAACGTCCACTGATGTACACCAAAACAATTCGCTCCTTCATTGATTGCCCGATGCAACTGAACCAAATGTTCTTCCATTAGGCCAATACGAAAACTGTCAT NC_008800
3724365 3724472 AGATCCGTGTATGAATATCTACACGGGCGCTTATTATCTTGCGATCGCATTTCGTCACTGGGGCGTTAATTGGGATGCAGTAGGGGCATACAATGCAGGGTTTGCTAA NC_008800
4212556 4212663 TTGATAATATGGGAGCTTTCATTTCTATGAGCCGCGGTTAACTCAGGGCTTGAATGCGATAAATCATCTACCTCCCCCGCATACGCTATTCTTTTCGCATAGAACTCG NC_008800
1621119 1621227 TGGCCAACGTGATCCGGAACGTTCCAACTATTTTTCAGCGTGATCACTGGGTTACACCCCCATAGGATCACCAGCTCGCTATTATCCACTACGTTAGGCCAGGCAGTTT NC_008800
3814255 3814363 CCTCAGAATGACATCACTGTACCGTCACTTAGTTTCACCACCGAATCGGTAATAGCCGTGACCTTAAGAGAACTGCCGGGCAACCTATCACCTTGCCTAACCTCAACGA NC_008800
906735 906843 AAGCGGCATCTATGGGGATTGGCCGTGACTTAATGGAATTTGGTGCGGTGTCGGTTGATATGACCCAATCATGGGCAACGTTACCCGAGCAAGGCACTTTGAGCGGTGG NC_008800
422906 423014 CCAGCGGCGTTTTACTAGGCTGCCCCCCGATTTCTCGTGCCAGACGTGGTACCAAATAACCCGATACGCGGCTGAGTAACCCTTTCATTAACTGCCGGGCTTCATCATC NC_008800
3358767 3358876 GCAGAAAAACCGGGGCATATAAGTTGAATATGTTTAACACCTTGAGAGGGCAGACTTTTTAAGGTTTCGTCGGTATATGGGGTCAACCACGGCTCACGACCAAAACGTGA NC_008800
462176 462285 CCCCACCATCACTATTGAGAATCAGAAGGGCTTCATGGAAACTCTTAATCGCGCTGTCCCGGGGAATCTCTCCTTGCACGGTACCAATCCACTCACCGACACCGGCAGTG NC_008800
2332064 2332173 CCGATTAAGTAGCGCCTTAACTGTCGCACAAGTTACCGGAATTCAGCGTTTATGCAGTCACTATGCCGCCCTCCTCACACCACTTTCCAGCCCAGATGCCTCGCGGGAAA NC_008800
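
The signature rows above follow the header "Start Stop Sequence Contig ID". A simple consistency check is that the coordinate span matches the sequence length. The sketch below assumes whitespace-separated fields and inclusive start and stop coordinates; both are assumptions about the file layout, and the names `Signature`, `parse_signature_line` and `coordinates_match_length` are hypothetical.

```python
from typing import NamedTuple, Optional


class Signature(NamedTuple):
    start: int
    stop: int
    sequence: str
    contig_id: Optional[str]


def parse_signature_line(line: str) -> Signature:
    """Parse one row: start, stop, sequence, and an optional contig ID."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    start, stop, sequence = int(fields[0]), int(fields[1]), fields[2].upper()
    if set(sequence) - set("ACGTN"):
        raise ValueError("sequence contains non-nucleotide characters")
    contig_id = fields[3] if len(fields) > 3 else None
    return Signature(start, stop, sequence, contig_id)


def coordinates_match_length(sig: Signature) -> bool:
    """True when the inclusive span stop - start + 1 equals the sequence length."""
    return sig.stop - sig.start + 1 == len(sig.sequence)


# Synthetic example row (not taken from the tables above).
example = parse_signature_line("101 110 ACGTACGTAC NC_000000")
print(coordinates_match_length(example))
```

Rows that fail the check would point either to a different coordinate convention (for example half-open intervals) or to truncation during extraction.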


Genomic characterization of the Yersinia genus
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00099885/00001
 Material Information
Title: Genomic characterization of the Yersinia genus
Series Title: Genome biology
Physical Description: Book
Language: English
Creator: Chen,Peter
Cook,Christopher
Stewart,Andrew
Nagarajan,Niranjan
Sommer,Dan
Pop,Mihai
Thomason,Brendan
Thomason,Maureen
Lentz,Shannon
Nolan,Nichole
Sozhamannan,Shanmuga
Sulakvelidze,Alexander
Mateczun,Alfred
Du,Lei
Zwick,Michael
Read,Timothy
Publisher: Genome Biology
Publication Date: 2010
 Notes
Abstract: BACKGROUND: New DNA sequencing technologies have enabled detailed comparative genomic analyses of entire genera of bacterial pathogens. Prior to this study, three species of the enterobacterial genus Yersinia that cause invasive human diseases (Yersinia pestis, Yersinia pseudotuberculosis, and Yersinia enterocolitica) had been sequenced. However, there were no genomic data on the Yersinia species with more limited virulence potential, frequently found in soil and water environments. RESULTS: We used high-throughput sequencing-by-synthesis instruments to obtain 25- to 42-fold average redundancy, whole-genome shotgun data from the type strains of eight species: Y. aldovae, Y. bercovieri, Y. frederiksenii, Y. kristensenii, Y. intermedia, Y. mollaretii, Y. rohdei, and Y. ruckeri. The deepest branching species in the genus, Y. ruckeri, causative agent of red mouth disease in fish, has the smallest genome (3.7 Mb), although it shares the same core set of approximately 2,500 genes as the other members of the genus, whose genomes range in size from 4.3 to 4.8 Mb. Yersinia genomes had a similar global partition of protein functions, as measured by the distribution of Cluster of Orthologous Groups families. Genome to genome variation in islands with genes encoding functions such as ureases, hydrogenases and B-12 cofactor metabolite reactions may reflect adaptations to colonizing specific host habitats. CONCLUSIONS: Rapid high-quality draft sequencing was used successfully to compare pathogenic and non-pathogenic members of the Yersinia genus. This work underscores the importance of the acquisition of horizontally transferred genes in the evolution of Y. pestis and points to virulence determinants that have been gained and lost on multiple occasions in the history of the genus.
General Note: Periodical Abbreviation:Genome Biol.
General Note: Start page: R1
General Note: M3: 10.1186/gb-2010-11-1-r1
 Record Information
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/
Resource Identifier: issn - 1465-6906
http://genomebiology.com/2010/11/1/R1
System ID: UF00099885:00001



Chen et al. Genome Biology 2010, 11:R1
http://genomebiology.com/2010/11/1/R1


Genome Biology


Genomic characterization of the Yersinia genus

Peter E Chen1†, Christopher Cook, Andrew C Stewart, Niranjan Nagarajan2,7, Dan D Sommer2, Mihai Pop2,
Brendan Thomason1, Maureen P Kiley Thomason1, Shannon Lentz1, Nichole Nolan1, Shanmuga Sozhamannan1,
Alexander Sulakvelidze3, Alfred Mateczun, Lei Du4, Michael E Zwick1, Timothy D Read1,6*


Abstract
Background: New DNA sequencing technologies have enabled detailed comparative genomic analyses of entire
genera of bacterial pathogens. Prior to this study, three species of the enterobacterial genus Yersinia that cause
invasive human diseases (Yersinia pestis, Yersinia pseudotuberculosis, and Yersinia enterocolitica) had been sequenced.
However, there were no genomic data on the Yersinia species with more limited virulence potential, frequently
found in soil and water environments.
Results: We used high-throughput sequencing-by-synthesis instruments to obtain 25- to 42-fold average
redundancy, whole-genome shotgun data from the type strains of eight species: Y. aldovae, Y. bercovieri, Y.
frederiksenii, Y. kristensenii, Y. intermedia, Y. mollaretii, Y. rohdei, and Y. ruckeri. The deepest branching species in the
genus, Y. ruckeri, causative agent of red mouth disease in fish, has the smallest genome (3.7 Mb), although it shares
the same core set of approximately 2,500 genes as the other members of the genus, whose genomes range in
size from 4.3 to 4.8 Mb. Yersinia genomes had a similar global partition of protein functions, as measured by the
distribution of Cluster of Orthologous Groups families. Genome to genome variation in islands with genes
encoding functions such as ureases, hydrogenases and B-12 cofactor metabolite reactions may reflect adaptations
to colonizing specific host habitats.
Conclusions: Rapid high-quality draft sequencing was used successfully to compare pathogenic and non-
pathogenic members of the Yersinia genus. This work underscores the importance of the acquisition of horizontally
transferred genes in the evolution of Y. pestis and points to virulence determinants that have been gained and lost
on multiple occasions in the history of the genus.


Background
Of the millions of species of bacteria that live on this
planet, only a very small percentage cause serious
human diseases [1]. Comparative genetic studies are
revealing that many pathogens have only recently
emerged from protean environmental, commensal or
zoonotic populations [2-5]. For a variety of reasons,
most research effort has been focused on characterizing
these pathogens, while their closely related non-patho-
genic relatives have only been lightly studied. As a
result, our understanding of the population biology of
these clades remains biased, limiting our knowledge of
the evolution of virulence and our ability to design


* Correspondence: tread@emory.edu
† Contributed equally
1Biological Defense Research Directorate, Naval Medical Research Center, 503
Robert Grant Avenue, Silver Spring, Maryland 20910, USA




reliable assays that distinguish pathogen signatures from
the background in the clinic and environment [6].
The recent development of second generation sequencing
platforms (reviewed by Mardis [7,8] and Shendure
[7,8]) offers an opportunity to change the direction of
microbial genomics, enabling the rapid genome sequencing
of large numbers of both pathogenic and
non-pathogenic strains. Here we describe the deployment
of new sequencing technology to extensively sample
eight genomes from the Yersinia genus of the family
Enterobacteriaceae. The first published sequencing studies
on the Yersinia genus have focused exclusively on
invasive human disease-causing species that included
five Yersinia pestis genome sequences (one of which,
strain 91001, is from the avirulent 'microtus' biovar)
[9-12], two Yersinia pseudotuberculosis [13,14] and one
Yersinia enterocolitica biotype 1B [15]. Primarily a zoonotic
pathogen, Y. pestis, the causative agent of bubonic


© 2010 Chen et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.








plague and a category A select agent, is a recently
emerged lineage that has since undergone global expan-
sion [2]. Following introduction into a human through
flea bite [16], Y. pestis is engulfed by macrophages and
taken to the regional lymph nodes. Y. pestis then
escapes the macrophages and multiplies to cause a
highly lethal bacteremia if untreated with antibiotics. Y.
pseudotuberculosis and Y. enterocolitica (primarily biotype
1B) are enteropathogens that cause gastroenteritis
following ingestion and translocation of the Peyer's
patches. Like Y. pestis, the enteropathogenic Yersiniae
can escape macrophages and multiply outside host cells,
but unlike their more virulent congener, they usually
cause only self-limiting inflammatory diseases.
The generally accepted pathway for the evolution of
these more severe disease-causing Yersiniae is memor-
ably encapsulated by the recipe, 'add DNA, stir, reduce'
[17]. In each species DNA has been 'added' by horizon-
tal gene transfer in the form of plasmids and genomic
islands. All three human pathogens carry a 70-kb pYV
virulence plasmid (also known as pCD), which carries
the Ysc type III secretion system and Yops effectors
[18-20], that is not detected in non-pathogenic species.
Y. pestis also has two additional plasmids, pMT (also
known as pFra), containing the F1 capsule-like antigen
and murine toxin, and pPla (also known as pPCP1),
which carries plasminogen-activating factor, Pla. Y. pes-
tis, Y. pseudotuberculosis, and biotype 1B Y. enterocoli-
tica also contain a chromosomally located, mobile, high-
pathogenicity island (HPI) [21]. The HPI includes a
cluster of genes for biosynthesis of yersiniabactin, an
iron-binding siderophore necessary for systemic infec-
tion [22]. 'Stir' refers to intra-genomic change, notably
the recent expansion of insertion sequences (IS) within
Y. pestis (3.7% of the Y. pestis CO92 genome [9]) and a
high level of genome structural variation [23]. 'Reduce'
describes the loss of functions via deletions and pseudogene
accumulation in Y. pestis [9,13] due to shifts in
selection pressure caused by the transition from Y. pseudotuberculosis-like
enteropathogenicity to a flea-borne
transmission cycle. This description of Y. pestis evolution
is, of course, oversimplified. Y. pestis strains show
considerable diversity at the phenotypic level and there
is evidence of acquisition of plasmids and other horizontally
transferred genes [12,24,25] (DNA microarray
[26,27]).
While most attention is focused on the three well-
known human pathogens, several other, less familiar
Yersinia species have been split off from Y. enterocolitica
over the past 40 years based on biochemistry, serology
and 16S rRNA sequence [28,29]. Y. ruckeri is an agriculturally
important fish pathogen that is a cause of 'red
mouth' disease in salmonid fish. The species has sufficient
phylogenetic divergence from the rest of the


Yersinia genus to stir controversy about its taxonomic
assignment [30]. Y. frederiksenii, Y. kristensenii, Y. intermedia,
Y. mollaretii, Y. bercovieri, and Y. rohdei have
been isolated from human feces, fresh water, animal
feces and intestines and foods [28]. There have been
reports associating some of the species with human
diarrheal infections [31] and lethality for mice [32]. Y.
aldovae is most often isolated from fresh water but has
also been cultured from fish and the alimentary tracts of
wild rodents [33]. There is no report of isolation of Y.
aldovae from human feces or urine [28].
Using microbead-based, massively parallel sequencing
by synthesis [34] we rapidly and economically obtained
high redundancy genome sequence of the type strains of
each of these eight lesser known Yersinia species. From
these genome sequences, we were able to determine the
core gene set that defines the Yersinia genus and to
look for clues to distinguish the genomes of human
pathogens from less virulent strains.

Results
High-redundancy draft genome sequences of eight
Yersinia species
Whole genome shotgun coverage of eight previously
unsequenced Yersinia species (Table 1) was obtained by
single-end bead-based pyrosequencing [34] using the
454 Life Sciences GS-20 instrument. Each of the eight
genomes was sequenced to a high level of redundancy
(between 25 and 44 sequencing reads per base) and
assembled de novo into large contigs (Table 2; Addi-
tional file 1). Excluding contigs that covered repeat
regions and therefore had significantly increased copy
number, the quality of the sequence of the draft assem-
blies was high, with less than 0.1% of the sequence of
each genome having a consensus quality score [35] less
than 40. Moreover, a more recent assessment of quality
of GS-20 data suggests that the scores generated by the
454 Life Sciences software are an underestimation of the
true sequence quality [36]. The most common sequen-
cing error encountered when assembling pyrosequen-
cing data is the rare calling of incorrect numbers of
homopolymers caused by variation in the intensity of
fluorescence emitted upon extension with the labeled
nucleoside [34].
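As a rough consistency check on the reported redundancy, per-base coverage can be approximated as (number of reads × mean read length) / assembled length. The sketch below uses the read counts and large-contig lengths listed in Table 2 and the 109-nucleotide average read length reported for this study. Treating the total large-contig length as the genome size is a simplifying assumption, and the function and variable names are illustrative only.

```python
# Approximate sequencing redundancy: reads * mean read length / assembly length.
# Read counts and contig lengths come from Table 2; 109 nt is the average read
# length reported in the text. Using the large-contig total as the genome size
# is a simplifying assumption.
MEAN_READ_LENGTH = 109

# species -> (total reads, total length of large contigs in bp)
TABLE2 = {
    "Y. rohdei": (991_106, 4_303_720),
    "Y. ruckeri": (1_347_304, 3_716_658),
    "Y. aldovae": (1_125_002, 4_277_123),
    "Y. kristensenii": (1_374_452, 4_637_246),
    "Y. intermedia": (1_768_909, 4_684_150),
    "Y. frederiksenii": (1_504_985, 4_864_031),
    "Y. mollaretii": (1_825_876, 4_535_932),
    "Y. bercovieri": (1_263_275, 4_316_521),
}


def approx_coverage(n_reads, assembly_length, read_length=MEAN_READ_LENGTH):
    """Return the approximate number of reads covering each base."""
    return n_reads * read_length / assembly_length


coverage = {species: approx_coverage(*row) for species, row in TABLE2.items()}

for species, cov in sorted(coverage.items(), key=lambda kv: kv[1]):
    print(f"{species:18s} {cov:5.1f}x")
```

Under these assumptions the estimates fall within the 25 to 44 reads per base stated above.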
Previous studies and our experience suggest that at
this level of sequence coverage the assembly gaps fall in
repeat regions that cannot be spanned by single-end
sequence reads (average length 109 nucleotides in this
study) [34]. Fewer RNA genes are observed compared to
published Yersinia genomes finished using traditional
Sanger sequencing technology (Additional file 1), reflect-
ing the greater difficulty of uniquely assembling repeti-
tive sequences with single-end reads. We assessed the
quality of our assemblies using metrics implemented in









Chen et al. Genome Biology 2010, 11:R1
http://genomebiology.com/2010/11/1/R1


the amosvalidate package [37]. Specifically, we focused
on three measures frequently correlated with assembly
errors: density of polymorphisms within assembled
reads, depth of coverage, and breakpoints in the alignment
of unassembled reads to the final assembly.
Regions in each genome where at least one measure
suggested a possible mis-assembly were validated by
manual inspection (Additional file 2). Many of the suspect
regions corresponded to collapsed repeats, where
the location of individual members of the repeat family
within the genome could not be accurately determined.
Based on the results of the amosvalidate analysis and
the optical map alignment we found no evidence of
mis-assemblies leading to chimeric contigs in the eight
genomes we sequenced. Genomic regions flagged by the
amosvalidate package are made available in GFF format
(compatible with most genome browsers) in Additional
file 3.
Genome sizes were estimated initially as the sum of
the sizes of the contigs from the shotgun assembly, with
corrections for contigs representing collapsed repeats
(Table 2). We also derived an independent estimate for
the genome size from the whole-genome optical restriction
mapping of the species [38] (Additional file 4).
Alignment of contigs to the optical maps [39] suggested
that the optical maps consistently overestimated sizes (2
to 10% on average). After correction, the map-based
estimates and sequence-based estimates agreed well
(within 7%). Two species, Y. aldovae (4.22 to 4.33 Mbp)
and Y. ruckeri (3.58 to 3.89 Mbp), have a substantially
reduced total genome size compared with the 4.6 to 4.8
Mbp seen in the genus generally. The agreement
between the optical maps and sequence-based estimates
of genome sizes tallied with experimental evidence for
the lack of large plasmids in the sequenced genomes
(Additional file 5). A screen for matches to known
plasmid genes produced only a few candidate plasmid
contigs, totaling less than 10 kbp of sequence in each
genome.

Table 1 Strains sequenced in this study
Species | ATCC number | Other designations | Year isolated; location isolated | Description | Optimum growth temperature | Reference
Y. aldovae | 35236T | CNY 6065 | NR; Czechoslovakia | Drinking water | 26°C | [100]
Y. bercovieri | 43970T | CDC 2475-87 | NR; France | Human stool | 26°C | [101]
Y. frederiksenii | 33641T | CDC 1461-81, CIP 80-29 | NR; Denmark | Sewage | 26°C | [102]
Y. intermedia | 29909T | CIP 80-28 | NR | Human urine | 37°C | [103]
Y. kristensenii | 33638T | CIP 80-30 | NR | Human urine | 26°C | [104]
Y. mollaretii | 43969T | CDC 2465-87 | USA | Soil | 26°C | [101]
Y. rohdei | 43380T | H271-36/78, CDC 3022-85 | Germany | Dog feces | 26°C | [105]
Y. ruckeri | 29473T | 2396-61 | 1961; Idaho, USA | Rainbow trout (Oncorhynchus mykiss) with red mouth disease | | [67]
NR, not reported in reference publication.

Table 2 Genomes summary
Species | Type strain | NCBI project ID | GenBank accession number | Total reads | Number of contigs >500 nt | Total length of large contigs
Y. rohdei | ATCC_43380 | 29767 | [Genbank:ACCD00000000] | 991,106 | 83 | 4,303,720
Y. ruckeri | ATCC_29473 | 29769 | [Genbank:ACCC00000000] | 1,347,304 | 103 | 3,716,658
Y. aldovae | ATCC_35236 | 29741 | [Genbank:ACCB00000000] | 1,125,002 | 104 | 4,277,123
Y. kristensenii | ATCC_33638 | 29761 | [Genbank:ACCA00000000] | 1,374,452 | 86 | 4,637,246
Y. intermedia | ATCC_29909 | 29755 | [Genbank:AALF00000000] | 1,768,909 | | 4,684,150
Y. frederiksenii | ATCC_33641 | 29743 | [Genbank:AALE00000000] | 1,504,985 | 90 | 4,864,031
Y. mollaretii | ATCC_43969 | 16105 | [Genbank:AALD00000000] | 1,825,876 | 110 | 4,535,932
Y. bercovieri | ATCC_43970 | 16104 | [Genbank:AALC00000000] | 1,263,275 | 144 | 4,316,521
% large contigs: 0.11; 0.004; 0.006; 0.003; 0.003; 0.006
Number of contigs aligned to chromosomal scaffold: 60; 68; 60; 63; 68
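The comparison between optical-map and sequence-based genome sizes described above can be sketched numerically. The example below takes the AflII values (in Mbp) from the supplementary genome size table and computes, for each species, the implied expansion of the raw optical map over the corrected map estimate, and the relative difference between the sequence-based and map-based estimates. The two helper functions are illustrative definitions chosen for this sketch, not the procedure used in the study.

```python
# AflII values in Mbp, from the supplementary genome size table:
# species -> (sequence-based estimate, raw optical map size, map-based estimate)
GENOME_SIZES = {
    "Y. aldovae": (4.33, 4.30, 4.22),
    "Y. bercovieri": (4.37, 4.54, 4.19),
    "Y. frederiksenii": (4.90, 5.34, 4.96),
    "Y. intermedia": (4.71, 4.95, 4.74),
    "Y. kristensenii": (4.77, 4.63, 4.46),
    "Y. mollaretii": (4.57, 4.93, 4.75),
    "Y. rohdei": (4.34, 4.65, 4.30),
    "Y. ruckeri": (3.79, 3.90, 3.76),
}


def map_expansion(optical_size, map_estimate):
    """Fractional overestimate of the raw optical map (assumed definition)."""
    return optical_size / map_estimate - 1.0


def relative_difference(sequence_estimate, map_estimate):
    """Disagreement between the two estimates, relative to the sequence estimate."""
    return abs(sequence_estimate - map_estimate) / sequence_estimate


summary = {
    species: (map_expansion(optical, mapped), relative_difference(seq, mapped))
    for species, (seq, optical, mapped) in GENOME_SIZES.items()
}

for species, (expansion, difference) in summary.items():
    print(f"{species:18s} expansion {expansion:6.1%}  difference {difference:5.1%}")
```

With these definitions the implied expansion is between roughly 2% and 8% for AflII, and every pair of estimates differs by less than 7%.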
The number of IS elements per genome for the eight
species (12 to 167 matches) discovered using the IS finder
database [40] was much lower than in the Y. pestis
genome (1,147 matches; copy number estimates took
into account the possibility of mis-assembly and were
adjusted accordingly; see Methods). Furthermore, the
non-pathogenic species with the most IS matches,
namely Y. bercovieri (167 matches), Y. aldovae (143
matches) and Y. ruckeri (136 matches), have compara-
tively smaller genomes. We also searched for novel
repeat families using a de novo repeat-finder [41] and
collected a non-redundant set of 44 repeat sequence
families in the Yersinia genus (Table 3; Additional file
6). Interestingly, the well-known ERIC element [42] was
recovered by our de novo search and was found to be
present in many copies in all the pathogenic species, but
was relatively rare in the non-pathogenic ones. On the
other hand, a similar and recently discovered element,
YPAL [43] (also recovered by the de novo search), was
abundant in all the Yersinia genomes except the fish
pathogen Y. ruckeri. Insertion sequence IS1541C in the
IS finder database, which has expanded in Y. pestis (to
more than 60 copies), had only a handful of strong
matches in Y. enterocolitica, Y. pseudotuberculosis, and
Y. bercovieri and no discernable matches in the other
Yersinia genomes.

New Yersinia genome data reduce the pool of unique
detection targets for Y. pestis and Y. enterocolitica
The sequences generated in this study provide new
background information for validating genus detection
and diagnosis assays targeting pathogenic members of
the Yersinia genus. The assay design process commonly

starts by computationally identifying genomic regions
that are unique to the targeted genus ('signatures'); an
ideal signature is shared by all targeted pathogens but
not found in a background comprising non-pathogenic
near neighbors or in other unrelated microbes. While
many pathogens are well characterized at the genomic
level, the background set is only sparsely represented in
genomic databases, thereby limiting the ability to
computationally screen out non-specific candidate assays
(false positives). As a result, many assays may fail
experimental field tests, thereby increasing the costs of
assay development efforts. To evaluate whether the new
genomic sequences generated in our study can reduce
the incidence of false positives in assay development, we
computed signatures for Y. pestis and Y. enterocolitica
using the Insignia pipeline [44], the system
previously used to successfully develop assays for the
detection of V. cholerae [44]. We identified 171 and 100
regions within the genomes of Y. pestis and Y. enterocolitica,
respectively, that represent good candidates for
the design of detection assays. In Y. pestis these regions
tended to cluster around the origin of replication,
whereas in Y. enterocolitica there was a more even
distribution. The average G+C content of the unique
sequences in both species was close to the
Yersinia average (47%) and there was not a strong
association with putative genome islands (Additional files 7,
8, 9, 10, 11, 12, [45]). For both species, most regions
overlapped predicted genes (161 of 171 (94%) and 96 of
100 (96%) in Y. pestis and Y. enterocolitica, respectively).
Interestingly, the gene-overlapping Y. pestis regions were
spread over only 70 different genes, whereas the 96
Y. enterocolitica regions overlapped 90
genes. There was no obvious trend in the nature of the
genes harboring these putative signals except that many
could arguably be classed as 'non-core' functions,
encoding phage endonucleases, invasins, hemolysins and
hypothetical proteins.

Table 3 Distribution of common repeat sequences
Repeat elements: ERIC (127 bp); YPAL (167 bp); Kristensenii39 (142 bp); IS1541C (708 bp); Aldovae3 (154 bp)
Genomes compared: E. coli, Y. pestis, Y. pseudotuberculosis, Y. enterocolitica, Y. aldovae, Y. bercovieri, Y. frederiksenii, Y. intermedia, Y. kristensenii, Y. mollaretii, Y. rohdei, Y. ruckeri
Three of the repeat sequences found using de novo searches matched the known repeat elements ERIC, YPAL, and IS1541C and are identified as such.
Kristensenii39 and Aldovae3 are elements found from de novo searches in the Y. kristensenii and Y. aldovae genomes, respectively.
Ten Y. pestis-specific and 31 Y. enterocolitica-specific
putative signatures have significant matches in the new
genome sequence data (Additional files 7, 8, 9, 10),
indicating that assays designed within these regions would
yield false positive results. This result underscores the need
for further sampling of genomes of the Yersinia genus
in order to assist the design of diagnostic assays.
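The screening step described in this section amounts to discarding candidate signatures that have a significant match in an expanded background set. The sketch below illustrates that idea on made-up signature identifiers, then applies the counts reported above. It does not reproduce the Insignia pipeline; the function name and the toy identifiers are hypothetical.

```python
def filter_signatures(candidates, background_hits):
    """Keep candidate signatures that have no significant background match."""
    return [sig for sig in candidates if sig not in background_hits]


# Toy illustration with hypothetical signature identifiers.
toy_candidates = ["sig_001", "sig_002", "sig_003", "sig_004"]
toy_background_hits = {"sig_002", "sig_004"}
toy_remaining = filter_signatures(toy_candidates, toy_background_hits)

# Counts reported in the text: species -> (candidate regions, regions matched
# by the eight new genomes).
REPORTED = {
    "Y. pestis": (171, 10),
    "Y. enterocolitica": (100, 31),
}

remaining = {sp: total - hits for sp, (total, hits) in REPORTED.items()}
fraction_lost = {sp: hits / total for sp, (total, hits) in REPORTED.items()}

for sp in REPORTED:
    print(f"{sp}: {remaining[sp]} candidates remain "
          f"({fraction_lost[sp]:.0%} removed by the new background)")
```

The larger loss for Y. enterocolitica is consistent with its closer relationship to the newly sequenced species.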

Yersinia whole-genome comparisons
We performed a multiple alignment of the 11 Yersinia
species using the MAUVE algorithm [46] (from here on
Y. pestis CO92 and Y. pseudotuberculosis IP32953 were
used as the representative genomes of their species) and
obtained 98 locally collinear blocks (LCBs; Additional
files 13, 14, [47]). The mean length of the LCBs was
23,891 bp. The shortest block was 1,570 bp, and the
longest was 201,130 bp. This multiple alignment of the
'core' region on average covered 52% of each Yersinia
genome. The nucleotide diversity (π) for the concatenated
aligned region was 0.27, corresponding to an approximate
genus-wide nucleotide sequence identity of 73%. As expected
for a set of bacteria with this level of diversity, the
alignment of the genomes shows evidence of multiple large
genome rearrangements [23] (Additional file 13).
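The nucleotide diversity figure can be reproduced from the summary counts in the supplementary nucleotide diversity table: 55 pairwise comparisons (all pairs among 11 genomes), 1,495,930 aligned sites, and 22,539,297 mismatches in total. The sketch below computes π as the average proportion of mismatched sites per pair; the function name is illustrative.

```python
from math import comb


def nucleotide_diversity(total_mismatches, n_sequences, n_sites):
    """Average per-site mismatch proportion over all sequence pairs."""
    n_pairs = comb(n_sequences, 2)
    return total_mismatches / (n_pairs * n_sites)


# Counts from the supplementary nucleotide diversity summary.
N_GENOMES = 11
N_SITES = 1_495_930
TOTAL_MISMATCHES = 22_539_297

pi = nucleotide_diversity(TOTAL_MISMATCHES, N_GENOMES, N_SITES)
print(f"pairs = {comb(N_GENOMES, 2)}, pi = {pi:.4f}, identity = {1 - pi:.0%}")
```

Because the denominator is about 82 million site comparisons, a handful of sequencing errors has a negligible effect on π.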
Using an automated pipeline for annotation and
clustering of protein orthologs based on the Markov
chain clustering tool MCL [48], we estimated the size
of the Yersinia protein core set to be 2,497 and the
pan-genome [49] to be 27,470 (Additional files 15, 16,
17, 18). The core number falls asymptotically as gen-
omes are introduced and hence this estimate is some-
what lower than the recent analysis of only the Y.
enterocolitica, Y. pseudotuberculosis and Y. pestis gen-
omes (2,747 core proteins) [15]. We found 681 genes
to be in exactly one copy in each Yersinia genome and
to be nearly identical in length. We used ClustalW
[50] to align the members of this highly conserved set,
and concatenated individual gene product alignments
to make a dataset of 170,940 amino acids for each of
the species. Uninformative characters were removed
from the dataset and a phylogeny of the genus was
computed using Phylip [51] (Figure 1). The topology
of this tree was identical whether distance or parsi-
mony methods were used (Additional files 19, 20) and
was also identical to a tree based on the nucleotide
sequence of the approximately 1.5 Mb of the core genome
in LCBs (see above). The genus broke down into
three major clades: the outlying fish pathogen, Y. ruckeri;
Y. pestis/Y. pseudotuberculosis; and the remainder
of the 'enterocolitica'-like species. Y. kristensenii
ATCC33638T was the nearest neighbor of Y. enterocolitica
8081. The outlying position of Y. ruckeri was
confirmed further when we analyzed the contribution
of the genome to reducing the size of the Yersinia
core protein families set. If Y. ruckeri was excluded,
the Yersinia core would be 2,232 protein families with
N ≥ 2 members rather than 2,072 (Table 4). In contrast,
omission of any one of the 10 other species increased
the set by a maximum of 22 families.
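The leave-one-out analysis summarized in Table 4 can be illustrated on toy data. In the sketch below each protein family is represented by the set of genomes in which it has a member; the core is the set of families present in every genome considered, and the core is recomputed with each genome excluded in turn. The genome labels and families are invented for illustration, and the sketch ignores the minimum family size used in the study.

```python
def core_families(families, genomes):
    """Return the families that have a member in every listed genome."""
    required = set(genomes)
    return {name for name, present in families.items() if required <= present}


def leave_one_out_core_sizes(families, genomes):
    """Core size when each genome is excluded in turn."""
    sizes = {}
    for excluded in genomes:
        kept = [g for g in genomes if g != excluded]
        sizes[excluded] = len(core_families(families, kept))
    return sizes


# Toy data: genome "D" plays the role of an outlier lacking several families.
GENOMES = ["A", "B", "C", "D"]
FAMILIES = {
    "fam1": {"A", "B", "C", "D"},
    "fam2": {"A", "B", "C"},
    "fam3": {"A", "B", "C"},
    "fam4": {"A", "B", "D"},
    "fam5": {"A"},
}

full_core = core_families(FAMILIES, GENOMES)
loo_sizes = leave_one_out_core_sizes(FAMILIES, GENOMES)
pan_size = len(FAMILIES)

print("core:", sorted(full_core), "pan-genome size:", pan_size)
print("leave-one-out core sizes:", loo_sizes)
```

Excluding a genome can only keep the core the same or enlarge it, and the largest increase identifies the most divergent genome, as seen for Y. ruckeri in Table 4.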
Clustering the significant Clusters of Orthologous
Groups (COG) hits [52] for each genome hierarchically
(Figure 2) yielded a similar pattern for the three basic
clades. The overall composition of the COG matches in
each genome, as measured by the proportion of the
numbers in each COG supercategory, was similar
throughout the genus, with the notable exceptions of
the high percentage of group L COGs in Y. pestis due to
the expansion of IS recombinases and the relatively
low number of group G (sugar metabolism) COGs in
Y. ruckeri (Figure 2).

Shared protein clusters in pathogenic Yersinia:
yersiniabactin biosynthesis is the key chromosomal
function specific to high virulence in humans
The Yersinia proteomes were investigated for common
clusters in the three high virulence species missing from
the low human virulence genomes (Figure 3). Because of
the close evolutionary relationship of the 'enterocolitica'
clade strains, the number of unique protein clusters
in Y. enterocolitica was reduced to a greater degree than
in the more phylogenetically isolated Y. pestis and
Y. pseudotuberculosis. Many of the same genome islands
identified as recent horizontal acquisitions by Y. pestis and/or
Y. pseudotuberculosis [9,13,15] were not present in any
of the newly sequenced genomes. However, some genes,
interesting from the perspective of the host specificity of
the Y. pestis/Y. pseudotuberculosis ancestor, were detected
in other Yersinia species for the first time. These
included orthologs of YPO3720/YPO3721, a hemolysin
and activator protein, in Y. intermedia, Y. bercovieri and
Y. frederiksenii; YPO0599, a heme utilization protein
also found in Y. intermedia; and YPO0399, an enhancin
metalloprotease that had an ortholog in Y. kristensenii
(ykris0001_41250). Enhancin was originally identified as
a factor promoting baculovirus infection of gypsy moth
midgut by degradation of mucin [53]. Other loci in
Y. pestis/Y. pseudotuberculosis linked with insect infection,
the TccC and TcABC toxin clusters [54], were also
found in Y. mollaretii. In Y. mollaretii the Tca and Tcc
proteins show about 90% sequence identity to Y. pestis/
Y. pseudotuberculosis and share identical flanking
chromosomal locations. Further work will need to be
undertaken to resolve whether the insertion of the toxin
genes in Y. mollaretii is an independent horizontal
transfer event or occurred prior to divergence of the
species.




Table 4 Yersinia core size reduction by exclusion of one species
Species excluded | Core protein families
None | 2,072
Y. enterocolitica | 2,074
Y. aldovae | 2,085
Y. bercovieri | 2,079
Y. frederiksenii | 2,077
Y. intermedia | 2,080
Y. kristensenii | 2,076
Y. mollaretii | 2,078
Y. rohdei | 2,091
Y. ruckeri | 2,232
Y. pseudotuberculosis | 2,076
Y. pestis | 2,094
The core protein families with number of members 2 or greater were
recalculated in each case (see Materials and methods) with the protein set
from one genome missing.


After comparison of the new low virulence genomes,
the number of protein clusters shared by Y. enterocolitica
and the other two pathogens was reduced to 12 and
13 for Y. pseudotuberculosis and Y. pestis, respectively
(Figure 3). The remaining shared proteins were either
identified as phage-related or of unknown role, providing
few clues to possible functions that might define distinct
pathogenic niches. Performing a similar analysis
between other genomes of the 'enterocolitica'
clade and Y. pestis or Y. pseudotuberculosis gave a similar
result in terms of numbers and types of shared protein
clusters.
Only sixteen clusters of chromosomal proteins were
found to be common to all three high-virulence species
but absent from all eight non-pathogens (Figure 3). Ele-
ven of these are components of the yersiniabactin bio-
synthesis operon (Additional file 21), further
highlighting the critical importance of this iron binding
siderophore for invasive disease. The other proteins are
generally small proteins that are likely included because
they fall in unassembled regions of the eight draft gen-
omes. One other small island of three proteins consti-
tuting a multi-drug efflux pump (YE0443 to YE0445)
was common to the high-virulence species but missing
from the eight draft low-virulence species.

Variable regions of Y. enterocolitica clade genomes
The basic metabolic similarities of Y. enterocolitica and
the seven species on the main branch of the Yersinia
genus phylogenetic tree are further illustrated in Figure
4, where the best protein matches against each Y. enterocolitica
8081 gene product [15] are plotted against a
circular genome map. Very few genes exclusive to Y.
enterocolitica 8081 were found outside of prophage
regions, which is a typical result when groups of closely


Figure 1 Yersinia whole-genome phylogeny. The phylogeny of the Yersinia genus was constructed from a dataset of 681 concatenated,
conserved protein sequences using the Neighbor-Joining (NJ) algorithm implemented by PHYLIP [51]. The tree was rooted using E. coli. The
scale measures number of substitutions per residue. Tree topologies computed using maximum likelihood and parsimony estimates are identical
with each other and the NJ tree (Additional file 20). The only branches not supported in more than 99% of the 1,000 bootstrap replicates using
both methods are marked with asterisks. Both these branches were supported by >57% of replicates.








Figure 2 Comparison of major COG groups in Yersinia genomes. Bars represent the number of proteins assigned to COG superfamilies [52]
for each genome, based on matches to the Conserved Domain Database [95] with an E-value threshold <10^-10. The COG groups are:
U, intracellular trafficking and secretion; G, carbohydrate transport and metabolism; R, general function prediction; I, lipid transport and metabolism; D, cell
cycle control; H, coenzyme transport and metabolism; B, chromatin structure; P, inorganic ion transport and metabolism; W, extracellular
structures; O, post-translational modification; J, translation; A, RNA processing and editing; L, replication, recombination and repair; C, energy
production; M, cell wall/membrane biogenesis; Q, secondary metabolite biosynthesis; Z, cytoskeleton; V, defense mechanisms; E, amino acid
transport and metabolism; K, transcription; N, cell motility; T, signal transduction; F, nucleotide transport; S, function unknown.


related bacterial genomes are compared [55]. One of the
largest islands found in Y. enterocolitica 8081 was the
66-kb Y. pseudotuberculosis adhesion pathogenicity
island (YAPIye) [15,56,57], a unique feature of biotype
1B strains. YAPIye, containing a type IV pilus gene cluster
and other putative virulence determinants, such as
arsenic resistance, is similar to a 99-kb YAPIpst that is
found in several other serotypes of Y. pseudotuberculosis
[14,57] but is missing in Y. pestis and the serotype I
Y. pseudotuberculosis strain IP32953 [14]. A model has
been proposed for the acquisition of YAPI in a common
ancestor of Y. pseudotuberculosis and Y. enterocolitica
and subsequent degradation to various degrees within
the Y. pseudotuberculosis clade. However, the complete
absence of YAPI from any of the seven species in the
Y. enterocolitica branch (Figure 4), as well as from most
strains of Y. enterocolitica [15], argues against an ancient
acquisition of YAPI, but instead suggests the recent








Figure 3 Distribution of protein clusters across Y. enterocolitica 8081, Y. pestis CO92, and Y. pseudotuberculosis IP32953. (a) The Venn
diagram shows the number of protein clusters unique to each of the three high virulence Yersinia species or shared among them (see Materials and
methods). (b) The number of shared and unique clusters that do not contain a single member from the eight low human virulence genomes
sequenced in this study.




Figure 4 Protein-based comparison of Y. enterocolitica 8081 to the Yersinia genus. The map represents the blast score ratio (BSR) [98,99] to
the protein encoded by Y. enterocolitica [15]. Blue indicates a BSR >0.70 (strong match); cyan 0.69 to 0.4 (intermediate); green <0.4 (weak). Red
and pink outer circles are locations of the Y. enterocolitica genes on the + and − strands. The genomes are ordered from outside to inside based
on the greatest overall similarity to Y. enterocolitica: Y. kristensenii, Y. frederiksenii, Y. mollaretii, Y. intermedia, Y. bercovieri, Y. aldovae, Y. rohdei,
Y. ruckeri, Y. pseudotuberculosis, and Y. pestis. The black bars on the outside refer to genome islands in Y. enterocolitica identified by Thomson et al.
[15].


independent acquisition of related islands by both Y.
enterocolitica biogroup 1B and Y. pseudotuberculosis.
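The blast score ratio (BSR) plotted in Figure 4 normalizes the score of a protein's best match in another genome by the score of the protein matched against itself. The sketch below shows that calculation together with the three color bands given in the figure legend. The bit scores are hypothetical, and assigning a ratio of exactly 0.70 to the intermediate band is an assumption, since the legend leaves that boundary open.

```python
def blast_score_ratio(match_bit_score, self_bit_score):
    """Ratio of the best-match score to the query's self-match score."""
    if self_bit_score <= 0:
        raise ValueError("self_bit_score must be positive")
    return match_bit_score / self_bit_score


def classify_bsr(bsr):
    """Map a BSR value to the bands used in the Figure 4 legend."""
    if bsr > 0.70:
        return "strong"        # drawn in blue
    if bsr >= 0.40:
        return "intermediate"  # drawn in cyan
    return "weak"              # drawn in green


# Hypothetical bit scores for one query protein (self score 500) against
# its best match in three other genomes.
SELF_SCORE = 500.0
best_matches = {"genome_1": 450.0, "genome_2": 250.0, "genome_3": 100.0}

bands = {
    genome: classify_bsr(blast_score_ratio(score, SELF_SCORE))
    for genome, score in best_matches.items()
}
print(bands)
```

Normalizing by the self score makes values comparable across proteins of different lengths, which is why a single set of thresholds can be applied to the whole genome map.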
Many genes previously thought to be unique to Y.
enterocolitica in general and biotype 1B in particular
turned out to have orthologs in the low human virulence
species sequenced in this study. These included
several putative biotype 1B-specific genes identified by
microarray-based screening [58], including YE0344
HylD hemophore (yinte0001_41550 has 78% nucleotide
identity), YE4052 metalloprotease (yinte0001_36030 has
95% nucleotide identity), and YE4088, a two-component
sensor kinase, which had orthologs in all species. Large
portions of the biogroup 1B-specific island containing
the Yts1 type II secretion system were found in Y. ruckeri,
Y. mollaretii, and Y. aldovae. Y. aldovae and Y. mollaretii
also had islands containing ysa type three
secretion systems (TTSS) with 75 to 85% nucleotide
identity to the homolog in Y. enterocolitica 1B. The
ysa genes are a chromosomal cluster [9,13,15] that in Y.
enterocolitica, at least, appears to play a role in virulence
[59]. The Y. enterocolitica ysa genes are found in the
plasticity zone (Figure 4) and have very low similarity to
the Y. pestis and Y. pseudotuberculosis ysa genes (which
are more similar to the Salmonella SPI-2 island [60,61])
and are found between orthologs of YPO0254 and
YPO0274 [9]. Species within the Yersinia genus had
either the Y. enterocolitica type of ysa TTSS locus or
the Y. pestis/SPI-2 type (with the exception of Y. aldovae,
which has both; Additional file 22). This suggested
the exchange of chromosomal TTSS genes within
Yersinia.
The modular nature of the islands found in the Y.
enterocolitica genome was demonstrated further by two
examples gleaned from comparison with the evolutionarily
closest low human virulence genome, Y. kristensenii
ATCC 33638T (Figure 1). The YGI-3 island [15] in Y.
enterocolitica 8081 is a degraded integrated plasmid; at
the same chromosomal locus in Y. kristensenii ATCC
33638T a prophage was found, suggesting that the YGI-3
location may be a recombinational hotspot. Another
Y. enterocolitica 8081 island, YGI-1, encodes a 'tight
adherence' (tad) locus responsible for non-specific surface
binding. Y. kristensenii ATCC 33638T had an identical
13-gene tad locus in the same position, but the
nucleotide sequence identity of the region to Y. enterocolitica
8081 was uniformly lower than that found for
the rest of the genome, suggesting either that there had been
a gene conversion event replacing the tad locus with a
set of new alleles in the recent history of Y. kristensenii
or Y. enterocolitica, or that the locus was under very high
positive selective pressure.

Niche-specific metabolic adaptations in the Yersinia genus
Comparison of the Y. enterocolitica genome to Y. pestis
and Y. pseudotuberculosis revealed some potentially
significant metabolic differences that may account for
varying tropisms in gastric infections [62]. Y. enterocolitica
8081 alone contained entire gene clusters for cobalamin
(vitamin B12) biosynthesis (cbi), 1,2-propanediol
utilization (pdu), and tetrathionate respiration (ttr). In
Y. enterocolitica and Salmonella typhimurium [63,64], vitamin
B12 is produced under anaerobic conditions where
it is used as a cofactor in 1,2-propanediol degradation,
with tetrathionate serving as an electron acceptor. This
study showed the genes for this pathway to be a general
feature of species in the 'enterocolitica' branch of the
Yersinia genus (with the caveat that some portions are
missing in some species; for example, Y. rohdei is missing
the pdu cluster) (Table 5). Additionally, Y. intermedia,
Y. bercovieri, and Y. mollaretii contained gene
clusters encoding degradation of the membrane lipid
constituent ethanolamine. Ethanolamine metabolism
under anaerobic conditions also requires the B12 cofactor.
Y. intermedia contained the full 17-gene cluster
reported in S. typhimurium [65], including structural
components of the carboxysome organelle. Another discovery
from the Y. enterocolitica genome analysis was
the presence of two compact hydrogenase gene clusters,
Hyd-2 and Hyd-4 [15]. Hydrogen released from fermentation
by intestinal microflora is imputed to be an
important energy source for enteric gut pathogens [66].
Both gene clusters are conserved across all the other
seven enterocolitica-branch species, but are missing
from Y. pestis and Y. pseudotuberculosis. Y. ruckeri contained
a single [NiFe]-containing hydrogenase complex.
Y. ruckeri, the most evolutionarily distant member of
the genus (Figure 1) with the smallest genome (3.7 Mb),
had several features that distinguished it from its congeners.
The Y. ruckeri O-antigen operon contained a neuB
sialic acid synthase gene; the bacterium was therefore
predicted to produce a sialylated outer surface structure.
Among the common Yersinia genes that are missing

Table 5 Key niche-specific genes in Yersinia
cbi pdu ttr eut


Y enterocolitica
Y aldovae
Y bercovieri
Y frederiksenii
Y intermedia
Y kristensenii
Y mollaretii
Y rohdei
Y ruckeri
Y pseudotuberculosis
Y pestis


hyd-2


+ + + -
+ + -
+ + + eutABC
+ + + -
+ + + eutSPQTDMNEJGHABCLKR
+ + + -


+ +


hyd-4


eutABC


+hyABCGINd
+/- hyfABCGHINfc


+
+/- (hyaD, hypEDE


ure mtn opg
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+
+ + +
+/-


Abbreviations: cbi, cobalamin (vitamin B12) biosynthesis; pdu, 1,2-propanediol utilization; ttr, tetrathionate respiration; eut, ethanolamine degradation; hyd-2 and
hyd-4, hydrogenases 2 and 4, respectively; ure, urease; mtn, methionine salvage pathway; opg, osmoprotectant (synthesis of periplasmic branched glucans).




only in Y. ruckeri were those for xylose utilization and
urease activity, consistent with phenotypes that have
long been known in clinical microbiology [67] (Table 5).
Surprisingly, we discovered that Y. ruckeri was also
missing the mtnKADCBEU gene cluster that comprises
the majority of the methionine salvage pathway [68]
found in most other Yersiniae. These genes have also
been deleted from Y. pestis, but as with Y. ruckeri, the
mtnN (methylthioadenosine nucleosidase) gene is maintained.
The loss of these genes in Y. pestis has been interpreted
as a consequence of adaptation to an obligate host-dwelling
lifecycle, where the availability of the sulfur-containing
amino acids is not a nutritional limitation
[15].

Discussion
Whole-genome shotgun sequencing by high-throughput
bead-based pyrosequencing has proved remarkably use-
ful for the large-scale sequencing of closely related bac-
teria [49,69-74]. High-quality de novo assemblies can be
obtained with relatively few errors and gaps when the
sequence read coverage redundancy is 15-fold or
greater. Closing all the gaps in each genome sequence is
time-consuming and costly; therefore, in the near future
there will be an excess of draft bacterial sequences ver-
sus closed genomes in public databases. Our analysis
strategy here melds both draft and complete genomes
using consistent automated annotation that is scalable
to encompass potentially much larger datasets. High
quality draft sequencing is likely to shortly supersede
comparative genome hybridization using microarrays
[25,58,75,76] as the most popular strategy for genome-
wide bacterial comparisons. Genome sequence datasets
can be used to shed light on the novel functions in
close relatives that may have been lost in the pathogen
of interest, as well as orthologs in genomes that fall
below the threshold for hybridization-based detection.
The problems of using microarrays for comparisons of
more diverse bacterial taxa are illustrated in a study of
the Yersinia genus, using many of the strains sequenced
in this work, where the estimated number of core genes
was found to be only 292 [25].
We cannot claim complete coverage of all the type
strains of the Yersinia genus, as three new species have
been created [77-79] since our work began. Nonetheless,
from this extensive genomic survey we have attempted
to categorize the features that define Yersinia. The core
of about 2,500 proteins present in all 11 species is not a
subset of any other enterobacterial genome. Species of
the Y. enterocolitica clade (Figure 1) have overall a similar
array of protein functions and contain a number of
conserved gene clusters (cobalamin, hydrogenases,
ureases, and so on) found in other bacteria (Helicobacter,
Campylobacter, Salmonella, Escherichia coli) that
colonize the mammalian gut. Y. pestis has lost many of
these genes by deletion or disruption since its split from
the enteric pathogen Y. pseudotuberculosis and adoption
of an insect vector-mediated pathogenicity mode. The
smaller Y. ruckeri chromosome does not appear to result
from recent reductive evolution (as is the case for Y. pestis),
as evidenced by the relatively low number of frameshifts
and pseudogenes, and the normal amount of
repetitive contigs in the newbler genome assembly. Like
Y. pestis, Y. ruckeri lacks urease, methionine salvage
genes, and B12-related metabolism. The prevailing consensus
is that the pathway of transmission of red mouth
disease in fish is gastrointestinal, yet the similarities of
Y. ruckeri genome reduction to Y. pestis hint at an alternative
mode of infection for Y. ruckeri.
This comparative genomic study reaffirms that the
distinguishing feature of the high-level mammalian
pathogens is the acquisition of a particular set of
mobile elements: HPI, the pYV, pMT1 and pPCP plasmids,
and the YAPI island. However, the eight species
sequenced in this study, believed to have either low or
zero potential for human infection, contain numerous,
apparently horizontally transferred genes that would be
considered putative virulence determinants if discovered
in the genome of a more serious pathogen. Two
examples are yaldo0001_40900 (bile salt hydrolase) and
yfred0001_36480, an ortholog of the TibA adhesin of
enterotoxigenic E. coli. Bile salt hydrolase in pathogenic
Brucella abortus has been shown to enhance bile
resistance during oral mouse infections [80] and the
TibA adhesin forms a biofilm that mediates human
cell invasion [81]. The low-virulence species contain a
similar (and in some cases greater) number of matches
to known drug resistance mechanisms that have been
curated in the Antibiotic Resistance Genes Database
[82] (Additional file 23, [83]). Adding DNA, stirring
and reducing [17] is, therefore, the general recipe for
Yersinia genome evolution rather than a formula specific
to pathogens. Comparative genomic studies such as
these can be used to enhance our ability to rapidly
assess the virulence potential of a genome sequence of
an emerging pathogen, and we plan to continue to
build more extensive databases of non-pathogenic Yersinia
genomes that will allow us to draw conclusions
with more statistical power than is possible with just 11
representative species.

Conclusions
Genomes of the 11 Yersinia species studied range in
estimated size from 3.7 to 4.8 Mb. The nucleotide
diversity (π) of the conserved backbone, based on large
collinear conserved blocks, was calculated to be 0.27. There
were no orthologs of genes and predicted proteins in
the virulence-associated plasmids pYV, pMT1, and pPla,
and the HPI of Y. pestis in the genomes of the type
strains of the eight non- or low-pathogenic Yersinia species.
Apart from functions encoded on the aforementioned
plasmids, HPI and YAPI regions, only nine proteins
detected as common to all three Yersinia pathogen species
(Y. pestis, Y. enterocolitica and Y. pseudotuberculosis)
were not found in at least one of the other eight
species. Therefore, our study is in agreement with the
hypothesis that genes acquired by recent horizontal
transfer effectively define the members of the Yersinia
genus virulent for humans.
The core proteome of the 11 Yersinia species consists
of approximately 2,500 proteins. Yersinia genomes had a
similar global partition of protein functions, as measured
by the distribution of COG families. Genome to genome
variation in islands with genes encoding functions such
as ureases, hydrogenases and B12 cofactor metabolite
reactions may reflect adaptations to colonizing specific
host habitats.
Y. ruckeri, a salmonid fish pathogen, is the earliest
branching member of the genus and has the smallest
genome (3.7 Mb). Like Y. pestis, Y. ruckeri lacks func-
tional urease, methionine salvage genes, and B12-related
metabolism. These losses may reflect adaptation to a
lifestyle that does not include colonization of the mam-
malian gut.
The absence of the YAPI island in any of the seven 'Y.
enterocolitica clade' genomes likely indicates that YAPI
was acquired independently in Y. enterocolitica and Y.
pseudotuberculosis.
We identified 171 and 100 regions within the genomes
of Y. pestis and Y. enterocolitica, respectively, that
represented potential candidates for the design of nucleotide
sequence-based assays for unique detection of each
pathogen.

Materials and methods
Bacterial strains
Type strains of the eight Yersinia species sequenced in
this study (Table 1) were acquired from the American
Type Culture Collection (ATCC) and propagated at 37°C
or 25°C (Y. ruckeri) on Luria media. DNA for genome
sequencing was prepared from overnight broth cultures
propagated from single colonies streaked on a Luria
agar plate using the Promega Wizard Maxiprep System
(Promega, Madison, WI, USA).

Genome sequencing and assembly
Genomes were sequenced using the Genome Sequencer
20 Instrument (454 Life Sciences Inc., Branford, CT)
[34]. Libraries for sequencing were prepared from 5 μg
of genomic DNA. The sequencing reads for each project
were assembled de novo using the newbler program
(version 01.51.02; 454 Life Sciences Inc).


Optical mapping
Optical maps [38] for each genome were constructed by
OpGen Inc. (Madison, WI) using the restriction
enzymes AflII and NheI (Y. aldovae and Y. kristensenii
have maps for AflII only). The newbler assemblies for each
genome were scaffolded using the optical maps and the
SOMA package [39] (Additional file 4). Assemblies that
did not align against the optical map were tested for
high read coverage, unusual GC content, and good
matches to plasmid-associated genes from the ACLAME
database [84] (BLAST E-value less than 10^-20) to identify
sequences that could potentially be part of an
extrachromosomal element.

Detection of disrupted genes
We used two methods for detecting disrupted proteins.
In the first method, clustered protein groups were
used to adduce evidence for possible gene disruption
events. The clusters were parsed for pairs of proteins
that met the following criteria: both from the same
genome; encoded by genes located on the same strand with
less than 200 bp separating their frames; and a total
length of the combined genes not greater than
120% of the longest gene in the cluster. The second
method used the FSFIND algorithm [85] with a
standard bacterial gene model to compare the accumulation
of predicted frameshifts across different genomes.
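
The three criteria of the first method can be sketched in a few lines of
Python. This is a minimal illustration and not the script used in the study:
the Gene record, its field names and the function name are hypothetical, the
gap is measured between the end of the upstream gene and the start of the
downstream gene, and overlapping genes are assumed to pass the 200 bp test.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gene:
    # Hypothetical record; field names are illustrative only.
    genome: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def candidate_disruptions(cluster, max_gap=200, max_ratio=1.2):
    """Return pairs of genes from one protein cluster that may be
    fragments of a single disrupted gene."""
    longest = max(g.length for g in cluster)
    pairs = []
    for a, b in combinations(cluster, 2):
        # Criterion 1: same genome; criterion 2a: same strand.
        if a.genome != b.genome or a.strand != b.strand:
            continue
        first, second = sorted((a, b), key=lambda g: g.start)
        # Criterion 2b: less than max_gap bp between the two frames.
        if second.start - first.end - 1 >= max_gap:
            continue
        # Criterion 3: combined length at most 120% of the longest gene.
        if a.length + b.length > max_ratio * longest:
            continue
        pairs.append((first, second))
    return pairs
```

Applied to a cluster holding one intact 1,000 bp gene and two adjacent
fragments from another genome, the function reports only the fragment pair.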

Assembly validation
In order to rule out artifacts due to undocumented
features of the newbler assemblies, new assemblies
were generated for validation purposes by re-mapping
all the shotgun reads to the sequence of the assembled
contigs using AMOScmp [86]. The resulting assembly
was then subjected to analysis using the amosvalidate
package [37]. The output of this program includes a
list of genomic regions that contain inconsistencies
highlighting possible misassemblies. The resulting
regions were manually inspected to reduce the possibi-
lity of assembly errors. The regions flagged by the
amosvalidate package are provided in GFF (general
feature format), compatible with most genome brow-
sers (Additional file 3).

Insertion sequences and de novo repeat finding
The presence of repeats is known to confound assembly
programs, and the newbler assembler is known to collapse
high-fidelity repeat instances into a single contig. To
account for the possibility of such misassemblies, we
computed the copy number of contigs based on coverage
statistics and used this information to correct our
estimates for the abundance of classes of repeats (Additional
file 3). To find known insertion sequences, the genomes
were scanned for matches using the ISfinder web service
[40] with a BLAST E-value threshold of 10^-10 (matches to
known repeat contigs were counted as multiple matches
based on the coverage of the contig). In addition, we
searched for common repeat sequences in the genome
using the RepeatScout program [41] after duplicating
known repeat contigs. The repeats found in each genome
were collected (64 sequences) and transformed into a
non-redundant set of 44 sequences using the CD-HIT
program [87] (Additional file 6). The repeats found were
then searched against all the genomes using BLAST with
an E-value threshold of 10^-10 to record matches. The
resultant figures for repeat content are estimates that
may be lower than the true number found in the
genomes.
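
The coverage-based correction can be illustrated with a short Python sketch.
The text does not give the formula that was used, so the one below is an
assumption: the copy number of a contig is taken as its read depth divided by
the median depth over all contigs, rounded to the nearest integer and never
less than one. All function and variable names are hypothetical.

```python
from statistics import median


def read_depth(read_count, mean_read_length, contig_length):
    """Average number of reads covering each base of a contig."""
    return read_count * mean_read_length / contig_length


def estimate_copy_numbers(depths):
    """depths: dict of contig name -> read depth.
    Assumes the median depth represents single-copy sequence."""
    baseline = median(depths.values())
    return {name: max(1, round(depth / baseline))
            for name, depth in depths.items()}


def weighted_match_count(hits_per_contig, copy_numbers):
    """Count each match once per estimated copy of the contig it is on."""
    return sum(hits * copy_numbers.get(contig, 1)
               for contig, hits in hits_per_contig.items())
```

Under this assumption a single insertion sequence match on a collapsed repeat
contig with five times the median depth would count as five matches.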

Finding unique DNA signatures in Y. pestis and Y.
enterocolitica
DNA signatures for the Y. pestis and the Y. enterocoli-
tica genomes were identified using the Insignia pipeline
[44]. Signatures of 100 bp or longer were considered
good candidates for the design of detection assays.
These signatures were then compared with the genomes
of the Yersinia strains sequenced during the current
study using the MUMmer package [88] with default
parameters. Signatures that matched by more than 40
bp were deemed invalidated, as they would likely lead to
false-positive results.
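
The two thresholds in this section (signatures of at least 100 bp, invalidated
by any match longer than 40 bp) amount to a simple filter. The sketch below
assumes that signature lengths and match lengths have already been extracted
from the Insignia and MUMmer outputs; the dictionaries and the function name
are hypothetical and no output parsing is shown.

```python
def valid_signatures(signature_lengths, match_lengths,
                     min_length=100, max_match=40):
    """signature_lengths: dict of signature id -> length in bp.
    match_lengths: dict of signature id -> list of match lengths (bp)
    against the newly sequenced genomes.
    Returns the ids of signatures that remain usable for assay design."""
    kept = []
    for sig_id, length in signature_lengths.items():
        if length < min_length:
            continue  # too short to be a good assay candidate
        longest = max(match_lengths.get(sig_id, []), default=0)
        if longest > max_match:
            continue  # invalidated: likely source of false positives
        kept.append(sig_id)
    return kept
```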

Automated annotation
We used DIYA [89], a pipeline for integrating bacterial
analysis tools, for automated annotation.
Using DIYA, the assemblies generated by newbler were
scaffolded based on the optical map, concatenated, and
used as a template for the programs GLIMMER [90],
tRNAscan-SE [91], and RNAmmer [92] for prediction
of open reading frames and RNA genes,
respectively. All predicted proteins encoded by each coding
sequence were compared against a database of all
proteins predicted from the canonical annotation of Y.
pestis CO92 [9] as a preliminary screen for potentially
novel functions. The GenBank format files created
from the eight genomes sequenced in this study were
combined with other DIYA-annotated, published
whole genomes to form a dataset for analysis. All
proteins were searched against the UniRef50 database
(July 2008) [93] using BLASTP [94] and against the
Conserved Domain Database [95] using RPSBLAST
[96] with an E-value threshold of 10^-10 to record
matches.

Database accession numbers
The annotated genome data were submitted to NCBI
GenBank and the sequence data submitted to the NCBI
Short Read Archive (SRA). The accession numbers are:
Y. rohdei ATCC_43380: [Genbank:ACCD00000000]/
[SRA:SRA009766.1]; Y. ruckeri ATCC_29473: [Genbank:
ACCC00000000]/[SRA:SRA009767.1]; Y. aldovae
ATCC_35236: [Genbank:ACCB00000000]/[SRA:
SRA009760.1]; Y. kristensenii ATCC_33638: [Genbank:
ACCA00000000]/[SRA:SRA009764.1]; Y. intermedia
ATCC_29909: [Genbank:AALF00000000]/[SRA:
SRA009763.1]; Y. frederiksenii ATCC_33641: [Genbank:
AALE00000000]/[SRA:SRA009762.1]; Y. mollaretii
ATCC_43969: [Genbank:AALD00000000]/[SRA:
SRA009765.1]; Y. bercovieri ATCC_43970: [Genbank:
AALC00000000]/[SRA:SRA009761.1].

Whole-genome alignment using MAUVE
Yersinia genomes were aligned using the standard
MAUVE [46] algorithm with default settings. A cutoff
of 1,500 bp was set as the minimum LCB length.
LCBs for each genome were extracted from the output
of the program and concatenated. From the alignment,
nucleotide diversity (π) was calculated by an in-house
script using positions where there was a base in all 11
genomes. Because of the size of the dataset, the
calculated value of π is very robust in terms of sequence
error. We calculated that 112,696 nucleotides of
sequence in the concatenated core would have to be
wrong to alter the estimate of π by ±5% (Additional
file 24). PHYLIP [51] programs were used to build a
consensus tree of the MAUVE alignment with
1,000 bootstrap replicates. The underlying model for
each replicate was Fitch-Margoliash. The final
phylogeny was resolved according to the majority consensus
rule.
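
The robustness figure can be reproduced from the values tabulated in the
supplementary calculation (55 pairwise comparisons, 1,495,930 sites and
22,539,297 mismatches). The Python sketch below is not the in-house script:
the function is one plausible reading of 'positions where there was a base in
all 11 genomes' (columns containing only A, C, G or T, upper case assumed),
and the arithmetic assumes, as the supplementary table does, that one
sequencing error in one genome alters 10 pairwise comparisons.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise mismatch fraction over alignment columns that
    contain an unambiguous base in every sequence."""
    n = len(aligned)
    pairs = n * (n - 1) // 2
    sites = mismatches = 0
    for column in zip(*aligned):
        if any(base not in "ACGT" for base in column):
            continue  # skip gaps and ambiguity codes
        sites += 1
        mismatches += sum(a != b for a, b in combinations(column, 2))
    if sites == 0:
        raise ValueError("no fully resolved columns in the alignment")
    return mismatches / (pairs * sites)


# Figures reported in the supplementary calculation.
N_GENOMES = 11
PAIRS = N_GENOMES * (N_GENOMES - 1) // 2   # 55 pairwise comparisons
SITES = 1_495_930
MISMATCHES = 22_539_297

pi = MISMATCHES / (PAIRS * SITES)             # about 0.274
extra_mismatches = 0.05 * MISMATCHES          # mismatches for a 5% shift
errors_needed = extra_mismatches / (N_GENOMES - 1)  # about 112,696
```

The last line recovers the 112,696 erroneous nucleotides quoted above, which
is roughly 7.5% of the sites considered.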

Clustering protein orthologs
The complete predicted proteome from all genomes
annotated in this study was searched against itself
using BLASTP with default parameters. We removed
short, spurious, and non-homologous hits by setting a
bitscore/alignment length filtering threshold of 0.4 and
minimum protein length of 30. Predicted proteins pas-
sing this filter were clustered into families based on
these normalized distances using the MCL algorithm
[48] with an inflation parameter value of 4. These
parameters were based on an investigation of cluster-
ing 12 completed E. coli genomes, which produced
very similar results to a previous study [42].
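
The filtering step can be sketched as follows, assuming the BLASTP results
are available in the standard 12-column tabular format (query, subject,
percent identity, alignment length, ..., E-value, bit score). The function
name, the protein_length dictionary, the removal of self-hits and the
inclusive treatment of the 0.4 threshold are assumptions made for the
illustration.

```python
def filter_hits(blast_lines, protein_length, min_ratio=0.4, min_length=30):
    """Keep hits whose bitscore/alignment-length ratio is at least
    min_ratio and whose proteins are both at least min_length long.
    Returns (query, subject, ratio) tuples usable as graph edges."""
    edges = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        alignment_length = int(fields[3])
        bitscore = float(fields[11])
        if query == subject:
            continue  # self-hits carry no clustering information
        if (protein_length[query] < min_length
                or protein_length[subject] < min_length):
            continue
        ratio = bitscore / alignment_length
        if ratio < min_ratio:
            continue
        edges.append((query, subject, ratio))
    return edges
```

The resulting tuples could be written out as three tab-separated columns,
which is the label-pair input that MCL reads when run with its --abc option
and an inflation value of 4 (-I 4).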

Whole genome phylogenetic reconstruction
From the results of the clustering analysis, 681 protein
clusters were found that had exactly one member in each of
the genomes and in which the length of each protein
was nearly identical. These protein sequences were
aligned using ClustalW [50], and individual gene
alignments were concatenated into a string of 170,940 amino
acids for each genome. Uninformative characters were
removed from the dataset using Gblocks [97] and a
phylogeny was reconstructed with PHYLIP [51] under a
neighbor-joining model. To evaluate node support, a majority
rule-consensus tree of 1,000 bootstrap replicates was
computed.
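
The selection of single-copy clusters and the concatenation of their
alignments can be sketched as below. The text does not define 'nearly
identical' length, so the 5% tolerance is a placeholder assumption, and the
data structures and function names are hypothetical.

```python
from collections import Counter


def single_copy_clusters(clusters, genomes, max_length_spread=0.05):
    """clusters: dict of cluster id -> list of (genome, protein id, length).
    Returns ids of clusters with exactly one member per genome and a
    relative length spread no larger than max_length_spread."""
    wanted = set(genomes)
    selected = []
    for cluster_id, members in clusters.items():
        counts = Counter(genome for genome, _, _ in members)
        if set(counts) != wanted or any(c != 1 for c in counts.values()):
            continue
        lengths = [length for _, _, length in members]
        if (max(lengths) - min(lengths)) / max(lengths) > max_length_spread:
            continue
        selected.append(cluster_id)
    return selected


def concatenate(alignments, genomes):
    """alignments: list of dicts (genome -> aligned sequence), one per
    selected cluster. Returns one concatenated sequence per genome."""
    return {g: "".join(aln[g] for aln in alignments) for g in genomes}
```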


Additional file 1: Statistics from DIYA and frameshift detection
programs on eight genomes sequenced in this study and other
enterobacterial genomes from NCBI. Statistics from running DIYA [89]
and frameshift detection programs on the eight genomes sequenced in
this study and various other enterobacterial genomes downloaded from
NCBI.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S1.xls]
Additional file 2: Results of amosvalidate [37] analysis on the eight
genomes of this study.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S2.doc]
Additional file 3: Additional annotation files. These consist of ISfinder
[40], RepeatScout [41] and amosvalidate [37] results (GFF format); repeats
found by RepeatScout in fasta format; scaffold files (NCBI AGP format);
and information about length of contigs, read count, estimated repeat
number, count in scaffold and whether or not the contig was placed by
SOMA [39].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S3.gz]
Additional file 4: Estimates for genome sizes (in Mbp) based on
optical map data.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S4.doc]
Additional file 5: Pulsed field gel analysis of the eight sequenced
Yersinia species and failure to detect plasmids. An E. coli strain with
known plasmids was a positive control.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S5.doc]
Additional file 6: Sequences of the detected repeat families.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S6.txt]
Additional file 7: Y. pestis CO92 signatures longer than 100 bp
computed by the Insignia [44] pipeline.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S7.txt]
Additional file 8: Sequences of the new genomes that match (that
is, invalidate) the Y. pestis CO92 signatures listed in Additional file
7.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S8.txt]


Additional file 9: Y. enterocolitica signatures longer than 100 bp
computed by the Insignia pipeline.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S9.txt]
Additional file 10: Sequences of the new genomes that match (that
is, invalidate) the Y. enterocolitica signatures.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S10.txt]
Additional file 11: Y. pestis genome with the Insignia-identified
repeats and genome islands plotted. Y. pestis genome with the
Insignia-identified repeats and genome islands identified using
IslandViewer [45] plotted. The figure was created using DNAPlotter [106].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S11.png]
Additional file 12: Y. enterocolitica genome with the Insignia-identified
repeats and genome islands plotted. Y. enterocolitica genome with
the Insignia-identified repeats and genome islands identified using
IslandViewer [45] plotted. The figure was created using DNAPlotter [106].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S12.png]
Additional file 13: Output of the MAUVE [46] alignment of 11
Yersinia species. The eight genomes sequenced in this study are
represented as pseudocontigs, ordered by a combination of optical
mapping and alignment to the closest completed reference genome.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S13.jpeg]
Additional file 14: Whole genome multiple alignment produced by
MAUVE of the 11 Yersinia genomes in XMFA format [106].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S14.zip]
Additional file 15: Output of the cluster analysis of the 11 Yersinia
species. The top level directory consists of a directory called
Additional_cluster_files and 5010 directories, one for each multi-protein
cluster family. (This top level directory has been split into three data
files for uploading purposes (Additional files 15, 16, 17).) Within the
directory are the following files:
PGL1_unique_Yersinia_unclustered.out - list of all
protein singletons that MCL did not group into a cluster (see Materials
and methods);
PGL1_Yersinia_unique_locus_tags.txt - names of the 11
locus tag prefixes used for each genome;
PGL1_unique_Yersinia.gff - mapping each Yersinia protein to a cluster
in tab delimited GFF;
PGL1_unique_Yersinia.sigfile - list of the longest protein in each cluster;
PGL1_unique_Yersinia.summary - summary table of features of each of
the clusters;
PGL1_unique_Yersinia.table - summary table of each protein
in the clusters.
Within each cluster directory are the following files, where
'x' is the cluster name:
PGL1_unique_Yersinia-x.faa - multifasta file of the
proteins in the cluster;
PGL1_unique_Yersinia-x.summary - summary of
the properties of the proteins;
PGL1_unique_Yersinia-x.matches - blast
matches between the proteins of the cluster;
PGL1_unique_Yersinia-x.muscle.fasta - muscle alignment of the proteins;
PGL1_unique_Yersinia-x.muscle.fasta.gblo - blocks output of muscle
alignment (that is, auto-trimmed alignment);
PGL1_unique_Yersinia-x.muscle.fasta.gblo.htm - as
above in html format;
PGL1_unique_Yersinia-x.muscle.tree - treefile from
muscle alignment;
PGL1_unique_Yersinia-x.sif - matches between
proteins in simple interaction format for display on graphing software.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S15.zip]




Additional file 16: Output of the cluster analysis of the 11 Yersinia
species. The top level directory consists of a directory called
Additional_cluster_files and 5010 directories, one for each multi-protein
cluster family. (This top level directory has been split into three data
files for uploading purposes (Additional files 15, 16, 17).) Within the
directory are the following files:
PGL1_unique_Yersinia_unclustered.out - list of all
protein singletons that MCL did not group into a cluster (see Materials
and methods);
PGL1_Yersinia_unique_locus_tags.txt - names of the 11
locus tag prefixes used for each genome;
PGL1_unique_Yersinia.gff - mapping each Yersinia protein to a cluster
in tab delimited GFF;
PGL1_unique_Yersinia.sigfile - list of the longest protein in each cluster;
PGL1_unique_Yersinia.summary - summary table of features of each of
the clusters;
PGL1_unique_Yersinia.table - summary table of each protein
in the clusters.
Within each cluster directory are the following files, where
'x' is the cluster name:
PGL1_unique_Yersinia-x.faa - multifasta file of the
proteins in the cluster;
PGL1_unique_Yersinia-x.summary - summary of
the properties of the proteins;
PGL1_unique_Yersinia-x.matches - blast
matches between the proteins of the cluster;
PGL1_unique_Yersinia-x.muscle.fasta - muscle alignment of the proteins;
PGL1_unique_Yersinia-x.muscle.fasta.gblo - blocks output of muscle
alignment (that is, auto-trimmed alignment);
PGL1_unique_Yersinia-x.muscle.fasta.gblo.htm - as
above in html format;
PGL1_unique_Yersinia-x.muscle.tree - treefile from
muscle alignment;
PGL1_unique_Yersinia-x.sif - matches between
proteins in simple interaction format for display on graphing software.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S16.zip]
Additional file 17: Output of the cluster analysis of the 11 Yersinia
species. The top level directory consists of a directory called
Additional_cluster_files and 5010 directories, one for each multi-protein
cluster family. (This top level directory has been split into three data
files for uploading purposes (Additional files 15, 16, 17).) Within the
directory are the following files:
PGL1_unique_Yersinia_unclustered.out - list of all
protein singletons that MCL did not group into a cluster (see Materials
and methods);
PGL1_Yersinia_unique_locus_tags.txt - names of the 11
locus tag prefixes used for each genome;
PGL1_unique_Yersinia.gff - mapping each Yersinia protein to a cluster
in tab delimited GFF;
PGL1_unique_Yersinia.sigfile - list of the longest protein in each cluster;
PGL1_unique_Yersinia.summary - summary table of features of each of
the clusters;
PGL1_unique_Yersinia.table - summary table of each protein
in the clusters.
Within each cluster directory are the following files, where
'x' is the cluster name:
PGL1_unique_Yersinia-x.faa - multifasta file of the
proteins in the cluster;
PGL1_unique_Yersinia-x.summary - summary of
the properties of the proteins;
PGL1_unique_Yersinia-x.matches - blast
matches between the proteins of the cluster;
PGL1_unique_Yersinia-x.muscle.fasta - muscle alignment of the proteins;
PGL1_unique_Yersinia-x.muscle.fasta.gblo - blocks output of muscle
alignment (that is, auto-trimmed alignment);
PGL1_unique_Yersinia-x.muscle.fasta.gblo.htm - as
above in html format;
PGL1_unique_Yersinia-x.muscle.tree - treefile from
muscle alignment;
PGL1_unique_Yersinia-x.sif - matches between
proteins in simple interaction format for display on graphing software.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S17.zip]
Additional file 18: Complete protein sets for the 11 species of
Yersinia.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S18.zip]
Additional file 19: Inferred evolutionary trees reconstructed using
PHYLIP [51] of the 11 Yersinia species proteomes based on
parsimony. To evaluate node support, a majority rule-consensus tree of
1,000 bootstrap replicates was computed. E. coli was used as an
outgroup species.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S19.pdf]
Additional file 20: Inferred evolutionary trees reconstructed using
PHYLIP [51] of the 11 Yersinia species proteomes based on
maximum likelihood. To evaluate node support, a majority rule-
consensus tree of 1,000 bootstrap replicates was computed. E. coli was
used as an outgroup species.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S20.pdf]
Additional file 21: Twenty proteins conserved in pathogenic strains
but missing from the non-pathogen set. A curve showing the rate of
decline in number of this set as more non-pathogen genomes are
added is also included.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S21.doc]
Additional file 22: Phylogeny of TTSS component YscN in Yersinia
and other enterobacteria species.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S22.doc]
Additional file 23: Putative antibiotic resistance genes in the
Yersinia genus determined using the Antibiotic Resistance Genes
Database [45].
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S23.xls]
Additional file 24: Calculations for the estimation of π from aligned
Yersinia core genomes.
[http://www.biomedcentral.com/content/supplementary/gb-2010-11-1-r1-S24.doc]




Abbreviations
ATCC: American Type Culture Collection; COG: Cluster of Orthologous
Groups; HPI: high-pathogenicity island; IS: insertion sequence; LCB: locally
collinear block; SRA: Short Read Archive; TTSS: type III secretion system;
YAPI: Y. pseudotuberculosis adhesion pathogenicity island.

Acknowledgements
We would like to thank Ayra Akmal, Kim Bishop-Lilly, Mike Cariaso, Brian
Osborne, Bill Klimke, Tim Welch, Jennifer Tsai, Cheryl Timms Strauss and
members of the 454 Service Center for their help and advice in completing
this manuscript. This work was supported by grant TMT10068 07 NM T from
the Joint Science and Technology Office for Chemical and Biological
Defense (JSTO-CBD), Defense Threat Reduction Agency Initiative to TDR. The
views expressed in this article are those of the authors and do not
necessarily reflect the official policy or position of the US Department of the
Navy, US Department of Defense, or the US Government. Some of the
authors are employees of the US Government, and this work was prepared
as part of their official duties. Title 17 USC 105 provides that 'Copyright
protection under this title is not available for any work of the United States
Government'. Title 17 USC 101 defines a US Government work as a work
prepared by a military service member or employee of the US Government
as part of that person's official duties.

Author details
1 Biological Defense Research Directorate, Naval Medical Research Center, 503
Robert Grant Avenue, Silver Spring, Maryland 20910, USA. 2 University of
Maryland Institute for Advanced Computer Sciences, Center for
Bioinformatics and Computational Biology, University of Maryland, College
Park, Maryland 20742, USA. 3 Emerging Pathogens Institute and Department
of Molecular Genetics and Microbiology, University of Florida College of
Medicine, Gainesville, Florida 32610, USA. 4 454 Life Sciences Inc., 15
Commercial Street, Branford, Connecticut 06405, USA. 5 Department of
Human Genetics, Emory University School of Medicine, 615 Michael Street,
Atlanta, Georgia 30322, USA. 6 Division of Infectious Diseases, Emory
University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322,
USA. 7 Current address: Computational and Mathematical Biology, Genome
Institute of Singapore, Singapore 127726.

Authors' contributions
TDR, MEZ, LD, and SS were involved in study design. AS and AM were
involved in materials. LD, MPKT, SL, and NNo were involved in 454
sequencing. SS, MPKT, and CC were involved in additional experiments. PEC,
TDR, CC, MEZ, ACS, NN, MP, BT, and DDS were involved in data analysis.
TDR, MP, and NN wrote the paper.

Received: 23 May 2009 Revised: 7 October 2009
Accepted: 4 January 2010 Published: 4 January 2010

References
1. Ecker DJ, Sampath R, Willett P, Wyatt JR, Samant V, Massire C, Hall TA,
Hari K, McNeil JA, Buchen-Osmond C, Budowle B: The Microbial Rosetta
Stone Database: a compilation of global and emerging infectious
microorganisms and bioterrorist threat agents. BMC Microbiol 2005, 5:19.
2. Achtman M, Zurth K, Morelli G, Torrea G, Guiyoule A, Carniel E: Yersinia
pestis, the cause of plague, is a recently emerged clone of Yersinia
pseudotuberculosis. Proc Natl Acad Sci USA 1999, 96:14043-14048.
3. van Baarlen P, van Belkum A, Summerbell RC, Crous PW, Thomma BP:
Molecular mechanisms of pathogenicity: how do pathogenic
microorganisms develop cross-kingdom host jumps? FEMS Microbiol Rev
2007, 31:239-277.
4. Van Ert MN, Easterday WR, Huynh LY, Okinaka RT, Hugh-Jones ME, Ravel J,
Zanecki SR, Pearson T, Simonson TS, U'Ren JM, Kachur SM,
Leadem-Dougherty RR, Rhoton SD, Zinser G, Farlow J, Coker PR, Smith KL,
Wang B, Kenefic LJ, Fraser-Liggett CM, Wagner DM, Keim P: Global Genetic
Population Structure of Bacillus anthracis. PLoS ONE 2007, 2:e461.
5. Zwick ME, McAfee F, Cutler DJ, Read TD, Ravel J, Bowman GR, Galloway DR,
Mateczun A: Microarray-based resequencing of multiple Bacillus
anthracis isolates. Genome Biol 2005, 6:R10.
6. Ahmed N, Dobrindt U, Hacker J, Hasnain SE: Genomic fluidity and
pathogenic bacteria: applications in diagnostics, epidemiology and
intervention. Nat Rev Microbiol 2008, 6:387-394.
7. Mardis ER: The impact of next-generation sequencing technology on
genetics. Trends Genet 2008, 24:133-141.
8. Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol 2008,
26:1135-1145.
9. Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, Prentice MB,
Sebaihia M, James KD, Churcher C, Mungall KL, Baker S, Basham D,
Bentley SD, Brooks K, Cerdeno-Tarraga AM, Chillingworth T, Cronin A,
Davies RM, Davis P, Dougan G, Feltwell T, Hamlin N, Holroyd S, Jagels K,
Karlyshev AV, Leather S, Moule S, Oyston PC, Quail M, Rutherford K, et al.:
Genome sequence of Yersinia pestis, the causative agent of plague.
Nature 2001, 413:523-527.
10. Deng W, Burland V, Plunkett G, Boutin A, Mayhew GF, Liss P, Perna NT,
Rose DJ, Mau B, Zhou S, Schwartz DC, Fetherston JD, Lindler LE,
Brubaker RR, Plano GV, Straley SC, McDonough KA, Nilles ML, Matson JS,
Blattner FR, Perry RD: Genome sequence of Yersinia pestis KIM. J Bacteriol
2002, 184:4601-4611.
11. Song Y, Tong Z, Wang J, Wang L, Guo Z, Han Y, Zhang J, Pei D, Zhou D,
Qin H, Pang X, Han Y, Zhai J, Li M, Cui B, Qi Z, Jin L, Dai R, Chen F, Li S,
Ye C, Du Z, Lin W, Wang J, Yu J, Yang H, Wang J, Huang P, Yang R:
Complete genome sequence of Yersinia pestis strain 91001, an isolate
avirulent to humans. DNA Res 2004, 11:179-197.
12. Chain PS, Hu P, Malfatti SA, Radnedge L, Larimer F, Vergez LM, Worsham P,
Chu MC, Andersen GL: Complete genome sequence of Yersinia pestis
strains Antiqua and Nepal516: evidence of gene reduction in an
emerging pathogen. J Bacteriol 2006, 188:4453-4463.
13. Chain PS, Carniel E, Larimer FW, Lamerdin J, Stoutland PO, Regala WM,
Georgescu AM, Vergez LM, Land ML, Motin VL, Brubaker RR, Fowler J,
Hinnebusch J, Marceau M, Medigue C, Simonet M, Chenal-Francisque V,
Souza B, Dacheux D, Elliott JM, Derbise A, Hauser L, Garcia E: Insights into
the evolution of Yersinia pestis through whole-genome comparison with
Yersinia pseudotuberculosis. Proc Natl Acad Sci USA 2004,
101:13826-13831.
14 Eppinger M, Rosovitz MJ, Fricke WF, Rasko DA, Kokorina G, Fayolle C,
Lindler LE, Carniel E, Ravel J The complete genome sequence of Yersinia
pseudotuberculosis IP31758, the causative agent of Far East scarlet-like
fever. PLoS Genet 2007, 3'e142
15 Thomson NR, Howard S, Wren BW, Holden MT, Crossman L, Challis GL,
Churcher C, Mungall K, Brooks K, Chillingworth T, Feltwell T, Abdellah Z,
Hauser H, Jagels K, Maddison M, Moule S, Sanders M, Whitehead S,
Quail MA, Dougan G, Parkhill J, Prentice MB The Complete Genome
Sequence and Comparative Genome Analysis of the High Pathogenicity
Yersinia enterocolitica Strain 8081. PLoS Genet 2006, 2e206
16 Rollins SE, Rollins SM, Ryan ET Yersinia pestis and the plague. Am Clin
Pathol 2003, 119(Suppl)'S78-85
17 Wren BW The yersiniae-a model genus to study the rapid evolution of
bacterial pathogens. Nat Rev Microbiol 2003, 1 55-64
18 Cornelis GR The Yersinia Ysc-Yop virulence apparatus. Int J Med Microbiol
2002, 291455-462
19 Juris SJ, Shao F, DIxon JE Yersinia effectors target mammalian signaling
pathways. Cell Microbiol 2002, 4201-211
20 Viboud GI, Bliska JB Yersinia outer proteins: role in modulation of host cell
signaling responses and pathogenesis. Annu Rev Microbiol 2005, 59'69-89
21 Schubert S, Rakin A, Heesemann J The Yersinia high-pathogenicity island
(HPI): evolutionary and functional aspects. Int J Med Microbiol 2004,
29483-94
22 Carniel E The Yersinia high-pathogenicity island: an iron-uptake island.
Microbes Infect 2001, 3'561-569
23 Darling AE, Miklos I, Ragan MA Dynamics of genome rearrangement in
bacterial populations. PLoS Genet 2008, 4'e1000128
24 Anisimov AP, Lindler LE, Pier GB Intraspecific diversity of Yersinia pestis.
Chn Microbiol Rev 2004, 17434 464
25. Wang X, Han Y, Li Y, Guo Z, Song Y, Tan Y, Du Z, Rakin A, Zhou D, Yang R:
Yersinia genome diversity disclosed by Yersinia pestis genome-wide
DNA microarray. Can J Microbiol 2007, 53:1211-1221.
26. Welch TJ, Fricke WF, McDermott PF, White DG, Rosso ML, Rasko DA,
Mammel MK, Eppinger M, Rosovitz MJ, Wagner D, Rahalison L, Leclerc JE,
Hinshaw JM, Lindler LE, Cebula TA, Carniel E, Ravel J: Multiple antimicrobial
resistance in plague: an emerging public health risk. PLoS ONE 2007,
2:e309.
27. Derbise A, Chenal-Francisque V, Pouillot F, Fayolle C, Prevost MC,
Medigue C, Hinnebusch BJ, Carniel E: A horizontally acquired filamentous
phage contributes to the pathogenicity of the plague bacillus. Mol
Microbiol 2007, 63:1145-1157.
28. Sulakvelidze A: Yersiniae other than Y. enterocolitica, Y.
pseudotuberculosis, and Y. pestis: the ignored species. Microbes Infect
2000, 2:497-513.
29. Bottone EJ, Bercovier H, Mollaret HH: Genus XLI. Yersinia Van Loghem
1944, 15AL. Bergey's Manual of Systematic Bacteriology 2005, 2:838-846.
30. Kotetishvili M, Kreger A, Wauters G, Morris JG Jr, Sulakvelidze A, Stine OC:
Multilocus sequence typing for studying genetic relationships among
Yersinia species. J Clin Microbiol 2005, 43:2674-2684.
31. Noble MA, Barteluk RL, Freeman HJ, Subramaniam R, Hudson JB: Clinical
significance of virulence-related assay of Yersinia species. J Clin Microbiol
1987, 25:802-807.
32. Robins-Browne RM, Cianciosi S, Bordun AM, Wauters G: Pathogenicity of
Yersinia kristensenii for mice. Infect Immun 1991, 59:162-167.
33. Fukushima H, Gomyoda M, Kaneko S: Mice and moles inhabiting
mountainous areas of Shimane Peninsula as sources of infection with
Yersinia pseudotuberculosis. J Clin Microbiol 1990, 28:2448-2455.
34. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J,
Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV,
Godwin BC, He W, Helgesen S, Ho CH, Ho CH, Irzyk GP, Jando SC,
Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH,
Lefkowitz SM, Lei M, et al.: Genome sequencing in microfabricated
high-density picolitre reactors. Nature 2005, 437:376-380.
35. Ewing B, Green P: Base-calling of automated sequencer traces using
phred. II. Error probabilities. Genome Res 1998, 8:186-194.
36. Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C,
Lander ES, Nusbaum C, Jaffe DB: Quality scores and SNP detection in
sequencing-by-synthesis systems. Genome Res 2008, 18:763-770.




37. Phillippy AM, Schatz MC, Pop M: Genome assembly forensics: finding the
elusive mis-assembly. Genome Biol 2008, 9:R55.
38. Samad AH, Cai WW, Hu X, Irvin B, Jing J, Reed J, Meng X, Huang J, Huff E,
Porter B: Mapping the genome one molecule at a time - optical mapping.
Nature 1995, 378:516-517.
39. Nagarajan N, Read TD, Pop M: Scaffolding and validation of bacterial
genome assemblies using optical restriction maps. Bioinformatics 2008,
24:1229-35.
40. Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M: ISfinder: the
reference centre for bacterial insertion sequences. Nucleic Acids Res 2006,
34:D32-36.
41. Price AL, Jones NC, Pevzner PA: De novo identification of repeat families
in large genomes. Bioinformatics 2005, 21(Suppl 1):i351-358.
42. Hulton CS, Higgins CF, Sharp PM: ERIC sequences: a novel family of
repetitive elements in the genomes of Escherichia coli, Salmonella
typhimurium and other enterobacteria. Mol Microbiol 1991, 5:825-834.
43. De Gregorio E, Silvestro G, Venditti R, Carlomagno MS, Di Nocera PP:
Structural organization and functional properties of miniature DNA
insertion sequences in yersiniae. J Bacteriol 2006, 188:7876-7884.
44. Phillippy AM, Mason JA, Ayanbule K, Sommer DD, Taviani E, Huq A,
Colwell RR, Knight IT, Salzberg SL: Comprehensive DNA signature
discovery and validation. PLoS Comput Biol 2007, 3:e98.
45. Langille MG, Brinkman FS: IslandViewer: an integrated interface for
computational identification and visualization of genomic islands.
Bioinformatics 2009, 25:664-665.
46. Darling AC, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of
conserved genomic sequence with rearrangements. Genome Res 2004,
14:1394-1403.
47. MAUVE Aligner User Guide.
http://asap.ahabs.wisc.edu/mauve-aligner/mauve-user-guide/
48. Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for
large-scale detection of protein families. Nucleic Acids Res 2002,
30:1575-1584.
49. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL,
Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM,
Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP,
Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA,
Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, et al.:
Genome analysis of multiple pathogenic isolates of Streptococcus
agalactiae: implications for the microbial "pan-genome". Proc Natl Acad
Sci USA 2005, 102:13950-13955.
50. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA,
McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD,
Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics
2007, 23:2947-2948.
51. Felsenstein J: PHYLIP: Phylogeny Inference Package, version 3.6. Seattle,
WA, USA: University of Washington; 2001.
52. Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool
for genome-scale analysis of protein functions and evolution. Nucleic
Acids Res 2000, 28:33-36.
53. Lepore LS, Roelvink PR, Granados RR: Enhancin, the granulosis virus
protein that facilitates nucleopolyhedrovirus (NPV) infections, is a
metalloprotease. J Invertebr Pathol 1996, 68:131-140.
54. Bowen D, Rocheleau TA, Blackburn M, Andreev O, Golubeva E, Bhartia R,
ffrench-Constant RH: Insecticidal toxins from the bacterium Photorhabdus
luminescens. Science 1998, 280:2129-2132.
55. Brussow H, Canchaya C, Hardt WD: Phages and the evolution of bacterial
pathogens: from genomic rearrangements to lysogenic conversion.
Microbiol Mol Biol Rev 2004, 68:560-602.
56. Collyn F, Guy L, Marceau M, Simonet M, Roten CA: Describing ancient
horizontal gene transfers at the nucleotide and gene levels by
comparative pathogenicity island genometrics. Bioinformatics 2006,
22:1072-1079.
57. Collyn F, Billault A, Mullet C, Simonet M, Marceau M: YAPI, a new Yersinia
pseudotuberculosis pathogenicity island. Infect Immun 2004,
72:4784-4790.
58. Howard SL, Gaunt MW, Hinds J, Witney AA, Stabler R, Wren BW:
Application of comparative phylogenomics to study the evolution of
Yersinia enterocolitica and to identify genetic differences relating to
pathogenicity. J Bacteriol 2006, 188:3645-3653.


59. Haller JC, Carlson S, Pederson KJ, Pierson DE: A chromosomally encoded
type III secretion pathway in Yersinia enterocolitica is important in
virulence. Mol Microbiol 2000, 36:1436-1446.
60. Hensel M, Shea JE, Baumler AJ, Gleeson C, Blattner F, Holden DW: Analysis
of the boundaries of Salmonella pathogenicity island 2 and the
corresponding chromosomal region of Escherichia coli K-12. J Bacteriol
1997, 179:1105-1111.
61. Shea JE, Hensel M, Gleeson C, Holden DW: Identification of a virulence
locus encoding a second type III secretion system in Salmonella
typhimurium. Proc Natl Acad Sci USA 1996, 93:2593-2597.
62. Thomson NR, Howard S, Wren BW, Prentice MB: Comparative genome
analyses of the pathogenic Yersiniae based on the genome sequence of
Yersinia enterocolitica strain 8081. Adv Exp Med Biol 2007, 603:2-16.
63. Prentice MB, Cuccui J, Thomson N, Parkhill J, Deery E, Warren MJ:
Cobalamin synthesis in Yersinia enterocolitica 8081. Functional aspects
of a putative metabolic island. Adv Exp Med Biol 2003, 529:43-46.
64. Roth JR, Lawrence JG, Bobik TA: Cobalamin (coenzyme B12): synthesis
and biological significance. Annu Rev Microbiol 1996, 50:137-181.
65. Kofoid E, Rappleye C, Stojiljkovic I, Roth J: The 17-gene ethanolamine (eut)
operon of Salmonella typhimurium encodes five homologues of
carboxysome shell proteins. J Bacteriol 1999, 181:5317-5329.
66. Maier RJ: Use of molecular hydrogen as an energy substrate by human
pathogenic bacteria. Biochem Soc Trans 2005, 33:83-85.
67. Ewing WH, Ross AJ, Brenner DJ, Fanning GR: Yersinia ruckeri sp. nov., the
Redmouth (RM) Bacterium. Int J Syst Bacteriol 1978, 28:37-44.
68. Sekowska A, Denervaud V, Ashida H, Michoud K, Haas D, Yokota A,
Danchin A: Bacterial variations on the methionine salvage pathway. BMC
Microbiol 2004, 4:9.
69. Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE,
Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J,
Bentley SD, Post JC, Ehrlich GD, Hu FZ: Comparative Genomic Analyses of
Seventeen Streptococcus pneumoniae Strains: Insights into the
Pneumococcal Supragenome. J Bacteriol 2007, 189:8186-95.
70. Hogg JS, Hu FZ, Janto B, Boissy R, Hayes J, Keefe R, Post JC, Ehrlich GD:
Characterization and modeling of the Haemophilus influenzae core and
supragenomes based on the complete genomic sequences of Rd and
12 clinical nontypeable strains. Genome Biol 2007, 8:R103.
71. Mathee K, Narasimhan G, Valdes C, Qiu X, Matewish JM, Koehrsen M,
Rokas A, Yandava CN, Engels R, Zeng E, Olavarietta R, Doud M, Smith RS,
Montgomery P, White JR, Godfrey PA, Kodira C, Birren B, Galagan JE, Lory S:
Dynamics of Pseudomonas aeruginosa genome evolution. Proc Natl Acad
Sci USA 2008, 105:3100-3105.
72. Holt K, Parkhill J, Mazzoni C, Roumagnac P, Weill F, Goodhead I, Rance R,
Baker S, Maskell D, Wain J, Dolecek C, Achtman M, Dougan G:
High-throughput sequencing provides insights into genome variation and
evolution in Salmonella Typhi. Nat Genet 2008, 40:987-93.
73. Simmons S, Dibartolo G, Denef V, Goltsman D, Thelen M, Banfield J, Eisen J:
Population Genomic Analysis of Strain Variation in Leptospirillum Group
II Bacteria Involved in Acid Mine Drainage Formation. PLoS Biol 2008,
6:e177.
74. Rasko D, Rosovitz M, Myers G, Mongodin E, Fricke W, Gajer P, Crabtree J,
Sperandio V, Ravel J: The pan-genome structure of Escherichia coli:
comparative genomic analysis of E. coli commensal and pathogenic
isolates. Journal of Bacteriology 2008, 190:6881-93.
75. Read TD, Peterson SN, Tourasse N, Baillie LW, Paulsen IT, Nelson KE,
Tettelin H, Fouts DE, Eisen JA, Gill SR, Holtzapple EK, Okstad OA, Helgason E,
Rilstone J, Wu M, Kolonay JF, Beanan MJ, Dodson RJ, Brinkac LM, Gwinn M,
DeBoy RT, Madupu R, Daugherty SC, Durkin AS, Haft DH, Nelson WC,
Peterson JD, Pop M, Khouri HM, Radune D, et al.: The genome sequence of
Bacillus anthracis Ames and comparison to closely related bacteria.
Nature 2003, 423:81-86.
76. Tettelin H, Masignani V, Cieslewicz MJ, Eisen JA, Peterson S, Wessels MR,
Paulsen IT, Nelson KE, Margarit I, Read TD, Madoff LC, Wolf AM, Beanan MJ,
Brinkac LM, Daugherty SC, DeBoy RT, Durkin AS, Kolonay JF, Madupu R,
Lewis MR, Radune D, Fedorova NB, Scanlan D, Khouri H, Mulligan S,
Carty HA, Cline RT, Van Aken SE, Gill J, Scarselli M, et al.: Complete genome
sequence and comparative genomic analysis of an emerging human
pathogen, serotype V Streptococcus agalactiae. Proc Natl Acad Sci USA
2002, 99:12391-12396.
77. Sprague LD, Neubauer H: Yersinia aleksiciae sp. nov. Int J Syst Evol
Microbiol 2005, 55:831-835.




78. Sprague LD, Scholz HC, Amann S, Busse HJ, Neubauer H: Yersinia similis
sp. nov. Int J Syst Evol Microbiol 2008, 58:952-958.
79. Merhej V, Adekambi T, Pagnier I, Raoult D, Drancourt M: Yersinia
massiliensis sp. nov., isolated from fresh water. Int J Syst Evol Microbiol
2008, 58:779-784.
80. Delpino MV, Marchesini MI, Estein SM, Comerci DJ, Cassataro J, Fossati CA,
Baldi PC: A bile salt hydrolase of Brucella abortus contributes to the
establishment of a successful infection through the oral route in mice.
Infect Immun 2007, 75:299-305.
81. Sherlock O, Vejborg RM, Klemm P: The TibA adhesin/invasin from
enterotoxigenic Escherichia coli is self recognizing and induces bacterial
aggregation and biofilm formation. Infect Immun 2005, 73:1954-1963.
82. Liu B, Pop M: ARDB-Antibiotic Resistance Genes Database. Nucleic Acids
Res 2009, 37:D443-447.
83. Antibiotic Resistance Genes Database. http://ardb.cbcb.umd.edu/
84. Leplae R, Hebrant A, Wodak SJ, Toussaint A: ACLAME: a CLAssification of
Mobile genetic Elements. Nucleic Acids Res 2004, 32:D45-49.
85. Kislyuk A, Lomsadze A, Lapidus AL, Borodovsky M: Frameshift detection in
prokaryotic genomic sequences. Int J Bioinform Res Appl 2009, 5:458-477.
86. Pop M, Phillippy A, Delcher AL, Salzberg SL: Comparative genome
assembly. Brief Bioinform 2004, 5:237-248.
87. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large
sets of protein or nucleotide sequences. Bioinformatics 2006,
22:1658-1659.
88. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C,
Salzberg SL: Versatile and open software for comparing large genomes.
Genome Biol 2004, 5:R12.
89. Stewart AC, Osborne B, Read TD: DIYA: A bacterial annotation pipeline for
any genomics lab. Bioinformatics 2009, 25:962-3.
90. Salzberg SL, Delcher AL, Kasif S, White O: Microbial gene identification
using interpolated Markov models. Nucleic Acids Res 1998, 26:544-548.
91. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of
transfer RNA genes in genomic sequence. Nucleic Acids Res 1997,
25:955-964.
92. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW:
RNAmmer: consistent and rapid annotation of ribosomal RNA genes.
Nucleic Acids Res 2007, 35:3100-3108.
93. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH: UniRef:
comprehensive and non-redundant UniProt reference clusters.
Bioinformatics 2007, 23:1282-1288.
94. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment
search tool. J Mol Biol 1990, 215:403-410.
95. Conserved Domain Database (CDD).
http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd
96. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res 1997, 25:3389-3402.
97. Talavera G, Castresana J: Improvement of phylogenies after removing
divergent and ambiguously aligned blocks from protein sequence
alignments. Syst Biol 2007, 56:564-577.
98. Read TD, Myers GS, Brunham RC, Nelson WC, Paulsen IT, Heidelberg J,
Holtzapple E, Khouri H, Federova NB, Carty HA, Umayam LA, Haft DH,
Peterson J, Beanan MJ, White O, Salzberg SL, Hsia RC, McClarty G, Rank RG,
Bavoil PM, Fraser CM: Genome sequence of Chlamydophila caviae
(Chlamydia psittaci GPIC): examining the role of niche-specific genes in
the evolution of the Chlamydiaceae. Nucleic Acids Res 2003, 31:2134-2147.
99. Rasko DA, Myers GS, Ravel J: Visualization of comparative genomic
analyses by BLAST score ratio. BMC Bioinformatics 2005, 6:2.
100. Bercovier H, Steigerwalt AG, Guiyoule A, Huntley-Carter G, Brenner DJ:
Yersinia aldovae (Formerly Yersinia enterocolitica-Like Group X2): a New
Species of Enterobacteriaceae Isolated from Aquatic Ecosystems. Int J
Syst Bacteriol 1984, 34:166-172.
101. Wauters G, Janssens M, Steigerwalt AG, Brenner DJ: Yersinia mollaretii sp.
nov. and Yersinia bercovieri sp. nov., Formerly Called Yersinia
enterocolitica Biogroups 3A and 3B. Int J Syst Bacteriol 1988, 38:424.
102. Ursing J, Brenner DJ, Bercovier H, Fanning GR, Steigerwalt AG, Brault J,
Mollaret HH: Yersinia frederiksenii: A new species of enterobacteriaceae
composed of rhamnose-positive strains (formerly called atypical Yersinia
enterocolitica or Yersinia enterocolitica-Like). Current Microbiology 1980,
4:213-217.


103. Brenner DJ, Bercovier HH, Ursing J, Alonso JM, Steigerwalt AG, Fanning GR,
Carter GP, Mollaret HH: Yersinia intermedia: A new species of
enterobacteriaceae composed of rhamnose-positive, melibiose-positive,
raffinose-positive strains (formerly called Yersinia enterocolitica or
Yersinia enterocolitica-like). Current Microbiology 1980, 4:207-212.
104. Bercovier H, Ursing J, Brenner DJ, Steigerwalt AG, Fanning GR, Carter GP,
Mollaret HH: Yersinia kristensenii: A new species of enterobacteriaceae
composed of sucrose-negative strains (formerly called atypical Yersinia
enterocolitica or Yersinia enterocolitica-Like). Current Microbiology 1980,
4:219-224.
105. Aleksic S, Steigerwalt AG, Bockemuehl J: Yersinia rohdei sp. nov. isolated
from human and dog feces and surface water. Int J Syst Bacteriol 1987.
106. Carver T, Thomson N, Bleasby A, Berriman M, Parkhill J: DNAPlotter: circular
and linear interactive genome visualization. Bioinformatics 2009,
25:119-120.

doi:10.1186/gb-2010-11-1-r1
Cite this article as: Chen et al.: Genomic characterization of the Yersinia
genus. Genome Biology 2010, 11:R1.

