Comparative genomic analysis of Vibrio parahaemolyticus: serotype conversion and virulence
Permanent Link: http://ufdc.ufl.edu/AA00012383/00001
 Material Information
Title: Comparative genomic analysis of Vibrio parahaemolyticus: serotype conversion and virulence
Series Title: BMC Genomics
Physical Description: Book
Language: English
Creator: Chen, Yuansha
Stine, O. Colin
Badger, Jonathan H.
Gil, Ana I.
Nair, G. Balakrish
Nishibuchi, Mitsuaki
Fouts, Derrick E.
Publisher: BioMed Central
Publication Date: 2011
 Notes
Abstract: Background: Vibrio parahaemolyticus is a common cause of foodborne disease. Beginning in 1996, a more virulent strain having serotype O3:K6 caused major outbreaks in India and other parts of the world, resulting in the emergence of a pandemic. Other serovariants of this strain emerged during its dissemination and, together with the original O3:K6, were termed strains of the pandemic clone. Two genomes, one of this virulent strain and one of a pre-pandemic strain, have been sequenced. We sequenced four additional genomes of V. parahaemolyticus in this study that were isolated from different geographical regions and time points. Comparative genomic analyses of six strains of V. parahaemolyticus isolated from Asia and Peru were performed in order to advance knowledge concerning the evolution of V. parahaemolyticus; specifically, the genetic changes contributing to serotype conversion and virulence. Two pre-pandemic strains and three pandemic strains, isolated from different geographical regions, were serotype O3:K6 and had either the (tdh+, trh-) or the (tdh-, trh+) toxin profile. The sixth strain, a pandemic isolate sequenced in this study, was serotype O4:K68. Results: Genomic analyses revealed that the trh+ and tdh+ strains had different types of pathogenicity islands and mobile elements, as well as major structural differences between the tdh pathogenicity islands of the pre-pandemic and pandemic strains. In addition, single nucleotide polymorphism (SNP) analysis showed that 94% of the SNPs between O3:K6 and O4:K68 pandemic isolates were within a 141 kb region surrounding the O- and K-antigen-encoding gene clusters. The “core” genes of V. parahaemolyticus were also compared to those of V. cholerae and V. vulnificus in order to delineate differences between these three pathogenic species. 
Approximately one-half (49-59%) of each species’ core genes were conserved in all three species, and 14-24% of the core genes were species-specific and in different functional categories. Conclusions: Our data support the idea that the pandemic strains are closely related and that recent South American outbreaks of foodborne disease caused by V. parahaemolyticus are closely linked to outbreaks in India. Serotype conversion from O3:K6 to O4:K68 was likely due to a recombination event involving a region much larger than the O-antigen- and K-antigen-encoding gene clusters. Major differences between pathogenicity islands and mobile elements are also likely driving the evolution of V. parahaemolyticus. In addition, our analyses categorized genes that may be useful in differentiating pathogenic Vibrios at the species level.
 Record Information
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution.
Resource Identifier: doi - 10.1186/1471-2164-12-294
System ID: AA00012383:00001

Full Text

RESEARCH ARTICLE - Open Access

Comparative genomic analysis of Vibrio parahaemolyticus: serotype conversion and virulence

Yuansha Chen (1,2), O Colin Stine (3), Jonathan H Badger (7), Ana I Gil (4), G Balakrish Nair (5), Mitsuaki Nishibuchi (6) and Derrick E Fouts (7)*

*Correspondence: dfouts@jcvi.org
(7) The J. Craig Venter Institute, Rockville, MD, USA
Full list of author information is available at the end of the article

Chen et al. BMC Genomics 2011, 12:294
http://www.biomedcentral.com/1471-2164/12/294

© 2011 Chen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Vibrio parahaemolyticus is a halophilic bacterium that has long been recognized [1] as a human pathogen that causes gastroenteritis and, occasionally, wound infections and sepsis in immunocompromised patients. It is the leading etiologic agent of bacterial foodborne disease in Japan and other parts of Asia, and it is the most common bacterial cause of seafood-associated disease in the United States. Prior to 1996, no specific serotype of V. parahaemolyticus was associated with disease outbreaks, and the bacterium had never been reported to cause a pandemic. However, during that year a major outbreak occurred in India, and >50% of the V. parahaemolyticus strains isolated from patients were serotype O3:K6 [2]. The outbreak also spread rapidly to other countries in Asia, South America, North America, Africa and Europe, resulting in a pandemic affecting tens of thousands of people [2,3]. During its global dissemination, >20 serovariants (including O3:K6, O4:K68, O1:K25, O1:KUT [untypable], and others [2,4,5]) rapidly evolved from the original pandemic O3:K6 strain. The pandemic O3:K6 strain and its serovariants are termed strains of the pandemic clone.

A thermostable direct hemolysin (TDH) is recognized [6] as the most important virulence factor of V. parahaemolyticus, and a TDH-related hemolysin (TRH) is believed to account for the virulence of strains that do not produce TDH. Prior whole-genome sequencing [7,8] of a serotype O3:K6 pandemic isolate designated RIMD2210633 identified two type III secretion systems (T3SS). T3SSI is present in all V. parahaemolyticus isolates examined and is required for the bacterium's cytolytic activity [8], whereas T3SSII is required for enterotoxicity and is located in the tdh-containing pathogenicity island [7,8].

Outbreaks of diarrheal disease caused by V. parahaemolyticus may pose a significant health threat. Thus far, the most affected country (other than India) has been Chile, where >10,000 cases were reported during 2005. This observation suggests that, under appropriate conditions, V. parahaemolyticus may cause large-scale outbreaks comparable to those elicited by V. cholerae.
At the present time, the reasons for the pandemic strains' rapid increase in virulence/prevalence have not been rigorously determined. In addition, the mechanism(s) for rapid serotype conversion warrant further study. Furthermore, it is not clear whether the virulence mechanisms of tdh+ and trh+ strains are similar. Therefore, in order to address some of these questions, we performed rigorous genomic analyses of two pre-pandemic and four pandemic isolates of V. parahaemolyticus.

Results and Discussion

Comparative genomics of V. parahaemolyticus

Prior to this study, an O3:K6 pandemic isolate (strain RIMD2210633) was sequenced to completion [7] and an O3:K6 non-pandemic isolate (strain AQ3810) was sequenced to draft status [9]. In this study, we sequenced four additional isolates of V. parahaemolyticus to at least 8-fold draft coverage, for a total of six clinical isolates: two non-pandemic and four pandemic (Table 1). The two non-pandemic strains, AQ3810 and AQ4037, were isolated in 1983 and 1985, respectively, and both originated from Southeast Asia. Throughout the remainder of this study, we will refer to these two non-pandemic isolates as "pre-pandemic" because they were isolated prior to the documented start of the pandemic. Three of the pandemic isolates were from Southeast Asia, including strain RIMD2210633 in 1996, strain AN5034 in 1998, and strain K5030 in 2005, while the fourth pandemic isolate (strain Peru466) was isolated from Peru in 1996. Therefore, the isolates represented two geographic areas where major outbreaks occurred. In addition, they have different serotypes and toxin profiles. All of the pandemic strains were (tdh+, trh-), and the pre-pandemic strains were either (tdh+, trh-) or (tdh-, trh+), thus representing two potentially different virulence mechanisms. To improve our understanding of the pandemic clone's evolution during its global dissemination, the genome of a Peruvian isolate (strain Peru466) [10] was sequenced and compared to the genomes of Asian isolates collected at different time points during the pandemic. In the later stage of the pandemic, there were fewer cases of infection in South Asia; thus, V. parahaemolyticus isolated during this time seems to be less virulent (Nair, personal observation). Therefore, an isolate (strain K5030) collected in 2005 from India was included and considered a "less virulent" late-stage pandemic isolate in this study. Also, the genome of a never-before-sequenced serotype O4:K68 pandemic isolate (strain AN5034) was characterized in order to advance our understanding of the mechanism for its serotype conversion.

Table 1 Six V. parahaemolyticus strains analyzed during this study
Year | Strain | Source | Serotype | tdh | trh | No. of contigs | Contig N50 (bp) | Max. contig (bp) | Reference
1983† | AQ3810 | Singapore | O3:K6 | + | - | 1037 | 52609 | 295134 | [9]
1985† | AQ4037 | Maldives | O3:K6 | - | + | 164 | 67710 | 241746 | This study
1996 | RIMD2210633 | Thailand | O3:K6 | + | - | 2 | N.A. | 3288558 | [7]
1996 | Peru466 | Peru | O3:K6 | + | - | 149 | 81497 | 273858 | This study
1998 | AN5034 | Bangladesh | O4:K68 | + | - | 54 | 346246 | 1183081 | This study
2005 | K5030 | India | O3:K6 | + | - | 164 | 62978 | 657114 | This study
The contig N50 is the length of the smallest contig in the set that contains the fewest (largest) contigs whose combined length represents at least 50% of the assembly [41]. The contig N50 was calculated from the Celera assembly, not the contigs submitted to GenBank. † Pre-pandemic years. N.A., not applicable.

The pan-genome of the six V. parahaemolyticus strains we examined had 6,616 chromosomal coding genes, and each individual genome (excluding plasmids) had an average of 4,673 coding genes (Figure 1). Three thousand twenty-eight genes, ca. 71% of the coding genes, were present in all the strains (Additional file 1). However, that number may be lower than the actual number because the genomes, except for that of RIMD2210633, were not sequenced to completion. Therefore, some of the open reading frames (ORFs) that bordered contigs may have failed to meet the cut-offs and, consequently, were treated as not present. The four newly sequenced genomes displayed a high degree of synteny with RIMD2210633 (Figure 1). There was very little rearrangement of the genome of the pre-pandemic strain AQ4037 and essentially no rearrangement in the pandemic strains. Because the gaps are not closed for five of the genomes, our report of synteny represents our best estimate.

Superintegron

V. parahaemolyticus harbors a superintegron (SI) on chromosome I. The SI is about 48 kb long and contains ca. 77 ORFs, which is much smaller than the SIs in V. cholerae (120 kb) and in V. vulnificus (138 kb).
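Table 1 reports a contig N50 for each draft assembly, using the definition given in its footnote. A minimal Python sketch of that definition follows; the function name contig_n50 and the toy contig lengths are hypothetical, and this is not the Celera assembler's own code.

```python
def contig_n50(lengths):
    """Return the contig N50 (bp) for a list of contig lengths.

    Sort contigs from largest to smallest, add them up until the running
    total covers at least 50% of the assembly, and report the length of
    the last (smallest) contig that was added.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2*running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical, not taken from any Table 1 assembly)
print(contig_n50([100, 80, 60, 40, 20]))  # prints 80
```

Here the two largest contigs (100 + 80 = 180 bp) are the fewest that cover half of the 300 bp assembly, so the N50 is the smaller of them, 80 bp.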
Most of the ORFs in the SI regions encode hypothetical proteins. The SI integrases were identical in the six V. parahaemolyticus strains examined, but the cassettes in the SI regions of the pre-pandemic strains varied greatly from those of the pandemic strains. For example, only 24 and 28 of the 77 ORFs in the pandemic strains' SI regions were present in those of pre-pandemic strains AQ3810 and AQ4037, respectively. However, the cassettes in the SI regions of the four pandemic strains were nearly identical; i.e., they contained only a few point mutations. The only exception was isolate K5030, which had an additional six hypothetical proteins inserted between the integrase and the rest of the cassettes. These observations indicate that the integrase is active in V. parahaemolyticus and contributes to species evolution. However, the fact that its SI region is smaller than those of other pathogenic Vibrio species, together with the presence of highly conserved cassettes in the pandemic strains, suggests that the genomes of V. parahaemolyticus may be more stable than those of other pathogenic Vibrios.

Pathogenicity islands, prophages, and integrated elements

Only the pandemic strains examined in this study contained the pathogenicity islands previously described [7] for V. parahaemolyticus (Table 2). In addition, we detected various prophages and integrated elements using Phage_Finder [11]. Prophage f237, which has been widely used as a genetic marker for the pandemic clone [12], was present in chromosome I (loci VP1549-1562 in strain RIMD2210633) of all the pandemic strains we examined, but it was absent from the pre-pandemic strains (Figure 2A). However, a prophage similar to f237 was present in pre-pandemic strain AQ4037, in the same location occupied by f237 in the pandemic strains (Figure 2A). In addition to f237, another prophage was identified adjacent to f237 (loci VP1563-1586 in strain RIMD2210633) in all of the pandemic strains and in pre-pandemic strain AQ3810, but it was absent from strain AQ4037. Also, a second copy of that prophage was present in chromosome II of the pandemic strains and in strain AQ3810 (Figure 2B). In addition, a prophage region unique to the serotype O4:K68 pandemic strain AN5034 (AN5034_0425-0489) was identified by Phage_Finder (Table 2 and Additional file 1).

Each of the strains we studied had one or two integrated elements targeting the tmRNA gene (Table 2). For example, Peru466, AN5034, and K5030 had two different integrated elements inserted in tandem into the 3' end of their tmRNA genes. The element closest to the tmRNA gene contained two genes that may influence virulence: a putative cyclic diguanylate phosphodiesterase EAL domain protein and an AraC superfamily putative fimbrial transcriptional activator. The second element was distinguished by the presence of a ribonuclease H-encoding gene. The first element was present in strains AQ3810 and AQ4037, but not in strain RIMD2210633. However, strains AQ3810 and AQ4037 lacked the ribonuclease H-encoding element present in strain RIMD2210633.

Characterization of the pathogenicity islands

Pre-pandemic strain AQ4037 is tdh-, trh+ and urease positive, and its genome sequence revealed a pathogenicity island (hereafter called trh PAI) containing 81 ORFs (Figure 3). Another pathogenicity island (hereafter referred to as tdh PAI) was previously identified in chromosome II of pandemic strain RIMD2210633 and includes loci VPA1310-1398 (Figure 3). The tdh PAI contains a type III secretion system (T3SSII) and two copies of tdh, whereas the trh PAI contains trh, an integrase, transposases, a urease gene cluster, a peptide/nickel transport system, and a T3SS that is different from the one in the tdh PAI (Figure 3). The T3SS in AQ4037's trh PAI is similar to T3SSIIβ in V. parahaemolyticus TH3996, which is related to the T3SS in non-O1, non-O139 strains of V. cholerae [13]. Interestingly, the trh PAI was found in chromosome II of strain TH3996, but it was located in chromosome I of strain AQ4037. This discrepancy in chromosomal location may provide a clue to the pathogenicity island's mobility.

[Figure 1: panel A, protein-translation identity plots of chromosome I and chromosome II of RIMD2210633 against Peru466, AN5034, K5030, AQ3810 and AQ4037 (color scale 30-100% identity); panel B, Venn diagrams of shared and unique proteins]
Figure 1 Whole-genome comparison of six V. parahaemolyticus strains. Panel A: Colored lines denote the percent identities of protein translations, and they are plotted according to their locations in the reference strain (RIMD2210633) and query strains' genomes. Panel B: Venn diagrams indicate the number of shared proteins (black) and unique proteins (red) within a particular relationship for all six V. parahaemolyticus strains.

Table 2 Variable regions in V. parahaemolyticus
# | Region or insertion site relative to RIMD2210633 | Number of ORFs | Function | RIMD (O3:K6, 1996) | Peru466 (O3:K6, 1996) | AN5034 (O4:K68, 1998) | K5030 (O3:K6, 2005) | AQ3810 (O3:K6, 1983) | AQ4037 (O3:K6, 1985)
1 | Between VP0001-0002 (K5030_3039-3061) | 23 | DNA sulfur modification proteins | - | - | - | + | - | -
2 | VP0197-0238 | 42 | O3:K6 LPS/CPS | + | + | - | + | + | +
3 | Replaced VP0197-0238 (AN5034_1849-1901) | 53 | O4:K68 LPS/CPS | - | - | + | - | - | -
4 | Between VP0248-0249 (AN5034_1830-1837) | 8 | Unknown | - | - | + | - | - | -
5 | VP0380-0403 | 24 | Type I restriction endonuclease in tRNAMet-1 | + | + | + | + | - | -
6 | VP0637-0643 | 7 | Integrated element, target tmRNA | + | + | + | + | - | -
7 | Between VP0643-0644 (AN5034_1437-1442) | 6 | Integrated element, target tmRNA | - | + | + | + | + | +
8 | VP1071-1076 | 6 | Unknown, contains phage integrase | + | + | + | + | - | -
9 | VP1077-1087 | 11 | Unknown, contains phage integrase | + | + | + | + | + | -
10 | VP1385-1421 | 37 | Type VI secretion system | + | + | + | + | - | +
11 | VP1549-1562 | 14 | Phage f237 | + | + | + | + | - | -
12 | Replaced VP1549-1562 (AQ4037_2432-2444) | 13 | Phage similar to f237 | - | - | - | - | - | +
13 | VP1563-1586 | 24 | Phage alpha* | + | + | + | + | + | -
14 | Between VP1604-1605 (AN5034_0425-0489) | 65 | Phage | - | - | + | - | - | -
15 | VP1787-1865 | 78 | Superintegron | + | + | + | + | v | v
16 | Between VP1864-1865 (K5030_1808-1814) | 7 | Addition to superintegron, next to integrase | - | - | - | + | - | -
17 | VP1884-1891 | 8 | Unknown | + | + | + | + | - | -
18 | VP1969-1974 | 6 | Fatty acid and amino acid metabolism | + | + | + | + | - | +
19 | VP2131-2144 | 14 | Hypothetical proteins | + | + | + | + | - | -
20 | Between VP2275-2276 (AQ4037_1749-1829) | 81 | trh pathogenicity island | - | - | - | - | - | +
21 | Between VP2638-2639 (AQ4037_1361-1383) | 23 | Hypothetical proteins, contains integrase | - | - | - | - | - | +
22 | VP2900-2910 | 11 | Hypothetical proteins | + | + | + | + | - | -
23 | VPA0434-0440 | 7 | Hypothetical proteins | + | + | + | + |  | 
24 | VPA0889-0912 | 24 | Phage beta* | + | + | + | + | + | -
25 | VPA1254-1270 | 17 | Unknown | + | + | + | + |  | 
26 | VPA1310-1398 | 86 | tdh pathogenicity island | + | + | + | + | + | -
27 | Replaced VPA1310-1313 (AN5034_A0845-A0851) | 7 | Hypothetical proteins | - | + | + | + |  | 
28 | Replaced VPA1310-1398 (AQ4037_A1228-A1253) | 25 | Nutrient uptake and metabolism | - | - | - | - | + | +
In the region column, if genes are absent from RIMD, the gene numbers in one of the other genomes are indicated in parentheses. *These two phages are very similar. v, variable contents in the superintegron.

[Figure 2: linear ORF maps of f237-like prophage regions on chromosome I (A) and alpha-like regions on chromosome II (B) of RIMD2210633, Peru466, AN5034, K5030, AQ3810 and AQ4037; ORFs colored by % identity to RIMD2210633 or by functional role]
Figure 2 Linear illustration of f237-like prophage and juxtaposed regions. Depicted are linear representations of ORFs found on chromosome I of each query genome with similarity to the f237-like prophage in RIMD2210633 (A). Those regions found on chromosome II of query genomes with similarity only to the alpha region of RIMD_f237/α are also shown (B). Query ORFs are colored by protein percent identity to RIMD2210633 proteins (see key). The reference RIMD_f237/α ORFs and query ORFs with no match to RIMD_f237/α ORFs are colored by functional role categories as noted in the boxed key.

Close examination of the tdh PAI region in the six genomes revealed major differences between the pre-pandemic and pandemic strains (Figure 4). The four pandemic strains' tdh PAIs were very similar to one another; however, the entire pathogenicity island was absent from the pre-pandemic, tdh- strain AQ4037. Instead, that strain contained a pre-pandemic-specific region of 18 ORFs important for the uptake and metabolism of carbon sources and other nutrients. In addition, although pre-pandemic strain AQ3810 contained both the tdh PAI and the pre-pandemic nutrient uptake region, it had an inverted tdhS gene (Figure 4). Thus, the pre-pandemic and pandemic strains exhibited major differences in the region upstream of the pathogenicity island and in tdh's orientation. Whether those variations affect the expression of the pathogenicity island's genes, and thereby contribute to differences in virulence between the pre-pandemic and pandemic strains, remains to be determined. The pathogenicity island's organization suggests that an ancestral strain possessing the O3:K6 serotype may have recruited a tdh PAI next to VPA1309, which yielded a transient strain that subsequently lost the pre-pandemic island and gave rise to the pandemic strains. Another possibility is that the tdh+ pre-pandemic strain and the pandemic strains independently recruited the pathogenicity islands into the same location.
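Table 2 is, in effect, a presence/absence matrix over six strains. As a minimal Python sketch (the helper names strains_with and specific_to are hypothetical), a few of its rows can be encoded by hand in the table's column order and queried for regions found in all four pandemic strains but in neither pre-pandemic strain, or the reverse. Only rows whose patterns are also stated in the text are included, so the result illustrates the query rather than reanalyzing the whole table.

```python
# Column order follows Table 2.
STRAINS = ["RIMD2210633", "Peru466", "AN5034", "K5030", "AQ3810", "AQ4037"]
PANDEMIC = {"RIMD2210633", "Peru466", "AN5034", "K5030"}
PRE_PANDEMIC = {"AQ3810", "AQ4037"}

# A hand-copied subset of Table 2: region -> one +/- symbol per strain.
TABLE2_SUBSET = {
    "O3:K6 LPS/CPS (VP0197-0238)": "++-+++",
    "Type VI secretion system (VP1385-1421)": "++++-+",
    "Phage f237 (VP1549-1562)": "++++--",
    "Phage alpha (VP1563-1586)": "+++++-",
    "trh pathogenicity island": "-----+",
    "tdh pathogenicity island (VPA1310-1398)": "+++++-",
    "Nutrient uptake and metabolism (replaces VPA1310-1398)": "----++",
}


def strains_with(pattern):
    """Set of strains marked '+' in a presence/absence pattern."""
    return {strain for strain, mark in zip(STRAINS, pattern) if mark == "+"}


def specific_to(group, table):
    """Regions present in every strain of `group` and in no other strain."""
    return sorted(region for region, pattern in table.items()
                  if strains_with(pattern) == group)


print(specific_to(PANDEMIC, TABLE2_SUBSET))
print(specific_to(PRE_PANDEMIC, TABLE2_SUBSET))
```

Within this subset, the first query returns only phage f237, consistent with its use as a pandemic-clone marker, and the second returns only the nutrient-uptake region that replaces the tdh PAI locus in the pre-pandemic strains.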
The tdh and trh genes are related but vary substantially [14], and many variants of tdh and two trh genes have been described. They have been found in various Vibrio species, and their phylogenetic relationships are not in accordance with the relationships of the host species [14]. Although most of them are in the chromosomes, some of them are present in plasmids, a finding that is consistent with their proposed mobility. Also, the presence of trh and tdh on both chromosomes of V. parahaemolyticus supports the idea that they may have been acquired by lateral gene transfer and may have integrated into the bacterium's genome during independent events. Since V. parahaemolyticus strains that have both tdh and trh have been described [15], it might be worthwhile to sequence those strains' genomes in order to understand the evolutionary history of tdh and trh in their hosts.

[Figure 3: gene maps of the trh PAI on chromosome I of AQ4037 (trh+) and the tdh PAI on chromosome II of RIMD2210633 (tdh+, between VPA1309 and VPA1399)]
Figure 3 Gene clusters in the trh and tdh pathogenicity islands. The upper line indicates chromosome I of strain AQ4037, while the lower line indicates chromosome II of strain RIMD2210633. Thin lines connect homologous genes, and the boxes indicate ORFs. The colors denote various functional categories: brown, integrase; orange, transposase; green, urease-encoding gene; yellow, nickel/peptide transport-encoding gene; red, tdh and trh; pink, toxR; blue, T3SS-related gene; open box, other genes.

[Figure 4: the tdh PAI region between VPA1309 and VPA1399 in the pandemic strains, AQ3810 and AQ4037, showing the pathogenicity island, the nutrient-uptake genes, and the orientation of tdhA and tdhS]
Figure 4 Diagram of the tdh pathogenicity island. Solid black boxes indicate the common border genes, and the homologous genes are connected by dotted lines. Homologous regions are shaded and tdh genes are indicated with arrows showing the direction of transcription. Other ORFs are omitted for simplicity.

TDH and TRH are the only confirmed virulence factors of V. parahaemolyticus; however, their precise roles are not well understood. Genomic sequencing revealed that the genes encoding those toxins are in close proximity to T3SSII, which suggests they may have the same origin as that transport system. Thus, it is tempting to speculate that TDH (and/or TRH) and T3SSII have coordinated activities related to the virulence of V. parahaemolyticus. Some translocon proteins and effectors have been identified for T3SSII [16,17]; however, the putative relationship between TDH, TRH, and T3SSII needs further investigation. Another T3SS (T3SSI) identified in the V. parahaemolyticus genome was demonstrated [8,18] to be required for cytolytic activity. However, T3SSI was conserved in all of the genomes we examined.

Variability within the O3:K6 genetic locus

Since 1996, serotype O3:K6 has predominated among clinical isolates of V. parahaemolyticus; thus, that serotype has been associated with the bacterium's increased virulence. However, strains of O3:K6 serotype had been isolated more than a decade before the pandemic began. It is not clear whether there are variations within the O3:K6 genetic determinants between non-pandemic and pandemic strains that cause subtle structural differences in the O and K antigens that could not be detected by serotyping techniques. Therefore, we examined the O- and K-antigen-encoding gene clusters in the pre-pandemic and pandemic O3:K6 strains for any variations that may explain the observed increase in virulence. The O- and K-antigen-encoding gene clusters are juxtaposed in V. parahaemolyticus O3:K6 [19]. In strain RIMD2210633, they are located at loci VP0190-0238 in chromosome I (position 201,797-253,279) [20,21]. However, that region was conserved in all O3:K6 strains (sharing >99.5% amino acid-encoding identity), which suggests that the serotype of V. parahaemolyticus may not be directly related to the pandemic strains' increased virulence.

Serotype conversion from O3:K6 to O4:K68

In addition to serotype O3:K6, >20 other serotypes of V. parahaemolyticus were detected among pandemic strains [2]. The mechanism for this serotype conversion remains unknown. Since the O- and K-antigen loci are tightly linked on the chromosome [19], a recombination event involving this region would enable rapid conversion of serotypes. Before comparing the O- and K-antigen region between the O3:K6 and O4:K68 serotypes, we first wanted to determine the variability of this region within the O4:K68 serotype, as we did above for the O3:K6 serotype. We compared the O- and K-region of strain AN5034 (O4:K68), AN5034_1842-1901, to the O- and K-region of another O4:K68 strain designated NIID242-200 [21]. Both clusters were nearly identical (9 mismatches in a 63-kb-long region), as observed above for the O3:K6 loci. When we compared the O- and K-antigen regions of strain AN5034 (O4:K68) and strain RIMD2210633 (O3:K6), this region varied substantially, except for the first seven genes, suggesting recombination as a mechanism for serotype conversion. To help identify the scope of the recombination event, we analyzed the distribution of single-nucleotide polymorphisms (SNPs) in the genome of strain AN5034 compared to strain RIMD2210633. Compared to the other pandemic isolates, strain AN5034 had 2,281 SNPs (excluding insertions and deletions), 2,142 (94%) of which clustered in a 141 kb region (position 199,786-341,273 in RIMD2210633 chromosome I, and position 166,252-324,726 in AN5034 contig ACFO01000016.1), extending from 2 kb upstream to 88 kb downstream of the O- and K-antigen-encoding gene clusters (Additional file 2, blue highlight; Figure 5, circle 5). This observation suggests that a recombination event involving a much larger region than the O-antigen and K-antigen loci occurred and gave rise to the new O4:K68 serotype during the pandemic.

Origin of the South American outbreaks

Our results indicated that the pandemic strains are closely related to one another. Strain RIMD2210633 differed from strain K5030 and the Peruvian strain (Peru466) by 70 and 76 SNPs, respectively (Additional file 2). Peru466 was isolated at approximately the same time the first outbreak (which later evolved into a pandemic) was reported in India. Also, its gene content is almost identical to that of a pandemic Asian strain (RIMD2210633) isolated at the same time, which supports the idea that the pandemic strain spread from Asia to South America very soon after it emerged. Considering the geographic proximity, it is likely that the 2005 outbreak in Chile was caused by strains descended from the Peruvian isolates. We speculate that the strains from South American outbreaks are closely related to the strains from Indian outbreaks. Confirming this hypothesis will require genomic sequencing of additional strains from South America, specifically those from Chile.

Fewer cases of disease caused by V. parahaemolyticus were reported from Asia during the later stages of the pandemic than during its early stages (Nair, personal observation); thus, we suspect that the strains isolated during the later stages were not as virulent as those obtained during its early stages. Therefore, we sequenced and analyzed the genome of strain K5030, which was isolated from India almost a decade after the pandemic started, to look for gene deletions and variations. Major gene deletions were not identified in strain K5030; instead, several insertions were detected (Table 2). In addition to the cassettes added to the strain's SI, K5030 had an insertion of 23 ORFs (between VP0001 and VP0002) containing a DNA phosphorothioation (dnd) system, which incorporates sulphur into the DNA backbone [22,23]. Recent evidence supports the lateral transfer of dnd genes among bacterial genomes [24]. Besides the dnd genes, there were also two transposases and one integrase in the insertion. However, it is not clear whether the addition of those ORFs reduced the strain's virulence by altering the structure and function of virulence factors.

Comparative genomic analyses of the three major pathogenic Vibrio species

In order to characterize important genetic differences between V. parahaemolyticus, V. cholerae and V. vulnificus, we compared their core genomes. The V. cholerae group consisted of four completely sequenced genomes of toxigenic, serotype O1 strains: N16961, M66-2, MJ-1236 and O395 [25-27]. The V. vulnificus group contained two completely sequenced genomes of strains CMCP6 and YJ016 [28,29], and the V. parahaemolyticus group consisted of strain RIMD2210633 and the two pre-pandemic isolates (strains AQ3810 and AQ4037). The orthologs from each group were extracted by comparing all members of the group, after which the core genes from each group were compared to each other. Forty-nine to 59% of the core genes in each species were common to all three species (Figure 6). However, 14-24% of the core genes in each group were conserved only in their own group and fell into different COG (Clusters of Orthologous Groups) categories (Figure 6).
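The core-genome comparison just described (a per-species core, then genes shared by all three species versus species-specific genes) can be sketched with Python sets. This assumes, hypothetically, that every gene has already been assigned to an ortholog-cluster ID. The source says only that orthologs were extracted by comparing all members of each group, so the cluster IDs, toy data and helper names below are illustrative and do not reproduce the authors' pipeline.

```python
def core_genome(strain_clusters):
    """Ortholog clusters present in every strain of one species (intersection)."""
    return set.intersection(*(set(c) for c in strain_clusters.values()))


def compare_cores(cores):
    """Per species: (core size, % shared by all species, % species-specific)."""
    shared_by_all = set.intersection(*cores.values())
    summary = {}
    for species, core in cores.items():
        # Union of every other species' core genome.
        others = set().union(*(c for s, c in cores.items() if s != species))
        specific = core - others
        summary[species] = (
            len(core),
            round(100 * len(shared_by_all) / len(core)),
            round(100 * len(specific) / len(core)),
        )
    return shared_by_all, summary


# Hypothetical ortholog-cluster IDs per strain (toy data, not the study's)
vp = core_genome({"RIMD2210633": {"og1", "og2", "og3", "og4", "og5"},
                  "AQ3810": {"og1", "og2", "og3", "og4", "og5", "og6"},
                  "AQ4037": {"og1", "og2", "og3", "og4", "og5"}})
vc = core_genome({"N16961": {"og1", "og2", "og3", "og7", "og8"},
                  "O395": {"og1", "og2", "og3", "og7"}})
vv = core_genome({"CMCP6": {"og1", "og2", "og4", "og9"},
                  "YJ016": {"og1", "og2", "og4", "og9", "og10"}})

shared, summary = compare_cores({"V. parahaemolyticus": vp,
                                 "V. cholerae": vc,
                                 "V. vulnificus": vv})
print(sorted(shared))  # ['og1', 'og2']
print(summary)
```

Clusters such as og3, shared by two species but not the third, count as neither "common to all three" nor "species-specific", which is why the two percentages for a species need not add up to 100.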
These species-specific core genes are likely to be the genes that define the bacteria at the species level. Furthermore, each species had specific genes and transporters required for various metabolic pathways, which indicates that they have different requirements for transporting various ions, nutrients, and other metabolites across their outer membranes. In addition, each species had unique two-component regulatory systems and chemotaxis genes, which indicates that they have specific signal pathways that respond to various environmental stimuli.

Figure 5 Circular illustration of single nucleotide polymorphisms and genome features relative to the reference strain RIMD2210633. Chromosomes I (A) and II (B) are illustrated as circles where each concentric circle represents genomic data and is numbered from the outermost to the innermost circle. Circles 1 and 2 represent RIMD2210633 ORFs and are colored based on function role categories. Circle 3 depicts the location in RIMD2210633 of various genomic features described in this study. Circles 4-8 denote the location of SNPs relative to RIMD2210633 and genomic features for strains Peru466, AN5034, K5030, AQ4037, and AQ3810, respectively. Refer to the key for details on color representations and circle numbers.

The three species also differ in their outer membrane structure and virulence genes (e.g., they had different sets of genes for surface polysaccharide biosynthesis). Furthermore, V. vulnificus has a unique set of genes for Flp pilus synthesis, and toxigenic strains of V. cholerae have genes for toxin co-regulated pilus synthesis, a well-studied virulence factor [27] (the ctx phage is not included as containing genes specific for toxigenic V. cholerae because it is not present in strain M66-2). V. parahaemolyticus possesses genes encoding two unique flagella, in addition to the genes required for the biosynthesis of flagella possessed by all three species. V. cholerae also has two different T3SS-containing islands [7]. A T3SS closely related to T3SSII in V. parahaemolyticus has been reported to be present in non-O1, non-O139 strains of V. cholerae [30,31], but we did not detect it in the toxigenic strains of V. cholerae. However, the possession of T3SS by V. parahaemolyticus and by non-O1, non-O139 strains of V. cholerae suggests that this transport system is required for colonization of their unknown environmental hosts and reservoirs.

Evolution of V. parahaemolyticus

In order to advance our understanding of the relationships between the V. parahaemolyticus strains we characterized, a set of 924 single-copy genes present in all six strains (plus V. vulnificus CMCP6 as an outgroup), taken from the analysis of the three pathogenic Vibrios, was compiled, and a nucleotide maximum-likelihood tree was inferred for the concatenated set of 924 genes (Additional file 3). Our expectation that a resolved phylogeny supported by the majority of genes would be found, and could be interpreted as an estimate of the strains' phylogeny, was not met. This result is similar to that of Boyd et al. 2008 [9], who also found that all of the pandemic strains were intermixed with pre-pandemic strains, but used just three genes in their analysis.
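The data-preparation step described above (keep genes that occur exactly once in every strain plus the V. vulnificus CMCP6 outgroup, then concatenate their alignments before maximum-likelihood inference) can be sketched in Python. The cluster and alignment inputs, gene names, and the helper names single_copy_core and concatenate are hypothetical. The sketch stops before tree inference, which the authors performed with a maximum-likelihood method that is not reproduced here.

```python
def single_copy_core(clusters, strains):
    """IDs of clusters with exactly one member from each listed strain."""
    keep = []
    for cluster_id, members in clusters.items():
        counts = {strain: 0 for strain in strains}
        for strain, _gene in members:
            if strain in counts:
                counts[strain] += 1
        if all(n == 1 for n in counts.values()):
            keep.append(cluster_id)
    return sorted(keep)


def concatenate(alignments, cluster_ids, strains):
    """Join per-gene alignments end to end into one sequence per strain."""
    rows = {strain: [] for strain in strains}
    for cluster_id in cluster_ids:
        aln = alignments[cluster_id]  # strain -> aligned sequence (with gaps)
        if len({len(aln[strain]) for strain in strains}) != 1:
            raise ValueError(f"{cluster_id}: sequences are not aligned")
        for strain in strains:
            rows[strain].append(aln[strain])
    return {strain: "".join(parts) for strain, parts in rows.items()}


# Toy input (hypothetical gene names and sequences)
strains = ["RIMD2210633", "AQ3810", "CMCP6"]  # CMCP6 as the outgroup
clusters = {
    "c1": [("RIMD2210633", "geneA"), ("AQ3810", "geneB"), ("CMCP6", "geneC")],
    "c2": [("RIMD2210633", "geneD"), ("RIMD2210633", "geneE"),   # two copies
           ("AQ3810", "geneF"), ("CMCP6", "geneG")],
    "c3": [("RIMD2210633", "geneH"), ("AQ3810", "geneI")],        # no outgroup
    "c4": [("RIMD2210633", "geneJ"), ("AQ3810", "geneK"), ("CMCP6", "geneL")],
}
alignments = {
    "c1": {"RIMD2210633": "ATG-C", "AQ3810": "ATGAC", "CMCP6": "ATTAC"},
    "c4": {"RIMD2210633": "GG", "AQ3810": "GA", "CMCP6": "GG"},
}
core_ids = single_copy_core(clusters, strains)
print(core_ids)  # ['c1', 'c4']
print(concatenate(alignments, core_ids, strains))
# {'RIMD2210633': 'ATG-CGG', 'AQ3810': 'ATGACGA', 'CMCP6': 'ATTACGG'}
```

Concatenation gives every aligned site equal weight, so a block of clustered SNPs introduced by a single recombination event counts as many independent changes. The authors discuss this distortion in the next paragraph.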
Inspection of the aligned genomes with the SNPs marked in each one (Figure 5) revealed that the SNPs were often clustered, possibly indicative of recombination events. Recombination events involving multiple SNPs would distort the branch lengths and provide a poor estimate of the phylogeny. In retrospect, this result might have been expected, because we have shown that recombination events can involve a large number of variable nucleotides in numerous genes, and each one of those events will distort the distance measure used for phylogenetic trees that assume all mutations are independently acquired. In the sequenced strains, the single recombination event that converted the serotype and replaced the neighboring 90 kb contained 14 times the total number of variable nucleotides observed in the rest of the genome (Additional file 2, Figure 5 circle 5). Thus, if only one of the 110 genes from this region were included with the rest of the genes in the genome to calculate a tree, that one gene, on average, would contribute 11% of the total variation (each gene carries about 14/110 ≈ 0.127 times the variation of the rest of the genome, and 0.127/1.127 ≈ 0.11). In order to calculate accurate trees, each recombination event and independent mutation event must be given equal weight.
Figure 6 Species-specific core genes of V. parahaemolyticus, V. cholerae, and V. vulnificus. Left panel: Each circle represents core genes extracted from selected strains of each species. The Venn diagrams show the number of shared proteins (black) and unique proteins (red) within a particular relationship for all three species. Right panel: Pie chart of species-specific genes. Functional categories are indicated with various colors and letters: J, translation, ribosomal structure and biogenesis; K, transcription; L, DNA replication, recombination and repair; D, cell division and chromosome partitioning; O, posttranslational modification, protein turnover, chaperones; M, cell envelope biogenesis, outer membrane; N, cell motility and secretion; P, inorganic ion transport and metabolism; T, signal transduction mechanisms; C, energy production and conversion; G, carbohydrate transport and metabolism; E, amino acid transport and metabolism; F, nucleotide transport and metabolism; H, coenzyme metabolism; I, lipid metabolism; Q, secondary metabolites biosynthesis, transport and catabolism; S, function unclear.
Chen et al. BMC Genomics 2011, 12:294 http://www.biomedcentral.com/1471-2164/12/294 Page 10 of 13
Conclusion
This study helps to improve our understanding of how V. parahaemolyticus evolved during a pandemic. The results of our multiple genome analyses are consistent with the idea that pandemic strains of V. parahaemolyticus evolved from pre-pandemic strains by numerous deletions and acquisitions of genetic material. Pandemic strains differ from pre-pandemic strains mostly in mobile genetic elements and in the structure of the pathogenicity islands. Serotype conversion to O4:K68 was likely due to a recombination event involving a region much larger than the O-antigen- and K-antigen-encoding gene clusters. In addition, this study revealed that (tdh+ trh-) and (tdh- trh+) strains not only have different toxin genes, but also differ in the structures and locations of their pathogenicity islands. Lateral gene transfer seems to be the major force shaping the virulence of V. parahaemolyticus, as evidenced by the diversity in the locations and nucleotide sequences of the virulence factor-encoding genes. In addition, previous studies had shown that insertion sequences in V. parahaemolyticus could change the genome structure and result in the loss of a major virulence factor [32,33]. However, the pandemic strains we studied are almost monomorphic (except for pathogenicity islands and mobile elements), and the outbreaks in Asia and South America are closely related. During the pandemic's later development, pandemic strains with the O3:K6 serotype (and its serovariants) were no longer prevalent in India, where the pandemic originated (Nair, personal observations). Instead, massive outbreaks were reported in Chile. This observation suggests that the virulent clone that spread to South America during the pandemic's early stage has persisted in that area and continues to cause outbreaks. There was no loss of known virulence genes in the later-stage pandemic (K5030) isolate from South Asia; therefore, its virulence status needs further evaluation. Genetic changes in the etiologic agent may not be the only factor leading to V. parahaemolyticus-mediated pandemics (e.g., optimal environmental conditions may enable pandemic strains to flourish in their reservoirs).
For example, an outbreak of foodborne diarrheal disease caused by V. parahaemolyticus was reported on an Alaskan cruise ship during 2004, and the source of the infection was V. parahaemolyticus-contaminated oysters harvested following warm weather in Alaska, where V. parahaemolyticus had not been previously isolated [34]. In addition, although there are major genetic differences between pre-pandemic and pandemic strains of V. parahaemolyticus, and they may, to some extent, contribute to the pandemic strains' increased virulence, a pathogenicity island (which contains tdh and T3SS) is also present in the pre-pandemic strain AQ3810. Thus, it is likely that the recent V. parahaemolyticus-mediated pandemic resulted from the convergence of genetic changes in the etiologic agent and the presence of optimal conditions for survival and growth in its natural reservoirs.
Materials and methods
Strain isolation and verification
The V. parahaemolyticus strains were isolated on TCBS (thiosulfate, citrate, bile salts, and sucrose) agar medium, followed by their presumptive identification with a multitest medium [35]. The strains' identities were confirmed by a species-specific toxR assay [29]; a commercially available V. parahaemolyticus antiserum kit (Toshiba Kagaku Kogyo Co., Ltd., Tokyo, Japan) was employed for serological typing; and tdh and trh were identified by PCR [36]. Strains were cultured overnight in Luria-Bertani broth, and DNA was obtained by lysing the bacteria with proteinase K, followed by DNA extraction and purification with a Qiagen Maxi Kit (Valencia, CA).
Genome sequencing
The genomes of V. parahaemolyticus strains AN5034, AQ4037, K5030, and Peru466 were sequenced by the Sanger whole-genome random shotgun method [37].
Briefly, one small-insert plasmid library (3-4 kb) and one medium-insert plasmid library (10-12 kb) were constructed by random nebulization and cloning of genomic DNA. During the initial random-sequencing phase, 8-fold sequence coverage was achieved, with the small- and medium-size libraries sequenced to yield 5-fold and 3-fold coverage, respectively. The sequences were assembled using the Celera Assembler [38], and ordered scaffolds were generated by using NUCMER [39] to align the contigs to the genome of V. parahaemolyticus RIMD2210633, followed by hierarchical scaffolding with BAMBUS [40]. The contig N50 was determined as described [41]. An initial set of open-reading frames (ORFs) was identified using GLIMMER [42], and ORFs shorter than 90 bp (and some with overlaps) were eliminated. A region containing the likely origin of replication was identified, and base pair 1 was designated adjacent to the dnaA gene located in that region [43]. The ORFs were searched against a nonredundant protein database, and the ORF predictions and gene family identifications were done as previously described [27]. Two sets of hidden Markov models (HMMs) were used to determine ORF membership in families and superfamilies. These included 721 HMMs from Pfam v22.0 and 631 HMMs from the TIGR ortholog resource. A transmembrane hidden Markov model (TMHMM) [44] was used to identify membrane-spanning domains in proteins, and putative functional role categories were assigned internally as previously described [45]. The nucleotide sequences and the corresponding automated annotations for the genomes of V. parahaemolyticus strains AN5034, AQ4037, K5030, and Peru466 were submitted to NCBI, with accession
numbers [GenBank: ACFO00000000, GenBank: ACFN00000000, GenBank: ACKB00000000, and GenBank: ACFM00000000], respectively.
Comparative genomics
The database and cut-offs mentioned above were used, as previously described [37], to (i) produce an ortholog match table, (ii) construct a Venn diagram, and (iii) bin the relationships within the Venn diagram. Synteny plots using PROMER [39] were computed as previously described [37]. SNPs were identified by mapping chromosomal contigs to the complete reference genome of RIMD2210633 using NUCMER [39] with default settings, and were displayed using the SHOW-SNPS tool with the -C (do not report SNPs from alignments with an ambiguous mapping) and -I (do not report indels) options. SHOW-SNPS is part of the MUMmer 3 distribution (http://mummer.sourceforge.net/). DNA maximum-likelihood trees were created (using PAUP* 4.0b) for each of the 924 entries in the above-mentioned ortholog table that had orthologs for V. vulnificus CMCP6 and the six V. parahaemolyticus strains. In order to ensure proper alignment of the coding regions, the trees were based on DNA alignments back-aligned from the protein alignments.
Additional material
Additional file 1: Vibrio parahaemolyticus Ortholog Match Table. BLAST-based ortholog match table of V. parahaemolyticus strains.
Additional file 2: Vibrio parahaemolyticus Single Nucleotide Polymorphisms. Single nucleotide polymorphisms of V. parahaemolyticus genomes relative to reference strain RIMD2210633.
Additional file 3: Relationships of V. parahaemolyticus strains. DNA maximum-likelihood tree based on 924 orthologs of the six V. parahaemolyticus strains. V. vulnificus strain CMCP6 was used as an outgroup.
Acknowledgements
We are grateful to the J. Craig Venter Institute's sequencing, bioinformatics and IT departments for supporting the infrastructure required to determine the genomes' sequences and annotation. We especially thank A. Scott Durkin for contributing to the genomes' annotation. We also thank Arnold S.
Kreger and Stephanie Zafonte for reading and editing the manuscript. This project was supported by (i) the National Institute of Allergy and Infectious Diseases (National Institutes of Health, Department of Health and Human Services), under contract number N01-AI-30071, and (ii) the University of Maryland Clinical Research Unit of the Food and Waterborne Diseases Integrated Research Network (funded by the National Institute of Allergy and Infectious Diseases), under contract number N01-AI-40014.
Author details
1 Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA. 2 Department of Pathology, University of Florida, Gainesville, FL, USA. 3 School of Medicine, University of Maryland, Baltimore, MD, USA. 4 Instituto de Investigación Nutricional, Peru. 5 National Institute of Cholera and Enteric Diseases, Kolkata, India. 6 Kyoto University, Kyoto, Japan. 7 The J. Craig Venter Institute, Rockville, MD, USA.
Authors' contributions
OCS, RH, GBN, AIG, MN, and DEF conceived and organized the project; GBN, AIG, and MN provided strains; OCS and YC isolated DNA; DEF organized the sequencing studies; DEF, JHB, and YC analyzed the genomic data; and YC, DEF, JHB, and OCS prepared the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Received: 21 December 2010 Accepted: 6 June 2011 Published: 6 June 2011
References
1. Fujino L, Okuno Y, Nakada D, Aoyama A, Fukai K, Mukai T, Uebo T: On the bacteriological examination of shirasu food poisoning. Med J Osaka Univ 1953, 4:299-304.
2. Nair GB, Ramamurthy T, Bhattacharya SK, Dutta B, Takeda Y, Sack DA: Global dissemination of Vibrio parahaemolyticus serotype O3:K6 and its serovariants. Clin Microbiol Rev 2007, 20(1):39-48.
3. Nair GB, Hormazabal JC: The Vibrio parahaemolyticus pandemic. Rev Chilena Infectol 2005, 22(2):125-130.
4. Chowdhury NR, Chakraborty S, Ramamurthy T, Nishibuchi M, Yamasaki S, Takeda Y, Nair GB: Molecular evidence of clonal Vibrio parahaemolyticus pandemic strains. Emerg Infect Dis 2000, 6(6):631-636.
5. Chowdhury NR, Stine OC, Morris JG, Nair GB: Assessment of evolution of pandemic Vibrio parahaemolyticus by multilocus sequence typing. J Clin Microbiol 2004, 42(3):1280-1282.
6. Nishibuchi M, Taniguchi T, Misawa T, Khaeomanee-Iam V, Honda T, Miwatani T: Cloning and nucleotide sequence of the gene (trh) encoding the hemolysin related to the thermostable direct hemolysin of Vibrio parahaemolyticus. Infect Immun 1989, 57(9):2691-2697.
7. Makino K, Oshima K, Kurokawa K, Yokoyama K, Uda T, Tagomori K, Iijima Y, Najima M, Nakano M, Yamashita A, et al: Genome sequence of Vibrio parahaemolyticus: a pathogenic mechanism distinct from that of V cholerae. Lancet 2003, 361(9359):743-749.
8. Park KS, Ono T, Rokuda M, Jang MH, Okada K, Iida T, Honda T: Functional characterization of two type III secretion systems of Vibrio parahaemolyticus. Infect Immun 2004, 72(11):6659-6665.
9. Boyd EF, Cohen AL, Naughton LM, Ussery DW, Binnewies TT, Stine OC, Parent MA: Molecular analysis of the emergence of pandemic Vibrio parahaemolyticus. BMC Microbiol 2008, 8:110.
10. Gil AI, Miranda H, Lanata CF, Prada A, Hall ER, Barreno CM, Nusrin S, Bhuiyan NA, Sack DA, Nair GB: O3:K6 serotype of Vibrio parahaemolyticus identical to the global pandemic clone associated with diarrhea in Peru. Int J Infect Dis 2007, 11(4):324-328.
11. Fouts DE: Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res 2006, 34(20):5839-5851.
12. Nasu H, Iida T, Sugahara T, Yamaichi Y, Park KS, Yokoyama K, Makino K, Shinagawa H, Honda T: A filamentous phage associated with recent pandemic Vibrio parahaemolyticus O3:K6 strains. J Clin Microbiol 2000, 38(6):2156-2161.
13. Okada N, Iida T, Park KS, Goto N, Yasunaga T, Hiyoshi H, Matsuda S, Kodama T, Honda T: Identification and characterization of a novel type III secretion system in trh-positive Vibrio parahaemolyticus strain TH3996 reveal genetic lineage and diversity of pathogenic machinery beyond the species level. Infect Immun 2009, 77(2):904-913.
14. Nishibuchi M, Kaper JB: Thermostable direct hemolysin gene of Vibrio parahaemolyticus: a virulence gene acquired by a marine bacterium. Infect Immun 1995, 63(6):2093-2099.
15. Shirai H, Ito H, Hirayama T, Nakamoto Y, Nakabayashi N, Kumagai K, Takeda Y, Nishibuchi M: Molecular epidemiologic evidence for association of thermostable direct hemolysin (TDH) and TDH-related hemolysin of Vibrio parahaemolyticus with gastroenteritis. Infect Immun 1990, 58(11):3568-3573.
16. Kodama T, Hiyoshi H, Gotoh K, Akeda Y, Matsuda S, Park KS, Cantarelli VV, Iida T, Honda T: Identification of two translocon proteins of Vibrio parahaemolyticus type III secretion system 2. Infect Immun 2008, 76(9):4282-4289.
17. Kodama T, Rokuda M, Park KS, Cantarelli VV, Matsuda S, Iida T, Honda T: Identification and characterization of VopT, a novel ADP-ribosyltransferase effector protein secreted via the Vibrio parahaemolyticus type III secretion system 2. Cell Microbiol 2007, 9(11):2598-2609.
18. Ono T, Park KS, Ueta M, Iida T, Honda T: Identification of proteins secreted via Vibrio parahaemolyticus type III secretion system 1. Infect Immun 2006, 74(2):1032-1042.
19. Chen Y, Dai J, Morris JG, Johnson JA: Genetic analysis of the capsule polysaccharide (K antigen) and exopolysaccharide genes in pandemic Vibrio parahaemolyticus O3:K6. 2010, 10:274.
20. Chen Y, Dai J, Johnson JA, Morris JG: Genetic determinant of the capsule polysaccharide (K antigen) in pandemic Vibrio parahaemolyticus O3:K6. 2009.
21. Okura M, Osawa R, Tokunaga A, Morita M, Arakawa E, Watanabe H: Genetic analyses of the putative O and K antigen gene clusters of pandemic Vibrio parahaemolyticus. Microbiol Immunol 2008, 52(5):251-264.
22. Zhou X, He X, Liang J, Li A, Xu T, Kieser T, Helmann JD, Deng Z: A novel DNA modification by sulphur. Mol Microbiol 2005, 57(5):1428-1438.
23. Zhou X, Deng Z, Firmin JL, Hopwood DA, Kieser T: Site-specific degradation of Streptomyces lividans DNA during electrophoresis in buffers contaminated with ferrous iron. Nucleic Acids Res 1988, 16(10):4341-4352.
24. Wang L, Chen S, Vergin KL, Giovannoni SJ, Chan SW, Demott MS, Taghizadeh K, Cordero OX, Cutler M, Timberlake S, et al: DNA phosphorothioation is widespread and quantized in bacterial genomes. Proc Natl Acad Sci USA 2011.
25. Chun J, Grim CJ, Hasan NA, Lee JH, Choi SY, Haley BJ, Taviani E, Jeon YS, Kim DW, Lee JH, et al: Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae. Proc Natl Acad Sci USA 2009, 106(36):15442-15447.
26. Feng L, Reeves PR, Lan R, Ren Y, Gao C, Zhou Z, Ren Y, Cheng J, Wang W, Wang J, et al: A recalibrated molecular clock and independent origins for the cholera pandemic clones. PLoS One 2008, 3(12):e4053.
27. Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Umayam L, et al: DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 2000, 406(6795):477-483.
28. Chen CY, Wu KM, Chang YC, Chang CH, Tsai HC, Liao TL, Liu YM, Chen HJ, Shen AB, Li JC, et al: Comparative genome analysis of Vibrio vulnificus, a marine pathogen. Genome Res 2003, 13(12):2577-2587.
29. Kim YB, Okuda J, Matsumoto C, Takahashi N, Hashimoto S, Nishibuchi M: Identification of Vibrio parahaemolyticus strains at the species level by PCR targeted to the toxR gene. J Clin Microbiol 1999, 37(4):1173-1177.
30. Chen Y, Johnson JA, Pusch GD, Morris JG Jr, Stine OC: The genome of non-O1 Vibrio cholerae NRT36S demonstrates the presence of pathogenic mechanisms that are distinct from those of O1 Vibrio cholerae. Infect Immun 2007, 75(5):2645-2647.
31. Dziejman M, Serruto D, Tam VC, Sturtevant D, Diraphat P, Faruque SM, Rahman MH, Heidelberg JF, Decker J, Li L, et al: Genomic characterization of non-O1, non-O139 Vibrio cholerae reveals genes for a type III secretion system. Proc Natl Acad Sci USA 2005, 102(9):3465-3470.
32. Kamruzzaman M, Bhoopong P, Vuddhakul V, Nishibuchi M: Detection of a functional insertion sequence responsible for deletion of the thermostable direct hemolysin gene (tdh) in Vibrio parahaemolyticus. Gene 2008, 421(1-2):67-73.
33. Kamruzzaman M, Nishibuchi M: Detection and characterization of a functional insertion sequence, ISVpa2, in Vibrio parahaemolyticus. Gene 2008, 409(1-2):92-99.
34. The Great Convergence: People, Animals, and the Environment. [http://www.cdc.gov/nczved/framework/features/convergence.html].
35. Bag PK, Nandi S, Bhadra RK, Ramamurthy T, Bhattacharya SK, Nishibuchi M, Hamabata T, Yamasaki S, Takeda Y, Nair GB: Clonal diversity among recently emerged strains of Vibrio parahaemolyticus O3:K6 associated with pandemic spread. J Clin Microbiol 1999, 37(7):2354-2357.
36. Tada J, Ohashi T, Nishimura N, Shirasaki Y, Ozaki H, Fukushima S, Takano J, Nishibuchi M, Takeda Y: Detection of the thermostable direct hemolysin gene (tdh) and the thermostable direct hemolysin-related hemolysin gene (trh) of Vibrio parahaemolyticus by polymerase chain reaction. Mol Cell Probes 1992, 6(6):477-487.
37. Fouts DE, Mongodin EF, Mandrell RE, Miller WG, Rasko DA, Ravel J, Brinkac LM, DeBoy RT, Parker CT, Daugherty SC, et al: Major structural differences and novel potential virulence mechanisms from the genomes of multiple Campylobacter species. PLoS Biol 2005, 3(1):e15.
38. Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, et al: A whole-genome assembly of Drosophila. Science 2000, 287(5461):2196-2204.
39. Delcher AL, Phillippy A, Carlton J, Salzberg SL: Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 2002, 30(11):2478-2483.
40. Popovic T, Bopp C, Olsvik O, Wachsmuth K: Epidemiologic application of a standardized ribotype scheme for Vibrio cholerae O1. J Clin Microbiol 1993, 31(9):2474-2482.
41. Miller JR, Koren S, Sutton G: Assembly algorithms for next-generation sequencing data. Genomics 2010, 95(6):315-327.
42. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL: Improved microbial gene identification with GLIMMER. Nucleic Acids Res 1999, 27(23):4636-4641.
43. Bramhill D, Kornberg A: Duplex opening by dnaA protein at novel sequences in initiation of replication at the origin of the E. coli chromosome. Cell 1988, 52(5):743-755.
44. Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001, 305(3):567-580.
45. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995, 269(5223):496-512.
doi:10.1186/1471-2164-12-294
Cite this article as: Chen et al.: Comparative genomic analysis of Vibrio parahaemolyticus: serotype conversion and virulence. BMC Genomics 2011, 12:294.


Research article
Comparative genomic analysis of Vibrio parahaemolyticus: serotype conversion and virulence
Yuansha Chen (1, 2; yuansha.chen@pathology.ufl.edu)
O. Colin Stine (3; ostin001@umaryland.edu)
Jonathan H. Badger (7; jbadger@jcvi.org)
Ana I. Gil (4; agil@iin.sld.pe)
G. Balakrish Nair (5; gbnair_2000@yahoo.com)
Mitsuaki Nishibuchi (6; nisibuti@cseas.kyoto-u.ac.jp)
Derrick E. Fouts (corresponding author; dfouts@jcvi.org)
1 Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
2 Department of Pathology, University of Florida, Gainesville, FL, USA
3 School of Medicine, University of Maryland, Baltimore, MD, USA
4 Instituto de Investigación Nutricional, Peru
5 National Institute of Cholera and Enteric Diseases, Kolkata, India
6 Kyoto University, Kyoto, Japan
7 The J. Craig Venter Institute, Rockville, MD, USA
BMC Genomics 2011, 12(1):294 (ISSN 1471-2164). http://www.biomedcentral.com/1471-2164/12/294
PMID: 21645368; doi:10.1186/1471-2164-12-294
Received: 21 December 2010; Accepted: 6 June 2011; Published: 6 June 2011
© 2011 Chen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background
Vibrio parahaemolyticus is a common cause of foodborne disease. Beginning in 1996, a more virulent strain having serotype O3:K6 caused major outbreaks in India and other parts of the world, resulting in the emergence of a pandemic. Other serovariants of this strain emerged during its dissemination and, together with the original O3:K6, were termed strains of the pandemic clone. Two genomes, one of this virulent strain and one of a pre-pandemic strain, have been sequenced. In this study, we sequenced four additional genomes of V. parahaemolyticus that were isolated from different geographical regions and at different time points. Comparative genomic analyses of six strains of V. parahaemolyticus isolated from Asia and Peru were performed in order to advance knowledge concerning the evolution of V. parahaemolyticus; specifically, the genetic changes contributing to serotype conversion and virulence. Two pre-pandemic strains and three pandemic strains, isolated from different geographical regions, were serotype O3:K6 and had either toxin profile (tdh+, trh-) or (tdh-, trh+). The sixth strain, a pandemic strain sequenced in this study, was serotype O4:K68.
Results
Genomic analyses revealed that the trh+ and tdh+ strains had different types of pathogenicity islands and mobile elements as well as major structural differences between the tdh pathogenicity islands of the pre-pandemic and pandemic strains. In addition, the results of single nucleotide polymorphism (SNP) analysis showed that 94% of the SNPs between O3:K6 and O4:K68 pandemic isolates were within a 141 kb region surrounding the O- and K-antigen-encoding gene clusters. The "core" genes of V. parahaemolyticus were also compared to those of V. cholerae and V. vulnificus, in order to delineate differences between these three pathogenic species. Approximately one-half (49-59%) of each species' core genes were conserved in all three species, and 14-24% of the core genes were species-specific and in different functional categories.
Conclusions
Our data support the idea that the pandemic strains are closely related and that recent South American outbreaks of foodborne disease caused by V. parahaemolyticus are closely linked to outbreaks in India. Serotype conversion from O3:K6 to O4:K68 was likely due to a recombination event involving a region much larger than the O-antigen- and K-antigen-encoding gene clusters. Major differences between pathogenicity islands and mobile elements are also likely driving the evolution of V. parahaemolyticus. In addition, our analyses categorized genes that may be useful in differentiating pathogenic Vibrios at the species level.
Background
Vibrio parahaemolyticus is a halophilic bacterium which has long been recognized [1] as a human pathogen that causes gastroenteritis and, occasionally, wound infections and sepsis in immunocompromised patients. It is the leading etiologic agent for bacterial foodborne disease in Japan and other parts of Asia, and it is the most common bacterial cause of seafood-associated disease in the United States. Prior to 1996, there was no specific serotype of V. parahaemolyticus that was associated with disease outbreaks, and the bacterium had never been reported to cause a pandemic. However, during that year a major outbreak occurred in India, and >50% of the V. parahaemolyticus strains isolated from patients were serotype O3:K6 [2]. The outbreak then spread rapidly to other countries in Asia, South America, North America, Africa and Europe, resulting in a pandemic affecting tens of thousands of people [2,3]. During its global dissemination, >20 serovariants (including O3:K6, O4:K68, O1:K25, O1:KUT [untypable], and others) rapidly evolved from the original pandemic O3:K6 strain [2,4,5]. The pandemic O3:K6 and its serovariants are termed strains of the pandemic clone.
A thermostable direct hemolysin (TDH) is recognized [6] as the most important virulence factor of V. parahaemolyticus, and a TDH-related hemolysin (TRH) is believed to account for the virulence of strains that do not produce TDH. Prior whole-genome sequencing [7,8] of a serotype O3:K6 pandemic isolate designated RIMD2210633 identified two type III secretion systems (T3SS). T3SSI is present in all V. parahaemolyticus isolates examined and is required for the bacterium's cytolytic activity [8], whereas T3SSII is required for enterotoxicity and is located in the tdh-containing pathogenicity island [7,8].
Outbreaks of diarrheal disease caused by V. parahaemolyticus may pose a significant health threat. Thus far, the most affected country (other than India) has been Chile, where > 10,000 cases were reported during 2005. This observation suggests that, under appropriate conditions, V. parahaemolyticus may cause large-scale outbreaks comparable to those elicited by V. cholerae. At the present time, the reasons for the pandemic strains' rapid increase in virulence/prevalence have not been rigorously determined. In addition, the mechanism(s) for rapid serotype conversion warrant further study. Furthermore, it is not clear whether the virulence mechanisms of tdh+ and trh+ strains are similar. Therefore, in order to address some of these questions, we performed rigorous genomic analyses of two pre-pandemic and four pandemic isolates of V. parahaemolyticus.
Results and Discussion
Comparative genomics of V. parahaemolyticus
Prior to this study, an O3:K6 pandemic isolate (strain RIMD2210633) was sequenced to completion [7], and an O3:K6 non-pandemic isolate (strain AQ3810) was sequenced to draft status [9]. In this study, we sequenced four additional isolates of V. parahaemolyticus to at least 8-fold draft coverage, for a total of six clinical isolates: two non-pandemic and four pandemic (Table 1). The two non-pandemic strains, AQ3810 and AQ4037, were isolated in 1983 and 1985, respectively, and both originated from Southeast Asia. Throughout the remainder of this study, we refer to these two non-pandemic isolates as "pre-pandemic" because they were isolated prior to the documented start of the pandemic. Three of the pandemic isolates were from Southeast Asia, including strain RIMD2210633 in 1996, strain AN5034 in 1998, and strain K5030 in 2005, while the fourth pandemic isolate (strain Peru466) was isolated in Peru in 1996. Therefore, the isolates represented two geographic areas where major outbreaks occurred. In addition, they have different serotypes and toxin profiles. All of the pandemic strains were (tdh+ trh-), and the pre-pandemic strains were either (tdh+ trh-) or (tdh- trh+), thus representing two potentially different virulence mechanisms. To improve our understanding of the pandemic clone's evolution during its global dissemination, the genome of a Peruvian isolate (strain Peru466) [10] was sequenced and compared to the genomes of Asian isolates collected at different time points during the pandemic. In the later stage of the pandemic, there were fewer cases of infection in South Asia; thus, V. parahaemolyticus isolated during this time seems to be less virulent (Nair, personal observation). Therefore, an isolate (strain K5030) collected in 2005 from India was included and considered a "less virulent" late-stage pandemic isolate in this study.
Also, the genome of a never-before-sequenced serotype O4:K68 pandemic isolate (strain AN5034) was characterized in order to advance our understanding of the mechanism for its serotype conversion.
Table 1 Six V. parahaemolyticus strains analyzed during this study

| Year | Strain | Source | Serotype | tdh | trh | # contigs | Contig N50†§ (bp) | Max. contig (bp) | Reference |
|---|---|---|---|---|---|---|---|---|---|
| 1983‡ | AQ3810 | Singapore | O3:K6 | + | - | 1037 | 52609 | 295134 | [9] |
| 1985‡ | AQ4037 | Maldives | O3:K6 | - | + | 164 | 67710 | 241746 | This study |
| 1996 | RIMD2210633 | Thailand | O3:K6 | + | - | 2 | N.A. | 3288558 | [7] |
| 1996 | Peru466 | Peru | O3:K6 | + | - | 149 | 81497 | 273858 | This study |
| 1998 | AN5034 | Bangladesh | O4:K68 | + | - | 54 | 346246 | 1183081 | This study |
| 2005 | K5030 | India | O3:K6 | + | - | 164 | 62978 | 657114 | This study |

† The contig N50 is the length of the smallest contig in the set that contains the fewest (largest) contigs whose combined length represents at least 50% of the assembly [41].
§ The contig N50 was calculated from the Celera assembly, not the contigs submitted to GenBank.
‡ Pre-pandemic years.
N.A., not applicable.
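The N50 definition in the first footnote can be made concrete with a short Python sketch: sort the contig lengths from longest to shortest and return the length of the contig at which the running total first reaches half of the assembly size. This is a minimal illustration of the definition only, not the procedure of reference [41], and the contig lengths in the example are invented rather than taken from Table 1.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the contig N50: the length of the shortest contig in the smallest
    set of longest contigs whose combined length is >= 50% of the assembly."""
    lengths = sorted((int(x) for x in contig_lengths), reverse=True)
    if not lengths or any(x <= 0 for x in lengths):
        raise ValueError("contig lengths must be a non-empty list of positive integers")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid floating-point rounding.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total reaches the full assembly")


if __name__ == "__main__":
    # Hypothetical assembly of five contigs totalling 1,000 bp.
    print(n50([400, 250, 150, 120, 80]))  # 250, because 400 + 250 = 650 >= 500
```

Because the statistic depends only on the sorted lengths, an assembly with many short contigs can still have a high N50 if a few long contigs cover half of the sequence, which is why the table reports the contig count and maximum contig length alongside it.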
The pan genome of the six V. parahaemolyticus strains we examined had 6,616 chromosomal coding genes, and each individual genome (excluding plasmids) had an average of 4,673 coding genes (Figure 1). Three thousand twenty-eight genes, ca. 71% of the coding genes, were present in all the strains (Additional file 1). However, that number may be lower than the actual number because the genomes, except for the genome of RIMD2210633, were not sequenced to completion. Therefore, some of the open-reading frames (ORFs) that bordered contigs may have failed to meet the cut-offs and, consequently, were treated as not present. The four newly sequenced genomes displayed a high degree of synteny with RIMD2210633 (Figure 1). There was very little rearrangement of the genome of the pre-pandemic strain AQ4037 and essentially no rearrangement in the pandemic strains. Because the gaps are not closed for five of the genomes, our report of synteny represents our best estimate.
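The core and pan genome counts above, and the Venn bins of Figure 1B, reduce to set operations on an ortholog presence/absence table such as Additional file 1. The Python sketch below assumes a hypothetical table that maps each ortholog cluster ID to the set of strains in which it was detected; it does not reproduce the BLAST-based cut-offs used in the study. The core genome is taken as the clusters present in every strain, the pan genome as the clusters present in at least one strain, and each Venn bin counts clusters sharing exactly the same combination of strains.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple


def summarize(
    presence: Dict[str, Set[str]], strains: Set[str]
) -> Tuple[int, int, Counter]:
    """Return (core size, pan size, Venn bins) for an ortholog presence table.

    presence maps a (hypothetical) cluster ID to the set of strains carrying it.
    Venn bins count clusters by the exact combination of strains they occur in.
    """
    target: FrozenSet[str] = frozenset(strains)
    occupied = {c: frozenset(s) & target for c, s in presence.items()}
    occupied = {c: s for c, s in occupied.items() if s}  # ignore empty rows
    core = sum(1 for s in occupied.values() if s == target)
    pan = len(occupied)
    bins = Counter(occupied.values())
    return core, pan, bins


if __name__ == "__main__":
    strains = {"RIMD2210633", "AQ3810", "AQ4037"}
    table = {  # hypothetical ortholog clusters, not data from the study
        "c1": {"RIMD2210633", "AQ3810", "AQ4037"},
        "c2": {"RIMD2210633", "AQ3810", "AQ4037"},
        "c3": {"RIMD2210633"},
        "c4": {"AQ3810", "AQ4037"},
    }
    core, pan, bins = summarize(table, strains)
    print(core, pan)  # 2 4
    for combo, n in sorted(bins.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
        print(sorted(combo), n)
```

With draft genomes, any ORF lost at a contig boundary simply drops a strain from that cluster's set, which moves the cluster out of the core bin; this is the undercounting effect described in the paragraph above.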
Figure 1 Whole-genome comparison of six V. parahaemolyticus strains. Panel A: Colored lines denote the percent identities of protein translations, and they are plotted according to their locations in the genomes of the reference strain (RIMD2210633) and the query strain. Panel B: Venn diagrams indicate the number of shared proteins (black) and unique proteins (red) within a particular relationship for all six V. parahaemolyticus strains.
Additional file 1: Vibrio parahaemolyticus Ortholog Match Table. BLAST-based ortholog match table of V. parahaemolyticus strains (1471-2164-12-294-S1.XLS).
Super integron
V. parahaemolyticus harbors a super integron (SI) on chromosome I. The SI is about 48 kb long and contains ca. 77 ORFs, making it much smaller than the SIs in V. cholerae (120 kb) and V. vulnificus (138 kb). Most of the ORFs in the SI regions encode hypothetical proteins. The SI integrases were identical in the six V. parahaemolyticus strains examined, but the cassettes in the SI regions of the pre-pandemic strains varied greatly from those of the pandemic strains. For example, only 24 and 28 of the 77 ORFs in the pandemic strains' SI regions were present in those of pre-pandemic strains AQ3810 and AQ4037, respectively. However, the cassettes in the SI regions of the four pandemic strains were nearly identical; i.e., they differed by only a few point mutations. The only exception was isolate K5030, which had six additional hypothetical proteins inserted between the integrase and the rest of the cassettes. These observations indicate that the integrase is active in V. parahaemolyticus and contributes to species evolution. However, the smaller size of its SI region relative to those of other pathogenic Vibrio species, together with the highly conserved cassettes in the pandemic strains, suggests that the genomes of V. parahaemolyticus may be more stable than those of other pathogenic Vibrios.
Pathogenicity islands, prophages, and integrated elements
Only the pandemic strains examined in this study contained the pathogenicity islands previously described [7] for V. parahaemolyticus (Table 2). In addition, we detected various prophages and integrated elements using Phage_Finder [11]. Prophage f237, which has been widely used as a genetic marker for the pandemic clone [12], was present on chromosome I (loci VP1549-1562 in strain RIMD2210633) of all the pandemic strains we examined, but it was absent from the pre-pandemic strains (Figure 2A). However, a prophage similar to f237 was present in pre-pandemic strain AQ4037, at the same location occupied by f237 in the pandemic strains (Figure 2A). Adjacent to f237, another prophage (loci VP1563-1586 in strain RIMD2210633) was identified in all of the pandemic strains and in pre-pandemic strain AQ3810, but it was absent from strain AQ4037. A second copy of that prophage was also present on chromosome II of the pandemic strains and of strain AQ3810 (Figure 2B). Finally, Phage_Finder identified a prophage region unique to the O4:K68 pandemic strain AN5034 (AN5034_0425-0489) (Table 2 and Additional file 1).
Table 2. Variable regions in V. parahaemolyticus

| # | Region or insertion site relative to RIMD2210633 | Number of ORFs | Function | RIMD (O3:K6 1996) | Peru466 (O3:K6 1996) | AN5034 (O4:K68 1998) | K5030 (O3:K6 2005) | AQ3810 (O3:K6 1983) | AQ4037 (O3:K6 1985) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Between VP0001-0002 (K5030_3039-3061) | 23 | DNA sulfur modification proteins | - | - | - | + | - | - |
| 2 | VP0197-0238 | 42 | O3:K6 LPS/CPS | + | + | - | + | + | + |
| 3 | Replaced VP0197-0238 (AN5034_1849-1901) | 53 | O4:K68 LPS/CPS | - | - | + | - | - | - |
| 4 | Between VP0248-0249 (AN5034_1830-1837) | 8 | Unknown | - | - | + | - | - | - |
| 5 | VP0380-0403 | 24 | Type I restriction endonuclease in tRNA-Met-1 | + | + | + | + | - | - |
| 6 | VP0637-0643 | 7 | Integrated element target tmRNA | + | + | + | + | - | - |
| 7 | Between VP0643-0644 (AN5034_1437-1442) | 6 | Integrated element target tmRNA | - | + | + | + | + | + |
| 8 | VP1071-1076 | 6 | Unknown, contains phage integrase | + | + | + | + | - | - |
| 9 | VP1077-1087 | 11 | Unknown, contains phage integrase | + | + | + | + | + | - |
| 10 | VP1385-1421 | 37 | Type VI secretion system | + | + | + | + | - | + |
| 11 | VP1549-1562 | 14 | Phage f237 | + | + | + | + | - | - |
| 12 | Replaced VP1549-1562 (AQ4037_2432-2444) | 13 | Phage similar to f237 | - | - | - | - | - | + |
| 13 | VP1563-1586 | 24 | Phage alpha* | + | + | + | + | + | - |
| 14 | Between VP1604-1605 (AN5034_0425-0489) | 65 | Phage | - | - | + | - | - | - |
| 15 | VP1787-1865 | 78 | Super integron | + | + | + | + | v | v |
| 16 | Between VP1864-1865 (K5030_1808-1814) | 7 | Addition to super integron, next to integrase | - | - | - | + | - | - |
| 17 | VP1884-1891 | 8 | Unknown | + | + | + | + | - | - |
| 18 | VP1969-1974 | 6 | Fatty acid and amino acid metabolism | + | + | + | + | - | + |
| 19 | VP2131-2144 | 14 | Hypothetical proteins | + | + | + | + | - | - |
| 20 | Between VP2275-2276 (AQ4037_1749-1829) | 81 | trh pathogenicity island | - | - | - | - | - | + |
| 21 | Between VP2638-2639 (AQ4037_1361-1383) | 23 | Hypothetical proteins, contains integrase | - | - | - | - | - | + |
| 22 | VP2900-2910 | 11 | Hypothetical proteins | + | + | + | + | - | - |
| 23 | VPA0434-0440 | 7 | Hypothetical proteins | + | + | + | + | - | - |
| 24 | VPA0889-0912 | 24 | Phage beta* | + | + | + | + | + | - |
| 25 | VPA1254-1270 | 17 | Unknown | + | + | + | + | - | - |
| 26 | VPA1310-1398 | 86 | tdh pathogenicity island | + | + | + | + | + | - |
| 27 | Replaced VPA1310-1313 (AN5034_A0845-A0851) | 7 | Hypothetical proteins | - | + | + | + | - | - |
| 28 | Replaced VPA1310-1398 (AQ4037_A1228-A1253) | 25 | Nutrient uptake and metabolism | - | - | - | - | + | + |
In the region column, when the genes are absent from RIMD2210633, the corresponding locus numbers in one of the other genomes are given in parentheses.
* These two phages are very similar.
v Variable contents in the super integron
Figure 2. Linear illustration of f237-like prophage and juxtaposed regions. Depicted are linear representations of ORFs found on chromosome I of each query genome with similarity to the f237-like prophage in RIMD2210633 (A). Those regions found on chromosome II of query genomes with similarity only to the alpha region of RIMD_f237/α are also shown (B). Query ORFs are colored by protein percent identity to RIMD2210633 proteins (see key). The reference RIMD_f237/α ORFs and query ORFs with no match to RIMD_f237/α ORFs are colored by functional role category as noted in the boxed key.
Each of the strains we studied had one or two integrated elements targeting the tmRNA gene (Table 2). For example, Peru466, AN5034, and K5030 had two different integrated elements inserted in tandem at the 3' end of their tmRNA genes. The element closest to the tmRNA gene contained two genes that may influence virulence: a putative cyclic diguanylate phosphodiesterase EAL domain protein and an AraC superfamily putative fimbrial transcriptional activator. The second element was distinguished by the presence of a ribonuclease H-encoding gene. The first element was present in strains AQ3810 and AQ4037, but not in strain RIMD2210633, whereas strains AQ3810 and AQ4037 lacked the ribonuclease H-encoding element present in strain RIMD2210633.
Characterization of the pathogenicity islands
Pre-pandemic strain AQ4037 is tdh-, trh+, and urease-positive, and its genome sequence revealed a pathogenicity island (hereafter called trhPAI) containing 81 ORFs (Figure 3). Another pathogenicity island (hereafter called tdhPAI) was previously identified on chromosome II of pandemic strain RIMD2210633 and comprises loci VPA1310-1398 (Figure 3). tdhPAI contains a type III secretion system (T3SSII) and two copies of tdh, whereas trhPAI contains trh, an integrase, transposases, a urease gene cluster, a peptide/nickel transport system, and a T3SS that differs from the one in tdhPAI (Figure 3). The T3SS in the trhPAI of AQ4037 is similar to T3SSIIβ of V. parahaemolyticus TH3996, which is related to the T3SS of non-O1, non-O139 strains of V. cholerae [13]. Interestingly, trhPAI is located on chromosome II in strain TH3996 but on chromosome I in strain AQ4037. This difference in chromosomal location may reflect the island's mobility.
Figure 3. Gene clusters in the trh and tdh pathogenicity islands. The upper line indicates chromosome I of strain AQ4037, while the lower line indicates chromosome II of strain RIMD2210633. Thin lines connect homologous genes, and the boxes indicate ORFs. The colors denote various functional categories: brown, integrase; orange, transposase; green, urease-encoding gene; yellow, nickel/peptide transport-encoding gene; red, tdh and trh; pink, toxR; blue, T3SS-related gene; open box, other genes.
Close examination of the tdhPAI region in the six genomes revealed major differences between the pre-pandemic and pandemic strains (Figure 4). The tdhPAIs of the four pandemic strains were very similar to one another; however, the entire pathogenicity island was absent from the pre-pandemic, tdh- strain AQ4037. Instead, that strain contained a pre-pandemic-specific region of 18 ORFs involved in the uptake and metabolism of carbon sources and other nutrients. Moreover, although pre-pandemic strain AQ3810 contained both tdhPAI and the pre-pandemic nutrient uptake region, it had an inverted tdhS gene (Figure 4). Thus, the pre-pandemic and pandemic strains differed substantially in the region upstream of the pathogenicity island and in the orientation of tdh. Whether these variations affect expression of the island's genes, and thereby contribute to the difference in virulence between pre-pandemic and pandemic strains, remains to be determined. The organization of the island suggests that an ancestral O3:K6 strain may have recruited a tdhPAI next to VPA1309, yielding a transitional strain that subsequently lost the pre-pandemic island and gave rise to the pandemic strains. Alternatively, the tdh+ pre-pandemic strain and the pandemic strains may have independently recruited the pathogenicity island into the same location.
Figure 4. Diagram of the tdh pathogenicity island. Solid black boxes indicate the common border genes, and the homologous genes are connected by dotted lines. Homologous regions are shaded and tdh genes are indicated with arrows showing the direction of transcription. Other ORFs are omitted for simplicity.
The tdh and trh genes are related but vary substantially [14]; many variants of tdh and two trh genes have been described. They have been found in various Vibrio species, and their phylogenetic relationships do not match those of their host species [14]. Although most are chromosomal, some are carried on plasmids, which is consistent with their proposed mobility. The presence of trh and tdh on either chromosome of V. parahaemolyticus further supports the idea that they may have been acquired by lateral gene transfer and integrated into the genome in independent events. Because V. parahaemolyticus strains carrying both tdh and trh have been described [15], sequencing those strains' genomes might help clarify the evolutionary history of tdh and trh in their hosts.
TDH and TRH are the only confirmed virulence factors of V. parahaemolyticus; however, their precise roles are not well understood. Genome sequencing revealed that the genes encoding these toxins lie close to T3SSII, which suggests that the toxins and the secretion system may share an origin. It is therefore tempting to speculate that TDH (and/or TRH) and T3SSII act in a coordinated manner in the virulence of V. parahaemolyticus. Some translocon proteins and effectors of T3SSII have been identified [16,17]; however, the putative relationship between TDH, TRH, and T3SSII needs further investigation. Another T3SS (T3SSI) identified in the V. parahaemolyticus genome has been shown to be required for cytolytic activity [8,18]. However, T3SSI was conserved in all of the genomes we examined.
Variability within the O3:K6 genetic locus
Since 1996, serotype O3:K6 has predominated among clinical isolates of V. parahaemolyticus, and this serotype has therefore been associated with the bacterium's increased virulence. However, strains of serotype O3:K6 had been isolated more than a decade before the pandemic began. It was not clear whether the O3:K6 genetic determinants of non-pandemic and pandemic strains differ in ways that produce subtle structural differences in the O and K antigens that serotyping cannot detect. We therefore examined the O- and K-antigen-encoding gene clusters of the pre-pandemic and pandemic O3:K6 strains for any variations that might explain the observed increase in virulence. The O- and K-antigen-encoding gene clusters are juxtaposed in V. parahaemolyticus O3:K6 [19]. In strain RIMD2210633, they are located at loci VP0190-0238 on chromosome I (positions 201,797-253,279) [20,21]. This region was conserved in all O3:K6 strains (> 99.5% identity in the amino acid-encoding sequences), which suggests that the serotype of V. parahaemolyticus may not be directly related to the pandemic strain's increased virulence.
Serotype conversion from O3:K6 to O4:K68
In addition to serotype O3:K6, > 20 other serotypes of V. parahaemolyticus have been detected among pandemic strains [2]. The mechanism of this serotype conversion remains unknown. Because the O- and K-antigen loci are tightly linked on the chromosome [19], a recombination event involving this region would enable rapid serotype conversion. Before comparing the O- and K-antigen region between the O3:K6 and O4:K68 serotypes, we first determined the variability of this region within the O4:K68 serotype, as we did above for O3:K6. We compared the O- and K-antigen region of strain AN5034 (O4:K68), AN5034_1842-1901, with that of another O4:K68 strain, NIID242-200 [21]. The two clusters were nearly identical (9 mismatches in a 63-kb region), as observed above for the O3:K6 loci. In contrast, the O- and K-antigen regions of strain AN5034 (O4:K68) and strain RIMD2210633 (O3:K6) differed substantially, except for the first seven genes, suggesting that recombination underlies the serotype conversion. To delimit the recombination event, we analyzed the distribution of single-nucleotide polymorphisms (SNPs) in the genome of strain AN5034 relative to strain RIMD2210633. Compared with the other pandemic isolates, strain AN5034 had 2,281 SNPs (excluding insertions and deletions), 2,142 (94%) of which clustered in a 141-kb region (positions 199,786-341,273 of RIMD2210633 chromosome I, and positions 166,252-324,726 of AN5034 contig ACFO01000016.1). This region extends from 2 kb upstream to 88 kb downstream of the O- and K-antigen-encoding gene clusters (Additional file 2, blue highlight; Figure 5, circle 5). This observation suggests that a recombination event involving a much larger region than the O- and K-antigen loci occurred and gave rise to the new O4:K68 serotype during the pandemic.
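The clustering reported above (most SNPs falling in one 141-kb stretch) can be checked with a simple sliding-window count. The sketch below is an illustration under stated assumptions, not the pipeline used in this study: it takes a list of SNP positions on one chromosome (for example, reference coordinates such as those in Additional file 2), finds the fixed-width window containing the most SNPs, and reports that window's share of all SNPs. The function names and the toy positions are hypothetical.

```python
from bisect import bisect_right


def densest_window(positions, width):
    """Return (start, end, count) for the width-bp window holding the most SNPs.

    Candidate windows start at each SNP, so only len(positions) windows are tested.
    Coordinates are treated as 1-based and the window end is inclusive.
    """
    pos = sorted(positions)
    best = (0, 0, 0)
    for i, start in enumerate(pos):
        end = start + width - 1
        count = bisect_right(pos, end) - i  # SNPs falling in [start, end]
        if count > best[2]:
            best = (start, end, count)
    return best


def clustered_fraction(positions, width):
    """Fraction of all SNPs that fall inside the densest window."""
    if not positions:
        return 0.0
    return densest_window(positions, width)[2] / len(positions)


# Toy example with made-up positions: four of six SNPs share one 141-kb window.
snps = [100, 200_000, 200_500, 250_000, 300_000, 900_000]
print(densest_window(snps, 141_000))                # (200000, 340999, 4)
print(round(clustered_fraction(snps, 141_000), 3))  # 0.667
```

Applied to the counts in the text, 2,142 of 2,281 SNPs gives 0.939, which rounds to the reported 94%.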
Additional file 2
Vibrio parahaemolyticus Single Nucleotide Polymorphisms. Single Nucleotide Polymorphisms of V. parahaemolyticus genomes relative to reference strain RIMD2210633
Figure 5. Circular illustration of single nucleotide polymorphisms and genome features relative to the reference strain RIMD2210633. Chromosomes I (A) and II (B) are illustrated as circles in which each concentric circle represents genomic data, numbered from the outermost to the innermost circle. Circles 1 and 2 represent RIMD2210633 ORFs, colored by functional role category. Circle 3 depicts the location in RIMD2210633 of various genomic features described in this study. Circles 4-8 denote the locations of SNPs relative to RIMD2210633 and genomic features for strains Peru466, AN5034, K5030, AQ4037, and AQ3810, respectively. Refer to the key for details on color representations and circle numbers.
Origin of the South American outbreaks
Our results indicated that the pandemic strains are closely related to one another. Strain RIMD2210633 differed from strain K5030 and the Peruvian strain Peru466 by 70 and 76 SNPs, respectively (Additional file 2). Peru466 was isolated at approximately the same time as the first outbreak (which later evolved into a pandemic) was reported in India. Its gene content is also almost identical to that of a pandemic Asian strain (RIMD2210633) isolated at the same time, which supports the idea that the pandemic strain spread from Asia to South America very soon after it emerged. Given the geographic proximity, the 2005 outbreak in Chile was likely caused by strains descended from the Peruvian isolates. We speculate that the strains from the South American outbreaks are closely related to those from the Indian outbreaks; confirming this hypothesis will require genome sequencing of additional strains from South America, specifically from Chile.
Fewer cases of disease caused by V. parahaemolyticus were reported from Asia during the later stages of the pandemic than during its early stages (Nair, personal observation); we therefore suspected that strains isolated during the later stages were less virulent than those obtained early on. To look for gene deletions and variations, we sequenced and analyzed the genome of strain K5030, which was isolated in India almost a decade after the pandemic started. No major gene deletions were identified in strain K5030; instead, several insertions were detected (Table 2). In addition to the cassettes added to its SI, K5030 had an insertion of 23 ORFs (between VP0001 and VP0002) containing a DNA phosphorothioation (dnd) system, which incorporates sulphur into the DNA backbone [22,23]. Recent evidence supports the lateral transfer of dnd genes among bacterial genomes [24]. Besides the dnd genes, the insertion also carried two transposases and one integrase. However, it is not clear whether the addition of these ORFs reduced the strain's virulence by altering the structure or function of virulence factors.
Comparative genomic analyses of the three major pathogenic Vibrio species
To characterize important genetic differences between V. parahaemolyticus, V. cholerae, and V. vulnificus, we compared their core genomes. The V. cholerae group consisted of four completely sequenced genomes of toxigenic serotype O1 strains: N16961, M66-2, MJ-1236, and O395 [25-27]. The V. vulnificus group contained the two completely sequenced genomes of strains CMCP6 and YJ016 [28,29], and the V. parahaemolyticus group consisted of strain RIMD2210633 and the two pre-pandemic isolates (strains AQ3810 and AQ4037). Orthologs within each group were extracted by comparing all members of the group, after which the core genes of the groups were compared with each other.
Between 49% and 59% of the core genes in each species were common to all three species (Figure 6). However, 14-24% of the core genes in each group were conserved only within that group and fell into various COG (Clusters of Orthologous Groups) categories (Figure 6). These are likely to be the genes that define each bacterium at the species level. Furthermore, each species had specific genes and transporters for various metabolic pathways, which indicates that they have different requirements for transporting ions, nutrients, and other metabolites across their membranes. In addition, each species had unique two-component regulatory systems and chemotaxis genes, indicating species-specific signaling pathways that respond to various environmental stimuli.
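Once orthologs share common identifiers, extracting each group's core genome and comparing cores across species reduces to set intersection and difference. The sketch below illustrates that logic only; it assumes ortholog clusters have already been assigned shared IDs (the study itself used a BLAST-based ortholog match table), and the function names, species labels, and gene contents in the example are hypothetical.

```python
def core_genes(strain_gene_sets):
    """Ortholog IDs present in every strain of one group (the group's core genome)."""
    sets = list(strain_gene_sets)
    return set.intersection(*sets) if sets else set()


def compare_cores(groups):
    """Summarise shared and species-specific core genes for each species.

    `groups` maps species name -> {strain name -> set of ortholog IDs}.
    """
    cores = {sp: core_genes(strains.values()) for sp, strains in groups.items()}
    shared = set.intersection(*cores.values())  # core genes common to all species
    summary = {}
    for sp, core in cores.items():
        others = set().union(*(c for other, c in cores.items() if other != sp))
        specific = core - others  # conserved only within this species' group
        summary[sp] = {
            "core": len(core),
            "shared_by_all": len(shared),
            "species_specific": len(specific),
            "pct_shared": 100 * len(shared) / len(core) if core else 0.0,
            "pct_specific": 100 * len(specific) / len(core) if core else 0.0,
        }
    return summary


# Made-up ortholog IDs for three hypothetical species with two strains each.
groups = {
    "species_A": {"s1": {"a", "b", "c", "d"}, "s2": {"a", "b", "c", "d", "x"}},
    "species_B": {"s3": {"a", "b", "e"}, "s4": {"a", "b", "e", "f"}},
    "species_C": {"s5": {"a", "b", "c", "g"}, "s6": {"a", "b", "c", "g"}},
}
print(compare_cores(groups)["species_A"]["species_specific"])  # 1
```

Computing each group's core before comparing across species mirrors the two-step procedure described above, so a gene missing from a single strain drops out of its group's core and is not counted as shared or species-specific.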
Figure 6. Species-specific core genes of V. parahaemolyticus, V. cholerae, and V. vulnificus. Left panel: Each circle represents core genes extracted from selected strains of each species. The Venn diagrams show the number of shared proteins (black) and unique proteins (red) within a particular relationship for all three species. Right panel: Pie chart of species-specific genes. Functional categories are indicated with various colors and letters.
The three species also differ in their outer membrane structure and virulence genes (e.g., they have different sets of genes for surface polysaccharide biosynthesis). Furthermore, V. vulnificus has a unique set of genes for Flp pilus synthesis, and toxigenic strains of V. cholerae have genes for synthesis of the toxin co-regulated pilus, a well-studied virulence factor [27] (the CTXΦ phage was not counted among the genes specific to toxigenic V. cholerae because it is absent from strain M66-2). V. parahaemolyticus possesses genes encoding two unique flagella, in addition to the flagellar biosynthesis genes shared by all three species. V. parahaemolyticus also has two different T3SS-containing islands [7]. A T3SS closely related to T3SSII of V. parahaemolyticus has been reported in non-O1, non-O139 strains of V. cholerae [30,31], but we did not detect it in the toxigenic strains of V. cholerae. The presence of this T3SS in both V. parahaemolyticus and non-O1, non-O139 V. cholerae suggests that the secretion system may be required for colonization of their as-yet-unknown environmental hosts and reservoirs.
Evolution of V. parahaemolyticus
To advance our understanding of the relationships among the V. parahaemolyticus strains we characterized, we compiled a set of 924 single-copy genes present in all six strains (plus V. vulnificus CMCP6 as an outgroup), taken from the analysis of the three pathogenic Vibrios, and inferred a nucleotide maximum-likelihood tree from the concatenated set of 924 genes (Additional file 3). We expected to find a resolved phylogeny, supported by the majority of genes, that could be interpreted as an estimate of the strains' phylogeny, but this expectation was not met. The result is similar to that of Boyd et al. (2008) [9], who also found that all of the pandemic strains were intermixed with pre-pandemic strains, although they used just three genes in their analysis. Inspection of the aligned genomes with the SNPs marked in each one (Figure 5) revealed that the SNPs were often clustered, possibly indicating recombination events. Recombination events involving multiple SNPs would distort the branch lengths and provide a poor estimate of the phylogeny. In retrospect, this result might have been expected: we have shown that a single recombination event can involve a large number of variable nucleotides in numerous genes, and each such event distorts the distance measure used by phylogenetic methods that assume all mutations are acquired independently. In the sequenced strains, the single recombination event that converted the serotype and replaced the neighboring 90 kb contained 14 times the total number of variable nucleotides observed in the rest of the genome (Additional file 2; Figure 5, circle 5). Thus, if only one of the 110 genes from this region were included with the rest of the genes in the genome to calculate a tree, that one gene would, on average, contribute 11% of the total variation. To calculate accurate trees, each recombination event and each independent mutation event must be given equal weight.
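The 11% figure follows from simple proportions. As a worked check, assume (as "on average" implies) that the recombined region's variable nucleotides are spread evenly over its 110 genes, and let V denote the number of variable nucleotides in the rest of the genome. The region then holds 14V, so one gene from it carries 14V/110, and its share f of the total variation in the combined data set is:

```latex
f = \frac{14V/110}{V + 14V/110} = \frac{14}{110 + 14} = \frac{14}{124} \approx 0.113
```

That is roughly 11%, independent of the actual value of V.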
Additional file 3
Relationships of V. parahaemolyticus strains. DNA maximum-likelihood tree based on 924 orthologs of the six V. parahaemolyticus strains. V. vulnificus strain CMCP6 was used as an outgroup.
Conclusion
This study helps to improve our understanding of how V. parahaemolyticus evolved during a pandemic. The results of our multiple genome analyses are consistent with the idea that pandemic strains of V. parahaemolyticus evolved from pre-pandemic strains by numerous deletions and acquisitions of genetic material. Pandemic strains differ from pre-pandemic strains mostly in mobile genetic elements and the structure of the pathogenicity islands. Serotype conversion to O4:K68 was likely due to a recombination event involving a region much larger than the O-antigen- and K-antigen-encoding gene clusters. In addition, this study revealed that (tdh+ trh-) and (tdh- trh+) strains not only have different toxin genes, but also differ in the structures and locations of their pathogenicity islands.
Lateral gene transfer seems to be the major force shaping the virulence of V. parahaemolyticus, as evidenced by the diversity in the locations and nucleotide sequences of the virulence factor-encoding genes. In addition, previous studies have shown that insertion sequences in V. parahaemolyticus can change the genome structure and result in the loss of a major virulence factor [32,33]. However, the pandemic strains we studied are almost monomorphic (except for pathogenicity islands and mobile elements), and the outbreaks in Asia and South America are closely related. During the later development of the pandemic, pandemic strains with the O3:K6 serotype (and its serovariants) were no longer prevalent in India, where the pandemic originated (Nair, personal observations). Instead, massive outbreaks were reported in Chile. This observation suggests that the virulent clone that spread to South America during the pandemic's early stage has persisted in that area and continues to cause outbreaks. No known virulence genes were lost in the later-stage pandemic isolate from South Asia (K5030); its virulence status therefore needs further evaluation.
Genetic changes in the etiologic agent may not be the only factor leading to V. parahaemolyticus-mediated pandemics (e.g., optimal environmental conditions may enable pandemic strains to flourish in their reservoirs). For example, an outbreak of foodborne diarrheal disease caused by V. parahaemolyticus was reported on an Alaskan cruise ship in 2004; the source of the infection was V. parahaemolyticus-contaminated oysters harvested following warm weather in Alaska, where V. parahaemolyticus had not previously been isolated [34]. In addition, although there are major genetic differences between pre-pandemic and pandemic strains of V. parahaemolyticus, which may to some extent contribute to the pandemic strains' increased virulence, a pathogenicity island containing tdh and a T3SS is also present in pre-pandemic strain AQ3810. Thus, the recent V. parahaemolyticus pandemic likely resulted from the convergence of genetic changes in the etiologic agent and optimal conditions for its survival and growth in its natural reservoirs.
Materials and methods
Strain isolation and verification
The V. parahaemolyticus strains were isolated on TCBS (thiosulfate, citrate, bile salts, and sucrose) agar and presumptively identified with a multitest medium [35]. Their identities were confirmed by a species-specific toxR assay [29]; a commercially available V. parahaemolyticus antiserum kit (Toshiba Kagaku Kogyo Co., Ltd., Tokyo, Japan) was used for serotyping, and tdh and trh were detected by PCR [36]. Strains were cultured overnight in Luria-Bertani broth, and DNA was obtained by lysing the bacteria with proteinase K, followed by DNA extraction and purification with a Qiagen Maxi Kit (Qiagen, Valencia, CA).
Genome sequencing
The genomes of V. parahaemolyticus strains AN5034, AQ4037, K5030, and Peru466 were sequenced by the Sanger whole-genome random shotgun method [37]. Briefly, one small-insert plasmid library (3-4 kb) and one medium-insert plasmid library (10-12 kb) were constructed by random nebulization and cloning of genomic DNA. During the initial random-sequencing phase, 8-fold sequence coverage was achieved, with the small- and medium-insert libraries sequenced to yield 5-fold and 3-fold coverage, respectively. The sequences were assembled using the Celera Assembler [38], and ordered scaffolds were generated by using NUCMER [39] to align the contigs to the genome of V. parahaemolyticus RIMD2210633, followed by hierarchical scaffolding with BAMBUS [40]. The contig N50 was determined as described [41].
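The N50 statistic mentioned above is conventionally the contig length at which contigs of that length or longer account for at least half of the total assembled length. A minimal sketch of that conventional definition follows; whether reference [41] uses exactly this tie-breaking convention (at least half rather than more than half) is an assumption, and the contig lengths in the example are invented.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


print(n50([500, 400, 300, 200, 100]))  # 400, because 500 + 400 = 900 >= 750
```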
An initial set of open reading frames (ORFs) was identified using GLIMMER [42], and ORFs shorter than 90 bp (and some with overlaps) were eliminated. A region containing the likely origin of replication was identified, and base pair 1 was designated adjacent to the dnaA gene located in that region [43]. The ORFs were searched against a nonredundant protein database, and ORF predictions and gene family identifications were made as previously described [27]. Two sets of hidden Markov models (HMMs) were used to determine ORF membership in families and superfamilies: 721 HMMs from Pfam v22.0 and 631 HMMs from the TIGR ortholog resource. A transmembrane hidden Markov model (TMHMM) [44] was used to identify membrane-spanning domains in proteins, and putative functional role categories were assigned internally as previously described [45].
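Two of the post-processing steps above can be sketched as follows: renumbering a circular chromosome so that base pair 1 sits next to dnaA, and discarding ORFs shorter than 90 bp. This is a hypothetical illustration, not the annotation pipeline used here. It assumes the new origin position is supplied by the user (in practice it comes from locating dnaA), it does not handle ORFs spanning the origin or the overlap-based removals, and all function names are invented.

```python
def rotate_to_origin(sequence, origin_pos):
    """Rotate a circular chromosome so 1-based position `origin_pos` becomes base pair 1."""
    k = (origin_pos - 1) % len(sequence)
    return sequence[k:] + sequence[:k]


def shift_coordinate(pos, origin_pos, genome_length):
    """Map a 1-based coordinate in the old numbering onto the rotated numbering."""
    return (pos - origin_pos) % genome_length + 1


def drop_short_orfs(orfs, min_len=90):
    """Keep ORFs given as (start, end), 1-based inclusive on either strand, of >= min_len bp."""
    return [(s, e) for s, e in orfs if abs(e - s) + 1 >= min_len]


seq = "ABCDEFG"  # stand-in for a circular chromosome
print(rotate_to_origin(seq, 4))                         # DEFGABC
print(shift_coordinate(1, 4, len(seq)))                 # 5: old base 1 ('A') is now base 5
print(drop_short_orfs([(1, 89), (1, 90), (300, 100)]))  # [(1, 90), (300, 100)]
```

Using modular arithmetic for both the rotation and the coordinate shift keeps the sequence and any existing annotations consistent after renumbering.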
The nucleotide sequences and the corresponding automated annotations for the genomes of V. parahaemolyticus strains AN5034, AQ4037, K5030, and Peru466 were submitted to NCBI under accession numbers [GenBank:ACFO00000000, GenBank:ACFN00000000, GenBank:ACKB00000000, and GenBank:ACFM00000000], respectively.
Comparative genomics
The database and cut-offs described above were used, as previously described [37], to (i) produce an ortholog match table, (ii) construct a Venn diagram, and (iii) bin the relationships within the Venn diagram. Synteny plots were computed with PROMER [39] as previously described [37]. SNPs were identified by mapping chromosomal contigs to the complete reference genome of RIMD2210633 using NUCMER [39] with default settings, and were reported using the SHOW-SNPS tool with the -C (do not report SNPs from alignments with an ambiguous mapping) and -I (do not report indels) options. SHOW-SNPS is part of the MUMmer 3 distribution (http://mummer.sourceforge.net/). DNA maximum-likelihood trees were created with PAUP* 4.0b for each of the 924 entries in the ortholog table that had orthologs in V. vulnificus CMCP6 and all six V. parahaemolyticus strains. To ensure proper alignment of the coding regions, the trees were based on DNA alignments back-aligned from the corresponding protein alignments.
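Back-aligning DNA from a protein alignment means threading each gene's codons onto its aligned amino-acid row, so that DNA gaps always come in multiples of three and codons stay in frame. The function below is a minimal sketch of that technique, assuming '-' as the gap character and that each CDS encodes exactly the ungapped protein in its row (translation is not checked). It is not the tool the authors used, and the name `back_align` is hypothetical.

```python
def back_align(aligned_protein, cds, gap="-"):
    """Thread a coding sequence onto its row of a protein alignment.

    Each residue consumes the next codon of `cds`; each gap becomes three gap
    characters, keeping the DNA alignment in frame with the protein alignment.
    Trailing codons beyond the last residue (e.g. a stop codon) are dropped.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = len(aligned_protein) - aligned_protein.count(gap)
    if n_residues > len(codons):
        raise ValueError("protein has more residues than the CDS has codons")
    out, k = [], 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


# Two hypothetical aligned rows; the gap in the second becomes a codon-sized gap.
print(back_align("MKL", "ATGAAACTGTAA"))  # ATGAAACTG
print(back_align("M-L", "ATGCTGTAA"))     # ATG---CTG
```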
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
OCS, RH, GBN, AIG, MN, and DEF conceived and organized the project; GBN, AIG, and MN provided strains; OCS and YC isolated DNA; DEF organized the sequencing studies; DEF, JHB, and YC analyzed the genomic data; and YC, DEF, JHB, and OCS prepared the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We are grateful to the J. Craig Venter Institute's sequencing, bioinformatics and IT departments for supporting the infrastructure required to determine the genomes' sequences and annotation. We especially thank A. Scott Durkin for contributing to the genomes' annotation. We also thank Arnold S. Kreger and Stephanie Zafonte for reading and editing the manuscript. This project was supported by the (i) National Institute of Allergy and Infectious Diseases (National Institutes of Health, Department of Health and Human Services), under contract number N01-AI-30071, and (ii) University of Maryland Clinical Research Unit of the Food and Waterborne Diseases Integrated Research Network (funded by the National Institute of Allergy and Infectious Diseases), under contract number N01-AI-40014.
References
1. Fujino L, Okuno Y, Nakada D, Aoyama A, Fukai K, Mukai T, Uebo T: On the bacteriological examination of shirasu food poisoning. Med J Osaka Univ 1953, 4:299-304.
2. Nair GB, Ramamurthy T, Bhattacharya SK, Dutta B, Takeda Y, Sack DA: Global dissemination of Vibrio parahaemolyticus serotype O3:K6 and its serovariants. Clin Microbiol Rev 2007, 20(1):39-48. doi:10.1128/CMR.00025-06.
3. Nair GB, Hormazabal JC: The Vibrio parahaemolyticus pandemic. Rev Chilena Infectol 2005, 22(2):125-130.
4. Chowdhury NR, Chakraborty S, Ramamurthy T, Nishibuchi M, Yamasaki S, Takeda Y, Nair GB: Molecular evidence of clonal Vibrio parahaemolyticus pandemic strains. Emerg Infect Dis 2000, 6(6):631-636. doi:10.3201/eid0606.000612.
5. Chowdhury NR, Stine OC, Morris JG, Nair GB: Assessment of evolution of pandemic Vibrio parahaemolyticus by multilocus sequence typing. J Clin Microbiol 2004, 42(3):1280-1282. doi:10.1128/JCM.42.3.1280-1282.2004.
6. Nishibuchi M, Taniguchi T, Misawa T, Khaeomanee-Iam V, Honda T, Miwatani T: Cloning and nucleotide sequence of the gene (trh) encoding the hemolysin related to the thermostable direct hemolysin of Vibrio parahaemolyticus. Infect Immun 1989, 57(9):2691-2697.
7. Makino K, Oshima K, Kurokawa K, Yokoyama K, Uda T, Tagomori K, Iijima Y, Najima M, Nakano M, Yamashita A, et al.: Genome sequence of Vibrio parahaemolyticus: a pathogenic mechanism distinct from that of V cholerae. Lancet 2003, 361(9359):743-749. doi:10.1016/S0140-6736(03)12659-1.
8. Park KS, Ono T, Rokuda M, Jang MH, Okada K, Iida T, Honda T: Functional characterization of two type III secretion systems of Vibrio parahaemolyticus. Infect Immun 2004, 72(11):6659-6665. doi:10.1128/IAI.72.11.6659-6665.2004.
9. Boyd EF, Cohen AL, Naughton LM, Ussery DW, Binnewies TT, Stine OC, Parent MA: Molecular analysis of the emergence of pandemic Vibrio parahaemolyticus. BMC Microbiol 2008, 8:110. doi:10.1186/1471-2180-8-110.
10. Gil AI, Miranda H, Lanata CF, Prada A, Hall ER, Barreno CM, Nusrin S, Bhuiyan NA, Sack DA, Nair GB: O3:K6 serotype of Vibrio parahaemolyticus identical to the global pandemic clone associated with diarrhea in Peru. Int J Infect Dis 2007, 11(4):324-328. doi:10.1016/j.ijid.2006.08.003.
11. Fouts DE: Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res 2006, 34(20):5839-5851. doi:10.1093/nar/gkl732.
12. Nasu H, Iida T, Sugahara T, Yamaichi Y, Park KS, Yokoyama K, Makino K, Shinagawa H, Honda T: A filamentous phage associated with recent pandemic Vibrio parahaemolyticus O3:K6 strains. J Clin Microbiol 2000, 38(6):2156-2161.
13. Okada N, Iida T, Park KS, Goto N, Yasunaga T, Hiyoshi H, Matsuda S, Kodama T, Honda T: Identification and characterization of a novel type III secretion system in trh-positive Vibrio parahaemolyticus strain TH3996 reveal genetic lineage and diversity of pathogenic machinery beyond the species level. Infect Immun 2009, 77(2):904-913. doi:10.1128/IAI.01184-08.
14. Nishibuchi M, Kaper JB: Thermostable direct hemolysin gene of Vibrio parahaemolyticus: a virulence gene acquired by a marine bacterium. Infect Immun 1995, 63(6):2093-2099.
15. Shirai H, Ito H, Hirayama T, Nakamoto Y, Nakabayashi N, Kumagai K, Takeda Y, Nishibuchi M: Molecular epidemiologic evidence for association of thermostable direct hemolysin (TDH) and TDH-related hemolysin of Vibrio parahaemolyticus with gastroenteritis. Infect Immun 1990, 58(11):3568-3573.
16. Kodama T, Hiyoshi H, Gotoh K, Akeda Y, Matsuda S, Park KS, Cantarelli VV, Iida T, Honda T: Identification of two translocon proteins of Vibrio parahaemolyticus type III secretion system 2. Infect Immun 2008, 76(9):4282-4289. doi:10.1128/IAI.01738-07.
17. Kodama T, Rokuda M, Park KS, Cantarelli VV, Matsuda S, Iida T, Honda T: Identification and characterization of VopT, a novel ADP-ribosyltransferase effector protein secreted via the Vibrio parahaemolyticus type III secretion system 2. Cell Microbiol 2007, 9(11):2598-2609. doi:10.1111/j.1462-5822.2007.00980.x.
18. Ono T, Park KS, Ueta M, Iida T, Honda T: Identification of proteins secreted via Vibrio parahaemolyticus type III secretion system 1. Infect Immun 2006, 74(2):1032-1042. doi:10.1128/IAI.74.2.1032-1042.2006.
19. Chen Y, Dai J, Morris JG, Johnson JA: Genetic analysis of the capsule polysaccharide (K antigen) and exopolysaccharide genes in pandemic Vibrio parahaemolyticus O3:K6. 2010, 10:274.
20. Chen Y, Dai J, Johnson JA, Morris JG: Genetic determinant of the capsule polysaccharide (K antigen) in pandemic Vibrio parahaemolyticus O3:K6. 2009.
21. Okura M, Osawa R, Tokunaga A, Morita M, Arakawa E, Watanabe H: Genetic analyses of the putative O and K antigen gene clusters of pandemic Vibrio parahaemolyticus. Microbiol Immunol 2008, 52(5):251-264. doi:10.1111/j.1348-0421.2008.00027.x.
22. Zhou X, He X, Liang J, Li A, Xu T, Kieser T, Helmann JD, Deng Z: A novel DNA modification by sulphur. Mol Microbiol 2005, 57(5):1428-1438. doi:10.1111/j.1365-2958.2005.04764.x.
23. Zhou X, Deng Z, Firmin JL, Hopwood DA, Kieser T: Site-specific degradation of Streptomyces lividans DNA during electrophoresis in buffers contaminated with ferrous iron. Nucleic Acids Res 1988, 16(10):4341-4352. doi:10.1093/nar/16.10.4341.
24. Wang L, Chen S, Vergin KL, Giovannoni SJ, Chan SW, Demott MS, Taghizadeh K, Cordero OX, Cutler M, Timberlake S: DNA phosphorothioation is widespread and quantized in bacterial genomes. Proc Natl Acad Sci USA 2011.
25. Chun J, Grim CJ, Hasan NA, Lee JH, Choi SY, Haley BJ, Taviani E, Jeon YS, Kim DW, Lee JH: Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae. Proc Natl Acad Sci USA 2009, 106(36):15442-15447. doi:10.1073/pnas.0907787106.
26. Feng L, Reeves PR, Lan R, Ren Y, Gao C, Zhou Z, Ren Y, Cheng J, Wang W, Wang J: A recalibrated molecular clock and independent origins for the cholera pandemic clones. PLoS One 2008, 3(12):e4053. doi:10.1371/journal.pone.0004053.
27. Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Umayam L: DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 2000, 406(6795):477-483. doi:10.1038/35020000.
28. Chen CY, Wu KM, Chang YC, Chang CH, Tsai HC, Liao TL, Liu YM, Chen HJ, Shen AB, Li JC: Comparative genome analysis of Vibrio vulnificus, a marine pathogen. Genome Res 2003, 13(12):2577-2587. doi:10.1101/gr.1295503.
29. Kim YB, Okuda J, Matsumoto C, Takahashi N, Hashimoto S, Nishibuchi M: Identification of Vibrio parahaemolyticus strains at the species level by PCR targeted to the toxR gene. J Clin Microbiol 1999, 37(4):1173-1177.
30. The genome of non-O1 Vibrio cholerae NRT36S demonstrates the presence of pathogenic
mechanisms that are distinct from those of O1 Vibrio choleraeChenYJohnsonJAPuschGDMorrisJGsuf JrStineOCInfect Immun20077552645264710.1128/IAI.01317-06186577917283087Genomic characterization of non-O1, non-O139 Vibrio cholerae reveals genes for a type III secretion systemDziejmanMSerrutoDTamVCSturtevantDDiraphatPFaruqueSMRahmanMHHeidelbergJFDeckerJLiLProc Natl Acad Sci USA200510293465347010.1073/pnas.040991810255295015728357Detection of a functional insertion sequence responsible for deletion of the thermostable direct hemolysin gene (tdh) in Vibrio parahaemolyticusKamruzzamanMBhoopongPVuddhakulVNishibuchiMGene20084211-2677310.1016/j.gene.2008.06.00918598741Detection and characterization of a functional insertion sequence, ISVpa2, in Vibrio parahaemolyticusKamruzzamanMNishibuchiMGene20084091-2929910.1016/j.gene.2007.11.01218164873The Great Convergence: People, Animals, and the Environmenthttp://www.cdc.gov/nczved/framework/features/convergence.htmlClonal diversity among recently emerged strains of Vibrio parahaemolyticus O3:K6 associated with pandemic spreadBagPKNandiSBhadraRKRamamurthyTBhattacharyaSKNishibuchiMHamabataTYamasakiSTakedaYNairGBJ Clin Microbiol1999377235423578516310364615Detection of the thermostable direct hemolysin gene (tdh) and the thermostable direct hemolysin-related hemolysin gene (trh) of Vibrio parahaemolyticus by polymerase chain reactionTadaJOhashiTNishimuraNShirasakiYOzakiHFukushimaSTakanoJNishibuchiMTakedaYMol Cell Probes19926647748710.1016/0890-8508(92)90044-X1480187Major structural differences and novel potential virulence mechanisms from the genomes of multiple Campylobacter speciesFoutsDEMongodinEFMandrellREMillerWGRaskoDARavelJBrinkacLMDeBoyRTParkerCTDaughertySCPLoS Biol200531e1510.1371/journal.pbio.003001553933115660156A whole-genome assembly of DrosophilaMyersEWSuttonGGDelcherALDewIMFasuloDPFlaniganMJKravitzSAMobarryCMReinertKHRemingtonKAScience200028754612196220410.1126/science.287.5461.219610731133Fast algorithms for large-scale 
genome alignment and comparisonDelcherALPhillippyACarltonJSalzbergSLNucleic Acids Research200230112478248310.1093/nar/30.11.247811718912034836Epidemiologic application of a standardized ribotype scheme for Vibrio cholerae O1PopovicTBoppCOlsvikOWachsmuthKJ Clin Microbiol1993319247424822657807691876Assembly algorithms for next-generation sequencing dataMillerJRKorenSSuttonGGenomics201095631532710.1016/j.ygeno.2010.03.001287464620211242Improved microbial gene identification with GLIMMERDelcherALHarmonDKasifSWhiteOSalzbergSLNucleic Acids Research199927234636464110.1093/nar/27.23.463614875310556321Duplex opening by dnaA protein at novel sequences in initiation of replication at the origin of the E. coli chromosomeBramhillDKornbergACell198852574375510.1016/0092-8674(88)90412-62830993Predicting transmembrane protein topology with a hidden Markov model: application to complete genomesKroghALarssonBvon HeijneGSonnhammerELJ Mol Biol2001305356758010.1006/jmbi.2000.431511152613Whole-genome random sequencing and assembly of Haemophilus influenzae RdFleischmannRDAdamsMDWhiteOClaytonRAKirknessEFKerlavageARBultCJTombJFDoughertyBAMerrickJMScience1995269522349651210.1126/science.75428007542800


Title: Comparative Genomic Analysis of Vibrio parahaemolyticus: Serotype Conversion and Virulence
Abstract
Background
Vibrio parahaemolyticus is a common cause of foodborne disease. Beginning in 1996, a more virulent strain of serotype O3:K6 caused major outbreaks in India and other parts of the world, leading to the emergence of a pandemic. Other serovariants of this strain emerged during its dissemination and, together with the original O3:K6, were termed strains of the pandemic clone. Two genomes, one of this virulent strain and one of a pre-pandemic strain, have been sequenced. In this study we sequenced four additional V. parahaemolyticus genomes from strains isolated in different geographical regions and at different time points. We performed comparative genomic analyses of six V. parahaemolyticus strains isolated in Asia and Peru to advance knowledge of the evolution of V. parahaemolyticus, specifically the genetic changes contributing to serotype conversion and virulence. Two pre-pandemic strains and three pandemic strains, isolated from different geographical regions, were serotype O3:K6 and had one of two toxin profiles, (tdh+, trh-) or (tdh-, trh+). The sixth pandemic strain sequenced in this study was serotype O4:K68.
Results
Genomic analyses revealed that the trh+ and tdh+ strains had different types of pathogenicity islands and mobile elements. They also showed major structural differences between the tdh pathogenicity islands of the pre-pandemic and pandemic strains. In addition, single nucleotide polymorphism (SNP) analysis showed that 94% of the SNPs between the O3:K6 and O4:K68 pandemic isolates lay within a 141 kb region surrounding the O- and K-antigen-encoding gene clusters. The "core" genes of V. parahaemolyticus were also compared with those of V. cholerae and V. vulnificus to delineate differences among these three pathogenic species. Approximately one-half (49-59%) of each species' core genes were conserved in all three species, and 14-24% of the core genes were species-specific and fell into different functional categories.
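The three-way core-gene comparison described in the Results can be illustrated with plain set operations. This is a minimal sketch and not the authors' pipeline. It assumes that each species' core genes have already been mapped to shared ortholog-group identifiers. The identifiers and toy sets (`og1`, `og2`, ...) are hypothetical, as is the helper name `core_gene_overlap`. For each species, the sketch reports the percentage of its core genes present in all three species and the percentage found in no other species.

```python
def core_gene_overlap(core):
    """For each species, return (% of its core genes shared by all species,
    % of its core genes found in no other species).

    `core` maps a species name to a set of (hypothetical) ortholog IDs.
    """
    shared_by_all = set.intersection(*core.values())
    summary = {}
    for species, genes in core.items():
        # Pool the core genes of every other species.
        others = set().union(*(g for s, g in core.items() if s != species))
        specific = genes - others
        summary[species] = (
            100.0 * len(shared_by_all) / len(genes),
            100.0 * len(specific) / len(genes),
        )
    return summary


# Hypothetical ortholog-group IDs, not data from this study.
core = {
    "V. parahaemolyticus": {"og1", "og2", "og3", "og4"},
    "V. cholerae": {"og1", "og2", "og5"},
    "V. vulnificus": {"og1", "og2", "og3", "og6"},
}
for species, (conserved, specific) in core_gene_overlap(core).items():
    print(f"{species}: {conserved:.0f}% conserved in all, {specific:.0f}% species-specific")
```

Each species' core set has a different size, so the same shared set yields a different percentage for each species. This is presumably why the abstract reports ranges (49-59%, 14-24%) rather than single values.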
Conclusions
Our data support the idea that the pandemic strains are closely related and that recent South American outbreaks of foodborne disease caused by V. parahaemolyticus are closely linked to outbreaks in India. Serotype conversion from O3:K6 to O4:K68 was likely due to a recombination event involving a region much larger than the O-antigen- and K-antigen-encoding gene clusters. Major differences in pathogenicity islands and mobile elements are also likely driving the evolution of V. parahaemolyticus. In addition, our analyses categorized genes that may be useful for distinguishing pathogenic Vibrio species from one another.
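The recombination interpretation rests on the SNP result. Most differences between the O3:K6 and O4:K68 isolates fall within one 141 kb window rather than being spread across the genome. The sketch below shows that comparison in minimal form, and it makes several assumptions. SNP positions on a single chromosome are given as 1-based coordinates, and the window is given as inclusive bounds. The positions, window, and chromosome length in the example are hypothetical, not values from this study. The helper names `fraction_in_region` and `uniform_expectation` are likewise illustrative.

```python
from bisect import bisect_left, bisect_right


def fraction_in_region(snp_positions, start, end):
    """Return the fraction of SNP positions inside [start, end] (inclusive)."""
    if not snp_positions:
        raise ValueError("no SNP positions supplied")
    pos = sorted(snp_positions)
    # Count positions p with start <= p <= end using binary search.
    inside = bisect_right(pos, end) - bisect_left(pos, start)
    return inside / len(pos)


def uniform_expectation(start, end, chromosome_length):
    """Fraction of SNPs expected in [start, end] if SNPs were spread uniformly."""
    return (end - start + 1) / chromosome_length


# Hypothetical example: a 141 kb window on a 3 Mb chromosome.
snps = [100, 160_000, 200_000, 210_000, 250_000, 260_000, 280_000, 1_500_000]
observed = fraction_in_region(snps, 150_001, 291_000)
expected = uniform_expectation(150_001, 291_000, 3_000_000)
print(f"observed {observed:.3f}, uniform expectation {expected:.3f}")
# prints: observed 0.750, uniform expectation 0.047
```

Suppose a window holds a far larger share of the SNPs than its share of the chromosome. That pattern points to a localized event such as recombination rather than gradual accumulation of point mutations. A formal analysis would also need a statistical test, which this sketch omits.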
Creator: Chen, Yuansha
Stine, O Colin
Badger, Jonathan H
Gil, Ana I
Nair, G Balakrish
Nishibuchi, Mitsuaki
Fouts, Derrick E
Language: English
Type: Journal article
Date available: 2011-06-06
Publisher: BioMed Central Ltd
Status: Peer reviewed
Copyright holder: Yuansha Chen et al.; licensee BioMed Central Ltd.
License: http://creativecommons.org/licenses/by/2.0
Access rights: Open access
Bibliographic citation: BMC Genomics. 2011 Jun 06;12(1):294
Identifier: http://dx.doi.org/10.1186/1471-2164-12-294
Files: 1471-2164-12-294.xml, 1471-2164-12-294.pdf, 1471-2164-12-294-S1.XLS, 1471-2164-12-294-S2.XLS, 1471-2164-12-294-S3.EPS