SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes

Material Information

Series Title:
BioMed Central Genomics
Gulig, Paul A.
de Crécy-Lagard, Valérie
Wright, Anita C.
Walts, Brandon
Telonis-Scott, Marina
McIntyre, Lauren M.
BioMed Central
Publication Date:


Background: Vibrio vulnificus is the leading cause of reported death from consumption of seafood in the United States. Despite several decades of research on molecular pathogenesis, much remains to be learned about the mechanisms of virulence of this opportunistic bacterial pathogen. The two complete and annotated genomic DNA sequences of V. vulnificus belong to strains of clade 2, which is the predominant clade among clinical strains. Clade 2 strains generally possess higher virulence potential in animal models of disease compared with clade 1, which predominates among environmental strains. SOLiD sequencing of four V. vulnificus strains representing different clades (1 and 2) and biotypes (1 and 2) was used for comparative genomic analysis. Results: Greater than 4,100,000 bases were sequenced of each strain, yielding approximately 100-fold coverage for each of the four genomes. Although the read lengths of SOLiD genomic sequencing were only 35 nt, we were able to make significant conclusions about the unique and shared sequences among the genomes, including identification of single nucleotide polymorphisms. Comparative analysis of the newly sequenced genomes to the existing reference genomes enabled the identification of 3,459 core V. vulnificus genes shared among all six strains and 80 clade 2-specific genes. We identified 523,161 SNPs among the six genomes. Conclusions: We were able to glean much information about the genomic content of each strain using next generation sequencing. Flp pili, GGDEF proteins, and genomic island XII were identified as possible virulence factors because of their presence in virulent sequenced strains. Genomic comparisons also point toward the involvement of sialic acid catabolism in pathogenesis. ( en )
General Note:
Publication of this article was funded in part by the University of Florida Open-Access Publishing Fund. In addition, requestors receiving funding through the UFOAP project are expected to submit a post-review, final draft of the article to UF's institutional repository, IR@UF, at the time of funding. The Institutional Repository at the University of Florida (IR@UF) is the digital archive for the intellectual output of the University of Florida community, with research, news, outreach and educational materials.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:

UFDC Membership

University of Florida Institutional Repository



Gulig et al. BMC Genomics 2010, 11:512


SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes

Paul A Gulig*, Valérie de Crécy-Lagard2, Anita C Wright3, Brandon Walts, Marina Telonis-Scott4,
Lauren M McIntyre

Background: Vibrio vulnificus is the leading cause of reported death from consumption of seafood in the United
States. Despite several decades of research on molecular pathogenesis, much remains to be learned about the
mechanisms of virulence of this opportunistic bacterial pathogen. The two complete and annotated genomic DNA
sequences of V. vulnificus belong to strains of clade 2, which is the predominant clade among clinical strains. Clade
2 strains generally possess higher virulence potential in animal models of disease compared with clade 1, which
predominates among environmental strains. SOLiD sequencing of four V. vulnificus strains representing different
clades (1 and 2) and biotypes (1 and 2) was used for comparative genomic analysis.
Results: Greater than 4,100,000 bases were sequenced of each strain, yielding approximately 100-fold coverage for
each of the four genomes. Although the read lengths of SOLiD genomic sequencing were only 35 nt, we were
able to make significant conclusions about the unique and shared sequences among the genomes, including
identification of single nucleotide polymorphisms. Comparative analysis of the newly sequenced genomes to the
existing reference genomes enabled the identification of 3,459 core V. vulnificus genes shared among all six strains
and 80 clade 2-specific genes. We identified 523,161 SNPs among the six genomes.
Conclusions: We were able to glean much information about the genomic content of each strain using next
generation sequencing. Flp pili, GGDEF proteins, and genomic island XII were identified as possible virulence
factors because of their presence in virulent sequenced strains. Genomic comparisons also point toward the
involvement of sialic acid catabolism in pathogenesis.

Vibrio vulnificus is an opportunistic pathogen that
causes sepsis in humans after ingestion of contaminated
raw oysters or wound infection and necrotizing fasciitis
from contamination of wounds (for a review see [1,2]).
The mortality rates for sepsis and wound infection are
~50% and ~15%, respectively. During infection of
humans the bacteria replicate rapidly, extensively invade
tissues, and cause severe tissue destruction. Mouse mod-
els of infection coupled with molecular genetic analysis

* Correspondence: gulig@ufl.edu
Department of Molecular Genetics and Microbiology, University of Florida,
Gainesville, Florida, USA
Full list of author information is available at the end of the article

have identified several virulence factors partially explain-
ing the high mortality and extreme tissue destruction,
most importantly, polysaccharide capsule [3,4], RtxA1
toxin [5-7], acquisition of iron [8,9], pili [10,11], and
flagella [12,13]. However, these factors do not completely
explain the remarkable virulence of V. vulnificus.
V. vulnificus can be classified in several different
manners. One of the first classification schemes was based
on biochemical reactions of strains initially yielding two
biotypes: biotype 1 most often associated with contami-
nation of oysters and causing human disease and bio-
type 2 associated with infection of eels [14]. Recently, a
third biotype that caused wound infection from handling
fish in Israel was identified [15]. Genetic analysis using

© 2010 Gulig et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.


analysis of ribosomal RNA loci [16,17], multilocus
sequence typing (MLST) [18-20], and virulence-corre-
lated gene (vcg) PCR [21] revealed that V. vulnificus
strains could be divided into two groups. While the
descriptors for these two groups vary (clades, popula-
tions, clusters, and lineages), the terms clade 1 and
clade 2 are used here to follow the MLST clusters of
Bisharat et al. [19]. Biotype 1 strains are present in both
clades, whereas biotype 2 strains are present only in
clade 1. Based on MLST analysis, biotype 3 strains
appear to be a hybrid between clades 1 and 2 [18].
Clade 1 strains are most often isolated from environ-
mental samples, while clade 2 strains are most often
associated with human disease. Because of these epide-
miological patterns, many investigators hypothesized
that clade 2 strains possess inherently greater virulence.
In an analysis of 69 biotype 1 V. vulnificus strains, we
recently determined that both clade 1 and clade 2
strains have the ability to cause severe skin infection in
subcutaneously inoculated iron dextran-treated mice
(Thiaville, P.C. et al., Infect. Immun., submitted; Jones,
M. et al., in preparation). The major distinction between
the clades was that clade 2 strains had a greater propen-
sity to cause systemic infection and death in the mouse
model, although there were some attenuated clade 2
strains and highly virulent clade 1 strains.
Analysis of the genomic DNA sequences of clade 1
and clade 2 strains would contribute to the identifica-
tion of genetic differences among strains. As microbes
engage in lateral gene transfer and are often highly
divergent in genomic content, this study could help
identify genes responsible for the differences in viru-
lence between these clades. Both of the complete and
annotated V. vulnificus genomes are of clade 2 strains,
CMCP6 (GenBank accession numbers AE016795 and
AE016796) and YJ016 (GenBank accession numbers
BA000037, BA000038, and AP005352). The lack of genomic
sequence data from clade 1 strains is a serious
impediment to understanding the differences in viru-
lence between the two clades and in dissecting the
virulence of V. vulnificus in general. We therefore
undertook the present study to rapidly and economic-
ally obtain genomic sequence of numerous V. vulnificus

strains representing both clades and the two major biotypes.
We hypothesized that clade 2 strains are more viru-
lent, at least in part, because they contain unique viru-
lence genes that are missing in most clade 1 strains.
Therefore, identifying DNA sequences common to clade
2 strains and missing from clade 1 strains would create
a set of putative virulence genes that could be subse-
quently experimentally examined. Because of the pro-
pensity of clade 1 strains to be associated with oysters,
these strains may possess unique genes enabling coloni-
zation of shellfish. Therefore, unique clade 1 genes offer
insight into the Vibrio-oyster relationship. However, not
all genes uniquely associated with one clade will be
involved with interactions with animal hosts, and viru-
lence genes will not necessarily be present only in viru-
lent genotypes. An example of the former is that the
ability of V. vulnificus to ferment mannitol is associated
with the cluster of strains that we are calling clade 2,
most often derived from clinical cases [22], and an
example of the latter is the nearly universal presence of
the RtxA1 toxin in both virulent and attenuated
V. vulnificus strains (Joseph, J.L., et al., in preparation). Finally,
by comparing the genomes of a variety of strains
representing the different clades and biotypes, the set of
genes in the V. vulnificus genome shared by all
V. vulnificus strains can be identified. Over and above
identifying relationships between the presence and/or absence
of genes among strains, identifying single nucleotide
polymorphisms (SNPs) could also reveal the genetic
basis for differential virulence and shellfish-colonizing
phenotypes, as well as other phenotypes.
Given these goals, we used the SOLiD sequencing system
on four V. vulnificus strains, each of which represented
a unique genotype/virulence phenotype combination
(Table 1). V. vulnificus M06-24/O [4] is a typical clade
2 strain exhibiting a high level of virulence in the sub-
cutaneously inoculated iron dextran-treated mouse
model [23-25]. Strain 99-520 DP-B8 [25] is a typical
clade 1 strain that can infect skin tissues but is defec-
tive at causing systemic infection and death in the
mouse model. Strain 99-738 DP-B5 [25] is an unusual
clade 1 strain that is highly virulent in the mouse

Table 1 Genotypes and virulence phenotypes of the V. vulnificus strains whose genomes were sequenced in this study
Strain Source Biotype MLST* vcg* rrn rep-PCR* Skin Infection Liver Infection/Death

99-520 DP-B8
99-738 DP-B5
ATCC 33149


*Virulence data for biotype 1 strains are from Thiaville, P.C., et al. (Infect. Immun., submitted) and for ATCC 33149 are from this study. MLST, vcg, and rep-PCR
data are from Mahmud, et al. [83]. rrn data for biotype 1 strains are from Thiaville, P.C., et al. (Infect. Immun., submitted) and for ATCC 33149 are from Vickery
et al. [84].


model, causing systemic infection and death. ATCC
33149 [26] is a typical biotype 2 strain isolated from
an eel. Using SOLiD sequencing enabled us to obtain
approximately 100X coverage with 35-nt reads among
four genomes. This selection of strains analyzed with
SOLiD sequencing enabled comparative genomics to be
performed and identified clade 2-specific genomic
sequences and the genes of V. vulnificus shared among
all of the strains sequenced to date.

Numbers of SOLiD sequencing reads
We performed SOLiD sequencing on the genomes of
four V. vulnificus strains to increase the understanding
of the genetic differences between the two major clades
and the biotypes of this organism and to possibly iden-
tify sequences associated with differences in virulence
potential in our subcutaneously inoculated iron dextran-
treated mouse model [23-25]. V. vulnificus 99-520 DP-
B8 and 99-738 DP-B5 are clade 1 strains, typically iso-
lated from environmental sources. Strain 99-520 DP-B8
exhibits the typical attenuated virulence of clade 1
strains, i.e., it can cause skin infection but is defective at
causing systemic infection and death. In contrast, strain
99-738 DP-B5 exhibits a high level of virulence more
characteristic of clade 2 strains, i.e., it causes skin infection,
systemic infection, and death (Thiaville, P.C., et al.,
Infect. Immun., submitted). V. vulnificus M06-24/O is a
typical clade 2 strain with full virulence that has been
widely used in examining molecular pathogenesis by
many laboratories [4]. V. vulnificus ATCC 33149 is a
biotype 2 strain isolated from an eel [26]. Genomic DNA
from each of these strains was loaded onto one-fourth of
a 25 mm × 75 mm SOLiD™ slide for sequencing on an
Applied Biosystems SOLiD™ apparatus at the University
of Florida Interdisciplinary Center for Biotechnology
Research, as described in the Methods. The total
numbers of 35-bp reads for each strain were as follows:
99-520 DP-B8, 3.16 × 10^7; 99-738 DP-B5, 3.21 × 10^7;
M06-24/O, 3.50 × 10^7; and ATCC 33149, 3.38 × 10^7.
These totals represented putative 210- to 239-fold
coverage of each of the genomes, on the assumption that all of
the data were usable. The reads from each of the four
newly sequenced genomes have been deposited into the
NCBI Short Read Archive (accession number

Comparison of SOLiD sequencing reads to reference
V. vulnificus genomes
Reads were mapped onto the two reference V. vulnificus
genomes, CMCP6 and YJ016, using MAQ [27]. This
analysis enabled the identification of DNA sequences and ORFs
that were present in the newly sequenced strains that have
already been described in the reference strains. Graphical

representation of the coverage of the CMCP6 and YJ016
genomes by the reads from each of the four newly
sequenced V. vulnificus strains is shown in Figure 1.
We then mapped reads to plasmids described for bio-
type 2 V. vulnificus and whose DNA sequences are known
(pR99, pC4602-1, and pC4602-2) [28]. As expected,
greater than 90% of all three of these reference sequences
were matched to reads from biotype 2 ATCC 33149, and
lesser homologies were observed for the biotype 1 strains
(Additional File 1, Table S1). For clade 1 strains, between
37 and 56% of these plasmid sequences matched with
99-738 DP-B5, and only 6 to 20% of plasmid sequences
matched the SOLiD reads from strain 99-520 DP-B8.
These results suggested that 99-738 DP-B5 would have a
plasmid, whereas 99-520 DP-B8 would not, and we con-
firmed this by gel electrophoresis of extracted plasmid
DNA (data not shown). The reads from strain M06-24/O,
which is a clade 2 strain and least related to the other
strains, only matched to 1% of plasmid pC4602-1 and
failed to match to any sequences of plasmids pC4602-2
and pR99. This is in agreement with M06-24/O not having
a plasmid [29].
Despite the prediction of approximately 210-fold cover-
age based on the raw number of reads obtained for each
genome, coverage was actually on the order of 100-fold.
In total, 45% to 64% of the raw sequencing reads mapped
to one of the two reference genomes, leaving a consider-
able number of unmapped reads. Some of these reads
were of low complexity and may represent sequencing
error. Because approximately 14% of both CMCP6 and
YJ016 are low complexity, these unmapped reads also
may be derived from regions of low complexity in the
sequenced genomes. It is a limitation of the short read
technology that we cannot distinguish among these sce-
narios. For the remaining unmapped reads that were not
of low complexity, there are two possibilities: these reads
represented truly unique sequences for the newly
sequenced genomes or these reads were errors in the
sequencing system. In an attempt to separate these two
possibilities, these unmapped reads were compared to
several bacterial genomes by mapping the reads in
SOLiD colorspace using MAQ [27]. This would identify
orthologs of V. vulnificus strains in other species. The
largest number of matches (273,045) was found with the
genomic sequence of V. cholerae N16961 (GenBank
accession numbers AE003852 and AE003853) (Additional
File 2, Table S2). These V. cholerae matches
yielded 20 genes in total from the four sequenced
genomes. Of these V. cholerae genes, sixteen were identified
from only a single V. vulnificus strain. Other novel genes
may still be found, but they would be genes not pre-
viously identified in any other bacterial genomes.
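
The gap between putative and realized coverage follows from simple arithmetic. The following is a minimal sketch, not the authors' calculation: the read counts, read length, and mapped fractions are those reported above, whereas the reference genome sizes (about 5.13 Mb for CMCP6 and 5.26 Mb for YJ016) are approximations assumed here for illustration and are not stated in the text.

```python
# Putative fold coverage = (number of reads x read length) / genome size.
READ_LENGTH = 35  # nt, as reported for the SOLiD reads

# Total reads per strain, as reported in the text.
READS = {
    "99-520 DP-B8": 3.16e7,
    "99-738 DP-B5": 3.21e7,
    "M06-24/O": 3.50e7,
    "ATCC 33149": 3.38e7,
}

# ASSUMPTION: approximate reference genome sizes in bp (not given in the text).
GENOME_SIZES = {"CMCP6": 5.13e6, "YJ016": 5.26e6}


def fold_coverage(n_reads, genome_size, read_length=READ_LENGTH, mapped_fraction=1.0):
    """Return fold coverage, optionally scaled by the fraction of reads that map."""
    return n_reads * read_length * mapped_fraction / genome_size


# Every strain against every assumed reference size, all reads counted as usable.
putative = [
    fold_coverage(n, g) for n in READS.values() for g in GENOME_SIZES.values()
]
print(f"putative coverage: {min(putative):.0f}- to {max(putative):.0f}-fold")

# Only 45% to 64% of reads mapped, so usable coverage is proportionally lower.
low = fold_coverage(min(READS.values()), max(GENOME_SIZES.values()), mapped_fraction=0.45)
high = fold_coverage(max(READS.values()), min(GENOME_SIZES.values()), mapped_fraction=0.64)
print(f"coverage after mapping: roughly {low:.0f}- to {high:.0f}-fold")
```

Under these assumed genome sizes the putative range matches the 210- to 239-fold figure quoted earlier, and scaling by the mapped fraction brings the estimate down to the order of 100-fold.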
There were between 15 and 22 million unmatched
reads for each of the newly sequenced genomes. The
cause of such a large amount of data with no similarity
to known genes cannot be explained by low complexity
alone, as many of these reads are not of low complexity.
While it remains possible that novel genes are included
in these data, it is also possible that these reads are just
noise from the technology.

[Figure 1 panels: CMCP6 Chromosome 1, CMCP6 Chromosome 2, YJ016 Chromosome 1, YJ016 Chromosome 2, YJ016 Plasmid]

Figure 1 Graphical representation of coverage of the reference genome components by sequences of each of the four newly
sequenced genomes. The depth of coverage (number of matched 35-nt reads per 100-nt window of the reference genomes) is plotted for
both chromosomes of the reference CMCP6 and YJ016 genomes and the YJ016 plasmid. The source strains for the reads being matched are as
follows: M06, M06-24/O; B5, 99-738 DP-B5; B8, 99-520 DP-B8; ATCC, ATCC 33149. It should be noted that coverage of the reference genomes
is not as continuous as it appears in the figures.
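
The Figure 1 legend describes depth of coverage as the number of matched 35-nt reads per 100-nt window of the reference. A minimal sketch of that kind of calculation is given here; it assumes that alignments are available as 0-based start positions on a single reference sequence (a hypothetical input format, not the MAQ output itself) and that a read counts toward every window it overlaps, which the legend does not specify.

```python
def windowed_depth(read_starts, reference_length, read_length=35, window=100):
    """Count reads overlapping each fixed-size window of a reference sequence.

    ASSUMPTION: a read is counted in every window it overlaps; the legend
    does not say how reads spanning a window boundary were assigned.
    """
    n_windows = (reference_length + window - 1) // window  # ceiling division
    depth = [0] * n_windows
    for start in read_starts:
        end = min(start + read_length, reference_length) - 1  # last covered base
        for w in range(start // window, end // window + 1):
            depth[w] += 1
    return depth


def uncovered_windows(depth, window=100):
    """Return (start, end) coordinates of windows with no matched reads."""
    return [(i * window, (i + 1) * window) for i, d in enumerate(depth) if d == 0]


# Toy example: a 400-nt reference with four hypothetical read start positions.
depth = windowed_depth([0, 10, 90, 350], reference_length=400)
print(depth)
print(uncovered_windows(depth))
```

Windows with zero depth correspond to the reference regions that appear as gaps, such as the strain-specific phage-like regions discussed in the text.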

Figure 1, which graphically shows the coverage of
the reference genome elements by each of the newly
sequenced genomic reads, reveals large regions on each
of the reference genomic elements for which there were
no matched reads from each of the newly sequenced
genomes. Detailed comparisons of coverage generated


lists of the genes of CMCP6 and YJ016 lacking signifi-
cant depth of coverage from the newly sequenced reads
(Additional File 3, Table S3A and Additional File 4,
Table S3B, respectively). There were 309 ORFs unique
to CMCP6 and 489 ORFs unique to YJ016 relative to
the other five sequenced strains. In CMCP6 chromo-
some 1, two large regions that were not present in any
of the four newly sequenced genomes included genes
VV1_0063 to VV1_0124 and VV1_0374 to VV1_0400.
These regions, which were also missing from YJ016,
appear to encode phage genomes. They contain genes
annotated as bacteriophage phi 1.45 protein-like protein
(VV1_0066), P2-like prophage tail protein x (VV1_0086),
phage integrase (VV1_0372), or they resembled other
mobile genetic elements with putative transposases
(VV1_0385, VV1_0386).
Another CMCP6-specific region spanned the beginning
and end of chromosome 1 (genes VV1_0001 to
VV1_0011 and VV1_3192 to VV1_3205). This region
also appeared to encode a phage or other mobile genetic
element. A smaller CMCP6-specific region located at
genes VV1_0777 to VV1_0781 appeared to encode sugar
metabolism genes possibly involved in lipopolysaccharide
(LPS) or capsular biosynthesis. CMCP6 chromosome 2
contained a very large region at genes VV2_0630 to
VV2_0712 not present in any other strains. This region
appeared to have been derived from a mobile genetic
element, either a phage or transposon. There were also
smaller regions unique to CMCP6 on chromosome 2.
The YJ016 genome similarly contained numerous
regions that were not present in any of the other newly
sequenced genomes. On chromosome 1, YJ016-specific
genes were located at VV0130 to VV0165, VV0343 to
VV0367, VV0514 to VV0559, VV0799 to VV0817, and
VV2191 to VV2262. The largest of these YJ016 regions
at VV2191 to VV2262 appeared to be phage-related.
A similar pattern was evident for YJ016 chromosome 2.
A very large YJ016-specific region spanning VVA0825
to VVA0888 was notable. This region consisted mainly
of hypothetical proteins, but there is a possibility that
this region is phage-related, as VVA0886 is annotated as
a phage integrase.
The coverage of the YJ016 plasmid, which encodes 69
genes, was very different among the four newly
sequenced genomes. The genomes containing the most
matches were 99-738 DP-B5 and ATCC 33149, with 50
and 44 genes, respectively, matched to the YJ016 plas-
mid. As expected, both 99-738 DP-B5 and ATCC 33149
contain plasmids. None of the YJ016 plasmid genes
matched to the reads of 99-520 DP-B8 or M06-24/O,
neither of which contains plasmids.
V. vulnificus, like other Vibrio species, encodes a
super-integron on its large chromosome [30]. Integrons
are specific regions of genomic sequence that have the

ability to accumulate gene cassettes via site-specific
recombination [31]. They are located in genomes at attI
sites and contain a site-specific integrase, intI, that
mediates acquisition of gene cassettes at repetitive attC sites,
which are generally conserved among closely related
bacteria. The vibrio integrons are called super-integrons
because of their unusually large sizes [32]. In CMCP6
the super-integron spans genes VV1_2401 to VV1_2501,
and in YJ016 the super-integron spans genes VV1745 to
VV1941. As shown in Additional File 3, Table S3A and
Additional File 4, Table S3B, the genes encoded within
these super-integrons are mostly strain-specific, not hav-
ing significant homology with the four newly sequenced
genomes or each other. It is interesting that the super-
integrons did not appear in Figure 1 as missing from
the newly sequenced genomes, most likely because of
the attC sites and presence of infrequent homologous
genes between the genomes.
In contrast to identifying sequences missing from the
newly sequenced genomes, we also identified the genes
shared among all of the six genomes, thereby identifying
the core V. vulnificus genome. Up to this point, shared
genes based on the two reference genomes numbered
3,915 genes. Adding our four newly sequenced genomes,
there are 3,459 genes common to all sequenced V. vul-
nificus strains, listed in Additional File 5, Table S4. The
number of shared genes can only get smaller as more
genomes are sequenced. Since there are 4,473 protein-
coding genes in the CMCP6 genome and 5,024 protein-
coding genes in the YJ016 genome, but only 3,915 genes
shared between them, there is clearly an enormous
amount of strain-specific sequence among these clade 2
strains. The frequency of hypothetical proteins in the
core genome was 20.3% compared with the overall fre-
quency of 23.6% in the CMCP6 genome.
The total number of genes obtained by combining the
CMCP6 and YJ016 reference genomes and excluding
redundancy is 5,630. Among the 4,473 genes in the
CMCP6 genome, 309 (6.9%) were unique to this strain,
and among the 5,026 genes in the YJ016 genome, 489
(9.7%) were unique to this strain relative to all of the
other genomes. By combining the matches for each
strain with the reference genomes we identified the
following numbers of genes for each strain: ATCC 33149,
4,184; 99-738 DP-B5, 4,359; 99-520 DP-B8, 4,225; and
M06-24/O, 4,534.
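
The core and strain-unique gene sets described above can be expressed as set operations on per-strain gene lists. The sketch below is illustrative only: the gene identifiers are hypothetical placeholders, and it assumes that presence or absence has already been decided for each gene (for example, from depth of coverage), which is not shown.

```python
def core_genome(gene_sets):
    """Genes present in every strain (intersection of all gene sets)."""
    return set.intersection(*gene_sets.values())


def unique_genes(gene_sets, strain):
    """Genes present in one strain and in none of the others."""
    others = set().union(*(g for s, g in gene_sets.items() if s != strain))
    return gene_sets[strain] - others


# Toy presence data; gene identifiers are hypothetical placeholders.
gene_sets = {
    "CMCP6": {"g1", "g2", "g3", "g4"},
    "YJ016": {"g1", "g2", "g3", "g5"},
    "M06-24/O": {"g1", "g2", "g3"},
    "99-520 DP-B8": {"g1", "g2"},
}

print(sorted(core_genome(gene_sets)))
print(sorted(unique_genes(gene_sets, "CMCP6")))

# Strain-unique fractions from the counts reported in the text.
print(f"CMCP6: {309 / 4473:.1%}, YJ016: {489 / 5026:.1%}")
```

Because an intersection can only lose members as further sets are added, the core genome can stay the same or shrink, but never grow, as more genomes are sequenced.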

Genomic inference of different V. vulnificus genotypes
We asked which genes were common only to the three
biotype 1/clade 2 strains, but not present in the two bio-
type 1/clade 1 strains or the biotype 2 strain, because
this could help identify the genes that are responsible
for the increased virulence of clade 2 strains (Thiaville,
P.C. et al., Infect. Immun., submitted). The 80 clade 2-


specific genes are listed in Table 2. Among the notable
clade 2-specific genes and regions are several GGDEF
proteins (genes VV1_2061, VV1_2228, VV1_2321 in the
CMCP6 genome) and a Flp pilus-coding region (genes
VV1_2330 to VV1_2337 in the CMCP6 genome).
GGDEF proteins are involved with signal transduction
in many bacteria by regulating intracellular levels of the
signaling molecule cyclic-di-GMP [33], and Flp pili
could be involved with adherence or genetic exchange
[34]. Hypothetical proteins comprised 36.3% of the clade
2-specific genes, compared with the overall frequency of
23.6% of hypothetical proteins in the CMCP6 genome.
Because the reference strains are both clade 2, any clade
1-specific genes will be missed in this initial mapping.
Strain 99-520 DP-B8 is a typical clade 1 strain with
attenuated virulence, while strain 99-738 DP-B5 is a
clade 1 strain with high virulence typical of clade 2
strains. There were 61 genes in 99-738 DP-B5 that were
common to the three clade 2 strains but missing from
attenuated clade 1 strain 99-520 DP-B8 and biotype 2
strain ATCC 33149 (Table 3). This set of genes could
contain virulence genes acquired by 99-738 DP-B5 that
endow it with clade 2-like virulence. Hypothetical pro-
teins comprised 19.7% of this set of genes, compared
with the overall frequency of 23.6% hypothetical pro-
teins in the CMCP6 genome. It is noteworthy that the
clade 2 + 99-738 DP-B5 specific set of genes includes
genomic island XII identified by Cohen et al. [20] as
being present in most clade 2 strains and missing from
most clade 1 strains (genes VV2_1090 to VV2_1111 on
the CMCP6 genome). They hypothesized that genomic
island XII could be responsible for the putative differen-
tial virulence of clade 2 strains, evidenced by their asso-
ciation with clinical cases.
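
Both gene sets discussed in this section (the clade 2-specific genes and the genes shared by clade 2 strains plus 99-738 DP-B5) are instances of one query: genes present in every strain of one group and absent from every strain of another. A minimal sketch under the same assumptions as before, with hypothetical gene identifiers and presence calls taken as given:

```python
def genes_with_pattern(gene_sets, present_in, absent_from):
    """Genes found in every strain of `present_in` and in no strain of `absent_from`."""
    shared = set.intersection(*(gene_sets[s] for s in present_in))
    excluded = set().union(*(gene_sets[s] for s in absent_from))
    return shared - excluded


CLADE2 = ["CMCP6", "YJ016", "M06-24/O"]
OTHERS = ["99-738 DP-B5", "99-520 DP-B8", "ATCC 33149"]

# Toy presence data; gene identifiers are hypothetical placeholders.
gene_sets = {
    "CMCP6": {"core1", "c2only", "c2plusB5"},
    "YJ016": {"core1", "c2only", "c2plusB5"},
    "M06-24/O": {"core1", "c2only", "c2plusB5"},
    "99-738 DP-B5": {"core1", "c2plusB5"},
    "99-520 DP-B8": {"core1"},
    "ATCC 33149": {"core1"},
}

# Present in all clade 2 strains, absent from the clade 1 and biotype 2 strains.
clade2_specific = genes_with_pattern(gene_sets, CLADE2, OTHERS)

# Present in clade 2 and the virulent clade 1 strain, absent from the other two.
clade2_plus_b5 = genes_with_pattern(
    gene_sets, CLADE2 + ["99-738 DP-B5"], ["99-520 DP-B8", "ATCC 33149"]
)
print(clade2_specific, clade2_plus_b5)
```

As noted in the text, a query of this kind can only recover genes that are annotated in the clade 2 reference genomes, so clade 1-specific genes fall outside it.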
Within genomic island XII are paralogs of galactose
utilization genes (VV2_1095, a paralog of galE2 encod-
ing UDP-glucose 4-epimerase and VV2_1094, a paralog
of galT2 encoding galactose-1-phosphate uridylyltrans-
ferase) that are in an operon with a predicted sulfate
transporter (VV2_1096). The canonical galE (VV1_1770)
and galT (VV1_1771) are located elsewhere in the
galETKM operon (VV1_1770 to VV1_1773). The
presence of additional galET genes in a subset of
V. vulnificus strains with high virulence suggests a role for these
genes in another metabolic pathway possibly benefitting
the bacteria during infection.
The link with sulfate metabolism was intriguing
because five other genomic island XII genes are anno-
tated as arylsulfatase A (VV2_1106, VV2_1108,
VV2_1109, and VV2_0151) or alkyl sulfatase
(VV2_0989). These enzymes hydrolyze the sulfate from
sulfated gangliosides (sulfatides). VV2_1098 and
VV2_1110 in the genomic island encode chondroitinases
(although they are not annotated as such in the

reference genome sites). Sulfatides are important com-
ponents of connective tissue involved with cell adhesion
[35] and serve as the receptors for various microbial
pathogens, including HIV [36], Bordetella pertussis
[37], and Helicobacter pylori [38]. An arylsulfatase of E.
coli K1 is necessary for invasion of the blood-brain bar-
rier [39]; hence, such activity in virulent V. vulnificus
strains could enable invasion through tissues, which is
characteristic of V. vulnificus infection. In clinical
V. vulnificus isolates, the presence of region XII,
encoding arylsulfatases, chondroitinases, sulfate transport, and
sulfate metabolism functions, suggests that this region
may have an important scavenging function removing
sulfate groups from host components, thereby providing
sulfur and/or carbon sources, which could facilitate sur-
vival in the human host where free sulfur is limited.
However, as noted above, some of the degradative
enzymes in genomic island XII could also be involved in
invasion of tissues. Cohen et al. [20] had noted the pre-
sence of such genes in genomic island XII predominant
in the clade of V. vulnificus strains most associated with
clinical strains. The exclusive presence of all of these
genes in clade 2 plus the highly virulent clade 1 strain
99-738 DP-B5 suggests a role in virulence. The dissec-
tion of the roles in virulence, if any, played by these
genomic island XII genes identified through our com-
parative genomic analysis will await construction and
analysis of specific mutants. However, Bryant et al. [40]
described the use of sodium dodecyl sulfate-polymyxin
B-sucrose plates for the identification of V. vulnificus
from shellfish samples. The ability of bacteria to form
halos around colonies on this medium is indicative of
alkyl sulfatase activity. In contrast to our determination
that VV2_0989 is absent in the biotype 2 strain ATCC
33149 and clade 1 strain 99-520 DP-B8 and the results
of Cohen et al. [20] similarly describing the limited
presence of genomic island XII among V. vulnificus strains,
Bryant et al. observed that all 20 V. vulnificus strains
examined possessed alkyl sulfatase activity. However,
VV2_0885 and VV2_1032 are also annotated as alkyl
sulfatase. Our results show that VV2_0885 is present in
all six strains except 99-738 DP-B5 and VV2_1032 is
present in all six strains. Hence, it would be expected
that all V. vulnificus strains would exhibit alkyl sulfatase
activity, in agreement with Bryant et al. [40].
Also of note in the clade 2 plus 99-738 DP-B5-specific
genes not present in genomic island XII were linked genes
possibly involved with sialic acid catabolism: N-acetylneur-
aminate lyase (NanA, VV2_0730), a TRAP transport sys-
tem possibly involved with sialic acid transport (VV2_0731
to VV2_0733), N-acetylmannosamine-6-phosphate 2-epi-
merase (NanE, VV2_0734), N-acetylmannosamine kinase
(NanK, VV2_0735), and N-acetylglucosamine-6-phosphate
deacetylase (NagA, VV2_0736). Because the nagB gene


Table 2 Clade 2-specific genes
Tag Product
VV1_0456 putative transcriptional regulator
VV1_0457 hypothetical protein VV1_0457
0458 hypothetical protein
0459 hypothetical protein
0465 exopolypl
hypothetical protein VV1
hypothetical protein VV1
hypothetical protein VV1
hypothetical protein VV1
chromosome segregation
1095 Serine/threonine protein kinase
1518 3-methyladenine DNA glycosylase
1751 hypothetical protein VV1_1751
2031 Type I restriction enzyme EcoEI M
2037 Type I restriction enzyme EcoEI R
2038 transcriptional regulator
2061 GGDEF family protein
2114 OMPH_PHOPR porin-like protein H precursor
2115 hypothetical protein VV1_2115
methyl-accepting chemotaxis protein
hypothetical protein
ATPase involved in DNA repair
GGDEF family protein
GGDEF family protein
hypothetical protein
hypothetical protein
hypothetical protein
Flp pilus assembly protein CpaB
Flp pilus assembly protein
hypothetical protein VV1_2332
pilus assembly protein CpaE-like
Flp pilus assembly protein
Flp pilus assembly protein TadB
Flp pilus assembly protein
Flp pilus assembly protein TadD
hypothetical protein VV1
hypothetical protein VV1
hypothetical protein VV1_
2341 azoreductase
2401 super-integron integrase IntIA
2708 hypothetical protein VV1_2708
2748 response regulator
2758 amino acid transporter
2840 NhaP-type Na+/H+ and K+/H+
VV2_0073
methyl-accepting chemotaxis protein
hypothetical protein VV1_3144
alcohol dehydrogenase
anti-anti-sigma regulatory factor
VV2_0077
VV2_1363
anti-anti-sigma regulatory factor
anti-sigma regulatory factor
Serine phosphatase RsbU
FOG: CheY-like receiver
response regulator AraC-type DNA binding domain-containing
hypothetical protein VV2_0312
hypothetical protein VV2_0627
AraC-type DNA binding domain-containing
major facilitator superfamily permease
hypothetical protein VV2_0851
hypothetical protein VV2_0864
long-chain fatty acid ABC transporter
Mg2+ and Co2+ transporter
transcriptional regulator
multidrug resistance efflux pump
hypothetical protein
hypothetical protein
transcriptional regulator
hypothetical protein VV2_
glutathione synthetase
transcriptional regulator
Ca2+/H+ antiporter
hypothetical protein VV2_
hypothetical protein VV2_
hypothetical protein VV2_
Beta-glucosidase-related
DMT family permease
transcriptional regulator
VV2_1149

COG4965U (VV2_1200, glucosamine-6-P deaminase) is in the V.
COG4965U vulnificus core genome, the clade 2 strains and 99-738
COG5010U DP-B5 uniquely have the ability to assimilate exogen-
ous sialic acid into central metabolism as fructose-
COG4961U 6-phosphate, relative to the other clade 1 strains and
biotype 2 strains. However, V. vulnificus does not
acpD COG11821 encode a neuraminidase (NanH) which would liberate
COG4974L sialic acid from host components. Almagro-Moreno
and Boyd [41] had noted that sialic acid metabolism
COG3437KT was unique to bacteria that interacted with mammalian
hosts, either as pathogens or as commensals. Jeong et
COG0025P al. [42] recently constructed a nanA deletion in V vul-
nificus and confirmed that the ability to utilize exogen-
COG0840NT ous sialic acid was essential for virulence in
intraperitoneally inoculated iron dextran-treated mice,
COG1454C as well as cytotoxicity in cell culture assays. They
COG1366T focused analysis of nanA on a single V. vulnificus


Table 3 Genes common to V. vulnificus 99-738 DP-B5 and
clade 2 strains
Tag Product Gene COG
0251 hypothetical protein VV1_0251
0411 choline-glycine betaine transporter
0638 mannitol/fructose-specific phosphotransferase system IIA protein
0639 mannitol-1-phosphate 5-dehydrogenase
0640 mannitol repressor protein mtlR COG3722K
VV2_0542
DMT family permease
hypothetical protein VV1_0835
H+/gluconate symporter
sugar diacid utilization regulator
helicase-related protein
tellurite resistance protein-related
response regulator
putative transcriptional regulator
arylsulfatase A
methyl-accepting chemotaxis protein
manganese transporter NRAMP
VV2_0722 hypothetical protein VV2_0722
VV2_0729 transcriptional regulator
VV2_0730 dihydrodipicolinate synthase/N-acetylneuraminate lyase
VV2_0731 TRAP-type C4-dicarboxylate transport
VV2_0732 TRAP-type C4-dicarboxylate transport
VV2_0733 TRAP-type C4-dicarboxylate transport
VV2_0734 N-acetylmannosamine-6-phosphate 2-epimerase
VV2_0735 N-acetylmannosamine kinase
VV2_0736 N-acetylglucosamine-6-phosphate deacetylase
VV2_1035
diadenosine tetraphosphate hydrolase
arsenite efflux pump ACR3
transcriptional regulator
hypothetical protein VV2_0988
Alkyl sulfatase
ABC transporter permease
VV2_1090 hypothetical protein VV2_1090
VV2_1091 hypothetical protein VV2_1091
VV2_1092 hypothetical protein VV2_1092
VV2_1093 2-deoxy-D-gluconate 3-dehydrogenase
VV2_1094 galactose-1-phosphate uridylyltransferase
VV2_1095 UDP-glucose 4-epimerase
VV2_1096 Sulfate permease
VV2_1097 hypothetical protein VV2_1097
CBS domain-containing protein
VV2_1099 methyl-accepting chemotaxis protein
VV2_1100 ATPase component of various ABC-type transport system
VV2_1101 ABC-type dipeptide/oligopeptide/nickel transport system
VV2_1102 ABC-type dipeptide/oligopeptide/nickel transport system
VV2_1104 ABC-type dipeptide transport system
VV2_1105 hypothetical protein VV2_1105
VV2_1106 arylsulfatase A
VV2_1107 arylsulfatase regulator
108 arylsulfatase A
109 arylsulfatase A
110 hypothetical protein VV2_
259 hypothetical protein VV2_
403 GGDEF domain-containing protein
505 hypothetical protein VV2_
508 putative two-component
509 GGDEF family protein
510 response regulator
511 response regulator VieA
512 sensor kinase VieS
COG column (row alignment not preserved): COG0601EP, COG0641R,
COG1593G, COG 638G, COG3010G, COG1940KG, COG1820G, COG537FGR,
COG0798P, COG0640K, COG388R, COG2015R, COG1085C, COG1087M, COG0659P

strain and did not perform comparative genetics among
strains of different genotypes or virulence phenotypes. The
summation of these data regarding nanA is that our
comparative genomic sequencing correctly identified unique
virulence genes among different sets of V. vulnificus.
Another carbon source utilization pathway specific to the
clade 2 plus 99-738 DP-B5 strains but not in genomic island
XII is a complete mannitol catabolic pathway encoding the
mannitol/fructose-specific phosphotransferase system IIA
protein (mtlABC, VV1_0638), mannitol-1-phosphate
5-dehydrogenase (mtlD, VV1_0639), and a specific mannitol
repressor (mtlR, VV1_0640). The significance of these genes
to virulence is unknown. Interestingly, by examining 465
V. vulnificus strains, Drake et al. [22] previously
determined that the ability to ferment mannitol by
V. vulnificus was highly correlated with a strain being in
what we are calling clade 2. Tison et al. [14] reported that
biotype 2 strains were mannitol-negative. Our sequencing
data, albeit on a considerably smaller sample size of
strains, therefore corroborate the phenotypic analyses of
these two previous studies.

SNP analysis
In addition to the presence or absence of whole genes or
blocks of genes, detailed above, genetic variation among
the sequenced strains also consisted of nucleotide


polymorphisms. We used MAQ to identify SNPs present
in the newly sequenced genomes relative to the reference
genomes. The SNPs from each of the pairwise analyses
versus the reference genomes are listed in Additional Files
6, 7, 8, 9, 10, 11, 12, and 13, and the summary of the num-
bers of SNPs from each sequenced strain relative to the
reference genomes is shown in Table 4. In examining
SNPs, we did not exclude any sets of genes, such as puta-
tive mobile genetic elements, e.g., phages. It is interesting
that M06-24/O, which had the highest amount of coverage
relative to the reference genomes, had the lowest number
of SNPs (mean of 42,191 SNPs per reference genome)
compared with the other three strains (mean of 73,130
SNPs per reference genome). This likely reflects the fact
that M06-24/O is in the same clade as the reference
genomes.
The accuracy of the SOLiD-based SNPs in identifying
polymorphisms was verified by examining Sanger
sequencing of specific genomic regions of each of these
strains. Having examined 8.7 kb of Sanger-derived
sequence that contained SNPs identified from our
SOLiD sequencing, we determined that 126 of 128 SNPs
were accurately identified (98.4% accuracy).
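The validation figure is a simple proportion, and the arithmetic can be checked with a minimal Python sketch. The function name below is hypothetical and is not part of the original analysis; only the counts (126 confirmed of 128 examined) come from the text.

```python
def snp_validation_accuracy(confirmed: int, examined: int) -> float:
    """Return the percentage of examined SNP calls confirmed by Sanger sequencing."""
    if examined <= 0:
        raise ValueError("examined must be positive")
    if not 0 <= confirmed <= examined:
        raise ValueError("confirmed must lie between 0 and examined")
    return 100.0 * confirmed / examined


# Counts reported in the text: 126 of 128 SOLiD-derived SNPs were confirmed.
accuracy = snp_validation_accuracy(confirmed=126, examined=128)
print(f"{accuracy:.1f}% of examined SNPs confirmed")
```

With only 128 sites examined, this percentage is an estimate from a small validation sample rather than a genome-wide error rate.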
We then examined the distribution of nonredundant
SNPs among different sets of annotated ORFs using the
CMCP6 reference genome. It must be emphasized that
the vast majority of annotated ORFs have not been
experimentally verified; hence, such an analysis is con-
jectural. Of the 201,981 nonredundant SNPs in the
CMCP6 genome from all four sequenced strains,
177,464 fell within annotated ORFs (87.9%). This was
not unexpected since this figure approximates the
amount of the genome contained within annotated
ORFs [30]. However, other interesting trends were evi-
dent. There were highly significant differences in the
frequencies of SNPs between chromosomes 1 and 2 of
CMCP6. Among the annotated ORFs, there were 0.037
SNPs/base for chromosome 1 and 0.044 SNPs/base in
chromosome 2. Among the other sets of ORFs, there
were significantly more SNPs/base in the core genome
(0.043 SNPs/base) than in the total ORFs (0.040 SNPs/
base) (Figure 2). Rather than indicating that the core
genome is actually more variable among strains, this
difference most likely reflects the fact that the core
genome, by definition, was shared among all of the
sequenced strains and hence had more shared sequences in
which SNPs could be identified. In contrast, the lowest
rate of SNPs was among the clade 2-specific genes, with
only 0.019 SNPs/base. Conversely to the core genome, this
result would be expected because the clade 2-specific
genes are unique to and shared among the set of three
genetically related clade 2 strains and because only one
newly sequenced clade 2 strain, M06-24/O, contributed to
this particular SNP pool. The frequency of SNPs in the
clade 2 + 99-738 DP-B5 set of ORFs was 0.033 SNPs/base.
The frequency of SNPs among hypothetical proteins (0.037
SNPs/base) was significantly lower than that of the total
ORFs.
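The SNPs/base values discussed above can be illustrated with a short Python sketch. It is not the pipeline used in the study, which is not described at this level of detail. The `Orf` class, the function name, and the toy coordinates are hypothetical, and the sketch assumes 1-based inclusive ORF coordinates and non-overlapping ORFs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Orf:
    tag: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def snp_density(orfs: List[Orf], snp_positions: Iterable[int]) -> Tuple[Dict[str, float], float]:
    """Return per-ORF SNPs/base and the pooled SNPs/base over all ORFs."""
    positions = sorted(set(snp_positions))  # nonredundant SNP positions
    per_orf: Dict[str, float] = {}
    total_snps = 0
    total_bases = 0
    for orf in orfs:
        length = orf.end - orf.start + 1
        hits = sum(1 for pos in positions if orf.start <= pos <= orf.end)
        per_orf[orf.tag] = hits / length
        total_snps += hits
        total_bases += length
    pooled = total_snps / total_bases if total_bases else 0.0
    return per_orf, pooled


# Toy data (hypothetical): two 100-base ORFs and six SNP positions,
# two of which (150 and 400) fall outside any ORF.
orfs = [Orf("geneA", 1, 100), Orf("geneB", 201, 300)]
snps = [5, 50, 150, 250, 299, 400]
per_orf, pooled = snp_density(orfs, snps)
print(per_orf, pooled)

# Share of nonredundant SNPs inside annotated ORFs, using the counts in the text.
print(f"{100 * 177_464 / 201_981:.1f}% of SNPs fall within annotated ORFs")
```

Per-ORF values of this kind are what a box and whisker plot such as Figure 2 would summarize for each subset of genes, while the pooled value corresponds to a single SNPs/base figure per subset.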

Lineage-specific Expansions
Gu et al. [43] recently reported an analysis of numerous
Vibrio spp. to identify lineage-specific expansions (LSEs),
genes that have been duplicated within a species or
genotype. Some LSEs are specific to a single strain, while
others are present among varied strains across species. We
examined some of the LSEs present in the reference gen-
omes of V. vulnificus to determine if these loci are simi-
larly present in the newly sequenced V. vulnificus
genomes. We did not find a pattern to the presence or
absence of the LSEs examined. For example, VV1_3196
and VV2_0703 form a pair of LSE genes in CMCP6.
Neither of these genes has a homologue in YJ016 or any
of the newly sequenced V. vulnificus genomes. In con-
trast, VV1_2851 and VV2_0347 constitute a pair of LSEs
in CMCP6 that have homologues in YJ016 (VV1419 and
VVA0904). The VV1_2851/VV1419 pair of genes has
homologues in all of the four newly sequenced genomes,
while VV2_0347 and VVA0904 do not.

Discussion
This study is one of the first to use Applied Biosystems
SOLiD sequencing for genomic sequencing of bacteria.
Whole genome analysis has progressed considerably
since the publication of the first complete DNA sequence
of the pathogenic bacterium Haemophilus influenzae
[44]. Until recently, the wealth of complete genomes
available in public databases was decoded via the large-
scale industrialization of the Sanger dideoxy chain-termi-
nation sequencing method [45,46]. The prospect of
quickly and inexpensively resequencing large segments of
the human genome or whole genomes of populations or
species is driving development of a new generation of
sequencing technologies with impacts in microbiology,
functional genomics, ecology and evolutionary biology,
human health, and beyond [45,47-51]. In particular, bac-
terial sequencing has been advanced by the high through-
put, parallel format of the 454 Sequencer [51], the first
'next-generation' technology to de novo sequence and
assemble whole bacterial genomes including Mycoplasma
genitalium in a single machine run [52]. Bacterial com-
parative genomics has expanded rapidly owing to the
speed of the 454 Sequencer compared to Sanger sequen-
cing [53] as well as from a combination of the two tech-
nologies [54], while assessment of microbial diversity
from complex communities (metagenomics) [55]
has revealed insights into complex interactions such as


Table 4 Numbers of SNPs from each of the four sequenced genomes relative to the two chromosomes of the
reference genomes
[Table body not recovered. Legible row labels: 99-738 DP-B5, 99-520 DP-B8, ATCC 33149. Legible column labels: Chrom. 1 and Chrom. 2, each listed twice.]

mammalian obesity and the microbiome [56,57], the
ocean biosphere [58], and the role of microbes in colony
collapse disorder in honeybees [59].
More recently, 'second-generation' sequencing technologies
such as the Illumina GA2X Genome Analyzer (GA) and the
ABI SOLiD system have been released [51].
To date, these next generation sequencing technologies
generate shorter read lengths than Sanger sequencing,
which poses a difficulty for de novo sequence assembly
and defining large chromosomal rearrangements [49,51].
So far in prokaryotes, high quality draft sequences have
been assembled in the absence of Sanger sequencing by
combining the 454 and GA technologies [60-64]. Studholme
et al. [65] utilized the Illumina platform alone for
the de novo assembly of the draft genome sequence of
Pseudomonas syringae pathovar tabaci strain 11528,
revealing insights into the nature of type III protein-
mediated pathogenicity.
The improved throughput from the massively parallel
format of the new platforms (billions of bases in a single
run) is ideal for revealing patterns of genetic variation
among individuals by resequencing. For example, Srivat-
san et al. [66] employed Illumina sequencing to improve
the existing draft of the extensively studied model bac-
terium Bacillus subtilis, while also identifying poly-
morphisms between other well studied laboratory
strains and their isolates. Moreover, this method was
sensitive enough to identify typically difficult to isolate
suppressor mutations in a single strain [66]. Using the
same platform, whole-genome analysis of 12 isolates of
the monomorphic human pathogen Salmonella enterica
serovar Typhi revealed evolutionary loss of gene func-
tion consistent with the effects of genetic drift on a
small effective population size [67]. Resequencing of the
Caenorhabditis elegans N2 Bristol strain and SNP discovery
in another strain demonstrated the effectiveness
of this technology in eukaryotes [68], and single base
mutations in a mutant C. elegans strain were mapped,
avoiding traditional genetic mapping efforts [69].
As one of the newer second-generation sequencers
currently available (although 'third-generation'
single-molecule sequencers are set to be marketed in 2010),
the ABI SOLiD platform has been used more with
eukaryotes than prokaryotes. One of the first studies

focused on assessing cross-platform performance for
sequence detection of known mutations in C. elegans.
Comparable accuracy between GA and SOLiD for map-
ping the same C. elegans mutant strain as Sarin et al.
[69] was reported [70]. Similarly, comparable accuracy
was reported between 454, GA, and SOLiD methods for
comparing a mutant strain of yeast to a reference gen-
ome [71]. At present the utility of the SOLiD platform
is reflected in several resequencing studies in humans,
including haplotype analysis, breakpoint mapping in dis-
ease-associated chromosomal rearrangements, and poly-
morphism discovery in protein coding exons [72-74].
With bacteria, SOLiD sequencing has been limited to
verifying an E. coli reference strain sequence in conjunc-
tion with traditional sequencing [75], as well as rese-
quencing of Bacillus anthracis strains for rapid and
accurate forensic typing [76]. In our presently described
study, the SOLiD platform was successfully utilized for
rapid comparative genomic analysis of clade-specific and
core genome sequences of the opportunistic pathogen
V. vulnificus.
By examining the genomic DNA of each of four
V. vulnificus strains on one-fourth of a SOLiD slide, we
obtained 3.16 x 10^7 to 3.50 x 10^7 35-nt reads. This level
of sequencing yielded approximately 100-fold coverage
of each genome. Although the total numbers of reads
would have predicted over 200-fold coverage, there was
a significant amount of low complexity reads, as well as
reads that were unmappable to the reference genomes.
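The expected coverage mentioned above follows from reads x read length / genome size. The sketch below reproduces that arithmetic in Python. The genome size of about 5.1 Mb is an assumption made for illustration (it is not stated in this section), and the function name is hypothetical.

```python
def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Return the nominal fold coverage if every read mapped uniquely."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


ASSUMED_GENOME_SIZE = 5_100_000  # assumption: roughly 5.1 Mb for both chromosomes
READ_LENGTH = 35                 # nt, as stated in the text

# Lower and upper read counts reported in the text (3.16 x 10^7 and 3.50 x 10^7).
for n_reads in (31_600_000, 35_000_000):
    fold = expected_coverage(n_reads, READ_LENGTH, ASSUMED_GENOME_SIZE)
    print(f"{n_reads:,} reads -> about {fold:.0f}-fold nominal coverage")
```

Under this assumed genome size, both read counts give a nominal coverage above 200-fold, so the observed coverage of approximately 100-fold would correspond to roughly half of the reads being usable after low-complexity and unmappable reads are set aside.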
We identified sequences that are unique to the highly
virulent clade 2 strains. These 80 genes represent the
set that could contain virulence genes that are responsi-
ble for the ability of clade 2 strains to cause systemic
infection and death in subcutaneously inoculated iron
dextran-treated mice (Thiaville, P.C., et al., Infect.
Immun. submitted). Furthermore, we identified 61 addi-
tional genes that are common to the clade 2 strains and
an unusual highly virulent clade 1 strain but absent
from a typical attenuated clade 1 strain and a biotype 2
eel isolate. These 61 genes represent a very interesting
set that could contain generally clade 2-specific genes
that were acquired by a clade 1 strain and increased its
virulence to that of typical clade 2 strains. Among these
putative virulence genes were genomic island XII

identified by Cohen et al. [20], and most interesting was
a set of genes involved with sialic acid catabolism. Jeong
et al. [42] recently determined that the ability to utilize
sialic acid for metabolism was essential for virulence of
V. vulnificus. We are currently examining the possible
roles of several of these loci in virulence.

Figure 2 Distribution of SNPs relative to the CMCP6 genome and subsets of genes. Box and whisker plots of
the SNPs/base for each of the subsets of annotated genes are shown.
[Plot not recovered. Legible category labels: Hypothetical; Clade 2 + 99-738 DP-B5; Clade 2. Axis range: 0.000 to 0.120.]

At the time of our performing this genomic sequence
analysis, we had not performed virulence studies of
biotype 2 ATCC 33149 in our subcutaneously inoculated
iron dextran-treated mouse model for infection. However,
Amaro et al. [77] previously reported that ATCC
33149 was of modest virulence in a different mouse
model involving intraperitoneal infection. Based on our
results indicating that ATCC 33149 lacked the genes
shared among virulent clade 2 strains or clade 2 strains
plus virulent clade 1 99-738 DP-B5, we hypothesized
that ATCC 33149 would be attenuated for virulence in
our model. In fact, when administered at the standard
lethal dose of 1,000 CFU for virulent strains, ATCC
33149 caused only minimally detectable skin infection in
one of five mice. Furthermore, when administered at
100 times the typical lethal dose (10^5 CFU/mouse), skin
infection but no systemic infection or death ensued.
Therefore, our genomic analysis of ATCC 33149 correctly
predicted its attenuated virulence. It should be noted that
Amaro and Biosca [78] reported that some biotype 2
strains are virulent for mammals, so the attenuation of
ATCC 33149 was not a foregone conclusion.
Because phenotypic differences are not only rooted in
presence or absence of whole genes, but also nucleotide
polymorphisms, we generated a set of SNPs among the
shared sequences of the reference and newly sequenced
genomes (Table 4 and Additional Files 6, 7, 8, 9, 10, 11,
12, and 13). By examining Sanger-derived sequences for
a subset of SNPs, we determined that 98.4% of our
reported SNPs are accurate. Of the 128 SNPs examined,
only two in one gene of one strain were not confirmed
by Sanger sequencing.
Although the sample size of newly sequenced strains
was small and each strain is a single representative of a
unique genotype/virulence phenotype combination,
some interesting relationships in SNPs were observed.
Most interesting, the rate of SNPs was significantly
higher for genes encoded on chromosome 2 compared
with chromosome 1. Given that chromosome 1 of
Vibrio is believed to encode most of the essential genes
and that chromosome 2 is believed to have been
acquired exogenously [79], it is reasonable that the


highest rate of polymorphisms would occur in chromosome 2.
The number of SNPs between M06-24/O and the reference
genomes was much lower than those from the other three
genomes (Table 4), even though there were slightly more
genes identified in M06-24/O. Because M06-24/O is in the
same clade as the reference genomes, this result would be
expected. Significant differences were observed in the
frequencies of SNPs among nearly every subset of genes
examined, e.g., clade 2-specific, core genome, hypothetical
proteins (Figure 2). However, it must be noted that the
numbers of strains contributing to the SNP pool for these
subsets of genes differ between the sets. For example, the
core genome is shared among all six strains, so all four
newly sequenced strains contributed SNPs and could have
generated a higher frequency of SNPs. In contrast, for the
clade 2-specific genes, the only newly sequenced strain
contributing SNPs, by definition of the subset, was
M06-24/O.
By comparing the sequences shared among all six genomes,
we identified the core V. vulnificus genome consisting of
3,459 genes. Gu et al. [43] examined the
genomic sequences of all Vibrio species as of 2008 and
identified 1,882 genes common to the genus. We are pre-
sently examining the core V. vulnificus genome to deduce
possible metabolic and virulence characteristics of the
species. We identified 20 genes previously unreported in
V. vulnificus by using MAQ to compare the unmapped
reads to the V. cholerae N16961 genome. If the clade 1
or biotype 2 genomes possessed sequences with sufficient
similarity to the V. cholerae genome, we should have
been able to identify and assemble them exactly as we
did for the V. vulnificus reference genomes.
Most recently, Chun et al. [80] examined the genomes
of 23 V. cholerae strains collected over 98 years. Their
newly sequenced genomes were obtained using a combination
of Sanger and 454 sequencing. Like us, they based their
phylogenetic relationships primarily on presence or
absence of ORFs. Their analysis enabled the division of
that species into 12 lineages, with one comprising the O1
strains and the seventh pandemic comprising a nearly
identical clade. They determined that
horizontal gene transfer significantly contributed to the
evolution of the species.

Conclusions
SOLiD sequencing of multiple bacterial genomes of
V. vulnificus and subsequent comparative genomic analysis
identified numerous genes that are common to the
most virulent strains yet lacking from attenuated strains
for which genomic DNA sequence data are available.
These candidate virulence genes include those encoding Flp
pili and GGDEF proteins, as well as genomic island XII.
Sialic acid catabolism
was similarly identified as a potential contributory factor

in molecular pathogenesis. These intriguing results will
likely lead to more thorough understanding of molecular
pathogenesis of V. vulnificus.

Methods
V. vulnificus strains
Each of the four V. vulnificus strains used for genomic
sequencing was chosen to represent a specific combination
of genotype and virulence phenotype. M06-24/O is
a typical biotype 1, clade 2 strain that is highly virulent
in our subcutaneously inoculated, iron dextran-treated
mouse model. 99-520 DP-B8 is a typical biotype 1, clade
1 strain that is attenuated in our mouse model in that it
can cause skin infection, but not systemic infection or
death. 99-738 DP-B5 is an unusual biotype 1, clade 1
strain in that it is fully virulent in our mouse model.
ATCC 33149 is a biotype 2 strain that is highly attenu-
ated for virulence in our mouse model. These data are
summarized in Table 1.

SOLiD DNA sequencing
Sequencing runs were done using cycled ligation sequencing
on a SOLiD Analyzer (Applied Biosystems, Beverly,
MA) at the Interdisciplinary Center for Biotechnology
Research at the University of Florida. Approximately 3 to
5 µg of purified bacterial genomic DNA was sheared into
short fragments of 80 to 100 bp with the Covaris S2 system
according to the AB protocol. The sheared DNA was purified
using a Qiagen MinElute reaction cleanup kit. The
purified sheared fragments were made blunt-ended with
the Epicenter End-It DNA end-repair kit and subsequently
ligated to short SOLiD P1 and P2 adapters (P1,
GAACCCGGGGCAG-3'), which provide the primary
sequences for both amplification and sequencing of the
sample library fragments. Adapter-ligated DNA was then
purified using the Agencourt kit. The reaction conditions
were optimized to selectively bind DNA of 100 bp and larger.
At this point, DNA was nick-translated and resolved on a
4% agarose gel, from which the 120- to 180-bp fragments
were excised. The fractionated DNA was subjected to 8 to
10 cycles of PCR amplification. The number of PCR cycles
needed for amplification was determined by the ability to
visualize the amplified product on a 2.2% Lonza flash gel.
The amplified PCR products were purified and then
quantified using an Agilent 2100 bioanalyzer.
In preparation for sequencing, the DNA fragments
were clonally amplified by emulsion PCR using 1.6
billion 1-µm beads with P1 primer covalently attached
to the surface. Emulsions were broken with butanol, and
ePCR beads were enriched for template-positive beads
by hybridization with P2-coated capture beads (SOLiD
reagent, Applied Biosystems). Template-enriched beads
were extended at the 3' end in the presence of terminal
transferase and 3' bead linker. About 60 million beads
with clonally amplified DNA were then deposited onto
one-fourth of a derivatized glass surface of a 25 mm x
75 mm SOLiD slide. The slide was then loaded onto a
SOLiD instrument, and the 35-base sequences were
obtained according to the manufacturer's protocol.

DNA sequence data management
The colorspace reads from SOLiD sequencing were
aligned to the genomes of V. vulnificus strains CMCP6
(GenBank accession numbers AE016795 and AE016796)
and YJ016 (GenBank accession numbers BA000037,
BA000038, and AP005352) using MAQ [27]. Reads from
each of the four sample strains were mapped to the two
reference sequences separately. Reads unmapped in both
reference genomes were identified. Mapped reads were
used to develop a consensus sequence for each of the
four strains. For each strain relative to the two reference
sequences, a gene was determined to be absent when
the average depth of coverage over the open reading
frame was less than 5X. Consensus sequences were also
used to generate a list of SNPs among the six strains of
V. vulnificus using the MAQ cns2snp command [27].
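The presence/absence rule described above (a gene is called absent when the average depth of coverage over its open reading frame is below 5X) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: it assumes per-base read depth has already been extracted from the alignment into a list, uses 1-based inclusive ORF coordinates, and all names and toy values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def mean_depth(depth: Sequence[int], start: int, end: int) -> float:
    """Mean per-base read depth over a 1-based, inclusive interval."""
    if start < 1 or end > len(depth) or start > end:
        raise ValueError("interval lies outside the reference")
    window = depth[start - 1:end]
    return sum(window) / len(window)


def call_presence(
    depth: Sequence[int],
    orfs: Dict[str, Tuple[int, int]],
    min_mean_depth: float = 5.0,
) -> Dict[str, bool]:
    """Map each ORF tag to True (present) or False (absent, mean depth < threshold)."""
    return {
        tag: mean_depth(depth, start, end) >= min_mean_depth
        for tag, (start, end) in orfs.items()
    }


# Toy reference of 900 bases: well covered, poorly covered, moderately covered.
depth = [12] * 300 + [1] * 300 + [7] * 300
orfs = {
    "orf_present": (1, 300),    # mean depth 12
    "orf_absent": (301, 600),   # mean depth 1
    "orf_edge": (551, 650),     # half at depth 1, half at depth 7: mean 4
}
print(call_presence(depth, orfs))
```

Because the rule uses an average, an ORF that is only partly covered (such as `orf_edge` in the toy data) can fall on either side of the threshold depending on how much of it is covered and how deeply.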
Reads with low-complexity characteristics, defined as
containing a homopolymer run of at least 5 bases, at
least four repeats of the same dinucleotide in a row, or
at least four repeats of the same trinucleotide in a row,
were removed from the data set before further analysis.
While these reads may represent true genomic regions,
the difficulty in assigning them to a particular genomic
region limits their value. This is an inherent problem
with low complexity genomes and short read data.
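The three low-complexity criteria given above translate directly into a regular expression. The following Python sketch is illustrative only: it assumes reads are available as base-space strings over A, C, G, and T (the SOLiD reads themselves are in colorspace, and the text does not state in which representation the filter was applied), and the function names are hypothetical.

```python
import re
from typing import Iterable, List

# One alternative per criterion in the text.
_LOW_COMPLEXITY = re.compile(
    r"([ACGT])\1{4,}"        # homopolymer run of at least 5 bases
    r"|([ACGT]{2})\2{3,}"    # same dinucleotide at least 4 times in a row
    r"|([ACGT]{3})\3{3,}"    # same trinucleotide at least 4 times in a row
)


def is_low_complexity(read: str) -> bool:
    """Return True if the read meets any of the three low-complexity criteria."""
    return _LOW_COMPLEXITY.search(read.upper()) is not None


def filter_reads(reads: Iterable[str]) -> List[str]:
    """Keep only reads that are not flagged as low complexity."""
    return [read for read in reads if not is_low_complexity(read)]


reads = [
    "ACGTAAAAAGT",            # homopolymer AAAAA
    "ACACACACGT",             # AC repeated four times
    "ACGACGACGACGT",          # ACG repeated four times
    "ACGTTGCAAGCTTGCATGCA",   # none of the three patterns
]
print(filter_reads(reads))
```

A single compiled pattern keeps the three criteria in one place, which makes it straightforward to tighten or relax a threshold by changing one repeat count.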
Reads unmapped in both reference sequences were then
compared to the V. cholerae N16961 reference genome
using MAQ [27]. Windows of 100 nucleotides in the
V. cholerae genome with a read depth of five or more
were identified. Regions where five or more windows
occurred in tandem were retained, while those with cov-
erage less than five were discarded. Reads that initially
mapped to a lower density area of the V. cholerae
genome were re-examined for possible matches to the
tandem windows.
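The window-based screen described above can be sketched in Python. The sketch is not the authors' implementation and makes two assumptions that the text leaves open: windows are non-overlapping, and a window counts as covered when its mean read depth is five or more. Function names and the toy depth profile are hypothetical.

```python
from typing import List, Sequence, Tuple


def covered_windows(depth: Sequence[int], window: int = 100, min_depth: float = 5.0) -> List[bool]:
    """Flag each non-overlapping window whose mean depth is at least min_depth."""
    flags = []
    for i in range(0, len(depth) - window + 1, window):
        chunk = depth[i:i + window]
        flags.append(sum(chunk) / window >= min_depth)
    return flags


def tandem_regions(flags: List[bool], window: int = 100, min_run: int = 5) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) of runs of >= min_run covered windows."""
    regions = []
    run_start = None
    # A trailing False closes any run that reaches the end of the list.
    for idx, ok in enumerate(flags + [False]):
        if ok and run_start is None:
            run_start = idx
        elif not ok and run_start is not None:
            if idx - run_start >= min_run:
                regions.append((run_start * window + 1, idx * window))
            run_start = None
    return regions


# Toy depth profile: a 500-base covered stretch (kept) and a 300-base one (discarded).
depth = [0] * 200 + [6] * 500 + [0] * 100 + [6] * 300 + [0] * 100
flags = covered_windows(depth)
print(tandem_regions(flags))
```

In this toy profile, only the run of five covered windows is retained; the shorter run of three covered windows is dropped, mirroring the rule that regions with fewer than five tandem windows were discarded.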
In parallel to the V. cholerae exploration, the unmapped
reads were examined for similarity to V. vulnificus biotype
2 plasmids pR99 (accession # AM293858), pC4602-1
(accession # AM293859), and pC4602-2 (accession #
AM293860) using MAQ and the same criteria as above.

Bioinformatic analysis
Functional and annotation analyses of the
V. vulnificus YJ016 and CMCP6 genes were done using
the Pathway Tools Omics Viewer from the BioCyc platform
[81] and the SEED database [82].

Additional material

Additional file 1: Table S1: Coverage of the V. vulnificus biotype 2
plasmids by newly sequenced reads SOLD sequencing reads of each
of the four newly sequenced genomes were matched with the three
plasmids of V vulnificus biotype 2 using MAQ The size of each plasmid is
shown "Numbers of nucleotides of the reference plasmid with less than
10-fold coverage by 35-nt reads from the newly sequenced genome
*Number of nucleotides that were matched by virtue of having 10-fold
or greater coverage depth ..Percent of reference plasmid matched to
the newly sequenced genome
Additional file 2: Table S2: Identification of ORFs in newly
sequenced V. vulnificus genomes by matching with the V. cholerae
NC16961 genome SOLD sequencing reads of each of the four newly
sequenced genomes were matched with the V cho/erae NC16961 using
MAQ, as described in the Methods V vulnificus strains' M06 M06-24/0,
B5 -99-738 DP-B5, B8 -99-520 DP-B8, ATCC -ATCC 33149 Genes were
considered matched if there was fivefold or higher depth of coverage
over five tandem 100-nt windows
Additional file 3: Table S3A: Matches of CMCP6 genes from the
YJ016 reference genome and the four newly sequenced genomes
The CMCP6 genes are shown by their tag, gene name (if annotated), and
product (if known) Matching of each gene with the newly sequenced
genomes was determined using MAQ, as described in the Methods
Matches with the YJ016 genome were obtained using GenPlot at http//
wwwncbi nim nihgov using default parameters Genes from each
queried genome that were not matched to the CMCP6 genome are
indicated with a CMCP6 gene is missing rom all of the other five
genomes, it is indicated with an x in the CMCP6-Specific column V
vulnificus strains M06 M06-24/, B5 99738 DP-B5, B8 99-520 DPB8,
ATCC -ATCC 33149
Additional file 4: Table S3B: Matches of YJ016 genes from the
CMCP6 reference genome and the four newly sequenced genomes
The YJ016 genes are shown by their tag, gene name (if annotated), and
product (if known) Matching of each gene with the newly sequenced
genomes was determined using MAQ, as described in the Methods
Matches with the CMCP6 genome were obtained using GenPlot at
http//wwwncbi nim nih gov using default parameters Genes from each
queried genome that were not matched to the YJ016 genome are
indicated with an X If a YJ016 gene is missing from all of the other five
genomes, it is indicated with an x in the YJ016-Specific column V
vulnificus strains M06 -M06-24/0, B5 -99-738 DP-B5, B8 -99-520 DP-B8,
Additional file 5: Table 54: The core V. vulnificus genome Genes that
were present in the two reference genomes and each of the four newly
sequenced genomes are shown using the CMCP6 tag, product, gene
name, and cog
Additional file 6: Table S5A: SNP analysis of V. vulnificus M06-24/O
compared with the CMCP6 reference genomes MAQ was used to
identify SNPs from the SOLD sequencing reads from M06-24/0
compared with the CMCP6 reference genome, as described in the
Methods Pos Position of the nucleotide in the genomic element Ref -
Reference base in thereeference genome Con Consensus base in the
newly sequenced genome Con QS Consensus Quality Score Read
depth Depth of coverage at the chosen nucleotide Avg hits Average
number o hits of read covering the position HMQ Highest mapping
quality of reads covering the position MCQ Minimum consensus
quality in the third flanking region on each side of the site 2nd second
best call for the nucleotide LLR Log likelihood ratio of the second and
third best call 3rd Third best cal
Additional file 7: Table S5B: SNP analysis of V. vulnificus M06-24/O
compared with the YJ016 reference genome MAQ was used to
identify SNPs from the SOLD sequencing reads from M06-24/0
compared with the YJ016 reference genome, as described in the
Methods Column headings are as for Additional File 6, Table S5A
Additional file 8: Table S6A: SNP analysis of V. vulnificus 99-738 DP-
B5 compared with the CMCP6 reference genome MAQ was used to
identify SNPs from the SOLD sequencing reads from 99-738 DP-B5

Page 13 of 16

Gulig et al. BMC Genomics 2010, 11:512

compared with the CMCP6 reference genome, as described in the
Methods Column headings are as for Additional File 6, Table S5A
Additional file 9: Table S6B: SNP analysis of V. vulnificus 99-738 DP-B5
compared with the YJ016 reference genome. MAQ was used to
identify SNPs from the SOLiD sequencing reads from 99-738 DP-B5
compared with the YJ016 reference genome, as described in the
Methods. Column headings are as for Additional File 6, Table S5A.
Additional file 10: Table S7A: SNP analysis of V. vulnificus 99-520
DP-B8 compared with the CMCP6 reference genome. MAQ was used
to identify SNPs from the SOLiD sequencing reads from 99-520 DP-B8
compared with the CMCP6 reference genome, as described in the
Methods. Column headings are as for Additional File 6, Table S5A.
Additional file 11: Table S7B: SNP analysis of V. vulnificus 99-520 DP-B8
compared with the YJ016 reference genome. MAQ was used to
identify SNPs from the SOLiD sequencing reads from 99-520 DP-B8
compared with the YJ016 reference genome, as described in the
Methods. Column headings are as for Additional File 6, Table S5A.
Additional file 12: Table S8A: SNP analysis of V. vulnificus ATCC
33149 compared with the CMCP6 reference genome. MAQ was used
to identify SNPs from the SOLiD sequencing reads from ATCC 33149
compared with the CMCP6 reference genome, as described in the
Methods. Column headings are as for Additional File 6, Table S5A.
Additional file 13: Table S8B: SNP analysis of V. vulnificus ATCC
33149 compared with the YJ016 reference genome. MAQ was used
to identify SNPs from the SOLiD sequencing reads from ATCC 33149
compared with the YJ016 reference genome, as described in the
Methods. Column headings are as for Additional File 6, Table S5A.
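
As an illustration only (not part of the original analysis), the column headings defined for Table S5A can be read as the fields of a simple record. The Python sketch below assumes that each SNP row is a whitespace-delimited line made up of a leading genomic-element name followed by the eleven columns in the order listed above. The names SnpRecord, parse_snp_line, and passes_filter, the example row, and the filter thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpRecord:
    element: str      # genomic element name (assumed leading column)
    pos: int          # Pos: position of the nucleotide in the element
    ref: str          # Ref: reference base
    con: str          # Con: consensus base in the newly sequenced genome
    con_qs: int       # Con QS: consensus quality score
    read_depth: int   # Read depth: depth of coverage at the nucleotide
    avg_hits: float   # Avg hits: average number of hits of covering reads
    hmq: int          # HMQ: highest mapping quality of covering reads
    mcq: int          # MCQ: minimum consensus quality in the flanking region
    second: str       # 2nd: second best call
    llr: int          # LLR: log likelihood ratio, second vs. third best call
    third: str        # 3rd: third best call


def parse_snp_line(line: str) -> SnpRecord:
    """Parse one whitespace-delimited SNP row (assumed 12-column layout)."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, found {len(fields)}")
    return SnpRecord(
        element=fields[0],
        pos=int(fields[1]),
        ref=fields[2],
        con=fields[3],
        con_qs=int(fields[4]),
        read_depth=int(fields[5]),
        avg_hits=float(fields[6]),
        hmq=int(fields[7]),
        mcq=int(fields[8]),
        second=fields[9],
        llr=int(fields[10]),
        third=fields[11],
    )


def passes_filter(rec: SnpRecord, min_depth: int = 10, min_con_qs: int = 30) -> bool:
    """Keep a SNP only if depth and consensus quality meet the (hypothetical) cutoffs."""
    return rec.read_depth >= min_depth and rec.con_qs >= min_con_qs


# Hypothetical example row, invented for illustration; not taken from the tables.
example = parse_snp_line("chr1\t1024\tA\tG\t60\t98\t1.00\t63\t45\tR\t30\tT")
```

A record built this way can be screened with passes_filter before SNPs are counted or compared across strains; the cutoffs would need to be chosen for the data at hand.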

Acknowledgements
We thank Robert Edwards for the initial BLAST analysis of the SOLiD
sequencing data. We thank Patrick Thiaville for critical review of this
manuscript. This work was supported by funding from the University of Florida
Emerging Pathogens Institute, the University of Florida Opportunity Fund,
and Florida Sea Grant. Publication of this article was funded in part by the
University of Florida Open-Access Publishing Fund.

Author details
1Department of Molecular Genetics and Microbiology, University of Florida,
Gainesville, Florida, USA. 2Department of Microbiology and Cell Science,
University of Florida, Gainesville, Florida, USA. 3Department of Food Science
and Human Nutrition, University of Florida, Gainesville, Florida, USA.
4Department of Genetics, University of Melbourne, 3010 Australia.

Authors' contributions
PAG planned and coordinated the research, analyzed data, and wrote the
manuscript. VDC contributed to data analysis and writing. ACW contributed
to planning and writing. BW performed MAQ data analysis and planning.
MTS contributed to the writing of the manuscript. LMM helped plan the study,
planned analyses, and contributed to the writing of the manuscript. All
authors read and approved the final manuscript.

Received: 15 March 2010 Accepted: 24 September 2010
Published: 24 September 2010

References
1. Gulig PA, Bourdage KL, Starks AM: Molecular Pathogenesis of Vibrio
vulnificus. J Microbiol 2005, 43:118-131.
2. Oliver JD, Jones MK: Vibrio vulnificus: Disease and pathogenesis. Infect
Immun 2009, 77:1723-1733.
3. Simpson LM, White VK, Zane SF, Oliver JD: Correlation between virulence
and colony morphology in Vibrio vulnificus. Infect Immun 1987,
4. Wright AC, Simpson LM, Oliver JD, Morris JG Jr: Phenotypic evaluation of
acapsular transposon mutants of Vibrio vulnificus. Infect Immun 1990,
58:1769-1773.

5. Liu M, Alice AF, Naka H, Crosa JH: The HlyU protein is a positive
regulator of rtxA1, a gene responsible for cytotoxicity and virulence
in the human pathogen Vibrio vulnificus. Infect Immun 2007, 75
6. Lee JH, Kim MW, Kim BS, Kim SM, Lee BC, Kim TS, Choi SH: Identification
and characterization of the Vibrio vulnificus rtxA essential for cytotoxicity
in vitro and virulence in mice. J Microbiol 2007, 45:146-152.
7. Kim YR, Lee SE, Kook H, Yeom JA, Na HS, Kim SY, Chung SS, Choy HE,
Rhee JH: Vibrio vulnificus RTX toxin kills host cells only after contact of
the bacteria with host cells. Cell Microbiol 2008, 10:848-862.
8. Litwin CM, Rayback TW, Skinner J: Role of catechol siderophore synthesis
in Vibrio vulnificus virulence. Infect Immun 1996, 64:2834-2838.
9. Wright AC, Simpson LM, Oliver JD: Role of iron in the pathogenesis of
Vibrio vulnificus infections. Infect Immun 1981, 34:503-507.
10. Paranjpye RN, Strom MS: A Vibrio vulnificus type IV pilin contributes to
biofilm formation, adherence to epithelial cells, and virulence. Infect
Immun 2005, 73:1411-1422.
11. Paranjpye RN, Lara JC, Pepe JC, Pepe CM, Strom MS: The type IV leader
peptidase/N-methyltransferase of Vibrio vulnificus controls factors
required for adherence to HEp-2 cells and virulence in iron-overloaded
mice. Infect Immun 1998, 66:5659-5668.
12. Kim YR, Rhee JH: Flagellar basal body flg operon as a virulence
determinant of Vibrio vulnificus. Biochem Biophys Res Commun 2003,
13. Lee JH, Rho JB, Park KJ, Kim CB, Han YS, Choi SH, Lee KH, Park SJ: Role of
flagellum and motility in pathogenesis of Vibrio vulnificus. Infect Immun
2004, 72:4905-4910.
14. Tison DL, Nishibuchi M, Greenwood JD, Seidler RJ: Vibrio vulnificus
biogroup 2: new biogroup pathogenic for eels. Appl Environ Microbiol
1982, 44:640-646.
15. Bisharat N, Agmon V, Finkelstein R, Raz R, Ben Dror G, Lerner L, Soboh S,
Colodner R, Cameron DN, Wykstra DL, Swerdlow DL, Farmer JJ Jr: Clinical,
epidemiological, and microbiological features of Vibrio vulnificus
biogroup 3 causing outbreaks of wound infection and bacteraemia in
Israel. Israel Vibrio Study Group. Lancet 1999, 354:1421-1424.
16. Nilsson WB, Paranjpye RN, DePaola A, Strom MS: Sequence polymorphism
of the 16S rRNA gene of Vibrio vulnificus is a possible indicator of strain
virulence. J Clin Microbiol 2003, 41:442-446.
17. Gonzalez-Escalona N, Jaykus LA, DePaola A: Typing of Vibrio vulnificus
strains by variability in their 16S-23S rRNA intergenic spacer regions.
Foodborne Pathog Dis 2007, 4:327-337.
18. Bisharat N, Cohen DI, Harding RM, Falush D, Crook DW, Peto T, Maiden MC:
Hybrid Vibrio vulnificus. Emerg Infect Dis 2005, 11:30-35.
19. Bisharat N, Cohen DI, Maiden MC, Crook DW, Peto T, Harding RM: The
evolution of genetic structure in the marine pathogen, Vibrio vulnificus.
Infect Genet Evol 2007, 7:685-693.
20. Cohen AL, Oliver JD, DePaola A, Feil EJ, Boyd EF: Emergence of a virulent
clade of Vibrio vulnificus and correlation with the presence of a
33-kilobase genomic island. Appl Environ Microbiol 2007, 73:5553-5565.
21. Rosche TM, Yano Y, Oliver JD: A rapid and simple PCR analysis indicates
there are two subgroups of Vibrio vulnificus which correlate with clinical
or environmental isolation. Microbiol Immunol 2005, 49:381-389.
22. Drake SL, Whitney B, Levine JF, DePaola A, Jaykus LA: Correlation of
mannitol fermentation with virulence-associated genotypic
characteristics in Vibrio vulnificus isolates from oysters and water
samples in the Gulf of Mexico. Foodborne Pathog Dis 2010, 7:97-101.
23. Starks AM, Schoeb TR, Tamplin ML, Parveen S, Doyle TJ, Bomeisl PE,
Escudero GM, Gulig PA: Pathogenesis of infection by clinical and
environmental strains of Vibrio vulnificus in iron dextran-treated mice.
Infect Immun 2000, 68:5785-5793.
24. Starks AM, Bourdage KL, Thiaville PC, Gulig PA: Use of a marker plasmid to
examine growth and death of Vibrio vulnificus in infected mice. Mol
Microbiol 2006, 61:310-323.
25. DePaola A, Nordstrom JL, Dalsgaard A, Forslund A, Oliver JD, Bates T,
Bourdage KL, Gulig PA: Analysis of Vibrio vulnificus from market oysters
and septicemia cases for virulence markers. Appl Environ Microbiol 2003,
26. Biosca EG, Llorens H, Garay E, Amaro C: Presence of a capsule in Vibrio
vulnificus biotype 2 and its relationship to virulence for eels. Infect
Immun 1993, 61:1611-1618.


27. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling
variants using mapping quality scores. Genome Res 2008, 18:1851-1858.
28. Lee CT, Amaro C, Wu KM, Valiente E, Chang YF, Tsai SF, Chang CH, Hor LI: A
common virulence plasmid in biotype 2 Vibrio vulnificus and its
dissemination aided by a conjugal plasmid. J Bacteriol 2008,
190:1638-1648.
29. Davidson LS, Oliver JD: Plasmid carriage in Vibrio vulnificus and other
lactose-fermenting marine vibrios. Appl Environ Microbiol 1986,
30. Chen CY, Wu KM, Chang YC, Chang CH, Tsai HC, Liao TL, Liu YM, Chen HJ,
Shen AB, Li JC, Su TL, Shao CP, Lee CT, Hor LI, Tsai SF: Comparative
genome analysis of Vibrio vulnificus, a marine pathogen. Genome Res
2003, 13:2577-2587.
31. Labbate M, Case PJ, Stokes HW: The integron/gene cassette system: an
active player in bacterial adaptation. Methods Mol Biol 2009, 532:103-125.
32. Mazel D, Dychinco B, Webb VA, Davies J: A distinctive class of integron in
the Vibrio cholerae genome. Science 1998, 280:605-608.
33. Cotter PA, Stibitz S: c-di-GMP-mediated regulation of virulence and
biofilm formation. Curr Opin Microbiol 2007, 10:17-23.
34. Kachlany SC, Planet PJ, DeSalle R, Fine DH, Figurski DH, Kaplan JB: flp-1, the
first representative of a new pilin gene subfamily, is required for
non-specific adherence of Actinobacillus actinomycetemcomitans. Mol
Microbiol 2001, 40:542-554.
35. Roberts DD, Ginsburg V: Sulfated glycolipids and cell adhesion. Arch
Biochem Biophys 1988, 267:405-415.
36. Bhat S, Spitalnik SL, Gonzalez-Scarano F, Silberberg DH: Galactosyl
ceramide or a derivative is an essential component of the neural
receptor for human immunodeficiency virus type 1 envelope
glycoprotein gp120. Proc Natl Acad Sci USA 1991, 88:7131-7134.
37. Hannah JH, Menozzi FD, Renauld G, Locht C, Brennan MJ: Sulfated
glycoconjugate receptors for the Bordetella pertussis adhesin filamentous
hemagglutinin (FHA) and mapping of the heparin-binding domain on
FHA. Infect Immun 1994, 62:5010-5019.
38. Kamisago S, Iwamori M, Tai T, Mitamura K, Yazaki Y, Sugano K: Role of
sulfatides in adhesion of Helicobacter pylori to gastric cancer cells. Infect
Immun 1996, 64:624-628.
39. Hoffman JA, Badger JL, Zhang Y, Huang SH, Kim KS: Escherichia coli K1 aslA
contributes to invasion of brain microvascular endothelial cells in vitro
and in vivo. Infect Immun 2000, 68:5062-5067.
40. Bryant RG, Jarvis J, Janda JM: Use of sodium dodecyl sulfate-polymyxin
B-sucrose medium for isolation of Vibrio vulnificus from shellfish. Appl
Environ Microbiol 1987, 53:1556-1559.
41. Almagro-Moreno S, Boyd EF: Insights into the evolution of sialic acid
catabolism among bacteria. BMC Evol Biol 2009, 9:118.
42. Jeong HG, Oh MH, Kim BS, Lee MY, Han HJ, Choi SH: The capability of
catabolic utilization of N-acetylneuraminic acid, a sialic acid, is essential
for Vibrio vulnificus pathogenesis. Infect Immun 2009, 77:3209-3217.
43. Gu J, Neary J, Cai H, Moshfeghian A, Rodriguez SA, Lilburn TG, Wang Y:
Genomic and systems evolution in Vibrionaceae species. BMC Genomics
2009, 10(Suppl 1):S11.
44. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF,
Kerlavage AR, Bult C, Tomb JF, Dougherty BA, Merrick JM, McKenney K,
Sutton G, Fitzhugh W, Fields C, Gocayne JD, Scott J, Shirley R, Liu LI,
Glodek A, Kelley JM, Weidman JF, Phillips CA, Spriggs T, Hedblom E,
Cotton MD, Utterback TR, Hanna MC, Nguyen DT, Saudek DM, Brandon RC,
Fine LD, Fritchman JL, Fuhrmann JL, Geoghagen NSM, Gnehm CL,
McDonald LA, Small KV, Fraser CM, Smith HO, Venter JC: Whole-genome
random sequencing and assembly of Haemophilus influenzae Rd. Science
1995, 269:496-512.
45. Hall N: Advanced sequencing technologies and their wider impact in
microbiology. J Exp Biol 2007, 210:1518-1525.
46. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating
inhibitors. Proc Natl Acad Sci USA 1977, 74:5463-5467.
47. Hudson ME: Sequencing breakthroughs for genomic ecology and
evolutionary biology. Mol Ecol Resour 2008, 8:3-17.
48. Mardis ER: The impact of next-generation sequencing technology on
genetics. Trends Genet 2008, 24:133-141.
49. Morozova O, Marra MA: Applications of next-generation sequencing
technologies in functional genomics. Genomics 2008, 92:255-264.

50. Pettersson E, Lundeberg J, Ahmadian A: Generations of sequencing
technologies. Genomics 2009, 93:105-111.
51. Rothberg JM, Leamon JH: The development and impact of 454
sequencing. Nat Biotechnol 2008, 26:1117-1124.
52. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J,
Braverman MS, Chen YJ, Chen ZT, Dewell SB, Du L, Fierro JM, Gomes XV,
Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI,
Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM,
Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP,
Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT,
Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A,
Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu PG, Begley RF,
Rothberg JM: Genome sequencing in microfabricated high-density
picolitre reactors. Nature 2005, 437:376-380.
53. Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE,
Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J,
Bentley SD, Post JC, Ehrlich GD, Hu FZ: Comparative genomic analyses of
seventeen Streptococcus pneumoniae strains: insights into the
pneumococcal supragenome. J Bacteriol 2007, 189:8186-8195.
54. Adams MD, Goglin K, Molyneaux N, Hujer KM, Lavender H, Jamison JJ,
MacDonald IJ, Martin KM, Russo T, Campagnari AA, Hujer AM, Bonomo RA,
Gill SR: Comparative genome sequence analysis of multidrug-resistant
Acinetobacter baumannii. J Bacteriol 2008, 190:8053-8064.
55. Snyder LAS, Loman N, Pallen MJ, Penn CW: Next-generation sequencing -
the promise and perils of charting the great microbial unknown.
Microbial Ecology 2009, 57:1-3.
56. Ley RE, Turnbaugh PJ, Klein S, Gordon JI: Microbial ecology: Human gut
microbes associated with obesity. Nature 2006, 444:1022-1023.
57. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE,
Sogin ML, Jones WJ, Roe BA, Affourtit JP, Egholm M, Henrissat B, Heath AC,
Knight R, Gordon JI: A core gut microbiome in obese and lean twins.
Nature 2009, 457:480-484.
58. Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR,
Arrieta JM, Herndl GJ: Microbial diversity in the deep sea and the
underexplored "rare biosphere". Proc Natl Acad Sci USA 2006,
59. Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD, Moran NA,
Quan PL, Briese T, Hornig M, Geiser DM, Martinson V, vanEngelsdorp D,
Kalkstein AL, Drysdale A, Hui J, Zhai JH, Cui LW, Hutchison SK, Simons JF,
Egholm M, Pettis JS, Lipkin WI: A metagenomic survey of microbes in
honey bee colony collapse disorder. Science 2007, 318:283-287.
60. Aury JM, Cruaud C, Barbe V, Rogier O, Mangenot S, Samson G, Poulain J,
Anthouard V, Scarpelli C, Artiguenave F, Wincker P: High quality draft
sequences for prokaryotic genomes using a mix of new sequencing
technologies. BMC Genomics 2008, 9:11.
61. Loman NJ, Pallen MJ: XDR-TB genome sequencing: a glimpse of the
microbiology of the future. Future Microbiol 2008, 3:111-113.
62. Loman NJ, Snyder LAS, Linton JD, Langdon R, Lawson AJ, Weinstock GM,
Wren BW, Pallen MJ: Genome sequence of the emerging pathogen
Helicobacter canadensis. J Bacteriol 2009, 191:5566-5567.
63. Qi W, Kaser M, Roltgen K, Yeboah-Manu D, Pluschke G: Genomic diversity
and evolution of Mycobacterium ulcerans revealed by next-generation
sequencing. PLoS Pathogens 2009, 5:e1000580.
64. Reinhardt JA, Baltrus DA, Nishimura MT, Jeck WR, Jones CD, Dangl JL: De
novo assembly using low-coverage short read sequence data from the
rice pathogen Pseudomonas syringae pv. oryzae. Genome Res 2009,
65. Studholme DJ, Ibanez SG, MacLean D, Dangl JL, Chang JH, Rathjen JP: A
draft genome sequence and functional screen reveals the repertoire of
type III secreted proteins of Pseudomonas syringae pathovar tabaci
11528. BMC Genomics 2009, 10:19.
66. Srivatsan A, Han Y, Peng JL, Tehranchi AK, Gibbs R, Wang JD, Chen R:
High-precision, whole-genome sequencing of laboratory strains facilitates
genetic studies. PLoS Genet 2008, 4:14.
67. Holt KE, Parkhill J, Mazzoni C, Roumagnac P, Weill FX, Goodhead I, Rance R,
Baker S, Maskell DJ, Wain J, Dolecek C, Achtman M, Dougan G:
High-throughput sequencing provides insights into genome variation and
evolution in Salmonella Typhi. Nature Genet 2008, 40:987-993.
68. Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P,
Glasscock I, Hickenbotham M, Huang WC, Magrini VJ, Richt RJ, Sander SN,


Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl T, Wilson RK,
Mardis ER: Whole-genome sequencing and variant discovery in C.
elegans. Nat Methods 2008, 5:183-188.
69. Sarin S, Prabhu S, O'Meara MM, Pe'er I, Hobert O: Caenorhabditis elegans
mutant allele identification by whole-genome sequencing. Nat Methods
2008, 5:865-867.
70. Shen Y, Sarin S, Liu Y, Hobert O, Pe'er I: Comparing platforms for C.
elegans mutant identification using high-throughput whole-genome
sequencing. PLoS One 2008, 3:e4012.
71. Smith DR, Quinlan AR, Peckham HE, Makowsky K, Tao W, Woolf B, Shen L,
Donahue WF, Tusneem N, Stromberg MP, Stewart DA, Zhang L, Ranade SS,
Warner JB, Lee CC, Coleman BE, Zhang Z, McLaughlin SF, Malek JA,
Sorenson JM, Blanchard AP, Chapman J, Hillman D, Chen F, Rokhsar DS,
McKernan KJ, Jeffries TW, Marth GT, Richardson PM: Rapid whole-genome
mutational profiling using next-generation sequencing technologies.
Genome Res 2008, 18:1638-1642.
72. Antipova AA, Sokolsky TD, Clouser CR, Dimalanta ET, Hendrickson CL,
Kosnopo C, Lee CC, Ranade SS, Zhang L, Blanchard AP, McKernan KJ:
Polymorphism discovery in high-throughput resequenced
microarray-enriched human genomic loci. Journal of Biomolecular Techniques 2009,
73. Chen W, Ullmann R, Langnick C, Menzel C, Wotschofsky Z, Hu H, Doring A,
Hu Y, Kang H, Tzschach A, Hoeltzenbein M, Neitzel H, Markus S,
Wiedersberg E, Kistner G, van Ravenswaaij-Arts CM, Kleefstra T,
Kalscheuer VM, Ropers HH: Breakpoint analysis of balanced chromosome
rearrangements by next-generation paired-end sequencing. European
Journal of Human Genetics 2009, 18:539-543.
74. McKernan KJ, Peckham HE, Costa GL, McLaughlin SF, Fu YT, Tsung EF,
Clouser CR, Duncan C, Ichikawa JK, Lee CC, Zhang Z, Ranade SS,
Dimalanta ET, Hyland FC, Sokolsky TD, Zhang L, Sheridan A, Fu HN,
Hendrickson CL, Li B, Kotler L, Stuart JR, Malek JA, Manning JM,
Antipova AA, Perez DS, Moore MP, Hayashibara KC, Lyons MR, Beaudoin RE,
Coleman BE, Laptewicz MW, Sannicandro AE, Rhodes MD, Gottimukkala RK,
Yang S, Bafna V, Bashir A, MacBride A, Alkan C, Kidd JM, Eichler EE,
Reese MG, De la Vega FM, Blanchard AP: Sequence and structural
variation in a human genome uncovered by short-read, massively
parallel ligation sequencing using two-base encoding. Genome Res 2009,
75. Durfee T, Nelson R, Baldwin S, Plunkett G, Burland V, Mau B, Petrosino JF,
Qin X, Muzny DM, Ayele M, Gibbs RA, Csorgo B, Posfai G, Weinstock GM,
Blattner FR: The complete genome sequence of Escherichia coli DH10B:
Insights into the biology of a laboratory workhorse. J Bacteriol 2008,
190:2597-2606.
76. Cummings CA, Bormann Chung CA, Fang R, Barker M, Brzoska PM,
Williamson P, Beaudry JA, Matthews M, Schupp JM, Wagner DM,
Furtado MR, Keim P, Budowle B: Whole-genome typing of Bacillus
anthracis isolates by next-generation sequencing accurately and rapidly
identifies strain-specific diagnostic polymorphisms. Forensic Science
International Genetics Supplement Series 2009, 300-301.
77. Amaro C, Biosca EG, Fouz B, Toranzo AE, Garay E: Role of iron, capsule,
and toxins in the pathogenicity of Vibrio vulnificus biotype 2 for mice.
Infect Immun 1994, 62:759-763.
78. Amaro C, Biosca EG: Vibrio vulnificus biotype 2, pathogenic for eels, is
also an opportunistic pathogen for humans. Appl Environ Microbiol 1996,
62:1454-1457.
79. Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson PJ,
Haft DH, Hickey EK, Peterson JD, Umayam L, Gill SR, Nelson KE, Read TD,
Tettelin H, Richardson D, Ermolaeva MD, Vamathevan J, Bass S, Qin H,
Dragoi I, Sellers P, McDonald L, Utterback T, Fleishmann RD, Nierman WC,
White O: DNA sequence of both chromosomes of the cholera pathogen
Vibrio cholerae. Nature 2000, 406:477-483.
80. Chun J, Grim C, Hasan NA, Lee JH, Choi SY, Haley BJ, Taviani E, Jeon YS,
Kim DW, Lee JH, Brettin TS, Bruce DC, Challacombe JF, Detter JC, Han CS,
Munk AC, Chertkov O, Meincke L, Saunders E, Walters RA, Huq A, Nair GB,
Colwell RR: Comparative genomics reveals mechanism for short-term
and long-term clonal transitions in pandemic Vibrio cholerae. Proc Natl
Acad Sci USA 2009, 106:15442-15447.
81. Paley SM, Karp PD: The Pathway Tools cellular overview diagram and
Omics Viewer. Nucleic Acids Res 2006, 34:3771-3778.
82. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M,
de Crécy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S,
Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N,
Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H,
Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA,
Ruckert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O,
Vonstein V: The subsystems approach to genome annotation and its use
in the project to annotate 1000 genomes. Nucleic Acids Res 2005,
83. Mahmud ZH, Wright AC, Mandal SC, Dai J, Jones MK, Hasan M, Rashid MH,
Islam MS, Johnson JA, Gulig PA, Morris JG Jr, Ali A: Genetic
characterization of Vibrio vulnificus strains from tilapia aquaculture in
Bangladesh. Appl Environ Microbiol 2010, 76:4890-4895.
84. Vickery MC, Nilsson WB, Strom MS, Nordstrom JL, DePaola A: A real-time
PCR assay for the rapid determination of 16S rRNA genotype in Vibrio
vulnificus. J Microbiol Methods 2007, 68:376-384.

Cite this article as: Gulig et al.: SOLiD sequencing of four Vibrio
vulnificus genomes enables comparative genomic analysis and
identification of candidate clade-specific virulence genes. BMC Genomics
2010 11:512.



Full Text


RESEARCH ARTICLE Open Access

SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes

Paul A Gulig 1*, Valérie de Crécy-Lagard 2, Anita C Wright 3, Brandon Walts 1, Marina Telonis-Scott 1,4, Lauren M McIntyre 1

Abstract

Background: Vibrio vulnificus is the leading cause of reported death from consumption of seafood in the United States. Despite several decades of research on molecular pathogenesis, much remains to be learned about the mechanisms of virulence of this opportunistic bacterial pathogen. The two complete and annotated genomic DNA sequences of V. vulnificus belong to strains of clade 2, which is the predominant clade among clinical strains. Clade 2 strains generally possess higher virulence potential in animal models of disease compared with clade 1, which predominates among environmental strains. SOLiD sequencing of four V. vulnificus strains representing different clades (1 and 2) and biotypes (1 and 2) was used for comparative genomic analysis.

Results: Greater than 4,100,000 bases were sequenced of each strain, yielding approximately 100-fold coverage for each of the four genomes. Although the read lengths of SOLiD genomic sequencing were only 35 nt, we were able to make significant conclusions about the unique and shared sequences among the genomes, including identification of single nucleotide polymorphisms. Comparative analysis of the newly sequenced genomes to the existing reference genomes enabled the identification of 3,459 core V. vulnificus genes shared among all six strains and 80 clade 2-specific genes. We identified 523,161 SNPs among the six genomes.

Conclusions: We were able to glean much information about the genomic content of each strain using next generation sequencing. Flp pili, GGDEF proteins, and genomic island XII were identified as possible virulence factors because of their presence in virulent sequenced strains. Genomic comparisons also point toward the involvement of sialic acid catabolism in pathogenesis.
Background

Vibrio vulnificus is an opportunistic pathogen that causes sepsis in humans after ingestion of contaminated raw oysters or wound infection and necrotizing fasciitis from contamination of wounds (for a review see [1,2]). The mortality rates for sepsis and wound infection are ~50% and ~15%, respectively. During infection of humans the bacteria replicate rapidly, extensively invade tissues, and cause severe tissue destruction. Mouse models of infection coupled with molecular genetic analysis have identified several virulence factors partially explaining the high mortality and extreme tissue destruction, most importantly, polysaccharide capsule [3,4], RtxA1 toxin [5-7], acquisition of iron [8,9], pili [10,11], and flagella [12,13]. However, these factors do not completely explain the remarkable virulence of V. vulnificus.

V. vulnificus can be classified in several different manners. One of the first classification schemes was based on biochemical reactions of strains, initially yielding two biotypes: biotype 1, most often associated with contamination of oysters and causing human disease, and biotype 2, associated with infection of eels [14]. Recently, a third biotype that caused wound infection from handling fish in Israel was identified [15]. Genetic analysis using ribosomal RNA loci [16,17], multilocus sequence typing (MLST) [18-20], and virulence-correlated gene (vcg) PCR [21] revealed that V. vulnificus strains could be divided into two groups. While the descriptors for these two groups vary (clades, populations, clusters, and lineages), the terms clade 1 and clade 2 are used here to follow the MLST clusters of Bisharat et al. [19]. Biotype 1 strains are present in both clades, whereas biotype 2 strains are present only in clade 1. Based on MLST analysis, biotype 3 strains appear to be a hybrid between clades 1 and 2 [18].

Clade 1 strains are most often isolated from environmental samples, while clade 2 strains are most often associated with human disease. Because of these epidemiological patterns, many investigators hypothesized that clade 2 strains possess inherently greater virulence. In an analysis of 69 biotype 1 V. vulnificus strains, we recently determined that both clade 1 and clade 2 strains have the ability to cause severe skin infection in subcutaneously inoculated iron dextran-treated mice (Thiaville, P.C. et al., Infect. Immun., submitted; Jones, M. et al., in preparation). The major distinction between the clades was that clade 2 strains had a greater propensity to cause systemic infection and death in the mouse model, although there were some attenuated clade 2 strains and highly virulent clade 1 strains.

* 1Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida, USA. Full list of author information is available at the end of the article.

© 2010 Gulig et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Analysis of the genomic DNA sequences of clade 1 and clade 2 strains would contribute to the identification of genetic differences among strains. As microbes engage in lateral gene transfer and are often highly divergent in genomic content, this study could help identify genes responsible for the differences in virulence between these clades. Both of the complete and annotated V. vulnificus genomes are of clade 2 strains, CMCP6 (GenBank accession numbers AE016795 and AE016796) and YJ016 (GenBank accession numbers BA000037, BA000038, AP005352). The lack of genomic sequence data from clade 1 strains is a serious impediment to understanding the differences in virulence between the two clades and in dissecting the virulence of V. vulnificus in general. We therefore undertook the present study to rapidly and economically obtain genomic sequence of numerous V. vulnificus strains representing both clades and the two major biotypes.

We hypothesized that clade 2 strains are more virulent, at least in part, because they contain unique virulence genes that are missing in most clade 1 strains. Therefore, identifying DNA sequences common to clade 2 strains and missing from clade 1 strains would create a set of putative virulence genes that could be subsequently experimentally examined. Because of the propensity of clade 1 strains to be associated with oysters, these strains may possess unique genes enabling colonization of shellfish. Therefore, unique clade 1 genes offer insight into the Vibrio-oyster relationship. However, not all genes uniquely associated with one clade will be involved with interactions with animal hosts, and virulence genes will not necessarily be present only in virulent genotypes. An example of the former is that the ability of V. vulnificus to ferment mannitol is associated with the cluster of strains that we are calling clade 2, most often derived from clinical cases [22], and an example of the latter is the nearly universal presence of the RtxA1 toxin in both virulent and attenuated V. vulnificus strains (Joseph, J.L., et al., in preparation). Finally, by comparing the genomes of a variety of strains representing the different clades and biotypes, the set of genes in the V. vulnificus genome shared by all V. vulnificus strains can be identified. Over and above identifying relationships between the presence and/or absence of genes among strains, identifying single nucleotide polymorphisms (SNPs) could also reveal the genetic basis for differential virulence and shellfish-colonizing phenotypes, as well as other phenotypes.

Given these goals, we used the SOLiD sequencing system on four V. vulnificus strains, each of which represented a unique genotype/virulence phenotype combination (Table 1). V. vulnificus M06-24/O [4] is a typical clade 2 strain exhibiting a high level of virulence in the subcutaneously inoculated iron dextran-treated mouse model [23-25]. Strain 99-520 DP-B8 [25] is a typical clade 1 strain that can infect skin tissues but is defective at causing systemic infection and death in the mouse model. Strain 99-738 DP-B5 [25] is an unusual clade 1 strain that is highly virulent in the mouse model, causing systemic infection and death. ATCC 33149 [26] is a typical biotype 2 strain isolated from an eel. Using SOLiD sequencing enabled us to obtain approximately 100X coverage with 35-nt reads among four genomes. This selection of strains analyzed with SOLiD sequencing enabled comparative genomics to be performed and identified clade 2-specific genomic sequences and the genes of V. vulnificus shared among all of the strains sequenced to date.

Table 1 Genotypes and virulence phenotypes of the V. vulnificus strains whose genomes were sequenced in this study*

Strain         Source    Biotype  MLST*  vcg  rrn  rep-PCR*  Skin Infection  Liver Infection/Death
M06-24/O       Clinical  1        2      C    B    8         +               +
99-520 DP-B8   Oyster    1        1      E    AB   7         +               -
99-738 DP-B5   Oyster    1        1      E    A    7         +               +
ATCC 33149     Eel       2        1      E    A    5         -               -

*Virulence data for biotype 1 strains are from Thiaville, P.C., et al. (Infect. Immun., submitted) and for ATCC 33149 are from this study. MLST, vcg, and rep-PCR data are from Mahmud, et al. [83]. rrn data for biotype 1 strains are from Thiaville, P.C., et al. (Infect. Immun., submitted) and for ATCC 33149 are from Vickery et al. [84].

Results

Numbers of SOLiD sequencing reads

We performed SOLiD sequencing on the genomes of four V. vulnificus strains to increase the understanding of the genetic differences between the two major clades and the biotypes of this organism and to possibly identify sequences associated with differences in virulence potential in our subcutaneously inoculated iron dextran-treated mouse model [23-25]. V. vulnificus 99-520 DP-B8 and 99-738 DP-B5 are clade 1 strains, typically isolated from environmental sources. Strain 99-520 DP-B8 exhibits the typical attenuated virulence of clade 1 strains, i.e., it can cause skin infection but is defective at causing systemic infection and death. In contrast, strain 99-738 DP-B5 exhibits a high level of virulence more characteristic of clade 2 strains, i.e., it causes skin infection, systemic infection, and death (Thiaville, P.C., et al., Infect. Immun., submitted). V. vulnificus M06-24/O is a typical clade 2 strain with full virulence that has been widely used in examining molecular pathogenesis by many laboratories [4]. V. vulnificus ATCC 33149 is a biotype 2 strain isolated from an eel [26]. Genomic DNA from each of these strains was loaded onto one-fourth of a 25 mm × 75 mm SOLiD™ slide for sequencing on an Applied Biosystems SOLiD™ apparatus at the University of Florida Interdisciplinary Center for Biotechnology Research, as described in the Methods. The total numbers of 35-bp reads for each strain were as follows: 99-520 DP-B8 - 3.16 × 10^7, 99-738 DP-B5 - 3.21 × 10^7, M06-24/O - 3.50 × 10^7, and ATCC 33149 - 3.38 × 10^7.
These totals represented putative 210- to 239-fold coverage of each of the genomes, on the assumption that all of the data were usable. The reads from each of the four newly sequenced genomes have been deposited into the NCBI Short Read Archive (accession number SRA009283.2).

Comparison of SOLiD sequencing reads to reference V. vulnificus genomes

Reads were mapped onto the two reference V. vulnificus genomes, CMCP6 and YJ016, using MAQ [27]. This analysis enabled the identification of DNA sequences and ORFs that were present in the newly sequenced strains that have already been described in the reference strains. Graphical representation of the coverage of the CMCP6 and YJ016 genomes by the reads from each of the four newly sequenced V. vulnificus strains is shown in Figure 1.

We then mapped reads to plasmids described for biotype 2 V. vulnificus and whose DNA sequences are known (pR99, pC4602-1, and pC4602-2) [28]. As expected, greater than 90% of all three of these reference sequences were matched to reads from biotype 2 ATCC 33149, and lesser homologies were observed for the biotype 1 strains (Additional File 1, Table S1). For clade 1 strains, between 37 and 56% of these plasmid sequences matched with 99-738 DP-B5, and only 6 to 20% of plasmid sequences matched the SOLiD reads from strain 99-520 DP-B8. These results suggested that 99-738 DP-B5 would have a plasmid, whereas 99-520 DP-B8 would not, and we confirmed this by gel electrophoresis of extracted plasmid DNA (data not shown). The reads from strain M06-24/O, which is a clade 2 strain and least related to the other strains, only matched to 1% of plasmid pC4602-1 and failed to match to any sequences of plasmids pC4602-2 and pR99. This is in agreement with M06-24/O not having a plasmid [29].

Despite the prediction of approximately 210-fold coverage based on the raw number of reads obtained for each genome, coverage was actually on the order of 100-fold.
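
The putative coverage figures follow from simple arithmetic: the number of reads times the read length, divided by the genome size. The Python sketch below reproduces that calculation from the read totals reported above. It assumes a genome size of about 5.2 Mb, a value chosen here for illustration and not stated in this passage; the function names and the 50% usable fraction are likewise hypothetical.

```python
READ_LENGTH_NT = 35

# Assumption for illustration only: an approximate V. vulnificus genome size.
ASSUMED_GENOME_SIZE_NT = 5.2e6

# Total 35-nt reads per strain, as reported in the text.
READ_COUNTS = {
    "99-520 DP-B8": 3.16e7,
    "99-738 DP-B5": 3.21e7,
    "M06-24/O": 3.50e7,
    "ATCC 33149": 3.38e7,
}


def putative_coverage(n_reads, read_length=READ_LENGTH_NT,
                      genome_size=ASSUMED_GENOME_SIZE_NT):
    """Fold coverage if every sequenced base were usable."""
    return n_reads * read_length / genome_size


def effective_coverage(putative, usable_fraction):
    """Scale putative coverage by the fraction of reads that are usable."""
    return putative * usable_fraction


for strain, n_reads in READ_COUNTS.items():
    putative = putative_coverage(n_reads)
    # 0.5 is a hypothetical usable fraction, not a value from the study.
    print(f"{strain}: putative {putative:.0f}-fold, "
          f"effective {effective_coverage(putative, 0.5):.0f}-fold at 50% usable")
```

Under the assumed genome size, all four strains come out between roughly 210- and 240-fold putative coverage, and losing about half of the reads brings the estimate to the order of 100-fold.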
In total, 45% to 64% of the raw sequencing reads mapped to one of the two reference genomes, leaving a considerable number of unmapped reads. Some of these reads were of low complexity and may represent sequencing error. Because approximately 14% of both CMCP6 and YJ016 are low complexity, these unmapped reads also may be derived from regions of low complexity in the sequenced genomes. It is a limitation of the short read technology that we cannot distinguish among these scenarios. For the remaining unmapped reads that were not of low complexity, there are two possibilities: these reads represented truly unique sequences for the newly sequenced genomes or these reads were errors in the sequencing system. In an attempt to separate these two possibilities, these unmapped reads were compared to several bacterial genomes by mapping the reads in SOLiD color space using MAQ [27]. This would identify orthologs of V. vulnificus strains in other species. The largest number of matches (273,045) was found with the genomic sequence of V. cholerae N16961 (GenBank accession numbers AE003852 and AE003853) (Additional File 2, Table S2). These V. cholerae matches yielded 20 genes in total from the four sequenced genomes. Of these V. cholerae genes, sixteen were identified from only a single V. vulnificus strain. Other novel genes may still be found, but they would be genes not previously identified in any other bacterial genomes.

There were between 15 and 22 million unmatched reads for each of the newly sequenced genomes. The cause of such a large amount of data with no similarity to known genes cannot be explained by low complexity alone, as many of these reads are not of low complexity. While it remains possible that novel genes are included in these data, it is also possible that these reads are just noise from the technology.

Figure 1, which graphically shows the coverage of the reference genome elements by each of the newly sequenced genomic reads, reveals large regions on each of the reference genomic elements for which there were no matched reads from each of the newly sequenced genomes. Detailed comparisons of coverage generated lists of the genes of CMCP6 and YJ016 lacking significant depth of coverage from the newly sequenced reads (Additional File 3, Table S3A and Additional File 4, Table S3B, respectively). There were 309 ORFs unique to CMCP6 and 489 ORFs unique to YJ016 relative to the other five sequenced strains. In CMCP6 chromosome 1, two large regions that were not present in any of the four newly sequenced genomes included genes VV1_0063 to VV1_0124 and VV1_0374 to VV1_0400. These regions, which were also missing from YJ016, appear to encode phage genomes. They contain genes annotated as bacteriophage phi 1.45 protein-like protein (VV1_0066), P2-like prophage tail protein (VV1_0086), and phage integrase (VV1_0372), or they resemble other mobile genetic elements with putative transposases (VV1_0385, VV1_0386).

Figure 1. Graphical representation of coverage of the reference genome components by sequences of each of the four newly sequenced genomes. The depth of coverage (number of matched 35-nt reads per 100-nt window of the reference genomes) is plotted for both chromosomes of the reference CMCP6 and YJ016 genomes and the YJ016 plasmid. The source strains for the reads being matched are as follows: M06, M06-24/O; B5, 99-738 DP-B5; B8, 99-520 DP-B8; ATCC, ATCC 33149. It should be noted that coverage of the reference genomes is not as continuous as it appears in the figures.

Another CMCP6-specific region spanned the beginning and ends of chromosome 1 (genes VV1_0001 to VV1_0011 and VV1_3192 to VV1_3205). This region also appeared to encode a phage or other mobile genetic element. A smaller CMCP6-specific region located at genes VV1_0777 to VV1_0781 appeared to encode sugar metabolism genes possibly involved in lipopolysaccharide (LPS) or capsular biosynthesis. CMCP6 chromosome 2 contained a very large region at genes VV2_0630 to VV2_0712 not present in any other strains. This region appeared to have been derived from a mobile genetic element, either a phage or transposon. There were also smaller regions unique to CMCP6 on chromosome 2.

The YJ016 genome similarly contained numerous regions that were not present in any of the other newly sequenced genomes. On chromosome 1, YJ016-specific genes were located at VV0130 to VV0165, VV0343 to VV0367, VV0514 to VV0559, VV0799 to VV0817, and VV2191 to VV2262. The largest of these YJ016 regions at VV2191 to VV2262 appeared to be phage-related. A similar pattern was evident for YJ016 chromosome 2. A very large YJ016-specific region spanning VVA0825 to VVA0888 was notable. This region consisted mainly of hypothetical proteins, but there is a possibility that this region is phage-related, as VVA0886 is annotated as a phage integrase.
The coverage of the YJ016 plasmid, which encodes 69 genes, was very different among the four newly sequenced genomes. The genomes containing the most matches were 99-738 DP-B5 and ATCC 33149, with 50 and 44 genes, respectively, matched to the YJ016 plasmid. As expected, both 99-738 DP-B5 and ATCC 33149 contain plasmids. None of the YJ016 plasmid genes matched to the reads of 99-520 DP-B8 or M06-24/O, neither of which contains plasmids.

V. vulnificus, like other Vibrio species, encodes a super-integron on its large chromosome [30]. Integrons are specific regions of genomic sequence that have the ability to accumulate gene cassettes via site-specific recombination [31]. They are located in genomes at attI sites and contain a site-specific integrase, intI, that mediates acquisition of gene cassettes at repetitive attC sites, which are generally conserved among closely related bacteria. The vibrio integrons are called super-integrons because of their unusually large sizes [32]. In CMCP6 the super-integron spans genes VV1_2401 to VV1_2501, and in YJ016 the super-integron spans genes VV1745 to VV1941. As shown in Additional File 3, Table S3A and Additional File 4, Table S3B, the genes encoded within these super-integrons are mostly strain-specific, not having significant homology with the four newly sequenced genomes or each other. It is interesting that the super-integrons did not appear in Figure 1 as missing from the newly sequenced genomes, most likely because of the attC sites and the presence of infrequent homologous genes between the genomes.
In contrast to identifying sequences missing from the newly sequenced genomes, we also identified the genes shared among all of the six genomes, thereby identifying the core V. vulnificus genome. Up to this point, shared genes based on the two reference genomes numbered 3,915 genes. Adding our four newly sequenced genomes, there are 3,459 genes common to all sequenced V. vulnificus strains, listed in Additional File 5, Table S4. The number of shared genes can only get smaller as more genomes are sequenced. Since there are 4,473 protein-coding genes in the CMCP6 genome and 5,024 protein-coding genes in the YJ016 genome, but only 3,915 genes shared between them, there is clearly an enormous amount of strain-specific sequence among these clade 2 strains. The frequency of hypothetical proteins in the core genome was 20.3% compared with the overall frequency of 23.6% in the CMCP6 genome.

The total number of genes obtained by combining the CMCP6 and YJ016 reference genomes and excluding redundancy is 5,630. Among the 4,473 genes in the CMCP6 genome, 309 (6.9%) were unique to this strain, and among the 5,026 genes in the YJ016 genome, 489 (9.7%) were unique to this strain relative to all of the other genomes. By combining the matches for each strain with the reference genomes we identified the following numbers of genes for each strain: ATCC 33149, 4,184; 99-738 DP-B5, 4,359; 99-520 DP-B8, 4,225; and M06-24/O, 4,534.

Genomic inference of different V. vulnificus genotypes

We asked which genes were common only to the three biotype 1/clade 2 strains, but not present in the two biotype 1/clade 1 strains or the biotype 2 strain, because this could help identify the genes that are responsible for the increased virulence of clade 2 strains (Thiaville, P.C. et al., Infect. Immun., submitted). The 80 clade 2-specific genes are listed in Table 2. Among the notable clade 2-specific genes and regions are several GGDEF proteins (genes VV1_2061, VV1_2228, VV1_2321 in the CMCP6 genome) and a Flp pilus-coding region (genes VV1_2330 to VV1_2337 in the CMCP6 genome). GGDEF proteins are involved with signal transduction in many bacteria by regulating intracellular levels of the signaling molecule cyclic-di-GMP [33], and Flp pili could be involved with adherence or genetic exchange [34]. Hypothetical proteins comprised 36.3% of the clade 2-specific genes, compared with the overall frequency of 23.6% of hypothetical proteins in the CMCP6 genome. Because the reference strains are both clade 2, any clade 1-specific genes will be missed in this initial mapping.

Strain 99-520 DP-B8 is a typical clade 1 strain with attenuated virulence, while strain 99-738 DP-B5 is a clade 1 strain with high virulence typical of clade 2 strains. There were 61 genes in 99-738 DP-B5 that were common to the three clade 2 strains but missing from attenuated clade 1 strain 99-520 DP-B8 and biotype 2 strain ATCC 33149 (Table 3). This set of genes could contain virulence genes acquired by 99-738 DP-B5 that endow it with clade 2-like virulence. Hypothetical proteins comprised 19.7% of this set of genes, compared with the overall frequency of 23.6% hypothetical proteins in the CMCP6 genome. It is noteworthy that the clade 2 + 99-738 DP-B5-specific set of genes includes genomic island XII identified by Cohen et al. [20] as being present in most clade 2 strains and missing from most clade 1 strains (genes VV2_1090 to VV2_1111 on the CMCP6 genome). They hypothesized that genomic island XII could be responsible for the putative differential virulence of clade 2 strains, evidenced by their association with clinical cases.
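The core and clade-specific gene sets described above amount to set operations on per-strain lists of matched reference genes. The following is a minimal sketch, assuming each strain's gene content is available as a Python set. The toy gene labels and the helper names `core_genes` and `group_specific_genes` are hypothetical and do not reproduce the actual mapping results.

```python
def core_genes(presence):
    """Return the genes present in every strain."""
    return set.intersection(*presence.values())


def group_specific_genes(presence, group):
    """Return genes found in all strains of `group` and in no other strain."""
    in_group = set.intersection(*(presence[strain] for strain in group))
    others = [genes for strain, genes in presence.items() if strain not in group]
    outside = set.union(*others) if others else set()
    return in_group - outside


# Toy presence sets with hypothetical gene labels (not real results).
presence = {
    "CMCP6": {"g1", "g2", "g3", "g4"},
    "YJ016": {"g1", "g2", "g3", "g5"},
    "M06-24/O": {"g1", "g2", "g3"},
    "99-738 DP-B5": {"g1", "g3"},
    "99-520 DP-B8": {"g1"},
    "ATCC 33149": {"g1"},
}
clade2 = ["CMCP6", "YJ016", "M06-24/O"]

core = core_genes(presence)
clade2_specific = group_specific_genes(presence, clade2)
clade2_plus_b5 = group_specific_genes(presence, clade2 + ["99-738 DP-B5"])

if __name__ == "__main__":
    print("core:", sorted(core))
    print("clade 2-specific:", sorted(clade2_specific))
    print("clade 2 + 99-738 DP-B5:", sorted(clade2_plus_b5))
```

In this toy example the gene shared by all six strains is the core, the gene found only in the three clade 2 strains is clade 2-specific, and the gene additionally carried by 99-738 DP-B5 falls into the second set, mirroring the logic behind Tables 2 and 3.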
Within genomic island XII are paralogs of galactose utilization genes (VV2_1095, a paralog of galE2 encoding UDP-glucose 4-epimerase, and VV2_1094, a paralog of galT2 encoding galactose-1-phosphate uridylyltransferase) that are in an operon with a predicted sulfate transporter (VV2_1096). The canonical galE (VV1_1770) and galT (VV1_1771) are located elsewhere in the galETKM operon (VV1_1770 to VV1_1773). The presence of additional galET genes in a subset of V. vulnificus strains with high virulence suggests a role for these genes in another metabolic pathway possibly benefitting the bacteria during infection.

The link with sulfate metabolism was intriguing because five other genomic island XII genes are annotated as arylsulfatase A (VV2_1106, VV2_1108, VV2_1109, and VV2_0151) or alkyl sulfatase (VV2_0989). These enzymes hydrolyze the sulfate from sulfated gangliosides (sulfatides). VV2_1098 and VV2_1110 in the genomic island encode chondroitinases (although they are not annotated as such in the reference genome sites). Sulfatides are important components of connective tissue involved with cell adhesion [35] and serve as the receptors for various microbial pathogens, including HIV [36], Bordetella pertussis [37], and Helicobacter pylori [38]. An arylsulfatase of E. coli K1 is necessary for invasion of the blood-brain barrier [39]; hence, such activity in virulent V. vulnificus strains could enable invasion through tissues, which is characteristic of V. vulnificus infection. In clinical V. vulnificus isolates, the presence of region XII, encoding arylsulfatases, chondroitinases, sulfate transport, and sulfate metabolism functions, suggests that this region may have an important scavenging function removing sulfate groups from host components, thereby providing sulfur and/or carbon sources, which could facilitate survival in the human host where free sulfur is limited.
However, as noted above, some of the degradative enzymes in genomic island XII could also be involved in invasion of tissues. Cohen et al. [20] had noted the presence of such genes in genomic island XII, which is predominant in the clade of V. vulnificus strains most associated with clinical strains. The exclusive presence of all of these genes in clade 2 plus the highly virulent clade 1 strain 99-738 DP-B5 suggests a role in virulence. The dissection of the roles in virulence, if any, played by these genomic island XII genes identified through our comparative genomic analysis will await construction and analysis of specific mutants. However, Bryant et al. [40] described the use of sodium dodecyl sulfate-polymyxin B-sucrose plates for the identification of V. vulnificus from shellfish samples. The ability of bacteria to form halos around colonies on this medium is indicative of alkyl sulfatase activity. In contrast to our determination that VV2_0989 is absent in the biotype 2 strain ATCC 33149 and clade 1 strain 99-520 DP-B8 and the results of Cohen et al. [20] similarly describing the limited presence of genomic island XII among V. vulnificus strains, Bryant et al. observed that all 20 V. vulnificus strains examined possessed alkyl sulfatase activity. However, VV2_0885 and VV2_1032 are also annotated as alkyl sulfatase. Our results show that VV2_0885 is present in all six strains except 99-738 DP-B5 and VV2_1032 is present in all six strains. Hence, it would be expected that all V. vulnificus strains would exhibit alkyl sulfatase activity, in agreement with Bryant et al. [40].

Also of note in the clade 2 plus 99-738 DP-B5-specific genes not present in genomic island XII were linked genes possibly involved with sialic acid catabolism: N-acetylneuraminate lyase (NanA, VV2_0730), a TRAP transport system possibly involved with sialic acid transport (VV2_0731 to VV2_0733), N-acetylmannosamine-6-phosphate 2-epimerase (NanE, VV2_0734), N-acetylmannosamine kinase (NanK, VV2_0735), and N-acetylglucosamine-6-phosphate deacetylase (NagA, VV2_0736). Because the nagB gene (VV2_1200, glucosamine-6-P deaminase) is in the V. vulnificus core genome, the clade 2 strains and 99-738 DP-B5 uniquely have the ability to assimilate exogenous sialic acid into central metabolism as fructose 6-phosphate, relative to the other clade 1 strains and biotype 2 strains. However, V. vulnificus does not encode a neuraminidase (NanH), which would liberate sialic acid from host components. Almagro-Moreno and Boyd [41] had noted that sialic acid metabolism was unique to bacteria that interacted with mammalian hosts, either as pathogens or as commensals. Jeong et al. [42] recently constructed a nanA deletion in V. vulnificus and confirmed that the ability to utilize exogenous sialic acid was essential for virulence in intraperitoneally inoculated iron dextran-treated mice, as well as cytotoxicity in cell culture assays. They focused analysis of nanA on a single V. vulnificus strain and did not perform comparative genetics among strains of different genotypes or virulence phenotypes. The summation of these data regarding nanA is that our comparative genomic sequencing correctly identified unique virulence genes among different sets of V. vulnificus.

Another carbon source utilization pathway specific to the clade 2 plus 99-738 DP-B5 strains but not in genomic island XII is a complete mannitol catabolic pathway encoding the mannitol/fructose-specific phosphotransferase system IIA protein (mtlABC, VV1_0638), mannitol-1-phosphate 5-dehydrogenase (mtlD, VV1_0639), and a specific mannitol repressor (mtlR, VV1_0640). The significance of these genes to virulence is unknown. Interestingly, by examining 465 V. vulnificus strains, Drake et al. [22] previously determined that the ability to ferment mannitol by V. vulnificus was highly correlated with a strain being in what we are calling clade 2. Tison et al. [14] reported that biotype 2 strains were mannitol negative. Our sequencing data, albeit on a considerably smaller sample size of strains, therefore corroborate the phenotypic analyses of these two previous studies.

Table 2. Clade 2-specific genes

Tag | Product | Gene | COG
VV1_0456 | putative transcriptional regulator | - | COG0583K
VV1_0457 | hypothetical protein VV1_0457 | - |
VV1_0458 | hypothetical protein VV1_0458 | - |
VV1_0459 | hypothetical protein VV1_0459 | - |
VV1_0465 | exopolyphosphatase | - | COG0248FP
VV1_0515 | hypothetical protein VV1_0515 | - | COG3930S
VV1_0766 | hypothetical protein VV1_0766 | - |
VV1_0789 | hypothetical protein VV1_0789 | - |
VV1_1090 | hypothetical protein VV1_1090 | - |
VV1_1094 | chromosome segregation ATPase | - |
VV1_1095 | Serine/threonine protein kinase | - | COG0515RTKL
VV1_1518 | 3-methyladenine DNA glycosylase | - | COG0122L
VV1_1751 | hypothetical protein VV1_1751 | - |
VV1_2031 | Type I restriction enzyme EcoEI M protein | - | COG0286V
VV1_2037 | Type I restriction enzyme EcoEI R protein | - | COG4096V
VV1_2038 | transcriptional regulator | - |
VV1_2061 | GGDEF family protein | - | COG2199T
VV1_2114 | OMPH_PHOPR porin-like protein H precursor | - | COG3203M
VV1_2115 | hypothetical protein VV1_2115 | - | COG3110S
VV1_2158 | methyl-accepting chemotaxis protein | - | COG0840NT
VV1_2183 | hypothetical protein VV1_2183 | - | COG2378K
VV1_2184 | ATPase involved in DNA repair | - | COG0419L
VV1_2228 | GGDEF family protein | - | COG3706T
VV1_2321 | GGDEF family protein | - | COG3614T
VV1_2326 | hypothetical protein VV1_2326 | - |
VV1_2327 | hypothetical protein VV1_2327 | - |
VV1_2329 | hypothetical protein VV1_2329 | - |
VV1_2330 | Flp pilus assembly protein CpaB | - | COG3745U
VV1_2331 | Flp pilus assembly protein | - | COG4964U
VV1_2332 | hypothetical protein VV1_2332 | - |
VV1_2333 | pilus assembly protein CpaE-like protein | - | COG4963U
VV1_2334 | Flp pilus assembly protein | - | COG4962U
VV1_2335 | Flp pilus assembly protein TadB | - | COG4965U
VV1_2336 | Flp pilus assembly protein TadC | - | COG4965U
VV1_2337 | Flp pilus assembly protein TadD | - | COG5010U
VV1_2338 | hypothetical protein VV1_2338 | - |
VV1_2339 | hypothetical protein VV1_2339 | - | COG4961U
VV1_2340 | hypothetical protein VV1_2340 | - |
VV1_2341 | azoreductase | acpD | COG1182I
VV1_2401 | super-integron integrase IntIA | - | COG4974L
VV1_2708 | hypothetical protein VV1_2708 | - |
VV1_2748 | response regulator | - | COG3437KT
VV1_2758 | amino acid transporter | - |
VV1_2840 | NhaP-type Na+/H+ and K+/H+ antiporters | - | COG0025P
VV1_2868 | methyl-accepting chemotaxis protein | - | COG0840NT
VV1_3144 | hypothetical protein VV1_3144 | - |
VV2_0019 | alcohol dehydrogenase | - | COG1454C
VV2_0073 | anti-anti-sigma regulatory factor | - | COG1366T
VV2_0074 | anti-anti-sigma regulatory factor | - | COG1366T
VV2_0075 | anti-sigma regulatory factor | - | COG2172T
VV2_0076 | Serine phosphatase RsbU | - |
VV2_0077 | FOG: CheY-like receiver | - | COG0642T
VV2_0078 | response regulator | - | COG3437KT
VV2_0212 | AraC-type DNA binding domain-containing protein | - | COG2207K
VV2_0312 | hypothetical protein VV2_0312 | - |
VV2_0313 | response regulator | - | COG0745TK
VV2_0627 | hypothetical protein VV2_0627 | - | COG2378K
VV2_0782 | AraC-type DNA-binding domain-containing protein | - | COG2207K
VV2_0783 | major facilitator superfamily permease | - |
VV2_0851 | hypothetical protein VV2_0851 | - | COG0845M
VV2_0864 | hypothetical protein VV2_0864 | - |
VV2_0868 | acetyltransferase | - | COG0456R
VV2_0881 | long-chain fatty acid ABC transporter | - | COG2067I
VV2_0884 | Mg2+ and Co2+ transporter | - |
VV2_0993 | transcriptional regulator | - | COG0583K
VV2_0994 | multidrug resistance efflux pump | - | COG1566V
VV2_1075 | dehydrogenase | - | COG1028IQR
VV2_1138 | hypothetical protein VV2_1138 | - | COG3904S
VV2_1149 | hypothetical protein VV2_1149 | - |
VV2_1186 | transcriptional regulator | - | COG0583K
VV2_1203 | hypothetical protein VV2_1203 | - | COG3930S
VV2_1204 | glutathione synthetase | - | COG0189HJ
VV2_1273 | transcriptional regulator | - | COG0583K
VV2_1274 | Ca2+/H+ antiporter | - | COG0387P
VV2_1275 | hypothetical protein VV2_1275 | - | COG0586S
VV2_1290 | hypothetical protein VV2_1290 | - | COG0834ET
VV2_1303 | hypothetical protein VV2_1303 | - |
VV2_1304 | Beta-glucosidase-related glycosidase | - | COG1472G
VV2_1309 | DMT family permease | - |
VV2_1363 | transcriptional regulator | - | COG0583K

Table 3. Genes common to V. vulnificus 99-738 DP-B5 and clade 2 strains

Tag | Product | Gene | COG
VV1_0251 | hypothetical protein VV1_0251 | - | COG3094S
VV1_0411 | choline-glycine betaine transporter | - | COG1292M
VV1_0638 | mannitol/fructose-specific phosphotransferase system IIA protein | - | COG2213G
VV1_0639 | mannitol-1-phosphate 5-dehydrogenase | - | COG0246G
VV1_0640 | mannitol repressor protein | mtlR | COG3722K
VV1_0641 | D-fructose-6-phosphate amidotransferase | - | COG0449M
VV1_0834 | DMT family permease | - |
VV1_0835 | hypothetical protein VV1_0835 | - |
VV1_1655 | H+/gluconate symporter | - | COG2610GE
VV1_1656 | sugar diacid utilization regulator | - | COG3835KT
VV1_2188 | helicase-related protein | - | COG1061KL
VV1_2189 | tellurite resistance protein-related protein | - | COG2227H
VV1_2744 | response regulator | - | COG2197TK
VV1_2936 | putative transcriptional regulator | - |
VV2_0151 | arylsulfatase A | - | COG3119P
VV2_0335 | methyl-accepting chemotaxis protein | - | COG0840NT
VV2_0542 | manganese transporter NRAMP | - | COG1914P
VV2_0726 | hypothetical protein VV2_0726 | - | COG3055S
VV2_0729 | transcriptional regulator | - | COG1737K
VV2_0730 | dihydrodipicolinate synthase/N-acetylneuraminate lyase | - | COG0329EM
VV2_0731 | TRAP-type C4-dicarboxylate transport system | - | COG1593G
VV2_0732 | TRAP-type C4-dicarboxylate transport system | - | COG3090G
VV2_0733 | TRAP-type C4-dicarboxylate transport system | - | COG1638G
VV2_0734 | N-acetylmannosamine-6-phosphate 2-epimerase | - | COG3010G
VV2_0735 | N-acetylmannosamine kinase | - | COG1940KG
VV2_0736 | N-acetylglucosamine-6-phosphate deacetylase | - | COG1820G
VV2_0892 | diadenosine tetraphosphate hydrolase | - | COG0537FGR
VV2_0893 | arsenite efflux pump ACR3 | - | COG0798P
VV2_0894 | transcriptional regulator | - | COG0640K
VV2_0920 | amidohydrolase | - | COG0388R
VV2_0988 | hypothetical protein VV2_0988 | - |
VV2_0989 | Alkyl sulfatase | - | COG2015Q
VV2_1035 | ABC transporter permease | - | COG3932R
VV2_1090 | hypothetical protein VV2_1090 | - |
VV2_1091 | hypothetical protein VV2_1091 | - |
VV2_1092 | hypothetical protein VV2_1092 | - |
VV2_1093 | 2-deoxy-D-gluconate 3-dehydrogenase | - | COG1028IQR
VV2_1094 | galactose-1-phosphate uridylyltransferase | - | COG1085C
VV2_1095 | UDP-glucose 4-epimerase | - | COG1087M
VV2_1096 | Sulfate permease | - | COG0659P
VV2_1097 | hypothetical protein VV2_1097 | - |
VV2_1098 | CBS domain-containing protein | - | COG3448T
VV2_1099 | methyl-accepting chemotaxis protein | - | COG0840NT
VV2_1100 | ATPase component of various ABC-type transport system | - | COG1123R
VV2_1101 | ABC-type dipeptide/oligopeptide/nickel transport system | - | COG1173EP
VV2_1102 | ABC-type dipeptide/oligopeptide/nickel transport system | - | COG0601EP
VV2_1104 | ABC-type dipeptide transport system | - | COG0747E
VV2_1105 | hypothetical protein VV2_1105 | - | COG4289S
VV2_1106 | arylsulfatase A | - | COG3119P
VV2_1107 | arylsulfatase regulator | - | COG0641R
VV2_1108 | arylsulfatase A | - | COG3119P
VV2_1109 | arylsulfatase A | - | COG3119P
VV2_1110 | hypothetical protein VV2_1110 | - |
VV2_1259 | hypothetical protein VV2_1259 | - |
VV2_1403 | GGDEF domain-containing protein | - | COG2199T
VV2_1505 | hypothetical protein VV2_1505 | - | COG1233Q
VV2_1508 | putative two-component response regulator | - | COG2197TK
VV2_1509 | GGDEF family protein | - | COG2199T
VV2_1510 | response regulator | - | COG2197TK
VV2_1511 | response regulator VieA | - | COG2200T
VV2_1512 | sensor kinase VieS | - | COG0642T

SNP analysis

In addition to the presence or absence of whole genes or blocks of genes, detailed above, genetic variation among the sequenced strains also consisted of nucleotide polymorphisms. We used MAQ to identify SNPs present in the newly sequenced genomes relative to the reference genomes. The SNPs from each of the pairwise analyses versus the reference genomes are listed in Additional Files 6, 7, 8, 9, 10, 11, 12, and 13, and the summary of the numbers of SNPs from each sequenced strain relative to the reference genomes is shown in Table 4. In examining SNPs, we did not exclude any sets of genes, such as putative mobile genetic elements, e.g., phages. It is interesting that M06-24/O, which had the highest amount of coverage relative to the reference genomes, had the lowest number of SNPs (mean of 42,191 SNPs per reference genome) compared with the other three strains (mean of 73,130 SNPs per reference genome). This likely reflects the fact that M06-24/O is in the same clade as the reference genomes.

The accuracy of the SOLiD-based SNPs in identifying polymorphisms was verified by examining Sanger sequencing of specific genomic regions of each of these strains. Having examined 8.7 kb of Sanger-derived sequence that contained SNPs identified from our SOLiD sequencing, we determined that 126 of 128 SNPs were accurately identified (98.4% accuracy).
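The summary figures quoted above follow directly from the per-strain totals in Table 4 and from the Sanger validation counts. The sketch below recomputes them as a consistency check; the dictionary layout and the function names are illustrative choices, not part of the original analysis pipeline.

```python
from statistics import mean

# Total SNPs per strain relative to each reference genome (values from Table 4).
SNP_TOTALS = {
    "M06-24/O": {"CMCP6": 41_142, "YJ016": 43_239},
    "99-738 DP-B5": {"CMCP6": 73_926, "YJ016": 73_985},
    "99-520 DP-B8": {"CMCP6": 72_499, "YJ016": 72_379},
    "ATCC 33149": {"CMCP6": 72_614, "YJ016": 73_377},
}


def mean_snps_per_reference(strains):
    """Mean SNP count per reference genome across the given strains."""
    return mean(total for s in strains for total in SNP_TOTALS[s].values())


def validation_accuracy(confirmed, examined):
    """Percentage of examined SNPs confirmed by an independent method."""
    return 100 * confirmed / examined


m06_mean = mean_snps_per_reference(["M06-24/O"])
other_mean = mean_snps_per_reference(
    ["99-738 DP-B5", "99-520 DP-B8", "ATCC 33149"]
)
accuracy = validation_accuracy(126, 128)

if __name__ == "__main__":
    print(f"M06-24/O mean: {m06_mean}")
    print(f"Other strains mean: {other_mean}")
    print(f"Sanger validation accuracy: {accuracy:.1f}%")
```

The clade 2 strain M06-24/O averages a little over 42,000 SNPs per reference genome, whereas the other three strains average about 73,000, which is the contrast the text attributes to clade membership.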
We then examined the distribution of nonredundant SNPs among different sets of annotated ORFs using the CMCP6 reference genome. It must be emphasized that the vast majority of annotated ORFs have not been experimentally verified; hence, such an analysis is conjectural. Of the 201,981 nonredundant SNPs in the CMCP6 genome from all four sequenced strains, 177,464 fell within annotated ORFs (87.9%). This was not unexpected since this figure approximates the amount of the genome contained within annotated ORFs [30]. However, other interesting trends were evident. There were highly significant differences in the frequencies of SNPs between chromosomes 1 and 2 of CMCP6. Among the annotated ORFs, there were 0.037 SNPs/base for chromosome 1 and 0.044 SNPs/base in chromosome 2. Among the other sets of ORFs, there were significantly more SNPs/base in the core genome (0.043 SNPs/base) than in the total ORFs (0.040 SNPs/base) (Figure 2). As opposed to the inference that the core genome is actually more variable among strains, this difference most likely is due to the fact that the core genome, by definition, was shared among all of the sequenced strains and hence had more shared sequences in which SNPs could be identified. In contrast, the lowest rate of SNPs was among the clade 2-specific genes, with only 0.019 SNPs/base. In the opposite manner to the core genome, this result would be expected since the clade 2-specific genes are unique and shared among the set of three genetically related clade 2 strains and because only one newly sequenced clade 2 strain, M06-24/O, contributed to this particular SNP pool. The frequency of SNPs in the clade 2 + 99-738 DP-B5 set of ORFs was 0.033 SNPs/base. The frequency of SNPs among hypothetical proteins (0.037 SNPs/base) was significantly lower than that of the total ORFs.

Lineage-specific Expansions

Gu et al. [43] recently reported an analysis of numerous Vibrio spp. to identify lineage-specific expansions (LSEs), genes that have been duplicated within a species or genotype. Some LSEs are specific to a single strain, while others are present among varied strains across species. We examined some of the LSEs present in the reference genomes of V. vulnificus to determine if these loci are similarly present in the newly sequenced V. vulnificus genomes. We did not find a pattern to the presence or absence of the LSEs examined. For example, VV1_3196 and VV2_0703 form a pair of LSE genes in CMCP6. Neither of these genes has a homologue in YJ016 or any of the newly sequenced V. vulnificus genomes. In contrast, VV1_2851 and VV2_0347 constitute a pair of LSEs in CMCP6 that have homologues in YJ016 (VV1419 and VVA0904). The VV1_2851/VV1419 pair of genes has homologues in all of the four newly sequenced genomes, while VV2_0347 and VVA0904 do not.

Discussion

This study is one of the first to use Applied Biosystems SOLiD sequencing for genomic sequencing of bacteria. Whole genome analysis has progressed considerably since the publication of the first complete DNA sequence of the pathogenic bacterium Haemophilus influenzae [44]. Until recently, the wealth of complete genomes available in public databases was decoded via the large-scale industrialization of the Sanger dideoxy chain-termination sequencing method [45,46]. The prospect of quickly and inexpensively resequencing large segments of the human genome or whole genomes of populations or species is driving development of a new generation of sequencing technologies with impacts in microbiology, functional genomics, ecology and evolutionary biology, human health, and beyond [45,47-51]. In particular, bacterial sequencing has been advanced by the high-throughput, parallel format of the 454 Sequencer [51], the first ‘next-generation’ technology to de novo sequence and assemble whole bacterial genomes, including Mycoplasma genitalium, in a single machine run [52]. Bacterial comparative genomics has expanded rapidly owing to the speed of the 454 Sequencer compared to Sanger sequencing [53] as well as from a combination of the two technologies [54], while assessment of microbial diversity from complex communities (metagenomics) [55] has revealed insights into complex interactions such as mammalian obesity and the microbiome [56,57], the ocean biosphere [58], and the role of microbes in colony collapse disorder in honey bees [59].

More recently released ‘second-generation’ sequencing technologies such as the Illumina GA2X Genome Analyzer (GA) and ABI SOLiD system have been developed [51]. To date, these next generation sequencing technologies generate shorter read lengths than Sanger sequencing, which poses a difficulty for de novo sequence assembly and defining large chromosomal rearrangements [49,51]. So far in prokaryotes, high quality draft sequences have been assembled in the absence of Sanger sequencing by combining the 454 and GA technologies [60-64]. Studholme et al. [65] utilized the Illumina platform alone for the de novo assembly of the draft genome sequence of Pseudomonas syringae pathovar tabaci strain 11528, revealing insights into the nature of type III protein-mediated pathogenicity.

The improved throughput from the massively parallel format of the new platforms (billions of bases in a single run) is ideal for revealing patterns of genetic variation among individuals by resequencing. For example, Srivatsan et al. [66] employed Illumina sequencing to improve the existing draft of the extensively studied model bacterium Bacillus subtilis, while also identifying polymorphisms between other well studied laboratory strains and their isolates. Moreover, this method was sensitive enough to identify typically difficult to isolate suppressor mutations in a single strain [66]. Using the same platform, whole-genome analysis of 12 isolates of the monomorphic human pathogen Salmonella enterica serovar Typhi revealed evolutionary loss of gene function consistent with the effects of genetic drift on a small effective population size [67]. Resequencing of the Caenorhabditis elegans N2 Bristol strain and SNP discovery in another strain demonstrated the effectiveness of this technology in eukaryotes [68], and single base mutations in a mutant C. elegans strain were mapped, avoiding traditional genetic mapping efforts [69].
As one of the newer second-generation sequencers currently available (although ‘third-generation’ single molecule sequencers are set to be marketed in 2010), the ABI SOLiD platform has been used more with eukaryotes than prokaryotes. One of the first studies focused on assessing cross-platform performance for sequence detection of known mutations in C. elegans. Comparable accuracy between GA and SOLiD for mapping the same C. elegans mutant strain as Sarin et al. [69] was reported [70]. Similarly, comparable accuracy was reported between 454, GA, and SOLiD methods for comparing a mutant strain of yeast to a reference genome [71]. At present the utility of the SOLiD platform is reflected in several resequencing studies in humans, including haplotype analysis, breakpoint mapping in disease-associated chromosomal rearrangements, and polymorphism discovery in protein coding exons [72-74]. With bacteria, SOLiD sequencing has been limited to verifying an E. coli reference strain sequence in conjunction with traditional sequencing [75], as well as resequencing of Bacillus anthracis strains for rapid and accurate forensic typing [76]. In our presently described study, the SOLiD platform was successfully utilized for rapid comparative genomic analysis of clade-specific and core genome sequences of the opportunistic pathogen V. vulnificus.

By examining the genomic DNA of each of four V. vulnificus strains on one-fourth of a SOLiD slide, we obtained 3.16 × 10^7 to 3.50 × 10^7 35-nt reads. This level of sequencing yielded approximately 100-fold coverage of each genome. Although the total numbers of reads would have predicted over 200-fold coverage, there was a significant amount of low complexity reads, as well as reads that were unmappable to the reference genomes. We identified sequences that are unique to the highly virulent clade 2 strains. These 80 genes represent the set that could contain virulence genes that are responsible for the ability of clade 2 strains to cause systemic infection and death in subcutaneously inoculated iron dextran-treated mice (Thiaville, P.C., et al., Infect. Immun., submitted). Furthermore, we identified 61 additional genes that are common to the clade 2 strains and an unusual highly virulent clade 1 strain but absent from a typical attenuated clade 1 strain and a biotype 2 eel isolate. These 61 genes represent a very interesting set that could contain generally clade 2-specific genes that were acquired by a clade 1 strain and increased its virulence to that of typical clade 2 strains. Among these putative virulence genes were genomic island XII identified by Cohen et al. [20], and most interesting was a set of genes involved with sialic acid catabolism. Jeong et al. [42] recently determined that the ability to utilize sialic acid for metabolism was essential for virulence of V. vulnificus. We are currently examining the possible roles of several of these loci in virulence.

At the time of our performing this genomic sequence analysis, we had not performed virulence studies of biotype 2 ATCC 33149 in our subcutaneously inoculated iron dextran-treated mouse model for infection. However, Amaro et al. [77] previously reported that ATCC 33149 was of modest virulence in a different mouse model involving intraperitoneal infection. Based on our results indicating that ATCC 33149 lacked the genes shared among virulent clade 2 strains or clade 2 strains plus virulent clade 1 99-738 DP-B5, we hypothesized that ATCC 33149 would be attenuated for virulence in our model. In fact, when administered at the standard lethal dose of 1,000 CFU for virulent strains, ATCC 33149 caused only minimally detectable skin infection in one of five mice. Furthermore, when administered at 100 times the typical lethal dose (10^5 CFU/mouse), it caused skin infection but no systemic infection or death. Therefore, our genomic analysis of ATCC 33149 correctly predicted its attenuated virulence. It should be noted that Amaro and Biosca [78] reported that some biotype 2 strains are virulent for mammals, so the attenuation of ATCC 33149 was not a foregone conclusion.

Because phenotypic differences are not only rooted in presence or absence of whole genes, but also nucleotide polymorphisms, we generated a set of SNPs among the shared sequences of the reference and newly sequenced genomes (Table 4 and Additional Files 6, 7, 8, 9, 10, 11, 12, and 13). By examining Sanger-derived sequences for a subset of SNPs, we determined that 98.4% of our reported SNPs are accurate. Of the 128 SNPs examined, only two in one gene of one strain were not confirmed by Sanger sequencing.

Table 4. Numbers of SNPs from each of the four sequenced genomes relative to the two chromosomes of the reference genomes

Strain | CMCP6 Chrom. 1 | CMCP6 Chrom. 2 | CMCP6 Total | YJ016 Chrom. 1 | YJ016 Chrom. 2 | YJ016 Total
M06-24/O | 23,752 | 17,390 | 41,142 | 25,530 | 17,709 | 43,239
99-738 DP-B5 | 46,469 | 27,457 | 73,926 | 46,833 | 27,152 | 73,985
99-520 DP-B8 | 46,059 | 26,440 | 72,499 | 46,156 | 26,223 | 72,379
ATCC 33149 | 46,259 | 26,355 | 72,614 | 47,549 | 25,828 | 73,377

Although the sample size of newly sequenced strains was small and each strain is a single representative of a unique genotype/virulence phenotype combination, some interesting relationships in SNPs were observed.
Most interesting, the rate of SNPs was significantly higher for genes encoded on chromosome 2 compared with chromosome 1. Given that chromosome 1 of Vibrio is believed to encode most of the essential genes and that chromosome 2 is believed to have been acquired exogenously [79], it is reasonable that the highest rate of polymorphisms would occur in chromosome 2. The number of SNPs between M06-24/O and the reference genomes was much lower than the numbers for the other three genomes (Table 4), even though there were slightly more genes identified in M06-24/O. Because M06-24/O is in the same clade as the reference genomes, this result would be expected. Significant differences were observed in the frequencies of SNPs among nearly every subset of genes examined, e.g., clade 2-specific, core genome, hypothetical proteins (Figure 2). However, it must be noted that the numbers of strains contributing to the SNP pool for these subsets of genes differ between the sets. For example, the core genome is shared among all six strains, so all four newly sequenced strains contributed SNPs and could have generated a higher frequency of SNPs. In contrast, for the clade 2-specific genes, the only newly sequenced strain contributing SNPs, by definition of the subset, was M06-24/O.

Figure 2 Distribution of SNPs Relative to the CMCP6 Genome and Subsets of Genes. Box and whisker plots of the SNPs/base for each of the subsets of annotated genes are shown.

By comparing the sequences shared among all six genomes, we identified the core V. vulnificus genome consisting of 3,459 genes. Gu et al. [43] examined the genomic sequences of all Vibrio species as of 2008 and identified 1,882 genes common to the genus. We are presently examining the core V. vulnificus genome to deduce possible metabolic and virulence characteristics of the species. We identified 20 genes previously unreported in V. vulnificus by using MAQ to compare the unmapped reads to the V. cholerae N16961 genome. If the clade 1 or biotype 2 genomes possessed sequences with sufficient similarity to the V. cholerae genome, we should have been able to identify and assemble them exactly as we did for the V. vulnificus reference genomes.
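The notions of a core genome and of group-specific genes used in this section can be pictured as simple set operations on per-strain gene presence calls. The following is a minimal sketch under stated assumptions, not the pipeline used in this study: the gene identifiers (g1 to g4) and the presence calls are invented toy data, the function names are hypothetical, and "specific" is assumed to mean present in every strain of the group and absent from every other strain.

```python
# Toy illustration only: gene IDs and presence calls are invented,
# and the function names are hypothetical.

def core_genome(presence):
    """Return the genes present in every strain."""
    gene_sets = list(presence.values())
    return set.intersection(*gene_sets) if gene_sets else set()


def group_specific(presence, group):
    """Return genes present in all strains of `group` and in no other strain.

    `group` must name at least one strain found in `presence`.
    """
    in_group = set.intersection(*(presence[s] for s in group))
    in_others = set().union(
        *(genes for strain, genes in presence.items() if strain not in group)
    )
    return in_group - in_others


# Strain names are from the study; the gene content below is made up.
presence = {
    "CMCP6": {"g1", "g2", "g3"},
    "YJ016": {"g1", "g2"},
    "M06-24/O": {"g1", "g2", "g4"},
    "99-738 DP-B5": {"g1", "g4"},
    "99-520 DP-B8": {"g1"},
    "ATCC 33149": {"g1", "g3"},
}

clade2 = ["CMCP6", "YJ016", "M06-24/O"]
print(sorted(core_genome(presence)))             # genes shared by all six strains
print(sorted(group_specific(presence, clade2)))  # genes found only in the clade 2 strains
```

With real data, each set would hold the locus tags called present for that strain, so the same two functions would yield the core and clade 2-specific gene lists under the definition assumed here.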
Most recently, Chun et al. [80] examined the genomes of 23 V. cholerae strains collected over 98 years. Their newly sequenced genomes were obtained using a combination of Sanger and 454 sequencing. Like us, they based their phylogenetic relationships primarily on presence or absence of ORFs. Their analysis enabled the division of that species into 12 lineages, with one comprising the O1 strains and the seventh pandemic comprising a nearly identical clade. They determined that horizontal gene transfer significantly contributed to the evolution of the species.

Conclusions

SOLiD sequencing of multiple bacterial genomes of V. vulnificus and subsequent comparative genomic analysis identified numerous genes that are common to the most virulent strains yet lacking from attenuated strains for which genomic DNA sequence data are available. These candidate virulence genes encode Flp pili, GGDEF proteins, and genomic island XII. Sialic acid catabolism was similarly identified as a potential contributory factor in molecular pathogenesis. These intriguing results will likely lead to a more thorough understanding of the molecular pathogenesis of V. vulnificus.

Methods

V. vulnificus strains

Each of the four V. vulnificus strains used for genomic sequencing was chosen to represent a specific combination of genotype and virulence phenotype. M06-24/O is a typical biotype 1, clade 2 strain that is highly virulent in our subcutaneously inoculated, iron dextran-treated mouse model. 99-520 DP-B8 is a typical biotype 1, clade 1 strain that is attenuated in our mouse model in that it can cause skin infection, but not systemic infection or death. 99-738 DP-B5 is an unusual biotype 1, clade 1 strain in that it is fully virulent in our mouse model.
ATCC 33149 is a biotype 2 strain that is highly attenuated for virulence in our mouse model. These data are summarized in Table 1.

SOLiD DNA sequencing

Sequencing runs were done using cycled ligation sequencing on a SOLiD™ Analyzer (Applied Biosystems, Beverly, MA) at the Interdisciplinary Center for Biotechnology Research at the University of Florida. Approximately 3 to 5 µg of purified bacterial genomic DNA was sheared into 80 to 100-bp short fragments with the Covaris™ S2 system according to the AB protocol. The sheared DNA was purified using a Qiagen MinElute reaction cleanup kit. The purified sheared fragments were made blunt-ended with the Epicentre End-It™ DNA end-repair kit and subsequently ligated to short SOLiD P1 and P2 adapters (P1, 41-bp: 5'-CCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT-3'; P2, 23-bp: 5'-AGAGAATGAGGAACCCGGGGCAG-3'), which provide the primary sequences for both amplification and sequencing of the sample library fragments. Adapter-ligated DNA was then purified using the Agencourt kit. The reaction conditions were optimized to selectively bind DNA 100-bp and larger. At this point, DNA was nick-translated and resolved on a 4% agarose gel, from which the 120 to 180-bp fragments were excised. The fractionated DNA was subjected to 8 to 10 cycles of PCR amplification. The number of PCR cycles needed for amplification was determined by the ability to visualize the amplified product on a 2.2% Lonza flash gel. The amplified PCR products were purified and then quantified using an Agilent 2100 bioanalyzer.

In preparation for sequencing, the DNA fragments were clonally amplified by emulsion PCR by using 1.6 billion 1-µm beads with P1 primer covalently attached to the surface. Emulsions were broken with butanol, and ePCR beads were enriched for template-positive beads by hybridization with P2-coated capture beads (SOLiD reagent, Applied Biosystems). Template-enriched beads were extended at the 3' end in the presence of terminal transferase and 3' bead linker. About 60 million beads with clonally amplified DNA were then deposited onto one-fourth of a derivatized glass surface of a 25 mm × 75 mm SOLiD™ slide. The slide was then loaded onto a SOLiD instrument, and the 35-base sequences were obtained according to the manufacturer's protocol.

DNA sequence data management

The color space reads from SOLiD sequencing were aligned to the genomes of V. vulnificus strains CMCP6 (GenBank accession numbers AE016795 and AE016796) and YJ016 (GenBank accession numbers BA000037, BA000038, AP005352) using MAQ [27]. Reads from each of the four sample strains were mapped to the two reference sequences separately. Reads unmapped in both reference genomes were identified. Mapped reads were used to develop a consensus sequence for each of the four strains. For each strain relative to the two reference sequences, a gene was determined to be absent when the average depth of coverage over the open reading frame was less than 5X. Consensus sequences were also used to generate a list of SNPs among the six strains of V. vulnificus using the MAQ cns2snp command [27].

Reads with low-complexity characteristics, defined as containing a homopolymer run of at least 5 bases, at least four repeats of the same dinucleotide in a row, or at least four repeats of the same trinucleotide in a row, were removed from the dataset before further analysis. While these reads may represent true genomic regions, the difficulty in assigning them to a particular genomic region limits their value. This is an inherent problem with low complexity genomes and short read data. Reads unmapped in both reference sequences were then compared to the V. cholerae N16961 reference genome using MAQ [27]. Windows of 100 nucleotides in the V. cholerae genome with a read depth of five or more were identified. Regions where five or more windows occurred in tandem were retained, while those with coverage less than five were discarded. Reads that initially mapped to a lower density area of the V. cholerae genome were re-examined for possible matches to the tandem windows.
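The three numerical criteria described in this section (the low-complexity read filter, the 5X gene-absence call, and the retention of five or more tandem 100-nt windows) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used in the study: the function names are hypothetical, reads are assumed to be base-space strings rather than SOLiD color space, per-base depth is assumed to be available as a list of numbers, coordinates are assumed to be 0-based and half-open, and the depth of a window is assumed to be its mean per-base depth over non-overlapping windows.

```python
import re
from statistics import mean

# Homopolymer of >= 5 bases, or >= 4 tandem copies of a di- or trinucleotide.
LOW_COMPLEXITY = re.compile(r"(.)\1{4,}|(..)\2{3,}|(...)\3{3,}")


def is_low_complexity(read):
    """True if the read meets any of the three low-complexity criteria."""
    return LOW_COMPLEXITY.search(read) is not None


def gene_absent(depths, start, end, min_mean_depth=5.0):
    """True if mean depth over the ORF [start, end) is below the cutoff.

    The interval is assumed to be non-empty.
    """
    return mean(depths[start:end]) < min_mean_depth


def tandem_window_regions(depths, window=100, min_depth=5.0, min_windows=5):
    """Return (start, end) spans built from runs of passing windows.

    A window passes when its mean depth is >= min_depth (an assumption);
    a run is kept only if it contains at least min_windows windows.
    A trailing partial window is ignored.
    """
    regions = []
    run_start = None
    n_windows = len(depths) // window
    # One extra iteration acts as a sentinel that closes a run at the end.
    for w in range(n_windows + 1):
        passing = (
            w < n_windows
            and mean(depths[w * window:(w + 1) * window]) >= min_depth
        )
        if passing and run_start is None:
            run_start = w
        elif not passing and run_start is not None:
            if w - run_start >= min_windows:
                regions.append((run_start * window, w * window))
            run_start = None
    return regions


# Toy depth profile: a 500-nt covered stretch and a 300-nt covered stretch.
toy_depths = [0] * 100 + [6] * 500 + [0] * 100 + [6] * 300
print(is_low_complexity("ACGTAAAAAC"))    # contains a run of five A's
print(gene_absent(toy_depths, 0, 100))    # uncovered interval
print(tandem_window_regions(toy_depths))  # only the five-window run is kept
```

In the toy profile, the 300-nt stretch spans only three passing windows and is therefore dropped, which mirrors the rule that fewer than five tandem windows are discarded.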
In parallel to the V. cholerae exploration, the unmapped reads were examined for similarity to V. vulnificus biotype 2 plasmids pR99 (accession # AM293858), pC4602-1 (accession # AM293859), and pC4602-2 (accession # AM293860) using MAQ and the same criteria as above.

Bioinformatic analysis

Functional analysis and annotation analysis of the V. vulnificus YJ016 and CMCP6 genes were done using the Pathway Tools Omics Viewer from the BioCyc platform [81] and the SEED database [82].

Additional material

Additional file 1: Table S1: Coverage of the V. vulnificus biotype 2 plasmids by newly sequenced reads. SOLiD sequencing reads of each of the four newly sequenced genomes were matched with the three plasmids of V. vulnificus biotype 2 using MAQ. The size of each plasmid is shown. *Number of nucleotides of the reference plasmid with less than 10-fold coverage by 35-nt reads from the newly sequenced genome. **Number of nucleotides that were matched by virtue of having 10-fold or greater coverage depth. ***Percent of reference plasmid matched to the newly sequenced genome.

Additional file 2: Table S2: Identification of ORFs in newly sequenced V. vulnificus genomes by matching with the V. cholerae N16961 genome. SOLiD sequencing reads of each of the four newly sequenced genomes were matched with the V. cholerae N16961 genome using MAQ, as described in the Methods. V. vulnificus strains: M06 - M06-24/O, B5 - 99-738 DP-B5, B8 - 99-520 DP-B8, ATCC - ATCC 33149. Genes were considered matched if there was five-fold or higher depth of coverage over five tandem 100-nt windows.

Additional file 3: Table S3A: Matches of CMCP6 genes from the YJ016 reference genome and the four newly sequenced genomes. The CMCP6 genes are shown by their tag, gene name (if annotated), and product (if known). Matching of each gene with the newly sequenced genomes was determined using MAQ, as described in the Methods. Matches with the YJ016 genome were obtained using GenPlot at http://www.ncbi.nlm.nih.gov using default parameters. Genes from each queried genome that were not matched to the CMCP6 genome are indicated with an X. If a CMCP6 gene is missing from all of the other five genomes, it is indicated in the CMCP6-Specific column. V. vulnificus strains: M06 - M06-24/O, B5 - 99-738 DP-B5, B8 - 99-520 DP-B8, ATCC - ATCC 33149.

Additional file 4: Table S3B: Matches of YJ016 genes from the CMCP6 reference genome and the four newly sequenced genomes. The YJ016 genes are shown by their tag, gene name (if annotated), and product (if known). Matching of each gene with the newly sequenced genomes was determined using MAQ, as described in the Methods. Matches with the CMCP6 genome were obtained using GenPlot at http://www.ncbi.nlm.nih.gov using default parameters. Genes from each queried genome that were not matched to the YJ016 genome are indicated with an X. If a YJ016 gene is missing from all of the other five genomes, it is indicated in the YJ016-Specific column. V. vulnificus strains: M06 - M06-24/O, B5 - 99-738 DP-B5, B8 - 99-520 DP-B8, ATCC - ATCC 33149.

Additional file 5: Table S4: The core V. vulnificus genome. Genes that were present in the two reference genomes and each of the four newly sequenced genomes are shown using the CMCP6 tag, product, gene name, and COG.

Additional file 6: Table S5A: SNP analysis of V. vulnificus M06-24/O compared with the CMCP6 reference genome. MAQ was used to identify SNPs from the SOLiD sequencing reads from M06-24/O compared with the CMCP6 reference genome, as described in the Methods. Pos. - Position of the nucleotide in the genomic element. Ref. - Reference base in the reference genome. Con. - Consensus base in the newly sequenced genome. Con. QS - Consensus Quality Score. Read depth - Depth of coverage at the chosen nucleotide. Avg. hits - Average number of hits of reads covering the position. HMQ - Highest mapping quality of reads covering the position. MCQ - Minimum consensus quality in the 3-bp flanking region on each side of the site. 2nd - Second best call for the nucleotide. LLR - Log likelihood ratio of the second and third best call. 3rd - Third best call.

Additional file 7: Table S5B: SNP analysis of V. vulnificus M06-24/O compared with the YJ016 reference genome. MAQ was used to identify SNPs from the SOLiD sequencing reads from M06-24/O compared with the YJ016 reference genome, as described in the Methods. Column headings are as for Additional File 6, Table S5A.
Additional file 8: Table S6A: SNP analysis of V. vulnificus 99-738 DP-B5 compared with the CMCP6 reference genome. MAQ was used to identify SNPs from the SOLiD sequencing reads from 99-738 DP-B5 compared with the CMCP6 reference genome, as described in the Methods. Column headings are as for Additional File 6, Table S5A.

Additional file 9: Table S6B: SNP analysis of V. vulnificus 99-738 DP-B5 compared with the YJ016 reference genome. MAQ was used to identify SNPs from the SOLiD sequencing reads from 99-738 DP-B5 compared with the YJ016 reference genome, as described in the Methods. Column headings are as for Additional File 6, Table S5A.

Additional file 10: Table S7A: SNP analysis of V. vulnificus 99-520 DP-B8 compared with the CMCP6 reference genome. MAQ was used to identify SNPs from the SOLiD sequencing reads from 99-520 DP-B8 compared with the CMCP6 reference genome, as described in the Methods. Column headings are as for Additional File 6, Table S5A.

Additional file 11: Table S7B: SNP analysis of V. vulnificus 99-520 DP-B8 compared with the YJ016 reference genome. MAQ was used to identify SNPs from the SOLiD sequencing reads from 99-520 DP-B8 compared with the YJ016 reference genome, as described in the Methods. Column headings are as for Additional File 6, Table S5A.

Additional file 12: Table S8A: SNP analysis of V. vulnificus ATCC 33149 compared with the CMCP6 reference genome. MAQ was used to identify SNPs from the SOLiD sequencing reads from ATCC 33149 compared with the CMCP6 reference genome, as described in the Methods. Column headings are as for Additional File 6, Table S5A.

Additional file 13: Table S8B: SNP analysis of V. vulnificus ATCC 33149 compared with the YJ016 reference genome. MAQ was used to identify SNPs from the SOLiD sequencing reads from ATCC 33149 compared with the YJ016 reference genome, as described in the Methods. Column headings are as for Additional File 6, Table S5A.

Acknowledgements

We thank Robert Edwards for the initial BLAST analysis of the SOLiD sequencing data. We thank Patrick Thiaville for critical review of this manuscript.

This work was supported by funding from the University of Florida Emerging Pathogens Institute, The University of Florida Opportunity Fund, and Florida Sea Grant. Publication of this article was funded in part by the University of Florida Open-Access Publishing Fund.
Author details

1 Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida, USA. 2 Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida, USA. 3 Department of Food Science and Human Nutrition, University of Florida, Gainesville, Florida, USA. 4 Department of Genetics, University of Melbourne, 3010 Australia.

Authors' contributions

PAG planned and coordinated the research, analyzed data, and wrote the manuscript. VDC contributed to data analysis and writing. ACW contributed to planning and writing. BW performed MAQ data analysis and planning. MTS contributed to the writing of the manuscript. LMM helped plan the study, planned analyses, and contributed to the writing of the manuscript. All authors read and approved the final manuscript.

Received: 15 March 2010 Accepted: 24 September 2010 Published: 24 September 2010

References

1. Gulig PA, Bourdage KL, Starks AM: Molecular Pathogenesis of Vibrio vulnificus. J Microbiol 2005, 43:118-131.
2. Oliver JD, Jones MK: Vibrio vulnificus: Disease and pathogenesis. Infect Immun 2009, 77:1723-1733.
3. Simpson LM, White VK, Zane SF, Oliver JD: Correlation between virulence and colony morphology in Vibrio vulnificus. Infect Immun 1987, 55:269-272.
4. Wright AC, Simpson LM, Oliver JD, Morris JG Jr: Phenotypic evaluation of acapsular transposon mutants of Vibrio vulnificus. Infect Immun 1990, 58:1769-1773.
5. Liu M, Alice AF, Naka H, Crosa JH: The HlyU protein is a positive regulator of rtxA1, a gene responsible for cytotoxicity and virulence in the human pathogen Vibrio vulnificus. Infect Immun 2007, 75:3282-3289.
6. Lee JH, Kim MW, Kim BS, Kim SM, Lee BC, Kim TS, Choi SH: Identification and characterization of the Vibrio vulnificus rtxA essential for cytotoxicity in vitro and virulence in mice. J Microbiol 2007, 45:146-152.
7. Kim YR, Lee SE, Kook H, Yeom JA, Na HS, Kim SY, Chung SS, Choy HE, Rhee JH: Vibrio vulnificus RTX toxin kills host cells only after contact of the bacteria with host cells. Cell Microbiol 2008, 10:848-862.
8. Litwin CM, Rayback TW, Skinner J: Role of catechol siderophore synthesis in Vibrio vulnificus virulence. Infect Immun 1996, 64:2834-2838.
9. Wright AC, Simpson LM, Oliver JD: Role of iron in the pathogenesis of Vibrio vulnificus infections. Infect Immun 1981, 34:503-507.
10. Paranjpye RN, Strom MS: A Vibrio vulnificus type IV pilin contributes to biofilm formation, adherence to epithelial cells, and virulence. Infect Immun 2005, 73:1411-1422.
11. Paranjpye RN, Lara JC, Pepe JC, Pepe CM, Strom MS: The type IV leader peptidase/N-methyltransferase of Vibrio vulnificus controls factors required for adherence to HEp-2 cells and virulence in iron-overloaded mice. Infect Immun 1998, 66:5659-5668.
12. Kim YR, Rhee JH: Flagellar basal body flg operon as a virulence determinant of Vibrio vulnificus. Biochem Biophys Res Commun 2003, 304:405-410.
13. Lee JH, Rho JB, Park KJ, Kim CB, Han YS, Choi SH, Lee KH, Park SJ: Role of flagellum and motility in pathogenesis of Vibrio vulnificus. Infect Immun 2004, 72:4905-4910.
14. Tison DL, Nishibuchi M, Greenwood JD, Seidler RJ: Vibrio vulnificus biogroup 2: new biogroup pathogenic for eels. Appl Environ Microbiol 1982, 44:640-646.
15. Bisharat N, Agmon V, Finkelstein R, Raz R, Ben Dror G, Lerner L, Soboh S, Colodner R, Cameron DN, Wykstra DL, Swerdlow DL, Farmer JJ Jr: Clinical, epidemiological, and microbiological features of Vibrio vulnificus biogroup 3 causing outbreaks of wound infection and bacteraemia in Israel. Israel Vibrio Study Group. Lancet 1999, 354:1421-1424.
16. Nilsson WB, Paranjpye RN, DePaola A, Strom MS: Sequence polymorphism of the 16S rRNA gene of Vibrio vulnificus is a possible indicator of strain virulence. J Clin Microbiol 2003, 41:442-446.
17. Gonzalez-Escalona N, Jaykus LA, DePaola A: Typing of Vibrio vulnificus strains by variability in their 16S-23S rRNA intergenic spacer regions. Foodborne Pathog Dis 2007, 4:327-337.
18. Bisharat N, Cohen DI, Harding RM, Falush D, Crook DW, Peto T, Maiden MC: Hybrid Vibrio vulnificus. Emerg Infect Dis 2005, 11:30-35.
19. Bisharat N, Cohen DI, Maiden MC, Crook DW, Peto T, Harding RM: The evolution of genetic structure in the marine pathogen, Vibrio vulnificus. Infect Genet Evol 2007, 7:685-693.
20. Cohen AL, Oliver JD, DePaola A, Feil EJ, Boyd EF: Emergence of a virulent clade of Vibrio vulnificus and correlation with the presence of a 33-kilobase genomic island. Appl Environ Microbiol 2007, 73:5553-5565.
21. Rosche TM, Yano Y, Oliver JD: A rapid and simple PCR analysis indicates there are two subgroups of Vibrio vulnificus which correlate with clinical or environmental isolation. Microbiol Immunol 2005, 49:381-389.
22. Drake SL, Whitney B, Levine JF, DePaola A, Jaykus LA: Correlation of mannitol fermentation with virulence-associated genotypic characteristics in Vibrio vulnificus isolates from oysters and water samples in the Gulf of Mexico. Foodborne Pathog Dis 2010, 7:97-101.
23. Starks AM, Schoeb TR, Tamplin ML, Parveen S, Doyle TJ, Bomeisl PE, Escudero GM, Gulig PA: Pathogenesis of infection by clinical and environmental strains of Vibrio vulnificus in iron dextran-treated mice. Infect Immun 2000, 68:5785-5793.
24. Starks AM, Bourdage KL, Thiaville PC, Gulig PA: Use of a marker plasmid to examine growth and death of Vibrio vulnificus in infected mice. Mol Microbiol 2006, 61:310-323.
25. DePaola A, Nordstrom JL, Dalsgaard A, Forslund A, Oliver JD, Bates T, Bourdage KL, Gulig PA: Analysis of Vibrio vulnificus from market oysters and septicemia cases for virulence markers. Appl Environ Microbiol 2003, 69:4006-4011.
26. Biosca EG, Llorens H, Garay E, Amaro C: Presence of a capsule in Vibrio vulnificus biotype 2 and its relationship to virulence for eels. Infect Immun 1993, 61:1611-1618.


27. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008, 18:1851-1858.
28. Lee CT, Amaro C, Wu KM, Valiente E, Chang YF, Tsai SF, Chang CH, Hor LI: A common virulence plasmid in biotype 2 Vibrio vulnificus and its dissemination aided by a conjugal plasmid. J Bacteriol 2008, 190:1638-1648.
29. Davidson LS, Oliver JD: Plasmid carriage in Vibrio vulnificus and other lactose-fermenting marine vibrios. Appl Environ Microbiol 1986, 52:211-213.
30. Chen CY, Wu KM, Chang YC, Chang CH, Tsai HC, Liao TL, Liu YM, Chen HJ, Shen AB, Li JC, Su TL, Shao CP, Lee CT, Hor LI, Tsai SF: Comparative genome analysis of Vibrio vulnificus, a marine pathogen. Genome Res 2003, 13:2577-2587.
31. Labbate M, Case RJ, Stokes HW: The integron/gene cassette system: an active player in bacterial adaptation. Methods Mol Biol 2009, 532:103-125.
32. Mazel D, Dychinco B, Webb VA, Davies J: A distinctive class of integron in the Vibrio cholerae genome. Science 1998, 280:605-608.
33. Cotter PA, Stibitz S: c-di-GMP-mediated regulation of virulence and biofilm formation. Curr Opin Microbiol 2007, 10:17-23.
34. Kachlany SC, Planet PJ, DeSalle R, Fine DH, Figurski DH, Kaplan JB: flp-1, the first representative of a new pilin gene subfamily, is required for nonspecific adherence of Actinobacillus actinomycetemcomitans. Mol Microbiol 2001, 40:542-554.
35. Roberts DD, Ginsburg V: Sulfated glycolipids and cell adhesion. Arch Biochem Biophys 1988, 267:405-415.
36. Bhat S, Spitalnik SL, Gonzalez-Scarano F, Silberberg DH: Galactosyl ceramide or a derivative is an essential component of the neural receptor for human immunodeficiency virus type 1 envelope glycoprotein gp120. Proc Natl Acad Sci USA 1991, 88:7131-7134.
37. Hannah JH, Menozzi FD, Renauld G, Locht C, Brennan MJ: Sulfated glycoconjugate receptors for the Bordetella pertussis adhesin filamentous hemagglutinin (FHA) and mapping of the heparin-binding domain on FHA. Infect Immun 1994, 62:5010-5019.
38. Kamisago S, Iwamori M, Tai T, Mitamura K, Yazaki Y, Sugano K: Role of sulfatides in adhesion of Helicobacter pylori to gastric cancer cells. Infect Immun 1996, 64:624-628.
39. Hoffman JA, Badger JL, Zhang Y, Huang SH, Kim KS: Escherichia coli K1 aslA contributes to invasion of brain microvascular endothelial cells in vitro and in vivo. Infect Immun 2000, 68:5062-5067.
40. Bryant RG, Jarvis J, Janda JM: Use of sodium dodecyl sulfate-polymyxin B-sucrose medium for isolation of Vibrio vulnificus from shellfish. Appl Environ Microbiol 1987, 53:1556-1559.
41. Almagro-Moreno S, Boyd EF: Insights into the evolution of sialic acid catabolism among bacteria. BMC Evol Biol 2009, 9:118.
42. Jeong HG, Oh MH, Kim BS, Lee MY, Han HJ, Choi SH: The capability of catabolic utilization of N-acetylneuraminic acid, a sialic acid, is essential for Vibrio vulnificus pathogenesis. Infect Immun 2009, 77:3209-3217.
43. Gu J, Neary J, Cai H, Moshfeghian A, Rodriguez SA, Lilburn TG, Wang Y: Genomic and systems evolution in Vibrionaceae species. BMC Genomics 2009, 10(Suppl 1):S11.
44. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, McKenney K, Sutton G, Fitzhugh W, Fields C, Gocayne JD, Scott J, Shirley R, Liu LI, Glodek A, Kelley JM, Weidman JF, Phillips CA, Spriggs T, Hedblom E, Cotton MD, Utterback TR, Hanna MC, Nguyen DT, Saudek DM, Brandon RC, Fine LD, Fritchman JL, Fuhrmann JL, Geoghagen NSM, Gnehm CL, McDonald LA, Small KV, Fraser CM, Smith HO, Venter JC: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995, 269:496-512.
45. Hall N: Advanced sequencing technologies and their wider impact in microbiology. J Exp Biol 2007, 210:1518-1525.
46. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 1977, 74:5463-5467.
47. Hudson ME: Sequencing breakthroughs for genomic ecology and evolutionary biology. Mol Ecol Resour 2008, 8:3-17.
48. Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet 2008, 24:133-141.
49. Morozova O, Marra MA: Applications of next-generation sequencing technologies in functional genomics. Genomics 2008, 92:255-264.
50. Pettersson E, Lundeberg J, Ahmadian A: Generations of sequencing technologies. Genomics 2009, 93:105-111.
51. Rothberg JM, Leamon JH: The development and impact of 454 sequencing. Nat Biotechnol 2008, 26:1117-1124.
52. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen ZT, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu PG, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437:376-380.
53. Hiller NL, Janto B, Hogg JS, Boissy R, Yu S, Powell E, Keefe R, Ehrlich NE, Shen K, Hayes J, Barbadora K, Klimke W, Dernovoy D, Tatusova T, Parkhill J, Bentley SD, Post JC, Ehrlich GD, Hu FZ: Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome. J Bacteriol 2007, 189:8186-8195.
54. Adams MD, Goglin K, Molyneaux N, Hujer KM, Lavender H, Jamison JJ, MacDonald IJ, Martin KM, Russo T, Campagnari AA, Hujer AM, Bonomo RA, Gill SR: Comparative genome sequence analysis of multidrug-resistant Acinetobacter baumannii. J Bacteriol 2008, 190:8053-8064.
55. Snyder LAS, Loman N, Pallen MJ, Penn CW: Next-generation sequencing - the promise and perils of charting the great microbial unknown. Microbial Ecology 2009, 57:1-3.
56. Ley RE, Turnbaugh PJ, Klein S, Gordon JI: Microbial ecology - Human gut microbes associated with obesity. Nature 2006, 444:1022-1023.
57. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, Egholm M, Henrissat B, Heath AC, Knight R, Gordon JI: A core gut microbiome in obese and lean twins. Nature 2009, 457:480-484.
58. Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR, Arrieta JM, Herndl GJ: Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci USA 2006, 103:12115-12120.
59. Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD, Moran NA, Quan PL, Briese T, Hornig M, Geiser DM, Martinson V, vanEngelsdorp D, Kalkstein AL, Drysdale A, Hui J, Zhai JH, Cui LW, Hutchison SK, Simons JF, Egholm M, Pettis JS, Lipkin WI: A metagenomic survey of microbes in honey bee colony collapse disorder. Science 2007, 318:283-287.
60. Aury JM, Cruaud C, Barbe V, Rogier O, Mangenot S, Samson G, Poulain J, Anthouard V, Scarpelli C, Artiguenave F, Wincker P: High quality draft sequences for prokaryotic genomes using a mix of new sequencing technologies. BMC Genomics 2008, 9:11.
61. Loman NJ, Pallen MJ: XDR-TB genome sequencing: a glimpse of the microbiology of the future. Future Microbiol 2008, 3:111-113.
62. Loman NJ, Snyder LAS, Linton JD, Langdon R, Lawson AJ, Weinstock GM, Wren BW, Pallen MJ: Genome sequence of the emerging pathogen Helicobacter canadensis. J Bacteriol 2009, 191:5566-5567.
63. Qi W, Kaser M, Roltgen K, Yeboah-Manu D, Pluschke G: Genomic diversity and evolution of Mycobacterium ulcerans revealed by next-generation sequencing. PLoS Pathogens 2009, 5:e1000580.
64. Reinhardt JA, Baltrus DA, Nishimura MT, Jeck WR, Jones CD, Dangl JL: De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res 2009, 19:294-305.
65. Studholme DJ, Ibanez SG, MacLean D, Dangl JL, Chang JH, Rathjen JP: A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528. BMC Genomics 2009, 10:19.
66. Srivatsan A, Han Y, Peng JL, Tehranchi AK, Gibbs R, Wang JD, Chen R: High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies. PLoS Genet 2008, 4:14.
67. Holt KE, Parkhill J, Mazzoni CJ, Roumagnac P, Weill FX, Goodhead I, Rance R, Baker S, Maskell DJ, Wain J, Dolecek C, Achtman M, Dougan G: High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nature Genet 2008, 40:987-993.
68. Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M, Huang WC, Magrini VJ, Richt RJ, Sander SN, Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl T, Wilson RK, Mardis ER: Whole-genome sequencing and variant discovery in C. elegans. Nat Methods 2008, 5:183-188.
69. Sarin S, Prabhu S, O'Meara MM, Pe'er I, Hobert O: Caenorhabditis elegans mutant allele identification by whole-genome sequencing. Nat Methods 2008, 5:865-867.
70. Shen Y, Sarin S, Liu Y, Hobert O, Pe'er I: Comparing platforms for C. elegans mutant identification using high-throughput whole-genome sequencing. PLoS One 2008, 3:e4012.
71. Smith DR, Quinlan AR, Peckham HE, Makowsky K, Tao W, Woolf B, Shen L, Donahue WF, Tusneem N, Stromberg MP, Stewart DA, Zhang L, Ranade SS, Warner JB, Lee CC, Coleman BE, Zhang Z, McLaughlin SF, Malek JA, Sorenson JM, Blanchard AP, Chapman J, Hillman D, Chen F, Rokhsar DS, McKernan KJ, Jeffries TW, Marth GT, Richardson PM: Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res 2008, 18:1638-1642.
72. Antipova AA, Sokolsky TD, Clouser CR, Dimalanta ET, Hendrickson CL, Kosnopo C, Lee CC, Ranade SS, Zhang L, Blanchard AP, McKernan KJ: Polymorphism discovery in high-throughput resequenced microarray-enriched human genomic loci. Journal of Biomolecular Techniques 2009, 5:253-257.
73. Chen W, Ullmann R, Langnick C, Menzel C, Wotschofsky Z, Hu H, Doring A, Hu Y, Kang H, Tzschach A, Hoeltzenbein M, Neitzel H, Markus S, Wiedersberg E, Kistner G, van Ravenswaaij-Arts CM, Kleefstra T, Kalscheuer VM, Ropers HH: Breakpoint analysis of balanced chromosome rearrangements by next-generation paired-end sequencing. European Journal of Human Genetics 2009, 18:539-543.
74. McKernan KJ, Peckham HE, Costa GL, McLaughlin SF, Fu YT, Tsung EF, Clouser CR, Duncan C, Ichikawa JK, Lee CC, Zhang Z, Ranade SS, Dimalanta ET, Hyland FC, Sokolsky TD, Zhang L, Sheridan A, Fu HN, Hendrickson CL, Li B, Kotler L, Stuart JR, Malek JA, Manning JM, Antipova AA, Perez DS, Moore MP, Hayashibara KC, Lyons MR, Beaudoin RE, Coleman BE, Laptewicz MW, Sannicandro AE, Rhodes MD, Gottimukkala RK, Yang S, Bafna V, Bashir A, MacBride A, Alkan C, Kidd JM, Eichler EE, Reese MG, De la Vega FM, Blanchard AP: Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res 2009, 19:1527-1541.
75. Durfee T, Nelson R, Baldwin S, Plunkett G, Burland V, Mau B, Petrosino JF, Qin X, Muzny DM, Ayele M, Gibbs RA, Csorgo B, Posfai G, Weinstock GM, Blattner FR: The complete genome sequence of Escherichia coli DH10B: Insights into the biology of a laboratory workhorse. J Bacteriol 2008, 190:2597-2606.
76. Cummings CA, Bormann Chung CA, Fang R, Barker M, Brzoska PM, Williamson P, Beaudry JA, Matthews M, Schupp JM, Wagner DM, Furtado MR, Kiem P, Budowle B: Whole-genome typing of Bacillus anthracis isolates by next-generation sequencing accurately and rapidly identifies strain-specific diagnostic polymorphisms. Forensic Science International: Genetics Supplement Series 2009, 300-301.
77. Amaro C, Biosca EG, Fouz B, Toranzo AE, Garay E: Role of iron, capsule, and toxins in the pathogenicity of Vibrio vulnificus biotype 2 for mice. Infect Immun 1994, 62:759-763.
78. Amaro C, Biosca EG: Vibrio vulnificus biotype 2, pathogenic for eels, is also an opportunistic pathogen for humans. Appl Environ Microbiol 1996, 62:1454-1457.
79. Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Umayam L, Gill SR, Nelson KE, Read TD, Tettelin H, Richardson D, Ermolaeva MD, Vamathevan J, Bass S, Qin H, Dragoi I, Sellers P, McDonald L, Utterback T, Fleishmann RD, Nierman WC, White O: DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 2000, 406:477-483.
80. Chun J, Grim CJ, Hasan NA, Lee JH, Choi SY, Haley BJ, Taviani E, Jeon YS, Kim DW, Lee JH, Brettin TS, Bruce DC, Challacombe JF, Detter JC, Han CS, Munk AC, Chertkov O, Meincke L, Saunders E, Walters RA, Huq A, Nair GB, Colwell RR: Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae. Proc Natl Acad Sci USA 2009, 106:15442-15447.
81. Paley SM, Karp PD: The Pathway Tools cellular overview diagram and Omics Viewer. Nucleic Acids Res 2006, 34:3771-3778.
82. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, Crecy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Ruckert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 2005, 33:5691-5702.
83. Mahmud ZH, Wright AC, Mandal SC, Dai J, Jones MK, Hasan M, Rashid MH, Islam MS, Johnson JA, Gulig PA, Morris JG Jr, Ali A: Genetic characterization of Vibrio vulnificus strains from tilapia aquaculture in Bangladesh. Appl Environ Microbiol 2010, 76:4890-4895.
84. Vickery MC, Nilsson WB, Strom MS, Nordstrom JL, DePaola A: A real-time PCR assay for the rapid determination of 16S rRNA genotype in Vibrio vulnificus. J Microbiol Methods 2007, 68:376-384.

doi:10.1186/1471-2164-11-512
Cite this article as: Gulig et al.: SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes. BMC Genomics 2010 11:512.