Group Title: BMC Genomics
Title: Comparative metagenomics of Daphnia symbionts
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00099933/00001
 Material Information
Title: Comparative metagenomics of Daphnia symbionts
Physical Description: Book
Language: English
Creator: Qi, Weihong
Nong, Guang
Preston, James
Ben-Ami, Frida
Ebert, Dieter
Publisher: BMC Genomics
Publication Date: 2009
 Notes
Abstract: BACKGROUND: Shotgun sequences of DNA extracts from whole organisms allow a comprehensive assessment of possible symbionts. The current project makes use of four shotgun datasets from three species of the planktonic freshwater crustaceans Daphnia: one dataset from clones of D. pulex and D. pulicaria and two datasets from one clone of D. magna. We analyzed these datasets with three aims: First, we search for bacterial symbionts, which are present in all three species. Second, we search for evidence for Cyanobacteria and plastids, which had been suggested to occur as symbionts in a related Daphnia species. Third, we compare the metacommunities revealed by two different 454 pyrosequencing methods (GS 20 and GS FLX).
RESULTS: In all datasets we found evidence for a large number of bacteria belonging to diverse taxa. The vast majority of these were Proteobacteria. Of those, most sequences were assigned to different genera of the Betaproteobacteria family Comamonadaceae. Other taxa represented in all datasets included the genera Flavobacterium, Rhodobacter, Chromobacterium, Methylibium, Bordetella, Burkholderia and Cupriavidus. A few taxa matched sequences only from the D. pulex and the D. pulicaria datasets: Aeromonas, Pseudomonas and Delftia. Taxa with many hits specific to a single dataset were rare. For most of the identified taxa earlier studies reported the finding of related taxa in aquatic environmental samples. We found no clear evidence for the presence of symbiotic Cyanobacteria or plastids. The apparent similarity of the symbiont communities of the three Daphnia species breaks down on a species and strain level. Communities have a similar composition at a higher taxonomic level, but the actual sequences found are divergent. The two Daphnia magna datasets obtained from two different pyrosequencing platforms revealed rather similar results.
CONCLUSION: Three clones from three species of the genus Daphnia were found to harbor a rich community of symbionts. These communities are similar at the genus and higher taxonomic level, but are composed of different species. The similarity of these three symbiont communities hints that some of these associations may be stable in the long-term.
General Note: Periodical Abbreviation:BMC Genomics
General Note: Start page 172
General Note: M3: 10.1186/1471-2164-10-172
 Record Information
Bibliographic ID: UF00099933
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/
Resource Identifier: issn - 1471-2164
http://www.biomedcentral.com/1471-2164/10/172

Full Text


BMC Genomics BioMed Central



Research article

Comparative metagenomics of Daphnia symbionts
Weihong Qi1,4, Guang Nong2, James F Preston2, Frida Ben-Ami3 and
Dieter Ebert*3


Address: 1Swiss Tropical Institute, Socinstrasse 57, 4002 Basel, Switzerland, 2Department of Microbiology and Cell Sciences, University of Florida,
Gainesville, FL 32611, USA, 3Zoological Institute, Basel University, Vesalgasse 1, 4051 Basel, Switzerland and 4Functional Genomics Center
Zurich, UNI/ETH Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
Email: Weihong Qi weihong.qi@fgcz.ethz.ch; Guang Nong gnong@ufl.edu; James F Preston jpreston@ufl.edu; Frida Ben-Ami frida.ben-ami@unibas.ch;
Dieter Ebert* dieter.ebert@unibas.ch
* Corresponding author



Published: 21 April 2009
Received: 4 March 2008
Accepted: 21 April 2009
BMC Genomics 2009, 10:172 doi:10.1186/1471-2164-10-172
This article is available from: http://www.biomedcentral.com/1471-2164/10/172
© 2009 Qi et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Abstract
Background: Shotgun sequences of DNA extracts from whole organisms allow a comprehensive
assessment of possible symbionts. The current project makes use of four shotgun datasets from
three species of the planktonic freshwater crustaceans Daphnia: one dataset from clones of D. pulex
and D. pulicaria and two datasets from one clone of D. magna. We analyzed these datasets with
three aims: First, we search for bacterial symbionts, which are present in all three species. Second,
we search for evidence for Cyanobacteria and plastids, which had been suggested to occur as
symbionts in a related Daphnia species. Third, we compare the metacommunities revealed by two
different 454 pyrosequencing methods (GS 20 and GS FLX).
Results: In all datasets we found evidence for a large number of bacteria belonging to diverse taxa.
The vast majority of these were Proteobacteria. Of those, most sequences were assigned to
different genera of the Betaproteobacteria family Comamonadaceae. Other taxa represented in all
datasets included the genera Flavobacterium, Rhodobacter, Chromobacterium, Methylibium, Bordetella,
Burkholderia and Cupriavidus. A few taxa matched sequences only from the D. pulex and the D.
pulicaria datasets: Aeromonas, Pseudomonas and Delftia. Taxa with many hits specific to a single
dataset were rare. For most of the identified taxa earlier studies reported the finding of related
taxa in aquatic environmental samples. We found no clear evidence for the presence of symbiotic
Cyanobacteria or plastids. The apparent similarity of the symbiont communities of the three
Daphnia species breaks down on a species and strain level. Communities have a similar composition
at a higher taxonomic level, but the actual sequences found are divergent. The two Daphnia magna
datasets obtained from two different pyrosequencing platforms revealed rather similar results.
Conclusion: Three clones from three species of the genus Daphnia were found to harbor a rich
community of symbionts. These communities are similar at the genus and higher taxonomic level,
but are composed of different species. The similarity of these three symbiont communities hints
that some of these associations may be stable in the long-term.






Background
Metagenomics is the field that infers the properties of a
habitat through the analysis of genomic sequence infor-
mation obtained from a sample usually collected from a
single habitat. The sequences are usually compared to
databases, with the aim to characterize the biological
community of this habitat. Among the advantages of this
explorative method are the free and uncomplicated sam-
pling of the material, the possibility of obtaining
sequences from unknown and unculturable organisms,
the absence of any taxonomic restrictions and the relative
ease of conducting such studies [1-4]. Metagenomics stud-
ies have been done in various habitats, including sea
water [5], ice cores [6] and deep mine communities [7].
Of particular recent interest has been the application of
metagenomic approaches to study samples obtained from
organisms, which harbor various symbionts, such as
unknown and unculturable bacteria, protozoa or viruses.
For example, the symbiont communities of honey bees
[8], the guts of mice [9] and humans [10], marine sponges
[11], oligochaetes [12] and plant-rhizobacteria [13]
revealed many new symbiont taxa. However, not only
samples collected with the aim to find symbionts revealed
previously unknown organisms, but also datasets from
genome projects where one single genome was targeted
may contain sequences of other species, presumably symbionts
[14]. Here we report on the bacterial communities
associated with three Daphnia clones, each from a different
species of this crustacean genus. These clones had been used
in genome projects, which revealed, besides sequences of the
targeted species, a rich body of sequences of other species.
We use the term symbiont to include organisms that were
found to be associated with the samples of these Daphnia,
disregarding whether they are parasites, commensals or
mutualists. We cannot rule out that some of these organisms
are independent of the Daphnia, e.g. free-living bacteria
in the water, parts of the ingested food or
contaminants from handling the samples. For simplicity
we use the term symbiont throughout this article.

Daphnia is a genus of small planktonic crustaceans living in
standing freshwater bodies. Their body size ranges from
0.3 to 5 mm. They are primary consumers in the aquatic
food chain and their ecology and evolution have been
intensively studied [15]. Numerous ecto- and endo-para-
sites have been described [16,17], but the non-parasitic
bacterial symbionts of Daphnia are very poorly known.
Electron micrographs typically reveal large numbers of
bacteria associated with Daphnia, as is illustrated with the
examples in Figure 1. The entire body of Daphnia can be
coated in thick bacterial mats [16,17]. Thus, Daphnia are
likely to carry a community of prokaryotes with them.
Only one case of a possible mutualist has been reported
so far. Chang and Jenkins [18] reported the presence of
photosynthetically active gut endosymbionts in Daphnia
obtusa. They speculate that the Daphnia take up plastids via
phagocytosis, after the lysis of the mother cell in the gut.
Variations in ultrastructure led them to assume that plastids
from different sources are taken up, including Cyanobacteria.
These findings have not been confirmed for any
other Daphnia species, although the ecological niches of
Daphnia species are often strongly overlapping.

Here we take advantage of shotgun sequences obtained
from three laboratory clones (= iso-female lines), each
from one Daphnia species, to search for indications of
bacterial and plastid symbionts. For this we compared the
sequences against the NCBI-nt database of nucleotide
sequences using BLASTN [19] and analyzed and ordered
the results using the metagenomics software MEGAN
[20]. This software allows the exploration of the taxonomic
content of a community sample based on the NCBI
taxonomy. Community shotgun datasets represent
sequences independently sampled from random regions
of genomes randomly selected from a given community.
These sequences can have very different levels of
conservation. Without any assumptions about the functions
of the sequences used, MEGAN assigns each sequence to the
lowest common ancestor of the set of taxa it hits. Thus,
species-specific sequences are assigned to low-order taxa
such as species or strains, while widely conserved
sequences are assigned to high-order taxa. In other words,
the taxonomic level of the assigned taxon reflects the
level of conservation of the sequence. The strength of this
statistical approach is that it makes use of all kinds of
sequences for taxon identification. Therefore, when using
random sequences, MEGAN will usually show better
taxonomic resolution than an analysis using only a small set
of phylogenetic markers [20]. This type of analysis is
particularly useful when, as is the case here, the datasets
were obtained by random shotgun sequencing rather than
targeted sequencing (see also [21]) and the sequence
reads are short [20,22].
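
The lowest-common-ancestor idea described above can be illustrated with a minimal Python sketch. This is not MEGAN's implementation: the toy taxonomy, the read names and the functions `lineage`, `lowest_common_ancestor` and `assign_reads` are hypothetical and chosen only for illustration, and score thresholds are ignored.

```python
# Hypothetical toy taxonomy (child -> parent). The names are taxa mentioned
# in this article, but the tree is heavily simplified for illustration.
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Betaproteobacteria": "Proteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Comamonadaceae": "Betaproteobacteria",
    "Acidovorax": "Comamonadaceae",
    "Polaromonas": "Comamonadaceae",
    "Pseudomonas": "Gammaproteobacteria",
}


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all hit taxa."""
    paths = [lineage(t) for t in taxa]
    lca = "root"
    for nodes in zip(*paths):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


def assign_reads(hits_per_read):
    """Map each read to the LCA of the taxa it hits; reads without hits get None."""
    return {
        read: lowest_common_ancestor(taxa) if taxa else None
        for read, taxa in hits_per_read.items()
    }


# Invented example reads and their hit taxa.
reads = {
    "read_1": ["Acidovorax"],                 # specific sequence
    "read_2": ["Acidovorax", "Polaromonas"],  # conserved within a family
    "read_3": ["Acidovorax", "Pseudomonas"],  # conserved within a phylum
    "read_4": [],                             # no hits
}
assignment = assign_reads(reads)
for read, taxon in assignment.items():
    print(read, "->", taxon)
```

In this sketch a read that hits a single genus stays at that genus, while a read that hits genera from different classes moves up to their shared phylum, which mirrors the statement that the assigned level reflects the conservation of the sequence.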

Our choice to use the software MEGAN for the analysis of
the datasets from the Daphnia projects is based on several
aspects, which help to reduce known problems in com-
parative metagenomics. A known shortcoming of the
assignment of sequences to taxonomic groups is its inabil-
ity to deal with horizontally transferred genes and the ina-
bility of mapping sequences to internal nodes of the tree
[23]. However, these problems are mainly of concern
when using "best-BLAST-hit" mapping. The software
MEGAN was developed to avoid this problem (see previous
paragraph). A further problem of assigning sequences
to taxonomic groups is the well-known bias in the taxon
representation in our databases [24,25]. This problem
cannot be fully solved, but the ability of MEGAN to assign
sequences to the lowest common ancestor ameliorates the
consequences of a database bias. Sequences will be
assigned to the common ancestor of the true species in
question and those being represented in the database.
Novel sequences will not be assigned at all [20].

Figure 1
Four examples of scanning electron microscopic (SEM) images of parts of D. magna showing numerous bacteria
attached to different surface structures. A. Head of D. magna. The white filamentous structures on the surface are
bacteria. B and C. Surface of the carapace with bacteria attached. The thin lines on the carapace denote epidermis cell
boundaries. D. Parts of the filter apparatus of D. magna. The oval objects are bacteria. None of the bacteria have yet been identified.
Scale bar 200 µm in A and 10 µm in B, C and D.

The aims of our analysis were, first, to compare the shotgun
sequences of the prokaryote communities from
three Daphnia species; second, to test whether the shotgun
sequences give evidence for a plastid symbiont in Daphnia,
as had been suggested [18]; and third, to estimate the
repeatability of a metagenomics approach using two different
sequencing platforms, the pyrosequencers GS 20 and GS
FLX [26], for one of the three Daphnia species.

Results and discussion
In the four datasets, sequences that were assigned to
known cellular organisms varied from 9% to 18% (Table
1). The vast majority of the assigned sequences were to
Eukaryota and to Bacteria. Few sequences were assigned to
the NCBI Taxonomy categories: Archaea, Viroids, Other
and Unclassified. Only among the D. pulicaria sequences
were hits (a total of 4) found to viruses. However, the low
bit scores suggest that these may have other origins. As the
scaffolds of D. pulex included in this study had been pre-sorted
to include only bacteria, the unsorted data might have contained
more hits to taxa other than Bacteria and Eukaryota.
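
The 9% to 18% range quoted above can be reproduced with a short Python sketch. It assumes that the denominators are the numbers of fragments subjected to BLASTN (Table 2) and that the numerators are the counts assigned to cellular organisms (Table 1); the variable names are illustrative.

```python
# Counts copied from Table 1 (assigned to cellular organisms).
assigned = {
    "D. pulex": 38_249,
    "D. pulicaria": 99_178,
    "D. magna GS 20": 3_028,
    "D. magna GS FLX": 4_781,
}
# Counts copied from Table 2 (fragments subjected to BLASTN); pairing these
# totals with the datasets is an assumption of this sketch.
fragments = {
    "D. pulex": 256_498,
    "D. pulicaria": 1_088_697,
    "D. magna GS 20": 19_163,
    "D. magna GS FLX": 26_430,
}

percent_assigned = {
    name: 100 * assigned[name] / fragments[name] for name in assigned
}
for name, pct in percent_assigned.items():
    print(f"{name}: {pct:.1f}% assigned to cellular organisms")
```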

The numbers of bacterial genera (excluding the Firmi-
cutes) with at least two reads assigned were 90, 123, 37
and 51 for the D. pulex, D. pulicaria, D. magna GS 20 and
D. magna GS FLX datasets, respectively. The lower number
of genera revealed by the D. magna datasets corresponds
with the smaller size of these datasets (Table 1, Table 2).
This large number of genera indicates a rich community of
bacteria in and on Daphnia. In all datasets the majority of
the sequences were assigned to the Gamma- and Betapro-
teobacteria (Fig. 2), which together accounted for more
than 87% of the sequences assigned to bacteria. Outside
the Proteobacteria, sequences were assigned to the Bacteroidetes
and, to a lesser degree, to the Actinobacteria, the latter,
however, mainly in the D. pulicaria dataset. Except for the
Actinobacteria, all taxa with a substantial number of assigned
sequences were found in datasets from all three Daphnia species.

Table 1: Number of sequences assigned and unassigned in the MEGAN analysis.

Daphnia species/dataset   Assigned to cellular   Assigned to Bacteria   Not assigned²   Sequences without hits
                          organisms              without Firmicutes¹
D. pulex                  38,249                 25,868                 97,852          120,355
D. pulicaria              99,178                 25,604                 966,027         23,469
D. magna GS 20            3,028                  2,560                  16,007          [missing]
D. magna GS FLX           4,781                  [missing]              21,535          [missing]

¹ The Firmicutes were excluded, because the D. magna datasets contained a bacterial parasite belonging to this taxon. For each dataset, the sum
of columns 2, 4, and 5 is less than the total number of sequences analyzed (Table 2) due to the few sequences assigned to other NCBI taxonomy
categories such as "Other" and "Unclassified".
² The unassigned sequences are sequences without hits above the defined thresholds (See Materials and Methods). They may be A) sequences that
do not have homologs in the current NCBI-nt database, B) sequences that evolved so strongly that their homologs are disguised by bit scores
below our threshold or C) sequences that are assigned to species to which no other sequence is assigned (min-support threshold = 2).
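
Case C in the footnote to Table 1 (min-support threshold = 2) can be sketched as a simple filter. The following Python fragment is an illustration under simplifying assumptions, not MEGAN's actual procedure: the function `apply_min_support` and the example reads are hypothetical, and support is counted per taxon only, without the reads assigned to descendant taxa.

```python
from collections import Counter


def apply_min_support(read_to_taxon, min_support=2):
    """Set reads to unassigned (None) if their taxon has fewer than
    `min_support` reads assigned to it."""
    counts = Counter(t for t in read_to_taxon.values() if t is not None)
    return {
        read: taxon if taxon is not None and counts[taxon] >= min_support else None
        for read, taxon in read_to_taxon.items()
    }


# Invented example: two reads support Flavobacterium, only one supports Nostoc.
example = {
    "r1": "Flavobacterium",
    "r2": "Flavobacterium",
    "r3": "Nostoc",
    "r4": None,  # read without any assignment
}
filtered = apply_min_support(example, min_support=2)
print(filtered)
```

With a threshold of 2, the single read pointing to a taxon that no other read supports is counted as not assigned, as described in case C.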

Assignment of sequences to the bacteria, without
Firmicutes and Cyanobacteria
The majority of the assigned sequences fall into two phyla,
the Bacteroidetes and the Proteobacteria. Among the
Bacteroidetes, most sequences were assigned to the
Flavobacteriales (between 187 and 463 sequences per dataset, or
1.3-7.7% of the sequences) and a very large proportion of
those to the genus Flavobacterium (Fig. 3). Within this
genus, no single species stood out as giving a better match
than other species. Flavobacteria are a group of opportunistic
pathogens (e.g. of salmon), commensals (e.g. in infusoria,
cnidaria) [27] and intracellular symbionts of insects
[28-30]. They are widely distributed in freshwater habitats,
but also occur in association with terrestrial hosts.
Some members of Flavobacteria are known to play a significant
role in the degradation of proteins, polysaccharides,
and diatom debris in natural environments [31,32].
Cultured representatives of Flavobacteria with the ability to
degrade various biopolymers such as cellulose, chitin and
pectin have been described [33]. Their presence in all
datasets here indicates that they may indeed be symbionts of
Daphnia. One may speculate that Flavobacterium may play
a role in food digestion in Daphnia, which mainly feed on
unicellular planktonic algae [34]. This hypothesis has to
be tested with a targeted approach.

Table 2: Summary of the four datasets included in this analysis.

Data type                    D. pulex               D. pulicaria            D. magna GS 20         D. magna GS FLX

Original input data:         Possible bacterial     Contigs and raw reads   Contigs longer than    Contigs longer than
                             scaffolds              longer than 500 bps     100 bps                100 bps
No. of original sequences    21,646                 327,632                 4,388                  6,696
Total length (bps)           59,379,440             323,393,910             4,335,734              6,154,579
Average length               2,743 ± 7,205          987 ± 255               988 ± 2,830            919 ± 2,507
(mean ± stdev bps)
Median length (bps)          [missing]              [missing]               218                    280
Minimum length (bps)         [missing]              [missing]               100                    100
Maximum length (bps)         216,125                9,681                   40,374                 40,088

Sequence fragments
subjected to BLASTN:
No. fragments                256,498                1,088,697               19,163                 26,430
Total length (bps)           133,734,869            570,776,073             8,809,340              12,259,583
Average length               521 ± 100              524 ± 195               459 ± 149              463 ± 131
(mean ± stdev bps)
Median length (bps)          [missing]              [missing]               500                    500

[Figure 2 image not reproduced. Legible node labels with cumulative sequence counts: Bacteria 58,739;
Proteobacteria (count not legible); Gammaproteobacteria 19,199; Alphaproteobacteria 3,270; Deltaproteobacteria 203;
Epsilonproteobacteria 10; Betaproteobacteria 32,158; unclassified Proteobacteria 9; Bacteroidetes 1,368;
Flavobacteria 1,210; Sphingobacteria 74; Bacteroidetes (class) 32; Chlorobia 30; Planctomycetacia 16;
Firmicutes (excluded); Deinococci 8; Chloroflexi (class) 2; Cyanobacteria 66; Chroococcales 7; Nostocales 55;
Gloeobacteria 4; Actinobacteria (class) 706; Solibacteres 15; Nitrospira (class) 3; Chlamydiae (class) 2.]

Figure 2
The comparative taxonomic tree of the bacterial orders found in the three Daphnia datasets. The data of the two
D. magna datasets were combined for this figure. Only bacterial orders with at least 2 sequences assigned are included. The
Firmicutes were excluded (see text for explanation). The numbers next to the taxon names are the cumulative number of
sequences assigned to this taxon. The size of the circles is proportional to the number of sequences assigned to this node. The
color scheme of each pie chart is as follows: dark dull magenta for D. pulex sequences, pale dull blue for D. pulicaria
sequences, vanilla for D. magna sequences.

[Figure 3 image not reproduced. Legible taxon labels: Bacteroidetes/Chlorobi group; Bacteroidetes; Flavobacteriales;
Flavobacteriaceae (Flavobacterium, Ornithobacterium, Gramella); Candidatus Sulcia; Flexibacteraceae (Cytophaga,
Flexibacter); Salinibacter; Bacteroidales (Bacteroides, Porphyromonas, Parabacteroides); Chlorobium/Pelodictyon
group (Chlorobium, Pelodictyon); Chlorobaculum.]

Figure 3
Taxonomic diversity of the three Daphnia datasets within the Bacteroidetes/Chlorobi group. For more explanation
see legend to Fig. 2.
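
As a consistency check on Table 2, the reported average lengths can be compared with the total length divided by the number of sequences. The Python sketch below assumes the pairing of values with datasets as read from Table 2; the dictionary `table2` and the function `mean_length` are illustrative names.

```python
# (number of sequences, total length in bps, reported mean length in bps),
# as read from Table 2; the pairing with datasets is an assumption here.
table2 = {
    "D. pulex, original": (21_646, 59_379_440, 2_743),
    "D. pulex, fragments": (256_498, 133_734_869, 521),
    "D. pulicaria, original": (327_632, 323_393_910, 987),
    "D. pulicaria, fragments": (1_088_697, 570_776_073, 524),
    "D. magna GS 20, original": (4_388, 4_335_734, 988),
    "D. magna GS 20, fragments": (19_163, 8_809_340, 459),
    "D. magna GS FLX, original": (6_696, 6_154_579, 919),
    "D. magna GS FLX, fragments": (26_430, 12_259_583, 463),
}


def mean_length(n_sequences, total_length):
    """Mean sequence length in bps."""
    return total_length / n_sequences


# A row is considered consistent if the recomputed mean lies within 1 bp
# of the reported value (the table reports whole base pairs).
consistent = {
    name: abs(mean_length(n, total) - reported) < 1.0
    for name, (n, total, reported) in table2.items()
}
for name, ok in consistent.items():
    print(f"{name}: {'consistent' if ok else 'check values'}")
```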

Another genus of the Bacteroidetes that was consistently
found in all datasets is Cytophaga (Fig. 3). These are
gliding bacteria found in freshwater and marine habitats,
in soil and in decomposing organic matter. However, hits
to this genus were never frequent (between 10 and 25
hits).

The phylum Proteobacteria attracted 98, 94, 84 and 88%
of the sequences assigned to bacteria in the D. pulex, D.
pulicaria, D. magna GS 20 and D. magna GS FLX datasets,
respectively. Table 3 shows the distribution of all Proteo-
bacteria genera for which at least one dataset attracted
more than 1% of the sequences assigned to Bacteria.


The Alphaproteobacteria attracted a larger number of hits
(3.9 to 8% of sequences), with the genus Rhodobacter
being the most common in all three Daphnia species (0.4
to 2.8% of reads) (Fig. 4). Other genera of the Alphapro-
teobacteria were only found in the D. pulex or the D. puli-
caria datasets (Fig. 4). Alphaproteobacteria are commonly
found in freshwater environments, including sewage.
They are known for a wide range of metabolic capabilities.
Rhodobacter species have been isolated from marine and freshwater habitats.

The majority of the sequences assigned to the Proteobacteria
(overall about 50% of sequences) were assigned to
the Burkholderiales within the Betaproteobacteria (Fig. 2,
Table 3). Within the Burkholderiales, one family, the
Comamonadaceae, accounted for most of these hits (Fig.
5). The Comamonadaceae is a family of gram-negative
aerobic bacteria, encompassing the acidovorans rRNA
complex. Some species are pathogenic for plants. Within
this family four genera (Acidovorax, Rhodoferax, Polaromonas
and Verminephrobacter) showed up repeatedly and in
high numbers in all datasets (Table 3, Fig. 5). The genera
Acidovorax and Polaromonas were particularly common. A
further genus, Delftia, was only common in the D. pulex
and D. pulicaria sequences (Fig. 5).






Table 3: Taxa within the Proteobacteria that attracted at least 1% of the sequences within at least one of the four datasets.

Taxon level   Taxon                 D. pulex   D. pulicaria   D. magna GS 20   D. magna GS FLX   Average
Class         Alphaproteobacteria   3.9        8.0            4.5              6.6               5.7
Genus         Rhodobacter           0.4        1.4            2.0              2.8               1.6
Class         Betaproteobacteria    41.9       72.7           63.0             63.5              60.3
Family        Neisseriaceae         0.1        1.2            0.0              0.3               0.4
Genus         Chromobacterium       0.1        1.2            0.0              0.2               0.4
Order         Burkholderiales       41.0       69.5           61.8             61.8              58.5
Genus         Methylibium           2.8        3.9            1.0              1.2               2.2
Family        Alcaligenaceae        0.3        0.6            1.2              1.1               0.8
Genus         Bordetella            0.3        0.4            1.2              1.1               0.7
Family        Burkholderiaceae      1.3        3.0            2.2              1.9               2.1
Genus         Burkholderia          0.3        1.1            0.3              0.3               0.5
Genus         Cupriavidus           0.5        1.0            1.4              1.1               1.0
Family        Comamonadaceae        32.0       56.5           53.0             53.1              48.7
Genus         Acidovorax            9.9        10.5           16.0             16.3              13.2
Genus         Rhodoferax            0.9        4.1            3.2              2.8               2.8
Genus         Polaromonas           3.9        12.8           14.6             14.7              11.5
Genus         Delftia               2.5        5.5            0.2              0.2               2.1
Genus         Verminephrobacter     6.8        4.8            4.4              4.6               5.2
Class         Gammaproteobacteria   53.0       16.8           29.6             27.2              31.6
Genus         Pseudomonas           43.3       11.5           0.8              1.5               14.3
Genus         Serratia              8.6        0.0            0.0              0.0               2.2
Genus         Aeromonas             0.1        3.8            0.0              0.0               1.0
Genus         Escherichia           0.1        0.0            4.6              4.4               2.3

Cell entries are percentages of the number of sequences assigned to the Proteobacteria.
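
The "Average" column of Table 3 appears to be the unweighted mean of the four dataset percentages. The Python sketch below checks this for a subset of rows; the tolerance of 0.1 is an assumption that allows for rounding of the published one-decimal values, and the names `rows` and `unweighted_mean` are illustrative.

```python
# Subset of Table 3 rows: four dataset percentages (D. pulex, D. pulicaria,
# D. magna GS 20, D. magna GS FLX) followed by the reported average.
rows = {
    "Alphaproteobacteria": ([3.9, 8.0, 4.5, 6.6], 5.7),
    "Betaproteobacteria": ([41.9, 72.7, 63.0, 63.5], 60.3),
    "Comamonadaceae": ([32.0, 56.5, 53.0, 53.1], 48.7),
    "Acidovorax": ([9.9, 10.5, 16.0, 16.3], 13.2),
    "Polaromonas": ([3.9, 12.8, 14.6, 14.7], 11.5),
    "Gammaproteobacteria": ([53.0, 16.8, 29.6, 27.2], 31.6),
    "Pseudomonas": ([43.3, 11.5, 0.8, 1.5], 14.3),
}


def unweighted_mean(values):
    """Arithmetic mean that gives each dataset the same weight."""
    return sum(values) / len(values)


# Assumed tolerance: published values are rounded to one decimal place.
TOLERANCE = 0.1
matches = {
    taxon: abs(unweighted_mean(values) - reported) <= TOLERANCE
    for taxon, (values, reported) in rows.items()
}
for taxon, ok in matches.items():
    print(f"{taxon}: mean {unweighted_mean(rows[taxon][0]):.2f}, matches reported average: {ok}")
```

Because the mean is unweighted, the small D. magna datasets influence the average as much as the much larger D. pulicaria dataset.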


A few other genera within the Betaproteobacteria attracted
relatively high numbers of sequences across all or most of
the datasets: Chromobacterium, Methylibium, Bordetella,
Burkholderia and Cupriavidus (Table 3, Fig. 5). Of those
Methylibium petroleiphilum was highly represented. However,
a closer inspection of the sequence alignments indicates
that the species in our datasets is not exactly this, but
a related species.

Four genera within the Gammaproteobacteria attracted
larger numbers of sequences, but in contrast to the genera





[Figure 4 image not reproduced. Genera labeled in the tree: Caulobacter, Xanthobacter, Azorhizobium, Bradyrhizobium,
Rhodopseudomonas, Nitrobacter, Agrobacterium, Rhizobium, Sinorhizobium, Bartonella, Brucella, Ochrobactrum,
Mesorhizobium, Parvibaculum, Hyphomonas, Maricaulis, Paracoccus, Rhodobacter, Silicibacter, Roseobacter, Ruegeria,
Jannaschia, Dinoroseobacter, Acidiphilium, Granulibacter, Azospirillum, Rhodospirillum, Magnetospirillum, Rickettsia,
Erythrobacter, Sphingomonas, Novosphingobium, Sphingopyxis.]


Figure 4
Taxonomic diversity of the three Daphnia datasets within Alphaproteobacteria. For more explanation see legend
to Fig. 2.





[Figure 5 image not reproduced. Genera labeled in the tree: Chromobacterium, Neisseria, Methylibium, Alcaligenes,
Bordetella, Burkholderia, Ralstonia, Cupriavidus, Polynucleobacter, Acidovorax, Polaromonas, Hydrogenophaga, Delftia,
Caldimonas, Verminephrobacter, Herbaspirillum, Janthinobacterium, Collimonas, Herminiimonas, Thiobacillus, Azoarcus,
Dechloromonas, Methylobacillus, Nitrosospira, Nitrosomonas.]

Figure 5
Taxonomic diversity of the three Daphnia datasets within Betaproteobacteria. For more explanation see legend to
Fig. 2.






[Figure 6 image not reproduced. Genera labeled in the tree: Pseudomonas, Acinetobacter, Psychrobacter, Shewanella,
Pseudoalteromonas, Marinobacter, Saccharophagus, Colwellia, Psychromonas, Idiomarina, Xanthomonas, Xylella,
Chromohalobacter, Marinomonas, Alcanivorax, Hahella, Methylococcus, Citrobacter, Enterobacter, Pectobacterium,
Escherichia, Klebsiella, Salmonella, Serratia, Shigella, Yersinia, Buchnera, Aeromonas, Vibrio, Actinobacillus,
Pasteurella, Mannheimia, Histophilus, Nitrosococcus, Alkalilimnicola, Dichelobacter.]

Figure 6
Taxonomic diversity of the three Daphnia datasets within Gammaproteobacteria. For more explanation see leg-
end to Fig. 2.





in the other classes of the Proteobacteria, here the distri-
bution was not even across the datasets (Table 3, Fig. 6).

Hits to species of the genus Aeromonas were found in large
numbers in the D. pulicaria dataset, but hardly any in the other
datasets (Table 3, Fig. 6). Hits were mainly to A. hydrophila and
A. salmonicida, but similarities were below 100%. Both
can live under aerobic or anaerobic conditions and are
found in water. A. hydrophila is an opportunistic pathogen
of humans; A. salmonicida causes the fish disease
furunculosis.

The single most often assigned genus in the entire analysis
was Pseudomonas in the D. pulex dataset (10,994 assigned
reads, 43.3%). These hits were mainly to the species P.
fluorescens (7,067 reads), and in particular to the strain
Pf0-1. Similar, but not as extreme, was the presence of the
same bacterium in the D. pulicaria sequences (Table 3, Fig.
6). The P. fluorescens Pf0-1 genome project was run in the
same genome center (the DOE Joint Genome Institute,
JGI, http://www.jgi.doe.gov/) where the D. pulex and the
D. pulicaria sequences were obtained, and it seemed possible
that these hits reflect a contamination in the D. pulex
scaffolds rather than a symbiont of D. pulex. However,
inspection of bit scores and sequence identity values in
the BLASTN outputs indicated that the Daphnia symbiont
is clearly not P. fluorescens Pf0-1. The P. fluorescens group
includes diverse bacteria that are found in soil, but also in
aquatic environments.

A further contamination candidate is the Gammaproteobacterium
Serratia, to which we found 2,184 matched
sequences in the D. pulex dataset. However, it is hardly
seen among the D. pulicaria sequences, and not seen at all
among the D. magna sequences (Table 3, Fig. 6). The species
to which most sequences were assigned is Serratia
proteamaculans 568, whose genome was also sequenced by
the DOE Joint Genome Institute. Here too, the inspection
of the BLASTN results indicated high similarity, but few
perfect matches, excluding contamination at the JGI.
Serratia are often associated with the human gut, but are not
pathogenic.

Another genus with many hits to the D. pulex and the D.
pulicaria sequences, but not to the D. magna sequences
(Table 3), is the already mentioned Betaproteobacterium
Delftia (Fig. 5). The DOE Joint Genome Institute
sequenced Delftia acidovorans strain SPH-1, which is the
strain most of the sequences were assigned to. However,
inspection of the BLASTN results again showed that the
Daphnia symbiont is clearly not D. acidovorans strain SPH-
1.

About 200 sequences matched Deltaproteobacteria (Fig.
2). Within this class various taxa matched sequences
from the datasets. However, there was no consistent picture
across the three Daphnia species (Fig. 7).

Searching for Cyanobacteria and plastid sequences
Following the suggestion of Chang and Jenkins [18] that
Daphnia may carry symbiotic plastids or Cyanobacteria
with them, we looked more closely into these two groups.
The D. magna sequences revealed no hit to any Cyanobacteria
taxon. Of the D. pulex sequences 44 (= 0.17% of the
assigned sequences) were assigned to the Nostocales, a
taxon of the Cyanobacteria. 19 (= 0.074%) of these hits
were to the genus Nostoc. In the D. pulicaria dataset we found 22
sequences assigned to the Cyanobacteria, half of which
were assigned to the Nostocales (Fig. 8).

The D. pulicaria dataset revealed 23 sequences assigned to
plastids. One of them was a short sequence (100 bps) matching
the chloroplasts of the green alga Chlamydomonas, the
others matched the chloroplasts of flowering plants. Hits to the
latter came mostly from one scaffold and had high bit
scores (> 500) and similarities of more than 90%. The D.
pulex sequences revealed no hits to plastids, but this is not
surprising, as the dataset had been pre-sorted to contain
predominantly prokaryote sequences. The D. magna GS 20
dataset did not reveal any hits to plastids. The D. magna
GS FLX sequences contained a short sequence (104 bps)
matching a plastid, the chloroplast of the green alga
Stigeoclonium helveticum.

The presence of plastid sequences in Daphnia shotgun
datasets has, however, to be interpreted with care, as
unicellular green algae are the main food of Daphnia, both in the
field and in the laboratory [34,35]. However, the few
sequences assigned to plastids here do not seem to
correspond closely to the algae that were used to feed the
Daphnia in culture before DNA extraction. The D. magna
and the D. pulex clones had been
kept on an exclusive diet of the green alga Scenedesmus sp.
and the D. pulicaria clone on a diet of the green alga
Ankistrodesmus falcatus.

All in all, we consider this rather weak evidence for
plastid symbionts in these Daphnia samples. The original
finding was made in D. obtusa [18], which was not included in
our study. The authors had observed variation in the type
and frequency of plastid occurrence in this species, so it
may not be surprising that the situation differs in other
species. Furthermore, the long maintenance of the
Daphnia clones in laboratory cultures may have contributed to
a loss of plastids. Therefore, the absence of evidence from
our metagenomics analysis is certainly not evidence for
the absence of possible plastid symbionts in Daphnia.


Figure 7
Taxonomic diversity of the three Daphnia datasets within Delta- and Epsilonproteobacteria. For more
explanation see legend to Fig. 2. [Tree diagram not reproduced; legible genus labels: Myxococcus,
Anaeromyxobacter, Sorangium, Polyangium, Desulfovibrio, Desulfococcus, Pelobacter, Geobacter,
Bdellovibrio, Campylobacter, Arcobacter, Sulfurimonas.]


Searching for 16S rDNA sequences
All four datasets were also analyzed with a more conven-
tional approach, which was to identify contigs/scaffolds
similar to known 16S rDNA sequences. We compared our
data with a collection of 471,792 16S rDNA sequences
collected by the Ribosomal Database Project (RDP release
9 update 57) [36]. In total, 27 16S rDNA fragments were
identified in the D. pulicaria dataset, 13 in the D. pulex, 14
in the D. magna GS 20, and 11 in the D. magna GS FLX. Of
those, 17, 11, 9, and 10 bacterial species could be inferred
in the D. pulicaria, D. pulex, D. magna GS 20, and D. magna
GS FLX dataset, respectively. Other partial 16S rDNA
sequences were identical or almost identical to regions
conserved across species, and thus could not be used to infer
the species. Table 4 lists the close to full length 16S
rDNA sequences found in the four datasets. The
nucleotide sequence identity between these sequences and their
corresponding best matches ranged from 91% to 100%.
Most of the best-matching 16S rDNAs were
from uncultured bacteria. Bacterial species that could be
inferred using 97% sequence identity as the cutoff value
included Pseudomonas sp., E. coli/Shigella, and the already
discussed (see above) Flavobacterium sp. (Table 4). In both
the D. pulex and D. pulicaria datasets, sequences highly similar
to the 16S rDNA of the unclassified aquatic bacterium R1-B19,
an undescribed betaproteobacterium, were found (Table
4).


The 16S rDNA sequences identified only a small subset of
the species and genera found in our main analysis based on
comparison to the NCBI-nt database. One likely explanation
for this discrepancy is the low sequencing coverage within
the 16S rDNA regions in the shotgun datasets. Another
explanation could be that some of the earlier predictions
were false positives. However, MEGAN assigns a
sequence to the lowest common ancestor of the set of taxa
defined by all matches above defined thresholds. The
number of false predictions is expected to be low, since
the algorithm places unspecific matches at
higher taxonomic levels [20]. Certainly, when
taxa were inferred regardless of whether the matched sequence was
a suitable phylogenetic marker, it could not be
excluded that some of the predictions resulted from
horizontal gene transfer events. However, if this were the case,
MEGAN would assign the hit to the lowest common
ancestor of the species involved in the horizontal gene
transfer, unless neither these species nor related species
are in the NCBI database. It was predicted that computing
taxonomic content based on sequence comparison to the
NCBI-nt database will show better resolution at all levels
of the taxonomy than an analysis based on a small set of
phylogenetic markers or on 16S rDNA sequences alone
[20,21]. Our results are consistent with this prediction.
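
The lowest-common-ancestor placement described above can be illustrated with a minimal Python sketch. The taxonomy used here is a hypothetical, heavily simplified stand-in for the NCBI taxonomy (intermediate ranks are omitted), and the function names are chosen for illustration only; it is not MEGAN's implementation.

```python
# Toy taxonomy (child -> parent). Taxon names are taken from the text;
# the tree itself is simplified and only illustrative.
PARENT = {
    "Bacteria": None,
    "Proteobacteria": "Bacteria",
    "Betaproteobacteria": "Proteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Comamonadaceae": "Betaproteobacteria",
    "Delftia": "Comamonadaceae",
    "Pseudomonas": "Gammaproteobacteria",
    "Aeromonas": "Gammaproteobacteria",
}


def lineage(taxon):
    """Return the path from the root of the toy taxonomy down to `taxon`."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    lineages = [lineage(t) for t in taxa]
    lca = None
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break  # lineages diverge below the current node
        lca = level[0]
    return lca


# A sequence matching two genera of the same class is placed at that class.
print(lowest_common_ancestor(["Pseudomonas", "Aeromonas"]))  # Gammaproteobacteria
# Matches spread over two classes push the assignment up to the phylum.
print(lowest_common_ancestor(["Delftia", "Pseudomonas"]))    # Proteobacteria
```

The second call shows why ambiguous matches, including those that might result from horizontal gene transfer, end up at a less specific node rather than as a false genus-level prediction.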




Figure 8
Taxonomic diversity of the three Daphnia datasets within Cyanobacteria and Actinobacteria. For more
explanation see legend to Fig. 2. [Tree diagram not reproduced; legible genus labels: Nostoc,
Tolypothrix, Gloeobacter, Arthrobacter, Clavibacter, Nocardia, Rhodococcus, Propionibacterium,
Nocardioides, Saccharopolyspora, Frankia, Acidothermus, Streptomyces, Thermobifida, Rubrobacter.]


Despite the under-prediction and the differences between
the NCBI-nt and the 16S rDNA databases, quantitatively,
the two approaches correlated fairly well at higher
taxonomic levels (Fig. 9).

Searching for identical and similar sequences common to
the four datasets
Although sequences in all datasets were assigned to
similar bacterial taxa, it is not clear how similar the sequences
are across datasets. To identify common sequences, we
compared the D. magna GS 20 sequences with sequences
from D. magna GS FLX, D. pulex, and D. pulicaria using
BLASTN. Sequences were considered identical or nearly
identical when a stretch longer than 80% of a query
sequence could be aligned with over 98% nucleotide
sequence identity to a hit sequence. With this criterion, five
D. magna GS 20 contigs (corresponding to six D. pulex
scaffolds and 12 D. pulicaria reads) were identified. Hits
identical to these sequences were all found in the complete
genome sequences of Escherichia coli W3110
(AP009048.1) and E. coli K12 MG1655 (U00096.2),
which suggests that the commensal E. coli strains carried by
the three Daphnia species are highly similar.
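
As an illustration of how such alignment criteria can be applied to BLASTN hits, the following Python sketch encodes the stringent criterion (more than 80% of the query aligned, more than 98% identity) and the less stringent one used in the next paragraph (more than 50% aligned, more than 90% identity). The function names and the example hits are hypothetical; the sketch assumes that alignment length, query length and percent identity have already been parsed from the BLASTN output.

```python
def passes_criterion(align_len, query_len, pct_identity, min_cov, min_ident):
    """True if the aligned stretch covers more than `min_cov` of the query
    and the percent identity of the alignment exceeds `min_ident`."""
    coverage = align_len / query_len
    return coverage > min_cov and pct_identity > min_ident


def is_near_identical(align_len, query_len, pct_identity):
    """Stringent criterion: > 80% of the query aligned at > 98% identity."""
    return passes_criterion(align_len, query_len, pct_identity, 0.80, 98.0)


def is_similar(align_len, query_len, pct_identity):
    """Less stringent criterion: > 50% of the query aligned at > 90% identity."""
    return passes_criterion(align_len, query_len, pct_identity, 0.50, 90.0)


# Hypothetical hits: (alignment length, query length, percent identity)
example_hits = [(450, 500, 99.1), (300, 500, 92.5), (200, 500, 99.0)]
for align_len, query_len, ident in example_hits:
    print(align_len, query_len, ident,
          is_near_identical(align_len, query_len, ident),
          is_similar(align_len, query_len, ident))
```

Both thresholds are treated as strict inequalities here, following the wording "longer than" and "over" in the text.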

With a less stringent criterion (a stretch longer than 50%
of a query sequence aligned with over 90% nucleotide
sequence identity to a hit sequence), sequences similar
to about 80 GS 20 contig sequences were also
identified across the datasets. These sequences mainly fall
into taxa within the Proteobacteria, with a few sequences
assigned to Flavobacterium.

The small number of similar sequences shared across the
datasets suggests that the bacterial community carried by the
three Daphnia clones from which our datasets originated
might be diverse at species and strain level, despite the very
high homogeneity observed at higher taxonomic
nodes. It should be noted, however, that our datasets do
not originate directly from field samples, but from three
clones, which had been kept in three different laboratories
for several generations before the DNA was isolated. This
may influence our results in two ways. First, we
cannot truly make statements about three Daphnia species,
but only about three clones, each coming from a different
Daphnia species. Including more clones might
reveal more bacterial symbionts. Second, while culturing
these clones in the laboratory, the symbiont community
may have changed both qualitatively and quantitatively.
New bacterial species may have arrived with food or culture
conditions, while other bacteria may have been lost
because the laboratory conditions were unsuitable
for them. For the current analysis, no attempts were
made to vary the culture conditions for any of
the three clones, and the bacteria associated with the food
algae have not been analyzed.

Table 4: 16S rDNA sequences close to full length identified in the four datasets.

Dataset | Sequence ID | Best matched 16S: ID (1) | Best matched 16S: description | Bit score (3) | Identity (%) | Description of the next three matches (4)
D. magna GS 20 | contig04123 | S000437499 | Daphnia endosymbiotic bacterium (2) | 1970 | 99 | uncultured Pasteuria sp., P. nishizawae, P. penetrans
D. magna GS 20 | contig03555 | S000446092 | aquatic bacterium R1-C1 | 1374 | 98 | uncultured Cytophagales bacterium, aquatic bacterium R1-C5, uncultured bacterium
D. magna GS FLX | contig00041 | S000893806 | Shigella dysenteriae | 2627 | 99 | Escherichia coli W3110, E. coli K12, E. coli
D. magna GS FLX | contig06506 | S000343002 | uncultured Cytophagales bacterium | 2468 | 96 | uncultured bacterium, Flavobacterium sp. Nj-26, uncultured Flavobacteriales bacterium
D. magna GS FLX | contig06300 | S000372741 | uncultured bacterium | 1947 | 93 | Myxococcales str. NOSO-1, Chondromyces pediculatus, Polyangium thaxteri
D. magna GS FLX | contig06583 | S000437499 | Daphnia endosymbiotic bacterium (2) | 1943 | 99 | uncultured Pasteuria sp., P. nishizawae, P. penetrans
D. pulicaria | ANIT159445.g1 | S000966592 | Flavobacterium sp. MH45 |  | 99 | Arctic sea ice bacterium ARK10164, uncultured bacterium, Flavobacterium succinicans
D. pulicaria | ANIT198306.b1 | S000799101 | uncultured bacterium | 1857 | 98 | Comamonadaceae bacterium BP-1b
D. pulicaria | ANIT159586.b1 | S000639702 | uncultured bacterium | 1853 | 98 | uncultured Burkholderiales bacterium, Comamonadaceae bacterium BP-1b, uncultured proteobacterium
D. pulicaria | ANIT82605.b1 | S000634984 | uncultured Burkholderiales bacterium | 1846 | 99 | uncultured bacterium, Comamonadaceae bacterium BP-1b, Comamonadaceae bacterium BP-1
D. pulicaria | ANIU5178.g2 | S000429300 | Flavobacterium sp. GOBB3-209 | 1653 | 98 | uncultured bacterium, uncultured Cytophagales bacterium, uncultured Sphingobacteriales bacterium
D. pulicaria | ANIT142825.b1 | S000634984 | uncultured Burkholderiales bacterium | 1570 | 98 | uncultured beta proteobacterium, uncultured organism, Rhodoferax ferrireducens T118
D. pulicaria | ANIS174043.g1 | S000446066 | aquatic bacterium R1-B19 | 1485 | 99 | uncultured beta proteobacterium, aquatic bacterium R1-B6, uncultured Burkholderiales bacterium
D. pulicaria | ANIT169338.b1 | S000005772 | Aeromonas eucrenophila | 1465 | 99 | Aeromonas sp. 'CDC 859-83', A. molluscorum, uncultured bacterium
D. pulicaria | ANIS242375.b | S000658887 | uncultured actinobacterium | 1439 | 97 | uncultured bacterium, Modestobacter multiseptatus, Sporichthya polymorpha
D. pulicaria | ANIS247631.y1 | S000607919 | Pseudomonas sp. R-25061 | 1419 | 99 | Pseudomonas sp. R-25209, uncultured bacterium, P. pseudoalcaligenes
D. pulicaria | ANIU8/6.b3 | S0009489/4 | uncultured bacterium | 1386 | 98 | uncultured gamma proteobacterium, uncultured Pseudomonas sp., Pseudomonas sp. G2
D. pulicaria | ANIT143068.b1 | S000550675 | Pseudomonas sp. GD100 | 1318 | 96 | Pseudomonas sp. Pb1 (2006), P. poae, P. lurida
D. pulicaria | ANIT82605.g2 | S000634984 | uncultured Burkholderiales bacterium | 1312 | 100 | uncultured bacterium, Variovorax paradoxus, uncultured bacterium SJA-62
D. pulicaria | ANIT131207.y2 | S000018838 | uncultured Cytophagales bacterium |  |  | uncultured bacterium, uncultured Bacteroidetes bacterium, rhizosphere soil bacterium RSC-11-81
D. pulicaria | ANIT102921.y2 | S000895013 | uncultured bacterium | 1170 | 93 | uncultured Cytophagales bacterium, uncultured Bacteroidetes bacterium, uncultured bacterium
D. pulicaria | ANIU1607.g2 | S000799546 | uncultured bacterium | 1092 | 96 | Hydrogenophaga sp. AH-24, Hydrogenophaga sp. CL3, Hydrogenophaga sp. YED1-18
D. pulex | scaffold_278 | S000541019 | Pseudomonas argentinensis |  | 98 | P. argentinensis, P. fluorescens Pf0-1
D. pulex | scaffold_567 | S000402041 | uncultured bacterium | 2680 | 97 | uncultured soil bacterium, uncultured Comamonadaceae bacterium, uncultured beta proteobacterium
D. pulex | scaffold_1523 | S000926010 | Serratia proteamaculans 568 | 2615 | 96 | Serratia proteamaculans 568, uncultured bacterium, uncultured proteobacterium
D. pulex | scaffold_6081 | S000730527 | Deefgea rivuli | 1792 | 97 | uncultured bacterium, Chitinibacter tainanensis, uncultured proteobacterium
D. pulex | scaffold_16248 | S000736150 | gamma proteobacterium GPTSA100-21 |  | 98 | gamma proteobacterium GPTSA100-22, uncultured bacterium, gamma proteobacterium GPTSA100-26
D. pulex | scaffold_10095 | S000404820 | Pseudomonas sp. Hsa.28 |  | 99 | uncultured bacterium, uncultured Pseudomonas sp., P. anguilliseptica
D. pulex | scaffold_1408 | S000446066 | aquatic bacterium R1-B19 |  | 99 | uncultured beta proteobacterium, aquatic bacterium R1-B6, aquatic bacterium R1-B7
D. pulex | scaffold_21984 | S000656075 | uncultured Pseudomonas sp. | 1023 | 100 | gamma proteobacterium LC-G-2, Pseudomonas sp. 7-1, P. fluorescens

(1) Given as the ID in RDP.
(2) Pasteuria ramosa, the parasite which was present in the D. magna datasets.
(3) The BLAST bit scores obtained from a comparison of the contigs/scaffolds to annotated 16S rDNA sequences present in RDP are shown. A higher
number indicates a more significant match.
(4) The next top three unique matched species, if they were not the same as the best match.

Figure 9
Correlation of taxonomic content computed by comparison to NCBI-nt and comparison to 16S rDNA
database. The number of sequences assigned to the following taxonomic nodes was plotted: Bacteria, Proteobacteria,
Bacteroidetes, Gammaproteobacteria, Deltaproteobacteria, Betaproteobacteria, Flavobacteria, Sphingobacteria, Actinobacteria.
[Scatter plot not reproduced; x-axis: number of sequences assigned based on comparison with the NCBI-nt database, log scale from 10 to 100000.]


Repeatability of the metagenomics approach
For D. magna we obtained two shotgun datasets, with
sequences produced with two different sequencing plat-
forms, the pyrosequencers GS 20 and GS FLX. Figure 10
shows the number of sequences assigned to all prokaryote
genera (excluding the Firmicutes) in the two datasets. The
two datasets gave very congruent results, with a
correlation coefficient of r = 0.98 (P < 0.001, n = 55). The plot
shows clearly that stochastic differences occur for genera
with very few hits. As expected, for genera with fewer than 10
assigned sequences, the two datasets lead to quite divergent
results.
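
A comparison of this kind can be sketched in a few lines of Python. The per-genus counts below are hypothetical and are not the values behind Figure 10, and applying the log10(x+1) transformation before computing the correlation is an assumption made here for illustration (the text does not state whether r was computed on raw or transformed counts).

```python
import math


def log_transform(counts):
    """Apply log10(x + 1) to each count, so that zero counts stay defined."""
    return [math.log10(c + 1) for c in counts]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical numbers of sequences assigned to five genera in each dataset.
gs20_counts = [0, 3, 12, 150, 900]
gsflx_counts = [1, 2, 15, 140, 1100]

r = pearson_r(log_transform(gs20_counts), log_transform(gsflx_counts))
print(round(r, 3))
```

With counts this small at the low end, a difference of one or two sequences changes the transformed value noticeably, which is the stochastic effect described above for genera with very few hits.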

Figure 10
Comparison of the number of assigned sequences (log10(x+1)) to prokaryote genera (excluding Firmicutes) of
the two D. magna datasets. [Scatter plot not reproduced; x-axis: number of sequences from GS 20.]

Using contigs instead of reads
For the D. pulicaria dataset, both contigs and singleton raw
reads were included in our analysis. For the other three
datasets, we used only sequences that had previously
been assembled into contigs or scaffolds. This reduced the
number of sequences and thus the number of BLASTN
searches considerably. Using large numbers of raw reads
would have been beyond our computing power and the
abilities of the MEGAN software within a reasonable time
period. Using contigs and scaffolds influences the results
in various ways. First, it strongly reduces redundancy in
the dataset and therefore makes the analysis much
quicker. Second, it somewhat compromises the usefulness
of the number of assigned sequences as a measure of the
abundance of the different taxa. The number of assigned
sequences is still a relative measure of the frequency of a
given taxon, but the larger the real number of hits would
have been, the more strongly the value is reduced. Third,
rare members of the symbiont community are likely to
remain undetected, because the few reads sequenced for
rare species were unlikely to be assembled into contigs.
Thus, our estimates of the number of taxa detected are
likely to underestimate the true number of taxa in the
community. This conclusion is also supported by the
observation that the D. pulicaria dataset contained the
highest number of taxa identified.

Conclusion
Our analysis of shotgun sequences of three clones, each
from one Daphnia species revealed a rich bacterial com-
munity to be associated with these clones. The particular
data structure of our analysis allows for certain conclu-
sions to be drawn. First, the majority of the common bac-
terial taxa identified are found in all Daphnia datasets.
While the D. pulex and D. pulicaria clone cultures from
which DNA was isolated originated from laboratories in
North America, the D. magna cultures originated from a
laboratory in Switzerland. To the best of our knowledge,
there had been no trans-Atlantic exchange of cultures
between these laboratories before these samples were
taken. Thus, we speculate that the similarity of the
symbiont communities in European and North American
Daphnia samples indicates a long-lasting stability of these
associations.



Second, the symbiont communities across the three
Daphnia species are remarkably similar, yet they are not
identical. At the sequence level, the similarity breaks down,
indicating that each Daphnia species harbors different
species or strains of bacterial symbionts.

Third, some bacterial taxa were found to be specific to the
two datasets produced at the DOE Joint Genome Institute
(JGI). Coincidentally, some of the published genomes in
these taxa had originally been sequenced by the JGI, leading
to speculation that the JGI may have contaminated
the Daphnia samples. Our analysis allows us to clearly
reject this hypothesis. Whether the bacterial taxa found
to be associated with specific Daphnia samples are
contaminants from the laboratory where the animals were cultured
prior to sequencing, or natural symbionts of
the Daphnia, cannot be determined here.

Fourth, there is no clear evidence for a stable
cyanobacterial or plastid symbiont in the Daphnia species. The few
scattered hits to plastids and Cyanobacteria may
have resulted from contamination with the algal food of the
Daphnia. Plastid symbionts had been observed in D.
obtusa [37]. However, the long laboratory culture of the
clones used in the genome study may have influenced the
presence of such a photoactive symbiont.

Methods
The D. pulex dataset
The sequences of D. pulex are from the DGC whole
genome sequencing project. The chosen D. pulex clone,
called The Chosen One, was cultured at Indiana University,
Bloomington, USA, on a diet of the green alga
Scenedesmus sp. The animals used to isolate the DNA for
the genome project were treated with tetracycline (250
mg/L overnight) before DNA isolation to reduce their
bacterial load. Sequencing was done at the DOE Joint
Genome Institute (JGI) using the Sanger method. These
sequences were obtained from the wFleaBase website
http://wfleabase.org:7182/genome/Daphnia pulex/cur
rent/genome-assembly-full-jazz 20060901/scaffold
seences/. Scaffolds included in this study were excluded
scaffolds, prokaryotic scaffolds, and possible bacterial
scaffolds in the current D. pulex genome assembly
http://wfleabase.org:7182/genome/Daphnia pulex/current/bacteria/dpulex jgi060905 possible bacterial.txt.

The D. pulicaria dataset
Daphnia pulicaria is closely related to D. pulex, and forms
with intermediate characters are frequently encountered,
suggesting hybridization of these two species. Indeed,
allozyme tests for allelic variation at the lactate
dehydrogenase loci show both fast and slow electromorphic alleles,
indicating that the chosen D. pulicaria strain is a pulicaria/pulex
hybrid. This D. pulicaria clone was cultured
at the Hubbard Center for Genome Studies at the
University of New Hampshire, USA, on a diet of the green alga
Ankistrodesmus falcatus. Prior to its culturing at the
University of New Hampshire it was maintained in a
laboratory at Utah State University. The animals used to
isolate the DNA for the genome project were treated with
tetracycline (250 mg/L overnight) before DNA isolation
to reduce their bacterial load. Sequencing of D. pulicaria
was also done at the DOE Joint Genome Institute (JGI)
using the Sanger method. A low coverage genome
assembly of a D. pulicaria clone is available to DGC members,
and others may request access to these data. As the DGC
and JGI data agreements allow, this will be released for
public access on the wFleaBase database:
http://wfleabase.org/genome/Daphnia pulicaria/. For more
information on the D. pulex and D. pulicaria genome data see
http://wfleabase.org/.

The D. magna datasets
The sequences of D. magna originated from a shotgun
sequencing project which aimed at sequencing the
endoparasitic bacterium P. ramosa. During the analysis of
the data a large number of sequences clearly unrelated to
the Firmicutes (the group to which P. ramosa belongs)
showed up. Only these sequences are included in this
paper. As these data are not yet published elsewhere, we
describe here the DNA isolation, library construction and
sequencing in detail.

Daphnia magna cultures were raised at the University of
Fribourg, Switzerland, on a diet of the green alga
Scenedesmus sp. The Daphnia had been exposed to the
gram-positive bacterium Pasteuria ramosa, an endoparasite of
Daphnia [17], when they were 3-5 days old. Most animals
became infected and were shipped for further processing
to the University of Florida, USA. One thousand P. ramosa
infected D. magna were suspended in 5 ml of Buffer A (1.0
M NaCl, 50 mM Tris-HCl pH 8.0) and homogenized
gently with a glass pestle and mortar. The homogenate was
passed through a 50-100 micron metal mesh and a 21
micron nylon mesh to remove Daphnia debris. About
5,000,000 P. ramosa cells were obtained and resuspended
in 450 µl of Buffer A. These were added to an equal
volume (450 µl) of 2% agarose for preparing gel plugs to
embed the vegetative cells, and 10 gel plugs were
produced. To disrupt cells gently, the gel plugs were
transferred into Buffer B (0.2% sodium deoxycholate, 0.5%
Brij 58, 0.5% sarcosine, 50 mM Tris-HCl pH 8.0, 100 mM
EDTA pH 8.0, 0.40 M NaCl) and incubated at 37°C
overnight. These were then transferred into 10 ml of Buffer C
(100 mM NaCl, 50 mM Tris-HCl pH 8.0, 100 mM EDTA
pH 8.0, 0.5% sarcosine, 0.2 mg/ml protease K) at room
temperature. The gel plugs were transferred to 40 ml of
Wash Buffer (10 mM Tris-HCl pH 8.0, 10 mM EDTA pH
8.0) and washed three times in a shaker at low speed for 1
hour each to remove detergents. Gel plugs were
transferred to 40 ml of PMSF Buffer (1.0 mM
phenylmethylsulfonyl fluoride (PMSF), 10 mM Tris-HCl pH 8.0, 10 mM
EDTA pH 8.0) and incubated at room temperature for 1
hour with gentle shaking; this process was repeated with
fresh PMSF buffer. The plugs were then washed twice in 40
ml of Wash Buffer following incubation at 50°C for 20
minutes. The gel plugs were then transferred to 40 ml of
50 mM EDTA (pH 8.0) and stored at 4°C overnight. The
DNA in the gel plugs was digested with 10 U of HindIII
per plug at 37°C for 30 minutes.

The gel plugs with the partially digested DNA were cut
into a slurry. They were loaded onto a 1% agarose gel
(Sigma, Type VII, low gelling temperature), and sealed on
the top with agarose. Electrophoresis
was carried out in 0.7× TAE Buffer using a FIGE apparatus under
Program 4 (BioRad, Hercules, CA 94547). Products
ranging in size from 18 to 33 kb were extracted from the gel
(estimated 60 ng DNA total) following the protocol of the
GELase Agarose Gel-Digesting Preparation kit (Epicentre,
Madison, WI 53713), and used to prepare the cosmid
library.

The preparation of the cosmid library followed the
procedures described by Bell et al. [38], with additional
information described by Chow et al. [39]. In brief, to construct
the cosmid library, an estimated 60 ng of 18-33 kb
fragments recovered from the gel were cloned into the vector pCC1,
which had been digested with HindIII and then
dephosphorylated with shrimp alkaline phosphatase following the
protocol (Roche, Indianapolis, IN 46250). The ligation
products were packaged into bacteriophage particles using
MaxPlax Lambda DNA packaging extracts (Epicentre,
Madison, WI 53713) according to the protocol of the kit.
Bacteriophage containing an estimated 5 × 10^3 particles in 50
µl were applied to infect 200 µl of EPI300 cells grown to
exponential phase in LB liquid medium (Luria-Bertani
medium) containing 10 mM MgSO4 and 0.2% maltose,
which had been inoculated from an overnight culture
grown in LB containing 10 mM MgSO4. After adsorption
during incubation at 37°C for 20 minutes, 1 ml of
fresh LB medium was added and the cells were incubated for an
additional 45 minutes. The infected cells were spread on LB
1% agar plates containing 12.5 µg/ml of chloramphenicol,
1 mM of IPTG and 40 µg/ml of X-gal for selection.

The cosmid library was used in two runs of 454
pyrosequencing [26]. The first run was carried out on a GS 20
454 pyrosequencer, which gave read lengths of around 90
basepairs (bps). The second run was done on a GS FLX
454 pyrosequencer, which gave read lengths of around 250
bps. Both pyrosequencing projects were done in the
Interdisciplinary Center for Biotechnology Research at the
University of Florida, Gainesville, USA. The reads obtained
from the GS 20 and the GS FLX shotgun sequencing were
separately assembled into contigs. These contigs were
used in the analyses presented here.

Scanning electron microscopy
For scanning electron microscopy (SEM), D. magna was
fixed in 3% glutaraldehyde in 0.1 M PB for 2 hours at
20°C. The sample was washed two times in distilled water for
5 to 10 seconds, dehydrated in a graded ethanol series, and
critical point dried (CPD) overnight (16 hours). The
specimens were coated with gold (20 nm) and viewed using a
Philips XL 30 ESEM under high volume conditions from
5 to 15 kV.

Data analysis
Sequences from the D. pulex, D. pulicaria and the two D.
magna datasets included in this study are described in
Table 2. Sequences were compared against the NCBI-nt
database of nucleotide sequences using BLASTN [19]
with the default settings in December 2007. Sequences
longer than 1000 bps were divided into overlapping
fragments of around 500 bps. Sequences were thus homogenized to
fragments of similar length so that BLAST scores were
comparable across different searches. Sequence comparison is
computationally challenging and was performed on an
Opteron Linux high performance computer cluster
established and maintained by the [BC]2 Basel Computational
Biology Center at the Biozentrum, University of Basel
(http://www.bc2.ch/center/index.htm). For the graphical
presentation of the results we combined the two D. magna
datasets.
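
The fragmentation step can be illustrated with a short Python sketch. The text specifies only that sequences longer than 1000 bps were cut into overlapping fragments of around 500 bps; the step size of 250 bps (and hence the amount of overlap) and the function name are assumptions made for this example.

```python
def fragment_sequence(seq, max_len=1000, frag_len=500, step=250):
    """Split `seq` into overlapping fragments of at most `frag_len` bases.

    Sequences of `max_len` bases or fewer are returned unchanged.
    The `step` of 250 is an assumed value, not taken from the paper.
    """
    if len(seq) <= max_len:
        return [seq]
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + frag_len])
        if start + frag_len >= len(seq):
            break  # the last fragment reaches the end of the sequence
        start += step
    return fragments


# Hypothetical 1200 bp scaffold: fragments start at 0, 250, 500 and 750.
scaffold = "ACGT" * 300
pieces = fragment_sequence(scaffold)
print(len(pieces), [len(p) for p in pieces])
```

Because every fragment is capped at the same length, bit scores from different queries stay on a comparable scale, which is the stated purpose of this step.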

For the analysis of the BLASTN results we used the metage-
nomics software MEGAN [20]. This software allows
exploring the taxonomic content of a sample based on the
NCBI taxonomy. The blast files were imported into
MEGAN using the import option BLASTN. The program
then uses several thresholds to generate sequence-taxon
matches. The "min-score" filter sets a bit-score cutoff
value. The "top-percent" filter is used to retain hits whose
scores lie within a given percentage of the highest bit
score. The "min-support" filter is used to set a threshold
for the minimum number of sequences that must be
assigned to a taxon. We used all default parameter settings
of the software (top-percent = 10, min-support = 2),
except the minimal threshold for the bit score of hits,
which was set at 100, following the recommendation of
the authors [20]. This reduces the number of reads
assigned to a taxon, but avoids assignment based on weak
homology. This analysis was done for all datasets between
8 and 11 January 2008.
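
To make the role of the three thresholds concrete, the following Python sketch applies them to hypothetical data. It is a simplified illustration of the filters as described in this section, not MEGAN's own code: the function names and example hits are invented, and dropping under-supported taxa outright is a simplification made here.

```python
from collections import Counter


def filter_hits(hits, min_score=100.0, top_percent=10.0):
    """Filter the BLAST hits of one sequence.

    `hits` is a list of (taxon, bit_score) pairs. Hits below `min_score`
    are discarded; of the rest, only those whose score lies within
    `top_percent` percent of the best score are retained.
    """
    kept = [(taxon, score) for taxon, score in hits if score >= min_score]
    if not kept:
        return []
    best = max(score for _, score in kept)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [(taxon, score) for taxon, score in kept if score >= cutoff]


def apply_min_support(assignments, min_support=2):
    """Keep only assignments to taxa supported by at least `min_support`
    sequences. `assignments` maps sequence IDs to taxon names."""
    counts = Counter(assignments.values())
    return {seq_id: taxon for seq_id, taxon in assignments.items()
            if counts[taxon] >= min_support}


# Hypothetical hits for one query sequence.
hits = [("Delftia", 520.0), ("Comamonas", 480.0),
        ("Pseudomonas", 300.0), ("Nostoc", 90.0)]
print(filter_hits(hits))

# Hypothetical per-sequence assignments; the single Nostoc hit is dropped.
print(apply_min_support({"seq1": "Delftia", "seq2": "Delftia", "seq3": "Nostoc"}))
```

Setting `min_support=1`, as done for the plastid search described below, would retain taxa represented by a single sequence.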

While inspecting the data we ignored reads assigned to
taxa other than plants and bacteria. Within the bacteria,
we ignored the taxon Firmicutes (mostly gram-positive
bacteria, many of which are endospore formers), because
the two datasets of D. magna came from animals infected
with the endospore-forming pathogen P. ramosa. The two
other datasets (D. pulex and D. pulicaria) had only few
sequences assigned to the Firmicutes (less than 0.2%).
Thus, excluding the Firmicutes from the analysis did not
influence the overall analysis.

In a separate analysis we manually inspected all four
datasets for hits assigned to plant taxa (every taxon within and
including the Viridiplantae), searching for hits to plastids
(chloroplasts). For this analysis we set the MEGAN
parameter "min-support" to one.


Authors' contributions
WQ and DE carried out the Bioinformatics analysis. NG
and JP produced the D. magna sequences. DE designed the
study. FBA produced the SEM images. DE and WQ wrote
most of the manuscript. All authors took part in reviewing
and approval of the final manuscript.


Acknowledgements
Support for the preparation and characterization of cosmid DNA libraries
for D. magna was provided by USDA/CSREES Project 50554, USDA/
CSREES Multi-State Project NE 1019, and the University of Florida IFAS
Agricultural Experiment Station (CRIS Projects FLA-MCS-04353 and FLA-
MCS-04080). The sequencing and portions of the analyses of the D. magna
data were done at the Interdisciplinary Center for Biotechnology Research
at the University of Florida, Gainesville, USA. We thank Li Liu for support
and for the assembly of the contigs of the two D. magna datasets. The
sequencing and portions of the analyses of the D. pulex and the D. pulicaria
data were performed at the DOE Joint Genome Institute under the aus-
pices of the U.S. Department of Energy's Office of Science, Biological and
Environmental Research Program, and by the University of California,
Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48,
Lawrence Berkeley National Laboratory under Contract No.
DE-AC02-05CH11231, Los Alamos National Laboratory under Contract No.
W-7405-ENG-36 and in collaboration with the Daphnia Genomics
Consortium (DGC) http://daphnia.cgb.indiana.edu. Additional analyses were
performed by wFleaBase, developed at the Genome Informatics Lab of Indiana
University with support to Don Gilbert from the National Science Founda-
tion and the National Institutes of Health. Coordination infrastructure for
the DGC is provided by The Center for Genomics and Bioinformatics at
Indiana University, which is supported in part by the METACyt Initiative of
Indiana University, funded in part through a major grant from the Lilly
Endowment, Inc. We thank [BC]2 Basel Computational Biology Center at
the Biozentrum University of Basel for hardware and software support.
Our work benefits from, and contributes to, the Daphnia Genomics
Consortium. We are grateful to Daniel Mathys from the Zentrum für
Mikroskopie, Universität Basel, for technical support with the SEM.

References
I. Delwart EL: Viral metagenomics. Reviews in Medical Virology 2007,
17(2):l 15-131.
2. Beardsley TM: Metagenomics reveals microbial diversity. Bio-
science 2006, 56(3):192-196.
3. Allen EE, Banfield JF: Community genomics in microbial ecol-
ogy and evolution. Nature Reviews Microbiology 2005, 3(6):489-498.
4. Streit WR, Schmitz RA: Metagenomics - the key to the uncultured
microbes. Curr Opin Microbiol 2004, 7(5):492-498.
5. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen
JA, Wu DY, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap
AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons
R, Baden-Tillson H, Pfannkoch C, Rogers Y, Smith HO: Environ-
mental genome shotgun sequencing of the Sargasso Sea. Sci-
ence 2004, 304(5667):66-74.
6. Bidle KD, Lee S, Marchant DR, Falkowski PG: Fossil genes and
microbes in the oldest ice on Earth. Proc Natl Acad Sci USA 2007,
104(33):13455-13460.
7. Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M,
Peterson DM, Saar MO, Alexander S, Alexander EC, Rohwer F:
Using pyrosequencing to shed light on deep mine microbial
ecology. BMC Genomics 2006, 7:57.
8. Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD, Moran
NA, Quan PL, Briese T, Hornig M, Geiser DM, Martinson V, vanEn-
gelsdorp D, Kalkstein AL, Drysdale A, Hui J, Zhai J, Cui L, Hutchison
SK, Simons JF, Egholm M, Pettis JS, Lipkin WI: A metagenomic sur-
vey of microbes in honey bee colony collapse disorder. Sci-
ence 2007, 318(5848):283-287.
9. Turnbaugh PJ, Baeckhed F, Fulton L, Gordon JI: Diet-induced obesity
is linked to marked but reversible alterations in the
mouse distal gut microbiome. Cell Host & Microbe 2008,
3(4):213-223.
10. Booijink C, Zoetendal EG, Kleerebezem M, de Vos WM: Microbial
communities in the human small intestine: coupling diversity
to metagenomics. Future Microbiology 2007, 2(3):285-295.
11. Schmitt S, Wehrl M, Bayer K, Siegl A, Hentschel U: Marine sponges
as models for commensal microbe-host interactions. Symbiosis
2007, 44(1-3):43-50.
12. Woyke T, Teeling H, Ivanova NN, Huntemann M, Richter M, Gloeck-
ner FO, Boffelli D, Anderson IJ, Barry KW, Shapiro HJ, Szeto E, Kyrpi-
des NC, Mussmann M, Amann R, Bergin C, Ruehland C, Rubin EM,
Dubilier N: Symbiosis insights through metagenomic analysis
of a microbial consortium. Nature 2006, 443(7114):950-955.
13. Leveau JHJ: The magic and menace of metagenomics: pros-
pects for the study of plant growth-promoting rhizobacteria.
European Journal of Plant Pathology 2007, 119(3):279-300.
14. Poinar HN, Schwarz C, Qi J, Shapiro B, MacPhee RDE, Buigues B,
Tikhonov A, Huson DH, Tomsho LP, Auch A, Rampp M, Miller W,
Schuster SC: Metagenomics to paleogenomics: Large-scale
sequencing of mammoth DNA. Science 2006, 311(5759):392-394.
15. Peters RH, Bernardi DR, eds: Daphnia. Verbania Pallanza: Consiglio
Nazionale delle Ricerche Istituto Italiano di Idrobiologia; 1987.
16. Green J: Parasites and epibionts of Cladocera. Trans Zool Soc
Lond 1974, 32:417-515.
17. Ebert D: Ecology, epidemiology and evolution of parasitism in
Daphnia. 2005 [http://www.ncbi.nlm.nih.gov/books/bookres.fcgi/
daph/screenA4.pdf]. Bethesda (MD): National Library of Medicine
(US), National Center for Biotechnology Information
18. Chang N, Jenkins DG: Plastid endosymbionts in the freshwater
crustacean Daphnia obtusa. J Crustac Biol 2000, 20(2):231-238.
19. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local
alignment search tool. J Mol Biol 1990, 215:403-410.
20. Huson DH, Auch AF, Qi J, Schuster SC: MEGAN analysis of
metagenomic data. Genome Research 2007, 17(3):377-386.
21. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer
F, Edwards RA, Stoye J: Phylogenetic classification of short
environmental DNA fragments. Nucleic Acids Research 2008,
36(7):2230-2239.
22. Pop M, Salzberg SL: Bioinformatics challenges of new sequenc-
ing technology. Trends Genet 2008, 24(3):142-149.
23. Raes J, Foerstner KU, Bork P: Get the most out of your metage-
nome: computational analysis of environmental sequence
data. Curr Opin Microbiol 2007, 10(5):490-498.
24. McHardy A, Rigoutsos I: What's in the mix: phylogenetic
classification of metagenome sequence samples. Curr Opin Microbiol
2007, 10(5):499-503.
25. Schloss PD, Handelsman J: A statistical toolbox for metagenom-
ics: assessing functional diversity in microbial communities.
BMC Bioinformatics 2008, 9:.
26. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA,
Berka J, Braverman MS, Chen YJ, Chen ZT, Dewell SB, Du L, Fierro
JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP,
Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza
JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani
VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR,
Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson
JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA,
Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM:
Genome sequencing in microfabricated high-density picoli-
tre reactors. Nature 2005, 437(7057):376-380.
27. Fraune S, Bosch TCG: Long-term maintenance of species-spe-
cific bacterial microbiota in the basal metazoan Hydra. Proc
Natl Acad Sci USA 2007, 104:13146-13151.
28. Bandi C, Damiani G, Magrassi L, Grigolo A, Fani R, Sacchi L: Flavo-
bacteria as intracellular symbionts in cockroaches. Proc R Soc
Lond B 1994, 257:43-48.
29. Hurst GDD, Hammarton TC, Bandi C, Majerus TMO, Bertrand D,
Majerus MEN: The diversity of inherited parasites of insects:
the male-killing agent of the ladybird beetle Coleomegilla
maculata is a member of the Flavobacteria. Genet Res Camb
1997, 70:1-6.
30. Hurst GDD, Bandi C, Sacchi L, Cochrane AG, Bertrand D, Bernardet
JF, Nakagawa Y, Holmes B, Karaca I, Majerus MEN: Adonia variegata
(Coleoptera: Coccinellidae) bears maternally inherited
Flavobacteria that kill males only. Parasitology 1999, 118:125-134.
31. Pinhassi J, Azam F, Hemphala J, Long R, Martinez J, Zweifel U, Hag-
strom A: Coupling between bacterioplankton species compo-
sition, population dynamics, and organic matter
degradation. Aquat Microb Ecol 1999, 17:13-26.
32. Cottrell M, Kirchman D: Natural assemblages of marine pro-
teobacteria and members of the Cytophaga-Flavobacter clus-
ter consuming low- and high-molecular-weight dissolved
organic matter. Appl Environ Microbiol 2000, 66:1692-1697.
33. Bernardet J, Segers P, Vancanneyt M, Berthe F, Kersters K, Van-
damme P: Cutting a gordian knot: emended classification and
description of the genus Flavobacterium, emended descrip-
tion of the family Flavobacteriaceae, and proposal of Flavo-
bacterium hydatis nom. nov. (Basonym, Cytophaga aquatilis
Strohl and Tait 1978). Int J Bacteriol 1996, 46:128-148.
34. Lampert W: Feeding and Nutrition in Daphnia. Mem Ist Ital
Idrobiol 1987, 45:143-192.
35. Wetzel RG: Limnology. Philadelphia, USA: Saunders College Pub-
lishing; 1975.
36. Cole J, Chai B, Farris R, Wang Q, Kulam-Syed-Mohideen A, McGarrell
D, Bandela A, Cardenas E, Garrity G, Tiedje J: The ribosomal
database project (RDP-II): introducing myRDP space and quality
controlled public data. Nucleic Acids Res 2007, 35:D169-D172.
37. Chang HH, Shyu HF, Wang YM, Sun DS, Shyu RH, Tang SS, Huang YS:
Facilitation of cell adhesion by immobilized dengue viral
nonstructural protein 1 (NS1): Arginine-glycine-aspartic
acid structural mimicry within the dengue viral NS1 antigen.
J Infect Dis 2002, 186(6):743-751.
38. Bell KS, Avrova AO, Holeva MC, Cardle L, Morris W, Dejong W,
Toth IK, Waugh R, Bryan GJ, Birch PRJ: Sample sequencing of a
selected region of the genome of Erwinia carotovora subsp.
atroseptica reveals candidate phytopathogenicity genes and
allows comparison with Escherichia coli. Microbiology 2002,
148:1367-1378.
39. Chow V, Nong G, Preston JF: Structure, Function, and Regula-
tion of the Aldouronate Utilization Gene Cluster from
Paenibacillus sp. Strain JDR-2. J Bacteriol 2007, 189:8863-8870.

BMC Genomics 2009,10:172