The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus)


Material Information

Title:
The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus)
Physical Description:
Mixed Material
Language:
English
Creator:
Rokyta, Darin R.
Lemmon, Alan R.
Margres, Mark J.
Aronow, Karalyn
Publisher:
BioMed Central (BMC Genomics)
Publication Date:

Notes

Abstract:
Background: Snake venoms have significant impacts on human populations through the morbidity and mortality associated with snakebites and as sources of drugs, drug leads, and physiological research tools. Genes expressed by venom-gland tissue, including those encoding toxic proteins, have therefore been sequenced but only with relatively sparse coverage resulting from the low-throughput sequencing approaches available. High-throughput approaches based on 454 pyrosequencing have recently been applied to the study of snake venoms to give the most complete characterizations to date of the genes expressed in active venom glands, but such approaches are costly and still provide a far-from-complete characterization of the genes expressed during venom production.
General Note:
Rokyta et al. BMC Genomics 2012, 13:312 http://www.biomedcentral.com/1471-2164/13/312; Pages 1-23
General Note:
doi:10.1186/1471-2164-13-312 Cite this article as: Rokyta et al.: The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus). BMC Genomics 2012 13:312
General Note:
Publication of this article was funded in part by the University of Florida Open-Access publishing Fund. In addition, requestors receiving funding through the UFOAP project are expected to submit a post-review, final draft of the article to UF's institutional repository at the University of Florida community, with research, news, outreach, and educational materials.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00013473:00001




Full Text


Rokyta et al. BMC Genomics 2012, 13:312
http://www.biomedcentral.com/1471-2164/13/312




The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus)

Darin R Rokyta1, Alan R Lemmon, Mark J Margres1 and Karalyn Aronow1


Abstract
Background: Snake venoms have significant impacts on human populations through the morbidity and mortality
associated with snakebites and as sources of drugs, drug leads, and physiological research tools. Genes expressed by
venom-gland tissue, including those encoding toxic proteins, have therefore been sequenced but only with relatively
sparse coverage resulting from the low-throughput sequencing approaches available. High-throughput approaches
based on 454 pyrosequencing have recently been applied to the study of snake venoms to give the most complete
characterizations to date of the genes expressed in active venom glands, but such approaches are costly and still
provide a far-from-complete characterization of the genes expressed during venom production.
Results: We describe the de novo assembly and analysis of the venom-gland transcriptome of an eastern
diamondback rattlesnake (Crotalus adamanteus) based on 95,643,958 pairs of quality-filtered, 100-base-pair Illumina
reads. We identified 123 unique, full-length toxin-coding sequences, which cluster into 78 groups with less than 1%
nucleotide divergence, and 2,879 unique, full-length nontoxin coding sequences. The toxin sequences accounted for
35.4% of the total reads, and the nontoxin sequences for an additional 27.5%. The most highly expressed toxin was a
small myotoxin related to crotamine, which accounted for 5.9% of the total reads. Snake-venom metalloproteinases
accounted for the highest percentage of reads mapping to a toxin class (24.4%), followed by C-type lectins (22.2%)
and serine proteinases (20.0%). The most diverse toxin classes were the C-type lectins (21 clusters), the snake-venom
metalloproteinases (16 clusters), and the serine proteinases (14 clusters). The high-abundance nontoxin transcripts
were predominantly those involved in protein folding and translation, consistent with the protein-secretory function
of the tissue.
Conclusions: We have provided the most complete characterization of the genes expressed in an active snake
venom gland to date, producing insights into snakebite pathology and guidance for snakebite treatment for the
largest rattlesnake species and arguably the most dangerous snake native to the United States of America,
C. adamanteus. We have more than doubled the number of sequenced toxins for this species and created extensive
genomic resources for snakes based entirely on de novo assembly of Illumina sequence data.


Background
Human envenomation by snakes is a worldwide issue
that claims more than 100,000 lives per year and exacts
untold costs in the form of pain, disfigurement, and loss
of limbs or limb function [1-3]. Despite the significance
of snakebites, their treatments have remained largely
unchanged for decades. The only treatments currently


*Correspondence: drokyta@bio.fsu.edu
1 Department of Biological Science, Florida State University, Tallahassee, FL
32306-4295, USA
Full list of author information is available at the end of the article


available are traditional antivenoms derived from antisera
of animals, usually horses [4], inoculated with whole
venoms [5,6]; such an approach is the only readily avail-
able option for largely uncharacterized, complex mixtures
of proteins such as snake venoms. Although often lifesav-
ing and generally effective against systemic effects, these
antivenoms have little or no effect on local hemorrhage
or necrosis [7-9], which are major aspects of the pathol-
ogy of viperid bites and can result in lifelong disability
[4,5]. These traditional treatments also sometimes lead to
adverse reactions in patients [6]. Advances in treatment
approaches will depend on a complete knowledge of the




nature of the offending toxins, but current estimates of
the numbers of unique toxins present in snake venoms
are in excess of 100 [10], a number not approached in
even the most extensive venom-characterization efforts
to date [11].
The significance of snake venoms extends well beyond
the selective pressures they may directly impose upon
human populations. Snake venoms have evolutionary
consequences for those species that snakes prey upon
[12,13], as well as species that prey upon the snakes
[14], and their study can therefore provide insights into
predator-prey coevolution. Snake venom components
have been leveraged as drugs and drug leads [15-17] and
have been used directly as tools for studying physiologi-
cal processes such as pain reception [18]. In addition to
the significance of the toxins, the nature of the extreme
specialization of snake venom glands for the rapid but
temporary production and export of large quantities of
protein could provide insights into basic mechanisms of
proteostasis, the breakdown of which is thought to con-
tribute to neurodegenerative diseases such as Parkinson's
and Alzheimer's [19].
The eastern diamondback rattlesnake (Crotalus
adamanteus) is a pit viper native to the southeastern
United States and is the largest member of the genus
Crotalus, reaching lengths of up to 2.44 m [20]. The diet
of C. adamanteus consists primarily of small mammals
(e.g., squirrels, rabbits, and mouse and rat species) and
birds, particularly ground-nesting species such as quail
[20]. Because of its extreme size and consequent large
venom yield, C. adamanteus is arguably the most dan-
gerous snake species in the United States and is one of
the major sources of snakebite mortality throughout its
range [21]. Crotalus adamanteus has recently become of
interest from a conservation standpoint because of its
declining range, which at one time included seven states
along the southeastern Coastal Plain [22]. This species
has now apparently been extirpated from Louisiana and
is listed as endangered in North Carolina [23,24]. As a
consequence of recent work by Rokyta et al. [11] based
on 454 pyrosequencing, the venom of C. adamanteus is
among the best-characterized snake venoms; 40 toxins
have been identified.
Transcriptomic characterizations of venom glands
of snakes [25-28] and other animals [29-32] have
relied almost exclusively on low-throughput sequencing
approaches. Sanger sequencing, with its relatively long,
high-quality reads, has been the only method available
until recently and has provided invaluable data on the
identities of venom genes. Because venomous species
are primarily nonmodel organisms, high-throughput
sequencing approaches have been slow to pervade
the field of venomics (but see Hu et al. [33]), despite
becoming commonplace in other transcriptomic-based


fields. Rokyta et al. [11] recently used 454 pyrosequenc-
ing to characterize venom genes for C. adamanteus. More
recently, Durban et al. [34] used 454 sequencing to study
the venom-gland transcriptomes of a mix of RNA from
eight species of Costa Rican snakes. Whittington et al.
[30] used a hybrid approach with both 454 and Illumina
sequencing to characterize the platypus venom-gland
transcriptome, although they had a reference genome
sequence, making de novo assembly unnecessary. Pyrose-
quencing is expensive and low-throughput relative to
Illumina sequencing, and the high error rate, particu-
larly for homopolymer errors [35], significantly increases
the difficulty of identifying coding sequences without
reference sequences.
We sequenced the venom-gland transcriptome of the
eastern diamondback rattlesnake with Illumina technol-
ogy using a paired-end approach coupled with short
insert sizes effectively to produce longer, high-quality
reads on the order of approximately 150 nt to facili-
tate de novo assembly (an approach similar to that of
Rodrigue et al. [36] for metagenomics). The difference in
read length from that of 454 sequencing was compensated
for by the increase of more than two orders of magni-
tude in the number of reads. We demonstrated de novo
assembly and analysis of a venom-gland transcriptome
using only Illumina sequences and provided a compre-
hensive characterization of both the toxin and nontoxin
genes expressed in an actively producing snake venom
gland.


Results and discussion
Venom-gland transcriptome sequencing and assembly
We generated a total of 95,643,958 pairs of reads that
passed the Illumina quality filter for >19 gigabases (Gb) of
sequence from a cDNA library with an average insert size
of ~170 nt. Of these reads, 72,114,709 (75%) were merged
(see Methods) on the basis of their 3' overlap (Figure 1),
yielding composite reads of average length 142 nt with
average phred qualities >40 and a total length >10 Gb.
This merging of reads reduced the effective size of the data
set without loss of information and provided long reads to
facilitate accurate assembly.
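
The merging step can be illustrated with a minimal Python sketch. It is not the authors' implementation (their procedure is described in Methods and relies on a threshold on the number of matches); the names merge_pair, min_overlap, and max_mismatch_frac are assumptions chosen for illustration, and a simple mismatch-fraction cutoff stands in for the significance test.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair whose 3' ends overlap.

    read2 comes from the opposite strand, so it is reverse-complemented
    first. Overlaps are tried from longest to shortest, and the first one
    whose mismatch fraction is within the (assumed) cutoff is accepted.
    Returns the composite read, or None if no acceptable overlap exists.
    """
    r2 = reverse_complement(read2)
    for olap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-olap:], r2[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olap:
            return read1 + r2[olap:]
    return None
```

For scale, a pair of 100-nt reads sequenced from a 170-nt fragment overlaps by 2 x 100 - 170 = 30 nt, which is why a library with short inserts allows most pairs to be merged.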
Our first approach to transcriptome assembly was
aimed at identifying toxin genes. We attempted to use as
many of the data as possible to ensure the identification
of even the lowest-abundance toxins. To this end, we con-
ducted extensive searches of assembly parameter space for
both ABySS [37,38] (Table 1) and Velvet [39] on the basis
of the full set of both merged and unmerged reads. We
used the assemblies with the best N50 values for further
analysis. For Velvet, the assembly using a k-mer size of 91
was best (N50 = 408); this assembly was subsequently
analyzed with Oases [40]. For ABySS, the best k-mer value




Table 1 ABySS assembly summaries

k   c     e     Total contigs  Longest contig  Contigs >200 nt  Contigs > N50  Median  Mean   Total length
51  2     2     2,168,050      15,079          147,942          27,410         364     592    8.77 x 10^7
51  10    2     329,609        17,493          37,575           6,399          571     985    3.70 x 10^7
51  10    10    337,039        17,459          37,944           6,390          554     967    3.67 x 10^7
51  20    20    191,367        15,906          27,546           4,931          529     878    2.42 x 10^7
51  30    30    135,961        13,472          21,986           4,034          494     812    1.79 x 10^7
51  50    50    87,092         10,380          15,461           2,955          463     737    1.14 x 10^7
51  100   10    42,366         8,553           8,510            1,725          432     658    5.60 x 10^6
51  100   100   42,251         8,552           8,401            1,707          431     656    5.52 x 10^6
51  1000  1000  2,319          5,232           571              123            428     631    3.61 x 10^5
61  2     2     1,769,274      17,166          141,471          25,105         361     604    8.55 x 10^7
61  10    2     263,688        17,493          34,076           5,959          618     1,032  3.52 x 10^7
61  10    10    272,814        17,459          35,002           6,036          586     998    3.50 x 10^7
61  20    20    154,459        15,906          25,114           4,575          545     891    2.24 x 10^7
61  30    30    109,994        10,070          19,994           3,721          496     808    1.62 x 10^7
61  50    50    70,029         10,349          13,916           2,675          455     725    1.01 x 10^7
61  100   10    32,476         8,300           7,318            1,479          426     655    4.79 x 10^6
61  100   100   32,392         7,822           7,231            1,463          423     652    4.72 x 10^6
61  1000  1000  1,709          5,209           531              114            424     614    3.27 x 10^5
71  2     2     1,431,412      15,641          131,742          22,422         360     617    8.13 x 10^7
71  10    2     208,036        17,484          29,793           5,393          683     1,101  3.28 x 10^7
71  10    10    219,099        17,432          31,400           5,567          629     1,041  3.27 x 10^7
71  20    20    122,216        14,372          21,816           4,052          581     928    2.03 x 10^7
71  30    30    86,599         10,416          17,138           3,249          524     835    1.43 x 10^7
71  50    50    54,694         10,341          11,925           2,313          464     729    8.70 x 10^6
71  100   10    23,980         7,817           6,183            1,253          424     650    4.02 x 10^6

[Figure 1 image. Panel A x-axis: proposed fragment length; panel B x-axis: position in fragment (overlap = 50 nt).]
Figure 1 Merging overlapping reads. (A) Reads are slid along each other until the number of matches exceeds the significance threshold. In the
example shown, the optimal overlap is 74 nucleotides (nt). (B) The quality of reads declines dramatically toward their 3' ends, where overlap occurs
if the fragment length is less than twice the read length, allowing the actual quality to be much higher than the nominal values. The example
shown is the average of pairs that overlap by exactly 50 nt.




Table 1 ABySS assembly summaries (Continued)

k   c     e     Total contigs  Longest contig  Contigs >200 nt  Contigs > N50  Median  Mean   N50    Total length
71  100   100   23,997         7,81?           6,119
71  1000  1000  1,199          5,2??           443
81  2     2     1,142,924      15,303          120,593          19,69?         354     625    911    7.54 x 10^7
81  10    2     158,781        17,713          24,939           4,721          788     1,202  1,898  3.00 x 10^7
81  10    10    174,032        17,688          27,366           5,005          691     1,096  1,774  3.00 x 10^7
81  20    20    92,627         15,784          17,715           3,396          654     1,011  1,573  1.79 x 10^7
81  30    30    64,866         9,868           13,697           2,666          592     904    1,366  1.24 x 10^7
81  50    50    39,613         10,328          9,358            1,874          513     778    1,130  7.28 x 10^6
81  100   10    16,303         10,149          4,777            970            454     687    963    3.28 x 10^6
81  100   100   16,493         10,155          4,817            981            438     671    943    3.23 x 10^6
81  1000  1000  889            5,198           381              8?             454     649    790    2.47 x 10^5
91  2     2     932,237        15,694          108,954
91  10    2     124,647        17,713          20,420
91  10    10    142,306        ?               23,525
91  20    20    72,117         15,792          13,614
91  30    30    48,540         15,792          9,949
91  50    50    27,581         10,199          6,477
91  100   10    10,503         10,105          3,081
91  100   100   10,770         10,149          3,308
91  1000  1000  598            3,008           3??



was also 91 (N50 = 2,007), but because the performance
in terms of full-length transcripts appeared to depend
strongly on the coverage (c) and erode (e) parameters, we
further analyzed the k = 91 assemblies with c = 10
and e = 2, c = 100 and e = 100, and c = 1000 and
e = 1000. We identified all full-length toxins by means of
blastx searches on the results of all four assemblies.
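
Because assemblies were ranked by their N50 values, a short sketch of that statistic may be useful. It assumes the usual definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembled length. The function name n50 is illustrative, and any length filter applied before the calculation is left to the caller.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the contig that crosses that point is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")
```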
As part of our first approach, we also performed
four independent de novo transcriptome assemblies with
NGen: three with 20 million merged reads each and one
with the remaining 12,114,709 merged reads (Table 2). We
identified all full-length toxins from all four assemblies.
Given that all three assembly methods tended to generate
a large number of fragmented toxin sequences, apparently
because of retained introns and possibly alternative splic-
ing, we developed and implemented a simple hash-table


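Table 1 ABySS assembly summaries (Continued): k = 91

k   c     e     Contigs > N50  Median  Mean   N50    Total length
91  2     2     17,394         344     622    936    6.79 x 10^7
91  10    2     4,025          880     1,293  2,007  2.64 x 10^7
91  10    10    4,428          727     1,126  1,804  2.65 x 10^7
91  20    20    2,702          752     1,108  1,712  1.51 x 10^7
91  30    30    2,009          700     1,023  1,529  1.02 x 10^7
91  50    50    1,336          624     901    1,309  5.84 x 10^6
91  100   10    658            564     816    1,155  2.52 x 10^6
91  100   100   705            528     769    1,078  2.54 x 10^6
91  1000  1000  76             438     621    754    2.12 x 10^5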



approach to completing partial transcripts, which we will
refer to as Extender (see Methods). We used Extender on
partial toxin sequences identified for two of the four NGen
assemblies. We also annotated the most abundant full-
length nontoxin transcripts for the three assemblies based
on 20 million reads. After combining all of the anno-
tated toxin and nontoxin sequences from the ABySS, Vel-
vet, and NGen assemblies and eliminating duplicates, we
had 72 unique toxin sequences and 234 unique nontoxin
sequences. The paucity of full-length annotated nontox-
ins reflects our focus on toxin sequences rather than their
absence in the assemblies.
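
The idea of completing a partial transcript with a hash table can be sketched as follows. This is a toy under stated assumptions, not the Extender program itself (which is described in Methods): build_index and extend_right are hypothetical names, k is an arbitrary word size, and extension simply stops when the next base is missing or ambiguous.

```python
from collections import defaultdict


def build_index(reads, k):
    """Map every k-mer seen in the reads to the set of bases that follow it."""
    index = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            index[read[i:i + k]].add(read[i + k])
    return index


def extend_right(seed, index, k, max_len=10000):
    """Extend seed one base at a time toward its 3' end.

    The last k bases of the growing sequence are looked up in the hash
    table; extension continues only while exactly one next base is
    supported. max_len guards against endless extension through repeats.
    """
    seq = seed
    while len(seq) < max_len:
        next_bases = index.get(seq[-k:])
        if not next_bases or len(next_bases) != 1:
            break
        seq += next(iter(next_bases))
    return seq
```

Extension toward the 5' end could reuse the same functions on reverse-complemented reads and seed.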
Our second approach to transcriptome assembly
was designed to annotate as many full-length coding
sequences (toxin and nontoxin) as possible and to build
a reference database of sequences to facilitate the future


Table 2 NGen assembly summaries

Assembly  No. reads   No. contigs  Assembled sequences
NGen 1    20,000,000  12,694       9,786,054
NGen 2    20,000,000  12,746       9,821,212
NGen 3    20,000,000  12,698       9,820,553
NGen 4    12,114,709  8,484        5,948,003

Unique full-length toxins: 34
Unique Extender toxins: 54
Total unique full-length toxins:









analysis of other snake venom-gland transcriptomes. We
found that NGen was much more successful at producing
transcripts with full-length coding sequences but also
that it was quite inefficient when the coverage distribu-
tion was extremely uneven (see Figure 2). Feldmeyer et
al. [41] also found NGen to have the best assembly per-
formance with Illumina data. We sought therefore first
to eliminate the transcripts and corresponding reads for
the extremely high-abundance sequences. To do so, we


employed Extender as a de novo assembler by starting
from 1,000 individual high-quality reads and attempting
to complete their transcripts (see Methods). From 1,000
seeds, we identified 318 full-length coding sequences with
213 toxins and 105 nontoxins. After duplicates were elim-
inated, this procedure resulted in 58 unique toxin and 44
unique nontoxin full-length transcripts. These sequences
were used to filter the corresponding reads from the full
set of merged reads with NGen. We then performed a de


[Figure 2 image. (A) x-axis: full-length transcript rank; legend: nontoxin (2,879), toxin (78); inset legend: nontoxin (137), toxin (63). (B) x-axis: toxin clusters; color legend: BPP, CRISP, CTL, LAAO, MYO, Others, PLA2, SVMPII, SVMPIII, SVSP.]
Figure 2 Domination of the C. adamanteus venom-gland transcriptome by toxin transcripts. The 123 unique toxin sequences were clustered
into 78 groups with less than 1% nucleotide divergence for estimation of abundances. (A) The vast majority of the extremely highly expressed
genes were toxins. The inset shows the top 200 transcripts. (B) Expression levels of individual toxin clusters are shown with toxin
classes coded by color. The toxin clusters are in the same order as in Table 3.




novo transcriptome assembly on 10 million of the filtered
reads with NGen, annotated full-length transcripts from
contigs comprising > 200 reads with significant blastx
hits, and used the resulting unique sequences as a new fil-
ter. This process of assembly, annotation, and filtering was
iterated two more times. The end result was 91 unique
toxin and 2,851 unique nontoxin sequences.
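
Removing reads that belong to already-annotated transcripts could be approximated as below. The authors performed this filtering with NGen; the sketch instead uses exact shared k-mers as a stand-in, and filter_reads, k, and min_hits are hypothetical names and defaults introduced only for illustration.

```python
def kmer_set(transcripts, k):
    """Collect every k-mer that occurs in the annotated transcripts."""
    kmers = set()
    for transcript in transcripts:
        for i in range(len(transcript) - k + 1):
            kmers.add(transcript[i:i + k])
    return kmers


def filter_reads(reads, transcripts, k=25, min_hits=1):
    """Return the reads that share fewer than min_hits k-mers with the transcripts.

    Reads that match a known transcript are dropped, so the next round
    of assembly sees a less uneven coverage distribution.
    """
    known = kmer_set(transcripts, k)
    kept = []
    for read in reads:
        hits = sum(read[i:i + k] in known for i in range(len(read) - k + 1))
        if hits < min_hits:
            kept.append(read)
    return kept
```

In an iterated scheme, the transcripts annotated in each round would be added to the filter before the next subset of reads is assembled.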
The results from both assembly approaches were
merged to yield the final data set. The first approach
produced 72 unique toxin and 234 unique nontoxin
sequences, and the second 91 toxin and 2,851 non-
toxin sequences. The merged data set consisted of
123 unique toxin sequences and 2,879 nontoxins that
together accounted for 62.9% of the sequencing reads
(Figure 3).


Toxin transcripts
We identified 123 individual, unique toxin transcripts with
full-length coding sequences. To estimate the abundances
of these transcripts in the C. adamanteus venom-gland
transcriptome, we clustered them into 78 groups with less
than 1% nt divergence (Table 3). Clusters could include
alleles, recent duplicates, or even sequencing errors,
which are characteristic of high-throughput sequencing
[42]. For longer genes, clusters might also include different
combinations of variable sites that are widely separated in
the sequence. We chose 1% as a practical, but arbitrary,
cut-off for defining clusters. Mapping reads back to more
similar sequences to estimate abundances would be prob-
lematic because reads could not be uniquely assigned to
a particular sequence. The true number of toxin genes


[Figure 3 image. Labels recovered: toxin, nontoxin, unidentified; ribosomal, mitochondrial, protein degradation, chaperones/protein folding.]
Figure 3 Expression levels of major classes of toxins and nontoxins. More than 60% of the total reads have been accounted for with full-length
annotated transcripts. (A) The major toxin classes were the CTLs, SVSPs, MYO, and SVMPs (types II and III). (B) As expected for a protein-secreting
tissue, the venom gland expresses an abundance of proteins involved in proteostasis.


Table 3 Expression levels of full-length toxin clusters

Rank  Cluster name  Cluster size  Length  % total reads  % toxin reads  GenBank TSA accessions
1     MYO           1             994     5.93           16.780         JU173668
2     LAAO          1             3089    1.88           5.309          JU173667
3     SVSP-6        1             1720    1.71           4.849          JU173733
4     CTL-8         3             721     1.02           2.896          a:JU173656, b:JU173657, c:JU173658
5     SVSP-13       1             ?       1.01           2.864          JU173724
6     PLA2-1        2             ?       9.68 x 10^-1   2.739          a:JU173675, b:JU173676
7     SVSP-3        2             ?       9.38 x 10^-1   2.653          a:JU173728, b:JU173729
8     SVMPII-5      8             ?       9.15 x 10^-1   2.587          a:JU173694, b:JU173695, c:JU173696, d:JU173697, e:JU173698, f:JU173699, g:JU173700, h:JU173701
9     CTL-4         5             ?80     9.14 x 10^-1   2.585          a:JU173646, b:JU173647, c:JU173648, d:JU173649, e:JU173650
10    SVMPII-1      5             2138    9.13 x 10^-1   2.583          a:JU173682, b:JU173683, c:JU173684, d:JU173685, e:JU173686
11    SVMPIII-2     5             ?       8.97 x 10^-1   2.538          a:JU173707, b:JU173708, c:JU173709, d:JU173710, e:JU173711
12    SVMPII-2      2             ?       8.38 x 10^-1   2.369          a:JU173687, b:JU173688
13    SVSP-1        1             ?       ?              2.348          JU173726
14    CTL-16        1             ?       ?              2.211          JU173631
15    SVMPII-7      1             ?       ?              2.191          JU173703
16    SVMPII-3      4             ?       ?              2.176          a:JU173689, b:JU173690, c:JU173691, d:JU173692
17    PLA2-4        1             890     6.98 x 10^-1   ?              JU173679
18    CTL-3         6             797     6.73 x 10^-1   1.905          a:JU173640, b:JU173641, c:JU173642, d:JU173643, e:JU173644, f:JU173645

Table 3 Expression levels of full-length toxin clusters (Continued)

Rank  Cluster name  Cluster size  GenBank TSA accessions
19    SVMPII-6      1             JU173702
20    SVMPIII-3     3             a:JU173712, b:JU173713, c:JU173714
21    SVSP-9        1             JU173738
22    SVMPII-4      1             JU173693
23    SVSP-8        1             JU173737
24    CTL-7         1             JU173655
25    CTL-18        1             JU173633
26    CTL-1         1             JU173635
27    SVSP-12       1             JU173723
28    CTL-6         1             JU173654
29    CRISP         1             JU173623
30    SVMPIII-7     1             JU173719
31    SVMPII-8      1             JU173704
32    PLA2-6        1             JU173681
33    SVSP-7        3             a:JU173734, b:JU173735, c:JU173736
34    CTL-15        1             JU173630
35    PLA2-2        1             JU173677
36    CTL-10        1             JU173624
37    CTL-14        1             JU173629
38    CTL-13        1             JU173628
39    PLA2-5        1             JU173680
40    CTL-9         2             a:JU173659, b:JU173660
41    BPP           1             JU173621
42    SVMPIII-1     2             a:JU173705, b:JU173706
43    SVSP-14       1             JU173725
44    SVMPIII-4     2             a:JU173715, b:JU173716
45    CTL-2         2             a:JU173638, b:JU173639
46    SVMPIII-8     1             JU173720
47    VESP          1             JU173741
48    SVMPIII-5     1             JU173717







Table 3 Expression levels of full-length toxin clusters (Continued)

Rank  Cluster name  Cluster size  Length
49    SVSP-4        2             2152
50    CTL-20
51    NUC
52    SVSP-5
53    SVMPIII-6
54    CTL-21
55    CTL-19
56    NF
57    CTL-12
58    SVSP-2
59    CTL-5
60    PDE
61    CTL-11
62    PLA2-3
63    CREGF
64    SVSP-10
65    HYAL-1
66    KUN
67    HYAL-2
68    KUN-1
69    VEGF-1
70    SVSP-11
71    GC
72    PDE-6
73    NGF
74    KUN-2
75    PDE-4
76    ?
77    WAP
78    CTL-17

for C. adamanteus probably lies somewhere between 78
and 123. This range is at the lower end of the number
of unique toxins typically identified for viperids by means
of proteomic techniques [10], which may indicate that
the venom of C. adamanteus is less complex than that of
other species. Alternatively, posttranscriptional processes
such as alternative splicing or posttranslational modifica-
tions could significantly increase the diversity of toxins
present in the venom. Our identified toxins accounted for
35.4% of the total reads (Figure 3), and the vast majority


of the extremely high-abundance transcripts were those
encoding toxin proteins (Figure 2A). We named toxins
with a combination of a toxin-class abbreviation, a cluster
number, and, if the cluster had more than a single member,
a lower-case letter to indicate the member of the cluster
(e.g., CTL-3b).
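
The clustering and naming conventions described above can be sketched in a few lines. Several assumptions are made here that the text does not confirm: sequences are taken to be pre-aligned to equal length, divergence is the simple proportion of differing sites, clusters are formed by single linkage, and cluster numbers are assigned in order of discovery. The function names are hypothetical.

```python
from itertools import combinations
import string


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(seqs, threshold=0.01):
    """Single-linkage clustering of sequences diverging by less than threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) < threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def name_members(toxin_class, clusters):
    """Name members CLASS-<cluster number><letter>; singletons get no letter."""
    names = {}
    for number, members in enumerate(clusters, start=1):
        for letter, idx in zip(string.ascii_lowercase, members):
            suffix = letter if len(members) > 1 else ""
            names[idx] = f"{toxin_class}-{number}{suffix}"
    return names
```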
We used the number or percentage of reads mapping
to a particular transcript as a measure of its abundance.
Although average coverage might be a more appropriate
proxy for the number of copies of a given transcript


Table 3 Expression levels of full-length toxin clusters (Continued)

Rank  GenBank TSA accessions
49    a:JU173730, b:JU173731
50    JU173636
51    JU173671
52    JU173732
53    JU173718
54    JU173637
55    JU173634
56    JU173669
57    a:JU173626, b:JU173627
58    JU173727
59    a:JU173651, b:JU173652, c:JU173653
60    JU173674
61    JU173625
62    JU173678
63    JU173622
64    JU173721
65    JU173662
66    JU173666
67    JU173663
68    JU173664
69    a:JU173739, b:JU173740
70    JU173722
71    JU173661
72    JU173673
73    JU173670
74    JU173665
75    JU173672
76    JU173742
77    JU173743
78    JU173632








present, because it accounts for differences in transcript
lengths, we prefer read counts as a measure of the expres-
sion expenditure on a given transcript because they better
reflect the energetic cost associated with producing the
encoded protein and are consistent with previous work
using low-throughput sequencing (see, e.g., Pahari et al.
[25]). In addition, this measurement should more closely
match proteomic-based measurements of the contents of
venom components (see, e.g., Gibbs et al. [43]) which
come in the form of the percentages of total peptide bonds
in the sample.
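
The distinction between the two abundance measures can be made concrete with a small sketch. The function name abundance, its arguments, and the numbers in the usage note are hypothetical; average coverage is approximated as reads times read length divided by transcript length.

```python
def abundance(read_counts, lengths, total_reads, read_length=100):
    """Return percent of reads and approximate average coverage per transcript.

    percent_reads tracks the share of sequencing effort (and, by the
    argument in the text, of expression expenditure) spent on a
    transcript; avg_coverage corrects for transcript length and so is
    closer to the number of transcript copies.
    """
    result = {}
    for name, n_reads in read_counts.items():
        result[name] = {
            "percent_reads": 100.0 * n_reads / total_reads,
            "avg_coverage": n_reads * read_length / lengths[name],
        }
    return result
```

For example, a 1,000-nt transcript with 500 reads and a 3,000-nt transcript with 1,500 reads have the same average coverage (50x), but the longer transcript accounts for three times as many reads.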

Snake venom metalloproteinases
We identified 39 unique sequences and 16 clusters of
snake-venom metalloproteinases (SVMPs) that accounted
for 24.4% of the reads mapping to toxin sequences and
8.6% of the total reads (Figure 3A and Table 3). In terms
of total reads, the SVMPs were the most abundant class of
toxins in the C. adamanteus venom-gland transcriptome.
SVMPs are the primary sources of the local and systemic
hemorrhage associated with envenomation by viperids
and are divided into a number of subclasses based on
their domain structure [44,45]. All SVMPs have a metallo-
proteinase domain characterized by a zinc-binding motif.
All of the SVMPs identified for C. adamanteus belong to
either the type II or the type III subclass. Type II SVMPs
(SVMPIIs) have a disintegrin domain in addition to the
metalloproteinase domain, which may be proteolytically
cleaved posttranslationally to produce a free disintegrin.
Type III SVMPs (SVMPIIIs) have a disintegrin-like and a
cysteine-rich domain in addition to the metalloproteinase
domain. We found 8 clusters of each of these two sub-
classes with 23 unique SVMPII sequences and 16 unique
SVMPIII sequences. SVMPII and SVMPIII clusters comprise
16.4% and 8.0% of the reads mapping to toxins,
respectively (Figure 3). The sequences in both subclasses
are diverse. The maximum pairwise nt divergence for the
SVMPIIs was 10.0%, corresponding to a maximum amino-
acid divergence of 18.1%. For the SVMPIIIs, the maxi-
mum pairwise nt divergence was 20.4% with a maximum
amino-acid divergence of 42.3%. Although SVMPs were
the dominant toxins as a class, the individual SVMP clus-
ter with the highest abundance was SVMPII-5, which was
only the eighth most abundant toxin cluster (Figure 2B
and Table 3).
Mackessy [46] categorized rattlesnake venoms as type
I or type II on the basis of their toxicities and metallo-
proteinase activities. These two measurements tend to be
inversely related in rattlesnakes: species (or populations)
with low LD50 values tend also to have low or undetectable
hemorrhagic activities. SVMPs are the major hemorrhagic
components of snake venoms, and high toxicity appears
to be caused mostly by neurotoxic venom components.
Low-toxicity venoms with high metalloproteinase activity


are classified as type I, and high-toxicity venoms with low
metalloproteinase activity are classified as type II. On the
basis of the abundance of SVMPs in the venom-gland
transcriptome, C. adamanteus clearly has type I venom,
although the relatively low toxicity of its venom [46] is
at least partially compensated for by its large size and
venom yield.

C-type lectins
The most diverse and the second most abundant toxin
class in the C. adamanteus venom-gland transcriptome
was the C-type lectin (CTL) class. We identified 37 unique
sequences and 21 clusters of CTLs that accounted for
22.2% of the reads mapping to toxins and 7.8% of the
total reads (Figure 3A and Table 3). CTLs generally either
inhibit or activate components of plasma or blood-cell
types, thereby interfering with hemostasis [47]. Most
known snake-venom CTLs function as heterodimers or
even more complex arrangements [48], probably account-
ing in part for their diversity. The divergence among
members of this class within the C. adamanteus genome
was extreme, although all members preserved a CTL-like
domain. Some pairs shared virtually no conserved amino-
acid positions. Three of the CTL clusters provide evidence
for the relevance of alternative splicing in the generation
of toxin proteins. CTL-3f, CTL-4e, and CTL-9b all have
48-nt insertions in the same region but are otherwise similar
or identical to other members of their clusters.

Snake venom serine proteinases
The third most abundant toxin class for C. adaman-
teus was the snake-venom serine proteinases (SVSPs). We
identified 18 unique sequences and 14 clusters in this
toxin class, accounting for 20.0% of the toxin reads and
7.1% of the total reads (Figure 3A and Table 3). Three of
the 10 most highly expressed individual toxins were SVSPs
(Figure 2). SVSPs interfere with a wide array of reactions
involving blood coagulation and hemostasis and belong to
the trypsin family of serine proteases [49,50]. Mackessy
[46] detected significant thrombin-like and kallikrein-like
activities in the venom of C. adamanteus, which are
attributable to the action of SVSPs. The diversity of SVSPs
within the C. adamanteus genome is high; maximum pair-
wise nt divergence is 20.6% and amino-acid divergence
is 47.4%.
The members of two SVSP clusters differ in a way
that should be noted. The lengths of SVSPs are gener-
ally well conserved throughout the class. SVSP-7a has a
27-nt insertion relative to the two other members of its
cluster but is otherwise identical to SVSP-7b. This differ-
ence could reflect the presence of alternative splicing for
this gene. SVSP-3a is unique among the C. adamanteus
SVSPs or those known from other snake species in appar-
ently having a 65-amino-acid extension of its C-terminal




region. The other member of its cluster, SVSP-3b, has a
single deletion of a C nt in a poly-C tract that terminates
its coding sequence consistently with other known SVSPs.
The reads generating the SVSP-3a form vastly outnum-
ber those for the SVSP-3b form; more than 95% of the
reads support the extended version of the protein. The
effect, if any, of this C-terminal extension remains to be
determined.

Phospholipase A2's
Previous work with C. adamanteus identified only a single
phospholipase A2 (PLA2) sequence [11], but we identified
seven unique sequences in six clusters (Figure 2
and Table 3), accounting for 7.8% of the toxin reads and
2.8% of the total reads (Figure 3). PLA2s are among the
most functionally diverse classes of snake-venom toxins
and have pharmacological effects ranging from
neurotoxicity (presynaptic or postsynaptic) to myotoxicity and
cardiotoxicity. Anticoagulant and hemolytic effects due to
PLA2s are also known [51,52]. Compared to other toxin
classes of C. adamanteus, the diversity of PLA2s is low.
Five of the six clusters are all within 5% nt divergence of
one another. PLA2-3 is the lone, high-divergence outlier,
differing by more than 31% at the nt level from the other
clusters. PLA2-3 is also expressed at the lowest level of any
of the PLA2s (Table 3).

Other high-abundance toxins
The SVMPs, CTLs, SVSPs, and PLA2s account for 74% of
the reads mapping to toxin sequences (Figure 3), 73% of
the toxin clusters, and 82% of the unique toxin sequences.
The remaining toxins belong to 16 different classes. Many
of these are low-abundance transcripts (Figure 2 and
Table 3) and may not actually function as significant
toxins, whereas several others have high to moderate
abundances and represent significant components of the
venom.
The most abundant toxin transcript and the most
abundant transcript overall (Figure 2) was a small basic
myotoxin related to crotamine [53,54]. The precursor pro-
tein is just 70 amino acids in length with a predicted 22-
amino-acid signal peptide. This transcript was detected
by Rokyta et al. [11], but the coding sequence was prema-
turely truncated in their sequence because of a single nt
deletion. This toxin accounts for 16.8% of the toxin reads
(Figure 3A) and 5.9% of the total reads. Crotamine, origi-
nally isolated from the venom of C. durissus, causes spas-
tic paralysis in mice and is found in the venoms of many
species of Crotalus [54]. Muscle spasms, twitching, and
paralysis of the legs have been reported for human enven-
omations by C. adamanteus [20]. Interestingly, Straight
et al. [55] noted that individuals of C. adamanteus from
populations in southern and central Florida lack this toxin
in their venoms. Given that this myotoxin is the most
abundant transcript in the venom of our specimen, its
absence in southern populations points to a dramatic dif-
ference in venoms within this species and the potential for
significantly different pathological effects associated with
bites from different C. adamanteus populations.
A single L-amino-acid oxidase (LAAO) transcript was
the second most abundant toxin transcript (Figure 2B),
consistent with the previously detected LAAO activity
in the C. adamanteus venom [46]. This single tran-
script accounted for 5.3% of the reads mapping to toxins
and 1.9% of the total reads. LAAOs are flavoproteins,
giving the venom its yellow color; can be edema- or
apoptosis-inducing; and can induce or inhibit platelet
aggregation [56]. These effects are probably mediated by
H2O2 released during the oxidation reaction catalyzed
by the enzyme. The 29th most abundant toxin transcript
was a cysteine-rich secretory protein (CRISP) (Figure 2B
and Table 3), accounting for 1.3% of the toxin reads
(Figure 3A). Although CRISPs are widely found in snake
venoms, their precise effects are not well established
[57], but they appear to interfere with smooth-muscle
contraction [58,59]. A single transcript encoding a
bradykinin-potentiating and C-type natriuretic peptide
(BPP) accounted for 0.7% of the toxin reads
(Figure 3A). The encoded protein is similar to a pro-
tein identified in Sistrurus catenatus (GenBank accession:
DQ464265) that was hypothesized to reduce blood pres-
sure in envenomated prey [25]. A loss of blood pres-
sure has been reported in human envenomations by
C. adamanteus [20].

Other low-abundance toxins
The remaining 17 clusters are classified as "others" in
Figure 3A. Because each has a relatively low expression
level (Table 3), many of these should be considered puta-
tive toxins until their presence in the C. adamanteus
venom is confirmed proteomically and pharmacological
effects are associated with them.
Rokyta et al. [11] detected the presence of a transcript
encoding a protein homologous to ohanin from Ophioph-
agus hannah [60,61] and to a homologous protein from
Lachesis muta [62]; we found a transcript identical to
that of Rokyta et al. [11]. Pung et al. [60,61] found the
O. hannah version of this protein to increase pain sensi-
tivity (hyperalgesia) and to induce temporary hypoloco-
motion in mice and proposed naming the class vespryns
(VESP). Exceptionally intense pain has been reported
after envenomation of humans by C. adamanteus [20],
although whether such pain is due to a specific toxin is
not clear.
We detected three different nucleotidases (NUCs) and
five different phosphodiesterases (PDEs) in the venom-
gland transcriptome of C. adamanteus. Only one of the
NUCs and three of the PDEs had signal peptides, and
we therefore only considered these as potential toxins:
NUC, PDE, PDE-4, and PDE-6 (Table 3). The roles of
these enzymes in venoms are uncertain, but their primary
function may be to liberate toxic nucleosides [63-65].
Significant PDE activity has been detected previously in
the venom of C. adamanteus [46].
The C. adamanteus venom-gland transcriptome con-
tained three Kunitz-type protease inhibitors (KUNs).
Two of these shared more than 75% amino-acid identity
with a KUN from Austrelaps labialis (GenBank acces-
sion: B2BS84), an Australian elapid. All three KUNs have
domains that place them in the superfamily of bovine pan-
creatic trypsin-like inhibitors, and snake toxins from this
family are known to inhibit plasma serine proteinases.
Although KUNs are commonly observed in snake venoms,
their role in envenomation (if any) is not well defined [66].
The three KUNs detected for C. adamanteus are all at rel-
atively low abundances, suggesting that they are not major
components of the venom.
We identified two transcripts, HYAL-1 and HYAL-2,
encoding hyaluronidase-like proteins. Hyaluronidases are
generally regarded as venom components that promote
the dissemination of other venom components by degrad-
ing the extracellular matrix at the site of injection [67],
although they may have more direct toxic effects [68].
The coding sequences of our two transcripts differ only
in the presence of a 765-nt deletion in HYAL-2 relative to
HYAL-1. Truncated hyaluronidases such as HYAL-2 have
been detected in the venoms of other viperid species [67]
and may represent an example of alternative splicing. We
also identified a transcript encoding a glutaminyl-peptide
cyclotransferase (glutaminyl cyclase; GC). Many snake
venom components have N termini blocked by pyrogluta-
mate, and GCs catalyze the formation of this block. This
component is related more to maturation and protection
of other toxins and probably contributes only indirectly to
toxicity [69].
We identified six growth-factor-related sequences in the
venom-gland transcriptome of C. adamanteus: a nerve
growth factor (NGF), a neurotrophic factor (NF), two
vascular endothelial growth factors (VEGF) in a sin-
gle cluster, and a cysteine-rich with EGF-like domain
protein (CREGF). The NGF transcript encodes a
241-amino-acid precursor protein and shares 99% amino-acid
identity with an NGF from C. durissus (GenBank
accession: AAG30924). The NF transcript encodes a
180-amino-acid precursor that shares homology with
mesencephalic astrocyte-derived neurotrophic factors. We
found no close venom-related sequences for this NF in
the available databases. The VEGF sequences appear to
be alternatively spliced versions of one another. VEGF-1a
encodes a 192-amino-acid precursor, and VEGF-1b
encodes a 148-amino-acid precursor. Aside from the
132-nt deletion in VEGF-1b relative to VEGF-1a, their
coding sequences are identical. Both forms have database
matches of the same length with 99% amino-acid iden-
tity from Trimeresurus flavoviridis (GenBank accessions:
AB154418 and AB154419). Finally, we detected the same
cysteine-rich with EGF-like domain protein as described
by Rokyta et al. [11].
The final two putative toxin transcripts are of question-
able significance because of their low expression levels.
A single sequence with 77% amino-acid identity to a
waprin (WAP) sequence from Philodryas olfersii (Gen-
Bank accession: EU029742), a rear-fanged colubrid, was
detected. Related sequences have been detected in a vari-
ety of other rear-fanged snake species, but such proteins
are only known to exhibit antimicrobial activity [70]. We
detected a venom factor (VF) transcript that shares 87%
amino-acid identity with a VF from Austrelaps superbus
(GenBank accession: AY903291) [71]. The C. adamanteus
VF transcript encodes a 1,652-amino-acid precursor with
a 22-amino-acid signal peptide. The best-studied member
of this toxin family is cobra venom factor, which is known
to activate the complement system [72]. The extremely
low expression levels of these transcripts may indicate that
they represent the orthologous genes to the ancestors of
the known toxic forms and may therefore have no toxic
functions.

Comparison to previous work
Rokyta et al. [11] previously described toxin transcripts in
the venom-gland transcriptome of C. adamanteus on the
basis of 454 pyrosequencing. Their work used RNA from
the venom gland of the same individual used in the present
work. They found 40 unique toxin transcripts, 10 of which
contained only partial coding sequences. Table 4 lists the
closest matches from our current sequences to those of
Rokyta et al. [11]. The vast majority of the 454-based
sequences had either identical matches in our current
set of toxins or matches with less than 1% nt divergence
(Table 4). Only a single 454 toxin, SVSP-9, did not have
a close match. This sequence contains only a partial cod-
ing sequence and therefore may not represent a true,
functional toxin.

Nontoxin transcripts
We characterized the nontoxin genes expressed in the
C. adamanteus venom gland by two means. First, we
took all of the contigs from one of our four de novo
NGen assemblies based on 20 million merged reads and
conducted a full Blast2GO [73] analysis on the contigs
comprising >100 reads. Of the 12,746 contigs (assembly
2 in Table 2), we were able to provide gene ontology
(GO) annotations for 9,040 of them (Figure 4A). The
major functional classes (level 2) represented in these
results were binding and catalysis, followed by transcrip-
tion regulation (Figure 4B). The major biological process
Table 4 Correspondence with the results of Rokyta et al. [11]

454 name  Accession  Closest match      % nt divergence  Notes
CREGF     HQ414087   CREGF              0.1
CRISP     HQ414088   CRISP              0.0              Identical
CTL-1     HQ414089   CTL-4a             0.0              Identical
CTL-2     HQ414090   CTL-8a             0.0              Identical
CTL-3     HQ414091   CTL-1              0.0              Identical
CTL-4     HQ414092   CTL-9a             0.8
CTL-5     HQ414093   CTL-3e             0.9
CTL-6     HQ414094   CTL-12b            0.0              Identical
CTL-7     HQ414095   CTL-10             0.0              Identical
CTL-8     HQ414096   CTL-2a             0.0              Identical
CTL-9     HQ414097   CTL-5a             0.0              Identical
HYAL      HQ414098   HYAL-1             0.0              454 version incomplete
LAAO      HQ414099   LAAO               0.0              Identical
MYO       HQ414100   MYO                0.0              454 version has 1-nt deletion that truncates the coding sequence prematurely
NUC       HQ414101   NUC                0.0              454 version incomplete
PDE-1     HQ414102   PDE                0.0              454 version has 123-nt insertion
PDE-2     HQ414103   PDE-2 (nontoxin)   0.0              454 version incomplete; no signal peptide; no longer considered toxin
PLA2      HQ414104   PLA2-1b            0.0              Identical
PLB       HQ414105   PLB (nontoxin)     0.2              No longer considered toxin
SVMP-1    HQ414106   SVMPII-3b          0.0              Identical
SVMP-2    HQ414107   SVMPII-3b/c        0.5
SVMP-3    HQ414108   SVMPII-5a          0.3
SVMP-4    HQ414109   SVMPIII-2d         1.2
SVMP-5    HQ414110   SVMPIII-4b         1.0
SVMP-6    HQ414111   SVMPIII-2d         0.2
SVMP-7    HQ414112   SVMPIII-4a         0.0              Identical
SVMP-8    HQ414113   SVMPIII-5          0.5              454 version incomplete
SVMP-9    HQ414114   SVMPIII-1a/b       0.0              454 version incomplete
SVMP-10   HQ414115   SVMPIII-6          0.0              454 version incomplete
SVMP-11   HQ414116   SVMPIII-3a         0.0              454 version incomplete
SVSP-1    HQ414117   SVSP-3a            0.0              454 version has 1-nt deletion that truncates the coding sequence prematurely
SVSP-2    HQ414118   SVSP-1             0.0              Identical
SVSP-3    HQ414119   SVSP-7a            0.0              Identical
SVSP-4    HQ414120   SVSP-5             0.1
SVSP-5    HQ414121   SVSP-9             0.5
SVSP-6    HQ414122   SVSP-6             0.0              Identical
SVSP-7    HQ414123   SVSP-4b            0.0              454 version incomplete
SVSP-8    HQ414124   SVSP-2             0.0              454 version incomplete
SVSP-9    HQ414125   None               >10              454 version incomplete
VESP      HQ414126   VESP               0.0              Identical







[Figure 4 image not available. Recovered panel labels: NGen assembly (12,746 contigs), Annotated nontoxins (2,879 contigs); Molecular function (level 2): NGen assembly (13,856 hits), Annotated nontoxins (4,205 hits); Biological process (level 2): NGen assembly (27,847 hits), Annotated nontoxins (8,614 hits); Cellular component (level 2): NGen assembly (17,376 hits), Annotated nontoxins (5,686 hits).]

Figure 4 Comparison of gene ontology (GO) results for our annotated full-length nontoxin sequences with those of the contigs from a de
novo assembly with NGen. Only level 2 GO terms are shown. The distributions of GO terms are similar across data sets, suggesting that the
annotated transcripts provided a comprehensive characterization of the genes expressed in the venom gland. (A) The distributions of sequences
reaching various stages of identification and annotation are shown. The level 2 GO terms are shown for molecular function (B), biological process
(C), and cellular component (D).


GO terms (level 2) were cellular processes and metabolic
processes (Figure 4C). Interestingly, viral reproductive
function was detected and probably represents the activ-
ity of transposable elements or retroviruses like those
previously noted in snake venom-gland transcriptomes
[34]. The major cellular component GO terms (level 2)
were cell and organelle (Figure 4D). For these results, we
made no attempt to exclude toxin sequences, because they
are necessarily a small minority of the total sequences,
and did not require that contigs contain full-length
coding sequences.
For our second approach, we used only the 2,879 tran-
scripts with full-length coding sequences for nontoxin
proteins. We analyzed these sequences with Blast2GO.
The distributions of level 2 GO terms for these data
were almost identical to those of the full NGen assem-
bly described above (Figure 4), suggesting that our 2,879
annotated nontoxin sequences provide a representative
sample of the full venom-gland transcriptome. The full
distributions of GO terms for these sequences across all
levels are shown in Figures 5, 6, and 7. As expected for
a secretory tissue, processes related to protein production
and secretion were well represented (e.g., protein
transport and protein modification; Figure 5), as were
protein-binding functions (Figure 6) and proteins local-
ized to the endoplasmic reticulum (ER) and the Golgi
apparatus (Figure 7).
Four of the top 20 most highly expressed nontoxin
genes (Table 5), including the most highly expressed,
were protein disulfide isomerases (PDIs). In particular,
they were members of the PDI family that is retained
in the ER and are characterized by having two or more
PDI domains, which are similar to thioredoxin. PDIs cat-
alyze the formation or breaking of disulfide bonds and are
therefore involved in protein folding. Molecular chaper-
ones were well represented in the top 20 nontoxins by
four genes: endoplasmin (a member of the HSP90 family),
calreticulin, 78-kDa glucose-regulated protein (GRP78),
and heat shock protein 5. The latter gene appears to be
a splice variant of GRP78, differing within the coding
region by two point mutations and two short deletions.
All of these chaperones are ER specific. Six of the top 20
nontoxins were mitochondrial genes involved in oxida-
tive cellular respiration, consistent with the high ener-
getic demands of venom production [74]: cytochrome C
oxidase subunits I and III, cytochrome B, and NADH
dehydrogenase subunits 1, 4, and 5. The cells of venom
glands are particularly rich in mitochondria [75]. Four
genes were involved in various aspects of translation:
two translation elongation factors, 18S rRNA, and vig-
ilin. Vigilins are hypothesized to be involved in reg-
ulating mRNA stability and translation and might be
involved in RNA-mediated gene silencing [76,77]. The
final top 20 nontoxin gene was actin, a component of the
cytoskeleton.
The abundances of several major classes of nontoxins
are provided in Figure 3B. We identified 57 sequences with
functions related to protein folding [19,78-80], including
various classes of heat-shock proteins, protein-disulfide
isomerases, peptidyl-prolyl cis-trans isomerases, dnaJ-
complex components, and T-complex components. These
sequences together accounted for 28.4% of the total
reads mapping to nontoxins. Ribosomal-protein transcripts
(cytoplasmic and mitochondrial) accounted for
9.5% of the nontoxin reads, and mitochondrial genes
accounted for another 9.0%. Finally, we identified 110
transcripts encoding proteins involved in protein
degradation [81,82], including proteins involved in
the ubiquitin-proteasome system and the ER-associated
protein-degradation system [83], which accounted for
2.6% of the nontoxin reads. Protein-quality control should
be essential in a high-throughput protein-producing tis-
sue such as a snake venom gland.
Our collection of nontoxins included several notable
potential inhibitors of the toxins or other proteases
(Table 6). Such inhibitors may play a role in preventing
autolysis [84] or may serve to protect venom components
once inside a victim [85]. We detected three cystatin-like
transcripts in the venom gland. Cystatins are cysteine-
protease inhibitors and have been detected in numerous
elapid venom glands and venoms [85]. We detected three
unique metalloproteinase inhibitors and two serine pro-
teinase inhibitors (serpins). Finally, we found four unique
PLA2 inhibitors.

Sequence accession numbers
The original, unmerged sequencing reads were submitted
to the National Center for Biotechnology Information
(NCBI) Sequence Read Archive under accession number
SRA050594. The annotated toxin and nontoxin sequences
were submitted to the GenBank Transcriptome Shotgun
Assembly (TSA) database under accession numbers
JU173621-JU173743 (toxins) and JU173744-JU176622
(nontoxins).

Figure 5 The biological-process GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the
production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.

Conclusions
We have described the most comprehensive venom-gland
transcriptomic characterization of a snake species to
date and provided full-length coding sequences for 123
unique toxin proteins and 2,879 unique nontoxin proteins.
We have demonstrated the use of Illumina sequencing
technology for the sequencing and de novo assembly of
a tissue-specific transcriptome for a nonmodel species,
C. adamanteus, for which genome-scale resources were
previously unavailable. Because the nontoxin sequences in
particular should be conserved across snake species, our
results should greatly facilitate similar work with other
venomous species, serving as an assembly template and
reducing the number of reads for which de novo assembly
will be necessary.
The expressed toxin genes in the venom gland of
C. adamanteus provide a detailed portrait of a type I
rattlesnake venom [46]. The most abundant transcript
expressed in the C. adamanteus venom gland encoded
a myotoxin homologous to crotamine. Crotamine is
known to induce spastic paralysis [54], a symptom
that has been observed in human envenomations by
C. adamanteus [20]. Like those of most viperids, the
bites of C. adamanteus result in significant tissue dam-
age and necrosis, and we found that SVMPs, the major
class of hemorrhagic toxins, dominated venom-gland
gene expression. The second most abundant toxin transcript
overall was an LAAO, a class of enzymes also noted for
causing local tissue damage [46]. Coagulopathy is a
common occurrence with pit-viper bites [5]. The CTLs
and SVSPs were also both diverse and abundant in the
venom-gland transcriptome of C. adamanteus, and both
classes primarily attack the hemostatic system. In terms
of gene sequences of venom components, the venom of
C. adamanteus is now the best-characterized snake
venom, although a thorough proteomic analysis of the
venom is still needed. The sequences we have generated
will greatly facilitate such a proteomic characteriza-
tion by serving as a database against which to query
mass-spectrum results.

Figure 6 The molecular-function GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the
production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.

The expression patterns of the nontoxin genes in the
venom gland of C. adamanteus reflect the
protein-secretory function of the tissue and the high energetic
demands of rapid venom production [75]. The most highly
expressed nontoxin genes were those involved in the
production and processing of proteins and energy pro-
duction to support these activities. Molecular chaperones
and PDIs were particularly abundant. Though the expres-
sion patterns for nontoxins were not surprising, future
comparisons with other snake species, especially those
from other snake families, may be able to elucidate the ori-
gin and early stages of the evolution of the venom gland.

Methods
Venom-gland transcriptome sequencing
We sequenced the venom-gland transcriptome of a sin-
gle animal from Florida (Wakulla County): an adult female
weighing 393 g with a snout-to-vent length of 792 mm and
a total length of 844 mm. To stimulate transcription in
the venom glands, we anesthetized the snake by propofol
injection (10 mg/kg) and extracted venom by electros-
timulation under anesthesia [86]. After venom extraction,
the animal was allowed to recover for four days while
transcription levels reached their maxima [87]. The snake
was euthanized by injection of sodium pentobarbital (100
mg/kg), and its venom glands were subsequently removed.
The above techniques were approved by the Florida State
University Institutional Animal Care and Use Committee
(IACUC) under protocol #0924.
Sequencing and nonnormalized cDNA library preparation
were performed by the HudsonAlpha Institute
for Biotechnology Genomic Services Laboratory
(http://www.hudsonalpha.org/gsl/). Transcriptome sequencing
was performed essentially as described by Mortazavi
et al. [88] in a modification of the standard Illumina
methods described in detail in Bentley et al. [89].
Total RNA was reduced to poly-A+ RNA with oligo-
dT beads. Two rounds of poly-A+ selection were per-
formed. The purified mRNA was then subjected to
a mild heat fragmentation followed by random prim-
ing for first-strand synthesis. Standard second-strand
synthesis was followed by standard library preparation
with the double-stranded cDNA as input material. This
approach is similar to that of Illumina's TruSeq RNA-
seq library preparation kit. Sequencing was performed in
one lane on the Illumina HiSeq 2000 with 100-base-pair
paired-end reads.

Figure 7 The cellular-components GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the
production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.

Table 5 The 20 most highly expressed nontoxin transcripts

Name                                          Length  % reads  Function                        Accession
Protein disulfide isomerase                   ...     5.223    ... disulfide bonds (ER)        JU175360
Cytochrome C oxidase subunit I                ...     0.966    Electron transport chain        ...
Cytochrome B                                  ...     0.499    Electron transport chain        ...
Translation elongation factor 1 alpha 1       ...     0.459    Translation                     ...
18S rRNA                                      ...     0.421    Ribosomal component             ...
Calreticulin                                  ...     0.406    Protein chaperone (ER)          ...
Endoplasmin (HSP90 family)                    ...     0.333    Protein chaperone (ER)          ...
78 kDa glucose-regulated protein              ...     0.332    Protein chaperone (ER)          ...
Heat shock protein 5 (GRP78 splice variant?)  ...     0.327    Protein chaperone (ER)          ...
NADH dehydrogenase subunit 5                  ...     0.272    Electron transport chain        ...
Cytochrome C oxidase subunit III              ...     0.239    Electron transport chain        ...
Protein disulfide isomerase A6                ...     0.212    ... disulfide bonds (ER)        ...
Nucleobindin 2                                ...     0.203    Calcium binding                 ...
Protein disulfide isomerase A3                ...     0.186    ... disulfide bonds (ER)        ...
NADH dehydrogenase subunit 1                  ...     0.173    Electron transport chain        ...
NADH dehydrogenase subunit 4                  ...     0.172    Electron transport chain        ...
Protein disulfide isomerase A4                ...     0.159    ... disulfide bonds (ER)        ...
Translation elongation factor 2               ...     0.147    Translation                     ...
Vigilin                                       ...     0.129    mRNA stability and translation  ...
Actin, cytoplasmic 2                          ...     0.174    Cytoskeleton                    ...

Table 6 Toxin and protease inhibitors detected in the venom-gland transcripts

Name                           Length  % reads       Function                      Accession
Cystatin 1                     790     9.80 x 10^-4  Cysteine-protease inhibitor   JU174278
Cystatin B                     460     1.23 x 10^-3  Cysteine-protease inhibitor   JU174279
Cystatin 2                     709     1.31 x 10^-3  Cysteine-protease inhibitor   JU174280
Metalloproteinase inhibitor 1  820     1.04 x 10^-?  Metalloproteinase inhibitor   JU175124
Metalloproteinase inhibitor 2  2560    2.57 x 10^-?  Metalloproteinase inhibitor   JU175125
Metalloproteinase inhibitor 3  2202    1.01 x 10^-?  Metalloproteinase inhibitor   JU175126
PLA2 inhibitor beta            1210    1.54 x 10^-?  PLA2 inhibitor                JU175425
PLA2 inhibitor gamma B1        1492    3.68 x 10^-?  PLA2 inhibitor                JU175444
PLA2 inhibitor gamma B2        1694    7.70 x 10^-4  PLA2 inhibitor                JU175442
PLA2 inhibitor B               2339    1.07 x 10^-?  PLA2 inhibitor                JU175443
Serpin B6                      1708    9.68 x 10^-3  Serine-proteinase inhibitor   JU175869
Serpin H1                      2004    9.40 x 10^-4  Serine-proteinase inhibitor   JU175870

Transcriptome assembly and analysis
The average insert length of our cDNA library was ~170
nt, excluding the Illumina adaptors. With 100-base-pair
paired-end sequencing, the majority of paired-end reads
overlapped at their 3' ends. Because read quality declines
toward the 3' ends of reads, we developed a method
similar to that of Rodrigue et al. [36] for merging the
overlapping pairs into single, long, high-quality reads.
The members of each pair of reads were slid along
each other, and, for each overlap of length n, we calculated
the probability of getting the observed number
of matches k by chance using a binomial probability
given by

$$P(k \mid n) = \binom{n}{k} \left(\frac{1}{4}\right)^{k} \left(\frac{3}{4}\right)^{n-k}$$

assuming any of the four nucleotides is equally likely to be
at any position. To be conservative, we only merged reads
if the minimum probability was less than 10^-10 and the
second smallest probability was at least 1000 times larger
(Figure 1A). The latter condition was meant to help avoid
merging reads that span highly repetitive regions. For
cases in which the insert size was less than the read length,
sequence data outside the overlap were assumed to repre-
sent adaptors and were deleted. We updated quality scores
for the overlapping positions following the approach of
Rodrigue et al. [36]. For merged reads, quality scores for
nonoverlapping bases were left unchanged (Figure 1B).
The unmerged reads were typically those pairs from the
longer end of the insert-size distribution.
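The merging rule described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' program: the function names are hypothetical, quality-score updating is omitted, and only overlaps no longer than a read are tried, so the case of an insert shorter than the read length (with adaptor removal) is not handled. The two thresholds are the ones stated in the text: a minimum probability below 10^-10, and a second-smallest probability at least 1,000 times larger.

```python
from math import comb

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def match_probability(k: int, n: int) -> float:
    """Binomial probability of exactly k matches in an n-base overlap,
    with each position matching by chance with probability 1/4."""
    return comb(n, k) * 0.25 ** k * 0.75 ** (n - k)


def merge_pair(forward: str, reverse: str,
               max_prob: float = 1e-10, min_ratio: float = 1000.0):
    """Return the merged read, or None if the pair should stay unmerged."""
    mate = reverse_complement(reverse)  # put both reads on the same strand
    scored = []
    for n in range(1, min(len(forward), len(mate)) + 1):
        # Compare the last n bases of the forward read with the first n
        # bases of the reoriented mate and count identical positions.
        k = sum(a == b for a, b in zip(forward[-n:], mate[:n]))
        scored.append((match_probability(k, n), n))
    if len(scored) < 2:
        return None
    scored.sort()
    (best_p, best_n), (second_p, _) = scored[0], scored[1]
    # Merge only when one overlap is clearly better than every other one.
    if best_p < max_prob and second_p >= min_ratio * best_p:
        return forward + mate[best_n:]
    return None
```

With p = 1/4, a perfect overlap must span at least 17 bases before its probability drops below 10^-10, which is why short overlaps are never merged under this rule.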
Because of the inherent difficulty in de novo tran-
scriptome assembly, we used a diverse array of assembly
approaches and combined the results for a final data
set. We performed assemblies using ABySS version 1.2.6
[37,38] under a wide array of parameter values using both
the merged and unmerged reads. In particular, we used
k-mer values of 51, 61, 71, 81, and 91 and varied the
coverage (c) and erode (e) parameters from 2 to 1,000.
We set E = 0, m = 20, and s = 200 for all assem-
blies. Trans-ABySS [90] provided little or no improvement
of our assemblies, primarily because assembly quality
appeared to be more dependent on the coverage and
erode parameters than on the k-mer length. We also con-
ducted assemblies using both the merged and unmerged
reads with Velvet version 1.1.02 [39] and k-mer values
of 71, 81, and 91. We selected the best of these assem-
blies on the basis of the N50 values for further assembly
into transcripts with Oases version 0.1.20 (http://www.ebi.
ac.uk/~zerbino/oases/) [40]. For Oases, we set the mini-
mum transcript length to 300 nt and the coverage cutoff
to 10. We also followed the approach of Rokyta et al.
[11] and used the NGen2.2 assembler from DNAStar
(http://www.dnastar.com/). Because this assembler is lim-
ited to 20-30 million reads, we used only the merged
reads. We performed four independent assemblies: three
with 20 million merged reads each and one with the
remaining 12,114,709 merged reads. Each assembly was
performed with the default settings for high-stringency,
de novo transcriptome assembly for long Illumina reads,
including default quality trimming. The high-stringency
setting corresponded to setting the minimum match per-
centage to 90%. We retained contigs comprising at least
100 reads.
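Selecting assemblies by N50, as described for the Velvet runs, depends only on the list of contig lengths. The sketch below shows the standard N50 definition and a helper that picks the assembly with the largest value; the helper name and the toy contig lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


def best_assembly_by_n50(assemblies):
    """Return the name of the assembly (name -> contig lengths) with the largest N50."""
    return max(assemblies, key=lambda name: n50(assemblies[name]))


# Toy contig lengths keyed by hypothetical k-mer labels.
toy_assemblies = {
    "k71": [100, 200, 300],
    "k81": [500, 100, 50],
    "k91": [250, 250, 100],
}
```

N50 rewards contiguity only, so it says nothing about whether the contigs are correct; that is one reason the full-length check against database matches described later is still needed.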
In addition to the all-at-once assembly approaches
above, we developed an iterative approach that was both
more effective at generating full-length transcripts and
more computationally efficient. The first step consisted of
applying our Extender program (see below) as a de novo
assembler starting from 1,000 reads. Full-length tran-
scripts were identified with blastx searches (see below),
then used as templates in a reference-based assembly in
NGen3.1 with a 98% minimum match percentage to filter
reads corresponding to identified transcripts. Ten million
of the unassembled sequences were then used in a de
novo transcriptome assembly in NGen3.1 with the same
settings as described above for de novo assembly except
that the minimum match percentage was increased to 93%
and contigs comprising fewer than 200 sequences were discarded.
The resulting sequences were identified, where
possible, by means of blastx searches, and the identified
full-length transcripts were used in another templated
assembly to generate a further-reduced set of reads. This
iterative process was repeated two additional times.
To provide transcriptional profiles of the venom gland,
we performed GO annotation with Blast2GO [73]. We
ran full analyses on one of the NGen assemblies of 20 million
merged reads, including blastx searches, GO mapping,
and annotation. We used the default Blast2GO
parameters throughout. We converted the GO anno-
tation to generic GO-slim terms. We ran the same
analysis on the combined set of annotated nontoxin
sequences.
For gene identification and annotation, we conducted
blastx searches using mpiblast version 1.6.0 (http://www.
mpiblast.org/) of the consensus sequences of contigs of
our assemblies against the NCBI nonredundant pro-
tein database (nr; downloaded March 2011 and updated
through November 2011). We used an E-value cut-off of
10^-4, and only the top 10 matches were considered. For
toxin identification, hit descriptions were searched for
a set of keywords based on known snake-venom toxins
and protein classes. Any sequence matching these key-
words was checked for a full-length coding sequence. We
generally only retained transcripts with full-length cod-
ing sequences (but see below). For the iterative assembly
approach, the remaining, presumably nontoxin-encoding,
contigs were screened for those whose match lengths
were at least 90% of the length of at least one of their
database matches. This step was intended to minimize
the number of fragmented or partial sequences that were
considered for annotation. In addition, we sorted the
contigs of the three 20-million-sequence NGen assem-
blies from the all-at-once approach on the basis of the
number of reads and attempted to annotate the top 500
contigs from one assembly and the top 100 from the
other two.
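
The screening rules described in this paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Hit` record, the function names, and the keyword list are hypothetical (the actual keyword set is not given here), and hits are assumed to have already been parsed from the blastx output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    description: str      # description line of the database match
    evalue: float         # blastx E-value
    match_length: int     # length of the aligned region
    subject_length: int   # full length of the database sequence


# Illustrative keywords only; the real list is an assumption.
TOXIN_KEYWORDS = ("metalloproteinase", "phospholipase", "lectin", "myotoxin")


def top_hits(hits, max_evalue=1e-4, max_hits=10):
    """Keep hits within the E-value cut-off, best first, at most max_hits."""
    kept = sorted((h for h in hits if h.evalue <= max_evalue),
                  key=lambda h: h.evalue)
    return kept[:max_hits]


def is_putative_toxin(hits, keywords=TOXIN_KEYWORDS):
    """True if any hit description contains any toxin keyword."""
    return any(k in h.description.lower() for h in hits for k in keywords)


def covers_a_match(hits, min_fraction=0.9):
    """True if the match spans >= min_fraction of at least one database hit."""
    return any(h.match_length >= min_fraction * h.subject_length for h in hits)
```

A contig flagged by `is_putative_toxin` would still need its coding sequence checked for completeness, and `covers_a_match` corresponds to the 90% screen applied to the presumed nontoxin contigs.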
We estimated transcript abundances using high-
stringency reference-based assemblies in NGen3.1 with
a minimum match percentage of 95. Ten million of the
merged reads were mapped onto the full-length, anno-
tated transcripts, and the percentage of reads mapping to
each transcript was used as a proxy for abundance.
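
As a minimal sketch of this abundance proxy, the percentage for each transcript can be computed from per-transcript read counts. The function name and input layout are hypothetical, and the denominator is assumed to be the total number of reads submitted for mapping rather than only those that mapped.

```python
def read_percentages(mapped_counts, total_reads):
    """Convert per-transcript mapped-read counts into percentages of all reads.

    mapped_counts: hypothetical mapping, transcript name -> reads mapped.
    total_reads:   number of reads submitted for mapping (assumed denominator).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: 100.0 * count / total_reads
            for name, count in mapped_counts.items()}
```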

The Extender
The purpose of Extender is to estimate quickly one
or more full-length transcript sequences from a large
number of high-quality sequence reads. The procedure
begins with one or more seed sequences provided by the
user. The seeds can be known sequences (e.g., partial
transcripts from a previous assembly) or simply sequences
of one or more of the reads. The Extender procedure
begins by hashing the k-mers observed at the two ends
of the seeds. If k is set to 50, for example, then the 50-
base sequence present at the 5' end of each seed is used
as a key in a hash table, and the hash value is a pointer
to the seed in the list of seeds. A second hash table is
likewise used for k-mers from the 3' ends of the seeds.
Note that this method requires that all initial k-mers be
unique (that no two sequence ends be identical). Once
the seeds are hashed, the seeds are extended with the
set of reads provided by the user as follows. The two k-
mers from the ends of each read are looked up in each
hash table. If the key is present in the hash table, the
seed is extended by concatenation of the nonoverlap-
ping bases from the read onto the appropriate end of the
seed. If the key is absent, the reverse complement of the
read is used to extend the seed if the end k-mers are
found. After each extension, the k-mer key facilitating the
extension is removed from the hash table and the new
k-mer key is added (the reference to the seed remains
the same). The procedure is repeated until the reads have
been cycled through N times, where N is chosen by the
user. Cycling is beneficial because the Extender does not
reset to the beginning of the read list when an extension
is made.
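
The procedure described above can be sketched in Python. This is a simplified illustration, not the Extender program itself: the function names are hypothetical, seeds are assumed to be at least k bases long with unique end k-mers, only exact k-mer matches are used, and only the two productive overlaps are checked (a read starting with a seed's 3' k-mer, or ending with a seed's 5' k-mer).

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_seeds(seeds, reads, k, cycles=1):
    """Greedily extend seeds with reads via exact end k-mer matches."""
    seeds = list(seeds)
    # Hash tables: end k-mer -> index of the seed carrying it.
    five = {s[:k]: i for i, s in enumerate(seeds)}
    three = {s[-k:]: i for i, s in enumerate(seeds)}
    for _ in range(cycles):
        for read in reads:
            if len(read) <= k:
                continue  # no nonoverlapping bases to contribute
            # Try the read as given, then its reverse complement.
            for r in (read, reverse_complement(read)):
                if r[:k] in three:
                    i = three.pop(r[:k])        # remove the old key
                    seeds[i] += r[k:]           # extend the 3' end
                    three[seeds[i][-k:]] = i    # add the new key
                    break
                if r[-k:] in five:
                    i = five.pop(r[-k:])
                    seeds[i] = r[:-k] + seeds[i]  # extend the 5' end
                    five[seeds[i][:k]] = i
                    break
    return seeds
```

Because the loop never returns to the start of the read list after an extension, a read that only becomes usable after a later read has been incorporated is picked up on the next cycle, which is why multiple cycles help.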
Extension of a seed typically terminates when the end
of the full-length transcript is reached or when a sequenc-
ing error is encountered in the end of an incorporated
read. The presence of low-frequency biological artifacts
(e.g., unspliced introns) may also result in termination
of the extension. In order to improve the accuracy of
the consensus sequence prediction, Extender can cre-
ate replicate seeds for a particular seed by sequentially
trimming one base at a time from both ends. Using
replicate seeds allows several independent sequences that
represent the same target consensus sequence to be gen-
erated simultaneously, and these replicates are entirely
independent because they begin with different keys. The
user can obtain the final estimate of the sequence cor-
responding to each original seed by taking the consen-
sus across replicates or by simply choosing the replicate
producing the longest sequence. We took the former
approach for all of our assembly efforts. Overall, Extender
is highly inefficient with its use of data and requires many
long, high-quality reads, but it is extremely computation-
ally efficient, having short run times and low memory
requirements.
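
Replicate seeds of the kind described above can be generated with a few lines of Python. This is a hedged sketch: the function names are hypothetical, the untrimmed seed is assumed to count as the first replicate, and only the simpler of the two final estimates (choosing the longest replicate) is shown, because a consensus across replicates would also require aligning them.

```python
def replicate_seeds(seed, n_replicates, k):
    """Return up to n_replicates copies of seed, each trimmed by one more
    base at both ends than the previous one, stopping before any replicate
    becomes shorter than k."""
    replicates = []
    for i in range(n_replicates):
        trimmed = seed[i:len(seed) - i]
        if len(trimmed) < k:
            break
        replicates.append(trimmed)
    return replicates


def longest_replicate(extended_replicates):
    """Choose the longest extended sequence among the replicates."""
    return max(extended_replicates, key=len)
```

Each replicate starts with different end k-mers, so each follows its own path through the reads when extended.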
We used Extender in two different ways: to complete
partial toxin transcripts and as a de novo assembler. For
the former, we used partial toxin transcripts from NGen
assemblies that were found to have fragments of cod-
ing sequence homologous with known toxins. The par-
tial transcripts were trimmed to just the partial coding
sequence and used as seeds. To use Extender as a de novo
assembler, we seeded it with 1,000 random reads. For
both applications, we used a k-mer size of 100, 20 replicates,
and 10 cycles through the complete set of merged reads,
excluding all reads containing any base with a quality score below
30.
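
The read filter described here (discard any read containing a base with quality below 30) can be sketched as follows. The function names are hypothetical, and the quality strings are assumed to use the Phred+33 ASCII encoding; the encoding actually used is not stated here, so the `offset` parameter may need to be changed.

```python
def passes_quality(quality_string, min_quality=30, offset=33):
    """True if every base has a Phred quality of at least min_quality.

    offset=33 assumes Phred+33 encoding (an assumption, not from the source).
    """
    return all(ord(ch) - offset >= min_quality for ch in quality_string)


def filter_reads(records, min_quality=30, offset=33):
    """Keep sequences whose quality string passes the filter.

    records: iterable of (sequence, quality_string) pairs.
    """
    return [seq for seq, qual in records
            if passes_quality(qual, min_quality, offset)]
```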

Abbreviations
BPP: Bradykinin potentiating and C-type natriuretic peptides; CTL: C-type
lectin; CREGF: Cysteine-rich with EGF-like domain; CRISP: Cysteine-rich
secretory protein; Gb: Gigabase; GC: Glutaminyl-peptide cyclotransferase; GO:
Gene ontology; HYAL: Hyaluronidase; KUN: Kunitz-type protease inhibitor;
LAAO: L-amino-acid oxidase; MYO: Myotoxin (crotamine); NGF: Nerve growth
factor; NF: Neurotrophic factor; nt: Nucleotide; NUC: Nucleotidase; PDE:
Phosphodiesterase; PDI: Protein disulfide isomerase; PLA2: Phospholipase A2;
SVMP: Snake venom metalloproteinase (types II and III); SVSP: Snake venom
serine proteinase; VEGF: Vascular endothelial growth factor; VESP: Vespryn
(ohanin-like); VF: Venom factor; WAP: Waprin.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
The project was conceived and planned by DR and AL. DR, MM, and KA
collected and analyzed the data. DR wrote the manuscript. All authors read
and approved the final manuscript.

Acknowledgements
The authors thank Kenneth P. Wray for dissecting the venom glands and Darryl
Heard for training DRR in the electrostimulation technique for venom
extraction. Computational resources were provided by the Florida State
University High-Performance Computing cluster, and the authors thank James
C. Wilgenbusch for assistance in the use of these resources. Funding for this
work was provided to DRR and ARL by Florida State University.

Author details
1 Department of Biological Science, Florida State University, Tallahassee, FL
32306-4295, USA. 2 Department of Scientific Computing, Florida State
University, Tallahassee, FL 32306-4120, USA.

Received: 14 March 2012 Accepted: 2 July 2012
Published: 16 July 2012

References
1. Chippaux JP: Snake-bites: appraisal of the global situation. Bull WHO
1998, 76:515-524.
2. O'Neil ME, Mack KA, Gilchrist J, Wozniak EJ: Snakebite injuries treated in
United States emergency departments, 2001-2004. Wilderness Environ
Med 2007, 18:281-287.
3. Langley RL: Deaths from reptile bites in the United States,
1979-2004. Clin Toxicol 2009, 47:44-47.
4. Theakston RDG, Warrell DA, Griffiths E: Report of a WHO workshop on
the standardization and control of antivenoms. Toxicon 2003,
41:541-557.
5. Smith J, Bush S: Envenomations by reptiles in the United States. In
Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca
Raton, Florida: CRC Press; 2010:475-490.
6. Neves-Ferreira AGC, Valente RH, Perales J, Domont GB: Natural
inhibitors: innate immunity to snake venoms. In Handbook of Venoms
and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC
Press; 2010:259-284.
7. Rucavado A, Lomonte B: Neutralization of myonecrosis, hemorrhage,
and edema induced by Bothrops asper snake venom by
homologous and heterologous pre-existing antibodies in mice.
Toxicon 1996, 34:567-577.
8. Huang KF, Chow LP, Chiou SH: Isolation and characterization of a
novel proteinase inhibitor from the snake serum of Taiwan habu
(Trimeresurus mucrosquamatus). Biochem Biophys Res Commun 1999,
263:610-616.
9. Valente RH, Dragulev B, Perales J, Fox JW, Domont GB: BJ46a, a snake
venom metalloproteinase inhibitor. Eur J Biochem 2001,
268:3042-3052.
10. Serrano SMT, Shannon JD, Wang D, Camargo ACM, Fox JW: A
multifaceted analysis of viperid snake venoms by two-dimensional
gel electrophoresis: An approach to understanding venom
proteomics. Proteomics 2005, 5:501-510.
11. Rokyta DR, Wray KP, Lemmon AR, Lemmon EM, Caudle SB: A
high-throughput venom-gland transcriptome for the eastern
diamondback rattlesnake (Crotalus adamanteus) and evidence for
pervasive positive selection across toxin classes. Toxicon 2011,
57:657-671.
12. Biardi JE, Chien DC, Coss RG: California ground squirrel (Spermophilus
beecheyi) defenses against rattlesnake venom digestive and
hemostatic toxins. J Chem Ecol 2005, 31:2501-2518.
13. Biardi JE, Nguyen KT, Lander S, Whitley M, Nambiar KP: A rapid and
sensitive fluorometric method for the quantitative analysis of snake
venom metalloproteases and their inhibitors. Toxicon 2011,
57:342-347.
14. Jansa SA, Voss RS: Adaptive evolution of the venom-targeted vWF
protein in opossums that eat pitvipers. PLoS One 2011, 6:e20997.
15. Harvey AL, Bradley KN, Cochran SA, Rowan EG, Pratt JA, Quillfeldt JA,
Jerusalinsky DA: What can toxins tell us for drug discovery? Toxicon
1998, 36:1635-1640.
16. Ménez A: Functional architectures of animal toxins: a clue to drug
design? Toxicon 1998, 36:1557-1572.
17. Escoubas P, King GF: Venomics as a drug discovery platform. Expert
Rev Proteomics 2009, 6:221-224.
18. Bohlen CJ, Chesler AT, Sharif Naeini R, Medzihradszky KF, Zhou S, King D,
Sanchez EE, Burlingame AL, Basbaum AI, Julius D: A heteromeric Texas
coral snake toxin targets acid-sensing ion channels to produce pain.
Nature 2011, 479:410-414.
19. Hartl FU, Bracher A, Hayer-Hartl M: Molecular chaperones in protein
folding and proteostasis. Nature 2011, 475:324-332.
20. Klauber LM: Rattlesnakes: Their Habits, Life Histories, and Influence on
Mankind. second edition. Berkeley, California: University of California
Press; 1997.
21. Gold BS, Dart RC, Barish RA: Bites of venomous snakes. N Engl J Med
2002, 347:347-356.
22. Conant R, Collins JT: A Field Guide to Reptiles and Amphibians of Eastern and
Central North America. third edition. New York, New York: Houghton
Mifflin Harcourt; 1998.
23. Palmer WM, Braswell AL: Reptiles of North Carolina. Chapel Hill, North
Carolina: University of North Carolina Press; 1995.
24. Dundee HA, Rossman DA: The Amphibians and Reptiles of Louisiana. Baton
Rouge, Louisiana: Louisiana University Press; 1996.
25. Pahari S, Mackessy SP, Kini RM: The venom gland transcriptome of the
desert massasauga rattlesnake (Sistrurus catenatus edwardsii):
towards an understanding of venom composition among advanced
snakes (superfamily Colubroidea). BMC Mol Biol 2007, 8:115.
26. Casewell NR, Harrison RA, Wuster W, Wagstaff SC: Comparative venom
gland transcriptome surveys of the saw-scaled vipers (Viperidae:
Echis) reveal substantial intra-family gene diversity and novel
venom transcripts. BMC Genomics 2009, 10:564.
27. Leao LI, Ho PL, de L M Junqueira-de Azevedo I: Transcriptomic basis for
an antiserum against Micrurus corallinus (coral snake) venom. BMC
Genomics 2009, 10:112.
28. Jiang Y, Li Y, Lee W, Xu X, Zhang Y, Zhao R, Zhang Y, Wang W: Venom
gland transcriptomes of two elapid snakes (Bungarus multicinctus
and Naja atra) and evolution of toxin genes. BMC Genomics 2011, 12:1.
29. Morgenstern D, Rohde BH, King GF, Tal T, Sher D, Zlotkin E: The tale of a
resting venom gland: transcriptome of a replete venom gland from
the scorpion Hottentotta judaicus. Toxicon 2011, 57:695-703.
30. Whittington CM, Papenfuss AT, Locke DP, Mardis ER, Wilson RK,
Abubucker S, Mitreva M, Wong ESW, Hsu AL, Kuchel PW, Belov K, Warren
WC: Novel venom gene discovery in the platypus. Genome Biol 2010,
11:R95.
31. Gremski LH, Silveira RBD, Chaim OM, Probst CM, Ferrer VP, Nowatzki J,
Weinschutz HC, Madeira HM, Gremski W, Nader HB, Senff-Ribeiro A, Veiga
SS: A novel expression profile of the Loxosceles intermedia spider
venomous gland revealed by transcriptome analysis. Mol BioSyst
2010, 6:2403-2416.
32. Ruiming Z, Yibao M, Yawen H, Zhiyong D, Yingliang W, Zhijian C, Wenxin
L: Comparative venom gland transcriptome analysis of the scorpion
Lychas mucronatus reveals intraspecific toxic gene diversity and
new venomous components. BMC Genomics 2010, 11:452.
33. Hu H, Bandyopadhyay PK, Olivera BM, Yandell M: Characterization of
the Conus bullatus genome and its venom-duct transcriptome. BMC
Genomics 2011, 12:60.
34. Durban J, Juarez P, Angulo Y, Lomonte B, Flores-Diaz M, Alape-Girón A,
Sasa M, Sanz L, Gutierrez JM, Dopazo J, Conesa A, Calvete JJ: Profiling the
venom gland transcriptomes of Costa Rican snakes by 454
pyrosequencing. BMC Genomics 2011, 12:259.
35. Gilles A, Meglécz E, Pech M, Ferreira S, Malausa T, Martin JF: Accuracy and
quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC
Genomics 2011, 12:245.
36. Rodrigue S, Materna AC, Timberlake SC, Blackburn MC, Malmstrom RR,
Alm EJ, Chisholm SW: Unlocking short read sequencing for
metagenomics. PLoS One 2010, 5:e11840.
37. Birol I, Jackman SD, Nielsen CB, Qian JQ, Varhol R, Stazyk G, Morin RD,
Zhao Y, Hirst M, Schein JE, Horsman DE, Connors JM, Gascoyne RD, Marra
MA, Jones SJM: De novo transcriptome assembly with ABySS.
Bioinformatics 2009, 25:2872-2877.
38. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I: ABySS: a
parallel assembler for short read sequence data. Genome Res 2009,
19:1117-1123.
39. Zerbino DR, Birney E: Velvet: algorithms for de novo short read
assembly using de Bruijn graphs. Genome Res 2008, 18:821-829.
40. Schulz MH, Zerbino DR, Vingron M, Birney E: Oases: robust de novo
RNA-seq assembly across the dynamic range of expression levels.
Bioinformatics 2012, 28:1086-1092.
41. Feldmeyer B, Wheat CW, Krezdorn N, Rotter B, Pfenninger M: Short read
Illumina data for the de novo assembly of a non-model snail species
transcriptome (Radix balthica, Basommatophora, Pulmonata), and a
comparison of assembler performance. BMC Genomics 2011, 12:317.
42. Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in
ultra-short read data sets from high-throughput DNA sequencing.
Nucl Acids Res 2008, 36:e105.
43. Gibbs HL, Sanz L, Calvete JJ: Snake population venomics:
proteomics-based analyses of individual variation reveals
significant gene regulation effects on venom protein expression in
Sistrurus rattlesnakes. J Mol Evol 2009, 68:113-125.
44. Fox JW, Serrano SMT: Structural considerations of the snake venom
metalloproteinases, key members of the M12 reprolysin family of
metalloproteinases. Toxicon 2005, 45:969-985.
45. Fox JW, Serrano SMT: Snake venom metalloproteinases. In Handbook
of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton,
Florida: CRC Press; 2010:95-113.
46. Mackessy SP: Venom composition in rattlesnakes: trends and
biological significance. In The Biology of Rattlesnakes. Edited by Hayes
WK, Beaman KR, Cardwell MD, Bush SP. Loma Linda, California: Loma
Linda University Press; 2008:495-510.
47. Du XY, Clemetson KJ: Reptile C-type lectins. In Handbook of Venoms and
Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press;
2010:359-375.
48. Walker JR, Nagar B, Young NM, Hirama T, Rini JM: X-ray crystal structure
of a galactose-specific C-type lectin possessing a novel decameric
quaternary structure. Biochemistry 2004, 43:3783-3792.
49. Serrano SMT, Maroun RC: Snake venom serine proteinases: sequence
homology vs. substrate specificity, a paradox to be solved. Toxicon
2005, 45:1115-1132.
50. Phillips DJ, Swenson SD, Francis S, Markland J: Thrombin-like snake
venom serine proteinases. In Handbook of Venoms and Toxins of Reptiles.
Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:
139-154.
51. Lynch VJ: Inventing an arsenal: adaptive evolution and
neofunctionalization of snake venom phospholipase A2 genes. BMC
Evol Biol 2007, 7:2.
52. Doley R, Zhou X, Kini RM: Snake venom phospholipase A2 enzymes. In
Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca
Raton, Florida: CRC Press; 2010:173-205.
53. Radis-Baptista G, Oguiura N, Hayashi MAF, Camargo ME, Grego KF,
Oliveira EB, Yamane T: Nucleotide sequence of crotamine isoform
precursors from a single South American rattlesnake (Crotalus
durissus terrificus). Toxicon 1999, 37:973-984.
54. Oguiura N, Boni-Mitake M, Radis-Baptista G: New view on crotamine, a
small basic polypeptide myotoxin from South American rattlesnake
venom. Toxicon 2005, 46:363-370.
55. Straight RC, Glenn JL, Wolt TB, Wolfe MC: Regional differences in
content of small basic peptide toxins in the venoms of Crotalus
adamanteus and Crotalus horridus. Comp Biochem Physiol B 1991,
100:51-58.
56. Tan NH, Fung SY: Snake venom L-amino acid oxidases. In Handbook of
Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida:
CRC Press; 2010:221-235.
57. Heyborne WH, Mackessy SP: Cysteine-rich secretory proteins in reptile
venoms. In Handbook of Venoms and Toxins of Reptiles. Edited by
Mackessy SP. Boca Raton, Florida: CRC Press; 2010:325-336.
58. Yamazaki Y, Hyodo F, Morita T: Wide distribution of cysteine-rich
secretory proteins in snake venoms: isolation and cloning of novel
snake venom cysteine-rich secretory proteins. Arch Biochem Biophys
2003, 412:133-141.
59. Yamazaki Y, Morita T: Structure and function of snake venom
cysteine-rich secretory proteins. Toxicon 2004, 44:227-231.
60. Pung YF, Wong PTH, Kumar PP, Hodgson WC, Kini RM: Ohanin, a novel
protein from king cobra venom, induces hypolocomotion and
hyperalgesia in mice. J Biol Chem 2005, 280:
13137-13147.
61. Pung YF, Kumar SV, Rajagopalan N, Fry BG, Kumar PP, Kini RM: Ohanin, a
novel protein from king cobra venom: its cDNA and genomic
organization. Gene 2006, 371:246-256.
62. Junqueira-de-Azevedo ILM, Ching ATC, Carvalho E, Faria F, Nishiyama Jr
MY, Ho PL, Diniz MRV: Lachesis muta (Viperidae) cDNAs reveal
diverging pit viper molecules and scaffolds typical of cobra
(Elapidae) venoms: implications for snake toxin repertoire
evolution. Genetics 2006, 173:877-889.
63. Aird SD: Ophidian envenomation strategies and the role of purines.
Toxicon 2002, 40:335-393.
64. Aird SD: The role of purine and pyrimidine nucleosides in snake
venoms. In Handbook of Venoms and Toxins of Reptiles. Edited by
Mackessy SP. Boca Raton, Florida: CRC Press; 2010:393-419.
65. Dhananjaya BL, Vishwanath BS, D'Souza CJM: Snake venom nucleases,
nucleotidases, and phosphomonoesterases. In Handbook of Venoms
and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC
Press; 2010:155-171.
66. Shafqat J, Zaidi ZH, Jornvall H: Purification and characterization of a
chymotrypsin Kunitz inhibitor type of polypeptide from the venom
of cobra (Naja naja naja). FEBS Lett 1990, 275:6-8.
67. Harrison RA, Ibison F, Wilbraham D, Wagstaff SC: Identification of cDNAs
encoding viper venom hyaluronidases: cross-generic sequence
conservation of full-length and unusually short variant transcripts.
Gene 2007, 392:22-33.
68. Kemparaju K, Girish KS, Nagaraju S: Hyaluronidases, a neglected class
of glycosidases from snake venom: beyond a spreading factor. In
Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca
Raton, Florida: CRC Press; 2010:237-258.
69. Pawlak J, Kini RM: Snake venom glutaminyl cyclase. Toxicon 2006,
48:278-286.
70. Fry BG, Scheib H, van der Weerd L, Young B, McNaughtan J, Ramjan SFR,
Vidal N, Poelmann RE, Norman JA: Evolution of an arsenal. Mol Cell
Proteomics 2008, 7:215-246.
71. Rehana S, Kini RM: Molecular isoforms of cobra venom factor-like
proteins in the venom of Austrelaps superbus. Toxicon 2007, 50:32-52.
72. Eggertsen G, Lind P, Sjoquist J: Molecular characterization of the
complement activating protein in the venom of the Indian cobra
(Naja n. siamensis). Mol Immunol 1981, 18:125-133.
73. Conesa A, Götz S, García-Gómez JM, Terol J, Talon M, Robles M: Blast2GO:
a universal tool for annotation, visualization and analysis in
functional genomics research. Bioinformatics 2005, 21:3674-3676.
74. McCue MD: Cost of producing venom in three North American
pitviper species. Copeia 2006, 2006:818-825.
75. Mackessy SP, Baxter LM: Bioweapons synthesis and storage: the
venom gland of front-fanged snakes. Zool Anz 2006, 245:147-159.
76. Wang Q, Zhang Z, Blackwell K, Carmichael GG: Vigilins bind to
promiscuously A-to-I-edited RNAs and are involved in the formation
of heterochromatin. Curr Biol 2005, 15:384-391.
77. Nishikura K: Functions and regulation of RNA editing by ADAR
deaminases. Annu Rev Biochem 2010, 79:321-349.
78. Hartl FU: Molecular chaperones in cellular protein folding. Nature
1996, 381:571-580.
79. Fink AL: Chaperone-mediated protein folding. Physiol Rev 1999,
79:425-449.
80. Young JC, Agashe VR, Siegers K, Hartl FU: Pathways of
chaperone-mediated protein folding in the cytosol. Nat Rev Mol Cell
Biol 2004, 5:781-791.
81. Finley D: Recognition and processing of ubiquitin-protein
conjugates by the proteasome. Annu Rev Biochem 2009,
78:477-513.
82. Buchberger A, Bukau B, Sommer T: Protein quality control in the
cytosol and the endoplasmic reticulum: brothers in arms. Mol Cell
2010, 40:238-252.
83. Bagola K, Mehnert M, Jarosch E, Sommer T: Protein dislocation from the
ER. Biochim Biophys Acta 2011, 1808:925-936.
84. Huang KF, Chiou SH, Ko TP, Wang AHJ: Determinants of the inhibition
of a Taiwan habu venom metalloproteinase by its endogenous
inhibitors by X-ray crystallography and synthetic inhibitor
analogues. Eur J Biochem 2002, 269:3047-3056.
85. Richards R, St Pierre L, Trabi M, Johnson LA, de Jersey J, Masci PP, Lavin
MF: Cloning and characterization of novel cystatins from elapid
snake venom glands. Biochimie 2011, 93:659-668.
86. McCleary RJR, Heard DJ: Venom extraction from anesthetized Florida
cottonmouths, Agkistrodon piscivorus conanti, using a portable
nerve stimulator. Toxicon 2010, 55:250-255.
87. Rotenberg D, Bamberger ES, Kochva E: Studies on ribonucleic acid
synthesis in the venom glands of Vipera palaestinae (Ophidia,
Reptilia). Biochem J 1971, 121:609-612.
88. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and
quantifying mammalian transcriptomes by RNA-Seq. Nat Methods
2008, 5:621-628.
89. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown
CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ,
Cheetham RK, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ,
Irving UJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray UJ,
Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IMJ, Reed MT,
Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP,
Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH,
Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DMD,
Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF,
Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown
AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E,
Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos
KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW,
Etchin SS, Ewan MR, Fedurco M, Fraser UJ, Fuentes Fajardo KV, Furey WS,
George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE,
Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims
MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ,
James T, Jones TAH, Kang GD, Kerelska TH, Kersey AD, Khrebtukova I,
Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA,
Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM,
Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW,
Newington T, Ning Z, Ng BL, Novo SM, O'Neill MJ, Osborne MA, Osnowski
A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Pinkard DC,
Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR,
Rodriguez AC, Roe PM, Rogers J, Bacigalupo MCR, Romanov N, Romieu A,
Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker
MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA,
Sohna JES, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL,
Turcatti G, vandeVondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC,
Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles
ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R,
Smith AJ: Accurate whole human genome sequencing using
reversible terminator chemistry. Nature 2008, 456:53-59.
90. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K,
Lee S, Okada HM, Qian JQ, Griffith M, Raymond A, Thiessen N, Cezard T,
Butterfield YS, Newsome R, Chan SK, She R, Varhol R, Kamoh B, Prabhu AL,
Tam A, Zhao Y, Moore RA, Hirst M, Marra MA, Jones SJM, Hoodless PA,
Birol I: De novo assembly and analysis of RNA-seq data. Nat Methods
2010, 7:909-912.

doi:10.1186/1471-2164-13-312
Cite this article as: Rokyta et al.: The venom-gland transcriptome of the
eastern diamondback rattlesnake (Crotalus adamanteus). BMC Genomics
2012, 13:312.






Full Text

PAGE 1

Rokyta etal.BMCGenomics 2012, 13 :312 http://www.biomedcentral.com/1471-2164/13/312 RESEARCHARTICLE OpenAccessThevenom-glandtranscriptomeoftheeastern diamondbackrattlesnake( Crotalusadamanteus )DarinRRokyta1*,AlanRLemmon2,MarkJMargres1andKaralynAronow1 AbstractBackground: Snakevenomshavesigni“cantimpactsonhumanpopulationsthroughthemorbidityandmortality associatedwithsnakebitesandassourcesofdrugs,drugleads,andphysiologicalresearchtools.Genesexpressedby venom-glandtissue,includingthoseencodingtoxicproteins,havethereforebeensequencedbutonlywithrelatively sparsecoverageresultingfromthelow-throughputsequencingapproachesavailable.High-throughputapproaches basedon454pyrosequencinghaverecentlybeenappliedtothestudyofsnakevenomstogivethemostcomplete characterizationstodateofthegenesexpressedinactivevenomglands,butsuchapproachesarecostlyandstill provideafar-from-completecharacterizationofthegenesexpressedduringvenomproduction. Results: Wedescribethe denovo assemblyandanalysisofthevenom-glandtranscriptomeofaneastern diamondbackrattlesnake( Crotalusadamanteus )basedon95,643,958pairsofquality-“ltered,100-base-pairIllumina reads.Weidenti“ed123unique,full-lengthtoxin-codingsequences,whichclusterinto78groupswithlessthan1% nucleotidedivergence,and2,879unique,full-lengthnontoxincodingsequences.Thetoxinsequencesaccountedfor 35.4%ofthetotalreads,andthenontoxinsequencesforanadditional27.5%.Themosthighlyexpressedtoxinwasa smallmyotoxinrelatedtocrotamine,whichaccountedfor5.9%ofthetotalreads.Snake-venommetalloproteinases accountedforthehighestpercentageofreadsmappingtoatoxinclass(24.4%),followedbyC-typelectins(22.2%) andserineproteinases(20.0%).ThemostdiversetoxinclassesweretheC-typelectins(21clusters),thesnake-venom metalloproteinases(16clusters),andtheserineproteinases(14clusters).Thehigh-abundancenontoxintranscripts werepredominantlythoseinvolvedinproteinfoldingandtranslation,consistentwiththeprotein-secretoryfunction ofthetissue. 
Conclusions: Wehaveprovidedthemostcompletecharacterizationofthegenesexpressedinanactivesnake venomglandtodate,producinginsightsintosnakebitepathologyandguidanceforsnakebitetreatmentforthe largestrattlesnakespeciesandarguablythemostdangeroussnakenativetotheUnitedStatesofAmerica, C. adamanteus .Wehavemorethandoubledthenumberofsequencedtoxinsforthisspeciesandcreatedextensive genomicresourcesforsnakesbasedentirelyon denovo assemblyofIlluminasequencedata.BackgroundHumanenvenomationbysnakesisaworldwideissue thatclaimsmorethan100,000livesperyearandexacts untoldcostsintheformofpain,dis“gurement,andloss oflimbsorlimbfunction[1-3].Despitethesigni“cance ofsnakebites,theirtreatmentshaveremainedlargely unchangedfordecades.Theonlytreatmentscurrently *Correspondence:drokyta@bio.fsu.edu 1 DepartmentofBiologicalScience,FloridaStateUniversity,Tallahassee,FL 32306-4295,USA Fulllistofauthorinformationisavailableattheendofthearticleavailablearetraditionalantivenomsderivedfromantisera ofanimals,usuallyhorses[4],innoculatedwithwhole venoms[5,6];suchanapproachistheonlyreadilyavailableoptionforlargelyuncharacterized,complexmixtures ofproteinssuchassnakevenoms.Althoughoftenlifesavingandgenerallyeectiveagainstsystemiceects,these antivenomshavelittleornoeectonlocalhemorrhage ornecrosis[7-9],whicharemajoraspectsofthepathologyofviperidbitesandcanresultinlifelongdisability [4,5].Thesetraditionaltreatmentsalsosometimesleadto adversereactionsinpatients[6].Advancesintreatment approacheswilldependonacompleteknowledgeofthe 2012Rokytaetal.;licenseeBioMedCentralLtd.ThisisanOpenAccessarticledistributedunderthetermsoftheCreative CommonsAttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse,distribution,and reproductioninanymedium,providedtheoriginalworkisproperlycited.

PAGE 2

Rokyta etal.BMCGenomics 2012, 13 :312 Page2of23 http://www.biomedcentral.com/1471-2164/13/312natureoftheoendingtoxins,butcurrentestimatesof thenumbersofuniquetoxinspresentinsnakevenoms areinexcessof100[10],anumbernotapproachedin eventhemostextensivevenom-characterizationeorts todate[11]. Thesigni“canceofsnakevenomsextendswellbeyond theselectivepressurestheymaydirectlyimposeupon humanpopulations.Snakevenomshaveevolutionary consequencesforthosespeciesthatsnakespreyupon [12,13],aswellasspeciesthatpreyuponthesnakes [14],andtheirstudycanthereforeprovideinsightsinto predator-preycoevolution.Snakevenomcomponents havebeenleveragedasdrugsanddrugleads[15-17]and havebeenuseddirectlyastoolsforstudyingphysiologicalprocessessuchaspainreception[18].Inadditionto thesigni“canceofthetoxins,thenatureoftheextreme specializationofsnakevenomglandsfortherapidbut temporaryproductionandexportoflargequantitiesof proteincouldprovideinsightsintobasicmechanismsof proteostasis,thebreakdownofwhichisthoughttocontributetoneurodegenerativediseasessuchasParkinsons andAlzheimers[19]. Theeasterndiamondbackrattlesnake( Crotalus adamanteus )isapitvipernativetothesoutheastern UnitedStatesandisthelargestmemberofthegenus Crotalus ,reachinglengthsofupto2.44m[20].Thediet of C.adamanteus consistsprimarilyofsmallmammals (e.g.,squirrels,rabbits,andmouseandratspecies)and birds,particularlyground-nestingspeciessuchasquail [20].Becauseofitsextremesizeandconsequentlarge venomyield, C.adamanteus isarguablythemostdangeroussnakespeciesintheUnitedStatesandisoneof themajorsourcesofsnakebitemortalitythroughoutits range[21]. 
Crotalusadamanteus hasrecentlybecomeof interestfromaconservationstandpointbecauseofits decliningrange,whichatonetimeincludedsevenstates alongthesoutheasternCoastalPlain[22].Thisspecies hasnowapparentlybeenextirpatedfromLouisianaand islistedasendangeredinNorthCarolina[23,24].Asa consequenceofrecentworkbyRokytaetal.[11]based on454pyrosequencing,thevenomof C.adamanteus is amongthebest-characterizedsnakevenoms;40toxins havebeenidenti“ed. Transcriptomiccharacterizationsofvenomglands ofsnakes[25-28]andotheranimals[29-32]have reliedalmostexclusivelyonlow-throughputsequencing approaches.Sangersequencing,withitsrelativelylong, high-qualityreads,hasbeentheonlymethodavailable untilrecentlyandhasprovidedinvaluabledataonthe identitiesofvenomgenes.Becausevenomousspecies areprimarilynonmodelorganisms,high-throughput sequencingapproacheshavebeenslowtopervade the“eldofvenomics(butseeHuetal.[33]),despite becomingcommonplaceinothertranscriptomic-based “elds.Rokytaetal.[11]recentlyused454pyrosequencingtocharacterizevenomgenesfor C.adamanteus .More recently,Durbanetal.[34]used454sequencingtostudy thevenom-glandtranscriptomesofamixofRNAfrom eightspeciesofCostaRicansnakes.Whittingtonetal. [30]usedahybridapproachwithboth454andIllumina sequencingtocharacterizetheplatypusvenom-gland transcriptome,althoughtheyhadareferencegenome sequence,making denovo assemblyunnecessary.Pyrosequencingisexpensiveandlow-throughputrelativeto Illuminasequencing,andthehigherrorrate,particularlyforhomopolymererrors[35],signi“cantlyincreases thedicultyofidentifyingcodingsequenceswithout referencesequences. 
We sequenced the venom-gland transcriptome of the eastern diamondback rattlesnake with Illumina technology using a paired-end approach coupled with short insert sizes to effectively produce longer, high-quality reads on the order of approximately 150 nt to facilitate de novo assembly (an approach similar to that of Rodrigue et al. [36] for metagenomics). The difference in read length from that of 454 sequencing was compensated for by the increase of more than two orders of magnitude in the number of reads. We demonstrated de novo assembly and analysis of a venom-gland transcriptome using only Illumina sequences and provided a comprehensive characterization of both the toxin and nontoxin genes expressed in an actively producing snake venom gland.

Results and discussion

Venom-gland transcriptome sequencing and assembly

We generated a total of 95,643,958 pairs of reads that passed the Illumina quality filter for > 19 gigabases (Gb) of sequence from a cDNA library with an average insert size of approximately 170 nt. Of these reads, 72,114,709 (75%) were merged (see Methods) on the basis of their 3' overlap (Figure 1), yielding composite reads of average length 142 nt with average phred qualities > 40 and a total length > 10 Gb. This merging of reads reduced the effective size of the data set without loss of information and provided long reads to facilitate accurate assembly.

Our first approach to transcriptome assembly was aimed at identifying toxin genes. We attempted to use as many of the data as possible to ensure the identification of even the lowest-abundance toxins. To this end, we conducted extensive searches of assembly parameter space for both ABySS [37,38] (Table 1) and Velvet [39] on the basis of the full set of both merged and unmerged reads. We used the assemblies with the best N50 values for further analysis. For Velvet, the assembly using a k-mer size of 91 was best (N50 = 408); this assembly was subsequently analyzed with Oases [40]. For ABySS, the best k-mer value
was also 91 (N50 = 2,007), but because the performance in terms of full-length transcripts appeared to depend strongly on the coverage (c) and erode (e) parameters, we further analyzed the k = 91 assemblies with c = 10 and e = 2, c = 100 and e = 100, and c = 1000 and e = 1000. We identified all full-length toxins by means of blastx searches on the results of all four assemblies.

As part of our first approach, we also performed four independent de novo transcriptome assemblies with NGen: three with 20 million merged reads each and one with the remaining 12,114,709 merged reads (Table 2). We identified all full-length toxins from all four assemblies.

Table 1 ABySS assembly summaries

k   c     e     Total contigs  Longest contig  Contigs > 200 nt  Contigs > N50  Median  Mean   N50    Total length
51  2     2     2,168,050      15,079          147,942           27,410         364     592    790    8.77 × 10^7
51  10    2     329,609        17,493          37,575            6,399          571     985    1,635  3.70 × 10^7
51  10    10    337,039        17,459          37,944            6,390          554     967    1,621  3.67 × 10^7
51  20    20    191,367        15,906          27,546            4,931          529     878    1,401  2.42 × 10^7
51  30    30    135,961        13,472          21,986            4,034          494     812    1,256  1.79 × 10^7
51  50    50    87,092         10,380          15,461            2,955          463     737    1,088  1.14 × 10^7
51  100   10    42,366         8,553           8,510             1,725          432     658    906    5.60 × 10^6
51  100   100   42,251         8,552           8,401             1,707          431     656    899    5.52 × 10^6
51  1000  1000  2,319          5,232           571               123            428     631    797    3.61 × 10^5
61  2     2     1,769,274      17,166          141,471           25,105         361     604    827    8.55 × 10^7
61  10    2     263,688        17,493          34,076            5,959          618     1,032  1,691  3.52 × 10^7
61  10    10    272,814        17,459          35,002            6,036          586     998    1,651  3.50 × 10^7
61  20    20    154,459        15,906          25,114            4,575          545     891    1,408  2.24 × 10^7
61  30    30    109,994        10,070          19,994            3,721          496     808    1,232  1.62 × 10^7
61  50    50    70,029         10,349          13,916            2,675          455     725    1,073  1.01 × 10^7
61  100   10    32,476         8,300           7,318             1,479          426     655    894    4.79 × 10^6
61  100   100   32,392         7,822           7,231             1,463          423     652    893    4.72 × 10^6
61  1000  1000  1,709          5,209           531               114            424     614    798    3.27 × 10^5
71  2     2     1,431,412      15,641          131,742           22,422         360     617    870    8.13 × 10^7
71  10    2     208,036        17,484          29,793            5,393          683     1,101  1,785  3.28 × 10^7
71  10    10    219,099        17,432          31,400            5,567          629     1,041  1,705  3.27 × 10^7
71  20    20    122,216        14,372          21,816            4,052          581     928    1,460  2.03 × 10^7
71  30    30    86,599         10,416          17,138            3,249          524     835    1,272  1.43 × 10^7
71  50    50    54,694         10,341          11,925            2,313          464     729    1,075  8.70 × 10^6
71  100   10    23,980         7,817           6,183             1,253          424     650    892    4.02 × 10^6
71  100   100   23,997         7,810           6,119             1,239          419     644    889    3.95 × 10^6
71  1000  1000  1,199          5,202           443               87             444     660    885    2.93 × 10^5
81  2     2     1,142,924      15,303          120,593           19,697         354     625    911    7.54 × 10^7
81  10    2     158,781        17,713          24,939            4,721          788     1,202  1,898  3.00 × 10^7
81  10    10    174,032        17,688          27,366            5,005          691     1,096  1,774  3.00 × 10^7
81  20    20    92,627         15,784          17,715            3,396          654     1,011  1,573  1.79 × 10^7
81  30    30    64,866         9,868           13,697            2,666          592     904    1,366  1.24 × 10^7
81  50    50    39,613         10,328          9,358             1,874          513     778    1,130  7.28 × 10^6
81  100   10    16,303         10,149          4,777             970            454     687    963    3.28 × 10^6
81  100   100   16,493         10,155          4,817             981            438     671    943    3.23 × 10^6
81  1000  1000  889            5,198           381               82             454     649    790    2.47 × 10^5
91  2     2     932,237        15,694          108,954           17,394         344     622    936    6.79 × 10^7
91  10    2     124,647        17,713          20,420            4,025          880     1,293  2,007  2.64 × 10^7
91  10    10    142,306        17,687          23,525            4,428          727     1,126  1,804  2.65 × 10^7
91  20    20    72,117         15,792          13,614            2,702          752     1,108  1,712  1.51 × 10^7
91  30    30    48,540         15,792          9,949             2,009          700     1,023  1,529  1.02 × 10^7
91  50    50    27,581         10,199          6,477             1,336          624     901    1,309  5.84 × 10^6
91  100   10    10,503         10,105          3,081             658            564     816    1,155  2.52 × 10^6
91  100   100   10,770         10,149          3,308             705            528     769    1,078  2.54 × 10^6
91  1000  1000  598            3,008           342               76             438     621    754    2.12 × 10^5

Figure 1 Merging overlapping reads. (A) Reads are slid along each other until the number of matches exceeds the significance threshold. In the example shown, the optimal overlap is 74 nucleotides (nt). (B) The quality of reads declines dramatically toward their 3' ends, where overlap occurs if the fragment length is less than twice the read length, allowing the actual quality to be much higher than the nominal values. The example shown is the average of pairs that overlap by exactly 50 nt.
[Figure 1 image not reproduced. Panel A plots the number of matches observed against the proposed fragment length, with the maximum possible matches and the significance threshold marked; panel B plots the average quality score against position in the fragment for read 1, read 2, and the merged read.]

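The read-merging step summarized in Figure 1 can be illustrated with a minimal sketch. The procedure actually used, including its significance threshold, is described in the Methods and is not reproduced here; the version below assumes a simple fixed identity cutoff instead, and the names `merge_pair`, `min_overlap`, and `min_identity` are hypothetical choices made only for illustration. The sketch slides the reverse complement of read 2 along the 3' end of read 1 and accepts the longest overlap whose identity reaches the cutoff.

```python
# Illustrative sketch only: a fixed identity cutoff stands in for the
# statistical significance threshold of the published method (an assumption).

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, min_identity=0.9):
    """Merge a read pair whose 3' ends overlap; return None if none passes.

    read2 is given as sequenced (opposite strand), so it is reverse-
    complemented first. Overlaps are tried from longest to shortest.
    Fragments shorter than one read (read 2 running past the start of
    read 1) are not handled in this sketch.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc2 = reverse_complement(read2)
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], rc2[:overlap]
        matches = sum(a == b for a, b in zip(tail, head))
        if matches / overlap >= min_identity:
            # Keep read 1 through the overlap, then append the rest of read 2.
            return read1 + rc2[overlap:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGTATCGATCGGATTACAGGCT"  # made-up 40-nt insert
    r1 = fragment[:25]
    r2 = reverse_complement(fragment[-25:])
    print(merge_pair(r1, r2, min_overlap=8) == fragment)
```

In this sketch the bases of read 1 are kept throughout the overlap. A fuller implementation would presumably combine the base qualities of both reads there, which is what would allow the merged read to exceed the nominal quality of either read, as panel B of Figure 1 indicates.
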
Given that all three assembly methods tended to generate a large number of fragmented toxin sequences, apparently because of retained introns and possibly alternative splicing, we developed and implemented a simple hash-table approach to completing partial transcripts, which we will refer to as Extender (see Methods). We used Extender on partial toxin sequences identified for two of the four NGen assemblies. We also annotated the most abundant full-length nontoxin transcripts for the three assemblies based on 20 million reads. After combining all of the annotated toxin and nontoxin sequences from the ABySS, Velvet, and NGen assemblies and eliminating duplicates, we had 72 unique toxin sequences and 234 unique nontoxin sequences. The paucity of full-length annotated nontoxins reflects our focus on toxin sequences rather than their absence in the assemblies.

Table 2 NGen assembly summaries

Assembly  No. reads   No. contigs  No. contigs > 2k  Assembled sequences  Unique full-length toxins  Unique Extender toxins
NGen1     20,000,000  12,694       4,403             9,786,054            34                         54
NGen2     20,000,000  12,746       4,439             9,821,212            36                         54
NGen3     20,000,000  12,698       4,412             9,820,553            38                         –
NGen4     12,114,709  8,484        3,078             5,948,003            34                         –
Total unique full-length toxins = 154

Our second approach to transcriptome assembly was designed to annotate as many full-length coding sequences (toxin and nontoxin) as possible and to build a reference database of sequences to facilitate the future
analysis of other snake venom-gland transcriptomes. We found that NGen was much more successful at producing transcripts with full-length coding sequences but also that it was quite inefficient when the coverage distribution was extremely uneven (see Figure 2). Feldmeyer et al. [41] also found NGen to have the best assembly performance with Illumina data. We sought therefore first to eliminate the transcripts and corresponding reads for the extremely high-abundance sequences. To do so, we employed Extender as a de novo assembler by starting from 1,000 individual high-quality reads and attempting to complete their transcripts (see Methods). From 1,000 seeds, we identified 318 full-length coding sequences with 213 toxins and 105 nontoxins. After duplicates were eliminated, this procedure resulted in 58 unique toxin and 44 unique nontoxin full-length transcripts. These sequences were used to filter the corresponding reads from the full set of merged reads with NGen. We then performed a de novo transcriptome assembly on 10 million of the filtered reads with NGen, annotated full-length transcripts from contigs comprising ≥ 200 reads with significant blastx hits, and used the resulting unique sequences as a new filter. This process of assembly, annotation, and filtering was iterated two more times. The end result was 91 unique toxin and 2,851 unique nontoxin sequences.

The results from both assembly approaches were merged to yield the final data set. The first approach produced 72 unique toxin and 234 unique nontoxin sequences, and the second 91 toxin and 2,851 nontoxin sequences. The merged data set consisted of 123 unique toxin sequences and 2,879 nontoxins that together accounted for 62.9% of the sequencing reads (Figure 3).

Figure 2 Domination of the C. adamanteus venom-gland transcriptome by toxin transcripts. The 123 unique toxin sequences were clustered into 78 groups with less than 1% nucleotide divergence for estimation of abundances. (A) The vast majority of the extremely highly expressed genes were toxins. The inset shows a magnification of the top 200 transcripts. (B) Expression levels of individual toxin clusters are shown with toxin classes coded by color. The toxin clusters are in the same order as in Table 3.
[Figure 2 image not reproduced. Panel A plots % total reads against full-length transcript rank for nontoxin (2,879) and toxin (78) transcripts, with an inset for the top 200 (137 nontoxin, 63 toxin); panel B plots % total reads for each toxin cluster, colored by class (BPP, CRISP, CTL, LAAO, MYO, Others, PLA2, SVMPII, SVMPIII, SVSP).]

Figure 3 Expression levels of major classes of toxins and nontoxins. More than 60% of the total reads have been accounted for with full-length annotated transcripts. (A) The major toxin classes were the CTLs, SVSPs, MYO, and SVMPs (types II and III). (B) As expected for a protein-secreting tissue, the venom gland expresses an abundance of proteins involved in proteostasis.
[Figure 3 image not reproduced. Panel A plots the percentage of toxin reads by class: CTL (21 clusters), SVSP (14 clusters), MYO (1 cluster), SVMPII (8 clusters), SVMPIII (8 clusters), PLA2 (6 clusters), LAAO (1 cluster), Others (17 clusters), CRISP (1 cluster), and BPP (1 cluster). Panel B plots the percentage of nontoxin reads for chaperones/protein folding, ribosomal, mitochondrial, and protein-degradation sequences.]

Toxin transcripts

We identified 123 individual, unique toxin transcripts with full-length coding sequences. To estimate the abundances of these transcripts in the C. adamanteus venom-gland transcriptome, we clustered them into 78 groups with less than 1% nt divergence (Table 3). Clusters could include alleles, recent duplicates, or even sequencing errors, which are characteristic of high-throughput sequencing [42]. For longer genes, clusters might also include different combinations of variable sites that are widely separated in the sequence. We chose 1% as a practical, but arbitrary, cut-off for defining clusters. Mapping reads back to more similar sequences to estimate abundances would be problematic because reads could not be uniquely assigned to a particular sequence. The true number of toxin genes
for C. adamanteus probably lies somewhere between 78 and 123. This range is at the lower end of the number of unique toxins typically identified for viperids by means of proteomic techniques [10], which may indicate that the venom of C. adamanteus is less complex than that of other species. Alternatively, posttranscriptional processes such as alternative splicing or posttranslational modifications could significantly increase the diversity of toxins present in the venom. Our identified toxins accounted for 35.4% of the total reads (Figure 3), and the vast majority of the extremely high-abundance transcripts were those encoding toxin proteins (Figure 2A). We named toxins with a combination of a toxin-class abbreviation, a cluster number, and, if the cluster had more than a single member, a lower-case letter to indicate the member of the cluster (e.g., CTL-3b).

Table 3 Expression levels of full-length toxin clusters

Rank  Cluster    Size  Length  % total reads  % toxin reads  GenBank TSA accessions
1     MYO        1     994     5.93           16.780         JU173668
2     LAAO       1     3089    1.88           5.309          JU173667
3     SVSP-6     1     1720    1.71           4.849          JU173733
4     CTL-8      3     721     1.02           2.896          a: JU173656, b: JU173657, c: JU173658
5     SVSP-13    1     1661    1.01           2.864          JU173724
6     PLA2-1     2     904     9.68 × 10^-1   2.739          a: JU173675, b: JU173676
7     SVSP-3     2     1830    9.38 × 10^-1   2.653          a: JU173728, b: JU173729
8     SVMPII-5   8     2124    9.15 × 10^-1   2.587          a: JU173694, b: JU173695, c: JU173696, d: JU173697, e: JU173698, f: JU173699, g: JU173700, h: JU173701
9     CTL-4      5     780     9.14 × 10^-1   2.585          a: JU173646, b: JU173647, c: JU173648, d: JU173649, e: JU173650
10    SVMPII-1   5     2298    9.13 × 10^-1   2.583          a: JU173682, b: JU173683, c: JU173684, d: JU173685, e: JU173686
11    SVMPIII-2  5     2246    8.97 × 10^-1   2.538          a: JU173707, b: JU173708, c: JU173709, d: JU173710, e: JU173711
12    SVMPII-2   2     2138    8.38 × 10^-1   2.369          a: JU173687, b: JU173688
13    SVSP-1     1     3120    8.30 × 10^-1   2.348          JU173726
14    CTL-16     1     711     7.82 × 10^-1   2.211          JU173631
15    SVMPII-7   1     2082    7.74 × 10^-1   2.191          JU173703
16    SVMPII-3   4     1931    7.69 × 10^-1   2.176          a: JU173689, b: JU173690, c: JU173691, d: JU173692
17    PLA2-4     1     890     6.98 × 10^-1   1.974          JU173679
18    CTL-3      6     797     6.73 × 10^-1   1.905          a: JU173640, b: JU173641, c: JU173642, d: JU173643, e: JU173644, f: JU173645
19    SVMPII-6   1     2183    6.31 × 10^-1   1.784          JU173702
20    SVMPIII-3  3     2401    6.00 × 10^-1   1.698          a: JU173712, b: JU173713, c: JU173714
21    SVSP-9     1     10270   5.96 × 10^-1   1.685          JU173738
22    SVMPII-4   1     2016    5.41 × 10^-1   1.530          JU173693
23    SVSP-8     1     3524    5.41 × 10^-1   1.529          JU173737
24    CTL-7      1     763     5.40 × 10^-1   1.528          JU173655
25    CTL-18     1     775     5.33 × 10^-1   1.509          JU173633
26    CTL-1      1     763     5.15 × 10^-1   1.458          JU173635
27    SVSP-12    1     957     5.04 × 10^-1   1.424          JU173723
28    CTL-6      1     607     4.80 × 10^-1   1.358          JU173654
29    CRISP      1     1579    4.66 × 10^-1   1.317          JU173623
30    SVMPIII-7  1     2343    4.31 × 10^-1   1.218          JU173719
31    SVMPII-8   1     1863    4.24 × 10^-1   1.198          JU173704
32    PLA2-6     1     834     4.15 × 10^-1   1.174          JU173681
33    SVSP-7     3     2108    3.59 × 10^-1   1.016          a: JU173734, b: JU173735, c: JU173736
34    CTL-15     1     947     3.59 × 10^-1   1.016          JU173630
35    PLA2-2     1     872     3.58 × 10^-1   1.013          JU173677
36    CTL-10     1     587     3.49 × 10^-1   0.987          JU173624
37    CTL-14     1     1246    3.36 × 10^-1   0.951          JU173629
38    CTL-13     1     680     3.15 × 10^-1   0.891          JU173628
39    PLA2-5     1     651     2.90 × 10^-1   0.819          JU173680
40    CTL-9      2     755     2.88 × 10^-1   0.814          a: JU173659, b: JU173660
41    BPP        1     1300    2.57 × 10^-1   0.726          JU173621
42    SVMPIII-1  2     2665    2.41 × 10^-1   0.683          a: JU173705, b: JU173706
43    SVSP-14    1     1732    2.23 × 10^-1   0.631          JU173725
44    SVMPIII-4  2     2433    2.16 × 10^-1   0.611          a: JU173715, b: JU173716
45    CTL-2      2     749     1.90 × 10^-1   0.538          a: JU173638, b: JU173639
46    SVMPIII-8  1     2150    1.71 × 10^-1   0.484          JU173720
47    VESP       1     1603    1.71 × 10^-1   0.483          JU173741
48    SVMPIII-5  1     2339    1.59 × 10^-1   0.449          JU173717
49    SVSP-4     2     2152    1.34 × 10^-1   0.379          a: JU173730, b: JU173731
50    CTL-20     1     785     1.19 × 10^-1   0.336          JU173636
51    NUC        1     2706    1.16 × 10^-1   0.327          JU173671
52    SVSP-5     1     1890    1.12 × 10^-1   0.317          JU173732
53    SVMPIII-6  1     2367    1.10 × 10^-1   0.311          JU173718
54    CTL-21     1     824     1.09 × 10^-1   0.307          JU173637
55    CTL-19     1     618     1.05 × 10^-1   0.296          JU173634
56    NF         1     1395    9.53 × 10^-2   0.269          JU173669
57    CTL-12     2     825     9.40 × 10^-2   0.266          a: JU173626, b: JU173627
58    SVSP-2     1     1675    7.35 × 10^-2   0.208          JU173727
59    CTL-5      3     637     5.83 × 10^-2   0.165          a: JU173651, b: JU173652, c: JU173653
60    PDE        1     2743    5.62 × 10^-2   0.159          JU173674
61    CTL-11     1     625     4.86 × 10^-2   0.137          JU173625
62    PLA2-3     1     957     3.22 × 10^-2   0.091          JU173678
63    CREGF      1     1945    3.09 × 10^-2   0.087          JU173622
64    SVSP-10    1     1815    2.22 × 10^-2   0.063          JU173721
65    HYAL-1     1     2545    2.10 × 10^-2   0.059          JU173662
66    KUN        1     1698    1.14 × 10^-2   0.032          JU173666
67    HYAL-2     1     1302    7.83 × 10^-3   0.022          JU173663
68    KUN-1      1     2575    5.11 × 10^-3   0.014          JU173664
69    VEGF-1     2     906     4.99 × 10^-3   0.014          a: JU173739, b: JU173740
70    SVSP-11    1     1207    4.46 × 10^-3   0.013          JU173722
71    GC         1     1730    3.65 × 10^-3   0.010          JU173661
72    PDE-6      1     3691    3.36 × 10^-3   0.010          JU173673
73    NGF        1     951     3.14 × 10^-3   0.009          JU173670
74    KUN-2      1     1438    2.68 × 10^-3   0.008          JU173665
75    PDE-4      1     2633    2.24 × 10^-3   0.006          JU173672
76    VF         1     5087    1.70 × 10^-3   0.005          JU173742
77    WAP        1     627     5.60 × 10^-4   0.002          JU173743
78    CTL-17     1     774     3.80 × 10^-4   0.001          JU173632

We used the number or percentage of reads mapping to a particular transcript as a measure of its abundance. Although average coverage might be a more appropriate proxy for the number of copies of a given transcript
present, because it accounts for differences in transcript lengths, we prefer read counts as a measure of the expression expenditure on a given transcript because they better reflect the energetic cost associated with producing the encoded protein and are consistent with previous work using low-throughput sequencing (see, e.g., Pahari et al. [25]). In addition, this measurement should more closely match proteomic-based measurements of the contents of venom components (see, e.g., Gibbs et al. [43]), which come in the form of the percentages of total peptide bonds in the sample.

Snake venom metalloproteinases

We identified 39 unique sequences and 16 clusters of snake-venom metalloproteinases (SVMPs) that accounted for 24.4% of the reads mapping to toxin sequences and 8.6% of the total reads (Figure 3A and Table 3). In terms of total reads, the SVMPs were the most abundant class of toxins in the C. adamanteus venom-gland transcriptome. SVMPs are the primary sources of the local and systemic hemorrhage associated with envenomation by viperids and are divided into a number of subclasses based on their domain structure [44,45]. All SVMPs have a metalloproteinase domain characterized by a zinc-binding motif. All of the SVMPs identified for C. adamanteus belong to either the type II or the type III subclass. Type II SVMPs (SVMPIIs) have a disintegrin domain in addition to the metalloproteinase domain, which may be proteolytically cleaved posttranslationally to produce a free disintegrin.
Type III SVMPs (SVMPIIIs) have a disintegrin-like and a cysteine-rich domain in addition to the metalloproteinase domain. We found 8 clusters of each of these two subclasses with 23 unique SVMPII sequences and 16 unique SVMPIII sequences. SVMPII and SVMPIII clusters comprise 16.4% and 8.0% of the reads mapping to toxins, respectively (Figure 3). The sequences in both subclasses are diverse. The maximum pairwise nt divergence for the SVMPIIs was 10.0%, corresponding to a maximum amino-acid divergence of 18.1%. For the SVMPIIIs, the maximum pairwise nt divergence was 20.4% with a maximum amino-acid divergence of 42.3%. Although SVMPs were the dominant toxins as a class, the individual SVMP cluster with the highest abundance was SVMPII-5, which was only the eighth most abundant toxin cluster (Figure 2B and Table 3).

Mackessy [46] categorized rattlesnake venoms as type I or type II on the basis of their toxicities and metalloproteinase activities. These two measurements tend to be inversely related in rattlesnakes: species (or populations) with low LD50 values tend also to have low or undetectable hemorrhagic activities. SVMPs are the major hemorrhagic components of snake venoms, and high toxicity appears to be caused mostly by neurotoxic venom components.
Low-toxicity venoms with high metalloproteinase activity are classified as type I, and high-toxicity venoms with low metalloproteinase activity are classified as type II. On the basis of the abundance of SVMPs in the venom-gland transcriptome, C. adamanteus clearly has type I venom, although the relatively low toxicity of its venom [46] is at least partially compensated for by its large size and venom yield.

C-type lectins

The most diverse and the second most abundant toxin class in the C. adamanteus venom-gland transcriptome was the C-type lectin (CTL) class. We identified 37 unique sequences and 21 clusters of CTLs that accounted for 22.2% of the reads mapping to toxins and 7.8% of the total reads (Figure 3A and Table 3). CTLs generally either inhibit or activate components of plasma or blood-cell types, thereby interfering with hemostasis [47]. Most known snake-venom CTLs function as heterodimers or even more complex arrangements [48], probably accounting in part for their diversity. The divergence among members of this class within the C. adamanteus genome was extreme, although all members preserved a CTL-like domain. Some pairs shared virtually no conserved amino-acid positions. Three of the CTL clusters provide evidence for the relevance of alternative splicing in the generation of toxin proteins. CTL-3f, CTL-4e, and CTL-9b all have 48-nt insertions in the same region but are otherwise similar or identical to other members of their clusters.

Snake venom serine proteinases

The third most abundant toxin class for C. adamanteus was the snake-venom serine proteinases (SVSPs). We identified 18 unique sequences and 14 clusters in this toxin class, accounting for 20.0% of the toxin reads and 7.1% of the total reads (Figure 3A and Table 3). Three of the 10 most highly expressed individual toxins were SVSPs (Figure 2). SVSPs interfere with a wide array of reactions involving blood coagulation and hemostasis and belong to the trypsin family of serine proteases [49,50]. Mackessy [46] detected significant thrombin-like and kallikrein-like activity in the venom of C. adamanteus, which is attributable to the action of SVSPs. The diversity of SVSPs within the C. adamanteus genome is high; maximum pairwise nt divergence is 20.6% and amino-acid divergence is 47.4%.

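The maximum pairwise nucleotide divergences quoted for the toxin classes can be illustrated with a minimal sketch. This section does not state how the sequences were aligned or how divergence was computed, so the code below assumes pre-aligned sequences of equal length and uses the uncorrected proportion of differing sites (p-distance). The function names and the toy sequences are hypothetical and are not data from the study.

```python
from itertools import combinations


def nt_divergence(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def max_pairwise_divergence(aligned):
    """Largest pairwise divergence in a dict of name -> aligned sequence."""
    return max(nt_divergence(a, b) for a, b in combinations(aligned.values(), 2))


if __name__ == "__main__":
    toy = {  # made-up aligned sequences, not data from the paper
        "seq_a": "ACGTACGTAC",
        "seq_b": "ACGTACGTAA",
        "seq_c": "ACGAACG-TA",
    }
    print(f"{100 * max_pairwise_divergence(toy):.1f}%")
```

Under the same assumptions, two sequences would fall within the 1% cut-off used for clustering when `nt_divergence(a, b) < 0.01`.
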
The members of two SVSP clusters differ in a way that should be noted. The lengths of SVSPs are generally well conserved throughout the class. SVSP-7a has a 27-nt insertion relative to the two other members of its cluster but is otherwise identical to SVSP-7b. This difference could reflect the presence of alternative splicing for this gene. SVSP-3a is unique among the C. adamanteus SVSPs or those known from other snake species in apparently having a 65-amino-acid extension of its C-terminal
region. The other member of its cluster, SVSP-3b, has a single deletion of a C nt in a poly-C tract that terminates its coding sequence consistently with other known SVSPs. The reads generating the SVSP-3a form vastly outnumber those for the SVSP-3b form; more than 95% of the reads support the extended version of the protein. The effect, if any, of this C-terminal extension remains to be determined.

Phospholipase A2s

Previous work with C. adamanteus identified only a single phospholipase A2 (PLA2) sequence [11], but we identified seven unique sequences in six clusters (Figure 2 and Table 3), accounting for 7.8% of the toxin reads and 2.8% of the total reads (Figure 3). PLA2s are among the most functionally diverse classes of snake-venom toxins and have pharmacological effects ranging from neurotoxicity (presynaptic or postsynaptic) to myotoxicity and cardiotoxicity. Anticoagulant and hemolytic effects due to PLA2s are also known [51,52]. Compared to other toxin classes of C. adamanteus, the diversity of PLA2s is low. Five of the six clusters are all within 5% nt divergence of one another. PLA2-3 is the lone, high-divergence outlier, differing by more than 31% at the nt level from the other clusters. PLA2-3 is also expressed at the lowest level of any of the PLA2s (Table 3).

Other high-abundance toxins

The SVMPs, CTLs, SVSPs, and PLA2s account for 74% of the reads mapping to toxin sequences (Figure 3), 73% of the toxin clusters, and 82% of the unique toxin sequences. The remaining toxins belong to 16 different classes. Many of these are low-abundance transcripts (Figure 2 and Table 3) and may not actually function as significant toxins, whereas several others have high to moderate abundances and represent significant components of the venom.

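The combined shares quoted for the four major toxin classes follow from the per-class figures reported in the preceding subsections. The short check below recomputes them; the dictionary layout and variable names are illustrative only, while the numbers are those given in the text (78 clusters and 123 unique toxin sequences in total).

```python
# Per-class values as reported in the text:
# (percentage of toxin reads, number of clusters, number of unique sequences)
major_classes = {
    "SVMP": (24.4, 16, 39),
    "CTL": (22.2, 21, 37),
    "SVSP": (20.0, 14, 18),
    "PLA2": (7.8, 6, 7),
}
TOTAL_CLUSTERS = 78
TOTAL_SEQUENCES = 123

read_share = sum(pct for pct, _, _ in major_classes.values())
n_clusters = sum(n for _, n, _ in major_classes.values())
n_sequences = sum(n for _, _, n in major_classes.values())
cluster_share = 100 * n_clusters / TOTAL_CLUSTERS
sequence_share = 100 * n_sequences / TOTAL_SEQUENCES

print(f"toxin reads: {read_share:.1f}%")
print(f"clusters:    {n_clusters}/{TOTAL_CLUSTERS} = {cluster_share:.0f}%")
print(f"sequences:   {n_sequences}/{TOTAL_SEQUENCES} = {sequence_share:.0f}%")
```

The 21 clusters left over (78 minus 57) correspond to the MYO, LAAO, CRISP, and BPP clusters plus the 17 clusters grouped as "others" in Figure 3A.
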
The most abundant toxin transcript and the most abundant transcript overall (Figure 2) was a small basic myotoxin related to crotamine [53,54]. The precursor protein is just 70 amino acids in length with a predicted 22-amino-acid signal peptide. This transcript was detected by Rokyta et al. [11], but the coding sequence was prematurely truncated in their sequence because of a single nt deletion. This toxin accounts for 16.8% of the toxin reads (Figure 3A) and 5.9% of the total reads. Crotamine, originally isolated from the venom of C. durissus, causes spastic paralysis in mice and is found in the venoms of many species of Crotalus [54]. Muscle spasms, twitching, and paralysis of the legs have been reported for human envenomations by C. adamanteus [20]. Interestingly, Straight et al. [55] noted that individuals of C. adamanteus from populations in southern and central Florida lack this toxin in their venoms. Given that this myotoxin is the most abundant transcript in the venom of our specimen, its absence in southern populations points to a dramatic difference in venoms within this species and the potential for significantly different pathological effects associated with bites from different C. adamanteus populations.

A single L-amino-acid oxidase (LAAO) transcript was the second most abundant toxin transcript (Figure 2B), consistent with the previously detected LAAO activity in the C. adamanteus venom [46]. This single transcript accounted for 5.3% of the reads mapping to toxins and 1.9% of the total reads. LAAOs are flavoproteins, giving the venom its yellow color; can be edema- or apoptosis-inducing; and can induce or inhibit platelet aggregation [56]. These effects are probably mediated by H2O2 released during the oxidation reaction catalyzed by the enzyme. The 29th most abundant toxin transcript was a cysteine-rich secretory protein (CRISP) (Figure 2B and Table 3), accounting for 1.3% of the toxin reads (Figure 3A). Although CRISPs are widely found in snake venoms, their precise effects are not well established [57], but they appear to interfere with smooth-muscle contraction [58,59]. A single transcript for a bradykinin-potentiating and C-type natriuretic peptide (BPP) was found to account for 0.7% of the toxin reads (Figure 3A). The encoded protein is similar to a protein identified in Sistrurus catenatus (GenBank accession: DQ464265) that was hypothesized to reduce blood pressure in envenomated prey [25]. A loss of blood pressure has been reported in human envenomations by C. adamanteus [20].

Other low-abundance toxins

The remaining 17 clusters are classified as "others" in Figure 3A. Because each has a relatively low expression level (Table 3), many of these should be considered putative toxins until their presence in the C. adamanteus venom is confirmed proteomically and pharmacological effects are associated with them.

Rokyta et al. [11] detected the presence of a transcript encoding a protein homologous to ohanin from Ophiophagus hannah [60,61] and to a homologous protein from Lachesis muta [62]; we found a transcript identical to that of Rokyta et al. [11]. Pung et al. [60,61] found the O. hannah version of this protein to increase pain sensitivity (hyperalgesia) and to induce temporary hypolocomotion in mice and proposed naming the class vespryns (VESP). Exceptionally intense pain has been reported after envenomation of humans by C. adamanteus [20], although whether such pain is due to a specific toxin is not clear.

We detected three different nucleotidases (NUCs) and five different phosphodiesterases (PDEs) in the venom-gland transcriptome of C. adamanteus. Only one of the NUCs and three of the PDEs had signal peptides, and
we therefore only considered these as potential toxins: NUC, PDE, PDE-4, and PDE-6 (Table 3). The roles of these enzymes in venoms are uncertain, but their primary function may be to liberate toxic nucleosides [63-65]. Significant PDE activity has been detected previously in the venom of C. adamanteus [46].

The C. adamanteus venom-gland transcriptome contained three Kunitz-type protease inhibitors (KUNs). Two of these shared more than 75% amino-acid identity with a KUN from Austrelaps labialis (GenBank accession: B2BS84), an Australian elapid. All three KUNs have domains that place them in the superfamily of bovine pancreatic trypsin-like inhibitors, and snake toxins from this family are known to inhibit plasma serine proteinases. Although KUNs are commonly observed in snake venoms, their role in envenomation (if any) is not well defined [66]. The three KUNs detected for C. adamanteus are all at relatively low abundances, suggesting that they are not major components of the venom.

We identified two transcripts, HYAL-1 and HYAL-2, encoding hyaluronidase-like proteins. Hyaluronidases are generally regarded as venom components that promote the dissemination of other venom components by degrading the extracellular matrix at the site of injection [67], although they may have more direct toxic effects [68]. The coding sequences of our two transcripts differ only in the presence of a 765-nt deletion in HYAL-2 relative to HYAL-1. Truncated hyaluronidases such as HYAL-2 have been detected in the venoms of other viperid species [67] and may represent an example of alternative splicing. We also identified a transcript encoding a glutaminyl-peptide cyclotransferase (glutaminyl cyclase; GC). Many snake venom components have N termini blocked by pyroglutamate, and GCs catalyze the formation of this block. This component is related more to maturation and protection of other toxins and probably contributes only indirectly to toxicity [69].

We identified six growth-factor-related sequences in the venom-gland transcriptome of C. adamanteus: a nerve growth factor (NGF), a neurotrophic factor (NF), two vascular endothelial growth factors (VEGF) in a single cluster, and a cysteine-rich with EGF-like domain protein (CREGF). The NGF transcript encodes a 241-amino-acid precursor protein and shares 99% amino-acid identity with a NGF from C. durissus (GenBank accession: AAG30924). The NF transcript encodes a 180-amino-acid precursor that shares homology with mesencephalic astrocyte-derived neurotrophic factors. We found no close venom-related sequences for this NF in the available databases. The VEGF sequences appear to be alternatively spliced versions of one another. VEGF-1a encodes a 192-amino-acid precursor, and VEGF-1b encodes a 148-amino-acid precursor. Aside from the 132-nt deletion in VEGF-1b relative to VEGF-1a, their coding sequences are identical. Both forms have database matches of the same length with 99% amino-acid identity from Trimeresurus flavoviridis (GenBank accessions: AB154418 and AB154419). Finally, we detected the same cysteine-rich with EGF-like domain protein as described by Rokyta et al. [11].

The final two putative toxin transcripts are of questionable significance because of their low expression levels.
A single sequence with 77% amino-acid identity to a waprin (WAP) sequence from Philodryas olfersii (GenBank accession: EU029742), a rear-fanged colubrid, was detected. Related sequences have been detected in a variety of other rear-fanged snake species, but such proteins are only known to exhibit antimicrobial activity [70]. We detected a venom factor (VF) transcript that shares 87% amino-acid identity with a VF from Austrelaps superbus (GenBank accession: AY903291) [71]. The C. adamanteus VF transcript encodes a 1,652-amino-acid precursor with a 22-amino-acid signal peptide. The best-studied member of this toxin family is cobra venom factor, which is known to activate the complement system [72]. The extremely low expression levels of these transcripts may indicate that they represent the orthologous genes to the ancestors of the known toxic forms and may therefore have no toxic functions.

Comparison to previous work

Rokyta et al. [11] previously described toxin transcripts in the venom-gland transcriptome of C. adamanteus on the basis of 454 pyrosequencing. Their work used RNA from the venom gland of the same individual used in the present work. They found 40 unique toxin transcripts, 10 of which contained only partial coding sequences. Table 4 lists the closest matches from our current sequences to those of Rokyta et al. [11]. The vast majority of the 454-based sequences had either identical matches in our current set of toxins or matches with less than 1% nt divergence (Table 4). Only a single 454 toxin, SVSP-9, did not have a close match. This sequence contains only a partial coding sequence and therefore may not represent a true, functional toxin.

Nontoxin transcripts

We characterized the nontoxin genes expressed in the C. adamanteus venom gland by two means. First, we took all of the contigs from one of our four de novo NGen assemblies based on 20 million merged reads and conducted a full Blast2GO [73] analysis on the contigs comprising ≥ 100 reads. Of the 12,746 contigs (assembly 2 in Table 2), we were able to provide gene ontology (GO) annotations for 9,040 of them (Figure 4A). The major functional classes (level 2) represented in these results were binding and catalysis, followed by transcription regulation (Figure 4B). The major biological process
GO terms (level 2) were cellular processes and metabolic processes (Figure 4C). Interestingly, viral reproductive function was detected and probably represents the activity of transposable elements or retroviruses like those previously noted in snake venom-gland transcriptomes [34]. The major cellular component GO terms (level 2) were cell and organelle (Figure 4D). For these results, we made no attempt to exclude toxin sequences, because they are necessarily a small minority of the total sequences, and did not require that contigs contain full-length coding sequences.

For our second approach, we used only the 2,879 transcripts with full-length coding sequences for nontoxin proteins. We analyzed these sequences with Blast2GO. The distributions of level 2 GO terms for these data were almost identical to those of the full NGen assembly described above (Figure 4), suggesting that our 2,879 annotated nontoxin sequences provide a representative sample of the full venom-gland transcriptome. The full distributions of GO terms for these sequences across all levels are shown in Figures 5, 6, and 7. As expected for a secretory tissue, processes related to protein production and secretion were well represented (e.g., protein transport and protein modification; Figure 5), as were protein-binding functions (Figure 6) and proteins localized to the endoplasmic reticulum (ER) and the Golgi apparatus (Figure 7).

Table 4 Correspondence with the results of Rokyta et al. [11]

454 name  Accession  Closest match     % nt divergence  Notes
CREGF     HQ414087   CREGF             0.1
CRISP     HQ414088   CRISP             0.0              Identical
CTL-1     HQ414089   CTL-4a            0.0              Identical
CTL-2     HQ414090   CTL-8a            0.0              Identical
CTL-3     HQ414091   CTL-1             0.0              Identical
CTL-4     HQ414092   CTL-9a            0.8
CTL-5     HQ414093   CTL-3e            0.9
CTL-6     HQ414094   CTL-12b           0.0              Identical
CTL-7     HQ414095   CTL-10            0.0              Identical
CTL-8     HQ414096   CTL-2a            0.0              Identical
CTL-9     HQ414097   CTL-5a            0.0              Identical
HYAL      HQ414098   HYAL-1            0.0              454 version incomplete
LAAO      HQ414099   LAAO              0.0              Identical
MYO       HQ414100   MYO               0.0              454 version has 1-nt deletion that truncates the coding sequence prematurely
NUC       HQ414101   NUC               0.0              454 version incomplete
PDE-1     HQ414102   PDE               0.0              454 version has 123-nt insertion
PDE-2     HQ414103   PDE-2 (nontoxin)  0.0              454 version incomplete; no signal peptide; no longer considered toxin
PLA2      HQ414104   PLA2-1b           0.0              Identical
PLB       HQ414105   PLB (nontoxin)    0.2              No longer considered toxin
SVMP-1    HQ414106   SVMPII-3b         0.0              Identical
SVMP-2    HQ414107   SVMPII-3b/c       0.5
SVMP-3    HQ414108   SVMPII-5a         0.3
SVMP-4    HQ414109   SVMPIII-2d        1.2
SVMP-5    HQ414110   SVMPIII-4b        1.0
SVMP-6    HQ414111   SVMPIII-2d        0.2
SVMP-7    HQ414112   SVMPIII-4a        0.0              Identical
SVMP-8    HQ414113   SVMPIII-5         0.5              454 version incomplete
SVMP-9    HQ414114   SVMPIII-1a/b      0.0              454 version incomplete
SVMP-10   HQ414115   SVMPIII-6         0.0              454 version incomplete
SVMP-11   HQ414116   SVMPIII-3a        0.0              454 version incomplete
SVSP-1    HQ414117   SVSP-3a           0.0              454 version has 1-nt deletion that truncates the coding sequence prematurely
SVSP-2    HQ414118   SVSP-1            0.0              Identical
SVSP-3    HQ414119   SVSP-7a           0.0              Identical
SVSP-4    HQ414120   SVSP-5            0.1
SVSP-5    HQ414121   SVSP-9            0.5
SVSP-6    HQ414122   SVSP-6            0.0              Identical
SVSP-7    HQ414123   SVSP-4b           0.0              454 version incomplete
SVSP-8    HQ414124   SVSP-2            0.0              454 version incomplete
SVSP-9    HQ414125   None              > 10             454 version incomplete
VESP      HQ414126   VESP              0.0              Identical

Figure 4 Comparison of gene ontology (GO) results for our annotated full-length nontoxin sequences with those of the contigs from a de novo assembly with NGen. Only level 2 GO terms are shown. The distributions of GO terms are similar across data sets, suggesting that the annotated transcripts provided a comprehensive characterization of the genes expressed in the venom gland. (A) The distributions of sequences reaching various stages of identification and annotation are shown. The level 2 GO terms are shown for molecular function (B), biological process (C), and cellular component (D).
[Figure 4 image not reproduced. Legend counts: NGen assembly, 12,746 contigs with 13,856 molecular-function, 27,847 biological-process, and 17,376 cellular-component GO hits; annotated nontoxins, 2,879 contigs with 4,205, 8,614, and 5,686 GO hits, respectively.]

Four of the top 20 most highly expressed nontoxin genes (Table 5), including the most highly expressed, were protein disulfide isomerases (PDIs). In particular, they were members of the PDI family that are retained in the ER and are characterized by having two or more PDI domains, which are similar to thioredoxin. PDIs catalyze the formation or breaking of disulfide bonds and are therefore involved in protein folding. Molecular chaperones were well represented in the top 20 nontoxins by four genes: endoplasmin (a member of the HSP90 family), calreticulin, 78-kDa glucose-regulated protein (GRP78), and heat shock protein 5. The latter gene appears to be a splice variant of GRP78, differing within the coding region by two point mutations and two short deletions. All of these chaperones are ER specific. Six of the top 20 nontoxins were mitochondrial genes involved in oxidative cellular respiration, consistent with the high energetic demands of venom production [74]: cytochrome C oxidase subunits I and III, cytochrome B, and NADH dehydrogenase subunits 1, 4, and 5. The cells of venom glands are particularly rich in mitochondria [75]. Four genes were involved in various aspects of translation: two translation elongation factors, 18S rRNA, and vigilin. Vigilins are hypothesized to be involved in regulating mRNA stability and translation and might be involved in RNA-mediated gene silencing [76,77]. The final top 20 nontoxin gene was actin, a component of the cytoskeleton.

The abundances of several major classes of nontoxins are provided in Figure 3B. We identified 57 sequences with functions related to protein folding [19,78-80], including various classes of heat-shock proteins, protein-disulfide isomerases, peptidyl-prolyl cis-trans isomerases, dnaJ complex components, and T-complex components. These sequences together accounted for 28.4% of the total reads mapping to nontoxins. Ribosomal-protein transcripts (cytoplasmic and mitochondrial) accounted for 9.5% of the nontoxin reads, and mitochondrial genes accounted for another 9.0%. Finally, we identified 110 transcripts encoding proteins involved in protein degradation [81,82], including proteins involved in the ubiquitin-proteasome system and the ER-associated protein-degradation system [83], which accounted for 2.6% of the nontoxin reads. Protein-quality control should be essential in a high-throughput protein-producing tissue such as a snake venom gland.

[Figure 5: bar chart of the number of sequences assigned to each biological-process GO term, with an inset for the low-abundance terms.]

Figure 5 The biological-process GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.
Our collection of nontoxins included several notable potential inhibitors of the toxins or other proteases (Table 6). Such inhibitors may play a role in preventing autolysis [84] or may serve to protect venom components once inside a victim [85]. We detected three cystatin-like transcripts in the venom gland. Cystatins are cysteine-protease inhibitors and have been detected in numerous elapid venom glands and venoms [85]. We detected three unique metalloproteinase inhibitors and two serine-proteinase inhibitors (serpins). Finally, we found four unique PLA2 inhibitors.

Sequence accession numbers
The original, unmerged sequencing reads were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive under accession number SRA050594. The annotated toxin and nontoxin sequences were submitted to the GenBank Transcriptome Shotgun Assembly (TSA) database under accession numbers JU173621-JU173743 (toxins) and JU173744-JU176622 (nontoxins).

[Figure 6: bar chart of the number of sequences assigned to each molecular-function GO term, with an inset for the low-abundance terms.]

Figure 6 The molecular-function GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.

Conclusions
We have described the most comprehensive venom-gland transcriptomic characterization of a snake species to date and provided full-length coding sequences for 123 unique toxin proteins and 2,879 unique nontoxin proteins.
We have demonstrated the use of Illumina sequencing technology for the sequencing and de novo assembly of a tissue-specific transcriptome for a nonmodel species, C. adamanteus, for which genome-scale resources were previously unavailable. Because the nontoxin sequences in particular should be conserved across snake species, our results should greatly facilitate similar work with other venomous species, serving as an assembly template and reducing the number of reads for which de novo assembly will be necessary.

The expressed toxin genes in the venom gland of C. adamanteus provide a detailed portrait of a type I rattlesnake venom [46]. The most abundant transcript expressed in the C. adamanteus venom gland encoded a myotoxin homologous to crotamine. Crotamine is known to induce spastic paralysis [54], a symptom that has been observed in human envenomations by C. adamanteus [20]. Like those of most viperids, the bites of C. adamanteus result in significant tissue damage and necrosis, and we found that SVMPs, the major class of hemorrhagic toxins, dominated venom-gland gene expression. The second most abundant toxin transcript overall was an LAAO; LAAOs are also noted for causing local tissue damage [46]. Coagulopathy is a common occurrence with pit-viper bites [5]. The CTLs and SVSPs were also both diverse and abundant in the venom-gland transcriptome of C. adamanteus, and both classes primarily attack the hemostatic system. In terms of gene sequences of venom components, the venom of C. adamanteus is now the best-characterized snake venom, although a thorough proteomic analysis of the venom is still needed. The sequences we have generated will greatly facilitate such a proteomic characterization by serving as a database against which to query mass-spectrum results.

The expression patterns of the nontoxin genes in the venom gland of C. adamanteus reflect the protein-secretory function of the tissue and the high energetic demands of rapid venom production [75]. The most highly expressed nontoxin genes were those involved in the production and processing of proteins and in energy production to support these activities. Molecular chaperones and PDIs were particularly abundant. Though the expression patterns for nontoxins were not surprising, future comparisons with other snake species, especially those from other snake families, may be able to elucidate the origin and early stages of the evolution of the venom gland.

[Figure 7: bar chart of the number of sequences assigned to each cellular-component GO term, with an inset for the low-abundance terms.]

Figure 7 The cellular-component GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.

Methods
Venom-gland transcriptome sequencing
We sequenced the venom-gland transcriptome of a single animal from Florida (Wakulla County): an adult female weighing 393 g with a snout-to-vent length of 792 mm and a total length of 844 mm. To stimulate transcription in the venom glands, we anesthetized the snake by propofol injection (10 mg/kg) and extracted venom by electrostimulation under anesthesia [86]. After venom extraction, the animal was allowed to recover for four days while transcription levels reached their maxima [87]. The snake was euthanized by injection of sodium pentobarbital (100 mg/kg), and its venom glands were subsequently removed. The above techniques were approved by the Florida State University Institutional Animal Care and Use Committee (IACUC) under protocol #0924.

Sequencing and nonnormalized cDNA library preparation were performed by the HudsonAlpha Institute for Biotechnology Genomic Services Laboratory (http://www.hudsonalpha.org/gsl/). Transcriptome sequencing was performed essentially as described by Mortazavi et al. [88] in a modification of the standard Illumina methods described in detail in Bentley et al. [89].
Total RNA was reduced to poly-A+ RNA with oligo-dT beads. Two rounds of poly-A+ selection were performed. The purified mRNA was then subjected to a mild heat fragmentation followed by random priming for first-strand synthesis. Standard second-strand synthesis was followed by standard library preparation with the double-stranded cDNA as input material. This approach is similar to that of Illumina's TruSeq RNA-seq library preparation kit. Sequencing was performed in one lane on the Illumina HiSeq 2000 with 100-base-pair paired-end reads.

Table 5 The 20 most highly expressed nontoxin transcripts

| Name | Length | % reads | Function | Accession |
|---|---|---|---|---|
| Protein disulfide isomerase | 2970 | 5.223 | Rearrange disulfide bonds (ER) | JU175360 |
| Cytochrome C oxidase subunit I | 2789 | 0.966 | Electron transport chain | JU175042 |
| Cytochrome B | 1275 | 0.499 | Electron transport chain | JU175040 |
| Translation elongation factor 1α 1 | 1985 | 0.459 | Translation | JU174424 |
| 18S rRNA | 2509 | 0.421 | Ribosomal component | JU173759 |
| Calreticulin | 1661 | 0.406 | Protein chaperone (ER) | JU174061 |
| Endoplasmin (HSP90 family) | 3012 | 0.333 | Protein chaperone (ER) | JU174456 |
| 78 kDa glucose-regulated protein | 2676 | 0.332 | Protein chaperone (ER) | JU174713 |
| Heat shock protein 5 (GRP78 splice variant?) | 2063 | 0.327 | Protein chaperone (ER) | JU174801 |
| NADH dehydrogenase subunit 5 | 4448 | 0.272 | Electron transport chain | JU175113 |
| Cytochrome C oxidase subunit III | 2103 | 0.239 | Electron transport chain | JU175043 |
| Protein disulfide isomerase A6 | 4933 | 0.212 | Rearrange disulfide bonds (ER) | JU175358 |
| Nucleobindin 2 | 2937 | 0.203 | Calcium binding | JU175278 |
| Protein disulfide isomerase A3 | 1904 | 0.186 | Rearrange disulfide bonds (ER) | JU175355 |
| NADH dehydrogenase subunit 1 | 1878 | 0.173 | Electron transport chain | JU175111 |
| NADH dehydrogenase subunit 4 | 1751 | 0.172 | Electron transport chain | JU175112 |
| Protein disulfide isomerase A4 | 2540 | 0.159 | Rearrange disulfide bonds (ER) | JU175356 |
| Translation elongation factor 2 | 3057 | 0.147 | Translation | JU174429 |
| Vigilin 2 | 6107 | 0.129 | mRNA stability and translation | JU176512 |
| Actin, cytoplasmic 2 | 1971 | 0.124 | Cytoskeleton | JU173777 |

Transcriptome assembly and analysis
The average insert length of our cDNA library was 170 nt, excluding the Illumina adaptors. With 100-base-pair paired-end sequencing, the majority of paired-end reads overlapped at their 3' ends. Because read quality declines toward the 3' ends of reads, we developed a method similar to that of Rodrigue et al. [36] for merging the overlapping pairs into single, long, high-quality reads.
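The overlap test used for merging in this section can be illustrated with a short sketch. The Python below is not the authors' implementation: the function names (`overlap_probability`, `merge_pair`) and parameter names (`p_max`, `ratio`) are hypothetical, and the sketch makes simplifying assumptions. It scores each candidate overlap with the binomial probability of Equation (1), applies the two thresholds reported in the text (minimum probability below 10^-10 and second-smallest probability at least 1000 times larger), ignores quality scores, keeps the bases of the first read in the overlap, and does not handle inserts shorter than the read length.

```python
from math import comb

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_probability(k, n):
    """Binomial probability of exactly k matches in n positions (p = 1/4)."""
    return comb(n, k) * 0.25 ** k * 0.75 ** (n - k)


def merge_pair(read1, read2, p_max=1e-10, ratio=1000.0):
    """Merge a read pair whose 3' ends overlap, or return None.

    read2 is given as sequenced (opposite strand), so it is
    reverse-complemented before being slid along read1.
    """
    mate = reverse_complement(read2)
    candidates = []
    for n in range(1, min(len(read1), len(mate)) + 1):
        # Compare the last n bases of read1 with the first n bases of the mate.
        k = sum(a == b for a, b in zip(read1[-n:], mate[:n]))
        candidates.append((overlap_probability(k, n), n))
    candidates.sort()
    best_p, best_n = candidates[0]
    if best_p >= p_max:
        return None  # no overlap is convincing enough
    if len(candidates) > 1 and candidates[1][0] < ratio * best_p:
        return None  # ambiguous placement, e.g. a repetitive region
    # Assumption: keep read1's bases in the overlap (no quality-based consensus).
    return read1 + mate[best_n:]
```

With two 40-nt reads drawn from opposite ends of a 50-nt fragment, the 30-nt overlap has probability (1/4)^30, far below the threshold, and the pair merges back into the fragment; two 10-nt reads can never be merged under these thresholds because (1/4)^10 is already larger than 10^-10.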
The members of each pair of reads were slid along each other, and, for each overlap of length n, we calculated the probability of getting the observed number of matches k by chance using a binomial probability given by

$$P(k \mid n) = \binom{n}{k}\left(\frac{1}{4}\right)^{k}\left(\frac{3}{4}\right)^{n-k} \tag{1}$$

assuming any of the four nucleotides is equally likely to be at any position. To be conservative, we only merged reads if the minimum probability was less than $10^{-10}$ and the second smallest probability was at least 1000 times larger (Figure 1A). The latter condition was meant to help avoid merging reads that span highly repetitive regions. For cases in which the insert size was less than the read length, sequence data outside the overlap were assumed to represent adaptors and were deleted. We updated quality scores for the overlapping positions following the approach of Rodrigue et al. [36]. For merged reads, quality scores for nonoverlapping bases were left unchanged (Figure 1B). The unmerged reads were typically those pairs from the longer end of the insert-size distribution.

Table 6 Toxin and protease inhibitors detected in the venom-gland transcripts

| Name | Length | % reads | Function | Accession |
|---|---|---|---|---|
| Cystatin 1 | 790 | 9.80 × 10^-4 | Cysteine-protease inhibitor | JU174278 |
| Cystatin B | 460 | 1.23 × 10^-3 | Cysteine-protease inhibitor | JU174279 |
| Cystatin 2 | 709 | 1.31 × 10^-3 | Cysteine-protease inhibitor | JU174280 |
| Metalloproteinase inhibitor 1 | 820 | 1.04 × 10^-3 | Metalloproteinase inhibitor | JU175124 |
| Metalloproteinase inhibitor 2 | 2560 | 2.57 × 10^-3 | Metalloproteinase inhibitor | JU175125 |
| Metalloproteinase inhibitor 3 | 2202 | 1.01 × 10^-3 | Metalloproteinase inhibitor | JU175126 |
| PLA2 inhibitor beta | 1210 | 1.54 × 10^-3 | PLA2 inhibitor | JU175425 |
| PLA2 inhibitor gamma B1 | 1492 | 3.68 × 10^-3 | PLA2 inhibitor | JU175444 |
| PLA2 inhibitor gamma B2 | 1694 | 7.70 × 10^-4 | PLA2 inhibitor | JU175442 |
| PLA2 inhibitor B | 2339 | 1.07 × 10^-3 | PLA2 inhibitor | JU175443 |
| Serpin B6 | 1708 | 9.68 × 10^-3 | Serine-proteinase inhibitor | JU175869 |
| Serpin H1 | 2004 | 9.40 × 10^-4 | Serine-proteinase inhibitor | JU175870 |

Because of the inherent difficulty in de novo transcriptome assembly, we used a diverse array of assembly approaches and combined the results for a final data set. We performed assemblies using ABySS version 1.2.6 [37,38] under a wide array of parameter values using both the merged and unmerged reads. In particular, we used k-mer values of 51, 61, 71, 81, and 91 and varied the coverage (c) and erode (e) parameters from 2 to 1,000. We set E = 0, m = 20, and s = 200 for all assemblies. Trans-ABySS [90] provided little or no improvement of our assemblies, primarily because assembly quality appeared to be more dependent on the coverage and erode parameters than on the k-mer length. We also conducted assemblies using both the merged and unmerged reads with Velvet version 1.1.02 [39] and k-mer values of 71, 81, and 91. We selected the best of these assemblies on the basis of the N50 values for further assembly into transcripts with Oases version 0.1.20 (http://www.ebi.ac.uk/zerbino/oases/) [40]. For Oases, we set the minimum transcript length to 300 nt and the coverage cutoff to 10. We also followed the approach of Rokyta et al.
[11] and used the NGen 2.2 assembler from DNAStar (http://www.dnastar.com/). Because this assembler is limited to 20-30 million reads, we used only the merged reads. We performed four independent assemblies: three with 20 million merged reads each and one with the remaining 12,114,709 merged reads. Each assembly was performed with the default settings for high-stringency, de novo transcriptome assembly for long Illumina reads, including default quality trimming. The high-stringency setting corresponded to setting the minimum match percentage to 90%. We retained contigs comprising at least 100 reads.

In addition to the all-at-once assembly approaches above, we developed an iterative approach that was both more effective at generating full-length transcripts and more computationally efficient. The first step consisted of applying our Extender program (see below) as a de novo assembler starting from 1,000 reads. Full-length transcripts were identified with blastx searches (see below), then used as templates in a reference-based assembly in NGen 3.1 with a 98% minimum match percentage to filter reads corresponding to identified transcripts. Ten million of the unassembled sequences were then used in a de novo transcriptome assembly in NGen 3.1 with the same settings as described above for de novo assembly, except that the minimum match percentage was increased to 93% and contigs comprising fewer than 200 sequences were discarded. The resulting sequences were identified, where possible, by means of blastx searches, and the identified full-length transcripts were used in another templated assembly to generate a further-reduced set of reads. This iterative process was repeated two additional times.

To provide transcriptional profiles of the venom gland, we performed GO annotation with Blast2GO [73]. We ran full analyses on one of the NGen assemblies of 20 million merged reads, including blastx searches, GO mapping, and annotation. We used the default Blast2GO parameters throughout. We converted the GO annotation to generic GO-slim terms. We ran the same analysis on the combined set of annotated nontoxin sequences.

For gene identification and annotation, we conducted blastx searches using mpiblast version 1.6.0 (http://www.mpiblast.org/) of the consensus sequences of contigs of our assemblies against the NCBI nonredundant protein database (nr; downloaded March 2011 and updated through November 2011). We used an E-value cut-off of $10^{-4}$, and only the top 10 matches were considered. For toxin identification, hit descriptions were searched for a set of keywords based on known snake-venom toxins and protein classes. Any sequence matching these keywords was checked for a full-length coding sequence. We generally only retained transcripts with full-length coding sequences (but see below). For the iterative assembly approach, the remaining, presumably nontoxin-encoding, contigs were screened for those whose match lengths were at least 90% of the length of at least one of their database matches. This step was intended to minimize the number of fragmented or partial sequences that were considered for annotation. In addition, we sorted the contigs of the three 20-million-sequence NGen assemblies from the all-at-once approach on the basis of the number of reads and attempted to annotate the top 500 contigs from one assembly and the top 100 from the other two.

We estimated transcript abundances using high-stringency reference-based assemblies in NGen 3.1 with a minimum match percentage of 95. Ten million of the merged reads were mapped onto the full-length, annotated transcripts, and the percentage of reads mapping to each transcript was used as a proxy for abundance.

The extender
The purpose of Extender is to estimate quickly one or more full-length transcript sequences from a large number of high-quality sequence reads. The procedure begins with one or more seed sequences provided by the user. The seeds can be known sequences (e.g., partial transcripts from a previous assembly) or simply sequences of one or more of the reads. The Extender procedure begins by hashing the k-mers observed at the two ends of the seeds. If k is set to 50, for example, then the 50-base sequence present at the 5' end of each seed is used as a key in a hash table, and the hash value is a pointer to the seed in the list of seeds. A second hash table is likewise used for k-mers from the 3' ends of the seeds. Note that this method requires that all initial k-mers be unique (that no two sequence ends be identical). Once the seeds are hashed, the seeds are extended with the set of reads provided by the user as follows. The two k-mers from the ends of each read are looked up in each hash table. If the key is present in the hash table, the seed is extended by concatenation of the nonoverlapping bases from the read onto the appropriate end of the seed. If the key is absent, the reverse complement of the read is used to extend the seed if the end k-mers are found. After each extension, the k-mer key facilitating the extension is removed from the hash table and the new k-mer key is added (the reference to the seed remains the same). The procedure is repeated until the reads have been cycled through N times, where N is chosen by the user. Cycling is beneficial because the Extender does not reset to the beginning of the read list when an extension is made.
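The hashing-and-extension loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Extender source code: the function name `extend_seeds` and its signature are hypothetical, Python dictionaries stand in for the two hash tables, exact k-mer identity is required, and the sketch does not check the stated precondition that all initial end k-mers are unique.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_seeds(seeds, reads, k, cycles):
    """Extend each seed at both ends by exact end-k-mer matches to reads."""
    seeds = list(seeds)
    # One table per seed end: k-mer -> index of the seed in the list.
    head = {s[:k]: i for i, s in enumerate(seeds)}   # 5' end k-mers
    tail = {s[-k:]: i for i, s in enumerate(seeds)}  # 3' end k-mers
    for _ in range(cycles):
        for read in reads:
            if len(read) <= k:
                continue  # nothing beyond the k-mer to add
            # Try the read as given, then its reverse complement.
            for r in (read, reverse_complement(read)):
                if r[:k] in tail:
                    # Read starts with a seed's 3' k-mer: append the rest.
                    i = tail.pop(r[:k])
                    seeds[i] += r[k:]
                    tail[seeds[i][-k:]] = i
                    break
                if r[-k:] in head:
                    # Read ends with a seed's 5' k-mer: prepend the rest.
                    i = head.pop(r[-k:])
                    seeds[i] = r[:-k] + seeds[i]
                    head[seeds[i][:k]] = i
                    break
    return seeds
```

Because the loop continues down the read list after each extension rather than restarting, a read that would extend the seed only after a later read has been incorporated is picked up on the next pass, which is why several cycles are needed when the reads are not in transcript order.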
Extension of a seed typically terminates when the end of the full-length transcript is reached or when a sequencing error is encountered in the end of an incorporated read. The presence of low-frequency biological artifacts (e.g., unspliced introns) may also result in termination of the extension. In order to improve the accuracy of the consensus sequence prediction, Extender can create replicate seeds for a particular seed by sequentially trimming one base at a time from both ends. Using replicate seeds allows several independent sequences that represent the same target consensus sequence to be generated simultaneously, and these replicates are entirely independent because they begin with different keys. The user can obtain the final estimate of the sequence corresponding to each original seed by taking the consensus across replicates or by simply choosing the replicate producing the longest sequence. We took the former approach for all of our assembly efforts. Overall, Extender is highly inefficient with its use of data and requires many long, high-quality reads, but it is extremely computationally efficient, having short run times and low memory requirements.
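The replicate-seed idea can be sketched in a few lines. The text does not say how the consensus across replicates was computed, so everything below is an assumption made for illustration: `replicate_seeds` takes replicate i to be the seed with i bases removed from each end, and `consensus` aligns the extended replicates on a shared anchor substring (for example, the most-trimmed seed) and takes a per-position majority vote. Both function names are hypothetical.

```python
from collections import Counter


def replicate_seeds(seed, n_replicates, k):
    """Replicate i is the seed with i bases trimmed from each end.

    Trimming stops once a replicate would be shorter than k, since
    both end k-mers must still fit inside the seed.
    """
    replicates = []
    for i in range(n_replicates):
        trimmed = seed[i:len(seed) - i]
        if len(trimmed) < k:
            break
        replicates.append(trimmed)
    return replicates


def consensus(extended, anchor):
    """Majority-vote consensus of extended replicates aligned on `anchor`.

    Assumption: replicates are aligned without gaps by the first
    occurrence of the anchor; replicates lacking it are skipped.
    """
    columns = {}  # offset relative to the anchor start -> base counts
    for seq in extended:
        start = seq.find(anchor)
        if start < 0:
            continue
        for pos, base in enumerate(seq):
            columns.setdefault(pos - start, Counter())[base] += 1
    return "".join(columns[o].most_common(1)[0][0] for o in sorted(columns))
```

Choosing the longest replicate instead, the other option mentioned in the text, would simply be `max(extended, key=len)`.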
We used Extender in two different ways: to complete partial toxin transcripts and as a de novo assembler. For the former, we used partial toxin transcripts from NGen assemblies that were found to have fragments of coding sequence homologous with known toxins. The partial transcripts were trimmed to just the partial coding sequence and used as seeds. To use Extender as a de novo assembler, we seeded it with 1,000 random reads. For both applications, we used a k-mer size of 100, 20 replicates, and 10 cycles through the complete set of merged reads, excluding all reads with any bases with quality scores less than 30.

Abbreviations
BPP: Bradykinin potentiating and C-type natriuretic peptides; CTL: C-type lectin; CREGF: Cysteine-rich with EGF-like domain; CRISP: Cysteine-rich secretory protein; Gb: Gigabase; GC: Glutaminyl-peptide cyclotransferase; GO: Gene ontology; HYAL: Hyaluronidase; KUN: Kunitz-type protease inhibitor; LAAO: L-amino-acid oxidase; MYO: Myotoxin (crotamine); NGF: Nerve growth factor; NF: Neurotrophic factor; nt: Nucleotide; NUC: Nucleotidase; PDE: Phosphodiesterase; PDI: Protein disulfide isomerase; PLA2: Phospholipase A2; SVMP: Snake venom metalloproteinase (types II and III); SVSP: Snake venom serine proteinase; VEGF: Vascular endothelial growth factor; VESP: Vespryn (ohanin-like); VF: Venom factor; WAP: Waprin.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
The project was conceived and planned by DR and AL. DR, MM, and KA collected and analyzed the data. DR wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements
The authors thank Kenneth P. Wray for dissecting the venom glands and Darryl Heard for training DRR in the electrostimulation technique for venom extraction. Computational resources were provided by the Florida State University High-Performance Computing cluster, and the authors thank James C. Wilgenbusch for assistance in the use of these resources. Funding for this work was provided to DRR and ARL by Florida State University.

Author details
1 Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295, USA. 2 Department of Scientific Computing, Florida State University, Tallahassee, FL 32306-4120, USA.
Received: 14 March 2012 Accepted: 2 July 2012 Published: 16 July 2012

References
1. Chippaux JP: Snake-bites: appraisal of the global situation. Bull WHO 1998, 76:515-524.
2. O'Neil ME, Mack KA, Gilchrist J, Wozniak EJ: Snakebite injuries treated in United States emergency departments, 2001-2004. Wilderness Env Med 2007, 18:281-287.
3. Langley RL: Deaths from reptile bites in the United States, 1979-2004. Clin Toxicol 2009, 47:44-47.
4. Theakston RDG, Warrell DA, Griffiths E: Report of a WHO workshop on the standardization and control of antivenoms. Toxicon 2003, 41:541-557.
5. Smith J, Bush S: Envenomations by reptiles in the United States. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:475-490.
6. Neves-Ferreira AGC, Valente RH, Perales J, Domont GB: Natural inhibitors: innate immunity to snake venoms. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:259-284.
7. Rucavado A, Lomonte B: Neutralization of myonecrosis, hemorrhage, and edema induced by Bothrops asper snake venom by homologous and heterologous pre-existing antibodies in mice. Toxicon 1996, 34:567-577.


8. Huang KF, Chow LP, Chiou SH: Isolation and characterization of a novel proteinase inhibitor from the snake serum of Taiwan habu (Trimeresurus mucrosquamatus). Biochem Biophys Res Commun 1999, 263:610-616.
9. Valente RH, Dragulev B, Perales J, Fox JW, Domont GB: BJ46a, a snake venom metalloproteinase inhibitor. Eur J Biochem 2001, 268:3042-3052.
10. Serrano SMT, Shannon JD, Wang D, Camargo ACM, Fox JW: A multifaceted analysis of viperid snake venoms by two-dimensional gel electrophoresis: An approach to understanding venom proteomics. Proteomics 2005, 5:501-510.
11. Rokyta DR, Wray KP, Lemmon AR, Lemmon EM, Caudle SB: A high-throughput venom-gland transcriptome for the eastern diamondback rattlesnake (Crotalus adamanteus) and evidence for pervasive positive selection across toxin classes. Toxicon 2011, 57:657-671.
12. Biardi JE, Chien DC, Coss RG: California ground squirrel (Spermophilus beecheyi) defenses against rattlesnake venom digestive and hemostatic toxins. J Chem Ecol 2005, 31:2501-2518.
13. Biardi JE, Nguyen KT, Lander S, Whitley M, Nambiar KP: A rapid and sensitive fluorometric method for the quantitative analysis of snake venom metalloproteases and their inhibitors. Toxicon 2011, 57:342-347.
14. Jansa SA, Voss RS: Adaptive evolution of the venom-targeted vWF protein in opossums that eat pitvipers. PLoS One 2011, 6:e20997.
15. Harvey AL, Bradley KN, Cochran SA, Rowan EG, Pratt JA, Quillfeldt JA, Jerusalinsky DA: What can toxins tell us for drug discovery? Toxicon 1998, 36:1635-1640.
16. Ménez A: Functional architectures of animal toxins: a clue to drug design? Toxicon 1998, 36:1557-1572.
17. Escoubas P, King GF: Venomics as a drug discovery platform. Expert Rev Proteomics 2009, 6:221-224.
18. Bohlen CJ, Chesler AT, Sharif-Naeini R, Medzihradszky KF, Zhou S, King D, Sánchez EE, Burlingame AL, Basbaum AI, Julius D: A heteromeric Texas coral snake toxin targets acid-sensing ion channels to produce pain. Nature 2011, 479:410-414.
19. Hartl FU, Bracher A, Hayer-Hartl M: Molecular chaperones in protein folding and proteostasis. Nature 2011, 475:324-332.
20. Klauber LM: Rattlesnakes: Their Habits, Life Histories, and Influence on Mankind. Second edition. Berkeley, California: University of California Press; 1997.
21. Gold BS, Dart RC, Barish RA: Bites of venomous snakes. N Engl J Med 2002, 347:347-356.
22. Conant R, Collins JT: A Field Guide to Reptiles and Amphibians of Eastern and Central North America. Third edition. New York, New York: Houghton Mifflin Harcourt; 1998.
23. Palmer WM, Braswell AL: Reptiles of North Carolina. Chapel Hill, North Carolina: University of North Carolina Press; 1995.
24. Dundee HA, Rossman DA: The Amphibians and Reptiles of Louisiana. Baton Rouge, Louisiana: Louisiana University Press; 1996.
25. Pahari S, Mackessy SP, Kini RM: The venom gland transcriptome of the desert massasauga rattlesnake (Sistrurus catenatus edwardsii): towards an understanding of venom composition among advanced snakes (superfamily Colubroidea). BMC Mol Biol 2007, 8:115.
26. Casewell NR, Harrison RA, Wüster W, Wagstaff SC: Comparative venom gland transcriptome surveys of the saw-scaled vipers (Viperidae: Echis) reveal substantial intra-family gene diversity and novel venom transcripts. BMC Genomics 2009, 10:564.
27. Leão LI, Ho PL, de LM Junqueira-de-Azevedo I: Transcriptomic basis for an antiserum against Micrurus corallinus (coral snake) venom. BMC Genomics 2009, 10:112.
28. Jiang Y, Li Y, Lee W, Xu X, Zhang Y, Zhao R, Zhang Y, Wang W: Venom gland transcriptomes of two elapid snakes (Bungarus multicinctus and Naja atra) and evolution of toxin genes. BMC Genomics 2011, 12:1.
29. Morgenstern D, Rohde BH, King GF, Tal T, Sher D, Zlotkin E: The tale of a resting venom gland: transcriptome of a replete venom gland from the scorpion Hottentotta judaicus. Toxicon 2011, 57:695-703.
30. Whittington CM, Papenfuss AT, Locke DP, Mardis ER, Wilson RK, Abubucker S, Mitreva M, Wong ESW, Hsu AL, Kuchel PW, Belov K, Warren WC: Novel venom gene discovery in the platypus. Genome Biol 2010, 11:R95.
31. Gremski LH, Silveira RBD, Chaim OM, Probst CM, Ferrer VP, Nowatzki J, Weinschutz HC, Madeira HM, Gremski W, Nader HB, Senff-Ribeiro A, Veiga SS: A novel expression profile of the Loxosceles intermedia spider venomous gland revealed by transcriptome analysis. Mol BioSyst 2010, 6:2403-2416.
32. Ruiming Z, Yibao M, Yawen H, Zhiyong D, Yingliang W, Zhijian C, Wenxin L: Comparative venom gland transcriptome analysis of the scorpion Lychas mucronatus reveals intraspecific toxic gene diversity and new venomous components. BMC Genomics 2010, 11:452.
33. Hu H, Bandyopadhyay PK, Olivera BM, Yandell M: Characterization of the Conus bullatus genome and its venom-duct transcriptome. BMC Genomics 2011, 12:60.
34. Durban J, Juárez P, Angulo Y, Lomonte B, Flores-Diaz M, Alape-Girón A, Sasa M, Sanz L, Gutiérrez JM, Dopazo J, Conesa A, Calvete JJ: Profiling the venom gland transcriptomes of Costa Rican snakes by 454 pyrosequencing. BMC Genomics 2011, 12:259.
35. Gilles A, Meglécz E, Pech M, Ferreira S, Malausa T, Martin JF: Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics 2011, 12:245.
36. Rodrigue S, Materna AC, Timberlake SC, Blackburn MC, Malmstrom RR, Alm EJ, Chisholm SW: Unlocking short read sequencing for metagenomics. PLoS One 2010, 5:e11840.
37. Birol I, Jackman SD, Nielsen CB, Qian JQ, Varhol R, Stazyk G, Morin RD, Zhao Y, Hirst M, Schein JE, Horsman DE, Connors JM, Gascoyne RD, Marra MA, Jones SJM: De novo transcriptome assembly with ABySS. Bioinformatics 2009, 25:2872-2877.
38. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res 2009, 19:1117-1123.
39. Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008, 18:821-829.
40. Schulz MH, Zerbino DR, Vingron M, Birney E: Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 2012, 28:1086-1092.
41. Feldmeyer B, Wheat CW, Krezdorn N, Rotter B, Pfenninger M: Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance. BMC Genomics 2011, 12:317.
42. Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 2008, 36:e105.
43. Gibbs HL, Sanz L, Calvete JJ: Snake population venomics: proteomics-based analyses of individual variation reveals significant gene regulation effects on venom protein expression in Sistrurus rattlesnakes. J Mol Evol 2009, 68:113-125.
44. Fox JW, Serrano SMT: Structural considerations of the snake venom metalloproteinases, key members of the M12 reprolysin family of metalloproteinases. Toxicon 2005, 45:969-985.
45. Fox JW, Serrano SMT: Snake venom metalloproteinases. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:95-113.
46. Mackessy SP: Venom composition in rattlesnakes: trends and biological significance. In The Biology of Rattlesnakes. Edited by Hayes WK, Beaman KR, Cardwell MD, Bush SP. Loma Linda, California: Loma Linda University Press; 2008:495-510.
47. Du XY, Clemetson KJ: Reptile C-type lectins. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:359-375.
48. Walker JR, Nagar B, Young NM, Hirama T, Rini JM: X-ray crystal structure of a galactose-specific C-type lectin possessing a novel decameric quaternary structure. Biochemistry 2004, 43:3783-3792.
49. Serrano SMT, Maroun RC: Snake venom serine proteinases: sequence homology vs. substrate specificity, a paradox to be solved. Toxicon 2005, 45:1115-1132.
50. Phillips DJ, Swenson SD, Francis S, Markland J: Thrombin-like snake venom serine proteinases. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:139-154.
51. Lynch VJ: Inventing an arsenal: adaptive evolution and neofunctionalization of snake venom phospholipase A2 genes. BMC Evol Biol 2007, 7:2.


52. Doley R, Zhou X, Kini RM: Snake venom phospholipase A2 enzymes. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:173-205.
53. Rádis-Baptista G, Oguiura N, Hayashi MAF, Camargo ME, Grego KF, Oliveira EB, Yamane T: Nucleotide sequence of crotamine isoform precursors from a single South American rattlesnake (Crotalus durissus terrificus). Toxicon 1999, 37:973-984.
54. Oguiura N, Boni-Mitake M, Rádis-Baptista G: New view on crotamine, a small basic polypeptide myotoxin from South American rattlesnake venom. Toxicon 2005, 46:363-370.
55. Straight RC, Glenn JL, Wolt TB, Wolfe MC: Regional differences in content of small basic peptide toxins in the venoms of Crotalus adamanteus and Crotalus horridus. Comp Biochem Physiol B 1991, 100:51-58.
56. Tan NH, Fung SY: Snake venom L-amino acid oxidases. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:221-235.
57. Heyborne WH, Mackessy SP: Cysteine-rich secretory proteins in reptile venoms. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:325-336.
58. Yamazaki Y, Hyodo F, Morita T: Wide distribution of cysteine-rich secretory proteins in snake venoms: isolation and cloning of novel snake venom cysteine-rich secretory proteins. Arch Biochem Biophys 2003, 412:133-141.
59. Yamazaki Y, Morita T: Structure and function of snake venom cysteine-rich secretory proteins. Toxicon 2004, 44:227-231.
60. Pung YF, Wong PTH, Kumar PP, Hodgson WC, Kini RM: Ohanin, a novel protein from king cobra venom, induces hypolocomotion and hyperalgesia in mice. J Biol Chem 2005, 280:13137-13147.
61. Pung YF, Kumar SV, Rajagopalan N, Fry BG, Kumar PP, Kini RM: Ohanin, a novel protein from king cobra venom: its cDNA and genomic organization. Gene 2006, 371:246-256.
62. Junqueira-de-Azevedo ILM, Ching ATC, Carvalho E, Faria F, Nishiyama Jr MY, Ho PL, Diniz MRV: Lachesis muta (Viperidae) cDNAs reveal diverging pit viper molecules and scaffolds typical of cobra (Elapidae) venoms: implications for snake toxin repertoire evolution. Genetics 2006, 173:877-889.
63. Aird SD: Ophidian envenomation strategies and the role of purines. Toxicon 2002, 40:335-393.
64. Aird SD: The role of purine and pyrimidine nucleosides in snake venoms. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:393-419.
65. Dhananjaya BL, Vishwanath BS, D'Souza CJM: Snake venom nucleases, nucleotidases, and phosphomonoesterases. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:155-171.
66. Shafqat J, Zaidi ZH, Jörnvall H: Purification and characterization of a chymotrypsin Kunitz inhibitor type of polypeptide from the venom of cobra (Naja naja naja). FEBS Lett 1990, 275:6-8.
67. Harrison RA, Ibison F, Wilbraham D, Wagstaff SC: Identification of cDNAs encoding viper venom hyaluronidases: cross-generic sequence conservation of full-length and unusually short variant transcripts. Gene 2007, 392:22-33.
68. Kemparaju K, Girish KS, Nagaraju S: Hyaluronidases, a neglected class of glycosidases from snake venom: beyond a spreading factor. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:237-258.
69. Pawlak J, Kini RM: Snake venom glutaminyl cyclase. Toxicon 2006, 48:278-286.
70. Fry BG, Scheib H, van der Weerd L, Young B, McNaughtan J, Ramjan SFR, Vidal N, Poelmann RE, Norman JA: Evolution of an arsenal. Mol Cell Proteomics 2008, 7:215-246.
71. Rehana S, Kini RM: Molecular isoforms of cobra venom factor-like proteins in the venom of Austrelaps superbus. Toxicon 2007, 50:32-52.
72. Eggertsen G, Lind P, Sjöquist J: Molecular characterization of the complement activating protein in the venom of the Indian cobra (Naja n. siamensis). Mol Immunol 1981, 18:125-133.
73. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005, 21:3674-3676.
74. McCue MD: Cost of producing venom in three North American pitviper species. Copeia 2006, 2006:818-825.
75. Mackessy SP, Baxter LM: Bioweapons synthesis and storage: the venom gland of front-fanged snakes. Zool Anz 2006, 245:147-159.
76. Wang Q, Zhang Z, Blackwell K, Carmichael GG: Vigilins bind to promiscuously A-to-I-edited RNAs and are involved in the formation of heterochromatin. Curr Biol 2005, 15:384-391.
77. Nishikura K: Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 2010, 79:321-349.
78. Hartl FU: Molecular chaperones in cellular protein folding. Nature 1996, 381:571-580.
79. Fink AL: Chaperone-mediated protein folding. Physiol Rev 1999, 79:425-449.
80. Young JC, Agashe VR, Siegers K, Hartl FU: Pathways of chaperone-mediated protein folding in the cytosol. Nat Rev Mol Cell Biol 2004, 5:781-791.
81. Finley D: Recognition and processing of ubiquitin-protein conjugates by the proteasome. Annu Rev Biochem 2009, 78:477-513.
82. Buchberger A, Bukau B, Sommer T: Protein quality control in the cytosol and the endoplasmic reticulum: brothers in arms. Mol Cell 2010, 40:238-252.
83. Bagola K, Mehnert M, Jarosch E, Sommer T: Protein dislocation from the ER. Biochim Biophys Acta 2011, 1808:925-936.
84. Huang KF, Chiou SH, Ko TP, Wang AHJ: Determinants of the inhibition of a Taiwan habu venom metalloproteinase by its endogenous inhibitors by X-ray crystallography and synthetic inhibitor analogues. Eur J Biochem 2002, 269:3047-3056.
85. Richards R, St Pierre L, Trabi M, Johnson LA, de Jersey J, Masci PP, Lavin MF: Cloning and characterization of novel cystatins from elapid snake venom glands. Biochimie 2011, 93:659-668.
86. McCleary RJR, Heard DJ: Venom extraction from anesthetized Florida cottonmouths, Agkistrodon piscivorus conanti, using a portable nerve stimulator. Toxicon 2010, 55:250-255.
87. Rotenberg D, Bamberger ES, Kochva E: Studies on ribonucleic acid synthesis in the venom glands of Vipera palaestinae (Ophidia, Reptilia). Biochem J 1971, 121:609-612.
88. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5:621-628.
89. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Cheetham RK, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IMJ, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DMD, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E, Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Furey WS, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Jones TAH, Kang GD, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ng BL, Novo SM, O'Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Pinkard DC, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Rodriguez AC, Roe PM, Rogers J, Bacigalupo MCR, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Sohna JES, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL, Turcatti G, van de Vondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456:53-59.
90. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ, Griffith M, Raymond A, Thiessen N, Cezard T, Butterfield YS, Newsome R, Chan SK, She R, Varhol R, Kamoh B, Prabhu AL, Tam A, Zhao Y, Moore RA, Hirst M, Marra MA, Jones SJM, Hoodless PA, Birol I: De novo assembly and analysis of RNA-seq data. Nat Methods 2010, 7:909-912.

doi:10.1186/1471-2164-13-312
Cite this article as: Rokyta et al.: The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus). BMC Genomics 2012 13:312.




Research article

The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus)

Authors:
Darin R Rokyta (corresponding author; affiliation 1; drokyta@bio.fsu.edu)
Alan R Lemmon (affiliation 2; alemmon@evotutor.org)
Mark J Margres (mmargres@bio.fsu.edu)
Karalyn Aronow (karonow@bio.fsu.edu)

Affiliations:
1. Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295, USA
2. Department of Scientific Computing, Florida State University, Tallahassee, FL 32306-4120, USA

Source: BMC Genomics (ISSN 1471-2164) 2012, volume 13, issue 1, first page 312
Article ID: 1471-2164-13-312
URL: http://www.biomedcentral.com/1471-2164/13/312
DOI: 10.1186/1471-2164-13-312
PubMed ID: 23025625
History: received 14 March 2012; accepted 2 July 2012; published 16 July 2012
Copyright: 2012 Rokyta et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background
Snake venoms have significant impacts on human populations through the morbidity and mortality associated with snakebites and as sources of drugs, drug leads, and physiological research tools. Genes expressed by venom-gland tissue, including those encoding toxic proteins, have therefore been sequenced but only with relatively sparse coverage resulting from the low-throughput sequencing approaches available. High-throughput approaches based on 454 pyrosequencing have recently been applied to the study of snake venoms to give the most complete characterizations to date of the genes expressed in active venom glands, but such approaches are costly and still provide a far-from-complete characterization of the genes expressed during venom production.
Results
We describe the de novo assembly and analysis of the venom-gland transcriptome of an eastern diamondback rattlesnake (Crotalus adamanteus) based on 95,643,958 pairs of quality-filtered, 100-base-pair Illumina reads. We identified 123 unique, full-length toxin-coding sequences, which cluster into 78 groups with less than 1% nucleotide divergence, and 2,879 unique, full-length nontoxin coding sequences. The toxin sequences accounted for 35.4% of the total reads, and the nontoxin sequences for an additional 27.5%. The most highly expressed toxin was a small myotoxin related to crotamine, which accounted for 5.9% of the total reads. Snake-venom metalloproteinases accounted for the highest percentage of reads mapping to a toxin class (24.4%), followed by C-type lectins (22.2%) and serine proteinases (20.0%). The most diverse toxin classes were the C-type lectins (21 clusters), the snake-venom metalloproteinases (16 clusters), and the serine proteinases (14 clusters). The high-abundance nontoxin transcripts were predominantly those involved in protein folding and translation, consistent with the protein-secretory function of the tissue.
Conclusions
We have provided the most complete characterization of the genes expressed in an active snake venom gland to date, producing insights into snakebite pathology and guidance for snakebite treatment for the largest rattlesnake species and arguably the most dangerous snake native to the United States of America, C. adamanteus. We have more than doubled the number of sequenced toxins for this species and created extensive genomic resources for snakes based entirely on de novo assembly of Illumina sequence data.
Background
Human envenomation by snakes is a worldwide issue that claims more than 100,000 lives per year and exacts untold costs in the form of pain, disfigurement, and loss of limbs or limb function [1-3]. Despite the significance of snakebites, their treatments have remained largely unchanged for decades. The only treatments currently available are traditional antivenoms derived from antisera of animals, usually horses [4], inoculated with whole venoms [5,6]; such an approach is the only readily available option for largely uncharacterized, complex mixtures of proteins such as snake venoms. Although often lifesaving and generally effective against systemic effects, these antivenoms have little or no effect on local hemorrhage or necrosis [7-9], which are major aspects of the pathology of viperid bites and can result in lifelong disability [4,5]. These traditional treatments also sometimes lead to adverse reactions in patients [6]. Advances in treatment approaches will depend on a complete knowledge of the nature of the offending toxins, but current estimates of the numbers of unique toxins present in snake venoms are in excess of 100 [10], a number not approached in even the most extensive venom-characterization efforts to date [11].
The significance of snake venoms extends well beyond the selective pressures they may directly impose upon human populations. Snake venoms have evolutionary consequences for those species that snakes prey upon [12,13], as well as species that prey upon the snakes [14], and their study can therefore provide insights into predator-prey coevolution. Snake venom components have been leveraged as drugs and drug leads [15-17] and have been used directly as tools for studying physiological processes such as pain reception [18]. In addition to the significance of the toxins, the nature of the extreme specialization of snake venom glands for the rapid but temporary production and export of large quantities of protein could provide insights into basic mechanisms of proteostasis, the breakdown of which is thought to contribute to neurodegenerative diseases such as Parkinson’s and Alzheimer’s [19].
The eastern diamondback rattlesnake (Crotalus adamanteus) is a pit viper native to the southeastern United States and is the largest member of the genus Crotalus, reaching lengths of up to 2.44 m [20]. The diet of C. adamanteus consists primarily of small mammals (e.g., squirrels, rabbits, and mouse and rat species) and birds, particularly ground-nesting species such as quail [20]. Because of its extreme size and consequent large venom yield, C. adamanteus is arguably the most dangerous snake species in the United States and is one of the major sources of snakebite mortality throughout its range [21]. Crotalus adamanteus has recently become of interest from a conservation standpoint because of its declining range, which at one time included seven states along the southeastern Coastal Plain [22]. This species has now apparently been extirpated from Louisiana and is listed as endangered in North Carolina [23,24]. As a consequence of recent work by Rokyta et al. [11] based on 454 pyrosequencing, the venom of C. adamanteus is among the best-characterized snake venoms; 40 toxins have been identified.
Transcriptomic characterizations of venom glands of snakes [25-28] and other animals [29-32] have relied almost exclusively on low-throughput sequencing approaches. Sanger sequencing, with its relatively long, high-quality reads, has been the only method available until recently and has provided invaluable data on the identities of venom genes. Because venomous species are primarily nonmodel organisms, high-throughput sequencing approaches have been slow to pervade the field of venomics (but see Hu et al. [33]), despite becoming commonplace in other transcriptomic-based fields. Rokyta et al. [11] recently used 454 pyrosequencing to characterize venom genes for C. adamanteus. More recently, Durban et al. [34] used 454 sequencing to study the venom-gland transcriptomes of a mix of RNA from eight species of Costa Rican snakes. Whittington et al. [30] used a hybrid approach with both 454 and Illumina sequencing to characterize the platypus venom-gland transcriptome, although they had a reference genome sequence, making de novo assembly unnecessary. Pyrosequencing is expensive and low-throughput relative to Illumina sequencing, and the high error rate, particularly for homopolymer errors [35], significantly increases the difficulty of identifying coding sequences without reference sequences.
We sequenced the venom-gland transcriptome of the eastern diamondback rattlesnake with Illumina technology using a paired-end approach coupled with short insert sizes effectively to produce longer, high-quality reads on the order of approximately 150 nt to facilitate de novo assembly (an approach similar to that of Rodrigue et al. [36] for metagenomics). The difference in read length from that of 454 sequencing was compensated for by the increase of more than two orders of magnitude in the number of reads. We demonstrated de novo assembly and analysis of a venom-gland transcriptome using only Illumina sequences and provided a comprehensive characterization of both the toxin and nontoxin genes expressed in an actively producing snake venom gland.
Results and discussion
Venom-gland transcriptome sequencing and assembly
We generated a total of 95,643,958 pairs of reads that passed the Illumina quality filter for > 19 gigabases (Gb) of sequence from a cDNA library with an average insert size of ∼170 nt. Of these reads, 72,114,709 (75%) were merged (see Methods) on the basis of their 3’ overlap (Figure 1), yielding composite reads of average length 142 nt with average phred qualities > 40 and a total length > 10 Gb. This merging of reads reduced the effective size of the data set without loss of information and provided long reads to facilitate accurate assembly.
Figure 1. Merging overlapping reads. (A) Reads are slid along each other until the number of matches exceeds the significance threshold. In the example shown, the optimal overlap is 74 nucleotides (nt). (B) The quality of reads declines dramatically toward their 3’ ends, where overlap occurs if the fragment length is less than twice the read length, allowing the actual quality to be much higher than the nominal values. The example shown is the average of pairs that overlap by exactly 50 nt.
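The sliding-overlap idea described for Figure 1 can be illustrated with a minimal Python sketch. This is not the authors' implementation (their procedure and significance threshold are defined in the Methods); the function names, the `min_overlap` parameter, and the mismatch-fraction cutoff `max_mismatch_frac` are hypothetical stand-ins for the significance test, and base qualities are ignored.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.05):
    """Merge a forward read with its mate if their 3' ends overlap.

    The mate is reverse-complemented so both reads are on the same strand,
    then candidate overlaps are tried from longest to shortest. The first
    overlap whose mismatch fraction is within the cutoff is accepted
    (an assumed, simplified acceptance rule). Returns None if none passes.
    """
    rev_rc = reverse_complement(rev)
    for olap in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        tail = fwd[-olap:]      # 3' end of the forward read
        head = rev_rc[:olap]    # start of the reoriented mate
        mismatches = sum(x != y for x, y in zip(tail, head))
        if mismatches <= max_mismatch_frac * olap:
            # Keep the forward read's bases in the overlap, append the rest.
            return fwd + rev_rc[olap:]
    return None
```

For a fragment shorter than twice the read length, `merge_pair` returns a single composite read spanning the fragment; a production merger would also combine the two quality scores at each overlapping position, which is what allows the merged 3’ ends to exceed their nominal quality.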
Our first approach to transcriptome assembly was aimed at identifying toxin genes. We attempted to use as many of the data as possible to ensure the identification of even the lowest-abundance toxins. To this end, we conducted extensive searches of assembly parameter space for both ABySS [37,38] (Table 1) and Velvet [39] on the basis of the full set of both merged and unmerged reads. We used the assemblies with the best N50 values for further analysis. For Velvet, the assembly using a k-mer size of 91 was best (N50 = 408); this assembly was subsequently analyzed with Oases [40]. For ABySS, the best k-mer value was also 91 (N50 = 2,007), but because the performance in terms of full-length transcripts appeared to depend strongly on the coverage (c) and erode (e) parameters, we further analyzed the k = 91 assemblies with c = 10 and e = 2, c = 100 and e = 100, and c = 1000 and e = 1000. We identified all full-length toxins by means of blastx searches on the results of all four assemblies.
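Because assemblies are ranked here by N50, a short sketch of the statistic may help. The function below uses the standard definition (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembled length). Whether the reported values were computed over all contigs or only those above a length cutoff is not stated in this section, so the input set is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches half of the summed length; the length of the
    contig that crosses that point is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")
```

A higher N50 indicates that more of the assembled sequence sits in long contigs, which is why it serves as a convenient, though imperfect, proxy when scanning assembler parameters such as k, c, and e.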
Table 1. ABySS assembly summaries

k | c | e | Total contigs | Longest contig | Contigs > 200 nt | Contigs > N50 | Median | Mean | N50 | Total length
51 | 2 | 2 | 2,168,050 | 15,079 | 147,942 | 27,410 | 364 | 592 | 790 | 8.77 × 10^7
51 | 10 | 2 | 329,609 | 17,493 | 37,575 | 6,399 | 571 | 985 | 1,635 | 3.70 × 10^7
51 | 10 | 10 | 337,039 | 17,459 | 37,944 | 6,390 | 554 | 967 | 1,621 | 3.67 × 10^7
51 | 20 | 20 | 191,367 | 15,906 | 27,546 | 4,931 | 529 | 878 | 1,401 | 2.42 × 10^7
51 | 30 | 30 | 135,961 | 13,472 | 21,986 | 4,034 | 494 | 812 | 1,256 | 1.79 × 10^7
51 | 50 | 50 | 87,092 | 10,380 | 15,461 | 2,955 | 463 | 737 | 1,088 | 1.14 × 10^7
51 | 100 | 10 | 42,366 | 8,553 | 8,510 | 1,725 | 432 | 658 | 906 | 5.60 × 10^6
51 | 100 | 100 | 42,251 | 8,552 | 8,401 | 1,707 | 431 | 656 | 899 | 5.52 × 10^6
51 | 1000 | 1000 | 2,319 | 5,232 | 571 | 123 | 428 | 631 | 797 | 3.61 × 10^5
61 | 2 | 2 | 1,769,274 | 17,166 | 141,471 | 25,105 | 361 | 604 | 827 | 8.55 × 10^7
61 | 10 | 2 | 263,688 | 17,493 | 34,076 | 5,959 | 618 | 1,032 | 1,691 | 3.52 × 10^7
61 | 10 | 10 | 272,814 | 17,459 | 35,002 | 6,036 | 586 | 998 | 1,651 | 3.50 × 10^7
61 | 20 | 20 | 154,459 | 15,906 | 25,114 | 4,575 | 545 | 891 | 1,408 | 2.24 × 10^7
61 | 30 | 30 | 109,994 | 10,070 | 19,994 | 3,721 | 496 | 808 | 1,232 | 1.62 × 10^7
61 | 50 | 50 | 70,029 | 10,349 | 13,916 | 2,675 | 455 | 725 | 1,073 | 1.01 × 10^7
61 | 100 | 10 | 32,476 | 8,300 | 7,318 | 1,479 | 426 | 655 | 894 | 4.79 × 10^6
61 | 100 | 100 | 32,392 | 7,822 | 7,231 | 1,463 | 423 | 652 | 893 | 4.72 × 10^6
61 | 1000 | 1000 | 1,709 | 5,209 | 531 | 114 | 424 | 614 | 798 | 3.27 × 10^5
71 | 2 | 2 | 1,431,412 | 15,641 | 131,742 | 22,422 | 360 | 617 | 870 | 8.13 × 10^7
71 | 10 | 2 | 208,036 | 17,484 | 29,793 | 5,393 | 683 | 1,101 | 1,785 | 3.28 × 10^7
71 | 10 | 10 | 219,099 | 17,432 | 31,400 | 5,567 | 629 | 1,041 | 1,705 | 3.27 × 10^7
71 | 20 | 20 | 122,216 | 14,372 | 21,816 | 4,052 | 581 | 928 | 1,460 | 2.03 × 10^7
71 | 30 | 30 | 86,599 | 10,416 | 17,138 | 3,249 | 524 | 835 | 1,272 | 1.43 × 10^7
71 | 50 | 50 | 54,694 | 10,341 | 11,925 | 2,313 | 464 | 729 | 1,075 | 8.70 × 10^6
71 | 100 | 10 | 23,980 | 7,817 | 6,183 | 1,253 | 424 | 650 | 892 | 4.02 × 10^6
71 | 100 | 100 | 23,997 | 7,810 | 6,119 | 1,239 | 419 | 644 | 889 | 3.95 × 10^6
71 | 1000 | 1000 | 1,199 | 5,202 | 443 | 87 | 444 | 660 | 885 | 2.93 × 10^5
81 | 2 | 2 | 1,142,924 | 15,303 | 120,593 | 19,697 | 354 | 625 | 911 | 7.54 × 10^7
81 | 10 | 2 | 158,781 | 17,713 | 24,939 | 4,721 | 788 | 1,202 | 1,898 | 3.00 × 10^7
81 | 10 | 10 | 174,032 | 17,688 | 27,366 | 5,005 | 691 | 1,096 | 1,774 | 3.00 × 10^7
81 | 20 | 20 | 92,627 | 15,784 | 17,715 | 3,396 | 654 | 1,011 | 1,573 | 1.79 × 10^7
81 | 30 | 30 | 64,866 | 9,868 | 13,697 | 2,666 | 592 | 904 | 1,366 | 1.24 × 10^7
81 | 50 | 50 | 39,613 | 10,328 | 9,358 | 1,874 | 513 | 778 | 1,130 | 7.28 × 10^6
81 | 100 | 10 | 16,303 | 10,149 | 4,777 | 970 | 454 | 687 | 963 | 3.28 × 10^6
81 | 100 | 100 | 16,493 | 10,155 | 4,817 | 981 | 438 | 671 | 943 | 3.23 × 10^6
81 | 1000 | 1000 | 889 | 5,198 | 381 | 82 | 454 | 649 | 790 | 2.47 × 10^5
91 | 2 | 2 | 932,237 | 15,694 | 108,954 | 17,394 | 344 | 622 | 936 | 6.79 × 10^7
91 | 10 | 2 | 124,647 | 17,713 | 20,420 | 4,025 | 880 | 1,293 | 2,007 | 2.64 × 10^7
91 | 10 | 10 | 142,306 | 17,687 | 23,525 | 4,428 | 727 | 1,126 | 1,804 | 2.65 × 10^7
91 | 20 | 20 | 72,117 | 15,792 | 13,614 | 2,702 | 752 | 1,108 | 1,712 | 1.51 × 10^7
91 | 30 | 30 | 48,540 | 15,792 | 9,949 | 2,009 | 700 | 1,023 | 1,529 | 1.02 × 10^7
91 | 50 | 50 | 27,581 | 10,199 | 6,477 | 1,336 | 624 | 901 | 1,309 | 5.84 × 10^6
91 | 100 | 10 | 10,503 | 10,105 | 3,081 | 658 | 564 | 816 | 1,155 | 2.52 × 10^6
91 | 100 | 100 | 10,770 | 10,149 | 3,308 | 705 | 528 | 769 | 1,078 | 2.54 × 10^6
91 | 1000 | 1000 | 598 | 3,008 | 342 | 76 | 438 | 621 | 754 | 2.12 × 10^5
As part of our first approach, we also performed four independent de novo transcriptome assemblies with NGen: three with 20 million merged reads each and one with the remaining 12,114,709 merged reads (Table 2). We identified all full-length toxins from all four assemblies. Given that all three assembly methods tended to generate a large number of fragmented toxin sequences, apparently because of retained introns and possibly alternative splicing, we developed and implemented a simple hash-table approach to completing partial transcripts, which we will refer to as Extender (see Methods). We used Extender on partial toxin sequences identified for two of the four NGen assemblies. We also annotated the most abundant full-length nontoxin transcripts for the three assemblies based on 20 million reads. After combining all of the annotated toxin and nontoxin sequences from the ABySS, Velvet, and NGen assemblies and eliminating duplicates, we had 72 unique toxin sequences and 234 unique nontoxin sequences. The paucity of full-length annotated nontoxins reflects our focus on toxin sequences rather than their absence in the assemblies.
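Extender itself is specified in the Methods, not here, so the following is only a generic illustration of how a hash table of k-mers can be used to extend a partial sequence with reads. Every name and parameter (`build_index`, `extend_right`, `k`, `max_rounds`) is hypothetical, and the stop-on-ambiguity rule is an assumption made for the sketch, not a description of the authors' tool.

```python
from collections import defaultdict


def build_index(reads, k):
    """Hash every k-mer to the (read, offset) positions where it occurs."""
    index = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].append((read, i))
    return index


def extend_right(seed, reads, k=8, max_rounds=100):
    """Greedily extend `seed` at its 3' end using reads that share its tail.

    In each round the last k bases of the growing contig are looked up in
    the hash table, and the bases following that k-mer in each matching
    read are collected. Extension proceeds only if all candidates agree
    (each is a prefix of the longest one); otherwise it stops.
    """
    index = build_index(reads, k)
    contig = seed
    for _ in range(max_rounds):
        tail = contig[-k:]
        overhangs = {read[i + k:] for read, i in index.get(tail, [])
                     if len(read) > i + k}
        if not overhangs:
            break  # no read reaches past the current end
        longest = max(overhangs, key=len)
        if not all(longest.startswith(o) for o in overhangs):
            break  # conflicting continuations, e.g. a splice variant
        contig += longest
    return contig
```

Extending in the 5' direction can be done by applying the same routine to reverse-complemented sequences. A real tool would also have to tolerate sequencing errors and weigh read support, which this sketch deliberately omits.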
Table 2. NGen assembly summaries

Assembly | No. reads | No. contigs | No. contigs > 2k | Assembled sequences | Unique full-length toxins | Unique Extender toxins
NGen 1 | 20,000,000 | 12,694 | 4,403 | 9,786,054 | 34 | 54
NGen 2 | 20,000,000 | 12,746 | 4,439 | 9,821,212 | 36 | 54
NGen 3 | 20,000,000 | 12,698 | 4,412 | 9,820,553 | 38 |
NGen 4 | 12,114,709 | 8,484 | 3,078 | 5,948,003 | 34 |
Total unique full-length toxins = 154
Our second approach to transcriptome assembly was designed to annotate as many full-length coding sequences (toxin and nontoxin) as possible and to build a reference database of sequences to facilitate the future analysis of other snake venom-gland transcriptomes. We found that NGen was much more successful at producing transcripts with full-length coding sequences but also that it was quite inefficient when the coverage distribution was extremely uneven (see Figure 2). Feldmeyer et al. [41] also found NGen to have the best assembly performance with Illumina data. We sought therefore first to eliminate the transcripts and corresponding reads for the extremely high-abundance sequences. To do so, we employed Extender as a de novo assembler by starting from 1,000 individual high-quality reads and attempting to complete their transcripts (see Methods). From 1,000 seeds, we identified 318 full-length coding sequences with 213 toxins and 105 nontoxins. After duplicates were eliminated, this procedure resulted in 58 unique toxin and 44 unique nontoxin full-length transcripts. These sequences were used to filter the corresponding reads from the full set of merged reads with NGen. We then performed a de novo transcriptome assembly on 10 million of the filtered reads with NGen, annotated full-length transcripts from contigs comprising ≥ 200 reads with significant blastx hits, and used the resulting unique sequences as a new filter. This process of assembly, annotation, and filtering was iterated two more times. The end result was 91 unique toxin and 2,851 unique nontoxin sequences.
Figure 2. Domination of the C. adamanteus venom-gland transcriptome by toxin transcripts. The 123 unique toxin sequences were clustered into 78 groups with less than 1% nucleotide divergence for estimation of abundances. (A) The vast majority of the extremely highly expressed genes were toxins. The inset shows a magnification of the top 200 transcripts. (B) Expression levels of individual toxin clusters are shown with toxin classes coded by color. The toxin clusters are in the same order as in Table 3.
The results from both assembly approaches were merged to yield the final data set. The first approach produced 72 unique toxin and 234 unique nontoxin sequences, and the second 91 toxin and 2,851 nontoxin sequences. The merged data set consisted of 123 unique toxin sequences and 2,879 nontoxins that together accounted for 62.9% of the sequencing reads (Figure 3).
Figure 3. Expression levels of major classes of toxins and nontoxins. More than 60% of the total reads have been accounted for with full-length annotated transcripts. (A) The major toxin classes were the CTLs, SVSPs, MYO, and SVMPs (types II and III). (B) As expected for a protein-secreting tissue, the venom gland expresses an abundance of proteins involved in proteostasis.
Toxin transcripts
We identified 123 individual, unique toxin transcripts with full-length coding sequences. To estimate the abundances of these transcripts in the C. adamanteus venom-gland transcriptome, we clustered them into 78 groups with less than 1% nt divergence (Table 3). Clusters could include alleles, recent duplicates, or even sequencing errors, which are characteristic of high-throughput sequencing [42]. For longer genes, clusters might also include different combinations of variable sites that are widely separated in the sequence. We chose 1% as a practical, but arbitrary, cut-off for defining clusters. Mapping reads back to more similar sequences to estimate abundances would be problematic because reads could not be uniquely assigned to a particular sequence. The true number of toxin genes for C. adamanteus probably lies somewhere between 78 and 123. This range is at the lower end of the number of unique toxins typically identified for viperids by means of proteomic techniques [10], which may indicate that the venom of C. adamanteus is less complex than that of other species. Alternatively, posttranscriptional processes such as alternative splicing or posttranslational modifications could significantly increase the diversity of toxins present in the venom. Our identified toxins accounted for 35.4% of the total reads (Figure 3), and the vast majority of the extremely high-abundance transcripts were those encoding toxin proteins (Figure 2A). We named toxins with a combination of a toxin-class abbreviation, a cluster number, and, if the cluster had more than a single member, a lower-case letter to indicate the member of the cluster (e.g., CTL-3b).
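The clustering and naming rules in the preceding paragraph can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the authors' pipeline: sequences are assumed to be pre-aligned and of equal length so that divergence is just the fraction of mismatched positions, clusters are formed by single linkage, and the helper names are hypothetical.

```python
from string import ascii_lowercase


def divergence(a, b):
    """Fraction of positions that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(seqs, threshold=0.01):
    """Single-linkage clustering of named sequences.

    `seqs` maps a sequence name to its aligned sequence. A sequence joins
    (and merges) every existing cluster containing a member that is less
    than `threshold` divergent from it.
    """
    clusters = []
    for name, seq in seqs.items():
        linked = [c for c in clusters
                  if any(divergence(seq, seqs[m]) < threshold for m in c)]
        merged = [name]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return clusters


def name_members(toxin_class, cluster_number, members):
    """Apply the class-number(-letter) naming scheme, e.g. CTL-3b."""
    base = f"{toxin_class}-{cluster_number}"
    if len(members) == 1:
        return {members[0]: base}
    return {m: base + ascii_lowercase[i] for i, m in enumerate(members)}
```

With a 1% threshold, two 200-nt sequences that differ at one site (0.5%) fall into the same cluster, whereas a sequence differing at ten sites (5%) forms its own. Single linkage is only one possible choice; a centroid- or complete-linkage rule could split long chains of near-identical variants differently.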
Table 3. Expression levels of full-length toxin clusters

Rank | Cluster name | Cluster size | Length | % total reads | % toxin reads | GenBank TSA accessions
1 | MYO | 1 | 994 | 5.93 | 16.780 | JU173668
2 | LAAO | 1 | 3089 | 1.88 | 5.309 | JU173667
3 | SVSP-6 | 1 | 1720 | 1.71 | 4.849 | JU173733
4 | CTL-8 | 3 | 721 | 1.02 | 2.896 | a:JU173656, b:JU173657, c:JU173658
5 | SVSP-13 | 1 | 1661 | 1.01 | 2.864 | JU173724
6 | PLA2-1 | 2 | 904 | 9.68 × 10^-1 | 2.739 | a:JU173675, b:JU173676
7 | SVSP-3 | 2 | 1830 | 9.38 × 10^-1 | 2.653 | a:JU173728, b:JU173729
8 | SVMPII-5 | 8 | 2124 | 9.15 × 10^-1 | 2.587 | a:JU173694, b:JU173695, c:JU173696, d:JU173697, e:JU173698, f:JU173699, g:JU173700, h:JU173701
9 | CTL-4 | 5 | 780 | 9.14 × 10^-1 | 2.585 | a:JU173646, b:JU173647, c:JU173648, d:JU173649, e:JU173650
10 | SVMPII-1 | 5 | 2298 | 9.13 × 10^-1 | 2.583 | a:JU173682, b:JU173683, c:JU173684, d:JU173685, e:JU173686
11 | SVMPIII-2 | 5 | 2246 | 8.97 × 10^-1 | 2.538 | a:JU173707, b:JU173708, c:JU173709, d:JU173710, e:JU173711
12 | SVMPII-2 | 2 | 2138 | 8.38 × 10^-1 | 2.369 | a:JU173687, b:JU173688
13 | SVSP-1 | 1 | 3120 | 8.30 × 10^-1 | 2.348 | JU173726
14 | CTL-16 | 1 | 711 | 7.82 × 10^-1 | 2.211 | JU173631
15 | SVMPII-7 | 1 | 2082 | 7.74 × 10^-1 | 2.191 | JU173703
16 | SVMPII-3 | 4 | 1931 | 7.69 × 10^-1 | 2.176 | a:JU173689, b:JU173690, c:JU173691, d:JU173692
17 | PLA2-4 | 1 | 890 | 6.98 × 10^-1 | 1.974 | JU173679
18 | CTL-3 | 6 | 797 | 6.73 × 10^-1 | 1.905 | a:JU173640, b:JU173641, c:JU173642, d:JU173643, e:JU173644, f:JU173645
19 | SVMPII-6 | 1 | 2183 | 6.31 × 10^-1 | 1.784 | JU173702
20 | SVMPIII-3 | 3 | 2401 | 6.00 × 10^-1 | 1.698 | a:JU173712, b:JU173713, c:JU173714
21 | SVSP-9 | 1 | 10270 | 5.96 × 10^-1 | 1.685 | JU173738
22 | SVMPII-4 | 1 | 2016 | 5.41 × 10^-1 | 1.530 | JU173693
23 | SVSP-8 | 1 | 3524 | 5.41 × 10^-1 | 1.529 | JU173737
24 | CTL-7 | 1 | 763 | 5.40 × 10^-1 | 1.528 | JU173655
25 | CTL-18 | 1 | 775 | 5.33 × 10^-1 | 1.509 | JU173633
26 | CTL-1 | 1 | 763 | 5.15 × 10^-1 | 1.458 | JU173635
27 | SVSP-12 | 1 | 957 | 5.04 × 10^-1 | 1.424 | JU173723
28 | CTL-6 | 1 | 607 | 4.80 × 10^-1 | 1.358 | JU173654
29 | CRISP | 1 | 1579 | 4.66 × 10^-1 | 1.317 | JU173623
30 | SVMPIII-7 | 1 | 2343 | 4.31 × 10^-1 | 1.218 | JU173719
31 | SVMPII-8 | 1 | 1863 | 4.24 × 10^-1 | 1.198 | JU173704
32 | PLA2-6 | 1 | 834 | 4.15 × 10^-1 | 1.174 | JU173681
33 | SVSP-7 | 3 | 2108 | 3.59 × 10^-1 | 1.016 | a:JU173734, b:JU173735, c:JU173736
34 | CTL-15 | 1 | 947 | 3.59 × 10^-1 | 1.016 | JU173630
35 | PLA2-2 | 1 | 872 | 3.58 × 10^-1 | 1.013 | JU173677
36 | CTL-10 | 1 | 587 | 3.49 × 10^-1 | 0.987 | JU173624
37 | CTL-14 | 1 | 1246 | 3.36 × 10^-1 | 0.951 | JU173629
38 | CTL-13 | 1 | 680 | 3.15 × 10^-1 | 0.891 | JU173628
39 | PLA2-5 | 1 | 651 | 2.90 × 10^-1 | 0.819 | JU173680
40 | CTL-9 | 2 | 755 | 2.88 × 10^-1 | 0.814 | a:JU173659, b:JU173660
41 | BPP | 1 | 1300 | 2.57 × 10^-1 | 0.726 | JU173621
42 | SVMPIII-1 | 2 | 2665 | 2.41 × 10^-1 | 0.683 | a:JU173705, b:JU173706
43 | SVSP-14 | 1 | 1732 | 2.23 × 10^-1 | 0.631 | JU173725
44 | SVMPIII-4 | 2 | 2433 | 2.16 × 10^-1 | 0.611 | a:JU173715, b:JU173716
45 | CTL-2 | 2 | 749 | 1.90 × 10^-1 | 0.538 | a:JU173638, b:JU173639
46 | SVMPIII-8 | 1 | 2150 | 1.71 × 10^-1 | 0.484 | JU173720
47 | VESP | 1 | 1603 | 1.71 × 10^-1 | 0.483 | JU173741
48 | SVMPIII-5 | 1 | 2339 | 1.59 × 10^-1 | 0.449 | JU173717
49 | SVSP-4 | 2 | 2152 | 1.34 × 10^-1 | 0.379 | a:JU173730, b:JU173731
50 | CTL-20 | 1 | 785 | 1.19 × 10^-1 | 0.336 | JU173636
51 | NUC | 1 | 2706 | 1.16 × 10^-1 | 0.327 | JU173671
52
SVSP-5
1
1890
1.12 × 10−1
0.317
JU173732
53
SVMPIII-6
1
2367
1.10 × 10−1
0.311
JU173718
54
CTL-21
1
824
1.09 × 10−1
0.307
JU173637
55
CTL-19
1
618
1.05 × 10−1
0.296
JU173634
56
NF
1
1395
9.53 × 10−2
0.269
JU173669
57
CTL-12
2
825
9.40 × 10−2
0.266
a:JU173626,
b:JU173627
58
SVSP-2
1
1675
7.35 × 10−2
0.208
JU173727
59
CTL-5
3
637
5.83 × 10−2
0.165
a:JU173651,
b:JU173652,
c:JU173653
60
PDE
1
2743
5.62 × 10−2
0.159
JU173674
61
CTL-11
1
625
4.86 × 10−2
0.137
JU173625
62
PLA2-3
1
957
3.22 × 10−2
0.091
JU173678
63
CREGF
1
1945
3.09 × 10−2
0.087
JU173622
64
SVSP-10
1
1815
2.22 × 10−2
0.063
JU173721
65
HYAL-1
1
2545
2.10 × 10−2
0.059
JU173662
66
KUN
1
1698
1.14 × 10−2
0.032
JU173666
67
HYAL-2
1
1302
7.83 × 10−3
0.022
JU173663
68
KUN-1
1
2575
5.11 × 10−3
0.014
JU173664
69
VEGF-1
2
906
4.99 × 10−3
0.014
a:JU173739,
b:JU173740
70
SVSP-11
1
1207
4.46 × 10−3
0.013
JU173722
71
GC
1
1730
3.65 × 10−3
0.010
JU173661
72
PDE-6
1
3691
3.36 × 10−3
0.010
JU173673
73
NGF
1
951
3.14 × 10−3
0.009
JU173670
74
KUN-2
1
1438
2.68 × 10−3
0.008
JU173665
75
PDE-4
1
2633
2.24 × 10−3
0.006
JU173672
76
VF
1
5087
1.70 × 10−3
0.005
JU173742
77
WAP
1
627
5.60 × 10−4
0.002
JU173743
78
CTL-17
1
774
3.80 × 10−4
0.001
JU173632
We used the number or percentage of reads mapping to a particular transcript as a measure of its abundance. Although average coverage might be a more appropriate proxy for the number of copies of a given transcript present, because it accounts for differences in transcript lengths, we prefer read counts as a measure of the expression expenditure on a given transcript because they better reflect the energetic cost associated with producing the encoded protein and are consistent with previous work using low-throughput sequencing (see, e.g., Pahari et al. [25]). In addition, this measurement should more closely match proteomic-based measurements of the contents of venom components (see, e.g., Gibbs et al. [43]), which come in the form of the percentages of total peptide bonds in the sample.
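The difference between the two abundance measures can be illustrated with a few rows of Table 3. The sketch below is not the authors' pipeline. It assumes that the "% toxin reads" column is the "% total reads" column rescaled by the 35.4% toxin share reported in the text, and it uses read share per kilobase of transcript as a stand-in for average coverage. The function and variable names are hypothetical.

```python
# Share of all reads that map to toxins, as reported in the text.
TOXIN_SHARE_PCT = 35.4

# (length in nt, % total reads) copied from Table 3 for four clusters.
TABLE3_SUBSET = {
    "MYO": (994, 5.93),
    "LAAO": (3089, 1.88),
    "CTL-8": (721, 1.02),
    "SVSP-9": (10270, 0.596),
}


def pct_of_toxin_reads(pct_total_reads: float) -> float:
    """Rescale a share of all reads to a share of toxin reads."""
    return 100.0 * pct_total_reads / TOXIN_SHARE_PCT


def pct_reads_per_kb(length_nt: int, pct_total_reads: float) -> float:
    """Length-normalized read share, a rough proxy for average coverage."""
    return pct_total_reads / (length_nt / 1000.0)


# Ranking by read share (the measure used in the paper).
by_reads = sorted(TABLE3_SUBSET, key=lambda n: TABLE3_SUBSET[n][1], reverse=True)

# Ranking after normalizing by transcript length.
by_density = sorted(
    TABLE3_SUBSET, key=lambda n: pct_reads_per_kb(*TABLE3_SUBSET[n]), reverse=True
)
```

For this subset, the rescaled values agree with the "% toxin reads" column to within rounding of the 35.4% figure. Normalizing by length moves the short CTL-8 transcript ahead of the much longer LAAO transcript, which is the kind of reordering the choice of read counts over coverage avoids.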
Snake venom metalloproteinases
We identified 39 unique sequences and 16 clusters of snake-venom metalloproteinases (SVMPs) that accounted for 24.4% of the reads mapping to toxin sequences and 8.6% of the total reads (Figure 3A and Table 3). In terms of total reads, the SVMPs were the most abundant class of toxins in the C. adamanteus venom-gland transcriptome. SVMPs are the primary sources of the local and systemic hemorrhage associated with envenomation by viperids and are divided into a number of subclasses based on their domain structure [44,45]. All SVMPs have a metalloproteinase domain characterized by a zinc-binding motif. All of the SVMPs identified for C. adamanteus belong to either the type II or the type III subclass. Type II SVMPs (SVMPIIs) have a disintegrin domain in addition to the metalloproteinase domain, which may be proteolytically cleaved posttranslationally to produce a free disintegrin. Type III SVMPs (SVMPIIIs) have a disintegrin-like and a cysteine-rich domain in addition to the metalloproteinase domain. We found 8 clusters of each of these two subclasses with 23 unique SVMPII sequences and 16 unique SVMPIII sequences. SVMPII and SVMPIII clusters comprise 16.4% and 8.0% of the reads mapping to toxins, respectively (Figure 3). The sequences in both subclasses are diverse. The maximum pairwise nt divergence for the SVMPIIs was 10.0%, corresponding to a maximum amino-acid divergence of 18.1%. For the SVMPIIIs, the maximum pairwise nt divergence was 20.4% with a maximum amino-acid divergence of 42.3%. Although SVMPs were the dominant toxins as a class, the individual SVMP cluster with the highest abundance was SVMPII-5, which was only the eighth most abundant toxin cluster (Figure 2B and Table 3).
Mackessy [46] categorized rattlesnake venoms as type I or type II on the basis of their toxicities and metalloproteinase activities. These two measurements tend to be inversely related in rattlesnakes: species (or populations) with low LD50 values tend also to have low or undetectable hemorrhagic activities. SVMPs are the major hemorrhagic components of snake venoms, and high toxicity appears to be caused mostly by neurotoxic venom components. Low-toxicity venoms with high metalloproteinase activity are classified as type I, and high-toxicity venoms with low metalloproteinase activity are classified as type II. On the basis of the abundance of SVMPs in the venom-gland transcriptome, C. adamanteus clearly has type I venom, although the relatively low toxicity of its venom [46] is at least partially compensated for by its large size and venom yield.
C-type lectins
The most diverse and the second most abundant toxin class in the C. adamanteus venom-gland transcriptome was the C-type lectin (CTL) class. We identified 37 unique sequences and 21 clusters of CTLs that accounted for 22.2% of the reads mapping to toxins and 7.8% of the total reads (Figure 3A and Table 3). CTLs generally either inhibit or activate components of plasma or blood-cell types, thereby interfering with hemostasis [47]. Most known snake-venom CTLs function as heterodimers or even more complex arrangements [48], probably accounting in part for their diversity. The divergence among members of this class within the C. adamanteus genome was extreme, although all members preserved a CTL-like domain. Some pairs shared virtually no conserved amino-acid positions. Three of the CTL clusters provide evidence for the relevance of alternative splicing in the generation of toxin proteins. CTL-3f, CTL-4e, and CTL-9b all have 48-nt insertions in the same region but are otherwise similar or identical to other members of their clusters.
Snake venom serine proteinases
The third most abundant toxin class for C. adamanteus was the snake-venom serine proteinases (SVSPs). We identified 18 unique sequences and 14 clusters in this toxin class, accounting for 20.0% of the toxin reads and 7.1% of the total reads (Figure 3A and Table 3). Three of the 10 most highly expressed individual toxins were SVSPs (Figure 2). SVSPs interfere with a wide array of reactions involving blood coagulation and hemostasis and belong to the trypsin family of serine proteases [49,50]. Mackessy [46] detected significant thrombin-like and kallikrein-like activities in the venom of C. adamanteus, which are attributable to the action of SVSPs. The diversity of SVSPs within the C. adamanteus genome is high; maximum pairwise nt divergence is 20.6% and amino-acid divergence is 47.4%.
The members of two SVSP clusters differ in a way that should be noted. The lengths of SVSPs are generally well conserved throughout the class. SVSP-7a has a 27-nt insertion relative to the two other members of its cluster but is otherwise identical to SVSP-7b. This difference could reflect the presence of alternative splicing for this gene. SVSP-3a is unique among the C. adamanteus SVSPs or those known from other snake species in apparently having a 65-amino-acid extension of its C-terminal region. The other member of its cluster, SVSP-3b, has a single deletion of a C nt in a poly-C tract that terminates its coding sequence consistently with other known SVSPs. The reads generating the SVSP-3a form vastly outnumber those for the SVSP-3b form; more than 95% of the reads support the extended version of the protein. The effect, if any, of this C-terminal extension remains to be determined.
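The effect of a single-nucleotide deletion in a homopolymer tract on the position of the stop codon can be shown with a toy example. The sequence below is entirely hypothetical and far shorter than any SVSP transcript; only the mechanism is illustrated. The codon table is the standard genetic code, and the `translate` helper is a name introduced here for illustration.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate from the first base, stopping at the first in-frame stop codon."""
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence with a six-C tract (long, read-through form).
long_form = "ATGAAACCCCCCTTGACTGGATAA"

# Deleting one C from the tract shifts the frame and exposes an earlier TGA stop.
short_form = long_form.replace("CCCCCC", "CCCCC", 1)

long_protein = translate(long_form)
short_protein = translate(short_form)
```

In this toy case the six-C form encodes seven residues, whereas the five-C form terminates after four, analogous to the way the SVSP-3b deletion restores the conventional stop position while SVSP-3a reads through into a C-terminal extension.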
Phospholipase A2’s
Previous work with C. adamanteus identified only a single phospholipase A2 (PLA2) sequence [11], but we identified seven unique sequences in six clusters (Figure 2 and Table 3), accounting for 7.8% of the toxin reads and 2.8% of the total reads (Figure 3). PLA2s are among the most functionally diverse classes of snake-venom toxins and have pharmacological effects ranging from neurotoxicity (presynaptic or postsynaptic) to myotoxicity and cardiotoxicity. Anticoagulant and hemolytic effects due to PLA2s are also known [51,52]. Compared to other toxin classes of C. adamanteus, the diversity of PLA2s is low. Five of the six clusters are all within 5% nt divergence of one another. PLA2-3 is the lone, high-divergence outlier, differing by more than 31% at the nt level from the other clusters. PLA2-3 is also expressed at the lowest level of any of the PLA2s (Table 3).
Other high-abundance toxins
The SVMPs, CTLs, SVSPs, and PLA2s account for 74% of the reads mapping to toxin sequences (Figure 3), 73% of the toxin clusters, and 82% of the unique toxin sequences. The remaining toxins belong to 16 different classes. Many of these are low-abundance transcripts (Figure 2 and Table 3) and may not actually function as significant toxins, whereas several others have high to moderate abundances and represent significant components of the venom.
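The three summary percentages can be recomputed from the per-class counts given in the preceding subsections. The short check below uses only figures stated in the text; the dictionary layout and variable names are illustrative.

```python
# (unique sequences, clusters, % of toxin reads) for the four major classes,
# taken from the class-level subsections above.
MAJOR_CLASSES = {
    "SVMP": (39, 16, 24.4),
    "CTL": (37, 21, 22.2),
    "SVSP": (18, 14, 20.0),
    "PLA2": (7, 6, 7.8),
}
TOTAL_SEQUENCES = 123
TOTAL_CLUSTERS = 78

major_sequences = sum(v[0] for v in MAJOR_CLASSES.values())
major_clusters = sum(v[1] for v in MAJOR_CLASSES.values())
major_read_pct = sum(v[2] for v in MAJOR_CLASSES.values())

pct_sequences = 100 * major_sequences / TOTAL_SEQUENCES
pct_clusters = 100 * major_clusters / TOTAL_CLUSTERS
remaining_clusters = TOTAL_CLUSTERS - major_clusters
```

The four classes contribute 101 of 123 sequences and 57 of 78 clusters, leaving 21 clusters for the remaining classes.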
The most abundant toxin transcript and the most abundant transcript overall (Figure 2) was a small basic myotoxin related to crotamine [53,54]. The precursor protein is just 70 amino acids in length with a predicted 22-amino-acid signal peptide. This transcript was detected by Rokyta et al. [11], but the coding sequence was prematurely truncated in their sequence because of a single nt deletion. This toxin accounts for 16.8% of the toxin reads (Figure 3A) and 5.9% of the total reads. Crotamine, originally isolated from the venom of C. durissus, causes spastic paralysis in mice and is found in the venoms of many species of Crotalus [54]. Muscle spasms, twitching, and paralysis of the legs have been reported for human envenomations by C. adamanteus [20]. Interestingly, Straight et al. [55] noted that individuals of C. adamanteus from populations in southern and central Florida lack this toxin in their venoms. Given that this myotoxin is the most abundant transcript in the venom of our specimen, its absence in southern populations points to a dramatic difference in venoms within this species and the potential for significantly different pathological effects associated with bites from different C. adamanteus populations.
A single L-amino-acid oxidase (LAAO) transcript was the second most abundant toxin transcript (Figure 2B), consistent with the previously detected LAAO activity in the C. adamanteus venom [46]. This single transcript accounted for 5.3% of the reads mapping to toxins and 1.9% of the total reads. LAAOs are flavoproteins, giving the venom its yellow color; can be edema- or apoptosis-inducing; and can induce or inhibit platelet aggregation [56]. These effects are probably mediated by H2O2 released during the oxidation reaction catalyzed by the enzyme. The 29th most abundant toxin transcript was a cysteine-rich secretory protein (CRISP) (Figure 2B and Table 3), accounting for 1.3% of the toxin reads (Figure 3A). Although CRISPs are widely found in snake venoms, their precise effects are not well established [57], but they appear to interfere with smooth-muscle contraction [58,59]. A single bradykinin-potentiating and C-type natriuretic peptide (BPP) transcript was found to account for 0.7% of the toxin reads (Figure 3A). The encoded protein is similar to a protein identified in Sistrurus catenatus (GenBank accession: DQ464265) that was hypothesized to reduce blood pressure in envenomated prey [25]. A loss of blood pressure has been reported in human envenomations by C. adamanteus [20].
Other low-abundance toxins
The remaining 17 clusters are classified as “others” in Figure 3A. Because each has a relatively low expression level (Table 3), many of these should be considered putative toxins until their presence in the C. adamanteus venom is confirmed proteomically and pharmacological effects are associated with them.
Rokyta et al. [11] detected the presence of a transcript encoding a protein homologous to ohanin from Ophiophagus hannah [60,61] and to a homologous protein from Lachesis muta [62]; we found a transcript identical to that of Rokyta et al. [11]. Pung et al. [60,61] found the O. hannah version of this protein to increase pain sensitivity (hyperalgesia) and to induce temporary hypolocomotion in mice and proposed naming the class vespryns (VESP). Exceptionally intense pain has been reported after envenomation of humans by C. adamanteus [20], although whether such pain is due to a specific toxin is not clear.
We detected three different nucleotidases (NUCs) and five different phosphodiesterases (PDEs) in the venom-gland transcriptome of C. adamanteus. Only one of the NUCs and three of the PDEs had signal peptides, and we therefore considered only these as potential toxins: NUC, PDE, PDE-4, and PDE-6 (Table 3). The roles of these enzymes in venoms are uncertain, but their primary function may be to liberate toxic nucleosides [63,64,65]. Significant PDE activity has been detected previously in the venom of C. adamanteus [46].
The C. adamanteus venom-gland transcriptome contained three Kunitz-type protease inhibitors (KUNs). Two of these shared more than 75% amino-acid identity with a KUN from Austrelaps labialis (GenBank accession: B2BS84), an Australian elapid. All three KUNs have domains that place them in the superfamily of bovine pancreatic trypsin-like inhibitors, and snake toxins from this family are known to inhibit plasma serine proteinases. Although KUNs are commonly observed in snake venoms, their role in envenomation (if any) is not well defined [66]. The three KUNs detected for C. adamanteus are all at relatively low abundances, suggesting that they are not major components of the venom.
We identified two transcripts, HYAL-1 and HYAL-2, encoding hyaluronidase-like proteins. Hyaluronidases are generally regarded as venom components that promote the dissemination of other venom components by degrading the extracellular matrix at the site of injection [67], although they may have more direct toxic effects [68]. The coding sequences of our two transcripts differ only in the presence of a 765-nt deletion in HYAL-2 relative to HYAL-1. Truncated hyaluronidases such as HYAL-2 have been detected in the venoms of other viperid species [67] and may represent an example of alternative splicing. We also identified a transcript encoding a glutaminyl-peptide cyclotransferase (glutaminyl cyclase; GC). Many snake venom components have N termini blocked by pyroglutamate, and GCs catalyze the formation of this block. This component is related more to maturation and protection of other toxins and probably contributes only indirectly to toxicity [69].
We identified six growth-factor-related sequences in the venom-gland transcriptome of C. adamanteus: a nerve growth factor (NGF), a neurotrophic factor (NF), two vascular endothelial growth factors (VEGF) in a single cluster, and a cysteine-rich with EGF-like domain protein (CREGF). The NGF transcript encodes a 241-amino-acid precursor protein and shares 99% amino-acid identity with an NGF from C. durissus (GenBank accession: AAG30924). The NF transcript encodes a 180-amino-acid precursor that shares homology with mesencephalic astrocyte-derived neurotrophic factors. We found no close venom-related sequences for this NF in the available databases. The VEGF sequences appear to be alternatively spliced versions of one another. VEGF-1a encodes a 192-amino-acid precursor, and VEGF-1b encodes a 148-amino-acid precursor. Aside from the 132-nt deletion in VEGF-1b relative to VEGF-1a, their coding sequences are identical. Both forms have database matches of the same length with 99% amino-acid identity from Trimeresurus flavoviridis (GenBank accessions: AB154418 and AB154419). Finally, we detected the same cysteine-rich with EGF-like domain protein as described by Rokyta et al. [11].
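The length differences attributed to possible alternative splicing in the sections above are all whole numbers of codons, which is what would be expected if each variant preserves the reading frame. The check below uses only the lengths stated in the text (48 nt for the CTL insertions, 27 nt for SVSP-7a, 765 nt for HYAL-2, and 132 nt for VEGF-1b); the labels and helper name are illustrative.

```python
# Insertion/deletion lengths (nt) reported in the text for putative splice variants.
INDELS_NT = {
    "CTL-3f/4e/9b insertion": 48,
    "SVSP-7a insertion": 27,
    "HYAL-2 deletion": 765,
    "VEGF-1b deletion": 132,
}


def codons_if_in_frame(length_nt: int):
    """Return the number of codons if the length is a multiple of 3, else None."""
    codons, remainder = divmod(length_nt, 3)
    return codons if remainder == 0 else None


codon_changes = {name: codons_if_in_frame(n) for name, n in INDELS_NT.items()}

# Precursor lengths stated for the two VEGF forms (amino acids).
vegf_precursor_difference = 192 - 148
```

The 132-nt VEGF deletion corresponds to 44 codons, matching the difference between the 192- and 148-amino-acid precursors. By contrast, the single-nt deletion separating SVSP-3b from SVSP-3a is not a multiple of three and therefore changes the reading frame.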
The final two putative toxin transcripts are of questionable significance because of their low expression levels. A single sequence with 77% amino-acid identity to a waprin (WAP) sequence from Philodryas olfersii (GenBank accession: EU029742), a rear-fanged colubrid, was detected. Related sequences have been detected in a variety of other rear-fanged snake species, but such proteins are only known to exhibit antimicrobial activity [70]. We detected a venom factor (VF) transcript that shares 87% amino-acid identity with a VF from Austrelaps superbus (GenBank accession: AY903291) [71]. The C. adamanteus VF transcript encodes a 1,652-amino-acid precursor with a 22-amino-acid signal peptide. The best-studied member of this toxin family is cobra venom factor, which is known to activate the complement system [72]. The extremely low expression levels of these transcripts may indicate that they represent the orthologous genes to the ancestors of the known toxic forms and may therefore have no toxic functions.
Comparison to previous work
Rokyta et al. [11] previously described toxin transcripts in the venom-gland transcriptome of C. adamanteus on the basis of 454 pyrosequencing. Their work used RNA from the venom gland of the same individual used in the present work. They found 40 unique toxin transcripts, 10 of which contained only partial coding sequences. Table 4 lists the closest matches from our current sequences to those of Rokyta et al. [11]. The vast majority of the 454-based sequences had either identical matches in our current set of toxins or matches with less than 1% nt divergence (Table 4). Only a single 454 toxin, SVSP-9, did not have a close match. This sequence contains only a partial coding sequence and therefore may not represent a true, functional toxin.
Table 4
Correspondence with the results of Rokyta et al. [11]
454 name | Accession | Closest match | % nt divergence | Notes
CREGF | HQ414087 | CREGF | 0.1 |
CRISP | HQ414088 | CRISP | 0.0 | Identical
CTL-1 | HQ414089 | CTL-4a | 0.0 | Identical
CTL-2 | HQ414090 | CTL-8a | 0.0 | Identical
CTL-3 | HQ414091 | CTL-1 | 0.0 | Identical
CTL-4 | HQ414092 | CTL-9a | 0.8 |
CTL-5 | HQ414093 | CTL-3e | 0.9 |
CTL-6 | HQ414094 | CTL-12b | 0.0 | Identical
CTL-7 | HQ414095 | CTL-10 | 0.0 | Identical
CTL-8 | HQ414096 | CTL-2a | 0.0 | Identical
CTL-9 | HQ414097 | CTL-5a | 0.0 | Identical
HYAL | HQ414098 | HYAL-1 | 0.0 | 454 version incomplete
LAAO | HQ414099 | LAAO | 0.0 | Identical
MYO | HQ414100 | MYO | 0.0 | 454 version has 1-nt deletion that truncates the coding sequence prematurely
NUC | HQ414101 | NUC | 0.0 | 454 version incomplete
PDE-1 | HQ414102 | PDE | 0.0 | 454 version has 123-nt insertion
PDE-2 | HQ414103 | PDE-2 (nontoxin) | 0.0 | 454 version incomplete; no signal peptide; no longer considered toxin
PLA2 | HQ414104 | PLA2-1b | 0.0 | Identical
PLB | HQ414105 | PLB (nontoxin) | 0.2 | No longer considered toxin
SVMP-1 | HQ414106 | SVMPII-3b | 0.0 | Identical
SVMP-2 | HQ414107 | SVMPII-3b/c | 0.5 |
SVMP-3 | HQ414108 | SVMPII-5a | 0.3 |
SVMP-4 | HQ414109 | SVMPIII-2d | 1.2 |
SVMP-5 | HQ414110 | SVMPIII-4b | 1.0 |
SVMP-6 | HQ414111 | SVMPIII-2d | 0.2 |
SVMP-7 | HQ414112 | SVMPIII-4a | 0.0 | Identical
SVMP-8 | HQ414113 | SVMPIII-5 | 0.5 | 454 version incomplete
SVMP-9 | HQ414114 | SVMPIII-1a/b | 0.0 | 454 version incomplete
SVMP-10 | HQ414115 | SVMPIII-6 | 0.0 | 454 version incomplete
SVMP-11 | HQ414116 | SVMPIII-3a | 0.0 | 454 version incomplete
SVSP-1 | HQ414117 | SVSP-3a | 0.0 | 454 version has 1-nt deletion that truncates the coding sequence prematurely
SVSP-2 | HQ414118 | SVSP-1 | 0.0 | Identical
SVSP-3 | HQ414119 | SVSP-7a | 0.0 | Identical
SVSP-4 | HQ414120 | SVSP-5 | 0.1 |
SVSP-5 | HQ414121 | SVSP-9 | 0.5 |
SVSP-6 | HQ414122 | SVSP-6 | 0.0 | Identical
SVSP-7 | HQ414123 | SVSP-4b | 0.0 | 454 version incomplete
SVSP-8 | HQ414124 | SVSP-2 | 0.0 | 454 version incomplete
SVSP-9 | HQ414125 | None | >10 | 454 version incomplete
VESP | HQ414126 | VESP | 0.0 | Identical
Nontoxin transcripts
We characterized the nontoxin genes expressed in the C. adamanteus venom gland by two means. First, we took all of the contigs from one of our four de novo NGen assemblies based on 20 million merged reads and conducted a full Blast2GO [73] analysis on the contigs comprising ≥ 100 reads. Of the 12,746 contigs (assembly 2 in Table 2), we were able to provide gene ontology (GO) annotations for 9,040 (Figure 4A). The major functional classes (level 2) represented in these results were binding and catalysis, followed by transcription regulation (Figure 4B). The major biological process GO terms (level 2) were cellular processes and metabolic processes (Figure 4C). Interestingly, viral reproductive function was detected and probably represents the activity of transposable elements or retroviruses like those previously noted in snake venom-gland transcriptomes [34]. The major cellular component GO terms (level 2) were cell and organelle (Figure 4D). For these results, we made no attempt to exclude toxin sequences, because they are necessarily a small minority of the total sequences, and did not require that contigs contain full-length coding sequences.
Figure 4. Comparison of gene ontology (GO) results for our annotated full-length nontoxin sequences with those of the contigs from a de novo assembly with NGen. Only level 2 GO terms are shown. The distributions of GO terms are similar across data sets, suggesting that the annotated transcripts provided a comprehensive characterization of the genes expressed in the venom gland. (A) The distributions of sequences reaching various stages of identification and annotation are shown. The level 2 GO terms are shown for molecular function (B), biological process (C), and cellular component (D).
For our second approach, we used only the 2,879 transcripts with full-length coding sequences for nontoxin proteins. We analyzed these sequences with Blast2GO. The distributions of level 2 GO terms for these data were almost identical to those of the full NGen assembly described above (Figure 4), suggesting that our 2,879 annotated nontoxin sequences provide a representative sample of the full venom-gland transcriptome. The full distributions of GO terms for these sequences across all levels are shown in Figures 5, 6, and 7. As expected for a secretory tissue, processes related to protein production and secretion were well represented (e.g., protein transport and protein modification; Figure 5), as were protein-binding functions (Figure 6) and proteins localized to the endoplasmic reticulum (ER) and the Golgi apparatus (Figure 7).
Figure 5. The biological-process GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.
Figure 6. The molecular-function GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.
Figure 7. The cellular-components GO terms identified for the 2,879 annotated full-length nontoxin sequences. Terms specific for the production, processing, and export of proteins are highlighted in black. The inset shows the low-abundance portion of the full distribution.
Four of the top 20 most highly expressed nontoxin genes (Table 5), including the most highly expressed, were protein disulfide isomerases (PDIs). In particular, they were members of the PDI family that is retained in the ER and are characterized by having two or more PDI domains, which are similar to thioredoxin. PDIs catalyze the formation or breaking of disulfide bonds and are therefore involved in protein folding. Molecular chaperones were well represented in the top 20 nontoxins by four genes: endoplasmin (a member of the HSP90 family), calreticulin, 78-kDa glucose-regulated protein (GRP78), and heat shock protein 5. The latter gene appears to be a splice variant of GRP78, differing within the coding region by two point mutations and two short deletions. All of these chaperones are ER specific. Six of the top 20 nontoxins were mitochondrial genes involved in oxidative cellular respiration, consistent with the high energetic demands of venom production [74]: cytochrome C oxidase subunits I and III, cytochrome B, and NADH dehydrogenase subunits 1, 4, and 5. The cells of venom glands are particularly rich in mitochondria [75]. Four genes were involved in various aspects of translation: two translation elongation factors, 18S rRNA, and vigilin. Vigilins are hypothesized to be involved in regulating mRNA stability and translation and might be involved in RNA-mediated gene silencing [76,77]. The final top 20 nontoxin gene was actin, a component of the cytoskeleton.
Table 5
The 20 most highly expressed nontoxin transcripts
16
Name | Length | % reads | Function | Accession
Protein disulfide isomerase | 2970 | 5.223 | Rearrange disulfide bonds (ER) | JU175360
Cytochrome C oxidase subunit I | 2789 | 0.966 | Electron transport chain | JU175042
Cytochrome B | 1275 | 0.499 | Electron transport chain | JU175040
Translation elongation factor 1 α 1 | 1985 | 0.459 | Translation | JU174424
18S rRNA | 2509 | 0.421 | Ribosomal component | JU173759
Calreticulin | 1661 | 0.406 | Protein chaperone (ER) | JU174061
Endoplasmin (HSP90 family) | 3012 | 0.333 | Protein chaperone (ER) | JU174456
78 kDa glucose-regulated protein | 2676 | 0.332 | Protein chaperone (ER) | JU174713
Heat shock protein 5 (GRP78 splice variant?) | 2063 | 0.327 | Protein chaperone (ER) | JU174801
NADH dehydrogenase subunit 5 | 4448 | 0.272 | Electron transport chain | JU175113
Cytochrome C oxidase subunit III | 2103 | 0.239 | Electron transport chain | JU175043
Protein disulfide isomerase A6 | 4933 | 0.212 | Rearrange disulfide bonds (ER) | JU175358
Nucleobindin 2 | 2937 | 0.203 | Calcium binding | JU175278
Protein disulfide isomerase A3 | 1904 | 0.186 | Rearrange disulfide bonds (ER) | JU175355
NADH dehydrogenase subunit 1 | 1878 | 0.173 | Electron transport chain | JU175111
NADH dehydrogenase subunit 4 | 1751 | 0.172 | Electron transport chain | JU175112
Protein disulfide isomerase A4 | 2540 | 0.159 | Rearrange disulfide bonds (ER) | JU175356
Translation elongation factor 2 | 3057 | 0.147 | Translation | JU174429
Vigilin 2 | 6107 | 0.129 | mRNA stability and translation | JU176512
Actin, cytoplasmic 2 | 1971 | 0.124 | Cytoskeleton | JU173777
The abundances of several major classes of nontoxins are provided in Figure 3B. We identified 57 sequences with functions related to protein folding [19,78-80], including various classes of heat-shock proteins, protein-disulfide isomerases, peptidyl-prolyl cis-trans isomerases, dnaJ-complex components, and T-complex components. These sequences together accounted for 28.4% of the total reads mapping to nontoxins. Ribosomal-protein transcripts (cytoplasmic and mitochondrial) accounted for 9.5% of the nontoxin reads, and mitochondrial genes accounted for another 9.0%. Finally, we identified 110 transcripts encoding proteins involved in protein degradation [81,82], including proteins involved in the ubiquitin-proteasome system and the ER-associated protein-degradation system [83], which accounted for 2.6% of the nontoxin reads. Protein-quality control should be essential in a high-throughput protein-producing tissue such as a snake venom gland.
Our collection of nontoxins included several notable potential inhibitors of the toxins or other proteases (Table 6). Such inhibitors may play a role in preventing autolysis [84] or may serve to protect venom components once inside a victim [85]. We detected three cystatin-like transcripts in the venom gland. Cystatins are cysteine-protease inhibitors and have been detected in numerous elapid venom glands and venoms [85]. We detected three unique metalloproteinase inhibitors and two serine proteinase inhibitors (serpins). Finally, we found four unique PLA2 inhibitors.
Table 6
Toxin and protease inhibitors detected in the venom-gland transcripts
15
Name | Length | % reads | Function | Accession
Cystatin 1 | 790 | 9.80 × 10^−4 | Cysteine-protease inhibitor | JU174278
Cystatin B | 460 | 1.23 × 10^−3 | Cysteine-protease inhibitor | JU174279
Cystatin 2 | 709 | 1.31 × 10^−3 | Cysteine-protease inhibitor | JU174280
Metalloproteinase inhibitor 1 | 820 | 1.04 × 10^−3 | Metalloproteinase inhibitor | JU175124
Metalloproteinase inhibitor 2 | 2560 | 2.57 × 10^−3 | Metalloproteinase inhibitor | JU175125
Metalloproteinase inhibitor 3 | 2202 | 1.01 × 10^−3 | Metalloproteinase inhibitor | JU175126
PLA2 inhibitor beta | 1210 | 1.54 × 10^−3 | PLA2 inhibitor | JU175425
PLA2 inhibitor gamma B 1 | 1492 | 3.68 × 10^−3 | PLA2 inhibitor | JU175444
PLA2 inhibitor gamma B 2 | 1694 | 7.70 × 10^−4 | PLA2 inhibitor | JU175442
PLA2 inhibitor B | 2339 | 1.07 × 10^−3 | PLA2 inhibitor | JU175443
Serpin B6 | 1708 | 9.68 × 10^−3 | Serine-proteinase inhibitor | JU175869
Serpin H1 | 2004 | 9.40 × 10^−4 | Serine-proteinase inhibitor | JU175870
Sequence accession numbers
The original, unmerged sequencing reads were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive under accession number SRA050594. The annotated toxin and nontoxin sequences were submitted to the GenBank Transcriptome Shotgun Assembly (TSA) database under accession numbers JU173621–JU173743 (toxins) and JU173744–JU176622 (nontoxins).
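The two accession ranges can be checked against the sequence counts reported elsewhere in the paper. The sketch below assumes that accessions within each range are consecutive and inclusive; the helper name is hypothetical.

```python
import re


def accession_count(range_text: str) -> int:
    """Count accessions in an inclusive range such as 'JU173621–JU173743'."""
    match = re.fullmatch(r"([A-Z]+)(\d+)\s*[-–]\s*([A-Z]+)(\d+)", range_text.strip())
    if match is None:
        raise ValueError(f"unrecognized accession range: {range_text!r}")
    prefix_start, start, prefix_end, end = match.groups()
    if prefix_start != prefix_end:
        raise ValueError("range endpoints use different accession prefixes")
    return int(end) - int(start) + 1


toxin_accessions = accession_count("JU173621–JU173743")
nontoxin_accessions = accession_count("JU173744–JU176622")
```

Under that assumption the ranges contain 123 and 2,879 accessions, matching the numbers of unique toxin and nontoxin sequences reported.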
Conclusions
We have described the most comprehensive venom-gland transcriptomic characterization of a snake species to date and provided full-length coding sequences for 123 unique toxin proteins and 2,879 unique nontoxin proteins. We have demonstrated the use of Illumina sequencing technology for the sequencing and de novo assembly of a tissue-specific transcriptome for a nonmodel species, C. adamanteus, for which genome-scale resources were previously unavailable. Because the nontoxin sequences in particular should be conserved across snake species, our results should greatly facilitate similar work with other venomous species, serving as an assembly template and reducing the number of reads for which de novo assembly will be necessary.
The expressed toxin genes in the venom gland of C. adamanteus provide a detailed portrait of a type I rattlesnake venom [46]. The most abundant transcript expressed in the C. adamanteus venom gland encoded a myotoxin homologous to crotamine. Crotamine is known to induce spastic paralysis [54], a symptom that has been observed in human envenomations by C. adamanteus [20]. Like those of most viperids, the bites of C. adamanteus result in significant tissue damage and necrosis, and we found that SVMPs, the major class of hemorrhagic toxins, dominated venom-gland gene expression. The second most abundant toxin transcript overall was an LAAO; LAAOs are also noted for causing local tissue damage [46]. Coagulopathy is a common occurrence with pit-viper bites [5]. The CTLs and SVSPs were also both diverse and abundant in the venom-gland transcriptome of C. adamanteus, and both classes primarily attack the hemostatic system. In terms of gene sequences of venom components, the venom of C. adamanteus is now the best-characterized snake venom, although a thorough proteomic analysis of the venom is still needed. The sequences we have generated will greatly facilitate such a proteomic characterization by serving as a database against which to query mass-spectrum results.
The expression patterns of the nontoxin genes in the venom gland of C. adamanteus reflect the protein-secretory function of the tissue and the high energetic demands of rapid venom production [75]. The most highly expressed nontoxin genes were those involved in the production and processing of proteins and energy production to support these activities. Molecular chaperones and PDIs were particularly abundant. Though the expression patterns for nontoxins were not surprising, future comparisons with other snake species, especially those from other snake families, may be able to elucidate the origin and early stages of the evolution of the venom gland.
Methods
Venom-gland transcriptome sequencing
We sequenced the venom-gland transcriptome of a single animal from Florida (Wakulla County): an adult female weighing 393 g with a snout-to-vent length of 792 mm and a total length of 844 mm. To stimulate transcription in the venom glands, we anesthetized the snake by propofol injection (10 mg/kg) and extracted venom by electrostimulation under anesthesia [86]. After venom extraction, the animal was allowed to recover for four days while transcription levels reached their maxima [87]. The snake was euthanized by injection of sodium pentobarbital (100 mg/kg), and its venom glands were subsequently removed. The above techniques were approved by the Florida State University Institutional Animal Care and Use Committee (IACUC) under protocol #0924.
Sequencing and nonnormalized cDNA library preparation were performed by the HudsonAlpha Institute for Biotechnology Genomic Services Laboratory (http://www.hudsonalpha.org/gsl/). Transcriptome sequencing was performed essentially as described by Mortazavi et al. [88] in a modification of the standard Illumina methods described in detail in Bentley et al. [89]. Total RNA was reduced to poly-A+ RNA with oligo-dT beads. Two rounds of poly-A+ selection were performed. The purified mRNA was then subjected to a mild heat fragmentation followed by random priming for first-strand synthesis. Standard second-strand synthesis was followed by standard library preparation with the double-stranded cDNA as input material. This approach is similar to that of Illumina’s TruSeq RNA-seq library preparation kit. Sequencing was performed in one lane on the Illumina HiSeq 2000 with 100-base-pair paired-end reads.
Transcriptome assembly and analysis
The average insert length of our cDNA library was approximately 170 nt, excluding the Illumina adaptors. With 100-base-pair paired-end sequencing, the majority of paired-end reads overlapped at their 3’ ends. Because read quality declines toward the 3’ ends of reads, we developed a method similar to that of Rodrigue et al. [36] for merging the overlapping pairs into single, long, high-quality reads. The members of each pair of reads were slid along each other, and, for each overlap of length n, we calculated the probability of getting the observed number of matches k by chance using a binomial probability given by
$$P(k \mid n) = \binom{n}{k} \left(\frac{1}{4}\right)^{k} \left(\frac{3}{4}\right)^{n-k},$$
assuming any of the four nucleotides is equally likely to be at any position. To be conservative, we only merged reads if the minimum probability was less than $10^{-10}$ and the second smallest probability was at least 1000 times larger (Figure 1A). The latter condition was meant to help avoid merging reads that span highly repetitive regions. For cases in which the insert size was less than the read length, sequence data outside the overlap were assumed to represent adaptors and were deleted. We updated quality scores for the overlapping positions following the approach of Rodrigue et al. [36]. For merged reads, quality scores for nonoverlapping bases were left unchanged (Figure 1B). The unmerged reads were typically those pairs from the longer end of the insert-size distribution.
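The merging criterion can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the reverse read is assumed to be already reverse-complemented, and only the case in which the insert is at least as long as a read is handled (no adaptor trimming and no quality-score update).

```python
from math import comb


def match_probability(k: int, n: int) -> float:
    """P(k | n): chance of exactly k matches in an overlap of n bases,
    assuming each of the four nucleotides is equally likely."""
    return comb(n, k) * 0.25 ** k * 0.75 ** (n - k)


def overlap_scores(fwd: str, rev_rc: str):
    """Slide the reads along each other. For each overlap length n, compare
    the last n bases of fwd with the first n bases of rev_rc and return
    (probability, n) pairs."""
    scores = []
    for n in range(1, min(len(fwd), len(rev_rc)) + 1):
        k = sum(a == b for a, b in zip(fwd[-n:], rev_rc[:n]))
        scores.append((match_probability(k, n), n))
    return scores


def best_overlap(fwd: str, rev_rc: str, p_max: float = 1e-10, ratio: float = 1000.0):
    """Return the overlap length to merge on, or None if the pair fails
    either threshold described in the text."""
    scores = sorted(overlap_scores(fwd, rev_rc))
    if len(scores) < 2:
        return None
    (p_best, n_best), (p_second, _) = scores[0], scores[1]
    if p_best < p_max and p_second >= ratio * p_best:
        return n_best
    return None


def merge(fwd: str, rev_rc: str, n: int) -> str:
    """Join the pair, keeping the overlapping bases once."""
    return fwd + rev_rc[n:]
```

With these thresholds, a perfect overlap needs at least 17 bases to pass the first test, because $(1/4)^{17} \approx 5.8 \times 10^{-11}$ whereas $(1/4)^{16} \approx 2.3 \times 10^{-10}$. A pair drawn from a homopolymer run fails the second test because neighboring overlap lengths score almost as well as the best one.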
Because of the inherent difficulty in de novo transcriptome assembly, we used a diverse array of assembly approaches and combined the results for a final data set. We performed assemblies using ABySS version 1.2.6 [37,38] under a wide array of parameter values using both the merged and unmerged reads. In particular, we used k-mer values of 51, 61, 71, 81, and 91 and varied the coverage (c) and erode (e) parameters from 2 to 1,000. We set E = 0, m = 20, and s = 200 for all assemblies. Trans-ABySS [90] provided little or no improvement of our assemblies, primarily because assembly quality appeared to be more dependent on the coverage and erode parameters than on the k-mer length. We also conducted assemblies using both the merged and unmerged reads with Velvet version 1.1.02 [39] and k-mer values of 71, 81, and 91. We selected the best of these assemblies on the basis of the N50 values for further assembly into transcripts with Oases version 0.1.20 (http://www.ebi.ac.uk/~zerbino/oases/) [40]. For Oases, we set the minimum transcript length to 300 nt and the coverage cutoff to 10. We also followed the approach of Rokyta et al. [11] and used the NGen2.2 assembler from DNAStar (http://www.dnastar.com/). Because this assembler is limited to 20–30 million reads, we used only the merged reads. We performed four independent assemblies: three with 20 million merged reads each and one with the remaining 12,114,709 merged reads. Each assembly was performed with the default settings for high-stringency, de novo transcriptome assembly for long Illumina reads, including default quality trimming. The high-stringency setting corresponded to setting the minimum match percentage to 90%. We retained contigs comprising at least 100 reads.
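For readers unfamiliar with the N50 statistic used to rank the Velvet assemblies, the following Python sketch shows the usual definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy assemblies are illustrative assumptions, not data from this study.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L contain at least half
    of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths for two assemblies built with different k-mer values.
assemblies = {
    "k71": [2, 2, 2, 3, 3, 4, 8, 8],
    "k81": [1, 1, 2, 2, 3, 3, 4, 5],
}
best = max(assemblies, key=lambda name: n50(assemblies[name]))
```

N50 rewards contiguity only, so it is a convenient but incomplete criterion; it says nothing about whether the contigs are correctly assembled.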
In addition to the all-at-once assembly approaches above, we developed an iterative approach that was both more effective at generating full-length transcripts and more computationally efficient. The first step consisted of applying our Extender program (see below) as a de novo assembler starting from 1,000 reads. Full-length transcripts were identified with blastx searches (see below), then used as templates in a reference-based assembly in NGen3.1 with a 98% minimum match percentage to filter reads corresponding to identified transcripts. Ten million of the unassembled sequences were then used in a de novo transcriptome assembly in NGen3.1 with the same settings as described above for de novo assembly except that the minimum match percentage was increased to 93% and contigs comprising fewer than 200 sequences were discarded. The resulting sequences were identified, where possible, by means of blastx searches, and the identified full-length transcripts were used in another templated assembly to generate a further-reduced set of reads. This iterative process was repeated two additional times.
To provide transcriptional profiles of the venom gland, we performed GO annotation with Blast2GO [73]. We ran full analyses on one of the NGen assemblies of 20 million merged reads, including blastx searches, GO mapping, and annotation. We used the default Blast2GO parameters throughout. We converted the GO annotation to generic GO-slim terms. We ran the same analysis on the combined set of annotated nontoxin sequences.
For gene identification and annotation, we conducted blastx searches using mpiblast version 1.6.0 (http://www.mpiblast.org/) of the consensus sequences of contigs of our assemblies against the NCBI nonredundant protein database (nr; downloaded March 2011 and updated through November 2011). We used an E-value cut-off of $10^{-4}$, and only the top 10 matches were considered. For toxin identification, hit descriptions were searched for a set of keywords based on known snake-venom toxins and protein classes. Any sequence matching these keywords was checked for a full-length coding sequence. We generally only retained transcripts with full-length coding sequences (but see below). For the iterative assembly approach, the remaining, presumably nontoxin-encoding, contigs were screened for those whose match lengths were at least 90% of the length of at least one of their database matches. This step was intended to minimize the number of fragmented or partial sequences that were considered for annotation. In addition, we sorted the contigs of the three 20-million-sequence NGen assemblies from the all-at-once approach on the basis of the number of reads and attempted to annotate the top 500 contigs from one assembly and the top 100 from the other two.
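The screening logic described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the `Hit` record, the function names, and the keyword list are hypothetical (the actual keyword set is not given here), and the subsequent check for a full-length coding sequence is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    description: str      # description line of the database match
    evalue: float
    match_length: int     # aligned length on the database protein (residues)
    subject_length: int   # full length of the database protein (residues)


# Illustrative keywords only; the real list is an assumption.
TOXIN_KEYWORDS = ("metalloproteinase", "phospholipase a2", "c-type lectin",
                  "serine proteinase", "myotoxin", "crotamine")


def usable_hits(hits, max_evalue=1e-4, top=10):
    """Apply the E-value cut-off and keep only the best `top` matches."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)[:top]


def classify(hits):
    """Label a contig from its blastx hits."""
    hits = usable_hits(hits)
    if not hits:
        return "unidentified"
    if any(kw in h.description.lower() for h in hits for kw in TOXIN_KEYWORDS):
        return "toxin candidate"      # would next be checked for a full-length CDS
    if any(h.match_length >= 0.9 * h.subject_length for h in hits):
        return "nontoxin candidate"   # match covers >= 90% of some database protein
    return "partial"
```

A contig whose only hit is a protein disulfide isomerase aligned over 480 of 500 residues would be labeled a nontoxin candidate, whereas one aligned over 100 of 375 residues of its best match would be set aside as partial.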
We estimated transcript abundances using high-stringency reference-based assemblies in NGen3.1 with a minimum match percentage of 95%. Ten million of the merged reads were mapped onto the full-length, annotated transcripts, and the percentage of reads mapping to each transcript was used as a proxy for abundance.
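As a small worked example of this abundance proxy, the Python sketch below converts per-transcript mapped-read counts into percentages of the reads used for mapping. The transcript names and counts are invented for illustration and are not results from this study.

```python
def abundance_percentages(mapped_counts, total_reads):
    """Percentage of all reads used for mapping that align to each transcript."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: 100.0 * count / total_reads
            for name, count in mapped_counts.items()}


# Hypothetical counts out of 10 million merged reads.
counts = {"toxin_A": 590_000, "nontoxin_B": 25_000}
percentages = abundance_percentages(counts, total_reads=10_000_000)
```

Because the denominator is the full set of reads used for mapping, the percentages sum to less than 100 whenever some reads map to no annotated transcript.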
The extender
The purpose of Extender is to estimate quickly one or more full-length transcript sequences from a large number of high-quality sequence reads. The procedure begins with one or more seed sequences provided by the user. The seeds can be known sequences (e.g., partial transcripts from a previous assembly) or simply sequences of one or more of the reads. The Extender procedure begins by hashing the k-mers observed at the two ends of the seeds. If k is set to 50, for example, then the 50-base sequence present at the 5’ end of each seed is used as a key in a hash table, and the hash value is a pointer to the seed in the list of seeds. A second hash table is likewise used for k-mers from the 3’ ends of the seeds. Note that this method requires that all initial k-mers be unique (that no two sequence ends be identical). Once the seeds are hashed, the seeds are extended with the set of reads provided by the user as follows. The two k-mers from the ends of each read are looked up in each hash table. If the key is present in the hash table, the seed is extended by concatenation of the nonoverlapping bases from the read onto the appropriate end of the seed. If the key is absent, the reverse complement of the read is used to extend the seed if the end k-mers are found. After each extension, the k-mer key facilitating the extension is removed from the hash table and the new k-mer key is added (the reference to the seed remains the same). The procedure is repeated until the reads have been cycled through N times, where N is chosen by the user. Cycling is beneficial because the Extender does not reset to the beginning of the read list when an extension is made.
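The hashing and extension procedure described above can be sketched in a few lines of Python. This is a simplified illustration, not the Extender source code: the names are hypothetical, seeds are assumed to be at least k bases long with unique end k-mers, and collisions between newly created keys are ignored.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend(seeds, reads, k, cycles):
    """Grow each seed by exact k-mer matches between seed ends and read ends."""
    seeds = list(seeds)
    five = {s[:k]: i for i, s in enumerate(seeds)}    # 5' end k-mer -> seed index
    three = {s[-k:]: i for i, s in enumerate(seeds)}  # 3' end k-mer -> seed index

    def try_extend(read):
        if len(read) <= k:
            return False
        # The read starts with a seed's 3' k-mer: append the new bases.
        i = three.pop(read[:k], None)
        if i is not None:
            seeds[i] += read[k:]
            three[seeds[i][-k:]] = i   # replace the old key with the new end
            return True
        # The read ends with a seed's 5' k-mer: prepend the new bases.
        i = five.pop(read[-k:], None)
        if i is not None:
            seeds[i] = read[:-k] + seeds[i]
            five[seeds[i][:k]] = i
            return True
        return False

    for _ in range(cycles):
        for read in reads:  # the read list is not restarted after an extension
            if not try_extend(read):
                try_extend(revcomp(read))
    return seeds
```

Because the pass through the reads continues after each extension, a read that becomes usable only after a later read has been incorporated is picked up on the next cycle, which is why more than one cycle is generally needed.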
Extension of a seed typically terminates when the end of the full-length transcript is reached or when a sequencing error is encountered in the end of an incorporated read. The presence of low-frequency biological artifacts (e.g., unspliced introns) may also result in termination of the extension. In order to improve the accuracy of the consensus sequence prediction, Extender can create replicate seeds for a particular seed by sequentially trimming one base at a time from both ends. Using replicate seeds allows several independent sequences that represent the same target consensus sequence to be generated simultaneously, and these replicates are entirely independent because they begin with different keys. The user can obtain the final estimate of the sequence corresponding to each original seed by taking the consensus across replicates or by simply choosing the replicate producing the longest sequence. We took the former approach for all of our assembly efforts. Overall, Extender is highly inefficient with its use of data and requires many long, high-quality reads, but it is extremely computationally efficient, having short run times and low memory requirements.
We used Extender in two different ways: to complete partial toxin transcripts and as a de novo assembler. For the former, we used partial toxin transcripts from NGen assemblies that were found to have fragments of coding sequence homologous with known toxins. The partial transcripts were trimmed to just the partial coding sequence and used as seeds. To use Extender as a de novo assembler, we seeded it with 1,000 random reads. For both applications, we used a k-mer size of 100, 20 replicates, and 10 cycles through the complete set of merged reads, excluding all reads containing any base with a quality score below 30.
Abbreviations
BPP: Bradykinin potentiating and C-type natriuretic peptides; CTL: C-type lectin; CREGF: Cysteine-rich with EGF-like domain; CRISP: Cysteine-rich secretory protein; Gb: Gigabase; GC: Glutaminyl-peptide cyclotransferase; GO: Gene ontology; HYAL: Hyaluronidase; KUN: Kunitz-type protease inhibitor; LAAO: L amino-acid oxidase; MYO: Myotoxin (crotamine); NGF: Nerve growth factor; NF: Neurotrophic factor; nt: Nucleotide; NUC: Nucleotidase; PDE: Phosphodiesterase; PDI: Protein disulfide isomerase; PLA2: Phospholipase A2; SVMP: Snake venom metalloproteinase (types II and III); SVSP: Snake venom serine proteinase; VEGF: Vascular endothelial growth factor; VESP: Vespryn (ohanin-like); VF: Venom factor; WAP: Waprin.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The project was conceived and planned by DR and AL. DR, MM, and KA collected and analyzed the data. DR wrote the manuscript. All authors read and approved the final manuscript.
Acknowledgements
The authors thank Kenneth P. Wray for dissecting the venom glands and Darryl Heard for training DRR in the electrostimulation technique for venom extraction. Computational resources were provided by the Florida State University High-Performance Computing cluster, and the authors thank James C. Wilgenbusch for assistance in the use of these resources. Funding for this work was provided to DRR and ARL by Florida State University.
References
1. Chippaux JP: Snake-bites: appraisal of the global situation. Bull WHO 1998, 76:515–524.
2. O’Neil ME, Mack KA, Gilchrist J, Wozniak EJ: Snakebite injuries treated in United States emergency departments, 2001–2004. Wilderness Env Med 2007, 18:281–287.
3. Langley RL: Deaths from reptile bites in the United States, 1979–2004. Clin Toxicol 2009, 47:44–47.
4. Theakston RDG, Warrell DA, Griffiths E: Report of a WHO workshop on the standardization and control of antivenoms. Toxicon 2003, 41:541–557.
5. Smith J, Bush S: Envenomations by reptiles in the United States. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:475–490.
6. Neves-Ferreira AGC, Valente RH, Perales J, Domont GB: Natural inhibitors: innate immunity to snake venoms. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:259–284.
7. Rucavado A, Lomonte B: Neutralization of myonecrosis, hemorrhage, and edema induced by Bothrops asper snake venom by homologous and heterologous pre-existing antibodies in mice. Toxicon 1996, 34:567–577.
8. Huang KF, Chow LP, Chiou SH: Isolation and characterization of a novel proteinase inhibitor from the snake serum of Taiwan habu (Trimeresurus mucrosquamatus). Biochem Biophys Res Commun 1999, 263:610–616.
9. Valente RH, Dragulev B, Perales J, Fox JW, Domont GB: BJ46a, a snake venom metalloproteinase inhibitor. Eur J Biochem 2001, 268:3042–3052.
10. Serrano SMT, Shannon JD, Wang D, Camargo ACM, Fox JW: A multifaceted analysis of viperid snake venoms by two-dimensional gel electrophoresis: An approach to understanding venom proteomics. Proteomics 2005, 5:501–510.
11. Rokyta DR, Wray KP, Lemmon AR, Lemmon EM, Caudle SB: A high-throughput venom-gland transcriptome for the eastern diamondback rattlesnake (Crotalus adamanteus) and evidence for pervasive positive selection across toxin classes. Toxicon 2011, 57:657–671.
12. Biardi JE, Chien DC, Coss RG: California ground squirrel (Spermophilus beecheyi) defenses against rattlesnake venom digestive and hemostatic toxins. J Chem Ecol 2005, 31:2501–2518.
13. Biardi JE, Nguyen KT, Lander S, Whitley M, Nambiar KP: A rapid and sensitive fluorometric method for the quantitative analysis of snake venom metalloproteases and their inhibitors. Toxicon 2011, 57:342–347.
14. Jansa SA, Voss RS: Adaptive evolution of the venom-targeted vWF protein in opossums that eat pitvipers. PLoS One 2011, 6:e20997.
15. Harvey AL, Bradley KN, Cochran SA, Rowan EG, Pratt JA, Quillfeldt JA, Jerusalinsky DA: What can toxins tell us for drug discovery? Toxicon 1998, 36:1635–1640.
16. Ménez A: Functional architectures of animal toxins: a clue to drug design? Toxicon 1998, 36:1557–1572.
17. Escoubas P, King GF: Venomics as a drug discovery platform. Expert Rev Proteomics 2009, 6:221–224.
18. Bohlen CJ, Chesler AT, Sharif-Naeini R, Medzihradszky KF, Zhou S, King D, Sánchez EE, Burlingame AL, Basbaum AI, Julius D: A heteromeric Texas coral snake toxin targets acid-sensing ion channels to produce pain. Nature 2011, 479:410–414.
19. Hartl FU, Bracher A, Hayer-Hartl M: Molecular chaperones in protein folding and proteostasis. Nature 2011, 475:324–332.
20. Klauber LM: Rattlesnakes: Their Habits, Life Histories, and Influence on Mankind. Berkeley, California: University of California Press; 1997.
21. Gold BS, Dart RC, Barish RA: Bites of venomous snakes. N Engl J Med 2002, 347:347–356.
22. Conant R, Collins JT: A Fieldguide to Reptiles and Amphibians of Eastern and Central North America. New York, New York: Houghton Mifflin Harcourt; 1998.
23. Palmer WM, Braswell AL: Reptiles of North Carolina. Chapel Hill, North Carolina: University of North Carolina Press; 1995.
24. Dundee HA, Rossman DA: The Amphibians and Reptiles of Louisiana. Baton Rouge, Louisiana: Louisiana University Press; 1996.
25. Pahari S, Mackessy SP, Kini RM: The venom gland transcriptome of the desert massasauga rattlesnake (Sistrurus catenatus edwardsii): towards an understanding of venom composition among advanced snakes (superfamily Colubroidea). BMC Mol Biol 2007, 8:115.
26. Casewell NR, Harrison RA, Wüster W, Wagstaff SC: Comparative venom gland transcriptome surveys of the saw-scaled vipers (Viperidae: Echis) reveal substantial intra-family gene diversity and novel venom transcripts. BMC Genomics 2009, 10:564.
27. Leão LI, Ho PL, Junqueira-de-Azevedo ILM: Transcriptomic basis for an antiserum against Micrurus corallinus (coral snake) venom. BMC Genomics 2009, 10:112.
28. Jiang Y, Li Y, Lee W, Xu X, Zhang Y, Zhao R, Zhang Y, Wang W: Venom gland transcriptomes of two elapid snakes (Bungarus multicinctus and Naja atra) and evolution of toxin genes. BMC Genomics 2011, 12:1.
29. Morgenstern D, Rohde BH, King GF, Tal T, Sher D, Zlotkin E: The tale of a resting venom gland: transcriptome of a replete venom gland from the scorpion Hottentotta judaicus. Toxicon 2011, 57:695–703.
30. Whittington CM, Papenfuss AT, Locke DP, Mardis ER, Wilson RK, Abubucker S, Mitreva M, Wong ESW, Hsu AL, Kuchel PW, Belov K, Warren WC: Novel venom gene discovery in the platypus. Genome Biol 2010, 11:R95.
31. Gremski LH, Silveira RBD, Chaim OM, Probst CM, Ferrer VP, Nowatzki J, Weinschutz HC, Madeira HM, Gremski W, Nader HB, Senff-Ribeiro A, Veiga SS: A novel expression profile of the Loxosceles intermedia spider venomous gland revealed by transcriptome analysis. Mol BioSyst 2010, 6:2403–2416.
32. Ruiming Z, Yibao M, Yawen H, Zhiyong D, Yingliang W, Zhijian C, Wenxin L: Comparative venom gland transcriptome analysis of the scorpion Lychas mucronatus reveals intraspecific toxic gene diversity and new venomous components. BMC Genomics 2010, 11:452.
33. Hu H, Bandyopadhyay PK, Olivera BM, Yandell M: Characterization of the Conus bullatus genome and its venom-duct transcriptome. BMC Genomics 2011, 12:60.
34. Durban J, Juárez P, Angulo Y, Lomonte B, Flores-Diaz M, Alape-Girón A, Sasa M, Sanz L, Gutiérrez JM, Dopazo J, Conesa A, Calvete JJ: Profiling the venom gland transcriptomes of Costa Rican snakes by 454 pyrosequencing. BMC Genomics 2011, 12:259.
35. Gilles A, Meglécz E, Pech M, Ferreira S, Malausa T, Martin JF: Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics 2011, 12:245.
36. Rodrigue S, Materna AC, Timberlake SC, Blackburn MC, Malmstrom RR, Alm EJ, Chisholm SW: Unlocking short read sequencing for metagenomics. PLoS One 2010, 5:e11840.
37. Birol I, Jackman SD, Nielsen CB, Qian JQ, Varhol R, Stazyk G, Morin RD, Zhao Y, Hirst M, Schein JE, Horsman DE, Connors JM, Gascoyne RD, Marra MA, Jones SJM: De novo transcriptome assembly with ABySS. Bioinformatics 2009, 25:2872–2877.
38. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res 2009, 19:1117–1123.
39. Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008, 18:821–829.
40. Schulz MH, Zerbino DR, Vingron M, Birney E: Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 2012, 28:1086–1092.
41. Feldmeyer B, Wheat CW, Krezdorn N, Rotter B, Pfenninger M: Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance. BMC Genomics 2011, 12:317.
42. Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 2008, 36:e105.
43. Gibbs HL, Sanz L, Calvete JJ: Snake population venomics: proteomics-based analyses of individual variation reveals significant gene regulation effects on venom protein expression in Sistrurus rattlesnakes. J Mol Evol 2009, 68:113–125.
44. Fox JW, Serrano SMT: Structural considerations of the snake venom metalloproteinases, key members of the M12 reprolysin family of metalloproteinases. Toxicon 2005, 45:969–985.
45. Fox JW, Serrano SMT: Snake venom metalloproteinases. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:95–113.
46. Mackessy SP: Venom composition in rattlesnakes: trends and biological significance. In The Biology of Rattlesnakes. Edited by Hayes WK, Beaman KR, Cardwell MD, Bush SP. Loma Linda, California: Loma Linda University Press; 2008:495–510.
47. Du XY, Clemetson KJ: Reptile C-type lectins. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:359–375.
48. Walker JR, Nagar B, Young NM, Hirama T, Rini JM: X-ray crystal structure of a galactose-specific C-type lectin possessing a novel decameric quaternary structure. Biochemistry 2004, 43:3783–3792.
49. Serrano SMT, Maroun RC: Snake venom serine proteinases: sequence homology vs. substrate specificity, a paradox to be solved. Toxicon 2005, 45:1115–1132.
50. Phillips DJ, Swenson SD, Francis S, Markland J: Thrombin-like snake venom serine proteinases. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:139–154.
51. Lynch VJ: Inventing an arsenal: adaptive evolution and neofunctionalization of snake venom phospholipase A2 genes. BMC Evol Biol 2007, 7:2.
52. Doley R, Zhou X, Kini RM: Snake venom phospholipase A2 enzymes. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:173–205.
53. Rádis-Baptista G, Oguiura N, Hayashi MAF, Camargo ME, Grego KF, Oliveira EB, Yamane T: Nucleotide sequence of crotamine isoform precursors from a single South American rattlesnake (Crotalus durissus terrificus). Toxicon 1999, 37:973–984.
54. Oguiura N, Boni-Mitake M, Rádis-Baptista G: New view on crotamine, a small basic polypeptide myotoxin from South American rattlesnake venom. Toxicon 2005, 46:363–370.
55. Straight RC, Glenn JL, Wolt TB, Wolfe MC: Regional differences in content of small basic peptide toxins in the venoms of Crotalus adamanteus and Crotalus horridus. Comp Biochem Physiol B 1991, 100:51–58.
56. Tan NH, Fung SY: Snake venom L-amino acid oxidases. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:221–235.
57. Heyborne WH, Mackessy SP: Cysteine-rich secretory proteins in reptile venoms. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:325–336.
58. Yamazaki Y, Hyodo F, Morita T: Wide distribution of cysteine-rich secretory proteins in snake venoms: isolation and cloning of novel snake venom cysteine-rich secretory proteins. Arch Biochem Biophys 2003, 412:133–141.
59. Yamazaki Y, Morita T: Structure and function of snake venom cysteine-rich secretory proteins. Toxicon 2004, 44:227–231.
60. Pung YF, Wong PTH, Kumar PP, Hodgson WC, Kini RM: Ohanin, a novel protein from king cobra venom, induces hypolocomotion and hyperalgesia in mice. J Biol Chem 2005, 280:13137–13147.
61. Pung YF, Kumar SV, Rajagopalan N, Fry BG, Kumar PP, Kini RM: Ohanin, a novel protein from king cobra venom: its cDNA and genomic organization. Gene 2006, 371:246–256.
62. Junqueira-de-Azevedo ILM, Ching ATC, Carvalho E, Faria F, Nishiyama MY Jr, Ho PL, Diniz MRV: Lachesis muta (Viperidae) cDNAs reveal diverging pit viper molecules and scaffolds typical of cobra (Elapidae) venoms: implications for snake toxin repertoire evolution. Genetics 2006, 173:877–889.
63. Aird SD: Ophidian envenomation strategies and the role of purines. Toxicon 2002, 40:335–393.
64. Aird SD: The role of purine and pyrimidine nucleosides in snake venoms. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:393–419.
65. Dhananjaya BL, Vishwanath BS, D’Souza CJM: Snake venom nucleases, nucleotidases, and phosphomonoesterases. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:155–171.
66. Shafqat J, Zaidi ZH, Jörnvall H: Purification and characterization of a chymotrypsin Kunitz inhibitor type of polypeptide from the venom of cobra (Naja naja naja). FEBS Lett 1990, 275:6–8.
67. Harrison RA, Ibison F, Wilbraham D, Wagstaff SC: Identification of cDNAs encoding viper venom hyaluronidases: cross-generic sequence conservation of full-length and unusually short variant transcripts. Gene 2007, 392:22–33.
68. Kemparaju K, Girish KS, Nagaraju S: Hyaluronidases, a neglected class of glycosidases from snake venom: beyond a spreading factor. In Handbook of Venoms and Toxins of Reptiles. Edited by Mackessy SP. Boca Raton, Florida: CRC Press; 2010:237–258.
69. Pawlak J, Kini RM: Snake venom glutaminyl cyclase. Toxicon 2006, 48:278–286.
70. Fry BG, Scheib H, van der Weerd L, Young B, McNaughtan J, Ramjan SFR, Vidal N, Poelmann RE, Norman JA: Evolution of an arsenal. Mol Cell Proteomics 2008, 7:215–246.
71. Rehana S, Kini RM: Molecular isoforms of cobra venom factor-like proteins in the venom of Austrelaps superbus. Toxicon 2007, 50:32–52.
72. Eggertsen G, Lind P, Sjöquist J: Molecular characterization of the complement activating protein in the venom of the Indian cobra (Naja n. siamensis). Mol Immunol 1981, 18:125–133.
73. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005, 21:3674–3676.
74. McCue MD: Cost of producing venom in three North American pitviper species. Copeia 2006, 2006:818–825.
75. Mackessy SP, Baxter LM: Bioweapons synthesis and storage: the venom gland of front-fanged snakes. Zool Anz 2006, 245:147–159.
76. Wang Q, Zhang Z, Blackwell K, Carmichael GG: Vigilins bind to promiscuously A-to-I-edited RNAs and are involved in the formation of heterochromatin. Curr Biol 2005, 15:384–391.
77. Nishikura K: Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 2010, 79:321–349.
78. Hartl FU: Molecular chaperones in cellular protein folding. Nature 1996, 381:571–580.
79. Fink AL: Chaperone-mediated protein folding. Physiol Rev 1999, 79:425–449.
80. Young JC, Agashe VR, Siegers K, Hartl FU: Pathways of chaperone-mediated protein folding in the cytosol. Nat Rev Mol Cell Biol 2004, 5:781–791.
81. Finley D: Recognition and processing of ubiquitin-protein conjugates by the proteasome. Annu Rev Biochem 2009, 78:477–513.
82. Buchberger A, Bukau B, Sommer T: Protein quality control in the cytosol and the endoplasmic reticulum: brothers in arms. Mol Cell 2010, 40:238–252.
83. Bagola K, Mehnert M, Jarosch E, Sommer T: Protein dislocation from the ER. Biochim Biophys Acta 2011, 1808:925–936.
84. Huang KF, Chiou SH, Ko TP, Wang AHJ: Determinants of the inhibition of a Taiwan habu venom metalloproteinase by its endogenous inhibitors by X-ray crystallography and synthetic inhibitor analogues. Eur J Biochem 2002, 269:3047–3056.
85. Richards R, Trabi M, Johnson LA, de Jersey J, Masci PP, Lavin MF, St Pierre L: Cloning and characterization of novel cystatins from elapid snake venom glands. Biochimie 2011, 93:659–668.
86. McCleary RJR, Heard DJ: Venom extraction from anesthetized Florida cottonmouths, Agkistrodon piscivorus conanti, using a portable nerve stimulator. Toxicon 2010, 55:250–255.
87. Rotenberg D, Bamberger ES, Kochva E: Studies on ribonucleic acid synthesis in the venom glands of Vipera palaestinae (Ophidia, Reptilia). Biochem J 1971, 121:609–612.
88. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5:621–628.
89. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Cheetham RK, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger K, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456:53–59.
90. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ, Griffith M, Raymond A, Thiessen N, Cezard T, Butterfield YS, Newsome R, Chan SK, She R, Varhol R, Kamoh B, Prabhu AL, Tam A, Zhao Y, Moore RA, Hirst M, Marra MA, Jones SJM, Hoodless PA, Birol I: De novo assembly and analysis of RNA-seq data. Nat Methods 2010, 7:909–912.


Title:
The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus)
Abstract:
Background
Snake venoms have significant impacts on human populations through the morbidity and mortality associated with snakebites and as sources of drugs, drug leads, and physiological research tools. Genes expressed by venom-gland tissue, including those encoding toxic proteins, have therefore been sequenced but only with relatively sparse coverage resulting from the low-throughput sequencing approaches available. High-throughput approaches based on 454 pyrosequencing have recently been applied to the study of snake venoms to give the most complete characterizations to date of the genes expressed in active venom glands, but such approaches are costly and still provide a far-from-complete characterization of the genes expressed during venom production.
Results
We describe the de novo assembly and analysis of the venom-gland transcriptome of an eastern diamondback rattlesnake (Crotalus adamanteus) based on 95,643,958 pairs of quality-filtered, 100-base-pair Illumina reads. We identified 123 unique, full-length toxin-coding sequences, which cluster into 78 groups with less than 1% nucleotide divergence, and 2,879 unique, full-length nontoxin coding sequences. The toxin sequences accounted for 35.4% of the total reads, and the nontoxin sequences for an additional 27.5%. The most highly expressed toxin was a small myotoxin related to crotamine, which accounted for 5.9% of the total reads. Snake-venom metalloproteinases accounted for the highest percentage of reads mapping to a toxin class (24.4%), followed by C-type lectins (22.2%) and serine proteinases (20.0%). The most diverse toxin classes were the C-type lectins (21 clusters), the snake-venom metalloproteinases (16 clusters), and the serine proteinases (14 clusters). The high-abundance nontoxin transcripts were predominantly those involved in protein folding and translation, consistent with the protein-secretory function of the tissue.
Conclusions
We have provided the most complete characterization of the genes expressed in an active snake venom gland to date, producing insights into snakebite pathology and guidance for snakebite treatment for the largest rattlesnake species and arguably the most dangerous snake native to the United States of America, C. adamanteus. We have more than doubled the number of sequenced toxins for this species and created extensive genomic resources for snakes based entirely on de novo assembly of Illumina sequence data.
Creator:
Rokyta, Darin R
Lemmon, Alan R
Margres, Mark J
Aronow, Karalyn
Language:
English
Type:
Journal article
Date Available:
2012-07-16
Publisher:
BioMed Central Ltd
Status:
Peer reviewed
Copyright Holder:
Darin R Rokyta et al.; licensee BioMed Central Ltd.
License:
http://creativecommons.org/licenses/by/2.0
Access Rights:
Open access
Bibliographic Citation:
BMC Genomics. 2012 Jul 16;13(1):312
Identifier:
http://dx.doi.org/10.1186/1471-2164-13-312