Group Title: BMC Plant Biology
Title: Rapid and accurate pyrosequencing of angiosperm plastid genomes
Permanent Link: http://ufdc.ufl.edu/UF00100014/00001
 Material Information
Title: Rapid and accurate pyrosequencing of angiosperm plastid genomes
Physical Description: Book
Language: English
Creator: Moore, Michael
Dhingra, Amit
Soltis, Pamela
Shaw, Regina
Farmerie, William
Folta, Kevin
Soltis, Douglas
Publisher: BMC Plant Biology
Publication Date: 2006
 Notes
Abstract: BACKGROUND: Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). RESULTS: More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. CONCLUSION: Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically.
General Note: Start page 17
General Note: M3: 10.1186/1471-2229-6-17
 Record Information
Bibliographic ID: UF00100014
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/
Resource Identifier: issn - 1471-2229
http://www.biomedcentral.com/1471-2229/6/17

BMC Plant Biology


Research article


Rapid and accurate pyrosequencing of angiosperm plastid genomes
Michael J Moore*1,2, Amit Dhingra3, Pamela S Soltis2, Regina Shaw4,
William G Farmerie4, Kevin M Folta3 and Douglas E Soltis1


Address: 1Department of Botany, University of Florida, P.O. Box 118526, Gainesville, FL, 32611, USA, 2Florida Museum of Natural History,
University of Florida, P.O. Box 117800, Gainesville, FL, 32611, USA, 3Horticultural Sciences Department, University of Florida, P.O. Box 110690,
Gainesville, FL, 32611, USA and 4ICBR Genome Sequencing Service Laboratory, University of Florida, P.O. Box 100156, Gainesville, FL, 32610,
USA
Email: Michael J Moore* - mjmoore1@ufl.edu; Amit Dhingra - adhingra@ufl.edu; Pamela S Soltis - psoltis@flmnh.ufl.edu;
Regina Shaw - regina@biotech.ufl.edu; William G Farmerie - wgf@biotech.ufl.edu; Kevin M Folta - kfolta@ifas.ufl.edu;
Douglas E Soltis - dsoltis@botany.ufl.edu
* Corresponding author



Published: 25 August 2006
BMC Plant Biology 2006, 6:17 doi:10.1186/1471-2229-6-17
Received: 06 April 2006
Accepted: 25 August 2006
This article is available from: http://www.biomedcentral.com/1471-2229/6/17
© 2006 Moore et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Abstract
Background: Plastid genome sequence information is vital to several disciplines in plant biology, including
phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number
of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing
technology. Here we report a further significant reduction in time and cost for plastid genome sequencing
through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS
20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes
of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae).
Results: More than 99.75% of each plastid genome was simultaneously obtained during two GS 20
sequence runs, to an average depth of coverage of 24.6x in Nandina and 17.3x in Platanus. The Nandina
and Platanus plastid genomes shared essentially identical gene complements and possessed the typical
angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over
45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error
rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More
than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors
associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with
regions of extensive homopolymer runs. No substitution errors were present in either genome. Error
rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to
the inverted repeat and coding regions.
Conclusion: Highly accurate and essentially complete sequence information was obtained for the Nandina
and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the
GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional
shotgun-based genome sequencing techniques, although with approximately half the coverage of
previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to
angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and
phylogenetic research dramatically.






Background
Plastid genome sequence information is of central impor-
tance to several fields of plant biology, including phyloge-
netics, molecular biology and evolution, and plastid
genetic engineering [1-6]. The relatively small size of the
plastid genome (~150 kb) has made its complete sequenc-
ing technically feasible since the mid-1980s, although
limitations in sequencing technology resulted in only a
few complete plastid genomes appearing between 1986
and 2000 [7]. However, the pace of plastid genome
sequencing has increased markedly over the last five years
[7]. More than 50 complete plastid genomes are now
available on GenBank, and several plastid genome
sequencing projects [8-10] promise to increase that
number to more than 200 in the near future. This dra-
matic growth in plastid genome sequencing has been
driven largely by improvements in Sanger sequencing
technology that have greatly reduced the time and cost
involved in genome sequencing [11].

New approaches to genome sequencing have been pro-
posed in recent years that, if effective, will further signifi-
cantly reduce the time and cost of obtaining whole plastid
genome sequences [11,12]. Perhaps the most promising
of these new technologies involves the Genome
Sequencer 20 (GS 20) System, a pyrosequencing platform
developed by the 454 Life Sciences Corporation (Bran-
ford, CT, USA; available through Roche Diagnostics, Indi-
anapolis, IN, USA). In pyrosequencing, the DNA sequence
is determined by analyzing flashes of light that are
released during the enzymatic conversion of pyrophos-
phate generated during template DNA extension, using a
predetermined sequence of dNTP addition [13]. The GS
20 System implements several novel technologies that
allow for relatively rapid and inexpensive pyrosequencing
on a massive scale [14]. These include an emulsion-based
method to amplify random fragment libraries of template
DNA in bulk, fiber-optic slides containing high-density,
picoliter-sized pyrosequencing reactors, and a three-bead
system to deliver the enzymes necessary for the pyrose-
quencing reactions. In a single run the GS 20 system gen-
erates up to 25 million high-quality bases in hundreds of
thousands of short sequence reads called flowgrams,
which are then assembled into genomic contigs. For rela-
tively small genomes, the high number of reads results in
a high average depth of sequence coverage, effectively
overcoming many of the limitations of pyrosequencing,
which include relatively short read length and uncertainty
in the length of homopolymer runs [14,15]. Perhaps the
greatest advantage of the GS 20 System is that it generates
genome sequence much more rapidly and economically
than traditional Sanger-based shotgun sequencing. It is
not necessary to clone template DNA into bacterial vec-
tors, and genome sequence can be obtained on the GS 20
in a single five-hour run with a few days of template preparation.
Likewise, the GS 20 System relies on less expensive
reagents than traditional Sanger sequencing.
However, the savings in time and money associated with
GS 20 de novo genome sequence come at the cost of a
slightly higher error rate compared to traditional Sanger-
based genome sequence (~0.04% in GS 20 vs. 0.01% in
Sanger sequence) [14,16,17].
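
The flow-based readout described above can be illustrated with a toy model. The sketch below is not the GS 20 base-calling software: it assumes an idealized, noise-free signal in which each nucleotide flow yields a value equal to the number of bases incorporated, and the cyclic flow order TACG and both function names are assumptions made only for this illustration. It shows why the length of a homopolymer run has to be inferred from signal magnitude rather than read base by base.

```python
def ideal_flowgram(read: str, flow_order: str = "TACG") -> list[int]:
    """Return the idealized signal for each nucleotide flow.

    Each flow extends the strand through every consecutive base that
    matches the flowed nucleotide, so a run of n identical bases gives
    one signal of height n instead of n separate signals.
    """
    if not set(read) <= set(flow_order):
        raise ValueError("read contains a base that is never flowed")
    signals = []
    pos = 0
    while pos < len(read):
        for base in flow_order:
            run = 0
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append(run)
    return signals


def call_bases(signals, flow_order: str = "TACG") -> str:
    """Convert flow signals back to bases by rounding each signal."""
    return "".join(
        flow_order[i % len(flow_order)] * round(s)
        for i, s in enumerate(signals)
    )


signals = ideal_flowgram("AATTTC")
print(signals)                      # one value per flow, in T, A, C, G order
print(call_bases(signals))          # recovers the original read
print(call_bases([0, 5.6, 0, 0]))   # a noisy 5-base A run is called as 6 bases
```

In this toy model a modest amount of signal noise on a long run is enough to change the called run length by one base, whereas the identity of the base is never in doubt. That pattern matches the kind of error the text attributes to pyrosequencing.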

To date the GS 20 System has been successfully utilized in
an increasing number of de novo sequencing projects,
including sequencing the genomes of several bacteria and
the mitochondrial genome of an extinct species of mam-
moth, as well as exploring the sequence diversity present
in environmental samples [14,18-22]. Because of its small
size and similarity to bacterial genomes, the plastid
genome seems particularly amenable to sequencing via
the GS 20 System. In conjunction with the Angiosperm
Tree of Life (ATOL) project [8], part of which involves
sequencing 30 plastid genomes representing the phyloge-
netic diversity of angiosperms, we used the GS 20 to
sequence the complete plastid genomes of the eudicot
angiosperms Nandina domestica Thunb. (Berberidaceae)
and Platanus occidentalis L. (Platanaceae). A major focus of
the ATOL plastid genome sequencing project is the use of
whole-chloroplast genome sequence data to determine
the evolutionary relationships among the basal lineages
of eudicots, which have hitherto proved difficult to
resolve [23]. We therefore sequenced Nandina and Plata-
nus because they represent members of two phylogeneti-
cally pivotal basal lineages of eudicots (Ranunculales and
Proteales, respectively), which shared their last common
ancestor approximately 120 million years ago [24]. In
sequencing these two plastid genomes using the GS 20
System we had the following specific objectives: (1) to test
the overall feasibility of generating plastid genome
sequence using the GS 20 System, (2) to determine the
potential error rate in GS 20 de novo plastid genome
sequence, and (3) to determine whether the magnitude of
the GS 20 error rate is enough to offset any potential gains
in time and cost efficiency associated with the use of the
GS 20. Here we demonstrate the viability of the GS 20 Sys-
tem for plastid genome sequencing projects by generating
highly accurate and essentially complete plastid genome
sequences of both Nandina and Platanus, for a significant
reduction in time and cost over traditional Sanger-based
plastid genome sequencing.

Results
GS 20 sequencing run characteristics
Results of the GS 20 sequencing runs for Nandina and Pla-
tanus are summarized in Table 1. More than 99.75% cov-
erage of each genome was obtained by assembling the raw
sequence data from the titration and supplemental
sequencing runs (these data will be referred to as the com-
bined run data; see Methods), to an overall average depth




Table 1: Characteristics of the GS 20 combined run data assemblies

                                              Nandina      Platanus
combined run data length                      130503 bp    136335 bp
no. of combined data contigs                  8            10
average contig length                         16313 bp     13634 bp
size of largest contig                        35901 bp     28803 bp
total no. of reads                            31019        23743
average read length                           103.6 bp     99.8 bp
overall average read depth (incl. one IR)     24.6x        17.3x
overall average read depth (incl. both IRs)   20.5x        14.6x
IR average read depth                         24.2x        28.2x
SC average read depth                         24.7x        14.9x
proportion of bases > Q40                     99.8%        99.4%
no. of gaps                                   9            11
total gap length                              34 bp        390 bp
average gap length                            3.8 bp       35.5 bp
no. of zero-length gaps                       7            5
size of largest gap                           32 bp        170 bp


Characteristics of the GS 20 combined run data assemblies. The overall average read depth is calculated in two ways: by including one copy of the
inverted repeat (IR) region (to reflect the fact that the two copies of the IR are indistinguishable during genome sequencing, and are therefore
contigged together) and by including both copies of the IR region. SC = single-copy region.


of coverage of 24.6x in Nandina and 17.3x in Platanus.
Few gaps were present in either genome assembly (Table
1). All but three gaps were less than 50 bp, with many
zero-length gaps (no missing sequence between adjoining
contigs) present in both assemblies. Only one gap in
either assembly was larger than 100 bp (in Platanus; Table
1). In several cases gaps in the assemblies occurred in the
same regions of both genomes. Short gaps (mostly zero-
length, but all < 5 bp) were present at all four junctions
between the inverted repeat (IR) and single-copy (SC)
regions in both Nandina and Platanus, as well as within
the rpoB gene (32 bp and 27 bp gaps, respectively) of each
genome.
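
The depth-of-coverage figures can be roughly reproduced from the read counts and read lengths in Table 1 and the genome lengths in Table 2. The sketch below is a back-of-envelope check, not the authors' procedure: it assumes that every read contributed to the assembly and that average depth is simply total sequenced bases divided by target length. The function and variable names are hypothetical.

```python
# Values taken from Table 1 (reads) and Table 2 (genome and IR lengths).
RUNS = {
    "Nandina":  {"reads": 31019, "mean_read_len": 103.6,
                 "genome_len": 156599, "ir_len": 26062},
    "Platanus": {"reads": 23743, "mean_read_len": 99.8,
                 "genome_len": 161791, "ir_len": 25066},
}


def mean_depth(n_reads: int, mean_read_len: float, target_len: int) -> float:
    """Average depth = total sequenced bases / length of the target."""
    return n_reads * mean_read_len / target_len


def depth_summary(run: dict) -> tuple[float, float]:
    """Return (depth counting one IR copy, depth counting both IR copies)."""
    one_ir_target = run["genome_len"] - run["ir_len"]  # IR copies collapse into one contig
    return (
        mean_depth(run["reads"], run["mean_read_len"], one_ir_target),
        mean_depth(run["reads"], run["mean_read_len"], run["genome_len"]),
    )


for name, run in RUNS.items():
    one_ir, both_irs = depth_summary(run)
    print(f"{name}: {one_ir:.1f}x (one IR), {both_irs:.1f}x (both IRs)")
```

Under these assumptions the estimates fall within about 0.05x of the reported 24.6x and 20.5x for Nandina and 17.3x and 14.6x for Platanus. Note that this simple estimate gives one genome-wide average and cannot reproduce the separate IR and SC depths in Table 1, which require per-region read counts.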

Genome characteristics
The plastid genomes of both Nandina and Platanus pos-
sess the typical genome structure observed in most
angiosperm plastids, with an IR region of ~25 kb separat-
ing large and small SC regions (Figs. 1, 2; Table 2) [25,26].
Neither genome is rearranged relative to Nicotiana
[27,28]. The plastid genomes of Nandina and Platanus
share essentially identical complements of coding genes,
each containing 30 tRNA genes, 4 rRNA genes, and 79
protein-coding genes (Table 3). Based on the presence of
internal stop codons, two pseudogenes (ycf15 and ycf68)
are present in the Platanus plastid genome. In Nandina the
latter locus is also present as a pseudogene, although ycf15
appears intact. Both of these genes have been frequently
reported as pseudogenes in other angiosperms [29,30],
and so their presence as pseudogenes in Nandina and Pla-
tanus is not surprising. Based on the presence of ACG start
codons in their DNA sequence, RNA editing appears to be
necessary for the proper translation of two genes in


Nandina (ndhD and rpl2) and three genes in Platanus
(ndhD, psbL, and rpl2), and likely occurs throughout each
genome on a broader scale [28,31].
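
The quadripartite structure described above implies a simple length identity (total = LSC + SSC + 2 x IR) that can be checked against Table 2. The sketch below also compares the assembled length in Table 1 with the genome length after collapsing the two IR copies into one. Reading that difference as the total gap length is an interpretation made here, and all names in the code are hypothetical.

```python
# Lengths in bp from Table 2; "assembled" is the combined run data length in Table 1.
GENOMES = {
    "Nandina":  {"total": 156599, "ir": 26062, "ssc": 19002,
                 "lsc": 85473, "assembled": 130503},
    "Platanus": {"total": 161791, "ir": 25066, "ssc": 19509,
                 "lsc": 92150, "assembled": 136335},
}


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Genome length implied by one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def unassembled_bases(total: int, ir: int, assembled: int) -> int:
    """Bases missing from the assembly when the two IR copies are collapsed."""
    return (total - ir) - assembled


for name, g in GENOMES.items():
    implied = quadripartite_length(g["lsc"], g["ssc"], g["ir"])
    missing = unassembled_bases(g["total"], g["ir"], g["assembled"])
    print(f"{name}: implied length {implied} bp, unassembled {missing} bp")
```

The implied lengths equal the reported totals for both genomes, and the unassembled remainders (34 bp and 390 bp) agree with the total gap lengths listed in Table 1.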

Accuracy of the GS 20 sequence
Conventional sequencing of the IR, IR/SC junctions, and
regions surrounding putative coding sequence errors
resulted in 46134 bp of comparison sequence in Nandina
and 45249 bp of comparison sequence in Platanus.
Observed error rates in the combined run data for these
regions are summarized in Table 4. Observed numbers of
errors in combined run data and lengths of conventional
sequence data that were used in the error calculations are
presented in Table 5. The overall observed error rate was
0.043% in Nandina and 0.031% in Platanus, and the com-
bined overall error rate for both genomes was 0.037%
(Table 4).
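
The overall error rates follow directly from the raw counts in Table 5. The sketch below assumes the rate is simply errors divided by the length of conventional comparison sequence, expressed in percent. The function name is hypothetical, and the contig-end counts (5 and 6 errors) are the ones quoted in the next paragraph of the text.

```python
# (errors, comparison length in bp) from Table 5.
OBSERVED = {"Nandina": (20, 46134), "Platanus": (14, 45249)}
# Additional errors within 50 bp of contig ends, as quoted in the text.
CONTIG_END_ERRORS = {"Nandina": 5, "Platanus": 6}


def error_rate_percent(errors: int, length_bp: int) -> float:
    """Observed error rate in percent."""
    return 100.0 * errors / length_bp


for name, (errors, length) in OBSERVED.items():
    core = error_rate_percent(errors, length)
    with_ends = error_rate_percent(errors + CONTIG_END_ERRORS[name], length)
    print(f"{name}: {core:.3f}% ({with_ends:.3f}% including contig-end errors)")

total_errors = sum(e for e, _ in OBSERVED.values())
total_length = sum(l for _, l in OBSERVED.values())
print(f"combined: {error_rate_percent(total_errors, total_length):.3f}%")
```

Rounded to three decimals, these reproduce the 0.043%, 0.031% and 0.037% reported in Table 4, as well as the 0.054% and 0.044% obtained when contig-end errors are included.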

Two types of errors were observed in the GS 20 combined
data sequence: errors associated with contig ends, and
insertions and deletions (indels), usually associated with
homopolymer runs. A small number of errors was present
within 50 bp of the ends of the combined data contigs in
both genomes (5 errors in Nandina and 6 errors in Plata-
nus). Including these errors increased overall error rates to
0.054% in Nandina and 0.044% in Platanus. However,
these errors were excluded from other error calculations
because they were expected as a result of the low depth of
coverage at contig ends, and because such errors were nec-
essarily checked by targeted Sanger sequencing when
bridging the gaps between contigs, unlike the remaining,
higher-coverage regions of the GS 20 assembly. All
remaining errors were indels, all but one of which (a C/G





Figure 1
Plastid genome map of Nandina domestica (Berberidaceae). Map of the plastid genome of Nandina domestica (Berberi-
daceae), showing annotated genes and introns. Asterisks (*) after the gene names indicate the presence of introns; the introns
themselves are denoted by white boxes within genes. Within the genome map, the inverted repeat regions (IRA and IRB) are
depicted by the solid black bars, and the large and small single-copy regions (LSC and SSC) are depicted by the solid gray bars.
Regions that were conventionally sequenced are indicated by the blue bars to the inside of the genome map.
Map key (gene categories): ATP synthase, ribosomal protein, ycf, cytochrome b6/f, rRNA, pseudogene, NADH dehydrogenase,
RNA polymerase, tRNA, photosystem, other.




Figure 2
Plastid genome map of Platanus occidentalis (Platanaceae). Map of the plastid genome of Platanus occidentalis (Platan-
aceae), showing annotated genes and introns. Asterisks (*) after the gene names indicate the presence of introns; the introns
themselves are denoted by white boxes within genes. Within the genome map, the inverted repeat regions (IRA and IRB) are
depicted by the solid black bars, and the large and small single-copy regions (LSC and SSC) are depicted by the solid gray bars.
Regions that were conventionally sequenced are indicated by the blue bars to the inside of the genome map.
Map key (gene categories): ATP synthase, ribosomal protein, ycf, cytochrome b6/f, rRNA, pseudogene, NADH dehydrogenase,
RNA polymerase, tRNA, photosystem, other.




Table 2: Basic characteristics of the Nandina and Platanus plastid genomes

                                                 Nandina    Platanus
total genome length                              156599     161791
IR length                                        26062      25066
SSC length                                       19002      19509
LSC length                                       85473      92150
total length of coding sequence (both IRs)       92284      91397
total length of coding sequence (one IR)         75763      75716
total length of noncoding sequence (both IRs)    64315      70394
total length of noncoding sequence (one IR)      54774      61009
overall G/C content                              38.3%      38.0%



Basic characteristics of the Nandina and Platanus plastid genomes. All lengths are given in base pairs (bp). IR = inverted repeat region; SSC = small
single-copy region; LSC = large single-copy region.


insertion in Platanus) were directly associated with
homopolymer runs (HRs). All HR-associated indel errors
fell into two overall classes (summarized in Table 6).
Approximately 85% of all errors associated with HRs
involved length variation in the number of bases in a
given HR. The remaining HR-associated errors involved
the insertion of a base identical in composition with a
given HR to a nearby, nonadjacent position. Because these
insertions appear similar to transpositions, they are
referred to as transposition-like insertions. An illustration
of a transposition-like insertion is provided in Figure 3A.
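
The two error classes described above can be made concrete with a short sketch. It is an illustration under simplifying assumptions, not the procedure used in this study: homopolymer runs are found by grouping identical adjacent bases, and a discrepancy is treated as a length-variant error when the reference and the read become identical once every run is collapsed to a single base. The function names and the example sequences are hypothetical.

```python
from itertools import groupby


def homopolymer_runs(seq: str, min_len: int = 2) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for every run of at least min_len bases."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def collapse_runs(seq: str) -> str:
    """Collapse every run of identical bases to a single base."""
    return "".join(base for base, _ in groupby(seq.upper()))


def is_length_variant_only(reference: str, read: str) -> bool:
    """True if the sequences differ only in the lengths of their runs."""
    return reference != read and collapse_runs(reference) == collapse_runs(read)


reference = "ACGTTTTTAAC"
print(homopolymer_runs(reference))

# One extra T inside the run: a length-variant error.
print(is_length_variant_only(reference, "ACGTTTTTTAAC"))

# A T inserted at a nearby, nonadjacent position: transposition-like, not length-variant.
print(is_length_variant_only("TTTTTGA", "TTTTTGTA"))
```

A transposition-like insertion changes the collapsed sequence because the extra base lands outside the run, which is what separates it from a simple miscount of run length in this sketch.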

Substitution errors were not definitively observed in
either genome, although two differences in base composi-
tion between the conventional and GS 20 sequence were
observed in the IR of Nandina. However, because the con-
ventional IR sequence for Nandina was derived from a sep-
arate individual than that used in the GS 20 sequencing, it
is likely that both differences result from interindividual
variation, especially given that both sites possessed high-
quality phred scores (> 40) in the GS 20 sequence. These
two putative substitutions were therefore not included in
error calculations.

Characteristics of the homopolymer runs associated with
observed and estimated errors are also summarized in
Table 6. More than 95% of all error-associated HRs in
both genomes were A/T runs rather than C/G runs. A χ²
test indicated that this A/T HR-associated error bias was
significantly higher than would be expected given the
observed A/T content of both genomes (P < 0.01 for both
genomes). Approximately half of all errors occurred in
regions characterized by groups of HRs of identical base
composition interrupted occasionally by a differing base
(these will be termed homopolymer run sets; an example
is illustrated in Figure 3B). The length distribution of HRs
associated with the observed errors is shown in Figure 4.
Approximately 60% of all errors were associated with runs
of 5 nucleotides or greater in both genomes. Of those
errors associated with runs less than 5 nucleotides, all


were associated with homopolymer run sets in Platanus, as
were 10 of 11 such errors in Nandina. All 10 of the HR set-
associated errors in Nandina occurred in a single 100-bp
extensive HR set within the trnV/rps12 spacer in the
inverted repeat. HR-associated insertion errors occurred
more frequently than deletion errors in both genomes
(~5x more frequently in Nandina and ~2.5x more fre-
quently in Platanus; Table 6).
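
The A/T bias test can be sketched as a chi-square goodness-of-fit calculation. The article does not state how the expected counts were derived, so the setup below is an assumption: expected counts are taken from the overall base composition of the Platanus genome (38.0% G/C, Table 2), and the observed counts are the 13 homopolymer-associated errors in Platanus, all in A/T runs (Tables 5 and 6). The function names are hypothetical, and with one expected count slightly below 5 the chi-square approximation is only rough.

```python
import math


def chi_square_gof(observed: list[float], expected: list[float]) -> float:
    """Pearson chi-square statistic for observed vs. expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


n_errors = 13                 # HR-associated errors in Platanus
at_fraction = 1.0 - 0.380     # A/T content implied by 38.0% G/C

observed = [13, 0]            # errors in A/T runs, errors in C/G runs
expected = [n_errors * at_fraction, n_errors * (1.0 - at_fraction)]

chi2 = chi_square_gof(observed, expected)
print(f"chi-square = {chi2:.2f}, P = {p_value_df1(chi2):.4f}")
```

Under these assumptions the statistic is close to 8 and P falls below 0.01, in line with the significance level reported in the text. A stricter analysis would use the base composition of homopolymer runs rather than of the whole genome as the null expectation.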

Nearly all insertion errors in both genomes occurred at
sites with low or very low GS 20 quality scores (Table 7).
Approximately 81% of all insertion errors had GS 20
phred-equivalent quality scores < 20, and approximately
93% of insertion errors had quality scores < 40. However,
one insertion error in each genome occurred at a site with
a quality score > 40 (Table 7).
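
For readers unfamiliar with quality scores, the thresholds used above translate into error probabilities through the standard Phred definition, Q = -10 log10(p). The sketch below applies that definition. Whether the GS 20 "phred-equivalent" scores are calibrated exactly this way is an assumption here, and the function names are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call is wrong, given its Phred score."""
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability."""
    return -10.0 * math.log10(p)


for q in (20, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about 1 error in {round(1 / p)} base calls")
```

On this scale a score below 20 means an expected error rate above 1 in 100, while a score above 40 means fewer than 1 in 10,000.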

Errors were not distributed uniformly throughout either
plastid genome (Table 4). The combined error rate across
both genomes was higher in the SC regions than in the IR
regions (0.047% in the SC regions and 0.029% in the IR
regions). Regions of putative noncoding sequence also
exhibited a higher error rate (~2x higher) than regions of
putative coding sequence across both genomes (hence-
forth, putative coding and noncoding sequence will be
referred to simply as coding and noncoding sequence).
Similarly, error rates for noncoding sequence partitioned
into IR and SC regions were higher than for coding
sequence when pooled across both genomes (Table 4).
The lowest overall error rates for both genomes were
observed in the IR coding regions while the highest overall
error rates were observed in the IR and SC noncoding
regions. In both genomes at least one relatively small
region contained a disproportionately large percentage of
the total errors. A region of approximately 100 bp in the
trnV/rps12 spacer of the Nandina genome contained 11
errors (representing 55.0% of all observed errors) in asso-
ciation with an extensive homopolymer run set. Likewise,
three errors were observed in the ycf1 gene in both
genomes (representing 15.0% of all errors in Nandina and




21.5% of all errors in Platanus), and three errors were also
present in rpoB of Platanus.

Table 3: List of genes present in the plastid genomes of Nandina and Platanus

Gene Class                          Genes

Ribosomal RNAs                      rrn4.5 (x2), rrn5 (x2), rrn16 (x2), rrn23 (x2)

Transfer RNAs                       trnH-GUG, trnK-UUU*, trnQ-UUG, trnS-GCU,
                                    trnG-UCC*, trnR-UCU, trnC-GCA, trnD-GUC,
                                    trnY-GUA, trnE-UUC, trnT-GGU, trnS-UGA,
                                    trnG-GCC, trnfM-CAU, trnS-GGA, trnT-UGU,
                                    trnL-UAA*, trnF-GAA, trnV-UAC*, trnM-CAU,
                                    trnW-CCA, trnP-UGG, trnI-CAU (x2), trnL-CAA (x2),
                                    trnV-GAC (x2), trnI-GAU* (x2), trnA-UGC* (x2), trnR-ACG (x2),
                                    trnN-GUU (x2), trnL-UAG

Photosystem I                       psaB, psaC, psaI

Photosystem II                      psbB, psbC, psbD, psbF, psbH, psbI, psbK, psbL, psbM, psbT, psbZ

Cytochrome b6/f                     petB*, petD*, petG, petN

ATP synthase                        atpB, atpE, atpF*, atpI

NADH dehydrogenase                  ndhA*, ndhB* (x2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK

Ribosomal proteins, large subunit   rpl2* (x2), rpl14, rpl16*, rpl20, rpl22, rpl23 (x2), rpl32, rpl33, rpl36

Ribosomal proteins, small subunit   rps3, rps4, rps7 (x2), rps11, rps12* (x2), rps14, rps16*, rps18, rps19

RNA polymerase                      rpoB, rpoC1*, rpoC2

Miscellaneous proteins              ccsA, cemA, clpP*, matK, rbcL

Hypothetical proteins               ycf2 (x2), ycf3*, ycf4,
                                    ycf15 (x2; present in Nandina; Ψ in Platanus)

List of genes present in the plastid genomes of Nandina and Platanus. Genes with an asterisk (*) contain introns; genes that are present as duplicate
copies due to their position within the inverted repeat regions are indicated as (x2). Ψ = pseudogene.

Discussion
Using the GS 20 System, we generated highly accurate and
essentially complete plastid genome sequences simulta-
neously for two angiosperms in a short period of time (~2
weeks, including chloroplast isolation and library prepa-
ration) and for a significant reduction in cost (~$4500 per
genome, including all library preparation and sequence
run costs) over traditional shotgun-based genome
sequencing methods. This savings in time and cost derives
largely from the relative ease of template preparation and
the extremely high throughput of the GS 20 System,
which avoids the use of bacterial vectors and multiple
rounds of expensive dye terminator-based sequencing
reactions, both of which are necessary and time-consum-
ing (taking several weeks to complete) components of




Sanger-based shotgun sequencing [32]. We estimate that
the GS 20 System requires approximately half the amount
of template preparation time (~16 hours) compared to
traditional Sanger-based methods (~36 hours) for plastid
genome sequencing. Moreover, plastid genome sequencing
using the GS 20 can be accomplished with two 4-hour
instrument runs, while obtaining plastid genomes with
Sanger-based shotgun sequencing requires several capillary
sequencer runs (using 384-well plates) per genome.
The small size of the plastid genome further contributes to
the savings accompanying the GS 20 by allowing for multiple
genomes to be sequenced simultaneously. The recent
release of larger GS 20 PicoTiterPlates with the capacity to
sequence up to four plastid genomes at a time promises to
drive down the cost of GS 20 plastid genome sequencing
even more, to ~$3500 per genome.

It is important to note that the savings observed in GS 20
sequencing of Nandina and Platanus also resulted from the
lower average coverage obtained for these two chloroplast
genomes (~20x) compared to that reported by Margulies
et al. [14] for de novo genome sequencing (~40x). A similar
reduction in coverage using Sanger-based sequencing
methods would also result in a significant cost savings,
perhaps still with a slightly higher sequence accuracy compared
to the GS 20 genome sequence. However, to take
full advantage of the ability to reduce coverage in Sanger-
based plastid genome sequencing would require the
sequencing of pure plastid DNA, something that can only
reliably be achieved at present by constructing whole-
genome bacterial artificial chromosome (BAC) libraries
and then strictly sequencing plastid DNA-containing
clones. The method of isolating plastid DNA using
sucrose-gradient based chloroplast isolation and RCA (see
Methods) that is employed in most angiosperm plastid
genome sequencing projects is significantly less expensive
than the construction of BAC libraries, although approximately
10-40% of the resulting RCA product consists of
non-plastid DNA [7]. This contamination penalty must be
overcome in Sanger-based sequencing through the addition
of extra sequencing capacity, thereby partially mitigating
against the significant savings that could be accrued
through reducing sequence coverage. The same contaminants
also reduce overall plastid genome coverage in GS
20 sequencing runs, but this does not impede the recovery
of essentially complete plastid genomes at high accuracy,
as evidenced by the sequencing of the Nandina and Platanus
genomes. Thus the GS 20 instrument seems a reasonable
and cost-effective alternative to Sanger-based
shotgun sequencing with respect to angiosperm plastid
genomics.

Table 4: Error rates for the GS 20 plastid genome sequence

Region               Nandina    Platanus    combined

overall genome       0.043      0.031       0.037
overall SC           0.030      0.064       0.047
overall IR           0.054      0.004       0.029
overall coding       0.027      0.029       0.028
overall noncoding    0.085      0.036       0.062
SC coding            0.036      0.055       0.046
SC noncoding         0.000      0.161       0.057
IR coding            0.018      0.000       0.009
IR noncoding         0.115      0.011       0.063

Observed error rates for the GS 20 plastid genome sequence of
Nandina, Platanus, and both genomes combined (given in percent).
These error rates are based on known GS 20 errors discovered in
regions of conventional comparison sequence. Only one copy of the
IR was included in error calculations.


The generation of GS 20 genome sequence comes at the
price of a slightly higher error rate (~0.04%) in compari-
son to Sanger sequencing (~0.01%) [16,17]. Nevertheless,
the small magnitude of this error is not enough to offset
the potential gains in time and cost efficiency of the GS 20
system. It is possible that the addition of extra GS 20


Table 5: Raw values used in error calculations

                     Nandina                   Platanus                  combined
Region               # errors   length (bp)    # errors   length (bp)    # errors   length (bp)

overall genome       20         46134          14         45249          34         91383
overall SC           6          20072          13         20183          19         40255
overall IR           14         26062          1          25066          15         51128
overall coding       9          33170          10         34006          19         67176
overall noncoding    11         12946          4          11243          15         24189
SC coding            6          16649          10         18325          16         34974
SC noncoding         0          3405           3          1858           3          5263
IR coding            3          16521          0          15681          3          32202
IR noncoding         11         9541           1          9385           12         18926


Raw values that were used in calculations of observed error in GS 20 plastid genome sequence. Length refers to the length of conventional
sequence data used in error calculations.




Table 6: Characteristics of GS 20 sequencing errors

                                                  Nandina    Platanus
proportion of length-variant HR errors                       61.5
proportion of TLI HR errors                                  38.5
proportion of A/T HR errors                                  100.0
proportion of C/G HR errors                                  0.0
proportion of errors associated with HR sets                 46.2
proportion of errors associated with HRs ≥ 5                 76.9
average length of HR associated with error                   6.5
proportion of HR-associated insertion errors                 69.2
proportion of HR-associated deletion errors                  30.8



Characteristics of observed GS 20 sequencing errors that were associated with homopolymer runs. All values are reported in percent. HR =
homopolymer run; TLI = transposition-like insertion (see text).


sequencing lanes on the PicoTiterPlates could reduce error
rates below that observed in Nandina and Platanus, partic-
ularly in regions of relatively lower coverage. However,
adding more lanes for each genome would drive up the
cost of sequencing by reducing the number of plastid
genomes that could be sequenced per plate (currently,
four plastid genomes per plate are possible with the recent
release of larger PicoTiterPlates). Depending on the aims
and fiscal resources of a given sequencing project, the
extra cost imparted by additional PicoTiterPlate space
may not outweigh the benefits of slightly lower error rates.
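
The overall error rates quoted in the text follow directly from the raw counts in Table 5 (errors divided by the length of conventional sequence checked). The short sketch below reproduces that arithmetic for the "overall genome" row only. The function and variable names are illustrative and are not part of any GS 20 software.

```python
# Raw values transcribed from the "overall genome" row of Table 5:
# (number of errors, length in bp of conventional sequence checked).
TABLE5_OVERALL = {
    "Nandina": (20, 46134),
    "Platanus": (14, 45249),
}


def error_rate_percent(errors: int, length_bp: int) -> float:
    """Return the observed error rate as a percentage of checked bases."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return 100.0 * errors / length_bp


def combined_rate_percent(rows: dict) -> float:
    """Pool errors and lengths across genomes before dividing."""
    total_errors = sum(errors for errors, _ in rows.values())
    total_length = sum(length for _, length in rows.values())
    return error_rate_percent(total_errors, total_length)


if __name__ == "__main__":
    for name, (errors, length_bp) in TABLE5_OVERALL.items():
        print(f"{name}: {error_rate_percent(errors, length_bp):.3f}%")
    print(f"combined: {combined_rate_percent(TABLE5_OVERALL):.3f}%")
```

The same function can be applied to any other row of Table 5 (for example SC versus IR, or coding versus noncoding) to compare regional error rates.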

The quantitative and qualitative aspects of the observed
error in the GS 20 genome sequence of Nandina and Plat-
anus are similar to those reported in published GS 20
sequence data. Although the error rates in Margulies et al.
[14] for de novo genome sequencing represent estimates
derived from consensus quality scores rather than
observed error rates derived from comparison to Sanger
sequence, the overall error rate reported for bacterial


GS 20

correct


genomes in [14] (0.04%) was similar to that observed in
both plastid genomes (0.043% in Nandina and 0.031% in
Platanus). Importantly, we achieved comparable error
rates to Margulies et al. [14] at approximately half the
coverage in [14]. This equivalent error rate of ~0.04% at lower
coverage is the result of recent improvements in the GS 20
assembly software (version 1.0.52.06); assembling the
Nandina and Platanus genomes using the older software
resulted in much higher error rates for both genomes
(0.07% for Nandina and 0.14% for Platanus). It is also
interesting to note that the lower average coverage of Pla-
tanus, which resulted directly from the higher percentage
of non-cpDNA contamination in the RCA product of Pla-
tanus (~44% contamination) vs. that of Nandina (~18%
contamination), did not result in a higher error rate com-
pared to Nandina (Table 4).
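
The link between contamination and coverage described above can be made explicit with a back-of-the-envelope calculation: only the non-contaminant fraction of the sequenced bases contributes to plastid coverage. The sketch below is an illustration under stated assumptions. Only the contamination percentages (~18% and ~44%) come from the text; the total base yield and genome length are hypothetical placeholders, and the two libraries were not actually sequenced to the same total yield.

```python
def plastid_coverage(total_bases: float,
                     contamination_fraction: float,
                     genome_length_bp: int) -> float:
    """Average plastid coverage after discarding non-cpDNA bases.

    total_bases            -- all bases sequenced for the library
    contamination_fraction -- share of bases that are not cpDNA (0 to 1)
    genome_length_bp       -- length of the plastid genome
    """
    if not 0.0 <= contamination_fraction <= 1.0:
        raise ValueError("contamination_fraction must be between 0 and 1")
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return total_bases * (1.0 - contamination_fraction) / genome_length_bp


if __name__ == "__main__":
    # HYPOTHETICAL yield and genome size, chosen only for illustration.
    total_bases = 5_000_000
    genome_length_bp = 160_000
    for label, contamination in (("~18% contamination", 0.18),
                                 ("~44% contamination", 0.44)):
        cov = plastid_coverage(total_bases, contamination, genome_length_bp)
        print(f"{label}: {cov:.1f}x")
```

Under this simple model, equal sequencing effort yields proportionally less plastid coverage as the contaminant fraction rises, which is the direction of the effect reported for Platanus.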

The high percentage of errors associated with HRs and HR
sets in Nandina and Platanus is similar to that reported in
previously published GS 20 genome sequence [14] and is
unsurprising given the known limitations of pyrosequencing
technology [15]. The relatively high percentage of
errors associated with these long HRs or HR sets also
imparted some of the nonuniformity observed in the
distributions of errors in both genomes. Likewise, the higher
frequency of such long homopolymer runs or sets in
noncoding plastid regions [33] explains the higher observed
error rates in noncoding regions of both genomes (Table
4). Finally, the A/T bias present in both genomes (Table 2)
does not appear to be solely responsible for the high
proportion of A/T-associated HR errors (Table 6). Whether
this excess of A/T HR errors is a byproduct of the GS 20
pyrosequencing technology is difficult to determine
without more extensive analyses of additional genome
sequences.


(A) GS 20:   TTGATCCAAAAAAAAAG
    correct: TTG:TCCAAAAAAAAAG
                ^
(B) TAAAAT AAAAAAAAAAAATACAAAGAAAAAGAAAAAG

Figure 3
Illustrations of a transposition-like insertion error and a homopolymer run set.
(A) Comparison of a hypothetical stretch of GS 20 genome sequence (top) vs. the
"correct" sequence (bottom) in order to illustrate an example of a transposition-like insertion error, in which a base identical
in composition to a given HR is inserted in a nearby, nonadjacent position. The transposition-like insertion error in the GS 20
sequence is indicated by the arrow; the colon (:) in the "correct" sequence indicates the absence of the A at the same position.
(B) Example of a homopolymer run set.


Table 6 (continued): combined values

                                                 combined
proportion of length-variant HR errors           84.8
proportion of TLI HR errors                      15.2
proportion of A/T HR errors                      97.0
proportion of C/G HR errors                      3.0
proportion of errors associated with HR sets     51.5
proportion of errors associated with HRs ≥ 5     57.6
average length of HR associated with error       5.8
proportion of HR-associated insertion errors     78.8
proportion of HR-associated deletion errors      21.2


Figure 4
Distribution of errors associated with homopolymer runs, as a function of homopolymer run length.
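
Because most observed errors cluster in homopolymer runs, it can be useful to list the runs in a sequence before deciding which regions to check by conventional sequencing. The following is a minimal sketch, not the procedure used in this study: it locates runs of at least a chosen length (5 by default, matching the threshold used in the error summary) and reports what fraction of them are A or T. The demo sequence is made up, in the style of Figure 3A, and the function names are illustrative.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 5) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, run length) for every run >= min_len."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One captured base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


def at_fraction(runs: List[Tuple[int, str, int]]) -> float:
    """Fraction of the listed runs that consist of A or T."""
    if not runs:
        return 0.0
    return sum(1 for _, base, _ in runs if base in "AT") / len(runs)


if __name__ == "__main__":
    # Made-up sequence: a 9-base A run and a 5-base C run.
    demo = "TTGTCC" + "A" * 9 + "G" + "C" * 5 + "T"
    runs = homopolymer_runs(demo)
    for start, base, length in runs:
        print(f"start={start} base={base} length={length}")
    print(f"A/T fraction of runs: {at_fraction(runs):.2f}")
```

Comparing the A/T fraction of runs in a genome with the A/T fraction of run-associated errors is one simple way to ask whether A/T runs are over-represented among errors.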

Another primary factor influencing the nonrandom distri-
bution of errors in both genomes was relative depth of
coverage in a particular region. The lower error rates
observed in the IR regions of Platanus probably resulted in
part from the essentially double coverage of the IR vs. SC
regions during GS 20 sequencing (although this relation-
ship does not hold in Nandina; Table 1). It is also likely
that the higher error rate observed in some areas of both
plastid genomes, as for example in ycf1 and rpoB, resulted
from lower GS 20 sequence coverage in these regions. The
ultimate cause of this lower coverage is unknown, but a
plausible explanation involves the relative underamplification
of these regions during the RCA reactions [34].

Table 7: GS 20 quality scores associated with insertion errors

GS 20 quality scores    # of insertion errors
                        Nandina   Platanus   combined

As we have demonstrated, the presence of a small amount
of error in GS 20 genome sequence is not a serious imped-
iment to the future use of the GS 20 System. Because
nearly all errors in GS 20 sequence involve HR-associated
length variation, the few errors that occur in protein-cod-
ing sequence can be easily identified because they induce
frameshifts. Such errors can then be corrected through
conventional sequencing. The GS 20 System should there-
fore prove to be an extremely useful tool in generating
sequence for plastid coding regions, with only minimal
finishing required to achieve essentially 100% accuracy.
The GS 20-derived noncoding sequence will also be
highly accurate, although a small number of errors will
remain in the unchecked noncoding regions. However,
the great majority of these errors will be associated with
long homopolymer runs or homopolymer run sets, which
are regions that are known to evolve rapidly via length
mutations [35,36]. Moreover, long homopolymer runs
are also prone to PCR errors [37-39], and therefore even
conventional sequencing cannot guarantee 100% accu-
racy in such regions. For these reasons short length varia-
tion in such areas is frequently removed from
phylogenetic sequence alignments, and the few remaining
unchecked errors in GS 20 sequence are therefore unlikely
to cause major problems should they be included in phy-
logenetic analyses.
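
The argument above rests on the fact that a length error inside a protein-coding region disrupts the reading frame. A minimal sketch of such a check is given below. It is an assumption-laden illustration rather than the DOGMA-based procedure used here: it treats a coding sequence as a plain string, uses the three standard stop codons, and ignores plastid-specific complications such as introns, RNA editing, and alternative start codons.

```python
from typing import List

# Standard stop codons (also used by the plant plastid genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frameshift_symptoms(cds: str) -> List[str]:
    """List simple signs that a coding sequence may contain an indel error."""
    cds = cds.upper()
    symptoms = []
    remainder = len(cds) % 3
    if remainder != 0:
        symptoms.append("length is not a multiple of 3")
    # Split into complete codons, dropping any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - remainder, 3)]
    # A terminal stop is expected only when the frame is intact.
    internal = codons[:-1] if remainder == 0 else codons
    for index, codon in enumerate(internal, start=1):
        if codon in STOP_CODONS:
            symptoms.append(f"premature stop codon {codon} at codon {index}")
            break
    return symptoms


if __name__ == "__main__":
    intact = "ATGAAAAAAGGGTAA"        # ATG AAA AAA GGG TAA
    with_insertion = "ATGAAAAAAAGGGTAA"  # one extra A in the homopolymer run
    print(frameshift_symptoms(intact))
    print(frameshift_symptoms(with_insertion))
```

Any gene flagged this way would still need to be confirmed by conventional sequencing, as described in the text, since a genuine pseudogene or an annotation error can produce the same symptoms.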

The GS 20 System thus appears to be a viable option for
plastid genome sequencing projects, especially given that
the strong conservation of gene content and order exhib-
ited by the Nandina and Platanus plastid genomes is
shared across the overwhelming majority of angiosperms
[25,26]. Perhaps the only significant limitation to the cur-
rent use of the GS 20 in angiosperm plastid genome
sequencing is posed by highly rearranged plastid
genomes. Such genomes are characterized by high num-
bers of repeats [26,40], which could drive misassemblies
during GS 20 sequence analysis due to short GS 20 read
lengths. However, because very few lineages of
angiosperms contain highly rearranged plastid genomes
(examples include the families Campanulaceae and Gera-
niaceae, as well as some legumes) [26], the GS 20 should
prove widely applicable to most angiosperms, as well as
land plants in general.


GS 20 quality scores: <20; 20-40; >40
14 8

Number of insertion errors in GS 20 combined sequence, as a
function of the GS 20 phred-equivalent quality score at the insertion
error site.


Conclusion
The utility of the GS 20 has already been demonstrated in
bacterial, mitochondrial, and environmental de novo
sequencing projects [14,18-22], and it shows promise for
a number of other high-throughput sequencing projects,
including transcriptome sequencing and SNP discovery.




Here we have successfully applied GS 20 pyrosequencing
technology to sequence the entire plastid genomes of two
distantly related angiosperms with a significant savings of
time and cost over traditional shotgun-based sequencing
methods. This savings was partially achieved by sequenc-
ing to a lower average coverage than that reported in other
GS 20 de novo genome sequencing projects. Nevertheless,
this ~20× level of coverage was sufficient for the near-complete
recovery of both plastid genomes with ~99.96%
accuracy. The GS 20 System may well usher in a new era
of rapid and inexpensive plastid genome sequencing,
thereby revolutionizing the fields of plant genetics and
phylogenetics by dramatically expanding the amount of
sequence data available to both.

Methods
Fresh leaf material of Nandina and Platanus was collected
on the campus of the University of Florida for chloroplast
isolation. Voucher specimens (Nandina, M. J. Moore 310;
Platanus, M. J. Moore 309) have been deposited in the her-
barium of the Florida Museum of Natural History (FLAS).
Purified chloroplasts were isolated from approximately
8.2 g of Nandina leaf material and 30.8 g of Platanus leaf
material, following the sucrose gradient protocol of
Jansen et al. [7]. Two 25-mL sucrose step gradients were
used for each species. The purified chloroplasts were lysed
in a solution containing 1.0 µL chloroplasts, 4.0 µL 1×
PBS, and 1.5 µL activated solution A [7]. The lysis reactions
were incubated on ice for 10 min, and then were
stopped using 3.5 µL of solution B [7]. The released
chloroplast DNA (cpDNA) was amplified via rolling circle
amplification (RCA) [41] using the Repli-G kit (Qiagen,
Inc., Valencia, CA, USA), following the manufacturer's
instructions. To assess the relative percentage of cpDNA
vs. nuclear DNA, RCA products were digested with EcoRI
and visualized on agarose gels following the protocol in
Jansen et al. [7].

GS 20 library construction and sequencing were per-
formed as described in the supplementary material and
methods of Margulies et al. [14] with slight modifications
as specified by 454 Life Sciences. Briefly, high molecular
weight DNA from the RCA reactions was sheared by neb-
ulization to a size range of 300-800 bp. DNA fragment
ends were repaired and phosphorylated using T4 DNA
polymerase and T4 polynucleotide kinase. Adaptor oligo-
nucleotides "A" and "B" supplied with the 454 Life Sci-
ences sequencing reagent kit were ligated to the DNA
fragments using T4 DNA ligase. Purified DNA fragments
were hybridized to DNA capture beads and clonally
amplified by emulsion PCR (emPCR). DNA capture beads
containing amplified DNA were deposited onto a 40 × 75
mm PicoTiterPlate equipped with an eight-lane gasket.
This gasket divides the plate into eight identical regions
(lanes) in which the pyrosequencing reactions occur during
GS 20 sequencing runs. Each species was initially
assigned four lanes on a single plate for a titration
sequencing run, which is a standard preliminary sequenc-
ing run in which the relative quality of GS 20 libraries is
assessed. Preliminary analyses of these data allowed for
the estimation of the number of additional GS 20
sequencing lanes (three for Nandina, five for Platanus) on
a second plate that were necessary to obtain approximately
20× coverage for each genome. This second
sequencing run will be referred to as the supplementary
run.
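
The lane estimate described above can be sketched as a simple proportional calculation: coverage per lane is estimated from the titration run, and the shortfall relative to the target is divided by that value. This is a hypothetical reconstruction of the reasoning, not the authors' actual procedure. It assumes every lane yields the same coverage, and the titration coverage values in the demo are invented for illustration.

```python
import math


def extra_lanes_needed(titration_coverage: float,
                       titration_lanes: int,
                       target_coverage: float = 20.0) -> int:
    """Estimate additional lanes needed to reach the target coverage.

    Assumes coverage scales linearly with the number of lanes.
    """
    if titration_coverage <= 0 or titration_lanes <= 0:
        raise ValueError("titration coverage and lanes must be positive")
    per_lane = titration_coverage / titration_lanes
    shortfall = target_coverage - titration_coverage
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / per_lane)


if __name__ == "__main__":
    # HYPOTHETICAL titration results from four lanes each.
    for label, coverage in (("library A", 10.0), ("library B", 8.0)):
        lanes = extra_lanes_needed(coverage, titration_lanes=4)
        print(f"{label}: {coverage}x from 4 lanes, {lanes} more lanes needed")
```

Rounding up with `math.ceil` errs on the side of exceeding the target, which matters because lanes can only be added in whole units.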

DNA sequence data from the titration and supplementary
runs were combined in a single assembly for each species
using version 1.0.52.06 of the GS 20 Newbler sequence
assembly software. These data are referred to as the com-
bined run data. The combined data contigs were then
imported into DOGMA [42] to determine their approxi-
mate positions within the plastid genome. Based on this
information, putatively adjacent contigs were examined
in Sequencher 4.2 (GeneCodes Corp., Ann Arbor, MI,
USA) in order to unite any contigs where short sequence
overlap at the ends went undetected in the initial assem-
bly. Gaps between contigs were bridged by designing cus-
tom primers near the ends of the GS 20 contigs for PCR
and conventional capillary-based sequencing.

To estimate the accuracy of the GS 20 sequence, custom
primers were designed to check all possible frameshift
errors encountered in the preliminary DOGMA annota-
tion of the GS 20 sequence of both genomes using PCR
and conventional sequencing. In addition, the four junc-
tions between the inverted repeat (IR) and single-copy
(SC) regions of both genomes were sequenced conven-
tionally, as was the entire IR region for both genomes. The
IR regions were amplified using the recently described
ASAP method [43], which utilizes a set of 27 overlapping
primer pairs that are designed to obtain extensive cover-
age of the IR across the phylogenetic diversity of
angiosperms. RCA product derived from the same chloro-
plast isolations used in GS 20 sequencing was utilized for
all amplifications involving the single-copy regions of
Nandina and for all regions of Platanus. The IR region of
Nandina was amplified from a separate total DNA isola-
tion from a different individual collected at Kanapaha
Botanical Gardens in Gainesville, Florida (A. Dhingra
s.n.).

The completed genome sequences were annotated using
DOGMA and are available in GenBank.

Authors' contributions
MJM isolated chloroplasts and performed RCA reactions,
performed PCR for all regions outside of the IR, annotated
the genomes, performed all error analyses, and drafted the
manuscript. AD performed ASAP PCR for the IR regions.
DES and PSS participated in the design of the study. RS
performed laboratory protocols for GS 20 sequencing.
WGF coordinated the GS 20 sequencing, assembled the
raw data, and contributed to writing the Methods. KMF
participated in the coordination of the study. All authors
read and approved the final manuscript.


Acknowledgements
The authors thank Bob Jansen for teaching the senior author plastid isola-
tion techniques and Tim Chumley for help with RCA and creating plastid
genome maps. We are also grateful to Robert Ferl, Beth Laughner, and all
the members of the Ferl and Hannah labs for providing lab space and equip-
ment for plastid isolations. We also thank three anonymous reviewers for
their helpful comments. This work was completed as part of the
Angiosperm Tree of Life project, funded by the National Science Foundation
(EF-0431266 to DES, PSS, et al.).

References
1. Olmstead RG, Palmer JD: Chloroplast DNA systematics: A
review of methods and data analysis. Am J Bot 1994,
81(9):1205-1224.
2. Savolainen V, Chase MW: A decade of progress in plant molecular
phylogenetics. Trends Genet 2003, 19(12):717-724.
3. Bungard RA: Photosynthetic evolution in parasitic plants:
insight from the chloroplast genome. Bioessays 2004,
26(3):235-247.
4. Maliga P: Plastid transformation in higher plants. Annu Rev Plant
Biol 2004, 55:289-313.
5. Grevich JJ, Daniell H: Chloroplast genetic engineering: Recent
advances and future perspectives. Crit Rev Plant Sci 2005,
24(2):83-107.
6. Dhingra A, Daniell H: Chloroplast genetic engineering via org-
anogenesis or somatic embryogenesis. In Arabidopsis Protocols
Volume 323. 2nd edition. Edited by: Salinas J, Sanchez-Serrano JJ.
Totowa, New Jersey, USA, Humana Press; 2005:525.
7. Jansen RK, Raubeson LA, Boore JL, dePamphilis CW, Chumley TW,
Haberle RC, Wyman SK, Alverson AJ, Peery R, Herman SJ, Fourcade
HM, Kuehl JV, McNeal JR, Leebens-Mack J, Cui L: Methods for
obtaining and analyzing whole chloroplast genome
sequences. Methods Enzymol 2005, 395:348-384.
8. Angiosperm Tree of Life project [http://www.flmnh.ufl.edu/angiospermATOL/]
9. Green Tree of Life project [http://ucjeps.berkeley.edu/TreeofLife]
10. Comparative chloroplast genomics project [http://evogen.jgi.doe.gov/second_levels/chloroplasts/jansen_project_home/chlorosite.html]
11. Metzker ML: Emerging technologies in DNA sequencing.
Genome Res 2005, 15(12):1767-1776.
12. Shendure J, Mitra RD, Varma C, Church GM: Advanced sequenc-
ing technologies: methods and goals. Nat Rev Genet 2004,
5(5):335-344.
13. Ronaghi M, Uhlen M, Nyren P: A sequencing method based on
real-time pyrophosphate. Science 1998, 281(5375):363, 365.
14. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA,
Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM,
Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando
SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR,
Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB,
McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant
R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW,
Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang
SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM: Genome
sequencing in microfabricated high-density picolitre reac-
tors. Nature 2005, 437(7057):376-380.
15. Ronaghi M: Pyrosequencing sheds light on DNA sequencing.
Genome Res 2001, 11(1):3-11.
16. Ewing B, Green P: Base-calling of automated sequencer traces
using phred. II. Error probabilities. Genome Res 1998,
8(3):186-194.


17. Meldrum D: Automation for genomics, part one: preparation
for sequencing. Genome Res 2000, 10(8):1081-1092.
18. Poinar HN, Schwarz C, Qi J, Shapiro B, Macphee RD, Buigues B,
Tikhonov A, Huson DH, Tomsho LP, Auch A, Rampp M, Miller W,
Schuster SC: Metagenomics to paleogenomics: large-scale
sequencing of mammoth DNA. Science 2006,
311(5759):392-394.
19. Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M,
Peterson DM, Saar MO, Alexander S, Alexander EC, Rohwer F:
Using pyrosequencing to shed light on deep mine microbial
ecology under extreme hydrogeologic conditions. BMC
Genomics 2006, 7:57.
20. Goldberg SM, Johnson J, Busam D, Feldblyum T, Ferriera S, Friedman
R, Halpern A, Khouri H, Kravitz SA, Lauro FM, Li K, Rogers YH,
Strausberg R, Sutton G, Tallon L, Thomas T, Venter E, Frazier M, Ven-
ter JC: A Sanger/pyrosequencing hybrid approach for the
generation of high-quality draft assemblies of marine microbial
genomes. Proc Natl Acad Sci USA 2006, 103(30):11240-11245.
21. Hofreuter D, Tsai J, Watson RO, Novik V, Altman B, Benitez M, Clark
C, Perbost C, Jarvie T, Du L, Galan JE: Unique features of a highly
pathogenic Campylobacter jejuni strain. Infect Immun 2006,
74(8):4694-4707.
22. Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR,
Arrieta JM, Herndl GJ: Microbial diversity in the deep sea and
the underexplored "rare biosphere". Proc Natl Acad Sci USA
2006, 103(32):12115-12120.
23. Soltis DE, Soltis PS, Endress PK, Chase MW: Phylogeny and Evolution
of Angiosperms. Sunderland, Massachusetts, USA, Sinauer
Associates; 2005.
24. Anderson CL, Bremer K, Friis EM: Dating phylogenetically basal
eudicots using rbcL sequences and multiple fossil reference
points. Am J Bot 2005, 92:1737-1748.
25. Palmer JD: Plastid chromosomes: structure and evolution. In
Cell Culture and Somatic Cell Genetics of Plants, vol 7A, The Molecular Biol-
ogy of Plastids Edited by: Hermann RG. Vienna Academic Press, Inc.;
1991:5-53.
26. Raubeson LA, Jansen RK: Chloroplast genomes of plants. In Plant
Diversity and Evolution: Genotypic and Phenotypic Variation in Higher Plants
Edited by: Henry RJ. Cambridge, Massachusetts, USA CABI Publish-
ing; 2005:45-68.
27. Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsuba-
yashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K,
Ohto C, Torazawa K, Meng BY, Sugita M, Deno H, Kamogashira T,
Yamada K, Kusuda J, Takaiwa F, Kato A, Tohdoh N, Shimada H, Sug-
iura M: The complete nucleotide sequence of the tobacco
chloroplast genome: its gene organization and expression.
EMBO J 1986, 5(9):2043-2049.
28. Wakasugi T, Tsudzuki T, Sugiura M: The genomics of land plant
chloroplasts: Gene content and alteration of genomic information
by RNA editing. Photosynth Res 2001, 70(1):107-118.
29. Schmitz-Linneweber C, Maier RM, Alcaraz JP, Cottet A, Herrmann
RG, Mache R: The plastid chromosome of spinach (Spinacia
oleracea): complete nucleotide sequence and gene organization.
Plant Mol Biol 2001, 45(3):307-315.
30. Steane DA: Complete Nucleotide Sequence of the Chloroplast
Genome from the Tasmanian Blue Gum, Eucalyptus
globulus (Myrtaceae). DNA Res 2005, 12(3):215-220.
31. Tsudzuki T, Wakasugi T, Sugiura M: Comparative analysis of RNA
editing sites in higher plant chloroplasts. J Mol Evol 2001, 53(4-
5):327-332.
32. Rogers YH, Venter JC: Genomics: massively parallel sequenc-
ing. Nature 2005, 437(7057):326-327.
33. Powell W, Morgante M, Andre C, McNicol JW, Machray GC, Doyle
JJ, Tingey SV, Rafalski JA: Hypervariable microsatellites provide
a general source of polymorphic DNA markers for the chloroplast
genome. Curr Biol 1995, 5(9):1023-1029.
34. Lasken RS, Egholm M: Whole genome amplification: abundant
supplies of DNA from precious samples or clinical specimens.
Trends Biotechnol 2003, 21(12):531-535.
35. Strauss BS: Frameshift mutation, microsatellites and mismatch
repair. Mutat Res 1999, 437(3):195-203.
36. Provan J, Powell W, Hollingsworth PM: Chloroplast microsatellites:
new tools for studies in plant ecology and evolution.
Trends Ecol Evol 2001, 16(3):142-147.







37. Zirvi M, Nakayama T, Newman G, McCaffrey T, Paty P, Barany F:
Ligase-based detection of mononucleotide repeat
sequences. Nucleic Acids Res 1999, 27(24):e40.
38. Clarke LA, Rebelo CS, Goncalves J, Boavida MG, Jordan P: PCR
amplification introduces errors into mononucleotide and
dinucleotide repeat sequences. Mol Pathol 2001, 54(5):351-353.
39. Liepelt S, Kuhlenkamp V, Anzidei M, Vendramin GG, Ziegenhagen B:
Pitfalls in determining size homoplasy of microsatellite loci.
Mol Ecol Notes 2001, 1(4):332-335.
40. Cosner ME, Jansen RK, Palmer JD, Downie SR: The highly rear-
ranged chloroplast genome of Trachelium caeruleum (Cam-
panulaceae): multiple inversions, inverted repeat expansion
and contraction, transposition, insertions/deletions, and several
repeat families. Curr Genet 1997, 31(5):419-429.
41. Dean FB, Nelson JR, Giesler TL, Lasken RS: Rapid amplification of
plasmid and phage DNA using Phi 29 DNA polymerase and
multiply-primed rolling circle amplification. Genome Res 2001,
11(6):1095-1099.
42. Wyman SK, Jansen RK, Boore JL: Automatic annotation of
organellar genomes with DOGMA. Bioinformatics 2004,
20(17):3252-3255.
43. Dhingra A, Folta KM: ASAP: amplification, sequencing & annotation
of plastomes. BMC Genomics 2005, 6:176.

