Group Title: Genome biology
Title: Lateral gene transfer and ancient paralogy of operons containing redundant copies of tryptophan-pathway genes in Xylella species and in heterocystous cyanobacteria
Full Citation
Permanent Link:
 Material Information
Title: Lateral gene transfer and ancient paralogy of operons containing redundant copies of tryptophan-pathway genes in Xylella species and in heterocystous cyanobacteria
Series Title: Genome biology
Physical Description: Book
Language: English
Creator: Xie, Gary
Bonner, Carol
Brettin, Tom
Gottardo, Raphael
Keyhani, Nemat
Jensen, Roy
Publication Date: 2003
Abstract: BACKGROUND: Tryptophan-pathway genes that exist within an apparent operon-like organization were evaluated as examples of multi-genic genomic regions that contain phylogenetically incongruous genes and coexist with genes outside the operon that are congruous. A seven-gene cluster in Xylella fastidiosa includes genes encoding the two subunits of anthranilate synthase, an aryl-CoA synthetase, and trpR. A second gene block, present in the Anabaena/Nostoc lineage, but not in other cyanobacteria, contains a near-complete tryptophan operon nested within an apparent supraoperon containing other aromatic-pathway genes. RESULTS: The gene block in X. fastidiosa exhibits a sharply delineated low-GC content. This, as well as bias of codon usage and 3:1 dinucleotide analysis, strongly implicates lateral gene transfer (LGT). In contrast, parametric studies and protein tree phylogenies did not support the origination of the Anabaena/Nostoc gene block by LGT. CONCLUSIONS: Judging from the apparent minimal amelioration, the low-GC gene block in X. fastidiosa probably originated by LGT at a relatively recent time. The surprising inability to pinpoint a donor lineage still leaves room for alternative, albeit less likely, explanations other than LGT. On the other hand, the large Anabaena/Nostoc gene block does not seem to have arisen by LGT. We suggest that the contemporary Anabaena/Nostoc array of divergent paralogs represents an ancient ancestral state of paralog divergence, with extensive streamlining by gene loss occurring in the lineage of descent representing other (unicellular) cyanobacteria.
General Note: Periodical Abbreviation:Genome Biol.
General Note: Start page R14
General Note: M3: 10.1186/gb-2003-4-2-r14
 Record Information
Bibliographic ID: UF00100054
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access:
Resource Identifier: issn - 1465-6906




Lateral gene transfer and ancient paralogy of operons containing
redundant copies of tryptophan-pathway genes in Xylella species
and in heterocystous cyanobacteria
Gary Xie*†, Carol A Bonner*, Tom Brettin†, Raphael Gottardo*, Nemat O
Keyhani* and Roy A Jensen*†‡

Addresses: *Department of Microbiology and Cell Science, University of Florida, PO Box 110700, Gainesville, FL 32611, USA. †BioScience
Division, Los Alamos National Laboratory, Los Alamos, NM 87544, USA. ‡Department of Chemistry, City College of New York, New York, NY
10031, USA.

Correspondence: Nemat Keyhani. E-mail:

Published: 29 January 2003
Genome Biology 2003, 4:R14
Received: 27 September 2002
Revised: 4 November 2002
Accepted: 26 November 2002
The electronic version of this article is the complete one and can be
found online at
© 2003 Xie et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media
for any purpose, provided this notice is preserved along with the article's original URL.


Background: Tryptophan-pathway genes that exist within an apparent operon-like organization
were evaluated as examples of multi-genic genomic regions that contain phylogenetically
incongruous genes and coexist with genes outside the operon that are congruous. A seven-gene
cluster in Xylella fastidiosa includes genes encoding the two subunits of anthranilate synthase, an
aryl-CoA synthetase, and trpR. A second gene block, present in the Anabaena/Nostoc lineage, but
not in other cyanobacteria, contains a near-complete tryptophan operon nested within an
apparent supraoperon containing other aromatic-pathway genes.

Results: The gene block in X. fastidiosa exhibits a sharply delineated low-GC content. This, as
well as bias of codon usage and 3:1 dinucleotide analysis, strongly implicates lateral gene transfer
(LGT). In contrast, parametric studies and protein tree phylogenies did not support the
origination of the Anabaena/Nostoc gene block by LGT.

Conclusions: Judging from the apparent minimal amelioration, the low-GC gene block in
X. fastidiosa probably originated by LGT at a relatively recent time. The surprising inability to
pinpoint a donor lineage still leaves room for alternative, albeit less likely, explanations other than
LGT. On the other hand, the large Anabaena/Nostoc gene block does not seem to have arisen by
LGT. We suggest that the contemporary Anabaena/Nostoc array of divergent paralogs represents
an ancient ancestral state of paralog divergence, with extensive streamlining by gene loss
occurring in the lineage of descent representing other (unicellular) cyanobacteria.

Background
Lateral gene transfer
Lateral gene transfer (LGT) has been generally accepted for
some time, as exemplified by the endosymbiotic hypothesis
of organelle origin [1,2]. Nevertheless, a long-standing
background of general conviction has held that LGT is rare,
especially between distant organisms. However, the modern
era of genomics has been accompanied by increasingly
numerous claims that LGT is frequent [3-6], and there now
seems little doubt that LGT exerts a significant influence



R14.2 Genome Biology 2003, Volume 4, Issue 2, Article R14 Xie et al.

upon evolutionary histories. Indeed, it has even been
asserted that vertical evolutionary patterns of descent might
be impossibly masked by rampant events of LGT and that, in
fact, instead of bifurcating phylogenetic trees, a reticulate
(net-like) pattern exists [7-9]. On the other hand, others
urge a more balanced perspective, pointing out that alterna-
tive explanations for apparent cases of LGT have not always
been considered [10-14]. The rationale for explanations
other than LGT for genealogical incongruities (such as
hidden paralogies and reconstruction artifacts) has been
presented in comprehensive detail by Glansdorff [15].

Woese [16] contends that the rRNA tree is a valid represen-
tation of organismal genealogy, that LGT was rampant only
before the initial bifurcation of the universal phylogenetic
tree, and that LGT has become progressively more restricted
as a function of elapsed evolutionary time. Using the
aminoacyl-tRNA synthetases as an example of the modular-
type entities asserted to be most amenable to LGT, Woese
concludes that the genealogical trace of vertical gene flow is
readable, despite a significant jumbling influence of LGT. If
correct, this allows the optimistic viewpoint that the complex
interplay of vertical gene descent and LGT can be deciphered
to yield correct evolutionary histories, provided that suffi-
ciently detailed studies are done.

Approaches for detection of LGT events are either phyloge-
netic or parametric. Phylogenetic approaches depend on
congruence of phylogenetic trees. Aside from technical diffi-
culties of inferring high-quality trees, conflicts between trees
under comparison are not necessarily due to LGT, but can
arise from coincidental loss of divergent paralogs in differ-
ent, widely spaced lineages or from convergent evolution.
Parametric approaches for detection of LGT include (but are
not limited to) the analysis of nucleotide composition, dinu-
cleotide frequencies and codon usage biases. Lawrence and
Ochman [17] used such parametric analysis to identify a set
of Escherichia coli genes (17.5% of the genome) having puta-
tive origin by LGT, and this has stimulated much discussion.
High rates of both false positives and false negatives have
been asserted by others [18,19], but this is tempered by pre-
sentation of a rationale for why phylogenetic and different
parametric methods detect different gene subsets [20-22]. A
consensus seems to be emerging that the most proficient
attempts to reconstruct evolutionary events will employ a
multifaceted approach that combines tree inference with
parametric analysis in a biological context [21,22]. Lawrence
and Ochman [21] provide a number of examples of how the
context of biological information can assist the analysis, and
this approach is implemented herein.
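
As a minimal illustration of the simplest parametric measure named above, nucleotide composition, the following Python sketch computes the GC content of a sequence and the GC content restricted to third codon positions. The function names and the toy sequence are illustrative assumptions; they are not taken from the analysis pipeline used in this study.

```python
def gc_percent(seq):
    """Percent of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    informative = [base for base in seq if base in "ACGT"]
    if not informative:
        return 0.0
    return 100.0 * sum(base in "GC" for base in informative) / len(informative)


def gc3_percent(cds):
    """GC percent at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return gc_percent(cds[2:usable:3])


# Hypothetical toy sequence, for illustration only.
toy_cds = "ATGGCATTA"
print(gc_percent(toy_cds), gc3_percent(toy_cds))
```

Comparing such per-gene values against the genome-wide value is the kind of composition test that the parametric approach relies on; real analyses would also need to handle ambiguous bases and gene boundaries according to the annotation in use.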

If each member of a linked group of genes is already repre-
sented elsewhere in a genome, their origin by LGT is a dis-
tinct possibility, as their transfer en bloc as an operon unit
would have required only a single evolutionary event.
During an ongoing analysis of the genomic distribution of
tryptophan-pathway genes, we observed two such cases, that
is, where one set of genes was phylogenetically congruent, in
contrast to the incongruence of redundant gene copies that
were linked to one another. We have evaluated the evidence
for the alternative possibilities of LGT or ancient paralogy,
as reported here.

A block of Trp-pathway genes in Xylella
The phylogenetic incongruence of trpR, a regulatory gene in
Xylella fastidiosa, led to recognition of a low-GC gene block
in X. fastidiosa. The tryptophan repressor (TrpR) is quite
limited in its phylogenetic distribution, being consistently
present only within the enteric lineage, as shown in the
protein tree of Figure 1. Here TrpR of Shewanella putrefa-
ciens marks the outlying sequence of the enteric lineage
(shown in gray). Outside the boundaries of the enteric
lineage, only Coxiella burnetii, X. fastidiosa and two
chlamydial species are thus far known to possess trpR. The
distribution of trpR in the latter three lineages is phylogenetically
incongruent because they are widely spaced from one
another on the 16S rRNA tree.

In Chlamydia trachomatis and Chlamydophila psittaci,
trpR is positioned near structural genes of tryptophan
biosynthesis, but no indication of recent origin by LGT of
genes in this region was obtained [23]. X. fastidiosa trpR is
separated by three genes from two structural genes of tryp-
tophan biosynthesis. These latter genes do not appear to be
essential for the primary task of tryptophan biosynthesis as
all seven genes of tryptophan biosynthesis are represented
elsewhere in the genome within one of two operons. Thus, in
X. fastidiosa the incongruous phylogenetic position of trpR,
the redundancy of the trp-linked genes encoding trpAa and
trpAb, and the distinct phylogenetic incongruence of the
latter gene pair all supported a reasonable possibility of
origin by LGT.

The tryptophan supraoperon of Anabaena/Nostoc
All cyanobacteria possess each of the seven Trp-pathway
genes at dispersed loci, and individual trees of proteins cor-
responding to these dispersed genes are phylogenetically
congruent. Although this generalization also applies to
Anabaena/Nostoc, this latter lineage is unique among
cyanobacteria in its possession of an additional set of Trp-
pathway genes (lacking only trpC) that coexist within an
apparent operon. As shown in Figure 2, both Anabaena and
Nostoc exhibit the same relative order of operonic trp genes:
trpAa•trpAb → trpD → trpEa → trpEb → trpB. trpAa and
trpAb are fused, as indicated in Figure 2 with a filled bar and
in the text by the bullet in the notation: trpAa•trpAb. In
Anabaena, qor (encoding NADPH: quinone reductase) has
been inserted between trpD and trpEa. Another qor paralog
is present elsewhere in the genome of Anabaena. Nostoc
also has two qor paralogs, but neither resides within the
tryptophan operon. Other cyanobacteria lack qor homologs
altogether. In Nostoc, tyrPi (encoding tyrosinase) has been
inserted between trpEa and trpEb. All other cyanobacteria,
including Anabaena, lack tyrPi. The two trp operons are
less compact than frequently observed elsewhere, and relatively
large intergenic spacing exists, especially in N. punctiforme.
The only instance of translational coupling is
between trpAa•trpAb and trpD in N. punctiforme.

Figure 1
Protein tree for TrpR. Bootstrap values are shown at internal branch positions as percentages (1,000 replicates).

The tryptophan operons appear to be nested within what
could be a larger unit of transcription that is reminiscent of
what has been called a supraoperon in Bacillus subtilis [24].
The genes comprising the supraoperon of B. subtilis are
aroG → aroB → aroH → trpAaBDCEbEa → hisHb → tyrAp
→ aroF. A hierarchy of internal promoters and terminators
exists for differential control of the B. subtilis supraoperon.
The Anabaena/Nostoc linkage group is additionally reminiscent
of the B. subtilis supraoperon in the presence of aroB
and tyrA. Although B. subtilis does not have aroAIβ represented
in its supraoperon (as do Anabaena and Nostoc),
aroAIβ is the homology class (of three possible DAHP synthase
homologs distributed in nature [25-27]) that is utilized
by B. subtilis. A number of supraoperon gene insertions have
occurred outside of the trp operon as well. These differ for
Anabaena and Nostoc as depicted in Figure 2. Anabaena
has aph and a hypothetical gene (open
reading frame (ORF)) inserted between aroB and the
trpAa•trpAb fusion. The aph gene encodes an uncharacterized
protein of the defined alkaline phosphatases (metalloenzyme
superfamily) (group COG1524 in the COGs
database). Among cyanobacteria, only Nostoc has homologs
of these two Anabaena genes, although they are not inserted
in the Nostoc supraoperon. Nostoc has frnE (encoding a
thiol-disulfide isomerase) inserted between tyrA(p) and aroB.

[Figure 1 tree, enteric-lineage taxon labels: Shewanella putrefaciens, Yersinia pestis, Enterobacter cloacae, Salmonella paratyphi, Enterobacter aerogenes, Klebsiella pneumoniae, Escherichia coli, Salmonella typhimurium, Pasteurella multocida, Haemophilus influenzae, Haemophilus actinomycetemcomitans]

[Figure 2 graphic: chromosome maps of Synechocystis sp. PCC 6803 (3,573,470 bp) and Anabaena sp. PCC 7120 (6,413,771 bp), a presence/absence chart of aro and trp genes for Nostoc punctiforme, and zoom-in schematics of the Anabaena (478,304-492,165 bp) and Nostoc supraoperons with intergenic spacings in bp]

Figure 2
Genomic organization of aromatic-pathway genes in cyanobacteria. Genes relevant to the common pathway segment, the tryptophan branch, the
tyrosine branch, and the phenylalanine branch are color-coded, as indicated. A system for uniform genomic naming of Trp-pathway genes or domains has
been used as previously implemented [23,57]. Fused catalytic domains are joined by solid black linkers. Gene positions along the entire chromosomes of
Synechocystis sp. PCC 6803 and Anabaena sp. PCC 7120 are shown. The qualitative presence or absence of genes in Nostoc punctiforme, an unfinished
genome, is also indicated. Detailed zoom-in schematics are shown for the gene organizations within the supraoperons of Anabaena and Nostoc, regions
spanning 13,000-14,000 bp. In the latter regions, intergenic spacing is shown, with negative values indicating the extent of genic overlap.

Four subclasses of tyrA are defined according to the substrate
specificities of the TyrA gene product: tyrAp, specific
for prephenate; tyrAa, specific for arogenate; tyrAc, accepts
either prephenate or arogenate; and tyrA(p), has broad specificity
but exhibits a distinct preference for prephenate.
Among all cyanobacteria, only Nostoc possesses frnE.

In their genomes outside the supraoperon boundaries,
Nostoc and Anabaena possess a full complement of genes for
biosynthesis of tryptophan, tyrosine and phenylalanine. Even
these extra-supraoperonic genes of the Anabaena/Nostoc
lineage are represented by multiple paralogs in many cases
(Figure 2). If one considers the single-copy assemblage of
aromatic-pathway genes present in the Synechocystis/Syne-
chococcus/Prochlorococcus lineage as a fundamental com-
plement of genes common to all cyanobacteria, the
Anabaena/Nostoc genomic repertoire contains substantial
redundancy. Thus, Anabaena has two additional extra-operonic
paralogs of aroAIβ and trpD. In addition to extra-operonic,
free-standing copies of trpAa and trpAb, a second
fused gene (trpAa•trpAb_2) encoding the two domains of
anthranilate synthase is present in Anabaena. Nostoc has
two extra-operonic copies of aroAIβ, aroB and trpD. All
cyanobacteria possess AroA of the Iβ class (aroAIβ). While
this is also true of the Anabaena/Nostoc lineage (in fact,
having multiple copies), both Anabaena and Nostoc possess
an additional gene encoding AroA of the Iα class (aroAIα).
All cyanobacteria possess a tyrA gene of the arogenate-
specificity class (tyrAa), but the Anabaena/Nostoc supra-
operons also possess a tyrA gene deemed to be a
cyclohexadienyl dehydrogenase [28] with a favored speci-
ficity for prephenate (tyrA(p)) (C.A.B., R.A.J., N.K. and
McNally A., unpublished observation).

Figure 3
Fitch diagram [29] illustrating the origin and distribution of orthologs and
paralogs of trpD in cyanobacteria. Paralogs, originating by gene duplication
events (Dp1 and Dp2), track back to a horizontal line, whereas orthologs,
originating by speciation (Sp1, Sp2, Sp3 and Sp4), track back to an
inverted Y. The six trpD genes of Nostoc (Npu) and Anabaena (Asp)
comprise a paralog set, and each of those comprises a four-member
ortholog set with respect to the trpD genes from P. marinus (Pmu),
Synechococcus sp. (Syn), and Synechocystis sp. (Ssp).

Figure 3 shows an evolutionary scenario, using a Fitch
diagram [29], that depicts the suggested origin of trpD paralogs
via two gene duplication events (Dp1 and Dp2) that
preceded the node of speciation divergence (Sp4) to Nostoc
(Npu) and Anabaena (Asp). Consistent with the latter conclusion,
Npu TrpD_1 exhibits greater identity with its
ortholog Asp TrpD_1 than with its paralogs Npu TrpD_2
and Npu TrpD_3. Likewise, Npu TrpD_2 and Npu TrpD_3
exhibit greater identity with their Asp orthologs than with
their Npu paralogs.
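
The identity comparison used above to separate orthologs from paralogs can be sketched as follows. This is only an illustration under stated assumptions: the sequences are short invented strings (not real TrpD sequences), the inputs are assumed to be pre-aligned to equal length, and the helper names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over aligned columns where neither has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def closest_partner(query, candidates):
    """Name of the candidate sequence with the highest identity to the query."""
    return max(candidates, key=lambda name: percent_identity(query, candidates[name]))


# Invented toy sequences, for illustration only.
npu_trpd_1 = "MKTAYLQ"
candidates = {
    "Asp TrpD_1": "MKTAYLE",  # stands in for the ortholog in the other species
    "Npu TrpD_2": "MRSAYIQ",  # stands in for a paralog in the same genome
}
print(closest_partner(npu_trpd_1, candidates))
```

If duplication preceded speciation, as proposed in Figure 3, each copy is expected to find its closest partner in the other species rather than among the paralogs of its own genome.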

Since the basic single-copy repertoire of dispersed aro-
matic-pathway genes shown in Figure 2 for Synechocystis
(Ssp) is representative of other cyanobacteria such as Syne-
chococcus (Syn) and Prochlorococcus (Pmu) and is also
present at dispersed extra-operonic loci of Anabaena and
Nostoc, an obvious possibility would seem to be that the
genes of the supraoperon originated by LGT in a common
ancestor of Anabaena and Nostoc. If so, speciation was fol-
lowed by different species-specific gene-insertion events.
Because the divergence of Anabaena and Nostoc was rela-
tively recent, evidence for LGT by analysis of GC content,
codon usage, or dinucleotide frequency might be forthcom-
ing. A number of distinctive properties of the supraoperon
gene block represent items of biological context (as dis-
cussed by Lawrence and Ochman [21]) that potentially
could provide excellent tracking clues about the identity of
the putative donor in LGT. These include the overall gene
organization of the trp operon, for which many microbial
patterns are known; the extremely rare gene order of trpEa
trpEb instead of the typical order trpEb trpEa; the fusion of
genes encoding the alpha (trpAa) and beta (trpAb) subunits
of anthranilate synthase, a fusion that exists in only a
limited number of other taxa; and the presence of operonic
genes exhibiting distinctive homology subtypes (aroAIβ
and tyrA(p)).


Results and discussion
Lateral gene transfer of a block of genes in Xylella
The trpR gene in X. fastidiosa was previously noted [23] to
have anomalously low GC content, relative to that of the
genome. Low-GC blocks of genes have been attributed to
LGT before. For example, argF (present in E. coli K-12 but
not in other strains) is bracketed with unidentified high-GC
(59%) genes that together comprise a distinctive block of
LGT genes [30]. The flanking genes of trpR were accordingly
analyzed for GC content. Figure 4 shows that trpR in X. fas-
tidiosa is at one end of a block of seven genes, all of which
have a distinctively low GC content (highlighted in green),
compared to the flanking genes (highlighted in yellow).
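
The delineation of the low-GC block can be illustrated with a short Python sketch that scans per-gene GC values for the longest run of consecutive genes falling well below the genomic value. The GC percentages and the genomic value of 53.7% are those reported in Figure 4; the ordering of genes along the map, the 10-point threshold (`min_drop`) and the function name are assumptions made for illustration and are not the procedure used in this study.

```python
GENOME_GC = 53.7  # X. fastidiosa genomic GC%, as given in Figure 4

# (gene label, GC%) in assumed map order, using the values shown in Figure 4.
GENES = [
    ("ftsY", 51), ("hypothetical", 51), ("mcrA", 52), ("tatD", 52),
    ("trpAa", 39), ("trpAb", 38), ("acl", 37), ("hypothetical", 37),
    ("transcriptional regulator", 37), ("iron-sulfur flavoprotein", 40),
    ("trpR", 34),
    ("sodA", 50), ("hypothetical", 50), ("nadA", 49), ("nadB", 55),
]


def longest_low_gc_run(genes, genome_gc, min_drop=10.0):
    """Return (start, end) indices, end exclusive, of the longest run of
    consecutive genes whose GC% is at least min_drop below genome_gc."""
    best = (0, 0)
    start = None
    # A sentinel entry at the genomic GC value closes a run that reaches the end.
    for i, (_, gc) in enumerate(list(genes) + [("", genome_gc)]):
        is_low = genome_gc - gc >= min_drop
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


start, end = longest_low_gc_run(GENES, GENOME_GC)
print([name for name, _ in GENES[start:end]])
```

With these values the run spans the seven genes from trpAa to trpR, matching the block highlighted in Figure 4; a different threshold would be a judgment call and should be checked against the flanking genes.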

If the block of low-GC genes in Xylella really reflects an alien
origin, differences in dinucleotide frequencies might be
expected, as such context biases differ from organism to
organism. A 3:1 dinucleotide bias analysis algorithm (the third
nucleotide position in a codon followed by the first nucleotide
position in the succeeding codon) was utilized, as it is the
dinucleotide that is least restricted by amino-acid preference
and codon usage in individual genes [31]. The 3:1 dinucleotide
frequencies were calculated for the entire block of
low-GC genes, as well as for the immediately flanking genes.
These results, presented in Figure 5 for a set of four selected
dinucleotides, show that dinucleotide frequencies of the
flanking genes were within a variance of about 4% from
genomic frequencies, whereas the low-GC block of genes
exhibited recognizably greater variances from the genomic
dinucleotide frequencies of X. fastidiosa.
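
The 3:1 dinucleotide count described above can be sketched in Python as follows. This is a minimal illustration that assumes an in-frame coding sequence; the function names are hypothetical and the toy sequence is invented, so it should not be read as the implementation of Hooper and Berg [31].

```python
from collections import Counter
from itertools import product


def dinucleotide_3_1_frequencies(cds):
    """Percent frequency of each 3:1 dinucleotide: the third base of a codon
    paired with the first base of the next codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter()
    # i indexes the third position of each codon that has a successor codon.
    for i in range(2, usable - 1, 3):
        pair = cds[i:i + 2]
        if set(pair) <= set("ACGT"):
            counts[pair] += 1
    total = sum(counts.values())
    return {
        a + b: (100.0 * counts[a + b] / total if total else 0.0)
        for a, b in product("ACGT", repeat=2)
    }


def deviation_from_genome(gene_freqs, genome_freqs):
    """Per-dinucleotide difference (gene minus genome), in percentage points."""
    return {d: gene_freqs[d] - genome_freqs[d] for d in gene_freqs}


# Invented toy sequence: codons ATG GCT TAA give the 3:1 pairs GG and TT.
print(dinucleotide_3_1_frequencies("ATGGCTTAA")["GG"])
```

In practice the genomic reference frequencies would be pooled over all annotated coding sequences, and short genes give noisy estimates because only one dinucleotide is sampled per codon boundary.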

The co-variation of 3:1 dinucleotide frequencies of genes in the
low-GC gene block of Xylella with the corresponding genomic
frequencies was also evaluated using the Spearman rank cor-
relation coefficient. Table 1 illustrates the data used to
compare the Xylella trpR gene and the Xylella genome. A
p-value of 0.730 indicated that the 3:1 dinucleotide frequencies
of trpR from Xylella did not exhibit significant co-variation
with the frequencies of the Xylella genome. In contrast, the
3:1 dinucleotide frequencies of trpR from Chlamydia tra-
chomatis did exhibit significant co-variation with the fre-
quencies characteristic of the C. trachomatis genome
(p-value = 0.031). These analyses are consistent with occurrence
of recent LGT in X. fastidiosa.
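
For readers unfamiliar with the statistic, the Spearman rank correlation coefficient can be computed as the Pearson correlation of the ranks, with tied values receiving their average rank. The sketch below is a plain-Python illustration under that standard definition; the two short vectors are invented placeholders (a real comparison would use the 16 dinucleotide frequencies of a gene and of its genome), and the p-values reported in the text are not reproduced here.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented placeholder frequencies, for illustration only.
genome_freqs = [4.0, 6.5, 5.2, 9.1, 7.3]
gene_freqs = [4.4, 6.0, 5.9, 8.7, 7.0]
print(spearman_rho(genome_freqs, gene_freqs))
```

A significance test (for example the p-value returned alongside the coefficient by `scipy.stats.spearmanr`) is needed to decide whether the gene co-varies with its genome; the coefficient alone does not settle that.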

What is the origin of the LGT gene block?
Gene organization is subject to constant change. For pre-
cisely this reason, the overall gene organization within the
low-GC gene block might implicate a donor organism
because the LGT event is inferred to be recent. Because the
enteric lineage is a reasonable source of the LGT gene
block, it is pertinent that the gene organization around
trpR is highly conserved in the enteric lineage. Without
exception, trpR in the enteric lineage is preceded upstream
by a gene encoding soluble lytic murein transglycosylase
(slt). hemK is usually positioned directly downstream,
except for the Haemophilus actinomycetemcomitans/
H. influenzae/Pasteurella multocida grouping (where the
downstream gene encodes a monofunctional biosynthetic
peptidoglycan transglycosylase (mtgA)). No genomes of
the enteric lineage were found to possess trpR in a
context of flanking genes that resembled the X. fastidiosa
gene organization.

[Figure 4 graphic, upper panel: gene map of the X. fastidiosa region between genome coordinates 1,817,702 and 1,827,702, with intergenic spacings (bp) of 250, -21, -13, -1, 128, 6, 198 and 262. Xylella fastidiosa genome: 53.7% GC]

Figure 4
Block of genes acquired by lateral gene transfer (LGT) in Xylella fastidiosa. The gene map at the top shows the LGT block of genes with a green bar. The
gene block begins with trpAa on the left and ends with trpR on the right. Intergenic spacing is given. The vertical pale green bar in the lower panel shows
the corresponding genes from bottom to top. The GC% for each gene is shown, and the gene products are named. The hypothetical protein belongs to
pfam00583, the acetyltransferase (GNAT) family. The low-GC gene block of the X. fastidiosa genome corresponds to gene numbers XF1914 (trpAa)-XF1920 (trpR).
The LGT-block of Xylella genes conceivably could have orig-
inated from a donor similar to a common ancestor of the
chlamydiae before the massive gene reduction associated
with the chlamydial lifestyle. This would be consistent with
the low GC content of both the chlamydial genome and the
LGT-block of genes, as well as with the observation that
chlamydiae and Xylella are the only two known taxa where

trpR is positioned near structural genes of the tryptophan
pathway. Direct comparison of chlamydial trpAa and trpAb
genes with those of the Xylella operon is not possible
because all chlamydial genomes thus far mapped lack trpAa
and trpAb [23]. In this context, sequencing of genomes from
closely related free-living relatives of the chlamydiae could
be informative. The currently available chlamydial genomes
also lack other genes of the low-GC block.

C. burnetii was also considered as a possible source of the
low-GC gene block in X. fastidiosa because it possesses
trpR. This potential LGT event seems ruled out because trpR
is not near any structural genes encoding TrpAa and TrpAb
in C. burnetii; C. burnetii TrpAa and TrpAb are not close to
the corresponding X. fastidiosa enzymes on phylogenetic
trees; and C. burnetii lacks the remaining genes in the low-GC
gene block of X. fastidiosa.

Figure 4, lower panel (genes listed from top to bottom of the panel)

Genes encoding                                           GC%
L-Aspartate oxidase (nadB)                               55%
Quinolinate synthetase A (nadA)                          49%
Hypothetical protein                                     50%
Superoxide dismutase (sodA)                              50%
Tryptophan-pathway repressor                             34%
Iron-sulfur flavoprotein                                 40%
Transcriptional regulator                                37%
Hypothetical protein                                     37%
Aryl-CoA ligase                                          37%
Anthranilate synthase (small subunit)                    38%
Anthranilate synthase (large subunit)                    39%
sec-independent protein (tatD)                           52%
5-Methylcytosine-specific restriction enzyme A (mcrA)    52%
Hypothetical protein                                     51%
Signal recognition particle receptor (ftsY)              51%

[Figure 5 graphic: bar chart for Xylella fastidiosa showing per-gene variation of four selected 3:1 dinucleotides (including TA); gene labels include nadB, nadA, sodA, tatD, mcrA, hypo and ftsY]

Figure 5
Three-to-one dinucleotide analysis of the putative LGT-block of X. fastidiosa genes shown in Figure 4. For easier viewing, four of the 16 dinucleotide
combinations have been selected. The frequency variation of each gene is shown as positive variation (upward-pointing bars) or negative variation
(downward-pointing bars) with respect to the average genomic frequencies (set to a value of zero at the midline), the absolute values of which can be
seen in Table 1. treg, transcriptional regulator; hypo, hypothetical gene.

If LGT accounts for the low-GC gene block in X. fastidiosa,
how recent was this event? Presumably, it was sufficiently
recent that significant amelioration to the genomic GC
content has not yet occurred. The closest sequenced genome
to Xylella is Xanthomonas. Genomes representing two
species of the latter genus have been sequenced, and both
lack the low-GC gene block. Therefore, the putative LGT
event occurred some time after lineage divergence of Xylella
and Xanthomonas. On the other hand, LGT presumably has
predated speciation in the Xylella genus as all three strains
of Xylella in the National Center for Biotechnology Informa-
tion (NCBI) database possess the low-GC gene block. The T2
score of Hooper and Berg [31] measures the covariance of
3:1 dinucleotide signatures, and is designed to recognize very
recent imports of alien genes by LGT. T2 scores calculated
for the low-GC gene block of X. fastidiosa were not above the
required threshold for very recent gene imports.

What is the function of the low-GC block of genes in Xylella?
Within the low-GC block, trpR is separated by four ORFs
from genes encoding the two subunits of anthranilate

synthase (trpAa and trpAb). These probably do not function
for general tryptophan biosynthesis since paralogs of these
genes, which exhibit a phylogenetically congruent context of
gene organization, exist elsewhere in the genome (Figure 6).
The latter genes are located within either of two separate
operon clusters (Figure 6) with the GC content characteristic
of X. fastidiosa. The GC-content values for the latter genes:
trpAa, trpAb, trpB, trpC, trpD, trpEa, and trpEb are 52%,
49%, 54%, 55%, 51%, 59% and 55%, respectively. Further-
more, Figure 6 shows that the organization of the full com-
plement of trp-pathway genes into two operons in
X. fastidiosa is similar or identical to that of some of its
nearest neighbors on the 16S rRNA tree, although the
Xylella operons exhibit atypically large intergenic spacings.
None of these neighbors possesses the low-GC block of
Xylella genes illustrated in Figure 4. Hence, the two operons
shown in Figure 6 can be inferred to be responsible for
primary tryptophan biosynthesis throughout this clade.
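
The contrast between the two sets of anthranilate synthase genes can be made explicit with a small calculation. The sketch below uses only values stated in this paper (the GC percentages quoted above for the seven operon genes, the Figure 4 values for the low-GC copies, and the genomic value of 53.7%); the variable names are illustrative, and assigning the large subunit to trpAa and the small subunit to trpAb follows the naming used in the text.

```python
from statistics import mean

GENOME_GC = 53.7  # X. fastidiosa genomic GC%, from Figure 4

# GC% of the seven trp genes located in the two native operons (from the text).
native_operon_gc = {
    "trpAa": 52, "trpAb": 49, "trpB": 54, "trpC": 55,
    "trpD": 51, "trpEa": 59, "trpEb": 55,
}

# GC% of the redundant copies inside the low-GC block (from Figure 4).
low_gc_block_gc = {"trpAa": 39, "trpAb": 38}

native_mean = mean(list(native_operon_gc.values()))
paralog_gap = {
    gene: native_operon_gc[gene] - low_gc_block_gc[gene]
    for gene in low_gc_block_gc
}

print(round(native_mean, 1), round(native_mean - GENOME_GC, 1), paralog_gap)
```

The native operon genes average close to the genomic GC content, whereas each low-GC copy sits more than ten percentage points below its native paralog, which is the pattern the text interprets as a foreign origin for the block.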

Genes encoding the two anthranilate synthase subunits
(trpAa and trpAb) and aryl-CoA ligase (acl) surely belong to
an operon, as translational coupling is evident from the
overlap of start and stop codons (Figure 4). Acl exhibits
strong similarity to coenzyme F390 synthetase of
methanogenic archaea, as well as to phenylacetate-CoA
ligase of E. coli. As Xylella does not appear to make the F420
cofactor that is the substrate of F390 synthetase, the function
of Acl is likely to be closer to phenylacetate-CoA ligase.
The aromatic ring is highly stable, and CoA thioesterification
can provide chemical activation, allowing cleavage of the
aromatic ring, as exemplified by catabolism of benzoate,
4-hydroxybenzoate, and anthranilate [32]. Because acl is
tightly organized with trpAa and trpAb, it seems feasible
that anthranilate might be the substrate of Acl. An anthranilate-CoA
ligase has been described recently in Azoarcus
evansii by Schühle et al. [33]. The Xylella Acl exhibited
greater identity with phenylacetate-CoA ligase of E. coli than
with anthranilate-CoA ligase of A. evansii, but a given substrate
specificity within homology groups often can be associated
with different subgroupings [25,34].

If anthranilate is indeed the substrate of Acl in Xylella, it
would be a futile cycle if anthranilate were formed biosynthetically,
only to be subsequently catabolized. Therefore,
it seems more likely that the activation of anthranilate
could be a step in the formation of a siderophore or antibiotic
compound that is assembled by a nonribosomal
peptide synthetase mechanism (see Quadri et al. [35] and
references therein for numerous examples). Pyochelin
from Pseudomonas aeruginosa exemplifies an iron
siderophore whose peptide-based synthesis depends on
CoA-activated salicylate (closely related to anthranilate) as
a starter unit [36].

While it appears likely that trpR, aryl-CoA ligase, trpAa and
trpAb belong to a common functional unit, the possible roles
of the remaining three genes downstream of acl are problematic
at the present time.

Table 1

Statistical test of co-variation of 3:1 dinucleotide frequencies of
trpR and its cognate genome

                    Xylella fastidiosa      Chlamydia trachomatis
3:1 Dinucleotide    Genome    trpR          Genome    trpR
(label lost)        4.5       4.3           9.3       9.6
(label lost)        5.0       6.5           8.4       9.6
(label lost)        4.2       16.3          8.6       9.6
(label lost)        11.4      7.6           10.2      3.2
(label lost)        4.5       4.3           4.7       4.3

Xfa genome/trpR p-value = 0.031; Ctr genome/trpR p-value = 0.730.

The Anabaena/Nostoc gene blocks
The large gene blocks in Anabaena and Nostoc that begin
with aroAIβ and end with tyrA(p) exhibited GC ratios that
were similar to that of the host genome (Table 2). This is not
necessarily inconsistent with their possible origin by LGT
because the GC ratio of a putative donor genome could have
been coincidentally similar to that of Anabaena and Nostoc.

Accordingly, the 3:1 dinucleotide frequencies of the
aroAII-tyrA(p) gene block and the immediately flanking genes were
analyzed, but these dinucleotide frequencies also did not
suggest LGT. Figure 7 shows that dinucleotide frequencies
did not deviate more than 5% from the genomic frequencies
across the aroAII-tyrA(p) gene block. This contrasts with the
distinctly greater deviation of 3:1 dinucleotide frequencies
within the low-GC gene block of Xylella, which is shown on
the same scale as the Anabaena data.
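
The comparison described above can be illustrated with a minimal Python sketch. It is not the CODONW implementation; the function names are hypothetical, and the deviation is assumed here to be expressed in percentage points (block frequency minus genomic frequency), which the text does not specify.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def dinuc_3_1_freqs(orfs):
    """Relative frequency of each 3:1 dinucleotide in a set of in-frame ORFs.

    A 3:1 dinucleotide is the third base of one codon followed by the
    first base of the next codon within the same ORF.
    """
    counts = Counter()
    for orf in orfs:
        seq = orf.upper()
        # Codon k occupies seq[3k:3k+3]; pair seq[3k+2] with seq[3k+3].
        for i in range(2, len(seq) - 1, 3):
            pair = seq[i:i + 2]
            if set(pair) <= set(BASES):  # skip ambiguous bases
                counts[pair] += 1
    total = sum(counts.values())
    return {a + b: (counts[a + b] / total if total else 0.0)
            for a, b in product(BASES, repeat=2)}

def deviation(block_orfs, genome_orfs):
    """Block minus genome frequency, in percentage points (an assumption)."""
    block = dinuc_3_1_freqs(block_orfs)
    genome = dinuc_3_1_freqs(genome_orfs)
    return {d: 100.0 * (block[d] - genome[d]) for d in block}
```

Under this convention, a gene block resembling its host would yield values close to zero for all 16 dinucleotides, whereas a compositionally alien block would show larger positive and negative deviations.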

Codon usage was analyzed throughout the gene block and
also failed to implicate LGT. Figure 8 exemplifies this with a
comparison of the pair of TrpAa domains in Anabaena, one
encoded from within the gene block and the other outside.
As Xylella also possesses a TrpAa pair, one encoded from
within the low-GC gene block and the other outside, the
analyses of these are also included in Figure 8 as a kind of
positive control. In Anabaena (two bars on the right of each
panel) the codon usage for leucine, serine, arginine, glycine,
valine and proline was very similar for the TrpAa domain of
TrpAa•TrpAb in the aroAII-tyrA(p) gene block and for the
stand-alone TrpAa protein. This contrasts with the results
obtained for the two Xylella TrpAa proteins, one in the low-GC
gene block (on the far left) and the other (second bar in
each panel) encoded by the gene in the trpAaAbBD operon
(Figure 6). Thus, in contrast to the Anabaena TrpAa pair,
the Xylella TrpAa pair exhibited distinctly different codon
usage. Although this result is certainly consistent with an
explanation of LGT in Xylella, one cannot exclude the
possibility that the different functional roles of TrpAa
domains might be associated with differing intra-genomic
patterns of codon usage that are not yet well characterized [37].
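
The per-amino-acid comparison summarized in Figure 8 can be sketched as follows. This is an illustrative reimplementation, not the program used in the study; the function name is hypothetical, the input is assumed to be an in-frame coding sequence, and only the six amino acids named in the text are tabulated (standard genetic code).

```python
from collections import Counter

# Synonymous codon families (standard genetic code) for the six
# amino acids compared in the text.
FAMILIES = {
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Val": ["GTT", "GTC", "GTA", "GTG"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
}

def codon_usage_percent(cds):
    """Percent usage of each synonymous codon within its amino-acid family."""
    seq = cds.upper()
    # Count complete codons only; a trailing partial codon is ignored.
    codons = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    usage = {}
    for aa, family in FAMILIES.items():
        total = sum(codons[c] for c in family)
        usage[aa] = {c: (100.0 * codons[c] / total if total else 0.0)
                     for c in family}
    return usage
```

Applying the function to each member of a paralog pair and comparing the resulting percentages family by family gives the kind of side-by-side view shown for the TrpAa pairs.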

[Figure 6 graphic: 16S rRNA subtree and trp-gene maps for Xylella fastidiosa, Nitrosomonas europaea, Ralstonia metallidurans, Burkholderia fungorum and Bordetella parapertussis; the intergenic-spacing values cannot be assigned to genes in the extracted text]

Figure 6
Organization of trp-pathway genes in X. fastidiosa and its nearest phylogenetic neighbors. The position of the organisms indicated on a 16S rRNA subtree
is shown at the far left. To enhance the presentation, the trp-gene acronyms are shortened. Thus, trpAa is shown as Aa, etc. Intergenic spacing is
indicated. dmt refers to a putative DNA methyltransferase. trpAa in Nitrosomonas europaea and trpC in Bordetella parapertussis are located in other
chromosomal positions, unlinked to other trp-pathway genes. X. fastidiosa and N. europaea, but not the other organisms shown in the figure, possess truA
(encoding tRNA pseudouridine synthase A) upstream of trpC. truA and trpC are translationally coupled with 31 -bp overlaps in X. fastidiosa and N.
europaea, respectively. The gene organization shown for a given organism is identical to that of the other organisms shown in parentheses as follows: Ralstonia
metallidurans (R. solanacearum), Burkholderia fungorum (B. pseudomallei, B. mallei), and B. parapertussis (B. pertussis, B. bronchiseptica). R. solanacearum, in
addition to the genes shown, has adjacent paralogs of trpB and trpD located on a large plasmid. The trpAaAbBD and trpCEbEa operons of the X. fastidiosa
9a5c genome correspond to gene numbers XF0210-XF0213 and XF1374-XF1376, respectively.

Analysis of protein trees
We evaluated whether the closest BLAST hits, using as
queries the amino-acid sequences corresponding to the
operonic genes of Anabaena or Nostoc, would be with other
cyanobacteria (and therefore consistent with origin by gene
duplication) or with another taxon grouping (consistent with
LGT). In either case, one would expect that the sequences
encoded by the operonic genes of Anabaena would be the
best matches for the operonic genes of Nostoc, as was indeed
the case. For all of the operonic Anabaena/Nostoc Trp-pathway
proteins used as queries, homolog sequences from
other cyanobacteria (Synechocystis, Synechococcus,
Prochlorococcus) were the remaining top hits returned in
the BLAST queue. As BLAST hits must be considered imper-
fect indicators of nearest-neighbor homologs [38], the con-
clusion that the operonic trp-pathway genes are of
cyanobacterial lineage origin was confirmed more rigorously
by examination of extensive trees (available upon request)
constructed for each trp protein of Anabaena and Nostoc.
For the Trp-pathway proteins, all the cyanobacterial proteins
clustered together, regardless of whether they were Anabaena
or Nostoc paralogs or whether they were the singly repre-
sented proteins of Synechocystis, Synechococcus, or
Prochlorococcus. The same result was obtained for AroAII
protein trees. All the redundant genes exhibited identity
relationships that suggested their origin by one or more
gene-duplication events in the common ancestor of Anabaena and
Nostoc; that is, exactly as diagrammed in Figure 3.

A different result was obtained for genes encoding AroB and
TyrA. AroB sequences in nature are rather divergent. All of
the cyanobacterial AroB proteins form a compact cluster in
the AroB tree (including the non-operonic Anabaena/Nostoc
aroB genes), except for those encoded by the Anabaena/
Nostoc supraoperons. The supraoperonic AroB proteins
occupy a tree position that is not particularly close to other
AroB proteins (the closest matches being on the order of
30-35% identity with some enteric bacteria). A similar

situation applies to TyrA(p). All cyanobacteria possess the
arogenate dehydrogenase specificity class (denoted TyrAa) of
the TyrA superfamily. The additional TyrA(p) present only in
Anabaena and Nostoc and located as the carboxy-terminal
gene of the supraoperon exhibits identities of 39-43% with
the TyrA(p) proteins of some enteric bacteria. These results
for supraoperonic aroB and tyrA(p) could be consistent with
LGT, but with no clear donor candidates available. On the
other hand, origin as ancient paralogs is also a possibility.
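
Identity values such as those quoted above can be computed from a pairwise alignment in more than one way. The sketch below uses one common convention, stated here as an assumption: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name is hypothetical, and BLAST reports identity over the full alignment length, so its values can differ slightly.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity between two aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are excluded from
    both numerator and denominator (an assumed convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)
```

For distant relationships in the 30-43% range discussed here, the choice of denominator matters enough that the convention should be reported alongside the value.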

The trpAa•trpAb fusion
A particularly fortuitous gene that could favor or disfavor the
hypothesis of LGT of the aroAII-tyrA(p) gene block in
Anabaena/Nostoc is trpAa•trpAb, a fusion corresponding to
two genes that are usually separate (free-standing). As only a
limited number of trpAa•trpAb fusions are known, possible
LGT donors can be evaluated. Organisms known to possess
the trpAa•trpAb fusion are listed at the top of Table 3.
Another small group of trpAa•trpAb fusions is known,
which are dedicated to phenazine biosynthesis and which
form a distinct cluster. These are denoted trpAa•trpAb_phz
in Table 3. Thus far, the trpAa•trpAb_phz fusions are limited
to species of Pseudomonas and Streptomyces. pabAa and
pabAb are homologs of trpAa and trpAb, and the distribution
of fusions involving these domains is also listed in Table 3 to
give a general sense of the frequencies of such gene fusions. A
variety of data (G.X. and R.A.J., unpublished observation)
indicates that equivalent fusions often arise independently of
one another in widely spaced lineages.

Table 2

Did operonic genes originate by LGT?

Gene product    GC (%)  First BLAST hit            % Identity  Second BLAST hit             % Identity

Anabaena*
AroAII          -       Nostoc punctiforme         -           Nostoc punctiforme           -
Gpm I           -       Nostoc punctiforme         -           Anabaena sp.                 -
-               -       Nostoc punctiforme         -           Anabaena sp.                 -
-               -       Nostoc punctiforme         -           Anabaena sp.                 -
-               -       Enterococcus faecalis      -           Streptomyces coelicolor      -
-               -       Nostoc punctiforme         -           Anabaena sp.                 -
-               -       Nostoc punctiforme         -           Anabaena sp.                 -
-               -       Nostoc punctiforme         -           Streptomyces coelicolor      -
-               -       Nostoc punctiforme         -           Streptomyces coelicolor      -
-               -       Nostoc punctiforme         -           Nostoc punctiforme           -
-               -       Nostoc punctiforme         -           Yersinia pseudotuberculosis  -

Nostoc†
AroAII          46      Anabaena sp.               80          Nostoc punctiforme           79
TrpB            47      Anabaena sp.               77          Nostoc punctiforme           64
TrpEb           46      Anabaena sp.               88          Nostoc punctiforme           88
TyrP_1          44      Nostoc punctiforme         56          Nitrosomonas europaea        30
TrpEa           46      Anabaena sp.               85          Anabaena sp.                 74
TrpD            42      Anabaena sp.               72          Anabaena sp.                 68
TrpAa•TrpAb     42      Anabaena sp.               81          Anabaena sp.                 76
AroB            43      Anabaena sp.               68          Nostoc punctiforme           68
FrnE            40      Deinococcus radiodurans    30          Rhodobacter capsulatus       29
TyrA(p)         39      Anabaena sp.               72          Yersinia pseudotuberculosis  38

[In the Anabaena rows, "-" marks gene-product names, GC values and % identities that are missing from the extracted text; first and second hits are paired in their order of appearance.]

*Anabaena sp. PCC 7120 has a genomic GC ratio of 42.82%. †Nostoc punctiforme has a genomic GC ratio of 43.90%.
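
GC ratios of the kind compared in Table 2 are simple to compute. The following sketch uses hypothetical function names and assumes plain nucleotide strings as input; the genomic values of 42.82% and 43.90% are those given in the table footnote.

```python
def gc_percent(seq):
    """GC content as a percentage of unambiguous (A, C, G, T) bases."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    gc = s.count("G") + s.count("C")
    return 100.0 * gc / acgt if acgt else 0.0

def gc_offset(gene_seq, genome_gc_percent):
    """Difference between a gene's GC content and the genomic GC ratio."""
    return gc_percent(gene_seq) - genome_gc_percent
```

A gene whose offset from the genomic value is small, as for most Nostoc entries in Table 2, gives no compositional signal of foreign origin, although a donor of similar GC content cannot be excluded on this basis alone.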

Figure 9 shows a segment of the 16S rRNA tree that contains
all of the trpAa•trpAb fusions which are known so far.
Cyanobacteria other than Anabaena/Nostoc lack the fusion,
as do all nearby lineages. The fusion is present in the cluster
that includes Rhodopseudomonas palustris, Rhizobium loti,
Brucella melitensis, Agrobacterium tumefaciens and
Sinorhizobium meliloti. (A. tumefaciens, which is not shown
in Figure 9, is virtually identical to S. meliloti). Additional
phylogenetically spaced fusions are present in Thermomonospora
fusca, Azospirillum brasilense, and Legionella pneumophila.
Other fusions that involve trpAa or trpAb homologs
also occur in nature, as shown in Table 3, and a degree of care
is needed to avoid confusion between them.

A phylogenetic tree consisting of all free-standing TrpAa
and TrpAb proteins was constructed, together with the
corresponding two domains of the TrpAa•TrpAb fusions
(available upon request). Surprisingly, each of the 10 fusion
domains clustered tightly on the TrpAa and TrpAb trees, to
the exclusion of the free-standing TrpAa and TrpAb
domains. This is consistent with a single ancestral fusion
event, but requires the assumption of multiple LGT events.
However, it is surprising that no free-standing domains
(that is, close homologs of the original fusion partners)
cluster with either of the two sets of 10 fusion domains.
This might suggest an alternative to LGT, namely that there
has been extreme sequence convergence because of strong
selection for appropriate residues mediating domain-domain
interactions. If so, it is possible that trpAa•trpAb
fusions occurred as a number of independent events,
followed by strong convergence.

Figure 9 shows the individual genomic organization of trp-
pathway genes in the 16S rRNA tree sector that is relevant to
the trpAa•trpAb fusion. The Anabaena/Nostoc lineage is
unique in having trpAa•trpAb linked to other trp-pathway
genes and is further unique in having an additional set of free-
standing genes encoding TrpAa and TrpAb. Although gener-
ally uncommon, complete dispersal of Trp-pathway genes is
characteristic of the non-filamentous cyanobacteria, Aquifex
aeolicus and Chlorobium tepidum. The ancestral state of trp
gene organization has been asserted (G.X., C.B., N.K. and R.J.,
unpublished work) to be trpAa/Ab/B/D/C/Eb/Ea, an operon
organization seen in contemporary Cytophaga hutchinsonii,
Desulfovibrio vulgaris and Coxiella burnetii (Figure 9).
Dynamic gene reorganization events that involve gene inser-
tions, gene scrambling, gene duplications and gene dispersal
are apparent from inspection of Figure 9.








[Figure 7 graphic: bar charts for Anabaena sp. PCC7120 and Xylella fastidiosa; legible legend entries: AC, AA, GC, GA]

Figure 7
Three-to-one dinucleotide analysis. (a) The aroAII-tyrA(p) gene block in Anabaena. Deviations from genomic frequencies are expressed as positive (upward-
pointing bars) or negative (downward-pointing bars) percentages. (b) For comparison, the results obtained for the low-GC gene block of X. fastidiosa (of
which Figure 4 is a subset). The gene blocks of interest are highlighted in yellow, and the flanking genes are indicated by numbers.

It is expected that LGT would most easily be recognized if it
occurred relatively recently, before passage of sufficient time
for amelioration of alien characteristics to those of the host
genome, for example GC content. In the case of each of the
known trpAa•trpAb gene fusions, the absence of the gene
fusion in a closely related genome implies that the gene-fusion

[Figure 8 graphic: bar charts of percent codon usage, each with In/Out bar pairs for Xfa and Asp; panel titles legible in the extracted text: (a) Leucine, (b) Serine, (f) Proline]

Figure 8
Codon usage for the pairs of TrpAa domains in the genomes of Anabaena sp. (Asp) and Xylella fastidiosa (Xfa). (a) Leucine; (b) serine; (c) arginine;
(d) glycine; (e) valine and (f) proline. From left to right, Xfa TrpAa_1 is encoded from the low-GC gene block (In) and Xfa TrpAa_2 is encoded from
outside (Out) the gene block; Asp TrpAa_1 is encoded from within the aroAII-tyrA(p) gene block (In) and Asp TrpAa_2 is encoded from outside (Out)
the latter gene block. Synonymous codons are shown at the right of each amino acid set and color-coded to match the percent usage indicated by
the bars.

Table 3

Gene fusions involving trpAa and/or trpAb homologs

trpAa•trpAb: Brucella melitensis; Sinorhizobium meliloti; Agrobacterium tumefaciens; Azospirillum brasilense; Nostoc punctiforme; Thermomonospora fusca; Rhodopseudomonas palustris; Rhizobium loti; Legionella pneumophila; Anabaena sp._1; Anabaena sp._2

trpAa•trpAb_phz*: Pseudomonas aureofaciens; Pseudomonas aeruginosa; Pseudomonas chlororaphis; Pseudomonas fluorescens; Streptomyces venezuelae; Streptomyces coelicolor

[fusion label missing from the extracted text]: Escherichia coli; Salmonella typhi; Campylobacter jejuni; Thermotoga maritima

[fusion label missing from the extracted text]: Deinococcus radiodurans

[fusion label missing from the extracted text]: Neisseria meningitidis; Neisseria gonorrhoeae; Chlorobium tepidum; Helicobacter pylori; Campylobacter jejuni; Streptococcus pneumoniae; Streptococcus pyogenes; Streptococcus equi; Streptococcus gordonii; Listeria innocua; Listeria monocytogenes; Geobacter sulfurreducens; Ralstonia solanacearum; Burkholderia fungorum; Sphingomonas aromaticivorans; Chlorobium tepidum; Ralstonia metallidurans; Lactococcus lactis; Burkholderia pseudomallei; Magnetococcus sp.

[fusion label missing from the extracted text]: Streptomyces griseus; Streptomyces venezuelae; Streptomyces pristinaespiralis; Thermomonospora fusca; Anabaena sp.; Nostoc punctiforme; Corynebacterium glutamicum; Saccharomyces cerevisiae; Aspergillus fumigatus; Plasmodium falciparum; Coprinus cinereus; Schizosaccharomyces pombe

*Also known as phzE. †pabAa, pabAb and pabAc are also known as pabB, pabA and pabC.

event (or the LGT event) occurred recently, that is, in the one
lineage following the time of its separation from the other by
speciation. Thus, the acquisition of trpAa•trpAb by
Thermomonospora fusca must have occurred by fusion or by LGT
relatively recently, that is, after the speciation event that
generated the Streptomyces lineage (see Figure 9). In each of the
remaining cases of trpAa•trpAb fusion shown in Figure 9, a
relatively recent time of fusion origin can be identified. These
are defined by points of speciation divergence between
Anabaena/Nostoc and other cyanobacteria, between the
Rhodopseudomonas/Sinorhizobium cluster (fusion) and
Caulobacter (no fusion), between Azospirillum brasilense
(fusion) and Magnetospirillum magnetotacticum (no fusion),
and between Legionella pneumophila (fusion) and Coxiella
burnetii (no fusion).

If any of the trpAa•trpAb fusions, other than the
Nostoc/Anabaena pair, have a common origin, similar
flanking regions of gene organization might be expected
since all of the fusions are of relatively recent origin. On this
criterion, only R. loti, B. melitensis, A. tumefaciens and
S. meliloti exhibited similarities of flanking-gene organization,
and this is phylogenetically congruent. These observations
imply that within the span of phylogeny shown in
Figure 9, the trpAa•trpAb fusion may have occurred
independently as many as seven times.

Interdomain linker regions
In fusion proteins an interdomain linker region of critical
length and mobility is important to facilitate specific
domain-domain interactions. Fusions of independent origin might be
expected to exhibit a variety of linker regions. Particular
constraints undoubtedly limit this variety, and such constraints
might be more stringent for some domain combinations than
others. (In the case of particularly stringent constraints,
similar linker regions would not necessarily demonstrate a
common origin.) Figure 10 shows an alignment of the
carboxy-terminal region of the TrpAa domain, the linker
region, and the amino-terminal region of the TrpAb domain
for all of the fusion proteins depicted in Figure 9 (as well as
that from A. tumefaciens).

Only the two operonic fusion proteins from Anabaena and
Nostoc and the four rhizobial fusion proteins (Mlo, Bme,
Rme and Atu) exhibit linker regions of identical length and
obvious similarity. The paralog TrpAa•TrpAb protein of
Anabaena sp. (Asp_2) seems to have a distinctly different
linker, and it may be that the two fusions in Anabaena arose
as two independent events. The partial sequences shown in
Figure 10 are spaced to indicate the seven independent
events of gene fusion that are suggested.

Function of the Anabaena/Nostoc gene blocks?
The gene blocks shown in Figure 2 encode the entire
tryptophan pathway (except for trpC), as well as the first two
enzymes of the common aromatic pathway, and the key
enzyme of tyrosine biosynthesis. Multiple enzymes catalyzing
the same reaction have been described in developmental
systems where differentially regulated isoenzymes are
deployed in different temporal and spatial contexts.
Filamentous cyanobacteria (such as Anabaena and Nostoc) subscribe
to a developmental program of heterocyst formation that is
widely considered the primitive state and that correlates with
their exceedingly large genomes. Unicellular cyanobacteria
such as Synechocystis, Synechococcus and Prochlorococcus
have far smaller genomes and lack the ability to fix nitrogen
(heterocyst formation). It, therefore, seems to be a distinct
possibility that the gene blocks diagrammed in Figure 2 (as
well as additional gene duplicates) are specifically involved
in specialized capabilities of Nostoc/Anabaena that do not
exist in other cyanobacteria. In terms of the evolutionary
scenario, the Anabaena/Nostoc lineage may reflect the

[Figure 9 graphic: 16S rRNA tree with trp-gene maps. Organisms, in the order shown: Thermomonospora fusca; Streptomyces coelicolor; Corynebacterium diphtheriae; Mycobacterium tuberculosis/avium; Synechocystis (gene dispersal); Synechococcus (gene dispersal); Prochlorococcus marinus (gene dispersal); Nostoc punctiforme; Anabaena; Chlamydophila psittaci; Chlorobium tepidum (gene dispersal); Cytophaga hutchinsonii; Geobacter sulfurreducens; Desulfovibrio vulgaris; Campylobacter jejuni; Helicobacter pylori; Caulobacter crescentus; Rhodopseudomonas palustris; Rhizobium loti; Brucella melitensis; Sinorhizobium meliloti; Sphingomonas aromaticivorans; Rhodobacter sphaeroides; Magnetospirillum magnetotacticum; Azospirillum brasilense (not in genome sequence project); Thiobacillus ferrooxidans; Xylella fastidiosa; Nitrosomonas europaea; Ralstonia metallidurans; Burkholderia mallei; Bordetella pertussis; Neisseria gonorrhoeae; Coxiella burnetii; Legionella pneumophila. The gene maps are not recoverable from the extracted text.]

Figure 9
16S rRNA tree showing the phylogenetic distribution (highlighted in yellow) of trpAa•trpAb fusions. The gene fusions unlinked to any other trp genes are
shown to the right of the highlighted name. The remaining trp-operon gene organizations are shown at the right. The white arrows indicate gene insertions
that encode the following: Thermomonospora, integral membrane protein; Streptomyces, three membrane proteins; Corynebacterium, membrane protein,
pantoate β-alanine ligase (panC), and 3-methyl-2-oxobutanoate hydroxymethyl transferase (panB); Mycobacterium, conserved hypothetical protein;
Cytophaga, conserved hypothetical protein; Sphingomonas, conserved hypothetical protein and outer-membrane protein; Rhodobacter, and acetyltransferase
yibQ; Ralstonia, DNA methyltransferase (dmt); Burkholderia, DNA methyltransferase (dmt). In addition aroR in R. sphaeroides is a putative regulatory gene
[58]. The lineage relationships of three organisms that have maintained the putative ancestral trp operon are shown with heavy, gray lines.

[Figure 10 graphic: sequence alignment with the linker region marked; only the row labels Asp 498 and Npu 498 are legible in the extracted text]

Figure 10
Comparison of TrpAa•TrpAb linker regions. The seven independent fusions that are suggested were aligned with free-standing TrpAa and TrpAb
proteins in order to visualize the inter-domain linker regions. Amino-acid residue numbering is indicated at the left and right margins.

ancestral state, and modern unicellular cyanobacteria may
be derived genomes that are smaller and more streamlined
(reductive evolution).

Operon displacement
Alien genes that may be subject to possible LGT can gener-
ally expect a hostile reception in that they lack a history of
functional integration with the resident genome. Genes that
offer immediate selective advantages (for example, antibiotic
resistance) are likely to persist. The acquisition of a com-
pletely new functional capability will often require an entire
suite of novel genes, and such recruitment is certainly easier
to envision if all of the genes arrive en bloc (that is, as an
operon). Once a primary biosynthetic pathway, such as that
responsible for tryptophan formation, has been established
and integrated with the individualistic metabolic circuitry of
a given organism, one does not expect facile displacement of
resident genes. This should apply even if the incoming genes
all coexist as an operon. We have found only two examples of
LGT of whole-Trp operons, that of trpAa/Ab/B/D•C/Eb/Ea
from the enteric lineage to coryneform bacteria and to
Helicobacter, as discussed earlier.

Has there been separate lateral gene transfer of
individual genes?
According to the foregoing rationale, isolated genes that
participate in multi-step processes would not generally be
expected to have much success in LGT. In some cases analog
genes encode enzymes that catalyze the same reaction in a
multi-step pathway, and one analog gene might conceivably
displace another. Lack of enough information about genomic
representation of such analog genes can lead to incorrect
inferences of LGT. For example, the initial discovery of
"plant-type" AroAII in bacteria led to the assumption of LGT
from plant to bacterium. Elucidation of the fuller genomic
representation of aroAII ([27] and refs therein) demonstrated
the origin of aroAII in Bacteria, and plants probably have
received aroAII from the Bacteria via endosymbiosis. A
similar outcome seems quite possible with respect to the
"eukaryotic" fructose-1,6-bisphosphate aldolase in Xylella
species. Phylogenetic incongruities that involve such analogs
can pose great difficulties in distinguishing LGT from vertical
progressions of differential analog losses in different lineages.

Specialized Trp genes not required for primary biosynthesis
In this article we focus on a number of cases where at least
several trp genes are linked, thus providing analytical advan-
tages offered by the analysis of more than one gene. These
genes are also redundant and phylogenetically incongruent,
in contrast to coexisting homolog genes that are part of a full
phylogenetically congruent set. Both of the latter are consis-
tent with origin by LGT, but unrecognized ancient paralogy
is also possible. In the first case, the homologs coexisting in
one organism are xenologs, whereas in the latter case, they
are paralogs. A relatively simple example is the trpAa/trpAb
pair originally denoted phnA/phnB in Pseudomonas aeruginosa
[39]. This comprises an anthranilate synthase that is
not strictly required for primary tryptophan biosynthesis
and that is uniquely expressed during stationary-phase
physiology [40]. Why the generation of anthranilate under
these conditions would be of value is unknown, but phyloge-
netic trees clearly show phnA/phnB to be xenologs originat-
ing from the enteric lineage via LGT (G.X. and R.A.J.,
unpublished data). In this case, genes that function for
primary biosynthesis in the donor genome did not displace
the corresponding genes in the recipient genome, but have
instead been recruited to a specialized function. In
Streptomyces coelicolor, trpAa/trpAb/trpB/trpD/aroAII are
contained within a large cluster dedicated to antibiotic synthesis
[41]. Calcium-dependent antibiotic (CDA) contains tryptophan,
and presumably the feedback-resistant variety of
enzyme encoded by aroAII ensures enhanced precursor flow
to tryptophan during antibiotic production. Detailed studies
have not yet been done to see whether the CDA gene cluster
originated via LGT or reflects ancient paralogy.

In this article, we have discussed at length the Xylella and
cyanobacterial gene blocks that seem likely to have specialized
functional roles other than primary biosynthesis. The Xylella
genes are associated with other genes that presumably dictate
a fate for anthranilate other than as a primary precursor of
tryptophan. We suspect that selective advantages conferred by
this specialized operon accommodated successful LGT to
Xylella. The Anabaena/Nostoc supraoperon is reminiscent of
the S. coelicolor system in the inclusion of AroAII, which
might enhance precursor flow to chorismate. Although the
Anabaena/Nostoc operon lacks only trpC, its features of
gene fusion and gene organization are novel. It might
perhaps have an unknown physiological function related to
the complex developmental programs unique to heterocys-
tous cyanobacteria. We conclude that in this case the oper-
onic trp genes are ancient paralogs of a dispersed set of trp
genes engaged in primary biosynthesis.

Against a backdrop where organisms generally possess
highly efficient and integrated pathways of tryptophan
biosynthesis, displacement of resident genes by LGT of the
corresponding genes is relatively infrequent. Aside from the
broadly distributed primary pathway, highly specialized
pathways are known that utilize some or all tryptophan-
pathway enzymes, and these pathways can originate by
recruitment of paralog genes derived from the primary-
pathway genes [42]. The genes of such specialized operons
may diverge considerably to meet the demands of a novel
functional role. In a contemporary organism this might have
the status of unrecognized (or recognized) paralogy, as we
suggest for the Anabaena/Nostoc gene block. However, such
an operon module also has strong potential for xenologous
transfer because of its specialized functional potential.

The tryptophan pathway exemplifies the situation where
paralogs can be engaged in primary amino-acid biosynthesis
(widespread) or in a variety of specialized pathways (narrowly
distributed). Aside from the extent to which the specialized
pathways may be individually intriguing and important, this
study illustrates that case-by-case analysis can distinguish
paralogs (or xenologs) from their homologs engaged in
primary biosynthesis. This conclusion is encouraging as it
shows that both vertical and horizontal events of gene trans-
fer can be deduced to track evolutionary history.

Materials and methods
Dinucleotide frequencies
The CODONW program [43] was used to calculate 3:1
dinucleotide frequencies (third base of a given codon followed by
the first base of the next codon). For whole-genome calculations,
genome nucleotide sequences (.ffn file) were obtained
from GenBank [44]. Perl scripts were used to eliminate the
defline and assemble all genomic ORFs together for
CODONW calculation. The length (from the UNIX wc
command) divided by 3 was used to validate the absence of
frameshift errors. Pairwise covariation of 3:1 dinucleotide
frequencies was assessed by the Spearman rank correlation
coefficient [45], a nonparametric rank statistic for testing
monotonic relationships. T2 values were kindly provided by
Hooper [31].
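
The two checks described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the Spearman coefficient is computed as the Pearson correlation of average ranks (the usual treatment of ties), and the frame check is applied per ORF, which is stricter than the whole-file length test described in the text.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation; undefined if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def in_frame(orfs):
    """True if every ORF length is a multiple of 3."""
    return all(len(orf) % 3 == 0 for orf in orfs)
```

Here `x` and `y` would be the 16 dinucleotide frequencies of two sequence sets listed in the same dinucleotide order.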

Codon usage
Codon usage for individual genes was computed with the
CODONTREE program [46]. Codon-usage values for whole
genomes were obtained from the Codon Usage Database

Phylogenetic trees
16S rRNA subtrees were derived from the Ribosomal Data-
base site [49,50]. Unrooted phylogenetic protein trees were
derived by input of the indicated homolog amino-acid
sequences into the ClustalW program (Version 1.4) [51].
Manual alignment adjustments were made as needed with
the assistance of the BioEdit multiple alignment tool of Hall
[52]. The refined multiple alignment was used as input for
generation of a phylogenetic tree using the program package
PHYLIP [53]. The neighbor-joining and Fitch programs [51]
were used to obtain distance-based trees. The distance
matrix was obtained using Protdist with a Dayhoff PAM
matrix. The Seqboot and Consense programs were then used
to assess the statistical strength of the tree using bootstrap
resampling. Neighbor-joining and Fitch trees yielded similar
clusters and arrangement of taxa within them. Bootstrap
values indicate the number of times a node was supported in
1,000 resampling replications.
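
The meaning of a bootstrap value can be made concrete with a short sketch. This is not the Consense algorithm; it assumes, purely for illustration, that each replicate tree has already been reduced to a set of clades, each clade being a frozenset of taxon names, and the function name is hypothetical. For unrooted trees the same idea applies to bipartitions rather than clades.

```python
def bootstrap_support(clade, replicate_trees):
    """Number of replicate trees that contain the given clade.

    `replicate_trees` is an iterable of sets of frozensets (an assumed,
    simplified tree representation).
    """
    target = frozenset(clade)
    return sum(1 for tree in replicate_trees if target in tree)
```

With 1,000 replicates, a returned count of 950 would correspond to a node supported in 950 of the resampled data sets.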

Identification of linker regions
Fusion proteins were aligned (ClustalW) with one another
and with the assemblage of free-standing proteins corre-
sponding to the amino-terminal and the carboxy-terminal
domains of the fusion proteins. The boundaries of each
domain were defined by the last highly conserved residues of
the amino-terminal domain and the early highly conserved
residues of the carboxy-terminal domain. The Conserved
Domain Database was useful as a reference guide [54,55].
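
Once the domain boundaries have been assigned, extracting the linker is a matter of slicing. The sketch below assumes 1-based residue numbering, as in Figure 10, and takes the two boundary positions as inputs; the function and parameter names are hypothetical, and choosing the boundary residues remains a judgment made from the alignment.

```python
def linker_region(fusion_seq, last_n_domain_pos, first_c_domain_pos):
    """Residues lying strictly between the two domain boundaries.

    `last_n_domain_pos` is the 1-based position of the last conserved
    residue of the amino-terminal domain; `first_c_domain_pos` is the
    1-based position of the first conserved residue of the
    carboxy-terminal domain.
    """
    if not 1 <= last_n_domain_pos < first_c_domain_pos <= len(fusion_seq):
        raise ValueError("domain boundaries out of order or out of range")
    # 1-based (last + 1 .. first - 1) maps to the 0-based slice below.
    return fusion_seq[last_n_domain_pos:first_c_domain_pos - 1]
```

Comparing the lengths and sequences returned for different fusion proteins is one simple way to tabulate the linker differences discussed in the Results.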

Comparative genome analysis
Most of the comparative genome analysis was carried out
using the database and tools of ERGO [56].

G.X. was partially supported in this work through the STDGEN project at
Los Alamos National Laboratory (NIH/NIAIDAGYI-AI-8228-05). We
thank Sean Hooper (Department of Molecular Evolution, Uppsala Univer-
sity, Sweden) for assistance with dinucleotide frequency calculations. We
are indebted to A. Osterman of Integrated Genomics, Inc. (Chicago, IL)
for provision of access to ERGO [56]. This is Florida Agricultural Experi-
ment Station Journal series no. R-09159.

1. Margulis L: Symbiosis in Cell Evolution. San Francisco: WH Freeman;
2. Gray MW: Evolution of organellar genomes. Curr Opin Genet
Dev 1999, 9:678-687.
3. Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidence
for massive gene exchange between archaeal and bacterial
hyperthermophiles. Trends Genet 1998, 14:442-444.
4. Lawrence JG, Ochman H: Molecular archaeology of the
Escherichia coli genome. Proc Natl Acad Sci USA 1998, 95:9413-
5. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH,
Hickey EK, Peterson JD, Nelson WC, Ketchum KA, et al.: Evidence
for lateral gene transfer between archaea and bacteria from
genome sequence of Thermotoga maritima. Nature 1999,
6. Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer
and the nature of bacterial innovation. Nature 2000, 405:299-
7. Doolittle WF: Phylogenetic classification and the universal
tree. Science 1999, 284:2124-2129.
8. Martin W: Mosaic bacterial chromosomes: a challenge en
route to a tree of genomes. BioEssays 1999, 21:99-104.
9. Syvanen M: On the occurrence of horizontal gene transfer
among an arbitrarily chosen group of 26 genes. J Mol Evol
2002, 54:258-266.
10. Kyrpides NC, Olsen GJ: Archaeal and bacterial hyperther-
mophiles: horizontal gene exchange or common ancestry?
Trends Genet 1999, 15:298-299.
11. Stiller JW, Hall BD: Lateral gene transfer, genome surveys,
and the phylogeny of prokaryotes. Science 1999, 286:1443a.
12. Kurland CG: Something for everyone: Horizontal gene trans-
fer in evolution. EMBO Rep 2000, 1:92-95.
13. Salzberg SL, White O, Peterson J, Eisen JA: Microbial genes in the
human genome: lateral transfer or gene loss? Science 2001,
14. Stanhope MJ, Lupas A, Italia MJ, Koretke KK, Volker C, Brown JR:
Phylogenetic analyses do not support horizontal gene trans-
fers from bacteria to vertebrates. Nature 2001, 411:940-944.
15. Glansdorff N: About the last common ancestor, the universal
life-tree and lateral gene transfer: a reappraisal. Mol Microbiol
2000, 38:177-185.
16. Woese CR: Interpreting the universal phylogenetic tree. Proc
Natl Acad Sci USA 2000, 97:8392-8396.
17. Lawrence JG, Ochman H: Molecular archaeology of the
Escherichia coli genome. Proc Natl Acad Sci USA 1998, 95:9413-
18. Wang B: Limitations of compositional approach to identify-
ing horizontally transferred genes. J Mol Evol 2001, 53:244-250.
19. Koski LB, Morton RA, Golding GB: Codon bias and base compo-
sition are poor indicators of horizontally transferred genes.
Mol Biol Evol 2001, 18:404-412.

20. Ragan MA: On surrogate methods for detecting lateral gene
transfer. FEMS Microbiol Lett 2001, 201:187-191.
21. Lawrence JG, Ochman H: Reconciling the many facets of
lateral gene transfer. Trends Microbiol 2002, 10:1-4.
22. Ragan M: Detection of lateral gene transfer among microbial
genomes. Curr Opin Genet Dev 2001, 11:620-626.
23. Xie G, Bonner CA, Jensen RA: Dynamic diversity of the trypto-
phan pathway in the chlamydiae: reductive evolution and a
novel operon for tryptophan recapture. Genome Biol 2002,
3:research0005.1-0005.13.
24. Henner D, Yanofsky C: Biosynthesis of aromatic amino acids.
In Bacillus subtilis and other Gram-positive Bacteria: Biochemistry, Physiol-
ogy, and Molecular Genetics. Edited by Sonenshein AL, Hoch J, Losick R.
Washington, DC: ASM Press; 1993: 269-280.
25. Subramaniam PS, Xie G, Xia T, Jensen RA: Substrate ambiguity of
3-deoxy-D-manno-octulosonate 8-phosphate synthase from
Neisseria gonorrhoeae in the context of its membership in a
protein family containing a subset of 3-deoxy-D-arabino-hep-
tulosonate 7-phosphate synthases. J Bacteriol 1998, 180:119-127.
26. Jensen RA, Xie G, Bonner CA: The correct phylogenetic rela-
tionship of KdsA (3-deoxy-D-manno-octulosonate 8-phos-
phate synthase) with one of two independently evolved
classes of AroA (3-deoxy-D-arabino-heptulosonate 7-phos-
phate synthase). J Mol Evol 2002, 54:416-423.
27. Gosset G, Bonner CA, Jensen RA: Microbial origin of plant-type
2-keto-3-deoxy-D-arabino-heptulosonate 7-phosphate syn-
thases, exemplified by the chorismate- and tryptophan-regu-
lated enzyme from Xanthomonas campestris. J Bacteriol 2001,
28. Xie G, Bonner CA, Jensen RA: Cyclohexadienyl dehydrogenase
from Pseudomonas stutzeri exemplifies a widespread type of
tyrosine-pathway dehydrogenase in the TyrA protein
family. Comp Biochem Physiol C Toxicol Pharmacol 2000, 125:65-83.
29. Fitch WM: Homology: a personal view on some of the prob-
lems. Trends Genet 2000, 16:227-231.
30. Van Vliet F, Boyen A, Glansdorff N: On interspecies gene trans-
fer: the case of the argF gene of Escherichia coli. Ann Inst
Pasteur Microbiol 1988, 139:493-496.
31. Hooper SD, Berg OG: Detection of genes with atypical
nucleotide sequence in microbial genomes. J Mol Evol 2002,
32. Mohamed MES, Zaar A, Ebenau-Jehle C, Fuchs G: Reinvestigation
of a new type of aerobic benzoate metabolism in the pro-
teobacterium Azoarcus evansii. J Bacteriol 2001, 183:1899-1908.
33. Schuhle K, Jahn M, Ghisla S, Fuchs G: Two similar gene clusters
coding for enzymes of a new type of aerobic 2-aminoben-
zoate (anthranilate) metabolism in the bacterium Azoarcus
evansii. J Bacteriol 2001, 183:5268-5278.
34. Calhoun DH, Bonner CA, Gu W, Xie G, Jensen RA: The emerging
periplasm-localized subclass of AroQ chorismate mutases,
exemplified by those from Salmonella typhimurium and
Pseudomonas aeruginosa. Genome Biol 2001, 2:research0030.1-
35. Quadri LEN, Keating TA, Patel HM, Walsh CT: Assembly of the
Pseudomonas aeruginosa nonribosomal peptide siderophore
pyochelin: In vitro reconstitution of aryl-4,2-bisthiazoline
synthetase activity from PchD, PchE, and PchF. Biochemistry
1999, 38:14941-14954.
36. Patel HM, Walsh CT: In vitro reconstitution of the
Pseudomonas aeruginosa nonribosomal peptide synthesis of
pyochelin: characterization of backbone tailoring thiazoline
reductase and N-methyltransferase activities. Biochemistry
2001, 40:9023-9031.
37. Guindon S, Perriere S: Intragenomic base content variation is a
potential source of biases when searching for horizontally
transferred genes. Mol Biol Evol 2001, 18:1838-1840.
38. Koski L, Golding GB: The closest BLAST hit is often not the
nearest neighbor. J Mol Evol 2001, 52:540-542.
39. Essar DW, Eberly L, Hadero A, Crawford IP: Identification and
characterization of genes for a second anthranilate synthase
in Pseudomonas aeruginosa: interchangeability of the two
anthranilate synthases and evolutionary implications. J Bacte-
riol 1990, 172:884-900.
40. Mavrodi DV, Bonsall RF, Delaney SM, Soule MJ, Phillips G,
Thomashow LS: Functional analysis of genes for biosynthesis of
pyocyanin and phenazine-1-carboxamide from Pseudomonas
aeruginosa PAO1. J Bacteriol 2001, 183:6454-6465.

41. Ryding NJ, Anderson TB, Champness WC: Regulation of the Strep-
tomyces coelicolor calcium-dependent antibiotic by absA,
encoding a cluster-linked two-component system. J Bacteriol
2002, 184:794-805.
42. Jensen RA: Enzyme recruitment in the evolution of new func-
tion. Annu Rev Microbiol 1976, 30:409-425.
43. Peden J: CodonW as a freeware release for codon usage
analysis. []
44. GenBank Database
45. Lehmann EL, D'Abrera HJM: Nonparametrics: Statistical Methods Based
on Ranks, Rev Edn. Englewood Cliffs, NJ: Prentice-Hall; 1998: 292,
300, 323.
46. Pesole G, Attimonelli M, Liuni S: A backtranslation method
based on codon usage strategy. Nucleic Acids Res 1988, 16:1715-
47. Nakamura Y, Gojobori T, Ikemura T: Codon usage tabulated
from the international DNA sequence databases: status for
the year 2000. Nucleic Acids Res 2000, 28:292.
48. Codon Usage Database []
49. Maidak BL, Cole JR, Lilburn TG, Parker CT Jr, Saxman PR, Farris RJ,
Garrity GM, Olsen GJ, Schmidt TM, Tiedje JM: The RDP-II (Ribo-
somal Database Project). Nucleic Acids Res 2001, 29:173-174.
50. Ribosomal Database Project II []
51. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving
the sensitivity of progressive multiple sequence alignment
through sequence weighting, position-specific gap penalties
and weight matrix choice. Nucleic Acids Res 1994, 22:4673-4680.
52. Hall T: Biological Sequence Alignment Editor for Windows 95/98/NT,
5.0.9 Edn. Raleigh: North Carolina State University
53. Felsenstein J: PHYLIP Phylogeny inference package (version
3.2). Cladistics 1989, 5:164-166.
54. Marchler-Bauer A, Panchenko AR, Shoemaker BA, Thiessen PA,
Geer LY, Bryant SH: CDD: a database of conserved domain
alignments with links to domain three-dimensional struc-
ture. Nucleic Acids Res 2002, 30:281-283.
55. NCBI Conserved Domain Database
56. ERGO []
57. Xie G, Forst C, Bonner C, Jensen RA: Significance of two distinct
types of tryptophan synthase beta chain in Bacteria,
Archaea and higher plants. Genome Biol 2002, 3:research0004.1-
58. Mackenzie C, Simmons AE, Kaplan S: Multiple chromosomes in
bacteria: The yin and yang of trp gene localization in
Rhodobacter sphaeroides 2.4.1. Genetics 1999, 153:525-538.
