Comparative genomic analysis of the DUF71/ COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage


Material Information

Title:
Comparative genomic analysis of the DUF71/ COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage
Physical Description:
Mixed Material
Language:
English
Creator:
Crécy-Lagard, Valérie de
Forouhar, Farhad
Brochier-Armanet, Céline
Tong, Liang
Hunt, John F.
Publisher:
BioMed Central (Biology Direct)
Publication Date:

Notes

Abstract:
Background: The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the biological function discovery process. Classically, genes were linked to functions by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusion, co-expression profiles, structural information and other genomic or post-genomic derived associations can now be used to make very strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.
General Note:
Publication of this article was funded in part by the University of Florida Open-Access publishing Fund. In addition, requestors receiving funding through the UFOAP project are expected to submit a post-review, final draft of the article to UF's institutional repository at the University of Florida community, with research, news, outreach, and educational materials.
General Note:
de Crécy-Lagard et al. Biology Direct 2012, 7:32 http://www.biology-direct.com/content/7/1/32; Pages 1-13
General Note:
doi:10.1186/1745-6150-7-32 Cite this article as: de Crécy-Lagard et al.: Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage. Biology Direct 2012 7:32.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00013476:00001




Full Text


de Crecy-Lagard et al. Biology Direct 2012, 7:32
http://www.biology-direct.com/content/7/1/32


BIOLOGY DIRECT


Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage

Valérie de Crécy-Lagard1*, Farhad Forouhar2, Céline Brochier-Armanet3, Liang Tong2 and John F Hunt2


Abstract
Background: The availability of over 3000 published genome sequences has enabled the use of comparative
genomic approaches to drive the biological function discovery process. Classically, genes were linked to
functions by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution
profiles, physical clustering, gene fusion, co-expression profiles, structural information and other genomic or
post-genomic derived associations can now be used to make very strong functional hypotheses. Here, we illustrate
this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.
Results: The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing
diphthine-ammonia ligase (EC 6.3.1.14), Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis
of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins.
Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the
diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to
have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different
function because the diphthamide modification is absent in this Domain of Life. In-depth investigations suggest
that some archaeal and bacterial DUF71 proteins participate in B12 salvage.
Conclusions: This detailed analysis of the DUF71 family members provides an example of the power of integrated
data-mining for solving important "missing genes" or "missing function" cases and illustrates the danger of
functional annotation of protein families by homology alone.
Reviewers' names: This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.
Keywords: Diphthamide, Vitamin B12, Amidotransferase, Comparative genomics


Background
In both Archaea and Eucarya, the translation Elongation
Factor 2 (EF-2) harbors a complex post-translational
modification of a strictly conserved histidine (His699 in
yeast) called diphthamide [1]. This modification is the
target of the diphtheria toxin and the Pseudomonas exo-
toxin A, which inactivate EF-2 by ADP-ribosylation of
the diphthamide [2,3]. Although the diphthamide bio-
synthesis pathway was described in the early 1980's
[2,3], the corresponding enzymes have only recently

* Correspondence: vcrecy@ufl.edu
Department of Microbiology and Cell Science, University of Florida,
Gainesville, FL 32611, USA
Full list of author information is available at the end of the article


been characterized. In vitro reconstitution experiments
have shown that the first step, the transfer of a 3-amino-
3-carboxypropyl (ACP) group from S-adenosylmethio-
nine (SAM) to the C-2 position of the imidazole ring of
the target histidine residue, is catalyzed in Archaea by
the iron-sulfur-cluster enzyme, Dph2 [4,5] (Figure 1A).
Genetic and complementation studies have shown that
the catalysis of the same first step requires four proteins
(Dph1-Dph4) in yeast and other eukaryotes [6-9]. The
subsequent step, trimethylation of an amino group to
form the diphthine intermediate, is catalyzed by
diphthine synthase, Dph5 (EC 2.1.1.98) (Figure 1A)
[10,11]. The last step, the ATP-dependent amidation of
the carboxylate group [12], is catalyzed by diphthine-
ammonia ligase (EC 6.3.1.14), but the corresponding
gene has not been identified (http://www.orenza.u-psud.fr/).
A protein involved in this last step was recently
identified in yeast (YBR246W or Dph7), but it is most
certainly not directly involved in catalysis as it is not
conserved in Archaea and it contains a WD-domain
likely to be involved in protein/protein interactions [13].
Using a combination of comparative genomic
approaches, we set out to identify a candidate gene for
this orphan enzyme family. Based on taxonomic
distribution, domain organization of gene fusions,
physical clustering on chromosomes, atomic structural data,
co-expression, and phenotype data, a promising candidate
was identified, the family called Domain of Unknown
Function family DUF71 (IPR002761) in InterPro
[14]. This family is also called ATP_bind_4 (PF01902) in
Pfam [15] or Predicted ATPases of PP-loop superfamily
(COG2102) in the Clusters of Orthologous Groups database
[16]. However, detailed analysis of the DUF71 family
revealed that this family is almost surely not
isofunctional. Some Archaea contain two very divergent
copies of the gene, while homologs are found in Bacteria,
which are known to lack diphthamide. This observation
suggests that some DUF71 members have
different functions and probably participate in different
biochemical pathways.

Figure 1 Structures of diphthamide and B12 precursors and derivatives. (A) The core diphthamide pathway is predicted to contain three
enzymes, Dph2, Dph5 and Dph6, in Archaea. The formation of diphthine has been reconstituted in vitro using Dph2 and Dph5 from Pyrococcus
horikoshii [4,5]. The enzyme family catalyzing the last step in Archaea and Eukarya, Dph6, was missing. In yeast, the first and last steps require
additional proteins (Dph1, Dph3 and Dph7). (B) Predicted Dph6-catalyzed reactions. (C) Ado-Pseudo-B12 structure and hydrolysis site by the
bacterial CbiZ enzyme (bCbiZ). Parts (A) and (B) are adapted with permission from Xuling Zhu; Jungwoo Kim; Xiaoyang Su; Hening Lin;
Biochemistry 2010, 49, 9649-9657. Copyright 2010 American Chemical Society.

© 2012 de Crécy-Lagard et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the
Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.

Methods
Comparative genomics
The BLAST tools [17] and resources at NCBI (http://
www.ncbi.nlm.nih.gov/) were routinely used. Multiple se-
quence alignments were built using ClustalW [18] or Mul-
tialin [19]. Protein domain analysis was performed using
the Pfam database tools (http://pfam.janelia.org/) [15].
Analysis of the phylogenetic distribution and physical
clustering was performed in the SEED database [20].
Results are available in the "Diphthamide biosynthesis"
and "DUF71-B12" subsystem on the public SEED server
(http://pubseed.theseed.org/SubsysEditor.cgi). Phylogenetic
profile searches were performed on the IMG platform
[21] using the phylogenetic query tool
(http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm).
Physical clustering was analyzed with
the SEED subsystem coloring tool or the Seedviewer
Compare region tool [20] as well as on the MicrobesOn-
line (http://www.microbesonline.org/) tree based genome
browser [22]. The SPELL microarray analysis resource
[23] was used through the Saccharomyces Genome Database
(SGD) (http://www.yeastgenome.org/) [24] to analyze
yeast gene coexpression profiles. Clustering of yeast deletion
mutants based on phenotype analysis was analyzed
through the yeast fitness database available at
http://fitdb.stanford.edu/ [25,26]. Mappings of gene distribution
profiles onto taxonomic trees were generated using the iTOL suite
(http://itol.embl.de/index.shtml) [27]. Sequence logos
were derived using the WebLogo platform [28].

Structure analysis
Visualization and comparison of protein structures and
manual docking of ligand molecules were performed
using PyMOL (The PyMOL Molecular Graphics System,
Version 1.4.1, Schrödinger, LLC). XtalView [7] was used
for the protein docking exercises.

Phylogenetic analyses
The survey of the 1996 complete prokaryotic genomes
available at the NCBI (http://www.ncbi.nlm.nih.gov/)
using BLASTP [17] (default parameters) allowed identification
of 119 bacterial and 144 archaeal DUF71 homologs
in addition to the 182 eukaryotic homologs
identified in the RefSeq database at the NCBI [29]
(Additional file 1: Table S1). The retrieved sequences
were aligned using MAFFT [8] and the resulting alignment
was visually inspected using ED, the alignment editor of
the MUST package [30]. The phylogenetic analysis of the
445 sequences was performed using the neighbor-joining
distance method implemented in SeaView [31]. The
robustness of the resulting tree was assessed by the
nonparametric bootstrap method (100 replicates of the
original dataset) implemented in SeaView. A second
phylogenetic analysis restricted to 50 archaeal and eukaryotic
homologs representative of the genetic and genomic
diversity of these two Domains was performed using the
Bayesian approach implemented in Phylobayes [6] with an
LG model.
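
The nonparametric bootstrap mentioned above resamples alignment columns with replacement to build each replicate dataset. The following is a minimal Python sketch of that resampling step only, under stated assumptions: the function name `bootstrap_alignment` and the toy sequences are hypothetical, and this is not the SeaView implementation. Tree building on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Column indices are drawn with replacement, and the same drawn indices are
    applied to every sequence so that each sampled column stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, for illustration only).
toy = {"seqA": "MKVLG-EGGE", "seqB": "MKILGAENGE", "seqC": "MRVLG-EGGD"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

Each of the 100 replicates has the same dimensions as the original alignment; node support would then be the percentage of replicate trees that recover a given grouping.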

Results and discussion
Comparative genomics points to DUF71/COG2102 as a
strong candidate for the missing diphthamide synthase
family
The distribution of known diphthamide biosynthesis
genes in Archaea was analyzed using the SEED database
and its tools [20]. The 59 archaeal genomes analyzed all
contained an EF-2 encoding gene. Analysis of the distri-
bution of Dph2 and Dph5 in Archaea showed that 58/59
genomes encoded these two proteins. The only archaeon
lacking both Dph2 and Dph5 was Korarchaeum cryptofi-
lum OPF8 (Figure 2A). We therefore hypothesized that
this organism has lost the diphthamide modification
pathway even if the K. cryptofilum EF-2 still harbors the
conserved His residue at the site of the modification
(His603 in the K. cryptofilum sequence, Accession
B1L7Q0 in UniprotKB). Using the IMG/JGI phylogenetic
query tools [21], we searched for protein families found
in all Archaea except Korarchaeum cryptofilum OPF8,
present in Saccharomyces cerevisiae and Homo sapiens
but absent in Escherichia coli and Bacillus subtilis, as
bacteria are known to lack this modification pathway.
Only one family, DUF71/COG2102, followed this taxo-
nomic distribution. This family had been described pre-
viously as a PP-loop ATPase of unknown function
containing a Rossmannoid class HUP domain [32].
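
The phylogenetic profile query described above can be expressed as a simple set filter. The following Python sketch is illustrative only: it does not use the IMG query tool, and the family names and presence sets are hypothetical toy data chosen to mirror the logic of the search (present in all Archaea except K. cryptofilum, present in yeast and human, absent from E. coli and B. subtilis).

```python
def matches_profile(presence, required, forbidden):
    """True if a family occurs in every required genome and in no forbidden one."""
    return required <= presence and not (forbidden & presence)


# Hypothetical toy data: family name -> genomes encoding at least one member.
archaea = {"P. furiosus", "M. mazei", "K. cryptofilum"}
families = {
    "familyA": {"P. furiosus", "M. mazei", "S. cerevisiae", "H. sapiens"},
    "familyB": {"P. furiosus", "M. mazei", "K. cryptofilum",
                "S. cerevisiae", "H. sapiens"},
    "familyC": {"P. furiosus", "M. mazei", "S. cerevisiae", "H. sapiens",
                "E. coli"},
}

# Required: all Archaea except K. cryptofilum, plus the two eukaryotes.
required = (archaea - {"K. cryptofilum"}) | {"S. cerevisiae", "H. sapiens"}
# Forbidden: K. cryptofilum and the two bacterial reference genomes.
forbidden = {"K. cryptofilum", "E. coli", "B. subtilis"}

hits = sorted(name for name, presence in families.items()
              if matches_profile(presence, required, forbidden))
print(hits)  # prints ['familyA']
```

In this toy set only familyA survives, because familyB is present in K. cryptofilum and familyC is present in E. coli.
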
Using the neighborhood analysis tool of the SEED
database [20], physical clustering was generally not
observed between the dph2, dph5 and DUF71 genes except
in three Methanosarcina genomes where the dph5 gene
is located in the vicinity of DUF71 genes (Figure 2B). If
members of the DUF71 family catalyze the last step of
diphthamide synthesis they should bind ATP [12]. Structural
analysis of the DUF71 protein from Pyrococcus furiosus
(PF0828) reveals the presence of two distinct domains:
an N-terminal HUP domain that contains a highly conserved
PP-motif that interacts with ATP (PDB id: 3RK1)
and AMP (PDB id: 3RK0), and a C-terminal 100-residue
domain belonging to a novel fold with a highly conserved
motif GEGGEF/YE188T/S (P. furiosus numbering)
that is probably involved in substrate binding and
recognition [33].


Figure 2 Comparative genomic analysis of the DUF71 family. (A) Distribution of the core diphthamide genes Dph2 and Dph5 and
DUF71 in Archaea, according to data derived from the "Diphthamide biosynthesis" subsystem in the SEED database. The tree is a species tree
constructed in iTol (itol.embl.de/). The presence and absence of the specific genes was derived from the "Diphthamide biosynthesis" subsystem.
(B) Physical clustering of DUF71/COG2102 genes with Dph5 in three Methanosarcina genomes derived from the MicrobesOnline database
(http://www.microbesonline.org/). (C) Examples of proteins containing domains fused to DUF71 in Archaea and Eucarya. Accession numbers
and COG, CDD, or Pfam domain numbers are given in parentheses.


Coexpression, phenotype and structural data link the
yeast DUF71 to translation and diphthamide biosynthesis
YLR143w is the only S. cerevisiae DUF71 family mem-
ber. Using YLR143w as input in the SPELL co-
expression query tool [23] showed that nearly all
co-expressed genes were involved in translation and
ribosome biogenesis (Additional file 2: Table S2). This
observation suggested that the DUF71 protein family
has a role in translation as expected for a protein modi-
fying EF-2. Like all known diphthamide synthesis genes,
YLR143w is also not essential. More specifically, deletion
of any of the five known diphthamide genes confers
sordarin resistance in yeast [34,35] and the ylr143wΔ strain was
shown to be as resistant to this compound as the
diphthamide deficient strains (see supplemental data in
[34]). Furthermore, in a recent complete analysis of
relationships between gene fitness profiles (co-fitness) and
drug inhibition profiles (co-inhibition) from several hundred
chemogenomic screens in yeast [25,26] available at
http://fitdb.stanford.edu/, it was found that among the
top ten interactors with YLR143w by homozygous
co-sensitivity are DPH5, DPH2, DPH4 (or JJJ3) and the
newly identified DPH7 (or YBR246w) (Additional file 3:
Figure S1). Both the coexpression and phenotype data
thereby strongly support the hypothesis that YLR143w
catalyzes the missing last step of diphthamide
biosynthesis, even if one cannot rule out at this stage that
other catalytic subunits yet to be identified may also be
required.
Finally, comparison of ATP- and AMP-containing
structures of PF0828 reveals that the active site of the
former has a narrow groove at the end of which only the
α-phosphate of ATP is exposed to the solvent whereas
the active site of the latter is wide open (Figure 3A and
B). Also, there is a sharp turn at the α-phosphate of
ATP, suggesting that it is the site of the nucleophilic
attack. We therefore performed a docking exercise using
the EF-2 structure (PDB id: 3B82) [36] with the
ATP-containing structure of PF0828. The docking revealed
that the active site groove of the ATP-containing
structure can easily accommodate diphthine with a few minor
clashes between the two structures (Figure 3A and B).
The modeling also showed that the carboxyl group
of diphthine resides near the α-phosphate of ATP and the
carboxylate group of residue Glu188, suggesting that
nucleophilic attack by diphthine on the α-phosphate
of ATP is highly feasible (Figure 3B). As shown in
Figure 3B, the modelling also shows that several residues
that are highly conserved among archaeal and
eukaryotic PF0828 and YLR143w orthologs besides
E188, including S44, Y45, E78, Y103, Q104, A149, E183 and
E186 (Additional file 3: Figure S2), are at the interface
of the modelled complex of PF0828 with EF-2,
supporting the hypothesis that they play important roles
in EF-2 recognition (Figure 3B).

Figure 3 Structural analysis of the DUF71 (PF0828) putative
active site. (A) Docking of modified EF-2 (cyan, PDB id: 3B82) onto
ATP-bound structure of PF0828 (yellow, PDB id: 3RK1). ATP and
several residues of PF0828 (DUF71), which are conserved among
archaeal and eukaryotic orthologs, and diphthine of EF-2 (see text
for details) are shown in stick models. (B) Close-up stereo pair of
panel A. Diphthine of EF-2 and the side chains of conserved
residues of PF0828, at the interface of PF0828 and EF-2, are shown
in stick models and labeled. (C) Stereo pair view of ATP-binding
region of PF0828. Residues that are conserved among Dph6 and
DUF71-B12 families are depicted in stick models with carbon atoms
in cyan, while the residues that are specific to Dph6 family are
shown in stick models with carbon atoms in green. Oxygen and
nitrogen atoms are shown in red and blue in all stick models,
respectively.


Linking DUF71 family members to ammonia transfer
reactions
The diphthine-ammonia ligase reaction requires a source
of NH3 [12]. Domain fusions involving members of the
DUF71 family in the Pfam database [15] suggest that the
source of NH3 might vary depending on the organism.
For example, in a few Archaea (e.g. Methanohalophilus
mahii DSM 5219, Methanosalsum zhilinae DSM 4017 or
'Candidatus Nanosalinarum sp. J07AB56'), a COG0367/
AsnB asparagine synthetase [glutamine-hydrolyzing] (EC
6.3.5.4) domain is found at the N-terminus of the DUF71
domain (Figure 2C). This AsnB domain can be further
separated into two subdomains, an N-terminal class-II
glutamine amidotransferase domain (GAT-II) [37] and an
Asn_SynthaseBC PP-loop ATPase domain (Figure 2C).
This domain organization suggests that in this subset of
enzymes, the hydrolysis of glutamine catalyzed by the
GAT-II domain could provide the NH3 moiety to both the
DUF71 and the Asn_SynthaseBC enzymes. On the
other hand, in many eukaryotes such as yeast and
Arabidopsis thaliana, two YjgF-YER057c-UK114-like domains
are fused to the C-terminus of the DUF71 protein as
previously noted by Aravind et al. [32] (Figure 2C). The
stand-alone members of the YjgF-YER057c-UK114 family,
now called the RidA family (for reactive intermediate/
imine deaminase A), have been shown to deaminate
products generated by PLP-dependent enzymes, which results
in the release of NH3 [38]. The RidA domains fused to
DUF71 could therefore be involved in providing the NH3
moiety for diphthamide synthesis.

The DUF71 family is not monofunctional
The taxonomic distribution of DUF71 homologs in available
complete genomes confirmed that DUF71 is present
in one or occasionally two copies in all Archaea except the
korarchaeon K. cryptofilum (Table 1 and Additional file 1:
Table S1). This pattern is consistent with an ancient origin
of the DUF71 gene in Archaea. In sharp contrast, DUF71
is sporadically distributed in Bacteria, being present only
in a few representatives of some phyla (Table 1 and
Additional file 1: Table S1). This pattern fits either with an
ancient origin of DUF71 in Bacteria followed by numerous
losses or, conversely, with a more recent acquisition
followed by horizontal gene transfer (HGT) among bacterial
lineages. To further investigate the evolutionary history of
DUF71, we performed a phylogenetic analysis of the homologs
identified in the three Domains of Life. The resulting tree
showed two divergent groups of sequences. The first
group contains the eukaryotic and nearly all archaeal
sequences (including the predicted yeast DPH6 (YLR143w)
and P. furiosus PF0828), whereas the second
encompasses all the bacterial sequences as well as the second
copy found in a few archaeal genomes (Figure 4 and
Additional file 3: Figure S3).
This second group emerged from within the archaeal
sequences of the first cluster and showed various
contradictions with the currently recognized taxonomy
because bacterial sequences from distantly related
lineages appeared intermixed in the tree (Figure 4).
These observations together with the extremely patchy
distribution of DUF71 in bacteria strongly support the
hypothesis that the bacterial DUF71 was of archaeal
origin and spread through this domain mainly by HGT.
Interestingly, the second homologs present in a few
archaeal genomes emerged from bacterial sequences,
suggesting that secondary HGT occurred from Bacteria
to Archaea, allowing them to acquire a second DUF71
homolog.
In contrast, a phylogenetic analysis focused on
archaeal and eukaryotic sequences strongly supported
the separation between these two Domains (posterior
probabilities (PP) = 1). Moreover, it recovered the
monophyly of most eukaryotic and archaeal major lineages
(most PP > 0.95, Additional file 3: Figure S3), suggesting
that DUF71 was present in their ancestors. However, as
expected given the small number of amino acid positions
analyzed (182 positions), the relationships among these
lineages were mainly unresolved (most PP < 0.95),
precluding the in-depth analysis of the ancient evolutionary
history of DUF71 in Archaea and Eucarya (Additional
file 3: Figure S3). Nevertheless, the wide distribution of
DUF71 in these two Domains (even in highly derived
parasites such as Microsporidia, Cryptosporidium,
Entamoeba or Nanoarchaeum equitans, not shown) and
its ancestral presence in most of their orders/phyla
suggested that this gene was present in the last common
ancestor of these two Domains. This inference does not
imply, however, that no HGT occurred in these
Domains. Indeed, some incongruence between the
DUF71 phylogeny and the reference phylogeny of organisms
[39] suggested putative cases of HGT. For instance,
it was observed for the Thermofilum pendens DUF71
that robustly groups with Methanomicrobia (Euryarchaeota)
and not with other Thermoproteales (Additional
file 3: Figure S3).

Table 1 Taxonomic distribution of DUF71 homologs in archaeal and bacterial genomes

Phylum                      Nb (%) genomes
Archaea
  Crenarchaeota             37/37 (100%)
  Euryarchaeota             79/79 (100%)
  Korarchaeota              0/1 (0%)
  Thaumarchaeota            2/2 (100%)
Bacteria
  Acidobacteria             3/7 (42.9%)
  Actinobacteria            1/206 (0.5%)
  Aquificae                 0/10 (0%)
  Bacteroidetes             20/73 (27.4%)
  Caldiserica               0/1 (0%)
  Chlorobi                  0/11 (0%)
  Chloroflexi               5/16 (31.3%)
  Chrysiogenetes            0/1 (0%)
  Cyanobacteria             0/45 (0%)
  Deferribacteres           0/4 (0%)
  Deinococcus-Thermus       2/17 (11.8%)
  Dictyoglomi               0/2 (0%)
  Elusimicrobia             0/2 (0%)
  Fibrobacteres             0/2 (0%)
  Firmicutes                20/484 (4.1%)
  Fusobacteria              0/5 (0%)
  Gemmatimonadetes          0/1 (0%)
  Ignavibacteria            0/1 (0%)
  Nitrospirae               1/3 (33.3%)
  Proteobacteria Alpha      2/204 (1%)
  Proteobacteria Beta       8/119 (6.7%)
  Proteobacteria Delta      1/48 (2.1%)
  Proteobacteria Epsilon    0/64 (0%)
  Proteobacteria Gamma      27/406 (6.7%)
  PVC Chlamydiae            1/73 (1.4%)
  PVC Planctomycetes        3/6 (50%)
  PVC Verrucomicrobia       0/4 (0%)
  Spirochaetes              1/45 (2.2%)
  Synergistetes             0/4 (0%)
  Thermodesulfobacteria     0/2 (0%)
  Thermotogae               5/14 (35.7%)

The number of genomes per phylum containing at least one homolog of DUF71 is indicated.

Figure 4 Neighbor-joining phylogenetic tree of the 445 DUF71 homologs identified in public databases. The scale bar represents the
average number of substitutions per site. Numbers at nodes are bootstrap values. For clarity only values greater than 50% are indicated. Colors
correspond to the taxonomic affiliation of sequences (see the box on the figure for details). The full tree of Cluster 1 is shown in Additional file 3:
Figure S3.

Because diphthamide is a modification specific to the
archaeal and eukaryotic EF-2 proteins and bacteria lack all
known diphthamide biosynthesis genes, we propose that
cluster 1 in our phylogeny corresponds to bona fide Dph6
enzymes involved in diphthamide synthesis (Figure 4).
This function therefore very likely represents the ancestral
function of the whole DUF71 family. In contrast, bacteria
do not synthesize diphthamide, suggesting that the
bacterial DUF71 homologs and the few additional archaeal
copies (cluster 2, Figure 4) are involved in another function,
and thus a functional shift occurred after the HGT of an
archaeal bona fide Dph6 to bacteria. Notably, these
genes (including PF0295, the second DUF71 copy
found in P. furiosus) are strongly clustered on the
chromosome with vitamin B12 salvage genes. More
precisely, 75/102 are adjacent to vitamin B12
transporter genes (such as the BtuCDF genes) [40] and
18/102 are adjacent to cbiB genes encoding
adenosylcobinamide-phosphate synthetase, an enzyme
shared by the de novo and salvage pathways
[41] (Figure 5A). These clustering data can be
visualized in the "Duf71-B12" subsystem in the SEED
database, and two typical clusters are shown in
Figure 5B. On this basis, we hypothesize that the
archaeal and bacterial DUF71 genes that cluster with
vitamin B12 genes have a role in B12 metabolism.

Figure 5 Links between the DUF71 family and B12 salvage. (A) Summary of cobinamide derivative salvage in Bacteria and Archaea; arrows
with dotted lines denote multiple steps. (B) Typical examples of physical clustering of DUF71-B12 genes with B12 salvage genes in Archaea and
Bacteria. Abbreviations: Pseudo-B12, adenosylpseudocobalamin; Cbi, Cobinamide; AdoCbi, adenosylCbi; AdoCbi-P, AdenosylCbi-phosphate;
AdoCby, adenosylcobyric acid; AP, (R)-1-amino-2-propanol; AP-P, AP-phosphate; Thr-P, L-threonine-phosphate; DMB, 5,6-dimethylbenzimidazole;
α-AMP-AP, α-adenylate-AP; CobU, ATP:AdoCbi kinase, GTP:AdoCbi-GDP guanylyltransferase; CobY, NTP:AdoCbi-P nucleotidyltransferase; CobA,
ATP:co(I)rrinoid adenosyltransferase; aCbiZ, adenosylcobinamide amidohydrolase; bCbiZ, pseudo-B12 amidohydrolase; CbiB, cobyric acid synthetase;
CobD, L-threonine phosphate decarboxylase; CobS, cobalamin (5-P) synthase; CobT, 5,6-dimethylbenzimidazole phosphoribosyltransferase; CobC
or CobZ, alpha-ribazole-5'-phosphate phosphatase; cobY, adenosylcobinamide-phosphate guanylyltransferase; CbiP, cobyric acid synthase; BtuFCD,
cobamide transporter subunits.

Finally, some bacterial DUF71 proteins might also have
other functions because a set of bacteria such as
Clostridium perfringens have two or more DUF71 homologs
(Figure 4 and Additional file 1: Table S1). The most
extreme example is Dehalococcoides sp. CBDB1, which
encodes five DUF71 homologs in its genome. In the case
of C. perfringens ATCC 13124 and SM101, one homolog
(YP_695745 and YP_698440) clusters both physically and
phylogenetically (Figure 4 and 5A) with the B12 subgroup
proteins, whereas the second homolog (YP_695178 and
YP_698039) is related to Acinetobacter baumannii (Cluster
3, Figure 4) and is not found associated with gene clusters
related to B12 salvage (data not shown).
Therefore, based on phylogenetic and physical clustering,
the DUF71 proteins were split into two subgroups, Dph6 and
DUF71-B12, that were annotated as such
and captured in the "Diphthamide biosynthesis" and
"Duf71-B12" subsystems in the SEED database.

Predicting the function of members of the DUF71-B12
subgroup
As members of the DUF71-B12 subgroup clustered strongly with B12
transport genes and with cbiB (Figure 5B), we focused on the early steps
of B12 salvage, which are quite diverse because several forms of
cobamides [cobalamin-like or Cbl-like compounds] can be salvaged
(Figure 5A). Cobinamide (Cbi) is adenosylated after transport to form
adenosylcobinamide (AdoCbi). In most bacteria, AdoCbi is directly
phosphorylated by CobU before being transformed, in several steps, into
adenosylcobalamin (AdoCbl or coenzyme B12), in which the lower ligand is
5,6-dimethylbenzimidazole (DMB) (Figure 5A; see [42] for review). Archaea
use a different salvage route in which AdoCbi is converted to
adenosylcobyric acid (AdoCby), an intermediate of the de novo pathway, by
an amidohydrolase, aCbiZ [43] (Figure 5A). AdoCby is then converted
directly to adenosylcobinamide-phosphate (AdoCbi-P) by CbiB. Finally,
some bacteria have CbiZ homologs (bCbiZ) that hydrolyze
adenosylpseudocobalamin (Ado-pseudo-B12) [44], which contains adenine
instead of DMB as its lower ligand (Figures 1C and 5A).
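
The salvage routes described in this paragraph can be summarized as a small directed graph. The Python sketch below is only an illustrative restatement of the text, not part of the published analysis: the edge list encodes the conversions named above, the product of the CobU step is written as AdoCbi-P, and the labels in parentheses are placeholders for steps that the text does not name.

```python
from collections import defaultdict

# (substrate, product, enzyme) triples taken from the paragraph above.
# Labels in parentheses are placeholders, not enzyme names from the paper.
SALVAGE_STEPS = [
    ("Cbi", "AdoCbi", "(adenosylation, enzyme not named)"),
    ("AdoCbi", "AdoCbi-P", "CobU"),          # route used by most bacteria
    ("AdoCbi", "AdoCby", "aCbiZ"),           # archaeal route
    ("AdoCby", "AdoCbi-P", "CbiB"),
    ("Ado-pseudo-B12", "AdoCby", "bCbiZ"),   # some bacteria
    ("AdoCbi-P", "AdoCbl", "(several steps)"),
]


def build_graph(steps):
    """Map each substrate to the list of (product, enzyme) pairs it can yield."""
    graph = defaultdict(list)
    for substrate, product, enzyme in steps:
        graph[substrate].append((product, enzyme))
    return graph


def routes(graph, start, goal, seen=()):
    """Return every acyclic route from start to goal as lists of (enzyme, product)."""
    if start == goal:
        return [[]]
    found = []
    for product, enzyme in graph.get(start, []):
        if product in seen:
            continue
        for tail in routes(graph, product, goal, seen + (start,)):
            found.append([(enzyme, product)] + tail)
    return found


if __name__ == "__main__":
    graph = build_graph(SALVAGE_STEPS)
    for source in ("Cbi", "Ado-pseudo-B12"):
        for route in routes(graph, source, "AdoCbl"):
            print(source, "->", " -> ".join(f"[{e}] {p}" for e, p in route))
```

Listing the routes this way makes the point of the paragraph explicit: the bacterial CobU route and the archaeal aCbiZ/CbiB route both start from AdoCbi, whereas pseudo-B12 can only enter the pathway through the amidohydrolase step.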
In order to gain insight into the possible function of DUF71-B12 family
members, we analyzed the co-distribution pattern of CbiZ, CbiB and
DUF71-B12 proteins in Archaea and Bacteria. Interestingly, with a few
exceptions, all prokaryotic genomes encoding CbiB harbor either CbiZ or
DUF71-B12 (Figure 6). In bacteria, the DUF71-B12 and CbiZ families were
strictly anti-correlated (Figure 6A). This was not the case in Archaea,
where quite a few organisms (such as P. furiosus or Methanosarcina mazei
Gö1) harbored both families (Figure 6B). This distribution profile
suggests that members of the DUF71-B12 subfamily fulfil the same role as
the bacterial CbiZ enzymes (bCbiZ), either by catalysing the same
reaction (cleaving Ado-pseudo-B12 into AdoCby) or by providing another
route to salvaging pseudo-B12. This hypothesis would explain why bacteria
have one or the other while Archaea can carry both (Figure 6B), because
archaeal CbiZ proteins have been predicted to lack pseudo-B12 cleavage
activity [44].
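
The co-distribution argument above amounts to comparing presence/absence profiles across genomes. The following Python sketch shows one simple way such a check could be set up. It is an assumption-laden illustration rather than the authors' procedure: the genome labels and the 0/1 calls are invented and are NOT the data behind Figure 6, and the function names are hypothetical.

```python
# Hypothetical presence/absence calls (1 = family encoded in the genome).
# Labels and values are invented for illustration only.
PROFILES = {
    "bacterium_A": {"CbiB": 1, "CbiZ": 1, "DUF71-B12": 0},
    "bacterium_B": {"CbiB": 1, "CbiZ": 0, "DUF71-B12": 1},
    "bacterium_C": {"CbiB": 1, "CbiZ": 0, "DUF71-B12": 1},
    "bacterium_D": {"CbiB": 0, "CbiZ": 0, "DUF71-B12": 0},
    "archaeon_E": {"CbiB": 1, "CbiZ": 1, "DUF71-B12": 1},
}


def contingency(profiles, fam_a, fam_b):
    """Count genomes encoding both families, only one of them, or neither."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for calls in profiles.values():
        a, b = bool(calls[fam_a]), bool(calls[fam_b])
        if a and b:
            counts["both"] += 1
        elif a:
            counts["only_a"] += 1
        elif b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts


def mutually_exclusive(profiles, fam_a, fam_b):
    """True when each family occurs somewhere but never together in one genome."""
    c = contingency(profiles, fam_a, fam_b)
    return c["both"] == 0 and c["only_a"] > 0 and c["only_b"] > 0


def cbib_coverage(profiles):
    """Fraction of CbiB-encoding genomes that also encode CbiZ or DUF71-B12."""
    with_cbib = [c for c in profiles.values() if c["CbiB"]]
    if not with_cbib:
        return 0.0
    covered = sum(1 for c in with_cbib if c["CbiZ"] or c["DUF71-B12"])
    return covered / len(with_cbib)


# Restrict the toy table to the bacterial entries.
bacteria = {n: c for n, c in PROFILES.items() if n.startswith("bacterium")}

if __name__ == "__main__":
    print("bacteria:", contingency(bacteria, "CbiZ", "DUF71-B12"))
    print("all genomes:", contingency(PROFILES, "CbiZ", "DUF71-B12"))
    print("CbiB coverage:", cbib_coverage(PROFILES))
```

With real subsystem exports, a statistical test on the contingency table would be more informative than the strict all-or-nothing check used here, which a single misannotated genome would break.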
Detailed analysis of the signature motifs of the two subfamilies reveals
that the strictly conserved EGGE/DXE188 motif (P. furiosus PF0828
numbering) in Dph6 proteins is replaced by an ENGEF/YH188 motif in the
DUF71-B12 proteins (Additional file 3: Figures S2 and S4). In the Dph6
family, E188 is located near the predicted diphthine binding site and is
predicted to be involved in catalysis (Figure 3B). The replacement of
the strictly conserved E188 residue by a histidine residue strongly
suggests a change in the reaction catalyzed by the DUF71-B12 subfamily
compared to the Dph6 family. The structure-based comparison between the
two subfamilies also strongly supports the hypothesis that their
substrates are different, because all residues predicted to be involved
in EF-2 binding (Figure 3B; see section above) are different in the
DUF71-B12 subfamily but are mostly conserved within that subfamily
(Additional file 3: Figure S2 and residues in green in Figure 3C).
Residues that are conserved between the two DUF71 subfamilies
(Additional file 3: Figure S2 and residues in blue in Figure 3C) are
found around the phosphate groups of ATP, including S12, G13, G14, K15,
D16, H48, and T189 (PF0828 sequence numbering), or belong to the
C-terminal conserved sequence motif (EGGE/D-X-E188), such as G182, G184,
G185, E186 and F187. Further experimental studies will be required to
determine whether DUF71-B12 proteins are Ado-pseudo-B12 amidohydrolases
or have another role in Ado-pseudo-B12 salvage.
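
The two signature motifs lend themselves to a simple pattern scan. The Python sketch below is a hypothetical illustration, not the method used in the paper: it reads EGGE/D-X-E as E-G-G-(E or D)-X-E and ENGEF/YH as E-N-G-E-(F or Y)-H, scans a sequence for either pattern without regard to position, and uses made-up toy peptides rather than real protein sequences.

```python
import re

# Literal reading of the motif notation in the text (an assumption):
#   EGGE/D-X-E  -> E G G [E or D] any E      (Dph6 subfamily)
#   ENGEF/YH    -> E N G E [F or Y] H        (DUF71-B12 subfamily)
DPH6_MOTIF = re.compile(r"EGG[ED].E")
B12_MOTIF = re.compile(r"ENGE[FY]H")


def classify(sequence):
    """Assign a protein sequence to a subfamily based on its signature motif."""
    seq = sequence.upper()
    has_dph6 = DPH6_MOTIF.search(seq) is not None
    has_b12 = B12_MOTIF.search(seq) is not None
    if has_dph6 and has_b12:
        return "ambiguous"
    if has_dph6:
        return "Dph6-like"
    if has_b12:
        return "DUF71-B12-like"
    return "unclassified"


if __name__ == "__main__":
    # Toy peptides invented for this example; not real DUF71 sequences.
    examples = {
        "toy_1": "MKTLEGGEFETLV",
        "toy_2": "MKTLENGEYHTLV",
        "toy_3": "MKTLAAAAAATLV",
    }
    for name, seq in examples.items():
        print(name, classify(seq))
```

A position-independent scan like this can produce chance matches in long sequences; anchoring the search to the aligned C-terminal region of the alignment would be a safer design for real data.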

Conclusions
Our detailed analyses of the DUF71 family members presented here provide
an example of the power of comparative genomic approaches for solving
important "missing genes" or "missing function" cases. These analyses
simultaneously illustrate the difficulties inherent in accurately
annotating gene families. On the one hand, the evidence identifying a
candidate for the missing Dph6 gene family, derived from genomic
evidence (mainly phylogenetic distribution and gene fusions) and
post-genomic evidence (structure, co-expression analysis and genome-wide
phenotype experiments), is so strong that it could be used as an example
where the functional annotation of a protein of unknown function could
be derived from comparative genomics alone. On the other hand, our
analyses show that a subgroup of the DUF71 family is most certainly
involved in a metabolic pathway unrelated to diphthamide synthesis, and
that transferring functional annotations from homology scores alone
would be inappropriate in this case. We believe that this integrated
functional annotation approach will play an important role in future
pipelines for annotation of protein families.


Additional files


Additional file 1: Table S1. GenBank RefSeq identities and
corresponding organisms for all proteins used in the phylogenies.
Additional file 2: Table S2. GO Term Enrichment SPELL analysis
(http://imperio.princeton.edu:3000/yeast) with YLR143w as input.


[Figure 6 image: not recoverable from the text extraction. Legend labels: DUF71-B12, CbiZ.]

Figure 6 Distribution of DUF71-B12, CbiZ and CbiB in bacterial (A) and archaeal (B) genomes. The trees are species trees constructed in
iTOL (itol.embl.de/); the presence or absence of the specific genes was derived from the "DUF71-B12" subsystem in the SEED database.


Additional file 3: Figure S1. Top 10 interactors with YLR143W by
homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness
database, http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W). Figure S2.
Multiple sequence alignment of selected Dph6 family and DUF71-B12
family sequences generated using the Multalin platform
(http://multalin.toulouse.inra.fr/multalin/). Strictly conserved residues
between the two families are in red. Residues conserved only in the Dph6
family are boxed in green. Residues found around the phosphate group of
ATP are noted by red arrows. Secondary structural elements (yellow
rectangles for α-helices and cyan arrows for β-strands), shown above the
alignment, are from the crystal structure of P. furiosus Dph6 (PF0828)
(PDB id: 3RK1). Figure S3. Bayesian tree of archaeal and eukaryotic Dph6
sequences. The scale bar represents the average number of substitutions
per site. Numbers at nodes represent posterior probabilities. For
clarity, only values greater than 0.85 are indicated. Figure S4. (Top)
Sequence logo derived from 95 Dph6 sequences extracted from the
Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is
located at position 10 in the logo. (Bottom) Sequence logo of the
corresponding region derived from 102 DUF71-B12 sequences extracted from
the DUF71-B12 subsystem in SEED. Both logos were made at
http://weblogo.berkeley.edu/logo.cgi based on ClustalW-derived alignments.










Competing interests
The authors declare that they have no competing interests.

Authors' contributions
VdC-L conducted the comparative genomic analysis and made the
functional predictions. CB-A performed the phylogenetic analysis. FF, LT and
JFH did the structural analysis. All authors participated in writing/reviewing
the manuscript. All authors read and approved the final manuscript.

Reviewers' comments
Reviewer number 1: Arcady Mushegian
Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City,
Missouri 64110

The study by de Crécy-Lagard and co-authors pinpoints the DUF71/
COG2102 as the most likely archaeal/eukaryotic ATP-dependent diphthine-
ammonia ligase, the so far unaccounted-for enzyme in the pathway of
diphthamide biosynthesis, which pathway is responsible for the formation of
a unique derivative of the conserved histidine within the translation
elongation factor 2. A distinct subfamily of this protein family appears to play
another role in bacteria and a subset of archaea, most likely in the salvage of
an intermediate of cobalamin biosynthesis. The evidence presented in the
paper consists of genome context information, sequence-structure
prediction and the data from yeast concerning gene expression and
chemical-genomics profiling. Taken together, the evidence seems
compelling to me. The data from yeast represent partial functional validation
of predictions made for prokaryotes. I would recommend only to tone down
the suggestion that all this is a "novel paradigm" in analysis of gene function:
researchers have been inferring gene functions from phenotypes, as well as
from directly detected changes in genotype, for a long, long time, and the
current study is a logical extension of these approaches. What is different in
the last 15 years is that we can compare these properties across many
species with completely sequenced genomes; but even this is a logical
extension of the previous work (compare, for example, with work from
Yanofsky and Jensen labs on biosynthesis of aromatic amino acids): it was
not any prescription of a previous scientific paradigm that constrained the
work, but rather the lack of the data.

Response: The references to a "novel paradigm" were eliminated in the abstract
and the introduction as suggested.

Reviewer number 2: Michael Galperin
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC
6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

The paper by de Crécy-Lagard and colleagues is a fine example of using
comparative genomics to patch the remaining holes in the metabolic
pathways. The key conclusion of this work, prediction of the participation of
the members of the DUF71/COG2102 family in diphthamide biosynthesis in
archaea and eukaryotes and in B12 metabolism in some bacteria and archaea,
is extremely convincing and hardly even needs an experimental verification.
The second conclusion, that ammonia used in the diphthine ammonia lyase-
catalyzed reaction in different organisms could be generated by two
different enzymes, asparagine synthetase and the RidA domain, also sounds
convincing. However, proving beyond reasonable doubt that DUF71/
COG2102 family members with their ATP-pyrophosphatase activity comprise
the key part of diphthine ammonia lyase does not prove that they are the
only subunits of this enzyme. Even if the proposed reaction scheme
(Figure 1B) is correct, there still might be a need for a ligase subunit that
couples removal of the AMP moiety from EF2 with its amidation. There is a
definite possibility that DUF71/COG2102 family members catalyze all these
individual reactions, e.g. using its unique C-terminal 100-aa domain, but that
would have to be proven experimentally. The reported involvement of the
likely scaffold protein YBR246w (DPH7) appears to support the idea that
diphthine ammonia lyase consists of more than one type of subunit.
Otherwise, it is a great paper that vividly demonstrates the power of
comparative-genomics approaches.

We added a phrase stating that "even if one cannot rule out at this stage
that other catalytic subunits yet to be identified may also be required".


Reviewer number 3: L. Aravind
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC
6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

This work uses contextual information to identify the diphthine-ammonia
ligase in archaea and eukaryotes. It also shows that the yeast protein
YBR246W is indeed not the correct ligase, but rather the MJ0570-like PP-loop
ATPases. The authors also show that this family has been transferred to
certain bacteria, where they infer that it is likely to have undergone a
functional shift to participate in B12 salvage. They cautiously propose that it
might function as a replacement for CbiZ to function as an amidohydrolase
(the reverse of the typical PP-loop ATPase reaction) as against a ligase. The
conclusions are definitive and the article makes a useful contribution to the
understanding of protein modification and cofactor biosynthesis. This said,
there are certain issues with the current form of the article that authors
necessarily need to address in their revision. 1) (pg 8) The authors state that
the MJ0570-like enzymes have a HUP domain followed by a distinct
C-terminal domain. They do not explain the meaning of this properly nor cite
the reference of the paper (PMID: 12012333) pertaining to the HUP domains
where this family was identified as a PP-loop ATPase, along with the
observations (Table 1 in that reference) that it has a primarily
archaeo-eukaryotic phyletic pattern, and that eukaryotic versions might be
fused to two C-terminal domains of the YabJ-like chorismate lyase fold (now
termed RidA). It should be stated that the N-terminus is a PP-loop ATPase
domain of the HUP class of Rossmannoid domains; not all HUP domains are
ligases, only the PP-loop and the HIGH nucleotidyltransferases. This clarifies
that it is related to other ATP-utilizing amidoligases such as NAD synthetase,
GMP synthetase and asparagine synthetase. This would place their inferred
amidoligase activity in the context of comparable, known amidoligase
activities of related enzymes. In fact it would be advisable to place the fact
that these are PP-loop enzymes in the abstract itself.

The following sentence was added: "This family had previously been
described as a PP-loop ATPase of unknown function containing a
Rossmannoid class HUP domain (Aravind et al. 2002)". A reference to the
PP-loop ATPase family was added in the abstract as requested. A reference to
the same work was added when talking about the RidA fusion. For the
phylogenetic distribution, the results presented here are a bit different from
the previous study because many more genomes are available after 10 years,
and we show that the family is also bacterial.

2) The authors persistently refer to the domain as DUF71. This name is no
longer current in Pfam and it has long been recognized, as mentioned in the
reference noted above, that these proteins are not "domains of unknown
function" but PP-loop ATPases. The domain is correctly termed ATP_bind_4
(PF01902) in Pfam. This Pfam name (not the misleading DUF71) and Pfam
number should be indicated, with just a statement in the introduction that it
was formerly DUF71.

This domain is currently called "Domain of unknown function DUF71, ATP-
binding domain" in the InterPro database (IPR002761) even if it is called
ATP_bind_4 (PF01902) in Pfam. It is much shorter to use (as well as easier for
the reader to follow) the DUF71 abbreviation rather than the ATP_bind_4
abbreviation. We therefore prefer to keep DUF71. We however introduced a
statement giving the different names of this domain in the InterPro, Pfam
and COG databases at the end of the introduction.

3) The authors apparently have a misapprehension regarding the
Methanohalophilus mahii protein, both in the text and in the domain
architecture rendered in the figure. First, these proteins have two N-terminal
domains fused to the MJ0570-like module, namely an N-terminal class-II
glutamine amidotransferase (GAT-II, e.g. see PMID: 20023723) and a second
PP-loop ATPase domain thereafter (i.e. one related to asparagine synthetase).
This GAT domain, as in the case of other PP-loop enzymes, could supply
ammonia by cleaving it off glutamine. But this does not explain which PP-loop
domain utilizes it. In the case of the Asn-synthetase it is used by the cognate
PP-loop domain. In this case the presence of two PP-loop domains suggests
that it is either utilized by both for different reactions or else the second
domain does not receive the NH3 from this GAT. This also leads to the
question: what reaction is the Asn synthetase-like PP-loop domain catalyzing?

Quality of written English: Acceptable
The source of the confusion came from the fact that the Asn Synthase
domain (AsnB) contains two domains: the GAT-II domain and the
Asn_Synthase_B_C PP-loop ATPase domain. Both the figure and the text
were modified to avoid the confusion. Based on the reviewer's comments,
the sentence discussing the potential role of the AsnB domain was modified
as follows: "This domain organization strongly suggests that in this subset of
enzymes, the hydrolysis of glutamine catalyzed by the fused GAT-II domain
could provide the NH3 moiety to both the DUF71 and the
Asn_Synthase_B_C enzymes".

4) Based on phyletic complementarity the authors suggest that bacterial
CbiZ might be displaced by the bacterial MJ0570-like enzymes. This seems
unusual. Why utilize a PP-loop ATPase for the reverse reaction, i.e.
amidohydrolase? Typically there is little overlap between the families
involved in amidohydrolase as opposed to ATP-dependent ligase activity. Of
the almost 12 distinct major inventions of amidoligase activity, hardly any
representatives of these superfamilies have been reused as amidohydrolases.
So do the authors note anything special in the case of the bacterial
representatives that might support such a functional shift?
This hypothesis is derived from phylogenetic distribution, and it is not
unprecedented that ligases and hydrolases are found in the same family (see
example in PMID: 12359880). However, we agree that this hypothesis derives
mainly from phylogenetic pattern analysis, and beyond the differences in the
predicted substrate binding pocket found in the DUF71-B12 family we did
not identify specific changes that could point to a shift to hydrolase, hence
our caution in our prediction as stated in the text.
Quality of written English: Acceptable


Acknowledgements
This work was supported by the US National Science Foundation (grant
MCB-1153413 to VdC-L), the US National Institutes of Health (grant
U54GM094597 to GT Montelione and the Northeast Structural Genomics
Consortium) and the Agence Nationale pour la Recherche
(grant ANR 10 BINF-01 0127 Ancestrome) to CB-A. We thank Raffael
Schaffrath and Mike Stark for sharing unpublished diphthamide-related
data and for critical evaluation of manuscript parts. We thank Jorge
Escalante-Semerena for sharing his immense knowledge on B12 salvage
pathways, Diana Downs for disclosing unpublished results on RidA function,
Manal Swairjo for chemical insight, and Andrew Hanson for helpful input on
the manuscript.

Author details
1 Department of Microbiology and Cell Science, University of Florida,
Gainesville, FL 32611, USA.
2 Department of Biological Sciences, Columbia University, Northeast
Structural Genomics Consortium, 1212 Amsterdam Ave, New York, NY 10027, USA.
3 Université de Lyon; Université Lyon 1; CNRS; UMR5558, Laboratoire de
Biométrie et Biologie Evolutive, 43 boulevard du 11 Novembre 1918, Lyon,
Villeurbanne F-69622, France.

Received: 17 July 2012 Accepted: 18 September 2012
Published: 26 September 2012


References
1. Greganova E, Altmann M, Bütikofer P: Unique modifications of translation
elongation factors. FEBS J 2011, 278(15):2613-2624.
2. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor
2 by diphtheria toxin. Isolation and properties of the novel
ribosyl-amino acid and its hydrolysis products. J Biol Chem 1980,
255(22):10717-10720.
3. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor
2 by diphtheria toxin. NMR spectra and proposed structures of
ribosyl-diphthamide and its hydrolysis products. J Biol Chem 1980,
255(22):10710-10716.
4. Zhang Y, Zhu X, Torelli AT, Lee M, Dzikovski B, Koralewski RM, Wang E,
Freed J, Krebs C, Ealick SE, et al.: Diphthamide biosynthesis requires an
organic radical generated by an iron-sulphur enzyme. Nature 2010,
465(7300):891-896.
5. Zhu X, Dzikovski B, Su X, Torelli AT, Zhang Y, Ealick SE, Freed JH, Lin H:
Mechanistic understanding of Pyrococcus horikoshii Dph2, a [4Fe-4S]
enzyme required for diphthamide biosynthesis. Mol Biosyst 2011,
7(1):74-81.
6. Lartillot N, Lepage T, Blanquart S: PhyloBayes 3: a Bayesian software
package for phylogenetic reconstruction and molecular dating.
Bioinformatics 2009, 25(17):2286-2288.
7. McRee DE: XtalView/Xfit-A versatile program for manipulating atomic
coordinates and electron density. J Struct Biol 1999, 125(2-3):156-165.
8. Katoh K, Misawa K, Kuma KI, Miyata T: MAFFT: a novel method for rapid
multiple sequence alignment based on fast Fourier transform. Nucleic
Acids Res 2002, 30(14):3059-3066.
9. Webb TR, Cross SH, McKie L, Edgar R, Vizor L, Harrison J, Peters J, Jackson IJ:
Diphthamide modification of eEF2 requires a J-domain protein and is
essential for normal development. J Cell Sci 2008, 121(19):3140-3145.
10. Zhu X, Kim J, Su X, Lin H: Reconstitution of diphthine synthase activity
in vitro. Biochemistry 2010, 49(44):9649-9657.
11. Mattheakis LC, Shen WH, Collier RJ: DPH5, a methyltransferase gene
required for diphthamide biosynthesis in Saccharomyces cerevisiae. Mol
Cell Biol 1992, 12(9):4026-4037.
12. Moehring TJ, Danley DE, Moehring JM: In vitro biosynthesis of
diphthamide, studied with mutant Chinese hamster ovary cells resistant
to diphtheria toxin. Mol Cell Biol 1984, 4(4):642-650.
13. Su X, Chen W, Lee W, Jiang H, Zhang S, Lin H: YBR246W is required for the
third step of diphthamide biosynthesis. J Am Chem Soc 2011,
134(2):773-776.
14. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard
T, Binns D, Bork P, Burge S, et al.: InterPro in 2011: new developments in
the family and domain prediction database. Nucleic Acids Res 2012,
40(D1):D306-D312.
15. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL,
Gunasekaran P, Ceric G, Forslund K, et al.: The Pfam protein families
database. Nucleic Acids Res 2010, 38(suppl_1):D211-D222.
16. Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D,
Mazumder R, Mekhedov S, Nikolskaya A, et al.: The COG database: an
updated version includes eukaryotes. BMC Bioinforma 2003, 4(1):41.
17. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ:
Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
18. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam
H, Valentin F, Wallace IM, Wilm A, Lopez R, et al.: Clustal W and Clustal X
version 2.0. Bioinformatics 2007, 23(21):2947-2948.
19. Corpet F: Multiple sequence alignment with hierarchical clustering.
Nucleic Acids Res 1988, 16(22):10881-10890.
20. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de
Crécy-Lagard V, Diaz N, Disz T, Edwards R, et al.: The subsystems approach
to genome annotation and its use in the project to annotate 1000
genomes. Nucleic Acids Res 2005, 33(17):5691-5702.
21. Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner
A, Anderson I, Lykidis A, Mavromatis K, et al.: The integrated microbial
genomes system: an expanding comparative analysis resource. Nucleic
Acids Res 2010, 38(suppl_1):D382-D390.
22. Alm EJ, Huang KH, Price MN, Koche RP, Keller K, Dubchak IL, Arkin AP: The
MicrobesOnline web site for comparative genomics. Genome Res 2005,
15(7):1015-1022.
23. Hibbs MA, Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG:
Exploring the functional landscape of gene expression: directed search
of large microarray compendia. Bioinformatics 2007, 23(20):2692-2699.
24. Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET,
Christie KR, Costanzo MC, Dwight SS, Engel SR, et al.: Saccharomyces
genome database: the genomics resource of budding yeast. Nucleic Acids
Res 2012, 40(D1):D700-D705.
25. Hillenmeyer M, Ericson E, Davis R, Nislow C, Koller D, Giaever G: Systematic
analysis of genome-wide fitness data in yeast reveals novel gene
function and drug action. Genome Biol 2010, 11(3):R30.
26. Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M,
St Onge RP, Tyers M, Koller D, et al.: The chemical genomic portrait of
yeast: uncovering a phenotype for all genes. Science 2008,
320(5874):362-365.
27. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic
tree display and annotation. Bioinformatics 2007, 23(1):127-128.
28. Crooks GE, Hon G, Chandonia J-M, Brenner SE: WebLogo: a sequence logo
generator. Genome Res 2004, 14(6):1188-1190.







29. Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference Sequences
(RefSeq): current status, new features and genome annotation policy. Nucleic
Acids Res 2012, 40(D1):D130-D135.
30. Philippe H: MUST, a computer package of management utilities for
sequences and trees. Nucleic Acids Res 1993, 21(22):5264-5272.
31. Gouy M, Guindon S, Gascuel O: SeaView version 4: a multiplatform graphical
user interface for sequence alignment and phylogenetic tree building. Mol
Biol Evol 2010, 27(2):221-224.
32. Aravind L, Anantharaman V, Koonin EV: Monophyly of class I aminoacyl tRNA
synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding
domains: implications for protein evolution in the RNA world. Proteins:
Structure, Funct Bioinform 2002, 48(1):1-14.
33. Forouhar F, Saadat N, Hussain M, Seetharaman J, Lee I, Janjua H, Xiao R, Shastry R,
Acton TB, Montelione GT, et al.: A large conformational change in the putative
ATP pyrophosphatase PF0828 induced by ATP binding. Acta Crystallogr Sect F
Struct Biol Cryst Commun 2011, 67(11):1323-1327.
34. Botet J, Rodriguez-Mateos M, Ballesta JPG, Revuelta JL, Remacha M: A
chemical genomic screen in Saccharomyces cerevisiae reveals a role for
diphthamidation of translation Elongation Factor 2 in inhibition of
protein synthesis by Sordarin. Antimicrob Agents Chemother 2008,
52(5):1623-1629.
35. Bär C, Zabel R, Liu S, Stark MJR, Schaffrath R: A versatile partner of
eukaryotic protein complexes that is involved in multiple biological
processes: Kti11/Dph3. Mol Microbiol 2008, 69(5):1221-1233.
36. Jorgensen R, Wang Y, Visschedyk D, Merrill AR: The nature and character of
the transition state for the ADP-ribosyltransferase reaction. EMBO Rep
2008, 9(8):802-809.
37. Iyer LM, Abhiman S, Maxwell Burroughs A, Aravind L: Amidoligases with
ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains:
synthesis of novel metabolites and peptide modifications of proteins.
Mol Biosyst 2009, 5(12):1636-1660.
38. Lambrecht JA, Flynn JM, Downs DM: Conserved YjgF protein family
deaminates reactive enamine/imine intermediates of Pyridoxal
5'-Phosphate (PLP)-dependent enzyme reactions. J Biol Chem 2012,
287(5):3454-3461.
39. Brochier-Armanet C, Forterre P, Gribaldo S: Phylogeny and evolution of the
Archaea: one hundred genomes later. Curr Opin Microbiol 2011,
14(3):274-281.
40. Borths EL, Poolman B, Hvorup RN, Locher KP, Rees DC: In vitro functional
characterization of BtuCD-F, the Escherichia coli ABC transporter for
vitamin B12 uptake. Biochemistry 2005, 44(49):16301-16309.
41. Zayas CL, Claas K, Escalante-Semerena JC: The CbiB protein of Salmonella
enterica is an integral membrane protein involved in the last step of the
de novo corrin ring biosynthetic pathway. J Bacteriol 2007,
189(21):7697-7708.
42. Escalante-Semerena JC: Conversion of cobinamide into
adenosylcobamide in bacteria and archaea. J Bacteriol 2007,
189(13):4555-4560.
43. Woodson JD, Escalante-Semerena JC: CbiZ, an amidohydrolase enzyme
required for salvaging the coenzyme B12 precursor cobinamide in
archaea. Proc Natl Acad Sci USA 2004, 101(10):3591-3596.
44. Gray MJ, Escalante-Semerena JC: The cobinamide amidohydrolase (cobyric
acid-forming) CbiZ enzyme: a critical activity of the cobamide
remodelling system of Rhodobacter sphaeroides. Mol Microbiol 2009,
74(5):1198-1210.


doi:10.1186/1745-6150-7-32
Cite this article as: de Crécy-Lagard et al.: Comparative genomic analysis
of the DUF71/COG2102 family predicts roles in diphthamide
biosynthesis and B12 salvage. Biology Direct 2012 7:32.






Supplemental Figure 1. Top 10 interactors with YLR143W by homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness database, http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W). Legend entries in the figure: DPH7, DUF71, DPH4.

Supplemental Figure 2. Multiple sequence alignment of selected Dph6 family and DUF71-B12 family sequences generated using the Multalin platform (http://multalin.toulouse.inra.fr/multalin/). Strictly conserved residues between the two families are in red. Residues conserved only in the Dph6 family are boxed in green. Residues found around the phosphate group of ATP are noted by red arrows. Secondary structural elements (yellow rectangles for α-helices and cyan arrows for β-strands), shown above the alignment, are from the crystal structure of P. furiosus Dph6 (PF0828) (PDB id: 3RK1). Label in the figure: DUF71-Signature motif.

Supplemental Figure 3. Bayesian tree of archaeal and eukaryotic Dph6 sequences. The scale bar represents the average number of substitutions per site. Numbers at nodes represent posterior probabilities. For clarity, only values greater than 0.85 are indicated.

Sequences in the tree (organism, accession):
Archaeoglobus veneficus SNP6, YP_004341662
Archaeoglobus fulgidus DSM 4304, NP_070494
Thermococcus barophilus MP, YP_004071328
Pyrococcus furiosus DSM 3638, NP_578557
Candidatus Caldiarchaeum subterraneum, BAJ48107
Pyrobaculum oguniense TE7, YP_005259944
Pyrobaculum aerophilum str. IM2, NP_560174
Methanopyrus kandleri AV19, NP_613929
Aciduliprofundum boonei T469, YP_003483464
Picrophilus torridus DSM 9790, YP_023745
Methanococcus voltae A3, YP_003708306
Methanocaldococcus jannaschii DSM 2661, NP_247549
Methanosphaera stadtmanae DSM 3091, YP_447605
Methanothermobacter thermautotrophicus str. Delta H, NP_275575
uncultured marine group II euryarchaeote, EHR76673
halophilic archaeon DL31, YP_004806933
Haloterrigena turkmenica DSM 5511, YP_003402914
Methanocella arvoryzae MRE50, YP_687132
Methanocella conradii HZ254, YP_005381419
Thermofilum pendens Hrk 5, YP_920509
Methanospirillum hungatei JF-1, YP_503034
Methanosaeta thermophila PT, YP_843767
Methanoplanus petrolearius DSM 11571, YP_003894468
Methanoregula boonei 6A8, YP_001404153
Methanosalsum zhilinae DSM 4017, YP_004615790
Methanohalobium evestigatum Z-7303, YP_003727184
Methanococcoides burtonii DSM 6242, YP_565201
Cenarchaeum symbiosum A, YP_875359
Nitrosopumilus maritimus SCM1, YP_001581864
Candidatus Nitrosoarchaeum limnia SFB1, ZP_08256363
Candidatus Nitrosoarchaeum koreensis MY1, ZP_08667550
Ignisphaera aggregans DSM 17230, YP_003859822
Staphylothermus marinus F1, YP_001041170
Caldivirga maquilingensis IC-167, YP_001541343
Metallosphaera sedula DSM 5348, YP_001192089
Sulfolobus solfataricus P2, NP_342190
Tetrahymena thermophila, XP_001016762
Paramecium tetraurelia strain d4-2, XP_001444982
Plasmodium falciparum 3D7, XP_001350622
Cryptosporidium parvum Iowa II, XP_625666
Homo sapiens, NP_542381
Drosophila melanogaster, NP_572749
Arabidopsis thaliana, NP_187098
Chlamydomonas reinhardtii, XP_001690073
Saccharomyces cerevisiae S288c, NP_013244
Yarrowia lipolytica CLIB122, XP_505418
Trypanosoma brucei brucei strain 927/4 GUTat10.1, XP_828128
Leishmania major strain Friedlin, XP_001683782
Phaeodactylum tricornutum CCAP 1055/1, XP_002180405
Thalassiosira pseudonana CCMP1335, XP_002292404

Clade labels in the figure: Alveolata, Metazoa, Plantae, Fungi, Stramenopiles, Excavata, Archaeoglobales, Thermococcales, Thermoproteales, Thermoplasmatales/DHEV2, Methanococcales, Methanobacteriales, Halobacteriales, Methanocellales, Sulfolobales, Methanomicrobia, Thermoproteales, Desulfurococcales, Thaumarchaeota, Group II, 'Aigarchaeota', Methanopyrales.


Supplemental Figure 4. (Top) Sequence logo derived from 95 Dph6 sequences extracted from the Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is located at position 10 in the logo. (Bottom) Sequence logo of the corresponding region derived from 102 DUF71-B12 sequences extracted from the DUF71-B12 subsystem in SEED. Both logos were made at http://weblogo.berkeley.edu/logo.cgi based on ClustalW-derived alignments.


Article type: Research
Article identifier: 1745-6150-7-32
Journal identifier: 1745-6150
Title: Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage
Authors:
Valérie de Crécy-Lagard (A1, corresponding author; affiliation I1), vcrecy@ufl.edu
Farhad Forouhar (A2; affiliation I2), farhadf@biology.columbia.edu
Céline Brochier-Armanet (A3; affiliation I3), celine.brochier-armanet@univ-lyon1.fr
Liang Tong (A4), ltong@columbia.edu
John F. Hunt (A5), fhunt1@gmail.com
Affiliations:
I1: Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32611, USA
I2: Department of Biological Sciences, Columbia University, Northeast Structural Genomics Consortium, 1212 Amsterdam Ave, New York, NY, 10027, USA
I3: Université de Lyon; Université Lyon 1; CNRS; UMR5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 Novembre 1918, Lyon, Villeurbanne, F-69622, France
Source: Biology Direct
Section: Genomics, bioinformatics and systems biology
ISSN: 1745-6150
Publication date: 2012
Volume: 7
Issue: 1
First page: 32
URL: http://www.biology-direct.com/content/7/1/32
DOI: 10.1186/1745-6150-7-32
PubMed ID: 23013770
History: received 17 July 2012; accepted 18 September 2012; published 26 September 2012
Copyright: 2012 de Crécy-Lagard et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Diphthamide, Vitamin B12, Amidotransferase, Comparative genomics
Abstract
Background
The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the biological function discovery process. Classically, one used to link gene with function by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusion, co-expression profiles, structural information and other genomic or post-genomic derived associations can be now used to make very strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.
Results
The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing diphthine-ammonia ligase (EC 6.3.1.14), Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins. Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different function because the diphthamide modification is absent in this Domain of Life. In-depth investigations suggest that some archaeal and bacterial DUF71 proteins participate in B12 salvage.
Conclusions
This detailed analysis of the DUF71 family members provides an example of the power of integrated data-mining for solving important “missing genes” or “missing function” cases and illustrates the danger of functional annotation of protein families by homology alone.
Reviewers’ names
This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.
Background
In both Archaea and Eucarya, the translation Elongation Factor 2 (EF-2) harbors a complex post-translational modification of a strictly conserved histidine (His699 in yeast) called diphthamide [1]. This modification is the target of the diphtheria toxin and the Pseudomonas exotoxin A, which inactivate EF-2 by ADP-ribosylation of the diphthamide [2,3]. Although the diphthamide biosynthesis pathway was described in the early 1980s [2,3], the corresponding enzymes have only recently been characterized. In vitro reconstitution experiments have shown that the first step, the transfer of a 3-amino-3-carboxypropyl (ACP) group from S-adenosylmethionine (SAM) to the C-2 position of the imidazole ring of the target histidine residue, is catalyzed in Archaea by the iron-sulfur-cluster enzyme Dph2 [4,5] (Figure 1A). Genetic and complementation studies have shown that catalysis of the same first step requires four proteins (Dph1-Dph4) in yeast and other eukaryotes [6-9]. The subsequent step, trimethylation of an amino group to form the diphthine intermediate, is catalyzed by diphthine synthase, Dph5 (EC 2.1.1.98) (Figure 1A) [10,11]. The last step, the ATP-dependent amidation of the carboxylate group [12], is catalyzed by diphthine-ammonia ligase (EC 6.3.1.14), but the corresponding gene has not been identified (http://www.orenza.u-psud.fr/). A protein involved in this last step was recently identified in yeast (YBR246W or Dph7), but it is most certainly not directly involved in catalysis, as it is not conserved in Archaea and it contains a WD-domain likely to be involved in protein/protein interactions [13].
Figure 1. Structures of diphthamide and B12 precursors and derivatives. (A) The core diphthamide pathway is predicted to contain three enzymes, Dph2, Dph5 and Dph6, in Archaea. The formation of diphthine has been reconstituted in vitro using Dph2 and Dph5 from Pyrococcus horikoshii [4,5]. The enzyme family catalyzing the last step in Archaea and Eukarya, Dph6, was missing. In yeast, the first and last steps require additional proteins (Dph1, Dph3 and Dph7). (B) Predicted Dph6-catalyzed reactions. (C) Ado-Pseudo-B12 structure and hydrolysis site by the bacterial CbiZ enzyme (bCbiZ). Parts (A) and (B) are adapted with permission from Xuling Zhu; Jungwoo Kim; Xiaoyang Su; Hening Lin; Biochemistry 2010, 49, 9649–9657. Copyright 2010 American Chemical Society.
Using a combination of comparative genomic approaches, we set out to identify a candidate gene for this orphan enzyme family. Based on taxonomic distribution, domain organization of gene fusions, physical clustering on chromosomes, atomic structural data, co-expression, and phenotype data, a promising candidate was identified: the family called Domain of Unknown Function family DUF71 (IPR002761) in InterPro [14]. This family is also called ATP_bind_4 (PF01902) in Pfam [15] or Predicted ATPases of PP-loop superfamily (COG2102) in the Clusters of Orthologous Groups database [16]. However, detailed analysis of the DUF71 family revealed that this family is almost surely not isofunctional. Some Archaea contain two very divergent copies of the gene, while homologs are found in Bacteria, which are known to lack diphthamide. This observation suggests that some DUF71 members have different functions and probably participate in different biochemical pathways.
Methods
Comparative genomics
The BLAST tools [17] and resources at NCBI (http://www.ncbi.nlm.nih.gov/) were routinely used. Multiple sequence alignments were built using ClustalW [18] or Multialin [19]. Protein domain analysis was performed using the Pfam database tools (http://pfam.janelia.org/) [15]. Analysis of the phylogenetic distribution and physical clustering was performed in the SEED database [20]. Results are available in the “Diphthamide biosynthesis” and “DUF71-B12” subsystems on the public SEED server (http://pubseed.theseed.org/SubsysEditor.cgi). Phylogenetic profile searches were performed on the IMG platform [21] using the phylogenetic query tool (http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm). Physical clustering was analyzed with the SEED subsystem coloring tool or the Seedviewer Compare region tool [20], as well as on the MicrobesOnline (http://www.microbesonline.org/) tree-based genome browser [22]. The SPELL microarray analysis resource [23] was used through the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) [24] to analyze yeast gene coexpression profiles. Clustering of yeast deletion mutants based on phenotype analysis was analyzed through the yeast fitness database available at http://fitdb.stanford.edu/ [25,26]. Mappings of gene distribution profiles onto taxonomic trees were generated using the iTOL suite (http://itol.embl.de/index.shtml) [27]. Sequence logos were derived using the WebLogo platform [28].
Structure analysis
Visualization and comparison of protein structures and manual docking of ligand molecules were performed using PyMOL (The PyMOL Molecular Graphics System, Version 1.4.1, Schrödinger, LLC). XtalView [7] was used for the protein docking exercises.
Phylogenetic analyses
The survey of the 1996 complete prokaryotic genomes available at the NCBI (http://www.ncbi.nlm.nih.gov/) using BLASTP [17] (default parameters) allowed identification of 119 bacterial and 144 archaeal DUF71 homologs, in addition to the 182 eukaryotic homologs identified in the RefSeq database at the NCBI [29] (Additional file 1: Table S1). The retrieved sequences were aligned using MAFFT [8], and the resulting alignment was visually inspected using ED, the alignment editor of the MUST package [30]. The phylogenetic analysis of the 445 sequences was performed using the neighbor-joining distance method implemented in SeaView [31]. The robustness of the resulting tree was assessed by the non-parametric bootstrap method (100 replicates of the original dataset) implemented in SeaView. A second phylogenetic analysis, restricted to 50 archaeal and eukaryotic homologs representative of the genetic and genomic diversity of these two Domains, was performed using the Bayesian approach implemented in Phylobayes [6] with an LG model.
Additional file 1: Table S1. GenBank RefSeq identities and corresponding organisms for all proteins used in the phylogenies. File name: 1745-6150-7-32-S1.xlsx
Results and discussion
Comparative genomics points to DUF71/COG2102 as a strong candidate for the missing diphthamide synthase family
The distribution of known diphthamide biosynthesis genes in Archaea was analyzed using the SEED database and its tools [20]. The 59 archaeal genomes analyzed all contained an EF-2 encoding gene. Analysis of the distribution of Dph2 and Dph5 in Archaea showed that 58/59 genomes encoded these two proteins. The only archaeon lacking both Dph2 and Dph5 was Korarchaeum cryptofilum OPF8 (Figure 2A). We therefore hypothesized that this organism has lost the diphthamide modification pathway, even though the K. cryptofilum EF-2 still harbors the conserved His residue at the site of the modification (His603 in the K. cryptofilum sequence, Accession B1L7Q0 in UniProtKB). Using the IMG/JGI phylogenetic query tools [21], we searched for protein families found in all Archaea except Korarchaeum cryptofilum OPF8, present in Saccharomyces cerevisiae and Homo sapiens but absent in Escherichia coli and Bacillus subtilis, as bacteria are known to lack this modification pathway. Only one family, DUF71/COG2102, followed this taxonomic distribution. This family had been described previously as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain [32].
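The phylogenetic profile query described above can be sketched in a few lines of Python. This is only an illustration of the presence/absence logic, not the IMG/JGI tool itself: the function names are hypothetical, the two archaeal species stand in for "all Archaea", and the presence/absence sets for the two extra families are invented for the example.

```python
from typing import Dict, List, Set


def matches_profile(present_in: Set[str], required: Set[str], excluded: Set[str]) -> bool:
    """True if a family occurs in every required genome and in no excluded genome."""
    return required <= present_in and not (excluded & present_in)


def query_families(families: Dict[str, Set[str]],
                   required: Set[str],
                   excluded: Set[str]) -> List[str]:
    """Return the sorted names of all families that follow the requested profile."""
    return sorted(name for name, present_in in families.items()
                  if matches_profile(present_in, required, excluded))


# Query modelled on the text: present in Archaea (two representatives here),
# yeast and human; absent from K. cryptofilum, E. coli and B. subtilis.
REQUIRED = {"Pyrococcus furiosus", "Sulfolobus solfataricus",
            "Saccharomyces cerevisiae", "Homo sapiens"}
EXCLUDED = {"Korarchaeum cryptofilum", "Escherichia coli", "Bacillus subtilis"}

# Invented presence/absence sets, for illustration only.
TOY_FAMILIES = {
    "DUF71/COG2102": set(REQUIRED),
    "hypothetical_family_X": REQUIRED | {"Escherichia coli"},   # also bacterial
    "hypothetical_family_Y": REQUIRED - {"Homo sapiens"},       # missing in human
}

hits = query_families(TOY_FAMILIES, REQUIRED, EXCLUDED)
print(hits)
```

Set operations keep the two conditions (required genomes are a subset, excluded genomes do not intersect) explicit, which mirrors how the query is phrased in the text.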
Figure 2. Comparative genomic analysis of the DUF71 family. (A) Distribution of the core diphthamide genes Dph2 and Dph5 and of EF-2 and DUF71 in Archaea, according to data derived from the “Diphthamide biosynthesis” subsystem in the SEED database. The tree is a species tree constructed in iTOL (itol.embl.de/). The presence and absence of the specific genes was derived from the “Diphthamide biosynthesis” subsystem. (B) Physical clustering of DUF71/COG2102 genes with Dph5 in three Methanosarcina genomes derived from the MicrobesOnline database (http://www.microbesonline.org/). (C) Examples of proteins containing domains fused to DUF71 in Archaea and Eucarya. Accession numbers and COG, CDD, or Pfam domain numbers are given in parentheses.
Using the neighborhood analysis tool of the SEED database [20], physical clustering was generally not observed between the dph2, dph5 and DUF71 genes, except in three Methanosarcina genomes where dph5 is located in the vicinity of DUF71 genes (Figure 2B). If members of the DUF71 family catalyze the last step of diphthamide synthesis, they should bind ATP [12]. Structural analysis of the DUF71 protein from Pyrococcus furiosus (PF0828) reveals the presence of two distinct domains: an N-terminal HUP domain that contains a highly conserved PP-motif that interacts with ATP (PDB id: 3RK1) and AMP (PDB id: 3RK0), and a C-terminal 100-residue domain belonging to a novel fold with a highly conserved motif GEGGEF/YE188T/S (P. furiosus numbering) that is probably involved in substrate binding and recognition [33].
Coexpression, phenotype and structural data link the yeast DUF71 to translation and diphthamide biosynthesis
YLR143w is the only S. cerevisiae DUF71 family member. Using YLR143w as input in the SPELL co-expression query tool [23] showed that nearly all co-expressed genes were involved in translation and ribosome biogenesis (Additional file 2: Table S2). This observation suggested that the DUF71 protein family has a role in translation, as expected for a protein modifying EF-2. Like all known diphthamide synthesis genes, YLR143w is not essential. More specifically, deletion of any of the five known diphthamide genes confers sordarin resistance in yeast [34,35], and the ylr143wΔ strain was shown to be as resistant to this compound as the diphthamide-deficient strains (see supplemental data in [34]). Furthermore, in a recent complete analysis of relationships between gene fitness profiles (co-fitness) and drug inhibition profiles (co-inhibition) from several hundred chemogenomic screens in yeast [25,26], available at http://fitdb.stanford.edu/, it was found that the top ten interactors with YLR143w by homozygous co-sensitivity include DPH5, DPH2, DPH4 (or JJJ3) and the newly identified DPH7 (or YBR246w) (Additional file 3: Figure S1). Both the coexpression and phenotype data thereby strongly support the hypothesis that YLR143w catalyzes the missing last step of diphthamide biosynthesis, even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required.
Additional file 2: Table S2. GO Term Enrichment SPELL analysis (http://imperio.princeton.edu:3000/yeast) with YLR143w as input. File name: 1745-6150-7-32-S2.xlsx

Additional file 3: Figure S1. Top 10 interactors with YLR143W by homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness database http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W). Figure S2. Multiple sequence alignment of selected Dph6 family and DUF71-B12 family sequences generated using the Multialin platform (http://multalin.toulouse.inra.fr/multalin/). Strictly conserved residues between the two families are in red. Residues conserved only in the Dph6 family are boxed in green. Residues found around the phosphate group of ATP are noted by red arrows. Secondary structural elements, yellow rectangles for α-helix and cyan arrows for β-strand, shown above the alignment, are from the crystal structure of P. furiosus Dph6 (PF0828) (PDB id: 3RK1). Figure S3. Bayesian tree of archaeal and eukaryotic Dph6 sequences. The scale bar represents the average number of substitutions per site. Numbers at nodes represent posterior probabilities. For clarity, only values greater than 0.85 are indicated. Figure S4. (Top) Sequence logo derived from 95 Dph6 sequences extracted from the Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is located at position 10 in the logo. (Bottom) Sequence logo of the corresponding region derived from 102 DUF71-B12 sequences extracted from the DUF71-B12 subsystem in SEED. Both logos were made at http://weblogo.berkeley.edu/logo.cgi based on ClustalW-derived alignments. File name: 1745-6150-7-32-S3.pdf
Finally, comparison of ATP- and AMP-containing structures of PF0828 reveals that the active site of the former has a narrow groove at the end of which only the α-phosphate of ATP is exposed to the solvent, whereas the active site of the latter is wide open (Figure 3A and B). Also, there is a sharp turn at the α-phosphate of ATP, suggesting that it is the site of the nucleophilic attack. We therefore performed a docking exercise using the EF-2 structure (PDB id: 3B82) [36] with the ATP-containing structure of PF0828. The docking revealed that the active site groove of the ATP-containing structure can easily accommodate diphthine with a few minor clashes between the two structures (Figure 3A and B).
Figure 3. Structural analysis of the DUF71 (PF0828) putative active site. (A) Docking of modified EF-2 (cyan, PDB id: 3B82) onto ATP-bound structure of PF0828 (yellow, PDB id: 3RK1). ATP and several residues of PF0828 (DUF71), which are conserved among archaeal and eukaryotic orthologs, and diphthine of EF-2 (see text for details) are shown in stick models. (B) Close-up stereo pair of panel A. Diphthine of EF-2 and the side chains of conserved residues of PF0828, at the interface of PF0828 and EF-2, are shown in stick models and labeled. (C) Stereo pair view of ATP-binding region of PF0828. Residues that are conserved among Dph6 and DUF71-B12 families are depicted in stick models with carbon atoms in cyan, while the residues that are specific to Dph6 family are shown in stick models with carbon atoms in green. Oxygen and nitrogen atoms are shown in red and blue in all stick models, respectively.
The modeling also showed that the carboxyl group of diphthine resides near the α-phosphate of ATP and the carboxylate group of residue Glu188, suggesting that nucleophilic attack by diphthine on the α-phosphate of ATP is highly feasible (Figure 3B). The modeling further shows that, besides E188, several residues that are highly conserved among archaeal and eukaryotic PF0828 and YLR143w orthologs, including S44, Y45, E78, Y103, Q104, A149, E183 and E186 (Additional file 3: Figure S2), are at the interface of the modeled complex of PF0828 with EF-2, supporting the hypothesis that they play important roles in EF-2 recognition (Figure 3B).
Linking DUF71 family members to ammonia transfer reactions
The diphthine-ammonia ligase reaction requires a source of NH3 [12]. Domain fusions involving members of the DUF71 family in the Pfam database [15] suggest that the source of NH3 might vary depending on the organism. For example, in a few Archaea (e.g. Methanohalophilus mahii DSM 5219, Methanosalsum zhilinae DSM 4017 or ‘Candidatus Nanosalinarum sp. J07AB56’), a COG0367/AsnB asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4) domain is found at the N-terminus of the DUF71 domain (Figure 2C). This AsnB domain can be further separated into two subdomains, an N-terminal class-II glutamine amidotransferase domain (GAT-II) [37] and an Asn_Synthase_B_C PP-loop ATPase domain (Figure 2C). This domain organization suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes. On the other hand, in many eukaryotes such as yeast and Arabidopsis thaliana, two YjgF-YER057c-UK114-like domains are fused to the C-terminus of the DUF71 protein, as previously noted by Aravind et al. [32] (Figure 2C). The stand-alone members of the YjgF-YER057c-UK114 family, now called the RidA family (for reactive intermediate/imine deaminase A), have been shown to deaminate products generated by PLP-dependent enzymes, which results in the release of NH3 [38]. The RidA domains fused to DUF71 could therefore be involved in providing the NH3 moiety for diphthamide synthesis.
The DUF71 family is not monofunctional
The taxonomic distribution of DUF71 homologs in available complete genomes confirmed that DUF71 is present in one or occasionally two copies in all Archaea except the korarchaeon K. cryptofilum (Table 1 and Additional file 1: Table S1). This pattern is consistent with an ancient origin of the DUF71 gene in Archaea. In sharp contrast, DUF71 is sporadically distributed in Bacteria, being present only in a few representatives of some phyla (Table 1 and Additional file 1: Table S1). This pattern fits either with an ancient origin of DUF71 in Bacteria followed by numerous losses or, conversely, with a more recent acquisition followed by horizontal gene transfer (HGT) among bacterial lineages. To further investigate the evolutionary history of DUF71, we made a phylogenetic analysis of the homologs identified in the three Domains of Life. The resulting tree showed two divergent groups of sequences. The first group contains the eukaryotic and nearly all archaeal sequences (including the predicted yeast DPH6 (YLR143w) and P. furiosus PF0828), whereas the second encompasses all the bacterial sequences as well as the second copy found in a few archaeal genomes (Figure 4 and Additional file 3: Figure S3).
Table 1. Taxonomic distribution of DUF71 homologs in archaeal and bacterial genomes

Phylum                Nb (%) genomes    Phylum                 Nb (%) genomes    Phylum                  Nb (%) genomes
Archaea
Crenarchaeota         37/37 (100%)      Korarchaeota           0/1 (0%)          Thaumarchaeota          2/2 (100%)
Euryarchaeota         79/79 (100%)
Bacteria
Acidobacteria         3/7 (42.9%)       Dictyoglomi            0/2 (0%)          Proteobacteria_Epsilon  0/64 (0%)
Actinobacteria        1/206 (0.5%)      Elusimicrobia          0/2 (0%)          Proteobacteria_Gamma    27/406 (6.7%)
Aquificae             0/10 (0%)         Fibrobacteres          0/2 (0%)          PVC_Chlamydiae          1/73 (1.4%)
Bacteroidetes         20/73 (27.4%)     Firmicutes             20/484 (4.1%)     PVC_Planctomycetes      3/6 (50%)
Caldiserica           0/1 (0%)          Fusobacteria           0/5 (0%)          PVC_Verrucomicrobia     0/4 (0%)
Chlorobi              0/11 (0%)         Gemmatimonadetes       0/1 (0%)          Spirochaetes            1/45 (2.2%)
Chloroflexi           5/16 (31.3%)      Ignavibacteria         0/1 (0%)          Synergistetes           0/4 (0%)
Chrysiogenetes        0/1 (0%)          Nitrospirae            1/3 (33.3%)       Thermodesulfobacteria   0/2 (0%)
Cyanobacteria         0/45 (0%)         Proteobacteria_Alpha   2/204 (1%)        Thermotogae             5/14 (35.7%)
Deferribacteres       0/4 (0%)          Proteobacteria_Beta    8/119 (6.7%)
Deinococcus-Thermus   2/17 (11.8%)      Proteobacteria_Delta   1/48 (2.1%)

The number of genomes per phylum containing at least one homolog of DUF71 is indicated.
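The percentages in Table 1 follow directly from the genome counts. A minimal sketch, assuming a hypothetical helper `percent_with_homolog` and using a handful of rows copied from the table, shows how the contrast between the near-universal archaeal distribution and the patchy bacterial one can be recomputed:

```python
from typing import Dict, Tuple


def percent_with_homolog(n_with: int, n_total: int) -> float:
    """Percentage of genomes in a phylum with at least one DUF71 homolog."""
    if n_total <= 0:
        raise ValueError("n_total must be a positive number of genomes")
    return round(100.0 * n_with / n_total, 1)


# (genomes with DUF71, genomes analyzed): a subset of rows from Table 1.
COUNTS: Dict[str, Tuple[int, int]] = {
    "Crenarchaeota": (37, 37),
    "Korarchaeota": (0, 1),
    "Acidobacteria": (3, 7),
    "Firmicutes": (20, 484),
    "Proteobacteria_Gamma": (27, 406),
}

percentages = {phylum: percent_with_homolog(n_with, n_total)
               for phylum, (n_with, n_total) in COUNTS.items()}

for phylum, pct in percentages.items():
    print(f"{phylum}: {pct}%")
```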
Figure 4. Neighbor-joining phylogenetic tree of the 445 DUF71 homologs identified in public databases. The scale bar represents the average number of substitutions per site. Numbers at nodes are bootstrap values. For clarity, only values greater than 50% are indicated. Colors correspond to the taxonomic affiliation of sequences (see the box on the figure for details). The full tree of Cluster 1 is shown in Additional file 3: Figure S3.
This second group emerged from within the archaeal sequences of the first cluster and showed various contradictions with the currently recognized taxonomy, because bacterial sequences from distantly related lineages appeared intermixed in the tree (Figure 4). These observations, together with the extremely patchy distribution of DUF71 in Bacteria, strongly support the hypothesis that the bacterial DUF71 was of archaeal origin and spread through this Domain mainly by HGT. Interestingly, the second homologs present in a few archaeal genomes emerged from bacterial sequences, suggesting that secondary HGT occurred from Bacteria to Archaea, allowing them to acquire a second DUF71 homolog.

In contrast, a phylogenetic analysis focused on archaeal and eukaryotic sequences strongly supported the separation between these two Domains (posterior probability (PP) = 1). Moreover, it recovered the monophyly of most eukaryotic and archaeal major lineages (most PP > 0.95, Additional file 3: Figure S3), suggesting that DUF71 was present in their ancestors. However, as expected given the small number of amino acid positions analyzed (182 positions), the relationships among these lineages were mainly unresolved (most PP < 0.95), precluding an in-depth analysis of the ancient evolutionary history of DUF71 in Archaea and Eucarya (Additional file 3: Figure S3). Nevertheless, the wide distribution of DUF71 in these two Domains (even in highly derived parasites such as Microsporidia, Cryptosporidium, Entamoeba or Nanoarchaeum equitans, not shown) and its ancestral presence in most of their orders/phyla suggested that this gene was present in the last common ancestor of these two Domains. This inference does not imply, however, that no HGT occurred in these Domains. Indeed, some incongruence between the DUF71 phylogeny and the reference phylogeny of organisms [39] suggested putative cases of HGT. For instance, the Thermofilum pendens DUF71 robustly groups with Methanomicrobia (Euryarchaeota) and not with other Thermoproteales (Additional file 3: Figure S3).

Because diphthamide is a modification specific to the archaeal and eukaryotic EF-2 proteins and bacteria lack all known diphthamide biosynthesis genes, we propose that cluster 1 in our phylogeny corresponds to bona fide Dph6 enzymes involved in diphthamide synthesis (Figure 4). This function therefore very likely represents the ancestral function of the whole DUF71 family. In contrast, bacteria do not synthesize diphthamide, suggesting that the bacterial DUF71 homologs and the few additional archaeal copies (cluster 2, Figure 4) are involved in another function, and thus that a functional shift occurred after the HGT of an archaeal bona fide Dph6 to bacteria. Notably, these genes (including PF0295, the second DUF71 copy found in P. furiosus) are strongly clustered on the chromosome with vitamin B12 salvage genes. More precisely, 75/102 are adjacent to vitamin B12 transporter genes (such as the BtuCDF genes) [40] and 18/102 are adjacent to cbiB genes encoding adenosylcobinamide-phosphate synthetase, an enzyme shared by the de novo and salvage pathways [41] (Figure 5A). These clustering data can be visualized in the “Duf71-B12” subsystem in the SEED database, and two typical clusters are shown in Figure 5B. On this basis, we hypothesize that the archaeal and bacterial DUF71 genes that cluster with vitamin B12 genes have a role in B12 metabolism.
Figure 5. Links between the DUF71 family and B12 salvage. (A) Summary of cobinamide derivative salvage in Bacteria and Archaea; arrows with dotted lines denote multiple steps. (B) Typical examples of physical clustering of DUF71-B12 genes with B12 salvage genes in Archaea and Bacteria. Abbreviations: Pseudo-B12, adenosylpseudocobalamin; Cbi, cobinamide; AdoCbi, adenosylCbi; AdoCbi-P, adenosylCbi-phosphate; AdoCby, adenosylcobyric acid; AP, (R)-1-amino-2-propanol; AP-P, AP-phosphate; Thr-P, L-threonine-phosphate; DMB, 5,6-dimethylbenzimidazole; α-AMP-AP, α-adenylate-AP; CobU, ATP:AdoCbi kinase, GTP:AdoCbi-GDP guanylyltransferase; CobY, NTP:AdoCbi-P nucleotidyltransferase; CobA, ATP:co(I)rrinoid adenosyltransferase; aCbiZ, adenosylcobinamide amidohydrolase; bCbiZ, pseudo-B12 amidohydrolase; CbiB, adenosylcobinamide-phosphate synthetase; CobD, L-threonine phosphate decarboxylase; CobS, cobalamin (5-P) synthase; CobT, 5,6-dimethylbenzimidazole phosphoribosyltransferase; CobC or CobZ, alpha-ribazole-5′-phosphate phosphatase; cobY, adenosylcobinamide-phosphate guanylyltransferase; CbiP, cobyric acid synthase; BtuFCD, cobamide transporter subunits.
Finally, some bacterial DUF71 proteins might also have other functions, because a set of bacteria such as Clostridium perfringens have two or more DUF71 homologs (Figure 4 and Additional file 1: Table S1). The most extreme example is Dehalococcoides sp. CBDB1, which encodes five DUF71 homologs in its genome. In the case of C. perfringens ATCC 13124 and SM101, one homolog (YP_695745 and YP_698440) clusters both physically and phylogenetically (Figure 4 and 5A) with the B12 subgroup proteins, whereas the second homolog (YP_695178 and YP_698039) is related to Acinetobacter baumannii (Cluster 3, Figure 4) and is not found associated with gene clusters related to B12 salvage (data not shown).

Therefore, based on phylogenetic and physical clustering, the DUF71 proteins were split into two subgroups, Dph6 and DUF71-B12, which were annotated as such and captured in the “Diphthamide biosynthesis” and “Duf71-B12” subsystems in the SEED database.
Predicting the function of members of the DUF71-B12 subgroup
As members of the DUF71-B12 subgroup clustered strongly with B12 transport genes and with cbiB (Figure 5B), we focused on the early steps of B12 salvage, which are quite diverse because several forms of cobamides [cobalamin-like or Cbl-like compounds] can be salvaged (Figure 5A). Cobinamide (Cbi) is adenosylated after transport to form adenosylcobinamide (AdoCbi). In most bacteria, AdoCbi is directly phosphorylated by CobU before being transformed after several steps into adenosylcobalamin (AdoCbl or coenzyme B12), in which the lower ligand is 5,6-dimethylbenzimidazole (DMB) (see [42] for review) (Figure 5A). Archaea use a different salvage route in which AdoCbi is converted to adenosylcobyric acid (AdoCby), an intermediate of the de novo pathway, by an amidohydrolase, aCbiZ [43] (Figure 5A). AdoCby is then converted directly to adenosylcobinamide-phosphate (AdoCbi-P) by CbiB. Finally, some bacteria have CbiZ homologs (bCbiZ) that hydrolyze adenosylpseudocobalamin (Ado-Pseudo-B12) [44], which contains an adenine instead of DMB as its lower ligand (Figure 1C and 5A).

In order to gain insight into the possible function of DUF71-B12 family members, we analyzed the co-distribution pattern of CbiZ, CbiB and DUF71-B12 proteins in Archaea and Bacteria. Interestingly, with a few exceptions, all prokaryotic genomes encoding CbiB harbor either CbiZ or DUF71-B12 (Figure 6). However, in Bacteria, there was strict anti-correlation between the DUF71-B12 and the CbiZ families (Figure 6A). This was not the case in Archaea, where quite a few organisms (such as P. furiosus or Methanosarcina mazei Go1) harbored both families (Figure 6B). This distribution profile suggests that members of the DUF71-B12 subfamily fulfil the same role as the bacterial CbiZ enzymes (bCbiZ), either by catalysing the same reaction (cleaving Ado-pseudo-B12 into AdoCby) or by providing another route to salvaging Pseudo-B12. This hypothesis would explain why bacteria would have one or the other while Archaea could carry both (Figure 6B), because archaeal CbiZ proteins have been predicted to lack pseudo-B12 cleavage activity [44].
Figure 6 Distribution of DUF71-B12, CbiZ and CbiB in bacterial (A) and archaeal genomes (B). The trees are species trees constructed in iTOL (itol.embl.de/); the presence and absence of the specific genes were derived from the “DUF71-B12” subsystem in the SEED database.
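The co-distribution reasoning above can be illustrated with a minimal sketch. It assumes a simple presence/absence table keyed by genome; the genome names, gene calls and function names below are hypothetical placeholders and are not data or tools from the SEED “DUF71-B12” subsystem.

```python
from typing import Dict, Set

# Hypothetical presence/absence calls. Genome names and gene content are
# illustrative placeholders, not data from the SEED "DUF71-B12" subsystem.
TOY_BACTERIA: Dict[str, Set[str]] = {
    "bacterium_1": {"CbiB", "CbiZ"},
    "bacterium_2": {"CbiB", "DUF71-B12"},
    "bacterium_3": {"CbiB", "CbiZ"},
    "bacterium_4": {"CbiB"},   # stands in for one of the "few exceptions"
    "bacterium_5": {"CbiZ"},   # lacks CbiB, so it is not counted
}


def codistribution(genomes: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count CbiZ / DUF71-B12 combinations among CbiB-encoding genomes."""
    counts = {"both": 0, "CbiZ_only": 0, "DUF71-B12_only": 0, "neither": 0}
    for genes in genomes.values():
        if "CbiB" not in genes:
            continue  # only CbiB-encoding genomes are considered
        has_cbiz = "CbiZ" in genes
        has_duf71 = "DUF71-B12" in genes
        if has_cbiz and has_duf71:
            counts["both"] += 1
        elif has_cbiz:
            counts["CbiZ_only"] += 1
        elif has_duf71:
            counts["DUF71-B12_only"] += 1
        else:
            counts["neither"] += 1
    return counts


def strictly_anticorrelated(counts: Dict[str, int]) -> bool:
    """True when both families occur but never in the same genome."""
    return (
        counts["both"] == 0
        and counts["CbiZ_only"] > 0
        and counts["DUF71-B12_only"] > 0
    )


if __name__ == "__main__":
    toy_counts = codistribution(TOY_BACTERIA)
    print(toy_counts, strictly_anticorrelated(toy_counts))
```

With real subsystem exports, the same tally could be run separately on bacterial and archaeal genomes to contrast the two patterns shown in Figure 6.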
Detailed analysis of the signature motifs of the two subfamilies reveals that the strictly conserved EGGE/DXE188 motif (P. furiosus PF0828 numbering) in Dph6 proteins is replaced by an ENGEF/YH188 motif in the DUF71-B12 proteins (Additional file 3: Figure S2 and Additional file 3: Figure S4). In the Dph6 family, E188 is located near the predicted diphthine binding site and is predicted to be involved in catalysis (Figure 3B). The replacement of the strictly conserved E188 residue by a histidine residue strongly suggests a change in the reaction catalyzed by the DUF71-B12 subfamily compared to the Dph6 family. The structure-based comparison between the two subfamilies also strongly supports the hypothesis that their substrates are different, because all residues predicted to be involved in EF-2 binding (Figure 3B, see section above) are different in the DUF71-B12 subfamily but mostly conserved within this subfamily (Additional file 3: Figure S2 and residues in green in Figure 3C). Residues that are conserved between the two DUF71 subfamilies (Additional file 3: Figure S2 and residues in blue in Figure 3C) are found around the phosphate groups of ATP, including S12, G13, G14, K15, D16, H48, and T189 (PF0828 sequence numbering), or belong to the C-terminal conserved sequence motif (EGGE/D-X-E188), such as G182, G184, G185, E186 and F187 (Additional file 3: Figure S2 and Figure 3C). Further experimental studies will be required to determine whether DUF71-B12 proteins are Ado-Pseudo-B12 amidohydrolases or have another role in Ado-Pseudo-B12 salvage.
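As an illustration of how the two signature motifs described above could be used to sort sequences, the following sketch encodes EGGE/DXE as the pattern `EGG[ED].E` and ENGEF/YH as `ENGE[FY]H`. The regular expressions are one possible reading of the motif notation, and the example fragments and function name are hypothetical; this is not the procedure used to build the subfamilies in the paper.

```python
import re

# One possible encoding of the motifs quoted in the text:
#   Dph6 subfamily:       E-G-G-(E or D)-X-E
#   DUF71-B12 subfamily:  E-N-G-E-(F or Y)-H
DPH6_MOTIF = re.compile(r"EGG[ED].E")
B12_MOTIF = re.compile(r"ENGE[FY]H")


def classify_by_motif(sequence: str) -> str:
    """Assign a protein sequence to a subfamily from its C-terminal motif."""
    seq = sequence.upper()
    has_dph6 = DPH6_MOTIF.search(seq) is not None
    has_b12 = B12_MOTIF.search(seq) is not None
    if has_dph6 and not has_b12:
        return "Dph6-like"
    if has_b12 and not has_dph6:
        return "DUF71-B12-like"
    if has_dph6 and has_b12:
        return "ambiguous"
    return "unclassified"


if __name__ == "__main__":
    # Hypothetical sequence fragments, not real protein sequences.
    for fragment in ("AAGEGGEFETAA", "AAGENGEYHTAA", "AAAAAA"):
        print(fragment, classify_by_motif(fragment))
```

A motif match alone is only a first filter; in practice the alignment position of the match (residue 188 in PF0828 numbering) would also need to be checked.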
Conclusions
Our detailed analyses of the DUF71 family members presented here provide an example of the power of comparative genomic approaches for solving important “missing genes” or “missing function” cases. These analyses simultaneously illustrate the difficulties inherent in accurately annotating gene families. On one hand, the evidence identifying a candidate for the missing Dph6 gene family derived from genomic evidence (mainly phylogenetic distribution and gene fusions) and post-genomic evidence (structure, co-expression analysis and genome-wide phenotype experiments) is so strong that it could be used as an example where the functional annotation of a protein of unknown function could be derived from comparative genomics alone. On the other hand, our analyses show that a subgroup of the DUF71 family is most certainly involved in a metabolic pathway unrelated to diphthamide synthesis and that transferring functional annotations from homology scores alone would be inappropriate in this case. We believe that this integrated functional annotation approach will play an important role in future pipelines for annotation of protein families.
Competing interests
The author(s) declare that they have no competing interests.
Authors’ contributions
VdC-L conducted the comparative genomic analysis and made the functional predictions. CB-A performed the phylogenetic analysis. FF, LT and JFH did the structural analysis. All authors participated in writing/reviewing the manuscript. All authors read and approved the final manuscript.
Reviewers’ comments
Reviewer number 1: Arcady Mushegian
Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, Missouri 64110
The study by de Crecy-Lagard and co-authors pinpoints the DUF71/COG2102 as the most likely archaeal/eukaryotic ATP-dependent diphthine-ammonia ligase, the so far unaccounted-for enzyme in the pathway of diphthamide biosynthesis, which pathway is responsible for the formation of a unique derivative of the conserved histidine within the translation elongation factor 2. A distinct subfamily of this protein family appears to play another role in bacteria and a subset of archaea, most likely in the salvage of an intermediate of cobalamin biosynthesis. The evidence presented in the paper consists of genome context information, sequence-structure prediction and the data from yeast concerning gene expression and chemical-genomics profiling. Taken together, the evidence seems compelling to me. The data from yeast represent partial functional validation of predictions made for prokaryotes. I would recommend only to tone down the suggestion that all this is a “novel paradigm” in analysis of gene function: researchers have been inferring gene functions from phenotypes, as well as from directly detected changes in genotype, for a long, long time, and the current study is a logical extension of these approaches. What is different in the last 15 years is that we can compare these properties across many species with completely sequenced genomes; but even this is a logical extension of the previous work (compare, for example, with work from the Yanofsky and Jensen labs on biosynthesis of aromatic amino acids): it was not any prescription of a previous scientific paradigm that constrained the work, but rather the lack of the data.
Response: The references to a “novel paradigm” were eliminated in the abstract and the introduction as suggested.
Reviewer number 2: Michael Galperin
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075
The paper by de Crecy-Lagard and colleagues is a fine example of using comparative genomics to patch the remaining holes in the metabolic pathways. The key conclusion of this work, prediction of the participation of the members of the DUF71/COG2102 family in diphthamide biosynthesis in archaea and eukaryotes and in B12 metabolism in some bacteria and archaea, is extremely convincing and hardly even needs an experimental verification. The second conclusion, that ammonia used in the diphthine ammonia lyase-catalyzed reaction in different organisms could be generated by two different enzymes, asparagine synthetase and the RidA domain, also sounds convincing. However, proving beyond reasonable doubt that DUF71/COG2102 family members with their ATP-pyrophosphatase activity comprise the key part of diphthine ammonia lyase does not prove that they are the only subunits of this enzyme. Even if the proposed reaction scheme (Figure 1B) is correct, there still might be a need for a ligase subunit that couples removal of the AMP moiety from EF2 with its amidation. There is a definite possibility that DUF71/COG2102 family members catalyze all these individual reactions, e.g. using their unique C-terminal 100-aa domain, but that would have to be proven experimentally. The reported involvement of the likely scaffold protein YBR246w (DPH7) appears to support the idea that diphthine ammonia lyase consists of more than one type of subunit. Otherwise, it is a great paper that vividly demonstrates the power of comparative-genomics approaches.
Response: We added a phrase stating that “even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required”.
Reviewer number 3: L. Aravind
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075
This work uses contextual information to identify the diphthine-ammonia ligase in archaea and eukaryotes. It also shows that the yeast protein YBR246W is indeed not the correct ligase, but rather the MJ0570-like PP-loop ATPases. The authors also show that this family has been transferred to certain bacteria where they infer that it is likely to have undergone a functional shift to participate in B12 salvage. They cautiously propose that it might function as a replacement for CbiZ to function as an amidohydrolase (the reverse of the typical PP-loop ATPase reaction) as against a ligase. The conclusions are definitive and the article makes a useful contribution to the understanding of protein modification and cofactor biosynthesis. This said, there are certain issues with the current form of the article that authors necessarily need to address in their revision:
1) (pg 8) The authors state that the MJ0570-like enzymes have a HUP domain followed by a distinct C-terminal domain. They do not explain the meaning of this properly nor cite the reference of the paper (PMID: 12012333) pertaining to the HUP domains where this family was identified as a PP-loop ATPase, along with the observations (Table 1 in that reference) that it has a primarily archaeo-eukaryotic phyletic pattern, and that eukaryotic versions might be fused to two C-terminal domains of the YabJ-like chorismate lyase fold (now termed RidA). It should be stated that the N-terminus is a PP-loop ATPase domain of the HUP class of Rossmannoid domains (not all HUP domains are ligases, only the PP-loop and the HIGH nucleotidyltransferases). This clarifies that it is related to other ATP-utilizing amidoligases such as NAD synthetase, GMP synthetase and asparagine synthetase. This would place their inferred amidoligase activity in the context of comparable, known amidoligase activities of related enzymes. In fact it would be advisable to place the fact that these are PP-loop enzymes in the abstract itself.
Response: The following sentence was added: “This family had been previously described as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain (Aravind et al. 2002).” A reference to the PP-loop ATPase family was added in the abstract as requested. A reference to the same work was added when talking about the RidA fusion. For the phylogenetic distribution, the results presented here are a bit different from the previous study because many more genomes are available after 10 years and we show that the family is also bacterial.
2) The authors persistently refer to the domain as DUF71. This name is no longer current in Pfam and it has long been recognized, as mentioned in the reference noted above, that these proteins are not “domains of unknown function” but PP-loop ATPases. The domain is correctly termed ATP_bind_4 (PF01902) in Pfam. This Pfam (not the misleading DUF71) name and Pfam number should be indicated with just a statement in the introduction that it was formerly DUF71.
Response: This domain is currently called “Domain of unknown function DUF71, ATP-binding domain” in the InterPro database (IPR002761) even if it is called ATP_bind_4 (PF01902) in Pfam. It is much shorter to use (as well as easier for the reader to follow) the DUF71 abbreviation rather than the ATP_bind_4 abbreviation. We therefore prefer to keep DUF71. We however introduced a statement giving the different names of this domain in the InterPro, Pfam and COG databases at the end of the introduction.
3) The authors apparently have a misapprehension regarding the Methanohalophilus mahii protein both in the text and the domain architecture rendered in the figure. First, these proteins have two N-terminal domains fused to the MJ0570-like module: namely an N-terminal class-II glutamine amidotransferase (GAT-II, e.g. see PMID: 20023723) and a second PP-loop ATPase domain thereafter (i.e. one related to asparagine synthetase). This GAT domain, as in the case of other PP-loop enzymes, could supply ammonia by cleaving it off glutamine. But this does not explain which PP-loop domain utilizes it. In the case of the Asn-synthetase it is used by the cognate PP-loop domain. In this case the presence of two PP-loop domains suggests that it is either utilized by both for different reactions or else the second domain does not receive the NH3 from this GAT. This also leads to the question what reaction is the Asn synthetase like PP-loop domain catalyzing?
Quality of written English: Acceptable
Response: The source of the confusion came from the fact that the Asn Synthase domain (AsnB) contains two domains, the GAT-II domain and the Asn_Synthase_B_C PP-loop ATPase domain. Both the figure and the text were modified to avoid the confusion. Based on the reviewer’s comments the sentence discussing the potential role of the AsnB domain was modified as follows: “This domain organization strongly suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the fused GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes.”
4) Based on phyletic complementarity the authors suggest that bacterial CbiZ might be displaced by the bacterial MJ0570-like enzymes. This seems unusual. Why utilize a PP-loop ATPase for the reverse reaction, i.e. amidohydrolase? Typically there is little overlap between the families involved in amidohydrolase as opposed to ATP-dependent ligase activity. Of the almost 12 distinct major inventions of amidoligase activity, hardly any representatives of these superfamilies have been reused as amidohydrolases. So do the authors note anything special in the case of the bacterial representatives that might support such a functional shift?
Response: This hypothesis is derived from phylogenetic distribution and it is not unprecedented that ligases and hydrolases are found in the same family (see example in PMID: 12359880). However, we agree that this hypothesis derives mainly from phylogenetic pattern analysis and, beyond the differences in the predicted substrate binding pocket found in the DUF71-B12 family, we did not identify specific changes that could point to a shift to hydrolase, hence our caution in our prediction as stated in the text.
Quality of written English: Acceptable
Acknowledgements
This work was supported by the US National Science Foundation (grant MCB-1153413 to V. dC-L), the US National Institutes of Health (grant U54GM094597 to G.T. Montelione and the Northeast Structural Genomics Consortium) and the Agence Nationale pour la Recherche (grant ANR-10-BINF-01-0127 Ancestrome to C. B-A). We thank Raffael Schaffrath and Mike Stark for sharing unpublished diphthamide-related data and for critical evaluation of manuscript parts. We thank Jorge Escalante-Semerena for sharing his immense knowledge on B12 salvage pathways, Diana Downs for disclosing unpublished results on RidA function, Manal Swairjo for chemical insight, and Andrew Hanson for helpful input on the manuscript.
References
1. Greganova E, Altmann M, Bütikofer P: Unique modifications of translation elongation factors. FEBS J 2011, 278(15):2613-2624.
2. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. Isolation and properties of the novel ribosyl-amino acid and its hydrolysis products. J Biol Chem 1980, 255(22):10717-10720.
3. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. NMR spectra and proposed structures of ribosyl-diphthamide and its hydrolysis products. J Biol Chem 1980, 255(22):10710-10716.
4. Zhang Y, Zhu X, Torelli AT, Lee M, Dzikovski B, Koralewski RM, Wang E, Freed J, Krebs C, Ealick SE, et al: Diphthamide biosynthesis requires an organic radical generated by an iron–sulphur enzyme. Nature 2010, 465(7300):891-896.
5. Zhu X, Dzikovski B, Su X, Torelli AT, Zhang Y, Ealick SE, Freed JH, Lin H: Mechanistic understanding of Pyrococcus horikoshii Dph2, a [4Fe-4S] enzyme required for diphthamide biosynthesis. Mol Biosyst 2011, 7(1):74-81.
6. Lartillot N, Lepage T, Blanquart S: PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 2009, 25(17):2286-2288.
7. McRee DE: XtalView/Xfit—A versatile program for manipulating atomic coordinates and electron density. J Struct Biol 1999, 125(2–3):156-165.
8. Katoh K, Misawa K, Kuma K-I, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002, 30(14):3059-3066.
9. Webb TR, Cross SH, McKie L, Edgar R, Vizor L, Harrison J, Peters J, Jackson IJ: Diphthamide modification of eEF2 requires a J-domain protein and is essential for normal development. J Cell Sci 2008, 121(19):3140-3145.
10. Zhu X, Kim J, Su X, Lin H: Reconstitution of diphthine synthase activity in vitro. Biochemistry 2010, 49(44):9649-9657.
11. Mattheakis LC, Shen WH, Collier RJ: DPH5, a methyltransferase gene required for diphthamide biosynthesis in Saccharomyces cerevisiae. Mol Cell Biol 1992, 12(9):4026-4037.
12. Moehring TJ, Danley DE, Moehring JM: In vitro biosynthesis of diphthamide, studied with mutant Chinese hamster ovary cells resistant to diphtheria toxin. Mol Cell Biol 1984, 4(4):642-650.
13. Su X, Chen W, Lee W, Jiang H, Zhang S, Lin H: YBR246W is required for the third step of diphthamide biosynthesis. J Am Chem Soc 2011, 134(2):773-776.
14. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S: InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 2012, 40(D1):D306-D312.
15. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K: The Pfam protein families database. Nucleic Acids Res 2010, 38(suppl_1):D211-D222.
16. Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D, Mazumder R, Mekhedov S, Nikolskaya A: The COG database: an updated version includes eukaryotes. BMC Bioinforma 2003, 4(1):41.
17. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
18. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23(21):2947-2948.
19. Corpet F: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 1988, 16(22):10881-10890.
20. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 2005, 33(17):5691-5702.
21. Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Anderson I, Lykidis A, Mavromatis K: The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 2010, 38(suppl 1):D382-D390.
22. Alm EJ, Huang KH, Price MN, Koche RP, Keller K, Dubchak IL, Arkin AP: The MicrobesOnline web site for comparative genomics. Genome Res 2005, 15(7):1015-1022.
23. Hibbs MA, Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG: Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 2007, 23(20):2692-2699.
24. Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, Christie KR, Costanzo MC, Dwight SS, Engel SR: Saccharomyces genome database: the genomics resource of budding yeast. Nucleic Acids Res 2012, 40(D1):D700-D705.
25. Hillenmeyer M, Ericson E, Davis R, Nislow C, Koller D, Giaever G: Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol 2010, 11(3):R30.
26. Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St.Onge RP, Tyers M, Koller D: The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 2008, 320(5874):362-365.
27. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 2007, 23(1):127-128.
28. Crooks GE, Hon G, Chandonia J-M, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14(6):1188-1190.
29. Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2012, 40(D1):D130-D135.
30. Philippe H: MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res 1993, 21(22):5264-5272.
31. Gouy M, Guindon S, Gascuel O: SeaView Version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010, 27(2):221-224.
32. Aravind L, Anantharaman V, Koonin EV: Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world. Proteins: Structure, Funct Bioinform 2002, 48(1):1-14.
33. Forouhar F, Saadat N, Hussain M, Seetharaman J, Lee I, Janjua H, Xiao R, Shastry R, Acton TB, Montelione GT: A large conformational change in the putative ATP pyrophosphatase PF0828 induced by ATP binding. Acta Crystallogr Sect F Struct Biol Cryst Commun 2011, 67(11):1323-1327.
34. Botet J, Rodriguez-Mateos M, Ballesta JPG, Revuelta JL, Remacha M: A chemical genomic screen in Saccharomyces cerevisiae reveals a role for diphthamidation of translation Elongation Factor 2 in inhibition of protein synthesis by Sordarin. Antimicrob Agents Chemother 2008, 52(5):1623-1629.
35. Bär C, Zabel R, Liu S, Stark MJR, Schaffrath R: A versatile partner of eukaryotic protein complexes that is involved in multiple biological processes: Kti11/Dph3. Mol Microbiol 2008, 69(5):1221-1233.
36. Jorgensen R, Wang Y, Visschedyk D, Merrill AR: The nature and character of the transition state for the ADP-ribosyltransferase reaction. EMBO Rep 2008, 9(8):802-809.
37. Iyer LM, Abhiman S, Maxwell Burroughs A, Aravind L: Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteins. Mol BioSys 2009, 5(12):1636-1660.
38. Lambrecht JA, Flynn JM, Downs DM: Conserved YjgF protein family deaminates reactive enamine/imine intermediates of Pyridoxal 5′-Phosphate (PLP)-dependent enzyme reactions. J Biol Chem 2012, 287(5):3454-3461.
39. Brochier-Armanet C, Forterre P, Gribaldo S: Phylogeny and evolution of the Archaea: one hundred genomes later. Curr Opin Microbiol 2011, 14(3):274-281.
40. Borths EL, Poolman B, Hvorup RN, Locher KP, Rees DC: In vitro functional characterization of BtuCD-F, the Escherichia coli ABC transporter for vitamin B12 uptake. Biochemistry 2005, 44(49):16301-16309.
41. Zayas CL, Claas K, Escalante-Semerena JC: The CbiB protein of Salmonella enterica is an integral membrane protein involved in the last step of the de novo corrin ring biosynthetic pathway. J Bacteriol 2007, 189(21):7697-7708.
42. Escalante-Semerena JC: Conversion of cobinamide into adenosylcobamide in bacteria and archaea. J Bacteriol 2007, 189(13):4555-4560.
43. Woodson JD, Escalante-Semerena JC: CbiZ, an amidohydrolase enzyme required for salvaging the coenzyme B12 precursor cobinamide in archaea. Proc Natl Acad Sci USA 2004, 101(10):3591-3596.
44. Gray MJ, Escalante-Semerena JC: The cobinamide amidohydrolase (cobyric acid-forming) CbiZ enzyme: a critical activity of the cobamide remodelling system of Rhodobacter sphaeroides. Mol Microbiol 2009, 74(5):1198-1210.



RESEARCH Open Access

Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage

Valérie de Crécy-Lagard1*, Farhad Forouhar2, Céline Brochier-Armanet3, Liang Tong2 and John F Hunt2

*Correspondence: vcrecy@ufl.edu
1 Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
Full list of author information is available at the end of the article

© 2012 de Crécy-Lagard et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
de Crécy-Lagard et al. Biology Direct 2012, 7:32 http://www.biology-direct.com/content/7/1/32

Abstract
Background: The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the biological function discovery process. Classically, one used to link gene with function by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusion, co-expression profiles, structural information and other genomic or post-genomic derived associations can be now used to make very strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.
Results: The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing diphthine-ammonia ligase (EC 6.3.1.14), Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins. Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different function because the diphthamide modification is absent in this Domain of Life. In-depth investigations suggest that some archaeal and bacterial DUF71 proteins participate in B12 salvage.
Conclusions: This detailed analysis of the DUF71 family members provides an example of the power of integrated data-mining for solving important “missing genes” or “missing function” cases and illustrates the danger of functional annotation of protein families by homology alone.
Reviewers’ names: This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.
Keywords: Diphthamide, Vitamin B12, Amidotransferase, Comparative genomics

Background
In both Archaea and Eucarya, the translation Elongation Factor 2 (EF-2) harbors a complex post-translational modification of a strictly conserved histidine (His699 in yeast) called diphthamide [1]. This modification is the target of the diphtheria toxin and the Pseudomonas exotoxin A, which inactivate EF-2 by ADP-ribosylation of the diphthamide [2,3]. Although the diphthamide biosynthesis pathway was described in the early 1980s [2,3], the corresponding enzymes have only recently been characterized. In vitro reconstitution experiments have shown that the first step, the transfer of a 3-amino-3-carboxypropyl (ACP) group from S-adenosylmethionine (SAM) to the C-2 position of the imidazole ring of the target histidine residue, is catalyzed in Archaea by the iron-sulfur-cluster enzyme, Dph2 [4,5] (Figure 1A). Genetic and complementation studies have shown that the catalysis of the same first step requires four proteins (Dph1-Dph4) in yeast and other eukaryotes [6-9]. The subsequent step, trimethylation of an amino group to form the diphthine intermediate, is catalyzed by diphthine synthase, Dph5 (EC 2.1.1.98) (Figure 1A) [10,11]. The last step, the ATP-dependent amidation of the carboxylate group [12], is catalyzed by diphthine-ammonia ligase (EC 6.3.1.14), but the corresponding gene has not been identified (http://www.orenza.u-psud.fr/). A protein involved in this last step was recently identified in yeast (YBR246W or Dph7), but it is most certainly not directly involved in catalysis as it is not conserved in Archaea and it contains a WD-domain likely to be involved in protein/protein interactions [13].
Using a combination of comparative genomic approaches, we set out to identify a candidate gene for this orphan enzyme family. Based on taxonomic distribution, domain organization of gene fusions, physical clustering on chromosomes, atomic structural data, co-expression, and phenotype data, a promising candidate was identified, the family called Domain of Unknown Function family DUF71 (IPR002761) in InterPro [14]. This family is also called ATP_bind_4 (PF01902) in Pfam [15] or Predicted ATPases of PP-loop superfamily (COG2102) in the Cluster of Orthologous Groups database [16]. However, detailed analysis of the DUF71 family revealed that this family is almost surely not isofunctional. Some Archaea contain two very divergent copies of the gene, while homologs are found in Bacteria, which are known to lack diphthamide. This observation suggests that some DUF71 members have different functions and probably participate in different biochemical pathways.

Figure 1 Structures of diphthamide and B12 precursors and derivatives. (A) The core diphthamide pathway is predicted to contain three enzymes, Dph2, Dph5 and Dph6, in Archaea. The formation of diphthine has been reconstituted in vitro using Dph2 and Dph5 from Pyrococcus horikoshii [4,5]. The enzyme family catalyzing the last step in Archaea and Eukarya, Dph6, was missing. In yeast, the first and last steps require additional proteins (Dph1, Dph3 and Dph7). (B) Predicted Dph6-catalyzed reactions. (C) Ado-Pseudo-B12 structure and hydrolysis site by the bacterial CbiZ enzyme (bCbiZ). Parts (A) and (B) are adapted with permission from Xuling Zhu; Jungwoo Kim; Xiaoyang Su; Hening Lin; Biochemistry 2010, 49, 9649-9657. Copyright 2010 American Chemical Society.

Methods
Comparative genomics
The BLAST tools [17] and resources at NCBI (http://www.ncbi.nlm.nih.gov/) were routinely used. Multiple sequence alignments were built using ClustalW [18] or Multalin [19]. Protein domain analysis was performed using the Pfam database tools (http://pfam.janelia.org/) [15]. Analysis of the phylogenetic distribution and physical clustering was performed in the SEED database [20]. Results are available in the “Diphthamide biosynthesis” and “DUF71-B12” subsystems on the public SEED server (http://pubseed.theseed.org/SubsysEditor.cgi). Phylogenetic profile searches were performed on the IMG platform [21] using the phylogenetic query tool (http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm). Physical clustering was analyzed with the SEED subsystem coloring tool or the Seedviewer Compare region tool [20] as well as on the MicrobesOnline (http://www.microbesonline.org/) tree-based genome browser [22]. The SPELL microarray analysis resource [23] was used through the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) [24] to analyze yeast gene coexpression profiles. Clustering of yeast deletion mutants based on phenotype analysis was analyzed through the yeast fitness database available at http://fitdb.stanford.edu/ [25,26]. Mappings of gene distribution profiles onto taxonomic trees were generated using the iTOL suite (http://itol.embl.de/index.shtml) [27]. Sequence logos were derived using the WebLogo platform [28].

Structure analysis
Visualization and comparison of protein structures and manual docking of ligand molecules were performed using PyMol (The PyMOL Molecular Graphics System, Version 1.4.1, Schrödinger, LLC). XtalView [7] was used for the protein docking exercises.

Phylogenetic analyses
The survey of the 1996 complete prokaryotic genomes available at the NCBI (http://www.ncbi.nlm.nih.gov/) using BLASTP [17] (default parameters) allowed identification of 119 bacterial and 144 archaeal DUF71 homologs in addition to the 182 eukaryotic homologs identified in the RefSeq database at the NCBI [29] (Additional file 1: Table S1). The retrieved sequences were aligned using MAFFT [8] and the resulting alignment was visually inspected using ED, the alignment editor of the MUST package [30]. The phylogenetic analysis of the 445 sequences was performed using the neighbor-joining distance method implemented in SeaView [31]. The robustness of the resulting tree was assessed by the non-parametric bootstrap method (100 replicates of the original dataset) implemented in SeaView. A second phylogenetic analysis restricted to 50 archaeal and eukaryotic homologs representative of the genetic and genomic diversity of these two Domains was performed using the Bayesian approach implemented in Phylobayes [6] with an LG model.

Results and discussion
Comparative genomics points to DUF71/COG2102 as a strong candidate for the missing diphthamide synthase family
The distribution of known diphthamide biosynthesis genes in Archaea was analyzed using the SEED database and its tools [20]. The 59 archaeal genomes analyzed all contained an EF-2 encoding gene. Analysis of the distribution of Dph2 and Dph5 in Archaea showed that 58/59 genomes encoded these two proteins. The only archaeon lacking both Dph2 and Dph5 was Korarchaeum cryptofilum OPF8 (Figure 2A). We therefore hypothesized that this organism has lost the diphthamide modification pathway even if the K. cryptofilum EF-2 still harbors the conserved His residue at the site of the modification (His603 in the K. cryptofilum sequence, Accession B1L7Q0 in UniprotKB). Using the IMG/JGI phylogenetic query tools [21], we searched for protein families found in all Archaea except Korarchaeum cryptofilum OPF8, present in Saccharomyces cerevisiae and Homo sapiens but absent in Escherichia coli and Bacillus subtilis, as bacteria are known to lack this modification pathway. Only one family, DUF71/COG2102, followed this taxonomic distribution. This family had been described previously as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain [32].
Using the neighborhood analysis tool of the SEED database [20], physical clustering was generally not observed between the dph2, dph5 and DUF71 genes except in three Methanosarcina genomes where dph5 is located in the vicinity of DUF71 genes (Figure 2B). If members of the DUF71 family catalyze the last step of diphthamide synthesis they should bind ATP [12]. Structural analysis of the DUF71 protein from Pyrococcus furiosus (PF0828) reveals the presence of two distinct domains: an N-terminal HUP domain that contains a highly conserved PP-motif that interacts with ATP (PDB id: 3RK1) and AMP (PDB id: 3RK0), and a C-terminal 100-residue domain belonging to a novel fold with a highly conserved motif GEGGEF/YE188T/S (P. furiosus numbering) that is probably involved in substrate binding and recognition [33].
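The phylogenetic profile query described in this section (families present in all Archaea except K. cryptofilum, present in yeast and human, absent in E. coli and B. subtilis) amounts to a set filter. The sketch below is only an illustration under stated assumptions: the family names, genome labels and the function `families_matching_profile` are hypothetical and do not reproduce the IMG/JGI query tool or its data.

```python
from typing import Dict, List, Set

# Hypothetical family-to-genome table; all names are placeholders.
TOY_FAMILIES: Dict[str, Set[str]] = {
    "family_1": {"archaeon_A", "archaeon_B", "S_cerevisiae", "H_sapiens"},
    "family_2": {"archaeon_A", "archaeon_B", "K_cryptofilum",
                 "S_cerevisiae", "H_sapiens", "E_coli", "B_subtilis"},
    "family_3": {"archaeon_A", "S_cerevisiae"},
}

# Genomes that must encode the family (all Archaea except K. cryptofilum,
# plus the two eukaryotes) and genomes that must lack it.
MUST_HAVE: Set[str] = {"archaeon_A", "archaeon_B", "S_cerevisiae", "H_sapiens"}
MUST_LACK: Set[str] = {"K_cryptofilum", "E_coli", "B_subtilis"}


def families_matching_profile(
    families: Dict[str, Set[str]],
    must_have: Set[str],
    must_lack: Set[str],
) -> List[str]:
    """Return families found in every required genome and in no excluded one."""
    return sorted(
        name
        for name, genomes in families.items()
        if must_have <= genomes and not (must_lack & genomes)
    )


if __name__ == "__main__":
    print(families_matching_profile(TOY_FAMILIES, MUST_HAVE, MUST_LACK))
```

In this toy table only `family_1` satisfies both constraints, mirroring the way a single family can emerge from a strict presence/absence profile.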

Coexpression, phenotype and structural data link the yeast DUF71 to translation and diphthamide biosynthesis

YLR143w is the only S. cerevisiae DUF71 family member. A query of the SPELL coexpression tool [23] with YLR143w as input showed that nearly all co-expressed genes were involved in translation and ribosome biogenesis (Additional file 2: Table S2). This observation suggested that the DUF71 protein family has a role in translation, as expected for a protein modifying EF-2. Like all known diphthamide synthesis genes, YLR143w is also not essential. More specifically, deletion of any of the five known diphthamide genes confers sordarin resistance in yeast [34,35], and a ylr143w deletion strain was shown to be as resistant to this compound as the diphthamide-deficient strains (see supplemental data in [34]). Furthermore, in a recent complete analysis of relationships between gene fitness profiles (co-fitness) and drug inhibition profiles (co-inhibition) from several hundred chemogenomic screens in yeast [25,26], available at http://fitdb.stanford.edu/, it was found that among the top ten interactors with YLR143w by homozygous co-sensitivity are DPH5, DPH2, DPH4 (or JJJ3) and the newly identified DPH7 (or YBR246w) (Additional file 3: Figure S1). Both the coexpression and phenotype data thereby strongly support the hypothesis that YLR143w catalyzes the missing last step of diphthamide biosynthesis, even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required.

Finally, comparison of ATP- and AMP-containing structures of PF0828 reveals that the active site of the former has a narrow groove at the end of which only the α-phosphate of ATP is exposed to the solvent, whereas the active site of the latter is wide open (Figure 3A and B). Also, there is a sharp turn at the α-phosphate of ATP, suggesting that it is the site of the nucleophilic attack. We therefore performed a docking exercise using the EF-2 structure (PDB id: 3B82) [36] with the ATP-containing structure of PF0828. The docking revealed that the active site groove of the ATP-containing structure can easily accommodate diphthine with a few minor clashes between the two structures (Figure 3A and B). The modeling also showed that the carboxyl group of diphthine resides near the α-phosphate of ATP and the carboxylate group of residue Glu188, suggesting that nucleophilic attack by diphthine on the α-phosphate of ATP is highly feasible (Figure 3B). As shown in Figure 3B, the modeling also shows that several residues besides E188 that are highly conserved among archaeal and eukaryotic PF0828 and YLR143w orthologs, including S44, Y45, E78, Y103, Q104, A149, E183 and E186 (Additional file 3: Figure S2), are at the interface of the modeled complex of PF0828 with EF-2, supporting the hypothesis that they play important roles in EF-2 recognition (Figure 3B).

Figure 2 Comparative genomic analysis of the DUF71 family. (A) Distribution of the core diphthamide genes Dph2 and Dph5 and of EF-2 and DUF71 in Archaea, according to data derived from the "Diphthamide biosynthesis" subsystem in the SEED database. The tree is a species tree constructed in iTOL (itol.embl.de/). The presence and absence of the specific genes were derived from the "Diphthamide biosynthesis" subsystem. (B) Physical clustering of DUF71/COG2102 genes with Dph5 in three Methanosarcina genomes derived from the MicrobesOnline database (http://www.microbesonline.org/). (C) Examples of proteins containing domains fused to DUF71 in Archaea and Eucarya. Accession numbers and COG, CDD, or Pfam domain numbers are given in parentheses.
[Figure image not available. Recoverable labels: panel B, Methanosarcina mazei Goe1, Methanosarcina acetivorans C2A and Methanosarcina barkeri str. fusaro; panel C, Methanohalophilus mahii (YP_003542089.1), Methanosalsum zhilinae (YP_004615790) and Candidatus Nanosalina sp. J07AB43 (EGQ42883.1), with AsnB (COG0367), Gat-II (cd00352), Asn_Synthase_B_C (cd01991) and Duf71 (PF01902) domains; S. cerevisiae YLR143W and Arabidopsis thaliana AT3g04480, with Duf71 (PF01902) and two RidA (PF01042) domains.]

Linking DUF71 family members to ammonia transfer reactions

The diphthine ammonia ligase reaction requires a source of NH3 [12]. Domain fusions involving members of the DUF71 family in the Pfam database [15] suggest that the source of NH3 might vary depending on the organism. For example, in a few Archaea (e.g. Methanohalophilus mahii DSM 5219, Methanosalsum zhilinae DSM 4017 or Candidatus Nanosalinarum sp. J07AB56), a COG0367/AsnB asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4) domain is found at the N-terminus of the DUF71 domain (Figure 2C). This AsnB domain can be further separated into two subdomains, an N-terminal class-II glutamine amidotransferase domain (GAT-II) [37] and an Asn_Synthase_B_C PP-loop ATPase domain (Figure 2C).
This domain organization suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes. On the other hand, in many eukaryotes such as yeast and Arabidopsis thaliana, two YjgF-YER057c-UK114-like domains are fused to the C-terminus of the DUF71 protein, as previously noted by Aravind et al. [32] (Figure 2C). The stand-alone members of the YjgF-YER057c-UK114 family, now called the RidA family (for reactive intermediate/imine deaminase A), have been shown to deaminate products generated by PLP-dependent enzymes, which results in the release of NH3 [38]. The RidA domains fused to DUF71 could therefore be involved in providing the NH3 moiety for diphthamide synthesis.

Figure 3 Structural analysis of the DUF71 (PF0828) putative active site. (A) Docking of modified EF-2 (cyan, PDB id: 3B82) onto ATP-bound structure of PF0828 (yellow, PDB id: 3RK1). ATP and several residues of PF0828 (DUF71), which are conserved among archaeal and eukaryotic orthologs, and diphthine of EF-2 (see text for details) are shown in stick models. (B) Close-up stereo pair of panel A. Diphthine of EF-2 and the side chains of conserved residues of PF0828, at the interface of PF0828 and EF-2, are shown in stick models and labeled. (C) Stereo pair view of ATP-binding region of PF0828. Residues that are conserved among Dph6 and DUF71-B12 families are depicted in stick models with carbon atoms in cyan, while the residues that are specific to Dph6 family are shown in stick models with carbon atoms in green. Oxygen and nitrogen atoms are shown in red and blue in all stick models, respectively.

The DUF71 family is not monofunctional

The taxonomic distribution of DUF71 homologs in available complete genomes confirmed that DUF71 is present in one or occasionally two copies in all Archaea except the korarchaeon K. cryptofilum (Table 1 and Additional file 1: Table S1). This pattern is consistent with an ancient origin of the DUF71 gene in Archaea. In sharp contrast, DUF71 is sporadically distributed in Bacteria, being present only in a few representatives of some phyla (Table 1 and Additional file 1: Table S1). This pattern fits either with an ancient origin of DUF71 in Bacteria followed by numerous losses or, conversely, with a more recent acquisition followed by horizontal gene transfer (HGT) among bacterial lineages. To further investigate the evolutionary history of DUF71, we made a phylogenetic analysis of the homologs identified in the three Domains of Life. The resulting tree showed two divergent groups of sequences. The first group contains the eukaryotic and nearly all archaeal sequences (including the predicted yeast DPH6 (YLR143w) and P. furiosus PF0828), whereas the second encompasses all the bacterial sequences as well as the second copy found in a few archaeal genomes (Figure 4 and Additional file 3: Figure S3).

This second group emerged from within the archaeal sequences of the first cluster and showed various contradictions with the currently recognized taxonomy, because bacterial sequences from distantly related lineages appeared intermixed in the tree (Figure 4). These observations, together with the extremely patchy distribution of DUF71 in bacteria, strongly support the hypothesis that the bacterial DUF71 was of archaeal origin and spread through this domain mainly by HGT. Interestingly, the second homologs present in a few archaeal genomes emerged from bacterial sequences, suggesting that secondary HGT occurred from Bacteria to Archaea, allowing them to acquire a second DUF71 homolog.
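The taxonomic distribution discussed above (and tabulated in Table 1) counts, for each phylum, the genomes that encode at least one DUF71 homolog. A minimal sketch of that tally follows; the record format, the function name and the toy genomes are assumptions made for illustration and are not the genomes surveyed in the paper.

```python
from collections import defaultdict


def phylum_distribution(records):
    """Summarize homolog presence per phylum.

    records: iterable of (genome_name, phylum, n_homologs) tuples.
    Returns {phylum: (genomes_with_homolog, total_genomes, percent)}.
    """
    total = defaultdict(int)
    with_homolog = defaultdict(int)
    for _genome, phylum, n_homologs in records:
        total[phylum] += 1
        if n_homologs >= 1:  # a genome counts once, however many copies it carries
            with_homolog[phylum] += 1
    return {
        phylum: (with_homolog[phylum], n, round(100.0 * with_homolog[phylum] / n, 1))
        for phylum, n in total.items()
    }


# Hypothetical toy records (genome names and copy numbers are invented).
TOY_RECORDS = [
    ("archaeon_1", "Euryarchaeota", 1),
    ("archaeon_2", "Euryarchaeota", 2),   # two copies, still one genome
    ("archaeon_3", "Korarchaeota", 0),
    ("bacterium_1", "Firmicutes", 0),
    ("bacterium_2", "Firmicutes", 5),
    ("bacterium_3", "Firmicutes", 0),
    ("bacterium_4", "Firmicutes", 0),
]

if __name__ == "__main__":
    for phylum, (hits, n, pct) in sorted(phylum_distribution(TOY_RECORDS).items()):
        print(f"{phylum}: {hits}/{n} ({pct}%)")
```

Counting genomes rather than homologs is what keeps a multi-copy genome from inflating the per-phylum fraction.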
In contrast, a phylogenetic analysis focused on archaeal and eukaryotic sequences strongly supported the separation between these two Domains (posterior probabilities (PP) = 1). Moreover, it recovered the monophyly of most eukaryotic and archaeal major lineages (most PP > 0.95, Additional file 3: Figure S3), suggesting that DUF71 was present in their ancestors. However, as expected given the small number of amino acid positions analyzed (182 positions), the relationships among these lineages were mainly unresolved (most PP < 0.95), precluding the in-depth analysis of the ancient evolutionary history of DUF71 in Archaea and Eucarya (Additional file 3: Figure S3). Nevertheless, the wide distribution of DUF71 in these two Domains (even in highly derived parasites such as Microsporidia, Cryptosporidium, Entamoeba or Nanoarchaeum equitans, not shown) and its ancestral presence in most of their orders/phyla suggested that this gene was present in the last common ancestor of these two Domains. This inference does not imply, however, that no HGT occurred in these Domains. Indeed, some incongruence between the DUF71 phylogeny and the reference phylogeny of organisms [39] suggested putative cases of HGT. For instance, it was observed for the Thermofilum pendens DUF71, which robustly groups with Methanomicrobia (Euryarchaeota) and not with other Thermoproteales (Additional file 3: Figure S3).

Table 1 Taxonomic distribution of DUF71 homologs in archaeal and bacterial genomes

Phylum                     Nb (%) genomes
Archaea
  Crenarchaeota            37/37 (100%)
  Euryarchaeota            79/79 (100%)
  Korarchaeota             0/1 (0%)
  Thaumarchaeota           2/2 (100%)
Bacteria
  Acidobacteria            3/7 (42.9%)
  Actinobacteria           1/206 (0.5%)
  Aquificae                0/10 (0%)
  Bacteroidetes            20/73 (27.4%)
  Caldiserica              0/1 (0%)
  Chlorobi                 0/11 (0%)
  Chloroflexi              5/16 (31.3%)
  Chrysiogenetes           0/1 (0%)
  Cyanobacteria            0/45 (0%)
  Deferribacteres          0/4 (0%)
  Deinococcus-Thermus      2/17 (11.8%)
  Dictyoglomi              0/2 (0%)
  Elusimicrobia            0/2 (0%)
  Fibrobacteres            0/2 (0%)
  Firmicutes               20/484 (4.1%)
  Fusobacteria             0/5 (0%)
  Gemmatimonadetes         0/1 (0%)
  Ignavibacteria           0/1 (0%)
  Nitrospirae              1/3 (33.3%)
  Proteobacteria_Alpha     2/204 (1%)
  Proteobacteria_Beta      8/119 (6.7%)
  Proteobacteria_Delta     1/48 (2.1%)
  Proteobacteria_Epsilon   0/64 (0%)
  Proteobacteria_Gamma     27/406 (6.7%)
  PVC_Chlamydiae           1/73 (1.4%)
  PVC_Planctomycetes       3/6 (50%)
  PVC_Verrucomicrobia      0/4 (0%)
  Spirochaetes             1/45 (2.2%)
  Synergistetes            0/4 (0%)
  Thermodesulfobacteria    0/2 (0%)
  Thermotogae              5/14 (35.7%)

The number of genomes per phylum containing at least one homolog of DUF71 is indicated.

Because diphthamide is a modification specific to the archaeal and eukaryotic EF-2 proteins and bacteria lack all known diphthamide biosynthesis genes, we propose that cluster 1 in our phylogeny corresponds to bona fide Dph6 enzymes involved in diphthamide synthesis (Figure 4). This function therefore very likely represents the ancestral function of the whole DUF71 family. In contrast, bacteria do not synthesize diphthamide, suggesting that the bacterial DUF71 homologs and the few additional archaeal copies (cluster 2, Figure 4) are involved in another function, and thus that a functional shift occurred after the HGT of an archaeal bona fide Dph6 to bacteria. Notably, these genes (including PF0295, the second DUF71 copy found in P. furiosus) are strongly clustered on the chromosome with vitamin B12 salvage genes. More precisely, 75/102 are adjacent to vitamin B12 transporter genes (such as the BtuCDF genes) [40] and 18/102 are adjacent to cbiB genes encoding adenosylcobinamide-phosphate synthetase, an enzyme shared by the de novo and salvage pathways [41] (Figure 5A). This clustering data can be visualized in the "Duf71-B12" subsystem in the SEED database, and two typical clusters are shown in Figure 5B. On this basis, we hypothesize that the archaeal and bacterial DUF71 genes that cluster with B12 vitamin genes have a role in B12 metabolism.

Figure 4 Neighbor-joining phylogenetic tree of the 445 DUF71 homologs identified in public databases. The scale bar represents the average number of substitutions per site. Numbers at nodes are bootstrap values. For clarity only values greater than 50% are indicated. Colors correspond to the taxonomic affiliation of sequences (see the box on the figure for details). The full tree of Cluster 1 is shown in Additional file 3: Figure S3.
[Tree image not available. Recoverable labels: scale bar 0.1; taxonomic color key Eucarya, Archaea, Actinobacteria, Acidobacteria, Bacteroidetes, Chloroflexi, Deinococcus_Thermus, Firmicutes, Nitrospirae, Proteobacteria_Alpha, Proteobacteria_Beta, Proteobacteria_Delta, Proteobacteria_Gamma, PVC_Planctomycetes, PVC_Chlamydiae, Spirochaetes, Thermotogae; groups marked Cluster 1/Dph6, Cluster 2/Duf71-B12 and Cluster 3.]

Finally, some bacterial DUF71 proteins might also have other functions, because a set of bacteria such as Clostridium perfringens have two or more DUF71 homologs (Figure 4 and Additional file 1: Table S1). The most extreme example is Dehalococcoides sp. CBDB1, which encodes five DUF71 homologs in its genome. In the case of C. perfringens ATCC 13124 and SM101, one homolog (YP_695745 and YP_698440) clusters both physically and phylogenetically (Figure 4 and 5A) with the B12 subgroup proteins, whereas the second homolog (YP_695178 and YP_698039) is related to Acinetobacter baumannii (Cluster 3, Figure 4) and is not found associated with gene clusters related to B12 salvage (data not shown).

Therefore, based on phylogenetic and physical clustering, the DUF71 proteins were split into the Dph6 and the DUF71-B12 subgroups, which were annotated as such and captured in the "Diphthamide biosynthesis" and "Duf71-B12" subsystems in the SEED database.

Figure 5 Links between the DUF71 family and B12 salvage. (A) Summary of cobinamide derivative salvage in Bacteria and Archaea; arrows with dotted lines denote multiple steps. (B) Typical examples of physical clustering of DUF71-B12 genes with B12 salvage genes in Archaea and Bacteria. Abbreviations: Pseudo-B12, adenosylpseudocobalamin; Cbi, cobinamide; AdoCbi, adenosyl Cbi; AdoCbi-P, adenosyl Cbi-phosphate; AdoCby, adenosylcobyric acid; AP, (R)-1-amino-2-propanol; AP-P, AP-phosphate; Thr-P, L-threonine-phosphate; DMB, 5,6-dimethylbenzimidazole; AMP-AP, adenylate-AP; CobU, ATP:AdoCbi kinase, GTP:AdoCbi-GDP guanylyltransferase; CobY, NTP:AdoCbi-P nucleotidyltransferase; CobA, ATP:co(I)rrinoid adenosyltransferase; aCbiZ, adenosylcobinamide amidohydrolase; bCbiZ, pseudo-B12 amidohydrolase; CbiB, adenosylcobinamide-phosphate synthetase; CobD, L-threonine phosphate decarboxylase; CobS, cobalamin (5'-P) synthase; CobT, 5,6-dimethylbenzimidazole phosphoribosyltransferase; CobC or CobZ, alpha-ribazole-5'-phosphate phosphatase; CobY, adenosylcobinamide-phosphate guanylyltransferase; CbiP, cobyric acid synthase; BtuFCD, cobamide transporter subunits.
[Figure image not available. Recoverable gene-cluster labels: Pyrobaculum calidifontis, cobY cobA cbiB DUF71-B12 cobD cobS cobT aCbiZ/CobZ; Clostridium perfringens, cobU cobC cbiB DUF71-B12 btuD btuF cobT CbiP btuC cobD cobS.]

Predicting the function of members of the DUF71-B12 subgroup

As members of the DUF71-B12 subgroup clustered strongly with B12 transport genes and with cbiB (Figure 5B), we focused on the early steps of B12 salvage, which are quite diverse because several forms of cobamides [cobalamin-like or Cbl-like compounds] can be salvaged (Figure 5A). Cobinamide (Cbi) is adenosylated after transport to form adenosylcobinamide (AdoCbi). In most bacteria, AdoCbi is directly phosphorylated by CobU before being transformed after several steps into adenosylcobalamin (AdoCbl or coenzyme B12), in which the lower ligand is 5,6-dimethylbenzimidazole (DMB) (see [42] for review) (Figure 5A). Archaea use a different salvage route in which AdoCbi is converted to adenosylcobyric acid (AdoCby), an intermediate of the de novo pathway, by an amidohydrolase, aCbiZ [43] (Figure 5A). AdoCby is then converted directly to adenosylcobinamide-phosphate (AdoCbi-P) by CbiB. Finally, some bacteria have CbiZ homologs (bCbiZ) that hydrolyze adenosylpseudocobalamin (Ado-Pseudo-B12) [44], which contains an adenine instead of DMB as its lower ligand (Figure 1C and 5A).
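The early salvage steps just described can be written down as a small directed graph, which makes the bacterial and archaeal routes to AdoCbi-P explicit. This is a hedged sketch only: the dictionary layout and function name are invented here, the CobA label on the Cbi to AdoCbi step is taken from the Figure 5 abbreviations rather than from the sentence itself, and AdoCby as the product of bCbiZ follows the statement elsewhere in this section that Ado-pseudo-B12 is cleaved into AdoCby.

```python
# Substrate -> list of (enzyme, product). Bacterial and archaeal steps are merged
# into one graph for illustration; this is not a complete B12 salvage pathway.
SALVAGE_STEPS = {
    "Cbi": [("CobA", "AdoCbi")],                       # adenosylation after transport
    "AdoCbi": [("CobU", "AdoCbi-P"),                   # direct route in most bacteria
               ("aCbiZ", "AdoCby")],                   # archaeal amidohydrolase route
    "AdoCby": [("CbiB", "AdoCbi-P")],
    "Ado-pseudo-B12": [("bCbiZ", "AdoCby")],           # some bacteria
}


def enzyme_routes(start, goal, steps=SALVAGE_STEPS):
    """List every enzyme sequence leading from `start` to `goal`.

    Plain depth-first enumeration; it assumes the step graph has no cycles.
    """
    if start == goal:
        return [[]]
    routes = []
    for enzyme, product in steps.get(start, []):
        for tail in enzyme_routes(product, goal, steps):
            routes.append([enzyme] + tail)
    return routes


if __name__ == "__main__":
    for substrate in ("AdoCbi", "Ado-pseudo-B12"):
        print(substrate, "->", enzyme_routes(substrate, "AdoCbi-P"))
```

Both the aCbiZ and bCbiZ routes converge on AdoCby and therefore depend on CbiB, which is consistent with the interest in genes that cluster with cbiB.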
In order to gain insight into the possible function of DUF71-B12 family members, we analyzed the co-distribution pattern of CbiZ, CbiB and DUF71-B12 proteins in Archaea and Bacteria. Interestingly, with a few exceptions, all prokaryotic genomes encoding CbiB harbor either CbiZ or DUF71-B12 (Figure 6). However, in bacteria, there was strict anti-correlation between the DUF71-B12 and the CbiZ families (Figure 6A). This was not the case in Archaea, where quite a few organisms (such as P. furiosus or Methanosarcina mazei Go1) harbored both families (Figure 6B). This distribution profile suggests that members of the DUF71-B12 subfamily fulfil the same roles as the bacterial CbiZ enzymes (bCbiZ), either by catalysing the same reaction (cleaving Ado-pseudo-B12 into AdoCby) or by providing another route to salvaging Pseudo-B12. This hypothesis would explain why bacteria would have one or the other while Archaea could carry both (Figure 6B), because archaeal CbiZ proteins have been predicted to lack pseudo-B12 cleavage activity [44].

Detailed analysis of the signature motifs of the two subfamilies reveals that the strictly conserved EGGE/DXE188 motif (P. furiosus PF0828 numbering) in Dph6 proteins is replaced by an ENGEF/YH188 motif in the DUF71-B12 proteins (Additional file 3: Figure S2 and Additional file 3: Figure S4). In the Dph6 family, E188 is located near the predicted diphthine binding site and is predicted to be involved in catalysis (Figure 3B). The replacement of the strictly conserved E188 residue by a histidine residue strongly suggests a change in the reaction catalyzed by the DUF71-B12 subfamily compared to the Dph6 family.
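The motif difference described above lends itself to a simple pattern test. The sketch below is an illustration under stated assumptions: the regular expressions are one literal reading of the notation EGGE/DXE188 and ENGEF/YH188 (X taken as any residue), the function name is invented, and the example sequences are short made-up strings rather than real DUF71 proteins. A real classification would rest on the alignment and phylogeny, not on a six-residue match.

```python
import re

# One literal reading of the motif notation used in the text (assumption):
DPH6_MOTIF = re.compile(r"EGG[ED].E")   # E-G-G-E/D-X-E188 (Dph6 subfamily)
B12_MOTIF = re.compile(r"ENGE[FY]H")    # E-N-G-E-F/Y-H188 (DUF71-B12 subfamily)


def classify_by_motif(sequence):
    """Assign a sequence to a subfamily from its C-terminal signature motif."""
    has_dph6 = DPH6_MOTIF.search(sequence) is not None
    has_b12 = B12_MOTIF.search(sequence) is not None
    if has_dph6 and not has_b12:
        return "Dph6-like"
    if has_b12 and not has_dph6:
        return "DUF71-B12-like"
    return "unassigned"


# Made-up peptide fragments, for illustration only.
TOY_SEQUENCES = {
    "toy_dph6": "AAGEGGEFETLL",   # carries E-G-G-E-F-E
    "toy_b12": "AAGENGEYHTLL",    # carries E-N-G-E-Y-H
    "toy_none": "AAAAAAAAAAAA",
}

if __name__ == "__main__":
    for name, seq in TOY_SEQUENCES.items():
        print(name, classify_by_motif(seq))
```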
The structure-based comparison between the two subfamilies also strongly supports the hypothesis that their substrates are different, because all residues predicted to be involved in EF-2 binding (Figure 3B, see section above) are different in the DUF71-B12 subfamily but mostly conserved within this subfamily (Additional file 3: Figure S2 and residues in green in Figure 3C). Residues that are conserved between the two DUF71 subfamilies (Additional file 3: Figure S2 and residues in blue in Figure 3C) are found around the phosphate groups of ATP, including S12, G13, G14, K15, D16, H48, and T189 (PF0828 sequence numbering), or belong to the C-terminal conserved sequence motif (EGGE/D-X-E188), such as G182, G184, G185, E186, F187 (Additional file 3: Figure S2 and Figure 3C). Further experimental studies will be required to determine whether DUF71-B12 proteins are Ado-pseudo-B12 amidohydrolases or have another role in Ado-pseudo-B12 salvage.

Conclusions

Our detailed analyses of the DUF71 family members presented here provide an example of the power of comparative genomic approaches for solving important "missing genes" or "missing function" cases. These analyses simultaneously illustrate the difficulties inherent in accurately annotating gene families. On one hand, the evidence identifying a candidate for the missing Dph6 gene family, derived from genomic evidence (mainly phylogenetic distribution and gene fusions) and post-genomic evidence (structure, co-expression analysis and genome-wide phenotype experiments), is so strong that it could be used as an example where the functional annotation of a protein of unknown function could be derived from comparative genomics alone. On the other hand, our analyses show that a subgroup of the DUF71 family is most certainly involved in a metabolic pathway unrelated to diphthamide synthesis and that transferring functional annotations from homology scores alone would be inappropriate in this case. We believe that this integrated functional annotation approach will play an important role in future pipelines for annotation of protein families.

Additional files

Additional file 1: Table S1. GenBank RefSeq identities and corresponding organisms for all proteins used in the phylogenies.

Additional file 2: Table S2. GO Term Enrichment SPELL analysis (http://imperio.princeton.edu:3000/yeast) with YLR143w as input.

Additional file 3: Figure S1. Top 10 interactors with YLR143W by homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness database http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W). Figure S2. Multiple sequence alignment of selected Dph6 family and DUF71-B12 family sequences generated using the MultAlin platform (http://multalin.toulouse.inra.fr/multalin/). Strictly conserved residues between the two families are in red. Residues conserved only in the Dph6 family are boxed in green. Residues found around the phosphate group of ATP are noted by red arrows. Secondary structural elements, yellow rectangles for α-helix and cyan arrows for β-strand, shown above the alignment, are from the crystal structure of P. furiosus Dph6 (PF0828) (PDB id: 3RK1). Figure S3. Bayesian tree of archaeal and eukaryotic Dph6 sequences. The scale bar represents the average number of substitutions per site. Numbers at nodes represent posterior probabilities. For clarity only values greater than 0.85 are indicated. Figure S4. (Top) Sequence logo derived from 95 Dph6 sequences extracted from the Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is located at position 10 in the logo. (Bottom) Sequence logo of the corresponding region derived from 102 DUF71-B12 sequences extracted from the DUF71-B12 subsystem in SEED. Both logos were made at http://weblogo.berkeley.edu/logo.cgi based on ClustalW-derived alignments.
Figure 6 Distribution of DUF71-B12, CbiZ and CbiB in bacterial (A) and archaeal genomes (B). The trees are species trees constructed in iTOL (itol.embl.de/); the presence and absence of the specific genes were derived from the "DUF71-B12" subsystem in the SEED database.
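The co-distribution pattern summarized in Figure 6 (CbiB genomes almost always carrying CbiZ or DUF71-B12, with the two being mutually exclusive in bacteria but not in Archaea) can be checked with a short tally over presence/absence profiles. The sketch below assumes a simple mapping from genome to the set of families it encodes; the function name and all genome profiles are hypothetical and do not reproduce the SEED subsystem data.

```python
def codistribution_summary(profiles):
    """Tally CbiB / CbiZ / DUF71-B12 co-occurrence.

    profiles: dict mapping genome name -> set of family names it encodes.
    """
    summary = {"cbiB_with_partner": 0, "cbiB_without_partner": 0, "both_partners": 0}
    for families in profiles.values():
        has_cbiz = "CbiZ" in families
        has_duf71_b12 = "DUF71-B12" in families
        if "CbiB" in families:
            if has_cbiz or has_duf71_b12:
                summary["cbiB_with_partner"] += 1
            else:
                summary["cbiB_without_partner"] += 1
        if has_cbiz and has_duf71_b12:
            summary["both_partners"] += 1  # zero here means strict anti-correlation
    return summary


# Hypothetical toy profiles; names and gene content are invented.
TOY_BACTERIA = {
    "bacterium_1": {"CbiB", "CbiZ"},
    "bacterium_2": {"CbiB", "DUF71-B12"},
    "bacterium_3": {"CbiB"},            # one of the "few exceptions"
    "bacterium_4": set(),
}
TOY_ARCHAEA = {
    "archaeon_1": {"CbiB", "CbiZ", "DUF71-B12"},
    "archaeon_2": {"CbiB", "CbiZ"},
}

if __name__ == "__main__":
    print("bacteria:", codistribution_summary(TOY_BACTERIA))
    print("archaea: ", codistribution_summary(TOY_ARCHAEA))
```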

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

VdC-L conducted the comparative genomic analysis and made the functional predictions. CB-A performed the phylogenetic analysis. FF, LT and JFH did the structural analysis. All authors participated in writing/reviewing the manuscript. All authors read and approved the final manuscript.

Reviewers' comments

Reviewer number 1: Arcady Mushegian
Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, Missouri 64110

The study by de Crecy-Lagard and co-authors pinpoints the DUF71/COG2102 as the most likely archaeal/eukaryotic ATP-dependent diphthine-ammonia ligase, the so far unaccounted-for enzyme in the pathway of diphthamide biosynthesis, which pathway is responsible for the formation of a unique derivative of the conserved histidine within the translation elongation factor 2. A distinct subfamily of this protein family appears to play another role in bacteria and a subset of archaea, most likely in the salvage of an intermediate of cobalamin biosynthesis. The evidence presented in the paper consists of genome context information, sequence-structure prediction and the data from yeast concerning gene expression and chemical-genomics profiling. Taken together, the evidence seems compelling to me. The data from yeast represent partial functional validation of predictions made for prokaryotes. I would recommend only to tone down the suggestion that all this is a "novel paradigm" in analysis of gene function: researchers have been inferring gene functions from phenotypes, as well as from directly detected changes in genotype, for a long, long time, and the current study is a logical extension of these approaches. What is different in the last 15 years is that we can compare these properties across many species with completely sequenced genomes; but even this is a logical extension of the previous work (compare, for example, with work from Yanofsky and Jensen labs on biosynthesis of aromatic amino acids): it was not any prescription of a previous scientific paradigm that constrained the work, but rather the lack of the data.

Response: The references to a "novel paradigm" were eliminated in the abstract and the introduction as suggested.
Reviewer number 2: Michael Galperin
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

The paper by de Crecy-Lagard and colleagues is a fine example of using comparative genomics to patch the remaining holes in the metabolic pathways. The key conclusion of this work, prediction of the participation of the members of the DUF71/COG2102 family in diphthamide biosynthesis in archaea and eukaryotes and in B12 metabolism in some bacteria and archaea, is extremely convincing and hardly even needs an experimental verification. The second conclusion, that ammonia used in the diphthine ammonia lyase-catalyzed reaction in different organisms could be generated by two different enzymes, asparagine synthetase and the RidA domain, also sounds convincing. However, proving beyond reasonable doubt that DUF71/COG2102 family members with their ATP-pyrophosphatase activity comprise the key part of diphthine ammonia lyase does not prove that they are the only subunits of this enzyme. Even if the proposed reaction scheme (Figure 1B) is correct, there still might be a need for a ligase subunit that couples removal of the AMP moiety from EF2 with its amidation. There is a definite possibility that DUF71/COG2102 family members catalyze all these individual reactions, e.g. using its unique C-terminal 100-aa domain, but that would have to be proven experimentally. The reported involvement of the likely scaffold protein YBR246w (DPH7) appears to support the idea that diphthine ammonia lyase consists of more than one type of subunits. Otherwise, it is a great paper that vividly demonstrates the power of comparative-genomics approaches.
Response: We added a phrase stating that "even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required".

Reviewer number 3: L. Aravind
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

This work uses contextual information to identify the diphthine-ammonia ligase in archaea and eukaryotes. It also shows that the yeast protein YBR246W is indeed not the correct ligase, but rather the MJ0570-like PP-loop ATPases. The authors also show that this family has been transferred to certain bacteria where they infer that it is likely to have undergone a functional shift to participate in B12 salvage. They cautiously propose that it might function as a replacement for CbiZ to function as an amidohydrolase (the reverse of the typical PP-loop ATPase reaction) as against a ligase. The conclusions are definitive and the article makes a useful contribution to the understanding of protein modification and cofactor biosynthesis. This said, there are certain issues with the current form of the article that authors necessarily need to address in their revision:

1) (pg 8) The authors state that the MJ0570-like enzymes have a HUP domain followed by a distinct C-terminal domain. They do not explain the meaning of this properly nor cite the reference of the paper (PMID: 12012333) pertaining to the HUP domains where this family was identified as a PP-loop ATPase, along with the observations (Table 1 in that reference) that it has a primarily archaeo-eukaryotic phyletic pattern, and that eukaryotic versions might be fused to two C-terminal domains of the YabJ-like chorismate lyase fold (now termed RidA). It should be stated that the N-terminus is a PP-loop ATPase domain of the HUP class of Rossmannoid domains; not all HUP domains are ligases, only the PP-loop and the HIGH nucleotidyltransferases. This clarifies that it is related to other ATP-utilizing amidoligases such as NAD synthetase, GMP synthetase and asparagine synthetase. This would place their inferred amidoligase activity in the context of comparable, known amidoligase activities of related enzymes. In fact it would be advisable to place the fact that these are PP-loop enzymes in the abstract itself.

The following sentence was added: “This family had previously been described as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain (Aravind et al. 2002).” A reference to the PP-loop ATPase family was added in the abstract as requested. A reference to the same work was added when talking about the RidA fusion. For the phylogenetic distribution, the results presented here are a bit different from the previous study because many more genomes are available after 10 years and we show that the family is also bacterial.

2) The authors persistently refer to the domain as DUF71. This name is no longer current in Pfam and it has long been recognized, as mentioned in the reference noted above, that these proteins are not “domains of unknown function” but PP-loop ATPases. The domain is correctly termed ATP_bind_4 (PF01902) in Pfam. This Pfam (not the misleading DUF71) name and Pfam number should be indicated with just a statement in the introduction that it was formerly DUF71.

This domain is currently called “Domain of unknown function DUF71, ATP-binding domain” in the InterPro database (IPR002761) even if it is called ATP_bind_4 (PF01902) in Pfam. It is much shorter to use (as well as easier for the reader to follow) the DUF71 abbreviation rather than the ATP_bind_4 abbreviation. We therefore prefer to keep DUF71. We however introduced a statement giving the different names of this domain in the InterPro, Pfam and COG databases at the end of the introduction.

3) The authors apparently have a misapprehension regarding the Methanohalophilus mahii protein both in the text and the domain architecture rendered in the figure. First, these proteins have two N-terminal domains fused to the MJ0570-like module: namely an N-terminal class-II glutamine amidotransferase (GAT-II, e.g. see PMID: 20023723) and a second PP-loop ATPase domain thereafter (i.e. one related to asparagine synthetase). This GAT domain, as in the case of other PP-loop enzymes, could supply ammonia by cleaving it off glutamine. But this does not explain which PP-loop domain utilizes it. In the case of the Asn-synthetase it is used by the cognate PP-loop domain. In this case the presence of two PP-loop domains suggests that it is either utilized by both for different reactions or else the second domain does not receive the NH3 from this GAT. This also leads to the question what reaction is the Asn synthetase-like PP-loop domain catalyzing?

Quality of written English: Acceptable

The source of the confusion came from the fact that the Asn synthase domain (AsnB) contains two domains, the GAT-II domain and the Asn_Synthase_B_C PP-loop ATPase domain. Both the figure and the text were modified to avoid the confusion. Based on the reviewer's comments the sentence discussing the potential role of the AsnB domain was modified as follows: “This domain organization strongly suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the fused GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes.”

4) Based on phyletic complementarity the authors suggest that bacterial CbiZ might be displaced by the bacterial MJ0570-like enzymes. This seems unusual. Why utilize a PP-loop ATPase for the reverse reaction, i.e. amidohydrolase? Typically there is little overlap between the families involved in amidohydrolase as opposed to ATP-dependent ligase activity. Of the almost 12 distinct major inventions of amidoligase activity, hardly any representatives of these superfamilies have been reused as amidohydrolases. So do the authors note anything special in the case of the bacterial representatives that might support such a functional shift?

This hypothesis is derived from phylogenetic distribution and it is not unprecedented that ligases and hydrolases are found in the same family (see example in PMID: 12359880). However, we agree that this hypothesis derives mainly from phylogenetic pattern analysis, and beyond the differences in the predicted substrate binding pocket found in the DUF71-B12 family we did not identify specific changes that could point to a shift to hydrolase, hence our caution in our prediction as stated in the text.

Quality of written English: Acceptable

Acknowledgements
This work was supported by the US National Science Foundation (grant MCB-1153413 to V.dC-L), the US National Institutes of Health (grant U54GM094597 to G.T. Montelione and the Northeast Structural Genomics Consortium) and the Agence Nationale pour la Recherche (grant ANR-10-BINF-01-0127 Ancestrome) to C.B-A. We thank Raffael Schaffrath and Mike Stark for sharing unpublished diphthamide-related data and critical evaluation of manuscript parts. We thank Jorge Escalante-Semerena for sharing his immense knowledge on B12 salvage pathways, Diana Downs for disclosing unpublished results on RidA function, Manal Swairjo for chemical insight, and Andrew Hanson for helpful input on the manuscript.

Author details
1 Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA. 2 Department of Biological Sciences, Columbia University, Northeast Structural Genomics Consortium, 1212 Amsterdam Ave, New York, NY 10027, USA. 3 Université de Lyon; Université Lyon 1; CNRS; UMR5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 Novembre 1918, Lyon, Villeurbanne F-69622, France.

Received: 17 July 2012 Accepted: 18 September 2012 Published: 26 September 2012

References
1. Greganova E, Altmann M, Bütikofer P: Unique modifications of translation elongation factors. FEBS J 2011, 278(15):2613–2624.
2. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. Isolation and properties of the novel ribosyl-amino acid and its hydrolysis products. J Biol Chem 1980, 255(22):10717–10720.
3. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. NMR spectra and proposed structures of ribosyl-diphthamide and its hydrolysis products. J Biol Chem 1980, 255(22):10710–10716.
4. Zhang Y, Zhu X, Torelli AT, Lee M, Dzikovski B, Koralewski RM, Wang E, Freed J, Krebs C, Ealick SE, et al: Diphthamide biosynthesis requires an organic radical generated by an iron–sulphur enzyme. Nature 2010, 465(7300):891–896.
5. Zhu X, Dzikovski B, Su X, Torelli AT, Zhang Y, Ealick SE, Freed JH, Lin H: Mechanistic understanding of Pyrococcus horikoshii Dph2, a [4Fe-4S] enzyme required for diphthamide biosynthesis. Mol Biosyst 2011, 7(1):74–81.
6. Lartillot N, Lepage T, Blanquart S: PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 2009, 25(17):2286–2288.
7. McRee DE: XtalView/Xfit - A versatile program for manipulating atomic coordinates and electron density. J Struct Biol 1999, 125(2–3):156–165.
8. Katoh K, Misawa K, Kuma K-I, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002, 30(14):3059–3066.
9. Webb TR, Cross SH, McKie L, Edgar R, Vizor L, Harrison J, Peters J, Jackson IJ: Diphthamide modification of eEF2 requires a J-domain protein and is essential for normal development. J Cell Sci 2008, 121(19):3140–3145.
10. Zhu X, Kim J, Su X, Lin H: Reconstitution of diphthine synthase activity in vitro. Biochemistry 2010, 49(44):9649–9657.
11. Mattheakis LC, Shen WH, Collier RJ: DPH5, a methyltransferase gene required for diphthamide biosynthesis in Saccharomyces cerevisiae. Mol Cell Biol 1992, 12(9):4026–4037.
12. Moehring TJ, Danley DE, Moehring JM: In vitro biosynthesis of diphthamide, studied with mutant Chinese hamster ovary cells resistant to diphtheria toxin. Mol Cell Biol 1984, 4(4):642–650.
13. Su X, Chen W, Lee W, Jiang H, Zhang S, Lin H: YBR246W is required for the third step of diphthamide biosynthesis. J Am Chem Soc 2011, 134(2):773–776.
14. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, et al: InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 2012, 40(D1):D306–D312.
15. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al: The Pfam protein families database. Nucleic Acids Res 2010, 38(suppl_1):D211–D222.
16. Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D, Mazumder R, Mekhedov S, Nikolskaya A, et al: The COG database: an updated version includes eukaryotes. BMC Bioinforma 2003, 4(1):41.
17. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402.
18. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23(21):2947–2948.
19. Corpet F: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 1988, 16(22):10881–10890.
20. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R, et al: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 2005, 33(17):5691–5702.
21. Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Anderson I, Lykidis A, Mavromatis K, et al: The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 2010, 38(suppl 1):D382–D390.
22. Alm EJ, Huang KH, Price MN, Koche RP, Keller K, Dubchak IL, Arkin AP: The MicrobesOnline web site for comparative genomics. Genome Res 2005, 15(7):1015–1022.
23. Hibbs MA, Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG: Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 2007, 23(20):2692–2699.
24. Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, Christie KR, Costanzo MC, Dwight SS, Engel SR, et al: Saccharomyces genome database: the genomics resource of budding yeast. Nucleic Acids Res 2012, 40(D1):D700–D705.
25. Hillenmeyer M, Ericson E, Davis R, Nislow C, Koller D, Giaever G: Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol 2010, 11(3):R30.
26. Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St. Onge RP, Tyers M, Koller D, et al: The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 2008, 320(5874):362–365.
27. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 2007, 23(1):127–128.
28. Crooks GE, Hon G, Chandonia J-M, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14(6):1188–1190.
29. Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2012, 40(D1):D130–D135.
30. Philippe H: MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res 1993, 21(22):5264–5272.
31. Gouy M, Guindon S, Gascuel O: SeaView Version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010, 27(2):221–224.
32. Aravind L, Anantharaman V, Koonin EV: Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world. Proteins: Structure, Funct Bioinform 2002, 48(1):1–14.
33. Forouhar F, Saadat N, Hussain M, Seetharaman J, Lee I, Janjua H, Xiao R, Shastry R, Acton TB, Montelione GT, et al: A large conformational change in the putative ATP pyrophosphatase PF0828 induced by ATP binding. Acta Crystallogr Sect F Struct Biol Cryst Commun 2011, 67(11):1323–1327.
34. Botet J, Rodriguez-Mateos M, Ballesta JPG, Revuelta JL, Remacha M: A chemical genomic screen in Saccharomyces cerevisiae reveals a role for diphthamidation of translation Elongation Factor 2 in inhibition of protein synthesis by Sordarin. Antimicrob Agents Chemother 2008, 52(5):1623–1629.
35. Bär C, Zabel R, Liu S, Stark MJR, Schaffrath R: A versatile partner of eukaryotic protein complexes that is involved in multiple biological processes: Kti11/Dph3. Mol Microbiol 2008, 69(5):1221–1233.
36. Jorgensen R, Wang Y, Visschedyk D, Merrill AR: The nature and character of the transition state for the ADP-ribosyltransferase reaction. EMBO Rep 2008, 9(8):802–809.
37. Iyer LM, Abhiman S, Maxwell Burroughs A, Aravind L: Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteins. Mol BioSys 2009, 5(12):1636–1660.
38. Lambrecht JA, Flynn JM, Downs DM: Conserved YjgF protein family deaminates reactive enamine/imine intermediates of Pyridoxal 5′-Phosphate (PLP)-dependent enzyme reactions. J Biol Chem 2012, 287(5):3454–3461.
39. Brochier-Armanet C, Forterre P, Gribaldo S: Phylogeny and evolution of the Archaea: one hundred genomes later. Curr Opin Microbiol 2011, 14(3):274–281.
40. Borths EL, Poolman B, Hvorup RN, Locher KP, Rees DC: In vitro functional characterization of BtuCD-F, the Escherichia coli ABC transporter for vitamin B12 uptake. Biochemistry 2005, 44(49):16301–16309.
41. Zayas CL, Claas K, Escalante-Semerena JC: The CbiB protein of Salmonella enterica is an integral membrane protein involved in the last step of the de novo corrin ring biosynthetic pathway. J Bacteriol 2007, 189(21):7697–7708.
42. Escalante-Semerena JC: Conversion of cobinamide into adenosylcobamide in bacteria and archaea. J Bacteriol 2007, 189(13):4555–4560.
43. Woodson JD, Escalante-Semerena JC: CbiZ, an amidohydrolase enzyme required for salvaging the coenzyme B12 precursor cobinamide in archaea. Proc Natl Acad Sci USA 2004, 101(10):3591–3596.
44. Gray MJ, Escalante-Semerena JC: The cobinamide amidohydrolase (cobyric acid-forming) CbiZ enzyme: a critical activity of the cobamide remodelling system of Rhodobacter sphaeroides. Mol Microbiol 2009, 74(5):1198–1210.

doi:10.1186/1745-6150-7-32
Cite this article as: de Crécy-Lagard et al.: Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage. Biology Direct 2012 7:32.

